Classification experiments

This experiment vividly demonstrates the importance of PDF projection. We now build classifiers using the various feature chains that we compared above. Generating data from the two theoretical PDFs defined in Section 11.1.4 as the “AR” and “MFCC” classes, we conducted a classification experiment using four classifiers:
  1. Neyman-Pearson. This classifier is simply the optimal Neyman-Pearson classifier using the known theoretical PDFs.
  2. PDF Projection. This classifier approximates the optimal Neyman-Pearson classifier by approximating the theoretical PDFs using PDF projection. To approximate the “AR” data PDF, we used the AR feature chain. To approximate the “MFCC” data PDF, we used either the MFCC-SPA feature chain or the MFCC-ML feature chain. The feature PDFs were estimated using Gaussian mixtures (3 mixture components).
  3. Concatenation. This classifier concatenated the “AR” and “MFCC-ML” feature chain features into a single higher-dimensional feature vector, then classified using a Gaussian-mixture PDF classifier.
  4. Stacking. This classifier built separate Gaussian-mixture PDF classifiers for the “AR” and “MFCC-ML” feature sets, then summed the log-likelihoods (a sketch contrasting this with concatenation follows this list).
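To make the distinction between the concatenation and stacking classifiers concrete, here is a minimal sketch using scikit-learn's GaussianMixture. The feature matrices z_ar and z_mfcc, and the function names, are hypothetical stand-ins for the outputs of the feature chains of Section 11.1.4, not the implementation used in these experiments.

\begin{verbatim}
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm(features, n_components=3):
    # 3-component Gaussian mixture, matching the feature PDFs above.
    return GaussianMixture(n_components=n_components).fit(features)

# Classifier 3 (concatenation): one GMM per class, trained on the
# concatenated [AR | MFCC-ML] feature vector.
def concat_loglik(gmm_joint, z_ar, z_mfcc):
    return gmm_joint.score_samples(np.hstack([z_ar, z_mfcc]))

# Classifier 4 (stacking): separate GMMs per feature set; summing the
# log-likelihoods implicitly treats the two feature sets as independent.
def stacked_loglik(gmm_ar, gmm_mfcc, z_ar, z_mfcc):
    return gmm_ar.score_samples(z_ar) + gmm_mfcc.score_samples(z_mfcc)
\end{verbatim}

The independence assumption built into stacking is what the results below penalize: concatenation can model the statistical dependence between the two feature sets, while stacking cannot.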
We first evaluated the optimal performance of the theoretical Neyman-Pearson classifier, constructed from (11.2) using the two circular PDF models (11.4) and (11.5).
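Assuming equal priors, this test reduces to comparing the two class log-likelihoods. Schematically (the exact PDFs being those of (11.4) and (11.5)),
\[
\log p_{\rm AR}({\bf x}) \;\mathop{\gtrless}_{\text{MFCC}}^{\text{AR}}\; \log p_{\rm MFCC}({\bf x}),
\]
which corresponds to the $X=Y$ decision line in Figure 11.5.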

Figure 11.5 (left) shows the theoretical AR model log-likelihood on the X-axis and the theoretical MFCC log-likelihood on the Y-axis for 100 samples each of MFCC data (circles) and AR data (dots). For perfect performance, the data from each class should lie on the correct side of the $X=Y$ line. A few errors can be seen. The optimal classification error probability was determined to be 1.68% using 80,000 test samples.

Figure 11.5 (right) shows the experiment repeated using the projected PDFs built from the AR and MFCC feature chains. It is difficult to see any difference between the two plots. To obtain a more quantitative result, we need to measure the error probability over repeated trials.
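A minimal sketch of such a measurement follows, again with hypothetical names: sample_ar and sample_mfcc stand for the data generators of the two classes, and loglik_ar and loglik_mfcc for the (theoretical or projected) class log-likelihood functions.

\begin{verbatim}
import numpy as np

def error_probability(sample_ar, sample_mfcc,
                      loglik_ar, loglik_mfcc, n_test=80000):
    # Draw half the test set from each class (equal priors assumed).
    x_ar = sample_ar(n_test // 2)
    x_mfcc = sample_mfcc(n_test // 2)
    # An AR sample is an error when it falls on the MFCC side of
    # the X=Y line, and vice versa.
    err_ar = np.mean(loglik_ar(x_ar) <= loglik_mfcc(x_ar))
    err_mfcc = np.mean(loglik_mfcc(x_mfcc) <= loglik_ar(x_mfcc))
    return 0.5 * (err_ar + err_mfcc)
\end{verbatim}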

Figure 11.5: One hundred generated data samples from each model. The theoretical log-likelihood of each sample is displayed for each model assumption (AR on the X-axis and MFCC on the Y-axis). Circles: MFCC data; dots: AR data. Left: using theoretical PDFs. Right: using PDF projection.
\includegraphics[height=3.2in,width=3.2in]{lp100.eps} \includegraphics[height=3.2in,width=3.2in]{lp100g-1.eps}

Next, we re-ran the experiment over a range of training sample sizes, measuring classification performance. We compared PDF projection with (a) the Neyman-Pearson (optimal) classifier, (b) the additive combination of the AR and MFCC feature log-likelihood functions, sometimes called “stacking”, and (c) feature concatenation, in which the union of the AR and MFCC features was formed. The same feature PDF estimation approach was used as for the feature density $\hat{g}({\bf z})$ in PDF projection. We ran the experiments using both MFCC-SPA and MFCC-ML features. The results are shown in Figure 11.6, which plots the classification error probability in percent. For the left graph we used MFCC-SPA features, and for the right graph MFCC-ML features. After the optimal Neyman-Pearson classifier, PDF projection was best overall, with MFCC-ML slightly better than MFCC-SPA. For MFCC-SPA features, feature concatenation showed about 35% more errors than PDF projection. For MFCC-ML, that ratio rose to about 50%. Likelihood stacking did much worse than feature concatenation, indicating that feature concatenation took advantage of the statistical dependence between the MFCC and AR features.
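For reference, a minimal sketch of the projected log-likelihood computed by the PDF projection classifier is shown below. The feature_chain interface, returning the feature vector together with its log J-function correction, is an illustrative assumption, not the author's API; gmm is the 3-component Gaussian-mixture estimate of $\hat{g}({\bf z})$.

\begin{verbatim}
def projected_loglik(x, feature_chain, gmm):
    # Hypothetical interface: feature_chain(x) returns (z, log_j),
    # the feature vector z(x) and the log J-function correction
    # log p0(x) - log g0(z), so that
    # log p_hat(x) = log J(x) + log g_hat(z(x)).
    z, log_j = feature_chain(x)
    return log_j + gmm.score_samples(z.reshape(1, -1))[0]
\end{verbatim}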

Figure 11.6: Classification error probability as a function of training set size, measured with 80,000 test samples. Left: MFCC-SPA features. Right: MFCC-ML features.
\includegraphics[height=4.0in,width=3.0in]{eres2-b.eps} \includegraphics[height=4.0in,width=3.0in]{eres1-b.eps}