Application of CLT to MEL Band Analysis

MEL frequency cepstral coefficients (MFCC) are the predominant features used in human speech analysis [32,33]. After computing the FFT and taking the magnitude-squared of the bins to obtain the raw spectrum ${\bf x}$, the next stage in MFCC processing is the inner product of ${\bf x}$ with the individual MEL band functions, collected as the columns of the matrix ${\bf A}$. These columns are shown in Figure 5.5 for $N_c=24$ bands, FFT size $N_t=768$, and $N=N_t/2+1=385$. Note that in Figure 5.5 the sum of the MEL band functions (the line on top) is a constant. The total energy $\sum_i x_i$ is therefore proportional to the sum of the elements of ${\bf z}={\bf A}^\prime {\bf x}$, which shows that the requirement that the features contain an energy statistic is met.
Figure 5.5: MEL band functions for $N_t=768$. There are 24 bands, including the zero and Nyquist bands. Their sum, the flat line on top, is a constant.
\includegraphics[height=1.6in,width=5.0in]{mfcc1.eps}
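
The constant-sum property can be illustrated with a minimal sketch. This is not the author's software/module_mel_bank.m: the function names, the sample rate of 16 kHz, the triangular band shape, and the MEL-scale formula $2595\log_{10}(1+f/700)$ are all assumptions chosen for illustration. The sketch builds ${\bf A}$ with $N=385$ rows and $N_c=24$ columns so that each row sums to one, then computes ${\bf z}={\bf A}^\prime{\bf x}$ for a made-up spectrum.

```python
import math

def hz_to_mel(f_hz):
    # A commonly used MEL-scale formula (assumed; the source does not specify one)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_matrix(n_bins=385, n_bands=24, fs=16000.0):
    """Return A as n_bins rows of n_bands weights (hypothetical helper).

    Band centers are equally spaced on the MEL scale, with the first center
    at bin 0 (zero band) and the last at the Nyquist bin. Each bin is shared
    linearly between its two neighboring centers, so every row sums to one.
    """
    nyquist = fs / 2.0
    mel_max = hz_to_mel(nyquist)
    # Band centers expressed in fractional bin units
    centers = [mel_to_hz(mel_max * k / (n_bands - 1)) / nyquist * (n_bins - 1)
               for k in range(n_bands)]
    centers[0] = 0.0
    centers[-1] = float(n_bins - 1)
    A = [[0.0] * n_bands for _ in range(n_bins)]
    k = 0
    for i in range(n_bins):
        # Advance to the pair of centers that brackets bin i
        while k < n_bands - 2 and i > centers[k + 1]:
            k += 1
        w = (i - centers[k]) / (centers[k + 1] - centers[k])
        A[i][k] = 1.0 - w      # falling edge of band k
        A[i][k + 1] = w        # rising edge of band k+1
    return A

def mel_band_features(A, x):
    """Compute z = A' x: the inner product of x with each column of A."""
    n_bands = len(A[0])
    return [sum(A[i][k] * x[i] for i in range(len(x))) for k in range(n_bands)]

A = mel_band_matrix()
x = [1.0 / (1.0 + 0.01 * i) for i in range(len(A))]  # made-up positive spectrum
z = mel_band_features(A, x)
# Because each row of A sums to one, sum(z) matches sum(x) up to rounding,
# so the total energy is recoverable from the features.
```

With this construction the zero and Nyquist bands are half-triangles, which is one simple way to make the band functions sum to a constant across all bins; the shapes in Figure 5.5 may differ in detail.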

As noted earlier, the primary difficulty in CLT analysis is finding an appropriate floating reference hypothesis, parameterized by the mean $\bar{{\bf x}}$. Because MEL band analysis has no equivalent of the AR spectral estimate, finding $\bar{{\bf x}}$ is more difficult here. To find a suitable $\bar{{\bf x}}$, we turn to maximum entropy spectral analysis: we seek to maximize the spectral entropy (5.5) under the constraint that ${\bf A}^\prime \bar{{\bf x}}={\bf z}$. The remainder of the procedure follows Section 5.2.5. See software/module_mel_bank.m with method='clt'; to test it, use software/module_mel_bank_test.m with method='clt'. Resynthesis is accomplished with software/module_mel_bank_synth.m.
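
As a sketch of what the constrained maximization involves, assume (this is an assumption, since (5.5) is not reproduced here) that the spectral entropy has the Burg form $\sum_{i=1}^{N} \log \bar{x}_i$, up to an additive constant. Introducing a hypothetical vector of Lagrange multipliers $\boldsymbol{\lambda}$ of length $N_c$ for the constraint ${\bf A}^\prime \bar{{\bf x}}={\bf z}$ gives

```latex
\begin{align*}
L(\bar{\bf x}, \boldsymbol{\lambda})
  &= \sum_{i=1}^{N} \log \bar{x}_i
     - \boldsymbol{\lambda}^\prime \left( {\bf A}^\prime \bar{\bf x} - {\bf z} \right), \\
\frac{\partial L}{\partial \bar{x}_i}
  &= \frac{1}{\bar{x}_i} - \left[ {\bf A} \boldsymbol{\lambda} \right]_i = 0
  \quad \Longrightarrow \quad
  \bar{x}_i = \frac{1}{\left[ {\bf A} \boldsymbol{\lambda} \right]_i}.
\end{align*}
```

Under that assumption, $\bar{{\bf x}}$ is the reciprocal of a linear combination of the MEL band functions, and the $N_c$ multipliers would have to be found numerically so that ${\bf A}^\prime \bar{{\bf x}}={\bf z}$ holds with every $\left[ {\bf A} \boldsymbol{\lambda} \right]_i$ positive.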

Example 10   We now compare the J-function for MEL band analysis computed using CLT and SPA. Figure 5.6 shows the result of comparing the J-function from software/module_A_chisq.m with that from software/module_mel_bank.m. See software/module_A_chisq_test.m with task='compare'.
Figure 5.6: Jout comparison for MEL band analysis using SPA and CLT.
\includegraphics[width=4.5in,height=3.0in, clip]{test_mel_clt.eps}