Application of CLT to MEL Band Analysis

MEL frequency cepstral coefficients (MFCC) are the predominant features used in human speech analysis [26,27]. After computing the FFT and the magnitude-squared of the bins to obtain the raw spectrum $ {\bf x}$, the next stage in MFCC processing is the inner product of $ {\bf x}$ with the individual MEL band functions, collected as the columns of matrix $ {\bf A}$. These columns are shown in Figure 5.5 for $ N_c=24$ bands and FFT size $ N_t=768$, which gives $ N=N_t/2+1=385$ bins. Note that in Figure 5.5 the sum of the MEL band functions (line on top) is a constant. The total energy of $ {\bf x}$ is therefore a linear function of the band outputs, which shows that the requirement to contain an energy statistic is met.
Figure 5.5: MEL band functions for $ N_t=768$. There are 24 bands including the zero and Nyquist bands. Their sum, the flat line on top, is a constant.
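The constant-sum property of the band functions can be checked numerically. The following is a minimal sketch, not the code in software/module_mel_bank.m: it assumes triangular band functions with centers equally spaced on the common MEL scale $ 2595\log_{10}(1+f/700)$, and a hypothetical sample rate of 16 kHz. The function names mel_band_matrix and band_features are illustrative only.

```python
import math

def hz_to_mel(f):
    # Common MEL-scale formula (an assumption; the source does not state which one it uses).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_matrix(n_bins, n_bands, fs):
    """Return A as n_bins rows by n_bands columns of triangular band weights."""
    nyquist = fs / 2.0
    top = hz_to_mel(nyquist)
    # Band centers in fractional bin units, equally spaced on the MEL scale.
    centers = [mel_to_hz(top * j / (n_bands - 1)) / nyquist * (n_bins - 1)
               for j in range(n_bands)]
    centers[-1] = float(n_bins - 1)  # pin the Nyquist band to the last bin
    A = [[0.0] * n_bands for _ in range(n_bins)]
    j = 0
    for i in range(n_bins):
        # Advance to the pair of adjacent centers that brackets bin i.
        while j < n_bands - 2 and i > centers[j + 1]:
            j += 1
        w = (i - centers[j]) / (centers[j + 1] - centers[j])
        A[i][j] = 1.0 - w      # falling edge of band j
        A[i][j + 1] = w        # rising edge of band j+1
    return A

def band_features(A, x):
    """Compute z = A' x, the inner product of x with each band function."""
    return [sum(A[i][j] * x[i] for i in range(len(x))) for j in range(len(A[0]))]

N_t, N_c = 768, 24
N = N_t // 2 + 1                       # 385 bins
A = mel_band_matrix(N, N_c, fs=16000)  # fs is a hypothetical value
x = [1.0 + (i % 7) for i in range(N)]  # synthetic raw spectrum for illustration
z = band_features(A, x)
row_sums = [sum(row) for row in A]
```

Because each bin is split between two adjacent bands with weights that add to one, every entry of row_sums equals 1 and sum(z) equals sum(x) up to rounding, which is the energy statistic referred to above.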

As noted earlier, the primary difficulty in CLT analysis is finding the appropriate floating reference hypothesis, parameterized by the mean $ \bar{{\bf x}}$. Because MEL band analysis has no equivalent of the AR spectral estimate, finding $ \bar{{\bf x}}$ is more difficult here. To find a suitable $ \bar{{\bf x}}$, we turn to maximum entropy spectral analysis: we seek to maximize the spectral entropy (5.5) under the constraint that $ {\bf A}^\prime \bar{{\bf x}}={\bf z}.$ The remainder of the procedure follows Section 5.2.5. See software/module_mel_bank.m with method='clt', or test it with software/module_mel_bank_test.m with method='clt'. Resynthesis is accomplished with software/module_mel_bank_synth.m.
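To illustrate the structure of the constrained maximization, the following sketch assumes that the spectral entropy (5.5) has the Burg form $ \sum_i \log \bar{x}_i$. That form is not stated in this section and is an assumption here, as is the Lagrange multiplier vector $ \boldsymbol{\lambda}$ of dimension $ N_c$.

```latex
% Assumption: (5.5) is taken to be H = \sum_i \log \bar{x}_i (not stated in this section).
\begin{align*}
\mathcal{L}(\bar{\bf x},\boldsymbol{\lambda})
  &= \sum_{i=1}^{N} \log \bar{x}_i
     - \boldsymbol{\lambda}^\prime \left( {\bf A}^\prime \bar{\bf x} - {\bf z} \right), \\
\frac{\partial \mathcal{L}}{\partial \bar{x}_i}
  &= \frac{1}{\bar{x}_i} - \left[ {\bf A} \boldsymbol{\lambda} \right]_i = 0
  \quad\Longrightarrow\quad
  \bar{x}_i = \frac{1}{\left[ {\bf A} \boldsymbol{\lambda} \right]_i}.
\end{align*}
```

Under this assumption, $ \boldsymbol{\lambda}$ must be chosen so that $ {\bf A}^\prime \bar{{\bf x}}={\bf z}$ holds. This generally has no closed form and would be solved iteratively.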

Example 10   We now compare the J-function for MEL band analysis computed using CLT and SPA. Figure 5.6 compares the J-function from software/module_A_chisq.m with that from software/module_mel_bank.m. See software/module_A_chisq_test.m with task='compare'.
Figure 5.6: Jout comparison for MEL band analysis using SPA and CLT.
\includegraphics[width=4.5in,height=3.0in, clip]{test_mel_clt.eps}

Baggenstoss 2017-05-19