Application of CLT to MEL Band Analysis
MEL frequency cepstral coefficient (MFCC) analysis
is the predominant feature extraction method
used in human speech analysis [32,33].
After computing the FFT and taking the
magnitude-squared of the bins to obtain the
raw spectrum, the next stage in MFCC processing
is the inner product of the raw spectrum with the individual
MEL band functions, collected as the columns of a matrix.
These columns are shown in Figure 5.5.
Note that in Figure 5.5, the sum of the
MEL band functions (line on top) is a constant, which shows that
the requirement to contain an energy statistic is met.
Figure:
MEL band functions for . There are 24 bands including the
zero and Nyquist bands. Their sum, the flat line
on top, is a constant.
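
The constant-sum property can be checked numerically. The following is a minimal sketch, not the construction used in software/module_mel_bank.m: it assumes triangular band functions whose centers are equally spaced on a common MEL scale, with half-triangles at zero and Nyquist. The function names, the MEL formula, the FFT size of 512, and the 16 kHz sample rate are all illustrative assumptions. The raw spectrum is a synthetic placeholder rather than the output of an FFT.

```python
import math

def hz_to_mel(f_hz):
    # A common MEL-scale formula; assumed here, the source does not specify one.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_matrix(n_fft, n_bands, fs):
    """Return the band matrix as a list of rows: one row per FFT bin
    (0 .. n_fft/2), one column per MEL band (hypothetical helper)."""
    n_bins = n_fft // 2 + 1
    nyquist = fs / 2.0
    mel_max = hz_to_mel(nyquist)
    # Band centers equally spaced in MEL, including the zero and Nyquist bands.
    centers = [mel_to_hz(mel_max * j / (n_bands - 1)) for j in range(n_bands)]
    centers[0], centers[-1] = 0.0, nyquist  # pin the end points exactly
    A = [[0.0] * n_bands for _ in range(n_bins)]
    for i in range(n_bins):
        f = min(i * fs / n_fft, nyquist)
        for j, c in enumerate(centers):
            if j > 0 and centers[j - 1] <= f <= c:
                # Rising edge of band j.
                A[i][j] = (f - centers[j - 1]) / (c - centers[j - 1])
            elif j < n_bands - 1 and c <= f <= centers[j + 1]:
                # Falling edge of band j.
                A[i][j] = (centers[j + 1] - f) / (centers[j + 1] - c)
    return A

def band_energies(A, y):
    """Inner product of the raw spectrum y with each column of A."""
    n_bands = len(A[0])
    return [sum(A[i][j] * y[i] for i in range(len(y))) for j in range(n_bands)]

# Assumed parameters: 24 bands as in the figure; FFT size and rate are examples.
A = mel_band_matrix(n_fft=512, n_bands=24, fs=16000.0)
y = [1.0 + (i % 7) for i in range(len(A))]  # synthetic non-negative spectrum
z = band_energies(A, y)

# Largest deviation of the per-bin sum of band functions from 1.
print(max(abs(sum(row) - 1.0) for row in A))
# Difference between the summed band energies and the total spectral energy.
print(abs(sum(z) - sum(y)))
```

Because neighboring triangles overlap so that their weights add to one in every bin, the band energies sum to the total energy of the raw spectrum. In this sketch, that is the sense in which the features contain an energy statistic.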
As noted earlier, the primary difficulty in CLT analysis
is finding the appropriate floating reference hypothesis,
which is parameterized by its mean.
Because MEL band analysis has no equivalent of the
AR spectral estimate, the problem of finding this mean is
more difficult. To find a suitable mean, we
turn to maximum entropy spectral analysis.
We seek to maximize the spectral entropy (5.5)
under the constraint that
The remainder of the procedure follows Section 5.2.5.
See
software/module_mel_bank.m with method='clt'.
To test, use
software/module_mel_bank_test.m with method='clt'.
Resynthesis is accomplished with
software/module_mel_bank_synth.m.