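Cepstral analysis is a widely used means of extracting meaningful information about the spectrum. The most common method is the MEL frequency cepstral coefficients (MFCC). They generally consist of these steps:
1. Compute the magnitude-squared DFT of the input.
2. Apply a bank of spectral band functions (typically MEL-spaced).
3. Take the logarithm of the band outputs.
4. Apply the discrete cosine transform (DCT).
5. Truncate the DCT, keeping only the leading coefficients.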
Rather than truncating the DCT coefficients, we believe it is better to use fewer but fatter spectral band functions, then keep all the DCT bins. This produces "tighter" features from an information point of view, as we will see in Section 10.1. The first (zero-frequency) DCT coefficient is often left out in practice. Because leaving it out disturbs the energy statistic (Section 3.2.2), we prefer to keep it. If the practitioner wishes to ignore the information in the zero-frequency DCT coefficient, this can be handled by assigning a fixed prior distribution to it in the PDF estimation step. This has the same effect in the classifier as eliminating it altogether.
The factor with the most influence is the number, shape, and spacing of the spectral band functions. The cepstral features can be produced (excepting the final truncation of the DCT) using the following module chain (assume x is the input data vector):
    [y,jout]=module_dftmsq(x,0);
    [w,jout]=module_mel_XXX(y,jout,...);
    [u,jout]=module_log(w,jout);
    z=dct(u);

These four modules correspond one-to-one with the first four steps. Note that the DCT step requires no modification of the J-function, since it is an invertible (orthonormal) transformation whose Jacobian determinant has unit magnitude. Above, module_mel_XXX can be either module_mel_bank, which is described in Sections 5.2.6 and 5.2.3 and, in the context of sampling inversion, in Section 5.3.7, or module_mel_ml, which is described in Section 5.2.9. These modules have options to produce linear or MEL-spaced spectral band functions. However, even when module_mel_bank and module_mel_ml are set up to produce the same bank of band functions, they produce different features. Consider the raw spectral vector (y above) and the matrix of spectral band functions. As explained in Section 5.2.9, module_mel_bank produces its feature in closed form from these two quantities, whereas module_mel_ml iteratively seeks the feature that approximates the raw spectral vector through the band functions. We compare the two in Section 10.1.
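Returning to the DCT step, the claim that it needs no J-function correction can be checked numerically. The following is a minimal sketch, assuming the orthonormal DCT-II normalization that MATLAB's dct uses by default; the size N, the explicit matrix C, and the test vector u are illustrative and not part of the toolbox.

```matlab
% Sketch: check that the orthonormal DCT-II has a unit Jacobian.
% N, C and u are illustrative stand-ins, not toolbox variables.
N = 8;                                   % hypothetical number of band outputs
[n, k] = meshgrid(0:N-1, 0:N-1);         % n: sample index (columns), k: DCT index (rows)
C = sqrt(2/N) * cos(pi * (2*n + 1) .* k / (2*N));
C(1, :) = C(1, :) / sqrt(2);             % rescale the zero-frequency row for orthonormality

orth_err = norm(C * C' - eye(N));        % near zero: C is an orthogonal matrix
jac = abs(det(C));                       % magnitude of the Jacobian determinant

u = log(rand(N, 1) + 0.1);               % stand-in for a log band-output vector
z = C * u;                               % DCT of u under the orthonormal convention
rec_err = norm(C' * z - u);              % near zero: the transform is invertible
```

Because the determinant has unit magnitude, the log-Jacobian contribution of this step is zero, which is consistent with leaving jout unchanged after z=dct(u).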
The band spacing and shape of the spectral band functions are controlled by the two variables fs and type. fs is the assumed sample rate used to set the spacing and center frequencies of the MEL band functions. The variable type selects MEL or linear spacing and Hanning or triangular shape, as follows:
    % type | shape      | spacing
    % ---------------------------------
    %  0   | hanning    | MEL
    %  1   | triangular | MEL
    %  2   | hanning    | linear
    %  3   | triangular | linear

Note that the end bin values are set to "half", so the sum of the band functions at the zero and Nyquist frequency bins will be 1/2, while it will be 1 in all others. This preserves in the features the total energy at the DFT input, i.e. it controls the energy statistic.
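The effect of halving the end bins can be illustrated with a small experiment. This is only a sketch under stated assumptions: the band matrix is built here with interp1 as triangular, linearly spaced functions (it is not the code of module_mel_bank), the spectrum is the unnormalized one-sided magnitude-squared FFT, the band outputs are formed as A'*y, and N and nb are hypothetical sizes.

```matlab
% Sketch: triangular, linearly spaced band functions with halved end bins.
% Illustrative construction only; not the code of module_mel_bank.
N     = 64;                              % even DFT length (assumed)
nbins = N/2 + 1;                         % bins from zero frequency to Nyquist
nb    = 6;                               % hypothetical number of bands

centers = linspace(1, nbins, nb);        % band centers in units of bin index
A = interp1(centers, eye(nb), (1:nbins)');   % nbins-by-nb hat functions, rows sum to 1
A([1 end], :) = A([1 end], :) / 2;       % set the end-bin values to "half"

rowsum = sum(A, 2);                      % 1/2 at the end bins, 1 elsewhere

x = randn(N, 1);                         % test input
X = fft(x);
y = abs(X(1:nbins)).^2;                  % one-sided magnitude-squared spectrum
w = A' * y;                              % band outputs (assumed closed form)

% Interior bins appear twice in the two-sided spectrum, end bins once,
% so the band outputs sum to half the two-sided spectral energy.
half_total = sum(abs(X).^2) / 2;         % equals N*sum(x.^2)/2 by Parseval
```

With this weighting, sum(w) matches half_total, so the total input energy can be recovered from the features regardless of how many bands are used.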
If we are going to truncate the DCT from $K$ down to $D$ coefficients, we model it as a matrix multiplication $\mathbf{z}_t = \mathbf{A}'\mathbf{z}$, where $\mathbf{A}$ is the $K$-by-$D$ truncation matrix A=eye(K,D), along with the Gaussian reduction in Section 4.4, i.e. we use software/module_lin_gauss.m. Thus, let z be the DCT output above. To truncate, we continue with:
    A=eye(K,D);
    [zt,jout]=module_lin_gauss(z,jout,A);

Alternatively, we can build the matrix A to do both the DCT and the truncation. Thus, let u be the output of the log operation above. We continue with:
    A=idct(eye(K,D));
    [z,jout]=module_lin_gauss(u,jout,A);

At this point, the last ($(D+1)$-th) element of z is the energy statistic. It makes sense to take the log of that:
    [z,jout]=module_log(z,jout,D+1);

If the information content in the energy statistic is not wanted, it can be handled like the zero-frequency DCT bin above. Thus, in summary, if a truncated DCT is wanted (we don't recommend it, for reasons we will explain in Section 10.1), then use:
    [y,jout]=module_dftmsq(x,0);
    [w,jout]=module_mel_XXX(y,jout,...);
    [u,jout]=module_log(w,jout);
    A=idct(eye(K,D));
    [z,jout]=module_lin_gauss(u,jout,A);
    [z,jout]=module_log(z,jout,D+1);
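To see why A=idct(eye(K,D)) combines the DCT and the truncation, the matrix can be built explicitly and compared with transforming first and truncating afterwards. This is a minimal sketch, assuming that the linear module forms A'*u and that the DCT uses the orthonormal DCT-II convention; the explicit matrix C, the sizes K and D, and the random input are illustrative, and the energy statistic appended by the module is not reproduced here.

```matlab
% Sketch: one K-by-D matrix that performs the DCT and keeps D coefficients.
% Assumes the linear module forms A'*u; C stands in for dct/idct.
K = 10;                                  % hypothetical number of log band outputs
D = 4;                                   % hypothetical number of coefficients kept
[n, k] = meshgrid(0:K-1, 0:K-1);
C = sqrt(2/K) * cos(pi * (2*n + 1) .* k / (2*K));
C(1, :) = C(1, :) / sqrt(2);             % orthonormal DCT-II, so the inverse DCT is C'

T = eye(K, D);                           % K-by-D truncation matrix
A = C' * T;                              % inverse DCT applied to the columns of eye(K,D)

u  = randn(K, 1);                        % stand-in for the log band outputs
z  = C * u;                              % full DCT
zt = A' * u;                             % DCT and truncation in one multiplication

err_trunc = norm(T' * z - z(1:D));       % T' picks out the first D entries
err_comb  = norm(zt - z(1:D));           % the combined matrix gives the same result
```

Both errors are at rounding level, so under these assumptions the two-step route (dct followed by eye(K,D)) and the combined matrix yield the same D coefficients.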