Cepstral Analysis

Cepstral analysis is a common way to extract meaningful information about the spectrum. The most widely used method is the MEL frequency cepstral coefficients (MFCC) [26], which are generally computed in the following steps:

- Discrete Fourier transform (DFT) followed by calculating the magnitude or squared magnitude of the bins to produce a raw spectral vector.
- Energy binning by inner product of the raw spectral vector with a bank of positive-valued spectral band functions.
- Log (taking the logarithm of the binned band energies).
- (optional) Discrete cosine transform (DCT), with optional truncation (discarding some of the highest-frequency coefficients).
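The four steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the document's MATLAB modules: the helper names (`dft_msq`, `triangular_mel_bands`, `dct_ortho`, `mfcc`) are hypothetical, the MEL mapping 2595·log10(1 + f/700) is one common convention assumed here, the bands are triangular, no end-bin halving is applied, and the DCT output is not truncated.

```python
import cmath
import math


def dft_msq(x):
    """Step 1: squared DFT magnitudes |X_k|^2 for bins k = 0..N/2 (N even)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def hz_to_mel(f):
    # Assumed MEL mapping (a common convention, not necessarily the modules' own).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_mel_bands(n_fft, fs, n_bands):
    """Hypothetical triangular band functions with MEL-spaced edges, in DFT-bin units."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(fs / 2.0)
    # n_bands + 2 edges equally spaced on the MEL scale from 0 Hz to fs/2.
    edges = [mel_to_hz(top * i / (n_bands + 1)) * n_fft / fs for i in range(n_bands + 2)]
    bands = []
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))  # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))  # falling edge
            else:
                row.append(0.0)
        bands.append(row)
    return bands


def dct_ortho(u):
    """Step 4: orthonormal DCT-II of u (all coefficients kept)."""
    n = len(u)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(u[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out


def mfcc(x, fs, n_bands):
    y = dft_msq(x)                                                  # step 1: raw spectral vector
    bands = triangular_mel_bands(len(x), fs, n_bands)
    w = [sum(wt * e for wt, e in zip(band, y)) for band in bands]  # step 2: energy binning
    u = [math.log(max(e, 1e-12)) for e in w]                       # step 3: log, floored to avoid log(0)
    return dct_ortho(u)                                             # step 4: DCT, no truncation


signal = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
print(mfcc(signal, 8000, 8))
```

The floor inside the log is an added safeguard for empty bands; it is not part of the steps listed above.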

Rather than truncating the DCT coefficients, we believe it is better to use fewer but wider spectral band functions and then keep all the DCT coefficients. This produces "tighter" features from an information point of view, as we will see in Section 10.1. The first (zero-frequency) DCT coefficient is often left out in practice. Because leaving it out disturbs the energy statistic (Section 3.2.2), we prefer to keep it. If the practitioner wishes to ignore the information in the zero-frequency DCT coefficient, this can be done by assigning it a fixed prior distribution in the PDF estimation step, which has the same effect in the classifier as eliminating it altogether.
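Why a fixed prior on the zero-frequency coefficient acts like removing it: if every class model scores that coefficient with the same class-independent density, that term cancels in the log-likelihood ratio between classes. The sketch below assumes diagonal Gaussian class models with made-up means and a hypothetical `c0_prior` parameter; it illustrates the cancellation and is not the document's PDF estimator.

```python
import math


def gauss_logpdf(v, mean, var):
    """Log-density of a scalar Gaussian N(mean, var) at v."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)


def class_loglik(z, means, var, c0_prior=None):
    """Diagonal-Gaussian log-likelihood of feature vector z under one class.

    If c0_prior = (mean, var) is given, z[0] is scored with that fixed,
    class-independent density instead of the class's own model.
    """
    total = 0.0
    for i, v in enumerate(z):
        if i == 0 and c0_prior is not None:
            total += gauss_logpdf(v, *c0_prior)
        else:
            total += gauss_logpdf(v, means[i], var)
    return total


def llr(z, means_a, means_b, var, c0_prior=None):
    """Log-likelihood ratio of class A versus class B."""
    return class_loglik(z, means_a, var, c0_prior) - class_loglik(z, means_b, var, c0_prior)


# Hypothetical class means and a feature vector with a large zero-frequency term.
z = [5.0, 0.3, -1.2]
means_a = [1.0, 0.0, -1.0]
means_b = [-2.0, 0.5, 0.0]

with_fixed_prior = llr(z, means_a, means_b, 1.0, c0_prior=(0.0, 10.0))
with_c0_removed = llr(z[1:], means_a[1:], means_b[1:], 1.0)
print(with_fixed_prior, with_c0_removed)  # both are 0.675 up to floating-point rounding
```

Changing the fixed prior's mean or variance shifts both class log-likelihoods by the same amount, so the decision is unaffected.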

The factor with the most influence on the resulting features is the number, shape, and spacing of the spectral band functions. The cepstral features can be produced with the following chain, in which every step except the final DCT is a module call (assume `x` is the input data vector):

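```matlab
[y,jout]=module_dftmsq(x,0);          % DFT and squared magnitude
[w,jout]=module_mel_XXX(y,jout,...);  % energy binning with the MEL band functions
[u,jout]=module_log(w,jout);          % log of the binned band energies
z=dct(u);                             % DCT; jout is left unchanged
```

These four lines correspond one-to-one with the four steps listed above. Note that the DCT step requires no modification of the J-function, since it is an invertible transformation whose Jacobian determinant has magnitude 1.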
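A quick numerical check of the unit-Jacobian claim, assuming the orthonormal DCT-II (to our understanding, the scaling MATLAB's `dct` uses). If $C C^\top = I$, then $\det(C)^2 = \det(C C^\top) = 1$, so $\log|\det C| = 0$ and the J-function needs no correction. The helper names `dct_matrix` and `is_orthonormal` are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix C (n x n), so that dct(u) = C u."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return rows


def is_orthonormal(m, tol=1e-10):
    """True if m m^T equals the identity to within tol."""
    n = len(m)
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(m[i], m[j]))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


# C C^T = I implies det(C)^2 = 1, so the DCT contributes log|det C| = 0 to log J.
for n in (4, 13, 32):
    assert is_orthonormal(dct_matrix(n))
```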

The spacing and shape of the spectral band functions are controlled by two variables, `fs` and `type`. `fs` is the assumed sample rate, used to set up the spacing and center frequencies of the MEL band functions. The variable `type` selects whether the bands are MEL or linearly spaced, and whether they are triangular or Hanning-shaped, as follows:

```matlab
% type | shape      | spacing
% -----------------------------
%  0   | hanning    | MEL
%  1   | triangular | MEL
%  2   | hanning    | linear
%  3   | triangular | linear
```

Note that the end-bin values are set to "half", so the band functions sum to 1/2 at the zero-frequency and Nyquist bins and to 1 at all other bins. This preserves the total energy of the DFT in the features: each interior bin of the half spectrum stands for a conjugate-symmetric pair of DFT bins, while the zero-frequency and Nyquist bins have no twin.
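The effect of halving the end bins can be checked with Parseval's theorem. For a real signal of even length $N$, weighting half-spectrum bins $k = 1,\dots,N/2-1$ by 1 and the zero-frequency and Nyquist bins by 1/2 gives exactly half the full DFT energy, i.e. $\tfrac{N}{2}\sum_t x_t^2$ for an unscaled DFT. The sketch uses hypothetical linearly spaced triangular bands (in the spirit of `type` 3, but not the module's actual band layout), centered every `step` bins so that they sum to 1 before the end bins are halved.

```python
import cmath
import math


def dft_msq_half(x):
    """|X_k|^2 for k = 0..N/2 of a real, even-length signal x (unscaled DFT)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def linear_triangular_bands(n_bins, step):
    """Hypothetical linearly spaced triangular bands centered every `step` bins.

    Requires (n_bins - 1) % step == 0 so the triangles sum to 1 at every bin;
    the zero-frequency and Nyquist bins are then halved.
    """
    assert (n_bins - 1) % step == 0
    bands = [
        [max(0.0, 1.0 - abs(k - c) / step) for k in range(n_bins)]
        for c in range(0, n_bins, step)
    ]
    for band in bands:
        band[0] *= 0.5
        band[-1] *= 0.5
    return bands


def total_band_energy(x, step):
    """Sum over all bands of the inner product <band, |X|^2>."""
    y = dft_msq_half(x)
    bands = linear_triangular_bands(len(y), step)
    return sum(sum(w * e for w, e in zip(band, y)) for band in bands)


x = [math.sin(0.7 * t) + 0.3 * t for t in range(16)]
n = len(x)
# Parseval: the full unscaled DFT has energy N * sum(x^2); the halved-end bands capture half.
print(total_band_energy(x, 2), n / 2 * sum(v * v for v in x))
```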

If we are going to truncate the DCT from $K$ down to $D$ coefficients, we model the truncation as a matrix multiplication $z_t = A^\top z$, where $A$ is the $K \times D$ truncation matrix `A=eye(K,D)`, and apply it with the Gaussian reduction of Section 4.4, i.e. using `software/module_lin_gauss.m`. Thus, let `z` be the DCT output above. To truncate, we continue with:

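```matlab
A=eye(K,D);                            % K-by-D truncation matrix
[zt,jout]=module_lin_gauss(z,jout,A);  % truncate to D coefficients (Section 4.4)
```

Alternatively, we can build a single matrix that performs the DCT and the truncation in one step, and apply it directly to the log band energies `u`: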

```matlab
A=idct(eye(K,D));                      % columns are the first D DCT basis vectors
[z,jout]=module_lin_gauss(u,jout,A);
```

At this point, the last, $(D+1)$-th, element of `z` is the energy statistic. It makes sense to take its logarithm:

```matlab
[z,jout]=module_log(z,jout,D+1);       % take the log of the energy statistic (element D+1)
```

If the information in the energy statistic is not wanted, it can be handled like the zero-frequency DCT coefficient above, by assigning it a fixed prior. In summary, if a truncated DCT is wanted (we don't recommend it, for reasons we will explain in Section 10.1), use:

```matlab
[y,jout]=module_dftmsq(x,0);
[w,jout]=module_mel_XXX(y,jout,...);
[u,jout]=module_log(w,jout);
A=idct(eye(K,D));
[z,jout]=module_lin_gauss(u,jout,A);
[z,jout]=module_log(z,jout,D+1);
```
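The two truncation routes, `A=eye(K,D)` applied to the DCT output and `A=idct(eye(K,D))` applied to `u`, should yield the same first $D$ coefficients. The sketch assumes, as the `D+1` indexing suggests, that `module_lin_gauss` projects its input as $A^\top x$. It reimplements only that projection in plain Python (the Gaussian reduction and energy statistic of Section 4.4 are not modeled), with `eye` and `idct_columns` as hypothetical stand-ins for MATLAB's `eye` and column-wise `idct`.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix C (n x n), so that dct(u) = C u."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return rows


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def eye(k, d):
    """Stand-in for MATLAB eye(K,D): K x D with ones on the main diagonal."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(k)]


def idct_columns(m):
    """Stand-in for MATLAB idct(M): inverse orthonormal DCT of each column."""
    c_t = transpose(dct_matrix(len(m)))  # C^T is the inverse of C
    return transpose([matvec(c_t, col) for col in transpose(m)])


def truncate_two_ways(u, d):
    """Return (route 1, route 2), assuming the projection step computes A^T x."""
    k = len(u)
    z = matvec(dct_matrix(k), u)                            # z = dct(u)
    route1 = matvec(transpose(eye(k, d)), z)                # A = eye(K,D) applied to z
    route2 = matvec(transpose(idct_columns(eye(k, d))), u)  # A = idct(eye(K,D)) applied to u
    return route1, route2


r1, r2 = truncate_two_ways([0.2, 1.5, -0.7, 2.2, 0.9, -1.1], 3)
print(r1)
print(r2)
```

Folding the DCT into $A$ saves one pass and lets the reduction module see the log band energies directly.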