We extracted features using mel-frequency cepstral coefficient (MFCC)
analysis with Hanning-weighted windows overlapped by 2/3.
For the TIMIT data, which
is sampled at 16 kHz, we first downsampled the data to 12 kHz,
then used 288-sample windows (24 milliseconds).
For the office sounds data, which is sampled at 32 kHz,
we used 288-sample windows (18 milliseconds).
For both data sets, we used 24 Hanning-shaped mel bands (including the
zero and Nyquist bands) and no DCT truncation, producing
a 24-dimensional feature vector.
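
The pipeline described above can be sketched roughly as follows, using the TIMIT settings (12 kHz, 288-sample windows). This is a minimal illustration, not the original implementation. The function names are hypothetical. The mel formula (2595 log10(1 + f/700)), the placement of band centers evenly on the mel scale from 0 Hz to Nyquist, the raised-cosine band shape with half-width equal to the center spacing, and the unnormalized DCT-II are all assumptions, since the text does not specify them.

```python
import numpy as np

def frame_signal(x, win_len=288):
    """Split x into Hanning-weighted frames with 2/3 overlap."""
    hop = win_len // 3  # 2/3 overlap: each frame advances by one third of a window
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] for i in range(n_frames)])
    return frames * np.hanning(win_len)

def hz_to_mel(f):
    # Assumed mel mapping; the text does not say which variant was used.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def hann_mel_filterbank(n_bands, n_bins, sr):
    """Raised-cosine (Hanning-shaped) bands, centers evenly spaced in mel
    from 0 Hz to Nyquist inclusive (assumed reading of the text)."""
    bin_mel = hz_to_mel(np.linspace(0.0, sr / 2.0, n_bins))
    centers = np.linspace(0.0, hz_to_mel(sr / 2.0), n_bands)
    width = centers[1] - centers[0]
    d = (bin_mel[None, :] - centers[:, None]) / width
    return np.where(np.abs(d) < 1.0, 0.5 * (1.0 + np.cos(np.pi * d)), 0.0)

def mfcc(x, sr, win_len=288, n_bands=24):
    """Return an (n_frames, n_bands) array of cepstral features."""
    frames = frame_signal(x, win_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    fb = hann_mel_filterbank(n_bands, power.shape[1], sr)
    log_energy = np.log(power @ fb.T + 1e-10)  # small floor avoids log(0)
    # Unnormalized DCT-II with all n_bands coefficients kept (no truncation).
    k = np.arange(n_bands)[:, None]
    n = np.arange(n_bands)[None, :]
    dct = np.cos(np.pi * k * (n + 0.5) / n_bands)
    return log_energy @ dct.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(12000)  # one second of noise at 12 kHz
    print(mfcc(x, sr=12000).shape)
```

At 12 kHz, a 288-sample window spans 24 milliseconds and the 96-sample hop corresponds to 8 milliseconds, so one second of audio yields 123 frames of 24 coefficients each.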