To illustrate the effect of
feature augmentation, we chose two data sets
with different amounts of dynamic information:
- TIMIT diphthongs. This data set consisted of three diphthongs (phonemes with
time-varying formants) from the TIMIT corpus.
We extracted examples of the phonemes AY, EY, and OW.
An example of AY is shown in Figure 15.1 (left).
The total number of examples was 3196 for ``AY'',
3030 for ``EY'', and 2858 for ``OW''. We joined
all available utterances of the phonemes from both the training
and testing subsets, then divided them into two sets
for 2-fold holdout.
- Office sounds. The Office Sounds database 
contains twenty-four signal classes of 102 samples each
created by dropping common objects or operating office tools
such as scissors or staplers. All time series are 16128 samples long
(approximately 1/2 second at a sampling rate of 32000 Hz).
We chose three classes with abrupt temporal character:
penny, quart, skit.
An example of ``penny'' is shown in Figure 15.1 (right).
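
The pooling and 2-fold holdout split described for the diphthong data can be sketched as follows. This is a minimal illustration, not the procedure actually used: the text does not say how the two sets were formed, so the sketch assumes a seeded random split carried out separately for each class. The function name `two_fold_split` and the integer IDs standing in for utterances are hypothetical; only the class sizes come from the text.

```python
import random

def two_fold_split(items, seed=0):
    """Shuffle items and split them into two nearly equal halves."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Class sizes quoted in the text; integer IDs stand in for real utterances.
class_sizes = {"AY": 3196, "EY": 3030, "OW": 2858}

folds = {}
for label, n in class_sizes.items():
    pooled = range(n)  # pooled training + testing utterances (placeholder IDs)
    folds[label] = two_fold_split(pooled, seed=0)  # assumed per-class random split

# 2-fold holdout: train on one half, test on the other, then swap roles.
for train_idx, test_idx in ((0, 1), (1, 0)):
    n_train = sum(len(folds[c][train_idx]) for c in class_sizes)
    n_test = sum(len(folds[c][test_idx]) for c in class_sizes)
    print(f"train on fold {train_idx}: {n_train}, test on fold {test_idx}: {n_test}")
```

Splitting each class separately keeps the class proportions equal in both sets, which is one reasonable choice when each set serves in turn as training and test data.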
Figure 15.1. Sample spectrograms. Left: diphthong (AY).
Right: office sound (penny). Note the gradually changing spectral content
of the diphthong ``AY'' in contrast to the abrupt character of ``penny''.
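
The contrast between gradual and abrupt temporal character can be reproduced on toy signals. The sketch below is only an illustration under stated assumptions: both signals are synthetic stand-ins (a rising tone and a delayed, decaying tone), not TIMIT or Office Sounds recordings, and the frame-wise naive DFT with non-overlapping 64-sample frames is a simplification chosen for self-containment, not the spectrogram settings used for the figure. All names (`frame_spectra`, `gradual`, `abrupt`) are hypothetical.

```python
import cmath
import math

def frame_spectra(signal, frame_len=64):
    """Magnitude spectra of non-overlapping frames, computed with a naive DFT."""
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            mags.append(abs(acc))
        spectra.append(mags)
    return spectra

length = 1024

# Synthetic "gradual" signal: a tone whose frequency rises steadily.
gradual = []
phase = 0.0
for n in range(length):
    freq = 0.05 + 0.20 * n / length  # instantaneous frequency, cycles per sample
    phase += 2 * math.pi * freq
    gradual.append(math.sin(phase))

# Synthetic "abrupt" signal: silence, then a suddenly starting, decaying tone.
onset = length // 2
abrupt = [0.0] * onset + [
    math.exp(-n / 100.0) * math.sin(2 * math.pi * 0.15 * n)
    for n in range(length - onset)
]

# Strongest frequency bin per frame for the gradual signal.
gradual_peaks = [max(range(len(m)), key=m.__getitem__)
                 for m in frame_spectra(gradual)]
# Total spectral energy per frame for the abrupt signal.
abrupt_energy = [sum(v * v for v in m) for m in frame_spectra(abrupt)]

print("peak bin per frame (gradual):", gradual_peaks)
print("energy per frame (abrupt):", [round(e, 1) for e in abrupt_energy])
```

In the gradual signal the strongest bin drifts upward from frame to frame, whereas in the abrupt signal the energy is zero until the onset frame and then appears all at once.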