To illustrate the effect of
feature augmentation, we chose two data sets
with different amounts of dynamic information.
- Diphthongs.
This data set consisted of three diphthongs (phonemes with
time-varying formants) from the TIMIT corpus [80].
We extracted examples of the phonemes AY, EY, and OW.
An example of AY is shown in Figure 15.1 (left).
The number of samples was 3196 for “AY”,
3030 for “EY”, and 2858 for “OW”. We pooled
all available utterances of the phonemes from both the training
and testing subsets, then divided them into two sets
for 2-fold holdout.
- Office sounds. The Office Sounds database [81]
contains twenty-four signal classes of 102 samples each,
created by dropping common objects or operating office tools
such as scissors or staplers. All time series are 16128 samples long
(approximately 1/2 second at a sampling rate of 32000 Hz).
We chose three classes with abrupt temporal character:
penny, quart, skit.
An example of “penny” is shown in Figure 15.1 (right).
Figure 15.1:
Sample spectrograms. Left: diphthong (AY).
Right: office sounds (penny). Note the gradually changing spectral content
of the diphthong “AY” in contrast to the abrupt character of “penny”.
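The 2-fold holdout split described for the diphthong data can be sketched as follows. This is only an illustration, not the procedure actually used: the text does not say how the utterances were divided, so the random shuffle, the fixed seed, the per-class split, and the helper name `two_fold_split` are all assumptions. The class counts and the Office Sounds record length are taken from the text.

```python
import random


def two_fold_split(n_items, seed=0):
    """Hypothetical helper: shuffle indices 0..n_items-1 and cut them in half."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded shuffle, an assumption
    half = n_items // 2
    return indices[:half], indices[half:]


# Class sizes as reported in the text.
class_counts = {"AY": 3196, "EY": 3030, "OW": 2858}

# Assumption: each class is split separately so both folds keep the
# same class proportions.
folds = {label: two_fold_split(count) for label, count in class_counts.items()}

for label, (fold_a, fold_b) in folds.items():
    # In 2-fold holdout, train on fold_a and test on fold_b, then swap roles.
    print(label, len(fold_a), len(fold_b))

# Duration of one Office Sounds record: 16128 samples at 32000 Hz.
duration_s = 16128 / 32000
print(f"{duration_s:.3f} s")
```

Because all three class counts are even, each fold receives exactly half of every class under this scheme. The duration works out to 0.504 s, which is consistent with the stated record length of roughly half a second.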