Data sets

To illustrate the effect of feature augmentation, we chose two data sets with different amounts of dynamic information.
  1. Diphthongs. This data set consisted of three diphthongs (phonemes with time-varying formants) from the TIMIT corpus [80]. We extracted examples of the phonemes AY, EY, and OW. An example of AY is shown in Figure 15.1 (left). The total number of examples was 3196 for “AY”, 3030 for “EY”, and 2858 for “OW”. We pooled all available utterances of these phonemes from both the training and testing subsets of the corpus, then divided them into two sets for 2-fold holdout.
  2. Office sounds. The Office Sounds database [81] contains twenty-four signal classes of 102 examples each, created by dropping common objects or operating office tools such as scissors or staplers. All time series are 16128 samples long (about 1/2 second at a sampling rate of 32000 Hz). We chose three classes with abrupt temporal character: penny, quart, skit. An example of “penny” is shown in Figure 15.1 (right).
Figure 15.1: Sample spectrograms. Left: diphthong (AY). Right: office sounds (penny). Note the gradually changing spectral content of the diphthong “AY” in contrast to the abrupt character of “penny”.
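The 2-fold holdout split described for the diphthong data can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the examples were assigned to the two sets, so the sketch assumes a seeded random shuffle within each class and assumes that each half serves once for training and once for testing. The function names `two_fold_split` and `two_fold_rounds` are hypothetical; only the per-class counts come from the text.

```python
import random

# Per-class example counts as reported in the text.
CLASS_COUNTS = {"AY": 3196, "EY": 3030, "OW": 2858}


def two_fold_split(n_examples, seed=0):
    """Shuffle indices 0..n_examples-1 and cut them into two near-equal halves.

    Assumption: a seeded random shuffle; the original assignment is not specified.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    half = n_examples // 2
    return indices[:half], indices[half:]


def two_fold_rounds(counts, seed=0):
    """Yield (train, test) dicts mapping class label -> index list, one per round.

    Assumption: the two halves swap roles between the two rounds.
    """
    halves = {label: two_fold_split(n, seed) for label, n in counts.items()}
    for train_id, test_id in ((0, 1), (1, 0)):
        train = {label: h[train_id] for label, h in halves.items()}
        test = {label: h[test_id] for label, h in halves.items()}
        yield train, test


if __name__ == "__main__":
    for round_no, (train, test) in enumerate(two_fold_rounds(CLASS_COUNTS), 1):
        sizes = {label: (len(train[label]), len(test[label])) for label in train}
        print(f"round {round_no}: (train, test) sizes per class = {sizes}")
```

Splitting within each class keeps the class proportions the same in both sets, which is one reasonable choice when the class sizes differ as they do here.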