What are the different topologies of a CSM implementation?

The topology best suited to an application depends on the type of data (time-series, image, spectrogram, text, ...). Five distinct topologies for CSM come to mind:
  1. General topology, where the input data is an arbitrary vector ${\bf x}$. If the vector is a time-series, one must distinguish between two types of time-series data vectors: circular and non-circular. Any time one starts signal processing with an FFT, there is a tacit assumption that the data is circularly stationary, so that a circular rotation is possible without causing discontinuities or changing the spectral information. When a window function (such as Hanning) is used, this distinction disappears. The circular assumption results in more efficient processing, but may not be appropriate for some types of data.

  2. Overlapped window topology. For analysis of large time-series, processing all the data at once may be impractical, and the exact location or length of events in the data stream is not known in advance. It is then advantageous to segment the data into overlapping time windows and extract features from each segment. This is true for analysis of human speech or acoustic events. In segmented processing, we strongly recommend a special type of segmentation called “hanning-3”, which is described in Section 12.1.3. CSM allows one to define several parallel competing branches, each using a different window length and/or feature type, and hanning-3 is suited to this kind of analysis. A tacit assumption of circular stationarity of the individual segments is used.

  3. MR-HMM topology. For some applications, data may be very irregular and span wide ranges of time and frequency resolutions. An example would be the analysis of music or short-duration acoustic events. In this case, a set of fixed overlapping analysis windows might not sufficiently capture the content of the data (even if there is a wide range of analysis window lengths). Applying a window function could also destroy some characteristics of the data that might be important, such as exponential attack/decay or rectangular envelopes, which are present in many types of acoustic events. For this situation, the multi-resolution HMM (MR-HMM) is suitable. In the MR-HMM, data is segmented on-the-fly, allowing short and long time spans, and narrow and broad spectral features, to be used simultaneously for a data event. In fact, the accurate positioning of segments makes it unnecessary to use windowing functions such as Hanning weighting. For the MR-HMM, circular stationarity is not assumed for the segments. The MR-HMM is described in Chapter 14.

  4. Neural Network topology (PBN). A neural network architecture might be appropriate for some applications, such as image analysis or analysis of spectrograms. The projected belief network (PBN) is the term we use for CSM applied to a neural network architecture. Since a neural network is divided into layers, it is perfectly suited for analysis by the chain-rule (Section 2.2.4). There are two key ideas present in a neural network architecture that differentiate it from standard CSM:
    1. All feature extraction is learned, not assumed. For example, a MEL-Cepstrum is a fixed feature extraction approach that is not learned. In a neural network architecture, the raw spectral data would instead be passed through an arbitrary linear transformation, either dense or convolutional, whose weights are learned.
    2. All layers start with a linear transformation and end with the application of an activation function, which is a non-linear function applied element-wise. Some flexibility does exist; for example, one may start a layer with a generalized element-wise non-linearity [2].
    The PBN is described in Chapter 16.

  5. Combined CSM-PBN. A combined CSM-PBN architecture would first extract fixed features, such as MEL-Cepstrum features, and then proceed with a neural network architecture. The entire combined CSM-PBN chain would conform to the CSM paradigm: it would be reversible and have a J-function.
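
The circular assumption in topology 1 and the overlapped segmentation in topology 2 can be illustrated with a short NumPy sketch. This is not taken from any CSM software: the function names `magnitude_spectrum` and `overlapped_segments`, the segment length, the hop size, and the test signal are all assumptions made for illustration. The window is a plain Hann window with 50% overlap, not the hanning-3 segmentation of Section 12.1.3, which is not reproduced here.

```python
import numpy as np


def magnitude_spectrum(x):
    """Return the magnitude of the DFT of a 1-D vector."""
    return np.abs(np.fft.fft(x))


def overlapped_segments(x, seg_len, hop):
    """Split x into Hann-weighted segments of length seg_len.

    Consecutive segments start hop samples apart. Trailing samples that
    do not fill a whole segment are dropped. (Hypothetical helper: a
    generic overlapped segmentation, not the hanning-3 scheme.)
    """
    n_seg = 1 + (len(x) - seg_len) // hop
    window = np.hanning(seg_len)
    return np.stack(
        [window * x[i * hop : i * hop + seg_len] for i in range(n_seg)]
    )


# Arbitrary test vector (an assumption; any 1-D signal would do).
rng = np.random.default_rng(0)
x = rng.standard_normal(256)

# Topology 1: a circular rotation only changes the phase of the DFT,
# so the magnitude spectrum is the same before and after the rotation.
rotated = np.roll(x, 37)
same_magnitude = np.allclose(magnitude_spectrum(x), magnitude_spectrum(rotated))

# Topology 2: segments of 64 samples with 50% overlap (hop of 32).
segments = overlapped_segments(x, seg_len=64, hop=32)

print(same_magnitude)
print(segments.shape)
```

Each row of `segments` could then be passed to its own feature extraction. In a multi-branch arrangement, one would call `overlapped_segments` once per branch with a different `seg_len`.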