What is the Class-Specific Method (CSM)?

Most classifiers and machine learning approaches begin with some form of signal processing or feature extraction. The class-specific method (CSM) is a new approach to machine learning, signal processing, and feature extraction, made possible by probability density function (PDF) projection. Other approaches can be called ``one-way'' approaches because the unprocessed (raw) input data becomes irrelevant once the features are extracted. In the class-specific method, a return path to the raw data is always present. The return path takes two forms: (a) a log-likelihood correction, called the ``J-function'', and (b) a method to reconstruct the raw data from the features.
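To make the return path concrete, consider the special case of an invertible feature map, where the J-function reduces to the log-Jacobian of the familiar change-of-variables rule. The following Python sketch (the linear map and variable names are illustrative assumptions, not from the text) verifies numerically that the feature-domain log-likelihood plus the correction term recovers the input-domain log-likelihood:

\begin{verbatim}
# Minimal sketch (illustrative, not the book's method): for an
# invertible feature map z = T(x), the J-function reduces to the
# log-Jacobian of the change-of-variables rule:
#   log p_x(x) = log p_z(z) + log|det dT/dx|.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))     # assumed invertible linear map z = A x
x = rng.standard_normal(3)
z = A @ x

# Input-domain model: x ~ N(0, I); implied feature model: z ~ N(0, A A^T)
log_px_true = multivariate_normal(np.zeros(3)).logpdf(x)
log_pz = multivariate_normal(np.zeros(3), A @ A.T).logpdf(z)

J = np.log(abs(np.linalg.det(A)))   # log-likelihood correction (J-function)
print(log_px_true, log_pz + J)      # the two values agree
\end{verbatim}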

I like to compare CSM to the physical concept of reciprocity. Ideal processes that obey reciprocity tend to be lossless. The conversion of sound to electrical impulses in a microphone can be reversed: electrical input can be converted to sound. A loudspeaker can act as a microphone and vice versa. There are limits to the analogy because in CSM, dimension is reduced and information is lost, so the original input cannot be reconstructed exactly. However, CSM can be coupled with the principle of maximum entropy to provide a type of optimality to the reconstruction (Chapter 3). In certain cases, data reconstruction using CSM yields the conditional mean estimate, an estimate of the input data with certain statistical optimality properties (Section 16.2). Also, it has been shown that applying CSM to estimate the input data distribution maximizes the information retained in the features (Section 3.4).

Figure 1.1 illustrates CSM. Input data ${\bf x}$ is presented to two or more signal processing chains. In each chain, a series of signal processing steps is applied and, in parallel, a log-likelihood correction (J-function) term is computed. Each stage in the chain adds a term to the accumulated J-function, so the total J-function at the chain output is the sum of the corrections made at the individual steps. Note that the log-likelihood functions of two output features ${\bf z}_1$ and ${\bf z}_2$ can be compared once the correction terms are added. In other words, $\log p({\bf z}_1) + J_1$ can be compared to $\log p({\bf z}_2) + J_2$ in forming statistical tests. In fact, these are two different estimates of the log-likelihood of the input data: $\log \hat{p}_1({\bf x})=\log p({\bf z}_1) + J_1$ and $\log \hat{p}_2({\bf x})=\log p({\bf z}_2) + J_2$. This is called probability density function projection (PDF projection) because the feature distribution is projected back to the input data domain. Furthermore, estimates of the input data, $\hat{{\bf x}}_1$ and $\hat{{\bf x}}_2$, can be created by propagating the features ${\bf z}_1$ and ${\bf z}_2$ back through their respective reconstruction chains, which are generative models with closed-form likelihood functions and are inverse forms of the feature extraction chains. These capabilities of CSM have far-reaching implications in statistical signal processing and machine learning, as we will explain in the following sections and chapters.
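The modular chain structure of Figure 1.1 can be sketched in a few lines of code. The sketch below is a hedged illustration: it assumes invertible linear stages and standard-normal feature-domain models, so each per-stage correction is a log-Jacobian and the reconstruction chain is the exact inverse (in general, CSM chains reduce dimension and reconstruction is only approximate). It shows two chains accumulating their J-functions stage by stage, producing projected log-likelihoods that are directly comparable in the input domain:

\begin{verbatim}
# Minimal sketch of the chain structure in Figure 1.1 (illustrative
# assumptions: invertible linear stages, standard-normal feature models).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
x = rng.standard_normal(4)

def chain(v, mats):
    """Run a chain of linear stages; each stage adds its own term
    (here a log-Jacobian) to the accumulated J-function."""
    J = 0.0
    for A in mats:
        v = A @ v
        J += np.log(abs(np.linalg.det(A)))
    return v, J

mats1 = [rng.standard_normal((4, 4)) for _ in range(2)]   # chain 1
mats2 = [rng.standard_normal((4, 4)) for _ in range(3)]   # chain 2
z1, J1 = chain(x, mats1)
z2, J2 = chain(x, mats2)

# Hypothetical feature-domain PDFs p(z1), p(z2); with the corrections
# added, the two projected log-likelihoods live in the same (x) domain
# and can be compared in a statistical test.
log_p1_hat = multivariate_normal(np.zeros(4)).logpdf(z1) + J1
log_p2_hat = multivariate_normal(np.zeros(4)).logpdf(z2) + J2
print(log_p1_hat, log_p2_hat)

# Reconstruction chain: push z1 back through the inverses of the stages.
x_hat = z1
for A in reversed(mats1):
    x_hat = np.linalg.solve(A, x_hat)
print(np.allclose(x_hat, x))        # True: stages are invertible here
\end{verbatim}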

\begin{figure}[htb]
\centering
\includegraphics[width=5.5in, clip]{csm_modular.eps}
\caption{Illustration of the modular design of class-specific signal processing chains.}
\end{figure}