- Data-adaptive features. Using CSM, one can define parallel signal-processing
branches, eliminating the need to “put all the eggs in one basket”, as is often done
when committing to a fixed FFT size, model order, or feature type. Instead of being
constrained by a single choice of features, CSM effectively allows the
data to choose the most suitable path to the output.
- Dimension reduction. Through the use of parallel signal-processing branches,
more information can be brought to bear on the problem without increasing the feature dimension.
- Information maximization. Using CSM maximizes the information content of the features
without the need to specify a task or target function.
The concept of information maximization is explained in Section 3.4.
- Reversibility. Just as reversible physical processes are the most efficient,
CSM maintains a closer link to the input data, making it less likely that
critical information is lost in feature extraction.
CSM provides a “return path” to the input data. By reconstructing
the input data, the most appropriate feature set can be chosen.
Alternatively, by providing a likelihood function referenced to the input data,
statistical tests can be performed across feature sets, making it
possible to select features by likelihood comparison.
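The idea of comparing feature sets through a likelihood referenced to the input data can be illustrated with a minimal toy sketch. This is not the CSM machinery itself; it only assumes invertible scalar feature maps, so the change-of-variables rule log p_x(x) = log p_z(f(x)) + log|f'(x)| puts every candidate feature's fitted density on the common input-data scale, where the likelihoods are directly comparable:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy positive-valued data: log-normal, so a log feature should fit best.
x = rng.lognormal(mean=0.0, sigma=0.5, size=2000)

def input_loglik(x, f, log_abs_fprime):
    """Log-likelihood referenced to the input data for an invertible
    feature map f: fit a Gaussian to z = f(x), then map the density
    back via log p_x(x) = log p_z(f(x)) + log|f'(x)|."""
    z = f(x)
    mu, sigma = z.mean(), z.std()
    log_pz = -0.5 * np.log(2 * np.pi * sigma**2) - (z - mu) ** 2 / (2 * sigma**2)
    return np.sum(log_pz + log_abs_fprime(x))

# Two candidate feature branches, scored on the same input-data scale.
ll_log = input_loglik(x, np.log, lambda x: -np.log(x))             # f(x) = log x
ll_id = input_loglik(x, lambda x: x, lambda x: np.zeros_like(x))   # f(x) = x

best = "log" if ll_log > ll_id else "identity"
```

Because the data are log-normal, the Gaussian model in the log-feature space matches the true density exactly, so `ll_log` exceeds `ll_id`; without the Jacobian term, the two likelihoods would live in different spaces and the comparison would be meaningless.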
- When CSM is applied to a neural network architecture,
it results in a projected belief network (PBN). A PBN is simultaneously
a generative and a discriminative network, and can attain the best properties of both types of networks.
In fact, it has been demonstrated that a PBN trained with both discriminative and generative
cost functions can compete with fully discriminative classifiers [1].