Class-Specific Feature Classifier

When the same reference hypothesis is used for each class, $H_{0,k}=H_0$, (12.4) simplifies to

$\displaystyle \arg \max_k \left\{ \frac{p({\bf z}\vert H_{k})}{p({\bf z}\vert H_{0})} \; p(H_k)\right\}$ (2.8)

This is the class-specific feature (CSF) classifier [9,10], which preceded the discovery of PDF projection [6]. It does not require PDF projection because it compares likelihood ratios rather than likelihood functions; the validity of the classifier rests on the idea of sufficient statistics.
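As a concrete illustration of decision rule (2.8), the sketch below implements the argmax over scaled likelihood ratios for univariate Gaussian class models and a common Gaussian reference hypothesis $H_0$. All distributions, parameters, and priors here are hypothetical, chosen only to make the rule executable; they are not from the source.

```python
import numpy as np

# Hypothetical 1-D Gaussian class models (mean, std) for H_k, and a
# common Gaussian reference hypothesis H_0; values are illustrative only.
class_params = [(-2.0, 1.0), (0.0, 1.0), (3.0, 1.5)]
ref_params = (0.0, 3.0)
priors = np.array([0.3, 0.3, 0.4])  # p(H_k)

def log_gauss(z, mu, sigma):
    """Log of a univariate Gaussian PDF evaluated at z."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((z - mu) / sigma) ** 2

def csf_classify(z):
    """Rule (2.8): argmax_k of log[p(z|H_k)/p(z|H_0)] + log p(H_k)."""
    log_ref = log_gauss(z, *ref_params)  # common to all k
    scores = [log_gauss(z, mu, s) - log_ref + np.log(p)
              for (mu, s), p in zip(class_params, priors)]
    return int(np.argmax(scores))
```

Note that because $H_0$ is the same for every class, the term $\log p({\bf z}\vert H_0)$ is constant over $k$ and drops out of the argmax; it is kept in the sketch only to mirror the likelihood-ratio form of (2.8).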

Some authors have proposed using ad-hoc reference hypotheses with (2.8), such as collections of data classes (the “all classes” hypothesis) [11,12]. In this case, $p({\bf z}\vert H_0)$ can be estimated along with the class PDFs $p({\bf z}\vert H_k)$. These methods are technically not using PDF projection, and definitely not maximum entropy PDF projection. This is not to say that these methods do not have merit. On the contrary, using the “union class” hypothesis (union of all classes) as a reference hypothesis may have advantages in classifying among a set of similar classes based on total KL divergence [13] (thanks to Steven Kay for this observation). The method has proven effective, for example, in text classification [14]. For additional discussion of choosing $H_0$ for maximum entropy, see Section 3.2.3.
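The “union class” reference hypothesis can be sketched by taking $p({\bf z}\vert H_0)$ to be the prior-weighted mixture of the class PDFs. The Gaussian class models and priors below are hypothetical, chosen purely for illustration.

```python
import numpy as np

# Hypothetical two-class setup (mean, std); values are illustrative only.
class_params = [(-2.0, 1.0), (2.0, 1.0)]
priors = np.array([0.5, 0.5])

def gauss(z, mu, sigma):
    """Univariate Gaussian PDF evaluated at z."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def union_ref_pdf(z):
    """p(z|H0) as the prior-weighted union ("all classes") mixture."""
    return sum(p * gauss(z, mu, s) for (mu, s), p in zip(class_params, priors))

def classify(z):
    """Rule (2.8) with the union-class reference hypothesis."""
    scores = [p * gauss(z, mu, s) / union_ref_pdf(z)
              for (mu, s), p in zip(class_params, priors)]
    return int(np.argmax(scores))
```

Since the union reference PDF is common to all classes, it again cancels in the argmax, so the hard decisions match a standard MAP classifier; the cited advantages of this choice concern the KL-divergence structure it induces, not the decision boundary itself.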