Interpretation of the J-function

The J-function is a measure of how well the features describe the input data. Mathematically, the J-function is equal to the manifold density (2.5) [5]. The manifold density comes into play when we generate samples from $G({\bf x};H_0,T,g)$. For maximum entropy PDF projection (see Chapter 3), the manifold density is the uniform density (see Section 3.3). The manifold is the set of input data values that map to a given feature value. So, if the features are very descriptive and accurately capture the peculiarities of the given data sample, the set of possible input data values shrinks, increasing the value of the manifold density.
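As an illustration (a minimal sketch, not an example taken from the text), consider the assumed case of a white Gaussian reference hypothesis $H_0$ with an energy feature $z=T({\bf x})=\sum_i x_i^2$. The manifold for a given $z$ is a sphere, the J-function has a closed form, and sampling from $G({\bf x};H_0,T,g)$ amounts to drawing $z$ from $g(z)$ and placing ${\bf x}$ uniformly on that sphere. The dimension, feature PDF $g$, and all function names below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2, gamma

# Assumed example: H0 is N-dimensional white Gaussian noise, x ~ N(0, I_N),
# and the feature is the total energy z = T(x) = sum(x_i^2).  Under H0,
# z ~ chi-squared(N), so J(x) = p(x; H0) / p(z; H0) is available in closed form.

N = 16                                   # raw data dimension (assumed)
rng = np.random.default_rng(0)

def log_J(x):
    """log J(x) = log p(x; H0) - log p(T(x); H0) for the energy feature."""
    z = np.sum(x**2)
    log_px = -0.5 * N * np.log(2 * np.pi) - 0.5 * z   # N(0, I_N) log-density
    log_pz = chi2.logpdf(z, df=N)                      # chi-squared(N) log-density
    return log_px - log_pz

def sample_G(g_sampler, n_samples):
    """Sample from G(x; H0, T, g): draw z from g(z), then place x uniformly
    on the manifold T(x) = z (a sphere of radius sqrt(z))."""
    xs = []
    for _ in range(n_samples):
        z = g_sampler()                   # feature value drawn from g(z)
        d = rng.standard_normal(N)        # random direction ...
        d /= np.linalg.norm(d)            # ... uniform on the unit sphere
        xs.append(np.sqrt(z) * d)         # point on the manifold sum(x^2) = z
    return np.array(xs)

# Illustrative choice of g(z): a gamma PDF with mean 10.
samples = sample_G(lambda: gamma.rvs(a=5.0, scale=2.0, random_state=rng), 1000)
print("mean energy of samples:", np.mean(np.sum(samples**2, axis=1)))
```

Because the Gaussian reference hypothesis is spherically symmetric, the uniform placement on the sphere is the maximum entropy choice mentioned above; a smaller sphere (a more tightly constrained set of raw data values) carries a higher manifold density.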

Another interpretation, based on asymptotic maximum likelihood (ML) theory, starts by assuming that there exists some parametric model $p({\bf x};\mbox{\boldmath $\theta$})$ such that the features are maximum likelihood estimates of the parameters, ${\bf z}=\hat{\mbox{\boldmath $\theta$}}$. The J-function for ML, given in (2.27), is dominated by its numerator, the likelihood function of the data evaluated at $\hat{\mbox{\boldmath $\theta$}}$. Thus, the J-function can be interpreted as a quantitative measure of how well the parametric model describes the raw data: the better the features, the better this notional parametric model. Interestingly, because the J-function can be computed without actually implementing the ML estimator, this information is available without needing to know the parametric form or to maximize it! Naturally, there are situations where this information is detrimental to classification, specifically when the data contains nuisance information or interference. There are work-arounds that significantly improve classification performance, for example the class-specific feature mixture ([20], Section II.B).
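The following sketch illustrates this interpretation numerically in a case where everything is exact rather than asymptotic; it is an assumed example, not a reproduction of (2.27). Take $H_0$ to be i.i.d. Gaussian data with known variance and let the feature be the sample mean, which is the ML estimate of the mean. The exact J-function then equals the likelihood of the data evaluated at the MLE, up to an additive constant (in the log domain) that does not depend on the data. The data length, noise level, and variable names are assumptions.

```python
import numpy as np
from scipy.stats import norm

# Assumed example: H0: x_1..x_N i.i.d. N(0, sigma^2) with sigma known;
# the feature z = mean(x) is the ML estimate of the mean.  Under H0,
# z ~ N(0, sigma^2/N), so J(x) = p(x; H0) / p(z; H0) is exact and can be
# compared with the likelihood of the data evaluated at the MLE.

N, sigma = 32, 1.5                       # assumed data length and noise level
rng = np.random.default_rng(1)
x = rng.normal(0.0, sigma, size=N)       # one raw data sample
z = x.mean()                             # feature = MLE of the mean

# Exact log J-function: log p(x; H0) - log p(z; H0).
log_px_H0 = norm.logpdf(x, 0.0, sigma).sum()
log_pz_H0 = norm.logpdf(z, 0.0, sigma / np.sqrt(N))
log_J_exact = log_px_H0 - log_pz_H0

# Likelihood of the data at the MLE, log p(x; theta_hat).  In this Gaussian
# case, log J equals this term plus 0.5*log(2*pi*sigma^2/N), a constant that
# is negligible relative to the likelihood term as N grows.
log_px_mle = norm.logpdf(x, z, sigma).sum()
log_J_ml = log_px_mle + 0.5 * np.log(2 * np.pi * sigma**2 / N)

print(log_J_exact, log_J_ml)             # agree to machine precision
```

In this toy case the dominance of the likelihood term is visible directly: the correction term shrinks as the data record grows, so the J-function tracks how well the notional parametric model, evaluated at the feature value, explains the raw data.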
