## Interpretation of the J-function

The J-function is a measure of the ability of the
features to describe the input data.
Mathematically, the J-function is equal to
the manifold density (2.5) [5].
The manifold density comes into play when we generate
samples from the projected PDF. For maximum entropy
PDF projection (see Chapter 3), the manifold density is
the uniform density (see Section 3.3).
The manifold is the set of input data values
that map to a given feature value. So, if the features
are very descriptive and accurately capture
the peculiarities of the given data sample,
the set of possible input data values shrinks,
increasing the value of the manifold density.
Another interpretation, based on asymptotic
maximum likelihood (ML) theory, starts by
assuming that there exists some parametric model
such that the features are the
maximum likelihood estimates of its parameters.
The J-function for ML, given in (2.27), is dominated by
the numerator, which is the likelihood function of the data
evaluated at the ML estimate.
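To make this concrete, a hedged sketch of the standard asymptotic-ML identity follows (the exact form of (2.27) in the text may differ in notation). The ML estimate $\hat{\theta}$ of a $P$-dimensional parameter is asymptotically Gaussian with covariance $I^{-1}(\hat{\theta})$, where $I$ is the Fisher information matrix, so the feature PDF evaluated at the estimate is approximately $(2\pi)^{-P/2}|I(\hat{\theta})|^{1/2}$, giving

```latex
J(\mathbf{x}) \;\approx\;
\frac{p\!\left(\mathbf{x};\hat{\theta}\right)}
     {(2\pi)^{-P/2}\,\bigl|I(\hat{\theta})\bigr|^{1/2}} .
```

The numerator is the likelihood evaluated at the ML estimate and grows exponentially with the data size, while the denominator varies only polynomially, which is why the numerator dominates.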
Thus, the J-function can be interpreted as
a quantitative measure of how well the parametric
model can describe the raw data. The better the features, the better this
notional parametric model. Interestingly, because the J-function
can be computed without
actually implementing the ML estimator, this information
is available without knowing the parametric form
or maximizing the likelihood!
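As a minimal numerical sketch of this point, consider an energy feature z = ||x||^2 under an iid standard-normal reference hypothesis H0 (this feature/reference pair is an illustrative choice, not an example taken from the text). Here z is exactly chi-squared with N degrees of freedom, so log J = log p(x; H0) - log p(z; H0) is available in closed form, with no ML estimator in sight:

```python
import numpy as np
from scipy.stats import chi2

def log_j_energy(x):
    """Log J-function for the energy feature z = ||x||^2 under an
    iid N(0,1) reference hypothesis H0 (illustrative choice).
    log J = log p(x; H0) - log p(z; H0), where z is chi-squared
    with N degrees of freedom under H0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z = np.sum(x ** 2)
    # log-density of x under iid standard normal
    log_px = -0.5 * n * np.log(2.0 * np.pi) - 0.5 * z
    # exact log-density of the feature under H0
    log_pz = chi2.logpdf(z, df=n)
    return log_px - log_pz

# Two inputs with the same energy lie on the same manifold (a sphere)
# and therefore share the same J-function value.
x1 = np.array([1.0, 2.0, 2.0])   # ||x1||^2 = 9
x2 = np.array([3.0, 0.0, 0.0])   # ||x2||^2 = 9
```

Note that log J depends on x only through z: along the manifold ||x||^2 = z the density is uniform, consistent with the maximum-entropy interpretation above.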
Naturally, there are situations where this information
is detrimental to classification, specifically when the data contains
nuisance information or interference.
There are work-arounds that significantly
improve classification performance, for example the
class-specific feature mixture ([20], Section II.B).

Baggenstoss
2017-05-19