Another interpretation, based on asymptotic maximum likelihood (ML) theory, starts by assuming that there exists some parametric model for which the features are the maximum likelihood estimates of the model parameters. The J-function for ML, given in (2.27), is dominated by the numerator, which is the likelihood function of the data evaluated at the ML parameter estimates. The J-function can therefore be interpreted as a quantitative measure of how well the parametric model describes the raw data: the better the features, the better this notional parametric model. Interestingly, because the J-function can be computed without actually implementing the ML estimator, this information is available without knowing the parametric form of the model or maximizing its likelihood. Naturally, there are situations where this information is detrimental to classification, specifically when the data contains nuisance information or interference. There are workarounds that significantly improve classification performance, for example the class-specific feature mixture (Section II.B of the cited reference).
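To make the dominant numerator term concrete, the following is a minimal sketch, assuming a Gaussian parametric model whose ML parameter estimates are the sample mean and variance (the full J-function in (2.27) has additional terms not shown here; the model choice and function name are illustrative assumptions, not part of the original text):

```python
import numpy as np

def gaussian_loglik_at_ml(x):
    """Log-likelihood of the data evaluated at the ML parameter
    estimates (sample mean and sample variance) for an assumed
    Gaussian model.  This illustrates only the dominant numerator
    term of the J-function discussed in the text."""
    x = np.asarray(x, dtype=float)
    n = x.size
    var_hat = x.var()  # ML estimate of the variance (1/n normalization)
    # Closed form: the quadratic term sums to n/2 at the ML estimates.
    return -0.5 * n * (np.log(2.0 * np.pi * var_hat) + 1.0)

# Data that the assumed model describes well scores a higher
# likelihood at the ML estimates than data it describes poorly.
rng = np.random.default_rng(0)
good = gaussian_loglik_at_ml(rng.normal(0.0, 1.0, 1000))
bad = gaussian_loglik_at_ml(rng.exponential(1.0, 1000) ** 3)
```

Note that no numerical maximization is performed: the likelihood is evaluated directly at the closed-form ML estimates, mirroring the point in the text that the J-function is available without implementing the ML estimator.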