### Optimality Conditions of the Theorem

Theorem 1 shows that, provided we know the PDF under some reference hypothesis $H_0$ at both the input and output of a transformation $z = T(x)$, then given an arbitrary PDF $g(z)$ defined on the feature space, we can immediately find a PDF $G(x)$ defined on the input space that generates $g(z)$. While it is interesting that $G(x)$ generates $g(z)$, there are an infinite number of PDFs that do so, and it is not yet clear that $G(x)$ is the best choice. However, suppose we would like to use this construction to approximate the PDF $p(x|H_1)$. Define

$$\hat{p}(x|H_1) \;=\; \frac{p(x|H_0)}{p(z|H_0)}\,\hat{g}(z), \qquad z = T(x), \tag{2.6}$$

where $\hat{g}(z)$ is an estimate of the feature PDF $p(z|H_1)$.

From Theorem 1, we see that (2.6) is a PDF. Furthermore, if $z$ is a sufficient statistic for $H_1$ vs. $H_0$, the likelihood ratios at the input and output of $T(\cdot)$ are equal, so

$$p(x|H_1) \;=\; \frac{p(x|H_0)}{p(z|H_0)}\,p(z|H_1); \tag{2.7}$$

thus, as $\hat{g}(z) \to p(z|H_1)$, we have

$$\hat{p}(x|H_1) \;\to\; p(x|H_1),$$

or that the PDF estimate $\hat{p}(x|H_1)$ approaches the true PDF $p(x|H_1)$.
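The sufficiency identity (2.7) can be checked numerically. The following sketch uses a hypothetical example not taken from the text: $H_0$ with $x_i$ i.i.d. $N(0,1)$ and $H_1$ with $x_i$ i.i.d. $N(\mu,1)$, for which $z = \sum_i x_i$ is a sufficient statistic, so the likelihood ratio computed from $x$ equals the one computed from $z$.

```python
import numpy as np

# Hypothetical example: H0: x_i iid N(0,1), H1: x_i iid N(mu,1).
# z = sum(x) is a sufficient statistic for this pair, so the
# likelihood-ratio identity (2.7) should hold exactly.

N, mu = 4, 0.7
rng = np.random.default_rng(1)
x = rng.standard_normal(N)   # a point drawn under H0 (any point works)
z = x.sum()                  # z ~ N(0, N) under H0, N(N*mu, N) under H1

def log_norm(v, mean, var):
    # log of the normal PDF with given mean and variance
    return -0.5 * np.log(2 * np.pi * var) - (v - mean) ** 2 / (2 * var)

lr_x = np.sum(log_norm(x, mu, 1.0) - log_norm(x, 0.0, 1.0))  # log p(x|H1)/p(x|H0)
lr_z = log_norm(z, N * mu, N) - log_norm(z, 0.0, N)          # log p(z|H1)/p(z|H0)
print(abs(lr_x - lr_z))   # ~0: the two ratios agree, so (2.6) recovers p(x|H1)
```

Because the two likelihood ratios coincide, substituting the true $p(z|H_1)$ into (2.6) reproduces $p(x|H_1)$ exactly in this case.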

This result gives us guidance about how to choose not just the features $z = T(x)$, but also the reference hypothesis $H_0$. In short, in order to make the projected PDF as good an approximation to $p(x|H_1)$ as possible, choose $T(\cdot)$ and $H_0$ so that $z$ is an approximately sufficient statistic for the likelihood ratio test between $H_0$ and $H_1$. Note, however, that while the sufficiency condition is required for optimality, it is not necessary for (2.6) to be a valid PDF. Here we can see the importance of the theorem, which provides a means of creating PDF approximations on the high-dimensional input data space, without dimensionality penalty, using low-dimensional feature PDFs. It also provides a way to optimize the approximation by controlling both the reference hypothesis and the features themselves. This is the remarkable property of Theorem 1: the resulting function remains a PDF whether or not the features are sufficient statistics. Since sufficiency means optimality of the classifier, approximate sufficiency means a good PDF approximation and approximate optimality.
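The claim that (2.6) remains a valid PDF for an arbitrary feature PDF can also be checked by Monte Carlo: writing the integral of the projected PDF as an expectation under $H_0$ gives $\int G(x)\,dx = E_{x \sim H_0}[\,g(T(x))/p(z|H_0)\,] = 1$. The sketch below uses a hypothetical setup not taken from the text: $x \in \mathbb{R}^5$ i.i.d. standard normal under $H_0$, feature $z = \|x\|^2$ (chi-squared with 5 degrees of freedom under $H_0$), and an arbitrarily chosen Gamma PDF for $g(z)$ that is certainly not sufficient for any particular $H_1$.

```python
import numpy as np
from math import gamma as gammafn

# Hypothetical setup: x in R^5, iid N(0,1) under H0; z = T(x) = ||x||^2
# is chi-squared with N=5 degrees of freedom under H0.
N = 5
rng = np.random.default_rng(0)

def p_z_H0(z):
    # chi-squared PDF with N degrees of freedom (distribution of z under H0)
    return z ** (N / 2 - 1) * np.exp(-z / 2) / (gammafn(N / 2) * 2 ** (N / 2))

def g(z):
    # an arbitrary feature PDF, Gamma(shape=3, scale=2), chosen for illustration
    return z ** 2 * np.exp(-z / 2) / (gammafn(3) * 2 ** 3)

# Integral of the projected PDF G(x) = p(x|H0)/p(z|H0) * g(z) equals
# E_{x~H0}[ g(T(x)) / p_z(T(x)|H0) ], which Theorem 1 says is exactly 1.
x = rng.standard_normal((1_000_000, N))
z = np.sum(x ** 2, axis=1)
estimate = np.mean(g(z) / p_z_H0(z))
print(estimate)   # close to 1, confirming (2.6) integrates to one
```

The estimate is close to 1 even though $g(z)$ was chosen with no reference to sufficiency, illustrating that validity of the projected PDF does not depend on the features being sufficient.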

Baggenstoss 2017-05-19