## Floating Reference Hypothesis

One way to alleviate potential numerical issues in evaluating and/or , is with a floating reference hypothesis. Under certain conditions, the reference hypothesis may be changed on the fly" to alleviate numerical or PDF approximation issues with the calculation of the J-function. Loosely stated, is allowed to vary within a set of hypotheses that can be optimally distinguished using the feature . For example, as long as contains the sample variance, which is a sufficient statistic for distinguishing two zero-mean Gaussian densities that differ only in variance, then may be the zero-mean Gaussian distribution with arbitrary variance - as the assumed variance varies, the numerator and denominator terms of the J-function vary over a wide range, but the ratio (the J-function) remains constant. Therefore, for numerical convenience, we can set the variance to the value at or near the maximum of both the numerator and denominator.

We first define the region of sufficiency (ROS) of feature set transformation , denoted by , as a set of hypotheses such that every pair of hypotheses obeys the relationship

Thus, a likelihood ratio between any two hypotheses in can be optimally constructed either in the raw data or in the feature space without loss of information. An ROS may be thought of as a family of PDFs traced out by the parameters of a PDF where is a sufficient statistic for the parameters. We can re-arrange the above equation as follows:

Thus, the J-function",

 (2.13)

is independent of as long as remains within ROS .

Defining the ROS should in no way be interpreted as a sufficiency requirement for . Every feature has a ROS, because at the very least, the projected PDF itself (2.2) serves a one hypothesis, and as another, for which the feature is sufficient. As long as the feature contains an energy statistic, the J-function is independent of scale parameters in .

Example 3   We re-visit example 1 with an eye to using a floating reference hypothesis. Let be the hypothesis that is a set of independent identically distributed Gaussian samples with mean and variance . We now show that is a sufficient statistic for , and an ROS for is the set of all PDFs traced out by . We have

 (2.14)

It is well known [15] that and are statistically independent, so they can be treated separately. Furthermore, under , is Gaussian with mean and variance , thus

Also, is a chi-square RV with degrees of freedom derived from a zero-mean Normal distribution with variance (See Section 16.1.2), thus

It may be verified, either by simulation, or by expanding and canceling terms, the contributions of and are exactly canceled in the J-function ratio

See software/test_mv2.m.

Because is independent of , it is possible to make both and a function of the data itself, changing them (floating) with each input sample. The most logical approach would be to set and . But, if is independent of , one may question why we would bother to do it. The reason is purely numerical. While this example is a trivial case, in general we do not have exact formulas for the PDFs, particularly the denominator . Therefore, our approach is to position within the ROS of to simultaneously maximize the numerator PDF and the denominator. By doing this, we are allowed to use PDF approximations such as the central limit theorem (CLT) (see software/test_mv2.m).

Example 4   We now expand upon the previous example by using a floating reference hypothesis and a CLT approximation for the denominator PDF. The feature was Gaussian, with mean and variance . Therefore the PDF obtained using the CLT is the same as the true PDF. But, for , we need to compute the mean and variance under . In particular, the expected value of is and the variance is (See Section 16.1.2). The theoretical Chi-square PDF and Gaussian PDF based on the CLT are plotted together for in Figure 2.3. There is close agreement near the central peak. And while not visible in the PDF plot (top panel), there are huge errors in the tail regions visible on the log-PDF plot (bottom panel). Using the CLT PDF estimate in place of the Chi-square distribution, we obtain a J-function estimate. Figure 2.4 shows the result of comparing the J-function using a CLT PDF estimate with the exact J-function. We used Gaussian data with variance and mean chosen at random (not corresponding to ). We used a floating reference hypothesis ( . The error is on the order 1e-3 for log-J function values ranging from -400 to 200! It is clear that using the floating reference hypothesis makes the approach feasible. See software/test_chisq_clt.m.

Since we position near or at the maximum of , we may ask whether there is a relationship to maximum likelihood (ML). We will explore the relationship of this method to asymptotic ML theory in a later section. To indicate the dependence of on , we adopt the notation . Thus,

 (2.15)

The existence of on the right side of the conditioning operator is admittedly a very bad use of notation, but is done for simplicity.

In many problems, the ROS is not easily found and we must be satisfied with an approximate ROS. In this case, there is a weak dependence of upon . This dependence is generally unpredictable unless, as we have suggested, is always chosen to maximize the numerator PDF. Then, the behavior of is somewhat predictable. By maximizing the numerator, the result is often a positive bias. This positive bias is most notable when there is a good match to the data - a desirable feature.

Another example of the use of the CLT is provided in section 5.2.4.

Baggenstoss 2017-05-19