Floating Reference Hypothesis

One way to alleviate potential numerical issues in evaluating $p({\bf z}\vert H_0)$ and/or $p({\bf x}\vert H_0)$ is with a floating reference hypothesis. Under certain conditions, the reference hypothesis $H_0$ may be changed “on the fly” to alleviate numerical or PDF approximation issues in the calculation of the J-function. Loosely stated, $H_0$ is allowed to vary within a set of hypotheses that can be optimally distinguished using the feature ${\bf z}$. For example, as long as ${\bf z}$ contains the sample variance, which is a sufficient statistic for distinguishing two zero-mean Gaussian densities that differ only in variance, then $H_0$ may be the zero-mean Gaussian distribution with arbitrary variance: as the assumed variance varies, the numerator and denominator terms of the J-function vary over a wide range, but their ratio (the J-function) remains constant. Therefore, for numerical convenience, we can set the variance to a value at or near the maximum of both the numerator and denominator.
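This invariance is easy to check numerically. Below is a minimal MATLAB sketch (a hypothetical illustration, not one of the scripts in the software/ directory) for zero-mean Gaussian data with the sample variance as the feature: the numerator and denominator log-PDFs both move with the assumed variance, but their difference, $\log J$, does not.

\begin{verbatim}
% Hypothetical check of J-function invariance to the assumed H0 variance.
% Feature: z = sample variance of zero-mean Gaussian data.
N = 64;
x = 1.7*randn(N,1);              % data variance need not match H0
z = sum(x.^2)/N;                 % sample variance (sufficient statistic)
for s2 = [0.5 1 2 8]             % candidate H0 variances
   lpx = -N/2*log(2*pi*s2) - sum(x.^2)/(2*s2);    % log p(x|H0)
   u   = N*z/s2;                 % chi-square with N degrees of freedom
   lpz = log(N/s2) + (N/2-1)*log(u) - u/2 ...
         - N/2*log(2) - gammaln(N/2);             % log p(z|H0)
   fprintf('s2=%5.2f  log p(x)=%9.3f  log p(z)=%8.3f  log J=%9.4f\n', ...
      s2, lpx, lpz, lpx-lpz);
end
\end{verbatim}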

We first define the region of sufficiency (ROS) of a feature transformation ${\bf z}=T({\bf x})$, denoted by ${\cal H}_z$, as a set of hypotheses such that every pair of hypotheses $H_{0a}, H_{0b}\in {\cal H}_z$ obeys the relationship

$\displaystyle {p_x({\bf x}\vert H_{0a})\over p_x({\bf x}\vert H_{0b})}=
{p_z({\bf z}\vert H_{0a})\over p_z({\bf z}\vert H_{0b})}.
$

Thus, a likelihood ratio between any two hypotheses in ${\cal H}_z$ can be optimally constructed either in the raw data space or in the feature space without loss of information. An ROS may be thought of as the family of PDFs traced out by the parameters of a parametric PDF for which ${\bf z}$ is a sufficient statistic. We can rearrange the above equation as follows:

$\displaystyle {p_x({\bf x}\vert H_{0a})\over p_z({\bf z}\vert H_{0a})}=
{p_x({\bf x}\vert H_{0b})\over p_z({\bf z}\vert H_{0b})}.
$

Thus, the “J-function”,

$\displaystyle J({\bf x};H_0,T) \stackrel{\mbox{\tiny$\Delta$}}{=}
{p_x({\bf x}\vert H_0) \over p_z(T({\bf x})\vert H_0)} =
{p_x({\bf x}\vert H_0) \over p_z({\bf z}\vert H_0)},$ (2.13)

is independent of $H_0$ as long as $H_0$ remains within ROS ${\cal H}_z$.

Defining the ROS should in no way be interpreted as a sufficiency requirement for ${\bf z}$. Every feature has an ROS because, at the very least, the projected PDF itself (2.2) serves as one hypothesis, and $H_0$ as another, for which the feature is sufficient. As long as the feature contains an energy statistic (see Section 3.2.2), the J-function is independent of scale parameters in $H_0$.

Example 3   We revisit Example 1 with an eye to using a floating reference hypothesis. Let $H_0(\mu, \sigma^2)$ be the hypothesis that ${\bf x}$ is a set of $N$ independent, identically distributed Gaussian samples with mean $\mu$ and variance $\sigma^2$. We now show that ${\bf z}$ is a sufficient statistic for $(\mu, \; \sigma^2)$, and that an ROS for ${\bf z}$ is the set of all PDFs traced out by $(\mu, \; \sigma^2)$. We have

$\displaystyle p({\bf x}\vert H_0(\mu,\sigma^2)) = (2\pi\sigma^2)^{-N/2} \; \exp\left\{
-\frac{1}{2\sigma^2} \sum_{n=1}^N (x_n-\mu)^2 \right\}.$ (2.14)

It is well known [15] that the sample mean $z_0=\hat{\mu}$ and the sample variance $z_1=\hat{\sigma^2}$ are statistically independent, so they can be treated separately. Furthermore, under $H_0$, $z_0$ is Gaussian with mean $\mu$ and variance $\sigma^2/N$, thus

$\displaystyle p(z_0 \vert H_0(\mu,\sigma^2)) = (2\pi\sigma^2/N)^{-1/2} \; \exp\left\{
-\frac{N}{2\sigma^2} \; (z_0-\mu)^2 \right\}.
$

Also, $\hat{\sigma^2}$ is a scaled chi-square RV: it is the sum of $N-1$ squares of independent zero-mean Normal samples with variance $\frac{\sigma^2}{N-1}$, so that $(N-1)z_1/\sigma^2$ is chi-square with $N-1$ degrees of freedom (see Section 17.1.2). Thus,

$\displaystyle p(z_1 \vert H_0(\mu,\sigma^2)) = \frac{N-1}{\sigma^2}
\; \frac{1}{2^{(N-1)/2}\,\Gamma\!\left(\frac{N-1}{2}\right)}
\; \left( \frac{(N-1)\, z_1}{\sigma^2} \right)^{(N-1)/2-1}
\; \exp\left\{ -{z_1 (N-1) \over 2 \sigma^2} \right\}.
$

It may be verified, either by simulation or by expanding and canceling terms, that the contributions of $\sigma^2$ and $\mu$ cancel exactly in the J-function ratio

$\displaystyle J({\bf x}; H_0,T) = {p({\bf x}\vert H_0(\mu,\sigma^2))\over
p(z_0 \vert H_0(\mu,\sigma^2)) \; p(z_1 \vert H_0(\mu,\sigma^2))}.$
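To carry out the cancellation explicitly, write the exponent of (2.14) in terms of the features using the decomposition

$\displaystyle \sum_{n=1}^N (x_n-\mu)^2 = (N-1)\,z_1 + N\,(z_0-\mu)^2.
$

The exponentials in the numerator and denominator are then identical, and the powers of $\sigma$ (both are $\sigma^{-N}$) also cancel, leaving a function of $z_1$ and $N$ alone:

$\displaystyle J({\bf x}; H_0,T) = (2\pi)^{-(N-1)/2}\, N^{-1/2}\,
\left({2 \over N-1}\right)^{(N-1)/2} \Gamma\!\left({N-1 \over 2}\right)\,
z_1^{-(N-3)/2}.
$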

See software/test_mv2.m.
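The following minimal sketch, in the spirit of software/test_mv2.m (a hypothetical re-creation, not the script itself), evaluates $\log J$ at several $(\mu,\sigma^2)$ within the ROS, including the data-dependent (floating) choice discussed next; every row prints the same value.

\begin{verbatim}
% Hypothetical sketch: log J is the same for any (mu, sigma2) in the ROS.
N  = 100;
x  = 3 + 2*randn(N,1);            % test data (need not match H0)
z0 = mean(x);  z1 = var(x);       % var() uses the 1/(N-1) normalization
for h = [0 1; 0 4; 2 1; z0 z1]'   % candidate (mu, sigma2), incl. floating
   mu = h(1);  s2 = h(2);
   lpx  = -N/2*log(2*pi*s2) - sum((x-mu).^2)/(2*s2);     % log p(x|H0)
   lpz0 = -0.5*log(2*pi*s2/N) - N*(z0-mu)^2/(2*s2);      % log p(z0|H0)
   u    = (N-1)*z1/s2;            % chi-square with N-1 degrees of freedom
   lpz1 = log((N-1)/s2) + ((N-1)/2-1)*log(u) - u/2 ...
          - (N-1)/2*log(2) - gammaln((N-1)/2);           % log p(z1|H0)
   fprintf('mu=%6.2f  s2=%6.2f  log J=%12.6f\n', mu, s2, lpx-lpz0-lpz1);
end
\end{verbatim}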

Because $J({\bf x}; H_0(\mu,\sigma^2),T)$ is independent of $\mu,\; \sigma^2$, it is possible to make both $\mu$ and $\sigma^2$ functions of the data itself, changing them (floating) with each input sample. The most logical approach is to set $\mu=z_0$ and $\sigma^2 = z_1$. But if $J({\bf x}; H_0(\mu,\sigma^2),T)$ is independent of $\mu,\; \sigma^2$, one may question why we would bother. The reason is purely numerical. While this example is a trivial case, in general we do not have exact formulas for the PDFs, particularly the denominator $p({\bf z}\vert H_0)$. Therefore, our approach is to position $H_0$ within the ROS of ${\bf z}$ so as to simultaneously maximize the numerator and denominator PDFs. By doing this, we are able to use PDF approximations such as the central limit theorem (CLT) (see software/test_mv2.m).

Example 4   We now expand upon the previous example by using a floating reference hypothesis and a CLT approximation for the denominator PDF. The feature $z_0$ is Gaussian, with mean $\mu$ and variance $\sigma^2/N$, so the PDF obtained using the CLT is the same as the true PDF. But for $z_1$, we need to compute the mean and variance under $H_0(\mu, \sigma^2)$. In particular, the expected value of $z_1$ is $\sigma^2$ and the variance is $2\sigma^4/(N-1)$ (see Section 17.1.2). The theoretical Chi-square PDF and the Gaussian PDF based on the CLT are plotted together for $N=100$ in Figure 2.3. There is close agreement near the central peak; however, while not visible in the PDF plot (top panel), there are huge errors in the tail regions, visible on the log-PDF plot (bottom panel). Using the CLT PDF estimate in place of the Chi-square distribution, we obtain a J-function estimate. Figure 2.4 compares the J-function using the CLT PDF estimate with the exact J-function. We used Gaussian data with variance and mean chosen at random (not corresponding to $H_0$), and a floating reference hypothesis ($\mu=z_0, \; \sigma^2=z_1$). The error is on the order of $10^{-3}$ for log-J function values ranging from -400 to 200! It is clear that the floating reference hypothesis makes the approach feasible. See software/test_chisq_clt.m.
Figure 2.3: Example of Gaussian CLT approximation (red dotted) with true Chi-square PDF (blue solid).
\includegraphics[scale=0.6, clip]{test_chisq_clt.eps}
Figure 2.4: Example of J-function estimation using the CLT approximation. The horizontal axis is the true log-J function and the vertical axis is the CLT approximation. Clearly, the CLT approximation is very bad when used with a fixed $H_0$, but very good when used with a floating reference hypothesis.
\includegraphics[width=4.0in,height=3.0in, clip]{test_chisq_clt_comp.eps}
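A minimal sketch of this comparison, in the spirit of software/test_chisq_clt.m (again a hypothetical re-creation, not the actual script), replaces the exact chi-square density of $z_1$ with its CLT approximation, both evaluated under the floating reference hypothesis $\mu=z_0$, $\sigma^2=z_1$:

\begin{verbatim}
% Hypothetical sketch: exact vs. CLT-based log J with a floating H0(z).
N = 100;
for k = 1:10
   x  = 4*randn + (1+3*rand)*randn(N,1);   % random mean and variance
   z0 = mean(x);  z1 = var(x);
   mu = z0;  s2 = z1;                      % floating reference hypothesis
   lpx  = -N/2*log(2*pi*s2) - sum((x-mu).^2)/(2*s2);
   lpz0 = -0.5*log(2*pi*s2/N);             % exp term is 0 since mu = z0
   u    = (N-1)*z1/s2;                     % exact chi-square density of z1
   lpz1 = log((N-1)/s2) + ((N-1)/2-1)*log(u) - u/2 ...
          - (N-1)/2*log(2) - gammaln((N-1)/2);
   v       = 2*s2^2/(N-1);                 % CLT: mean s2, variance v
   lpz1clt = -0.5*log(2*pi*v);             % exp term is 0 since s2 = z1
   fprintf('exact log J = %11.4f   CLT log J = %11.4f\n', ...
      lpx-lpz0-lpz1, lpx-lpz0-lpz1clt);
end
\end{verbatim}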

Since we position $H_0$ at or near the maximum of $p({\bf x}\vert H_0)$, we may ask whether there is a relationship to maximum likelihood (ML). We will explore the relationship of this method to asymptotic ML theory in a later section. To indicate the dependence of $H_0$ on ${\bf z}$, we adopt the notation $H_0({\bf z})$. Thus,

$\displaystyle G({\bf x}; H_0({\bf z}), T, g) = {p_x({\bf x}\vert H_0({\bf z})) \over
p_z({\bf z}\vert H_0({\bf z}))} \;
g({\bf z}) \;\;\;\; \mbox{ where } \;\;\; {\bf z}=T({\bf x}).$ (2.15)

The appearance of ${\bf z}$ on the right side of the conditioning operator $\vert$ is admittedly an abuse of notation, but it is done for simplicity.
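As a concrete illustration of (2.15), the sketch below evaluates $\log G$ with a CLT denominator under the floating reference hypothesis. The feature density $g({\bf z})$ is a hypothetical stand-in (independent Gaussians with made-up trained parameters mg and vg), not part of the actual software.

\begin{verbatim}
% Hypothetical sketch: log of the projected PDF (2.15), floating H0(z).
N  = 100;
x  = 1 + 2*randn(N,1);                    % test data
z0 = mean(x);  z1 = var(x);
s2 = z1;                                  % H0(z): mu = z0, sigma2 = z1
lpx  = -N/2*log(2*pi*s2) - (N-1)*z1/(2*s2);   % log p(x|H0(z))
lpz0 = -0.5*log(2*pi*s2/N);               % CLT PDF at its mean (mu = z0)
lpz1 = -0.5*log(2*pi*2*s2^2/(N-1));       % CLT PDF at its mean (s2 = z1)
% stand-in feature density g(z): independent Gaussians (made-up params)
mg = [1 4];  vg = [0.1 0.5];
lg = sum(-0.5*log(2*pi*vg) - ([z0 z1]-mg).^2 ./ (2*vg));
logG = lpx - lpz0 - lpz1 + lg;            % log G(x; H0(z), T, g)
fprintf('log G(x) = %.4f\n', logG);
\end{verbatim}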

In many problems, the ROS ${\cal H}_z$ is not easily found and we must be satisfied with an approximate ROS. In this case, there is a weak dependence of $J({\bf x};H_0,T)$ upon $H_0$. This dependence is generally unpredictable unless, as we have suggested, $H_0({\bf z})$ is always chosen to maximize the numerator PDF. Then, the behavior of $J({\bf x};H_0,T)$ is somewhat predictable. By maximizing the numerator, the result is often a positive bias. This positive bias is most notable when there is a good match to the data, which is a desirable feature.

Another example of the use of the CLT is provided in Section 5.2.4.