Maximum likelihood and PDF Projection

We have stated that when we use a floating reference hypothesis, we prefer to choose the reference hypothesis such that the numerator of the J-function is a maximum. Since we often have parametric forms for the PDFs, this amounts to finding the ML estimates of the parameters. If there are a small number of features, all of the features are ML estimators for parameters of the PDF, and there is sufficient data to guarantee that the ML estimators fall in the asymptotic (large data) region, then the floating hypothesis approach is equivalent to an existing approach based on classical asymptotic ML theory. We will derive the well-known asymptotic result using (2.15).

Two well-known results from asymptotic theory [17] are the following.

  1. Subject to certain regularity conditions (large amount of data, a PDF that depends on a finite number of parameters and is differentiable, etc.), the PDF $p_x({\bf x};\mbox{\boldmath$\theta$}^*)$ may be approximated by

    $\displaystyle p_x({\bf x};\mbox{\boldmath$\theta$}^*) \simeq p_x({\bf x}; \hat{\mbox{\boldmath$\theta$}}) \; \exp\left\{ -\frac{1}{2} (\mbox{\boldmath$\theta$}^*- \hat{\mbox{\boldmath$\theta$}})' \, {\bf I}(\hat{\mbox{\boldmath$\theta$}}) \, (\mbox{\boldmath$\theta$}^*- \hat{\mbox{\boldmath$\theta$}})\right\},$ (2.18)

    where $\mbox{\boldmath$\theta$}^*$ is an arbitrary value of the parameter, $\hat{\mbox{\boldmath$\theta$}}$ is the maximum likelihood estimate (MLE) of $\mbox{\boldmath$\theta$}$, and ${\bf I}(\mbox{\boldmath$\theta$})$ is the Fisher information matrix (FIM) [17]. The components of the FIM for PDF parameters $\theta_{k},\theta_{l}$ are given by

    $\displaystyle {\bf I}_{\theta_k,\theta_l}(\mbox{\boldmath$\theta$}) = -{\bf E}\left(\frac{\partial^{2} \ln p_x({\bf x}; \mbox{\boldmath$\theta$})}{\partial\theta_{k}\,\partial\theta_{l}}\right). $

    The approximation is valid only for $\mbox{\boldmath$\theta$}^*$ in the vicinity of the MLE (and the true value).

  2. The MLE $\hat{\mbox{\boldmath$\theta$}}$ is approximately Gaussian with mean equal to the true value $\mbox{\boldmath$\theta$}$ and covariance equal to ${\bf I}^{-1}(\mbox{\boldmath$\theta$})$, or

    $\displaystyle p_\theta(\hat{\mbox{\boldmath$\theta$}};\mbox{\boldmath$\theta$}) \simeq (2\pi)^{-D/2} \left\vert {\bf I}(\hat{\mbox{\boldmath$\theta$}}) \right\vert^{\frac{1}{2}} \exp\left\{ -\frac{1}{2} (\mbox{\boldmath$\theta$}- \hat{\mbox{\boldmath$\theta$}})' \, {\bf I}(\hat{\mbox{\boldmath$\theta$}}) \, (\mbox{\boldmath$\theta$}- \hat{\mbox{\boldmath$\theta$}})\right\},$ (2.19)

    where $D$ is the dimension of $\mbox{\boldmath$\theta$}$. Note that we use $\hat{\mbox{\boldmath$\theta$}}$ in place of $\mbox{\boldmath$\theta$}$, which is unknown, when evaluating the FIM. This is allowed because ${\bf I}^{-1}(\mbox{\boldmath$\theta$})$ has only a weak dependence on $\mbox{\boldmath$\theta$}$. The approximation is valid only for $\mbox{\boldmath$\theta$}$ in the vicinity of the MLE. A numerical illustration of this result is sketched below.
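
The following minimal sketch illustrates result 2 numerically. It is only an illustration (not part of the software accompanying this text) and uses a hypothetical one-parameter example: the rate $\lambda$ of $N$ i.i.d. exponential samples, for which $\hat{\lambda}=1/\bar{x}$ and ${\bf I}(\lambda)=N/\lambda^2$.

\begin{verbatim}
# Monte Carlo illustration of asymptotic result 2 (illustrative sketch only):
# for N i.i.d. Exp(lambda) samples the MLE is lambda_hat = 1/xbar and the FIM
# is I(lambda) = N/lambda^2, so lambda_hat should be approximately Gaussian
# with variance I^{-1}(lambda) = lambda^2/N.
import numpy as np

rng = np.random.default_rng(0)
lam, N, trials = 2.0, 500, 20000

x = rng.exponential(scale=1.0/lam, size=(trials, N))   # independent data records
lam_hat = 1.0/x.mean(axis=1)                            # MLE for each record

print("empirical mean of MLE:", lam_hat.mean())         # close to lam
print("empirical var of MLE :", lam_hat.var())          # close to lam^2/N
print("CR bound lam^2/N     :", lam**2/N)
\end{verbatim}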

To apply equation (2.15), $\hat{\mbox{\boldmath$\theta$}}$ takes the place of ${\bf z}$ and $H_0({\bf z})$ is the hypothesis that $\hat{\mbox{\boldmath$\theta$}}$ is the true value of $\mbox{\boldmath$\theta$}$. We substitute (2.18) for $p_x({\bf x}\vert H_0({\bf z}))$ and (2.19) for $p_z({\bf z}\vert H_0({\bf z}))$. Under the stated conditions (evaluation at $\mbox{\boldmath$\theta$}^* = \mbox{\boldmath$\theta$}= \hat{\mbox{\boldmath$\theta$}}$), the exponential terms in approximations (2.18), (2.19) become 1. Using these approximations, we arrive at

$\displaystyle \hat{p}_x({\bf x}\vert H_1) = \left[ \frac{ p_x({\bf x}; \hat{\mbox{\boldmath$\theta$}}) }{ (2\pi)^{-D/2} \left\vert {\bf I}(\hat{\mbox{\boldmath$\theta$}}) \right\vert^{\frac{1}{2}} } \right] \; \hat{p}_\theta(\hat{\mbox{\boldmath$\theta$}}\vert H_1),$ (2.20)

which agrees with the PDF approximation from asymptotic theory [18], [19]. Equation (2.20) is very useful for integrating ML estimators into class-specific classifiers, and we will give examples of its use. The first term (in brackets) is the J-function.
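
For reference, written out with both exponential terms equal to one, the bracketed J-function term is simply

$\displaystyle \frac{p_x({\bf x}\vert H_0({\bf z}))}{p_z({\bf z}\vert H_0({\bf z}))} = \frac{p_x({\bf x}; \hat{\mbox{\boldmath$\theta$}})}{(2\pi)^{-D/2} \left\vert {\bf I}(\hat{\mbox{\boldmath$\theta$}}) \right\vert^{\frac{1}{2}}} = (2\pi)^{D/2} \left\vert {\bf I}(\hat{\mbox{\boldmath$\theta$}}) \right\vert^{-\frac{1}{2}} \, p_x({\bf x}; \hat{\mbox{\boldmath$\theta$}}).$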

To compare equations (2.15) and (2.20), we note that both carry an implied sufficiency requirement, for ${\bf z}$ and $\hat{\mbox{\boldmath$\theta$}}$, respectively. Specifically, $H_0({\bf z})$ must remain in the ROS of ${\bf z}$, while $\hat{\mbox{\boldmath$\theta$}}$ must be asymptotically sufficient for $\mbox{\boldmath$\theta$}$. However, (2.15) is more general: (2.20) is valid only when all of the features are ML estimators and only holds asymptotically for large data records, with the implication that $\hat{\mbox{\boldmath$\theta$}}$ tends to Gaussian, while (2.15) carries no such implication. This is particularly important in upstream processing, where there has not yet been significant data reduction and asymptotic results do not apply. Using (2.15), we can make simple adjustments to the reference hypothesis (such as controlling variance) to better match the data and avoid the PDF tails, as long as we are certain that we remain in the ROS of ${\bf z}$.

Example 5   We revisit examples 3 and 4, this time using the ML approach. Note that $\hat{\mu}$ and $\hat{\sigma}^2$ are the ML estimates of the mean and variance [15]. It is instructive to derive the CR bound for this problem (Section 17.5). Taking the log of (2.14),

$\displaystyle \log p({\bf x}; \mu,\sigma^2) = -\frac{N}{2} \log (2\pi\sigma^2) -\frac{1}{2\sigma^2} \sum_{i=1}^N (x_i-\mu)^2 .$ (2.21)

We require the first derivatives

$\displaystyle \frac{\partial}{\partial \mu} \log p({\bf x}; \mu,\sigma^2) = \frac{1}{\sigma^2} \sum_{i=1}^N (x_i-\mu),$

$\displaystyle \frac{\partial}{\partial \sigma^2} \log p({\bf x}; \mu,\sigma^2) = -\frac{N}{2\sigma^2} +\frac{1}{2\sigma^4} \sum_{i=1}^N (x_i-\mu)^2 .$

Taking second derivatives,

$\displaystyle \frac{\partial^2}{\partial \mu^2} \log p({\bf x}; \mu,\sigma^2) = -\frac{N}{\sigma^2},$

$\displaystyle \frac{\partial^2}{\partial \mu\,\partial \sigma^2} \log p({\bf x}; \mu,\sigma^2) = -\frac{1}{\sigma^4} \sum_{i=1}^N (x_i-\mu) ,$

$\displaystyle \frac{\partial^2}{\partial (\sigma^2)^2} \log p({\bf x}; \mu,\sigma^2) = \frac{N}{2\sigma^4} -\frac{1}{\sigma^6} \sum_{i=1}^N (x_i-\mu)^2 .$

The next step is to take the negative expectation $-{\cal E}(\;\cdot\;)$ of the above. Using ${\cal E}(x_i)=\mu$ and ${\cal E}\left[(x_i-\mu)^2\right]=\sigma^2$,

$\displaystyle I(\mu,\mu) = \frac{N}{\sigma^2},$

$\displaystyle I(\mu,\sigma^2) = 0,$

$\displaystyle I(\sigma^2,\sigma^2) = -\frac{N}{2\sigma^4} +\frac{1}{\sigma^6} N\sigma^2 =
\frac{N}{2\sigma^4}.$
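
As a quick check on this expectation step, the following symbolic sketch (illustrative only; symbol names are arbitrary) differentiates the log-density of a single observation and applies $-N{\cal E}(\cdot)$ using ${\cal E}(x)=\mu$ and ${\cal E}(x^2)=\mu^2+\sigma^2$.

\begin{verbatim}
# Symbolic check of the FIM entries (illustrative sketch, not part of the
# distributed software).  Differentiate the log-density of one observation
# x ~ N(mu, sigma2), then take -N*E(.) using E[x] = mu, E[x^2] = mu^2 + sigma2.
import sympy as sp

x = sp.Symbol('x', real=True)
mu = sp.Symbol('mu', real=True)
s2, N = sp.symbols('sigma2 N', positive=True)

logp = -sp.Rational(1, 2)*sp.log(2*sp.pi*s2) - (x - mu)**2/(2*s2)

def fim_entry(a, b):
    d2 = sp.expand(sp.diff(logp, a, b))                 # second partial derivative
    expected = d2.subs(x**2, mu**2 + s2).subs(x, mu)    # apply the moments of x
    return sp.simplify(-N*expected)                     # -N * E(d2)

print(fim_entry(mu, mu))   # N/sigma2
print(fim_entry(mu, s2))   # 0
print(fim_entry(s2, s2))   # N/(2*sigma2**2)
\end{verbatim}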

Finally, the FIM for this problem is given by

$\displaystyle {\bf I}(\hat{\mu}, \hat{\sigma}^2) =
\left[ \begin{array}{cc}
\frac{N}{\sigma^2} & 0\\
0 & \frac{N}{2\sigma^4}
\end{array} \right],
$

whose inverse is the CR bound

$\displaystyle {\bf C}(\hat{\mu}, \hat{\sigma}^2) =
\left[ \begin{array}{cc}
\frac{\sigma^2}{N} & 0\\
0 & \frac{2\sigma^4}{N}
\end{array} \right].
$

Note the close relationship to the CLT approach used in example 4. There is essentially no difference aside from the variance of $z_1$, which is $\frac{2\sigma^4}{N}$ in the CR bound analysis but $\frac{2\sigma^4}{N-1}$ in the CLT example. Whenever the ML approach can be used, it is, in fact, asymptotically the same as the CLT approach as $N$ becomes large.
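
The following short Monte Carlo sketch (again only an illustration with arbitrary parameter values, not the software accompanying this text) confirms that the spread of the ML estimates approaches the CR bound, and that the distinction between $\frac{2\sigma^4}{N}$ and $\frac{2\sigma^4}{N-1}$ is negligible for moderate $N$.

\begin{verbatim}
# Monte Carlo check of the CR bound for (mu_hat, sigma2_hat); illustrative
# sketch only, with arbitrary parameter choices.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, N, trials = 1.0, 2.0, 200, 50000

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, N))
mu_hat = x.mean(axis=1)                  # ML estimate of the mean
sigma2_hat = x.var(axis=1, ddof=0)       # ML estimate of the variance (1/N form)

print("var(mu_hat)    :", mu_hat.var(),     "  CR bound:", sigma2/N)
print("var(sigma2_hat):", sigma2_hat.var(), "  CR bound:", 2*sigma2**2/N,
      "  CLT value:", 2*sigma2**2/(N - 1))
\end{verbatim}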

We let the floating reference hypothesis be $H_0({\bf z}): \{\mu=\hat{\mu},\; \sigma^2=\hat{\sigma}^2\}$, or in other words that the true values of $\mu$ and $\sigma^2$ are equal to the ML estimates. We have

$\displaystyle \log p({\bf x}\vert H_0({\bf z})) = -\frac{N}{2}\log(2\pi \hat{\sigma}^2) -\frac{1}{2\hat{\sigma}^2} \sum_{i=1}^N ( x_i-\hat{\mu})^2 .$ (2.22)

Note that

$\displaystyle \sum_{i=1}^N ( x_i-\hat{\mu})^2 = N\hat{\sigma}^2,$

leaving

$\displaystyle \log p({\bf x}\vert H_0({\bf z})) = -\frac{N}{2}\log(2\pi \hat{\sigma}^2) -\frac{N}{2}.$ (2.23)

For $p({\bf z}\vert H_0({\bf z}))$, we have (see the denominator of equation 2.20) that

$\displaystyle \log p({\bf z}\vert H_0({\bf z})) = -\frac{D}{2}\log(2\pi) + \frac{1}{2} \log \left\vert {\bf I}(\hat{\mu}, \hat{\sigma}^2) \right\vert, $

where $D=2$.

We therefore have that

$\displaystyle \log p({\bf z}\vert H_0({\bf z})) = -\log(2\pi)-\frac{1}{2}\log\left\{ \frac{\hat{\sigma}^2}{N} \cdot \frac{2\hat{\sigma}^4}{N} \right\} = -\log(2\pi)-\frac{1}{2}\log\left(\frac{2\hat{\sigma}^6}{N^2} \right). $

We compared the J-function computed from the above equations with the J-function from the fixed reference hypothesis (example 1); the two were in close agreement. For details, see Figure 2.5 and software/test_mv_ml.m.
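
For illustration, a minimal sketch of the ML-approximation side of that comparison (with arbitrary data parameters; it is not the distributed software/test_mv_ml.m) combines equation (2.23) with the expression for $\log p({\bf z}\vert H_0({\bf z}))$ above.

\begin{verbatim}
# Log J-function of the ML approximation for the Gaussian mean/variance
# example (illustrative sketch; arbitrary data, not software/test_mv_ml.m).
import numpy as np

rng = np.random.default_rng(2)
N = 64
x = rng.normal(1.0, 1.5, size=N)     # one data record

mu_hat = x.mean()                    # ML estimate of the mean
sigma2_hat = x.var(ddof=0)           # ML estimate of the variance

# log p(x | H0(z)), equation (2.23)
log_px_H0 = -N/2*np.log(2*np.pi*sigma2_hat) - N/2

# log p(z | H0(z)) = -log(2*pi) - (1/2) log( 2*sigma2_hat^3 / N^2 )
log_pz_H0 = -np.log(2*np.pi) - 0.5*np.log(2*sigma2_hat**3/N**2)

log_J = log_px_H0 - log_pz_H0        # log of the J-function (bracketed term)
print("log J-function (ML approximation):", log_J)
\end{verbatim}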

Figure 2.5: Comparison of J-function from exact solution (example 1) with ML approximation.
\includegraphics[width=4.2in,height=3.9in, clip]{test_mv_ml.eps}

Another example of the use of the ML method is provided in section 5.2.8.