Maximum likelihood and PDF Projection

Two well-known results from asymptotic theory [17] are the following.

- Subject to certain regularity conditions (a large amount of data, a PDF that depends on a finite number of parameters and is differentiable, etc.), the PDF may be approximated by
$$p(\mathbf{x};\boldsymbol{\theta}) \simeq p(\mathbf{x};\hat{\boldsymbol{\theta}})\,\exp\left\{-\tfrac{1}{2}(\boldsymbol{\theta}-\hat{\boldsymbol{\theta}})^\top \mathbf{I}(\hat{\boldsymbol{\theta}})\,(\boldsymbol{\theta}-\hat{\boldsymbol{\theta}})\right\}, \qquad (2.18)$$
where $\boldsymbol{\theta}$ is an arbitrary value of the parameter, $\hat{\boldsymbol{\theta}}$ is the maximum likelihood estimate (MLE) of $\boldsymbol{\theta}$, and $\mathbf{I}(\boldsymbol{\theta})$ is the *Fisher information matrix* (FIM) [17]. The components of the FIM for PDF parameters are given by
$$\left[\mathbf{I}(\boldsymbol{\theta})\right]_{ij} = -\mathcal{E}\left\{\frac{\partial^2 \log p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i\,\partial\theta_j}\right\}.$$
The approximation (2.18) is valid in the vicinity of the MLE (and the true value).
- The MLE $\hat{\boldsymbol{\theta}}$ is approximately Gaussian with mean equal to the true value $\boldsymbol{\theta}$ and covariance equal to $\mathbf{I}^{-1}(\boldsymbol{\theta})$, or
$$p(\hat{\boldsymbol{\theta}};\boldsymbol{\theta}) \simeq (2\pi)^{-P/2}\,\big|\mathbf{I}(\hat{\boldsymbol{\theta}})\big|^{1/2}\,\exp\left\{-\tfrac{1}{2}(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})^\top \mathbf{I}(\hat{\boldsymbol{\theta}})\,(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})\right\}, \qquad (2.19)$$
where $P$ is the dimension of $\boldsymbol{\theta}$. Note that we use $\hat{\boldsymbol{\theta}}$ in evaluating the FIM in place of $\boldsymbol{\theta}$, which is unknown. This is allowed because $\mathbf{I}(\boldsymbol{\theta})$ has a weak dependence on $\boldsymbol{\theta}$. The approximation is valid only for $\boldsymbol{\theta}$ in the vicinity of the MLE.
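As a numerical sanity check on approximation (2.19), the following sketch (my own illustration, not from the source; the exponential model and all names are assumptions) draws repeated $\mathrm{Exp}(\lambda)$ data records, computes the MLE $\hat{\lambda}=1/\bar{x}$, and compares its empirical spread to the asymptotic standard deviation $\lambda/\sqrt{N}$ implied by the FIM $I(\lambda)=N/\lambda^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, N, trials = 2.0, 500, 20000

# Each row is one data record of N iid Exp(rate = lam) samples
x = rng.exponential(scale=1.0 / lam, size=(trials, N))

# MLE of the rate parameter is the reciprocal of the sample mean
lam_hat = 1.0 / x.mean(axis=1)

# Asymptotic theory (2.19): lam_hat ~ N(lam, I^{-1}), with I(lam) = N / lam^2
asymptotic_std = lam / np.sqrt(N)
print(lam_hat.mean(), lam_hat.std(), asymptotic_std)
```

For $N=500$ the empirical mean and standard deviation of $\hat{\lambda}$ land very close to $\lambda$ and $\lambda/\sqrt{N}$, consistent with the Gaussian approximation.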

To apply equation (2.15), $\hat{\boldsymbol{\theta}}$ takes the place of $\mathbf{z}$, and $H_0$ is the hypothesis that $\hat{\boldsymbol{\theta}}$ is the true value of $\boldsymbol{\theta}$. We substitute (2.18) for $p(\mathbf{x};H_0)$ and (2.19) for $p(\mathbf{z};H_0)$. Under the stated conditions, the exponential terms in approximations (2.18), (2.19) become 1. Using these approximations, we arrive at
$$G(\mathbf{x};H_1,T) = \left[p(\mathbf{x};\hat{\boldsymbol{\theta}})\,(2\pi)^{P/2}\,\big|\mathbf{I}(\hat{\boldsymbol{\theta}})\big|^{-1/2}\right] p(\hat{\boldsymbol{\theta}};H_1), \qquad (2.20)$$
which agrees with the PDF approximation from asymptotic theory [18], [19]. Equation (2.20) is very useful for integrating ML estimators into class-specific classifiers, and we will give examples of its use. The first term (in brackets) is the J-function.
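To make the bracketed term in (2.20) concrete, here is a sketch (my own illustration; the model, function name, and parameterization are assumptions, not from the source) of the log J-function for iid Gaussian data with features $(\hat{\mu},\hat{\sigma}^2)$, using the FIM $\mathbf{I} = \mathrm{diag}\!\left(N/\sigma^2,\; N/(2\sigma^4)\right)$ for $N$ samples and $P=2$:

```python
import numpy as np

def log_J_ml(x):
    """Log of the J-function in (2.20) for iid N(mu, sigma^2) data with
    features (mu_hat, s2_hat).  A sketch only; names are assumptions."""
    N = x.size
    s2_hat = x.var()  # ML variance estimate (normalized by N)
    # log p(x; theta_hat): Gaussian log-likelihood at the MLE, where
    # sum((x - mu_hat)^2) = N * s2_hat
    log_px = -0.5 * N * np.log(2 * np.pi * s2_hat) - 0.5 * N
    # FIM for N iid samples: I = diag(N / s2, N / (2 s2^2)), so P = 2
    log_detI = np.log(N / s2_hat) + np.log(N / (2 * s2_hat**2))
    P = 2
    return log_px + 0.5 * P * np.log(2 * np.pi) - 0.5 * log_detI

rng = np.random.default_rng(1)
x = rng.normal(1.0, 2.0, size=256)
print(log_J_ml(x))
```

Working in the log domain avoids the underflow that the raw likelihood $p(\mathbf{x};\hat{\boldsymbol{\theta}})$ would suffer for long data records.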

To compare equations (2.15) and (2.20), we note that for both there is an implied sufficiency requirement, for $T(\mathbf{x})$ and $\hat{\boldsymbol{\theta}}$, respectively. Specifically, $H_0$ must remain in the ROS of $T(\mathbf{x})$, while $\hat{\boldsymbol{\theta}}$ must be asymptotically sufficient for $\boldsymbol{\theta}$. However, (2.15) is more general, since (2.20) is valid only when *all* of the features are ML estimators, and it holds only asymptotically, for large data records, with the implication that $\hat{\boldsymbol{\theta}}$ tends to Gaussian; (2.15) carries no such implication. This is particularly important in upstream processing, where there has not yet been significant data reduction and asymptotic results do not apply. Using (2.15), we can make simple adjustments to the reference hypothesis to match the data better and avoid the PDF tails (such as by controlling variance), provided we are certain that we remain in the ROS of $T(\mathbf{x})$.

*We let the floating reference hypothesis be that $H_0: \boldsymbol{\theta} = \hat{\boldsymbol{\theta}}$, or in other words that the true values of the parameters are equal to the ML estimates. We have*
$$p(\mathbf{x};H_0) = p(\mathbf{x};\hat{\boldsymbol{\theta}}).$$
*For $p(\hat{\boldsymbol{\theta}};H_0)$, we have (see the denominator of equation (2.20)) that*
$$p(\hat{\boldsymbol{\theta}};H_0) \simeq (2\pi)^{-P/2}\,\big|\mathbf{I}(\hat{\boldsymbol{\theta}})\big|^{1/2}.$$
*We therefore have that*
$$J(\mathbf{x}) = p(\mathbf{x};\hat{\boldsymbol{\theta}})\,(2\pi)^{P/2}\,\big|\mathbf{I}(\hat{\boldsymbol{\theta}})\big|^{-1/2}.$$
*We compared the J-function computed using the above equations with the J-function from the fixed reference hypothesis (Example 1). There was close agreement. For details, see Figure 2.5 and software/test_mv_ml.m.*
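A comparison of this kind can be sketched numerically. The following illustration is my own, not the source's software/test_mv_ml.m; it assumes (as a guess at Example 1) a fixed reference hypothesis of iid $\mathcal{N}(0,1)$ samples, under which the feature PDF is known exactly: $\hat{\mu} \sim \mathcal{N}(0, 1/N)$ and $N\hat{\sigma}^2 \sim \chi^2_{N-1}$, independently. The two log J-functions should agree closely for large $N$:

```python
import numpy as np
from scipy import stats

def log_J_floating(x):
    # Floating reference, eq. (2.20): J = p(x; theta_hat) (2pi)^{P/2} |I|^{-1/2}
    N = x.size
    s2 = x.var()  # ML variance estimate
    log_px = -0.5 * N * np.log(2 * np.pi * s2) - 0.5 * N
    log_detI = np.log(N / s2) + np.log(N / (2 * s2**2))  # FIM diag for (mu, s2)
    return log_px + np.log(2 * np.pi) - 0.5 * log_detI   # P = 2

def log_J_fixed(x):
    # Fixed reference H0: x ~ iid N(0,1) (an assumed form of Example 1).
    # Exact feature PDF under H0: mu_hat ~ N(0, 1/N) and
    # N * s2 ~ chi-square(N - 1), independently.
    N = x.size
    mu, s2 = x.mean(), x.var()
    log_px = -0.5 * N * np.log(2 * np.pi) - 0.5 * np.sum(x**2)
    log_pmu = stats.norm.logpdf(mu, 0.0, np.sqrt(1.0 / N))
    log_ps2 = stats.chi2.logpdf(N * s2, N - 1) + np.log(N)  # Jacobian of s2 = y/N
    return log_px - log_pmu - log_ps2

rng = np.random.default_rng(2)
x = rng.normal(0.3, 1.5, size=2000)
print(log_J_floating(x), log_J_fixed(x))  # agree closely for large N
```

Note that the fixed-reference value uses exact feature densities, while the floating-reference value relies on the asymptotic approximations (2.18), (2.19); their difference shrinks as $N$ grows.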

Baggenstoss 2017-05-19