Maximum Likelihood Parameter Estimation and the CR bound

Consider a general PDF that depends on $P$ parameters $\mbox{\boldmath$\theta$} = [\theta_1, \theta_2 \ldots \theta_P]^\prime$:

$\displaystyle p({\bf x}) = p({\bf x}; \mbox{\boldmath$\theta$}).$

The Fisher information between any two parameters $\theta_1$ and $\theta_2$ is defined by

$\displaystyle I_{\theta_1, \theta_2} = - {\rm E}\left\{ \frac{\partial^2 \log p({\bf x};\mbox{\boldmath$\theta$}) }{\partial \theta_1 \partial \theta_2} \right\}.$ (16.13)
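
As an illustration of the definition (16.13), the sketch below estimates the Fisher information matrix by Monte Carlo for one concrete case. The model is an assumption made for this example and is not taken from the text: $N$ i.i.d. Gaussian samples with parameters (mean, variance), for which the exact matrix is known to be diagonal with entries $N/\sigma^2$ and $N/(2\sigma^4)$. The function names are hypothetical.

```python
import numpy as np

def gaussian_hessian(x, mu, var):
    """Matrix of second partial derivatives of log p(x; mu, var)
    for i.i.d. Gaussian samples x (assumed example model)."""
    n = x.size
    r = x - mu
    d2_mu_mu = -n / var
    d2_mu_var = -np.sum(r) / var**2
    d2_var_var = n / (2 * var**2) - np.sum(r**2) / var**3
    return np.array([[d2_mu_mu, d2_mu_var],
                     [d2_mu_var, d2_var_var]])

def fisher_monte_carlo(mu, var, n_samples, n_trials, rng):
    """Approximate I = -E{Hessian of log p} by averaging over simulated data sets."""
    total = np.zeros((2, 2))
    for _ in range(n_trials):
        x = rng.normal(mu, np.sqrt(var), size=n_samples)
        total -= gaussian_hessian(x, mu, var)
    return total / n_trials

def fisher_analytic(var, n_samples):
    """Closed-form Fisher information matrix for the same Gaussian model."""
    return np.diag([n_samples / var, n_samples / (2 * var**2)])

rng = np.random.default_rng(0)
I_mc = fisher_monte_carlo(mu=1.0, var=2.0, n_samples=50, n_trials=5000, rng=rng)
I_exact = fisher_analytic(var=2.0, n_samples=50)
print(I_mc)
print(I_exact)
```

The Monte Carlo average should approach the closed-form matrix as the number of trials grows; the off-diagonal terms average toward zero because the mean and variance parameters of a Gaussian are informationally decoupled.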

Collecting all these values into the matrix ${\bf I}(\mbox{\boldmath$\theta$})$ gives the Fisher information matrix. The Cramér-Rao lower bound states that the covariance matrix ${\bf C}$ of any unbiased joint estimator of the parameters $\mbox{\boldmath$\theta$}$ satisfies

$\displaystyle {\bf C}-{\bf I}^{-1}(\mbox{\boldmath$\theta$}) \succeq 0,$

that is, the difference is a positive semidefinite matrix (so, in particular, its determinant is non-negative). This effectively means that ${\bf I}^{-1}(\mbox{\boldmath$\theta$})$ is the lower bound for the covariance of any unbiased estimator.
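
A small numerical check can make the matrix form of the bound concrete. The sketch below assumes, purely for illustration, $N$ i.i.d. Gaussian samples estimated with the sample mean and the unbiased sample variance; their covariance matrix is known in closed form, so the eigenvalues of ${\bf C}-{\bf I}^{-1}$ can be inspected directly. The function name is hypothetical.

```python
import numpy as np

def crlb_gap_eigenvalues(var, n):
    """Eigenvalues of C - I^{-1} for the (sample mean, unbiased sample variance)
    estimator of an i.i.d. Gaussian model (assumed example)."""
    # The two estimators are independent for Gaussian data, so C is diagonal.
    C = np.diag([var / n, 2 * var**2 / (n - 1)])
    # Inverse Fisher information matrix for (mean, variance).
    I_inv = np.diag([var / n, 2 * var**2 / n])
    return np.linalg.eigvalsh(C - I_inv)

eigs = crlb_gap_eigenvalues(var=2.0, n=10)
print(eigs)  # all eigenvalues are non-negative
```

Here the sample mean attains the bound exactly (zero eigenvalue), while the unbiased variance estimator exceeds it by a margin that shrinks as $N$ grows.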

The inverse of the Fisher information matrix is a good estimate of the parameter estimation error covariance and is useful for iterative optimization of the likelihood. Given a parameter estimate $\mbox{\boldmath$\theta$}_n$, the new estimate is obtained as

$\displaystyle \mbox{\boldmath$\theta$}_{n+1} = \mbox{\boldmath$\theta$}_n + {\bf I}^{-1}(\mbox{\boldmath$\theta$}_n) \; \mbox{\boldmath$\delta$},$ (16.14)

where

$\displaystyle \mbox{\boldmath$\delta$} = \left[ D(\theta_1) \; D(\theta_2) \ldots D(\theta_P) \right]^\prime$

is the gradient vector formed from the first partial derivatives

$\displaystyle D(\theta)\stackrel{\mbox{\tiny $\Delta$}}{=}\frac{\partial}{\partial \theta} \log p({\bf x}; \theta).$
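
The update (16.14) can be sketched in a few lines. The example below is a minimal illustration under an assumed model (i.i.d. Gaussian data with parameters mean and variance), not an implementation from the text; the function names and the starting point are hypothetical. For this model the iteration reaches the maximum likelihood estimates after two steps, which makes it easy to verify.

```python
import numpy as np

def score(x, mu, var):
    """Gradient vector delta: first partial derivatives of log p(x; mu, var)."""
    r = x - mu
    d_mu = np.sum(r) / var
    d_var = -x.size / (2 * var) + np.sum(r**2) / (2 * var**2)
    return np.array([d_mu, d_var])

def fisher(n, var):
    """Fisher information matrix of n i.i.d. Gaussian samples for (mu, var)."""
    return np.diag([n / var, n / (2 * var**2)])

def fisher_scoring(x, theta0, n_iter=5):
    """Iterate theta_{n+1} = theta_n + I^{-1}(theta_n) delta, as in (16.14)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        mu, var = theta
        delta = score(x, mu, var)
        # Solve I * step = delta rather than forming the inverse explicitly.
        theta = theta + np.linalg.solve(fisher(x.size, var), delta)
    return theta

rng = np.random.default_rng(1)
x = rng.normal(3.0, 2.0, size=200)
theta_hat = fisher_scoring(x, theta0=[0.0, 1.0])
print(theta_hat, x.mean(), x.var())
```

Using `np.linalg.solve` instead of an explicit matrix inverse is a design choice for numerical stability; the result is the same update.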

It is also possible to optimize only a subset of the parameters. A parameter pair $\theta_1, \theta_2$ is updated according to

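$\displaystyle \left[ \begin{array}{l} \theta_1 \\ \theta_2 \end{array}\right]_{n+1} = \left[ \begin{array}{l} \theta_1 \\ \theta_2 \end{array}\right]_{n} + \left[ \begin{array}{ll} I_{\theta_1,\theta_1} & I_{\theta_1,\theta_2} \\ I_{\theta_2,\theta_1} & I_{\theta_2,\theta_2} \end{array}\right]^{-1} \; \left[ \begin{array}{l} D(\theta_1) \\ D(\theta_2) \end{array}\right].$ (16.15)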
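
A subset update of this kind might be sketched as follows. The sketch assumes that the pair update inverts the $2\times 2$ block of the Fisher information matrix belonging to the chosen parameters and leaves all other parameters unchanged; the helper name `scoring_step_subset` and the numerical values are hypothetical and serve only to show the indexing.

```python
import numpy as np

def scoring_step_subset(theta, fisher, delta, idx):
    """Apply one scoring step to the parameters listed in idx only.

    theta  : current parameter vector
    fisher : full Fisher information matrix at theta
    delta  : full gradient vector of log p at theta
    idx    : indices of the parameters to update (e.g. a pair)
    """
    idx = np.asarray(idx)
    # Sub-block of the Fisher information matrix for the chosen parameters.
    sub_fisher = fisher[np.ix_(idx, idx)]
    step = np.linalg.solve(sub_fisher, delta[idx])
    new_theta = np.array(theta, dtype=float)
    new_theta[idx] += step
    return new_theta

# Made-up numbers: three parameters, only the first two are updated.
theta = np.array([0.0, 0.0, 0.0])
fisher = np.array([[2.0, 1.0, 0.0],
                   [1.0, 2.0, 0.0],
                   [0.0, 0.0, 4.0]])
delta = np.array([3.0, 3.0, 8.0])
theta_new = scoring_step_subset(theta, fisher, delta, idx=[0, 1])
print(theta_new)  # third parameter is left at its previous value
```

Cycling such partial updates over different pairs is one way to avoid inverting the full matrix when $P$ is large, at the cost of ignoring coupling with the parameters held fixed.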

Baggenstoss 2017-05-19