Statement of MaxEnt PDF Projection theorem

Maximum entropy PDF projection [3] is a means of finding the unique member of ${\cal G}(T,g)$ with highest entropy; more precisely, of finding the $H_0$ that produces the $G({\bf x};H_0,T,g)$ with highest entropy. The entropy of $G({\bf x};H_0,T,g)$ is given by

$\displaystyle Q_G= -\int_{{\bf x}} \log G({\bf x}; H_0, T,g) \; G({\bf x}; H_0, T,g) \; {\rm d} {\bf x}.$

It can be shown (see [3], equation 8) that this can be expanded as follows:

$\displaystyle Q_G= Q_g + \int_{{\bf z}} Q_{\mu \vert z; H_0} \; g({\bf z}) \; {\rm d} {\bf z}$ (3.1)

where the entropy of $g$ is $Q_g=-\int_{{\bf z}} \log g({\bf z}) \; g({\bf z}) \; {\rm d} {\bf z},$ and the manifold entropy is

$\displaystyle Q_{\mu\vert z; H_0}=-\int_{{\bf x}\in{\cal M}(z;T)} \log \mu({\bf x}\vert{\bf z}; T, H_0) \; \mu({\bf x}\vert{\bf z}; T, H_0) {\rm d} {\bf x},$ (3.2)

where $\mu({\bf x}\vert{\bf z}; T, H_0)$ comes from (2.5). Since $Q_g$ is fixed, maximizing $Q_G$ for any given $g({\bf z})$ requires maximizing $Q_{\mu\vert z; H_0}$ for each ${\bf z}$. This is achieved if $\mu({\bf x}\vert{\bf z}; T, H_0)$ is the uniform distribution, which is the MaxEnt distribution on a region of compact support. But, in (2.5), $\mu({\bf x}\vert{\bf z}; T, H_0)$ is shaped by $p({\bf x}\vert H_0)$. We therefore have two requirements: (a) $p({\bf x}\vert H_0)$ must take a constant value on any manifold (2.4), and (b) all manifolds ${\cal M}({\bf z};T)$ must be compact. There are two ways to achieve these requirements, depending on ${\cal X}$.
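Before treating these two cases, it helps to see where the expansion (3.1) comes from. Combining (2.2) with (2.5), the projected PDF factors on each manifold (a sketch; the full derivation is [3], equation 8):

$\displaystyle G({\bf x}; H_0, T, g) = \mu({\bf x}\vert{\bf z}; T, H_0)\; g({\bf z}), \qquad {\bf x}\in{\cal M}({\bf z};T).$

Taking the logarithm and integrating first over each manifold and then over ${\bf z}$,

$\displaystyle Q_G = -\int_{{\bf z}} g({\bf z}) \int_{{\bf x}\in{\cal M}({\bf z};T)} \mu({\bf x}\vert{\bf z}; T, H_0) \left[ \log \mu({\bf x}\vert{\bf z}; T, H_0) + \log g({\bf z}) \right] {\rm d} {\bf x}\; {\rm d} {\bf z} = Q_g + \int_{{\bf z}} Q_{\mu\vert z; H_0}\; g({\bf z})\; {\rm d} {\bf z},$

where the cross term collapses because $\int_{{\bf x}\in{\cal M}({\bf z};T)} \mu({\bf x}\vert{\bf z}; T, H_0)\; {\rm d} {\bf x} = 1$.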

When ${\cal X}$ is itself a compact set, such as the unit hypercube $0\leq x_i \leq 1$ for $1\leq i \leq N$, we can take $p({\bf x}\vert H_0)$ to be the uniform distribution. Then, so long as the manifold ${\cal M}({\bf z};T)$ is compact for all ${\bf z}$, $\mu({\bf x}\vert{\bf z}; T, H_0)$ will be a proper uniform distribution for all ${\bf z}$, which has maximum entropy. Alternatively, when ${\cal X}$ is infinite in extent, the manifold can be forced to be compact by including an energy statistic in ${\bf z}$ (first proposed in [3]). The solution for compact ${\cal X}$ and the solution for unbounded ${\cal X}$ are formalized by the following two theorems.

Theorem 2   Maximum Entropy PDF Projection - Compact ${\cal X}$. Starting with the same assumptions as Theorem 1, we further assume that ${\cal X}$ is a compact set and $\int_{{\bf x}\in {\cal X}} \; {\rm d} {\bf x}= a < \infty.$ Furthermore, we assume that ${\cal M}({\bf z};T)$ is a compact set for all ${\bf z}\in{\cal Z}$. Then, the PDF

$\displaystyle G^*({\bf x}; T,g) = \frac{a^{-1}}{p({\bf z}\vert H_0;T)} g({\bf z}),$ (3.3)

where $p({\bf z}\vert H_0;T)$ is the distribution of ${\bf z}$ under the uniform assumption $p({\bf x}\vert H_0)=a^{-1}$, is the member of ${\cal G}(T,g)$ with highest entropy.
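As a concrete illustration of Theorem 2, here is a minimal numerical sketch (not from [3]; the feature map, the choice of $g$, and all names below are hypothetical choices for illustration): ${\cal X}=[0,1]^2$, so $a=1$, with scalar feature $z = T({\bf x}) = x_1 + x_2$. Under the uniform reference, $z$ has the triangular (Irwin-Hall) density, so (3.3) can be evaluated and sampled directly:

    import numpy as np

    # Sketch of Theorem 2 (hypothetical setup): X = [0,1]^2, so a = 1,
    # with scalar feature z = T(x) = x1 + x2 taking values in Z = [0,2].

    def p_z_H0(z):
        """Density of z = x1 + x2 under the uniform reference on [0,1]^2:
        the triangular (Irwin-Hall, n=2) density."""
        return np.where(z <= 1.0, z, 2.0 - z)

    def G_star(x1, x2, g):
        """Projected PDF (3.3): G*(x) = a^{-1} g(z) / p(z|H0;T), with a = 1."""
        z = x1 + x2
        return g(z) / p_z_H0(z)

    def sample_G_star(n, sample_g, rng):
        """Draw z ~ g, then x uniform on the manifold {x1 + x2 = z},
        a compact line segment inside [0,1]^2."""
        z = sample_g(n, rng)
        x1 = rng.uniform(np.maximum(0.0, z - 1.0), np.minimum(z, 1.0))
        return x1, z - x1

    g = lambda z: 3.0 * (z / 2.0) * (1.0 - z / 2.0)   # density of 2*Beta(2,2)
    rng = np.random.default_rng(0)
    x1, x2 = sample_G_star(10000, lambda n, r: 2.0 * r.beta(2.0, 2.0, n), rng)

    # G* is constant on each manifold: both points below have z = 0.7.
    print(G_star(0.2, 0.5, g), G_star(0.4, 0.3, g))   # equal values

Drawing $z$ from $g$ and then sampling uniformly on the manifold is exactly the structure the theorem describes: the feature retains the density $g$, while samples sharing a manifold are equally likely.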

The second case arises when ${\cal X}$ is not compact. The central difficulty in choosing the maximum entropy reference hypothesis is then that the entropy of a distribution can grow without bound unless something is done to constrain it. There are two approaches:

  1. Assume a fixed variance (or mean). A finite entropy results if the scale parameter of the distribution (variance or mean, depending on the distribution) is fixed; the classical MaxEnt solutions for this case are recalled just after this list. We use this approach for each layer in a projected belief network (PBN), covered in Chapter 16.
  2. Use an energy statistic. When fixing a scale parameter is undesirable or not meaningful, we are left with using an energy statistic. For this case, we need the next theorem.
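For reference, the classical MaxEnt solutions behind approach 1 are standard results (not specific to [3]): among all densities on $\mathbb{R}$ with variance $\sigma^2$, the Gaussian has maximum entropy, and among all densities on $[0,\infty)$ with mean $\lambda$, the exponential does, with

$\displaystyle Q_{\rm Gauss} = \frac{1}{2}\log\left(2\pi e \sigma^2\right), \qquad Q_{\rm exp} = 1 + \log \lambda.$

Both are finite once the scale parameter is fixed, which is why approach 1 avoids the divergence problem.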

Theorem 3   Maximum Entropy PDF Projection - Unbounded ${\cal X}$. Starting with the same assumptions as Theorem 1, we further assume that there exists a function $f$ such that

$\displaystyle f({\bf z})=f(T({\bf x}))=\Vert{\bf x}\Vert$ (3.4)

for some norm $\Vert{\bf x}\Vert$ valid on ${\cal X}$. We further assume that for all finite ${\bf z}\in{\cal Z}$, ${\cal M}({\bf z};T)$ is a compact set. Then, if the reference distribution can be written in the form

$\displaystyle p({\bf x}\vert H_0) = h(T({\bf x}))$ (3.5)

for some function $h$, the projected PDF (2.2) is the member of ${\cal G}(T,g)$ with highest entropy.
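A minimal numerical sketch of this unbounded case (hypothetical setup, not from [3]): take $T({\bf x})=\Vert{\bf x}\Vert^2$ on ${\cal X}=\mathbb{R}^N$, so $f(z)=\sqrt{z}$ recovers the norm as in (3.4), and each manifold is a sphere of radius $\sqrt{z}$, compact for finite $z$. The standard Gaussian reference $p({\bf x}\vert H_0)=(2\pi)^{-N/2}e^{-z/2}$ depends on ${\bf x}$ only through $z$, satisfying (3.5), and $p(z\vert H_0;T)$ is chi-squared with $N$ degrees of freedom:

    import numpy as np
    from scipy.stats import chi2

    N = 5  # dimension of x (illustrative choice)

    def log_G(x, log_g):
        """log of the projected PDF (2.2) under the Gaussian reference:
        log p(x|H0) - log p(z|H0;T) + log g(z), with z = ||x||^2 and
        p(z|H0;T) the chi-squared density with N degrees of freedom."""
        z = np.sum(x * x, axis=-1)
        log_p_x = -0.5 * N * np.log(2.0 * np.pi) - 0.5 * z
        return log_p_x - chi2.logpdf(z, df=N) + log_g(z)

    def sample_G(n, sample_g, rng):
        """Draw z ~ g, then x uniform on the sphere ||x|| = sqrt(z)
        (normalize a Gaussian vector and rescale to radius sqrt(z))."""
        z = sample_g(n, rng)
        u = rng.standard_normal((n, N))
        u /= np.linalg.norm(u, axis=1, keepdims=True)
        return np.sqrt(z)[:, None] * u

    # Hypothetical feature density g: chi-squared with 8 degrees of freedom.
    rng = np.random.default_rng(1)
    x = sample_G(4, lambda n, r: r.chisquare(8.0, n), rng)
    print(log_G(x, lambda z: chi2.logpdf(z, df=8)))

The energy statistic makes every manifold compact, so the conditional distribution on each sphere can be uniform even though ${\cal X}$ itself is unbounded.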

The highest-entropy distribution, found by maximizing the entropy of $G({\bf x};H_0,T,g)$ over $H_0$, is denoted by $G^*({\bf x}; T,g)$. Intuitively, $G^*({\bf x}; T,g)$ has the MaxEnt property because all samples it generates for a given ${\bf z}$ are equally likely: the manifold distribution is uniform, which has maximum entropy on a compact set. This follows from the fact that, for a given ${\bf z}$, $p({\bf x}\vert H_0)$ takes a constant value at all samples ${\bf x}$ on the manifold ${\cal M}({\bf z})$, making $G^*({\bf x}; T,g)$ also constant on the manifold, so the manifold distribution is indeed the uniform distribution. The reader is referred to [3] for additional details of the proof.
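The constancy can also be read off directly from the formulas (a remark; for the unbounded case, (3.5) is substituted into (2.2)):

$\displaystyle G^*({\bf x};T,g)=\frac{a^{-1}}{p({\bf z}\vert H_0;T)}\, g({\bf z}) \quad \mbox{(compact ${\cal X}$)}, \qquad G^*({\bf x};T,g)=\frac{h({\bf z})}{p({\bf z}\vert H_0;T)}\, g({\bf z}) \quad \mbox{(unbounded ${\cal X}$)};$

in both cases the right-hand side depends on ${\bf x}$ only through ${\bf z}=T({\bf x})$, and is therefore constant on each manifold ${\cal M}({\bf z};T)$.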