Statement of MaxEnt PDF Projection theorem

Maximum entropy PDF projection [3] is a means of finding the unique member of ${\cal G}(T,g)$ with highest entropy; more precisely, of finding the $H_0$ that produces the $G({\bf x};H_0,T,g)$ with highest entropy. The entropy of $G({\bf x};H_0,T,g)$ is given by

$\displaystyle Q_G= -\int_{{\bf x}} \log G({\bf x}; H_0, T,g) \; G({\bf x}; H_0, T,g) \; {\rm d} {\bf x}.$

It can be shown (see [3], equation 8) that this can be expanded as follows:

$\displaystyle Q_G= Q_g + \int_{{\bf z}} Q_{\mu \vert z; H_0} \; g({\bf z}) \; {\rm d} {\bf z}$ (3.1)

where the entropy of $g$ is $Q_g=-\int_{{\bf z}} \log g({\bf z}) \; g({\bf z}) \; {\rm d} {\bf z},$ and the manifold entropy is

$\displaystyle Q_{\mu\vert z; H_0}=-\int_{{\bf x}\in{\cal M}(z;T)} \log \mu({\bf x}\vert{\bf z}; T, H_0) \; \mu({\bf x}\vert{\bf z}; T, H_0) {\rm d} {\bf x},$ (3.2)

where $\mu({\bf x}\vert{\bf z}; T, H_0)$ comes from (2.5). Since $Q_g$ is fixed, maximizing $Q_G$ for any given $g({\bf z})$ requires maximizing $Q_{\mu\vert z; H_0}$ for each ${\bf z}$. This is achieved if $\mu({\bf x}\vert{\bf z}; T, H_0)$ is the uniform distribution, which is the MaxEnt distribution on a region of compact support. But, in (2.5), $\mu({\bf x}\vert{\bf z}; T, H_0)$ is shaped by $p({\bf x}\vert H_0)$. We therefore have two requirements: (a) $p({\bf x}\vert H_0)$ must take a constant value on any manifold (2.4), and (b) all manifolds ${\cal M}({\bf z};T)$ must be compact. There are two ways to achieve these requirements, depending on ${\cal X}$.
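Before treating these two cases, it helps to see where the expansion (3.1) comes from. Combining (2.2) with (2.5), the projected PDF factors on each manifold (a sketch; the full derivation is [3], equation 8):

$\displaystyle G({\bf x}; H_0, T, g) = \mu({\bf x}\vert{\bf z}; T, H_0)\; g({\bf z}), \qquad {\bf x}\in{\cal M}({\bf z};T).$

Taking the logarithm and integrating first over each manifold and then over ${\bf z}$,

$\displaystyle Q_G = -\int_{{\bf z}} g({\bf z}) \int_{{\bf x}\in{\cal M}({\bf z};T)} \mu({\bf x}\vert{\bf z}; T, H_0) \left[ \log \mu({\bf x}\vert{\bf z}; T, H_0) + \log g({\bf z}) \right] {\rm d} {\bf x}\; {\rm d} {\bf z} = Q_g + \int_{{\bf z}} Q_{\mu\vert z; H_0}\; g({\bf z})\; {\rm d} {\bf z},$

where the cross term collapses because $\int_{{\bf x}\in{\cal M}({\bf z};T)} \mu({\bf x}\vert{\bf z}; T, H_0)\; {\rm d} {\bf x} = 1$.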

When ${\cal X}$ is itself a compact set, such as the unit hypercube $0\leq x_i \leq 1$ for $1\leq i \leq N$, we can take $p({\bf x}\vert H_0)$ to be the uniform distribution. Then, so long as the manifold ${\cal M}({\bf z};T)$ is compact for all ${\bf z}$, $\mu({\bf x}\vert{\bf z}; T, H_0)$ will be a proper uniform distribution for all ${\bf z}$, which has maximum entropy. Alternatively, when ${\cal X}$ is infinite in extent, the manifold can be forced to be compact by including an energy statistic in ${\bf z}$ (first proposed in [3]). The solution for compact ${\cal X}$ and the solution for unbounded ${\cal X}$ are formalized by the following two theorems.

Theorem 2   Maximum Entropy PDF Projection - Compact ${\cal X}$. Starting with the same assumptions as Theorem 1, we further assume that ${\cal X}$ is a compact set and $\int_{{\bf x}\in {\cal X}} \; {\rm d} {\bf x}= a < \infty.$ Furthermore, we assume that ${\cal M}({\bf z};T)$ is a compact set for all ${\bf z}\in{\cal Z}$. Then, the PDF

$\displaystyle G^*({\bf x}; T,g) = \frac{a^{-1}}{p({\bf z}\vert H_0;T)} g({\bf z}),$ (3.3)

where $p({\bf z}\vert H_0;T)$ is the distribution of ${\bf z}$ under the uniform assumption $p({\bf x}\vert H_0)=a^{-1}$, is the member of ${\cal G}(T,g)$ with highest entropy.
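As a concrete illustration of Theorem 2, here is a minimal numerical sketch (not from [3]; the feature map, the choice of $g$, and all names below are hypothetical choices for illustration): ${\cal X}=[0,1]^2$, so $a=1$, with scalar feature $z = T({\bf x}) = x_1 + x_2$. Under the uniform reference, $z$ has the triangular (Irwin-Hall) density, so (3.3) can be evaluated and sampled directly:

    import numpy as np

    # Sketch of Theorem 2 (hypothetical setup): X = [0,1]^2, so a = 1,
    # with scalar feature z = T(x) = x1 + x2 taking values in Z = [0,2].

    def p_z_H0(z):
        """Density of z = x1 + x2 under the uniform reference on [0,1]^2:
        the triangular (Irwin-Hall, n=2) density."""
        return np.where(z <= 1.0, z, 2.0 - z)

    def G_star(x1, x2, g):
        """Projected PDF (3.3): G*(x) = a^{-1} g(z) / p(z|H0;T), with a = 1."""
        z = x1 + x2
        return g(z) / p_z_H0(z)

    def sample_G_star(n, sample_g, rng):
        """Draw z ~ g, then x uniform on the manifold {x1 + x2 = z},
        a compact line segment inside [0,1]^2."""
        z = sample_g(n, rng)
        x1 = rng.uniform(np.maximum(0.0, z - 1.0), np.minimum(z, 1.0))
        return x1, z - x1

    g = lambda z: 3.0 * (z / 2.0) * (1.0 - z / 2.0)   # density of 2*Beta(2,2)
    rng = np.random.default_rng(0)
    x1, x2 = sample_G_star(10000, lambda n, r: 2.0 * r.beta(2.0, 2.0, n), rng)

    # G* is constant on each manifold: both points below have z = 0.7.
    print(G_star(0.2, 0.5, g), G_star(0.4, 0.3, g))   # equal values

Drawing $z$ from $g$ and then sampling uniformly on the manifold is exactly the structure the theorem describes: the feature retains the density $g$, while samples sharing a manifold are equally likely.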

The second case arises when ${\cal X}$ is not compact. The central difficulty in choosing the maximum entropy reference hypothesis is then that the entropy of a distribution can grow without bound unless something is done to constrain it. There are two approaches:

  1. Assume a fixed variance (or mean). A finite entropy results if the scale parameter of the distribution (variance or mean, depending on the distribution) is fixed; the classical MaxEnt solutions for this case are recalled just after this list. We use this approach for each layer in a projected belief network (PBN), covered in Chapter 16.
  2. Use an energy statistic. When fixing a scale parameter is undesirable or not meaningful, we are left with using an energy statistic. For this case, we need the next theorem.
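For reference, the classical MaxEnt solutions behind approach 1 are standard results (not specific to [3]): among all densities on $\mathbb{R}$ with variance $\sigma^2$, the Gaussian has maximum entropy, and among all densities on $[0,\infty)$ with mean $\lambda$, the exponential does, with

$\displaystyle Q_{\rm Gauss} = \frac{1}{2}\log\left(2\pi e \sigma^2\right), \qquad Q_{\rm exp} = 1 + \log \lambda.$

Both are finite once the scale parameter is fixed, which is why approach 1 avoids the divergence problem.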

Theorem 3   Maximum Entropy PDF Projection - Unbounded ${\cal X}$. Starting with the same assumptions as Theorem 1, we further assume that there exists a function $f$ such that

$\displaystyle f({\bf z})=f(T({\bf x}))=\Vert{\bf x}\Vert$ (3.4)

for some norm $\Vert{\bf x}\Vert$ valid on ${\cal X}$. We further assume that for all finite ${\bf z}\in{\cal Z}$, ${\cal M}({\bf z};T)$ is a compact set. Then, if the reference distribution can be written in the form

$\displaystyle p({\bf x}\vert H_0) = h(T({\bf x}))$ (3.5)

for some function $h$, the projected PDF (2.2) is the member of ${\cal G}(T,g)$ with highest entropy.
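A minimal numerical sketch of this unbounded case (hypothetical setup, not from [3]): take $T({\bf x})=\Vert{\bf x}\Vert^2$ on ${\cal X}=\mathbb{R}^N$, so $f(z)=\sqrt{z}$ recovers the norm as in (3.4), and each manifold is a sphere of radius $\sqrt{z}$, compact for finite $z$. The standard Gaussian reference $p({\bf x}\vert H_0)=(2\pi)^{-N/2}e^{-z/2}$ depends on ${\bf x}$ only through $z$, satisfying (3.5), and $p(z\vert H_0;T)$ is chi-squared with $N$ degrees of freedom:

    import numpy as np
    from scipy.stats import chi2

    N = 5  # dimension of x (illustrative choice)

    def log_G(x, log_g):
        """log of the projected PDF (2.2) under the Gaussian reference:
        log p(x|H0) - log p(z|H0;T) + log g(z), with z = ||x||^2 and
        p(z|H0;T) the chi-squared density with N degrees of freedom."""
        z = np.sum(x * x, axis=-1)
        log_p_x = -0.5 * N * np.log(2.0 * np.pi) - 0.5 * z
        return log_p_x - chi2.logpdf(z, df=N) + log_g(z)

    def sample_G(n, sample_g, rng):
        """Draw z ~ g, then x uniform on the sphere ||x|| = sqrt(z)
        (normalize a Gaussian vector and rescale to radius sqrt(z))."""
        z = sample_g(n, rng)
        u = rng.standard_normal((n, N))
        u /= np.linalg.norm(u, axis=1, keepdims=True)
        return np.sqrt(z)[:, None] * u

    # Hypothetical feature density g: chi-squared with 8 degrees of freedom.
    rng = np.random.default_rng(1)
    x = sample_G(4, lambda n, r: r.chisquare(8.0, n), rng)
    print(log_G(x, lambda z: chi2.logpdf(z, df=8)))

The energy statistic makes every manifold compact, so the conditional distribution on each sphere can be uniform even though ${\cal X}$ itself is unbounded.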

The highest-entropy distribution, found by maximizing the entropy of $G({\bf x};H_0,T,g)$ over $H_0$, is denoted by $G^*({\bf x}; T,g)$. Intuitively, $G^*({\bf x}; T,g)$ has the MaxEnt property because all samples it generates for a given ${\bf z}$ are equally likely: the manifold distribution is uniform, which has maximum entropy on a compact set. This follows from the fact that, for a given ${\bf z}$, $p({\bf x}\vert H_0)$ takes a constant value at all samples ${\bf x}$ on the manifold ${\cal M}({\bf z})$, making $G^*({\bf x}; T,g)$ also constant on the manifold, so the manifold distribution is indeed the uniform distribution. The reader is referred to [3] for additional details of the proof.
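The constancy can also be read off directly from the formulas (a remark; for the unbounded case, (3.5) is substituted into (2.2)):

$\displaystyle G^*({\bf x};T,g)=\frac{a^{-1}}{p({\bf z}\vert H_0;T)}\, g({\bf z}) \quad \mbox{(compact ${\cal X}$)}, \qquad G^*({\bf x};T,g)=\frac{h({\bf z})}{p({\bf z}\vert H_0;T)}\, g({\bf z}) \quad \mbox{(unbounded ${\cal X}$)};$

in both cases the right-hand side depends on ${\bf x}$ only through ${\bf z}=T({\bf x})$, and is therefore constant on each manifold ${\cal M}({\bf z};T)$.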