Data re-synthesis of positive data from linear features: UMS

We now turn our attention to the re-synthesis of $ {\bf x}$ from $ {\bf z}$ using UMS. This function is implemented by software/module_A_chisq_synth.m. The theory behind the method is not simple, and deserves a lengthy treatment.

Given a fixed feature, say $ {\bf z}^*$, we sample from the manifold

$\displaystyle {\bf x}\; : \; {\bf A}^\prime {\bf x}={\bf z}^*, \;\;\;\; {\rm such \; that\;} x_i>0, \; {\rm for \; all}\; i$ (5.9)

In general, we can access the desired manifold by using the linear space orthogonal to the columns of $ {\bf A}$. Let the matrix $ {\bf B}$ be defined as in Section 4.4, and let

$\displaystyle {\bf x}= {\bf A} ({\bf A}^\prime {\bf A})^{-1} {\bf z}^* + {\bf B} {\bf u}.$ (5.10)
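
As a quick check that every point of the form (5.10) satisfies the linear constraint in (5.9), assume (as the construction above indicates) that the columns of $ {\bf B}$ are orthogonal to the columns of $ {\bf A}$, so that $ {\bf A}^\prime {\bf B}={\bf0}$. Then, for any $ {\bf u}$,

$\displaystyle {\bf A}^\prime {\bf x} = {\bf A}^\prime {\bf A} ({\bf A}^\prime {\bf A})^{-1} {\bf z}^* + {\bf A}^\prime {\bf B} {\bf u} = {\bf z}^* + {\bf0} = {\bf z}^*,$

so only the positivity constraint $ x_i>0$ restricts the choice of $ {\bf u}$.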

Thus, $ {\bf u}$ is the ancillary statistic that spans the manifold. We need to generate samples uniformly in $ {\bf u}$ within the region that meets the positivity constraint in (5.9). Sampling uniformly in $ {\bf u}$ produces a uniformly sampled region in $ {\bf x}$, but unfortunately it is difficult to find the region in $ {\cal R}^{N-D}$ that maps to $ {\cal P}^N$. Rejection sampling is known to suffer from an acceptance rate that decreases exponentially as the dimension grows. A method based on Hit-and-Run sampling, described in Section 5.3.1, can efficiently generate samples uniformly distributed on the manifold.
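
The idea of moving along random chords of the manifold can be sketched in a few lines. The following is a minimal, generic hit-and-run sketch, not necessarily the variant of Section 5.3.1 and not the code in module_A_chisq_synth.m. It assumes the simplest case $ D=1$ with $ {\bf A}$ a single column of ones, so that the feature is the sum of the elements of $ {\bf x}$ and the manifold is a scaled simplex. The function names hit_and_run_step and sample_manifold are hypothetical.

```python
import random

def hit_and_run_step(x, rng):
    """One hit-and-run move on {x : sum(x) = const, x_i > 0} (assumed D=1 case)."""
    n = len(x)
    # Random direction orthogonal to the all-ones column: an isotropic
    # Gaussian vector with its mean removed sums to zero.
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(g) / n
    d = [gi - mean for gi in g]
    # Find the interval (t_lo, t_hi) on which x + t*d stays positive.
    t_lo, t_hi = float("-inf"), float("inf")
    for xi, di in zip(x, d):
        if di > 0:
            t_lo = max(t_lo, -xi / di)
        elif di < 0:
            t_hi = min(t_hi, -xi / di)
    # Draw the next point uniformly along that chord.
    t = rng.uniform(t_lo, t_hi)
    return [xi + t * di for xi, di in zip(x, d)]

def sample_manifold(z, n, n_samples, burn_in=100, seed=0):
    """Run the chain from the centre of the manifold and keep post-burn-in points."""
    rng = random.Random(seed)
    x = [z / n] * n  # strictly positive starting point with sum z
    out = []
    for k in range(burn_in + n_samples):
        x = hit_and_run_step(x, rng)
        if k >= burn_in:
            out.append(x)
    return out

samples = sample_manifold(z=1.0, n=6, n_samples=2000)
```

Every move stays on the manifold by construction, so no candidate is ever rejected; the price is that successive samples are correlated, which is why a burn-in period (an arbitrary 100 steps here) is discarded.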

To visualize the distribution of $ {\bf x}$ generated using UMS, we conducted the following experiment. We used a feature of dimension $ D=2$, with the first feature equal to $ t_1({\bf x})$, and generated random samples of $ {\bf x}$ on the manifold using rejection sampling. Figure 5.7 (top left) plots samples of $ x_1,x_2$, which exhibit the desired uniform distribution. Figure 5.7 (top right) shows the histogram of $ x_2$ for 10000 samples.

Figure 5.7: From top: manifold sampling results for $ N=4$, $ N=6$, and $ N=10$ (manifold dimension 2, 4, 8, respectively). Left: random samples of $ x_1$, $ x_2$. Right: histogram of $ x_2$.

For manifold dimensions above 2, the manifold distribution does not look uniform when projected onto a 2D plane, even though it is uniform in the higher dimensions. With increasing manifold dimension, the histogram on the right side of the figure looks increasingly exponential. This effect is analogous to that in Figure 4.1, where the distribution tended to Gaussian. There is an analogous argument based on the fact that dividing a set of independent, identically distributed exponential random variables by their sum generates a uniform distribution on a simplex [23] (the constraint that $ t_1({\bf x})$ is fixed constrains $ {\bf x}$ to a simplex).
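
The simplex argument can be checked numerically. The sketch below is an illustration under simplifying assumptions, not a reproduction of the experiment behind Figure 5.7: it uses only the sum constraint (so $ D=1$ and the simplex has dimension $ N-1$), normalizes the sum to 1, and uses the hypothetical function names simplex_sample and tail_fraction. For a uniform distribution on this simplex, each coordinate has a Beta(1, N-1) marginal, so the probability that $ N x_2$ exceeds 1 is $ (1-1/N)^{N-1}$, which approaches the exponential value $ e^{-1}$ as $ N$ grows.

```python
import math
import random

def simplex_sample(n, rng):
    """Uniform draw on {x : sum(x) = 1, x_i > 0} via normalized exponentials."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    s = sum(e)
    return [ei / s for ei in e]

def tail_fraction(n, n_samples=10000, seed=0):
    """Empirical P(n * x_2 > 1), to compare against exp(-1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples)
               if n * simplex_sample(n, rng)[1] > 1.0)
    return hits / n_samples

for n in (4, 6, 10, 100):
    exact = (1.0 - 1.0 / n) ** (n - 1)  # Beta(1, n-1) survival function at 1/n
    # Columns: n, empirical tail, exact Beta tail, exponential limit
    print(n, round(tail_fraction(n), 3), round(exact, 3), round(math.exp(-1.0), 3))
```

The empirical and exact columns should agree to within sampling error, and both should move toward the last column as $ N$ increases, which is the same trend seen in the histograms of $ x_2$.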

Baggenstoss 2017-05-19