Accuracy of Predicted Mean and Mixing Rate

While Figure 5.11 suggests that $\hat{\mbox{\boldmath $\lambda$}}({\bf z}^*)$ is an accurate predictor of the asymptotic mean of MCMC-UMS, we need to quantify this accuracy. We can visualize the convergence to the predicted mean by plotting the error metric

$\displaystyle \epsilon = \frac{1}{N}\sum_{i=1}^N \left[ \log\left( \frac{\bar{x}_i}{\hat{\lambda}_i} \right)\right]^2, \qquad (5.24)$

where $\bar{x}_i$ is the sample mean of element $i$ of ${\bf x}$ and $\hat{\lambda}_i$ is the corresponding element of $\hat{\mbox{\boldmath $\lambda$}}({\bf z}^*)$. We compute $\epsilon$, the mean squared fractional error, as a function of the number of iterations. Figure 5.12 shows the results for the systematic and random directions approaches, plotted as a function of full iterations. The graph also shows the results with whitening, which will be described later.
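As a concrete illustration, the following sketch computes (5.24) from the running sample mean after each iteration. The array names (\texttt{samples}, \texttt{lam\_hat}) are hypothetical placeholders, and the elements are assumed positive so that the logarithm is defined.

\begin{verbatim}
import numpy as np

def msfe(samples, lam_hat):
    # samples: (T, N) array of MCMC-UMS draws (hypothetical name)
    # lam_hat: (N,) MaxEnt prediction lambda-hat(z*); assumed positive
    # Returns eps[t], the error (5.24) of the running mean after
    # t+1 iterations, for t = 0..T-1.
    T = samples.shape[0]
    running_mean = np.cumsum(samples, axis=0) / np.arange(1, T + 1)[:, None]
    log_ratio = np.log(running_mean / lam_hat)   # log(xbar_i / lambda_i)
    return np.mean(log_ratio**2, axis=1)         # average over the N elements
\end{verbatim}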

Since the samples in consecutive iterations are correlated, it may take many iterations for the sample mean to achieve a low error; the slower the mixing, the more samples are needed. The mixing rate of MCMC approaches is generally measured by the effective sample size (ESS), the number of independent samples that would yield the same estimation error [44]. Thus, relative ESS can be read from Figure 5.12 by drawing a horizontal line: the horizontal gap between the curves at a given error level shows how many more iterations one method needs than another to reach that error. Note that the error continues decreasing even after 100,000 full iterations, a strong indication that the proposed MaxEnt solution is a very good (if not exact) approximation to the asymptotic mean of MCMC-UMS. It is surprising, however, that the random directions method is much slower.
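For reference, a common way to estimate ESS is via the integrated autocorrelation time. The sketch below is a standard truncated-autocorrelation estimator, not necessarily the one used in [44].

\begin{verbatim}
import numpy as np

def ess(chain):
    # Rough ESS of a 1-D chain via the integrated autocorrelation
    # time, truncated at the first non-positive autocorrelation.
    x = np.asarray(chain, dtype=float) - np.mean(chain)
    T = len(x)
    f = np.fft.rfft(x, n=2 * T)                  # zero-padded FFT
    acf = np.fft.irfft(f * np.conj(f))[:T]       # autocovariance by FFT
    acf /= acf[0]                                # normalize at lag 0
    tau = 1.0                                    # integrated autocorr. time
    for k in range(1, T):
        if acf[k] <= 0:
            break
        tau += 2.0 * acf[k]
    return T / tau
\end{verbatim}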

Figure 5.12: Convergence of MCMC-UMS sample mean to MaxEnt solution for four approaches: systematic (SYS) and random directions (RAN), each without and with whitening (-W suffix). X-axis: number of full iterations.
\includegraphics[height=3.5in,width=5.2in]{all_iter.eps}

To investigate this, we tried the systematic approach after randomizing the orthogonal basis matrix ${\bf B}$ by post-multiplying it by an orthogonalized random matrix of dimension $(N-D)\times(N-D)$. Surprisingly, the performance then approximately matched that of random directions. It appears, then, that the advantage of the systematic method comes not from the systematic order in which the basis functions are used, but from the fact that the basis functions, as calculated by SVD or QR decomposition of matrix ${\bf A}$, are optimally aligned; computing ${\bf B}$ from ${\bf A}$ using either SVD or QR had the same effect. The randomized ${\bf B}$ spans the same orthogonal subspace, yet mixes much more slowly. To understand this effect, think of a very long, thin convex region. If we start at one end of the region and use a random walk, we are likely to remain at that end for a long time, since movement precisely along the main axis is rare. However, if we walk systematically, first perpendicular to the main axis, then parallel to it, we can quickly cover the region. Why this optimal alignment happens, and whether it is problem-specific, is not clear.
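To make the randomization experiment concrete, the sketch below constructs an orthonormal null-space basis ${\bf B}$ of a matrix ${\bf A}$ via SVD and then post-multiplies it by an orthogonalized random matrix, as described above. The dimensions are hypothetical placeholders.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
D, N = 4, 12                          # hypothetical sizes, D < N
A = rng.standard_normal((D, N))

# Orthonormal basis for the null space of A via SVD:
# the last N-D rows of V^T span null(A).
_, _, Vt = np.linalg.svd(A)
B = Vt[D:].T                          # N x (N-D)

# Randomize B: post-multiply by an orthogonalized random
# (N-D) x (N-D) matrix (QR of a Gaussian matrix).
Q, _ = np.linalg.qr(rng.standard_normal((N - D, N - D)))
B_rand = B @ Q

# The same subspace is spanned (A @ B_rand is numerically zero),
# but the individual columns are no longer SVD-aligned.
assert np.allclose(A @ B_rand, 0, atol=1e-10)
\end{verbatim}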