Merging Modes (gmix_merge.m)

Merging creates a single mode from two nearly identical ones. The closeness of two modes is determined by software/mode_dist.m, which works as follows. Let there be two PDFs $p_1(x)$ and $p_2(x)$. Let there be a collection of points denoted $x_i\in{\bf X}_1$ near the central peak of $p_1(x)$ and a collection of points denoted $x_i\in{\bf X}_2$ near the central peak of $p_2(x)$. Then we define the closeness metric

$\displaystyle d = \log \left\{ \frac{\displaystyle \prod_{x_i\in{\bf X}_1} p_2(x_i)
\prod_{x_i\in{\bf X}_2} p_1(x_i)}{\displaystyle \prod_{x_i\in{\bf X}_1} p_1(x_i)
\prod_{x_i\in{\bf X}_2} p_2(x_i) }\right\}.
$

Notice that this metric is zero when $p_1(x)=p_2(x)$ and less than zero when $p_1(x)\neq p_2(x)$. A threshold (usually about -1 * DIM, where DIM is the dimension) is used to determine if the modes are too close. The threshold should grow in magnitude (become more negative) as the dimension goes up.
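
Taking the logarithm inside the products gives an equivalent sum form. This is offered only as a restatement of the definition above, not as a description of how software/mode_dist.m is coded; it is the form one would typically evaluate numerically, since it works with log-PDF values and avoids underflow in the products:

$\displaystyle d = \sum_{x_i\in{\bf X}_1} \left[ \log p_2(x_i) - \log p_1(x_i) \right]
+ \sum_{x_i\in{\bf X}_2} \left[ \log p_1(x_i) - \log p_2(x_i) \right].
$

Every bracketed term vanishes when $p_1(x)=p_2(x)$, which is why the metric is zero in that case.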

Since $p_1(x)$ and $p_2(x)$ are just two Gaussian modes, it is easy to find good points for ${\bf X}_1$ and ${\bf X}_2$. We choose the means (centers) and then go one standard deviation in each direction along all the principal axes. The principal axes are found from the SVD of ${\bf R}$ (the Cholesky factor of the covariance matrix). This is illustrated in Figure 13.4 for a Gaussian mode of dimension $P=2$. There is a center point and two points per dimension. Therefore there are $2P+1$ points per mode, and two modes, thus $4P+2$ points.

Figure: The 5 summation points for a 2-dimensional mode. Contour at 2$\sigma $.
\includegraphics[scale=0.6, clip]{closest.eps}
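
The point selection and the metric can be sketched in a few lines of MATLAB. This is a minimal illustration under stated assumptions, not the code of software/mode_dist.m: the example means and covariances, the helper names loggauss and pts, and the final comparison against the threshold are all hypothetical.

```matlab
% Hypothetical example modes (P = 2); the values are illustrative only.
P = 2;
mu1 = [0; 0];       Sigma1 = [2.0 0.5; 0.5 1.0];
mu2 = [0.3; -0.2];  Sigma2 = [1.5 0.2; 0.2 1.2];

% Cholesky factors: chol returns upper-triangular R with Sigma = R'*R.
R1 = chol(Sigma1);
R2 = chol(Sigma2);

% Principal axes from the SVD of R: since Sigma = V*S^2*V', the columns of V
% are the axes and the singular values are the standard deviations.
[~, S1, V1] = svd(R1);
[~, S2, V2] = svd(R2);

% 2P+1 points per mode: the center, then +/- one standard deviation per axis.
pts = @(mu, A) [mu, bsxfun(@plus, mu, A), bsxfun(@minus, mu, A)];
X1 = pts(mu1, V1*S1);
X2 = pts(mu2, V2*S2);

% Log of a Gaussian PDF at each column of X, given the mean and R.
loggauss = @(X, mu, R) -0.5*sum((R' \ bsxfun(@minus, X, mu)).^2, 1) ...
    - sum(log(diag(R))) - 0.5*size(X,1)*log(2*pi);

% Closeness metric as a sum of log-PDF differences over the 4P+2 points.
d = sum(loggauss(X1, mu2, R2) - loggauss(X1, mu1, R1)) ...
  + sum(loggauss(X2, mu1, R1) - loggauss(X2, mu2, R2));

% Assumed decision rule: modes are "too close" when d exceeds the threshold.
max_closeness = -1.0 * P;
too_close = d > max_closeness;
```

Working with the Cholesky factor directly means the covariance never has to be inverted explicitly; the triangular solve with R' yields the Mahalanobis distance.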

If two modes are found to be too close, they are merged. Merging forms a weighted sum of the two modes (weighted by $\alpha_1,\alpha_2$). The new mean is thus

$\displaystyle \mbox{\boldmath$\mu$} = \frac{\alpha_1 \mbox{\boldmath$\mu$}_1 + \alpha_2 \mbox{\boldmath$\mu$}_2 }{ \alpha_1 + \alpha_2}$ (13.3)

The proper way to form a weighted combination of the covariances is not simply a weighted sum of the covariances, which does not take into account the separation of the means. You need to be more clever. Consider the Cholesky decomposition of the covariance matrix ${\bf\Sigma}={\bf R}^\prime {\bf R}$. It is possible to consider the rows of $\sqrt{P}\;{\bf R}$ to be samples of $P$-dimensional vectors whose covariance is ${\bf\Sigma}$, where $P$ is the dimension. The sample covariance is, of course, $\frac{1}{P} (\sqrt{P})^2 \;{\bf R}^\prime {\bf R} = {\bf\Sigma}$. Now, given two modes to merge, we regard $\sqrt{P} \;{\bf R}_1$ and $\sqrt{P}\;{\bf R}_2$ as two populations to be joined. The sample covariance of the collection of rows is the desired covariance. But this assigns equal weight to the two populations. To weight them by their respective mode weights, we multiply them by $\sqrt{\frac{\alpha_1}{\alpha_1+\alpha_2}}$ and $\sqrt{\frac{\alpha_2 }{\alpha_1+\alpha_2}}$. Before they can be joined, however, they must be shifted so they are re-referenced to the new central mean. Here is a summary of the method:
  1. Let $\mbox{\boldmath$\mu$}$ be as in (13.3).
  2. Let ${\bf R}_i$ be the Cholesky factor of ${\bf\Sigma}_i$, $i=1,2$.
  3. Let ${\bf C}_i=\sqrt{P}\;{\bf R}_i$, each $i$.
  4. Add the vector $\mbox{\boldmath$\mu$}_i-\mbox{\boldmath$\mu$}$ to each row of ${\bf C}_i$, each $i$.
  5. Multiply ${\bf C}_i$ by $\sqrt{\frac{\alpha_i}{\alpha_1+\alpha_2}}$, each $i$.
  6. Form

    $\displaystyle {\bf C}=\left[
\begin{array}{c} {\bf C}_1 \\ {\bf C}_2
\end{array}\right]
$

  7. Then the new covariance is ${\bf\Sigma}=\frac{1}{P}\;{\bf C}^\prime {\bf C}$. Alternatively, take the QR decomposition of ${\bf C}/\sqrt{P}$ and use its triangular factor as the Cholesky factor of the new covariance.
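
The seven steps can be traced in a short MATLAB sketch. It is only an illustration of the steps as listed, assuming the two populations are stacked vertically in step 6; it is not the code of software/merge.m, and the example weights, means and covariances are hypothetical.

```matlab
% Hypothetical inputs (P = 2); the values are illustrative only.
P = 2;
alpha1 = 0.6;  mu1 = [0; 0];       Sigma1 = [2.0 0.5; 0.5 1.0];
alpha2 = 0.4;  mu2 = [0.3; -0.2];  Sigma2 = [1.5 0.2; 0.2 1.2];

% Step 1: weighted mean, as in (13.3).
mu = (alpha1*mu1 + alpha2*mu2) / (alpha1 + alpha2);

% Step 2: Cholesky factors, Sigma_i = R_i'*R_i (chol returns upper triangular).
R1 = chol(Sigma1);
R2 = chol(Sigma2);

% Step 3: scale so that the P rows act as sample vectors.
C1 = sqrt(P) * R1;
C2 = sqrt(P) * R2;

% Step 4: add mu_i - mu to every row to re-reference to the new mean.
C1 = bsxfun(@plus, C1, (mu1 - mu)');
C2 = bsxfun(@plus, C2, (mu2 - mu)');

% Step 5: weight each population.
C1 = sqrt(alpha1 / (alpha1 + alpha2)) * C1;
C2 = sqrt(alpha2 / (alpha1 + alpha2)) * C2;

% Step 6: stack the populations vertically (2P rows, P columns).
C = [C1; C2];

% Step 7: new covariance, or its triangular factor from an economy-size QR.
Sigma = (C' * C) / P;
[~, R] = qr(C / sqrt(P), 0);   % R is P-by-P upper triangular, R'*R = Sigma
```

The triangular factor returned by qr can have negative diagonal entries. If a conventional Cholesky factor with positive diagonal is wanted, the sign of the affected rows of R can be flipped without changing R'*R. The QR route avoids forming the product C'*C explicitly.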
The above algorithm is implemented by software/merge.m. The subroutine that iterates over all pairs of modes and calls software/merge.m and software/mode_dist.m is software/gmix_merge.m. The calling syntax for software/gmix_merge.m is
	gparm = gmix_merge(gparm,max_closeness)
A good choice for the max_closeness threshold is about -1.0 times $P$, the PDF dimension.