To illustrate the PDF estimation problem,
we will use some 3-dimensional features from
a mysterious source.
Samples of the feature vector
were used as training data
and were stored in variable data1, which is of size
3 by N, where N is the number of independent samples.
Each row of the matrix stores the
samples of a different feature.
The following code segment implements the training
and displays the resulting PDF in a density plot.
  NMODE=10;
  min_std = [20 20 1.0];
  names = {'Z1','Z2','Z3'};
  gparm1 = init_gmix(data1,NMODE,names,min_std);
  for i=1:100,
     [gparm1,Q] = gmix_step(gparm1,data1);
     fprintf('%d: Total log-likelihood=%g\n',i,Q);
  end;
  gmix_view2(gparm1,data1,1,2);

Refer to table 13.3 for symbol names. The variable names is a cell array that stores the feature names for use in visualization plots. The variable min_std stores the minimum feature standard deviations (see section 13.2.4). The routine software/init_gmix.m creates an initial set of parameters. In simple problems, the mixture can be trained by repeated calls to software/gmix_step.m, as shown. In more difficult problems, more effort is needed to ensure that there is the right number of modes and that the algorithm is converging properly. Representative MATLAB programs for training are software/gmix_est.m and software/gmix_trainscript.m, which in turn call software/gmix_step.m, the subroutine that actually implements the E-M algorithm. We will discuss the use of software/gmix_trainscript.m and software/gmix_est.m in more detail in the following sections. Results of running the above code segment are shown in Figure 13.1.
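The book's routines are MATLAB; as a rough illustration of what a single call to an E-M step such as software/gmix_step.m computes, the following Python/NumPy sketch (the function name em_step and the array layout are assumptions, not the book's API) performs one E-M iteration and returns the total log-likelihood Q:

```python
import numpy as np

def em_step(data, weights, means, covs):
    """One E-M iteration for a Gaussian mixture (illustrative sketch).

    data : (d, N) array, one column per sample (as in the text).
    weights : (K,), means : (d, K), covs : (d, d, K).
    Returns updated parameters and the total log-likelihood Q.
    """
    d, N = data.shape
    K = weights.size
    log_p = np.empty((K, N))
    for k in range(K):
        diff = data - means[:, [k]]
        cov = covs[:, :, k]
        sol = np.linalg.solve(cov, diff)          # inv(cov) @ diff
        maha = np.sum(diff * sol, axis=0)         # Mahalanobis distances
        _, logdet = np.linalg.slogdet(cov)
        log_p[k] = np.log(weights[k]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
    # E-step: posterior mode probabilities (responsibilities) per sample
    log_tot = np.logaddexp.reduce(log_p, axis=0)
    r = np.exp(log_p - log_tot)                   # (K, N)
    Q = log_tot.sum()                             # total log-likelihood
    # M-step: re-estimate weights, means, covariances
    nk = r.sum(axis=1)                            # effective samples per mode
    new_w = nk / N
    new_m = (data @ r.T) / nk
    new_c = np.empty_like(covs)
    for k in range(K):
        diff = data - new_m[:, [k]]
        new_c[:, :, k] = (diff * r[k]) @ diff.T / nk[k]
    return new_w, new_m, new_c, Q
```

Called in a loop, Q should be nondecreasing from one iteration to the next, mirroring the log-likelihood printout in the MATLAB segment above. (Note this sketch omits the min_std covariance conditioning of section 13.2.4.)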
Before iterating, a starting point is needed for the GM parameters. This is handled by software/init_gmix.m. This routine inputs the training data samples, the number of GM terms to use, the covariance conditioning parameter min_std, and the names of all the features. The GM component means are initialized to randomly selected input data samples. The covariances are initialized to diagonal matrices with large variances. It is important to use variances on the order of the square of the data volume width. The size of the variances at initialization determines the data “window” through which each GM component “sees” the data. Too small a window at initialization can lock the algorithm into the wrong local minimum of the likelihood function. The initial weights are set to be all equal.
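The initialization just described can be sketched as follows in Python/NumPy (an illustration of the strategy, not the actual software/init_gmix.m code; the function name and argument list are assumptions):

```python
import numpy as np

def init_gmix(data, nmode, rng=None):
    """Initialize GM parameters as described in the text (sketch).

    Means: randomly selected data samples.  Covariances: diagonal,
    with variances on the order of the squared data volume width,
    so each mode's initial "window" covers the data.  Weights: equal.
    data is (d, N), one column per sample.
    """
    rng = np.random.default_rng(rng)
    d, N = data.shape
    idx = rng.choice(N, size=nmode, replace=False)
    means = data[:, idx].copy()
    # per-feature data volume width; squared width gives a deliberately
    # large initial variance so no mode starts with too small a window
    width = data.max(axis=1) - data.min(axis=1)
    covs = np.stack([np.diag(width ** 2)] * nmode, axis=2)
    weights = np.full(nmode, 1.0 / nmode)
    return weights, means, covs
```

Starting from deliberately oversized windows is the safer error: the E-M iterations can always shrink a covariance toward the data, but a mode that starts with a tiny window may never "see" enough samples to move.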
There are two approaches to determining the number of modes. The first is to sprinkle a large number of modes throughout the data volume and remove the weak or redundant ones as the algorithm converges. The second approach is to start with just one mode and add modes as needed. The way you determine whether a new mode is needed (by splitting an existing mode) is by a skew or kurtosis measure (software/gmix_kurt.m). These two methods, called top-down and bottom-up, respectively, will be covered in section 13.2.5.
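The pruning half of the top-down approach can be sketched in a few lines of Python/NumPy (a simplified illustration only; the redundancy test and the kurtosis-based split test of software/gmix_kurt.m are omitted, and the function name and threshold are assumptions):

```python
import numpy as np

def prune_weak_modes(weights, means, covs, min_weight=0.01):
    """Drop mixture modes whose mixing weight has fallen below a
    threshold, then renormalize the surviving weights to sum to 1.
    weights : (K,), means : (d, K), covs : (d, d, K)."""
    keep = weights > min_weight
    w = weights[keep]
    return w / w.sum(), means[:, keep], covs[:, :, keep]
```

In the top-down scheme this check would run between E-M iterations, so modes that collect almost no responsibility are discarded before they distort the fit.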