Auto Labeling

To run the script with auto-labeling, set manual_label=0:

   iclass=1; 
   manual_label=0;
   use_ar=1;
   iseed=17;

Auto labeling was described in Section 14.4.2. To prepare for auto-labeling, we need to run the automatic segmentor:

   % segment the data
   iplot=0; % set to 1 to see the segmentation in action
   Seg=segment_data(X,K,Ns,Ps,fs,1,0,iplot);

Then, to complete the auto-labeling, we need to run the clusterer:

   iplot=0; % set to 1 to see the clustering in action
   ncluster=4;
   [Idx,Mu,wts]=cluster_data(X,Seg,ncluster,fs,iplot);
   subclassnames={};
   for i=1:ncluster, subclassnames{i}=sprintf('Cls%d',i); end;

Note that the signal class names (Cls1 through Cls4) are arbitrary. Note also that the clustering can produce a different result each time it is run. It is prudent to run the clusterer several times, train the MR-HMM on each result, and keep the run that yields the largest MR-HMM log-likelihood.

Since we are using auto-labeling, we cannot take into account our knowledge of the states and signal classes. Therefore, we must use four signal classes and four states with 1:1 correspondence. The input arguments class_feat_map, state_to_subclass, A_mask, Pi_mask, and beta_end are therefore not used:

   [hparm,lptot]=mrhmm_iterate(hp,Z,J,{},ntot,num_iter,use_viterbi,iplot,X);

As you will see, auto-labeling usually yields a parameter set about as good as manual labeling. Occasionally, however, a bad parameter set results; it can be recognized by its low total log-likelihood.
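The advice to repeat the run and keep the result with the largest total log-likelihood can be sketched as a simple loop. This is only a minimal illustration, not part of the toolbox: run_once is a hypothetical function handle standing in for the whole sequence of seeding, clustering, and training, and the stand-in below returns made-up values so that the loop runs on its own. The seed scheme and the use of the last element of lptot are also assumptions.

```matlab
% Sketch: repeat the auto-labeling run and keep the best result.
% run_once is a HYPOTHETICAL handle standing in for the full
% seed -> cluster -> train sequence; it is not a toolbox function.
% The stand-in returns made-up values so the loop runs on its own.
run_once = @(iseed) deal(struct('iseed',iseed), -1000-50*mod(iseed,3));

ntrial = 5;              % number of repeated runs (arbitrary choice)
best_lptot = -Inf;       % largest total log-likelihood seen so far
best_hparm = [];         % parameters from the best run
best_seed  = NaN;        % seed that produced the best run
for itrial = 1:ntrial
    iseed = 16 + itrial;             % seeds 17, 18, ... (assumed scheme)
    [hparm, lptot] = run_once(iseed);
    % Assumption: if lptot holds one value per iteration, the last
    % element is the final total log-likelihood.
    if lptot(end) > best_lptot
        best_lptot = lptot(end);
        best_hparm = hparm;
        best_seed  = iseed;
    end
end
fprintf('best seed %d, log-likelihood %g\n', best_seed, best_lptot);
```

In practice, run_once would be replaced by a function that sets the random seed, calls cluster_data, initializes the parameters, and calls mrhmm_iterate as shown above, returning hparm and lptot.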

**Figure 14.7:** MR-HMM in operation (image file: mrhmm_test1b.eps)

Because the states are determined automatically by clustering, they occur in arbitrary order. The resulting state probabilities should nevertheless show crisp values and should occur at the correct times. Figure 14.7 is an example of good results using auto-labeling. There is often confusion between the 'gap' and 'noise' states, for example at the start of Figure 14.7, which occurs for obvious reasons. A good parameter set will show a distinct difference between them and will generally have a higher total log-likelihood.