Training on Huge Data Sets

If the number of data samples ($N$) is very large, the training scripts can choke like a chihuahua trying to eat a watermelon in one gulp. To handle this problem, there are scripts that chop the watermelon into bite-size chunks, with the same effect as swallowing it whole. The relevant scripts are software/gmix_accum.m and software/gmix_norm.m. The following code demonstrates how to use these two routines in place of software/gmix_step.m.



%--------------------------------------------------
% Synopsis: bite-size replacement for
%    [gparm,Q] = gmix_step(gparm,xn);
%  The following code is equivalent to one call to
%  gmix_step.  The numerical behavior is identical.
%--------------------------------------------------
%

gparm = init_gmix( .....);

for iteration=1:10,

   % initialize accumulators to zero
   % at start of each iteration
   newmean=cell(1,nmode);
   newvar=cell(1,nmode);
   atot=zeros(nmode,1);
   for i=1:nmode,
      newmean{i}=zeros(dim,1);
      newvar{i}=zeros(dim,dim);
   end;
   qtot=0;

   % Loop over 1000 bite-size pieces
   for i=1 : 1000,
      x = ...     % get new data matrix
      [newmean,newvar,atot,qtot] = ...
         gmix_accum(gparm,x,newmean,newvar,atot,qtot);
   end;

   % finalize the iteration
   gparm = gmix_norm(gparm,newmean,newvar,atot);
end;
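
The chunked loop above can give the same result as a single pass because, assuming the accumulators hold sums over samples (as is standard for EM updates of a Gaussian mixture), per-chunk sums add up to the whole-data totals. The following is a minimal stand-alone sketch of that idea, not the toolbox code. It does not call gmix_accum or gmix_norm. The weights w are a made-up stand-in for one mode's membership weights, and the N-by-dim data layout and all variable names are assumptions chosen for this illustration only.

```matlab
% Stand-alone illustration (hypothetical names; not part of the toolbox).
N = 1000;  dim = 2;  nchunk = 10;  len = N/nchunk;
n = (1:N)';
x = [sin(n), cos(3*n)];            % toy data, N-by-dim (layout assumed here)
w = 0.5 + 0.5*cos(0.1*n).^2;       % made-up positive per-sample weights

% Reference: weighted mean and covariance from one pass over all data
full_mean = (w' * x) / sum(w);                         % 1-by-dim
xc = x - repmat(full_mean, N, 1);                      % centered data
full_cov = (xc' * (xc .* repmat(w,1,dim))) / sum(w);   % dim-by-dim

% Chunked: accumulate weighted sums, normalize once at the end
s1 = zeros(1,dim);     % running sum of w*x
s2 = zeros(dim,dim);   % running sum of w*x'*x
a  = 0;                % running sum of w
for k = 1:nchunk
   idx = (k-1)*len + (1:len);      % sample indices of chunk k
   xk = x(idx,:);
   wk = w(idx);
   s1 = s1 + wk' * xk;
   s2 = s2 + xk' * (xk .* repmat(wk,1,dim));
   a  = a + sum(wk);
end
chunk_mean = s1 / a;
chunk_cov  = s2 / a - chunk_mean' * chunk_mean;

% Largest absolute differences between the two routes
err_mean = max(abs(full_mean - chunk_mean));
err_cov  = max(max(abs(full_cov - chunk_cov)));
fprintf('max mean error %g, max covariance error %g\n', err_mean, err_cov);
```

In this sketch the two routes agree to within floating-point rounding; the chunk size affects only how much data must be held in memory at one time.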



Baggenstoss 2017-05-19