The maximum number of modes to start with is determined by the dimension of the data and the number of samples: if all the modes “share” the data equally, each mode should still receive a bare minimum number of samples. It is generally not a problem to over-specify the number of modes, since the covariance estimates are stabilized by the conditioning discussed in section 13.2.4, and the approximation remains good as long as the amount of training data can support the number of modes chosen. The mixing weight of a mode multiplied by the number of input data samples gives the number of samples effectively used to estimate that mode's parameters. This product is a simple measure of the “value” of each mode. As long as it is high enough, the mode is estimated accurately; if it falls too low, the mode is eliminated or combined with another. With a combination of covariance constraints, pruning, merging, and mode splitting, a good PDF approximation can be obtained reliably.
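
The weight-times-sample-count rule above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation this text describes: the function names and the `min_count` threshold are hypothetical, and the sketch only eliminates weak modes rather than merging them with another mode.

```python
def effective_counts(weights, n_samples):
    """Effective number of samples per mode: mixing weight times sample count."""
    return [w * n_samples for w in weights]


def prune_weak_modes(weights, n_samples, min_count):
    """Drop modes whose effective sample count is below min_count.

    min_count is a hypothetical threshold chosen for illustration only.
    Returns a keep-mask and the surviving weights renormalized to sum to 1.
    """
    counts = effective_counts(weights, n_samples)
    keep = [c >= min_count for c in counts]
    kept = [w for w, k in zip(weights, keep) if k]
    if not kept:
        raise ValueError("every mode fell below min_count")
    total = sum(kept)
    return keep, [w / total for w in kept]


# Example: four modes, 1000 samples, assumed threshold of 20 samples per mode.
weights = [0.5, 0.3, 0.19, 0.01]
keep, new_weights = prune_weak_modes(weights, n_samples=1000, min_count=20)
print(keep)         # which modes survive
print(new_weights)  # renormalized mixing weights of the survivors
```

In this example the last mode is supported by only about 10 effective samples, so it is the one removed. In practice the threshold would presumably be tied to the data dimension, since a mode needs enough samples to estimate its mean and covariance.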