Feature PDF

In PDF projection, we assume that the PDF of the features is known. Unless special hypotheses need to be distinguished, it is denoted simply by $g({\bf z})$. In practice, $g({\bf z})$ is either given or estimated from available training data. If ${\bf z}$ is a fixed-size vector, $g({\bf z})$ can be modeled as a Gaussian mixture (Section 13.2). If ${\bf z}$ consists of a sequence of feature vectors, the PDF $g({\bf z})$ must be the joint distribution of the entire sequence, typically computed under a Markov assumption using a hidden Markov model and the forward procedure (Section 13.3).
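The two cases above can be sketched numerically. The following is a minimal illustration, not the book's implementation: a diagonal-covariance Gaussian mixture evaluated at a fixed-size vector, and the forward procedure computing the log-likelihood of a whole sequence under an HMM. All function and variable names are illustrative, and both routines work in the log domain for numerical stability.

```python
import numpy as np

def gmm_logpdf(z, weights, means, variances):
    """Log of g(z) for a diagonal-covariance Gaussian mixture.

    z: (D,) feature vector; weights: (K,); means, variances: (K, D).
    """
    z = np.asarray(z, dtype=float)
    diff = z - means                                    # (K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=1)
    log_terms = np.log(weights) + log_comp              # (K,) per-component terms
    m = np.max(log_terms)
    return m + np.log(np.sum(np.exp(log_terms - m)))    # log-sum-exp

def hmm_forward_loglik(log_emis, log_A, log_pi):
    """Log-likelihood of a sequence via the forward procedure.

    log_emis: (T, N) log emission densities for T frames, N states;
    log_A: (N, N) log transition matrix; log_pi: (N,) log initial probs.
    """
    T, N = log_emis.shape
    alpha = log_pi + log_emis[0]                        # forward variable at t = 0
    for t in range(1, T):
        # log-sum-exp over previous states for each current state
        m = np.max(alpha)
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(log_A)) + log_emis[t]
    m = np.max(alpha)
    return m + np.log(np.sum(np.exp(alpha - m)))        # total log-likelihood
```

With a single mixture component, `gmm_logpdf` reduces to an ordinary Gaussian log-density; with a single HMM state, the forward recursion reduces to summing the per-frame log emission densities.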

In practice, the input data can be segmented. In this case, ${\bf x}$ can represent a single segment or sample, or else the entire data record, depending on the application. What matters is consistency: if ${\bf x}$ represents the entire data record, then ${\bf z}$ must represent the collection of all feature vectors extracted from ${\bf x}$; if ${\bf x}$ represents a single segment, then ${\bf z}$ must be the feature vector extracted from that segment or sample.