Feature PDF

In PDF projection, we assume that the PDF of the features is known. Unless there is a need to identify specific hypotheses, it is denoted simply by $g({\bf z})$. In practice, $g({\bf z})$ is estimated from available training data, or it may be given. If ${\bf z}$ is a fixed-size vector, $g({\bf z})$ can be modeled as a Gaussian mixture (Section 13.2). If ${\bf z}$ consists of a sequence of feature vectors, the PDF $g({\bf z})$ must be the joint distribution of the entire sequence, typically computed under a Markov assumption using a hidden Markov model and the forward procedure (Section 13.3).
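As a rough illustration of these two cases, the sketch below models $g({\bf z})$ with a Gaussian mixture when ${\bf z}$ is a fixed-size vector, and with a log-domain forward procedure over a small hidden Markov model when ${\bf z}$ is a sequence of feature vectors. The library choices (numpy, scipy, scikit-learn), the toy data, and all model sizes are illustrative assumptions, not part of the text.

```python
# Sketch: two ways to model the feature PDF g(z).
import numpy as np
from scipy.special import logsumexp
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# --- Case 1: z is a fixed-size vector -> Gaussian mixture ----------------
Z_train = rng.normal(size=(1000, 4))          # training features, one row per z
gmm = GaussianMixture(n_components=8, covariance_type='full').fit(Z_train)

z = rng.normal(size=(1, 4))                   # a new feature vector
log_g_z = gmm.score_samples(z)[0]             # log g(z) under the mixture

# --- Case 2: z is a sequence of feature vectors -> HMM forward procedure -
def forward_log_likelihood(log_b, log_pi, log_A):
    """Log joint likelihood of an observation sequence under an HMM.

    log_b  : (T, N) log state-conditional densities log b_j(z_t)
    log_pi : (N,)   log initial state probabilities
    log_A  : (N, N) log state transition probabilities
    """
    log_alpha = log_pi + log_b[0]                        # initialization
    for t in range(1, log_b.shape[0]):                   # recursion over time
        log_alpha = logsumexp(log_alpha[:, None] + log_A, axis=0) + log_b[t]
    return logsumexp(log_alpha)                          # termination

# Toy 2-state HMM whose state-conditional densities are small Gaussian
# mixtures fitted to synthetic data (purely illustrative).
T, N = 20, 2
Z_seq = rng.normal(size=(T, 4))               # sequence of feature vectors
state_models = [GaussianMixture(n_components=2).fit(rng.normal(size=(200, 4)) + k)
                for k in range(N)]
log_b = np.column_stack([m.score_samples(Z_seq) for m in state_models])
log_pi = np.log(np.full(N, 1.0 / N))
log_A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))

log_g_z_seq = forward_log_likelihood(log_b, log_pi, log_A)  # log g(z) for the sequence
```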

In practice, the input data may be segmented. In this case, ${\bf x}$ can represent either a single segment or the entire data record, depending on the application. What matters is consistency: if ${\bf x}$ represents the entire data record, then ${\bf z}$ must represent the collection of all feature vectors extracted from ${\bf x}$; if ${\bf x}$ represents a single segment, then ${\bf z}$ must be the feature vector extracted from that segment. The sketch below illustrates the two consistent pairings.
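The following is a minimal sketch of this consistency rule, assuming a placeholder segment length, a toy per-segment feature extractor, and a Gaussian mixture as the per-segment density model; none of these choices are prescribed by the text.

```python
# Sketch of the consistency rule: x and z must refer to the same scope.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
x_record = rng.normal(size=20000)                      # entire data record
seg_len = 200
segments = x_record.reshape(-1, seg_len)               # segmented input data

def extract_features(segment):
    """Toy per-segment feature extractor (placeholder)."""
    return np.array([segment.mean(), segment.var(), np.abs(segment).max()])

Z = np.array([extract_features(s) for s in segments])  # one feature vector per segment

# Consistent pairing 1: x = one segment, z = that segment's feature vector.
# g(z) is then a per-segment model, e.g. a Gaussian mixture.
seg_model = GaussianMixture(n_components=4).fit(Z)
log_g_one = seg_model.score_samples(Z[:1])[0]          # log g(z) for a single segment

# Consistent pairing 2: x = entire record, z = the full sequence of feature
# vectors Z. g(z) must then be the joint density of the whole sequence,
# e.g. evaluated with the HMM forward procedure sketched earlier; the two
# scopes must not be mixed within one analysis.
```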
