Features

Let ${\bf x}=[x_1, x_2, \ldots, x_N]$ be a set of $N$ independent random variables (RVs) distributed under hypothesis $H_0$ according to the common probability density function (PDF) $p_0(x)$. The joint PDF of ${\bf x}$ is

$\displaystyle p({\bf x}\vert H_0) = \prod_{i=1}^N \; p_0(x_i).$
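Because the samples are independent, the joint PDF under $H_0$ factorizes into a product of marginals. The following is a minimal sketch of that factorization, assuming (purely for illustration; the text only requires a common PDF $p_0$) a unit-rate exponential marginal:

```python
import math

def p0(x, lam=1.0):
    # Hypothetical H0 marginal: unit-rate exponential PDF.
    # This specific choice is an assumption for illustration only.
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def joint_pdf_H0(x):
    # Joint PDF of independent samples: the product of the marginals.
    p = 1.0
    for xi in x:
        p *= p0(xi)
    return p

x = [0.5, 1.2, 0.3]
# For unit-rate exponentials the product collapses to exp(-sum(x)).
assert abs(joint_pdf_H0(x) - math.exp(-sum(x))) < 1e-12
```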

Now let the $x_i$ be sorted in decreasing order into the set ${\bf y}=[y_1, y_2, \ldots, y_N]$, where $y_i \geq y_{i+1}$. We then choose a set of $M-1$ indices $t_1, t_2, \ldots, t_{M-1}$, with $1\leq t_1<t_2< \cdots < t_{M-1}\leq N$, to form a selected collection of order statistics $y_{t_1}, y_{t_2},
\ldots, y_{t_{M-1}}$. To this set, we add the residual “energy”,

$\displaystyle r=\sum_{j\in {\cal M}} h(y_j),$ (8.1)

where the set ${\cal M}$ contains the integers from $1$ to $N$ excluding the values $t_1, t_2, \ldots, t_{M-1}$, and $h(x)$ is a function chosen to ensure that $r$ has units of energy; it controls the form of the energy statistic. We then form the complete feature vector of length $M$ ($M\geq 2$):

$\displaystyle {\bf z}= [y_{t_1} \; y_{t_2} \ldots y_{t_{M-1}} \; r]^\prime.$
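The construction above can be sketched directly in Python. The function name and its arguments are hypothetical conveniences, not part of the text; `t` holds the 1-based indices $t_1 < \cdots < t_{M-1}$ and `h` is the energy function:

```python
def features(x, t, h=lambda v: v):
    """Build z = [y_{t_1}, ..., y_{t_{M-1}}, r] from raw data x.

    A sketch of the construction in the text, not a reference
    implementation. t: 1-based order-statistic indices; h: energy
    function (identity for intensity data, square for Gaussian data).
    """
    y = sorted(x, reverse=True)            # order statistics, decreasing
    keep = set(t)
    selected = [y[i - 1] for i in t]       # the chosen order statistics
    # residual "energy" over the indices in the complement set M
    r = sum(h(y[j]) for j in range(len(y)) if (j + 1) not in keep)
    return selected + [r]                  # feature vector z, length M

z = features([3.0, 1.0, 4.0, 2.0], t=[1, 2])
# y = [4, 3, 2, 1]; selected = [4, 3]; r = 2 + 1 = 3
assert z == [4.0, 3.0, 3.0]
```

Note that with the identity $h$, summing the entries of ${\bf z}$ recovers the total energy $\sum_i x_i$, which is the point of appending $r$.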

By appending the residual energy to the feature vector, we ensure that ${\bf z}$ contains the energy statistic. We consider two important cases:
  1. If ${\bf x}$ is positive intensity or spectral data with approximately chi-squared statistics (resulting from sums of squares of Gaussian RVs), then $h(x)=x$ suffices. The resulting energy statistic and reference hypotheses are the “Exponential” entry in Table 3.1. For this case, we consider ${\bf x}$ to be a set of magnitude-squared DFT bin outputs, which are exponentially distributed.
  2. If ${\bf x}$ comprises raw measurements with approximately Gaussian statistics, use $h(x)=x^2$ (equivalently, $h(x)=\vert x\vert^2$). The resulting energy statistic and reference hypotheses are the “Gaussian” entry in Table 3.1. For this case, we let ${\bf x}$ be a set of absolute values of zero-mean Gaussian RVs, so $h(x)=x^2$.
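The two cases can be checked numerically. The sketch below (standalone, with hypothetical names and made-up sample values) verifies that in case 1 the selected order statistics plus $r$ reproduce the total energy $\sum_i x_i$, while in case 2 the energy statistic is recovered as the sum of squares of the selected entries plus $r$:

```python
def residual_energy(x, t, h):
    # r = sum of h over the order statistics NOT selected by t (1-based)
    y = sorted(x, reverse=True)
    keep = set(t)
    return sum(h(y[j]) for j in range(len(y)) if (j + 1) not in keep)

x = [0.9, 2.5, 0.4, 1.7]   # e.g. magnitude-squared DFT bins (made-up values)
t = [1, 2]                  # keep the two largest order statistics
y = sorted(x, reverse=True)

# Case 1, h(x) = x: selected values plus r give the total energy.
r1 = residual_energy(x, t, h=lambda v: v)
assert abs((y[0] + y[1] + r1) - sum(x)) < 1e-12

# Case 2, h(x) = x**2 (raw Gaussian magnitudes): energy is the sum of
# squares of the selected entries plus r.
r2 = residual_energy(x, t, h=lambda v: v * v)
assert abs((y[0]**2 + y[1]**2 + r2) - sum(v * v for v in x)) < 1e-12
```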