Features

Let ${\bf x}=[x_1, x_2, \ldots, x_N]$ be a set of $N$ independent random variables (RVs) distributed under hypothesis $H_0$ according to the common probability density function (PDF) $p_0(x)$. The joint PDF of ${\bf x}$ is

$\displaystyle p({\bf x}\vert H_0) = \prod_{i=1}^N \; p_0(x_i).$
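Because the samples are independent, the joint PDF under $H_0$ factorizes into a product of marginals. The following is a minimal sketch of that factorization, assuming (purely for illustration; the text only requires a common PDF $p_0$) a unit-rate exponential marginal:

```python
import math

def p0(x, lam=1.0):
    # Hypothetical H0 marginal: unit-rate exponential PDF.
    # This specific choice is an assumption for illustration only.
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def joint_pdf_H0(x):
    # Joint PDF of independent samples: the product of the marginals.
    p = 1.0
    for xi in x:
        p *= p0(xi)
    return p

x = [0.5, 1.2, 0.3]
# For unit-rate exponentials the product collapses to exp(-sum(x)).
assert abs(joint_pdf_H0(x) - math.exp(-sum(x))) < 1e-12
```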

Now let the $x_i$ be sorted in decreasing order into the set ${\bf y}=[y_1, y_2, \ldots, y_N]$, where $y_i \geq y_{i+1}$. We then choose a set of $M-1$ indices $t_1, t_2, \ldots, t_{M-1}$, with $1\leq t_1<t_2< \cdots < t_{M-1}\leq N$, to form a selected collection of order statistics $y_{t_1}, y_{t_2},
\ldots, y_{t_{M-1}}$. To this set, we add the residual “energy”,

$\displaystyle r=\sum_{j\in {\cal M}} h(y_j),$ (8.1)

where the set ${\cal M}$ contains the integers from $1$ to $N$ excluding the values $t_1, t_2, \ldots, t_{M-1}$, and $h(x)$ is a function chosen to ensure that $r$ has units of energy; it controls the form of the energy statistic. We then form the complete feature vector of length $M$ ($M\geq 2$):

$\displaystyle {\bf z}= [y_{t_1} \; y_{t_2} \ldots y_{t_{M-1}} \; r]^\prime.$
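The construction above can be sketched directly in Python. The function name and its arguments are hypothetical conveniences, not part of the text; `t` holds the 1-based indices $t_1 < \cdots < t_{M-1}$ and `h` is the energy function:

```python
def features(x, t, h=lambda v: v):
    """Build z = [y_{t_1}, ..., y_{t_{M-1}}, r] from raw data x.

    A sketch of the construction in the text, not a reference
    implementation. t: 1-based order-statistic indices; h: energy
    function (identity for intensity data, square for Gaussian data).
    """
    y = sorted(x, reverse=True)            # order statistics, decreasing
    keep = set(t)
    selected = [y[i - 1] for i in t]       # the chosen order statistics
    # residual "energy" over the indices in the complement set M
    r = sum(h(y[j]) for j in range(len(y)) if (j + 1) not in keep)
    return selected + [r]                  # feature vector z, length M

z = features([3.0, 1.0, 4.0, 2.0], t=[1, 2])
# y = [4, 3, 2, 1]; selected = [4, 3]; r = 2 + 1 = 3
assert z == [4.0, 3.0, 3.0]
```

Note that with the identity $h$, summing the entries of ${\bf z}$ recovers the total energy $\sum_i x_i$, which is the point of appending $r$.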

By appending the residual energy to the feature vector, we ensure that ${\bf z}$ contains the energy statistic. We consider two important cases:
  1. If ${\bf x}$ is positive intensity or spectral data with approximately chi-squared statistics (resulting from sums of squares of Gaussian RVs), then $h(x)=x$ suffices. The resulting energy statistic and reference hypotheses are the “Exponential” entry in Table 3.1. For this case, we consider ${\bf x}$ to be a set of magnitude-squared DFT bin outputs, which are exponentially distributed.
  2. If ${\bf x}$ comprises raw measurements with approximately Gaussian statistics, use $h(x)=x^2$ (equivalently, $h(x)=\vert x\vert^2$). The resulting energy statistic and reference hypotheses are the “Gaussian” entry in Table 3.1. For this case, we let ${\bf x}$ be a set of absolute values of zero-mean Gaussian RVs, so $h(x)=x^2$.
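The two cases can be checked numerically. The sketch below (standalone, with hypothetical names and made-up sample values) verifies that in case 1 the selected order statistics plus $r$ reproduce the total energy $\sum_i x_i$, while in case 2 the energy statistic is recovered as the sum of squares of the selected entries plus $r$:

```python
def residual_energy(x, t, h):
    # r = sum of h over the order statistics NOT selected by t (1-based)
    y = sorted(x, reverse=True)
    keep = set(t)
    return sum(h(y[j]) for j in range(len(y)) if (j + 1) not in keep)

x = [0.9, 2.5, 0.4, 1.7]   # e.g. magnitude-squared DFT bins (made-up values)
t = [1, 2]                  # keep the two largest order statistics
y = sorted(x, reverse=True)

# Case 1, h(x) = x: selected values plus r give the total energy.
r1 = residual_energy(x, t, h=lambda v: v)
assert abs((y[0] + y[1] + r1) - sum(x)) < 1e-12

# Case 2, h(x) = x**2 (raw Gaussian magnitudes): energy is the sum of
# squares of the selected entries plus r.
r2 = residual_energy(x, t, h=lambda v: v * v)
assert abs((y[0]**2 + y[1]**2 + r2) - sum(v * v for v in x)) < 1e-12
```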