Features

Let $ {\bf x}=[x_1, x_2, \ldots, x_N]$ be a set of $ N$ independent random variables (RVs) distributed under hypothesis $ H_0$ according to the common probability density function (PDF) $ p_0(x)$. The joint PDF of $ {\bf x}$ is

$\displaystyle p({\bf x}\vert H_0) = \prod_{i=1}^N \; p_0(x_i).
$

Now let the $ x_i$ be sorted in decreasing order into the set $ {\bf y}=[y_1, y_2, \ldots, y_N]$, where $ y_i \geq y_{i+1}$. We now choose a set of $ M-1$ indices $ t_1, t_2, \ldots, t_{M-1}$, with $ 1\leq t_1<t_2< \cdots < t_{M-1}\leq N$, to form a selected collection of order statistics $ y_{t_1}, y_{t_2},
\ldots, y_{t_{M-1}}$. To this set, we add the residual ``energy'',

$\displaystyle r=\sum_{j\in {\cal M}} h(y_j),$ (7.1)

where $ {\cal M}$ is the set of integers from $ 1$ to $ N$ excluding the values $ t_1, t_2, \ldots, t_{M-1}$, and $ h(x)$ is a function chosen to ensure that $ r$ has units of energy; the choice of $ h$ determines the energy statistic. We then form the complete feature vector of length $ M$ ($ M\geq 2$):

$\displaystyle {\bf z}= [y_{t_1} \;y_{t_2}
\ldots y_{t_{M-1}} \; r]^\prime.
$

By appending the residual energy to the feature vector, we ensure that $ {\bf z}$ contains the energy statistic. We consider two important cases:
  1. If $ {\bf x}$ is positive intensity or spectral data with approximate chi-square statistics (arising from sums of squares of Gaussian RVs), then $ h(x)=x$ suffices. The resulting energy statistic and reference hypothesis are the ``Exponential'' entry in Table 3.1. For this case, we take $ {\bf x}$ to be a set of magnitude-squared DFT bin outputs, which are exponentially distributed.
  2. If $ {\bf x}$ consists of raw measurements with approximate Gaussian statistics, use $ h(x)=x^2$ or $ h(x)=\vert x\vert^2$. The resulting energy statistic and reference hypothesis are the ``Gaussian'' entry in Table 3.1. For this case, we let $ {\bf x}$ be a set of absolute values of zero-mean Gaussian RVs; then $ h(x)=x^2$.
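The construction above can be sketched numerically. The following is a minimal illustration (not from the source text); the function name `order_statistic_features` and the use of NumPy are assumptions for the sketch:

```python
import numpy as np

def order_statistic_features(x, t, h=lambda v: v):
    """Form z = [y_{t_1}, ..., y_{t_{M-1}}, r]' from samples x.

    x : 1-D array of N samples.
    t : increasing 1-based indices, 1 <= t_1 < ... < t_{M-1} <= N.
    h : energy function; identity for exponential data, square for Gaussian.
    """
    y = np.sort(x)[::-1]          # order statistics, y_1 >= y_2 >= ... >= y_N
    t = np.asarray(t)
    selected = y[t - 1]           # the chosen order statistics y_{t_k}
    mask = np.ones(len(y), dtype=bool)
    mask[t - 1] = False           # the set M: indices 1..N not among the t_k
    r = np.sum(h(y[mask]))        # residual "energy", Eq. (7.1)
    return np.concatenate([selected, [r]])

# Case 1: exponentially distributed data (magnitude-squared DFT bins), h(x) = x
x = np.random.default_rng(0).exponential(size=16)
z = order_statistic_features(x, t=[1, 3], h=lambda v: v)
# z has length M = 3: two order statistics plus the residual energy r
```

Note that for the exponential case with $ h(x)=x$, the entries of $ {\bf z}$ other than the duplicated order statistics sum to the total energy $\sum_i x_i$, which is how $ {\bf z}$ retains the energy statistic.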

Baggenstoss 2017-05-19