Hanning-3 Segmentation

Block segmentation was necessary for strict application of PDF projection to segmented data. In 2012, a means was discovered to circumvent this restriction and still obtain ``exact'' projected likelihood comparison between differently-segmented branches, but it is possible only for 2/3 overlap, 3/4 overlap, etc. It is not possible for the commonly-used 50% overlap. Since overlaps of 3/4 and higher are rarely used, we use 2/3 overlap and call the method Hanning-3. Consider overlapped segments with segment size $ K$ and window time shift $ S$, overlapping by $ O = K-S$ samples. If we circularly index the data such that $ x_{N+i}=x_i$, we obtain exactly $ T = N/S$ segments. Let $ {\bf x}_i=[x_{(1+Si)} w_1, x_{(2+Si)} w_2, \ldots, x_{(K+Si)} w_K]$ be the $ i$-th segment, where $ [w_1,w_2,\ldots, w_K]$ are the Hanning weights. The Hanning weights must be periodic with period $ K$, not period $ K-1$ as is typically used to avoid zero weights.

$\displaystyle w_t=\frac{1-\cos(2\pi(t-1)/K)}{c}, \;\; 1\leq t \leq K,$ (12.3)

where $ c=\frac{3}{\sqrt{2}}.$ The special scaling factor $ c$ is non-standard and results in the desired property that follows. Let the complete Hanning-3 segmentation be denoted by $ {\bf X}^{K,h3}=[{\bf x}_1 , {\bf x}_2, \ldots, {\bf x}_T].$ Note that $ {\bf X}^{K,h3}$ has a total dimension of $ K\times T = (3S)\times (N/S) = 3N$, three times as large as the dimension of $ {\bf x}$.
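With $ c=3/\sqrt{2}$, the squared weights of the three overlapping windows covering any given sample sum to exactly one; this is the mechanism behind the orthonormal expansion used below. A quick numerical check, sketched here in Python/NumPy since the book's accompanying software is MATLAB:

```python
import numpy as np

K = 192          # segment size, a multiple of 3
S = K // 3       # shift for 2/3 overlap
c = 3.0 / np.sqrt(2.0)
t = np.arange(1, K + 1)
w = (1.0 - np.cos(2.0 * np.pi * (t - 1) / K)) / c   # eq. (12.3)

# every sample is covered by exactly three overlapping windows;
# their squared weights sum to one at each position
cover = w**2 + np.roll(w, S)**2 + np.roll(w, 2 * S)**2
print(np.allclose(cover, 1.0))  # True
```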

To use various Hanning-3 segmentations together in a class-specific classifier, we need to apply the concept of virtual input data. Consider two Hanning-3 segmentations with different segment sizes $ K_l$ and $ K_m$, denoted by $ {\bf X}^{K_l,h3}$ and $ {\bf X}^{K_m,h3}$. It has been shown [49] that with weights $ w_t$ as defined in (12.3), $ {\bf X}^{K_l,h3}$ and $ {\bf X}^{K_m,h3}$ are related by an orthogonal linear transformation. Specifically, there exists an orthonormal matrix $ {\bf U}$ such that $ {\bf X}^{K_l,h3} = {\bf U} \; {\bf X}^{K_m,h3}.$ In Figure 12.2, the output of each segmentation operation is considered as the ``virtual input data'' of each branch. Each branch has different virtual input data, but they are considered ``equivalent''. Therefore, the projected likelihood function for $ {\bf X}^{K_l,h3}$ may be compared to the projected likelihood function for $ {\bf X}^{K_m,h3}$.

Each block ``feature calculation" in the figure normally consists of more than one stage, organized as a chain (See Section 2.2.4). The starting reference hypothesis ($ H_{0x}$ in equation 2.9) is typically canonical reference hypothesis, exponential or Gaussian, in which all the elements of $ {\bf x}$ are independent. Thus, all the elements of $ {\bf X}^{K,h3}$ are assumed independent under $ H_0$. Naturally, this assumption is false. How can this be good to base PDF projection and the comparison of the various branch projected PDFs on such an obviously false assumption?

Good question! Let's delve into this further. Let's start off by saying that the reference hypothesis is there only as a mathematical tool for PDF projection. The PDF projection theorem guarantees that regardless of the reference hypothesis, independence assumption or not, the resulting projected PDF, $ G({\bf X}^{K,h3};H_0,T,g)$, is a PDF, and is a member of the class of PDFs that generate the corresponding feature PDF $ g({\bf Z}^K)$. Furthermore, provided there is a corresponding energy statistic, it is the maximum entropy member of the class. The projected PDFs of all the branches are then PDFs on the same data space, since using an orthonormal rotation, all can be converted to PDFs on a common data space.

With that said, what effect does using this ``false'' independence assumption really have? Let's assume we are striving for sufficiency optimality (Section 2.2.2), in which the projected PDF can be equated to the true PDF. For the sufficiency condition to be met or approximated, the feature must be a sufficient statistic for distinguishing between the true PDF and $ H_0$. At the virtual input data, under the ``true'' PDF of $ {\bf x}$, which in this case means when real data is used, the individual segments of $ {\bf X}^{K,h3}$ are Hanning-weighted, and adjacent segments are highly correlated since $ 2/3$ of the data samples are shared. But under $ H_0$, the samples in a segment of $ {\bf X}^{K,h3}$ are assumed random with identical distributions, and adjacent segments are completely independent. There is generally no hope that the features can distinguish these two conditions. The most common first-stage signal processing is the DFT followed by magnitude-squared, then some kind of smoothing, in which virtually all information relating to the Hanning weighting is lost.

The Hanning-3 ``expansion" that takes a size-$ N$ input sample $ {\bf x}$ and increases it to dimension $ 3N$, is analogous to sampling rate increase through interpolation. Just as in interpolation, the original data can be recoverd using decimation as can $ {\bf x}$ be recovered from $ {\bf X}^{K,h3}$ using overlap-add. And, analogous to feature extraction from 3:1 interpolated data, the suitability of the features must be evaluated with respect to the low-pass information, not with repect to the missing high-pass information. Therefore, although using a reference hypothesis that makes the ``false" assumption of independent data is very suspicious in the realm of ``virtual input data", it can be quite reasonable in the realm of real data.

Figure 12.2: The concept of virtual input data illustrated for three Hanning-3 segmentation sizes.

The mathematical postulations of Hanning-3 can be tested using the function software/hanning3_wts.m, with syntax [w,W,A]=hanning3_wts(K,N);. The outputs are w, the weight vector $ {\bf w}$; W, the $ N\times K$ matrix of window functions; and A, the $ 3N\times N$ linear expansion matrix that creates $ {\bf X}^{K,h3}$ from $ {\bf x}$ as a single column. In Figure 12.3, we plot W as an image for $ K=192$, $ N=1536$, for which there are $ T=24$ segments, and for $ K=768$, $ N=1536$, for which there are $ T=6$ segments. In the figure, you can see the circular indexing.
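For readers without MATLAB, the A output can be approximated in Python/NumPy. This is a sketch, not the actual hanning3_wts.m code: the row layout (segment $ i$ occupying rows $ iK+1$ through $ (i+1)K$) is an assumption based on the definitions above.

```python
import numpy as np

def hanning3_A(K, N):
    """Assumed construction of the A output: row i*K + k carries
    weight w_{k+1} in column (k + S*i) mod N (circular indexing),
    so that A @ x stacks the T = N/S weighted segments in one column."""
    S = K // 3
    T = N // S
    c = 3.0 / np.sqrt(2.0)
    w = (1.0 - np.cos(2.0 * np.pi * np.arange(K) / K)) / c  # eq. (12.3)
    A = np.zeros((K * T, N))
    for i in range(T):
        for k in range(K):
            A[i * K + k, (k + S * i) % N] = w[k]
    return A

# the first case of Figure 12.3: K = 192, N = 1536, T = 24 segments
A = hanning3_A(192, 1536)
print(A.shape)  # (4608, 1536), i.e. 3N x N
```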

Let $ {\bf A}_K$ be the matrix A produced for the given value of $ K$. It is easy to verify in either case that

  1. the product $ {\bf A}_K{\bf x}$ produces the concatenated segmentation $ {\bf X}^{K,h3}$,
  2. $ {\bf A}_K$ is orthonormal, so that $ {\bf A}_K^\prime {\bf A}_K={\bf I},$ and
  3. to transform between the segmentations $ K$ and $ \tilde{K}$, we can use

    $\displaystyle {\bf A}_{\tilde{K}}{\bf x}=
{\bf A}_{\tilde{K}} \left({\bf A}_{K}^\prime {\bf A}_{K}\right) {\bf x}=
\left({\bf A}_{\tilde{K}} {\bf A}_{K}^\prime\right) {\bf A}_{K}{\bf x},$

    where $ {\bf A}_{\tilde{K}} {\bf A}_{K}^\prime$ is an orthonormal transformation since it is the product of two orthonormal transformations.
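Items 2 and 3 can be verified numerically. The following Python/NumPy sketch uses an assumed construction of $ {\bf A}_K$ based on the definitions above (the book's actual function is software/hanning3_wts.m), with illustrative sizes $ K=12$, $ \tilde{K}=24$, $ N=48$:

```python
import numpy as np

def hanning3_A(K, N):
    # assumed layout: row i*K + k carries weight w_{k+1}
    # in column (k + S*i) mod N (circular indexing)
    S = K // 3
    c = 3.0 / np.sqrt(2.0)
    w = (1.0 - np.cos(2.0 * np.pi * np.arange(K) / K)) / c  # eq. (12.3)
    A = np.zeros((K * (N // S), N))
    for i in range(N // S):
        for k in range(K):
            A[i * K + k, (k + S * i) % N] = w[k]
    return A

N = 48
A1 = hanning3_A(12, N)   # segmentation K = 12
A2 = hanning3_A(24, N)   # segmentation K-tilde = 24
x = np.random.default_rng(0).standard_normal(N)

# item 2: the columns of A_K are orthonormal
print(np.allclose(A1.T @ A1, np.eye(N)))   # True
# item 3: U = A_Ktilde A_K' maps one segmentation to the other
U = A2 @ A1.T
print(np.allclose(A2 @ x, U @ (A1 @ x)))   # True
```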
Figure: Window functions for Hanning-3 segmentation for $ K=192$, $ N=1536$ (left) and $ K=768$, $ N=1536$ (right).
\includegraphics[height=2.5in,width=2.0in]{h3192.eps} \includegraphics[height=2.5in,width=2.0in]{h3768.eps}

Baggenstoss 2017-05-19