When to use Circularly-Stationary Processes

In the preceding sections, we presented the theory for both exact and approximate forms of the data PDF for stationary ARMA, MA, and AR processes, for both circular and non-circular stationarity. The exact forms for non-circular stationarity involve equation (10.6), where the type of process is defined by the ACF. We will present various methods for efficiently evaluating (10.6), as well as ways of efficiently evaluating the derivatives and the CR bound.

Exact forms for circularly-stationary processes are based on equation (10.7). Because most operations can be done with the FFT, it is much more efficient to work with circularly-stationary processes. The problem is that real-world data is generally not circularly-stationary: the samples at the start of a chunk of data are generally independent of the samples at the end of the chunk. Consider a segment of data with a rising trend. When seen as a circular process, it has a large discontinuity caused by the level shift between the last sample and the first. If the data is processed directly by the FFT, this discontinuity causes severe, unwanted disturbances in the spectrum that can mask the true spectral character of the data.
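The effect of a rising trend on an FFT-based spectrum can be illustrated with a minimal sketch. The signal, the trend slope, and the helper name `periodogram` are assumptions chosen for illustration and do not come from the text. The tone completes exactly 16 cycles in the segment, so on its own it is circularly continuous and occupies a single FFT bin.

```python
import numpy as np

def periodogram(x):
    """One-sided squared-magnitude FFT divided by the segment length."""
    x = np.asarray(x, dtype=float)
    return np.abs(np.fft.rfft(x)) ** 2 / len(x)

N = 128
n = np.arange(N)
tone = np.cos(2 * np.pi * 16 * n / N)  # exactly 16 cycles: no wrap-around jump
trend = 4.0 * n / N                    # rising trend: level shift of about 4 at the wrap

P_tone = periodogram(tone)
P_trend = periodogram(tone + trend)

# Total power outside the DC bin and the tone bin (bin 16).
mask = np.ones(N // 2 + 1, dtype=bool)
mask[[0, 16]] = False
leak_tone = P_tone[mask].sum()    # essentially zero
leak_trend = P_trend[mask].sum()  # large: the ramp spreads power over all bins

print(f"power outside tone bin, no trend:   {leak_tone:.3e}")
print(f"power outside tone bin, with trend: {leak_trend:.3e}")
```

The power that the trend spreads across the spectrum is largest in the low-frequency bins and decays only slowly with frequency, which is the kind of disturbance that can mask weaker spectral features.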

Can we use circularly stationary processes to approximate stationary processes? Refer to Figure 10.1. The figure shows two time-series, each 128 samples long. Both were produced by a non-circular ARMA process. The known theoretical ARMA parameters were used in both equations (10.6) and (10.7), and the difference in log-likelihood was noted. For the top panel, there was a large log-likelihood difference, but for the lower panel, there was hardly any. The reason is the apparent discontinuity in the time-series when it is viewed as a circular ARMA process. In that view, the first few samples must “mate” with the last few samples. As $N$ increases, this likelihood difference remains, but it becomes smaller and smaller relative to the total log-likelihood, or when expressed as a per-sample log-likelihood.

Figure: Short segments of ARMA data ($N=128$). The top panel has a large discontinuity when viewed as a circularly stationary process: the difference between the exact log-PDF value (eq. 10.6) and the approximate form (eq. 10.7) is 71. The bottom panel has a small discontinuity when viewed as a circularly stationary process, and the difference is only 0.8.
\includegraphics[width=4.5in,height=3.0in, clip]{arma_exact_plots.eps}
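The comparison described above can be sketched numerically. Equations (10.6) and (10.7) are not reproduced in this section, so the sketch assumes that (10.6) is the standard zero-mean Gaussian log-PDF with a Toeplitz covariance built from the ACF, and that (10.7) is its circulant counterpart evaluated with the FFT. It also uses an AR(1) process rather than a general ARMA process, purely to keep the code short. All function names and parameter values are hypothetical.

```python
import numpy as np

def ar1_acf(a, sigma2, N):
    """ACF of a stationary AR(1) process x[n] = a*x[n-1] + e[n], var(e) = sigma2."""
    return sigma2 * a ** np.arange(N) / (1.0 - a * a)

def ar1_psd(a, sigma2, N):
    """AR(1) power spectrum sampled at the N FFT bin frequencies."""
    w = 2 * np.pi * np.arange(N) / N
    return sigma2 / np.abs(1.0 - a * np.exp(-1j * w)) ** 2

def loglik_exact(x, r):
    """Gaussian log-PDF with Toeplitz covariance R[i, j] = r[|i - j|] (assumed form of 10.6)."""
    N = len(x)
    idx = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :])
    R = r[idx]
    _, logdet = np.linalg.slogdet(R)
    quad = x @ np.linalg.solve(R, x)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + quad)

def loglik_circular(x, rho):
    """Gaussian log-PDF with circulant covariance whose eigenvalues are rho (assumed form of 10.7)."""
    N = len(x)
    X = np.fft.fft(x)
    quad = np.sum(np.abs(X) ** 2 / rho) / N
    return -0.5 * (N * np.log(2 * np.pi) + np.sum(np.log(rho)) + quad)

def simulate_ar1(a, sigma2, N, rng):
    """Draw a non-circular AR(1) segment, started from the stationary distribution."""
    x = np.empty(N)
    x[0] = rng.normal(scale=np.sqrt(sigma2 / (1.0 - a * a)))
    e = rng.normal(scale=np.sqrt(sigma2), size=N)
    for n in range(1, N):
        x[n] = a * x[n - 1] + e[n]
    return x

a, sigma2, N = 0.9, 1.0, 128
rng = np.random.default_rng(0)
x = simulate_ar1(a, sigma2, N, rng)

gap = loglik_exact(x, ar1_acf(a, sigma2, N)) - loglik_circular(x, ar1_psd(a, sigma2, N))
print(f"wrap-around mismatch x[0] - a*x[N-1]: {x[0] - a * x[-1]:.3f}")
print(f"exact minus circular log-likelihood:  {gap:.3f}")
```

In this AR(1) special case the two quadratic forms differ only in one term: the circular form contains the wrap-around residual $(x_0 - a x_{N-1})^2$ where the exact form contains the start-up term $(1-a^2)x_0^2$. The gap is therefore driven by how well the first sample "mates" with the last, and it does not grow with $N$.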

This problem is generally solved by using a shading function, such as Hamming or Hanning weighting, that gradually tapers the edges of a data segment toward zero so that the circular discontinuity is suppressed. In class-specific classifiers using the PDF projection theorem (PPT), such a shading function causes difficulties, which stem from the definition of the input data ${\bf x}$. The advantage of the PPT is the ability to use a variety of feature extraction methods, but all of them must use the same ${\bf x}$ as “input data”. As long as all feature extraction methods being compared use the same shading function, there is no problem. But this would effectively mean that ${\bf x}$ is the shaded data, which could defeat the purpose of the PPT, where a variety of feature extractions can be applied to the same input data. In reality, one shading function may not be optimal for all features. One feature, for example, may first chop ${\bf x}$ into smaller segments, whereas another might process ${\bf x}$ as it is, so different features would use different shading functions. If the shading function is considered part of the feature extraction, then some samples in ${\bf x}$ will be weighted nearly to zero for some features, and not for others. This violates a basic rule of the PPT: each feature set should contain an “energy statistic” that is able to measure the norm of the full ${\bf x}$ (see Section 3.2.2). If we violate this rule, there is no guarantee of any reasonable basis for comparing the projected PDFs of different feature sets.
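The conflict between shading and the energy statistic can be seen in a small sketch. The test signal and the size of the perturbation are assumptions chosen for illustration. The sketch applies a symmetric Hann window (`numpy.hanning`) to a segment with a rising trend, then changes only the first sample of the unshaded data.

```python
import numpy as np

N = 128
n = np.arange(N)
x = 4.0 * n / N + np.cos(2 * np.pi * 16 * n / N)  # rising trend plus a tone

w = np.hanning(N)  # symmetric Hann window; both end weights are zero
xs = w * x         # shaded data

jump_raw = abs(x[0] - x[-1])       # wrap-around jump before shading
jump_shaded = abs(xs[0] - xs[-1])  # wrap-around jump after shading

# A large change confined to the first sample alters the norm of x ...
x_alt = x.copy()
x_alt[0] += 10.0
energy_raw = (np.sum(x ** 2), np.sum(x_alt ** 2))
# ... but is invisible to any statistic computed from the shaded data.
energy_shaded = (np.sum(xs ** 2), np.sum((w * x_alt) ** 2))

print(f"wrap-around jump: raw {jump_raw:.3f}, shaded {jump_shaded:.3f}")
print(f"energy of raw data:    {energy_raw[0]:.2f} vs {energy_raw[1]:.2f}")
print(f"energy of shaded data: {energy_shaded[0]:.2f} vs {energy_shaded[1]:.2f}")
```

The shading removes the circular discontinuity, but a feature set computed only from the shaded segment cannot measure the norm of the full ${\bf x}$, because samples near the edges carry little or no weight.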

We will discuss this problem further in Section 12.1 when we talk about ways of segmenting the data. In fact, there is a way to compare feature sets that use different shading functions, but it is subject to strict rules. The shading functions must conform to the Hanning-3 method (Section 12.1.3), with 2/3 overlap and Hanning weighting. The basis of comparison is derived from the “virtual input data” principle (Section 12.1.3). Apart from shading, there is another way to avoid the spectral disturbances caused by circular discontinuities. This requires on-the-fly segmentation methods (Section 12.1.5) or MR-HMM (Chapter 14).
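The details of the Hanning-3 method are given in Section 12.1.3 and are not reproduced here. The sketch below only checks a standard property of Hann windows at 2/3 overlap that may help explain why this combination is singled out: away from the ends of the data, both the weights and the squared weights of the overlapping windows sum to a constant, so every interior sample receives the same total weighting. The periodic form of the Hann window, the segment length, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def periodic_hann(K):
    """Periodic Hann window of length K: 0.5 - 0.5*cos(2*pi*n/K)."""
    n = np.arange(K)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / K)

def overlap_sums(K, num_segments):
    """Accumulate window weights and squared weights for a hop of K/3 (2/3 overlap)."""
    hop = K // 3
    w = periodic_hann(K)
    total = hop * (num_segments - 1) + K
    sum_w = np.zeros(total)
    sum_w2 = np.zeros(total)
    for m in range(num_segments):
        sum_w[m * hop : m * hop + K] += w
        sum_w2[m * hop : m * hop + K] += w ** 2
    return sum_w, sum_w2, hop

K = 48  # segment length, divisible by 3
sum_w, sum_w2, hop = overlap_sums(K, num_segments=10)

# Samples covered by three windows (excludes the first and last two hops).
interior = slice(2 * hop, len(sum_w) - 2 * hop)
print("sum of weights, interior:        ", np.unique(np.round(sum_w[interior], 12)))
print("sum of squared weights, interior:", np.unique(np.round(sum_w2[interior], 12)))
```

With these assumptions the interior sums are 1.5 for the weights and 1.125 for the squared weights. The first and last two hops are covered by fewer than three windows and are weighted less.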

So, as a general rule, use circular processes and features based on the FFT only within the framework of Hanning-3 segmentation (Section 12.1.3), or possibly within the framework of on-the-fly segmentation methods (Section 12.1.5) or MR-HMM (Chapter 14). These frameworks reduce the effects of circular discontinuities. Unless your data is well-behaved enough that the discontinuities caused by FFT analysis are not a problem, use non-circular methods for time-series analysis. In what follows, we will be careful to mention whether we are talking about circular or non-circular feature extraction.