Exact forms for circularly stationary processes are based on equation (9.7). Because most operations can be done with the FFT, it is much more efficient to work with circularly stationary processes. The problem is that real-world data is not circularly stationary: the samples at the start of a chunk of data are generally independent of the samples at the end of the chunk. Consider a segment of data with a rising trend. Viewed as a circular process, it has a large discontinuity caused by the level shift between the last and first samples. If the segment is processed directly by the FFT, this discontinuity causes severe, unwanted disturbances in the spectrum, which can mask the true spectral character of the data.
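The effect of a rising trend can be seen in a small numerical sketch. The signal below (a sinusoid at bin 8, a ramp of height 4, length 128) and the helper name `power_spectrum_db` are illustrative assumptions, not taken from the text. Without the trend, the sinusoid is circularly continuous and its spectrum is confined to one bin; with the trend, the wrap-around jump spreads power across all bins.

```python
import numpy as np

N = 128
n = np.arange(N)

# Sinusoid with an integer number of cycles: continuous when wrapped circularly.
tone = np.sin(2 * np.pi * 8 * n / N)
# Same sinusoid plus a rising trend: large level shift from last to first sample.
trend = tone + 4.0 * n / N

def power_spectrum_db(x):
    """Power spectrum in dB of the mean-removed segment (hypothetical helper)."""
    X = np.fft.rfft(x - np.mean(x))
    return 10 * np.log10(np.abs(X) ** 2 / len(x) + 1e-12)

ps_tone = power_spectrum_db(tone)
ps_trend = power_spectrum_db(trend)

# Size of the wrap-around step seen by the FFT.
jump_tone = abs(tone[0] - tone[-1])
jump_trend = abs(trend[0] - trend[-1])

# Median spectral level away from DC and the sinusoid's own bin.
mask = np.ones(len(ps_tone), dtype=bool)
mask[[0, 8]] = False
floor_tone = np.median(ps_tone[mask])
floor_trend = np.median(ps_trend[mask])

print("wrap-around jump (tone, trend):", jump_tone, jump_trend)
print("median floor in dB (tone, trend):", floor_tone, floor_trend)
```

The raised floor in the second case is the kind of disturbance that can hide weaker spectral components.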
Can we use circularly stationary processes to approximate stationary processes? Refer to Figure 9.1, which shows two time-series of length 128 samples. Both time-series were produced by a non-circular ARMA process. The known theoretical ARMA parameters were used in both equations (9.6) and (9.7), and the difference between the two log-likelihood values was noted. For the top panel, there was a large log-likelihood difference, but for the lower panel, hardly any. The reason is the apparent discontinuity that appears when the time-series is viewed as a circular ARMA process, in which the first few samples must "mate" with the last few samples. As the length of the time-series increases, this likelihood difference remains, but it becomes smaller and smaller relative to the total log-likelihood, or when expressed as a per-sample log-likelihood.
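A minimal sketch of this comparison is given below, under several assumptions. Equations (9.6) and (9.7) are not reproduced here, so the two functions use the standard zero-mean Gaussian log-likelihoods with a Toeplitz covariance (non-circular) and a circulant covariance evaluated by FFT (circular); an AR(1) model stands in for the general ARMA case; and the two test series are deterministic cosines chosen so that one wraps around smoothly and the other does not, rather than ARMA realizations as in the figure. All function and variable names are hypothetical.

```python
import numpy as np

def loglik_noncircular(x, a, sigma2):
    """Gaussian log-likelihood with the Toeplitz covariance of a stationary AR(1)."""
    N = len(x)
    lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    R = sigma2 * a ** lags / (1.0 - a ** 2)   # r[k] = sigma2 * a^|k| / (1 - a^2)
    _, logdet = np.linalg.slogdet(R)
    quad = x @ np.linalg.solve(R, x)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + quad)

def loglik_circular(x, a, sigma2):
    """Gaussian log-likelihood with circulant covariance, evaluated by FFT."""
    N = len(x)
    omega = 2 * np.pi * np.arange(N) / N
    # AR(1) power spectrum sampled at the FFT bins = eigenvalues of the circulant.
    psd = sigma2 / np.abs(1.0 - a * np.exp(-1j * omega)) ** 2
    X = np.fft.fft(x)
    quad = np.sum(np.abs(X) ** 2 / psd) / N
    return -0.5 * (N * np.log(2 * np.pi) + np.sum(np.log(psd)) + quad)

N = 128
n = np.arange(N)
a = 0.9
sigma2 = 1.0 - a ** 2          # innovation variance giving unit process variance

x_match = np.cos(2 * np.pi * n / N)   # full cycle: last sample close to first
x_mismatch = np.cos(np.pi * n / N)    # half cycle: last sample opposite to first

d_match = loglik_noncircular(x_match, a, sigma2) - loglik_circular(x_match, a, sigma2)
d_mismatch = loglik_noncircular(x_mismatch, a, sigma2) - loglik_circular(x_mismatch, a, sigma2)

print("log-likelihood difference, ends match:   ", d_match)
print("log-likelihood difference, ends mismatch:", d_mismatch)
```

For an AR(1) model the difference depends only on the first and last samples, which is why the series whose ends do not mate shows the larger discrepancy.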

This problem is generally solved by using a shading function, such as Hamming or Hanning weighting, that gradually brings the edges of a data segment to zero so that there is no circular discontinuity. In class-specific classifiers using the PDF projection theorem (PPT), such a shading function causes difficulties. The problem with using the PPT and shading has to do with the definition of the input data. The advantage of the PPT is the ability to use a variety of feature extraction methods, but all of them must operate on the same input data. As long as all feature extraction methods being compared use the same shading function, there is no problem. But this would effectively mean that the input data is the shaded data, which could defeat the purpose of the PPT, where a variety of feature extractions can be applied to the same input data. In reality, one shading function may not be optimal for all features. One feature, for example, may first chop the input data into smaller segments, whereas another might process the input data as it is. This means that some features would use different shading functions. If the shading function is considered part of the feature extraction, then some samples in the input data will be weighted nearly to zero for some features and not for others. This violates basic rules that should be followed in using the PPT. Each feature set should contain an "energy statistic" that is able to measure the norm of the full input data (see Section 3.2.2). If we violate this, then there is no guarantee that there is any reasonable basis for comparing the projected PDFs of different feature sets.
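Both sides of this trade-off can be illustrated with a short sketch. The test signal, the choice of `np.hanning` as the shading function, and the helper `high_band_power` are assumptions made for illustration. The first part shows that shading removes most of the spectral disturbance caused by a trend; the second shows that a disturbance located at the edge of the segment is almost invisible in the energy of the shaded data.

```python
import numpy as np

N = 128
n = np.arange(N)
x = np.sin(2 * np.pi * 8 * n / N) + 4.0 * n / N   # tone plus rising trend
x = x - x.mean()

w = np.hanning(N)        # symmetric Hann weights, zero at both ends
xw = w * x               # shaded segment: no wrap-around discontinuity

def high_band_power(v):
    """Total power in FFT bins N/4 .. N/2, far from the tone (hypothetical helper)."""
    V = np.fft.rfft(v)
    return np.sum(np.abs(V[N // 4:]) ** 2) / len(v)

leak_raw = high_band_power(x)
leak_shaded = high_band_power(xw)

# A disturbance confined to the first four samples of the segment.
e = np.zeros(N)
e[:4] = 1.0
energy_full = np.sum(e ** 2)            # energy in the unshaded input data
energy_shaded = np.sum((w * e) ** 2)    # energy remaining after shading

print("high-band power, raw vs shaded:", leak_raw, leak_shaded)
print("edge-disturbance energy, raw vs shaded:", energy_full, energy_shaded)
```

A feature set computed only from the shaded segment therefore cannot measure the norm of the full input data, which is the difficulty described above.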
We will discuss this problem further in Section 12.1, where we cover ways of segmenting the data. In fact, there is a way to compare feature sets that use different shading functions, but it is subject to strict rules. The shading functions must conform to the Hanning-3 method (Section 12.1.3), with 2/3 overlap and Hanning weighting. The basis of comparison is derived from the "virtual input data" principle (Section 12.1.3). Apart from shading, there is another way to avoid the spectral disturbances caused by circular discontinuities. This requires on-the-fly segmentation methods (Section 12.1.4) or MR-HMM (Chapter 14).
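One property of Hanning weighting with 2/3 overlap can be checked numerically: the squared weights of the overlapping segments sum to a constant, so every interior sample contributes equally to the total energy, whereas with 1/2 overlap they do not. Whether this is exactly the property that Section 12.1.3 relies on is an assumption here; the sketch below is only a numerical check, with an assumed window length `K`, a periodic Hann definition, and a hypothetical helper `overlap_sum`.

```python
import numpy as np

K = 48                                   # window length, divisible by 2 and 3 (assumed)
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(K) / K)   # periodic Hann weights

def overlap_sum(weights, hop, n_segments):
    """Sum of shifted copies of `weights`, one copy every `hop` samples."""
    total = np.zeros(hop * (n_segments - 1) + len(weights))
    for i in range(n_segments):
        total[i * hop : i * hop + len(weights)] += weights
    return total

n_seg = 12
sum_sq_two_thirds = overlap_sum(w ** 2, K // 3, n_seg)   # 2/3 overlap
sum_sq_half = overlap_sum(w ** 2, K // 2, n_seg)         # 1/2 overlap

# Ignore the ends, where fewer segments overlap.
interior_two_thirds = sum_sq_two_thirds[K:-K]
interior_half = sum_sq_half[K:-K]

spread_two_thirds = interior_two_thirds.max() - interior_two_thirds.min()
spread_half = interior_half.max() - interior_half.min()

print("spread of summed squared weights, 2/3 overlap:", spread_two_thirds)
print("spread of summed squared weights, 1/2 overlap:", spread_half)
```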
So, as a general rule, use circular processes and features based on the FFT only within the framework of Hanning-3 segmentation (Section 12.1.3), or possibly within the framework of on-the-fly segmentation methods (Section 12.1.4) or MR-HMM (Chapter 14). These frameworks reduce the effects of circular discontinuities. Unless your data is well-behaved enough that the discontinuities caused by FFT analysis are not a problem, use non-circular methods for time-series analysis. In what follows, we will be careful to mention whether we are talking about circular or non-circular feature extraction.
Baggenstoss 20170519