Exact PDF of an ARMA process - Very efficient method using filtering

In spite of the speedup of the Levinson algorithm, notice that for $N=5000$, 100 milliseconds are needed, and at the $N^2$ growth rate, 10 seconds would be needed for $N=50000$. In some applications, that may be excessive. An even more efficient method exists, based on the idea of a whitening filter. The basic idea is the following. Let ${\bf C}$ be any lower-triangular (LT) matrix such that

$\displaystyle {\bf R}^{-1}_N = {\bf C}^\prime {\bf C}.$

Thus,

$\displaystyle {\bf x}^\prime \; {\bf R}_N^{-1} \; {\bf x}= {\bf x}^\prime \; {\bf C}^\prime {\bf C} \; {\bf x}.$

Now consider the vector ${\bf u}= {\bf C} \; {\bf x}$. The covariance of ${\bf u}$ is the identity matrix:

$\displaystyle {\cal E}({\bf u}{\bf u}^\prime) = {\cal E}( {\bf C} \; {\bf x}{\bf x}^\prime {\bf C}^\prime) = {\bf C} \; {\bf R}_N \; {\bf C}^\prime = {\bf I}.$

So, ${\bf u}$ is said to be whitened: it consists of iid samples of Gaussian noise with mean 0 and variance 1, denoted N(0,1), and therefore has a flat power spectrum. The log-PDF of ${\bf x}$ can be written in terms of ${\bf u}$:
   lpx = -N/2*log(2*pi) + ldetC -.5*sum(u.^2);
where ldetC is the log-determinant of ${\bf C}$. We may, if we want, break down ${\bf C}$ into an LT part $\tilde{\bf C}$ with diagonal elements equal to 1, and a diagonal matrix ${\bf S}$,

$\displaystyle {\bf C} = {\bf S} \; \tilde{\bf C}.$

Then, we have

$\displaystyle {\bf S} \; \tilde{\bf C} \; {\bf R}_N \; \tilde{\bf C}^\prime \; {\bf S} = {\bf I}.$

Thus,

$\displaystyle {\bf R}_N^{-1} = \tilde{\bf C}^\prime \; {\bf S}^2 \; \tilde{\bf C}.$

Because we imposed the constraint that $\tilde{\bf C}$ is lower-triangular with diagonal elements of 1, $\tilde{\bf C}$ has a determinant of 1. The determinant of ${\bf R}_N^{-1}$ is therefore the product of the diagonal elements of ${\bf S}^2$, and the determinant of ${\bf R}_N$ is the reciprocal of that product. The whitened vector ${\bf u}$ is not unique, because any matrix ${\bf C}$ that accomplishes the whitening (in fact, any linear transformation that does so) is suitable. Note that we can obtain ${\bf C}$ by Cholesky factorization. By convention, the Cholesky factor is upper-triangular; therefore, in MATLAB, we need to twist it around:
   C = fliplr(flipud(chol(inv(R))));
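As a quick numerical check (a sketch only; the AR(1)-style ACF and the variable names are chosen purely for illustration), we can verify the log-PDF expression, the decomposition ${\bf C} = {\bf S} \; \tilde{\bf C}$, and the determinant identity on a small example:

   % Sketch: numerical check of the whitening-based log-PDF.
   N = 64;
   r = 0.9.^(0:N-1);                   % example ACF (AR(1), pole at 0.9)
   R = toeplitz(r);                    % covariance matrix
   x = chol(R)' * randn(N,1);          % draw x ~ N(0,R)
   C = fliplr(flipud(chol(inv(R))));   % LT whitener: C'*C = inv(R)
   u = C * x;                          % whitened data, cov(u) = I
   ldetC = sum(log(diag(C)));          % log-det of a triangular matrix
   lpx  = -N/2*log(2*pi) + ldetC - .5*sum(u.^2);
   lpx0 = -N/2*log(2*pi) - .5*log(det(R)) - .5*(x'*(R\x));  % direct form
   % lpx agrees with lpx0 to numerical precision.  Now split C = S*Ct:
   S  = diag(diag(C));                 % diagonal part of C
   Ct = S \ C;                         % tilde C, with ones on the diagonal
   ldetR = -sum(log(diag(S).^2));      % log det R from diag(S^2) alone

The flip of the Cholesky factor is legitimate here because a symmetric Toeplitz matrix is persymmetric (unchanged when its rows and columns are both reversed).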

For an ARMA process, it is only necessary to swap the AR and MA coefficient vectors (numerator and denominator) to obtain an inverse filter that recovers the innovation sequence $e_t$ and whitens the data:

   u = filter(a,b,x)/sqrt(sig2);
If we ignore filter startup errors, the log-PDF of the data can be written:
>> u = filter(a,b,x) / sqrt(sig2); 
>> lpx = -N/2*log(2*pi) + ldetC - .5*sum(u.^2);
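To see the effect of the startup errors (a sketch with illustrative ARMA(1,1) coefficients; the transient appears because x is taken as a stationary segment whose true initial conditions are unknown to the filter):

   % Sketch: whitening by inverse filtering, showing the startup transient.
   a = [1 -0.8];  b = [1 0.5];  sig2 = 2;  % example minimum-phase ARMA(1,1)
   N = 500;
   e = randn(N+200,1);                     % unit-variance innovations
   xlong = filter(b,a,e) * sqrt(sig2);     % synthesize the ARMA process
   x = xlong(201:end);                     % keep a stationary segment
   u = filter(a,b,x) / sqrt(sig2);         % inverse filter (zero ICs)
   [u(1:4), e(201:204)]                    % disagree: startup transient
   [u(21:24), e(221:224)]                  % agree closely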
Note that by comparing the filtering approach with the above matrix approach, we can deduce that the rows of the matrix $\tilde{\bf C}$ converge to the impulse response of the whitening filter and that the diagonal elements of ${\bf S}^2$ converge to $1/\sigma^2$. Thus, ldetC = -N/2*log(sig2).
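Both convergence claims can be checked numerically. A sketch, reusing the example a, b, and sig2 above, and building the theoretical ACF from the ARMA power spectrum:

   % Sketch: diag(S).^2 approaches 1/sig2, and the rows of tilde C
   % approach the impulse response of the whitening filter.
   Nfft = 1024;
   rho = sig2 * abs(fft(b,Nfft)).^2 ./ abs(fft(a,Nfft)).^2;  % ARMA spectrum
   r   = real(ifft(rho));                  % theoretical ACF
   R   = toeplitz(r(1:30));
   C   = fliplr(flipud(chol(inv(R))));
   [C(end,end)^2, 1/sig2]                  % last diagonal of S^2 vs 1/sig2
   Ct  = diag(1./diag(C)) * C;             % unit-diagonal tilde C
   h   = filter(a,b,[1; zeros(5,1)]);      % whitening-filter impulse resp.
   [Ct(end,end:-1:end-5); h']              % last row of Ct matches h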

But not knowing the correct filter initial values causes a startup transient that dies away at a rate governed by the roots of the MA polynomial $[1 \; b_1 \ldots b_Q]$, which forms the denominator of the inverse filter. After some amount of time, say $K$ samples, the samples of u will agree closely with e. It is required that the MA polynomial

$\displaystyle B(z) = 1 + b_1 z^{-1} + b_2 z^{-2} \cdots + b_Q z^{-Q}$

have all $Q$ roots strictly inside the unit circle (i.e., that $B(z)$ be minimum phase), so that the inverse filter is stable. We can determine how long we have to wait by computing the autocorrelation function of the inverse of the ARMA process, whose power spectrum is the reciprocal of the ARMA spectrum rho:
>> ri = real(ifft( 1./rho ));
>> K = max(find(abs(ri(1:floor(N/2))) > ri(1)/30));
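Here rho is the power spectral density of the ARMA process sampled on the $N$-point FFT grid (assumed available from earlier computations in the chapter). As a sketch, it could be formed as:

   % Sketch: ARMA power spectrum on the FFT grid.
   rho = sig2 * abs(fft(b,N)).^2 ./ abs(fft(a,N)).^2;
   % The inverse process has spectrum 1./rho, and its ACF decays at a rate
   % set by the roots of B(z).  For the example b = [1 0.5], the decay is
   % like 0.5^|k|, so K comes out on the order of 5 (0.5^5 is about 1/30).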
Note that due to the symmetry of the ACF, the time-reversed whitener will also work, but it produces a sequence that does not converge to $e_t$:
   v = filter(a,b,flipud(x))/sqrt(sig2);
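A quick check (a sketch, reusing x, u, and K from above) confirms that v is white but is not the same sequence as u:

   % Sketch: the time-reversed whitener also whitens.
   v = filter(a,b,flipud(x)) / sqrt(sig2);
   [var(v(K+1:end)), 1]                    % sample variance near 1 (white)
   max(abs(flipud(v) - u))                 % not small: a different sequence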
Our approach will be to use the exact approach for the first $K$ values and the filtering method for indices $K+1$ through $N$. As $N$ increases, the overhead of computing the first $K$ values remains constant, and we obtain an algorithm with order-$N$ execution time. We end up with a hybrid filtering/exact approach:
>> %% PART 1 - EXACT on first K samples
>> [y,ldetR] = toepsolve(r(1:K),x(1:K),K);
>> lpx1 = -K/2*log(2*pi) -.5*ldetR -.5*x(1:K)'*y;
>> %% PART 2 - FILTERING APPROACH
>> u=filter(a,b,x) / sqrt(sig2); % filtering approach
>> lpx2 = -(N-K)/2*log(2*pi)  -(N-K)/2*log(sig2) -.5*sum(u(K+1:end).^2);
>> lpx = lpx1 + lpx2; % combine
In PART 2, we have made use of the equivalence of the filtering approach and the matrix approach to deduce the determinant term -(N-K)/2*log(sig2).
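As a consistency check (a sketch, assuming toepsolve can be afforded on the full length-$N$ problem), the hybrid result should agree closely with the fully exact computation:

   % Sketch: compare the hybrid log-PDF with the exact O(N^2) version.
   [y,ldetR] = toepsolve(r(1:N),x,N);      % exact on all N samples
   lpx_exact = -N/2*log(2*pi) - .5*ldetR - .5*(x'*y);
   % lpx - lpx_exact is negligible once K spans the startup transient.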

Note that above, the first $K$ elements of u are not “white”. If we really need the first $K$ samples of u to be iid, we can “correct” them:

>> Rb=toeplitz(r(1:K));
>> C = fliplr(flipud(chol(inv(Rb))));
>> u(1:K) = C * x(1:K); % exact "fix" of first K samples
however, this would be only for academic purposes; in general we don't need u(1:K). In Figure 10.4, we compare u generated from u=filter(a,b,x)/sqrt(sig2); with e, and with the “corrected” u. See software/pdf_arma_exact.m. An example is run by the script software/test_arma.m.
Figure: Comparison of $u_t$ with $e_t$ for an ARMA process driven by innovations $e_t$. Connected dots: e; circles: u obtained from filtering only; triangles: u obtained from filtering and “fixing” the first $K$ samples ($K=5$). Note that the first samples of e can never be reconstructed.
\includegraphics[width=4.5in,height=3.0in, clip]{ma_u.eps}
For more details see software/pdf_arma_exact.m, which is used in software/module_acfx.m and software/test_arma.m. There is more on this in Section 10.4.10.