Introductory theory

Assume that the time series values we observe are realisations of random variables $Y_1, Y_2, \ldots, Y_T$, which are in turn part of a larger stochastic process $\{ Y_t: t \in \mathbb{Z}\}$.

Mean function: $\mu(t) = E[Y_t]$. Autocovariance function: $\gamma(s,t) = cov(Y_s, Y_t)$.

These are fundamental parameters and it would be useful to obtain estimates of them; however, in general there are $T + T(T+1)/2$ parameters (a mean for each time point and a covariance for each pair of time points) and only $T$ observations, so it is not possible to estimate them all. For example, with $T = 100$ there would be $100 + 5050 = 5150$ parameters. We'll need to impose some constraints; the most common is stationarity.

Strict stationarity: for any $k \gt 0$, any $t_1, \ldots, t_k \in \mathbb{Z}$, and any shift $L$, the distribution of $(Y_{t_1}, \ldots, Y_{t_k})$ is the same as that of $(Y_{t_1+L}, \ldots, Y_{t_k+L})$. This implies that $\mu(t) = \mu(0)$ and $\gamma(s,t) = \gamma(s-t, 0)$. It turns out that these two implications are enough for a reasonable amount of theory.

Weak stationarity: if $E[Y_t^2] \lt \infty$, $\mu(t) = \mu$, and $\gamma(s,t) = \gamma(s-t, 0)$, for all $s$ and $t$.

The two forms of stationarity are equivalent for Gaussian processes. When the time series is stationary it is possible to simplify: $\mu = E[Y_t]$, $\gamma(u) = cov(Y_t, Y_{t+u})$; we may also be interested in the autocorrelation $\rho(u) = \gamma(u) / \gamma(0)$.
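
As a concrete illustration, here is a minimal numpy sketch of the sample versions of these quantities (using the usual biased estimator that divides by $T$; the function names are just for illustration):

```python
import numpy as np

def sample_acov(y, max_lag):
    """Sample autocovariance gamma_hat(u) for u = 0, ..., max_lag (divides by T)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()                     # subtract the estimate of mu
    return np.array([np.sum(d[:T - u] * d[u:]) / T for u in range(max_lag + 1)])

def sample_acf(y, max_lag):
    """Sample autocorrelation rho_hat(u) = gamma_hat(u) / gamma_hat(0)."""
    g = sample_acov(y, max_lag)
    return g / g[0]
```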

Hilbert spaces

(A vector space with an inner product, completed to include the limits of Cauchy sequences, is a Hilbert space; here $Y_n \to Y$ means $E|Y - Y_n|^2 \to 0$, i.e. mean-square convergence.) This places many of the operator manipulations to follow on a rigorous basis.

The vector space is composed of finite linear combinations $\sum_{i=1}^{N} c_i Y_{t_i}$ (together with their mean-square limits). The inner product is the covariance. The space is denoted $\mathcal{H}$.

Consider the lag operator, $L$, defined by $L Y_t = Y_{t-1}$ (and extended linearly to linear combinations). As well as being linear, the lag operator preserves inner products when the series is stationary, and is thus a unitary operator.
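
As a toy illustration of the lag operator acting on a finite stretch of values (just an index shift; the numbers are made up):

```python
import numpy as np

# A finite stretch standing in for (Y_1, ..., Y_5).
y = np.array([2.0, 5.0, 3.0, 7.0, 1.0])

# (L y)_t = y_{t-1}: on a finite array this is a one-step shift;
# the first value has no predecessor, so it is marked missing.
Ly = np.concatenate(([np.nan], y[:-1]))
print(Ly)   # [nan  2.  5.  3.  7.]

# Linearity: L(a*y + b*z) = a*L(y) + b*L(z), since shifting commutes with
# scaling and addition.
```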

Linear processes

The time series $Y_t$, defined by $Y_t = \sum_{u=-\infty}^{\infty} \phi_u \epsilon_{t-u}$, where $\epsilon_t$ is white noise and $\sum_{u=-\infty}^{\infty} |\phi_u|^2 \lt \infty$, is called a linear process.
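
A minimal simulation sketch, assuming a one-sided, truncated version of the sum (only finitely many nonzero $\phi_u$) and Gaussian white noise; the helper name is made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_process(phi, T):
    """Simulate Y_t = sum_{u=0}^{m-1} phi[u] * eps_{t-u}: a one-sided, truncated
    special case of the general two-sided linear process."""
    phi = np.asarray(phi, dtype=float)
    m = len(phi)
    eps = rng.standard_normal(T + m - 1)             # white noise
    # 'valid' convolution returns exactly T values, each a weighted sum of
    # m consecutive noise terms.
    return np.convolve(eps, phi, mode="valid")

y = linear_process([1.0, 0.6, 0.3], T=500)           # an MA(2)-type example
```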

Partial autocorrelation function

Given a stretch of time series values, the partial autocorrelation $\phi(u)$ is the correlation between $Y_t$ and $Y_{t-u}$ not conveyed through the intervening values. If the $Y$-values are normally distributed then $\phi(u)$ can be defined as $cor(Y_t, Y_{t-u} \mid Y_{t-1}, \ldots, Y_{t-u+1})$. A more general approach is based on regression theory: $\phi(u) = cor(Y_t - \hat{Y}_t, Y_{t-u} - \hat{Y}_{t-u})$, where $\hat{Y}_t$ and $\hat{Y}_{t-u}$ are the best linear predictors of $Y_t$ and $Y_{t-u}$ based on the intervening values $Y_{t-1}, \ldots, Y_{t-u+1}$.
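
A sketch of this regression-based definition, assuming zero-mean data and ordinary least squares for both predictors (the function name is hypothetical):

```python
import numpy as np

def pacf_by_regression(y, u):
    """phi(u) = cor(Y_t - Yhat_t, Y_{t-u} - Yhat_{t-u}), where both predictors
    are least-squares regressions on the intervening Y_{t-1}, ..., Y_{t-u+1}."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    T = len(y)
    X = np.array([y[t - u + 1 : t][::-1] for t in range(u, T)])  # intervening values
    yt, ytu = y[u:], y[:T - u]                                   # Y_t and Y_{t-u}
    if u == 1:
        res_t, res_tu = yt, ytu                                  # nothing to regress out
    else:
        b_t, *_ = np.linalg.lstsq(X, yt, rcond=None)
        b_tu, *_ = np.linalg.lstsq(X, ytu, rcond=None)
        res_t, res_tu = yt - X @ b_t, ytu - X @ b_tu
    return np.corrcoef(res_t, res_tu)[0, 1]
```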

By convention $\phi(1) = \rho(1)$. We can find $\phi(2)$ using the theory above and the best predictor of $Y_t$ based on $Y_{t-1}$, namely $\rho(1) Y_{t-1}$: $\phi(2) = \frac{\rho(2) - \rho(1)^2}{1 - \rho(1)^2}$.
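
As a quick check of this formula: for an AR(1) process with parameter $\alpha$ the autocorrelations are $\rho(u) = \alpha^{u}$, so $\phi(2) = (\alpha^2 - \alpha^2)/(1 - \alpha^2) = 0$, consistent with the cutoff noted below. Numerically, with $\alpha = 0.6$ (values chosen purely for illustration):

```python
rho1, rho2 = 0.6, 0.36                       # AR(1) with alpha = 0.6
phi2 = (rho2 - rho1 ** 2) / (1 - rho1 ** 2)
print(phi2)                                  # 0.0 -- the lag-2 partial autocorrelation vanishes
```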

For a general AR(p), $\phi(u) = 0$ for all $u \gt p$. For a general MA(q), $\phi(u)$ decays exponentially as $u \to \infty$.

Computing the PACF

Although the definition above is conceptually simple, it is computationally hard. This section presents an alternative form which is simple to compute.

Consider the $k$th-order autoregressive predictor of $Y_{k+1}$, $\hat{Y}_{k+1} = \phi_{k1} Y_k + \cdots + \phi_{kk} Y_1$, obtained by minimising $E(Y_{k+1} - \hat{Y}_{k+1})^2$. We will show that $\phi(k) = \phi_{kk}$.
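
These coefficients solve the normal (Yule-Walker) equations $\sum_{j=1}^{k} \phi_{kj}\, \rho(|i-j|) = \rho(i)$ for $i = 1, \ldots, k$. A direct sketch of solving them, with the theoretical ACF of an AR(1) (parameter 0.6) used purely as an illustration:

```python
import numpy as np

def ar_predictor_coeffs(rho):
    """Solve the Yule-Walker equations for phi_k1, ..., phi_kk,
    given rho = [rho(1), ..., rho(k)]."""
    rho = np.asarray(rho, dtype=float)
    k = len(rho)
    # R[i, j] = rho(|i - j|), with rho(0) = 1 on the diagonal.
    R = np.array([[1.0 if i == j else rho[abs(i - j) - 1] for j in range(k)]
                  for i in range(k)])
    return np.linalg.solve(R, rho)

# The last coefficient is the partial autocorrelation phi(k) = phi_kk.
print(ar_predictor_coeffs([0.6, 0.36, 0.216]))   # approx [0.6, 0.0, 0.0]
```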

Let $\mathcal{H}_k = span\{Y_1, Y_2, \ldots, Y_k\}$, with associated projection $P_{\mathcal{H}_k}$, and let $\mathcal{H}_1 = span\{Y_2, \ldots, Y_k\}$ and $\mathcal{H}_2 = span\{ Y_1 - P_{\mathcal{H}_1} Y_1\}$.

Thus $\hat{Y}_{k+1} = P_{\mathcal{H}_k} Y_{k+1} = P_{\mathcal{H}_1} Y_{k+1} + P_{\mathcal{H}_2} Y_{k+1} = P_{\mathcal{H}_1} Y_{k+1} + a(Y_1 - P_{\mathcal{H}_1} Y_1)$; rearranging and equating coefficients of $Y_1$ with the first expression shows $a = \phi_{kk}$.

The Durbin-Levinson algorithm calculates these coefficients recursively.
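
A compact sketch of the recursion, assuming the autocorrelations $\rho(1), \ldots, \rho(K)$ are already available (e.g. from the sample ACF earlier); it returns the partial autocorrelations $\phi_{kk}$:

```python
import numpy as np

def durbin_levinson(rho):
    """Given rho = [rho(1), ..., rho(K)], return the partial autocorrelations
    phi(1), ..., phi(K) via the Durbin-Levinson recursion."""
    rho = np.asarray(rho, dtype=float)
    K = len(rho)
    pacf = np.empty(K)
    pacf[0] = rho[0]
    phi_prev = np.array([rho[0]])                         # (phi_11)
    for k in range(2, K + 1):
        num = rho[k - 1] - phi_prev @ rho[k - 2::-1]      # rho(k) - sum_j phi_{k-1,j} rho(k-j)
        den = 1.0 - phi_prev @ rho[:k - 1]                # 1 - sum_j phi_{k-1,j} rho(j)
        phi_kk = num / den
        phi = np.empty(k)
        phi[:k - 1] = phi_prev - phi_kk * phi_prev[::-1]  # phi_kj = phi_{k-1,j} - phi_kk phi_{k-1,k-j}
        phi[k - 1] = phi_kk
        pacf[k - 1] = phi_kk
        phi_prev = phi
    return pacf

print(durbin_levinson([0.6, 0.36, 0.216]))   # approx [0.6, 0.0, 0.0] for an AR(1)
```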