
Time Series Analysis

Learn fundamental time series analysis techniques.

Time Series Analysis: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of time series analysis all the way through advanced modelling, forecasting, and diagnostics, with practical usage within the DataStatPro application. Whether you are a complete beginner or an experienced analyst, this guide is structured to build your understanding step by step.


Table of Contents

  1. Prerequisites and Background Concepts
  2. What is Time Series Analysis?
  3. Components of a Time Series
  4. Time Series Decomposition
  5. Stationarity
  6. Autocorrelation and Partial Autocorrelation
  7. Classical Time Series Models
  8. Exponential Smoothing Models
  9. Model Identification, Estimation, and Selection
  10. Model Diagnostics
  11. Forecasting and Prediction Intervals
  12. Advanced Topics
  13. Using the Time Series Component
  14. Computational and Formula Details
  15. Worked Examples
  16. Common Mistakes and How to Avoid Them
  17. Troubleshooting
  18. Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into time series analysis, it is helpful to be familiar with the following foundational concepts. Do not worry if you are not — each concept is briefly explained here.

1.1 Random Variables and Expectation

A random variable X is a variable whose value is the outcome of a random phenomenon. The expectation (mean) of X is:

E[X] = \mu = \sum_x x \cdot P(X = x) \quad \text{(discrete)}, \qquad E[X] = \int_{-\infty}^{\infty} x f(x)\, dx \quad \text{(continuous)}

The variance is:

\text{Var}(X) = E\left[(X - \mu)^2\right] = E[X^2] - \mu^2

1.2 Covariance and Correlation

The covariance between two random variables X and Y measures how they move together:

\text{Cov}(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right]

The Pearson correlation normalises covariance to the range [-1, 1]:

\rho_{XY} = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X) \cdot \text{Var}(Y)}}

In time series, the covariance between a series and its own lagged values plays a central role.

1.3 The Lag Operator

The lag operator L (also called the backshift operator B) shifts a time series back by one period:

L X_t = X_{t-1}, \qquad L^k X_t = X_{t-k}

The lag operator is a powerful notational tool that simplifies the expression of time series models:

\Delta X_t = X_t - X_{t-1} = (1 - L) X_t
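The differencing operator is easy to exercise in code. A minimal plain-Python sketch (the `difference` helper is a name chosen here for illustration, not a DataStatPro function):

```python
def difference(y, lag=1):
    """Apply (1 - L^lag) to a series: y_t - y_{t-lag}."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

y = [10, 12, 15, 19, 24]
print(difference(y))         # first differences, (1 - L) y_t
print(difference(y, lag=4))  # seasonal-style differencing, (1 - L^4) y_t
```

Note that each application of the operator shortens the series by `lag` observations.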

1.4 White Noise

A sequence \{\epsilon_t\} is called white noise if:

  1. Zero mean: E[\epsilon_t] = 0 for all t.
  2. Constant variance: \text{Var}(\epsilon_t) = \sigma^2 for all t.
  3. No autocorrelation: \text{Cov}(\epsilon_t, \epsilon_s) = 0 for all t \neq s.

White noise is the building block of all time series models. If additionally \epsilon_t \sim \mathcal{N}(0, \sigma^2), it is called Gaussian white noise.

1.5 The Normal Distribution

The normal distribution \mathcal{N}(\mu, \sigma^2) with probability density function:

f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}

is assumed for residuals in many time series models. Departures from normality are assessed during model diagnostics.


2. What is Time Series Analysis?

A time series is a sequence of observations recorded at successive, equally spaced points in time. Time series analysis is the set of methods used to understand the structure of a time series, model its behaviour, and generate forecasts of future values.

2.1 What Makes Time Series Special?

Unlike cross-sectional data (where observations are assumed to be independent), observations in a time series are ordered in time and are typically correlated with past values. This temporal dependence is both a challenge and an opportunity: it violates the independence assumption underlying many standard statistical methods, yet it is precisely what makes forecasting from past values possible.

2.2 Real-World Applications

Time series analysis is one of the most widely applied quantitative tools, with uses spanning finance (asset prices, volatility), economics (GDP, inflation, unemployment), retail (demand forecasting), energy (load forecasting), meteorology, epidemiology, and industrial process monitoring.

2.3 Goals of Time Series Analysis

The primary goals of time series analysis are:

| Goal | Description |
|---|---|
| Description | Summarising the main features of the series (trend, seasonality, variability) |
| Explanation | Understanding relationships between the series and other variables |
| Decomposition | Separating the series into its underlying components |
| Modelling | Fitting a mathematical model that captures the series' structure |
| Forecasting | Predicting future values of the series with associated uncertainty |
| Anomaly Detection | Identifying unusual observations or structural breaks |
| Control | Monitoring a process and intervening when it deviates from target |

2.4 Types of Time Series Data

| Type | Description | Example |
|---|---|---|
| Univariate | A single variable observed over time | Monthly sales figures |
| Multivariate | Multiple variables observed over time | Daily temperature, humidity, and wind speed simultaneously |
| Continuous | Recorded continuously (or at very fine intervals) | ECG heart rate signal |
| Discrete | Recorded at distinct time points | Quarterly GDP |
| Equally spaced | Fixed time intervals between observations | Weekly stock closing prices |
| Irregularly spaced | Variable time intervals | Transaction data, event logs |

The DataStatPro application focuses primarily on univariate, equally spaced discrete time series, which covers the vast majority of applied use cases.


3. Components of a Time Series

Most time series can be understood as a combination of several underlying structural components. Identifying and separating these components is the first step in any time series analysis.

3.1 Trend (T_t)

The trend is the long-term, systematic increase or decrease in the level of the series over time. It represents the underlying direction of the data, abstracting away short-term fluctuations.

3.2 Seasonality (S_t)

Seasonality refers to regular, periodic fluctuations that repeat at a fixed and known frequency (the seasonal period m). Seasonality is caused by calendar or institutional factors.

| Frequency | Seasonal Period m | Example |
|---|---|---|
| Monthly data | m = 12 | Retail sales peak in December |
| Quarterly data | m = 4 | Energy consumption peaks in Q1/Q3 |
| Weekly data | m = 7 | Traffic higher on weekdays |
| Hourly data | m = 24 | Electricity demand peaks at 6–8pm |

💡 Seasonality is distinct from cyclical behaviour — seasonal patterns repeat at fixed intervals (e.g., every 12 months), whereas cycles have variable duration (typically 2–10 years) and are driven by broader economic forces.

3.3 Cyclical Fluctuations (C_t)

Cyclical fluctuations are wave-like patterns that occur over periods longer than one seasonal cycle, typically driven by economic or business cycles. Unlike seasonal patterns, cycles have no fixed period, vary in amplitude from one cycle to the next, and are far harder to forecast.

3.4 Irregular (Residual) Component (I_t)

The irregular component (also called the error, noise, or residual) is the portion of the series that remains after removing trend, seasonality, and cyclical components. It represents random noise, measurement error, and one-off events that the systematic components cannot explain.

In a well-fitted model, the irregular component should resemble white noise.

3.5 Summary of Components

Y_t = f(T_t,\ S_t,\ C_t,\ I_t)

Where the function f depends on the decomposition model (additive or multiplicative — see Section 4).


4. Time Series Decomposition

Decomposition is the process of separating a time series into its constituent components. It provides a clearer picture of the underlying structure and aids in modelling and forecasting.

4.1 Additive Decomposition

In the additive model, the components are assumed to add together:

Y_t = T_t + S_t + C_t + I_t

When to use: When the magnitude of seasonal fluctuations is constant over time, regardless of the level of the series. The amplitude of the seasonal swings does not change as the trend rises or falls.

4.2 Multiplicative Decomposition

In the multiplicative model, the components multiply together:

Y_t = T_t \times S_t \times C_t \times I_t

When to use: When the magnitude of seasonal fluctuations increases or decreases proportionally with the level of the series. As the trend rises, the seasonal swings get larger. Most economic and business time series exhibit multiplicative seasonality.

💡 A multiplicative model can be converted to an additive model by taking logarithms: \ln(Y_t) = \ln(T_t) + \ln(S_t) + \ln(C_t) + \ln(I_t). This is a common preprocessing step.

4.3 Moving Average Smoothing for Trend Estimation

A centred moving average (CMA) of order m is the most common method for estimating the trend-cycle component.

For odd m:

\hat{T}_t = \frac{1}{m} \sum_{j=-(m-1)/2}^{(m-1)/2} Y_{t+j}

For even m (requires a 2 \times m CMA to maintain centring):

\hat{T}_t = \frac{1}{m}\left(\frac{1}{2}Y_{t-m/2} + Y_{t-m/2+1} + \dots + Y_{t+m/2-1} + \frac{1}{2}Y_{t+m/2}\right)

For seasonal data with period m, the CMA of order m removes seasonality and smooths the irregular component, leaving the trend-cycle.
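The even-order 2 × m weighting can be transcribed directly from the formula above. A plain-Python illustration (`centred_ma_even` is a name chosen here, not a DataStatPro function):

```python
def centred_ma_even(y, m):
    """2 x m centred moving average: half weight on the two end terms."""
    half = m // 2
    out = []
    for t in range(half, len(y) - half):
        total = 0.5 * y[t - half] + sum(y[t - half + 1 : t + half]) + 0.5 * y[t + half]
        out.append(total / m)
    return out

# A purely linear series is reproduced exactly: symmetric weights preserve linear trends
print(centred_ma_even([1, 2, 3, 4, 5, 6, 7, 8], m=4))
```

Note that the output loses m/2 observations at each end of the series, which is one motivation for STL (Section 4.5).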

4.4 Classical Decomposition Procedure

Step 1: Estimate the Trend-Cycle (\hat{T}_t). Apply a centred moving average of order m (the seasonal period).

Step 2: Detrend the Series. Compute Y_t - \hat{T}_t (additive model) or Y_t / \hat{T}_t (multiplicative model).

Step 3: Estimate the Seasonal Component (\hat{S}_t). Average the detrended values for each season across all years. Normalise so that seasonal indices sum to zero (additive) or average to 1 (multiplicative).

Step 4: Calculate the Irregular Component (\hat{I}_t). Compute \hat{I}_t = Y_t - \hat{T}_t - \hat{S}_t (additive) or \hat{I}_t = Y_t / (\hat{T}_t \hat{S}_t) (multiplicative).
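The four steps can be chained into a small additive-decomposition sketch (even seasonal period assumed; the function and variable names are illustrative, not the DataStatPro API):

```python
def classical_additive_decompose(y, m):
    """Classical additive decomposition for even seasonal period m (a sketch)."""
    half = m // 2
    # Step 1: trend via a 2 x m centred moving average
    trend = {}
    for t in range(half, len(y) - half):
        total = 0.5 * y[t - half] + sum(y[t - half + 1 : t + half]) + 0.5 * y[t + half]
        trend[t] = total / m
    # Step 2: detrend
    detrended = {t: y[t] - trend[t] for t in trend}
    # Step 3: seasonal indices averaged per season, normalised to sum to zero
    raw = []
    for idx in range(m):
        vals = [v for t, v in detrended.items() if t % m == idx]
        raw.append(sum(vals) / len(vals))
    adjust = sum(raw) / m
    seasonal = [v - adjust for v in raw]
    # Step 4: irregular = detrended minus seasonal
    irregular = {t: detrended[t] - seasonal[t % m] for t in detrended}
    return trend, seasonal, irregular

# Quarterly toy series: linear trend 10 + t plus a fixed seasonal pattern
pattern = [2.0, -1.0, -2.0, 1.0]
y = [10 + t + pattern[t % 4] for t in range(24)]
trend, seasonal, irregular = classical_additive_decompose(y, m=4)
print(seasonal)   # recovers the pattern; irregular is ~0 for this noise-free series
```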

4.5 STL Decomposition (Seasonal and Trend Decomposition using Loess)

STL (Cleveland et al.) is a more robust and flexible decomposition method based on locally weighted regression (Loess). Advantages over classical decomposition: it handles any seasonal period, allows the seasonal component to evolve over time, offers a robust mode that resists outliers, and estimates the trend for all observations (the classical CMA loses values at both ends).

The STL decomposition is controlled by two primary smoothing parameters: the trend window (how smooth the trend-cycle is) and the seasonal window (how quickly the seasonal component is allowed to change).

4.6 Seasonal Adjustment

A seasonally adjusted series is obtained by removing the estimated seasonal component: Y_t^{SA} = Y_t - \hat{S}_t (additive) or Y_t^{SA} = Y_t / \hat{S}_t (multiplicative).

Seasonally adjusted series are widely used in economic reporting (e.g., seasonally adjusted unemployment rate) to reveal the underlying trend more clearly.


5. Stationarity

Stationarity is the single most important concept in classical time series modelling. Nearly all standard time series models (ARMA, ARIMA) assume some form of stationarity.

5.1 Strict Stationarity

A process \{Y_t\} is strictly stationary if the joint distribution of (Y_{t_1}, Y_{t_2}, \dots, Y_{t_k}) is identical to the joint distribution of (Y_{t_1+h}, Y_{t_2+h}, \dots, Y_{t_k+h}) for all k, all time points t_1, \dots, t_k, and all shifts h. This is a very strong condition.

5.2 Weak (Covariance) Stationarity

Weak stationarity (also called second-order stationarity) requires only:

  1. Constant mean: E[Y_t] = \mu for all t.
  2. Constant variance: \text{Var}(Y_t) = \sigma^2 < \infty for all t.
  3. Autocovariance depends only on lag: \text{Cov}(Y_t, Y_{t+h}) = \gamma(h) for all t, where \gamma(h) is a function of the lag h only, not of t.

In practice, "stationarity" almost always refers to weak stationarity. A non-stationary series has a time-varying mean (trend), time-varying variance, or both.

5.3 Why Stationarity Matters

ARMA models are only valid for stationary series. Applying them to a non-stationary series leads to spurious regression results, invalid standard errors and test statistics, and forecasts that fail to track the changing level of the series.

5.4 Types of Non-Stationarity

| Type | Description | Solution |
|---|---|---|
| Trend stationarity | Series has a deterministic trend (mean increases linearly) | Detrend by regression on time |
| Difference stationarity (unit root) | Series has a stochastic trend (random walk) | First-difference: \Delta Y_t = Y_t - Y_{t-1} |
| Seasonal non-stationarity | Seasonal pattern is non-stationary | Seasonal differencing: \Delta_m Y_t = Y_t - Y_{t-m} |
| Heteroscedasticity | Variance changes over time | Log or Box-Cox transformation |

5.5 Differencing

First differencing (d = 1) removes a stochastic linear trend (unit root):

\Delta Y_t = Y_t - Y_{t-1} = (1 - L) Y_t

Second differencing (d = 2) removes a stochastic quadratic trend:

\Delta^2 Y_t = \Delta(\Delta Y_t) = Y_t - 2Y_{t-1} + Y_{t-2} = (1-L)^2 Y_t

Seasonal differencing of order m removes seasonal non-stationarity:

\Delta_m Y_t = Y_t - Y_{t-m} = (1 - L^m) Y_t

⚠️ Over-differencing introduces unnecessary MA structure and inflates variance. Always apply the minimum number of differences needed to achieve stationarity.

5.6 The Box-Cox Transformation

When the series exhibits heteroscedasticity (variance increases with the level), the Box-Cox transformation stabilises the variance:

Y_t^{(\lambda)} = \begin{cases} \frac{Y_t^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \ln(Y_t) & \text{if } \lambda = 0 \end{cases}

Common choices: \lambda = 1 (no transformation), \lambda = 0.5 (square root), \lambda = 0 (natural logarithm), and \lambda = -1 (reciprocal).

The optimal \lambda can be estimated by maximising the log-likelihood. The Guerrero method provides a fast, robust estimator.
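The transformation itself is a direct transcription of the piecewise formula (pure Python; `box_cox` is named here for illustration):

```python
import math

def box_cox(y, lam):
    """Box-Cox transform; reduces to the natural log when lambda = 0."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

print(box_cox([1.0, math.e], lam=0))   # log transform
print(box_cox([4.0, 9.0], lam=0.5))    # square-root-based transform
```

The transform requires strictly positive data; a constant shift is sometimes added first to satisfy this.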

5.7 Formal Tests for Stationarity

5.7.1 Augmented Dickey-Fuller (ADF) Test

The ADF test tests for the presence of a unit root (stochastic trend). The null hypothesis H_0 is that the series has a unit root (is non-stationary); the alternative is stationarity. The test regression is:

\Delta Y_t = \alpha + \beta t + \gamma Y_{t-1} + \sum_{j=1}^p \delta_j \Delta Y_{t-j} + \epsilon_t

The test statistic is \tau = \hat{\gamma}/SE(\hat{\gamma}), compared to critical values from the Dickey-Fuller distribution (not the standard t-distribution). A small (very negative) \tau and small p-value lead to rejection of H_0 in favour of stationarity.

The lag order p is chosen to remove autocorrelation from the residuals (e.g., using AIC/BIC).
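To make the mechanics concrete, here is a stripped-down Dickey-Fuller statistic computed from first principles (the drift variant with no augmentation lags, i.e. the plain DF rather than the full ADF; in practice you would use a statistics library). The 5% critical value of about -2.86 is the tabulated Dickey-Fuller value for this variant:

```python
import math
import random

def df_tau(y):
    """Dickey-Fuller tau (drift variant): OLS of dY_t on [1, Y_{t-1}]."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    x = y[:-1]
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (di - md) for xi, di in zip(x, dy))
    gamma = sxy / sxx                       # coefficient on Y_{t-1}
    alpha = md - gamma * mx
    resid = [di - alpha - gamma * xi for xi, di in zip(x, dy)]
    s2 = sum(e * e for e in resid) / (n - 2)
    return gamma / math.sqrt(s2 / sxx)      # tau = gamma_hat / SE(gamma_hat)

rng = random.Random(42)
eps = [rng.gauss(0, 1) for _ in range(300)]
walk = [0.0]
for e in eps:
    walk.append(walk[-1] + e)  # random walk: unit root, tau near zero
# White noise is stationary: tau far below the ~-2.86 critical value
print(df_tau(walk), df_tau(eps))
```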

Three variants exist based on the deterministic terms included:

| Variant | Equation | Use Case |
|---|---|---|
| No constant, no trend | \Delta Y_t = \gamma Y_{t-1} + \sum \delta_j \Delta Y_{t-j} + \epsilon_t | Series fluctuates around zero |
| With constant (drift) | Add \alpha | Series has a non-zero mean |
| With constant and trend | Add \alpha + \beta t | Series has both a mean and a linear trend |

5.7.2 KPSS Test (Kwiatkowski-Phillips-Schmidt-Shin)

The KPSS test has the opposite null hypothesis from the ADF test: H_0 is that the series is (trend-)stationary, and the alternative is that it has a unit root.

The test statistic is:

\text{KPSS} = \frac{1}{T^2 \hat{\sigma}^2} \sum_{t=1}^T S_t^2

Where S_t = \sum_{i=1}^t \hat{e}_i is the partial sum of OLS residuals from regressing Y_t on a constant (or constant + trend), and \hat{\sigma}^2 is a long-run variance estimator.

Large KPSS statistic → reject H_0 → series is non-stationary.

💡 Using both ADF and KPSS together is recommended. If ADF fails to reject (suggesting a unit root) and KPSS rejects (also suggesting non-stationarity), difference the series. If ADF rejects and KPSS fails to reject, treat the series as stationary. Any other combination is a contradiction that calls for more careful examination.

5.7.3 Phillips-Perron (PP) Test

The Phillips-Perron test is a non-parametric modification of the Dickey-Fuller test. Instead of adding lagged difference terms to control for serial correlation (as ADF does), it uses a non-parametric correction to the test statistic. It is more robust to heteroscedasticity and serial correlation in the errors.

Decision rule and interpretation are the same as for the ADF test.

5.8 Determining the Order of Differencing

| Evidence | Action |
|---|---|
| ADF: fail to reject H_0; KPSS: reject H_0 | Apply first difference (d = 1) |
| After first difference, ADF: reject H_0; KPSS: fail to reject | Series is I(1); use d = 1 in ARIMA |
| After first difference, still non-stationary | Apply second difference (d = 2); rarely needed |
| ACF decays very slowly | Strong evidence of non-stationarity; difference required |
| ACF cuts off quickly | Series may already be stationary |

6. Autocorrelation and Partial Autocorrelation

The autocorrelation function (ACF) and partial autocorrelation function (PACF) are the primary tools for identifying the structure of a time series and selecting appropriate model orders.

6.1 Autocovariance Function

For a stationary process, the autocovariance at lag h is:

\gamma(h) = \text{Cov}(Y_t, Y_{t+h}) = E\left[(Y_t - \mu)(Y_{t+h} - \mu)\right]

Note that \gamma(0) = \text{Var}(Y_t) = \sigma^2.

6.2 Autocorrelation Function (ACF)

The autocorrelation at lag h is the autocovariance normalised by the variance:

\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \frac{\text{Cov}(Y_t, Y_{t+h})}{\text{Var}(Y_t)}

Properties: \rho(0) = 1, \rho(h) = \rho(-h) (symmetry), and |\rho(h)| \leq 1 for all h.

Sample ACF: Estimated from data as:

\hat{\rho}(h) = \frac{\sum_{t=1}^{T-h}(Y_t - \bar{Y})(Y_{t+h} - \bar{Y})}{\sum_{t=1}^T (Y_t - \bar{Y})^2}

Bartlett's approximate 95% confidence bounds for testing whether \hat{\rho}(h) = 0:

\pm \frac{1.96}{\sqrt{T}}

Autocorrelations that fall outside these bounds are considered statistically significant.
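The sample ACF estimator is a few lines of plain Python (illustrative, not the DataStatPro implementation):

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat(h) for h = 1..max_lag."""
    n = len(y)
    ybar = sum(y) / n
    denom = sum((v - ybar) ** 2 for v in y)
    return [
        sum((y[t] - ybar) * (y[t + h] - ybar) for t in range(n - h)) / denom
        for h in range(1, max_lag + 1)
    ]

y = [1.0, -1.0] * 4          # strongly alternating series
print(sample_acf(y, 2))      # large negative lag-1, positive lag-2
```

Each value would then be compared against the \pm 1.96/\sqrt{T} bounds to judge significance.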

6.3 Partial Autocorrelation Function (PACF)

The partial autocorrelation at lag h, denoted \phi_{hh}, measures the correlation between Y_t and Y_{t+h} after removing the linear influence of the intervening lags Y_{t+1}, Y_{t+2}, \dots, Y_{t+h-1}.

It can be computed using the Yule-Walker equations:

\begin{pmatrix} \rho(1) \\ \rho(2) \\ \vdots \\ \rho(h) \end{pmatrix} = \begin{pmatrix} 1 & \rho(1) & \cdots & \rho(h-1) \\ \rho(1) & 1 & \cdots & \rho(h-2) \\ \vdots & \vdots & \ddots & \vdots \\ \rho(h-1) & \rho(h-2) & \cdots & 1 \end{pmatrix} \begin{pmatrix} \phi_{h1} \\ \phi_{h2} \\ \vdots \\ \phi_{hh} \end{pmatrix}

The PACF at lag h is the last element \phi_{hh} of the solution vector.

95% confidence bounds for PACF:

\pm \frac{1.96}{\sqrt{T}}
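Rather than inverting the Yule-Walker matrix at every lag, the Durbin-Levinson recursion solves the system one lag at a time. A sketch (`pacf_from_acf` takes a list `rho` with `rho[0] = 1`; names are illustrative):

```python
def pacf_from_acf(rho, max_lag):
    """Durbin-Levinson recursion: PACF phi_hh from autocorrelations rho."""
    phi_prev = []
    pacf = []
    for h in range(1, max_lag + 1):
        if h == 1:
            phi_hh = rho[1]
            phi_curr = [phi_hh]
        else:
            num = rho[h] - sum(phi_prev[j] * rho[h - 1 - j] for j in range(h - 1))
            den = 1 - sum(phi_prev[j] * rho[j + 1] for j in range(h - 1))
            phi_hh = num / den
            # Update the intermediate coefficients phi_{h,j}
            phi_curr = [phi_prev[j] - phi_hh * phi_prev[h - 2 - j]
                        for j in range(h - 1)] + [phi_hh]
        pacf.append(phi_hh)
        phi_prev = phi_curr
    return pacf

# ACF of an AR(1) with phi = 0.5 is 0.5^h; its PACF cuts off after lag 1
print(pacf_from_acf([1.0, 0.5, 0.25, 0.125], 3))
```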

6.4 ACF and PACF as Model Identification Tools

The patterns of ACF and PACF are the fingerprints of different time series models:

| Model | ACF Pattern | PACF Pattern |
|---|---|---|
| White noise | No significant spikes at any lag | No significant spikes at any lag |
| AR(p) | Decays gradually (exponential or sinusoidal) | Cuts off sharply after lag p |
| MA(q) | Cuts off sharply after lag q | Decays gradually (exponential or sinusoidal) |
| ARMA(p,q) | Decays gradually after lag q - p | Decays gradually after lag p - q |
| Non-stationary (unit root) | Decays very slowly (near-unit persistence) | Large spike at lag 1, near zero thereafter |
| Seasonal AR | Significant spikes at multiples of m (decaying) | Spike cuts off at lag m |
| Seasonal MA | Spike at lag m only | Decaying spikes at multiples of m |

💡 In practice, ACF/PACF identification is an art as much as a science. Real data rarely produce textbook-perfect patterns. Use ACF/PACF alongside information criteria (AIC/BIC) for model selection.

6.5 The Ljung-Box Test for Autocorrelation

The Ljung-Box test (a refinement of the earlier Box-Pierce test) tests whether a group of autocorrelations are jointly zero:

Q_{LB}(m) = T(T+2) \sum_{h=1}^m \frac{\hat{\rho}^2(h)}{T-h}

Under H_0 (no autocorrelation up to lag m), Q_{LB}(m) \sim \chi^2_m.

This test is used primarily for residual diagnostics after fitting a model (see Section 10).
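The statistic is mechanical to compute once the sample autocorrelations are available (a sketch; the resulting Q would be compared to a chi-square quantile):

```python
def ljung_box_q(y, m):
    """Ljung-Box Q over lags 1..m; compare to a chi-square with m df."""
    T = len(y)
    ybar = sum(y) / T
    denom = sum((v - ybar) ** 2 for v in y)
    q = 0.0
    for h in range(1, m + 1):
        rho = sum((y[t] - ybar) * (y[t + h] - ybar) for t in range(T - h)) / denom
        q += rho * rho / (T - h)
    return T * (T + 2) * q

# An alternating series is heavily autocorrelated, so Q lands far in the tail
print(ljung_box_q([1.0, -1.0] * 25, m=3))
```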


7. Classical Time Series Models

7.1 Autoregressive (AR) Models

An autoregressive model of order p, denoted AR(p), models the current value as a linear combination of its p most recent past values plus white noise:

Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \epsilon_t

Where c is a constant (intercept), \phi_1, \dots, \phi_p are the autoregressive coefficients, and \epsilon_t is white noise.

Using the lag operator:

(1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p) Y_t = c + \epsilon_t

\Phi(L) Y_t = c + \epsilon_t

Where \Phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p is the AR characteristic polynomial.

7.1.1 Stationarity Condition for AR(p)

An AR(p) process is stationary if and only if all roots of the characteristic polynomial lie outside the unit circle:

\Phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \dots - \phi_p z^p = 0 \implies |z| > 1 \text{ for all roots}

For AR(1): The stationarity condition is simply |\phi_1| < 1.

For AR(2): The stationarity conditions are: |\phi_2| < 1, \quad \phi_1 + \phi_2 < 1, \quad \phi_2 - \phi_1 < 1

7.1.2 Properties of the AR(1) Process

The simplest AR model, AR(1):

Y_t = c + \phi_1 Y_{t-1} + \epsilon_t

For |\phi_1| < 1, the process is stationary with mean \mu = c/(1 - \phi_1), variance \gamma(0) = \sigma^2/(1 - \phi_1^2), and autocorrelation function \rho(h) = \phi_1^h, which decays geometrically.
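The stationary AR(1) has closed-form moments, so a quick numeric check is possible. A sketch transcribing the standard formulas (mean c/(1 - \phi_1), variance \sigma^2/(1 - \phi_1^2), ACF \phi_1^h):

```python
def ar1_properties(c, phi, sigma2):
    """Theoretical mean, variance, and ACF of a stationary AR(1)."""
    assert abs(phi) < 1, "stationarity requires |phi| < 1"
    mean = c / (1 - phi)                 # long-run mean
    variance = sigma2 / (1 - phi ** 2)   # unconditional variance
    acf = lambda h: phi ** h             # geometric decay
    return mean, variance, acf

mean, variance, acf = ar1_properties(c=2.0, phi=0.5, sigma2=1.0)
print(mean, variance, acf(3))
```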

7.2 Moving Average (MA) Models

A moving average model of order q, denoted MA(q), models the current value as a linear combination of the current and q most recent white noise terms:

Y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q}

Where \mu is the mean of the process, \theta_1, \dots, \theta_q are the moving average coefficients, and \epsilon_t is white noise.

Using the lag operator:

Y_t = \mu + (1 + \theta_1 L + \theta_2 L^2 + \dots + \theta_q L^q) \epsilon_t = \mu + \Theta(L) \epsilon_t

7.2.1 Properties of the MA(q) Process

An MA(q) process is always stationary (it is a finite linear combination of white noise terms), has mean \mu, and has an ACF that cuts off exactly after lag q.

7.2.2 Invertibility Condition for MA(q)

An MA(q) process is invertible if all roots of the MA characteristic polynomial \Theta(z) = 1 + \theta_1 z + \dots + \theta_q z^q = 0 lie outside the unit circle (|z| > 1). Invertibility ensures a unique MA representation and is required for estimation and forecasting.

For MA(1): Invertibility requires |\theta_1| < 1.

7.3 ARMA Models

The Autoregressive Moving Average model of orders p and q, denoted ARMA(p,q), combines both AR and MA components:

Y_t = c + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q}

Or compactly using the lag operator:

\Phi(L) Y_t = c + \Theta(L) \epsilon_t

Stationarity requires the AR polynomial roots to lie outside the unit circle. Invertibility requires the MA polynomial roots to lie outside the unit circle. Both must hold for a well-behaved ARMA model.

Properties: both the ACF and the PACF tail off gradually (neither cuts off sharply), and an ARMA model can often represent a series with far fewer parameters than a pure AR or MA model of high order.

7.4 ARIMA Models

The Autoregressive Integrated Moving Average model, denoted ARIMA(p,d,q), extends ARMA to handle non-stationary series by incorporating d rounds of differencing:

\Phi(L)(1-L)^d Y_t = c + \Theta(L) \epsilon_t

Where p is the AR order, d is the number of differences, and q is the MA order.

The differenced series W_t = (1-L)^d Y_t follows an ARMA(p,q) model.

Special Cases:

| Model | Parameters | Description |
|---|---|---|
| ARIMA(0,1,0) | d = 1 | Random walk |
| ARIMA(1,0,0) | p = 1, d = 0 | AR(1) |
| ARIMA(0,0,1) | q = 1, d = 0 | MA(1) |
| ARIMA(0,1,1) | d = 1, q = 1 | Simple exponential smoothing |
| ARIMA(1,1,0) | p = 1, d = 1 | Differenced AR(1) |
| ARIMA(0,2,2) | d = 2, q = 2 | Equivalent to Holt's linear method |

7.5 SARIMA Models

The Seasonal ARIMA model, denoted SARIMA(p,d,q)(P,D,Q)_m, extends ARIMA by adding seasonal AR, differencing, and MA components with seasonal period m:

\Phi(L)\Phi_S(L^m)(1-L)^d(1-L^m)^D Y_t = c + \Theta(L)\Theta_S(L^m)\epsilon_t

Where \Phi_S(L^m) is the seasonal AR polynomial of order P, D is the number of seasonal differences, and \Theta_S(L^m) is the seasonal MA polynomial of order Q.

Common SARIMA Configurations:

| Model | Use Case |
|---|---|
| SARIMA(0,1,1)(0,1,1)_{12} | Airline model (Box-Jenkins); monthly data with seasonal differencing |
| SARIMA(1,1,1)(1,1,1)_{12} | General monthly seasonal model |
| SARIMA(0,1,1)(0,1,1)_4 | Quarterly seasonal model |
| SARIMA(2,1,0)(1,1,0)_{12} | Monthly data with AR structure |

7.6 SARIMAX Models

The SARIMAX model extends SARIMA by including exogenous (external) predictor variables X_t:

\Phi(L)\Phi_S(L^m)(1-L)^d(1-L^m)^D Y_t = c + \beta_1 X_{1t} + \dots + \beta_k X_{kt} + \Theta(L)\Theta_S(L^m)\epsilon_t

Where X_{jt} are exogenous variables (e.g., advertising spend, temperature, day-of-week indicators) and \beta_j are their regression coefficients.

💡 SARIMAX is equivalent to a dynamic regression model with ARIMA errors. The exogenous variables account for the deterministic part of the series structure, while the ARIMA component models the stochastic error structure.

7.7 The Random Walk and Random Walk with Drift

Random Walk: ARIMA(0,1,0) with c = 0:

Y_t = Y_{t-1} + \epsilon_t \implies Y_t = Y_0 + \sum_{i=1}^t \epsilon_i

Random Walk with Drift: ARIMA(0,1,0) with c \neq 0:

Y_t = c + Y_{t-1} + \epsilon_t


8. Exponential Smoothing Models

Exponential smoothing methods are a family of forecasting procedures that generate weighted averages of past observations, with weights that decay exponentially as observations recede into the past. They are intuitive, robust, and widely used in practice.

8.1 Simple Exponential Smoothing (SES)

SES (also called single exponential smoothing) is appropriate for series with no trend and no seasonality (or trend and seasonality that have been removed).

Smoothed Level:

\hat{Y}_{t+1|t} = \ell_t = \alpha Y_t + (1 - \alpha) \ell_{t-1}

Where \alpha \in (0, 1] is the smoothing parameter and \ell_t is the smoothed level at time t. Larger \alpha gives more weight to recent observations.

Equivalently, as a weighted average of all past observations:

\ell_t = \alpha \sum_{j=0}^{t-1}(1-\alpha)^j Y_{t-j} + (1-\alpha)^t \ell_0

Forecasts: \hat{Y}_{t+h|t} = \ell_t for all forecast horizons h \geq 1 (flat forecast line).

Error Correction Form (ETS interpretation):

\ell_t = \ell_{t-1} + \alpha \epsilon_t

Where \epsilon_t = Y_t - \ell_{t-1} is the one-step-ahead forecast error. SES updates the level proportionally to the most recent error.

Equivalence: SES is equivalent to an ARIMA(0,1,1) model with \theta_1 = \alpha - 1.
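The SES recursion is only a few lines. This sketch returns the one-step-ahead forecasts alongside the final level (illustrative names, not the DataStatPro API):

```python
def ses(y, alpha, level0):
    """Simple exponential smoothing: one-step-ahead forecasts and final level."""
    level = level0
    forecasts = []
    for obs in y:
        forecasts.append(level)                    # forecast made before seeing obs
        level = alpha * obs + (1 - alpha) * level  # level update
    return forecasts, level

forecasts, final_level = ses([10.0, 12.0, 11.0], alpha=0.5, level0=10.0)
print(forecasts, final_level)  # every h-step forecast equals final_level (flat line)
```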

8.2 Holt's Linear Exponential Smoothing (Double Exponential Smoothing)

Holt's method extends SES to handle series with a linear trend (but no seasonality), by separately smoothing both the level and the trend.

Level equation:

\ell_t = \alpha Y_t + (1 - \alpha)(\ell_{t-1} + b_{t-1})

Trend (slope) equation:

b_t = \beta^*(\ell_t - \ell_{t-1}) + (1 - \beta^*) b_{t-1}

Where \alpha is the level smoothing parameter and \beta^* \in (0, 1) is the trend smoothing parameter.

Forecasts:

\hat{Y}_{t+h|t} = \ell_t + h \cdot b_t

The h-step-ahead forecast is a linear extrapolation of the trend.

Equivalence: Holt's method is equivalent to ARIMA(0,2,2).

8.2.1 Damped Trend Method

The damped trend modification (Gardner & McKenzie) introduces a damping parameter \phi \in (0, 1) that dampens the trend toward zero for longer forecast horizons, avoiding the unrealistic assumption of a constant growth rate into the indefinite future:

Level: \ell_t = \alpha Y_t + (1-\alpha)(\ell_{t-1} + \phi b_{t-1})

Trend: b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)\phi b_{t-1}

Forecasts: \hat{Y}_{t+h|t} = \ell_t + \left(\sum_{j=1}^h \phi^j\right) b_t = \ell_t + \frac{\phi(1-\phi^h)}{1-\phi} b_t

As h \to \infty, the forecast converges to the constant \ell_t + \phi b_t / (1 - \phi).

💡 The damped trend method is among the most robust and widely recommended methods for general-purpose forecasting. It is less likely to over-extrapolate trends than undamped Holt's method.
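The closed form for the damped forecast can be checked against the explicit geometric sum (a small sketch with illustrative values):

```python
def damped_forecast(level, trend, phi, h):
    """h-step damped-trend forecast: level + phi(1 - phi^h)/(1 - phi) * trend."""
    return level + phi * (1 - phi ** h) / (1 - phi) * trend

level, trend, phi = 100.0, 2.0, 0.8
explicit = level + sum(phi ** j for j in range(1, 4)) * trend
print(damped_forecast(level, trend, phi, h=3), explicit)  # the two agree
# For large h the forecast flattens out near level + phi/(1 - phi) * trend
print(damped_forecast(level, trend, phi, h=500))
```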

8.3 Holt-Winters Exponential Smoothing (Triple Exponential Smoothing)

Holt-Winters extends Holt's method to handle series with both trend and seasonality.

8.3.1 Additive Holt-Winters

Appropriate when the seasonal variation is roughly constant in magnitude.

Level: \ell_t = \alpha(Y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1})

Trend: b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*) b_{t-1}

Seasonal: s_t = \gamma(Y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m}

Forecasts (h steps ahead): \hat{Y}_{t+h|t} = \ell_t + h \cdot b_t + s_{t+h-m(k+1)}

Where k = \lfloor (h-1)/m \rfloor ensures we pick the correct seasonal index.

8.3.2 Multiplicative Holt-Winters

Appropriate when the seasonal variation grows proportionally with the level of the series.

Level: \ell_t = \alpha \frac{Y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1} + b_{t-1})

Trend: b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*) b_{t-1}

Seasonal: s_t = \gamma \frac{Y_t}{\ell_{t-1} + b_{t-1}} + (1-\gamma) s_{t-m}

Forecasts: \hat{Y}_{t+h|t} = (\ell_t + h \cdot b_t) \times s_{t+h-m(k+1)}

Where \gamma is the seasonal smoothing parameter and k = \lfloor (h-1)/m \rfloor selects the correct seasonal index, as in the additive case.
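Both Holt-Winters variants are short recursions. An additive sketch (initial states are assumed given here; in practice they are estimated, and these names are illustrative, not the DataStatPro API):

```python
def holt_winters_additive(y, m, alpha, beta, gamma, level0, trend0, seas0):
    """One pass of additive Holt-Winters; returns the final states."""
    level, trend, seas = level0, trend0, list(seas0)
    for t, obs in enumerate(y):
        s_old = seas[t % m]
        new_level = alpha * (obs - s_old) + (1 - alpha) * (level + trend)
        new_trend = beta * (new_level - level) + (1 - beta) * trend
        # Seasonal update uses the states from before this step
        seas[t % m] = gamma * (obs - level - trend) + (1 - gamma) * s_old
        level, trend = new_level, new_trend
    return level, trend, seas

pattern = [2.0, -1.0, -2.0, 1.0]
y = [10.0 + t + pattern[t % 4] for t in range(12)]
level, trend, seas = holt_winters_additive(y, 4, 0.5, 0.3, 0.2,
                                           level0=9.0, trend0=1.0, seas0=pattern)

def forecast(h, T=len(y)):
    """h-step forecast: linear trend plus the matching seasonal term."""
    return level + h * trend + seas[(T - 1 + h) % 4]

print(forecast(1), forecast(2))
```

For this noise-free toy series the recursion tracks the true level and trend exactly, so the forecasts reproduce the future values.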

8.4 The ETS Framework

The ETS (Error, Trend, Seasonal) framework (Hyndman et al.) provides a unified taxonomy for all exponential smoothing methods based on the nature of the error (additive or multiplicative), the trend (none, additive, or additive damped), and the seasonality (none, additive, or multiplicative).

ETS Model Taxonomy (selected):

| ETS Model | Equivalent Method | Error | Trend | Seasonal |
|---|---|---|---|---|
| ETS(A,N,N) | Simple Exponential Smoothing | A | N | N |
| ETS(A,A,N) | Holt's Linear | A | A | N |
| ETS(A,A_d,N) | Damped Holt's | A | A_d | N |
| ETS(A,A,A) | Additive Holt-Winters | A | A | A |
| ETS(M,A,M) | Multiplicative Holt-Winters | M | A | M |
| ETS(M,A_d,M) | Damped Multiplicative HW | M | A_d | M |

The ETS framework provides a state space representation with a likelihood function, enabling proper maximum likelihood estimation of the smoothing parameters, model selection via information criteria (AIC/AICc/BIC), and prediction intervals with a sound statistical basis.

8.5 Initialisation of Smoothing Parameters

Initial values for \ell_0, b_0, and s_{1-m}, \dots, s_0 are required to start the recursions. Common methods include simple heuristics (e.g., averaging the first few observations, or decomposing the first seasons), backcasting, and treating the initial states as additional parameters optimised jointly with the smoothing parameters.

The DataStatPro application uses optimised initialisation by default.


9. Model Identification, Estimation, and Selection

9.1 The Box-Jenkins Methodology

The Box-Jenkins methodology is the classical framework for ARIMA model building. It proceeds through three iterative stages:

Stage 1 — Identification. Transform and difference the series until it is stationary, then examine the ACF and PACF of the stationary series to propose candidate orders p and q.

Stage 2 — Estimation. Estimate the model parameters (\phi, \theta, \sigma^2), typically by maximum likelihood.

Stage 3 — Diagnostic Checking. Examine the residuals; if they are not white noise, return to Stage 1 and revise the model.

9.2 Maximum Likelihood Estimation (MLE) for ARIMA

For ARIMA models, parameters are estimated by maximising the log-likelihood of the observed data. Assuming Gaussian innovations, the exact log-likelihood is:

\ell(\phi, \theta, \sigma^2) = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^T \ln f_t - \frac{1}{2}\sum_{t=1}^T \frac{\epsilon_t^2}{f_t}

Where f_t is the conditional variance of Y_t given all past values (computed via the Kalman filter for exact likelihood, or recursively for conditional likelihood). MLE is solved numerically using iterative algorithms (e.g., L-BFGS-B, Newton-Raphson).

9.3 Information Criteria for Model Selection

Information criteria penalise the log-likelihood for model complexity to avoid overfitting.

Akaike Information Criterion (AIC):

\text{AIC} = -2\ell(\hat{\boldsymbol{\theta}}) + 2k

Corrected AIC (AICc) — recommended for small samples:

\text{AICc} = \text{AIC} + \frac{2k(k+1)}{T - k - 1}

Bayesian Information Criterion (BIC):

\text{BIC} = -2\ell(\hat{\boldsymbol{\theta}}) + k \ln(T)

Where \ell(\hat{\boldsymbol{\theta}}) is the maximised log-likelihood, k is the number of estimated parameters, and T is the sample size.

Rules: lower values are better; compare criteria only across models fitted to the same (identically differenced) data; prefer AICc when T is small relative to k; BIC penalises complexity more heavily and tends to select smaller models.
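Given a maximised log-likelihood, the three criteria are one-liners (a direct transcription of the formulas, with illustrative input values):

```python
import math

def info_criteria(loglik, k, T):
    """AIC, AICc, and BIC from a maximised log-likelihood."""
    aic = -2 * loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (T - k - 1)
    bic = -2 * loglik + k * math.log(T)
    return aic, aicc, bic

aic, aicc, bic = info_criteria(loglik=-100.0, k=3, T=50)
print(aic, aicc, bic)  # AICc exceeds AIC; BIC penalises k more once ln(T) > 2
```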

9.4 Automatic ARIMA Selection (auto.ARIMA)

Searching all possible combinations of (p, d, q)(P, D, Q)_m is computationally expensive. The Hyndman-Khandakar algorithm (implemented as auto.arima in R and replicated in DataStatPro) automates ARIMA selection:

  1. Determine d and D using unit root tests (KPSS for d; Canova-Hansen or KPSS on the seasonally differenced series for D).
  2. Start with a default model (e.g., ARIMA(2,d,2)(1,D,1)_m).
  3. Evaluate neighbouring models (varying p, q, P, Q by ±1).
  4. Select the model with the lowest AICc.
  5. Repeat until no neighbouring model improves AICc.

⚠️ Automatic selection is a useful starting point but should not replace domain knowledge and manual inspection of ACF/PACF plots, residual diagnostics, and out-of-sample forecast evaluation.

9.5 Forecast Accuracy Metrics

To compare competing models, accuracy metrics are computed on a hold-out (test) set of the last h observations not used in model fitting. Let e_t = Y_t - \hat{Y}_t denote the forecast error.

Mean Error (ME): ME = \frac{1}{h} \sum_{t=1}^h e_t

Mean Absolute Error (MAE): MAE = \frac{1}{h} \sum_{t=1}^h |e_t|

Root Mean Squared Error (RMSE): RMSE = \sqrt{\frac{1}{h} \sum_{t=1}^h e_t^2}

Mean Absolute Percentage Error (MAPE): MAPE = \frac{100}{h} \sum_{t=1}^h \left|\frac{e_t}{Y_t}\right|

Mean Absolute Scaled Error (MASE) — scale-free, and defined even when the series contains zeros: MASE = \frac{MAE}{\frac{1}{T-m}\sum_{t=m+1}^T |Y_t - Y_{t-m}|}

Where the denominator is the in-sample MAE of the seasonal naïve forecast. MASE < 1 means the model outperforms the seasonal naïve benchmark.
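All five metrics in one sketch (plain Python; `accuracy` and its argument names are illustrative):

```python
def accuracy(actual, forecast, train, m=1):
    """ME, MAE, RMSE, MAPE, MASE on a hold-out set."""
    e = [a - f for a, f in zip(actual, forecast)]
    h = len(e)
    me = sum(e) / h
    mae = sum(abs(v) for v in e) / h
    rmse = (sum(v * v for v in e) / h) ** 0.5
    mape = 100 / h * sum(abs(v / a) for v, a in zip(e, actual))
    # MASE denominator: in-sample MAE of the seasonal naive forecast
    naive_mae = (sum(abs(train[t] - train[t - m]) for t in range(m, len(train)))
                 / (len(train) - m))
    return {"ME": me, "MAE": mae, "RMSE": rmse, "MAPE": mape, "MASE": mae / naive_mae}

metrics = accuracy(actual=[10.0, 20.0], forecast=[12.0, 18.0], train=[1.0, 2.0, 3.0, 4.0])
print(metrics)
```

Note how ME is zero here even though every forecast is off by 2: positive and negative errors cancel, which is why ME measures bias rather than accuracy.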

| Metric | Scale | Sensitive to Outliers | Notes |
|---|---|---|---|
| MAE | Same as data | No | Easy to interpret |
| RMSE | Same as data | Yes | Penalises large errors more heavily |
| MAPE | Percentage | No | Undefined when Y_t = 0; biased for asymmetric series |
| MASE | Scale-free | No | Preferred for cross-series comparison |

10. Model Diagnostics

After fitting a time series model, residual analysis is essential to verify that the model has adequately captured the structure of the data.

10.1 Residual Definition

For a fitted model, the residuals (one-step-ahead forecast errors) are:

\hat{\epsilon}_t = Y_t - \hat{Y}_{t|t-1}

A well-specified model should produce residuals that are approximately white noise: uncorrelated, zero-mean, and homoscedastic.

10.2 Diagnostic Checks

10.2.1 Time Plot of Residuals

Plot \hat{\epsilon}_t against time. Look for:

- any remaining trend or drift in the mean (the residuals should fluctuate around zero);
- changes in variance over time (possible heteroscedasticity);
- isolated outliers or clusters of large residuals.

10.2.2 ACF of Residuals

Plot the ACF of the residuals. For a well-fitted model:

- no autocorrelation should be significant at any lag;
- roughly 95% of the spikes should fall within the \pm 1.96/\sqrt{T} bounds.

10.2.3 Ljung-Box Test on Residuals

Test the joint significance of autocorrelations up to lag m:

Q_{LB}(m) = T(T+2) \sum_{h=1}^m \frac{\hat{\rho}^2(h)}{T-h} \sim \chi^2_{m-p-q}

For residuals from an ARMA(p,q) fit, the degrees of freedom are adjusted to m - p - q.
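The statistic is straightforward to compute from the sample ACF. A minimal numpy sketch (function name is an assumption; p-values would come from the \chi^2 distribution):

```python
import numpy as np

def ljung_box_q(resid, m, p=0, q=0):
    """Ljung-Box statistic on model residuals, per the formula above.
    Returns the Q statistic and its degrees of freedom (m - p - q)."""
    resid = np.asarray(resid, dtype=float)
    T = resid.size
    z = resid - resid.mean()
    gamma0 = np.sum(z * z) / T
    Q = 0.0
    for h in range(1, m + 1):
        rho_h = np.sum(z[:-h] * z[h:]) / T / gamma0   # sample ACF at lag h
        Q += rho_h ** 2 / (T - h)
    return T * (T + 2) * Q, m - p - q

rng = np.random.default_rng(42)
Q, df = ljung_box_q(rng.standard_normal(200), m=10)
# For white noise, Q should be unremarkable relative to chi^2_10
# (the 95% critical value is about 18.31).
```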

10.2.4 Histogram and Q-Q Plot of Residuals

Assess the normality of the residuals:

- the histogram should be roughly symmetric, bell-shaped, and centred at zero;
- the Q-Q plot points should lie close to the reference line, with no heavy-tailed curvature.

Normality matters chiefly for the validity of prediction intervals; point forecasts are less affected.

10.2.5 Jarque-Bera Test for Normality

A formal test of normality based on the sample skewness \hat{S} and kurtosis \hat{K} (for a normal distribution, \hat{S} \approx 0 and \hat{K} \approx 3):

JB = \frac{T}{6}\left[\hat{S}^2 + \frac{(\hat{K}-3)^2}{4}\right] \sim \chi^2_2 \text{ under } H_0

10.2.6 ARCH-LM Test for Heteroscedasticity

The ARCH-LM test (Engle, 1982) tests whether the residual variance is serially correlated, via the auxiliary regression:

\hat{\epsilon}_t^2 = a_0 + a_1 \hat{\epsilon}_{t-1}^2 + \dots + a_m \hat{\epsilon}_{t-m}^2 + v_t

Test statistic: LM = T \cdot R^2 from this regression, distributed \chi^2_m under H_0 of no ARCH effects.

10.3 Overfitting and Parsimony

A model with too many parameters may overfit the training data (low in-sample errors) but generalise poorly to new data (high out-of-sample errors). The principle of parsimony (Occam's Razor) favours the simplest model that adequately captures the data structure. AICc and BIC both penalise complexity to guard against overfitting.


11. Forecasting and Prediction Intervals

11.1 Point Forecasts

The h-step-ahead point forecast made at time T is the conditional expectation:

\hat{Y}_{T+h|T} = E[Y_{T+h} \mid Y_T, Y_{T-1}, \dots]

For ARIMA models, forecasts are computed recursively using the estimated model equations, replacing unknown future values with their forecasts and unknown future errors with zero.

AR(p) forecast: \hat{Y}_{T+h|T} = \hat{c} + \hat{\phi}_1 \hat{Y}_{T+h-1|T} + \dots + \hat{\phi}_p \hat{Y}_{T+h-p|T}

Where \hat{Y}_{T+j|T} = Y_{T+j} for j \leq 0 (known past values), while for j > 0 the previously computed forecasts are substituted.
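The recursion can be sketched in a few lines of numpy-flavoured Python (the function name is an assumption):

```python
import numpy as np

def ar_forecast(y, phi, c=0.0, h=3):
    """Recursive AR(p) forecasts as in the equation above: known past values
    are used where available, previously computed forecasts otherwise."""
    history = list(np.asarray(y, dtype=float))
    forecasts = []
    for _ in range(h):
        # zip stops after len(phi) terms, i.e. the p most recent values
        yhat = c + sum(p_k * v for p_k, v in zip(phi, history[::-1]))
        forecasts.append(yhat)
        history.append(yhat)        # feed the forecast back into the recursion
    return forecasts

# AR(1) with phi_1 = 0.5, c = 0, last observation 2.0:
fcst = ar_forecast([2.0], phi=[0.5], h=3)   # → [1.0, 0.5, 0.25]
```

With no constant and |phi_1| < 1, the forecasts decay geometrically towards zero, the process mean.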

11.2 Forecast Error and Variance

The h-step-ahead forecast error is:

e_{T+h|T} = Y_{T+h} - \hat{Y}_{T+h|T}

For ARIMA models, Y_t has an infinite MA representation (the Wold decomposition):

Y_t = \mu + \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}

Where the \psi-weights can be derived recursively from the AR and MA polynomials. The variance of the h-step-ahead forecast error is:

\text{Var}(e_{T+h|T}) = \sigma^2 \sum_{j=0}^{h-1} \psi_j^2

For h = 1: \text{Var}(e_{T+1|T}) = \sigma^2 (the one-step forecast error variance equals the innovation variance).

The forecast uncertainty grows with the forecast horizon h, reflecting increasing uncertainty about the distant future.

11.3 Prediction Intervals

A (1-\alpha) \times 100\% prediction interval for Y_{T+h} is:

\hat{Y}_{T+h|T} \pm z_{\alpha/2} \cdot \sqrt{\hat{\sigma}^2 \sum_{j=0}^{h-1} \hat{\psi}_j^2}

For a 95% prediction interval, z_{0.025} = 1.96.

Key properties:

- intervals widen as the horizon h grows, because the forecast error variance accumulates;
- the standard formula assumes normally distributed errors;
- intervals computed this way ignore parameter-estimation uncertainty, so they tend to be slightly too narrow in practice.

11.4 Bootstrap Prediction Intervals

When residuals are non-normal, bootstrap prediction intervals are more reliable:

  1. Fit the model; save the standardised residuals \hat{\epsilon}_t^* = \hat{\epsilon}_t / \hat{\sigma}.
  2. For each bootstrap replicate b = 1, \dots, B: (a) simulate future errors by sampling with replacement from \{\hat{\epsilon}_t^*\}; (b) generate a simulated future path Y_{T+1}^{(b)}, \dots, Y_{T+h}^{(b)} using the fitted model.
  3. The \alpha/2 and 1-\alpha/2 percentiles of \{Y_{T+h}^{(b)}\} form the bootstrap PI.

Bootstrap PIs are distribution-free and automatically capture non-normality and non-linearity.
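The three steps above can be sketched for an ARIMA(0,1,1) model. This is a simplified illustration under stated assumptions: `theta1` and `residuals` are taken as given from a previously fitted model, and the last in-sample innovation is approximated by a resample.

```python
import numpy as np

def bootstrap_pi(y_last, theta1, residuals, h=3, B=500, alpha=0.05, seed=0):
    """Bootstrap prediction intervals for ARIMA(0,1,1):
    Delta Y_t = eps_t + theta1 * eps_{t-1}."""
    rng = np.random.default_rng(seed)
    resid = np.asarray(residuals, dtype=float)
    paths = np.empty((B, h))
    for b in range(B):
        eps_prev = rng.choice(resid)   # stand-in for the last in-sample innovation
        level = y_last
        for j in range(h):
            eps = rng.choice(resid)    # resample future errors with replacement
            level = level + eps + theta1 * eps_prev
            eps_prev = eps
            paths[b, j] = level
    lower = np.quantile(paths, alpha / 2, axis=0)
    upper = np.quantile(paths, 1 - alpha / 2, axis=0)
    return lower, upper

demo_resid = np.random.default_rng(1).standard_normal(150)
lower, upper = bootstrap_pi(100.0, -0.35, demo_resid, h=3)
```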

11.5 Benchmark Forecasting Methods

Before applying sophisticated models, it is good practice to compare against simple benchmark methods:

| Method | Formula | Use Case |
| --- | --- | --- |
| Naïve | \hat{Y}_{T+h\mid T} = Y_T | Random-walk-like series |
| Seasonal Naïve | \hat{Y}_{T+h\mid T} = Y_{T+h-mk}, \; k = \lceil h/m \rceil | Strongly seasonal series |
| Drift | \hat{Y}_{T+h\mid T} = Y_T + h \frac{Y_T - Y_1}{T-1} | Trending series |
| Mean | \hat{Y}_{T+h\mid T} = \bar{Y} | Stationary series around a constant mean |

Any proposed model should outperform these benchmarks (MASE < 1 relative to seasonal naïve).
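The four benchmarks in the table can be computed in a few lines (a sketch; the function name is an assumption):

```python
import numpy as np

def benchmarks(y, h, m):
    """Naive, seasonal naive, drift, and mean forecasts for horizons 1..h."""
    y = np.asarray(y, dtype=float)
    T = y.size
    hs = np.arange(1, h + 1)
    naive = np.full(h, y[-1])
    k = np.ceil(hs / m).astype(int)
    snaive = y[T + hs - m * k - 1]                 # Y_{T+h-mk}, 0-based indexing
    drift = y[-1] + hs * (y[-1] - y[0]) / (T - 1)
    mean = np.full(h, y.mean())
    return {"naive": naive, "snaive": snaive, "drift": drift, "mean": mean}

f = benchmarks([1, 2, 3, 4, 5, 6, 7, 8], h=4, m=4)
# naive repeats 8; snaive repeats the last season [5, 6, 7, 8];
# drift extrapolates the average historical slope; mean repeats 4.5.
```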


12. Advanced Topics

12.1 ARCH and GARCH Models

When residuals exhibit volatility clustering (periods of high volatility followed by high volatility, and calm followed by calm), standard ARIMA models with constant variance \sigma^2 are inadequate.

12.1.1 ARCH(m) Model

The Autoregressive Conditional Heteroscedasticity model (Engle, 1982) models the conditional variance as:

\sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \alpha_2 \epsilon_{t-2}^2 + \dots + \alpha_m \epsilon_{t-m}^2

Where \omega > 0 and \alpha_j \geq 0 to ensure positive variance. The conditional variance depends on past squared residuals.

12.1.2 GARCH(p,q) Model

The Generalised ARCH model (Bollerslev, 1986) adds lagged conditional variances to the equation:

\sigma_t^2 = \omega + \sum_{i=1}^q \alpha_i \epsilon_{t-i}^2 + \sum_{j=1}^p \beta_j \sigma_{t-j}^2

With constraints \omega > 0, \alpha_i \geq 0, \beta_j \geq 0, and \sum \alpha_i + \sum \beta_j < 1 for covariance stationarity.

The GARCH(1,1) model is by far the most widely used:

\sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2

It captures the fact that large shocks to volatility decay slowly (volatility persistence = \alpha_1 + \beta_1, typically close to 1 for financial data).
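Simulating a GARCH(1,1) process makes volatility clustering visible. A numpy sketch (the function name is an assumption; the unconditional variance is \omega / (1 - \alpha_1 - \beta_1)):

```python
import numpy as np

def simulate_garch11(T, omega, alpha1, beta1, seed=0):
    """Simulate eps_t = sigma_t * z_t with the GARCH(1,1) variance recursion."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T)
    sigma2 = np.empty(T)
    eps = np.empty(T)
    sigma2[0] = omega / (1.0 - alpha1 - beta1)   # start at the long-run variance
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, T):
        sigma2[t] = omega + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * z[t]
    return eps, sigma2

# Persistence alpha1 + beta1 = 0.95; long-run variance = 0.1 / 0.05 = 2.0
eps, sigma2 = simulate_garch11(5000, omega=0.1, alpha1=0.1, beta1=0.85)
```

Plotting `eps` would show calm stretches interrupted by bursts of large returns, the clustering pattern described above.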

12.2 Vector Autoregression (VAR)

When analysing multiple time series simultaneously and modelling their interdependencies, VAR models extend univariate AR models to a multivariate setting.

A VAR(p) model for a K-dimensional vector \mathbf{Y}_t = (Y_{1t}, Y_{2t}, \dots, Y_{Kt})^T is:

\mathbf{Y}_t = \mathbf{c} + \mathbf{A}_1 \mathbf{Y}_{t-1} + \mathbf{A}_2 \mathbf{Y}_{t-2} + \dots + \mathbf{A}_p \mathbf{Y}_{t-p} + \boldsymbol{\epsilon}_t

Where:

- \mathbf{c} is a K \times 1 vector of intercepts;
- each \mathbf{A}_i is a K \times K coefficient matrix;
- \boldsymbol{\epsilon}_t is a K \times 1 vector of white-noise errors with covariance matrix \boldsymbol{\Sigma}.

VAR models are used for:

- joint forecasting of several interrelated series;
- Granger-causality analysis;
- impulse response analysis (tracing the effect of a shock in one variable on the others);
- forecast error variance decomposition.

12.3 Structural Breaks

A structural break is a sudden change in the parameters of a time series model — e.g., a shift in the mean, a change in slope, or a change in variance — caused by an external event (financial crisis, policy change, pandemic).

Detection methods:

- visual inspection of the time plot for sudden level or slope shifts;
- CUSUM tests on recursive residuals;
- the Chow test (when the candidate break date is known);
- the Bai-Perron procedure (multiple breaks at unknown dates).

Handling structural breaks:

- model the segments before and after the break separately;
- include dummy (intervention) variables for the break;
- restrict estimation to the post-break sample when only recent behaviour matters.

12.4 Spectral Analysis

Spectral analysis (frequency-domain analysis) decomposes a time series into sinusoidal components of different frequencies, revealing cyclical patterns.

The spectral density (power spectrum) at frequency \omega \in [0, \pi] is:

f(\omega) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \gamma(h) e^{-i\omega h} = \frac{\sigma^2}{2\pi} \left|\frac{\Theta(e^{-i\omega})}{\Phi(e^{-i\omega})}\right|^2

Peaks in the spectral density correspond to dominant cyclical frequencies. The periodogram is the sample estimate of the spectral density.


13. Using the Time Series Component

The Time Series component in the DataStatPro application provides a full end-to-end workflow for analysing and forecasting time series data.

Step-by-Step Guide

Step 1 — Select Dataset Choose the dataset from the "Dataset" dropdown. The dataset should contain:

- a numeric variable holding the observed values;
- a date or index column defining the time ordering of the observations.

Step 2 — Select Time Series Variable Select the numeric variable to analyse from the "Time Series Variable" dropdown.

Step 3 — Select Date/Index Column Select the column identifying the time ordering of observations. Specify the frequency/period (e.g., monthly = 12, quarterly = 4, weekly = 52, daily = 365, hourly = 24).

Step 4 — Select Analysis Type Choose the type of analysis to perform:

- decomposition (trend, seasonal, and irregular components);
- stationarity testing with ACF/PACF exploration;
- model fitting and forecasting (ARIMA/SARIMA or exponential smoothing).

Step 5 — Configure Preprocessing

- optionally apply a variance-stabilising transformation (log or Box-Cox);
- set the regular and seasonal differencing orders, or rely on the stationarity tests to suggest them.

Step 6 — Configure Model

For ARIMA/SARIMA:

- specify the orders p, d, q (and, for seasonal data, P, D, Q and the period m), or enable automatic order selection (Section 9.4).

For Exponential Smoothing (ETS):

- choose the error, trend, and seasonal types (additive or multiplicative) and whether the trend is damped, or let the application select the ETS form automatically.

Step 7 — Set Forecast Horizon Specify the number of periods to forecast ahead (h). Choose the confidence level for prediction intervals (default: 95%).

Step 8 — Select Display Options Choose which outputs to display:

- decomposition and forecast plots;
- ACF/PACF and residual diagnostic plots;
- model summary, accuracy metrics, and forecast tables.

Step 9 — Run the Analysis Click "Run Time Series Analysis". The application will:

  1. Parse and sort the time series by date/index.
  2. Apply any specified transformations.
  3. Run stationarity tests and produce ACF/PACF plots.
  4. Fit the specified (or automatically selected) model.
  5. Run residual diagnostics.
  6. Generate forecasts with prediction intervals.
  7. Compute accuracy metrics on the hold-out set (if configured).
  8. Produce all selected visualisations and tables.

14. Computational and Formula Details

14.1 The Wold Decomposition and ψ-Weights

Any covariance-stationary process can be represented as an infinite MA:

Y_t - \mu = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}, \quad \psi_0 = 1

For an ARMA(p,q) model, the ψ-weights satisfy the recursion:

\psi_j = \theta_j + \sum_{k=1}^{\min(j,p)} \phi_k \psi_{j-k}, \quad j = 1, 2, \dots

Where \theta_j = 0 for j > q and \phi_k = 0 for k > p.

These weights are used to compute forecast error variances and prediction intervals.
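The recursion and the resulting forecast error variance can be sketched as follows (function names are assumptions):

```python
import numpy as np

def psi_weights(phi, theta, n):
    """psi-weight recursion: psi_0 = 1 and
    psi_j = theta_j + sum_{k=1}^{min(j,p)} phi_k * psi_{j-k}."""
    psi = np.zeros(n + 1)
    psi[0] = 1.0
    for j in range(1, n + 1):
        psi[j] = theta[j - 1] if j <= len(theta) else 0.0
        for k in range(1, min(j, len(phi)) + 1):
            psi[j] += phi[k - 1] * psi[j - k]
    return psi

# AR(1) with phi_1 = 0.5: psi_j = 0.5**j
psi = psi_weights([0.5], [], 4)
# Forecast error variance at horizon h = 3 with sigma^2 = 1:
var_h3 = 1.0 * np.sum(psi[:3] ** 2)   # 1 + 0.25 + 0.0625 = 1.3125
```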

14.2 Yule-Walker Equations for AR Parameter Estimation

For an AR(p) model, the Yule-Walker equations relate the ACF to the AR coefficients:

\begin{pmatrix} \rho(1) \\ \rho(2) \\ \vdots \\ \rho(p) \end{pmatrix} = \begin{pmatrix} 1 & \rho(1) & \cdots & \rho(p-1) \\ \rho(1) & 1 & \cdots & \rho(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ \rho(p-1) & \rho(p-2) & \cdots & 1 \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{pmatrix}

Or in matrix form: \boldsymbol{\rho} = \mathbf{R}\boldsymbol{\phi}, giving \hat{\boldsymbol{\phi}} = \mathbf{R}^{-1}\boldsymbol{\rho}.

The innovation variance is: \hat{\sigma}^2 = \hat{\gamma}(0)(1 - \hat{\boldsymbol{\phi}}^T \hat{\boldsymbol{\rho}}).
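Solving the system is a one-liner once the Toeplitz matrix R is built. A numpy sketch (the function name is an assumption), verified against a theoretical AR(2) ACF:

```python
import numpy as np

def yule_walker(rho):
    """Solve R phi = rho for the AR coefficients, where rho = (rho(1),...,rho(p))."""
    rho = np.asarray(rho, dtype=float)
    p = rho.size
    acf = np.r_[1.0, rho]                 # rho(0) = 1 prepended
    # Toeplitz matrix: R[i, j] = rho(|i - j|)
    R = acf[np.abs(np.subtract.outer(np.arange(p), np.arange(p)))]
    return np.linalg.solve(R, rho)

# AR(2) with phi = (0.5, 0.3); the theoretical ACF satisfies
# rho(1) = phi1 / (1 - phi2) and rho(2) = phi1*rho(1) + phi2.
rho1 = 0.5 / (1 - 0.3)
rho2 = 0.5 * rho1 + 0.3
phi_hat = yule_walker([rho1, rho2])   # recovers (0.5, 0.3)
```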

14.3 ARIMA Estimation via the Kalman Filter

Exact MLE for ARIMA models is computed using the Kalman filter, which recursively computes the innovations and their variances.

State space representation of ARIMA(p,d,q):

\mathbf{x}_t = \mathbf{F} \mathbf{x}_{t-1} + \mathbf{g} \epsilon_t \quad \text{(transition equation)}

Y_t = \mathbf{h}^T \mathbf{x}_t + \epsilon_t \quad \text{(measurement equation)}

Where \mathbf{F}, \mathbf{g}, \mathbf{h} are matrices determined by the ARIMA orders. The Kalman filter provides the optimal linear predictor \hat{Y}_{t|t-1} and the innovation e_t = Y_t - \hat{Y}_{t|t-1} at each step. The log-likelihood is:

\ell = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^T \ln f_t - \frac{1}{2}\sum_{t=1}^T \frac{e_t^2}{f_t}

Where f_t = \text{Var}(e_t) is the innovation variance at time t.

14.4 Seasonal Decomposition Formulae

For a series with seasonal period m, the classical decomposition proceeds as:

Step 1: Estimate the trend using a centred moving average (CMA) of order m.

For even m (e.g., monthly data, m = 12):

\hat{T}_t = \frac{1}{24}\left(Y_{t-6} + 2Y_{t-5} + \dots + 2Y_{t+5} + Y_{t+6}\right) = \frac{1}{12}\left(\frac{Y_{t-6}}{2} + Y_{t-5} + \dots + Y_{t+5} + \frac{Y_{t+6}}{2}\right)

Step 2: Compute detrended values.

d_t = \begin{cases} Y_t - \hat{T}_t & \text{(additive)} \\ Y_t / \hat{T}_t & \text{(multiplicative)} \end{cases}

Step 3: Average d_t over all periods of the same season j to get raw seasonal indices:

\bar{s}_j = \frac{1}{n_j} \sum_{t: \text{season}(t) = j} d_t, \quad j = 1, \dots, m

Step 4: Normalise the seasonal indices.

Additive: \hat{s}_j = \bar{s}_j - \frac{1}{m}\sum_{j=1}^m \bar{s}_j (so they sum to zero).

Multiplicative: \hat{s}_j = \bar{s}_j / \left(\frac{1}{m}\sum_{j=1}^m \bar{s}_j\right) (so they average to 1).

Step 5: Compute the irregular component.

\hat{I}_t = \begin{cases} Y_t - \hat{T}_t - \hat{S}_t & \text{(additive)} \\ Y_t / (\hat{T}_t \cdot \hat{S}_t) & \text{(multiplicative)} \end{cases}
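Steps 1-4 for the additive case can be sketched in numpy (the function name is an assumption; edges where the CMA is undefined are left as NaN):

```python
import numpy as np

def classical_additive_decomposition(y, m):
    """Classical additive decomposition, steps 1-4, for an even seasonal period m."""
    y = np.asarray(y, dtype=float)
    T = y.size
    k = m // 2
    # Step 1: centred moving average of order m (weights 1/2, 1, ..., 1, 1/2).
    w = np.r_[0.5, np.ones(m - 1), 0.5] / m
    trend = np.full(T, np.nan)
    trend[k : T - k] = np.convolve(y, w, mode="valid")
    # Step 2: detrended values.
    d = y - trend
    # Steps 3-4: per-season averages, normalised to sum to zero.
    s_bar = np.array([np.nanmean(d[j::m]) for j in range(m)])
    s_hat = s_bar - s_bar.mean()
    return trend, s_hat

# Synthetic quarterly series: linear trend plus seasonal pattern (2, -1, -2, 1).
t = np.arange(40)
season = np.array([2.0, -1.0, -2.0, 1.0])
y = 10 + 0.5 * t + season[t % 4]
trend, s_hat = classical_additive_decomposition(y, m=4)
```

Because the CMA averages over exactly one full seasonal cycle, it recovers the linear trend exactly here, and the seasonal indices match the true pattern.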

14.5 Computing ACF and PACF

Sample ACF at lag h:

\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}, \quad \hat{\gamma}(h) = \frac{1}{T}\sum_{t=1}^{T-h}(Y_t - \bar{Y})(Y_{t+h} - \bar{Y})

Sample PACF via the Durbin-Levinson algorithm:

Initialise: \hat{\phi}_{1,1} = \hat{\rho}(1)

For k = 2, 3, \dots:

\hat{\phi}_{k,k} = \frac{\hat{\rho}(k) - \sum_{j=1}^{k-1}\hat{\phi}_{k-1,j}\hat{\rho}(k-j)}{1 - \sum_{j=1}^{k-1}\hat{\phi}_{k-1,j}\hat{\rho}(j)}

\hat{\phi}_{k,j} = \hat{\phi}_{k-1,j} - \hat{\phi}_{k,k}\hat{\phi}_{k-1,k-j}, \quad j = 1, \dots, k-1

The PACF at lag k is \hat{\phi}_{k,k}.
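The Durbin-Levinson recursion can be implemented directly (the function name is an assumption). For an AR(1) process the theoretical ACF is \rho(h) = \phi^h, and the PACF should cut off after lag 1:

```python
import numpy as np

def pacf_from_acf(rho):
    """Durbin-Levinson recursion: map ACF values rho(1..K) to the PACF
    values phi_{k,k} for k = 1..K."""
    rho = np.asarray(rho, dtype=float)
    K = rho.size
    pacf = np.zeros(K)
    pacf[0] = rho[0]                       # phi_{1,1} = rho(1)
    phi_prev = np.array([rho[0]])
    for k in range(2, K + 1):
        num = rho[k - 1] - np.sum(phi_prev * rho[k - 2::-1])
        den = 1.0 - np.sum(phi_prev * rho[:k - 1])
        phi_kk = num / den
        # Update phi_{k,j} = phi_{k-1,j} - phi_{k,k} * phi_{k-1,k-j}
        phi_prev = np.r_[phi_prev - phi_kk * phi_prev[::-1], phi_kk]
        pacf[k - 1] = phi_kk
    return pacf

# AR(1) with phi = 0.6: rho = (0.6, 0.36, 0.216) → PACF = (0.6, 0, 0)
pacf = pacf_from_acf([0.6, 0.36, 0.216])
```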

14.6 Optimising Exponential Smoothing Parameters

Smoothing parameters (\alpha, \beta^*, \gamma) are estimated by minimising the sum of squared one-step-ahead forecast errors:

SSE(\alpha, \beta^*, \gamma) = \sum_{t=1}^T \left(Y_t - \hat{Y}_{t|t-1}\right)^2

Subject to constraints (e.g., 0 < \alpha < 1, 0 < \beta^* < 1, 0 < \gamma < 1). This is solved using numerical optimisation (e.g., Nelder-Mead, L-BFGS-B), typically starting from a grid of initial values to avoid local minima.
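For the single-parameter SES case, the objective and a coarse grid search (a simple stand-in for the numerical optimisers named above) can be sketched as:

```python
import numpy as np

def ses_sse(y, alpha):
    """Sum of squared one-step-ahead errors for simple exponential smoothing,
    initialising the level at the first observation."""
    level = y[0]
    sse = 0.0
    for obs in y[1:]:
        err = obs - level                      # one-step-ahead forecast error
        sse += err ** 2
        level = alpha * obs + (1 - alpha) * level
    return sse

def fit_ses_alpha(y, grid=np.linspace(0.01, 0.99, 99)):
    """Pick the grid value of alpha with the smallest SSE."""
    sses = [ses_sse(y, a) for a in grid]
    return grid[int(np.argmin(sses))]

rng = np.random.default_rng(7)
y = np.cumsum(rng.standard_normal(200))   # random walk: alpha near 1 is optimal
alpha_hat = fit_ses_alpha(y)
```

For a random walk the best one-step forecast is the last observation, so the search lands near the upper end of the grid.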


15. Worked Examples

Example 1: ARIMA Modelling of Monthly Sales Data

Data: Monthly sales figures for a retail company, T = 60 months (5 years), no seasonal pattern.

Step 1: Plot and Examine the Series

Visual inspection reveals an upward trend with roughly constant variance → possible ARIMA model with d = 1.

Step 2: Stationarity Testing

ADF test on the original series: \tau = -1.82, p = 0.37 → Fail to reject H_0 → Non-stationary.

Apply the first difference: \Delta Y_t = Y_t - Y_{t-1}.

ADF test on \Delta Y_t: \tau = -5.41, p < 0.01 → Reject H_0 → Stationary after first differencing. Therefore d = 1.

KPSS test on \Delta Y_t: statistic = 0.12, p > 0.10 → Fail to reject H_0 → Stationary. Both tests agree: d = 1.

Step 3: ACF and PACF of \Delta Y_t

| Lag | ACF \hat{\rho}(h) | Significant? | PACF \hat{\phi}_{hh} | Significant? |
| --- | --- | --- | --- | --- |
| 1 | -0.312 | Yes | -0.312 | Yes |
| 2 | 0.051 | No | -0.065 | No |
| 3 | -0.038 | No | -0.052 | No |
| 4 | 0.029 | No | 0.018 | No |

Pattern: the ACF has a single significant spike at lag 1 (cuts off after lag 1); the PACF decays (though here it also approximately cuts off after lag 1). This suggests an MA(1) process for \Delta Y_t, i.e., ARIMA(0,1,1).

Step 4: Fit ARIMA(0,1,1)

Estimated model:

\Delta Y_t = \hat{\epsilon}_t + \hat{\theta}_1 \hat{\epsilon}_{t-1}

\hat{\theta}_1 = -0.348 \quad (SE = 0.117, \quad z = -2.97, \quad p = 0.003)

\hat{\sigma}^2 = 156.4, \quad \text{AICc} = 423.7

For completeness, also fit ARIMA(1,1,0): \hat{\phi}_1 = -0.332, AICc = 424.3. ARIMA(0,1,1) has the lower AICc → preferred.

Step 5: Residual Diagnostics

Model passes all diagnostic checks.

Step 6: Forecasting

For h = 1, 2, 3 steps ahead, the ARIMA(0,1,1) forecasts are:

\hat{Y}_{T+1|T} = Y_T + \hat{\theta}_1 \hat{\epsilon}_T \quad (h = 1)

\hat{Y}_{T+h|T} = \hat{Y}_{T+1|T} \quad (h \geq 2, \text{ flat after } h = 1)

The \psi-weights: \psi_0 = 1, \psi_j = 1 + \hat{\theta}_1 = 1 + (-0.348) = 0.652 for j \geq 1.

95% Prediction Intervals:

\text{Var}(e_{T+h|T}) = \hat{\sigma}^2 \sum_{j=0}^{h-1}\psi_j^2

For h = 1: \text{Var} = 156.4 \times 1 = 156.4, SE = 12.51, PI = \hat{Y}_{T+1|T} \pm 1.96 \times 12.51

For h = 2: \text{Var} = 156.4 \times (1 + 0.652^2) = 156.4 \times 1.425 = 222.9, SE = 14.93

For h = 3: \text{Var} = 156.4 \times (1 + 0.652^2 + 0.652^2) = 156.4 \times 1.850 = 289.4, SE = 17.01

| Horizon | Forecast | 95% PI Lower | 95% PI Upper |
| --- | --- | --- | --- |
| T+1 | 524.3 | 499.8 | 548.8 |
| T+2 | 524.3 | 494.9 | 553.7 |
| T+3 | 524.3 | 490.9 | 557.7 |

Note: Prediction intervals widen with horizon, reflecting growing uncertainty.
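The interval arithmetic of this example can be verified directly from the fitted values \hat{\sigma}^2 = 156.4, \hat{\theta}_1 = -0.348, and the flat point forecast 524.3:

```python
import numpy as np

# ARIMA(0,1,1): psi_0 = 1 and psi_j = 1 + theta_1 for j >= 1.
sigma2, theta1, forecast, z = 156.4, -0.348, 524.3, 1.96
psi = np.array([1.0, 1 + theta1, 1 + theta1])      # psi_0, psi_1, psi_2
se = np.sqrt(sigma2 * np.cumsum(psi ** 2))         # SE for h = 1, 2, 3
lower, upper = forecast - z * se, forecast + z * se
```

The computed SEs and interval bounds reproduce the table values up to rounding.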


Example 2: SARIMA Modelling of Monthly Airline Passenger Data

Data: Monthly international airline passenger counts (thousands), T = 144 months (12 years). This is the classic Box-Jenkins "airline dataset."

Step 1: Plot and Transform

The series shows:

- a clear upward trend;
- strong annual seasonality (m = 12);
- seasonal swings that grow with the level (multiplicative seasonality).

Apply a log transformation (\lambda = 0) to stabilise the variance:

W_t = \ln(Y_t)

Step 2: Determine Differencing Orders

ADF test on W_t: non-stationary → apply regular differencing (d = 1).

Canova-Hansen test on \Delta W_t: seasonal non-stationarity detected → apply seasonal differencing (D = 1, m = 12).

Let V_t = \Delta\Delta_{12} W_t = (1-L)(1-L^{12}) W_t.

ADF test on V_t: stationary ✅. Therefore d = 1, D = 1.

Step 3: ACF and PACF of V_t

Significant ACF spikes at lags 1 and 12; the PACF decays from lag 1 and shows a spike at lag 12. This pattern strongly suggests:

- a non-seasonal MA(1) term (q = 1);
- a seasonal MA(1) term (Q = 1).

Candidate model: SARIMA(0,1,1)(0,1,1)_{12} — the airline model.

Step 4: Fit SARIMA(0,1,1)(0,1,1)_{12} on \ln(Y_t)

\Delta\Delta_{12} W_t = (1 + \hat{\theta}_1 L)(1 + \hat{\Theta}_1 L^{12}) \epsilon_t

| Parameter | Estimate | SE | z-value | p-value |
| --- | --- | --- | --- | --- |
| \hat{\theta}_1 (non-seasonal MA) | -0.402 | 0.083 | -4.84 | < 0.001 |
| \hat{\Theta}_1 (seasonal MA) | -0.557 | 0.073 | -7.63 | < 0.001 |
| \hat{\sigma}^2 | 0.00134 |  |  |  |

AICc = −467.3.

Step 5: Residual Diagnostics

Residual diagnostics (ACF plot, Ljung-Box test, Q-Q plot) show no evidence of remaining structure; the model is adequate for forecasting.

Step 6: Forecasting (12 months ahead)

Forecasts are generated on the log scale and back-transformed with a bias correction using half the h-step forecast error variance on the log scale, \hat{\sigma}_h^2 = \hat{\sigma}^2 \sum_{j=0}^{h-1}\hat{\psi}_j^2:

\hat{Y}_{T+h|T} = \exp\left(\hat{W}_{T+h|T} + \frac{\hat{\sigma}_h^2}{2}\right)

| Month | \hat{W}_{T+h} | \hat{Y}_{T+h} (000s) | 95% PI Lower | 95% PI Upper |
| --- | --- | --- | --- | --- |
| T+1 | 6.181 | 483.5 | 447.2 | 522.8 |
| T+6 | 6.302 | 545.1 | 490.3 | 606.2 |
| T+12 | 6.249 | 513.7 | 446.9 | 590.4 |

Example 3: Holt-Winters Forecasting of Quarterly Retail Sales

Data: Quarterly retail sales (T = 40 quarters, 10 years). Clear upward trend and stable multiplicative seasonality.

Selected Model: ETS(M,A,M) — Multiplicative Holt-Winters.

Estimated Parameters (via SSE minimisation):

\hat{\alpha} = 0.312, \quad \hat{\beta}^* = 0.058, \quad \hat{\gamma} = 0.183, \quad \hat{\phi} = 1.000 \text{ (not damped)}

Final State Values at t = T:

\hat{\ell}_T = 412.6, \quad \hat{b}_T = 4.82

Seasonal Indices:

| Quarter | \hat{s} |
| --- | --- |
| Q1 | 0.863 |
| Q2 | 0.971 |
| Q3 | 1.048 |
| Q4 | 1.118 |

Check: 0.863 + 0.971 + 1.048 + 1.118 = 4.000, average = 1.00 ✅

4-Step-Ahead Forecasts:

\hat{Y}_{T+h|T} = (\hat{\ell}_T + h \hat{b}_T) \times \hat{s}_{T+h-m}

h = 1 (Q1 of next year): \hat{Y}_{T+1|T} = (412.6 + 1 \times 4.82) \times 0.863 = 417.42 \times 0.863 = 360.2

h = 2 (Q2): \hat{Y}_{T+2|T} = (412.6 + 2 \times 4.82) \times 0.971 = 422.24 \times 0.971 = 410.0

h = 3 (Q3): \hat{Y}_{T+3|T} = (412.6 + 3 \times 4.82) \times 1.048 = 427.06 \times 1.048 = 447.6

h = 4 (Q4): \hat{Y}_{T+4|T} = (412.6 + 4 \times 4.82) \times 1.118 = 431.88 \times 1.118 = 482.8
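These four forecasts follow directly from the final states and seasonal indices reported above:

```python
import numpy as np

# Final Holt-Winters states and seasonal indices from this example.
level, trend = 412.6, 4.82
s_hat = np.array([0.863, 0.971, 1.048, 1.118])   # Q1..Q4 indices
h = np.arange(1, 5)
forecasts = (level + h * trend) * s_hat          # (l_T + h*b_T) * s
```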

Accuracy Metrics (on 8-quarter hold-out set):

| Metric | Value |
| --- | --- |
| MAE | 12.4 |
| RMSE | 15.7 |
| MAPE | 3.2% |
| MASE | 0.61 |

MASE = 0.61 < 1: the Holt-Winters model outperforms the seasonal naïve benchmark by 39%.


16. Common Mistakes and How to Avoid Them

Mistake 1: Fitting ARMA to a Non-Stationary Series

Problem: Applying ARMA models directly to a trending or non-stationary series, producing spurious results, unreliable coefficients, and invalid inference.
Solution: Always test for stationarity (ADF, KPSS) before fitting ARMA. Apply the necessary differencing (and/or transformation) to achieve stationarity. Use ARIMA(p,d,q) with the appropriate d.

Mistake 2: Over-Differencing

Problem: Applying more differences than necessary (e.g., differencing twice when once is sufficient), which introduces unnecessary MA components and inflates forecast variance.
Solution: Apply the minimum number of differences needed to pass stationarity tests. Check whether the differenced series passes ADF/KPSS before differencing again. If the ACF of the differenced series shows a large negative spike at lag 1, it may be over-differenced.

Mistake 3: Ignoring Seasonality

Problem: Fitting a non-seasonal ARIMA model to a clearly seasonal series, leaving seasonal structure in the residuals (which will show spikes in the ACF at seasonal lags).
Solution: Identify the seasonal period m from domain knowledge and data inspection. Apply seasonal differencing if needed (D = 1). Use SARIMA or ETS models with seasonal components. Always check the ACF of residuals at seasonal lags.

Mistake 4: Confusing ACF and PACF Patterns

Problem: Misreading the ACF/PACF and specifying wrong model orders (e.g., using an AR model when MA would be more appropriate).
Solution: Remember: ACF cuts off for MA; PACF cuts off for AR; both decay for ARMA. Use information criteria (AICc, BIC) alongside ACF/PACF to narrow down model orders. Consider multiple candidate models.

Mistake 5: Not Checking Residual Diagnostics

Problem: Accepting a model without verifying that the residuals are white noise, leading to a mis-specified model with poor forecasting performance.
Solution: Always perform a full residual diagnostic check: time plot, ACF plot, Ljung-Box test, histogram, Q-Q plot. If residuals show autocorrelation, refine the model. If they show heteroscedasticity, consider GARCH.

Mistake 6: Evaluating Model Fit on Training Data Only

Problem: Selecting a model based solely on in-sample fit metrics (e.g., the lowest AIC) without validating forecast performance on held-out data.
Solution: Reserve the last h observations as a test set. Compute out-of-sample accuracy metrics (RMSE, MASE). Use time-series cross-validation (rolling-origin or expanding-window) for more robust evaluation.

Mistake 7: Ignoring Structural Breaks

Problem: Fitting a single model over a period that contains a structural break (e.g., a financial crisis, a pandemic), leading to poor fit and unreliable forecasts.
Solution: Plot the series and look for sudden level shifts or trend changes. Test for breaks formally (CUSUM, Chow, Bai-Perron). If breaks are detected, model each segment separately or include dummy variables.

Mistake 8: Treating Multiplicative Seasonality as Additive

Problem: Using an additive decomposition or additive Holt-Winters when seasonal variation grows with the level, underestimating seasonality in peak periods and overestimating in troughs.
Solution: Plot the series and assess whether seasonal swings are roughly constant (additive) or grow proportionally with the level (multiplicative). Apply a log transformation to convert multiplicative to additive, or use the multiplicative ETS model directly.

Mistake 9: Extrapolating Trends Too Far

Problem: Generating long-horizon forecasts from a model with a strong trend, leading to unrealistic forecasts that grow without bound.
Solution: Use the damped trend method (ETS with damping) for longer horizons. Report widening prediction intervals to communicate growing uncertainty. Treat long-range forecasts with appropriate scepticism.

Mistake 10: Using MAPE with Near-Zero Values

Problem: MAPE is undefined or extremely large when the actual values Y_t are zero or close to zero, leading to misleading accuracy assessments.
Solution: Use MASE or RMSE instead of MAPE when the series contains zero or very small values. MASE is always well-defined and has the additional advantage of being scale-free.


17. Troubleshooting

| Issue | Likely Cause | Solution |
| --- | --- | --- |
| ARIMA model fails to converge | Very short series; too many parameters; near-unit-root behaviour | Reduce p, q; check stationarity; use a simpler model |
| AICc selects a very high-order model (p or q > 3) | Insufficient data; non-stationarity not fully addressed; outliers | Increase data length; check stationarity; inspect for outliers |
| Residual ACF shows a significant spike at the seasonal lag (e.g., lag 12) | Seasonal component not modelled | Add a seasonal MA or AR term (Q = 1 or P = 1); apply seasonal differencing (D = 1) |
| Residual ACF has one large negative spike at lag 1 | Over-differencing (d or D too large) | Reduce the differencing order by 1 |
| Prediction intervals are extremely wide | High d or D; large \hat{\sigma}^2; long forecast horizon | Reconsider differencing; check for outliers inflating \hat{\sigma}^2; report a shorter horizon |
| ADF and KPSS tests give contradictory results | Near-unit-root behaviour; small sample | Increase the sample if possible; use the PP test as a tiebreaker; consult the ACF pattern |
| Forecasts quickly converge to a flat line | Random walk structure (d = 1, no AR); or SES applied | Expected behaviour for ARIMA(0,1,1); use Holt's method if a trend is needed |
| Holt-Winters gives very poor out-of-sample accuracy | Wrong seasonality type (additive vs. multiplicative); outliers at the end of the series | Try both additive and multiplicative; inspect and handle outliers |
| GARCH estimation fails to converge | Near-integrated volatility (\alpha + \beta \approx 1); insufficient data | Try IGARCH; increase data; use a simpler ARCH(1) |
| Ljung-Box test is always significant regardless of model | Outliers or structural breaks inflating residual autocorrelation | Identify and handle outliers; test for structural breaks; use robust estimation |
| Seasonal naïve outperforms all fitted models (MASE > 1) | Insufficient data to estimate a model; strong irregular component | Collect more data; consider ensemble approaches; report seasonal naïve as the baseline |
| Log transformation cannot be applied or produces invalid values | Series contains zeros or negative values | Use Box-Cox with \lambda > 0; add a constant before transforming |

18. Quick Reference Cheat Sheet

Core Model Equations

| Model | Equation |
| --- | --- |
| AR(p) | Y_t = c + \sum_{j=1}^p \phi_j Y_{t-j} + \epsilon_t |
| MA(q) | Y_t = \mu + \epsilon_t + \sum_{j=1}^q \theta_j \epsilon_{t-j} |
| ARMA(p,q) | \Phi(L)Y_t = c + \Theta(L)\epsilon_t |
| ARIMA(p,d,q) | \Phi(L)(1-L)^d Y_t = c + \Theta(L)\epsilon_t |
| SARIMA(p,d,q)(P,D,Q)_m | \Phi(L)\Phi_S(L^m)(1-L)^d(1-L^m)^D Y_t = c + \Theta(L)\Theta_S(L^m)\epsilon_t |
| SES | \ell_t = \alpha Y_t + (1-\alpha)\ell_{t-1}; \quad \hat{Y}_{t+h} = \ell_t |
| Holt's | \ell_t = \alpha Y_t + (1-\alpha)(\ell_{t-1}+b_{t-1}); \quad b_t = \beta^*(\ell_t-\ell_{t-1})+(1-\beta^*)b_{t-1}; \quad \hat{Y}_{t+h} = \ell_t + hb_t |
| HW Additive | \hat{Y}_{t+h} = \ell_t + hb_t + s_{t+h-m(k+1)} |
| HW Multiplicative | \hat{Y}_{t+h} = (\ell_t + hb_t) \times s_{t+h-m(k+1)} |
| GARCH(1,1) | \sigma_t^2 = \omega + \alpha_1\epsilon_{t-1}^2 + \beta_1\sigma_{t-1}^2 |

ACF/PACF Pattern Guide

| Model | ACF | PACF |
| --- | --- | --- |
| White Noise | No spikes | No spikes |
| AR(p) | Decays exponentially/sinusoidally | Cuts off after lag p |
| MA(q) | Cuts off after lag q | Decays exponentially/sinusoidally |
| ARMA(p,q) | Decays after lag q-p | Decays after lag p-q |
| Non-stationary | Very slow decay (near 1.0) | Large spike at lag 1 |
| Seasonal AR(P) | Spikes at m, 2m, \dots decaying | Spike at lag m only |
| Seasonal MA(Q) | Spike at lag m only | Spikes at m, 2m, \dots decaying |

Stationarity Tests Summary

| Test | H_0 | H_1 | Significant result means |
| --- | --- | --- | --- |
| ADF | Unit root (non-stationary) | Stationary | p < 0.05: Stationary |
| KPSS | Stationary | Non-stationary (unit root) | p < 0.05: Non-stationary |
| PP | Unit root (non-stationary) | Stationary | p < 0.05: Stationary |

Model Selection Guide

| Scenario | Recommended Model |
| --- | --- |
| No trend, no seasonality | SES / ARIMA(0,1,1) |
| Trend, no seasonality | Holt's / ARIMA(0,2,2) |
| Trend, damped | Damped Holt's / ETS(A,A_d,N) |
| Additive seasonality + trend | Additive HW / SARIMA |
| Multiplicative seasonality + trend | Multiplicative HW / SARIMA on log scale |
| Volatility clustering (financial data) | GARCH(1,1) on residuals |
| Multiple interrelated series | VAR(p) |
| External predictors available | SARIMAX |
| Unknown structure; small dataset | ETS with automatic selection |
| Unknown structure; large dataset | Auto ARIMA |

Differencing Guide

| Pattern in Original Series | Action |
| --- | --- |
| No trend, no seasonality, ACF decays fast | No differencing needed (d = 0, D = 0) |
| Linear trend; ADF non-significant | First difference (d = 1) |
| Quadratic trend | Second difference (d = 2) |
| Seasonal non-stationarity | Seasonal difference (D = 1) |
| Both trend and seasonal non-stationarity | d = 1 and D = 1 |
| Increasing variance | Log or Box-Cox transformation first |

Information Criteria Reference

| Criterion | Formula | Prefer | Notes |
| --- | --- | --- | --- |
| AIC | -2\ell + 2k | Lower | Can overfit with small T |
| AICc | \text{AIC} + \frac{2k(k+1)}{T-k-1} | Lower | Recommended for time series |
| BIC | -2\ell + k\ln(T) | Lower | More parsimonious than AIC |

Forecast Accuracy Metrics

| Metric | Formula | Notes |
| --- | --- | --- |
| MAE | \frac{1}{h}\sum \vert e_t \vert | Intuitive; same units as data |
| RMSE | \sqrt{\frac{1}{h}\sum e_t^2} | Penalises large errors; same units |
| MAPE | \frac{100}{h}\sum \vert e_t/Y_t \vert | Percentage; undefined if Y_t = 0 |
| MASE | \text{MAE} / \text{MAE}_{\text{naïve}} | Scale-free; MASE < 1 beats naïve |

ETS Model Taxonomy

| Error | Trend | Seasonal | ETS Code | Method |
| --- | --- | --- | --- | --- |
| A | N | N | ETS(A,N,N) | SES |
| A | A | N | ETS(A,A,N) | Holt's Linear |
| A | A_d | N | ETS(A,A_d,N) | Damped Holt's |
| A | A | A | ETS(A,A,A) | Additive HW |
| M | A | M | ETS(M,A,M) | Multiplicative HW |
| M | A_d | M | ETS(M,A_d,M) | Damped Multiplicative HW |

This tutorial provides a comprehensive foundation for understanding, applying, and interpreting Time Series Analysis using the DataStatPro application. For further reading, consult Hyndman & Athanasopoulos's "Forecasting: Principles and Practice" (freely available at otexts.com/fpp3), Box, Jenkins, Reinsel & Ljung's "Time Series Analysis: Forecasting and Control", or Brockwell & Davis's "Introduction to Time Series and Forecasting". For feature requests or support, contact the DataStatPro team.