Time Series Analysis: Zero to Hero Tutorial
This comprehensive tutorial takes you from the foundational concepts of time series analysis all the way through advanced modelling, forecasting, and diagnostics, with practical usage within the DataStatPro application. Whether you are a complete beginner or an experienced analyst, this guide is structured to build your understanding step by step.
Table of Contents
- Prerequisites and Background Concepts
- What is Time Series Analysis?
- Components of a Time Series
- Time Series Decomposition
- Stationarity
- Autocorrelation and Partial Autocorrelation
- Classical Time Series Models
- Exponential Smoothing Models
- Model Identification, Estimation, and Selection
- Model Diagnostics
- Forecasting and Prediction Intervals
- Advanced Topics
- Using the Time Series Component
- Computational and Formula Details
- Worked Examples
- Common Mistakes and How to Avoid Them
- Troubleshooting
- Quick Reference Cheat Sheet
1. Prerequisites and Background Concepts
Before diving into time series analysis, it is helpful to be familiar with the following foundational concepts. Do not worry if you are not — each concept is briefly explained here.
1.1 Random Variables and Expectation
A random variable is a variable whose value is the outcome of a random phenomenon. The expectation (mean) of a random variable $X$ with density $f(x)$ is:
$$E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$
The variance is:
$$\operatorname{Var}(X) = E\big[(X - E[X])^2\big] = E[X^2] - (E[X])^2$$
1.2 Covariance and Correlation
The covariance between two random variables $X$ and $Y$ measures how they move together:
$$\operatorname{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]$$
The Pearson correlation normalises covariance to the range $[-1, 1]$:
$$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$
In time series, the covariance between a series and its own lagged values plays a central role.
1.3 The Lag Operator
The lag operator $L$ (also called the backshift operator $B$) shifts a time series back by one period:
$$L y_t = y_{t-1}, \qquad L^k y_t = y_{t-k}$$
The lag operator is a powerful notational tool that simplifies the expression of time series models. For example, the first difference is $\Delta y_t = y_t - y_{t-1} = (1 - L) y_t$.
1.4 White Noise
A sequence $\{\varepsilon_t\}$ is called white noise if:
- $E[\varepsilon_t] = 0$ for all $t$.
- $\operatorname{Var}(\varepsilon_t) = \sigma^2$ for all $t$ (constant variance).
- $\operatorname{Cov}(\varepsilon_t, \varepsilon_s) = 0$ for all $t \neq s$ (no autocorrelation).
White noise is the building block of all time series models. If additionally $\varepsilon_t \sim N(0, \sigma^2)$, it is called Gaussian white noise.
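The three properties are easy to check by simulation. The sketch below (plain Python; the sample size and seed are arbitrary choices of ours) generates Gaussian white noise and confirms that the sample mean is near zero, the sample variance is near $\sigma^2 = 1$, and the lag-1 autocorrelation is negligible:

```python
import random

def sample_acf(y, k):
    """Sample autocorrelation of y at lag k."""
    n = len(y)
    ybar = sum(y) / n
    denom = sum((v - ybar) ** 2 for v in y)
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    return num / denom

random.seed(1)
eps = [random.gauss(0.0, 1.0) for _ in range(5000)]

mean = sum(eps) / len(eps)
var = sum((e - mean) ** 2 for e in eps) / len(eps)
rho1 = sample_acf(eps, 1)

print(round(mean, 3), round(var, 3), round(rho1, 3))
```

With 5000 observations, all three sample statistics sit well inside their sampling error of the theoretical values 0, 1, and 0.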
1.5 The Normal Distribution
The normal distribution $N(\mu, \sigma^2)$ with probability density function:
$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$
is assumed for residuals in many time series models. Departures from normality are assessed during model diagnostics.
2. What is Time Series Analysis?
A time series is a sequence of observations recorded at successive, equally spaced points in time. Time series analysis is the set of methods used to understand the structure of a time series, model its behaviour, and generate forecasts of future values.
2.1 What Makes Time Series Special?
Unlike cross-sectional data (where observations are assumed to be independent), observations in a time series are ordered in time and are typically correlated with past values. This temporal dependence is both a challenge and an opportunity:
- Challenge: Standard statistical methods that assume independence are not directly applicable.
- Opportunity: Past behaviour contains information about future values, enabling forecasting.
2.2 Real-World Applications
Time series analysis is one of the most widely applied quantitative tools across many domains:
- Finance & Economics: Forecasting stock prices, exchange rates, GDP growth, inflation, and interest rates.
- Meteorology: Predicting temperature, rainfall, wind speed, and weather patterns.
- Healthcare: Monitoring patient vital signs over time; forecasting disease incidence and epidemic spread.
- Engineering & Manufacturing: Detecting anomalies in industrial sensors; predicting equipment failure.
- Retail & Supply Chain: Demand forecasting for inventory management and logistics planning.
- Energy: Forecasting electricity demand, solar/wind power generation.
- Transportation: Predicting traffic volumes, airline passenger counts, ride-sharing demand.
- Social Sciences: Analysing unemployment rates, population growth, and crime statistics over time.
2.3 Goals of Time Series Analysis
The primary goals of time series analysis are:
| Goal | Description |
|---|---|
| Description | Summarising the main features of the series (trend, seasonality, variability) |
| Explanation | Understanding relationships between the series and other variables |
| Decomposition | Separating the series into its underlying components |
| Modelling | Fitting a mathematical model that captures the series' structure |
| Forecasting | Predicting future values of the series with associated uncertainty |
| Anomaly Detection | Identifying unusual observations or structural breaks |
| Control | Monitoring a process and intervening when it deviates from target |
2.4 Types of Time Series Data
| Type | Description | Example |
|---|---|---|
| Univariate | A single variable observed over time | Monthly sales figures |
| Multivariate | Multiple variables observed over time | Daily temperature, humidity, and wind speed simultaneously |
| Continuous | Recorded continuously (or at very fine intervals) | ECG heart rate signal |
| Discrete | Recorded at distinct time points | Quarterly GDP |
| Equally spaced | Fixed time intervals between observations | Weekly stock closing prices |
| Irregularly spaced | Variable time intervals | Transaction data, event logs |
The DataStatPro application focuses primarily on univariate, equally spaced discrete time series, which covers the vast majority of applied use cases.
3. Components of a Time Series
Most time series can be understood as a combination of several underlying structural components. Identifying and separating these components is the first step in any time series analysis.
3.1 Trend ($T_t$)
The trend is the long-term, systematic increase or decrease in the level of the series over time. It represents the underlying direction of the data, abstracting away short-term fluctuations.
- Upward trend: Series generally increases over time (e.g., global CO₂ levels, e-commerce sales).
- Downward trend: Series generally decreases over time (e.g., cost of solar panels per watt).
- No trend: Series fluctuates around a constant mean (e.g., a stationary series).
- Non-linear trend: Trend follows a curve (e.g., exponential growth, S-curve adoption).
3.2 Seasonality ($S_t$)
Seasonality refers to regular, periodic fluctuations that repeat at a fixed and known frequency (the seasonal period $m$). Seasonality is caused by calendar or institutional factors.
| Frequency | Seasonal Period | Example |
|---|---|---|
| Monthly data | $m = 12$ | Retail sales peak in December |
| Quarterly data | $m = 4$ | Energy consumption peaks in Q1/Q3 |
| Daily data | $m = 7$ | Traffic higher on weekdays |
| Hourly data | $m = 24$ | Electricity demand peaks at 6–8pm |
💡 Seasonality is distinct from cyclical behaviour — seasonal patterns repeat at fixed intervals (e.g., every 12 months), whereas cycles have variable duration (typically 2–10 years) and are driven by broader economic forces.
3.3 Cyclical Fluctuations ($C_t$)
Cyclical fluctuations are wave-like patterns that occur over periods longer than one seasonal cycle, typically driven by economic or business cycles. Unlike seasonal patterns, cycles:
- Do not have a fixed, regular period.
- Typically span 2–10 years.
- Are harder to model and forecast than seasonal patterns.
3.4 Irregular (Residual) Component ($I_t$)
The irregular component (also called the error, noise, or residual) is the portion of the series that remains after removing trend, seasonality, and cyclical components. It represents:
- Random fluctuations with no predictable structure.
- One-off events: strikes, natural disasters, policy changes, pandemics.
- Measurement error.
In a well-fitted model, the irregular component should resemble white noise.
3.5 Summary of Components
$$y_t = f(T_t, S_t, C_t, I_t)$$
Where the function $f$ depends on the decomposition model (additive or multiplicative — see Section 4).
4. Time Series Decomposition
Decomposition is the process of separating a time series into its constituent components. It provides a clearer picture of the underlying structure and aids in modelling and forecasting.
4.1 Additive Decomposition
In the additive model, the components are assumed to add together:
$$y_t = T_t + S_t + I_t$$
When to use: When the magnitude of seasonal fluctuations is constant over time, regardless of the level of the series. The amplitude of the seasonal swings does not change as the trend rises or falls.
4.2 Multiplicative Decomposition
In the multiplicative model, the components multiply together:
$$y_t = T_t \times S_t \times I_t$$
When to use: When the magnitude of seasonal fluctuations increases or decreases proportionally with the level of the series. As the trend rises, the seasonal swings get larger. Most economic and business time series exhibit multiplicative seasonality.
💡 A multiplicative model can be converted to an additive model by taking logarithms: $\log y_t = \log T_t + \log S_t + \log I_t$. This is a common preprocessing step.
4.3 Moving Average Smoothing for Trend Estimation
A centred moving average (CMA) of order $k$ is the most common method for estimating the trend-cycle component.
For odd $k = 2j + 1$:
$$\hat{T}_t = \frac{1}{k} \sum_{i=-j}^{j} y_{t+i}$$
For even $k$ (requires a $2 \times k$ CMA to maintain centring):
$$\hat{T}_t = \frac{1}{k} \left( \tfrac{1}{2} y_{t-k/2} + y_{t-k/2+1} + \cdots + y_{t+k/2-1} + \tfrac{1}{2} y_{t+k/2} \right)$$
For seasonal data with period $m$, the CMA of order $m$ removes seasonality and smooths the irregular component, leaving the trend-cycle.
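The centred moving average translates into a few lines of code. This sketch (plain Python; the helper name `centred_ma` is ours) handles both the odd and the even case, and demonstrates a useful sanity check — a moving average passes a straight line through unchanged:

```python
def centred_ma(y, m):
    """Centred moving average of order m.

    For even m this is the 2xm-MA: half weight on the two window
    endpoints, full weight in between. Returns a list aligned with y,
    with None where the window does not fit.
    """
    n = len(y)
    out = [None] * n
    if m % 2 == 1:
        j = m // 2
        for t in range(j, n - j):
            out[t] = sum(y[t - j:t + j + 1]) / m
    else:
        h = m // 2
        for t in range(h, n - h):
            w = 0.5 * y[t - h] + sum(y[t - h + 1:t + h]) + 0.5 * y[t + h]
            out[t] = w / m
    return out

# A linear trend passes through unchanged:
trend = [float(t) for t in range(20)]   # 0, 1, ..., 19
smoothed = centred_ma(trend, 4)
print(smoothed[2:18])
```

The `None` entries at the ends are the endpoint problem that classical decomposition suffers from and STL (Section 4.5) avoids.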
4.4 Classical Decomposition Procedure
Step 1: Estimate the Trend-Cycle ($\hat{T}_t$). Apply a centred moving average of order $m$ (the seasonal period).
Step 2: Detrend the Series
- Additive: $y_t - \hat{T}_t$
- Multiplicative: $y_t / \hat{T}_t$
Step 3: Estimate the Seasonal Component ($\hat{S}_t$). Average the detrended values for each season across all years. Normalise so that seasonal indices sum to zero (additive) or average to 1 (multiplicative).
Step 4: Calculate the Irregular Component ($\hat{I}_t$)
- Additive: $\hat{I}_t = y_t - \hat{T}_t - \hat{S}_t$
- Multiplicative: $\hat{I}_t = y_t / (\hat{T}_t \hat{S}_t)$
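The four steps can be put together in a minimal additive decomposition. This is a sketch in plain Python (function name ours; endpoints where the moving average is undefined are left as `None`), verified on a synthetic series built from a known trend and quarterly pattern:

```python
def classical_additive_decompose(y, m):
    """Classical additive decomposition y = T + S + I with period m (even)."""
    n = len(y)
    # Step 1: trend-cycle via 2xm centred moving average.
    h = m // 2
    trend = [None] * n
    for t in range(h, n - h):
        trend[t] = (0.5 * y[t - h] + sum(y[t - h + 1:t + h]) + 0.5 * y[t + h]) / m

    # Step 2: detrend where the trend is defined.
    detr = [y[t] - trend[t] if trend[t] is not None else None for t in range(n)]

    # Step 3: average detrended values per season; centre to sum to zero.
    raw = []
    for s in range(m):
        vals = [detr[t] for t in range(s, n, m) if detr[t] is not None]
        raw.append(sum(vals) / len(vals))
    adj = sum(raw) / m
    idx = [v - adj for v in raw]
    seasonal = [idx[t % m] for t in range(n)]

    # Step 4: irregular = y - T - S.
    irregular = [y[t] - trend[t] - seasonal[t] if trend[t] is not None else None
                 for t in range(n)]
    return trend, seasonal, irregular

# Synthetic quarterly series: linear trend + fixed seasonal pattern (m = 4).
season = [5.0, -2.0, -4.0, 1.0]
y = [0.5 * t + season[t % 4] for t in range(40)]
trend, seas, irr = classical_additive_decompose(y, 4)
```

On this noise-free series the procedure recovers the components exactly: the trend at $t = 10$ is $0.5 \times 10 = 5$, the first seasonal index is $5$, and the irregular component is zero.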
4.5 STL Decomposition (Seasonal and Trend Decomposition using Loess)
STL (Cleveland et al.) is a more robust and flexible decomposition method based on locally weighted regression (Loess). Advantages over classical decomposition:
- Can handle any seasonal period.
- Robust to outliers (uses iterative, robust Loess fitting).
- Allows the seasonal component to change over time.
- Does not require a symmetric moving average window at the series endpoints.
The STL decomposition is controlled by two primary smoothing parameters:
- Seasonal window ($n_s$): Seasonal smoothing window (must be odd; larger = smoother seasonal component).
- Trend window ($n_t$): Trend smoothing window (must be odd; larger = smoother trend).
4.6 Seasonal Adjustment
A seasonally adjusted series is obtained by removing the estimated seasonal component:
- Additive: $y_t^{SA} = y_t - \hat{S}_t$
- Multiplicative: $y_t^{SA} = y_t / \hat{S}_t$
Seasonally adjusted series are widely used in economic reporting (e.g., seasonally adjusted unemployment rate) to reveal the underlying trend more clearly.
5. Stationarity
Stationarity is the single most important concept in classical time series modelling. Nearly all standard time series models (ARMA, ARIMA) assume some form of stationarity.
5.1 Strict Stationarity
A process $\{y_t\}$ is strictly stationary if the joint distribution of $(y_{t_1}, \ldots, y_{t_k})$ is identical to the joint distribution of $(y_{t_1 + h}, \ldots, y_{t_k + h})$ for all $k$, all time points $t_1, \ldots, t_k$, and all shifts $h$. This is a very strong condition.
5.2 Weak (Covariance) Stationarity
Weak stationarity (also called second-order stationarity) requires only:
- Constant mean: $E[y_t] = \mu$ for all $t$.
- Constant variance: $\operatorname{Var}(y_t) = \sigma^2 < \infty$ for all $t$.
- Autocovariance depends only on lag: $\operatorname{Cov}(y_t, y_{t-k}) = \gamma_k$ for all $t$, where $\gamma_k$ is a function of the lag $k$ only, not of $t$.
In practice, "stationarity" almost always refers to weak stationarity. A non-stationary series has a time-varying mean (trend), time-varying variance, or both.
5.3 Why Stationarity Matters
ARMA models are only valid for stationary series. Applying these models to a non-stationary series leads to:
- Spurious regressions (finding illusory relationships between unrelated trending series).
- Unreliable coefficient estimates and invalid inference.
- Poor forecasting performance (predictions diverge to infinity or are systematically biased).
5.4 Types of Non-Stationarity
| Type | Description | Solution |
|---|---|---|
| Trend stationarity | Series has a deterministic trend (mean increases linearly) | Detrend by regression on time |
| Difference stationarity (Unit Root) | Series has a stochastic trend (random walk) | First-difference: $\Delta y_t = y_t - y_{t-1}$ |
| Seasonal non-stationarity | Seasonal pattern is non-stationary | Seasonal differencing: $\Delta_m y_t = y_t - y_{t-m}$ |
| Heteroscedasticity | Variance changes over time | Log or Box-Cox transformation |
5.5 Differencing
First differencing ($d = 1$) removes a stochastic linear trend (unit root):
$$\Delta y_t = (1 - L) y_t = y_t - y_{t-1}$$
Second differencing ($d = 2$) removes a stochastic quadratic trend:
$$\Delta^2 y_t = (1 - L)^2 y_t = y_t - 2 y_{t-1} + y_{t-2}$$
Seasonal differencing of order $m$ removes seasonal non-stationarity:
$$\Delta_m y_t = (1 - L^m) y_t = y_t - y_{t-m}$$
⚠️ Over-differencing introduces unnecessary MA structure and inflates variance. Always apply the minimum number of differences needed to achieve stationarity.
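Differencing is simple to implement directly. This sketch (plain Python; `diff` is our helper name) shows first and seasonal differences, and that differencing a cumulative-sum series recovers the underlying increments:

```python
def diff(y, lag=1):
    """Difference a series: returns y[t] - y[t - lag] for t >= lag."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# An integrated series: the cumulative sum of some fixed increments.
steps = [2, -1, 3, 0, 2, -2, 1, 4]
walk, total = [], 0
for s in steps:
    total += s
    walk.append(total)          # walk = [2, 1, 4, 4, 6, 4, 5, 9]

first = diff(walk)              # first difference recovers the increments
seasonal = diff(walk, 4)        # seasonal difference at lag 4

print(first)
print(seasonal)
```

Note that each difference shortens the series by `lag` observations, which is one practical reason to difference no more than necessary.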
5.6 The Box-Cox Transformation
When the series exhibits heteroscedasticity (variance increases with the level), the Box-Cox transformation stabilises the variance:
$$y_t^{(\lambda)} = \begin{cases} \dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt] \log y_t, & \lambda = 0 \end{cases}$$
Common choices:
- $\lambda = 0$: Natural log transformation (most common).
- $\lambda = 0.5$: Square root transformation.
- $\lambda = 1$: No transformation.
- $\lambda = -1$: Reciprocal transformation.
The optimal $\lambda$ can be estimated by maximising the log-likelihood. The Guerrero method provides a fast, robust estimator.
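The transformation itself is one small function (a sketch; `box_cox` is our name, and the Guerrero estimator for $\lambda$ is not shown). It requires a positive-valued series:

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive-valued series."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

y = [1.0, 10.0, 100.0]
print(box_cox(y, 0))      # natural log
print(box_cox(y, 0.5))    # (scaled, shifted) square root
print(box_cox(y, 1))      # shifts by -1, leaves the shape unchanged
```

Note the $\lambda = 1$ case is not the identity but $y_t - 1$; the $(y^\lambda - 1)/\lambda$ form is used so the family is continuous in $\lambda$ at zero.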
5.7 Formal Tests for Stationarity
5.7.1 Augmented Dickey-Fuller (ADF) Test
The ADF test tests for the presence of a unit root (stochastic trend). The test regression (with constant and trend) is:
$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \, \Delta y_{t-i} + \varepsilon_t$$
- $H_0$: $\gamma = 0$ (unit root present → series is non-stationary).
- $H_1$: $\gamma < 0$ (no unit root → series is stationary).
The test statistic is $\tau = \hat{\gamma} / \operatorname{SE}(\hat{\gamma})$, compared to critical values from the Dickey-Fuller distribution (not the standard $t$-distribution). A small (very negative) $\tau$ and small p-value lead to rejection of $H_0$.
The lag order $p$ is chosen to remove autocorrelation from the residuals (e.g., using AIC/BIC).
Three variants exist based on the deterministic terms included:
| Variant | Equation | Use Case |
|---|---|---|
| No constant, no trend | Series fluctuates around zero | |
| With constant (drift) | Add | Series has a non-zero mean |
| With constant and trend | Add | Series has both a mean and a linear trend |
5.7.2 KPSS Test (Kwiatkowski-Phillips-Schmidt-Shin)
The KPSS test has the opposite null hypothesis from the ADF test:
- $H_0$: Series is stationary (trend-stationary).
- $H_1$: Series has a unit root (non-stationary).
The test statistic is:
$$\text{KPSS} = \frac{1}{n^2 \, \hat{\sigma}^2_{LR}} \sum_{t=1}^{n} S_t^2$$
Where $S_t = \sum_{i=1}^{t} \hat{e}_i$ is the partial sum of OLS residuals from regressing $y_t$ on a constant (or constant + trend), and $\hat{\sigma}^2_{LR}$ is a long-run variance estimator.
Large KPSS statistic → reject $H_0$ → series is non-stationary.
💡 Using both ADF and KPSS together is recommended: if ADF fails to reject its null (suggesting non-stationarity) while KPSS also fails to reject its null (suggesting stationarity), the two tests conflict and the series warrants more careful examination. If both agree, the conclusion is clearer.
5.7.3 Phillips-Perron (PP) Test
The Phillips-Perron test is a non-parametric modification of the Dickey-Fuller test. Instead of adding lagged difference terms to control for serial correlation (as ADF does), it uses a non-parametric correction to the test statistic. It is more robust to heteroscedasticity and serial correlation in the errors.
- $H_0$: Unit root present (non-stationary).
- $H_1$: No unit root (stationary).
Decision rule and interpretation are the same as for the ADF test.
5.8 Determining the Order of Differencing
| Evidence | Action |
|---|---|
| ADF: fail to reject $H_0$; KPSS: reject $H_0$ | Apply first difference ($d = 1$) |
| After first difference, ADF: reject $H_0$; KPSS: fail to reject $H_0$ | Series is I(1); use $d = 1$ in ARIMA |
| After first difference, still non-stationary | Apply second difference ($d = 2$); rarely needed |
| ACF decays very slowly | Strong evidence of non-stationarity; difference required |
| ACF cuts off quickly | Series may already be stationary |
6. Autocorrelation and Partial Autocorrelation
The autocorrelation function (ACF) and partial autocorrelation function (PACF) are the primary tools for identifying the structure of a time series and selecting appropriate model orders.
6.1 Autocovariance Function
For a stationary process, the autocovariance at lag $k$ is:
$$\gamma_k = \operatorname{Cov}(y_t, y_{t-k}) = E\big[(y_t - \mu)(y_{t-k} - \mu)\big]$$
Note that $\gamma_0 = \operatorname{Var}(y_t)$.
6.2 Autocorrelation Function (ACF)
The autocorrelation at lag $k$ is the autocovariance normalised by the variance:
$$\rho_k = \frac{\gamma_k}{\gamma_0}$$
Properties:
- $\rho_0 = 1$ always.
- $|\rho_k| \leq 1$ for all $k$.
- $\rho_{-k} = \rho_k$ (symmetric).
Sample ACF: Estimated from data as:
$$\hat{\rho}_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}$$
Bartlett's approximate 95% confidence bounds for testing whether $\rho_k = 0$:
$$\pm \frac{1.96}{\sqrt{n}}$$
Autocorrelations that fall outside these bounds are considered statistically significant.
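The sample ACF and the Bartlett bounds translate directly into code (a sketch using only the standard library; `sample_acf` is our name). A strongly alternating series makes a good test case, since its lag-1 autocorrelation is close to $-1$:

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations rho_1..rho_max_lag plus the Bartlett 95% bound."""
    n = len(y)
    ybar = sum(y) / n
    denom = sum((v - ybar) ** 2 for v in y)
    acf = []
    for k in range(1, max_lag + 1):
        num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
        acf.append(num / denom)
    bound = 1.96 / n ** 0.5
    return acf, bound

y = [1.0, -1.0] * 50          # perfectly alternating series, n = 100
acf, bound = sample_acf(y, 3)
print([round(r, 2) for r in acf], round(bound, 3))
```

All three autocorrelations fall far outside the $\pm 1.96/\sqrt{100} = \pm 0.196$ bounds, so each is statistically significant.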
6.3 Partial Autocorrelation Function (PACF)
The partial autocorrelation at lag $k$, denoted $\phi_{kk}$, measures the correlation between $y_t$ and $y_{t-k}$ after removing the linear influence of the intervening lags $y_{t-1}, \ldots, y_{t-k+1}$.
It can be computed using the Yule-Walker equations:
$$\begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \cdots & \rho_{k-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{pmatrix} = \begin{pmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{pmatrix}$$
The PACF at lag $k$ is the last element $\phi_{kk}$ of the solution vector.
95% confidence bounds for the PACF: $\pm 1.96 / \sqrt{n}$.
6.4 ACF and PACF as Model Identification Tools
The patterns of ACF and PACF are the fingerprints of different time series models:
| Model | ACF Pattern | PACF Pattern |
|---|---|---|
| White Noise | No significant spikes at any lag | No significant spikes at any lag |
| AR($p$) | Decays gradually (exponential or sinusoidal) | Cuts off sharply after lag $p$ |
| MA($q$) | Cuts off sharply after lag $q$ | Decays gradually (exponential or sinusoidal) |
| ARMA($p$,$q$) | Decays gradually after lag $q$ | Decays gradually after lag $p$ |
| Non-stationary (unit root) | Decays very slowly (near-unit persistence) | Large spike at lag 1, near zero thereafter |
| Seasonal AR | Significant spikes at multiples of $m$ (decaying) | Spike cuts off at lag $m$ |
| Seasonal MA | Spike at lag $m$ only | Decaying spikes at multiples of $m$ |
💡 In practice, ACF/PACF identification is an art as much as a science. Real data rarely produce textbook-perfect patterns. Use ACF/PACF alongside information criteria (AIC/BIC) for model selection.
6.5 The Ljung-Box Test for Autocorrelation
The Ljung-Box test (a modification of the Box-Pierce test) tests whether a group of autocorrelations are jointly zero:
$$Q = n(n+2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^2}{n - k}$$
Under $H_0$ (no autocorrelation up to lag $h$), $Q \sim \chi^2_h$.
- Large $Q$ and small p-value: Significant autocorrelation present (residuals are not white noise).
- Small $Q$ and large p-value: No significant autocorrelation (residuals resemble white noise).
This test is used primarily for residual diagnostics after fitting a model (see Section 10).
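The $Q$ statistic itself is straightforward to compute; a p-value additionally needs a $\chi^2$ CDF, which is omitted from this sketch (helper name ours):

```python
def ljung_box_q(y, h):
    """Ljung-Box Q statistic over lags 1..h."""
    n = len(y)
    ybar = sum(y) / n
    denom = sum((v - ybar) ** 2 for v in y)
    q = 0.0
    for k in range(1, h + 1):
        rho_k = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

# Strong autocorrelation -> very large Q. In practice, compare Q with the
# chi-square critical value for h degrees of freedom (adjusted after an
# ARMA fit, as in Section 10).
alternating = [1.0, -1.0] * 50
print(round(ljung_box_q(alternating, 5), 1))
```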
7. Classical Time Series Models
7.1 Autoregressive (AR) Models
An autoregressive model of order $p$, denoted AR($p$), models the current value as a linear combination of its $p$ most recent past values plus white noise:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t$$
Where:
- $c$ is a constant (related to the mean: $\mu = c / (1 - \phi_1 - \cdots - \phi_p)$).
- $\phi_1, \ldots, \phi_p$ are the autoregressive coefficients.
- $\varepsilon_t$ is white noise.
Using the lag operator:
$$\phi(L) \, y_t = c + \varepsilon_t$$
Where $\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p$ is the AR characteristic polynomial.
7.1.1 Stationarity Condition for AR($p$)
An AR($p$) process is stationary if and only if all roots of the characteristic polynomial $\phi(z) = 0$ lie outside the unit circle:
$$|z_i| > 1 \quad \text{for all roots } z_i$$
For AR(1): The stationarity condition is simply $|\phi_1| < 1$.
For AR(2): The stationarity conditions are:
$$\phi_1 + \phi_2 < 1, \qquad \phi_2 - \phi_1 < 1, \qquad |\phi_2| < 1$$
7.1.2 Properties of the AR(1) Process
The simplest AR model, AR(1): $y_t = c + \phi y_{t-1} + \varepsilon_t$ with $|\phi| < 1$:
- Mean: $\mu = c / (1 - \phi)$
- Variance: $\gamma_0 = \sigma^2 / (1 - \phi^2)$
- Autocovariance at lag $k$: $\gamma_k = \phi^k \gamma_0$
- ACF: $\rho_k = \phi^k$ — decays exponentially.
- PACF: $\phi_{11} = \phi$; $\phi_{kk} = 0$ for $k > 1$ — cuts off after lag 1.
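These properties can be checked by simulation (a sketch; the seed, sample size, and $\phi = 0.7$ are illustrative, and sample statistics only approximate the theoretical values):

```python
import random

random.seed(7)
phi, n = 0.7, 20000

# Simulate an AR(1) process with c = 0 and unit-variance innovations.
y = [0.0]
for _ in range(n - 1):
    y.append(phi * y[-1] + random.gauss(0.0, 1.0))

ybar = sum(y) / n
denom = sum((v - ybar) ** 2 for v in y)
rho = [sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n)) / denom
       for k in (1, 2, 3)]

# Theory: rho_k = phi**k -> 0.7, 0.49, 0.343
print([round(r, 3) for r in rho])
```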
7.2 Moving Average (MA) Models
A moving average model of order $q$, denoted MA($q$), models the current value as a linear combination of the current and $q$ most recent white noise terms:
$$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$
Where:
- $\mu$ is the mean of the process.
- $\theta_1, \ldots, \theta_q$ are the moving average coefficients.
- $\varepsilon_t \sim WN(0, \sigma^2)$.
Using the lag operator:
$$y_t = \mu + \theta(L) \, \varepsilon_t, \qquad \theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q$$
7.2.1 Properties of the MA($q$) Process
- Always stationary regardless of the values of $\theta_1, \ldots, \theta_q$ (since it is a finite linear combination of white noise terms).
- Mean: $E[y_t] = \mu$.
- Variance: $\gamma_0 = \sigma^2 (1 + \theta_1^2 + \cdots + \theta_q^2)$.
- Autocovariance: $\gamma_k = 0$ for $k > q$ — cuts off after lag $q$.
- ACF: Cuts off sharply after lag $q$.
- PACF: Decays gradually (exponentially or with oscillation).
7.2.2 Invertibility Condition for MA($q$)
An MA($q$) process is invertible if all roots of the MA characteristic polynomial $\theta(z) = 0$ lie outside the unit circle ($|z_i| > 1$). Invertibility ensures a unique MA representation and is required for estimation and forecasting.
For MA(1): Invertibility requires $|\theta_1| < 1$.
7.3 ARMA Models
The Autoregressive Moving Average model of orders $p$ and $q$, denoted ARMA($p$,$q$), combines both AR and MA components:
$$y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$
Or compactly using the lag operator:
$$\phi(L) \, y_t = c + \theta(L) \, \varepsilon_t$$
Stationarity requires the AR polynomial roots to lie outside the unit circle. Invertibility requires the MA polynomial roots to lie outside the unit circle. Both must hold for a well-behaved ARMA model.
Properties:
- The ACF and PACF both decay gradually (neither cuts off sharply).
- More parsimonious than a pure AR or MA model of the same effective order.
7.4 ARIMA Models
The Autoregressive Integrated Moving Average model, denoted ARIMA($p$,$d$,$q$), extends ARMA to handle non-stationary series by incorporating $d$ rounds of differencing:
$$\phi(L) \, (1 - L)^d \, y_t = c + \theta(L) \, \varepsilon_t$$
Where:
- $p$ = order of the autoregressive part.
- $d$ = degree of differencing (the integration order).
- $q$ = order of the moving average part.
The differenced series $(1 - L)^d y_t$ follows an ARMA($p$,$q$) model.
Special Cases:
| Model | Parameters | Description |
|---|---|---|
| ARIMA(0,1,0) | $d = 1$ | Random walk |
| ARIMA(1,0,0) | $p = 1$, $\phi_1$ | AR(1) |
| ARIMA(0,0,1) | $q = 1$, $\theta_1$ | MA(1) |
| ARIMA(0,1,1) | $d = 1$, $\theta_1$ | Simple exponential smoothing |
| ARIMA(1,1,0) | $d = 1$, $\phi_1$ | Differenced AR(1) |
| ARIMA(0,2,2) | $d = 2$, $\theta_1$, $\theta_2$ | Equivalent to Holt's linear method |
7.5 SARIMA Models
The Seasonal ARIMA model, denoted SARIMA($p$,$d$,$q$)($P$,$D$,$Q$)$_m$, extends ARIMA by adding seasonal AR, differencing, and MA components with seasonal period $m$:
$$\phi(L) \, \Phi(L^m) \, (1 - L)^d \, (1 - L^m)^D \, y_t = c + \theta(L) \, \Theta(L^m) \, \varepsilon_t$$
Where:
- $p$, $d$, $q$ = non-seasonal AR order, differencing, MA order.
- $P$, $D$, $Q$ = seasonal AR order, seasonal differencing, seasonal MA order.
- $m$ = seasonal period (e.g., 12 for monthly, 4 for quarterly, 7 for daily with weekly seasonality).
Common SARIMA Configurations:
| Model | Use Case |
|---|---|
| SARIMA(0,1,1)(0,1,1)$_{12}$ | Airline model (Box-Jenkins); monthly data with seasonal differencing |
| SARIMA(1,1,1)(1,1,1)$_{12}$ | General monthly seasonal model |
| SARIMA(0,1,1)(0,1,1)$_4$ | Quarterly seasonal model |
| SARIMA(2,1,0)(1,1,0)$_{12}$ | Monthly data with AR structure |
7.6 SARIMAX Models
The SARIMAX model extends SARIMA by including exogenous (external) predictor variables $x_{1,t}, \ldots, x_{k,t}$:
$$y_t = \beta_1 x_{1,t} + \cdots + \beta_k x_{k,t} + \eta_t, \qquad \phi(L) \, \Phi(L^m) \, (1 - L)^d \, (1 - L^m)^D \, \eta_t = \theta(L) \, \Theta(L^m) \, \varepsilon_t$$
Where $x_{j,t}$ are exogenous variables (e.g., advertising spend, temperature, day-of-week indicators) and $\beta_j$ are their regression coefficients.
💡 SARIMAX is equivalent to a dynamic regression model with ARIMA errors. The exogenous variables account for the deterministic part of the series structure, while the ARIMA component models the stochastic error structure.
7.7 The Random Walk and Random Walk with Drift
Random Walk: ARIMA(0,1,0) with $c = 0$:
$$y_t = y_{t-1} + \varepsilon_t$$
- Mean: $E[y_t] = y_0$ (constant).
- Variance: $\operatorname{Var}(y_t) = t \sigma^2$ → grows without bound.
- Forecasts: $\hat{y}_{t+h|t} = y_t$ for all $h$ (naïve forecast).
Random Walk with Drift: ARIMA(0,1,0) with $c \neq 0$:
$$y_t = c + y_{t-1} + \varepsilon_t$$
- Mean: $E[y_t] = y_0 + ct$ → linear trend.
- Forecasts: $\hat{y}_{t+h|t} = y_t + ch$ (linear extrapolation).
8. Exponential Smoothing Models
Exponential smoothing methods are a family of forecasting procedures that generate weighted averages of past observations, with weights that decay exponentially as observations recede into the past. They are intuitive, robust, and widely used in practice.
8.1 Simple Exponential Smoothing (SES)
SES (also called single exponential smoothing) is appropriate for series with no trend and no seasonality (or trend and seasonality that have been removed).
Smoothed Level:
$$\ell_t = \alpha y_t + (1 - \alpha) \ell_{t-1}$$
Where:
- $\ell_t$ is the smoothed level at time $t$.
- $\alpha \in (0, 1)$ is the smoothing parameter: larger $\alpha$ gives more weight to recent observations.
Equivalently, as a weighted average of all past observations:
$$\ell_t = \alpha \sum_{j=0}^{t-1} (1 - \alpha)^j \, y_{t-j} + (1 - \alpha)^t \ell_0$$
Forecasts: $\hat{y}_{t+h|t} = \ell_t$ for all forecast horizons $h$ (flat forecast line).
Error Correction Form (ETS interpretation):
$$\ell_t = \ell_{t-1} + \alpha e_t$$
Where $e_t = y_t - \ell_{t-1}$ is the one-step-ahead forecast error. SES updates the level proportionally to the most recent error.
Equivalence: SES is equivalent to an ARIMA(0,1,1) model with $\theta_1 = \alpha - 1$.
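A minimal SES implementation makes the algebra concrete and confirms that the recursive and error-correction forms produce identical levels (a sketch; function names and the fixed initial level are ours):

```python
def ses(y, alpha, level0):
    """Simple exponential smoothing via the recursive form."""
    level, levels = level0, []
    for obs in y:
        level = alpha * obs + (1 - alpha) * level       # recursive update
        levels.append(level)
    return levels

def ses_error_correction(y, alpha, level0):
    """Same model via the error-correction form."""
    level, levels = level0, []
    for obs in y:
        e = obs - level                                  # one-step-ahead error
        level = level + alpha * e                        # proportional update
        levels.append(level)
    return levels

y = [10.0, 12.0, 13.0, 12.0, 15.0]
a = ses(y, 0.5, 10.0)
b = ses_error_correction(y, 0.5, 10.0)
print(a)        # the flat h-step forecast is the final level, a[-1]
```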
8.2 Holt's Linear Exponential Smoothing (Double Exponential Smoothing)
Holt's method extends SES to handle series with a linear trend (but no seasonality), by separately smoothing both the level and the trend.
Level equation:
$$\ell_t = \alpha y_t + (1 - \alpha)(\ell_{t-1} + b_{t-1})$$
Trend (slope) equation:
$$b_t = \beta (\ell_t - \ell_{t-1}) + (1 - \beta) b_{t-1}$$
Where:
- $\alpha$ is the level smoothing parameter.
- $\beta$ is the trend smoothing parameter.
- $b_t$ is the estimated slope (trend) at time $t$.
Forecasts:
$$\hat{y}_{t+h|t} = \ell_t + h \, b_t$$
The $h$-step-ahead forecast is a linear extrapolation of the trend.
Equivalence: Holt's method is equivalent to ARIMA(0,2,2).
8.2.1 Damped Trend Method
The damped trend modification (Gardner & McKenzie) introduces a damping parameter $0 < \phi < 1$ that dampens the trend toward zero for longer forecast horizons, avoiding the unrealistic assumption of a constant growth rate into the indefinite future:
Level:
$$\ell_t = \alpha y_t + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})$$
Trend:
$$b_t = \beta (\ell_t - \ell_{t-1}) + (1 - \beta) \phi b_{t-1}$$
Forecasts:
$$\hat{y}_{t+h|t} = \ell_t + (\phi + \phi^2 + \cdots + \phi^h) \, b_t$$
As $h \to \infty$, the forecast converges to $\ell_t + \frac{\phi}{1 - \phi} b_t$ (a constant).
💡 The damped trend method is among the most robust and widely recommended methods for general-purpose forecasting. It is less likely to over-extrapolate trends than undamped Holt's method.
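The damping behaviour is easy to see from a given final level and slope (a sketch; the level, slope, and $\phi = 0.9$ are illustrative, not fitted values):

```python
def damped_forecasts(level, slope, phi, horizons):
    """h-step damped-trend forecasts: level + (phi + ... + phi**h) * slope."""
    out = []
    for h in horizons:
        damp_sum = sum(phi ** j for j in range(1, h + 1))
        out.append(level + damp_sum * slope)
    return out

level, slope, phi = 100.0, 2.0, 0.9
f = damped_forecasts(level, slope, phi, [1, 5, 50])
limit = level + phi / (1 - phi) * slope      # asymptote as h -> infinity
print([round(v, 2) for v in f], round(limit, 2))
```

The forecasts keep rising but level off below the asymptote, whereas undamped Holt forecasts ($\phi = 1$) would grow without bound.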
8.3 Holt-Winters Exponential Smoothing (Triple Exponential Smoothing)
Holt-Winters extends Holt's method to handle series with both trend and seasonality.
8.3.1 Additive Holt-Winters
Appropriate when the seasonal variation is roughly constant in magnitude.
Level:
$$\ell_t = \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})$$
Trend:
$$b_t = \beta (\ell_t - \ell_{t-1}) + (1 - \beta) b_{t-1}$$
Seasonal:
$$s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m}$$
Forecasts ($h$ steps ahead):
$$\hat{y}_{t+h|t} = \ell_t + h \, b_t + s_{t+h-m(k+1)}$$
Where $k = \lfloor (h-1)/m \rfloor$ ensures we pick the correct seasonal index.
8.3.2 Multiplicative Holt-Winters
Appropriate when the seasonal variation grows proportionally with the level of the series.
Level:
$$\ell_t = \alpha \frac{y_t}{s_{t-m}} + (1 - \alpha)(\ell_{t-1} + b_{t-1})$$
Trend:
$$b_t = \beta (\ell_t - \ell_{t-1}) + (1 - \beta) b_{t-1}$$
Seasonal:
$$s_t = \gamma \frac{y_t}{\ell_{t-1} + b_{t-1}} + (1 - \gamma) s_{t-m}$$
Forecasts:
$$\hat{y}_{t+h|t} = (\ell_t + h \, b_t) \, s_{t+h-m(k+1)}, \qquad k = \lfloor (h-1)/m \rfloor$$
Where:
- $\gamma$ is the seasonal smoothing parameter.
- $s_t$ are the seasonal indices (sum to zero for additive; average to 1 for multiplicative).
8.4 The ETS Framework
The ETS (Error, Trend, Seasonal) framework (Hyndman et al.) provides a unified taxonomy for all exponential smoothing methods based on the nature of:
- Error (E): Additive (A) or Multiplicative (M).
- Trend (T): None (N), Additive (A), Additive Damped (Ad).
- Seasonal (S): None (N), Additive (A), Multiplicative (M).
ETS Model Taxonomy (selected):
| ETS Model | Equivalent Method | Error | Trend | Seasonal |
|---|---|---|---|---|
| ETS(A,N,N) | Simple Exponential Smoothing | A | N | N |
| ETS(A,A,N) | Holt's Linear | A | A | N |
| ETS(A,Ad,N) | Damped Holt's | A | Ad | N |
| ETS(A,A,A) | Additive Holt-Winters | A | A | A |
| ETS(M,A,M) | Multiplicative Holt-Winters | M | A | M |
| ETS(M,Ad,M) | Damped Multiplicative HW | M | Ad | M |
The ETS framework provides a state space representation with a likelihood function, enabling proper:
- Maximum likelihood estimation of smoothing parameters.
- Model selection using AIC/BIC.
- Prediction intervals derived from the model's error structure.
8.5 Initialisation of Smoothing Parameters
Initial values for the level $\ell_0$, trend $b_0$, and seasonal indices $s_{1-m}, \ldots, s_0$ are required to start the recursions. Common methods:
- Heuristic initialisation: Use the average of the first few observations for $\ell_0$; estimate $b_0$ from the first two periods; estimate initial seasonal indices from the first full seasonal cycle.
- Optimised initialisation: Treat the initial values as additional parameters to be optimised alongside $\alpha$, $\beta$, $\gamma$ by minimising the sum of squared errors (SSE).
The DataStatPro application uses optimised initialisation by default.
9. Model Identification, Estimation, and Selection
9.1 The Box-Jenkins Methodology
The Box-Jenkins methodology is the classical framework for ARIMA model building. It proceeds through three iterative stages:
Stage 1 — Identification
- Plot the time series and examine its features.
- Check stationarity (ADF, KPSS tests); apply transformations and differencing as needed.
- Examine the ACF and PACF of the (transformed, differenced) series to suggest tentative values of $p$ and $q$.
Stage 2 — Estimation
- Estimate the parameters of the identified model(s) by maximum likelihood (or conditional least squares).
- Compute standard errors, t-statistics, and confidence intervals for parameters.
Stage 3 — Diagnostic Checking
- Analyse the residuals to verify they resemble white noise.
- Ljung-Box test for residual autocorrelation.
- Normality tests and Q-Q plots for residuals.
- If residuals are not white noise, return to Stage 1 with a modified model.
9.2 Maximum Likelihood Estimation (MLE) for ARIMA
For ARIMA models, parameters are estimated by maximising the log-likelihood of the observed data. Assuming Gaussian innovations, the exact log-likelihood is:
$$\log L = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \sum_{t=1}^{n} \left( \log v_t + \frac{e_t^2}{v_t} \right)$$
Where $e_t$ is the one-step-ahead prediction error and $v_t$ is the conditional variance of $y_t$ given all past values (computed via the Kalman filter for exact likelihood, or recursively for conditional likelihood). MLE is solved numerically using iterative algorithms (e.g., L-BFGS-B, Newton-Raphson).
9.3 Information Criteria for Model Selection
Information criteria penalise the log-likelihood for model complexity to avoid overfitting.
Akaike Information Criterion (AIC):
$$\text{AIC} = -2 \log L + 2k$$
Corrected AIC (AICc) — recommended for small samples:
$$\text{AICc} = \text{AIC} + \frac{2k(k+1)}{n - k - 1}$$
Bayesian Information Criterion (BIC):
$$\text{BIC} = -2 \log L + k \log n$$
Where:
- $k$ = number of estimated parameters.
- $n$ = number of observations.
Rules:
- Lower AIC / AICc / BIC = better model (better fit relative to complexity).
- AICc converges to AIC as $n \to \infty$ but penalises more heavily for small $n$.
- BIC penalises model complexity more strongly than AIC and tends to select more parsimonious models.
- Use AICc as the primary criterion for ARIMA model selection; use BIC as a robustness check.
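The three criteria are one-liners in code. This sketch (helper names ours; the log-likelihood values are placeholders, not from a real fit) compares two hypothetical fits where the more complex model fits slightly better but loses on the complexity penalty:

```python
import math

def aic(loglik, k):
    return -2 * loglik + 2 * k

def aicc(loglik, k, n):
    return aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(loglik, k, n):
    return -2 * loglik + k * math.log(n)

# Two hypothetical fits on n = 50 observations: model B gains a little
# likelihood at the cost of two extra parameters.
n = 50
aicc_a = aicc(-120.0, 3, n)
aicc_b = aicc(-119.2, 5, n)
print(round(aicc_a, 2), round(aicc_b, 2))   # lower wins -> model A
```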
9.4 Automatic ARIMA Selection (auto.ARIMA)
Searching all possible combinations of $(p, d, q)(P, D, Q)$ is computationally expensive. The Hyndman-Khandakar algorithm (implemented as auto.arima in R and replicated in DataStatPro) automates ARIMA selection:
- Determine $d$ and $D$ using unit root tests (KPSS for $d$; Canova-Hansen or KPSS on the seasonally differenced series for $D$).
- Start with a default model (e.g., ARIMA(2,$d$,2)(1,$D$,1)).
- Evaluate neighbouring models (varying $p$, $q$, $P$, $Q$ by ±1).
- Select the model with the lowest AICc.
- Repeat until no neighbouring model improves AICc.
⚠️ Automatic selection is a useful starting point but should not replace domain knowledge and manual inspection of ACF/PACF plots, residual diagnostics, and out-of-sample forecast evaluation.
9.5 Forecast Accuracy Metrics
To compare competing models, accuracy metrics are computed on a hold-out (test) set of the last $h$ observations not used in model fitting. With forecast errors $e_i = y_i - \hat{y}_i$:
Mean Error (ME):
$$\text{ME} = \frac{1}{h} \sum_{i=1}^{h} e_i$$
Mean Absolute Error (MAE):
$$\text{MAE} = \frac{1}{h} \sum_{i=1}^{h} |e_i|$$
Root Mean Squared Error (RMSE):
$$\text{RMSE} = \sqrt{\frac{1}{h} \sum_{i=1}^{h} e_i^2}$$
Mean Absolute Percentage Error (MAPE):
$$\text{MAPE} = \frac{100}{h} \sum_{i=1}^{h} \left| \frac{e_i}{y_i} \right|$$
Mean Absolute Scaled Error (MASE) — scale-free, robust to zero values:
$$\text{MASE} = \frac{\text{MAE}}{\dfrac{1}{n - m} \sum_{t=m+1}^{n} |y_t - y_{t-m}|}$$
Where the denominator is the in-sample MAE of the seasonal naïve forecast. MASE < 1 means the model outperforms the seasonal naïve benchmark.
| Metric | Scale | Sensitive to Outliers | Notes |
|---|---|---|---|
| MAE | Same as data | No | Easy to interpret |
| RMSE | Same as data | Yes | Penalises large errors more heavily |
| MAPE | Percentage | No | Undefined when $y_i = 0$; biased for asymmetric series |
| MASE | Scale-free | No | Preferred for cross-series comparison |
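All five metrics fit in one small helper (a sketch; `accuracy` is our name, and the series are toy values chosen so the results are easy to check by hand):

```python
def accuracy(actual, forecast, train, m=1):
    """ME, MAE, RMSE, MAPE and MASE for a hold-out evaluation.

    `train` is the in-sample series used to scale MASE by the
    seasonal naive forecast with period m.
    """
    errors = [a - f for a, f in zip(actual, forecast)]
    h = len(errors)
    me = sum(errors) / h
    mae = sum(abs(e) for e in errors) / h
    rmse = (sum(e ** 2 for e in errors) / h) ** 0.5
    mape = 100 / h * sum(abs(e / a) for e, a in zip(errors, actual))
    # Denominator: in-sample MAE of the seasonal naive forecast.
    naive_mae = (sum(abs(train[t] - train[t - m]) for t in range(m, len(train)))
                 / (len(train) - m))
    mase = mae / naive_mae
    return {"ME": me, "MAE": mae, "RMSE": rmse, "MAPE": mape, "MASE": mase}

train = [1.0, 2.0, 3.0, 4.0]            # naive (m = 1) in-sample MAE = 1.0
res = accuracy([10.0, 20.0], [12.0, 18.0], train)
print(res)
```

Here the errors are $-2$ and $+2$, so ME cancels to 0 while MAE and RMSE are both 2, and MASE = 2 (twice as bad as the naïve benchmark).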
10. Model Diagnostics
After fitting a time series model, residual analysis is essential to verify that the model has adequately captured the structure of the data.
10.1 Residual Definition
For a fitted model, the residuals (one-step-ahead forecast errors) are:
$$e_t = y_t - \hat{y}_{t|t-1}$$
A well-specified model should produce residuals that are approximately white noise: uncorrelated, zero-mean, and homoscedastic.
10.2 Diagnostic Checks
10.2.1 Time Plot of Residuals
Plot $e_t$ against time. Look for:
- No obvious patterns, trends, or structural breaks.
- Roughly constant variance over time (homoscedasticity).
- No clustering of large or small residuals.
10.2.2 ACF of Residuals
Plot the ACF of the residuals. For a well-fitted model:
- No autocorrelations should fall significantly outside the $\pm 1.96/\sqrt{n}$ bounds.
- Significant autocorrelation at any lag suggests the model has not fully captured the serial dependence.
10.2.3 Ljung-Box Test on Residuals
Test the joint significance of autocorrelations up to lag $h$:
$$Q = n(n+2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^2}{n - k}$$
For residuals from an ARMA($p$,$q$) fit, the degrees of freedom are adjusted to $h - p - q$.
- Large p-value ($p > 0.05$): No evidence of residual autocorrelation → model is adequate.
- Small p-value ($p \leq 0.05$): Significant residual autocorrelation → model needs refinement.
10.2.4 Histogram and Q-Q Plot of Residuals
Assess normality of residuals:
- Histogram: Should be roughly bell-shaped and centred at zero.
- Q-Q Plot: Points should fall close to the 45° diagonal line.
- Departures indicate non-normality, which affects the validity of prediction intervals.
10.2.5 Jarque-Bera Test for Normality
A formal test for normality based on skewness ($S$) and excess kurtosis ($K$):
$$JB = \frac{n}{6} \left( S^2 + \frac{K^2}{4} \right), \qquad JB \sim \chi^2_2 \text{ under } H_0
$$
- $H_0$: Residuals are normally distributed.
- Small p-value: Reject normality; prediction intervals may be unreliable.
10.2.6 ARCH-LM Test for Heteroscedasticity
The ARCH-LM test (Engle, 1982) tests whether residual variance is serially correlated, using the auxiliary regression:
$$e_t^2 = \alpha_0 + \alpha_1 e_{t-1}^2 + \cdots + \alpha_q e_{t-q}^2 + u_t$$
Test statistic: $n R^2$ from this regression, distributed $\chi^2_q$ under $H_0$ of no ARCH effects.
- Significant result suggests volatility clustering: large errors tend to be followed by large errors.
- In such cases, a GARCH model (Section 12) should be considered.
10.3 Overfitting and Parsimony
A model with too many parameters may overfit the training data (low in-sample errors) but generalise poorly to new data (high out-of-sample errors). The principle of parsimony (Occam's Razor) favours the simplest model that adequately captures the data structure. AICc and BIC both penalise complexity to guard against overfitting.
11. Forecasting and Prediction Intervals
11.1 Point Forecasts
The $h$-step-ahead point forecast made at time $T$ is the conditional expectation:

$$\hat{Y}_{T+h \mid T} = E[Y_{T+h} \mid Y_1, \dots, Y_T]$$
For ARIMA models, forecasts are computed recursively using the estimated model equations, replacing unknown future values with their forecasts and unknown future errors with zero.
AR($p$) forecast:

$$\hat{Y}_{T+h \mid T} = c + \phi_1 \tilde{Y}_{T+h-1} + \cdots + \phi_p \tilde{Y}_{T+h-p}$$

Where $\tilde{Y}_{T+j} = Y_{T+j}$ for $j \le 0$ (known past values) and $\tilde{Y}_{T+j} = \hat{Y}_{T+j \mid T}$ for $j > 0$ (previously computed forecasts).
11.2 Forecast Error and Variance
The $h$-step-ahead forecast error is:

$$e_{T+h} = Y_{T+h} - \hat{Y}_{T+h \mid T}$$

For ARIMA models, $Y_t$ has an infinite MA representation (the Wold decomposition):

$$Y_t = \mu + \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}, \quad \psi_0 = 1$$

Where the $\psi$-weights can be derived recursively from the AR and MA polynomials. The variance of the $h$-step-ahead forecast error is:

$$\operatorname{Var}(e_{T+h}) = \sigma^2 \sum_{j=0}^{h-1} \psi_j^2$$

For $h = 1$: $\operatorname{Var}(e_{T+1}) = \sigma^2$ (the one-step forecast error variance equals the innovation variance).
The forecast uncertainty grows with the forecast horizon $h$, reflecting increasing uncertainty about the distant future.
11.3 Prediction Intervals
A $100(1-\alpha)\%$ prediction interval for $Y_{T+h}$ is:

$$\hat{Y}_{T+h \mid T} \pm z_{\alpha/2} \sqrt{\operatorname{Var}(e_{T+h})}$$

For a 95% prediction interval, $z_{\alpha/2} = 1.96$.
Key properties:
- Prediction intervals are centred on the point forecast.
- They widen with forecast horizon (uncertainty accumulates).
- They assume Gaussian innovations; if residuals are non-normal, bootstrap prediction intervals are more appropriate.
- They account for model estimation uncertainty only approximately — they do not account for model misspecification uncertainty.
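As a concrete sketch, the interval calculation for an ARIMA(0,1,1) model (whose $\psi$-weights are $\psi_0 = 1$ and $\psi_j = 1 + \theta$ for $j \ge 1$) can be written in a few lines of numpy. The point forecast, `theta`, and `sigma2` values below are illustrative, not output from any particular fit:

```python
import numpy as np

def arima011_pi(point_fc, theta, sigma2, h_max, z=1.96):
    """95% prediction intervals for an ARIMA(0,1,1) model (flat forecasts)."""
    out = []
    for h in range(1, h_max + 1):
        # Var(e_h) = sigma^2 * sum_{j=0}^{h-1} psi_j^2
        var_h = sigma2 * (1.0 + (h - 1) * (1.0 + theta) ** 2)
        half = z * np.sqrt(var_h)
        out.append((h, point_fc - half, point_fc + half))
    return out

# illustrative values: the intervals widen as h grows
pis = arima011_pi(point_fc=524.3, theta=-0.35, sigma2=156.0, h_max=3)
for h, lo, hi in pis:
    print(h, round(lo, 1), round(hi, 1))
```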
11.4 Bootstrap Prediction Intervals
When residuals are non-normal, bootstrap prediction intervals are more reliable:
- Fit the model; save the residuals $\hat{\epsilon}_1, \dots, \hat{\epsilon}_T$.
- For each bootstrap replicate $b = 1, \dots, B$: a. Simulate future errors by sampling with replacement from $\{\hat{\epsilon}_t\}$. b. Generate a simulated future path $Y_{T+1}^{(b)}, \dots, Y_{T+h}^{(b)}$ using the fitted model.
- The $\alpha/2$ and $1-\alpha/2$ percentiles of $\{Y_{T+h}^{(b)}\}_{b=1}^{B}$ form the bootstrap PI.
Bootstrap PIs are distribution-free and automatically capture non-normality and non-linearity.
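The resampling loop above is straightforward to implement. This numpy sketch assumes an already-fitted AR(1) model, $Y_t = \phi Y_{t-1} + \epsilon_t$; the `phi`, residual, and `y_T` values are illustrative:

```python
import numpy as np

def bootstrap_pi(y_T, phi, resid, h, B=2000, alpha=0.05, seed=1):
    """Bootstrap prediction intervals for a fitted AR(1) model."""
    rng = np.random.default_rng(seed)
    resid = np.asarray(resid, dtype=float) - np.mean(resid)
    paths = np.empty((B, h))
    for b in range(B):
        y = y_T
        for j in range(h):
            y = phi * y + rng.choice(resid)  # resample errors with replacement
            paths[b, j] = y
    lower = np.quantile(paths, alpha / 2, axis=0)
    upper = np.quantile(paths, 1 - alpha / 2, axis=0)
    return lower, upper

rng = np.random.default_rng(0)
lo, hi = bootstrap_pi(y_T=10.0, phi=0.7, resid=rng.standard_normal(100), h=3)
print(np.round(lo, 2), np.round(hi, 2))  # intervals widen with horizon
```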
11.5 Benchmark Forecasting Methods
Before applying sophisticated models, it is good practice to compare against simple benchmark methods:
| Method | Formula | Use Case |
|---|---|---|
| Naïve | $\hat{Y}_{T+h \mid T} = Y_T$ | Random-walk-like series |
| Seasonal Naïve | $\hat{Y}_{T+h \mid T} = Y_{T+h-m \cdot k}$, $k = \lfloor (h-1)/m \rfloor + 1$ | Strongly seasonal series |
| Drift | $\hat{Y}_{T+h \mid T} = Y_T + h \frac{Y_T - Y_1}{T-1}$ | Trending series |
| Mean | $\hat{Y}_{T+h \mid T} = \bar{Y}$ | Stable, mean-reverting series |
Any proposed model should outperform these benchmarks (MASE < 1 relative to seasonal naïve).
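All four benchmarks, plus the MASE metric used to compare against them, fit in a few lines of numpy (function names here are illustrative):

```python
import numpy as np

def naive(y, h):
    return np.repeat(y[-1], h)

def seasonal_naive(y, h, m):
    # repeat the last full seasonal cycle
    return np.array([y[len(y) - m + (i % m)] for i in range(h)])

def drift(y, h):
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + slope * np.arange(1, h + 1)

def mean_fc(y, h):
    return np.repeat(np.mean(y), h)

def mase(y_true, y_pred, y_train, m=1):
    # scale = in-sample MAE of the (seasonal, if m > 1) naive method
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / scale

y = np.arange(1.0, 13.0)              # a short trending series
print(drift(y, 3))                    # [13. 14. 15.]
print(round(mase(np.array([13.5, 14.2, 15.1]), drift(y, 3), y), 2))  # 0.27
```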
12. Advanced Topics
12.1 ARCH and GARCH Models
When residuals exhibit volatility clustering (periods of high volatility followed by high volatility, and calm followed by calm), standard ARIMA models with constant variance are inadequate.
12.1.1 ARCH($q$) Model
The Autoregressive Conditional Heteroscedasticity model (Engle, 1982) models the conditional variance $\sigma_t^2$ as:

$$\sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \cdots + \alpha_q \epsilon_{t-q}^2$$

Where $\alpha_0 > 0$ and $\alpha_i \ge 0$ to ensure positive variance. The conditional variance depends on past squared residuals.
12.1.2 GARCH($p$,$q$) Model
The Generalised ARCH model (Bollerslev, 1986) adds lagged conditional variances to the equation:

$$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$$

With constraints $\omega > 0$, $\alpha_i \ge 0$, $\beta_j \ge 0$, and $\sum_i \alpha_i + \sum_j \beta_j < 1$ for stationarity.
The GARCH(1,1) model is by far the most widely used:

$$\sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2$$

It captures the fact that large shocks to volatility decay slowly (volatility persistence = $\alpha_1 + \beta_1$, typically close to 1 for financial data).
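The GARCH(1,1) variance recursion is easy to see in code. This numpy sketch simulates a path using illustrative parameter values (with $\alpha + \beta = 0.98 < 1$, so the process is stationary):

```python
import numpy as np

def simulate_garch11(omega, alpha, beta, T, seed=7):
    """Simulate a GARCH(1,1) process: returns (innovations, conditional variances)."""
    rng = np.random.default_rng(seed)
    uncond = omega / (1 - alpha - beta)      # unconditional variance
    sig2 = np.empty(T)
    eps = np.empty(T)
    sig2[0] = uncond
    eps[0] = np.sqrt(sig2[0]) * rng.standard_normal()
    for t in range(1, T):
        # the defining recursion: today's variance from yesterday's shock and variance
        sig2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sig2[t - 1]
        eps[t] = np.sqrt(sig2[t]) * rng.standard_normal()
    return eps, sig2

# unconditional variance here is 0.05 / (1 - 0.98) = 2.5
eps, sig2 = simulate_garch11(omega=0.05, alpha=0.08, beta=0.90, T=1000)
print(round(sig2.mean(), 3))
```

The simulated `eps` series exhibits volatility clustering: stretches of large shocks alternate with calm stretches, exactly the pattern the ARCH-LM test detects.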
12.2 Vector Autoregression (VAR)
When analysing multiple time series simultaneously and modelling their interdependencies, VAR models extend univariate AR models to a multivariate setting.
A VAR($p$) model for a $k$-dimensional vector $\mathbf{Y}_t$ is:

$$\mathbf{Y}_t = \mathbf{c} + \mathbf{A}_1 \mathbf{Y}_{t-1} + \cdots + \mathbf{A}_p \mathbf{Y}_{t-p} + \boldsymbol{\epsilon}_t$$

Where:
- $\mathbf{c}$ is a $k \times 1$ vector of constants.
- $\mathbf{A}_1, \dots, \mathbf{A}_p$ are $k \times k$ coefficient matrices.
- $\boldsymbol{\epsilon}_t$ is a $k \times 1$ vector of white noise innovations with covariance matrix $\boldsymbol{\Sigma}$.
VAR models are used for:
- Impulse response analysis: How does a shock to one variable propagate through the system over time?
- Forecast error variance decomposition: What proportion of the forecast error variance of one variable is explained by shocks to each variable?
- Granger causality testing: Does past information about one variable improve forecasts of another?
12.3 Structural Breaks
A structural break is a sudden change in the parameters of a time series model — e.g., a shift in the mean, a change in slope, or a change in variance — caused by an external event (financial crisis, policy change, pandemic).
Detection methods:
- Chow test: Tests for a break at a known date by comparing residual sums of squares from models fitted on subsamples.
- CUSUM test: Tests for parameter instability by examining cumulative sums of recursive residuals.
- Bai-Perron test: Data-driven detection of multiple unknown breakpoints.
Handling structural breaks:
- Include dummy variables at known break dates in the model.
- Split the series and model each segment separately.
- Use state space models that allow parameters to evolve over time.
12.4 Spectral Analysis
Spectral analysis (frequency-domain analysis) decomposes a time series into sinusoidal components of different frequencies, revealing cyclical patterns.
The spectral density (power spectrum) at frequency $\omega$ is:

$$f(\omega) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma(k)\, e^{-i\omega k}, \quad \omega \in [-\pi, \pi]$$
Peaks in the spectral density correspond to dominant cyclical frequencies. The periodogram is the sample estimate of the spectral density.
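A periodogram can be computed directly from the FFT; this numpy sketch recovers the dominant frequency of a noisy sinusoid with period 12 (all data and scaling choices are illustrative):

```python
import numpy as np

def periodogram(y):
    """Periodogram I(omega_j) at the Fourier frequencies omega_j = 2*pi*j/T."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    T = len(y)
    fft = np.fft.rfft(y)
    I = (np.abs(fft) ** 2) / (2 * np.pi * T)
    freqs = np.arange(len(I)) * 2 * np.pi / T
    return freqs, I

# a sinusoid with period 12 plus noise: the peak should sit at 2*pi/12
rng = np.random.default_rng(3)
t = np.arange(240)
y = np.sin(2 * np.pi * t / 12) + 0.3 * rng.standard_normal(240)
freqs, I = periodogram(y)
peak = freqs[np.argmax(I[1:]) + 1]   # skip the zero frequency
print(round(peak, 3), round(2 * np.pi / 12, 3))
```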
13. Using the Time Series Component
The Time Series component in the DataStatPro application provides a full end-to-end workflow for analysing and forecasting time series data.
Step-by-Step Guide
Step 1 — Select Dataset Choose the dataset from the "Dataset" dropdown. The dataset should contain:
- A time/date column (or an index column representing equally spaced observations).
- One or more numeric value columns representing the time series.
Step 2 — Select Time Series Variable Select the numeric variable to analyse from the "Time Series Variable" dropdown.
Step 3 — Select Date/Index Column Select the column identifying the time ordering of observations. Specify the frequency/period (e.g., monthly = 12, quarterly = 4, weekly = 52, daily = 365, hourly = 24).
Step 4 — Select Analysis Type Choose the type of analysis to perform:
- Exploratory Analysis (decomposition, ACF/PACF plots, stationarity tests)
- ARIMA / SARIMA Modelling
- Exponential Smoothing (ETS)
- SARIMAX (with exogenous variables)
- GARCH Modelling
- VAR (multivariate; requires selecting multiple series)
Step 5 — Configure Preprocessing
- Apply Box-Cox transformation (specify $\lambda$ or use automatic selection).
- Apply differencing (specify $d$ and $D$, or use automatic selection via unit root tests).
Step 6 — Configure Model
For ARIMA/SARIMA:
- Specify $(p,d,q)(P,D,Q)_m$ manually, or select "Auto" for automatic selection via AICc.
- Set maximum search bounds for auto selection (e.g., max $p$, max $q$).
- Choose estimation method (MLE or conditional least squares).
For Exponential Smoothing (ETS):
- Choose model type manually (e.g., ETS(A,A,A)) or select "Auto" for automatic selection via AICc.
- Enable/disable damped trend.
- Specify initial values method (heuristic or optimised).
Step 7 — Set Forecast Horizon Specify the number of periods to forecast ahead ($h$). Choose the confidence level for prediction intervals (default: 95%).
Step 8 — Select Display Options Choose which outputs to display:
- ✅ Time Series Plot
- ✅ Decomposition Plot (trend, seasonal, residual)
- ✅ ACF and PACF Plots
- ✅ Stationarity Test Results (ADF, KPSS, PP)
- ✅ Model Summary (coefficients, SE, t-values, p-values)
- ✅ Information Criteria (AIC, AICc, BIC)
- ✅ Residual Diagnostics (time plot, ACF, histogram, Q-Q plot)
- ✅ Ljung-Box Test Results
- ✅ Forecast Plot with Prediction Intervals
- ✅ Forecast Table
- ✅ Accuracy Metrics (MAE, RMSE, MAPE, MASE)
Step 9 — Run the Analysis Click "Run Time Series Analysis". The application will:
- Parse and sort the time series by date/index.
- Apply any specified transformations.
- Run stationarity tests and produce ACF/PACF plots.
- Fit the specified (or automatically selected) model.
- Run residual diagnostics.
- Generate forecasts with prediction intervals.
- Compute accuracy metrics on the hold-out set (if configured).
- Produce all selected visualisations and tables.
14. Computational and Formula Details
14.1 The Wold Decomposition and $\psi$-Weights
Any covariance-stationary process can be represented as an infinite MA:

$$Y_t = \mu + \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}, \quad \psi_0 = 1$$

For an ARMA($p$,$q$) model, the $\psi$-weights satisfy the recursion:

$$\psi_j = \theta_j + \sum_{i=1}^{\min(j,p)} \phi_i \psi_{j-i}$$

Where $\theta_j = 0$ for $j > q$ and $\psi_0 = 1$.
These weights are used to compute forecast error variances and prediction intervals.
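The recursion translates directly into code; a minimal Python sketch (the `psi_weights` helper is an illustrative name):

```python
def psi_weights(phi, theta, n):
    """psi-weights of an ARMA(p,q): psi_j = theta_j + sum phi_i * psi_{j-i}."""
    psi = [1.0]                                    # psi_0 = 1
    for j in range(1, n + 1):
        val = theta[j - 1] if j <= len(theta) else 0.0   # theta_j = 0 for j > q
        for i in range(1, min(j, len(phi)) + 1):
            val += phi[i - 1] * psi[j - i]
        psi.append(val)
    return psi

# AR(1) with phi = 0.5: psi_j = 0.5**j
print(psi_weights([0.5], [], 4))   # [1.0, 0.5, 0.25, 0.125, 0.0625]
# MA(1) with theta = 0.4: psi = [1, 0.4, 0, 0]
print(psi_weights([], [0.4], 3))   # [1.0, 0.4, 0.0, 0.0]
```

Summing the squared weights (times $\sigma^2$) then gives the $h$-step forecast error variance from Section 11.2.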
14.2 Yule-Walker Equations for AR Parameter Estimation
For an AR($p$) model, the Yule-Walker equations relate the ACF to the AR coefficients:

$$\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \cdots + \phi_p \rho_{k-p}, \quad k = 1, \dots, p$$

Or in matrix form: $\mathbf{R}\boldsymbol{\phi} = \boldsymbol{\rho}$, giving $\hat{\boldsymbol{\phi}} = \mathbf{R}^{-1}\boldsymbol{\rho}$.
The innovation variance is: $\hat{\sigma}^2 = \hat{\gamma}(0)\left(1 - \sum_{j=1}^{p} \hat{\phi}_j \hat{\rho}_j\right)$.
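A numpy sketch of Yule-Walker estimation, verified on a simulated AR(1) (the `yule_walker` function name and the simulation settings are illustrative):

```python
import numpy as np

def yule_walker(y, p):
    """Estimate AR(p) coefficients and innovation variance from the sample ACF."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    T = len(y)
    # sample autocovariances gamma(0..p)
    gamma = np.array([np.sum(y[k:] * y[:T - k]) / T for k in range(p + 1)])
    rho = gamma / gamma[0]
    # Toeplitz system R * phi = rho[1:]
    R = np.array([[rho[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, rho[1:])
    sigma2 = gamma[0] * (1.0 - phi @ rho[1:])
    return phi, sigma2

# simulate an AR(1) with phi = 0.6 and recover the coefficient
rng = np.random.default_rng(5)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()
phi, sigma2 = yule_walker(y, p=1)
print(np.round(phi, 2), round(sigma2, 2))  # phi close to 0.6, sigma2 close to 1
```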
14.3 ARIMA Estimation via the Kalman Filter
Exact MLE for ARIMA models is computed using the Kalman filter, which recursively computes the innovations $v_t$ and their variances $f_t$.
State space representation of ARIMA($p$,$d$,$q$):

$$\boldsymbol{\alpha}_{t+1} = \mathbf{T}\boldsymbol{\alpha}_t + \mathbf{R}\eta_t, \qquad Y_t = \mathbf{Z}\boldsymbol{\alpha}_t$$

Where $\mathbf{T}$, $\mathbf{R}$, $\mathbf{Z}$ are matrices determined by the ARIMA orders. The Kalman filter provides the optimal linear predictor and the innovation $v_t = Y_t - \hat{Y}_{t \mid t-1}$ at each step. The log-likelihood is:

$$\ell = -\frac{1}{2}\sum_{t=1}^{T}\left(\log 2\pi f_t + \frac{v_t^2}{f_t}\right)$$

Where $f_t$ is the innovation variance at time $t$.
14.4 Seasonal Decomposition Formulae
For a series with seasonal period $m$, the classical decomposition proceeds as:
Step 1: Estimate the trend $\hat{T}_t$ using a centred moving average (CMA) of order $m$.
For even $m$ (e.g., monthly data, $m = 12$):

$$\hat{T}_t = \frac{1}{m}\left(\tfrac{1}{2}Y_{t-m/2} + Y_{t-m/2+1} + \cdots + Y_{t+m/2-1} + \tfrac{1}{2}Y_{t+m/2}\right)$$

Step 2: Compute deseasonalised values: $Y_t - \hat{T}_t$ (additive) or $Y_t / \hat{T}_t$ (multiplicative).
Step 3: Average over all periods of the same season to get raw seasonal indices $\bar{s}_j$, $j = 1, \dots, m$.
Step 4: Normalise the seasonal indices.
Additive: $\hat{s}_j = \bar{s}_j - \frac{1}{m}\sum_{i=1}^{m}\bar{s}_i$ (so they sum to zero).
Multiplicative: $\hat{s}_j = \bar{s}_j \cdot \frac{m}{\sum_{i=1}^{m}\bar{s}_i}$ (so they average to 1).
Step 5: Compute the irregular component: $\hat{I}_t = Y_t - \hat{T}_t - \hat{s}_t$ (additive) or $\hat{I}_t = Y_t / (\hat{T}_t\,\hat{s}_t)$ (multiplicative).
14.5 Computing ACF and PACF
Sample ACF at lag $k$:

$$\hat{\rho}_k = \frac{\sum_{t=k+1}^{T}(Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{T}(Y_t - \bar{Y})^2}$$

Sample PACF via the Durbin-Levinson algorithm:
Initialise: $\hat{\phi}_{1,1} = \hat{\rho}_1$.
For $k = 2, 3, \dots$:

$$\hat{\phi}_{k,k} = \frac{\hat{\rho}_k - \sum_{j=1}^{k-1}\hat{\phi}_{k-1,j}\,\hat{\rho}_{k-j}}{1 - \sum_{j=1}^{k-1}\hat{\phi}_{k-1,j}\,\hat{\rho}_j}, \qquad \hat{\phi}_{k,j} = \hat{\phi}_{k-1,j} - \hat{\phi}_{k,k}\,\hat{\phi}_{k-1,k-j}$$

The PACF at lag $k$ is $\hat{\phi}_{k,k}$.
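Both formulas can be checked in a short numpy sketch. Feeding the Durbin-Levinson recursion the exact ACF of an AR(1) with $\phi = 0.7$ (i.e., $\rho_k = 0.7^k$) should return a PACF that cuts off after lag 1:

```python
import numpy as np

def sample_acf(y, h):
    """Sample ACF at lags 0..h."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sum(y**2)
    return np.array([1.0] + [np.sum(y[k:] * y[:-k]) / denom for k in range(1, h + 1)])

def pacf_durbin_levinson(rho):
    """PACF at lags 1..h from ACF values rho[0..h] via Durbin-Levinson."""
    h = len(rho) - 1
    pacf = [rho[1]]
    phi = {1: {1: rho[1]}}                    # phi[k][j] = phi_{k,j}
    for k in range(2, h + 1):
        num = rho[k] - sum(phi[k - 1][j] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[k - 1][j] * rho[j] for j in range(1, k))
        phi_kk = num / den
        phi[k] = {k: phi_kk}
        for j in range(1, k):
            phi[k][j] = phi[k - 1][j] - phi_kk * phi[k - 1][k - j]
        pacf.append(phi_kk)
    return np.array(pacf)

rho = np.array([1.0, 0.7, 0.49, 0.343])       # exact ACF of AR(1), phi = 0.7
print(np.round(pacf_durbin_levinson(rho), 3)) # PACF ~ [0.7, 0, 0]
```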
14.6 Optimising Exponential Smoothing Parameters
Smoothing parameters are estimated by minimising the sum of squared one-step-ahead forecast errors:

$$\text{SSE}(\alpha, \beta, \gamma) = \sum_{t=1}^{T}\left(Y_t - \hat{Y}_{t \mid t-1}\right)^2$$

Subject to constraints (e.g., $0 < \alpha < 1$, $0 < \beta < 1$, $0 < \gamma < 1$). This is solved using numerical optimisation (e.g., Nelder-Mead, L-BFGS-B), typically starting from a grid of initial values to avoid local minima.
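For the simplest case, SES with a single parameter $\alpha$, the objective and a plain grid search (standing in for the numerical optimiser) look like this. The helper names and toy series are illustrative:

```python
import numpy as np

def ses_sse(y, alpha):
    """Sum of squared one-step-ahead SES forecast errors."""
    level = y[0]                     # initialise the level at the first value
    sse = 0.0
    for obs in y[1:]:
        err = obs - level            # one-step-ahead forecast error
        sse += err**2
        level = level + alpha * err  # SES update
    return sse

def fit_ses(y, grid=np.linspace(0.01, 0.99, 99)):
    """Grid-search stand-in for Nelder-Mead / L-BFGS-B on 0 < alpha < 1."""
    sses = [ses_sse(y, a) for a in grid]
    return grid[int(np.argmin(sses))]

# a noisy series with a level shift: the optimiser trades off smoothing
# against responsiveness to the shift
rng = np.random.default_rng(11)
y = np.concatenate([rng.normal(10, 1, 50), rng.normal(14, 1, 50)])
print(round(fit_ses(y), 2))
```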
15. Worked Examples
Example 1: ARIMA Modelling of Monthly Sales Data
Data: Monthly sales figures for a retail company, $T = 60$ months (5 years), no seasonal pattern.
Step 1: Plot and Examine the Series
Visual inspection reveals an upward trend with roughly constant variance → a possible ARIMA model with $d = 1$.
Step 2: Stationarity Testing
ADF test on the original series: $p > 0.05$ → fail to reject $H_0$ (unit root) → non-stationary.
Apply the first difference: $\nabla Y_t = Y_t - Y_{t-1}$.
ADF test on $\nabla Y_t$: $p < 0.05$ → reject $H_0$ → stationary after first differencing. Therefore $d = 1$.
KPSS test on $\nabla Y_t$: statistic = 0.12, $p > 0.05$ → fail to reject $H_0$ (stationarity) → stationary. Both tests agree: $d = 1$.
Step 3: ACF and PACF of $\nabla Y_t$
| Lag | ACF | Significant? | PACF | Significant? |
|---|---|---|---|---|
| 1 | -0.312 | Yes | -0.312 | Yes |
| 2 | 0.051 | No | -0.065 | No |
| 3 | -0.038 | No | -0.052 | No |
| 4 | 0.029 | No | 0.018 | No |
Pattern: the ACF has a single significant spike at lag 1 (cuts off after lag 1); the PACF decays (though it also approximately cuts off after lag 1). This suggests an MA(1) process for $\nabla Y_t$, i.e., ARIMA(0,1,1).
Step 4: Fit ARIMA(0,1,1)
Estimated model:

$$\nabla Y_t = c + \epsilon_t + \hat{\theta}_1 \epsilon_{t-1}$$

For completeness, also fit ARIMA(1,1,0): AICc = 424.3. ARIMA(0,1,1) has the lower AICc → preferred.
Step 5: Residual Diagnostics
- Time plot of residuals: No visible pattern, roughly constant spread ✅
- ACF of residuals: All autocorrelations within bounds ✅
- Ljung-Box test at lag 10: $p > 0.05$ → no significant autocorrelation ✅
- Jarque-Bera test: $p > 0.05$ → residuals are approximately normal ✅
Model passes all diagnostic checks.
Step 6: Forecasting
For $h$ steps ahead, the ARIMA(0,1,1) forecasts are flat:

$$\hat{Y}_{T+h \mid T} = Y_T + \hat{\theta}_1 \hat{\epsilon}_T \quad \text{for all } h \ge 1$$

The $\psi$-weights: $\psi_0 = 1$, $\psi_j = 1 + \hat{\theta}_1$ for $j \ge 1$, so the forecast error variance is $\sigma^2\left(1 + (h-1)(1+\hat{\theta}_1)^2\right)$.
95% Prediction Intervals for $h = 1, 2, 3$:
| Horizon | Forecast | 95% PI Lower | 95% PI Upper |
|---|---|---|---|
| $h = 1$ | 524.3 | 499.8 | 548.8 |
| $h = 2$ | 524.3 | 494.9 | 553.7 |
| $h = 3$ | 524.3 | 490.9 | 557.7 |
Note: Prediction intervals widen with horizon, reflecting growing uncertainty.
Example 2: SARIMA Modelling of Monthly Airline Passenger Data
Data: Monthly international airline passenger counts (thousands), $T = 144$ months (12 years). This is the classic Box-Jenkins "airline dataset."
Step 1: Plot and Transform
The series shows:
- A clear upward trend.
- Multiplicative seasonality (seasonal swings increase with the level).
Apply a log transformation to stabilise the variance: $W_t = \log Y_t$.
Step 2: Determine Differencing Orders
ADF test on $W_t$: non-stationary → apply regular differencing ($d = 1$).
Canova-Hansen test on $\nabla W_t$: seasonal non-stationarity detected → apply seasonal differencing ($D = 1$, $m = 12$).
Let $Z_t = \nabla \nabla_{12} W_t = (W_t - W_{t-1}) - (W_{t-12} - W_{t-13})$.
ADF test on $Z_t$: stationary ✅. Therefore $d = 1$, $D = 1$.
Step 3: ACF and PACF of $Z_t$
Significant ACF spikes at lags 1 and 12; the PACF decays from lag 1 and shows a spike at lag 12. This pattern strongly suggests:
- Non-seasonal MA(1): $q = 1$.
- Seasonal MA(1): $Q = 1$.
- No AR terms: $p = P = 0$.
Candidate model: SARIMA(0,1,1)(0,1,1)$_{12}$ — the "airline model."
Step 4: Fit SARIMA(0,1,1)(0,1,1)$_{12}$ on $W_t$
| Parameter | Estimate | SE | z-value | p-value |
|---|---|---|---|---|
| $\theta_1$ (non-seasonal MA) | -0.402 | 0.083 | -4.84 | < 0.001 |
| $\Theta_1$ (seasonal MA) | -0.557 | 0.073 | -7.63 | < 0.001 |
| $\sigma^2$ | 0.00134 | — | — | — |
AICc = −467.3.
Step 5: Residual Diagnostics
- Ljung-Box test at lag 24: $p > 0.05$ → no autocorrelation ✅
- Jarque-Bera: $p > 0.05$ → approximately normal ✅
Step 6: Forecasting (12 months ahead)
Forecasts are generated on the log scale and back-transformed with a bias correction:

$$\hat{Y}_{T+h \mid T} = \exp\left(\hat{W}_{T+h \mid T} + \tfrac{1}{2}\sigma_h^2\right)$$
| Month | $\hat{W}$ (log scale) | $\hat{Y}$ (000s) | 95% PI Lower | 95% PI Upper |
|---|---|---|---|---|
| $h = 1$ | 6.181 | 483.5 | 447.2 | 522.8 |
| $h = 2$ | 6.302 | 545.1 | 490.3 | 606.2 |
| $h = 3$ | 6.249 | 513.7 | 446.9 | 590.4 |
Example 3: Holt-Winters Forecasting of Quarterly Retail Sales
Data: Quarterly retail sales ($T = 40$ quarters, 10 years). Clear upward trend and stable multiplicative seasonality.
Selected Model: ETS(M,A,M) — Multiplicative Holt-Winters.
Estimated Parameters (via SSE minimisation): smoothing parameters $\hat{\alpha}$, $\hat{\beta}$, $\hat{\gamma}$.
Final State Values at $t = T$: level $\ell_T$ and trend $b_T$.
Seasonal Indices:
| Quarter | $\hat{s}_j$ |
|---|---|
| Q1 | 0.863 |
| Q2 | 0.971 |
| Q3 | 1.048 |
| Q4 | 1.118 |
Check: $\sum_j \hat{s}_j = 4.00$, average = 1.00 ✅
4-Step-Ahead Forecasts:
For $h = 1, \dots, 4$ (Q1 to Q4 of the next year), the multiplicative Holt-Winters forecast is:

$$\hat{Y}_{T+h \mid T} = (\ell_T + h\, b_T)\, \hat{s}_{T+h-4}$$

using the final level, trend, and the seasonal index for the corresponding quarter.
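The multiplicative Holt-Winters updates and forecasts can be sketched end-to-end on a toy quarterly series (the smoothing parameters, the crude initialisation, and the simulated data are all illustrative, not the example's actual values):

```python
import numpy as np

def holt_winters_mul(y, m, alpha, beta, gamma):
    """Multiplicative Holt-Winters updates; returns final level, trend, seasonals."""
    # crude initialisation from the first two seasonal cycles
    level = np.mean(y[:m])
    trend = (np.mean(y[m:2 * m]) - np.mean(y[:m])) / m
    season = list(y[:m] / level)
    for t in range(m, len(y)):
        s_old, l_old = season[t - m], level
        level = alpha * y[t] / s_old + (1 - alpha) * (level + trend)
        trend = beta * (level - l_old) + (1 - beta) * trend
        season.append(gamma * y[t] / level + (1 - gamma) * s_old)
    return level, trend, season

def hw_forecast(level, trend, season, m, h):
    """h-step forecasts: (level + k*trend) times the matching seasonal index."""
    return np.array([(level + k * trend) * season[len(season) - m + (k - 1) % m]
                     for k in range(1, h + 1)])

# toy quarterly series: linear trend with multiplicative seasonality
m = 4
idx = np.arange(40)
y = (100 + 2 * idx) * np.tile([0.9, 1.0, 1.1, 1.0], 10)
level, trend, season = holt_winters_mul(y, m, alpha=0.3, beta=0.1, gamma=0.1)
fc = hw_forecast(level, trend, season, m, h=4)
print(np.round(fc, 1))
```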
Accuracy Metrics (on 8-quarter hold-out set):
| Metric | Value |
|---|---|
| MAE | 12.4 |
| RMSE | 15.7 |
| MAPE | 3.2% |
| MASE | 0.61 |
MASE = 0.61 < 1: The Holt-Winters model outperforms the seasonal naïve benchmark by 39%.
16. Common Mistakes and How to Avoid Them
Mistake 1: Fitting ARMA to a Non-Stationary Series
Problem: Applying ARMA models directly to a trending or non-stationary series, producing spurious results, unreliable coefficients, and invalid inference.
Solution: Always test for stationarity (ADF, KPSS) before fitting ARMA. Apply the necessary differencing (and/or transformation) to achieve stationarity. Use ARIMA($p$,$d$,$q$) with the appropriate $d$.
Mistake 2: Over-Differencing
Problem: Applying more differences than necessary (e.g., differencing twice when once is sufficient), which introduces unnecessary MA components and inflates forecast variance.
Solution: Apply the minimum number of differences needed to pass stationarity tests. Check whether the differenced series passes ADF/KPSS before differencing again. If the ACF of the differenced series shows a large negative spike at lag 1, it may be over-differenced.
Mistake 3: Ignoring Seasonality
Problem: Fitting a non-seasonal ARIMA model to a clearly seasonal series, leaving seasonal structure in the residuals (which will show spikes in the ACF at seasonal lags).
Solution: Identify the seasonal period $m$ from domain knowledge and data inspection. Apply seasonal differencing if needed ($D = 1$). Use SARIMA or ETS models with seasonal components. Always check the ACF of residuals at seasonal lags.
Mistake 4: Confusing ACF and PACF Patterns
Problem: Misreading the ACF/PACF and specifying wrong model orders (e.g., using an AR model when MA would be more appropriate).
Solution: Remember: ACF cuts off for MA; PACF cuts off for AR; both decay for ARMA. Use information criteria (AICc, BIC) alongside ACF/PACF to narrow down model orders. Consider multiple candidate models.
Mistake 5: Not Checking Residual Diagnostics
Problem: Accepting a model without verifying that the residuals are white noise, leading to a mis-specified model with poor forecasting performance.
Solution: Always perform a full residual diagnostic check: time plot, ACF plot, Ljung-Box test, histogram, Q-Q plot. If residuals show autocorrelation, refine the model. If they show heteroscedasticity, consider GARCH.
Mistake 6: Evaluating Model Fit on Training Data Only
Problem: Selecting a model based solely on in-sample fit metrics (e.g., the lowest AIC) without validating forecast performance on held-out data.
Solution: Reserve the most recent observations as a hold-out test set. Compute out-of-sample accuracy metrics (RMSE, MASE). Use time-series cross-validation (rolling-origin or expanding-window) for more robust evaluation.
Mistake 7: Ignoring Structural Breaks
Problem: Fitting a single model over a period that contains a structural break (e.g., a financial crisis, a pandemic), leading to poor fit and unreliable forecasts.
Solution: Plot the series and look for sudden level shifts or trend changes. Test for breaks formally (CUSUM, Chow, Bai-Perron). If breaks are detected, model each segment separately or include dummy variables.
Mistake 8: Treating Multiplicative Seasonality as Additive
Problem: Using an additive decomposition or additive Holt-Winters when seasonal variation grows with the level, underestimating seasonality in peak periods and overestimating in troughs.
Solution: Plot the series and assess whether seasonal swings are roughly constant (additive) or grow proportionally with the level (multiplicative). Apply a log transformation to convert multiplicative to additive, or use the multiplicative ETS model directly.
Mistake 9: Extrapolating Trends Too Far
Problem: Generating long-horizon forecasts from a model with a strong trend, leading to unrealistic forecasts that grow without bound.
Solution: Use the damped trend method (ETS with damping) for longer horizons. Report widening prediction intervals to communicate growing uncertainty. Treat long-range forecasts with appropriate scepticism.
Mistake 10: Using MAPE with Near-Zero Values
Problem: MAPE is undefined or extremely large when the actual values are zero or close to zero, leading to misleading accuracy assessments.
Solution: Use MASE or RMSE instead of MAPE when the series contains zero or very small values. MASE is always well-defined and has the additional advantage of being scale-free.
17. Troubleshooting
| Issue | Likely Cause | Solution |
|---|---|---|
| ARIMA model fails to converge | Very short series; too many parameters; near-unit-root behaviour | Reduce $p$, $q$; check stationarity; use a simpler model |
| AICc selects a very high-order model (large $p$ or $q$) | Insufficient data; non-stationarity not fully addressed; outliers | Increase data length; check stationarity; inspect for outliers; try REML-based estimation |
| Residual ACF shows a significant spike at the seasonal lag (e.g., lag 12) | Seasonal component not modelled | Add a seasonal MA or AR term ($Q = 1$ or $P = 1$); apply seasonal differencing ($D = 1$) |
| Residual ACF has one large negative spike at lag 1 | Over-differencing ($d$ or $D$ too large) | Reduce the differencing order by 1 |
| Prediction intervals are extremely wide | High $d$ or $D$; large $\hat{\sigma}^2$; long forecast horizon | Reconsider differencing; check for outliers inflating $\hat{\sigma}^2$; report a shorter horizon |
| ADF and KPSS tests give contradictory results | Near-unit-root behaviour; small sample | Increase sample if possible; use PP test as tiebreaker; consult ACF pattern |
| Forecasts quickly converge to a flat line | Random walk structure ($d = 1$, no AR terms); or SES applied | This is expected behaviour for ARIMA(0,1,1); use Holt's method if a trend is needed |
| Holt-Winters gives very poor out-of-sample accuracy | Wrong seasonality type (additive vs. multiplicative); outliers at end of series | Try both additive and multiplicative; inspect and handle outliers |
| GARCH estimation fails to converge | Near-integrated volatility ($\alpha + \beta \approx 1$); insufficient data | Try IGARCH; increase data; use a simpler ARCH(1) |
| Ljung-Box test is always significant regardless of model | Outliers or structural breaks inflating residual autocorrelation | Identify and handle outliers; test for structural breaks; use robust estimation |
| Seasonal naïve outperforms all fitted models (MASE > 1) | Insufficient data to estimate model; strong irregular component | Collect more data; consider ensemble approaches; report seasonal naïve as the baseline |
| Log transformation produces negative back-transformed forecasts | Series contains zeros or negative values | Use Box-Cox with a suitable $\lambda$; add a constant before transforming |
18. Quick Reference Cheat Sheet
Core Model Equations
| Model | Equation |
|---|---|
| AR($p$) | $Y_t = c + \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \epsilon_t$ |
| MA($q$) | $Y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$ |
| ARMA($p$,$q$) | $Y_t = c + \sum_{i=1}^{p}\phi_i Y_{t-i} + \epsilon_t + \sum_{j=1}^{q}\theta_j \epsilon_{t-j}$ |
| ARIMA($p$,$d$,$q$) | $\phi(B)(1-B)^d Y_t = c + \theta(B)\epsilon_t$ |
| SARIMA($p$,$d$,$q$)($P$,$D$,$Q$)$_m$ | $\phi(B)\Phi(B^m)(1-B)^d(1-B^m)^D Y_t = c + \theta(B)\Theta(B^m)\epsilon_t$ |
| SES | $\ell_t = \alpha Y_t + (1-\alpha)\ell_{t-1}$; $\hat{Y}_{t+h \mid t} = \ell_t$ |
| Holt's | $\ell_t = \alpha Y_t + (1-\alpha)(\ell_{t-1}+b_{t-1})$; $b_t = \beta(\ell_t-\ell_{t-1}) + (1-\beta)b_{t-1}$; $\hat{Y}_{t+h \mid t} = \ell_t + h b_t$ |
| HW Additive | $\hat{Y}_{t+h \mid t} = \ell_t + h b_t + s_{t+h-m(k+1)}$ |
| HW Multiplicative | $\hat{Y}_{t+h \mid t} = (\ell_t + h b_t)\, s_{t+h-m(k+1)}$ |
| GARCH(1,1) | $\sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2$ |
ACF/PACF Pattern Guide
| Model | ACF | PACF |
|---|---|---|
| White Noise | No spikes | No spikes |
| AR($p$) | Decays exponentially/sinusoidally | Cuts off after lag $p$ |
| MA($q$) | Cuts off after lag $q$ | Decays exponentially/sinusoidally |
| ARMA($p$,$q$) | Decays after lag $q-p$ | Decays after lag $p-q$ |
| Non-stationary | Very slow decay (near 1.0) | Large spike at lag 1 |
| Seasonal AR($P$) | Spikes at multiples of $m$, decaying | Spike at lag $m$ only |
| Seasonal MA($Q$) | Spike at lag $m$ only | Spikes at multiples of $m$, decaying |
Stationarity Tests Summary
| Test | $H_0$ | $H_1$ | Significant result means |
|---|---|---|---|
| ADF | Unit root (non-stationary) | Stationary | Reject $H_0$: Stationary |
| KPSS | Stationary | Non-stationary (unit root) | Reject $H_0$: Non-stationary |
| PP | Unit root (non-stationary) | Stationary | Reject $H_0$: Stationary |
Model Selection Guide
| Scenario | Recommended Model |
|---|---|
| No trend, no seasonality | SES / ARIMA(0,1,1) |
| Trend, no seasonality | Holt's / ARIMA(0,2,2) |
| Trend, damped | Damped Holt's / ETS(A,Ad,N) |
| Additive seasonality + trend | Additive HW / SARIMA |
| Multiplicative seasonality + trend | Multiplicative HW / SARIMA on log scale |
| Volatility clustering (financial data) | GARCH(1,1) on residuals |
| Multiple interrelated series | VAR($p$) |
| External predictors available | SARIMAX |
| Unknown structure; small dataset | ETS with automatic selection |
| Unknown structure; large dataset | Auto ARIMA |
Differencing Guide
| Pattern in Original Series | Action |
|---|---|
| No trend, no seasonality, ACF decays fast | No differencing needed ($d = 0$, $D = 0$) |
| Linear trend; ADF non-significant | First difference ($d = 1$) |
| Quadratic trend | Second difference ($d = 2$) |
| Seasonal non-stationarity | Seasonal difference ($D = 1$) |
| Both trend and seasonal non-stationarity | $d = 1$ and $D = 1$ |
| Increasing variance | Log or Box-Cox transformation first |
Information Criteria Reference
| Criterion | Formula | Prefer | Notes |
|---|---|---|---|
| AIC | $-2\log L + 2k$ | Lower | Can overfit with small $T$ |
| AICc | $\text{AIC} + \frac{2k(k+1)}{T-k-1}$ | Lower | Recommended for time series |
| BIC | $-2\log L + k\log T$ | Lower | More parsimonious than AIC |
Forecast Accuracy Metrics
| Metric | Formula | Notes |
|---|---|---|
| MAE | $\frac{1}{n}\sum \lvert e_t \rvert$ | Intuitive; same units as data |
| RMSE | $\sqrt{\frac{1}{n}\sum e_t^2}$ | Penalises large errors; same units |
| MAPE | $\frac{100}{n}\sum \left\lvert \frac{e_t}{Y_t} \right\rvert$ | Percentage; undefined if $Y_t = 0$ |
| MASE | $\frac{\text{MAE}}{\text{MAE}_{\text{naïve}}}$ | Scale-free; MASE < 1 beats naïve |
ETS Model Taxonomy
| Error | Trend | Seasonal | ETS Code | Method |
|---|---|---|---|---|
| A | N | N | ETS(A,N,N) | SES |
| A | A | N | ETS(A,A,N) | Holt's Linear |
| A | Ad | N | ETS(A,Ad,N) | Damped Holt's |
| A | A | A | ETS(A,A,A) | Additive HW |
| M | A | M | ETS(M,A,M) | Multiplicative HW |
| M | Ad | M | ETS(M,Ad,M) | Damped Multiplicative HW |
This tutorial provides a comprehensive foundation for understanding, applying, and interpreting Time Series Analysis using the DataStatPro application. For further reading, consult Hyndman & Athanasopoulos's "Forecasting: Principles and Practice" (freely available at otexts.com/fpp3), Box, Jenkins, Reinsel & Ljung's "Time Series Analysis: Forecasting and Control", or Brockwell & Davis's "Introduction to Time Series and Forecasting". For feature requests or support, contact the DataStatPro team.