
Time Series Analysis

Time series analysis encompasses a collection of statistical methods and machine learning techniques for handling data points ordered chronologically. From stock price prediction to weather forecasting, from server load monitoring to supply chain management, time series analysis is applied across a broad range of industries.

Learning path: Stationarity testing → Classical statistical models → Exponential smoothing → ML feature engineering → Deep learning methods → Evaluation and practice


Overview of Time Series Analysis

Basic Concepts

A time series \(\{y_t\}_{t=1}^{T}\) is a sequence of data observed at equally spaced time points. The core components of a time series include:

| Component   | Description                               | Example                                |
|-------------|-------------------------------------------|----------------------------------------|
| Trend       | Long-term upward or downward direction    | GDP growing year over year             |
| Seasonality | Regular fluctuations with a fixed period  | Retail sales surging every Christmas   |
| Cyclicity   | Fluctuations without a fixed period       | Business cycles (recessions and booms) |
| Noise       | Unpredictable random fluctuations         | Measurement errors                     |

Stationarity

Stationarity is the most fundamental concept in time series analysis. A strictly stationary process has a joint distribution that is invariant to time shifts. In practice, weak stationarity (also called wide-sense stationarity) is more commonly used; it requires:

  1. Constant mean: \(\mathbb{E}[y_t] = \mu\) for all \(t\)
  2. Finite and constant variance: \(\text{Var}(y_t) = \sigma^2 < \infty\)
  3. Autocovariance depends only on lag: \(\text{Cov}(y_t, y_{t+h}) = \gamma(h)\), depending only on the lag \(h\)

Stationarity tests:

  • ADF test (Augmented Dickey-Fuller): Tests for the presence of a unit root; a small p-value rejects the "non-stationary" null hypothesis
  • KPSS test: The null hypothesis is stationarity; using both tests together is more reliable (both are run in the sketch below)
  • Differencing (a remedy rather than a test): Apply \(d\)-th order differencing to a non-stationary series to achieve stationarity; the first difference is \(\Delta y_t = y_t - y_{t-1}\)
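
A minimal sketch of running both tests with statsmodels on a synthetic random walk (the data here is fabricated purely for illustration):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=500))  # random walk: non-stationary by construction

adf_p = adfuller(y)[1]                             # null: unit root (non-stationary)
kpss_p = kpss(y, regression="c", nlags="auto")[1]  # null: stationary
print(f"ADF p={adf_p:.3f}, KPSS p={kpss_p:.3f}")

# First differencing usually stationarizes a random walk
print(f"ADF p after differencing: {adfuller(np.diff(y))[1]:.3f}")
```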

Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)

The ACF measures the linear correlation between a time series and its lagged version:

\[ \rho(h) = \frac{\gamma(h)}{\gamma(0)} = \frac{\text{Cov}(y_t, y_{t+h})}{\text{Var}(y_t)} \]

The PACF measures the direct linear relationship between \(y_t\) and \(y_{t+h}\) after removing the effects of intermediate lags.

ACF and PACF plots are essential tools for selecting model orders:

| Model     | ACF pattern                   | PACF pattern             |
|-----------|-------------------------------|--------------------------|
| AR(p)     | Tails off (exponential decay) | Cuts off after lag \(p\) |
| MA(q)     | Cuts off after lag \(q\)      | Tails off                |
| ARMA(p,q) | Tails off                     | Tails off                |
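
These signatures can be checked empirically. Below is a short sketch that simulates an AR(2) process (coefficients chosen arbitrarily) and plots its ACF and PACF with statsmodels:

```python
import numpy as np
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima_process import ArmaProcess

# AR(2): y_t = 0.6 y_{t-1} - 0.3 y_{t-2} + eps_t
# ArmaProcess expects lag-polynomial coefficients, including the leading 1
ar = np.array([1, -0.6, 0.3])
ma = np.array([1])
y = ArmaProcess(ar, ma).generate_sample(nsample=1000)

plot_acf(y, lags=20)   # expected: gradual tail-off
plot_pacf(y, lags=20)  # expected: cutoff after lag 2
```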

Classical Methods

AR / MA / ARMA / ARIMA Models

AR(p) -- Autoregressive model: The current value is a linear combination of the past \(p\) values plus white noise:

\[ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \epsilon_t \]

MA(q) -- Moving Average model: The current value is a linear combination of the past \(q\) noise terms:

\[ y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q} \]

ARMA(p,q): Combines AR and MA:

\[ y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \epsilon_t + \sum_{j=1}^{q} \theta_j \epsilon_{t-j} \]

ARIMA(p,d,q): Adds differencing to ARMA to handle non-stationary series. \(d\) is the order of differencing.
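
A minimal sketch of fitting an ARIMA(1,1,1) with statsmodels; the order and the placeholder series are illustrative, not recommendations:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Placeholder series with drift; substitute real data in practice
rng = np.random.default_rng(1)
y = pd.Series(np.cumsum(rng.normal(0.2, 1.0, 200)))

res = ARIMA(y, order=(1, 1, 1)).fit()  # p=1, d=1, q=1
print(res.summary())
print(res.forecast(steps=12))          # 12-step-ahead point forecasts
```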

Box-Jenkins Methodology

The Box-Jenkins methodology is a systematic procedure for selecting and fitting ARIMA models:

  1. Identification: Examine ACF/PACF plots to determine candidate values for \(p\), \(d\), \(q\)
  2. Estimation: Fit model parameters using Maximum Likelihood Estimation (MLE)
  3. Diagnostics: Check whether residuals are white noise (Ljung-Box test, sketched below)
  4. Forecasting: If diagnostics pass, use the model for prediction
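
Continuing the ARIMA sketch above, the residual diagnostics of step 3 can be run with the Ljung-Box test from statsmodels:

```python
from statsmodels.stats.diagnostic import acorr_ljungbox

# res: the fitted ARIMA result from the previous sketch
lb = acorr_ljungbox(res.resid, lags=[10, 20])
print(lb)  # large p-values -> no residual autocorrelation (consistent with white noise)
```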

Seasonal Decomposition

Seasonal ARIMA, written SARIMA(p,d,q)(P,D,Q)\(_s\), extends ARIMA with seasonal AR, differencing, and MA terms at seasonal period \(s\).

Classical additive/multiplicative decomposition:

  • Additive model: \(y_t = T_t + S_t + R_t\) (trend + seasonality + residual)
  • Multiplicative model: \(y_t = T_t \times S_t \times R_t\)

STL decomposition (Seasonal and Trend decomposition using Loess) is a more robust method that can handle changing seasonality.
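
A short STL sketch with statsmodels on a synthetic monthly series (the data is made up; `robust=True` downweights outliers):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Synthetic monthly series: linear trend + annual cycle + noise
idx = pd.date_range("2015-01-01", periods=120, freq="MS")
t = np.arange(120)
y = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
              + np.random.default_rng(2).normal(0, 2, 120), index=idx)

res = STL(y, period=12, robust=True).fit()
trend, seasonal, resid = res.trend, res.seasonal, res.resid
```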


Exponential Smoothing

Simple Exponential Smoothing (SES)

Suitable for series with no trend and no seasonality:

\[ \hat{y}_{t+1} = \alpha y_t + (1 - \alpha) \hat{y}_t, \quad 0 < \alpha < 1 \]

A larger \(\alpha\) gives more weight to recent observations; a smaller \(\alpha\) produces smoother forecasts.

Double Exponential Smoothing (Holt's Method)

Adds a trend component:

\[ \ell_t = \alpha y_t + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \]
\[ b_t = \beta(\ell_t - \ell_{t-1}) + (1 - \beta) b_{t-1} \]
\[ \hat{y}_{t+h} = \ell_t + h \cdot b_t \]

where \(\ell_t\) is the level component and \(b_t\) is the trend component.
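
A minimal sketch of SES and Holt's method via statsmodels; the smoothing parameters below are arbitrary (calling `.fit()` without them estimates the parameters by maximum likelihood):

```python
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt

# y: any univariate pd.Series, e.g. the synthetic series above
ses = SimpleExpSmoothing(y).fit(smoothing_level=0.3)          # fixed alpha
holt = Holt(y).fit(smoothing_level=0.8, smoothing_trend=0.2)  # alpha, beta
print(ses.forecast(5))   # flat forecasts: SES carries no trend
print(holt.forecast(5))  # linear forecasts: level + h * trend
```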

Holt-Winters Method

Further incorporates a seasonal component, with both additive and multiplicative variants:

| Method                      | Use case                            | Seasonality behavior             |
|-----------------------------|-------------------------------------|----------------------------------|
| Holt-Winters additive       | Constant seasonal amplitude         | \(S_t\) is added to the forecast |
| Holt-Winters multiplicative | Seasonal amplitude grows with trend | \(S_t\) multiplies the forecast  |
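
A sketch of Holt-Winters via statsmodels' ExponentialSmoothing, assuming a strictly positive monthly series `y` (multiplicative seasonality is undefined otherwise):

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

hw = ExponentialSmoothing(y, trend="add", seasonal="mul",
                          seasonal_periods=12).fit()
print(hw.forecast(24))  # two full seasonal cycles ahead
```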

Prophet

Overview

Prophet is an open-source time series forecasting tool from Meta (formerly Facebook), designed specifically for business time series. It is robust to missing values, outliers, and trend changes.

Core Model

Prophet decomposes a time series into three additive components:

\[ y(t) = g(t) + s(t) + h(t) + \epsilon_t \]
| Component             | Description                   | Implementation                                                                  |
|-----------------------|-------------------------------|---------------------------------------------------------------------------------|
| \(g(t)\): Trend       | Long-term growth trend        | Piecewise linear or logistic growth curve with automatic changepoint detection  |
| \(s(t)\): Seasonality | Periodic patterns             | Fourier series: \(s(t) = \sum_{n=1}^{N}\left(a_n \cos\frac{2\pi nt}{P} + b_n \sin\frac{2\pi nt}{P}\right)\) |
| \(h(t)\): Holidays    | Holiday/special event effects | User-provided holiday list; model estimates effect sizes                        |

Advantages of Prophet:

  • User-friendly for non-data-scientists with intuitive parameters
  • Automatically handles missing data and outliers
  • Allows manual addition of changepoints and holidays
  • Built-in uncertainty intervals (see the usage sketch below)
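
A minimal usage sketch. Prophet's API requires columns named `ds` (datestamp) and `y` (value); the data below is a placeholder:

```python
import numpy as np
import pandas as pd
from prophet import Prophet

ds = pd.date_range("2020-01-01", periods=730)
df = pd.DataFrame({"ds": ds,
                   "y": np.sin(2 * np.pi * ds.dayofyear / 365.25)
                        + 0.001 * np.arange(730)})

m = Prophet(yearly_seasonality=True, weekly_seasonality=True)
m.fit(df)
future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```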

Machine Learning Methods

Feature Engineering

Transforming time series into tabular data is the key to applying traditional ML models:

| Feature type                  | Example                                     | Description                           |
|-------------------------------|---------------------------------------------|---------------------------------------|
| Sliding window (lag features) | \(y_{t-1}, y_{t-2}, \dots, y_{t-k}\)        | Values from the past \(k\) time steps |
| Rolling statistics            | Moving average, rolling standard deviation  | Captures local trends and volatility  |
| Date/time features            | Month, day of week, hour, is_holiday        | Encodes temporal periodicity          |
| Difference features           | \(y_t - y_{t-1}\), \(y_t - y_{t-7}\)        | Captures changes                      |
| Fourier features              | \(\sin(2\pi t / P)\), \(\cos(2\pi t / P)\)  | Encodes seasonality                   |
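
A sketch of building such features with pandas; the lag and window choices are arbitrary. Note the `shift(1)` before the rolling statistics so that no feature touches the current value \(y_t\):

```python
import numpy as np
import pandas as pd

def make_features(y: pd.Series, lags=(1, 2, 7), windows=(7, 28)) -> pd.DataFrame:
    """Turn a univariate daily series (DatetimeIndex) into a feature matrix."""
    X = pd.DataFrame(index=y.index)
    for k in lags:                                    # lag features
        X[f"lag_{k}"] = y.shift(k)
    for w in windows:                                 # rolling stats on past values only
        X[f"roll_mean_{w}"] = y.shift(1).rolling(w).mean()
        X[f"roll_std_{w}"] = y.shift(1).rolling(w).std()
    X["dayofweek"] = y.index.dayofweek                # calendar features
    X["month"] = y.index.month
    X["diff_1"] = y.shift(1).diff(1)                  # change up to t-1 (no leakage)
    t = np.arange(len(y))                             # Fourier terms, weekly period P=7
    X["sin_7"] = np.sin(2 * np.pi * t / 7)
    X["cos_7"] = np.cos(2 * np.pi * t / 7)
    return X
```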

XGBoost / LightGBM for Time Series

Gradient boosted trees perform exceptionally well in time series competitions:

  • Advantages: No stationarity assumption required, automatically handles nonlinearity, can incorporate external features
  • Caveats: Must use time-ordered cross-validation (no random splitting) to avoid data leakage; see the sketch after this list
  • Multi-step forecasting: Recursive forecasting (predict step-by-step, feeding predictions as next inputs) or direct multi-output
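
A sketch of time-ordered cross-validation with scikit-learn's TimeSeriesSplit and LightGBM; `X` and `y` are assumed to come from a feature builder like the one above, with the initial NaN rows (from lagging) dropped:

```python
from sklearn.model_selection import TimeSeriesSplit
from lightgbm import LGBMRegressor

tscv = TimeSeriesSplit(n_splits=5)  # time-ordered folds, never shuffled
for train_idx, test_idx in tscv.split(X):
    model = LGBMRegressor(n_estimators=300, learning_rate=0.05)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    print(model.score(X.iloc[test_idx], y.iloc[test_idx]))  # R^2 on each fold
```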

Deep Learning Methods

LSTM for Time Series

LSTM (Long Short-Term Memory) is naturally suited for sequence modeling:

  • Encoder-decoder architecture for multi-step forecasting
  • Can handle multivariate time series
  • Drawbacks: slow training, sensitive to hyperparameters
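
A minimal PyTorch sketch of one common pattern, a windowed LSTM with a linear head (the sizes are arbitrary; this is not the only viable architecture):

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Map a window of past values to an h-step-ahead forecast."""
    def __init__(self, n_features: int = 1, hidden: int = 64, horizon: int = 1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # forecast from the last hidden state

model = LSTMForecaster(horizon=12)
y_hat = model(torch.randn(32, 48, 1))  # 48-step window -> 12-step forecast
```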

Temporal Fusion Transformer (TFT)

TFT (Lim et al., 2021), developed at Google, combines several advanced techniques:

  • Variable selection network: Automatically identifies important features
  • Temporal attention: Captures both short- and long-term dependencies
  • Interpretability: Provides feature importance scores and temporal attention weights
  • Achieved SOTA on multiple benchmark datasets

PatchTST

Nie et al. (2023) proposed segmenting time series into patches (similar to how ViT processes images):

  • Splits long sequences into fixed-length patches (sketched below)
  • Each patch serves as a token input to the Transformer
  • Dramatically reduces computational complexity while preserving long-range dependencies
  • Channel-independence strategy improves multivariate forecasting
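
A sketch of the patching step alone, using `torch.Tensor.unfold`; the patch length and stride match the paper's defaults but are assumptions here:

```python
import torch

def patchify(x: torch.Tensor, patch_len: int = 16, stride: int = 8) -> torch.Tensor:
    """Split each series into overlapping fixed-length patches.
    x: (batch, seq_len) -> (batch, n_patches, patch_len)"""
    return x.unfold(dimension=-1, size=patch_len, step=stride)

patches = patchify(torch.randn(4, 512))  # -> (4, 63, 16)
```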

Evaluation Methods

Common Evaluation Metrics

| Metric | Formula | Characteristics |
|--------|---------|-----------------|
| MAE    | \(\frac{1}{T}\sum_{t=1}^T \lvert y_t - \hat{y}_t \rvert\) | Intuitive; more robust to outliers than RMSE |
| RMSE   | \(\sqrt{\frac{1}{T}\sum_{t=1}^T (y_t - \hat{y}_t)^2}\) | Amplifies large errors |
| MAPE   | \(\frac{100\%}{T}\sum_{t=1}^T \left\lvert \frac{y_t - \hat{y}_t}{y_t} \right\rvert\) | Percentage error, but unstable when \(y_t \approx 0\) |
| sMAPE  | \(\frac{200\%}{T}\sum_{t=1}^T \frac{\lvert y_t - \hat{y}_t \rvert}{\lvert y_t \rvert + \lvert \hat{y}_t \rvert}\) | Symmetric variant of MAPE |
| MASE   | \(\frac{\text{MAE}}{\text{MAE}_{\text{naive}}}\) | Error scaled by the naive forecast's MAE; values below 1 beat the naive baseline, enabling cross-series comparison |
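
NumPy implementations of the two less common metrics, matching the formulas above (the naive baseline in MASE is the lag-\(m\) forecast computed on the training set):

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE, in percent."""
    return 200 * np.mean(np.abs(y_true - y_pred)
                         / (np.abs(y_true) + np.abs(y_pred)))

def mase(y_true, y_pred, y_train, m=1):
    """MAE scaled by the in-sample MAE of the naive lag-m forecast."""
    naive_mae = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae
```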

Backtesting

Time series evaluation must respect temporal ordering:

  • Rolling window validation: A fixed-size window slides forward; the model is retrained and evaluated at each step
  • Expanding window validation: The training set progressively grows while the prediction window moves forward
  • No future data leakage: Strictly ensure all training data precedes the prediction time point

Expanding window validation illustration:

Fold 1: [=====Train=====][Test]
Fold 2: [======Train======][Test]
Fold 3: [=======Train=======][Test]
Fold 4: [========Train========][Test]
                                    → Time direction
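
A rolling-window backtest sketch; `fit_predict` is a hypothetical helper that trains on one window and returns `horizon` forecasts:

```python
import numpy as np

def rolling_backtest(y, fit_predict, train_size=200, horizon=12, step=12):
    """Slide a fixed-size window forward; refit and score at every step."""
    maes = []
    for start in range(0, len(y) - train_size - horizon + 1, step):
        train = y[start:start + train_size]
        test = np.asarray(y[start + train_size:start + train_size + horizon])
        maes.append(np.mean(np.abs(test - np.asarray(fit_predict(train)))))
    return float(np.mean(maes))
```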

Method Selection Guide

| Scenario | Recommended method | Rationale |
|----------|--------------------|-----------|
| Small data, univariate | ARIMA / Exponential smoothing | Few parameters, less prone to overfitting |
| Business forecasting (with seasonality/holidays) | Prophet | Easy to use, interpretable |
| Rich external features | XGBoost / LightGBM | Strong feature integration capability |
| Long sequences, multivariate, large data | Transformer-based (TFT/PatchTST) | Powerful modeling capacity |
| Uncertainty estimation needed | GP / Bayesian methods / Prophet | Built-in uncertainty quantification |
