Time Series Analysis
Time series analysis encompasses a collection of statistical methods and machine learning techniques for data points ordered in time. From stock price prediction to weather forecasting, and from server load monitoring to supply chain management, its industrial applications are broad.
Learning path: Stationarity testing → Classical statistical models → Exponential smoothing → ML feature engineering → Deep learning methods → Evaluation and practice
Overview of Time Series Analysis
Basic Concepts
A time series \(\{y_t\}_{t=1}^{T}\) is a sequence of data observed at equally spaced time points. The core components of a time series include:
| Component | Description | Example |
|---|---|---|
| Trend | Long-term upward or downward direction | GDP growing year over year |
| Seasonality | Regular fluctuations with a fixed period | Retail sales surging every Christmas |
| Cyclicity | Fluctuations without a fixed period | Business cycles (recessions and booms) |
| Noise | Unpredictable random fluctuations | Measurement errors |
Stationarity
Stationarity is the most fundamental concept in time series analysis. A strictly stationary process has a joint distribution that is invariant to time shifts. In practice, weak stationarity (wide-sense stationarity) is the more commonly used notion; it requires:
- Constant mean: \(\mathbb{E}[y_t] = \mu\) for all \(t\)
- Finite and constant variance: \(\text{Var}(y_t) = \sigma^2 < \infty\)
- Autocovariance depends only on lag: \(\text{Cov}(y_t, y_{t+h}) = \gamma(h)\), depending only on the lag \(h\)
Stationarity tests and remedies:
- ADF test (Augmented Dickey-Fuller): Tests for the presence of a unit root; a small p-value leads to rejection of the "non-stationary" null hypothesis
- KPSS test: The null hypothesis is stationarity; using both tests together is more reliable
- Differencing: A non-stationary series can often be made stationary by differencing it \(d\) times; the first difference is \(\Delta y_t = y_t - y_{t-1}\)
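As a minimal sketch of how these checks look in practice with statsmodels (the random-walk data is an illustrative assumption):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(42)
y = np.cumsum(rng.normal(size=500))  # a random walk: unit root, non-stationary

# ADF: H0 = unit root. A small p-value rejects non-stationarity.
adf_p = adfuller(y)[1]
# KPSS: H0 = stationarity. A small p-value rejects stationarity.
kpss_p = kpss(y, regression="c", nlags="auto")[1]
print(f"ADF p={adf_p:.3f}, KPSS p={kpss_p:.3f}")  # both should flag non-stationarity

# First-order differencing usually fixes a random walk
dy = np.diff(y)  # Delta y_t = y_t - y_{t-1}
print(f"ADF p after differencing: {adfuller(dy)[1]:.3f}")  # small -> stationary
```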
Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)
The ACF measures the linear correlation between a time series and its lagged version: \(\rho(h) = \frac{\text{Cov}(y_t, y_{t+h})}{\text{Var}(y_t)} = \frac{\gamma(h)}{\gamma(0)}\).
The PACF measures the direct linear relationship between \(y_t\) and \(y_{t+h}\) after removing the effects of intermediate lags.
ACF and PACF plots are essential tools for selecting model orders:
| Model | ACF pattern | PACF pattern |
|---|---|---|
| AR(p) | Tails off (exponential decay) | Cuts off after lag \(p\) |
| MA(q) | Cuts off after lag \(q\) | Tails off |
| ARMA(p,q) | Tails off | Tails off |
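A small simulation makes the table concrete; the sketch below generates an AR(2) process with statsmodels (the coefficients 0.6 and 0.3 are arbitrary illustrative values) and plots its ACF and PACF:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# AR(2): y_t = 0.6 y_{t-1} + 0.3 y_{t-2} + eps_t
# ArmaProcess takes lag polynomials, so the AR coefficients are negated.
y = ArmaProcess(np.array([1, -0.6, -0.3]), np.array([1])).generate_sample(nsample=1000)

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, lags=30, ax=axes[0])   # should tail off gradually
plot_pacf(y, lags=30, ax=axes[1])  # should cut off after lag 2
plt.tight_layout()
plt.show()
```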
Classical Methods
AR / MA / ARMA / ARIMA Models
AR(p) -- Autoregressive model: The current value is a linear combination of the past \(p\) values plus white noise:
\(y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t\)
MA(q) -- Moving Average model: The current value is a linear combination of the past \(q\) noise terms:
\(y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}\)
ARMA(p,q): Combines AR and MA:
\(y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}\)
ARIMA(p,d,q): Adds differencing to ARMA to handle non-stationary series. \(d\) is the order of differencing.
Box-Jenkins Methodology
The Box-Jenkins methodology is a systematic procedure for selecting and fitting ARIMA models:
- Identification: Examine ACF/PACF plots to determine candidate values for \(p\), \(d\), \(q\)
- Estimation: Fit model parameters using Maximum Likelihood Estimation (MLE)
- Diagnostics: Check whether residuals are white noise (Ljung-Box test)
- Forecasting: If diagnostics pass, use the model for prediction
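The four steps map onto statsmodels directly; in the sketch below, the series and the (1, 1, 1) order are placeholders (in practice the order comes from the identification step):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=300))  # placeholder series; use real data here

# Estimation: parameters are fit by MLE
model = ARIMA(y, order=(1, 1, 1)).fit()
print(model.summary())

# Diagnostics: large Ljung-Box p-values -> residuals look like white noise
print(acorr_ljungbox(model.resid, lags=[10, 20]))

# Forecasting: point forecasts with confidence intervals
fc = model.get_forecast(steps=12)
print(fc.predicted_mean)
print(fc.conf_int())
```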
Seasonal Decomposition
For seasonal series, ARIMA extends to Seasonal ARIMA, SARIMA(p,d,q)(P,D,Q)\(_s\), where \(s\) is the seasonal period (e.g., \(s = 12\) for monthly data with yearly seasonality).
Classical additive/multiplicative decomposition:
- Additive model: \(y_t = T_t + S_t + R_t\) (trend + seasonality + residual)
- Multiplicative model: \(y_t = T_t \times S_t \times R_t\)
STL decomposition (Seasonal and Trend decomposition using Loess) is a more robust method that can handle changing seasonality.
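A minimal STL sketch with statsmodels, assuming monthly data with period 12 (the synthetic series is for illustration only):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import STL

# Synthetic monthly series: linear trend + yearly seasonality + noise
idx = pd.date_range("2015-01-01", periods=120, freq="MS")
t = np.arange(120)
y = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
              + np.random.default_rng(1).normal(0, 2, size=120), index=idx)

res = STL(y, period=12, robust=True).fit()  # robust=True downweights outliers
res.plot()  # panels: observed, trend, seasonal, residual
plt.show()
```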
Exponential Smoothing
Simple Exponential Smoothing (SES)
Suitable for series with no trend and no seasonality: \(\hat{y}_{t+1} = \alpha y_t + (1 - \alpha)\hat{y}_t\), where \(0 < \alpha \le 1\) is the smoothing parameter.
A larger \(\alpha\) gives more weight to recent observations; a smaller \(\alpha\) produces smoother forecasts.
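In statsmodels this is a one-liner; the sketch fixes \(\alpha = 0.3\) for illustration (omit smoothing_level to have it estimated by MLE):

```python
import numpy as np
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

y = np.array([112., 118., 132., 129., 121., 135., 148., 148., 136., 119.])  # toy data

fit = SimpleExpSmoothing(y).fit(smoothing_level=0.3, optimized=False)
print(fit.forecast(3))  # SES produces a flat forecast at the last smoothed level
```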
Double Exponential Smoothing (Holt's Method)
Adds a trend component:
\(\ell_t = \alpha y_t + (1 - \alpha)(\ell_{t-1} + b_{t-1})\)
\(b_t = \beta(\ell_t - \ell_{t-1}) + (1 - \beta) b_{t-1}\)
\(\hat{y}_{t+h} = \ell_t + h b_t\)
where \(\ell_t\) is the level component and \(b_t\) is the trend component.
Holt-Winters Method
Further incorporates a seasonal component \(S_t\), with both additive and multiplicative variants:
| Method | Use case | Seasonality behavior |
|---|---|---|
| Holt-Winters additive | Constant seasonal amplitude | \(S_t\) is added to the forecast |
| Holt-Winters multiplicative | Seasonal amplitude grows with trend | \(S_t\) multiplies the forecast |
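Both variants are exposed through statsmodels' ExponentialSmoothing; a sketch assuming monthly data with period 12 and a seasonal amplitude that grows with the trend (hence the multiplicative choice):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Illustrative series whose seasonal swings grow with the level
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
t = np.arange(96)
y = pd.Series((100 + t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12)), index=idx)

fit = ExponentialSmoothing(
    y, trend="add", seasonal="mul", seasonal_periods=12
).fit()
print(fit.forecast(12))  # forecast for the next year
```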
Prophet
Overview
Prophet is an open-source time series forecasting tool from Meta (formerly Facebook), designed specifically for business time series. It is robust to missing values, outliers, and trend changes.
Core Model
Prophet decomposes a time series into three additive components plus noise, \(y(t) = g(t) + s(t) + h(t) + \varepsilon_t\):
| Component | Description | Implementation |
|---|---|---|
| \(g(t)\): Trend | Long-term growth trend | Piecewise linear or logistic growth curve with automatic changepoint detection |
| \(s(t)\): Seasonality | Periodic patterns | Fourier series: \(s(t) = \sum_{n=1}^{N}\left(a_n \cos\frac{2\pi nt}{P} + b_n \sin\frac{2\pi nt}{P}\right)\) |
| \(h(t)\): Holidays | Holiday/special event effects | User-provided holiday list; model estimates effect sizes |
Advantages of Prophet:
- User-friendly for non-data-scientists with intuitive parameters
- Automatically handles missing data and outliers
- Allows manual addition of changepoints and holidays
- Built-in uncertainty intervals
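A minimal usage sketch; the column names ds and y are required by the library, while the CSV path and forecast horizon are placeholders:

```python
import pandas as pd
from prophet import Prophet

# Prophet expects a DataFrame with columns ds (datetime) and y (value)
df = pd.read_csv("daily_sales.csv", parse_dates=["ds"])  # placeholder path

m = Prophet()  # yearly/weekly seasonality handled automatically for daily data
m.fit(df)

future = m.make_future_dataframe(periods=90)  # extend 90 days past the history
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())

m.plot(forecast)             # forecast with uncertainty band
m.plot_components(forecast)  # trend / weekly / yearly panels
```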
Machine Learning Methods
Feature Engineering
Transforming time series into tabular data is the key to applying traditional ML models:
| Feature type | Example | Description |
|---|---|---|
| Sliding window (lag features) | \(y_{t-1}, y_{t-2}, \dots, y_{t-k}\) | Values from the past \(k\) time steps |
| Rolling statistics | Moving average, rolling standard deviation | Captures local trends and volatility |
| Date/time features | Month, day of week, hour, is_holiday | Encodes temporal periodicity |
| Difference features | \(y_t - y_{t-1}\), \(y_t - y_{t-7}\) | Captures changes |
| Fourier features | \(\sin(2\pi t / P)\), \(\cos(2\pi t / P)\) | Encodes seasonality |
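A pandas sketch covering most of these feature types; the lag and window choices are arbitrary, and everything is shifted so that the row for time \(t\) only uses information available before \(t\):

```python
import numpy as np
import pandas as pd

def make_features(df: pd.DataFrame, col: str = "y") -> pd.DataFrame:
    """Turn a univariate series (datetime index) into a tabular feature matrix."""
    out = df.copy()
    for lag in (1, 2, 7, 14):                      # lag features
        out[f"lag_{lag}"] = out[col].shift(lag)
    past = out[col].shift(1)                       # only past values -> no leakage
    out["roll_mean_7"] = past.rolling(7).mean()    # rolling statistics
    out["roll_std_7"] = past.rolling(7).std()
    out["diff_1"] = past.diff(1)                   # difference features
    out["diff_7"] = past.diff(7)
    out["month"] = out.index.month                 # date/time features
    out["dayofweek"] = out.index.dayofweek
    t = np.arange(len(out))                        # Fourier features, weekly period
    out["sin_7"] = np.sin(2 * np.pi * t / 7)
    out["cos_7"] = np.cos(2 * np.pi * t / 7)
    return out.dropna()

idx = pd.date_range("2024-01-01", periods=100, freq="D")
df = pd.DataFrame({"y": np.random.default_rng(0).normal(size=100).cumsum()}, index=idx)
print(make_features(df).head())
```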
XGBoost / LightGBM for Time Series
Gradient boosted trees perform exceptionally well in time series competitions:
- Advantages: No stationarity assumption required, automatically handles nonlinearity, can incorporate external features
- Caveats: Must use time-ordered cross-validation (no random splitting) to avoid data leakage
- Multi-step forecasting: Recursive forecasting (predict step-by-step, feeding predictions as next inputs) or direct multi-output
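A sketch of time-ordered cross-validation with LightGBM; the synthetic features stand in for the output of a feature pipeline like make_features above, and the hyperparameters are illustrative:

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 5)), columns=[f"f{i}" for i in range(5)])
y = pd.Series(2.0 * X["f0"] + rng.normal(size=500))

scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    # Every fold trains strictly on the past and tests on the future
    model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    pred = model.predict(X.iloc[test_idx])
    scores.append(mean_absolute_error(y.iloc[test_idx], pred))
print(np.round(scores, 3))
```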
Deep Learning Methods
LSTM for Time Series
LSTM (Long Short-Term Memory) is naturally suited for sequence modeling:
- Encoder-decoder architecture for multi-step forecasting
- Can handle multivariate time series
- Drawbacks: slow training, sensitive to hyperparameters
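A minimal PyTorch sketch of a windowed LSTM forecaster; the window length, hidden size, and one-step horizon are illustrative choices:

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Map a window of past values to an h-step forecast (sketch)."""
    def __init__(self, n_features: int = 1, hidden: int = 64, horizon: int = 1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # forecast from the last hidden state

model = LSTMForecaster()
x = torch.randn(32, 48, 1)   # 32 windows of 48 past observations each
print(model(x).shape)        # torch.Size([32, 1])
```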
Temporal Fusion Transformer (TFT)
Google's TFT (2021) combines several advanced techniques:
- Variable selection network: Automatically identifies important features
- Temporal attention: Captures both short- and long-term dependencies
- Interpretability: Provides feature importance scores and temporal attention weights
- Achieved SOTA on multiple benchmark datasets
PatchTST
Nie et al. (2023) proposed segmenting time series into patches (similar to how ViT processes images):
- Splits long sequences into fixed-length patches
- Each patch serves as a token input to the Transformer
- Dramatically reduces computational complexity while preserving long-range dependencies
- Channel-independence strategy improves multivariate forecasting
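The patching idea itself is a tensor reshape; a sketch with hypothetical patch length 16 and stride 8 on a length-96 series:

```python
import torch

batch, length = 32, 96
patch_len, stride = 16, 8       # illustrative hyperparameters
x = torch.randn(batch, length)  # one channel; channel-independence means each
                                # variable is patched and encoded separately
patches = x.unfold(1, patch_len, stride)  # unfold(dim, size, step)
print(patches.shape)  # torch.Size([32, 11, 16]): 11 tokens instead of 96 time steps
```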
Evaluation Methods
Common Evaluation Metrics
| Metric | Formula | Characteristics |
|---|---|---|
| MAE | \(\frac{1}{T}\sum_{t=1}^T \|y_t - \hat{y}_t\|\) | Intuitive; less sensitive to outliers than RMSE |
| RMSE | \(\sqrt{\frac{1}{T}\sum_{t=1}^T (y_t - \hat{y}_t)^2}\) | Amplifies large errors |
| MAPE | \(\frac{100\%}{T}\sum_{t=1}^T \left\|\frac{y_t - \hat{y}_t}{y_t}\right\|\) | Percentage error, but unstable when \(y_t \approx 0\) |
| sMAPE | \(\frac{200\%}{T}\sum_{t=1}^T \frac{\|y_t - \hat{y}_t\|}{\|y_t\| + \|\hat{y}_t\|}\) | Symmetric version of MAPE |
| MASE | \(\frac{\text{MAE}}{\text{MAE}_{\text{naive}}}\) | Error scaled by a naive forecast's in-sample MAE; values below 1 beat the naive baseline, suitable for cross-series comparison |
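All five metrics are a few lines of NumPy; in the MASE sketch, the seasonal period m defaults to 1, i.e., scaling by the plain naive forecast:

```python
import numpy as np

def mae(y, yhat):   return np.mean(np.abs(y - yhat))
def rmse(y, yhat):  return np.sqrt(np.mean((y - yhat) ** 2))
def mape(y, yhat):  return 100 * np.mean(np.abs((y - yhat) / y))
def smape(y, yhat): return 200 * np.mean(np.abs(y - yhat) / (np.abs(y) + np.abs(yhat)))

def mase(y, yhat, y_train, m=1):
    """Scale test MAE by the in-sample MAE of the (seasonal-)naive forecast."""
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return mae(y, yhat) / scale

y_true = np.array([100., 110., 120.])
y_pred = np.array([98., 115., 118.])
print(mae(y_true, y_pred), rmse(y_true, y_pred), smape(y_true, y_pred))
```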
Backtesting
Time series evaluation must respect temporal ordering:
- Rolling window validation: A fixed-size window slides forward; the model is retrained and evaluated at each step
- Expanding window validation: The training set progressively grows while the prediction window moves forward
- No future data leakage: Strictly ensure all training data precedes the prediction time point
Expanding window validation illustration:
Fold 1: [=====Train=====][Test]
Fold 2: [======Train======][Test]
Fold 3: [=======Train=======][Test]
Fold 4: [========Train========][Test]
→ Time direction
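scikit-learn's TimeSeriesSplit implements exactly this expanding-window scheme; a sketch with 20 stand-in observations:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

t = np.arange(20)  # stand-in for 20 time-ordered observations
for fold, (tr, te) in enumerate(TimeSeriesSplit(n_splits=4).split(t), start=1):
    print(f"Fold {fold}: train={tr.min()}..{tr.max()}, test={te.min()}..{te.max()}")
# Every test index is strictly later than every training index -> no leakage
```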
Method Selection Guide
| Scenario | Recommended method | Rationale |
|---|---|---|
| Small data, univariate | ARIMA / Exponential smoothing | Few parameters, less prone to overfitting |
| Business forecasting (with seasonality/holidays) | Prophet | Easy to use, interpretable |
| Rich external features | XGBoost / LightGBM | Strong feature integration capability |
| Long sequences, multivariate, large data | Transformer-based (TFT/PatchTST) | Powerful modeling capacity |
| Uncertainty estimation needed | GP / Bayesian methods / Prophet | Built-in uncertainty quantification |