Linear Discriminant Analysis and Feature Selection

In non-invasive EEG BCI (motor imagery, P300, SSVEP), data are scarce, SNR is low, and classes are few. In this regime, Linear Discriminant Analysis (LDA) combined with feature selection has remained a dominant approach for years, and it reflects a very different technical philosophy from "big-data invasive BCI."

1. Why EEG Needs Linear Discrimination

Little training data (a single user in a single session may have only a few hundred trials)
High dimensionality (64 channels × multiple frequency bands × time windows)
Few classes (typically 2–4)
Very high overfitting risk

Deep learning tends to train unstably in this regime, so a linear classifier on carefully engineered features is often the stronger choice. This has long held on benchmarks such as BCI Competition IV-2a.

2. LDA (Linear Discriminant Analysis)

Core Idea

Find a projection direction \(\mathbf{w}\) that:

Maximizes the squared difference between the projected class means
Minimizes the sum of the projected within-class variances

\[\mathbf{w}^* = \arg\max_\mathbf{w} \frac{(\mathbf{w}^T \mu_1 - \mathbf{w}^T \mu_2)^2}{\mathbf{w}^T (\Sigma_1 + \Sigma_2) \mathbf{w}}\]

Closed-form solution (unique up to an arbitrary scale factor): \(\mathbf{w}^* \propto (\Sigma_1 + \Sigma_2)^{-1} (\mu_1 - \mu_2)\)
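As a minimal sketch of the closed-form solution, the NumPy snippet below estimates \(\mu_1, \mu_2, \Sigma_1, \Sigma_2\) from two sets of feature vectors and solves for \(\mathbf{w}\). The function name `fisher_lda_direction`, the synthetic Gaussian data, and the midpoint threshold are illustrative assumptions, not part of any particular BCI toolbox.

```python
import numpy as np

def fisher_lda_direction(X1, X2):
    """Return w proportional to (Sigma_1 + Sigma_2)^{-1} (mu_1 - mu_2).

    X1, X2: arrays of shape (n_trials, n_features), one per class.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Sum of the two class covariance estimates (the denominator matrix).
    Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    # Solve Sw w = (mu1 - mu2) instead of forming the inverse explicitly.
    return np.linalg.solve(Sw, mu1 - mu2)

# Synthetic two-class data (illustrative only, not EEG).
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[1.0, 0.0], scale=1.0, size=(200, 2))
X2 = rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(200, 2))

w = fisher_lda_direction(X1, X2)
# Decision threshold halfway between the projected class means.
threshold = 0.5 * (X1.mean(axis=0) + X2.mean(axis=0)) @ w
accuracy = 0.5 * ((X1 @ w > threshold).mean() + (X2 @ w <= threshold).mean())
```

Because only the first feature separates the classes here, the recovered direction should point mostly along that axis.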

Applications in BCI

P300 classification: target vs. non-target
Motor imagery: left hand vs. right hand
SSVEP target selection: which frequency's response is strongest

Advantages

Closed-form solution, fast training
Robust to small data
Interpretable

Limitations

Linear decision boundary
Assumes Gaussian class-conditional distributions with a shared covariance matrix
Unstable \(\Sigma\) estimation in high dimensions

3. Shrinkage LDA (Regularized LDA)

Blankertz et al. (2011, NeuroImage) recommend shrinkage LDA, which replaces the sample covariance \(\Sigma\) with a regularized estimate:

\[\Sigma_{\text{shrink}} = (1 - \gamma) \Sigma + \gamma \frac{\text{tr}(\Sigma)}{d} I\]

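where \(\gamma \in [0, 1]\) is the shrinkage coefficient and \(d\) is the feature dimension. Shrinkage pulls the estimate toward a scaled identity matrix, which stabilizes covariance estimation in high dimensions; it is standard practice for EEG BCI.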
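The shrinkage formula can be illustrated with a short NumPy sketch. The helper name `shrink_covariance`, the hand-picked value \(\gamma = 0.2\), and the random data are assumptions made for the example; in practice \(\gamma\) is usually chosen analytically (for example with a Ledoit-Wolf style estimator) or by cross-validation.

```python
import numpy as np

def shrink_covariance(sigma, gamma):
    """Blend sigma with a scaled identity that has the same average variance."""
    d = sigma.shape[0]
    target = (np.trace(sigma) / d) * np.eye(d)
    return (1.0 - gamma) * sigma + gamma * target

rng = np.random.default_rng(0)
d, n = 50, 30                       # more features than trials
X = rng.normal(size=(n, d))
sigma = np.cov(X, rowvar=False)     # sample covariance, rank-deficient here
sigma_shrunk = shrink_covariance(sigma, gamma=0.2)

rank_before = np.linalg.matrix_rank(sigma)
min_eig_after = np.linalg.eigvalsh(sigma_shrunk).min()
```

With fewer trials than features the sample covariance cannot be inverted, while the shrunk estimate has strictly positive eigenvalues and the same trace.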

4. CSP Features + LDA

Common Spatial Pattern (CSP) + LDA has been the decade-long gold standard for motor-imagery BCI.

CSP

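Learn a set of spatial filters \(W\) whose rows \(\mathbf{w}\) maximize the ratio of filtered-signal variance between the two classes, where \(\Sigma_1\) and \(\Sigma_2\) are the class-average spatial covariance matrices: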

\[\max_\mathbf{w} \frac{\mathbf{w}^T \Sigma_1 \mathbf{w}}{\mathbf{w}^T \Sigma_2 \mathbf{w}}\]

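The log-variance of each spatially filtered signal is used as a feature and passed to LDA. In practice, filters are kept from both ends of the eigenvalue spectrum, so that some favor class 1 and others favor class 2.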
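A minimal NumPy sketch of CSP follows, solving the variance-ratio problem by whitening the composite covariance and then diagonalizing one class. The function names, the trace normalization of per-trial covariances, and the synthetic trials are assumptions for illustration; toolbox implementations differ in details such as normalization and regularization.

```python
import numpy as np

def csp_filters(trials_a, trials_b, n_pairs=1):
    """Spatial filters from two trial sets, each (n_trials, n_channels, n_samples)."""
    def mean_cov(trials):
        # Average of trace-normalized per-trial spatial covariances.
        covs = [t @ t.T / np.trace(t @ t.T) for t in trials]
        return np.mean(covs, axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Whiten the composite covariance Ca + Cb ...
    evals, evecs = np.linalg.eigh(Ca + Cb)
    P = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # ... then diagonalize class A in the whitened space (ascending eigenvalues).
    lam, B = np.linalg.eigh(P @ Ca @ P.T)
    W = B.T @ P                      # rows are spatial filters
    # Keep filters from both ends: smallest and largest variance ratio.
    idx = np.r_[np.arange(n_pairs), np.arange(len(lam) - n_pairs, len(lam))]
    return W[idx]

def log_variance_features(trials, W):
    """Log of the normalized variance of each spatially filtered trial."""
    feats = []
    for t in trials:
        var = np.var(W @ t, axis=1)
        feats.append(np.log(var / var.sum()))
    return np.array(feats)

rng = np.random.default_rng(0)
# Class A is strong on channel 0, class B on channel 3 (synthetic, not EEG).
scale_a = np.array([3.0, 1.0, 1.0, 1.0])[None, :, None]
scale_b = np.array([1.0, 1.0, 1.0, 3.0])[None, :, None]
trials_a = rng.normal(size=(40, 4, 256)) * scale_a
trials_b = rng.normal(size=(40, 4, 256)) * scale_b

W = csp_filters(trials_a, trials_b, n_pairs=1)
feats_a = log_variance_features(trials_a, W)
feats_b = log_variance_features(trials_b, W)
```

The two feature columns move in opposite directions for the two classes, which is what makes them easy for a linear classifier to separate.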

FBCSP (Filter Bank CSP)

Ang et al. 2008 extended CSP:

Split the signal into multiple frequency bands (4–40 Hz, in 4 Hz-wide bands)
Run CSP independently on each band
Concatenate CSP features across bands
Feature selection (mutual information)
LDA or SVM classification

FBCSP won BCI Competition IV dataset 2a and has long stood as the benchmark for EEG MI-BCI.
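The band-splitting step of the pipeline can be sketched as follows. To stay dependency-free, this example uses a brick-wall FFT mask; that is a simplification and an assumption of this sketch, not the filter design used by Ang et al. The function name `filter_bank` and its parameters are likewise hypothetical.

```python
import numpy as np

def filter_bank(signal, fs, low=4.0, high=40.0, width=4.0):
    """Split signal (..., n_samples) into contiguous bands with an FFT mask."""
    n = signal.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(signal, axis=-1)
    bands = [(float(f), float(f + width)) for f in np.arange(low, high, width)]
    out = []
    for lo, hi in bands:
        # Zero every frequency bin outside [lo, hi) and transform back.
        mask = (freqs >= lo) & (freqs < hi)
        out.append(np.fft.irfft(spectrum * mask, n=n, axis=-1))
    return bands, np.stack(out)      # shape (n_bands, ..., n_samples)

fs = 250.0
t = np.arange(500) / fs              # 2 s of samples
x = np.sin(2 * np.pi * 10.0 * t)     # 10 Hz test tone
bands, banded = filter_bank(x, fs)
power = banded.var(axis=-1)
strongest_band = bands[int(np.argmax(power))]
```

Each band-limited copy would then go through its own CSP stage, and the resulting log-variance features would be concatenated before feature selection.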

5. Riemannian Geometry

Treat each EEG trial's covariance matrix \(\mathbf{C}_i \in \mathbb{R}^{d \times d}\) as a point on the symmetric positive-definite (SPD) manifold and classify using manifold distance.

Key Operations

Log map: map an SPD matrix to the tangent space
Riemannian mean: the center on the manifold
MDM (Minimum Distance to Mean): the classifier is simply "nearest class mean"
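These three operations can be sketched in plain NumPy. The code below uses the affine-invariant Riemannian distance and a fixed-point iteration for the mean; the function names and the fixed iteration count are assumptions of this sketch, and a library such as pyRiemann should be preferred for real use.

```python
import numpy as np

def _eig_fun(C, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * fun(vals)) @ vecs.T

def riemann_distance(A, B):
    """Affine-invariant distance: Frobenius norm of log(A^{-1/2} B A^{-1/2})."""
    A_isqrt = _eig_fun(A, lambda v: v ** -0.5)
    vals = np.linalg.eigvalsh(A_isqrt @ B @ A_isqrt)
    return np.sqrt(np.sum(np.log(vals) ** 2))

def riemann_mean(covs, n_iter=20):
    """Fixed-point iteration for the Riemannian mean of SPD matrices."""
    M = np.mean(covs, axis=0)        # start from the arithmetic mean
    for _ in range(n_iter):
        M_sqrt = _eig_fun(M, np.sqrt)
        M_isqrt = _eig_fun(M, lambda v: v ** -0.5)
        # Log map of every matrix into the tangent space at M, then average.
        T = np.mean([_eig_fun(M_isqrt @ C @ M_isqrt, np.log) for C in covs], axis=0)
        # Exp map of the averaged tangent vector back onto the manifold.
        M = M_sqrt @ _eig_fun(T, np.exp) @ M_sqrt
    return M

def mdm_predict(C, class_means):
    """Return the index of the nearest class mean."""
    return int(np.argmin([riemann_distance(C, M) for M in class_means]))

# Two diagonal SPD matrices: their Riemannian mean is the geometric mean, 2 * I.
covs = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0])]
center = riemann_mean(covs)
# Treat the two matrices as class means and classify a nearby trial covariance.
label = mdm_predict(np.diag([1.1, 3.9]), covs)
```

Note that the arithmetic mean of the two example matrices would be \(2.5\,I\), so the example also shows that the manifold mean differs from the Euclidean one.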

Performance

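Barachant et al. (2012, IEEE TBME) brought Riemannian methods to multi-class BCI classification. Strengths attributed to the approach:

Competitive with or better than CSP-based methods on BCI Competition IV data
No need for channel or band selection
Good cross-subject transfer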

pyRiemann is a widely used open-source Python implementation.

6. Feature Selection

High-dimensional EEG features often require feature selection to reduce overfitting:

Filter Methods

Score each feature independently based on its relation to the label:

Fisher score
Mutual information
ReliefF
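As an example of a filter method, the Fisher score can be computed per feature as the ratio of between-class to within-class variance. The sketch below is one common formulation; the function name and the synthetic data are assumptions for illustration.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature ratio of between-class to within-class variance."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / within

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 5))
X[:, 2] += 2.0 * y                   # only feature 2 carries class information
scores = fisher_score(X, y)
ranking = np.argsort(scores)[::-1]   # best feature first
```

Because each feature is scored on its own, this is cheap but blind to redundancy between features.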

Wrapper Methods

Sequential Forward/Backward Selection
Add/remove one feature at a time and check classification performance
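Sequential forward selection can be sketched as a greedy loop around any classifier. Here the classifier is a nearest-class-mean rule scored on a single holdout split; both choices, and all function names, are simplifying assumptions. In practice the score would come from cross-validation inside the training data.

```python
import numpy as np

def holdout_accuracy(X, y, features, train, test):
    """Accuracy of a nearest-class-mean classifier on the chosen features."""
    Xtr, Xte = X[np.ix_(train, features)], X[np.ix_(test, features)]
    classes = np.unique(y[train])
    means = np.array([Xtr[y[train] == c].mean(axis=0) for c in classes])
    dists = ((Xte[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y[test]).mean())

def forward_selection(X, y, n_select, train, test):
    """Greedily add the feature that most improves holdout accuracy."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_select:
        scores = [holdout_accuracy(X, y, selected + [f], train, test)
                  for f in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 6))
X[:, 1] += 2.0 * y                   # informative feature
X[:, 4] -= 2.0 * y                   # informative feature
perm = rng.permutation(n)
train, test = perm[: n // 2], perm[n // 2:]
selected = forward_selection(X, y, n_select=2, train=train, test=test)
```

The cost grows with the number of candidate features times the number of selection steps, which is why wrappers are usually applied after a cheaper filter stage.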

Embedded Methods

L1 regularization (LASSO): zeros out features during training
Elastic net: L1 + L2
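The way an L1 penalty zeros out features can be seen in a small proximal-gradient (ISTA) solver for L1-penalized least squares. This is a sketch under stated assumptions: a regression target stands in for class labels, and the function name, penalty value, and iteration count are chosen for illustration only.

```python
import numpy as np

def lasso_ista(X, y, alpha, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + alpha * ||w||_1 by proximal gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's slope.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - step * grad
        # Soft-thresholding: the proximal operator of the L1 penalty.
        w = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only features 0 and 5 influence the target (synthetic data).
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + 0.1 * rng.normal(size=200)
w = lasso_ista(X, y, alpha=0.1)
kept = np.flatnonzero(w)             # indices of features with nonzero weight
```

The soft-thresholding step sets small coefficients to exactly zero, so selection happens as a side effect of training rather than as a separate stage.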

Stable Feature Selection

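Stability selection (Meinshausen & Bühlmann 2010) runs feature selection on many bootstrap or subsampled training sets and retains the features that are selected consistently. Nogueira & Brown (2016) studied how to measure the stability of such selections.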
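The resampling idea can be sketched as a wrapper around any base selector. In the example below the base selector (top-k features by standardized class-mean gap), the 0.8 retention threshold, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def selection_frequency(X, y, select_fn, n_rounds=50, seed=0):
    """Fraction of resampled training sets in which each feature is selected."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n, replace=True)   # bootstrap resample
        counts[select_fn(X[idx], y[idx])] += 1
    return counts / n_rounds

def top_k_by_mean_difference(X, y, k=2):
    """Hypothetical base selector: k features with the largest standardized class-mean gap."""
    gap = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)) / X.std(axis=0)
    return np.argsort(gap)[-k:]

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 8))
X[:, [0, 3]] += 1.5 * y[:, None]     # features 0 and 3 are informative
freq = selection_frequency(X, y, top_k_by_mean_difference)
stable = np.flatnonzero(freq >= 0.8) # keep features chosen in at least 80% of rounds
```

Features that are picked only occasionally are likely artifacts of a particular sample, which is the overfitting risk this procedure is meant to reduce.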

7. Specialized Methods for P300 and SSVEP

P300 Classifiers

SWLDA (Stepwise LDA): stepwise discriminant analysis dates back to the original P300 speller (Farwell & Donchin 1988) and remains a standard choice for P300 spellers
xDAWN spatial filtering: boosts P300 SNR

SSVEP Classifiers

CCA (Canonical Correlation Analysis) is the classical SSVEP method:

\[\rho = \max_{\mathbf{a}, \mathbf{b}} \text{corr}(X\mathbf{a}, Y\mathbf{b})\]

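where \(X\) is the multichannel EEG segment and \(Y\) is a set of reference sine/cosine templates at a candidate stimulus frequency (usually including its harmonics). The candidate frequency yielding the largest \(\rho\) is taken as the user's selection.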
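A minimal NumPy sketch of CCA-based SSVEP detection follows. It computes the largest canonical correlation from QR decompositions of the centered data, which assumes both matrices have full column rank. The function names, the two-harmonic template, and the synthetic 12 Hz signal are assumptions of this example.

```python
import numpy as np

def max_canonical_corr(X, Y):
    """Largest canonical correlation between X and Y (samples in rows)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    return float(np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0])

def ssvep_reference(freq, fs, n_samples, n_harmonics=2):
    """Sine/cosine templates at the stimulus frequency and its harmonics."""
    t = np.arange(n_samples) / fs
    cols = []
    for h in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * h * freq * t), np.cos(2 * np.pi * h * freq * t)]
    return np.column_stack(cols)

def classify_ssvep(eeg, fs, candidate_freqs):
    """eeg: (n_samples, n_channels). Returns the candidate with the largest rho."""
    rhos = [max_canonical_corr(eeg, ssvep_reference(f, fs, len(eeg)))
            for f in candidate_freqs]
    return candidate_freqs[int(np.argmax(rhos))], rhos

fs = 250.0
t = np.arange(500) / fs
rng = np.random.default_rng(0)
# Synthetic 4-channel recording: a 12 Hz response mixed into unit-variance noise.
source = np.sin(2 * np.pi * 12.0 * t + 0.3)
eeg = np.outer(source, [1.0, 0.8, 0.5, 0.3]) + rng.normal(size=(500, 4))
picked, rhos = classify_ssvep(eeg, fs, [8.0, 10.0, 12.0, 15.0])
```

Including both sine and cosine templates lets the method match a response at any phase, which is why no phase calibration is needed.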

Extensions:

FBCCA (Filter-Bank CCA): Chen et al. 2015, multi-band ensemble
TRCA (Task-Related Component Analysis): uses multi-trial templates

8. Placement in Modern BCI

LDA and feature selection are the mainstay of consumer-grade/non-invasive BCI:

Muse meditation device: α/β/γ band power + LDA classification
Emotiv Cortex: state classification based on band power
OpenBCI: CSP + LDA as a teaching standard

They are also the baseline for deep-learning methods: a new EEG deep network should at least outperform shrinkage LDA to count as a meaningful contribution.

9. Comparison with Deep Learning

Scenario	Linear Discrimination	Deep Learning
Small data (<500 trials)	✓ Better	Overfits
Big data (>10,000 trials)	Bottleneck	✓ Better
Cross-subject transfer	Riemannian is good	✓ Better
Interpretability	✓ High	Low
Online	✓ Fast	Needs optimization

A modern hybrid strategy: Riemannian features + a shallow CNN, or EEGNet + a shrinkage-LDA head — combining the strengths of both.

10. Logical Chain

EEG BCI is a small-data problem, which is why classical statistical methods still dominate.
LDA + shrinkage is the foundational method for EEG classification.
CSP / FBCSP has been the gold standard for motor-imagery BCI for years.
Riemannian geometry treats covariance matrices as points on a manifold, which improves cross-subject transfer.
Deep learning still needs big data to surpass classical methods; in small-data settings, feature engineering plus linear classifiers remain the best option.

References

Blankertz et al. (2011). Single-trial analysis and classification of ERP components—a tutorial. NeuroImage. — shrinkage LDA
Ang et al. (2008). Filter Bank Common Spatial Pattern (FBCSP) in brain-computer interface. IJCNN.
Barachant et al. (2012). Multi-class brain-computer interface classification by Riemannian geometry. IEEE TBME. https://ieeexplore.ieee.org/document/6046114
Chen et al. (2015). Filter bank canonical correlation analysis for implementing a high-speed SSVEP-based brain-computer interface. J Neural Eng.
Lotte et al. (2018). A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update. J Neural Eng.