Linear Discriminant Analysis and Feature Selection
In non-invasive EEG BCI (motor imagery, P300, SSVEP), where data are scarce, SNR is low, and classes are few, Linear Discriminant Analysis (LDA) combined with feature selection has remained a dominant approach for years. It reflects a very different technical philosophy from "big-data invasive BCI."
1. Why EEG Needs Linear Discrimination
- Little training data (a single user in a single session may have only a few hundred trials)
- High dimensionality (64 channels × multiple frequency bands × time windows)
- Few classes (typically 2–4)
- Very high overfitting risk
Deep learning tends to train unstably in this regime, so a linear classifier plus carefully engineered features is often the most reliable choice. This has held for years on benchmarks such as BCI Competition IV-2a.
2. LDA (Linear Discriminant Analysis)
Core Idea
Find a projection direction \(\mathbf{w}\) that:
- Maximizes the between-class mean difference
- Minimizes the sum of within-class variances
Closed-form solution: \(\mathbf{w}^* = (\Sigma_1 + \Sigma_2)^{-1} (\mu_1 - \mu_2)\)
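The closed-form solution can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic 2-D data, not a reference implementation; the function names `fit_lda` and `predict_lda`, the midpoint threshold (which assumes equal class priors), and the toy data are all assumptions made for the example.

```python
import numpy as np

def fit_lda(X1, X2):
    """Two-class LDA. X1, X2: (n_trials, n_features) arrays, one per class."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Sum of the two class covariance matrices, as in the closed-form solution
    S = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    # Solve S w = (mu1 - mu2) instead of forming the inverse explicitly
    w = np.linalg.solve(S, mu1 - mu2)
    # Threshold halfway between the projected class means (equal priors assumed)
    b = -w @ (mu1 + mu2) / 2.0
    return w, b

def predict_lda(X, w, b):
    # Positive score -> class 1, otherwise class 2
    return np.where(X @ w + b > 0, 1, 2)

# Synthetic data: two Gaussian classes separated along the first feature
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[1.0, 0.0], scale=1.0, size=(200, 2))
X2 = rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(200, 2))

w, b = fit_lda(X1, X2)
correct = np.concatenate([predict_lda(X1, w, b) == 1, predict_lda(X2, w, b) == 2])
acc = correct.mean()
print("projection:", w, "training accuracy:", acc)
```

Solving the linear system rather than inverting the covariance is a small design choice that tends to be more stable numerically when features are correlated.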
Applications in BCI
- P300 classification: target vs. non-target
- Motor imagery: left hand vs. right hand
- SSVEP target selection: which frequency's response is strongest
Advantages
- Closed-form solution, fast training
- Robust to small data
- Interpretable
Limitations
- Linear decision boundary
- Assumes Gaussian class distributions with a shared covariance matrix
- Unstable \(\Sigma\) estimation in high dimensions
3. Shrinkage LDA (Regularized LDA)
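Blankertz et al. (2011, NeuroImage) popularized shrinkage LDA for EEG, which replaces the empirical covariance \(\hat{\Sigma}\) with a regularized estimate:
\[\tilde{\Sigma}(\gamma) = (1 - \gamma)\,\hat{\Sigma} + \gamma\,\nu\,I, \qquad \nu = \operatorname{tr}(\hat{\Sigma}) / d\]
where \(\gamma \in [0, 1]\) is the shrinkage coefficient and \(d\) is the feature dimension. Shrinkage stabilizes covariance estimation in high dimensions and is standard for EEG BCI.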
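A common form of the shrinkage estimate blends the empirical covariance with a scaled identity, (1 - γ)·Σ + γ·ν·I with ν = tr(Σ)/d. The sketch below assumes that form and a hand-picked γ; the helper name `shrinkage_cov` and the toy dimensions are illustrative only.

```python
import numpy as np

def shrinkage_cov(X, gamma):
    """Shrink the empirical covariance of X (n_trials, n_features) toward nu * I."""
    S = np.cov(X, rowvar=False)
    d = S.shape[0]
    nu = np.trace(S) / d          # average eigenvalue of the empirical covariance
    return (1.0 - gamma) * S + gamma * nu * np.eye(d)

# Fewer trials than features: the empirical covariance is rank-deficient
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))     # 20 trials, 50 features

S_emp = shrinkage_cov(X, 0.0)     # gamma = 0 reproduces the empirical estimate
S_shr = shrinkage_cov(X, 0.2)

rank_emp = np.linalg.matrix_rank(S_emp)
min_eig_shr = np.linalg.eigvalsh(S_shr).min()
print("empirical rank:", rank_emp, "smallest eigenvalue after shrinkage:", min_eig_shr)
```

In practice γ is usually chosen analytically rather than by hand; for instance, scikit-learn's `LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")` uses the Ledoit-Wolf estimate.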
4. CSP Features + LDA
Common Spatial Pattern (CSP) + LDA has been the decade-long gold standard for motor-imagery BCI.
CSP
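Learn a set of spatial filters \(W\) that maximize the ratio of the two classes' variances after filtering. For each filter \(\mathbf{w}\) (a column of \(W\)):
\[\mathbf{w}^* = \arg\max_{\mathbf{w}} \frac{\mathbf{w}^\top \bar{\mathbf{C}}_1 \mathbf{w}}{\mathbf{w}^\top \bar{\mathbf{C}}_2 \mathbf{w}}\]
where \(\bar{\mathbf{C}}_1\) and \(\bar{\mathbf{C}}_2\) are the class-average spatial covariance matrices; minimizing the same ratio yields the filters for the other class.
The log-variance of each filtered signal is used as a feature and passed to LDA.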
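One way to compute CSP filters is to whiten the sum of the two class covariances and then diagonalize one class in the whitened space, which solves the underlying generalized eigenproblem. The sketch below follows that route on synthetic trials; the function names, the trace normalization, and the assumption that trials are already band-passed (zero mean) are choices made for this example.

```python
import numpy as np

def csp_filters(trials_a, trials_b, n_pairs=1):
    """trials_*: (n_trials, n_channels, n_samples). Returns filters as rows."""
    def mean_cov(trials):
        # Trace-normalized spatial covariance, averaged over trials
        return np.mean([t @ t.T / np.trace(t @ t.T) for t in trials], axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Whitening matrix P = (Ca + Cb)^(-1/2)
    evals, evecs = np.linalg.eigh(Ca + Cb)
    P = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # Eigenvectors of the whitened class-A covariance (eigenvalues ascending)
    lam, U = np.linalg.eigh(P @ Ca @ P)
    # Smallest eigenvalues favor class B, largest favor class A
    idx = np.concatenate([np.arange(n_pairs), np.arange(len(lam) - n_pairs, len(lam))])
    return (P @ U)[:, idx].T

def log_var_features(trials, W):
    Z = np.einsum("fc,ncs->nfs", W, trials)       # apply every filter to every trial
    var = Z.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))

# Synthetic trials: class A has extra variance on channel 0, class B on channel 1
rng = np.random.default_rng(0)

def make_trials(strong_channel, n_trials=30, n_channels=8, n_samples=250):
    X = rng.normal(size=(n_trials, n_channels, n_samples))
    X[:, strong_channel, :] *= 3.0
    return X

A, B = make_trials(0), make_trials(1)
W = csp_filters(A, B, n_pairs=1)
FA, FB = log_var_features(A, W), log_var_features(B, W)
print("mean features, class A:", FA.mean(axis=0), "class B:", FB.mean(axis=0))
```

The resulting feature matrices can be fed directly into an LDA classifier.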
FBCSP (Filter Bank CSP)
Ang et al. 2008 extended CSP:
- Split the signal into multiple frequency bands (4–40 Hz, 4 Hz steps)
- Run CSP independently on each band
- Concatenate CSP features across bands
- Feature selection (mutual information)
- LDA or SVM classification
FBCSP won dataset 2a of BCI Competition IV and has long served as the benchmark for EEG MI-BCI.
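The first three FBCSP steps (band splitting, per-band CSP, concatenation) can be sketched as follows. This is a simplified illustration: the FFT brick-wall filter `fft_bandpass` stands in for the proper band-pass filters a real pipeline would use, the helper names and synthetic data are assumptions, and the mutual-information selection and final classifier are omitted.

```python
import numpy as np

BANDS = [(lo, lo + 4) for lo in range(4, 40, 4)]   # 4-8, 8-12, ..., 36-40 Hz

def fft_bandpass(trials, fs, lo, hi):
    """Crude brick-wall band-pass by zeroing FFT bins (stand-in for an IIR/FIR filter)."""
    n = trials.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec = np.fft.rfft(trials, axis=-1)
    spec[..., (freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=n, axis=-1)

def csp(trials_a, trials_b):
    """One CSP filter pair (rows): first favors class B, second favors class A."""
    def mean_cov(trials):
        return np.mean([t @ t.T / np.trace(t @ t.T) for t in trials], axis=0)
    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    ev, E = np.linalg.eigh(Ca + Cb)
    P = E @ np.diag(ev ** -0.5) @ E.T              # whitening matrix
    _, U = np.linalg.eigh(P @ Ca @ P)              # eigenvalues ascending
    return (P @ U)[:, [0, -1]].T

def fbcsp_features(train_a, train_b, trials, fs):
    feats = []
    for lo, hi in BANDS:
        # CSP is fit independently in each band on the training trials
        W = csp(fft_bandpass(train_a, fs, lo, hi), fft_bandpass(train_b, fs, lo, hi))
        Z = np.einsum("fc,ncs->nfs", W, fft_bandpass(trials, fs, lo, hi))
        feats.append(np.log(Z.var(axis=2)))
    return np.concatenate(feats, axis=1)           # (n_trials, 2 * n_bands)

# Synthetic trials: a 10 Hz rhythm on channel 0 (class A) or channel 1 (class B)
rng = np.random.default_rng(0)
fs, n_samples, n_channels = 128, 256, 4
t = np.arange(n_samples) / fs

def make_trials(strong_channel, n_trials=20):
    X = rng.normal(size=(n_trials, n_channels, n_samples))
    X[:, strong_channel, :] += 3.0 * np.sin(2 * np.pi * 10 * t)
    return X

A, B = make_trials(0), make_trials(1)
FA = fbcsp_features(A, B, A, fs)
FB = fbcsp_features(A, B, B, fs)
print("feature matrix shape:", FA.shape)
```

In this toy setup only the 8-12 Hz band carries class information, which is exactly the situation the later feature-selection step is meant to detect.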
5. Riemannian Geometry
Treat each EEG trial's covariance matrix \(\mathbf{C}_i \in \mathbb{R}^{d \times d}\) as a point on the symmetric positive-definite (SPD) manifold and classify using manifold distance.
Key Operations
- Log map: map an SPD matrix to the tangent space
- Riemannian mean: the center on the manifold
- MDM (Minimum Distance to Mean): the classifier is simply "nearest class mean"
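The three operations above can be sketched with plain NumPy, assuming the affine-invariant Riemannian metric and a fixed number of fixed-point iterations for the mean. The function names and synthetic covariances are hypothetical, and this is an illustration rather than a substitute for a tested library.

```python
import numpy as np

def spd_fn(C, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * fn(vals)) @ vecs.T

def log_map(C, ref):
    """Tangent-space representation of C at ref (whitened by ref)."""
    ref_isqrt = spd_fn(ref, lambda v: v ** -0.5)
    return spd_fn(ref_isqrt @ C @ ref_isqrt, np.log)

def riemann_distance(A, B):
    """Affine-invariant distance: sqrt(sum of squared log-eigenvalues of A^-1/2 B A^-1/2)."""
    A_isqrt = spd_fn(A, lambda v: v ** -0.5)
    lam = np.linalg.eigvalsh(A_isqrt @ B @ A_isqrt)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def riemann_mean(covs, n_iter=20):
    """Fixed-point iteration for the Riemannian mean, started at the arithmetic mean."""
    M = np.mean(covs, axis=0)
    for _ in range(n_iter):
        T = np.mean([log_map(C, M) for C in covs], axis=0)   # mean tangent vector
        M_sqrt = spd_fn(M, np.sqrt)
        M = M_sqrt @ spd_fn(T, np.exp) @ M_sqrt              # map back to the manifold
    return M

def mdm_fit(covs, labels):
    return {c: riemann_mean(covs[labels == c]) for c in np.unique(labels)}

def mdm_predict(covs, means):
    classes = list(means)
    d = np.array([[riemann_distance(means[c], C) for c in classes] for C in covs])
    return np.array(classes)[d.argmin(axis=1)]

# Synthetic trial covariances: class 0 is stronger on channel 0, class 1 on channel 1
rng = np.random.default_rng(0)

def make_covs(strong_channel, n_trials=30, n_channels=4, n_samples=200):
    X = rng.normal(size=(n_trials, n_channels, n_samples))
    X[:, strong_channel, :] *= 2.0
    return X @ X.transpose(0, 2, 1) / n_samples

covs = np.concatenate([make_covs(0), make_covs(1)])
labels = np.repeat([0, 1], 30)
means = mdm_fit(covs, labels)
acc = np.mean(mdm_predict(covs, means) == labels)
print("training accuracy:", acc)
```

Note that the Frobenius norm of `log_map(C, ref)` equals `riemann_distance(ref, C)`, which is why tangent-space features pair naturally with Euclidean classifiers such as LDA.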
Performance
Barachant et al. 2012 IEEE TBME applied Riemannian methods to multi-class BCI:
- Matched or outperformed a CSP + LDA reference on BCI Competition IV dataset 2a
- No need for channel or band selection
- Good cross-subject transfer
pyRiemann is the most popular open-source implementation today.
6. Feature Selection
High-dimensional EEG features often require feature selection to reduce overfitting:
Filter Methods
Score each feature independently based on its relation to the label:
- Fisher score
- Mutual information
- ReliefF
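As an example of a filter method, the Fisher score compares between-class separation to within-class spread, one feature at a time. The sketch below uses one common definition of the score; the function name and synthetic data (two informative features out of ten) are assumptions for illustration.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature ratio of between-class to within-class variance."""
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / within

# Synthetic data: only features 3 and 7 depend on the label
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 10))
X[y == 1, 3] += 1.5
X[y == 1, 7] -= 1.5

scores = fisher_score(X, y)
top2 = sorted(np.argsort(scores)[-2:].tolist())
print("top-2 features by Fisher score:", top2)
```

Because each feature is scored in isolation, this is cheap but blind to redundancy between features.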
Wrapper Methods
- Sequential Forward/Backward Selection
- Add/remove one feature at a time and check classification performance
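Sequential forward selection can be sketched as a greedy loop that adds whichever feature most improves validation accuracy. The example assumes a two-class LDA scorer with a small ridge term and a single train/validation split; in practice cross-validation is the more common choice. All names and data here are hypothetical.

```python
import numpy as np

def lda_accuracy(Xtr, ytr, Xval, yval):
    """Validation accuracy of a two-class LDA (labels 0/1)."""
    X0, X1 = Xtr[ytr == 0], Xtr[ytr == 1]
    S = np.atleast_2d(np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False))
    S = S + 1e-6 * np.eye(S.shape[0])              # small ridge for numerical stability
    w = np.linalg.solve(S, X1.mean(axis=0) - X0.mean(axis=0))
    b = -w @ (X0.mean(axis=0) + X1.mean(axis=0)) / 2.0
    return np.mean((Xval @ w + b > 0).astype(int) == yval)

def forward_select(Xtr, ytr, Xval, yval, n_features):
    selected, remaining = [], list(range(Xtr.shape[1]))
    while len(selected) < n_features:
        # Try adding each remaining feature and keep the one that scores best
        scores = [lda_accuracy(Xtr[:, selected + [j]], ytr,
                               Xval[:, selected + [j]], yval) for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic data: only features 2 and 5 depend on the label
rng = np.random.default_rng(0)

def make_data(n_per_class):
    y = np.repeat([0, 1], n_per_class)
    X = rng.normal(size=(2 * n_per_class, 8))
    X[y == 1, 2] += 2.0
    X[y == 1, 5] += 2.0
    return X, y

Xtr, ytr = make_data(200)
Xval, yval = make_data(200)
selected = forward_select(Xtr, ytr, Xval, yval, n_features=2)
print("selected features:", selected)
```

The cost grows with the number of candidate features times the number of selection steps, which is why wrapper methods are usually applied after a cheaper filter stage.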
Embedded Methods
- L1 regularization (LASSO): zeros out features during training
- Elastic net: L1 + L2
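To show how L1 regularization zeros out features during training, here is a minimal LASSO solver using iterative soft-thresholding (ISTA). It regresses labels coded as ±1 onto the features, which is an assumption made to keep the example short; the function name, penalty value, and synthetic data are illustrative.

```python
import numpy as np

def lasso_ista(X, y, alpha, n_iter=500):
    """Minimize (1 / 2n) * ||X w - y||^2 + alpha * ||w||_1 by soft-thresholding."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n              # Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - grad / L                           # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - alpha / L, 0.0)   # soft threshold
    return w

# Synthetic data: labels coded as -1/+1, only features 3 and 7 are informative
rng = np.random.default_rng(0)
y = np.repeat([-1.0, 1.0], 200)
X = rng.normal(size=(400, 10))
X[:, 3] += y
X[:, 7] -= y

w = lasso_ista(X, y, alpha=0.2)
selected = np.flatnonzero(w).tolist()
print("non-zero coefficients at features:", selected)
```

Larger values of the penalty remove more features; an elastic net adds an L2 term to the same objective, which tends to keep groups of correlated features together.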
Stable Feature Selection
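Stability selection (Meinshausen & Bühlmann 2010) runs feature selection on many bootstrap or subsampled training sets and retains the features that are selected consistently. Nogueira & Brown 2016 proposed a measure for quantifying how stable a selection procedure is.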
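The bootstrap idea can be sketched by repeating a simple selector on resampled data and counting how often each feature is chosen. The example assumes a top-k Fisher-score selector and a 0.8 frequency threshold, both arbitrary choices for illustration; the function names are hypothetical.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature ratio of between-class to within-class variance."""
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / within

def selection_frequency(X, y, k, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each feature ranks in the top k."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))   # sample trials with replacement
        top = np.argsort(fisher_score(X[idx], y[idx]))[-k:]
        counts[top] += 1
    return counts / n_boot

# Synthetic data: only features 3 and 7 depend on the label
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 10))
X[y == 1, 3] += 1.5
X[y == 1, 7] -= 1.5

freq = selection_frequency(X, y, k=2)
stable = np.flatnonzero(freq >= 0.8).tolist()
print("selection frequencies:", freq, "stable features:", stable)
```

Features that are picked only occasionally across resamples are likely to reflect noise in a particular training set rather than a reproducible effect.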
7. Specialized Methods for P300 and SSVEP
P300 Classifiers
- SWLDA (Stepwise LDA): used since the original P300 speller of Farwell & Donchin 1988, standard for P300 spellers
- xDAWN spatial filtering: boosts P300 SNR
SSVEP Classifiers
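CCA (Canonical Correlation Analysis) is the classical SSVEP method. For each candidate stimulus frequency it computes the largest canonical correlation:
\[\rho = \max_{\mathbf{w}_x, \mathbf{w}_y} \frac{\mathbf{w}_x^\top X Y^\top \mathbf{w}_y}{\sqrt{\mathbf{w}_x^\top X X^\top \mathbf{w}_x \;\, \mathbf{w}_y^\top Y Y^\top \mathbf{w}_y}}\]
where \(X\) is the EEG and \(Y\) is a reference sine/cosine template at the candidate frequency. The frequency yielding the largest \(\rho\) is taken as the user's selection.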
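A compact way to obtain the largest canonical correlation is through QR decompositions of the centered data followed by an SVD. The sketch below uses that route on a synthetic 12 Hz response; the function names, the two-harmonic reference, and the candidate frequencies are assumptions for the example. Note that arrays here are stored as samples by channels.

```python
import numpy as np

def max_canonical_corr(X, Y):
    """X: (n_samples, n_channels), Y: (n_samples, n_refs). Largest canonical correlation."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    # Singular values of Qx^T Qy are the canonical correlations
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

def reference(freq, fs, n_samples, n_harmonics=2):
    """Sine/cosine templates at the stimulus frequency and its harmonics."""
    t = np.arange(n_samples) / fs
    return np.column_stack([fn(2 * np.pi * h * freq * t)
                            for h in range(1, n_harmonics + 1)
                            for fn in (np.sin, np.cos)])

def detect(X, candidates, fs):
    rhos = [max_canonical_corr(X, reference(f, fs, len(X))) for f in candidates]
    return candidates[int(np.argmax(rhos))], rhos

# Synthetic EEG: a 12 Hz response with random phase per channel, plus noise
rng = np.random.default_rng(0)
fs, n_samples, n_channels = 250, 500, 8
t = np.arange(n_samples) / fs
phases = rng.uniform(0, 2 * np.pi, size=n_channels)
X = np.sin(2 * np.pi * 12.0 * t[:, None] + phases) + rng.normal(size=(n_samples, n_channels))

candidates = [8.0, 10.0, 12.0, 15.0]
best, rhos = detect(X, candidates, fs)
print("detected frequency:", best, "correlations:", rhos)
```

No training data are needed for this detector, which is one reason CCA remains a common SSVEP baseline.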
Extensions:
- FBCCA (Filter-Bank CCA): Chen 2015, multi-band ensemble
- TRCA (Task-Related Component Analysis): uses multi-trial templates
8. Placement in Modern BCI
LDA and feature selection are the mainstay of consumer-grade/non-invasive BCI:
- Muse meditation device: α/β/γ band power + LDA classification
- Emotiv Cortex: state classification based on band power
- OpenBCI: CSP + LDA as a teaching standard
They are also the baseline for deep-learning methods — any new EEG deep network must beat shrinkage-LDA to be meaningful.
9. Comparison with Deep Learning
| Scenario | Linear Discrimination | Deep Learning |
|---|---|---|
| Small data (<500 trials) | ✓ Better | Overfits |
| Big data (>10,000 trials) | Bottleneck | ✓ Better |
| Cross-subject transfer | Riemannian is good | ✓ Better |
| Interpretability | ✓ High | Low |
| Online | ✓ Fast | Needs optimization |
A modern hybrid strategy: Riemannian features + a shallow CNN, or EEGNet + a shrinkage-LDA head — combining the strengths of both.
10. Logical Chain
- EEG BCI is a small-data problem, which is why classical statistical methods still dominate.
- LDA + shrinkage is the foundational method for EEG classification.
- CSP / FBCSP has been the gold standard for motor-imagery BCI for years.
- Riemannian geometry treats covariance matrices as manifold points, enabling stronger cross-subject capability.
- Deep learning still needs large datasets to surpass classical methods; in small-data settings, feature engineering plus linear classifiers remains the most dependable option.
References
- Blankertz et al. (2011). Single-trial analysis and classification of ERP components—a tutorial. NeuroImage. — shrinkage LDA
- Ang et al. (2008). Filter Bank Common Spatial Pattern (FBCSP) in brain-computer interface. IJCNN.
- Barachant et al. (2012). Multi-class brain-computer interface classification by Riemannian geometry. IEEE TBME. https://ieeexplore.ieee.org/document/6046114
- Chen et al. (2015). Filter bank canonical correlation analysis for implementing a high-speed SSVEP-based brain-computer interface. J Neural Eng.
- Lotte et al. (2018). A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update. J Neural Eng.