Meta Learning
Meta-learning, also known as "Learning to Learn," aims to enable models to rapidly learn new tasks from only a few examples. While traditional deep learning requires large amounts of data to train a model for a specific task, meta-learning seeks to acquire a cross-task learning ability.
Tribe-level perspective: the metric-based methods in this article (Siamese / Prototypical / Matching Networks) are the modern incarnation of the Analogizers tribe in Domingos's taxonomy. For full Triplet/InfoNCE derivations and a reading of SimCLR/MoCo/CLIP as metric learning, see The Master Algorithm — Metric Learning & Contrastive Learning.
Core Idea
The key distinction of meta-learning is that it operates at the task level rather than the sample level:
- Traditional learning: Given a single task (e.g., cat vs. dog classification), learn from a large number of samples
- Meta-learning: Given a series of tasks, learn the ability to "quickly learn new tasks"
Few-shot Learning Setup
Few-shot learning is the most common application scenario for meta-learning. The standard setup is known as N-way K-shot:
- N-way: Each task involves N classes
- K-shot: Each class has only K labeled samples (the Support Set)
- Query Set: Used to evaluate the model's performance on the task
For example, 5-way 1-shot means: the model is given 1 image from each of 5 classes, and then must classify new images into these 5 categories.
Training Paradigm: Episodic Training
Meta-learning employs episodic training:
- Randomly sample N classes from the training set
- Sample K examples from each class to form the Support Set
- Sample additional examples from each class to form the Query Set
- The model uses the Support Set to make predictions on the Query Set
- Compute the loss and update the model parameters
Each such sampling procedure is called an episode (or task). By training over a large number of episodes, the model learns a cross-task learning ability.
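The sampling procedure above can be sketched as follows. This is a minimal illustration with synthetic data; the function and variable names (`sample_episode`, `data`, etc.) are illustrative, not from a specific library.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_episode(data, n_way=5, k_shot=1, n_query=3):
    """Sample one N-way K-shot episode from a {class_id: samples} dict."""
    classes = rng.choice(list(data.keys()), size=n_way, replace=False)
    support, query = [], []
    for label, cls in enumerate(classes):
        idx = rng.permutation(len(data[cls]))[: k_shot + n_query]
        samples = data[cls][idx]
        # First K samples form the Support Set, the rest the Query Set;
        # labels are re-indexed 0..N-1 within the episode.
        support += [(x, label) for x in samples[:k_shot]]
        query += [(x, label) for x in samples[k_shot:]]
    return support, query

# Toy dataset: 10 classes, 20 samples per class, 8-dim features.
data = {c: rng.normal(size=(20, 8)) for c in range(10)}
support, query = sample_episode(data, n_way=5, k_shot=1, n_query=3)
print(len(support), len(query))  # → 5 15
```

At meta-training time, a loss would be computed on the Query Set of each episode and used to update the model; at meta-test time, episodes are drawn from classes never seen during training.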
Main Approaches
Metric-based Methods
Core idea: Learn a good embedding space where same-class samples are close together and different-class samples are far apart.
Siamese Networks:
Two weight-sharing networks process two inputs separately, and their embeddings are compared to determine whether they belong to the same class.
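The weight sharing can be made concrete with a small sketch: both inputs pass through the same parameters, and the embedding distance serves as the similarity score. The linear-plus-tanh "network" here is a stand-in assumption; real Siamese networks use a shared CNN or other deep encoder trained with a contrastive or triplet loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared weight matrix plays the role of the twin networks:
# both inputs are embedded with the SAME parameters.
W = rng.normal(size=(4, 8))

def embed(x):
    return np.tanh(W @ x)  # shared embedding function

def pair_distance(x1, x2):
    return np.linalg.norm(embed(x1) - embed(x2))

x_a = rng.normal(size=8)
x_b = x_a + 0.01 * rng.normal(size=8)  # near-duplicate ("same class")
x_c = rng.normal(size=8)               # unrelated sample

# Same-class pairs should yield smaller distances than cross-class pairs.
print(pair_distance(x_a, x_b) < pair_distance(x_a, x_c))
```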
Prototypical Networks:
For each class, the mean embedding of its Support Set samples is computed as the class "prototype":

\[ c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\phi(x_i) \]

New samples are then classified by a softmax over negative distances to each class prototype:

\[ p_\phi(y = k \mid x) = \frac{\exp(-d(f_\phi(x), c_k))}{\sum_{k'} \exp(-d(f_\phi(x), c_{k'}))} \]

where \(f_\phi\) is the feature extractor, \(S_k\) is the set of support examples of class \(k\), and \(d\) is a distance function (typically squared Euclidean distance).
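The prototype computation and distance-based classification can be sketched in a few lines, assuming the feature extractor \(f_\phi\) has already been applied to all inputs (the function name `prototypical_predict` is illustrative):

```python
import numpy as np

def prototypical_predict(support_x, support_y, query_x, n_way):
    """Classify queries by Euclidean distance to class prototypes.

    support_x: (N*K, D) embedded support samples (f_phi already applied)
    support_y: (N*K,)  labels in 0..n_way-1
    query_x:   (Q, D)  embedded query samples
    """
    # Prototype = mean embedding of each class's support samples.
    protos = np.stack([support_x[support_y == k].mean(axis=0)
                       for k in range(n_way)])
    # Squared Euclidean distance from each query to each prototype.
    d = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    # Softmax over negative distances gives class probabilities.
    logits = -d
    p = np.exp(logits - logits.max(1, keepdims=True))
    return p / p.sum(1, keepdims=True)

# 2-way 2-shot toy example in a 2-D embedding space.
sx = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
sy = np.array([0, 0, 1, 1])
qx = np.array([[0., 0.5], [5., 5.5]])
probs = prototypical_predict(sx, sy, qx, n_way=2)
print(probs.argmax(1))  # → [0 1]
```

Note that no gradient steps are taken at test time: adapting to a new task only requires averaging the support embeddings.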
Matching Networks:
An attention mechanism is used to weight the samples in the Support Set for predicting the class of a query sample.
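A simplified sketch of this attention step, using cosine similarity as the attention kernel (an assumption for illustration; the original paper additionally uses full-context embeddings over the Support Set):

```python
import numpy as np

def matching_predict(support_x, support_y, query, n_way):
    """Matching Networks-style prediction: softmax attention over the
    Support Set, with cosine similarity as the attention kernel."""
    # Cosine similarity between the query and every support sample.
    s_norm = support_x / np.linalg.norm(support_x, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    sims = s_norm @ q_norm
    # Attention weights over support samples.
    a = np.exp(sims) / np.exp(sims).sum()
    # Predicted distribution = attention-weighted sum of one-hot labels.
    onehot = np.eye(n_way)[support_y]
    return a @ onehot

sx = np.array([[1., 0.], [0.9, 0.1], [0., 1.]])
sy = np.array([0, 0, 1])
p = matching_predict(sx, sy, np.array([1., 0.1]), n_way=2)
print(p.argmax())  # → 0
```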
Optimization-based Methods
Core idea: Learn a good parameter initialization so that the model can adapt to a new task with only a few gradient steps.
MAML (Model-Agnostic Meta-Learning):
MAML is the most classic meta-learning algorithm. Its core idea is to find a set of model parameters \(\theta\) such that only a few gradient descent steps are needed to achieve strong performance on any new task.
Algorithm Procedure:
- Sample a batch of tasks \(\{\mathcal{T}_i\}\) from the task distribution
- For each task \(\mathcal{T}_i\):
  - Perform one or more gradient descent steps on the Support Set to obtain task-specific parameters; for a single step: \(\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(\theta)\)
- Update the meta-parameters using the loss on the Query Set: \(\theta \leftarrow \theta - \beta \nabla_\theta \sum_i \mathcal{L}_{\mathcal{T}_i}(\theta_i')\)
Key point: The outer-loop gradient must pass "through" the inner-loop gradient steps (second-order derivatives), which makes MAML computationally expensive.
Intuition behind MAML: MAML does not seek parameters that "perform well on all tasks" (that would be multi-task learning). Instead, it seeks parameters that are "close to the optimal solution for every task" — in other words, a good starting point.
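The two-loop structure, including the second-order term, can be seen in a deliberately tiny sketch. Each task's loss is the scalar quadratic \(\mathcal{L}_i(\theta) = (\theta - t_i)^2\) with optimum \(t_i\); the tasks and hyperparameters are invented for illustration and gradients are written out analytically rather than via autodiff.

```python
import numpy as np

alpha, beta = 0.1, 0.05  # inner / outer learning rates
theta = 0.0              # meta-parameters (a single scalar here)
rng = np.random.default_rng(0)

for step in range(500):
    tasks = rng.uniform(-2.0, 2.0, size=8)  # batch of task optima t_i
    meta_grad = 0.0
    for t in tasks:
        # Inner loop: one gradient step on the task's support loss,
        # using grad of (theta - t)^2 = 2 * (theta - t).
        theta_i = theta - alpha * 2 * (theta - t)
        # Outer gradient flows THROUGH the inner step:
        #   d/dtheta (theta_i - t)^2
        #     = 2 * (theta_i - t) * d(theta_i)/dtheta
        #     = 2 * (theta_i - t) * (1 - 2 * alpha)   <- second-order term
        meta_grad += 2 * (theta_i - t) * (1 - 2 * alpha)
    theta -= beta * meta_grad

# With task optima symmetric around 0, the learned initialization
# settles near 0: from there, one inner step reaches any task quickly.
print(f"learned init: {theta:.2f}")
```

First-order MAML (FOMAML) would drop the `(1 - 2 * alpha)` factor, i.e., treat \(\theta_i'\) as a constant with respect to \(\theta\); this is the standard approximation used to avoid the second-order cost mentioned above.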
Memory-based Methods
Core idea: Use an external memory module to store and retrieve experiences.
Memory-Augmented Neural Networks (MANN): By incorporating external memory mechanisms such as the Neural Turing Machine (NTM), new task samples can be rapidly written to memory, and relevant information can be retrieved from memory at prediction time.
Meta-Learning vs. Transfer Learning
| Dimension | Meta-Learning | Transfer Learning |
|---|---|---|
| Objective | Learn "how to learn" | Transfer existing knowledge to new tasks |
| Training scheme | Episode-based (task level) | Standard training + fine-tuning |
| Adaptation to new tasks | Designed for rapid adaptation | Requires a certain amount of fine-tuning data |
| Typical methods | MAML, Prototypical Networks | Pre-training + Fine-tuning |
Appendix: Related Projects
Few-shot Image Classification via Meta-Learning
This project focuses on the application of the MAML (Model-Agnostic Meta-Learning) algorithm to few-shot image classification tasks.
For detailed project content, please visit: https://github.com/jeffliulab/few-shot-image-classification
References
- Finn et al., "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks", ICML 2017
- Snell et al., "Prototypical Networks for Few-shot Learning", NeurIPS 2017
- Vinyals et al., "Matching Networks for One Shot Learning", NeurIPS 2016