Meta Learning
Meta-learning, also known as "Learning to Learn," aims to enable models to rapidly learn new tasks from only a few examples. While traditional deep learning requires large amounts of data to train a model for a specific task, meta-learning seeks to acquire a cross-task learning ability.
Tribe-level perspective: the metric-based methods in this article (Siamese / Prototypical / Matching Networks) are the modern incarnation of the Analogizers tribe in Domingos's taxonomy. For full Triplet/InfoNCE derivations and a reading of SimCLR/MoCo/CLIP as metric learning, see The Master Algorithm — Metric Learning & Contrastive Learning.
Core Idea
The key distinction of meta-learning is that it operates at the task level rather than the sample level:
- Traditional learning: Given a single task (e.g., cat vs. dog classification), learn from a large number of samples
- Meta-learning: Given a series of tasks, learn the ability to "quickly learn new tasks"
Few-shot Learning Setup
Few-shot learning is the most common application scenario for meta-learning. The standard setup is known as N-way K-shot:
- N-way: Each task involves N classes
- K-shot: Each class has only K labeled samples (the Support Set)
- Query Set: Used to evaluate the model's performance on the task
For example, 5-way 1-shot means: the model is given 1 image from each of 5 classes, and then must classify new images into these 5 categories.
Training Paradigm: Episodic Training
Meta-learning employs episodic training:
- Randomly sample N classes from the training set
- Sample K examples from each class to form the Support Set
- Sample additional examples from each class to form the Query Set
- The model uses the Support Set to make predictions on the Query Set
- Compute the loss and update the model parameters
Each such sampling procedure is called an episode (or task). By training over a large number of episodes, the model learns a cross-task learning ability.
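The sampling procedure above can be sketched as follows. This is a minimal illustration with synthetic data; the function and variable names (`sample_episode`, `data`, etc.) are illustrative, not from a specific library.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_episode(data, n_way=5, k_shot=1, n_query=3):
    """Sample one N-way K-shot episode from a {class_id: samples} dict."""
    classes = rng.choice(list(data.keys()), size=n_way, replace=False)
    support, query = [], []
    for label, cls in enumerate(classes):
        idx = rng.permutation(len(data[cls]))[: k_shot + n_query]
        samples = data[cls][idx]
        # First K samples form the Support Set, the rest the Query Set;
        # labels are re-indexed 0..N-1 within the episode.
        support += [(x, label) for x in samples[:k_shot]]
        query += [(x, label) for x in samples[k_shot:]]
    return support, query

# Toy dataset: 10 classes, 20 samples per class, 8-dim features.
data = {c: rng.normal(size=(20, 8)) for c in range(10)}
support, query = sample_episode(data, n_way=5, k_shot=1, n_query=3)
print(len(support), len(query))  # → 5 15
```

At meta-training time, a loss would be computed on the Query Set of each episode and used to update the model; at meta-test time, episodes are drawn from classes never seen during training.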
Main Approaches
Metric-based Methods
Core idea: Learn a good embedding space where same-class samples are close together and different-class samples are far apart.
Siamese Networks:
Two weight-sharing networks process two inputs separately, and their embeddings are compared to determine whether they belong to the same class.
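The weight sharing can be made concrete with a small sketch: both inputs pass through the same parameters, and the embedding distance serves as the similarity score. The linear-plus-tanh "network" here is a stand-in assumption; real Siamese networks use a shared CNN or other deep encoder trained with a contrastive or triplet loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared weight matrix plays the role of the twin networks:
# both inputs are embedded with the SAME parameters.
W = rng.normal(size=(4, 8))

def embed(x):
    return np.tanh(W @ x)  # shared embedding function

def pair_distance(x1, x2):
    return np.linalg.norm(embed(x1) - embed(x2))

x_a = rng.normal(size=8)
x_b = x_a + 0.01 * rng.normal(size=8)  # near-duplicate ("same class")
x_c = rng.normal(size=8)               # unrelated sample

# Same-class pairs should yield smaller distances than cross-class pairs.
print(pair_distance(x_a, x_b) < pair_distance(x_a, x_c))
```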
Prototypical Networks:
For each class, the mean embedding of its Support Set samples is computed as the class "prototype":

\[ c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\phi(x_i) \]

New samples are then classified by a softmax over negative distances to each class prototype:

\[ p_\phi(y = k \mid x) = \frac{\exp(-d(f_\phi(x), c_k))}{\sum_{k'} \exp(-d(f_\phi(x), c_{k'}))} \]

where \(f_\phi\) is the feature extractor, \(S_k\) is the set of support examples of class \(k\), and \(d\) is a distance function (typically squared Euclidean distance).
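The prototype computation and distance-based classification can be sketched in a few lines, assuming the feature extractor \(f_\phi\) has already been applied to all inputs (the function name `prototypical_predict` is illustrative):

```python
import numpy as np

def prototypical_predict(support_x, support_y, query_x, n_way):
    """Classify queries by Euclidean distance to class prototypes.

    support_x: (N*K, D) embedded support samples (f_phi already applied)
    support_y: (N*K,)  labels in 0..n_way-1
    query_x:   (Q, D)  embedded query samples
    """
    # Prototype = mean embedding of each class's support samples.
    protos = np.stack([support_x[support_y == k].mean(axis=0)
                       for k in range(n_way)])
    # Squared Euclidean distance from each query to each prototype.
    d = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    # Softmax over negative distances gives class probabilities.
    logits = -d
    p = np.exp(logits - logits.max(1, keepdims=True))
    return p / p.sum(1, keepdims=True)

# 2-way 2-shot toy example in a 2-D embedding space.
sx = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
sy = np.array([0, 0, 1, 1])
qx = np.array([[0., 0.5], [5., 5.5]])
probs = prototypical_predict(sx, sy, qx, n_way=2)
print(probs.argmax(1))  # → [0 1]
```

Note that no gradient steps are taken at test time: adapting to a new task only requires averaging the support embeddings.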
Matching Networks:
An attention mechanism is used to weight the samples in the Support Set for predicting the class of a query sample.
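A simplified sketch of this attention step, using cosine similarity as the attention kernel (an assumption for illustration; the original paper additionally uses full-context embeddings over the Support Set):

```python
import numpy as np

def matching_predict(support_x, support_y, query, n_way):
    """Matching Networks-style prediction: softmax attention over the
    Support Set, with cosine similarity as the attention kernel."""
    # Cosine similarity between the query and every support sample.
    s_norm = support_x / np.linalg.norm(support_x, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    sims = s_norm @ q_norm
    # Attention weights over support samples.
    a = np.exp(sims) / np.exp(sims).sum()
    # Predicted distribution = attention-weighted sum of one-hot labels.
    onehot = np.eye(n_way)[support_y]
    return a @ onehot

sx = np.array([[1., 0.], [0.9, 0.1], [0., 1.]])
sy = np.array([0, 0, 1])
p = matching_predict(sx, sy, np.array([1., 0.1]), n_way=2)
print(p.argmax())  # → 0
```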
Optimization-based Methods
Core idea: Learn a good parameter initialization so that the model can adapt to a new task with only a few gradient steps.
MAML (Model-Agnostic Meta-Learning):
MAML is the most classic meta-learning algorithm. Its core idea is to find a set of model parameters \(\theta\) such that only a few gradient descent steps are needed to achieve strong performance on any new task.
Algorithm Procedure:
- Sample a batch of tasks \(\{\mathcal{T}_i\}\) from the task distribution
- For each task \(\mathcal{T}_i\):
  - Perform one or more gradient descent steps on the Support Set to obtain task-specific parameters; for a single step: \(\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(\theta)\)
- Update the meta-parameters using the loss on the Query Set: \(\theta \leftarrow \theta - \beta \nabla_\theta \sum_i \mathcal{L}_{\mathcal{T}_i}(\theta_i')\)
Key point: The outer-loop gradient must pass "through" the inner-loop gradient steps (second-order derivatives), which makes MAML computationally expensive.
Intuition behind MAML: MAML does not seek parameters that "perform well on all tasks" (that would be multi-task learning). Instead, it seeks parameters that are "close to the optimal solution for every task" — in other words, a good starting point.
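The two-loop structure, including the second-order term, can be seen in a deliberately tiny sketch. Each task's loss is the scalar quadratic \(\mathcal{L}_i(\theta) = (\theta - t_i)^2\) with optimum \(t_i\); the tasks and hyperparameters are invented for illustration and gradients are written out analytically rather than via autodiff.

```python
import numpy as np

alpha, beta = 0.1, 0.05  # inner / outer learning rates
theta = 0.0              # meta-parameters (a single scalar here)
rng = np.random.default_rng(0)

for step in range(500):
    tasks = rng.uniform(-2.0, 2.0, size=8)  # batch of task optima t_i
    meta_grad = 0.0
    for t in tasks:
        # Inner loop: one gradient step on the task's support loss,
        # using grad of (theta - t)^2 = 2 * (theta - t).
        theta_i = theta - alpha * 2 * (theta - t)
        # Outer gradient flows THROUGH the inner step:
        #   d/dtheta (theta_i - t)^2
        #     = 2 * (theta_i - t) * d(theta_i)/dtheta
        #     = 2 * (theta_i - t) * (1 - 2 * alpha)   <- second-order term
        meta_grad += 2 * (theta_i - t) * (1 - 2 * alpha)
    theta -= beta * meta_grad

# With task optima symmetric around 0, the learned initialization
# settles near 0: from there, one inner step reaches any task quickly.
print(f"learned init: {theta:.2f}")
```

First-order MAML (FOMAML) would drop the `(1 - 2 * alpha)` factor, i.e., treat \(\theta_i'\) as a constant with respect to \(\theta\); this is the standard approximation used to avoid the second-order cost mentioned above.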
Memory-based Methods
Core idea: Use an external memory module to store and retrieve experiences.
Memory-Augmented Neural Networks (MANN): By incorporating external memory mechanisms such as the Neural Turing Machine (NTM), new task samples can be rapidly written to memory, and relevant information can be retrieved from memory at prediction time.
Meta-Learning vs. Transfer Learning
| Dimension | Meta-Learning | Transfer Learning |
|---|---|---|
| Objective | Learn "how to learn" | Transfer existing knowledge to new tasks |
| Training scheme | Episode-based (task level) | Standard training + fine-tuning |
| Adaptation to new tasks | Designed for rapid adaptation | Requires a certain amount of fine-tuning data |
| Typical methods | MAML, Prototypical Networks | Pre-training + Fine-tuning |
Appendix: Related Projects
Few-shot Image Classification via Meta-Learning
This project focuses on the application of the MAML (Model-Agnostic Meta-Learning) algorithm to few-shot image classification tasks.
For detailed project content, please visit: https://github.com/jeffliulab/few-shot-image-classification
References
- Finn et al., "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks", ICML 2017
- Snell et al., "Prototypical Networks for Few-shot Learning", NeurIPS 2017
- Vinyals et al., "Matching Networks for One Shot Learning", NeurIPS 2016