Classic Datasets
This note introduces some classic datasets commonly used in machine learning and deep learning.
Image Datasets
MNIST
MNIST is a classic introductory dataset for machine learning and deep learning, used primarily for handwritten digit recognition. It is widely regarded as the "Hello World" of the field.
MNIST contains a total of 70,000 images:
- Training set: 60,000 images
- Test set: 10,000 images
Data Format and Characteristics
- Image size: 28 × 28 pixels
- Channels: 1 (grayscale)
- Pixel values: 0–255
- Labels: 0–9 (corresponding digits)
Each image is a 28 × 28 matrix of pixel intensities, so it can be fed to a model either flattened into a 784-dimensional vector or kept as a 2D tensor.
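The two input layouts mentioned above can be sketched as follows. This is a minimal illustration that assumes NumPy is available and uses a random array as a stand-in for a real MNIST image; the scaling to [0, 1] is a common preprocessing choice, not something the dataset requires.

```python
import numpy as np

# Stand-in for one MNIST image: random 8-bit grayscale values (not real data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten to a 784-dimensional vector, e.g. for logistic regression or KNN,
# and scale pixel values from 0-255 down to 0.0-1.0.
vector = image.reshape(-1).astype(np.float32) / 255.0

# Keep the 2D layout and add a channel axis, e.g. for a CNN: (channels, height, width).
tensor = image.astype(np.float32)[np.newaxis, :, :] / 255.0

print(vector.shape)  # (784,)
print(tensor.shape)  # (1, 28, 28)
```

Flattening discards the spatial arrangement of pixels, which is why the vector form suits classical models while convolutional models usually take the tensor form.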
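Commonly used with: logistic regression, k-nearest neighbors (KNN), support vector machines (SVM), feedforward neural networks (FNN), and convolutional neural networks (CNN)
Modern CNNs can easily achieve over 99% test accuracy on MNIST.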
CIFAR-10
CIFAR-10 is another classic introductory dataset in computer vision, and it is considerably harder than MNIST: the images are small 32 × 32 color photographs of real-world objects rather than grayscale digits. It is often used as a first check of whether a model can handle natural images.
| Split | Images |
|---|---|
| Training set | 50,000 |
| Test set | 10,000 |
| Total | 60,000 |
Each class contains 6,000 images (5,000 for training and 1,000 for testing), so the class distribution is perfectly balanced.
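As a quick sanity check, the split sizes follow directly from the per-class counts. The sketch below assumes the standard release, in which each class contributes 5,000 training and 1,000 test images; the constant names are made up for illustration.

```python
# Per-class counts in the standard CIFAR-10 release (illustrative constant names).
NUM_CLASSES = 10
TRAIN_PER_CLASS = 5_000
TEST_PER_CLASS = 1_000

train_total = NUM_CLASSES * TRAIN_PER_CLASS
test_total = NUM_CLASSES * TEST_PER_CLASS
per_class_total = TRAIN_PER_CLASS + TEST_PER_CLASS

print(train_total, test_total, train_total + test_total)  # 50000 10000 60000
print(per_class_total)  # 6000
```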
CIFAR-10 has the following 10 fixed categories:
- airplane
- automobile
- bird
- cat
- deer
- dog
- frog
- horse
- ship
- truck
Key characteristics:
- A mix of animals and vehicles
- Semantic similarity exists between certain classes (e.g., cat vs. dog)
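In code, CIFAR-10 labels are usually integers from 0 to 9. The sketch below assumes the indices follow the order of the category list above; the `VEHICLES` grouping and the `describe` helper are hypothetical names introduced only for illustration.

```python
# Class names in list order; index i is assumed to be the integer label for that class.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

# Illustrative grouping: every class not listed here is treated as an animal.
VEHICLES = {"airplane", "automobile", "ship", "truck"}


def describe(label: int) -> str:
    """Return a readable description for an integer class label."""
    name = CIFAR10_CLASSES[label]
    group = "vehicle" if name in VEHICLES else "animal"
    return f"{label}: {name} ({group})"


print(describe(3))  # 3: cat (animal)
print(describe(9))  # 9: truck (vehicle)
```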
ImageNet
ImageNet was introduced in 2009 by a team led by Stanford professor Fei-Fei Li. It contains over 14 million annotated images organized according to the WordNet hierarchy. For example, "canine" is a broad category that branches into more specific subcategories such as "German Shepherd" and "Poodle." Most images were labeled manually through crowdsourcing platforms such as Amazon Mechanical Turk. ImageNet became the standard benchmark on which many renowned neural network architectures, including VGG, Inception, and ResNet, were developed and evaluated.
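The hierarchical organization can be pictured with a small child-to-parent map. This is a toy sketch only: the category names and the tree shape below are illustrative assumptions, not real WordNet synset identifiers or the actual ImageNet tree.

```python
# Toy child -> parent map; names are illustrative, not real WordNet synsets.
PARENT = {
    "German Shepherd": "dog",
    "Poodle": "dog",
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
}


def path_to_root(category: str) -> list[str]:
    """Walk from a category up through its ancestors to the root."""
    path = [category]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def is_a(category: str, ancestor: str) -> bool:
    """True if `ancestor` is the category itself or one of its ancestors."""
    return ancestor in path_to_root(category)


print(" -> ".join(path_to_root("Poodle")))
# Poodle -> dog -> canine -> carnivore -> mammal -> animal
print(is_a("German Shepherd", "canine"))  # True
```

A structure like this is what lets a fine-grained label such as "Poodle" also count as an example of every broader category above it.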
ILSVRC (ImageNet Large Scale Visual Recognition Challenge) was an annual competition held from 2010 to 2017, using a subset of ImageNet with 1,000 categories and roughly 1.2 million training images. In 2012, AlexNet won the challenge by an unprecedented margin (a top-5 error of about 15%, versus about 26% for the runner-up), demonstrating the enormous potential of convolutional neural networks (CNNs) for image recognition and sparking the deep learning revolution.
The full release, ImageNet-21K, contains over 14 million images across more than 21,000 categories.
Open Images
The Open Images Dataset is maintained by Google and is used for tasks such as object detection and instance segmentation. It contains about 9 million images, with more than 15 million bounding-box annotations covering 600 object categories.
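A bounding-box annotation can be represented as a label plus four coordinates. The sketch below assumes the coordinates are stored as fractions of the image width and height in the range [0, 1], which is my understanding of how Open Images distributes its boxes; the `Box` class, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """One bounding box with coordinates normalized to [0, 1] (assumed convention)."""
    label: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def to_pixels(self, width: int, height: int) -> tuple[int, int, int, int]:
        """Convert to pixel coordinates as (left, top, right, bottom)."""
        return (
            round(self.x_min * width),
            round(self.y_min * height),
            round(self.x_max * width),
            round(self.y_max * height),
        )


# Made-up example box on a 640 x 480 image.
box = Box(label="Cat", x_min=0.25, x_max=0.75, y_min=0.10, y_max=0.60)
print(box.to_pixels(640, 480))  # (160, 48, 480, 288)
```

Normalized coordinates keep an annotation valid when the image is resized, which is convenient for a dataset whose images come in many resolutions.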
LAION-Art
LAION-Art is a smaller, curated subset of LAION-5B, selected for images with high predicted aesthetic scores.
LAION-5B
At its release in 2022, LAION-5B was the largest publicly available image-text paired dataset, with about 5.85 billion image-text pairs. Although it is not purely an image dataset, subsets of it served as the core training data for visual generative models such as Stable Diffusion.