Classic Datasets

This note introduces some classic datasets commonly used in machine learning and deep learning.

Image Datasets

MNIST

MNIST is one of the most classic datasets for getting started with machine learning and deep learning, primarily used for handwritten digit recognition. It is widely regarded as the "Hello World" of artificial intelligence.

MNIST contains a total of 70,000 images:

Training set: 60,000 images
Test set: 10,000 images

Data Format and Characteristics

Image size: 28 × 28 pixels
Channels: 1 (grayscale)
Pixel values: 0–255
Labels: 0–9 (corresponding digits)

Each image is essentially a 28×28 matrix, making it ideal for feeding into models as a vector or tensor.

Commonly used with: Logistic Regression, KNN, SVM, FNN, CNN

Modern CNNs can easily achieve over 99% accuracy on MNIST.

CIFAR-10

CIFAR-10 is the second most classic introductory dataset in computer vision after MNIST, but with significantly higher difficulty. It is often used to test whether a model truly "understands images."

Dataset	Count
Training set	50,000
Test set	10,000
Total	60,000

Each class contains 6,000 images, with a perfectly balanced class distribution.

CIFAR-10 has the following 10 fixed categories:

airplane
automobile
bird
cat
deer
dog
frog
horse
ship
truck

Key characteristics:

A mix of animals and vehicles
Semantic similarity exists between certain classes (e.g., cat vs. dog)

ImageNet

ImageNet was introduced in 2009 by a team led by Stanford professor Fei-Fei Li. It contains over 14 million annotated images organized according to the WordNet hierarchy. For example, "canine" is a broad category that branches into thousands of subcategories such as "German Shepherd" and "Poodle." Most images were manually labeled through crowdsourcing platforms such as Amazon Mechanical Turk. ImageNet served as the cradle for many renowned neural network architectures, including ResNet, VGG, and Inception.

ILSVRC (ImageNet Large Scale Visual Recognition Challenge) was an annual competition held from 2010 onward, using a subset of ImageNet (approximately 1,000 categories and 1.2 million images). In 2012, AlexNet won the challenge by an unprecedented margin, demonstrating the enormous potential of convolutional neural networks (CNNs) for image recognition and sparking the deep learning revolution.

As of now, ImageNet-21K (Full) contains over 14 million images across more than 21,000 category labels.

Open Images

The Open Images Dataset is maintained by Google for object detection and instance segmentation. It contains over 9 million image annotations, with more than 15 million bounding box annotations covering over 600 categories.

LAION-Art

LAION-Art is a curated, condensed subset of LAION.

LAION-5B

LAION-5B is currently the largest publicly available image-text paired dataset. It contains over 5.85 billion image-text pairs. Although it is not purely an image dataset, it serves as the core foundation for training visual generative models such as Stable Diffusion.