# Deep Learning Landscape
Deep learning is a subfield of machine learning in which multi-layer neural networks automatically learn hierarchical feature representations from data. Since AlexNet's breakthrough on ImageNet in 2012, deep learning has become the central technological paradigm of artificial intelligence.
## Deep Learning Technology Stack

```mermaid
graph TD
    A[Deep Learning] --> B[Core Architectures]
    A --> C[Learning Paradigms]
    A --> D[Application Domains]
    A --> E[Engineering Practice]
    B --> B1[MLP/Feedforward]
    B --> B2[CNN]
    B --> B3[RNN/LSTM/GRU]
    B --> B4[Transformer]
    B --> B5[GNN]
    B --> B6[SSM/Mamba]
    C --> C1[Supervised]
    C --> C2[Self-Supervised]
    C --> C3[Generative]
    C --> C4[RL Fine-tuning]
    D --> D1[Computer Vision]
    D --> D2[NLP]
    D --> D3[Multimodal]
    D --> D4[Scientific Computing]
    E --> E1[Distributed Training]
    E --> E2[Model Compression]
    E --> E3[Efficient Inference]
    E --> E4[MLOps]
```
## Architecture Evolution Timeline
| Year | Milestone | Core Innovation | Impact |
|---|---|---|---|
| 1986 | Backpropagation | Error backpropagation algorithm | Multi-layer networks trainable |
| 1998 | LeNet-5 | Convolution + pooling | Handwritten digit recognition |
| 2012 | AlexNet | GPU training + ReLU + Dropout | ImageNet error plummeted, DL explosion |
| 2014 | GoogLeNet/VGG | Deeper networks, Inception module | The power of depth |
| 2014 | GAN | Generative adversarial networks | Beginning of generative AI |
| 2014 | Seq2Seq+Attention | Attention mechanism | Machine translation breakthrough |
| 2015 | ResNet | Residual connections | Broke depth barrier (152 layers) |
| 2017 | Transformer | Self-attention, dropped RNN | NLP paradigm revolution |
| 2018 | BERT | Bidirectional pretraining | New NLU benchmark |
| 2018 | GPT | Autoregressive pretraining | Language generation |
| 2020 | ViT | Transformer for vision | CV+NLP unification |
| 2020 | GPT-3 | 175B params, emergent abilities | Large model era begins |
| 2021 | CLIP | Vision-language contrastive learning | Multimodal alignment |
| 2022 | Stable Diffusion | Latent space diffusion | Text-to-image explosion |
| 2022 | ChatGPT | RLHF alignment | AI goes mainstream |
| 2023 | GPT-4 | Multimodal large model | AGI discussion |
| 2023 | Mamba | State space models | Transformer alternative |
| 2024–2025 | DeepSeek-V3/Llama 4 | MoE architecture | Efficient large models |
## Core Learning Paradigms
### Supervised Learning

The classical paradigm: given labeled pairs \((x_i, y_i)\), minimize the average loss:
\[\min_\theta \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}(f_\theta(x_i), y_i)\]
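This objective can be sketched with plain gradient descent. The example below, a minimal illustration assuming a linear model \(f_\theta(x) = wx + b\) and squared-error loss, fits synthetic data generated by a known rule:

```python
# Minimal empirical risk minimization sketch: gradient descent on the
# average squared-error loss over N labeled pairs (x_i, y_i).
# Assumes a linear model f_theta(x) = w*x + b (illustrative, not from the text).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + 0.01 * rng.normal(size=100)  # labels from a known rule

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    pred = w * x + b
    err = pred - y                 # derivative of 0.5*(pred - y)^2 w.r.t. pred
    w -= lr * np.mean(err * x)     # gradient of the mean loss w.r.t. w
    b -= lr * np.mean(err)         # gradient of the mean loss w.r.t. b

print(round(w, 2), round(b, 2))    # recovers roughly w ≈ 2, b ≈ 1
```

In real models \(f_\theta\) is a deep network and the gradients come from backpropagation, but the loop structure is the same.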
### Self-Supervised Learning

Supervision signals are constructed from the data itself, with no manual labeling required:
- Contrastive Learning: Pull positive pairs closer, push negative pairs apart (SimCLR/MoCo/CLIP)
- Masked Prediction: Mask part of input, predict masked content (BERT/MAE)
- Autoregressive Prediction: Predict next token based on context (GPT series)
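The contrastive objective used by SimCLR/CLIP can be sketched as the InfoNCE loss. The snippet below is a simplified numpy illustration assuming L2-normalized embeddings, where row \(i\) of `z1` pairs with row \(i\) of `z2` as a positive and all other rows act as negatives:

```python
# InfoNCE contrastive loss sketch (in the spirit of SimCLR/CLIP):
# pull matched embedding pairs together, push mismatched pairs apart.
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # L2-normalize
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # positives on diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
# Two slightly perturbed "views" of the same batch agree strongly,
# so their loss should be much lower than for unrelated embeddings.
aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))
random = info_nce(z, rng.normal(size=(8, 16)))
print(aligned < random)
```

Masked and autoregressive prediction replace this batch-level objective with a per-token cross-entropy, but share the same idea: the training signal is derived from the input itself.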
### Generative Learning
Learn data distribution \(p(x)\) and generate new samples:
| Method | Principle | Representative Models |
|---|---|---|
| VAE | Variational inference, ELBO maximization | VAE, VQ-VAE |
| GAN | Generator vs discriminator adversarial | StyleGAN, BigGAN |
| Diffusion | Step-by-step denoising | DDPM, Stable Diffusion |
| Flow Matching | Learn probability flow ODE | Rectified Flow |
| Autoregressive | Token-by-token generation | GPT, DALL-E |
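The autoregressive row of the table can be sketched concretely: sample tokens one at a time from \(p(x_t \mid x_{<t})\). Below, a hypothetical hand-built bigram table stands in for a trained model, and greedy decoding (argmax) picks each next token:

```python
# Autoregressive generation sketch: emit tokens one at a time from
# p(x_t | x_<t). A toy bigram table (hypothetical, hand-built) plays
# the role of the trained model; GPT-style models condition on the
# full prefix rather than just the previous token.
import numpy as np

vocab = ["<s>", "deep", "learning", "models", "generate", "text", "."]
bigram = np.full((7, 7), 0.01)                      # small baseline mass
bigram[0, 1] = bigram[1, 2] = bigram[2, 3] = 1.0    # <s> -> deep -> learning -> models
bigram[3, 4] = bigram[4, 5] = bigram[5, 6] = 1.0    # -> generate -> text -> .
bigram /= bigram.sum(axis=1, keepdims=True)         # normalize rows to probabilities

tokens, cur = [], 0                                 # start from <s>
while cur != 6 and len(tokens) < 10:                # stop at "." or length cap
    cur = int(np.argmax(bigram[cur]))               # greedy decoding
    tokens.append(vocab[cur])
print(" ".join(tokens))                             # deep learning models generate text .
```

Swapping the argmax for sampling from `bigram[cur]` gives stochastic generation, the same choice that separates greedy decoding from temperature sampling in LLMs.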
## Section Navigation
- Fundamentals — MLP, loss functions, probability basics
- CNN — Convolutional networks and architectures
- RNN — Sequence modeling: RNN→LSTM→GRU
- Transformer — Self-attention architecture
- Generative Models — VAE/GAN/Diffusion/Flow
- GNN — Graph neural networks
- Foundation Models — LLM/Vision/Multimodal
- SSM/Mamba — State space models
- Optimization — Training techniques
- Frontiers — MoE/Efficient inference/Latest advances