# Deep Learning Landscape
Deep learning is a subfield of machine learning in which multi-layer neural networks automatically learn hierarchical feature representations from data. Since AlexNet's breakthrough on ImageNet in 2012, deep learning has become the central technological paradigm of artificial intelligence.
## Deep Learning Technology Stack

```mermaid
graph TD
    A[Deep Learning] --> B[Core Architectures]
    A --> C[Learning Paradigms]
    A --> D[Application Domains]
    A --> E[Engineering Practice]
    B --> B1[MLP/Feedforward]
    B --> B2[CNN]
    B --> B3[RNN/LSTM/GRU]
    B --> B4[Transformer]
    B --> B5[GNN]
    B --> B6[SSM/Mamba]
    C --> C1[Supervised]
    C --> C2[Self-Supervised]
    C --> C3[Generative]
    C --> C4[RL Fine-tuning]
    D --> D1[Computer Vision]
    D --> D2[NLP]
    D --> D3[Multimodal]
    D --> D4[Scientific Computing]
    E --> E1[Distributed Training]
    E --> E2[Model Compression]
    E --> E3[Efficient Inference]
    E --> E4[MLOps]
```
## Architecture Evolution Timeline
| Year | Milestone | Core Innovation | Impact |
|---|---|---|---|
| 1986 | Backpropagation | Error backpropagation algorithm | Multi-layer networks trainable |
| 1998 | LeNet-5 | Convolution + pooling | Handwritten digit recognition |
| 2012 | AlexNet | GPU training + ReLU + Dropout | ImageNet error plummeted, DL explosion |
| 2014 | GoogLeNet/VGG | Deeper networks, Inception module | The power of depth |
| 2014 | GAN | Generative adversarial networks | Beginning of generative AI |
| 2014 | Seq2Seq+Attention | Attention mechanism | Machine translation breakthrough |
| 2015 | ResNet | Residual connections | Broke depth barrier (152 layers) |
| 2017 | Transformer | Self-attention, dropped RNN | NLP paradigm revolution |
| 2018 | BERT | Bidirectional pretraining | New NLU benchmark |
| 2018 | GPT | Autoregressive pretraining | Language generation |
| 2020 | ViT | Transformer for vision | CV+NLP unification |
| 2020 | GPT-3 | 175B params, emergent abilities | Large model era begins |
| 2021 | CLIP | Vision-language contrastive learning | Multimodal alignment |
| 2022 | Stable Diffusion | Latent space diffusion | Text-to-image explosion |
| 2022 | ChatGPT | RLHF alignment | AI goes mainstream |
| 2023 | GPT-4 | Multimodal large model | AGI discussion |
| 2023 | Mamba | State space models | Transformer alternative |
| 2024–2025 | DeepSeek-V3/Llama 4 | MoE architecture | Efficient large models |
## Core Learning Paradigms
### Supervised Learning

The classical paradigm: given labeled pairs \((x_i, y_i)\), minimize the average loss:
\[\min_\theta \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}(f_\theta(x_i), y_i)\]
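This objective can be sketched with plain gradient descent. The example below, a minimal illustration assuming a linear model \(f_\theta(x) = wx + b\) and squared-error loss, fits synthetic data generated by a known rule:

```python
# Minimal empirical risk minimization sketch: gradient descent on the
# average squared-error loss over N labeled pairs (x_i, y_i).
# Assumes a linear model f_theta(x) = w*x + b (illustrative, not from the text).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + 0.01 * rng.normal(size=100)  # labels from a known rule

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    pred = w * x + b
    err = pred - y                 # derivative of 0.5*(pred - y)^2 w.r.t. pred
    w -= lr * np.mean(err * x)     # gradient of the mean loss w.r.t. w
    b -= lr * np.mean(err)         # gradient of the mean loss w.r.t. b

print(round(w, 2), round(b, 2))    # recovers roughly w ≈ 2, b ≈ 1
```

In real models \(f_\theta\) is a deep network and the gradients come from backpropagation, but the loop structure is the same.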
### Self-Supervised Learning

Supervision signals are constructed from the data itself, with no manual labeling required:
- Contrastive Learning: Pull positive pairs closer, push negative pairs apart (SimCLR/MoCo/CLIP)
- Masked Prediction: Mask part of input, predict masked content (BERT/MAE)
- Autoregressive Prediction: Predict next token based on context (GPT series)
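The contrastive objective used by SimCLR/CLIP can be sketched as the InfoNCE loss. The snippet below is a simplified numpy illustration assuming L2-normalized embeddings, where row \(i\) of `z1` pairs with row \(i\) of `z2` as a positive and all other rows act as negatives:

```python
# InfoNCE contrastive loss sketch (in the spirit of SimCLR/CLIP):
# pull matched embedding pairs together, push mismatched pairs apart.
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # L2-normalize
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # positives on diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
# Two slightly perturbed "views" of the same batch agree strongly,
# so their loss should be much lower than for unrelated embeddings.
aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))
random = info_nce(z, rng.normal(size=(8, 16)))
print(aligned < random)
```

Masked and autoregressive prediction replace this batch-level objective with a per-token cross-entropy, but share the same idea: the training signal is derived from the input itself.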
### Generative Learning
Learn data distribution \(p(x)\) and generate new samples:
| Method | Principle | Representative Models |
|---|---|---|
| VAE | Variational inference, ELBO maximization | VAE, VQ-VAE |
| GAN | Generator vs discriminator adversarial | StyleGAN, BigGAN |
| Diffusion | Step-by-step denoising | DDPM, Stable Diffusion |
| Flow Matching | Learn probability flow ODE | Rectified Flow |
| Autoregressive | Token-by-token generation | GPT, DALL-E |
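The autoregressive row of the table can be sketched concretely: sample tokens one at a time from \(p(x_t \mid x_{<t})\). Below, a hypothetical hand-built bigram table stands in for a trained model, and greedy decoding (argmax) picks each next token:

```python
# Autoregressive generation sketch: emit tokens one at a time from
# p(x_t | x_<t). A toy bigram table (hypothetical, hand-built) plays
# the role of the trained model; GPT-style models condition on the
# full prefix rather than just the previous token.
import numpy as np

vocab = ["<s>", "deep", "learning", "models", "generate", "text", "."]
bigram = np.full((7, 7), 0.01)                      # small baseline mass
bigram[0, 1] = bigram[1, 2] = bigram[2, 3] = 1.0    # <s> -> deep -> learning -> models
bigram[3, 4] = bigram[4, 5] = bigram[5, 6] = 1.0    # -> generate -> text -> .
bigram /= bigram.sum(axis=1, keepdims=True)         # normalize rows to probabilities

tokens, cur = [], 0                                 # start from <s>
while cur != 6 and len(tokens) < 10:                # stop at "." or length cap
    cur = int(np.argmax(bigram[cur]))               # greedy decoding
    tokens.append(vocab[cur])
print(" ".join(tokens))                             # deep learning models generate text .
```

Swapping the argmax for sampling from `bigram[cur]` gives stochastic generation, the same choice that separates greedy decoding from temperature sampling in LLMs.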
## Section Navigation
- Fundamentals — MLP, loss functions, probability basics
- CNN — Convolutional networks and architectures
- RNN — Sequence modeling: RNN→LSTM→GRU
- Transformer — Self-attention architecture
- Generative Models — VAE/GAN/Diffusion/Flow
- GNN — Graph neural networks
- Foundation Models — LLM/Vision/Multimodal
- SSM/Mamba — State space models
- Optimization — Training techniques
- Frontiers — MoE/Efficient inference/Latest advances