Deep Learning Landscape

Deep Learning is a subfield of machine learning that automatically learns hierarchical feature representations from data through multi-layer neural networks. Since AlexNet stunned the world in 2012, deep learning has become the most central technological paradigm in artificial intelligence.

Deep Learning Technology Stack

```mermaid
graph TD
    A[Deep Learning] --> B[Core Architectures]
    A --> C[Learning Paradigms]
    A --> D[Application Domains]
    A --> E[Engineering Practice]

    B --> B1[MLP/Feedforward]
    B --> B2[CNN]
    B --> B3[RNN/LSTM/GRU]
    B --> B4[Transformer]
    B --> B5[GNN]
    B --> B6[SSM/Mamba]

    C --> C1[Supervised]
    C --> C2[Self-Supervised]
    C --> C3[Generative]
    C --> C4[RL Fine-tuning]

    D --> D1[Computer Vision]
    D --> D2[NLP]
    D --> D3[Multimodal]
    D --> D4[Scientific Computing]

    E --> E1[Distributed Training]
    E --> E2[Model Compression]
    E --> E3[Efficient Inference]
    E --> E4[MLOps]
```

Architecture Evolution Timeline

| Year | Milestone | Core Innovation | Impact |
|------|-----------|-----------------|--------|
| 1986 | Backpropagation | Error backpropagation algorithm | Made multi-layer networks trainable |
| 1998 | LeNet-5 | Convolution + pooling | Handwritten digit recognition |
| 2012 | AlexNet | GPU training + ReLU + Dropout | ImageNet error plummeted; DL explosion |
| 2014 | GoogLeNet/VGG | Deeper networks, Inception module | The power of depth |
| 2014 | GAN | Generative adversarial networks | Beginning of generative AI |
| 2014 | Seq2Seq + Attention | Attention mechanism | Machine translation breakthrough |
| 2015 | ResNet | Residual connections | Broke the depth barrier (152 layers) |
| 2017 | Transformer | Self-attention, dropped recurrence | NLP paradigm revolution |
| 2018 | BERT | Bidirectional pretraining | New NLU benchmark |
| 2018 | GPT | Autoregressive pretraining | Language generation |
| 2020 | ViT | Transformer for vision | CV + NLP unification |
| 2020 | GPT-3 | 175B params, emergent abilities | Large-model era begins |
| 2021 | CLIP | Vision-language contrastive learning | Multimodal alignment |
| 2022 | Stable Diffusion | Latent-space diffusion | Text-to-image explosion |
| 2022 | ChatGPT | RLHF alignment | AI goes mainstream |
| 2023 | GPT-4 | Multimodal large model | AGI discussion |
| 2023 | Mamba | State space models | Transformer alternative |
| 2024 | DeepSeek (V2/V3) | MoE architecture | Efficient large models |

Core Learning Paradigms

Supervised Learning

The classical paradigm: given labeled pairs \((x_i, y_i)\), minimize the average loss (empirical risk):

\[\min_\theta \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}(f_\theta(x_i), y_i)\]
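This objective can be made concrete with a minimal sketch: gradient descent on a logistic-regression model \(f_\theta\) with cross-entropy loss, over synthetic data (all sizes and the learning rate are illustrative assumptions).

```python
import numpy as np

# Minimal empirical-risk-minimization sketch: logistic regression
# trained by gradient descent on synthetic labeled pairs (x_i, y_i).
rng = np.random.default_rng(0)
N, d = 200, 2
X = rng.normal(size=(N, d))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # synthetic labels

theta = np.zeros(d)
lr = 0.5
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-X @ theta))    # f_theta(x_i), sigmoid output
    grad = X.T @ (p - y) / N                # gradient of the mean cross-entropy
    theta -= lr * grad                      # descend on the empirical risk

loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

The loop is exactly the formula above: each step moves \(\theta\) against the gradient of the averaged per-example loss.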

Self-Supervised Learning

Construct supervision signals from the data itself, with no manual labeling:

  • Contrastive Learning: Pull positive pairs closer, push negative pairs apart (SimCLR/MoCo/CLIP)
  • Masked Prediction: Mask part of input, predict masked content (BERT/MAE)
  • Autoregressive Prediction: Predict next token based on context (GPT series)
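The contrastive flavor can be sketched with the InfoNCE loss used in SimCLR/CLIP-style training: embeddings of matched pairs sit on the diagonal of a similarity matrix and are pulled together by a cross-entropy term. The batch size, dimension, and temperature below are illustrative assumptions.

```python
import numpy as np

# Minimal InfoNCE sketch: row i of z1 and row i of z2 are a positive pair
# (e.g., two augmentations of the same image); all other rows are negatives.
rng = np.random.default_rng(0)
B, d = 4, 8
z1 = rng.normal(size=(B, d))
z2 = z1 + 0.05 * rng.normal(size=(B, d))        # positives = slight perturbations

z1 /= np.linalg.norm(z1, axis=1, keepdims=True)  # unit-normalize embeddings
z2 /= np.linalg.norm(z2, axis=1, keepdims=True)

tau = 0.1                                        # temperature (assumed value)
logits = z1 @ z2.T / tau                         # cosine similarity of every pair
# Cross-entropy with the diagonal as the target class:
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
```

A low loss here means each embedding is closest to its own positive, which is exactly the "pull positives closer, push negatives apart" objective described above.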

Generative Learning

Learn data distribution \(p(x)\) and generate new samples:

| Method | Principle | Representative Models |
|--------|-----------|----------------------|
| VAE | Variational inference, ELBO maximization | VAE, VQ-VAE |
| GAN | Generator vs. discriminator adversarial training | StyleGAN, BigGAN |
| Diffusion | Step-by-step denoising | DDPM, Stable Diffusion |
| Flow Matching | Learn a probability-flow ODE | Rectified Flow |
| Autoregressive | Token-by-token generation | GPT, DALL-E |
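To make the diffusion row concrete, here is a sketch of the DDPM-style forward (noising) process on toy 1-D data, using the closed form \(x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon\); the schedule values are illustrative assumptions, and a real model would learn the reverse (denoising) direction.

```python
import numpy as np

# DDPM-style forward process: data is gradually destroyed by Gaussian noise.
rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule (assumed)
alphas_bar = np.cumprod(1.0 - betas)     # cumulative signal-retention factor

x0 = rng.normal(loc=3.0, size=1000)      # toy 1-D "data" centered at 3

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1 - alphas_bar[t]) * eps

x_early = q_sample(x0, 10)      # early step: the data's mean (~3) survives
x_late = q_sample(x0, T - 1)    # final step: nearly pure standard Gaussian
```

Training a diffusion model amounts to learning to invert this corruption step by step, which is the "step-by-step denoising" principle in the table.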
