Foundation Models
Foundation models acquire general-purpose capabilities through large-scale pre-training and can then be adapted to a wide range of downstream tasks. This section covers the theoretical foundations of foundation models, the major modality-specific model families, and safety and alignment issues.
Contents:
- Introduction to Foundation Models — What are foundation models? Historical evolution, core characteristics, paradigm shifts
- Pre-training Paradigms — Self-supervised learning, masked modeling, contrastive learning, fine-tuning methods
- Representation Learning — What has been learned? Embedding spaces, semantic geometry, transfer learning
- Scaling & Architecture — Scaling laws, dense-to-sparse evolution, Mixture-of-Experts (MoE) deep dive
- LLM as Foundation — GPT series evolution, emergent abilities, RLHF, open-source ecosystem
- Vision Foundation Models — ViT, MAE, DINO, CLIP, SAM
- Multimodal LLMs — Projection/compression/unified architectures, LLaVA, GPT-4V, Gemini
- Generative Foundation Models — Diffusion, DiT, text-to-image/video/3D/audio
- Safety & Alignment — RLHF/DPO, hallucination, red teaming
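To give one concrete taste of a technique that recurs across the topics above, contrastive learning (listed under Pre-training Paradigms, and the training objective behind CLIP) can be sketched in a few lines. The sketch below is a minimal, illustrative NumPy version of a symmetric image-text contrastive loss; the function name and the temperature value are assumptions for illustration, not taken from any specific implementation.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Row i of image_emb and row i of text_emb are assumed to be a
    matching (positive) pair; all other rows act as negatives.
    """
    # L2-normalize so the dot product becomes cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Pairwise similarity logits, sharpened by the temperature
    logits = image_emb @ text_emb.T / temperature
    n = logits.shape[0]

    def cross_entropy_on_diagonal(l):
        # Softmax cross-entropy where the matching pair (diagonal) is the target
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions
    return (cross_entropy_on_diagonal(logits)
            + cross_entropy_on_diagonal(logits.T)) / 2
```

When the paired embeddings line up (high similarity on the diagonal), the loss approaches zero; when pairings are scrambled, it grows, which is what pushes matched image and text representations together in a shared embedding space.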