Large Language Models
From attention mechanisms to the Transformer architecture, and on to pre-trained models like BERT and GPT, this section traces the core technical evolution of large language models, closing with two Transformer variants from vision: ViT and DiT.
Contents:
- Traditional NLP — Word vectors, Word2Vec, language model fundamentals
- Attention Mechanism — Self-attention, multi-head attention, scaled dot-product attention (a minimal sketch follows this list)
- Transformer Architecture — Encoder-decoder, positional encoding, layer normalization
- BERT Architecture — Bidirectional encoder, masked language model, next sentence prediction (NSP)
- GPT Architecture — Autoregressive generation, causal attention, emergent abilities
- ViT Architecture — Image patching, Vision Transformer
- DiT Architecture — Diffusion Transformer, class-conditional generation
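Since several of these chapters build on the same core operation, here is a minimal PyTorch sketch of scaled dot-product attention. The function name, tensor shapes, and optional mask argument are illustrative assumptions, not code from the chapters themselves.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    # to keep softmax gradients stable at larger head dimensions.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Causal (GPT-style) masking: disallowed positions are set to
        # -inf so they receive zero attention weight after softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# Illustrative usage: one sequence of 4 tokens, one 8-dim head.
x = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # torch.Size([1, 4, 8])
```

Multi-head attention runs several of these in parallel over lower-dimensional projections of Q, K, and V; the causal mask shown is what separates GPT-style autoregressive decoding from BERT's bidirectional encoding.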