Large Language Models
From attention mechanisms to the Transformer architecture, and on to pre-trained models like BERT and GPT, this section traces the core technical evolution of large language models, closing with two Transformer variants from vision: ViT and DiT.
Contents:
- Traditional NLP — Word vectors, Word2Vec, language model fundamentals
- Attention Mechanism — Self-attention, multi-head attention, scaled dot-product attention (a minimal sketch follows this list)
- Transformer Architecture — Encoder-decoder, positional encoding, layer normalization
- BERT Architecture — Bidirectional encoder, masked language model, next sentence prediction (NSP)
- GPT Architecture — Autoregressive generation, causal attention, emergent abilities
- ViT Architecture — Image patching, Vision Transformer
- DiT Architecture — Diffusion Transformer, class-conditional generation
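Since several of these chapters build on the same core operation, here is a minimal PyTorch sketch of scaled dot-product attention. The function name, tensor shapes, and optional mask argument are illustrative assumptions, not code from the chapters themselves.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    # to keep softmax gradients stable at larger head dimensions.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Causal (GPT-style) masking: disallowed positions are set to
        # -inf so they receive zero attention weight after softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# Illustrative usage: one sequence of 4 tokens, one 8-dim head.
x = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # torch.Size([1, 4, 8])
```

Multi-head attention runs several of these in parallel over lower-dimensional projections of Q, K, and V; the causal mask shown is what separates GPT-style autoregressive decoding from BERT's bidirectional encoding.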