LLM Post-Training

This chapter explores post-training methods for large language models (LLMs) from a reinforcement learning (RL) perspective. Pre-training gives LLMs powerful language capabilities, but "predicting the next token" is not the same as "behaving as humans expect." Post-training addresses this gap, using RL and RL-variant methods to align LLM behavior with human preferences.