Policy Gradient Pipeline

Policy gradient methods are one of the core paradigms in modern reinforcement learning. From REINFORCE to PPO, all policy gradient algorithms share the same pipeline. This chapter systematically dissects every component of this pipeline from first principles.

Contents: