Classical Reinforcement Learning
Classical reinforcement learning studies how an agent learns an optimal policy through interaction with its environment; it forms the theoretical foundation of deep reinforcement learning.
Contents:
- Introduction to Classical RL — MDP framework, value functions, policies
- Multi-armed Bandits — Exploration vs. exploitation, UCB, Thompson sampling
- Finite MDP — Bellman equations, optimal policies
- Dynamic Programming — Policy iteration, value iteration
- Monte Carlo Methods — MC prediction, MC control, importance sampling
- TD(0) — Temporal difference learning, SARSA, Q-learning
- N-step TD — Multi-step bootstrapping, bias-variance tradeoff
- Learning & Planning — Dyna architecture, model learning
- Approximation Methods — Function approximation, linear methods
- TD(λ) — Eligibility traces, forward and backward views
- Policy Gradient — REINFORCE, baseline functions, Actor-Critic
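As a taste of what these chapters build toward, here is a minimal sketch of tabular Q-learning (covered under TD(0)) on a hypothetical 5-state chain MDP invented for illustration: the agent starts at state 0, moves left or right, and receives reward 1 for reaching state 4. The state/action layout, learning rate, and episode count are assumptions, not from any specific chapter.

```python
import random

# Hypothetical 5-state chain MDP: states 0..4, actions 0 (left) and 1 (right).
# Reaching state 4 yields reward 1 and ends the episode; all other steps give 0.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed hyperparameters

def step(state, action):
    """Environment dynamics: move one position left or right along the chain."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action] value table

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection: explore vs. exploit.
        if random.random() < EPSILON:
            action = random.randrange(2)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstrap from the greedy value of the next state.
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state

# The greedy policy should move right (action 1) in every non-terminal state.
policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(GOAL)]
print(policy)
```

The same scaffold extends naturally to the other tabular methods in the list: replacing the `max` in the update target with the value of the action actually taken gives SARSA, and averaging full-episode returns instead of bootstrapping gives Monte Carlo control.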