Policy-Based Reinforcement Learning Methods on the Walker2d Task
V1: a PPO demo built with Stable-Baselines3. From V2 onward, I will focus only on strategies that I design and implement entirely myself.
This is an ongoing project. You can find the V1 source code at:
GITHUB_LINK
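
For context, PPO (the algorithm behind the V1 demo) optimizes a clipped surrogate objective. The snippet below is a minimal pure-Python sketch of that objective. It is not the V1 code and not Stable-Baselines3's implementation: the function name `ppo_clip_objective`, its arguments, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    Hypothetical helper for illustration only. All three inputs are
    equal-length sequences of floats, one entry per sampled action.
    """
    if not advantages:
        raise ValueError("need at least one sample")

    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # One sample whose action became 1.5x more likely, with positive advantage:
    # the gain is capped by the clip range.
    print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
```

The clipping removes the incentive to push the new policy far from the one that collected the data in a single update, which is the main reason PPO tends to be stable on continuous-control tasks such as Walker2d.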