Reinforcement Learning · PPO · Actor-Critic
This is an ongoing project. V1 trains a PPO agent on the Walker2d MuJoCo environment using Stable-Baselines3. From V2 onward, the focus will shift to fully self-designed and self-implemented strategies.
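As a rough illustration of what a V1-style setup might look like, here is a minimal sketch using the standard Stable-Baselines3 `PPO` class. The environment version suffix (`Walker2d-v4`), the training budget, the seed, and the helper name `train_ppo` are all assumptions made for this example and are not taken from the project itself.

```python
# Minimal sketch of a V1-style training setup; the constants are illustrative assumptions.
ENV_ID = "Walker2d-v4"        # assumed version suffix; depends on the installed Gymnasium
TOTAL_TIMESTEPS = 1_000_000   # assumed training budget, not taken from the project


def train_ppo(env_id: str = ENV_ID, total_timesteps: int = TOTAL_TIMESTEPS, seed: int = 0):
    """Train a PPO agent with Stable-Baselines3 and return the trained model.

    Hypothetical helper for illustration only.
    """
    # Imported inside the function so this file can be loaded
    # even when the RL dependencies are not installed.
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make(env_id)
    # "MlpPolicy" is SB3's MLP-based actor-critic policy (policy and value networks).
    model = PPO("MlpPolicy", env, seed=seed, verbose=1)
    model.learn(total_timesteps=total_timesteps)
    return model
```

Calling `train_ppo()` and then `model.save("ppo_walker2d")` (a hypothetical file name) would run training and store the resulting policy. This requires `stable-baselines3`, `gymnasium`, and the MuJoCo bindings to be installed.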