A collection of clean, educational implementations of Policy Gradient reinforcement learning algorithms in PyTorch, all trained on the CartPole-v1 environment.
| File | Algorithm | Key Idea |
|---|---|---|
| `policy_gradient_vanilla.py` | Vanilla Policy Gradient | REINFORCE with a value-function baseline for variance reduction |
| `policy_gradient_ppo.py` | Proximal Policy Optimization (PPO) | Clipped surrogate objective to constrain policy updates |
| `policy_gradient_trpo.py` | Trust Region Policy Optimization (TRPO) | Hard KL constraint enforced via conjugate gradient and line search |
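
To make the "Key Idea" column more concrete, here is a minimal, framework-free sketch of the first two objectives. It is not taken from the scripts, which use PyTorch tensors and autograd. The function names, `gamma=0.99`, and `clip_eps=0.2` are illustrative assumptions, and TRPO is left out because its conjugate-gradient step does not fit in a few lines.

```python
import math


def rewards_to_go(rewards, gamma=0.99):
    """Discounted return G_t = r_t + gamma * G_{t+1} for each step of one episode.

    gamma=0.99 is an assumed default, not a value read from the scripts.
    """
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_baseline_loss(log_probs, returns, values):
    """REINFORCE with a baseline: -mean(log pi(a_t|s_t) * (G_t - V(s_t))).

    Subtracting the value estimate V(s_t) lowers the variance of the gradient.
    In an autograd framework the baseline would be detached from this loss.
    """
    advantages = [g - v for g, v in zip(returns, values)]
    weighted = [lp * adv for lp, adv in zip(log_probs, advantages)]
    return -sum(weighted) / len(weighted)


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate: -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)).

    r is the probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
    clip_eps=0.2 is an assumed default, not a value read from the scripts.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

The clipping only takes effect when it makes the objective more pessimistic: with a positive advantage the ratio stops contributing above `1 + clip_eps`, while with a negative advantage a large ratio is still penalized in full.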

Install the dependencies:

    pip install -r requirements.txt

Each script is standalone; run it directly:

    # Vanilla Policy Gradient
    python policy_gradient_vanilla.py

    # PPO
    python policy_gradient_ppo.py

    # TRPO
    python policy_gradient_trpo.py

Each script trains for 100 iterations and prints per-iteration stats to stdout.

Requirements:

- Python 3.8+
- PyTorch
- NumPy
- Gymnasium (or legacy `gym >= 0.26`)
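
Because either Gymnasium or legacy `gym` satisfies the last requirement, one way to support both is an import fallback. The sketch below is an assumption about how that could be done, not a description of what the scripts actually do; `load_gym` and `make_cartpole` are hypothetical helper names.

```python
import importlib


def load_gym():
    """Return the first importable Gym-compatible module, or None if neither is installed.

    Gymnasium is tried first; legacy gym is the fallback.
    """
    for name in ("gymnasium", "gym"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def make_cartpole(gym_module, seed=0):
    """Create CartPole-v1 and reset it using the (obs, info) API of gym >= 0.26 and Gymnasium."""
    env = gym_module.make("CartPole-v1")
    obs, info = env.reset(seed=seed)
    return env, obs
```

A script could call `load_gym()` once at startup and exit with a clear message if it returns `None`, rather than failing later with an `ImportError`.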