
ROZBEH/RL-Policy-Gradient


RL Policy Gradient Algorithms

A collection of clean, educational implementations of Policy Gradient reinforcement learning algorithms in PyTorch, all trained on the CartPole-v1 environment.

Algorithms

| File | Algorithm | Key idea |
|---|---|---|
| policy_gradient_vanilla.py | Vanilla Policy Gradient | REINFORCE with a value-function baseline for variance reduction |
| policy_gradient_ppo.py | Proximal Policy Optimization (PPO) | Clipped surrogate objective to constrain policy updates |
| policy_gradient_trpo.py | Trust Region Policy Optimization (TRPO) | Hard KL constraint enforced via conjugate gradient and line search |
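The baseline idea in the vanilla row can be sketched in a few lines of plain Python. This is a minimal illustration, not code taken from policy_gradient_vanilla.py: the helper names (discounted_returns, advantages, reinforce_loss) and the discount factor gamma are hypothetical, and plain lists stand in for the PyTorch tensors a real implementation would use. Subtracting a state-dependent baseline V(s_t) from the return G_t leaves the expected gradient unchanged while reducing its variance.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns: List[float], values: List[float]) -> List[float]:
    """Subtract the (hypothetical) value-function baseline V(s_t) from each return."""
    return [g - v for g, v in zip(returns, values)]


def reinforce_loss(log_probs: List[float], advs: List[float]) -> float:
    """Surrogate loss -mean(log pi(a_t|s_t) * A_t); its gradient is the REINFORCE estimator."""
    return -sum(lp * a for lp, a in zip(log_probs, advs)) / len(log_probs)
```

For example, rewards [1, 1, 1] with gamma = 0.5 give returns [1.75, 1.5, 1.0]. In a PyTorch version the advantages would typically be detached from the value network's graph before being multiplied by the log-probabilities, and the value network trained separately by regressing V(s_t) onto G_t.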
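The clipped surrogate objective from the PPO row can be written out per sample as below. Again this is a hedged sketch rather than the repository's code: the function name ppo_clip_objective and the clip range clip_eps = 0.2 are illustrative assumptions. The probability ratio r = exp(log pi_new - log pi_old) is clipped to [1 - eps, 1 + eps], and taking the minimum with the unclipped term removes any incentive to push r outside that range.

```python
import math
from typing import List


def ppo_clip_objective(new_log_probs: List[float], old_log_probs: List[float],
                       advs: List[float], clip_eps: float = 0.2) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch (to be maximized)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_log_probs, old_log_probs, advs):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: the clipped term only wins when it lowers the objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advs)
```

The function returns an objective to maximize; a gradient-descent optimizer would minimize its negative. With a ratio of 1.5 and a positive advantage the contribution is capped at 1.2 times the advantage, while with a ratio of 0.5 and a negative advantage the clipped value 0.8 applies.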
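The conjugate-gradient step and line search from the TRPO row can be illustrated with plain Python lists. This is a simplified sketch under stated assumptions, not the contents of policy_gradient_trpo.py: fvp is any function returning the Fisher-vector product F v (a real implementation would typically obtain it through autograd on the KL divergence instead of forming F), conjugate_gradient approximately solves F x = g, trpo_step rescales x so the quadratic KL estimate 0.5 * s^T F s equals max_kl, and backtracking_line_search shrinks the step until the surrogate improves and the KL limit holds. All names and defaults here are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(fvp: Callable[[Vector], Vector], g: Vector,
                       iters: int = 10, tol: float = 1e-10) -> Vector:
    """Approximately solve F x = g using only Fisher-vector products fvp(v) = F v."""
    x = [0.0] * len(g)
    r = list(g)  # residual g - F x, with x = 0 initially
    p = list(g)  # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        fp = fvp(p)
        alpha = rr / dot(p, fp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * fi for ri, fi in zip(r, fp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def trpo_step(fvp: Callable[[Vector], Vector], g: Vector, max_kl: float) -> Vector:
    """Scale the natural-gradient direction so that 0.5 * s^T F s == max_kl."""
    x = conjugate_gradient(fvp, g)
    xfx = dot(x, fvp(x))
    return [math.sqrt(2.0 * max_kl / xfx) * xi for xi in x]


def backtracking_line_search(params: Vector, full_step: Vector,
                             surrogate: Callable[[Vector], float],
                             kl: Callable[[Vector, Vector], float],
                             max_kl: float, max_backtracks: int = 10,
                             shrink: float = 0.5) -> Vector:
    """Return the first shrunken step that improves the surrogate within the KL limit."""
    base = surrogate(params)
    frac = 1.0
    for _ in range(max_backtracks):
        candidate = [p + frac * s for p, s in zip(params, full_step)]
        if surrogate(candidate) > base and kl(params, candidate) <= max_kl:
            return candidate
        frac *= shrink
    return list(params)  # no acceptable step found: keep the old parameters
```

As a toy check, an fvp that multiplies by the explicit matrix F = [[4, 1], [1, 3]] with g = [1, 2] makes conjugate_gradient return approximately [1/11, 7/11]; the explicit matrix is only a stand-in for the autograd-based product.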

Installation

pip install -r requirements.txt

Running

Each script is standalone; run it directly:

# Vanilla Policy Gradient
python policy_gradient_vanilla.py

# PPO
python policy_gradient_ppo.py

# TRPO
python policy_gradient_trpo.py

Each script trains for 100 iterations and prints per-iteration stats to stdout.

Requirements

  • Python 3.8+
  • PyTorch
  • NumPy
  • Gymnasium (or legacy gym >= 0.26)
