Andrea Baisero, Prakhar Patidar, Rishabh Shanbhag, Sagar Singh
This package depends on the rl_parsers and gym-pyro packages. Install them from the linked repositories first, then install the remaining dependencies from the requirements.txt file.
Reinforcement learning has found many applications in gaming, such as Atari and Super Mario games. Another emerging field in machine learning is causal modeling: building models that can explicitly represent and reason about cause and effect. Here we combine the two and study how an RL agent behaves when latent confounders are present. We introduce causality into RL, using interventions rather than plain conditioning on actions to estimate the true effect of our agent's actions in the presence of unobserved confounders.
The primary purpose of this project was to implement a softmax agent capable of solving the FrozenLake environment from OpenAI Gym, and then to generalize it to other environments. We also added an experimental analysis to observe the effect of confounding on the agent's actions within the general OpenAI Gym framework.
This project has three tracks.
In this track we tried to implement and use the softmax agent described in agentmodels to solve the FrozenLake environment, extended by FrozenLakeWrapper to implement reward shaping. Our various attempts at implementing the softmax agent resulted in different forms of planning-as-inference methods and softmax-like agents. The scripts control_as_inference.py, softmax_presample_policy.py, and softmax_recursive.py contain these implementations. They work on either the FrozenLake environment or any other PyroMDP environment, which is explained next.
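As a rough illustration of the softmax-like agents above, here is a minimal sketch of Boltzmann (softmax) action selection over estimated action values. The function names and the temperature parameter are our own choices for illustration, not taken from the scripts above.

```python
import math
import random

def softmax_policy(q_values, temperature=1.0):
    """Return action probabilities proportional to exp(Q / temperature)."""
    # Subtract the max Q-value before exponentiating, for numerical stability.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(q_values, temperature=1.0, rng=random):
    """Sample an action index from the softmax distribution over Q-values."""
    probs = softmax_policy(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Lower temperatures concentrate probability on the greedy action, while higher temperatures approach a uniform policy.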
We implemented OpenAI Gym environments for generic finite MDPs, POMDPs, and confounded MDPs, which run the environment dynamics described in the .mdp, .pomdp, and .cmdp formats as pyro-ppl probabilistic programs, i.e., with sampling sites for system states, rewards, observations, confounders, etc. As examples, we included our version of the gridworld.mdp environment, a standard RL toy problem, and circle.cmdp, a custom-made MDP with a confounding variable.
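To make the idea concrete, here is a schematic (not the actual gym-pyro code) of how tabular dynamics like those in a .mdp file can be run as a generative program: each step samples the next state from the transition distribution and emits a reward. The transition table and reward entries below are invented for illustration.

```python
import random

# Hypothetical two-state tabular MDP, in the spirit of a .mdp file:
# transitions[s][a] is a list of (next_state, probability) pairs.
transitions = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)],           1: [(1, 1.0)]},
}
# Reward for the (state, action, next_state) triple; default is 0.
rewards = {(0, 1, 1): 1.0}

def step(state, action, rng=random):
    """One generative step: sample the next state, then emit a reward."""
    next_states, probs = zip(*transitions[state][action])
    next_state = rng.choices(next_states, weights=probs, k=1)[0]
    reward = rewards.get((state, action, next_state), 0.0)
    return next_state, reward
```

In gym-pyro the same sampling steps are expressed as pyro-ppl sample sites, which is what makes conditioning and interventions on the dynamics possible.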
This contribution has been outsourced into its own standalone repository, gym-pyro. Read that package's documentation for more info.
We performed a preliminary analysis showing the difference between conditioning and intervening when performing inference in decision problems with unobserved confounders, i.e., confounded MDPs. Specifically, we show that the conditional and interventional expected returns differ, and that the conditional expectation over-estimates the expected return in circle.cmdp.
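The gap between the two quantities can be reproduced in a toy confounded bandit (the numbers below are illustrative, not taken from circle.cmdp): a hidden confounder drives both the behavior policy and the reward, so conditioning on the action inherits the confounder's bias, while the do-intervention does not.

```python
# Hypothetical confounded bandit: confounder U influences both the
# behavior policy P(A | U) and the reward, which depends only on U.
p_u = {0: 0.5, 1: 0.5}                 # P(U)
p_a_given_u = {0: {0: 0.9, 1: 0.1},    # P(A | U): the behavior policy
               1: {0: 0.1, 1: 0.9}}    # tends to follow the confounder

def reward(u, a):
    """Reward depends only on the confounder, not on the action."""
    return float(u)

def conditional_value(action):
    """E[R | A = action]: Bayes-invert to the confounder, then average."""
    p_ua = {u: p_u[u] * p_a_given_u[u][action] for u in p_u}
    z = sum(p_ua.values())
    return sum(p_ua[u] / z * reward(u, action) for u in p_u)

def interventional_value(action):
    """E[R | do(A = action)]: cut the U -> A edge, keep P(U) fixed."""
    return sum(p_u[u] * reward(u, action) for u in p_u)
```

Here conditioning on A = 1 makes U = 1 more likely a posteriori, inflating the apparent value of the action, whereas the intervention leaves the confounder's distribution untouched.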
- The video abstract
- The tutorial gives an overview of the project's major components.
- The slides of our final presentation.