Causal Reinforcement Learning

Authors

Andrea Baisero, Prakhar Patidar, Rishabh Shanbhag, Sagar Singh

Requirements

This package depends on the rl_parsers and gym-pyro packages. Install them from their linked repositories first, then install the remaining dependencies from the requirements.txt file.

Abstract

Reinforcement learning has found many applications in gaming, such as Atari and Super Mario games. Another emerging field in machine learning is causal modeling: building models that can explicitly represent and reason about cause and effect. Here we combine the two and study how an RL agent behaves when latent confounders are present. Hence, we introduce causality into RL, using causal inference over actions to correctly evaluate the agent's behaviour in the presence of unobserved confounders.

The primary purpose of this project was first to implement a softmax agent capable of solving the FrozenLake environment from OpenAI Gym, and then to generalize it to other environments. We also added an experimental analysis observing the effect of confounding on the agent's actions when solving a problem in the general OpenAI Gym framework.

How to explore this project

This project has 3 tracks.

The Softmax Agent and Planning as Inference with FrozenLake

In this track we implement the softmax agent described in agentmodels and use it to solve the FrozenLake environment, extended by a FrozenLakeWrapper that implements reward shaping. Our various attempts at implementing the softmax agent resulted in several forms of planning-as-inference methods and softmax-like agents.

The scripts control_as_inference.py, softmax_presample_policy.py and softmax_recursive.py contain these implementations. They work on either the FrozenLake environment or any other PyroMDP environment, which is explained next.
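As a concrete illustration of the idea (a minimal sketch, not the repository's actual code), a softmax agent draws actions from a Boltzmann distribution over Q-values instead of acting greedily. The toy chain MDP below is a hypothetical stand-in for FrozenLake:

```python
import math

# A tiny 4-state chain MDP standing in for FrozenLake (hypothetical
# dynamics): states 0..3, actions 0 = left, 1 = right; reaching the
# rightmost state yields reward 1.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.95

def step(s, a):
    """Deterministic next state and reward for the toy chain."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, 1.0 if s2 == N_STATES - 1 else 0.0

def q_values(iters=100):
    """Tabular value iteration to compute Q(s, a)."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(iters):
        v = [max(row) for row in q]          # state values under greedy policy
        new_q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                s2, r = step(s, a)
                new_q[s][a] = r + GAMMA * v[s2]
        q = new_q
    return q

def softmax_policy(q_row, beta=5.0):
    """Boltzmann (softmax) action distribution over one state's Q-values."""
    z = [math.exp(beta * q) for q in q_row]
    total = sum(z)
    return [p / total for p in z]

q = q_values()
pi0 = softmax_policy(q[0])  # action distribution in the start state
```

The inverse temperature beta controls how close the agent is to greedy: large beta concentrates the distribution on the argmax action, small beta approaches uniform random.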

Generalization to Other Environments

We implemented OpenAI Gym environments for generic finite MDPs, POMDPs, and confounded MDPs, which run the environment dynamics described in the .mdp, .pomdp, and .cmdp formats as pyro-ppl probabilistic programs, i.e., with sampling sites for system states, rewards, observations, confounders, etc. As examples, we include our version of gridworld.mdp, a standard RL toy problem, and circle.cmdp, a custom-made MDP with a confounding variable.

This contribution has been split out into its own standalone repository, gym-pyro. Read that package's documentation for more info.
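To make the interface concrete, here is a minimal sketch of a generic tabular MDP environment with a Gym-like reset/step interface. This is a hypothetical stand-in for gym-pyro's environments, which instead run the same dynamics as pyro-ppl sampling sites:

```python
import random

class TabularMDP:
    """Minimal finite MDP with a Gym-like interface (illustrative only)."""

    def __init__(self, transitions, rewards, start_state=0, seed=None):
        # transitions[s][a] -> list of (next_state, probability) pairs
        self.transitions = transitions
        # rewards[s][a] -> immediate reward for taking a in s
        self.rewards = rewards
        self.start_state = start_state
        self.rng = random.Random(seed)
        self.state = start_state

    def reset(self):
        self.state = self.start_state
        return self.state

    def step(self, action):
        outcomes = self.transitions[self.state][action]
        states, probs = zip(*outcomes)
        next_state = self.rng.choices(states, weights=probs)[0]
        reward = self.rewards[self.state][action]
        self.state = next_state
        return next_state, reward

# A 2-state example: action 1 in state 0 reaches state 1 w.p. 0.9.
T = {0: {0: [(0, 1.0)], 1: [(1, 0.9), (0, 0.1)]},
     1: {0: [(1, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 0.0}, 1: {0: 1.0, 1: 1.0}}

env = TabularMDP(T, R, seed=0)
s = env.reset()
s, r = env.step(1)
```

In gym-pyro the `step` sampling would be a pyro sample site, which is what lets inference algorithms condition on, or intervene in, the environment's dynamics.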

Preliminary Study on Confounding MDPs

We performed a preliminary analysis showing the difference between conditioning and intervening when performing inference in decision problems with unobserved confounders, i.e., confounded MDPs. Specifically, we show that the two produce different expected returns, and that the conditional expectation over-estimates the expected return in circle.cmdp.
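The gap between the two quantities can be shown by exhaustive enumeration on a toy confounded decision problem (hypothetical numbers, in the spirit of circle.cmdp but not taken from it): a binary confounder U influences both the behaviour policy's action A and the reward R, so observing A=a shifts the posterior over U, while do(A=a) leaves U at its prior.

```python
# Prior over the unobserved confounder U.
P_U = {0: 0.5, 1: 0.5}

# Behaviour policy P(A=a | U=u): the confounder leaks into the action.
P_A_given_U = {0: {0: 0.9, 1: 0.1},
               1: {0: 0.1, 1: 0.9}}

# Reward as a deterministic function of (U, A).
R = {(0, 0): 1.0, (0, 1): 0.0,
     (1, 0): 0.0, (1, 1): 0.2}

def conditional_value(a):
    """E[R | A=a]: observing A=a updates the belief over U via Bayes' rule."""
    joint = {u: P_U[u] * P_A_given_U[u][a] for u in P_U}
    z = sum(joint.values())
    return sum(joint[u] / z * R[(u, a)] for u in P_U)

def interventional_value(a):
    """E[R | do(A=a)]: the intervention cuts the U -> A edge, so U keeps its prior."""
    return sum(P_U[u] * R[(u, a)] for u in P_U)
```

With these numbers, conditioning on A=1 puts most posterior mass on U=1 (the confounder value under which A=1 is likely), yielding E[R | A=1] = 0.18, whereas E[R | do(A=1)] = 0.1: the conditional expectation over-estimates the return an intervening agent would actually obtain, mirroring the effect reported above for circle.cmdp.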
