Andrea Baisero, Prakhar Patidar, Rishabh Shanbhag, Sagar Singh
This package depends on the rl_parsers and gym-pyro packages. Install them from the linked repositories first, then install the remaining dependencies from the requirements.txt file.
Reinforcement learning has found many applications in gaming, such as Atari and Super Mario games. Another emerging field in machine learning is causal modeling: building models that can explicitly represent and reason about cause and effect. Here we combine the two and study how an RL agent behaves when latent confounders are present. We introduce causality into RL, using interventions rather than plain conditioning on actions to estimate the true effect of our agent's actions in the presence of unobserved confounders.
The primary purpose of this project was to implement a softmax agent capable of solving the FrozenLake environment from OpenAI Gym, and then to generalize it to other environments. We also added an experimental analysis to observe the effect of confounding on the agent's actions within the general OpenAI Gym framework.
This project has three tracks.
In this track we tried to implement and use the softmax agent described in agentmodels to solve the FrozenLake environment, extended by FrozenLakeWrapper to implement reward shaping. Our various attempts at implementing the softmax agent resulted in different forms of planning-as-inference methods and softmax-like agents. The scripts control_as_inference.py, softmax_presample_policy.py, and softmax_recursive.py contain these implementations. They work on either the FrozenLake environment or any other PyroMDP environment, which is explained next.
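As a rough illustration of the softmax-like agents above, here is a minimal sketch of Boltzmann (softmax) action selection over estimated action values. The function names and the temperature parameter are our own choices for illustration, not taken from the scripts above.

```python
import math
import random

def softmax_policy(q_values, temperature=1.0):
    """Return action probabilities proportional to exp(Q / temperature)."""
    # Subtract the max Q-value before exponentiating, for numerical stability.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(q_values, temperature=1.0, rng=random):
    """Sample an action index from the softmax distribution over Q-values."""
    probs = softmax_policy(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Lower temperatures concentrate probability on the greedy action, while higher temperatures approach a uniform policy.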
We implemented OpenAI Gym environments for generic finite MDPs, POMDPs, and confounded MDPs, which run the environment dynamics described in the .mdp, .pomdp, and .cmdp formats as pyro-ppl probabilistic programs, i.e., with sampling sites for system states, rewards, observations, confounders, etc. As examples, we included our version of the gridworld.mdp environment, a standard RL toy problem, and circle.cmdp, a custom-made MDP with a confounding variable.
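To make the idea concrete, here is a schematic (not the actual gym-pyro code) of how tabular dynamics like those in a .mdp file can be run as a generative program: each step samples the next state from the transition distribution and emits a reward. The transition table and reward entries below are invented for illustration.

```python
import random

# Hypothetical two-state tabular MDP, in the spirit of a .mdp file:
# transitions[s][a] is a list of (next_state, probability) pairs.
transitions = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)],           1: [(1, 1.0)]},
}
# Reward for the (state, action, next_state) triple; default is 0.
rewards = {(0, 1, 1): 1.0}

def step(state, action, rng=random):
    """One generative step: sample the next state, then emit a reward."""
    next_states, probs = zip(*transitions[state][action])
    next_state = rng.choices(next_states, weights=probs, k=1)[0]
    reward = rewards.get((state, action, next_state), 0.0)
    return next_state, reward
```

In gym-pyro the same sampling steps are expressed as pyro-ppl sample sites, which is what makes conditioning and interventions on the dynamics possible.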
This contribution has been outsourced into its own standalone repository, gym-pyro. Read that package's documentation for more info.
We performed a preliminary analysis showing the difference between conditioning and intervening when performing inference in decision problems with unobserved confounders, i.e., confounded MDPs. Specifically, we show that the conditional and interventional expected returns differ, and that the conditional expectation over-estimates the expected return in circle.cmdp.
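The gap between the two quantities can be reproduced in a toy confounded bandit (the numbers below are illustrative, not taken from circle.cmdp): a hidden confounder drives both the behavior policy and the reward, so conditioning on the action inherits the confounder's bias, while the do-intervention does not.

```python
# Hypothetical confounded bandit: confounder U influences both the
# behavior policy P(A | U) and the reward, which depends only on U.
p_u = {0: 0.5, 1: 0.5}                 # P(U)
p_a_given_u = {0: {0: 0.9, 1: 0.1},    # P(A | U): the behavior policy
               1: {0: 0.1, 1: 0.9}}    # tends to follow the confounder

def reward(u, a):
    """Reward depends only on the confounder, not on the action."""
    return float(u)

def conditional_value(action):
    """E[R | A = action]: Bayes-invert to the confounder, then average."""
    p_ua = {u: p_u[u] * p_a_given_u[u][action] for u in p_u}
    z = sum(p_ua.values())
    return sum(p_ua[u] / z * reward(u, action) for u in p_u)

def interventional_value(action):
    """E[R | do(A = action)]: cut the U -> A edge, keep P(U) fixed."""
    return sum(p_u[u] * reward(u, action) for u in p_u)
```

Here conditioning on A = 1 makes U = 1 more likely a posteriori, inflating the apparent value of the action, whereas the intervention leaves the confounder's distribution untouched.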
- The video abstract
- The tutorial gives an overview of the project's major components.
- The slides of our final presentation.