RL: Generic reinforcement learning codebase in TensorFlow

Large reinforcement learning (RL) research groups, such as DeepMind and OpenAI, have internal (private) reinforcement learning codebases, which enable quick prototyping and comparison of new ideas against many state-of-the-art (SOTA) methods. We argue that the five fundamental properties of a sophisticated research codebase are: modularity, reproducibility, many pre-implemented RL algorithms, speed, and ease of running on different hardware together with integration with visualization packages. To the authors’ knowledge, no existing RL codebase has all five properties, particularly TensorBoard logging and abstracting cloud hardware such as TPUs away from the user. The codebase aims to help distil the best research practices into the community, lower the barrier to entry and accelerate the pace of the field. More detailed documentation can be found here.


Related Work
Various reinforcement learning codebases are currently available, such as OpenAI baselines (Dhariwal et al., 2017), Stable baselines (Hill, 2019), Tensorforce (Schaarschmidt, Kuhnle, & Fricke, 2017), Ray rllib (Liang et al., 2017), Intel Coach (Caspi, Leibovich, & Novik, 2017), Keras-RL (Plappert, 2019), Dopamine baselines (Castro, Moitra, Gelada, Kumar, & Bellemare, 2018) and TF-Agents (Sergio Guadarrama, 2018). Ray rllib (Liang et al., 2017) is amongst the strongest of the existing RL frameworks, supporting distributed operations, TensorFlow (Abadi et al., 2016), PyTorch (Paszke et al., 2017) and multi-agent reinforcement learning (MARL). Unlike Ray rllib, we choose to focus on TensorFlow support, allowing us to integrate framework-specific visualisation and experiment tracking into our codebase. On top of this, we are developing a Kubernetes script for MacOS and Linux users to connect to any cloud computing platform, such as Google TPUs, Amazon AWS etc. Most other frameworks suffer from problems such as usability issues (difficult to get started with and to build upon), very little modularity in code (little or no hierarchy and code reuse), no asynchronous training support and weak support for TensorBoard logging. Our project addresses all of these problems: it is a generic codebase built for reinforcement learning (RL) research in TensorFlow (Schaarschmidt et al., 2017), with favoured RL agents pre-implemented as well as integration with OpenAI Gym (Brockman et al., 2016) environments, focusing on quick prototyping and visualisation.
Deep Reinforcement Learning
Reinforcement learning refers to a paradigm in artificial intelligence where an agent performs a sequence of actions in an environment to maximise rewards (Sutton & Barto, 1998). It is in many ways more general and challenging than supervised learning since it requires no labels to train on; instead, the agent interacts continuously with the environment, gathering more and more data and guiding its learning process.
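The interaction described above can be illustrated with a minimal sketch. Everything in it is hypothetical and is not taken from the codebase: CoinFlipEnv is a toy environment, CountingAgent is a simple epsilon-greedy learner, and step returns a simplified (observation, reward, done) triple. The point is only that the agent receives no labels; the reward it collects is its sole learning signal.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess the outcome of a biased coin."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # a single dummy observation

    def step(self, action):
        # The coin lands on 1 with probability 0.8; a correct guess pays 1.
        outcome = 1 if self.rng.random() < 0.8 else 0
        reward = 1.0 if action == outcome else 0.0
        self.steps += 1
        done = self.steps >= self.episode_length
        return 0, reward, done


class CountingAgent:
    """Hypothetical agent: tracks the mean reward per action, acts epsilon-greedily."""

    def __init__(self, num_actions=2, epsilon=0.1, seed=0):
        self.totals = [0.0] * num_actions
        self.counts = [0] * num_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, observation):
        # Explore at random until every action has been tried, and then
        # with probability epsilon; otherwise pick the best mean reward.
        if 0 in self.counts or self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        means = [t / c for t, c in zip(self.totals, self.counts)]
        return means.index(max(means))

    def observe(self, action, reward):
        self.totals[action] += reward
        self.counts[action] += 1


def run_episode(env, agent):
    """Run one episode and return the sum of rewards collected."""
    observation = env.reset()
    episode_return, done = 0.0, False
    while not done:
        action = agent.act(observation)
        observation, reward, done = env.step(action)
        agent.observe(action, reward)  # the data gathered guides learning
        episode_return += reward
    return episode_return
```

Calling run_episode(CoinFlipEnv(), CountingAgent()) repeatedly lets the agent accumulate experience across episodes, since its reward statistics persist between calls.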

Introduction: for-ai/rl
Further to the core ideas mentioned in the beginning, a good research codebase should enable good development practices, such as continually checkpointing the model's parameters and restoring them from the latest checkpoint whenever one is available. Moreover, it should be composed of simple, interchangeable building blocks, making it easy to understand and to prototype new research ideas.
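The checkpoint-and-restore practice can be sketched independently of any framework. The following is a minimal illustration using only the Python standard library; the function names, the ckpt-<step>.json naming scheme and the JSON format are assumptions made for this example and do not describe how the codebase stores its checkpoints.

```python
import json
import os
import re
import tempfile


def save_checkpoint(output_dir, step, params):
    """Write params to output_dir/ckpt-<step>.json (hypothetical naming scheme)."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"ckpt-{step}.json")
    # Write to a temporary file first, then rename, so that a crash
    # mid-write does not leave a truncated checkpoint behind.
    fd, tmp_path = tempfile.mkstemp(dir=output_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "params": params}, f)
    os.replace(tmp_path, path)
    return path


def restore_latest(output_dir):
    """Return (step, params) from the newest checkpoint, or None if none exists."""
    if not os.path.isdir(output_dir):
        return None
    steps = []
    for name in os.listdir(output_dir):
        match = re.fullmatch(r"ckpt-(\d+)\.json", name)
        if match:
            steps.append(int(match.group(1)))
    if not steps:
        return None
    latest = max(steps)  # compare numerically, not lexicographically
    with open(os.path.join(output_dir, f"ckpt-{latest}.json")) as f:
        state = json.load(f)
    return state["step"], state["params"]
```

A training loop would call restore_latest once at start-up, resume from the returned step if the result is not None, and call save_checkpoint at a fixed interval. In a TensorFlow project this bookkeeping would normally be delegated to the framework's own checkpoint utilities.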
We will first introduce the framework for this project, and then we will detail significant components. Lastly, we will discuss how to get started with training an agent under this framework. To accomplish this, we chose to modularise the codebase in the hierarchy shown below. In order to run an experiment, run:
python train.py --sys ... --hparams ... --output_dir ...
Ideally, "train.py" should never need to be modified for any of the typical single-agent environments. It covers logging of rewards, checkpointing, loading, rendering the environment, dealing with crashes and saving the experiment's hyperparameters, which takes a significant workload off the average reinforcement learning researcher.
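As an illustration of one of these responsibilities, saving an experiment's hyperparameters, the sketch below parses the three flags named in the command above and writes them next to the run's outputs. Only the flag names come from the text; their meanings, defaults, the run_config.json file name and both function names are assumptions made for this example, and the actual "train.py" may work differently.

```python
import argparse
import json
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical experiment launcher")
    # Flag names mirror the command quoted above; their meanings are assumed.
    parser.add_argument("--sys", default="local",
                        help="assumed: name of the system to run on")
    parser.add_argument("--hparams", required=True,
                        help="assumed: name of a hyperparameter set")
    parser.add_argument("--output_dir", required=True,
                        help="assumed: directory for logs and checkpoints")
    return parser.parse_args(argv)


def save_run_config(args, hparams):
    """Store the flags and the resolved hyperparameters in the output directory."""
    os.makedirs(args.output_dir, exist_ok=True)
    path = os.path.join(args.output_dir, "run_config.json")
    with open(path, "w") as f:
        json.dump({"flags": vars(args), "hparams": hparams}, f,
                  indent=2, sort_keys=True)
    return path
```

Keeping the configuration alongside the checkpoints and logs means that a run can be reproduced later from its output directory alone.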

Full Example
Before you run a full example, it would be to your benefit to install the following:
• Nvidia CUDA on machines with GPUs to enable faster training. Installation instructions here
• TensorBoard for training visualization. Install by running pip install tensorboard
This tutorial will make use of a Conda environment as the preferred package manager. Installation instructions can be found here.

Conclusion
We have outlined the benefits of using a highly modularised reinforcement learning codebase. The next stages of development for the RL codebase are implementing more SOTA model-free RL techniques (GAE, Rainbow, SAC, IMPALA), introducing model-based approaches, such as World Models (Ha & Schmidhuber, 2018), integrating with an open-source experiment management tool and expanding the codebase's compatibility with a broader range of environments, such as Habitat (Savva et al., 2019). We would also like to integrate automatic hyperparameter optimization techniques, such as the Bayesian Optimization method, which was crucial to the success of some of DeepMind's most notable reinforcement learning feats (Y. Chen et al., 2018).
Our modularisation enables simple and easy-to-read implementation of each component, such as the Agent, Algo and Environment classes, as shown below.
The project also includes simple random sampling and proportional prioritized experience replay approaches, support for Discrete and Box environments, and an option to render the environment replay and record it as a video. It further supports model-free asynchronous training, setting hyperparameters for your algorithm of choice, modularized action and gradient update functions, and an option to show your training logs in a TensorBoard summary.
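To make the difference between the two replay approaches concrete, the following is a minimal sketch of proportional prioritized experience replay, in which transition i is sampled with probability p_i^alpha / sum_k p_k^alpha. It is not the project's implementation: the class name, method names and defaults are hypothetical, sampling uses a linear scan rather than the sum-tree usually used for efficiency, and importance-sampling weights are omitted for brevity. Setting alpha to 0 recovers simple random sampling.

```python
import random


class ProportionalReplayBuffer:
    """Hypothetical minimal buffer with P(i) = p_i^alpha / sum_k p_k^alpha."""

    def __init__(self, capacity, alpha=0.6, seed=0):
        self.capacity = capacity
        self.alpha = alpha
        self.transitions = []
        self.priorities = []
        self.next_index = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions receive the current maximum priority so that
        # they are likely to be sampled before their error is known.
        priority = max(self.priorities, default=1.0)
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.priorities.append(priority)
        else:
            # Buffer is full: overwrite the oldest entry.
            self.transitions[self.next_index] = transition
            self.priorities[self.next_index] = priority
        self.next_index = (self.next_index + 1) % self.capacity

    def probabilities(self):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size):
        """Sample indices (with replacement) in proportion to priority."""
        indices = self.rng.choices(
            range(len(self.transitions)),
            weights=self.probabilities(),
            k=batch_size,
        )
        return indices, [self.transitions[i] for i in indices]

    def update_priorities(self, indices, td_errors, epsilon=1e-6):
        # Priority is the absolute TD error plus a small constant, so that
        # transitions with zero error can still be revisited.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + epsilon
```

After each gradient update, the learner would pass the sampled indices and their new temporal-difference errors to update_priorities, so that surprising transitions are replayed more often.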