graphenv: a Python library for reinforcement learning on graph search spaces


Summary
Many important and challenging problems in combinatorial optimization (CO) can be expressed as graph search problems, in which graph vertices represent full or partial solutions and edges represent the decisions that connect them. Graph structure not only introduces strong relational inductive biases for learning (Battaglia et al., 2018), in this context by providing a way to explicitly model the value of transitioning (along edges) between one search state (vertex) and the next, but also lends itself to problems both with and without clearly defined algebraic structure. For example, classic CO problems on graphs such as the Traveling Salesman Problem (TSP) can be expressed either as pure graph search or as integer programs. Other problems, such as molecular optimization, do not have concise algebraic formulations and yet are readily implemented as a graph search (V. et al., 2022; Zhou et al., 2019). Such "model-free" problems constitute a large fraction of modern reinforcement learning (RL) research, because it is often much easier to write a forward simulation that expresses all of the state transitions and rewards than to write down a precise mathematical expression of the full optimization problem. In molecular optimization, for example, one can use domain knowledge alongside existing software libraries to model the effect of adding a single bond or atom to an existing but incomplete molecule, and let the RL algorithm build a model of how good a given decision is by "experiencing" the simulated environment many times. In contrast, a model-based mathematical formulation that fully expresses all of the chemical and physical constraints is intractable.
In recent years, RL has emerged as an effective paradigm for optimizing searches over graphs and has led to state-of-the-art heuristics for games like Go and chess, as well as for classical CO problems such as the TSP. This combination of graph search and RL, while powerful, requires non-trivial software to execute, especially when combining advanced state representations such as graph neural networks (GNNs) with scalable RL algorithms.
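The graph search view described in this summary can be sketched in a few lines of plain Python. The example below is only an illustration under stated assumptions: the 4-city distance table, the `TourVertex` class, and the `greedy_rollout` helper are hypothetical names invented here and are not part of the graphenv API. Each vertex is a partial tour, each edge is the decision to visit one more city, and a greedy rule stands in for the policy that an RL agent would learn.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical 4-city instance; symmetric distances keyed by sorted city pair.
DISTANCES: Dict[Tuple[int, int], float] = {
    (0, 1): 2.0, (0, 2): 9.0, (0, 3): 10.0,
    (1, 2): 6.0, (1, 3): 4.0, (2, 3): 3.0,
}
NUM_CITIES = 4


def distance(a: int, b: int) -> float:
    """Look up the symmetric distance between two cities."""
    return DISTANCES[(min(a, b), max(a, b))]


@dataclass(frozen=True)
class TourVertex:
    """A search-graph vertex: the ordered cities visited so far (a partial tour)."""

    tour: Tuple[int, ...] = (0,)

    @property
    def is_terminal(self) -> bool:
        # A vertex is a full solution once every city has been visited.
        return len(self.tour) == NUM_CITIES

    def children(self) -> List["TourVertex"]:
        """Each outgoing edge is one decision: which unvisited city comes next."""
        if self.is_terminal:
            return []
        unvisited = [c for c in range(NUM_CITIES) if c not in self.tour]
        return [TourVertex(self.tour + (c,)) for c in unvisited]

    def length(self) -> float:
        """Length of the tour so far; a complete tour also returns to the start."""
        legs = list(zip(self.tour, self.tour[1:]))
        if self.is_terminal:
            legs.append((self.tour[-1], self.tour[0]))
        return sum(distance(a, b) for a, b in legs)


def greedy_rollout(root: TourVertex) -> TourVertex:
    """Follow the locally cheapest edge until a complete tour is reached.

    The greedy rule is a placeholder for a learned policy.
    """
    vertex = root
    while not vertex.is_terminal:
        vertex = min(vertex.children(), key=lambda v: v.length())
    return vertex


best = greedy_rollout(TourVertex())
print(best.tour, best.length())
```

In an RL setting, the `min(...)` step would be replaced by a policy that scores each child vertex, and the negative tour length at the terminal vertex would serve as the reward.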

Statement of need
The graphenv Python library is designed to 1) make graph search problems more readily expressible as RL problems via an extension of the OpenAI gym API (Brockman et al., 2016) while 2) enabling their solution via scalable learning algorithms in the popular RLlib library (Liang et al., 2018). The intended audience consists of researchers working on graph search problems that are amenable to a reinforcement learning formulation, broadly described as "learning to optimize". This includes those working on classical combinatorial optimization problems such as the TSP, as well as on problems that do not have a clear algebraic expression but whose environment dynamics can be simulated, for instance, molecular design.
RLlib supports several features that aid the application of RL to complex search problems (e.g., parametrically-defined actions and invalid action masking). However, native support for action spaces where the action choices change for each state is challenging to implement in a computationally efficient fashion. The graphenv library provides utility classes that simplify the flattening and masking of action observations for choosing from a set of successor states at every node in a graph search.
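To make the idea of flattening and masking concrete, the following is a minimal, standard-library-only sketch of one common approach. It is not graphenv's actual utility code: the constants `MAX_NUM_CHILDREN` and `OBS_SIZE`, the dictionary keys, and all function names are assumptions made for illustration. A variable-length list of successor observations is padded to a fixed shape, and a 0/1 mask ensures that the padding entries can never be selected.

```python
import math
from typing import Dict, List, Sequence

MAX_NUM_CHILDREN = 4  # assumed fixed upper bound on successors per state
OBS_SIZE = 2          # assumed length of each successor's feature vector


def flatten_successors(child_obs: Sequence[Sequence[float]]) -> Dict[str, list]:
    """Pad a variable-length list of successor observations to a fixed shape.

    Returns the padded observations plus a 0/1 mask marking the real entries.
    """
    if len(child_obs) > MAX_NUM_CHILDREN:
        raise ValueError("state has more successors than MAX_NUM_CHILDREN")
    num_padding = MAX_NUM_CHILDREN - len(child_obs)
    padded = [list(obs) for obs in child_obs]
    padded += [[0.0] * OBS_SIZE for _ in range(num_padding)]
    mask = [1] * len(child_obs) + [0] * num_padding
    return {"observations": padded, "action_mask": mask}


def masked_action_probabilities(
    scores: Sequence[float], mask: Sequence[int]
) -> List[float]:
    """Softmax over scores, with masked (padding) entries forced to zero."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    masked = [s if m else -math.inf for s, m in zip(scores, mask)]
    top = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


def toy_score(obs: Sequence[float]) -> float:
    """Stand-in for a learned policy network: here, just the feature sum."""
    return sum(obs)


# A state with two real successors; the remaining two slots are padding.
batch = flatten_successors([[0.5, 1.0], [2.0, 0.0]])
scores = [toy_score(obs) for obs in batch["observations"]]
probs = masked_action_probabilities(scores, batch["action_mask"])
print(batch["action_mask"], probs)
```

Because every state yields observations of the same shape, a single fixed-size action space can be declared up front, while the mask carries the per-state information about which choices actually exist.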
Related software efforts have addressed parts of the above need. OpenGraphGym (Zheng et al., 2020) implements RL-based strategies for common graph optimization challenges such as minimum vertex cover or maximum cut, but does not interface with external RL libraries and has minimal documentation. Ecole (Prouvost et al., 2020) provides an OpenAI gym-like environment for combinatorial optimization, but is intended to operate in concert with traditional mixed integer solvers rather than to expose the environment directly to an RL agent.

Examples of usage
This package is a generalization of methods employed in the optimization of molecular structure for energy storage applications, funded by the US Department of Energy (DOE)'s Advanced Research Projects Agency-Energy (V. et al., 2022). Specifically, this package enables optimization against a surrogate objective function based on high-throughput density functional theory calculations (St. John, Guan, Kim, Etz, et al., 2020) by treating molecule selection as an iterative process of adding atoms and bonds, which transforms the optimization into a rooted search over a directed acyclic graph. Ongoing work is leveraging this library to enable similar optimization for inorganic crystal structures, again using a surrogate objective function based on high-throughput quantum mechanical calculations (Pandey et al., 2021).
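The rooted, directed acyclic search graph mentioned above can be illustrated with a deliberately simplified toy. Nothing here reflects real chemistry or the cited surrogate models: a "molecule" is assumed to be just a multiset of atom symbols, and the alphabet, size limit, and scoring function are invented for the sketch. The toy shows one plausible reason such a graph is a DAG rather than a tree: adding the same atoms in a different order reaches the same state, so distinct paths from the root merge.

```python
from collections import deque
from typing import Dict, List, Tuple

ATOMS = ("C", "N", "O")  # toy alphabet (assumption)
MAX_ATOMS = 3            # toy size limit (assumption)

State = Tuple[str, ...]  # sorted tuple of atom symbols; () is the root


def successors(state: State) -> List[State]:
    """States reachable by adding one atom; sorting merges equivalent orders."""
    if len(state) >= MAX_ATOMS:
        return []
    return sorted({tuple(sorted(state + (atom,))) for atom in ATOMS})


def surrogate_score(state: State) -> float:
    """Made-up objective standing in for a trained surrogate model."""
    weights = {"C": 1.0, "N": 2.0, "O": 0.5}
    return sum(weights[a] for a in state) - 0.75 * state.count("N") ** 2


def enumerate_dag(root: State = ()) -> Dict[State, List[State]]:
    """Breadth-first expansion from the root, visiting each state once."""
    edges: Dict[State, List[State]] = {}
    queue = deque([root])
    while queue:
        state = queue.popleft()
        if state in edges:
            continue  # already expanded via another path
        edges[state] = successors(state)
        queue.extend(edges[state])
    return edges


dag = enumerate_dag()
terminals = [state for state, kids in dag.items() if not kids]
best = max(terminals, key=surrogate_score)
print(len(dag), len(terminals), best, surrogate_score(best))
```

Exhaustive enumeration is only feasible because the toy is tiny. For realistic molecular spaces the graph is far too large to expand fully, which is where a learned policy that chooses among successor states becomes useful.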