Ethical Smart Grid: a Gym environment for learning ethical behaviours

The ethical-smart-grid package is a reinforcement learning (RL) simulator of a smart grid, based on Gym (Brockman


Statement of need
The field of Machine Ethics has recently received numerous contributions that try to implement so-called ethical behaviours (Tolmeijer et al., 2020), which target various domains and use cases. For example, Anderson et al. (2019) propose a "value-driven robot" instantiated on the eldercare domain, taking into account several ethical considerations, such as respecting the patient's autonomy or maximizing their well-being. Although the decision-making algorithms of such approaches are thoroughly defined, the environment in which they are tested is most often not available. Thus, Machine Ethics researchers cannot compare different contributions on a common environment, e.g., to observe their respective effects on the exhibited behaviours.
ethical-smart-grid is an open-source environment including ethical considerations, which we propose to the community as a first step. In addition, this package may facilitate new contributions to the Machine Ethics community by providing a ready-to-use environment, so that researchers can focus on the decision-making algorithms themselves rather than building both the algorithm and the application environment.

A Smart Grid simulator
The simulator is composed of multiple prosumer (producer and consumer) agents that interact in a shared smart grid by consuming and exchanging energy. It follows the standard RL interaction loop (Sutton & Barto, 2018): agents receive observations, and take actions that update the environment's state (see Figure 1). A noteworthy aspect of this simulator is that both observations and actions are continuous and multi-dimensional, i.e., observations are represented as vectors in ℝ¹¹ and actions as vectors in ℝ⁶. Some observations are shared by all agents whereas others are individual: other agents cannot access them. This is a design choice targeting the privacy of users who may be represented by such agents, if this simulator were deployed in the real world.
Four moral values taken from the literature (Boijmans, 2019; De Wildt et al., 2019; Milchram et al., 2018) are targeted in this environment: security of supply, affordability, inclusiveness, and environmental sustainability. Depending on the reward function used in simulations, these moral values can be targeted individually or several at a time, and may conflict at some time steps, making this simulator a suitable environment for learning ethics.
In order to propose a simple, yet extensible, simulator, we follow the Gym (now Gymnasium) library standard (Brockman et al., 2016). This standard is well-known in the Reinforcement Learning community, which makes our simulator easily compatible with many existing decision-making algorithms that have an implementation available for Gym environments.
However, the simulator slightly differs from Gym by accepting multiple agents instead of a single one. The main step function thus takes a list of actions, and returns a list of observations and a list of rewards. This modification should be compatible with existing multi-agent algorithms; we nonetheless explain in the documentation how to adapt custom models to our simulator.
Multiple scenarios can be used to parametrize the simulator, with several parameters such as the number of agents, their types, the quantity of energy available, and so on. Several components can be extended, and new scenarios can be implemented (see Open to extensions).
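The split between shared and individual observations can be pictured with a small sketch. Everything below is hypothetical and only illustrates the idea: the AgentView class, the build_views function, and in particular the 8 + 3 split of the 11 dimensions are assumptions made for the example, not taken from the package.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed split of the 11 observation dimensions (illustration only).
N_SHARED = 8
N_INDIVIDUAL = 3


@dataclass(frozen=True)
class AgentView:
    """What a single agent is allowed to see (hypothetical structure)."""
    shared: Tuple[float, ...]      # identical for every agent
    individual: Tuple[float, ...]  # private to this agent

    def as_vector(self) -> List[float]:
        # Concatenate both parts into one flat observation vector.
        return list(self.shared) + list(self.individual)


def build_views(shared: Sequence[float],
                individual_per_agent: Sequence[Sequence[float]]) -> List[AgentView]:
    """Give each agent the shared part plus only its own individual part."""
    if len(shared) != N_SHARED:
        raise ValueError("unexpected number of shared observations")
    views = []
    for own in individual_per_agent:
        if len(own) != N_INDIVIDUAL:
            raise ValueError("unexpected number of individual observations")
        views.append(AgentView(tuple(shared), tuple(own)))
    return views


views = build_views(
    shared=[0.5] * N_SHARED,
    individual_per_agent=[[0.1, 0.2, 0.3], [0.7, 0.8, 0.9]],
)
```

In this sketch, no agent's view contains another agent's individual values, which is the privacy property described above.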
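The list-in, list-out convention described above can be sketched with a toy stand-in environment. This is a minimal illustration under stated assumptions, not the package's implementation: ToyMultiAgentEnv, random_policy, run_episode, and the placeholder reward are all hypothetical, and the return value of step is simplified to observations and rewards only.

```python
import random
from typing import List, Tuple

OBS_DIM = 11  # observation vector size stated in the text
ACT_DIM = 6   # action vector size stated in the text


class ToyMultiAgentEnv:
    """Hypothetical stand-in, NOT the ethical-smart-grid API."""

    def __init__(self, n_agents: int, seed: int = 0):
        self.n_agents = n_agents
        self.rng = random.Random(seed)

    def _observe(self) -> List[List[float]]:
        # One continuous observation vector per agent.
        return [[self.rng.random() for _ in range(OBS_DIM)]
                for _ in range(self.n_agents)]

    def reset(self) -> List[List[float]]:
        return self._observe()

    def step(self, actions: List[List[float]]) -> Tuple[List[List[float]], List[float]]:
        # The step function takes a list with one action per agent...
        if len(actions) != self.n_agents:
            raise ValueError("expected one action per agent")
        if any(len(action) != ACT_DIM for action in actions):
            raise ValueError("each action must have ACT_DIM components")
        # ...and returns one observation and one reward per agent.
        # Placeholder reward: mean of the action components.
        rewards = [sum(action) / ACT_DIM for action in actions]
        return self._observe(), rewards


def random_policy(observation: List[float], rng: random.Random) -> List[float]:
    """Hypothetical policy that ignores the observation and acts randomly."""
    return [rng.random() for _ in range(ACT_DIM)]


def run_episode(n_agents: int = 3, n_steps: int = 5, seed: int = 0) -> List[List[float]]:
    env = ToyMultiAgentEnv(n_agents, seed)
    rng = random.Random(seed + 1)
    observations = env.reset()
    history = []
    for _ in range(n_steps):
        # Each agent decides from its own observation only.
        actions = [random_policy(obs, rng) for obs in observations]
        observations, rewards = env.step(actions)
        history.append(rewards)
    return history
```

A single-agent algorithm could be adapted to such a loop by instantiating one learner per agent and indexing the lists by agent, which is one possible reading of the adaptation mentioned above.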

Using the simulator
This package can be used through the standard Gym interaction loop, starting from a basic environment created with the make_basic_smartgrid() function.
More complex scenarios can be created to fully customize the simulator by configuring existing components instead of using make_basic_smartgrid(); the documentation provides a tutorial on this subject.

Open to extensions
Although this package is completely usable as-is, it was designed to be open to extensions, in particular by third-party researchers. This is particularly important within the field of Machine Ethics, in which ethical considerations are not always the same between different groups or cultural contexts. To allow other members of the community to bring their own moral values, ethical considerations, or even data sets for realistic simulations, we explain in the documentation how to extend several aspects, including:
• Agents' profiles, which determine a few common characteristics of agents, such as their needs for each time step, or the quantity of energy they produce. These characteristics can be based on external datasets for a realistic simulation, e.g., using real data corresponding to existing buildings.
• The environment's conditions, such as the quantity of energy available at each time step. This can be used to simulate, for example, a scenario of scarcity, or on the contrary, one of abundance. More complex scenarios can also be created, e.g., where agents first experience abundance and then have to adapt to scarcity.
• Reward functions that determine the rewards received by agents, i.e., the degree of "correctness" of their actions, with respect to the moral values represented by the function. Several moral values are already available, and other ones can be implemented through new reward functions. These functions can focus on one or several objectives, and potentially be aggregated to cope with single-objective decision-making algorithms.
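To make the idea of aggregating several objectives concrete, here is a minimal sketch of two possible aggregations over per-value rewards. The function names (weighted_average, worst_case), the dictionary keys, and the numeric values are assumptions for illustration; they are not the package's reward functions or its aggregation API.

```python
from typing import Dict, Optional


def weighted_average(rewards: Dict[str, float],
                     weights: Optional[Dict[str, float]] = None) -> float:
    """Collapse per-value rewards into one scalar (hypothetical helper)."""
    if not rewards:
        raise ValueError("at least one reward is required")
    if weights is None:
        # Default: every moral value counts equally.
        weights = {name: 1.0 for name in rewards}
    total = sum(weights[name] for name in rewards)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * value for name, value in rewards.items()) / total


def worst_case(rewards: Dict[str, float]) -> float:
    """Alternative aggregation: the least-satisfied value dominates."""
    if not rewards:
        raise ValueError("at least one reward is required")
    return min(rewards.values())


# Made-up rewards of one agent at one time step, one entry per moral value.
example_rewards = {
    "security_of_supply": 0.9,
    "affordability": 0.5,
    "inclusiveness": 0.7,
    "environmental_sustainability": 0.3,
}

equal_mix = weighted_average(example_rewards)
affordability_first = weighted_average(
    example_rewards,
    weights={
        "security_of_supply": 1.0,
        "affordability": 3.0,
        "inclusiveness": 1.0,
        "environmental_sustainability": 1.0,
    },
)
pessimistic = worst_case(example_rewards)
```

The choice of aggregation is itself an ethical design decision: an average lets a well-satisfied value compensate for a neglected one, whereas a worst-case rule does not.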

Mentions
A previous (closed-source) version was used by one of the authors (Rémy Chaput) during his PhD studies to support experiments on several learning algorithms. However, that version was not easily re-usable by the community; this open-source version is a port that stays as close as possible to the old one, but with an emphasis on extensibility and re-usability.