Towards a Scalable and Flexible Simulation and Testing Environment Toolbox for Intelligent Microgrid Control

Micro- and smart grids (MSG) play an important role both for integrating renewable energy sources in conventional electricity grids and for providing power supply in remote areas. Modern MSGs are largely driven by power electronic converters due to their high efficiency and flexibility. Nevertheless, controlling MSGs is a challenging task due to highest requirements on energy availability, safety and voltage quality within a wide range of different MSG topologies. This results in a high demand for comprehensive testing of new control concepts during their development phase and comparisons with the state of the art in order to ensure their feasibility. This applies in particular to data-driven control approaches from the field of reinforcement learning (RL), whose stability and operating behavior can hardly be evaluated a priori. Therefore, the OpenModelica Microgrid Gym (OMG) package, an open-source software toolbox for the simulation and control optimization of MSGs, is proposed. It is capable of modeling and simulating arbitrary MSG topologies and offers a Python-based interface for plug \&play controller testing. In particular, the standardized OpenAI Gym interface allows for easy RL-based controller integration. Besides the presentation of the OMG toolbox, application examples are highlighted including safe Bayesian optimization for low-level controller tuning.


I. INTRODUCTION
T HE transition of conventional energy supply systems based on fossil fuels to a sustainable structure characterized by renewable energies is a central technical and social challenge of the 21st century [1]. To achieve this, the inherent volatility of renewable energy sources requires a shift away from conventional, centralized structured topdown energy networks towards flexible, cross-sectoral and intelligent energy systems [2]. Therefore, in the course of the energy transition, micro-and smart grids (MSG) represent an important solution component to ensure a clean, efficient and cost-effective energy supply [3] [4]. MSG is the concept of a local network consisting of distributed energy resources (e.g. wind power), energy storage units (e.g. battery) and consumers in various sectors (e.g. electricity, heat, mobility) [5]. The local integration of renewable energies by means of The authors would like to thank Jarren Lange from Paderborn University for providing much MSG controller help and Andreas Heuermann from Linköping University for support on OpenModelica and FMU integration issues.
MSGs, e.g. within industrial companies or residential areas, relieves energy transmission grids and, thus, reduces the need for cost-and resource-intensive grid expansion. Moreover, MSGs can provide energy supply for remote areas without connection to a public distribution grid. In this context, power electronic converters became a central component of modern MSG approaches due to their very high energy conversion efficiency and flexibility in order to directly control the power flow between different MSG components [6].
MSGs are highly heterogeneous, complex systems coming with many different topologies depending on their purpose of application [7] [8]. Moreover, their operation contains a significant stochastic component, which is caused by the uncertainty of both the load demand and the regenerative feed-in as well as topology changes due to the insertion or removal of components during operation. Furthermore, some MSGs may use hybrid AC/DC subgrids in order to boost energy efficiency by reducing the number of required energy conversion stages. Consequently, controlling MSGs is a demanding task that comes with several key requirements, which can be summarized as follows: • Safety: the continuous availability of energy is of prime importance. Outages or component failures due to control errors (e.g. by overloading) are unacceptable. • Adaptivity: due to the wide range of MSG use cases, a high degree of control flexibility in a plug & play sense is necessary. • Resource optimality: minimizing both energy losses and operation costs utilizing available control degrees of freedom is highly desirable. • Power quality: providing energy supply at high power quality level is important for ensuring nominal functionality at load side.
In order to pursue these objectives, MSG control is typically addressed by hierarchical approaches on different time scales including [9] [10]: • Inner level: current (and voltage) control in the micro-to millisecond range including auxiliaries such as protective measures or phase-locked loops for each inverter. • Primary level: (re-)active power balancing between different inverters in the (sub-)second range for voltage and/or frequency control.

A. Contribution
We present the OpenModelica Microgrid Gym (OMG) package, an open-source software toolbox for the simulation and control optimization of MSGs based on energy conversion by power electronic converters [20]. The main contributions and features of the OMG toolbox are: • Flexible and scalable simulations of arbitrary MSG topologies using OpenModelica [21] backend; • Python-interface for easy access, configuration and evaluation of arbitrary controllers; • OpenAI gym [22] interface for training reinforcement learning agents or similar data-driven approaches; • Single and three phase configurations with AC or DC power supply; • Time domain resolution in the micro-and millisecond range targeting inner and primary level control; • Fully open-source and collaborative project under GNU GPLv3 license. The toolbox is under active development and currently focusing on simulation durations in the second range targeting the inner and primary control level. Extensions to simplified and lightweight model frameworks for extended simulation horizons are planed.
Selected background information on implementation of the OMG toolbox are presented in the following and more details can be found in the user guide and API documents [20]. Additionally, we present a use case of applying safe Bayesian controller optimization [23] to highlight the challenges in the application of data-driven control algorithms with regard to the aforementioned MSG requirements. In particular, we will deal with the topic of safety in the context of data-driven controller design.

B. Related Work
In the domain of power system simulations, the following software toolboxes are often mentioned: • MATPOWER [24], an open-source, Matlab-based project targeting static power flow simulation and optimization on distribution grid level. Since dynamic modeling is completely omitted, the focus is on secondary and tertiary control level assuming simplified quasi-stationary operation of all components. An OpenAI gym interface could be easily added, but is not available yet. • Pandapower [25], an open-source, Python-based project targeting static power flow simulation and optimization on distribution grid level. It. has a similar scope and functionality as MATPOWER. • PyPSA [26], an open-source, Python-based project targeting static power flow simulation and optimization on distributed grid level. It has a similar scope and functionality as the aforementioned packages. • PSAT [27], an open-source, Matlab-based project for simplified single line general power system simulation including optimized scheduling. Public user guide or code documentation is not available. It comes with a limited, fixed number of pre-defined primary level controllers. An external interfacing for other controller types or reinforcement learning is not provided. Due to the lack both of interfaces and especially dynamic simulation possibilities, the mentioned packages cannot be considered for the control engineering treatment of MSGs on inner and primary control level. Besides the above mentioned open-source solutions, there is also a range of commercial software with similar functionality focusing on static grid simulations, which is not reported in detail here. Furthermore, there is a variety of energy market-oriented packages (opensource and commercial) available (e.g. [28]), but since this work is focusing on technical control-oriented problems, these are not discussed here. In the field of dynamic grid and power electronic simulations, the following software packages have to be mentioned: • Simscape [29] is a commercial Matlab/Simulink extension offered by Mathworks. It enables a wide range of physics-oriented, dynamic modeling applications including power systems and in particular power electronics. Its functionality and scope is similar to the OMG toolbox, but closed-source and interfacing to non-Matlab software products comes with significant calculation overhead (c.f. Matlab engine API for Python [30]). • SPICE-related software such as LTspice, PLECS or ngspice focus on integrated circuit simulations often with nanosecond range time steps. Therefore, it is suitable to accurately simulate single power electronic converters on small simulation durations, but computationally not feasible for MSGs with multiple power units. Therefore, the OMG toolbox is currently the only available open-source solution for dynamic MSG simulations on small time scales. Due to the offered interfaces, it is particularly suitable for control development and testing, including training and evaluation of recent reinforcement learning techniques.

A. Toolbox Structure
The overall structure of the software package is inspired by the tensorforce library [31]. OMG contains wrappers for OpenAI Gym environments as well as fully implemented controlling agents. One of the main contributions of this toolbox is an OpenAI Gym instance in which a reinforcement learning agent can be trained and tested. However, in order to ease the use of this library, we also provide some predefined agents that can control the environment and also a service class that handles the execution of the specified agent on its environment. This reduces the boilerplate code significantly and allows developers and engineers to focus on designing and testing controllers. The runner class will take care of initializing and termination of agents and environments, as well as the execution of multiple episodes. The class will handle all information exchange between the agent and the environment, as shown in Fig. 1. The functionality is particularly handy, as the training of an agent usually spans multiple training epochs.
The agent class encapsulates all state related to the learning process. For example, it may contain a base controller such as a linear feedback controller that will be parametrized by external agents during learning and provide the control actions that the agent plays out on the environment (cf. Sec. III). This example corresponds to a hybrid approach mixing expertdriven and data-driven control. Nevertheless, the definition of the OMG interfaces is completely open, allowing the toolbox to be connected to a wide range of solutions, between entirely data-driven and entirely expert-driven. The agent also provides possibilities to record monitoring data of the learning process.
The highly configurable environment class provides an interface from OpenAI Gym to the internal simulation model. It will record data for monitoring and visualizing each epoch, as well as analyzing the control performance in more depth.

B. Modelica Integration
The overall integration of the OMG toolbox is shown in Fig. 2. For the model transfer from OpenModelica to Python, the functional mock-up interface (FMI) [32] is used. FMI is a tool-independent open source standard for the exchange of dynamic models. According to this standard, the model-including objects created for the exchange are called functional mock-up units (FMU), which support two flavours of simulation types: co-simulation (CS) and model exchange (ME). In CS, the numerical solver is embedded and supplied by the exporting tool, in this case OpenModelica. On the other side, in ME, the importing tool supplies the solver, while the FMU only provides the differential equation system.
Since the explicit Euler forward is the only solver implemented for CS in OpenModelica so far, ME is the preferred choice and the default one in OMG. To avoid small simulation step sizes in order to ensure numerical stability, implicit solvers are required. Here, the Python package SciPy [33] provides several implicit solvers and is used as the standard solution within the OMG toolbox.
The model is imported via PyFMI [34], which provides a wide range of communication methods between the model in the FMU and the Python interface. After extracting the initial states and the equation system, the controllers define the actions (i.e. control input) for the following step. Next, the equation system gets solved, and the states are transferred to the system interface.
This step-by-step approach increases the simulation time due to an overhead, which is not required by other simulation types, but it provides full access to any result and parameter at any time of the simulation, which is essential for some advanced techniques such as reinforcement learning (RL).

C. Microgrid Modelica Library
Together with the OMG Python package, an OpenModelica library to create customized MSG topologies is provided. It mainly consists of freely linkable inverters, filters and loads. The library is sketched in Fig. 3 together with an example network.
In this example, a DC bus, which can be adjusted via Python, supplies each inverter. The inverters can be connected via filters (e.g. LC or LCL). Moreover, a wide range of different load nodes are pre-defined in the library, which can be also be extended by the user. The filters and loads can be freely parametrized, either directly in the OpenModelica model or via PyFMI. In addition, the toolbox provides auxiliary components (such as phase-locked loops) and pre-defined voltage and current forming inverters, the latter with direct and indirect droop controllers.
Complex loss models for each filter are included. Due to their large impact on the complexity of the system and the resulting increase of simulation time, it is recommended to use the loss models only if the efficiency and loss behavior is of particular interest. Switching losses in the inverters are not implemented yet, but can be easily integrated on the Python level of the OMG toolbox.
The controller behavior at load steps is a very important point when addressing MSGs. Due to the lack of structure variability of equation systems in the Modelica language, it is currently not possible to fully add and delete load nodes during a simulation (step-like system topology changes). With further progress in the active research area of multimode modeling, this kind of switching behavior will be implemented in the future. In its current version, OpenModelica provides switches only in the form of variable resistors having high resistance for an open switch. Therefore, even open switches are conductive and the high resistance increases the stiffness of the ordinary differencial equation (ODE) system, resulting in numerical difficulties of the simulation.
As a workaround, load steps are provided through parameter-variation of the load nodes. These parameters can   III. USE CASE: SAFE BAYESIAN OPTIMIZATION FOR LOW-LEVEL INVERTER CONTROLLER TUNING In the following, the previously introduced OMG toolbox is applied to an exemplary automatic low-level inverter controller tuning process by safe Bayesian optimization to highlight its usage in MSG control scenarios. To this end, we will briefly summarize the fundamentals of inner level inverter current and voltage control in the MSG context, before linking this with a data-driven controller optimization.

A. Rotation Reference Frame -A Short introduction
MSGs can be described by an ODE system with state variables x like currents and voltages, and control inputs u such as the inverter duty cycle: (1) Applying for example an ohmic-inductive load (R a , L a ) with a sinusodial voltage v a in a (simplified) one-phase grid, the differential equation for controlling the current i a is Extending this for a three-phase system, the state-space model can be described as follows: The current can be represented as a vector in a fixed abc reference frame. This vector is rotating with the frequency of the sinusoidal supply voltage. Using the Park transformation, the system variables can be mapped into a rotating reference frame. Here, the d-axis is aligned with the a-axis of the rotating three-phase system, the q-axis is orthogonal to the d-axis and the third is the zero component: If the angular speed of the rotating frame is set equal to the grid frequency, the sinusoidal grid voltages and currents become stationary DC-variables. This simplifies the control design a lot, especially if linear feedback control such as PI controllers is applied. For more information on the basics of power electronic control we refer to [35] and similar textbooks.
For the remaining part of the paper, the simulation of the physical grid system is executed in the abc frame within OpenModelica, while the dq frame is utilized within the Python-based control parts.

B. Problem Statement
In this use case an inverter with an LC filter is supplying an ohmic-inductive load. As shown in Fig. 4, standard PI controllers are used to control the currents through the filter inductors in all phases. It is assumed that the filter and load parameters are not exactly known and, consequently, the optimal PI controller gains {K i , K p } cannot be calculated beforehand with respect to a given control performance metric. This assumption can be motivated both by parameter variations due to manufacturing uncertainties [36] and by missing system knowledge from the control point of view, e.g., when the MSG system topology is changed during operation (plug & play component insertion or removal) [37] [38]. Consequently, the controller parameters need to be automatically tuned online during regular system operation. As an important constraint, this tuning has to be performed in a safe way, such that unsuitable gain parameters leading to severe overshoots or other unsafe system behavior are prohibited at all time.
The exemplary use case is to supply the load node by a 15 A peak current in all three-phases at a grid frequency of 50 Hz with zero phase margin. The inverter is fed by an idealized DC source of 1000 V . The control is done in the dq-frame, hence, the setpoints for the current controllers are i * dq0 = [15 A, 0, 0] T , while for all subsequent training episodes a blackstart is assumed (i.e. all voltages and currents are initially zero). To keep this example simple and clearly arranged, superimposed control loops (such as an additional voltage loop) are not considered, nevertheless, they can be directly integrated into the example files provided in the source code of [20].
For all following investigations, the filter capacity is set as C filt = 20 µF, the inductance to L filt = 2 mH, the resistance, the inductance of the load R load = 20 Ω and L load = 1 mH, respectively. These parameters are used for the simulation, but are unknown to the controller tuning processing.

General control SafeOpt
Python-based control Fig. 4: Control plant in OpenModelica in abc reference frame and the Python-based control part in dq reference frame

Control plant in OpenModelica
The agent consists of a controller and an optimization part. In the control part, the observations are used to calculate the modulation indices for the inverter using the PI controllers. This is done in every step. The stepsize is set to ∆ t = 50 µs and the experiment runs for N = 300 steps. As shown in Fig. 4, the optimization part uses safe Bayesian optimization to calculate new controller parameters after each episode.

C. Solution Approach
The task of the previously described agent is to find optimal controller parameters during online operation. To avoid failures during controller optimization, an extension of Bayesian optimization [39], called SafeOpt, has been proposed [23].
In this work, the term safety is defined with respect to the performance metric J induced by the environments rewards. Therefore, large negative rewards are seen as indicators of system failure. In the defined reward function (4), the meanroot-error (MRE) between the measured phase currents i abc from the observations and the setpoints i * abc is provided as the regular performance indicator. However, the MRE is only an exemplary way to evaluate the control performance. Compared to the classical mean-squared-error (MSE) metric the MRE is penalizing smaller control errors around zero stronger and, therefore, focuses on a steady-state control error free behavior. Nevertheless, arbitrary performance measures can be included in the control agent definition within the OMG toolbox. Additionally to the MRE, a barrier function is used as a penalty if the nominal current i nom is exceeded to avoid that the current limit i limit is reached. It is assumed that exceeding i limit will either lead to severe component damage (e.g. by thermal overloading of the power electronic semiconductors) or to an emergency shutdown of the system. For this example, the nominal current and its limit are set to i nom = 20 A, i limit = 30 A .
The total reward is then given by with µ being the weight of the barrier function. In the following, µ = 2 is chosen. The performance J is calculated as the average reward per episode over all N time steps: The SafeOpt algorithm defines safety in a probabilistic manner. A parameter region is considered as safe if the lower confidence bound of the predicted performance does not violate a given lower performance threshold. The confidence bounds are calculated with a Gaussian process (GP) regression [40] over all known performance points seen so far. This parameter region is called safe region. Consequently, the SafeOpt algorithm needs to be initialized with a safe parameter set from which the safe exploration can be started.
The parameters considered for subsequent iterations are split into two sets: expanders and potential maximizers. Expanders are parameters that lay close to the boundaries of the safe region. Evaluating them increases the knowledge and, therefore, narrows the confidence bounds. An increased lower confidence bound expands the safe region, because the intersection with the minimal performance bound is pushed out. Potential maximizers are points whose upper confidence bound exceed the currently highest lower bound of the optimal performance. All other parameters are ignored for evaluation. For the next iteration, the parameter set with the largest confidence bound among the two sets is selected. This solves the exploration/exploitation trade-off in a similar way as the upper confidence bound algorithm [41].
Summarizing, the safety concept relies on two assumptions. Firstly, the performance in the learning environment must be representative of the real applications faced by the learned agent later on. This is important, as each parameter set is only evaluated a predefined number of time steps in its learning environment. Secondly, the GP must be able to sufficiently fit the observed performance, otherwise the confidence bounds are not reliable.
A Matérn kernel with ν = 3/2 is used as covariance function for the GP regression. Additionally, boundaries for the range of the parameters are specified. The GP processes thus defined subsequently samples from {K p , K i }, which are considered to be safe within the estimated confidence bound. Fig. 5 shows a visualization of the GP model. Here, only the control parameter K i is considered for the optimization and initialized with 10 V/(As), while K p = 0.01 V/A is fixed for demonstration purpose. The blue curve line shows the mean function of the Gaussian process. The blue region surrounding the mean function shows the 95 % confidence bounds. Minimal allowed performance, i.e. performances considered as safe, is indicated by the dashed line. Therefore, the safe region of K p is roughly given by [0.008 V/A, 0.012 V/A].

D. Evaluation on Single Gain Tuning of only K i
First, an investigation is carried out assuming a fixed gain K p while only K i is tuned by SafeOpt. We start with this simplified analysis to highlight certain findings, which are much easier to depict when only a single parameter is adapted. In Sec. III-E, the evaluation is then extended to a simultaneous tuning of both controller gains.
K i is constrained to the range [0, 300] V/(As) and is initialized with 10 V/(As). This safe initial choice can be motivated by application-specific expert knowledge or robust methods for controller design. However, it should be noted that the parameter range limitation is not the safe set. It potentially contains unsafe regions and can be interpreted as a very vague guess of the optimal location of the controller parameter.
To illustrate the optimization process, 15 episodes are run by the SafeOpt agent. After each episode, a new parameter for K i is chosen. The algorithm interrupts the episode if the current threshold is exceeded. The safe threshold J min = 2 · J init is set to twice the (negative) initial performance J init . So, a parameter set is classified as unsafe if the average reward per episode -indicating the error -has doubled in comparison to the initial parameter set. In Fig. 6, the results of the SafeOpt tuning process is shown, where K i is on the x-axis and the performance J on the yaxis. The initial performance J init = −0.52 is achieved using the above mentioned initial parameter set (K p = 0.005 V/A and K i = 10 V/(As)). The plot shows all results of the 15 performance measurements. It can be seen that for increasing integral gain the performance rises until K i ≈ 70 V/(As) and stays fairly constant until K i ≈ 120 V/(As). The best performance J = −0.29 of the 15 episodes was achieved using a parameter set of K p = 0.005 V/A and K i = 71.84 V/(As) .
If K i is increased further, a severe current overshoot exceeding the nominal current limit results during the blackstart of the system. Consequently, the performance decreases significantly. If K i is decreased relative to the initial value, the stationary control error increases and the performance decreases significantly, too. The decreasing characteristic for low K i already starts from K i ≈ 20 V/(As). This consequently shows that the initial K i was suboptimal.
An example for a large stationary control error is shown in Fig. 7. Plotted are the current waveforms achieved by a controller parameter set of K p = 0.005 V/A and K i = 0 V/(As). Since the absolute value of K p is low, the reference current cannot be reached. Furthermore, since the integral controller gain is zero, the stationary control error is not compensated. As a result, the performance is unacceptable and the parameter set is classified as unsafe (cf. Fig. 6). This finding will be discussed in more detail in Sec. III-F. In the following, both controller parameters are adjusted at the same time to find an optimal controller design. K p is constrained to the range [0, 0.03] V/A and K i to [0, 300] V/(As).
The initial values are chosen as K p = 0.005 V/A and K i = 10 V/(As).
Using these initial values, the controller reaches a performance of J init = −0.52. As the safe threshold J min = 2 · J init is selected. The SafeOpt algorithm is used and the one-dimensional example from Sec. III-D is extended to two dimensions. In Fig. 8, the contour lines visualize the performance landscape. The integral gain K i and proportional gain K p are shown on the x-and y-axis, respectively. The safe threshold J min and the confidence bounds are not indicated in this plot. Since the optimization problem is significantly harder the number of evaluations is increased to 50.
With respect to the performance landscape in Fig. 8, the parameters found optimal are K p = 0.0125 V/A and K i = 117.81 V/(As) .
This configuration achieves J = −0.28, effectively halving the error compared to the initial performance J init . Moreover, this result is slightly better than the performance of the first experiment in Sec. III-D. Compared to the optimal parameter values, increasing K i further leads to decreasing performance. Similarly, increasing K p forbids higher values of K i , because a severe current overshoot exceeding the nominal current limit and stronger oscillations are resulting during the blackstart of the system. A decreasing K p value also results in bad performance for high K i values, which causes oscillations initially triggered by the high error at blackstart. In Fig. 8, the gray line indicates the constraint parameter space of the first experiments. The mean function in Fig. 6 can be seen as an intersection depicted by the gray line in Fig. 8. In the second experiment, the gray line is sampled less densely. Therefore, the model naively predicts quite high performance for K i = 0 V/(As). Fig. 9 shows the current waveforms applying the optimal controller design. Here, a fast control response with only a small current overshoot is achieved while the nominal current is not exceeded.

F. Discussion
In GP regression, the fitted function is expected to change equally fast over the whole domain. This assumption is expressed in the covariance function and its lengthscale. Fig. 6, however, shows, that such a performance function can be fairly constant in some regions and show rapid change in others. To allow the GP to predict confidence bounds in regions of fast change, the lengthscale of the covariance function has to be chosen extremely small. This means that parameters that are expected to have strong correlation in their performance values lay very close to each other. This parametrization, however, also results in very conservative exploration in regions with fairly constant performance. Unfortunately, the GP cannot predict the steep performance drop (cf. again Fig. 6) if neither the prior expectation nor the observed data indicate such a steep change. Therefore, selecting a larger lengthscale results in highly overconfident behavior as the boundaries of the performance plateau are missed. Overall the prior covariance assumptions of the GP are not met by our performance function, which significantly limits its reliability. Accordingly, the examined SafeOpt algorithm cannot solve the given task satisfactorily. In the future, other design methods from the field of safe, data-driven optimization should be examined for their suitability in the context of optimal control of MSGs.

IV. CONCLUSION AND OUTLOOK
With the OMG toolbox, a fully open-source, scalable and flexible platform for simulation and testing of intelligent microgrid control is proposed. The toolbox fills a gap in the area of dynamic system and control analysis for inverter-driven microgrids. The core feature is a customizable interface between OpenModelica for plug and play-like system modeling and Python for the integration of arbitrary control algorithms. OMG already offers some standard controllers as well as auxiliary tools (e.g. phase-looked loops) to speed up the overall simulation and control design process for the user. In addition, the integrated OpenAI Gym interface offers a wide range of options for training and evaluating data-driven controllers from the field of reinforcement learning.
The importance of safety has already been highlighted by the data-driven optimization case study of a linear feedback controller. Although the standard controller framework is heavily based on expert knowledge and is not a complete datadriven RL agent, its data-driven optimization is associated with the risk of creating unsafe system states, potentially leading to system malfunctions or damages. Safe Bayesian optimization could only prevent this to a limited extent, because its abstract uncertainty evaluation based on Gaussian processes cannot provide a reliable safety prediction.
The safe, data-based control of microgrids therefore remains an exciting research challenge for future work. Here, the integration of a priori expert knowledge for the evaluation of safe control methods appears to be especially promising, for example to monitor and guide the training of reinforcement learning-based methods.