Linopy: Linear optimization with n-dimensional labeled variables

Summary
Linopy is an open-source package written in Python to build and process linear and mixed-integer optimization with n-dimensional labeled input data. Using state-of-the-art data analysis packages, Linopy enables a high-level algebraic syntax and memory-efficient, fast communication with open and proprietary solvers. While similar packages use object-oriented implementations of single variables and constraints, Linopy stores and processes its data in an array-based data model. This allows the user to build large optimization models quickly and lays the foundation for features such as fast writing to array-oriented scientific data formats, masking, automatic solving on remote servers and model scaling.


Statement of need
Decades after its inception (Dantzig, 1963), mathematical optimization is of immense importance for business, industry and governmental decision-making. Optimization is used to address a wide range of complex problems, such as challenges related to climate change, energy transitions, and food supply. Typically, an optimization problem, i.e. a mathematical program, consists of one objective function to be numerically minimized and a set of constraints that restrict the underlying variables to external conditions. Algebraic Modeling Languages (AMLs) aim to facilitate mathematical programming by allowing the user to formulate large-scale, complex problems with a high-level syntax similar to the mathematical notation. The formulated problem is then passed to the solver of choice, where a solution is calculated. AMLs thus aim to provide a user-friendly interface to various solvers, each with its own set of features.
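As a point of reference, the kind of mathematical program described above can be written in a generic mixed-integer linear form. The symbols below ($c$, $A$, $b$, $x$, $\mathcal{I}$) are chosen here for illustration and are not notation taken from the paper:

```latex
\begin{aligned}
\min_{x} \quad & c^{\top} x \\
\text{s.t.} \quad & A x \le b, \\
& x_j \in \mathbb{Z} \quad \text{for } j \in \mathcal{I},
\end{aligned}
```

Here $c^{\top} x$ is the objective function, the rows of $A x \le b$ are the constraints, and $\mathcal{I}$ is the (possibly empty) set of indices of integer variables. With $\mathcal{I} = \emptyset$ the problem is purely linear.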
Well-established AMLs such as GAMS (Bussieck & Meeraus, 2004) and AMPL (Fourer et al., 1990) support a wide range of solvers, but are license-restricted and rely on closed-source code. In contrast, AMLs such as JuMP (Dunning et al., 2017), CVXPY (Diamond & Boyd, 2016), Pyomo (Hart et al., 2017), GEKKO (Beal et al., 2018) and PuLP (Mitchel et al., 2022) are open-source and have gained increasing attention in recent years. While the Julia package JuMP is characterized by high-performance, in-memory communication with the solvers, the Python packages Pyomo, GEKKO and PuLP lack parallelized, low-level operations and communicate more slowly with the solver via intermediate files written to disk. An exception is CVXPY, which supports fast array-based operations and uses low-level wrappers to the solvers. However, Python AMLs commonly do not make use of state-of-the-art data handling packages. In particular, the assignment of coordinates or indexes is often not supported or is memory-intensive due to the use of an object-oriented implementation in which every single combination of coordinates is stored separately.
Linopy is an open-source Python package representing a new kind of AML that tackles these issues together. By introducing an array-based data model for variables and constraints, Linopy makes mathematical programming compatible with Python's advanced data handling packages NumPy (Harris et al., 2020), Pandas (Reback et al., 2022) and Xarray (Hoyer & Hamman, 2017).
The approach follows the idea that a variable $x(d_1, d_2, ..., d_n)$ may be defined on an arbitrary number of $n \geq 0$ dimensions, each dimension $d_i$ spanning over a set of discrete coordinates of arbitrary data type (integer, string, date-time, etc.), i.e. $d_i \in \{c_{i,1}, ..., c_{i,k_i}\}$. The variable $x(d_1, d_2, ..., d_n)$ is then stored as an array of shape $k_1 \times k_2 \times ... \times k_n$ containing integer labels referencing the optimization variables used by the solver. Coordinates are automatically aligned when variables are used in linear expressions or when applying built-in functions, such as summing over specific dimensions or grouping by user-defined labels. Note that if a variable should not be defined on the full set of coordinates given by $\{d_1, d_2, ..., d_n\}$, a boolean mask of the same shape may be used to select where the variable is defined and where not.
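The label-array idea described above can be illustrated with a minimal NumPy sketch. This is not Linopy's implementation or API: the dimension names, the coordinate values, the helper `label_at` and the use of -1 for masked-out entries are all assumptions made for this example.

```python
import numpy as np

# Hypothetical coordinates for two dimensions (illustrative names only).
coords = {
    "time": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "unit": ["gas", "wind"],
}
shape = tuple(len(values) for values in coords.values())  # (3, 2)

# Boolean mask of the same shape: True where the variable is defined.
mask = np.ones(shape, dtype=bool)
mask[0, 1] = False  # assume no "wind" variable exists on the first day

# One integer label per defined entry; -1 marks masked-out entries
# (a convention assumed for this sketch).
labels = np.full(shape, -1, dtype=np.int64)
labels[mask] = np.arange(mask.sum())


def label_at(**selection):
    """Look up a solver label by coordinate values instead of positions."""
    index = tuple(coords[dim].index(selection[dim]) for dim in coords)
    return int(labels[index])


# Summing over the "unit" dimension: collect the defined labels per time step.
terms_per_time = [row[row >= 0].tolist() for row in labels]
```

The whole variable is held in a single integer array rather than in one Python object per coordinate combination, which is the memory argument made in the text. The mask simply leaves some entries without a solver variable.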
The array-based modelling approach not only leads to more flexibility but also increases the overall performance. The following figure shows a benchmark against the AMLs JuMP, Pyomo, PuLP and CVXPY as well as the solver-specific interface Gurobipy. The included AML packages are all open source and well established and therefore suitable for comparison. Linopy outperforms all Python AMLs in memory efficiency and is close to CVXPY and Gurobipy in terms of speed while being faster than JuMP. The Snakemake workflow that produces the benchmark is available here. The software and hardware specifications are detailed here. The benchmark is based on a 1-dimensional knapsack problem and uses the Gurobi solver. The overhead is calculated as the difference between the whole solving process via the AML and the solving process on the solver side alone. Note that the benchmark results depend only weakly on the complexity of the problem: adding more terms to the constraints, using different kinds of index labels or changing it to a purely linear problem hardly affects the overhead.
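For orientation, a textbook 1-dimensional knapsack problem and the overhead measure described above can be written as follows. The exact formulation used in the benchmark is defined by its workflow, so the symbols here ($v_i$, $w_i$, $W$, $N$, $t$) are assumptions for illustration:

```latex
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{N} v_i x_i \\
\text{s.t.} \quad & \sum_{i=1}^{N} w_i x_i \le W, \\
& x_i \in \{0, 1\} \quad \text{for } i = 1, ..., N,
\end{aligned}
\qquad
t_{\text{overhead}} = t_{\text{total}} - t_{\text{solver}}
```

Here $v_i$ and $w_i$ denote the value and weight of item $i$, $W$ the capacity, $t_{\text{total}}$ the time of the whole solving process via the AML and $t_{\text{solver}}$ the time spent on the solver side alone. Relaxing $x_i \in \{0, 1\}$ to $0 \le x_i \le 1$ gives a purely linear variant of the problem.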
Due to its close alignment with the Xarray package, Linopy supports storing the optimization model as a NetCDF file (Rew & Davis, 1990), which allows users to quickly share optimization problems with others. Using the Paramiko package, Linopy allows the user to send unsolved problems to a server and retrieve the solution after running the optimization remotely, which is particularly helpful if large computing resources are needed.

Related Research
Linopy is used by several research projects and groups, mostly related to energy system modelling. The energy system modelling tool PyPSA, which is used by various institutions and forms the core of the PyPSA-Eur workflow, uses Linopy as its primary optimization interface. The Fraunhofer Institute for Energy Economics and Energy System Technology is using Linopy to create an interface to GPU-based solvers. The German Aerospace Center uses Linopy for solving stochastic optimization problems. Finally, TU Berlin and Google Inc. cooperate on a research project that uses Linopy to analyze system-level impacts of 24/7 carbon-free electricity procurement in Europe.

Availability
Stable versions of the Linopy package are available for Linux, macOS and Windows via pip from the Python Package Index (PyPI). Development branches are available in the project's GitHub repository, together with documentation on Read the Docs. For continuous integration, Linopy uses automated tests on GitHub together with pre-commit hooks. The Linopy package is released under GPLv3 and welcomes contributions via the project's GitHub repository.