PyMPDATA v1: Numba-accelerated implementation of MPDATA with examples in Python, Julia and Matlab

Convection-diffusion problems arise across a wide range of pure and applied research, in particular in geosciences, aerospace engineering, and financial modelling (for an overview of applications, see, e.g., section 1.1 in Morton (1996)). One of the key challenges in numerical solutions of problems involving advective transport is sign preservation of the advected field (for an overview of this and other aspects of numerical solutions to advection problems, see, e.g., Røed (2019)). The Multidimensional Positive Definite Advection Transport Algorithm (MPDATA) is a robust, explicit-in-time, and sign-preserving solver introduced in Smolarkiewicz (1983) and Smolarkiewicz (1984). MPDATA has been subsequently developed into a family of numerical schemes with numerous variants and solution procedures addressing a diverse set of problems in geophysical fluid dynamics and beyond. For reviews of MPDATA applications and variants, see, e.g., Smolarkiewicz & Margolin (1998) and Smolarkiewicz (2006).


Usage examples
Simulations included in the PyMPDATA-examples package are listed below, labelled with the paper reference on which the example setup is based. Each example is annotated with its dimensionality, the number of equations constituting the system, and an outline of the setup.

Implementation highlights
In 2D and 3D simulations, domain decomposition is used for multi-threaded parallelism. Domain decomposition is performed along the outer dimension only and is realised using the numba.prange() functionality.
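The outer-dimension decomposition can be sketched as follows. This is a pure-NumPy illustration, not PyMPDATA's actual API (the helper name `outer_spans` is hypothetical); in PyMPDATA, the loop over such spans is compiled with Numba and parallelised with numba.prange().

```python
import numpy as np

def outer_spans(n_threads, outer_extent):
    """Split the outer grid dimension into per-thread index spans.

    Mimics domain decomposition along the outermost ("C"-order) axis;
    in PyMPDATA the loop over these spans is a numba.prange() loop.
    (Hypothetical helper, for illustration only.)
    """
    bounds = np.linspace(0, outer_extent, n_threads + 1).astype(int)
    return [(int(bounds[i]), int(bounds[i + 1])) for i in range(n_threads)]

# each thread would process rows i0..i1 of, e.g., a (64, 64, 64) array
spans = outer_spans(n_threads=3, outer_extent=64)
```

Decomposing only along the outer dimension keeps each thread's slice contiguous in memory for "C"-ordered arrays, which is the design choice the text describes.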
The PyMPDATA design features a custom-built multi-dimensional Arakawa-C staggered-grid layer that allows multi-dimensional stencil operations on both scalar and vector fields to be represented concisely. The grid layer is built on top of NumPy's ndarrays (using "C" ordering) and employs Numba's @njit functionality for high-performance multi-threaded array traversals. The array-traversal layer enables writing the code once for multiple dimensions (i.e., one set of MPDATA formulae for 1D, 2D, and 3D) and automatically handles, where needed, the halo-filling logic related to boundary conditions.
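As a toy illustration of the staggered-grid idea (not PyMPDATA's API), a 1D donor-cell (upwind) step on an Arakawa-C grid stores the advected scalar at cell centres and the Courant numbers at cell edges, with a one-cell halo filled here for periodic boundary conditions:

```python
import numpy as np

def upwind_step(psi, courant):
    """One donor-cell step on a 1D Arakawa-C grid with a periodic halo.

    psi:     cell-centre values, shape (nx,)
    courant: edge-centred Courant numbers, shape (nx + 1,)
    (toy sketch; assumes |courant| <= 1)
    """
    nx = psi.shape[0]
    # one-cell halo, filled according to the periodic boundary condition
    psi_h = np.empty(nx + 2)
    psi_h[1:-1] = psi
    psi_h[0], psi_h[-1] = psi[-1], psi[0]

    # upwind flux through each of the nx + 1 cell edges
    flux = np.where(courant > 0,
                    courant * psi_h[:-1],   # take value from the left cell
                    courant * psi_h[1:])    # take value from the right cell
    return psi - (flux[1:] - flux[:-1])

psi = np.array([0., 0., 1., 0., 0.])
c = np.full(6, .5)            # constant rightward flow, C = 0.5
psi = upwind_step(psi, c)     # mass-conserving and sign-preserving
```

The staggering is visible in the shapes: a scalar field of extent nx is paired with an edge-centred vector component of extent nx + 1, which is what the grid layer generalises to multiple dimensions.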
Numba's deviation from Python semantics, whereby closure variables are rendered as compile-time constants, is extensively exploited within the PyMPDATA code base. It enables the just-in-time compilation to benefit from information on domain extents, the algorithm variant used, and problem characteristics (e.g., the coordinate transformation used, or lack thereof).
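The closure pattern can be sketched as follows (a pure-Python sketch; `make_step` is a hypothetical name). Under Numba, decorating the inner function with @njit would bake the captured grid extent, and any option flags captured the same way, into the compiled machine code as constants:

```python
import numpy as np

def make_step(grid):
    # `grid` is captured in the closure; under Numba's @njit it becomes
    # a compile-time constant, so loop bounds are known at compile time
    nx = grid[0]

    def step(psi_out, psi_in):
        for i in range(nx):   # bound fixed when the closure was created
            psi_out[i] = psi_in[i]
        return psi_out

    return step

step = make_step(grid=(8,))
out = step(np.empty(8), np.arange(8.))
```

Creating one specialised `step` per grid and per algorithm variant is what lets the JIT compiler unroll, vectorise, or eliminate branches that would otherwise be decided at run time.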
In general, the numerical and concurrency aspects of the PyMPDATA implementation follow Bartman et al.

Performance
A basic performance analysis is carried out by comparing PyMPDATA execution (wall) times: (i) with and without Numba JIT, and (ii) against the C++ implementation of MPDATA in libmpdata++. The tests use a 3D simulation based on the revolving-sphere case from Smolarkiewicz (1984) (figs. 13-16 therein) as used in Jaruga et al. (2015) (fig. 13 therein). The simulation setup involves the solution of a homogeneous advection problem in a cubic domain with a non-divergent rotational flow. Here, for simplicity, all simulations are carried out for 64 timesteps, and the measured wall time is divided by the number of steps and reported as wall time per timestep. In all reported runs, both for libmpdata++ and PyMPDATA, two corrective iterations of MPDATA are used and the basic flavour of the algorithm is employed. Wall-time measurements are carried out using the timers built into libmpdata++ and Python's timeit routines, respectively. Timing covers integration only, excluding initial-condition evaluation and output handling. Simulations are repeated four times and the minimal value is reported, in order to filter out JIT-compilation and caching overhead and to minimise thread-scheduling differences. For PyMPDATA runs with Numba JIT disabled, the number of repetitions is reduced from four to two. Simulations are carried out on one, two, or three threads on a machine with four physical cores. Wall times are reported for PyMPDATA with JIT disabled, and for PyMPDATA with JIT enabled for both a dynamic grid (i.e., grid extents specified at run time, plotted with orange connected points) and a static grid (i.e., grid extents specified ahead of JIT compilation, blue connected points). First, a speedup of over three orders of magnitude is evident when comparing wall times with JIT disabled and enabled.
Comparison of PyMPDATA and libmpdata++ reveals comparable performance and scaling with the number of threads, with consistently shorter wall times for PyMPDATA across a range of domain sizes from 16 by 16 by 16 up to 128 by 128 by 128, and a slight further improvement when switching from the dynamic to the static grid. While more comprehensive tests and analyses would be needed to identify the cause of this superior performance, two possible factors include overhead from the employment of the Blitz++ library in libmpdata++, as well as explicit (potentially superfluous) halo-filling triggers in libmpdata++, as opposed to the on-demand halo-filling design implemented in PyMPDATA. Notably, as reported in Jaruga et al. (2015) (section 7 therein), for the very test case discussed herein, and for small grid sizes (59 by 59 by 59), libmpdata++ had up to five times longer execution times than the original FORTRAN-77 serial implementation of MPDATA. This indicates that the measured performance of PyMPDATA approaches that of the original FORTRAN-77 implementation, while at the same time offering multithreading concurrency, hiding the compilation and linking stages from the user, and featuring interoperability with the Python package ecosystem.
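The minimum-of-repetitions timing protocol described above can be reproduced with Python's standard timeit module alone. Here a placeholder workload stands in for an actual PyMPDATA integration (the `integrate` function and constants are illustrative, not part of any benchmark script):

```python
import timeit
import numpy as np

N_STEPS, N_REPEATS = 64, 4

def integrate():
    """Placeholder for an actual solver integration of N_STEPS timesteps."""
    a = np.zeros((64, 64, 64))
    for _ in range(N_STEPS):
        a += 1.0

# taking the minimum over repetitions filters out one-off overhead
# (JIT compilation, cache warm-up) and thread-scheduling jitter
wall_time = min(timeit.repeat(integrate, number=1, repeat=N_REPEATS))
wall_time_per_step = wall_time / N_STEPS
```

Reporting the minimum rather than the mean is a deliberate choice for this kind of benchmark: the minimum approximates the cost of the computation itself, with external perturbations stripped away.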

Author contributions
PB was the architect of PyMPDATA, with SA taking over the role of main developer and maintainer over time. MO participated in the package core development and led the development of the condensational-growth example, which was the basis of his MSc thesis. JB contributed the DPDC algorithm variant handling. SD contributed the advection-diffusion example. MM contributed to the numba-mpi package. PR contributed the shallow-water example. MS contributed the advection-on-a-sphere example. The paper was composed by SA and is based on the contents of the README files of the PyMPDATA, PyMPDATA-examples, and numba-mpi packages.