PARMESAN: Meteorological Timeseries and Turbulence Analysis Backed by Symbolic Mathematics

PARMESAN (the Python Atmospheric Research Package for MEteorological TimeSeries and Turbulence ANalysis) is a Python package providing common functionality for atmospheric scientists doing time series or turbulence analysis. Several meteorological quantities such as potential temperature, various humidity measures, gas concentrations, wind speed and direction, turbulence and stability parameters can be calculated. Furthermore, signal processing functionality such as properly normed variance spectra for frequency analysis is available. In contrast to existing packages with similar goals, its routines for physical quantities are derived from symbolic mathematical expressions, enabling inspection, automatic rearrangement, reuse and recombination of the underlying equations. Building on this, PARMESAN's functions as well as their comprehensive parameter documentation are mostly auto-generated, minimizing human error and effort. In addition, sensitivity/error propagation analysis is possible as mathematical operations like derivations can be applied to the underlying equations. Physical consistency in terms of units and value domains are transparently ensured for PARMESAN functions. PARMESAN's approach can be reused to simplify implementation of robust routines in other fields of physics.


Statement of need
The need to assert properly balanced physical units right from within running programs and models has been recognised for a long time now [3,2].Unit conversion errors in science and engineering have caused costly system failures such as the NASA Mars Climate Orbiter crash in 1999 [15].Nowadays, the Python ecosystem comprises many packages that ease specific tasks when performing physical calculations: numpy [6] and scipy [20] provide efficient numerical routines, pandas [19] and xarray [7] provide structures to read, write and aggregate data, pint [5] handles physical units and the uncertainties package [10] simplifies linear error propagation.Partly based on those, collections of routines for atmospheric science exist such as metpy [11], iris [13] and aoslib/PyAOS [17].However these focus more on gridded, spatial data which is common in modelling and remote sensing and have little functionality for turbulence analysis.Turbulence plays an important role in atmospheric exchange processes, especially in the planetary boundary layer [18].It is a statistical process and thus mostly quantified through high-resolution in-situ measurement techniques [4].metpy and iris can both handle units and require the user to explicitly specify them.Their physical quantities are calculated using hard-coded expressions.In contrast, the atmos package [12] has implemented an equation solving system for more flexible reusability and less hard-coding of relationships 1 arXiv:2309.15063v1[physics.ao-ph]26 Sep 2023 between quantities.Its development seems to have stalled since 2020, though.None of the above packages have a mechanism for transparently checking that input and output values are within reasonable physical bounds.
PARMESAN addresses the aforementioned gaps by providing functions for meteorological quantities that are backed by symbolic mathematical expressions employing SymPy [14], a powerful computer algebra system written purely in Python.Inputs and outputs are checked for and potentially converted to correct units while asserting that the physical domains are not exceeded.It can rearrange its equations and thus flexibly increase the number of available functions.PARMESAN has already been used successfully in Büchau et al. [1] for data analysis of atmospheric measurements.

Structure
Functions for physical quantities in PARMESAN are based on symbolic mathematical equations created using SymPy [14].PARMESAN defines a descriptive list of symbols (i.e.variables and constants, Figure 1) and relates them to form the common laws of thermodynamics, parametrisations and definitions used in atmospheric science.
Figure 1: Excerpt of auto-generated symbol list in parmesan.symbols.Symbols have metadata such as descriptions, units and default values attached.For readability, they can be referred to with different variable names, which are also available as parameter aliases when calling functions in PARMESAN.This approach has many advantages over the traditional method of hard-coding mathematical operations between function inputs using language-specific constructs.First of all, information about the mathematical relationship between quantities is not lost, but can instead be queried and reused.SymPy equations can be rearranged and recombined to generate new expressions, enabling the generation of many specific functions from a set of base equations.Additionally, SymPy expressions are translatable into code for numerous programming languages.PARMESAN uses this mechanism to turn its equations into executable Python functions that use the efficient numpy package internally [6], so no runtime overhead is introduced and array inputs and outputs are supported.Symbolic expressions are automatically simplified and terms cancelled accordingly, revealing the set of input parameters an equation really depends on.This information is then used to automatically generate extensive documentation for each individual function (Figure 2) -a great benefit for consistency and minimisation of human effort and oversight in the documentation.
Creating new functions in or from PARMESAN thus often requires only very few lines of code.Here is a compacted version of PARMESAN's function for potential temperature specifically for dry air: from parmesan.symbolsimport * # Import all of PARMESAN's symbols @from_sympy() # decorator turning SymPy expression into code and documentation def potential_temperature(): # no arguments necessary, added automatically return T * (p_ref / p) ** (R_dryair / c_p_dryair) # SymPy expression -practically equal to typical Python code In this case, the resulting quantity is derived from the function's name, documentation is generated (Figure 2) and the equation is immediately checked for units consistency employing the pint package [5].Each symbol has metadata attached, such as a physical unit and a domain (Figure 1).These are available to the resulting function for assertion, so a PARMESAN function will check and auto-convert input and output units and issue a warning when unphysical values arise such as negative absolute temperatures: Another benefit of having the underlying symbolic expression for an equation available is the possibility to do sensitivity analysis.PARMESAN can derive the maximum relative error ∆y max,rel (Equation 1) for its symbolic functions (Figure 3): The maximum relative error is a conservative estimation method for the propagation of errors of input quantities x i to effective error in the output quantity y, assuming the most severe combination of input quantity deviations ∆x imax .Custom sensitivity analyses can also be implemented based on PARMESAN's equations.
Figure 3: Auto-generated maximum relative error equation (Equation 1) for PARMESAN's potential_temperature() function (Figure 2).Symbolic PARMESAN functions automatically have a sensitivity analysis attached to quantify how a change in input parameters affects the output.In this case, the maximum expected relative error of potential temperature [%] is the sum of the maximum relative errors of temperature and pressure [%], with the pressure term scaled by a factor.PARMESAN can also rearrange its existing equations (Figure 4) for a quantity of interest by its provided get_function() function: from parmesan.symbolsimport * # get (or rearrange) functions that calculate mixing ratio mixing_ratio_functions = list(get_function(result=mixing_ratio) # get (or rearrange) functions that calculate mixing ratio # from at least temperature and pressure mixing_ratio_functions = list(get_function(result=mixing_ratio, inputs=(T, p)) The functions found can be called as usual or their underlying equations can be examined by accessing their .equationattribute.In a Jupyter notebook [9] the equations appear as formatted markup similar to what is depicted in Figure 4.
Besides physical equations, PARMESAN provides tools often needed when analysing timeseries such as calculating second-order moments, variance spectrum (Figure 5), autocorrelation, structure function (variogram) and running covariance, e.g. for calculating eddy fluxes [4], backed by the scipy package [20] for efficient numerics and matplotlib [8] for visualisation.PARMESAN integrates with the common pandas data analysis framework [19] by adding a .parmesanaccessor to DataFrame and Series objects to apply PARMESAN functions such as a variance spectrum or autocorrelation directly to them.[18] is correctly fulfilled as the timeseries variance equals the sum of discrete spectral variances.A Kolmogorov power-law fit [16] was optionally added by PARMESAN.

Figure 2 :
Figure 2: Auto-generated comprehensive parameter documentation and LaTeX-formatted equation for PARMESAN's potential_temperature() function to calculate potential temperature from atmospheric pressure and temperature.Parameter aliases, units, defaults and bounds are taken from PARMESAN's symbol library (Figure 1) and used coherently across functions in PARMESAN.

Figure 4 :
Figure 4: Excerpt of auto-generated humidity equation list in PARMESAN's humidity module.As the underlying equations in PARMESAN's functions are available as symbolic expressions, it can provide overviews of all related equations.

Figure 5 :
Figure5: PARMESAN discrete variance_spectrum() of an artificial wind timeseries (random walk overlayed with 2Hz and 3Hz sine waves).Note how Parseval's Theorem[18] is correctly fulfilled as the timeseries variance equals the sum of discrete spectral variances.A Kolmogorov power-law fit[16] was optionally added by PARMESAN.