Taweret : a Python package for Bayesian model mixing

Uncertainty quantification using Bayesian methods is a growing area of research. Bayesian model mixing (BMM) is a recent development which combines the predictions from multiple models such that each model's best qualities are preserved in the final result. Practical tools and analysis suites that facilitate such methods are therefore needed. Taweret introduces BMM to existing Bayesian uncertainty quantification efforts. Currently Taweret contains three individual Bayesian model mixing techniques, each pertaining to a different type of problem structure; we encourage the future inclusion of user-developed mixing methods. Taweret's first use case is in nuclear physics, but the package has been structured such that it should be adaptable to any research engaged in model comparison or model mixing.


Statement of need
In physics applications, multiple models with different physics assumptions can be used to describe an underlying system of interest.Though each model may have similar predictive accuracy on average, the fidelity of the approximation across a subset of the domain may differ drastically for each of the models under consideration.In such cases, inference and prediction based on a single model may be unreliable.One strategy for improving accuracy is to combine, or "mix", the predictions from each model using a linear combination or weighted average with input-dependent weights.This approach is intended to improve reliability of inference and prediction and properly quantify model uncertainties.When operating under a Bayesian framework, this technique is referred to as Bayesian model mixing (BMM).In general, model mixing techniques are designed to combine the individual mean predictions or density estimates from the  models under consideration.For example, mean-mixing techniques predict the underlying system by where [ | ] denotes the mean of  given the vector of input parameters ,   () is the mean prediction under the  th model ℳ  , and   () is the corresponding weight function.The density-mixing approach estimates the underlying predictive density by where ( 0 |  0 ,  , ℳ  ) represents the predictive density of a future observation  0 with respect to the  th model ℳ  at a new input  0 .In either BMM setup, a key challenge is defining   ()-the functional relationship between the inputs and the weights.
Figure 1: Schematic of Bayesian model mixing.Each model has region of parameter space where it has a high fidelity, but all models are meant to describe the same phenomenon.To obtain a model that works well for all of parameter space, we combine them using Bayesian model mixing methods.
This work introduces Taweret, a Python package for Bayesian model mixing that includes three novel approaches for combining models, each of which defines the weight function in a distinct way (see Table 1 for a comparison of the methods).This package has been developed as an integral piece of the Bayesian Analysis of Nuclear Dynamics (BAND) collaboration's software.BAND is a multi-institutional effort to build a cyber-infrastructure framework for use in the nuclear physics community (Beyer et al., 2023;Phillips et al., 2021).The software is designed to lower the barrier for researchers to employ uncertainty quantification in their data analysis and/or theoretical modeling, and to integrate, as best as possible, with the community's current standards concerning coding style (pep8).Bayesian model mixing is one of BAND's four central pillars in this framework (the others being emulation, calibration, and experimental design).
In addition to this need, we are aware of several other fields outside of physics that use techniques such as model stacking and Bayesian model averaging (BMA) (Fragoso et al., 2018), e.g., statistics (Yao et al., 2018(Yao et al., , 2022)), meteorology (Sloughter et al., 2007), and neuroscience (FitzGerald et al., 2014).It is expected that the Bayesian model mixing methods implemented in Taweret can also be applied to use cases within these fields.Statisticians have developed several versatile BMA/stacking packages, e.g.(Raftery et al., 2022;Vehtari et al., 2017).However, the only BMM-based package available is SAMBA-a BAND collaboration effort that was developed for testing BMM methods on a toy model (Semposki et al., 2022a).
Taweret's increased functionality, user-friendly structure, and diverse selection of mixing methods make it a marked improvement over SAMBA.

Bivariate linear mixing
The full description of this mixing method and several of its applications in relativistic heavy-ion collision physics can be found in the Ph.D. thesis of D. Liyanage (Liyanage, 2023).The bivariate linear mixing method can mix two models either using a density-mixing or a mean-mixing strategy.Currently, this is the only mixing method in Taweret that can also calibrate the models while mixing.Each physics-based model we consider may have unknown parameters that have physical meaning.In this context, Bayesian calibration corresponds to the process of using observational data to learn the values (and more generally, the posterior distributions) of this these unknown parameters.Most approaches in model mixing and model averaging use a two-step approach: first, fit individual models using a subset of the data; second, mix the predictions from each model (obtained from the previous step) using the other subset of the data to learn the weights.This method, employing simultaneous calibration and mixing, enables doing both steps at once.
The user may choose among the following mixing functions: • step: Θ( 0 − ) Here Θ denotes the Heaviside step function,  0 and  1 determine the shape of the weight function and are inferred from the experimental data, and  is the model input parameter (which is expected to be 1-dimensional for this mixing method).

Multivariate model mixing
Another Bayesian model mixing method incorporated into Taweret was originally published in (Semposki et al., 2022a), and was the focus of the BMM Python package SAMBA (Semposki et al., 2022b).It can be described as combining models by weighting each of them by their precision, defined as the inverse of their respective variances.The posterior predictive distribution (PPD) of the mixed model is a Gaussian and can be expressed as where (,  2 ) is a normal distribution with mean  and variance  2 ,   is the precision of the models, and each individual model is assumed to possess a Gaussian form such as Here,   () is the mean of the model , and  2  () its variance, both at input parameter .In this method, the software receives the one-dimensional input space , the mean of the  models at each point  ∈  (hence it is a mean-based mixing procedure), and the variances of the models at each point  ∈ .Each model is assumed to have been calibrated prior to being included in the mix.The ignorance of this mixing method with respect to how each model was generated allows for any model to be used, including Bayesian Machine Learning tools such as Gaussian Processes (Semposki et al., 2022a) and Bayesian Neural Networks (Kronheim et al., 2022).

Model mixing using Bayesian additive regression trees
A third BMM approach implemented in Taweret adopts a mean-mixing strategy which models the weight functions using Bayesian Additive Regression Trees (BART) conditional on the mean predictions from a set of  models (Yannotty et al., 2023).This approach enables the weight functions to be adaptively learned using tree bases and avoids the need for user-specified basis functions (such as a generalized linear model).Formally, the weight functions are defined by where   (;   ,   ) defines the  th output of the  th tree,   , using the associated set of parameters,   .Each weight function is implicitly regularized via a prior to prefer the interval [0, 1].Furthermore, the weight functions are not required to sum to one and can take values outside of the range of [0,1].This regularization approach is designed to maintain the flexibility of the model while also encouraging the weight functions to take values which preserve desired inferential properties.This BART-based approach is implemented in C++ with the trees module in Taweret acting as a Python interface.The C++ back-end implements Bayesian tree models and originates from the Open Bayesian Trees Project (OpenBT) (Pratola et al., 2023).This module serves as an example for how existing code bases can be integrated into the Taweret framework.Taweret uses abstract base classes to ensure compatibility and uniformity of models and mixing methods.The two base classes are BaseModel and BaseMixer located in the core folder (see Fig. 1 for a schematic); any model mixing method developed with Taweret is required to inherit from these.The former represents physics-based models that may include parameters which need to be determined by Bayesian inference.The latter, BaseMixer, represents a mixing method used to combine the predictions from the physics-based models using Bayesian model mixing.

Overview of package structure
The design philosophy for Taweret is to make it easy to switch between mixing methods without having to rewrite an analysis script.Thus, the base classes prescribe which functions need to be present for interoperability between mixing methods, and in particular, the models being called in the method.The functions required by BaseModel are • evaluate -gives a point prediction for the model; • log_likelihood_elementwise -calculates the log-likelihood, reducing along the last axis of an array if the input array has multiple axes; • set_prior -sets priors for parameters in the model.
The functions required by BaseMixer are • evaluate -gives point prediction for the mixed model given a set of parameters; • evaluate_weights -gives point prediction for the weights given a set of parameters; • map -returns the maximum a posteriori estimate for the parameters of the mixed model (which includes both the weights and any model parameters); • posterior -returns the chains of the sampled parameters from the mixed model; • predict -returns the posterior predictive distribution for the mixed model; • predict_weights -returns the posterior predictive distribution for the model weights; • prior -returns the prior distributions (typically objects, not arrays); • prior_predict -returns the prior predictive distribution for the mixed model; • set_prior -sets the prior distributions for the mixing method; • train -executes the Bayesian model mixing step.
Following our design philosophy, the general workflow for an analysis using Taweret is described in Fig. 3. From this, one can see three sources of information are generally required for an analysis: a selected mixing method, a model set, and training data.Each of these sources are connected through the training phase, which is where the mixing weights are learned.This leads into the prediction phase, where final predictions are obtained for the overall system and the weight functions.This process is summarized in the code snippet below.This workflow is preserved across the various methods implemented in Taweret and is intended to be maintained for future mixing methods included in this work.

Figure 2 :
Figure 2: Diagram depicting the base classes, their methods (functions) and their properties (data).

Figure 3 :
Figure 3: The general workflow for an analysis using Taweret.(Blue) The user must define each of the  models as a class inherited from BaseModel.(Green) The user can select an existing mixing method from Taweret (solid) or contribute a new method (dashed).(Purple) The model is trained using a set of training data (red), the model set (blue), and the selected mixing method (green).Predictions and uncertainty quantification follow from the training process.

Table 1 :
A summary of the three BMM approaches currently implemented in Taweret.Note that  ≥ 2. Following the method name and the type of mixing model, the Number of inputs column details the dimensions of the parameter which the mixing weights depend on (e.g., in heavy-ion collisions this is the centrality bin); the Number of outputs details how many observables the models themselves can have to compute the model likelihood (e.g., in heavy-ion collisions this can include charge multiplicities, transverse momentum distributions, transverse momentum fluctuations, etc.); the Number of models column details how many models the mixing method can combine, and the Weight functions column describes the available parameterization of how the mixing weights depend on the input parameter.