ambrosia: An R package for calculating and analyzing food demand that is responsive to changing incomes and prices

The ambrosia R package was developed to calculate food demand for staples and nonstaple commodities that is responsive to changing levels of incomes and prices. ambrosia implements the framework to quantify food demand as established by Edmonds et al. (2017) and allows the user to explore and estimate different variables related to the food demand system. Currently ambrosia provides three main functions:


Summary
The ambrosia R package was developed to calculate food demand for staples and nonstaple commodities that is responsive to changing levels of incomes and prices. ambrosia implements the framework to quantify food demand as established by Edmonds et al. (2017) and allows the user to explore and estimate different variables related to the food demand system. Currently ambrosia provides three main functions: (1) calculation of food demand for any given set of income levels and prices; (2) estimation of calibration parameters within a given a dataset. Note: ambrosia is used to calculate the calibration parameters for the food demand model implemented in the Global Change Analysis Model (GCAM; Calvin et al., 2019); (3) exploration and preparation of raw data before starting a calibration parameter estimation.

Statement of need
An important motivation to develop ambrosia is functionalizing and separating out the different components of the sophisticated food demand framework from Edmonds et al. (2017) (summarized below) into usable R functions that can be easily parameterized and customized by the user. Thus, ambrosia has been developed to help researchers explore questions related to trends in food demand empirically. Since the equations of the model are grounded in peer reviewed research while the code itself is written in R (which increases usability), the tool is useful to researchers interested in, 1) analyzing and exploring trends in food demand with a computational model that is responsive to changes in incomes and prices that can easily be implemented on any time series (dataset); 2) re-estimating calibration parameters of the food demand model using custom data, thus effectively allowing the user to calibrate the model to custom data; 3) incorporating a detailed food demand model in their own earth system and economic models.
ambrosia is part of an ecosystem of tools within the Global Change Intersectoral Modeling System (GCIMS) that help users computationally explore science and policy questions related to different dimensions of human-Earth systems (PNNL, 2020). The parameters calculated from ambrosia are utilized directly in GCAM (Calvin et al., 2019) to represent forecasts of food demand. ambrosia ensures that the parameters that are used within GCAM are scientifically and empirically sound and also ensures reproducibility of the parameters for validation to comply with the commitment of GCIMS to FAIR guiding principles for scientific data management (Wilkinson et al., 2016). The code is structured to ensure that the parameters can be updated and tested effectively with changes to the underlying data.
Thus, the tool not only enables easy use and future development, but also enables easy modularization of the code within other systems. The sections below contain a detailed discussion of the different functions and customization options available within the tool.

Summary of the Edmonds et al. framework
The Edmonds et al. (2017) model represents a food demand model for staples and nonstaple commodities at different levels of prices and incomes. Demand for staples is described as increasing when income is lower, eventually peaks at under 1000$ per person per capita, and then begins to decline as higher income ranges are approached. Demand for non-staples increases with income over all income ranges; however, total (staple + non-staple) demand saturates at high income level.
The Edmonds et al. (2017) approach uses 11 calibration parameters where the parameters are fit using pooled cross-sectional-timeseries observations and a Bayesian Markov Chain Monte Carlo method (MCMC; Hastings, 1988). The framework represents demand for three categories of goods: staples (s), non-staples (n) and materials (m) where materials represent everything in the economy other than staples and non-staple food commodities. The demand for these three categories changes with changes in income (Y) and prices (P), with the response to price changes varying with income. Expenditures on these three goods are assumed to exhaust income.
Demand for these three categories can be represented mathematically as, (2) Non-staple food demand: q n = A n (x hn(x) )(w e ij is defined in a general way, where g i,j are constants, i >= j and f i (x) = (δln(x hi(x) ))/(δln(x)).
If h and e were constants, h would be an income elasticity as x = Y /P m and e ij would be own and cross price elasticity as w i = P i /P m .
The following functional forms are chosen for h s and h n , (5) h s (x) = (λ/x)(1 + (κ/ln(x))) In addition to the above, two other scaling parameters are applied when normalizing the demand values to that of materials. These are psscl for staples and pnscl for non-staples.
The parameters are fit using a weighted least square log likelihood function (Caroll & Ruppert, 1988) described below.
where, y i is the ith data value andŷ i is the corresponding model output and w i is the weight assigned to the data point. Since the parameters were fit based on regional data, the regional population was used as the weight.
By applying the 11 parameters to the equations described above, the user can generate estimates of demand for staples and non-staple commodities in thousand calories across different income levels and prices.

Main functions and customization
The 'ambrosia_vignette provides usable examples for all the major functions within the code.
The ambrosia package can be easily loaded as a standard R package after installation from GitHub. The user can calculate demand for staples and non-staples using the food.dmnd() function. The user will have to pass in a dataset with the price of staples (P s), price of non-staples (P n), incomes (Y ) (Current income proxy used in ambrosia is GDP per capita in thousand USD). In addition to the dataset, the user must pass a vector of 11 calibration parameters. In order to functionalize the parameters, the code contains a function called vec2param() that will generate a parameter structure that can be used by the food demand function. The food demand function is implemented using equations (1), (2), (3) described above. The user can also calculate and analyze price elasticities using the function calc1ep s(). These elasticities are calculated in accordance with equation (4) described above.
An interactive version of the food demand model can be launched by the user through the runapp() function to explore the impact of different parameters.
One of the benefits of using ambrosia is that a user can estimate their own calibration parameters with a custom data set using the log-likelihood maximization approach. To enable this, ambrosia is equipped with a function create.dataset.for.parameter.fit() that will help a user generate a dataset that is appropriate for parameter estimation. The user can re-create the training data used to calculate the parameters for GCAM using the Process_D emand_Data.R under the scripts directory.
Users can complete the parameter estimation on the dataset returned by create.dataset. for.parameter.fit() with a call to the calculate.ambrosia.params() function. This function builds on the Edmonds et al. (2017) approach by maximizing the log-likelihood score using the optim() function. Note that the user can also choose to use a different method (for example , a MCMC) to maximize the log-likelihood function by first setting up the function using the mc.setup() function. The code contains an example of a MCMC implementation in C++ under scripts/cpp.
Pacific Northwest National Laboratory is operated for DOE by Battelle Memorial Institute under contract DE-AC05-76RL01830. The views and opinions expressed in this paper are those of the authors alone.