sptotal: an R package for predicting totals and weighted sums from spatial data

Summary
In ecological or environmental surveys, researchers often want to predict the mean or total of a variable over a finite region. However, because of time and money constraints, sampling the entire region is often infeasible. The sptotal R package provides software that predicts a quantity of interest, such as a total, and reports an associated standard error for the prediction. The predictor, referred to in the literature as the Finite Population Block Kriging (FPBK) predictor (J. M. Ver Hoef, 2008), accounts for possible spatial correlation in the data and incorporates an appropriate variance reduction for sampling from a finite population. In the remainder of the paper, we give an overview of both the background of the method and the sptotal package.


Statement of Need
sptotal provides an implementation of the Finite Population Block Kriging (FPBK) methods developed in J. Ver Hoef (2002) and J. M. Ver Hoef (2008). Next, we provide a short overview of FPBK.
Suppose that we have a response variable $Y(\mathbf{s}_i)$, $i = 1, 2, \ldots, N$, where the vector $\mathbf{s}_i$ contains the coordinates of the $i$th spatial location and $N$ is a finite number of spatial locations. Then $\mathbf{y}$, an $N$-length column vector of the $Y(\mathbf{s}_i)$, can be modeled with a spatial linear model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where $\mathbf{X}$ is a design matrix for the fixed effects and $\boldsymbol{\beta}$ is a parameter vector of fixed effects. The vector of random errors $\boldsymbol{\epsilon}$ follows a multivariate normal distribution with a mean vector of $\mathbf{0}$ and a covariance of $\mathrm{Var}(\boldsymbol{\epsilon}) = \sigma^2 \mathbf{R} + \tau^2 \mathbf{I}$, where $\sigma^2$ is the spatially dependent error variance (commonly called the partial sill), $\tau^2$ is the spatially independent error variance (commonly called the nugget), and $\mathbf{I}$ is the identity matrix. The entry in the $i$th row and $j$th column of the $N \times N$ spatial correlation matrix $\mathbf{R}$ is the correlation between the random error of the $i$th spatial location, $\epsilon_i$, and the random error of the $j$th spatial location, $\epsilon_j$. A common function used to generate $\mathbf{R}$ is the exponential correlation function (Cressie, 2015).
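As a minimal sketch of how such a covariance matrix can be assembled, the base R code below builds an exponential correlation matrix and the corresponding error covariance $\sigma^2 \mathbf{R} + \tau^2 \mathbf{I}$ for a small grid of sites. The coordinates, the parameter values, and the parameterization $\exp(-d_{ij}/\text{range})$ are assumptions chosen for illustration; the parameterization used internally by sptotal may differ.

```r
# Hypothetical site coordinates on a 3 x 3 grid (not from the paper).
coords <- expand.grid(x = 1:3, y = 1:3)

# Pairwise Euclidean distances between the N = 9 sites.
D <- as.matrix(dist(coords))

# Illustrative covariance parameters (assumed values).
sigma2    <- 2.0  # partial sill (spatially dependent variance)
tau2      <- 0.5  # nugget (spatially independent variance)
range_par <- 1.5  # range parameter controlling how fast correlation decays

# Exponential correlation: R[i, j] = exp(-d_ij / range).
R <- exp(-D / range_par)

# Covariance of the random errors: sigma^2 * R + tau^2 * I.
Sigma <- sigma2 * R + tau2 * diag(nrow(R))
```

Each diagonal entry of Sigma equals the partial sill plus the nugget, and off-diagonal entries shrink toward zero as the distance between sites grows.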
FPBK predicts some linear function of the response, $f(\mathbf{y}) = \mathbf{b}'\mathbf{y}$, where $\mathbf{b}$ is an $N$-length column vector of weights. A common choice of weights is a vector of 1's, so that the resulting prediction is for the total abundance across all sites. If only some of the values in $\mathbf{y}$ are observed, then the sptotal package can be used to find the Best Linear Unbiased Predictor (BLUP) for $\mathbf{b}'\mathbf{y}$, referred to as the FPBK predictor, along with its prediction variance.
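To make the idea concrete, the following base R sketch computes a BLUP of $\mathbf{b}'\mathbf{y}$ and its standard error when the covariance matrix is treated as known. This is an illustration under stated assumptions, not the package's implementation: the function name fpbk_sketch and the simulated data are hypothetical, and in practice the covariance parameters would be estimated from the data rather than supplied.

```r
# Sketch of an FPBK-style predictor with a KNOWN covariance matrix.
# y: length-N response with NA at unsampled sites
# X: N x p design matrix; Sigma: N x N covariance; b: length-N weights
fpbk_sketch <- function(y, X, Sigma, b) {
  s <- !is.na(y)  # sampled sites
  u <- !s         # unsampled sites
  Sss_inv <- solve(Sigma[s, s, drop = FALSE])
  Sus <- Sigma[u, s, drop = FALSE]
  Xs <- X[s, , drop = FALSE]
  Xu <- X[u, , drop = FALSE]

  # Generalized least squares estimate of beta and its covariance.
  V <- solve(t(Xs) %*% Sss_inv %*% Xs)
  beta_hat <- V %*% t(Xs) %*% Sss_inv %*% y[s]

  # Kriging predictions at the unsampled sites.
  yu_hat <- Xu %*% beta_hat + Sus %*% Sss_inv %*% (y[s] - Xs %*% beta_hat)

  # Observed values enter exactly; only unsampled sites are predicted.
  pred <- sum(b[s] * y[s]) + sum(b[u] * yu_hat)

  # Prediction variance: uncertainty comes only from the unsampled sites.
  W <- Xu - Sus %*% Sss_inv %*% Xs
  C <- Sigma[u, u, drop = FALSE] - Sus %*% Sss_inv %*% t(Sus) +
    W %*% V %*% t(W)
  pred_var <- drop(t(b[u]) %*% C %*% b[u])
  list(prediction = pred, se = sqrt(pred_var))
}

# Simulated example on a 5 x 5 grid (all values are assumptions).
set.seed(1)
coords <- expand.grid(x = 1:5, y = 1:5)
N <- nrow(coords)
Sigma <- 2 * exp(-as.matrix(dist(coords)) / 1.5) + 0.5 * diag(N)
X <- matrix(1, nrow = N, ncol = 1)                # intercept-only model
y_full <- drop(10 + t(chol(Sigma)) %*% rnorm(N))  # simulated response
y <- y_full
y[sample(N, 10)] <- NA                            # 10 unsampled sites
out <- fpbk_sketch(y, X, Sigma, b = rep(1, N))    # predict the total
```

Because sampled values are known exactly, sites that were observed contribute nothing to the prediction variance. This is the finite-population variance reduction: if every site with nonzero weight were sampled, the standard error would be zero.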
The primary functions in the sptotal package are described in the following section. In short, the FPBK method is implemented in sptotal's predict() generic function, which is used on a spatial model that is fit with sptotal::slmfit().

Package Methods
Before discussing comparable methods and R packages, we show how the main functions in sptotal can be used on a real data set to predict the total abundance of moose in a region of Alaska. We use the AKmoose_df data in the sptotal package, provided by the Alaska Department of Fish and Game.
The data set contains a response variable total, an x-coordinate centroid variable x, a y-coordinate centroid variable y, and the covariates elev_mean (the elevation) and strat (a stratification variable). There are 860 rows, each corresponding to a unique spatial location. Locations that were not surveyed have an NA value for total.
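A quick way to inspect these data is sketched below. The snippet assumes the sptotal package is installed and uses only base R functions to summarize the data set.

```r
library(sptotal)

# Load the moose data shipped with the package.
data("AKmoose_df", package = "sptotal")

# Inspect the column names and types.
str(AKmoose_df)

# Number of spatial locations (rows).
nrow(AKmoose_df)

# Count surveyed (TRUE) and unsurveyed (FALSE) locations;
# unsurveyed locations have NA for total.
table(surveyed = !is.na(AKmoose_df$total))
```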
The two primary functions in sptotal are slmfit(), which fits a spatial linear model, and predict.slmfit(), which uses FPBK to predict a quantity of interest (such as a mean or total) from a fitted slmfit object. slmfit() has required arguments formula, data, xcoordcol, and ycoordcol. If data is a simple features object from the sf (Pebesma & others, 2018) package, then xcoordcol and ycoordcol are not required. The CorModel argument specifies the correlation model used for the errors.
With the summary() generic, we obtain output similar to the summary output of a linear model fit with lm(), as well as a table of fitted covariance parameter estimates.
Next, we use predict() to implement FPBK and obtain a prediction for the total abundance across all spatial locations, along with a standard error for the prediction. By default, predict() gives a prediction for total abundance, though the default can be modified by specifying a column of prediction weights for the vector b with the wtscol argument. The printed output of predict() gives a table of prediction information, including the Prediction (a total abundance of 1610 moose, in this example), the SE (Standard Error) of the prediction, and bounds for a prediction interval (with a nominal level of 90% by default). Additionally, some summary information about the data set used is given.
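The workflow described above can be sketched as follows. The model formula total ~ strat and the weights column name wts_mean are assumptions made for illustration; the argument names follow the description in the text. The weights column is added before fitting so that it is available in the data stored with the fitted model.

```r
library(sptotal)
data("AKmoose_df", package = "sptotal")

# Hypothetical weights column for predicting the mean instead of the
# total: each of the N sites receives weight 1 / N.
AKmoose_df$wts_mean <- 1 / nrow(AKmoose_df)

# Fit a spatial linear model with an exponential correlation model.
# The choice of strat as the only predictor is an assumption.
moose_mod <- slmfit(
  formula = total ~ strat,
  data = AKmoose_df,
  xcoordcol = "x",
  ycoordcol = "y",
  CorModel = "Exponential"
)

# Fixed effects table and covariance parameter estimates.
summary(moose_mod)

# FPBK prediction of the total, with standard error and 90% interval.
predict(moose_mod)

# FPBK prediction of the mean, using the weights column defined above.
predict(moose_mod, wtscol = "wts_mean")
```

Supplying a different weights column changes only the vector b; the fitted model and its covariance parameter estimates are reused.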
sptotal also provides many helper generic functions for spatial linear models. The structure of the arguments and of the output of these generics often mirrors that of the generics used for base R linear models fit with lm(). Examples (applied to a fitted slmfit object named moose_mod) include AIC(moose_mod), coef(moose_mod), fitted(moose_mod), plot(moose_mod), and residuals(moose_mod).
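A short sketch of these helpers in use is given below. As before, the formula total ~ strat is an assumption; the generics called are those listed in the text.

```r
library(sptotal)
data("AKmoose_df", package = "sptotal")

# Refit the model so that this example is self-contained.
moose_mod <- slmfit(
  formula = total ~ strat,
  data = AKmoose_df,
  xcoordcol = "x",
  ycoordcol = "y",
  CorModel = "Exponential"
)

AIC(moose_mod)              # information criterion for model comparison
coef(moose_mod)             # estimated fixed effects
head(fitted(moose_mod))     # first few fitted values
head(residuals(moose_mod))  # first few residuals
plot(moose_mod)             # plot method for slmfit objects
```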

Comparable Methods and Related Work
Design-based analysis and k-nearest neighbors (Fix, 1985) are two other approaches that can be used to estimate a mean or total in a finite population. Dumelle, Higham, Ver Hoef, Olsen, & Madsen (2022) provide an overview of design-based spatial analysis and FPBK, showing that FPBK often outperforms the design-based analysis. J. M. Ver Hoef & Temesgen (2013) show that FPBK often outperforms k-nearest neighbors and highlight that quantifying uncertainty is much more challenging with k-nearest neighbors.
Many spatial packages in R can be used to predict values at unobserved locations, including gstat (Pebesma, 2004), geoR (Ribeiro Jr, Diggle, Schlather, Bivand, & Ripley, 2020), and spmodel (Dumelle, Higham, & Ver Hoef, 2023), among others. What sptotal contributes is the ability to obtain the appropriate variance of a linear combination of predicted values, incorporating a variance reduction when sampling from a finite number of sampling units.

Past and Ongoing Research Projects
Dumelle et al. (2022) used the sptotal package to compare model-based and design-based approaches for the analysis of spatial data. A Shiny app that uses sptotal to predict abundance from moose surveys conducted in Alaska is currently in development at the Alaska Department of Fish and Game.
Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

Disclaimer
The views expressed in this article are those of the author(s) and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.