performance: an R package for assessment, comparison and testing of statistical models

Summary
A crucial part of statistical analysis is evaluating a model's quality and fit, or performance. During analysis, especially with regression models, investigating how well models fit the data often involves selecting the best-fitting model from among many competing models. Once this investigation is done, fit indices should be reported both visually and numerically so that readers can follow the investigative effort.
The performance package for R (R Core Team, 2021) provides utilities for computing measures to assess model quality, many of which are not directly provided by R's base or stats packages. These include measures like R², the intraclass correlation coefficient (ICC), and the root mean squared error (RMSE), as well as functions to check for vexing issues like overdispersion, singularity, or zero-inflation. These functions support a large variety of regression models, including generalized linear models, (generalized) mixed-effects models, their Bayesian cousins, and many others.

Statement of Need
While functions to build diagnostic plots or to compute fit statistics exist, they are scattered across many packages, so there is no single, consistent approach to assessing the performance of many types of models. The result is a difficult-to-navigate, unorganized ecosystem of individual packages with different syntax, which makes it onerous for researchers to locate and use the fit indices relevant to their purposes. The performance package fills this gap by offering researchers a suite of intuitive functions with consistent syntax for computing, building, and presenting regression model fit statistics and visualizations.
performance is part of the easystats ecosystem, which is a collaborative project focused on facilitating simple and intuitive usage of R for statistical analysis (Lüdecke, Ben-Shachar, Patil, Waggoner, et al., 2020; Makowski et al., 2019, 2020).

Comparison to other Packages
Compared to other packages (e.g., lmtest (Zeileis & Hothorn, 2002), MuMIn (Barton, 2020), car (Fox & Weisberg, 2019), broom (Robinson et al., 2020)), the performance package offers functions for checking model validity and quality systematically and comprehensively for many regression model objects, such as (generalized) linear models, mixed-effects models, and Bayesian models. performance also offers functions to compare and test multiple models simultaneously to identify the model that best fits the data.

Checking Model Assumptions
Inferences made from regression models, such as significance tests or the interpretation of coefficients, rely on several assumptions, which vary with the type of model. performance offers a collection of functions to check whether these assumptions are met. To demonstrate the efficiency of the package, we provide examples for a few individual functions, followed by a broader function that runs a comprehensive suite of checks in a single call.
For example, linear (Gaussian) models assume constant error variance (homoscedasticity). We can use check_heteroscedasticity() from performance to check if this assumption has been violated.

#> Warning: Heteroscedasticity (non-constant error variance) detected (p = 0.031).
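To make the idea behind this kind of check concrete, the sketch below implements a Breusch-Pagan (Cook-Weisberg) score test for non-constant error variance using only base R. It is an illustration under stated assumptions, not a copy of performance's implementation: the helper score_test_ncv() is hypothetical, and the model lm(mpg ~ wt + cyl + gear + disp, data = mtcars) (mtcars ships with R's datasets package) is an illustrative choice that is not necessarily the model behind the output above. The packaged check_heteroscedasticity() is called only if performance is installed.

```r
# Hypothetical helper: Breusch-Pagan / Cook-Weisberg score test for
# non-constant error variance, written with base R (stats) only.
score_test_ncv <- function(fit) {
  r <- residuals(fit)
  n <- length(r)
  # Scale squared residuals by the ML estimate of the error variance
  u <- r^2 / (sum(r^2) / n)
  # Auxiliary regression of the scaled squared residuals on fitted values
  aux <- lm(u ~ fitted(fit))
  # Score statistic: half the regression sum of squares, ~ chi-squared(1)
  statistic <- sum((fitted(aux) - mean(u))^2) / 2
  p_value <- pchisq(statistic, df = 1, lower.tail = FALSE)
  list(statistic = statistic, p_value = p_value)
}

# Illustrative model (an assumption, not necessarily the one behind the output above)
model <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
res_ncv <- score_test_ncv(model)
print(res_ncv)

# The packaged check, run only if performance is installed
if (requireNamespace("performance", quietly = TRUE)) {
  print(performance::check_heteroscedasticity(model))
}
```

A small p-value suggests that the spread of the residuals changes with the fitted values. Regressing on the fitted values, rather than on every predictor, keeps the test at one degree of freedom.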
As another example, Poisson regression models assume equidispersion, that is, that the conditional variance of the outcome equals its conditional mean. A common violation of this assumption is overdispersion, which occurs when the observed variance in the data is higher than the variance expected under the model. We can call check_overdispersion() to check whether overdispersion is an issue.
In addition to tests for checking assumptions, performance also provides convenience functions to visually assess the assumptions of regression models. These visual checks detect the type of model passed to the function call and return the appropriate plots for that model type. Many regression models are currently supported, such as linear models, linear mixed-effects models, and their Bayesian equivalents; see the package documentation for a complete list.
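Returning to the overdispersion check described above, a common diagnostic for a Poisson GLM is the ratio of the Pearson chi-squared statistic to the residual degrees of freedom. This ratio should be close to 1 under equidispersion and clearly above 1 under overdispersion. The base-R sketch below computes it on simulated negative-binomial counts, which are an assumption chosen so that overdispersion is present. dispersion_check() is a hypothetical helper, and we do not claim it reproduces check_overdispersion() for every model class. The packaged function is called only if performance is installed.

```r
# Simulated counts (an assumption): negative-binomial data have
# variance mu + mu^2 / size, which exceeds the Poisson variance mu
set.seed(123)
n <- 200
x <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)
y <- rnbinom(n, mu = mu, size = 1.5)

# Poisson GLM that (wrongly) assumes equidispersion
fit <- glm(y ~ x, family = poisson)

# Hypothetical helper: Pearson chi-squared dispersion ratio and p-value
dispersion_check <- function(fit) {
  rp <- residuals(fit, type = "pearson")
  chisq <- sum(rp^2)
  rdf <- df.residual(fit)
  list(
    dispersion_ratio = chisq / rdf,  # approximately 1 under equidispersion
    p_value = pchisq(chisq, df = rdf, lower.tail = FALSE)
  )
}

res_disp <- dispersion_check(fit)
print(res_disp)

# The packaged check, run only if performance is installed
if (requireNamespace("performance", quietly = TRUE)) {
  print(performance::check_overdispersion(fit))
}
```

If the ratio is well above 1, a negative-binomial or quasi-Poisson model is a common alternative.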
For example, consider the visual checks from a simple linear regression model.
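Visual checks like these can be requested with a single call to check_model() from performance. Its plots are drawn with the see package, so the sketch below checks for both packages before calling it. The model formula is an illustrative assumption. As a base-R fallback, plot() on an lm object draws the four classic diagnostic panels. The hypothetical helper diagnostics_table() gathers the quantities those panels display into a data frame so they can also be inspected numerically.

```r
# Illustrative model (an assumption chosen for this sketch)
model <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)

# Packaged visual checks; plotting relies on the 'see' package
if (requireNamespace("performance", quietly = TRUE) &&
    requireNamespace("see", quietly = TRUE)) {
  print(performance::check_model(model))
}

# Base-R fallback: residuals vs fitted, normal Q-Q, scale-location, leverage
op <- par(mfrow = c(2, 2))
plot(model)
par(op)

# Hypothetical helper collecting the quantities shown in those plots
diagnostics_table <- function(fit) {
  data.frame(
    fitted = fitted(fit),
    residual = residuals(fit),
    std_residual = rstandard(fit),
    leverage = hatvalues(fit),
    cooks_distance = cooks.distance(fit)
  )
}

diag_df <- diagnostics_table(model)
print(head(diag_df))
```

Large standardized residuals, high leverage, or large Cook's distances flag observations that deserve a closer look.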

Computing Quality Indices of Models
performance offers a number of indices to assess the goodness of fit of a model. For example, R², also known as the coefficient of determination, is a popular statistical measure of the proportion of variance in the dependent variable accounted for by the specified model. The r2() function from performance computes and returns this index for a variety of regression models. Depending on the model, the returned value may be R² (together with adjusted R²), a pseudo-R², or marginal and conditional R² for mixed models.
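For an ordinary linear model, R² equals 1 minus the ratio of the residual sum of squares to the total sum of squares. Adjusted R² additionally penalises for the degrees of freedom used by the predictors. The base-R sketch below computes both by hand for an illustrative model (an assumption) so the definitions can be checked against summary.lm(). The helper r2_by_hand() is hypothetical, and r2() from performance is called only if the package is installed.

```r
# Illustrative model (an assumption chosen for this sketch)
model <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)

# Hypothetical helper: R2 and adjusted R2 from their definitions
r2_by_hand <- function(fit) {
  y <- model.response(model.frame(fit))
  ss_res <- sum(residuals(fit)^2)
  ss_tot <- sum((y - mean(y))^2)
  r2 <- 1 - ss_res / ss_tot
  n <- length(y)
  # Adjusted R2 rescales by (n - 1) / residual degrees of freedom
  r2_adj <- 1 - (1 - r2) * (n - 1) / df.residual(fit)
  c(R2 = r2, R2_adjusted = r2_adj)
}

r2_manual <- r2_by_hand(model)
print(r2_manual)

# The packaged index, computed only if performance is installed
if (requireNamespace("performance", quietly = TRUE)) {
  print(performance::r2(model))
}
```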

Testing Models
While comparing these indices is often useful, making a decision, such as whether to keep or drop a model, can be difficult because some indices can give conflicting suggestions. Additionally, it may be unclear which index to favour in a given context. This difficulty is one reason why tests are useful: they facilitate decisions via "significance" indices such as p-values (in a frequentist framework) or Bayes factors (in a Bayesian framework).
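To see why indices alone can leave the decision open, the sketch below tabulates AIC, BIC, and adjusted R² for two candidate models using base R. Lower AIC and BIC and higher adjusted R² indicate better fit. Because BIC penalises each parameter by log(n) rather than 2, it is stricter than AIC for all but very small samples, so the two criteria can rank models differently. The two formulas and the helper index_table() are illustrative assumptions. With performance installed, compare_performance() is also called to produce its own comparison table.

```r
# Two candidate models (illustrative assumptions)
m1 <- lm(mpg ~ wt + cyl, data = mtcars)
m2 <- lm(mpg ~ wt + cyl + hp + disp + drat, data = mtcars)

# Hypothetical helper tabulating a few fit indices per named model
index_table <- function(...) {
  fits <- list(...)
  data.frame(
    model = names(fits),
    AIC = sapply(fits, AIC),
    BIC = sapply(fits, BIC),
    R2_adjusted = sapply(fits, function(f) summary(f)$adj.r.squared),
    row.names = NULL
  )
}

indices <- index_table(m1 = m1, m2 = m2)
print(indices)

# The packaged comparison, run only if performance is installed
if (requireNamespace("performance", quietly = TRUE)) {
  print(performance::compare_performance(m1, m2))
}
```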
The generic test_performance() function computes the appropriate test(s) based on the supplied input. For instance, the following example shows results from Vuong's Test (Vuong, 1989).
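Vuong's test compares two models through the pointwise differences in their log-likelihoods. If both models are equally close to the true data-generating process, the mean difference should be near zero. The base-R sketch below computes the basic, uncorrected Vuong z-statistic for two non-nested linear models. Several parts of this sketch are assumptions: the models, the hypothetical helpers pointwise_loglik() and vuong_z(), and the omission of the AIC/BIC-type corrections and of the preliminary variance (distinguishability) test. Its numbers therefore need not match those reported by test_performance(), which is called only if performance is installed.

```r
# Pointwise Gaussian log-likelihood of an lm fit, using the ML variance
# estimate so that the terms sum to logLik(fit)
pointwise_loglik <- function(fit) {
  r <- residuals(fit)
  sigma_ml <- sqrt(sum(r^2) / length(r))
  dnorm(r, mean = 0, sd = sigma_ml, log = TRUE)
}

# Hypothetical helper: uncorrected Vuong z-test for non-nested models
vuong_z <- function(fit1, fit2) {
  m <- pointwise_loglik(fit1) - pointwise_loglik(fit2)
  n <- length(m)
  omega <- sqrt(mean((m - mean(m))^2))  # ML standard deviation of differences
  z <- sqrt(n) * mean(m) / omega
  # Positive z favours fit1, negative z favours fit2
  list(z = z, p_value = 2 * pnorm(-abs(z)))
}

# Two non-nested candidate models (illustrative assumptions)
m_a <- lm(mpg ~ wt, data = mtcars)
m_b <- lm(mpg ~ hp, data = mtcars)

vt <- vuong_z(m_a, m_b)
print(vt)

# The packaged test, run only if performance is installed
if (requireNamespace("performance", quietly = TRUE)) {
  print(performance::test_performance(m_a, m_b))
}
```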