Empirical and non-parametric copula models with the cort R package

Summary
The R package cort implements object-oriented classes and methods to estimate, simulate and visualize certain types of non-parametric copulas.
Copulas are functions that describe the dependence structure of a dataset, or of a multivariate random vector, separately from the univariate distributions (the marginals). In statistics, it is sometimes useful to separate the marginal distributions (e.g., monetary inflation, population mortality) from the dependence structure between them, since estimating each part separately is usually easier. Copulas are widely used to model dependence in finance, actuarial science, geostatistics, biostatistics, and many other fields.
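Pseudo-observations show this separation concretely. Replacing each coordinate by its normalized rank discards the marginals and keeps only the dependence. The plain-Python sketch below is illustrative and is not the cort API. The function name `pseudo_observations` and the `rank / (n + 1)` scaling are assumptions; it is one common convention.

```python
import math
import random

# Illustrative sketch in plain Python; `pseudo_observations` is a hypothetical
# helper, not a cort function.

def pseudo_observations(data):
    """Map each column of `data` (a list of rows) to normalized ranks in (0, 1).

    Uses rank / (n + 1), one common convention; ties are not handled.
    """
    n, d = len(data), len(data[0])
    u = [[0.0] * d for _ in range(n)]
    for j in range(d):
        # Row indices sorted by the j-th coordinate.
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order, start=1):
            u[i][j] = rank / (n + 1)
    return u

rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(200)]
# Dependent pair: the second coordinate is the first plus Gaussian noise.
data = [[x, x + rng.gauss(0, 0.5)] for x in xs]

# Strictly increasing transforms change both margins but not the dependence.
transformed = [[math.exp(x), y ** 3] for x, y in data]
```

Strictly increasing transformations of each coordinate leave the ranks unchanged, so `pseudo_observations(data)` and `pseudo_observations(transformed)` are identical. Only the dependence structure survives the transformation.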
Copulas are distribution functions on the unit hypercube with uniform margins (what we call the 'copula constraints'), so this package can be classified as density estimation software. Although copula estimation is a widely studied subject, most of the best-performing estimators in the literature rely on restricted, parametric estimation. For example, vine copulas (Nagler & Czado, 2016) and graphical models (Li et al., 2019) are potential solutions, but only under restrictive assumptions. Classical density estimators such as kernels or wavelets do not satisfy the marginal copula constraints. Several tree-structured, piecewise-constant density estimators also exist, but they do not always yield proper copulas when applied to pseudo-observations or to true copula samples. The new models implemented in this package aim to solve these issues.
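The copula constraints can be checked exactly for any density that is uniform on a finite set of axis-aligned boxes. This is the shape produced by piecewise-constant and tree-structured estimators. In the plain-Python sketch below, the helper names and the `(lower, upper, weight)` box representation are hypothetical and not part of cort. Along each axis, the marginal density is constant between consecutive box edges, so comparing each interval's mass with its length is an exact test.

```python
from fractions import Fraction

# Illustrative sketch; helper names and the box representation are
# hypothetical, not cort functions. A box is (lower_corner, upper_corner, weight)
# and carries its weight uniformly over its volume.

def marginal_masses(boxes, dim):
    """Return ((a, b), mass) for each interval between consecutive box edges on `dim`."""
    edges = sorted({Fraction(0), Fraction(1)}
                   | {lo[dim] for lo, _, _ in boxes}
                   | {hi[dim] for _, hi, _ in boxes})
    result = []
    for a, b in zip(edges[:-1], edges[1:]):
        mass = Fraction(0)
        for lo, hi, w in boxes:
            overlap = min(b, hi[dim]) - max(a, lo[dim])
            if overlap > 0:
                # Share of the box's weight lying in [a, b] along `dim`.
                mass += w * overlap / (hi[dim] - lo[dim])
        result.append(((a, b), mass))
    return result

def satisfies_copula_constraints(boxes, d):
    """Total mass 1 and uniform margins (each interval's mass equals its length)."""
    if sum(w for _, _, w in boxes) != 1:
        return False
    return all(
        mass == b - a
        for j in range(d)
        for (a, b), mass in marginal_masses(boxes, j)
    )

h = Fraction(1, 2)
# Independence copula split into four equal squares: a valid copula.
independence = [
    ((0, 0), (h, h), Fraction(1, 4)), ((h, 0), (1, h), Fraction(1, 4)),
    ((0, h), (h, 1), Fraction(1, 4)), ((h, h), (1, 1), Fraction(1, 4)),
]
# A two-leaf split with 70% of the mass on the left half: total mass is 1,
# but the first margin is not uniform.
skewed = [((0, 0), (h, 1), Fraction(7, 10)), ((h, 0), (1, 1), Fraction(3, 10))]
```

`satisfies_copula_constraints(independence, 2)` holds. `satisfies_copula_constraints(skewed, 2)` fails: the first margin puts mass 7/10 on an interval of length 1/2, even though the total mass is one.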
Many tools for copula modeling are available in R through the excellent copula package (Hofert et al., 2020; Kojadinovic & Yan, 2010; Yan, 2007; Hofert & Mächler, 2011). Most of these tools, however, focus on parametric estimation. We start to bridge the gap by providing tools for non-parametric estimation.

Statement of need
The Copula Recursive Tree, or Cort, designed by ) is a flexible, consistent, piecewise-linear estimator of a copula (Sklar, 1959). It leverages the patchwork copula formalization (Durante et al., 2015) and a specific piecewise-constant density estimator, the density estimation tree (Ram & Gray, 2011). A patchwork copula relies on an imposed grid. This estimator is instead data-driven: it constructs the grid recursively from the data by minimizing a chosen distance on the copula space. Furthermore, imposing the copula constraints makes the available density estimation solutions unusable. Our estimator is concerned only with dependence and guarantees uniform margins. The R package cort provides an implementation of this model and several potential refinements, allowing fast computation of Cort trees and parallel computation of Cort forests.
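A density that is uniform on each of a finite set of boxes gives a copula whose CDF is piecewise linear (multilinear within each box). Such a copula is also easy to simulate from. The plain-Python sketch below shows both operations on a hand-written list of boxes. The `(lower, upper, weight)` representation and the names `patchwork_cdf` and `sample` are assumptions for illustration, not cort's internal representation. The recursive, distance-minimizing construction of the boxes that defines Cort is not reproduced here.

```python
import random
from fractions import Fraction

# Illustrative sketch; the box representation and function names are
# hypothetical, not cort internals. A box is (lower_corner, upper_corner, weight).

def patchwork_cdf(boxes, u):
    """CDF at point u of a density that is uniform on each box.

    Each box adds weight * prod_k clip((u_k - lower_k) / (upper_k - lower_k), 0, 1),
    so the CDF is piecewise multilinear in u.
    """
    total = Fraction(0)
    for lower, upper, weight in boxes:
        share = Fraction(weight)
        for k, uk in enumerate(u):
            frac = Fraction(uk - lower[k]) / (upper[k] - lower[k])
            share *= min(max(frac, Fraction(0)), Fraction(1))
        total += share
    return total

def sample(boxes, n, rng):
    """Draw n points: choose a box with probability equal to its weight,
    then draw a uniform point inside it."""
    weights = [float(w) for _, _, w in boxes]
    picks = rng.choices(boxes, weights=weights, k=n)
    return [
        tuple(float(a) + rng.random() * float(b - a) for a, b in zip(lower, upper))
        for lower, upper, _ in picks
    ]

h = Fraction(1, 2)
# Two diagonal squares with weight 1/2 each: a valid 2x2 checkerboard copula.
diagonal = [((0, 0), (h, h), h), ((h, h), (1, 1), h)]

print(patchwork_cdf(diagonal, (Fraction(5, 8), 1)))  # 5/8, i.e. C(u, 1) = u
```

Because the boxes satisfy the copula constraints, `patchwork_cdf(diagonal, (t, 1))` returns `t` for every `t` in [0, 1]. The sampler only ever produces points inside the two diagonal squares.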
The main feature implemented in the package is the Cort algorithm, a non-parametric, piecewise-constant copula density estimator. Its implementation is recursive and efficient. More generally, cort is a statistical package for estimating several non-parametric copula models in R. The state-of-the-art copula package already provides functions to estimate the empirical copula. Beyond this, we provide a structured set of S4 classes for estimating empirical copulas, checkerboard copulas, Cort copulas, and bagged versions of all of these. A dedicated class handles bagging of Cort models. It runs in parallel through the future package (Bengtsson, 2020) to speed up computations. Most of the underlying computations are written in C++ through the Rcpp package (Eddelbuettel, 2013; Eddelbuettel & Balamuta, 2017; Eddelbuettel & François, 2011; Eddelbuettel & Sanderson, 2014).
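The empirical checkerboard copula and bagging can be sketched in plain Python. This is not the cort API, which uses S4 classes in R backed by C++ code. The function names, the rank-binning rule `rank * m // n`, and the use of subsampling without replacement for bagging are all assumptions. They were chosen so that every fitted model, and hence their average, has exactly uniform margins.

```python
import random
from collections import Counter
from fractions import Fraction

# Illustrative sketch in plain Python; function names and the bagging scheme
# (subsampling without replacement) are hypothetical, not the cort API.

def checkerboard(data, m):
    """Weights of an empirical checkerboard copula on an m x ... x m grid.

    Each coordinate is replaced by its rank r in 0..n-1 and binned as r * m // n.
    When m divides n, every marginal bin holds exactly n / m points, so the
    margins are exactly uniform.
    """
    n, d = len(data), len(data[0])
    if n % m != 0:
        raise ValueError("this sketch assumes m divides n")
    cells = [[0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order):
            cells[i][j] = rank * m // n
    counts = Counter(tuple(c) for c in cells)
    return {cell: Fraction(c, n) for cell, c in counts.items()}

def bagged_checkerboard(data, m, n_models, size, rng):
    """Average checkerboards fitted on random subsamples drawn without replacement.

    A mixture of copulas is again a copula, so the average keeps uniform margins.
    """
    total = Counter()
    for _ in range(n_models):
        sub = rng.sample(data, size)
        for cell, w in checkerboard(sub, m).items():
            total[cell] += w / n_models
    return dict(total)

def margin(weights, j, m):
    """Mass of each of the m bins along dimension j."""
    out = [Fraction(0)] * m
    for cell, w in weights.items():
        out[cell[j]] += w
    return out

rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(200)]
data = [(x, x + rng.gauss(0, 0.5)) for x in xs]
bagged = bagged_checkerboard(data, m=4, n_models=10, size=100, rng=rng)
print(margin(bagged, 0, 4))  # four bins of mass 1/4 each
```

Averaging works here because a convex combination of copulas is itself a copula. Bootstrap resampling with replacement would introduce ties, so it would need extra care to keep the margins exactly uniform.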
The cort package was designed for statisticians who need a non-parametric view of the dependence structure of a dataset. It provides an extensive API for the statistical fitting procedures, plotting functions, and tools to assess the quality of the fits, while complying with R standards. Example datasets are included in the package, and its many vignettes illustrate use cases. The package is available on the Comprehensive R Archive Network (CRAN).