ccostr: An R package for estimating mean costs with censored data

Censoring is a frequent obstacle when working with time-to-event data, since, for example, not all patients in a medical study can be observed until death. The Kaplan-Meier estimator is useful for estimating the distribution of the time to event, but not for estimating mean costs, because costs, unlike time, typically do not accumulate at a constant rate. Costs often accumulate at a higher rate at the beginning (e.g. at diagnosis) and at the end (e.g. at death) of the study.

Several methods for estimating mean costs from censored data have been developed. Of note is the work by Lin, Feuer, Etzioni, & Wax (1997), who proposed three different estimators. The first, LinT, partitions the time period into small intervals and then estimates the mean cost by weighting the mean total cost of fully observed individuals in each interval by the probability of dying in that interval. The other two, LinA and LinB, weight the mean total cost within each interval by the probability of being alive at the start or the end of the interval, respectively.
Later, Bang & Tsiatis (2000) proposed another method based on inverse probability weighting, where complete (fully observed) cases are weighted by the inverse of the probability of not being censored before their event time. Two estimators were presented: the simple weighted estimator, BT, which uses total costs for fully observed cases, and the partitioned estimator, BTp, which utilizes cost history. Zhao & Tian (2001) proposed an extension of the BT estimator, ZT, which includes cost history from both censored and fully observed cases. The ZT estimator was later simplified by Pfeifer & Bang (2005).
Zhao, Bang, Wang, & Pfeifer (2007) demonstrated the similarity of the different estimators when the distinct censoring times are used to define the intervals. They concluded that the following equalities hold for the estimates of mean cost: $\mu_{BT} = \mu_{LinT}$ and $\mu_{LinA} = \mu_{LinB} = \mu_{BT_p} = \mu_{ZT}$. The estimators can therefore be split into two classes: those that use cost history and those that do not. Because cost history contributes additional information, the estimators that use it are in general more efficient and should be chosen whenever cost history is available.
These estimators have previously been implemented in Stata, first by Kim & Thompson (2011), who implemented the method from Lin et al. (1997), and later by Chen, Rolfes, & Zhao (2015), who implemented the BT and ZT estimators, and in SAS by Zhao & Wang (2010). To our knowledge, none of the methods has previously been implemented in an R package.

Estimators
The R package ccostr includes four different estimators of the mean cost. The available sample (AS) estimator simply averages the total cost per individual, disregarding censoring; it is biased downwards because costs after censoring are not accounted for. The complete case (CC) estimator averages the cost over fully observed cases only, which biases the estimate towards the mean cost of individuals with shorter survival and is therefore typically also biased downwards. These two naive estimators are included as references for the estimators that account for censoring. For censored data we implement the BT and ZT estimators, which handle situations without and with cost history, respectively.

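Assume we observe $\{(T_i, \Delta_i, M_i(u)) : u \le T_i,\ i = 1, \dots, n\}$, where $n$ is the number of individuals, $T_i$ is the observation time, $M_i(u)$ is the cost accumulated until time $u$, and $\Delta_i$ is the event indicator for individual $i$, with $\Delta_i = 1$ for fully observed cases and $\Delta_i = 0$ for censored cases. The estimates are then given as follows.

Naive available sample estimator and complete case estimator:

$$\mu_{AS} = \frac{1}{n}\sum_{i=1}^{n} M_i, \qquad \mu_{CC} = \frac{\sum_{i=1}^{n} \Delta_i M_i}{\sum_{i=1}^{n} \Delta_i},$$

where $M_i = M_i(T_i)$ denotes the total cost.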
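The two naive estimators can be illustrated with a few lines of base R. This is a minimal sketch on a hypothetical five-person data set; the data frame `toy` and its column names (`time`, `delta`, `cost`) are made up for illustration and are not the input format of ccostr.

```r
# Hypothetical toy data: observation time T_i, event indicator Delta_i
# (1 = fully observed, 0 = censored) and total observed cost M_i.
toy <- data.frame(
  time  = c(2, 4, 5, 7, 9),
  delta = c(1, 0, 1, 0, 1),
  cost  = c(300, 250, 600, 400, 900)
)

# Available sample: average the observed cost over everyone,
# ignoring that censored individuals have incomplete costs.
mu_as <- mean(toy$cost)

# Complete case: average the cost over fully observed individuals only.
mu_cc <- sum(toy$delta * toy$cost) / sum(toy$delta)

print(c(AS = mu_as, CC = mu_cc))
```

On these numbers the AS estimate is 490 and the CC estimate is 600; neither corrects for the costs that the two censored individuals would still have accrued.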
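Bang and Tsiatis's estimator (also known as the weighted complete case estimator):

$$\mu_{BT} = \frac{1}{n}\sum_{i=1}^{n} \frac{\Delta_i M_i}{K(T_i)},$$

where $K(T_i)$ is the Kaplan-Meier estimator of the censoring distribution, i.e. of the probability of not being censored before time $T_i$, the time of event for individual $i$.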
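The inverse probability weighting behind the BT estimator can be sketched in base R as follows. The helper `km_censoring()` and the toy data are hypothetical and written for illustration only; they are not the ccostr implementation. The sketch assumes no ties between event and censoring times and no truncation at a time limit L.

```r
# Hypothetical toy data (same layout as a simple survival data set).
toy <- data.frame(
  time  = c(2, 4, 5, 7, 9),
  delta = c(1, 0, 1, 0, 1),   # 1 = fully observed, 0 = censored
  cost  = c(300, 250, 600, 400, 900)
)

# Kaplan-Meier estimate of the censoring distribution: censoring is
# treated as the "event", so K(t) estimates the probability of still
# being uncensored at time t. Returns a function of t.
km_censoring <- function(time, delta) {
  cens_times <- sort(unique(time[delta == 0]))
  factors <- vapply(cens_times, function(s) {
    at_risk <- sum(time >= s)               # still under observation at s
    n_cens  <- sum(time == s & delta == 0)  # censored exactly at s
    1 - n_cens / at_risk
  }, numeric(1))
  function(t) {
    vapply(t, function(u) prod(factors[cens_times <= u]), numeric(1))
  }
}

K   <- km_censoring(toy$time, toy$delta)
obs <- toy$delta == 1

# Each complete case is weighted by 1 / K(T_i); censored cases get weight 0.
mu_bt <- sum(toy$cost[obs] / K(toy$time[obs])) / nrow(toy)
print(mu_bt)
```

Individuals who die late are less likely to be observed completely, so they receive larger weights; this is what moves the estimate above the complete case average on the same data.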

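Zhao and Tian's estimator (also known as the weighted available sample estimator):

$$\mu_{ZT} = \frac{1}{n}\sum_{i=1}^{n} \frac{\Delta_i M_i}{K(T_i)} + \frac{1}{n}\sum_{i=1}^{n} \frac{(1 - \Delta_i)\,\bigl(M_i - \bar{M}(C_i)\bigr)}{K(C_i)},$$

where $C_i$ is the censoring time of individual $i$, $\bar{M}(C_i)$ is the average cost until time $C_i$ among individuals with event time later than $C_i$, and $K(C_i)$ is the Kaplan-Meier estimator of the censoring distribution at time $C_i$.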
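The ZT estimator adds a correction term for the censored cases to the BT estimate, and this can be sketched in base R too. Everything here is hypothetical: the toy data, the helper functions, and in particular the cost history, which is assumed to accrue at a constant rate per individual purely to keep the example short. The risk set for $\bar{M}(C_i)$ is taken to be everyone still under observation at $C_i$, which is an assumption of this sketch and may differ in detail from the package.

```r
toy <- data.frame(
  time  = c(2, 4, 5, 7, 9),
  delta = c(1, 0, 1, 0, 1),   # 1 = fully observed, 0 = censored
  cost  = c(300, 250, 600, 400, 900)
)

# Kaplan-Meier estimate of the censoring distribution (assumes no ties
# between event and censoring times).
km_censoring <- function(time, delta) {
  cens_times <- sort(unique(time[delta == 0]))
  factors <- vapply(cens_times, function(s) {
    1 - sum(time == s & delta == 0) / sum(time >= s)
  }, numeric(1))
  function(t) {
    vapply(t, function(u) prod(factors[cens_times <= u]), numeric(1))
  }
}

# ASSUMED cost history: individual j accrues cost at a constant rate,
# so the cost until time u is proportional to u (capped at time_j).
cost_until <- function(j, u) {
  toy$cost[j] * pmin(u, toy$time[j]) / toy$time[j]
}

K   <- km_censoring(toy$time, toy$delta)
n   <- nrow(toy)
obs <- toy$delta == 1

# First term: the BT estimate from complete cases.
mu_bt <- sum(toy$cost[obs] / K(toy$time[obs])) / n

# Second term: for each censored individual, compare the cost at the
# censoring time with the average cost at that time in the risk set.
censored <- which(toy$delta == 0)
correction <- vapply(censored, function(i) {
  c_i     <- toy$time[i]
  at_risk <- which(toy$time >= c_i)
  m_bar   <- mean(cost_until(at_risk, c_i))
  (toy$cost[i] - m_bar) / K(c_i)
}, numeric(1))

mu_zt <- mu_bt + sum(correction) / n
print(c(BT = mu_bt, ZT = mu_zt))
```

The correction uses the costs that censored individuals accrued before censoring, information that the BT estimator discards.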

Application
We have implemented the estimators above in an R package, ccostr. The package includes two main functions. The first is ccmean(), which calculates the mean cost until a time limit, specified with the parameter L, and takes as input a data frame in the following format:

The data shown above are simulated data from the Stata hcost package (Chen et al., 2015). Applying ccmean() to the data with a time limit of L = 1461 gives results identical to hcost in Chen et al. (2015). The option addInterpol adds a small value to the numerator and denominator of the fraction used for interpolation of cost at unobserved times. It is used here only to mimic the implementation in hcost; by default it is set to zero.

The second main function in ccostr is simCostData(). This function simulates data in the correct format according to the method in Lin et al. (1997), and may be used for testing purposes:

    sim <- simCostData(n = 1000, dist = "unif", censor = "heavy", L = 10)
    head(sim$censoredCostHistory)

The true mean cost of the simulated dataset is 40,000 (Lin et al., 1997). Applying the ccmean() function to the simulated data yields the result below. Here we present the result graphically, using the built-in plotting function for an object of the ccobject class.
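As a rough sketch of how a call to ccmean() might look on a small hand-made data set, consider the example below. The column names (`id`, `start`, `stop`, `cost`, `delta`, `surv`) reflect our understanding of the package documentation and should be treated as an assumption, as should the made-up values; the call is guarded so that the snippet also runs when ccostr is not installed.

```r
# Hypothetical input in long format: one row per cost record.
# Assumed columns: id (individual), start/stop (period of the cost
# record), cost, delta (1 = fully observed, 0 = censored) and
# surv (observed survival or follow-up time of the individual).
cost_data <- data.frame(
  id    = c("A", "A", "A", "B", "C", "C", "D"),
  start = c(1, 30, 88, 18, 1, 67, 43),
  stop  = c(1, 82, 88, 198, 5, 88, 44),
  cost  = c(550, 1949, 45, 4245, 23, 4381, 790),
  delta = c(1, 1, 1, 0, 1, 1, 1),
  surv  = c(343, 343, 343, 903, 445, 445, 652)
)

# Only call the package if it is available.
if (requireNamespace("ccostr", quietly = TRUE)) {
  est <- ccostr::ccmean(cost_data, L = 1000)  # mean cost up to L = 1000
  print(est)
}
```

Rows that share an id are assumed to belong to the same individual, so delta and surv are repeated on each of that individual's cost records.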