tehtuner: An R package to fit and tune models for the conditional average treatment effect

Randomized clinical trials (RCTs) often test and describe the average treatment effect, that is, how much the candidate intervention is expected to increase or decrease an outcome of interest across all patients in the population of interest. However, secondary analyses may seek to identify ways in which underlying subject characteristics, such as age or health status, modify the expected treatment effect, resulting in treatment effect heterogeneity. This phenomenon is expressed through the conditional average treatment effect (CATE), which is defined as

$$\tau(x) = E[Y(1) - Y(0) \mid X = x],$$

where $X$ is a vector of covariate measurements and $Y(1)$ and $Y(0)$ are the potential outcomes that would be observed under the treatment and control arms, respectively. Information about the CATE can then be used to determine the optimal treatment on a subject-to-subject basis (i.e., "personalized medicine") or to identify sub-populations for which additional interventions or support are needed.
tehtuner fits models to estimate the CATE using the Virtual Twins method (Foster et al., 2011) while controlling the method's probability of falsely detecting treatment effect modifiers when all subjects would respond to treatment identically. It does so by implementing the permutation procedure proposed in Wolf et al. (2022). A key feature of Virtual Twins is that it estimates a simple model, such as a regression tree, which can be easily interpreted to understand the CATE, as opposed to other popular data-adaptive methods which trade interpretability for model flexibility. This is accomplished through a two-step procedure which first uses a flexible method such as random forests to estimate each subject's anticipated response under each treatment (Step 1) and then models the difference in these response estimates through a simple model such as a regression tree (Step 2).
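The two-step structure can be illustrated with a minimal sketch on simulated data. This is not the tehtuner implementation: as assumptions for illustration, `lm()` with treatment-by-covariate interactions stands in for the flexible Step 1 learner, `rpart()` is used for Step 2 with its default complexity parameter, the data-generating model and all variable names are invented, and no permutation-based tuning is performed.

```r
# Illustrative sketch of the two-step Virtual Twins idea (not tehtuner code).
library(rpart)

set.seed(1)
n   <- 500
x1  <- rnorm(n)
x2  <- rnorm(n)
trt <- rbinom(n, 1, 0.5)
# Invented data-generating model: the true CATE is 2 when x1 > 0, else 0.
y   <- x2 + trt * 2 * (x1 > 0) + rnorm(n)
dat <- data.frame(y, trt, x1, x2)

# Step 1: model the outcome given covariates and treatment, then predict
# each subject's response under both arms. lm() is a stand-in here for a
# flexible learner such as a random forest.
step1  <- lm(y ~ trt * (x1 + x2), data = dat)
y1_hat <- predict(step1, newdata = transform(dat, trt = 1))
y0_hat <- predict(step1, newdata = transform(dat, trt = 0))
dat$z  <- y1_hat - y0_hat  # estimated subject-level treatment effect

# Step 2: fit a simple, interpretable model to the estimated differences.
step2    <- rpart(z ~ x1 + x2, data = dat, method = "anova")
cate_hat <- predict(step2, newdata = dat)
```

Printing `step2` shows the splits that define the estimated subgroups. In this untuned sketch the tree depth is governed only by the default settings of `rpart()`, which is the behavior the permutation procedure is meant to replace with a calibrated penalty.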

Statement of need
Although there are several readily available R packages that can estimate the CATE (Hill, 2011; J. Tibshirani et al., 2022; Vieille & Foster, 2018), few at the time of writing both estimate an interpretable model and guarantee controlled behavior when there is no treatment effect heterogeneity. While Vieille & Foster (2018) implements the method of the original Virtual Twins manuscript, it does not support the Step 1 and Step 2 methods evaluated and recommended in Deng et al. (2023), such as super learner (van der Laan et al., 2007), and it is prone to overfitting when there is no effect heterogeneity. Fan & Hong (2022) can estimate the effect of a given covariate on the treatment effect after marginalizing across all other covariates. This approach yields interpretable and well-behaved models but is best used for confirmatory rather than exploratory analyses, as it is unable to analyze the CATE in a multivariate manner and requires covariate pre-specification to avoid Type I error inflation. Currently, tehtuner supports linear models fit via the LASSO (R. Tibshirani, 1996), MARS (Friedman, 1991), random forests (Breiman, 2001), and super learner (van der Laan et al., 2007) in Step 1, and linear models tuned via the LASSO (R. Tibshirani, 1996), regression and classification trees (Breiman et al., 2017), and conditional inference trees (Hothorn et al., 2006) in Step 2. Comparative evaluations of these methods can be found in Wolf et al. (2022) and Deng et al. (2023).

Example Usage
The primary function is tunevt(), which first fits the Step 1 model and then fits the Step 2 model with a penalty parameter chosen so that the probability of incorrectly detecting any treatment effect modifiers, when there is no treatment effect heterogeneity, equals a user-specified value.
The following code fits a tuned Virtual Twins model using a random forest in Step 1 and a regression tree in Step 2 while setting the probability of falsely detecting treatment effect heterogeneity at 20%. In this example, the fitted model for the CATE can be used to identify three distinct subgroups with differential estimated treatment effects.
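A call matching this description might look like the following sketch. The argument names (`Y`, `Trt`, `step1`, `step2`, `alpha0`, `p_reps`), the option strings `"randomforest"` and `"rtree"`, the bundled `tehtuner_example` data set, and the choice of 100 permutations with 50 trees are assumptions here and should be checked against the package documentation.

```r
# Sketch of a tuned Virtual Twins fit; argument names and the example
# data set are assumptions to verify against the tehtuner documentation.
library(tehtuner)

set.seed(100)
vt_cate <- tunevt(
  data   = tehtuner_example,  # assumed example data shipped with the package
  Y      = "Y",               # name of the outcome column
  Trt    = "Trt",             # name of the treatment indicator column
  step1  = "randomforest",    # Step 1: random forest
  step2  = "rtree",           # Step 2: regression tree
  alpha0 = 0.2,               # target probability of a false detection
  p_reps = 100,               # number of permutations used for tuning
  ntree  = 50                 # assumed to be passed on to the random forest
)
```

The returned object holds the fitted Step 2 model, whose terminal nodes correspond to the estimated subgroups.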