c212: An R Package for the Detection of Safety Signals in Clinical Trials Using Body-Systems (System Organ Classes)


Summary
Safety in clinical trials may be characterised by the incidence or occurrence of adverse events. The statistical analysis of this data is complicated by the large number of adverse events recorded, with low event rates, small effect sizes and low power all contributing to the difficulty in determining a robust safety profile for a treatment during the trial process.
In addition to end of trial analyses, a number of interim analyses may take place at different time points during the trial lifecycle. These offer the additional statistical challenge of testing accumulating data, with possibly differing recruitment rates on trial arms contributing to a lack of balance in the data.
Adverse events are typically defined by medical dictionaries, which provide a common reference terminology for use in and between clinical trials. There are a number of medical dictionaries in current use, all of which provide similar services. One such dictionary is MedDRA (Medical Dictionary for Regulatory Activities), which was developed by the ICH (International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use) and is widely used by regulatory bodies, clinical research organisations (CROs), and pharmaceutical companies. WHO-ART (World Health Organisation Adverse Reaction Terminology) is a similar dictionary maintained by the Uppsala Monitoring Centre for the World Health Organisation Collaborating Centre for International Drug Monitoring. MedDRA and WHO-ART have a similar hierarchical structure consisting of System Organ Classes (SOC) and various grouping and descriptor terms.
The MedDRA hierarchical structure consists of five levels: System Organ Class (SOC), High Level Group Terms (HLGT), High Level Terms (HLT), Preferred Terms (PT), and Lower Level Terms (LLT). The PT is a single medical description of a symptom or observation while the LLT is how a patient or data recorder would describe a symptom or observation. Each LLT belongs to one PT and, in general, data will be recorded at the LLT level but reported at the PT level (the adverse event). As of 2020 there are 27 SOCs and over 80,000 LLTs.
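The relationship between recording at the LLT level and reporting at the PT level can be illustrated with a small sketch. The term names, counts and the `llt_records` data frame below are made up for illustration and are not taken from MedDRA; placing each PT under a single SOC is also a simplifying assumption.

```r
# Hypothetical records: each LLT maps to exactly one PT.
# For simplicity each PT is placed under a single SOC here.
llt_records <- data.frame(
  LLT   = c("Head pain", "Headache", "Feeling queasy"),
  PT    = c("Headache", "Headache", "Nausea"),
  SOC   = c("Nervous system disorders", "Nervous system disorders",
            "Gastrointestinal disorders"),
  Count = c(2, 3, 4)
)

# Roll the LLT-level counts up to the PT level (the reported adverse event).
pt_counts <- aggregate(Count ~ PT + SOC, data = llt_records, FUN = sum)
print(pt_counts)
```

Here the two LLT rows that map to the same PT are summed into a single adverse event count, while the SOC column keeps the body-system grouping available for later analysis.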
The grouping of adverse events by SOC (or body-system) provides for possible relationships between the adverse events within a SOC. One consequence of this is the possibility that, for treatments which may affect a particular SOC, there may be raised rates for a number of adverse events within that SOC. A number of methods have recently been proposed to address the statistical issues in adverse event analysis by using these groupings of adverse events by body-system or SOC, taking into account the additional information provided by these relationships to increase the power of detecting real adverse event effects. These methods, which include both error controlling procedures for multiple hypothesis testing (Benjamini & Hochberg, 1995; Hu et al., 2010; Matthews, 2006; Mehrotra & Adewale, 2012; Yekutieli, 2008) and Bayesian modelling approaches (Amy Xia et al., 2011; Berry & Berry, 2004; Carragher, 2017b), are implemented in the R package c212 (Table 1).

Table 1: Methods implemented in the c212 package.

| Method | Description |
|---|---|
| Benjamini-Hochberg procedure (Benjamini & Hochberg, 1995) | Control of False Discovery Rate |
| Group Benjamini-Hochberg procedure (Hu et al., 2010) | Control of False Discovery Rate |
| Double false discovery rate (Mehrotra & Adewale, 2012) | Control of False Discovery Rate |
| Subset BH-procedure (Yekutieli, 2008) | Control of False Discovery Rate |
| Bonferroni correction (Matthews, 2006) | Control of Familywise Error Rate |
| Berry and Berry model (Berry & Berry, 2004) | Bayesian model for end of trial data |
| Berry and Berry model without point-mass (Amy Xia et al., 2011) | Bayesian model for end of trial data |
| Interim analysis model (Carragher, 2017b) | Bayesian model for interim trial data |
| Interim analysis model without point-mass (Carragher, 2017b) | Bayesian model for interim trial data |
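To give a feel for the difference between controlling the False Discovery Rate and the Familywise Error Rate, the following minimal sketch applies the Benjamini-Hochberg and Bonferroni adjustments to a handful of p-values. It uses the base R function `p.adjust` rather than the c212 functions, and the p-values and adverse event labels are hypothetical.

```r
# Hypothetical p-values for five adverse events (illustrative only).
p_values <- c(AE1 = 0.001, AE2 = 0.012, AE3 = 0.02, AE4 = 0.03, AE5 = 0.5)

# Adjusted p-values from base R (not the c212 implementations).
bh_adjusted   <- p.adjust(p_values, method = "BH")          # False Discovery Rate
bonf_adjusted <- p.adjust(p_values, method = "bonferroni")  # Familywise Error Rate

# Flag adverse events whose adjusted p-value is at or below the chosen level.
alpha <- 0.05
flagged_bh   <- names(p_values)[bh_adjusted <= alpha]
flagged_bonf <- names(p_values)[bonf_adjusted <= alpha]

print(flagged_bh)
print(flagged_bonf)
```

With these made-up values the Benjamini-Hochberg adjustment flags more adverse events than the Bonferroni correction, reflecting the less conservative nature of False Discovery Rate control.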

Statement of Need
The detection of safety issues in the post-marketing phase of a treatment's life cycle, as opposed to the trial phase, can have a serious effect on the health of patients and also a financial impact both for the companies developing the treatments, and the regulatory bodies responsible for overseeing them.
The R package c212 provides a self-contained set of methods for clinical trial safety investigators, statisticians and researchers, to aid in the early detection of adverse events. It is designed to be easy to use with a simple data input format and interface.
The primary use case for the software is in the statistical analysis of adverse event incidence and occurrence data during clinical trials. A second goal of the package is to provide reference implementations of the methods in Table 1 for use by researchers, both in the area of safety in clinical trials, as well as those developing or testing methods for handling error rates when testing multiple hypotheses. Beyond safety in clinical trials, the package will be useful to any project which deals with multiple hypothesis testing, or to projects where two groups of comparative data may be modelled by hierarchical Bayesian binomial or Poisson models. Recent extensions of the Bayesian models to observational data are also being developed.
The c212 package is currently being used both for clinical trial safety analysis (Fries et al., 2016; Munsaka, 2018; Wang et al., 2018) and as a research tool in the investigation and development of new safety methods (Tan et al., 2019, 2020). It has also been implemented as part of an ASA Biopharm Safety Working Group workstream.

Overview
The Bayesian models, under assumptions of conditional independence, are fitted using a Gibbs sampling Markov chain Monte Carlo (MCMC) method (Robert & Casella, 1999). The posterior distributions of the model parameters are used to assess which adverse events may have increased rates on the treatment arm. In the case of the Berry and Berry model, which is binomial, the theta model parameter, representing the increase in the log-odds of an event occurring on the treatment arm, is used for this purpose (Berry & Berry, 2004). For the interim analysis models, which are Poisson based, the increase in the log rate of an event on the treatment arm is used for adverse event assessment. As in the Berry and Berry model this is represented by the parameter theta (Carragher, 2017b).
Functions for generating summary statistics and highest probability intervals are provided using the services of the coda package (Plummer et al., 2006). The main convergence diagnostics available directly within the package are the Gelman-Rubin and Geweke statistics (Gelman et al., 2004), again from the coda package. Access to the raw samples is available for further processing should that be required.
The error controlling procedures included in the package follow exactly the method definitions in the papers which introduced them (Benjamini & Hochberg, 1995; Hu et al., 2010; Matthews, 2006; Mehrotra & Adewale, 2012; Yekutieli, 2008). The following sections contain examples which cover the main uses of the software. The data sets and functions used are fully documented in the package.

Berry and Berry End of Trial Analysis
The data set c212.trial.data contains sample end of trial adverse event incidence counts. The data is modelled using the Berry and Berry model as follows:

```r
library(c212)
data(c212.trial.data, package = "c212")
head(c212.trial.data, 2)
##           B j       AE Group Count Total
## 1 Bdy-sys_2 1 Adv-Ev_2     1    20   450
## 2 Bdy-sys_2 4 Adv-Ev_5     2    21   450

# Fit the Berry and Berry model to the end of trial data
mod.BB <- c212.BB(c212.trial.data)
```

mod.BB contains the raw samples generated from the model fitting procedure. To perform a convergence check:

```r
# Compute and print the convergence diagnostics for the fitted model
conv.BB <- c212.convergence.diag(mod.BB)
c212.print.convergence.summary(conv.BB)
```

In order to assess which adverse events may be associated with treatment, the function c212.ptheta is used. This calculates the posterior probability of an increase in the log-odds of an event occurring on the treatment arm. A threshold may be used to view the adverse events which exceed some defined level. High posterior probabilities may indicate a possible association with treatment for these adverse events.
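The idea behind a posterior probability of an increase can be sketched in base R without the package: given MCMC draws of theta for an adverse event, the probability that theta exceeds zero is estimated by the fraction of draws above zero. The draws below are simulated with `rnorm` purely as a stand-in for model output, the adverse event labels and the 0.9 threshold are hypothetical, and whether c212.ptheta computes its result in exactly this way is an assumption.

```r
set.seed(212)  # fixed seed so the sketch is reproducible

# Stand-in for MCMC draws of theta for two hypothetical adverse events.
# In practice these draws would come from the fitted model, not from rnorm().
theta_samples <- list(
  "Adv-Ev_A" = rnorm(10000, mean = 1.0, sd = 0.5),  # shifted towards an increase
  "Adv-Ev_B" = rnorm(10000, mean = 0.0, sd = 0.5)   # centred on no effect
)

# Monte Carlo estimate of P(theta > 0 | data): the fraction of draws above zero.
post_prob <- sapply(theta_samples, function(draws) mean(draws > 0))

# View only the adverse events whose posterior probability exceeds a threshold.
threshold <- 0.9
flagged <- names(post_prob)[post_prob > threshold]

print(round(post_prob, 3))
print(flagged)
```

Only the adverse event whose draws are concentrated above zero passes the threshold; the one centred on zero has a posterior probability near one half and is not flagged.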

Interim Analysis
Apart from the function used to fit the model, the procedure for fitting the model and assessing interim analysis data is exactly the same as for the Berry and Berry model.

Software Details and Availability
The c212 package was initially released to CRAN in 2017 and has been through a number of release cycles. Before each release a full set of unit and functional tests is performed on the package development system, including memory checks with valgrind (Seward & Nethercote, 2005) and Google's AddressSanitizer (Serebryany et al., 2012). The package documentation also contains tests and examples based on data included in the package.
The c212 package is most easily downloaded and installed directly from CRAN (Carragher, 2017a) or, alternatively, from the corresponding GitHub repository (Carragher, 2020).
The authors are interested in extending the software to include new methods, particularly in the area of safety analysis, and would welcome collaborations in this area. Any support issues or questions can be addressed directly to the corresponding author, through the associated CRAN maintainer email address, or through the GitHub repository.

Performance
The Bayesian models are expensive to fit in terms of both computation and memory. The main issue is the number of parameters in the model. For the Berry and Berry model with N adverse events, and B SOCs there are (2 × N + 5 × B + 6) parameters in total. With C parallel MCMC chains and I iterations of the MCMC sampler, this will require space for C × I × (2 × N + 5 × B + 6) double precision numbers to store the samples. For larger datasets, the calculation of the convergence diagnostics and summary statistics may be time consuming as the samples are passed to the coda package.
Example: A trial with 23 SOCs and 497 AEs, 3 parallel chains and 40,000 iterations after burn-in, will require space for 3 × 40,000 × (2 × 497 + 5 × 23 + 6) = 133,800,000 doubles, equating to approximately 1GB of memory storage for the samples alone (assuming 8 bytes per double).
It is recommended that at least twice the storage required for the samples be available for fitting any of the Bayesian models.
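The storage calculation can be wrapped in a small helper to size a planned analysis in advance. This is a minimal sketch based only on the parameter count quoted above for the Berry and Berry model; the function name `bb_sample_doubles` and the trial dimensions used are hypothetical, and the estimate ignores any additional working memory used during fitting.

```r
# Number of doubles needed to hold the samples, using the parameter count
# (2 * N + 5 * B + 6) quoted for the Berry and Berry model.
bb_sample_doubles <- function(n_ae, n_soc, n_chains, n_iter) {
  n_params <- 2 * n_ae + 5 * n_soc + 6
  n_chains * n_iter * n_params
}

# Hypothetical trial: 100 AEs in 10 SOCs, 3 chains, 10,000 retained iterations.
doubles <- bb_sample_doubles(n_ae = 100, n_soc = 10, n_chains = 3, n_iter = 10000)
bytes   <- doubles * 8            # assuming 8 bytes per double
recommended_bytes <- 2 * bytes    # "at least twice the storage" recommendation

cat(sprintf("%.0f doubles, %.1f MB for samples, %.1f MB recommended\n",
            doubles, bytes / 1e6, recommended_bytes / 1e6))
```

For this hypothetical trial the samples occupy roughly 61 MB, so around 123 MB should be available under the recommendation above.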