secuTrialR: Seamless interaction with clinical trial databases in R

Elementary clinical trials have been conducted for hundreds of years (Meinert & Tonascia, 1986). The most famous early example is James Lind's 18th-century demonstration that sailors’ scurvy can be cured by eating citrus fruit (Lind, 2014). Since those early days of clinical research, trials have evolved significantly in methodology, ethics, and technology. While it was once viable and legitimate to collect clinical trial data in unversioned spreadsheets, this is no longer the case, and digital clinical data management systems (CDMS) have taken over. CDMS allow constraint-based, version-controlled data entry into a clinical trial database, which ensures the traceability, integrity, and quality of study data.

There is a vast market of heterogeneous CDMS solutions, each with its own advantages and limitations (Kuchinke et al., 2010). One limitation can be the interaction with the data after it has been collected. Specifically, a CDMS may be tailored for optimal data capture while, at least to some extent, neglecting the ease of use of study data once data entry has concluded. It is, however, vital that the interaction between data sources and data analysts be fast and seamless, so that valuable time is not lost to technical overhead. This point has been prominently highlighted by the ongoing coronavirus pandemic (Callaway et al., 2020), during which problems were reported with the timely and complete transfer of the information needed to prepare up-to-date infection counts (Kelion, 2020; Merlot & Pauly, 2020). These problems led to confusion and may ultimately have delayed important actions. Although this is an extreme example, it shows how severe the consequences of technical friction between digital systems can be.
To this end, we have developed secuTrialR, an open-source software package for R statistics (R Core Team, 2020) that enables seamless interaction with data collected in the commercially available CDMS secuTrial (vendor: interActive Systems, Berlin). In addition to parsing and reading the data, it transforms dates, date-times, and categorical data, which reduces the data preparation overhead and allows a swift transition into the analytical phase. Furthermore, secuTrialR includes standard functions for descriptive statistics on secuTrial data exports, such as study recruitment or the completeness of entered data per case report form.
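The kind of date and categorical transformation described above can be illustrated with a small Python sketch; secuTrialR itself is written in R, and this is not its code. The sketch assumes, hypothetically, that an export delivers every value as a string, dates as `YYYYMMDD`, and categorical items as numeric codes resolved through a lookup table. The names `transform`, `completeness`, `SEX_LABELS` and the field names are invented for illustration, and the per-field completeness fraction is only a simplified stand-in for a per-form completeness summary.

```python
from __future__ import annotations

from datetime import date, datetime

# Hypothetical export rows: every value arrives as a string, dates as
# "YYYYMMDD" and categorical items as numeric codes. These formats are
# assumptions for illustration, not taken from secuTrial documentation.
RAW_ROWS = [
    {"pat_id": "P1", "visit_date": "20200115", "sex": "1"},
    {"pat_id": "P2", "visit_date": "20200203", "sex": "2"},
    {"pat_id": "P3", "visit_date": "", "sex": ""},
]

# Hypothetical lookup table mapping categorical codes to labels.
SEX_LABELS = {"1": "male", "2": "female"}


def parse_date(value: str) -> date | None:
    """Convert a 'YYYYMMDD' string to a date; an empty string becomes None."""
    if not value:
        return None
    return datetime.strptime(value, "%Y%m%d").date()


def decode(value: str, labels: dict[str, str]) -> str | None:
    """Replace a code by its label; empty or unknown codes become None."""
    return labels.get(value)


def transform(rows: list[dict]) -> list[dict]:
    """Return new rows with typed dates and labelled categories."""
    return [
        {
            "pat_id": row["pat_id"],
            "visit_date": parse_date(row["visit_date"]),
            "sex": decode(row["sex"], SEX_LABELS),
        }
        for row in rows
    ]


def completeness(rows: list[dict], fields: list[str]) -> dict[str, float]:
    """Share of rows with a non-missing value, computed per field."""
    return {f: sum(r[f] is not None for r in rows) / len(rows) for f in fields}


clean = transform(RAW_ROWS)
print(clean[0]["visit_date"])  # 2020-01-15
print(completeness(clean, ["visit_date", "sex"]))
```

Doing the conversion once, at read time, is what spares every later analysis from repeating it.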

Statement of need
Due to the size and complexity of clinical trial and registry databases, technical friction can be expected during the initial interaction with data exported from secuTrial. Our own first-hand experience has shown that this overhead can substantially divert scarce time and energy away from analysis and towards data management, which should take as little time as possible. The use of secuTrialR markedly reduces the time required for data management, enables swift quantitative analyses through pre-implemented functions, and, most importantly, standardizes the interaction with secuTrial data exports, thus supporting robust and reproducible science.
While some CDMS provide APIs (e.g., REDCap; Harris et al., 2009, 2019) or Open Database Connectivity (ODBC) connections (e.g., 2mt's WebSpirit) through which data can be downloaded easily, secuTrial's SOAP API requires querying individual data points. This results in an extraordinarily high number of queries, and a correspondingly high load on the servers, even for a relatively small database. Approaches such as those for REDCap (e.g., the REDCapR package, which interfaces with REDCap's REST API and downloads all data in a single query, but performs no data preparation) are therefore not suitable for secuTrial. Another approach is to parse data that have been exported manually through the CDMS web interface (e.g., the ox package for importing OpenClinica exports into R). This is the approach used in secuTrialR.
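To make the scale of the per-data-point problem concrete, the back-of-the-envelope sketch below counts requests under the simplifying assumption that each data point (one item on one form for one participant) costs exactly one SOAP call. The study dimensions are hypothetical, and the function name `per_datapoint_requests` is invented for illustration.

```python
def per_datapoint_requests(participants: int, forms_per_participant: int,
                           items_per_form: int) -> int:
    """Requests needed if every data point must be fetched individually.

    Assumes one request per data point; real APIs may differ."""
    return participants * forms_per_participant * items_per_form


# Hypothetical small study: 200 participants, 10 forms each, 30 items per form.
n = per_datapoint_requests(participants=200, forms_per_participant=10,
                           items_per_form=30)
print(n)  # 60000
# A single bulk export (as with a REST download or a manual export file)
# would need on the order of one request for the same data.
```

Because the count is a product of the study dimensions, it grows multiplicatively as a study adds participants, forms, or items, whereas a bulk export stays roughly constant.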

Design
All secuTrial data exports share a common technical structure, independent of the specific database at hand. secuTrialR uses this structure to build an S3 object of class secuTrialdata, which is a list, while the data are read into R. All downstream functions implemented in secuTrialR expect a secuTrialdata object as input, but custom analyses with other components of R statistics are also possible (see Figure 1). Although editing the secuTrialdata object is technically possible, it is not advisable; instead, the object should be treated as a raw data archive from which data can be extracted for analysis. If necessary, subsets of a secuTrialdata object can be extracted with the subset_secuTrial() function, which returns an intact secuTrialdata object. The individual elements of the secuTrialdata object can be accessed via regular list access operations or via the as.data.frame() method, which assigns all objects to an environment of choice.

Availability
secuTrialR is available on GitHub, CRAN, and Anaconda Cloud, and should be functional on all major operating systems.
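The container design described above (a list-like object that is treated as read-only, can be subset into an object of the same class, and can be unpacked into an environment) is sketched below as a hypothetical Python analogue. `TrialExport`, `subset` and `export_to` are invented names that are not part of secuTrialR; the sketch assumes participant-level rows carry a `pat_id` key and uses a `SimpleNamespace` in place of an R environment.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from types import SimpleNamespace


# Hypothetical Python analogue of a secuTrialdata object; not secuTrialR code.
# frozen=True prevents rebinding the fields, echoing the advice to treat the
# object as a raw archive, but it does not deep-freeze the tables themselves.
@dataclass(frozen=True)
class TrialExport:
    tables: dict[str, list[dict]]
    meta: dict = field(default_factory=dict)

    def __getitem__(self, name: str) -> list[dict]:
        # List-style access to one table by name.
        return self.tables[name]

    def subset(self, participants: set[str]) -> TrialExport:
        """Return a new TrialExport keeping only rows of the given participants.

        Rows without a 'pat_id' key (e.g. lookup tables) are kept, so the
        result has the same structure and class as the original."""
        kept = {
            name: [r for r in rows if "pat_id" not in r or r["pat_id"] in participants]
            for name, rows in self.tables.items()
        }
        return TrialExport(tables=kept, meta=dict(self.meta))

    def export_to(self, namespace: SimpleNamespace) -> None:
        """Bind every table as an attribute of the namespace (cf. an R environment)."""
        for name, rows in self.tables.items():
            setattr(namespace, name, rows)


export = TrialExport(
    tables={
        "baseline": [{"pat_id": "P1", "age": 54}, {"pat_id": "P2", "age": 61}],
        "visits": [{"pat_id": "P1", "visit": 1}, {"pat_id": "P2", "visit": 1}],
    },
    meta={"source": "hypothetical export"},
)
p1_only = export.subset({"P1"})
print(type(p1_only).__name__, len(p1_only["baseline"]))  # TrialExport 1

ns = SimpleNamespace()
p1_only.export_to(ns)
print(ns.visits)  # [{'pat_id': 'P1', 'visit': 1}]
```

Returning a new object of the same class from `subset`, rather than a bare table, is what lets every downstream function keep accepting a single input type, which matches the role the text assigns to subset_secuTrial().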

interActive Systems statement
InterActive Systems (iAS) has given permission for the open source development of this software package but accepts no responsibility for the correctness of any of its functionality.
iAS has read and approved this manuscript.