MatSurv: Survival analysis and visualization in MATLAB

Survival analysis is a set of methods for evaluating time-to-event data that is widely applied across research disciplines. For example, it is commonly used in clinical trials to compare the effect of treatments. In cancer biology, it can be used to understand how lowor highexpression of genes affect the aggressiveness of the tumor. Time-to-event data frequently include censored data points, samples where no event was observed. An event is, for example, death, relapse of disease, or a new metastatic tumor. If none of these events occur during the study period, the time to-to-event is unknown, we only know that no events were observed during the study time. The methods described below were developed for this kind of data. For an in-depth introduction to survival analysis, we can recommend the book by Kleinbaum and David (Kleinbaum, 1998). In fact, much of the code used in MatSurv is based on the equations given in the book. Commonly reported elements of survival analysis include log-rank tests, hazard ratios (HR) and Kaplan-Meier (KM) curves. KM-curves are used to compare survival durations between two or more groups and give users a particular estimate of survival probability at a given time; log-rank tests are used to conduct statistical inference on survival durations between groups; and HRs provide a ratio of the hazard rates between groups. To further improve the KM-plot, it has been suggested that the KM-plot should always be accomplished by a table that describes the number of patients that are still “at-risk” at a specific timepoint (Morris et al., 2019). MATLAB (MATLAB, 2019) currently lacks functions to easily create KM-plots with accompanying risk tables. Furthermore, MATLAB does not have a built-in log-rank test, nor is one available in any of the existing toolboxes, including the Statistics and Machine Learning Toolbox. The Statistics and Machine Learning Toolbox support Cox proportional hazards regression using the coxphfit function and KM-plots can be created using the plot or stairs functions. Our goal for MatSurv is to provide an easy-to-use tool that creates publication quality KM-plots with corresponding risk tables. The statistical procedures built into MatSurv can be used to compare two or multiple groups. In addition, MatSurv allows the user to easily modify the appearance of the created figure. The graphics were inspired by the survminer R-package (Kassambara, 2018).


Summary
Survival analysis is a set of methods for evaluating time-to-event data that is widely applied across research disciplines. For example, it is commonly used in clinical trials to compare the effect of treatments. In cancer biology, it can be used to understand how low-or highexpression of genes affect the aggressiveness of the tumor. Time-to-event data frequently include censored data points, samples where no event was observed. An event is, for example, death, relapse of disease, or a new metastatic tumor. If none of these events occur during the study period, the time to-to-event is unknown, we only know that no events were observed during the study time. The methods described below were developed for this kind of data. For an in-depth introduction to survival analysis, we can recommend the book by Kleinbaum and David (Kleinbaum, 1998). In fact, much of the code used in MatSurv is based on the equations given in the book. Commonly reported elements of survival analysis include log-rank tests, hazard ratios (HR) and Kaplan-Meier (KM) curves. KM-curves are used to compare survival durations between two or more groups and give users a particular estimate of survival probability at a given time; log-rank tests are used to conduct statistical inference on survival durations between groups; and HRs provide a ratio of the hazard rates between groups. To further improve the KM-plot, it has been suggested that the KM-plot should always be accomplished by a table that describes the number of patients that are still "at-risk" at a specific timepoint (Morris et al., 2019). MATLAB (MATLAB, 2019) currently lacks functions to easily create KM-plots with accompanying risk tables. Furthermore, MATLAB does not have a built-in log-rank test, nor is one available in any of the existing toolboxes, including the Statistics and Machine Learning Toolbox. The Statistics and Machine Learning Toolbox support Cox proportional hazards regression using the coxphfit function and KM-plots can be created using the plot or stairs functions. Our goal for MatSurv is to provide an easy-to-use tool that creates publication quality KM-plots with corresponding risk tables. The statistical procedures built into MatSurv can be used to compare two or multiple groups. In addition, MatSurv allows the user to easily modify the appearance of the created figure. The graphics were inspired by the survminer R-package (Kassambara, 2018).

MatSurv Use and Features
MatSurv creates three different items, a KM-plot, a risk table and statistical results. The KMplot shows the events and also censoring for the different groups and it is customizable using different input parameters. The risk table shows the number of patients "at-risk" for different time points and it is linked to the KM-plot. The statistical results are reported as a structure and the different values are described below. MatSurv uses the log-rank (Mantel-Cox) test to calculate the Chi-squared test statistic and corresponding p-value describing evidence against the null hypothesis that the curves are identical. Users have two options for calculating HRs: the log-rank or Mantel-Haenszel approach. HR can only be calculated when there are two groups being compared. In the log-rank approach, HR = (Oa/Ea)/(Ob/Eb), where Oa & Ob are the observed events in each group and Ea & Eb are the number of expected events. In the Mantel-Haenszel approach, HR = exp((O1-E1)/V), where O1 is the number of observed events in a group, E1 is the expected number of events in the same group and V is the total variance. The two methods give similar results but the log-rank results will not work of there is no events in one of the groups. The 95% confidence interval for the HR's are also reported together with the inverse of all the values.
In order to use MatSurv, simply put MatSurv.m in any directory of your choice and make sure it is added to your path. At a minimum, the user should provide TimeVar, a vector with numeric time to event, either observed or censored, EventVar, a vector or cell array defining events or censoring, and GroupVar, a vector or cell array defining the comparison groups (see example code below). The function returns three pieces p, the log-rank p-value, fh, the KM-plot figure handle, and stats, which are additional statistics from the log-rank test. The user can further customize the style of their KM-plot (line colors, labels, ticks, etc.) by making changes to the figure handle.
When MatSurv is creating the groups based on the median value, the default option uses values less than the median compared to all other values, however this is a parameter that can be changed by the user.
The MatSurv software has no dependencies on toolboxes and runs completely on base MAT-LAB functions.

Comparison
The MatSurv output is comparable to that from proc lifetest in SAS and ggsurvplot in R. Code for reproducing similar output in R and SAS are shown below as well as the output from all 3 statistical programs (R, SAS and MatSurv). The data used is from a classic and frequently used example by Freireich (Freireich et al., 1963). In addition, we have also used the Acute Myeloid Leukemia (LAML) dataset (Ley et al., 2013)   The results from MatSurv have been compared against both SAS and R and found to return similar estimates. The Chi-Sq values and p-values for a log-rank test in MatSurv, SAS, and R are provided below (Table 1)