Automated Sleep Stage Scoring Using k-Nearest Neighbors Classifier

Many features of sleep, such as the existence of rapid eye movement (REM) sleep or non-REM sleep stages, as well as some of the underlying physiological mechanisms controlling sleep, are conserved across different mammalian species. Sleep research is important to understanding the impact of disease on circadian biology and optimal waking performance, and to advance treatments for sleep disorders, such as narcolepsy, shift work disorder, non-24 sleep-wake disorder, and neurodegenerative disease. Given the evolutionary relatedness of mammalian species, sleep architecture and changes therein may provide reliable translational biomarkers for pharmacological engagement in proof-of-mechanism clinical studies.


Applications and Advantage
To expedite the tedious process of visually analyzing PSG signals and to further objectivity in the scoring procedure, a number of sleep staging algorithms have been developed both for animals (Barger, Frye, Liu, Dan, & Bouchard, 2019; Bastianini et al., 2014; Stephenson, Caron, Cassel, & Kostela, 2009; Vladimir, Ting-Chuan, Yuting, Bryan, & Steven, 2020) and human subjects (Gunnarsdottir et al., 2020; Penzel & Conradt, 2000; Zhang et al., 2020), as reviewed most recently by Fiorillo et al. (2019) and Faust, Razaghi, Barika, Ciaccio, & Acharya (2019). However, computer-based methods are typically tested on data obtained from healthy subjects or control animals, and performance is assessed only in a few cases in subjects with sleep disorders or following drug treatment (Allocca et al., 2019; Boostani, Karimzadeh, & Nami, 2017). Furthermore, scoring sleep for hundreds of animals in a typical preclinical drug discovery effort often becomes a bottleneck and a potential source of subjectivity affecting research outcomes.
In this paper, we present an automated approach intended to eliminate these potential issues. The initial application of our approach is for basic and discovery research in which experiments are conducted in large cohorts of rodents, with the expectation that results can be translated to higher-order mammals or even humans. Building on features classically extracted from EEG and EMG data and on machine learning-based classification of PSG, this approach is capable of staging sleep in multiple species under control and drug-treated conditions, facilitating the detection of changes induced by treatment or by other manipulations (e.g., genetic). Using human-interpretable features calculated from EEG and EMG will be important for understanding drug mechanisms, for predicting treatment outcomes, and for use as biomarkers or even translational biomarkers. For example, one of the features used by the algorithm is the power in the theta frequency band (4 Hz to 12 Hz; called eeg_theta in the code). An increase in theta activity together with low EMG activity (captured by our features emg_high and emg_RMS) is known to be the hallmark of REM sleep (see the figure in Wikipedia contributors (2020)). However, theta power is also associated with other phenomena, such as anxiety (John, Kiss, Lever, & Érdi, 2014); thus our eeg_theta feature, besides being used for sleep scoring, can also serve as a biomarker of drug effect.
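To make the theta-power and EMG features concrete, the following is a minimal Python sketch of how such quantities could be computed for one epoch. It is not the Matlab code from the repository: the function names `band_power` and `rms`, the 256 Hz sampling rate, the 1 Hz to 4 Hz comparison band, and the synthetic test signals are all assumptions made for illustration. Only the 4 Hz to 12 Hz theta range and the feature names eeg_theta and emg_RMS come from the text.

```python
import numpy as np

def band_power(epoch, fs, f_lo, f_hi):
    """Power of `epoch` in [f_lo, f_hi) Hz from a one-sided periodogram."""
    x = np.asarray(epoch, dtype=float)
    x = x - x.mean()  # remove DC offset
    # One-sided power spectral density; the factor 2 folds in negative
    # frequencies (DC and Nyquist bins are not corrected, which is harmless
    # for bands that exclude them).
    psd = 2.0 * np.abs(np.fft.rfft(x)) ** 2 / (fs * x.size)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs < f_hi)
    # Integrate the PSD over the band (bin width = freqs[1] - freqs[0]).
    return float(psd[in_band].sum() * (freqs[1] - freqs[0]))

def rms(epoch):
    """Root-mean-square amplitude of an epoch."""
    x = np.asarray(epoch, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))

# Synthetic 10-second epoch (assumed sampling rate: 256 Hz).
fs = 256
t = np.arange(10 * fs) / fs
eeg = np.sin(2 * np.pi * 8 * t)           # 8 Hz oscillation, inside theta
emg = 0.1 * np.sin(2 * np.pi * 100 * t)   # low-amplitude stand-in for EMG

eeg_theta = band_power(eeg, fs, 4, 12)    # theta band from the text
eeg_low = band_power(eeg, fs, 1, 4)       # hypothetical comparison band
emg_rms = rms(emg)
```

In this toy epoch, nearly all EEG power falls in the theta band while the EMG amplitude is small, which is the combination the text describes as characteristic of REM sleep.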
Multiple software applications have been developed to address the problem of automated sleep stage scoring. In their comparative review, Boostani et al. (2017) found that the best results were achieved when the entropy of wavelet coefficients was used as the feature and a random forest as the classifier. Another recent method (Miladinović et al., 2019) combined a convolutional neural network-based architecture, which produces domain-invariant predictions, with a hidden Markov model that constrains state dynamics based upon known sleep physiology. While our method also builds on machine learning techniques, it is based on interpretable features and uses a simpler classification algorithm, which should make it a good choice for the broader community as well as for sleep experts who may not be familiar with complex machine learning approaches. Furthermore, we chose not to constrain the number of identifiable sleep/wake states or the probability of transition from one state to another, as we and others have found that drug interventions (Harvey et al., 2013) and disease processes (de Mooij et al., 2020) tend to change not only the amount of time spent in different sleep stages but also the transition probabilities between them. Finally, our method is supervised and therefore requires a training set. While this might seem to be a disadvantage compared with unsupervised methods, we have found that drug treatment or pathological conditions can result in sleep stages not observed in healthy controls. Thus, the algorithm must be trained on these new stages.

Brief Software Description
Our software package, implemented in Matlab, is available for download on GitHub. Automatic sleep staging consists of the classical consecutive steps of machine learning-based sleep scoring algorithms (Figure 1). First, EEG and EMG data stored offline are loaded into memory to allow for uniform processing of the time-series data and are segmented into consecutive, non-overlapping 10-second epochs that correspond to the manually scored epochs. Second, features are extracted from the raw signal for all epochs. Features consist of the power contained in physiologically relevant frequency bands, as well as Hjorth parameters, for both EEG and EMG data. Third, features undergo a pre-processing step consisting of the following operations. Unusable epochs that contain too much noise or no signal are removed. Features are then transformed using the logarithm function, which makes the feature distributions more Gaussian-like and thereby facilitates subsequent machine classification. Finally, each feature is normalized to its median wake value within an animal so that the algorithm can be used across laboratories. Wake periods can be identified before running the algorithm using the manually scored training set, or an experiment can be designed such that a given period is expected to consist of extended wakefulness.
Following feature extraction and pre-processing, a feature selection step that combines filter and wrapper methods is applied. This step ensures that the features with the most predictive value are chosen and also helps to prevent over-fitting. For classification, the k-nearest neighbors classifier is applied to the data pre-processed as described above. The algorithm was used to predict sleep stages in mice (Figure 2), rats (Figure 3), and nonhuman primates (data not shown). Prediction accuracy was found to depend on a number of properties of the input data, including the consistency of the manual scores and physiological signals, as well as the amount of artifacts. Furthermore, the relative frequency of the predicted labels can influence efficacy, with rare labels being harder to predict.
The code on GitHub accompanying this paper contains abridged versions of two datasets, one from male Trace Amine-Associated Receptor 1 (TAAR1) knockout mice described in detail in Schwartz, Palmerston, Lee, Hoener, & Kilduff (2018) (Figure 2) and the other from male Sprague-Dawley rats collected in the Sleep Neurobiology Laboratory at SRI International (Figure 3). The rodents in both datasets received an oral dose of a water-based vehicle solution.
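The processing chain described in this section (epoch segmentation, Hjorth parameters, log transform, normalization to the median wake value, and k-nearest neighbors classification) can be sketched in Python as follows. This is an illustrative sketch and not the Matlab implementation from the repository. All function names, the value of k, the sampling rate, and the synthetic data are hypothetical. In particular, normalization is assumed here to mean subtracting the wake median of the log-transformed feature; the package may implement it differently. Band-power features, artifact removal, and feature selection are omitted.

```python
import numpy as np

def segment(signal, fs, epoch_s=10):
    """Split a 1-D signal into consecutive non-overlapping epochs."""
    n = int(fs * epoch_s)
    n_epochs = len(signal) // n  # drop an incomplete trailing epoch
    return np.asarray(signal[: n_epochs * n], dtype=float).reshape(n_epochs, n)

def hjorth(epoch):
    """Hjorth activity, mobility and complexity of one epoch."""
    x = np.asarray(epoch, dtype=float)
    dx = np.diff(x)    # first difference approximates the derivative
    ddx = np.diff(dx)  # second difference
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

def log_normalize(features, wake_mask):
    """Log-transform positive features, then subtract the per-feature
    median over wake epochs (assumed form of the normalization)."""
    logf = np.log(features)
    return logf - np.median(logf[wake_mask], axis=0)

def knn_predict(train_x, train_y, test_x, k=5):
    """Majority vote among the k nearest training epochs (Euclidean)."""
    train_y = np.asarray(train_y)
    dist = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    preds = []
    for neighbor_labels in train_y[nearest]:
        labels, counts = np.unique(neighbor_labels, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Feature extraction on a synthetic 20-second signal (assumed fs = 256 Hz).
fs = 256
t = np.arange(20 * fs) / fs
epochs = segment(np.sin(2 * np.pi * 8 * t), fs)
activity, mobility, complexity = hjorth(epochs[0])

# Classification on synthetic, strictly positive features for two states.
rng = np.random.default_rng(0)
wake = rng.lognormal(mean=1.0, sigma=0.1, size=(20, 2))
nrem = rng.lognormal(mean=-1.0, sigma=0.1, size=(20, 2))
raw = np.vstack([wake, nrem])
labels = np.array(["W"] * 20 + ["NR"] * 20)

feats = log_normalize(raw, labels == "W")
# Even-indexed epochs serve as the training set, odd-indexed as the test set.
pred = knn_predict(feats[::2], labels[::2], feats[1::2], k=3)
```

Because each feature is expressed relative to the animal's own wake baseline, feature values from different animals or recording setups fall on a comparable scale, which is what allows a single classifier to be trained on merged data.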
Three labels were predicted: wake (W), non-REM sleep (NR), and REM sleep (R), and prediction efficacy was calculated. (Note, however, that any number of stages can be trained, depending on how elaborate the manual scoring is.) First, a single classifier was trained on training data merged from all animals (Figure 2A, Figure 3A); then individual models were trained, one for each animal (Figure 2B, Figure 3B). The GitHub repository includes additional information on prediction accuracy, including detailed values of true and false positive rates, as well as a method to deal with imbalanced data.

Figure 2: Estimation of prediction accuracy for the transgenic mouse data. For each state (wake, W; non-REM, NR; REM, R) and animal (points on plots), true and false positive rates are calculated. Red crosses denote mean and SEM. In A, training data were merged and a single classifier was trained to predict the sleep stages of all animals. In B, an individual classifier was trained for each animal separately.
State labels were predicted in the same way for the rat data (the same set of GitHub scripts was run), and the prediction accuracy shown in Figure 3 is very similar to that obtained for the mouse data.
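The per-state true and false positive rates used to assess prediction accuracy can be computed by treating each state in turn as the positive class. The Python sketch below illustrates this one-versus-rest calculation. The function name `tpr_fpr` and the short label sequences are hypothetical and are not taken from the datasets or from the repository scripts.

```python
import numpy as np

def tpr_fpr(manual, predicted, state):
    """True and false positive rate for one state (one-versus-rest)."""
    manual = np.asarray(manual)
    predicted = np.asarray(predicted)
    is_state = manual == state        # epochs manually scored as `state`
    called = predicted == state       # epochs predicted as `state`
    tpr = np.sum(is_state & called) / np.sum(is_state)
    fpr = np.sum(~is_state & called) / np.sum(~is_state)
    return float(tpr), float(fpr)

# Made-up manual and predicted scores for eight epochs.
manual = ["W", "W", "NR", "NR", "NR", "R", "R", "W"]
predicted = ["W", "NR", "NR", "NR", "NR", "R", "W", "W"]

rates = {s: tpr_fpr(manual, predicted, s) for s in ("W", "NR", "R")}
```

Note that the rates are undefined if a state never occurs in the manual scores (or occurs in every epoch), so rare states such as REM need enough scored epochs for the estimate to be meaningful.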