Automated Detection of Sleep Arousals from Polysomnography: Comparison
Please note this is a comparison between Version 1 by Jianwei Shuai and Version 2 by Amina Yu.

Multiple types of sleep arousal account for a large proportion of the causes of sleep disorders. Detecting sleep arousals is very important for diagnosing sleep disorders and reducing the risk of further complications, including heart disease and cognitive impairment. Sleep arousal scoring is completed manually by sleep experts, who check recordings of several periods of sleep polysomnography (PSG), which is time-consuming and tedious work. The development of efficient, fast, and reliable automatic systems for detecting sleep arousals from PSG may provide powerful help for clinicians. This paper reviews automatic arousal detection methods from recent years, which are based on statistical rules and deep learning methods.

 

  • sleep arousal
  • polysomnography (PSG)
  • machine learning
  • deep learning

1. Introduction

The appearance of sleep arousals (also known as microarousals) reflects the interruption and fragmentation of sleep and is a harbinger of somnipathy. Frequent microarousals can cause sleep disruption, sleep fragmentation, sleep disorders, aggravated daytime sleepiness, and other symptoms [1]. A growing body of evidence indicates that sleep arousal disorders accompany other conditions, including weight gain, depression, heart disease, and diabetes. Therefore, advancing our current understanding of microarousal neurophysiology is not only a challenging research issue but also a public health issue.

Microarousals can also be spontaneous, caused by teeth grinding, partial airway obstruction, or even snoring [2]. A certain number of spontaneous arousals seems to be an intrinsic part of physiological sleep [3][4], but excessive arousals can disrupt healthy sleep.

Polysomnography (PSG) collects vital signs as a multidimensional time series. The vital signs include electroencephalogram (EEG), electromyography (EMG), electrocardiography (ECG), electrooculography (EOG), blood oxygen saturation level (SaO2), respiratory airflow (airflow), and respiratory movement (chest ABD). Normal and abnormal brain activities are typically picked up by EEG. Some neuropathic disorders leave their signature on EEG [5][6][7]. PSG is the gold standard for detecting sleep disorders.

The physiological band of interest for PSG signals usually ranges from 0.01 to several hundred cycles per second. In conventional EEG studies, the lowest band, variously called the ‘slow frequency’ or ‘sub-slow’ EEG band, has a lower limit of 0.5 Hz or 1.0 Hz, while 100 Hz corresponds to the highest frequency of the EEG band [8][9]. The ECG spectrum is generally considered to be 0.05–100 Hz [10]. Jarvis et al. [11] suggested that ECG frequencies associated with sleep apnea can be as low as 0.02 Hz. EMG ranges from 5.0 Hz up to 450 Hz [12]. Respiratory movements, airflow, and SaO2 are low-frequency phenomena with activity ranging from 0.05 Hz to 0.35 Hz [13].
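As a rough illustration of what these bands imply for signal acquisition, the following Python sketch tabulates the band limits quoted above and applies the Nyquist criterion (the sampling rate must exceed twice the highest frequency of interest). The names `BANDS_HZ` and `min_sampling_rate` are hypothetical and chosen only for this example; the figures are those cited in this section, not device specifications.

```python
# Frequency bands (Hz) as quoted in the text above; names are illustrative.
BANDS_HZ = {
    "EEG": (0.5, 100.0),
    "ECG": (0.05, 100.0),
    "EMG": (5.0, 450.0),
    "respiration": (0.05, 0.35),
}


def min_sampling_rate(band):
    """Return the Nyquist rate (twice the upper band limit) in Hz."""
    _low, high = band
    return 2.0 * high


for name, band in BANDS_HZ.items():
    print(f"{name}: sample above {min_sampling_rate(band):g} Hz")
```

Under these assumptions, EMG is the most demanding channel, while respiratory signals could in principle be sampled at a far lower rate.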

2. Microarousal Detection with Traditional Machine Learning Methods

The general workflow in this field is shown in Figure 1. Data scientists first extract domain-specific features from the PSG signals. Then, they use machine learning methods to classify signal fragments as non-arousal or arousal.

Figure 1. General workflow of sleep arousal detection models with machine learning.
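The two stages of this workflow can be sketched in a few lines of Python. This is a toy example under stated assumptions, not a method from any of the reviewed studies: the features (mean power and zero-crossing rate), the threshold value, the function names, and the synthetic signal are all hypothetical.

```python
import math


def extract_features(window):
    """Compute two simple hand-crafted features for one signal window."""
    n = len(window)
    mean_power = sum(x * x for x in window) / n
    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    crossings = sum(1 for a, b in zip(window, window[1:]) if a * b < 0)
    return {"mean_power": mean_power, "zcr": crossings / (n - 1)}


def classify(features, power_threshold=0.5):
    """Toy rule: label a window 'arousal' when its power exceeds a threshold."""
    if features["mean_power"] > power_threshold:
        return "arousal"
    return "non-arousal"


def sliding_windows(signal, size, step):
    """Yield consecutive windows of `size` samples, advancing by `step`."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]


# Synthetic signal: a quiet segment followed by a higher-amplitude burst.
quiet = [0.1 * math.sin(0.3 * i) for i in range(100)]
burst = [2.0 * math.sin(1.5 * i) for i in range(100)]
labels = [
    classify(extract_features(w))
    for w in sliding_windows(quiet + burst, size=100, step=100)
]
print(labels)
```

In practice the single threshold would be replaced by a trained classifier such as an SVM or a random forest, and the feature set would be far richer.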

Designing hand-crafted features and then finding the best combination of them to improve classifier performance is difficult and time-consuming, because the process requires extensive domain knowledge as well as techniques such as feature selection or dimensionality reduction. Even so, automatic detection with manual feature extraction does not guarantee optimal performance on a given task.

Another obstacle for automatic detection with traditional machine learning methods is that the classifier needs to work for many different patients whose signals may have different relevant statistics. Therefore, the same algorithm can produce different results, depending on how well its criteria match the data for a particular patient. Table 1 summarizes the different automatic or semi-automatic detection algorithms that use widespread machine learning methods.

Table 1. Various studies conducted on the automated detection of microarousal regions in PSG signals using traditional machine learning methods.
Author (Year) [Reference] | Database | Data Preprocessing | Machine Learning Model | Results
Huupponen et al. (1996) [14] | Local dataset | FFT, average power | MLP | Accuracy = 41%
Patanerli et al. (1999) [15] | Naya University | Wavelet transform, moving average, filter | SAS software; STEPDISC program | Sensitivity = 88.1%, Selectivity = 74.5%
Gouveia et al. (2003) [16] | Local dataset | FFT, frequency analysis | A set of scoring rules | Detection rate = 70%
Cho et al. (2005) [17] | South Korea’s Asan Medical Center | Filtering, power spectrum, FFT | SVM | Sensitivity = 75.26%, Specificity = 93.08%
Agarwal et al. (2006) [18] | Local dataset (two patients) | Second-order adaptive filter, frequency, MAA, etc. | A set of decisional rules | Sensitivity = 76.15%
David et al. (2006) [19] | National Institutes of Health (NIH) Sleep Disorders Research Plan | 1. Bi-directional recursive filtering, 2. peak detection, 3. relative trough position | Passive ballistocardiograph-based system | Sensitivity = 77.3%, Specificity = 96.2%
Shmiel et al. (2009) [20] | Aviv’s Assuta Medical Center | FFT, critical points, etc. | Sequential pattern discovery field | Sensitivity = 75.2%, positive predictive value = 76.5%
Foussier et al. (2013) [21] | Self-built database | HRV, MD, 72 features | Linear mixed model | MD = 1.16, χ2 = 16,633
Espiritu et al. (2015) [22] | Texas State Sleep Center | Savitzky-Golay filter, energy power/entropy, zero-crossing rate, etc. | Decision tree | Accuracy = 81.63%
Shahrbabaki et al. (2015) [23] | Self-built database (6 male, 3 female) | Butterworth filter, Welch’s algorithm, 32 features | KNN | Accuracy = 93.6%
Wallant et al. (2016) [24] | Self-built database (35 healthy volunteers) | PSD, filtering data, segmentation, maximal amplitude, and slope | Adapted thresholds | Sensitivity = 83%
Subramanian et al. (2018) [25] | PhysioNet 2018 | 28 features | GLM, RF | Highest AUROC = 0.847, highest AUPRC = 0.630
Ugur et al. (2019) [26] | SHHS | CWT | SVM | Accuracy = 98.2%, positive predictive value = 97.93%
Liu et al. (2020) [27] | PhysioNet 2018 | ICA, double density DWT algorithm, FIR filter | CNN with RF | AUPRC = 0.552

MLP = multilayer perceptron neural network; SVM = support vector machine; KNN = k-nearest neighbors; MAA = maximum absolute amplitude; HRV = heart rate variability; RF = random forest; SCL = skin conductance level; GLM = generalized linear model; FFT = fast Fourier transform; PSD = power spectral density; CWT = continuous wavelet transform; ICA = independent component correlation algorithm; DWT = discrete wavelet transform; AUROC = area under the receiver operating characteristic curve; AUPRC = area under the precision-recall curve.
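Because the studies in Table 1 report different metrics, it helps to recall how sensitivity, specificity, positive predictive value, and accuracy all derive from the same confusion-matrix counts. The sketch below uses standard definitions; the function names and the example labels are hypothetical and are not taken from any of the cited studies.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = arousal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Return common detection metrics; assumes both classes are present
    and at least one positive prediction is made."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # also called recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up example: 3 arousal epochs and 5 non-arousal epochs.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = summarize(y_true, y_pred)
print(metrics)
```

Since arousals are usually rare compared with non-arousal epochs, accuracy alone can look high even when sensitivity is poor, which is one reason results reported with different metrics are hard to compare directly.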

3. Microarousal Detection with Deep Learning Methods

Unlike manual feature extraction, neural networks can automatically learn variations and trends in the signal, carrying out feature extraction in an abstract manner. Deep learning methods have a strong capability to learn complex features directly from raw data without any hand-crafted feature extraction. Only recently have researchers begun to show a preference for deep learning methods, such as CNN [28][29][30][31], ResNet [32], the Siamese architecture network [30], RNN, and LSTM [33][34][35], over traditional machine learning methods in arousal detection.

The CNN makes it easier to extract different features of the input PSG data through convolution kernels. The models using CNN reviewed in this paper are summarized in Table 2.

Table 2. Detailed information of models using the CNN.
Author (Year) [Reference] | Database | Preprocessing | Results
Dongya et al. (2018) [28] | PhysioNet 2018 | Welch algorithm | AUPRC = 0.114
Varga et al. (2018) [29] | PhysioNet 2018 | 68 features | AUPRC = 0.42
Patane et al. (2018) [30] | PhysioNet 2018 | Filter, data augmentation | AUPRC = 0.40
Miller et al. (2018) [36] | PhysioNet 2018 | - | AUPRC = 0.37
Zabihi et al. (2018) [31] | PhysioNet 2018 | - | AUPRC = 0.31
Olesen et al. (2020) [37] | National Sleep Research Resource | Resampled, baseline model | F1-score = 0.682
Zhou et al. (2020) [38] | PhysioNet 2018 | Re-sample, Fourier transform | AUPRC = 0.39
Jia et al. (2020) [32] | Beijing Tongren Hospital | Down-sampled | Recall = 86.0%

KSS = Karolinska sleepiness scale, F1-score = harmonic mean of precision and recall.
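To make concrete how a convolution kernel extracts a feature from a signal, here is a minimal pure-Python sketch of the sliding dot product that a 1D convolutional layer applies, followed by a ReLU activation. The kernel weights are fixed by hand for illustration; in a trained CNN they would be learned from data. The function names and the toy signal are assumptions made for this example.

```python
def conv1d(signal, kernel):
    """Valid-mode sliding dot product (cross-correlation), as used in CNN layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Rectified linear unit: keep positive responses, zero out the rest."""
    return [max(0.0, v) for v in values]


# Toy signal with a step up at index 3 and a step down at index 6.
signal = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
# Hand-picked kernel that responds to rising edges.
edge_kernel = [-1.0, 1.0]
feature_map = relu(conv1d(signal, edge_kernel))
print(feature_map)
```

The feature map is nonzero only where the signal rises, which shows in miniature how different kernels can respond to different waveform shapes in the PSG channels.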

Common time series models include the RNN, LSTM, and bidirectional LSTM (Bi-LSTM). RNN and LSTM are networks that contain loops to connect previous information to current tasks. The models with LSTM reviewed in this paper are shown in Table 3 and Table 4.

Table 3. Comparison of LSTM-based approaches.
Author (Year) [Reference] | Database | Data Preprocessing | AUPRC
Warrick et al. (2018) [34] | PhysioNet 2018 | ST algorithm, logarithmic filters | 0.36
Már Þráinsson et al. (2018) [33] | PhysioNet 2018 | Energy, Hjorth parameters, WPD | 0.45
Kim et al. (2019) [35] | PhysioNet 2018 | MFCC | 0.458

ST = scattering transform; WPD = wavelet packet decomposition; MFCC = Mel-Frequency Cepstral Coefficient.
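The "loop" that lets recurrent networks connect previous information to the current step can be written as a single recurrence. The sketch below implements a scalar Elman-style RNN cell, h_t = tanh(w_in * x_t + w_rec * h_(t-1) + b); it is not an LSTM, which adds gating on top of this idea. The weights and function name are hypothetical and fixed by hand rather than learned.

```python
import math


def rnn_forward(inputs, w_in, w_rec, bias=0.0):
    """Run a scalar recurrent cell over a sequence and return all hidden states."""
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The previous state h feeds back into the update: this is the loop.
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


# A single impulse at the first step, followed by silence.
states = rnn_forward([1.0, 0.0, 0.0, 0.0], w_in=1.0, w_rec=0.9)
print([round(s, 3) for s in states])
```

The hidden state stays positive after the input has returned to zero and fades only gradually, which illustrates how an event earlier in the recording can still influence the prediction at a later time step.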

Table 4. Analysis of application of CNN+LSTM in sleep arousal.
Author (Year) [Reference] | Database | Data Preprocessing | Model | AUPRC
Li et al. (2018) [39] | PhysioNet 2018 | Signal segmentation | CNN+BiLSTM | 0.42
Sridhar et al. (2018) [40] | PhysioNet 2018 | Feature time-series | LSTM | 0.573
Howe-Patterson et al. (2018) [41] | PhysioNet 2018 | FFT, down-sampled | DNN+BiLSTM | 0.54
Warrick et al. (2019) [42] | PhysioNet 2018 | - | ST-LSTM | 0.36
Achuth et al. (2019) [43] | Local dataset | Filters, RF | DNN+LSTM | 0.50

Table 5 provides the models and results of the teams participating in the PhysioNet/Computing in Cardiology Challenge 2018.

Table 5. Comparison of detection of non-apnea/hypopnea sleep arousal in the PhysioNet/Computing in Cardiology Challenge 2018.

Author (Year) [Reference] | Number of Channels | Model | AUPRC
Sridhar et al. (2018) [40] | 13 | CNN+RNN | 0.573
Howe-Patterson et al. (2018) [41] | 12 | CNN+LSTM | 0.54
Pourbabaee et al. (2019) [44] | 12 | DNN+LSTM | 0.543
Már Þráinsson et al. (2018) [33] | 13 | Bi-LSTM | 0.45
Li et al. (2018) [39] | 13 | DNN+LSTM | 0.43
Varga et al. (2018) [29] | 13 | CNN | 0.42
Patane et al. (2018) [30] | 5 | CNN | 0.40
Miller et al. (2018) [36] | 13 | CNN | 0.36
Warrick et al. (2018) [34] | 13 | RNN | 0.36
Zabihi et al. (2019) [31] | 5 | CNN | 0.31

Note: All entries were submitted within the time frame of the official phase of the 2018 PhysioNet Challenge. AUPRC values are for the teams' internal test sets and the official blind test set.
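Since AUPRC is the ranking metric in Table 5, a small worked example may help. The sketch below computes average precision, one common estimator of the area under the precision-recall curve; the challenge's official scoring code may bin or interpolate differently, so this is an illustration rather than a reimplementation of it. The function name and the example scores are hypothetical.

```python
def average_precision(y_true, scores):
    """Average precision: mean of the precision values at each true-positive rank.

    Assumes binary labels (1 = arousal) with at least one positive;
    tied scores are not given special treatment.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    hits = 0
    precision_sum = 0.0
    for rank, (_score, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall level
    return precision_sum / total_pos


# Made-up example: two arousal samples among five, ranked 1st and 3rd.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
ap = average_precision(y_true, scores)
print(round(ap, 4))
```

Here the two positives are retrieved with precisions of 1 and 2/3, giving an average precision of 5/6. Unlike AUROC, this quantity depends on class prevalence, which makes it a stricter measure when arousal samples are rare.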

4. Automated Detection of CAP

The cyclic alternating pattern (CAP) reflects the instability of sleep as seen in the EEG and accompanies dynamic events during sleep (falling asleep, transitions between sleep stages, and awakenings). It has been suggested that, when external or internal factors interfere with sleep, the A1 subtype of CAP marks the brain’s effort to stay asleep. When sleep becomes increasingly unstable and the brain cannot maintain continuous sleep, EEG arousal accompanies or replaces the high-amplitude slow activity. Therefore, the A2 and A3 subtypes constitute arousal of the central nervous system.

Methods for automated detection of CAP are listed in Table 6.

Table 6. Automated detection of CAP

Author (Year) [Reference] | Database | Data Preprocessing | Model | Results
Mariani et al. (2012) [45] | Parma Sleep Disorders Center | Hjorth activity; EEG variance | Discriminant classifier | Accuracy = 84.9%
Chindhade et al. (2018) [46] | CAP Sleep Database | Differential moving average | Logistic regression | AUROC = 0.512; Accuracy = 58%
Hui et al. (2021) [47] | CAP Sleep Database | - | CNN | Sensitivity = 80.29%; Accuracy = 74.43%
Mendonça et al. (2021) [108] | CAP Sleep Database | Lowpass filter | LSTM | Accuracy = 81.3%; Sensitivity = 73.7%; Specificity = 81.7%

5. Conclusion

Reliable diagnosis of arousal is an essential prerequisite of sleep disorder treatment. The ‘gold standard’ scoring for sleep disorders is produced manually by experienced experts, which is a time-consuming and costly process. Accurate automated scoring models could help doctors interpret recordings faster and more accurately, free them from tedious work, and ultimately improve the efficiency of laboratory and home sleep diagnostic methods.

This review showed that deep learning models can complete complex tasks and are more accurate than traditional machine learning models. Deep learning can learn complex features directly from the original data without any hand-crafted feature extraction. Because changes in various physiological parameters usually occur over a period of time before arousal, RNN and LSTM models are well suited to learning the temporal relations in PSG signals. As a result, using deep learning methods to detect the features of sleep arousals has become a mainstream trend in PSG signal analysis.