Sleep arousals of multiple types account for a large proportion of the causes of sleep disorders. Detecting sleep arousals is important for diagnosing sleep disorders and for reducing the risk of further complications, including heart disease and cognitive impairment. Sleep arousal scoring is performed manually by sleep experts, who inspect polysomnography (PSG) recordings covering several periods of sleep; this is time-consuming and tedious work. An efficient, fast, and reliable system for automatic sleep arousal detection from PSG could substantially help clinicians. This paper reviews automatic arousal detection methods from recent years, which are based on statistical rules and deep learning.
Sleep arousals (also known as microarousals) reflect the interruption and fragmentation of sleep and are a harbinger of somnipathy. Frequent microarousals can cause sleep disruption, sleep fragmentation, and sleep disorders, and can aggravate daytime sleepiness and other symptoms [1]. Increasing evidence indicates that sleep arousal disorders accompany other conditions, including weight gain, depression, heart disease, and diabetes. Therefore, advancing our current understanding of microarousal neurophysiology is not only a challenging research issue but also a public health issue.
Microarousals can also be spontaneous, caused by grinding teeth, partial airway obstruction, or even snoring [2]. A certain amount of spontaneous arousals seems to be an intrinsic part of physiological sleep [3][4], but excessive arousals can disrupt healthy sleep.
Polysomnography (PSG) records vital signs as a multidimensional time series. The signals include electroencephalography (EEG), electromyography (EMG), electrocardiography (ECG), electrooculography (EOG), blood oxygen saturation (SaO2), respiratory airflow (airflow), and respiratory movement (chest and abdomen, ABD). Normal and abnormal brain activities are typically picked up by EEG, and some neuropathic disorders leave their signature on EEG [5][6][7]. PSG is the gold standard for detecting sleep disorders.
The physiological band of interest for PSG signals usually ranges from 0.01 Hz to several hundred hertz. Conventional EEG studies take 0.5 Hz or 1.0 Hz as the lower limit of the lowest band (the ‘slow frequency’ and ‘sub-slow’ EEG bands), while 100 Hz corresponds to the highest frequency of the EEG band [8][9]. The ECG spectrum is generally considered to be 0.05–100 Hz [10]. Jarvis et al. [11] suggested that the ECG frequency range of interest for sleep apnea can extend down to 0.02 Hz. EMG ranges from 5.0 Hz up to 450 Hz [12]. Respiratory movement, airflow, and SaO2 are low-frequency phenomena with activity ranging from 0.05 Hz to 0.35 Hz [13].
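As a rough illustration of what these bands imply for recording, the sketch below stores the quoted ranges in a lookup table and applies the Nyquist criterion to check whether a given sampling rate can represent a band. The channel keys and function names are hypothetical labels chosen for this example, not part of any PSG standard.

```python
# Frequency bands (Hz) as quoted in the text; the dictionary keys are
# hypothetical channel labels chosen for this sketch.
PSG_BANDS_HZ = {
    "EEG": (0.5, 100.0),
    "ECG": (0.05, 100.0),
    "EMG": (5.0, 450.0),
    "respiration": (0.05, 0.35),
}


def nyquist_rate(channel):
    """Return twice the highest frequency of interest for the channel."""
    _, high = PSG_BANDS_HZ[channel]
    return 2.0 * high


def band_is_representable(channel, fs):
    """True if a sampling rate of fs Hz exceeds the Nyquist rate of the band."""
    return fs > nyquist_rate(channel)
```

For example, a 256 Hz recording would cover the EEG band listed here but not the full EMG band, whose upper limit of 450 Hz calls for a sampling rate above 900 Hz.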
The general workflow in this field is shown in Figure 1. Researchers first extract domain-specific features from the PSG signals and then use machine learning methods to classify signal fragments as arousal or non-arousal.
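The two-stage workflow can be sketched in a few lines. This is a minimal illustration under stated assumptions: the signal is a plain list of samples, the two features (variance and zero-crossing rate) are picked only as simple examples of hand-crafted features, and the variance threshold is an arbitrary placeholder standing in for a trained classifier. All function names are hypothetical.

```python
from statistics import pvariance


def segment(signal, epoch_len):
    """Split a 1-D list of samples into non-overlapping epochs (tail dropped)."""
    return [signal[i:i + epoch_len]
            for i in range(0, len(signal) - epoch_len + 1, epoch_len)]


def zero_crossing_rate(epoch):
    """Fraction of consecutive sample pairs whose signs differ."""
    if len(epoch) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(epoch, epoch[1:]) if (a < 0) != (b < 0))
    return crossings / (len(epoch) - 1)


def extract_features(epoch):
    """Stage 1: hand-crafted features for one epoch."""
    return {"variance": pvariance(epoch), "zcr": zero_crossing_rate(epoch)}


def classify(features, var_threshold=1.0):
    """Stage 2: placeholder rule; a real system would use a trained model."""
    return "arousal" if features["variance"] > var_threshold else "non-arousal"


def label_epochs(signal, epoch_len, var_threshold=1.0):
    """Run both stages over every epoch of the signal."""
    return [classify(extract_features(e), var_threshold)
            for e in segment(signal, epoch_len)]
```

In practice the `classify` step would be replaced by one of the models listed in this review, such as an SVM or a decision tree, trained on labeled epochs.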
Designing hand-crafted features and then finding the combination that best improves classifier performance is difficult and time-consuming, because the process requires extensive domain knowledge as well as techniques such as feature selection or dimensionality reduction. Even so, automatic detection with manual feature extraction does not guarantee optimal detection performance.
Another obstacle for automatic detection with traditional machine learning methods is that the classifier needs to work for many different patients whose signals may have different relevant statistics. Therefore, the same algorithm can produce different results, depending on how well its criteria match the data for a particular patient. Table 1 summarizes automatic and semi-automatic detection algorithms that use widespread machine learning methods.
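One simple way to reduce differences in signal statistics between patients is to standardize each recording with that patient's own mean and standard deviation. The sketch below illustrates this idea only; it is an assumption of this example and not a method taken from the reviewed papers, and the function name and data layout are hypothetical.

```python
from statistics import mean, pstdev


def zscore_per_patient(recordings):
    """Standardize each patient's samples with that patient's own statistics.

    recordings: dict mapping a patient id to a list of samples.
    Returns a new dict with zero-mean, unit-variance samples per patient.
    """
    normalized = {}
    for patient_id, samples in recordings.items():
        mu = mean(samples)
        sigma = pstdev(samples)
        scale = sigma if sigma > 0 else 1.0  # avoid dividing by zero on flat signals
        normalized[patient_id] = [(x - mu) / scale for x in samples]
    return normalized
```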
Author (Year) [Reference] | Database | Data Preprocessing | Machine Learning Model | Results |
---|---|---|---|---|
Huupponen et al. (1996) [14] | Local dataset | FFT, average power | MLP | Accuracy = 41% |
Patanerli et al. (1999) [15] | Naya University | Wavelet transform, moving average, filter | SAS software; STEPDISC program | Sensitivity = 88.1%, Selectivity = 74.5% |
Gouveia et al. (2003) [16] | Local dataset | FFT, frequency analysis | A set of scoring rules | Detection rate = 70% |
Cho et al. (2005) [17] | South Korea’s Asan Medical Center | Filtering, power spectrum, FFT | SVM | Sensitivity = 75.26%, Specificity = 93.08% |
Agarwal et al. (2006) [18] | Local dataset (two patients) | Second-order adaptive filter, frequency, MAA, etc. | A set of decisional rules | Sensitivity = 76.15% |
David et al. (2006) [19] | National Institutes of Health (NIH) Sleep Disorders Research Plan | Bi-directional recursive filtering, peak detection, relative trough position | Passive ballistocardiograph-based system | Sensitivity = 77.3%, Specificity = 96.2% |
Shmiel et al. (2009) [20] | Aviv’s Assuta Medical Center | FFT, critical points, etc. | Sequential pattern discovery field | Sensitivity = 75.2%, positive predictive value = 76.5% |
Foussier et al. (2013) [21] | Self-built database | HRV, MD, 72 features | Linear mixed model | MD = 1.16, χ2 = 16,633 |
Espiritu et al. (2015) [22] | Texas State Sleep Center | Savitzky-Golay filter, energy power/entropy, zero-crossing rate, etc. | Decision tree | Accuracy = 81.63% |
Shahrbabaki et al. (2015) [23] | Self-built database (6 male, 3 female) | Butterworth filter, Welch’s algorithm, 32 features | KNN | Accuracy = 93.6% |
Wallant et al. (2016) [24] | Self-built database (35 healthy volunteers) | PSD, filtering data, segmentation, maximal amplitude, and slope | Adapted thresholds | Sensitivity = 83% |
Subramanian et al. (2018) [25] | PhysioNet 2018 | 28 features | GLM, RF | Highest AUROC = 0.847, highest AUPRC = 0.630 |
Ugur et al. (2019) [26] | SHHS | CWT | SVM | Accuracy = 98.2%, positive predictive value = 97.93% |
Liu et al. (2020) [27] | PhysioNet 2018 | ICA, double density DWT algorithm, FIR filter | CNN with RF | AUPRC = 0.552 |
MLP = multilayer perceptron neural network; SVM = support vector machine; MAA = maximum absolute amplitude; HRV = heart rate variability; RF = random forest; SCL = skin conductance level; GLM = generalized linear model; CWT = continuous wavelet transforms; ICA = independent component correlation algorithm; DWT = discrete wavelet transformation; AUROC = area under the receiver operating characteristic curve; AUPRC = area under the precision-recall curve.
Unlike manual feature extraction, neural networks learn variations and trends in the signal automatically, extracting features as learned abstract representations. Deep learning methods can learn complex features directly from raw data, without any hand-crafted features. Researchers have only recently begun to prefer deep learning methods, such as convolutional neural networks (CNN) [28][29][30][31], ResNet [32], the Siamese architecture network [30], recurrent neural networks (RNN), and long short-term memory networks (LSTM) [33][34][35], over traditional machine learning methods for arousal detection.
The CNN makes it easier to extract different features of the input PSG data through convolution kernels. The models using CNN reviewed in this paper are summarized in Table 2.
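To make the role of a convolution kernel concrete, the following minimal sketch slides a kernel over a one-dimensional signal, as a single CNN filter would, and applies a ReLU activation. The kernel values here are fixed by hand for illustration; in a trained CNN they would be learned from data. The function names are hypothetical.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """Slide the kernel over the signal with stride 1 and no padding."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]


def relu(values):
    """Keep positive responses and set negative ones to zero."""
    return [max(0.0, v) for v in values]


# A hand-picked difference kernel responds to rising edges in the signal.
edge_kernel = [-1.0, 1.0]
response = relu(conv1d_valid([0.0, 0.0, 1.0, 1.0, 0.0, 0.0], edge_kernel))
```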
Author (Year) | Database | Preprocessing | Results |
---|---|---|---|
Dongya et al. (2018) [28] | PhysioNet 2018 | Welch algorithm | AUPRC = 0.114 |
Varga et al. (2018) [29] | PhysioNet 2018 | 68 features | AUPRC = 0.42 |
Patane et al. (2018) [30] | PhysioNet 2018 | Filter, data augmentation | AUPRC = 0.40 |
Miller et al. (2018) [36] | PhysioNet 2018 | - | AUPRC = 0.37 |
Zabihi et al. (2018) [31] | PhysioNet 2018 | - | AUPRC = 0.31 |
Olesen et al. (2020) [37] | National Sleep Research Resource | Resampled, baseline model | F1-score = 0.682 |
Zhou et al. (2020) [38] | PhysioNet 2018 | Resampled, Fourier transform | AUPRC = 0.39 |
Jia et al. (2020) [32] | Beijing Tongren Hospital | Down-sampled | Recall = 86.0% |
KSS = Karolinska sleepiness scale, F1-score = harmonic mean of precision and recall.
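The tables in this review report sensitivity, specificity, positive predictive value, accuracy, and F1-score. As a reference for how these relate, the sketch below computes them from binary labels, assuming 1 marks an arousal and 0 a non-arousal. The function names are hypothetical, and ratios with an empty denominator are returned as 0.0 by convention in this example.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = arousal, 0 = non-arousal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def scoring_metrics(y_true, y_pred):
    """Return the metrics used in the tables as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = _ratio(tp, tp + fp)   # positive predictive value
    recall = _ratio(tp, tp + fn)      # sensitivity
    return {
        "sensitivity": recall,
        "specificity": _ratio(tn, tn + fp),
        "ppv": precision,
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }
```

Because arousals are rare compared with non-arousal periods, accuracy and specificity can look high even when few arousals are found, which is one reason many of the studies above report precision-based measures.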
Common time series models include the RNN, LSTM, and bidirectional LSTM (Bi-LSTM). RNN and LSTM are networks that contain loops to connect previous information to current tasks. The models with LSTM reviewed in this paper are shown in Table 3 and Table 4.
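The loop that carries previous information forward can be illustrated with a single-unit recurrent network. This sketch shows only the basic recurrence of a plain RNN with scalar weights chosen by hand; an LSTM adds gating on top of this recurrence, which is not shown here. The function and parameter names are hypothetical.

```python
import math


def rnn_forward(inputs, w_x, w_h, b=0.0, h0=0.0):
    """Run a one-unit RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # the previous state feeds the new one
        states.append(h)
    return states


# A single input pulse keeps influencing later states through w_h.
with_memory = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5)
without_memory = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.0)
```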
Author (Year) | Database | Data Preprocessing | AUPRC |
---|---|---|---|
Warrick et al. (2018) [34] | PhysioNet 2018 | ST algorithm, logarithmic filters | 0.36 |
Már Þráinsson et al. (2018) [33] | PhysioNet 2018 | Energy, Hjorth parameters, WPD | 0.45 |
Kim et al. (2019) [35] | PhysioNet 2018 | MFCC | 0.458 |
ST = scattering transform; WPD = wavelet packet decomposition; MFCC = Mel-Frequency Cepstral Coefficient.
Author (Year) [Reference] | Database | Data Preprocessing | Model | AUPRC |
---|---|---|---|---|
Li et al. (2018) [39] | PhysioNet 2018 | Signal segmentation | CNN+BiLSTM | 0.42 |
Sridhar et al. (2018) [40] | PhysioNet 2018 | Feature time-series | LSTM | 0.573 |
Howe-Patterson et al. (2018) [41] | PhysioNet 2018 | FFT, down-sampled | DNN+BiLSTM | 0.54 |
Warrick et al. (2019) [42] | PhysioNet 2018 | - | ST-LSTM | 0.36 |
Achuth et al. (2019) [43] | Local dataset | Filters, RF | DNN+LSTM | 0.50 |
Author (Year) [Reference] | Number of Channels | Model | AUPRC |
---|---|---|---|
Sridhar et al. (2018) [40] | 13 | CNN+RNN | 0.573 |
Howe-Patterson et al. (2018) [41] | 12 | CNN+LSTM | 0.54 |
Pourbabaee et al. (2019) [44] | 12 | DNN+LSTM | 0.543 |
Már Þráinsson et al. (2018) [33] | 13 | Bi-LSTM | 0.45 |
Li et al. (2018) [39] | 13 | DNN+LSTM | 0.43 |
Varga et al. (2018) [29] | 13 | CNN | 0.42 |
Patane et al. (2018) [30] | 5 | CNN | 0.40 |
Miller et al. (2018) [36] | 13 | CNN | 0.36 |
Warrick et al. (2018) [34] | 13 | RNN | 0.36 |
Zabihi et al. (2019) [31] | 5 | CNN | 0.31 |
Note: Submitted inside the time frame of the official phase of the 2018 PhysioNet Challenge. AUPRC is for their internal test set and the official blind test set.
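Since AUPRC is the main figure of merit in these tables, a small sketch of one common estimator may help. The code below computes average precision, a step-wise approximation of the area under the precision-recall curve, from binary labels and predicted scores. It is an illustration only: it does not handle tied scores specially, and it is not claimed to reproduce the official scoring code of the 2018 PhysioNet Challenge. The function name is hypothetical.

```python
def average_precision(y_true, scores):
    """Estimate AUPRC as the mean precision at the rank of each positive.

    y_true: list of 0/1 labels (1 = arousal); scores: predicted scores.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    # Rank samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos
```

A detector that ranks every arousal above every non-arousal scores 1.0, while a detector with uninformative scores tends toward the fraction of positive samples, which is why AUPRC values in these tables should be read against the rarity of arousals in the data.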
The cyclic alternating pattern (CAP) reflects the instability of sleep in the EEG and accompanies dynamic events during sleep (falling asleep, transitions between sleep stages, and awakenings). It has been suggested that, when external or internal factors disturb sleep, the A1 subtype of CAP marks the brain’s effort to remain asleep. When sleep becomes increasingly unstable and the brain cannot maintain continuous sleep, EEG arousal accompanies or replaces the high-amplitude slow activity. Therefore, the A2 and A3 subtypes constitute arousal of the central nervous system.
Methods for automated detection of CAP are listed in Table 6.
Table 6. Automated detection of CAP
Author (Year) [Reference] | Database | Data Preprocessing | Model | Results |
---|---|---|---|---|
Mariani et al. (2012) [45] | Parma Sleep Disorders Center | Hjorth activity, EEG variance | Discriminant classifier | Accuracy = 84.9% |
Chindhade et al. (2018) [46] | CAP Sleep Database | Differential moving average | Logistic regression | AUROC = 0.512, Accuracy = 58% |
Hui et al. (2021) [47] | CAP Sleep Database | - | CNN | Sensitivity = 80.29%, Accuracy = 74.43% |
Mendona et al. (2021) [108] | CAP Sleep Database | Lowpass filter | LSTM | Accuracy = 81.3%, Sensitivity = 73.7%, Specificity = 81.7% |
Reliable detection of arousals is an essential prerequisite for treating sleep disorders. The ‘gold standard’ scoring for sleep disorders is produced manually by experienced experts, which is a time-consuming and costly process. Accurate automated scoring models could help doctors interpret recordings faster and more accurately, free them from tedious work, and ultimately improve the efficiency of laboratory and home sleep diagnostic methods.
This review showed that deep learning models can complete complex tasks and are generally more accurate than traditional machine learning models. Deep learning can learn complex features directly from the original data, without any manually extracted features. Because changes in various physiological parameters usually occur over a period of time before an arousal, RNN and LSTM models can learn the temporal relations in PSG signals. Therefore, using deep learning methods to detect sleep arousals has become a mainstream trend in PSG signal analysis.