Automated Detection of Sleep Arousals from Polysomnography

Automated Detection of Sleep Arousals from Polysomnography: Comparison

Please note this is a comparison between Version 1 by Jianwei Shuai and Version 2 by Amina Yu.

Multiple types of sleep arousal account for a large proportion of the causes of sleep disorders. The detection of sleep arousals is very important for diagnosing sleep disorders and reducing the risk of further complications, including heart disease and cognitive impairment. Sleep arousal scoring is completed manually by sleep experts, who check the recordings of several periods of sleep polysomnography (PSG), which is time-consuming and tedious work. The development of an efficient, fast, and reliable automatic system for detecting sleep arousals from PSG may provide powerful help for clinicians. This paper reviews automatic arousal detection methods from recent years, which are based on statistical rules and deep learning.

sleep arousal
polysomnography (PSG)
machine learning
deep learning

1. Introduction

The appearance of sleep arousals (also known as microarousals) reflects the interruption and fragmentation of sleep and is a harbinger of somnipathy. Frequent microarousals can cause sleep disruption, sleep fragmentation, and sleep disorders, and can aggravate daytime sleepiness and other symptoms [1]. An increasing amount of evidence indicates that arousal-related sleep disorders accompany other conditions, including weight gain, depression, heart disease, and diabetes. Therefore, advancing our current understanding of microarousal neurophysiology is not only a challenging research issue but also a public health issue.

Microarousals can also be spontaneous, caused by teeth grinding, partial airway obstruction, or even snoring [2]. A certain number of spontaneous arousals seems to be an intrinsic part of physiological sleep [3,4], but excessive arousals can disrupt healthy sleep.

Polysomnography (PSG) collects all of the vital signs in a multidimensional time series. The vital signs include electroencephalography (EEG), electromyography (EMG), electrocardiography (ECG), electrooculography (EOG), blood oxygen saturation level (SaO2), respiratory airflow (airflow), and respiratory movement (chest and abdomen). Normal and abnormal brain activities are typically picked up by EEG. Some neuropathic disorders leave their signature on EEG [5,6,7]. PSG is the gold standard for detecting sleep disorders.

The physiological band of interest for PSG signals usually ranges from 0.01 Hz to several hundred Hz. Conventional EEG studies set the lower limit of the lowest band at 0.5 Hz or 1.0 Hz (the ‘slow frequency’ and ‘sub-slow’ EEG bands), while 100 Hz corresponds to the highest frequency of the EEG band [8,9]. The ECG spectrum is generally considered to be 0.05–100 Hz [10]. Jarvis et al. [11] suggested that the ECG frequency range associated with sleep apnea can extend down to 0.02 Hz. EMG ranges from 5.0 Hz up to 450 Hz [12]. Respiratory movement, airflow, and SaO2 signals are low-frequency phenomena with activity ranging from 0.05 Hz to 0.35 Hz [13].
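The band limits above have a direct consequence for recording: by the Nyquist criterion, a channel must be sampled at more than twice its highest frequency of interest. The following minimal sketch tabulates the quoted bands and derives that lower bound. The dictionary keys and function names are illustrative assumptions, and the EEG entry uses the 0.5 Hz lower limit.

```python
# Approximate bands quoted in the text, in Hz as (low, high).
# Keys are illustrative names, not identifiers from any PSG standard.
PSG_BANDS_HZ = {
    "EEG": (0.5, 100.0),
    "ECG": (0.05, 100.0),
    "EMG": (5.0, 450.0),
    "respiration": (0.05, 0.35),  # respiratory movement, airflow, SaO2
}


def nyquist_rate(band):
    """Return twice the upper band edge; sampling must exceed this rate."""
    _low, high = band
    return 2.0 * high


def required_rate(channels):
    """Lower bound on a shared sampling rate for the given channels."""
    return max(nyquist_rate(PSG_BANDS_HZ[name]) for name in channels)


if __name__ == "__main__":
    for name, band in PSG_BANDS_HZ.items():
        print(f"{name}: sample faster than {nyquist_rate(band)} Hz")
```

Under these figures, a recording that must preserve the full EMG band needs a sampling rate above 900 Hz, whereas the respiratory channels alone could be sampled at only a few hertz.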

2. Microarousal Detection with Traditional Machine Learning Methods

The general workflow in this field is shown in Figure 1. Data scientists first extract domain-specific features from the PSG signals. Then, they use machine learning methods to classify the signal fragments as non-arousal or arousal.

Figure 1. General workflow of sleep arousal detection models with machine learning.

Designing hand-crafted features and then finding the best combination of these features to improve classifier performance is difficult and time-consuming, because the process requires extensive domain knowledge as well as techniques such as feature selection or dimensionality reduction. Even so, automatic detection based on manual feature extraction does not guarantee optimal identification for the task.
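To make the notion of a hand-crafted feature concrete, the sketch below computes three features of the kind used in the reviewed studies (signal energy, zero-crossing rate, and the Hjorth parameters) for one signal epoch. It is an illustration only and does not reproduce any particular study's pipeline. The function names are assumptions, and the epoch is assumed to be a non-constant list of samples.

```python
import math


def variance(x):
    """Population variance of a list of samples."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def diff(x):
    """First difference, a discrete stand-in for the time derivative."""
    return [b - a for a, b in zip(x, x[1:])]


def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    return crossings / (len(x) - 1)


def hjorth(x):
    """Hjorth activity, mobility, and complexity of a non-constant epoch."""
    dx = diff(x)
    ddx = diff(dx)
    activity = variance(x)
    mobility = math.sqrt(variance(dx) / activity)
    complexity = math.sqrt(variance(ddx) / variance(dx)) / mobility
    return activity, mobility, complexity


def extract_features(epoch):
    """Collect the features of one epoch into a dictionary."""
    activity, mobility, complexity = hjorth(epoch)
    return {
        "energy": sum(v * v for v in epoch) / len(epoch),
        "zero_crossing_rate": zero_crossing_rate(epoch),
        "hjorth_activity": activity,
        "hjorth_mobility": mobility,
        "hjorth_complexity": complexity,
    }
```

A feature dictionary like this would then be passed to a classifier. Choosing which features to keep is the part that demands domain knowledge.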

Another obstacle for automatic detection with traditional machine learning methods is that the classifier needs to work for many different patients whose signals may have different relevant statistics. Therefore, the same algorithm can produce different results, depending on how its criteria match the data for a particular patient. Table 1 summarizes the different automatic or semi-automatic detection algorithms based on widespread machine learning methods.

Table 1. Various studies conducted on the automated detection of microarousal regions in PSG signals using traditional machine learning methods.

Author (Year) [Reference]	Database	Data Preprocessing	Machine Learning Model	Results
Huupponen et al. (1996) [14]	Local dataset	FFT, average power	MLP	Accuracy = 41%
Patanerli et al. (1999) [15]	Naya University	Wavelet transform, moving average, filter	SAS software; STEPDISC program	Sensitivity = 88.1%, Selectivity = 74.5%
Gouveia et al. (2003) [16]	Local dataset	FFT, frequency analysis	A set of scoring rules	Detection rate = 70%
Cho et al. (2005) [17]	South Korea’s Asan Medical Center	Filtering, power spectrum, FFT	SVM	Sensitivity = 75.26%, Specificity = 93.08%
Agarwal et al. (2006) [18]	Local dataset (two patients)	Second-order adaptive filter, frequency, MAA, etc.	A set of decisional rules	Sensitivity = 76.15%
David et al. (2006) [19]	National Institutes of Health (NIH) Sleep Disorders Research Plan	Bi-directional recursive filtering, peak detection, relative trough position	Passive ballistocardiograph-based system	Sensitivity = 77.3%, Specificity = 96.2%
Shmiel et al. (2009) [20]	Aviv’s Assuta Medical Center	FFT, critical points, etc.	Sequential pattern discovery field	Sensitivity = 75.2%, positive predictive value = 76.5%
Foussier et al. (2013) [21]	Self-built database	HRV, MD, 72 features	Linear mixed model	$MD = 1.16, χ^{2} = 16, 633$
Espiritu et al. (2015) [22]	Texas State Sleep Center	Savitzky-Golay filter, energy power/entropy, zero-crossing rate, etc.	Decision tree	Accuracy = 81.63%
Shahrbabaki et al. (2015) [23]	Self-built database (6 male, 3 female)	Butterworth filter, Welch’s algorithm, 32 features	KNN	Accuracy = 93.6%
Wallant et al. (2016) [24]	Self-built database (35 healthy volunteers)	PSD, filtering data, segmentation, maximal amplitude, and slope	Adapted thresholds	Sensitivity = 83%
Subramanian et al. (2018) [25]	PhysioNet 2018	28 features	GLM, RF	Highest AUROC = 0.847, highest AUPRC = 0.630
Ugur et al. (2019) [26]	SHHS	CWT	SVM	Accuracy = 98.2%, positive predictive value = 97.93%
Liu et al. (2020) [27]	PhysioNet 2018	ICA, double density DWT algorithm, FIR filter	CNN with RF	AUPRC = 0.552

MLP = multilayer perceptron neural network; SVM = support vector machine; MAA = maximum absolute amplitude; HRV = heart rate variability; RF = random forest; SCL = skin conductance level; GLM = generalized linear model; CWT = continuous wavelet transforms; ICA = independent component correlation algorithm; DWT = discrete wavelet transformation; AUROC = area under the receiver operating characteristic curve; AUPRC = area under the precision-recall curve.
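The studies in Table 1 report different metrics (accuracy, sensitivity, specificity, positive predictive value), so their results are not directly comparable. As a reminder of how these quantities relate, the sketch below derives all of them from the four confusion-matrix counts of a binary arousal detector. The function name and the example counts are assumptions for illustration, not data from any cited study.

```python
def _ratio(numerator, denominator):
    """Safe division: undefined ratios are reported as NaN."""
    return numerator / denominator if denominator else float("nan")


def detection_metrics(tp, fp, tn, fn):
    """Standard binary detection metrics from confusion-matrix counts.

    tp: arousal segments correctly detected
    fp: non-arousal segments flagged as arousal
    tn: non-arousal segments correctly rejected
    fn: arousal segments that were missed
    """
    precision = _ratio(tp, tp + fp)      # positive predictive value
    sensitivity = _ratio(tp, tp + fn)    # also called recall
    return {
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "ppv": precision,
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 arousal segments among 1000 in total.
    print(detection_metrics(tp=80, fp=20, tn=880, fn=20))
```

With these hypothetical counts the accuracy is 0.96 although only 80% of arousals are found. This is why accuracy alone can be misleading when arousal segments are rare.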

3. Microarousal Detection with Deep Learning Methods

Unlike manual feature extraction, neural networks can automatically learn variations and trends in the signal, carrying out feature extraction internally through learned abstract representations. Deep learning methods possess a strong capability to learn complex features directly from raw data without any hand-crafted features. Only recently have researchers begun to show a preference for deep learning methods, such as CNN [28,29,30,31], ResNet [32], the Siamese architecture network [30], RNN, and LSTM [33,34,35], over traditional machine learning methods in arousal detection.

The CNN makes it easier to extract different features of the input PSG data through convolution kernels. The models using CNN reviewed in this paper are summarized in Table 2.
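As a minimal illustration of what a convolution kernel does to a one-dimensional signal, the sketch below slides a short kernel over a sequence of samples and applies a ReLU non-linearity. The kernel here is fixed by hand for clarity, whereas in a CNN the kernel weights are learned from data. The function names are assumptions, and no reviewed model is reproduced.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal without padding.

    This is cross-correlation (no kernel flip), which is what deep
    learning libraries usually call convolution.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Rectified linear unit: negative responses are set to zero."""
    return [max(0.0, v) for v in values]


if __name__ == "__main__":
    step = [0, 0, 0, 1, 1, 1]      # abrupt rise in amplitude
    rising_edge = [-1, 0, 1]       # responds where the signal increases
    print(relu(conv1d_valid(step, rising_edge)))
```

The output is non-zero only around the step, which shows how a kernel acts as a detector for one local waveform pattern. A CNN stacks many such learned detectors.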

Table 2. Detailed information of models using the CNN.

Author (Year) [Reference]	Database	Preprocessing	Results
Dongya et al. (2018) [28]	PhysioNet 2018	Welch algorithm	AUPRC = 0.114
Varga et al. (2018) [29]	PhysioNet 2018	68 features	AUPRC = 0.42
Patane et al. (2018) [30]	PhysioNet 2018	Filter, data augmentation	AUPRC = 0.40
Miller et al. (2018) [36]	PhysioNet 2018	-	AUPRC = 0.37
Zabihi et al. (2018) [31]	PhysioNet 2018	-	AUPRC = 0.31
Olesen et al. (2020) [37]	National Sleep Research Resource	Resampled, baseline model	F1-score = 0.682
Zhou et al. (2020) [38]	PhysioNet 2018	Re-sample, Fourier transform	AUPRC = 0.39
Jia et al. (2020) [32]	Beijing Tongren Hospital	Down-sampled	Recall = 86.0%

KSS = Karolinska sleepiness scale, F1-score = harmonic mean of precision and recall.

Common time series models include the RNN, LSTM, and bidirectional LSTM (Bi-LSTM). RNN and LSTM are networks that contain loops to connect previous information to current tasks. The models with LSTM reviewed in this paper are shown in Table 3 and Table 4.
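The loop that connects previous information to the current step can be written out explicitly. The sketch below implements one time step of a single-unit LSTM cell with scalar input, using the standard gate equations, and then runs it over a sequence. It is a toy illustration, not a model from the reviewed studies: the `params` layout (gate name mapped to input weight, recurrent weight, and bias) is an assumption made for readability.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell with scalar input.

    params maps each gate name ('i', 'f', 'o', 'g') to a tuple
    (input weight, recurrent weight, bias).
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre_activation("i"))    # input gate
    f = sigmoid(pre_activation("f"))    # forget gate
    o = sigmoid(pre_activation("o"))    # output gate
    g = math.tanh(pre_activation("g"))  # candidate cell update
    c = f * c_prev + i * g              # cell state carries the memory
    h = o * math.tanh(c)                # hidden state is the output
    return h, c


def run_lstm(sequence, params):
    """Feed a sequence through the cell, starting from a zero state."""
    h, c = 0.0, 0.0
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs
```

If only the candidate gate `g` has a non-zero input weight, an input of 1.0 followed by 0.0 still produces a positive output at the second step, because the cell state retains part of the earlier input. A bidirectional LSTM runs a second such cell over the reversed sequence and combines both outputs.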

Table 3. Comparison of LSTM-based approaches.

Author (Year) [Reference]	Database	Data Preprocessing	AUPRC
Warrick et al. (2018) [34]	PhysioNet 2018	ST algorithm, logarithmic filters	0.36
Már Þráinsson et al. (2018) [33]	PhysioNet 2018	Energy, Hjorth parameters, WPD	0.45
Kim et al. (2019) [35]	PhysioNet 2018	MFCC	0.458

ST = scattering transform; WPD = wavelet packet decomposition; MFCC = Mel-Frequency Cepstral Coefficient.

Table 4. Analysis of application of CNN+LSTM in sleep arousal.

Author (Year) [Reference]	Database	Data Preprocessing	Model	AUPRC
Li et al. (2018) [39]	PhysioNet 2018	Signal segmentation	CNN+BiLSTM	0.42
Sridhar et al. (2018) [40]	PhysioNet 2018	Feature time-series	LSTM	0.573
Howe-Patterson et al. (2018) [41]	PhysioNet 2018	FFT, down-sampled	DNN+BiLSTM	0.54
Warrick et al. (2019) [42]	PhysioNet 2018	-	ST-LSTM	0.36
Achuth et al. (2019) [43]	Local dataset	Filters, RF	DNN+LSTM	0.50

Table 5 provides the models and results of the teams participating in the PhysioNet/Computing in Cardiology Challenge 2018.

Table 5. Comparison of detection of non-apnea/hypopnea sleep arousal in the PhysioNet/Computing in Cardiology Challenge 2018.

Author (Year) [Reference]	Number of Channels	Model	AUPRC
Sridhar et al. (2018) [40]	13	CNN+RNN	0.573
Howe-Patterson et al. (2018) [41]	12	CNN+LSTM	0.54
Pourbabaee et al. (2019) [44]	12	DNN+LSTM	0.543
Már Þráinsson et al. (2018) [33]	13	Bi-LSTM	0.45
Li et al. (2018) [39]	13	DNN+LSTM	0.43
Varga et al. (2018) [29]	13	CNN	0.42
Patane et al. (2018) [30]	5	CNN	0.40
Miller et al. (2018) [36]	13	CNN	0.36
Warrick et al. (2018) [34]	13	RNN	0.36
Zabihi et al. (2019) [31]	5	CNN	0.31

Note: Submitted inside the time frame of the official phase of the 2018 PhysioNet Challenge. AUPRC is for their internal test set and the official blind test set.
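Since AUPRC is the figure of merit throughout Tables 2 to 5, a small worked computation may help in reading these numbers. The sketch below estimates the area under the precision-recall curve as step-wise average precision: predictions are ranked by score, and the precision at each true arousal is averaged. This is one common estimator and is not claimed to be the challenge's official scoring code. The function name is an assumption, and tied scores are simply taken in input order.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels: 0/1 ground truth per sample (1 = arousal)
    scores: predicted arousal probability per sample
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank samples from the most to the least confident prediction.
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)

    true_positives = 0
    area = 0.0
    for rank, index in enumerate(order, start=1):
        if labels[index] == 1:
            true_positives += 1
            precision = true_positives / rank
            # Each positive found raises recall by 1 / total_positives.
            area += precision / total_positives
    return area


if __name__ == "__main__":
    print(average_precision([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))
    print(average_precision([0, 1, 0, 1], [0.9, 0.8, 0.7, 0.6]))
```

A detector that ranks every arousal above every non-arousal scores 1.0, and the second call above yields 0.5. Unlike AUROC, the baseline of a random detector equals the fraction of positive samples, so low absolute AUPRC values are expected when arousals are rare.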

4. Automated Detection of CAP

The cyclic alternating pattern (CAP) reflects the instability of sleep in the EEG and accompanies dynamic events during sleep (falling asleep, transitions between sleep stages, and awakening from sleep). It is suggested that when there are external or internal sleep interference factors, the A1 subtype in CAP marks the brain’s efforts to continue to sleep. When sleep becomes increasingly unstable and the brain cannot maintain continuous sleep, EEG arousal will accompany or replace the high-amplitude slow activity. Therefore, the A2 and A3 subtypes constitute the arousal of the central nervous system.

Methods for automated detection of CAP are listed in Table 6.

Table 6. Automated detection of CAP.

Author (Year) [Reference]	Database	Data Preprocessing	Model	Results
Mariani et al. (2012) [45]	Parma Sleep Disorders Center	Hjorth activity; EEG variance	Discriminant classifier	Accuracy = 84.9%
Chindhade et al. (2018) [46]	CAP Sleep Database	Differential moving average	Logistic regression	AUROC = 0.512; Accuracy = 58%
Hui et al. (2021) [47]	CAP Sleep Database	-	CNN	Sensitivity = 80.29%; Accuracy = 74.43%
Mendonça et al. (2021) [108]	CAP Sleep Database	Lowpass filter	LSTM	Accuracy = 81.3%; Sensitivity = 73.7%; Specificity = 81.7%

5. Conclusion

Reliable diagnosis of arousal is the most essential prerequisite of sleep disorder treatment. The ‘gold standard’ scoring for sleep disorders is produced manually by experienced experts, which is a time-consuming and costly process. Accurate automated scoring models could help doctors interpret recordings faster and more accurately, free doctors from tedious work, and ultimately improve the efficiency of laboratory and home sleep diagnostic methods.

This review showed that deep learning models can complete complex tasks and are more accurate than traditional machine learning models. Deep learning can learn complex features directly from the original data without any manually extracted features. Because changes in various physiological parameters usually occur over a period of time before an arousal, RNN and LSTM models can learn the temporal relations in PSG signals. Therefore, using deep learning methods to detect the features of sleep arousals has become a mainstream trend in the analysis of PSG signals.
