Sleep apnea detection can be performed with externally mounted devices or ambient sensors, other than biomedical sensors.
Sleep apnea is a sleep disorder in which a sleeping person’s breathing is disturbed. It is prevalent in adults as well as a small percentage of the juvenile population. Subjects suffering from sleep apnea undergo periods of no or shallow breathing during their sleep. The former condition, in which breathing stops temporarily, is referred to as apnea, while the latter condition of shallow breathing or reduced airflow is called hypopnea. Clinical comorbidities can result from either condition and, therefore, both are detrimental to a person’s well-being. The physiological symptoms of sleep apnea include snoring, gasping for air during sleep, waking up with a dry mouth and, in general, low sleep quality, which leads to low attention, insomnia, decreased cognitive skills, accidents, memory loss and depression. In addition to the low quality of life caused by sleep deprivation and fatigue, sleep apnea may also lead to severe issues such as diabetes, cardiovascular problems, hypertension, neurological issues, and liver problems. Due to the global prevalence of sleep apnea as well as the direct and indirect long-term problems it brings about, it is important to diagnose and treat this condition. In this paper, we review the recent state-of-the-art research in the application of machine learning for sleep apnea detection. The review covers the parameters and sensors used, and the feature engineering approaches that enable sleep apnea detection using machine learning.
There are three types of sleep apnea. Obstructive sleep apnea (OSA) occurs due to improper functioning of the upper respiratory tract. When the muscles in the back of the throat that support the soft palate relax, the soft palate blocks the passage of air to the respiratory system. This leads to stoppage of breathing for short durations. Central sleep apnea (CSA) occurs when the brain fails to generate or transmit the signals that control the breathing muscles. This leads to short periods during which the subject does not breathe at all. Complex sleep apnea syndrome manifests as central apnea that persists even after obstructive events have disappeared with positive airway pressure (PAP) therapy. Javaheri et al. describe the etiological risk factors for sleep apnea and its consequences.
The common set of biomedical parameters used to detect sleep apnea includes SpO2, heart rate, ECG and EEG. Biomedical informaticians have applied various machine learning techniques to these parameters and their derivatives to diagnose sleep apnea and to assess how accurately it can be detected. Of late, the effectiveness of ensemble classifiers and deep learning techniques has also been investigated. The features used for sleep apnea detection may be reported directly by sensors or extracted from various sensor observations. There has also been extensive research into combining observations from one or more of these sensors, using data fusion, to detect sleep disorders. Studies also examine how classifier performance is affected by extracting statistical, time-domain and frequency-domain features from the parameters, and by performing dimensionality reduction to downsize the feature vectors. The following paragraphs provide examples of how classic machine learning, deep learning, and sensor fusion techniques have been applied to detect sleep apnea. Deep learning can be considered a specialized segment of machine learning; however, the manner in which feature engineering is accomplished differs greatly between the two.
In many research papers, single biomedical markers, such as SpO2, ECG, EOG, or EEG, have been used for the detection of sleep apnea. Among these, most studies focus on SpO2 and ECG signals. For example, one study uses SpO2 signals for OSA detection. During feature engineering, the oxygen desaturation index (ODI), the total time below saturation levels (tsa), and six other features were extracted from SpO2. Several variants of decision tree (DT) classifiers were used to obtain an accuracy of 93%. Pulse oximeter parameters are used for sleep apnea detection in another study as well.
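To make the two named oximetry features concrete, the following is a minimal sketch, not the method of the cited study. It assumes a simplified ODI definition (the number of contiguous drops of at least 3 percentage points below a baseline, per hour of recording), takes the baseline to be the mean of the whole trace, and measures time below a 90% saturation level. The function name `spo2_features` and all parameter names and default values are illustrative assumptions.

```python
from typing import Dict, List


def spo2_features(spo2: List[float], fs_hz: float = 1.0,
                  drop_pct: float = 3.0,
                  low_threshold: float = 90.0) -> Dict[str, float]:
    """Compute two illustrative features from an SpO2 trace given in percent.

    Assumptions (not taken from the reviewed papers): the baseline is the
    mean of the whole recording, a desaturation event is a contiguous run
    of samples at least `drop_pct` points below that baseline, and time
    below saturation is counted against `low_threshold`.
    """
    if not spo2:
        raise ValueError("empty SpO2 recording")

    hours = len(spo2) / fs_hz / 3600.0
    baseline = sum(spo2) / len(spo2)

    events = 0          # number of distinct desaturation events
    in_event = False    # True while the trace stays desaturated
    samples_below = 0   # samples under the absolute saturation threshold

    for value in spo2:
        desaturated = value <= baseline - drop_pct
        if desaturated and not in_event:
            events += 1  # count only the start of each run
        in_event = desaturated
        if value < low_threshold:
            samples_below += 1

    return {
        "odi": events / hours,                  # events per hour
        "time_below_s": samples_below / fs_hz,  # seconds below threshold
    }
```

Clinical scoring rules usually rely on a moving baseline and a minimum event duration, so the values produced by this sketch are only indicative of how such features are derived.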
ECG is another parameter that is commonly used in the detection of sleep apnea. Hassan et al. compare various machine learning classifiers on a dataset generated by a single-lead ECG sensor. Statistical moment-based and empirical mode decomposition features were extracted from the raw data. After feature extraction, Naive Bayes, k-nearest neighbor (kNN), neural network, AdaBoost, Bagging, random forest, extreme learning machine (ELM), discriminant analysis (DA) and restricted Boltzmann machine were compared for performance. ELM gave the best accuracy of 83.77%. Another study also used a dataset based on single-lead ECG to detect sleep apnea. In this study, segments of ECG signals were fed into a dual-tree complex wavelet transform (DTCWT) to generate frequency sub-bands. Three statistical features (variance, skewness, and kurtosis) were extracted from the DTCWT output and analyzed to determine their suitability for detecting sleep apnea. LogitBoost gave an accuracy of 84.4%. Other classifiers analyzed include DA, kNN, Artificial Neural Network (ANN), ELM, SVM, AdaBoost and Bagging. ECG signals have been used not only for the detection of sleep apnea, but also to determine its type.
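The sub-band feature extraction described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline of the cited study: a plain Haar wavelet decomposition is swapped in for the DTCWT so that the example stays dependency-free, kurtosis is the non-excess (Pearson) form, and the names `haar_step`, `moments` and `subband_features` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the Haar wavelet transform: (approximation, detail)."""
    n = len(x) - len(x) % 2  # ignore a trailing odd sample
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, n, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, n, 2)]
    return approx, detail


def moments(x: List[float]) -> Dict[str, float]:
    """Population variance, skewness and (non-excess) kurtosis of x."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    if m2 == 0.0:  # constant sub-band: higher moments are undefined
        return {"variance": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    return {"variance": m2,
            "skewness": m3 / m2 ** 1.5,
            "kurtosis": m4 / m2 ** 2}


def subband_features(segment: List[float], levels: int = 3) -> List[float]:
    """Variance, skewness and kurtosis for each detail sub-band and for
    the final approximation, concatenated into one feature vector."""
    if len(segment) < 2 ** levels:
        raise ValueError("segment too short for the requested levels")
    features: List[float] = []
    approx = list(segment)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        stats = moments(detail)
        features += [stats["variance"], stats["skewness"], stats["kurtosis"]]
    stats = moments(approx)
    features += [stats["variance"], stats["skewness"], stats["kurtosis"]]
    return features
```

With three decomposition levels, each ECG segment yields a 12-element feature vector (three statistics for each of four sub-bands) that could then be passed to any of the classifiers listed above.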
In another study, instantaneous heart rate (IHR) is used as the sole marker for sleep apnea detection. This paper argues that, with simple classification algorithms, IHR and its derivatives alone provide at best 85% accuracy in classifying apnea on a minute-by-minute basis. Therefore, LSTM–RNN was employed for the identification of sleep apnea and its severity. Various configurations of LSTM–RNN were trained after feature extraction and selection, which yielded 99.99% accuracy in detecting sleep apnea. Erdenebayar et al. describe a comparative study of the performance of deep learning classifiers on ECG signals; the classifiers are a Deep Neural Network (DNN), 1D CNN, 2D CNN, RNN, LSTM and a gated recurrent unit (GRU) model. The 1D CNN and GRU models performed best, with an accuracy and recall of 99%.
Prabha et al. make use of heart rate variability (HRV) and respiratory rate variability (RRV), derived from ECG and respiratory effort signals (RES), respectively. A decision-making system that fuses time-domain features from the HRV and RRV signals, by combining their outputs with empirically calculated weights, produced an accuracy of 100%. The weight associated with the time-domain HRV features was considerably higher than that of the time-domain RRV features, which indicates that HRV is more strongly correlated with sleep apnea than RRV, although the latter may complement the former. This analysis concludes that the time-domain features of HRV and RRV provide sufficient information to detect OSA.
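The weighted combination of two per-signal outputs can be illustrated with a short sketch. It assumes that each branch (HRV and RRV) emits an apnea probability between 0 and 1 and that the final decision thresholds the weighted average. The weights 0.7 and 0.3 and the threshold 0.5 are placeholders chosen only to reflect that HRV is weighted more heavily; they are not the empirically calculated values from the study, and `fuse_decisions` is a hypothetical name.

```python
from typing import Tuple


def fuse_decisions(p_hrv: float, p_rrv: float,
                   w_hrv: float = 0.7, w_rrv: float = 0.3,
                   threshold: float = 0.5) -> Tuple[float, bool]:
    """Late fusion of two classifier outputs by weighted averaging.

    p_hrv, p_rrv: apnea probabilities from the HRV and RRV branches.
    w_hrv, w_rrv: placeholder weights (assumed, not from the paper).
    Returns the fused score and the resulting apnea decision.
    """
    for p in (p_hrv, p_rrv):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    total = w_hrv + w_rrv
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    score = (w_hrv * p_hrv + w_rrv * p_rrv) / total
    return score, score >= threshold
```

Because the HRV weight dominates, a confident HRV output can outvote a disagreeing RRV output, while the RRV branch still shifts borderline cases.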
In addition to devices that measure biomedical parameters, studies show the application of environmental sensors and devices, such as microphones and cameras, to ascertain the presence of sleep apnea. The literature also shows the application of health profiles to detect apnea and to predict apnea-hypopnea index (AHI) values in order to classify the severity of sleep apnea.
One such technique for sleep apnea detection is based on smartphones. Camcı et al. use sonar waves generated by smartphones, which give information about chest movements, to detect sleep apnea. The accuracy of the system was found to depend on changes in the subject’s sleep position. Other techniques, such as placing a microphone close to the subject’s nose and mouth, were found to be obtrusive and to affect the sleep behavior of the subjects. Another technique relies on the use of a 3D time-of-flight camera, which records the subject’s respiratory motion. The signals pertaining to the respiratory movement of abdominal muscles are analyzed to monitor sleep stages and detect apnea. Davidovich et al. propose a novel algorithm for sleep apnea screening with a contact-free system based on a piezo-electric sensor. The sensor recorded a combination of gross body motion, rib cage movements, and the cardioballistic effect. The specificity and sensitivity were found to be 89% and 88%, respectively.
Non-wearable techniques for sleep apnea detection have certain advantages and disadvantages when compared with wearable devices. For example, wearable devices for sleep apnea detection have to be small in form factor and lightweight, while non-wearable techniques such as beds with embedded ballistocardiography (BCG) sensors or camera-based systems do not have restrictions on their size or form factor. Another point of comparison between wearable and non-wearable techniques is power consumption. Minimizing power consumption enables a wearable device to run on battery power for longer durations, which reduces the overhead of charging the device. Power is consumed by three activities: sensing, processing, and communication. These three functions have to be optimized for energy saving to enable the device to be worn for long periods of time without recharging. In contrast, non-wearable devices can be connected to the mains power supply, and hence need not be designed for optimized power consumption. One significant factor that affects the accuracy of sleep apnea detection in both techniques is the placement of the sensors. Wearable devices allow round-the-clock monitoring of parameters since they do not restrict data collection to a particular physical area. Non-wearable devices, however, are limited by the sensing range of the devices. Environmental sensor-based systems also sometimes tend to be intrusive; for example, placing a microphone close to a subject’s face while sleeping could be uncomfortable for them. Camera-based systems may tend to be expensive and have higher power and bandwidth requirements. Due to all these aspects, wearable devices may be conducive to at-home sleep monitoring, while non-wearable techniques may be applied in hospital environments where the mobility of the subjects is more constrained.
There has been research that highlights the significance of including a subject’s health profile in the diagnosis of sleep apnea and its severity. Mencar et al. use 19 features, including heart disease, diabetes, gender, BMI, age, smoking, hypertension and snoring, to explore methods to classify sleep apnea severity. Classification algorithms are applied to classify the severity of sleep apnea, and regression methods are applied to predict the AHI values. In another work, Ustun et al. argue that the medical information of subjects is better suited to diagnosing sleep apnea than real-time sleep-related symptoms. Features such as age, gender, BMI, presence of hypertension, history of heart failure, stroke, asthma, smoking, and snoring were used to train the classifiers. Seven classifiers, including variants of logistic regression, DT, and SVM, were compared with a new machine learning model named SLIM (Supersparse Linear Integer Models). SLIM is a linear classification model for creating medical scoring systems, and it gave a sensitivity of 64.2% and a specificity of 77%. The study supports the use of simple models with good generalization capabilities, especially for medical applications where models are prone to overfitting the available datasets.
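A scoring system of the kind SLIM produces assigns a small integer number of points to each risk factor and compares the total with a threshold. The sketch below shows only this general form, together with the sensitivity and specificity calculation used to evaluate it. The point values, the feature names, the cut-offs (age 60, BMI 30) and the threshold are all hypothetical and are not the coefficients learned by Ustun et al.

```python
from typing import Dict, List, Tuple

# Hypothetical integer points per binary risk factor (illustrative only).
POINTS: Dict[str, int] = {
    "age_ge_60": 2,
    "bmi_ge_30": 2,
    "hypertension": 1,
    "snoring": 1,
    "female": -1,
}
THRESHOLD = 3  # hypothetical decision threshold on the total score


def risk_score(profile: Dict[str, bool]) -> int:
    """Sum the points of every risk factor present in the profile."""
    return sum(pts for name, pts in POINTS.items() if profile.get(name, False))


def predict(profile: Dict[str, bool]) -> bool:
    """Flag the subject as likely apneic when the score meets the threshold."""
    return risk_score(profile) >= THRESHOLD


def sensitivity_specificity(y_true: List[bool],
                            y_pred: List[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)
```

The appeal of such a model is that a clinician can compute the score by hand, which is the kind of simplicity the study argues for.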
In this study, we briefly summarized the causes and risks associated with sleep apnea, and the drawbacks of the related diagnostic processes. We outlined the parameters that help detect apneic events. Subsequently, we examined the application of machine learning in sleep apnea detection, with a focus on wearable systems. We summarized the recent research that demonstrates feature engineering techniques and the efficient use of classic machine learning, deep learning, and sensor/feature fusion algorithms to detect sleep apnea and, in some cases, classify its severity, using biomedical markers such as ECG, EEG and SpO2. The paper also briefly looked at the application of environmental sensors and of information in subjects’ health profiles to ascertain the presence of sleep apnea.
Our analysis shows that the machine learning algorithms applied to the datasets in the surveyed literature produce varying degrees of accuracy. This indicates that the performance of the algorithms depends on various factors such as:
(i) Data collection modalities
Factors such as the type of sensors, their placement, and the frequency and sensitivity of measurements affect the training of machine learning classifiers. Among the various biomedical parameters that aid in the detection of sleep apnea, we observe that the most common are those from ECG, SpO2, and EEG signals. The drawback of using ECG is that signals from three or more leads require a resting ECG or an ECG Holter monitor, which may be restrictive for the subject under study because of the placement of the leads. Single-lead ECG can be embedded within wearable devices; however, the accuracy of such devices is lower than that of multi-lead devices. Collection of EEG data also requires the subjects to wear headgear while sleeping, which may cause inconvenience. SpO2 sensors, like single-lead ECG sensors, can be embedded within wearable devices and, in combination with the demographic information of subjects, have been shown to provide good results in the detection of sleep apnea. Environmental sensors (such as bed-embedded BCG sensors) may constrain the subjects to a certain area under observation while sleeping. Some may introduce noise into the collected data; for example, acoustic sensors are prone to errors from ambient noise.
(ii) Dataset characteristics
Characteristics of the data, such as its distribution and the dataset features, along with the pre-processing that has been applied to it, also influence the efficiency of supervised training techniques. For a classifier to be well trained, the dataset it trains on must be balanced. In the case of sleep apnea, it has to be ensured that the number of apneic events in the dataset is comparable with the number of non-apneic events. Otherwise, the classifier is trained predominantly on the majority class and misclassifies the minority class. Additionally, appropriate data pre-processing techniques and feature engineering should be performed to fine-tune the classifier training.
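Two common ways of compensating for an imbalance between apneic and non-apneic segments are sketched below as an illustration; neither is attributed to a specific reviewed study. The first computes class weights with the widely used "balanced" heuristic, n_samples / (n_classes × class_count), and the second randomly oversamples the minority class. The function names, the seed parameter and the label encoding (1 for apneic, 0 for non-apneic) are assumptions.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count), so that
    rarer classes contribute more to a weighted training loss."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def oversample_minority(samples: Sequence, labels: Sequence[Hashable],
                        seed: int = 0) -> Tuple[List, List]:
    """Duplicate randomly chosen samples of the smaller classes until every
    class has as many samples as the largest one."""
    rng = random.Random(seed)  # seeded for reproducibility
    by_class: Dict[Hashable, List] = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples: List = []
    out_labels: List = []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels
```

Oversampling should be applied only to the training split; duplicating minority samples before the train/test split would leak copies of test samples into training and inflate the reported accuracy.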
(iii) Labelling techniques
Training machine learning models for sleep apnea detection using supervised learning techniques requires annotation of the records in the sleep dataset. Two of the standards used in sleep stage scoring from sleep study reports are the Rechtschaffen and Kales (R&K) standard and the American Academy of Sleep Medicine (AASM) standard. In practice, apneic events are annotated manually by domain experts. The process involves correlating the subject’s biomedical and physiological history with the sleep data, while adhering to the guidelines set forth by the standards. The dependency of annotation on the standards and on subjective domain expertise may limit the generalization capability of the trained model.
The capability of a wearable device or an end-to-end system to store data for analysis, raise alarms on detection of abnormalities, and generate long-term reports is especially useful in the context of geriatric care homes. Today, there are commercial devices that periodically synchronize collected data to a smartphone; however, a drawback of such a system is that, at any given time, the device can be paired with only a single smartphone. The ability to support data collection and analysis at a central location would be especially beneficial in geriatric healthcare, where elderly people are saved the effort required to access and view their own reports.