Real-Time Deep Learning-Based Drowsiness Detection: History

Drowsy driving significantly degrades driving performance and overall road safety, and decreased driver alertness and attention are statistically among its main causes. The combination of deep learning and computer-vision algorithms has proven to be one of the most effective approaches for drowsiness detection. Robust and accurate drowsiness detection systems can be developed by leveraging deep learning to learn complex patterns from visual data, since deep learning algorithms can automatically learn from given inputs and extract features from raw data. Here, eye-blinking-based drowsiness detection was applied, which relies on the analysis of eye-blink patterns.

  • drowsiness detection
  • eye tracking
  • blinking
  • deep learning

1. Introduction

Driver drowsiness detection is a significant concern for road safety. Drowsy driving is defined as driving with symptoms of fatigue, sleepiness, or an inability to maintain alertness. Several factors underlie the feeling of drowsiness, such as a lack of sleep, long driving hours, and monotonous conditions. The eye-blinking-based method is a promising approach to detecting driver drowsiness. It involves monitoring the pattern and frequency of eye-blinks while driving [1,2]. Eye-blinks are a good indicator of a driver’s level of alertness because their frequency and pattern change with the driver’s condition: a decrease in the frequency of eye-blinks or an increase in the duration of eye closure can indicate driver drowsiness. By monitoring these changes, it is possible to determine whether a driver is at risk of falling asleep at the wheel. This information can be captured using various systems and sensor technologies, such as in-vehicle cameras or wearable devices, to provide real-time monitoring of driver alertness and identify the driver’s condition. The eye-blinking-based method is non-intrusive, and its implementation can significantly improve road safety by reducing the number of accidents [3,4,5].
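As an illustration of the variables described above, the following minimal sketch (not from the cited studies) computes a blink rate and a mean eye-closure duration from a per-frame "eye open" signal, such as one produced by a camera-based eye detector. The frame rate and the example signal are assumptions for illustration only.

```python
# A minimal sketch of the blink metrics the text describes: blink frequency
# and mean eye-closure duration, computed from a per-frame eye-open signal.
from typing import List, Tuple

def blink_metrics(eye_open: List[bool], fps: float) -> Tuple[float, float]:
    """Return (blinks per minute, mean closure duration in seconds)."""
    closures = []          # length (in frames) of each closed-eye run
    run = 0
    for is_open in eye_open:
        if not is_open:
            run += 1
        elif run > 0:
            closures.append(run)
            run = 0
    if run > 0:            # sequence ended with the eyes closed
        closures.append(run)

    minutes = len(eye_open) / fps / 60.0
    blink_rate = len(closures) / minutes if minutes > 0 else 0.0
    mean_closure = (sum(closures) / len(closures)) / fps if closures else 0.0
    return blink_rate, mean_closure

# Example: a 30 fps signal containing two short blinks.
signal = [True] * 50 + [False] * 5 + [True] * 100 + [False] * 8 + [True] * 37
rate, closure = blink_metrics(signal, fps=30.0)
print(f"{rate:.1f} blinks/min, mean closure {closure * 1000:.0f} ms")
```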
Several studies have discussed methods to detect driver drowsiness based on eye-blink analyses, including the implementation of computer-vision techniques, machine-learning algorithms, and physiological signals [6,7,8,9,10]. Ref. [6] proposed the use of eye-blink and yawning features to detect driver drowsiness. The authors investigated a system that captured video footage of a driver and analyzed the frequency and duration of eye-blinks and yawns to determine the level of drowsiness, applying computer-vision techniques to extract the relevant features from the video sequences. The main contribution of this study was the integration of eye-blink and yawning features: the combination of these two features provided a more reliable indication of drowsiness than eye-blink features alone, probably because yawning is a well-known physiological response to drowsiness and provides information complementary to an eye-blink analysis. Ref. [6] evaluated the method on a dataset of driver video sequences, found that it could detect drowsiness with a high degree of precision, and suggested that it has the potential to be used in real-world applications to improve road safety and reduce the risk of drowsy-driving accidents.
Moon et al. took a different approach and proposed a convolutional neural network (CNN)-based method to automatically detect and analyze eye-blink frequency in real-time video sequences of drivers. The main contribution of this research was the use of CNNs for eye-blink-based driver drowsiness detection. The proposed method was trained on a large dataset of eye-blink features from both drowsy and non-drowsy drivers, and the authors found that it detected drowsiness more precisely than traditional drowsiness detection methods. Its performance was evaluated under various scenarios, including changes in lighting conditions and driver behavior.
Jain et al. [8] investigated a method to detect driver drowsiness based on an analysis of eye-blink features, using support vector machine (SVM) classification to determine the level of drowsiness. An SVM can be an effective approach for eye-blink detection because it can handle non-linear decision boundaries and is robust against overfitting. However, maintaining strong SVM performance in eye-blink detection requires high-quality feature extraction from the eye images as well as careful attention to the size of the training dataset and the choice of SVM hyperparameters.
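The following hedged sketch shows SVM-based drowsiness classification in the spirit of [8], not the authors' actual code. The feature set (blink rate, mean blink duration, fraction of frames with closed eyes) and the synthetic training data are illustrative assumptions.

```python
# A sketch of SVM classification over eye-blink features. The RBF kernel
# handles the non-linear decision boundary mentioned in the text, and
# feature scaling matters for SVM performance.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Each row: [blinks/min, mean blink duration (s), fraction of frames eyes closed]
alert  = rng.normal([17.0, 0.15, 0.05], [3.0, 0.03, 0.02], size=(100, 3))
drowsy = rng.normal([10.0, 0.40, 0.20], [3.0, 0.08, 0.05], size=(100, 3))
X = np.vstack([alert, drowsy])
y = np.array([0] * 100 + [1] * 100)   # 0 = alert, 1 = drowsy

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)
print(clf.predict([[9.0, 0.45, 0.25]]))   # expected: [1] (drowsy)
```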

2. Electrooculography (EOG)

Eye-movement detection is the process of detecting and measuring movements of the eyes. By analyzing saccades, fixations, and smooth pursuit, eye movements provide important information about the attention, perception, and cognitive processes of participants. Eye-movement detection can be applied to the study of visual and auditory processing, learning and memory, and other behavioral aspects of human performance. It can be achieved using several techniques, including electrooculography (EOG), infrared (IR) eye trackers, video-based eye trackers, and dual-purpose tracking.

EOG is a non-invasive method that measures the electrical potential difference between the cornea and the retina to detect eye movements. This difference arises because the cornea is positively charged with respect to the retina, so the measured potential changes when the eye moves. The electrical signal generated by eye movements can be used to determine their direction and magnitude, and it can also be applied to study the physiological processes underlying eye-movement control. EOG is widely used in psychology, neuroscience, and ophthalmology to study visual perception, attention, and cognitive processes. Moreover, EOG can be applied to human–computer interaction, where eye movements control the computer cursor or drive interactions with graphical interfaces.
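A minimal sketch of this principle follows: because the EOG amplitude tracks gaze angle, a large short-term change in the signal indicates an eye movement (saccade). The sampling rate, the simulated 150 µV step, and the velocity threshold are all assumptions for illustration, not values from the source.

```python
# Detecting a simulated horizontal saccade in a noisy EOG trace by
# thresholding the signal's time derivative (a velocity-based criterion).
import numpy as np

fs = 250.0                                  # assumed sampling rate (Hz)
t = np.arange(0, 2.0, 1.0 / fs)
eog = np.zeros_like(t)
eog[t > 0.8] += 150e-6                      # simulated 150 µV step: rightward saccade
eog += np.random.default_rng(1).normal(0, 5e-6, t.shape)   # electrode noise

velocity = np.gradient(eog, 1.0 / fs)       # proportional to angular velocity
threshold = 5 * np.std(velocity[: int(0.5 * fs)])   # threshold from quiet baseline
saccade_idx = np.flatnonzero(np.abs(velocity) > threshold)
if saccade_idx.size:
    print(f"saccade detected at t = {t[saccade_idx[0]]:.3f} s")
```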
A few studies [12,13,14] have focused on the EOG technique for eye-movement research. In [12], an overview of the history, basic principles, and applications of EOG is provided, together with some limitations of the EOG technique for eye-movement measurement. In [13], the development of a new EOG recording system and its application to the study of eye movements are described. In [14], a new low-noise amplifier for EOG signals is presented and evaluated in terms of noise, bandwidth, and distortion against other amplifier designs commonly used for EOG signals; the experimental results indicated that the proposed design outperformed existing designs, with lower noise, a wider bandwidth, and lower distortion. The origin of EOG as a technique to measure eye movements can be traced back to several pioneering studies. In 1950, Harold Hoffman first developed the EOG system and used it to study eye movements and visual perception. Steiner extended Hoffman’s work and developed a more sophisticated EOG system capable of measuring either horizontal or vertical movements. In the 1960s, Bowen et al. developed a mathematical model of EOG that enabled a more accurate and detailed description of the electrical signals generated by eye movements. Currently, EOG is widely used in several fields and plays a crucial role in advancing our knowledge of visual perception, attention, and cognitive processes.

3. Infrared (IR) Eye Tracking

IR eye-tracking techniques use infrared light to illuminate the eyes and a camera to capture the light reflected from them. IR eye tracking is widely used in fields such as psychology, human–computer interaction (HCI), market research, and gaming, because it is non-intrusive and can be used with a wide range of subjects, including those wearing glasses or contact lenses [15].
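One common realization of this idea is dark-pupil detection: under IR illumination the pupil appears as a dark blob, so a low-intensity threshold plus blob analysis can localize it. The sketch below is a self-contained illustration under that assumption; the synthetic frame and the threshold value are not from the source.

```python
# Dark-pupil localization in a synthetic "IR frame" via inverse thresholding
# and contour moments.
import cv2
import numpy as np

# Synthetic 200x200 frame: bright background with a dark pupil disc at (120, 90).
frame = np.full((200, 200), 180, dtype=np.uint8)
cv2.circle(frame, (120, 90), 15, 20, -1)

_, mask = cv2.threshold(frame, 50, 255, cv2.THRESH_BINARY_INV)  # keep dark pixels
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
pupil = max(contours, key=cv2.contourArea)        # largest dark blob
M = cv2.moments(pupil)
cx, cy = M["m10"] / M["m00"], M["m01"] / M["m00"]
print(f"pupil centre ≈ ({cx:.0f}, {cy:.0f})")     # ≈ (120, 90)
```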

4. Video-Based Eye Tracking

In computer-vision-based approaches, the two main tasks are eye detection, i.e., localizing the eyes in an input image, and eye tracking across frames. Both tasks are challenging because of issues such as the degree of eye openness and the variability of eye sizes across subjects. Computer-vision-based methods use video cameras to capture images of the eyes and analyze them to determine eye movements; the cameras can be set up to track eye movements in real time or in laboratory settings. The basic idea is to use image-processing techniques to detect and track the position of the pupils in a video stream and then use this information to infer the direction of gaze. Generally, there are two types of video-based eye-tracking systems: remote and wearable. A remote tracker uses a camera placed at a certain distance from the participant to record their eye movements, whereas a wearable tracker attaches a camera to a headset or glasses worn by the participant and records the eye movements at close range; the benefit of the wearable approach is that it provides a more complete picture of gaze behavior.
In recent years, several researchers [14,15,16,17,18] have discussed video-based eye tracking by analyzing eye recordings. In [16], a method for real-time video-based eye tracking was proposed that combined convolutional and recurrent neural networks (CNNs and RNNs) to accurately track eye positions in real time. The authors evaluated their system on a publicly available dataset, and the experimental results showed that the proposed method outperformed traditional approaches in terms of accuracy, speed, and robustness. Eye-tracking pattern recognition has also achieved remarkable results. The authors of [19] proposed a principal component analysis (PCA) model to identify six principal components, which were used to reduce the dimensionality of the image pixels; an artificial neural network (ANN) then classified pupil positions, with calibration requiring the observation of five different points, each representing a different pupil position. Lui et al. [20] presented both eye-detection and eye-tracking methods. They applied the Viola–Jones face detector with Haar-like features to locate the face in the input image and used the template matching (TM) method for eye detection. TM is widely used in computer vision because it can be applied to object recognition, image registration, and motion analysis; for eye detection, a similarity measure was computed based on metrics such as cross-correlation, mean-squared error, and normalized correlation. The authors of [20] also used other methods for eye detection and tracking, such as Zernike moments (ZMs) to extract rotation-invariant eye characteristics and an SVM for eye/non-eye classification.
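The sketch below illustrates the TM step with normalized cross-correlation, one of the similarity measures mentioned above. The synthetic face image and the eye template are stand-ins; in [20], the face region would first be located with the Viola–Jones detector.

```python
# Template matching for eye localization using OpenCV's normalized
# cross-correlation: the template is slid over the image and the location
# with the highest similarity score is reported.
import cv2
import numpy as np

rng = np.random.default_rng(2)
face = rng.integers(0, 255, (120, 160), dtype=np.uint8)   # stand-in face ROI
eye = face[40:60, 30:60].copy()                           # crop a known "eye" patch

result = cv2.matchTemplate(face, eye, cv2.TM_CCORR_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
print(f"best match at {max_loc} with score {max_val:.3f}")  # (30, 40), ~1.000
```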

5. Dual-Purpose Tracking

This method combines infrared and video-based tracking techniques to improve the accuracy of eye-movement detection: infrared tracking provides highly accurate eye positions, whereas video-based tracking provides additional information on eye appearance and motion. Huang et al. [21] presented an algorithm to detect eye pupils based on eye intensity, size, and shape; the intensity of the eye pupil was used as the main feature in pupil detection, and an SVM identified the location of the eye. For pupil fitting, corneal reflection and energy-controlled iterative curve-fitting methods are efficient approaches, as reported by Li and Wee [22]. For pupil boundary detection, an ellipse-fitting algorithm controlled by an energy function can be used; the task of ellipse fitting is to find the ellipse that best fits the given data points by minimizing the sum of the squared distances between the points and the ellipse.
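A minimal sketch of the ellipse-fitting idea follows, using OpenCV's least-squares fit on synthetic noisy boundary points; the energy-controlled iterative method of [22] is more elaborate than this, so the sketch only illustrates the underlying least-squares criterion.

```python
# Least-squares ellipse fitting to candidate pupil boundary points.
import cv2
import numpy as np

# Synthetic noisy boundary: ellipse centred at (100, 80), semi-axes 30 and 20.
theta = np.linspace(0, 2 * np.pi, 60)
pts = np.stack([100 + 30 * np.cos(theta), 80 + 20 * np.sin(theta)], axis=1)
pts += np.random.default_rng(3).normal(0, 0.5, pts.shape)   # boundary noise

# fitEllipse returns the centre, the full axis lengths (~60 and ~40 here),
# and the rotation angle of the best-fit ellipse.
(cx, cy), (d1, d2), angle = cv2.fitEllipse(pts.astype(np.float32))
print(f"centre=({cx:.1f}, {cy:.1f}), axes=({d1:.1f}, {d2:.1f})")
```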

6. Yawning-Based Drowsiness Detection

Yawning is a physical reflex lasting 4–7 s, consisting of gradual mouth gaping followed by a rapid expiratory phase with muscle relaxation [23]. As it is a natural physiological response to tiredness, it is widely used in research to identify drowsy drivers. The authors of [24] proposed a method based on tracking the condition of the driver’s mouth and recognizing the yawning state. They implemented a boosted cascade classifier over Haar wavelet features at several scales, with the feature positions measured on Canny integral images; the AdaBoost algorithm was used for feature selection and localization. To determine the yawning condition of the driver, an SVM was applied to classify the test instances. The SVM was trained on mouth- and yawning-related images, with the data transformed and scaled, using a radial basis function kernel. In [25], the researchers proposed a yawning detection approach that combined measurements of eye-closure duration with yawning measurements based on mouth conditions; for mouth detection, they used a spatial fuzzy c-means (s-FCM) clustering method.
The authors also applied an SVM for drowsiness detection; the inputs to the SVM were the width-to-height ratios of the eyes and mouth, computed from the state of the driver’s eyes and mouth, such as whether the eyes were half-open or closed and whether the driver was yawning. The final classification results were used to determine whether the driver was in a dangerous condition. Ying et al. [26] determined driver fatigue or drowsiness by monitoring variations in eye and mouth positions; this method relied on skin-color extraction, and a back-propagation (BP) neural network was used to recognize the position and state of the moving object [27]. In a similar approach, Wang et al. [28] located the mouth region by multi-threshold binarization in intensity space, using a Gaussian model over the RGB color space. Using the lip corners and the vertical integral projection of the mouth region, the upper and lower lip boundaries were identified to measure the openness of the mouth. The yawning state was then determined from the degree of mouth opening, expressed as a ratio of the mouth bounding rectangle; when this ratio indicated a widely open mouth above a predefined threshold for a continuous number of frames, the driver was classified as drowsy.
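A sketch of this consecutive-frame rule follows: a per-frame mouth-opening ratio is compared with a threshold, and the driver is flagged only when the ratio stays above it for a run of consecutive frames. The threshold and run length are illustrative assumptions, not values from [28].

```python
# Frame-counting yawn rule: flag drowsiness when the mouth-opening ratio
# (mouth height / width of its bounding rectangle) stays above a threshold
# for a run of consecutive frames.
from typing import Iterable

def detect_yawn(ratios: Iterable[float], threshold: float = 0.6,
                min_frames: int = 20) -> bool:
    """True if the ratio exceeds `threshold` for `min_frames` consecutive frames."""
    run = 0
    for r in ratios:
        run = run + 1 if r > threshold else 0
        if run >= min_frames:
            return True
    return False

# Example: a 25-frame wide-open mouth embedded in normal frames.
ratios = [0.2] * 40 + [0.8] * 25 + [0.2] * 40
print(detect_yawn(ratios))   # True
```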

This entry is adapted from the peer-reviewed paper 10.3390/s23146459
