Mental stress is known to be a major factor in road crashes, which often cause harm to people and damage to vehicles and infrastructure. Likewise, persistent mental stress can lead to the development of mental, cardiovascular, and abdominal disorders. Previous research in this domain mostly focuses on feature engineering and conventional machine learning approaches.
1. Introduction
Successful driving activities always require both mental and physical skills
[1,2,3]. Acute stress reduces the driver’s ability to handle hazardous situations, which causes significant damage to humans and vehicles every year
[4,5,6,7,8]. Dangerous driving situations are triggered by human errors, individual factors, and ambient conditions
[9]. According to the National Motor Vehicle Crash Causation Survey (NMVCCS) in the United States (US), human errors alone caused 94% of crashes, while vehicle defects, ambient conditions, and other factors collectively caused 6% of crashes during 2005–2007
[10]. Human errors are linked to the driver’s perceptual conditions, so a complete understanding of these conditions is crucial for preventing traffic accidents.
To detect and diagnose drivers’ different stress levels, physiological, physical, and contextual information is widely used
[11]. Moreover, different traditional machine learning models based on handcrafted feature extraction methods are utilized for the classification of stress. Extracting the best features using these approaches is always a challenging task, as the quality of extracted features has a significant effect on the classification performance
[12]. These approaches are laborious, ad hoc, less robust to noise, and require considerable expertise
[13]. To overcome these challenges, deep learning models have been utilized to automatically produce complex nonlinear features reliably
[14,15,16]. In addition to automatic feature extraction from raw data, these models offer noise robustness and better classification accuracy
[17,18,19]. Different deep learning algorithms are used in recent research, e.g., CNN, RNN, DNN, and LSTM.
The models proposed in the current work are based on 1D CNN and hybrid 1D CNN-LSTM networks. The proposed models are separately trained using multiple physiological signals (SRAD) and multimodal data (AffectiveROAD) including physiological signals and other information about the vehicle, driver, and ambiance. Multimodal fusion of data based on deep learning approaches can be used to develop a precise driver stress level recognition model with improved performance and reliability.
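As a rough illustration of what a 1D CNN layer does with a multichannel signal window, the following pure-Python sketch applies one convolution filter with ReLU, followed by max pooling. It is not the architecture proposed in this work: the function names, the two example channels, and the filter weights are all hypothetical, the weights are fixed rather than learned, and the LSTM stage of a hybrid network is not shown.

```python
def conv1d(signal, kernels, biases):
    """Valid 1D convolution (cross-correlation) with ReLU.

    signal:  list of channels, each a list of T samples
    kernels: list of filters; each filter holds one weight list (length K) per channel
    biases:  one bias per filter
    Returns one feature map of length T - K + 1 per filter.
    """
    n_samples = len(signal[0])
    feature_maps = []
    for kernel, bias in zip(kernels, biases):
        k = len(kernel[0])
        fmap = []
        for t in range(n_samples - k + 1):
            acc = bias
            # Each output sample mixes all channels over the same time span.
            for channel, weights in zip(signal, kernel):
                acc += sum(w * x for w, x in zip(weights, channel[t:t + k]))
            fmap.append(max(0.0, acc))  # ReLU activation
        feature_maps.append(fmap)
    return feature_maps


def max_pool1d(feature_maps, size):
    """Non-overlapping max pooling along the time axis."""
    return [
        [max(f[i:i + size]) for i in range(0, len(f) - size + 1, size)]
        for f in feature_maps
    ]


# Hypothetical two-channel window with 8 samples per channel.
window = [
    [1, 2, 3, 4, 5, 6, 7, 8],  # channel 0 (illustrative values only)
    [0, 1, 0, 1, 0, 1, 0, 1],  # channel 1 (illustrative values only)
]
kernels = [[[-1, 0, 1], [1, 1, 1]]]  # one filter, kernel size 3, fixed weights
biases = [0.0]

features = conv1d(window, kernels, biases)
pooled = max_pool1d(features, size=2)
```

Because every output sample combines all channels over the same time span, a filter of this kind can respond to patterns that occur simultaneously across signals. In a hybrid network, the pooled sequence would then be passed to a recurrent layer such as an LSTM.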
2. Real-World Driver Stress Recognition and Diagnosis
Several machine learning approaches have been proposed for real-world driver mental stress recognition based on different physiological signals. Dalmeida and Masala
[20], Vargas-Lopez et al.
[21], Khowaja et al.
[8], Lopez-Martinez et al.
[22], Haouij et al.
[23], Chen et al.
[4], Ghaderi et al.
[24], Zhang et al.
[25], and Healey and Picard
[26] proposed conventional machine learning models based on physiological signals obtained from the PhysioNet SRAD public database
[27]. Unlike the previous studies, Rigas et al.
[28] presented a real-world binary stress recognition model based on multimodal data, including physical and contextual data, in addition to physiological signals. On the other hand, Zontone et al.
[29], Bianco et al.
[30], Lee et al.
[31], Lanatà et al.
[32], and Gao et al.
[33] proposed conventional machine learning models for driver stress recognition based on simulated driving situations. Lanatà et al.
[32] and Lee et al.
[31] presented driver stress recognition models based on multimodal data. Contrary to previous studies, Šalkevicius et al.
[34], Rodríguez-Arce et al.
[35], Can et al.
[36], Al abdi et al.
[37], Betti et al.
[38], Siramprakas et al.
[39], de Vries et al.
[40], and Sun et al.
[41] proposed stress recognition models for controlled, lab, semi-lab, and physical-activity (such as sitting, standing, and walking) settings. Recent developments in deep learning and machine learning models have shown good results in various applied domains and can be applied to driver stress detection
[42,43].
All the mentioned studies are based on feature engineering techniques, and various conventional machine learning algorithms were employed to classify levels of stress. However, handcrafted features are less robust to noise and subjective changes, and require considerable time and effort
[8,13,19,34,35,44]. Moreover, capturing the features’ sequential nature is difficult due to the absence of explicit features and high dimensionality, despite the use of complex feature selection methods. Likewise, the dependence of the model on past observations would make it impractical to process all the information due to the growing complexity. The feature-level multimodal fusion models proposed by Chen et al.
[4], Healey and Picard
[26], Haouij et al.
[23], Lee et al.
[31], Bianco et al.
[30], Sun et al.
[41], and Can et al.
[36] mainly concentrate on pattern learning in individual signals instead of multiple simultaneous signals
[18]. Thus, these models are ill-suited to capturing the nonlinear correlations across multiple signals appearing simultaneously. The various linear and nonlinear methods employed in these conventional machine learning models have not been able to investigate such multiple time series signals rigorously
[19].
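The limitation described above can be made concrete with a minimal sketch of feature-level fusion over handcrafted features. The signal names, sample values, and the choice of four summary statistics are assumptions made for illustration and are not taken from any of the cited studies.

```python
from statistics import mean, pstdev


def window_features(samples):
    """Handcrafted summary statistics for one signal window."""
    return [mean(samples), pstdev(samples), min(samples), max(samples)]


def feature_level_fusion(windows_by_signal):
    """Concatenate per-signal feature vectors into one fused vector.

    Each signal is summarized on its own, so any interaction between
    signals within the window is lost before the classifier sees the data.
    """
    fused = []
    for name in sorted(windows_by_signal):  # fixed order keeps features aligned
        fused.extend(window_features(windows_by_signal[name]))
    return fused


# Hypothetical window of two signals (illustrative values only).
example_window = {
    "heart_rate": [70, 72, 74, 76],
    "skin_conductance": [2.0, 2.0, 4.0, 4.0],
}
fused_vector = feature_level_fusion(example_window)
```

The fused vector holds four statistics per signal, and a conventional classifier would be trained on such vectors. Since the statistics are computed per signal, the timing relationship between, say, a heart-rate rise and a skin-conductance rise within the window does not appear in the fused vector.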
To address the issues faced by conventional machine learning models, deep learning methods have been introduced. Deep learning models are developed based on signal preprocessing (noise filtering), designing a particular deep neural network based on the area of interest, network training, and model testing. Deep learning models learn and classify raw data using multilayer deep neural networks
[45]. The last fully connected (FC) layers are utilized to obtain the final output. Contrary to feature engineering techniques used in conventional machine learning approaches, deep learning models automatically produce steady features
[14,15]. Moreover, deep learning models are more robust to noise and achieve improved classification accuracy
[19]. Different deep learning algorithms are used in recent research, e.g., the recurrent neural network (RNN), deep neural network (DNN), LSTM, and CNN. Rastgoo et al.
[11], Zhang et al.
[46], Kanjo et al.
[17], Lim and Yang
[47], Yan et al.
[48], Hajinoroozi et al.
[49], and Lee et al.
[50] presented different deep learning models to identify different driver states. Rastgoo et al.
[11], Kanjo et al.
[17], Lim and Yang
[47], and Yan et al.
[48] proposed deep learning models based on multimodal data. On the other hand, the models proposed by Hajinoroozi et al.
[49] and Lee et al.
[50] are based on physiological signals only. The stress recognition model proposed by Zhang et al.
[46] is based on facial images only. Apart from driving scenarios, Masood and Alghamdi
[51], Cho et al.
[52], Seo et al.
[53], Hwang et al.
[54], and He et al.
[55] proposed stress recognition models based on deep learning techniques and physiological signals in academic, workplace, and lab settings. Most of these studies, including
[46,49,50,52,53,54,55,56], are based on two levels of stress only. Moreover, the schemes presented in
[46,50,52,55,56] are based on images. Likewise, the schemes proposed in
[49,52,53,54,55,56] are either based on physiological signals or a single modality. On the other hand, the model proposed in
[11] is based on multimodal data collected during simulated driving.
The models proposed in this study are based on the fusion of multimodal data collected during real-world driving (SRAD and AffectiveROAD datasets). Moreover, these models use 1D CNN and 1D CNN-LSTM networks to classify driver stress into two levels (stressed and relaxed) and three levels (low, medium, and high). The fuzzy EDAS approach is also used to rank the performance of the proposed models based on different classification metrics.
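To give a sense of how an EDAS-style ranking works, the sketch below implements the crisp (non-fuzzy) form of EDAS (Evaluation based on Distance from Average Solution) for benefit-type criteria. This is a simplification and not the fuzzy EDAS procedure used in the study, which operates on fuzzy numbers. The model names, metric values, and equal weights are hypothetical and do not represent reported results.

```python
def edas_rank(scores, weights):
    """Crisp EDAS for benefit-type criteria (higher is better).

    scores:  {model_name: [metric_1, metric_2, ...]}
    weights: one weight per metric, summing to 1
    Returns (names sorted best to worst, appraisal score per model).
    """
    names = list(scores)
    # Average solution: mean of each criterion over all alternatives.
    avg = [
        sum(scores[n][j] for n in names) / len(names)
        for j in range(len(weights))
    ]
    sp, sn = {}, {}
    for n in names:
        # Weighted positive and negative distances from the average.
        sp[n] = sum(w * max(0.0, scores[n][j] - avg[j]) / avg[j]
                    for j, w in enumerate(weights))
        sn[n] = sum(w * max(0.0, avg[j] - scores[n][j]) / avg[j]
                    for j, w in enumerate(weights))
    max_sp = max(sp.values()) or 1.0  # guard against all-equal scores
    max_sn = max(sn.values()) or 1.0
    appraisal = {
        n: 0.5 * (sp[n] / max_sp + (1.0 - sn[n] / max_sn)) for n in names
    }
    ranking = sorted(names, key=appraisal.get, reverse=True)
    return ranking, appraisal


# Hypothetical metric values (e.g., accuracy and F1), not results from the study.
example_scores = {
    "model_a": [0.90, 0.88],
    "model_b": [0.95, 0.93],
    "model_c": [0.85, 0.86],
}
ranking, appraisal = edas_rank(example_scores, weights=[0.5, 0.5])
```

An alternative that is above average on every criterion receives an appraisal score near 1, and one that is below average on every criterion receives a score near 0, so several classification metrics can be combined into a single rank.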