This research focuses on improving healthcare quality by introducing an automated system that continuously monitors patient pain intensity. The system analyzes the Electrodermal Activity (EDA) sensor modality, compares the results obtained from the EDA and facial expression modalities, and performs late fusion of the two modalities.
1. Introduction
A reliable assessment of pain is crucial to determine proper and prompt treatment, especially for vulnerable patients who cannot communicate their pain, such as those in intensive care, people with dementia, or adults with cognitive impairment. To support clinical observation, an automated system is promising because it offers objective and robust measurement and continuous monitoring of pain
[1]. The COVID-19 pandemic has further highlighted the importance of such systems; many countries, such as China, adopted automated systems to manage patients effectively [2]. Thus, this study aims to develop an automated system for clinical settings that can rapidly and objectively monitor patients’ pain levels by analyzing the informative modalities in the X-ITE Pain Dataset. This dataset was created to complement existing databases and to provide valuable information for more advanced discrimination of pain, and of pain intensities, versus no pain.
Physical expressions of pain encompass visual cues (facial expressions and body movements), vocalization cues (verbally and non-verbally), and physiological cues (electrocardiography (ECG), electromyography (EMG), Electrodermal Activity (EDA), and brain activity)
[3][4][5]; these cues play a significant role in assessing pain in individuals
[5]. The extracted features from EDA and facial expression modalities indicate the spontaneous pain expression, stress, and anxiety caused by different pain levels; both modalities are good measures for pain assessment
[6][7]. This study presents the findings obtained from analyzing these two important modalities with regard to classification and regression. In the regression approaches, the pain intensity stimuli were handled as continuous labels and normalized to the range [0, 1] so that all regression approaches could fit them.
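As a concrete illustration, the following minimal sketch shows this kind of min-max normalization; the label values and the use of NumPy are assumptions for illustration only.

```python
import numpy as np

# Hypothetical ordinal stimulus intensities (e.g., no pain plus three pain levels).
labels = np.array([0, 1, 2, 3, 2, 0, 3], dtype=float)

# Min-max normalization to [0, 1] so any regression model can fit the targets.
normalized = (labels - labels.min()) / (labels.max() - labels.min())
print(normalized)  # [0.    0.333 0.667 1.    0.667 0.    1.   ]
```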
EDA records the changes in the skin’s electrical activity using two electrodes attached to the index and ring fingers. It correlates significantly with pain intensity ratings, as it reflects the body’s intense reactions when a painful stimulus is applied
[8][9][10]. An increasing number of studies
[11][12][13][14][15] explored physiological signals and machine learning models for objective assessments of pain intensity; findings demonstrated that EDA signals tend to outperform other physiological signals in terms of accurate pain assessment. Thus, many studies
[16][17][18] focused on EDA for pain assessment. Further, the temporal integration of EDA features was investigated to improve the performance of pain assessment
[14][19][20]. The temporal integration was represented as a time series statistics descriptor (EDA-D), calculated from several statistical measures of each time series along with its first and second derivatives.
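A minimal sketch of such a descriptor is given below; the specific statistical measures chosen (mean, standard deviation, minimum, maximum, and range) and the window size are illustrative assumptions, since the exact EDA-D measures are not listed here.

```python
import numpy as np

def stats(x):
    """Illustrative statistical measures of a 1D time series."""
    return [x.mean(), x.std(), x.min(), x.max(), x.max() - x.min()]

def time_series_descriptor(signal):
    """Concatenate statistics of the signal and of its first and
    second derivatives into one fixed-length descriptor."""
    first = np.diff(signal, n=1)   # first derivative (finite differences)
    second = np.diff(signal, n=2)  # second derivative
    return np.array(stats(signal) + stats(first) + stats(second))

# Example: descriptor of a 7-second EDA window sampled at an assumed 100 Hz.
eda_window = np.random.randn(700)                 # placeholder signal
descriptor = time_series_descriptor(eda_window)   # shape: (15,)
```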
Ekman and Friesen
[21] decomposed facial expressions into individual facial Action Units (AUs) with the Facial Action Coding System (FACS). A combination of some of these AUs expresses pain behaviors
[22]. Prior studies
[13][14][23][24][25][26][27] using facial expressions have explored machine learning approaches to recognize pain intensity. Regarding the use of the temporal integration of frame-level features represented by the Facial Activity Descriptor (FAD), RF showed superior performance compared to a linear Support Vector Machine (SVM) and an SVM with a Radial Basis Function kernel (RBF-SVM)
[24]; thus, it was used in [27] and in this study as the baseline approach for classification and regression. Approaches that use FAD to recognize pain intensity showed better results than approaches that relied on facial features extracted independently from each frame of a given sequence [24]; FAD is better at describing the dynamics among neighboring frames.
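As a hedged illustration of this baseline, the sketch below trains a 100-tree Random Forest classifier and regressor on precomputed descriptors using scikit-learn; the data shapes, the number of pain classes, and the variable names are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Placeholder data: one descriptor vector per stimulus window.
X = np.random.randn(500, 15)            # e.g., FAD or EDA-D descriptors
y_class = np.random.randint(0, 4, 500)  # discrete pain levels (assumed 4 classes)
y_reg = np.random.rand(500)             # continuous labels normalized to [0, 1]

# Classification baseline (RFc) with 100 trees, as in the cited comparison.
rfc = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y_class)

# Regression counterpart for continuous pain intensity.
rfr = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y_reg)

print(rfc.predict(X[:3]), rfr.predict(X[:3]))
```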
2. Continuous Pain Intensity Recognition
Several studies of pain have focused on physiological signals because of the strong correlation between these signals and pain
[28][29][30]. In [5][14][15], it was reported that the EDA signal obtained the best performance compared to other single physiological signals. Thus, EDA has gained attention in automatic pain recognition systems. EDA records changes in the electrical activity of the skin of the hands, which is controlled by the autonomic nervous system
[31][32]. Sweat on the skin’s surface changes the skin’s electrical conductivity (e.g., people sweat when they are scared, nervous, or in pain). EDA is composed of phasic and tonic components. The phasic signal is a quick response caused by external stimuli such as pain stimuli. The tonic signal is a slower component that reflects the baseline level of the signal due to unconscious activity
[33].
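One simple way to illustrate this decomposition is low-pass filtering: the tonic component is the slowly varying part, and the phasic component is the residual. In the sketch below, the cutoff frequency, filter order, and sampling rate are illustrative assumptions rather than values from the cited studies.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def decompose_eda(eda, fs, cutoff_hz=0.05, order=2):
    """Split an EDA signal into tonic (slow) and phasic (fast) parts
    with a zero-phase Butterworth low-pass filter."""
    b, a = butter(order, cutoff_hz / (fs / 2), btype="low")
    tonic = filtfilt(b, a, eda)   # slowly varying baseline
    phasic = eda - tonic          # quick stimulus-driven responses
    return tonic, phasic

fs = 100                                           # assumed sampling rate in Hz
eda = np.cumsum(np.random.randn(10 * fs)) * 0.01   # placeholder 10 s signal
tonic, phasic = decompose_eda(eda, fs)
```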
Recent studies have focused on deep-learning methods due to their success in classifying pain using EDA, such as 1D convolutional neural networks (CNNs)
[13], a multi-task learning method based on neural networks
[34], and the Recurrent Convolutional Neural Network (RCNN)
These deep-learning methods were utilized because of their ability to mine the sequential relationships between different periods of EDA signals.
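A rough sketch of what such a 1D CNN might look like is shown below; the architecture, layer sizes, and number of classes are assumptions for illustration and do not reproduce the cited models.

```python
import torch
import torch.nn as nn

class EDA1DCNN(nn.Module):
    """Toy 1D CNN for classifying pain levels from raw EDA windows."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse the time axis
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):              # x: (batch, 1, time)
        return self.classifier(self.features(x).squeeze(-1))

model = EDA1DCNN()
logits = model(torch.randn(8, 1, 700))  # 8 windows of 700 samples each
```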
Posada et al. [17] presented classification and regression machine learning models to estimate pain sensation in healthy subjects using EDA. They computed the extracted EDA features based on time-domain decomposition, spectral analysis, and differential features. The maximum macro-averaged geometric mean scores of the two models were 69.7% and 69.2%, respectively. Kong et al.
[18] analyzed the spectral characteristics of EDA to obtain reliable performance, because spectral analysis is more sensitive and reproducible for assessing sympathetic arousal than the traditional indices (tonic and phasic signals). Bhatkar et al.
[16] reported a novel method that successfully discriminated the reduction in pain achieved with clinically effective analgesics by combining self-reports with continuous physiological data in a structured, pain-specific protocol.
It is common knowledge that pain databases have a significant impact on the performance of automatic pain assessment systems. The above-mentioned studies of EDA signals for pain intensity recognition used databases with limited variation in stimulus quality and duration. Analyzing pain in terms of quality and duration provides additional valuable information for more advanced discrimination between pain, or pain intensities, and no pain. Thus, the X-ITE Pain Database
[35] was designed to complement existing databases. The X-ITE Pain Database includes behavioral and physiological data recorded while healthy participants (subjects) were exposed to pain stimuli of different qualities and durations. The use of healthy subjects in a medical study has always played a vital role in evaluating safety and tolerability without interference from concomitant pathological conditions
[36].
Werner et al.
[24] introduced a novel feature set for describing facial actions and their dynamics, which they call Facial Activity Descriptors (FAD). They trained FAD (extracted from the BioVid Heat Pain Dataset) with SVM and RFc, and the results showed that RFc with 100 trees outperformed SVM. They focused on the video level, using temporal integration for pain recognition because it was more effective at describing the dynamic information beneficial for pain intensity recognition
[23]. This approach often involves the temporal integration of frame-level features. For example, video content can be condensed to high-level features using a time series statistics descriptor that consists of several statistical measures of the time series. In
[14], the same RFc was trained using the extracted features from facial expressions, audio, ECG, EMG, and EDA that were introduced in
[35] to recognize pain levels. They classified the pre-segmented time windows (7 s) cut out from the continuous recording of the main stimulation phase in the X-ITE Pain Database. According to the ability of Random Forest (RF)
[37] for pain detection using facial expressions
[14][23], the researchers introduced RFc using the temporal information of facial expressions, represented by a time series statistics descriptor (FAD)
[25][26]. FAD was calculated from several statistical measures, along with their first and second derivatives, per time series. The performances of a reduced MobileNetV2 and a simple Convolutional Neural Network (CNN) were better than that of RFc. CNN accuracy improved by about 1% when using the sample weighting method. This method was suggested to improve pain intensity recognition performance by duplicating training samples that show more facial responses (those whose classification scores are above 0.3), thereby reducing the influence of misclassified samples
[26].
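A hedged sketch of this duplication-based weighting is given below; apart from the 0.3 threshold stated above, the score source, function name, and data shapes are assumptions.

```python
import numpy as np

def duplicate_responsive_samples(X, y, scores, threshold=0.3):
    """Duplicate training samples whose classification scores exceed the
    threshold (i.e., samples with stronger facial responses), which
    effectively up-weights them relative to the remaining samples."""
    mask = scores > threshold
    X_aug = np.concatenate([X, X[mask]])
    y_aug = np.concatenate([y, y[mask]])
    return X_aug, y_aug

# Placeholder descriptors, labels, and per-sample classification scores.
X = np.random.randn(100, 15)
y = np.random.randint(0, 4, 100)
scores = np.random.rand(100)
X_aug, y_aug = duplicate_responsive_samples(X, y, scores)
```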
In [5][6][13][14][20][23][34][35][38], the authors reported that fusing modalities could improve the results of pain recognition. After investigating these studies, it was found that some fused physiological modalities, while others fused both behavioral and physiological modalities. In the majority of the aforementioned studies, the models combining the fused EMG and EDA modalities were the most successful for developing pain recognition systems. However, physiological signals can also be indicative of other pathological conditions unrelated to pain. In the study by Werner et al.
[14], fusion was applied with multiple modalities (frontal RGB camera, audio, ECG, EMG, and EDA). Firstly, they individually trained random forests (RF) using the features of each modality. Secondly, they concatenated the feature vectors of all modalities and trained and tested the RF (referred to as feature fusion). Thirdly, they applied decision fusion by training the RF on individual modalities and then aggregating the RF scores into final decisions. They employed two types of aggregation: fixed mapping and trained mapping approaches.
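To make the distinction between the fusion schemes concrete, the sketch below contrasts feature fusion with fixed-mapping decision fusion; the data shapes and the use of the mean of class probabilities as the fixed mapping are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder per-modality feature matrices and shared labels.
X_eda = np.random.randn(200, 15)    # EDA descriptors (EDA-D)
X_face = np.random.randn(200, 30)   # facial activity descriptors (FAD)
y = np.random.randint(0, 4, 200)

# Feature fusion: concatenate the feature vectors and train one RF.
rf_feat = RandomForestClassifier(n_estimators=100, random_state=0)
rf_feat.fit(np.hstack([X_eda, X_face]), y)

# Decision fusion: train one RF per modality, then aggregate their scores.
rf_eda = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_eda, y)
rf_face = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_face, y)

# Fixed mapping (assumed here to be the mean of the class probabilities).
probs = (rf_eda.predict_proba(X_eda) + rf_face.predict_proba(X_face)) / 2
decision = probs.argmax(axis=1)
```

A trained mapping would instead fit an additional model on the concatenated per-modality scores rather than applying a fixed rule.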