In early emotion recognition research, researchers mainly used non-physiological signals such as facial expressions
[6][7], speech intonation
[8], and body movements
[9] to recognize emotions and achieved good results. However, features extracted from facial expressions, speech, and body posture are easy to disguise and are influenced by subjective factors, so they may not reflect a person's true emotional state, which limits the reliability of this recognition strategy
[10]. In contrast, physiological signals are difficult to disguise and contain more information. Common physiological signals include electroencephalogram (EEG)
[11][12][13][14][15][16][17][18][19][20], electromyography (EMG)
[21], galvanic skin response (GSR)
[22], electrocardiogram (ECG)
[23], skin temperature (SKT)
[24] and pupil diameter
[25]. Emotion recognition based on physiological signals therefore tends to be more reliable
[26]. Among many physiological signals, EEG signals are non-linear, non-stationary, and random signals that record changes in scalp electrical activity. They can reflect human mental state and emotional changes well
[27]. A growing number of researchers use EEG signals for emotion recognition and report better results than those obtained with non-physiological signals such as facial expressions, speech intonation, and body movements
[28][29][30][31][32]. Although previous EEG-based emotion recognition research has yielded impressive results, urgent problems remain, such as low recognition accuracy and high computational cost
[26]. Given that emotion recognition with a high computational cost has limited practical value, there is a need to develop an EEG-based algorithm for emotion recognition that strikes a balance between high accuracy and low computational requirements.
2. Different Features for EEG Emotion Recognition
EEG can reflect the electrophysiological activity of brain nerve cells in the cerebral cortex or scalp surface
[27]. Changes in human emotion are closely related to neural activity in the brain, and EEG records the state changes of nerve cells during emotional changes in real time; the signal is hard to falsify and has a high temporal resolution. Therefore, EEG-based emotion recognition tends to be more accurate and reliable
[15]. Typically, time-domain features, frequency-domain features, time–frequency features, nonlinear features, or a combination of these features are extracted from EEG signals for this purpose
[14][15]. Mehmood et al.
[16] employed the Hjorth parameter to extract EEG signal features and utilized random forests for the binary classification of emotions. Their study encompassed binary classification experiments on DEAP, SEED-IV, DREAMER, SELEMO, and ASCERTAIN datasets, with corresponding accuracy rates of 69%, 76%, 85%, 59%, and 87%. Tripathi et al.
[17] extracted nine features, comprising the mean, median, maximum, minimum, standard deviation, variance, value range, skewness, and kurtosis, from the EEG signals of the DEAP dataset. They employed deep neural networks (DNN) and convolutional neural networks (CNN) for binary classification and attained superior results. Gao et al.
[18] extracted fuzzy entropy (FE) and power spectral density (PSD) from high-frequency EEG signals and applied multi-order detrended fluctuation analysis (MODFA) to classify emotions. Their study achieved an accuracy rate of 76.39% in the three-category task. Bai et al.
[19] extracted differential entropy (DE) features from EEG signals of the DEAP dataset and utilized a residual network with depthwise and pointwise convolutions for binary classification, with an accuracy rate of 88.75%. Fraiwan et al.
[3] used multiscale entropy (MSE) to extract features from EEG, principal component analysis (PCA) for dimensionality reduction, and artificial neural networks (ANNs) to predict the enjoyment of museum pieces, achieving an accuracy of 98.0%.
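As a rough illustration of the time-domain features mentioned above, the following Python sketch computes the Hjorth parameters (activity, mobility, complexity) and the nine descriptive statistics for a single EEG channel. It is a minimal sketch, not the implementation used in [16] or [17]; the function names, the 128 Hz sampling rate, and the synthetic sine segment are assumptions made for the example.

```python
import numpy as np


def hjorth_parameters(x):
    """Return Hjorth activity, mobility, and complexity of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)      # first difference approximates the first derivative
    ddx = np.diff(dx)    # second difference approximates the second derivative
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity


def statistical_features(x):
    """Return the nine descriptive statistics, in the order listed in the text."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    z = (x - mean) / std
    return np.array([
        mean,
        np.median(x),
        x.max(),
        x.min(),
        std,
        x.var(),
        x.max() - x.min(),   # value range
        np.mean(z ** 3),     # skewness
        np.mean(z ** 4),     # kurtosis (non-excess, so 3 for a Gaussian)
    ])


# Synthetic single-channel segment (assumed data): 1 s of a 10 Hz sine at 128 Hz
fs = 128
t = np.arange(fs) / fs
segment = np.sin(2 * np.pi * 10 * t)

activity, mobility, complexity = hjorth_parameters(segment)
features = statistical_features(segment)
```

For a pure sinusoid the complexity is close to 1, which gives a quick sanity check; in practice such features would be computed per channel and per time window before being passed to a classifier.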
3. Fusion Features for EEG Emotion Recognition
Extracting multiple features of EEG and fusing them with different fusion strategies often results in better emotion recognition than single features
[20]. In particular, multi-band feature fusion has proven effective in improving the accuracy of emotion recognition
[28]. An et al.
[29] proposed an EEG emotion recognition algorithm based on 3D feature fusion and convolutional autoencoder (CAE), which extracted DE from different frequency bands and fused them into 3D features. Using CAE for emotion classification, the recognition accuracy rates of valence and arousal dimensions on the DEAP dataset were 89.49% and 90.76%, respectively. Gao et al.
[30] developed a method of fusing power spectrum and wavelet energy entropy to classify three emotions (neutral, happy, and sad) using a support vector machine (SVM) and a relevance vector machine (RVM). The experimental results showed that the fusion of the two features was superior to either single feature. Zhang et al.
[31] proposed a multi-band feature fusion method, GC–F-GCN, based on Granger causality (GC) and graph convolutional neural network (GCN) for emotion recognition from EEG signals. The GC–F-GCN method outperformed the state-of-the-art GCN method in the binary classification task, achieving average accuracies of 97.91%, 98.46%, and 98.15% for arousal, valence, and arousal–valence classification, respectively. Parui et al.
[32] extracted various features, including frequency-domain features, wavelet-domain features, and Hjorth parameters, and used the XGBoost algorithm for binary classification on the DEAP dataset. The accuracy rates for valence and arousal reached 75.97% and 74.206%, respectively. These findings suggest that extracting multiple features and fusing them through appropriate strategies can enhance the accuracy of EEG-based emotion recognition.
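To make the idea of multi-band feature fusion more concrete, the sketch below computes differential entropy per frequency band and fuses the band-wise features by simple concatenation. This is an illustrative simplification and not the 3D fusion of [29] or the graph-based fusion of [31]. The band edges, the FFT-masking band filter, the Gaussian assumption behind the DE formula, and the random stand-in data are all assumptions of the example.

```python
import numpy as np

# Conventional band edges in Hz; exact limits vary between studies (assumption)
BANDS = {"theta": (4, 8), "alpha": (8, 14), "beta": (14, 31), "gamma": (31, 45)}


def band_limited(x, fs, low, high):
    """Keep only the FFT components of x with low <= f < high."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < low) | (freqs >= high)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def differential_entropy(x):
    """DE under a Gaussian assumption: 0.5 * ln(2 * pi * e * variance)."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))


def fused_de_features(eeg, fs):
    """eeg: (channels, samples) -> fused DE vector of length bands * channels."""
    per_band = [
        [differential_entropy(band_limited(ch, fs, lo, hi)) for ch in eeg]
        for lo, hi in BANDS.values()
    ]
    return np.concatenate(per_band)   # feature-level fusion by concatenation


rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 128 * 4))   # stand-in data: 32 channels, 4 s at 128 Hz
fused = fused_de_features(eeg, fs=128)
```

Concatenation is the simplest fusion strategy; the studies cited above use richer schemes that also preserve spatial or inter-channel structure.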
4. Hybrid Model for EEG Emotion Recognition
In addition to feature fusion, hybrid models have proven effective in improving the accuracy of emotion recognition
[33][34][35]. Various studies have explored this approach and achieved promising results. For example, Chen et al.
[36] proposed a cascaded and parallel hybrid convolutional recurrent neural network (CRNN) for binary classification of EEG signals using spatiotemporal EEG features extracted from the PSD of the signals. The proposed hybrid networks achieved classification accuracies of over 93% on the DEAP dataset. Similarly, Yang et al.
[37] developed a hybrid neural network that combined a CNN and a recurrent neural network (RNN) to classify emotions in EEG sequences. They converted chain-like EEG sequences into 2D frame sequences to capture the channel-to-channel correlation between physically adjacent EEG signals, achieving average accuracies of 90.80% and 91.03% for valence and arousal classification, respectively, on the DEAP dataset. Furthermore, Wei et al.
[38] proposed a transformer capsule network (TCNet) that consisted of an EEG Transformer module for feature extraction and an emotion capsule module for feature refinement and classification of emotional states. On the DEAP dataset, their proposed TCNet achieved average accuracies of 98.76%, 98.81%, and 98.82% for binary classification of valence, arousal, and dominance dimensions, respectively. These studies demonstrate the potential of hybrid models in enhancing the performance of emotion recognition.
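The conversion of chain-like EEG sequences into 2D frames described for [37] can be sketched as follows. This is only an illustration of the general idea: the 3x3 grid, the five-electrode subset, and the channel order are hypothetical and are not taken from the cited study, which uses a full electrode montage.

```python
import numpy as np

# Hypothetical 3x3 scalp layout for five electrodes; None marks cells with no electrode
LAYOUT = [
    [None, "Fz", None],
    ["C3", "Cz", "C4"],
    [None, "Pz", None],
]
CHANNELS = ["Fz", "C3", "Cz", "C4", "Pz"]   # assumed order of rows in the input array


def to_frames(eeg):
    """Map (channels, samples) EEG to (samples, rows, cols) frames, zero-filling empty cells."""
    rows, cols = len(LAYOUT), len(LAYOUT[0])
    frames = np.zeros((eeg.shape[1], rows, cols), dtype=float)
    for r, row in enumerate(LAYOUT):
        for c, name in enumerate(row):
            if name is not None:
                frames[:, r, c] = eeg[CHANNELS.index(name)]
    return frames


eeg = np.arange(5 * 4, dtype=float).reshape(5, 4)   # toy data: 5 channels, 4 samples
frames = to_frames(eeg)
```

Each frame places physically adjacent electrodes in adjacent grid cells, so a convolutional layer can pick up spatial correlations while a recurrent layer processes the sequence of frames over time.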
5. Multi-Category EEG Emotion Recognition
Compared with research that focuses solely on binary emotion classification, multi-class emotion recognition has promising prospects
[38][39][40][41]. Hu et al.
[42] introduced a hybrid model composed of a CNN, a bidirectional long short-term memory network (BiLSTM), and a multi-head self-attention mechanism (MHSA), which transforms EEG signals into time–frequency maps for emotion classification. The model achieved an accuracy rate of 89.33% for the four-category task using the DEAP dataset. Similarly, Zhao et al.
[43] proposed a 3D convolutional neural network model to automatically extract spatiotemporal features in EEG signals, achieving an accuracy rate of 93.53% for the four-category task on the DEAP dataset. Singh et al.
[44] utilized SVM to classify emotions by extracting the different features of EEG average event-related potentials (ERPs) and average ERPs, achieving accuracy rates of 75% and 76.8%, respectively, for the four-category task on the DEAP dataset. Gao et al.
[45] proposed a new strategy for EEG emotion recognition that utilized Riemannian geometry. Wavelet packets were used to extract the time–frequency features of EEG signals to construct a matrix for emotion recognition, achieving an accuracy rate of 86.71% for the four-category task on the DEAP dataset.
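The text above does not state how the four categories are defined. A common convention for DEAP, assumed here for illustration only, splits the 1 to 9 valence and arousal self-ratings at a threshold to obtain four quadrant classes. The threshold of 5, the label names, and the sample ratings in this sketch are assumptions rather than details of the cited studies.

```python
import numpy as np

# Class index = 2 * high_valence + high_arousal
LABELS = np.array(["LVLA", "LVHA", "HVLA", "HVHA"])


def quadrant_labels(valence, arousal, threshold=5.0):
    """Map valence/arousal ratings to one of four quadrant classes (0..3)."""
    high_v = np.asarray(valence) > threshold
    high_a = np.asarray(arousal) > threshold
    return 2 * high_v.astype(int) + high_a.astype(int)


# Made-up ratings for four trials
ratings_v = [7.1, 2.5, 8.0, 3.3]
ratings_a = [6.4, 7.9, 2.2, 4.0]
classes = quadrant_labels(ratings_v, ratings_a)
names = LABELS[classes]
```

Under this assumed scheme, the two binary tasks (valence and arousal) become a single four-class problem, which is harder and generally yields lower accuracy than either binary task alone.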