BCI Emotion Recognition: Comparison
Please note this is a comparison between Version 1 by Myriam Hernandez-Alvarez and Version 2 by Vivi Li.

This entry gives an overview of available datasets, emotion elicitation methods, feature extraction and selection, classification algorithms, and performance evaluation for emotion recognition using EEG-based BCI systems.

  • BCI
  • survey
  • emotion
  • recognition
  • preprocessing
  • feature
  • extraction
  • selection
  • classification
  • trends

Definition: Affective computing is an area of artificial intelligence that recognizes, interprets, processes, and simulates human affects. The user’s emotional states can be sensed through electroencephalography (EEG)-based Brain Computer Interface (BCI) devices. Research in emotion recognition using these tools is a rapidly growing field with multiple inter-disciplinary applications.

1. Introduction

Affective computing is a branch of artificial intelligence. It is computing that relates to, arises from, or influences emotions [1]. Automatic emotion recognition is an area of study that forms part of affective computing. Research in this area is rapidly evolving thanks to the availability of affordable devices for capturing brain signals, which serve as inputs for systems that decode the relationship between emotions and electroencephalographic (EEG) variations. These devices are called EEG-based brain-computer interfaces (BCIs).

Affective states play an essential role in decision-making. Such states can facilitate or hinder problem-solving. Emotion recognition takes advantage of positive affective states, enhances emotional intelligence, and consequently improves professional and personal success [2]. Moreover, emotion self-awareness can help people manage their mental health and optimize their work performance. Automatic systems can increase our understanding of emotions, and therefore promote effective communication among individuals and human-to-machine information exchanges. Automatic EEG-based emotion recognition could also help enrich people’s relationships with their environment. Besides, automatic emotion recognition will play an essential role in artificial intelligence entities designed for human interaction [3].

2. EEG-Based BCI Systems for Emotion Recognition

Figure 1 presents the structure of an EEG-based BCI system for emotion recognition.

2.1. Signal Acquisition

Inexpensive wearable EEG helmets and headsets that position noninvasive electrodes along the scalp can efficiently acquire EEG signals. The clinical definition of EEG is an electrical signal recording of brain activity over time. Thus, electrodes capture signals, amplify them, and send them to a computer (or mobile device) for storage and processing. Currently, there are various low-cost EEG-based BCI devices available on the market [4]. However, many current models of EEG-based BCI become uncomfortable after continued use. Therefore, it is still necessary to improve their usability.

2.1.1. Public Databases

Alternatively, there are also public databases with EEG data for affective information. Table 1 presents a list of available datasets related to emotion recognition. Such datasets are convenient for research, and several emotion recognition studies use them.

Table 1. Publicly available datasets.

| Dataset | Number of Channels | Emotion Elicitation | Number of Participants | Target Emotions |
|---|---|---|---|---|
|  | 32 EEG channels | Music videos |  | Valence, arousal, dominance, liking |
|  | 54 EEG channels | Selected images from IAPS |  | Calm, positive exciting, negative exciting |
|  |  | Recall of past emotions |  | Positive valence (joy, happiness) or negative valence (sadness, anger) |
|  | 62 channels | Film clips |  | Positive, negative, neutral |
|  | 62 channels | 72 film clips |  | Happy, sad, neutral, fear |
|  | 32 channels | Fragments of movies and pictures |  | Valence and arousal rated with the self-assessment manikin |
| EEG Alpha Waves dataset | 16 channels | Resting-state eyes open/closed experimental protocol |  |  |
|  | 14 channels | Film clips |  | Rating of 1 to 5 for valence, arousal, and dominance |
|  | 64 channels | Native Chinese Affective Video System |  | Happy, sad, and neutral |

Table 2 (remaining rows). Conventional performance evaluation methods for BCI.

| Performance Evaluation | Main Characteristics | Advantages | Limitations |
|---|---|---|---|
| Specificity | Specificity is the true negative rate. It measures the proportion of correctly identified true negatives over the sum of the true negatives plus false positives. The False Positive Rate (FPR) is then equal to 1 − Specificity. | Specificity measures how often a classifier correctly categorizes a negative result. | Specificity focuses on one class only, and the majority class biases it. |
| Precision | Precision, also referred to as Positive Predicted Value, is calculated as 1 − False Detection Rate (F). The false detection rate is the ratio of false positives to the sum of true positives plus false positives. | Precision measures the fraction of correct classifications. | Precision should not be used when the positive class is larger (imbalanced dataset) and correct detection of positive samples is less critical to the problem. |
| ROC | The ROC curve plots Sensitivity as a function of the False Positive Rate. The area under the ROC curve is a measure of how well a parameter can distinguish between a true positive and a true negative. | The ROC curve provides a measure of the classifier performance across different significance levels. | ROC is not recommended when the negative class is smaller but more important. Precision and Recall will mostly reflect the ability to predict the positive class if it is larger in an imbalanced dataset. |
| F-Measure | F-Measure is the harmonic mean of Precision and Recall. It is useful because as Precision increases, Recall decreases, and vice versa. | F-measure can handle imbalanced data. F-measure (like ROC and kappa) provides a measure of the classifier performance across different significance levels. | F-measure does not generally take true negatives into account. True negatives can change without affecting the F-measure. |
| Pearson correlation coefficient | Pearson’s correlation coefficient (r) quantifies the degree of linear relationship between the true and predicted values with a value ranging from −1 to +1. | Pearson’s correlation is a valid way to measure the performance of a regression algorithm. | Pearson’s correlation ignores any bias that might exist between the true and the predicted values. |
| Information transfer rate (ITR) | As a BCI is a channel from the brain to a device, it is possible to estimate the bits transmitted from the brain. ITR is a standard metric for measuring the information sent within a given time, in bits per second. | ITR contributes to the criteria used to evaluate a BCI system. | ITR is often misreported because many considerations are poorly understood, such as the delays needed to process data, present feedback, and clear the screen. ITR is better suited to synchronous BCIs than to user-paced BCIs. |
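
Pearson’s correlation coefficient and the information transfer rate, both listed among the performance measures in this entry, are not derived from a confusion matrix. The sketch below illustrates one way to compute them. It assumes the widely used Wolpaw estimate of bits per selection for ITR, which the entry does not spell out, and the rating values, class count, accuracy, and trial durations are hypothetical.

```python
import math


def pearson_r(truth, predicted):
    """Pearson correlation between true and predicted values."""
    n = len(truth)
    mean_t, mean_p = sum(truth) / n, sum(predicted) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, predicted))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_t * var_p)


def itr_bits_per_selection(n_classes, accuracy):
    """Wolpaw-style estimate of bits conveyed by one selection (n_classes >= 2)."""
    bits = math.log2(n_classes)
    if 0.0 < accuracy < 1.0:
        bits += accuracy * math.log2(accuracy)
        bits += (1 - accuracy) * math.log2((1 - accuracy) / (n_classes - 1))
    elif accuracy == 0.0:
        # Limit of the formula: the accuracy term vanishes, the error term remains.
        bits += math.log2(1 / (n_classes - 1))
    return bits


# Hypothetical valence ratings: predictions are offset by a constant bias of +2.
true_valence = [1, 2, 3, 4, 5]
biased_prediction = [3, 4, 5, 6, 7]
r = pearson_r(true_valence, biased_prediction)  # the bias does not lower r

# Hypothetical 4-class BCI at 80% accuracy.
bits = itr_bits_per_selection(n_classes=4, accuracy=0.8)
itr_stimulus_only = bits / 4.0   # bits per second if only the 4 s stimulus is counted
itr_with_delays = bits / 6.0     # bits per second once 2 s of delays are included
```

The constant offset leaves r unchanged, which is the bias limitation noted for Pearson’s correlation. Dividing by the stimulus time alone overstates the rate, which is one way ITR ends up misreported.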

2.1.2. Emotion Elicitation

The International Affective Picture System (IAPS) [14] and the International Affective Digitized Sound System (IADS) [15] are the most popular resources for emotion elicitation. These datasets provide emotional stimuli in a standardized way and are therefore useful for experimental investigations.

IAPS consists of 1200 images divided into 20 sets of 60 photos. Valence and arousal values are tagged for each photograph. The latest version of IADS provides 167 digitally recorded natural sounds familiar from daily life, labeled for valence, arousal, and dominance. Participants labeled the dataset using the Self-Assessment Manikin system [16]. IAPS and IADS stimuli come with labeled information, which is convenient for constructing a ground truth for emotion assessment [17].

Other researchers used movie clips, which have also been shown to be capable of provoking emotions. In [18], the authors state that emotions elicited using visual or auditory stimuli are similar. However, results obtained through affective labeling of multimedia may not be generalizable to more interactive situations or everyday circumstances. Thus, new studies using interactive emotional stimuli to ensure the generalizability of results for BCI would be welcomed.

Numerous experiments have stimulated emotions in different settings without using EEG devices, collecting instead other physiological indicators such as heart rate, galvanic skin response, and respiration rate. Conceptually, such paradigms could be useful if they were replicated for EEG signal acquisition. Possible experiments include stress during interviews for the detection of anger, anxiety, rejection, and depression. Exposure to odorants triggers emotions such as anger, disgust, fear, happiness, sadness, and surprise. Harassment provokes fear. A threat of short-circuit or a suddenly backward-tilting chair elicits fear. A threat of shock provokes anxiety. Naturally, such EEG-based BCI experiments should take ethical considerations into account.

To our knowledge, only a few studies have used more interactive conditions where participants played games or used flight simulators to induce emotions [19][20]. Alternatively, some authors have successfully used self-induced emotions through memory recall [21].

2.2. Preprocessing

Preprocessing of EEG signals covers signal cleaning and enhancement. EEG signals are weak and easily contaminated with noise from internal and external sources, so these steps are essential to keep noise from affecting the subsequent classification. The body itself produces electrical impulses through blinking, eye or muscle movement, and even heartbeats, which blend with the EEG signals. Whether to remove these artifacts should be considered carefully, because they may carry relevant information about the emotional state and could improve the performance of emotion recognition algorithms. If filters are used, they must be applied with caution to avoid distorting the signal.

The three filter types commonly used in EEG are (1) low-frequency filters, (2) high-frequency filters (known to electrical engineers as high-pass and low-pass filters, respectively), and (3) notch filters. The first two are used together to retain frequencies between 1 and 50–60 Hz.

For EEG signal processing, filters such as Butterworth, Chebyshev, or inverse Chebyshev are preferred [22]. Each has specific characteristics that need to be weighed. A Butterworth filter has a flat response in the passband and the stopband but a wide transition zone. A Chebyshev filter has ripple in the passband and a steeper transition, and it is monotonic in the stopband. An inverse Chebyshev filter has a flat response in the passband, a narrow transition, and ripple in the stopband. To prevent a phase shift, a zero-phase Butterworth filter should be used: it runs forward and then backward over the signal, so the phase shifts of the two passes cancel.
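
The forward-backward idea can be sketched with the standard library alone. The example assumes a single second-order Butterworth low-pass section, a 250 Hz sampling rate, and a synthetic signal (a 5 Hz component plus 50 Hz interference). All of these are illustrative choices, not values from the cited studies. In practice a library routine such as `scipy.signal.filtfilt` would normally be used instead.

```python
import math


def butter2_lowpass(cutoff_hz, fs_hz):
    """Coefficients (b, a) of a 2nd-order Butterworth low-pass (bilinear transform)."""
    w0 = 2.0 * math.pi * cutoff_hz / fs_hz
    alpha = math.sin(w0) / math.sqrt(2.0)  # Q = 1/sqrt(2) gives the Butterworth response
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cos_w0) / (2.0 * a0), (1.0 - cos_w0) / a0, (1.0 - cos_w0) / (2.0 * a0)]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def filter_once(b, a, x):
    """One causal pass of the second-order IIR filter, starting from a zero state."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def zero_phase_filter(b, a, x):
    """Filter forward, then backward, so the phase shifts of both passes cancel."""
    forward = filter_once(b, a, x)
    backward = filter_once(b, a, forward[::-1])
    return backward[::-1]


fs = 250.0  # assumed sampling rate in Hz
t = [n / fs for n in range(500)]
slow = [math.sin(2 * math.pi * 5 * ti) for ti in t]  # 5 Hz component to keep
noisy = [s + 0.5 * math.sin(2 * math.pi * 50 * ti) for s, ti in zip(slow, t)]

b, a = butter2_lowpass(cutoff_hz=20.0, fs_hz=fs)
one_pass = filter_once(b, a, noisy)      # delayed relative to the 5 Hz component
clean = zero_phase_filter(b, a, noisy)   # aligned with the 5 Hz component

# Compare away from the edges, where the zero initial state causes transients.
middle = range(100, 400)
error_one_pass = max(abs(one_pass[i] - slow[i]) for i in middle)
error_zero_phase = max(abs(clean[i] - slow[i]) for i in middle)
```

The single causal pass lags the 5 Hz component, so its error against the reference is dominated by phase shift. The forward-backward result stays aligned, at the cost of applying the magnitude response twice.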

According to [23], emotions emerge as the synchronization of various subsystems. Several authors use indexes of synchronized activity in different parts of the brain. The efficiency of these indexes was demonstrated in [24] by calculating the correlation dimension of a group of EEG signals. In [24], other methods were also used to calculate the synchronization of different areas of the brain. Synchronization indexes are a promising method for emotion recognition that deserves further research.

2.4. Feature Selection

The feature selection process is vital because it identifies the signal properties that best describe the EEG characteristics to be classified. In BCI systems, the feature vector generally has high dimensionality [25]. Feature selection reduces the number of input variables for the classifier and should not be confused with dimensionality reduction. While both processes decrease the number of attributes in the data, dimensionality reduction does so by combining features.

A feature selection method does not change features but excludes some according to specific usefulness criteria. Feature selection methods aim to achieve the best results by processing the least amount of data. They remove attributes that are irrelevant or redundant and therefore do not contribute to the classification, which leads to simpler classification models that are faster and perform better. Additionally, feature selection methods reduce the likelihood of overfitting in regular datasets, flexible models, or when the dataset has too many features but not enough observations.
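
As a minimal sketch of a filter-style selection step, the example below ranks features by a two-class Fisher score and keeps the best ones unchanged. The Fisher score is only one possible usefulness criterion, since the entry does not prescribe one. The feature columns (imagined as two band powers and a noise channel) and their values are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Between-class separation over within-class spread for one feature (two classes)."""
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    spread = pvariance(class0) + pvariance(class1)
    return (mean(class1) - mean(class0)) ** 2 / spread if spread > 0 else 0.0


def select_top_k(samples, labels, k):
    """Return the column indices of the k highest-scoring features, plus all scores."""
    n_features = len(samples[0])
    scores = [fisher_score([row[j] for row in samples], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Hypothetical feature vectors: [alpha power, beta power, noise channel].
samples = [
    [4.1, 1.0, 0.3],
    [3.9, 1.2, 0.9],
    [4.3, 0.9, 0.5],
    [2.0, 1.1, 0.4],
    [2.2, 1.0, 0.8],
    [1.8, 1.3, 0.6],
]
labels = [0, 0, 0, 1, 1, 1]

kept, scores = select_top_k(samples, labels, k=1)
# Selection only drops columns; the surviving values are not transformed.
reduced = [[row[j] for j in kept] for row in samples]
```

Unlike a dimensionality reduction method, which would output new combined features, the reduced vectors here contain original feature values only.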

2.5. Classification Algorithms

Classification algorithms can be categorized by model framework [26][27]. The model categories may be (1) generative-discriminative, (2) static-dynamic, (3) stable-unstable, and (4) regularized [28][29][30].

There are two different approaches for selecting the classifier that works best under given conditions in emotion recognition [26]. The first identifies the best classifier for a given BCI device. The second specifies the best classifier for a given set of features.

For synchronous BCIs, dynamic classifiers and ensemble combinations have shown better performance than SVMs. For asynchronous BCIs, authors in this field have not determined an optimal classifier. However, dynamic classifiers appear to perform better than static classifiers [26] because they better handle the identification of the onset of mental processes.

Under the second approach, discriminative classifiers have been found to perform better than generative classifiers, principally in the presence of noise or outliers. Discriminative classifiers like SVM generally handle high dimensionality in the features better. If the training set is small, simple techniques like LDA classifiers may yield satisfactory results [31].
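
To illustrate why LDA is a reasonable choice for a small training set, the sketch below fits a two-class LDA on eight hypothetical two-feature samples. It is restricted to two features so that the pooled covariance matrix can be inverted in closed form, and it assumes equal class priors. Neither restriction comes from the entry.

```python
def fit_lda(samples, labels):
    """Fit two-class LDA on 2-D feature vectors; returns (weights, bias)."""
    groups = {0: [], 1: []}
    for x, y in zip(samples, labels):
        groups[y].append(x)
    means = {c: [sum(col) / len(rows) for col in zip(*rows)] for c, rows in groups.items()}

    # Pooled within-class covariance (2 x 2), shared by both classes.
    sxx = sxy = syy = 0.0
    for c, rows in groups.items():
        for x in rows:
            dx, dy = x[0] - means[c][0], x[1] - means[c][1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    dof = len(samples) - 2
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof

    # w = inverse(covariance) * (mean1 - mean0), using the closed-form 2 x 2 inverse.
    det = sxx * syy - sxy * sxy
    d0, d1 = means[1][0] - means[0][0], means[1][1] - means[0][1]
    w = [(syy * d0 - sxy * d1) / det, (-sxy * d0 + sxx * d1) / det]

    # Decision threshold at the midpoint between the class means (equal priors assumed).
    mid = [(means[0][i] + means[1][i]) / 2 for i in range(2)]
    bias = -(w[0] * mid[0] + w[1] * mid[1])
    return w, bias


def predict_lda(model, x):
    """Return class 1 if the sample falls on the positive side of the boundary."""
    w, bias = model
    return 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0


# Hypothetical training set: four trials per class, two features per trial.
train_x = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.8, 2.1],
           [3.0, 0.9], [3.4, 1.2], [2.8, 0.7], [3.1, 1.1]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_lda(train_x, train_y)
predictions = [predict_lda(model, x) for x in train_x]
```

The model has only three parameters (two weights and a bias), which is why it can be estimated from a handful of trials where a more flexible classifier would overfit.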

2.6. Performance Evaluation

Results must be reported consistently so that different research groups can understand and compare them. Hence, evaluation procedures need to be chosen and described accurately [32]. Evaluating a classifier involves addressing performance measures, error estimation, and statistical significance testing [33]. Performance measures and error estimation quantify how well the classifier fulfills its function. The most recommended performance evaluation measures are shown in Table 2. They are the confusion matrix, accuracy, error rate, and other measures obtained from the confusion matrix, such as recall, specificity, precision, Area Under the Curve (AUC), and F-measure. Other performance evaluation coefficients are Cohen’s kappa (k) [34], information transfer rate (ITR) [35], and written symbol rate (WSR) [34].

Table 2. Conventional performance evaluation methods for BCI.

| Performance Evaluation | Main Characteristics | Advantages | Limitations |
|---|---|---|---|
| Confusion matrix | The confusion matrix presents the number of correct and erroneous classifications, specifying the erroneously categorized class. | The confusion matrix gives insights into the classifier’s error types (correct and incorrect predictions for each class). It is a good option for reporting results in M-class classification. | Results are difficult to compare and discuss. Instead, some authors use parameters extracted from the confusion matrix. |
| Accuracy and error rate | The accuracy p is the probability of correct classification in a certain number of repeated measures. The error rate is e = 1 − p and corresponds to the probability that an incorrect classification has been made. | It works well if the classes are balanced, i.e., there is an equal number of samples belonging to each class. | Accuracy and error rate do not take into account whether the dataset is balanced or not. If one class occurs more than another, the evaluation may show a high accuracy value even though the classification is not performing well. These parameters depend on the number of classes and the number of cases. In a 2-class problem the chance level is 50%, but with a confidence level depending on the number of cases. |
| Cohen’s kappa (k) | k evaluates agreement between nominal scales. It measures the agreement between the true class and the classifier output. A value of 1 is perfect agreement, and 0 is pure chance agreement. | Cohen’s kappa accounts for the theoretical chance level of a classifier. This index evaluates the classifier realistically. If k has a low value, the confusion matrix would not have a meaningful classification even with high accuracy values. This coefficient presents more information than simple percentages because it uses the entire confusion matrix. | This coefficient has to be interpreted appropriately. It is necessary to report the bias and prevalence of the k value and test the significance for a minimum acceptable level of agreement. |
| Sensitivity or Recall | Sensitivity, also called Recall, is the true positive rate. It evaluates the proportion of correctly identified true positives relative to the sum of true positives plus false negatives. | Sensitivity measures how often a classifier correctly categorizes a positive result. | Recall should not be used when the positive class is larger (imbalanced dataset) and correct detection of positive samples is less critical to the problem. |
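
The measures derived from the confusion matrix can be sketched for the two-class case as follows. The counts are hypothetical and chosen to be imbalanced (5 positive and 95 negative trials) so that the gap between accuracy and Cohen’s kappa becomes visible. Returning 0.0 when a ratio is undefined is a convention adopted here, not something the entry specifies.

```python
def binary_metrics(tp, fn, fp, tn):
    """Measures derived from a 2 x 2 confusion matrix (counts of trials)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity, true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0     # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0       # positive predicted value
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Chance agreement: share of each true class times share of each predicted class.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "recall": recall,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
        "precision": precision,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# A classifier that always answers "negative": accuracy 0.95, yet kappa and recall are 0.
trivial = binary_metrics(tp=0, fn=5, fp=0, tn=95)

# A classifier that actually detects positives, at slightly lower accuracy.
useful = binary_metrics(tp=4, fn=1, fp=5, tn=90)
```

The trivial classifier scores higher on accuracy than the useful one, while kappa, recall, and F-measure rank them the other way round.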

Performance evaluation and error estimation may need to be complemented with a significance evaluation. This is because high accuracies can carry little weight if the sample size is too small or the classes are imbalanced (as labeled EEG signals typically are). Therefore, significance testing of the classification is essential. There are general approaches that can handle arbitrary class distributions to verify whether accuracy values lie significantly above certain levels. Commonly used methods are the theoretical level of random classification and the adjusted Wald confidence interval for classification accuracy.

The theoretical level of random classification tests classification results for randomness. It is the sum of the products between the classification probability of the experimental results and the probability calculated if all categorization occurred randomly (p0 = classification accuracy of a random classifier). This approach can only be used after the classification has been performed [36].

The adjusted Wald confidence interval gives the lower and upper confidence limits for the probability of correct classification, which specifies the intervals for the classifier performance evaluation index [37].
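
Both checks can be combined in a short sketch: compute the chance level p0 from the confusion matrix, then test whether the lower confidence limit of the accuracy exceeds it. The adjusted Wald form used here adds two successes and two failures before applying the usual Wald formula. The entry does not state the formula, so this form is an assumption, and the confusion matrix values are hypothetical.

```python
from statistics import NormalDist


def chance_level(confusion):
    """p0: sum over classes of (share of true class) * (share of predicted class)."""
    total = sum(sum(row) for row in confusion)
    p0 = 0.0
    for c in range(len(confusion)):
        true_share = sum(confusion[c]) / total                  # row c: true class c
        pred_share = sum(row[c] for row in confusion) / total   # column c: predicted c
        p0 += true_share * pred_share
    return p0


def adjusted_wald_interval(n_correct, n_trials, alpha=0.05):
    """Confidence limits for accuracy after adding 2 successes and 2 failures."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_adj = (n_correct + 2) / (n_trials + 4)
    half_width = z * (p_adj * (1 - p_adj) / (n_trials + 4)) ** 0.5
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Hypothetical 2-class result: rows are true classes, columns are predicted classes.
confusion = [[35, 15],
             [15, 35]]
n_trials = sum(sum(row) for row in confusion)
n_correct = sum(confusion[c][c] for c in range(len(confusion)))

p0 = chance_level(confusion)
lower, upper = adjusted_wald_interval(n_correct, n_trials)
above_chance = lower > p0   # accuracy is significantly above chance at the 5% level
```

With the same procedure, 55 correct trials out of 100 would not pass, because the lower limit falls below the 50% chance level even though the raw accuracy is above it.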

3. Conclusions

EEG signals are reliable information that cannot be simulated or faked. Decoding EEG and relating these signals to a specific emotion is a complex problem. Affective states do not have a simple mapping to specific brain structures because different emotions activate the same brain locations, and conversely, a single emotion can activate several structures.

In recent years, EEG-based BCI emotion recognition has been a field of affective computing that has generated much interest. Significant advances in the development of low-cost BCI devices with increasingly better usability have encouraged numerous research studies.