Emotion Recognition Systems: Comparison
Please note this is a comparison between Version 1 by Kamarulzaman Ab. Aziz and Version 2 by Jessie Wu.

Emotion recognition systems (ERS) are an emerging technology with immense potential, exemplifying the innovative utilization of artificial intelligence (AI) within the context of the fourth industrial revolution (IR 4.0). Given that personalization is a key feature of the fifth industrial revolution (IR 5.0), ERS has the potential to serve as an enabler for IR 5.0. Furthermore, the COVID-19 pandemic has increased the relevance of this technology as work processes were adapted for social distancing and the use of face masks. Even in the post-pandemic era, many individuals continue to wear face masks. Therefore, ERS offers a technological solution to address communication challenges in a masked world. The existing body of knowledge on ERS primarily focuses on exploring modalities or modes for emotion recognition, system development, and the creation of applications utilizing emotion recognition functions.

  • emotion recognition system
  • fourth industrial revolution
  • fifth industrial revolution
  • artificial intelligence

1. Introduction

Artificial intelligence (AI) has evolved from an interesting theoretical concept into a tangible reality, with recent applications making significant impacts across businesses, industries, and societies [1]. Inspired by human intelligence, AI aims to learn, reason, and make decisions like humans, reducing the need for human intervention [2]. Trustworthy AI requires that AI systems uphold the principles of beneficence, non-maleficence, autonomy, justice, and explicability [2][3][4]. AI systems are designed to operate with varying levels of autonomy and with defined objectives, producing predictions and recommendations influenced by real or virtual environments. Furthermore, AI offers benefits for businesses and industries, such as the automation of repetitive and time-consuming tasks, which allows humans to focus on higher-value work [4]. For example, massive datasets that were once challenging to analyze are now easily processed by AI, and complex problems can be tackled more efficiently by integrating thousands of computers and other resources [2][3][4].
Moving to the present, we find ourselves in the era of the fourth industrial revolution (IR 4.0), characterized by digitalization and the integration of AI and computers in collaboration with societies [5]. IR 4.0 primarily emphasizes the manufacturing industry, enabling smart manufacturing through technologies such as AI [6]. The constant nature of technological change is propelling us towards the fifth industrial revolution (IR 5.0). What distinguishes IR 5.0 from IR 4.0 is the specialization of machines and computers, endowed with the capability to comprehend human actions [7][8]. Aspects of human–computer interaction (HCI) will become more significant as we move into IR 5.0. Emotion recognition systems (ERS) are well positioned to be a key enabling technology here, as they can enhance AI with the ability to understand human emotions and behavioral responses. To facilitate HCI, the computer system must be able to communicate with humans in some form [8]. Since the use cases of IR 5.0 are still in their formative years, manufacturers must actively consider strategies to integrate humans and machines and to maximize the opportunities that IR 5.0 offers. Hence, ERS may enhance the ability of robots and machines to understand human emotions, supporting the proposition of collaborative robots.
ERS is an emerging technology in the field of AI that allows machines to recognize human emotions by learning from various data modalities. It has gained significance due to technological advancements and potential applications. ERS was initially introduced as part of affective computing (AC) by [9] to predict and understand human behavior, and AC has since evolved to achieve recent advancements in recognizing emotions. Over the past decade, researchers have developed several ERS, which are now commonly embedded in various AI applications [10]. ERS offers a wide range of potential innovative solutions based on modalities introduced by previous researchers [11]. It has attracted significant interest from researchers, as evidenced by the increasing trend in studies related to ERS over the past decade, as shown in Figure 1. This underscores the importance of the research area. For the development of complex systems such as human-interacting robots, a sub-system capable of understanding and expressing human emotions has been proposed [6]. Previous studies indicate that ERS holds promise as a significant technology, offering advantages to individuals, societies, organizations, businesses, and industries across various platforms or applications. Examples include ERS in healthcare [12], driving assistance [13], and enhancing teaching and learning technologies in the education sector [14].
Figure 1. ERS trend ( ), accessed on 10 June 2023.
ERS is an advanced AI application that utilizes affective computing to understand and respond to human cues [15] and has become increasingly significant in its field of study since it was introduced by [9]. ERS enhances AI in human–computer interaction and represents a further advancement in technological progress [16]. AI, as the basis for intelligent machines and computers that enhance productivity in different settings [17], forms the foundation for incorporating emotion recognition as a subset of AI technology. Over the last decade, researchers and innovators have explored ERS modalities comprising physiological signals, physical cues, and data mining from text or documents. These modalities have been suggested as part of the innovation enabling AI to perform tasks such as learning and understanding human emotions [18].
Physiological modalities are commonly found within the healthcare industry and include electroencephalography (EEG), electrocardiography (ECG), and photoplethysmography (PPG). EEG serves as an analytical tool in neuroscience, neural engineering, and biomedical engineering, measuring human brain signals by observing the electromagnetic activity of specific components [19][20]. EEG is the preferred modality for accurate data in automated emotion recognition, as it aligns with AI systems that employ convolutional neural networks and deep machine learning [21][22]. It has been tested in detecting human emotions and is considered a cost-effective, portable, and simple method of identifying emotions [23]. ECG is one of the most well-known modalities and is commonly used in emotion recognition and affective computing research. Previous studies have utilized ECG to detect stress and emphasize the importance of monitoring emotional stress levels to prevent negative outcomes [12]. Machine-based ERS utilizing ECG provides an alternative to physical modalities. PPG, along with the galvanic skin response (GSR), is considered a practical and suitable modality for real-life applications [24].
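To illustrate how a physiological signal such as ECG can feed an ERS pipeline, the minimal sketch below derives a standard heart-rate-variability (HRV) feature, RMSSD, from inter-beat intervals and applies a simple threshold. This is not the method of the cited studies: the function names, synthetic data, and threshold value are all illustrative assumptions.

```python
import math

def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between heartbeats,
    a common HRV feature computed from ECG-derived RR intervals."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def stress_indicator(rr_intervals_ms, threshold_ms=25.0):
    """Low HRV is often associated with stress; this threshold is
    purely illustrative, not a clinically validated cutoff."""
    return "stressed" if rmssd(rr_intervals_ms) < threshold_ms else "calm"

# Synthetic RR intervals (milliseconds between consecutive beats)
relaxed = [820, 790, 845, 800, 860, 810]  # varied intervals -> higher HRV
tense = [710, 712, 709, 711, 710, 712]    # near-constant intervals -> low HRV
print(stress_indicator(relaxed))
print(stress_indicator(tense))
```

Real systems would first detect beats in the raw ECG waveform and typically feed many such features into a trained classifier rather than a fixed threshold.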
Physical modalities include facial recognition, speech recognition, body movement, and others. Facial recognition and speech recognition are well-known physical modalities among ERS researchers and have been extensively utilized in previous works [25][26]. Facial recognition, in particular, has gained prominence among ERS practitioners due to its wide range of real-world applications, including security supervision, online learning, and gaming experiences [26]. Speech recognition, as an ERS modality, is capable of identifying human feelings and “makes conventional speech emotion recognition (SER) more suitable for real-life applications” [27] (p. 1). According to [28], one of the earliest instances of detecting human emotion was through speech recognition: based on someone’s voice, the computer can identify the emotive cue and determine the person’s emotion. Combining modalities also leads to better results in enabling ERS; for example, a study by [29] suggested that combining modalities such as EEG and facial recognition compensates for their respective defects as single information sources.
Text data mining refers to machine learning techniques that involve learning-based algorithms and feature extraction to describe the main characteristics of textual data [30]. A recent study by [31] employed text word mining using emotion-label lexicons, such as a small set of seed words. For example, the text “Hurray!” can be labeled as indicating happiness, while “Argh!” may represent anger and frustration. Nevertheless, certain words may carry overlapping potential emotions; for instance, the word “Aww” can convey both pleasant sentiments and expressions of pity and sympathy [31][32]. Various applications leveraging data and text mining for the automatic recognition of sentiments or emotions can be observed, particularly in eliciting opinions related to marketing or promotional content from sources like blog posts, social media, articles, and surveys [33]. This can be applied on the web, for example to chats on social networks, by analyzing their sentiments and emotions. Moreover, deep learning applied to text-based emojis, such as smilies, symbols, and characters, can be used to further classify emotions [34].
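The seed-word idea above can be sketched as a tiny lexicon lookup: each seed word maps to one or more candidate emotions, and ambiguous words like “Aww” simply return several labels. This is a minimal illustration only; real lexicon-based systems expand a handful of seed words into large lexicons and disambiguate using context, and the lexicon contents here are assumptions drawn from the examples in the text.

```python
# Toy emotion-label lexicon built from the seed-word examples above
EMOTION_LEXICON = {
    "hurray": {"happy"},
    "argh": {"anger", "frustration"},
    "aww": {"pleasant", "pity", "sympathy"},  # ambiguous seed word
}

def lexicon_emotions(text):
    """Return the set of candidate emotions whose seed words occur in
    the text, after stripping punctuation and lowercasing tokens."""
    tokens = [t.strip("!?.,").lower() for t in text.split()]
    found = set()
    for token in tokens:
        found |= EMOTION_LEXICON.get(token, set())
    return found

print(lexicon_emotions("Hurray! We won!"))       # single clear label
print(lexicon_emotions("Aww, poor little cat"))  # multiple candidate labels
```

The overlapping labels returned for “Aww” show concretely why lexicon methods need a downstream step, such as sentence-level context or a trained classifier, to resolve ambiguity.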

2. Emotion Recognition Systems Applications

ERS holds the potential to be applied in, and bring benefits to, various sectors due to its adaptability as an embedded technological function within a system. In other words, it can be one of the functionalities used to process the inputs of AI-enabled smart machines and computers to achieve higher levels of HCI. ERS has been identified as having the potential to benefit the education sector, as it can enable better engagement between instructors and learners [14][35]. Emotions exert a noteworthy influence on academic performance, with positive emotions being particularly instrumental in enhancing student interest and focus and increasing the likelihood of academic success [35]. Instructors have benefited from using a webcam equipped with facial recognition technology to identify students’ moods [14]. Another example of an ERS application is its integration into a smart car [13]. A driver’s performance can be influenced by their emotions, particularly given their impact on the driver’s focus; therefore, ERS is significant for applications ensuring driving safety. Specifically, [13] used a driving simulation with an ECG modality built into the steering wheel to detect blood pressure as an indicator of emotional states of stress and fatigue.
Similarly, the use of facial expressions for ERS in video surveillance was proposed in [36]. It has been highlighted that current video surveillance systems rely on human operators to interpret behavior, which leads to delays in responding to emergencies [36]. The experiments concluded that implementing facial recognition as part of an ERS-enabled video surveillance system can improve the reliability of abnormal behavior detection via facial expressions across different emotions and environmental conditions. Furthermore, facial expressions can be used to identify pain, which will benefit the healthcare industry [37]. Assessing a patient’s pain levels over time is important, specifically for gauging the effectiveness of medical treatments. Therefore, facial expression recognition can be widely anticipated in the healthcare industry.
Physiological modalities have gained increased attention for the successful implementation of ERS, since physiological factors are more useful in understanding human emotions through neural activity [38]. Most physiological methods have been assessed in healthcare facilities; therefore, implementing ERS through physiological modalities is most likely to benefit the healthcare industry. Moreover, ERS applications in healthcare can serve as a supportive aid for people with conditions like Down’s syndrome and autism and for the elderly [39]. Multi-modal approaches combining facial expressions for automated emotion recognition with computer advisors guiding appropriate reactions to specific situations have been explored [39][40]. Additionally, a communication aid using speech recognition was proposed to identify the tone and voice of special needs patients with conditions like autism or Down’s syndrome [40]. Furthermore, given the accessibility of such technologies, some components can be implemented in everyday devices, enhancing global outreach to users.
During the COVID-19 pandemic, potential innovative solutions were introduced to support the adoption of technologies in daily life, with virtual videoconferencing enabling working environments such as working from home, online classes, and virtual gatherings [41]. Ref. [41] suggested that facial emotion recognition may significantly reduce videoconferencing fatigue; the authors analyzed participants in Zoom videoconferences, tracking users with a facial recognition modality to recognize six emotions. Furthermore, in the marketing sector, ERS has significant applications in increasing brand awareness through image, video, and text mining [42]. For instance, text mining implemented in web browsers can analyze feedback and comments from potential users, revealing their sentiments and emotions towards a certain product [34][42]; other studies have gathered a small group of individuals in a room, introduced a product, and recorded their reactions to evaluate their emotions [43].
Given the potential of ERS and its innovative applications, there is a need to understand whether individuals are ready for the technology. Considering that ERS will be available across various industries and implemented for daily use, investigating its adoption is crucial; this can help ERS scientists, engineers, practitioners, and technology developers understand the factors influencing users in adopting ERS. To identify these factors, previous studies have drawn on technology adoption theories and concepts that capture users’ behavior, intention, adoption, and readiness for such technologies.