Facial Expression Recognition System Using Convolutional Neural Network

Recognizing facial expressions plays a crucial role in various multimedia applications, such as human–computer interaction and the operation of autonomous vehicles. An individual's facial expressions (FEs), or countenance, convey their psychological reactions and intentions in response to a social or personal event, carrying subtle non-verbal messages. Facial expressions are humans' primary nonverbal method of expressing emotion, and with technological advancements, human behavior can be understood through facial expression recognition (FER).

  • convolutional neural network
  • intelligent system
  • machine learning
  • human interaction
  • deep learning

1. Introduction

Facial recognition has recently drawn considerable attention in biometric authentication and nonverbal communication applications. Studies in the movie industry have also sought to predict the emotions viewers experience during scenes. These works aimed to identify a person's mood or emotion by analyzing facial expressions.
Facial models can be developed from 2D and 3D landmarks [1], facial appearance [2], geometry [3], and 2D/3D spatiotemporal representations [4]; comprehensive reviews are available in [4][5][6]. Approaches can also be categorized as image-based, deep-learning-based, or model-based. Hand-engineered features such as histograms of oriented gradients (HOGs) [7], local binary pattern histograms (LBPHs) [8], and Gabor filters [9] are used in many image-based approaches.
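For illustration, the sketch below extracts HOG and LBPH descriptors from a single grayscale face image with scikit-image. The image size and all parameter values (orientations, cell sizes, LBP radius) are assumptions chosen for brevity, not settings from the cited works.

```python
# A minimal sketch of classical image-based FER features using scikit-image.
# All parameter values below are illustrative assumptions.
import numpy as np
from skimage.feature import hog, local_binary_pattern

def extract_features(gray_face):
    """Concatenate HOG and uniform-LBP histogram features for one face image."""
    # HOG: gradient-orientation histograms over local cells
    hog_vec = hog(gray_face, orientations=8, pixels_per_cell=(16, 16),
                  cells_per_block=(2, 2), feature_vector=True)
    # LBP: per-pixel binary patterns, summarized as a global histogram (LBPH)
    lbp = local_binary_pattern(gray_face, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([hog_vec, lbp_hist])
```

The resulting vector can then be fed to any conventional classifier, such as an SVM.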
Earlier works in this field focused on such hand-engineered features, but a variety of learning-based techniques have since been developed [10][11][12][13]. In today's image processing and computer vision applications, deep neural networks are often the best choice, owing to the variety and size of available datasets [14][15][16]. Conventional deep networks can easily handle spatial images [16], whereas traditional feature extraction and classification schemes are computationally complex and struggle to achieve high recognition rates. A deep neural network (DNN) based on convolutional layers and residual connections has been proposed for recognizing facial emotions [17][18]. By learning the subtle features of each expression, the proposed model can distinguish between them [19][20]. The proposed model produces a facial emotion classification report, along with the confusion matrix derived from the image dataset.
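A minimal PyTorch sketch of this kind of architecture is shown below. It is not the authors' exact network: the layer sizes, the 48×48 grayscale input, and the seven-class output are all assumptions chosen for brevity.

```python
# A minimal sketch (not the cited authors' exact architecture) of a CNN with
# residual connections for facial emotion classification, in PyTorch.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels))

    def forward(self, x):
        return torch.relu(self.body(x) + x)  # skip (residual) connection

class EmotionCNN(nn.Module):
    def __init__(self, num_classes=7):  # 7 basic expressions is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
            ResidualBlock(32), nn.MaxPool2d(2),
            ResidualBlock(32), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes))

    def forward(self, x):  # x: (batch, 1, 48, 48) grayscale face crops
        return self.net(x)
```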

2. Human Emotions and Facial Expressions

Human emotions are expressed through facial expressions in social communication. Local binary pattern features can be extracted from three orthogonal planes of a video volume (LBP-TOP) [21]. The LBP-TOP operator derives expression features from video sequences by computing histograms over the three orthogonal planes, and the authors classified expressions from the extracted LBP-TOP features using a machine learning (ML) classifier. According to [22], fuzzy ARTMAP neural networks (FAMNNs) can be used for video facial expression recognition (VFER), with particle swarm optimization (PSO) employed to determine the network's hyperparameters. A discrete method [23] categorizes emotions by type, such as sadness, happiness, fear, and anger, whereas a dimensional method [24] places them along continuous affective dimensions.
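The sketch below illustrates the LBP-TOP idea in Python: uniform LBP histograms are computed on the XY, XT, and YT planes of a (time, height, width) video volume and concatenated. For brevity it samples only one central slice per plane, whereas the full operator accumulates histograms over every slice; the values of P and R are assumptions.

```python
# An illustrative, simplified LBP-TOP: LBP histograms from the XY, XT, and YT
# planes of a video volume, concatenated into one descriptor.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_top(volume, P=8, R=1):
    """volume: (T, H, W) grayscale video; returns concatenated histograms."""
    t, h, w = volume.shape
    planes = [volume[t // 2],        # XY plane (central spatial slice)
              volume[:, h // 2, :],  # XT plane (central row over time)
              volume[:, :, w // 2]]  # YT plane (central column over time)
    hists = []
    for plane in planes:
        lbp = local_binary_pattern(plane, P, R, method="uniform")
        hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
        hists.append(hist)
    return np.concatenate(hists)  # full LBP-TOP would sum over all slices
```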
SVMs have also been used to categorize facial expressions [25]. The authors proposed a method that measures the Euclidean distances between 15 facial feature points and their corresponding positions on a neutral face. The method achieved recognition rates of 92% on the JAFFE dataset and 86.33% on the CK dataset. These classification results demonstrate the effectiveness of SVMs in recognizing emotions. SVMs and related classifiers have likewise been employed for face recognition tasks [26][27][28].
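A hedged sketch of this distance-based pipeline appears below: per-landmark Euclidean distances from a neutral face form the feature vector for a scikit-learn SVM. The 15-point count follows the text; the landmark arrays, data loading, and the RBF kernel are assumptions.

```python
# A sketch of the distance-based SVM idea from [25]: Euclidean distances
# between facial landmarks and their neutral-face positions, fed to an SVM.
import numpy as np
from sklearn.svm import SVC

def distance_features(landmarks, neutral_landmarks):
    """landmarks: (15, 2) points; returns 15 per-point Euclidean distances."""
    return np.linalg.norm(landmarks - neutral_landmarks, axis=1)

# Assumed usage, with X: (n_samples, 15) distance vectors, y: expression labels
# clf = SVC(kernel="rbf").fit(X, y)
# predictions = clf.predict(X_test)
```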
A large sample size is essential for developing algorithms that automatically recognize facial expressions and perform related tasks. CK [29] and MMI [30] are two facial expression databases commonly used to test and evaluate expression recognition algorithms. In many databases, participants are asked to pose certain facial expressions (e.g., frowns) rather than produce naturally occurring ones (e.g., smiles). A spontaneous facial expression does not follow the same spatial or temporal pattern as a deliberately posed expression [31]. Several algorithms can detect over 90% of posed facial expressions; recognizing spontaneous expressions, however, is much harder [19][32]. A naturalistic setting is the best place for automatic FER. To address these missing aspects, the study in [33] thoroughly analyzes and surveys the working flow of FER methods, their integral and intermediate steps, and their pattern structures, and discusses the limitations of existing FER surveys. It then provides a detailed analysis of FER datasets, together with a discussion of the challenges and problems related to them.

3. Deep-Learning-Based Face Recognition

During the training process, deep learning can automatically learn new features based on stored information, minimizing the need to retrain the system for each new feature. Because deep learning algorithms do not require manual preprocessing, they can also handle large amounts of data. Recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are two algorithms used in deep learning [34].
Compared with a CNN, an RNN has the advantage of learning relative dependencies within image sequences by retaining information about past inputs. Combining RNNs and CNNs is widely used for image processing tasks such as image recognition [35] and segmentation [36]. A recurrent neural network (RNN) learns temporal dynamics by mapping input sequences and hidden states to outputs. The authors of [37] proposed an improved method for representing spatial–temporal dependencies between two signals. The CNN model is used in various fields, such as IoT [38], computation offloading [39], speech recognition [40], and traffic sign recognition [41].
The HDCR-NN [23] combines a CNN and an RNN and is adapted to facial expression classification. Hybridizing convolutional neural networks [42] with recurrent neural networks [43] is motivated by their wide acceptance for learning feature attributes. The researchers evaluated the proposed methodology on the Japanese Female Facial Expression (JAFFE) and Karolinska Directed Emotional Faces (KDEF) datasets.
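Below is a minimal sketch of this hybrid pattern, not the HDCR-NN itself: a small CNN encodes each frame and an LSTM models the frame sequence. All layer sizes, the grayscale input, and the seven-class head are assumptions.

```python
# A minimal hybrid CNN+RNN sketch (assumptions throughout): a per-frame CNN
# encoder followed by an LSTM over the frame sequence, in PyTorch.
import torch
import torch.nn as nn

class CnnRnnFER(nn.Module):
    def __init__(self, num_classes=7, feat_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(              # per-frame spatial encoder
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim))
        self.rnn = nn.LSTM(feat_dim, 64, batch_first=True)  # temporal model
        self.head = nn.Linear(64, num_classes)

    def forward(self, clips):                  # clips: (batch, T, 1, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.rnn(feats)
        return self.head(h_n[-1])              # classify from last hidden state
```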
Using cross-channel convolutional neural networks (CC-CNNs), the authors of [44] propose a method for VFER. The authors of [45] propose a VFER method based on a part-based hierarchical bidirectional recurrent neural network (PHRNN); in their framework, spatial information is extracted from still frames of an expression sequence using a multi-signal convolutional neural network (MSCNN). Combining the PHRNN and the MSCNN further enhances VFER by extracting still–dynamic, part–whole, and geometry–appearance information. An effective VFER method has also been demonstrated by combining spatiotemporal fusion and transfer learning. CNNs and RNNs are combined in FER to incorporate both audio and visual expressions.
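In the spirit of such two-stream designs, the sketch below fuses class probabilities from a temporal stream and a spatial stream by a weighted average; the fusion weight alpha is an assumed hyperparameter, not a value from [45].

```python
# An illustrative late-fusion step for a two-stream FER model: softmax
# probabilities from temporal and spatial streams, combined by a weighted
# average. alpha is an assumed hyperparameter.
import torch

def fuse_predictions(temporal_logits, spatial_logits, alpha=0.5):
    """Weighted average of class probabilities from the two streams."""
    p_temporal = torch.softmax(temporal_logits, dim=-1)
    p_spatial = torch.softmax(spatial_logits, dim=-1)
    return alpha * p_temporal + (1 - alpha) * p_spatial
```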
As an alternative to deep-learning-based methods for recognizing facial emotions, a simple machine learning method was proposed in [46]. Regional distortions are useful for analyzing facial expressions: on top of a convolutional neural network, the authors trained a manifold network to learn a variety of facial distortions. Rather than extracting facial features alone, a set of low-level features has been considered for inferring human action [47]. An entropy-based method for facial emotion classification was developed in [48]. An innovative method for recognizing facial expressions from videos was presented in [49], in which face and motion information are described by multiple feature descriptors, and multiple distance metrics are learned to exploit complementary and discriminative information. An ANN that learns and fuses spatial–temporal features was proposed in [50]. The authors of [48] also proposed several preprocessing steps for recognizing facial expressions.
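The sketch below shows a typical FER preprocessing pipeline using OpenCV; it is an illustrative assumption, not the exact steps of [48]: Haar-cascade face detection, grayscale conversion, histogram equalization, and resizing to a fixed input size.

```python
# A common FER preprocessing pipeline (an illustrative sketch, not the exact
# steps of the cited work), using OpenCV.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(frame, size=(48, 48)):
    """Return an equalized, resized grayscale face crop, or None if no face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]                             # keep first detected face
    face = cv2.equalizeHist(gray[y:y + h, x:x + w])   # normalize illumination
    return cv2.resize(face, size)
```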

This entry is adapted from the peer-reviewed paper 10.3390/app132112049

References

  1. Cootes, T.F.; Taylor, C.J.; Cooper, D.H.; Graham, J. Active Shape Models-Their Training and Application. Comput. Vis. Image Underst. 1995, 61, 38–59.
  2. Cootes, T.F.; Edwards, G.J.; Taylor, C.J. Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 681–685.
  3. Kähler, K.; Haber, J.; Seidel, H.-P. Geometry-based muscle modeling for facial animation. In Proceedings of the Graphics Interface, Ottawa, ON, Canada, 7–9 June 2001; Volume 2001, pp. 37–46.
  4. Fasel, B.; Luettin, J. Automatic facial expression analysis: A survey. Pattern Recognit. 2003, 36, 259–275.
  5. Li, S.; Deng, W. Deep Facial Expression Recognition: A Survey. IEEE Trans. Affect. Comput. 2022, 13, 1195–1215.
  6. Sandbach, G.; Zafeiriou, S.; Pantic, M.; Yin, L. Static and dynamic 3D facial expression recognition: A comprehensive survey. Image Vis. Comput. 2012, 30, 683–697.
  7. Carcagnì, P.; Del Coco, M.; Leo, M.; Distante, C. Facial expression recognition and histograms of oriented gradients: A comprehensive study. SpringerPlus 2015, 4, 1–25.
  8. Shan, C.; Gritti, T. Learning Discriminative LBP-Histogram Bins for Facial Expression Recognition. In Proceedings of the BMVC, Leeds, UK, 1–4 September 2008; pp. 1–10.
  9. Lajevardi, S.M.; Lech, M. Averaged Gabor filter features for facial expression recognition. In Proceedings of the 2008 Digital Image Computing: Techniques and Applications, Canberra, Australia, 1–3 December 2008; pp. 71–76.
  10. Kahou, S.E.; Froumenty, P.; Pal, C. Facial expression analysis based on high dimensional binary features. In Proceedings of the Computer Vision-ECCV 2014 Workshops, Zurich, Switzerland, 6–7 and 12 September 2014; Proceedings, Part II. Springer: Cham, Switzerland, 2015; pp. 135–147.
  11. Shan, C.; Gong, S.; McOwan, P.W. Facial expression recognition based on local binary patterns: A comprehensive study. Image Vis. Comput. 2009, 27, 803–816.
  12. Rani, P.; Verma, S.; Yadav, S.P.; Rai, B.K.; Naruka, M.S.; Kumar, D. Simulation of the Lightweight Blockchain Technique Based on Privacy and Security for Healthcare Data for the Cloud System. Int. J. E-Health Med. Commun. 2022, 13, 1–15.
  13. Rani, P.; Singh, P.N.; Verma, S.; Ali, N.; Shukla, P.K.; Alhassan, M. An Implementation of Modified Blowfish Technique with Honey Bee Behavior Optimization for Load Balancing in Cloud System Environment. Wirel. Commun. Mob. Comput. 2022, 2022, 3365392.
  14. Kahou, S.E.; Bouthillier, X.; Lamblin, P.; Gulcehre, C.; Michalski, V.; Konda, K.; Jean, S.; Froumenty, P.; Dauphin, Y.; Boulanger-Lewandowski, N. Emonets: Multimodal deep learning approaches for emotion recognition in video. J. Multimodal User Interfaces 2016, 10, 99–111.
  15. Kalchbrenner, N.; Grefenstette, E.; Blunsom, P. A convolutional neural network for modelling sentences. arXiv 2014, arXiv:1404.2188.
  16. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
  17. Shamsolmoali, P.; Zareapoor, M.; Yang, J. Convolutional neural network in network (CNNiN): Hyperspectral image classification and dimensionality reduction. IET Image Process. 2019, 13, 246–253.
  18. Zareapoor, M.; Shamsolmoali, P.; Yang, J. Learning depth super-resolution by using multi-scale convolutional neural network. J. Intell. Fuzzy Syst. 2019, 36, 1773–1783.
  19. Sariyanidi, E.; Gunes, H.; Cavallaro, A. Automatic analysis of facial affect: A survey of registration, representation, and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 1113–1133.
  20. Sebe, N.; Lew, M.S.; Sun, Y.; Cohen, I.; Gevers, T.; Huang, T.S. Authentic facial expression analysis. Image Vis. Comput. 2007, 25, 1856–1863.
  21. Zhao, G.; Pietikainen, M. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 915–928.
  22. Gharavian, D.; Bejani, M.; Sheikhan, M. Audio-visual emotion recognition using FCBF feature selection method and particle swarm optimization for fuzzy ARTMAP neural networks. Multimed. Tools Appl. 2017, 76, 2331–2352.
  23. Jain, N.; Kumar, S.; Kumar, A.; Shamsolmoali, P.; Zareapoor, M. Hybrid deep neural networks for face emotion recognition. Pattern Recognit. Lett. 2018, 115, 101–106.
  24. Dellandrea, E.; Liu, N.; Chen, L. Classification of affective semantics in images based on discrete and dimensional models of emotions. In Proceedings of the 2010 International Workshop on Content Based Multimedia Indexing (CBMI), Grenoble, France, 23–25 June 2010; pp. 1–6.
  25. Sohail, A.S.M.; Bhattacharya, P. Classifying facial expressions using level set method based lip contour detection and multi-class support vector machines. Int. J. Pattern Recognit. Artif. Intell. 2011, 25, 835–862.
  26. Khan, S.A.; Hussain, A.; Usman, M.; Nazir, M.; Riaz, N.; Mirza, A.M. Robust face recognition using computationally efficient features. J. Intell. Fuzzy Syst. 2014, 27, 3131–3143.
  27. Chelali, F.Z.; Djeradi, A. Face Recognition Using MLP and RBF Neural Network with Gabor and Discrete Wavelet Transform Characterization: A Comparative Study. Math. Probl. Eng. 2015, 2015, e523603.
  28. Ryu, S.-J.; Kirchner, M.; Lee, M.-J.; Lee, H.-K. Rotation invariant localization of duplicated image regions based on Zernike moments. IEEE Trans. Inf. Forensics Secur. 2013, 8, 1355–1370.
  29. Kanade, T.; Cohn, J.F.; Tian, Y. Comprehensive database for facial expression analysis. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580), Grenoble, France, 28–30 March 2000; pp. 46–53.
  30. Pantic, M.; Valstar, M.; Rademaker, R.; Maat, L. Web-based database for facial expression analysis. In Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, Amsterdam, The Netherlands, 6 July 2005.
  31. Wang, S.; Wu, C.; He, M.; Wang, J.; Ji, Q. Posed and spontaneous expression recognition through modeling their spatial patterns. Mach. Vis. Appl. 2015, 26, 219–231.
  32. Gunes, H.; Hung, H. Is automatic facial expression recognition of emotions coming to a dead end? The rise of the new kids on the block. Image Vis. Comput. 2016, 55, 6–8.
  33. Sajjad, M.; Ullah, F.U.M.; Ullah, M.; Christodoulou, G.; Cheikh, F.A.; Hijji, M.; Muhammad, K.; Rodrigues, J.J. A comprehensive survey on deep facial expression recognition: Challenges, applications, and future guidelines. Alex. Eng. J. 2023, 68, 817–840.
  34. Ansari, G.; Rani, P.; Kumar, V. A Novel Technique of Mixed Gas Identification Based on the Group Method of Data Handling (GMDH) on Time-Dependent MOX Gas Sensor Data. In Proceedings of International Conference on Recent Trends in Computing; Mahapatra, R.P., Peddoju, S.K., Roy, S., Parwekar, P., Eds.; Lecture Notes in Networks and Systems; Springer Nature: Singapore, 2023; Volume 600, pp. 641–654. ISBN 978-981-19882-4-0.
  35. Visin, F.; Kastner, K.; Cho, K.; Matteucci, M.; Courville, A.; Bengio, Y. ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv 2015, arXiv:1505.00393.
  36. Sulong, G.B.; Wimmer, M.A. Image hiding by using spatial domain steganography. Wasit J. Comput. Math. Sci. 2023, 2, 39–45.
  37. Zhang, T.; Zheng, W.; Cui, Z.; Zong, Y.; Li, Y. Spatial–temporal recurrent neural network for emotion recognition. IEEE Trans. Cybern. 2018, 49, 839–847.
  38. Bhola, B.; Kumar, R.; Rani, P.; Sharma, R.; Mohammed, M.A.; Yadav, K.; Alotaibi, S.D.; Alkwai, L.M. Quality-enabled decentralized dynamic IoT platform with scalable resources integration. IET Commun. 2022.
  39. Heidari, A.; Navimipour, N.J.; Jamali, M.A.J.; Akbarpour, S. A hybrid approach for latency and battery lifetime optimization in IoT devices through offloading and CNN learning. Sustain. Comput. Inform. Syst. 2023, 39, 100899.
  40. Alluhaidan, A.S.; Saidani, O.; Jahangir, R.; Nauman, M.A.; Neffati, O.S. Speech Emotion Recognition through Hybrid Features and Convolutional Neural Network. Appl. Sci. 2023, 13, 4750.
  41. Triki, N.; Karray, M.; Ksantini, M. A real-time traffic sign recognition method using a new attention-based deep convolutional neural network for smart vehicles. Appl. Sci. 2023, 13, 4793.
  42. Zhou, W.; Jia, J. Training convolutional neural network for sketch recognition on large-scale dataset. Int. Arab. J. Inf. Technol. 2020, 17, 82–89.
  43. Zouari, R.; Boubaker, H.; Kherallah, M. RNN-LSTM Based Beta-Elliptic Model for Online Handwriting Script Identification. Int. Arab. J. Inf. Technol. 2018, 15, 532–539.
  44. Barros, P.; Wermter, S. Developing crossmodal expression recognition based on a deep neural model. Adapt. Behav. 2016, 24, 373–396.
  45. Zhang, K.; Huang, Y.; Du, Y.; Wang, L. Facial expression recognition based on deep evolutional spatial-temporal networks. IEEE Trans. Image Process. 2017, 26, 4193–4203.
  46. Jeong, M.; Ko, B.C. Driver’s Facial Expression Recognition in Real-Time for Safe Driving. Sensors 2018, 18, 4270.
  47. Ullah, M.; Ullah, H.; Alseadonn, I.M. Human action recognition in videos using stable features. Signal Image Process. Int. J. 2017, 8, 1–10.
  48. Wang, S.-H.; Phillips, P.; Dong, Z.-C.; Zhang, Y.-D. Intelligent facial emotion recognition based on stationary wavelet entropy and Jaya algorithm. Neurocomputing 2018, 272, 668–676.
  49. Yan, H. Collaborative discriminative multi-metric learning for facial expression recognition in video. Pattern Recognit. 2018, 75, 33–40.
  50. Samadiani, N.; Huang, G.; Cai, B.; Luo, W.; Chi, C.-H.; Xiang, Y.; He, J. A Review on Automatic Facial Expression Recognition Systems Assisted by Multimodal Sensor Data. Sensors 2019, 19, 1863.