Machine Learning-Based Facial Palsy Detection and Evaluation

Automated solutions for medical diagnosis based on computer vision form an emerging field of science aiming to enhance diagnosis and early disease detection. The detection and quantification of facial asymmetries enable facial palsy evaluation.  Deep learning methods allow the automatic learning of discriminative deep facial features, leading to comparatively higher performance accuracies.

  • mathematical modeling
  • facial palsy
  • facial landmarks
  • asymmetry index
  • deep learning

1. Introduction

Facial palsy is a common neuromuscular disorder causing facial weakness and impaired facial expressions [1]. Palsy patients lose control of the affected side of their face, experiencing drooping or stiffness of the muscles and taste disturbances. Statistics regarding facial palsy report 25 cases annually per 100,000 people, or approximately one patient out of 60 people in their lifetime, while an average of 40,000 palsy patients are reported in the United States every year [2]. Even though palsy does not cause physical pain, patients experience psychological stress, external discomfort, and depression, since palsy affects their appearance, facial movements, feeding functions, and, thus, their daily lives [3]. Therefore, the accurate diagnosis and exact evaluation of the degree of palsy are essential for the objective assessment of the facial nerve’s function in terms of monitoring the progress or resolution of palsy. Such monitoring could help in evaluating therapeutic processes and designing effective treatment plans.
Traditionally, the diagnosis of facial palsy is clinically performed by specialized neurologists who ask patients to perform specific facial expressions in order to evaluate the condition of certain face muscles. The level of palsy is assessed by evaluating the symmetry between the right and left parts of the face in terms of various scoring standards and by measuring distances between facial landmarks on both sides with a simple ruler [4]. The manual and empirical evaluation of palsy is, therefore, both labor-intensive and subjective. Assessment based on visual inspection makes it hard to precisely quantify the severity of palsy, and it is not feasible to track improvements between subsequent rehabilitation interventions. Moreover, assessment relies on the degree of human expertise; thus, the clinical quantification of palsy may differ between neurologists [5].
Automatic inspection approaches can alleviate these disadvantages and provide more consistent and objective facial palsy diagnosis and evaluation methods, providing neurologists with an efficient decision-supporting tool [6]. The automatic quantitative evaluation of facial palsy has been a subject of research for many years. Several approaches use optical markers attached to human faces to determine the degree of palsy [7,8], as well as full-face laser scanning [9,10] or electroneurography (ENoG) and electromyography (EMG) signals. The latter approaches, although very accurate, require specialized high-cost equipment and a constrained clinical environment and presuppose physical interventions, which are obtrusive and uncomfortable. Moreover, the patients themselves cannot perform these approaches on their own to monitor their progress at home.
Recent advancements in image analysis algorithms, combined with the increasingly affordable cost of high-resolution capturing devices, have resulted in the development of efficient, simple, and cost-effective vision-based techniques for medical applications, reporting impressive state-of-the-art performances [11,12,13]. The diagnosis of various diseases is greatly assisted by the recognition of facial abnormalities using computer vision [14,15], dynamically incorporating facial recognition into artificial intelligence (AI)-based medicine [16,17]. Automatic image-based facial palsy assessment could accelerate the diagnosis and progress evaluation of the disease, offering a non-invasive, simple, and time- and cost-saving method that could be used by the palsy patients themselves without the presence of a human expert.

2. Machine Learning-Based Facial Palsy Detection and Evaluation

Traditional machine learning methods are based on encoding facial palsy with facial asymmetry-related mathematical features. A portable automatic diagnosis system based on a smartphone application for classifying subjects as healthy or palsy patients was presented by Kim et al. [63]. Facial landmarks were extracted, and an asymmetry index was computed. Classification was implemented using Linear Discriminant Analysis (LDA) combined with Support Vector Machines (SVMs), resulting in 88.9% classification accuracy. Wang et al. [64] used Active Shape Models (ASMs) to locate facial landmarks and divide the face into eight regions, and Local Binary Patterns (LBPs) to extract descriptors for recognizing patterns of facial movements in these regions, reaching a recognition rate of up to 93.33%. In [65], He et al. extracted features based on LBPs in the spatial–temporal domain in both facial regions and validated their method using biomedical videos, reporting an overall accuracy of up to 94% for House-Brackmann (HB) grading. In [51], the authors automatically measured the ability of palsy patients to smile using Active Appearance Models (AAMs) for feature extraction and facial expression synthesis, providing an average accuracy of 87%. McGrenary et al. [66] quantified facial asymmetry in videos using an artificial neural network (ANN).
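As an illustration of how an asymmetry index can be derived from facial landmarks, the following minimal sketch mirrors the landmarks of one side across the facial midline and measures how far they fall from their counterparts on the other side. The function name, the pairing of landmarks, and the normalization by the distance between the first landmark pair are assumptions made for this example; it is not the index defined in [63] or in any other cited work.

```python
import math


def asymmetry_index(left_pts, right_pts, midline_x):
    """Mean mirrored-landmark distance, normalized by face scale.

    left_pts and right_pts are paired lists of (x, y) image coordinates.
    Assumption of this sketch: index 0 in each list is the outer eye
    corner, so the distance between the two is used as the face scale.
    """
    if not left_pts or len(left_pts) != len(right_pts):
        raise ValueError("landmark lists must be non-empty and paired")
    # Reflect every left landmark across the vertical line x = midline_x.
    mirrored = [(2 * midline_x - x, y) for x, y in left_pts]
    # On a perfectly symmetric face each mirrored point coincides
    # with its right-side counterpart, so every distance is zero.
    dists = [math.dist(m, r) for m, r in zip(mirrored, right_pts)]
    scale = math.dist(left_pts[0], right_pts[0])
    return sum(dists) / len(dists) / scale


# Hypothetical landmarks: (outer eye corner, mouth corner) per side.
left = [(40, 50), (45, 80)]
right_symmetric = [(60, 50), (55, 80)]
right_drooping = [(60, 50), (55, 86)]  # mouth corner 6 px lower

print(asymmetry_index(left, right_symmetric, midline_x=50))
print(asymmetry_index(left, right_drooping, midline_x=50))
```

A value of zero corresponds to perfect bilateral symmetry, and larger values indicate stronger asymmetry. In practice the midline would itself have to be estimated, for example from the eye centers, and the resulting index would be one input feature to a classifier rather than a diagnosis on its own.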
Early research into facial asymmetry analysis was also conducted by Quan et al. [67], who presented a method for automatically detecting and quantifying facial dysfunctions based on 3D face scans. The authors extracted a number of feature points that enabled the segmentation of faces into local regions, enabling specific asymmetry evaluation for regions of interest rather than the entire face. Gaber et al. [68] proposed an evaluation system for seven palsy categories based on an ensemble learning SVM classifier, reporting an accuracy of 96.8%. The authors showed that their proposed classifier was robust and stable, even for different training and testing samples. Zhuang et al. [69] compared the performance of various feature extraction techniques and concluded that 2D static images with Histogram of Oriented Gradients (HOG) features tend to be more accurate. The authors proposed a framework in which landmark and HOG features were extracted, Principal Component Analysis (PCA) was applied separately to each feature set, and the results were used as inputs to an SVM classifier for classification into three classes, demonstrating performance of up to 92.2% for the entire face. The same research group, as shown in [70], demonstrated a video classification detection tool, namely the Facial Deficit Identification Tool for Videos (F-DIT-V), exploiting HOG features to achieve a 92.9% classification accuracy. Arora et al. [71] tested an SVM and a Logistic Regressor on generated facial landmark features, achieving 76.87% average accuracy with SVM. In [72], laser speckle contrast imaging was employed by Jiang et al. to monitor the facial blood flow of palsy patients. Then, faces were segmented into regions based on blood distribution features, and three HB score classifiers were tested for their classification performance: a neural network (NN), an SVM, and a k-NN, achieving an accuracy of up to 97.14%.
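To make the idea behind HOG features more concrete, the sketch below computes a magnitude-weighted histogram of gradient orientations for a single image cell. It is a deliberately simplified illustration: the function name and the nine-bin default are assumptions, and full HOG descriptors additionally interpolate votes between bins and normalize over overlapping blocks of cells. It does not reproduce the feature extraction used in [69] or [70].

```python
import math


def orientation_histogram(cell, n_bins=9):
    """Histogram of unsigned gradient orientations (0 to 180 degrees).

    cell is a 2D list of grayscale intensities. Each interior pixel
    votes for the bin of its gradient direction, weighted by the
    gradient magnitude. The result is L2-normalized.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # central differences
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat pixel, no orientation to vote for
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Clamp guards against angle == 180.0 from float rounding.
            index = min(int(angle // bin_width), n_bins - 1)
            hist[index] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


# Toy 4x4 cell containing a vertical edge (intensity jumps left to right).
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
print(orientation_histogram(vertical_edge))
```

For the toy cell, all gradient energy falls into the first bin because the intensity changes only along the horizontal axis. Concatenating such histograms over a grid of cells covering the face yields a feature vector that can then be reduced with PCA and passed to a classifier such as an SVM.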
A set of four classifiers (multi-layer perceptron (MLP), SVM, k-NN, multinomial logistic regression (MNLR)) was also comparatively tested in [73]. The authors explored regional information, extracting handcrafted features only in certain face areas of interest. Experimental results reported up to 95.61% correct facial palsy detection and 95.58% correct facial palsy assessment in three categories (healthy, slight palsy, and strong palsy).
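Among the classifiers compared above, k-NN is the simplest to write down, so it serves here as a minimal sketch of how regional handcrafted features could be mapped to the three categories. The two-dimensional feature vectors (an eye-region and a mouth-region asymmetry value) and all numbers are invented for illustration; they are not data or features from [73].

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbours."""
    if len(train_features) != len(train_labels):
        raise ValueError("features and labels must have the same length")
    # Sort training samples by Euclidean distance to the query.
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: (eye asymmetry, mouth asymmetry) per subject.
features = [
    (0.02, 0.03), (0.03, 0.02), (0.01, 0.04),   # healthy
    (0.10, 0.12), (0.12, 0.09), (0.09, 0.11),   # slight palsy
    (0.25, 0.30), (0.28, 0.27), (0.30, 0.26),   # strong palsy
]
labels = ["healthy"] * 3 + ["slight palsy"] * 3 + ["strong palsy"] * 3

print(knn_predict(features, labels, (0.11, 0.10)))
```

The same feature vectors could be passed to an MLP, an SVM, or a multinomial logistic regression model, which is what makes a comparative test of several classifiers on identical features straightforward.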
All previous methods are based on hand-crafted features. Deep learning methods can automatically learn discriminative features from the data, without the need to compute them in advance. Deep learning models have accomplished state-of-the-art performances in the field of medical imaging [74]. Accordingly, most of the recent works in vision-based facial palsy detection and evaluation employ deep features. Storey and Jiang [75] presented a unified multitask convolutional neural network (CNN) for the simultaneous object proposal, detection and asymmetry analysis of faces. Sajid et al. [76] introduced a CNN to classify palsy into five scales, resulting in a 92.6% average classification accuracy. Xia et al. [5] suggested a deep neural network (DNN) to detect facial landmarks in palsy. Hsu et al. [33] proposed a deep hierarchical network (DHN) to quantify facial palsy, including a YOLO2 detector for face detection, a fused neural architecture (line segment network, LSN) to detect facial landmarks, and an object detector, similar to Darknet, to locate palsy regions. Preliminary results of the same method were published in [77]. Guo et al. [78] investigated unilateral peripheral facial paralysis classification using GoogLeNet, reaching a classification accuracy of up to 91.25% for predicting the HB degree.
Storey et al. [79] implemented a facial grading system from video sequences based on a 3D CNN model using ResNet as the backbone, reporting a palsy classification accuracy of up to 82%. Barrios Dell’Olio and Sra [80] proposed a CNN for detecting muscle activation and intensity in the users of their mobile augmented reality mirror therapy system. In [81], Tan et al. introduced a facial palsy assessment method, including a facial landmark detector, a feature extractor based on an EfficientNet backbone, and semi-supervised extreme learning to classify features, reporting an 85.5% accuracy. Abayomi-Alli et al. [82] trained a SqueezeNet network with augmented images and used the activations from the final convolutional layer as features to train a multiclass error-correcting output code SVM (ECOC-SVM) classifier, reporting a mean classification accuracy of up to 99.34%. In [83], computed tomography (CT) images were used to train two geometric deep learning models, namely PointNet++ and PointCNN, for the facial part segmentation of healthy and palsy patients for facial monitoring and rehabilitation. Umirzakova et al. [84] suggested a light deep learning model for analyzing facial symmetry, using a foreground attention block for enhanced local feature extraction and a depth-map estimator to provide more accurate segmentation results. Table 4 summarizes basic information from all the aforementioned studies, including the followed methodology, dataset, and performance results. Details regarding the mathematical modeling of machine learning and deep learning classification models can be found in [85,86,87,88,89].
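The output-code idea behind an ECOC classifier can be sketched independently of the underlying binary learners. In the minimal example below, each class is assigned a binary codeword, one binary classifier is assumed to predict each bit, and the predicted class is the one whose codeword is closest in Hamming distance to the predicted bits. The four class names and the exhaustive seven-bit coding matrix are assumptions chosen for illustration; the binary predictions are supplied directly rather than produced by SVMs on SqueezeNet features, so this is not the configuration used in [82].

```python
# Hypothetical exhaustive code for four classes: 7 bits per codeword,
# with a Hamming distance of 4 between every pair of codewords.
CODE_MATRIX = {
    "normal":   (1, 1, 1, 1, 1, 1, 1),
    "mild":     (0, 0, 0, 0, 1, 1, 1),
    "moderate": (0, 0, 1, 1, 0, 0, 1),
    "severe":   (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(binary_outputs, code_matrix=CODE_MATRIX):
    """Return the class whose codeword is nearest to the predicted bits."""
    for codeword in code_matrix.values():
        if len(codeword) != len(binary_outputs):
            raise ValueError("one prediction per code bit is required")
    return min(
        code_matrix,
        key=lambda label: hamming(code_matrix[label], binary_outputs),
    )


# The codeword for "moderate" with its third bit flipped, simulating
# one binary learner that made a mistake.
noisy_prediction = (0, 0, 0, 1, 0, 0, 1)
print(ecoc_decode(noisy_prediction))
```

Because every pair of codewords in this matrix differs in four positions, a single wrong binary prediction still decodes to the intended class, which is the error-correcting property that gives the scheme its name.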
Table 4. Methodologies for facial palsy (FP) detection.
| Ref. | Objective | Methodology | Dataset | Performance | Conclusions/Limitations |
| --- | --- | --- | --- | --- | --- |
| [63] | Smartphone-based FP diagnostic system (five FP grades) | Linear regression model for facial landmark detection and SVM with linear kernel for classification | Private dataset of 36 subjects (23 normal, 13 palsy patients) performing 3 motions | 88.9% classification accuracy | Reproducibility under different experimental conditions, as well as repeatability of measurements over a period of time, were not implemented |
| [64] | Facial movement patterns recognition for FP (2 classes, i.e., normal and asymmetric) | Active Shape Models plus Local Binary Patterns (ASMLBP) for feature extraction and SVM for classification | Private dataset of 570 images of 57 subjects with 5 facial movements | Up to 93.33% recognition rate | High robustness and accuracy |
| [65] | Quantitative evaluation of FP (HB scale) | Multiresolution extension of uniform LBP and SVM for FP evaluation | Private dataset of 197 subject videos with 5 facial movements | ~94% classification accuracy | Sensitive to out-plane facial movements, with significant natural bilateral asymmetry |
| [51] | Facial landmarks tracking and feedback for FP assessment (HB scale) | Active Appearance Models (AAMs) for facial expression synthesis | Private dataset of frontal images of neutral and smile expressions from 5 healthy subjects | 87% accuracy | Preliminary results to demonstrate a proof of concept |
| [66] | FP assessment | ANN | Private dataset of 43 videos from 14 subjects | 1.6% average MSE | Pilot study; general results follow the opinions of experts |
| [67] | Facial asymmetry measurement | Measuring 3D asymmetry index | Three-dimensional dynamic scans from Hi4D-ADSIP database (stroke) | - | Extraction of 3D feature points, as well as potential for detecting facial dysfunctions |
| [68] | FP classification of real-time facial animation units (seven FP grades) | Ensemble learning SVM classifier | Private dataset of 375 records from 13 patients and 1650 records from 50 control subjects | 96.8% accuracy; 88.9% sensitivity; 99% specificity | Data augmentation for the imbalanced dataset issues |
| [69] | FP quantification | Combination of landmarks and intensity HoG-based features and a CNN model for classification | Private dataset of 125 images of left facial weakness, 126 images of right facial weakness, and 186 images of normal subjects | Up to 94.5% accuracy | The combination of landmarks and HoG intensity features produced the best results when compared to either landmarks or intensity features separately |
| [70] | FP classification (three classes) | HOG features and a voting classifier | Private dataset of 37 videos of left weakness, 38 of right and 60 of normal subjects | 92.9% accuracy; 93.6% precision; 92.8% recall; 94.2% specificity | Comparison with other methods revealed the reliability of HOG features |
| [71] | Facial metric calculation of face sides symmetry | Facial landmark features with cascade regression and SVM | Stroke faces dataset of 1024 images and 1081 images of healthy faces | 76.87% accuracy | Machine learning problem-specific models can lead to improved performances |
| [72] | FP assessment (HB scale) | Laser speckle contrast imaging and NN classifiers | Private dataset of 80 FP patients | 97.14% accuracy | Outperforms the state-of-the-art systems and other classifiers |
| [73] | FP classification (three classes) | Regional handcrafted features and four classifiers (MLP, SVM, k-NN, MNLR) | YouTube Facial Palsy (YFP) database | Up to 95.58% correct classification | Severity is higher classified in eyes and mouth regions |
| [75] | Face symmetry analysis (symmetrical-asymmetrical) | Unified multi-task CNN | AFLW database to fine tune the model and extended Cohn–Kanade (CK+) to learn face symmetry (18,786 images in total) | - | Lack of fully annotated training set, as well as the need for labeling or a synthesized training set |
| [76] | FP classification (five grades) | CNN (VGG-16) | Dataset from online sources augmented to 2000 images | 92.6% accuracy; 92.91% precision; 93.14% sensitivity; 93% F1 Score | Deep features combined with data augmentation can lead to robust classification |
| [5] | FP classification | FCN | AFLFP dataset | Normalized mean error (NME): 11.5% Mean average: 2.3% standard deviation | Comparative results indicate that deep learning methods are, overall, better than machine learning methods |
| [33] | Quantitative analysis of FP | Deep Hierarchical Network | YouTube Facial Palsy (YFP) database | 5.83% NME | Line segment learning leads to an important part of deep features being able to improve the accuracy of facial landmark and palsy region detection |
| [77] | Quantitative analysis of FP | Hierarchical Detection Network | YouTube Facial Palsy (YFP) database | Up to 93% precision and 88% recall | Efficient for video-to-description diagnosis |
| [78] | Unilateral peripheral FP assessment (HB scale) | Deep CNN | Private dataset of 720 labeled images of four facial expressions | 91.25% classification accuracy | Fine-tuning deep CNNs can learn specific representations from biomedical images |
| [79] | FP grading | Fully 3D CNN | Private FP dataset of 696 sequences with 17 subjects | 82% classification accuracy | Very competent at learning spatio-temporal features |
| [80] | AR system for FP estimation | Light-Weight Facial Activation Unit model (LW-FAU) | Private dataset from 20 subjects | - | Lack of FP benchmark models and datasets |
| [81] | FP assessment (six classes) | FNPARCELM-CCNN method | YouTube Facial Palsy (YFP) database | 85.5% accuracy | Semi-supervised methods can distinguish different degrees of FP, even with little-labeled data |
| [82] | FP detection and classification | Deep feature extraction with SqueezeNet and ECOC-SVM classifier | YouTube Facial Palsy (YFP) database | 99.34% accuracy | Improvement in FP detection from a small dataset |
| [83] | Part segmentation | Point-Net++ and PointCNN | CT images of 33 subjects | 99.19% accuracy; 89.09% IOU | Geometric deep learning can be efficient |
| [84] | FP asymmetry analysis | Proposed deep architecture | YouTube Facial Palsy (YFP) database | 93.8% IOU | Poor with bearded faces due to a lack of such training data images |
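
The performance column of Table 4 mixes several metrics (accuracy, precision, recall or sensitivity, specificity, F1 score, and IOU). As a reading aid, the following sketch shows how the classification metrics follow from the four counts of a binary confusion matrix, and how intersection over union (IOU) is computed for two segmentation masks. The function names and the example counts are hypothetical and do not correspond to any study in the table.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from true/false positive and negative counts."""
    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall is also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of pixel coordinates."""
    a, b = set(mask_a), set(mask_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical test set: 50 palsy and 50 healthy subjects.
print(binary_metrics(tp=40, fp=5, fn=10, tn=45))
print(iou({(0, 0), (0, 1), (1, 1)}, {(0, 1), (1, 1), (2, 2)}))
```

On an imbalanced dataset, accuracy alone can look high even when sensitivity is modest, which is why several entries in the table report sensitivity and specificity alongside accuracy.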

Several useful conclusions can be drawn from Table 4. Datasets designated for palsy detection and evaluation are clearly lacking, and most research teams develop their own private sets to test their algorithms. The most used public dataset among the referenced works is the YFP dataset; however, it is a limited video dataset. The videos are converted into image sequences, but mild dysfunctions are not easily visible in a single image, so a sequence of frames needs to be examined to draw conclusions. Moreover, the dataset is labeled, but facial landmark points are not annotated. Table 4 also indicates that deep learning methods tend to achieve better performance than traditional machine learning methods relying on hand-crafted features.

This entry is adapted from the peer-reviewed paper 10.3390/axioms12121091
