Automatic inspection approaches can alleviate these disadvantages and provide more consistent and objective methods for facial palsy diagnosis and evaluation, giving neurologists an efficient decision-support tool [6]. The automatic quantitative evaluation of facial palsy has been a subject of research for many years. Several approaches use optical markers attached to the face to determine the degree of palsy [7][8], while others rely on full-face laser scanning [9][10] or on electroneurography (ENoG) and electromyography (EMG) signals. The latter approaches, although very accurate, require specialized high-cost equipment and a constrained clinical environment and presuppose physical interventions, which are obtrusive and uncomfortable. Moreover, patients cannot perform these procedures on their own to monitor their progress at home.
Recent advancements in image analysis algorithms, combined with the increasingly affordable cost of high-resolution capturing devices, have resulted in the development of efficient, simple, and cost-effective vision-based techniques for medical applications, reporting impressive state-of-the-art performances [11][12][13]. The diagnosis of various diseases is greatly assisted by the recognition of facial abnormalities using computer vision [14][15], dynamically incorporating facial recognition into artificial intelligence (AI)-based medicine [16][17]. Automatic image-based facial palsy evaluation could accelerate the diagnosis and progress evaluation of the disease, offering a non-invasive, simple, and time- and cost-saving method that could be used by the palsy patients themselves without the presence of a human expert.
2. Machine Learning-Based Facial Palsy Detection and Evaluation
Traditional machine learning methods are based on encoding facial palsy with facial asymmetry-related mathematical features. A portable automatic diagnosis system based on a smartphone application for classifying subjects as healthy or palsy patients was presented by Kim et al. [18]. Facial landmarks were extracted, and an asymmetry index was computed. Classification was implemented using Linear Discriminant Analysis (LDA) combined with Support Vector Machines (SVMs), resulting in 88.9% classification accuracy. Wang et al. [19] used Active Shape Models (ASMs) to locate facial landmarks, divided the face into eight regions, and used Local Binary Patterns (LBPs) to extract descriptors for recognizing patterns of facial movements in these regions, reaching a recognition rate of up to 93.33%. In [20], He et al. extracted LBP-based features in the spatial–temporal domain in both facial regions and validated their method using biomedical videos, reporting an overall accuracy of up to 94% for the HB grading. In [21], the authors automatically measured the ability of palsy patients to smile using Active Appearance Models (AAMs) for feature extraction and facial expression synthesis, providing an average accuracy of 87%. McGrenary et al. [22] quantified facial asymmetry in videos using an artificial neural network (ANN).
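To make the idea of a landmark-based asymmetry feature concrete, the following is a minimal sketch rather than the formulation of any cited work. It assumes that landmarks come as left/right pairs of (x, y) image coordinates, that the facial midline is a vertical line at a known x position, and that a normalizing length such as the inter-ocular distance is available. The function name `asymmetry_index` and its parameters are hypothetical.

```python
import math

def asymmetry_index(pairs, midline_x, scale):
    """Mean distance between each left landmark and the mirror image of its
    right counterpart, divided by a normalizing length (assumed here to be
    something like the inter-ocular distance)."""
    if not pairs or scale <= 0:
        raise ValueError("need at least one landmark pair and a positive scale")
    total = 0.0
    for (lx, ly), (rx, ry) in pairs:
        # Reflect the right-side point across the vertical midline.
        mirrored_rx = 2.0 * midline_x - rx
        total += math.hypot(lx - mirrored_rx, ly - ry)
    return total / len(pairs) / scale

# Mouth corners: perfectly symmetric, then with one corner 6 px lower.
symmetric = [((40.0, 80.0), (60.0, 80.0))]
drooping = [((40.0, 80.0), (60.0, 86.0))]
print(asymmetry_index(symmetric, midline_x=50.0, scale=30.0))
print(asymmetry_index(drooping, midline_x=50.0, scale=30.0))
```

A perfectly symmetric face yields zero, and the value grows as one side deviates from the mirror image of the other. Such scalar indices, computed per region or per facial movement, are the kind of hand-crafted input that a classifier such as an SVM can then consume.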
Early research into facial asymmetry analysis was also conducted by Quan et al. [23], who presented a method for automatically detecting and quantifying facial dysfunctions based on 3D face scans. The authors extracted a number of feature points that enabled the segmentation of faces into local regions, allowing asymmetry to be evaluated for specific regions of interest rather than the entire face. Gaber et al. [24] proposed an evaluation system for seven palsy categories based on an ensemble learning SVM classifier, reporting an accuracy of 96.8%. The authors showed that their proposed classifier was robust and stable, even for different training and testing samples. Zhuang et al. [25] compared various feature extraction techniques and concluded that 2D static images with Histogram of Oriented Gradients (HOG) features tend to be more accurate. The authors proposed a framework in which landmark and HOG features were extracted, Principal Component Analysis (PCA) was applied separately to each feature set, and the results were used as inputs to an SVM classifier for classification into three classes, demonstrating performance of up to 92.2% for the entire face. The same research group [26] demonstrated a video classification tool, namely the Facial Deficit Identification Tool for Videos (F-DIT-V), exploiting HOG features to achieve a 92.9% classification accuracy. Arora et al. [27] tested an SVM and a logistic regressor on generated facial landmark features, achieving 76.87% average accuracy with the SVM. In [28], laser speckle contrast imaging was employed by Jiang et al. to monitor the facial blood flow of palsy patients. Faces were then segmented into regions based on blood distribution features, and three HB score classifiers were tested for their classification performance: a neural network (NN), an SVM, and a k-NN, achieving an accuracy of up to 97.14%. A set of four classifiers (multi-layer perceptron (MLP), SVM, k-NN, and multinomial logistic regression (MNLR)) was also comparatively tested in [29]. The authors explored regional information, extracting handcrafted features only in certain face areas of interest. Experimental results reported up to 95.61% correct facial palsy detection and 95.58% correct facial palsy assessment in three categories (healthy, slight palsy, and strong palsy).
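As an illustration of what a HOG descriptor encodes, the sketch below computes a gradient-orientation histogram for a single grayscale cell. It is a deliberately simplified assumption-laden version: gradients use central differences on interior pixels only, orientations are unsigned (0 to 180 degrees), each pixel votes into one bin without interpolation, and no block normalization is applied. It is not the pipeline of any cited study, and the function name is hypothetical.

```python
import math

def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0-180 degrees) over the interior pixels of one grayscale cell."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    rows, cols = len(cell), len(cell[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]   # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]   # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra modulo guards against an angle that rounds to exactly 180.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# A vertical intensity edge produces purely horizontal gradients,
# so all of the weight falls into the first orientation bin.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
print(cell_orientation_histogram(vertical_edge))
```

In a full descriptor, histograms from many cells are normalized over overlapping blocks and concatenated; the resulting vector can then be reduced (for example with PCA) before being passed to a classifier.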
All previous methods are based on hand-crafted features. Deep learning methods can automatically learn discriminative features from the data, without the need to compute them in advance. Deep learning models have accomplished state-of-the-art performances in the field of medical imaging [30]. Consequently, most of the recent works in vision-based facial palsy detection and evaluation employ deep features. Storey and Jiang [31] presented a unified multitask convolutional neural network (CNN) for the simultaneous object proposal, detection, and asymmetry analysis of faces. Sajid et al. [32] introduced a CNN to classify palsy into five scales, resulting in a 92.6% average classification accuracy. Xia et al. [5] suggested a deep neural network (DNN) to detect facial landmarks in palsy. Hsu et al. [33] proposed a deep hierarchical network (DHN) to quantify facial palsy, including a YOLO2 detector for face detection, a fused neural architecture (line segment network, LSN) to detect facial landmarks, and an object detector, similar to Darknet, to locate palsy regions. Preliminary results of the same method were published in [34]. Guo et al. [35] investigated unilateral peripheral facial paralysis classification using GoogLeNet, reaching a classification accuracy of up to 91.25% for predicting the HB degree.
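Landmark detectors such as those above are commonly scored with a normalized mean error (NME). The sketch below shows one common form of that metric, assuming 2D landmarks and a caller-supplied normalizing distance (for example the inter-ocular distance); the exact normalization used by each cited study is not stated here, so this choice is an assumption, as is the function name.

```python
import math

def normalized_mean_error(predicted, ground_truth, norm_distance):
    """Mean Euclidean distance between predicted and annotated landmarks,
    divided by a normalizing distance chosen by the caller."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("landmark lists must be non-empty and equally long")
    if norm_distance <= 0:
        raise ValueError("norm_distance must be positive")
    errors = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    return sum(errors) / len(errors) / norm_distance

predicted = [(0.0, 0.0), (10.0, 0.0)]
annotated = [(3.0, 4.0), (10.0, 0.0)]
# Point errors are 5 px and 0 px; with a 50 px normalizer the NME is 0.05 (5%).
print(normalized_mean_error(predicted, annotated, norm_distance=50.0))
```

Because the error is divided by a face-dependent length, values are comparable across faces of different sizes in the image, which is why NME is usually reported as a percentage.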
Storey et al. [36] implemented a facial grading system for video sequences based on a 3D CNN model using ResNet as the backbone, reporting a palsy classification accuracy of up to 82%. Barrios Dell’Olio and Sra [37] proposed a CNN for detecting muscle activation and intensity in the users of their mobile augmented reality mirror therapy system. In [38], Tan et al. introduced a facial palsy assessment method comprising a facial landmark detector, a feature extractor based on an EfficientNet backbone, and semi-supervised extreme learning to classify features, reporting an 85.5% accuracy. Abayomi-Alli et al. [39] trained a SqueezeNet network with augmented images and used the activations from the final convolutional layer as features to train a multiclass error-correcting output code SVM (ECOC-SVM) classifier, reporting a mean classification accuracy of up to 99.34%. In [40], computed tomography (CT) images were used to train two geometric deep learning models, namely PointNet++ and PointCNN, for the facial part segmentation of healthy and palsy patients for facial monitoring and rehabilitation. Umirzakova et al. [41] suggested a light deep learning model for analyzing facial symmetry, using a foreground attention block for enhanced local feature extraction and a depth-map estimator to provide more accurate segmentation results.
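Segmentation results such as those above are often reported as intersection over union (IoU). The following minimal sketch computes IoU for two binary masks given as nested lists of 0/1 values. The 2D mask representation and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; point-cloud methods would apply the same ratio per part over labeled points.

```python
def mask_iou(predicted, target):
    """Intersection over union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(predicted, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0

predicted = [[1, 1],
             [0, 0]]
target = [[1, 0],
          [1, 0]]
# One pixel is shared and three pixels are covered by either mask.
print(mask_iou(predicted, target))
```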
Table 1 summarizes basic information from all the aforementioned studies, including the methodology followed, the dataset, and the performance results. Details regarding the mathematical modeling of machine learning and deep learning classification models can be found in [42][43][44][45][46].
Table 1. Methodologies for facial palsy (FP) detection.

| Ref. | Objective | Methodology | Dataset | Performance | Conclusions/Limitations |
|---|---|---|---|---|---|
| [18] | Smartphone-based FP diagnostic system (five FP grades) | Linear regression model for facial landmark detection and SVM with linear kernel for classification | Private dataset of 36 subjects (23 normal, 13 palsy patients) performing 3 motions | 88.9% classification accuracy | Reproducibility under different experimental conditions and repeatability of measurements over time were not evaluated |
| [19] | Facial movement pattern recognition for FP (2 classes, i.e., normal and asymmetric) | Active Shape Models plus Local Binary Patterns (ASMLBP) for feature extraction and SVM for classification | Private dataset of 570 images of 57 subjects with 5 facial movements | Up to 93.33% recognition rate | High robustness and accuracy |
| [20] | Quantitative evaluation of FP (HB scale) | Multiresolution extension of uniform LBP and SVM for FP evaluation | Private dataset of 197 subject videos with 5 facial movements | ~94% classification accuracy | Sensitive to out-of-plane facial movements and to significant natural bilateral asymmetry |
| [21] | Facial landmark tracking and feedback for FP assessment (HB scale) | Active Appearance Models (AAMs) for facial expression synthesis | Private dataset of frontal images of neutral and smile expressions from 5 healthy subjects | 87% accuracy | Preliminary results to demonstrate a proof of concept |
| [22] | FP assessment | ANN | Private dataset of 43 videos from 14 subjects | 1.6% average MSE | Pilot study; general results follow the opinions of experts |
| [23] | Facial asymmetry measurement | Measuring 3D asymmetry index | Three-dimensional dynamic scans from Hi4D-ADSIP database (stroke) | - | Extraction of 3D feature points, as well as potential for detecting facial dysfunctions |
| [24] | FP classification of real-time facial animation units (seven FP grades) | Ensemble learning SVM classifier | Private dataset of 375 records from 13 patients and 1650 records from 50 control subjects | 96.8% accuracy, 88.9% sensitivity, 99% specificity | Data augmentation for the imbalanced dataset issues |
| [25] | FP quantification | Combination of landmarks and intensity HOG-based features and a CNN model for classification | Private dataset of 125 images of left facial weakness, 126 images of right facial weakness, and 186 images of normal subjects | Up to 94.5% accuracy | The combination of landmarks and HOG intensity features produced the best results compared to either landmarks or intensity features separately |
| [26] | FP classification (three classes) | HOG features and a voting classifier | Private dataset of 37 videos of left weakness, 38 of right weakness, and 60 of normal subjects | 92.9% accuracy, 93.6% precision, 92.8% recall, 94.2% specificity | Comparison with other methods revealed the reliability of HOG features |
| [27] | Facial metric calculation of face side symmetry | Facial landmark features with cascade regression and SVM | Stroke faces dataset of 1024 images and 1081 images of healthy faces | 76.87% accuracy | Machine learning problem-specific models can lead to improved performances |
| [28] | FP assessment (HB scale) | Laser speckle contrast imaging and NN classifiers | Private dataset of 80 FP patients | 97.14% accuracy | Outperforms the state-of-the-art systems and other classifiers |
| [29] | FP classification (three classes) | Regional handcrafted features and four classifiers (MLP, SVM, k-NN, MNLR) | YouTube Facial Palsy (YFP) database | Up to 95.58% correct classification | Severity is classified higher in the eye and mouth regions |
| [31] | Face symmetry analysis (symmetrical-asymmetrical) | Unified multi-task CNN | AFLW database to fine-tune the model and extended Cohn–Kanade (CK+) to learn face symmetry (18,786 images in total) | - | Lack of fully annotated training set, as well as the need for labeling or a synthesized training set |
| [32] | FP classification (five grades) | CNN (VGG-16) | Dataset from online sources augmented to 2000 images | 92.6% accuracy, 92.91% precision, 93.14% sensitivity, 93% F1 score | Deep features combined with data augmentation can lead to robust classification |
| [5] | FP classification | FCN | AFLFP dataset | Normalized mean error (NME): 11.5% mean, 2.3% standard deviation | Comparative results indicate that deep learning methods are, overall, better than machine learning methods |
| [33] | Quantitative analysis of FP | Deep Hierarchical Network | YouTube Facial Palsy (YFP) database | 5.83% NME | Line segment learning contributes deep features that improve the accuracy of facial landmark and palsy region detection |
| [34] | Quantitative analysis of FP | Hierarchical Detection Network | YouTube Facial Palsy (YFP) database | Up to 93% precision and 88% recall | Efficient for video-to-description diagnosis |
| [35] | Unilateral peripheral FP assessment (HB scale) | Deep CNN | Private dataset of 720 labeled images of four facial expressions | 91.25% classification accuracy | Fine-tuning deep CNNs can learn specific representations from biomedical images |
| [36] | FP grading | Fully 3D CNN | Private FP dataset of 696 sequences with 17 subjects | 82% classification accuracy | Very competent at learning spatio-temporal features |
| [37] | AR system for FP estimation | Light-Weight Facial Activation Unit model (LW-FAU) | Private dataset from 20 subjects | - | Lack of FP benchmark models and datasets |
| [38] | FP assessment (six classes) | FNPARCELM-CCNN method | YouTube Facial Palsy (YFP) database | 85.5% accuracy | Semi-supervised methods can distinguish different degrees of FP, even with little labeled data |
| [39] | FP detection and classification | Deep feature extraction with SqueezeNet and ECOC-SVM classifier | YouTube Facial Palsy (YFP) database | 99.34% accuracy | Improvement in FP detection from a small dataset |
| [40] | Part segmentation | PointNet++ and PointCNN | CT images of 33 subjects | 99.19% accuracy, 89.09% IoU | Geometric deep learning can be efficient |
| [41] | FP asymmetry analysis | Proposed deep architecture | YouTube Facial Palsy (YFP) database | 93.8% IoU | Poor with bearded faces due to a lack of such training images |
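The performance column of the table mixes accuracy, sensitivity (recall), specificity, precision, and F1 score. As a reading aid, the sketch below derives all five from the four counts of a binary confusion matrix (palsy versus healthy). The function name and the choice to return 0.0 when a denominator is zero are assumptions for illustration; the multi-class figures reported in the studies would typically be averaged per class.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from binary confusion-matrix counts
    (tp/fp/tn/fn = true/false positives/negatives)."""
    def ratio(numerator, denominator):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # also called recall
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }

# Hypothetical counts: 10 palsy and 10 healthy test subjects.
for name, value in binary_metrics(tp=8, fp=1, tn=9, fn=2).items():
    print(f"{name}: {value:.3f}")
```

With imbalanced datasets, which are common in the studies listed, accuracy alone can be misleading, so sensitivity and specificity are worth reading alongside it.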
Useful conclusions can be drawn from the information in Table 1. The lack of available datasets designated for palsy detection and evaluation is obvious: most research teams develop their own private sets to test their algorithms. The most used public dataset among the referenced works is the YFP dataset; however, it is a limited video dataset. The videos are converted into image sequences, but mild dysfunctions are not easily visible in a single image, so a sequence of frames needs to be examined to draw conclusions. Moreover, the dataset is labeled, but facial landmark points are not annotated. Table 1 also indicates that deep learning methods lead to better performance than machine learning methods or methods relying on hand-crafted features.