Machine Learning-Based Facial Palsy Detection and Evaluation

Subjects: Computer Science, Artificial Intelligence

Automated solutions for medical diagnosis based on computer vision form an emerging field of science aiming to enhance diagnosis and early disease detection. The detection and quantification of facial asymmetries enable facial palsy evaluation. Deep learning methods allow the automatic learning of discriminative deep facial features, leading to comparatively higher performance accuracies.

mathematical modeling
facial palsy
facial landmarks
asymmetry index
deep learning

1. Introduction

Facial palsy is a common neuromuscular disorder that causes facial weakness and impairs facial expression [1]. Palsy patients lose control of the affected side of their face, experiencing drooping or stiffness of the muscles and taste disturbances. Statistics regarding facial palsy report 25 incidents annually per 100,000 people, or approximately one patient out of 60 people in their lifetime, while an average of 40,000 palsy patients are reported in the United States every year [2]. Even though palsy does not cause physical pain, patients experience psychological stress, external discomfort, and depression, since palsy affects their appearance, facial movements, feeding functions, and, thus, their daily lives [3]. Therefore, accurate diagnosis and precise evaluation of the degree of palsy are essential for the objective assessment of the facial nerve’s function and for monitoring the progress or resolution of palsy. Such monitoring could help in evaluating therapeutic processes and designing effective treatment plans.

Traditionally, facial palsy is diagnosed clinically by specialized neurologists, who ask patients to perform specific facial expressions in order to evaluate the condition of certain facial muscles. The level of palsy is assessed by evaluating the symmetry between the right and left parts of the face according to various scoring standards and by measuring distances between facial landmarks on both sides with a simple ruler [4]. The manual and empirical evaluation of palsy is, therefore, both labor intensive and subjective. Assessment based on visual inspection makes it hard to precisely quantify the severity of palsy and to track improvements between subsequent rehabilitation interventions. Moreover, assessment relies on the degree of human expertise; thus, the clinical quantification of palsy may differ between neurologists [5].

Automatic inspection approaches can alleviate these disadvantages and offer more consistent and objective facial palsy diagnosis and evaluation methods, providing neurologists with an efficient decision-support tool [6]. The automatic quantitative evaluation of facial palsy has been a subject of research for many years. Several approaches use optical markers attached to the face to determine the degree of palsy [7,8], as well as full-face laser scanning [9,10] or electroneurography (ENoG) and electromyography (EMG) signals. The latter approaches, although very accurate, require specialized high-cost equipment and a constrained clinical environment and presuppose physical interventions, which are obtrusive and uncomfortable. Moreover, patients cannot apply these approaches on their own to monitor their progress at home.

Recent advancements in image analysis algorithms, combined with the increasingly affordable cost of high-resolution capturing devices, have resulted in the development of efficient, simple, and cost-effective vision-based techniques for medical applications, reporting impressive state-of-the-art performance [11,12,13]. The diagnosis of various diseases is greatly assisted by the recognition of facial abnormalities using computer vision [14,15], dynamically incorporating facial recognition into artificial intelligence (AI)-based medicine [16,17]. Automatic image-based facial palsy assessment could accelerate the diagnosis and progress evaluation of the disease, offering a non-invasive, simple, and time- and cost-saving method that could be used by palsy patients themselves without the presence of a human expert.

2. Machine Learning-Based Facial Palsy Detection and Evaluation

Traditional machine learning methods encode facial palsy through mathematical features related to facial asymmetry. Kim et al. [63] presented a portable automatic diagnosis system, based on a smartphone application, for classifying subjects as healthy or palsy patients. Facial landmarks were extracted, and an asymmetry index was computed. Classification was implemented using Linear Discriminant Analysis (LDA) combined with Support Vector Machines (SVMs), resulting in 88.9% classification accuracy. Wang et al. [64] used Active Shape Models (ASMs) to locate facial landmarks and divide the face into eight regions, and Local Binary Patterns (LBPs) to extract descriptors for recognizing patterns of facial movements in these regions, reaching a recognition rate of up to 93.33%. In [65], He et al. extracted LBP-based features in the spatial–temporal domain in both facial regions and validated their method using biomedical videos, reporting an overall accuracy of up to 94% for House-Brackmann (HB) grading. In [51], the authors automatically measured the ability of palsy patients to smile using Active Appearance Models (AAMs) for feature extraction and facial expression synthesis, reporting an average accuracy of 87%. McGrenary et al. [66] quantified facial asymmetry in videos using an artificial neural network (ANN).
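To make the idea of a landmark-based asymmetry index concrete, the following is a minimal sketch and not the formula used in [63] or any other cited work. It assumes that paired left and right landmarks are already available as pixel coordinates, that the facial midline is a vertical line at a known x position, and that a reference length (here the distance between the two eye corners) is used for normalization. The function name `asymmetry_index` and all coordinates are hypothetical.

```python
import math


def asymmetry_index(left_pts, right_pts, midline_x, scale):
    """Mean distance between right landmarks and mirrored left landmarks.

    The result is divided by `scale` (an assumed reference length, such as
    the inter-ocular distance) so that it does not depend on image size.
    """
    if len(left_pts) != len(right_pts) or not left_pts:
        raise ValueError("need equally sized, non-empty landmark lists")
    total = 0.0
    for (lx, ly), (rx, ry) in zip(left_pts, right_pts):
        mirrored_x = 2.0 * midline_x - lx  # reflect across the line x = midline_x
        total += math.hypot(mirrored_x - rx, ly - ry)
    return total / (len(left_pts) * scale)


# Hypothetical pixel coordinates: (eye corner, mouth corner) on each side.
left = [(40.0, 50.0), (45.0, 90.0)]
right_symmetric = [(60.0, 50.0), (55.0, 90.0)]
right_drooping = [(60.0, 50.0), (55.0, 96.0)]  # mouth corner 6 px lower

print(asymmetry_index(left, right_symmetric, midline_x=50.0, scale=20.0))
print(asymmetry_index(left, right_drooping, midline_x=50.0, scale=20.0))
```

A perfectly mirrored face yields an index of zero, and the index grows as landmarks on one side drift away from the mirrored position of their counterparts. A real system would also have to estimate the midline and compensate for head rotation, which this sketch ignores.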

Early research into facial asymmetry analysis was also conducted by Quan et al. [67], who presented a method for automatically detecting and quantifying facial dysfunctions based on 3D face scans. The authors extracted a number of feature points that enabled the segmentation of faces into local regions, allowing asymmetry to be evaluated for specific regions of interest rather than the entire face. Gaber et al. [68] proposed an evaluation system for seven palsy categories based on an ensemble learning SVM classifier, reporting an accuracy of 96.8%. The authors reported that their proposed classifier was robust and stable, even for different training and testing samples. Zhuang et al. [69] compared various feature extraction techniques and concluded that 2D static images with Histogram of Oriented Gradients (HOG) features tend to be more accurate. The authors proposed a framework in which landmark and HOG features were extracted, Principal Component Analysis (PCA) was applied separately to each feature set, and the results were used as inputs to an SVM classifier for classification into three classes, demonstrating performance of up to 92.2% for the entire face. The same research group [70] presented a video classification tool, the Facial Deficit Identification Tool for Videos (F-DIT-V), which exploits HOG features and achieved a 92.9% classification accuracy. Arora et al. [71] tested an SVM and a logistic regression classifier on generated facial landmark features, achieving 76.87% average accuracy with the SVM. In [72], Jiang et al. employed laser speckle contrast imaging to monitor the facial blood flow of palsy patients. Faces were then segmented into regions based on blood distribution features, and three HB score classifiers were tested: a neural network (NN), an SVM, and a k-nearest neighbors (k-NN) classifier, achieving an accuracy of up to 97.14%. A set of four classifiers (multi-layer perceptron (MLP), SVM, k-NN, multinomial logistic regression (MNLR)) was also comparatively tested in [73]. The authors explored regional information, extracting handcrafted features only in certain face areas of interest. Experimental results reported up to 95.61% correct facial palsy detection and 95.58% correct facial palsy assessment in three categories (healthy, slight palsy, and strong palsy).
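Several of the works above rely on HOG features. As a rough illustration of what such a descriptor measures, the sketch below computes a magnitude-weighted histogram of unsigned gradient orientations for a single image cell. It is a deliberately simplified version, assumed for illustration only: it omits the bin interpolation and block normalization of full HOG implementations, and the function name, bin count, and toy image are hypothetical rather than taken from [69] or [70].

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a 2D list of grey levels. Border pixels are skipped because
    central differences need both neighbours. Orientations are folded into
    [0, 180) degrees and the histogram is L2-normalized.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]  # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The modulo guards against a rounding result of exactly 180.0.
            hist[int(angle // bin_width) % n_bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# Toy 4x4 cells: intensity changes along x in one and along y in the other.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [10] * 4, [10] * 4]

print(cell_orientation_histogram(vertical_edge))    # all weight in the first bin
print(cell_orientation_histogram(horizontal_edge))  # all weight in the middle bin
```

In a pipeline of the kind described for [69], histograms like these would be concatenated over many cells, reduced with PCA, and passed to a classifier. Those later stages are not shown here.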

All the previous methods are based on hand-crafted features. Deep learning methods can automatically learn discriminative features from the data, without the need to compute them in advance, and deep learning models have achieved state-of-the-art performance in the field of medical imaging [74]. Consequently, most recent works in vision-based facial palsy detection and evaluation employ deep features. Storey and Jiang [75] presented a unified multitask convolutional neural network (CNN) for simultaneous object proposal, detection, and asymmetry analysis of faces. Sajid et al. [76] introduced a CNN to classify palsy into five grades, resulting in a 92.6% average classification accuracy. Xia et al. [5] suggested a deep neural network (DNN) to detect facial landmarks in faces with palsy. Hsu et al. [33] proposed a deep hierarchical network (DHN) to quantify facial palsy, comprising a YOLO2 detector for face detection, a fused neural architecture (line segment network, LSN) to detect facial landmarks, and an object detector, similar to Darknet, to locate palsy regions. Preliminary results of the same method were published in [77]. Guo et al. [78] investigated unilateral peripheral facial paralysis classification using GoogLeNet, reaching a classification accuracy of up to 91.25% for predicting the HB degree.
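For readers unfamiliar with CNNs, the sketch below shows the forward pass of a single convolutional filter followed by a ReLU activation and global average pooling, written in plain Python. It is an illustration under stated assumptions, not a model from any cited work: the kernel is fixed by hand, whereas in a trained CNN its weights would be learned from data, and the function names and toy image are hypothetical. As in most deep learning frameworks, the "convolution" here is implemented as a cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Set negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_average_pool(feature_map):
    """Collapse a feature map into one number per filter."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


# Toy 4x4 image whose intensity rises from left to right.
image = [[0, 0, 10, 10] for _ in range(4)]

# Hand-set kernels; a trained network would learn these weights instead.
dark_to_bright = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
bright_to_dark = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

print(global_average_pool(relu(conv2d_valid(image, dark_to_bright))))
print(global_average_pool(relu(conv2d_valid(image, bright_to_dark))))
```

The first filter responds strongly to the edge and the second does not, which is the sense in which each filter acts as a feature detector. Deep models stack many such learned filters and feed the pooled responses to a classifier.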

Storey et al. [79] implemented a facial grading system from video sequences based on a 3D CNN model using ResNet as the backbone, reporting a palsy classification accuracy of up to 82%. Barrios Dell’Olio and Sra [80] proposed a CNN for detecting muscle activation and intensity in the users of their mobile augmented reality mirror therapy system. In [81], Tan et al. introduced a facial palsy assessment method comprising a facial landmark detector, a feature extractor based on an EfficientNet backbone, and semi-supervised extreme learning to classify features, reporting an 85.5% accuracy. Abayomi-Alli et al. [82] trained a SqueezeNet network with augmented images and used the activations from the final convolutional layer as features to train a multiclass error-correcting output codes SVM (ECOC-SVM) classifier, reporting a mean classification accuracy of up to 99.34%. In [83], computed tomography (CT) images were used to train two geometric deep learning models, namely PointNet++ and PointCNN, for the facial part segmentation of healthy subjects and palsy patients for facial monitoring and rehabilitation. Umirzakova et al. [84] suggested a lightweight deep learning model for analyzing facial symmetry, using a foreground attention block for enhanced local feature extraction and a depth-map estimator to provide more accurate segmentation results. Table 4 summarizes basic information from all the aforementioned studies, including the methodology followed, the dataset, and the performance results. Details regarding the mathematical modeling of machine learning and deep learning classification models can be found in [85,86,87,88,89].
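The ECOC scheme mentioned for [82] turns a multiclass problem into several binary ones. The sketch below illustrates only the decoding step, under assumptions that are not taken from [82]: the four class labels are hypothetical, the 7-bit code matrix is the standard exhaustive code for four classes, and the outputs of the seven binary learners (SVMs in an ECOC-SVM) are given directly as bits instead of being produced by trained models.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(code_matrix, predicted_bits):
    """Return the label whose codeword is closest to the predicted bits."""
    return min(
        code_matrix,
        key=lambda label: hamming_distance(code_matrix[label], predicted_bits),
    )


# Hypothetical labels. Each column defines one binary problem: the learner
# for column j separates classes with a 1 from classes with a 0 in column j.
# Every pair of codewords differs in 4 positions, so one wrong bit is tolerated.
CODE_MATRIX = {
    "normal":   (1, 1, 1, 1, 1, 1, 1),
    "mild":     (0, 0, 0, 0, 1, 1, 1),
    "moderate": (0, 0, 1, 1, 0, 0, 1),
    "severe":   (0, 1, 0, 1, 0, 1, 0),
}

clean = (0, 0, 1, 1, 0, 0, 1)  # all seven binary learners are correct
noisy = (0, 0, 1, 1, 0, 1, 1)  # the sixth learner is wrong

print(ecoc_decode(CODE_MATRIX, clean))
print(ecoc_decode(CODE_MATRIX, noisy))
```

Both inputs decode to the same class, which shows the error-correcting property: a single misfiring binary classifier does not change the final decision.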

Table 4. Methodologies for facial palsy (FP) detection.

Ref.	Objective	Methodology	Dataset	Performance	Conclusions/Limitations
[63]	Smartphone-based FP diagnostic system (five FP grades)	Linear regression model for facial landmark detection and SVM with linear kernel for classification	Private dataset of 36 subjects (23 normal, 13 palsy patients) performing 3 motions	88.9% classification accuracy	Reproducibility under different experimental conditions and repeatability of measurements over time were not assessed
[64]	Facial movement patterns recognition for FP (2 classes, i.e., normal and asymmetric)	Active Shape Models plus Local Binary Patterns (ASMLBP) for feature extraction and SVM for classification	Private dataset of 570 images of 57 subjects with 5 facial movements	Up to 93.33% recognition rate	High robustness and accuracy
[65]	Quantitative evaluation of FP (HB scale)	Multiresolution extension of uniform LBP and SVM for FP evaluation	Private dataset of 197 subject videos with 5 facial movements	~94% classification accuracy	Sensitive to out-plane facial movements, with significant natural bilateral asymmetry
[51]	Facial landmarks tracking and feedback for FP assessment (HB scale)	Active Appearance Models (AAMs) for facial expression synthesis	Private dataset of frontal images of neutral and smile expressions from 5 healthy subjects	87% accuracy	Preliminary results to demonstrate a proof of concept
[66]	FP assessment	ANN	Private dataset of 43 videos from 14 subjects	1.6% average MSE	Pilot study; general results follow the opinions of experts
[67]	Facial asymmetry measurement	Measuring 3D asymmetry index	Three-dimensional dynamic scans from Hi4D-ADSIP database (stroke)	-	Extraction of 3D feature points, as well as potential for detecting facial dysfunctions
[68]	FP classification of real-time facial animation units (seven FP grades)	Ensemble learning SVM classifier	Private dataset of 375 records from 13 patients and 1650 records from 50 control subjects	96.8% accuracy 88.9% sensitivity 99% specificity	Data augmentation for the imbalanced dataset issues
[69]	FP quantification	Combination of landmarks and intensity HOG-based features and a CNN model for classification	Private dataset of 125 images of left facial weakness, 126 images of right facial weakness, and 186 images of normal subjects	Up to 94.5% accuracy	The combination of landmarks and HOG intensity features produced the best results compared with either landmarks or intensity features alone
[70]	FP classification (three classes)	HOG features and a voting classifier	Private dataset of 37 videos of left weakness, 38 of right and 60 of normal subjects	92.9% accuracy 93.6% precision 92.8% recall 94.2% specificity	Comparison with other methods revealed the reliability of HOG features
[71]	Facial metric calculation of face sides symmetry	Facial landmark features with cascade regression and SVM	Stroke faces dataset of 1024 images and 1081 images of healthy faces	76.87% accuracy	Machine learning problem-specific models can lead to improved performances
[72]	FP assessment (HB scale)	Laser speckle contrast imaging and NN classifiers	Private dataset of 80 FP patients	97.14% accuracy	Outperforms the state-of-the-art systems and other classifiers
[73]	FP classification (three classes)	Regional handcrafted features and four classifiers (MLP, SVM, k-NN, MNLR)	YouTube Facial Palsy (YFP) database	Up to 95.58% correct classification	Severity is classified more accurately in the eye and mouth regions
[75]	Face symmetry analysis (symmetrical-asymmetrical)	Unified multi-task CNN	AFLW database to fine tune the model and extended Cohn–Kanade (CK+) to learn face symmetry (18,786 images in total)	-	Lack of fully annotated training set, as well as the need for labeling or a synthesized training set
[76]	FP classification (five grades)	CNN (VGG-16)	Dataset from online sources augmented to 2000 images	92.6% accuracy 92.91% precision 93.14% sensitivity 93% F1 Score	Deep features combined with data augmentation can lead to robust classification
[5]	FP classification	FCN	AFLFP dataset	Normalized mean error (NME): 11.5% mean, 2.3% standard deviation	Comparative results indicate that deep learning methods are, overall, better than machine learning methods
[33]	Quantitative analysis of FP	Deep Hierarchical Network	YouTube Facial Palsy (YFP) database	5.83% NME	Line segment learning contributes deep features that improve the accuracy of facial landmark and palsy region detection
[77]	Quantitative analysis of FP	Hierarchical Detection Network	YouTube Facial Palsy (YFP) database	Up to 93% precision and 88% recall	Efficient for video-to-description diagnosis
[78]	Unilateral peripheral FP assessment (HB scale)	Deep CNN	Private dataset of 720 labeled images of four facial expressions	91.25% classification accuracy	Fine-tuning deep CNNs can learn specific representations from biomedical images
[79]	FP grading	Fully 3D CNN	Private FP dataset of 696 sequences with 17 subjects	82% classification accuracy	Very competent at learning spatio-temporal features
[80]	AR system for FP estimation	Light-Weight Facial Activation Unit model (LW-FAU)	Private dataset from 20 subjects	-	Lack of FP benchmark models and datasets
[81]	FP assessment (six classes)	FNPARCELM-CCNN method	YouTube Facial Palsy (YFP) database	85.5% accuracy	Semi-supervised methods can distinguish different degrees of FP, even with little-labeled data
[82]	FP detection and classification	Deep feature extraction with SqueezeNet and ECOC-SVM classifier	YouTube Facial Palsy (YFP) database	99.34% accuracy	Improvement in FP detection from a small dataset
[83]	Part segmentation	Point-Net++ and PointCNN	CT images of 33 subjects	99.19% accuracy 89.09% IOU	Geometric deep learning can be efficient
[84]	FP asymmetry analysis	Proposed deep architecture	YouTube Facial Palsy (YFP) database	93.8% IOU	Poor with bearded faces due to a lack of such training data images

Useful conclusions can be drawn from the information in Table 4. The lack of available datasets designated for palsy detection and evaluation is evident: most research teams develop their own private sets to test their algorithms. The most used public dataset among the referenced works is the YFP dataset; however, it is a limited video dataset. The videos are converted into image sequences, but mild dysfunctions are not easily visible in a single image, so a sequence of frames needs to be examined to draw conclusions. Moreover, the dataset is labeled, but facial landmark points are not annotated. Table 4 also suggests that deep learning methods generally lead to better performance than traditional machine learning methods or methods relying on hand-crafted features.
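The performance column of Table 4 mixes several metrics (accuracy, precision, recall or sensitivity, specificity, F1 score, IOU), which makes direct comparison between studies difficult. As a reminder of how these quantities relate, the sketch below computes them from their standard definitions. The confusion-matrix counts and binary masks are made up for illustration and do not come from any cited study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn: palsy cases detected/missed; tn/fp: healthy cases correctly
    rejected/wrongly flagged. Assumes no denominator is zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


def iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary (0/1) masks."""
    pairs = [(a, b) for row_a, row_b in zip(mask_a, mask_b)
             for a, b in zip(row_a, row_b)]
    intersection = sum(1 for a, b in pairs if a and b)
    union = sum(1 for a, b in pairs if a or b)
    return intersection / union if union else 0.0


# Hypothetical counts for a 100-subject test set.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Hypothetical predicted and reference segmentation masks.
predicted = [[1, 1, 0], [0, 1, 0]]
reference = [[1, 0, 0], [0, 1, 1]]
print(f"IOU: {iou(predicted, reference):.2f}")
```

With these counts, accuracy (0.85) is lower than specificity (0.90) and higher than recall (0.80), so a single headline number can hide how a system behaves on palsy versus healthy subjects, especially when the classes are imbalanced.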

This entry is adapted from the peer-reviewed paper 10.3390/axioms12121091

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.