1. Parkinson’s Disease: Background
PD is a progressive nervous system disorder that affects an individual’s movement. Symptoms of PD start gradually and may begin with a barely noticeable tremor
[1]. Tremors are common; however, the disorder is also often accompanied by stiffness or slowed movement
[2]. In the early stages of PD, the face may show little or no expression, and the arms may not swing when walking. Speech may become slurred or soft
[3]. PD symptoms worsen as the disease progresses. Although there is no cure for PD, medications can alleviate the symptoms. Doctors may also recommend surgery to regulate specific areas of the brain and further alleviate symptoms. According to research, India has 7 million elderly people suffering from PD
[4]. Medication and surgery are offered to control the symptoms of PD
[5]. The number of Americans living with PD is close to one million, which is greater than the combined number of people with Lou Gehrig’s disease and multiple sclerosis. By 2030, this number is expected to reach 1.2 million.
People with PD (PWP) show considerable variation in their symptoms, responses to medication, and adverse treatment effects. Understanding the genetic variations among Parkinson’s patients might provide crucial hints regarding how and why each person’s experience with PD differs. Although the actual etiology of PD is unknown, scientists believe that a mix of genetic and environmental factors causes it. Each factor’s impact varies depending on the individual. Researchers do not know why some people develop Parkinson’s and others do not. About 10 to 15% of Parkinson’s cases are genetic in nature. In certain families, variations (or mutations) in particular genes are inherited or handed down from generation to generation
[6]. The molecular basis of this neurodegenerative disease is still not fully understood more than two decades after the discovery of the first mutation linked to PD. Initially, research on the genetics of PD concentrated on rare familial forms of the condition; six genes (LRRK2, alpha-synuclein, Parkin, VPS35, DJ-1, and PINK1) have now been conclusively linked to either an autosomal dominant or recessive form of the disease. Major advances in the field have been made with the introduction of genome-wide association studies (GWAS) and the use of new technologies, such as next-generation sequencing (NGS) and exome sequencing. A wave of genetic association studies has since implicated a number of genetic variants in disease pathogenesis or protection
[7].
Regarding proteomic biomarkers, a biomarker that is expected to accurately represent a disease process should ideally be available for investigation in the afflicted tissue, such as the affected dopaminergic neurons in the case of PD. The benefits of this strategy are clear, for instance, in contemporary tumor treatment, which may be customized based on the specific hormone receptor status of the malignant cells. The fact that this is not achievable in PD is one of the biggest obstacles to creating causal or disease-modifying medicines. Because this technique cannot be used in PD, disease-associated proteomic modifications must be looked for in accessible bodily fluids, such as blood plasma or CSF, as well as in peripheral tissues. Although it initially appears unlikely, there are signs that the observed changes may in fact represent at least some components of the disease process in the brain
[8].
In terms of risk factors, although the specific causes of PD are unknown, certain cases are frequently linked to environmental and other factors that play a significant role in the progression of the disease. Head trauma, an inadequate diet, exposure to large amounts of pesticides or other chemicals, and a sedentary lifestyle are all risk factors. Figure 1 represents the major symptoms of this disease.
Figure 1. Symptoms of Parkinson’s disease.
Risk factors for PD include age (the disease rarely affects young adults; it typically manifests in middle or later life, and the risk rises with age), hereditary factors (having close relatives with the disorder increases the probability of developing it), sex (men are more likely to develop PD than women), and exposure to toxins (ongoing exposure to herbicides and pesticides may slightly increase the risk of PD)
[9]. Genetic indicators for PD have been used to identify people who have a higher risk of acquiring this disease, to track the disease’s development, and to assess how well treatment prevents the depletion of dopaminergic neurons. Early diagnostic tools for PD include markers such as cerebrospinal fluid testing, non-motor clinical signs of PD, and several imaging modalities
[10].
2. Machine Learning Techniques Used to Diagnose Parkinson’s Disease
This entry identified the potential to accurately estimate the severity of PD, as measured by clinical metrics, using ML techniques such as SVM, ANN, KNN, naïve Bayes, logistic regression, CART, and decision trees. ANN is mostly used in classification and regression problems, where existing nearby features are considered to be related. ML refers to techniques that learn from previous data and improve themselves accordingly; it is essentially the study of algorithms that improve automatically through experience with data. ML algorithms rely on training datasets and test data. ML classifiers fall into two categories, supervised and unsupervised. Supervised learning uses labeled data, on which different algorithms are used to train models. The categorization of ML algorithms is shown in
Figure 2. Artificial Neural Network (ANN)
[11] and Multilayer perceptron (MLP) with a back propagation algorithm
[12] are also used to diagnose PD.
Figure 2. Machine learning algorithm used to diagnose Parkinson’s disease.
ML-based diagnosis of PD can be achieved by using symptoms as attributes for the algorithm. ML algorithms have been used to assess PD severity from an individual’s handwriting
[13]. Speech analysis and tremors are also important indicators used to diagnose PD
[14]. Over time, various researchers have undertaken several initiatives to diagnose PD. In
[15], the authors discussed a methodology to discriminate a healthy person from a person with Parkinson’s disease (PWP) by detecting dysphonia. They introduced a new, reliable dysphonia measure called pitch period entropy (PPE), which is unaffected by a variety of uncontrolled confounding factors, such as noisy acoustic surroundings and natural, healthy changes in voice frequency. The dataset was obtained from thirty-one people, of whom twenty-three were PD patients and eight were healthy, and it contained 195 sustained vowel phonations. The methodology used in this research comprises three steps: feature calculation, the pre-processing and pre-selection of features, and classification. To diagnose PD, a kernel support vector machine (SVM) classifier was used. With this algorithm, the model achieved an accuracy rate of 91.4%.
The main purpose of the research in
[16] was to differentiate a healthy person from a PWP. To create a method for diagnosing PD patients using voice disorders, the authors used a dataset containing 34 sustained vowels from 34 individuals, of whom 17 were PD patients and 17 were healthy. For classification, a kernel SVM was applied to the first twelve Mel Frequency Cepstral Coefficients (MFCC) and achieved an accuracy rate of 91.17%.
To distinguish healthy individuals from PD patients, Ref.
[17] used a supervised ML algorithm, SVM, for classification purposes. All the data were processed in the Weka tool. LIBSVM was used to find the best achievable accuracy across various kernels for the given dataset. The linear kernel SVM achieved an accuracy rate of 65.2174%, whereas the polynomial kernel and the RBF kernel each achieved an accuracy rate of 60.8696%.
In
[18], the authors discussed a methodology to diagnose PD. Weka tools were used to develop the algorithms for data pre-processing, classification, clustering, and the analysis of a given dataset. In the experimental results of this research, K-nearest neighbor (KNN) + AdaBoost.M1 achieved an accuracy rate of 91.28%, KNN + Bagging achieved 90.76%, and KNN + MLP achieved 91.28%.
The authors of another study proposed a methodology for distinguishing between healthy persons and PD patients. In their research, the data were obtained from 40 individuals, of whom 20 were healthy and 20 were PD patients
[19]. A total of 26 speech samples were taken from each individual, including phrases, sentences, words, and numerals. For classification, they used SVM and KNN with cross-validation. The KNN classifier produced an accuracy of 82.50%, whereas the SVM classifier reported an accuracy of 85%.
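As an illustration of how a KNN classifier can be evaluated with cross-validation, the following minimal sketch uses only the Python standard library. It is not the implementation from [19]: the two-dimensional feature vectors are made up, and the choice of leave-one-out validation and `k=3` is an assumption made for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Sort training points by Euclidean distance to the query and keep the closest k.
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(X, y, k=3):
    """Hold each sample out once, predict it from the rest, and return the hit rate."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) == y[i]:
            hits += 1
    return hits / len(X)

# Toy, made-up two-feature vectors (not real voice measurements).
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
y = ["healthy", "healthy", "healthy", "PD", "PD", "PD"]
loo_accuracy = leave_one_out_accuracy(X, y, k=3)
print(f"Leave-one-out accuracy: {loo_accuracy:.2%}")
```

With several recordings per person, as in the study above, holding out all samples of one subject at a time would be the safer variant, since samples from the same speaker are not independent.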
In
[20], the authors examined several voice signal analysis techniques for the diagnosis of PD. A novel feature set based on the tunable Q-factor wavelet transform (TQWT) was presented in their work. TQWT performed well against the state-of-the-art voice signal processing methods adopted for feature extraction in PD detection. Distinct classifiers were applied to different feature subsets, and the predictions of the classifiers were aggregated using ensemble methods. The best accuracy of the model was reached with the MFCC and TQWT features, which are thus key features in the problem of PD classification. As a data preparation phase, the minimum redundancy-maximum relevance (mRMR) feature selection approach was applied. Across all the feature subsets, the Radial Basis Function (RBF) kernel SVM had the greatest accuracy, at 86%.
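One common way to aggregate the predictions of classifiers trained on different feature subsets is majority voting. The sketch below shows that idea only; the source does not state which ensemble rule [20] used, and the classifier names and predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_classifier):
    """Combine per-classifier label lists into one label list by majority vote."""
    combined = []
    # zip(*...) groups the predictions that all classifiers made for the same sample.
    for sample_votes in zip(*predictions_per_classifier):
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined

# Hypothetical outputs of three classifiers trained on different feature subsets
# (1 = PD, 0 = healthy); the values are made up for illustration.
mfcc_clf = [1, 0, 1, 1, 0]
tqwt_clf = [1, 1, 1, 0, 0]
base_clf = [0, 0, 1, 1, 0]
ensemble = majority_vote([mfcc_clf, tqwt_clf, base_clf])
print(ensemble)
```

Using an odd number of binary classifiers, as here, avoids tied votes.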
ANN was used by
[21] to identify PD. The dataset was obtained from the University of California, Irvine (UCI) Machine Learning Repository. Forty-five attributes were chosen as input values, with one output for classification, using MATLAB. With an accuracy of 94.93%, their suggested model was able to differentiate healthy individuals from PD subjects.
The author addressed the causes and symptoms of the disease. The severity of this disease and its complications were discussed in their work. Furthermore, their studies established the best detection range for classifying Parkinson’s symptoms
[22].
The authors developed a method for diagnosing PD that combined ML and Kalman filtering. Tremor activity was used to detect Parkinson’s symptoms in this method. Resting tremors were identified using ML approaches based on local field potentials. The data were obtained from 12 people. The Kalman filter improved the classification results obtained from the processed data
[23].
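To give an idea of how a Kalman filter can stabilize a sequence of classifier outputs over time, the following sketch applies a generic scalar random-walk Kalman filter to a series of per-window tremor scores. The noise variances, the state model, and the score values are assumptions chosen for illustration and are not taken from [23].

```python
def kalman_smooth(measurements, process_var=1e-3, measurement_var=0.1):
    """Filter a noisy 1-D sequence with a scalar random-walk Kalman filter."""
    estimate = measurements[0]   # initial state estimate
    error_var = 1.0              # initial uncertainty of that estimate
    filtered = []
    for z in measurements:
        # Predict: the state is assumed constant, so only the uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement according to the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered

# Hypothetical per-window tremor probabilities from a classifier (made-up values).
raw_scores = [0.9, 0.8, 0.2, 0.85, 0.9, 0.95, 0.1, 0.9]
smoothed = kalman_smooth(raw_scores)
print([round(s, 3) for s in smoothed])
```

Isolated outliers in the raw scores are pulled towards the running estimate, so a later thresholding step produces fewer spurious switches between the tremor and non-tremor states.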
To evaluate this disease, Ref.
[24] examined phonation and acoustic signals. Four distinct ML approaches were used to preprocess and evaluate the acquired voice frequency data. Various microphone devices, including smartphones, were used to record the voice signals. To measure the accuracy rate and error rate in detection, the voice features acquired using smartphones were fed into an ML system. The acoustic cardioid (AC) channel had a 94.55% accuracy, a 0.87 area under the curve (AUC), and a 19.01% equal error rate (EER). With the smartphone channel, the authors achieved an accuracy of 92.94%, an AUC of 0.92, and an EER of 14.15%.
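The AUC and EER figures quoted above are both derived from the receiver operating characteristic (ROC) curve of a scoring classifier. The sketch below shows one simple way to compute them from labels and scores; the data are made up, and the EER is approximated by the ROC point at which the false-positive and false-negative rates are closest, which is an assumption of this example rather than the procedure of [24].

```python
def roc_points(labels, scores):
    """Return (false-positive rate, true-positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a sample counts as positive if score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
        fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def equal_error_rate(points):
    """Approximate EER: the ROC point where FPR and FNR (= 1 - TPR) are closest."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1 - p[1])))
    return (fpr + (1 - tpr)) / 2

# Made-up labels (1 = PD, 0 = healthy) and classifier scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
points = roc_points(labels, scores)
auc_value = auc(points)
eer_value = equal_error_rate(points)
print(f"AUC = {auc_value:.4f}, EER = {eer_value:.2%}")
```

A lower EER and a higher AUC both indicate better separation of the two classes, which is why the two channels above can rank differently on accuracy and on AUC.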
Using EEG signals recorded during the completion of verbal fluency tests, Almalaq et al.
[25] explored the connections and causality between distinct areas of the brain. Mental demands, such as switching from one behavioral task to another, are challenging for those with PD. Motor and phonemic fluency are among the behavioral tasks. Their approach included verbal generation skills, as well as the stimulation of Broca’s area (Brodmann areas BA44 and BA45).
In
[26], the authors presented a neural network (NN) approach for identifying symptoms of PD using speech data. The approach classified symptoms of the disease, and the SMOTE algorithm was used to balance the data. Furthermore, ensemble and AdaBoost techniques were used to improve the disease detection rate (accuracy rate). The final AdaBoost ensemble classifier implementation of NNge achieved an accuracy rate of 96.30%.
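SMOTE balances a dataset by creating synthetic minority-class samples on the line segments between a minority sample and one of its nearest minority neighbours. The following is a minimal standard-library sketch of that interpolation step; the sample values, `k`, and the random seed are illustrative assumptions, and a maintained implementation would normally be preferred in practice.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Made-up two-feature samples: three minority-class points to be augmented.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_samples = smote(minority, n_new=3)
balanced_minority = minority + new_samples
print(balanced_minority)
```

Oversampling of this kind should be applied to the training portion only; synthesizing samples before splitting would leak information into the test data.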
The authors examined PD subject detection using various ML techniques
[27]. They conducted their experiment on both training and test data, where they used 22 acoustic features of 195 sound recordings. To diagnose PD, four machine learning classifiers were used: KNN, SVM, Naive Bayes, and random forest. The Naive Bayes algorithm diagnosed PD patients with 70.26% accuracy and a precision of 0.64 for test data.
In
[28], the authors proposed a method to diagnose PD based on feature selection and extraction, pre-processing, and classification. For the feature selection task, recursive feature elimination and feature importance methods were used. For classification, various ML algorithms were used, such as SVM, ANN, and Classification and Regression Trees (CART). The classification accuracy was measured before and after feature selection. Before feature selection, SVM had an accuracy of 79.98%, and after selection, it performed better than that.
The authors proposed a statistical method to detect PD using voice features, including vowels. They used two ML techniques, SVM and KNN, for which the accuracy rates obtained were 91.25% and 91.23%, respectively
[29].
In
[30], the authors suggested a method for evaluating feature sets by comparing performance metrics with various feature sets, such as genetic algorithm-based feature sets and Principal Component Analysis (PCA)-based feature reduction techniques. Using SVM with RBF and genetic algorithm-based feature sets, they were able to achieve an accuracy of 97.57%.
Using an L1-norm SVM for feature selection, Ref.
[31] suggested a method for distinguishing PD patients from healthy people by generating a new subset of features from the PD dataset. Their study was validated using the k-fold cross-validation approach. The experimental results of their study imply that the suggested approach may be used to reliably predict PD and that it can be readily used in healthcare for diagnostic purposes.
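In k-fold cross-validation, the data are divided into k folds, and each fold serves once as the test set while the remaining folds are used for training. The sketch below generates the index splits only; the value of k and the sample count are arbitrary, and in practice the samples would typically be shuffled (and often stratified by class) beforehand.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

The reported accuracy is then usually the mean of the k per-fold test accuracies.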
According to
[32], Linear Discriminant Analysis (LDA) performed better than PCA in distinguishing HC subjects from PD patients; thus, the LDA output was used as input for the clustering models. The performance of the various models was evaluated by comparing the results of the clustering algorithms with the ground truth after a follow-up. Hierarchical clustering surpassed the DBSCAN and K-means algorithms, with a sensitivity, specificity, and accuracy of 78.13%, 38.89%, and 64%, respectively.
From the above, it was observed that various ML techniques have been applied in recent research to voice-based PD detection and to handwriting patterns to diagnose PD.
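Sensitivity, specificity, accuracy, and precision, which are reported throughout the studies above, all follow from the four cells of a binary confusion matrix. The following sketch computes them from true and predicted labels; the label names and the example data are made up.

```python
def diagnostic_metrics(y_true, y_pred, positive="PD"):
    """Compute sensitivity, specificity, accuracy, and precision for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "sensitivity": tp / (tp + fn),   # share of PD patients correctly detected
        "specificity": tn / (tn + fp),   # share of healthy controls correctly cleared
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),     # share of PD predictions that are correct
    }

# Made-up ground truth and predictions (HC = healthy control).
y_true = ["PD", "PD", "PD", "PD", "HC", "HC", "HC", "HC"]
y_pred = ["PD", "PD", "PD", "HC", "HC", "PD", "PD", "HC"]
metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

A model can combine high sensitivity with low specificity, so accuracy alone can be misleading, particularly on imbalanced datasets.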
3. Adaptation of the ML Framework
Different input variables led to different interpretations. When the input variables are acoustic voice features, ML algorithms are preferred to diagnose PD. In the case of acoustic voice datasets, the primary purpose of applying ML was to detect the initial signs of PD
[33]. In other instances, it was assumed that training models may be quite effective for the early screening of PD because the gold standard was readily available. A particular combination of ML methods, including PCA, was used to reduce the dimensionality of the input dataset. Decision trees or k-means clustering algorithms are more suitable for analyzing the characteristics of a speech database, and these methods may be used to classify voice data as control vs. PD. Because the acoustic speech data violated the data in components, it was considered that learning the acoustic speech data using ML techniques such as HMM, followed by the detection procedure, would be the best approach. To determine the risk of PD, a deep CNN classifier using transfer learning and data augmentation approaches can be implemented. Because of the small amount of data, using handwriting data to predict PD in the early stages presents a significant classification difficulty. To achieve high accuracy, the ImageNet and MNIST databases were used independently as source datasets.
3.1. Architecture Based on Acoustic Voice Dataset as Input
In
[34], the authors proposed a methodology to diagnose PD using stochastic gradient descent (SGD), logistic regression, Extreme Gradient Boosting (XGB), KNN, random forest, and decision tree ML classifiers, as shown in
Figure 3. In their study, the authors first extracted selected attributes for classification. By extracting attributes from the input data, feature extraction improves the accuracy of trained models. By removing redundant records, this stage decreases the dimensionality of the data, which in turn speeds up classification. By choosing and merging variables into features, it helps obtain the optimal features from very large datasets, while also significantly reducing the volume of data. These features are simple to use while still accurately and uniquely describing the real dataset. Secondly, they applied data mining approaches to classify HC and affected patients based on various acoustic voice features and to estimate the accuracy rate. For that, the authors first set the target variable, i.e., the health status of the subjects. Once the target attribute was set, they prepared the dataset columns that were used as the input after being extracted from the dataset. Finally, the authors compared all the ML algorithms to identify the best accuracy, which was obtained by the random forest classifier, with an accuracy rate of 97.10%; the lowest accuracy was obtained by SGD and logistic regression, with an accuracy rate of 91.66% for both classifiers.
Figure 3. Proposed methodology to diagnose Parkinson’s disease by
[34].
3.2. Architecture Based on Handwritten Patterns as Input
A Structural Co-occurrence Matrix (SCM)-based technique to diagnose PD, as shown in
Figure 4, was proposed by
[35]. In their research, the features were extracted from the spiral and meander handwriting exams of the HandPD dataset
[36].
Figure 4 represents a methodology that combines an exam template (b) and a handwritten trace (c). First, the exam is segmented, which generates two new images: an exam template (b) and a handwritten trace (c). These images are produced by applying digital image processing techniques to the handwritten exam. Secondly, the segmented images are converted into grayscale for the next stage. The third phase is feature extraction from the grayscale images that have been segmented and converted. As shown in
Figure 4, these images serve as the SCM’s input images.
Figure 4. Proposed methodology to diagnose Parkinson’s disease using handwriting in a spiral format by
[35].
Feature extraction through the SCM analyzes the relationship between signals, in this example in a two-dimensional space.
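The idea of relating two images through a co-occurrence matrix can be sketched as follows. This is a simplified illustration and not the exact SCM formulation of [35]: it assumes two grayscale images of equal size, quantizes each into a small number of intensity levels, and counts how often level i in the first image coincides with level j in the second at the same pixel position. The function names, the number of levels, and the pixel values are hypothetical.

```python
def quantize(image, levels, max_value=255):
    """Map grayscale intensities in [0, max_value] to integer bins 0..levels-1."""
    return [[min(pixel * levels // (max_value + 1), levels - 1) for pixel in row]
            for row in image]

def co_occurrence(image_a, image_b, levels=4):
    """Count how often bin i in image_a coincides with bin j in image_b at the same pixel."""
    qa, qb = quantize(image_a, levels), quantize(image_b, levels)
    matrix = [[0] * levels for _ in range(levels)]
    for row_a, row_b in zip(qa, qb):
        for i, j in zip(row_a, row_b):
            matrix[i][j] += 1
    return matrix

# Tiny made-up 2x2 grayscale images standing in for the exam template and the trace.
template = [[0, 64], [128, 255]]
trace = [[0, 70], [200, 255]]
scm = co_occurrence(template, trace, levels=4)
# Example feature: fraction of pixels whose quantized levels agree (mass on the diagonal).
agreement = sum(scm[i][i] for i in range(4)) / 4
print(scm, agreement)
```

Scalar descriptors computed from such a matrix, like the diagonal mass used here, can then be passed to a classifier such as an SVM; the more the trace deviates from the template, the more mass moves off the diagonal.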
For this research
[35], data were collected from 92 individuals, of whom 74 were PD patients and 18 were HC. In their proposed work, various classifiers, such as SVM, naïve Bayes, and OPF, were applied to the dataset. The highest accuracy (85.54%) was obtained with SVM by combining the handwritten trace with the handwriting in a spiral format.
3.3. Architecture Based on Gait Dataset as Input
A methodology to diagnose PD from gait patterns using ML algorithms was proposed by
[37]. According to the authors, the gait pattern is distinctive for every human being, but there is a substantial difference between the gait patterns of HC and PD patients.
Table 1 presents a review of ML approaches applied to gait data to diagnose PD across 18 studies. For this research, the sources of data were the Laboratory for Gait and Neurodynamics, the Neurology Outpatient Clinic at Massachusetts General Hospital, Boston, MA, USA, and the University of Michigan, where the data were collected from participants. The ML algorithms used to diagnose PD from gait symptoms were least squares SVM, particle swarm optimization, fuzzy KNN, random forest, hidden Markov models, logistic regression, ANN, kernel Fisher discriminant, naïve Bayes, linear discriminant analysis, etc. The maximum number of subjects considered in
Table 1 is 424, with 156 PD patients and 268 healthy controls, with an accuracy rate of 85.51% using the hidden Markov model algorithm. The minimum number of subjects was 20, with 10 PD patients and 10 healthy controls, with an accuracy of 91.9% using a deep convolutional neural network algorithm. For the experimental setup, MATLAB R2013b and Python were used. The maximum accuracy was achieved by the SVM algorithm, with a 100% accuracy rate for 166 subjects, of whom 93 were PD patients and 73 were healthy controls. The minimum accuracy was achieved by the random forest algorithm, with a 79.6% accuracy rate for 80 subjects, of whom 40 were PD patients and 40 were healthy controls.
Table 1. Comparative studies of machine learning approaches in gait dataset to diagnose PD.
This entry is adapted from the peer-reviewed paper 10.3390/diagnostics12082003