Exploring Huntington’s Disease Diagnosis via Artificial Intelligence-Powered Models: Comparison
Please note this is a comparison between Version 1 by Jayakumar Kaliappan and Version 2 by Fanny Huang.

Huntington’s Disease (HD) is a devastating neurodegenerative disorder characterized by progressive motor dysfunction, cognitive impairment, and psychiatric symptoms. The early and accurate diagnosis of HD is crucial for effective intervention and patient care. 

  • Huntington’s disease
  • Artificial Intelligence
  • machine learning
  • deep learning
  • diagnosis

1. Introduction

Huntington’s disease is a profoundly impactful neurodegenerative disorder [1] that affects not only individuals but also their families [2]. It presents a complex clinical picture, marked by the progression of motor dysfunction, cognitive decline, and psychiatric symptoms, ultimately culminating in profound disability and a shortened lifespan [3]. The gravity of HD has sparked growing concern among the medical and research communities worldwide, prompting efforts both to unravel its etiology and pathophysiology and to advance its early detection and management [4]. Research in other neurodegenerative diseases such as Alzheimer’s [5][6][7] and Parkinson’s [8] has similarly aimed to decode their mechanisms, improving the understanding of their pathology and paving the way for potential treatments.
Recent epidemiological studies have illuminated the prevalence and incidence of HD, revealing the stark reality of this disease. The pooled incidence of HD, as reported across various populations, has been estimated at 0.48 cases per 100,000 person-years (95% CI, 0.33–0.63). This statistic underscores the challenging nature of diagnosing HD, especially in its nascent stages, given its relative rarity. Furthermore, a continent-based analysis of these figures uncovers marked disparities in the incidence of HD, with Europe and North America experiencing considerably higher rates compared to Asia. Beyond incidence, comprehending the prevalence of HD is essential for effective healthcare planning and resource allocation. The compiled prevalence of HD stands at 4.88 per 100,000 (95% CI, 3.38–7.06) [9], shedding light on the overall burden of the disease within populations. These prevalence figures not only serve as an alarming reminder of the global health concern that HD represents but also emphasize the urgent need for concerted efforts to enhance its diagnosis, treatment, and support systems for affected individuals and their families.
The pathological progression of Huntington’s disease (HD) remains elusive and has drawn attention from varied research domains. One study [10] explores motor speech patterns, reflecting the motor involvement in HD. Another [11] examines speech biomarkers across HD stages, emphasizing the continuum from pre-symptomatic to early manifest disease. A further article [12] contributes insights on huntingtin’s multifaceted role, spanning neurodevelopment to neurodegeneration. Regarding synaptic loss, ref. [13] and a study in Nature Medicine [14] highlight the early involvement of microglia, complement activation, and innate immune mechanisms in corticostriatal synapse decline. Another study [15] offers perspectives on the toxic effects of mutant huntingtin. Together, these works underscore the complexity of HD’s pathological cascade, spanning molecular, synaptic, and clinical dimensions.

2. Exploring Huntington’s Disease Diagnosis via AI-Powered Models

2.1. Preamble—Diagnosis of Huntington’s Disease via AI-Powered Models

The need for AI-powered approaches in HD diagnosis arises from the pressing demand for early and accurate detection of this debilitating neurodegenerative disorder. HD is a multifaceted condition with a broad range of clinical manifestations, making it challenging for clinicians to diagnose, particularly in its early stages. Machine learning (ML) and deep learning (DL) models have demonstrated their potential in handling the intricate, multi-modal data associated with HD, including genetic information, neuroimaging scans, and clinical assessments. These models can leverage large datasets to identify subtle patterns and biomarkers that might elude human observers, enabling earlier and more precise diagnosis. Additionally, automated diagnostic tools can alleviate the burden on healthcare professionals, streamline the diagnostic process, and ultimately lead to better patient outcomes. Because HD is progressive and has no definitive cure, the timely diagnosis facilitated by ML and DL techniques is important for initiating appropriate interventions, providing counseling, and advancing research into potential therapies. The integration of ML and DL models into HD diagnosis therefore holds significant promise for enhancing the quality of life of individuals and families impacted by this disease.

2.2. Machine Learning Techniques

Machine learning techniques have emerged as valuable tools in the diagnosis and assessment of Huntington’s disease. These techniques utilize various algorithms and computational approaches to analyze complex data sets, offering clinicians and researchers new insights into the disease [16][70]. One of the primary applications of machine learning in HD diagnosis is the identification of biomarkers and patterns within medical images, such as magnetic resonance imaging (MRI) and functional MRI (fMRI) [17][71]. Machine learning algorithms can detect subtle changes in brain structure and function, helping to distinguish individuals with HD from healthy controls and providing a means to monitor disease progression over time. Additionally, machine learning models can analyze clinical data, including motor, cognitive, and psychiatric assessments, to identify relevant features and patterns that contribute to accurate diagnosis and prognosis [18][72].
Machine learning also plays a crucial role in predictive modeling for HD risk assessment. By incorporating genetic data and other relevant factors, machine learning algorithms can predict an individual’s likelihood of developing HD, aiding early intervention and counseling. Furthermore, machine learning approaches can be applied to large-scale genetic studies to identify genetic modifiers and other factors associated with the age of onset and disease progression. This information not only deepens our understanding of the disease but also has implications for the development of targeted therapies. In summary, machine learning techniques are advancing the field of HD diagnosis by facilitating the extraction of valuable insights from clinical and genetic data, ultimately leading to earlier detection and improved management of this devastating neurodegenerative disorder.

2.2.1. Naive Bayes

The Naive Bayes classifier stands out as a prominent choice for effectively discerning gait signals between individuals with HD and those without the condition [19][73]. The study reports an impressive accuracy rate of 94.4% achieved by the Naive Bayes classifier in this diagnostic context. This highlights the effectiveness of Naive Bayes as a valuable machine learning tool for Huntington’s disease diagnosis, offering the potential for non-invasive and objective assessment of individuals based on their gait dynamics, aiding in the timely identification and continuous tracking of the condition.
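To make the idea concrete, the following is a minimal sketch of a Gaussian Naive Bayes classifier written from scratch. It is not the cited study's implementation: the two features (mean stride time and stride-time variability), their values, and the function names `fit_gaussian_nb` and `predict_gaussian_nb` are assumptions chosen only for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the prior, feature means and feature variances of each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(prior, means, variances):
        total = math.log(prior)
        for v, m, var in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(*model[label]))

# Hypothetical features: [mean stride time (s), stride-time variability (s)]
X_train = [[1.05, 0.02], [1.10, 0.03], [1.08, 0.02],
           [1.20, 0.12], [1.15, 0.15], [1.25, 0.10]]
y_train = ["control", "control", "control", "HD", "HD", "HD"]

model = fit_gaussian_nb(X_train, y_train)
prediction_a = predict_gaussian_nb(model, [1.07, 0.025])
prediction_b = predict_gaussian_nb(model, [1.22, 0.13])
print(prediction_a, prediction_b)
```

The "naive" assumption is visible in `log_posterior`: each feature contributes an independent Gaussian log-likelihood term, and the terms are simply summed.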

2.2.2. Decision Tree

The use of Decision Tree stands out as a highly effective tool in the diagnosis of Huntington’s disease [19][73]. Decision Tree achieved an impressive average accuracy of 100% in accurately classifying gait signals from subjects with HD. This remarkable accuracy underscores the robustness of the Decision Tree algorithm in distinguishing individuals with HD based on their gait dynamics. Additionally, the Decision Tree emerges as a pivotal machine learning algorithm employed for the prediction and identification of potential contributing genes in Huntington’s disease [20][74]. This method involves the use of Decision Tree to formulate rules for attributes, specifically genes, and makes determinations regarding the prediction class, which denotes whether a given sample is associated with HD or not. Remarkably, the Decision Tree model showcased its efficacy by achieving an impressive cross-validated classification accuracy of 90.79% with a standard deviation of 4.57% when applied to the expression data of prefrontal cortex samples.
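A figure such as "90.79% with a standard deviation of 4.57%" is typically the mean and spread of per-fold accuracies. The sketch below shows that bookkeeping under loud assumptions: a depth-one decision tree (a single threshold on one feature) stands in for the full Decision Tree, and the expression values, labels, and fold assignment are hypothetical rather than taken from the cited dataset.

```python
import statistics

def fit_stump(values, labels):
    """Pick the threshold on one feature that best separates two classes.

    The stump predicts 1 when value > threshold and 0 otherwise.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted(set(values)):
        correct = sum((v > threshold) == bool(l) for v, l in zip(values, labels))
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def cross_validate(values, labels, k):
    """Return the per-fold accuracies of k-fold cross-validation."""
    n = len(values)
    accuracies = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        threshold = fit_stump([values[i] for i in train_idx],
                              [labels[i] for i in train_idx])
        hits = sum((values[i] > threshold) == bool(labels[i]) for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies

# Hypothetical expression levels of one gene; label 1 = HD, 0 = control.
expression = [2.1, 7.9, 2.5, 8.4, 3.0, 7.1, 2.8, 6.5, 5.9, 3.2, 8.8, 2.2]
status = [0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0]

fold_acc = cross_validate(expression, status, k=4)
mean_acc = statistics.mean(fold_acc)
std_acc = statistics.stdev(fold_acc)
print(f"accuracy = {100 * mean_acc:.2f}% +/- {100 * std_acc:.2f}%")
```

With only a handful of samples per fold, a single misclassification moves the fold accuracy a great deal, which is one reason the standard deviation is worth reporting alongside the mean.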

2.2.3. Support Vector Machine

Support Vector Machine (SVM) is an important classifier for gait classification in Huntington’s disease and other pathological conditions [21][75]. In that work, SVM is used to differentiate gait patterns among diverse clinical groups, including individuals with Huntington’s disease, post-stroke patients, and healthy elderly individuals, using data collected from inertial sensors. The classifier is trained on features derived from subject-specific Hidden Markov Models (HMMs), encompassing temporal and frequency domain signal data. It is evaluated with leave-one-subject-out cross-validation and works in conjunction with three HMMs that assess likelihoods for gait classification. SVM is also used as a supervised classification method for discriminating neurodegenerative diseases, including Huntington’s disease, through gait analysis [22][76]. This approach uses SVM as a prediction model to classify and monitor the target diseases. Notably, the SVM model aids in identifying the most predictive features extracted from the gait analysis dataset, which refines the discrimination process. The SVM model achieves an accuracy of 86.9% in distinguishing these neurodegenerative diseases, underscoring its value for enhancing the diagnosis and understanding of HD and similar conditions through the analysis of gait patterns.
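Leave-one-subject-out cross-validation matters for gait data because several recordings usually come from the same person. The sketch below shows only the splitting logic; the subject identifiers and the helper name `leave_one_subject_out` are hypothetical, and any classifier (SVM or otherwise) could be trained on the resulting index sets.

```python
from collections import defaultdict

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)
    for subject, test_idx in by_subject.items():
        train_idx = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train_idx, test_idx

# Hypothetical layout: each row is one gait window, two windows per subject.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"]

splits = list(leave_one_subject_out(subjects))
for held_out, train_idx, test_idx in splits:
    # No window from the held-out subject may appear in the training set.
    assert held_out not in {subjects[i] for i in train_idx}
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Splitting by subject rather than by window prevents the model from being tested on a person it has already seen, which would otherwise inflate the reported accuracy.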
SVM is harnessed for classification purposes, specifically to distinguish individuals as either pre-HD or controls based on neuroimaging data [23][77]. Collaborating with linear discriminant analysis (LDA), SVM plays a crucial role in developing classification models capable of decoding essential information about the disease state from neuroimaging data. Impressively, these classification models utilizing SVM achieve notable success, reaching up to 76% accuracy in effectively distinguishing between pre-HD and control individuals based on their neuroimaging profiles. SVM is also applied for the classification of subjects by their HD stage, based on oculomotor features [24][78]. The accuracy of the SVM classifier varies depending on the specific classification task, achieving 73.47% accuracy for distinguishing control participants from pre-HD participants, 81.84% for distinguishing control participants from HD subjects, and 83.54% for distinguishing pre-HD subjects from HD patients, highlighting its effectiveness in stratifying individuals based on disease progression. Linear SVM is employed to classify eye tracking data across pre-HD, HD, and control groups, utilizing different combinations of features to optimize performance. Notably, the study reports the best accuracy of 76.88% for the CTRL vs. HD classifier and 72.50% for the pre-HD vs. HD classifier, underscoring the utility of SVM in Huntington’s disease diagnosis by leveraging oculomotor performance-derived features to accurately differentiate disease stages.
SVM plays a pivotal role in the classification of HD stages based on features extracted from T1- and diffusion-weighted imaging data [25][79]. Utilizing SVM, different feature selection techniques, such as whole-brain GM or FA values, subcortical regions-of-interest GM or FA values, and automated GM or FA value selection via the Relief-F algorithm, are employed to classify HD stages. This research showcases the adaptability of SVM, achieving noteworthy distinctions between Early-HD and Pre-HD or healthy individuals, with accuracy levels spanning from 85% to 95%. Moreover, SVM effectively discriminates Pre-HD from controls using the caudate region’s FA feature, achieving an accuracy of 74%.
An SVM model is also used to support the development of a novel formula for HD [26][80]. SVM, a versatile machine learning algorithm widely used for classification and regression, is trained in this research on documented Traditional Chinese Medicine (TCM) prescriptions. The objective is to identify a formula that can target multiple proteins associated with HD, leveraging the SVM model’s ability to work with high-dimensional, complex datasets. SVM is also used to classify gait signals from unknown subjects, distinguishing those with HD from healthy subjects [19][73]. Experimental results report an average accuracy of 100.0% in classifying gait signals, underscoring SVM’s ability to use gait dynamics data to differentiate HD patients from unaffected individuals.

2.2.4. Random Forest

In ref. [20][74], Random Forest is employed to identify contributing genes in Huntington’s disease. The approach uses Random Forest to analyze postmortem prefrontal cortex samples from HD patients and control subjects, with the objective of identifying genes potentially linked to HD pathogenesis. Random Forest is beneficial in this context because it effectively reduces the dimensionality of the data and highlights the most relevant genes implicated in the pathophysiology of HD. The Random Forest model achieved an accuracy of 90.45 ± 4.24%, supporting its role in deciphering the genetic components that influence the onset and progression of HD. Random Forest is also used as a supervised classification model for discriminating neurodegenerative diseases, including Huntington’s disease, through gait analysis, characterizing these diseases using features extracted from gait cycles [22][76]. That study reports an accuracy of 84.9% for the Random Forest model in discriminating HD and other neurodegenerative conditions based on gait patterns. Ref. [27][81] focuses on assessing the significance and order of importance of potential factors that could predict the progression of clinical symptoms in patients with manifest HD. It does so by employing a random forest regression model to forecast how clinical outcomes change based on these factors.
Random Forest emerges as a powerful machine learning technique employed to discern microRNA biomarkers indicative of susceptibility to Juvenile Onset Huntington’s Disease (JOHD) [28][82]. The research employs the Random Forest methodology strategically to build predictive models, which can distinguish between JOHD and WT samples using mouse cortex samples from both young and aged groups. Additionally, it aims to forecast the inclination toward those genotypes. Impressively, the Random Forest model yields several robust models with testing accuracies exceeding 80% and impressive Area Under the Curve (AUC) scores surpassing 90%. It demonstrates a remarkable ability to distinguish between JOHD and WT samples, featuring a mature mRNA-based model that achieves a flawless 100% AUC score, highlighting its outstanding discriminatory capabilities. This application of Random Forest in the study underscores its potential in not only identifying crucial microRNA biomarkers but also in the diagnosis and predisposition assessment of Juvenile Onset HD, offering a significant advancement in the field of HD diagnostics.
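Because accuracy and AUC are quoted side by side here, it may help to see what AUC measures. The following sketch computes it directly from its pairwise definition; the scores and labels are hypothetical and the function name `roc_auc` is an assumption, not part of the cited work.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as one half, which equals the area under the ROC curve.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores; label 1 = JOHD, 0 = WT.
labels = [1, 1, 1, 0, 0, 0]
perfect = roc_auc(labels, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
imperfect = roc_auc(labels, [0.9, 0.4, 0.7, 0.5, 0.2, 0.1])
print(perfect, imperfect)
```

An AUC of 100% means that every positive sample was ranked above every negative one. On a small test set this can occur with only a few samples per class, so such a score is best read together with the sample size.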

2.2.5. K-Nearest Neighbors

K-Nearest Neighbors (KNN) emerges as a significant classifier for the diagnosis of HD based on gait dynamics information. KNN, a well-known machine learning classifier, plays a crucial role in distinguishing individuals with HD from healthy subjects by classifying gait signals from unknown subjects [19][73]. The study reports an impressive accuracy rate of 97.2% achieved by the KNN classifier, underscoring its effectiveness in accurately identifying and differentiating individuals with HD from those without the condition through the analysis of gait dynamics.
KNN is employed for the identification of HD through audio signal processing [29][83]. KNN is applied in the classification stage following dimensionality reduction of voice signals, contributing to the accuracy of disease detection. The study demonstrates the effectiveness of the combination of the emobase2010 feature extractor with the KNN classifier, achieving an impressive accuracy rate of 97.3%. Notably, this high accuracy is achieved while maintaining a prediction time below one second, highlighting the practical utility of KNN in Huntington’s disease diagnosis through audio signal analysis.
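The KNN decision rule itself is short enough to sketch in full. The example below assumes the signals have already been reduced to two-dimensional feature vectors; the feature values, labels, and the function name `knn_predict` are hypothetical and do not reproduce the emobase2010 pipeline.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D features obtained after dimensionality reduction.
train_X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
           [0.90, 0.80], [0.85, 0.95], [0.80, 0.90]]
train_y = ["control", "control", "control", "HD", "HD", "HD"]

near_controls = knn_predict(train_X, train_y, [0.2, 0.2])
near_hd = knn_predict(train_X, train_y, [0.9, 0.9])
print(near_controls, near_hd)
```

KNN has no training phase beyond storing the data, so prediction time grows with the size of the training set. Reducing the dimensionality first keeps each distance computation cheap.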

2.2.6. Ensemble Models

Ref. [30][84] highlights the value of general ensemble classifier algorithms in distinguishing gait patterns between individuals affected by HD and healthy individuals. The methodology combines individual classifier algorithms, with Logitboost serving as the metaclassifier and RandomForest as the base classifier. This combination outperforms other decision tree algorithms in classifying HD gait data. With the Logitboost and RandomForest combination, the method classified 13 out of 14 subjects correctly and identified all seven individuals with HD. These results show the promise of ensemble classifiers applied to ankle-mounted iPhone sensor data for advancing the accuracy and efficiency of HD diagnosis through gait analysis.
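The reported counts can be turned into standard metrics with a short calculation. The sketch assumes that the seven subjects without HD are the healthy controls, so that 13 of 14 correct with all seven HD cases identified implies exactly one healthy subject was misclassified; that reading is an inference from the numbers above, not a figure quoted from the study.

```python
# Counts implied by the text: 14 subjects, 7 with HD (all identified),
# 13 classified correctly overall. The split of the remaining 7 into
# 6 correct and 1 incorrect is an inference, labeled here as an assumption.
true_positives = 7    # HD subjects classified as HD
false_negatives = 0   # HD subjects classified as healthy
true_negatives = 6    # healthy subjects classified as healthy
false_positives = 1   # healthy subjects classified as HD

total = true_positives + false_negatives + true_negatives + false_positives
accuracy = (true_positives + true_negatives) / total
sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)

print(f"accuracy    = {accuracy:.3f}")
print(f"sensitivity = {sensitivity:.3f}")
print(f"specificity = {specificity:.3f}")
```

With 14 subjects, each individual accounts for about 7 percentage points of accuracy, so differences between classifiers on this dataset should be interpreted cautiously.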
This research [31][85] presents an ensemble machine learning model that consistently outperforms nine conventional machine learning models, notably excelling in terms of accuracy. This ensemble model achieves a commendable balanced accuracy of 55.3% ± 6.1 in a 4-group classification of HD progression states. Even more impressive results are observed in binary classifications, with accuracies ranging from 70.9% ± 9.4 to 83.3% ± 6.3. Notably, the accuracy of the ensemble model experiences further augmentation through the incorporation of volumetric scores from diverse brain regions, including the occipital cortex, lateral ventricles, cingulate, and temporal lobe, in addition to the striatal structures. This emphasizes the potential of ensemble learning algorithms in advancing the precision of HD diagnosis through the utilization of structural MRI data, illustrating a significant stride forward in the field of neuroimaging-based diagnostics.
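Balanced accuracy is the mean of the per-class recalls, which is why a value of 55.3% in a 4-group problem should be compared against a chance level of 25% rather than 50%. The sketch below illustrates the definition with hypothetical group labels "A" to "D"; the data and the function name `balanced_accuracy` are assumptions for illustration.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Hypothetical, imbalanced 4-group problem.
y_true = ["A"] * 6 + ["B"] * 2 + ["C"] * 2 + ["D"] * 2

# A degenerate classifier that always predicts the majority group.
always_a = ["A"] * len(y_true)

plain = sum(t == p for t, p in zip(y_true, always_a)) / len(y_true)
balanced = balanced_accuracy(y_true, always_a)
print(f"plain accuracy = {plain:.2f}, balanced accuracy = {balanced:.2f}")
```

The degenerate classifier looks acceptable under plain accuracy because group "A" dominates, while balanced accuracy exposes it as no better than chance for four groups.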

2.2.7. Automatic Machine Learning

AutoML provides a significant advancement by automating the selection and optimization of machine learning models, thus reducing the need for manual intervention in model selection and tuning [32][86]. Within this investigation, the utilization of auto-sklearn, which harnesses Bayesian optimization algorithms, effectively pinpoints the most proficient model within the training dataset. This optimization contributes to an elevated level of effectiveness and precision in prediction outcomes. Notably, the utilization of AutoML enables the integration of various speech features with demographic variables to predict cognitive, motor, and functional scores in HD. Moreover, it supports the creation of fully automated methods for speech analysis, potentially minimizing the need for manual annotations and enabling remote assessment of individual conditions in Huntington’s disease and similar neurodegenerative disorders. While the paper does not explicitly mention the accuracy of the AutoML models, it emphasizes the significant improvement in predictions when combining speech features with demographic variables, showcasing its potential for accurate assessment in HD. This innovative approach holds immense promise for advancing the diagnostic capabilities of Huntington’s disease.

2.2.8. Summary of Machine Learning Models

In summary, machine learning models have emerged as powerful tools for the diagnosis and understanding of HD. Various ML algorithms, as described in the earlier headings, have been applied to diverse data sources such as gait dynamics, genetic information, neuroimaging data, and speech recordings to enhance HD diagnosis and prognosis as in Table 1. These models have shown remarkable accuracy rates, often surpassing 90%, and have the potential to contribute to early detection, monitoring, and understanding of HD. However, several limitations persist across these studies, including the need for larger and more diverse datasets, the interpretability of complex models, and ethical considerations related to data privacy and security. Moreover, generalization to larger populations and clinical settings remains a challenge. The incorporation of automatic machine learning (AutoML) approaches signifies a promising direction in automating model selection and parameter tuning, potentially making these ML models more accessible for clinical deployment. Overall, ML models offer substantial potential for improving HD diagnosis, but further research and validation are needed to fully harness their capabilities and ensure their clinical utility.
Table 1. Summary of Machine Learning Models.
Reference | Machine Learning Approaches Used | Main Contributions | Dataset | Performance Evaluation Metrics | Limitation
[19][73] | Decision Tree | Proposes an automated method for evaluating gait dynamics as a means of diagnosing HD. | Gait in Neurodegenerative disease dataset of 36 people | Accuracy = 100% | Limited models explored
[20][74] | Decision Tree | Identification of potential genes contributing to HD | GSE33000 dataset of 314 subjects | Accuracy = 90.79% | Small data samples
[21][75] | Support Vector Machine | Introduces an approach centered around training classifiers such as Hidden Markov Models | recordings of HD gene carriers | Relative error from 12.7% to 20% | Less number of participants

2.3. Deep Learning Techniques

Deep learning approaches have become potent instruments for advancing HD research and diagnosis. These techniques use artificial neural networks (ANNs) with stacked layers to automatically learn complex patterns and representations from data. Deep learning models excel at capturing hierarchical and abstract features from diverse data sources, such as neuroimaging scans, genetic data, and clinical assessments. The application of deep learning in the context of HD has shown promise in enhancing diagnostic accuracy, predicting disease progression, and uncovering fresh perspectives on the fundamental mechanisms behind the disorder.

2.3.1. Artificial Neural Network

The study [33][87] introduces a mathematical model incorporating Artificial Neural Networks (ANN) that simulates HD disorders and replicates the behavior of individuals affected by Huntington’s disease. Specifically, the ANN within the model is trained using data and physiological insights concerning the Basal Ganglia (BG), the region of the brain primarily impacted by HD. The model serves as an analytical tool for studying HD behavior, offering insights into the underlying causes of movement disorders in HD patients. By employing ANN in mathematical models of brain performance, particularly of the BG in HD, this research contributes to medical knowledge and sheds light on the intricacies of brain function in individuals with Huntington’s disease.
The research [34][88] introduces an innovative hybrid model designed to assess the symptoms of individuals afflicted with Huntington’s disease. This model ingeniously combines the robust predictive capabilities of an ANN with the interpretability afforded by a fuzzy logic system (FLS). Remarkably, the ANN component of the model achieved an impressive regression R value of 0.98, along with a low mean squared error (MSE) of 0.08. These metrics affirm the accuracy of the model in predicting the functional capacity level (FCL) of an individual. Complementarily, the FLS component offers a conclusive evaluation of the subject’s reaction condition, further enhancing the model’s interpretability. This amalgamation of ANN and FLS in the hybrid model enables a comprehensive evaluation of HD symptoms, effectively leveraging both predictive capabilities and linguistic interpretation. This pioneering model holds significant potential in advancing the daily lives of HD patients, offering a means to monitor and predict disease progression for improved care and management.
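The two regression metrics quoted above can be computed as follows. The sketch assumes that the "regression R value" is the Pearson correlation between predicted and target values; the source does not define it, and the target and prediction values below are hypothetical.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical functional capacity levels and model predictions.
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = [1.1, 1.9, 3.2, 3.8, 5.1]

mse = mean_squared_error(targets, predictions)
r = pearson_r(targets, predictions)
print(f"MSE = {mse:.3f}, R = {r:.4f}")
```

R is scale-free, whereas MSE is expressed in squared units of the target, so an MSE value is only interpretable alongside the range of the predicted quantity.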
Ref. [35][89] introduces a comprehensive approach that employs a range of Artificial Neural Network (ANN) models to analyze data gathered from smart devices, such as smartphones or tablets, in order to forecast the functional capacity level of individuals with HD. The ANN models include Cascade forward backpropagation (CFBP), Feed-forward backpropagation (FFBP), Elman, Generalized regression neural network (GRNN), Nonlinear autoregressive exogenous model (NARX), Layer recurrent neural network (RNN), and Feed-forward time delay neural network (FFTDNN). The paper details the entire process, from data preparation and labeling to the selection of learning algorithms, specific neural network training, performance evaluation, and comparative analysis. This study represents a significant stride toward a more precise and insightful assessment of functional capacity levels in HD patients.
Ref. [36][90] underscores the significance of non-linear techniques, particularly ANNs, for understanding the intricacies of HD. The authors use ANNs to discern between control subjects and those affected by HD on the basis of DNA CpG methylation data. What sets this approach apart is its capacity to reduce the set of CpGs considered from hundreds of thousands to 237. The study’s results indicate that, by focusing on these 237 CpGs and employing non-linear techniques such as ANNs, control subjects and HD patients can be differentiated precisely. Overall, the paper advocates for the role of artificial neural networks as a deep learning technique in the diagnosis of Huntington’s disease when DNA CpG methylation data are available.

2.3.2. Deep Neural Network

A Deep Neural Network (DNN) is used to identify HD by analyzing speech signals [37][91]. The approach leverages a combination of acoustic and lexical features for automated detection. The DNN model is trained and validated with a Leave-One-Subject-Out (LOSO) methodology, in which individual subjects are consecutively held out as the test speakers. Notably, the study observes that the accuracy of the method increases with advancing disease stages, underscoring the potential of speech as a biomarker for monitoring HD progression. The performance evaluation of the DNN model, alongside other deep learning models, is quantified using the word error rate (WER), with values ranging from 9.4 to 14.9. These results support the notion that objective analyses through DNN and similar deep learning models hold significant promise in distinguishing between healthy individuals and those with HD. This advancement not only reinforces clinical diagnoses but also facilitates symptom tracking in non-laboratory and non-clinical settings.
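Word error rate is a transcription metric: the word-level edit distance between a reference transcript and a recognized one, divided by the number of reference words. The sketch below implements that standard definition; the sentences are hypothetical and the function name `word_error_rate` is an assumption, not the cited study's code.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref)

# Hypothetical transcripts: one substitution and one deleted word.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"

wer = word_error_rate(reference, hypothesis)
print(f"WER = {100 * wer:.1f}%")
```

Lower WER is better, and it measures transcription quality rather than diagnostic accuracy, so it is not directly comparable to the classification accuracies quoted for other models.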

2.3.3. Deep Convolutional Neural Network

Ref. [38][92] examines the application of deep learning models, specifically VGG16 and 3D CNN, for diagnosing Huntington’s disease. The study reveals that VGG16, a well-established architecture, holds great promise in classifying disease severity by analyzing pressure data from individual footsteps, achieving 89% accuracy. Its proficiency in extracting features such as edges and corners contributes to accurate classification. The 3D CNN achieved an accuracy of 82% and, though slightly less accurate, presents potential for improvement when combined with models such as VGG16. The paper suggests that the two models may have somewhat different feature extraction capabilities, so combining their strengths could lead to even more accurate Huntington’s disease diagnosis. This integrated approach is a promising step towards refining disease classification, with implications for both clinical practice and research in this field.

2.3.4. Extreme Learning Machine

Extreme Learning Machine (ELM) models, as outlined in this research, present a method for predicting the progression of Huntington’s disease based on brain scans. The approach, referred to as Brute-force Missing Data Extreme Learning Machine, shows significant potential in this domain [39][93]. The method leverages the ELM framework to train models on datasets with missing values in both the inputs and the desired outputs. It builds and trains a separate model for each sample in the test set, and does so efficiently, without requiring repeated access to the training data. Experimental comparisons conducted in the study reveal highly promising results, which underscores the effectiveness of ELM in this setting. By addressing missing data challenges, this approach offers a significant step forward in predicting the progression of HD, with considerable potential for advancing early diagnosis and intervention strategies for individuals affected by this condition.

2.3.5. Deep Boltzmann Machine

The ref. [40][94] introduces a pioneering approach utilizing a stacked restricted Boltzmann machine (SRBM) in the analysis of RNA-seq data for Huntington’s disease diagnosis. This innovative deep learning technique is specifically tailored to identify key genes implicated in the progression of HD. By examining differentially activated neurons and changes in gene energy at various time intervals, SRBM efficiently screens disease-associated factors and genes. Experimental results underscore the remarkable efficacy of SRBM, demonstrating its ability to discern crucial information in time series gene expression datasets. This leads to a significant improvement in the accuracy of identifying disease-associated genes and predicting top-ranking genes, surpassing the capabilities of current state-of-the-art methods. Moreover, SRBM outperforms other computational approaches in analyzing gene expression data of HD-afflicted mice across distinct spans of time. Its automatic feature learning capacity, coupled with heightened precision in identifying disease-associated genes, underscores SRBM as a formidable tool in HD diagnosis. This approach stands at the forefront of computational methods, offering a highly effective means of understanding the genetic underpinnings of HD progression.

2.3.6. Summary of Deep Learning Models

The DL models summarized in Table 2 represent a significant shift in the diagnosis and understanding of HD. They draw on data modalities such as gait dynamics, genetic information, neuroimaging data, speech signals, and RNA-seq data to improve the accuracy and depth of HD diagnosis and prognosis. DNNs in particular show strong predictive capabilities, with high reported accuracy and potential for monitoring HD progression. However, DL models often require large and diverse datasets for training, and their complexity can hinder model interpretability and clinical applicability. Validation in diverse populations and clinical settings is therefore essential before these models can be considered suitable for widespread clinical deployment. Nonetheless, DL models mark an important advance in HD diagnosis, offering valuable insights and paving the way for more accurate and personalized care for individuals affected by this debilitating disease.
Table 2. Summary of Deep Learning Models.
| References | Deep Learning Approaches Used | Main Contribution | Dataset | Performance Evaluation Metrics | Limitation |
|---|---|---|---|---|---|
| [33][87] | Artificial Neural Network | Creating a mathematical model with a grey box approach to replicate the characteristics of Huntington’s disease disorders. | Gait Signal dataset of 36 people | NIL | Limited to pharmaceutical treatments only |
| [34][88] | Artificial Neural Network | Creating a hybrid framework that merges an ANN with a Fuzzy Logic System (FLS). | Dataset of 3032 examples from 20 test subjects | R value: 0.98; MSE value: 0.08 | Small dataset |
| [35][89] | Artificial Neural Network | Creation of an ANN model aimed at forecasting the functional capacity status of individuals. | Dataset of 200 examples from 10 subjects | R value: 0.995; MSE value: 0.108 | Inadequate dataset |
| [36][90] | Artificial Neural Network | Creating a biomarker utilizing DNA CpG methylation data to identify HD. | DNA methylation data of 76 samples | CMP: 0.92; CP: 0.86 | Small size of the datapool |
| [37][91] | Deep Neural Network | Development of an objective and non-invasive acoustic biomarker that can detect HD | Data from HD study of 62 speakers | Accuracy = 87% | Insufficient features |
| [38][92] | Deep Convolutional Neural Network | Creating a DL-driven method to analyze gait patterns in individuals with HD. | Foot pressure data of 12 patients | Accuracy = 82% | Preprocessing module can be further optimized |
| [39][93] | Extreme Learning Machine | Built an innovative technique to educate ELM models using datasets containing absent data points. | Huntington’s disease dataset of 3729 samples from 1370 subjects | F1 score: 0.98 | Performance loss on smaller features |
| [40][94] | Deep Boltzmann Machine | Proposal of SRBM to analyze RNA-seq data associated with Huntington’s disease. | Gene expression dataset of 12 samples | AUC: 0.522 | Did not explore other methodologies |

| References | Approaches Used | Main Contribution | Dataset | Performance Evaluation Metrics | Limitation |
|---|---|---|---|---|---|
| | | and SVMs, tailored to specific classes, with a focus on gait classification. | Dataset of gait measurements of 58 subjects | Accuracy = 90.5% | Restricted to only two pathological populations |
| [22][76] | Support Vector Machine | Investigates the viability of employing machine learning and statistical methods to aid in distinguishing neurodegenerative conditions through gait analysis. | Gait in Neurodegenerative disease dataset of 64 patients | Accuracy = 86.9% | Use of irrelevant features |
| [23][77] | Support Vector Machine | Focuses on developing imaging biomarkers for neurodegenerative disease, specifically HD. | Voxel based data of 64 individuals | Accuracy = 76% | Classification pre-HD subjects >22 YTO or >14 YTO |
| [24][78] | Support Vector Machine | Explores the use of ML methods, specifically the SVM algorithm, to classify individuals with HD based on oculomotor performance | Recorded eye movement data of 50 participants | Classifying: Accuracy = 73.4%; Distinguishing: Accuracy = 81.8% | Relatively small number of individuals per group |
| [25][79] | Support Vector Machine | Investigates the use of SVMs in categorizing HD stages, utilizing metrics extracted from T1-weighted and diffusion-weighted imaging data. | MRI-derived datasets of 68 people | Classifying: Accuracy = 85–95%; Distinguishing: Accuracy = 74% | Small training sample size |
| [26][80] | Support Vector Machine | Utilization of a pharmacologic strategy to explore a newly developed traditional Chinese medicine (TCM) formulation for HD therapy. | TCM database | CoMFA − R² = 0.9488; CoMSIA − R² = 0.9555 | Limited only to HD |
| [19][73] | Support Vector Machine | Recommends an automated method for diagnosing HD by examining gait dynamics | Gait in Neurodegenerative disease dataset of 36 people | Accuracy = 100% | Limited models explored |
| [20][74] | Random Forest | Utilising machine learning methods to pinpoint potential genes that play a role in HD | GSE33000 dataset of 314 subjects | Accuracy = 90.45% | Small data samples |
| [27][81] | Random Forest | Aims to assess the ability of clinical and biological factors to forecast the advancement of HD. | Enroll-HD periodic dataset (PDS6) of 15,301 subjects | NIL | Focused on clinical variables only |
| [28][82] | Random Forest | Discover potential microRNA biomarkers associated with susceptibility to Juvenile Onset HD. | JOHD miRNA-mRNA expression dataset (GSE65776) of 168 samples | 100% AUC | Limited to Juvenile Onset HD |
| [29][83] | K-NN | Suggested an innovative method to identify HD by analyzing digitized voice recordings of patients reciting Lithuanian poems. | Own audio dataset of 24 patients | Accuracy = 97.3% | Smaller dataset |
| [30][84] | LogitBoost, Random Forest | Enhance the accuracy of classifying HD patients using gait data while simultaneously minimizing the reliance on a reduced number of sensor devices for data acquisition. | HD gait dataset of 28 gait features | For raw data: Accuracy = 94.4%; For gait features: Accuracy = 92.8% | Analyses only two experiment results |
| [31][85] | Ensemble Model | Creation and presentation of a ML model based on stacked ensemble techniques for predicting the individual stages of HD. | TRACK-HD dataset of 184 HD patients | Accuracy = 55.3% ± 6.1 | Research solely on baseline cross-sectional data only |
| [32][86] | Automatic Machine Learning | Development of a ML model that can predict clinical performance in HD using brief samples of speech recordings | 126 samples of audio | | |
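The tabulated studies report accuracy, F1 score, R value, and MSE. As a reading aid, the sketch below shows how such figures are commonly computed from predictions. The helper names and the small example arrays are hypothetical and do not come from any of the cited studies, and the R value is assumed here to be the Pearson correlation coefficient between predicted and observed values.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy and F1 score for binary labels (1 = HD, 0 = control in this toy example)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1}

def regression_metrics(y_true, y_pred):
    """Mean squared error and Pearson correlation (assumed meaning of the R value)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(np.mean((y_true - y_pred) ** 2))
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    return {"mse": mse, "r": r}

# Hypothetical labels and scores, for illustration only.
cls = classification_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
reg = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(cls, reg)
```

On small or imbalanced cohorts, accuracy alone can be misleading, which is one reason several of the studies report F1 or AUC alongside it.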