Exploring Huntington’s Disease Diagnosis via Artificial Intelligence-Powered Models


Please note this is a comparison between Version 1 by Jayakumar Kaliappan and Version 2 by Fanny Huang.

Huntington’s Disease (HD) is a devastating neurodegenerative disorder characterized by progressive motor dysfunction, cognitive impairment, and psychiatric symptoms. The early and accurate diagnosis of HD is crucial for effective intervention and patient care.

Huntington’s disease
Artificial Intelligence
machine learning
deep learning
diagnosis

1. Introduction

Huntington’s disease is a profoundly impactful neurodegenerative disorder [1] that affects not only individuals but also their families [2]. It presents a complex clinical picture, marked by the progression of motor dysfunction, cognitive decline, and psychiatric symptoms, ultimately culminating in profound disability and a shortened lifespan [3]. The gravity of HD has prompted growing concern among the medical and research communities worldwide, triggering multifaceted efforts both to unravel its etiology and pathophysiology and to advance its early detection and management [4]. Research in other neurodegenerative diseases such as Alzheimer’s [5,6,7] and Parkinson’s [8] has similarly aimed to decode their mechanisms, improving the understanding of their pathology and paving the way for potential treatments.

Recent epidemiological studies have illuminated the prevalence and incidence of HD, revealing the stark reality of this disease. The pooled incidence of HD, as reported across various populations, has been estimated at 0.48 cases per 100,000 person-years (95% CI, 0.33–0.63). This statistic underscores the challenging nature of diagnosing HD, especially in its nascent stages, given its relative rarity. Furthermore, a continent-based analysis of these figures uncovers marked disparities in the incidence of HD, with Europe and North America experiencing considerably higher rates compared to Asia. Beyond incidence, comprehending the prevalence of HD is essential for effective healthcare planning and resource allocation. The compiled prevalence of HD stands at 4.88 per 100,000 (95% CI, 3.38–7.06) [9], shedding light on the overall burden of the disease within populations. These prevalence figures not only serve as an alarming reminder of the global health concern that HD represents but also emphasize the urgent need for concerted efforts to enhance its diagnosis, treatment, and support systems for affected individuals and their families.

The pathological progression of Huntington’s disease (HD) remains elusive and has drawn attention from varied research domains. One study [10] explores motor speech patterns, reflecting the motor involvement in HD. Another [11] examines speech biomarkers across HD stages, emphasizing the continuum from pre-symptomatic to early manifest disease. A further article [12] contributes insights into huntingtin’s multifaceted role, spanning neurodevelopment to neurodegeneration. On synaptic loss, ref. [13] and another study in Nature Medicine [14] highlight the early involvement of microglia, complement activation, and innate immune mechanisms in corticostriatal synapse decline. The study in [15] offers perspectives on the toxic effects of mutant huntingtin. Together, these lines of work underscore the complexity of HD’s pathological cascade, spanning molecular, synaptic, and clinical dimensions.

2. Exploring Huntington’s Disease Diagnosis via AI-Powered Models

2.1. Preamble—Diagnosis of Huntington’s Disease via AI-Powered Models

The need for AI-powered approaches in HD diagnosis arises from the pressing demand for early and accurate detection of this debilitating neurodegenerative disorder. HD is a multifaceted condition with a broad range of clinical manifestations, making it challenging for clinicians to diagnose, particularly in its early stages. Machine learning (ML) and deep learning (DL) models have demonstrated their potential in handling the intricate, multi-modal data associated with HD, including genetic information, neuroimaging scans, and clinical assessments. These models can leverage large datasets to identify subtle patterns and biomarkers that might elude human observers, enabling earlier and more precise diagnosis. Additionally, automated diagnostic tools can alleviate the burden on healthcare professionals, streamline the diagnostic process, and ultimately lead to better patient outcomes. Given the progressive nature of HD and the lack of a definitive cure, the timely diagnosis facilitated by ML and DL techniques is important for initiating appropriate interventions, providing counseling, and advancing research into potential therapies. The integration of ML and DL models into HD diagnosis therefore holds significant promise for enhancing the quality of life of individuals and families affected by this disease.

2.2. Machine Learning Techniques

Machine learning techniques have emerged as valuable tools in the diagnosis and assessment of Huntington’s disease. These techniques use various algorithms and computational approaches to analyze complex datasets, offering clinicians and researchers new insights into the disease [16]. One of the primary applications of machine learning in HD diagnosis is the identification of biomarkers and patterns within medical images, such as magnetic resonance imaging (MRI) and functional MRI (fMRI) [17]. Machine learning algorithms can detect subtle changes in brain structure and function, helping to distinguish individuals with HD from healthy controls and providing a means to monitor disease progression over time. Additionally, machine learning models can analyze clinical data, including motor, cognitive, and psychiatric assessments, to identify relevant features and patterns that contribute to accurate diagnosis and prognosis [18].

Machine learning also plays a crucial role in predictive modeling for HD risk assessment. By incorporating genetic data and other relevant factors, machine learning algorithms can predict an individual’s likelihood of developing HD, aiding in early intervention and counseling. Furthermore, machine learning approaches can be applied to large-scale genetic studies to identify genetic modifiers and other factors associated with the age of onset and the rate of disease progression. This information not only deepens our understanding of the disease but also has implications for the development of targeted therapies. In summary, machine learning techniques are advancing the field of HD diagnosis by facilitating the extraction of valuable insights from clinical and genetic data, ultimately leading to earlier detection and improved management of this neurodegenerative disorder.

2.2.1. Naive Bayes

The Naive Bayes classifier has been used to discriminate gait signals of individuals with HD from those of individuals without the condition [19]. The study reports an accuracy of 94.4% for the Naive Bayes classifier in this diagnostic context. This result highlights Naive Bayes as a useful machine learning tool for Huntington’s disease diagnosis, offering the potential for non-invasive and objective assessment of individuals based on their gait dynamics and aiding the timely identification and continuous tracking of the condition.
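To make the classification step concrete, the following is a minimal sketch of a Gaussian Naive Bayes classifier written from scratch. It is not the pipeline of the cited study: the two features (mean stride time and stride-time variability), the toy values, and the function names `fit_gaussian_nb` and `predict_gaussian_nb` are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a class prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Hypothetical features: [mean stride time (s), stride-time variability (s)].
X = [[1.05, 0.02], [1.10, 0.03], [1.08, 0.02],   # control
     [1.20, 0.12], [1.15, 0.15], [1.30, 0.10]]   # HD
y = ["control", "control", "control", "HD", "HD", "HD"]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.25, 0.11]))
```

The "naive" part is the assumption that features are independent given the class, which is why the log-likelihoods of the individual features are simply summed.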

2.2.2. Decision Tree

The Decision Tree is another effective tool in the diagnosis of Huntington’s disease [19]. In that study, the Decision Tree achieved an average accuracy of 100% in classifying gait signals from subjects with HD, which underscores the ability of the algorithm to distinguish individuals with HD based on their gait dynamics. The Decision Tree has also been employed for the prediction and identification of potential contributing genes in Huntington’s disease [20]. This method uses the Decision Tree to formulate rules over attributes, specifically genes, and to determine the prediction class, which denotes whether a given sample is associated with HD or not. The Decision Tree model achieved a cross-validated classification accuracy of 90.79% with a standard deviation of 4.57% when applied to expression data from prefrontal cortex samples.

2.2.3. Support Vector Machine

Support Vector Machine (SVM) is a widely used classifier for gait classification in the context of Huntington’s disease diagnosis, alongside other pathological conditions [21]. In that work, SVM is used to differentiate gait patterns among diverse clinical groups, including individuals with Huntington’s disease, post-stroke patients, and healthy elderly individuals, using data collected from inertial sensors. The classifier is trained on features derived from subject-specific Hidden Markov Models (HMMs), encompassing time- and frequency-domain signal data; it employs leave-one-subject-out cross-validation and works in conjunction with three HMMs that assess likelihoods for gait classification. SVM has also been used as a supervised classification method for the discrimination of neurodegenerative diseases, including Huntington’s disease, through gait analysis [22]. This approach uses SVM as a prediction model to classify and monitor the target diseases. The SVM model also aids in identifying the most predictive features extracted from the gait analysis dataset, refining the disease discrimination process. It achieves an accuracy of 86.9% in distinguishing these neurodegenerative diseases, underscoring its value for enhancing the diagnosis and understanding of HD and similar conditions through the analysis of gait patterns.
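The leave-one-subject-out scheme mentioned above can be sketched as follows. The point of the scheme is that all recordings from one subject are held out together, so the classifier is never tested on a person it has seen during training. The helper name `leave_one_subject_out` and the subject identifiers are hypothetical and are not taken from the cited work.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical mapping of gait segments to subjects (several segments per subject).
subject_ids = ["s1", "s1", "s2", "s2", "s2", "s3"]

for held_out, train, test in leave_one_subject_out(subject_ids):
    print(held_out, train, test)
```

Splitting by segment rather than by subject would let segments from the same person appear on both sides of the split, which tends to inflate the reported accuracy.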

SVM is also used to classify individuals as either pre-HD or controls based on neuroimaging data [23]. Together with linear discriminant analysis (LDA), SVM is used to develop classification models capable of decoding information about the disease state from neuroimaging data. These classification models reach up to 76% accuracy in distinguishing pre-HD from control individuals based on their neuroimaging profiles. SVM is further applied to classify subjects by their HD stage based on oculomotor features [24]. The accuracy of the SVM classifier varies with the classification task: 73.47% for distinguishing control participants from pre-HD participants, 81.84% for distinguishing control participants from HD subjects, and 83.54% for distinguishing pre-HD subjects from HD patients. Linear SVM is employed to classify eye-tracking data across pre-HD, HD, and control groups, using different combinations of features to optimize performance. The study reports a best accuracy of 76.88% for the control vs. HD classifier and 72.50% for the pre-HD vs. HD classifier, underscoring the utility of oculomotor features for differentiating disease stages.

SVM is also used to classify HD stages based on features extracted from T1- and diffusion-weighted imaging data [25]. Different feature selection techniques are employed, such as whole-brain gray matter (GM) or fractional anisotropy (FA) values, subcortical region-of-interest GM or FA values, and automated GM or FA value selection via the Relief-F algorithm. This research shows the adaptability of SVM, which separates Early-HD from Pre-HD or healthy individuals with accuracies ranging from 85% to 95%. Moreover, SVM discriminates Pre-HD from controls using the FA feature of the caudate region, achieving an accuracy of 74%.

An SVM model is also used to support the development of a novel formula for HD [26]. SVM, a versatile machine learning algorithm used for both classification and regression tasks, is trained in this research on documented Traditional Chinese Medicine (TCM) prescriptions. The objective is to identify a formula that can target multiple proteins associated with HD, leveraging the SVM model’s ability to work with high-dimensional data and complex datasets. SVM is also used to classify gait signals from unknown subjects, distinguishing between those suffering from HD and healthy subjects [19]. Experimental results show an average accuracy of 100.0% in classifying gait signals, underscoring SVM’s capacity to use gait dynamics data for differentiating HD patients from unaffected individuals.

2.2.4. Random Forest

In ref. [20], Random Forest is employed to identify contributing genes in Huntington’s disease. The approach uses Random Forest to analyze postmortem prefrontal cortex samples from HD patients and control subjects, with the objective of identifying genes potentially linked to HD pathogenesis. Random Forest is beneficial in this context because it reduces the dimensionality of the data and highlights the most relevant genes implicated in the pathophysiology of HD. The Random Forest model achieved an accuracy of 90.45 ± 4.24%, highlighting its role in aiding the diagnosis and comprehension of HD by deciphering the genetic components influencing its onset and progression. Random Forest has also been used as a supervised classification model for the discrimination of neurodegenerative diseases, including Huntington’s disease, through gait analysis, characterizing these diseases using features extracted from gait cycles [22]. The study reports an accuracy of 84.9% for the Random Forest model, highlighting its contribution to the discrimination of HD and other neurodegenerative conditions based on gait patterns. Ref. [27] focuses on assessing the significance and order of importance of potential factors that could predict the progression of clinical symptoms in patients with manifest HD. It does so by employing a random forest regression model to forecast how clinical outcomes change based on these factors.

Random Forest has also been employed to identify microRNA biomarkers indicative of susceptibility to Juvenile Onset Huntington’s Disease (JOHD) [28]. The research uses Random Forest to build predictive models that distinguish between JOHD and wild-type (WT) samples using mouse cortex samples from both young and aged groups, and that forecast the predisposition toward those genotypes. The approach yields several robust models with testing accuracies exceeding 80% and Area Under the Curve (AUC) scores surpassing 90%, including a mature mRNA-based model that achieves a 100% AUC score. This application of Random Forest underscores its potential not only for identifying microRNA biomarkers but also for the diagnosis and predisposition assessment of Juvenile Onset HD.

2.2.5. K-Nearest Neighbours

K-Nearest Neighbors (KNN) is another classifier used for the diagnosis of HD based on gait dynamics. KNN, a well-known machine learning classifier, distinguishes individuals with HD from healthy subjects by classifying gait signals from unknown subjects [19]. The study reports an accuracy of 97.2% for the KNN classifier, underscoring its effectiveness in differentiating individuals with HD from those without the condition through the analysis of gait dynamics.

KNN is also employed for the identification of HD through audio signal processing [29]. It is applied in the classification stage, following dimensionality reduction of the voice signals. The study shows that combining the emobase2010 feature extractor with the KNN classifier achieves an accuracy of 97.3%. This accuracy is achieved while keeping the prediction time below one second, highlighting the practical utility of KNN for Huntington’s disease diagnosis through audio signal analysis.
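As a rough illustration of the KNN decision rule used in both studies above, the sketch below classifies a query point by majority vote among its nearest training points. The two-dimensional feature vectors stand in for already-reduced gait or voice features and are invented for illustration; the function name `knn_predict` and the choice of k = 3 are likewise assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical low-dimensional features after dimensionality reduction.
train_X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
           [0.80, 0.90], [0.90, 0.80], [0.85, 0.95]]
train_y = ["control", "control", "control", "HD", "HD", "HD"]

print(knn_predict(train_X, train_y, [0.82, 0.88]))
```

KNN has no training phase beyond storing the data, so prediction time grows with the size of the training set; reducing the feature dimension beforehand, as the audio study does, keeps each distance computation cheap.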

2.2.6. Ensemble Models

Ref. [30] highlights the significance of ensemble classifier algorithms in distinguishing gait patterns between individuals affected by HD and those who are healthy. The methodology combines individual classifier algorithms such as Logitboost and RandomForest, where Logitboost serves as the metaclassifier and RandomForest acts as the base classifier. This combination outperforms other decision tree algorithms in classifying HD gait data. With the Logitboost and RandomForest combination, the method classified 13 out of 14 subjects correctly and identified all seven individuals with HD. This shows the promise of ensemble classifiers, specifically when using ankle-mounted iPhone sensor data, for advancing the accuracy and efficiency of HD diagnosis through gait analysis.
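The counts reported above can be turned into standard metrics with a short calculation. This sketch assumes that the seven subjects without HD are the healthy controls and that the single misclassified subject is therefore one of them; the source states only the 13-of-14 and 7-of-7 figures, so the specificity value is an inference rather than a reported result.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # fraction of HD subjects correctly identified
    specificity = tn / (tn + fp)   # fraction of healthy subjects correctly identified
    return accuracy, sensitivity, specificity


# 7 HD subjects all detected; assumed 7 controls with one misclassified.
acc, sens, spec = binary_metrics(tp=7, fn=0, tn=6, fp=1)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

With only 14 subjects, a single additional error would shift the accuracy by about seven percentage points, which is worth keeping in mind when comparing this result with those from larger cohorts.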

The research in [31] presents an ensemble machine learning model that consistently outperforms nine conventional machine learning models in terms of accuracy. The ensemble model achieves a balanced accuracy of 55.3% ± 6.1 in a 4-group classification of HD progression states. Higher accuracies are observed in binary classifications, ranging from 70.9% ± 9.4 to 83.3% ± 6.3. The accuracy of the ensemble model improves further when volumetric scores from diverse brain regions, including the occipital cortex, lateral ventricles, cingulate, and temporal lobe, are incorporated in addition to the striatal structures. This points to the potential of ensemble learning algorithms for improving the precision of HD diagnosis from structural MRI data.
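Balanced accuracy, the metric quoted for the 4-group task, is the mean of the per-class recalls. A classifier that always predicts the same one of four groups scores 25% under this metric, which gives a reference point for the 55.3% figure. The sketch below uses an invented two-class example with imbalanced groups; the function name and the labels are assumptions.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Imbalanced toy labels: a classifier that always answers "control".
y_true = ["control"] * 6 + ["HD"] * 2
y_pred = ["control"] * 8

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain, balanced_accuracy(y_true, y_pred))
```

Here plain accuracy rewards the majority-class guess, while balanced accuracy falls to the two-class chance level because the HD group is never recognized.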

2.2.7. Automatic Machine Learning

Automatic machine learning (AutoML) automates the selection and optimization of machine learning models, reducing the need for manual intervention in model selection and tuning [32]. In this investigation, auto-sklearn, which relies on Bayesian optimization, is used to identify the best-performing model on the training dataset, improving the effectiveness and precision of the predictions. AutoML enables the integration of various speech features with demographic variables to predict cognitive, motor, and functional scores in HD. Moreover, it supports the creation of fully automated methods for speech analysis, potentially minimizing the need for manual annotations and enabling remote assessment of individual conditions in Huntington’s disease and similar neurodegenerative disorders. While the paper does not explicitly mention the accuracy of the AutoML models, it emphasizes the significant improvement in predictions when combining speech features with demographic variables, showing its potential for accurate assessment in HD.

2.2.8. Summary of Machine Learning Models

In summary, machine learning models have emerged as powerful tools for the diagnosis and understanding of HD. Various ML algorithms, as described in the preceding sections, have been applied to diverse data sources such as gait dynamics, genetic information, neuroimaging data, and speech recordings to enhance HD diagnosis and prognosis, as summarized in Table 1. These models have shown high accuracy rates, often surpassing 90%, and have the potential to contribute to early detection, monitoring, and understanding of HD. However, several limitations persist across these studies, including the need for larger and more diverse datasets, the interpretability of complex models, and ethical considerations related to data privacy and security. Moreover, generalization to larger populations and clinical settings remains a challenge. The incorporation of automatic machine learning (AutoML) approaches is a promising direction for automating model selection and parameter tuning, potentially making these ML models more accessible for clinical deployment. Overall, ML models offer substantial potential for improving HD diagnosis, but further research and validation are needed to fully harness their capabilities and ensure their clinical utility.

Table 1. Summary of Machine Learning Models.

Reference	Machine Learning Approaches Used	Main Contributions	Dataset	Performance Evaluation Metrics	Limitation
[19]	Decision Tree	Proposes an automated method for evaluating gait dynamics as a means of diagnosing HD.	Gait in Neurodegenerative Disease dataset of 36 people	Accuracy = 100%	Limited models explored
[20]	Decision Tree	Identification of potential genes contributing to HD	GSE33000 dataset of 314 subjects	Accuracy = 90.79%	Small data samples
[21]	Support Vector Machine	Introduces an approach centered around training classifiers such as Hidden Markov Models and SVMs, tailored to specific classes, with a focus on gait classification.	Dataset of gait measurements of 58 subjects	Accuracy = 90.5%	Restricted to only two pathological populations
			recordings of HD gene carriers	Relative error from 12.7% to 20%	Small number of participants

2.3. Deep Learning Techniques

Deep learning approaches have become potent instruments for advancing HD research and diagnosis. These techniques use artificial neural networks (ANNs) with stacked layers to automatically learn intricate patterns and representations from complex datasets. Deep learning models excel at capturing hierarchical and abstract features from diverse data sources, such as neuroimaging scans, genetic data, and clinical assessments. The application of deep learning in the context of HD has shown promise in enhancing diagnostic accuracy, predicting disease progression, and uncovering fresh perspectives on the fundamental mechanisms behind the disorder.

2.3.1. Artificial Neural Network

The study [33] introduces a mathematical model, incorporating an Artificial Neural Network (ANN), that simulates HD and replicates the behavior of individuals affected by the disease. Specifically, the ANN within the model is trained using data and physiological insights concerning the basal ganglia (BG), the region of the brain primarily impacted by HD. The model serves as an analytical tool for studying HD behavior, offering insights into the underlying causes of movement disorders in HD patients. By employing an ANN in mathematical models of brain performance, particularly of the BG in HD, this research contributes to medical knowledge and sheds light on the intricacies of brain function in individuals with Huntington’s disease.

The research in [34] introduces a hybrid model designed to assess the symptoms of individuals with Huntington’s disease. The model combines the predictive capabilities of an ANN with the interpretability afforded by a fuzzy logic system (FLS). The ANN component achieved a regression R value of 0.98, along with a low mean squared error (MSE) of 0.08. These metrics support the accuracy of the model in predicting the functional capacity level (FCL) of an individual. Complementarily, the FLS component offers a conclusive evaluation of the subject’s reaction condition, further enhancing the model’s interpretability. This combination of ANN and FLS enables a comprehensive evaluation of HD symptoms, drawing on both predictive capability and linguistic interpretation. The model holds potential for improving the daily lives of HD patients, offering a means to monitor and predict disease progression for improved care and management.
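The two regression metrics quoted above can be computed as in the sketch below. It assumes that the reported R value is the Pearson correlation between predicted and target values, which is a common convention for neural network regression plots but is not defined in the source. The target and prediction values are invented for illustration.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    norm_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    return cov / (norm_t * norm_p)


# Hypothetical functional capacity levels and model predictions.
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.1, 1.9, 3.2, 3.8]

print(mean_squared_error(targets, predictions), pearson_r(targets, predictions))
```

The two numbers answer different questions: R measures how well predictions track the targets up to a linear rescaling, whereas MSE is expressed in squared units of the target and penalizes any offset, so it is best read alongside the range of the predicted score.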

Ref. [35] introduces a comprehensive model that employs a range of Artificial Neural Network (ANN) models to analyze data gathered from smart devices, such as smartphones or tablets, in order to forecast the functional capacity level of individuals with HD. The ANN models include Cascade forward backpropagation (CFBP), Feed-forward backpropagation (FFBP), Elman, Generalized regression neural network (GRNN), Nonlinear autoregressive exogenous model (NARX), Layer recurrent neural network (RNN), and Feed-forward time delay neural network (FFTDNN). The paper details the entire process, from data preparation and labeling to the selection of learning algorithms, specific neural network training, performance evaluation, and comparative analysis. This study represents a step toward using such technology for a more precise assessment of functional capacity levels in HD patients.

Ref. [36] underscores the value of non-linear techniques, particularly ANNs, for understanding HD. The authors present an approach that uses ANNs to discriminate between control subjects and those affected by HD based on DNA CpG methylation data. What sets this approach apart is its capacity to reduce the number of CpGs considered from hundreds of thousands down to 237. The study’s results show that by focusing on these 237 CpGs and employing non-linear techniques such as ANNs, control subjects and HD patients can be accurately differentiated. Overall, the paper advocates for the role of artificial neural networks in the diagnosis of Huntington’s disease, particularly when leveraging DNA CpG methylation data.

2.3.2. Deep Neural Network

A Deep Neural Network (DNN) is used to identify HD by analyzing speech signals [37]. The approach uses a combination of acoustic and lexical features for automated detection. The DNN model is trained and validated with a Leave-One-Subject-Out (LOSO) methodology, in which individual subjects are consecutively held out as the test speakers. The study observes that the accuracy of this method increases with advancing disease stages, which underscores the potential of speech as a biomarker for monitoring HD progression. The performance of the DNN model, alongside other deep learning models, is quantified using the word error rate (WER), which ranges from 9.4 to 14.9. These results support the notion that objective analyses using DNNs and similar deep learning models hold promise for distinguishing between healthy individuals and those with HD. This not only reinforces clinical diagnoses but also facilitates symptom tracking in non-laboratory and non-clinical settings, a step toward improved healthcare management for individuals affected by Huntington’s disease.
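Word error rate is conventionally defined as the word-level edit distance between a reference transcript and a recognized transcript, divided by the number of reference words. The sketch below follows that conventional definition; whether the cited study computes WER in exactly this way is an assumption, and the example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("quick" -> "quack") and one deletion ("fox").
print(word_error_rate("the quick brown fox", "the quack brown"))
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.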

2.3.3. Deep Convolutional Neural Network

Ref. [38] examines the application of deep learning models, specifically VGG16 and 3D CNN, for diagnosing Huntington’s disease. The study reveals that VGG16, a well-established architecture, holds promise for classifying disease severity by analyzing pressure data from individual footsteps, achieving 89% accuracy. Its proficiency in extracting features such as edges and corners contributes to accurate classification. The 3D CNN, though slightly less accurate at 82%, shows potential for improvement when combined with models such as VGG16. The paper suggests that, because 3D CNN may have slightly different feature extraction capabilities compared to the novel model used, combining their strengths could lead to even more accurate Huntington’s disease diagnosis. This integrated approach is a promising step toward refining disease classification, with implications for both clinical practice and research in this field.

2.3.4. Extreme Learning Machine

Extreme Learning Machine (ELM) models have been proposed for predicting the progression of Huntington’s disease based on brain scans. The approach, referred to as Brute-force Missing Data Extreme Learning Machine, shows significant potential in this domain [39]. It leverages the ELM framework to train models on datasets with missing data in both the inputs and the desired outputs. The approach individually constructs and trains a model for each sample in the test set, and does so efficiently, without the need for repeated access to the training data. Experimental comparisons conducted in the study reveal highly promising results, which underscores the effectiveness of ELM in this setting. By addressing missing data and leveraging the ELM framework, this approach is a step forward in predicting the progression of HD, with potential for advancing early diagnosis and intervention strategies for individuals affected by this condition.

2.3.5. Deep Boltzmann Machine

Ref. [40] introduces an approach that uses a stacked restricted Boltzmann machine (SRBM) to analyze RNA-seq data for Huntington’s disease diagnosis. This deep learning technique is specifically tailored to identify key genes implicated in the progression of HD. By examining differentially activated neurons and changes in gene energy at various time intervals, SRBM screens disease-associated factors and genes. Experimental results demonstrate its ability to discern crucial information in time series gene expression datasets. This leads to a significant improvement in the accuracy of identifying disease-associated genes and predicting top-ranking genes, surpassing current state-of-the-art methods. Moreover, SRBM outperforms other computational approaches in analyzing gene expression data of HD-afflicted mice across distinct spans of time. Its automatic feature learning capacity, coupled with its precision in identifying disease-associated genes, makes SRBM an effective means of understanding the genetic underpinnings of HD progression.

2.3.6. Summary of Deep Learning Models

The DL models summarized in Table 2 represent a significant shift in the diagnosis and understanding of HD. They draw on varied data modalities, including gait dynamics, genetic information, neuroimaging data, speech signals, and RNA-seq data, to improve the accuracy and depth of HD diagnosis and prognosis. Models such as DNNs show strong predictive capabilities, with high reported accuracy and potential for monitoring HD progression. However, DL models often require large and diverse datasets for training, and their complexity can limit interpretability and clinical applicability. Validation in diverse populations and clinical settings is therefore essential before these models can be considered suitable for widespread clinical deployment. Nonetheless, DL models mark a substantial advance in HD diagnosis, offering valuable insights and paving the way for more accurate and personalized care for individuals affected by this debilitating disease.

Table 2. Summary of Deep Learning Models.

References	Deep Learning Approaches Used	Main Contribution	Dataset	Performance Evaluation Metrics	Limitation
^[33][87]	Artificial Neural Network	Creating a mathematical model with a grey box approach to replicate the characteristics of Huntington’s disease disorders.	Gait Signal dataset of 36 people	NIL	Limited to pharmaceutical treatments only
^[34][88]	Artificial Neural Network	Creating a hybrid framework that merges an ANN with a Fuzzy Logic System (FLS).	Dataset of 3032 examples from 20 test subjects	R value: 0.98 MSE value: 0.08	Small dataset
^[35][89]	Artificial Neural Network	Creation of an ANN model aimed at forecasting the functional capacity status of individuals.	Dataset of 200 examples from 10 subjects	R value: 0.995 MSE value: 0.108	Inadequate dataset
^[36][90]	Artificial Neural Network	Creating a biomarker utilizing DNA CpG methylation data to identify HD.	DNA methylation data of 76 samples	CMP: 0.92 CP: 0.86	Small size of the datapool
^[37][91]	Deep Neural Network	Development of an objective and non-invasive acoustic biomarker that can detect HD	Data from HD study of 62 speakers	Accuracy = 87%	Insufficient features
^[38][92]	Deep Convolutional Neural Network	Creating a DL-driven method to analyze gait patterns in individuals with HD.	Foot pressure data of 12 patients	Accuracy = 82%	Preprocessing module can be further optimized
^[39][93]	Extreme Learning Machine	Built an innovative technique to educate ELM models using datasets containing absent data points.	Huntington’s disease dataset of 3729 samples from 1370 subjects	F1 score: 0.98	Performance loss on smaller features
^[40][94]	Deep Boltzmann Machine	Proposal of SRBM to analyze RNA-seq data associated with Huntington’s disease.	Gene expression dataset of 12 samples	AUC: 0.522	Did not explore other methodologies

		and SVMs, tailored to specific classes, with a focus on gait classification.	Dataset of gait measurements of 58 subjects	Accuracy = 90.5%	Restricted to only two pathological populations
^[22][76]	Support Vector Machine	Investigates the viability of employing machine learning and statistical methods to aid in distinguishing neurodegenerative conditions through gait analysis.	Gait in Neurodegenerative disease dataset of 64 patients	Accuracy = 86.9%	Use of irrelevant features
^[23][77]	Support Vector Machine	Focuses on developing imaging biomarkers for neurodegenerative disease, specifically HD.	Voxel based data of 64 individuals	Accuracy = 76%	Classification pre-HD subjects >22 YTO or >14 YTO
^[24][78]	Support Vector Machine	Explores the use of ML methods, specifically the SVM algorithm, to classify individuals with HD based on oculomotor performance	Recorded eye movement data of 50 participants	Classifying: Accuracy = 73.4% Distinguishing: Accuracy = 81.8%	Relatively small number of individuals per group
^[25][79]	Support Vector Machine	Investigates the use of SVMs in categorizing HD stages, utilizing metrics extracted from T1-weighted and diffusion-weighted imaging data.	MRI-derived datasets of 68 people	Classifying: Accuracy = 85–95% Distinguishing: Accuracy = 74%	Small training sample size
^[26][80]	Support Vector Machine	Utilization of a pharmacologic strategy to explore a newly developed traditional Chinese medicine (TCM) formulation for HD therapy.	TCM database	CoMFA − R² = 0.9488 CoMSIA − R² = 0.9555	Limited only to HD
^[19][73]	Support Vector Machine	Recommends an automated method for diagnosing HD by examining gait dynamics	Gait in Neurodegenerative disease dataset of 36 people	Accuracy = 100%	Limited models explored
^[20][74]	Random Forest	Utilising machine learning methods to pinpoint potential genes that play a role in HD	GSE33000 dataset of 314 subjects	Accuracy = 90.45%	Small data samples
^[27][81]	Random Forest	Aims to assess the ability of clinical and biological factors to forecast the advancement of HD.	Enroll-HD periodic dataset (PDS6) of 15,301 subjects	NIL	Focused on clinical variables only
^[28][82]	Random Forest	Discover potential microRNA biomarkers associated with susceptibility to Juvenile Onset HD.	JOHD miRNA-mRNA expression dataset (GSE65776) of 168 samples	100% AUC	Limited to Juvenile Onset HD
^[29][83]	K-NN	Suggested an innovative method to identify HD by analyzing digitized voice recordings of patients reciting Lithuanian poems.	Own audio dataset of 24 patients	Accuracy = 97.3%	Smaller dataset
^[30][84]	LogitBoost, Random Forest	Enhance the accuracy of classifying HD patients using gait data while minimizing the number of sensor devices needed for data acquisition.	HD gait dataset of 28 gait features	For raw data: Accuracy = 94.4% For gait features: Accuracy = 92.8%	Analyses only two experiment results
^[31][85]	Ensemble Model	Creation and presentation of a ML model based on stacked ensemble techniques for predicting the individual stages of HD.	TRACK-HD dataset of 184 HD patients	Accuracy = 55.3% ± 6.1	Research based solely on baseline cross-sectional data
^[32][86]	Automatic Machine Learning	Development of a ML model that can predict clinical performance in HD using brief samples of speech recordings	126 samples of audio