Alzheimer’s disease (AD) is a prevalent form of dementia that accounts for up to 80% of all dementia cases. The use of machine learning and feature selection methods in predicting AD based on gene expression data is a rapidly evolving area of research.
1. Alzheimer’s Disease
Alzheimer’s disease (AD) is a progressive brain disorder that was first described by Dr. Alois Alzheimer in 1906. Dr. Alzheimer observed symptoms such as memory loss, paranoia, and psychological changes in his patient, and upon autopsy he noticed shrinkage of the patient’s brain. AD is the most common cause of dementia, a condition that slowly destroys memory and cognitive function and ultimately impairs the ability to carry out daily activities. Currently, AD is ranked as the sixth leading cause of death worldwide, and its symptoms typically appear in individuals over the age of 60. In Saudi Arabia, experts estimate that 3.23% of the population, mostly people aged 65 or older, may have dementia caused by AD. Despite extensive research efforts, there is currently no cure or definitive treatment for AD. Current approaches to managing the disease focus on helping individuals maintain cognitive function, manage behavioral symptoms, and slow the progression of memory loss. However, researchers are actively pursuing therapies that target specific genetic, molecular, and cellular mechanisms in the hope of stopping or preventing the underlying cause of the disease. The complex nature of AD makes it a challenging condition to treat and manage, but continued research and innovative approaches may lead to more effective treatments and improved outcomes for individuals with AD and their families.
2. Supervised Machine Learning
Supervised Machine Learning (SML) is an approach in which a machine learns patterns from labeled data, i.e., examples whose correct outputs are already known, so that it can predict the outputs of new, unseen data. During training, an SML algorithm adjusts its internal parameters to fit these data, and the trained model can then be used for prediction and many other tasks. The field draws on programming, information technology, and mathematics. It is applied in government, marketing, medicine, and any business that collects data and wants to base decisions on them; it is also employed in modeling consumer choices, weather forecasting, and website calculations. Researchers concentrate on various types of SML. Although SML includes many categories and algorithms, we only briefly describe the following: Support Vector Machine, Logistic Regression, Linear Discriminant Analysis, K-nearest neighbor, Decision Tree, and Naïve Bayes.
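To make the idea of learning from labeled data concrete, here is a minimal sketch of K-nearest neighbor, one of the algorithms listed above, in plain Python. It assumes Euclidean distance and a simple majority vote; the function name `knn_predict`, the default `k=3`, and the toy two-dimensional points labeled "A" and "B" are hypothetical illustrations, not data or code from any study.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    # Pair each training point's Euclidean distance to the query with its label.
    neighbors = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    # Count the labels of the k closest training points.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# Toy labeled data (made up for illustration): two well-separated groups.
train_x = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (6.0, 6.0), (6.0, 7.0), (7.0, 6.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_x, train_y, (1.5, 1.5)))  # A
print(knn_predict(train_x, train_y, (6.5, 6.5)))  # B
```

The "training" step here is simply storing the labeled examples; all the work happens at prediction time, which is why K-nearest neighbor is often called a lazy learner.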
Support Vector Machine (SVM)
SVM is a discriminative classifier formally defined by a separating hyperplane. Given labeled training data, the algorithm outputs an optimal hyperplane, i.e., the one with the largest margin to the nearest training points of each class, which it then uses to categorize new examples. In two-dimensional space, this hyperplane is a line dividing the plane into two parts, with each class lying on one side. More generally, a hyperplane is a flat boundary with one dimension fewer than the feature space (a line in two dimensions, a plane in three) that linearly separates and classifies a set of data. Generally, the farther a data point lies from the hyperplane, the more confident we are that it has been correctly classified. Hence, when new test data are added, the side of the hyperplane on which they land decides the class we assign to them.
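Given the solutions $\alpha_i$ and $b$ of the SVM optimization problem, the decision function can be written as
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right)$$
where $x_i$ are the $n$ training samples with class labels $y_i \in \{-1, +1\}$, $K(\cdot, \cdot)$ is the kernel function (for a linear SVM, $K(x_i, x) = x_i^{\top} x$), and only the support vectors have $\alpha_i > 0$.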
One advantage of SVM is its accuracy: it works well on smaller, cleaner datasets, and it can be memory-efficient because its decision function uses only a subset of the training points (the support vectors). Its main drawback is that it is poorly suited to larger datasets, because the training time with SVM can be high.
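The SVM decision function, f(x) = sgn(Σ αᵢ yᵢ K(xᵢ, x) + b), can be evaluated directly once the multipliers αᵢ and the bias b are known. The sketch below assumes a tiny hand-solved example rather than a real solver: two hypothetical support vectors, (1, 1) with label +1 and (−1, −1) with label −1, for which the hard-margin dual gives α₁ = α₂ = 0.25 and b = 0 with a linear kernel. The function names and the RBF `gamma` default are illustrative assumptions; an RBF kernel would require αᵢ and b obtained from an actual SVM solver.

```python
import math

def linear_kernel(u, v):
    """K(u, v) = u . v (dot product)."""
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(u, v, gamma=0.5):
    """K(u, v) = exp(-gamma * ||u - v||^2); gamma is a hypothetical default."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decision_function(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """Raw score: sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b

def predict(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """Sign of the decision function as a class label (+1 or -1; a score of 0 maps to +1)."""
    score = decision_function(x, support_vectors, labels, alphas, b, kernel)
    return 1 if score >= 0 else -1

# Hand-solved hard-margin example (linear kernel only): the dual solution
# alpha_1 = alpha_2 = 0.25, b = 0 gives w = (0.5, 0.5), i.e. the line x1 + x2 = 0.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]
alphas = [0.25, 0.25]
b = 0.0

print(decision_function((2.0, 0.0), support_vectors, labels, alphas, b))  # 1.0
print(predict((-3.0, 1.0), support_vectors, labels, alphas, b))           # -1
```

Each support vector itself scores exactly ±1, which is the margin condition yᵢ f(xᵢ) = 1 that defines the optimal hyperplane.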
3. Feature Selection
Feature selection is the process of removing irrelevant and redundant features from a dataset: the chosen algorithm automatically selects the features that contribute most to the prediction variable or output of interest. Applying feature selection before fitting the data to a classifier reduces training time and can improve accuracy by reducing overfitting.
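3.1. mRMR
mRMR stands for minimum Redundancy Maximum Relevance. mRMR aims to select genes that have low correlation with one another (Minimum Redundancy) but still have high correlation with the classification variable (Maximum Relevance). For classes $c = (c_1, \ldots, c_k)$, the Maximum Relevance condition is to maximize the total relevance of all features in the selected set $S$:
$$\max V_I, \qquad V_I = \frac{1}{|S|} \sum_{i \in S} I(c, i)$$
The Minimum Redundancy condition is
$$\min W_I, \qquad W_I = \frac{1}{|S|^2} \sum_{i, j \in S} I(i, j)$$
where $I(\cdot, \cdot)$ denotes mutual information, $|S|$ is the number of genes in $S$, and $I(i, j)$ is shorthand for $I(g_i, g_j)$, in which $g_i$ and $g_j$ refer to the expression levels of genes $i$ and $j$.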
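mRMR is usually applied greedily: start from the most relevant gene, then repeatedly add the gene whose relevance (mutual information with the class) minus its mean redundancy (mean mutual information with the genes already chosen) is largest. The sketch below assumes that incremental form with the difference criterion and assumes expression values have already been discretized (e.g., into low/high), because mutual information is estimated here from counts. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information I(x; y) in nats for two discrete sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum over observed pairs (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    return sum(
        (n_ab / n) * math.log(n_ab * n / (px[a] * py[b]))
        for (a, b), n_ab in joint.items()
    )

def mrmr(genes, labels, k):
    """Greedy mRMR using relevance minus mean redundancy.

    genes: list of discretized expression profiles, one list per gene.
    labels: class label of each sample.
    k: number of genes to select (must not exceed len(genes)).
    Returns the indices of the selected genes in selection order.
    """
    relevance = [mutual_information(g, labels) for g in genes]
    # Start with the single most relevant gene.
    selected = [max(range(len(genes)), key=lambda i: relevance[i])]
    while len(selected) < k:
        remaining = [i for i in range(len(genes)) if i not in selected]

        def score(i):
            # Relevance minus mean redundancy with the genes already chosen.
            redundancy = sum(mutual_information(genes[i], genes[j]) for j in selected)
            return relevance[i] - redundancy / len(selected)

        selected.append(max(remaining, key=score))
    return selected

# Toy, made-up data: gene 1 duplicates gene 0; gene 2 is independent of gene 0.
labels = [0, 0, 0, 1, 1, 1, 1, 1]
genes = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # gene 0: most relevant
    [0, 0, 0, 0, 1, 1, 1, 1],  # gene 1: exact copy of gene 0 (redundant)
    [0, 0, 1, 1, 0, 0, 1, 1],  # gene 2: weakly relevant, not redundant
]
print(mrmr(genes, labels, k=2))  # [0, 2]
```

A pure relevance ranking would pick genes 0 and 1, which carry identical information; the redundancy term pushes the duplicate down so that the weakly relevant but complementary gene 2 is chosen instead.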
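3.2. CFS
CFS stands for the Correlation-based Feature Selection algorithm. CFS selects attributes using a heuristic that measures the usefulness of individual genes for predicting the class label together with the level of inter-correlation among them, so that highly inter-correlated and irrelevant features are avoided. The method calculates the merit of a subset $S_k$ of $k$ features as
$$\mathrm{Merit}_{S_k} = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$$
where $\overline{r_{cf}}$ is the average value of all feature–classification correlations, and $\overline{r_{ff}}$ is the average value of all feature–feature correlations. The CFS criterion is defined as follows:
$$\mathrm{CFS} = \max_{S_k} \left[ \frac{\sum_{i=1}^{k} r_{cf_i}}{\sqrt{k + 2 \sum_{1 \le i < j \le k} r_{f_i f_j}}} \right]$$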
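The CFS merit, k times the mean feature–class correlation divided by the square root of k + k(k − 1) times the mean feature–feature correlation, can be computed directly from precomputed correlations. This minimal sketch assumes the caller supplies those correlations (e.g., Pearson correlations or symmetrical uncertainty) and, as an added assumption, uses their absolute values so that negative correlations also count. The function name and correlation values are hypothetical.

```python
import math

def cfs_merit(feature_class_corr, feature_feature_corr):
    """Merit = k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)).

    feature_class_corr: correlation of each of the k features with the class.
    feature_feature_corr: k x k matrix of feature-feature correlations.
    Absolute values are used (an assumption for signed correlations).
    """
    k = len(feature_class_corr)
    r_cf = sum(abs(r) for r in feature_class_corr) / k
    # Average over the k(k-1)/2 distinct feature pairs.
    pairs = [abs(feature_feature_corr[i][j]) for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# Two subsets with equal relevance (0.6) but different redundancy (hypothetical values).
low_redundancy = cfs_merit([0.6, 0.6], [[1.0, 0.2], [0.2, 1.0]])
high_redundancy = cfs_merit([0.6, 0.6], [[1.0, 0.9], [0.9, 1.0]])
print(round(low_redundancy, 3), round(high_redundancy, 3))  # 0.775 0.616
```

Both subsets are equally relevant to the class, yet the subset whose features overlap less receives the higher merit, which is exactly the trade-off the heuristic encodes. A full CFS search would evaluate this merit over candidate subsets (e.g., by forward selection) and keep the best one.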
3.3. Chi-Square Test
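The Chi-Square Test is a statistical test used in classification to check whether two variables, here a feature (term) and the class, are independent. In the following equation, high scores on $\chi^2$ indicate that the null hypothesis ($H_0$) of independence should be rejected and thus that the occurrence of the term and the class are dependent:
$$\chi^2(t, c) = \sum_{e_t \in \{0,1\}} \sum_{e_c \in \{0,1\}} \frac{\left(N_{e_t e_c} - E_{e_t e_c}\right)^2}{E_{e_t e_c}}$$
where $e_t$ indicates whether term $t$ occurs, $e_c$ indicates whether a sample belongs to class $c$, $N$ is the observed frequency, and $E$ is the frequency expected under independence.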
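The chi-square statistic sums (observed − expected)² / expected over every cell of a contingency table, with expected counts computed from the row and column totals under independence. Because gene expression is continuous, this sketch assumes each gene has first been discretized (e.g., "high" vs. "low" relative to some threshold); the function name and the 2 × 2 counts are hypothetical.

```python
def chi_square(observed):
    """Pearson chi-square statistic for an r x c contingency table.

    observed[i][j] is the count of samples with feature state i and class j.
    Expected counts assume independence: E_ij = row_i * col_j / N.
    Assumes no row or column total is zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, n_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (n_ij - expected) ** 2 / expected
    return stat

# Hypothetical 2 x 2 table: rows = gene "high" / "low", columns = class 1 / class 0.
table = [[20, 5],
         [10, 15]]
print(round(chi_square(table), 3))  # 8.333
```

A 2 × 2 table has one degree of freedom, for which the 5% critical value is about 3.84, so this hypothetical gene would be judged class-dependent. For feature selection, genes are typically ranked by their statistic and the top-scoring ones are kept.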
3.4. F-Score
F-score is a simple statistical technique for feature selection. It measures how well a feature discriminates between two sets of real numbers, such as a gene’s expression values in the two classes.
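A minimal sketch of F-score ranking follows, assuming the common two-class definition: the squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances. Larger values indicate a more discriminative gene. The function names, the 1/0 label encoding, and the toy values are hypothetical.

```python
def f_score(positive, negative):
    """Two-class F-score of one feature.

    positive / negative: the feature's values in each class (each needs >= 2 values,
    and at least one class must have non-zero variance).
    """
    mean_all = (sum(positive) + sum(negative)) / (len(positive) + len(negative))
    mean_pos = sum(positive) / len(positive)
    mean_neg = sum(negative) / len(negative)
    # Within-class sample variances (denominator n - 1).
    var_pos = sum((x - mean_pos) ** 2 for x in positive) / (len(positive) - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in negative) / (len(negative) - 1)
    # Between-class separation: distance of each class mean from the overall mean.
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    return between / (var_pos + var_neg)

def rank_by_f_score(expression, labels):
    """Return gene indices sorted from most to least discriminative.

    expression[g][s] is the value of gene g in sample s; labels[s] is 1 or 0.
    """
    scores = []
    for values in expression:
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores.append(f_score(pos, neg))
    return sorted(range(len(expression)), key=lambda g: scores[g], reverse=True)

print(f_score([1.0, 2.0, 3.0], [5.0, 6.0, 7.0]))  # 4.0
```

Because each gene is scored independently, the F-score is fast but cannot detect redundancy between genes, which is the gap that methods such as mRMR and CFS address.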
3.5. Genetic Algorithm
Genetic Algorithm (GA) is one of the common wrapper gene selection methods and is usually applied to discrete optimization problems. The main goal of GA is to find the best solution within a group of potential solutions. The method mimics natural selection, in which the fittest individuals are selected for reproduction to produce the offspring of the next generation. Each set of candidate solutions is called a population. Populations consist of vectors, i.e., chromosomes or individuals, and every item in a vector is referred to as a gene.
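For gene selection, a chromosome can be encoded as a binary vector in which a 1 marks a selected gene. The sketch below assumes tournament selection, one-point crossover, bit-flip mutation, and elitism, which are common operator choices rather than the only ones. The `toy_fitness` function (rewarding hypothetical "informative" genes 0, 3, and 5 and penalizing subset size) stands in for what, in a real wrapper method, would be a classifier’s validation accuracy; all names and parameter values are hypothetical.

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=20, generations=50,
                      crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search binary chromosomes (1 = gene selected) for a high fitness value.

    Returns the best chromosome found and the best fitness after each generation.
    Requires n_genes >= 2 and pop_size >= 2.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(population, key=fitness)[:]
    history = [fitness(best)]

    def tournament():
        # Pick two random individuals from the current population; keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring = [best[:]]  # elitism: the best chromosome always survives
        while len(offspring) < pop_size:
            parent1, parent2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_genes - 1)  # one-point crossover
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1[:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            offspring.append(child)
        population = offspring
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate[:]
        history.append(fitness(best))
    return best, history

# Hypothetical fitness: reward selecting genes 0, 3 and 5; penalize subset size.
INFORMATIVE = {0, 3, 5}

def toy_fitness(chromosome):
    hits = sum(chromosome[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(chromosome)

best, history = genetic_algorithm(toy_fitness, n_genes=10)
print(best, history[-1])
```

Elitism guarantees that the best fitness never decreases from one generation to the next; the price of the wrapper approach is that every fitness evaluation would, in practice, require training and validating a classifier.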