Machine learning (ML), an area of artificial intelligence (AI), enables researchers, physicians, and patients to solve some of these issues. Many researchers and practitioners have illustrated the promise of machine-learning-based disease diagnosis (MLBDD), which is inexpensive and time-efficient. Traditional diagnosis processes are costly, time-consuming, and often require human intervention. While traditional diagnosis techniques are restricted by the individual’s ability, ML-based systems have no such limitation, and machines do not get exhausted as humans do. As a result, methods may be developed to diagnose disease even when health care facilities face an unexpected surge of patients. To create MLBDD systems, health care data such as images (e.g., X-ray, MRI) and tabular data (e.g., patients’ conditions, age, and gender) are employed.
1. Basics and Background
Machine learning (ML) is an approach that analyzes data samples to draw conclusions using mathematical and statistical methods, allowing machines to learn without being explicitly programmed. Arthur Samuel introduced machine learning in 1959 through game-playing and pattern recognition algorithms that learn from experience, which was the first time this important advancement was recognized. The core principle of ML is to learn from data in order to forecast or make decisions depending on the assigned task
[9][1]. Thanks to machine learning (ML) technology, many time-consuming jobs may now be completed swiftly and with minimal effort. With the exponential expansion of computing power and data capacity, it is becoming simpler to train data-driven ML models that predict outcomes with high accuracy. Several papers survey the various types of ML approaches
[10,11][2][3].
ML algorithms are generally classified into three categories: supervised, unsupervised, and semisupervised
[10][2]. However, ML algorithms can be divided into several subgroups based on different learning approaches, as shown in
Figure 1. Some of the popular ML algorithms include linear regression, logistic regression, support vector machines (SVM), random forest (RF), and naïve Bayes (NB)
[10][2].
Figure 1. Different types of machine learning algorithms.
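As a minimal illustration of the supervised setting described above, the following sketch trains a logistic regression classifier (one of the algorithms listed) with stochastic gradient descent. The one-feature toy data, the learning rate, the epoch count, and all function names are illustrative assumptions and do not come from any cited study.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            score = sum(w * f for w, f in zip(weights, features)) + bias
            error = sigmoid(score) - label  # gradient of log loss w.r.t. score
            weights = [w - learning_rate * error * f
                       for w, f in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

def predict(weights, bias, features):
    """Return 1 (disease) when the predicted probability is at least 0.5."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical toy data: one standardized measurement per patient,
# label 1 = disease present, label 0 = disease absent.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic_regression(X, y)
predictions = [predict(weights, bias, features) for features in X]
```

The model learns from labeled examples only; no diagnostic rule is written by hand, which is the sense in which supervised ML "learns from data".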
2. Machine Learning Techniques for Different Disease Diagnosis
Many academics and practitioners have used machine learning (ML) approaches in disease diagnosis. This section describes the types of machine-learning-based disease diagnosis (MLBDD) that have received much attention because of their importance and severity. For example, due to the global relevance of COVID-19, several studies have concentrated on COVID-19 detection using ML from 2020 to the present, and these also received greater priority in this study. Severe diseases such as heart disease, kidney disease, breast cancer, diabetes, Parkinson’s, Alzheimer’s, and COVID-19 are each discussed in their own subsection, while other diseases are covered briefly under “Other Diseases”.
2.1. Heart Disease
Many researchers and practitioners have used machine learning (ML) approaches to identify cardiac disease
[37,38][4][5]. Ansari et al. (2011), for example, proposed an automated coronary heart disease diagnosis system based on neurofuzzy integrated systems that yielded around 89% accuracy
[37][4]. One of the study’s significant weaknesses is the lack of a clear explanation for how their proposed technique would work in various scenarios such as multiclass classification, big data analysis, and unbalanced class distribution. Furthermore, there is no explanation about the credibility of the model’s accuracy, which has lately been highly encouraged in medical domains, particularly to assist users who are not from the medical domains in understanding the approach.
Rubin et al. (2017) used deep-convolutional-neural-network-based approaches to detect irregular cardiac sounds. The authors adjusted the loss function to improve sensitivity and specificity on the training dataset. Their suggested model was tested in the 2016 PhysioNet computing competition. They finished second in the competition, with a final specificity of 0.95 and sensitivity of 0.73
[39][6].
Aside from that, deep-learning (DL)-based algorithms have lately received attention in detecting cardiac disease. Miao and Miao (2018), for example, offered a DL-based technique for diagnosing cardiotocographic fetal health based on a multiclass morphologic pattern. The created model is used to differentiate and categorize the morphologic pattern of individuals suffering from pregnancy complications. Their preliminary computational findings include an accuracy of 88.02%, a precision of 85.01%, and an F-score of 0.85
[40][7]. In that study, they employed multiple dropout strategies to address overfitting, which increased training time; they acknowledged this as a tradeoff for higher accuracy.
Although ML applications have been widely employed in heart disease diagnosis, no research has been conducted that addressed the issues associated with unbalanced data with multiclass classification. Furthermore, the model’s explainability during final prediction is lacking in most cases.
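To make the concern about unbalanced data concrete, the sketch below computes accuracy, sensitivity, and specificity from binary predictions. The class counts and the trivial "always healthy" model are hypothetical and are chosen only to show how accuracy alone can mislead when one class dominates.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity (recall on positives), and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }

# Hypothetical imbalanced cohort: 95 healthy patients (0), 5 diseased (1).
y_true = [0] * 95 + [1] * 5
# A trivial model that labels every patient as healthy.
y_pred = [0] * 100

metrics = binary_metrics(y_true, y_pred)
```

Here the trivial model reaches 95% accuracy while detecting none of the diseased patients (sensitivity 0), which is why per-class metrics are usually reported alongside accuracy in diagnostic settings.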
2.2. Kidney Disease
Kidney disease, often known as renal disease, refers to nephropathy or kidney damage. Patients with kidney disease have decreased kidney function, which can lead to kidney failure if not treated promptly. According to the National Kidney Foundation, 10% of the world’s population has chronic kidney disease (CKD), and millions die each year due to insufficient treatment. The recent advancement of ML- and DL-based kidney disease diagnosis may offer an option for countries that lack the capacity to handle kidney-disease-related diagnostic tests
[49][8]. For instance, Charleonnan et al. (2016) used publicly available datasets to evaluate four different ML algorithms: K-nearest neighbors (KNN), support vector machine (SVM), logistic regression (LR), and decision tree classifiers, which achieved accuracies of 98.1%, 98.3%, 96.55%, and 94.8%, respectively
[50][9]. Aljaaf et al. (2018) conducted a similar study. The authors tested different ML algorithms, including RPART, SVM, LOGR, and MLP, using a comparable dataset, CKD, as used by
[50][9], and found that MLP performed best (98.1 percent) in identifying chronic kidney disease
[51][10]. To identify chronic kidney disease, Ma et al. (2020) utilized a collection of datasets containing data from many sources
[52][11]. Their suggested heterogeneous modified artificial neural network (HMANN) model obtained an accuracy of 87–99%.
2.3. Breast Cancer
Many scholars in the medical field have proposed machine-learning (ML)-based breast cancer analysis as a potential solution to early-stage diagnosis. Miranda and Felipe (2015), for example, proposed fuzzy-logic-based computer-aided diagnosis systems for breast cancer categorization. The advantage of fuzzy logic over other classic ML techniques is that it can minimize computational complexity while simulating the expert radiologist’s reasoning and style. If the user inputs parameters such as contour, form, and density, the algorithm offers a cancer categorization based on their preferred method
[57][12]. The model proposed by Miranda and Felipe (2015) had an accuracy of roughly 83.34%. The authors employed an approximately equal ratio of images for the experiment, which resulted in improved accuracy and unbiased performance. However, as the study did not examine the interpretation of their results in an explainable manner, it may be difficult to conclude that the overall accuracy reflects true accuracy for both benign and malignant classifications. Furthermore, no confusion matrix is presented to demonstrate the model’s actual predictions for each class.
Zheng et al. (2014) presented hybrid strategies for diagnosing breast cancer utilizing k-means clustering (KMC) and SVM. Their proposed model considerably decreased the dimensional difficulties and attained an accuracy of 97.38% using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset
[58][13]. The dataset is normally distributed and has 32 features divided into 10 categories. It is difficult to conclude that their suggested model will perform as well on a dataset with an unequal class ratio, which may contain missing values as well.
To determine the best ML models, Asri et al. (2016) applied various ML approaches such as SVM, DT (C4.5), NB, and KNN on the Wisconsin Breast Cancer (WBC) datasets. According to their findings, SVM outperformed all other ML algorithms, obtaining an accuracy of 97.13%
[59][14]. However, if the same experiment is repeated on a different database, the results may differ. Furthermore, experimental results accompanied by ground truth values may provide a more precise estimate for determining which ML model is best.
Mohammed et al. (2020) conducted a nearly identical study. The authors employed three ML algorithms to find the best ML methods: DT (J48), NB, and sequential minimal optimization (SMO), and the experiment was conducted on two popular datasets: WBC and breast cancer datasets. One of the interesting aspects of this research is that they focused on data imbalance issues and minimized the imbalance problem through the use of resampling data labeling procedures. Their findings showed that the SMO algorithm exceeded the other two classifiers, attaining more than 95% accuracy on both datasets
[60][15]. However, in order to reduce the imbalance ratio, they used resampling procedures numerous times, potentially reducing data diversity. As a result, the performance of those three ML methods may suffer on a dataset that is not normally distributed or is imbalanced.
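The diversity concern can be seen in a small sketch of random oversampling, one common resampling procedure. This is an illustrative assumption about what resampling may look like, not the procedure used by Mohammed et al.; the data and the function name are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Balance classes by duplicating randomly chosen minority samples."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw duplicates with replacement until the class reaches `target`.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

# Hypothetical data: 10 majority-class samples (0) and 2 minority-class samples (1).
samples = [(i,) for i in range(12)]
labels = [0] * 10 + [1] * 2

new_samples, new_labels = random_oversample(samples, labels)
counts = Counter(new_labels)
unique_minority = len({s for s, l in zip(new_samples, new_labels) if l == 1})
```

After resampling, both classes contain 10 samples, but the minority class still holds only 2 distinct records. The class ratio is balanced without adding any new information, which is the sense in which repeated resampling can limit diversity.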
Assegie (2021) used the grid search approach to identify the best k-nearest neighbor (KNN) settings. Their investigation showed that parameter adjustment had a considerable impact on the model’s performance. They demonstrated that by fine-tuning the settings, it is feasible to get 94.35% accuracy, whereas the default KNN achieved around 90% accuracy
[61][16].
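The idea of a grid search over KNN settings can be sketched as follows. This is a minimal example under stated assumptions: the two-cluster toy data, the candidate values of k, and the use of leave-one-out accuracy as the selection score are all illustrative and are not taken from Assegie (2021).

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Predict the majority label among the k nearest training points."""
    neighbors = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(X, y, k):
    """Hold out each sample in turn and classify it from the remaining ones."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)

def grid_search_k(X, y, candidates):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: leave_one_out_accuracy(X, y, k) for k in candidates}
    best_k = max(scores, key=scores.get)  # first candidate with the top score
    return best_k, scores

# Hypothetical toy data: two well-separated clusters of four points each.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3),
     (3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (3.1, 3.3)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

best_k, scores = grid_search_k(X, y, candidates=[1, 3, 5, 7])
```

On this toy data, k = 1, 3, and 5 classify every held-out point correctly, while k = 7 misclassifies all of them because the neighborhood then contains more points from the opposite cluster. This illustrates how strongly a single hyperparameter can affect performance.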
To detect breast cancer, Bhattacherjee et al. (2020) employed a backpropagation neural network (BNN). The experiment was carried out on the WBC dataset with nine features, and they achieved 99.27% accuracy
[62][17]. Alshayeji et al. (2021) used the WBCD and WDBI datasets to develop a shallow ANN model for classifying breast cancer tumors. The authors demonstrated that the suggested model could classify tumors with up to 99.85% accuracy without selecting features or tuning the algorithms
[63][18].
Sultana et al. (2021) detected breast cancer using different ANN architectures on the WBC dataset. They employed a variety of NN architectures, including the multilayer perceptron (MLP) neural network, the Jordan/Elman NN, the modular neural network (MNN), the generalized feedforward neural network (GFFNN), the self-organizing feature map (SOFM), the SVM neural network, the probabilistic neural network (PNN), and the recurrent neural network (RNN). Their final computational result demonstrates that the PNN, with 98.24% accuracy, outperforms the other NN models utilized in that study
[64][19]. However, like many other investigations, this study lacks interpretability because it does not indicate which features are most important during the prediction phase.
Ghosh et al. (2021) also used deep learning (DL). The authors used the WBC dataset to train seven DL models: ANN, CNN, GRU, LSTM, MLP, PNN, and RNN. Long short-term memory (LSTM) and gated recurrent unit (GRU) demonstrated the best performance among all DL models, achieving an accuracy of roughly 99%
[65][20].
2.4. Diabetes
According to the International Diabetes Federation (IDF), there are currently over 382 million individuals worldwide who have diabetes, with that number anticipated to increase to 629 million by 2045
[71][21]. Numerous studies have presented ML-based systems for diabetes patient detection. For example, Kandhasamy and Balamurali (2015) compared ML classifiers (J48 DT, KNN, RF, and SVM) for classifying patients with diabetes mellitus. The experiment was conducted on the UCI Diabetes dataset, and the KNN (K = 1) and RF classifiers obtained near-perfect accuracy
[72][22]. However, one disadvantage of this work is that it used a simplified Diabetes dataset with only eight binary-classified parameters. As a result, getting 100% accuracy with a less difficult dataset is unsurprising. Furthermore, there is no discussion of how the algorithms influence the final prediction or how the result should be viewed from a nontechnical position in the experiment.
Yahyaoui et al. (2019) presented a clinical decision support system (CDSS) to aid physicians or practitioners with diabetes diagnosis. To reach this goal, the study utilized a variety of ML techniques, including SVM, RF, and a deep convolutional neural network (CNN). RF outperformed all other algorithms in their computations, obtaining an accuracy of 83.67%, while DL and SVM scored 76.81% and 65.38% accuracy, respectively
[73][23].
Naz and Ahuja (2020) employed a variety of ML techniques, including artificial neural networks (ANN), NB, DT, and DL, to analyze open-source PIMA Diabetes datasets. Their study indicates that DL is the most accurate method for detecting the development of diabetes, with an accuracy of approximately 98.07%
[71][21]. The PIMA dataset is one of the most thoroughly investigated and primary datasets, making it easy to perform conventional and sophisticated ML-based algorithms. As a result, gaining greater accuracy with the PIMA Indian dataset is not surprising. Furthermore, the paper makes no mention of interpretability issues and how the model would perform with an unbalanced dataset or one with a significant number of missing variables. As is widely recognized in healthcare, several types of data can be created that are not always labeled, categorized, and preprocessed in the same way as the PIMA Indian dataset. As a result, it is critical to examine the algorithms’ fairness, unbiasedness, dependability, and interpretability while developing a CDSS, especially when a considerable amount of information is missing in a multiclass classification dataset.
Ashiquzzaman et al. (2017) developed a deep learning strategy to address the issue of overfitting in diabetes datasets. The experiment was carried out on the PIMA Indian dataset and yielded an accuracy of 88.41%. The authors claimed that performance improved significantly when dropout techniques were utilized and the overfitting problems were reduced
[74][24]. Overuse of the dropout approach, on the other hand, lengthens overall training duration. Because they did not address this concern in their study, it is difficult to assess whether their proposed model is optimal in terms of computational time.
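For readers unfamiliar with the technique, the sketch below shows inverted dropout, a common formulation, applied to a list of activations. It is a generic illustration rather than the implementation used in any cited study; the dropout rate, seed, and layer size are assumptions.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate` during training
    and rescale the survivors so the expected activation is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # inference: all units are kept as they are
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Hypothetical hidden layer with 10,000 units, all with activation 1.0.
rng = random.Random(42)
hidden = [1.0] * 10000

train_out = dropout(hidden, rate=0.5, training=True, rng=rng)
test_out = dropout(hidden, rate=0.5, training=False)
mean_train = sum(train_out) / len(train_out)
```

During training, roughly half of the units are silenced on each pass and the rest are scaled by 2, so the mean activation stays close to 1.0. Because each pass updates only a randomly thinned network, more passes are typically needed to converge, which is consistent with the longer training time noted above.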
Alhassan et al. (2018) introduced the King Abdullah International Research Center for Diabetes (KAIMRCD) dataset, which includes data from 14k people and is the world’s largest diabetes dataset. In that experiment, the authors presented a CDSS architecture based on LSTM and GRU-based deep neural networks, which obtained up to 97% accuracy
[75][25].
2.5. Parkinson’s Disease
Parkinson’s disease is one of the conditions that has received a great amount of attention in the ML literature. It is a slow-progressing chronic neurological disorder. When dopamine-producing neurons in certain parts of the brain are harmed or die, people have difficulty speaking, writing, walking, and doing other core activities
[80][26]. Several ML-based approaches have been proposed. For instance, Sriram et al. (2013) used KNN, SVM, NB, and RF algorithms to develop intelligent Parkinson’s disease diagnosis systems. Their computational results show that, among all the algorithms, RF shows the best performance (90.26% accuracy) and NB the worst (69.23% accuracy)
[81][27].
Esmaeilzadeh et al. (2018) proposed a deep CNN-based model to diagnose Parkinson’s disease and achieved almost 100% accuracy on the training and test sets
[82][28]. However, there was no mention of any overfitting difficulties in the study. Furthermore, the experimental results do not provide a good interpretation of the final classification and regression, which is now widely expected, particularly in CDSS. Grover et al. (2018) also used DL-based approaches on UCI’s Parkinson’s telemonitoring voice dataset. Their experiment using a DNN achieved around 81.67% accuracy in diagnosing patients with Parkinson’s disease symptoms
[80][26].
Warjurkar and Ridhorkar (2021) conducted a thorough study on the performance of ML-based approaches in decision support systems that can both detect brain tumors and diagnose Parkinson’s patients. According to their findings, boosted logistic regression surpassed all other models, attaining 97.15% accuracy in identifying Parkinson’s disease patients. In tumor segmentation, however, the Markov random technique performed best, obtaining an accuracy of 97.4%
[83][29].
2.6. COVID-19
The pandemic of coronavirus disease 2019 (COVID-19), caused by the new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become humanity’s greatest challenge in contemporary history. Although a vaccine was brought into distribution rapidly because of the global emergency, it was unavailable to the majority of people for the duration of the crisis
[88][30]. Because of the new COVID-19 Omicron strain’s high transmission rates and vaccine-related resistance, there is an extra layer of concern. The gold standard for diagnosing COVID-19 infection is now Real-Time Reverse Transcription-Polymerase Chain Reaction (RT-PCR)
[89,90][31][32]. Throughout the epidemic, researchers advocated other technologies, including chest X-rays and computed tomography (CT) combined with machine learning and artificial intelligence, to aid in the early detection of people who might be infected. For example, Chen et al. (2020) proposed a UNet++ model employing CT images from 51 COVID-19 and 82 non-COVID-19 patients and achieved an accuracy of 98.5%
[91][33]. Ardakani et al. (2020) used a small dataset of 108 COVID-19 and 86 non-COVID-19 patients to evaluate ten different DL models and achieved a 99% overall accuracy
[92][34]. Wang et al. (2020) built an inception-based model with a large dataset, containing 453 CT scan images, and achieved 73.1% accuracy. However, the model’s network activity and region of interest were poorly explained
[93][35]. Li et al. (2020) suggested the COVNet model and obtained 96% accuracy utilizing a large dataset of 4356 chest CT images of pneumonia patients, 1296 of which were verified COVID-19 cases
[94][36].
Several studies investigated and advised screening COVID-19 patients utilizing chest X-ray images in parallel, with major contributions in
[95,96,97][37][38][39]. For example, Hemdan et al. (2020) used a small dataset of only 50 images to identify COVID-19 patients from chest X-ray images, achieving accuracies of 90% and 95% using VGG19 and ResNet50 models, respectively
[95][37]. Using a dataset of 100 chest X-ray images, Narin et al. (2021) distinguished COVID-19 patients from those with pneumonia with 86% accuracy
[97][39].
In addition, in order to develop more robust and better screening systems, other studies considered larger datasets. For example, Brunese et al. (2020) employed 6505 images with a data ratio of 1:1.17, with 3003 images classified as COVID-19 symptoms and 3520 as “other patients” for the objectives of that study
[98][40]. With a dataset of 5941 images, Ghoshal and Tucker (2020) achieved 92.9% accuracy
[99][41]. However, neither study looked at how their proposed models would work with data that was severely unbalanced and had mismatched class ratios. Apostolopoulos and Mpesiana (2020) employed a CNN-based Xception model on an imbalanced dataset of 284 COVID-19 and 967 non-COVID-19 patient chest X-ray images and achieved 89.6% accuracy
[100][42].
2.7. Alzheimer’s Disease
Alzheimer’s disease is a brain illness that often begins slowly and worsens over time, and it accounts for 60–70% of dementia cases
[103][43]. Alzheimer’s disease symptoms include language problems, confusion, mood changes, and other behavioral disorders. Body functions gradually deteriorate, and the usual life expectancy is three to nine years after diagnosis. Early diagnosis, on the other hand, may help patients take the required actions and begin suitable treatment as soon as possible, which may also extend life expectancy. Machine learning and deep learning have shown promising outcomes in detecting Alzheimer’s disease patients throughout the years. For instance, Neelaveni and Devasana (2020) proposed a model that can detect Alzheimer’s patients using SVM and DT, and achieved accuracies of 85% and 83%, respectively
[104][44]. Collij et al. (2016) also used SVM for single-subject prediction of Alzheimer’s disease and mild cognitive impairment (MCI) and achieved an accuracy of 82%
[105][45].
Multiple algorithms have been adopted and tested in developing ML-based Alzheimer’s disease diagnosis. For example, Vidushi and Shrivastava (2019) experimented with logistic regression (LR), SVM, DT, ensemble random forest (RF), and AdaBoost boosting and achieved accuracies of 78.95%, 81.58%, 81.58%, 84.21%, and 84.21%, respectively
[106][46]. Many studies adopted CNN-based approaches to detect Alzheimer’s patients, as CNNs demonstrate robust results in image processing compared to other existing algorithms. For example, Ahmed et al. (2020) proposed a CNN model for earlier diagnosis and classification of Alzheimer’s disease. On a dataset consisting of 6628 MRI images, the proposed model achieved 99% accuracy
[107][47]. Nawaz et al. (2020) proposed deep feature-based models and achieved an accuracy of 99.12%
[108][48]. Additionally, the studies conducted by Haft-Javaherian et al. (2019)
[109][49] and Aderghal et al. (2017)
[110][50] are among the CNN-based studies that also demonstrate the robustness of the CNN-based approach in Alzheimer’s disease diagnosis.
2.8. Other Diseases
Beyond the diseases mentioned above, ML and DL have been used to identify various other diseases. Big data and increasing computer processing power are two key reasons for this increased use. For example, Mao et al. (2020) used decision tree (DT) and random forest (RF) for disease classification based on eye movement
[114][51]. Nosseir and Shawky (2019) evaluated KNN and SVM to develop automatic skin disease classification systems, and the best performance was observed using KNN by achieving an accuracy of 98.22%
[115][52]. Khan et al. (2020) employed CNN-based approaches such as VGG16 and VGG19 to classify multimodal brain tumors. The experiment was carried out using three publicly available image datasets: BraTs2015, BraTs2017, and BraTs2018, and achieved 97.8%, 96.9%, and 92.5% accuracy, respectively
[116][53]. Amin et al. (2018) conducted a similar experiment utilizing the RF classifier for tumor segmentation. The authors achieved 98.7%, 98.7%, 98.4%, 90.2%, and 90.2% accuracy using the BRATS 2012, BRATS 2013, BRATS 2014, BRATS 2015, and ISLES 2015 datasets, respectively
[117][54].
Dai et al. (2019) proposed a CNN-based model to develop an application to detect skin cancer. The authors experimented on a publicly available dataset, HAM10000, and achieved 75.2% accuracy
[118][55]. Daghrir et al. (2020) evaluated KNN, SVM, CNN, and majority voting using the ISIC (International Skin Imaging Collaboration) dataset to detect melanoma skin cancer. The best result was found using majority voting (88.4% accuracy)
[119][56].
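Majority voting, as mentioned above, combines the predictions of several classifiers by taking the most frequent label for each sample. The sketch below illustrates the idea; the per-model predictions are hypothetical and do not reproduce the results of Daghrir et al.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Return, for each sample, the label predicted by most models."""
    n_samples = len(predictions_per_model[0])
    final = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions_per_model)
        final.append(votes.most_common(1)[0][0])
    return final

# Hypothetical predictions for five lesions (1 = melanoma, 0 = benign).
knn_preds = [1, 0, 1, 1, 0]
svm_preds = [1, 1, 0, 1, 0]
cnn_preds = [0, 0, 1, 1, 1]

ensemble = majority_vote([knn_preds, svm_preds, cnn_preds])
# ensemble == [1, 0, 1, 1, 0]
```

Using an odd number of binary classifiers avoids ties. An ensemble of this kind can correct an individual model’s mistake whenever the other models agree on the right label.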