Machine learning models are used to create and enhance various disease prediction frameworks. Ensemble learning is a machine learning technique that combines multiple classifiers to produce more accurate predictions than any single classifier. Although numerous studies have employed ensemble approaches for disease prediction, thorough assessments of commonly used ensemble approaches across widely researched diseases remain scarce.
Ensemble learning is a machine learning approach that attempts to improve predictive performance by combining the predictions of many models. Employing ensemble models aims to reduce prediction generalisation error. The ensemble technique decreases model prediction error when the base models are diverse and independent: it relies on the collective output of individual models to form a prediction. Despite comprising numerous base models, the ensemble operates and performs as a single model. Most real-world data mining solutions employ ensemble modelling methodologies. Ensemble approaches combine different machine learning algorithms to produce more accurate predictions than a single classifier. The ensemble model's main purpose is to combine numerous weak learners into one strong learner, boosting the model's accuracy. The main sources of mismatch between real and predicted values when estimating a target variable with any machine learning approach are noise, variance, and bias.
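For regression under squared error, this decomposition can be written explicitly (a standard textbook result, stated here for reference rather than drawn from any particular study reviewed here):

```latex
% Expected squared error at a point x, for target y = f(x) + \varepsilon,
% with \mathbb{E}[\varepsilon] = 0, \operatorname{Var}(\varepsilon) = \sigma^2,
% and predictor \hat{f} trained on a random sample:
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Bagging targets mainly the variance term and boosting mainly the bias term, while the noise term is irreducible.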
Disease diagnosis refers to the process of determining which disease best explains a person's symptoms. Diagnosis is the most challenging step because some symptoms and signs are vague, yet identifying the disease is vital in treating any illness. Machine learning is a field that can help anticipate a diagnosis based on prior training data. Many researchers have developed machine learning algorithms to effectively identify a wide range of conditions; such algorithms can produce models that predict diseases and inform their treatment. Because of the vast amount of data now available, disease prediction has become a significant research subject. Using these databases, researchers build disease prediction models for decision-support systems, allowing better prediction and treatment at an early stage. Early diagnosis and timely treatment are the most effective ways to lower disease-related mortality rates. As a result, many medical scientists are drawn to emerging machine learning-based predictive model technologies for disease prediction.
Diabetes, skin cancer, kidney disease, liver disease, and heart conditions are common diseases that can significantly impact patients' health. This review explores the literature on disease prediction models for these five disease categories. Using a defined search strategy, the current literature was first surveyed to identify the types of prediction models applied to each category. The scope is to find essential trends among the ensemble approaches used, the base learners they employ, their accuracy, and the diseases being studied. Given the increasing relevance and efficiency of ensemble approaches for predictive disease modelling, the field appears to be expanding, yet little research thoroughly evaluates published studies applying ensemble learning to disease prediction. This review therefore aims to uncover critical trends across the main ensemble techniques (i.e., bagging, boosting, stacking, and voting), their reported accuracies, and the diseases being researched. Furthermore, the benefits and drawbacks of the various ensemble techniques are summarised.
Bagging aggregates the predictions of many decision trees (or other base models) fitted to different samples of the same dataset. A bagging ensemble is created by repeatedly running a given algorithm on distinct versions of the training dataset. Bagging, short for bootstrap aggregating, resamples the training set with replacement into new sets of the same cardinality as the original, reducing the classifier's variance and overfitting. Compared to a single model, the final ensemble should overfit less; a model with high variance is one whose outcome is sensitive to the particular training data provided. Bootstrapping, i.e., random sampling with replacement, can also provide a better understanding of a dataset's bias and variance. In the bootstrap procedure, a portion of the dataset is sampled at random. Random Forest and Random Subspace are enhanced variants of decision trees that use the bagging approach to improve on the base decision tree classifier. To generate multiple split trees, these techniques use a subset of the training samples as well as a subset of the features; a decision tree is fitted to each training subset, and the distribution of features and samples across subsets is normally random.
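The bootstrap step at the heart of bagging can be sketched in a few lines (a toy illustration with NumPy; the dataset and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = np.arange(10)  # a toy "training set" of 10 samples

# A bootstrap sample: draw with replacement, same cardinality as the original.
boot = rng.choice(data, size=len(data), replace=True)

# Points never drawn are "out-of-bag"; on average roughly a third of the
# original points end up here, which bagging can exploit for validation.
oob = np.setdiff1d(data, boot)
print(boot, oob)
```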
Another bagging technique is Extra Trees, which, like Random Forest, combines the forecasts of a large number of decision trees. Unlike Random Forest, however, Extra Trees fits each tree on the whole sample and selects split thresholds at random. Although ensembling may increase computational complexity, bagging is parallelisable, which can significantly reduce training time, subject to the availability of hardware for running models in parallel. Because deep learning models take a long time to train, however, fitting several deep models on separate training bags is often impractical.
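As a hedged sketch, the three bagging-style ensembles discussed above can be compared on the breast-cancer dataset shipped with scikit-learn (hyperparameters and the train/test split are illustrative assumptions, not taken from any reviewed study):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              ExtraTreesClassifier)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    # Plain bagging over decision trees; n_jobs=-1 trains the bags in parallel.
    "bagging": BaggingClassifier(n_estimators=100, n_jobs=-1, random_state=0),
    # Random Forest additionally draws a random feature subset at each split.
    "random_forest": RandomForestClassifier(n_estimators=100, n_jobs=-1,
                                            random_state=0),
    # Extra Trees uses the whole sample and fully random split thresholds.
    "extra_trees": ExtraTreesClassifier(n_estimators=100, n_jobs=-1,
                                        random_state=0),
}
scores = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
print(scores)
```

The `n_jobs=-1` argument reflects the parallelisability noted above: each bag is independent, so trees can be trained concurrently.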
Boosting algorithms use weighted averages to transform weak learners into strong ones. During boosting, the original dataset is partitioned into several subsets. Each subset is used to train a classifier, producing a sequence of models with modest individual performance. The instances misclassified by the previous model are used to build new subsets, and the ensembling procedure then improves performance by integrating the weak models through a cost function. Unlike bagging, where each model is trained independently before the outputs are aggregated with no model selection at the end, boosting fits multiple weak learners sequentially and adaptively: intuitively, each new model focuses on the instances that have proved hardest to fit so far, yielding a strong learner with lower bias at the end of the process. Like bagging, boosting can be applied to both regression and classification problems.
Compared to a single weak learner, strategies such as majority voting in classification problems, or a linear combination of weak learners in regression problems, yield superior predictions. The boosting approach trains a weak learner, computes its predictions, selects the training samples that were misclassified, and then trains the next weak learner on an updated training set that emphasises the instances misclassified in the previous round. Boosting algorithms such as AdaBoost and Gradient Boosting have been applied in various sectors. At each iteration, AdaBoost greedily minimises a convex surrogate upper bound on the misclassification loss by augmenting the current model with a suitably weighted predictor. AdaBoost optimises the exponential loss function, whereas Gradient Boosting extends this approach to any differentiable loss function.
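The contrast between the two boosting algorithms can be sketched as follows (an illustrative comparison on scikit-learn's breast-cancer dataset; the hyperparameters are assumptions for demonstration only):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# AdaBoost re-weights misclassified samples before fitting each new weak
# learner, minimising the exponential loss.
ada = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Gradient Boosting instead fits each new tree to the gradient of a chosen
# differentiable loss (the pseudo-residuals).
gbm = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

acc_ada = accuracy_score(y_te, ada.predict(X_te))
acc_gbm = accuracy_score(y_te, gbm.predict(X_te))
print(acc_ada, acc_gbm)
```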
Stacking is an ensemble method in which one or more base-level classifiers are combined with a metalearner classifier. In stacking, the original data is used as input to numerous distinct models. A metaclassifier is then trained on the base models' outputs, learning how to weight them when estimating the target. The best-performing models are retained, while the rest are rejected. Stacking thus employs a metaclassifier to merge multiple base classifiers trained with different learning methods on a single dataset; the model predictions are combined with the inputs at each successive layer to generate a new set of predictions. Ensemble stacking is also known as blending because all information is blended to produce a forecast or classification. Multi-response linear regression (MLR) and probability distribution (PD) stacking are among the most advanced variants. Combinations of numerous base-level classifiers with weakly correlated predictions are widely known to work well. Nahar et al. propose a stacking technique that employs correspondence analysis to find correlations between base-level classifier predictions.
In this procedure, the dataset is randomly divided into n equal sections. In n-fold cross-validation, one fold is used for testing and the rest for training. Using these training–testing pairs, the predictions of the various learning models are derived and then used as metadata to build the metamodel. The metamodel produces the final forecast, in what is commonly known as a winner-takes-all technique. Stacking is thus an integrated approach that uses a metalearning model to combine the outputs of the base models. If the final decision element is a linear model, stacking is also known as "model blending" or simply "blending". Stacking involves fitting several different types of models to the same data and then using another model to learn how to combine their results most effectively.
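The n-fold stacking procedure described above can be sketched with scikit-learn's `StackingClassifier` (the choice of base learners, metalearner, and dataset is an illustrative assumption):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[  # heterogeneous base-level (level-0) classifiers
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    # Linear metalearner (level-1): this variant is what the text calls
    # "model blending".
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold predictions form the metalearner's training metadata
)
acc_stack = accuracy_score(y_te, stack.fit(X_tr, y_tr).predict(X_te))
print(acc_stack)
```

The `cv=5` argument mirrors the n-fold splitting step: each base model's out-of-fold predictions, rather than its in-sample predictions, are used as metadata, which prevents the metalearner from overfitting to the base models' training error.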
The Voting Classifier is an ensemble strategy that aggregates the predictions of numerous independent models (base estimators) to make a final prediction. It uses the "wisdom of the crowd" notion: considering the aggregate judgement of numerous models rather than depending on a single model yields more accurate predictions. There are two main types of voting. In hard voting, each model predicts a class label and the majority class is chosen. In soft voting, each model outputs a probability or confidence score for each class; the final prediction is made by averaging the predicted probabilities across all models and choosing the class with the highest average probability. Weighted voting additionally allows individual models to exert different degrees of influence on the final forecast, with weights assigned manually or learned automatically from the performance of the individual models.
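A minimal sketch of hard, soft, and weighted voting with scikit-learn follows (the base models are arbitrary choices, and the weights are hand-picked for illustration rather than learned from model performance):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [
    ("lr", LogisticRegression(max_iter=5000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("nb", GaussianNB()),
]

# Hard voting: each model casts one vote; the majority class wins.
hard = VotingClassifier(estimators=base, voting="hard")

# Soft (here also weighted) voting: class probabilities are averaged with the
# given weights, and the class with the highest mean probability is chosen.
soft = VotingClassifier(estimators=base, voting="soft", weights=[1, 2, 1])

acc_hard = accuracy_score(y_te, hard.fit(X_tr, y_tr).predict(X_te))
acc_soft = accuracy_score(y_te, soft.fit(X_tr, y_tr).predict(X_te))
print(acc_hard, acc_soft)
```

Note that soft voting requires every base estimator to expose class probabilities (`predict_proba`), whereas hard voting only needs class labels.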
By combining the strengths of several models, the Voting Classifier increases overall performance and robustness, especially when the constituent models have diverse properties and generate largely independent predictions. It can overcome the biases or limits of any single model and, through the collective decision-making of numerous models, produce more accurate and trustworthy predictions. Overall, the Voting Classifier is a versatile ensemble strategy applicable to a wide variety of machine learning tasks: by integrating models with diverse strengths and weaknesses, it enhances accuracy, robustness, and model diversity while reducing bias and variance in its predictions.
The Voting Classifier's ensemble nature improves model stability by reducing overfitting and increasing model variety. It offers several voting procedures, such as hard, soft, and weighted voting, allowing customisation to the task and the characteristics of the individual models. Furthermore, the Voting Classifier can improve interpretability, since analysing the contributions of the individual models helps in understanding the underlying patterns and the decision-making process. Overall, the Voting Classifier is an effective tool for enhancing predictive performance across a range of machine learning tasks.