An antigenic determinant (AD), also known as an epitope, is the portion of an antigen molecule that is recognized by the immune system, specifically by antibodies, B cells, or T cells. Identifying epitopes is important in the design of epitope-based peptide vaccines (EBPVs) intended to contain pandemic, epidemic, and endemic outbreaks of infectious diseases. To design an effective and viable EBPV against different strains of a pathogen, the putative T- and B-cell epitopes must first be identified. Identifying these epitopes in the wet lab is time-consuming and costly because a vast number of potential epitope candidates must be screened experimentally. Machine learning (ML)-based prediction methods reduce this burden by shortening the list of candidate epitopes that proceed to experimental trials. These methods are also cost-effective, scalable, and fast.
Study Conducted | Methodology Adopted | Strengths/Limitations |
---|---|---|
T. Liu et al. [23] | A feedforward deep neural network-based ensemble of 11 classifiers was created to predict B-cell epitopes (BCEs). The BCE peptide dataset was obtained from IEDB. The model was evaluated on the test set using the AUROC metric. | The model reports a peptide as an epitope only if all 11 classifiers classify it as one. Simple majority voting would likely give better results. |
Fatoba, A. J. et al. [24] | Potential epitope-based vaccine candidates were explored. After retrieving 600 genome sequences of SARS-CoV-2 from the ViPR repository, CD8+ and CD4+ T-cell epitopes and linear B-cell epitopes were generated and screened for immunogenicity, antigenicity, and non-allergenicity. | The study reported 19 candidate CD8+ T-cell epitopes, which were found to overlap strongly with 8 B-cell epitopes. The results provide the basis for an experimental design for a suitable peptide vaccine against SARS-CoV-2. |
R. Moody et al. [26] | The authors used IEDB prediction tools to predict B-cell epitopes and selected those with high prediction scores as candidate epitopes. The epitopes were then matched to human proteins using NCBI BLAST. | The findings showed 11 novel B-cell epitopes in the host that could explain key elements of COVID-19 extrapulmonary disease that previous research had not been able to explain. |
Jespersen MC et al. [27] | The authors employed feedforward neural networks (FFNN) with two hidden layers of 25 neurons each, a sigmoid activation function at all neurons, and the Adam optimizer to predict antibody-specific B-cell epitopes, that is, the epitope targets of provided cognate antibodies. The dataset was obtained from the IEDB database. PCA was used for dimensionality reduction before the model was trained. | It was shown that a simple set of attributes retrieved from the cognate antibody boosted the accuracy of predicting individual epitopes. Furthermore, sophisticated features such as Zernike moments can improve the model’s predictive potential. Compared to DiscoTope 2.0, this model performs better at finding patches that overlap with an actual epitope patch, both in cross-validation and on an independent dataset. |
Ling-yun Liu et al. [28] | The authors used PCA and RNN networks. They converted the physicochemical properties into digital vectors, intending to have high-dimensional feature space, and later PCA was applied to process them. The output from PCA was used as an input to the RNN for predicting epitopes. | Prediction results obtained by this process demonstrated that PCA reduced dimensions, but at the same time, original features of the main component were retained, and the rate of prediction was also improved. |
Bin Cheng et al. [29] | The authors introduced a novel scale to measure feature importance, called the relevance of amino acid pair (RAAP). RAAP was calculated by decomposing the amino acid sequences based on their physicochemical properties. | The prediction success rate was substantially improved by using an LSTM, which does not suffer from gradient instability and is well suited to classifying sequences. Fivefold cross-validation was used to test and validate the models. |
Balachandran Manavalan et al. [30] | A non-redundant dataset containing 5500 experimentally validated BCEs and 6893 non-B-cell epitopes was constructed from IEDB. An ensemble model to predict B-cell epitopes was then developed by combining an extremely randomized tree (ERT) classifier and a gradient boosting (GB) classifier. The model uses physicochemical properties (PCP), amino acid (AA) composition, and a combination of dipeptides and PCP as input features. | Cross-validation on a benchmark dataset showed that this model performed far better than the individual ERT and GB classifiers, with a Matthews correlation coefficient (MCC) of 0.454. |
Yuh-Jyh Hu et al. [31] | A cost-sensitive strategy based on bagging MDT was suggested, which integrates two ensemble-based learning algorithms. Without employing the prediction of a pre-trained single predictor, it makes it independent of multiple prediction tools. It can also learn a meta-classification architecture with varied data, without being constrained by a particular hierarchy. | It was demonstrated that the performance of prediction is superior as compared to a single epitope predictor. However, epitope prediction based on meta-learning is purely dependent upon the predictive strength of various other pre-trained linear and conformational epitope prediction tools, which cannot be retained directly by users. Hence, this limits the flexibility and applicability of these meta-classifiers. |
Jing Ren et al. [32] | The authors proposed a novel staged heterogeneity-based learning model. The model learns both the heterogeneity and the characteristics of the data in phases to identify antigen residues of heterogeneous conformational B-cell epitopes purely from antigen sequences. In the first stage, the model learns the generic epitope pattern from propensities. In the second stage, the same model learns the heterogeneous complementarity of the propensities used in the first stage, this time on a small dataset of experimentally verified epitopes. | It was demonstrated that if heterogeneity was learned well, the transferability of the model improved remarkably in handling new data. It was tested and validated on two different datasets: one with experimentally determined epitopes and another with computationally defined epitopes. It showed outstanding performance that was around twice that of existing predictors, including CBTOPE. |
Georgios A. et al. [33] | A novel method, “SEPIa”, was proposed to predict B-cell epitopes from protein sequences. It is fast enough to be applied to large-scale datasets. The model combines two classifiers, random forest and naïve Bayes. | The average prediction accuracy of SEPIa is limited: the AUC score is 0.65 in both 10-fold cross-validation and on the independent test dataset. This is nevertheless higher than that of other approaches tested on the same test dataset. |
Gene Sher et al. [25] | The authors proposed a novel, analytically trained DRREP (Deep Ridge Regressed Epitope Predictor), a deep neural network based on string kernels and tailored to predict continuous epitopes. | The model was tested on long protein sequences from datasets such as AntiJen, Pellequer, and HIV. The results were compared with epitope predictors such as DMNLBE, LBtope, etc. Measured by the area under the curve (AUC), the model improved performance on the SARS dataset by 13.7%, on HIV by 8.9%, and on Pellequer by 1.5%. |
Wen Zhang et al. [34] | Authors attempted to differentiate immunogenic epitopes from non-immunogenic epitopes based purely on their primary structure. To effectively utilize various features, an ensemble method based on a genetic algorithm was proposed. | The model was tested on two benchmark datasets: IMMA2, PAAQD. The model was compared with methods such as POPI, PAAQD, and POPISK, which are considered state-of-the-art in nature. The model performed better, with an AUC score on IMMA2 of 0.846 and 0.829 on PAAQD. |
Wei Zheng et al. [35] | The authors used ensemble learning to improve the prediction of BCEs. Their ensemble method combined twelve SVMs. To handle imbalanced datasets, resampling and AdaBoost methods were used. | The proposed ensemble model achieved an AUC score of 0.642–0.672 on the training dataset with five-fold cross-validation and an AUC score of 0.579–0.604 on the test dataset. |
Jian Zhang et al. [36] | To predict antigenic determinants, the authors devised a cost-sensitive ensemble approach, and a spatial clustering-based algorithm was used to identify probable epitopes. | The model performed admirably in terms of prediction. AUC scores of 0.721 and 0.703 were obtained using leave-one-out cross-validation (LOOCV) on two benchmark datasets: bound and unbound. |
Kavitha K V et al. [37] | PCA was used to reduce dimensions and to filter out the essential features; for prediction purposes, a random forest algorithm was used. | Experimental results showed that the random forest-based classifier had an improved prediction accuracy rate as compared to BCPred, AAP, etc. |
Wen Zhang et al. [38] | The authors used sequence-derived features and developed an ensemble model based on random forest to predict epitopes accurately. | The model was evaluated using the leave-one-out cross-validation procedure, and an AUC score of 0.687 and 0.651 on bound and unbound datasets was obtained. |
Ping Chen et al. [39] | The authors reviewed various epitope prediction models, such as those based on SVM, neural networks, and random forest, to make the case for computational (in silico) approaches to epitope prediction, as experimental methods require a lot of effort and time. | Apart from defending the computational approaches, it was also concluded that current models are limited: it is impossible to devise an exact model without complete knowledge of the immune system, so current models are at best approximations. |
Claus Lundegaard et al. [40] | Here, an artificial neural network was used. The standard feedforward neural network with backpropagation was employed to predict epitopes. The dataset was retrieved from the SYFPEITHI database. | The model efficiently and accurately predicts MHC class I type peptides and outperforms the existing methods. |
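
The voting remark in the first row of the table above (unanimous agreement across 11 classifiers versus simple majority voting) can be illustrated with a minimal sketch. The votes and labels below are made-up toy data, not results from [23], and the function names `unanimous_vote`, `majority_vote`, and `mcc` are hypothetical helpers introduced only for this illustration. The Matthews correlation coefficient (MCC), one of the metrics reported in the table, is computed from its standard definition.

```python
from math import sqrt
from typing import Sequence


def unanimous_vote(votes: Sequence[int]) -> int:
    """Return 1 (epitope) only if every classifier voted 1."""
    return int(all(v == 1 for v in votes))


def majority_vote(votes: Sequence[int]) -> int:
    """Return 1 (epitope) if more than half of the classifiers voted 1."""
    return int(sum(votes) * 2 > len(votes))


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Assumption: toy votes from 11 binary classifiers for four peptides.
votes_per_peptide = [
    [1] * 11,           # all 11 classifiers say "epitope"
    [1] * 8 + [0] * 3,  # 8 of 11 say "epitope"
    [1] * 4 + [0] * 7,  # 4 of 11 say "epitope"
    [0] * 11,           # none say "epitope"
]
y_true = [1, 1, 0, 0]   # assumption: toy ground-truth labels

unanimous = [unanimous_vote(v) for v in votes_per_peptide]
majority = [majority_vote(v) for v in votes_per_peptide]

print("unanimous:", unanimous, "MCC =", round(mcc(y_true, unanimous), 3))
print("majority: ", majority, "MCC =", round(mcc(y_true, majority), 3))
```

In this toy case the unanimous rule misses the peptide supported by 8 of 11 classifiers, while the majority rule recovers it. Which rule is preferable on real data depends on the relative cost of false positives and false negatives.
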
Sr. No. | Method Name | Usage |
---|---|---|
01 | NetMHC [63] | To predict HLA I class or CD8+ SARS-CoV-2 T-cell epitopes |
02 | NetMHCpan [64] | |
03 | NetCTLpan_1.1 [65] | |
04 | NetMHC_4.0 [66] | |
05 | HLAthena [67] | |
06 | MHCflurry [68] | |
07 | NetMHCII_2.3 [69] | To predict HLA II class or CD4+ SARS-CoV-2 T-cell epitopes |
08 | NetMHCIIpan_3.0 [70] | |
09 | NetMHCIIpan_4.0 [71] | |
10 | NeonMHC2 [72] | |
11 | MARIA [73] | |
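
Tools of the kind listed above score candidate peptides, so a common preparatory step is to enumerate overlapping fixed-length windows from a protein sequence. The following is a minimal sketch of that step only. It does not call any of the listed tools, the function name `enumerate_peptides` and the toy sequence are hypothetical, and the window lengths (9 residues for HLA class I, 15 for HLA class II) are assumed typical values rather than settings taken from the source.

```python
from typing import List


def enumerate_peptides(protein: str, length: int) -> List[str]:
    """Return every contiguous window of `length` residues from `protein`."""
    if length <= 0:
        raise ValueError("length must be positive")
    # range() is empty when the protein is shorter than the window.
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


# Assumption: a made-up toy sequence, not a real antigen.
toy_protein = "MKTAYIAKQRQISFVKSHFSRQ"

# Assumed typical window lengths: 9-mers for class I, 15-mers for class II.
class_i_candidates = enumerate_peptides(toy_protein, 9)
class_ii_candidates = enumerate_peptides(toy_protein, 15)

print(len(class_i_candidates), "candidate 9-mers, first:", class_i_candidates[0])
print(len(class_ii_candidates), "candidate 15-mers, first:", class_ii_candidates[0])
```

Each candidate list would then be passed to a predictor, and only the top-scoring peptides would proceed to experimental screening.
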