1. Introduction
In recent years, computational techniques have been widely used to predict drug combinations, including traditional machine learning methods, deep learning methods, mathematical methods, systems biology methods, and search algorithms. Traditional machine learning can work with many feature types and achieve high prediction accuracy on databases of different scales. It has long been applied to improve and optimize drug discovery and design processes, and it can be integrated with other computational methods
[1][2]. Deep learning methods can learn the complex nonlinear relationships between input attribute data (such as genomics) and the associated output (such as synergy score)
[3]. Because these models stack multiple processing layers, their accuracy tends to improve markedly as the amount of input data grows, particularly with very large databases
[4]. A key step in building a mathematical model is to collect the necessary kinetic parameters from the literature or from experiments. When the cellular pathways and their parameterization are available, mathematical simulations can be highly accurate for combinatorial drug discovery
[5]. Systems biology methods analyze the therapeutic effects of drug combinations through various biological networks, which requires substantial biological knowledge
[6]. Search algorithms explore feature spaces, using the computing power of modern machines to systematically enumerate some or all of the possible solutions in a problem space
[7].
2. Application of Traditional Machine Learning in Drug Combination Prediction
Traditional ML methods include Support vector machine (SVM), Decision tree (DT) and Gradient boosting (GB). SVM is often used for classification tasks, where the goal is to find hyperplanes that separate positive cases from negative cases. A Decision tree can also be used for classification; it is a tree-shaped predictive model that judges the feasibility of various situations based on known probabilities of occurrence. Unlike the previous two methods, Gradient boosting is an ensemble algorithm: it draws subsets from the sample set, trains a series of base classifiers on them, and combines their outputs. These algorithms are good classifiers for identifying whether a drug combination is synergistic or antagonistic.
Support vector machine
[8] is a sparse and robust classifier. SVM is very good at identifying subtle patterns in complex data sets. It uses kernel functions to handle nonlinear problems efficiently. In addition, because SVM classifies by maximizing the margin, it is robust to noise and outliers. As mentioned above, the predictive model constructed by Wang T. et al.
[9] uses SVM to identify the best feature combination. Their results show that the best SVM classifier they built is significantly better than classifiers that use only individual features, with a prediction accuracy (ACC) of 0.903 and a Matthews correlation coefficient (MCC) of 0.806. This makes sense because their classifier combines several kinds of topological information about the drugs. The model is expected to be a helpful method for predicting new drug combinations that are absent from the training set. On large-scale data sets, however, SVM training is very slow and consumes substantial computing resources. Moreover, SVM is poorly interpretable because its decision boundaries are determined by the support vectors rather than by all training samples
[10]. In general, SVM has relatively high accuracy and strong generalization ability when dealing with nonlinear data, and it can capture complex relationships in the data.
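The two metrics reported for the SVM classifier, ACC and MCC, can be computed from a binary confusion matrix. The following is a minimal sketch; the function name and the counts are hypothetical and are not taken from the cited study.

```python
import math

def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (ACC, MCC) for a binary confusion matrix."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc

# Hypothetical counts: 50 synergistic and 50 non-synergistic pairs.
acc, mcc = accuracy_and_mcc(tp=45, tn=45, fp=5, fn=5)
```

Unlike ACC, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are imbalanced.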
Decision tree
[11] is easy to handle and implement. Random Forest is a bagging ensemble algorithm composed of decision trees. The results of a decision tree can be displayed through its tree structure, which makes them easy to understand and visualize. Wu, L. et al.
[12] built an advanced deep forest-based model, ForSyn. ForSyn is a multi-layer cascade structure with two new forest types embedded in each cascade as units. In comparisons with other advanced algorithms on several datasets, ForSyn performed better, with an area under the precision–recall curve (AUPR) of 0.591 and a recall of 0.537. Unlike traditional machine learning methods, this model addresses data imbalance and high-dimensional features. However, the model is still limited by the scale of the input data.
Moreover, its ability to generalize to new anti-cancer drugs or cancer cell lines is insufficient. The intrinsic problems with these drug combination predictions remain unresolved. Decision trees are prone to over-fitting the training data, especially when the tree is deep or has too many leaf nodes
[13]. The accuracy of decision trees mainly depends on data quality, feature selection, tree structure and parameters. When the dataset is complex and its features interact in complicated ways, the accuracy of a decision tree may decline.
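To illustrate how a single node of a decision tree is built, the sketch below picks the threshold on one feature that minimizes the weighted Gini impurity. It is a generic illustration, not the ForSyn procedure; the feature values and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split: impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical feature (e.g. a similarity score) and synergy labels.
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(scores, labels)
```

Applying this split recursively to each child node grows the tree. Without a depth or leaf-size limit, the recursion can continue until every leaf is pure, which is one way the over-fitting described above arises.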
Gradient boosting
[14] improves the accuracy of any given learning algorithm. It shows high predictive accuracy in many machine learning tasks and is particularly good at dealing with nonlinear relationships and high-dimensional data. Inspecting the importance of each weak learner can help clarify how a prediction was reached. Xu, Q. et al.
[15] introduced a new model based on the stochastic gradient boosting algorithm called PDC-SGB. The model constructs a 732-dimensional feature vector containing biological, chemical, and pharmacological information for each drug combination. It integrates six types of features to describe drug combinations: molecular two-dimensional structure, structural similarity, anatomical therapeutic similarity, protein-protein interactions, chemical-chemical interactions, and disease pathways. Compared with other advanced models, this model shows better performance and feature prediction ability, with an AUC of up to 0.9775. However, the performance of the biological part of the model is relatively low, which may be due to incomplete molecular networks or biological pathways and the oversimplified characterization of biological features. More generally, gradient boosting algorithms are prone to overfitting the training set, especially when there are many weak learners. They also involve more hyperparameters than other methods, so hyperparameter tuning is more complex
[16].
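The core idea of gradient boosting can be sketched in a few lines: each round fits a weak learner (here a one-split regression stump) to the residuals of the current ensemble and adds it with a small learning rate. This is a minimal illustration for squared-error loss on one feature, not the PDC-SGB implementation; all names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump; return (threshold, left, right)."""
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    return best[1:]

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boost stumps on the residuals (negative gradient of squared loss)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lv, rv = fit_stump(xs, residuals)
        stumps.append((thr, lv, rv))
        preds = [p + learning_rate * (lv if x <= thr else rv)
                 for x, p in zip(xs, preds)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}

def predict(model, x):
    total = sum(lv if x <= thr else rv for thr, lv, rv in model["stumps"])
    return model["base"] + model["learning_rate"] * total

# Hypothetical one-dimensional feature with binary synergy labels.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = gradient_boost(xs, ys)
```

The number of rounds and the learning rate are two of the hyperparameters mentioned above. Increasing the number of rounds keeps reducing the training error, which is why too many weak learners can lead to overfitting.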
3. Application of Deep Learning Methods in Drug Combination Prediction
Deep learning models mainly include Feedforward neural network (FNN), Autoencoder (AE), Graph neural network (GNN) and Deep belief network (DBN). In a feedforward neural network, neurons are organized in layers, and each neuron is connected only to the neurons of the previous layer. It is often used as a baseline for deep learning methods. The autoencoder is more complex than the FNN; it is an artificial neural network trained by semi-supervised or unsupervised learning and used for dimensionality reduction and anomaly detection. The graph neural network applies deep learning directly to graph-structured data, extracting and discovering the characteristics of the graph. Deep belief networks can be used not only to recognize features and classify data but also to generate data.
A feedforward neural network
[17] passes the outputs of each layer forward to the next layer without feedback. The advantage of this model is that it has strong nonlinear modeling ability and can automatically learn complex relationships between input features. In addition, its performance can be improved by increasing the number of hidden layers and neurons. Tsai, P.L. et al.
[18] proposed a multi-layer FNN with two hidden layers to predict the outcome of antidepressant therapy in patients receiving their first diagnosis of, and initial treatment for, major depressive disorder (MDD) during the severe depressive stage. The first layer of the neural network is the input layer, where each unit receives a one-dimensional data vector containing patient characteristics. The final layer is the output layer that performs the classification. The evaluation results show that the model achieves an area under the curve (AUC) between 0.7 and 0.8 and can use clinical features and peripheral biochemical characteristics to predict the outcome of antidepressant therapy. The drawback is that the model was trained on a small sample, which prevented a more detailed analysis. Deep neural networks also still lack adequate mechanisms for explaining the interactions between variables. In addition, an FNN requires many training samples and a complex network structure, which makes it prone to overfitting.
Furthermore, the training process of feedforward neural networks takes a long time, especially when dealing with large data sets
[19]. The results of feedforward neural networks often lack interpretability. The accuracy of this method is affected by many factors, including data quality, feature selection, network structure and hyperparameter selection.
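The layer-by-layer computation of a feedforward network with two hidden layers can be sketched as follows. The layer sizes, the ReLU and sigmoid activations, and the random untrained weights are assumptions made for illustration; they are not the configuration of the cited model.

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer: weights has one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(features, layers):
    """Pass the input through the hidden layers, then the output unit."""
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(dense(hidden, weights, biases)[0])

rng = random.Random(0)
# 4 input features -> 8 hidden units -> 8 hidden units -> 1 output
layers = [init_layer(4, 8, rng), init_layer(8, 8, rng), init_layer(8, 1, rng)]
prob = forward([0.2, 1.0, 0.5, 0.0], layers)
```

Information flows strictly from input to output with no feedback connections. In practice the weights would be learned by back-propagation, and the output would then be read as the predicted probability of the positive class.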
Autoencoder
[20] consists of an encoder and a decoder and is, in a general sense, a representation learning algorithm. It has a strong feature learning ability and can extract useful features from drug response data through unsupervised learning, without the need for manually labeled information
[21]. Liu, Q. et al.
[22] constructed a knowledge-enabled and self-attention transformer-boosted deep learning model, TranSynergy. It includes three major components: (1) an input dimension reduction component, (2) a self-attention transformer component, and (3) a fully connected output component. Their model evaluations showed that TranSynergy outperformed the most advanced approaches, with an AUC and AUPR of 0.908 and 0.625, respectively. As with traditional computational models, TranSynergy selected only a few cancer-related genes, covering drug targets and annotations, because of limited training data. In addition, too many feature dimensions can expose the model to the curse of dimensionality and lead to overfitting. Autoencoders risk overfitting when dealing with large-scale drug response data, especially when the training set is small. Because the training process of an autoencoder is unsupervised, the features it extracts are often difficult to interpret
[21][23].
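The encoder-decoder idea can be illustrated with the smallest possible autoencoder: a linear encoder that compresses a three-dimensional input to a single code value and a linear decoder that reconstructs the input from it, trained by gradient descent on the reconstruction error. This is a toy sketch under those assumptions, unrelated to the TranSynergy architecture; the data, learning rate, and initial weights are hypothetical.

```python
def reconstruct(x, enc, dec):
    """Encode x to a single code value, then decode it back."""
    code = sum(e * xi for e, xi in zip(enc, x))
    return [code * d for d in dec], code

def total_loss(data, enc, dec):
    """Sum of squared reconstruction errors over the data set."""
    loss = 0.0
    for x in data:
        recon, _ = reconstruct(x, enc, dec)
        loss += sum((r - xi) ** 2 for r, xi in zip(recon, x))
    return loss

def train(data, enc, dec, lr=0.01, epochs=500):
    """Full-batch gradient descent; no labels are used anywhere."""
    for _ in range(epochs):
        grad_enc = [0.0] * len(enc)
        grad_dec = [0.0] * len(dec)
        for x in data:
            recon, code = reconstruct(x, enc, dec)
            err = [r - xi for r, xi in zip(recon, x)]
            err_dot_dec = sum(e * d for e, d in zip(err, dec))
            for i in range(len(enc)):
                grad_enc[i] += 2.0 * err_dot_dec * x[i]
                grad_dec[i] += 2.0 * code * err[i]
        enc = [e - lr * g for e, g in zip(enc, grad_enc)]
        dec = [d - lr * g for d, g in zip(dec, grad_dec)]
    return enc, dec

# Hypothetical 3-D profiles that all lie along one direction.
data = [[0.5 * t, 1.0 * t, 1.5 * t] for t in (-1.0, -0.5, 0.5, 1.0)]
enc0, dec0 = [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]
initial_loss = total_loss(data, enc0, dec0)
enc, dec = train(data, enc0, dec0)
final_loss = total_loss(data, enc, dec)
```

Because the toy data vary along a single direction, one code value is enough to reconstruct them. The learned code is a compressed feature, but nothing in the training objective ties it to a biological meaning, which is the interpretability issue noted above.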
Graph neural network
[24] is a framework that has emerged recently. The advantage of a graph neural network is that it can capture the relationships and topological information between the nodes in a graph and transform the data into a low-dimensional and more discriminative feature space. In addition, graph neural networks can automatically learn the feature representation of nodes and edges
[25]. Wang J. et al.
[26] proposed a model based on a graph neural network (GNN) and an attention mechanism, called DeepDDS. In this model, the chemical structure of the drug is represented by a graph. The drug embeddings are calculated using the two deep learning components mentioned above. By integrating genomic and drug signatures, DeepDDS can capture important information from drug chemical structures and gene expression patterns to identify synergistic drug combinations that target specific cancer cell lines. They also compared DeepDDS with deep learning methods and traditional machine learning methods on a benchmark dataset. The results demonstrate the better performance of DeepDDS compared to the other models, with an AUC, AUPR and accuracy of 0.93, 0.93 and 0.85, respectively. Similarly, DeepDDS still did not show satisfactory predictive accuracy on independent test sets for the same reason described earlier. The main disadvantages of graph neural networks are as follows: (1) Due to the complex structure of a GNN, its training process is relatively difficult. (2) A GNN is also a black box, which makes it difficult to explain its decision-making process. (3) GNNs are vulnerable to adversarial attacks, and their robustness needs to be improved
[27].
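How a graph neural network uses topology can be shown with one round of neighborhood aggregation on a molecular graph, followed by a mean readout that yields a single embedding for the whole drug. This sketch deliberately omits the learned weight matrices, nonlinearities, and attention used in real GNNs such as DeepDDS; the toy molecule and its one-hot atom features are assumptions.

```python
def message_passing_step(node_features, edges):
    """Replace each node's features by the mean over itself and its neighbors."""
    n = len(node_features)
    neighbors = {i: [i] for i in range(n)}  # include a self-loop
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dim = len(node_features[0])
    updated = []
    for i in range(n):
        group = neighbors[i]
        updated.append([sum(node_features[j][k] for j in group) / len(group)
                        for k in range(dim)])
    return updated

def readout(node_features):
    """Mean-pool node features into one graph-level embedding."""
    n = len(node_features)
    return [sum(column) / n for column in zip(*node_features)]

# Toy molecule with heavy atoms C-C-O; features are [is_carbon, is_oxygen].
atoms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2)]
hidden = message_passing_step(atoms, bonds)
embedding = readout(hidden)
```

After one step, the middle carbon already carries information about the oxygen it is bonded to, while the terminal carbon does not. Stacking more steps lets information travel further across the graph.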
A deep belief network
[28] trains the weights between its neurons so that the whole network generates the training data with maximum probability. Moreover, DBN can automatically learn high-level abstract features from data through unsupervised learning and can then be fine-tuned with back-propagation through supervised learning. DBN performs well when supervised training has only a small amount of labeled data available and when features must be extracted from regular data
[29]. Chen, G. et al.
[30] introduced a stacked restricted Boltzmann machine (RBM), which can predict the response of drug combinations from gene expression, pathways, and ontology fingerprints. In their model, the training data are used in a pre-training stage to optimize the input weights using contrastive divergence. Their evaluation of the model showed an accuracy rate of 71.5%, a recall of 60.2%, and an F score of 65.4%. Overall, their model performed better than the DREAM competition group. The RBM model also faces the problems of data integrity and a lack of experimental data, which may degrade model performance. Moreover, DBN is prone to over-fitting when dealing with small sample data, so some regularization methods are needed to alleviate over-fitting
[29]. DBN can achieve high accuracy in drug response prediction models, but the training data and hyperparameter selection need to be carefully considered in practical applications.
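Contrastive divergence, the pre-training rule used for each RBM layer of a DBN, can be sketched as a single-step (CD-1) update on binary units. The layer sizes, learning rate, and training patterns below are hypothetical, and the sketch covers one RBM only, not the stacked model of the cited study.

```python
import math
import random

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def sample(probs, rng):
    """Draw binary states from unit-wise activation probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def hidden_probs(v, W, c):
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]

def cd1_update(v0, W, b, c, rng, lr=0.1):
    """One CD-1 step: data phase, one Gibbs step, then parameter update."""
    ph0 = hidden_probs(v0, W, c)
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)
    ph1 = hidden_probs(v1, W, c)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])

rng = random.Random(0)
n_visible, n_hidden = 4, 2
W = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
     for _ in range(n_visible)]
b = [0.0] * n_visible  # visible biases
c = [0.0] * n_hidden   # hidden biases
patterns = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
for _ in range(200):
    for v in patterns:
        cd1_update(v, W, b, c, rng)
```

In a DBN, the hidden activations of one trained RBM become the visible input of the next, and the stacked weights are then fine-tuned with labeled data.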
4. Application of Mathematical Methods in Drug Combination Prediction
Mathematical methods include Network analysis and Dynamic mathematical models. Network medicine uses a systems-network perspective to understand the disease mechanism
[31]. Similarly, the Dynamic mathematical model studies how drug combinations affect protein concentrations and how combination therapies can effectively control the progression of disease states
[7].
The human body is composed of a rich variety of biological units, and with the advancement of bio-measurement technology, various types of disease networks have been established. With the help of Network analysis, the complex interactions between drugs and proteins can be captured, including physical interactions, metabolic pathways, signal transduction, etc. In addition, it can provide an interpretation of the predicted results, for example, by analyzing critical paths in the network, node importance, etc.
[32][33]. Yin N. et al.
[34] explained the connection between network topology and the effect of drug combinations by displaying the interaction of drug combinations and their targets in the network. They found that the effect of drug combinations depends heavily on the network topology, and they were able to identify motifs that could serve as useful catalogs for the rational design of drug combinations in enzyme systems. Unlike most studies on drug synergy, they considered both antagonism and synergy. Their model generally provides a rational and easy-to-apply approach to designing synergistic drug combinations. However, the results of this model are based only on enzyme networks, and other types of biological networks still need to be explored. Moreover, this method requires a large amount of drug-protein interaction data, so data acquisition and collation are challenging
[32][33]. As with many other methods, the accuracy of network analysis is affected by several factors, including data quality, network construction methods, and prediction algorithms.
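Node importance, one of the interpretable quantities mentioned above, can be illustrated with degree centrality on a small drug-target network. The network, node names, and edges below are hypothetical, and degree centrality is only one of several possible importance measures.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly linked to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    return {node: len(adjacent) / (n - 1)
            for node, adjacent in neighbors.items()}

# Hypothetical network: two drugs and three proteins.
edges = [
    ("drug_A", "protein_1"),
    ("drug_A", "protein_2"),
    ("drug_B", "protein_2"),
    ("protein_1", "protein_2"),
    ("protein_2", "protein_3"),
]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
```

In this toy network the protein touched by both drugs is also the most connected node, which is the kind of topological observation that network analysis uses to reason about combination effects.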
The dynamic mathematical model captures important dynamic aspects of disease treatment. It can describe the processes of drug absorption, distribution, metabolism, and excretion in organisms, which allows drug responses to be simulated more accurately. It also has a strong predictive ability: it can predict the effect of drugs under different doses and dosing schemes and support the individualization of drug therapy
[35][36]. Geva-Zatorsky, N. et al.
[37] tracked the concentrations of different proteins under different drugs using a dynamic proteomics method. They found that the dynamics of protein responses to drug pairs can be accurately described by the responses to the individual drugs. However, Dynamic mathematical models are usually constructed from a series of mathematical equations, and their complexity may limit the interpretability of the models
[35][36]. These models typically have high accuracy but are data-intensive, requiring more data and careful optimization of the model parameters.
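A dynamic model of this kind can be sketched with the simplest pharmacokinetic case: a one-compartment model with first-order elimination, dC/dt = -k·C, integrated with the forward Euler method under an arbitrary dosing schedule. The model choice and all parameter values (elimination rate, volume of distribution, doses) are assumptions for illustration and do not come from the cited studies.

```python
import math

def simulate_concentration(doses, k_elim, volume, t_end, dt=0.01):
    """Return plasma concentration sampled every dt hours.

    doses: list of (time_h, amount_mg) bolus doses.
    k_elim: first-order elimination rate constant (1/h).
    volume: volume of distribution (L).
    """
    n_steps = int(round(t_end / dt))
    dose_at_step = {}
    for time_h, amount in doses:
        step = int(round(time_h / dt))
        dose_at_step[step] = dose_at_step.get(step, 0.0) + amount
    conc = 0.0
    trace = []
    for step in range(n_steps + 1):
        conc += dose_at_step.get(step, 0.0) / volume  # instantaneous bolus
        trace.append(conc)
        conc += dt * (-k_elim * conc)                 # Euler step of dC/dt = -k*C
    return trace

# Single hypothetical 100 mg dose, then the same dose repeated every 8 h.
single = simulate_concentration([(0.0, 100.0)], k_elim=0.2, volume=10.0, t_end=10.0)
repeated = simulate_concentration([(0.0, 100.0), (8.0, 100.0)],
                                  k_elim=0.2, volume=10.0, t_end=10.0)
exact_at_10h = 10.0 * math.exp(-0.2 * 10.0)  # analytic solution for one dose
```

For a single dose, the numerical trace can be checked against the analytic solution C(t) = C0·exp(-k·t). Changing the entries in `doses` is how different doses and dosing schemes would be compared in this sketch.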
5. Application of Search Algorithms in Drug Combination Prediction
The breadth-first search algorithm always expands outward across the frontier between discovered and undiscovered vertices. The applications of breadth-first search, especially in drug structures, include finding the shortest path and the minimum distance between two points.
As one of the simplest graph search algorithms, the Breadth-first search algorithm is the basis of numerous graph algorithms. It can consider a large number of potential drug-target interactions. Ji, L.S. et al.
[38] investigated the immunomodulatory mechanisms of Bushen formula (BSF) combined with entecavir (ETV) in patients with newly treated chronic hepatitis B (CHB) and CHB patients with partial virological response to ETV. They finally demonstrated that the combination of these two drugs helped ETV partially alleviate HBsAg reduction in patients and is a potential treatment for these patients. However, in their study, the immunomodulatory mechanisms underlying BSF treatment of CHB patients remain to be explored. This method is not robust enough. Moreover, because its prediction results are often based on the similarity between the drug and the target, it does not consider other factors such as drug metabolism and drug delivery
[39][40]. The accuracy of the model depends decisively on the feature selection, parameter selection and tuning, and evaluation and verification methods of the search algorithm.
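Breadth-first search for the shortest path between two nodes can be sketched as follows. The graph is a hypothetical drug-target-pathway network with made-up node names; the algorithm itself is the standard queue-based formulation.

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Return a shortest path (fewest edges) from start to goal, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the set of discovered nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk back through the parents to rebuild the path.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

# Hypothetical directed network from a drug to downstream nodes.
graph = {
    "drug_A": ["target_1", "target_2"],
    "target_1": ["node_X"],
    "target_2": ["node_Y"],
    "node_X": ["node_Z"],
    "node_Y": ["effector"],
    "node_Z": ["effector"],
}
path = bfs_shortest_path(graph, "drug_A", "effector")
```

Because the queue processes nodes in order of their distance from the start, the first time the goal is discovered the path to it is guaranteed to use the fewest edges.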
6. Application of Systems Biology Methods in Drug Combination Prediction
Systems biology methods hypothesize that drugs that are effective for a specific disease can be candidates for other diseases that show similar changes in gene expression. The model is suitable for rapid drug combination prediction ahead of experimental verification.
The cMap database provides gene expression profiles of numerous small molecules against different cancer cell lines, which provides rich data for the Signature-based model. In addition, Systems biology methods can integrate large-scale biological data at different levels, such as gene expression, protein interactions, metabolic pathways, etc., to gain a more comprehensive understanding of drug action mechanisms and influencing factors
[41]. Kim J. et al.
[42] have developed a web-based program called K-Map. The program can uncover the complex interactions between protein kinases and their inhibitors and provide a basis for rational clinical drug use. This model links kinases to drugs using quantitative signatures of kinase inhibitor activity. In addition, its data are updated on a quarterly basis, which keeps the program current and makes it even more valuable. However, this method still has some disadvantages: its accuracy is highly dependent on the quality of the input data. Moreover, the method usually uses complex network models and algorithms, which makes the results difficult to interpret
[41]. Systems biology approaches construct complex network models by integrating multiple data sources to achieve high accuracy.
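The signature-based hypothesis stated at the start of this section can be sketched by ranking candidate diseases by how closely their gene expression changes correlate with those of a query disease. The Pearson correlation, the four-gene signatures, and the disease names below are illustrative assumptions and do not reproduce the scoring used by cMap or K-Map.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long expression signatures."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def rank_by_similarity(query, signatures):
    """Sort candidate signatures from most to least similar to the query."""
    scored = [(name, pearson(query, sig)) for name, sig in signatures.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical log fold changes over the same four genes.
query_disease = [2.0, -1.0, 0.5, -2.0]
candidates = {
    "disease_X": [1.0, -0.5, 0.25, -1.0],  # same pattern, smaller magnitude
    "disease_Y": [-2.0, 1.0, -0.5, 2.0],   # opposite pattern
    "disease_Z": [0.5, 0.4, -1.0, -0.2],
}
ranked = rank_by_similarity(query_disease, candidates)
```

Under the stated hypothesis, drugs known to work for the top-ranked disease would be the first candidates to test for the query disease.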