Computational Biology in Drug Design

Computational Biology in Drug Design: Comparison

Please note this is a comparison between Version 1 by Yue Zhang and Version 3 by Dean Liu.

Traditional drug design requires a great amount of research time and developmental expense. Booming computational approaches, including computational biology, computer-aided drug design, and artificial intelligence, have the potential to expedite the efficiency of drug discovery by minimizing the time and financial cost. In recent years, computational approaches are being widely used to improve the efficacy and effectiveness of drug discovery and pipeline, leading to the approval of plenty of new drugs for marketing. The present review emphasizes on the applications of these indispensable computational approaches in aiding target identification, lead discovery, and lead optimization. Some challenges of using these approaches for drug design are also discussed. Moreover, rwesearchers propose a methodology for integrating various computational techniques into new drug discovery and design.

computational biology
computer-aided drug design (CADD)
artificial intelligence-aided drug design (AIDD)

1. Application of Molecular Mechanics in Drug Design

Molecular mechanics (MM) is an approach which approximately treats the molecules with the laws of classical mechanics and saves the computational resources required for quantum mechanical calculations ^[1][58]. Over the past decades, MM approach plays an important role in understanding the ligand-protein structures, interactions and optimizing leads. It is achieved by MM potential energy function, which represents the sum of different energy terms, referred as “force fields” ^[2][59]. MM potential energy functions are used in various sampling methods, such as MD and MC (Monte Carlo). MD simulations are often utilized in drug discovery ^[3][4][60,61]. MD is one of the most popular algorithms for sampling. It utilizes various integration algorithms, such as Verlet’s Algorithm, Leap-frog Algorithm and Beeman’s Algorithm, to interpret classical Newton’s equation of motion to analyze the trajectories, movements and interactions in a given molecular system ^[4][61]. Time-dependent properties can be obtained from MD ^[5][62]. The system is generally a biomacromolecule, such as a protein for example an enzyme, with a solvent environment. For this protein or enzyme system, the initial protein structure is resolved by experiments ^[6][63]. Then, the structure could be modelled by different methods. After that, simulations start with the prepared model. X-ray crystallography is used as an experimental method for obtaining the three-dimensional protein structure ^[7][64]. However, X-ray requires the protein to form stable crystals and the crystal quality determines the resolution of the structure, which limits the obtainment of high-quality protein structures, especially of membrane proteins ^[8][65]. Cryo-EM addresses the problem without the need to form crystals ^[9][10][66,67]. Cryo-EM can determine even quite unstable and intractable membrane protein structures ^[10][67]. However, Cryo-EM is not a panacea. In cryo-EM, sample quality is still the most critical factor for determining the high-resolution structure ^[11][54]. If no experimental structure is available, modeling or predicting the structure is necessary. Homology modeling ^[12][13][68,69] and AlphaFold developed by DeepMind ^[14][15][16][70,71,72] are preferred techniques for acquiring the initial protein structure. In the molecular dynamics simulation, atoms and molecules of the system interact during the fixed time, providing the dynamic features of the system. Atom trajectories are generally determined by Newton’s laws of motion. Molecular mechanics methods with various force fields ^{[17][18][19][20][21][22]}[73,74,75,76,77,78] are employed to calculate the energies of the system.

1.1. Application in Investigating the Mechanism of the Target Protein

Target protein can be regulated by drugs to cure the disease or relieve the symptoms. The overall process is dynamic and usually accompanied by the conformational changes of the target protein. Target protein conformation has an essential role in drug design. Even minor changes, as well as the motions of residues, may affect the target–ligand interactions. MD simulations can provide dynamic information about the target protein and the ligand in terms of drug design, which cannot be obtained through experimental methods. Compared with experiments, MD simulations can provide detailed information about the target protein folding process and describe the conformational changes of the protein with environmental changes such as temperature, pH, and residue mutations, with detailed energetic information. At present, MD simulations have been broadly applied to study the molecular mechanisms of the target protein to aid drug design.

For example, Horikoshi et al. revealed the molecular mechanisms underlying the loss of activity in the most severe glucose-6-phosphate dehydrogenase (G6PD) deficiency ^[23][79]. It is triggered by the long-distance propagation of structural defects at the dimer interface. The findings indicated that a promising drug can possibly be discovered and developed by correcting these structural defects. While studying pathogenic mutations in the kinesin-3 motor KIF1A by using MD simulations, Budaitis et al. found that these mutations were linked to neurodevelopmental and neurodegenerative disorders that impaired neck linker docking and dramatically reduced the KIF1A force generation ^[24][80]. Zanetti-Domingues’s work revealed autoinhibition mechanisms in dimers and oligomers of the epidermal growth factor receptor (EGFR) and supported that dysregulated species bear populations of symmetric and asymmetric kinase dimers coexisting in an equilibrium ^[25][81]. The structural feature of the assembly inspires the related drug design. Based on MD simulations, Zhu’s lab elaborated on the genotype-determined EGFR-RTK heterodimerization and its drug resistance mechanism in lung cancer caused by a tighter EGFR-RTK crosstalk ^[26][82]. The study promotes drug design against the tighter crosstalk of the genotype determined. Understanding the dynamic behaviors of sirtuins, which have several ligand-binding sites ^[27][83], may provide perspectives for the design of selective inhibitors or activators. Polymyxin resistance was found to be induced by lipopolysaccharides and outer membrane vesicles in the multidrug-resistant Pseudomonas aeruginosa ^[28][84]. Based on this mechanism, an intelligent strategy for designing lipopeptide antibiotics against polymyxin resistance was developed ^[28][84]. The strategy may be suitable for the discovery of other classes of bacterial pathogen-targeting antibiotics. In addition to regular MM approach, coarse-grained models can be used to investigate the molecular mechanism of the target. More details are shown in Section 2.1.4.

1.2. Application in Molecular Docking

In molecular docking, according to the space complementarity and energy match, compounds are docked in the specific site. Then, the docking poses are scored and ranked based on the scores ^[29][85]. On the basis of molecular docking, VS has been indispensable to drug discovery ^[30][31][86,87]. VS saves time and costs for drug discovery and efficiently obtains various molecule scaffolds ^{[32][33][34][35][36]}[88,89,90,91,92]. The complete VS process consists of pre-processing compound libraries, molecular docking, and the selection of pretest compounds ^[37][38][39][22,93,94]. In general, the enrichment factor greatly determines the success of VS. The enrichment factor is a validation tool that evaluates the effectiveness of VS by computing the ratio of active molecules among the tested molecules in the initial library. For each VS step, different strategies are used to enhance the enrichment factor ^[40][41][38,95]. The VS results depend on the rationality of the docking poses between the target and ligand, and the accuracy of binding affinity ^{[42][43][44][45][46][47][48][49][50]}[39,96,97,98,99,100,101,102,103].

After VS step, filtering promising candidates may then be sampled with MD simulations. The use of MD simulations can improve the flexibility in conformational sampling, which increases the number of degrees of freedom of the system and consequently in the computational effort ^[51][104]. For MM MD simulations, one of the most time-consuming parts is the calculation of the interactions between each atom in the system, which cost more than 90% of the total simulation time. This is mainly related to the calculation giving rise to O(N²) computational complexity (N represents the number of atoms in the system) ^[52][53][54][105,106,107]. The cutoff method applied to treat the interactions between atoms can reduce the computational complexity to O(N) ^[52][53][105,106]. Compared with the three-dimensional structure of the target protein obtained through X-ray or Cryo-EM, MD simulations take the flexibility of the target protein into account. The experimental structure is in the specific crystalized condition, which is possibly different from the real binding conformation with the ligand. A set of conformations can be obtained by modeling and simulations, especially the crucial intermediate or transition state that may contribute to the ligand–target protein binding process. MD simulations used to sample the specific conformation can provide more rational docking poses and achieve a higher enrichment factor. In addition to the conformation optimization of the target and ligand, MD simulations combined with binding free energy calculations are applied to assess the binding affinity of the ligand with the target. MM-PBSA and MM-GBSA are general approaches used to calculate binding free energy. Based on the trajectories from MD simulations, electrostatic energy, solvation energy, and van der Waals energy are calculated. Entropy change can be obtained through normal mode analysis. Then, the binding free energy can be obtained ^[55][108]. The binding free energy calculations are great and useful to augment the accuracy of the binding affinity of docking poses and improve the enrichment factor. But these high-cost sampling calculations are often used on an even smaller subset of potential hits.

1.3. Application in Lead Optimization

The optimal binding mode and the accurate binding affinity are vital for understanding the ligand–target interactions and guiding the modification of screened compounds. The ligand–target thermodynamical data, such as entropy change (ΔS) and free energy change (ΔG), can be determined through experiments and are used to distinguish between active and inactive compounds. However, the lack of details about target–ligand interactions limits further structural modifications of the compounds. MD simulation is a powerful approach for precisely evaluating the ligand–target binding modes. It can describe the detailed ligand–target interactions and determine the free energy contribution of each residue in the binding sites. The information can provide guidance for lead optimization.

Using the combination of MD simulations and VS, Patel’s lab optimized bedaquiline to decrease the severity of its adverse side effects and discovered that the compound CID 15947587 with low cardiotoxicity has the highest binding free energy (−85.27 kcal/mol) and an optimal docking score (−5.64) with mycobacterial ATP synthase enzyme ^[56][109]. Castillo’s group optimized AKT inhibitors by using MD simulations, thereby improving the binding affinity of the 2,4,6-trisubstituted pyridine scaffold in the ATP pocket of PKB/AKT and interacting well with glutamates/aspartates in ATP-binding sites ^[57][110]. Zhang et al. screened the new inhibitor against phosphodiesterase-2A (PDE2A). With the guidance of MD simulations, they obtained the optimized lead, LHB-8, forming an extra hydrogen bond with D808 and a hydrophobic interaction with T768, in addition to the interactions with Q859 and F862 of the PDE2A target ^[58][111].

1.4. Application of Coarse-Grained Models in Drug Design

All-atom MD simulations present the limitation while exploring the dynamic process of the large-scale target protein or long-time scale. Coarse-grained (CG) models help overcome the limitation well. When using CG models, the main chain of residues is in the all-atom state, but the side chain is a simplified united atom. Compared with all-atom MD simulations, CG simulations decrease the number of particles and make the potential energy surface smoother. Thus, the longer time and larger scale are available using CG models. Martini is a classical force filed to employ CG simulations. Martini is currently applied to study the mechanism and oligomerization of membrane proteins and self-assembly of proteins, predict conformational changes, and study binding and pore formation in membranes ^[59][60][61][112,113,114].

The CG model consistently developed by Warshel et al. ^[21][22][62][77,78,115] is advantageous in investigating the molecular mechanism of different biophysical systems, such as SARS-CoV-2 ^[63][116], GPCR ^[64][117], ATPase ^[65][66][118,119], and kinase ^[67][120]. This model can accurately describe the electrostatic term ^[68][121], which usually is the major contributor compared with other types of interactions in proteins. The CG profile can determine the dynamic information of the reaction in proteins, including the reaction energy barrier, rate-determining step, and the transition state. These results offer energetic details for understanding the working mechanism of proteins and guide rational drug discovery and development.

2. Application of QM in Drug Design

Structural studies have shown that the details of the potential drug target are valuable not only for lead discovery and optimization but also for the later steps of drug design, such as toxicity determination and prediction of binding modes of the leads and drug targets. During drug discovery, the molecular docking or pharmacophore model is used for predicting the binding modes in a short time. MD simulations can be employed to obtain flexible and rational docking modes. They can also guide drug design by exploring ligand–target interactions, such as studying the active site for extra electrostatic, hydrophilic, or hydrophobic interactions that can increase binding affinity ^[42][69][70][39,122,123]. Although MD simulations improve the accuracy of scoring and docking ^[71][72][73][124,125,126], concerns still exist, especially in enzymes or metal-containing drug targets, in which valence electron transfer occurs ^[74][75][127,128].

QM is considered the potential solution for the aforementioned concerns, which can explore drug target details at the electronic level ^[70][75][76][52,123,128]. At present, QM is increasingly applied to enzymes or metal-containing proteins that are considered drug targets, such as HIV-1 protease ^[77][129], human DHFR ^[78][130], and GPCR ^[79][131], and clarify the molecular mechanism for drug design ^{[80][81][82][83]}[132,133,134,135]. QM is also used for designing novel drugs, including the high-affinity ligands of FKBP12 ^[84][136] and novel inhibitors of human DHFR ^[85][137].

Additionally, researchers have attempted to improve scoring by inducing QM approaches, especially QM-polarized ligand docking ^[86][138], and QMScore, a semiempirical QM (SQM) scoring function ^[87][139]. QM in combination with molecular mechanics (MM) has been developed to enhance the accuracy of docking and VS ^{[75][88][89][90]}[128,140,141,142]. Fong et al. applied a series of QM/MM scoring functions to six HIV-1 proteases and found that parts of QM/MM functions were superior to MM functions ^[91][143]. Kim et al. ^[92][144] proposed a new strategy of using QM/MM with the implicit solvation model to rescore docking poses and improve the docking performance on 40 GPCR–ligand complexes. A significant improvement was observed over the conventional docking method. Chaskar et al. developed a QM/MM on-the-fly docking method to deal with polarization and metal interactions in docking and observed a significant improvement in scoring ^[93][145]. Compared to MD simulations, QM calculations are even more expensive. For example, the Hartree-Fock recovers approximately 99% of the total electronic energy and requires diagonalizing the M by M Fock matrix at O(M³) cost (M represents the number of basis functions) ^[94][146]. By Shor’s factoring algorithm, the complexity of quantum calculations is O((log₂N)³) (N represents the number of atoms in the system) ^[95][147]. Moreover, QM calculations are restricted to systems of up to a few hundred atoms in contrast to MD simulations, which has evolved from simulating tens of thousands of atoms to handling over 100 million atoms comprising an entire cell organelle ^[96][97][148,149].