Chang, Y.; Hawkins, B.A.; Du, J.J.; Groundwater, P.W.; Hibbs, D.E.; Lai, F. Structure-Based Drug Design. Encyclopedia. Available online: https://encyclopedia.pub/entry/39883 (accessed on 01 March 2026).

Structure-Based Drug Design

This entry is adapted from the peer-reviewed paper 10.3390/pharmaceutics15010049

drug discovery computer-aided drug design in silico drug design

1. Introduction

New drugs with better efficacy and reduced toxicity are always in high demand; however, the process of drug discovery and development is costly and time consuming and presents a number of challenges. The pitfalls of target validation and hit identification aside, a high failure rate is often observed in clinical trials due to poor pharmacokinetics, poor efficacy, and high toxicity ^[1]^[2]. A study conducted by Wong et al. that analysed 406,038 trials from January 2000 to October 2015 showed that the probability of success for all drugs (marketed and in development) was only 13.8% ^[3]. In 2016, DiMasi and colleagues ^[4] estimated a research and development (R&D) cost for a new drug of USD $2.8 billion based upon data for 106 randomly selected new drugs developed by 10 pharmaceutical companies. The average time from synthesis to first human testing was estimated at approximately 2.6 years (31.2 months), at a cost of approximately USD $430 million, and the time from the start of clinical testing to submission to the FDA was 6 to 7 years (80.8 months). Compared with a study conducted by the same author in 2003, the R&D cost for a new drug had more than doubled (from USD $1.2 billion) ^[5]. A possible reason for the increase in R&D cost is that regulators, such as the FDA, have become more risk averse, tightening safety requirements and thereby leading to higher failure rates in trials and increased costs for drug development. It is therefore important to optimise every aspect of the R&D process in order to maximise the chances of success.

The process of drug discovery starts with target identification, followed by target validation, hit discovery, lead optimisation, and preclinical/clinical development. If successful, a drug candidate progresses to the development stage, where it passes through different phases of clinical trials and eventually submission for approval to launch on the market (Figure 1) ^[6].

Figure 1. Stages of drug discovery and development.

Briefly, drug targets can be identified using methods such as data mining ^[7], phenotype screening ^[8]^[9], and bioinformatics (e.g., epigenetic, genomic, transcriptomic, and proteomic methods) ^[10]. Potential targets must then be validated to determine whether they are rate limiting for the disease’s progression or induction. Establishing a strong link between the target and the disease builds confidence in the scientific hypothesis and thus leads to greater success and efficiency in later stages of the drug discovery process ^[11]^[12].

Once the targets are identified and validated, compound screening assays are carried out to discover novel hit compounds (hit-to-lead). Various strategies can be used in this screening, including physical methods such as mass spectrometry ^[13], fragment screening ^[14]^[15], nuclear magnetic resonance (NMR) screening ^[16], DNA-encoded chemical libraries ^[17] and high-throughput screening (HTS) (e.g., against proteins or cells) ^[18], or in silico methods such as virtual screening (VS) ^[19].

After hit compounds are identified, properties such as absorption, distribution, metabolism, excretion (ADME), and toxicity should be considered and optimised early in the drug discovery process. An unfavourable pharmacokinetic and toxicity profile is one of the hurdles that often leads to the failure of a drug candidate in clinical trials ^[20].

Although physical and computational screening techniques are distinct in nature, they are often integrated in the drug discovery process to complement each other and maximise the potential of the screening results ^[21].

Computer-aided drug design (CADD) utilises this information and knowledge to screen for novel drug candidates. With the advancement in technology and computer power in recent years, CADD has proven to be a tool that reduces the time and resources required in the drug discovery pipeline.

2. Structure-Based Drug Design

The functionality of a protein is dependent upon its structure, and structure-based drug design (SBDD) relies on the 3D structural information of the target protein, which can be acquired from experimental methods such as X-ray crystallography, NMR spectroscopy and cryo-electron microscopy (cryo-EM). The aim of SBDD is to predict the Gibbs free energy of binding (ΔG_bind), that is, the binding affinity of ligands for the binding site, by simulating the interactions between them. Some examples of SBDD methods include molecular dynamics (MD) simulations ^[22], molecular docking ^[23], fragment-based docking ^[24], and de novo drug design ^[25]. Figure 2 describes a general workflow of molecular docking that will be discussed in greater detail.
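To make the link between ΔG_bind and binding affinity concrete, the sketch below uses the standard thermodynamic relation ΔG° = RT ln(K_d / c°), with a 1 M standard state. This relation is textbook thermodynamics rather than something stated in the entry, and the function names are illustrative only.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL = 1.987204e-3


def delta_g_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    # Kd is divided implicitly by the 1 M standard concentration.
    return R_KCAL * temperature_k * math.log(kd_molar)


def kd_from_delta_g(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Inverse relation: dissociation constant (mol/L) from a binding free energy."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    for kd in (1e-6, 1e-9):
        print(f"Kd = {kd:.0e} M  ->  dG = {delta_g_from_kd(kd):.2f} kcal/mol")
```

Under this relation, each tenfold improvement in K_d at 298 K corresponds to roughly 1.4 kcal/mol of additional binding free energy, which gives a sense of the accuracy a scoring method needs in order to rank ligands reliably.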

Figure 2. General workflow of molecular docking. The process begins with the separate preparation of the protein structure and the ligand database, followed by molecular docking, in which the ligands are ranked based on their binding pose and predicted binding affinity. (Abbreviations: LBDD: ligand-based drug design; ADME: absorption, distribution, metabolism and excretion; MD: molecular dynamics; MM-GBSA: molecular mechanics with generalised Born and surface area).

2.1. Protein Structure Prediction

Advances in sequencing technology have led to a steep increase in recorded genetic information, rapidly widening the gap between the amounts of sequence and structural data available. As of May 2022, the UniProtKB/TrEMBL database contained over 231 million sequence entries, yet there were only approximately 193,000 structures recorded in the Protein Data Bank (PDB) ^[26]^[27]. To model the structures of proteins for which structural data are not available, homology (comparative) modelling or ab initio methods can be used.

2.1.1. Homology Modelling

Homology modelling involves predicting the structure of a protein by aligning its sequence to a homologous protein that serves as a template for the construction of the model. The process can be broken down into three steps: (1) template identification, (2) sequence-template alignment, and (3) model construction.

Firstly, the protein sequence is obtained, either experimentally or from databases such as the Universal Protein Resource (UniProt) ^[28]. Modelling templates that have high sequence similarity and resolution are then identified by performing a BLAST ^[29] search against the Protein Data Bank ^[30]. PSI-BLAST ^[29] uses profile-based methods to identify patterns of residue conservation, which can be more useful and accurate than simply comparing raw sequences, as protein functions are predominantly determined by the structural arrangement rather than the amino acid sequence. One of the biggest limitations of homology modelling is that it relies heavily upon the availability of suitable templates and accurate sequence alignment. A high sequence identity between the query protein and the template normally gives greater confidence in the homology model. Generally, a minimum of 30% sequence identity is considered to be the threshold for successful homology modelling, as approximately 20% of the residues are expected to be misaligned for sequence identities below 30%, leading to poor homology models. Alignment errors are less frequent when the sequence identity is above 40%, where approximately 90% of the main-chain atoms are likely to be modelled with a root-mean-square deviation (RMSD) of ~1 Å, and the majority of the structural differences occur in loops and side-chain orientations ^[31].

Pairwise alignment methods are used to compare two sequences and are generally divided into two categories: global and local alignment (Figure 3). Global alignment aims to align the sequences over their entire length and is most useful when sequences are closely related or of similar length. Tools such as EMBOSS Needle ^[32] and EMBOSS Stretcher ^[32] use the Needleman–Wunsch algorithm ^[33] to perform global alignment. Rather than taking a brute-force approach, the Needleman–Wunsch algorithm uses dynamic programming, which reduces the number of possible alignments that need to be considered while still guaranteeing that the best alignment is found. Dynamic programming breaks a larger problem (aligning the entire sequences) into smaller problems, which are solved optimally. The solutions to these smaller problems are then used to construct an optimal solution to the original problem ^[34]. The Needleman–Wunsch algorithm first builds a matrix whose first row and column are initialised with cumulative gap penalties (negative scores), and the matrix is then filled in so that each cell holds the best score of any alignment ending at that position (usually a positive score for a match, and no score or a penalty for mismatches and gaps). Once the cells in the matrix are filled in, traceback starts from the lower right and proceeds towards the top left of the matrix to recover the alignment with the highest score.
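The matrix fill and traceback described above can be sketched in a few lines. This is a minimal illustration with a linear gap penalty and arbitrary scoring values (match +1, mismatch -1, gap -2), all of which are assumptions made here; production tools such as EMBOSS Needle use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of two sequences; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]

    # First column and row carry cumulative gap penalties.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Each cell keeps the best of: diagonal (match/mismatch), up (gap in b), left (gap in a).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback from the lower-right cell to the top-left cell.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(s)
    print(top)
    print(bottom)
```

When several alignments share the best score, this traceback prefers the diagonal move, then a gap in the second sequence; other tie-breaking orders give equally optimal alignments.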

Figure 3. Example of global and local alignment using Needle ^[32] and LALIGN ^[32]. Global alignment aims to find the best alignment across the entire length of the two sequences. Local alignment finds regions of high similarity in parts of the sequences.

Local alignment, on the other hand, aims to identify regions that share high sequence similarity, which is more useful when aligning sequences that are dissimilar or distantly related. EMBOSS Water ^[32] and LALIGN ^[32] are tools that use the Smith–Waterman algorithm ^[35] for local alignment. The Smith–Waterman algorithm, like the Needleman–Wunsch algorithm, uses dynamic programming to perform sequence alignment. However, no negative scores are stored in the matrix: any cell whose score would be negative is set to 0, as are the first row and column. Traceback begins at the matrix cell with the highest score and travels up/left until it reaches a cell with a score of 0, producing the highest scoring local alignment.
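The differences from global alignment (zero floor, zero-initialised borders, traceback from the maximum cell) can be seen in the following minimal sketch. The scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions, not the defaults of EMBOSS Water or LALIGN.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment of two sequences; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # First row and column stay at 0.
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 resets any alignment whose score would become negative.
            h[i][j] = max(0, h[i - 1][j - 1] + sub, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Traceback from the highest-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTTT", "GGACGGG"))
```

Only the shared core of the two sequences is reported; the dissimilar flanking regions, which would be forced into the alignment by a global method, are ignored.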

When searching for templates for homology modelling, including multiple sequences improves the accuracy of the alignment in regions of low sequence homology; hence, multiple sequence alignment (MSA) is essential. Global alignment of multiple sequences is generally too computationally expensive, so modern MSA tools (e.g., ClustalW ^[36], T-Coffee ^[37] and MUSCLE ^[38]) commonly use a progressive alignment approach that combines global and/or local alignment methods and follows the branching order of a guide tree. This technique builds up the MSA through a succession of pairwise alignments, first aligning the most similar sequences and then progressing to the next most similar sequence until the entire query set has been incorporated.

For example, MSA was used during the construction of the homology models for Alanine-Serine-Cysteine transporter (SLC1A5) by Garibsingh et al. At the time, there was limited structural information on SLC1A5 due to the lack of an experimentally determined structure of human SLC1 family proteins. Most of the knowledge on the human SLC1 family proteins therefore came from the study of prokaryotic homologs, which share low sequence identity. Using the structural information of the recently solved human SLC1A3, Garibsingh et al. carried out a phylogenetic analysis by generating an MSA of the human SLC1 family and its prokaryotic homologs using MUSCLE and Promals3D ^[39], and built homology models of SLC1A5 in two different conformations for the design of SLC1A5 inhibitors ^[40].

Once the alignment is complete, the model can be constructed, starting with the backbone, then the loops and lastly the side-chains. The polypeptide backbone of the model is first created by copying the coordinates of the aligned residues from the template. Gaps between the alignment of the sequence and the template are then handled through insertions and deletions in the alignment. It is important to remodel gaps accurately, as any error introduced here will be amplified in later stages, leading to structural changes that can be critical for protein functionality and protein–protein interactions. Loop modelling, via knowledge-based or energy-based methods, can be used to predict the conformations of the loops. Knowledge-based methods search databases, such as the PDB, for experimentally determined loops with high sequence similarity to the target and then insert them into the model. Yang et al. used FREAD ^[41] to predict the structure of a missing loop and construct a model of a monoclonal antibody, Se155-4, to study its antibody–antigen interactions with Salmonella Typhimurium O polysaccharide ^[42]. On the other hand, energy-based methods predict protein folding using ab initio methods with scoring function optimisation. For example, the Rosetta Next-Generation Kinematic Closure protocol ^[43], which employs the ab initio method, was used in loop prediction calculations to construct parts of the leucine-rich repeat kinase 2 (LRRK2) model, as the homology model template had missing loop sections. Mutations in the catalytic domains of LRRK2 are associated with familial and sporadic Parkinson’s disease, yet little is known about its overall structure or about the mutations that alter LRRK2 function and enzymatic activities. Combining homology models with experimental constraints, Guaitoli and co-workers constructed the first structural model of the full-length LRRK2 that includes domain engagement and contacts. The model provided insight into the roles that the different domains play in the pathogenesis of Parkinson’s disease and will serve as a basis for future drug design on LRRK2 ^[44].

Lastly, side-chains are built onto the backbone model according to the target sequence. Most side-chain types in proteins have a limited number of conformations (rotamers) and programs such as SCWRL ^[45] predict these in order to minimise the total potential energy. Upon completion, the model is optimised using molecular mechanics force fields to improve its quality.

A ligand-based approach can be utilised to further optimise homology models with low sequence identity between the query sequence and the structural template. Moro et al. first presented ligand-based homology modelling, also known as ligand-guided or ligand-supported homology modelling, as a tool to inspect the structural plasticity of G protein-coupled receptors (GPCRs) ^[46]. GPCRs comprise a superfamily of membrane proteins with over 800 members; they play a significant role in cellular signalling in the human body. As such, GPCRs are associated with numerous biological processes, making them important therapeutic targets ^[47]. Unfortunately, crystallisation of membrane proteins is known to be challenging, especially in the case of GPCRs, and few structural data for GPCRs were available until the last decade.

Given that the GPCRs are a diverse family, additional optimisation is required to refine homology models built for those with low sequence identity to the structural template and so increase their accuracy. In this approach, an initial homology model is first developed using the conventional method. Active ligands are then docked into the binding site for optimisation. The receptor is reorganised and refined based upon the ligand binding in order to better accommodate ligands with higher affinity. In 2006, Moro et al. first introduced this approach to construct a homology model of the human A₃ receptor based on the structure of bovine rhodopsin, the only known GPCR structure at the time. A set of structurally related pyrazolotriazolopyrimidines with known binding affinities was docked into a conventional rhodopsin-based homology model to induce receptor reorganisation ^[46].

The ligand-based homology modelling approach has been used extensively since then in studies of GPCRs, including serotonin receptors ^[48], dopamine receptors ^[49], cannabinoid receptors ^[50], neurokinin-1 receptor ^[51], γ-aminobutyric acid (GABA) receptor ^[52] and histamine H3 receptors ^[53].

2.1.2. Ab Initio Protein Structure Prediction

Historically, the homology modelling approach has been the ‘go-to’ method when it comes to protein structure prediction because it is less computationally expensive and produces more accurate predictions. One of the biggest limitations, however, is that it relies on existing known structures, so that the prediction of more complex targets, such as membrane proteins with little known structural data, is almost impossible. One solution to this problem is the use of a template-free approach, also known as ab initio modelling, free modelling, or de novo modelling ^[54]^[55]. As the name implies, this approach predicts a protein structure from its amino acid sequence without the use of a template. In addition, the ab initio approach can model protein complexes and provide information on complex formation and protein–protein interactions. This is significant as some proteins exist as oligomers and hence performing docking on monomeric structures may be ineffective ^[56]. The principle behind ab initio modelling is based on the thermodynamic hypothesis proposed by Anfinsen, which states that ‘the three-dimensional structure of a native protein in its normal physiological milieu is the one in which the Gibbs free energy of the whole system is lowest; that is, that the native conformation is determined by the totality of the interatomic interactions, and hence by the amino acid sequence, in a given environment’ ^[57].

Ab initio protein structure prediction is traditionally classified into two groups, physics-based and knowledge-based, although recent approaches tend to incorporate both. Purely physics-based methods such as ASTRO-FOLD ^[58]^[59] and UNRES ^[60] are independent of structural data, and the interactions between atoms are modelled based on quantum mechanics. It is believed that all the information about the protein, including the folding process and its 3D structure, can be deduced from the linear amino acid sequence. This approach is often coupled with molecular dynamics refinement, which also gives valuable insight into the protein folding process. The Critical Assessment of protein Structure Prediction (CASP) is a biennial double-blinded structure prediction experiment that assesses the performance of various protein structure prediction methods. ASTRO-FOLD 2.0 successfully predicted a number of good quality structures that are comparable to the best models in CASP9 ^[59]. Unfortunately, one of the major drawbacks of purely physics-based approaches is that, because of the enormous conformational space that must be sampled, they incur a high computational cost and time requirement and are only feasible for predicting the structures of small proteins.

Bowie and Eisenberg first proposed the idea of assembling short fragments derived from existing structures to form new tertiary structures in 1994 ^[61]. The idea behind this process is that the use of low-energy local structures from a fragment library provides confidence in local features, as these structures are experimentally validated. Furthermore, significantly fewer computational resources are required, as the conformational sampling space is reduced. Rosetta, one of the best-known knowledge-based programs, utilises a library of short fragments, excised from the 3D structures of known proteins, that represent a range of local structures. The query sequence is divided into short ‘sequence windows’; the top fragments for each sequence window are identified on the basis of factors such as sequence similarity and secondary structure prediction for local backbone structures, and these fragments are assembled via a Monte Carlo sampling algorithm to build a pool of structures with favourable local and global interactions (known as decoys) ^[62]. During the assembly process, the representation of the structure is simplified (it includes only the backbone atoms and a single centroid side-chain pseudo-atom) in order to sample the conformational space efficiently. Assembly starts with the protein in a fully extended conformation. A sequence window is selected, and one of the top-ranked fragments for this window is randomly chosen to have its torsion angles replace those of the protein chain. The energy of the conformation is then evaluated by a coarse-grained energy function and the move is accepted or rejected according to the Metropolis criterion. Under the Metropolis criterion, a conformation with a lower energy than the previous one is always accepted, whereas a conformation with a higher energy (less favourable) is accepted only with a probability that decreases as the energy increase grows ^[63]. The process repeats until the whole 3D structure is generated. Following this, side-chains are constructed and structures are refined using an all-atom energy function to model the position of every atom in the structure and generate high resolution models ^[64]. Other knowledge-based ab initio approaches include I-TASSER ^[65] and QUARK ^[66].
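The fragment-insertion loop and the Metropolis criterion described above can be sketched as follows. This is a toy illustration only: the energy function, the one-torsion-per-residue representation, the fragment values and the `kT` parameter are all hypothetical stand-ins, and none of this reproduces Rosetta's actual scoring or sampling code.

```python
import math
import random


def metropolis_accept(e_old: float, e_new: float, kT: float, rng=random) -> bool:
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)


def toy_energy(angles):
    """Hypothetical coarse-grained energy: penalises deviation from -60 degrees."""
    return sum((a + 60.0) ** 2 for a in angles) / 1000.0


def fragment_monte_carlo(n_residues, fragments, steps, kT, seed=0):
    """Toy fragment assembly; each fragment is a tuple of torsion angles of equal length."""
    rng = random.Random(seed)
    window = len(fragments[0])
    angles = [180.0] * n_residues            # fully extended starting conformation
    energy = toy_energy(angles)
    best = energy
    for _ in range(steps):
        start = rng.randrange(n_residues - window + 1)   # pick a sequence window
        frag = rng.choice(fragments)                     # pick one candidate fragment
        trial = angles[:start] + list(frag) + angles[start + window:]
        e_trial = toy_energy(trial)
        if metropolis_accept(energy, e_trial, kT, rng):
            angles, energy = trial, e_trial
            best = min(best, energy)
    return angles, energy, best


if __name__ == "__main__":
    frags = [(-60.0, -60.0, -60.0), (-120.0, -120.0, -120.0)]
    final_angles, final_energy, best_energy = fragment_monte_carlo(10, frags, 200, kT=1.0)
    print(final_energy, best_energy)
```

Occasionally accepting uphill moves is what lets the search escape local minima; lowering `kT` makes the sampling greedier, while raising it makes it more exploratory.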

Another method to improve the accuracy of de novo protein structure prediction is the use of co-evolutionary data for targets with many homologs. The structure of a protein is the key to its biological function, and through the evolutionary process, amino acids in direct physical contact, or in proximity, tend to co-evolve in order to maintain these interactions and hence preserve the function of the protein. Furthermore, residues that are subject to a high number of evolutionary constraints could indicate important functionalities. Based upon this principle, evolutionary and co-variation data obtained from databases such as Pfam ^[67] can be harnessed to predict residue contacts and protein folding ^[68]. This method works by performing MSA on a large and diverse set of sequences homologous to the query protein; information on amino acid pairs that co-evolve, also known as evolutionary couplings, is then extracted to determine the location of each residue ^[69].
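As a rough illustration of how co-variation can be read from an MSA, the sketch below scores pairs of alignment columns by their mutual information. This is a deliberately simple stand-in chosen here for clarity; the evolutionary coupling methods cited in the text rely on more sophisticated global statistical models, and the toy alignment is invented for the example.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an aligned sequence list."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (x, y), c in count_ij.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((count_i[x] / n) * (count_j[y] / n)))
    return mi


def rank_column_pairs(msa):
    """All column pairs sorted from most to least co-varying."""
    length = len(msa[0])
    pairs = [(column_mutual_information(msa, i, j), i, j)
             for i, j in combinations(range(length), 2)]
    return sorted(pairs, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 1 and 3 change together (K/E versus R/D).
    toy_msa = ["AKLE", "AKLE", "ARLD", "ARLD"]
    for mi, i, j in rank_column_pairs(toy_msa)[:3]:
        print(i, j, round(mi, 3))
```

Fully conserved columns score zero because they carry no variation to correlate, which is one reason a large and diverse set of homologs is needed for this kind of analysis.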

The application of neural network-based deep learning approaches to integrate co-evolutionary information has revolutionised protein structure prediction. There are currently several approaches that use deep learning methods to guide protein structure prediction, such as RaptorX ^[70], ProQ3D ^[71], D-I-TASSER ^[72], D-QUARK ^[72], and trRosetta ^[73]. The impact of deep learning methods is showcased by AlphaFold, an artificial intelligence (AI) system developed by DeepMind, and RoseTTAFold ^[74], a similar program from the Baker lab built on a three-track neural network, which have taken the protein modelling community by storm in the two most recent CASPs, CASP13 and CASP14. In CASP13, AlphaFold 1 ^[75] was placed first in the rankings with an average Global Distance Test Total Score (GDT_TS) of 70%. The GDT_TS is a metric that reflects the accuracy of the model backbone: the higher the value, the higher the accuracy ^[76]. Subsequently, in CASP14, the newer version, AlphaFold 2, was placed first again and outperformed all other programs by a huge margin, with a median GDT_TS of 92.4 over all categories ^[77]. Additionally, the updated version of trRosetta, RoseTTAFold, was ranked second and demonstrated superior performance to AlphaFold 1 in CASP13, and all of the top 10 ranking methods in CASP14 used deep learning-based approaches, signifying the progression in protein prediction accuracy. High accuracy models predicted by AlphaFold 2 are also published in the AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk/, accessed on 7 May 2022), providing extensive structural coverage of known protein sequences ^[78].
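For orientation, GDT_TS is conventionally computed as the mean of the percentages of Cα atoms lying within 1, 2, 4 and 8 Å of their positions in the reference structure. The sketch below assumes that per-residue deviations from a single, already-computed superposition are supplied; the full procedure searches over many superpositions to maximise each percentage, so this simplification can underestimate the official score.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT_TS (0-100) from per-residue C-alpha deviations in angstroms.

    Assumes the model has already been superimposed on the reference structure.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    total = len(distances)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(1 for d in distances if d <= c) / total for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Hypothetical deviations for a four-residue toy model.
    print(gdt_ts([0.5, 1.5, 3.0, 9.0]))
```

Because the loosest cutoff is 8 Å, a model can gain partial credit for a roughly correct fold even when the fine detail is wrong, whereas scores above 90 require most residues to lie within 1 to 2 Å of the reference.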

Knowledge-based methods such as I-TASSER and QUARK were not tested in CASP14 ^[72]; however, variants of these approaches that integrate deep learning into the protein structure prediction algorithms ranked 8th and 9th, respectively. Physics-based methods, such as UNRES (described above), using 3 different approaches (UNRES-template, UNRES-contact and UNRES) achieved GDT_TS scores of 56.37, 39.3 and 29.2, respectively. These results ranked 32nd, 109th and 117th ^[77]. The large majority of the top ranking algorithms in CASP14 utilised deep learning approaches, further affirming the utility of deep learning in protein structure prediction ^[72].

2.1.3. Protein Model Validation

The accuracy and quality of the predicted structures can be validated and verified using different methods. The stereochemistry of the model can be verified by analysing bond lengths, torsion angles and rotational angles with tools such as WHATCHECK ^[79] and Ramachandran plots ^[80]. The Ramachandran plot examines the backbone dihedral angles ϕ and ψ, which represent rotations about the N–Cα and Cα–C bonds in the polypeptide chain, respectively (Figure 4). Torsion angles determine the conformation of each residue and of the peptide chain; however, some angle combinations cause close contacts between atoms, leading to steric clashes. The Ramachandran plot shows which torsion angles of the peptide backbone are permitted and can thus be used to assess the quality of the model. Spatial features, such as 3D conformation and mean force statistical potentials, can be validated using Verify3D ^[81], which measures the compatibility of the model with its own amino acid sequence. Each residue in the model is evaluated by its environment, which is defined by the area of the residue that is buried, the fraction of side-chain area that is covered by polar atoms (oxygen and nitrogen) and the local secondary structure. Other structure validation tools include MolProbity ^[82]^[83], NQ-Flipper ^[84], Iris ^[85], SWISS-MODEL ^[86] and Coot ^[87]^[88]^[89]. In addition to in silico validation, experimental validation of the predicted complexes may also be used to aid selection of a model for future in silico studies. Cross-linking mass spectrometry (XL-MS) provides experimental distance constraints, which can be checked against the predicted models ^[90].
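The ϕ and ψ values plotted in a Ramachandran plot are ordinary dihedral angles defined by four consecutive backbone atoms: C(i-1), N, Cα, C for ϕ and N, Cα, C, N(i+1) for ψ. A minimal sketch of the calculation from Cartesian coordinates is shown below; the helper names are illustrative, and coordinates are assumed to be plain (x, y, z) tuples in ångströms.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    s0, s2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Backbone phi: rotation about the N-CA bond."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Backbone psi: rotation about the CA-C bond."""
    return dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing these two angles for every residue and plotting ψ against ϕ gives the scatter that validation tools compare with the allowed and favoured regions.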

Figure 4. (A) Protein backbone with dihedral angles. (B) An example of a Ramachandran plot of crystal structure of human farnesyl pyrophosphate synthase (PDB ID: 4P0V) ^[91]. White: disallowed region; yellow: allowed region; red: favourable region.

2.2. Docking-Based Virtual Screening

Docking-based virtual screening aims to discover new drugs by predicting binding modes of both ligand and receptor, studying their interaction patterns, and estimating binding affinity. Some examples of the many docking programs include AutoDock ^[92], GOLD ^[93], Glide ^[94]^[95], SwissDock ^[96], DockThor ^[97], CB-Dock ^[98] and Molecular Operating Environment (MOE) ^[99] (Table 1). Due to limitations of X-ray crystallography and NMR spectroscopy, experimentally derived structures often have problems, such as missing hydrogen atoms, incomplete side-chains and loops, ambiguous protonation states and flipped residues. It is therefore essential to prepare the 3D structures accordingly in order to fix these issues before the docking process ^[100].

The three main goals of molecular docking are: (1) pose prediction to envisage how a ligand may bind to the receptor, (2) virtual screening to search for novel drug candidates from small molecule libraries and (3) binding affinity prediction using scoring functions to estimate the binding affinity of ligands to the receptor ^[101]. Search algorithms and scoring functions are essential components for molecular docking programs.

A good search algorithm should explore all possible binding modes, and this can be a challenging task. The concept of molecular docking originated from the ‘lock and key’ model proposed by Emil Fischer ^[102], and early docking programs treated both the protein and ligands as rigid bodies. It was known that protein and ligands are both dynamic entities and that their conformations play an important role in ligand–receptor binding and protein functions, but historically this was too computationally expensive to implement. Modern docking programs treat both protein and ligand with varying degrees of flexibility in order to address this issue.

2.2.1. Binding Site Detection

In docking-based virtual screening, the location of the binding site within the protein must be identified. Most of the protein structures in the PDB are ligand-bound (holo) structures, which define the binding pocket and provide its geometry. In cases where only ligand-free (apo) structures are available, there are traditionally three main types of method to identify potential druggable binding sites. Template-based methods such as firestar ^[103], 3DLigandSite ^[104] and Libra ^[105]^[106] utilise protein sequences to locate residues that are conserved and important for binding. Geometry-based methods, such as CurPocket ^[98], Surfnet ^[107] and SiteMap ^[108]^[109], search for clefts and pockets based on the size and depth of these cavities. Energy-based methods such as FTMap ^[110] and Q-SiteFinder ^[111] locate sites on the surface of a protein that are energetically favourable for binding. Hybrid methods, such as ConCavity ^[112] and MPLs-Pred ^[113], as well as machine-learning methods, such as DeepSite ^[114], Kalasanty ^[115], and DeepCSeqSite ^[116], are some of the newer approaches that have been under rapid development in recent years.

Beyond locating the orthosteric binding site, these tools are also valuable for identifying potential allosteric binding sites to modulate protein function, finding hot spots on the protein surface to alter protein–protein interactions, and analysing known binding sites to design better molecules that complement the binding pocket. Furthermore, proteins are dynamic systems, and their conformations may change upon ligand binding. Hidden binding pockets, known as cryptic pockets, which are not present in a ligand-free structure, can result from conformational changes upon ligand binding. Detection of cryptic pockets can be a solution for targeting proteins that were previously considered to be undruggable due to the lack of druggable pockets ^[117]^[118].

In addition to the location of the binding site, the evaluation of its potential druggability is equally important. Druggability is the likelihood of being able to modulate a target with a small molecule drug ^[119]. It can be evaluated on the basis of target information and association, such as protein sequence similarity or genomic information ^[120]. However, this approach only works for well-studied protein families and homologous proteins may not necessarily bind to structurally similar molecules ^[121].

Various efforts have been made to evaluate druggability using structure-based approaches. Cheng et al. developed the MAP_POD score, one of the first methods to evaluate druggability, using a physics-based method. The MAP_POD model is a binding free energy model combined with curvature and hydrophobic surface area to estimate the maximal achievable affinity for passively absorbed drugs ^[119]. Halgren developed Dscore, which is a weighted sum of size, enclosure and hydrophobicity ^[108]^[109]^[122]. Other methods to predict druggability include Drug-like Density (DLID) ^[123], DrugPred ^[124], DoGSiteScorer ^[125], FTMap ^[126] and PockDrug ^[127].

DoGSiteScorer is a web server that supports pocket prediction, pocket characterisation and druggability estimation. The algorithm first maps a rectangular grid onto the protein; grid points are labelled as either free or occupied depending on whether they lie within the vdW radius of any protein atom. Free grid points are merged to form pockets and subpockets, and neighbouring subpockets are then merged to form pockets. A 3D Difference of Gaussian (DoG) filter is then applied to identify pockets that are favourable to accommodate a ligand. These pockets are characterised by global and local descriptors, such as pocket volume, surface, depth, ellipsoidal shape, types of amino acids, presence of metal ions, lipophilic surface, overall hydrophobicity ratio, distances between functional group atoms and many more ^[125]^[128].
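The grid-labelling step described above can be sketched in a few lines of Python. This is only an illustration of the free/occupied idea, not the DoGSiteScorer implementation: the function name `label_grid`, the 1 Å spacing, the 2 Å padding and the two-atom toy "protein" are all assumptions made for this example.

```python
import math
from itertools import product


def label_grid(atoms, spacing=1.0, padding=2.0):
    """Split a rectangular grid around `atoms` into free and occupied points.

    atoms: list of (x, y, z, vdw_radius) tuples in angstroms.
    spacing and padding are illustrative defaults, not published settings.
    """
    # Bounding box of the atoms, enlarged by `padding` on every side.
    lo = [min(a[k] for a in atoms) - padding for k in range(3)]
    hi = [max(a[k] for a in atoms) + padding for k in range(3)]
    counts = [int(math.floor((hi[k] - lo[k]) / spacing)) + 1 for k in range(3)]

    free, occupied = [], []
    for idx in product(*(range(n) for n in counts)):
        point = tuple(lo[k] + idx[k] * spacing for k in range(3))
        # A point is occupied if it lies within the vdW radius of any atom.
        inside = any(math.dist(point, a[:3]) <= a[3] for a in atoms)
        (occupied if inside else free).append(point)
    return free, occupied


# Toy input: two carbon-sized atoms 3 angstroms apart (hypothetical data).
toy_atoms = [(0.0, 0.0, 0.0, 1.7), (3.0, 0.0, 0.0, 1.7)]
free_points, occupied_points = label_grid(toy_atoms)
print(len(free_points), "free;", len(occupied_points), "occupied")
```

In a real pocket detector the free points would subsequently be filtered and clustered; here they are simply returned so that the labelling rule itself is visible.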

To predict druggability, a machine learning technique (a support vector machine model) trained on a set of known druggable proteins is used to identify druggable pockets based on a subset of these descriptors and to provide a druggability score between 0 and 1, where a higher score indicates a more druggable pocket. A SimpleScore, a linear regression based on size, enclosure and hydrophobicity, is also available to predict druggability ^[129].

Michel and co-workers used DoGSite, along with FTMap, CryptoSite and SiteMap, to predict ligand binding pockets and evaluate the druggability of the nucleoside diphosphate linked to moiety X (NUDIX) hydrolase protein family. Using a dual druggability assessment approach, the authors identified several proteins that are druggable out of the 22 that were studied. These in silico data were also found to correlate well with experimental results ^[130].

SiteMap locates binding sites by placing ‘site points’ around the protein; each site point is analysed for its proximity to the protein surface and its solvent exposure. Site points that fulfil the criteria and are within a given distance of each other are combined into subsites, then subsites that have a relatively small gap between them in a solvent-exposed region are merged to form sites. Distance-field and van der Waals (vdW) grids are then generated to characterise the binding site into three basic regions: hydrophobic, hydrophilic (further separated into H-bond donor, acceptor, and metal-binding regions) and neither. SiteMap also evaluates the potential binding sites and computes various properties such as the size of the site measured by the number of site points, exposure to solvent, degree of enclosure by the protein, contact of site points with the protein, hydrophobic and hydrophilic character of the site, and the degree to which a ligand can donate hydrogen bonds. These properties contribute to the calculation of the SiteScore (to distinguish drug-binding and non-drug-binding sites) and Dscore (druggability score), which help to recognise druggable binding sites for virtual screening ^[108]^[109].
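The distance-based grouping of site points into subsites can be illustrated with a simple single-linkage clustering. The sketch below is a generic stand-in rather than the procedure used by the program itself: the function name `group_site_points`, the cutoff value and the example coordinates are hypothetical.

```python
import math


def group_site_points(points, cutoff):
    """Group points so that any two points within `cutoff` share a group.

    Uses a small union-find structure (single-linkage clustering).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    # Largest group first, mirroring a ranking of candidate subsites by size.
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical site points: three lie close together, one is isolated.
site_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
for subsite in group_site_points(site_points, cutoff=1.5):
    print(len(subsite), subsite)
```

The quadratic pair loop is adequate for a few hundred points; a spatial index would be the usual choice for larger sets.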

The transient receptor potential vanilloid 4 (TRPV4) channel is a widely expressed non-selective cation channel involved in various pathological conditions. Despite the availability of several TRPV4 inhibitors, the binding pocket of TRPV4 and the mechanism of action were not well understood. Doñate-Macian and co-workers used SiteMap to search for and assess the binding pocket of one of the known TRPV4 inhibitors, HC067047, based on the crystal structure of Xenopus TRPV4 (Figure 5). This group also further characterised the binding pocket and inhibitor–protein binding interactions with the aid of molecular docking, molecular dynamics and mutagenesis studies. The information was then employed to run a structure-based virtual screening to discover novel TRPV4 inhibitors ^[131].

Figure 5. Binding site of TRPV4 detected using SiteMap by Doñate-Macian et al. ^[131]. Yellow: hydrophobic region; blue: H-bond donor region; red: H-bond acceptor region; white sphere: site point.

2.2.2. Ligand Flexibility

Ligand structures for virtual screening can be obtained from small molecule databases, which are free (e.g., ZINC ^[132], DrugBank ^[133] and PubChem ^[134]) or commercial (e.g., Maybridge, ChemBridge and Enamine). Conformational sampling of ligands can be performed in several ways. Systematic search generates all possible ligand conformations by exploring all degrees of freedom of the ligand ^[135]. Carrying out a systematic search using a brute-force approach (exhaustive search) can easily overwhelm the available computing power, especially for molecules with many rotatable bonds, and therefore rule-based methods have been the more favoured approaches in recent years. Rule-based methods, such as the incremental construction algorithm (also known as the anchor-and-grow method), generate conformations based on known structural preferences of compounds, thereby limiting the conformational space that is explored. Usually, a knowledge base of allowed torsion angles and ring conformations (e.g., data from the PDB), and possibly a library of 3D fragment conformations, is used to guide the sampling ^[136]^[137]. These methods break the molecule into fragments that are docked into different regions of the receptor. The fragments are then reassembled to construct a molecule in a low-energy conformation.
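The reason an exhaustive search quickly becomes impractical is that the number of torsion combinations grows exponentially with the number of rotatable bonds. The following sketch makes this concrete; the 30° step and the helper names `torsion_grid` and `n_conformers` are assumptions chosen for illustration, and the step is assumed to divide 360 evenly.

```python
from itertools import product


def torsion_grid(n_rotatable, step_deg=30):
    """Lazily enumerate every combination of torsion angles on a regular grid."""
    angles = range(0, 360, step_deg)
    return product(angles, repeat=n_rotatable)


def n_conformers(n_rotatable, step_deg=30):
    """Number of grid conformers: (360 / step) ** n_rotatable."""
    return (360 // step_deg) ** n_rotatable


# Growth of the search space with the number of rotatable bonds.
for n in (2, 5, 10):
    print(n, "rotatable bonds ->", n_conformers(n), "conformers")

# For small ligands the grid can still be enumerated explicitly.
first_three = [combo for _, combo in zip(range(3), torsion_grid(2))]
print(first_three)
```

With a 30° step each bond contributes 12 states, so ten rotatable bonds already give 12^10 (about 6 × 10^10) combinations before any energy evaluation, which is why knowledge-based pruning is attractive.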

Conformer generator OMEGA ^[138] employs a prebuilt library of fragments as well as a knowledge base of torsion angles to generate a large set of conformations, which are sampled by geometric and energy criteria to eliminate conformers with internal clashes. Likewise, ConfGen ^[139] divides ligands into a core region and peripheral rotamer groups. The core conformation is first generated using a template library, followed by the calculation of the potential energy of rotatable bonds with the torsional term of the OPLS force field, and lastly positioning peripheral groups in their lowest energy forms. To eliminate undesirable conformations or to limit the number of conformations, filtering approaches are applied. Conformations that are too similar are removed based on an energy filter, RMSD, and dihedral angles involving polar hydrogen atoms. Compact conformers are also removed by an empirically derived heuristic scoring method ^[94]^[139].

On the other hand, a stochastic search randomly changes the degrees of freedom of the ligand at each step and the change is either accepted or rejected according to a probabilistic criterion such as the Metropolis criterion ^[140]. Sampling of conformational space can be performed using different techniques in a stochastic search, including Monte Carlo (MC) sampling ^[62], distance geometry sampling ^[141] and genetic algorithm-based sampling ^[142]^[143]. Balloon ^[142], a free conformer generator, uses distance geometry to generate an initial conformer for a ligand, followed by a multi-objective genetic algorithm approach to modify torsion angles around rotatable bonds, stereochemistry of double bonds, chiral centres, and ring conformations. Some other tools that were developed for ligand preparation include Prepflow ^[144], VSPrep ^[145], Gypsum-DL ^[146], Frog2 ^[147] and UNICON ^[148].
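A stochastic search with the Metropolis acceptance rule can be sketched as follows. The energy function is a toy threefold torsional potential in arbitrary units, and the names `torsion_energy` and `metropolis_search`, the step size, the value of kT and the number of steps are all illustrative assumptions rather than settings from any of the cited programs.

```python
import math
import random


def torsion_energy(angles_deg):
    """Toy threefold torsional potential with minima at 60, 180 and 300 degrees."""
    return sum(1.0 + math.cos(math.radians(3 * a)) for a in angles_deg)


def metropolis_search(n_torsions, n_steps=5000, kT=0.6, max_move=30.0, seed=0):
    """Random torsion moves accepted or rejected with the Metropolis criterion."""
    rng = random.Random(seed)
    current = [rng.uniform(0, 360) for _ in range(n_torsions)]
    e_current = torsion_energy(current)
    best, e_best = list(current), e_current

    for _ in range(n_steps):
        # Perturb one randomly chosen torsion angle.
        trial = list(current)
        k = rng.randrange(n_torsions)
        trial[k] = (trial[k] + rng.uniform(-max_move, max_move)) % 360.0
        e_trial = torsion_energy(trial)

        # Downhill moves are always accepted; uphill moves are accepted
        # with probability exp(-delta / kT).
        delta = e_trial - e_current
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


best_angles, best_energy = metropolis_search(n_torsions=3)
print([round(a, 1) for a in best_angles], round(best_energy, 3))
```

Accepting some uphill moves is what allows the search to escape local minima; lowering kT over the course of the run would turn the same loop into simulated annealing.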

2.2.3. Protein Flexibility

Protein flexibility is essential for biological function, and subtle changes, such as side-chain rearrangements, can alter the size and shape of the binding site and thus bias docking results ^[149]. Methods to handle protein flexibility can be divided into four groups: soft docking ^[150]^[151], side-chain flexibility ^[152], molecular relaxation ^[153], and protein ensemble docking ^[154]^[155]. Soft docking allows small degrees of overlap between the protein and the ligand by softening the interatomic vdW interactions in docking calculations ^[151]. These are the simplest methods and are computationally efficient, but they can only account for small changes. Side-chain flexibility allows the sampling of side-chain conformations by varying their essential torsional degrees of freedom, while the protein backbone is kept fixed ^[156]. The molecular relaxation method involves both protein backbone flexibility and side-chain conformational changes; it first uses rigid-protein docking to place the ligand into the binding site and then relaxes the protein backbone and the nearby side-chain atoms, usually employing methods such as MC and MD ^[157]^[158]^[159]. Protein ensemble docking methods dock the ligand into a set of rigid protein structures with different conformations, which together represent a flexible receptor. The docking results for each conformation are then re-analysed ^[160].
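For ensemble docking, one simple way to re-analyse the results is to keep, for each ligand, the best score obtained across all receptor conformations and rank ligands on that value. The sketch below assumes that lower scores are better and uses invented ligand names, conformer labels and scores; other aggregation schemes (for example averaging) are equally possible.

```python
def rank_by_best_score(scores):
    """Rank ligands by their best (lowest) score over all receptor conformations.

    scores: {ligand: {conformation: docking_score}}, lower taken as better.
    Returns a list of (ligand, (best_conformation, best_score)) tuples.
    """
    best = {}
    for ligand, per_conformation in scores.items():
        conformation, score = min(per_conformation.items(), key=lambda kv: kv[1])
        best[ligand] = (conformation, score)
    return sorted(best.items(), key=lambda kv: kv[1][1])


# Hypothetical docking scores for two ligands against two receptor conformations.
ensemble_scores = {
    "lig_A": {"conf_1": -7.2, "conf_2": -8.9},
    "lig_B": {"conf_1": -8.1, "conf_2": -6.0},
}
for ligand, (conformation, score) in rank_by_best_score(ensemble_scores):
    print(ligand, conformation, score)
```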

Most contemporary docking approaches treat proteins with partial or complete flexibility. For instance, Schrödinger offers a range of docking methodologies with different treatments of protein flexibility. Glide ^[94]^[95], in its standard precision (SP) and extra precision (XP) modes, allows conformational flexibility for the ligands but treats the receptor as a rigid entity. It softens the active site via vdW scaling (soft docking), with the option of rotamer configuration sampling. A more rigorous method, Induced Fit Docking, uses Glide for docking to account for ligand flexibility and Prime ^[161]^[162] for side-chain optimisation to account for receptor flexibility ^[163]. The ligand is first docked into the receptor using Glide with vdW scaling, and flexible side-chains are temporarily mutated to alanine to reduce steric clashes and blocking of the binding site. Once the docking poses are generated, the mutated residues are restored to their original residues and Prime (a program for protein structure prediction) ^[161]^[162] is used to predict and reorient the side-chains around each ligand pose. The ligand–receptor complex is then minimised to afford a low-energy protein conformation, which is used for ligand resampling with Glide.

Water molecules have a crucial role in biological systems and interactions, such as stabilising protein–ligand complexes, mediating biomolecular recognition and participating in H-bond networks. Water molecules can participate in ligand–protein interactions by acting as bridging waters, and their displacement from the binding site upon ligand binding can also contribute to binding affinity, playing a significant role in the thermodynamics of protein–ligand binding ^[164]. The retention or removal of water molecules during virtual screening can have a direct impact on the size, shape and chemical properties of the binding site, which can influence binding geometries and affinity calculations.

Due to the ability of a water molecule to act as both an H-bond donor and acceptor, as well as its highly mobile nature, predicting the location and contribution of water molecules in protein–ligand binding is a challenging task. Crystal structures or cryo-EM structures of proteins can sometimes capture the placement of water molecules in the protein matrix, but the information is not always accurate due to the low resolution of the structural data, and the sample preparation conditions do not reflect the biological environment ^[165]^[166]^[167]^[168].

Many approaches have been developed to simulate and predict the behaviour of water molecules. Implicit models, also known as continuum models, treat water as a uniform and continuous medium. The free energy of solvation is traditionally estimated from three contributions: the free energy required to form the solute cavity, the vdW interactions and the electrostatic interactions between solute and solvent. This approach is less computationally demanding but neglects details at the solute–solvent interface ^[167]^[168]. Explicit models are computationally more expensive, but the molecular details of each water molecule are considered. Water molecules are normally described using a three-, four-, or five-point model.

In protein–ligand docking, water can be treated explicitly or with a hybrid approach that combines implicit and explicit treatment, and the available methods can be separated into four categories: (1) empirical and knowledge-based methods (e.g., Consolv ^[169] and WaterScore ^[170]), (2) statistical and molecular mechanics methods (e.g., GRID ^[171]^[172], 3D-RISM ^[173]^[174], SZMAP ^[175]), (3) MD simulation methods (e.g., WaterMap ^[176], GIST ^[177], SPAM ^[116]) and, lastly, (4) Monte Carlo simulation methods (e.g., JAWS ^[178]).

2.2.4. Scoring Functions

After searching for all possible binding modes, a scoring function is used to evaluate the quality of the docking poses. Scoring functions determine the binding mode and estimate binding affinity, which assists in identifying and ranking potential drug candidates. There are three main categories of scoring functions: force field-based, empirical-based, and knowledge-based methods.

Force field-based scoring functions generally use standard parameters taken from force fields, such as AMBER ^[179], and consider both the intramolecular energy of the ligand and the intermolecular energy of the protein–ligand complex ^[180]. The ΔG estimated using this type of scoring function is the sum of these energies, which is generally composed of vdW and electrostatic energy terms. An example of a program that uses this method is DOCK, which utilises the following equation ^[181]^[182]:

\Delta G = \sum_{i} \sum_{j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_{i} q_{j}}{\varepsilon(r_{ij}) \, r_{ij}} \right)

where r_{ij} is the distance between protein atom i and ligand atom j, A_{ij} and B_{ij} are the vdW components (repulsive and attractive, respectively), q_{i} and q_{j} are the atomic charges, and \varepsilon(r_{ij}) is the distance-dependent dielectric constant.
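The double sum in the force field equation above can be written out directly. The sketch below is a minimal illustration, not the DOCK implementation: it assumes a geometric-mean combining rule for the vdW parameters, a linear distance-dependent dielectric of the form ε(r) = 4r, and a `coulomb_k` factor of 1.0 so that the expression matches the equation as written (with charges in elementary units, distances in Å and energies in kcal/mol, a conversion factor of roughly 332 is commonly used). The `Atom` type and all parameter values are hypothetical.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class Atom(NamedTuple):
    xyz: Tuple[float, float, float]  # coordinates in angstroms
    charge: float                    # partial atomic charge
    a: float                         # per-atom repulsive parameter (assumed)
    b: float                         # per-atom attractive parameter (assumed)


def interaction_energy(protein: Sequence[Atom], ligand: Sequence[Atom],
                       dielectric_slope: float = 4.0,
                       coulomb_k: float = 1.0) -> float:
    """Sum 12-6 vdW and Coulomb terms over all protein-ligand atom pairs."""
    total = 0.0
    for p in protein:
        for lig in ligand:
            r = math.dist(p.xyz, lig.xyz)
            a_ij = p.a * lig.a              # assumed geometric-mean combination
            b_ij = p.b * lig.b
            eps = dielectric_slope * r      # assumed form of epsilon(r_ij)
            total += (a_ij / r ** 12
                      - b_ij / r ** 6
                      + coulomb_k * p.charge * lig.charge / (eps * r))
    return total


# Toy system: one protein atom and one ligand atom with opposite charges.
protein = [Atom((0.0, 0.0, 0.0), -0.5, 0.0, 0.0)]
ligand = [Atom((2.0, 0.0, 0.0), 0.5, 0.0, 0.0)]
print(interaction_energy(protein, ligand))
```

Production docking codes usually precompute such terms on a grid around the receptor, so that scoring a pose reduces to interpolating grid values at the ligand atom positions.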

Empirical-based functions estimate binding affinity based upon a set of weighted energy terms that are described in the following equation:

\Delta G = \sum_{i} W_{i} \cdot \Delta G_{i}

The energy terms (\Delta G_{i}) represent contributions such as vdW energy, electrostatic energy, hydrogen (H) bond interactions, desolvation, entropy and hydrophobicity, whereas the weighting factors (W_{i}) are determined via regression analysis by fitting the binding affinity data of a training set of protein–ligand complexes with known 3D structures ^[94]. The first empirical scoring function (SCORE) was developed by Böhm in 1994 ^[183] based upon a dataset of 45 protein–ligand complexes, and it considers four energy terms: hydrogen bonds, ionic interactions, the lipophilic protein–ligand contact surface and the number of rotatable bonds in the ligand. Over time, empirical scoring functions have evolved by expanding the data set and considering more energy terms. For example, ChemScore, developed by Eldridge et al. ^[184], also considers the contribution of metal atoms, and the Glide XP score includes terms to account for desolvation effects ^[94].
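As a minimal illustration of the weighted sum used by empirical scoring functions, the snippet below combines a handful of per-pose terms with fixed weights. The term names, their values and the weights are invented for this example; in practice the weights come from regression against measured affinities and differ between scoring functions.

```python
def empirical_score(terms, weights):
    """Return the weighted sum of energy terms, sum_i W_i * dG_i."""
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight defined for terms: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical weights (arbitrary units), standing in for regression output.
example_weights = {"hbond": -1.0, "ionic": -2.0, "lipophilic": -0.5, "rotatable": 0.5}

# Hypothetical term values for a single docking pose.
example_pose = {"hbond": 2.0, "ionic": 1.0, "lipophilic": 4.0, "rotatable": 4.0}

print(empirical_score(example_pose, example_weights))
```

Raising an error for a term without a weight, rather than silently skipping it, keeps a mismatch between the descriptor set and the fitted model from going unnoticed.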

In knowledge-based functions, structural information is extracted from experimentally determined structures of protein–ligand complexes from databases, such as the PDB ^[30] and the Cambridge Structural Database (CSD) ^[185]^[186]. The Boltzmann law is employed to transform the protein–ligand atom pair preferences into distance-dependent pairwise potentials, and the favourability of a given atom-pair contact is related to the frequency with which it is observed in known protein–ligand structures ^[187]^[188]. The potentials are calculated using the following equation:

w(r) = -k_{B} T \ln[g(r)], \quad g(r) = \rho(r) / \rho^{*}(r)

where w(r) is the pairwise potential between protein and ligand, k_{B} is the Boltzmann constant, T is the absolute temperature of the system, \rho(r) is the number density of the protein–ligand atom pair at distance r, and \rho^{*}(r) is the pair density in a reference state where the interatomic interactions are zero.
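The conversion from observed pair frequencies to a distance-dependent potential can be sketched as below. The example assumes that both distributions are histogrammed on the same distance bins, so that the ratio of normalised counts stands in for ρ(r)/ρ*(r); the counts themselves, the function name `pair_potential` and the choice of 298.15 K are illustrative assumptions. Bins with no observations are returned as None because the logarithm is undefined there.

```python
import math

K_B = 0.0019872041  # Boltzmann constant per mole, kcal/(mol K)


def pair_potential(observed_counts, reference_counts, temperature=298.15):
    """Return w(r) = -kB * T * ln(g(r)) for each distance bin."""
    n_obs = sum(observed_counts)
    n_ref = sum(reference_counts)
    potentials = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            potentials.append(None)  # g(r) is zero or undefined in this bin
            continue
        g = (obs / n_obs) / (ref / n_ref)
        potentials.append(-K_B * temperature * math.log(g))
    return potentials


# Hypothetical counts for one atom-pair type in three distance bins.
observed = [10, 40, 50]
reference = [30, 30, 40]
for w in pair_potential(observed, reference):
    print(None if w is None else round(w, 3))
```

Bins in which the pair is seen more often than in the reference state give negative (favourable) potentials, and under-represented bins give positive ones.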

Table 1. List of common docking programs.

Program	Ligand Flexibility	Receptor Flexibility	Scoring Functions	Examples of Application
Glide (HTVS, SP and XP) ^[94]^[95]^[189]	Exhaustive ligand conformation search	Soft docking	Empirical	Discovery of novel fibroblast growth factor receptor 1 kinase inhibitors ^[190] and CDK5 inhibitors ^[191]
GOLD ^[93]	Genetic algorithm	Soft docking; ensemble docking; side-chain flexibility	Goldscore (empirical); Chemscore (empirical); ChemPLP (empirical); ASP (knowledge-based)	Design of non-peptide MDM2 inhibitors ^[192]
AutoDock 4 ^[193]	Genetic algorithm; simulated annealing; local search; Lamarckian genetic algorithm	Side-chain flexibility	Semi-empirical free energy force field	Discovery of reversible NEDD8 activating enzyme inhibitor ^[194]
DOCK 6 ^[195]	Incremental construction algorithm	Rigid	Force field	Design and development of potent and selective dual BRD4/PLK1 inhibitors ^[196]
Internal Coordinates Mechanics (ICM) ^[197]	Stochastic search (MC)	Side-chain flexibility (rotamer libraries)	Force field	Discovery of novel retinoic acid receptor agonist ^[198] and enoyl-acyl carrier protein reductase inhibitors in Plasmodium falciparum ^[199]
Surflex ^[200]^[201]	Incremental construction algorithm	Ensemble docking	Empirical	Discovery of novel inhibitors of Leishmania donovani γ-glutamylcysteine synthetase ^[202]
MOE ^[99]^[203]^[204]^[205]	Systematic (exhaustive); stochastic; high-throughput Conformational Import (incremental construction + stochastic) ^[99]	Rigid	ASE (empirical); Affinity dG (empirical); Alpha HB (empirical); GBVI/WSA (force field)	Identification of novel monoamine oxidase B inhibitors ^[206] and Chk1 inhibitors ^[207]
FlexX ^[208]^[209]	Incremental construction algorithm	Rigid	Empirical	Identification of PKB inhibitors ^[210] and phosphodiesterase 4 inhibitors ^[211]
FRED ^[212]^[213]	Systematic (exhaustive) search, precomputed using OMEGA (using torsion and ring libraries) ^[138]	Rigid	Chemgauss 3 (empirical); Chemgauss 4 (empirical)	Discovery of selective butyrylcholinesterase inhibitors ^[214]

Abbreviations: ASP: Astex Statistical Potential; BRD4: Bromodomain 4; CDK5: Cyclin dependent kinase 5; ChemPLP: Piecewise Linear Potential; HTVS: high throughput virtual screening; MDM2: Mouse double minute 2 homolog; PKB: Protein kinase B; PLK1: Polo-like Kinase 1.

References

(FDA), U.S.F.D.A. The Drug Development Process. Available online: https://www.fda.gov/patients/learn-about-drug-and-device-approvals/drug-development-process (accessed on 2 February 2022).
Wouters, O.J.; McKee, M.; Luyten, J. Estimated research and development investment needed to bring a new medicine to market, 2009–2018. Jama 2020, 323, 844–853.
Wong, C.H.; Siah, K.W.; Lo, A.W. Estimation of clinical trial success rates and related parameters. Biostatistics 2019, 20, 273–286.
DiMasi, J.A.; Grabowski, H.G.; Hansen, R.W. Innovation in the pharmaceutical industry: New estimates of R&D costs. J. Health Econ. 2016, 47, 20–33.
DiMasi, J.A.; Hansen, R.W.; Grabowski, H.G. The price of innovation: New estimates of drug development costs. J. Health Econ. 2003, 22, 151–185.
Paul, S.M.; Mytelka, D.S.; Dunwiddie, C.T.; Persinger, C.C.; Munos, B.H.; Lindborg, S.R.; Schacht, A.L. How to improve R&D productivity: The pharmaceutical industry’s grand challenge. Nat. Rev. Drug Discov. 2010, 9, 203–214.
Yang, Y.; Adelstein, S.J.; Kassis, A.I. Target discovery from data mining approaches. Drug Discov. Today 2009, 14, 147–154.
Moffat, J.G.; Rudolph, J.; Bailey, D. Phenotypic screening in cancer drug discovery—Past, present and future. Nat. Rev. Drug Discov. 2014, 13, 588–602.
Hart, C.P. Finding the target after screening the phenotype. Drug Discov. Today 2005, 10, 513–519.
Xia, X. Bioinformatics and drug discovery. Curr. Top. Med. Chem. 2017, 17, 1709–1726.
Morgan, P.; Brown, D.G.; Lennard, S.; Anderton, M.J.; Barrett, J.C.; Eriksson, U.; Fidock, M.; Hamren, B.; Johnson, A.; March, R.E. Impact of a five-dimensional framework on R&D productivity at AstraZeneca. Nat. Rev. Drug Discov. 2018, 17, 167–181.
Morgan, P.; Van Der Graaf, P.H.; Arrowsmith, J.; Feltner, D.E.; Drummond, K.S.; Wegner, C.D.; Street, S.D.A. Can the flow of medicines be improved? Fundamental pharmacokinetic and pharmacological principles toward improving Phase II survival. Drug Discov. Today 2012, 17, 419–424.
Maple, H.J.; Garlish, R.A.; Rigau-Roca, L.; Porter, J.; Whitcombe, I.; Prosser, C.E.; Kennedy, J.; Henry, A.J.; Taylor, R.J.; Crump, M.P. Automated protein–ligand interaction screening by mass spectrometry. J. Med. Chem. 2012, 55, 837–851.
Dalvit, C. NMR methods in fragment screening: Theory and a comparison with other biophysical techniques. Drug Discov. Today 2009, 14, 1051–1057.
O’Reilly, M.; Cleasby, A.; Davies, T.G.; Hall, R.J.; Ludlow, R.F.; Murray, C.W.; Tisi, D.; Jhoti, H. Crystallographic screening using ultra-low-molecular-weight ligands to guide drug design. Drug Discov. Today 2019, 24, 1081–1086.
Shuker, S.B.; Hajduk, P.J.; Meadows, R.P.; Fesik, S.W. Discovering high-affinity ligands for proteins: SAR by NMR. Science 1996, 274, 1531–1534.
Madsen, D.; Azevedo, C.; Micco, I.; Petersen, L.K.; Hansen, N.J.V. Chapter Four—An overview of DNA-encoded libraries: A versatile tool for drug discovery. In Progress in Medicinal Chemistry; Witty, D.R., Cox, B., Eds.; Elsevier: Amsterdam, The Netherlands, 2020; Volume 59, pp. 181–249.
Macarron, R.; Banks, M.N.; Bojanic, D.; Burns, D.J.; Cirovic, D.A.; Garyantes, T.; Green, D.V.S.; Hertzberg, R.P.; Janzen, W.P.; Paslay, J.W.; et al. Impact of high-throughput screening in biomedical research. Nat. Rev. Drug Discov. 2011, 10, 188–195.
Shoichet, B.K. Virtual screening of chemical libraries. Nature 2004, 432, 862–865.
Kola, I.; Landis, J. Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discov. 2004, 3, 711–716.
Bajorath, J. Integration of virtual and high-throughput screening. Nat. Rev. Drug Discov. 2002, 1, 882–894.
Karplus, M.; Petsko, G.A. Molecular dynamics simulations in biology. Nature 1990, 347, 631–639.
Shoichet, B.K.; McGovern, S.L.; Wei, B.; Irwin, J.J. Lead discovery using molecular docking. Curr. Opin. Chem. Biol. 2002, 6, 439–446.
Chen, Y.; Shoichet, B.K. Molecular docking and ligand specificity in fragment-based inhibitor discovery. Nat. Chem. Biol. 2009, 5, 358–364.
Schneider, G.; Fechner, U. Computer-based de novo design of drug-like molecules. Nat. Rev. Drug Discov. 2005, 4, 649–663.
EMBL-EBI UniProtKB/TrEMBL Protein Database Release 2022_02 Statistics. Available online: https://www.ebi.ac.uk/uniprot/TrEMBLstats (accessed on 27 July 2022).
Bank, R.P.D. PDB Statistics: Overall Growth of Released Structures Per Year. Available online: https://www.rcsb.org/stats/growth/growth-released-structures (accessed on 27 July 2022).
Consortium, U. UniProt: A hub for protein information. Nucleic Acids Res. 2015, 43, D204–D212.
Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402.
Bernstein, F.C.; Koetzle, T.F.; Williams, G.J.; Meyer Jr, E.F.; Brice, M.D.; Rodgers, J.R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. The Protein Data Bank: A computer-based archival file for macromolecular structures. Eur. J. Biochem. 1977, 80, 319–324.
Sánchez, R.; Šali, A. Comparative protein structure modeling as an optimization problem. J. Mol. Struct. THEOCHEM 1997, 398–399, 489–496.
Madeira, F.; Park, Y.M.; Lee, J.; Buso, N.; Gur, T.; Madhusoodanan, N.; Basutkar, P.; Tivey, A.R.; Potter, S.C.; Finn, R.D. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res. 2019, 47, W636–W641.
Needleman, S.B.; Wunsch, C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 1970, 48, 443–453.
Bellman, R. Dynamic programming. Science 1966, 153, 34–37.
Smith, T.F.; Waterman, M.S. Identification of common molecular subsequences. J. Mol. Biol. 1981, 147, 195–197.
Thompson, J.D.; Gibson, T.J.; Higgins, D.G. Multiple sequence alignment using ClustalW and ClustalX. Curr. Protoc. Bioinform. 2003, 1, 2–3.
Notredame, C.; Higgins, D.G.; Heringa, J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 2000, 302, 205–217.
Edgar, R.C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32, 1792–1797.
Pei, J.; Kim, B.H.; Grishin, N.V. PROMALS3D: A tool for multiple protein sequence and structure alignments. Nucleic Acids Res. 2008, 36, 2295–2300.
Garibsingh, R.-A.A.; Otte, N.J.; Ndaru, E.; Colas, C.; Grewer, C.; Holst, J.; Schlessinger, A. Homology Modeling Informs Ligand Discovery for the Glutamine Transporter ASCT2. Front. Chem. 2018, 6, 279.
Choi, Y.; Deane, C.M. FREAD revisited: Accurate loop structure prediction using a database search algorithm. Proteins Struct. Funct. Bioinform. 2010, 78, 1431–1440.
Yang, M.; Simon, R.; MacKerell Jr, A.D. Conformational preference of serogroup B Salmonella O polysaccharide in presence and absence of the monoclonal antibody Se155–4. J. Phys. Chem. B 2017, 121, 3412–3423.
Stein, A.; Kortemme, T. Improvements to robotics-inspired conformational sampling in rosetta. PLoS ONE 2013, 8, e63090.
Guaitoli, G.; Raimondi, F.; Gilsbach, B.K.; Gómez-Llorente, Y.; Deyaert, E.; Renzi, F.; Li, X.; Schaffner, A.; Jagtap, P.K.A.; Boldt, K. Structural model of the dimeric Parkinson’s protein LRRK2 reveals a compact architecture involving distant interdomain contacts. Proc. Natl. Acad. Sci. USA 2016, 113, E4357–E4366.
Wang, Q.; Canutescu, A.A.; Dunbrack Jr, R.L. SCWRL and MolIDE: Computer programs for side-chain conformation prediction and homology modeling. Nat. Protoc. 2008, 3, 1832.
Moro, S.; Deflorian, F.; Bacilieri, M.; Spalluto, G. Ligand-based homology modeling as attractive tool to inspect GPCR structural plasticity. Curr. Pharm. Des. 2006, 12, 2175–2185.
Gacasan, S.B.; Baker, D.L.; Parrill, A.L. G protein-coupled receptors: The evolution of structural insight. AIMS Biophys. 2017, 4, 491.
Rodríguez, D.; Ranganathan, A.; Carlsson, J. Strategies for improved modeling of GPCR-drug complexes: Blind predictions of serotonin receptors bound to ergotamine. J. Chem. Inf. Model. 2014, 54, 2004–2021.
Kołaczkowski, M.; Bucki, A.; Feder, M.; Pawłowski, M. Ligand-optimized homology models of D1 and D2 dopamine receptors: Application for virtual screening. J. Chem. Inf. Model. 2013, 53, 638–648.
Cichero, E.; Menozzi, G.; Guariento, S.; Fossa, P. Ligand-based homology modelling of the human CB2 receptor SR144528 antagonist binding site: A computational approach to explore the 1, 5-diaryl pyrazole scaffold. MedChemComm 2015, 6, 1978–1986.
Evers, A.; Klebe, G. Successful virtual screening for a submicromolar antagonist of the neurokinin-1 receptor based on a ligand-supported homology model. J. Med. Chem. 2004, 47, 5381–5392.
Freyd, T.; Warszycki, D.; Mordalski, S.; Bojarski, A.J.; Sylte, I.; Gabrielsen, M. Ligand-guided homology modelling of the GABAB2 subunit of the GABAB receptor. PLoS ONE 2017, 12, e0173889.
Schaller, D.; Hagenow, S.; Stark, H.; Wolber, G. Ligand-guided homology modeling drives identification of novel histamine H3 receptor ligands. PLoS ONE 2019, 14, e0218820.
Hameduh, T.; Haddad, Y.; Adam, V.; Heger, Z. Homology modeling in the time of collective and artificial intelligence. Comput. Struct. Biotechnol. J. 2020, 18, 3494–3506.
Bonneau, R.; Strauss, C.E.; Rohl, C.A.; Chivian, D.; Bradley, P.; Malmström, L.; Robertson, T.; Baker, D. De novo prediction of three-dimensional structures for major protein families. J. Mol. Biol. 2002, 322, 65–78.
Goodsell, D.S.; Olson, A.J. Structural Symmetry and Protein Function. Annu. Rev. Biophys. Biomol. Struct. 2000, 29, 105–153.
Anfinsen, C.B. Principles that govern the folding of protein chains. Science 1973, 181, 223–230.
Klepeis, J.L.; Floudas, C.A. ASTRO-FOLD: A Combinatorial and Global Optimization Framework for Ab Initio Prediction of Three-Dimensional Structures of Proteins from the Amino Acid Sequence. Biophys. J. 2003, 85, 2119–2146.
Subramani, A.; Wei, Y.; Floudas, C.A. ASTRO-FOLD 2.0: An Enhanced Framework for Protein Structure Prediction. AIChE J 2012, 58, 1619–1637.
Ołdziej, S.; Czaplewski, C.; Liwo, A.; Chinchio, M.; Nanias, M.; Vila, J.; Khalili, M.; Arnautova, Y.; Jagielska, A.; Makowski, M.O. Physics-based protein-structure prediction using a hierarchical protocol based on the UNRES force field: Assessment in two blind tests. Proc. Natl. Acad. Sci. USA 2005, 102, 7547–7552.
Bowie, J.U.; Eisenberg, D. An evolutionary approach to folding small alpha-helical proteins that uses sequence information and an empirical guiding fitness function. Proc. Natl. Acad. Sci. USA 1994, 91, 4436–4440.
Hart, T.N.; Read, R.J. A multiple-start Monte Carlo docking method. Proteins Struct. Funct. Bioinform. 1992, 13, 206–222.
Shim, J.; MacKerell Jr, A.D. Computational ligand-based rational design: Role of conformational sampling and force fields in model development. Medchemcomm 2011, 2, 356–370.
Alford, R.F.; Leaver-Fay, A.; Jeliazkov, J.R.; O′Meara, M.J.; DiMaio, F.P.; Park, H.; Shapovalov, M.V.; Renfrew, P.D.; Mulligan, V.K.; Kappel, K.; et al. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theory Comput. 2017, 13, 3031–3048.
Roy, A.; Kucukural, A.; Zhang, Y. I-TASSER: A unified platform for automated protein structure and function prediction. Nat. Protoc. 2010, 5, 725–738.
Xu, D.; Zhang, J.; Roy, A.; Zhang, Y. Automated protein structure modeling in CASP9 by I-TASSER pipeline combined with QUARK-based ab initio folding and FG-MD-based structure refinement. Proteins Struct. Funct. Bioinform. 2011, 79, 147–160.
Mistry, J.; Chuguransky, S.; Williams, L.; Qureshi, M.; Salazar Gustavo, A.; Sonnhammer, E.L.L.; Tosatto, S.C.E.; Paladin, L.; Raj, S.; Richardson, L.J.; et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 2021, 49, D412–D419.
Marks, D.S.; Colwell, L.J.; Sheridan, R.; Hopf, T.A.; Pagnani, A.; Zecchina, R.; Sander, C. Protein 3D structure computed from evolutionary sequence variation. PLoS ONE 2011, 6, e28766.
Tetchner, S.; Kosciolek, T.; Jones, D.T. Opportunities and limitations in applying coevolution-derived contacts to protein structure prediction. Bio-Algorithms Med. Syst. 2014, 10, 243–254.
Xu, J. Distance-based protein folding powered by deep learning. Proc. Natl. Acad. Sci. USA 2019, 116, 16856–16865.
Uziela, K.; Menéndez Hurtado, D.; Shu, N.; Wallner, B.; Elofsson, A. ProQ3D: Improved model quality assessments using deep learning. Bioinformatics 2017, 33, 1578–1580.
Zheng, W.; Li, Y.; Zhang, C.; Zhou, X.; Pearce, R.; Bell, E.W.; Huang, X.; Zhang, Y. Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14. Proteins Struct. Funct. Bioinform. 2021, 89, 1734–1751.
Du, Z.; Su, H.; Wang, W.; Ye, L.; Wei, H.; Peng, Z.; Anishchenko, I.; Baker, D.; Yang, J. The trRosetta server for fast and accurate protein structure prediction. Nat. Protoc. 2021, 16, 5634–5651.
Baek, M.; DiMaio, F.; Anishchenko, I.; Dauparas, J.; Ovchinnikov, S.; Lee, G.R.; Wang, J.; Cong, Q.; Kinch, L.N.; Schaeffer, R.D. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871–876.
Senior, A.W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T.; Qin, C.; Žídek, A.; Nelson, A.W.; Bridgland, A. Protein structure prediction using multiple deep neural networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13). Proteins Struct. Funct. Bioinform. 2019, 87, 1141–1148.
Zemla, A. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003, 31, 3370–3374.
Antoniak, A.; Biskupek, I.; Bojarski, K.K.; Czaplewski, C.; Giełdoń, A.; Kogut, M.; Kogut, M.M.; Krupa, P.; Lipska, A.G.; Liwo, A. Modeling protein structures with the coarse-grained UNRES force field in the CASP14 experiment. J. Mol. Graph. Model. 2021, 108, 108008.
Varadi, M.; Anyango, S.; Deshpande, M.; Nair, S.; Natassia, C.; Yordanova, G.; Yuan, D.; Stroe, O.; Wood, G.; Laydon, A. AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022, 50, D439–D444.
Hooft, R.W.; Vriend, G.; Sander, C.; Abola, E.E. Errors in protein structures. Nature 1996, 381, 272.
Ramachandran, G.N.; Sasisekharan, V. Conformation of polypeptides and proteins. In Advances in Protein Chemistry; Elsevier: Amsterdam, The Netherlands, 1968; Volume 23, pp. 283–437.
Eisenberg, D.; Lüthy, R.; Bowie, J.U. VERIFY3D: Assessment of protein models with three-dimensional profiles. In Methods in Enzymology; Elsevier: Amsterdam, The Netherlands, 1997; Volume 277, pp. 396–404.
Chen, V.B.; Arendall, W.B.; Headd, J.J.; Keedy, D.A.; Immormino, R.M.; Kapral, G.J.; Murray, L.W.; Richardson, J.S.; Richardson, D.C. MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallogr. Sect. D Biol. Crystallogr. 2010, 66, 12–21.
Williams, C.J.; Headd, J.J.; Moriarty, N.W.; Prisant, M.G.; Videau, L.L.; Deis, L.N.; Verma, V.; Keedy, D.A.; Hintze, B.J.; Chen, V.B. MolProbity: More and better reference data for improved all-atom structure validation. Protein Sci. 2018, 27, 293–315.
Weichenberger, C.X.; Sippl, M.J. NQ-Flipper: Recognition and correction of erroneous asparagine and glutamine side-chain rotamers in protein structures. Nucleic Acids Res. 2007, 35 (Suppl. S2), W403–W406.
Rochira, W.; Agirre, J. Iris: Interactive all-in-one graphical validation of 3D protein model iterations. Protein Sci. 2021, 30, 93–107.
Bienert, S.; Waterhouse, A.; De Beer, T.A.; Tauriello, G.; Studer, G.; Bordoli, L.; Schwede, T. The SWISS-MODEL Repository—New features and functionality. Nucleic Acids Res. 2017, 45, D313–D319.
Bond, P.S.; Wilson, K.S.; Cowtan, K.D. Predicting protein model correctness in Coot using machine learning. Acta Crystallogr. Sect. D Struct. Biol. 2020, 76, 713–723.
Emsley, P.; Lohkamp, B.; Scott, W.G.; Cowtan, K. Features and development of Coot. Acta Crystallogr. Sect. D Biol. Crystallogr. 2010, 66, 486–501.
Emsley, P.; Cowtan, K. Coot: Model-building tools for molecular graphics. Acta Crystallogr. Sect. D Biol. Crystallogr. 2004, 60, 2126–2132.
O’Reilly, F.J.; Rappsilber, J. Cross-linking mass spectrometry: Methods and applications in structural, molecular and systems biology. Nat. Struct. Mol. Biol. 2018, 25, 1000–1008.
Liu, Y.L.; Lindert, S.; Zhu, W.; Wang, K.; McCammon, J.A.; Oldfield, E. Taxodione and arenarone inhibit farnesyl diphosphate synthase by binding to the isopentenyl diphosphate site. Proc. Natl. Acad. Sci. USA 2014, 111, E2530–E2539.
Trott, O.; Olson, A.J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010, 31, 455–461.
Verdonk, M.L.; Cole, J.C.; Hartshorn, M.J.; Murray, C.W.; Taylor, R.D. Improved protein–ligand docking using GOLD. Proteins Struct. Funct. Bioinform. 2003, 52, 609–623.
Friesner, R.A.; Banks, J.L.; Murphy, R.B.; Halgren, T.A.; Klicic, J.J.; Mainz, D.T.; Repasky, M.P.; Knoll, E.H.; Shelley, M.; Perry, J.K. Glide: A new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 2004, 47, 1739–1749.
Halgren, T.A.; Murphy, R.B.; Friesner, R.A.; Beard, H.S.; Frye, L.L.; Pollard, W.T.; Banks, J.L. Glide: A new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J. Med. Chem. 2004, 47, 1750–1759.
Grosdidier, A.; Zoete, V.; Michielin, O. SwissDock, a protein-small molecule docking web service based on EADock DSS. Nucleic Acids Res. 2011, 39 (Suppl. S2), W270–W277.
Santos, K.B.; Guedes, I.A.; Karl, A.L.; Dardenne, L.E. Highly flexible ligand docking: Benchmarking of the DockThor program on the LEADS-PEP protein–peptide data set. J. Chem. Inf. Model. 2020, 60, 667–683.
Liu, Y.; Grimm, M.; Dai, W.-t.; Hou, M.-c.; Xiao, Z.-X.; Cao, Y. CB-Dock: A web server for cavity detection-guided protein–ligand blind docking. Acta Pharmacol. Sin. 2020, 41, 138–144.
Chemical Computing Group Inc. Molecular Operating Environment (MOE); Chemical Computing Group Inc.: Montreal, QC, Canada, 2022.
Sastry, G.M.; Adzhigirey, M.; Day, T.; Annabhimoju, R.; Sherman, W. Protein and ligand preparation: Parameters, protocols, and influence on virtual screening enrichments. J. Comput. Aided Mol. Des. 2013, 27, 221–234.
Lengauer, T.; Rarey, M. Computational methods for biomolecular docking. Curr. Opin. Struct. Biol. 1996, 6, 402–406.
Fischer, E. Einfluss der Configuration auf die Wirkung der Enzyme. Ber. Der Dtsch. Chem. Ges. 1894, 27, 2985–2993.
López, G.; Valencia, A.; Tress, M.L. firestar—Prediction of functionally important residues using structural templates and alignment reliability. Nucleic Acids Res. 2007, 35 (Suppl. S2), W573–W577.
Wass, M.N.; Kelley, L.A.; Sternberg, M.J. 3DLigandSite: Predicting ligand-binding sites using similar structures. Nucleic Acids Res. 2010, 38 (Suppl. S2), W469–W473.
Toti, D.; Viet Hung, L.; Tortosa, V.; Brandi, V.; Polticelli, F. LIBRA-WA: A web application for ligand binding site detection and protein function recognition. Bioinformatics 2018, 34, 878–880.
Viet Hung, L.; Caprari, S.; Bizai, M.; Toti, D.; Polticelli, F. Libra: Ligand binding site recognition application. Bioinformatics 2015, 31, 4020–4022.
Laskowski, R.A. SURFNET: A program for visualizing molecular surfaces, cavities, and intermolecular interactions. J. Mol. Graph. 1995, 13, 323–330.
Halgren, T. New method for fast and accurate binding-site identification and analysis. Chem. Biol. Drug Des. 2007, 69, 146–148.
Halgren, T.A. Identifying and characterizing binding sites and assessing druggability. J. Chem. Inf. Model. 2009, 49, 377–389.
Brenke, R.; Kozakov, D.; Chuang, G.-Y.; Beglov, D.; Hall, D.; Landon, M.R.; Mattos, C.; Vajda, S. Fragment-based identification of druggable ‘hot spots’ of proteins using Fourier domain correlation techniques. Bioinformatics 2009, 25, 621–627.
Laurie, A.T.; Jackson, R.M. Q-SiteFinder: An energy-based method for the prediction of protein–ligand binding sites. Bioinformatics 2005, 21, 1908–1916.
Capra, J.A.; Laskowski, R.A.; Thornton, J.M.; Singh, M.; Funkhouser, T.A. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput. Biol. 2009, 5, e1000585.
Lu, C.; Liu, Z.; Zhang, E.; He, F.; Ma, Z.; Wang, H. MPLs-Pred: Predicting membrane protein-ligand binding sites using hybrid sequence-based features and ligand-specific models. Int. J. Mol. Sci. 2019, 20, 3120.
Jiménez, J.; Doerr, S.; Martínez-Rosell, G.; Rose, A.S.; De Fabritiis, G. DeepSite: Protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics 2017, 33, 3036–3042.
Stepniewska-Dziubinska, M.M.; Zielenkiewicz, P.; Siedlecki, P. Improving detection of protein-ligand binding sites with 3D segmentation. Sci. Rep. 2020, 10, 5035.
Cui, Y.; Dong, Q.; Hong, D.; Wang, X. Predicting protein-ligand binding residues with deep convolutional neural networks. BMC Bioinform. 2019, 20, 93.
Vajda, S.; Beglov, D.; Wakefield, A.E.; Egbert, M.; Whitty, A. Cryptic binding sites on proteins: Definition, detection, and druggability. Curr. Opin. Chem. Biol. 2018, 44, 1–8.
Cimermancic, P.; Weinkam, P.; Rettenmaier, T.J.; Bichmann, L.; Keedy, D.A.; Woldeyes, R.A.; Schneidman-Duhovny, D.; Demerdash, O.N.; Mitchell, J.C.; Wells, J.A. CryptoSite: Expanding the druggable proteome by characterization and prediction of cryptic binding sites. J. Mol. Biol. 2016, 428, 709–719.
Cheng, A.C.; Coleman, R.G.; Smyth, K.T.; Cao, Q.; Soulard, P.; Caffrey, D.R.; Salzberg, A.C.; Huang, E.S. Structure-based maximal affinity model predicts small-molecule druggability. Nat. Biotechnol. 2007, 25, 71–75.
Finan, C.; Gaulton, A.; Kruger, F.A.; Lumbers, R.T.; Shah, T.; Engmann, J.; Galver, L.; Kelley, R.; Karlsson, A.; Santos, R. The druggable genome and support for target identification and validation in drug development. Sci. Transl. Med. 2017, 9, eaag1166.
Liao, J.; Wang, Q.; Wu, F.; Huang, Z. In Silico Methods for Identification of Potential Active Sites of Therapeutic Targets. Molecules 2022, 27, 7103.
Schmidtke, P.; Barril, X. Understanding and predicting druggability. A high-throughput method for detection of drug binding sites. J. Med. Chem. 2010, 53, 5858–5867.
Sheridan, R.P.; Maiorov, V.N.; Holloway, M.K.; Cornell, W.D.; Gao, Y.-D. Drug-like density: A method of quantifying the “bindability” of a protein target based on a very large set of pockets and drug-like ligands from the Protein Data Bank. J. Chem. Inf. Model. 2010, 50, 2029–2040.
Krasowski, A.; Muthas, D.; Sarkar, A.; Schmitt, S.; Brenk, R. DrugPred: A structure-based approach to predict protein druggability developed using an extensive nonredundant data set. J. Chem. Inf. Model. 2011, 51, 2829–2842.
Volkamer, A.; Kuhn, D.; Rippmann, F.; Rarey, M. DoGSiteScorer: A web server for automatic binding site prediction, analysis and druggability assessment. Bioinformatics 2012, 28, 2074–2075.
Ngan, C.H.; Bohnuud, T.; Mottarella, S.E.; Beglov, D.; Villar, E.A.; Hall, D.R.; Kozakov, D.; Vajda, S. FTMAP: Extended protein mapping with user-selected probe molecules. Nucleic Acids Res. 2012, 40, W271–W275.
Borrel, A.; Regad, L.; Xhaard, H.; Petitjean, M.; Camproux, A.-C. PockDrug: A model for predicting pocket druggability that overcomes pocket estimation uncertainties. J. Chem. Inf. Model. 2015, 55, 882–895.
Volkamer, A.; Griewel, A.; Grombacher, T.; Rarey, M. Analyzing the topology of active sites: On the prediction of pockets and subpockets. J. Chem. Inf. Model. 2010, 50, 2041–2052.
Volkamer, A.; Kuhn, D.; Grombacher, T.; Rippmann, F.; Rarey, M. Combining global and local measures for structure-based druggability predictions. J. Chem. Inf. Model. 2012, 52, 360–372.
Michel, M.; Homan, E.J.; Wiita, E.; Pedersen, K.; Almlöf, I.; Gustavsson, A.-L.; Lundbäck, T.; Helleday, T.; Warpman Berglund, U. In silico druggability assessment of the NUDIX hydrolase protein family as a workflow for target prioritization. Front. Chem. 2020, 8, 443.
Doñate-Macian, P.; Duarte, Y.; Rubio-Moscardo, F.; Pérez-Vilaró, G.; Canan, J.; Díez, J.; González-Nilo, F.; Valverde, M.A. Structural determinants of TRPV4 inhibition and identification of new antagonists with antiviral activity. Br. J. Pharmacol. 2022, 179, 3576–3591.
Irwin, J.J.; Tang, K.G.; Young, J.; Dandarchuluun, C.; Wong, B.R.; Khurelbaatar, M.; Moroz, Y.S.; Mayfield, J.; Sayle, R.A. ZINC20—A free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 2020, 60, 6065–6073.
Wishart, D.S.; Feunang, Y.D.; Guo, A.C.; Lo, E.J.; Marcu, A.; Grant, J.R.; Sajed, T.; Johnson, D.; Li, C.; Sayeeda, Z. DrugBank 5.0: A major update to the DrugBank database for 2018. Nucleic Acids Res. 2018, 46, D1074–D1082.
Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B.A.; Thiessen, P.A.; Yu, B. PubChem in 2021: New data content and improved web interfaces. Nucleic Acids Res. 2021, 49, D1388–D1395.
Beusen, D.D.; Shands, E.B.; Karasek, S.; Marshall, G.R.; Dammkoehler, R.A. Systematic search in conformational analysis. J. Mol. Struct. THEOCHEM 1996, 370, 157–171.
Smellie, A.; Stanton, R.; Henne, R.; Teig, S. Conformational analysis by intersection: CONAN. J. Comput. Chem. 2003, 24, 10–20.
Hawkins, P.C.D. Conformation Generation: The State of the Art. J. Chem. Inf. Model. 2017, 57, 1747–1756.
Hawkins, P.C.; Skillman, A.G.; Warren, G.L.; Ellingson, B.A.; Stahl, M.T. Conformer generation with OMEGA: Algorithm and validation using high quality structures from the Protein Databank and Cambridge Structural Database. J. Chem. Inf. Model. 2010, 50, 572–584.
Watts, K.S.; Dalal, P.; Murphy, R.B.; Sherman, W.; Friesner, R.A.; Shelley, J.C. ConfGen: A Conformational Search Method for Efficient Generation of Bioactive Conformers. J. Chem. Inf. Model. 2010, 50, 534–546.
Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 1953, 21, 1087–1092.
Spellmeyer, D.C.; Wong, A.K.; Bower, M.J.; Blaney, J.M. Conformational analysis using distance geometry methods. J. Mol. Graph. Model. 1997, 15, 18–36.
Vainio, M.J.; Johnson, M.S. Generating conformer ensembles using a multiobjective genetic algorithm. J. Chem. Inf. Model. 2007, 47, 2462–2474.
Jones, G.; Willett, P.; Glen, R.C.; Leach, A.R.; Taylor, R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 1997, 267, 727–748.
Sisquellas, M.; Cecchini, M. PrepFlow: A Toolkit for Chemical Library Preparation and Management for Virtual Screening. Mol. Inform. 2021, 40, 2100139.
Gally, J.-M.; Bourg, S.; Fogha, J.; Do, Q.-T.; Aci-Sèche, S.; Bonnet, P. VSPrep: A KNIME workflow for the preparation of molecular databases for virtual screening. Curr. Med. Chem. 2020, 27, 6480–6494.
Ropp, P.J.; Spiegel, J.O.; Walker, J.L.; Green, H.; Morales, G.A.; Milliken, K.A.; Ringe, J.J.; Durrant, J.D. Gypsum-DL: An open-source program for preparing small-molecule libraries for structure-based virtual screening. J. Cheminformatics 2019, 11, 34.
Miteva, M.A.; Guyon, F.; Tufféry, P. Frog2: Efficient 3D conformation ensemble generator for small compounds. Nucleic Acids Res. 2010, 38 (Suppl. S2), W622–W627.
Sommer, K.; Friedrich, N.-O.; Bietz, S.; Hilbig, M.; Inhester, T.; Rarey, M. UNICON: A Powerful and Easy-to-Use Compound Library Converter; ACS Publications: Washington, DC, USA, 2016.
Cozzini, P.; Kellogg, G.E.; Spyrakis, F.; Abraham, D.J.; Costantino, G.; Emerson, A.; Fanelli, F.; Gohlke, H.; Kuhn, L.A.; Morris, G.M. Target flexibility: An emerging consideration in drug discovery and design. J. Med. Chem. 2008, 51, 6237–6255.
Palma, P.N.; Krippahl, L.; Wampler, J.E.; Moura, J.J. BiGGER: A new (soft) docking algorithm for predicting protein interactions. Proteins Struct. Funct. Bioinform. 2000, 39, 372–384.
Jiang, F.; Kim, S.-H. “Soft docking”: Matching of molecular surface cubes. J. Mol. Biol. 1991, 219, 79–102.
Dominguez, C.; Boelens, R.; Bonvin, A.M. HADDOCK: A protein–protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 2003, 125, 1731–1737.
Apostolakis, J.; Plückthun, A.; Caflisch, A. Docking small ligands in flexible binding sites. J. Comput. Chem. 1998, 19, 21–37.
Knegtel, R.M.; Kuntz, I.D.; Oshiro, C. Molecular docking to ensembles of protein structures. J. Mol. Biol. 1997, 266, 424–440.
Motta, S.; Bonati, L. Modeling Binding with Large Conformational Changes: Key Points in Ensemble-Docking Approaches. J. Chem. Inf. Model. 2017, 57, 1563–1578.
Leach, A.R. Ligand docking to proteins with discrete side-chain flexibility. J. Mol. Biol. 1994, 235, 345–356.
Huang, S.-Y.; Zou, X. Advances and challenges in protein-ligand docking. Int. J. Mol. Sci. 2010, 11, 3016–3034.
Davis, I.W.; Baker, D. RosettaLigand docking with full ligand and receptor flexibility. J. Mol. Biol. 2009, 385, 381–392.
Miao, Y.; McCammon, J.A. G-protein coupled receptors: Advances in simulation and drug discovery. Curr. Opin. Struct. Biol. 2016, 41, 83–89.
Huang, S.Y.; Zou, X. Ensemble docking of multiple protein structures: Considering protein structural variations in molecular docking. Proteins Struct. Funct. Bioinform. 2007, 66, 399–421.
Jacobson, M.P.; Friesner, R.A.; Xiang, Z.; Honig, B. On the role of the crystal environment in determining protein side-chain conformations. J. Mol. Biol. 2002, 320, 597–608.
Jacobson, M.P.; Pincus, D.L.; Rapp, C.S.; Day, T.J.; Honig, B.; Shaw, D.E.; Friesner, R.A. A hierarchical approach to all-atom protein loop prediction. Proteins Struct. Funct. Bioinform. 2004, 55, 351–367.
Sherman, W.; Day, T.; Jacobson, M.P.; Friesner, R.A.; Farid, R. Novel procedure for modeling ligand/receptor induced fit effects. J. Med. Chem. 2006, 49, 534–553.
Maurer, M.; Oostenbrink, C. Water in protein hydration and ligand recognition. J. Mol. Recognit. 2019, 32, e2810.
Davis, A.M.; St-Gallay, S.A.; Kleywegt, G.J. Limitations and lessons in the use of X-ray structural information in drug design. Drug Discov. Today 2008, 13, 831.
Renaud, J.-P.; Chari, A.; Ciferri, C.; Liu, W.-t.; Rémigy, H.-W.; Stark, H.; Wiesmann, C. Cryo-EM in drug discovery: Achievements, limitations and prospects. Nat. Rev. Drug Discov. 2018, 17, 471–492.
Roux, B.; Simonson, T. Implicit solvent models. Biophys. Chem. 1999, 78, 1–20.
Kleinjung, J.; Fraternali, F. Design and application of implicit solvent models in biomolecular simulations. Curr. Opin. Struct. Biol. 2014, 25, 126–134.
Raymer, M.L.; Sanschagrin, P.C.; Punch, W.F.; Venkataraman, S.; Goodman, E.D.; Kuhn, L.A. Predicting conserved water-mediated and polar ligand interactions in proteins using a K-nearest-neighbors genetic algorithm. J. Mol. Biol. 1997, 265, 445–464.
García-Sosa, A.T.; Mancera, R.L.; Dean, P.M. WaterScore: A novel method for distinguishing between bound and displaceable water molecules in the crystal structure of the binding site of protein-ligand complexes. J. Mol. Model. 2003, 9, 172–182.
Wade, R.C.; Clark, K.J.; Goodford, P.J. Further development of hydrogen bond functions for use in determining energetically favorable binding sites on molecules of known structure. 1. Ligand probe groups with the ability to form two hydrogen bonds. J. Med. Chem. 1993, 36, 140–147.
Wade, R.C.; Goodford, P.J. Further development of hydrogen bond functions for use in determining energetically favorable binding sites on molecules of known structure. 2. Ligand probe groups with the ability to form more than two hydrogen bonds. J. Med. Chem. 1993, 36, 148–156.
Kovalenko, A.; Hirata, F. Self-consistent description of a metal–water interface by the Kohn–Sham density functional theory and the three-dimensional reference interaction site model. J. Chem. Phys. 1999, 110, 10095–10112.
Kovalenko, A.; Hirata, F. Three-dimensional density profiles of water in contact with a solute of arbitrary shape: A RISM approach. Chem. Phys. Lett. 1998, 290, 237–244.
SZMAP, version 1.6.4.1; OpenEye Scientific Software: Santa Fe, NM, USA, 2013.
Wang, L.; Berne, B.; Friesner, R. Ligand binding to protein-binding pockets with wet and dry regions. Proc. Natl. Acad. Sci. USA 2011, 108, 1326–1330.
Nguyen, C.N.; Kurtzman Young, T.; Gilson, M.K. Grid inhomogeneous solvation theory: Hydration structure and thermodynamics of the miniature receptor cucurbit[7]uril. J. Chem. Phys. 2012, 137, 044101.
Michel, J.; Tirado-Rives, J.; Jorgensen, W.L. Prediction of the water content in protein binding sites. J. Phys. Chem. B 2009, 113, 13337–13346.
Meng, E.C.; Shoichet, B.K.; Kuntz, I.D. Automated docking with grid-based energy evaluation. J. Comput. Chem. 1992, 13, 505–524.
Huang, N.; Kalyanaraman, C.; Bernacki, K.; Jacobson, M.P. Molecular mechanics methods for predicting protein–ligand binding. Phys. Chem. Chem. Phys. 2006, 8, 5166–5177.
Weiner, S.J.; Kollman, P.A.; Case, D.A.; Singh, U.C.; Ghio, C.; Alagona, G.; Profeta, S.; Weiner, P. A new force field for molecular mechanical simulation of nucleic acids and proteins. J. Am. Chem. Soc. 1984, 106, 765–784.
Weiner, S.J.; Kollman, P.A.; Nguyen, D.T.; Case, D.A. An all atom force field for simulations of proteins and nucleic acids. J. Comput. Chem. 1986, 7, 230–252.
Böhm, H.J. The development of a simple empirical scoring function to estimate the binding constant for a protein-ligand complex of known three-dimensional structure. J. Comput. Aided Mol. Des. 1994, 8, 243–256.
Eldridge, M.D.; Murray, C.W.; Auton, T.R.; Paolini, G.V.; Mee, R.P. Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J. Comput. Aided Mol. Des. 1997, 11, 425–445.
Sippl, M.J. Calculation of conformational ensembles from potentials of mean force: An approach to the knowledge-based prediction of local structures in globular proteins. J. Mol. Biol. 1990, 213, 859–883.
Allen, F.H. The Cambridge Structural Database: A quarter of a million crystal structures and rising. Acta Crystallogr. Sect. B Struct. Sci. 2002, 58, 380–388.
Thomas, P.D.; Dill, K.A. An iterative method for extracting energy-like quantities from protein structures. Proc. Natl. Acad. Sci. USA 1996, 93, 11628–11633.
Thomas, P.D.; Dill, K.A. Statistical potentials extracted from protein structures: How accurate are they? J. Mol. Biol. 1996, 257, 457–469.
Friesner, R.A.; Murphy, R.B.; Repasky, M.P.; Frye, L.L.; Greenwood, J.R.; Halgren, T.A.; Sanschagrin, P.C.; Mainz, D.T. Extra Precision Glide: Docking and Scoring Incorporating a Model of Hydrophobic Enclosure for Protein–Ligand Complexes. J. Med. Chem. 2006, 49, 6177–6196.
Ravindranathan, K.P.; Mandiyan, V.; Ekkati, A.R.; Bae, J.H.; Schlessinger, J.; Jorgensen, W.L. Discovery of Novel Fibroblast Growth Factor Receptor 1 Kinase Inhibitors by Structure-Based Virtual Screening. J. Med. Chem. 2010, 53, 1662–1672.
Khair, N.Z.; Lenjisa, J.L.; Tadesse, S.; Kumarasiri, M.; Basnet, S.K.C.; Mekonnen, L.B.; Li, M.; Diab, S.; Sykes, M.J.; Albrecht, H.; et al. Discovery of CDK5 Inhibitors through Structure-Guided Approach. ACS Med. Chem. Lett. 2019, 10, 786–791.
Ding, K.; Lu, Y.; Nikolovska-Coleska, Z.; Qiu, S.; Ding, Y.; Gao, W.; Stuckey, J.; Krajewski, K.; Roller, P.P.; Tomita, Y.; et al. Structure-Based Design of Potent Non-Peptide MDM2 Inhibitors. J. Am. Chem. Soc. 2005, 127, 10130–10131.
Morris, G.M.; Huey, R.; Lindstrom, W.; Sanner, M.F.; Belew, R.K.; Goodsell, D.S.; Olson, A.J. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J. Comput. Chem. 2009, 30, 2785–2791.
Lu, P.; Liu, X.; Yuan, X.; He, M.; Wang, Y.; Zhang, Q.; Ouyang, P. Discovery of a novel NEDD8 Activating Enzyme Inhibitor with Piperidin-4-amine Scaffold by Structure-Based Virtual Screening. ACS Chem. Biol. 2016, 11, 1901–1907.
Allen, W.J.; Balius, T.E.; Mukherjee, S.; Brozell, S.R.; Moustakas, D.T.; Lang, P.T.; Case, D.A.; Kuntz, I.D.; Rizzo, R.C. DOCK 6: Impact of new features and current docking performance. J. Comput. Chem. 2015, 36, 1132–1156.
Liu, S.; Yosief, H.O.; Dai, L.; Huang, H.; Dhawan, G.; Zhang, X.; Muthengi, A.M.; Roberts, J.; Buckley, D.L.; Perry, J.A.; et al. Structure-Guided Design and Development of Potent and Selective Dual Bromodomain 4 (BRD4)/Polo-like Kinase 1 (PLK1) Inhibitors. J. Med. Chem. 2018, 61, 7785–7795.
Neves, M.A.; Totrov, M.; Abagyan, R. Docking and scoring with ICM: The benchmarking results and strategies for improvement. J. Comput. Aided Mol. Des. 2012, 26, 675–686.
Schapira, M.; Raaka, B.M.; Samuels, H.H.; Abagyan, R. In silico discovery of novel retinoic acid receptor agonist structures. BMC Struct. Biol. 2001, 1, 1–7.
Nicola, G.; Smith, C.A.; Lucumi, E.; Kuo, M.R.; Karagyozov, L.; Fidock, D.A.; Sacchettini, J.C.; Abagyan, R. Discovery of novel inhibitors targeting enoyl–acyl carrier protein reductase in Plasmodium falciparum by structure-based virtual screening. Biochem. Biophys. Res. Commun. 2007, 358, 686–691.
Cleves, A.E.; Jain, A.N. ForceGen 3D structure and conformer generation: From small lead-like molecules to macrocyclic drugs. J. Comput. Aided Mol. Des. 2017, 31, 419–439.
Jain, A.N. Surflex: Fully automatic flexible molecular docking using a molecular similarity-based search engine. J. Med. Chem. 2003, 46, 499–511.
Agnihotri, P.; Mishra, A.K.; Mishra, S.; Sirohi, V.K.; Sahasrabuddhe, A.A.; Pratap, J.V. Identification of Novel Inhibitors of Leishmania donovani γ-Glutamylcysteine Synthetase Using Structure-Based Virtual Screening, Docking, Molecular Dynamics Simulation, and in Vitro Studies. J. Chem. Inf. Model. 2017, 57, 815–825.
Corbeil, C.R.; Williams, C.I.; Labute, P. Variability in docking success rates due to dataset preparation. J. Comput. Aided Mol. Des. 2012, 26, 775–786.
Ye, W.L.; Shen, C.; Xiong, G.L.; Ding, J.J.; Lu, A.-P.; Hou, T.J.; Cao, D.S. Improving docking-based virtual screening ability by integrating multiple energy auxiliary terms from molecular docking scoring. J. Chem. Inf. Model. 2020, 60, 4216–4230.
Chen, I.J.; Foloppe, N. Conformational sampling of druglike molecules with MOE and catalyst: Implications for pharmacophore modeling and virtual screening. J. Chem. Inf. Model. 2008, 48, 1773–1791.
Geldenhuys, W.J.; Darvesh, A.S.; Funk, M.O.; Van der Schyf, C.J.; Carroll, R.T. Identification of novel monoamine oxidase B inhibitors by structure-based virtual screening. Bioorganic Med. Chem. Lett. 2010, 20, 5295–5298.
Foloppe, N.; Fisher, L.M.; Howes, R.; Potter, A.; Robertson, A.G.S.; Surgenor, A.E. Identification of chemically diverse Chk1 inhibitors by receptor-based virtual screening. Bioorganic Med. Chem. 2006, 14, 4792–4802.
Rarey, M.; Kramer, B.; Lengauer, T.; Klebe, G. A Fast Flexible Docking Method using an Incremental Construction Algorithm. J. Mol. Biol. 1996, 261, 470–489.
Kramer, B.; Rarey, M.; Lengauer, T. Evaluation of the FLEXX incremental construction algorithm for protein–ligand docking. Proteins Struct. Funct. Bioinform. 1999, 37, 228–241.
Forino, M.; Jung, D.; Easton, J.B.; Houghton, P.J.; Pellecchia, M. Virtual docking approaches to protein kinase B inhibition. J. Med. Chem. 2005, 48, 2278–2281.
Krier, M.; de Araújo-Júnior, J.X.; Schmitt, M.; Duranton, J.; Justiano-Basaran, H.; Lugnier, C.; Bourguignon, J.-J.; Rognan, D. Design of small-sized libraries by combinatorial assembly of linkers and functional groups to a given scaffold: Application to the structure-based optimization of a phosphodiesterase 4 inhibitor. J. Med. Chem. 2005, 48, 3816–3822.
McGann, M. FRED and HYBRID docking performance on standardized datasets. J. Comput. Aided Mol. Des. 2012, 26, 897–906.
McGann, M. FRED pose prediction and virtual screening accuracy. J. Chem. Inf. Model. 2011, 51, 578–596.
Brus, B.; Košak, U.; Turk, S.; Pišlar, A.; Coquelle, N.; Kos, J.; Stojan, J.; Colletier, J.-P.; Gobec, S. Discovery, Biological Evaluation, and Crystal Structure of a Novel Nanomolar Selective Butyrylcholinesterase Inhibitor. J. Med. Chem. 2014, 57, 8167–8179.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply. By using this site, you agree to the Terms and Conditions and Privacy Policy.

Subjects: Others

Contributors: Yiqun Chang, Bryson A. Hawkins, Jonathan J. Du, Paul W. Groundwater, David E. Hibbs, Felcia Lai

Update Date: 09 Jan 2023
