2. Computer Simulations for Rational Drug Design
For most of its existence, the human genre has exploited natural products such as leaves, seeds, roots, bark, and flowers as medicines, based on empirical observations purely based on symptom relief
[36][37].
Nevertheless, throughout the latest two centuries, the process of drug discovery has evolved rapidly from the serendipitous discovery of novel active principles derived from or inspired by natural compounds
[38][39] to the rational design of brand-new chemical entities
[40].
The major turning point in the history of modern drug discovery can be traced back to the 1980s when experimentally solved macromolecular structures become routinely available
[41]. The enhanced accessibility of structural data about biological targets is reflected in a rapid interest in the development of computational methods that could valorize this information and aid medicinal chemists’ work
[42].
Today, computer simulations are a staple point of drug discovery campaigns, thanks to their ability to streamline and reduce their attrition rate
[43]. From a functional perspective, computer-aided drug discovery (CADD) techniques are employed in the earliest stages of the pipeline for hit identification, hit-to-lead optimization, and pharmacokinetic evaluations
[44].
CADD methodologies can either fall into one of two subgroups, based on the rationale behind them: the first group is represented by ligand-based (LBDD) approaches, while the second one includes structure-based (SBDD) methods
[45]. The main difference between these two orthogonal and complementary approaches is that the first one does not exploit any information about the target macromolecule structure (e.g., a protein or a nucleic acid), while the second one does
[46].
Nowadays, with the advent of cryo-electron microscopy (cryo-EM)
[47] and groundbreaking tools for de novo prediction of protein structures such as AlphaFold
[48], the second approach has become the gold standard
[4].
2.1. CADD Strategies against COVID-19
The starting point of every SBDD campaign is the identification of a target macromolecule (a protein or a nucleic acid) that is involved in the etiology and or pathogenesis of a disease of interest, whose function can be opportunely modulated through a specifically designed ligand, usually a small organic molecule
[4].
Once the target has been identified, its structure must be retrieved, either through experimental methods such as X-Ray crystallography (XRC, the gold standard)
[49], nuclear magnetic resonance (NMR)
[50], and cryo-EM
[51] or hypothesized through homology modeling or de novo prediction
[52].
Homology modeling involves using a homologous protein with a high primary sequence identity with the target as a template for constructing its three-dimensional model
[53][54]. De novo prediction, instead, does not rely on any information about other proteins’ structures and outputs a structural hypothesis that is solely based on the primary sequence of the target of interest
[55].
While the second approach has gained a lot of momentum during the last two years, thanks to its unprecedentedly high accuracy
[56][57], the first one is still relevant in those cases where important structural rearrangements occur between different states of the target functional cycle, other than predicting ligand-bound conformations
[58][59].
In the context of the COVID-19 pandemic, where the extraordinary effort promoted by the scientific community quickly made several experimentally determined structures available, the relevance of structural modeling was highlighted by the ability to keep up with the high mutation rate of the virus
[60][61], other than providing a useful starting point for drug discovery campaigns for a target whose structure had yet to be elucidated
[62][63]. For example, several studies were conducted to investigate the impact of mutations found in both the spike protein
[60][64][65][66][67][68] and the main protease
[32][60][69][70] of emerging strains on viral fitness and resistance to existing therapies. These studies showed that relatively inexpensive approaches such as homology modeling and positional scanning could be reliable tools to rationalize the origin of the virus
[69][71][72][73], quickly track the evolution of the original strain
[60][74][75], predict the impact of future possible mutations
[65][67] and adjust existing therapeutics tools accordingly
[32][76].
The huge amount of structural information available on several SARS-CoV-2 druggable targets was fertile terrain for various COVID-19 SBDD campaigns
[77][78], both in academia and in industry, with the most effort aimed at hitting well-characterized and pivotal viral targets such as M
pro or spike
[79][80].
A remarkable example is represented by the COVID Moonshot Consortium, a drug discovery campaign driven by a collaborative effort among different research groups across the world aimed at targeting the SARS-CoV-2 main protease. This project led to the advancement of novel noncovalent orally available nanomolar M
pro inhibitors to clinical stage experimentation
[81].
2.2. The Swiss Knife of SBDD: Molecular Docking
Within every SBDD campaign, available information about the target structure is exploited to fetch molecules able to recognize it selectively and potently
[82]. Usually, this involves the identification of molecules that have good steric and electrostatic complementarity with the active site
[83]. Depending on the steric and volumetric features of the binding site, the ligand type can be chosen accordingly, with small organic molecules being a better solution for buried cavities
[84] and peptides, aptamers, or antibodies a better one for larger, flatter, and solvent-exposed interaction surfaces
[85].
To narrow down the list of potentially active molecules to experimentally test to a feasible number, and to avoid wasting resources on compounds that do not possess the appropriate features to interact with the target, most SBDD campaigns start with a virtual screening process (SBVS)
[86]. The most widely and successfully adopted method for SBVS is molecular docking, a computational protocol developed in the 1980s by Kuntz et al.
[87] for predicting the preferred orientation of a certain ligand within the active site of a receptor
[88].
Each docking program has two major components, which cooperate to find the solution to the protein–ligand docking problem
[89]. The first part is the search algorithm (SA), which explores the ligand degrees of freedom within a user-defined search space centered around the active site of the protein
[90]. The SA generates several ligand conformations (poses) that are fed to the second element of the program, i.e., the scoring function (SF), which qualitatively evaluates subsisting protein–ligand interaction features
[91].
In the context of the COVID-19 pandemic, docking was also the king of computational methods used for drug discovery, thanks to the combination of its accuracy
[92] and rapidity, which allows it to virtually screen billions of compounds in just a few days
[93][94][95].
For example, Corona et al. reported the discovery of four low micromolar nsp13 inhibitors through a virtual screening carried out with the LiGen
[96] docking program on an in-house natural compounds library
[97].
Kolarič et al. identified two micromolar SARS-CoV-2 cell-entry inhibitors that act by binding human neuropilin-1 (nrp-1) and preventing its interaction with the spike protein, by performing a virtual screening with the GOLD
[98] program on a library of commercially available compounds
[99].
Vatansever et al. performed a virtual screening based on the Autodock
[100] program on a library of drugs approved by the Food and Drug Administration and by the European Medical Agency (EMA) to discover six micromolar M
pro inhibitors
[101].
Kao et al. reported the discovery of three sub-micromolar, synergistic nsp1 inhibitors identified through two independently executed virtual screenings with ICM
[102][103] and Vina
[104] software on a library of FDA-approved drugs
[105].
Zhang et al. identified 11 natural compound M
pro inhibitors active in the low micromolar range through a virtual screening purely based on the commercial software Glide
[106], developed by Schrödinger
[107]. Another strategic use of docking-based virtual screening based on the Glide program is portrayed by the work of Huff et al., which designed six mixed covalent and noncovalent nanomolar M
pro inhibitors
[108]. Another Glide-based virtual screening performed by Liu et al. led to the repurposing of histone deacetylase (HDAC) inhibitors as SARS-CoV-2 cell entry inhibitors through allosteric modulation of ACE2 and alteration of its ability to recognize the spike protein
[109].
Wang et al. used LibDock
[110] to perform a virtual screening on a library composed of FDA-approved peptides. This led to the identification of a nanomolar SARS-CoV-2 cell entry inhibitor that exerts its effect by binding to the human ACE2 receptor
[111].
A remarkable result was obtained by Luttens et al., which identified eight M
pro inhibitors (including a nanomolar compound with pan coronaviral activity) by combining fragment-based drug design with ultra-large virtual screening based on the DOCK
[87] program
[112].
Welker et al. exploited the molecular docking pipeline of the LeadIT
[113] program to repurpose previously identified SARS-CoV PL
pro inhibitors towards its SARS-CoV-2 homolog, demonstrating their activity on viral replication in cell-based assays
[114].
Otava et al. utilized docking calculations with the GOLD
[98] software to rationalize the structure–activity relationship of a series of rationally designed S-adenosyl-L-homocysteine derivatives, some of which showed inhibitory activity towards SARS-CoV-2 nsp14 in the low nanomolar potency range
[115].
Similarly, Wang et al. exploited docking with Vina to rationalize the SAR of a series of rationally designed phenanthridine nucleocapsid protein (NPro) inhibitors, including two compounds showing low micromolar inhibitory activity
[116].
2.3. Complementary Strategies to Address Docking Limitations
Although a very efficient and useful tool, molecular docking is rarely used on its own within SBDD campaigns and, indeed, is most often coupled with other methods to compensate for its weak points, such as neglecting receptor flexibility or the role of solvents
[117], thus increasing the virtual screening success rate
[118]. Another major limitation is represented by the poor ranking capabilities of classical scoring functions
[119], which is the main cause of the high false positive rate of docking-based virtual screenings
[120]. Indeed, in order to be universally applicable across different biological targets and computationally efficient enough to evaluate a large number of compounds, scoring functions have some limitations in the physical description of the binding event, which prevent any correlation between docking scores and experimentally determined affinity values
[91]. Furthermore, little to no difference in score exists between top-ranking compounds derived from large virtual screening campaigns, making it practically impossible to distinguish active from inactive compounds solely based on the docking score
[121]. For these reasons, each docking-based virtual screening cannot be blindly executed and fully automatized, and a careful setup of the experiment must be executed based on the available literature data and the knowledge of the target
[121][122]. For COVID-19, the importance of this common-sense medicinal chemistry practice has been highlighted by the retrospective literature analysis provided by Llanos et al., which showcased the poor performances of structure-based virtual screenings solely based on ranking provided by docking scoring functions
[118].
A possible solution to the limited physical description of the protein–ligand binding event of docking is to couple it with molecular dynamics (MD) simulations
[89][123]. Molecular dynamics is a computational technique that allows investigating the time-dependent evolution of biological systems following the rules of molecular mechanics, i.e., determining the atomic trajectories by numerically solving Newton’s equation of motion, where forces between the particles and their potential energies are calculated according to molecular mechanical force fields
[124]. Due to the heavy computational workload required to run these types of simulations, MD is rarely used for screening purposes, while it is more frequently exploited for the refinement of docking results, i.e., evaluating the pose stability or optimizing the protein–ligand complex geometry for a more accurate estimation of the free binding energy
[125][126].
Regarding the pitfalls of the scoring component of docking programs, one possible strategy is to apply some form of knowledge-based filter upon docking results, in a similar fashion to what would happen if each pose were visually inspected
[127]. For example, experimental information about critical protein–ligand interactions required for binding can be encoded within a pharmacophore filter or an interaction fingerprint, both of which can be used as constraints in the pose selection process
[128]. In the case of pharmacophore filters, poses are filtered based on their ability to place a given functional group within a defined volume
[129][130], while in the case of protein–ligand interaction fingerprint, the selection is usually based on the similarity between the reference and the query vector, representing the interaction features of the reference compound (a true active) and the investigated molecule respectively
[131][132].
For instance, Wang et al. used a combination of structure-based pharmacophore screening, docking (both performed with the appropriate tools of the Molecular Operating Environment suite), and post-docking molecular dynamics refinement to identify a set of four sub-micromolar M
pro inhibitors among a database of in-house compounds
[133].
The same protocol was successfully exploited by Tian et al. to identify four sub-micromolar PL
pro inhibitors in the same in-house library
[134].
Furthermore, a slight variation of the protocol was also employed by Yin et al. to discover a noncovalent cyclic peptide that simultaneously inhibits both SARS-CoV-2 M
pro and nrp-1 with an activity in the low nanomolar range
[135]. Within this scientific work, pharmacophore constraints were used for scoring peptide poses on M
pro, while traditional docking scores were used for the nrp-1 screening.
A remarkable joint computational work by Gossen et al. led to the molecular dynamics-driven design of a structure-based pharmacophore filter, which was then exploited to identify two nanomolar M
pro inhibitors among a library of publicly available compounds
[136].
A similar approach was exploited by Hu et al., which exploited the combination between MD-based pharmacophore filtering, docking-based virtual screening within the Molecular Operating Environment suite, and MD-based post-docking refinement to identify micromolar SARS-CoV-2 cell entry inhibitors targeting the FP of the spike protein
[137].
Jang et al. used protein–ligand interaction fingerprint similarity as a post-docking filter for their double virtual screening on both M
pro and RdRp with the Vina program to identify seven compounds inhibiting SARS-CoV-2 replication in cell-based assays among a library of approved drugs
[138].
Due to the static nature of molecular docking, which does not consider receptor flexibility, the choice of the input structure is vital for the success rate of a virtual screening
[139]. Although molecular dynamics can be a useful posterior refinement of poses, a wrong input conformation of the target macromolecule could prevent the sampling of native-like poses for active compounds, leading to a reduced hit-finding rate
[140]. For this reason, multiple conformations of the same receptor derived from MD simulations or experimentally solved in different conditions can be used in parallel in a process defined as ensemble docking (ED)
[141]. When this approach is used, docking calculations are independently run on each structure, with virtual hit compounds being identified either through consensus scoring or a consensus ranking approach
[142][143]. In the case of consensus scoring, the docking score of the same molecule is averaged across the different virtual screenings, with the final ranking based on the consensus score
[144]. Differently, consensus ranking involves the selection of top-ranking hit compounds across different virtual screenings, regardless of congruence between scores
[145]. A consensus approach can also be utilized to rank molecules based on virtual screening executed on the same receptor structures with different docking protocols
[146].
For example, Gimeno et al. applied a consensus scoring approach to three independently executed virtual screenings through Glide, FRED
[147], and Vina software to identify two M
pro micromolar inhibitors within the Drugbank database, a library that includes all drugs approved by the Food and Drug Administration (FDA)
[148].
Yang et al., instead, employed an ensemble docking approach with the Glide docking software to identify six M
pro inhibitors among a library of commercially available peptidomimetic compounds, two of which demonstrated sub-micromolar potency
[149].
Rubio-Martinez et al. used a combination of ensemble docking based on QVina2
[150] and post-docking molecular dynamics refinement to identify five M
pro micromolar inhibitors within a library of commercially available natural compounds
[151].
A mixture of the previous two approaches was exploited by Clyde et al. for their High-Throughput Virtual Screening (HTVS), based on both ensemble docking and consensus scoring between the FRED and Vina docking programs, that led to the discovery of seven micromolar M
pro inhibitors among a set of commercially available compounds
[152].
Further, a combination of consensus ranking among Autodock, Hybrid, and FlexX and post-docking molecular dynamics refinement was utilized by Glaab et al. to virtually screen a library of commercially available compounds and identify two micromolar M
pro inhibitors
[153].
Similarly, Ghahremanpour et al. applied both consensus ranking among three independent virtual screenings performed with the Glide, Autodock, and Vina software and post-docking molecular dynamics refinement to identify 14 micromolar M
pro inhibitors within the Drugbank database
[154].
Another possible solution to cope with inaccuracy in free binding energy determination by traditional scoring functions is to rescore docking poses using more computationally intensive and accurate methods such as Free Energy Perturbation (FEP)
[155] or MMGBSA/MMPBSA
[156]. The first approach relies on performing a series of alchemical transformations across a set of ligands that need to be evaluated. This conversion cycle allows calculating relative differences in the free binding energy that can be used for a more accurate ranking of hit compounds derived from a virtual screening
[157]. The second approach relies instead on correcting the gas phase interaction energy calculated according to the molecular mechanics force field with a term accounting for the desolvation-free energy, where the polar component is estimated either by numerically solving the Poisson–Boltzmann equation (MMPBSA) or through the Generalized Born method (MMGBSA)
[158].
Intriguingly, one of the hit compounds identified in the work of Ghahremanpour et al. was then used by Zhang et al. for the FEP-driven design of multiple nanomolar M
pro inhibitors
[159].
A similar combination of Glide docking and FEP to determine the absolute binding free energy was also employed by Li et al. to identify 15 micromolar M
pro inhibitors within the Drugbank database
[160]. The efficacy of FEP in estimating the binding energy of potential M
pro inhibitors was also highlighted by a retrospective study by Ngo et al.
[161].
A multistep virtual screening involving semiflexible docking with Glide, Schrödinger induced-fit docking
[162], MD-based post-docking refinement, and binding free energy estimation with the MMGBSA
[163] protocol was exploited by Ibrahim et al. to identify one low micromolar nsp15 inhibitor
[164].
Although the estimation of thermodynamic properties such as the free binding energy has been a staple point of drug discovery campaigns, both from a computational and an experimental perspective, lately there has been a major interest shift towards the determination of kinetic parameters since they better correlate with in vivo efficacy
[165]. Specifically, several MD-based methods have been developed throughout the years to rank compounds based on their predicted residence time, i.e., the time that the ligand spends in the receptor-bound state
[166]. Among those, Pavan et al. developed Thermal Titration Molecular Dynamics (TTMD), a new method for qualitative estimation of protein–ligand complex stability (
Figure 2), which was successfully applied for correctly discriminating tight, low nanomolar binders from weak, micromolar SARS-CoV-2 M
pro inhibitors
[167].
Figure 2. Workflow of a Thermal Titration Molecular Dynamics (TTMD) simulation. The time-dependent conservation of the native binding mode within a protein–ligand complex of interest is monitored with a scoring function based on interaction fingerprint through a series of short molecular dynamics simulations performed at progressively increasing temperatures. The simulation is carried out until the target temperature is reached or the dissociation process is completed. A coefficient called MS is then calculated and used to rank ligands based on the persistence of their native binding mode.
2.4. Beyond Protein-Ligand Docking: Alternative Strategies for Rational Drug Development
Despite the indisputable relevance of molecular docking within most SARS-CoV-2 drug discovery campaigns, other approaches were successfully implemented, especially for projects which deviate from the design of a standard small molecule noncovalent binder.
For example, Zaidman et al. developed
Covalentizer, an automated pipeline for the conversion of noncovalent binders to irreversible ones, which was successfully applied to the conversion of a SARS-CoV M
pro reversible inhibitor to a sub-micromolar SARS-CoV-2 M
pro irreversible one
[168].
Valiente et al. reported the discovery of D-peptides that bind the spike RBD with low nanomolar affinity, hence blocking SARS-CoV-2 infection in cell-based assays. These ACE2-mimicking peptides were selected within the starting library through a combination of structural alignment, MD-based post-docking refinement, and binding free energy estimation
[169].
Similarly, a series of peptides mimicking the HR2 domain of the spike protein able to prevent SARS-CoV-2 infection in cell-based assays with low micromolar potency were designed through the combination between structural alignment, mutational scanning with the BeAtMuSiC
[170] tool, and MD-based post-docking refinement
[171].
Jeong et al. used Rosetta
[172] to rationally design a mAb that recognizes a conserved surface on the spike RBD of various coronaviruses with picomolar binding affinities, thereby strongly inhibiting SARS-CoV-2 replication in cell-based assay
[173].
A similar strategy was exploited by Miao et al., which employed Rosetta docking and MD-based post-docking refinement to design an RNA aptamer that binds with picomolar affinity to the spike RBD and inhibits SARS-CoV-2 replication with sub-micromolar potency in cell-based assay
[174].
Further, Cao et al. utilized a combination of modeling with Rosetta and docking with RifDock
[175] to design ten mini proteins which bind with picomolar affinity to the spike RBD thus inhibiting SARS-CoV-2 infection within cell-based assays
[176].
Moreover, Glasgow et al. combined modeling with Rosetta and computational alanine scanning with Robetta
[177][178] to rationally design “ACE2 receptor traps”, i.e., engineered proteins that bind the spike RBD with high affinity and neutralize SARS-CoV-2 infection as effectively as clinically used mAbs
[179].
As thoroughly discussed in previous paragraphs, many SARS-CoV-2 drug discovery campaigns favored static, time-independent approaches such as docking or structural alignment, over time-dependent methods such as molecular dynamics. This can be attributed to the long calculation times, the reduced conformational sampling capabilities, and the lower accessibility of MD simulations to the general medicinal chemistry audience
[126][180]. Despite these issues, several works demonstrated the potential of using full-fledged MD-based drug discovery pipelines, especially when smart enhanced-sampling strategies are employed
[180].
For example, Bissaro et al. showed how high-throughput supervised molecular dynamics (HT-SuMD)
[181], a virtual screening platform based on an enhanced sampling MD protocol, could be successfully exploited for docking fragments to the active site of SARS-CoV-2 M
pro, overcoming accuracy limitations of most docking protocols
[182] in identifying the native-like binding mode for frag-like compounds
[183].
Furthermore, the SuMD
[184][185] algorithm (
Figure 3) was successfully exploited by Pavan et al. to decipher details about the recognition mechanism of Nirmatrelvir upon the SARS-CoV-2 M
pro catalytic site before any structural detail was revealed by the drug developer, with successive structural
[23] and molecular medicine
[32] studies confirming the prediction validity
[186].
Figure 3. Workflow of a Supervised Molecular Dynamics (SuMD) simulation. The ligand is dynamically docked within a user-defined binding site through a series of short, unbiased molecular dynamics simulations. At the end of each step, the distance of mass between the ligand and the receptor binding site is computed for each trajectory frame and is fed to a tabu-like algorithm. If the slope of the straight line that interpolates the data is negative, indicating the ligand is approaching the binding site, the step is retained, and the simulation continues with the next “SuMD-step”. If not, the step is discarded and repeated, randomly reassigning particles’ velocities through the Langevin thermostat. This cycle is repeated until a threshold distance is reached or other user-defined termination criteria are met.
Moreover, an evolved version of the SuMD protocol was developed by Pavan et al. and successfully applied to the study of the recognition mechanism between RNA aptamers and proteins, including an RNA-aptamer that binds to the spike RBD with picomolar affinity thus preventing the viral infection of host cells
[187].