In Silico Strategies in Tuberculosis Drug Discovery

Tuberculosis (TB) remains a serious threat to global public health, responsible for an estimated 1.5 million mortalities in 2018. Discovering new and more potent antibiotics that target novel TB protein targets is an attractive strategy towards controlling the global TB epidemic. In silico strategies can be applied at multiple stages of the drug discovery paradigm to expedite the identification of novel anti-TB therapeutics.

Keywords: tuberculosis; druggability; docking; pharmacophore; MD simulation; QSAR; DFT

1. Introduction

In 1882, Mycobacterium tuberculosis (Mtb) was identified by Robert Koch as the causative agent of tuberculosis (TB), an infectious disease that continues to be a relevant threat to global public health, especially in low- to middle-income countries. Several risk factors contribute to the development of TB, including HIV infection, malnutrition, air pollution, type 2 diabetes, alcoholism, and smoking [1][2][3][4][5].
TB is encountered either as latent TB infection (LTBI), which is non-communicable and asymptomatic [6], or as active TB, which is communicable and has symptoms such as fever, weight loss, productive cough, and hemoptysis [7]. Active infection is further classified according to the strain: (1) drug-sensitive TB, (2) multidrug-resistant TB (MDR-TB), which is resistant to isoniazid and rifampicin, and (3) extensively drug-resistant TB (XDR-TB), which shows resistance to isoniazid, rifampicin, any fluoroquinolone, and an aminoglycoside. Around 1.7 billion people are estimated to have LTBI and are at risk of progressing to active TB infection [8]. The World Health Organization (WHO) estimated that approximately 10 million people had active TB disease and that approximately 1.5 million died from it in 2018. An estimated half a million individuals had rifampicin-resistant TB (RR-TB), of whom 78% had MDR-TB. Furthermore, approximately 6.2% of these MDR-TB cases are estimated to be XDR-TB [8].

2. Current Tuberculosis Management

One of the major challenges in managing TB is the estimated three million ‘missing’ individuals who have developed active infections but remain undetected or undiagnosed. TB can be deadly if not treated. With the help of conventional treatment regimens, an estimated 58 million infected individuals were saved from 2000 to 2018. Global treatment outcomes for 2017 show a success rate of 85% for new TB cases and 56% for those with drug-resistant TB [8].

2.1. Latent Tuberculosis Infection

Treatment for LTBI is only provided to select groups that have a high risk of transitioning to active TB infection, including HIV-positive patients, people who were exposed to those with active TB, patients undergoing dialysis for end-stage renal disease, patients taking anti-tumor necrosis factor (TNF) medications, patients preparing for transplant surgery, and those with silicosis. For people exposed to patients with active MDR-TB, especially children below 5 years of age, personalized treatment regimens and close observation are required, depending on whether treatment is expected to be beneficial. The WHO recommends several different treatment regimens for LTBI, including 3 months of rifapentine and isoniazid, 3–4 months of isoniazid and rifampicin, 3–4 months of rifampicin, and 6–9 months of isoniazid [9][10]. While all of these have established efficacy, poor patient compliance continues to be an issue, especially with long treatment periods [9][10][11].

2.2. Active Drug-Sensitive Tuberculosis

In the last several decades, the treatment strategy for active drug-sensitive TB has not changed from the standard regimen of the first-line drugs rifampicin, isoniazid, pyrazinamide, and ethambutol (Figure 1) for the first 2 months, followed by isoniazid and rifampicin for the next 4 months [12][13]. While this treatment procedure is highly efficacious, its long duration leads to poor patient compliance. This has long been an issue in TB management, necessitating monitoring protocols such as directly observed therapy (DOT), wherein a health professional directly supervises each dose intake [14]. Another issue brought about by the prolonged treatment is drug toxicity, which results in numerous adverse effects such as skin rash, gastrointestinal intolerance, neuropathy, arthralgia, increase in liver enzymes, hepatitis, immune thrombocytopaenia, agranulocytosis, haemolysis, renal failure, optic neuritis, and ototoxicity [15][16].
Figure 1. First- and second-line drugs approved for tuberculosis treatment.

2.3. Multidrug- and Extensively Drug-Resistant Tuberculosis

Failure to complete the full TB regimen leads to disease relapse and drug resistance, which is more challenging to treat. A specific regimen can be designed depending on the resistance profile of the TB strain in a patient [17][18]. These treatments are often of longer duration (18 months or more) and utilize the more expensive second-line drugs (Figure 1), which have uncertain efficacy and high toxicity, resulting in poorer compliance and undesirable outcomes. To mitigate these issues, an updated seven-drug regimen guideline for the treatment of drug-resistant TB lasting 9 to 12 months was released by the WHO in 2016 [19].
With the increasing threat of treatment-resistant TB infection, a number of drugs have been fast-tracked to aid efforts to control TB worldwide. At the end of 2012, the US Food and Drug Administration (FDA) granted accelerated approval to bedaquiline for the treatment of resistant TB [20]. Bedaquiline’s anti-mycobacterial activity is due to its inhibition of mycobacterial ATP synthase, a key enzyme in ATP synthesis, resulting in bacterial death. However, its use was associated with an increased risk of death, raising concerns about its approval: during clinical trials, roughly 11.4% of patients who took bedaquiline died, compared with 2.5% of those who took placebo [21]. In 2014, delamanid, a nitro-dihydro-imidazooxazole derivative, was given conditional approval by the European Medicines Agency (EMA) for the treatment of MDR-TB in adults [22]. Delamanid inhibits mycolic acid biosynthesis, blocking formation of the mycobacterial cell wall and leading to improved drug permeation and more effective treatment [23]. More recently, pretomanid in combination with bedaquiline and linezolid was also approved by the FDA for treatment-resistant TB [24]. Pretomanid is a prodrug activated by a nitroreductase, which reduces pretomanid’s imidazole ring to generate active metabolites. Specifically, a des-nitro metabolite leads to elevated levels of nitric oxide, which exerts antimycobacterial activity by poisoning bacterial respiration under anaerobic conditions [25]. Under aerobic conditions, pretomanid works like delamanid by targeting cell wall mycolic acid biosynthesis [26]; although several potential targets have been proposed for this drug, its exact protein target is not yet known [27].
An increasing number of XDR-TB cases, such as those in India, China, South Africa, Russia, and eastern Europe, have proved difficult to treat even with the more intensive drug-resistant TB treatment regimen [18]. Novel therapeutics such as bedaquiline, delamanid, and pretomanid might help in curing these patients, though a suitable treatment regimen still has to be carefully designed. However, these drugs are difficult to acquire, especially in developing countries, resulting in a pool of patients that may remain untreated. In principle, TB can be cured completely with currently available and newly approved anti-tubercular drugs. However, difficulties in diagnosing and reporting infection, long treatment durations leading to drug toxicity and poor patient compliance, the emergence of drug-resistant strains, and limited access to the required treatment urgently necessitate the discovery and development of newer, more effective drugs for TB.

3. Rise of Computer-Aided Drug Design in TB Drug Discovery

The drug discovery paradigm covers a wide range of fields, including biochemistry, chemical and structural biology, chem- and bioinformatics, computational chemistry, physical chemistry, organic synthesis, and others. The whole process entails large investments of time, money, and effort in order to produce promising candidates for the pipeline. Over the years, the drug discovery process for new antitubercular therapeutics has changed due to the increase in biological and chemical data, the growing number of identified and validated targets, and advances in high-throughput screening technologies and software development. Moreover, progress in data storage capacity, supercomputing power, and parallel processing in the last several years has allowed computer-aided drug design (CADD) to become an integral part of TB pharmaceutical research. This continuing expansion in computing power may soon allow the exploration of the vast chemical space, thought to comprise approximately 10^60 organic molecules below 500 Da, in order to identify therapeutically interesting scaffolds [28]. Moreover, the boom in protein structural data, including over 150,000 macromolecular structures in the Protein Data Bank (PDB) [29], has proved beneficial in elucidating important molecular and computational concepts for drug design studies. As with any other disease, TB has been the subject of continuous and numerous drug discovery studies, including thousands of published CADD investigations. Despite this, a paper by Ekins et al. noted gaps in the application of these methods in TB research [30], resulting in the slow output of candidates into the TB drug pipeline despite the apparent need and urgency for this disease. This suggests that more rigorous efforts are needed in TB drug discovery to maximize the advantages provided by computational tools.
Computational or in silico methods are knowledge-driven, rationally exploring available data to investigate protein function and to design new molecular entities (NMEs) that can effectively regulate the behavior of the target protein. Computational drug discovery approaches are generally divided into structure-based (SBDD) and ligand-based drug design (LBDD), depending on the availability of structural data (Figure 2). However, it has become common practice to integrate these methods in a complementary manner in order to increase the success rate of current drug discovery projects. SBDD requires the target’s three-dimensional (3D) structure in order to examine the binding pocket and use it for the screening and design of suitable ligands, which can then be experimentally validated and optimized. In the absence of protein structural data, LBDD utilizes knowledge gained from a collection of diverse ligands with known activity to create predictive models for hit discovery and lead optimization [31]. Different types of structure-based and ligand-based strategies, or a combination thereof, can be applied at different stages of TB drug discovery and development in order to alleviate the challenges involved with experimental methods. With the availability of the TB genome and proteome, as well as abundant structural data, data mining and docking strategies can be employed for target identification. Virtual screening (VS) can then be applied to pick out the best potential candidates for a chosen TB target from a database containing millions of molecules. After validation of candidates, structure-activity or structure-property relationship (SAR/SPR) studies can be implemented to understand the mechanism of action and the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties in order to design compounds with better activity and pharmacokinetics.
Data (both positive and negative results) taken from these investigations can be kept and used for further iteration and method optimization in the design of novel TB compounds. Both commercial and free software and webservers have been developed covering different SBDD and LBDD techniques, some of which are listed in Table 1.
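The screening step described above can be pictured as a simple funnel. The following is a minimal sketch rather than a production workflow: the compound records, the docking scores, the 500 Da cutoff (borrowed from the chemical-space estimate quoted earlier), and the function name `screen` are all hypothetical and chosen only for illustration. More negative scores are assumed to indicate better predicted binding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str             # hypothetical identifier
    mol_weight: float     # molecular weight in Da
    docking_score: float  # assumed: more negative = better predicted binding


def screen(library, max_weight=500.0, top_n=2):
    """Filter a library by molecular weight, then keep the best-scoring compounds."""
    # Step 1: property filter (assumed cutoff, for illustration only).
    passed = [c for c in library if c.mol_weight <= max_weight]
    # Step 2: rank by docking score, most negative first.
    ranked = sorted(passed, key=lambda c: c.docking_score)
    # Step 3: carry only the top candidates forward to experimental testing.
    return ranked[:top_n]


# Made-up library; the values are not taken from any real screen.
library = [
    Compound("cmpd_A", 342.4, -9.1),
    Compound("cmpd_B", 612.7, -10.3),  # best score, but fails the weight filter
    Compound("cmpd_C", 287.3, -6.5),
    Compound("cmpd_D", 455.0, -8.2),
]

hits = screen(library)
print([c.name for c in hits])
```

In a real project each stage would be replaced by one of the tools listed in Table 1, and the compounds discarded at each step would be recorded as well, since negative results feed the next iteration.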
Figure 2. In silico tools that can be applied to TB drug design and development.
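Table 1. Free and commercially available programs, webservers, and source codes for SBDD and LBDD.

| Category | Program/Webserver Name | Availability |
|---|---|---|
| Comparative modeling |  | Free webserver |
| Comparative modeling |  | Free standalone program for academic license or commercially available through BIOVIA |
| Comparative modeling | Robetta [34] | Free webserver |
| Comparative modeling | Prime [35] | Commercially available through Schrödinger |
| Comparative modeling | I-TASSER [36][37][38][39][40][41] | Free webserver or standalone program for academic license |
| Comparative modeling | HHPred [42][43][44] | Free webserver |
| Structural geometry confirmation |  | Free webserver and source code |
| Structural geometry confirmation | ProSA [46] | Free webserver |
| Structural geometry confirmation |  | Free webserver |
| Structural geometry confirmation | ERRAT [48] | Free webserver |
| Druggability and binding site prediction | PockDrug [49] | Free webserver |
| Druggability and binding site prediction | DoGSiteScorer [50] | Free webserver |
| Druggability and binding site prediction | fpocket [51][52] | Free/open source platform |
| Druggability and binding site prediction | CASTp [53][54][55] | Free webserver |
| Druggability and binding site prediction | PocketQuery [56] | Free webserver |
| Druggability and binding site prediction | PASS [57] | Free/open source platform |
| Druggability and binding site prediction | SiteMap [58] | Commercially available through Schrödinger |
| Druggability and binding site prediction | ConCavity [59] | Free webserver |
| Druggability and binding site prediction | PrankWeb [60] | Free webserver |
| Druggability and binding site prediction | ProFunc [61] | Free webserver |
| Docking, pharmacophore, and virtual screening | AutoDock [62] and AutoDock Vina [63] | Free standalone program |
| Docking, pharmacophore, and virtual screening | DOCK [64] | Free/open source platform |
| Docking, pharmacophore, and virtual screening | GOLD [65] | Commercially available through CCDC |
| Docking, pharmacophore, and virtual screening | Glide [66] | Commercially available through Schrödinger |
| Docking, pharmacophore, and virtual screening | Induced Fit [67] | Commercially available through Schrödinger |
| Docking, pharmacophore, and virtual screening | FlexX [68] | Commercially available through BioSolveIT |
| Docking, pharmacophore, and virtual screening | RosettaLigand [69] | Free/open source platform for academic license |
| Docking, pharmacophore, and virtual screening |  | Commercially available through BIOVIA |
| Docking, pharmacophore, and virtual screening | SwissDock [71][72] | Free webserver |
| Docking, pharmacophore, and virtual screening | Pharmer [73] | Free/open source platform |
| Docking, pharmacophore, and virtual screening |  | Commercially available through BIOVIA |
| Docking, pharmacophore, and virtual screening | PharmGist [75] | Free webserver |
| Docking, pharmacophore, and virtual screening | LigandScout [76] | Commercially available through Inte:Ligand |
| Docking, pharmacophore, and virtual screening | SwissSimilarity [77] | Free webserver |
| Docking, pharmacophore, and virtual screening | LEA3D [78] | Free webserver |
| Docking, pharmacophore, and virtual screening | PyRx [79] | Free (no support) or commercially available |
| Docking, pharmacophore, and virtual screening | Phase [80] | Commercially available through Schrödinger |
| Molecular dynamics | AMBER [81][82] | Commercially available |
| Molecular dynamics |  | Free or commercially available through CHARMM or BIOVIA |
| Molecular dynamics | CHARMMing [84] | Free webserver |
| Molecular dynamics | GROMACS [85][86] | Free/open source platform |
| Molecular dynamics | NAMD [87] | Free/open source platform |
| Molecular dynamics | Desmond [88] | Commercially available through Schrödinger |
| Molecular dynamics | SwissParam [89] | Free webserver |
| Molecular dynamics |  | Free webserver |
| Molecular dynamics | ParamChem CGenFF [91][92][93] | Free webserver |
| Molecular dynamics | VMD [94] | Free/open source platform |
| Molecular descriptors, fingerprints, and quantitative structure-activity relationship | Dragon [95] | Commercially available through Talete |
| Molecular descriptors, fingerprints, and quantitative structure-activity relationship | E-Dragon [96] | Free webserver |
| Molecular descriptors, fingerprints, and quantitative structure-activity relationship | Canvas [97] | Commercially available through Schrödinger |
| Molecular descriptors, fingerprints, and quantitative structure-activity relationship | RDKit [98] | Free/open source platform |
| Molecular descriptors, fingerprints, and quantitative structure-activity relationship | PyDescriptor [99] | Free/open source platform |
| Molecular descriptors, fingerprints, and quantitative structure-activity relationship | Mordred [100] | Free/open source platform |
| Molecular descriptors, fingerprints, and quantitative structure-activity relationship | Open3DQSAR [101] | Free/open source platform |
| Molecular descriptors, fingerprints, and quantitative structure-activity relationship | ChemSAR [102] | Free webserver |
| Molecular descriptors, fingerprints, and quantitative structure-activity relationship | SeeSAR [103] | Commercially available through BioSolveIT |
| Pharmacokinetic properties | QikProp [104] | Commercially available through Schrödinger |
| Pharmacokinetic properties | ADMET Predictor [105] | Commercially available through SimulationsPlus, Inc. |
| Pharmacokinetic properties | ACD Percepta [106] | Commercially available through ACD/Labs |
| Pharmacokinetic properties | FAF-Drugs4 [107] | Free webserver |
| Pharmacokinetic properties | PatchSearch [108] | Free webserver |
| Pharmacokinetic properties | TOPKAT [109] and ADMET [110] | Commercially available through BIOVIA |
| Pharmacokinetic properties | PASS Online [111] | Free webserver or commercially available standalone program |
| Pharmacokinetic properties | SwissADME [112] | Free webserver |
| Pharmacokinetic properties | MetaSite [113] | Commercially available through Molecular Discovery |
| Pharmacokinetic properties | ToxPredict [114] | Free webserver |
| Pharmacokinetic properties | VirtualToxLab [115][116][117][118] | Free standalone software |
| Pharmacokinetic properties | admetSAR [119][120][121] | Free webserver |
| Pharmacokinetic properties | MetaTox [122][123] | Free webserver |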

4. Edges and Pitfalls of In Silico Methods

There are roughly 2500 protein structures for tuberculosis in the PDB and perhaps thousands of published ligand candidates. All of this information is available with a few keystrokes and a click of the mouse. Along with existing technologies, it is now possible to analyze TB enzymes and lead candidates at the atomic level in order to understand their function and how to regulate them. While computational methods are now widely used in drug discovery because of their successful applications [124][125][126], it is still important to remember that these tools are like any experimental approach: prone to limitations that depend on the system and the various parameters being studied [127][128][129].
VS has been shown to successfully screen millions of compounds to identify potential inhibitors for a given target [124][126]. This saves cost, time, and effort in drug discovery projects, as only the most promising compounds are brought forward for more rigorous experimental testing and drug development. However, the optimization and validation of these methods are far from perfect and are highly dependent on the protein system and compound classes used, leading to possible bias in the computational model. Thus, it is challenging to determine which method has the advantage over another; many benchmark studies have been published on this matter [130][131]. Other major limitations include difficulties in incorporating protein flexibility and solvent effects due to the computational burden attached to these factors [31]. Fortunately, available technologies seem to be catching up, as enhanced sampling methods, high-performance computing (HPC), and MD platforms are now routinely applied in drug discovery projects and can reach simulation timescales of up to milliseconds for various protein targets [132][133][134]. The main advantage of ligand-based drug design is its simplicity and efficiency. Indeed, LBDD has a long history, and numerous candidates have been discovered even in the absence of protein structural information [135][136][137]. Nonetheless, several factors should be considered when applying ligand-based tools. Firstly, ligand alignments are based on the lowest-energy conformation, which is often different from the bioactive conformation [138][139], and on the assumption that all ligands bind in the same site and adopt the same conformation. Secondly, to be considered comparable, compounds should be evaluated by the same group (preferred) or tested using the same assay with the same parameters [140].
Thirdly, the basic premise that ‘similar structures display similar activities’ is contradicted by the existence of activity cliffs [141][142][143], and so care should be taken when selecting potential candidates from a pool of virtual hits. Finally, it is also a challenge to incorporate the effects of solvation and protein flexibility due to the nature of the analysis.
As mentioned in the previous section, the integration of several in silico methods has become common practice when designing and optimizing lead candidates, in order to overcome the shortcomings of each individual tool. Despite requiring more computational resources, combining computational methods results in better accuracy and enrichment of hits. In addition, the combination of a researcher’s own knowledge with the computational efficiency of these tools is perhaps the best integration of all, as the human touch continues to be irreplaceable in the interpretation of all the data produced by in silico methods.
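The similarity premise and its failure at activity cliffs can be illustrated with a small calculation. The sketch below assumes that each compound is represented by a set of "on" fingerprint bits and compares compounds with the Tanimoto coefficient, a common (but not the only) similarity measure. The bit sets, pIC50 values, both thresholds, and the function names are invented for illustration and do not come from the cited studies.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def activity_cliffs(compounds, min_similarity=0.8, min_delta=2.0):
    """Return pairs that are structurally similar but differ strongly in activity.

    compounds maps name -> (fingerprint bit set, pIC50).
    Both thresholds are assumptions made for this example.
    """
    cliffs = []
    for (name_a, (fp_a, act_a)), (name_b, (fp_b, act_b)) in combinations(compounds.items(), 2):
        similarity = tanimoto(fp_a, fp_b)
        delta = abs(act_a - act_b)
        if similarity >= min_similarity and delta >= min_delta:
            cliffs.append((name_a, name_b, round(similarity, 2), round(delta, 2)))
    return cliffs


# Hypothetical ligands: lig_1 and lig_2 differ by a single fingerprint bit
# but by almost three log units of activity.
compounds = {
    "lig_1": ({1, 2, 3, 4, 5, 6, 7, 8, 9}, 8.1),
    "lig_2": ({1, 2, 3, 4, 5, 6, 7, 8, 10}, 5.3),
    "lig_3": ({1, 2, 20, 21, 22}, 7.9),
}

print(activity_cliffs(compounds))
```

A similarity search seeded with lig_1 would rank lig_2 highly despite its much weaker activity, which is why virtual hits selected on similarity alone warrant a second look.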
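One way to picture how combining methods can affect hit enrichment is rank-based consensus scoring followed by an enrichment factor calculation. This is only one of several possible combination schemes and is not taken from the studies cited here. The two score lists, the set of known actives, and the function names are hypothetical, and the toy data were constructed so that the two methods complement each other.

```python
def ranks(scores, higher_is_better):
    """Map each compound to its rank (1 = best) under one scoring method."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: position for position, name in enumerate(ordered, start=1)}


def consensus_order(docking, similarity):
    """Order compounds by the mean of their docking rank and similarity rank."""
    dock_rank = ranks(docking, higher_is_better=False)   # more negative = better
    sim_rank = ranks(similarity, higher_is_better=True)  # more similar = better
    # sorted() is stable, so ties keep the order of the docking dictionary.
    return sorted(docking, key=lambda n: (dock_rank[n] + sim_rank[n]) / 2)


def enrichment_factor(ordered, actives, fraction):
    """Hit rate in the top fraction divided by the hit rate in the whole list."""
    n_top = max(1, round(len(ordered) * fraction))
    hits_top = sum(1 for name in ordered[:n_top] if name in actives)
    return (hits_top / n_top) / (len(actives) / len(ordered))


# Made-up scores for eight compounds, two of which are assumed to be true actives.
docking = {"c1": -9.5, "c2": -9.0, "c3": -8.4, "c4": -7.9,
           "c5": -7.1, "c6": -6.8, "c7": -6.0, "c8": -5.2}
similarity = {"c1": 0.22, "c2": 0.82, "c3": 0.30, "c4": 0.77,
              "c5": 0.41, "c6": 0.25, "c7": 0.52, "c8": 0.20}
actives = {"c2", "c4"}

docking_only = sorted(docking, key=docking.get)
consensus = consensus_order(docking, similarity)

print(enrichment_factor(docking_only, actives, 0.25))
print(enrichment_factor(consensus, actives, 0.25))
```

With these invented numbers the consensus list places both actives in the top quarter, whereas docking alone recovers one. Real benchmarks do not always behave this way, so the example shows the mechanics of the calculation rather than evidence for the claim.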


