Mathematical Modeling and Computational Tools in Drug Design

Mathematical Modeling and Computational Tools in Drug Design: Comparison

Please note this is a comparison between Version 1 by Foysal Ahammad and Version 3 by Md Afsar Ahmed Sumon.

The conventional drug discovery approach is an expensive and time-consuming process, but its limitations have been overcome with the help of mathematical modeling and computational drug design approaches. Previously, finding a small molecular candidate as a drug against a disease was very costly and required a long time to screen a compound against a specific target. The development of novel targets and small molecular candidates against different diseases including emerging and reemerging diseases remains a major concern and necessitates the development of novel therapeutic targets as well as drug candidates as early as possible. In this regard, computational and mathematical modeling approaches for drug development are advantageous due to their fastest predictive ability and cost-effectiveness features. Computer-aided drug design (CADD) techniques utilize different computer programs as well as mathematics formulas to comprehend the interaction of a target and drugs. Traditional methods to determine small-molecule candidates as a drug have several limitations, but CADD utilizes novel methods that require little time and accurately predict a compound against a specific disease with minimal cost.

mathematical modeling
CADD
QSAR
MM-GBSA
MM-PBSA
pharmacophore modeling
MD simulation
biological activity
drug design

1. Introduction

A drug is a type of natural or synthetic chemical that is used to prevent, treat, or diagnose disease [1]. It can be able to alter the function of a biological system or target from the molecular to the cellular level. Drug discovery helps to determine new therapeutic candidates by using different computational, experimental, and clinical models. The integrated approaches led to the identification of novel drugs not only from plants but also from other chemical sources [2]. Although various therapeutic compounds originating from plant products are highly regarded, synthetic chemistry and biotechnology products account for the majority of medications in the current medical system [3]. The subject of drug development is exceedingly difficult and needs proper infrastructure and laboratory resources. Unfortunately, the traditional strategy of discovering new drug compounds is a time-consuming process that can take up to 10–15 years and can cost up to USD 2.558 billion to bring a therapeutic to market [4]. This is a multistage and complex process that begins with the identification of an appropriate drug target, followed by drug target validation, hit-to-lead identification, and lead molecule optimization, as well as preclinical and clinical research [5]. Despite the huge financial and time commitments required for medication development, clinical trial success is just 13%, with a high drug attrition rate [6].

A mathematical model is a powerful representation of a biological system that uses mathematical ideas and language to produce an accurate description of the system of principles [7]. The model helps in determining the operation process as well as anticipating certain influencing factors and enables the simulation of complex biological processes that generate hypotheses and suggest experiments [8]. The model also known as forecasting modeling is now frequently used to guide drug development at the industrial level. For example, simulation is the more direct approach that utilizes a mathematical model and predicts system behavior under given conditions [9]. Mathematical model-based biological complex system analysis has high productivity and low cost. The process generates novel lead compounds that undergo clinical trials and reach the market [10]. Most of the major obstacles that arose during the conventional drug design and discovery process may be overcome by employing mathematical models [11]. These models are now being utilized in in silico research to describe various pharmacological properties of potential medicinal drugs [12]. For example, the FDA’s Center for Drug Evaluation and Research (CDER) uses modeling and computer simulations at various phases of drug discovery [13].

Currently, CADD has proven to be a useful and powerful strategy in the manufacture of various medicines [14]. The approach has assisted in overcoming the drawbacks of a time-consuming and expensive procedure in drug research and development [15]. In the latest drug design process, the in silico approach is more important than before. CADD methods such as pharmacophore modeling, virtual screening, molecular docking, and dynamic simulation are frequently applied to identify, develop, and evaluate medicinal properties as well as comparable physiological activity of substances [16]. To quantify the binding efficiency and toxicity of a compound in the classical drug development process, massive in vitro and in vivo trials are required [17]. CADD techniques include a molecular docking methodology that can effectively categorize a large number of molecules with higher binding effectiveness [18]. The method can be used to identify the interaction between a ligand and a receptor at the atomic scale, which helps to identify the binding position of a molecular to a target protein and subsequently provides an idea about the biochemical process [19]. The technique also provides information regarding the target behavior and predicts how a protein (enzyme) interacts with small molecules (ligands) at the binding site of target proteins and facilitates the evaluation of the biological activity of a molecular candidate [20]. Additionally, the CADD approaches include different pharmacology properties analysis tools that can evaluate a compound’s pharmacokinetic (PK) parameters such as bioavailability, toxicity, and effectiveness within a short period. Furthermore, the CADD approaches also include molecular dynamics (MD) simulation techniques that can determine a ligand’s binding stability towards its receptor, which is more suitable and accurate [21].

2. Target Identification

The early stages of drug discovery probably start with target selection and later move to lead optimization. In the process of potential disease, target discovery is dependent on a variety of resources, involving academic studies, clinical investigations, and the business sector. The pharmaceutical industry, as well as numerous research organizations, use the designated target to locate molecules for developing authorized treatments [22]. Several preliminary stages are involved in this procedure. Throughout the process of target identification and validation, researchers search for chemicals to disrupt a particular biological path that is connected to a certain illness [23]. These compounds can be found in nature, identified through high-throughput screening of large compound libraries, or synthesized as analogs of other drugs that have been proven to be effective against a specific disease. The initial stages in target classification and identification are to determine the function of a possible therapeutic target (which may be a gene or protein) and its involvement in the illness [24]. The molecular processes addressed by the objective are characterized by the following target identification. A good target must always be productive, safe, suitable, and druggable, and it must fulfill clinical and financial requirements. Target identification may be divided into two types: the system biology approach and the molecular biology approach. The system biology approach is a technique that involves studying diseases in complete organisms and selecting targets based on data from clinical trials and in vivo animal research [25]. The molecular biology method, which is at the heart of today’s target identification efforts, aims to find “druggable” targets whose activity may be influenced by associations with molecules, proteins, and sometimes antibodies. Since the biological factors involved in human diseases are so complicated, the foremost essential issue in target identification is not only identifying, optimizing, and choosing trustworthy “druggable” targets, but also truly comprehending the cell membrane associations that identify disease patterns, developing predictive models, and building biological mechanisms for human diseases [26]. For example, G-protein-coupled receptors (GPCRs) and protein kinases are highly “druggable” targets that were identified throughout the molecular biology-based methods [27]. Network-based drug discovery, a field that utilizes information in drug–protein and protein–disease networks, may also be used to study target identification [28]. This strategy entails a highly collaborative scheme between databases and correlations across genomics, transcriptomics, proteomics, metabolomics, the study of the microbiome, and pharmacogenomics, and it is heavily reliant on the development of relevant mathematical, computational, and systems biology tools that connect pharmacological and genomic domains and create computational frameworks for drug target discovery [29]. Another recent network-based application was the combination of large-scale structural genomics and disease association studies to produce a three-dimensional human interactome, which resulted in the identification of candidate genes for previously unknown disease-to-gene associations with proposed molecular mechanisms.

3. Mathematical Models in Drug Design

Mathematical techniques for drug discovery have a high value because of their potential effect and low cost compared to preclinical studies [30]. The employment of mathematical models, as well as computer simulations, has several advantages. It can be very helpful for systematically determining the relevance of a specific target or pathway for the overall behavior of the system. First, the inconsistencies between the behavior forecasted by a mathematical model and the behavior observed in actual trials might point to missing components, in which the mathematical model allows for a briefer image of a biological mechanism to develop. Although it is not clear which compounds are absent from the system under review, the mathematical model research findings may be used to influence the construction of additional investigations to address the problem. In addition, mathematical models enable a systematic analysis of system fluctuations triggered by the delivery of drugs [31]. However, it is difficult to represent real-world systems such as biological systems in terms of mathematical relationships [32]. Figure 1 shows the process of the use of a mathematical model in the drug design process.

Figure 1. A schematic representation of a mathematical model, including experimental design, experimental data analysis, model optimization, and model validation, used in modern drug design approaches.

Pharmacokinetic and pharmacodynamic analyses are the earliest and most widely used forms of mathematics in drug design. Pharmacokinetics is the study that describes how drug concentrations change over time, whereas pharmacodynamics explains how drug effects fluctuate with concentration. Pharmacokinetics depict a possible drug’s concentration in the appropriate organ compartments (e.g., circulating blood). Pharmacodynamic models relate this concentration to a biomarker that is thought to be linked with a disease state, often considering the modification of the pharmaceutical target [33]. Cancer research is a good example of how mathematical models are used in drug discovery. One of the most widely employed mathematical models in cancer treatment research is integrated into network-based medicine [34]. Network medicine is a discipline of medicine that explores molecular and physiological links with therapeutic implications. Infectious diseases, such as malaria, are another instance of a mathematical model application in drug innovation [35]. In this situation, mathematical models may be employed to evaluate the prospective drug’s capacity to destroy the parasite at a different phase of the disease. Compound pharmacokinetics and compound pharmacodynamics are used in such models. COVID-19, an infectious viral disease, is the most recent example of how mathematical models are employed in drug discovery [36].

4. Protein Structure Prediction

Proteins are vital molecules that are involved in a variety of biological activities. Protein structure prediction or modeling is critical since a protein’s activity is largely determined by its three-dimensional structure. Furthermore, a protein’s 3D structure is determined by its amino acid composition. Experiments using X-ray crystallography or NMR spectroscopy to resolve protein structure are time-consuming, expensive, and complex ^[37][42]. Consequently, theoretical knowledge of protein structure, dynamics, and folding has been used to construct a model from amino acid sequences due to the improvement of computer methods and computational tools. The approaches for predicting protein structure may be divided into three categories (Figure 2): (a) homology modeling; (b) threading; (c) ab initio methods (de novo).

Figure 2. Representation of the protein structure prediction methods: (a) homology-based approach; (b) threading approach; (c) ab initio approach.

The most effective computer technique for protein structure prediction is homology modeling, which involves predicting an unknown structure using a similar known protein structure as a framework ^[38][43]. An ideal therapeutic simulation of a protein may be built by assigning a structure based on sequence alignment and then creating the model and minimizing energy. Despite homology modeling’s predictive potential and utility, some issues remain. Firstly, the amounts of target-template architectural conservation and alignment precision are key indicators of the model’s quality. If the identity of the template sequence is below 20%, around 50% of residues inside the layout are likely to be misaligned. Another concern includes that homology modeling systems should develop innovative ways to manage the expanding number of existing protein molecules. To date, different homology modeling tools has been developed and the most frequent use tools use for the modeling has been listed in Table 1. Threading a sequence throughout a fold involves a precise adjustment of the protein’s amino acid sequence with the folding motif’s corresponding amino acid residue residues. The main goal of this technique is to determine the most possible fold from a given sequence or to find appropriate sequences that might fold into a certain structure. Threading performance is characterized by the number of useable folds whose structures are determined precisely towards the atomic level ^[39][44]. Threading processes, which use approaches for aligning sequences with 3D shapes to determine the proper folding of a given sequence from a range of possibilities, were used to make the predictions. In absence of an experimentally solved structure of a similar/homologous protein, ab initio (de novo) protein structure prediction is a technique for evaluating the three-dimensional structure, when an experimentally solved structure of a similar/homologous protein is not present. The energy function guides the construction of protein structure in this strategy. The ab initio (from scratch) methodologies are based on first-principles physics and chemistry regulations, as well as the premise that a protein’s natural structure always remains at the lowest energy level ^[40][45]. However, the precision of ab initio modeling is poor, and performance is generally limited to tiny proteins (120 residues).

Table 1.

Summary of the most widely recognized homology modeling tools use in drug development.

No	Name	Application	Availability	Reference
1.	I-TASSER	Reassembling fragment structure via threading	https://zhanggroup.org/I-TASSER/	^[41][46]
2.	SWISS-MODEL	Segment assembly/local similarity	https://swissmodel.expasy.org/	^[42][47]
3.	ESyPred3D	3D modeling, template identification, and alignment	https://www.unamur.be/sciences/biologie/urbm/bioinfo/esypred/	^[43][48]
4.	HH-suite	Template detection, alignment, 3D modeling	https://arquivo.pt/wayback/20160514083149/http:/toolkit.tuebingen.mpg.de/hhpred	^[44][49]
Nonlinear	Works well with nonlinear data	^[	⁶⁶	^][94]	5.	RaptorX	Protein 3D modeling, remote homology discovery, and binding site prediction	http://raptorx.uchicago.edu/	^[45][
5.	50	]
Support vector machine	6.	FoldX	Protein design and energy calculations	https://foldxsuite.crg.eu/	^[46][51]
7.	ROBETTA	Rosetta homology modeling and fragment assembly from scratch with Ginzu domain prediction	http://robetta.bakerlab.org/	^[47][52]
8.	BHAGEERATH-H	Methods of ab initio folding and homology are combined	http://www.scfbio-iitd.res.in/bhageerath/bhageerath_h.jsp	^[48][53]
9.	Prime	Homology modeling, evaluation, and refining of the produced model using the energy function	https://www.schrodinger.com/prime	^[49][54]
10.	LOMETS	Tertiary structure prediction with a local meta-threading server	https://zhanglab.ccmb.med.umich.edu/	^[50][55]

5. Computer-Aided Drug Design

Computer-aided drug design methods have been applied in the field of drug development over the past two decades ^[51][56]. Currently, this is seen as one of the best appropriate alternatives to high-throughput screening, which is routinely used in drug design and development. CADD may be used for all efforts that have been made throughout the process of drug development that can be described mathematically and analyzed using numerical methods ^[52][57]. Figure 3 demonstrates the basic CADD approach that may be utilized interactively with experimental methodologies to find novel drug targets and direct iterative ligand optimization. Structure-based and ligand-based drug design techniques are two types of CADD that have been widely used throughout the development of drugs process to find acceptable lead compounds. The CADD approaches help to expedite the drug discovery and development process by minimizing the cost and time ^[53][58]. However, if the computer system crashes unexpectedly, the CADD designs might be lost. If proper precautions are not performed, viruses will infect the computer system.

Figure 3. Representation of the basic workflow of computational drug design approaches. The CADD approaches include structure- and ligand-based drug design approaches, pharmacophore modeling, virtual screening, molecular docking, ADMET, dynamics simulation, and MM-GBSA or MM-PBSA approaches.

5.1. Structure-Based Drug Design

Structure-based drug design (SBDD) (or direct techniques) can be used if the target’s spatial structure is available. Compounds with qualities complementary to the target area can be created based on the properties and features of the macromolecule’s spatial structure. X-ray crystallography, NMR, and in silico homology-based prediction approaches can all be used to determine a protein’s 3D structure. The protein’s binding/active site is discovered when the three-dimensional structure is understood. Structure-based pharmacophore modeling, virtual screening (SBVS), molecular docking, and molecular dynamics (MD) simulations are some of the typical methodologies used in SBDD.

5.1.1. Structure-Based Pharmacophore Modeling

The pharmacophore features are discovered by utilizing the shape of the complicated molecular target ^[54][59]. The characteristics are founded on a single X-ray crystallized target–ligand complex. The pharmacophore characteristics are built using a single ligand as well as its associations with the specific target protein. The fundamental contrast between ligand-based and structure-based approaches is the number of ligands utilized to construct the pharmacophore. The ligand-based technique necessitates at least 30 actives, whereas the structure-based method necessitates only one ligand and its connection with the receptor. Furthermore, the pharmacophore technique is derived from an active site of the ligand. Another method for creating a structure-based pharmacophore is to employ an APO template in such a way that the active site amino acids are determined and then develop a feature list based on their interaction properties that may be included in the pharmacophore ^[55][60]. The only drawback is when the list predicts too many features (more than seven features).

5.1.2. Pharmacophore Model Validation

Structure-based pharmacophore modeling can be employed efficiently when there is inadequate information on ligands that have been empirically proven to inhibit or stimulate the activity of a certain therapeutic target ^[56][61]. Validation is required to obtain an accurate pharmacophore analysis and to analyze the molecular model’s quality. Pharmacophore methods focused on appropriate correlation coefficients (R) might be validated in three main steps: Fisher’s randomization test, test set prediction, and Guner–Henry (GH) scoring technique.

Fisher’s Randomization Test

Fisher’s randomization approach is critical for establishing a link between structural and biological functionality in training set molecules ^[57][62]. The relevant experimental data linked with the training dataset are randomly changed to make them statistically irrelevant. The randomized dataset is then used to construct assumptions using the same characteristics and variables that were used to develop the original hypothesis. This randomization approach validated the drug-tested pharmacophore hypothesis by selecting 95% confidence levels, which resulted in 19 random spreadsheets. The randomized dataset should give equivalent or higher cost values, improved RMSD, and significant correlations for successful pharmacophore development.

Test Set Prediction

The goal of the pharmacophore method is to anticipate not only the behavior of molecules in the training dataset, but also the activity of external molecules. The correlation value between the experimental and forecasted behavior of external molecules that were excluded from the training dataset was predicted using test set prediction. This metric determines the predictability of pharmacophores’ stability (free of errors). In this technique, the behavior of the test set components has a higher correlation coefficient, which has a 95% confidence level ^[58][63].

5.2. Ligand-Based Drug Design

Ligand-based drug design is considered an indirect technique because the structure of the biomolecular target is unknown and cannot be anticipated using approaches such as homology modeling ^[59][87]. The most significant and highly used methods in ligand-based drug discovery are 3D quantitative structure–activity relationships (3D QSARs) and pharmacophore modeling, both of which can supply vital knowledge regarding the nature of connections between drug targets and ligand compounds as well as computer simulations suitable for lead compound optimization ^[60][88]. The most crucial aspects of the interaction nature are preserved, but the noise of extra information is eliminated.

5.2.1. Quantitative Structure–Activity Relationship (QSAR) Models

Structure–activity analysis relationship models depict the overall mathematical relationship between a collection of chemicals’ structural properties and target response ^[61][89]. The QSAR model has been successfully employed to decrease the need for time-consuming, arduous, and expensive processes in innovative drug development during the last few decades, and it also performed well in terms of predicting physiochemical properties (Table 23). Regression techniques, artificial neural networks, principal component analysis (PCA), and partial least squares (PLS) can be used to determine these correlations. Multiple linear regression is a frequently used approach for establishing a link between active and multiple structural features. When a high number of structural features must be taken into account (for example, grid-based approaches in 3D QSAR), linear regression fails and a specialized method such as PCA or PLS is needed. The idea of multidimensional QSAR has been proposed in recent years ^[62][90]. Predicting the biological properties of chemical substances is more beneficial. HQSAR, G-QSAR, MIA-QSAR, and multitarget QSAR are all part of this process, which has had outstanding success in the new drug process. The two most essential methodologies suggested for developing pharmacological compounds are comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA). However, QSAR modeling has some limitations; for example, if the number of molecules in the training set is small, the data may not accurately reflect all of the properties, and therefore it cannot be used to forecast the most active compounds.

Table 23.

A list of techniques and mathematical equations used in QSAR modeling as well as drug design.

No	Techniques	Equation	Activity	Reference
1.	K-nearest neighbor	Linear	Simple	^[63][91]
2.	Multiple linear regression	Linear	Simple	^[64][92]
3.	Partial least squares	Linear	Performs effectively on data including a big dataset	^[65][93]
4.	Artificial neural network
Nonlinear	A most effective approach for classification and regression	^[	⁶⁷	^][95]
6.	Decision tree	Nonlinear	Extremely interpretable	^[68][96]
7.	Random forest	Nonlinear	A better and more reliable estimate	^[69][97]