1. AI in Drug Target Identification
Drug development is a long, expensive, and risky process that requires more than 10 years and USD 2 billion to bring a new drug to market
[1]. Fewer than 500 effective drug targets had been discovered by 2022, which is a negligible portion of the estimated druggable targets in humans
[2][3][4]. Target identification is one of the most important processes to determine the biological cause of a disease and provide effective therapies
[5]. It is the process of selecting appropriate biological molecules or cellular pathways that can be altered by drugs to achieve therapeutic benefits. The availability of biomedical data has increased in recent years, ranging from basic research into the causes of disease to clinical studies. However, this large amount of information poses challenges for data analysis in terms of scalability, data integration, data quality, noise, computational complexity, interpretability, and validation. AI can manage and analyse such complex networks of biological data. Recently, a promising method for target identification has been developed that combines multi-omics data with AI algorithms
[6]. Puna et al. combined various bioinformatics and DL-based models trained using disease-specific text and multi-omics data to prioritize treatable genes and identify potential therapeutic targets in amyotrophic lateral sclerosis (ALS), yielding 18 potential therapeutic targets for ALS. Leveraging more than 20 AI and bioinformatics models, PandaOmics assesses targets by considering their associations with specific diseases, along with information on druggability, developmental stage, and tissue specificity. By combining omics AI scores, text-based AI scores, finance scores, and key opinion leaders (KOL) ratings, PandaOmics could forecast the target genes linked to a particular disease through the use of sophisticated DL models and AI techniques
[7]. Validation of the AI-proposed therapeutic targets in an ALS-mimicking Drosophila model revealed eight previously unreported targets whose elimination significantly reversed ocular neurodegeneration. In the same therapeutic area, Zhang et al. developed an ML-based technique that identified KANK1 as a novel ALS-related gene and confirmed the neurotoxic consequences of KANK1 mutations, which were reproduced by CRISPR-Cas9 in human neurons
[8]. DeepDTnet was developed by Zeng et al. to aid in the in silico identification of molecular targets for already approved drugs. It is based on 15 different types of chemical, genomic, phenotypic, and cellular networks. In a mouse model of multiple sclerosis, one of the identified drugs, which targets human ROR-γt, demonstrated specific therapeutic benefit
[9].
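The idea of ranking targets by combining several evidence scores can be illustrated with a toy example. This is only a minimal sketch, not the PandaOmics scoring scheme: the gene names, score values, score categories, and weights are hypothetical placeholders, and a simple weighted mean is assumed as the combination rule.

```python
# Toy sketch: rank candidate targets by a weighted mean of evidence scores.
# Gene names, scores, and weights are hypothetical placeholders.

def composite_score(scores, weights):
    """Weighted mean of the evidence scores for one gene (each in [0, 1])."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def rank_targets(candidates, weights):
    """Return gene names sorted from highest to lowest composite score."""
    return sorted(
        candidates,
        key=lambda gene: composite_score(candidates[gene], weights),
        reverse=True,
    )


weights = {"omics": 0.5, "text": 0.3, "druggability": 0.2}
candidates = {
    "GENE_A": {"omics": 0.9, "text": 0.4, "druggability": 0.7},
    "GENE_B": {"omics": 0.6, "text": 0.9, "druggability": 0.2},
    "GENE_C": {"omics": 0.3, "text": 0.2, "druggability": 0.9},
}

ranking = rank_targets(candidates, weights)
print(ranking)
```

In practice the weights would be tuned or learned, and each score would itself come from a trained model rather than being set by hand.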
AI has received a lot of attention, and ML-based algorithms, especially DL techniques, have produced remarkable results in the pharmaceutical industry
[10]. Modern DL architectures such as generative adversarial networks (GANs), recurrent neural networks (RNNs), and transfer learning approaches are attracting more attention and are used in many more healthcare applications than older ML techniques
[11]. AI-powered large language models (LMs) help to speed up target identification and simplify searches. Automatic biomedical named entity recognition (BioNER), for example with a multitask learning neural network that shares character and word layers (MTM-CW), is a useful method for identifying chemicals, diseases, and genes mentioned in free-text documents
[12][13]. AI-driven LMs offer potential advantages, including the capability to analyse data and assist in the identification and prioritization of targets
[12].
Furthermore, Fabris et al. developed a DL approach with a new modular architecture that uses gene and protein features, such as gene ontology terms, protein–protein interactions, and biological pathways, to identify human genes associated with several age-related diseases
[14] (Figure 1).
Figure 1. Artificial intelligence employs machine learning and neural network techniques to effectively identify targets. Figure created with BioRender.com.
Synthetic data are data that have been intentionally created to resemble real-world patterns and characteristics. AI algorithms can be used to generate synthetic data to replicate different biological circumstances, allowing scientists to explore and test a wider range of possible outcomes
[15][16]. Additionally, the predictions of AI systems may be verified using synthetic data, which provides higher assurance for the target discovery process. AI algorithms fall into two groups: ML-based bioanalysis algorithms and network-based bioanalysis algorithms. These are commonly used for predictive cancer target identification and drug development. However, even with the use of ML algorithms, three significant obstacles to target identification and drug development for cancer remain
[17]. These are the lack of reliable data for validation, the difficulty of integrating disparate information, and DL models that are difficult to interpret
[18][19][20]. Although there has been significant improvement in this field, a lot more progress is required.
2. AI for Insightful MD Data Analysis
Molecular dynamics (MD) simulation is extensively employed in drug discovery, where it plays a crucial role in understanding molecular interactions and predicting drug behaviour. Its research applications include lipid membrane permeability, protein–ligand binding, protein–protein interactions, and partition coefficients. The advantage of MD is its ability to determine thermodynamic characteristics, such as the free energy of binding, in atomistic detail
[21]. Based on a general physical model of interatomic interactions, MD predicts how each atom in a protein or other molecular system will move over time
[22]. The fundamental concept of MD is that, given the positions of all atoms in a biomolecular system, the force exerted on each atom by all the others can be calculated. These forces are computed with a model called a molecular mechanics force field, which is typically fitted to the results of quantum mechanical calculations and of some experimental observations. Additionally, these simulations can predict at the atomic level how biomolecules respond to changes such as mutation, phosphorylation, protonation, or the addition or removal of a ligand
[23]. Traditional MD tools such as GROMACS 2021, Amber 20, NAMD 2.14, and GROMOS 56a7 are well established and used to study interactions between disease targets and drugs
[24][25][26][27]. With increased computing power and advanced software, MD is an effective method for studying systems of biomolecules and ligands; however, it has to deal with enormous amounts of data and extremely complex simulated systems
[28]. The massive amounts of data generated by MD can be processed quickly and efficiently using DL techniques. To speed up the analysis of MD data, DL can transform the high-dimensional structure space into a low-dimensional feature space. Applications of DL in molecular simulations include free energy surface extraction, kinetics, force fields, coarse-grained molecular dynamics, and thermodynamics, among others. ML models are important tools in the field of MD. With their ability to predict chemical properties such as binding affinity and solubility, random forest (RF) models are useful for handling intricate, high-dimensional data. RNNs are used to analyse MD trajectories, capturing temporal relationships and providing insights into dynamic processes. Because they are well suited to representing chemical structures, graph neural networks (GNNs) are a powerful tool for tasks such as drug–target binding and protein–protein interaction prediction. Meanwhile, GANs facilitate the creation of new molecular structures, which accelerates progress in materials research and drug development. Together, these ML models (GANs, GNNs, RNNs, and RFs) have transformed the field of molecular dynamics and offer valuable resources for understanding and investigating molecular behaviour
[29]. Plante et al. used a DL technique to process large MD trajectories and classify GPCR ligands as full, partial, or inverse agonists with good accuracy. The X, Y, and Z coordinates of the protein atoms in the MD trajectory were represented as the red, green, and blue (RGB) values of image pixels. The neural network was trained on these images to predict the class labels of GPCR ligands, and a confusion matrix was used to evaluate the classification.
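The coordinate-to-pixel encoding described above can be sketched in a few lines. This is an illustrative sketch only: the global min–max scaling to the 0–255 range, the function names, and the toy two-atom trajectory are assumptions and are not taken from the original study.

```python
# Sketch: encode MD frames as pixel rows, one (R, G, B) pixel per atom.
# The min-max scaling and the toy coordinates are illustrative assumptions.

def frame_to_pixels(coords, lo, hi):
    """Map each atom's (x, y, z) to an (R, G, B) tuple with 0-255 channels."""
    span = (hi - lo) or 1.0  # avoid division by zero for a degenerate range
    pixels = []
    for xyz in coords:
        pixel = tuple(
            max(0, min(255, round(255 * (c - lo) / span))) for c in xyz
        )
        pixels.append(pixel)
    return pixels


def trajectory_to_images(trajectory):
    """Convert a list of frames (lists of xyz tuples) into rows of pixels."""
    values = [c for frame in trajectory for xyz in frame for c in xyz]
    lo, hi = min(values), max(values)
    return [frame_to_pixels(frame, lo, hi) for frame in trajectory]


# Toy trajectory: two frames, two atoms each (arbitrary units).
trajectory = [
    [(0.0, 1.0, 2.0), (4.0, 2.0, 0.0)],
    [(0.5, 1.0, 2.0), (4.0, 2.5, 0.0)],
]
images = trajectory_to_images(trajectory)
print(images)
```

Stacking the pixel rows of successive frames gives an image-like array that a convolutional network can take as input in the same way as a photograph.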
In molecular docking, the scoring function (SF) is a mathematical function used to estimate the binding affinity between a ligand and a receptor. The SF takes into account molecular interactions such as van der Waals forces, electrostatic interactions, hydrogen bonds, and hydrophobic interactions. By guiding lead identification, virtual screening, and ligand optimization, it plays a critical part in the drug development process. ML-based scoring functions have several benefits over conventional ones. They use algorithms such as neural networks or RF to identify intricate patterns and correlations within the data, and their ability to handle high-dimensional feature spaces and capture non-linear relationships makes more accurate predictions possible. Conventional methods, on the other hand, rely on predefined mathematical formulas derived from empirical or physical principles and, despite their widespread use and valuable contributions, may not fully capture the intricacy of molecular interactions. In 2017, Ragoza et al. developed CNN scoring functions that automatically learn the key binding-related features of a protein–ligand interaction (PLI) from a full 3D description of the PLI. Their CNN scoring functions were optimized to distinguish known binders from non-binders, and correct from incorrect binding poses
[30]. An ML-based scoring method was created by Nguyen et al. in 2018 to select among the poses produced by GOLD, GLIDE, and AutoDock Vina. Using a targeting ligand, they created a training dataset of complexes through WPB. They then re-docked the ligands to the proteins of the selected complexes, used a CNN to capture topological features, and applied RF to learn from the biomolecular structure. The final predictions for this strategy were obtained by combining the energy values predicted by these two ML algorithms
[31].
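For contrast with the ML-based approaches, a conventional physics-based scoring function can be sketched as a sum of pairwise van der Waals (Lennard-Jones) and electrostatic (Coulomb) terms. This is a simplified illustration rather than the scoring function of any docking program: the single shared Lennard-Jones parameter set, the constant dielectric, the cutoff, and the one-atom "poses" are all assumptions chosen for brevity.

```python
import math

# Sketch of a conventional pairwise scoring function (lower = more favourable).
# Parameters (epsilon, sigma, dielectric, cutoff) are illustrative assumptions.

COULOMB_CONSTANT = 332.06  # kcal*angstrom/(mol*e^2)


def pair_energy(r, q1, q2, epsilon=0.2, sigma=3.5, dielectric=4.0):
    """Lennard-Jones plus Coulomb energy (kcal/mol) for one atom pair."""
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB_CONSTANT * q1 * q2 / (dielectric * r)
    return lennard_jones + coulomb


def score_pose(ligand_atoms, receptor_atoms, cutoff=8.0):
    """Sum pair energies over ligand-receptor atom pairs within the cutoff.

    Each atom is given as ((x, y, z), partial_charge), distances in angstrom.
    """
    total = 0.0
    for ligand_pos, ligand_q in ligand_atoms:
        for receptor_pos, receptor_q in receptor_atoms:
            r = math.dist(ligand_pos, receptor_pos)
            if r < cutoff:
                total += pair_energy(r, ligand_q, receptor_q)
    return total


receptor = [((0.0, 0.0, 0.0), -0.5)]
good_pose = [((4.0, 0.0, 0.0), 0.5)]   # favourable contact distance
clash_pose = [((2.0, 0.0, 0.0), 0.5)]  # atoms overlap sterically

score_good = score_pose(good_pose, receptor)
score_clash = score_pose(clash_pose, receptor)
print(score_good, score_clash)
```

The fixed functional form is what the text calls a pre-established formula; an ML scoring function replaces it with a model fitted to known complexes.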
3. Compound Screening with AI
The aim of drug discovery is to identify small molecules that can alter the function of the target protein and the phenotypic characteristics of the disease. At the same time, these small molecules need good pharmacokinetic properties and minimal toxicity. The identification of drug candidates, their validation, pharmacokinetics, and preclinical toxicity assessment are difficult, time-consuming, and expensive processes. It takes an average of 10 to 12 years for a drug to reach the market, and between USD 800 million and USD 1.8 billion is spent on each successfully developed drug
[32][33]. The application of AI in drug screening has opened a new era in pharmaceutical research, with reported R&D cost reductions of 50% alongside gains in efficiency and accuracy
[34]. AI addresses various challenges related to drug screening, including predicting physicochemical properties and assessing bioactivity and toxicity.
3.1. Primary Drug Screening: Enhancing Cell Classification and Sorting
AI can help to ease the load of repetitive and challenging tasks, making the drug development process faster. These tasks involve various activities such as sorting cells, calculating properties of small molecules, using computers to create organic compounds, designing new compounds, developing tests, and predicting the 3D shapes of target molecules
[35]. Cell types can be identified accurately using AI-based approaches, particularly the least-squares support vector machine (LS-SVM). With a classification accuracy of 95.34%, this method makes a significant contribution to high-throughput automated cell sorting techniques
[35][36][37]. In addition, AI has shown promise in the computerized interpretation of electrocardiograms (ECG), which is a fundamental step in clinical diagnosis and treatment
[38].
Based on a CNN architecture, Nitta et al. developed the Intelligent Image-activated Cell Sorting system (iIACS), which can sort cells in real time based on cell images. The system combines a high-speed fluorescence microscope, a two-stage microfluidic chip with 3D hydrodynamic focusing, and a double membrane pump to enable automatic liquid focusing, real-time detection, and cell sorting. It allows blood cells and microalgae to be sorted in real time on the basis of intracellular protein localization and intercellular contacts. To obtain higher-quality cell images and reduce processing time, researchers modified the iIACS-based cell imaging approach with an image sensor-based optomechanical flow imaging method and improved the system hardware
[36].
3.2. Secondary Drug Screening
3.2.1. Predicting Physicochemical Properties
AI is well suited to secondary drug screening, where predicting physicochemical properties is crucial for drug development. Deep neural network (DNN) algorithms that use chemical descriptors such as SMILES strings and potential energy measurements are used to generate viable compounds
[39][40]. To predict chemical properties, these networks are divided into generation and prediction phases. Each stage is trained independently through supervised learning
[39]. The study highlights the importance of considering physicochemical parameters such as melting point and log P when selecting the best drug candidates. Using data from the Environmental Protection Agency (EPA) and the Estimation Program Interface (EPI) package, Yang et al. developed a quantitative structure–property relationship (QSPR) method to determine six physicochemical properties of environmental pollutants: the bioconcentration factor (BCF), log P, log S, melting point (MP), boiling point (BP), and vapour pressure (log VP). The lipophilicity and solubility of various substances were predicted using neural networks based on ALOGPS software (version 2.1) and ADMET Predictor. The solubility of compounds can also be predicted using DL techniques such as graph-based convolutional neural networks (CVNN) and recurrent neural networks on undirected graphs
[41]. Panapitiya et al. evaluated various molecular representation techniques (such as molecular descriptors, SMILES, molecular graphs, and 3D atomic coordinates) and deep learning techniques (such as fully connected neural networks, RNNs, graph neural networks, and SchNet) for solubility prediction. According to the authors’ results, which were based on the same test dataset, the fully connected neural network outperformed other neural networks in predicting solubility with chemical descriptors. In addition, the researchers examined the importance of different variables in prediction and found that two-dimensional molecular descriptors made the greatest contribution
[42].
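As a point of reference for how melting point and log P relate to solubility, the classical General Solubility Equation (GSE) can be written in a few lines. It is a simple empirical baseline and not one of the ML or DL models discussed above; the function name and the example values are illustrative, and the compound in the example is hypothetical.

```python
# Sketch: General Solubility Equation, a classical non-ML baseline.
# log S = 0.5 - 0.01 * (MP - 25) - log P, with MP in degrees Celsius
# and S the molar aqueous solubility.

def gse_log_s(log_p, melting_point_c):
    """Estimate log10 molar aqueous solubility from log P and melting point."""
    # Compounds that are liquid at 25 C receive no crystal-lattice penalty.
    melting_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * melting_term - log_p


# Hypothetical compound with log P = 2.0 and a melting point of 125 C.
example_log_s = gse_log_s(2.0, 125.0)
print(example_log_s)
```

Learned models such as those evaluated by Panapitiya et al. aim to improve on this kind of two-descriptor estimate by using richer molecular representations.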
3.2.2. Predicting Bioactivity: Optimizing Compound Activity
ML techniques, including matched molecular pairs (MMPs), RF, gradient boosting machines (GBMs), and DNNs, are used to predict chemical bioactivity. The combination of MMP with DNN performs particularly well, outperforming RF and GBMs
[43][44][45]. This method can predict multiple bioactivity characteristics including oral exposure, intrinsic clearance, ADME, and mode of action, which helps in decision-making in drug development
[46][47][48]. Koutsoukas et al. used a multilayer perceptron (MLP) model based on molecular fingerprint representations of the compounds to predict bioactivity values (pKi and pIC50) against seven targets, including two GPCRs, cannabinoid receptor 1 and dopamine receptor D4. Additionally, the researchers found that the MLP performs better than traditional ML techniques on large datasets. However, they pointed out that DL models require much finer tuning of hyperparameters to achieve high predictive performance
[49]. Stokes et al. proposed a directed message passing neural network capable of predicting antimicrobial activity. They built a molecular graph for each molecule from its SMILES string and generated a feature vector by combining atomic features (e.g., the atomic number and the number of bonds for each atom) with bond features (e.g., bond type and stereochemistry). This representation was refined by repeatedly performing the message passing process, and the resulting feature vector was fed into a feedforward neural network that predicted the antibacterial properties of the chemical
[50].
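The message passing idea can be illustrated with a deliberately simplified sketch. It is not the architecture of Stokes et al.: it passes messages between atoms on an undirected graph, uses plain summation instead of learned weights, and omits bond features. The atom features (atomic number and number of bonds) and the three-atom example graph are assumptions for illustration.

```python
# Simplified sketch of message passing on a molecular graph.
# Undirected, atom-based, and without learned weights (illustration only).

def message_passing(atom_features, bonds, steps):
    """Update each atom's features by adding the sum of its neighbours'."""
    neighbours = {i: [] for i in range(len(atom_features))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    hidden = [list(features) for features in atom_features]
    for _ in range(steps):
        updated = []
        for i, h in enumerate(hidden):
            message = [
                sum(hidden[j][k] for j in neighbours[i]) for k in range(len(h))
            ]
            updated.append([h[k] + message[k] for k in range(len(h))])
        hidden = updated
    return hidden


def readout(hidden):
    """Sum atom vectors into one molecule-level feature vector."""
    return [sum(column) for column in zip(*hidden)]


# Heavy atoms of ethanol (C-C-O); features = [atomic number, number of bonds].
atoms = [[6, 1], [6, 2], [8, 1]]
bonds = [(0, 1), (1, 2)]
molecule_vector = readout(message_passing(atoms, bonds, steps=2))
print(molecule_vector)  # [112, 24]
```

In a trained network the summation would be followed by learned transformations, and the molecule-level vector would be passed to a feedforward network that outputs the predicted activity.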
3.2.3. Toxicity Prediction: Mitigating Risk through AI
AI-based toxicity prediction plays a key role in evaluating drug safety. The DeepTox algorithm is effective in correctly assessing the toxicity of a substance
[51]. This capacity to forecast could prevent probable side effects during drug development. The application of open-source technologies such as TargeTox and PrOCTOR enhances researchers’ proficiency in the identification and resolution of toxicological issues, thereby advancing the field of toxicity prediction
[52][53]. However, the literature lacks essential information, even for basic molecular structures. This gap hinders even a basic environmental assessment of the synthesis route of a therapeutic molecule. To address this, simple algorithms interpolate the toxicity of a substance from structurally similar compounds with known toxicity. By leveraging AI, researchers can efficiently discover target-specific compounds, predict physicochemical properties, assess bioactivity, and predict compound toxicity.
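Interpolating toxicity from structurally similar compounds can be sketched as a similarity-weighted nearest-neighbour estimate. This is a minimal illustration under stated assumptions: structures are represented as sets of hypothetical substructure identifiers, similarity is measured with the Tanimoto coefficient, and the reference toxicity labels are invented for the example.

```python
# Sketch: similarity-weighted interpolation of toxicity from known compounds.
# Fingerprints are sets of hypothetical substructure IDs; labels are made up.

def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets (shared / total features)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def interpolate_toxicity(query, reference, k=2):
    """Similarity-weighted mean toxicity of the k most similar references.

    `reference` is a list of (fingerprint_set, toxicity_value) pairs.
    Returns None when no reference compound shares any feature with the query.
    """
    scored = sorted(
        ((tanimoto(query, fingerprint), toxicity)
         for fingerprint, toxicity in reference),
        reverse=True,
    )[:k]
    total_similarity = sum(similarity for similarity, _ in scored)
    if total_similarity == 0:
        return None
    return sum(s * tox for s, tox in scored) / total_similarity


reference = [
    ({1, 2, 3, 4}, 1.0),  # toxic
    ({1, 2, 5}, 0.0),     # non-toxic
    ({7, 8}, 1.0),        # toxic, structurally unrelated to the query
]
query = {1, 2, 3}
estimate = interpolate_toxicity(query, reference)
print(estimate)
```

The estimate is only as good as the coverage of the reference set, which is why missing literature data limits this kind of approach.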
4. Drug Design and Optimization
The aim of drug design is to obtain small molecules that meet various criteria, including pharmacological efficacy, an appropriate safety profile, suitable chemical and biological properties, and sufficient novelty to secure intellectual property rights for commercial success
[54]. Computational tools have been crucial to drug discovery and have completely changed the way drugs are designed. However, traditional computational techniques still have several problems, including time requirements, computational cost, and reliability
[55][56]. AI could overcome all the obstacles associated with computational drug development, thereby increasing the usefulness of computational techniques in drug development.
4.1. AI in Molecular Design
The process of automatically suggesting novel chemical structures that align most effectively with a specified molecular profile is referred to as “de novo molecular design”
[57]. A virtual chemical library is created for computational testing, followed by synthesis and characterization. The variational autoencoder (VAE) and GANs are two significant technological advances in deep generative modelling
[58][59]. The “Molecular Autoencoder” studied by Gómez-Bombarelli et al. was the first to demonstrate detailed generative modelling of molecules. It is based on the variational autoencoder, which couples two neural networks, an encoder and a decoder
[60]. The encoder network transforms chemical structures written as SMILES into a latent space represented by a real-valued continuous vector. The decoder network converts latent-space vectors back into chemical structures. De novo design has also been carried out successfully with RNNs, which were originally developed in the field of natural language processing
[61]. RNNs take sequential information as input and are the most commonly used models for modelling and generating sequences
[62][63]. A GAN was successfully employed by Kadurin et al. to recommend potential anticancer drugs
[64]. Assmann et al. discussed the practical difficulties of using de novo design to identify new CDK9 inhibitors. Using the molecules suggested by the molecular generator as seeds for a similarity search of the Enamine REAL library constitutes an improved variant of the virtual screening (VS) approach. Of the 69 compounds tested, seven were active against CDK9
[65]. Perron et al. have published another useful example of applying generative methods to find optimum solutions to multiparameter objectives using an RNN-based generative model
[66]. Li et al. examined the potential of RNN-based de novo design techniques to produce new inhibitors in chemical space that has already been thoroughly explored. They searched for new inhibitors of CDK4 and of the well-studied proto-oncogene serine/threonine kinase PIM1. After testing four compounds, they identified a potent PIM1 inhibitor and two key CDK4 inhibitor compounds
[67].
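Both the autoencoder and the RNN approaches described in this section start from SMILES strings, which must first be turned into numerical input. The sketch below shows one common way to do this, tokenization followed by one-hot encoding. The tokenizer is deliberately minimal and is an assumption for illustration: it handles bracket atoms and the two-letter elements Cl and Br only, and it is not the preprocessing used in any of the cited studies.

```python
# Sketch: turn a SMILES string into a one-hot matrix for a sequence model.
# Minimal tokenizer (bracket atoms, Cl, Br, single characters); illustrative.

def tokenize(smiles):
    """Split a SMILES string into tokens."""
    tokens = []
    i = 0
    while i < len(smiles):
        if smiles[i] == "[":
            end = smiles.index("]", i)  # bracket atom such as [NH3+]
            tokens.append(smiles[i:end + 1])
            i = end + 1
        elif smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def one_hot(tokens, vocabulary):
    """Encode tokens as rows with a single 1 at the token's vocabulary index."""
    index = {token: k for k, token in enumerate(vocabulary)}
    return [
        [1 if index[token] == k else 0 for k in range(len(vocabulary))]
        for token in tokens
    ]


tokens = tokenize("CC(=O)Cl")  # acetyl chloride
vocabulary = sorted(set(tokens))
matrix = one_hot(tokens, vocabulary)
print(tokens)
print(matrix)
```

An encoder network or RNN would consume the rows of this matrix one token at a time; a generative model produces new molecules by emitting tokens in the same vocabulary.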
4.2. Predicting Pharmacokinetics and Pharmacodynamics
The key concepts of pharmacology include pharmacokinetics and pharmacodynamics. Pharmacokinetics deals with drug absorption, distribution, metabolism, and excretion (ADME), whereas pharmacodynamics focuses on how a drug acts on the body and the effects it produces
[68]. The application of AI techniques in pharmacokinetics and pharmacodynamics has created new opportunities to improve drug development and personalized treatments. AI can analyse complex datasets, identify trends, and make predictions that could improve patient outcomes, enhance drug delivery, and minimize side effects
[69][70]. ML and DL techniques are widely used to predict pharmacokinetic parameters. Numerous ML techniques, including Bayesian models, RF, SVM, ANN, and decision trees, have been used to predict the ADME of drugs. DL algorithms such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and RNNs are often used to predict pharmacokinetic parameters such as drug absorption, bioavailability, clearance, volume of distribution, and half-life. Quantitative structure–activity relationship (QSAR) modelling is a computational method that uses the chemical structure of a molecule to predict its biological activity
[71][72][73]. With improved training data, admetSAR 2.0 is now available with 47 models. This program also includes a module called ADMETopt, which is used for lead optimization based on predicted ADMET properties
[74]. AI techniques facilitate the modelling of drug–receptor interactions and the prediction of drug efficacy and toxicity in the field of pharmacodynamics. The use of AI in pharmacokinetics and pharmacodynamics can significantly accelerate the drug discovery process and improve precision medicine
[70][75]. Obrezanova et al. used conventional ML techniques and multitask convolutional neural networks to calculate time-dependent pharmacokinetic profiles and nine in vivo pharmacokinetic parameters in rats (oral and intravenous administration) based on in vitro measured ADME properties and molecular chemical structures of 3000 different compounds
[76]. Ye et al. used transfer learning and multitask learning, pre-training the model on over 30 million bioactivity data points. The model was then used to estimate four human pharmacokinetic parameters, namely oral bioavailability, plasma protein binding, volume of distribution (Vd), and half-life, for 1104 FDA-approved small-molecule drugs. Compared with traditional ML techniques, their DL model showed the highest performance (although not always by a significant margin) and generalization ability, achieving a mean absolute error (MAE) of 0.31 for oral bioavailability and 0.17 for Vd [77]. Interestingly, Lou et al. created a model that predicts the bioavailability in humans of monoclonal antibodies (mAbs) administered subcutaneously. A dataset of 45 clinical mAbs was used to build a classification model (with a threshold of 70% bioavailability) from sequence- and structure-based features, including isoelectric point, total charge, aggregation propensity, solubility score, surface hydrophobicity spots, positive charge, and negative charge. The study used a range of traditional scikit-learn ML techniques such as AdaBoost, multilayer perceptron, RF, and SVM. Among them, the tree approach showed the highest accuracy, reaching 78%
[78].
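To show how the pharmacokinetic parameters predicted by these models relate to one another, the sketch below uses the standard one-compartment model with an intravenous bolus dose, in which clearance and volume of distribution together determine the half-life and the concentration–time curve. The dose and parameter values are hypothetical and chosen only to give round numbers.

```python
import math

# Sketch: one-compartment pharmacokinetic model, intravenous bolus dose.
# Dose, volume of distribution, and clearance below are hypothetical values.

def elimination_rate(clearance, volume_of_distribution):
    """First-order elimination rate constant k = CL / Vd (per hour)."""
    return clearance / volume_of_distribution


def half_life(clearance, volume_of_distribution):
    """Elimination half-life t1/2 = ln(2) * Vd / CL (hours)."""
    return math.log(2) * volume_of_distribution / clearance


def concentration(dose, volume_of_distribution, clearance, time_h):
    """Plasma concentration C(t) = (dose / Vd) * exp(-k * t)."""
    k = elimination_rate(clearance, volume_of_distribution)
    return (dose / volume_of_distribution) * math.exp(-k * time_h)


dose_mg = 100.0
vd_litres = 50.0
clearance_l_per_h = 5.0

t_half = half_life(clearance_l_per_h, vd_litres)
c_start = concentration(dose_mg, vd_litres, clearance_l_per_h, 0.0)
c_at_half_life = concentration(dose_mg, vd_litres, clearance_l_per_h, t_half)
print(t_half, c_start, c_at_half_life)
```

With these values the initial concentration is 2 mg/L and falls to half of that after one half-life, about 6.9 hours. A model that predicts Vd and clearance from chemical structure therefore also fixes the expected exposure profile under this simple model.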
Drug design and optimization are two areas that benefit greatly from AI algorithms, with applications including de novo design, virtual screening, and structure-based drug design. The application of AI to drug development and optimization has a transformative impact on the discipline, enabling the rapid discovery of new therapeutic candidates and the more targeted and effective exploration of chemical space.