1. Bibliographical Sources and Virtual NP Databases
Chemical libraries encompassing millions of compounds include the Chemical Abstracts Service (CAS) REGISTRY database (
http://www.cas.org/expertise/cascontent/registry/index.html, accessed on 20 December 2022), which is updated on a daily basis and contains >250,000 NPs out of >150 million chemical substances, PubChem (including PCSubstance, PCCompound, and PCBioAssay)
[1], ChEMBL (a manually curated database of >2,300,000 bioactive molecules with drug-like properties, last update July 2022)
[2], and ChemSpider (with various levels of partial to complete stereochemistry)
[3]. The free-to-access resource DrugBank is a web-enabled database (
https://go.drugbank.com/, accessed on 20 December 2022) that incorporates comprehensive molecular information about drugs, their mechanisms, their interactions, and their targets. First described in 2006 as a knowledgebase for drugs, drug actions, and drug targets
[4], DrugBank has evolved over time in response to improvements in web standards and changing needs for drug research and development. The latest update, DrugBank 5.0
[5], was expanded to cover not only drug binding data, numerous investigational drugs, drug-drug and drug-food interactions, and SNP-associated drug effects, but also information on the influence of hundreds of drugs on metabolite levels (pharmacometabolomics), gene expression levels (pharmacotranscriptomics), and protein expression levels (pharmacoproteomics). Enzyme inhibitors (DBCAT000003) are described as “compounds or agents that combine with an enzyme in such a manner as to prevent the normal substrate-enzyme combination and the catalytic reaction”.
Reviews on MNPs have been published on a regular basis in the scientific literature
[6][7][8]. The renewed upsurge of interest in NPs, and MNPs in particular, over the last two decades has led to a rapid multiplication of databases in both the private sector and the public domain that compile general-purpose or thematic information on these naturally occurring compounds, often incorporating supplementary material published in scientific papers. A dedicated, searchable, and continuously updated database (MarinLit,
https://marinlit.rsc.org/, accessed on 20 December 2022) that was established in the 1970s by Prof. John Blunt and Prof. Murray Munro (University of Canterbury, New Zealand) has been maintained by the Royal Society of Chemistry (UK) since 2014. MarinLit covers ~40,000 compounds from marine macro- and microorganisms and about the same number of references to journal articles. Among the specialized MNP databases, the Dictionary of Marine Natural Products (DMNP)
[9] appeared as the first of its kind in 2008 and encompassed a subset of data from the Dictionary of Natural Products (DNP, one of several Chapman & Hall chemical dictionaries) based on the biological source of the compounds. DMNP was marketed as a book together with a CD-ROM for a desktop version, and the searchable web-based version CHEMnetBASE (
https://dmnp.chemnetbase.com/, accessed on 20 December 2022) is still available (v. 31.1; updated in 2022), but only to subscribing institutions.
Virtual chemical libraries of NPs can be categorized into (i) encyclopedic and general NP databases; (ii) special subsets within fully enumerated, ultra-large scale chemical libraries specifically built to facilitate VS campaigns, e.g., ZINC
[10][11]; (iii) compound collections enriched with NPs used in traditional medicines; and (iv) specialized databases focused on specific habitats, geographical regions, organisms, biological activities, or even specific NP classes. Unfortunately, many NP databases belonging to the latter two categories are rather ephemeral or rapidly become either outdated or unavailable to the scientific community
[12], and the same criticism applies to many bioinformatics web services related to NPs
[13]. This is most likely due to (i) a lack of funds (and/or human resources) for their sustained management and continuous upgrading, and (ii) the current overwhelming “data deluge”. For these reasons, there is an urgent need for nonredundant, community-wide efforts that optimize the use of contemporary bioinformatic and chemoinformatic capabilities, as exemplified by the recently established open platform LOTUS (
https://lotus.naturalproducts.net, accessed on 20 December 2022), a knowledgebase that is expected to have strong transformative potential for research on NPs and beyond
[14]. In this praiseworthy initiative, data sharing within the Wikidata framework broadens interoperability and facilitates access to >750,000 referenced structure-organism pairs.
Another large and freely available NP database is Super Natural II (
https://bioinf-applied.charite.de/supernatural_new/index.php; last updated: October 2022, accessed on 20 December 2022), which provides two-dimensional (2D) structures and physicochemical properties for ~326,000 molecules, as well as information about the pathways associated with their synthesis, degradation, and mechanisms of action with respect to structurally similar drugs
[15]. An additional recent compilation of 400,000 non-redundant NPs was made available in 2021
[16] as the open-access COlleCtion of Open NatUral producTs (COCONUT,
https://coconut.naturalproducts.net/, accessed on 20 December 2022).
One important goal of these NP databases is to facilitate a quick assessment of novelty for any newly identified compound in a natural extract. To distinguish between known and unknown compounds, it is important to have rapid and trustworthy “dereplication” methods, which rely heavily on the interpretation of molecular mass and molecular formula, as well as UV and NMR spectral data
[17]. Nevertheless, dereplication can be problematic because (i) the validity and accuracy of the collected information are only as good as those of the original data source; and (ii) stereochemical information on NPs is often inaccurate or incomplete. In the field of MNPs alone, it was recently reported that more than 200 structures were misassigned over the last ten years
[18]. A comparative analysis of the original and the revised structures revealed that major pitfalls still plague the structural elucidation of small molecules and, consequently, that quite a few 3D molecular structures present in databases may be inaccurate. This finding emphasizes the roles of total synthesis, X-ray crystallography, as well as chemical and biosynthetic logic, to complement spectroscopic data. Nevertheless, it is noteworthy that a much lower incidence of “impossible” structures was found in MNPs compared to NPs of plant origin.
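The mass-based step of dereplication described above can be illustrated with a minimal sketch. It assumes a formula written without brackets or charges, a tiny hypothetical reference table (caffeine and theobromine stand in for a real NP database), and an arbitrary 5 ppm tolerance; the function names are illustrative and do not belong to any of the databases cited.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "S": 31.9720707}

def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C8H10N4O2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass

def dereplicate(observed_mass, reference, tol_ppm=5.0):
    """Return names of reference compounds whose mass lies within tol_ppm."""
    hits = []
    for name, formula in reference.items():
        expected = monoisotopic_mass(formula)
        if abs(observed_mass - expected) / expected * 1e6 <= tol_ppm:
            hits.append(name)
    return hits

# Hypothetical reference table; a real campaign would query an NP database.
reference = {"caffeine": "C8H10N4O2", "theobromine": "C7H8N4O2"}
print(dereplicate(194.0804, reference))
```

A mass match alone only narrows the list of candidates; as noted above, UV and NMR data are still needed to tell isomers apart.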
The utilization of computer-assisted structure elucidation (CASE) programs can minimize the risk of misassignment and help identify truly novel compounds (the “unknown unknowns”)
[19] by generating all structures that are consistent with key data from 2D correlation spectroscopy (COSY), heteronuclear multiple bond correlation (HMBC), and 1,1-adequate sensitivity double-quantum spectroscopy (ADEQUATE) NMR experiments, and by ranking the resulting structures in order of probability. The algorithms may additionally benefit from both stereospecific NMR data and the use of optimized geometries and predicted chemical shifts provided by density functional theory (DFT) quantum mechanical calculations
[20]. The absolute configuration of an MNP can be unequivocally confirmed by crystallographic analysis and, in the case of noncrystalline compounds containing a pseudo-meso core structure that results in a specific rotation ([α]D) of almost zero (e.g., elatenyne), it may be necessary to absorb the compound into a porous coordination network (a “crystalline sponge”)
[21].
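The idea of ranking candidate structures against predicted chemical shifts can be sketched as follows. This is a deliberately simple stand-in that orders candidates by mean absolute error; it is not the probabilistic ranking used by any particular CASE program, and all shift values and candidate names are hypothetical.

```python
def mean_absolute_error(predicted, experimental):
    """Average absolute deviation (ppm) between matched shift lists."""
    if len(predicted) != len(experimental):
        raise ValueError("shift lists must have the same length")
    total = sum(abs(p - e) for p, e in zip(predicted, experimental))
    return total / len(experimental)

def rank_candidates(candidates, experimental):
    """Sort (name, error) pairs from best to worst agreement with experiment."""
    errors = {name: mean_absolute_error(shifts, experimental)
              for name, shifts in candidates.items()}
    return sorted(errors.items(), key=lambda item: item[1])

# Hypothetical 13C shifts (ppm); atoms are listed in the same order throughout.
experimental = [170.1, 128.4, 77.2, 21.0]
candidates = {
    "candidate_A": [169.5, 129.0, 76.8, 21.4],
    "candidate_B": [175.3, 121.9, 70.1, 25.6],
}
ranking = rank_candidates(candidates, experimental)
print(ranking[0][0])
```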
The exploration of the identities and biological activities of metabolites present in complex mixtures has benefited enormously in recent years from scalable native and functional metabolomics approaches
[22]. Novel techniques, such as affinity selection mass spectrometry (MS), complemented with pulsed ultrafiltration, size exclusion chromatography, and magnetic microbead affinity selection screening, now allow the separation of non-covalent ligand-receptor complexes from other nonbinding compounds
[23].
Recognizing the need for community-wide platforms to effectively share and analyze raw, processed, or identified tandem MS (MS/MS or MS²) data of NPs, in an analogous fashion to what has been achieved in genomics and proteomics research with GenBank® at the National Center for Biotechnology Information (NCBI)
[24] and the UniProtKB
[25], the open-access knowledgebase known as Global Natural Products Social Molecular Networking (GNPS,
http://gnps.ucsd.edu, accessed on 20 December 2022) was presented in 2017
[26]. The spectral libraries enable unambiguous dereplication (by matching spectral features of the unknown compound(s) to curated spectral databases of reference compounds, i.e., identification of “known unknowns”)
[19], variable dereplication (approximate matches to spectra of related molecules), and the identification of spectra in molecular networks. Importantly, GNPS allows for the community-driven, iterative re-annotation of reference MS/MS spectra in a wiki-like fashion, and therefore it will contribute to library improvements and eventual convergence of all curated MS/MS spectra. The visualization of molecular networks in GNPS represents each spectrum as a node, and spectrum-to-spectrum alignments as edges (connections) between nodes.
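The node-and-edge representation described above can be sketched in a few lines. The example assumes a plain binned cosine score and an arbitrary similarity threshold of 0.7; the scoring actually used by GNPS is not specified in this text and may differ, and the three spectra are invented.

```python
import math
from itertools import combinations

def binned(spectrum, bin_width=0.5):
    """Map (m/z, intensity) peaks to integer bins, summing intensities."""
    vector = {}
    for mz, intensity in spectrum:
        key = int(mz / bin_width)
        vector[key] = vector.get(key, 0.0) + intensity
    return vector

def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def molecular_network(spectra, threshold=0.7):
    """Return (nodes, edges): one node per spectrum, edges above threshold."""
    vectors = {name: binned(peaks) for name, peaks in spectra.items()}
    edges = []
    for u, v in combinations(sorted(vectors), 2):
        score = cosine(vectors[u], vectors[v])
        if score >= threshold:
            edges.append((u, v, round(score, 3)))
    return sorted(vectors), edges

# Hypothetical MS/MS peak lists as (m/z, relative intensity) pairs.
spectra = {
    "s1": [(100.0, 1.0), (150.0, 0.5)],
    "s2": [(100.1, 0.9), (150.1, 0.6)],
    "s3": [(80.0, 1.0), (220.0, 0.8)],
}
nodes, edges = molecular_network(spectra)
print(nodes, edges)
```

Here s1 and s2 share both fragment bins and are joined by an edge, whereas s3 remains an isolated node.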
Secondary metabolites can be considered genetically encoded small molecules that play a variety of roles in cell biology and therefore have the potential to become chemical probes or drug leads. Their identification and characterization can benefit from a growing number of databases and genomics-based computational tools that have been compiled and hyperlinked at the Secondary Metabolite Bioinformatics Portal (SMBP,
http://www.secondarymetabolites.org/, accessed on 20 December 2022) website
[27]. Inherent limitations related to their low production and difficult detection, and also high rediscovery rates, can be addressed, at least in part, by searching for BGCs in genomic data and unveiling their (sometimes cryptic) metabolic potential
[28]. However, the highly repetitive nature of the associated genes creates major challenges for accurate sequence assembly and analysis, hence the need for new bioinformatic tools. An example is the Natural Product Domain Seeker (NaPDoS) web service (
https://npdomainseeker.sdsc.edu/napdos2/, accessed on 20 December 2022), which provides an automated method to assess the secondary metabolite biosynthetic gene diversity and novelty of strains or environments. NaPDoS analyses are based on the phylogenetic relationships of sequence tags derived from genes encoding PKS and NRPS. The sequence tags correspond to PKS-derived KS domains and NRPS-derived condensation (C) domains and are compared to an internal database of experimentally characterized biosynthetic genes, so that genes associated with uncharacterized biochemistry can be identified
[29]. The latest update (NaPDoS2) greatly expands the taxonomic and functional diversity represented in the webtool database and allows larger datasets to be analyzed. Importantly, NaPDoS2 can be used to detect genes involved in the biosynthesis of specific structural classes or new biosynthetic mechanisms, and also to predict biosynthetic potential
[30].
The key role of marine microbial symbionts of invertebrates in MNP biosynthesis has been increasingly recognized
[31] and “genome mining” (i.e., the exploitation of genomic information for the discovery of biosynthetic pathways)
[32] provides unique opportunities for (i) the identification of yet undisclosed specialized metabolites
[33] and their chemical variants
[32]; (ii) the genetic engineering of BGCs to obtain novel “unnatural” NPs
[34]; and (iii) the heterologous expression of secondary metabolic pathways that remain silent or are poorly expressed in the absence of a specific trigger or elicitor
[35]. In fact, the results of a variety of genome sequencing projects have unveiled the metabolic diversity of microorganisms (which may be overlooked under standard fermentation and detection conditions) and their tremendous biosynthetic potential. Furthermore, studies on the evolutionary history of BGCs in relation to that of the bacteria harboring them (“comparative genomics”) beautifully illustrate the mechanisms by which chemical diversity is created in nature and how some NPs represent ecotype-defining traits while others appear selectively neutral
[36].
Novel algorithms have been devised to systematically identify BGCs in microbial genomic sequences
[32][37][38]. A network analysis of the predicted BGCs in Proteobacteria (aka Pseudomonadota, a major phylum of Gram-negative bacteria) has revealed large gene cluster families, and the experimental characterization of the most prominent one revealed two subfamilies consisting of hundreds of BGCs encoding the biochemical machinery for the synthesis of a series of remarkably conserved lipids with an aryl head group conjugated to a polyene tail (i.e., aryl polyenes) that are likely to play important roles in Gram-negative cell biology
[39]. The systematic study of BGCs in Actinobacteria (actinomycetes mainly associated with sponges in marine habitats) is complicated by numerous repetitive motifs. By combining several metrics, a method for the global classification of these gene clusters into families (GCFs) has been developed, and the biosynthetic capacity of the resulting GCF network has been validated in hundreds of strains by correlating confident MS detection of known NPs with the presence or absence of their established BGCs
[40].
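The validation step, correlating MS detection of a known NP with the presence or absence of its BGC across strains, can be illustrated with a 2x2 contingency table and a phi coefficient. This is only one plausible way to quantify such an association; the metric used in the cited work is not stated here, and the strain data below are hypothetical.

```python
def cooccurrence(bgc_present, compound_detected):
    """Count strains in each cell of the BGC-vs-detection 2x2 table."""
    strains = sorted(set(bgc_present) & set(compound_detected))
    both = sum(1 for s in strains if bgc_present[s] and compound_detected[s])
    bgc_only = sum(1 for s in strains
                   if bgc_present[s] and not compound_detected[s])
    compound_only = sum(1 for s in strains
                        if not bgc_present[s] and compound_detected[s])
    neither = len(strains) - both - bgc_only - compound_only
    return both, bgc_only, compound_only, neither

def phi_coefficient(both, bgc_only, compound_only, neither):
    """Phi correlation for a 2x2 table; returns 0.0 when undefined."""
    numerator = both * neither - bgc_only * compound_only
    denominator = ((both + bgc_only) * (compound_only + neither)
                   * (both + compound_only) * (bgc_only + neither)) ** 0.5
    return numerator / denominator if denominator else 0.0

# Hypothetical strains: is the BGC in the genome, is the NP seen by MS?
bgc = {"st1": True, "st2": True, "st3": True,
       "st4": False, "st5": False, "st6": False}
detected = {"st1": True, "st2": True, "st3": False,
            "st4": False, "st5": False, "st6": False}
table = cooccurrence(bgc, detected)
print(table, round(phi_coefficient(*table), 3))
```

A strain such as st3, which carries the BGC but shows no detectable product, is the kind of case that may point to a silent or poorly expressed cluster.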
The Minimum Information about a Biosynthetic Gene cluster (MIBiG,
https://mibig.secondarymetabolites.org/, accessed on 20 December 2022) specification is a data standard that facilitates the consistent and systematic deposition and retrieval of metadata on BGCs and their molecular products
[41]. MIBiG is a Genomic Standards Consortium project that builds on the Minimum Information about any Sequence (MIxS) framework to (i) identify which genes are responsible for the biosynthesis of which chemical moieties, thus systematically connecting genes and chemistry; (ii) understand the natural genetic diversity of BGCs within their environmental and ecological context; and (iii) develop an evidence-based parts registry for engineering biosynthetic pathways and gene clusters through synthetic biology. The MIBiG standard contains dedicated class-specific checklists for gene clusters encoding pathways to produce alkaloids, saccharides, terpenes, polyketides, NRPs, and RiPPs
[42].
Natural antimicrobial peptides (AMPs) have been found not only in marine fish
[43][44] but also in marine invertebrates
[45][46] as major components of their innate host defense systems. The Antimicrobial Peptide Database (APD,
https://aps.unmc.edu/, accessed on 20 December 2022), online since 2003 and last updated in June 2022
[47], defines four unified classes of AMPs on the basis of the polypeptide chain’s connection patterns: (I) linear polypeptide chains (e.g., cathelicidins)
[48]; (II) sidechain-linked peptides, such as disulfide-containing defensins and lantibiotics (i.e., lanthionine-containing antibiotics, e.g., microbisporicin, produced by the soil actinomycete
Microbispora corallina [49] and mathermycin from the marine actinomycete
Marinactinospora thermotolerans [50]); (III) polypeptide chains with side chain to backbone connection (e.g., bacterial lassos and fusaricidins); and (IV) circular peptides with a seamless backbone, i.e., N- and C-termini linked by a peptide bond (e.g., plant cyclotides and animal θ-defensins)
[51]. The manually curated Database of Antimicrobial Activity and Structure of Peptides (DBAASP,
http://dbaasp.org, accessed on 20 December 2022) provides detailed information (including chemical structure and activity against specific targets) on experimentally tested peptides (both natural and synthetic) that have shown antimicrobial activity as monomers, multimers, or multi-peptides
[52]. The Collection of Antimicrobial Peptides (CAMP), CAMPSign, and ClassAMP are open-access resources that have been developed to advance the current understanding of AMPs, from N- and C-terminal modifications and the presence of unusual amino acids to 3D structures through family-specific signatures that facilitate AMP identification and classification as antibacterial, antifungal, or antiviral
[53][54]. Synthetic AMPs are substantially enriched in residues with physicochemical properties known to be critical for antimicrobial activity, such as high α-helical propensity, positive charge, and hydrophobicity.
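Two of the properties mentioned above, positive charge and hydrophobicity, are easy to estimate from sequence alone. The sketch below assumes a crude charge count near neutral pH (Lys/Arg +1, Asp/Glu -1, histidine and termini ignored) and the Kyte-Doolittle hydropathy scale; α-helical propensity is not computed, and the peptide is a made-up example rather than an entry from any of the databases cited.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def net_charge(sequence):
    """Crude net charge near neutral pH: +1 for K/R, -1 for D/E."""
    positive = sum(1 for residue in sequence if residue in "KR")
    negative = sum(1 for residue in sequence if residue in "DE")
    return positive - negative

def gravy(sequence):
    """Grand average of hydropathy (mean Kyte-Doolittle value)."""
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)

peptide = "KKLLKK"  # hypothetical cationic, amphipathic-style sequence
print(net_charge(peptide), round(gravy(peptide), 3))
```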
The Natural Products Atlas
[55] was created as an open-access centralized knowledgebase encompassing ~25,000 microbially produced NPs using a combination of manual curation and automated data mining approaches, and was developed as a community-supported resource under findable, accessible, interoperable, and reusable (FAIR)
[56] principles. It contains referenced data for molecular structure, source organism, isolation, total synthesis, and instances of structural reassignment for compounds of bacterial, fungal, and cyanobacterial origin. Its associated web interface (
https://www.npatlas.org, v. 2.3.0, accessed on 20 December 2022) allows users to search by structure, substructure, and physical properties, as well as to explore the chemical space of these NPs from a variety of perspectives. The NP Atlas is integrated with other NP databases, including the MIBiG repository and the GNPS platform cited above. The NP Atlas was recently updated
[57] and currently embodies (i) >32,000 compounds; (ii) a full RESTful (REST is an acronym for REpresentational State Transfer and an architectural style for distributed hypermedia systems) application programming interface (API); (iii) full taxonomic descriptions for all microbial taxa; (iv) integrated data from external resources, including CyanoMetDB (
https://www.eawag.ch/en/department/uchem/projects/cyanometdb/, accessed on 20 December 2022), a comprehensive public database of secondary metabolites from cyanobacteria (aka “blue-green algae”)
[58]; and (v) chemical ontology terms from both ClassyFire
[59] (see below) and NPClassifier (a deep-learning tool for the automated structural classification of NPs from their counted Morgan fingerprints)
[60].
Finally, more than seven terabases of metagenomic data from samples collected in epipelagic and mesopelagic water locations across the globe by the
Tara (
https://fondationtaraocean.org/en/foundation/, accessed on 20 December 2022) Oceans project have been used to generate an ocean microbial reference gene catalog (
http://ocean-microbiome.embl.de/companion.html, accessed on 20 December 2022) with >40 million nonredundant sequences from viruses, prokaryotes, and picoeukaryotes. Remarkably, almost three quarters of ocean microbial core functionality is shared with the human gut microbiome, and epipelagic community composition was found to be mostly driven by water temperature rather than geography or any other environmental factor
[61]. A more recent analysis of 214 metagenome-assembled genomes (MAGs) recovered from the polar seawater microbiomes revealed strains that are prevalent in the polar regions while nearly undetectable in temperate seawater
[62].
The long-established Gene Ontology (GO) resource
[63][64] describes knowledge of the “universe” of biology with respect to (i) molecular functions, (ii) cellular locations, and (iii) biological processes of gene products, in terms of a dynamic, controlled vocabulary that can be applied to prokaryotes and eukaryotes, as well as to single and multicellular organisms. In the same vein, a standardized and purely structure-based chemical ontology (ChemOnt) was recently developed to automatically assign over 77 million compounds to a taxonomy consisting of >4800 different categories by means of a computer program named ClassyFire (
http://classyfire.wishartlab.com/, accessed on 20 December 2022) that is freely accessible as a web server
[59]. This new taxonomy for chemical substances consists of up to 11 different levels (kingdom, superclass, class, subclass, etc.), with each of the categories defined by unambiguous, computable structural rules.
As a follow-on, the Chemical Functional Ontology (ChemFOnt), another FAIR-compliant, web-enabled resource (
https://www.chemfont.ca, accessed on 20 December 2022), describes the functions and actions of >341,000 biologically important chemical substances, including primary and secondary metabolites, as well as drugs and NPs. The functional hierarchy within ChemFOnt consists of four functional “aspects” (physiological effect; disposition; process; and role), which are subdivided into twelve functional categories (health effects and organoleptic effects; sources, biological locations, and routes of exposure; environmental, natural, and industrial processes; adverse biological roles, normal biological roles, environmental roles, and industrial applications) and a total of >170,000 functional terms. At the time of publishing, ChemFOnt contained almost four million protein-chemical relationships and more than ten million chemical-functional relationships that can be adopted by other databases and software tools and be of utility not only to general chemists but also to researchers involved in genomics, metagenomics, proteomics, and metabolomics
[65].
NPs are the result of nature’s exploration of biologically relevant chemical space through eons of evolutionary time, hence their high diversity regarding atom connectivity and functional groups. Because they cover a broad range of sizes, 3D structures, and physicochemical properties that can be related to drug-likeness (including favorable ADME characteristics), NPs are considered not only as potential drugs, but also as an invaluable source of chemical inspiration for the development of new bioactive small molecules useful in chemical biology and medicinal chemistry research. The structural diversity of drugs was assessed early on by making use of shape description methods and grouping the atoms of each drug molecule into ring, linker, framework (or scaffold)
[66], and side chain
[67]. A methodology that calculated the NP-likeness score—a Bayesian measure of similarity with respect to the structural space covered by NPs—proved capable of efficiently separating NPs from synthetic (i.e., man-made) molecules in a cross-validation experiment
[68]. Nevertheless, rule-based procedures applied to the automated assignment of NPs to different classes, such as alkaloids, steroids, and flavonoids, have unveiled database-dependent differences in the coverage of chemical space
[69]. Beyond that, several cheminformatics techniques have been used to analyze NPs and decompose them into fragments in the belief that their unique substructural features and chemical properties are likely to be optimized for protein recognition and enzyme inhibition. A recent cheminformatic analysis of the structural and physicochemical properties of NP-based drugs in comparison to top-selling brand-name synthetic drugs revealed that macrocycles occupied distinctive and relatively underpopulated regions of chemical space, while chemical probes largely overlapped with synthetic drugs
[70].
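The notion of a Bayesian NP-likeness score mentioned above can be conveyed with a simplified naive-Bayes sketch: each fragment receives a log-odds weight from its frequency in NPs versus synthetic molecules, and a molecule's score is the sum of the weights of its fragments. The add-one smoothing, the fragment labels, and the two tiny training sets are all assumptions made for illustration; this is not the published scoring formula.

```python
import math

def fragment_log_odds(np_molecules, synthetic_molecules):
    """Log-odds per fragment of occurring in NPs vs. synthetic molecules."""
    fragments = set().union(*np_molecules, *synthetic_molecules)
    weights = {}
    for frag in fragments:
        # Add-one smoothing keeps unseen fragments from giving log(0).
        p_np = (sum(frag in m for m in np_molecules) + 1) / (len(np_molecules) + 2)
        p_syn = ((sum(frag in m for m in synthetic_molecules) + 1)
                 / (len(synthetic_molecules) + 2))
        weights[frag] = math.log(p_np / p_syn)
    return weights

def np_likeness(molecule, weights):
    """Sum log-odds of known fragments; positive values lean NP-like."""
    return sum(weights.get(frag, 0.0) for frag in molecule)

# Hypothetical molecules described as sets of fragment labels.
np_set = [{"lactone", "glycoside"}, {"lactone", "polyene"}, {"glycoside"}]
syn_set = [{"sulfonamide", "biaryl"}, {"biaryl"}, {"sulfonamide", "lactone"}]
weights = fragment_log_odds(np_set, syn_set)
print(round(np_likeness({"lactone", "glycoside"}, weights), 3))
print(round(np_likeness({"biaryl", "sulfonamide"}, weights), 3))
```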
Ideally, molecular diversity in drug discovery efforts should be focused on what is usually considered drug-like chemical space (aka “drug space”), which may (or may not) fully comply with Lipinski’s “rule of five”
[71]. A pioneering initiative to map this space made use of 72 descriptors accounting for size, lipophilicity (calculated log Po/w), polarizability, charge, flexibility (number of nonterminal rotatable bonds), rigidity (total number of rings and rigid bonds), and hydrogen bonding abilities for a set of ~400 compounds encompassing both representative drugs (“core structures”) and a number of “satellite molecules” intentionally placed outside of the drug space (i.e., possessing extreme values in one or several of the desired properties, while containing drug-like chemical fragments). By means of principal component analysis (PCA) and projections to latent structures (PLS) it was possible, after some iterations that involved the inclusion of additional randomly selected active molecules, to extract map coordinates in the form of
t-score values and construct a chemical global positioning system (ChemGPS)
[72]. The ChemGPS scores were found to describe well the latent structures extracted with PCA from a large set of compounds and appeared to be suitable for comparing multiple libraries and for keeping track of previously explored regions of chemical space. Later work (largely based on cyclooxygenase 1 and/or cyclooxygenase 2 (COX-1/2) inhibition) proposed an expansion of ChemGPS to better cover space for NPs, giving birth to ChemGPS-NP
[73], which was further tuned for the improved handling of the chemical diversity encountered in NP research with a view to increasing the probability of hit identification
[74]. The public ChemGPS-NP Web tool (
http://chemgps.bmc.uu.se/, accessed on 20 December 2022) was then developed to allow for the exploration of NPs by navigating in a consistent 8-dimensional global map of structural characteristics built by means of PCA
[75].
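What makes such a map "consistent" is that the scaling parameters and PCA loadings are derived once from a reference set and then frozen, so any new compound is projected into the same coordinate system. The sketch below shows only that projection step; the three descriptors, their means and standard deviations, and the two loading vectors are invented for illustration and are unrelated to the actual ChemGPS-NP model.

```python
def project(descriptors, means, stds, loadings):
    """Autoscale a descriptor vector and project it on fixed loading vectors."""
    scaled = [(x - m) / s for x, m, s in zip(descriptors, means, stds)]
    return [sum(w * z for w, z in zip(component, scaled))
            for component in loadings]

# Hypothetical frozen model: descriptors are (molecular weight, log P, H-bond donors).
means = [350.0, 2.5, 2.0]
stds = [100.0, 1.5, 1.5]
loadings = [
    [0.6, 0.8, 0.0],   # first principal component (assumed)
    [-0.8, 0.6, 0.0],  # second principal component (assumed)
]

new_compound = [450.0, 4.0, 2.0]
t_scores = project(new_compound, means, stds, loadings)
print([round(t, 3) for t in t_scores])
```

Because the model is never refitted, scores computed today remain comparable with those computed for earlier libraries.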
Following a different philosophy to chart the known chemical space explored by nature, the structural classification of natural products (SCONP) was devised to accomplish a hierarchical grouping of the scaffolds present in ~170,000 entries from the DNP by establishing parent–child relationships between them and arranging the scaffolds in a tree-like fashion
[76]. Some preprocessing was necessary, including structure cleansing (i.e., separation from accompanying molecules) and deglycosylation (in the case of glycosides whose active component is the aglycon part). Unfortunately, stereochemistry could not be considered in this early cheminformatic analysis so that the different possible configurations of the NP scaffolds had to be treated as being equivalent. The conversion of the resulting NP scaffolds to SMILES (simplified molecular-input line-entry system) strings
[77] allowed for the comparison with those of standard synthetic molecules represented by over 10 million drug-like commercially available samples from the ZINC database
[10]. This analysis revealed interesting differences not only between natural and synthetic (i.e., man-made) molecules, but also between scaffolds originating from distinct classes of organisms, i.e., plants, bacteria, and fungi. Visual comparisons of the respective structural features were effectively displayed by plotting the scaffolds according to their frequency distributions
[78]. Moreover, a flexible analytics framework named Scaffold Hunter (
https://scaffoldhunter.sourceforge.net/, accessed on 20 December 2022) generates and enables the visualization of virtual scaffold trees in bioactive compound collections that easily allow for the identification of new starting points for the design and synthesis of biology-oriented small molecule libraries
[79]. Interestingly, a recent cluster analysis of chemical fingerprints and molecular scaffolds of >55,000 compounds reportedly isolated from marine and terrestrial microorganisms showed that three quarters of the MNPs are closely related to compounds isolated from their terrestrial counterparts
[80].
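Fingerprint-based comparisons of this kind usually rest on a pairwise similarity measure. The following sketch uses the Tanimoto coefficient on sets of "on" bits and reports the fraction of marine compounds that have a close terrestrial neighbour; the 0.7 threshold and the toy fingerprints are assumptions, and the clustering protocol of the cited study is not reproduced here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0

def closely_related_fraction(marine, terrestrial, threshold=0.7):
    """Fraction of marine compounds with a terrestrial neighbour >= threshold."""
    related = sum(
        1 for m_bits in marine.values()
        if any(tanimoto(m_bits, t_bits) >= threshold
               for t_bits in terrestrial.values())
    )
    return related / len(marine) if marine else 0.0

# Hypothetical fingerprints given as sets of bit indices.
marine = {"m1": {1, 2, 3, 4}, "m2": {10, 11, 12}}
terrestrial = {"t1": {1, 2, 3, 4, 5}, "t2": {20, 21}}
print(tanimoto(marine["m1"], terrestrial["t1"]))
print(closely_related_fraction(marine, terrestrial))
```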
Historically, the total synthesis of NPs followed by derivative synthesis (“active analogue approach” or “analogue-oriented synthesis”
[81][82]) and semisynthetic procedures aimed at modifying the chemical structure of complex fermentation products have enabled a deeper understanding of structure–activity relationships (SAR). In contrast, the de novo combination of NP fragments in unique arrangements, often by virtue of innovative strategies such as “diversity-oriented synthesis”
[83][84], “target-oriented and diversity-oriented organic synthesis”
[85], and “synthesis-informed design”
[86], has been shown to generate focused NP-like libraries containing compounds endowed with bioactivities unrelated to those of the guiding NP(s)
[87][88][89]. Examples of successful workflows of pseudo-NP design and development are “biology-oriented synthesis”
[90][91] and “pharmacophore-directed retrosynthesis”
[92]. In applying the latter approach, a key first step is to elaborate a tentative pharmacophore, i.e., “an ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target and to trigger (or block) its biological response”, as defined by the International Union of Pure and Applied Chemistry (IUPAC)
[93], and then devise a retrosynthetic procedure that ensures that the proposed pharmacophore is present in multiple intermediates of increasing complexity, ultimately leading to the NP. An important goal of these synthetic approaches is to find structurally simplified and optimized derivatives with lower molecular weights that can overcome commonly observed limitations, such as poor oral absorption, short half-life, and low blood–brain barrier permeability.
4. Linking NPs to Their Targets: Computational Methodologies for Building Global Networks
The popular term “druggable genome”
[94] refers to the genes (or, more appropriately, gene products) that are known or predicted to interact with drugs, ideally resulting in a therapeutic benefit. Although drugs are intended to be selective (i.e., have high affinity for one single target), it is not uncommon for many molecules to bind to more than one protein, giving rise to polypharmacology and side effects. Because many drug-target combinations are theoretically possible, the computational exploration of possible interactions can help identify potential targets.
Because the systematic identification of drug targets for NPs, regardless of their origin, using a battery of experimental binding or affinity assays, is both costly and time-consuming, a substantial amount of effort has gone into devising in silico tools that allow for the construction of global networks that connect active compounds to their cellular targets. It is expected that, by using these methods, the resulting systems pharmacology infrastructure will help to predict new drug targets for pharmacologically uncharacterized NPs and identify secondary targets (off-targets) that can aid in the rationalization of side effects of known molecules
[95]. The Drug-Gene Interaction Database (DGIdb 4.0,
https://www.dgidb.org/, accessed on 20 December 2022) provides information on drug-gene interactions and druggable gene products collected from publications, databases, and other web sites
[96]. The latest update mostly focused on (i) the integration with crowdsourced efforts (e.g., Wikidata) to facilitate term normalization and with the open-data web platform Drug Target Commons (
https://dataverse.harvard.edu/dataverse/dtc2tdc, accessed on 20 December 2022)
[97] to enable the upload of community-contributed interaction data; and (ii) export to a Network Data Exchange (NDEx) infrastructure
[98] for storing, sharing and publishing biological network knowledge. The tool named substructure-drug-target network-based inference (SDTNBI) was devised to prioritize potential targets for old drugs (“drug repositioning”), failed drugs, and new chemical entities by bridging the gap between new chemical entities and known drug-target interactions (DTIs)
[99]. A later modification (wSDTNBI)
[100] uses weighted DTI networks, whose edge weights are correlated with binding affinities, and network-based VS, which does not rely on the receptors’ 3D structures
[101]. The publicly available SwissTargetPrediction web server (
http://www.swisstargetprediction.ch, accessed on 20 December 2022)
[102] also attempts to predict the most likely target(s) (in mice, rats, or human beings) for a SMILES-defined input molecule by using a computational method that combines different measures of similarities (both in 2D chemical structure and in 3D molecular shape) with known ligands
[103]. All of these approaches, together with highly efficient receptor-based ligand docking
[104], can be useful to narrow down the number of potential targets, but strict experimental confirmation and validation are needed
[105][106].
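The principle behind network-based inference can be illustrated with the basic two-step resource diffusion on a bipartite drug-target network. Note that this sketch implements only plain NBI on known DTIs: it omits both the substructure layer that SDTNBI adds and the affinity-derived edge weights of wSDTNBI, and the small network is hypothetical.

```python
def nbi_scores(drug_targets, query_drug):
    """Two-step resource diffusion (targets -> drugs -> targets) for one drug."""
    target_degree = {}
    for targets in drug_targets.values():
        for target in targets:
            target_degree[target] = target_degree.get(target, 0) + 1
    known = drug_targets[query_drug]
    # Step 1: each known target of the query drug splits one unit of
    # resource equally among all drugs linked to it.
    drug_resource = {
        drug: sum(1.0 / target_degree[t] for t in targets if t in known)
        for drug, targets in drug_targets.items()
    }
    # Step 2: each drug splits what it received equally among its targets.
    scores = {target: 0.0 for target in target_degree}
    for drug, targets in drug_targets.items():
        for target in targets:
            scores[target] += drug_resource[drug] / len(targets)
    return scores

def predict_new_targets(drug_targets, query_drug):
    """Rank targets not yet linked to the query drug by diffusion score."""
    scores = nbi_scores(drug_targets, query_drug)
    candidates = {t: s for t, s in scores.items()
                  if t not in drug_targets[query_drug]}
    return sorted(candidates.items(), key=lambda item: item[1], reverse=True)

# Hypothetical known drug-target interactions.
network = {"drugA": {"T1", "T2"}, "drugB": {"T2", "T3"}, "drugC": {"T3"}}
print(predict_new_targets(network, "drugA"))
```

In this toy network drugA reaches T3 only through drugB, with which it shares T2, so T3 receives a small but nonzero score. As stressed above, such a ranking is a hypothesis that still needs experimental confirmation.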
The attention initially drawn
[107] to certain synthetic molecules that were responsible for disproportionate percentages of hits in enzyme-based bioassays but, on closer inspection, turned out to be false actives and therefore nonprogressible hits, leading to the PAINS acronym (Pan Assay INterference compoundS)
[108], was later extended to NPs
[109]. As a result, some NPs have been designated as “invalid metabolic panaceas” and the concept of “residual complexity” (
http://go.uic.edu/residualcomplexity, accessed on 20 December 2022) has emerged
[110]. Nowadays, compounds with a PAINS chemotype can be recognized and excluded from bioassays by the judicious use of electronic substructure filters
[111] and machine learning approaches
[112] (e.g., Hit Dexter,
https://nerdd.univie.ac.at/hitdexter3/, accessed on 20 December 2022).
Because the best link connecting NPs to their targets is arguably the experimentally determined 3D structure of the respective complexes, in the following section, I will provide some examples of MNPs and synthetic analogues that were selected on the basis of chemical novelty and submicromolar inhibition data, preferably supported by structural evidence of complex formation with pharmacologically relevant enzyme targets.