Better Antimicrobial Peptides Databases

Better Antimicrobial Peptides Databases: Comparison

Please note this is a comparison between Version 1 by Jianhua Wang and Version 4 by Beatrix Zheng.

With the accelerating growth of antimicrobial resistance (AMR), there is an urgent need for new antimicrobial agents that induce little or no AMR. Antimicrobial peptides (AMPs) have been extensively studied as alternatives to antibiotics (ATAs). With the new generation of high-throughput technology for AMP mining, the number of AMP derivatives has increased dramatically, but screening them manually is time-consuming and laborious. Therefore, it is necessary to establish databases that use computer algorithms to summarize, analyze, and design new AMPs. A number of AMP databases have already been established, such as the Antimicrobial Peptide Database (APD), the Collection of Antimicrobial Peptides (CAMP), the Database of Antimicrobial Activity and Structure of Peptides (DBAASP), and the Database of Antimicrobial Peptides (dbAMP). These four AMP databases are comprehensive and widely used.

antimicrobial peptides
databases
characteristic function

1. Introduction

Antibiotics represent one of the major discoveries made in the field of health during the 20th century. Starting with the introduction of penicillin in 1942 as the first key milestone, antibiotics have greatly benefited humanity, playing a key role in the treatment of human and animal diseases. However, due to the long-term abuse of antibiotics, especially in animal husbandry, many bacteria have developed antimicrobial resistance (AMR) over time. These bacteria include Staphylococcus aureus, Streptococcus, Escherichia coli, and other species. Some of them have quickly developed multi-drug resistance, which significantly reduces the efficacy of antibiotic treatment [1,2,3,4]. Resistance to sulfonamide drugs involving a specific resistance mechanism was first reported in 1937, but the threat of AMR received little attention at that time. After drug-resistant plasmids were first reported in 1960, the number of antimicrobial-resistant bacteria increased steadily year by year over the nearly 30 years that followed [1,5]. There is now an urgent need for a series of new ATAs to address this issue. Figure 1 shows the timeline of resistance development for the major classes of antibiotics.

Figure 1. Timeline of research progress on antibiotics, antibiotic resistance, AMPs, and AMP databases.

AMPs are produced naturally in organisms and act as an innate defense system against invading pathogens via diverse mechanisms of action [6]. Melittin and magainin were first discovered by Fennell and Zasloff in 1967 and 1987, respectively [7,8]. In Sweden, Boman’s team discovered and reported typical antibacterial peptides known as cecropins from the insect Hyalophora cecropia during the 1970s and 1980s [9,10,11,12,13], marking a key moment in the development of AMP science. From 1980 to 2000, AMPs, including defensins, cecropins, and magainins, were isolated from humans, insects, and marine animals. Since then, the number of AMPs has increased dramatically, accelerating the establishment and development of AMP databases [14,15,16,17,18,19]. AMPs are naturally produced by humans, animals, bacteria, and fungi and include bacteriocins and fungal defensins. They all participate in antimicrobial defense and immune regulation in vivo at trace levels [14,15,18,20,21,22,23]. In addition, AMPs are also reported to have anticancer, antiviral, antiparasitic, and antibiofilm functions [24,25,26]. For example, Maximin-1, Dermaseptin-B2, Macropin-1, HBD-3, and Opis were discovered in 2002, 2010, 2012, 2017, and 2020, respectively, and have antiviral, antimicrobial, antifungal, and antiparasitic properties [18,20,23,25]. On the basis of work over the past two decades, the “iron triangle” theory for the prevention and treatment of human and animal diseases has recently been proposed in support of the One Health concept; it consists of AMPs, antibiotics, and vaccines and focuses on strong penetration, high internalization, and low AMR [26,27,28,29,30,31]. Most studies have emphasized that a better understanding of the structure and activity of peptides is vital and have demonstrated the value of databases for classifying them.
Most AMPs are about 50 amino acids long, are cationic (net charge of +6 to +8), and can be both hydrophobic and amphiphilic [17,23,24,25]. Their structures include α-helix, β-sheet, linear, and combined α-helix and β-sheet forms. For example, CecropinA, LactoferricinB, LeucocinA, HBD-3, and indolicidin have α-helix, β-sheet, unpacked α-helix plus β-sheet, packed α-helix plus β-sheet, and linear structures, respectively [32,33,34,35,36]. The structures and physicochemical properties of these AMPs are summarized in Figure 2. Around the year 2000, some AMP databases were constructed according to the charge, length, antimicrobial activity, and structure of AMPs and mainly functioned as prediction tools based on natural AMP templates. Key amino acids in AMP sequences, such as cysteine, lysine, arginine, and glycine, significantly affect their structure and physical properties, especially their cationic and hydrophobic character [37,38,39,40,41,42,43]. However, there is a threshold beyond which strong hemolysis and cytotoxicity can follow [44,45]. Most AMPs interact with anionic bacterial lipid membranes or viral capsids through cationic attraction and hydrophobicity, via processes such as the carpet, barrel-stave, or toroidal-pore models [46,47]. They can further bind to nucleic acids, intracellular proteins, and enzymes to inhibit transcription, translation, and biosynthesis, thereby disrupting the formation of cell walls and cell membranes or even the cell cycle. Studies have shown that AMPs such as those in the defensin family bind bacterial lipids to exert strong antibacterial effects [48,49,50,51].
With the rapid increase in the number of AMPs, one-by-one, step-by-step verification in vivo and in vitro consumes considerable resources for design, screening, and confirmation. Most AMPs also suffer serious limitations in terms of low yield, instability, and toxicity. Therefore, it is necessary to establish AMP databases and to combine them with computer algorithms to predict and design new AMPs efficiently and accurately [52,53], and to further validate the iron triangle theory [26,27,28,29,30,31] and its application in health maintenance.

Figure 2. The 3D structures of typical AMPs (Source: Protein Data Bank (PDB); Tool: UCSF Chimera). (a) 1F0F: AMP, the α-helix structure of CecropinA (1–8); (b) 1LFC: AMP, the β-sheet structure of LactoferricinB; (c) 1CW6: AMP, the unpacked α-helix and β-sheet structure of LeucocinA; (d) 1KJ6: AMP, the packed α-helix and β-sheet structure of HBD-3; (e) 1G89: AMP, the linear structure of indolicidin.

More than ten AMP databases have been established so far to collect and classify AMPs, including APD3, DBAASPv3, CAMP3, dbAMP2, ANTIMIC, YADAMP, LAMP2, DRAMP3.0, CyBase, and PenBase [54,55,56,57,58,59]. Among them, the first four are the most popular because of their superior tools, large data resources, and powerful functions, thus attracting more users [60]. These four databases were first built in 2005, 2008, 2014, and 2018 [56,60,61,62,63,64] and updated in 2016, 2016, 2021, and 2022, respectively [55,65,66,67]. Data resources and analytical functions are the essential features of AMP databases. AMP databases are increasingly recognized as bioinformatics resources for identifying, predicting, and designing new AMP derivatives with improved properties. For example, non-hemolytic anti-MRSA AMPs from plant sources have been designed using the above tools [42,56,68]. Although a variety of AMP databases have been established, they have not been applied fully or extensively because their predictions are not yet reliable enough for design work [60]. In practice, only data acquisition and prediction are used. Further resources are urgently needed to support additional requirements such as AMP mining [68], DNA editing, AMP AI editing [69], complex BI analysis [70], computer-aided design [71], and chemical and synthetic biology [72,73,74,75]. There is therefore room for improvement in addressing these shortcomings and achieving these goals, and meeting the new requirements for AMPs in human and animal health practice remains a major challenge. The evolution of antibiotics, AMPs, and AMP databases is shown in the timeline in Figure 1.

2. Four Typical AMP Databases

AMP databases usually feature a number of functions, such as large datasets with logical classification, accurate prediction abilities, fast searching, and unique computer algorithms. Their most important features are prediction tools and abundant data from different sources. These prediction tools were developed by analyzing the physicochemical properties, toxicity, and specificity of AMPs. Four databases (DBAASP, CAMP, APD, and dbAMP) are the most popular so far; their advantageous modules are shown in Figure 3. They are introduced one by one in the following sections [61].

Figure 3. Advantageous modules of four AMP databases.

2.1. DBAASP

DBAASP is a manually curated database of experimentally validated AMPs whose physicochemical properties can be predicted or analyzed [56]. Recently, the 3D structures of the AMPs in this database were updated [63]. Presently, a total of 18,719 entries have been collected and classified in DBAASP (Table 1 and Table 2) [66]. Because it collects validated AMPs from laboratory studies, it is the most comprehensive database for evaluating the antimicrobial activity, cytotoxicity, and hemolysis of target peptides. Users can search by peptide ID, name, synthesis type, sequence, length, C-terminal and N-terminal modifications, family source, intracellular target, UniProt ID, 3D structure, hemolysis, and other fields to obtain the target sequence. Another advantage of DBAASP over other databases is its capacity for studying the structural and functional relationships of AMPs (Figure 3). Nevertheless, instability, molecular weight, secondary structure, and half-life parameters should be added or supplemented where possible, and more machine learning (ML) algorithms should be adopted to improve the accuracy of prediction results.
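The field-based search described above can be pictured as a conjunction of optional filters over peptide records. The following is a minimal sketch of that idea only; it is not DBAASP's interface or data model. The `PeptideRecord` fields, the `search` function, and the three demo records are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class PeptideRecord:
    # Hypothetical fields, loosely modeled on the search fields listed in the text.
    peptide_id: str
    name: str
    sequence: str
    synthesis_type: str
    hemolytic: Optional[bool]  # None when no hemolysis data is recorded


def search(
    records: Iterable[PeptideRecord],
    *,
    max_length: Optional[int] = None,
    synthesis_type: Optional[str] = None,
    hemolytic: Optional[bool] = None,
    motif: Optional[str] = None,
) -> List[PeptideRecord]:
    """Return the records that satisfy every criterion that is not None."""
    hits = []
    for rec in records:
        if max_length is not None and len(rec.sequence) > max_length:
            continue
        if synthesis_type is not None and rec.synthesis_type != synthesis_type:
            continue
        # Records without hemolysis data never match a hemolysis filter.
        if hemolytic is not None and rec.hemolytic is not hemolytic:
            continue
        if motif is not None and motif.upper() not in rec.sequence.upper():
            continue
        hits.append(rec)
    return hits


# Made-up demo records (not real database entries).
records = [
    PeptideRecord("DEMO_1", "toy peptide A", "KKLLKKLLKK", "synthetic", False),
    PeptideRecord("DEMO_2", "toy peptide B", "GLFDIVKKVVGALGSL", "ribosomal", True),
    PeptideRecord("DEMO_3", "toy peptide C", "KWKLFKKI", "synthetic", None),
]

short_non_hemolytic = search(records, max_length=12, hemolytic=False)
print([rec.peptide_id for rec in short_non_hemolytic])
```

In this sketch, a record with no hemolysis annotation is excluded whenever a hemolysis filter is set, which is one conservative way to treat missing data; a real database may handle such entries differently.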

2.2. APD

The APD database was established in 2003 by Guangshun Wang's team and has been updated in recent years. An earlier version contained 1228 peptides (including 65 anticancer peptides, 76 antiviral peptides, 327 antifungal peptides, and 994 antibacterial peptides) and offered search capability [76], statistical analysis, structure–function relationships, and other AMP indexes [66]. Currently, there are 3425 AMPs in the APD3 database, which are mainly derived from natural species; this figure is very close to the actual number of reported active AMPs and is highly reliable (Table 1 and Table 2). With this database, the physical and chemical properties of AMPs, including their molecular size, isoelectric point, hydrophilicity, structure, hydrophobic residues, protein-binding capacity, and net charge, can be predicted and calculated. Another feature is the AMP timeline module, which allows a better understanding of how AMPs have developed over time (Figure 3). This database is considered to be the best tool for learning about the development of AMPs and predicting their physical and chemical properties. However, the APD3 database still needs improvement. Its coverage of candidate peptides and physicochemical properties is relatively limited: for example, some derived peptides with better antibacterial activity are not included in the database or are not classified in detail. Furthermore, in-depth analysis of peptides active against Gram-negative and Gram-positive bacteria, more family sources, and a better ability to predict the potency and toxicity of AMPs should be integrated.
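To make the property calculations concrete, the sketch below estimates two of the properties mentioned above, net charge and the share of hydrophobic residues, directly from a sequence. It is an illustration under simplifying assumptions, not APD3's implementation: charge is counted as +1 for K and R and -1 for D and E (histidine and the termini are ignored), and the hydrophobic residue set is one common choice that other tools may define differently. The function names are hypothetical.

```python
# Illustrative sketch only: the charge model and residue sets are assumptions.
POSITIVE = set("KR")            # counted as +1 each at neutral pH
NEGATIVE = set("DE")            # counted as -1 each at neutral pH
HYDROPHOBIC = set("AVLIMFWC")   # one common hydrophobic set; definitions vary


def net_charge(sequence: str) -> int:
    """Approximate integer net charge from K/R and D/E counts."""
    seq = sequence.upper()
    positives = sum(aa in POSITIVE for aa in seq)
    negatives = sum(aa in NEGATIVE for aa in seq)
    return positives - negatives


def hydrophobic_ratio(sequence: str) -> float:
    """Fraction of residues that belong to the HYDROPHOBIC set."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


# Commonly cited sequence of magainin 2 (23 residues), used as an example.
peptide = "GIGKFLHSAKKFGKAFVGEIMNS"

print(net_charge(peptide))                    # 4 K, 0 R, 1 E, 0 D -> 3
print(round(hydrophobic_ratio(peptide), 3))   # 10 of 23 residues -> 0.435
```

A more faithful charge estimate would account for pH, histidine, and terminal modifications such as C-terminal amidation; those are left out here to keep the example short.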

2.3. CAMP

The Collection of Antimicrobial Peptides (CAMP), established by Shaini Thomas in 2010, is a free online database that includes mature ML algorithms for various AMPs. It initially included 3782 AMPs: 2766 experimentally verified AMPs from patent and non-patent sources and 1016 predicted sequences [62]. Its latest version features 10,247 sequences, containing 8164 AMPs, 2083 patented AMPs, 757 structures, and 114 AMPs with family-specific features (Table 1 and Table 2). The best feature of CAMP is its prediction tools based on ML algorithms such as Random Forest (RF), Support Vector Machine (SVM), and Discriminant Analysis (DA), which achieve accuracy levels of 93.2%, 91.5%, and 87.5%, respectively. This database was the first to mark the relationship between sequence structure and antibacterial activity and is useful for searching sequence activities and for determining their specificity and relationships with AMPs [77,78]. By analyzing sequence signatures consisting of patterns and Hidden Markov Models (HMMs) from 1386 experimentally studied AMPs, 45 AMP families have been generated in this database. Sequence optimization algorithms for rationally designing AMPs are expected to be widely used in future design practice (Figure 3). In addition, the statistical results for the physicochemical properties of AMPs, such as hydrophobicity, net charge, instability, amphipathicity, and toxicity, and the coverage of derived peptides should be further improved.
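Classifiers such as RF, SVM, and DA operate on fixed-length numeric vectors, so a peptide sequence must first be turned into features. The source does not state which features CAMP uses; the sketch below assumes amino acid composition, a simple and widely used encoding, purely for illustration. The function name `composition_features` is hypothetical.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in a fixed order, so every peptide maps to a
# vector of the same length regardless of its sequence length.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in the sequence.

    Non-standard characters are ignored. This is an assumed feature
    encoding for illustration, not CAMP's documented pipeline.
    """
    seq = sequence.upper()
    counts = Counter(aa for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence (not a database entry): 6 lysines and 4 leucines.
features = composition_features("KKLLKKLLKK")
print(len(features))                        # 20
print(features[AMINO_ACIDS.index("K")])     # 0.6
print(features[AMINO_ACIDS.index("L")])     # 0.4
```

Vectors produced this way could be passed to any standard classifier together with AMP and non-AMP labels; composition alone discards residue order, which is why practical predictors usually add further descriptors.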

2.4. dbAMP

The dbAMP database, developed by Tzong-Yi Lee in 2018, is the largest of these databases [64]. It initially contained 12,389 AMPs retrieved from NCBI, UniProt, PDB, and AMP databases such as APD3, CAMPR3, ADAM, PhytAMP, AMPer, Antip2, BACTIBASE, and LAMP. References can be retrieved by querying the searchable fields of AMP-related articles individually [64]. The latest version, updated in 2022, includes 26,447 AMPs and 2262 antimicrobial proteins, with 4579 references (Table 1 and Table 2) [79]. It also provides quick access to transcriptomic and proteomic data across species and models the 3D structures of AMPs online. Thus far, a total of 458 AMPs with 3D structures have been collected and are available to users [65]. Compared with other databases, its best feature is the capacity to predict the activity of AMPs against different targets, including bacteria, viruses, cancer cells, fungi, and mammals, and to handle the transcriptomic and proteomic data obtained with high-throughput technologies such as mass spectrometry (Figure 3). Because of this, it has particular value when dealing with transcriptomic and proteomic data and when analyzing their specificity. In addition, AMPs can be searched by their dbAMP ID number, although this feature works less smoothly than is desirable. Another drawback is that the dbAMP database lacks the ability to predict physicochemical properties such as the hydrophobicity, net charge, amphiphilicity, instability, and hemolysis of AMPs.