1. Background
The first description of a pentapeptide repeat protein (PRP) was reported in 1995, when Haselkorn and coworkers identified a gene from the filamentous cyanobacterium
Nostoc (formerly
Anabaena) sp. strain PCC 7120, which, when mutated, altered the composition of glycolipids encasing the heterocysts, among other alterations (Figure 1)
[1]. They named the gene
heterocyst-specific glycolipids-directing protein K (
hglK) for the role that the gene played in localization of glycolipids to heterocysts. The protein encoded by the
hglK gene, HglK, was predicted to contain four trans-membrane spanning regions and an unusual alanine- and leucine-rich pentapeptide repeat (PR) region made up of 36 PRs with the consensus sequence AXLXX
[1]. HglK was therefore the first PRP to be associated with a putative biochemical function. To date, however, the precise mechanism by which HglK regulates glycolipid localization to heterocysts remains unknown.
Figure 1. Timeline of milestones in the investigation of the structure and function of PRPs.
In 1998, Bateman et al.
[2] reported the discovery of a novel family of proteins, to which HglK belonged, that contained tandem PRs with the sequence motif A(D/N)LXX, based on an analysis of the complete bacterial genomes available at the time. They observed that PRPs were most commonly found in cyanobacteria
[2]. The authors also proposed a structural model for PRPs, correctly predicting a right-handed β helical architecture; however, they predicted a triangular helix, which proved incorrect once the first three-dimensional structures of PRPs were determined several years later.
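To illustrate what a run of tandem A(D/N)LXX repeats looks like at the sequence level, the motif can be written as a regular expression and scanned along a sequence. This is a minimal sketch, not the procedure used by Bateman et al.; the function name, the minimum run length and the toy sequence are hypothetical and chosen only for illustration.

```python
import re

# One pentapeptide repeat in the A(D/N)LXX form: A, then D or N, then L,
# then any two residues (X).
REPEAT = "A[DN]L[A-Z]{2}"


def find_tandem_repeats(sequence, min_repeats=2):
    """Return (start, n_repeats) for each run of at least min_repeats
    consecutive A(D/N)LXX pentapeptides in the sequence."""
    pattern = re.compile("(?:%s){%d,}" % (REPEAT, min_repeats))
    return [
        (match.start(), len(match.group()) // 5)
        for match in pattern.finditer(sequence.upper())
    ]


# Hypothetical toy sequence (not a real protein): three tandem repeats
# flanked by unrelated residues.
toy = "MKT" + "ADLSG" + "ANLRE" + "ADLTQ" + "PPW"
print(find_tandem_repeats(toy))  # [(3, 3)]
```

A strict match of this kind would miss repeats that deviate from A(D/N)LXX, so it is best read as a picture of the motif rather than a detection method.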
Also in 1998, Martínez-Martínez et al. confirmed that quinolone resistance in bacteria could be carried on a multi-resistance plasmid (pMG252), which they discovered in a clinical isolate of
Klebsiella pneumoniae [3]. Bacterial antimicrobial resistance evolves continuously, with horizontal gene transfer through plasmids playing a major role
[4]. In 2001, Montero et al. discovered that intrinsic resistance to fluoroquinolones was influenced by MfpA, a putative PRP-containing protein encoded by a chromosomal gene in
Mycobacterium smegmatis [5]. In 2005, Hegde et al.
[6] solved the three-dimensional structure of MfpA from
Mycobacterium tuberculosis, a homolog of MfpA from
M. smegmatis. This was the first three-dimensional structure of a PRP, and it revealed a right-handed quadrilateral β helical structure. Hegde et al. also reported that the structure and electrostatic charge distribution of MfpA mimicked those of DNA, and that MfpA was therefore able to confer fluoroquinolone resistance to
M. tuberculosis by binding to DNA gyrase and inhibiting its function
[6]. Soon after, many more chromosomal genes encoding homologs of MfpA were discovered in the genomes of a variety of organisms. In 2013, Jacoby and Hooper reported a phylogenetic analysis showing that quinolone resistance genes (
qnr) and
mfpA homologs could be identified in the chromosomes of 58 Gram-negative bacteria and 34 Gram-positive organisms and in 14 plasmid-mediated genes
[7].
In 2003, Chandler et al.
[8] ascribed a cellular function to RfrA, a PRP from the cyanobacterium
Synechocystis 6803, showing that it played a role in regulating a novel manganese uptake system; however, the nature of the system and the precise role that RfrA plays in regulating it remain unknown.
By 2006, Vetting et al.
[9] reported that the PRP family had grown to more than 500 members in the prokaryotic and eukaryotic kingdoms, and they updated the PR consensus sequence to [S,T,A,V][D,N][L,F][S,T,R][G]. In 2009, Buchko reviewed the knowledge of the structure and function of PRPs from cyanobacteria
[10]. In 2014, Shah and Heddle reported that the PR family (PF00805) in the Pfam database (
http://pfam.xfam.org (accessed on 15 June 2014)) had expanded to 11,082 sequences from 1513 species
[11] and that protein structures had been solved for a number of PRPs from
Nostoc sp. PCC 7120
[12][13], Cyanothece 51142
[14][15],
Arabidopsis thaliana [16],
Enterococcus faecalis [17],
K. pneumoniae [18],
Xanthomonas albilineans [19],
Aeromonas hydrophila [20], and
M. tuberculosis [6]. In 2019, Zhang et al. updated the PR consensus sequence to (A/C/S/V/T/L/I)/(D/N/S/K/E/I/R)/(L/F)/(S/T/R/E/Q/K/V/D)/(G/D/E/N/R/Q/K) based on the consideration of several newly available PRP crystal structures
[21]. By 2020, the number of PRPs in the PF00805 Pfam family had increased to 38,000 sequences in over 3300 species, as reported in a study by Xu and Kennedy that characterized protein dynamics in PRPs
[22]. The current distribution of PRPs in the PF00805 Pfam family is depicted in Figure 2, indicating 38,981 PRP sequences distributed over 3338 species (
https://pfam.xfam.org/family/PF00805#tabview=tab7 (accessed on 25 April 2021)). In this sunburst plot, 82.2% of the species and 84.7% of the sequences belong to bacteria, 14.1% of species and 13.7% of sequences belong to eukaryota, 0.5% of species and 1.4% of sequences belong to viruses, and 1.1% of species and 2.2% of sequences belong to archaea. PRPs are most abundant in cyanobacteria, which account for 26.9% of all PRP sequences but only 3.7% of the species in which PRPs have been discovered, suggesting that PRPs likely played an important physiological or structural role in the evolution and lifecycle of ancient cyanobacteria. Despite the size and continued growth of the PRP superfamily, three-dimensional structures of only sixteen PRPs or PRP-containing proteins have been determined, thirteen of which contain a single PRP domain with α helices capping the N and/or C termini and three of which contain two or more domains including the PRP domain.
Figure 2. Distribution of PRP sequences across species. This sunburst plot of the PF00805 PRP Pfam shows the distribution of 38,981 sequences across 3338 species. The color-coding in the sunburst plot is indicated in the legend.
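The contrast between sequence share and species share can be made explicit by dividing one by the other. The short calculation below uses only the percentages quoted above; the "enrichment" ratio and the variable names are illustrative choices, not a statistic reported by the sources cited here.

```python
# Shares of PF00805 quoted in the text: (percent of species, percent of sequences).
shares = {
    "bacteria": (82.2, 84.7),
    "eukaryota": (14.1, 13.7),
    "viruses": (0.5, 1.4),
    "archaea": (1.1, 2.2),
    "cyanobacteria": (3.7, 26.9),  # a subset of bacteria
}


def enrichment(species_pct, sequence_pct):
    """Ratio of sequence share to species share. Values above 1 mean the
    group contributes more sequences per species than the family average."""
    return sequence_pct / species_pct


for group, (species_pct, sequence_pct) in shares.items():
    print(f"{group:14s} {enrichment(species_pct, sequence_pct):.1f}x")
```

On these figures cyanobacteria carry roughly seven times as many PRP sequences per species as the family as a whole (26.9 / 3.7 ≈ 7.3), whereas bacteria overall sit close to 1.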
The PRPs discussed range from those with putative biochemical or cellular functions to those whose structures have been determined but whose functions remain unknown. They fall into six groups: (1) heterocyst glycolipid synthesis, (2) manganese uptake, (3) gyrase inhibition, (4) ubiquitin E3 ligases, (5) synaptic vesicle glycoprotein 2 isoform C (SV2C) receptors, and (6) plant and cyanobacterial proteins with three-dimensional structures but no functional characterization (Figure 3). Although the biological functions of most PRPs remain unknown, three-dimensional structures of PRPs and PRP-containing multidomain proteins continue to be solved and reported in the hope that they will eventually help to explain their biological, biochemical or cellular functions.
Figure 3. Summary of the PRPs discussed, with and without known three-dimensional structures. The six categories are shown in the first branch. The second and subsequent branches indicate specific PRPs. PRPs with known structures include the corresponding PDB ID in parentheses immediately to the right of the PRP name.