Transposable Prophages in Pathogenic Leptospira: History
Subjects: Virology

The virome associated with the corkscrew-shaped bacterium Leptospira, responsible for Weil’s disease, is poorly known, and the genetic tools available for these bacteria remain limited. To address these two issues, potential transposable prophages were searched for in Leptospiraceae genomes. The 236 predicted transposable prophages were particularly abundant in the most pathogenic leptospiral clade and may be involved in the acquisition of virulence traits. According to genomic similarities and phylogenies, these prophages are distantly related to known transposable phages and are organized into six groups, one of them encompassing prophages with unusual TA-TA ends. Interestingly, structural and transposition proteins reconstruct different relationships between groups, suggesting ancestral recombinations. Based on the baseplate phylogeny, two large clades emerge, with specific gene contents and high sequence divergence reflecting their ancient origin. Despite their high divergence, the size and overall genomic organization of all prophages are very conserved, a testimony to the highly constrained nature of their genomes. Finally, similarities between these prophages and the three known non-transposable phages infecting L. biflexa suggest gene transfer between different Caudovirales inside their leptospiral host, and the possibility of using some of the transposable prophages in that model strain.

  • leptospira
  • transposable prophages
  • phylogeny
  • evolution

1. Introduction

In 1886, Adolf Weil (Ariane Toussaint’s great-grandfather) first described what is now known as leptospirosis, which he reported as an “acute infectious disease with enlargement of the spleen, jaundice, and nephritis” [1]. This global zoonotic disease currently causes more than 1 million severe cases and 60,000 deaths per year [2], and is associated with agricultural or industrial activities [3]. Its causative agent, the corkscrew-shaped bacterium Leptospira, was first identified in 1907 [4] and recognized as the cause of Weil’s disease in 1916 [5]. Bacteria causing this disease were affiliated with the order Leptospirales in 1979 [6], which currently includes only one family (Leptospiraceae) and three genera: Leptospira, Leptonema, and Turneriella [7]. Improvement in isolation procedures and the advent of the genomic era resulted in the sequencing of more than 700 genomes of Leptospiraceae [8][9], allowing for a better understanding of the diversity and evolution of this bacterial family and of its variable virulence. It was recently proposed [10] to organize the Leptospira genus into four clades, according to their pathogenic status and phylogeny: (i) a clade termed P1 includes virulent pathogenic strains causing disease in humans and mammals (e.g., L. interrogans and L. weilii species); (ii) a clade P2 contains ‘intermediate’ species that can cause disease in certain circumstances; and (iii) two clades S1 and S2 made of saprophytes, i.e., free-living environmental microorganisms not known to cause disease, mostly isolated from soil or surface waters. The P1 clade can be further subdivided into a virulent group and low virulence species that may represent an ancestral lineage of pathogens and a secondary passive reservoir of Leptospira present in soils (named P1-virulent and P1-low-virulent [11]). 
Species in the P1-virulent group have a higher and scattered GC content, a more open pangenome, with many genes specifically found in single species, a lower coding ratio, and a higher number of pseudogenes and signature genes of horizontal gene transfer such as IS transposases [11].
Molecular techniques for the genetic analysis of Leptospira remain very limited [12]. Mutants, a few plasmids, and only three phages are available for a few model strains. Putative prophages have been detected in several sequenced leptospiral genomes [9][13], among which there are potential transposable prophages. Escherichia coli phage Mu and Pseudomonas aeruginosa phage D3112 illustrate the potential offered by transposable phages for developing powerful genetic tools. Mini-Mu and mini-D3112 derivatives have been engineered to remove the lethal phage functions while leaving the capacity to transpose intact. They have proved very useful for, e.g., insertional mutagenesis and chromosome mobilization by conjugative plasmids (including broad host range IncP plasmids) and the building of genetic maps in a range of bacterial species [14][15][16][17]. In theory, similar tools could be developed for the many other phyla in which such phages and/or prophages exist [18]. The large number of leptospiral genomes now available offers the opportunity to identify candidate full-length transposable prophages, which, in the future, could help create tools adapted to genetic and molecular studies of the model L. biflexa strains and other Leptospira strains.
Escherichia coli (E. coli) virus Mu and Pseudomonas aeruginosa (P. aeruginosa) virus B3 are the paradigm transposable phages, which are still described as multiple independent genera among the Myoviridae (Mu) and Siphoviridae (B3). However, because of their conserved and characteristic genome length, gene content, and replication and packaging mechanisms, it has been proposed to group them into a single family, the Saltoviridae, in the order Caudovirales [19]. Their genome is replicated via successive rounds of replicative transposition, a mechanism that produces insertion mutations, inversions, and deletions at more or less random sites in the host genome, and replicon fusions between resident plasmids and the chromosome. This leads to a profound reorganization of the host genome and far more diverse horizontal gene exchanges than promoted by any other type of phage [18]. Packaging of viral-genome copies into viral particles proceeds directly from the randomly inserted replicas, generating viral DNAs with short segments of the flanking host DNA, the variable ends (reviewed in [16]). The biochemistry of the sequential molecular steps of Mu replicative transposition has been deciphered in vitro, explaining how Mu integration generates a five-base-pair duplication at the insertion target site. The transposase cuts the two target DNA strands five bases apart, and the resulting gap is filled during the replication of the Mu insert from the 3′-OH overhangs [20]. Phage B3 generates a 6 bp target site duplication, which is presumed to result from its transposase cutting six bases apart instead of five. 
The 5/6 bp duplication provides a recognizable scar of the integration event. Together with a relatively well conserved genome length of 36–39 kb and the presence of the transposition functions, it is one of the key features that have been used to identify transposable prophages in their host genomic sequences in a wide range of bacterial species [19][21][22][23]. Here, researchers identify 236 potential transposable prophages in genomes of Leptospiraceae, and group them at different scales using genome information and various protein phylogenies, in order to better understand their diversity and to try to reconstruct their evolutionary history.
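The origin of the 5 or 6 bp scar can be illustrated with a minimal Python sketch. It only models the outcome described above (a staggered cut whose bases end up on both sides of the insert once the gaps are filled), not the biochemistry; the function name `insert_with_tsd`, the toy host sequence, and the toy element are hypothetical.

```python
def insert_with_tsd(host: str, element: str, cut: int, stagger: int = 5) -> str:
    """Insert `element` into `host` at a staggered cut starting at index `cut`.

    The `stagger` bases lying between the two nicks are copied on each side
    of the insert, which models the target-site duplication (TSD).
    """
    if not 0 <= cut <= len(host) - stagger:
        raise ValueError("cut site does not leave room for the staggered cut")
    target = host[cut:cut + stagger]
    return host[:cut] + target + element + target + host[cut + stagger:]


# Toy data (assumption): an 18 bp host and a 14 bp element with TG...CA ends.
host = "GGGAAACCTACTTTTCCC"
element = "TG" + "N" * 10 + "CA"

mu_like = insert_with_tsd(host, element, cut=6, stagger=5)  # 5 bp duplication
b3_like = insert_with_tsd(host, element, cut=6, stagger=6)  # 6 bp duplication
print(mu_like)
print(b3_like)
```

The product is longer than host plus element by exactly the stagger length, which is why a single copy of the repeat is expected at an empty site and two copies at an occupied one.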

2. Searching for Transposable Prophages in Leptospira weilii

Existing prophage prediction algorithms do not reliably predict transposable prophages, often missing one or both ends. This is most likely due to the presence of a rather long, poorly conserved region, the semi-essential (SEE) region, coding for small proteins with uncharacterized functions. Nevertheless, many such prophages have been manually predicted in different types of bacteria, using similarity searches with four conserved proteins encoded by both phage Mu and B3: the transposase TnpA, the transposition target binding ATPase TnpB, the late regulator Mor/C, and the GemA protein of unknown function [19][22][23]. The Tn552 transposase, which is related to B3 TnpA, was added here to this set. These nine proteins (four from Mu, four from B3, and the Tn552 transposase) were first compared to proteins from Leptospira weilii genomes available in RefSeq. No proteins from L. weilii were similar to Mu proteins, but two closely related L. weilii strains, CUDO6 and CUD13, had hits to B3 and Tn552 transposases and to B3 TnpB. These putative transposases were found at the same location in the chromosome sequences (Chr I) of CUDO6 and CUD13 and in the large plasmid (Chr II) of CUDO6 but not of CUD13. The Chr II sequences of CUDO6 and CUD13 were found to be almost identical except for a 39,069 bp insertion in CUDO6 (positions 51,427 to 90,495), the identified transposase proteins being in this inserted sequence. Furthermore, inspection of the gene annotations in this 39,069 bp region revealed the presence of several phage-related proteins, pointing to a putative prophage (in blue in Figure 1). Position 90,496 in CUDO6 Chr II corresponds to position 51,383 in CUD13 Chr II, where it is adjacent to the 6 bp sequence CCTACT. In CUDO6 Chr II, these six base pairs are directly repeated at each end of the 39,069 bp insertion, a footprint for a transposition event generated by a B3/Tn552 family transposase (in green in Figure 1). 
However, contrary to the usual TG-CA inverted repeat conserved at the ends of all Mu and B3-like phage and prophage DNA described so far [18], this new putative prophage had a TA direct repeat at its ends. The two regions where a transposase was identified in the CUDO6 and CUD13 chromosomes were identical to this 39,069 bp predicted prophage in the CUDO6 plasmid, which could have originated from the transposition of a chromosomal copy. In these two Chr I regions, this putative prophage (in blue in Figure 1) was also flanked by a 6 bp direct repeat (AGATGG in both strains), only one copy of this 6 bp sequence being found in other L. weilii genomes in which this prophage is absent. This first predicted prophage is referred to as WETA1 (for weilii TA-TA prophage number 1) in all analyses below.
Figure 1. Location of the seven predicted prophages in L. weilii CUDO6 and CUD13. The different copies of the three distinct prophages WETA1, WETG2, and WETG3 are colored, as well as the sequences of the flanking 6 bp repeats. See text for details.
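The coordinate reasoning used for WETA1 (insertion length, flanking 6 bp direct repeat, terminal dinucleotides) can be sketched as follows. This is an illustrative helper, not the procedure used by the authors; the function names and the toy sequence are assumptions, and coordinates are taken as 1-based and inclusive, as in the text.

```python
def span_length(start: int, end: int) -> int:
    """Length of a region given 1-based inclusive coordinates."""
    return end - start + 1


def describe_insertion(seq: str, start: int, end: int, k: int = 6) -> dict:
    """Summarize a putative prophage located at seq[start..end] (1-based, inclusive)."""
    if start - 1 - k < 0 or end + k > len(seq):
        raise ValueError("not enough flanking sequence to look for the repeat")
    left = seq[start - 1 - k:start - 1]   # k bases just upstream of the insert
    right = seq[end:end + k]              # k bases just downstream of the insert
    insert = seq[start - 1:end]
    return {
        "length": span_length(start, end),
        "direct_repeat": left if left == right else None,
        "ends": (insert[:2], insert[-2:]),
    }


# Coordinates quoted in the text for CUDO6 Chr II.
print(span_length(51_427, 90_495))  # 39069

# Toy sequence (assumption): a 12 bp TA...TA insert flanked by CCTACT on each side.
toy = "AAAA" + "CCTACT" + "TA" + "GGGGGGGG" + "TA" + "CCTACT" + "GGGG"
summary = describe_insertion(toy, start=11, end=22)
print(summary)
```

A matching pair of flanking 6-mers is only suggestive on its own, since short repeats occur by chance; the comparison with an empty site, as done between CUDO6 and CUD13, is what makes the scar convincing.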

3. Predicting Transposable Prophages in Leptospiraceae

To expand the search for possibly functional transposable prophages, the predicted WETA1 transposase protein sequence was used to search Leptospiraceae genomes available in the RefSeq repository. As many as 345 putative transposases were uncovered from 330 Leptospiraceae chromosomes and plasmids (amino acid identity ranging from 23 to 97%). Contigs shorter than 35 kb were discarded, as well as contigs containing very long stretches of phage genes, possibly the result of assembly problems and for which defining precise prophage ends would be almost impossible (Table S1). The remaining 236 contigs (Table S2) were good candidates as they were longer than 35 kb, contained a gene similar to the transposase of WETA1 and other genes similar to those of transposable phages. Complete prophages and their exact ends were searched for in these 236 contigs. However, the 6 bp repeat scar of a transposition event could not be directly identified, as repeated 6 bp long sequences are numerous in genomes. Thus, contigs that coded for identical transposase proteins were aligned at the nucleotide level, considering that they should share identical or very similar prophage ends since these contain the transposase binding and cleavage sites. The distinct flanking regions of these prophage pairs made it possible to identify both prophage ends, which were then inspected manually to find a flanking 6 bp direct repeat. This strategy identified 97 full-length prophages with their 6 bp repeat transposition scar. Proteins encoded by these complete prophages were then used to refine the ends of the 139 remaining prophages.
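The pairwise strategy described above can be sketched on toy data: two contigs carrying the same prophage in different genomic contexts share one long identical block, whose borders approximate the prophage ends, and the 6 bp on each side can then be compared. This is a simplified stand-in that uses exact matching from Python's standard `difflib` module rather than a real nucleotide aligner, and it would be far too slow for contigs of tens of kilobases; the function names and sequences are hypothetical.

```python
from difflib import SequenceMatcher


def shared_block(contig_a: str, contig_b: str) -> tuple:
    """Return (start_in_a, start_in_b, size) of the longest identical block."""
    matcher = SequenceMatcher(None, contig_a, contig_b, autojunk=False)
    match = matcher.find_longest_match(0, len(contig_a), 0, len(contig_b))
    return match.a, match.b, match.size


def flanking_repeat(contig: str, start: int, size: int, k: int = 6):
    """Return the k-mer found on both sides of contig[start:start+size], or None."""
    left = contig[max(0, start - k):start]
    right = contig[start + size:start + size + k]
    return left if len(left) == k and left == right else None


# Toy data (assumption): one 28 bp "prophage" inserted at two different sites.
prophage = "TGCGTTAGGCATCAAGCTTGGATCCGCA"
contig_a = "CCCCC" + "AGATGG" + prophage + "AGATGG" + "CCCCC"
contig_b = "TTTTT" + "CCTACT" + prophage + "CCTACT" + "TTTTT"

a_start, b_start, size = shared_block(contig_a, contig_b)
print(contig_a[a_start:a_start + size] == prophage)
print(flanking_repeat(contig_a, a_start, size))
print(flanking_repeat(contig_b, b_start, size))
```

If the two insertion sites happen to share bases next to the prophage, the shared block extends past the true end, which is one reason the ends still need the manual inspection mentioned in the text.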

4. The Predicted Prophages Predominate in Leptospira Pathogenic Strains

The Leptospiraceae family contains three genera, Leptospira, Leptonema, and Turneriella. The 236 putative prophages defined above occur in thirteen different species of the Leptospira genus and only once in another genus, Leptonema illini (two bacterial genomes, each with one copy of the same prophage). This over-abundance of prophages in Leptospira could be explained by the fact that only two and one bacterial genomes are currently available in RefSeq for the Leptonema and Turneriella genera, respectively. The thirteen leptospiral species with at least one transposable prophage were distributed across the leptospiral phylogeny and belonged to four of the five leptospiral clades [10][11]: 215 prophages were found in the P1-virulent pathogens clade, three in the P1-low-virulent pathogens clade, five in the ‘intermediates’ clade P2, four in the ‘saprophytes’ clade S1, and none in the clade S2. However, the fact that human pathogens are more studied and sequenced than environmental Leptospira could explain these results. Researchers thus estimated the number of bacterial genomes sequenced for each species, and indeed, only five leptospiral genomes of the ‘saprophytes’ clade S2 were available, a trivial explanation for the absence of detected prophages in this clade (Table S3). Yet, at least 31 genomes were available in each of the other four clades, and the proportion of genomes with at least one prophage differed between these four clades. Indeed, the proportion of genomes with one or more prophages was 6%, 10%, 13%, and 26% for the S1, P1-low-virulent, P2, and P1-virulent clades, respectively. In addition, the number of prophages in individual strains was also higher in P1-virulent strains, some of which harbor more than one prophage (average of 1.5 prophages per strain). 
Leptospira weilii was the most colonized species, 63% of the strains containing at least one prophage, with an average of almost two prophages per strain, and sometimes up to four prophages as in CUDO6 (Figure 1).

5. Sequence Comparison between Predicted Prophages

5.1. Identifying (Almost) Identical Prophages

The 236 predicted prophage DNA sequences were compared and 80 were found to be almost identical to at least one other sequence (>99% nucleotide identity). Identical prophages always resided in bacterial hosts of the same species. Within groups of identical prophages, most were found in closely related strains isolated from similar geographical locations, but in fifteen cases they resided in bacteria isolated from different orders of mammals (Homo sapiens and Chiroptera, Rodentia, Artiodactyla or Afrosoricida). As leptospirosis is a zoonosis, finding closely related strains (and thus closely related prophages) in humans and in different mammal species is not surprising (Table S2). In addition, the same prophage is sometimes inserted at an identical location flanked by the same 6 bp repeat in different strains, as is the case in L. interrogans strains 56,662 and 56,652, which both contained an identical TA-TA prophage, although these strains were isolated from rodent (reed vole) and human samples, respectively, and in two different Chinese provinces more than ten years apart. Finally, six individual strains contained several copies of the same prophage (Table S2). These copies are all in separate contigs, as for WETA1 in CUDO6 Chr I and II. These numbers certainly underestimate the occurrence of duplicated copies of the same prophage in a given strain since, as mentioned above, genome assembly tends to put several copies in short contigs missing the ends, which researchers set aside for this analysis. The 80 prophages identified as duplicates were removed to compile a final set of 156 distinct prophages used in further analysis. As many as 74 of these 156 prophages can be considered as full-length, potentially active prophages, because their ends, as well as the 6 bp scar of transposition, could be identified (Table S2).
Among these 74 complete prophages, 28 had a TA dinucleotide at each end flanked by a 6 bp direct repeat, like WETA1 in L. weilii CUDO6 and CUD13, with a length between 37,910 and 40,416 bp. The remaining 46 prophages had a slightly shorter size (between 35,432 and 38,824 bp) and the classical TG-CA ends (Table S2). The TA-TA and TG-CA prophages were both identified in four species (L. interrogans, L. santarosai, L. weilii, and L. kirschneri) and only TG-CA prophages were detected in L. noguchii. As many as seventeen leptospiral strains contained prophages of both TA-TA and TG-CA type. Because their whole chromosome is assembled as a single contig, the CUDO6 and CUD13 genomes allow a more precise analysis. Both strains contain two different TG-CA prophages (WETG2 and WETG3) and one TA-TA prophage (WETA1), this last one being in two copies in CUDO6 (see above and Figure 1). It has to be noted that the chromosomal regions that contain the WETG2 and WETG3 prophages are inverted in CUDO6 versus CUD13. The inversion does not seem to have been generated by any of the prophages because identical prophages are each flanked by the same 6 bp repeat and the same host genes in both strains. Hence, the inversion rather results from another type of recombination event.
The structure of the WETA1 transposition protein TnpA was predicted using AlphaFold Predictions [24]. The MuA transposase comes out as the closest relative, including the two N-terminal end-binding domains, the catalytic DDE domain and a small beta-barrel [25], and a helix with a positive stripe near the C-terminal end (Figure S1). The enhancer binding domain, which so far appears specific to Mu, is missing. In addition, structure comparison defines the WETA1 TnpA catalytic domain as D163 D269 E305. A similar analysis was performed on TnpA from one TG-CA prophage, INNN139, using RaptorX [26][27]. MuA transposase also comes out as the closest relative, with the same conserved domains as in the WETA1 protein, missing the enhancer binding portion, the DDE domain being here predicted as D161, D280, and E314, consistent with the most conserved D, D, and E residues among all prophage and phage transposases.

5.2. Defining Genera and Species Using Genome Comparison

The 156 prophage nucleotide sequences were compared with VIRIDIC [28], a tool that allows delineation of genera and species using thresholds of respectively 70% and 95% identity, as advocated by the ICTV. Using these standard cutoffs, transposable prophages from Leptospiraceae were grouped into 126 species within 25 genera (Figure S2, Table S2). The 126 viral species only include prophages from the same host species, whereas 10 of the 25 viral genera include prophages from several host species. Phages appear to circulate among L. weilii, L. santarosai, L. alexanderi, L. borgpetersenii, L. mayottensis, L. interrogans, L. kirschneri, and L. noguchii, which are phylogenetically closely related and belong to the P1-virulent group, and L. stimsonii belonging to the sub-clade P1-low-virulent. Prophage GC-content ranges from 37.7 to 44%, except for the Leptonema prophage that has a GC-content of 54.7% (54.2% for its host). For 76 of the 126 prophage species, the GC-content could also be obtained for their host contig (Table S2) and the phage GC% was greater and less variable (average of 41.2 and standard deviation of 2.3) than the host’s (avg 38.2, sd 3.7). These differences are mainly due to 26 prophages having a GC-content different from their host (>±5%), found in leptospiral genomes with a low GC-content (~35%). Prophage genera found in different host species had homogeneous GC-content, sometimes different from their host’s. For example, SANN49, SATG48, and NOTG66, three prophages belonging to the same genus, had a similar GC-content (42.9, 42.8, and 42%, respectively), which was similar to that of their L. santarosai hosts for the first two (41.7 and 42.5%), but different from that of the third host, namely, L. noguchii (35.2%).
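The GC comparison above amounts to simple arithmetic, sketched here in Python. The three prophage and host values are those quoted in the text; the function names are hypothetical, and reading the ">±5%" criterion as an absolute difference of more than 5 percentage points is an assumption.

```python
def gc_percent(seq: str) -> float:
    """GC content in percent, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def differs_from_host(phage_gc: float, host_gc: float, threshold: float = 5.0) -> bool:
    """Assumed reading of '>±5%': absolute difference above 5 percentage points."""
    return abs(phage_gc - host_gc) > threshold


# (prophage GC%, host contig GC%) as quoted in the text.
gc_pairs = {
    "SANN49": (42.9, 41.7),
    "SATG48": (42.8, 42.5),
    "NOTG66": (42.0, 35.2),
}
flagged = [name for name, (phage, host) in gc_pairs.items()
           if differs_from_host(phage, host)]
print(flagged)  # ['NOTG66']
```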
Considering more distant intergenomic similarities computed by VIRIDIC, prophages are organized into six large groups, even though BYNN151 from L. bouyouniensis (a soil isolate), KANN77 from L. kanakyensis (also sampled from soil), and UNNN26 (undetermined species isolated from water) were only distantly related to others (Figure S2). One of these six groups is only composed of ILNN158 from Leptonema illini and three groups contain TG-CA prophages while all the TA-TA prophages belong to a single group. The last group contains only prophages with no defined ends. As no nucleotide sequence similarity exists between groups, sequence information at the amino acid level was used to further investigate relationships between these six groups.
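How identity thresholds translate into species and genus groupings can be illustrated with a small single-linkage sketch. This is not VIRIDIC's own clustering procedure, which is not detailed here; the prophage names and similarity values are invented for illustration, and treating the cutoffs as inclusive (≥95% and ≥70%) is an assumption.

```python
from itertools import combinations


def cluster(names, similarity, threshold):
    """Single-linkage grouping: link two genomes when similarity >= threshold."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if similarity[frozenset((a, b))] >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical intergenomic similarities (percent) between four toy prophages.
names = ["A", "B", "C", "D"]
similarity = {
    frozenset(("A", "B")): 97.0,
    frozenset(("A", "C")): 80.0,
    frozenset(("B", "C")): 78.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 12.0,
    frozenset(("C", "D")): 9.0,
}
species = cluster(names, similarity, threshold=95.0)
genera = cluster(names, similarity, threshold=70.0)
print(species)  # [['A', 'B'], ['C'], ['D']]
print(genera)   # [['A', 'B', 'C'], ['D']]
```

In this toy case the species nest inside the genera, mirroring the 126 species within 25 genera reported above.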

5.3. Defining Sub-Families Using the Species Phylogeny

The TnpA transposase and its associated TnpB ATPase are signature proteins of transposable phages. These two proteins were fetched from one representative prophage from each of the 126 species defined by VIRIDIC and from known transposable phages. Considering the phylogeny based on the concatenation of TnpA and TnpB multiple alignments, the 126 prophages form a monophyletic group separated from known transposable phages (Figure 2). As observed for the genomic analysis, the Leptonema prophage remains isolated (named IL group), and Leptospira prophages form five groups: one with all TA-TA prophages, three with TG-CA prophages, and one with only prophages with undefined ends (respectively named TA, TG1, TG2, TG3, and NN groups). These well-supported groups (bootstraps > 80) are congruent with those defined above with VIRIDIC at the nucleotide level. A group of four transposable phages including B3 is closer to these prophages than the rest of the reference phages. This is not surprising since B3 TnpA was the only core protein from B3 and Mu to show significant similarity when compared to leptospiral genomes. Even though most prophages were extracted from Leptospira isolated from mammals, two, associated with amphibians, are mixed with mammal ones in group TG1. Similarly, two prophages found in strains isolated from water are closely related to mammal ones (SINN20 and SINN19 in TA and TG3 groups). The other four environmental leptospiral prophages, in hosts sampled from water and soil, are clearly separated from animal-associated ones in groups TG1 and TG3. The only two prophages found in saprophyte Leptospira (clade S1) are also the only two from soil isolates and are divergent from the rest of the prophages. The two prophages found in P2-intermediate strains are closely related and separated from the rest, whereas no clear difference exists between prophages from the P1-virulent and P1-low-virulent strains. 
Except for the IL group that contains only one member, the five other prophage groups infect several leptospiral species: eight, nine, four, seven, and three for the TA, TG1, TG2, TG3, and NN groups, respectively (Figure S3). In addition, most leptospiral species are infected by prophages from different groups, L. interrogans being infected by members of the five groups of leptospiral prophages (Figure S3).
Figure 2. Phylogeny from the concatenation of TnpA and TnpB computed with RAxML. Bootstrap supports greater than 80 are indicated on internal branches by blue circles. Reference transposable phages are colored in grey at the top, with phage Mu and B3 printed in red. The 126 prophages (representative of the 126 prophage species) are separated into six clades, each with a different color, e.g., the TA clade in dark blue. Prophages for which exact ends were determined are indicated with a filled circle or triangle for TG-CA and TA-TA ends, respectively. Displayed as three outer circles are: (i) the genus number of each prophage calculated by VIRIDIC (from 1 to 25), (ii) the ecosystem in which the Leptospiraceae host was isolated, and (iii) the leptospiral clade of the bacteria.

5.4. Analyzing Gene Content of Transposable (Pro)Phage

To better understand how Leptospiraceae prophages relate to known transposable phages in terms of gene content, a set of 11,745 proteins was built by combining (i) 8875 proteins predicted from the 156 prophages, (ii) 2625 proteins from 48 reference transposable phages infecting a wide range of bacterial species, and (iii) 245 proteins from the three known Leptospira biflexa phages LE1, LE3, and LE4. These 11,745 proteins were clustered into groups of orthologous proteins (OG) using a two-step procedure involving remote homology detection (Table S4). OGs were then functionally annotated using HMM comparisons to the PHROG database [29]. Considering only the 204 transposable (pro)phages, their 11,500 proteins are organized into 480 OGs with at least two proteins and 392 singletons. To reduce the redundancy due to the over-representation of some leptospiral strains, a single representative genome was further considered for each of the 47 genera defined using VIRIDIC (25 genera for Leptospiraceae prophages and 22 for reference phages; Tables S2 and S5). TnpA and TnpB transposition/replication proteins are the only ones shared by all genomes of the 47 genera. Alongside these two, GemA, an early protein of unknown function, is the only OG present in all but one genus. The other most conserved OGs (Figure 3, Table S4), shared by most reference and prophage genomes, are: (i) a tail completion protein, also known as neck protein Ne1 (38 genomes out of 47); (ii) a baseplate protein (36 genomes); (iii) a terminase small subunit (33 genomes); and (iv) the Gam protein, which binds linear duplex DNA, conferring protection against RecBCD exonuclease degradation at the onset of viral DNA packaging [30] (30 genomes). Leptospiraceae prophages and known transposable phages have a different set of core genes. 
Indeed, eight OGs are found in at least 24 of the 25 Leptospiraceae prophage genera and eight OGs are present in at least 21 of the 22 reference genera, with only three OGs in common (TnpA, TnpB, and GemA). Core proteins for reference phages are composed of the late regulator Mor/C, surprisingly absent from all leptospiral prophages, and of a set of proteins involved in the head and neck formation and DNA packaging (namely, a head maturation protease, portal protein, head–tail adaptor, and neck protein Ne1). Concerning other functional modules such as lysis and tail, their gene contents are more heterogeneous than the prophage ones, possibly reflecting the larger diversity of infected hosts. Accordingly, reference phages infecting closely related hosts have a similar gene content, as is the case for B3 and its relatives infecting Pseudomonas (all being Siphoviridae), for Mu and its relatives infecting Gammaproteobacteria (all being Myoviridae), or for phages infecting Rhodobacteraceae (at the bottom of Figure 3). Protein clusters were here built as groups of homologs and not orthologs. Hence, some contain paralogous proteins. Only 3 OGs conserved in more than 10 phages have a significant number of paralogs (>25% of proteins of the OG): a transcriptional regulator (OG1) that has a paralog in 60 prophage genomes (2 having 3 copies); a likely structural protein (OG17) of unknown function (positioned between sheath and tape measure genes) that has paralogs in TG2 and NN; and an OG of unknown function (OG41) in TG1. All paralogous copies are adjacent on the genomes.
Figure 3. Distribution of orthologous groups (OG) of proteins among reference transposable phages, leptospiral phages and predicted Leptospiraceae prophages. One representative genome was chosen for each (pro)phage genus defined by VIRIDIC. Viruses are organized according to the baseplate subunit protein BW2 phylogeny. Each OG is represented by a column and is colored according to its functional category defined in the PHROG database. The first eight columns of the gene content heatmap represent OGs conserved in most prophages and known transposable phages and are, from left to right: OG11 (tail completion or neck protein Ne1), OG6 (GemA), OG3 (TnpA), OG2 (TnpB), OG7 (BW2), OG4 (Terminase small subunit), OG5 (Gam), and OG67 (DNA methyltransferase). OGs containing only one or two proteins were here considered as ORFans and their number in each genome is displayed as a separate column with a grey gradient on the right.
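The core-gene criterion used in this section (an OG present in all but at most one genus of a set, e.g., at least 24 of 25) can be written as a short Python sketch over a presence/absence table. The genus labels and their OG contents below are toy data invented for illustration; only the OG numbers for TnpA (OG3), TnpB (OG2), GemA (OG6), and Gam (OG5) come from the Figure 3 legend, and `core_ogs` is a hypothetical helper.

```python
from collections import Counter


def core_ogs(content: dict, max_missing: int = 1) -> set:
    """OGs present in all genomes of `content` except at most `max_missing`."""
    counts = Counter(og for ogs in content.values() for og in ogs)
    needed = len(content) - max_missing
    return {og for og, n in counts.items() if n >= needed}


# Toy presence/absence data (assumption): three prophage and three reference genera.
prophage_genera = {
    "P1": {"OG3", "OG2", "OG6", "OG5", "OG100"},
    "P2": {"OG3", "OG2", "OG6", "OG5"},
    "P3": {"OG3", "OG2", "OG6"},
}
reference_genera = {
    "R1": {"OG3", "OG2", "OG6", "MorC"},
    "R2": {"OG3", "OG2", "OG6", "MorC"},
    "R3": {"OG3", "OG2", "MorC"},
}

prophage_core = core_ogs(prophage_genera)
reference_core = core_ogs(reference_genera)
shared_core = prophage_core & reference_core
print(sorted(shared_core))  # ['OG2', 'OG3', 'OG6']
```

Allowing one missing genome per set tolerates an incomplete prophage or a missed gene prediction, at the cost of a slightly looser definition of "core".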

5.5. Determining Families Using Phylogeny and Gene Content

A phylogeny was computed for the most conserved structural OG, a baseplate wedge protein (referred to as BW2 in Mu [31]), and was used to organize (pro)phages in a gene content matrix. The six Leptospiraceae prophage groups defined above with the TnpA/TnpB phylogeny are well supported monophyletic groups in the baseplate phylogeny (Figure 3), except that the only genus of the TG2 group is here found among prophages of the NN group. Moreover, the relationships between groups are different in the baseplate and in the transposition proteins phylogeny. Two large sub-trees gather TA, TG1, and IL (in cold colors) and TG2, TG3, and NN (in warm colors) in the baseplate phylogeny, while TG1, TG2, and TG3 were associated using transposition proteins. These two large monophyletic groups, hereafter termed cold and warm groups, are further supported by their gene content. Indeed, seventeen and nineteen OGs are specific to the cold and warm groups respectively (i.e., present in all but one genome of this group and absent from the other). The seventeen and nineteen specific OGs represent two different, almost complete, structural modules. Each group of prophages has its own version of major head, portal, head closure, tail completion, tape measure, tail sheath, and baseplate proteins. It has to be noted that OG23 and OG42, two OGs specific to the cold and warm groups respectively, are annotated as tape measure proteins and exhibit similarity. These two OGs were not grouped into a single one because the similarity is below the defined threshold (it covers less than 15% of the two proteins). Furthermore, even if no similarities are detected between the remaining 34 specific OGs, five of them, specific to ‘cold’ prophages, could be indirectly linked to six specific to ‘warm’ prophages, because they were similar to the same PHROG. 
For example, OG25 and OG43, respectively found only in ‘cold’ and ‘warm’ prophages, were both significantly similar to PHROG52, a much more diverse group including 1295 proteins, annotated as ‘tail proteins’. Similarly, eight OGs containing Mu structural proteins were linked to at least one of the 36 prophage-specific OGs, through similarity to the same PHROG. In addition, even if separated into multiple OGs, a sheath protein is present in all leptospiral prophages, further indicating that they all harbor a contractile tail, like the Myoviridae phage Mu.
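The group-specificity rule stated above (present in all but at most one genome of one group and absent from every genome of the other) can be sketched in the same way. The genome labels and their OG contents are toy data; the assignment of OG23 and OG25 to the cold group and of OG42 and OG43 to the warm group follows the text, and `specific_ogs` is a hypothetical helper.

```python
def specific_ogs(group: dict, other: dict, max_missing: int = 1) -> set:
    """OGs in all but at most `max_missing` genomes of `group`, absent from `other`."""
    seen_in_other = set().union(*other.values())
    candidates = set().union(*group.values()) - seen_in_other
    needed = len(group) - max_missing
    return {
        og for og in candidates
        if sum(og in ogs for ogs in group.values()) >= needed
    }


# Toy genomes (assumption); OG3 (TnpA) is shared by both groups.
cold = {
    "c1": {"OG3", "OG23", "OG25"},
    "c2": {"OG3", "OG23", "OG25"},
    "c3": {"OG3", "OG23"},
}
warm = {
    "w1": {"OG3", "OG42", "OG43"},
    "w2": {"OG3", "OG42", "OG43"},
    "w3": {"OG3", "OG42"},
}

cold_specific = specific_ogs(cold, warm)
warm_specific = specific_ogs(warm, cold)
print(sorted(cold_specific))  # ['OG23', 'OG25']
print(sorted(warm_specific))  # ['OG42', 'OG43']
```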
As expected, LE1 and LE3 non-transposable phages have very little similarity with any of the other genomes analyzed (at the top of Figure 3). Only fourteen and five of their respective 81 and 83 proteins were clustered in OGs with proteins from other phages. They lacked all the above conserved proteins, except a baseplate component. Some proteins from these L. biflexa phages, however, have their closest relative among transposable prophage proteins. When prophage OGs are compared to NCBI RefSeqVirus proteins (526,375 proteins from 13,778 viruses), a handful are most similar to proteins from these L. biflexa Caudovirales: (i) OG16 has its best hit against the tail fiber protein from LE3; (ii) OG18 (endolysin) and OG20, adjacent on genomes, are most similar to gp49 and gp51 of LE1; (iii) OG47 is most similar to the baseplate spike of LE1; and (iv) the two genes at the end of BYNN151 and ILNN158, just downstream of the tail fiber protein coding genes, are similar to hypothetical proteins of LE1 (gp46-47) and a hypothetical protein and a metallo-protease (gp80-81), respectively (Figure 4).

Figure 4. Genomic organization of the predicted prophages. One reference genome was chosen for each of the six prophage clades, and reference phages Mu, B3, and D3112 are also shown. Protein coding genes are indicated with arrows showing their transcription orientation and are colored according to their functional category defined in the PHROG database. In addition, proteins whose best match in RefSeqVirus is a L. biflexa phage (LE1, LE3, or LE4) are highlighted in red dashed arrows. Similarities between genes from adjacent viruses are displayed and were here defined by OG membership. On the left are two schematic representations of the head morphogenesis and final phage Mu structure. TnpA: transposase; TnpB: DNA transposition protein; MCP: major capsid protein; Pilot: minor head and ejection (pilot) protein; Portal: portal protein; TerL: terminase large subunit; TerS: terminase small subunit; Sc: scaffolding; Prot: head maturation protease; BH, BS, BW: baseplate hub, spike, and wedge proteins; Sh: tail sheath; Tube: tail tube; MTP: major tail protein; TtpM: tail tape measure protein; Tf: tail fiber; SEE: semi-essential region; Ad, Hc, Tc, Tt: head-tail joining; Lys: endolysin; M23pep: metallo-peptidase; holin-p: holin/antiholin pair; and cys-prot: cysteine protease. The location of the tail completion protein (Tc, also known as neck protein Ne1) on the viral particle remains unknown. This figure summarizes Figure S4.

6. Genome Organization of Leptospiraceae Prophages

Known transposable phage genomes are organized into functional modules, expressed sequentially, that are involved in regulation, replication (by transposition), and the building of head and tail structures. These modules are apparent on the genomic maps of Leptospiraceae prophages from the six groups defined earlier, and their genomic organization is very stable within each group (Figure S4). All prophages display a left and a right arm, transcribed in opposite directions and separated by one to three regulatory genes, which most likely regulate the lysis–lysogeny response and the lytic cycle (Figure 4 and Figure S4). This is typical of the B3 genome organization. When present, the gemA gene is near one genome end (except in the TG2 group, represented here by NOTG62), as is also the case in B3. As mentioned earlier, none of the predicted prophages has the paralogous Mor/C genes, the middle and late transcription activators, so far considered to be marker genes of transposable phages [21][22]. The replication module (the DDE transposase and the associated ATPase) and the gam gene are grouped (except in the NN group) and transcribed in the opposite direction from the regulatory genes. The gam gene, part of the semi-essential (SEE) region in Mu and B3, is present in almost all the predicted prophages and is often (but not always) located next to a series of short genes of unknown function, which could constitute the SEE region.
The position of the head structural genes is variable. In the TG1 group, the TerS-TerL-Portal block is separated from the other structural genes, as is the TerS-TerL-Portal-Pilot-Protease-MCP-adaptors block in the NN and TG2 groups. Head and tail modules are always transcribed in the same orientation, and in the remaining TA, IL, and TG3 groups the head genes lie upstream of the tail genes. The tail module is at the end of all prophages. The high sequence diversity between prophages of the cold and warm groups prevented the clustering of their structural proteins in orthologous groups. Nevertheless, various genes could be annotated by comparison with PHROG protein families, and the annotated genes are in the same order in all genomes: the tail sheath (Sh), tail tube, tail tape measure protein (TtpM), baseplate hub (BH1, BH2) and wedge subunits (BW1, BW2, and BW3), and tail fiber (Tf) proteins.
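
A conserved gene order of this kind can be checked programmatically. The sketch below is only an illustration: the reference order is an assumption taken from the order in which the tail genes are listed in the text, the three gene lists are invented toy examples rather than real prophage annotations, and the function name is hypothetical.

```python
# Assumed reference order, taken from the order in which the text lists the tail genes.
TAIL_ORDER = ["Sh", "Tube", "TtpM", "BH1", "BH2", "BW1", "BW2", "BW3", "Tf"]


def follows_reference_order(genes, reference=TAIL_ORDER):
    """Return True if the reference genes found in `genes` appear in reference order.

    Genes absent from the reference (e.g. head genes, hypothetical proteins)
    are ignored, and reference genes missing from a genome are tolerated.
    """
    rank = {name: i for i, name in enumerate(reference)}
    ranks = [rank[g] for g in genes if g in rank]
    return all(a < b for a, b in zip(ranks, ranks[1:]))


# Invented toy annotations, listed in genomic order (not real prophages).
toy_genomes = {
    "prophage_A": ["TerS", "TerL", "Portal", "Sh", "Tube", "TtpM", "BH1", "BW1", "Tf"],
    "prophage_B": ["Sh", "Tube", "hyp", "TtpM", "BH1", "BH2", "BW1", "BW2", "BW3", "Tf"],
    "prophage_C": ["Sh", "TtpM", "Tube", "Tf"],  # deliberately shuffled
}

for name, genes in toy_genomes.items():
    print(name, follows_reference_order(genes))
```

Because the check relies on annotation labels only, it would still apply when sequences are too divergent to be clustered in the same orthologous group, which is the situation described for the cold and warm groups.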
Lysis proteins, a key functional module present in all Caudovirales and required for the release of mature viral particles at the end of the lytic cycle, were not immediately apparent. Three types of proteins together ensure the timely degradation of the host cell wall. The spanins, which appear to bridge the outer and inner membranes, come in two forms. The two-component spanins comprise a lipoprotein located in the outer membrane, the o-spanin, and an inner membrane i-spanin with a coiled-coil periplasmic domain. The unitary u-spanins span the periplasm and are anchored in the inner membrane by a C-terminal transmembrane domain and in the outer membrane by an N-terminal lipoprotein domain. The holin/antiholin pair of proteins pierces the inner membrane at the appropriate time to allow the endolysin to access and degrade the peptidoglycan [32]. Using a procedure to specifically identify these lysis functions, a metallo-peptidase with a conserved M23 domain was identified in all TG-CA prophages. This gene might code for an endolysin, as in Meiothermus bacteriophage MMP17, where it has been shown to disrupt bacterial cells and exhibit antimicrobial activity against both Gram-negative and Gram-positive pathogenic bacteria [33]. Holins and spanins were predicted in the vicinity of this M23-domain peptidase, supporting the presence of a lysis module at that position. In the TA-TA prophages, a holin/antiholin pair and a predicted cysteine protease, a putative endolysin, reside together in the middle of the genome. Except for those in groups NN and TG1, all prophages have the putative lysis module close to the head module.
Finally, several prophages carry an IS insertion that extends their length by 1–2 kb, e.g., an IS3 family element in WETA8 and WETA10 (TA group) and in SANN28 and WENN9 (TG1 group), and an IS110 family element in INNN112. Interestingly, in addition to its TnpA, INNN139 possesses another protein clustered in OG3; it is encoded on the left arm of the genome, slightly overlapping the major head protein coding gene, and likely belongs to an IS481 family element that might disturb the head maturation of this prophage, thus inactivating its ability to produce new virions. The position of the IS near the terL gene in WETA8 may also hamper its activity.

This entry is adapted from the peer-reviewed paper 10.3390/ijms222413434


