It is now widely recognized that genetic regulation of bacterial physiology cannot be fully understood without considering the possible participation of small non-coding RNAs (sRNAs). However, current genome annotations for most bacterial species still largely overlook sRNA genes. Here, we describe strategies undertaken to characterize the noncoding transcriptome of rhizobia, a group of soil bacteria that are well known for their ability to establish agriculturally and environmentally relevant mutualistic nitrogen-fixing symbioses with legumes.
Small non-coding RNAs (sRNAs) are ubiquitous components of bacterial adaptive regulatory networks underlying stress responses and intracellular infection of eukaryotic hosts. However, non-protein coding genes largely escape classical genetic screens and the primary annotation of a single bacterial genome sequence, which is essentially limited to the prediction of open reading frames (ORFs), tRNAs, and rRNAs. Before the advent of high-throughput sequencing, computational comparative genomics was thus the tool of choice to identify conserved regions with putative functions in the unannotated portions of the genome. Accordingly, pioneering genome-wide searches for sRNAs in rhizobia relied on the comparison of intergenic regions (IGRs, i.e., genomic regions between ORFs) from phylogenetically close species to unveil trans-acting sRNAs. Specifically, three studies published almost concurrently used the IGRs from the alfalfa symbiont Sinorhizobium meliloti (reference strain Rm1021) as queries to interrogate the genomes of α-proteobacterial relatives, i.e., plant symbionts (e.g., Mesorhizobium loti, Rhizobium etli or R. leguminosarum bv. viciae), phytopathogens (Agrobacterium tumefaciens) and animal pathogens (Brucella species) [1][2][3]. All three searches combined genomic comparisons with other known features of trans-sRNAs, namely association with orphan transcription signatures (promoter motifs and/or Rho-independent transcriptional terminators) and conservation of thermodynamically stable secondary structures. Collectively, these approaches predicted more than a hundred IGRs in the three replicons (chromosome, and pSymA and pSymB megaplasmids) of the Rm1021 genome putatively encoding a trans-sRNA. Northern blot hybridization of total RNA, RACE (Rapid Amplification of cDNA Ends) mapping of transcript boundaries and/or microarray probing experimentally confirmed that a few dozen of these candidate IGRs did express sRNA species from independent transcription units.
These studies also anticipated the symbiotic and/or stress-dependent expression of subsets of the identified sRNAs. A similar combination of in silico searches and experimental approaches also delivered the first inventories of sRNAs expressed by R. etli CFN42, Bradyrhizobium japonicum USDA110 and M. huakuii 7653R [4][5][6], the symbiotic partners of common bean (Phaseolus vulgaris), soybean (Glycine max), and milkvetch (Astragalus sinicus), respectively.
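The first step shared by these comparative searches, collecting the IGRs that serve as queries, can be illustrated with a minimal Python sketch. It is not the code used in the cited studies: the function name `intergenic_regions`, the 50-nt minimum length, and the treatment of the replicon as linear (no wrap-around at the origin) are assumptions made only for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) coordinates


def intergenic_regions(
    genes: List[Interval],
    replicon_length: int,
    min_length: int = 50,  # hypothetical cutoff, not taken from the cited studies
) -> List[Interval]:
    """Return the gaps between annotated genes that span at least min_length nt.

    Overlapping or nested genes are handled by tracking the first position
    not yet covered by any annotation. The replicon is treated as linear.
    """
    igrs: List[Interval] = []
    cursor = 1  # first position not yet covered by an annotated gene
    for start, end in sorted(genes):
        if start - cursor >= min_length:
            igrs.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    # Region between the last gene and the end of the replicon
    if replicon_length - cursor + 1 >= min_length:
        igrs.append((cursor, replicon_length))
    return igrs


if __name__ == "__main__":
    toy_genes = [(100, 400), (350, 600), (700, 900)]
    print(intergenic_regions(toy_genes, replicon_length=1000))
```

In an actual search, each IGR returned this way would then be compared against the genomes of related species and screened for orphan promoters, terminators and conserved secondary structure, as described above.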
Inherent limitations to both computational searches and microarray designs necessarily biased these seminal genome-wide screens for sRNAs in rhizobia toward the identification of putative intergenic, trans-acting, conserved riboregulators. Straightforward experimental identification of transcription start sites (TSS) associated with coding sequences, untranslated mRNA regions, and noncoding RNA genes in prokaryotic genomes is now feasible with the implementation of oriented differential RNASeq (dRNASeq) [7] or Cappable-Seq [8] strategies. These two experimental setups target the primary transcriptome on a strand-specific basis upon terminator exonuclease (TEX)-mediated depletion of the processed RNA species or streptavidin capture of primary transcripts capped at their distinctive triphosphorylated 5’-ends, respectively. In particular, dRNASeq surveys rediscovered the early-identified sRNAs in S. meliloti and B. japonicum and further uncovered the complexity of rhizobial transcriptomes with the addition of hundreds of unknown trans-sRNAs, as well as thousands of mRNA-derived sRNAs and antisense RNAs (asRNAs) [5][9][10][11][12]. Other RNASeq studies were designed to profile specific subpopulations of cellular transcripts supposedly enriched in sRNAs, e.g., RNA species co-immunoprecipitated with the major bacterial RNA chaperone Hfq. However, this approach resulted in only a minor addition to the sRNA landscape revealed by deep sequencing of S. meliloti total RNA [13].
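The logic of dRNASeq-based TSS detection can be sketched as a comparison of read 5’-end counts between the TEX-treated and untreated libraries: positions where primary transcripts start should be enriched after TEX treatment. The Python snippet below is a simplified illustration for a single strand, not the procedure of the cited studies. The function name `call_tss`, the pseudocount, and both thresholds are hypothetical choices.

```python
from typing import Dict, List


def call_tss(
    tex_plus: Dict[int, int],   # position -> read 5'-end count, TEX-treated library
    tex_minus: Dict[int, int],  # position -> read 5'-end count, untreated library
    min_reads: int = 10,          # hypothetical minimum support in the TEX+ library
    min_enrichment: float = 2.0,  # hypothetical TEX+/TEX- ratio cutoff
    pseudocount: float = 1.0,     # avoids division by zero at uncovered positions
) -> List[int]:
    """Return positions whose 5'-end signal is enriched after TEX treatment."""
    candidates: List[int] = []
    for pos, plus in tex_plus.items():
        minus = tex_minus.get(pos, 0)
        enrichment = (plus + pseudocount) / (minus + pseudocount)
        if plus >= min_reads and enrichment >= min_enrichment:
            candidates.append(pos)
    return sorted(candidates)


if __name__ == "__main__":
    treated = {120: 80, 450: 30, 900: 5}
    untreated = {120: 10, 450: 40, 900: 0}
    print(call_tss(treated, untreated))
```

In this toy input, position 450 is rejected because its signal is not enriched by TEX treatment (as expected for a processing site), and position 900 is rejected for lack of read support. Real analyses would additionally require library-size normalization and replicate agreement, which are omitted here.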
Prokaryotic gene prediction pipelines such as EuGene-P have incorporated the novel gene structural features uncovered by dRNASeq for the accurate reannotation of the S. meliloti (strains Rm1021 and Rm2011) and B. japonicum USDA110 genomes [10][11][12]. Even though dRNASeq mostly serves annotation purposes, comparison of transcript levels in some datasets identified sRNAs that are differentially expressed between free-living and nodule endosymbiotic bacteria. In this regard, the identification of nodule-expressed sRNAs by RNASeq-based profiling of RNA derived from each developing zone of indeterminate nodules induced by Rm2011 on the model legume Medicago truncatula is noteworthy [14]. On the other hand, comprehensive mapping of TSS in Rm2011 and USDA110 has facilitated the prediction of motifs putatively recognized by alternative σ factors such as RpoE2 (σE2) or RpoN (σ54) in the promoter regions of some of the identified sRNAs, thus placing these RNA regulators in major stress response and/or symbiotic regulons [10][11][12]. The integration of the updated genome annotation files, primary expression profiles, and promoter predictions provides a solid resource for the forthcoming investigation of the function of sRNAs in plant symbiotic bacteria. Similar in silico and experimental workflows can be implemented to explore the noncoding transcriptome of any bacterial species.
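Once TSS are mapped, promoter motif prediction amounts to scanning the sequence immediately upstream of each TSS. As a rough illustration, the Python sketch below searches for a deliberately simplified σ54-type signature, the conserved GG and GC dinucleotides separated by 10 nt (GG-N10-GC). This reduced pattern, the function name `find_sigma54_like`, and the toy sequence are assumptions for illustration only; the cited studies relied on more elaborate motif models.

```python
import re
from typing import List, Tuple

# Simplified sigma54-type signature (assumption): GG, 10 nt spacer, GC.
# The lookahead allows overlapping candidate sites to be reported.
SIGMA54_LIKE = re.compile(r"(?=(GG[ACGT]{10}GC))")


def find_sigma54_like(upstream: str) -> List[Tuple[int, str]]:
    """Scan the sequence upstream of a TSS for GG-N10-GC sites.

    `upstream` must end at position -1, i.e. its last base lies immediately
    upstream of the TSS. Returns (offset, site) pairs, where offset is the
    position of the first G relative to the TSS (negative values).
    """
    seq = upstream.upper()
    return [(m.start() - len(seq), m.group(1)) for m in SIGMA54_LIKE.finditer(seq)]


if __name__ == "__main__":
    # Toy 30-nt upstream region (not a real rhizobial promoter)
    toy_upstream = "AAAT" + "TGG" + "CACGATTTTT" + "GC" + "ATATATATATA"
    print(find_sigma54_like(toy_upstream))
```

A plain pattern match of this kind will report many spurious sites across a genome, so hits would normally be restricted to a narrow window around positions -24 and -12 and scored against a position weight matrix before assigning an sRNA to a regulon.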