Bacteria are increasingly used for biotechnological applications such as bioremediation, biorecovery, bioproduction, and biosensing. The development of strains suited for such applications requires a thorough understanding of their behavior, with a key role for their transcriptomic landscape. We present a thorough analysis of the transcriptome of Cupriavidus metallidurans CH34 cells acutely exposed to copper by tagRNA-sequencing. C. metallidurans CH34 is a model organism for metal resistance, and its potential as a biosensor and candidate for metal bioremediation has been demonstrated in multiple studies. In total, 7500 transcription start sites (TSSs) were annotated and classified with respect to their location relative to coding sequences (CDSs). Predicted TSSs were used to re-annotate 182 CDSs. The TSSs of 2422 CDSs were detected, and consensus promotor logos were derived. Interestingly, many leaderless messenger RNAs (mRNAs) were found. In addition, many mRNAs were transcribed from multiple alternative TSSs. We observed pervasive intragenic TSSs both in sense and antisense to CDSs. Antisense transcripts were enriched near the 50 end of mRNAs, indicating a functional role in post-transcriptional regulation. In total, 578 TSSs were detected in intergenic regions, of which 35 were identified as putative small regulatory RNAs. Finally, we provide a detailed analysis of the main copper resistance clusters in CH34, which include many intragenic and antisense transcripts. These results clearly highlight the ubiquity of noncoding transcripts in the CH34 transcriptome, many of which are putatively involved in the regulation of metal resistance.
Environmental pollution with toxic metals due to anthropogenic activities is an internationally growing concern . The exposure and the risk of elevated concentrations of these pollutants in the environment can lead to bioaccumulation and harmful effects , which are facilitated by the high toxicity and recalcitrance of some metals . Consequently, monitoring tools for metal accumulation in natural environments such as soils and water bodies are needed. Physicochemical analysis techniques, while accurate and sensitive, often fail to chart the bioavailability and the toxicity of the polluting components . At the same time, microorganisms show great promise as biosensors to quantify the bioavailable fraction of heavy metals such as copper , an essential trace element that is highly toxic when overly abundant . In addition, microorganisms can be used to combat environmental contamination with heavy metals in a process called bioremediation . Bioremediation has been the focus of extensive research in recent years as a clean and efficient alternative to conventional strategies (reviewed in Tabak et al., 2005  and Akcil et al., 2015 ).
Cupriavidus metallidurans strains are exemplary β-proteobacteria in metal-contaminated industrial environments . Type strain CH34 was first isolated from a decantation basin in a zinc factory near Engis, Belgium, in 1976 . It was quickly shown to encode resistance mechanisms to a wide range of metals  and has become a model organism to study heavy metal resistance (HMR) in bacteria . Copper resistance in strain CH34 is mediated by multiple cooperating Cu detoxification systems; copF and copA1B1C1D1 encode a Cu efflux P-type ATPase and a periplasmic detoxification system, respectively, and are part of the extensive 21-gene cop cluster. The neighboring silDCBA cluster encodes a heavy metal efflux- resistance nodulation division (HME-RND) driven system. These gene clusters are encoded on the pMOL30 megaplasmid, and homologous systems can be found on its chromosome and chromid (Table 1). In addition, many more accessory genes may play a role in Cu resistance. The integration of these systems brings about a minimum inhibitory concentration of 3 mM Cu2+, three times higher than that of Escherichia coli K38 . A detailed description of Cu resistance in strain CH34 can be found in Mergeay and Van Houdt .
Table 1. Overview of the main Cu resistance gene clusters in Cupriavidus metallidurans CH34.
In a previous study, biosensors for metals such as Pb2+, Zn2+, and Cu2+ have been developed based on Cupriavidus strains . These biosensors function via a transcriptional fusion of metal-specific promotor regions to a luxCDABE luciferase gene cluster, which allows for the emission of a measurable bioluminescent signal upon translation (BIOMET® system). The strength of this signal is proportional to the biologically available fraction of a specific metal. Cupriavidus-based biosensors have been used for the characterization of bioavailable metal fractions in soil, sediments, mineral wastes, etc. . In addition, the bioremediation potential of CH34, owing to its capacity for metal solubilization and biocrystallization, has been demonstrated in several experimental setups . For instance, the deposition of ZnCO3 and CdCO3 crystals around the cell surface, both biologically catalyzed and via abiotic processes, can lead to efficient depletion of these toxic metal ions from the environment. Cupriavidus strains are especially interesting in the case of mixed pollution with metals and recalcitrant organic compounds, since they are often able to degrade a broad array of aromatic compounds . However, the construction of bacterial strains able to efficiently cope with mixed pollutions remains challenging .
In order to understand, optimize, and control the molecular processes underlying the above mentioned applications, it is vital to understand their regulation. Indeed, an elaborate network of sigma factors  and transcriptional regulators  has been found to play an important role in CH34. However, new paradigms in bacterial gene regulation highlight the role of small regulatory RNAs (sRNAs), acting at a (post-)transcriptional level in disparate stress responses . For instance, sRNAs are involved in a plethora of biological processes, including virulence , antibiotic resistance  and mobile genetic elements  as well as oxidative and metal stress  and the degradation of organic compounds .
Thus, in order to fully understand the intricate response of C. metallidurans CH34, a highly detailed map of its transcriptome is a basic necessity. Although its genome is fully sequenced and annotated , and transcriptome data in response to metal stress are available , these data do not allow analyzing pertinent features such as transcription start sites (TSSs), 5′ untranslated regions (5′ UTRs), RNA processing sites (PSSs) and regulatory RNAs. Therefore, we used a tagRNA-sequencing (tagRNA-seq) approach, a modified RNA-seq method that is based on the differential labeling of 5’ RNA ends and that enables whole transcriptome sequencing with the discrimination of primary from processed 5’ RNA ends . The ability of tagRNA-seq to annotate TSSs has been well established in recent years, and it is commonly exploited in research on regulatory features of the bacterial transcriptome. Copper was used as imposed stress, since CH34′s copper resistance mechanism is unique in respect to its complexity compared to others , and the putative functions of sRNAs in the bacterial copper response remain insufficiently studied.
In a next step, the tagRNA-seq data were used to identify and probe the type of the 5′ ends of RNAs in order to obtain a global snapshot of the transcriptional organization in C. metallidurans CH34 and to scrutinize the impact of Cu stress on the RNA landscape. The transcription start site (TSS) detection algorithm, as described in Materials and Methods, was fine-tuned to annotate the 7500 most highly expressed TSSs in both the control and the copper condition. This number was roughly based on the number of annotated genes and pseudogenes in CH34 (6514). These TSSs were divided into primary, secondary, internal, antisense, and orphan TSS according to their location, similar to previous publications  with minor modifications. A primary TSS (pTSS) is the main TSS of a gene or operon and is located within 200 bp upstream of a start codon. It is expressed at least twice as strongly as the second most highly expressed TSS within those 200 bps. The remaining TSSs in this region were classified as secondary TSSs (sTSSs). An internal TSS (iTSS) is located within and on the coding strand, while an antisense TSSs (aTSS) is located on the non-coding strand of a CDS or within 100 bp upstream of its start codon. The orphan TSSs (oTSSs) are not associated with CDSs. An overview of this classification for every replicon is shown in Table 2. As TSS classification was prioritized via the following cascade: pTSS > sTSS > iTSS > aTSS > oTSS; there is no overlap between the different classes. In addition, the tagRNA-seq data was verified with 5′ and 3′ RACE of selected transcripts, including copA1 messenger RNA (mRNA) and several antisense and orphan transcripts (see below). Similarly, the previously identified TSSs of cnrC, czcI, and pbrA were also detected and confirmed by the tagRNA-seq results , both of which corroborate the validity of the TSS identification with the tagRNA-seq procedure.
Table 2. Number and overlap of detected transcription start sites (TSSs) in each replicon 1.
Primary TSSs were detected for 2422 CDSs, amounting to 35.8% of all annotated CDSs. In a first step, this information was combined with the output of the recently developed GeneMarkS-2 gene prediction tool . CDSs for which no pTSS was detected and that contained an iTSS between the original start codon and the newly predicted one, being preceded by a predicted ribosome-binding site (RBS), were scrutinized. This resulted in the putative reannotation of 182 CDSs. In addition, the consensus RBS and the spacer length were derived (Figure 2).
Figure 2. Ribosome-binding site sequence motif (a) and spacer length distribution (b) of C. metallidurans CH34.
Four cases were related to metal resistance. First, related to copper, copM encoding a 136 aa uncharacterized pre-protein was putatively re-annotated to be 114 aa. The cleavage site was similar for both, i.e., the processed proteins were identical, but the signal peptide prediction was much better for the newly annotated CDS (0.82 vs. 0.66; SignalP-5.0). In addition, cupC encoding 133 aa Cu chaperone was putatively reannotated to be 66 aa. Related to the czc cluster involved in cadmium, zinc, and cobalt resistance, czcN (273 aa; 30.1 kDa predicted molecular weight) encoding a membrane-bound isoprenylcysteine carboxyl methyltransferase was putatively reannotated to be 216 aa (23.7 kDa predicted molecular weight). CzcN is homologous to NccN of Alcaligenes xylosoxidans 31 (reclassified as C. metallidurans), which was found to be a 23.5 kDA protein . Finally, the pbrD gene coding for a Pb2+-binding protein  could be putatively reannotated based on both TSS identification and GeneMarkS-2 prediction (CDS and RBS). However, the new CDS is not in frame and codes for a 150 aa protein unrelated to PbrD. These results merit further study of its function and sequence, since pbrD appears not to be necessary for Pb2+ resistance and is absent in all other known lead-resistant bacteria . Moreover, although it is heterologously expressed in E. coli, it does show Pb2+ absorption .
Next, the 5′ untranslated region (5′ UTR) lengths and their distribution were calculated (Figure 3), which showed a peak at a length of 26 nt, after which it tapered off. No difference was noted between the control and the Cu condition, indicating that Cu exposure does not entail a global measurable bias in the 5′ UTR length of expressed genes. A curious observation was the high number of leaderless mRNA (4.67% of all pTSSs), defined as those mRNAs with a 5′ UTR length shorter than 5 nt. Because of the lack of a 5′ UTR, these transcripts are translated via non-canonical mechanisms. Similar observations have been made in Burkholderia cenocepacia , Streptomyces coelicor , Salmonella enterica , Caulobacter crescentus , and Helicobacter pylori . A functional enrichment analysis showed that leaderless mRNAs was significantly enriched in eggNOG class K (transcription), which has already been observed in many bacterial species , and class L (replication, recombination, and repair). Several leaderless mRNAs coding for transcriptional regulators were also differentially expressed in the presence of Cu2+, including arsR, ompR, ohrR, and zniR. In addition, copJ, merP, rpoN, and czcI2 were transcribed into leaderless mRNAs. The prevalence of leaderless mRNAs has been shown to be higher in organisms with an earlier evolutionary age, and has been linked to extreme habitats .
Figure 3. The 5′ untranslated region (UTR) length distribution in the C. metallidurans transcriptomes. CDS: coding sequence.
Finally, to complete the promoter information, the consensus sequence in a region of 100 nt around each pTSS was identified with Improbizer , which allowed us to weight the location of detected motifs. On the chromosome, the chromid, and the megaplasmid pMOL30, a TAnAAT consensus motif was detected around the −10 position, generally flanked by GC-rich stretches. At the −35 position, a TTGACA-like motif with higher variability was detected, again flanked by short GC-rich stretches. It is possible that CDSs preceded by these consensus motifs, which are similar in location and sequence to those found in E. coli, are under transcriptional control of the two housekeeping σ70-factors rpoD1 (Rmet_2606) and rpoD2 (Rmet_4661) . Interestingly, although pMOL28 and pMOL30 do not contain essential housekeeping genes, TAnAAT consensus motifs were found on pMOL30 but not pMOL28, indicating a difference in sigma factor recruitment for the initiation of pMOL28 and pMOL30 gene transcription. Noteworthy, in contrast with pMOL30, which does not encode any sigma factors, pMOL28 encodes the ECF sigma factor CnrH. However, it is only involved in transcription of the cnr locus . In addition, it has been shown that none of the remaining 10 ECF sigma factors, which are encoded on the chromosome (6) and the chromid (4), are necessary for the upregulation of any of the operons responding to metal shock . Nevertheless, the involvement of non-plasmidic sigma factors in the transcription of plasmidic genes is not unexpected. On pMOL30, most CDSs preceded by a TAnAAT motif have an unknown function, but some genes involved in metal resistance are preceded by a TAnAAT-like pattern (czcE, pbrT, ubiE, and copM). Further determination of the functions of the remaining hypothetical proteins on both pMOL28 and pMOL30 may provide conclusive insights in this matter.
Alternative mRNAs deriving from secondary TSSs were detected for 623 CDSs. Among the genes differentially expressed under Cu stress, a high number of regulatory proteins was readily apparent, including the main regulators of pMOL30-based (copR1) and chromid-based periplasmic (copR2) Cu detoxification systems as well as ohrR (Rmet_3619), zniS (Rmet_5322), bzdR (Rmet_1223), iscR, rpoH, and czcR2. In addition, several genes involved in HMR and stress responses were transcribed from alternative TSSs. Alternative TSSs have been reported in several bacterial and archaeal species  and have been linked to differences in translational efficiency and mRNA stability . Longer 5′ UTRs enable more extensive post-transcriptional regulation, e.g., via sRNAs, and can contain riboswitches that respond to various environmental cues . Interestingly, many mRNAs coding for sigma factors showed multiple TSSs, including rpoA, rpoD1, rpoE, rpoH, rpoI, rpoK, and rpoM, which could have an impact on the transcription of genes associated with a particular sigma factor.
The second most abundant TSS class after the pTSSs was that of the iTSSs. As all TSSs were derived from primary 5′-PPP-RNA, these iTSSs are unlikely to be the result of RNA degradation, e.g., by 5′-exonucleases, although some could be pTSSs or sTSSs due to misannotated CDSs, or transcriptional noise from spurious promotors. More importantly, iTSSs can also mark the start site of noncoding RNAs.
Even though the roles of the detected iTSSs are unclear, they are pervasive in the CH34 transcriptome. Functional enrichment analysis showed that, in the copper condition, eggNOG classes C, F, J, M, O, P, and T were enriched in iTSSs. In total, 55 of the 167 genes involved in metal resistance contained at least 1 iTSS, e.g., almost all cop1 genes and follow-up functional analyses could provide more insights into their specific role in metal resistance. The iTSSs related to the Cu resistance mechanisms (cop1 and sil clusters on pMOL30) are provided and discussed in detail below.
In both the control and the Cu condition, iTSSs were enriched in the first 10% of the encompassing CDS, while iTSSs were relatively depleted in the last 10% (Figure 4). This observation lends credibility to the hypothesis that many detected iTSSs are pTSSs or sTSSs to misannotated CDSs. However, these were not included in our re-annotation, as the current approach does not allow discriminating between the original and the putative new start codon. Of course, these uncertainties could be addressed with additional protein-based experiments such as proteome profiling (although not straightforward as the absence of a longer protein over the presence of a shorter one needs to be shown), but this is out of the scope of this study. Since there is currently no evidence that iTSSs in the first percentile are linked more to coding RNAs than in other percentiles, we conclude that many iTSSs either mark the start site of noncoding RNA or are the result of transcriptional noise.
Figure 4. Percentile-wise distribution of iTSSs (a) and aTSSs (b) positions relative to the cognate CDS.
Similar to intergenic TSSs, antisense TSSs are ubiquitously found in the CH34 transcriptome. In total, 2129 aTSSs were associated with 692 CDSs, indicating that many mRNAs can putatively be complexed by multiple antisense transcripts. Of all aTSSs, 408 showed a logFC smaller than −1 or greater than 1, indicating differential expression. Six differentially expressed aTSSs were validated by 5′ RACE experiments. It is generally assumed that aTSSs mark the start of noncoding transcripts, however, whether a detected antisense transcript has a (regulatory) function or is the result of transcriptional noise  needs to be determined case-by-case. Since antisense transcripts can obscure regulatory features of the sense transcript by perfect base pairing , the location of aTSSs relative to the 5′ and 3′ ends of the sense transcript was scrutinized (Figure 4b). There is a clear enrichment of aTSSs near the 5′ end of the sense CDS, with a less prominent enrichment near the 3′ end, indicating a putative regulatory role of the antisense transcripts.
Specifically related to copper resistance, aTSSs were found in silB, copA1, copF, copG, copH, copL, and copM (Figure 5 and Figure 6). The aTSSs in silB (nt position 168,587 on pMOL30) and copL (nt position 181,548 on pMOL30) were further selected for 3′ RACE validation, indicating transcript lengths of approximately 700 and 500 bp, respectively. In addition, an aTSS was found in czcA1.
Figure 5. Transcription profile analysis of the cop cluster from C. metallidurans CH34 when exposed to copper. Combined TSS read counts of the three biological replicates for control (upper) and Cu condition (lower) are shown for positive (green) and negative (red) strands, with the y-axis containing a break pair (1000–1200) represented as striped grey lines. CDSs related to the cop cluster (coordinates for the pMOL30 region shown at the bottom) are colored based on their log2 fold change. The small black arrows indicate clearly identified primary and internal TSSs. The green arrow represents a re-annotated CDS.
Figure 6. Transcription profile analysis of the sil cluster from C. metallidurans CH34 when exposed to copper. Combined TSS read counts of the three biological replicates for control (upper) and Cu condition (lower) are shown for positive (green) and negative (red) strands. CDSs related to the sil cluster (coordinates for the pMOL30 region shown at the bottom) are colored based on their log2 fold change. The small black arrows indicate clearly identified primary and internal TSSs.
Antisense TSSs were enriched in CDSs belonging to eggNOG classes L (replication, recombination, and repair), M (cell wall/membrane/envelope biogenesis), and T (signal transduction mechanisms) in the control condition, and classes J (translation, ribosomal structure and biogenesis), L, and M in the copper condition. Class T was only significantly overrepresented in the control condition, which could indicate that repression of translation via antisense transcription is common for CDSs related to signal transduction and partially relieved under copper stress. Many sigma factors, such as rpoB (Rmet_3334), rpoD1 (Rmet_2606), rpoN, rpoB, and rpoS (Rmet_2115), showed antisense transcription to a high extent. The role of alternative sigma factors in the initiation of antisense transcription has been described before ; inversely, an interesting case of the regulation of the sigma factor RpoS itself by antisense and anti-antisense RNAs has been shown in E. coli .
In total, 578 TSSs were detected in intergenic regions unassociated with CDSs. These orphan TSSs could mark the start of transcripts resulting from the 5′ of mRNAs with exceptionally long 5′ UTRs, unidentified CDSs, often coding for short peptides , and trans-acting sRNAs , or they could result from transcription by spurious promoters.
First, the prevalence of long 5′ UTRs was investigated. A 200–300 nt region upstream of known start codons was scrutinized for oTSSs, and uninterrupted coverage from the oTSS to the start codon was manually verified. In this way, 126 oTSSs marked the 5′ of putative long 5′ UTRs. Interestingly, hfq, coding for a small RNA-binding protein that stabilizes sRNAs and modulates RNA–RNA interactions  showed a long 5′ UTR (209 bp) next to two other mRNA isoforms with 5′ UTRs of 25 and 137 bp. Relative expressions of these three isoforms (25, 137, and 209 nt 5′ UTR) were 6.1%, 81.1%, and 12.8% and 15.0%, 74.7%, and 10.3% for control and copper conditions, respectively. These disparate 5′ UTRs might have given rise to differences in post-transcriptional regulation of the hfq mRNA. While the relative coverage of each alternative TSS was quite similar in control and copper conditions, it is possible that stronger selection of differential TSSs exists in other (stress) conditions.
In a subsequent analysis, the existence of putative open reading frames (ORFs) downstream of oTSSs was studied. In total, 417 oTSSs were located upstream an ORF with ATG, GTG, or TTG as start codons and either with a minimum length of 150 nt or with homology to proteins in the non-redundant protein database. A list of these proteins with their inferred function, among which are several transporters, can be found in sheet “oTSS ORF BLAST”.
Consequently, the remaining 35 oTSS were neither located within 300 nt upstream of known CDSs nor associated with putative ORFs. These oTSSs were assumed to mark the 5′ end of noncoding RNAs. Eleven oTSSs showed a log2 fold change above 1 or below −1. As the expression of these oTSSs is affected by Cu stress, they are especially interesting as candidates for sRNAs with regulatory functions in the response to metal exposure. Three of them, based on interesting locations relative to known HMR genes, were experimentally validated by 5′ RACE. Two of these differentially expressed orphan transcripts were also experimentally validated by 3′ RACE (one at CHR1 nt position 3,784,309 and one at CHR2 nt position 147,562), indicating lengths of roughly 550 bp and 450 bp. It must be mentioned that 3′ ends of coding and non-coding RNAs cannot be accurately inferred from tagRNA-seq coverage, hence the need for additional experimental validation (via 3′ RACE or other techniques).