Transcription–Replication Coordination

Subjects: Genetics & Heredity

Contributor: Marco Saponaro

Transcription and replication are the two most essential processes that a cell carries out on its DNA: they allow cells to express the genomic content required for their functions and to create a perfect copy of this genomic information to pass on to the daughter cells. Nevertheless, the two processes have an ambivalent relationship. When transcription and replication occupy the same regions, conflicts can arise, as transcription can impair the progression of DNA replication, leading to increased DNA damage. At the same time, DNA replication origins are preferentially located in open chromatin next to actively transcribed regions, suggesting that the possibility of conflicts is a risk that cells accept. Data in the literature point both towards and against the existence of coordination between these two processes to avoid the danger of collisions.

transcription
DNA replication
genome instability
DNA damage
G-MiDS
transcription–replication collision

1. Complexity of the Transcription Process

Transcription is the process that produces RNA using DNA as a template. This allows cells to express the functionally relevant parts of their genome and allows each cell in an organism to acquire specific functions. The diversity of transcripts that each cell can create permits the generation of 200 different cell types in the human body, all with virtually identical genomes. Importantly, this large variety of transcripts is produced by different RNA Polymerase complexes, each responsible for a specific subset of transcripts (Figure 1):

Figure 1. Description of which specific classes of transcripts are produced by the three RNA Polymerase complexes, with RNAPII supporting and also contributing to the transcription of RNAPI and RNAPIII transcripts.

RNA Polymerase I (RNAPI) transcribes the ribosomal RNA (rRNA 5.8S, 18S, and 28S in mammals) as a single polycistronic RNA that is subsequently processed into the individual rRNAs; the ribosomal RNA genes are arranged in rDNA clusters on the short arms of five human chromosomes (13, 14, 15, 21, and 22), with a broad range in the number of copies of each unit per cluster [1].

RNA Polymerase II (RNAPII) transcribes messenger RNA (mRNA) from genes, approximately 42,000 in total in the human genome, half of which are protein coding and the other half noncoding [2]; RNAPII also transcribes long noncoding RNA (lncRNA), micro-RNA (miRNA), piwi-interacting RNA (piRNA), and most of the small nuclear RNA (snRNA) and small nucleolar RNA (snoRNA).

RNA Polymerase III (RNAPIII) transcribes transfer RNA (tRNA), with approximately 500 present in the human genome [3]; RNAPIII also transcribes the 5S rRNA, which is arranged as a single cluster of approximately 100 repeats on chromosome 1 [1], and the remainder of the snRNA and snoRNA.

In reality, recent evidence has shown extensive crosstalk between the different complexes, meaning that the distinction between the roles of each complex is less clear-cut than previously thought. For example, it has long been known that RNAPII and RNAPII-associated transcription factors are present next to sites of RNAPIII transcription [4,5,6]. More recently, it was shown that RNAPII regulates the transcription of some RNAPIII transcripts [7]. In parallel, RNAPII-associated transcription factors regulate the expression of RNAPI subunits [8], and RNAPII is found at rDNA sites, where it is essential to support ribosome biogenesis (Figure 1) [9]. Equally difficult is determining clearly which parts of the genome are transcribed. Protein coding genes, of which there are approximately 21,000 in the human genome and which represent the most varied category of transcribed regions, account for approximately 3% of the total genome [10]. Nevertheless, at least 75% of the genome can be transcribed, with most of the transcripts presenting features of RNAPII transcription [11]. Transcription is therefore a pervasive process that can occupy most of the genome and sits at the convergence point of many different cellular processes. As well as supporting each other’s transcription, with RNAPII contributing to the transcription of RNAPI and RNAPIII transcripts, RNA polymerases can also conflict with each other. For example, the transcription of a gene can affect the ability to transcribe another transcript downstream, in a process known as transcription interference [12]; moreover, converging RNA polymerase complexes can collide, as eukaryotic RNA polymerases cannot bypass each other [13,14]. In this complex scenario, we have to consider that transcription is not the only process that uses DNA as a substrate, as DNA is also the substrate of DNA replication.

2. Transcription-Induced Genome Instability

RNA polymerase complexes have molecular weights of more than 500 kDa even without accessory transcription factors and, as such, are much bigger than the physical barriers that replicative helicases can overcome on DNA [15]. Consequently, conflicts between the transcription and replication machineries create a particularly dangerous situation, as impediments to replication fork progression (a condition generally referred to as replication stress) can induce an increase in DNA damage and genome instability [16,17]. Indeed, studies in vitro and in vivo, from bacteria to eukaryotes, have shown that head-to-head collisions are more detrimental than codirectional ones in interfering with replication fork progression [18,19,20,21,22,23,24]. Intriguingly, in the case of codirectional collisions, the replication machinery could take advantage of the mRNA present there to restart and continue replication after the collision site [18]. This finding is supported by in vivo data in bacteria, with evidence of replication restart at codirectional collision sites [20]. However, an increase in codirectional collisions induced, for example, by an accumulation of backtracked RNAPII can be equally dangerous for genome stability, because the restart of replication downstream of the collision site can lead to an accumulation of single-strand gaps [25]. Altogether, these data indicate that head-to-head and codirectional collisions between the transcription and replication machineries can affect the ability of replication to progress in very different ways.
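The distinction between head-to-head and codirectional collisions depends only on the relative direction of travel of the RNA polymerase and the replication fork. As a minimal sketch, assuming that both directions are encoded as "+" (towards increasing genomic coordinates) or "-" (towards decreasing coordinates), the classification could be written as follows. The function name and the encoding are illustrative assumptions and do not come from the cited studies.

```python
def collision_orientation(transcription_direction: str, fork_direction: str) -> str:
    """Classify an encounter between an RNA polymerase and a replication fork.

    Both arguments use "+" for movement towards increasing genomic coordinates
    and "-" for movement towards decreasing coordinates. This encoding is an
    illustrative assumption, not a convention taken from the text.
    """
    valid = {"+", "-"}
    if transcription_direction not in valid or fork_direction not in valid:
        raise ValueError("directions must be '+' or '-'")
    # Same direction of travel: the fork follows the polymerase along the gene.
    if transcription_direction == fork_direction:
        return "codirectional"
    # Opposite directions of travel: the two machineries converge.
    return "head-to-head"


# A gene transcribed in the "+" direction met by a fork moving in the "-" direction.
print(collision_orientation("+", "-"))  # head-to-head
print(collision_orientation("+", "+"))  # codirectional
```

The sketch captures orientation only; it says nothing about the outcome of a collision, which, as described above, also depends on factors such as RNAPII backtracking.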

There are several mechanisms through which transcription can directly or indirectly affect DNA replication progression and induce genome instability (Figure 2):

Figure 2. Description of the molecular mechanisms through which transcription has been identified to affect replication fork progression, inducing increased genome instability. Transcribing RNAPII is depicted as a green oval with nascent RNA as a red line; the replisome is depicted as blue ovals. Transcription can lead to (i) an accumulation of R-loops (left); (ii) an accumulation of topological constraints due to supercoiling generated by both transcription and replication (middle); (iii) an accumulation of stalled/paused/backtracked RNAPII, known as transcription stress (right). For simplicity, transcription and replication have been presented in a head-to-head orientation, but this is not the only configuration that leads to increased genome instability.

(i) Increased formation and/or persistence of three-stranded RNA–DNA hybrid structures called R-loops, formed when the nascent RNA hybridises back with the template DNA strand displacing the non-template DNA strand [26,27]; R-loops can furthermore lead to changes in chromatin structure and accessibility [28,29,30].

(ii) Accumulation of positive and negative supercoiling that induces increased topological constraints [31,32,33].

(iii) Accumulation of stalled/paused/backtracked RNA polymerase (so-called transcription stress) [25,34].

(iv) Increased occurrence of DNA damage at transcribed regions [35,36,37].

Nevertheless, the distinction between these mechanisms is not clear-cut; for example, impairments of topoisomerases and transcription stress can likewise lead to increases in R-loop levels [32,38,39]. Altogether, a large body of evidence indicates that transcription is a driver of genome instability and, in this sense, many RNAPII-associated factors have been identified as essential to preserve genome stability [16,17,22].

Further evidence linking transcription to increased genome instability comes from analyses of the genomic sites more prone to DNA damage when DNA replication is impaired. Common fragile sites (CFS) are chromosomal regions prone to breakage following low levels of replication stress, such as treatments with low doses of aphidicolin [40,41,42]. The propensity of CFS to genome instability is likewise observed in human diseases, with CFS identified as hotspots for genomic breakages in cancer cells, ultimately inducing the expression of oncogenes or deregulating the expression of tumour suppressors [43,44,45]. Several mechanisms have been identified that explain the instability of CFS, among which are a paucity of replication origins and the fact that CFS are generally replicated later in S-phase [46,47,48,49]. Importantly, CFS are moreover enriched for long RNAPII-transcribed genes, directly linking genome instability at CFS to RNAPII transcription [38]. The combination of a poor availability of replication origins and replication forks travelling long distances across large transcribed domains appears to be the main determinant of genome instability at CFS [50]. Another class of fragile sites has been identified, called early replicating fragile sites (ERFS) [51]. ERFS differ from CFS because ERFS accumulate breakages when cells are exposed to high levels of replication stress induced by high doses of hydroxyurea, and ERFS breakages arise in early replicated regions [51]. Like CFS, ERFS overlap with genomic sites commonly lost in cancers and with transcribed genes; in the case of ERFS, specifically with highly transcribed short genes [51]. Hence, as the data above indicate that transcribed regions are hotspots for genome instability, we must ask the question: how are transcription and DNA replication organised in order to avoid the dangerous consequences of conflicts and collisions?

3. Co-Existence or Spatial–Temporal Separation between Transcription and Replication

One simple possibility would be to keep the time when replication occurs in a cell separate from when the cell transcribes. It was long thought that DNA replication is restricted exclusively to the S-phase of the cell cycle and that only once it is completed can cells progress into mitosis [52]. However, DNA synthesis can occur as late as in mitosis following treatments with replication stress-inducing agents, so-called mitotic DNA synthesis (MiDAS), at hotspot sites prone to replication stress such as CFS; moreover, a fraction of cells performs MiDAS even in the absence of exogenous replication stress treatments [53,54,55].

The regulation of transcription throughout the cell cycle differs for each specific RNA Polymerase. In the case of RNAPI, transcription levels oscillate throughout the cell cycle, with RNAPI transcription inactive only in mitosis and early G1 [56]. Consequently, during the replication of rDNA regions, ongoing replication forks and the RNAPI transcription machinery are coordinated through specific replication fork barriers (RFB) present in each rDNA repeat unit [57,58]. The absence of functional RFB leads to collisions between the transcription and replication machineries [58], with many replication fork stability factors important to preserve rDNA repeat stability [59]. RNAPIII transcription activity is also low in early G1 and increases as cells progress through the cell cycle, becoming repressed in mitosis [60,61,62]. Indeed, it was shown in S. cerevisiae that tRNA genes act as hotspots where replication forks stall and pause [63]. Finally, RNAPII is active at every stage of the cell cycle, as RNAPII transcribes specific genes even in mitosis despite condensed chromosomes, although the vast majority of RNAPII complexes are allowed to complete transcription just before cells enter mitosis, with new initiation events inhibited [64,65]. Gene transcription levels are, however, not constant through the cell cycle, as many genes greatly change their levels depending on roles and functions. Notably, many genes are specifically upregulated or expressed during the S-phase, for example, components of the replication machinery and the histones required to pack the newly replicated DNA into chromatin [66]. Considering all data together, all three RNA Polymerases are active during the S-phase, indicating that the temporal separation of transcription and replication is not a strategy that eukaryotic cells deploy to avoid the occurrence of collisions.

An additional layer of complexity comes from the analysis of DNA replication timing and the distribution of replication origins. Several studies across many model systems have invariably shown that transcribed regions are preferentially replicated in the early S-phase, while poorly transcribed regions are preferentially replicated in the late S-phase [67,68,69]. Consequently, as different cell types transcribe distinct regions of their genome depending on their role and function, there is no single replication program in higher eukaryotes, as the program is cell type specific and affected by which regions a specific cell transcribes [69]. Moreover, considering the diverse transcription programs that different cell types have, it would be virtually impossible to activate replication origins so that replication forks are always codirectional with highly transcribed genes to avoid the more challenging head-to-head collisions, as happens in bacteria [57,69]. Finally, mapping of DNA replication origins has shown that these are enriched next to the transcription start sites (TSSs) of actively transcribed genes, colocalising with active histone marks [70,71,72,73]. Importantly, replication origins are preferentially enriched near the TSSs of long genes, arranged so that the leading replication fork and RNAPII are codirectional [73]. This set of evidence suggests that cells do not or cannot arrange replication origins to completely avoid potential encounters between transcription and replication. If anything, by arranging leading replication forks to be codirectional with transcription along long genes, the replication program pre-empts potentially more troublesome instances such as head-to-head collisions. In fact, there could be benefits for cells in activating replication origins next to transcribed sites, as the open chromatin conformation of transcribed regions makes them more accessible to the replication machinery too. Moreover, by starting replication of the genome from the transcribed regions, cells make sure that they pass on to their daughter cells the genetic information that is needed for their function. Is there, therefore, evidence supporting the existence of a higher-level organisation that coordinates transcription and replication? Are cells actively controlling these two processes to reduce the risk of conflicts and ultimately preserve genome stability?

Some evidence in support of such an arrangement comes from studies analysing the nuclear distribution of active replication and active transcription throughout the S-phase. Some data show that transcription and replication occur in different parts of the nucleus throughout the S-phase, suggesting that when a region is replicated, it is not transcribed at the same time [74]; however, other data show overlaps between transcription and replication, in particular in the early S-phase [75]. The two sets of results are strikingly dissimilar and, even considering the technical differences between these papers in terms of labelling time or cell types, these findings do not answer whether transcription and replication are coordinated.

Equally, genomic analyses assessing transcription and replication activities and dynamics throughout the S-phase have reached contrasting conclusions. Some data, for example, support segregation and temporal separation between transcription and replication: transcription levels and replication timing are inversely correlated, with early replicated genes increasing their transcription later during the S-phase, while late replicated genes reduce their transcription during the S-phase [76]. At the same time, others have found that TSSs of actively transcribed genes maintain high levels of nascent transcription activity even when the genes are replicated [77]. This transcription activity footprint at TSSs reduces the replication of TSSs compared to the rest of the gene [77]. The reduced replication of TSSs persists throughout the cell cycle until G2/M, when RNAPII is removed from most of the transcribed genes, allowing the duplication of TSSs to be completed [77]. Hundreds of genes present DNA synthesis at TSSs specifically in G2/M, especially genes characterised by high levels of TSS-associated antisense transcription [77]. This process, called G2/M DNA synthesis (G-MiDS), is distinct from MiDAS, as it is neither associated with sites of DNA damage nor dependent on canonical DNA damage repair and response factors [77]. TSSs have further been identified as hotspots of transcription–replication interaction (TRI) zones in mouse cells [78]. In this case too, TRI zones are a relatively common and general occurrence, with more than a thousand TSSs identified, in particular those characterised by transcription going in both directions, either because of bidirectional promoters or because of the presence of an annotated transcript [78]. Both the Wang et al. and the St Germain et al. papers show that hotspots of G-MiDS and TRI correlate with genomic sites frequently rearranged and mutated in tumours, linking once more regions of transcription–replication conflict to hotspots of genome instability associated with human diseases [77,78]. Altogether, therefore, the literature contains both evidence supporting the existence of coordination between transcription and replication and evidence that the two processes coexist at all times.

4. Pros and Cons of Coordinating Transcription and Replication

Can we, therefore, evaluate this problem from an evolutionary point of view, assessing perhaps what would be the best alternative for a cell in an ideal scenario? From a practical standpoint, it would be convenient for a cell to keep the two processes separated, as interference between transcription and replication has been widely associated with increased DNA damage and genome instability. In support of this view, transcriptional defects directly drive increased DNA damage in cells [26,34], many transcription factors have been found to be important to preserve genome stability [27], and fragile sites overlap with transcribed regions (Figure 3) [49,51]. Even the recent findings showing that transcription and replication coexist at all times emphasise that hotspots of interference overlap with sites of genome instability in cancers [77,78].

Figure 3. List of the advantages and the disadvantages that cells have in keeping transcription and replication spatially and temporally separated. Transcribing RNAPII is depicted as a green oval with nascent RNA as a red line; the replisome is depicted as blue ovals.

However, again analysing the evidence hypothetically, there are benefits for a cell in having the two processes coexist. It would be easier to load and activate replication initiation complexes in the open chromatin conformation of transcribed regions (Figure 3) [70,71,72,73]. Moreover, even when DNA damage arises from a collision event, transcription and transcription-associated chromatin modifications can contribute directly and positively to DNA damage repair kinetics and repair pathway choice. For example, in the case of nucleotide excision repair, the RNAPII affected by DNA damage can directly recruit and activate the transcription-coupled nucleotide excision repair sub-pathway, with faster DNA damage repair kinetics in transcribed regions than in non-transcribed regions [79,80]. Double-strand breaks (DSBs) are preferentially repaired by homologous recombination rather than non-homologous end joining in transcribed regions, with non-homologous end joining preferentially repairing DSBs in non-transcribed regions [81,82].

Moreover, in recent years it has become evident that RNA polymerases and transcription play a direct role in the correct establishment of DNA damage repair and response foci and in the repair of DSBs. More specifically, RNAs produced at DSB sites are required for the correct assembly of 53BP1 foci and for the formation of a phase-separated state that is important for the activation of the DNA damage response [83,84,85,86]. DSB-induced RNAs are MRE11-dependent, and their processing requires the DROSHA and DICER RNases involved in RNA interference [84,87,88]. An important step in this process is the generation of R-loops, which facilitates repair through RAD51-dependent homologous recombination, with many factors identified as important for R-loop establishment as well as R-loop resolution [89,90,91,92,93,94,95,96]. While there is currently no evidence that transcription is directly involved in the resolution of transcription–replication collisions, a potential role for transcription cannot be completely excluded considering the above-mentioned data.

This entry is adapted from the peer-reviewed paper 10.3390/life12010108

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.