RNA is frequently found in DNA. Single embedded ribonucleotides are mainly introduced by DNA polymerases. Longer stretches of RNA can also anneal to DNA, forming RNA:DNA hybrids, as occurs in R-loops. Although R-loops are the most studied hybrid structures, the world of RNA:DNA hybrids is much wider. Polyribonucleotide chains are synthesized to prime Okazaki fragments during DNA replication and to support double-strand break repair, and they may also result from the direct incorporation of several consecutive ribonucleotides by DNA polymerases. We discuss all the possible sources of single and multiple ribonucleotides in DNA, focusing on situations where the aberrant processing of RNA:DNA hybrids may result in potentially harmful stretches of consecutive ribonucleotides embedded in the genome, whose existence is also supported by their presence in the DNA of organelles.
Most living organisms store their genetic information in DNA rather than in RNA because of the inherent chemical instability of RNA. DNA lacks the reactive 2ʹ-OH group that is present on the ribose sugar of RNA and that can attack the sugar-phosphate backbone, generating breaks with genotoxic outcomes. Besides choosing the proper complementary base, replicative DNA polymerases must therefore discriminate between the ribose of rNTPs and the deoxyribose of dNTPs. To this end, they possess a special “steric-gate” residue in their nucleotide-binding pocket. Steric-gate residues, such as tyrosine or phenylalanine in replicative polymerases, have a bulky side chain that sterically clashes with the 2ʹ-OH of incoming rNTPs, preventing their incorporation into DNA. Other active-site residues help keep the side chain of the steric-gate residue and the incoming nucleotide in the proper orientation to achieve high sugar selectivity. Moreover, it was recently shown that a polar filter interacts with the 3ʹ-OH and the triphosphate moiety of the incoming nucleotide; this causes the 2ʹ-OH of an rNTP to clash with the surface of the fingers domain, further limiting the possibility of binding rNTPs in a catalytically competent conformation.
Sugar selectivity, however, is not perfect, especially because DNA polymerases are constantly challenged by high rNTP concentrations. For example, even though dNTP pools in yeast cells increase about three-fold upon entry into S phase with respect to G1, the physiological concentrations of the four rNTPs greatly exceed those of dNTPs: rNTPs range from 500 to 3,000 μM, while dNTPs range between 12 and 30 μM. For this reason, pol ε was estimated to introduce 1 rNMP every 1,250 dNMPs during leading strand synthesis, while pol δ and pol α insert 1 rNMP every 5,000 dNMPs and every 625 dNMPs, respectively, during lagging strand synthesis. This results in more than 13,000 rNMPs inserted into the yeast genome in each replication cycle. Such high numbers also depend on the reduced ability of pol ε, and especially pol δ, to proofread rNMPs. Ribonucleotides can thus be considered the most common non-canonical nucleotides present in the eukaryotic genome. The presence of ribonucleotides in genomic DNA was confirmed in vivo by alkali-sensitivity assays. Single ribonucleotides or di-ribonucleotides were also detected in vivo in mammalian genomic DNA and estimated to generate at least 1,000,000 alkali-sensitive sites per cell.
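As a rough consistency check on these figures, the per-polymerase frequencies quoted above can be combined into a genome-wide estimate. The sketch below is illustrative only: the genome size (about 12 Mb for S. cerevisiae) and the share of the lagging strand attributed to pol α (10%) are assumptions introduced here, not values taken from the text, and the function name is hypothetical.

```python
# Back-of-the-envelope estimate of rNMPs incorporated per yeast replication cycle.
# ASSUMPTIONS (not from the text): a genome size of ~12 Mb and a 10% pol alpha
# share of lagging-strand synthesis. Incorporation frequencies are those quoted above.

GENOME_SIZE_NT = 12_000_000      # assumed S. cerevisiae genome size (nt)
POL_ALPHA_LAGGING_SHARE = 0.10   # assumed fraction of the lagging strand made by pol alpha

# dNMPs synthesized per rNMP inserted, as quoted in the text
DNMPS_PER_RNMP = {"pol_epsilon": 1_250, "pol_delta": 5_000, "pol_alpha": 625}


def estimate_rnmps(genome_nt=GENOME_SIZE_NT, alpha_share=POL_ALPHA_LAGGING_SHARE):
    """Return the estimated rNMP count per polymerase for one replication cycle."""
    # One round of replication synthesizes roughly one genome-length of leading
    # strand and one genome-length of lagging strand in total.
    synthesized_nt = {
        "pol_epsilon": genome_nt,                    # leading strand
        "pol_delta": genome_nt * (1 - alpha_share),  # bulk of the lagging strand
        "pol_alpha": genome_nt * alpha_share,        # DNA made by pol alpha
    }
    return {pol: nt / DNMPS_PER_RNMP[pol] for pol, nt in synthesized_nt.items()}


if __name__ == "__main__":
    per_pol = estimate_rnmps()
    for pol, count in per_pol.items():
        print(f"{pol}: ~{count:,.0f} rNMPs")
    print(f"total: ~{sum(per_pol.values()):,.0f} rNMPs per replication cycle")
```

Under these assumptions the total falls somewhat above 13,000, in line with the estimate quoted above. The result is sensitive to the assumed pol α share, because pol α is the least selective of the three polymerases.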
The activity of pol ε and pol δ is not restricted to DNA replication. They also work in repair processes requiring DNA synthesis, in particular nucleotide excision repair (NER), so they may introduce rNMPs in such circumstances as well. Reparative DNA synthesis steps are also performed by many other specialized polymerases that can contribute to rNMP incorporation. It should be emphasized that these polymerases are often active outside S phase, when the concentration of dNTPs is even lower than in S phase, which may lead to a more significant incorporation of rNMPs into DNA. We can therefore speculate that “non-replicative” ribonucleotides may become particularly relevant in post-mitotic cells, such as neurons, where translesion synthesis (TLS) was recently found to take place.
To maintain the correct DNA:DNA composition, cells have evolved ribonucleases H (RNases H) specialized in the removal of ribonucleotides from DNA. In eukaryotic cells, RNase H2 is composed of three subunits (Rnh201, Rnh202, Rnh203 in yeast; RNaseH2A, RNaseH2B, RNaseH2C in higher eukaryotes), all essential for the activity of the complex, and it cleaves both single and multiple rNMPs paired with DNA. RNase H2 is the initiator of ribonucleotide excision repair (RER), the most common repair pathway for the removal of genome-embedded rNMPs. RER ensures genome integrity and proper development of mouse embryos, keeping embedded rNMPs under a threshold of ribonucleotide tolerance. RNase H2 mutations are associated with a rare autoinflammatory disorder known as Aicardi-Goutières syndrome (AGS), mainly characterized by early-age onset and chronic overproduction of type I interferon in the absence of infections. Patient-derived cells accumulate rNMPs in their genome and exhibit constitutive post-replication repair (PRR) and DNA damage checkpoint activation.
Eukaryotic cells also possess RNase H1, a single-subunit protein that cleaves stretches of at least 4 consecutive rNMPs. Its enzymatic activity is essential for mitochondrial DNA replication in mammals, whereas it does not seem to be required during RER.
Although the presence of single ribonucleotides in chromosomal DNA has been extensively investigated in different organisms, whether the incorporation of consecutive rNMPs is also possible is still unclear. The study of multiple embedded rNMPs is complicated by the fact that it requires the simultaneous removal of RNase H1 and H2, which can both recognize stretches of more than 4 consecutive rNMPs. S. cerevisiae is an excellent model organism for this purpose, because mutants lacking all RNase H activities are still viable. Nevertheless, RNases H can potentially process any polyribonucleotide tract in DNA (stretches of rNMPs, R-loops, RNA primers found at Okazaki fragments, etc.), so it remains difficult to establish which of these unprocessed substrates causes the observed effects. In any case, if stretches of consecutive rNMPs do exist, how they are incorporated (Figure 1) and subsequently removed needs to be clarified. We discuss below different possible sources of multiple embedded rNMPs.
Figure 1. Sources and forms of rNMPs embedded in genomic DNA.
Although DNA polymerases are primarily responsible for the incorporation of single rNMPs, only mutant variants seem capable of introducing consecutive rNMPs. The pol ε variant pol2-M644G incorporates rNMPs into DNA at higher frequencies than wild-type pol ε. The fact that this mutant becomes synthetic lethal with the simultaneous absence of RNase H1 and H2 suggests that it incorporates stretches of rNMPs whose removal requires the activity of both RNases H. The S. cerevisiae pol η-F35A steric-gate mutant also seems to incorporate polyribonucleotide tracts into DNA at a high rate, leaving a specific 1-bp deletion signature in the absence of RNase H2. Moreover, under particular stress conditions, wild-type replicative and/or reparative DNA polymerases may also incorporate consecutive rNMPs. For example, Meroni et al. found that, upon replication stress induced by hydroxyurea, pol η is recruited to stalled replication forks, where it facilitates the formation of stretches of rNMPs that become highly toxic for cells when not replaced by DNA.
Although the number of rNMPs incorporated during DNA replication is surprisingly large, replication priming remains by far the main source of genomic ribonucleotides. Replicative DNA polymerases require a piece of RNA initiator (RNAi) of ~8-10 nt in length to work. Considering the discontinuous nature of the lagging strand, this results in about 100,000 RNA:DNA hybrids in yeast and in more than 10 million hybrids in human cells for each round of DNA replication. RNA primers must then be removed, and Okazaki fragments (OFs) joined together to form a continuous lagging strand. Different pathways cooperate in Okazaki fragment maturation; the dominant one depends on Fen1 (Rad27 in S. cerevisiae). In the absence of strand displacement, RNase H2 also seems to have a role in the direct hydrolysis of RNA:DNA primers. S. cerevisiae strains lacking Rad27 and RNase H2 are sick, and the combination becomes lethal when RNase H1 is also deleted. This suggests that, besides RNase H2, RNase H1 also has a role in Okazaki fragment maturation. The exact composition, crosstalk, and regulation of all these pathways are still largely unknown, but dysfunctions in any of these mechanisms could leave flaps or nicks in the genome, causing deletions, amplification of DNA sequences, and DSBs. Moreover, although this has never been verified, dysfunctions could also result in the stable inclusion of RNA stretches in DNA, as suggested by different groups. Intriguingly, Holmes et al. found that this happens in the mitochondrial DNA of mice, where, in the absence of RNase H1, RNA primers are retained in both template strands, causing dramatic effects on mtDNA replication. The incorporation of an RNA primer into DNA is also the mechanism proposed for mating-type switching in S. pombe. During S phase, two consecutive rNMPs are left in the lagging strand at the MAT1 locus by incomplete processing of the RNA primer; these rNMPs are maintained until the following replication cycle, inducing polymerase stalling and recombination events, which lead to mating-type switching.
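The claim that priming outweighs polymerase misincorporation as a source of genomic ribonucleotides can be checked with simple arithmetic on the numbers given in the text. The following is a minimal sketch, not a measurement: it multiplies the quoted number of RNA:DNA hybrids per yeast replication cycle by the quoted primer length and compares the result with the roughly 13,000 rNMPs attributed to polymerase misincorporation. The function and variable names are hypothetical.

```python
# Rough comparison of two ribonucleotide sources per yeast replication cycle,
# using only figures quoted in the text. Names are illustrative.

PRIMERS_PER_CYCLE = 100_000      # RNA:DNA hybrids from priming (yeast)
PRIMER_LENGTH_NT = (8, 10)       # primer length range in nt
MISINCORPORATED_RNMPS = 13_000   # rNMPs inserted by DNA polymerases (lower bound)


def primer_rnmp_range(primers=PRIMERS_PER_CYCLE, length_range=PRIMER_LENGTH_NT):
    """Return (min, max) ribonucleotides transiently present as primers."""
    shortest, longest = length_range
    return primers * shortest, primers * longest


if __name__ == "__main__":
    low, high = primer_rnmp_range()
    print(f"primer-derived rNMPs: {low:,} to {high:,}")
    print(
        "fold excess over misincorporation: "
        f"{low / MISINCORPORATED_RNMPS:.0f}x to {high / MISINCORPORATED_RNMPS:.0f}x"
    )
```

With these inputs, primer-derived ribonucleotides exceed misincorporated ones by well over an order of magnitude. Unlike misincorporated rNMPs, however, primers are normally removed during Okazaki fragment maturation, so the comparison concerns ribonucleotides transiently present in DNA rather than those stably retained.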
Another important source of ribonucleotides is represented by R-loops, peculiar three-stranded structures formed when a transcribed RNA hybridizes back to the template, leaving the non-template DNA single-stranded. These hybrid regions are longer than the canonical 8-bp hybrids formed by active RNA polymerases (RNAPs), and R-loop-prone regions cover about 8% of the yeast genome. Growing evidence suggests that these structures play important roles in regulating gene expression and chromatin structure. On the other hand, they can compromise genome integrity, since R-loops expose patches of ssDNA, which are more susceptible to mutagenesis, recombination and DNA damage than dsDNA. Moreover, conflicts between the DNA replication machinery and R-loops trigger fork collapse and DSBs. Once R-loops have formed, different factors can act to remove them: RNase H enzymes (H1 and H2), which cleave the RNA moiety of RNA:DNA hybrids, and numerous helicases that unwind the hybrids, such as Senataxin (Sen1 in S. cerevisiae), the human DHX9 and Pif1-family helicases. However, different situations have been described where the RNA strands of R-loops become embedded in DNA. In prokaryotic cells, R-loops are frequently associated with origin-independent replication. In vitro studies have shown that prokaryotic DNA polymerases can use mRNA as a primer when the replication fork collides with the RNA polymerase, and this is also the case for eukaryotic cells. Stuckey et al. found that in S. cerevisiae RNA polymerase I transcription constraints lead to persistent R-loops in ribosomal DNA, where the RNA present in the R-loop can be used as a primer by DNA polymerases, triggering an origin-independent replication process. Being highly inaccurate, this unscheduled replication can cause genome instability.
Finally, the local incorporation of ribonucleotides and the presence of different types of RNA molecules have been shown to have important effects even on DNA DSBs, influencing their repair by the nonhomologous end-joining (NHEJ) or homologous recombination (HR) pathways. For example, Pryor et al. recently reported that one or more rNMPs are transiently incorporated at broken DNA ends by pol μ or TdT, enhancing DSB repair by NHEJ mechanisms. Growing evidence shows that the hybridization of complementary RNA molecules at DSB ends also regulates their repair. RNA:DNA hybrids seem to contribute to the recruitment of repair factors and to the control of DNA end resection, the fundamental process that creates the 3ʹ-end ssDNA filaments needed for recombination. However, how RNA:DNA hybrids affect DSB processing and repair is still under debate. Indeed, while some data indicate that they promote resection, others suggest an anti-resection role or no effect at all. More work is thus required to clarify the regulation of this dynamic phenomenon, also because, as mentioned for R-loops and RNA primers at Okazaki fragments, it is tempting to speculate that improperly removed RNA tracts might remain embedded at DSB ends, posing a threat to genome stability.
As previously mentioned, single rNMPs are the substrate of RER, but whether this pathway works on multiple rNMPs has never been demonstrated. It is unlikely that the pathways acting on R-loops and OFs could process multiple rNMPs once these are embedded in DNA and thus inaccessible to players like helicases. Since RNase H1 and H2 both process consecutive embedded rNMPs, they represent the main candidates for their removal. However, how the two enzymes work in vivo on these structures needs further clarification. Some progress has been made thanks to the development of a separation-of-function mutant of the RNase H2 enzyme, called rnh201-RED (ribonucleotide excision defective), which loses the ability to remove single rNMPs but retains appreciable activity on consecutive rNMPs. This mutant has been extremely useful for elucidating the two functions of RNase H2. Because it is still able to remove multiple rNMPs, the rnh201-RED mutant alone cannot prove their existence; the development of additional separation-of-function mutants may thus be useful.
Multiple ribonucleotides have also been detected in the DNA of two types of eukaryotic organelles: mitochondria and chloroplasts.
Human mitochondrial DNA (mtDNA) is a circular multicopy molecule of 16.5 kb, composed of two strands, named the heavy (H) strand and the light (L) strand, whose replication mechanism is not completely resolved. Different models for mitochondrial DNA duplication have been proposed. Replication primers represent the first source of consecutive rNMPs in mtDNA as well. They seem to be synthesized by the mitochondrial RNA polymerase POLRMT and not by a replicative primase, as is the case for nuclear DNA. Such transcripts are stabilized by G-quadruplex structures formed in the non-template DNA strand, resulting in mitochondrial R-loops that act as replication primers. Polyribonucleotide chains could also result from long RNA transcripts, which temporarily coat the displaced H-strand, generating RNA:DNA hybrids that function as lagging strands during mtDNA replication. These long RNAs may result from a primase activity or from the hybridization of the displaced DNA with preformed RNA transcripts. RNase H1 is the factor responsible for the removal of multiple rNMPs from mtDNA. Mammalian RNase H1 is recruited into the organelles thanks to an essential mitochondrial localization domain, and failures in its activity cause mitochondrial dysfunctions. In mice, when RNase H1 is absent, replication primers are not properly removed and stretches of RNA remain fixed in both template strands of mtDNA. Since they cannot be bypassed by the mtDNA polymerase γ, they lead to persistent DNA gaps that are catastrophic for the subsequent round of replication. As a consequence, mice lacking RNase H1 die during embryogenesis. In humans, mutations in RNase H1 have been associated with adult-onset mitochondrial encephalomyopathy. These examples highlight the importance of removing multiple rNMPs from mtDNA.
Ribonucleotides have also been observed in the DNA of chloroplasts, the other organelles capable of autonomous replication in plant cells. Chloroplast DNA (cpDNA) consists of linear or circular multicopy molecules of 120-170 kb, which can replicate in different manners. Even though there is still much to learn about rNMPs in the DNA of chloroplasts, it is evident that stretches of multiple rNMPs can compromise cpDNA stability. Apart from the RNA tracts used for DNA replication priming, R-loops are frequently found in these organelles. The AtRNaseH1-like protein (RNH1C), together with DNA gyrases, has been found to play a key role in the processing of these hybrids, maintaining chloroplast DNA integrity.
Even though rNMPs in mtDNA and cpDNA need to be further explored, their existence in these endosymbiotic organelles is extremely intriguing: it suggests that the presence of the incorrect sugar in DNA dates back to ancestral forms of life and has been maintained throughout evolution to perform physiological functions in evolved organisms as well.
Although stretches of multiple embedded rNMPs have only been observed in mtDNA, their presence in nuclear DNA has also been genetically predicted. The persistence of multiple rNMPs in mitochondrial DNA has been shown to have detrimental effects, and the same is suspected for genome-embedded polyribonucleotide chains, with consequences even more severe than those deriving from unprocessed single rNMPs. Different techniques are currently available to study single rNMPs and RNA:DNA hybrids, but further efforts should be made to develop groundbreaking methods that allow the isolation of only the desired category of RNA:DNA hybrids and distinguish sites of single rNMP insertion from sites with multiple rNMPs. Demonstrating the existence of consecutive embedded rNMPs and discovering details about their sources and removal might help to clarify the contribution of the two RNases H to the recognition and processing of all hybrid structures and, importantly, to shed light on the mechanisms linking RNA:DNA hybrid structures, replication stress, genome instability and severe human pathologies.