1. Role of G4-DNA at Transcribed S Regions and in the Resolution of G4/R-Loop Conflicts
Although activation-induced deaminase (AID) can target AT-rich "switch" (S) regions from amphibians during class switch recombination (CSR), G-quadruplex (G4) DNA is well known to form abundantly on the nontemplate strand of transcribed mammalian S regions [1]. In silico analysis (for example, using the G4Hunter algorithm, available at either http://bioinformatics.cruk.cam.ac.uk/G4Hunter or https://www.g4-society.org/online-tools, both accessed on 21 October 2022) notably shows that G4-DNA is abundant at mouse and human Ig heavy chain (IgH) S regions [2] (Figure 1). There are also multiple functional indications that G4s are implicated in CSR regulation and promote gene recombination [3]. G4-DNA notably promotes the occurrence of DNA breaks [4]. Among the seminal studies of the molecular properties of S sequences, Wells and colleagues cloned various G-rich Sα repeats into plasmids in search of peculiar DNA structures [5] and showed that Sα repeats adopted a non-B DNA structure, characterized by supercoil-dependent endonuclease cleavage and sensitivity to chemical probes, suggestive of an intramolecular triple strand. Moreover, Sen and Gilbert showed by electrophoretic mobility shift assays that S regions adopt the canonical model of a parallel, four-stranded G4-DNA structure [6].
Figure 1. G4s at S regions. (A) Schematic representation of R-loops and G-loops. Transcription can create RNA:DNA hybrids, also called R-loops (left). When the nontemplate DNA strand of an R-loop is G-rich, it can form G4s, and these structures are then called G-loops (right); (B) Representation of G4s located in the human IgH locus constant gene cluster (from Sµ to the end of the second 3′ regulatory region, 3′RR2) on the coding strand (top) and the template strand (bottom). This representation was generated by processing the IgH sequence with the G4Hunter algorithm (https://www.g4-society.org/online-tools, accessed on 21 October 2022).
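To make the strand-specific scoring behind Figure 1B more concrete, the following Python sketch reimplements the core of the published G4Hunter scoring scheme (our own simplified summary, not the code of the web tools cited above). Each G in a run of G's scores the run length capped at 4, each C scores the negative of that value, and A, T, or any other base scores 0; the mean score is then computed over a sliding window. A positive window mean indicates G4 propensity on the analyzed (coding) strand and a negative mean indicates G4 propensity on the complementary (template) strand, which is how a single input sequence can yield both the top and bottom tracks of Figure 1B. The function names are hypothetical, and the window size of 25 nt and threshold of 1.2 are assumed defaults; the published tool also merges overlapping hits and refines their boundaries, which this sketch omits.

```python
from typing import List, Tuple


def g4hunter_scores(seq: str) -> List[int]:
    """Per-base G4Hunter-style scores for one strand (hypothetical helper)."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        # Extend j to the end of the current homopolymer run.
        while j < len(seq) and seq[j] == base:
            j += 1
        if base in ("G", "C"):
            value = min(j - i, 4)  # run length, capped at 4
            if base == "C":
                value = -value  # C runs signal G4s on the opposite strand
            for k in range(i, j):
                scores[k] = value
        i = j  # A, T, N and other bases keep a score of 0
    return scores


def g4hunter_windows(seq: str, window: int = 25,
                     threshold: float = 1.2) -> List[Tuple[int, float]]:
    """Return (start, mean score) for every window with |mean| >= threshold.

    window=25 and threshold=1.2 are assumed defaults; overlapping hits
    are reported individually rather than merged.
    """
    scores = g4hunter_scores(seq)
    hits: List[Tuple[int, float]] = []
    if window <= 0 or window > len(scores):
        return hits
    running = sum(scores[:window])
    for start in range(len(scores) - window + 1):
        if start > 0:
            # Slide by one base: add the entering score, drop the leaving one.
            running += scores[start + window - 1] - scores[start - 1]
        mean = running / window
        if abs(mean) >= threshold:
            hits.append((start, mean))
    return hits


if __name__ == "__main__":
    # Human telomeric repeat (G-rich strand) and its reverse complement.
    g_rich = "GGGTTAGGGTTAGGGTTAGGG"
    c_rich = "CCCTAACCCTAACCCTAACCC"
    print(g4hunter_windows(g_rich, window=21))  # positive mean: this strand
    print(g4hunter_windows(c_rich, window=21))  # negative mean: other strand
```

On these two 21-nt inputs, the sketch reports a single window with a mean of 36/21 (about 1.71) for the G-rich strand and about -1.71 for its reverse complement, illustrating how one sign convention separates G4 candidates on the two strands.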
R-loops and G4s can act as physical impediments to DNA and RNA polymerases during replication and transcription; G4s also stabilize R-loop structures and facilitate the local recruitment and oligomerization of AID [7]. However, the detailed molecular contribution of G4s to CSR remains unclear: while their presence favors CSR, their stabilization by G4 ligands impedes it [2], suggesting that the contribution of G4-DNA follows a dynamic scheme. Whether the abovementioned fragility of G4-DNA in hypoxic conditions might contribute to CSR is currently unclear. While B cell activation largely occurs in vivo in hypoxic lymphoid structures such as germinal centers (GCs), conflicting data have been published about the connection between hypoxia and class switching, reporting increased CSR breaks in B cells cultured under hypoxic conditions in vitro and increased CSR to the Cα gene in vivo, but also decreased AID expression in vivo and decreased CSR to IgG2 in a mouse model in which GCs were experimentally exposed to hypoxia [8][9]. These apparently contradictory effects might be explained by the fact that hypoxia lowers AID expression, which may counterbalance a simultaneous increase in DNA breaks in G4-rich S regions [9].
As previously mentioned, CSR occurs in transcribed G-rich S regions, where RNA:DNA hybrids form on the template strand and the displaced nontemplate strand is exposed as single-stranded DNA within R-loops, making it a substrate for AID [10][11]. This likely contributes to the prevalence of orientation-dependent CSR, which joins two distant breaks both initiated on the nontemplate strand [12] (although orientation-independent CSR has also been experimentally reported after breaks involving short palindromic sequences instead of G-rich sequences [13]) (Figure 2B).
Figure 2. G4s and class switch recombination. (A) The human IgH locus is represented after VDJ recombination. The recombined VDJ gene and CH genes are represented by rectangles and the S regions by ovals. S regions are preceded by promoters and I exons. The human IgH locus also includes two regulatory 3′RRs (black rectangles). Before CSR, NME1 binds to the S regions and prevents CSR; (B) Transcription through S regions by RNA Pol II (brown) yields noncoding RNA (purple). R-loops facilitate the formation of G4s (G-loops). Within R-loops, RNA:DNA hybrids restrict the accessibility of AID to the nontemplate strand only. The RNA exosome (orange) degrades the RNA hybridized to the template DNA strand, thereby also exposing this strand to AID for DNA deamination; (C) After B cell activation, NME1 is removed, and AID binds G4s of targeted S regions, initiating breaks that are repaired by ligation of distant S regions (here, Sµ and Sα2). AID targets cytosines (C) on accessible ssDNA. CSR is modulated by natural G4 ligands, such as NME2, which binds to G4s after transcription and stimulation, and also by the ORC and MCM complexes; (D) CSR diversifies the functions of B cells and of class-switched antibodies (here, IgA2); (E) CSR can be inhibited by chemical ligands of G4s, such as RHPS4, pyridostatin, and CX-5461, decreasing the frequency of B cells expressing or secreting class-switched Ig.
A pair of nucleoside diphosphate kinase (NME) isoforms, one of which binds G4s, are novel players in the CSR process. They were identified, using a reverse ChIP (chromatin immunoprecipitation) proteomic screen and a gel shift assay with single-stranded DNA, by searching for proteins associated with CSR double-strand breaks (DSBs) in B cell lines and mouse primary B cells. NME1 binds S regions before the formation of G-loops and represses the initiation of CSR. When G-loops form upon stimulation, NME1 dissociates from activated S regions, whereas NME2 binds G-loops and promotes CSR [14] (Figure 2A,C). The NME1/NME2 pair thus coordinately modulates G-loop accessibility and CSR.
In addition to the role of G4s at the DNA level, there is abundant direct and indirect evidence that G4s and/or equivalent structures forming in parallel on RNA transcripts from these regions are implicated in CSR regulation. The extent to which AID directly targets DNA and/or requires G4-RNA intermediates remains unclear and controversial.
2. Role of G4-RNA Structures within S Region Transcripts
In the context of mammalian S regions, both the G-rich nontemplate single-stranded DNA and the corresponding nascent RNA can form G4s; the latter are called G4-RNAs. G4-RNAs can be found in more than 3,000 human mRNAs [15][16]. Transcriptomic profiling of G4-RNAs is possible via G4-RNA-specific precipitation (G4RP) using the G4-specific probe BioTASQ [17].
For the specific situation of S region germline transcripts (GLTs), several studies have shown that AID can directly bind S transcripts through G4-RNAs. As mentioned above, GLTs undergo splicing and are then released as processed GLTs, while the S region lariat remains annealed as part of the R-loop. Debranching of the lariat and its folding into G4 secondary RNA structures likely contribute to the recruitment of AID via AID–RNA binding. Yewdell and Chaudhuri proposed models for the RNA-dependent targeting of AID during CSR [18]. They notably postulated a role for AID–RNA complexes acting in trans. In this situation, the S region lariat is debranched to form a linear S region transcript, which can fold into a G4 secondary RNA structure. Then, either this structure is bound by AID and the resulting complex binds the complementary DNA strand, or the RNA first binds the complementary DNA strand and both are then bound by AID. In an alternative model, in which AID–RNA complexes are targeted in cis, processed nascent GLTs would remain attached to the template DNA strand at the position of R-loops.
Whatever the model, AID binds structured G4-DNA substrates [3] and efficiently generates clusters of mutations and DSBs at S regions featuring “G-loops”, which may also help to recruit CSR cofactors [19]. The situation of the IgH locus is worth comparing with other contexts: in viral RNAs (from HIV, Zika, hepatitis B, SV40, etc.), G4-forming sequences colocalize with the epitranscriptomic mark m6A, suggesting that these structures topologically control adenosine methylation [20]. Such a role remains to be explored in B cells, where it could potentially interfere with the m6A-dependent processing of GLTs by the RNA exosome [11]. The m6A mark allows RNA exosome binding for the degradation of RNA:DNA hybrids, and it was recently shown that m6A modifications control the formation not only of G4-RNAs but also of G4-DNAs, thereby regulating the biological functions of these structures [21]. Of note, RNA sequence also influences RNA binding to lipid membranes. This interaction is enhanced by G4-RNA structures [22], which may contribute to the functional role of G4-rich RNAs in biological processes by tethering some of them to membranes.
Notably, a mutation in the putative RNA-binding domain of AID impairs its recruitment to S regions, inhibiting CSR similarly to the inhibition of RNA processing [23]. CSR was also inhibited by blocking a specific step of S intron processing: the debranching of the lariat by the DBR1 enzyme [23]. Expression of switch RNA in trans then rescued the CSR defect in DBR1-deficient B cells [23]. The availability of debranched RNA copies of S regions may thus contribute to the subsequent generation of G4-RNAs that help guide AID to specific DNA S regions through RNA:DNA base pairing, as a “collaboration” between G4-RNAs and G4-DNAs. In addition, focusing on G4s present in intronic S region RNA, Ribeiro de Almeida et al. showed that the RNA helicase DDX1 unwinds G4-RNA structures, allowing these RNAs to participate in R-loops in vitro and in vivo. In this model, R-loops at S regions are therefore formed post-transcriptionally in trans and depend on DDX1 and G4-RNAs. Accordingly, stabilizing G4-RNAs with G4 ligands such as pyridostatin, or inducing the expression of an ATPase-deficient DDX1 mutant, reduces CSR [24]. Moreover, alternative lariat sequences could prevent DDX1 from binding to these AID-guiding G4-RNAs. CSR would then rely on connections between G4-RNAs on S region transcripts and DNA at R-loops [24].
3. Connections between CSR and DNA Replication
In addition to its connections with transcription, CSR is temporally and physically connected with the progression of DNA replication through the IgH locus, and AID-dependent DNA breaks occur and are mostly repaired within the G1 phase [25]. R-loops contribute, as mentioned above, to the specification of replication origins, and CSR efficiency notably depends on and correlates with the activity of these origins in S regions; G4-DNA thus participates indirectly in CSR regulation in mouse B cell lines and in primary splenic B cells [26]. DNA replication across S regions also regulates CSR in an R-loop-dependent manner. Wiedemann et al. demonstrated that the activation of these replication origins is independent of AID and of DNA breaks but relies mostly on G4-DNA, so that facultative replication origins assemble at R-loops and contribute to the synapsis of S regions targeted by CSR [26]. Indeed, at the IgH locus, as elsewhere in the genome, G4-DNA influences the binding of the origin recognition complex (ORC), and origin activation requires the replicative helicase activity of the MCM complex [27]. In the G1 phase, IgH transcription allows the formation of R-loops including G4-DNA and then triggers the activation of facultative G4-rich replication origins within S regions [26]. Since replication origins located within the same topologically associating domain (TAD) tend to be physically clustered, this may favor the synapsis of S regions that include such origins [26]. In this way, CSR would not only be coordinated with cell proliferation but also physically facilitated by mechanistic aspects of DNA replication during the G1 phase (Figure 2C).
4. A Role of G4-DNA in IgH Locus Higher-Order Organization
As mentioned above, the physiology of IgH locus expression and recombination relies on programmed changes in the 3D organization of the IgH locus, driven by long-range interactions between promoters, enhancers, and regions targeted for recombination. These dynamic changes are notably interpreted through the loop extrusion model, which accounts for the synapsis of distant recombination sites prior to V(D)J recombination and class switching [28][29]. The role of G4-DNA in these events has not yet been demonstrated, but it is striking that G4-DNA is abundantly mapped at regulatory chromatin and may therefore play a role in the organization of TADs [30]. The presence of G4-DNA within the IgH 3′RR might thus contribute to the organization of the IgH TAD. Of note, the architectural factor YY1, known both to bind the 3′RR and to play a role in DNA looping, is another known G4-DNA binder [31]. HP1α also binds G4-DNA and is known for its role in organizing separate domains of heterochromatin or of transcriptionally active euchromatin [32]. The abovementioned overlap of G4-DNA [33] or G-rich R-loops [34] with regions bound by cohesin and by soluble vimentin [35] further argues for an architectural role of G4-DNA. All these elements are likely to be crucial for CSR, which strongly relies on 3D interactions between germline promoters, the 3′RR, and the targeted S regions within an active IgH TAD [36].