1. Introduction
It is well known that DNA can adopt various sequence-dependent secondary structures distinctly different from the classical Watson–Crick double helix. Among them, G4s are currently the most abundant and examined noncanonical conformations owing to their involvement in the regulation of various cellular processes. Over the past few decades, the structure of G4s, their distribution in cells and their biological functions have been extensively studied.
G4s are four-stranded nucleic acid structures that can originate either from DNA or RNA regions containing adjacent guanosine-rich runs (G-tracts). G4s arise from the stacking of two or more G-tetrads, which are a planar cyclic arrangement of four guanine bases held together by Hoogsteen hydrogen bonds and stabilized additionally by monovalent cations coordinated in or between the G-tetrads, with a decrease in the stabilization efficiency in the following order: K⁺ > Na⁺ > NH₄⁺ >> Li⁺. Oligonucleotide loops that connect consecutive G-tracts play an important role in the overall folding and stability of G4s. Three types of loops with propeller, diagonal and lateral orientations are typical for G4s [1]. The type of loop depends on the number and nature of the loop nucleotides, as well as strand directionality and the number of G-tetrads that they traverse. Some long loops have been found to adopt well-defined, hairpin-like structures that increase the thermodynamic stability of the G4s (for review, see [2]).
The G4 structures are highly polymorphic. Under in vitro conditions, they are influenced by factors such as the number of DNA molecules involved in G4 formation, oligonucleotide concentration and sequence (especially the number of G-tracts and their length), the type and concentration of cations in solution, the length and secondary structure of the loops, crowding conditions, and the presence of different cosolutes and other biomolecules [2][3][4]. G4s can adopt right-handed parallel, antiparallel and hybrid (3 + 1) topologies, characterized by different orientations of the four G-tracts in the quadruplex core and by syn- or anti-guanosine glycosidic bond angles, as well as left-handed G4 structures [5]. The conformational diversity of intramolecular DNA G4s is further expanded by G4-forming sequences (G4 motifs) that escape the standard QuadParser algorithm (GₙL₁₋₇GₙL₁₋₇GₙL₁₋₇Gₙ, where n = 3, L = A, T, G, C) [6]. G4s with long loops (up to 21 nucleotide residues) [7], bulges that arise from the incorporation of non-guanine bases in G-tracts [8][9], G4s stabilized with just two G-tetrads [10][11] and G4s with missing G, i.e., with a G-triad instead of a G-tetrad in the quadruplex core [12], have been identified. The formation of intermolecular G4s, four-stranded DNAs containing other types of tetrads consisting of A, T, C residues [13] and mixed tetrads containing Watson–Crick base pairing in the quadruplex context [14], as well as higher-order parallel-stranded G4 structures [15], further increases the conformational space of these noncanonical forms.
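The QuadParser-style pattern quoted above can be expressed as a regular expression. The following is a minimal sketch rather than the published implementation: the function name `find_g4_motifs` and its parameters are hypothetical, and treating each G-tract as "at least `min_tract` guanines" is an assumption (the text gives n = 3). Only the strand supplied is scanned; the complementary strand would have to be searched separately.

```python
import re


def find_g4_motifs(seq, min_tract=3, loop_min=1, loop_max=7):
    """Return (start, end, motif) tuples for non-overlapping pattern matches.

    Assumptions (not taken from the source): a G-tract is min_tract or more
    consecutive guanines, and a loop is loop_min..loop_max bases of any kind.
    """
    tract = "G{%d,}" % min_tract
    loop = "[ACGT]{%d,%d}" % (loop_min, loop_max)
    # Four G-tracts separated by three loops.
    pattern = re.compile("%s(?:%s%s){3}" % (tract, loop, tract))
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(seq.upper())]


# The d(GGGT)4 motif with short flanks is found by the standard pattern.
print(find_g4_motifs("TTGGGTGGGTGGGTGGGTAA"))
# A two-tetrad candidate escapes the standard pattern unless min_tract is lowered.
print(find_g4_motifs("GGTGGTGGTGG"))
print(find_g4_motifs("GGTGGTGGTGG", min_tract=2))
```

Lowering `min_tract` or raising `loop_max` illustrates why two-tetrad G4s and long-loop G4s are missed by the standard pattern: they fall outside its fixed tract and loop limits.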
Some improvements in sequence-based G4 motif prediction in genomes have recently been made using a radically different G4Hunter algorithm [16]. This algorithm takes into account the G-richness and G-skewness (G/C asymmetry between complementary DNA strands) of a given region. The sequences capable of forming G4s are evolutionarily conserved. They are widely represented in the genomes of all organisms, although they are predominantly enriched in eukaryotic genomes. Bioinformatic analysis showed that G4 motifs frequently cluster at key regulatory elements such as promoter regions of many oncogenes and genes involved in growth control [17][18][19], replication origins, untranslated exon regions, immunoglobulin switch sites and recombination hotspots. A high frequency of G4 motifs is also observed in telomeric DNA and micro(mini)satellite repeats, genes of ribosomal RNA and mitochondrial DNA [20]. Compared to telomeres, the G4-forming sequences found in the promoter regions are more diverse, since the varying number and length of G-tracts lead to the potential formation of multiple G4s. G4 motifs have also been well documented in all bacterial and viral genomes available in the NCBI database [21].
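How G-richness and G-skewness can be combined into one number is easiest to see in code. The sketch below follows the commonly described G4Hunter scoring scheme, in which each G scores the length of its run (capped at 4), each C scores the negative of its run length (capped at 4), and A/T score 0; the score of a region is the mean. It is a simplified illustration, not the reference implementation: the function names are hypothetical, and the window size of 25 and threshold of 1.2 are illustrative defaults assumed here rather than values given in the text.

```python
from itertools import groupby


def g4hunter_base_scores(seq):
    """Per-base scores: +run length for G, -run length for C (capped at 4), 0 otherwise."""
    scores = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            value = min(n, 4)
        elif base == "C":
            value = -min(n, 4)
        else:
            value = 0
        scores.extend([value] * n)
    return scores


def g4hunter_score(seq):
    """Mean per-base score of the whole sequence (0.0 for an empty sequence)."""
    scores = g4hunter_base_scores(seq)
    return sum(scores) / len(scores) if scores else 0.0


def windows_above(seq, window=25, threshold=1.2):
    """Return (start, mean score) for every window whose |mean| reaches the threshold.

    window and threshold are assumed, illustrative defaults.
    """
    scores = g4hunter_base_scores(seq)
    hits = []
    for i in range(len(scores) - window + 1):
        mean = sum(scores[i:i + window]) / window
        if abs(mean) >= threshold:
            hits.append((i, mean))
    return hits


print(g4hunter_score("GGGT" * 4))   # G-rich strand: positive score
print(g4hunter_score("ACCC" * 4))   # complementary C-rich strand: negative score
```

The sign of the score reflects the skew: a positive value points to a G4 candidate on the scanned strand, a negative value to one on the complementary strand.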
Evidence of G4 formation in living cells has been obtained by both indirect approaches (such as G4-associated genome instability [22][23]) and direct approaches (the use of G4-specific, fluorescently labeled antibodies and small-molecule ligands [24][25][26][27][28], in-cell NMR spectroscopy [29] or real-time visualization of DNA G4 structures in living cells with small-molecule fluorescence probes [30]). In cells, G4s appear to form maximally during the S-phase [24], indicating that their number is dependent on DNA replication, a process during which the DNA strands are transiently separated at the replication fork, allowing the single strands to fold into a G4 structure.
G4 recognition by specific cellular proteins provides functional evidence of these noncanonical forms in vivo. They play critical roles in the regulation of key biological processes in eukaryotic genomes, such as telomere maintenance, the regulation of gene expression at the transcriptional level, DNA replication initiation and DNA damage and repair, as well as promoting immunoglobulin gene recombination and programming genome rearrangements. Recently, the role of G4s in embryonic development, the most controlled process in vertebrate biology, has been established [31]. Thus, it was shown that G4s in the promoters of developmental genes enhance their transcription in zebrafish embryos, probably by favoring the binding of specific transcription factors or by keeping the DNA molecule open, thereby facilitating the re-initiation of transcription. On the other hand, G4 formation causes accidental genome and epigenome instability, associated with carcinogenesis and neurological disorders [32]. Although the functions of G4s in prokaryotes and viruses are not fully elucidated, these non-B-form structures are considered important regulators of pathogenic processes because they control the expression of virulence genes [33].
In recent years, the deleterious effects of G4s on genome integrity and their potential role in regulating the DNA repair machinery have become the subject of intense research. It has been shown that, among identified G4-binding proteins, there are many G4-resolving and repair proteins, such as special helicases and proteins involved in homologous recombination and other canonical repair pathways [34][35]. Recent findings have revealed novel functions of G4 structures in various aspects of epigenetic regulation. However, a detailed understanding of the G4 effect on DNA repair and the mechanisms by which DNA replication is coupled to genetic and epigenetic instability is currently lacking.
Our review [34] summarizes the recent data on G4-mediated regulation of these key cellular processes. The main factors that play a dominant role in the efficiency of G4 damage, mainly the introduction of oxidative guanine lesions, as well as their removal from G4 structures by various repair pathways, have been characterized. Since G4 formation in the genome context, their stabilization and resolution must be regulated in a complex, coordinated manner [36], a broad overview of the factors that stabilize G4 structures in vitro and in vivo has been presented [34]. Here we focus on the impact of G4s on mismatch repair.
2. Influence of G-Quadruplexes on DNA Mismatch Repair
Mismatch repair (MMR) is required for proper maintenance of the genome: it protects against noncanonical base pairs (mismatches) and insertion/deletion loops that arise from DNA polymerase errors or during homologous recombination. At the same time, this system has also been reported to be a driver of certain mutations, including disease-related instability of trinucleotide repeats in human cells [37]. It is important to note that the basic features of MMR have been conserved throughout evolution from bacteria to humans [38]. The most studied and widely employed MMR systems are those of E. coli and humans. In E. coli, the repair process is initiated by the binding of the MutS protein to mismatched bases. Upon recognition of the mismatch, MutS recruits MutL in an ATP-dependent manner to form a ternary complex that is believed to coordinate a cascade of subsequent events. MutL stimulates the MutH endonuclease, which interprets the absence of DNA methylation as a daughter-strand mark, thereby helping to distinguish and cleave the newly synthesized strand (methyl-directed MMR). In eukaryotes and most bacteria, MutL rather than MutH has the endonuclease activity (methyl-independent MMR). The unmethylated DNA strand is then hydrolyzed by a set of exonucleases. Finally, DNA polymerase and ligase fill the gap in the daughter strand (for review, see [39]).
The binding of E. coli MutS (ecMutS) protein and the human homolog MutSα to tetrameric polymorphic DNA G4s has been reported [40][41], and the affinity of these proteins for G4 turned out to be 2–4 times higher than for DNA with a G/T mismatch, a specific substrate of MutS. The binding of MutSα to G4 motifs of immunoglobulin class-switching regions was directly visualized by electron microscopy. The interaction of MutS with G4 has also been shown in vivo [41]. The analysis of the mode of MutS binding to G4s has generated great interest, since some studies have revealed variations in the ways ecMutS and its human homolog interact with quadruplex structures, on the one hand, and DNA mismatches, on the other [40][41]. Indeed, a highly conserved phenylalanine residue in the Phe-X-Glu structural motif of MutS, critical for G/T mismatch recognition due to stacking with one of the mispaired bases, is not required for G4 recognition; in addition, the ATP-induced conformational changes in MutS, which promote the release of the mismatch-containing DNA duplex, contrast with the ATP-independent binding of MutS to the G4 structure. Ehrat et al. hypothesized that MutS is unable to activate the ATP-dependent canonical MMR pathway through G4 binding and that the function of MutS in G4 DNA metabolism is not associated with methyl-directed MMR [40]. However, no direct evidence has been obtained to support this hypothesis. To answer the questions of whether G4 formation interferes with mismatch-induced DNA cleavage caused by the coordinated actions of MutS, MutL and MutH proteins, and whether G4 itself activates MMR responses, our lab has designed new DNA constructs containing a biologically relevant intramolecular parallel G4 stabilized in the context of double-stranded DNA with a set of DNA sites required to initiate the MMR pathway (G/T mismatch and MutH recognition site). These DNA models were created by the hybridization of partly complementary strands, one of which contained a G4 motif d(GGGT)₄ flanked by oligonucleotides, while the opposite strand lacked the site complementary to the G4-forming insert (Figure 1) [33]. Using NMR spectroscopy, chemical probing, fluorescent indicators, circular dichroism and UV spectroscopy, the coexistence of parallel-stranded intramolecular G4 and duplex domains in the developed DNA models has been unequivocally proved.
Figure 1. Schematic representation of a stable G4 structure embedded in a DNA duplex.
In contrast to previous studies that used simple models with isolated G4, our DNA constructs allowed us, in addition to ecMutS, to study the affinity of G4 binding to other proteins involved in the initial steps of MMR: proteins MutL from E. coli (methyl-directed MMR) and MutS from Rhodobacter sphaeroides (methyl-independent MMR). Moreover, we were the first to assess the impact of the G4 structure on the functioning of ecMMR. We have proven experimentally that G4 is not perceived by ecMMR as the damage that needs to be repaired; at the same time, this noncanonical DNA structure does not prevent the mismatch-dependent activation of MMR when the G4 and a G/T mismatch together are present in the DNA substrate at a distance of at least 17 bp.
To assess how the distance between G4 and the DNA mismatch affects the functioning of ecMMR on G4-containing substrates, a set of DNA duplexes with an embedded intramolecular parallel G4 structure and a monomethylated recognition site for the MutH endonuclease was prepared. They differed in the mismatch position (on the 3′- or 5′-side of G4) as well as in the distance between the G/T pair and the G4 structure, which varied from 18 to 3 bp (Figure 2).
The experimental procedure was described in paper [33].
Briefly, the nicking endonuclease activity of MutH was assayed by incubating 25 nM 3′-TAMRA-labeled DNA substrate in 20 mM HEPES–KOH buffer (pH 7.9) containing 5 mM MgCl₂, 120 mM KCl, 0.5 mg/mL BSA, and 1 mM ATP. Data were obtained for MutH alone (250 nM) and for the combinations of MutH with 250 nM MutS or 250 nM MutL, as well as with both 250 nM ecMutS and 250 nM ecMutL (all protein concentrations were calculated per monomer). The reaction mixtures were incubated at 37 °C for 1 h. The reactions were stopped by proteinase K treatment for 15 min at 50 °C. 3′-Labeled cleavage products were separated from intact DNA strands by electrophoresis in a 10% polyacrylamide gel containing 7 M urea, allowing for evaluation of nicking potency. The cleavage efficiency was calculated from at least three independent experiments by dividing the intensity of the product band by the total fluorescence intensity. Error is presented as 95% confidence intervals.
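The quantification step described above can be illustrated with a short calculation. This is a sketch under stated assumptions, not the analysis code used in [33]: the function names are hypothetical, the band intensities in the usage example are invented for illustration, and the use of a two-sided Student t interval for the 95% confidence interval is an assumption, since the text does not say how the intervals were computed.

```python
import math
import statistics

# Two-sided 95% Student t critical values, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def cleavage_efficiency(product_intensity, total_intensity):
    """Cleavage efficiency in percent: product band / total lane fluorescence."""
    if total_intensity <= 0:
        raise ValueError("total intensity must be positive")
    return 100.0 * product_intensity / total_intensity


def mean_with_ci95(values):
    """Return (mean, half-width of the 95% CI) for 2 to 10 replicate values.

    Assumes a Student t interval; this choice is not stated in the source.
    """
    n = len(values)
    if n - 1 not in T_CRIT_95:
        raise ValueError("this sketch supports 2 to 10 replicates")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, T_CRIT_95[n - 1] * sem


# Hypothetical (product, total) band intensities from three independent gels.
replicates = [cleavage_efficiency(p, t) for p, t in [(38, 100), (40, 100), (42, 100)]]
mean, half_width = mean_with_ci95(replicates)
print(f"{mean:.1f}% ± {half_width:.1f}%")
```

With only three replicates the t critical value is large (4.303), so the interval is wide even when the replicates agree closely.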
Figure 2. Efficiency of ecMMR protein-induced hydrolysis of linear DNA duplexes with an embedded parallel G4 and a MutH recognition site, which differ in the mismatch position. (a) DNA models used in this work; their names are shown on the left and right, and the sequences are the same as in [33]. 5′-Gm⁶ATC-3′/3′-CTAG-5′ corresponds to the MutH recognition site, and black arrows indicate the position of DNA cleavage by the MutH endonuclease. Pink asterisks represent the TAMRA fluorophore at the 3′-end of the unmethylated daughter strand. (b) Efficiency of DNA hydrolysis induced by ecMMR proteins (p < 0.05).
The efficiency with which ecMutH endonuclease introduced a single-strand break into the prepared DNA models, alone and in the presence of ecMutL, ecMutS or both, was evaluated and compared with that for control DNA duplexes lacking the G4 structure or the G/T mismatch. ecMutH alone cleaved all studied DNA substrates with equally low efficiency, while ecMutS added to the reaction slightly increased MutH activity, and the presence of MutL instead of MutS led to a ~4-fold increase in DNA cleavage efficiency (~40%) compared to MutH alone (~10%). Because ecMutH is known to colocalize and operate in coordination with ecMutL [42], the observed effect can be explained by the recruitment of an increased number of MutL molecules to G4, causing the activation of additional MutH molecules. Finally, the combined action of the full set of MMR proteins involved in the initial stage of mismatch repair (MutH, MutS and MutL) yielded the highest efficiency of MutH-mediated DNA cleavage (~70%) for all G4-containing DNAs, which was practically independent of the G/T mismatch position.
3. Conclusion
Thus, biologically relevant parallel G4 stabilized in double-stranded DNA does not trigger the initiation of ecMMR. At the same time, such a noncanonical structure as G4, capable of disrupting the movement of many processive proteins along B-DNA and binding with high affinity to the MutS and MutL proteins, does not block the action of endonuclease ecMutH, even when located in the immediate vicinity of a G/T mismatch and a MutH recognition site. For the most studied MutS homologs, it was found that the mode of MutS binding to intermolecular and intramolecular G4s is apparently common for prokaryotic and eukaryotic organisms, regardless of the strand discrimination mechanism [33][40].
The involvement of MutS homologs in DNA repair and recombination is also well known [43][44][45]. It is believed that the interaction of MutS with G4 may play an as yet unidentified role in the regulation of DNA recombination and G4 unwinding.
Funding: This work was supported by the Russian Science Foundation (project No. 21-14-00161).