| Version | Created by | Modification | Content Size | Created at |
|---|---|---|---|---|
| 1 | Victoria Sanchez-Martin | -- | 1333 | 2023-02-06 12:42:20 |
| 2 | Victoria Sanchez-Martin | + 1 word(s) | 1334 | 2023-02-06 12:44:20 |
| 3 | Conner Chen | + 57 word(s) | 1391 | 2023-02-07 06:37:53 |
| 4 | Conner Chen | Meta information modification | 1391 | 2023-02-07 06:38:30 |
| 5 | Conner Chen | + 3 word(s) | 1394 | 2023-02-08 01:52:16 |
| 6 | Conner Chen | + 2 word(s) | 1396 | 2023-02-08 02:22:33 |
DNA G-quadruplexes (G4s) are non-canonical secondary structures formed in guanine-rich sequences. Within the human genome, G4s are found in regulatory regions such as gene promoters and telomeres, where they contribute to the control of replication, transcription, and telomere lengthening. In the cellular context, several proteins known as G4-binding proteins (G4BPs) interact with G4s by anchoring upon, stabilizing, and/or unwinding them.
DNA can adopt alternative secondary structures beyond the double helix. In G‐tetrads, four guanine bases are arranged via Hoogsteen base pairing in planar tetrameric squares. Several proximally located G‐tetrads self‐stack through π–π interactions and are further stabilized by centrally coordinated monovalent or divalent cations, forming a three‐dimensional structure termed a G‐quadruplex (G4) [1]. G4s are polymorphic and can fold into various topologies depending on the relative direction of the strands (i.e., parallel, antiparallel, or hybrid), the number of strands involved (i.e., intramolecular or intermolecular), and the number of stacked G‐tetrads [2]. Within the genome, G4s are not randomly distributed. Instead, they are clustered in key regulatory sites such as gene promoters and telomeres, as well as in gene bodies [3]. Interestingly, nucleosome‐depleted regions and promoters of actively transcribed genes are significantly enriched in G4s [4]. Altogether, these observations strongly support their close involvement in genome functions including DNA replication, transcription, and epigenetic modification [5].
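To make the idea of a guanine-rich, G4-prone sequence concrete, the following minimal sketch scans a DNA strand for putative intramolecular G4 motifs. It assumes the commonly used consensus of four runs of at least three guanines separated by loops of 1 to 7 nucleotides; this pattern, the function name `find_putative_g4s`, and the loop limits are assumptions for illustration and are not taken from the text above. A match only indicates a candidate sequence, not that a G4 actually folds in cells.

```python
import re

# Assumed consensus (not from the text above): four G-runs of >= 3 guanines
# separated by three loops of 1-7 nucleotides each.
PUTATIVE_G4 = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_putative_g4s(sequence: str) -> list[tuple[int, str]]:
    """Return (start_index, matched_sequence) for non-overlapping candidates.

    Only the given strand is scanned; a C-rich strand would have to be
    reverse-complemented first to find G4 candidates on the opposite strand.
    """
    seq = sequence.upper()
    return [(m.start(), m.group()) for m in PUTATIVE_G4.finditer(seq)]


if __name__ == "__main__":
    # Four copies of the human telomeric repeat TTAGGG.
    telomeric = "TTAGGG" * 4
    for start, motif in find_putative_g4s(telomeric):
        print(start, motif)
```

The pattern is deliberately simple: it ignores intermolecular G4s, two-tetrad G4s, and long-loop variants, all of which would need a different rule.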
Proteins are required both to regulate G4 formation across the whole genome and transcriptome and for G4s to fulfill their biological functions. Proteins that specifically bind to G4s are known as G4‐binding proteins (G4BPs). Interest in G4BPs for drug development is growing considerably, because targeting the interaction between a G4BP and its G4 counterpart offers a way to modulate the downstream processes that this interaction directs.
The nucleic acid G4‐interacting proteins database (G4IPDB) contains comprehensive and updated information about proteins interacting with G4s [6]. G4IPDB is a freely accessible platform that includes more than 100 entries with data on the interacting protein, target sequence, PubMed reference number, and binding details where available. Although multiple studies describe G4‐G4BP interactions, the functional outcome remains to be determined for many pairings [7].
G4BPs can be categorized in several ways. Firstly, G4BPs are divided into two types according to whether the G4s they bind occur in DNA or RNA: (i) DNA G4BPs and (ii) RNA G4BPs. Only DNA G4BPs are covered here. Secondly, G4BPs are classified into two main types based on their functional relationship with G4s: (i) G4BPs that are recruited by G4s without affecting their structure and (ii) G4BPs that affect the G4 structure. The latter are further divided into two categories according to their structural effect: (i) G4‐stabilizing proteins, which promote the folding of putative G4 sequences into a stable G4 structure, and (ii) G4‐destabilizing proteins, which are able to unfold G4s.
Table 1 summarizes recent discoveries related to the involvement of G4BPs in the regulation of cellular processes that include telomere lengthening, replication, transcription, chromatin remodeling, and histone modification.
Table 1. G4BPs involved in telomere lengthening, replication, transcription, and chromatin remodeling and histone modification. List of all G4BPs (alphabetically sorted) grouped according to their biological function. Literature column shows the reference in which the protein was first described as a G4BP.
| Biological Function | Protein Name | Literature |
|---|---|---|
| Telomere lengthening | BLM | [8] |
| | CST | [9] |
| | DNA2 | [10] |
| | hnRNPA1 | [11] |
| | hTERT | [12] |
| | Pif1 | [13] |
| | POT1 | [14] |
| | RPA | [15] |
| | RTEL1 | [16] |
| | TLS/FUS | [17] |
| | TRF2 | [18][19] |
| | UP1 | [20] |
| | WRN | [8] |
| Replication | BLM | [21] |
| | BRCA1 | [22] |
| | DDX11 | [23] |
| | FANCJ | [24] |
| | Pif1 | [25] |
| | WRN | [21] |
| Transcription | GQN1 | [26] |
| | hnRNPA1 | [27] |
| | MAZ | [28] |
| | NM23-H2 | [29] |
| | Nucleolin | [30] |
| | Nucleophosmin | [31] |
| | SP1 | [32] |
| | TP53 | [33][34] |
| | XPB | [35] |
| | XPD | [35] |
| | YY1 | [36] |
| Chromatin remodeling and histone modification | ATRX | [37] |
| | BRD3 | [38] |
| | CTCF | [39] |
| | DNMT1 | [40] |
| | REST-LSD1 | [41] |
| | SMARCA4 | [42] |
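
Because several proteins appear under more than one biological function in Table 1, it can help to treat the table as a small lookup structure. The sketch below transcribes the protein names from Table 1 into a Python dictionary and inverts it to list the proteins that occur in more than one group. The variable and function names (`TABLE_1`, `functions_by_protein`, `multi_function_proteins`) are hypothetical, and the literature references are omitted for brevity.

```python
from collections import defaultdict

# Protein names transcribed from Table 1, keyed by biological function.
TABLE_1 = {
    "Telomere lengthening": [
        "BLM", "CST", "DNA2", "hnRNPA1", "hTERT", "Pif1", "POT1",
        "RPA", "RTEL1", "TLS/FUS", "TRF2", "UP1", "WRN",
    ],
    "Replication": ["BLM", "BRCA1", "DDX11", "FANCJ", "Pif1", "WRN"],
    "Transcription": [
        "GQN1", "hnRNPA1", "MAZ", "NM23-H2", "Nucleolin",
        "Nucleophosmin", "SP1", "TP53", "XPB", "XPD", "YY1",
    ],
    "Chromatin remodeling and histone modification": [
        "ATRX", "BRD3", "CTCF", "DNMT1", "REST-LSD1", "SMARCA4",
    ],
}


def functions_by_protein(table: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the table: map each protein to the functions it is listed under."""
    index: dict[str, list[str]] = defaultdict(list)
    for function, proteins in table.items():
        for protein in proteins:
            index[protein].append(function)
    return dict(index)


def multi_function_proteins(table: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only the proteins listed under more than one biological function."""
    return {
        protein: functions
        for protein, functions in functions_by_protein(table).items()
        if len(functions) > 1
    }


if __name__ == "__main__":
    for protein, functions in sorted(multi_function_proteins(TABLE_1).items()):
        print(f"{protein}: {', '.join(functions)}")
```

Appearing in two groups only reflects how the proteins are listed in Table 1; it says nothing about the binding mode or the relative importance of each role.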
Although only a limited number of high-resolution structures of G4BPs interacting with G4s are available, binding is assumed to occur through the following modes: (i) top-stacking with the upper G-tetrads, (ii) groove-binding, (iii) loop-binding, or a combination of these modes. The different functions of G4BPs may be linked to the binding mode that the protein adopts when interacting with a G4. For instance, a top-stacking binding mode would appear more practical for G4BPs that unwind multiple, different G4s. In contrast, G4BPs that are highly selective towards particular G4s may function through loop-binding, because the orientation and length of the loops vary among different G4s (even those with a similar barrel core). Finally, groove-binders display a particular conformation able to fit into a groove of the G4 [43]. Interestingly, binding of a G4BP to a G4 does not necessarily translate into function. PARP-1 (poly(ADP-ribose) polymerase 1) affinity for a G4 was shown to increase as the loop features were removed, but PARP-1 activation was no longer achieved [44].
Analyzing the amino acid composition and structural patterns of G4BPs provides further insights into G4-recognition mechanisms. G4BPs are enriched in shared domains or motifs that are established or predicted to function as binding regions. In particular, 77 human G4BPs shared a domain rich in glycine and arginine residues [45]. This highly conserved domain is termed the RGG (Arginine-Glycine-Glycine)/RG (Arginine-Glycine) motif or GAR (Glycine-Arginine-rich) domain and is composed of repeat sequences rich in RGG or RG [46]. The RGG domain is important in G4–protein interactions. For instance, the RGG motif in nucleolin is essential for the recognition of the CMYC G4 sequence and the promotion of G4 formation [47]. Pairwise interactions between amino acid residues in proteins and bases in DNA have also been identified [48]. Within the RGG domain, the internal arrangement of RGG repeats and gap amino acids seems to play a more crucial role in the G4-binding mechanism than a critical number of RGG repeats [49]. Interestingly, the cold-inducible RNA-binding protein (CIRBP) was the first protein identified as a G4BP both in vitro and in cells from the exploration of the RGG motif [49]. In addition, proteins that bind oligonucleotides or oligosaccharides were observed to contain a particular motif named the oligonucleotide/oligosaccharide-binding (OB)-fold domain. The OB-fold has a five-stranded β-sheet coiled to form a closed β-barrel [50]. The OB-fold domain is present in several G4BPs, such as CST [9] and POT1 [51], and participates in G4 interaction.
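Since the arrangement of RGG repeats and the gap residues between them appear to matter more than the repeat count alone, a first look at a candidate protein could list both. The sketch below is a minimal illustration, assuming a one-letter amino acid sequence as input. The function name `rgg_repeats` and the toy sequence are hypothetical and do not correspond to any real protein or published tool.

```python
import re

# "RGG" is listed before "RG" so that a full RGG repeat is reported as such
# rather than as a shorter RG hit.
RGG_OR_RG = re.compile(r"RGG|RG")


def rgg_repeats(protein_sequence: str) -> tuple[list[tuple[int, str]], list[str]]:
    """Locate RGG/RG repeats and the gap residues between consecutive repeats.

    Returns (hits, gaps), where hits is a list of (start_index, motif) and
    gaps[i] is the stretch of residues between hits[i] and hits[i + 1].
    """
    seq = protein_sequence.upper()
    hits = [(m.start(), m.group()) for m in RGG_OR_RG.finditer(seq)]
    gaps = [
        seq[start_a + len(motif_a):start_b]
        for (start_a, motif_a), (start_b, _) in zip(hits, hits[1:])
    ]
    return hits, gaps


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    hits, gaps = rgg_repeats("MSRGGFRGGSYRGDD")
    print(hits)
    print(gaps)
```

A scan like this only describes the primary sequence. Whether a given arrangement of repeats and gaps actually supports G4 binding has to be established experimentally.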
The large number and evolutionary conservation of G4s point to their importance in biological functions. Since G4 structures are also a source of genomic instability, G4 formation has to be tightly controlled. In this sense, G4s are highly dynamic in vivo and their folding depends on the cell type and chromatin state. G4BPs participate in the stabilization or resolution of G4s. Thus, it is important to consider the G4 not as an isolated entity, but rather as a structure that exists as part of an interconnected network of other biomolecules, such as G4BPs, within living cells. To date, a broad spectrum of G4BPs has been identified, but there are some difficulties in the analysis of the G4 interactome. Firstly, an interaction observed in vitro may not confirm that the same event occurs in vivo, owing to the plasticity of G4 formation in distinct cellular contexts. Secondly, three-dimensional structures of G4-G4BP complexes are still sparse. However, given the development of methodologies for the identification of G4BPs, the number of proteins with G4-binding specificity can be expected to increase in the future. While some of the known G4BPs appear to function as “pan-binders”, others act in a more selective manner and with different affinity, which implies that selectivity is attainable. Although several domains involved in the recognition of G4s have already been characterized, there may still be others to be deciphered. Determining the features of G4BPs will help to reveal the details necessary to rationally design selective binders. Therefore, improving the selectivity of G4 binders to minimize off-target effects in the host cell remains a challenge for the future. Undoubtedly, a critical step to activate or inactivate physiological or pathological pathways is the recognition and processing of G4s by G4BPs.
In this regard, the extensive research on G4BPs will provide new targets for drug design and pave the way for novel therapeutic approaches in human diseases.