G-quadruplexes and ligands: Biophysical methods: Comparison
Please note this is a comparison between Version 1 by Tiago Santos and Version 3 by Amina Yu.

Progress in the design of G-quadruplex (G4) binding ligands relies on the availability of approaches that assess the binding mode and the nature of the interactions between G4-forming sequences and their putative ligands. The experimental approaches used to characterize G4/ligand interactions can be categorized into structure-based methods (circular dichroism (CD), nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography), affinity and apparent affinity-based methods (surface plasmon resonance (SPR), isothermal titration calorimetry (ITC) and mass spectrometry (MS)), and high-throughput methods (fluorescence resonance energy transfer (FRET)-melting, G4-fluorescent intercalator displacement assay (G4-FID), affinity chromatography and microarrays). Each method has unique advantages and drawbacks, which makes it essential to select the most suitable strategies for the biological question being addressed. The structure-based and the affinity and apparent affinity-based methods are in several cases complex and/or time-consuming, and can be combined with fast and inexpensive high-throughput approaches to improve the design and development of new potential G4 ligands. In recent years, the joint use of these techniques has enabled the discovery of a large number of G4 ligands investigated for diagnostic and therapeutic purposes. Overall, this review article highlights in detail the most widely used approaches to characterize G4/ligand interactions, as well as the applications of each technique and the type of information it can provide.

  • G-quadruplex
  • ligands
  • molecular interactions
  • biophysical methods

1. Introduction

The human genome and transcriptome contain several guanine-rich sequences, which have attracted considerable interest from researchers since the first reports of their folding into non-classical structural motifs known as G-quadruplexes (G4s) [1–3] (Figure 1A). These structures are characterized by the presence of two or more stacks of four guanines organized in a coplanar manner [4]. Each set of four guanines forms a building block, usually called a G-tetrad, which is stabilized by Hoogsteen hydrogen bonding under physiological conditions, by π-π interactions, and by the presence of positively charged monovalent cations (usually K+ and Na+) (Figure 1B) [5]. G4s are highly polymorphic and can adopt a wide variety of structures depending on strand molecularity, strand direction, and loop length and composition [6]. According to molecularity, the structures may be classified as intramolecular or intermolecular [6]. Considering the direction of the strands, G4 structures may be classified as parallel, anti-parallel and hybrid (Figure 1C-H). The loops are generally divided into three main groups: propeller, lateral, and diagonal [6]. Recently, some structural studies demonstrated the formation of G4 structures with longer loops and bulges, opening the way for the development of novel diagnostic and therapeutic approaches based on those features [7,8].

Figure 1. (A) Guanine-rich sequence with potential to form a three-tetrad G4. (B) Chemical structure of a G-tetrad formed by Hoogsteen hydrogen-bonded guanines and a central cation (colored in gray) coordinated to oxygen atoms. Schematic representation of common unimolecular G4s based on the strand direction: (C) parallel, (D) anti-parallel, and (E) hybrid. Representative PDB structures of (F) parallel (PDB ID: 2M4P), (G) anti-parallel (PDB ID: 1I34) and (H) hybrid (PDB ID: 2JPZ) G4 structures. The different loops (propeller, diagonal and lateral) and a bulge are also shown.

Computational algorithms were developed to predict the location of specific G4 sequence motifs in the human genome [9,10]. Such predictors are based on the general motif G≥3NxG≥3NxG≥3NxG≥3 (four runs of three or more guanines separated by loops of x nucleotides, N being any nucleotide) and identified over 370,000 sequences with the potential to fold into G4 structures [11]. However, early algorithms are not sufficiently accurate and lack the flexibility to accommodate divergences from the canonical pattern. To overcome these disadvantages, novel approaches were developed to compute a G4 propensity score by quantifying the G-richness (reflecting the fraction of guanines in the sequence) and G-skewness (reflecting G/C asymmetry between the complementary nucleic acid strands) of a given sequence [12,13], or by summing the binding affinities of smaller regions within the G4 and penalizing the destabilizing effect of loops [14]. Recently, new machine learning approaches were employed to map active G4s based on sequence features, trained on newly available genome-wide maps of G4s in vitro and in vivo [15,16].
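To make the motif-based and propensity-based ideas above more concrete, the following minimal Python sketch scans a sequence for the general four-G-run motif and computes simple G-richness and G-skewness values. It is only an illustration under stated assumptions: the loop range of 1 to 7 nucleotides, the function names, and the simplified formulas for richness and skewness are hypothetical choices made here and do not reproduce any of the published predictors [9–14].

```python
import re


def find_g4_motifs(seq, min_run=3, max_loop=7):
    """Return (start, end, match) for non-overlapping putative G4 motifs.

    Pattern: four G-runs of at least `min_run` guanines separated by three
    loops of 1..`max_loop` nucleotides. The 1-7 nt loop range is an assumed
    default, not a value given in the text.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA input like DNA
    run = "G{%d,}" % min_run
    loop = "[ACGT]{1,%d}" % max_loop
    pattern = re.compile(run + (loop + run) * 3)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq)]


def g_richness(seq):
    """Fraction of guanines in the sequence (simplified, assumed definition)."""
    seq = seq.upper()
    return seq.count("G") / len(seq) if seq else 0.0


def g_skewness(seq):
    """G/C asymmetry as (G - C) / (G + C); 0.0 if neither base is present."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


# Example: a short stretch of the human telomeric repeat (TTAGGG)n
telomere = "TTAGGGTTAGGGTTAGGGTTAGGGTT"
for start, end, motif in find_g4_motifs(telomere):
    print(start, end, motif, g_richness(motif), g_skewness(motif))
```

Because the pattern only looks for guanine runs, a G4 motif on the complementary strand appears as cytosine runs and would be missed; in practice one would presumably scan the reverse complement as well. A regular expression of this kind also illustrates the rigidity noted above, since bulges or longer loops fall outside the pattern.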

2. G4-seq in high-throughput sequencing methods

In recent years, the development of high-throughput sequencing methods, such as G4-seq, has enabled the identification of over 716,000 DNA guanine-rich sequences across the human genome with the ability to fold into G4 structures in the presence of the well-known G4 ligand pyridostatin (PDS) (Figure 2) [17]. PDS plays an important role in next-generation sequencing (NGS), since it stabilizes G4s and induces polymerase stalling. Those DNA guanine-rich sequences are non-randomly distributed and are mainly located in clusters of immunoglobulin switch regions [18], telomeres [19] and promoter regions of oncogenes [20]. Several reports have described the formation of G4 structures within endogenous chromatin and their ability to recruit transcription factors to promote active transcription [21–28]. The location of those G4 structures was revealed using an antibody-based G4 chromatin immunoprecipitation sequencing (G4 ChIP–seq) approach [21] and suggests that they play a crucial role in critical cellular processes such as DNA replication [29,30], DNA damage repair [26], transcription [22,23], translation [31] and epigenetic modifications [32]. Using G4 ChIP–seq, Hänsel-Hertsch et al. detected a reduced number of DNA G4s (10,000) in the genome [21]. This result is not surprising, since transient G4 structures strongly depend on chromatin relaxation and cell status [21]. Recently, an improved version of the G4-seq method was developed, making available G4 maps of 12 different species [33].

RNA guanine-rich sequences have attracted growing research interest in the last few years due to their intrinsic features and strengths. RNA G4s are more compact, less hydrated, and more thermodynamically stable than their DNA counterparts [34]. Furthermore, the presence of the 2′-OH group in the ribose ring favors the parallel topology, making them more attractive as target molecules [34]. To date, computational approaches have identified more than 1.1 million guanine-rich sequences with the ability to fold into RNA G4s [35]. RNA G4s were shown to exist in human cells by using the G4-specific antibody BG4 [36] and, like DNA G4s, those sequences are non-randomly distributed in the transcriptome [37]. They are mainly located in both the 5′ and 3′ UTRs, as well as in splicing junctions of mRNAs and in noncoding RNAs, and are of utmost importance in post-transcriptional regulatory mechanisms [37]. In the last few years, several reports have highlighted the importance of G4s in the transcriptome by employing high-throughput G4 sequencing approaches [38–41]. rG4-seq was initially applied to map G4s in RNA extracted from HeLa cells [38] and later in plants [40] and bacteria [41]. G4RP-seq was also used to characterize the G4 transcriptomic landscape in vivo [39]. Yang et al. developed a biotinylated template-assembled synthetic G-quartet (TASQ) derivative (BioTASQ v.1) (Figure 2) and captured G4 RNAs from breast cancer cells in log-phase growth, followed by target identification by sequencing [39]. The effect of BRACO-19 and RHPS4 (Figure 2) treatment was also evaluated [39]. They found that these ligands alter the G4 transcriptome, most markedly in long non-coding RNAs [39]. More recently, the same research group developed a new BioTASQ prototype, called BioTASQ v.2 (Figure 2), and performed an in-depth study of both ligands [42]. Those studies are of utmost importance and revealed the strong relevance that G4 ligands could have in cell biology.

Therefore, the location of G4s at both DNA and RNA levels suggests an active role in the development of diseases like cancer and neurological disorders [43]. Several pieces of evidence suggest that G4s play an important role in promoting genomic instability by triggering DNA damage [44–46]. The G4 ligand PDS induces DNA damage, as shown by the formation of γH2AX foci, a marker of DNA double-strand breaks (DSBs) [47]. Furthermore, ChIP-seq has shown that PDS accumulates at genes containing clusters of G4 structures and that this accumulation is transcription-dependent [44][47]. Recently, De Magis et al. showed that the G4 ligands PDS, BRACO-19 and the bis-guanylhydrazone derivative of diimidazo[1,2-a:1,2-c]pyrimidine 1 (FG) (Figure 2) induced the formation of R-loops, another noncanonical secondary structure consisting of a DNA:RNA hybrid that is compatible with the formation of a G4, and promoted DNA damage as a consequence of that formation [47][44]. They also found that the mechanism of genome instability and cell killing by G4 ligands was particularly efficient in BRCA2-depleted cancer cells [44]. This study could open up new avenues of investigation and lead to the development of new anticancer approaches.

Although G4s present in eukaryotic species have been extensively studied, their presence in bacteria and viruses has only gained attention in the last few years [48–52]. In bacteria, G4s are found in regulatory regions that play important functions in replication, radioresistance, antigenic variation and latency [51]. G4s in viruses have important regulatory roles in key viral steps [53]. Recent studies have demonstrated the formation and function of G4s in pathogens responsible for serious diseases. Among them are Mycobacterium tuberculosis [54], Pseudomonas aeruginosa [41], Human Papilloma Virus (HPV) [55], Human Immunodeficiency Virus (HIV) [53] and SARS-CoV-2 [56].

Therefore, the recognition of the biological significance of G4s has promoted the research and development of ligands that interact with G4s and regulate their structure and function. The best-known G4 ligands were initially developed to target DNA G4s, but many of them have also been employed to target RNA G4s [57]. Despite significant progress in the field, the main challenge remains the trade-off between affinity and selectivity, which could be addressed through the full characterization of G4/ligand interactions. Since the discovery of the first G4 ligands (disubstituted amidoanthraquinones) (Figure 2) [58], methods like circular dichroism, surface plasmon resonance, isothermal titration calorimetry, mass spectrometry, nuclear magnetic resonance and X-ray crystallography have been used to characterize the molecular interactions of the G4/ligand pair. However, despite their utility, those methods are in general time-consuming and/or costly for the first screening of G4/ligand interactions. Following the general trend, high-throughput approaches like FRET-melting, G4-fluorescent intercalator displacement (G4-FID), affinity chromatography and microarrays have emerged as rapid and efficient methods to detect the binding and interaction of ligands with their G4 targets.

Overall, this review describes the best-known G4 ligands and highlights the importance of the most recently developed experimental methods to characterize G4/ligand complex interactions.

Figure 2. List of some examples of G4-interacting ligands mentioned in this review showing the common name of the ligand, chemical structure and family of the compound (chemical backbone).

 
