Cancer ranks as the second leading cause of death worldwide, and, being a genetic disease, it is highly heritable. Over the past few decades, genome-wide association studies (GWAS) have identified many risk-associated loci harboring hundreds of single nucleotide polymorphisms (SNPs). Some of these cancer-associated SNPs have been shown to be causal, and in some instances the mechanisms underlying the risk association have been functionally characterized.
1. Functional Mechanisms of Coding Region SNPs
Single nucleotide polymorphisms (SNPs) located in coding regions can be divided into two types: synonymous and non-synonymous mutations. Although synonymous mutations do not affect the amino acid sequence of the protein, they may change the expression of the protein by affecting post-transcriptional modifications, translation rates, and other processes. In contrast, non-synonymous SNPs (nsSNPs) cause amino acid substitutions, thereby changing the protein's structure, its physical and chemical properties (stability, solubility, etc.), and its function. At present, many software tools, such as SIFT (Sorting Intolerant From Tolerant), F-SNP (Functional Single Nucleotide Polymorphism), and PolyPhen, can be used to predict the effect of nsSNPs on protein structure and function [1][2][3]. Compared to SNPs located in non-coding regions, the functional mechanism underlying tumor-associated nsSNPs is relatively simple [4]. Combined with whole-exome analysis, several coding region SNPs have been found to be associated with colorectal cancer development. For example, the missense mutation rs3184504 (p.Trp263Arg), located in a domain of SH2B3, may change the function of the protein in regulating cell division. Other coding variants may also affect alternative splicing (rs16888728, UTP23) [5]. The mechanism by which SNPs located within the coding regions of genes affect disease risk is inseparable from the function of the encoded proteins.
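The distinction between synonymous and non-synonymous changes drawn above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a single-base substitution within one codon; the function name `classify_coding_snp` and the example codons are hypothetical illustrations and are not taken from the cited studies or from tools such as SIFT or PolyPhen.

```python
# Standard genetic code, built from the conventional T-C-A-G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(ref_codon: str, pos: int, alt_base: str) -> str:
    """Classify a single-base change at 0-based position `pos` of `ref_codon`.

    Returns "synonymous", "nonsense", or "missense". Stop-loss changes are
    not separated out and fall under "missense" in this simplified sketch.
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos] + alt_base.upper() + ref_codon[pos + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


# Illustrative codons only (hypothetical, not the codons of any cited variant).
print(classify_coding_snp("CTG", 2, "A"))  # Leu -> Leu
print(classify_coding_snp("TGG", 0, "C"))  # Trp -> Arg
print(classify_coding_snp("AAA", 0, "T"))  # Lys -> stop
```

A classification of this kind only states whether the amino acid changes; predicting the functional impact of a missense change requires the dedicated predictors named above.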
Some risk loci alter the amino acid sequence of the encoded protein. Examples include BRCA2 p.Lys3326Ter (rs11571833) and CHEK2 p.Ile157Thr (rs17879961) in lung [6] and breast [7] cancers. The mechanistic interpretation of such variants is presumed to be relatively simple. In addition, SNPs within transcribed regions can affect RNA processing; an example is rs78378222 in the 3′ untranslated region of TP53, whereby the risk-associated variant alters the sequence AATAAA to AATACA, thereby changing the polyadenylation signal of TP53 and ultimately impairing the 3′-end processing of TP53 mRNA [8][9]. Some variants can also affect splicing. Tian and colleagues identified a single-nucleotide variant in the ELP2 gene that affects ELP2 exon pre-mRNA splicing, acting as a splicing quantitative trait locus (sQTL) [10].
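The effect described for rs78378222, a single A>C change that converts the hexamer AATAAA into AATACA, can be sketched in a few lines of Python. The flanking sequence, the index of the variant, and the helper names `apply_snp` and `has_canonical_pas` are assumptions made for illustration; only the two hexamers come from the text.

```python
CANONICAL_PAS = "AATAAA"  # canonical polyadenylation signal hexamer


def apply_snp(seq: str, index: int, ref: str, alt: str) -> str:
    """Return `seq` with the base at 0-based `index` changed from `ref` to `alt`."""
    if seq[index] != ref:
        raise ValueError(f"expected {ref!r} at index {index}, found {seq[index]!r}")
    return seq[:index] + alt + seq[index + 1:]


def has_canonical_pas(seq: str) -> bool:
    """Report whether the canonical AATAAA hexamer occurs anywhere in `seq`."""
    return CANONICAL_PAS in seq.upper()


# Hypothetical flanks around the hexamer; not the real TP53 3' UTR sequence.
utr_ref = "GCTTC" + "AATAAA" + "GTCAT"
# The fifth base of the hexamer sits at index 9 of this toy sequence.
utr_alt = apply_snp(utr_ref, 9, "A", "C")

print(utr_alt[5:11])               # hexamer after the A>C change
print(has_canonical_pas(utr_ref))  # signal present in the reference
print(has_canonical_pas(utr_alt))  # signal lost in the variant
```

A plain string search like this ignores variant polyadenylation signals and surrounding sequence context, so it should be read as a didactic check rather than a predictor of 3′-end processing.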
Researchers have often focused on specific signaling pathways, genes, and genetic modifications of interest, while also performing whole-exome association analyses to find relevant coding SNPs with large effects on these molecules and processes. For example, Li and colleagues used exome sequencing and conducted an association analysis of 12 important genes involved in TGF-β signaling, finding that low-frequency causative variation in the TGF-β pathway contributes to colorectal carcinoma (CRC) susceptibility. They discovered that the missense variant rs3764482 (c.83C>T; p.S28F) located in the gene SMAD7 was consistently and strongly associated with CRC risk. The rs3764482 T allele was more effective than the major C allele in limiting TGF-β signaling and reducing the phosphorylation of receptor-regulated SMADs (R-SMADs), thereby impeding the activation of downstream genes, promoting cancer cell proliferation, and contributing to CRC pathogenesis [11].
Coding SNPs may also affect gene and protein modifications. The N6-methyladenosine (m6A) modification is critical for ensuring messenger RNA stability and is involved in many biological activities, including pre-mRNA splicing, 3′-end processing, nuclear export, translation regulation, mRNA degradation, and the DNA damage response [12][13]. The m6A modification occurs on messenger RNA (mRNA); it is deposited by methylation “writers” and removed by demethylation “erasers” [14]. The SNP rs8100241, located in the gene ANKLE1, was found to be associated with susceptibility to both CRC and breast cancer. The rs8100241 A allele (Figure 1a), acting together with the m6A “writer” complex (comprising the proteins METTL3, METTL14, and WTAP) and the m6A “reader” protein (YTHDF1), was found to increase the level of m6A modification on ANKLE1 mRNA and consequently increase ANKLE1 protein expression. Mechanistically, ANKLE1 functions as a potential tumor suppressor by decreasing CRC cell proliferation while maintaining genomic integrity, thereby contributing to a lower risk of CRC [15].
Figure 1. Schematic diagram of the mechanisms of action of coding SNPs. (a) The A allele of the rs8100241 variant, which is found in the second exon of ANKLE1, has been linked to a lower risk of CRC: it increases ANKLE1 mRNA m6A levels and thus facilitates ANKLE1 protein expression, and ANKLE1 may in turn act as a negative regulator that hinders cell growth by maintaining genomic stability. (b) Interaction between the TCF7L2 missense variant rs138649767 and the regulatory variant rs6983267 in the MYC enhancer, and its effect on the expression of MYC.
Notably, coding SNPs may interact with other SNPs to produce a stronger functional role [5]. The rs138649767 A allele (Figure 1b) located in the exon region of TCF7L2 can activate the MYC enhancer containing rs6983267 allele G to promote the expression of MYC [16]. SNPs occurring in the exons and introns of SMAD7 may affect its regulation and jointly affect downstream signaling pathways involving SMAD7 and TGFβ [11]. As a result, while examining SNPs in coding regions, the interactions between them should be taken into account to better understand their functional processes.
2. Functional Mechanisms of Non-Coding Region SNPs
Accumulating evidence shows that SNPs in non-coding regions are the most common type of genetic variation in the human genome, accounting for 90% of inter-individual variation [17][18]. Depending on the location, the region can harbor a response element that is either proximal (promoter, enhancer, or super-enhancer) or distal (intergenic or intragenic). The risk loci identified by GWAS are located in genomic regions of cell type-specific active chromatin, and most of them are quantitative trait loci, methylation quantitative trait loci, or transcription factor (TF) binding-related loci. Chromatin conformation studies have helped to link the regulatory regions marked by SNPs to their respective target genes [17][19][20]. These loci may be involved in gene transcription, post-transcriptional processing, translation, post-translational modifications, and other processes that regulate gene expression. Many target genes have been identified using expression quantitative trait loci (eQTL) analysis, which detects the relationship between SNPs and gene expression. Non-coding SNPs can regulate the transcription of target genes through sequence-proximal (cis) or distal (trans) interactions. Studies have found that histone modifications are particularly abundant in the regions of such risk SNPs, especially those related to promoter and enhancer activities (H3K4me3, H3K4me1, H3K27ac). Most SNPs are predicted to disrupt the binding motifs of specific transcription factors. For example, rs6983267 may change the binding of transcription factors such as MYC, CTCF, and TCF7L2 [21]. In addition to affecting gene transcription levels by altering transcription factor-binding sites (TFBS), non-coding SNPs can also change epigenetic modifications and/or chromatin structure to influence target gene expression. Through these mechanisms, non-coding SNPs participate in cell proliferation, apoptosis, migration, and invasion.
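The eQTL idea mentioned above, relating SNP genotype to the expression of a candidate target gene, is commonly framed as a regression of expression on allele dosage (0, 1, or 2 copies of the alternative allele). The following is a minimal sketch under that assumption, using ordinary least squares in plain Python; the function name `eqtl_slope` and the toy data are hypothetical, and real analyses additionally adjust for covariates and test statistical significance.

```python
def eqtl_slope(dosages, expression):
    """Fit expression = intercept + slope * dosage by ordinary least squares.

    `dosages` holds alternative-allele counts (0, 1, or 2) per individual and
    `expression` holds the matching expression values. Returns (slope, intercept).
    """
    n = len(dosages)
    if n != len(expression) or n < 2:
        raise ValueError("need two equally long vectors with at least 2 samples")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("no genotype variation among samples")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data (hypothetical): six individuals, expression rising with dosage.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, intercept = eqtl_slope(toy_dosages, toy_expression)
print(f"per-allele effect: {slope:.2f}, baseline: {intercept:.2f}")
```

A positive slope indicates that each additional copy of the alternative allele is associated with higher expression in the sampled tissue; it does not by itself establish which variant in the locus is causal.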
2.1. Genetic Variants That Alter Promoters
A promoter is a DNA sequence that is recognized and bound by RNA polymerase to initiate transcription. Promoters contain variations of a conserved sequence required for the specific binding of RNA polymerase and transcription initiation. Most promoters are located upstream of the transcription start site at the 5′ end of a structural gene, and the promoter itself is not transcribed; promoters allow RNA polymerase to bind the template DNA accurately and specifically to initiate transcription [22]. Promoters do not control gene activity by themselves; rather, they regulate gene activity by binding proteins called transcription factors (TFs). SNPs within promoter regions generally play a regulatory role by influencing the binding of such transcription factors. A recently reported example is the SNP rs13278062, located in the promoter of death receptor 4 (DR4), which confers an altered risk of colorectal cancer. The study revealed that the rs13278062 G>T variant changed the binding affinity of the transcription factors Sp1/NF1, increased the expression of DR4, and thus suppressed carcinogenesis and metastasis of colorectal cancer [23]. The MPO promoter SNP rs2333227 increases the malignant characteristics of colorectal cancer by changing the promoter's affinity for AP-2α [24]. The variant rs10993994, located in the upstream promoter of the gene MSMB, is also overrepresented in individuals with prostate cancer; this is attributed to stronger CREB binding and thus increased promoter activity [25]. Furthermore, the SNP rs11672691 is a prostate cancer risk locus related to the lncRNA PCAT19. The non-risk allele of rs11672691 and its linkage disequilibrium (LD) SNP rs887391 favor the binding of the TFs NKX3.1 and YY1 to the PCAT19-short promoter, thereby increasing promoter activity but lowering enhancer activity, which activates PCAT19-short and ultimately results in lower prostate cancer susceptibility [26]. SNPs in the promoter regions of multiple genes, including TERT, KLHDC7A, PIDD1, and ESR1, have been discovered in breast cancer by GWAS, with reporter studies revealing that independent risk alleles change target promoter activity [27][28]. Most of the reported promoter variants exert their regulatory effects by altering TF binding. The SNP rs3824662 allele A (Figure 2a) increases chromatin accessibility by altering the expression of the TF GATA3, promoting the binding of GATA3 to the CRLF2 promoter, and ultimately forming a chromatin loop [29].
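Because most of the promoter variants above are thought to act by changing TF binding, a common first-pass analysis is to score the reference and alternative alleles against a position weight matrix (PWM) for the factor of interest. The sketch below assumes a log-odds scoring scheme with a uniform background; the four-position matrix `TOY_PWM`, the two example sites, and the function name `pwm_score` are hypothetical and do not represent the real motif of Sp1, CREB, or any other factor named in the text.

```python
import math

# Hypothetical 4-bp motif: per-position base probabilities (each column sums to 1).
TOY_PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]


def pwm_score(site: str, pwm=TOY_PWM, background: float = 0.25) -> float:
    """Sum of log2(probability / background) over the positions of `site`."""
    if len(site) != len(pwm):
        raise ValueError("site length must match motif length")
    return sum(math.log2(col[base] / background)
               for col, base in zip(pwm, site.upper()))


# Hypothetical SNP: G>T at the second motif position.
ref_site = "GGCG"
alt_site = "GTCG"
delta = pwm_score(alt_site) - pwm_score(ref_site)
print(f"reference score:   {pwm_score(ref_site):.2f}")
print(f"alternative score: {pwm_score(alt_site):.2f}")
print(f"change in score:   {delta:.2f}")  # negative values suggest weaker binding
```

A negative score change only predicts weaker motif match; allele-specific binding still needs to be confirmed experimentally, for example with the reporter and binding assays used in the studies cited above.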
Figure 2. Schematic diagram of the mechanisms of action of non-coding SNPs. (a) The SNP rs3824662 allele A increases chromatin accessibility by inducing GATA3 expression, promoting the binding of GATA3 to the CRLF2 promoter, and ultimately forming a chromatin loop. (b) The NTN4 enhancer risk variant rs11836367 binds the TF GATA3 to regulate NTN4 expression, ultimately promoting breast carcinoma initiation and progression. (c) The enhancer SNP rs7959129 risk allele G interacts with the promoter SNP rs61926301 risk allele G to promote ATF1 expression by binding the TFs GATA3 and SP1. (d) The risk allele of rs11986220 and higher methylation at −10 kb act synergistically to confer a greater tumor risk; however, when −20 kb is hypomethylated, the function of the risk SNP is inhibited by the enhancer-blocking insulator loop mediated by CTCF. (e) The risk variant rs11655237 in LINC00673 creates a miR-1231-binding site that interferes with the expression of LINC00673 and contributes to pancreatic cancer susceptibility.
2.2. Genetic Variants That Alter Enhancers
Enhancers are cis-acting DNA regions that can increase the transcription of their target genes. Enhancers differ in their distance from their target promoter(s); in mammalian species, an enhancer can lie anywhere from 100 bp to megabases away from its target gene [30]. Unlike promoters, enhancers can be positioned either upstream or downstream of their target genes, or even within another gene's body, and enhancer regulation can bypass intervening genes irrespective of their orientation. Enhancers must bind specific protein factors to enhance the transcription of their targets. Enhancers generally have tissue or cell specificity, whereby they only show activity in certain cells or tissues, which is determined by the specific protein factors present in these cells or tissues [31]. Active enhancer elements are typically recognized by the epigenetic marks H3K4me1 and H3K27ac. Conversely, H3K27me3 is regarded as a repressive epigenetic mark associated with lower enhancer activity [32][33]. GWAS-identified risk loci for common illnesses are often found in non-coding regions, and many of these are thought to function as enhancers [34]. According to emerging data, these SNPs may influence gene regulation by changing the binding of important TFs to critical transcriptional enhancers [35].
2.2.1. Breast Cancer
Of all cancers, breast cancer has so far yielded the greatest number of discovered risk loci [36]. Understanding the driving mechanism(s) underlying malignant transformation offers the prospect of combating cancer recurrence and treatment resistance. Zhang et al. found that the SNP rs4971059 resides in the sixth intron of the TRIM46 gene, within an active enhancer element. Using CRISPR/Cas9-mediated homologous recombination, they converted rs4971059 from allele G to allele A, which resulted in TRIM46 overexpression, boosted breast carcinoma cell growth, enhanced chemotherapy resistance in vitro, and hastened tumor development in vivo [37]. In addition, Yang and colleagues reported the noncoding regulatory variant rs11836367 at the NTN4 locus (12q22) as a causal variant associated with the risk of breast carcinoma (Figure 2b). The rs11836367 protective T allele promotes GATA3 binding to the distal enhancer and increases NTN4 expression [38].
2.2.2. Prostate Cancer
Several studies have independently identified genes in specific prostate cancer (PCa) susceptibility loci that are either controlled by causative SNPs located within a cis-regulatory element (CRE) or have been indicated as SNP-associated genes [39]. The SNP rs339331 at 6q22 was found to be a prostate cancer risk-associated variant. The risk allele T of rs339331 has been found to augment the binding of HOXB13 to the enhancer, alter the level of the RFX6 protein in an allele-specific manner, and confer a predisposition to prostate cancer [40]. Recently, Huang et al. also found that the PCa-associated SNP rs11672691, located within an enhancer element, can change the binding site of HOXA2, which in turn promotes oncogenesis by affecting the expression of nearby genes [41].
Notably, there are other cases of SNPs causing DNA-binding polymorphisms for distinct transcription factors. For example, a gastric cancer risk-associated polymorphism (rs2978980 T>G) situated in an intronic enhancer of lncPSCA has been found to disrupt the binding of the transcription factor RORA, thereby resulting in lower lncPSCA expression in an allele-specific manner [42]. As another example, the rs2647046 enhancer has been found to interact with the HLA-DQB1-AS1 promoter to alter its expression via a CTCF-mediated long-range loop in an allele-specific manner, thereby conferring susceptibility to hepatocellular cancer (HCC) [43]. Another variant, in a distant intergenic region on chromosome 11q13.3, has been characterized as a susceptibility locus for renal cell cancer. The 11q13.3 locus harbors a long-range enhancer that physically interacts with the CCND1 promoter to control transcription [44]. Interestingly, SNP sites can act as promoters and enhancers simultaneously, and the switch between the two is determined by the background genotype. As a result, one gene can produce several different RNAs that are involved in the development of diseases. The SNP rs11672691 mediates promoter and enhancer switching under different genotypes. A risk-associated sequence in the PCAT19-long enhancer interacts with the PCAT19-long promoter to enhance prostate cancer development by activating cell cycle genes [26].
2.2.3. Colorectal Cancer
GWAS have identified numerous colorectal cancer risk loci, but only a fraction of the target genes of these loci have been systematically interrogated. For example, Yu et al. identified a common SNP (rs7198799) in an intron of the gene CDH1. They demonstrated that the risk allele C of rs7198799 lies in an enhancer that binds the TF NFATC2 and remotely enhances ZFP90 expression [45]. A prominent mechanism by which SNP variants can affect cell-specific enhancer function is altered TF binding, which in turn regulates the target gene's expression. Tian et al. identified two risk SNPs (rs61926301 and rs7959129) located in the ATF1 promoter and first intron, respectively. These are enriched in enhancer regions and open chromatin, which are marked by H3K4me1, H3K27ac, and ATAC-seq peaks. The two variants increase the expression of ATF1 by preferentially binding the TFs SP1 and GATA3 [46]. The SNP rs174575 can act as a specific remote enhancer of FADS2 and lncRNA-AP002754.2 with the participation of the transcription factor E2F1. Interestingly, E2F1 can promote the expression of FADS2, form a chromatin loop, and affect the occurrence of colorectal cancer [47].
2.3. Genetic Variants That Affect Promoter–Enhancer Interactions
Promoter–enhancer interactions (PEIs) underlie differential transcriptional regulation. Several technologies, including chromosome conformation capture (3C), Hi-C, and H3K27ac HiChIP, allow for the study of long-range cis-regulation [48][49][50]. PEIs are essential events in the current model of transcriptional control, and there is considerable evidence that they are required for the transcriptional control of an enhancer's target gene: the insertion or deletion of promoters, the absence of certain PEI-associated proteins, and the inclusion of PEI-disrupting insulators all affect the expression of target genes. Tian et al. found two risk variants (rs61926301 and rs7959129) located in the ATF1 promoter and intron, respectively; the former binds the TF SP1 while the latter binds the TF GATA3 (Figure 2c). They found that these two risk sites increase the interaction between the promoter and enhancer by binding SP1 and GATA3, facilitating ATF1 expression and conferring hereditary susceptibility to CRC [46]. Moreover, the SNP rs11672691 mediates promoter and enhancer switching in a manner dependent on the background genotype. The risk is determined by the PCAT19-long enhancer interacting with the PCAT19-long promoter, thereby promoting prostate cancer development by activating cell cycle genes [26].
2.4. Genetic Variants That Alter 3D Genome Architecture
Within the nucleus, genomic DNA folds into a three-dimensional structure organized at different levels through the formation of chromatin loops. These structures can bring distant enhancers near their target promoters to affect gene expression and regulation. The resulting sequence-dependent spatial interactions are key to maintaining normal cell status and function. In cancer genomes, structural variation typically results in changes to the genome's 3D structure and, as a result, alterations in transcriptional control [51]. Changes in the three-dimensional genome architecture or higher-order chromatin structure are linked to the development and progression of several diseases [52][53]. Long-distance chromatin looping regulates cancer susceptibility genes either actively or passively. Enhancers frequently form long-range chromatin loops with their target gene promoter regions to affect gene expression. The 9q22 locus, for example, contains the thyroid cancer risk-related SNP rs965513, which tags a 33-kb linkage disequilibrium block that is strongly linked with papillary thyroid cancer (PTC) risk. The chromatin characteristics and regulatory element signatures of this block indicate at least three regulatory elements that operate as enhancers. Using chromosome conformation capture technology, researchers have observed long-range looping contacts between these elements and the promoter region shared by FOXE1 and PTCSC2 in a human papillary thyroid cancer cell line (KTC-1) and in unaffected thyroid tissue [54]. Similarly, Zhang et al. discovered that the rs1859962 risk-associated LD block contains a PCa-specific enhancer that forms a 1-Mb chromatin loop with the SOX9 gene. This study found that the rs1859962 PCa risk LD block contacts SOX9 via a long-distance chromatin loop that connects it to the E1 enhancer [55].
CTCF is a transcription factor that promotes long-range chromosomal contact via looping. Hoffman et al. discovered that one allele of the Igf2/H19 imprinting control region (ICR) on chromosome 7 colocalized with one allele of Wsb1/Nf1 on chromosome 11. The lack of CTCF or the ablation of the maternal ICR was found to eliminate this connection and alter the expression of the Wsb1/Nf1 gene [56]. This finding confirmed the importance of CTCF in controlling chromatin conformation and the resulting gene expression. CTCF also has a distinctive role as an insulator-binding protein. Insulators are short nucleotide sequences that define the boundaries between neighboring genomic regions [57]. When CTCF binds to an insulator region, it inhibits gene transcription by interfering with the communication between an enhancer and a gene promoter [58]. Ahmed M. et al. identified risk cis-regulatory elements (rCREs) by performing CRISPRi screens (Figure 2d). They discovered that the 8q24.21 region is widely marked with H3K27ac and shows significant binding of AR, FOXA1, and HOXB13, all of which are important transcription regulators in PCa pathogenesis [59]. Using an integrated approach involving ChIP, Hi-C, CRISPR, and functional rescue, the researchers also discovered that the rCRE containing rs11986220 interacts with the MYC promoter in V16A cells but not in 22Rv1 cells, as the promoter–CRE interaction is modulated by a CTCF site in a region 10 kb upstream, which can prevent chromatin looping [59]. Similarly, the rs6702619 region is occupied by CTCF, which acts as an insulator with long-range physical interactions with CRC-relevant loci [60]. Understanding CTCF-mediated 3D genomic architecture will aid in understanding the mechanism of action underlying noncoding GWAS SNPs at either CTCF sites or regulatory enhancer sites [61].
2.5. Genetic Variants That Influence the Binding of miRNA
MicroRNAs (miRNAs) are noncoding RNA molecules that influence gene expression by regulating messenger RNA degradation and translation. The primary miRNA (pri-miRNA) sequence is transcribed by RNA polymerase II. Mature miRNAs are normally excised by the RNase III enzyme Dicer from 60–110-nucleotide-long hairpin precursor RNA structures (pre-miRNAs) and are then integrated into the RNA-induced silencing complex (RISC) [62]. Accumulating evidence suggests that miRNAs play a key role in carcinogenesis by binding to the 3′-UTR of target mRNAs [63]. miRNA mutations or misexpression have been associated with human malignancies and alterations in cancer-associated gene expression [64]. Hoffman et al. detected a variant (rs11614913) in hsa-miR-196a-2 using GWAS to screen genetic variants in 15 miRNAs. This SNP was found to be associated with decreased breast cancer risk [65]. Previous research has confirmed that the methylation of CpG islands [66] in miRNA regions may change miRNA function, thereby influencing carcinogenic pathways. Hoffman and colleagues found that a CpG island in the region upstream of the miRNA precursor is associated with breast cancer risk [65]. The ATF1 rs11169571 variant was shown to be strongly related to ATF1 expression by influencing the binding of hsa-miR-1283 and hsa-miR-520d-5p, which may increase susceptibility to colorectal cancer [16]. In addition, SNPs located in the 3′-UTR of MDM4, CD44, LAMC1, and other genes act through a similar mechanism [67][68][69].
Some SNPs within long non-coding RNAs can also change the lncRNA's binding affinity for miRNAs. The variant rs1317082, located in exon 1 of the lncRNA RP11-362K14.5 (CCSlnc362), creates a binding site for miR-4658, which consequently reduces CCSlnc362 expression and confers lowered susceptibility to CRC [70]. The link between rs140618127 in the lncRNA LOC146880 and non-small cell lung cancer involves a miR-539-5p binding site. The binding of miR-539-5p to LOC146880 has been found to result in reduced activation of the oncogene ENO1. Reduced ENO1 phosphorylation in turn results in lower PI3K and Akt activation, which is linked to decreased cell proliferation and tumor formation [71]. Moreover, the SNP rs11655237 allele G in an exon of LINC00673 can create a miRNA binding site that interferes with LINC00673 expression (Figure 2e). Furthermore, rs67311347 in RCC [72], rs12982687 in CRC [73], and rs16854802 in head and neck squamous cell carcinoma (HNSCC) [74] are SNPs in lncRNA sequences that affect target gene expression through miRNA binding. A SNP that occurs within a miRNA itself can likewise affect the binding affinity of the miRNA for its target genes.
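The gain of a miRNA binding site through a single-base change, as described above for several lncRNA and 3′-UTR variants, can be sketched by searching a transcript for the reverse complement of the miRNA seed (nucleotides 2–8) before and after the substitution. The miRNA sequence, the transcript fragment, and the helper names `seed_site` and `gains_site` below are hypothetical and do not correspond to miR-4658, miR-539-5p, miR-1231, or any real transcript; seed complementarity alone is a simplification of how target sites are predicted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seed_site(mirna: str) -> str:
    """Return the DNA-alphabet target site complementary to miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("U", "T")[1:8]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement of the seed


def gains_site(ref_seq: str, index: int, alt_base: str, mirna: str) -> bool:
    """True if substituting `alt_base` at 0-based `index` creates a seed match
    that is absent from the reference sequence."""
    site = seed_site(mirna)
    ref_seq = ref_seq.upper()
    alt_seq = ref_seq[:index] + alt_base.upper() + ref_seq[index + 1:]
    return site not in ref_seq and site in alt_seq


# Hypothetical miRNA (5'->3') and transcript fragment, for illustration only.
toy_mirna = "UCCAGUAGGCAUUCGAAUGGCU"
toy_transcript = "GGATCTACCGGTTA"

print(seed_site(toy_mirna))                          # site searched for in the transcript
print(gains_site(toy_transcript, 8, "T", toy_mirna)) # C>T at index 8 creates the site
print(gains_site(toy_transcript, 0, "T", toy_mirna)) # an unrelated change does not
```

The same comparison run in the opposite direction (site present in the reference but absent in the variant) would flag loss of a binding site, the other outcome discussed in this section.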
This entry is adapted from the peer-reviewed paper 10.3390/cancers14225636