Retroposition, in which the processed mRNA of a parental gene is reverse transcribed and the resulting cDNA is integrated back into the genome, produces additional copies of existing genes. Contrary to the initial assumption that such copies are nonfunctional, retroposition-derived copies can become functional, and because of their role in the molecular evolution of genomes, they have been named the “seeds of evolution”. It is plausible that retrogenes, as important elements in the evolution of species, also take part in the evolution of neoplastic tumors at the cell and species levels. Specific “resistance mechanisms” to neoplastic transformation have been noted in some species, and in some cases this phenomenon has been related to additional gene copies, including retrogenes. In addition, retrogene expression correlates with the occurrence of specific cancer subtypes, their stages, and their response to therapy. Phylogenetic insights into retrogenes show that most cancer-related retrocopies arose in the primate lineage, and the number of identified cancer-related retrogenes demonstrates that these duplicates are important players in human carcinogenesis.
New gene copies may be obtained by polyploidization, irregular crossing over, or DNA- or RNA-mediated duplication [1]. Until recently, only DNA-based duplication was considered to be functionally relevant. However, later studies have revealed that RNA-based duplication provides copies that may play a vital role in the cell [2][3][4][5].
The formation of a retrocopy begins with the transcription of a parental gene. The processed mRNA moves to the cytoplasm, where L1 retrotransposon-derived proteins bind to its poly-A tail. The process involves reverse transcriptase, endonuclease, and chaperones. The parental gene’s mRNA anneals to the broken DNA ends and undergoes reverse transcription, and the resulting cDNA is integrated back into the genome in the form of a retrocopy (Figure 1a). Retrocopies are devoid of introns and regulatory elements, and they carry a poly-A tail along with flanking repeats. These copies were long considered to be “dead-on-arrival” and classified as transcriptional noise due to their high similarity to the parental genes. Nevertheless, to promote transcription, retrocopies can take advantage of adjacent gene regulatory regions and use distant CpG sequences or sometimes even parts of their own sequences. Furthermore, retrocopy insertion into the intron of another gene often leads to the acquisition of the host’s regulatory machinery [3][6].
Retrocopies that have acquired transcriptional ability are called retrogenes. In the process of evolution, they may be subject to subfunctionalization and take over some of the parental genes’ functions (Figure 1b). A good example is the retrogene SLC5A3 and its parental gene SLC5A1. Proteins encoded by these genes contain solute binding domains, but they differ in activity. The retrocopy-derived protein is a sodium-dependent myo-inositol transporter, whereas the parental protein participates in the transport of glucose and galactose [5]. Another functional evolution path of retrocopies is neofunctionalization. As a result, they can encode proteins, novel or similar to those encoded by the parental gene, or they can obtain regulatory functions and be involved in transcriptional regulation of parental counterparts or other genes. They can also participate in transcriptional interference, be a source of different small RNAs, or act as miRNA sponges [5][7][8]. Retrogene-derived RNAs can also be involved in epigenetic regulation [9] or function as trans-NATs (natural antisense transcripts) [5][10]. Moreover, it has been demonstrated that retrogenes can functionally replace their parental genes [11]. Retrocopies may also contribute to other genes and/or transcripts; they can create chimeric transcripts, act as recombination hot spots [5], or provide a sequence for alternative exons [12].
It is widely understood that adaptive features, lineage-specific phenotypic traits, are associated with the formation of new genes [3]. Gene duplication is a primary mechanism of new gene formation by providing a substrate for natural selection. The additional copies of ancestral genes are subject to less evolutionary restriction to develop a novel feature [1]. They accumulate mutations faster than protein-coding genes and thus evolve faster [13]. Within the different types of duplication, there are differences in the susceptibility to evolutionary changes. Duplication at the DNA level results in daughter copies with full equipment (core promoters and gene organization). Therefore, these duplicates mostly mirror the protein function and expression pattern of their ancestor [6]. In contrast, analysis of retrogenes has pointed to their significant contribution to molecular evolution as a source of genomic novelties, and they are called “seeds of evolution” [14]. Due to the lack of regulatory elements, transcribed retrocopies must acquire regulatory regions. Thus, retrocopies are probably more predisposed to evolve a novel expression pattern and functional role than copies emerging from segmental duplication. Moreover, retrogenes play a role in gene structure evolution by mediating the decline of introns [3]. Nevertheless, retrocopies may gain introns or additional exons over time. Szcześniak et al. reported two retrogenes, RNF113B and DCAF12, where introns were created through mutations and the appearance of new splice sites [15]. On the other hand, Vinckenbosch et al. identified 27 intergenic retrogenes that acquired de novo exons [6].
Several studies support the hypothesis that splicing signal conservation constrains the rate of protein evolution [3]. It has been suggested that the evolution rate is lower at exon-intron boundaries and in intron-rich genes [16]. Splicing constraints therefore impose some limitations on parental gene evolution, but such constraints should not apply to single-exon retrocopies. Interestingly, it has been noted that within the retrocopy sequence, the rate of protein evolution is in fact highest at positions that correspond to splice junctions in the ancestral gene. Consequently, it can be speculated that relaxed splicing constraints allow a retrogene-derived protein to adapt more effectively than the parental gene’s protein [3].
Many studies emphasize the role of retrogenes in the differentiation and molecular evolution of genomes [17][5][12]. As a result, retrogenes are a source of species-specific features as well as interspecies variation. For example, a retrocopy of the cyclophilin A gene in the owl monkey genome is associated with resistance to HIV [17]. Another example is the rodent-specific retrogene Rps23r1, which reduces Alzheimer’s β-amyloid levels and may cause discrepancies between animal model studies and the results of clinical trials [18]. Furthermore, the fgf4 retrogene that is responsible for chondrodysplasia is found only in short-legged dog breeds [19]. Finally, the increased number of TP53 gene retrocopies results in a lower cancer transformation rate in the elephant population [20]. Retrocopy number variation has also been observed across human populations. This includes transcriptionally active retrogenes such as EIF4A1P10 and TCF3P, which are lost in some members of African populations [21][22].
Numerous analyses have shown that retroposition was particularly intensive during the evolution of primates. This intensity is associated with the occurrence of many retrocopies specific to this order of mammals [23][24]. As a result of this “burst of retroposition”, retropseudogenes form the largest group among human pseudogenes [25]. Many retrogenes have been linked to cancer, and many of them are human- or primate-specific, as shown later in this text. Therefore, the question arises whether the large number of human retrogenes is associated with a high risk of neoplastic transformation in our species.
Mutational events form the basis of species evolution as well as cancer development [26]. Tumors are highly dynamic and adaptive systems that evolve very quickly. Evolutionary processes play a role in the progression of cancer on two levels: the level of species evolution and the level of individual cancer development (Figure 2). Natural selection operates on specific features of the population associated with cancer promotion or suppression [27]. Mechanisms of resistance to tumor transformation have been described in several species [28]. Tumors are also subject to natural selection, and the “branched evolution” of species is reflected in the evolutionary trajectories of cancer cell populations [29].
Figure 2. Evolutionary cancer mechanisms in the context of species-specific features in cancer suppression (a), clonal subpopulations within a tumor (b), and cancer treatment (c).
“Peto’s paradox” is the observation that the incidence of cancer among animals does not increase with body size and life span. A good example comes from the already mentioned studies on African elephants (lat. Loxodonta africana) and Asian elephants (lat. Elephas maximus). Among elephants, the tumor transformation rate is lower than expected in comparison with other species. Unlike human cells, which have one copy of the TP53 gene, African elephant cells have 20 copies, 19 of which arose from retroposition [20]. The TP53 gene encodes the p53 protein, which is called the “genome guardian” [30]. TP53 is one of the key tumor suppressor genes, and TP53 mutations have been observed in most human cancers [31][32]. Disruption of p53 protein function leads to the appearance of cancer cell features [20]. The presence of extra copies results in an effective DNA damage response through the hyperactive TP53 pathway (Figure 2a) [33].
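Why this observation counts as a paradox can be illustrated with a naive calculation. The sketch below assumes that every cell independently carries the same small lifetime probability p of malignant transformation, so that the risk for an organism with N cells is 1 - (1 - p)^N. The probability and the cell counts are illustrative placeholders rather than measured values, and the function name is hypothetical.

```python
import math


def naive_lifetime_risk(n_cells: float, p_per_cell: float) -> float:
    """Probability that at least one of n_cells transforms.

    Assumes independent cells with an identical per-cell probability,
    which is a deliberate oversimplification.
    """
    # 1 - (1 - p)^N, computed stably for very small p and very large N.
    return -math.expm1(n_cells * math.log1p(-p_per_cell))


# Illustrative placeholders, not measured values.
P_PER_CELL = 1e-14
CELLS_SMALL_ANIMAL = 3e13   # order of magnitude of a human body
CELLS_LARGE_ANIMAL = 3e15   # hypothetical animal with 100 times more cells

risk_small = naive_lifetime_risk(CELLS_SMALL_ANIMAL, P_PER_CELL)
risk_large = naive_lifetime_risk(CELLS_LARGE_ANIMAL, P_PER_CELL)
print(f"small animal: {risk_small:.3f}, large animal: {risk_large:.3f}")
```

Under these assumptions, the predicted risk rises steeply with cell number. The fact that large, long-lived animals do not show such an increase suggests that they have evolved additional suppression mechanisms, such as the extra TP53 copies described above.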
A similar phenomenon has been observed in the population of the long-lived (approximately 200 years) bowhead whale (lat. Balaena mysticetus), although the resistance mechanism is not entirely clear. This has been linked to the positive selection force acting on cancer and aging genes involved in DNA damage repair and thermoregulation, ERCC1 and UCP1, respectively. Duplication of the PCNA gene, one of the essential repair mechanism genes, has been reported. This can reduce the frequency of mutations and thus prevent tumorigenesis [34][35].
A species-specific cancer defense mechanism was also uncovered in the long-lived rodent lineage, including the naked mole-rat (lat. Heterocephalus glaber) [36]. The naked mole-rat resistance mechanism is based on the limitation of cell proliferation through the expression of high molecular mass hyaluronan (HMM-HA). The longer variant of HMM-HA inhibits divisions, inflammation, and metastatic processes [37]. Notably, 17 additional copies of PTEN, an important tumor suppressor gene, have also been reported in the naked mole-rat genome. This may additionally contribute to such strong resistance to cancer [38]. Another example of a rodent that has evolved a way to suppress cancer is the blind mole-rat (lat. Spalax ehrenbergi). In this case, a subterranean lifestyle is connected to unusual tolerance to hypoxia. This has been associated with alterations in the TP53 gene sequence. Similar changes have been identified in hypoxia-tolerant human tumors [36][39]. Interestingly, despite many studies, no cases of malignant neoplasm have been found in this species [33].
Bats are also reported to be relatively long-lived. It has been suggested that flight, as an energy-intensive activity, has driven the evolution of mechanisms that inhibit oxidative stress. Furthermore, DNA damage checkpoint genes are under positive selection in bats. These effects may be related to cancer resistance in bat populations [28].
The impact of evolutionary forces is also visible in cancer cells. A tumor is thought to consist initially of copies of a single cell. During tumor development, neoplastic changes (e.g., mutations) create heterogeneous masses, which are the starting point for the operation of evolutionary pressure. However, mutations are not the only forces shaping cancer evolution. Successive changes lead to the formation of different genetic subclones (Figure 2b) [27][29]. At this point, selection also begins to play a role, and one group of tumor cells may be evolutionarily favored. This situation may occur when some cells develop traits that give them an advantage in a particular tumor environment. As a result, these cells will have more “offspring” than others. Furthermore, the high degree of tumor diversity and genomic instability results in a high risk of an adaptive mutation, which in turn is related to a faster progression of the disease [40].
Additional confirmation of the evolutionary forces acting within cancer comes from genetic changes in response to a particular drug (Figure 2c). The diagnosed tumor may consist only of cells that are sensitive to treatment, in which case the patient has a good prognosis. However, in a fraction of patients, the response to certain therapies is observed only in the initial phase, and tumor progression then accelerates unexpectedly. This may indicate the presence of therapy-resistant subpopulations in the pretreated tumor. It has also been noted that therapy-derived selective pressure can drive the growth of therapy-resistant populations and induce the onset of “acquired resistance”. Therefore, adaptive therapy has been proposed, whereby maintaining a population of drug-sensitive cells limits the growth of populations resistant to treatment. This evolution-based approach relies on combining different drugs or doses to slow tumor proliferation. The suggestion is to use repeated optimal doses that reduce tumor volume rather than destroy the tumor. This less aggressive approach may allow better control of the tumor and prevent the development or spread of a more aggressive, treatment-resistant form. The switch from the traditional approach, which is based on maximal cell death, to one that aims for maximum progression-free survival could improve cancer treatment outcomes [29][41][42][43].
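The rationale for adaptive therapy can be illustrated with a toy simulation. The sketch below assumes two cell populations, drug-sensitive and drug-resistant, that compete for the same space under logistic growth, a drug that kills only sensitive cells, and an adaptive rule that pauses treatment once the tumor burden has halved and resumes it when the burden returns to its starting level. The model, the function name, and all parameter values are illustrative assumptions and are not taken from the cited studies.

```python
def resistant_burden(adaptive: bool, days: float = 200.0, dt: float = 0.1) -> float:
    """Return the resistant-cell burden (fraction of carrying capacity) after `days`.

    Toy two-population logistic model integrated with the Euler method.
    All parameter values are illustrative assumptions.
    """
    growth = 0.05                       # per-day growth rate of both cell types
    kill = 0.15                         # per-day drug-induced death of sensitive cells
    sensitive, resistant = 0.5, 0.001   # initial burdens (carrying capacity = 1)
    drug_on = True

    for _ in range(round(days / dt)):
        burden = sensitive + resistant
        if adaptive:
            # Pause once the burden has halved; resume at the starting burden.
            if drug_on and burden <= 0.25:
                drug_on = False
            elif not drug_on and burden >= 0.5:
                drug_on = True
        crowding = 1.0 - burden         # shared space limits both populations
        d_sensitive = growth * sensitive * crowding
        if drug_on:
            d_sensitive -= kill * sensitive
        d_resistant = growth * resistant * crowding
        sensitive = max(sensitive + d_sensitive * dt, 0.0)
        resistant = max(resistant + d_resistant * dt, 0.0)
    return resistant


continuous = resistant_burden(adaptive=False)
adaptive = resistant_burden(adaptive=True)
print(f"resistant burden, continuous dosing: {continuous:.2f}")
print(f"resistant burden, adaptive dosing:   {adaptive:.2f}")
```

In this toy setting, the resistant clone expands more slowly when sensitive cells are retained as competitors, although the size of the effect depends entirely on the chosen parameters.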
Just as the evolutionary history of a given species has led to differences in susceptibility to cancer, so does the history of tumor development influence the response to applied oncological treatment. Thus, determining the course of the evolutionary history of the tumor is important for establishing the best oncological treatment for a particular type of tumor and the stage of its development. The evolution of species-specific resistance mechanisms to cancer and the occurrence of specific tumor types and patient responses to treatment have been linked in some cases with retroposed genes as described below.
Identification of new biomarkers that help predict the course of events in cancer evolution would certainly lead to more effective diagnostics and treatment. Retrocopies seem to be promising candidates. The literature has shown a relationship between the expression level of some retrogenes and the occurrence of specific cancers [44][45]. Retrocopies involved in the response to a particular treatment, such as radiation [46] or paclitaxel [47], have also been reported. Moreover, many studies describe retropseudogenes associated with the occurrence of a particular stage or form of the tumor [48][49][50]. Retrocopies can play multifaceted roles within tumor cells, and the literature reports retrogenes that act as oncogenes as well as retrogenes that act as tumor suppressors. A list of the cancer-related retrocopies that have been described so far in the literature is presented in Table 1.
Table 1. Characteristics of the cancer-related retrogenes that have been described in the literature (based on [45][50]).
Retrocopy | Ensembl ID | RetrogeneDB ID [51] | Chromosome | Parental Gene | Cancer Type |
---|---|---|---|---|---|
KRASP1 | ENSG00000220635 | retro_hsap_3474 | 6 | KRAS | prostate cancer [52] |
UTP14C | ENSG00000253797 | retro_hsap_29 | 13 | UTP14A | ovarian cancer [53] |
MSL3P1 | ENSG00000224287 | retro_hsap_2401 | 2 | MSL3 | renal cell carcinoma [54] |
ANXA2P2 | ENSG00000231991 | retro_hsap_4150 | 9 | ANXA2 | hepatocellular carcinoma [55] |
CSDAP1 (YBX3P1) | ENSG00000261614 | retro_hsap_1674 | 16 | YBX3 | lung adenocarcinoma [56] |
LGMNP1 | ENSG00000214269 | retro_hsap_1272 | 13 | LGMN | glioblastoma [57] |
UBE2CP3 | ENSG00000250384 | retro_hsap_2935 | 4 | UBE2C | hepatocellular carcinoma [58] |
RACGAP1P | ENSG00000257331 | - | 12 | RACGAP1 | hepatocellular carcinoma [59] |
PTTG3P | ENSG00000213005 | - | 8 | PTTG1 | breast cancer [60] |
CKS1BP7 | ENSG00000254331 | - | 8 | CKS1B | breast cancer [61] |
PTENP1 | ENSG00000237984 | retro_hsap_4245 | 9 | PTEN | hepatocellular carcinoma [62], gastric cancer [63], renal cell carcinoma [64] |
INTS6P1 | ENSG00000250492 | retro_hsap_3307 | 5 | INTS6 | hepatocellular carcinoma [65] |
TUSC2P1 | ENSG00000285470 | - | Y | TUSC2 | esophageal squamous cell carcinoma [66] |
NKAPL | ENSG00000189134 | retro_hsap_15 | 6 | NKAP | kidney renal papillary cell carcinoma, pancreatic adenocarcinoma, adenoid cystic carcinoma [67] |
CTNNA1P1 | ENSG00000249026 | - | 5 | CTNNA1 | colorectal cancer [68] |
RHOB | ENSG00000143878 | retro_hsap_108 | 2 | RHOA | renal cell carcinoma [69], lung cancer [70], colorectal cancer [71] |
HMGA1P6 | ENSG00000233440 | retro_hsap_1175 | 13 | HMGA1 | endometrial carcinoma [72], ovarian carcinosarcoma, thyroid carcinoma [44] |
HMGA1P7 | ENSG00000216753 | - | 6 | HMGA1 | endometrial carcinoma [72], ovarian carcinosarcoma, thyroid carcinoma [44], breast cancer [73] |
SUMO1P3 | ENSG00000235082 | retro_hsap_240 | 1 | SUMO1 | hepatocellular carcinoma [74], gastric cancer [75], colorectal cancer [76] |
NANOGP8 | ENSG00000255192 | retro_hsap_1549 | 15 | NANOG | gastric cancer [77], prostate cancer [78] |
POU5F1P4 (OCT4-pg4) | ENSG00000237872 | - | 1 | POU5F1 | hepatocellular carcinoma [8] |
POU5F1P5 (OCT4-pg5) | ENSG00000236375 | - | 10 | POU5F1 | endometrial carcinoma [79] |
SLC6A6P1 | ENSG00000226818 | retro_hsap_2498 | 21 | SLC6A6 | ovarian cancer [80] |
PDIA3P1 | ENSG00000180867 | retro_hsap_217 | 1 | PDIA3 | multiple myeloma [49] |
PPIAP43 | ENSG00000255059 | retro_hsap_816 | 11 | PPIA | small cell lung cancer [46] |
FTH1P3 | ENSG00000213453 | retro_hsap_2240 | 2 | FTH1 | breast cancer [47] |
E2F3P1 | ENSG00000267046 | retro_hsap_1749 | 17 | E2F3 | hepatocellular carcinoma [81] |
Retrogenes with elevated expression in cancer constitute a large group, and in many cases, increased expression of these retrocopies promotes cancer development. The expression levels of the retrocopy KRASP1 are correlated with the prostate cancer phenotype. Its parental gene, KRAS, is one of the best-known oncogenes. Cancer cell line studies have shown that KRASP1 overexpression causes increased parental gene expression and cell proliferation [52]. It has also been hypothesized that predisposition to ovarian cancer is associated with the expression of the small subunit processome component UTP14C, a protein-coding retrocopy of the UTP14A gene. This was explained by UTP14C-mediated downregulation of TP53 levels, which prevents cell cycle arrest and apoptosis [53]. Upregulated expression of MSL3P1, male-specific lethal-3 homolog pseudogene 1, has been correlated with renal cell carcinoma [54]. Other examples of retrocopies overexpressed in cancer tissues include ANXA2P2 [55], CSDAP1 [56], LGMNP1 [57], UBE2CP3 [58], RACGAP1P [59], PTTG3P [60], and CKS1BP7 [61].
Analyses of RNA-seq data performed in our group revealed that more retrogenes may be associated with cancer. Differential expression analysis allowed the identification of 3 potential markers with increased expression levels characteristic of breast cancer, RPL5P4, ASS1P2, and AC007731.2, and 8 retrocopies with elevated expression in lung adenocarcinoma, PTBP1P, AL121949.1, HNRNPA3P9, retro_hsap_4319, AC090695.2, CDK8P2, MSL3P1, and POLR3GP1 [82]. Retro_hsap_4319 is a novel retrogene, i.e., not annotated in the reference genome, placed in the RetrogeneDB database [51].
Retrogenes associated with tumor suppression have also been reported. Downregulation of PTENP1 has been associated with gastric cancer and renal cell carcinoma. PTENP1 functions as a miRNA sponge. A decreased level of PTENP1 contributes to increased degradation of transcripts of its oncosuppressive parental gene, PTEN, which exerts a growth-inhibitory role within the tumor [45][52][62][63][64]. Another example is the INTS6P1 retrocopy, whose lower serum levels correspond to hepatocellular carcinoma. Interestingly, the diagnostic power of this retrocopy is comparable to that of α-fetoprotein, the most common biomarker for hepatocellular carcinoma [45][65]. In turn, the expression of the TUSC2P1 retrocopy suppresses the proliferation and migration of cancer cells and promotes apoptosis. This duplicate shares miRNA binding sites with its progenitor gene, TUSC2, and thereby regulates its expression. The interaction with common miRNAs promotes parental gene expression and results in inhibition of proliferation, migration restriction, and apoptosis induction [66]. An additional example of a tumor suppressor is NKAPL. Downregulation of this retrocopy is connected with lower overall survival in several cancers, including kidney renal papillary cell carcinoma, pancreatic adenocarcinoma, and adenoid cystic carcinoma [67]. A decreased expression level of CTNNA1P1 has been associated with the pathogenesis of colorectal cancer. Suppressive action of the cognate gene CTNNA1 has also been shown in several tumors [68]. RHOB is a retrogene with suppressor activity that is well described in the cancer literature. Decreased expression of RHOB has been reported in many cancer studies [69][70][71][83].
The previously mentioned analysis of breast cancer samples led to the identification of 17 additional retrocopies with decreased expression levels (AC104212.2, RHOQP2, NKAPL, RPL21P16, RBMS1P1, retro_hsap_2623, DIO3, FAM122A, RPSAP70, PTENP1, AC138392.1, DHFR2, CTB-50E14.5, AK4P1, RAB43P1, PSMA2P1, and RBMXL1). In the lung cancer cohort, 13 retrogenes showed decreased expression in cancer (RPL13AP17, HNRNPA1P33, SIRPAP1, AL136982.4, AL136452.1, AC084880.1, HMGN2P15, CDC20P1, AC022217.1, DIO3, HMGB3P10, BET1P1, and TMED10P2) [82].
Differences in the expression level of retrogenes have also been observed depending on the subtype of cancer. Retrocopies of the HMGA1 gene have been related to the occurrence of anaplastic thyroid carcinoma. HMGA1P6 and HMGA1P7 have oncogenic activity and contribute to cancer progression. In well-differentiated and weakly aggressive papillary thyroid carcinoma, HMGA1P6 and HMGA1P7 were not detected. In turn, anaplastic thyroid carcinoma, one of the most malignant cancers in humans, expresses high levels of these retrogenes [44]. Interestingly, a similar relationship has been noted among patients with endometrial cancer, in whom increased expression levels of HMGA1P6 and HMGA1P7 correlate with the malignant phenotype [73]. Another good example is the SUMO1P3 retrocopy, which is upregulated in gastric cancer patients and has potential as a marker to differentiate between cancer and benign gastric disease [45][75].
In our laboratory’s breast cancer analysis, two retrocopies with differential expression characteristic of the ER+ (estrogen receptor-positive) subtype, AC098591.2 and PABPC4L, and 7 retrocopies downregulated in the TNBC (triple-negative breast cancer) subtype, RAB6C, RPS16P5, RHOB, MEIS3P2, PGAM1P5, HMGN2P15, and KRT8P13, were identified [82].
Increased expression of the NANOGP8 and POU5F1P4/P5 retrogenes has been correlated with the phenotype of cancer stem cells (CSCs). The occurrence of this subpopulation, which has high metastatic capacity, heralds intensive tumor expansion. In addition, altered expression of retrocopies of the pluripotency-associated genes NANOG and POU5F1 may also be a sign of early disease relapse [8][48]. It is worth noting that knockdown of NANOG and NANOGP8 reduces malignant transformation in prostate cancer cells [78]. Another example of a cancer stage-specific retrocopy is SLC6A6P1, also known as SLC6A610P, which is associated with recurrence in high-grade serous ovarian cancer. This subtype of ovarian cancer is very common (over 70% of affected women) [84]. Moreover, due to the lack of reliable diagnostics, it is usually detected at an advanced stage [80].
The expression of the PDIA3P1 retrocopy was significantly increased in hepatocellular carcinoma. Interestingly, it has been demonstrated that the PDIA3P1 expression level is related to metastasis and TNM stage and that knockdown of the retrocopy reduces the migration and invasion of cancer cells [49]. Another metastasis-related retrocopy is the previously described CTNNA1P1, whose expression has been significantly correlated with node metastasis in colorectal cancer patients [68]. Cooke et al. sequenced samples from different stages of lung and colon cancer. Their analysis revealed retrocopies unique to a given stage. Nevertheless, they also found several processed pseudogenes that are expressed in both the primary tumor and metastasis [26].
Examples of retrogenes that are associated with the response to a particular treatment can also be found in the literature. The expression of the retrocopy PPIAP43, for instance, has been correlated with radiosensitivity in a patient with small-cell lung cancer [46]. This discovery is quite important since radiation constitutes the main strategy in the case of this cancer. The sensitivity to radiation differs among oncological patients, but to date, there is no suitable biomarker. Another example is the FTH1P3 retrocopy, which promotes ABCB1 expression by sponging miR-206. As a result, resistance to paclitaxel is activated in breast cancer patients [47].
The relationship between the sequence variant of a given retrocopy and the individual’s prognosis has also been described. The occurrence of the E2F3P1 GA/AA allele at the rs9909601 locus has been associated with higher overall survival among hepatocellular carcinoma patients [45][81].
A leading role of retrogenes described in the literature is miRNA sponging. This posttranscriptional mechanism regulates parental or other genes when they share miRNA binding sites with the retrogene [85]. Recently, a genome-wide analysis demonstrated that as many as 181 retrocopies putatively regulate 250 transcripts of 187 genes [5].
Under normal conditions, there is a balance in the expression level of retrocopies. Sufficient expression of retrocopies regulates suppressor genes by competing for shared miRNAs. This prevents suppressor gene transcript degradation and enables translation (Figure 3a). Conversely, a low expression level of retrogenes contributes to increased miRNA binding to suppressor gene transcripts, which directs them toward degradation and promotes cancer transformation (Figure 3c). A good example is the decreased expression of the PTENP1 retrocopy in cancer [45].
Figure 3. Cancer-related retrocopies as miRNA sponges in normal and cancer cells. Binding of miRNAs to the highly expressed retrogene and translation of the suppressor gene (a). Degradation of the oncogene mRNA by miRNA binding due to low retrogene expression (b). Degradation of the suppressor gene mRNA by miRNA binding due to low retrogene expression (c). Binding of miRNAs to the highly expressed retrogene and translation of the oncogene (d).
The opposite is true in the case of oncogenes. In a normal cell, low expression of a retrogene that shares binding sites with an oncogene results in a lack of competition for common miRNAs. As a result, miRNAs bind to oncogene mRNAs and direct them toward degradation (Figure 3b). Under cancer conditions, elevated expression of a given retrocopy leads to miRNA sponging and prevents oncogene mRNA degradation (Figure 3d), which leads to cancer development. This type of relationship occurs between the abovementioned HMGA1 gene and its retrocopies [44].
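The titration logic behind Figure 3 can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that a fixed miRNA pool is distributed over target and retrocopy transcripts in proportion to their abundance and that each bound miRNA silences one target transcript. Real miRNA kinetics are more complex, and the function name and the numbers are hypothetical.

```python
def unrepressed_target(target: float, sponge: float, mirna: float) -> float:
    """Number of target transcripts that escape miRNA binding.

    Toy stoichiometric model: miRNAs are shared between target and sponge
    (retrocopy) transcripts in proportion to transcript abundance.
    """
    total_sites = target + sponge
    if total_sites == 0:
        return 0.0
    mirna_on_target = mirna * target / total_sites
    return max(target - mirna_on_target, 0.0)


# Hypothetical transcript and miRNA counts per cell.
TARGET, MIRNA = 100.0, 80.0
low_sponge = unrepressed_target(TARGET, sponge=0.0, mirna=MIRNA)
high_sponge = unrepressed_target(TARGET, sponge=300.0, mirna=MIRNA)
print(f"target transcripts left, low retrocopy expression:  {low_sponge}")
print(f"target transcripts left, high retrocopy expression: {high_sponge}")
```

The same arithmetic applies whether the target is a suppressor gene or an oncogene; only the biological consequence of protecting the target differs, as panels (a) to (d) of Figure 3 show.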
We used the GenTree database (http://gentree.ioz.ac.cn/) [86] to determine the time of cancer-related retrogene origination. Figure 4 shows the estimated point of origin of the retrogenes described earlier (no data for TUSC2P1, retro_hsap_2623, or retro_hsap_4319). Some of them are characterized by heterogeneous origins (“patchy tree”), but further research is needed to establish whether this results from independent retroposition events or from loss of the retrocopy in some species.
Figure 4. Schematic tree illustrating the estimated time of the origin of cancer-related retrocopies during animal evolution.
The oldest retrocopies recognized as cancer-related are DIO3 and RHOB. Both arose during the early evolution of vertebrates and represent protein-coding retrogenes. They are widely distributed in animal genomes and well described in the literature. Conservation of ORFs in these old retrogenes may indicate that they acquired transcriptional capabilities very quickly and that the propensity to accumulate mutations, typical for retrocopies, was “locked” due to the functional importance of the gene products. RHOB is an important oncosuppressor, and a decrease in its expression promotes cancer, as described earlier. Increased expression of the DIO3 retrocopy has been related to tumor progression in papillary thyroid cancer and colon cancer [87]. In turn, a decrease in the level of DIO3 expression has been described in lung and breast cancer [82]. Old retrocopies that are well recognized in human cancers and present in the genomes of all bony vertebrates are strong candidates for studies of the origins of neoplastic processes. The study of these genes may also be valuable for uncovering common features of tumors among species that are distant in the evolutionary tree. Nevertheless, the lack of data for species other than humans and mice seems to be the greatest difficulty in performing such research.
The majority of human cancer-related retrocopies are specific to primates. A large part of this group arose before the split of New World and Old World monkeys. There is also a group of retrocopies that are present in the human genome only. One of these is NANOGP8, a retrocopy of the NANOG gene, which has 11 pseudogenes. Ten of them are derived from retroposition, and NANOGP8 is evolutionarily the youngest. Interestingly, all NANOG copies except NANOGP8 can be found in the chimpanzee genome [48]. Other cancer-related retrocopies unique to humans include AK4P1, RAB43P1, RPL21P16, and AC138392.1. Changes in the expression level in cancer have also been detected for their parental genes: AK4 in lung cancer [88], RAB43 in gliomas [89], and RPL21 in breast cancer [90].
The role of newly arisen genes is intriguing from an evolutionary point of view. It has been suggested that the presence of genes characteristic of a given lineage is related to phenotypic adaptation [91]. Furthermore, it was noted that the expression of some evolutionarily young genes occurs specifically or preferentially in tumors. These genes were termed tumor-specifically expressed, evolutionarily novel (TSEEN) [25][92]. Moreover, it has been reported that new genes are also overrepresented among genes expressed in the testis and brain. It has been hypothesized that new genes can be recruited to processes under strong selection pressure (e.g., spermatogenesis, immune response) or processes involving novel organ development (placenta, expanded brain) [86]. We searched the GTEx database to assess the expression levels of cancer-related retrocopies in normal human tissues [93]. The results of the GTEx analysis are in agreement with the abovementioned statements. In the analyzed cohort, there were 25 retrocopies with low or no expression in normal tissues (max median TPM < 1). These genes are good candidates for potential TSEEN genes. Additionally, eight neoplastic retrocopies are active mainly in testis and brain tissues: CSDAP1, AC022217.1, SLC6A6P1, KRT8P13, MSL3P1, HMGA1P7, RACGAP1P, and NKAPL.
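The filtering step applied to the GTEx data can be sketched in a few lines. The following is a minimal illustration that assumes a table of median TPM values per tissue is already available. The retrocopy names and TPM values are invented placeholders, not GTEx measurements, and the function names are hypothetical rather than part of any GTEx tooling.

```python
from typing import Dict, Iterable, List

TpmTable = Dict[str, Dict[str, float]]

# Median TPM per tissue for each retrocopy (invented placeholder values).
MEDIAN_TPM: TpmTable = {
    "retrocopy_A": {"testis": 0.4, "brain": 0.1, "liver": 0.0},
    "retrocopy_B": {"testis": 12.0, "brain": 0.8, "liver": 0.3},
    "retrocopy_C": {"testis": 0.2, "brain": 6.5, "liver": 0.9},
    "retrocopy_D": {"testis": 3.0, "brain": 2.0, "liver": 9.0},
}


def low_expression(table: TpmTable, threshold: float = 1.0) -> List[str]:
    """Retrocopies whose maximum median TPM across tissues is below the threshold."""
    return sorted(
        gene for gene, tissues in table.items()
        if tissues and max(tissues.values()) < threshold
    )


def dominant_in(table: TpmTable, tissues_of_interest: Iterable[str],
                threshold: float = 1.0) -> Dict[str, str]:
    """Expressed retrocopies whose highest median TPM falls in a tissue of interest."""
    wanted = set(tissues_of_interest)
    result: Dict[str, str] = {}
    for gene, tissues in table.items():
        if not tissues:
            continue
        top = max(tissues, key=tissues.get)
        if tissues[top] >= threshold and top in wanted:
            result[gene] = top
    return result


print(low_expression(MEDIAN_TPM))
print(dominant_in(MEDIAN_TPM, ["testis", "brain"]))
```

The first function mirrors the "max median TPM < 1" criterion, and the second flags retrocopies whose expression peaks in testis or brain; both operate on whatever tissue columns the input table provides.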
As stated above, tumorigenesis is common across the evolutionary tree. The study of human-specific retrocopies and of differences in the retrocopy repertoire across vertebrate species may be essential for a better understanding of the high rate of neoplastic processes in our species. Moreover, retrogenes included in the TSEEN group, due to their species and tissue specificity, have great potential as new tumor biomarkers.