Retrotransposition of protein-coding genes is an RNA-based gene duplication mechanism that creates single-exon copies, which are initially nonfunctional. Nevertheless, over time, many of these duplicates acquire transcriptional capabilities. In humans, in most cases, these so-called retrogenes do not code for proteins but function as regulatory long noncoding RNAs (lncRNAs). The mechanisms by which they can regulate other genes include microRNA sponging, modulation of alternative splicing, epigenetic regulation and competition for stabilizing factors, among others.
Retrosequences, previously described as meaningless and biologically unimportant elements, are now recognized as evolutionarily significant, and their roles in shaping genomes, transcriptomes and proteomes have become increasingly evident [1][2][3]. This type of RNA-based gene duplicate is created through retroposition, which, together with DNA-based duplication, is known to be one of the major sources of new genes [2][4][5]. Formation of a retrocopy starts with transcription of the multiexonic parental gene (Figure 1). The mature mRNA is transported to the cytoplasm, where, in mammals, proteins encoded by LINE1 (long interspersed nuclear element 1), i.e., reverse transcriptase and endonuclease, accompanied by chaperones, bind to the polyA tail. This complex is transported back to the nucleus, where it anneals to the broken DNA ends and undergoes reverse transcription. The resulting cDNA is incorporated into its new genomic surroundings. The final step creates short flanking repeats at the insertion site, so-called target site duplications (TSDs). The 3′ polyA tail and the flanking repeats together constitute the signature of LINE-mediated retrotransposition [6][7]. These copies are regarded as “dead on arrival” pseudo(retro)genes, which usually lack introns, core promoters and other regulatory elements. Retrocopies are highly represented in placental mammals, especially primates [8]. In other genomes, that of Drosophila for example, the number of retroposed genes is relatively low [9][10]. In early studies of the evolution of duplicated genes, it was postulated that one of the duplicates usually accumulates mutations and becomes nonfunctional [11][12]. However, it turned out that the “relaxed” selection and evolutionary freedom that are characteristic of the majority of duplicates may lead not only to pseudogenization but also to the acquisition of new functions [13][14].
Over time, two new phenomena related to functional evolution after duplication have been described: (i) neofunctionalization, where one copy acquires a new function and the other keeps the original one [15], and (ii) subfunctionalization, where the maintained function is shared between the duplicated genes [16][17]. Additionally, as our studies and those of others have shown, it is also possible that the retrogene (functional retrocopy) replaces its progenitor [18][19]. In the case of retrocopies, the first step must be the acquisition of regulatory elements, and there is growing evidence that many retrocopies have gained the capability to be expressed over time [4][20][21].
Figure 1. Retrotransposition of protein-coding genes. The parental gene is transcribed and its mRNA is transported to the cytoplasm, where LINE1-derived proteins bind to it. This complex is transported back to the nucleus and anneals to the broken DNA ends. Next, reverse transcription takes place and the cDNA is inserted into the genome along with short flanking repeats. Transcription of the newly created retrocopy can result in coding or noncoding RNA. Transcripts of retroposition-derived genes may be involved in the pathogenesis of many human diseases.
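The signature of LINE-mediated retrotransposition described above (a 3′ polyA tail plus short flanking target site duplications) can be illustrated with a minimal Python sketch. This is a simplified illustration under stated assumptions, not a method taken from the cited studies: the function names, the example sequences and the thresholds (`min_polya`, `min_tsd`, `max_len`) are all hypothetical, and only exact repeat matches are considered, whereas real duplications may be imperfect and would typically be detected by alignment.

```python
import re


def polya_tail_length(insert: str) -> int:
    """Return the length of the run of A's at the 3' end of the inserted sequence."""
    match = re.search(r"A+$", insert.upper())
    return len(match.group(0)) if match else 0


def longest_tsd(upstream: str, downstream: str, max_len: int = 20) -> str:
    """Return the longest exact repeat shared by the end of the upstream flank
    and the start of the downstream flank (a candidate target site duplication).

    max_len is an assumed upper bound on the repeat length, not a value from the text.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    limit = min(max_len, len(upstream), len(downstream))
    # Try the longest candidate first so the first hit is the longest repeat.
    for k in range(limit, 0, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""


def has_line_signature(upstream: str, insert: str, downstream: str,
                       min_polya: int = 8, min_tsd: int = 5) -> bool:
    """Check both hallmarks: a 3' polyA tail and flanking direct repeats.

    min_polya and min_tsd are illustrative thresholds chosen for this sketch.
    """
    return (polya_tail_length(insert) >= min_polya
            and len(longest_tsd(upstream, downstream)) >= min_tsd)


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    upstream_flank = "GGCTTAGACCTGAAT"
    retrocopy = "ATGGCCAAAAAAAAAA"
    downstream_flank = "CTGAATGGTCA"
    print(polya_tail_length(retrocopy))
    print(longest_tsd(upstream_flank, downstream_flank))
    print(has_line_signature(upstream_flank, retrocopy, downstream_flank))
```

The flanks are compared directly because the duplicated target site lies immediately 5′ of the insertion and immediately 3′ of its polyA tail, so the end of the upstream flank should repeat at the start of the downstream flank.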
Although retrocopies were long described as “junk DNA”, numerous examples demonstrate that they may function as regulatory sequences as well as crucial protein-coding genes [22][23][24]. A spectacular example of retrocopy function involves TP53, a well-known tumor suppressor gene, and its retrocopies in elephants. Elephants have a lower-than-expected rate of cancer. It has been proposed that multiple functional retrocopies of TP53 are involved in an increased apoptotic response by compensating for the function of their progenitor [25][26]. This compensation mechanism, in turn, might underlie the cancer resistance observed in these animals. Nevertheless, in humans, protein-coding capacity is relatively rare among retrogenes. For example, in RetrogeneDB2, only 106 of 4611 retrocopies were identified as known protein-coding genes, and only 847 (18%) have an intact ORF (open reading frame) inherited from the parental gene. Interestingly, the opposite holds in Drosophila, where as many as 81 of the 83 retrocopies identified in RetrogeneDB are annotated as known protein-coding genes [27]. It was found that 256 retrocopies overlap with annotated lncRNAs in the human genome and that an additional 230 may act as competing endogenous RNAs, since they share microRNA (miRNA) targets and have correlated expression with transcripts of 232 protein-coding genes [3]. Accumulating evidence suggests that a substantial number of transcriptionally active retrocopies in humans act as long noncoding RNAs (lncRNAs) [14][28]. Owing to their high sequence similarity to their parental genes, they have a natural ability to regulate them via various mechanisms. Additionally, since almost 40% of retrocopies are located in introns of other genes, they possess great potential to control their host genes as antisense transcripts.
There are a number of ways in which retrocopies may regulate their progenitors or hosts. Retrocopies can be transcribed from the antisense strand and act as natural antisense transcripts (NATs) [29]. These NATs could be involved in multiple molecular processes, including epigenetic regulation (Figure 2A), chromatin remodeling [30], or, by forming RNA:RNA duplexes, stability control, RNA editing and processing (Figure 2B) [31]. Many retrocopies work as competing endogenous RNAs (ceRNAs), also known as microRNA sponges (Figure 2C) [15][32], while others can be a source of small RNAs [33]. Retrocopies can also compete with parental genes for other molecules, such as stabilizing factors (Figure 2D) [34] or the translational machinery [35]. They may also influence the splicing of the host gene as potential factors that facilitate transcriptional interference [3][36][37][38]. The impact of retrocopies at the DNA level is also noticeable, since they may be involved in nonallelic homologous recombination, resulting in the formation of chimeric transcripts (Figure 2E) [3].
In light of the variety of possible functions, lncRNAs originating from retrocopies (retro-lncRNAs) can play a significant role in the cell regulatory machinery. This is especially important when their progenitors or host genes are critical in disease pathogenesis.