Transcription occurs across more than 70% of the human genome, and more than half of currently annotated genes produce noncoding RNAs. Most of these transcripts are long noncoding RNAs (lncRNAs), which are longer than 200 nucleotides and serve various roles in the cell. lncRNAs are increasingly recognized as relevant in both health and disease, and the brain expresses more lncRNAs than any other organ.
Of the 3 billion nucleotides that make up the human genome, only 2% produce transcripts that code for proteins. However, transcription occurs across more than 70% of the human genome, producing many noncoding transcripts of various sizes [1][2][3][4]. Although a large fraction of these transcripts may not be functional, more than half of currently annotated genes produce noncoding RNAs (ncRNAs) [5]. Noncoding RNAs are divided into two main categories based on their size. A 200-nucleotide (nt) threshold, adopted because of the biochemical fractionation properties of RNA, separates long ncRNAs (lncRNAs) from small ncRNAs (sRNAs), which include tRNAs, rRNAs, and small nuclear and small nucleolar RNAs [6][7]. sRNAs are cleavage products of endogenous or exogenous primary transcripts and often act in trans, recruiting other proteins to a location physically distinct from their locus of synthesis (Figure 1A,B) [8]. The most frequently studied groups of sRNAs are 20–30 nt in length and associate with Argonaute (Ago) family proteins [9], although precisely processed 30–60-nt fragments of Y RNAs and tRNAs have also been detected, for instance, in microvesicles and exosomes isolated from patient-derived glioblastoma cell cultures [10]. Based on their mechanisms of biogenesis, these well-studied sRNAs have been divided into classes, of which we briefly describe two: microRNAs (miRNAs) and PIWI-interacting RNAs (piRNAs). miRNAs are produced by Drosha- and Dicer-mediated cleavage of endogenous hairpin ncRNAs. They are loaded onto a ribonucleoprotein complex that includes Ago proteins and guide this RNA-induced silencing complex (RISC) to complementary transcripts (Figure 1A) [11]. RISC mediates posttranscriptional gene silencing by triggering degradation or inhibiting translation of the complementary mRNAs. piRNAs are associated with the PIWI clade of Ago proteins (Figure 1B).
PIWI-piRNA complexes are crucial for protecting genomes against instability by repressing transposon activity via transcriptional and/or posttranscriptional silencing, especially in the germline, where cells undergo rapid changes in genome accessibility and transcription dynamics [12][13].
Figure 1. Biogenesis of small and long ncRNAs. (A) miRNAs are transcribed at independent loci (primary miRNA [pri-miRNA]) or together with host protein-coding genes (mirtrons). After processing by the Drosha complex or lariat-debranching enzymes, respectively, precursor miRNAs (pre-miRNAs) are shuttled to the cytoplasm for further processing by Dicer and TAR RNA-binding protein 2 (TARBP2). When two mature miRNAs originate from opposite arms of the same pre-miRNA, one is typically more abundant than the other, and the less abundant species is marked with an asterisk. Mature miRNAs are loaded onto the RNA-induced silencing complex (RISC) and act by promoting degradation of protein-coding transcripts or by repressing their translation. (B) PIWI-interacting RNAs (piRNAs) are mostly expressed as single-stranded RNAs from mono- or bidirectional clusters. Additional piRNAs may be produced through a PIWI-protein-catalyzed amplification loop (“ping-pong cycle”) via sense and antisense intermediates.
The PIWI ribonucleoprotein (piRNP) complex functions in transposon repression through target degradation and epigenetic silencing. Roles of the piRNP complex in translational repression, if any, remain unknown. (C) (Top) Long intergenic ncRNAs (lincRNAs) are transcribed by Pol II from intergenic regions (>20 kb from the closest protein-coding gene) and are spliced, capped, and polyadenylated. (Middle) MALAT1 and NEAT1 are well-studied, highly conserved lncRNAs that are processed by RNase P and stabilized by U-A-U triple helix structures at their 3′ ends. The 3′-end product of MALAT1 is further processed to form MALAT1-associated small cytoplasmic RNA (mascRNA), which is ~60 nt in length and of unknown function. (Bottom) Circular RNAs (circRNAs) are produced by back-splicing circularization of exonic pre-mRNA sequences. During splicing, pre-mRNAs are either spliced into linear, mature mRNAs or back-spliced into circRNAs.
The majority of ncRNA species are lncRNAs, which resemble mRNAs in aspects of biogenesis and processing, except that, traditionally, no protein product has been detected or predicted from them (Figure 1C) [14][15][16]. In recent years, however, some functionally annotated lncRNAs, such as LINC-PINT, have been shown to contain short open reading frames that encode small peptides with regulatory function, indicating dual RNA and peptide activity from “noncoding” genomic loci [17][18]. These findings have informed nomenclature as efforts to catalogue the noncoding genome have rapidly accelerated. Still, lncRNA species can be subdivided based on their proximity to protein-coding genes and on unique features of biogenesis that influence their final structure (Figure 1C) [19][20][21]. For example, intergenic lncRNAs are transcribed from loci >20 kb away from any protein-coding gene and include independently transcribed loci as well as RNAs transcribed from enhancers that aid the activation of looped promoters.
The well-described lncRNAs MALAT1 and NEAT1 are processed by RNase P and stabilized by U-A-U triple helix structures at their 3′ ends. Finally, circular RNAs are produced by back-splicing circularization of “exons” from pre-mRNAs; they are not polyadenylated and remain stably retained in the nucleus.
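The distance criterion used above to define intergenic lncRNAs (>20 kb from any protein-coding gene) can be illustrated as a simple interval test. The sketch below is hypothetical: the `Interval` type, the half-open coordinate convention, ignoring strand, and the `min_gap` default of 20,000 bases (taken from the >20 kb definition in the text) are assumptions for illustration, and real annotation pipelines apply further filters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic interval: half-open [start, end), strand ignored."""
    chrom: str
    start: int
    end: int


def gap(a: Interval, b: Interval) -> Optional[int]:
    """Bases separating two intervals (0 if they overlap or touch); None if on different chromosomes."""
    if a.chrom != b.chrom:
        return None
    return max(0, b.start - a.end, a.start - b.end)


def is_intergenic(lnc: Interval, coding_genes: list[Interval], min_gap: int = 20_000) -> bool:
    """True if every protein-coding gene on the same chromosome lies more than min_gap bases away."""
    gaps = (gap(lnc, g) for g in coding_genes)
    # genes on other chromosomes (gap is None) cannot disqualify the transcript
    return all(d > min_gap for d in gaps if d is not None)


if __name__ == "__main__":
    gene = Interval("chr1", 100_000, 150_000)              # toy protein-coding gene
    print(is_intergenic(Interval("chr1", 175_000, 180_000), [gene]))  # 25 kb away
    print(is_intergenic(Interval("chr1", 160_000, 170_000), [gene]))  # 10 kb away
```

With the toy gene above, the first transcript passes the criterion and the second does not; a transcript overlapping the gene has a gap of 0 and is likewise excluded.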
Generally, lncRNAs are less evolutionarily conserved at the sequence level than mRNAs, contain fewer “exons” after splicing, and are more likely to be localized in the nucleus [22][23][24]. Nuclear lncRNAs have historically been investigated in the context of their gene neighborhoods, as platforms that assemble regulatory complexes in cis, although mechanisms acting in trans at physically distant genomic loci are increasingly appreciated, for example, nontraditional base pairing and/or docking of chromatin-associated ribonucleoprotein complexes [25][26][27]. Both specific and more widespread pleiotropic functions have increasingly been ascribed to each class of RNA molecules, including various architectural and/or gene regulatory roles in different cellular compartments [23]. Although many RNAs are unlikely to act alone and instead interact with specific RNA-binding proteins, some proteins previously described as DNA-binding, such as the glioblastoma (GBM) master transcription factor SOX2, have been shown to bind RNA, expanding the potential repertoire of molecular interactions available to tumor cells for diverse specialized functions [28][29].
The lack of evolutionary conservation in many lncRNA sequences has spurred speculation that many low-abundance transcripts are simply noise, perhaps reflecting promiscuous sampling of open chromatin regions by the transcription machinery [23][30]. However, it is clear that many lncRNAs have specialized, cell context-specific functions beyond contributing to general transcriptional tone. Regardless of the scope of a given lncRNA’s activity, it has become increasingly apparent that conservation of secondary structure is a stronger driving force in noncoding transcriptome evolution than conservation of primary sequence. Even primary sequence can reveal relationships that linear alignment misses: in one study, similarity between lncRNAs was evaluated by deconstructing their sequences into the abundance of short motifs called k-mers [31].
Transcripts with related functions often had similar k-mer profiles despite lacking linear homology, and k-mer profiles correlated with protein binding partners and with subcellular localization. These findings support the importance of binding motifs, patterns, and partners in dictating the local thermodynamic environments that define epigenetic activity, and they underscore the need to better understand the molecular “language” used by malignant cells in particular, which we hypothesize rely on epigenetic flexibility [32]. Furthermore, even ‘junk’ transcripts reflecting biological noise may provide raw material for the evolution of functional noncoding transcripts by nonadaptive mechanisms, such as constructive neutral evolution [33]. For example, although chromatin remodeling associated with RNA polymerase II transcription likely evolved under selective pressure to suppress spurious transcription originating within gene bodies, this process can be co-opted to downregulate endogenous genes [34]. Under weak selective pressure, transcription factor binding sites and cryptic transcription start sites in intergenic regions continually emerge and vanish, as long as they do not perturb the equilibrium that underlies an organism’s fitness. The high prevalence of these sites in the genome, maintained by their frequent appearance and disappearance over time, increases the chance that beneficial transcriptional regulatory events arise.
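To make the k-mer comparison [31] more concrete, the following is a minimal sketch, not the published pipeline: it counts every k-mer in each transcript, converts counts to frequencies, standardizes each k-mer across the set of transcripts, and scores each pair of transcripts by the Pearson correlation of their standardized profiles. Alignment is never performed, so two transcripts can score as similar without linear homology. The function names, the choice of k, the normalization steps, and the toy sequences are assumptions for illustration.

```python
from itertools import product
import math


def kmer_profile(seq: str, k: int = 3) -> list[float]:
    """Frequency of every DNA k-mer in seq, in a fixed alphabetical order."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA spellings alike
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for very short input
    return [counts[w] / total for w in kmers]


def zscore_columns(profiles: list[list[float]]) -> list[list[float]]:
    """Standardize each k-mer across all transcripts (column-wise z-score)."""
    n = len(profiles)
    columns = []
    for col in zip(*profiles):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        # a k-mer with the same frequency in every transcript carries no signal
        columns.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*columns)]


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def similarity_matrix(seqs: list[str], k: int = 3) -> list[list[float]]:
    """Pairwise correlations between standardized k-mer profiles."""
    z = zscore_columns([kmer_profile(s, k) for s in seqs])
    return [[pearson(zi, zj) for zj in z] for zi in z]


if __name__ == "__main__":
    # hypothetical toy sequences, not real lncRNAs
    toy = ["AUGGCUAGCUAGGCUA", "AUGGCUAGCUAGGCUU", "CCCCGGGGAAAATTTT"]
    for row in similarity_matrix(toy, k=2):
        print(" ".join(f"{r:+.2f}" for r in row))
```

In this toy run, the first two sequences, which differ by a single nucleotide, correlate strongly, whereas the third, compositionally distinct sequence correlates negatively with both. Standardizing each k-mer across the set emphasizes motifs that vary between transcripts over those that are uniformly common; with real data, k and the reference set would need to be chosen deliberately.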
Of the tens of thousands of lncRNA genes annotated by the GENCODE and ENCODE projects, 40% (an estimated 4000–20,000 lncRNA genes) are expressed specifically in the brain [24]. This is at least twice as many as in any other organ, including the testes, although the testes show the highest lncRNA expression levels despite their smaller repertoire. The number of brain-specific lncRNAs is strikingly large given that the human genome contains approximately 20,000–25,000 protein-coding genes in total and around 2500 miRNAs. Although the protein and miRNA expression profiles of the central nervous system are more diverse than those of other organs, only a subset of these molecules is specific to the nervous system [35][36]. The expression of lncRNAs is dynamically regulated during neural development and in response to neuronal activity [37]. Expression of specific lncRNAs is often highly restricted to particular brain regions, and it has been suggested that lncRNAs provide more information about cell-type identity during mammalian cortical development than protein-coding genes do [38][39][40]. This suggests an intimate connection, and a parallel diversity, between lncRNAs and fate commitment in the neuroectodermal lineage, possibly as a means of coordinating spatially distinct yet synchronous responses across cellular contacts and processes. These dynamic and region-specific expression patterns are coordinated by cell-intrinsic or signal-dependent transcription factors, as well as by well-defined chromatin dynamics at lncRNA loci [41][42]. This raises the possibility that an intricate, highly regulated noncoding RNA axis evolved to support highly specialized cellular functions, such as those in the brain and testes. The testes, for example, express trans-acting regulatory lncRNAs required for the intricate process of spermatogenesis [43].
Rigorous investigation into whether the specific repertoires of noncoding transcriptomes relate to the shared immune-privileged status of these organs has only recently emerged [44][45]. Access to a large portion of the noncoding genome and transcriptome in highly specialized, immune-privileged tissues thus represents an unexplored mechanism that may contribute to the accelerated Darwinian evolution of malignantly transformed cells originating in these organs.