2. Impact of Epitranscriptomic Marks on lncRNA Structures
When the first draft of the human genome was released, there were high hopes that it would reveal many of the rules governing the human body. Two decades later, it is clear that there is more to human genes than their DNA sequences. The same applies to elucidating the functions of lncRNAs. Many researchers were excited by the report on terminal differentiation-induced non-coding RNA (TINCR), in which the authors identified TINCR box motifs, 25-nt-long RNA sequences that interact with many mRNAs [27]. The discovery of TINCR box motifs prompted a search for similar binding domains in other lncRNAs. However, such screening did not yield fruitful results [28][29]. Moreover, it recently became clear that TINCR harbors an evolutionarily conserved open reading frame, which encodes a peptide of 87 amino acids [30]. One of the 10 TINCR box motifs lies within the region encoding this TINCR peptide, suggesting that sequence alone cannot be used to infer the functions of lncRNAs. A number of methods have been proposed and used to predict the functions of lncRNAs by combining different features, including evolutionarily conserved sequence motifs, secondary structures, and potential binding of RNA-binding proteins and miRNAs [28][29][31][32]. Yet none of these methods can predict the functions of all lncRNAs, which is not surprising given that not all protein-coding genes have been functionally characterized. In addition to the current challenges facing computational functional prediction of lncRNAs, the growing evidence of epitranscriptomic marks on lncRNAs adds yet another parameter that researchers need to consider when investigating the functions of lncRNAs and other RNA species.
More than half of the human genome is made up of repetitive sequences [33]. The Ensembl database currently classifies these repeat sequences into 10 classes (centromere, low complexity regions, RNA repeats, satellite repeats, simple repeats, tandem repeats, LTRs (long terminal repeats), SINEs (short interspersed nuclear elements), LINEs (long interspersed nuclear elements), and Type II transposons) and categorizes those that cannot be assigned to any of these classes as “Unknown” (https://m.ensembl.org/info/genome/genebuild/assembly_repeats.html, accessed on 22 March 2021). Not surprisingly, such repetitive sequences are also present in lncRNAs [34][35][36]. For example, Alu elements, a subfamily of SINEs, make up 11% of the human genome [37]. These 300-nt repetitive elements are derived from transposons and exist only in primates. They can be expressed as their own RNAs [38] or as parts of other transcripts (e.g., introns of mRNAs, lncRNAs), and their expression levels increase upon stress (e.g., heat shock, hypoxia, viral infection) [39][40]. When two Alu elements in opposite orientations pair with each other, they form double-stranded RNA, which can be recognized by RNA-binding proteins such as ADARs. ADAR-mediated adenosine-to-inosine (A-to-I) changes also occur frequently in lncRNAs [41][42][43]. Not surprisingly, these A-to-I conversions change the secondary structures of RNA [44], which is an important point to consider when analyzing lncRNA functions, as the binding of other macromolecules (i.e., DNA, RNA, and proteins) can change depending on the presence or absence of double-stranded RNA motifs within a lncRNA [45][46][47].
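As a rough illustration of why A-to-I editing can disturb a double-stranded region formed by inverted repeats, the toy sketch below counts base pairs in a perfect duplex before and after editing. The sequence, the edited positions, and all function names are hypothetical. The pairing rule is also a deliberate simplification: inosine is treated as pairing only with C, and G·U or I·U wobble pairs are counted as unpaired. It is not a structure-prediction method.

```python
# Toy model only: hypothetical sequence and a simplified pairing rule.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
# Assumption: inosine (I) is treated as pairing only with C.
INOSINE_PAIRS = {("I", "C"), ("C", "I")}
ALLOWED_PAIRS = WATSON_CRICK | INOSINE_PAIRS


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(rna))


def edit_a_to_i(rna: str, positions: list[int]) -> str:
    """Replace the adenosines at the given 0-based positions with inosine."""
    bases = list(rna)
    for pos in positions:
        if bases[pos] != "A":
            raise ValueError(f"position {pos} is {bases[pos]}, not A")
        bases[pos] = "I"
    return "".join(bases)


def paired_fraction(strand: str, partner: str) -> float:
    """Fraction of positions in `strand` that pair with the antiparallel `partner`."""
    if len(strand) != len(partner):
        raise ValueError("strands must have equal length in this toy model")
    pairs = zip(strand, reversed(partner))
    matched = sum((a, b) in ALLOWED_PAIRS for a, b in pairs)
    return matched / len(strand)


# Hypothetical 12-nt stand-in for one arm of an inverted repeat.
sense = "GGAUCAGCAUUA"
antisense = reverse_complement(sense)      # the arm in the opposite orientation
edited = edit_a_to_i(sense, [2, 5])        # two hypothetical editing sites

print(paired_fraction(sense, antisense))   # fully paired duplex
print(paired_fraction(edited, antisense))  # fewer pairs after A-to-I editing
```

In this simplified model, each edited adenosine that faced a U loses its pair, so the duplex becomes less complete. Real effects depend on sequence context and thermodynamics, which this sketch ignores.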
Besides A-to-I RNA editing, other epitranscriptomic marks affect the structures of lncRNAs. In particular, m6A marks are of interest, as they have been shown to be negatively associated with A-to-I RNA editing [48]. A more recent study shows that silencing the m6A writer METTL3 in glioma stem-like cells altered A-to-I and C-to-U RNA editing events (C-to-U being another type of RNA editing that is less frequent than A-to-I) by differentially regulating the RNA editing enzymes ADAR and APOBEC3A, respectively [49]. An interesting model was recently proposed in which m6A marks affect the secondary structure of one of the most well-studied lncRNAs, MALAT1 [50]. By performing secondary analyses of dimethyl sulfate sequencing (DMS-Seq) data from the human erythroleukemic cell line K562 and psoralen analysis of RNA interactions and structures (PARIS) data from cervical cancer-derived HeLa cells, and comparing them to the working structural model of MALAT1 in noncancerous cells, the authors postulated that m6A-based structural changes of MALAT1 might mediate cancer in a cell-type-specific manner [50]. Thus, increasing evidence suggests that examining epitranscriptomic marks on lncRNAs is important for uncovering the potential functions of lncRNAs [51].
3. Conclusions
Overall, we have summarized updates on lncRNA epitranscriptomics in the context of lncRNA function and biology. Even though the last couple of decades of research have revealed the importance of epitranscriptomics in health and disease, several questions remain to be answered. What is the functional importance of RNA modifications in lncRNAs? Are these modifications conserved between species? Are they mediators or actual drivers? How do we identify different modifications on the same lncRNA? Can these modifications be targeted to restrict disease progression? Answering these questions would advance our understanding of epitranscriptomics as a novel disease mechanism and inform the design of effective, targeted therapeutics.