Features and Functions of Alternative Exon Splicing Events

Alternative exon splicing (AES), alternative transcription start (ATS) sites, and alternative polyadenylation (APA) sites are key to the transcript diversity underlying health and disease. All three are pervasive in organisms, present in at least 50% of human protein-coding genes. These RNA variants have been shown to be highly specific, both in tissue type and stage, with demonstrated importance to cell proliferation, differentiation, and the transition from fetal to adult cells. While alternative exon splicing has a limited effect on protein identity, its ubiquity highlights the importance of these minor alterations, which can alter other features such as localization.

  • alternative exon splicing (AES)
  • alternative transcription start (ATS)
  • alternative polyadenylation (APA)
  • protein-coding gene
  • localization
  • cytoplasm
  • tissue-specific

1. Introduction

The number of genes in the human genome has been an open question in biology for decades. Historically, there have been several significant waves of interest from the scientific community in answering this question. The earliest recorded attempt can be attributed to James Spuhler, who in 1948 published an article titled “On the Number of Genes in Man”, in which he proposed two estimates: 42 k genes, extrapolating from the chromosomal length of fruit fly genes, and 20–30 k, based on a loci count derived from X-linked lethal mutations [1]. Vogel produced the next estimate in 1964 in “A Preliminary Estimate of the Number of Human Genes”. By assuming that the entire genome was protein-coding and that genes were of comparable length (reasonable assumptions at the time), he used the molecular weight of hemoglobin to calculate the DNA weight of haploid chromosomes and divided it by his “standard” gene size, predicting an enormous 6.7 million genes, which he acknowledged seemed “disturbingly high”. He then posited that using the gene length of Dipteran giant chromosomes (~50 k nucleotides) would instead place the number of genes at 67 k, a number he was much more comfortable with [2].
The announcement of the Human Genome Project (HGP) appeared to re-galvanize interest in the subject, with a slate of papers published in the 1990s. The HGP was launched in 1990 with the goal of constructing the first full sequence of the human genome and identifying all protein-coding genes. The first prediction released by the HGP was 100,000 genes, driven by an assumption that the standard gene size was 30 kb, and it was used as a baseline for many years afterwards [3,4,5]. The estimates that followed ran the gamut, from lows of 14,000 all the way up to 312,000, utilizing a variety of methods such as estimating from ESTs, chromosomes, and genome homologies [6,7,8,9]. Even as late as 2000, after the first rough draft of the genome assembly had been released, estimates varied significantly, from 26 k to 120 k, highlighting the difficulty of identifying protein-coding genes [10,11,12,13,14]. Though we now know the true number is likely close to 20 k, depending on the stringency of filtering, it still fluctuates as our understanding evolves and more experimental evidence is found [15,16].
With that, the question became how human systems were able to develop and display such complexity with a protein-coding gene count (~20 k genes in 3000 Mb) a scant few hundred genes higher than that of the simple nematode C. elegans (~19 k genes in 97 Mb) [7,17]. The complex human system requires a diversity of gene products to build and maintain its ~30 trillion cells through a rapid progression of tissue- and stage-specific proliferation, differentiation, and development [18]. Decades prior, the discovery of alternative splicing (AS), alternative transcription start (ATS) sites, and alternative polyadenylation (APA) sites hinted at potential explanations. At the time of their discovery, however, their significance was not well understood. Most protein-coding genes possess a dominant isoform, the product most prevalent in cells and tissues across time points and development stages, but these mechanisms allow the creation of alternative transcripts from these same genes [19].
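The contrast between the two genomes can be made concrete with a short calculation. The sketch below uses only the approximate gene counts and genome sizes quoted above; the function and variable names are illustrative.

```python
# Approximate figures quoted in the text (protein-coding genes, genome size in Mb).
genomes = {
    "H. sapiens": {"genes": 20_000, "size_mb": 3000},
    "C. elegans": {"genes": 19_000, "size_mb": 97},
}


def genes_per_mb(genes: int, size_mb: float) -> float:
    """Protein-coding gene density in genes per megabase."""
    return genes / size_mb


density = {name: genes_per_mb(**g) for name, g in genomes.items()}
for name, value in density.items():
    print(f"{name}: {value:.1f} genes/Mb")

# Ratio of the two densities: how much more gene-dense the nematode genome is.
fold = density["C. elegans"] / density["H. sapiens"]
print(f"C. elegans is roughly {fold:.0f}-fold more gene-dense")
```

With these rounded inputs, the two species have nearly the same number of protein-coding genes, but the human genes are spread over a genome roughly thirty times larger.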
For many years, AS was considered the dominant mechanism contributing to transcriptome diversity, with publications focused on the subject climbing 425% over 10 years, from 243 a year in 1990, when the HGP was announced, to 1276 in 2000, when the first draft of the human genome assembly was completed. Alternative splicing refers to a process in which exons from the same gene are combined into different mRNA transcripts, allowing for multiple related proteins with distinct structures and functions. Here we will refer to AS as alternative exon splicing (AES) to distinguish it from ATS and APA, which are also forms of alternative splicing. However, there is mounting evidence that AES is not the primary driver of transcriptome and proteome diversity, and our understanding of these processes (especially ATS and APA, which have received comparatively little attention) is still shallow [20,21].

2. AES—Increasingly Prevalent in Complex Organisms

Splicing (at the pre-mRNA stage, co- or post-transcriptionally) was discovered in an adenovirus experiment in 1977, which also discussed possible regulation of alternative splicing [27]. Alternative splicing was formally proposed as a theory by Walter Gilbert in 1978 [28]. The use of AES is influenced by the strength of the splicing signal, intronic/exonic enhancers or silencers, RNA-binding proteins (RBPs), epigenetic modifications, genetic mutations, and more. These factors can be fine-tuned by the organism to accommodate the developmental stage, differentiation, tissue/cell type, and other environmental elements [29,30]. AES sites are present in 95% of genes, with the average human gene producing three or more alternative transcripts, but AES is considered the least impactful in its contribution to RNA and protein diversity [20,21,30]. Despite this, its prevalence has increased substantially from primitive eukaryotic organisms such as C. elegans to higher eukaryotes: C. elegans averages 6.4 exons per transcript, compared to ~11 in humans. C. elegans also undergoes alternative exon splicing in only 25% of its protein-coding genes, indicating an evolutionary advantage to incorporating additional splicing elements [17].
There are currently five recognized forms of alternative splicing: exon skipping (also known as cassette exons), alternative 5′ and alternative 3′ splice sites within exons (where one side has a constitutive splice site and the other has two or more alternative splice sites, meaning there are alternate regions that can be included or excluded), intron retention, and mutually exclusive alternative exons (two exons where one or the other, but not both, can be included) [31,32,33]. Exon skipping (~30%) and alternative 5′ or 3′ splice sites within exons (~25%) account for most AES events in eukaryotes. Alterations to the mRNA may introduce premature termination codons (PTCs) that would result in truncated proteins, and such transcripts are frequently degraded through nonsense-mediated RNA decay (NMD) pathways. Multiple studies have shown that the majority of alternatively spliced transcripts either are not expected to produce a protein product, do not produce one, or express it at such a low level as to be undetectable using mass spectrometry [20,34]. These transcripts may instead have regulatory functions as RNA or, as some new research suggests, may produce micropeptides, which are peptides of 100 or fewer amino acids (AA) translated from a short open reading frame (sORF) [20,35,36]. Shorter isoforms are most often missing one or more exons (whole or partial), leading to the potential loss of domains such as localization signals, regulatory domains, and binding sites [32].
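The five event types can be illustrated on a toy gene model. The following is a minimal sketch, not a splicing simulator: the exon and intron sequences, the exon names, and all function names are hypothetical and chosen only to make each event visible in the output.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gene: four exons and one intron, sequences invented for illustration.
EXONS: Dict[str, str] = {
    "E1": "ATGGCC",
    "E2": "GATTAC",
    "E3": "CCGGAA",
    "E4": "TTTTGA",
}
INTRON_1 = "gtaagt"  # lower case so a retained intron is visible in the output
REFERENCE: List[str] = ["E1", "E2", "E3", "E4"]


def assemble(parts: List[str]) -> str:
    """Join exon names (looked up in EXONS) and literal sequence fragments."""
    return "".join(EXONS.get(p, p) for p in parts)


def exon_skipping(exons: List[str], skipped: str) -> List[str]:
    """Cassette exon: the named exon is left out of the transcript."""
    return [e for e in exons if e != skipped]


def alt_5_splice_site(exons: List[str], exon: str, trim: int) -> List[str]:
    """Alternative donor site inside `exon`: its last `trim` nt are excluded."""
    return [EXONS[e][: len(EXONS[e]) - trim] if e == exon else e for e in exons]


def alt_3_splice_site(exons: List[str], exon: str, trim: int) -> List[str]:
    """Alternative acceptor site inside `exon`: its first `trim` nt are excluded."""
    return [EXONS[e][trim:] if e == exon else e for e in exons]


def intron_retention(exons: List[str], upstream_exon: str, intron_seq: str) -> List[str]:
    """Keep the intron that follows `upstream_exon` in the mature transcript."""
    out: List[str] = []
    for e in exons:
        out.append(e)
        if e == upstream_exon:
            out.append(intron_seq)
    return out


def mutually_exclusive(exons: List[str], pair: Tuple[str, str]) -> Tuple[List[str], List[str]]:
    """Return the two allowed transcripts: each keeps exactly one exon of the pair."""
    first, second = pair
    return [e for e in exons if e != second], [e for e in exons if e != first]


me_a, me_b = mutually_exclusive(REFERENCE, ("E2", "E3"))
isoforms = {
    "reference": assemble(REFERENCE),
    "exon_skipping": assemble(exon_skipping(REFERENCE, "E2")),
    "alt_5_splice_site": assemble(alt_5_splice_site(REFERENCE, "E2", 3)),
    "alt_3_splice_site": assemble(alt_3_splice_site(REFERENCE, "E3", 3)),
    "intron_retention": assemble(intron_retention(REFERENCE, "E1", INTRON_1)),
    "mutually_exclusive_a": assemble(me_a),
    "mutually_exclusive_b": assemble(me_b),
}
for name, seq in isoforms.items():
    print(f"{name:22s} {len(seq):3d} nt  {seq}")
```

In this simplified model, each mutually exclusive product looks like a transcript with one skipped exon. What distinguishes the event type is the constraint that exactly one exon of the pair is kept, so forms containing both or neither are never produced.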

3. AES—Dominantly Located in the Cytoplasm

Changes that lead to altered localization signals can affect the ability of an RNA or protein to be properly positioned, either by preventing the transporter protein from docking or by removing the localization signal entirely, as is the case in the isoform c-FLIPS. The CFLAR gene isoforms (called c-FLIP) are death effector domain-containing proteins that are recruited to the DISC complex and regulate caspase-8 and -10 as well as DR5, playing a role in FAS-mediated apoptosis and necroptosis as well as T-cell proliferation. While the long form contains a catalytically inactive caspase-like domain that carries a nuclear localization signal, resulting in a large proportion of that isoform residing in the nucleus, the short form includes exon 7, which contains a stop codon. The resulting truncated protein is missing the domain containing the localization signal and is restricted to the cytoplasm, where it acts in an anti-apoptotic manner, as opposed to the long form, which can be either pro- or anti-apoptotic in function [37,38].
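The way an included exon carrying a stop codon can remove a downstream localization signal can be sketched with a toy translation. Everything below is an assumption made for illustration: the three sequence fragments, the `KKRK` motif standing in for a nuclear localization signal, and the function names are hypothetical and do not represent the actual CFLAR sequence. Only the standard genetic code table is real.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical fragments, invented for illustration only.
UPSTREAM = "ATGGCTGAA"            # encodes M A E
STOP_EXON = "CTGTAA"              # encodes L followed by an in-frame stop codon
DOWNSTREAM = "AAAAAGCGTAAATGA"    # encodes K K R K followed by the normal stop
NLS_MOTIF = "KKRK"                # placeholder for a nuclear localization signal


def translate(cds: str) -> str:
    """Translate from the first base and halt at the first in-frame stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


long_form = translate(UPSTREAM + DOWNSTREAM)                # stop-containing exon skipped
short_form = translate(UPSTREAM + STOP_EXON + DOWNSTREAM)   # stop-containing exon included

for label, protein in (("long", long_form), ("short", short_form)):
    location = "nuclear signal present" if NLS_MOTIF in protein else "no nuclear signal"
    print(f"{label:5s} {protein:8s} {location}")
```

In this toy model, including the stop-containing exon ends translation before the region that encodes the placeholder signal, so the short product lacks it even though the downstream sequence is still present in the transcript.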
NUMA1 is another example of short isoforms localizing in the cytosol due to alternative exon splicing. The full (long) form of NUMA1 is a large protein (~238 kDa) consisting of N- and C-terminal globular domains with a long central coiled-coil domain. It acts as a structural hub in the nuclear matrix, interacts with microtubules, and is involved in the formation and positioning of mitotic spindles. The nuclear localization signal in its C-terminal region allows this isoform to perform its function. The short isoform, NUMA1-s, consists of only the N-terminal globular region of the long isoform. Though its function has been only marginally explored compared to the long form, it appears to have strong tumor-suppressing effects, inhibiting the proliferation of HeLa cells, heavily impeding the formation of cell colonies, and suppressing the expression of MYBL2, a gene known for being overexpressed in the development of multiple cancer types [39,40].
Short isoforms also commonly have either an antagonistic effect on the long isoform, as seen in prolactin receptors (PRLRs), or a complementary effect, as displayed by the short form of OPA1 [41]. PRLRs have short and long isoforms, each of which can act as a dominant negative towards the other. This prevents excessive signaling by one form, with the short form operating through different signaling pathways than the long form [42,43]. In the case of OPA1, which regulates mitochondrial stability and energetics, the long and short forms work together to balance function. The long forms are fusion-competent but poor at energetics, whereas the short forms are competent at energetics but poor at fusion. The ratio of isoforms allows the fine-tuning of mitochondrial performance [44].
This is reinforced by the experimentally verified group of alternatively spliced isoforms we collected from the available literature. Of the genes for which we cataloged splicing isoforms, nearly all had verified localization of those isoforms, and the majority of those had verified “short” forms that are shorter than the canonical isoform. Of the ~75% of genes whose isoforms had both verified localization and short forms, almost half of those short forms localized to the cytoplasm (NUMA1-s, IGF-1ea, c-FLIPs/r, and CD33-s). These isoforms showed high degrees of tissue specificity, concentrated primarily in brain and muscle tissues [Table S1].

4. AES—Commonly Expressed in Tissue-Specific Manner

AES is commonly tissue- and development-stage-specific, allowing myriad cell types to efficiently use their resources by fine-tuning expression. Tissue-specific AES events can make up as many as 65% of total splicing events, with the major transcript expressed varying in up to 60% of coding genes [45,46]. The prevalence of these events also differs by tissue, with splicing in nervous, muscle (particularly cardiac), testis, and blood tissues comprising the majority, and these events often extend to the protein level [45,47,48,49]. Recent studies have shown the presence of microexons, exons encoding 1–9 AA, produced by splicing events in neuronal tissues and involved in cell differentiation, synaptic function, and axon guidance. Found on surface-accessible domains, especially in charged regions, these microexons are located in close proximity to or overlapping protein domains, providing an additional level of regulation [50].
Besides being tissue-specific, AES events are often developmentally regulated. This makes them a key factor in highly region-specific cell differentiation and morphogenesis in multiple tissues such as embryonic neurons, spermatozoa, skeletal muscle myoblasts, and stem/progenitor cells, among others. The precisely timed swapping of splicing regulators, and of the dominantly expressed isoform, is integral to the transition from fetal tissue to adult tissue [51]. An example of this is the transition from dominant PTBP1 expression to PTBP2 expression during the differentiation of progenitor cells into postmitotic neurons [47,48,52]. These changes can also differ between regions of the same tissue, as occurs with the gene LIMK2, a member of the LIM kinase (LIMK) family that regulates actin dynamics through cofilin phosphoregulation. LIMK2 encodes two isoforms: LIMK2a, the primary isoform, which is expressed evenly throughout the brain, and LIMK2b, which is highly expressed in the thalamus and cerebellum [45,53,54].

5. AES—Highly Dysregulated in Neurological Diseases

While these mechanisms allow for great diversity in the transcriptome and proteome, their dysregulation can have serious health consequences. Mutations in ~50% of the known RNA-modifying enzymes have been linked to human disease [55]. Splicing defects can arise from mutations to splicing elements, which are present throughout genes in large numbers and can be deleted or created by mutation, or from mutations to the splicing machinery itself. They are highly associated with nearly every aspect of cancer development, developmental syndromes like Prader-Willi syndrome, and degenerative diseases such as retinitis pigmentosa [56,57]. Alternative exon splicing-related diseases fall into two broad categories: mutations within the transcript itself and mutations within the splicing machinery or regulatory elements. Mutations anywhere within the ORF may lead to frameshifts that result in transcripts often consigned to NMD, while changes in the CDS (SNPs/indels) can also lead to changes in amino acid identity. Mutations in introns, the UTRs (particularly those regions closest to the coding sequence), and exon/intron borders can alter splicing elements, potentially leading to deleterious transcripts/proteins [31,52,56,58]. Splicing errors in the 3′ UTR can affect the stability and translation efficiency of transcripts, creating imbalances that lead to disease [46,49,52,59].
As exon coding sequences make up only ~10% of a gene and changes to coding sequences most often have inconsequential effects, it should come as no surprise that ~85–90% of disease-causing splicing errors occur outside of exon regions [20,60]. Errors in brain tissue-specific networks are responsible for several known neurological disorders such as autism spectrum disorder (ASD) [48,56]. Mutations in RBFOX proteins, which are local regulatory factors, subsequently cause the alternative splicing of SHANK3, CACNA1C, and TSC2, all of which are involved in ASD. In addition, the mis-splicing of microexons by Ser/Arg RBPs is known to be involved in ASD. Splicing errors are also highly associated with various forms of cancer [57,60].

This entry is adapted from the peer-reviewed paper 10.3390/genes14112051
