A core task of the supraspliceosome is catalyzing the mRNA splicing reaction in its four spliceosomes. The splicing reaction itself consists of two sequential transesterification reactions that together ligate the two exons and excise the intronic sequence. The splicing process is initiated by recognition of splice sites (ss) in intronic sequences. The vast majority of introns in eukaryotes are U2-type introns, which contain highly conserved GT and AG dinucleotides at their 5′ss and 3′ss, respectively. The remaining introns are U12-type introns, accounting for ~0.01-0.02% of eukaryotic introns, which have AT-AC boundary sequences [27]. Although there are some exceptions, U2-type introns undergo so-called canonical splicing by the major spliceosome, whereas U12-type introns are spliced by the minor spliceosome through non-canonical splicing. As the majority of human introns are U2-type introns, this review focuses only on the major spliceosome. The major spliceosome consists of the five snRNPs U1, U2, U4, U5 and U6; DExD/H-type RNA-dependent ATPases/helicases that facilitate structural remodeling of the snRNPs at different steps of the splicing process; and many other splicing factors that regulate splice site usage and the mRNA splicing reaction. The snRNPs are the core units of the spliceosome. In this section, the structure and function of each snRNP are described in more detail. An overview of all snRNP-specific and snRNP-associated RNA splicing factors according to the KEGG, AmiGO and Reactome databases is given in Supplementary Table S1, and the most important ones are shown in b.
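The boundary rule above can be illustrated with a minimal Python sketch that classifies an intron only by its terminal dinucleotides. This is a deliberate simplification and an assumption of the sketch: real U2/U12 classification relies on the full 5′ss and branchpoint consensus, and any boundary combination other than GT-AG or AT-AC is simply reported as unclassified. The function name `classify_intron` is hypothetical.

```python
def classify_intron(intron: str) -> str:
    """Classify an intron by its terminal dinucleotides (simplified sketch).

    Accepts DNA (T) or RNA (U) letters; the sequence must run 5'->3' from
    the first to the last intronic nucleotide.
    """
    seq = intron.upper().replace("U", "T")  # compare in DNA letters, as in the text
    if len(seq) < 4:
        raise ValueError("intron too short to carry two boundary dinucleotides")
    boundaries = (seq[:2], seq[-2:])
    if boundaries == ("GT", "AG"):
        return "U2-type (GT-AG)"
    if boundaries == ("AT", "AC"):
        return "U12-type (AT-AC)"
    return "unclassified"


for example in ["GTAAGTCTTTTCCCTTTTCAG", "auauccuuuuuccuuac", "GCAAGTTTTAG"]:
    print(f"{example} -> {classify_intron(example)}")
```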
2.1.1. Biogenesis of the Spliceosome: Assembly and Transport of Sm-snRNA Complexes
Biogenesis of snRNPs for assembly of spliceosomes has been reviewed more extensively elsewhere [28][29][30][31] and is illustrated in . The U1, U2, U4 and U5 snRNAs are transcribed in the nucleus and exported to the cytoplasm, where they associate with the seven Sm proteins that form a doughnut-shaped ring around the snRNA. In the case of the U6 snRNA, the LSm proteins form the typical heptameric ring structure around its 3′-terminal uridine tract, and its biogenesis presumably occurs within the nucleus. The important (L)Sm-snRNA structure at the heart of the spliceosome is discussed in detail below. Complete Sm rings incorporating their snRNA (termed Sm-snRNA complexes here) are then imported back into the nucleus, where they accumulate in Cajal bodies. There, snRNP biogenesis continues through binding of U snRNP particle-specific proteins to the Sm-snRNA complexes. Mature snRNPs are primarily located in membrane-less nuclear organelles termed nuclear speckles, which serve as a reservoir for spliceosome components. Nuclear speckles are found in interchromatin regions close to actively transcribed genes (reviewed by [32][33]). Many proteins involved in transcription, epigenetic regulation, and RNA processing, modification and packaging are present in these speckles, making them nuclear gene expression hubs. The localization of proteins in nuclear speckles is regulated by post-translational modifications such as phosphorylation, addition of phosphoinositol derivatives, ubiquitination and SUMOylation [32]. Localization in speckles is important, as increasing the levels of U1 snRNPs did not enhance mRNA production in the absence of speckles [34]. For use in the mRNA splicing reaction, snRNPs are recruited from the speckles to active transcription sites at the interface of speckles and chromatin. The Sm and LSm proteins have been postulated to be the earliest spliceosomal components, with their gene family having nearly reached its current composition by the time the last eukaryotic common ancestor emerged, approximately two and a half billion years ago [35]. The seven Sm proteins that together form the heteroheptameric ring structure are denoted SmB/B’, SmE, SmF, SmG, SmD1, SmD2 and SmD3. Additionally, SmN is a tissue-specific substitute for SmB/B’, expressed primarily in the brain and heart, which affects mRNA splicing by downregulating mature U2 snRNP when it is incorporated into the Sm ring [36]. The structurally highly similar (L)Sm proteins form a heptameric ring structure around each snRNA, which presumably functions as a platform onto which other snRNP proteins assemble. Sm proteins are crucial for the assembly, stability and nuclear import of snRNPs and hence for proper functioning of the spliceosome.
Figure 2. Spliceosome snRNP subunit biogenesis. Details are given in the main text. Since the U1, U2, U4 and U5 snRNPs are assembled in the same manner, the biogenesis of the U1 snRNP is shown as an example. The snRNAs are transcribed by either RNA polymerase II (U1, U2, U4, U5) or RNA polymerase III (U6). RNA polymerase II-transcribed snRNA is exported to the cytoplasm after quality control in Cajal bodies (CBs). SmD1, SmD3 and SmB/B’ are dimethylated (displayed by stars) in the cytoplasm by the methylosome complex. The Sm ring is assembled around the snRNA by the SMN complex, and the Sm-snRNA core complex is imported back into the nucleus, where snRNP maturation takes place in the CBs. The exact route of LSm assembly onto the U6 snRNA is not known but presumably takes place in the nucleus. The LSm-U6 snRNA core complex probably moves to the nucleolus, where it is 2′-O-methylated (displayed as a star), and is subsequently transported to the CBs, where U6-specific proteins are assembled and U4/U6.U5 tri-snRNP maturation can take place. Mature snRNPs are recruited into the supraspliceosome to take part in the RNA splicing reaction.
Each Sm protein consists of a short N-terminal α helix followed by five anti-parallel β strands, which together form the highly conserved Sm fold. β strands 1–3 constitute Sm motif 1, which is involved in the protein-snRNA interaction. β4 and β5 constitute Sm motif 2, which is involved in the protein–protein interactions within the heptameric Sm ring. The β4 strand of each Sm protein forms hydrogen bonds with the β5 strand of its neighbor. The Sm motifs are highly conserved [37]. The Sm ring structure is further stabilized by hydrophobic residues that point towards its center and contact other Sm proteins [38].
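The β4-β5 pairing can be pictured as a closed ring in which every subunit touches exactly two neighbors. The sketch below uses the subunit order SmE, SmG, SmD3, SmB, SmD1, SmD2, SmF reported for the assembled ring [54]; the direction of the pairing (β4 of each subunit binding β5 of the next one in that list) is an assumption made for illustration, and `beta_interfaces` is a hypothetical helper.

```python
# Subunit order around the assembled Sm ring, as reported in [54].
SM_RING = ["SmE", "SmG", "SmD3", "SmB", "SmD1", "SmD2", "SmF"]


def beta_interfaces(ring: list[str]) -> list[tuple[str, str]]:
    """Return (beta4 partner, beta5 partner) pairs for a closed ring.

    Assumption: subunit i contributes its beta4 strand and subunit i+1 its
    beta5 strand; the modulo closes the ring (last subunit pairs with first).
    """
    n = len(ring)
    return [(ring[i], ring[(i + 1) % n]) for i in range(n)]


for b4, b5 in beta_interfaces(SM_RING):
    print(f"{b4} beta4 <-> {b5} beta5")
```

A heptamer therefore has seven β4-β5 interfaces, and each subunit contributes exactly one β4 and one β5 strand to them.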
The association of the Sm ring with the U snRNA represents the first phase of snRNP assembly [39]. The stepwise assembly of the Sm-snRNA complex takes place in the cytoplasm () and is mediated by the methylosome complex (consisting of protein arginine N-methyltransferase 5 (PRMT5), methylosome protein 50 (WDR77) and methylosome subunit pICln) and the SMN complex (consisting of Gem-associated proteins Gemin2–8 and SMN1 and 2) [40][41]. SmD1-SmD2 initially forms a dimer bound by pICln, which is assembled onto PRMT5/WDR77, where SmD1 is symmetrically dimethylated. In parallel, SmD3/SmB/B’ is bound by another pICln molecule and recruited to a second PRMT5/WDR77 complex, where SmD3 and SmB are both symmetrically dimethylated. In plants, Sm protein dimethylation by PRMT5 was shown to be required for recruitment of the NineTeen/Prp19 complex (NTC) and therefore for proper functioning of the spliceosome [42]. In addition to PRMT5, PRMT7 was also found to be required for symmetric dimethylation of Sm proteins [43]. Subsequently, the SmF-SmE-SmG trimer binds to SmD1-SmD2-pICln on the methylosome complex and forms the 6S complex [40]. The 6S complex is an Sm ring intermediate in which pICln functions as an Sm protein mimic, reserving space for SmD3-SmB/B’ [40][44]. Of note, pICln was postulated not only to function as a structural chaperone in the formation of the Sm protein ring, but also to prevent aggregation of unassembled Sm proteins [41]. In the final steps of Sm ring assembly, the 6S complex and the SmD3-SmB/B’ complex release their pICln subunits and are loaded onto the SMN complex through interactions with Gemin2, where the final heptameric Sm ring is formed [44][45]. Interestingly, Sm proteins are not the only proteins that adopt the Sm fold, as this structural arrangement was also observed for Gemin6 and Gemin7 in the SMN complex. Gemin6 and Gemin7 form a dimer through their β4 and β5 strands, respectively. The β5 of Gemin6 and β4 of Gemin7 are involved in binding Sm proteins, thereby facilitating the interaction between Sm proteins and the SMN complex [46]. Defects in the association of Sm proteins with the SMN complex can have drastic phenotypic consequences, as malfunction of the SMN complex due to loss of SMN1 leads to spinal muscular atrophy (reviewed by [47]). Additionally, the F22S mutation in SmE disrupts its interaction with the SMN complex and is associated with microcephaly [48]. These SMN-related defects give rise to aberrant splicing patterns.
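The cytoplasmic assembly steps described above can be summarized as a sequence of intermediates. The Python sketch below is a simplified, hypothetical encoding (the `Intermediate` class and `assemble_sm_ring` function are illustrative names). It only tracks which Sm proteins and helper factors are present at each step and ignores stoichiometry, the PRMT7 contribution and the snRNA itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intermediate:
    name: str
    sm: frozenset[str]       # Sm proteins present in this intermediate
    helpers: frozenset[str]  # chaperones / assembly factors bound to it


def assemble_sm_ring() -> list[Intermediate]:
    """Return the simplified series of Sm ring assembly intermediates."""
    # pICln binds SmD1-SmD2; on PRMT5/WDR77, SmD1 is symmetrically dimethylated
    d1_d2 = Intermediate("pICln-SmD1-SmD2", frozenset({"SmD1", "SmD2"}),
                         frozenset({"pICln", "PRMT5", "WDR77"}))
    # In parallel, a second pICln binds SmD3-SmB (both dimethylated)
    d3_b = Intermediate("pICln-SmD3-SmB", frozenset({"SmD3", "SmB"}),
                        frozenset({"pICln", "PRMT5", "WDR77"}))
    # SmF-SmE-SmG joins pICln-SmD1-SmD2: the 6S complex, in which pICln
    # occupies the position later taken by SmD3-SmB
    six_s = Intermediate("6S", d1_d2.sm | {"SmF", "SmE", "SmG"},
                         frozenset({"pICln"}))
    # pICln is released and both subcomplexes are loaded onto SMN via Gemin2,
    # where the heptameric ring closes
    ring = Intermediate("Sm heptamer", six_s.sm | d3_b.sm,
                        frozenset({"SMN complex", "Gemin2"}))
    return [d1_d2, d3_b, six_s, ring]


for step in assemble_sm_ring():
    print(f"{step.name:16} Sm={sorted(step.sm)} helpers={sorted(step.helpers)}")
```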
Gemin5 is an RNA-binding protein (RBP) that is part of the SMN complex and is involved in recruiting the snRNA to the Sm proteins through recognition of the Sm site and the 7-methylguanosine (m7G) cap of the snRNA [49][50][51][52]. The Sm site of U1 snRNA binds Gemin5 with lower affinity than the Sm sites of the other U snRNAs. Instead, U1 snRNP 70 kDa (SNRNP70) in the cytoplasm recruits Gemin2-Sm complexes directly to U1 snRNA, independent of Gemin5 [53]. The Sm proteins are organized around the snRNA in the following consecutive order: SmE, SmG, SmD3, SmB, SmD1, SmD2 and SmF, with each Sm protein directly interacting with a single nucleotide of the Sm site [54]. The Sm site consists of a 5′ adenosine preceding five consecutive uridines (U1 being the exception, with a guanine instead of the fourth uridine), followed by a guanine (U5 and U6 snRNAs being the exceptions, with a uridine or a 2′,3′-cyclic phosphate group, respectively) [54][55][56][57]. The 5′ adenosine and the 2′-OH groups of the sugar backbone of the Sm site are important for stable Sm-snRNA complex formation. Additionally, the nucleotides flanking the Sm site determine the rate of Sm protein assembly onto the snRNA [39]. SmE, SmF and SmG make the initial contact with the Sm site, which is then stabilized by SmD1 and SmD2; of these contacts, the one between SmG and the first uridine is highly conserved [57]. Each Sm protein clamps its corresponding Sm site nucleotide between its loops L3 and L5 [38][54]. The positioning of the Sm ring relative to the Sm site is not the same for every U snRNA. In the human U1 snRNA, the last two nucleotides of the Sm site interact with SmD2 and SmF [54], whereas SmD1 and SmD2 interact with the last two nucleotides of the U4 snRNA Sm site [38]. A study in yeast demonstrated that mutagenesis of conserved Sm motif 1 amino acids in individual Sm proteins did not compromise cell viability, but simultaneous mutation in two Sm proteins was lethal [58]. Thus, six intact RNA-binding sites in the heptameric Sm ring appear sufficient for Sm-snRNA complex formation. During assembly of the Sm ring around the snRNA by the SMN complex, the snRNA cap is hypermethylated by trimethylguanosine synthase 1 (TGS1) to 2,2,7-trimethylguanosine (m3G) [59].
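The Sm site consensus described above lends itself to a simple pattern search. The sketch below is an illustration rather than an established tool: it encodes only the canonical A-U5-G site and the U1 variant with a guanine at the fourth uridine position, ignores flanking-sequence effects and the U5/U6 exceptions, and uses a hypothetical function name, `find_sm_sites`. DNA input (T) is converted to RNA letters (U) first.

```python
import re

# Canonical Sm site A-UUUUU-G, plus the U1 variant with G at the fourth
# uridine (A-UUUGU-G). Flanking context and U5/U6 exceptions are ignored.
SM_SITE = re.compile(r"AUUU[UG]UG")


def find_sm_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched sequence) for each Sm-site-like motif."""
    seq = rna.upper().replace("T", "U")
    return [(m.start(), m.group()) for m in SM_SITE.finditer(seq)]


print(find_sm_sites("GCAAUUUUUGGAG"))    # [(3, 'AUUUUUG')]
print(find_sm_sites("ccaauuugugggacg"))  # [(3, 'AUUUGUG')]
```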
The m3G cap is recognized by snurportin-1, which, together with importin-β [60], transports the fully assembled Sm-snRNA core complexes back into the nucleus, where they are loaded into membrane-less nuclear Cajal bodies. Here, the additional snRNP-specific proteins are loaded onto the Sm-snRNA core complexes and snRNP maturation takes place [61]. snRNAs are 2′-O-methylated and pseudouridylated, guided by small Cajal body-specific RNAs (scaRNAs) [28][29][31]. Additionally, snRNAs can be methylated at the N6 position of 2′-O-methylated adenosine residues, which affects spliceosomal function. For instance, the U6 snRNA is methylated at its 2′-O-methyl adenosine at position 43 by the RNA N6-adenosine-methyltransferase METTL16, and this affects 5′ss recognition [62]. In the U2 snRNA, the adenosine at position 30 is 2′-O-methylated and subsequently N6-methylated by the N(6)-adenine-specific methyltransferase METTL4, which affects 3′ss usage [63]. Both N6-methylated and 2′-O-methylated forms can coexist, as the N6-methylation can be removed by the RNA demethylase alpha-ketoglutarate-dependent dioxygenase FTO [64]. As the adenosines are 2′-O-methylated before they are N6-methylated, it is assumed that N6-methylation also occurs in Cajal bodies upon re-import of the Sm-snRNA complex into the nucleus.
2.1.2. Biogenesis of the Spliceosome: Structure and Assembly of the U1 snRNP
The different snRNAs incorporated in the Sm-snRNA complexes direct the formation of the different U snRNPs of the spliceosome in the Cajal bodies (b and ). The U1 snRNA forms four stem-loop (SL) structures, which are oriented in a Latin cross-like shape, with SL4 representing the stem. The Sm site is located between the stem and the four-helix junction [65]. The U1 snRNA four-helix junction is situated over a flat surface formed by the N-termini of the Sm proteins. The N-terminus of SmD2 is particularly long and extends into the minor groove of the U1 snRNA. SmB interacts with the SL2 backbone [65]. SL1 and SL2 are also bound by the SNRNP70 and U1 snRNP A (SNRPA) proteins, respectively, and SNRNP70 helps to guide the snRNA through the cavity of the Sm ring, together with SmD1 and SmD2 [54]. The U1 Sm core has an additional, unique assembly pathway, in which SNRNP70 plays the key role. As mentioned above, it recruits Gemin2-Sm complexes directly to U1 snRNA, independent of Gemin5. Moreover, SNRNP70 inhibits the formation of other snRNP Sm cores, thereby acting as a regulator of the cell’s snRNP repertoire. This extra U1 Sm core assembly pathway could explain why the U1 snRNP is the most abundant snRNP in vertebrates [53]. Another protein involved in U1 Sm core assembly is the RBP FUS, which associates with U1-related proteins and SMN complexes. Mutations in FUS that are associated with amyotrophic lateral sclerosis (ALS) were found to dysregulate SMN function, leading to reduced snRNA levels and altered splicing patterns [66].
The U1 snRNP is the first snRNP to be recruited to the pre-mRNA to start the splicing reaction. The composition of the U1 snRNP is shown in b. A cryo-EM study of the yeast U1 snRNP revealed a shape resembling a footprint [67]. The U1 snRNP core (the ball of the foot) is composed of the Sm-snRNA complex and the SNRNP70, SNRPA and SNRPC proteins, with SNRNP70 and stem-loops SL1 and SL3 of the U1 snRNA sticking out like toes. The auxiliary area (the heel of the foot) consists of Prp42, Luc7/LUC7L2, Snu56, Nam8/TIA-1 and Prp39/PRPF39. No human homologue of yeast Prp42 has been reported. Instead, in humans, PRPF39 forms a homodimer that interacts with SNRPC, connecting the ball with the heel of the foot and mimicking the Prp39/Prp42 heterodimer observed in yeast. Immunoprecipitation [68] and X-ray crystallography [65][69] studies of the human U1 snRNP revealed that the N-terminal domain of SNRNP70 plays a crucial role in holding the core domain of the U1 snRNP together, specifically through interactions with SmD2 [65][68][69] and SmB/B’ [68]. Moreover, a feedback regulatory mechanism between SNRNP70 and SNRPC has been described that maintains U1 snRNP homeostasis. Specifically, SNRPC promotes alternative splicing of the SNRNP70 transcript through usage of an alternative 3′ss. This introduces a premature termination codon (PTC), yielding an SNRNP70 splice variant that is targeted for nonsense-mediated decay (NMD) and, therefore, decreased SNRNP70 protein expression. This in turn leads to decreased incorporation of SNRPC into the U1 snRNP, restoring proper splicing of SNRNP70 to produce the functional protein [70].
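The SNRNP70/SNRPC loop described above is a negative feedback loop: more SNRNP70 leads to more SNRPC-containing U1 snRNP, which diverts more SNRNP70 pre-mRNA to the NMD-targeted isoform. A toy discrete-time model makes this buffering effect concrete. Every parameter (`synthesis`, `decay`, the half-saturation constant `k_half`) and the saturating relation between SNRNP70 level and SNRPC incorporation are hypothetical assumptions, not measured kinetics.

```python
def simulate_u1_feedback(steps=200, synthesis=1.0, decay=0.1, k_half=5.0,
                         feedback=True):
    """Toy model of the SNRNP70/SNRPC loop; all parameters are hypothetical.

    Returns the functional SNRNP70 level after each time step.
    """
    snrnp70 = 0.0
    history = []
    for _ in range(steps):
        # Assumed: SNRPC incorporation into U1 rises with SNRNP70 (saturating)
        snrpc_in_u1 = snrnp70 / (snrnp70 + k_half)
        # SNRPC-containing U1 drives the alternative 3'ss; that PTC-containing
        # isoform is degraded by NMD, so only the remainder yields protein
        alt_3ss_fraction = snrpc_in_u1 if feedback else 0.0
        snrnp70 += synthesis * (1.0 - alt_3ss_fraction) - decay * snrnp70
        history.append(snrnp70)
    return history


with_fb = simulate_u1_feedback()[-1]
without_fb = simulate_u1_feedback(feedback=False)[-1]
print(f"{with_fb:.2f} vs {without_fb:.2f}")  # 5.00 vs 10.00
```

With these made-up numbers the feedback halves the steady-state SNRNP70 level; the qualitative damping, not the values, is what the sketch is meant to show.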
2.1.3. Biogenesis of the Spliceosome: Structure and Assembly of the Other snRNPs
The U2 snRNP, which is the next spliceosome unit to be recruited, consists of the U2 Sm-snRNA complex, the SF3a and SF3b complexes and additional U2-specific and U2-related proteins. Like U1 snRNA, the U2 snRNA forms four stem-loops in the Sm-snRNA complex, of which SL3 and SL4 are bound by the U2 snRNP A′ (U2-A′) and B″ (U2-B″) proteins. SL2a is contacted by the SF3b complex. The branchpoint-interacting stem-loop (BSL) is located between SL2a and SL1 and is clamped between the HEAT domains of SF3B1. SF3A3 contacts the base of the BSL. The Sm site is located between SL2b and SL3 [71]. The SF3a complex consists of splicing factor 3A subunits 1, 2 and 3 (SF3A1, SF3A2 and SF3A3). SF3A1 facilitates the interaction between the U1 and U2 snRNPs by interacting with the U1 snRNA through its ubiquitin-like (UBL) domain [72]. SF3A1 and the U2-related calcium homeostasis endoplasmic reticulum protein (CHERP) also help recruit the U2 snRNP to the pre-mRNA through their interactions with branchpoint-bridging protein (BBP)/splicing factor 1 (SF1) [73]. The SF3a complex bridges the Sm-snRNA complex with the SF3b complex [71]. The SF3b complex consists of splicing factor 3B subunits 1, 2, 3, 4, 5 and 6 (SF3B1-6) and PHD finger-like domain-containing protein 5A (PHF5A). In this complex, SF3B6 is positioned such that it can bind to the branchpoint sequence (BPS), facilitating BPS recognition by the U2 snRNP. SF3B1 adopts a closed conformation surrounding SF3B6 and serves as a platform for BPS binding together with PHF5A [71][74]. SF3B1 also appears to play a role in guiding the U2 snRNP to the pre-mRNA, as this splicing factor was shown to interact with chromatin at nucleosomes positioned on exons that are to be spliced [75].
While the U1 and U2 snRNPs assemble individually before being recruited into the spliceosome to participate in the RNA splicing reaction, U4, U5 and U6 form a tri-snRNP complex beforehand in two steps: U4 and U6 first assemble into a di-snRNP, to which the U5 snRNP then attaches. U5 snRNA contains one large stem-loop (SL1) and a smaller SL2. The Sm site is located between these loops [76], near the 3′ end [4]. In the mRNA splicing process, SL1 is important for basepairing with the 5′ exon in the pre-mRNA [77]. In yeast, the U5 Sm core serves as a protein-binding platform for the U5-specific proteins Prp8/PRPF8 and Snu114/EFTUD2, which associate through direct interaction with the U5 snRNA and with the Sm ring, respectively [4]. The U4 and U6 snRNAs differ from the U1, U2 and U5 snRNAs in that they are base-paired with each other within the U4/U6 di-snRNP and the U4/U6.U5 tri-snRNP. The U4 snRNA comprises three SLs. Its Sm site is located at the 3′ end [78] and is flanked by SL2 above the flat face of the Sm ring and by SL3 below the tapered side of the Sm ring. The α-helix that makes up the long N-terminus of SmD2 interacts with SL2, and its lysine-rich L4 loop between β3 and β4 interacts with the backbone of SL3 of the U4 snRNA. Moreover, SL2 interacts with SmB and SmG, and SL3 interacts with all Sm proteins except SmG and SmD3 [38]. On either side of SL2, stems 1 and 2 are basepaired with the U6 snRNA [4][78]. The U4/U6.U5 tri-snRNP is cone-shaped, with the U5 snRNP core located at the tip and the U4/U6 di-snRNP in the broader top part, and with the (L)Sm heptamers located at the outer corners of the cone. The following proteins are involved in U4/U6.U5 tri-snRNP assembly. Small nuclear ribonucleoprotein 13 (SNU13) binds to a stem-loop in the U4 snRNA duplexed with the U6 snRNA. Next, pre-mRNA processing factors (PRPF) 31, 3 and 4 are recruited, giving rise to the complete U4/U6 di-snRNP [79]. Prior to U5 snRNP assembly, PRPF8, EFTUD2 and SNRNP200 form an assembly intermediate with protein AAR2 homolog. The actual assembly of the U5 snRNP is supported by heat shock protein 90, the R2TP complex (consisting of RuvB-like 1, RuvB-like 2, PIH1 domain-containing protein 1 and RNA polymerase II-associated protein 3) and zinc finger HIT domain-containing protein 2 [80][81].
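The assembly order described above can be expressed as a dependency graph and linearized with a topological sort. The graph below is a hypothetical simplification of the text: each entry lists what must already be in place, and treating the HSP90/R2TP/ZNHIT2 chaperone system as a prerequisite for U5 assembly is a modeling choice. The sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); `TRI_SNRNP_DEPS` and `assembly_order` are illustrative names.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: node -> set of prerequisites.
TRI_SNRNP_DEPS = {
    "U4/U6 duplex": set(),
    "SNU13": {"U4/U6 duplex"},
    "PRPF31": {"SNU13"},
    "PRPF3": {"SNU13"},
    "PRPF4": {"SNU13"},
    "U4/U6 di-snRNP": {"PRPF31", "PRPF3", "PRPF4"},
    "PRPF8-EFTUD2-SNRNP200-AAR2": set(),
    "HSP90/R2TP/ZNHIT2": set(),
    "U5 snRNP": {"PRPF8-EFTUD2-SNRNP200-AAR2", "HSP90/R2TP/ZNHIT2"},
    "U4/U6.U5 tri-snRNP": {"U4/U6 di-snRNP", "U5 snRNP"},
}


def assembly_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid assembly order (prerequisites always come first)."""
    return list(TopologicalSorter(deps).static_order())


for i, step in enumerate(assembly_order(TRI_SNRNP_DEPS), start=1):
    print(i, step)
```

Several orders are valid (for example, the U4/U6 and U5 branches can be interleaved); the sort only guarantees that no component is listed before its prerequisites.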
The U6 snRNA differs from all other snRNAs in that it is bound not by Sm but by LSm proteins. These bind its 3′ end, which is uridylated by terminal uridylyl transferase 1 [82]. The LSm ring recognizes only the U6 snRNA (and not the other snRNAs) because it is the only snRNA that contains a 3′-terminal U tract [83][84]. In yeast, the 3′ end of U6 snRNA reaches into the ring structure but, unlike the other snRNAs, does not pass through it, thereby interacting with only one side of the ring [85]. The authors speculate that this leaves RNA-binding domains on the other side of the ring accessible to facilitate interactions between the U4 and U6 snRNAs [85]. Indeed, the LSm proteins were shown to facilitate formation of the U4/U6 duplex [83]. LSm proteins share homology with Sm proteins [83], also adopt the Sm fold consisting of a short N-terminal α helix and five anti-parallel β strands [86] and, like the Sm ring, assemble in a stepwise manner. LSm6-LSm5-LSm7 resembles SmF-SmE-SmG but, at least in yeast, forms a hexameric LSm657-657 intermediate, which subsequently incorporates LSm2-LSm3 (resembling SmD1-SmD2) and finally LSm4-LSm8 (resembling SmD3-SmB/B’) to form the nuclear LSm2–8 complex that is incorporated into the U6 snRNP [86]. Similar to SmD3 in the Sm ring, LSm4 is symmetrically dimethylated, which enables its interaction with the SMN complex [87]. As in the Sm ring, the β4 strand of each LSm protein interacts with the β5 strand of the neighboring LSm protein, and the LSm ring is stabilized by hydrophobic interactions through its N-terminal α helices [86].
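The stepwise LSm2–8 assembly mirrors the Sm pathway module by module. The short sketch below records that correspondence as stated in the text and checks that the three modules add up to a heptamer. It is illustrative only: it skips the yeast LSm657-657 hexamer intermediate, and `LSM_MODULES` and `assemble_lsm2_8` are hypothetical names.

```python
# LSm modules in the order they join (yeast), each paired with the Sm module
# it resembles, as stated in the text. The LSm657-657 hexamer is not modeled.
LSM_MODULES = [
    (("LSm6", "LSm5", "LSm7"), ("SmF", "SmE", "SmG")),
    (("LSm2", "LSm3"), ("SmD1", "SmD2")),
    (("LSm4", "LSm8"), ("SmD3", "SmB")),
]


def assemble_lsm2_8() -> list[str]:
    """Add LSm modules one by one and report the Sm counterpart of each."""
    ring: list[str] = []
    for lsm, sm in LSM_MODULES:
        ring.extend(lsm)
        print(f"+ {'-'.join(lsm):15} (cf. {'-'.join(sm)}): {len(ring)} subunits")
    return ring


ring = assemble_lsm2_8()
```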
2.1.4. Dynamic Composition of the Spliceosome: Assembly on the pre-mRNA Substrate
Throughout the RNA splicing cycle, different spliceosome intermediates are formed, termed the E (early), A (pre-spliceosome), B (pre-catalytic), Bact (activated), B* (catalytically activated; for the first transesterification reaction), C (catalytic), C* (catalytically activated; for the second transesterification reaction) and P (post-splicing) complexes (). These intermediates correspond to specific phases of the splicing process and consist of varying compositions of snRNPs and splicing factors, which are described in more detail below. Hence, during the splicing reaction, snRNPs and splicing factors are recruited, rearranged and released in a sequential manner, making the spliceosome a highly dynamic and fluid structure. Many papers have been published over the past decade on the yeast and human spliceosome intermediates, describing their structural properties and protein and RNA components (reviewed by [88][89][90]); their major findings are summarized here. Notably, the genomic architecture of lower eukaryotes such as yeast differs from that of higher eukaryotes such as mammals. The former usually have relatively long exons and short introns; the latter often have short exons and sometimes very long introns. Most fundamental studies into the biology of the spliceosome were done in yeast or using recombinant transcripts with short introns. Therefore, the general description of spliceosome assembly below primarily applies to pre-mRNAs with short introns, following what is known as the intron definition model. The steps that probably differ for transcripts with long introns, according to the postulated exon definition model [91], are mentioned separately.
Figure 3. Pre-mRNA splicing reaction performed by the spliceosome. The dynamic composition of the spliceosome, with its different intermediate complexes, is illustrated. Details are given in the main text. Light and dark blue boxes, exons; line, intron; GU, 5′ splice site; AG, 3′ splice site; YUNAY, branchpoint sequence; Y(n), polypyrimidine tract; M1 and M2, Mg2+ metal ions at the catalytic site.
The early complex E is the first intermediate that can be discerned in the pre-mRNA splicing process. As mentioned above, the intronic sequence that is to be spliced out contains the highly conserved dinucleotides GT and AG at the 5′ss and 3′ss, respectively, which are recognized by the spliceosome. Moreover, the BPS and polypyrimidine tract (PPT) in the intron play crucial roles in the recruitment of splicing factors. The complex E intermediate is formed when the U1 snRNP binds to the 5′ss through basepairing with the 5′ end of its U1 snRNA. SNRNP70, together with SmD3, positions SNRPC, whose zinc-finger domain supports the base-pairing interaction between the 5′ end of the U1 snRNA and the 5′ss on the pre-mRNA substrate [65][69][92]. This recruitment appears to be mediated by RNA polymerase II while it is synthesizing the pre-mRNA, and to depend on members of the SR family of RNA splicing enhancer proteins [93]. In yeast, recognition of the 5′ss was shown to be supported by U1C/SNRPC, Luc7/LUC7L2, Nam8/TIA-1 [94] and Prp39/PRPF39 [95]. SF1 binds to the BPS [94], and the U2 snRNP auxiliary factor 65 kDa subunit (U2AF65) and 35 kDa subunit (U2AF35) are recruited to the PPT and the intronic 3′ss, respectively, of the target pre-mRNA [96][97]. Subsequently, on short introns, the U2 snRNP is recruited through interactions with the U1 snRNP and SF1, replacing SF1 at the BPS. The association of the U2 snRNP is further stabilized by U2AF65 [73]. On long introns, the U2 snRNP is likewise recruited to SF1 and U2AF65 near the 3′ss, but it associates with a U1 snRNP positioned at the downstream 5′ss of the next intron [91]. The ATP-dependent RNA helicase DDX46 is required for the transition from complex E to the pre-spliceosome A, and facilitates conformational changes within the U2 snRNP as well as the interaction between the U2 and U1 snRNPs [71]. DDX46 remodels the U2 snRNA in an ATP-dependent manner, allowing its BSL to bind to the BPS in the intron; the branchpoint adenosine of the YUNAY consensus sequence is bulged out of the resulting duplex, which is important for later catalysis in the splicing reaction. In yeast, Prp39 anchors the U2 snRNP to the U1 snRNP by acting as a bridge between the U1C protein and U2 small nuclear ribonucleoprotein A′ (U2A′). In humans, an interaction between PRPF39 and SNRPC is also observed but is not crucial for complex A formation [98]. This is in line with the exon definition model, in which the recruited U1 snRNP and U2 snRNP participate in splicing of different introns on either side of the exon. For splicing of transcripts with long introns, neighboring exons must be juxtaposed: existing U1 snRNP-U2 snRNP interactions across exons must be broken and new contacts spanning the introns established. This transition is still poorly understood, but it is inhibited by hnRNP I. In the presence of hnRNP I, spliceosome assembly with U1 and U2 snRNPs recruited around exons stalls in an A-like complex [99], showing that the transition occurs prior to U4/U6.U5 tri-snRNP recruitment. Recently, a model for early spliceosome assembly was proposed that unifies the intron definition and exon definition models [94]. Based on cryo-EM analysis of in vitro-assembled complexes E and A, it was concluded that the same structure can be formed across either an intron or an exon. Structural constraints of complexes formed across short exons make it difficult for the U4/U6.U5 tri-snRNP to subsequently join the spliceosome. This is postulated to be a main trigger for remodeling U1 snRNP-U2 snRNP interactions into an intron-spanning complex, allowing further spliceosome assembly [94].
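The sequence elements recognized during complex E formation can be located with a simple scan. The sketch below uses the consensus notation of the Figure 3 legend (GU 5′ss, AG 3′ss, YUNAY branchpoint, Y(n) polypyrimidine tract). The minimum tract length of eight pyrimidines, and the rule of taking the 3′-most tract and the nearest YUNAY upstream of it, are hypothetical heuristics chosen for illustration; they are not how SF1 and U2AF actually select their sites. `scan_intron` is likewise a hypothetical name.

```python
import re

# Consensus motifs from the Figure 3 legend (Y = C or U, N = any base).
BPS = re.compile(r"(?=([CU]U[ACGU]A[CU]))")  # YUNAY; lookahead allows overlaps
PPT = re.compile(r"[CU]{8,}")                # assumed minimum tract length: 8


def scan_intron(intron: str) -> dict:
    """Locate core splicing signals in an intron (0-based positions).

    Assumes the input starts at the 5'ss and ends at the 3'ss.
    """
    seq = intron.upper().replace("T", "U")
    result = {
        "5ss_GU": seq.startswith("GU"),
        "3ss_AG": seq.endswith("AG"),
        "ppt": None,       # (start, end) of the 3'-most pyrimidine tract
        "branch_A": None,  # index of the branchpoint adenosine
    }
    tracts = list(PPT.finditer(seq[:-2]))  # exclude the terminal AG
    if tracts:
        ppt = tracts[-1]
        result["ppt"] = (ppt.start(), ppt.end())
        # Nearest YUNAY upstream of the tract; the branch A is its 4th base
        hits = [m.start(1) for m in BPS.finditer(seq[: ppt.start()])]
        if hits:
            result["branch_A"] = hits[-1] + 3
    return result


intron = "GUAAGU" + "GGAGGA" + "CUAAC" + "GA" + "UUUUCCUUCU" + "CAG"
print(scan_intron(intron))
# {'5ss_GU': True, '3ss_AG': True, 'ppt': (19, 30), 'branch_A': 15}
```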
The pre-catalytic spliceosome, or complex B, is formed when the U4/U6.U5 tri-snRNP is recruited. Through its N-terminus, the U5-specific protein PRPF8 interacts with EFTUD2, DDX23 and the U5 snRNA, which in turn interacts with the pre-mRNA substrate [100]. As was shown in yeast, the C-terminus of Prp8/PRPF8 interacts within U5 with the N-terminal helicase domain of SNRNP200. The C-terminus of SNRNP200 interacts with EFTUD2 and ubiquitin-specific peptidase 39 (USP39). Positioned at the interface of the U4/U6 and U5 snRNPs, USP39 is crucial for the stability of the tri-snRNP [101]. Moreover, USP39 is postulated to keep the SNRNP200 RNA helicase away from the U4/U6 duplex, preventing premature unwinding of the U4/U6 snRNAs and thereby premature catalytic activation of the spliceosome at the pre-catalytic stage [78].
During association of the tri-snRNP with the U2 snRNP, the U1 snRNP places its snRNA between the U4 snRNA and PRPF8, while the U1 SmE and SmG interact with the U5-specific ATP-dependent RNA helicase DDX23 [100]. DDX23 unwinds the U1 snRNA:5′ss duplex and is therefore required for B complex formation, as the U6 snRNP replaces the U1 snRNP at the 5′ss. Mutations in the DDX23 domain involved in ATP hydrolysis stall the spliceosome before complex B formation, in a state in which the U1 snRNP remains associated with the pre-mRNA and the tri-snRNP is not yet stably integrated [102]. The U4 and U6 snRNAs partly form a duplex within the tri-snRNP, which keeps the U6 snRNA in its inactive configuration. EFTUD2 is also involved in the recruitment of the NTC and the NTC-related (NTR) complex. Recruitment of the NTC and NTR induces conformational changes within the snRNPs that are necessary for formation of the active site for the splicing reaction, such as basepairing of the ACAGAGA box of the U6 snRNA with the 5′ss [103][104][105]. As was demonstrated in yeast, pairing of the U6 snRNA with the 5′ss occurs within the embrace of PRPF8 before the U4:U6 duplex is unwound, and represents a checkpoint for proper complex B assembly [105].
2.1.5. Dynamic Composition of the Spliceosome: Activation and Catalytic Steps
Unwinding of the U4/U6 snRNAs represents a checkpoint for complex B activation, which yields complex Bact. To achieve this, USP39 dissociates, which repositions SNRNP200 and induces conformational changes in SNRNP200 that trigger its helicase activity. During catalytic activation, additional conformational changes occur in the tri-snRNP complex. DDX23 migrates from the outer side of the tri-snRNP towards the RNase H domain of PRPF8 in the center of the complex, where 5′ss basepairing is switched from the U1 to the U6 snRNA [78]. This results in the release of the U1 snRNP, thereby preventing a steric clash of this snRNP with SNRNP200 [98]. This transition from U1 to U6 5′ss basepairing is further supported by the U4/U6.U5 tri-snRNP-specific protein SNRNP27, as was demonstrated in C. elegans [106]. Upon the 5′ss transition to the U6 snRNA, SNRNP200 unwinds the U4:U6 snRNA duplex, resulting in dissociation of the U4 snRNP from the spliceosome. This allows the 3′ end of the U6 snRNA to basepair with the 5′ end of the U2 snRNA (forming helix I) and also to form a highly conserved internal stem-loop (ISL) within the U6 snRNA [107]. Meanwhile, PRPF8 rearranges from an open to a closed conformation, as a pocket must be formed to harbor the newly formed U2:U6 duplex and the U5 snRNA SL1, which is necessary to form the active catalytic site [78]. Both helix I and the ISL are involved in the coordination of catalytic metal ions. Overall, SNRNP200, EFTUD2 and PRPF8 are essential for the transition from the pre-catalytic B complex to the activated Bact complex. In this intermediate, the active site, which is cradled by PRPF8, consists of helix I of the U2:U6 duplex, the ISL of the U6 snRNA, five Mg2+ ions and SL1 of the U5 snRNA [4]. SL1 of the U5 snRNA is basepaired with the 5′ exon [77]. Moreover, within the active site, a triplex structure is formed by several nucleotides of the U6 snRNA [77][108][109].
The ATP-dependent RNA helicase-like protein Prp2/DHX16 promotes the transition from the activated Bact complex to the catalytically active B* complex [110] through rearrangement of the U2 snRNP around the U2 snRNA:BPS duplex [77][109]. Moreover, several nucleotides of the U6 snRNA coordinate the Mg2+ ions through their phosphate groups [77][111]. Of the five Mg2+ ions in the active site, two are directly involved in catalysis of the splicing reaction, and the other three fulfill more structural roles [4]. The rearrangements around the U2 snRNA:BPS duplex are supported by the step I factors YJU2 [4][109][111][112][113] and CWC25 [4][113]. One of the two catalytic metal ions (M2) activates the 2′-OH of the BPS adenosine to perform step I of the splicing reaction: a nucleophilic attack on the phosphorus atom of the 5′ss G nucleotide, which breaks the covalent bond between the 5′ exon and the 5′ss. A 2′-5′ phosphodiester bond is formed between the BPS adenosine and the guanine of the 5′ss, resulting in an intron-3′ exon lariat structure and a free 5′ exon, which remains anchored to loop I of U5 snRNA [4][114] and is stabilized by Prp8 [112]. This represents the complex C spliceosome intermediate [114][115].
The transition from step I complex C to the step II catalytically activated C* complex is facilitated by the ATP-dependent RNA helicase DHX38, which triggers the release of step I factors [116] and a conformational change in the Prp8-encapsulated active site, leading to the replacement of the lariat by the 3′ss at the active site [112]. Docking of the 3′ss into the active site is stabilized by SLU7 [113][116]. During the second transesterification reaction, supported by Prp8/PRPF8, Prp17 and Prp18, the 3′-OH of the 5′ exon performs a nucleophilic attack on the phosphate at the 3′ss, between the intron and the 3′ exon [114]. This results in a covalent bond between the two exons and an intron lariat that is still bound by spliceosomal components: the post-splicing complex P.
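The net outcome of the two transesterification steps can be traced at the sequence level. The sketch below treats RNA as plain strings, so it cannot represent the chemistry itself (the 2′-5′ branch linkage, the Mg2+ ions or the hydroxyl groups); it only shows which segments end up in the mRNA and which in the lariat. The function names `step1` and `step2`, the example sequences and the branchpoint index are hypothetical.

```python
def step1(exon1: str, intron: str, exon2: str, branch: int):
    """Step I: the branch A 2'-OH attacks the 5'ss; the 5' exon is freed.

    Returns the free 5' exon and the lariat-intron-3' exon intermediate as
    (loop, tail, exon2). The loop runs from the intron's 5' G to the branch A
    and is closed by a 2'-5' bond, which a plain string cannot represent.
    """
    if not intron.startswith("GU") or intron[branch] != "A":
        raise ValueError("expected a GU 5'ss and an A at the branch index")
    loop, tail = intron[: branch + 1], intron[branch + 1 :]
    return exon1, (loop, tail, exon2)


def step2(free_exon1: str, intermediate):
    """Step II: the 5' exon 3'-OH attacks the 3'ss; the exons are ligated."""
    loop, tail, exon2 = intermediate
    if not tail.endswith("AG"):
        raise ValueError("expected an AG 3'ss at the end of the intron")
    return free_exon1 + exon2, (loop, tail)  # spliced mRNA, excised lariat


free_5_exon, lariat_intermediate = step1(
    "AUGCAG", "GUAAGUCUAACUUUCCUUUCAG", "GUCUGA", branch=9
)
mrna, lariat = step2(free_5_exon, lariat_intermediate)
print(mrna)    # AUGCAGGUCUGA
print(lariat)  # ('GUAAGUCUAA', 'CUUUCCUUUCAG')
```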
The exon junction complex (EJC) is deposited on the ligated exons and connects splicing to downstream mRNA processes such as export, translation and NMD [117][118]. The ligated exons are bound by U5 snRNA loop I. The ATP-dependent RNA helicase Prp22/DHX8 pulls the 3′ end of the ligated exons out of the spliceosome, releasing the spliced mRNA and giving rise to the intron lariat spliceosome intermediate, whereas the intron lariat is released by the ATP-dependent RNA helicase Prp43/DHX15 [3]. In a final step, Prp43/DHX15 releases the U2, U5 and U6 snRNPs and the NTC and NTR from the intron lariat, facilitating the recycling of these spliceosome components for the next splicing reaction.
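The sequence of intermediates described in this section can be summarized as a linear state machine. In the sketch below, each transition is labeled with a factor the text associates with it. This is a deliberate simplification, since many more proteins act at each step. The abbreviation ILS for the intron lariat spliceosome is introduced here only for brevity, and `TRANSITIONS` and `run_cycle` are hypothetical names.

```python
# (from, to, factor(s) the text links to the transition), simplified.
TRANSITIONS = [
    ("E", "A", "DDX46"),
    ("A", "B", "U4/U6.U5 tri-snRNP joins; DDX23"),
    ("B", "Bact", "SNRNP200 unwinds U4/U6 (with PRPF8, EFTUD2)"),
    ("Bact", "B*", "Prp2/DHX16"),
    ("B*", "C", "step I chemistry (YJU2, CWC25)"),
    ("C", "C*", "DHX38"),
    ("C*", "P", "step II chemistry (SLU7, Prp8, Prp17, Prp18)"),
    ("P", "ILS", "Prp22/DHX8 releases the mRNA"),
    ("ILS", "disassembled", "Prp43/DHX15 releases lariat and snRNPs"),
]
NEXT = {src: (dst, factor) for src, dst, factor in TRANSITIONS}


def run_cycle(start: str = "E") -> list[str]:
    """Follow the transitions from `start` until no further step is defined."""
    state, path = start, [start]
    while state in NEXT:
        state, factor = NEXT[state]
        path.append(state)
        print(f"-> {state:13} via {factor}")
    return path


path = run_cycle()
```

In the cell, the released snRNPs re-enter the cycle; the sketch ends at disassembly to keep the walk finite.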