RNAs transmit information from DNA to encode proteins that perform all cellular processes and regulate gene expression in multiple ways. From the time of synthesis to degradation, RNA molecules are associated with proteins called RNA-binding proteins (RBPs). The RBPs play diverse roles in many aspects of gene expression including pre-mRNA processing and post-transcriptional and translational regulation. In the last decade, the application of modern techniques to identify RNA–protein interactions with individual proteins, RNAs, and the whole transcriptome has led to the discovery of a hidden landscape of these interactions in plants. Global approaches such as RNA interactome capture (RIC) to identify proteins that bind protein-coding transcripts have led to the identification of close to 2000 putative RBPs in plants.
DNA, the genetic blueprint of all organisms, controls all life processes through intermediate RNA molecules that dictate the types and levels of proteins made in cells. From biogenesis to degradation, RNA molecules are associated with many proteins. RNA–protein interactions are numerous and widespread, and they play biologically important roles in all organisms in many processes associated with gene regulation, including generation of coding and non-coding RNAs; transport, translation, and decay of RNAs; and control of diverse processes associated with development and disease. The proteins that interact with RNAs are collectively referred to as RNA-binding proteins (RBPs), a diverse class of proteins characterized by the presence of one or more RNA-binding domains, usually alongside other catalytic or functional domains. Over 1800 candidate RBPs have been identified in plants, with over 800 enriched as RBPs in Arabidopsis [1]. Plant RBPs play diverse roles in growth, development, genome organization, stress response, immunity, mRNA processing, and post-transcriptional regulation [1][2][3][4][5][6][7].
RBPs rely on their RNA-binding domains to carry out their biological functions. Several of these classes of domains have been characterized, most notably the RNA-Recognition Motif (RRM), DEAD-box helicases, zinc finger domains, the K homology (KH) domain, the glycine-rich domains, the pentatricopeptide repeats (PPRs), and pumilio/fem-3 binding factors (PUFs) [1][7]. The RNA-binding domains allow RBPs to regulate many different processes: pre-mRNA splicing, pri-miRNA and pre-miRNA processing, polyadenylation, nuclear export, RNA stability, translation, RNA editing, etc. [2]. Moreover, many proteins identified as candidate RBPs lack classical RNA-binding domains, and there is even a high prevalence of metabolic enzymes identified as the RNA-interacting proteins, underscoring the complexity of RNA–protein interactions and the current gaps in understanding [8]. This is in accordance with results from mammalian mRNA-interactome studies, which revealed 23 distinct metabolic enzymes as RBPs [9]. Thus, it has been hypothesized that the moonlighting of metabolic enzymes as RBPs forms a regulatory link between cellular metabolism and RNA fate, known as the RNA-enzyme-metabolite (REM) hypothesis [9][10].
RNA-binding proteins have become a target of great interest in recent years, and many new methodologies have been developed to analyze the RNA–protein interactome. However, in plants, most research in this field until about a decade ago relied entirely on indirect or in vitro methods to identify RNA–protein interactions, such as gel shift assays, mutant and knockout screening, nucleic acid-binding assays, and other classical genetic and cell biological techniques [2][11][12][13]. These techniques have contributed significantly to understanding the functions of RBPs in plant biology (see Section 3) but have since been superseded by high-throughput and global methods for analyzing RNA–protein interactions. These new techniques were developed first in mammalian systems, and a few have been used increasingly in plants. Below, we briefly describe these methods and their limitations, especially with respect to applying them in plants.
These techniques fall into three categories: (i) approaches that focus on identifying RNA targets of a candidate RBP, i.e., protein-to-RNA, (ii) approaches that focus on identifying the proteins interacting with an RNA of interest, i.e., RNA-to-protein, and (iii) global approaches (Figure 1). The vast majority of work that has been done in this field in plants has focused on the interacting partners of a single RNA or protein of interest (the bait), but recently the development of RNA-interactome capture (RIC) and its application to plants has allowed a global view of the plant RBPome.
Among the first techniques developed to identify direct targets of RBPs in vivo was RNA immunoprecipitation or RIP [15][16] (Table 1). The basic idea of the RIP approach is simple and involves the use of an antibody against a protein of interest (Figure 1, RIP-seq). The lysate of cells expressing the protein of interest is incubated with antibody immobilized on beads, which are then washed and the proteins on the beads digested. The pool of RNA remaining is used to identify putative binding RNA targets. With the development of high throughput sequencing technologies, methodologies that used such sequencing platforms became known as RIP-seq [17].
RIP can also involve RNA–protein crosslinking, creating covalent bonds between the protein and its RNA ligands. Reversible crosslinking is accomplished using formaldehyde and reversed via heat treatment [18]. The drawbacks of this approach are that the specificity of the results depends on the strength of the antibody–protein interaction, and that formaldehyde treatment also catalyzes DNA-protein and protein–protein crosslinking, leading to the identification of indirect as well as direct targets of an RBP.
Crosslinking and immunoprecipitation (CLIP) builds on RIP by replacing formaldehyde crosslinking with UV crosslinking to covalently link proteins with RNA molecules within several angstroms distance (i.e., bound by the protein) (Table 1). The RNA–protein complexes are selected after cell lysis using immunoprecipitation [19]. Partial digestion of the bound RNA allows a rough approximation of the binding site and is followed by radioisotope labeling of the complexes by phosphorylation. The covalently bound RNA–protein complex is then rigorously washed, separated via SDS-PAGE, and transferred to a nitrocellulose membrane. The protein is then removed using proteinase K, linkers are ligated to the collected RNA fragments, and the fragment library is cloned after reverse transcription and then sequenced (Figure 1, CLIP-seq). There are many derivative techniques based on the basic CLIP-seq principle. High-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP) replaces traditional sequencing, which limited the richness of data a CLIP experiment could generate, with high-throughput sequencing, allowing more data to be extracted from CLIP fragment libraries (Table 1). This allowed the identification of over 1000-fold more unique binding sites compared to CLIP with traditional sequencing techniques, although it creates the opposite problem: a plethora of data in which signal must be discerned from noise [20]. CLIP-based methods were further improved with the advent of CLIP experiments using photoactivatable ribonucleosides (PAR) to enhance the efficiency of crosslinking. PAR-CLIP incorporates 4-thiouridine into transcripts in vivo, which forms covalent bonds with interacting proteins under UV far more efficiently than random UV RNA–protein crosslinking; the approach improved RNA recovery 100- to 1000-fold [21].
Thus far, CLIP techniques were limited by the fact that reverse transcriptase often terminates prematurely when met with a residual amino acid covalently bound to a nucleotide at a crosslinking site causing such reads to be lost during standard CLIP library preparation. Individual-nucleotide resolution CLIP (iCLIP) was developed to compensate for this problem [22]. iCLIP captures truncated cDNAs using a cDNA self-circularization step in place of the previously used inefficient RNA ligation step in library preparation [22]. CLIP experiments also suffered from high experimental failure rates due to their technical complexity, and enhanced CLIP (eCLIP) was developed to address these issues. eCLIP decreases the amount of amplification necessary and uses random-mer barcode adapters ligated at the termination site of reverse transcriptase (the UV crosslinked nucleotide) to maintain analysis of RBP binding sites. Furthermore, the protocol omits the radiolabeling step and uses a size-matched control without immunoprecipitation to eliminate non-specific RNA interactions from the datasets [23].
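The truncation logic that iCLIP exploits can be sketched in a few lines of Python. The sketch assumes the common analysis convention that the crosslinked nucleotide lies one position upstream of the 5' end of a truncated cDNA read, and that alignments are given as 0-based, half-open intervals; the function names and the tuple layout are hypothetical and are not taken from any published iCLIP pipeline.

```python
from collections import Counter

def crosslink_position(start, end, strand):
    """Infer the crosslinked nucleotide (0-based) from one aligned read.

    start/end are 0-based, half-open reference coordinates of the alignment.
    Assumption: the cDNA truncated at the crosslink, so the crosslinked
    nucleotide is the one immediately upstream of the read's 5' end.
    """
    if strand == "+":
        return start - 1   # 5' end is at `start`; upstream is start - 1
    if strand == "-":
        return end         # 5' end is at `end - 1`; upstream (on the transcript) is `end`
    raise ValueError(f"unknown strand: {strand!r}")

def count_crosslinks(reads):
    """Count inferred crosslink events per (chrom, position, strand).

    `reads` is an iterable of (chrom, start, end, strand) tuples.
    """
    counts = Counter()
    for chrom, start, end, strand in reads:
        counts[(chrom, crosslink_position(start, end, strand), strand)] += 1
    return counts

# Hypothetical example reads
example_reads = [
    ("chr1", 100, 130, "+"),
    ("chr1", 100, 125, "+"),
    ("chr1", 50, 80, "-"),
]
example_counts = count_crosslinks(example_reads)
```

Positions with many independent truncation events are candidate binding sites; real analyses additionally remove PCR duplicates and compare against background.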
The CLIP technique was also simplified by the development of simplified CLIP (sCLIP), which avoids radiolabeling by biotinylating the RNA for visualization, and uses polyadenylation and random-mer barcoding to uniquely identify RNAs and reduce the requirement for PCR amplification [24]. Another technique designed to avoid the use of radiolabeling, termed irCLIP for its use of an infrared dye, also used biotin labeling of the RNA—a biotinylated and infrared dye-conjugated 3’ adapter was ligated to the RNA, allowing visualization of RNA–protein complexes without autoradiography [25]. irCLIP allows the use of 250 times less starting material compared to iCLIP, and although comparisons were not performed with eCLIP or sCLIP, it seems likely that irCLIP lowers the starting material requirement most significantly.
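The role of random-mer barcodes in reducing PCR amplification artifacts can be illustrated with a minimal Python sketch. It assumes each aligned read carries the random-mer sequence it was ligated to, and it treats reads sharing the same alignment start, strand, and barcode as copies of one original molecule. The record layout and function name are hypothetical; published eCLIP and sCLIP pipelines may also tolerate sequencing errors within the barcode, which this sketch does not.

```python
def collapse_pcr_duplicates(reads):
    """Keep the first read seen for each (chrom, start, strand, barcode) key.

    `reads` is a list of dicts with keys 'chrom', 'start', 'strand', 'barcode'.
    Reads with identical keys are assumed to be PCR copies of one molecule.
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read["chrom"], read["start"], read["strand"], read["barcode"])
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique

# Hypothetical example: two PCR copies of one molecule plus one distinct molecule
example_reads = [
    {"chrom": "chr1", "start": 500, "strand": "+", "barcode": "ACGTAC"},
    {"chrom": "chr1", "start": 500, "strand": "+", "barcode": "ACGTAC"},
    {"chrom": "chr1", "start": 500, "strand": "+", "barcode": "TTGACA"},
]
deduplicated = collapse_pcr_duplicates(example_reads)
```

Two reads at the same position with different barcodes are kept as independent binding events, which is what allows quantification despite amplification.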
With the advent of HITS-CLIP, many computational tools were developed to handle the large datasets these experiments produce. One of the most widely used is dCLIP, a program created to compare differential binding across CLIP experiments [26]. dCLIP normalizes CLIP-seq data from different experiments using an MA plot, an application of the Bland-Altman plot, then uses a hidden Markov model to detect shared or distinct binding sites across experiments. dCLIP has the advantage of being a universal computational tool for all types of CLIP-seq experiments (HITS-CLIP, PAR-CLIP, and iCLIP) and of allowing comparison among them [26].
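The MA transformation underlying this kind of normalization is simple to state: for read counts a and b in the same genomic bin from two experiments, M is the log ratio and A is the mean log intensity. The Python sketch below only computes these two values; the pseudocount and function name are assumptions made for illustration, and the sketch does not reproduce dCLIP's actual normalization fit or its hidden Markov model.

```python
import math

def ma_values(count_a, count_b, pseudocount=1.0):
    """Return (M, A) for one bin given read counts from two experiments.

    M = log2(a) - log2(b)        (difference between experiments)
    A = (log2(a) + log2(b)) / 2  (average signal strength)
    The pseudocount (an assumption here) avoids log2(0) for empty bins.
    """
    log_a = math.log2(count_a + pseudocount)
    log_b = math.log2(count_b + pseudocount)
    return log_a - log_b, 0.5 * (log_a + log_b)

# Hypothetical bin with 3 reads in experiment A and 1 read in experiment B
m_value, a_value = ma_values(3, 1)
```

Bins with M near zero are bound similarly in both experiments, whereas bins with large positive or negative M are candidates for differential binding once systematic trends in M across A have been normalized away.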
CLIP-Seq and its derivatives are powerful techniques but have significant limitations. Namely, CLIP (and its derivative methodologies) are all limited by their reliance on the antibody–antigen interaction; this limits the stringency of washing conditions to those that will not disrupt the antibody–antigen interaction [27]. Thus, the acquisition of a high-affinity antibody is critical for such experiments, and generally cannot be guaranteed. Even meaningful CLIP experiments contain significant noise in the form of proteins that were not eluted under the weak washing conditions, or in the form of proteins that co-immunoprecipitated [28]. Furthermore, CLIP relies on radiolabeling of bound RNA, a prohibitive procedure due to its cost, difficulty, and health hazards [29]. Finally, because of the low efficiency of CLIP techniques, they require large amounts of starting material, on the order of thousands of cells. This means that studies of RNA–protein complexes in specific cell types (which cannot be amassed in the thousands) are forced to use starting material of a mixed population of cell types, lowering the signal to noise ratio in their results [30]. Other techniques, discussed below, have been developed to avoid these limitations.
UV-crosslinking and affinity purification (uvCLAP) was developed as a radiolabeling- and immunoprecipitation-free alternative to CLIP methodologies [29] (Figure 1, uvCLAP). Instead of using an antibody–antigen interaction, uvCLAP relies first on the tight interaction of the His6-biotinylation sequence-His6 (HBH) tag with beads that bind polyhistidine-tagged proteins, and then on its even more stringent interaction with streptavidin beads. The RNA is partially digested with RNase I and the RNA ends are repaired with T4 polynucleotide kinase. Adapters are then ligated to the RNA fragments, which are reverse transcribed with barcoded primers. The cDNA products are then separated on a polyacrylamide gel, circularized to capture truncated cDNA products (as in iCLIP), linearized, and amplified with PCR.
The use of tandem affinity purification in this approach allows confidence that pulldown efficiency will be similar across conditions, experiments, and laboratories, in comparison with immunoprecipitation approaches in which every antibody–antigen interaction has a unique affinity [29]. uvCLAP also allows the quantification of nonspecific background noise, increasing its specificity. The drawback of this approach is the need for a genetic transformation with an HBH-fused construct prior to affinity purification. Although it is relatively unlikely when done carefully, such transformations could potentially alter RNA–protein interactions from their natural state. Moreover, this introduces extra steps for each RNA-binding protein studied; the significance of this drawback will depend entirely on the ease of genetic transformation in the model system being used.
The TRIBE (targets of RNA-binding proteins identified by editing) and HyperTRIBE approaches were developed in response to the severe limitations of CLIP-based techniques in identifying cell type-specific RNA–protein interactions [30]. TRIBE was developed first; it uses the RNA-editing enzyme ADAR (adenosine deaminase acting on RNA), which converts adenosines to inosines that are read as guanosines during sequencing, leaving telltale signals in edited RNA (Figure 1, TRIBE/HyperTRIBE). In this approach, ADAR’s double-stranded RNA-binding motifs are replaced with the sequence of an RNA-binding protein of interest to create a fusion protein that targets ADAR’s RNA-editing activity to the RNA targets of the fused RBP. The RNA is sequenced, and detection of editing events indicates the binding of the fusion protein, and thus the RBP of interest.
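Because editing events appear in sequencing reads as A-to-G mismatches relative to the reference, the detection step can be sketched as a simple filter over per-position base counts. The Python example below is a minimal illustration: the pileup layout, the coverage and editing-fraction thresholds, and the function name are all assumptions, and a real analysis would also subtract sites found in a control sample (for example, cells without the fusion protein) to exclude SNPs and endogenous editing.

```python
def editing_sites(pileup, min_coverage=20, min_fraction=0.1):
    """Return candidate A-to-G editing sites from per-position base counts.

    `pileup` maps (chrom, pos) -> (ref_base, {"A": n, "C": n, "G": n, "T": n}),
    with bases reported on the transcript's sense strand.
    Thresholds are illustrative assumptions, not published defaults.
    """
    sites = []
    for (chrom, pos), (ref_base, base_counts) in sorted(pileup.items()):
        if ref_base != "A":
            continue  # only reference adenosines can be ADAR targets
        coverage = sum(base_counts.values())
        if coverage < min_coverage:
            continue  # too few reads to estimate an editing fraction
        fraction = base_counts.get("G", 0) / coverage
        if fraction >= min_fraction:
            sites.append((chrom, pos, fraction))
    return sites

# Hypothetical pileup: one edited adenosine, one unedited, one non-A position
example_pileup = {
    ("chr2", 1000): ("A", {"A": 30, "G": 10}),
    ("chr2", 1005): ("A", {"A": 40, "G": 0}),
    ("chr2", 1010): ("C", {"C": 25, "G": 15}),
}
candidates = editing_sites(example_pileup)
```

Transcripts carrying one or more such sites in the fusion-expressing sample, but not in the control, would be reported as putative targets of the RBP.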
The original TRIBE technique had the opposite problem to most CLIP experiments: it identified only about 25% of the target RNAs identified by CLIP techniques for the same RBP and is thought to suffer from false negatives rather than CLIP’s false positives. ADAR’s editing rate was found to be low because of its sequence preference for UAG and its requirement for a double-stranded structure surrounding the edited adenosine [31].
To compensate for these weaknesses, hyperTRIBE was developed by introducing the E448Q mutation in ADAR, which lowers ADAR’s sequence and structure preferences and increases editing efficiency [31]. This mutation increased the number of detected editing events by over 20 times, while increasing the number of detected edited transcripts by 8 times. HyperTRIBE is able to identify about two-thirds of CLIP-identified target RNAs.
This approach has the advantages of avoiding the use of immunoprecipitation and radiolabeling, requiring only a small amount of starting material, and being simple. Like uvCLAP, however, it also requires genetic transformation and, in comparison to both uvCLAP and CLIP techniques, has the drawback of providing no information on the specific binding site on the RNA (as ADAR edits sites within up to 500 nucleotides of known CLIP sites). CLIP remains the method of choice if information about an RBP’s binding site on an RNA is desired, whereas HyperTRIBE is preferable for studying RNA–protein complexes in specific cell types or when only small amounts of starting material are available [31].
Among the methods described above, the only protein-to-RNA techniques that have been used in plants to-date are RIP-seq and CLIP-seq. As discussed below, the application of these techniques to several RBPs has revealed their role in several processes.
RIP-seq was used to demonstrate that the Arabidopsis Serine/Arginine-rich (SR) protein SR45 directly or indirectly associates with over 4000 RNAs in vivo, regulating constitutive and alternative splicing, post-splicing processing of 30% of ABA signaling genes, and over 300 intron-less RNAs [32] (Table 2). This indicates that SR45 exerts multimodal influence over mRNA processing, differentially regulating intron-containing and intron-less RNAs. The action of SR45 is defined by cis-elements in its RNA targets; four motifs were identified, two of which bear the hallmarks of exonic splicing regulators and two of which showed peaks in the intronic regions of 5’ and 3’ splice sites. One of these motifs (M1; GAAGAA) was also found to be enriched in SR45’s intron-less targets [32]. Another study found 1812 RNAs associated with SR45, 81 of which were subject to alternative splicing mediated by the GGNGG motif in both activation and repression of splicing events [33]. These results further define SR45 as a splicing regulator whose activity cannot be easily classified as positive or negative, possibly because SR45 itself is alternatively spliced and its splice isoforms display differential expression. SR45 produces two splice isoforms, SR45.1 (long) and SR45.2 (short); the long isoform acts as a positive regulator of the salt stress response in Arabidopsis [34]. In rice, SR45 is stabilized through interactions with an immunophilin (OsFKBP20-1b), which plays an essential role in positive regulation of transcription and splicing of stress response genes during abiotic stress [35]. THO2, a member of the Transcription-Export (THO/TREX) complex, was shown via RIP to participate in the generation of microRNAs; THO2 mutants showed both a decrease in miRNA accumulation and alterations in the splicing patterns of SR proteins, suggesting that the THO/TREX complex plays a role in alternative splicing [36].
RBP | Plant System | Method | Number of RNA Targets | References |
---|---|---|---|---|
AGO4 | Arabidopsis thaliana | RIP | 2 | Wierzbicki et al., 2009 |
AtGRP7 | Arabidopsis thaliana | RIP-seq/iCLIP | 452/858 | Streitner et al., 2012; Meyer et al., 2017 |
AtNSRa | Arabidopsis thaliana | RIP-seq | >2000 | Bardou et al., 2014; Bazin et al., 2018 |
AtNSRb | Arabidopsis thaliana | RIP-seq | >2000 | Bardou et al., 2014; Bazin et al., 2018 |
CPsV 24K (viral) | Nicotiana benthamiana | RIP | 2 | Marmisolle et al., 2018 |
CPsV 24K (viral) | Nicotiana benthamiana | RIP | 2 | Marmisolle et al., 2018 |
CSP1 | Arabidopsis thaliana | RIP-chip | >6000 | Juntawong et al., 2013 |
IDN2 | Arabidopsis thaliana | RIP | 1 | Zhu et al., 2013 |
FCA | Arabidopsis thaliana | RIP | 1 | Tian et al., 2019 |
HLP1 | Arabidopsis thaliana | HITS-CLIP | >5000 | Zhang et al., 2015 |
KTF1 | Arabidopsis thaliana | RIP | 1 | He et al., 2009 |
NSF | Oryza sativa | RIP | ? | Tian et al., 2020 |
PUMPKIN | Arabidopsis thaliana | RIP-seq | 5 | Schmid et al., 2019 |
PDM1 | Arabidopsis thaliana | RIP | 1 | Yin et al., 2012 |
Rab5a | Oryza sativa | RIP | ? | Tian et al., 2020 |
RBP-L | Oryza sativa | RIP | ? | Tian et al., 2020 |
RBP-P | Oryza sativa | RIP | ? | Tian et al., 2020 |
SR45 | Arabidopsis thaliana | RIP-seq | >4000/>1800 | Xing et al., 2015; Zhang et al., 2017 |
THO2 | Arabidopsis thaliana | RIP | 6 | Francisco-Mangilet et al., 2015 |
RIP was used to show that the glycine-rich RBP AtGRP7 modulates alternative splicing in Arabidopsis [37]. A later study using both RIP-seq and iCLIP found 452 (RIP-seq) and 858 (iCLIP) RNA targets of AtGRP7 [38]. AtGRP7 alters the circadian regulation of its targets and seems to act in both alternative splicing and alternative polyadenylation (APA) [38] (Table 2).
Nuclear speckle RNA binding proteins (NSRs) have also been shown via RIP-seq to regulate mRNA processing, alternative splicing, and long noncoding RNA (lncRNA) prevalence [39] (Table 2). An NSR and an alternative splicing competitor (ASCO) lncRNA were shown to form a regulatory module of alternative splicing, in which the ASCO displaces an alternative splicing target from an NSR complex to modulate alternative splicing during development [39]. NSRs affected alternative splicing of hundreds of genes in Arabidopsis, and RIP-seq of an NSRa fusion protein showed that lncRNAs are also targets of NSRs, likely modulating their alternative polyadenylation or splicing as observed with the COOLAIR lncRNA to regulate cross-talk between auxin and immune response [40].
HITS-CLIP was used to identify genome-wide targets of HLP1, an hnRNP A/B protein that binds preferentially to A- and U-rich elements around cleavage and polyadenylation sites of transcripts involved in RNA metabolism and flowering to target APA [41] (Table 2). HLP1 suppresses Flowering Locus C (FLC) to release repression of flowering in Arabidopsis and control reproductive timing [41]. NSR knockout mutants showed modified APA and differential expression of the COOLAIR lncRNAs, which are produced from antisense transcripts generated from FLC and function in the release of repression of flowering through suppression of FLC [40].
Using RIP-seq, the pentatricopeptide repeat protein PDM1 was shown to mediate cleavage of a transcript from polycistronic to monocistronic fragments in chloroplasts of Arabidopsis [42] (Table 2).
In rice, RIP-seq was used to show that RNA-binding protein-P (RBP-P) plays a role in endosomal trafficking of glutelin and prolamine mRNAs, anchoring the RBP-bound mRNAs to the endosome via a quaternary complex and transporting them to the ER subdomain for translation, thereby co-opting endosomal trafficking [43] (Table 2). RBP-L, an interacting partner of RBP-P, likely plays a coordinating role in subcellular trafficking of its mRNA targets, mediated by their 3’ UTRs [44] (Table 2).
In Arabidopsis, a unique combination of RIP and microarray approaches (RIP-Chip) was used to demonstrate that the cold shock protein 1 (CSP1) acts as an RNA chaperone of polysomes to improve the translation of RNA targets at low temperatures [45] (Table 2).
The RNA-directed DNA methylation effector KTF1 was identified via RIP as an RBP that binds Pol V scaffold transcripts to recruit argonaute 4 (AGO4) and its siRNAs for chromatin remodeling-mediated gene silencing [15] (Table 2). AGO4 and RNA polymerase V cooperate with 24 nt siRNAs in this process; siRNAs bound to AGO4 guide AGO4 to target loci through complementary base-pairing with nascent Pol V transcripts, where AGO4 recruits DNA modification factors such as the DNA methyltransferase DRM2 to methylate the chromatin and thus silence the affected genes [15][45]. Based on RIP observations that the protein INVOLVED IN DE NOVO 2 (IDN2) is a lncRNA-binding protein that interacts with the SWItch/Sucrose Non-Fermentable (SWI/SNF) nucleosome remodeling complex, lncRNAs are thought to base-pair with siRNAs bound by AGO4 to position the SWI/SNF complex and thus target nucleosome remodeling, leading to decreased transcription by Pol II [46][47] (Table 2). RIP was also shown to be usable in Arabidopsis for the detection of lncRNAs generated by specialized polymerases [48].
A modified RIP-seq assay was developed for the detection of RNAs of heterologous origin in plants and applied to transiently expressed nuclear epitope-containing proteins in Nicotiana benthamiana, but to-date this method has not been used for its intended purpose of detecting viral RNAs in plant cells [49].
The plastid UMP kinase (PUMPKIN) has been shown via RIP-seq to associate with several RNAs in vivo, thereby altering their metabolism [50] (Table 2). This suggests that while PUMPKIN is primarily a metabolic enzyme, it may have a moonlighting function as an RBP, potentially for the purpose of coupling RNA and pyrimidine metabolism [50].
Despite the breadth of techniques available for use in elucidating RNA–protein interactions in vivo, CLIP and its derivatives remain the most tenable non-global approach for use in plants. RIP has also been used extensively and is suitable for certain experimental purposes. There still remain several techniques used in other organisms to probe interactions between a protein of interest and RNAs that have yet to be successfully adapted, or even tried, in plants. These are opportunities for advancement in plant RNA biology, but if adapted into plants should be modified to include the best features and optimizations of the already-proven RIP and CLIP approaches.
Several of these techniques show particular promise. TRIBE, and particularly HyperTRIBE, have not yet been used in plants, but if viable they would overcome the signal-to-noise problems inherent in CLIP. HyperTRIBE outperforms CLIP when using a small amount of starting material, such as a few cells of homogeneous origin. Unfortunately, techniques used to select cells of a single type from a heterogeneous sample in mammalian systems, such as flow cytometry, are not tenable in plants without significantly altering the cell state (i.e., generating protoplasts by degrading the cell wall) [51]. Laser microdissection of plant tissues seems the most viable route for selecting cells of a particular type in plants, and HyperTRIBE would allow the use of smaller amounts of starting material than were previously used for RNA-Seq after laser microdissection [52]. Focus on single cell types is a necessary next step for plant biology to throw off the albatross of whole-plant and tissue heterogeneity, and HyperTRIBE combined with laser microdissection would represent progress toward that goal in the field of RBPomics (Figure 2). However, laser microdissection requires a somewhat longer time between sample harvesting and freezing because of the fixation step, which could result in increased RNA degradation after harvesting. Even so, transcriptional profiling has been performed successfully using cells harvested via this technique [52].
RNA antisense purification mass spectrometry (RAP-MS) is a technique used to purify long noncoding RNAs and their interacting proteins with complementary, tiled, biotinylated DNA probes bound to magnetic streptavidin beads [53] (Figure 1, ChIRP-/RAP-MS). RAP-MS starts with UV crosslinking of RNA to interacting proteins in vivo. The crosslinked RNA–protein complexes are then extracted under denaturing conditions to disrupt non-covalent interactions, and the complexes are hybridized with ~120 nt biotinylated DNA probes bound to magnetic beads. After washing, the RNA is digested, and the protein pool is analyzed using mass spectrometry (MS). This method also uses stable isotope labeling by amino acids in culture (SILAC) to label proteins, allowing quantitative comparisons to be made with mass spectrometry [54].
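The probe design described above, in which DNA probes complementary to the target RNA are tiled along its length, can be sketched as follows. The 120 nt probe length follows the text; the non-overlapping step size, the handling of the final partial window (it is simply dropped), and the function name are assumptions made for illustration. Actual probe sets are typically also filtered for repeats and off-target matches, which this sketch does not attempt.

```python
# Map each RNA (or DNA-style) base to its complementary DNA base
_COMPLEMENT = str.maketrans("ACGUT", "TGCAA")

def tiled_antisense_probes(rna, probe_len=120, step=120):
    """Return (start, probe) pairs of antisense DNA probes tiled along `rna`.

    Each probe is the reverse complement (as DNA) of a `probe_len` window,
    so it can hybridize to the target RNA. Windows that would run past the
    end of the RNA are skipped.
    """
    rna = rna.upper()
    probes = []
    for start in range(0, len(rna) - probe_len + 1, step):
        window = rna[start:start + probe_len]
        probes.append((start, window.translate(_COMPLEMENT)[::-1]))
    return probes

# Hypothetical toy example with short probes so the result is easy to inspect
toy_probes = tiled_antisense_probes("AUGCAUGC", probe_len=4, step=4)
```

In the toy example, the window 5'-AUGC-3' yields the probe 5'-GCAT-3'; with the default arguments the same function would produce consecutive 120 nt probes across a long noncoding RNA.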
Comprehensive identification of RNA-binding proteins by mass spectrometry (ChIRP-MS) is a related technique predating RAP-MS by several years [55]. It also uses tiled biotinylated DNA probes bound to magnetic streptavidin beads and RNA–protein crosslinking, although the probes used were only 20 nt in length and formaldehyde crosslinking was chosen instead of UV crosslinking. The use of formaldehyde crosslinking has the advantage of being reversible, and ChIRP-MS studies are able to reverse crosslinking while keeping both protein and RNA components intact and allowing further analyses on both [56]. However, formaldehyde crosslinking also catalyzes the crosslinking of protein–protein and protein-DNA interactions.
The technique known as PIP-seq has been used successfully to elucidate important RNA–protein interactions governing the differentiation of root hair cells [57]. PIP-seq identifies RNA–protein interactions with precise RNA binding sites when paired with a technique capable of identifying individual interacting RBPs. PIP-seq uses formaldehyde crosslinking to covalently bond RNA to interacting proteins, followed by nuclease treatment and high-throughput sequencing. The sample is split into a matrix of four: one sample with RBPs intact treated with single-stranded RNA nuclease (ssRNase), one without RBPs treated with ssRNase, one with RBPs treated with double-stranded RNA nuclease (dsRNase), and one without RBPs treated with dsRNase. The use of ss- and dsRNase in the presence and absence of bound RBPs allows both RNA structure and RBP protection (and thus binding) to be predicted [57].
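The reasoning behind the four-sample matrix can be made concrete with a small Python sketch. For a given RNA region, reads that survive nuclease treatment only when proteins are present suggest protein protection, while the comparison of the two protein-free samples reflects structure: reads surviving ssRNase come from paired regions and reads surviving dsRNase from unpaired regions. The fold-change threshold, pseudocount, and function name are assumptions for illustration; published PIP-seq analyses use statistical peak calling rather than a fixed ratio.

```python
import math

def pip_seq_summary(with_rbp_ss, without_rbp_ss, with_rbp_ds, without_rbp_ds,
                    min_fold=2.0, pseudocount=1.0):
    """Summarize one region from read counts in the four PIP-seq samples.

    with_rbp_ss / without_rbp_ss: ssRNase-treated, proteins present / removed
    with_rbp_ds / without_rbp_ds: dsRNase-treated, proteins present / removed
    """
    # Protection: more reads survive nuclease digestion when proteins are bound
    fold_ss = (with_rbp_ss + pseudocount) / (without_rbp_ss + pseudocount)
    fold_ds = (with_rbp_ds + pseudocount) / (without_rbp_ds + pseudocount)
    protected = fold_ss >= min_fold or fold_ds >= min_fold
    # Structure: positive score means the region tends to be double-stranded
    structure_score = math.log2(
        (without_rbp_ss + pseudocount) / (without_rbp_ds + pseudocount)
    )
    return {
        "protected": protected,
        "fold_ss": fold_ss,
        "fold_ds": fold_ds,
        "structure_score": structure_score,
    }

# Hypothetical region: protected from ssRNase only when proteins are present
example_region = pip_seq_summary(39, 9, 10, 10)
```

In this example the region would be flagged as protein-protected in the ssRNase arm, with a slightly negative structure score indicating a mostly unpaired region.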
Recently, a CRISPR-based system called CRUIS (CRISPR-based RNA-United Interaction System) was developed in mammals [58]. CRUIS uses transient expression to couple the RNA-tracking capabilities of dCas13a with a fused proximity-labeling protein, PafA, which labels surrounding RNA-binding proteins. These labeled proteins can then be identified via mass spectrometry. CRUIS was shown to be roughly as efficient as CLIP and identified novel protein targets [58]. The advantage of this technique is that it captures truly in vivo interactions without the potential for spurious interactions to form during lysis and wash steps, but it remains to be seen whether it has a false positive problem. The PafA proximity labeling protein lacks the specificity of UV crosslinking for angstrom-level RNA–protein interactions, potentially leading to the labeling of indirectly interacting proteins.
There are a number of RNA to protein methods that are useful for in vitro studies but are not applicable to in vivo work. Among these is the labeling of RNA with small molecules [59]. In this RNA to protein approach, small molecules are covalently bonded to an RNA of interest in vitro, then incubated with cell lysate and pulled down using an immobilized receptor for the small molecule ligand (Figure 1, small molecule labeling in vitro). Common forms of this technique include biotin labeling, desthiobiotin labeling, and digoxigenin labeling. Unfortunately, because of the chemical reactions necessary to label an RNA of interest, small molecule RNA labeling is usually not appropriate for in vivo studies.
Another exclusively in vitro approach is nucleotide substitution in RNA [59]. Here, RNA is transcribed in vitro in the presence of a heavy metal-modified NTP, incorporating the modified nucleotide into the transcript. Immunoprecipitation can then be carried out using an antibody against the modified nucleotide (Figure 1, nucleotide substitution in vitro). The drawback of this approach is that the heavy metal-modified nucleotide can strongly affect the charge distribution, structure, and protein binding of the RNA of interest.
Whereas the uvCLAP approach uses modifications to the protein primary structure, RNA aptamer pulldown (also known as tandem repeat affinity purification mass spectrometry, or TRAP-MS) uses modifications to the RNA primary and secondary structures, followed by tandem affinity purification [59]. RNA aptamers are short oligonucleotide sequences that reliably assume a secondary structure under physiological conditions, which tightly interacts with a target molecule—the ligand. The affinities of these interactions can be equivalent to or greater than those of antibody–antigen interactions [60][61][62][63]. An RNA aptamer is introduced into an RNA of interest either in vitro or in vivo, the lysate is passed over a column containing immobilized ligand, washed, and ribonucleoprotein complexes are eluted. Interacting proteins are identified via mass spectrometry. This, like RAP-MS/ChIRP-MS, is one of the few in vivo methods to identify ribonucleoprotein complexes in the RNA-to-protein direction.
There are many well-studied RNA aptamers used for such studies; some of the most commonly used are the PP7, S1, D8, tobramycin, streptomycin, Csy4 (H29A), Mango, and MS2 aptamers [59][60][61][62]. Only the MS2 aptamer will be discussed in detail here. This aptamer exploits the tight, highly specific interaction between the coat protein (MCP) of the bacteriophage MS2 and a 19 nt RNA hairpin structure from the bacteriophage’s genome, which the virus presents on its genomic RNA to direct assembly of its coat protein [60]. Repeats of the MS2 hairpin structure are inserted at the 3’ end of an RNA of interest, while a fusion protein of MCP and maltose-binding protein (MBP) is immobilized on amylose beads. After pulldown, the protein-RNA-MCP-MBP complex is eluted using excess maltose, which MBP binds preferentially (Figure 1, TRAP-MS). RNA aptamer pulldown has the disadvantage of requiring genetic transformation, which may alter the structure of the RNA of interest and thus distort the pool of RNA-binding proteins associated with it. Furthermore, the presence of the RNA aptamer may risk aggregation.
Two other RNA-to-protein techniques were developed in the last year in non-plant systems. The first, APEX targeting, uses either the MS2-MCP interaction or an engineered CRISPR-Cas13 interaction to direct the biotinylation activity of the engineered peroxidase APEX2 to proteins proximal to target RNAs in vivo [64]. After rapid, one-minute biotin labeling, cells are lysed and the biotinylated proteins are pulled down using streptavidin beads. Isolated proteins are identified using liquid chromatography-mass spectrometry (LC-MS). This method was based on the RNA proximity biotinylation (RNA-BioID) and APEX RNA immunoprecipitation (APEX-RIP) approaches. RNA-BioID uses MCP to target a biotin ligase (BirA*) to an MS2-tagged RNA of interest [65]. APEX-RIP uses the promiscuous engineered peroxidase APEX2, expressed in live cells and targeted to cellular components of interest, to biotinylate proximal proteins during a short pulse of treatment with hydrogen peroxide and biotin-phenol [66]. Following biotinylation, labeled proteins are crosslinked to proximal RNAs using formaldehyde and pulled down using streptavidin beads, along with co-eluting RNAs. APEX targeting improves on BioID by decreasing the amount of time necessary for biotin labeling [66]. Although it is claimed [66] that APEX2 does not label distal proteins due to the short half-life of the biotin-phenoxyl radical it generates, it is unknown whether APEX2 may label proteins that interact only indirectly with the target RNA. Compared to crosslinking, which establishes a hard limit on the distance of RNA–protein interaction, this raises a concern of false positives when using APEX targeting.
The second method is called CRISPR-assisted RNA–protein interaction detection (CARPID). This method was also inspired by APEX-based approaches but uses the engineered biotin ligase BASU instead of APEX2 [67]. Using a nuclease-inactive, RNA-targeting dCasRx to tether BASU to RNAs of interest, CARPID labels interacting proteins via biotinylation, followed by pull-down with streptavidin beads [67]. This method was able to identify RBPs interacting with lncRNAs but requires a longer labeling period than APEX targeting.
Perspective on the Use of RNA-to-Protein in Plants
There is much room for improvement in the RNA-to-protein direction, particularly considering that none of these techniques have been used in plants to date. ChIRP-MS in particular would be an attractive technique to attempt in plants for the following reasons: it avoids the antibody generation required for RIP and CLIP, it does not use radiolabeling, it permits denaturing conditions and stringent washes, and it does not require genetic transformation. However, as previously described, it cannot provide any information regarding the binding site of an RBP.
RNA aptamer-mediated pull-down techniques could also be an area for advancement. These approaches do require genetic transformation and could potentially result in altered RNA secondary structure (depending on the aptamer used), but their potential to exceed the affinity limits of antibody–antigen interactions and to avoid the variability of antibody generation inherent to CLIP makes them attractive nevertheless. However, most of the annotated RNA aptamers in use rely on the binding capabilities of partner proteins (such as the binding of the MS2 stem loop by the MS2 viral coat protein MCP), and this limits the stringency of washing conditions: denaturing conditions, which would prevent the formation of post-lysis ribonucleoprotein complexes, cannot be used during incubation and washing without also denaturing the aptamer’s binding partner and thereby compromising pull-down. Of the aptamers that do not rely on protein partners, few match the affinity granted by the polyA-oligo(dT) interactions used in other techniques, such as RNA-interactome capture.
It might be advantageous to develop nucleotide-nucleotide RNA aptamers to increase the binding affinity, such as by applying a split RNA aptamer. Split aptamer approaches involve dividing an existing aptamer, such as an RNA that forms a tight stem-loop secondary structure, into two fragments that tightly interact in the presence of a ligand; thus, one fragment of the aptamer is appended to a transcript of interest, and the second is immobilized on a nonreactive bead. For example, the cocaine aptamer has successfully been split and used as a biosensor [68]. Although its target as a biosensor is cocaine, the split aptamer actually shows 30- to 50-fold greater affinity for quinine than for cocaine, binding with an affinity of 7 ± 4 nM [68][69]. Use of the cocaine aptamer in the presence of quinine during pull-down could potentially tolerate extremely stringent washing conditions. For a summary of these suggested techniques, see Figure 2. Proximity biotinylation-based methods, such as APEX targeting and CARPID, could theoretically be used in plants with some modifications and would be of interest as RNA-to-protein methods.
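The affinity figures quoted for the split cocaine aptamer can be put into rough perspective with a short calculation. The Python sketch below is illustrative only and rests on assumptions not stated in the source: it treats the reported 7 nM value as a dissociation constant (Kd), applies the 30- to 50-fold difference directly to that value to estimate a range for cocaine, and uses a simple 1:1 binding model, which ignores the fact that a split aptamer requires both fragments and the ligand to assemble. The function names are hypothetical.

```python
# Illustrative arithmetic only. Assumptions: the reported 7 nM affinity is a
# dissociation constant (Kd), and binding follows a simple 1:1 model.

KD_QUININE_NM = 7.0              # reported affinity for quinine, treated as a Kd
FOLD_WEAKER_COCAINE = (30, 50)   # quinine binds 30- to 50-fold more tightly


def implied_kd_range(kd_nm, fold_range):
    """Scale a Kd by a (low, high) fold range to estimate the weaker ligand's Kd."""
    low, high = fold_range
    return (kd_nm * low, kd_nm * high)


def fraction_bound(ligand_nm, kd_nm):
    """Fraction of aptamer occupied at a given free ligand concentration (1:1 model)."""
    return ligand_nm / (ligand_nm + kd_nm)


cocaine_kd_range = implied_kd_range(KD_QUININE_NM, FOLD_WEAKER_COCAINE)

if __name__ == "__main__":
    print("Implied cocaine Kd range (nM):", cocaine_kd_range)
    for ligand_nm in (7.0, 70.0, 700.0):
        quinine = fraction_bound(ligand_nm, KD_QUININE_NM)
        cocaine = fraction_bound(ligand_nm, cocaine_kd_range[0])
        print(f"{ligand_nm:6.0f} nM ligand: quinine {quinine:.2f}, cocaine {cocaine:.2f}")
```

Under these assumptions, a ligand concentration equal to the Kd gives half occupancy, so quinine would saturate the aptamer at far lower concentrations than cocaine. Whether this translates into tolerance of stringent washes would need to be tested experimentally.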
Until very recently, only one global approach was available for capturing the plant RBPome: RNA-interactome capture (RIC). RIC uses techniques common to directed RNA–protein interaction studies, beginning with the UV-crosslinking of interacting proteins to their partner RNAs as in CLIP and PAR-CLIP. The cell lysate is then passed over oligo(dT)-magnetic beads under denaturing conditions to pull down polyA RNAs and the denatured proteins covalently bonded to them. After stringent washes to remove any non-covalently interacting proteins, the RNA is enzymatically digested and the protein sample is subjected to proteomic analysis via mass spectrometry [70] (Figure 3). This technique is powerful but limited by its restriction to polyA RNA.
The RIC technique was adapted to plants several years ago by a trio of studies using cell suspension cultures, seedling leaves, leaf mesophyll protoplasts, and etiolated whole seedlings [8][71][72]. These studies identified between 300 and 1200 RBPs, all showing enrichment of proteins containing canonical RNA-binding domains. They also all identified a significant proportion of proteins lacking a canonical RNA-binding domain and playing no known role in RNA biology, underscoring how poorly described RNA–protein interactions are in plants. Finally, all three studies found that enzymes involved in intermediate metabolism make up a significant proportion of the RBPome, suggesting that the RNA-enzyme-metabolite hypothesis may be a valid consideration in plants as well as mammals.
These studies provide a perspective on heterogeneous plant samples grown under normal conditions and a baseline against which future studies using the same sample types may compare their results. Since their publication, the RIC method has been applied to Arabidopsis cell cultures grown under drought stress (using PEG to simulate drought conditions in culture) to identify 150 RBPs responsive to drought stress [73]. Similarly, RIC was used to probe modifications of the spliceosome and its RBPs in response to drought, identifying 44 spliceosomal proteins and 32 proteins associated with stress granules [74]. Like the previous studies, this work identified many metabolic enzymes interacting with RNA, comprising proteins involved in carbohydrate metabolism and the glycolytic and citric acid pathways. Recently, several optimizations of the RIC protocol, termed enhanced RNA interactome capture (eRIC), were described, but these modifications have yet to be applied to plants [75] (Figure 3). Separately, RIC has been optimized for leaf tissue (termed plant RNA interactome capture or ptRIC) by adjusting UV conditions, irradiating both adaxial and abaxial surfaces of leaves, increasing the stringency of washing conditions, and shearing genomic DNA by passing the RNA-loaded beads through a narrow needle [76] (Figure 3). It now remains for RIC or its derivatives to be used to examine changes in the RBPome in response to biotic and abiotic stresses beyond drought.
Very recently, a new method for the identification of RNA–protein interactions, known as orthogonal organic phase separation (OOPS), has been adapted from bacterial and mammalian systems. This method uses UV-crosslinking, similar to other techniques, and acidic guanidinium thiocyanate-phenol-chloroform (AGPC) phase separation to collect RBPs at the interface between the aqueous and organic phases [77]. OOPS has the advantage of being simpler than many other techniques and of not requiring mRNA pulldown, thus capturing RBP interactions with all types of RNA rather than solely coding RNAs. OOPS was applied in Arabidopsis to identify 468 RBPs, 232 of which were enzymatic putative RBPs [78].
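Because RIC and OOPS capture different RNA populations, a natural first step when working with their outputs is to compare the resulting RBP catalogs and to quantify how much of a catalog consists of enzymes. The Python sketch below illustrates one way this could be done. Only the two counts for the Arabidopsis OOPS study (468 RBPs, 232 enzymatic) come from the text; the function names and the protein identifiers in the usage example are hypothetical placeholders, not data from any study.

```python
# A minimal sketch for comparing RBP catalogs from two capture methods.
# Function names and example identifiers are hypothetical placeholders.


def share(part, whole):
    """Return part/whole as a fraction, guarding against an empty catalog."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


def compare_catalogs(ric_ids, oops_ids):
    """Split two collections of protein identifiers into shared and method-specific sets."""
    ric, oops = set(ric_ids), set(oops_ids)
    return {
        "shared": ric & oops,
        "ric_only": ric - oops,
        "oops_only": oops - ric,
    }


# Counts reported for the Arabidopsis OOPS study cited in the text [78].
OOPS_TOTAL = 468
OOPS_ENZYMATIC = 232
enzymatic_share = share(OOPS_ENZYMATIC, OOPS_TOTAL)

# Placeholder identifiers, for illustration only.
example = compare_catalogs(
    {"protein_A", "protein_B"},
    {"protein_B", "protein_C"},
)

if __name__ == "__main__":
    print(f"Enzymatic share of OOPS RBPs: {enzymatic_share:.1%}")
    print(example)
```

With the reported counts, roughly half of the OOPS-identified RBPs are enzymatic. Real comparisons would also need consistent identifier mapping between studies, which this sketch does not attempt.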