1. Experimental Approaches to Dissect the RBPome
1.1. Protein-Centric
Protein-centric methods aim to identify RNAs that bind to proteins of interest
[14][1]. RNAs that bind to the protein of interest are first reverse transcribed into cDNA, PCR amplified, and sequenced
[14][1], and further bioinformatics tools are used to identify the RBP binding sites
[15][2]. A breakthrough in protein-centric methods occurred with the development of cross-linking immunoprecipitation (CLIP)
[16][3], and some CLIP variants have been employed in
Arabidopsis to identify the mRNA targets
[14,17][1][4]. Recently, a new protein-centric method, Hyper Targets of RNA-Binding Proteins Identified by Editing (HyperTRIBE), was developed to identify RBP targets in plants
[17][4]. Compared to RIP-seq and CLIP, HyperTRIBE is more efficient when sample material is limited
[17][4].
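As a rough illustration of how editing-based target identification can be analysed, the sketch below flags candidate target sites from read counts. It assumes that HyperTRIBE editing events appear as A-to-G mismatches in sequencing reads and that a control sample without the RBP fusion is available. The count tables, the thresholds (`min_cov`, `min_frac`, `max_ctrl_frac`), and the function name are hypothetical and are not taken from the cited studies.

```python
def call_editing_sites(fusion, control, min_cov=20, min_frac=0.05, max_ctrl_frac=0.01):
    """Return sites edited in the RBP-fusion sample but not in the control.

    fusion and control map a site ID to (A_reads, G_reads) at a genomic adenosine.
    All thresholds are illustrative assumptions, not published defaults.
    """
    hits = []
    for site, (a_reads, g_reads) in fusion.items():
        coverage = a_reads + g_reads
        if coverage < min_cov:
            continue  # too few reads to estimate an editing fraction
        frac = g_reads / coverage
        ctrl_a, ctrl_g = control.get(site, (0, 0))
        ctrl_cov = ctrl_a + ctrl_g
        # A site with no control reads is treated as unedited in the control.
        ctrl_frac = ctrl_g / ctrl_cov if ctrl_cov else 0.0
        if frac >= min_frac and ctrl_frac <= max_ctrl_frac:
            hits.append((site, round(frac, 3)))
    # Strongest editing first
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy counts: (A reads, G reads) per site
fusion = {"geneA:101": (80, 20), "geneB:55": (95, 5), "geneC:9": (5, 5), "geneD:300": (60, 40)}
control = {"geneA:101": (100, 0), "geneB:55": (98, 2), "geneC:9": (10, 0), "geneD:300": (70, 30)}
print(call_editing_sites(fusion, control))
```

In this toy input only `geneA:101` passes: `geneB:55` and `geneD:300` also show G reads in the control, and `geneC:9` has too little coverage.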
To overcome the drawbacks associated with the “native purification methods”, a “denaturing method” has been developed. In this method, RNAs are cross-linked to the RBPs using UV light of different wavelengths
[14][1]. In this treatment, both RNA and DNA absorb the 254 nm UV light efficiently and are excited to the higher energetic states S1 and T1, respectively
[18][5]. Upon UV cross-linking, a covalent bond is formed between RNAs and proteins. UV cross-linking has several advantages and disadvantages. Its advantages include the “freezing” of RBP interactions in place
[18][5], the zero-distance interaction between RNA and RBPs
[19][6], and the formation of a stable covalent bond which is retained while washing under stringent conditions
[18][5]. Disadvantages are the low binding efficiency (only 5%)
[18][5], interactions of a single transcript with multiple proteins
[20][7], low UV cross-linking efficiency in tissues and turbid liquid cultures
[20,21][7][8], and the formation of unnecessary interactions such as RNA–RNA, RNA–DNA, and protein–protein interactions
[18][5]. However, the UV cross-linking inefficiency can be circumvented by optimizing the UV irradiation conditions
[22][9]. Usually, short-wavelength UV light is used to ensure the efficient cross-linking of RNA with RBPs.
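To make the wavelength dependence concrete, the energy carried by a single photon follows from E = hc/λ. The short sketch below evaluates this for the 254 nm light mentioned above. The 365 nm value is included only for comparison and is an assumption of this example (longer-wavelength UV of this kind is commonly paired with photoactivatable nucleosides such as 4-thiouridine); it is not a value taken from this section.

```python
H = 6.62607015e-34    # Planck constant, J*s
C = 2.99792458e8      # speed of light, m/s
EV = 1.602176634e-19  # joules per electronvolt


def photon_energy_ev(wavelength_nm):
    """Photon energy in electronvolts for a wavelength given in nanometres."""
    return H * C / (wavelength_nm * 1e-9) / EV


for nm in (254, 365):
    print(f"{nm} nm -> {photon_energy_ev(nm):.2f} eV per photon")
```

The shorter wavelength delivers more energy per photon, which is consistent with its use for direct cross-linking of unmodified RNA.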
1.2. RNA-Centric Approaches
RNA-centric approaches are used to identify proteins that bind with the RNA of interest
[11][10]. Generally, most of the currently available methods use tagged RNA as bait to capture the cross-linked RBPs, and the identified RBPs are characterized using mass spectrometry
[23][11]. RNA-centric approaches are categorized into two main variants: in vitro and in vivo variants.
1.2.1. In Vitro Methods
The in vitro approach involves the biosynthesis of RNA bait, the binding of the tagged-RNA to resin, the formation of a complex of ribonucleoproteins (RNPs), and the washing, purification, and elution of the RBPs that are bound with the tagged RNA
[36,37][12][13]. The in vitro method faces a few challenges: in vitro transcribed RNAs differ in structure and modifications from native RNAs, some RBPs might not associate with the RNA owing to the lack of posttranslational modifications, and anomalous complexes, such as those involving heterogeneous nuclear ribonucleoproteins (hnRNPs), can form
[38][14].
The major drawback of the in vitro RNA-tagged method is the alteration of the secondary structure of RNA due to its interaction with the labeling dyes
[39][15]. Chemicals used for RNA labeling include biotin, fluorescent dyes, digoxigenin, and other compounds
[11,39][10][15]. The most common form of RNA labeling is biotinylation, in which the 5′ or 3′ end of the RNA is biotinylated for use in the “RNA pull-down method”
[40][16]. Further, upon the addition of streptavidin beads, the biotinylated RNA bound to proteins in the cellular extract forms an immobilized complex. Subsequently, RNA-bound beads are washed and boiled to remove the non-specific RNA–protein interactions
[41,42][17][18].
To circumvent the drawbacks of different dyes, numerous natural and artificial aptamers are used to increase the affinity of RNA with the proteins
[11,41][10][17]. Streptavidin-binding aptamer (S1 aptamer) tags have emerged as useful tools to identify the specific RNPs
[41][17]. These aptamers have a high affinity toward immobilized streptavidin beads and are highly stable even in the presence of high salt conditions (400 mM NaCl)
[41][17]. After the S1 aptamer tags bind to the streptavidin beads, biotin is added to compete for binding to the beads, which elutes the S1 aptamer-tagged RNA captured from the cellular extract
[43,44][19][20]. Doudna and colleagues used the Csy4 endoribonuclease
(Figure 3b) [42][18] to isolate the RNPs accurately. The Csy4 endoribonuclease binds tightly to the Csy4 hairpin tag on the RNA, which allows the tagged RNPs to be captured. The subsequent addition of imidazole allows the Csy4 endoribonuclease to cleave the Csy4 hairpin and release the RNA–protein complex with high specificity
[11,45][10][21].
Protein microarrays are an alternative in vitro approach
[11][10]. In this approach, Cy5 dye is used to label the RNA of interest, and the labeled RNA is then hybridized with recombinant proteins. The major drawbacks of this approach are that the folding and posttranslational modifications of recombinant proteins may differ from those of native proteins, and that the artificial concentration of proteins may distort the interaction
[11][10].
Generally, in vitro methods work specifically and efficiently for the individually known RNAs. However, it is difficult to know the insertion place of natural and artificial aptamer tags without any structural information about the RNA of interest, and the insertion of tags changes the RNA structure. Moreover, these aptamers are not resistant to endonucleases which could reduce the lifetime and recovery rate of RBPs
[43,46][19][22]. The addition of different dyes and aptamer tags distorts the chemical properties of RNA–protein interactions. To alleviate the challenges caused by dyes and aptamers, F. Ataide and his colleague developed the “Antisense RNA capture” method to isolate and identify the RNP complexes. The streptavidin–biotin interaction is employed to immobilize the affinity-tagged antisense oligonucleotides, and the RNA of interest is then hybridized with the antisense oligonucleotide; thus, the associated protein complex is isolated. Owing to the high stability and strong hybridization with the RNA, this method is used to study various RNP complexes, including snRNA and telomerase RNA–protein complexes
[39][15]. In this method, there is no need to label the bait RNA or the RNA of interest. However, it is a very challenging task to design the antisense oligonucleotide to detect the RNA of interest
[44][20].
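The first step in designing such a capture probe, deriving sequences complementary to the RNA of interest, can be sketched as follows. This is a minimal illustration only: the window length, the GC-content bounds, and the function names are hypothetical, the probe is written in the RNA alphabet for simplicity, and the GC filter ignores target accessibility and secondary structure, which is a large part of why real designs are difficult.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(target):
    """Reverse complement of an RNA target, written 5'->3'."""
    target = target.upper().replace("T", "U")
    return "".join(COMPLEMENT[base] for base in reversed(target))


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def candidate_oligos(rna, length=20, gc_min=0.4, gc_max=0.6):
    """Slide a window along the RNA and keep antisense oligos with moderate GC content.

    Returns (start_index, oligo, gc) tuples; all thresholds are assumptions.
    """
    rna = rna.upper().replace("T", "U")
    candidates = []
    for start in range(len(rna) - length + 1):
        oligo = antisense_rna(rna[start:start + length])
        gc = gc_fraction(oligo)
        if gc_min <= gc <= gc_max:
            candidates.append((start, oligo, round(gc, 2)))
    return candidates


print(antisense_rna("AUGGCC"))
print(candidate_oligos("GGGGAAAA", length=4))
```

A practical design would additionally screen candidates for self-complementarity and for off-target matches elsewhere in the transcriptome.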
1.2.2. In Vivo Method
The in vivo method isolates and identifies the RNA–protein complexes inside the cell and retains the integrity of RNA–protein complexes by developing a strong covalent bond between them. The in vivo method has two variants depending on whether UV cross-linking is needed
[11,45,46][10][21][22]. The UV-crosslinking method purifies the RBPs inside the cell under denatured conditions, and can remove non-specific or non-covalent bonded proteins
[47][23]. UV cross-linking can only link RNA and proteins that are in direct contact (zero distance)
[48][24]. Short-wavelength UV radiation (254 nm) forms a strong covalent bond between the RNA and the protein
[43][19]. Two research groups employed the UV-based RNA interactome capture (RIC) approach to identify the mRNA-bound proteome from human embryonic kidney cells and human HeLa cells. In plants, the method was used for the identification of RBPs in
Arabidopsis [49][25]. However, low RNA abundance is a big challenge in the identification of RNA–protein interactions
[1,14,50][1][26][27]. Moreover, RIC neglects non-coding RNAs
[51][28].
The other variant of the in vivo method is formaldehyde cross-linking, which links together the macromolecules within 2 Å. Formaldehyde cross-linking can form protein–protein, RNA–protein, and DNA–protein interactions, and works efficiently on cells, tissues, and even whole organisms
[47][23]. Formaldehyde cross-linking has several limitations: nucleophilic lysine residues are especially prone to forming cross-links
[52][29]; it promotes nonspecific interactions such as DNA–protein and protein–protein interactions
[53][30]; its low cross-linking efficiency means that a large number of cells (~10⁸–10⁹) is required
[54][31]; and many proteins can bind to the same RNA transcript
[14][1].
There are RNA-centric variants that do not need UV and chemical-based cross-linking. “Promiscuous” biotin ligase (BioID) is one of these RNA-centric variants
[50][27]. In this method, biotin is converted to reactive biotin-5-AMP, an intermediate that covalently labels the targeted protein and any nearby proximal proteins
[55][32]. Biotin-5-AMP is quenched quickly and remains reactive only within a distance of 20 nm from its point of release, so it labels the nearby proteins
[36][12]. Before applying this method, the RNA of interest is tagged with BoxB aptamers for recruiting the RaPID (LN-HA-BirA*) fusion protein. The linked BoxB aptamer with the fusion protein not only biotinylates the targeted protein, but also the nearby proteins proximal to the RNA. Later, streptavidin beads are used to isolate the biotinylated RNPs for further proteomic analysis
[11][10]. Despite pros such as simplicity and time savings, BioID has a few cons, including the need for the BoxB site to be proximal to the RNA of interest; the artificial expression of bait RNA by plasmid transfection; and the formation of complex structures due to RNA folding. Care is needed when positioning the BoxB aptamer in longer RNA species because the method works efficiently for shorter (≤132 nt) RNA motifs
[11][10].
Despite the availability of several RNA-centric approaches, only the RIC RNA-centric approach was employed in
Arabidopsis [22,37,49,56,57,58,59,60][9][13][25][33][34][35][36][37].
2. In Silico Approaches
Both protein-centric and RNA-centric approaches are useful for the identification of RBPs in humans, yeast, and plants
[49][25]. However, many of these approaches are time-consuming, costly, and difficult to control
[39][15]. In silico approaches have arisen with the accumulation of a large amount of public protein data. They use computational methods for the annotation and elucidation of RNA–protein complexes
[39][15].
Computational methods fall mainly into two categories: template-based and machine learning methods. Template-based methods first measure the sequence similarity between a query and a template known to bind RNA in order to assess the RNA-binding preference of the query protein, whereas machine learning methods build predictive models that find patterns in the input feature space to score the probability of RNA binding. Various features and algorithms are used in the machine learning approach for deciphering the RNA–protein interactions
[61,62][38][39]. Some commonly used approaches have been discussed in detail for the identification of RBPs. AIRBP is one advanced machine learning approach
[39,63][15][40]. AIRBP uses “stacking” to predict RBPs: different features are extracted from physicochemical properties, disorder properties, and evolutionary information
[64][41], and used to train the predictive model
[63][40]. However, in silico approaches depend on in vitro methods, which supply the set of positive sequences or the structural information of RNAs on which they are built.
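The template-based idea described above can be illustrated with a deliberately simplified sketch. Real template-based tools rely on sequence alignment; here, shared k-mers (Jaccard similarity) stand in for alignment, and the template names, toy sequences, and the 0.5 decision threshold are all hypothetical.

```python
def kmers(seq, k=3):
    """Set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_template(query, templates, k=3):
    """Return (template_name, similarity) for the closest known RNA binder."""
    query_kmers = kmers(query, k)
    scores = {name: jaccard(query_kmers, kmers(seq, k)) for name, seq in templates.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Toy template library; the sequences are made up for illustration.
templates = {"tmplA": "MKRGFGFVTF", "tmplB": "MSSLLPPAAD"}

name, score = best_template("KRGFGFVTFE", templates)
# Hypothetical cutoff: call the query a putative RBP above 0.5 similarity
print(name, round(score, 3), "putative RBP" if score >= 0.5 else "no call")
```

A machine learning predictor replaces the single similarity score with a feature vector per protein, and a stacking scheme then combines the outputs of several base classifiers in a second-level model.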
3. Transcriptome-Wide Identification of RBPs
3.1. XRNAX
To overcome the drawback of RIC, Jackob and colleagues developed the XRNAX method for the identification of transcriptome-wide RNA binding proteins
[51][28]. XRNAX uses UV cross-linking to link RNA and proteins. It can isolate RBPs bound to both coding and non-coding RNAs, regardless of whether the RNAs are polyadenylated or not
[51][28]. However, in XRNAX, it is compulsory to optimize the UV cross-linking because only 5% of the proteins are cross-linked with the RNAs
[1][26]. For the successful identification of RBPs, a minimum of 8 × 10⁷ cells is required for RBP enrichment
[68][42]. By employing the XRNAX method, more than 700 non-polyadenylated-linked RBPs and WKF RBDs were identified
[51][28]. Besides RBP identification, all biotypes of cross-linked RNAs (both coding and non-coding) were identified using XRNAX
[51][28].
3.2. PTex
Beckman and colleagues developed PTex for the identification of transcriptome-wide RBPs
[67][43]. PTex relies on physicochemical properties and identifies all kinds of RBPs, including proteins that interact with short RNAs (30 nt). PTex requires fewer cells (∼5 × 10⁶ cells) than the RIC approach
[69][44]. Practically, PTex has been used for identification of the RNA-bound proteome of human HEK293 cells and the bacterium
Salmonella Typhimurium [67][43].
3.3. CARIC
Similar to XRNAX and PTex, the CARIC method captures both poly(A) and non-poly(A)-dependent RBPs. It consists of a series of steps: the metabolic labeling of RNAs (mRNA and non-coding RNAs) with 4-thiouridine (4SU) and 5-ethynyl uridine (EU); in vivo RNA–protein photo cross-linking; reaction with azide-biotin; use of the biotin tags for affinity enrichment; and isolation of RBPs with streptavidin beads. Owing to its broad applicability, CARIC has been used in living organisms such as bacteria
[13][45], animals, and plants. For example, CARIC identified 597 RBPs in HeLa cells, including 130 novel RBPs
[13][45]. However, because CARIC is restricted to cross-linking with RNA that has incorporated an alkynyl uridine analogue, it identifies fewer RBPs than PTex, XRNAX, and OOPS
[70][46].
3.4. OOPS
Orthogonal organic phase separation (OOPS) retrieves free protein, protein-bound RNA, free RNA, and cross-linked protein–RNA complexes in an unbiased manner, and enables the study of each component of the RNA–protein complex separately
[71][47]. It does not need molecular tagging or the capturing of polyadenylated RNA. OOPS starts with the UV cross-linking of the RNAs and proteins, and the RBPs are then separated by phase separation. The component required for analysis can be retrieved by using protease digestion to remove the protein from protein-bound RNA or RNase digestion to remove the RNA from RBPs
[72,73][48][49]. The organic phase separation requires only ∼3 × 10⁶ cells for RBP enrichment in OOPS
[73][49]. OOPS can capture unique RBPs that are not identified by any other methods
[72,73][48][49], and is the first method to identify the transcriptome-wide RBPs in plants.
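The input requirements quoted for the transcriptome-wide methods can be put side by side with a few lines of arithmetic. The sketch below uses only the approximate cell numbers given in the text (8 × 10⁷ for XRNAX, ∼5 × 10⁶ for PTex, ∼3 × 10⁶ for OOPS); the function name is hypothetical, and because the figures come from different studies the comparison is indicative only.

```python
# Approximate cell inputs quoted in the text (cells per experiment)
CELLS_REQUIRED = {"XRNAX": 8e7, "PTex": 5e6, "OOPS": 3e6}


def fold_vs_lowest(cells):
    """Express each method's input as a multiple of the smallest input."""
    lowest = min(cells.values())
    return {method: n / lowest for method, n in cells.items()}


folds = fold_vs_lowest(CELLS_REQUIRED)
for method, fold in sorted(folds.items(), key=lambda item: item[1]):
    print(f"{method}: {CELLS_REQUIRED[method]:.0e} cells ({fold:.1f}x the lowest input)")
```

On these figures, XRNAX needs roughly 27 times more starting material than OOPS, which matters when plant tissue or a rare cell type is limiting.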
4. Identification of RBPs in Plants
Recent discoveries have revealed that RBPs have various functions and important implications for crop improvement
[7,74][50][51]. For example, the expression of both AtGRP2 and AtGRP7 proteins conferred a higher grain yield than that of the control lines under salt and cold stresses
[75][52]. GRP8 is responsible for the phosphate uptake and biomass accumulation and can be edited to increase the phosphate uptake and utilization in plants
[76][53], and MhGR-RBP1 showed high transcript levels in response to several abiotic stresses
[77][54]. Furthermore, it was found that some RNA chaperones can make a plant resistant to external cues. For example, expressing a cold shock protein in maize showed a 6% increase in the yield in field trials under drought stress conditions
[78][55]. The expression of AtRGGA conferred resistance against osmotic stress in response to ABA salt stress in
Arabidopsis [79][56], while the overexpression of RBP MhYTP1 increased the drought resistance in apples
[80][57]. The above-cited literature indicates that RBPs can be exploited in the improvement of plant traits.
Among the ribonomic approaches available to identify RBPs
[13,51][28][45] in other living organisms including humans, animals, and microbes, a few approaches have been modified for application in plants. For example, RIC, RIP-seq, and CLIP-seq
[59][36] have been employed in
Arabidopsis, leading to the identification of hundreds of RBPs and their interacting RNAs. Recently, the application of RIC in 2020
[22][9] and OOPS in 2020 provided the landscape of the RBPome in plants by identifying all coding and non-coding covalently linked RBPs
[81][58]. Further utilization of ribonomics methods devised for bacteria and animals would accelerate the identification and characterization of RBPs in the kingdom Plantae and would facilitate the study of the role of RBPs in plants.