2.1. gRNA and Its Components
In the natural CRISPR/Cas9 system, the guide RNA consists of two separate molecules: a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA). In engineered systems, these are typically fused into a single guide RNA (sgRNA), and the crRNA spacer sequence is replaced with the designed target sequence when the gRNA cassette is prepared. The most important part of the spacer is the seed sequence, the PAM-proximal region that binds the target DNA following recognition of the protospacer-adjacent motif (PAM); it is therefore required for precise target recognition and binding. Depending on the Cas variant, the seed sequence spans 5–12 nt upstream of the PAM. Mismatches between the seed sequence of a gRNA and the targeted region are a major cause of failed CRISPR activity
[33]. The second part of a gRNA, the scaffold sequence, generates the secondary structure required for binding to the Cas9 protein and is considered the constant part of a gRNA.
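As a rough illustration of how the spacer, seed, and PAM relate to one another, the following Python sketch scans one strand of a DNA sequence for NGG PAMs (the canonical SpCas9 motif) and reports the adjacent spacer together with its PAM-proximal seed. The function name, the 20-nt spacer length, and the 12-nt seed length are illustrative assumptions (the text above gives only a 5–12 nt range for the seed), and a practical design tool would also scan the reverse strand.

```python
import re


def find_spcas9_sites(seq, spacer_len=20, seed_len=12):
    """Return candidate target sites on the given strand of `seq`.

    Each hit is (start, spacer, seed, pam): the spacer is the `spacer_len` nt
    immediately 5' of an NGG PAM, and the seed is the PAM-proximal `seed_len`
    nt of that spacer. Both lengths are assumptions made for this sketch.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for match in pattern.finditer(seq):
        spacer, pam = match.group(1), match.group(2)
        hits.append((match.start(), spacer, spacer[-seed_len:], pam))
    return hits
```

Each returned tuple gives the 0-based start of the spacer, so the PAM begins at `start + spacer_len`.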
2.2. Critical Criteria in Designing gRNA
2.2.1. Proper Selection of the Target Site(s) within the Locus of Interest
Depending on the desired outcome of editing, different regions within a targeted gene or locus can be selected for mutation induction. In knock-out experiments that rely on a premature termination codon (PTC), targeting early exons is generally effective, but sites very close to the start codon (ATG) or to an intron–exon junction are less suitable because PTC-induced loss of function is not common in these regions
[33][34].
2.2.2. Determination and Prediction of Off-Target Activity
Inducing the desired mutations and preventing off-target editing are the two key outcomes of a successful genome editing experiment, and both require great care in designing the gRNA. The cleavage efficiency of candidate gRNAs can be determined more quickly through in vitro screening than through in vivo expression
[35]. Predicting off-target activity is another way to identify the most efficient gRNAs among the candidates. Some web-based gRNA design tools report the genome-wide off-target activity of each candidate gRNA
[33]. Machine learning algorithms can also predict the mutation induction efficiency of a designed gRNA sequence using repair data obtained from previous studies
[36][37]. Although care in designing an optimal site-specific gRNA is the best way to reduce off-target activity, there are some wet-lab procedures that can be used to reduce the frequency of off-target events; these include: (i) reducing gRNA-Cas9 concentration, (ii) using double-nicking mediated by a Cas9 nickase mutant (nCas9), and (iii) using truncated gRNAs (tru-gRNAs)
[38].
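The idea behind sequence-based off-target prediction can be sketched with a simple mismatch scan. The Python example below is a minimal illustration, not the scoring model of any particular web tool: it looks for NGG-adjacent sites on one strand of a reference sequence and counts total and seed-region mismatches against a spacer. The function names, the 12-nt seed length, and the threshold of three mismatches are assumptions chosen for illustration.

```python
def count_mismatches(spacer, site, seed_len=12):
    """Return (total, seed) mismatch counts for two equal-length sequences.

    The seed is taken as the last `seed_len` nt (the PAM-proximal end).
    """
    if len(spacer) != len(site):
        raise ValueError("spacer and site must have the same length")
    seed_start = len(spacer) - seed_len
    total = seed = 0
    for i, (a, b) in enumerate(zip(spacer.upper(), site.upper())):
        if a != b:
            total += 1
            if i >= seed_start:
                seed += 1
    return total, seed


def scan_off_targets(spacer, genome, max_mismatches=3):
    """List NGG-adjacent sites in `genome` within `max_mismatches` of `spacer`.

    Only the given strand is scanned; each hit is
    (start, site, pam, total_mismatches, seed_mismatches).
    """
    genome = genome.upper()
    n = len(spacer)
    hits = []
    for start in range(len(genome) - n - 2):
        pam = genome[start + n:start + n + 3]
        if pam[1:] != "GG":
            continue  # no NGG PAM directly 3' of this window
        site = genome[start:start + n]
        total, seed = count_mismatches(spacer, site)
        if total <= max_mismatches:
            hits.append((start, site, pam, total, seed))
    return hits
```

Hits with a perfect match correspond to the intended target; other hits, particularly those with no seed mismatches, would be candidates for closer inspection.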
2.2.3. Nucleotide Features of the Designed gRNA and its Associated Secondary Structure
A GC content of 30–80% and no mismatch to the targeted sequence, especially in the seed region targeting the non-transcribed strand, have been reported as key features of an effective gRNA
[33]. Runs of multiple consecutive Ts in the construct should also be avoided, since a stretch of Ts can act as a termination signal for RNA Pol III
[33]. An effective gRNA contains four stem-loop structures: the repeat and anti-repeat (RAR) stem-loop and stem-loops 1, 2, and 3. Formation of a functional Cas9-gRNA-DNA complex does not necessarily depend on stem-loop 1; however, the RAR stem-loop and stem-loops 2 and 3 are required and important in plant genome editing experiments
[39]. In silico analysis of gRNA secondary structure with web-based software such as RNAfold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi; accessed on 1 February 2022) is an appropriate way to predict the efficiency of a designed gRNA
[40]. In addition to the predicted secondary structure, the RNAfold WebServer reports the free energy (ΔG) of self-folding of the designed gRNA. A designed gRNA with a self-folding free energy within the range of 0 to 2.0 kcal/mol can lead to the highest cleavage efficiency of Cas9
[40].
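The sequence-level criteria in this section can be checked with a few lines of code. The Python sketch below computes the GC content of a spacer and its longest run of consecutive Ts, then compares them with the 30–80% GC range given above. The function names and the cut-off of at most three consecutive Ts are assumptions made for illustration; secondary structure and ΔG are not evaluated here and would require a folding tool such as RNAfold.

```python
import re


def gc_percent(spacer):
    """Return the GC content of `spacer` as a percentage."""
    s = spacer.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def check_spacer(spacer, gc_min=30.0, gc_max=80.0, max_t_run=3):
    """Return simple sequence-level checks for a candidate spacer.

    `max_t_run` is an illustrative threshold, not a value from the text.
    """
    s = spacer.upper()
    gc = gc_percent(s)
    # Length of the longest stretch of consecutive Ts (0 if there is no T).
    longest_t = max((len(run) for run in re.findall(r"T+", s)), default=0)
    return {
        "gc_percent": gc,
        "gc_ok": gc_min <= gc <= gc_max,
        "longest_t_run": longest_t,
        "t_run_ok": longest_t <= max_t_run,
    }
```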
2.3. Number of Designed gRNAs
Designing at least two independent gRNAs for each target gene is a solution to minimize the risk of experiment failure in a CRISPR/Cas9 experiment
[41]. Designing multiple gRNAs is especially important in species such as soybean, where nearly 75% of the genes are present in multiple copies, so one gRNA is not enough for simultaneous mutation induction in paralogs
[42]. Conventionally, multiple designed gRNAs are inserted one by one between the promoter and gRNA scaffold of a gRNA cassette through several rounds of standard restriction-ligation cloning, which is tedious and time-consuming
[39]. The following section highlights different strategies for assembling multiple gRNAs.
2.4. Assembly of Multiple gRNAs and gRNA Processing Strategies
In multiplex gene editing studies, multiple gRNA cassettes designed to target different genes, or different regions of one gene, at once should be assembled as a single unit and inserted into the destination vector. In the first strategy, the designed and synthesized double-stranded oligonucleotides are inserted into individual gRNA vectors; each gRNA cassette (promoter + guide sequence + gRNA scaffold + terminator) is then isolated from its vector, and the cassettes are assembled in an intermediate or destination construct using Golden Gate cloning. For Golden Gate assembly, gRNA cassettes should be designed to produce compatible overhanging ends after digestion with type IIS restriction enzymes. After ligation of adjacent units, the resulting gRNA module can be mobilized into the binary vector of interest using different cloning methods. Zheng et al.
[43] provided a Golden Gate-based modular system that can assemble 2, 4, or 6 gRNA units for targeted mutagenesis of soybean. Gibson assembly is another technique for assembling gRNA cassettes; it is independent of end-compatibility issues and PCR cleanup steps
[44]. The GoldenBraid cloning system has also been used to assemble multiple gRNAs and insert them into destination binary vectors
[45]. Existing CRISPR vectors can also be easily modified with the GoldenBraid cloning system to substitute other Cas9 proteins, fluorescent proteins, or required resistance genes
[39][46]. For gene knock-out experiments using gene editing technology, the GoldenBraid technique is well suited as it can carry multiple gRNA constructs leading to an increased possibility of mutagenesis
[47]. This technique can also be used for complete disruption of the genes of interest by introducing large deletions
[33].
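The requirement for compatible overhanging ends in Golden Gate assembly can be illustrated with a small consistency check. In the Python sketch below, each gRNA cassette is described by the 4-nt fusion sites (written on the top strand) that type IIS digestion exposes at its left and right ends. The check confirms that adjacent cassettes share a fusion site and flags sites that are palindromic or reused, two commonly cited design pitfalls. The function names, data layout, and example overhangs are hypothetical and are not taken from any of the cited vector systems.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def check_overhang_chain(units):
    """Check fusion sites for an ordered list of cassettes.

    `units` is a list of (name, left_overhang, right_overhang) tuples in
    assembly order. Returns a list of problem descriptions; an empty list
    means the chain passes these simple checks.
    """
    if not units:
        return []
    problems = []
    # Each unit's right overhang must equal the next unit's left overhang.
    for (n1, _, right), (n2, left, _) in zip(units, units[1:]):
        if right.upper() != left.upper():
            problems.append(f"{n1} -> {n2}: {right} does not match {left}")
    # Collect the fusion sites along the chain and test each one.
    overhangs = [units[0][1].upper()] + [u[2].upper() for u in units]
    seen = set()
    for oh in overhangs:
        rc = reverse_complement(oh)
        if oh == rc:
            problems.append(f"{oh} is palindromic and can self-ligate")
        if oh in seen or rc in seen:
            problems.append(f"{oh} is reused (directly or as reverse complement)")
        seen.add(oh)
    return problems
```

A chain such as `[("g1", "TGCC", "GCAA"), ("g2", "GCAA", "ACTA"), ("g3", "ACTA", "GGGA")]` passes these checks, whereas a mismatched junction or a palindromic site such as GATC is reported.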
Cloning multiple gRNA cassettes as independent units can have disadvantages, such as frequent recombination events and plasmid instability in E. coli and Agrobacterium because of repeated promoter sequences (promoter crosstalk effects)
[48]. Therefore, configuring multiple gRNAs for simultaneous expression as a single transcript can be more beneficial. In this case, the single polycistronic transcript is cleaved into individual gRNAs post-transcriptionally using different RNA processing strategies
[49]. The CRISPR-associated RNA endoribonuclease Csy4 (from Pseudomonas aeruginosa), endogenous tRNA-processing enzymes, and self-processing ribozymes can be used for post-transcriptional cleavage of a polycistronic transcript
[48]. Luo et al.
[50] used a polycistronic approach (with Csy4 used to achieve cleavage) and reported a 45.3% mutation induction efficiency when targeting
GW2 paralogs in soybean. A polycistronic tRNA-gRNA (PTG) vector is another processing strategy for multiplexing gRNAs
[49].
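The layout of a polycistronic tRNA-gRNA transcript can be sketched as an ordered list of parts. The Python example below assumes that each gRNA unit (spacer plus scaffold) is preceded by a tRNA that serves as the processing signal; the function name and part labels are hypothetical placeholders, and no real tRNA or scaffold sequences are included.

```python
def ptg_layout(spacers, scaffold_label="scaffold", trna_label="tRNA"):
    """Return the ordered part labels of a tRNA-gRNA array.

    Assumes one tRNA upstream of every spacer + scaffold unit; labels are
    placeholders rather than actual sequences.
    """
    parts = []
    for i, spacer in enumerate(spacers, start=1):
        parts.append(trna_label)
        parts.append(f"spacer{i}:{spacer.upper()}")
        parts.append(scaffold_label)
    return parts
```

For two spacers, the result reads tRNA, spacer 1, scaffold, tRNA, spacer 2, scaffold, which mirrors the repeated unit that the processing enzymes would cut apart.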
2.5. Cas9 Nucleases
The Cas protein is responsible for DNA cleavage in a targeted region of the plant genome. Therefore, a high-fidelity Cas enzyme can increase the specificity and efficiency of a targeted genome editing procedure.
Streptococcus pyogenes Cas9 (SpCas9) is the most robust and widely used Cas enzyme
[28]. The presence of a canonical PAM sequence (NGG) in the targeted genomic loci is, however, a prerequisite for cleavage by SpCas9. Therefore, there is a limitation in the number of genomic loci that can be targeted by this protein
[51]. Fortunately, different Cas9 variants, with distinct PAM specificities, can be used to expand this range
[52]. In one study, three variants of Cas9 (xCas9, SpCas9-NG, and XNG-Cas9) were assessed in soybean
[53]. The authors reported that xCas9 was successful with the NGG and KGA PAMs; SpCas9-NG recognized NGD (NGG, NGA, and NGT), RGC (AGC or GGC), GAA, and GAT PAM sites; and XNG-Cas9 cleaved only regions with NGG, GAA, and AGY PAM sequences
[53]. These variants of Cas9 can be used to induce mutations in targets devoid of NGG PAMs.
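The PAM notation above uses IUPAC ambiguity codes (N = any base, R = A/G, Y = C/T, K = G/T, D = A/G/T). The following Python sketch expands those codes and reports which of the variants discussed here would be expected to recognize a given 3-nt PAM, based solely on the PAM lists reported in this section. The dictionary and function names are illustrative assumptions, and the lists reflect the soybean results summarized above rather than a complete specificity profile.

```python
import re

# IUPAC codes needed for the PAMs listed in this section.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "Y": "[CT]", "K": "[GT]", "D": "[AGT]",
}

# PAMs as reported in the text (SpCas9 canonical PAM plus the soybean study).
PAMS = {
    "SpCas9": ["NGG"],
    "xCas9": ["NGG", "KGA"],
    "SpCas9-NG": ["NGD", "RGC", "GAA", "GAT"],
    "XNG-Cas9": ["NGG", "GAA", "AGY"],
}


def pam_regex(pam):
    """Compile an IUPAC PAM pattern into a regular expression."""
    return re.compile("".join(IUPAC[code] for code in pam.upper()))


def variants_for_pam(pam_seq):
    """Return the variants whose listed PAMs match the concrete `pam_seq`."""
    pam_seq = pam_seq.upper()
    return [
        name
        for name, pams in PAMS.items()
        if any(pam_regex(p).fullmatch(pam_seq) for p in pams)
    ]
```

Under these definitions, `variants_for_pam("AGC")` returns `["SpCas9-NG", "XNG-Cas9"]`, while `variants_for_pam("TGA")` returns `["xCas9", "SpCas9-NG"]`.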
After choosing the most appropriate Cas9 variant, some additional actions are required to increase its efficiency, such as codon optimization, adding nuclear localization signals (NLSs), and the insertion of introns. Grützner et al.
[54] compared the efficiency of different Cas9 nucleases in
Arabidopsis: a human codon-optimized Cas9 with a C-terminal NLS; a maize codon-optimized Cas9 with a C-terminal NLS, with or without an additional N-terminal NLS; and a zCas9 containing 13 Arabidopsis introns and a C-terminal NLS, with or without an additional N-terminal NLS. They showed that constructs with introns worked better than those without, and that two NLSs appeared to work better than one. In the same work, a Cas9 with introns also proved superior in
Nicotiana benthamiana and
Catharanthus roseus.
2.6. Chimeric Deactivated Cas9 Proteins and Their Applications
The Cas9 enzyme can cleave DNA fragments because of its RuvC and HNH nuclease domains. Point mutations in the nuclease domains of SpCas9 can inactivate its RuvC and HNH domains and create a modified Cas9 protein
[55]. The D10A mutation (in the RuvC domain) and the H840A mutation (in the HNH domain) each create a nickase enzyme (nCas9). Together, these two mutations result in a nuclease-dead Cas9 (dCas9), which is unable to cleave target DNA while retaining its ability to bind to target DNA with the help of a gRNA
[55]. Fusion of nCas9 or dCas9 to other proteins or protein domains can enable other DNA modifications at targeted loci. Such chimeric proteins can be used for specific purposes such as base editing, transcriptional repression, epigenetic modification, and in vivo labeling (reviewed in
[37]).
Multiplex base alterations can be used to create multiple substitutions, thus facilitating the directed evolution of plant genes
[48]. In a base editing procedure, gRNAs should be designed such that the targeted base is located between positions 4 and 8 in the gRNA sequences (editing window), counting the end distal to the PAM as position 1
[56]. Two cytosine base editor (CBE) studies have been conducted in soybean; their results showed that the editing window of nCas9-APOBEC1 lies between positions 5 and 7 of the gRNA sequence, counting the end distal to the PAM as position 1
[57][58].
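The editing-window rule can be expressed as a short position check. The Python sketch below lists the 1-based positions of a target base within the window, counting the PAM-distal end of the spacer as position 1, as described above. The function name and the default window of positions 4 to 8 for cytosines are illustrative assumptions; the window can be narrowed to positions 5 to 7 to reflect the soybean results.

```python
def editable_positions(spacer, base="C", window=(4, 8)):
    """Return 1-based positions of `base` that fall inside the editing window.

    Position 1 is the PAM-distal (5') end of the spacer.
    """
    low, high = window
    return [
        pos
        for pos, nt in enumerate(spacer.upper(), start=1)
        if nt == base and low <= pos <= high
    ]
```

For the hypothetical spacer `"AACCTCGCAAGGTTAACCGG"`, `editable_positions` returns `[4, 6, 8]` with the default window and `[6]` with `window=(5, 7)`.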
2.7. Gene Regulatory Elements (GREs) of gRNAs and Cas9 Nuclease Cassettes
The promoters and terminators used in gRNA and Cas9 cassettes are other critical parameters for efficient genome editing. Finding appropriate promoters is crucial because off-target activity can increase markedly when the concentration of the Cas9-gRNA complex is excessive
[33]. The RNA polymerase III promoters U6 and U3 are the most commonly used promoters for expressing gRNAs in plants
[38]. Finding the appropriate variant(s) of U6/U3 promoters is an important step in optimizing gRNA expression as there are various versions of these promoters
[33]. Endogenous U6/U3 promoters can lead to better editing outcomes than foreign promoters. In soybean, editing efficiency was significantly increased when the GmU6 promoter (14.7–20.2%) was used instead of the AtU6 promoter (3.2–9.7%)
[38]. Similar results were reported when comparing the GmU6-16g-1 promoter (43.4–48.1%) to the AtU6-26 promoter (11.7–18.1%) via soybean hairy-root transformation
[59].
Different types of promoters can be used: (i) constitutive (from viruses or plants), (ii) tissue-specific, (iii) inducible, and (iv) developmentally regulated promoters. Constitutive promoters of viral origin (CaMV35S and NOS) or those derived from plant housekeeping genes (UBIQUITIN or ACTIN) are the most commonly used. Broadly, higher efficiency has been reported for plant-derived promoters
[33].
Terminators are other GREs that can affect the stability of Cas9 and gRNA transcripts. RNA Pol II readthrough, which can interfere with RNA Pol III-mediated transcription of gRNAs, can occur when the Cas9 cassette has a weak terminator and the Cas9 and gRNA expression cassettes are in the same orientation
[33]. This can be minimized by using a strong terminator in the Cas9 transcription unit or using an opposite orientation of the gRNA and Cas9 expression cassettes (head-to-head) in the binary vector
[34]. A comparative analysis of different terminators revealed that the rbcS-E9 terminator (from
Pisum sativum) was the best terminator
[33].
2.8. Configuration of gRNA and Cas9 Cassettes
Overall, there are three options to achieve co-expression of the gRNA and Cas9 cassettes: (i) a single transcriptional unit (STU), (ii) a two-component transcriptional unit (TCTU), or (iii) a bidirectional promoter system
[60][61]. In the STU approach, expression of both cassettes is driven jointly by a single Pol II promoter. In a conventional TCTU, the gRNA is under the control of a Pol III promoter (U3 or U6), and the Cas9 cassette is driven by a Pol II promoter
[61]. When inducible or tissue-specific expression is needed, an STU approach is best
[62], although there can be negative impacts of such a configuration on Cas mRNA maturation and gRNA stability
[61]. In a comparison of the STU and TCTU configurations in soybean, TCTU was proven to be a better option than STU
[63]. Generally, the editing rate is target-dependent; therefore, finding the best configuration for optimized co-expression of the gRNA and Cas9 requires trial and error
[60]. Overall, these various possible components and configurations offer a range of possibilities but also make the optimization of a CRISPR/Cas system complex and multi-factorial (
Figure 2).
Figure 2. The multi-factorial nature of CRISPR/Cas9-based genome editing procedure considering configuration of gRNA and Cas9 expression cassettes along with different RNA processing strategies in multiplex gRNA experiments. (a) Simple TCTU configuration of gRNA and Cas9 cassettes in left and right borders of T-DNA. (b) Expression of gRNA and Cas9 cassettes in bidirectional promoter system. (c) Simple multiplex gRNA in TCTU system. (d) Multiplex gRNA in TCTU system with Csy4, tRNA, and ribozyme processing machines. (e) Multiplex gRNA in STU configuration with Csy4, tRNA, and ribozyme processing machines. (f) Bidirectional configuration of Cas9 and multiplex gRNA with Csy4, tRNA, and ribozyme processing machines.