The spacer sequences between identical repeats in the clustered regularly interspaced short palindromic repeat (CRISPR) loci of bacterial genomes were found to potentially originate from plasmids and phages
[51][52]. The CRISPR RNA and CRISPR-associated (Cas) protein systems are now recognized as key components of the bacterial adaptive immune response, which consists of three main stages: adaptation, expression, and interference. When a bacterium is attacked by an invader, a short fragment of the invader DNA, termed a protospacer and flanked by a protospacer-adjacent motif (PAM), is processed by adaptation Cas proteins, such as Cas1 and Cas2, and inserted into the 5′ end of the spacer-repeat CRISPR array in the host genome as stored memory. The memory is retrieved when the CRISPR array is transcribed into a long precursor CRISPR RNA (pre-crRNA), which is processed within the repeat region by an expression factor, such as Cas6 or RNase III, to create mature crRNA. In the interference stage, the mature crRNA is assembled with Cas effectors, such as Cas5, Cas7, Cas8, and Cas11, to yield an RNA-guided sequence-specific endonuclease
[53][54]. CRISPR-Cas systems are divided into two classes according to the number of Cas protein subunits in the effector endonuclease complex: class 1 systems use multi-subunit effector complexes and comprise types I, III, and IV, whereas class 2 systems use single-protein effectors and comprise types II, V, and VI
[55][56][57]. In type II CRISPR systems, besides the crRNA and the Cas9 protein, a trans-activating CRISPR RNA (tracrRNA), whose 5′ region is complementary to the repeat sequence of the crRNA, is critical for endonuclease activity. The crRNA and tracrRNA can be engineered into one single-guide RNA (sgRNA) that, together with Cas9, restores full and specific endonuclease activity
[58]. The best-characterized and most widely applied Cas9 enzyme was originally isolated from Streptococcus pyogenes and is referred to as SpCas9, or simply as Cas9. SpCas9 is a large multidomain protein of 1368 amino acids (a.a.) with two distinct lobes, the recognition (REC) lobe and the nuclease (NUC) lobe, connected through an arginine-rich bridge helix (residues 56 to 93) and a disordered loop (residues 712 to 717). The REC lobe is composed of three α-helical domains (Hel-I, Hel-II, and Hel-III), and the NUC lobe contains the HNH and RuvC-like nuclease domains as well as a C-terminal PAM-interacting (PI) domain
[59][60] (Figure 1A,B). The apo-Cas9 protein must assemble with a guide RNA (the native crRNA-tracrRNA hybrid or an sgRNA) to achieve site-specific DNA recognition and cleavage. The 20 nt spacer sequence of the crRNA provides DNA target specificity, and the tracrRNA plays a crucial role in recruiting the Cas9 protein. Once the PAM (NGG for SpCas9) directly adjacent to a protospacer target site is recognized by R1333 and R1335 of the Cas9-guide RNA complex, local DNA melting is triggered at the PAM-adjacent site. The PAM-proximal 10–12 nucleotides (nt) at the 3′ end of the 20 nt spacer sequence, referred to as the seed region, are critical for site specificity. The DNA cleavage activity of CRISPR-Cas9 is activated by the conformational change induced by R-loop formation between the target DNA and the spacer RNA
[61][62]. The target DNA strand complementary to the spacer RNA is cut by the HNH nuclease domain and the non-target DNA strand by the RuvC nuclease domain, producing a blunt-ended double-strand break 3 bp upstream of the PAM
[63][64]. Either the D10A [58] or the H983A [65] mutation abolishes the RuvC nuclease activity. On the other hand, the D839A [66], H840A [58], and N863A [67] mutations eliminate the HNH nuclease activity. These mutations do not affect the target-site binding affinity of Cas9-sgRNA. Cas9 carrying the D10A mutation and Cas9 carrying the D10A/H840A double mutation are termed nickase (nCas9) and dead Cas9 (dCas9), respectively (Table 1). dCas9 can serve as a guide RNA-directed sequence-specific DNA-binding protein, like the TALE described above, and can be coupled with DNA manipulation enzymes or transcriptional activating/inhibitory domains for various applications
[64]. The amino acid residues that interact with the PAM bases can be engineered to recognize new PAMs and so broaden the spectrum of target sites. Based on structure-guided rational design, converting the wild-type D1135, R1335, and T1337 to E, Q, and R, respectively, shifted the PAM from NGG to NGA. Likewise, converting D1135, G1218, R1335, and T1337 to V, R, E, and R, respectively, changed the PAM to NGC
[68]. An engineered SpCas9 bearing D1135L/S1136W/G1218Q/E1219Q/R1335Q/T1337R substitutions in the PI domain (SpG) targets NGN PAMs. SpG was further engineered with A61R/L1111R/N1317R/A1322R/R1333P substitutions to yield a near-PAMless variant (NRN > NYN), termed SpRY, with full endonuclease activity [69] (Table 1).
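The targeting rules described above (a 20 nt protospacer immediately 5′ of the PAM, a PAM-proximal seed region, and a blunt cut 3 bp upstream of the PAM) can be illustrated with a minimal Python sketch. The function names, the dictionary layout, and the choice of a 12 nt seed are assumptions made for illustration only; the sketch scans the given strand alone and ignores the reverse strand, off-target scoring, and the NRN > NYN preference of SpRY.

```python
import re

# IUPAC degenerate-base codes needed for the PAMs quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]", "V": "[ACG]"}

# PAMs as stated in the text (SpRY is simplified to its preferred NRN form).
PAMS = {"SpCas9": "NGG", "SpG": "NGN", "SpRY": "NRN"}


def pam_regex(pam):
    """Translate an IUPAC PAM such as 'NGG' into a regular expression."""
    return "".join(IUPAC[base] for base in pam)


def find_cas9_sites(seq, pam="NGG", spacer_len=20, seed_len=12):
    """List candidate sites on the given strand only (hypothetical helper).

    Each site is a protospacer of spacer_len nt followed directly by the PAM.
    'cut' is the 0-based index such that the blunt break falls between
    seq[cut - 1] and seq[cut], i.e. 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(
        "(?=([ACGT]{" + str(spacer_len) + "})(" + pam_regex(pam) + "))")
    sites = []
    for match in pattern.finditer(seq):
        pam_start = match.start() + spacer_len
        protospacer = match.group(1)
        sites.append({
            "start": match.start(),
            "protospacer": protospacer,
            "pam": match.group(2),
            "seed": protospacer[-seed_len:],  # PAM-proximal 3' end
            "cut": pam_start - 3,
        })
    return sites
```

Calling `find_cas9_sites(sequence, PAMS["SpG"])` on the same sequence generally returns more candidate sites than the default NGG search, which is the practical meaning of a relaxed PAM requirement.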
Figure 1. Diagrams of SpCas9 and its derivatives for various applications. The domain organization of SpCas9 (A) and a schematic diagram of wild-type SpCas9 associated with an sgRNA (B) are illustrated. The non-complementary strand is cut by the RuvC nuclease domain, and this nuclease activity is blocked in the D10A mutant. The complementary strand is cut by the HNH nuclease domain, and this nuclease activity is abolished in the H840A mutant. (C) The D10A mutant, also named Cas9 nickase (nCas9), was engineered as a C-to-T nucleotide editor by linking a cytidine deaminase, APOBEC1, to its N-terminus; the conversion efficiency can be elevated by fusing a uracil glycosylase inhibitor (UGI) to the C-terminus of nCas9. Like TALE, dCas9 can be guided by an sgRNA as a sequence-specific DNA-binding ribonucleoprotein. Transcriptional regulators, DNA modification enzymes, or histone modification enzymes can be fused to either or both of the N- and C-termini. A reverse transcriptase is fused to the C-terminus of Cas9 and accompanied by an RNA template whose 3′ end is complementary to the non-complementary strand of the protospacer, which can alter the nearby nucleotides downstream of the RuvC cutting site (D). The localization of Cas9-sgRNA can also be directed to an X-protein through an FKBP12–rapamycin–FRB bridge (E). The localization of Cas9-sgRNA can further be directed via specific interactions, such as those between an RNA aptamer and an aptamer-binding protein (ABP). The tetraloop is replaced by an RNA aptamer with a unique secondary structure, which can be recognized by a specific ABP (F).
Besides Cas9, which recognizes a G-rich PAM at the 3′ end of the protospacer, the class 2 type V Cas12a (originally called Cpf1) effector enzymes have also attracted attention
[73]. The long pre-crRNA is bound and processed into mature crRNA by an intrinsic RNase activity of the Cas12a protein; the mature crRNA is composed of a repeat sequence at the 5′ end and a spacer at the 3′ end. This characteristic has been utilized to encode multiple crRNAs in a single RNA transcript
[73][74]. A canonical TTTV PAM lies at the 5′ end of a 23 bp protospacer. Only a short 42–44 nt crRNA, composed of a 19 nt repeat and a 23–25 nt spacer, is necessary to guide the RNA-dependent endonuclease activity of Cas12a, which cuts the DNA at the PAM-distal end to leave staggered ends with 5′ overhangs. As in Cas9, the RuvC nuclease domain is involved in non-complementary strand cleavage, while a new Nuc domain, instead of the HNH domain, is used in Cas12a for complementary strand cleavage
[75]. Lachnospiraceae bacterium MA2020 Cas12a (LbCas12a) is only 1206 a.a. in size and is as active as the most widely used Cas12a, isolated from Acidaminococcus sp. (AsCas12a, 1307 a.a.). An engineered LbCas12a with Q571K and C1003Y mutations, referred to as Lb2Cas12a, is more active and recognizes both TTTV and CTTV PAM motifs [76].
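The Cas12a layout described above (PAM at the 5′ end of a 23 bp protospacer, and a crRNA built from a 5′ repeat followed by a 3′ spacer) can be sketched in the same way. This is a minimal illustration under stated assumptions: the function names are hypothetical, `PLACEHOLDER_REPEAT` is a 19 nt dummy string rather than a real Cas12a direct repeat, only the given strand is scanned, and the array is modeled as simple concatenation of repeat-spacer units.

```python
import re

# IUPAC codes needed for the Cas12a PAMs quoted in the text (V = A, C, or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "[ACG]"}

# Dummy 19 nt repeat used only to show lengths; NOT a real Cas12a repeat.
PLACEHOLDER_REPEAT = "N" * 19


def find_cas12a_sites(seq, pam="TTTV", protospacer_len=23):
    """List sites where the PAM sits immediately 5' of the protospacer."""
    seq = seq.upper()
    pam_re = "".join(IUPAC[base] for base in pam)
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(
        "(?=(" + pam_re + ")([ACGT]{" + str(protospacer_len) + "}))")
    return [{"pam_start": m.start(), "pam": m.group(1),
             "protospacer": m.group(2)} for m in pattern.finditer(seq)]


def make_crrna(repeat_rna, protospacer_dna):
    """Build one crRNA: repeat at the 5' end, spacer at the 3' end."""
    return repeat_rna + protospacer_dna.upper().replace("T", "U")


def make_crrna_array(repeat_rna, protospacers):
    """Concatenate repeat-spacer units into one transcript, which Cas12a
    could then process into individual crRNAs."""
    return "".join(make_crrna(repeat_rna, p) for p in protospacers)
```

With a 19 nt repeat and a 23 nt spacer, `make_crrna` returns a 42 nt string, matching the lower bound of the 42–44 nt crRNA length given in the text. Passing `pam="CTTV"` models the additional PAM recognized by the engineered variant.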