Optimization of Genome Knock-In Method for Plants: Comparison
Please note this is a comparison between Version 1 by Serge Rozov and Version 2 by Beatrix Zheng.

Plant expression systems are currently regarded as promising alternative platforms for the production of recombinant proteins, including proteins for biopharmaceutical purposes. However, the accumulation level of a target protein in plant expression systems is still rather low compared with the other existing systems, namely, mammalian, yeast, and E. coli cells. Numerous methods and approaches have been developed to solve this problem. At the same time, the random distribution of transgenes over the genome can lead to gene silencing, variability in the accumulation of recombinant protein, and various insertional mutations. This review considers the insertion of target genes into pre-selected regions of the plant genome (genomic “safe harbors”) using the CRISPR/Cas system. It characterizes regions of genes that are expressed constitutively and at a high transcriptional level in plant cells (housekeeping genes), which are attractive targets for the delivery of target genes.

  • recombinant proteins
  • actively transcribed regions
  • plant expression systems
  • housekeeping genes

1. Introduction

Currently, recombinant proteins are widely used in human and veterinary medicine as well as in other areas of human activity. First and foremost, these include vaccines, monoclonal antibodies, drugs, and diagnostic tools [1]. Recombinant proteins are synthesized in prokaryotic and eukaryotic expression systems, such as Escherichia coli, yeast, insect cells, and mammalian cell cultures. Over half of all pharmaceutical proteins are produced in mammalian cells [2][3], since prokaryotic systems and yeasts are incapable of certain posttranslational modifications (PTMs) characteristic of higher eukaryotes [4][5]. Incorrect or absent PTMs can considerably change the properties of a synthesized protein, including its biological activity and pharmacokinetics. Thus, prokaryotic expression systems are currently used mainly for synthesizing relatively simple therapeutic proteins, while more complex proteins are frequently produced in expression systems involving mammalian cells [2][3]. However, even the latter have their flaws, in particular, rather expensive cultivation, difficulties with upscaling of the process, and potential viral contamination. Although there are some examples of the successful use of plant suspension cell cultures in the commercial production of valuable proteins, the number of unsolved problems in this area is still rather large. The most important problem is an insufficient yield of the recombinant protein in plant cells, which rarely exceeds 100 µg/kg biomass [4][5].
The currently available technologies for delivering foreign genes to the plant genome are mainly based on random insertion, which can place the transgene in either transcriptionally active or transcriptionally inactive regions. Consequently, whether the potentially high expression level of a target gene, however carefully optimized by researchers, is actually realized depends directly on where the foreign DNA happens to insert. The expected high expression level of a transgene may well not be attained if the transgene is inserted into a transcriptionally inactive genomic region. This underlies the observed variation in transgene expression among individually constructed transgenic plants [5]. For this reason, the existing technologies for producing recombinant proteins in plant expression systems include a routine stage of selecting the most favorable transformation events, i.e., those associated with a high yield of the recombinant protein [6].
Advances in molecular biology and genetic engineering, together with the rapid development of genome editing techniques utilizing CRISPR/Cas, allow researchers to deliver genes in a targeted manner to almost any selected constitutively transcribed genomic region. This should make it possible to dispense with the laborious screening of large numbers of transgenic lines and to purposefully construct highly efficient producers of recombinant proteins that carry the target genes in transcriptionally active regions. Site-specific endonucleases (Cas9 being the best known) can make double-strand DNA breaks (DSBs) in a specified genomic region; these breaks are then repaired by one of the two main mechanisms existing in eukaryotes: nonhomologous end joining (NHEJ) or homology-directed repair (HDR). In all eukaryotes except yeasts, repair predominantly follows one of the variants of the NHEJ pathway [7][8]. If a template carrying a transgene is present nearby during the repair, there is a certain probability that this transgene will be knocked into the DSB, resulting in the insertion of an additional gene into the specified genomic region [9][10][11][12].
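The targeting step described above can be illustrated with a minimal sketch. It assumes the widely used SpCas9 enzyme, which recognizes a 20-nt protospacer immediately followed by an NGG PAM; the PAM rule and the names `find_cas9_sites` and `reverse_complement` are illustrative assumptions and do not come from the text. Real guide design also weighs off-target matches and chromatin accessibility, which this sketch ignores.

```python
import re

# Translation table used to build the complementary strand.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cas9_sites(seq: str, protospacer_len: int = 20):
    """List candidate SpCas9 sites as (strand, start, protospacer, pam).

    Assumption: a site is a protospacer of `protospacer_len` nucleotides
    immediately followed by an NGG PAM. `start` is the 0-based position
    on the scanned strand; for "-" it refers to the reverse complement
    of the input, not to the input itself.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites
```

Calling `find_cas9_sites` on the sequence of a candidate locus gives the list of positions at which a DSB could in principle be introduced; choosing among them is a separate design step.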
It follows that delivering target genes to specific regions of the genome using the CRISPR/Cas system should be preceded by preliminary work to identify the regions best suited for targeting with editing tools. This approach makes it possible to avoid some negative phenomena associated with conventional transgenesis, such as variability in target gene expression, T-DNA-induced mutations, and gene silencing. It has stimulated researchers to search for suitable regions in the plant genome for site-specific delivery of target genes, in particular the so-called genomic “safe harbors”, where delivery of one or more linked target genes causes no change in any agronomic character. The identification of genomic “safe harbors” is becoming a reality due to the development of high-throughput phenotyping methods [13][14].
The attractiveness of this approach, i.e., the preliminary selection of the target region for gene delivery, is that the researcher can assess the “risks” of delivering the target gene into the plant genome with respect to the final product. That is, improving one trait by integrating the target gene into the genome should not change or impair the expression of other important characteristics of the improved plant variety. This approach has been applied successfully in the modification of some agricultural crops, for example, in the creation of golden rice [15].
When plants are used as bioreactors for the production of recombinant proteins, finding optimal regions for the delivery of target genes likewise remains extremely relevant. The main criterion for choosing such regions should be not only a high level of expression of the target gene but also a high yield of the target recombinant protein in the expression system used, for example, in plant cell culture. The authors of this review consider the regions of housekeeping genes very promising for this purpose in plant cell cultures [16][17]. Compared with random insertion, site-specific insertion into the region of housekeeping genes makes it possible to obtain high and stable expression of the target gene with a high probability. These regions are attractive because the housekeeping genes are actively expressed during the entire interphase of the cell cycle, providing the synthesis of vitally important cell proteins. The copy number of housekeeping genes in the genome is very high, and they mainly reside in euchromatic genomic regions [18]. The organization and regulation of the transcription and translation machinery, formed over a long evolution, ensure the stable operation of these genes and a constitutive pattern of protein synthesis. In addition, loci directly adjacent to the housekeeping genes, or their intergenic spacers, can be chosen in such a way that they simultaneously serve as a “safe harbor” of the genome for the insertion of target genes. Thus, the detection of transcriptionally active genomic regions, in particular the regions harboring housekeeping genes, and the targeted integration of genes into these regions may open new vistas for increasing the synthetic capabilities of plant cells in producing recombinant proteins.

2. Activation (Euchromatization) of Certain Genomic Regions

A completely new approach to increasing the expression level of recombinant proteins in plants, based on CRISPR/Cas9 technologies, has recently been developed. This approach consists of targeted transcription activation in specified genomic regions. It relies on deactivated endonucleases (dCas), which, guided by their sgRNA, bind to the sequence of interest in the genome but are unable to make a double-strand break. Such a dCas fused to transcription-activating factors delivers them to the required genomic region. This approach was first used with human cells [19][20], and then the CRISPRa system was applied to plant cells using dCas9 with five tandemly repeated VP64 transcription factors (dCas9–VP64) [21][22]. Three new systems simultaneously carrying several transcription factors, namely dCas9–SunTag [23], dCas9–TV [24], and dCas9–EV2.1 [25], were designed to increase the transcription activation effect. In the dCas9–SunTag system, dCas9 is fused with a tandem array of GCN4 peptides, which attract VP64 transcriptional activators [23]. The dCas9–TV system attaches to dCas9 six transcription activator-like effector (TALE) copies combined with a VP128 activator, 6×TALE–VP128 (TV) [24]. The dCas9–EV2.1 system takes another approach: the guide sgRNA carries anchor sites for the VPR transcriptional activator, VP64–p65–Rta [25]. Of the three, dCas9–EV2.1 displayed the highest efficiency, inducing a 3–13-fold increase in the transcription intensity of different genes with different promoters [25]. Later, two additional systems were designed: CRISPR–Act2.0, based on the fusion of dCas9 with the VP64 and EDLL transcription factors and the attachment to the sgRNA of two MS2-binding aptamers that recruit additional VP64 factors; and mTALE-Act, carrying both TALE and VP64 activators [26]. Figure 3 shows the basic scheme of transcription activation.
Figure 3. Basic scheme of targeted transcription activation in specified genomic regions. TFs (yellow oval): transcription activation factors attached to dCas9; TFs (green oval): transcription activation factors attached to gRNA.
All these transcription activation systems show a very wide range of activation depending on the gene being activated. In particular, dCas9–VP64 raises the transcription level of the A. thaliana PEP1 gene 7-fold but that of the FIS2 gene 200-fold. The mTALE–Act system increases the transcription level of both genes approximately 30-fold, while CRISPR–Act2.0 gives 30–45-fold and 1500-fold increases for PEP1 and FIS2, respectively. Interestingly, CRISPR–Act2.0 is able to activate rather long chromatin regions comprising up to four genes [26].
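To make the gene dependence of these figures easier to see, the short sketch below tabulates the fold increases quoted in this paragraph and computes, for each system, how much more strongly FIS2 is activated than PEP1. The numbers are only those given in the text (the "approximately 30-fold" value is entered as 30); the dictionary layout and the helper names `log2_range` and `fis2_to_pep1_ratio` are illustrative assumptions.

```python
import math

# Fold increases in transcription as quoted in the text.
# Each entry is a (low, high) range; single values repeat the number.
FOLD_ACTIVATION = {
    "dCas9-VP64": {"PEP1": (7, 7), "FIS2": (200, 200)},
    "mTALE-Act": {"PEP1": (30, 30), "FIS2": (30, 30)},  # "approximately 30-fold"
    "CRISPR-Act2.0": {"PEP1": (30, 45), "FIS2": (1500, 1500)},
}


def log2_range(fold_range):
    """Convert a (low, high) fold range to log2 units."""
    low, high = fold_range
    return (math.log2(low), math.log2(high))


def fis2_to_pep1_ratio(system):
    """Return the (smallest, largest) FIS2/PEP1 activation ratio for a system."""
    pep_low, pep_high = FOLD_ACTIVATION[system]["PEP1"]
    fis_low, fis_high = FOLD_ACTIVATION[system]["FIS2"]
    return (fis_low / pep_high, fis_high / pep_low)


if __name__ == "__main__":
    for name in FOLD_ACTIVATION:
        low, high = fis2_to_pep1_ratio(name)
        print(f"{name}: FIS2 is activated {low:.1f}x to {high:.1f}x more than PEP1")
```

A ratio of 1 (mTALE-Act) means both genes respond equally, whereas the larger ratios for dCas9–VP64 and CRISPR–Act2.0 reflect the strong gene dependence noted above.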
The CRISPR–Act3.0 system, based on CRISPR–Act2.0, has recently been designed; it is supplemented with elements of SunTag (10 tandem repeats of the GCN4 peptide) and TALE (2 TAD activator repeats). This third-generation system has been tested with the rice OsGW7 and OsER1 genes and demonstrated a 10-fold higher efficiency than the second-generation system, CRISPR–Act2.0. The transcription level of OsGW7 increased 250-fold and that of OsER1 100-fold; notably, the effect extended over up to seven adjacent genes. An analogous system based on dCas12b has also been constructed and displays impressive results [27].
Although all these systems for increasing the transcription level have so far been tested only on endogenous plant genes, they can undoubtedly be used to increase the expression level of foreign target genes and even of sets of several genes, which can create new metabolic pathways in the plant cell.

3. Spatial Organization of Chromatin and Transcriptional Activity

The eukaryotic cell faces the difficult task of accommodating a considerable amount of DNA within its small nucleus. The genomes of several plant species are 50-fold larger than the human genome. In addition to packing into nucleosomal structures, the eukaryotic genome forms higher-order structures [28][29]. The chromatin is arranged in the nucleus in a nonrandom manner; each chromosome occupies its own chromosome territory, which influences the accessibility of genes to different factors and hence their expression [30][31]. In turn, chromatin can also form topologically associating domains that divide large regions of the genome into distinctly defined, autonomously regulated regions [32][33].
In addition, the genome is arranged as local chromatin loops, which can considerably influence the transcription level of genes [32][34]. These loops can be large and unite genes that are rather distant from one another, allowing them to share different factors involved in expression regulation [35]. They can also be short, covering a single locus and providing chromatin interactions within a single gene, which makes it possible to regulate its expression most dynamically [36].
Short chromatin loops are of many types, can cover different gene regions, and provide different types of transcription regulation: they increase or decrease the transcription intensity of genes, initiate or block reverse-direction reading that yields noncoding or antisense RNA, and lead to alternative splicing [37]. In light of the topic of this review, the gene loops that increase the transcription of the gene they contain are of the greatest interest. First and foremost, this is the loop in which the 5′UTR and 3′UTR of the same gene are brought close to each other (Figure 4a). A loop of this type forms a separate, isolated transcriptional unit from the promoter to the transcription termination site (TTS). This allows RNA polymerase II to work more efficiently: having reached the TTS, it can immediately rebind the promoter and continue moving in a circle [36][37][38].
Figure 4. Chromatin loops increasing the transcriptional activity of genes: (a) the loop between 5′UTR and 3′UTR, allowing RNA polymerase II to move in a circle; and (b) the loop between a distant enhancer and a promoter.
At least five cases are known in plants in which chromatin loops form between the 5′UTR and 3′UTR [36]. The sunflower (Helianthus annuus) gene HaWRKY6 forms a gene loop in cotyledon cells, which ensures high tissue-specific expression thanks to the recycling of RNA polymerase II; the gene is almost not expressed in leaf cells [39]. The loop in the A. thaliana gene FLC (FLOWERING LOCUS C), a repressor of flowering, disappears after 2 weeks of incubation in the cold, which is accompanied by a decrease in FLC transcription and thereby promotes the switch to flowering [40]. Also in A. thaliana, gene loops enhancing transcription have been found in three further genes: IPT3, IPT7, and TFL1 [41][42]. Another type of loop that elevates the transcription level is the loop between a distal enhancer and a promoter (Figure 4b); loops of this type have been observed in Zea mays [42] and A. thaliana [43].
The mechanisms underlying the formation of chromatin loops are poorly understood and require further in-depth studies. In several cases, the interaction between protein transcription factors contributes to the formation of loops [43][44]. Loop formation also depends on the balance between histone H3 methylation and acetylation in the region and on DNA methylation [40][45]. Short interfering RNAs are also involved in this process [36][39][40][46]. Unfortunately, existing molecular methods do not yet allow researchers to artificially create such loops to increase the level of gene transcription. Nevertheless, it is important to realize that transcriptional activity is to a considerable degree determined by the spatial organization of a chromatin region.

4. Conclusions

Considering various examples of site-specific delivery of target genes using the CRISPR/Cas system, it becomes obvious that this work should be preceded by a search for the areas in the genome best suited for targeting with genetic engineering tools. Genomic “safe harbors” (GSHs) are considered to be such areas, and their choice depends directly on the task set by the researcher. When any characteristic of a variety of an important agricultural crop is being improved, one of the main criteria is the preservation of the characteristics of the variety created by the previous efforts of breeders.
For the biotechnology of cultivated plant cells, the main criterion in the search for GSHs is, in the authors' opinion, that integration of the target gene into the region provides the highest possible yield of recombinant protein. On the one hand, housekeeping gene loci can be such areas. The organization of these regions is distinguished by the involvement of many transcription activation factors. However, as the first attempts to deliver a target gene to areas of housekeeping genes have shown, not all of them may be accessible to genome editing. On the other hand, such regions can be identified by evaluating the loci of random transgene integration into the plant genome and determining the regions in which the expression of the target gene and the yield of the recombinant protein are maximal.
Work on chromatin modification and the creation of “artificial” GSHs that are attractive for target transgene delivery also seems promising. The spatial organization of chromatin is an efficient regulator of many aspects of transcription. The chromatin in the nucleus is nonrandomly arranged, forming various loops that regulate and modulate its activity. The 3′UTRs and 5′UTRs of actively transcribed genes are brought into proximity, which considerably improves the efficiency of RNA polymerase II by allowing it to move in a circle. The mechanisms underlying the formation and maintenance of such loops are still poorly understood; however, this aspect of chromatin organization and function should not be overlooked.