2. Are Agrobacterium Proteins Involved in T-DNA Integration into the Plant Genome?
The pathway of T-DNA transfer from Agrobacterium to plant cells, and its ultimate integration into the plant genome, starts when the T-DNA border-specific endonuclease VirD2 nicks the T-DNA region of a Ti (tumor-inducing) or Ri (rhizogenic) plasmid [1][2][3][4][5]. Nicking occurs between nucleotides 3 and 4 of the 25 bp border sequences that flank the T-DNA region [6][7]. During border nicking, VirD2 becomes covalently linked to the 5′ end of the T-DNA, generating a single-stranded form of the T-DNA, the T-strand. The T-strand carries nucleotides 4–25 of the left border (LB) at its 3′ end and nucleotides 1–3 of the right border (RB) at its 5′ end [8][9][10][11][12]. VirD2 subsequently leads the T-strand through a dedicated type IV secretion system (T4SS) into the plant cell [13][14]. Within the plant, T-strands may suffer deletions at the 3′ and/or 5′ ends before or during integration. Deletions are especially prevalent, and generally more extensive, at the 3′ end than at the 5′ end, which is protected by its linkage to VirD2 protein (e.g., [11]).
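The border arithmetic above can be made concrete with a short sketch. It assumes a simplified plasmid written 5′→3′ on the top strand with the layout LB ... RB, uses placeholder characters instead of real border sequences, and introduces a hypothetical helper, `t_region`, that does not come from any published tool. The function returns the top-strand span from LB nucleotide 4 through RB nucleotide 3; the T-strand itself is the complementary strand, so its VirD2-bound 5′ end lies at the RB.

```python
# Hypothetical sketch: which border nucleotides end up on the T-strand.
# Assumes a top-strand string (5'->3') laid out as LB ... RB, with each
# border located by the 0-based index of its nucleotide 1.

BORDER_LEN = 25  # T-DNA border repeats are 25 bp
NICK_AFTER = 3   # VirD2 nicks between border nucleotides 3 and 4


def t_region(top_strand: str, lb_start: int, rb_start: int) -> str:
    """Return the top-strand span carried by the T-strand.

    The span starts at LB nucleotide 4 and ends at RB nucleotide 3
    (inclusive). The T-strand is the complementary strand of this span,
    so its 3' end holds LB nucleotides 4-25 and its 5' end RB nucleotides 1-3.
    """
    if rb_start < lb_start + BORDER_LEN:
        raise ValueError("expected the LB to lie fully upstream of the RB")
    return top_strand[lb_start + NICK_AFTER : rb_start + NICK_AFTER]


# Placeholder borders: uppercase marks nucleotides 1-3, lowercase 4-25.
lb = "LLL" + "l" * (BORDER_LEN - NICK_AFTER)
rb = "RRR" + "r" * (BORDER_LEN - NICK_AFTER)
plasmid = "-----" + lb + "insert" + rb + "-----"

region = t_region(plasmid, lb_start=5, rb_start=5 + len(lb) + len("insert"))
print(region)  # 22 lowercase LB letters, the insert, then RRR
```

The lowercase LB letters and the three uppercase RB letters in the output mirror the 22 LB nucleotides and 3 RB nucleotides that the text assigns to the T-strand.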
Does VirD2 directly participate in the T-DNA integration process? In vitro, VirD2 can both cleave T-DNA border sequences and re-ligate them (i.e., reverse the cleavage reaction) [15]. However, VirD2 lacks an activity that can ligate VirD2/T-strands to generalized target sequences; this ligation could only be accomplished by a ligase activity present in plant extracts [16].
VirD2 contains a highly conserved region (amino acids DGRGG) near its C-terminus, termed the ω domain [17]. Substitution of DDGR (the first D is not part of ω) by four serine residues produced a mutant VirD2 protein that, compared with wild-type VirD2, reduced the transient transformation activity of its Agrobacterium host only moderately (to ~20–30% of wild-type levels). However, stable transformation was reduced by >95% [17][18][19]. Although stable transformation of plants using this VirD2 ω mutant was decreased, the precision of integration of sequences near the RB was similar to that observed with wild-type VirD2 [19]. Mutation of other sequences in VirD2, however, could alter the precision of T-DNA integration near the RB [20]. Taken together, the results of these two studies suggest that VirD2 may be involved in T-DNA integration.
Although the VirD2 ω mutant Agrobacterium strain identified by Shurvinton et al. [17] showed moderately lower transient transformation activity but greatly reduced stable transformation, two other ω domain mutants (a precise deletion of the DGRGG ω amino acids, or their replacement with five glycine residues) greatly decreased both transient and stable transformation frequencies [21]. The different relative transformation activities conferred by the various VirD2 ω mutants may reflect structural alterations caused by the serine substitutions.
3. Where in the Plant Genome Does T-DNA Integrate?
Early studies indicated that T-DNA integration is random at the chromosome level [22][23][24]. The generation of numerous Arabidopsis and rice T-DNA insertion libraries, each consisting of tens of thousands of individually tagged plant genomes, allowed the first large-scale probing of T-DNA insertion locations at the DNA sequence level. The results of these studies initially indicated that T-DNA preferentially integrated into transcriptionally active genes, promoter regions, or sequences of high A+T content [25][26][27][28][29][30]. These studies, however, all suffered from selection bias: each transgenic plant, harboring a different T-DNA integration event, had been selected for antibiotic or herbicide resistance. If T-DNA had integrated into a transcriptionally inert region of the plant genome, the selection marker gene would not have been expressed, and the resulting transgenic plant would have been lost. Indeed, fewer T-DNA insertions into heterochromatic regions, centromeres, telomeres, and rRNA genes were recovered than expected from the proportion of the genome represented by these sequences, creating the appearance that T-DNA integrates only into transcriptionally active regions of the genome [27].
In contrast to the studies cited above, two groups examined T-DNA integration sites in the Arabidopsis genome using cells that had not been selected for expression of any transgene, including antibiotic or herbicide resistance genes [31][32]. These experiments indicated that T-DNA did not preferentially integrate into any particular sequence context or region of gene expression. For example, approximately 10% of the Arabidopsis genome is composed of highly repeated DNA sequences, and ~10% of the insertions occurred in these sequences. T-DNA insertions into rDNA, centromeres, and telomeres also approximated the relative proportion of these sequences in the genome. T-DNA pre-integration sites were average in their extent of transcription and methylation compared with the entire genome [31]. Thus, T-DNA integration did not preferentially occur into any particular chromosomal sequence or feature. These results were reproduced using a high-throughput sequencing strategy to identify, without selection, T-DNA integrated into the Arabidopsis genome within six hours after infection [32]. This group also failed to identify preferential T-DNA integration into particular sequences, and the extent of DNA methylation at pre-integration sites was not biased. They did identify a slight local enrichment of A+T motifs at pre-integration sites, and microhomology was often observed between the T-DNA border sequences and the pre-integration site. Microhomology between T-DNA border regions and pre-integration sites is a common feature of T-DNA integration (e.g., [33][34][35]). The one distinctive feature of pre-integration sites was high nucleosome occupancy and a high level of histone H3K27 trimethylation. However, these last two observations were based on database entries rather than on direct analysis of chromatin from the tissues the authors used in their studies.
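The comparison of insertion frequencies with each feature's share of the genome can be expressed as an observed/expected ratio. The sketch below assumes, as a null model, that insertions land in a feature in proportion to its share of the genome, and scores departures with a normal approximation to the binomial. The function name and all counts are hypothetical illustrations, not data from the cited studies.

```python
import math


def insertion_enrichment(observed: int, total: int, genome_fraction: float):
    """Return (observed/expected ratio, z-score) for insertions in one feature.

    Null model (an assumption): each of `total` insertions lands in the
    feature with probability `genome_fraction`, independently.
    """
    if total <= 0 or not 0.0 < genome_fraction < 1.0:
        raise ValueError("need total > 0 and 0 < genome_fraction < 1")
    expected = total * genome_fraction
    sd = math.sqrt(total * genome_fraction * (1.0 - genome_fraction))
    return observed / expected, (observed - expected) / sd


# Hypothetical counts for a feature occupying ~10% of the genome.
unselected = insertion_enrichment(observed=100, total=1000, genome_fraction=0.10)
selected = insertion_enrichment(observed=40, total=1000, genome_fraction=0.10)
print(unselected)  # ratio near 1: no preference for or against the feature
print(selected)    # ratio well below 1 with a strongly negative z: under-recovery
```

A ratio near 1 is what proportional, unbiased integration predicts. A selected library that under-recovers insertions in silent regions would instead show a ratio below 1 for those regions.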
Taken together, these two studies do not point to any distinctive plant DNA or chromatin features as T-DNA integration target sites. However, chromatin conformation may influence T-DNA integration. The rat5 Arabidopsis mutant is highly susceptible to transient transformation but highly resistant to stable transformation by Agrobacterium [36]. This mutant contains a T-DNA insertion in the 3′ untranslated region of the histone H2A-1 gene HTA1 [37]. Overexpression of HTA1 in otherwise wild-type Arabidopsis plants increased stable transformation, as it also did in rice [38]. HTA1 expression correlates with the susceptibility of Arabidopsis cells and tissues to Agrobacterium-mediated transformation [39], and all tested histone H2A proteins were functionally redundant with respect to increasing transformation when expressed in Arabidopsis [40]. In addition to histone H2A, overexpression of histone H4 and of one histone H3 protein (HTR11) also conferred hyper-susceptibility to Agrobacterium-mediated transformation. However, overexpression of other histone H3 proteins and of all tested histone H2B genes had no effect on transformation [41]. It is not clear whether manipulating histone levels in plant cells directly alters the extent of T-DNA integration, or whether altered histone levels influence plant gene expression and thereby affect T-DNA integration indirectly [42]. Nevertheless, the direct interaction of VirD2 with histones in yeast suggests that histones may help direct VirD2/T-strand complexes to the host genome [43].
Do Agrobacterium or host proteins guide T-strands to host chromatin prior to integration? VirE2, a virulence effector protein secreted by Agrobacterium into plant cells, is a single-strand DNA-binding protein that has been proposed to interact with VirD2/T-strands in the plant cell, forming a “T-complex” [44]. (It should be noted that such complexes have never been identified in Agrobacterium-infected plants.) VirE2 can interact with the plant bZIP transcription factor VIP1 [45]. VIP1 can interact with histones and nucleosomes, suggesting that VIP1 may be a molecular link between T-complexes entering the nucleus and T-DNA integration sites in the plant chromosomes [46][47][48]. However, two recent publications demonstrated that neither VIP1 nor its orthologs are required for Agrobacterium-mediated transformation, casting doubt on a role for VIP1 in T-DNA integration [49][50].
In addition to the involvement of histones in Agrobacterium-mediated transformation and, perhaps, T-DNA integration, histone-associated and histone-modifying proteins also affect transformation. Crane and Gelvin [51] tested RNAi lines, each targeting one of 109 Arabidopsis chromatin-related genes, for transient and stable transformation. Silencing of 24 of these genes decreased transformation. In particular, silencing of SGA1 (encoding a histone H3 chaperone) and of HDT1 and HDT2 (encoding histone deacetylases) greatly decreased both stable transformation and T-DNA integration. Silencing of genes involved in chromatin remodeling, DNA methylation, histone acetylation, and nucleosome assembly also affected stable transformation, although this may result from secondary effects of these genes on the expression of other genes involved in stable transformation. In yeast, deletion of genes encoding components of histone acetyltransferase complexes increased transformation, whereas deletion of genes encoding proteins of histone deacetylase complexes decreased transformation [52]. For some of these mutants, integration of T-DNA into the yeast genome was disrupted.
Taken together, these results indicate that proteins associated with chromatin structure and modification are important for T-DNA integration into plant or yeast genomes, and that in some instances their effect on integration may be direct rather than mediated indirectly through the expression of other genes important for T-DNA integration.
As described above, the position of T-DNA integration, and the chromatin structure of that region, may influence the expression of T-DNA-encoded transgenes. As a practical consideration, this variability in transgene expression (the so-called “position effect”) will influence studies on gene and promoter function. Scientists therefore need to examine a large number of independent transgenic events to draw conclusions about, e.g., relative promoter strengths.
4. T-DNA Integrates into Plant DNA Break Sites
T-DNA preferentially integrates into double-strand DNA breaks [53]. This observation was followed by two other reports also showing preferential T-DNA integration into double-strand break sites [54][55]. In each of these studies, a rare-cutting meganuclease (either I-SceI or I-CeuI) was used to cut tobacco DNA during transformation. T-DNA was preferentially “trapped” in these cut sites at frequencies of up to several percent of the examined integration events. More recently, scientists have used CRISPR technology to generate double-strand DNA breaks, either to create site-directed mutations or to attempt homology-dependent repair using recombination with correction templates. In several instances, T-DNA was trapped at these break sites following Cas nuclease cutting (e.g., [56][57]). It is thus clear that double-strand DNA breaks can act as a “T-DNA magnet”. However, does Agrobacterium take advantage of naturally occurring host DNA breaks (or nicks), or might Agrobacterium infection itself induce host DNA disruptions?
That Agrobacterium can incite DNA breaks would not be unusual, because inoculation with other plant pathogens (bacteria, oomycetes, and fungi) can cause double-strand DNA breaks in host plant genomes [58]. DNA disruptions occur in Arabidopsis cells near the site of Agrobacterium infection, as detected by COMET assays. However, because alkaline pH conditions were used in this study, it is not clear whether these disruptions resulted from single-strand nicks or double-strand breaks in the plant DNA [59]. Recent results indicate that Arabidopsis cells exposed to Agrobacterium, but not stably transformed, contain more in/dels than would be expected from the natural frequency of such mutations [60]. These results suggest that incubation of cells with Agrobacterium is inherently mutagenic, causing double-strand DNA breaks that are mis-repaired.
There are many hints in the literature that Agrobacterium infection can cause mutations independent of T-DNA integration; these mutations may result from induced double-strand DNA breaks that are subsequently mis-repaired. They may also be generated by “abortive integration” of T-DNA, followed by mis-repair of the abortive integration site. For example, N. plumbaginifolia plants containing one mutant nitrate reductase (NR) gene could be converted to complete NR null mutants (chlorate resistant) following Agrobacterium-mediated transformation. However, none of these null mutants contained T-DNA in the NR gene [61]. Mutation of the wild-type NR allele must therefore have occurred by some other mechanism.
Another indication that Agrobacterium infection may be inherently mutagenic derives from the observation that only ~35% of the T-DNAs in Arabidopsis T-DNA insertion libraries co-segregate with a screened mutant phenotype [62][63]. Mutations in the selected lines may be derived from disruptions other than T-DNA insertion into the gene of interest.
Many Arabidopsis T-DNA insertion lines contain complex host genome rearrangements that are frequently associated with mis-repair of double-strand DNA breaks. These include inversions, translocations, and other complex rearrangements [64][65][66][67][68][69][70][71]. Similar rearrangements have been detected in transgenic rice [72]. Clark and Krysan [73] noted that approximately 19% of the examined lines from the SALK T-DNA mutant collection contained translocations. The rearrangements of plant genomes following T-DNA integration are reminiscent of the process of chromothripsis resulting from CRISPR-Cas9 mammalian genome editing [74].
5. What Is the Mechanism of T-DNA Integration?
Perhaps the most important problem remaining in understanding Agrobacterium-mediated transformation is the mechanism of T-DNA integration. As cited above, numerous models of integration have been proposed. What is clear is that homologous recombination is not the mechanism: despite many kilobases of homology between plant DNA and engineered T-DNA, integration into homologous sequences in the plant genome occurs extremely rarely. This differs from the situation in yeast, where homologous recombination predominates when homology between T-DNA and the yeast genome is present (see, e.g., [75][76][77]). Thus, what remains for the T-DNA integration mechanism in plants is some form of non-homologous end-joining (NHEJ), in which T-DNA integrates in the absence of large regions of homology, although targeting by microhomology may be used in some circumstances.
Two major NHEJ pathways have been described (e.g., [78][79][80][81]). The “classical” (Ku-dependent) pathway utilizes, among other proteins, the Ku70/Ku80 heterodimer to protect the broken DNA ends and the XRCC4/XLF/DNA ligase IV complex to repair the break. It is not unusual for small deletions, insertions, or nucleotide substitutions to occur at or near the break site following repair. Microhomology between the ligated ends is rarely detected.
An “alternative” pathway uses microhomology between a region at or near the break site and another sequence (near or distant from the break site) for repair. Participants in this pathway include members of the MRN complex, which process broken chromosome ends, the WRN helicase, and a complex of XRCC1 and DNA ligase III (not found in plants) that repairs the breaks. DNA polymerase θ also participates in this pathway and has been proposed to play a key role in T-DNA integration. This microhomology-mediated end-joining (MMEJ) pathway is frequently referred to as theta-mediated end-joining because of the role of DNA polymerase θ in the process. DNA polymerase θ has several unusual properties: the protein contains both a helicase domain and a DNA polymerase domain, and the enzyme has a propensity to “template switch”. This latter property allows it to copy DNA from another region of the genome into break sites, generating “filler” DNA sequences in the break. MMEJ is highly mutagenic, frequently generating deletions as sequences flanking DNA break sites search for short homologous sequences with which to join. MMEJ also generates chromosomal rearrangements such as inversions and translocations, features commonly associated with T-DNA integration.
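The distinction between microhomology and filler DNA at a T-DNA/plant junction can be sketched by asking how much of a junction read each parental sequence explains. This is a simplified illustration, not a published pipeline: it assumes ungapped, correctly oriented reference sequences, with the read beginning in T-DNA and ending in plant DNA, and the function names and sequences are hypothetical. Bases explained by both references are scored as microhomology; bases explained by neither are scored as filler, the kind of insertion attributed to template switching by DNA polymerase θ.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def classify_junction(read: str, tdna_ref: str, plant_ref: str) -> tuple:
    """Label a T-DNA/plant junction as microhomology, filler, or direct join.

    Assumes read[0] aligns to tdna_ref[0] and read[-1] aligns to plant_ref[-1].
    """
    from_tdna = shared_prefix_len(read, tdna_ref)                # left part explained by T-DNA
    from_plant = shared_prefix_len(read[::-1], plant_ref[::-1])  # right part explained by plant DNA
    overlap = from_tdna + from_plant - len(read)
    if overlap > 0:  # bases that fit both references
        return "microhomology", read[len(read) - from_plant : from_tdna]
    if overlap < 0:  # bases that fit neither reference
        return "filler", read[from_tdna : len(read) - from_plant]
    return "direct", ""


# Hypothetical sequences, for illustration only.
print(classify_junction("GGATCCTTAGCATG", "GGATCCTTAGGGGG", "CCCCTAGCATG"))
print(classify_junction("GGATCCAATCATG", "GGATCCTTAG", "CCCCTAGCATG"))
```

Real junction analyses rely on sequence alignment rather than exact prefix matching, so mismatches and sequencing errors would need more careful handling than this sketch provides.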
Which of these NHEJ pathways, if any, are involved in T-DNA integration? Numerous studies have tested stable transformation efficiencies and T-DNA integration characteristics of various Arabidopsis and rice NHEJ mutants [82][83][84][85][86][87][88][89][90][91][92][93]. However, with the exception of three publications [89][91][93], all the other studies used the frequency of stable transformation as a proxy for T-DNA integration. Although detection of stable transformants requires T-DNA integration, it also requires expression of selection marker genes to recover transformed tissue. T-DNA integration may thus occur without stable transformation being detected if the selection marker genes have been silenced. As noted above, such selection bias can confound experimental interpretations [31][32][94]. An additional complication is that most Arabidopsis stable transformation experiments were conducted using a flower-dip protocol. It is well documented that the set of Arabidopsis genes important for somatic cell transformation differs from those important for germ-line transformation [93][95][96]. Finally, stable transformation efficiencies must be calculated relative to transient transformation frequencies; a decrease in stable transformation may not indicate that a particular plant mutant has altered stable transformation characteristics if its transient transformation frequency is correspondingly reduced. It is particularly important to inoculate plants with Agrobacterium concentrations spanning several orders of magnitude, to avoid a “saturation response” at high inoculum levels that can obscure differences between wild-type and mutant plant genotypes.
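The normalization and inoculum-range points can be illustrated with a few lines of arithmetic. In this sketch, stable transformation is expressed relative to transient transformation and then compared between genotypes at each inoculum concentration. The function names, the bacteria-per-mL units, and all percentages are hypothetical values chosen only to show how a saturating inoculum can mask a genotype difference.

```python
def normalized_stable(stable_pct: float, transient_pct: float) -> float:
    """Stable transformation per unit of transient transformation."""
    if transient_pct <= 0:
        raise ValueError("transient frequency must be positive")
    return stable_pct / transient_pct


def relative_to_wild_type(mutant: tuple, wild_type: tuple) -> float:
    """Each argument is (stable %, transient %) measured at the same inoculum."""
    return normalized_stable(*mutant) / normalized_stable(*wild_type)


# Hypothetical (stable %, transient %) values at three inoculum
# concentrations (bacteria per mL); numbers are illustrative only.
wild_type = {1e5: (2.0, 10.0), 1e6: (8.0, 40.0), 1e7: (18.0, 90.0)}
mutant = {1e5: (0.5, 10.0), 1e6: (2.0, 40.0), 1e7: (16.0, 90.0)}

for conc in sorted(wild_type):
    rel = relative_to_wild_type(mutant[conc], wild_type[conc])
    print(f"{conc:.0e} bacteria/mL: mutant/wild-type = {rel:.2f}")
```

In this made-up example, the mutant's fourfold stable transformation defect is clear at the two lower inocula but nearly disappears at the highest one. Conversely, a mutant whose stable and transient frequencies are both halved scores 1.0, indicating no specific defect in stable transformation.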
In light of these numerous variables and limitations, it may not be surprising that different laboratories have reached different conclusions regarding the importance of various plant NHEJ genes for T-DNA integration (or rather, for most studies, for stable transformation). Several reports indicate that mutation of the Arabidopsis or rice classical (c)NHEJ pathway genes Ku70, Ku80, or DNA ligase IV (Lig4) lowered stable transformation frequencies [82][83][86][87][90], suggesting that these cNHEJ genes are important for T-DNA integration. Other publications indicated that such mutations had little or no effect on stable transformation [84][85], suggesting that these genes are not essential for T-DNA integration. Still other publications, using both Arabidopsis and N. benthamiana, showed that mutation or down-regulation of several cNHEJ genes, including Ku70, Ku80, XRCC4, and the gene encoding DNA ligase VI (Lig6), increased both stable transformation and T-DNA integration into non-selected plant cells [89][91]. These studies suggest that expression of these cNHEJ genes inhibits T-DNA integration, perhaps by speeding the repair of double-strand DNA breaks that serve as T-DNA integration sites.
Similarly, individual mutation of two genes associated with MMEJ, XRCC1 and PARP2, did not decrease stable transformation of Arabidopsis root tissue ([91]; PARP1 described in this study has more recently been termed PARP2). Mutation of PARP2 actually increased the frequency of T-DNA integration into the genome of non-selected root cells 2- to 10-fold. The discrepancy between the increased T-DNA integration frequency and the similar stable transformation frequencies of wild-type and parp2 mutant roots was explained by increased DNA methylation of T-DNA in the parp2 mutant plants, likely resulting in silencing of the selection marker genes. This result highlights the importance of investigating T-DNA integration biochemically in non-selected tissue, rather than relying on the stable transformation frequency of selected tissue as a proxy for T-DNA integration.
6. The Importance of DNA Polymerase Theta for Agrobacterium Transformation and T-DNA Integration
In 2016, van Kregten et al. [97] published a seminal paper proposing an essential function for DNA polymerase θ in stable transformation of Arabidopsis and T-DNA integration into its genome. These authors examined two DNA polymerase θ (polQ) mutants, tebichi (teb) 2 and teb5. Although they could not detect differences in transient transformation between wild-type and polQ mutant plants, they were unable to obtain any stable transformants of the polQ mutants using either a flower-dip transformation protocol or a root transformation protocol requiring selection of transgenic calli and regeneration of plants from these calli. The authors noted that DNA polymerase θ can “template switch” during DNA replication, and that it can thereby generate “filler” DNA sequences, a common characteristic of T-DNA/plant DNA junctions at the break site, by copying and joining T-strand DNA and microhomologous plant DNA. They also noted that copying T-strand sequences into both ends of a plant DNA double-strand break could result in integration of T-DNA “head-to-head” (RB-to-RB) dimers, also a common characteristic of many T-DNA insertions. T-DNA integration via theta-mediated end-joining thus became the favored model for T-DNA integration into plant genomes.
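The “head-to-head” terminology describes the relative orientation of adjacent T-DNA copies in a multi-copy locus. The sketch below is a hypothetical helper, assuming the common convention that the RB is the “head” and the LB the “tail” of a T-DNA, and that each copy's orientation has already been determined (for example, from junction sequencing).

```python
# Orientation convention (assumed): "+" = copy reads LB -> RB left to right,
# "-" = copy reads RB -> LB left to right.
ENDS = {"+": ("LB", "RB"), "-": ("RB", "LB")}


def junction_type(left: str, right: str) -> str:
    """Classify the junction between two adjacent T-DNA copies."""
    if left not in ENDS or right not in ENDS:
        raise ValueError("orientation must be '+' or '-'")
    facing = (ENDS[left][1], ENDS[right][0])  # the two border ends that meet
    if facing == ("RB", "RB"):
        return "head-to-head (RB-to-RB)"
    if facing == ("LB", "LB"):
        return "tail-to-tail (LB-to-LB)"
    return "head-to-tail (direct repeat)"


def describe_locus(orientations: list) -> list:
    """Classify every junction in a multi-copy T-DNA locus."""
    return [junction_type(a, b) for a, b in zip(orientations, orientations[1:])]


print(describe_locus(["+", "-"]))       # an RB-to-RB dimer
print(describe_locus(["+", "+", "-"]))  # direct repeat followed by RB-to-RB
```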
Nishizawa-Yokoi et al. [93] re-examined the role of DNA polymerase θ in T-DNA integration. Using the same Arabidopsis teb2 and teb5 mutants used by van Kregten et al. [97], as well as three independent rice polQ mutants, this group was able to obtain stable transformants of somatic tissue in all tested polQ mutants. Similar to van Kregten et al. [97], they were not able to transform Arabidopsis by a flower-dip protocol, except when the incoming T-DNA constitutively expressed a wild-type PolQ gene. These authors additionally showed that transient transformation of roots from the Arabidopsis polQ mutants was reduced relative to transformation of wild-type roots. T-DNA/plant DNA junctions isolated from transformed rice and Arabidopsis polQ mutant calli had characteristics similar to those isolated from wild-type tissue. Finally, the authors showed that both Arabidopsis and rice polQ mutants had growth and/or developmental defects; root segments from Arabidopsis polQ mutants did not form callus well, and the calli grew slowly. Calli derived from rice polQ mutants did not regenerate plants even under non-transformation and non-selection conditions. The variable penetrance of the tebichi phenotype was recently examined and was shown to increase under stress conditions, including replication stress [98][99]. Similar to the situation with Arabidopsis flower-dip transformation, rice polQ mutants could be transformed and regenerated into plants if the incoming T-DNA contained a constitutively expressed PolQ gene. Thus, transformation and developmental deficiencies resulting from mutation of PolQ could be complemented by transient expression of a wild-type PolQ gene in both Arabidopsis and rice.