The pathway of T-DNA transfer from Agrobacterium to plant cells, and its ultimate integration into the plant genome, starts with nicking of the T-DNA region of a Ti (tumor-inducing) or Ri (rhizogenic) plasmid by the T-DNA border-specific endonuclease VirD2. Nicking occurs between nucleotides 3 and 4 of the 25 bp border sequences that flank the T-DNA region. During T-DNA border nicking, VirD2 covalently links to the 5′ end of T-DNA, resulting in a single-strand form of T-DNA, the T-strand, that on its 5′ end contains nucleotides 4–25 of the right border (RB), and on its 3′ end contains nucleotides 1–3 of the left border (LB). VirD2 subsequently leads the T-strand through a dedicated type IV protein secretion system (T4SS) and into the plant cell. Within the plant, T-strands may suffer deletions at the 3′ and/or 5′ ends before or during integration. Deletions are especially prevalent, and generally more extensive, at the 3′ end than at the 5′ end, which is protected by its linkage to VirD2 protein.
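The position of the nick can be made concrete with a short sketch. The function below only illustrates the coordinate arithmetic (a cut between nucleotides 3 and 4 of a 25-nt repeat); the name `nick_border` and the placeholder sequence are assumptions made for illustration and do not represent a real border repeat.

```python
def nick_border(border: str) -> tuple[str, str]:
    """Split a 25-nt border repeat at the nick site between nucleotides 3 and 4.

    Returns (nucleotides 1-3, nucleotides 4-25) using 1-based border numbering.
    """
    if len(border) != 25:
        raise ValueError("border repeats are 25 nt long")
    return border[:3], border[3:]


# Placeholder 25-nt string (hypothetical, NOT an actual border sequence).
placeholder = "ACGT" * 6 + "A"
first_three, remaining = nick_border(placeholder)
print(len(first_three), len(remaining))
```

One fragment is 3 nt long and the other 22 nt, which is why only short border remnants are expected at the ends of an intact T-strand.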
3. Where in the Plant Genome Does T-DNA Integrate?
Early studies indicated that T-DNA integration is random at the chromosome level. Generation of numerous Arabidopsis and rice T-DNA insertion libraries, each consisting of tens of thousands of individually tagged plant genomes, allowed the first large-scale probing of T-DNA insertion locations at the DNA sequence level. Results of these studies initially indicated that T-DNA preferentially integrated into transcriptionally active genes, promoter regions, or sequences of high A+T content. These studies, however, all suffered from the problem of selection bias; the individual transgenic plants, each harboring a different T-DNA integration event, had been selected for antibiotic/herbicide resistance. If T-DNA had integrated into a transcriptionally inert region of the plant genome, the selection marker gene would not have been expressed and the resulting transgenic plant would have been lost. Indeed, fewer T-DNA insertions into heterochromatic regions of DNA, centromeres, telomeres, and rRNA genes were recovered relative to the proportion of the genomes represented by these sequences, resulting in the appearance of T-DNA integration into only transcriptionally active regions of the genome.
In contrast to the studies cited above, two groups examined T-DNA integration sites in the Arabidopsis genome in which cells were not selected for expression of any transgene, including those for antibiotic/herbicide resistance. These experiments indicated that T-DNA did not preferentially integrate into any particular sequence context or region of gene expression. For example, approximately 10% of the Arabidopsis genome is composed of highly repeated DNA sequences, and ~10% of the insertions occurred in these sequences. T-DNA insertions into rDNA, centromeres, and telomeres also approximated their relative proportion of the genome. T-DNA pre-integration sites were average in their extent of transcription and methylation, as compared with the entire genome. Thus, T-DNA integration did not preferentially occur into any particular chromosome sequence or feature. These results were reproduced using a high-throughput sequencing strategy to identify, without selection, T-DNA integrated into the Arabidopsis genome within six hours after infection. This group also failed to identify preferential T-DNA integration into particular sequences, and the extent of DNA methylation of pre-integration sites was not biased. They did identify a slight local A+T motif enrichment at the pre-integration site, and microhomology was often observed between the T-DNA border sequences and the pre-integration site. Microhomology between T-DNA border regions and pre-integration sites is a common feature of T-DNA integration. The one distinct feature of pre-integration sites was a high nucleosome occupancy and a high level of histone H3K27 trimethylation. However, data for these last two items were based on database entries and not on direct analysis of chromatin from the tissues the authors used in their studies.
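The comparison underlying these conclusions (insertions recovered in a feature class versus the fraction of the genome that class occupies) can be sketched as a simple observed/expected calculation. This is a minimal illustration, not the analysis used in the cited studies; the function name, the normal-approximation z-score, and the counts in the example are all assumptions chosen for illustration.

```python
import math


def enrichment(hits: int, total: int, genome_fraction: float) -> tuple[float, float]:
    """Return (observed/expected ratio, z-score) for insertions in one feature class.

    hits            -- insertions that fall in the feature class
    total           -- all mapped insertions
    genome_fraction -- fraction of the genome occupied by the feature class
    """
    if total <= 0 or not 0.0 < genome_fraction < 1.0:
        raise ValueError("need total > 0 and 0 < genome_fraction < 1")
    expected = total * genome_fraction
    ratio = hits / expected
    # Normal approximation to the binomial; reasonable only for large counts.
    z = (hits - expected) / math.sqrt(expected * (1.0 - genome_fraction))
    return ratio, z


# Hypothetical counts: 100 of 1000 insertions in a class covering 10% of the genome.
ratio, z = enrichment(hits=100, total=1000, genome_fraction=0.10)
print(f"observed/expected = {ratio:.2f}, z = {z:.2f}")
```

A ratio near 1 (and a z-score near 0) corresponds to the unbiased pattern described above, whereas a ratio well below 1 for heterochromatin would be the signature expected from selection bias.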
Taken together, these two studies do not point to any distinctive plant DNA or chromatin features as T-DNA target integration sites. However, chromatin conformation may influence T-DNA integration. The rat5 Arabidopsis mutant is highly susceptible to transient transformation but highly resistant to stable transformation by Agrobacterium. This mutant contains a T-DNA insertion in the 3′ untranslated region of the histone H2A-1 gene HTA1. Overexpression of HTA1 in otherwise wild-type Arabidopsis plants increased stable transformation, as it also did in rice. The expression of HTA1 correlates with Arabidopsis cell and tissue susceptibility to Agrobacterium-mediated transformation, and all tested histone H2A proteins were functionally redundant with respect to increasing transformation when expressed in Arabidopsis. In addition to histone H2A, overexpression of histone H4 and one histone H3 protein (HTR11) also conferred hyper-susceptibility to Agrobacterium-mediated transformation. However, overexpression of other histone H3 proteins and all tested histone H2B proteins had no effect on transformation. It is not clear whether manipulation of histone levels in plant cells directly alters the extent of T-DNA integration, or whether these altered histone levels influence plant gene expression, thereby indirectly affecting T-DNA integration. Nevertheless, the direct interaction of VirD2 with histones in yeast suggests that histones may help direct VirD2/T-strand complexes to the host genome.
Might Agrobacterium or host proteins guide T-strands to host chromatin prior to integration? VirE2 protein, a virulence effector protein secreted by Agrobacterium into plant cells, is a single-strand DNA-binding protein that has been proposed to interact with VirD2/T-strands in the plant cell, forming a “T-complex”. (It should be noted that such complexes have never been identified in Agrobacterium-infected plants.) VirE2 can interact with the plant bZIP transcription factor VIP1. VIP1 can interact with histones and nucleosomes, suggesting that VIP1 may be a molecular link between T-complexes entering the nucleus and T-DNA integration sites in the plant chromosomes. However, two recent publications demonstrated that neither VIP1 nor its orthologs are required for Agrobacterium-mediated transformation, throwing into doubt a role for VIP1 in T-DNA integration.
In addition to the involvement of histones in Agrobacterium-mediated transformation and, perhaps, T-DNA integration, histone-associated and histone-modifying proteins also affect transformation. Crane and Gelvin tested RNAi lines individually directed against 109 Arabidopsis chromatin-related genes for transient and stable transformation. Silencing of 24 of these genes decreased transformation. In particular, silencing of SGA1 (encoding a histone H3 chaperone) and HDT1 (encoding a histone deacetylase) greatly decreased both stable transformation and T-DNA integration. Silencing of genes involved in chromatin remodeling, DNA methylation, histone acetylation, and nucleosome assembly also had an effect on stable transformation, although this may result from secondary effects these genes have on the expression of other genes involved in stable transformation. Deletion of genes encoding components of yeast histone acetyltransferase complexes increased yeast transformation, whereas deletion of genes encoding proteins of histone deacetylase complexes decreased yeast transformation. For some of these mutants, integration of T-DNA into the yeast genome was disrupted.
Taken together, these results indicate that proteins associated with chromatin structure and modification are important for T-DNA integration into plant or yeast genomes, and that in some instances the effect on integration may be direct, rather than an indirect consequence of altered expression of other genes important for T-DNA integration.
As described above, the position of T-DNA integration, and the chromatin structure of that region, may influence the expression of T-DNA-encoded transgenes. As a practical consideration, this variability in transgene expression (the so-called “position effect”) will influence studies on gene and promoter function. Scientists therefore need to examine a large number of independent transgenic events to draw conclusions about, e.g., relative promoter strengths.
4. T-DNA Integrates into Plant DNA Break Sites
T-DNA preferentially integrates into double-strand DNA breaks. This observation was followed by two other reports also showing preferential T-DNA integration into double-strand break sites. In each of these studies, a rare-cutting meganuclease (either I-SceI or I-CeuI) was used to cut tobacco DNA during transformation. T-DNA was preferentially “trapped” in these cut sites at frequencies up to several percent of the examined integration events. More recently, scientists used CRISPR technology to generate double-strand breaks in DNA, either to generate site-directed mutations or to attempt homology-dependent repair using recombination with correction templates. In several instances, T-DNA was trapped at these break sites following Cas nuclease cutting. It is thus clear that double-strand DNA breaks can act as a “T-DNA magnet”. However, does Agrobacterium take advantage of naturally occurring host DNA breaks (or nicks), or can Agrobacterium infection perhaps induce host DNA disruptions?
That Agrobacterium can incite DNA breaks would not be unusual, because inoculation by other plant pathogens (bacteria, oomycetes, and fungi) can cause double-strand DNA breaks in host plant genomes. DNA disruptions occur in Arabidopsis cells near the site of Agrobacterium infection, as detected by COMET assays. However, because alkaline pH conditions were used in this study, it is not clear whether these disruptions resulted from single-strand nicks or double-strand breaks in the plant DNA. Recent results indicate that Arabidopsis cells, exposed to Agrobacterium but not stably transformed, contain a higher number of in/dels than would be expected from the natural frequency of such mutations. These results suggest that incubation of cells with Agrobacterium is inherently mutagenic, causing double-strand DNA breaks that are mis-repaired.
There are many hints in the literature that Agrobacterium infection can cause mutations independent of T-DNA integration; these mutations may result from induced double-strand DNA breaks that are subsequently mis-repaired. They may also be generated by “abortive integration” of T-DNA, followed by mis-repair of the abortive integration site. For example, N. plumbaginifolia plants, containing one mutant nitrate reductase (NR) gene, could be converted to fully NR null mutants (chlorate resistant) following Agrobacterium-mediated transformation. However, none of these null mutants contained T-DNA in the NR gene. Mutation of the wild-type NR allele must have occurred by some other mechanism.
Another indication that Agrobacterium infection may be inherently mutagenic derives from the observation that only ~35% of the T-DNAs in Arabidopsis T-DNA insertion libraries co-segregate with a screened mutant phenotype. Mutations in the selected lines may be derived from disruptions other than T-DNA insertion into the gene of interest.
T-DNA insertion lines contain complex host genome rearrangements that are frequently associated with mis-repair of double-strand DNA breaks. These include inversions, translocations, and other complex rearrangements. Similar rearrangements have been detected in transgenic rice. Clark and Krysan noted that approximately 19% of the examined lines from the SALK T-DNA mutant collection contained translocations. The rearrangements of plant genomes following T-DNA integration are reminiscent of the process of chromothripsis resulting from CRISPR-Cas9 mammalian genome editing.
5. What Is the Mechanism of T-DNA Integration?
Perhaps the most important problem remaining in understanding Agrobacterium-mediated transformation is the mechanism of T-DNA integration. As cited above, numerous models of integration have been proposed. What is clear is that homologous recombination is not the mechanism: despite many kilobases of homology between plant DNA and engineered T-DNA, integration into homologous sequences in the plant genome occurs extremely rarely. This differs from the situation in yeast, where homologous recombination is predominant when homology between T-DNA and the yeast genome is present. Thus, what remains for the T-DNA integration mechanism in plants is some form of non-homologous end-joining (NHEJ) in which T-DNA integration occurs in the absence of large regions of homology, although targeting by microhomology may be used in some circumstances.
Two major NHEJ pathways have been described. The “classical” (Ku-dependent) pathway utilizes, among other proteins, the Ku70/Ku80 heterodimer to protect the broken DNA ends, and the complex of XRCC4/XLF/DNA ligase IV to repair the break. It is not unusual that, following repair, small deletions, insertions, or nucleotide substitutions occur at or near the break site. Microhomology between the ligated ends is rarely detected. An “alternative” pathway uses microhomology between a region at or near the break site and another sequence (near or distant from the break site) for repair. Participants in this pathway include members of the MRN complex that process broken chromosome ends, the WRN helicase, and a complex of XRCC1 and DNA ligase III (not found in plants) to repair the breaks. DNA polymerase θ is a participant in this pathway and is proposed to play a key role in T-DNA integration. Microhomology-mediated end-joining (MMEJ) is frequently referred to as theta-mediated end-joining because of DNA polymerase θ’s role in this process. DNA polymerase θ has several unusual properties: the protein is made up of both a helicase and a DNA polymerase domain, and the enzyme has a propensity to “template switch”. This latter property allows it to copy DNA from another region of the genome into break sites, generating “filler” DNA sequences in the break. MMEJ is highly mutagenic, frequently generating deletions as sequences flanking DNA break sites search for homologous sequences with which to join. MMEJ also generates chromosomal rearrangements such as inversions and translocations, features commonly associated with T-DNA integration.
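The notion of microhomology used above can be illustrated with a toy search for the longest run of identical nucleotides shared by two short sequences, such as a T-DNA end and the plant DNA flanking a break. This is only a sketch of the concept: the function name and the two example sequences are hypothetical, and real junction analyses would align sequenced junctions against the reference genome rather than compare two short strings.

```python
def longest_shared_run(a: str, b: str) -> str:
    """Return the longest contiguous stretch present in both sequences.

    A brute-force search, adequate for the short windows (tens of nucleotides)
    in which microhomology is usually scored. Ties return the first run found.
    """
    best = ""
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            # Extend the match while both sequences agree.
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > len(best):
                best = a[i:i + k]
    return best


# Hypothetical sequences chosen only to demonstrate the function.
t_dna_end = "TTGACAGG"
plant_flank = "CCGACATT"
print(longest_shared_run(t_dna_end, plant_flank))
```

In this made-up pair the shared stretch is four nucleotides long, a length within the range typically described as microhomology, in contrast to the kilobases of homology required for homologous recombination.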
Which of these NHEJ pathways, if any, are involved in T-DNA integration? Numerous studies have been published testing stable transformation efficiencies and T-DNA integration characteristics of various Arabidopsis and rice NHEJ mutants. However, with the exception of three publications, all other studies used the frequency of stable transformation as a proxy for T-DNA integration. While detection of stable transformants requires T-DNA integration, it also requires expression of selection marker genes to recover transformed tissue. T-DNA integration may thus occur in the absence of stable transformation if selection marker genes have been silenced. As noted above, such selection bias can confound experimental interpretations. An additional complication is that most Arabidopsis stable transformation experiments were conducted using a flower-dip protocol. It is well documented that the Arabidopsis genes important for somatic cell transformation differ from those important for germ-line transformation. Finally, stable transformation efficiencies must be calculated with respect to transient transformation frequencies; a decrease in stable transformation does not indicate that a particular plant mutant has altered stable transformation characteristics if the transient transformation frequency is correspondingly decreased. It is particularly important that plant inoculations be conducted with Agrobacterium concentrations spanning several orders of magnitude, to avoid a “saturation response” at high bacterial inoculum that would obscure differences among wild-type and mutant plant genotypes.
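The normalization argued for here can be written out as a small calculation. The sketch below is one possible way to express it, under stated assumptions: the `Assay` record, its field names, and all numbers are hypothetical, and "transient" and "stable" stand for whatever frequency measures a given laboratory uses (for example, the fraction of root segments scoring positive in each assay).

```python
from dataclasses import dataclass


@dataclass
class Assay:
    inoculum_cfu_per_ml: float  # bacterial concentration used for inoculation
    transient: float            # transient transformation frequency (0..1)
    stable: float               # stable transformation frequency (0..1)


def stable_per_transient(a: Assay) -> float:
    """Stable transformation expressed relative to transient transformation."""
    if a.transient <= 0:
        raise ValueError("transient frequency must be positive to normalize")
    return a.stable / a.transient


def mutant_vs_wild_type(mutant: Assay, wild_type: Assay) -> float:
    """Ratio of normalized stable transformation, mutant over wild type.

    Both assays should come from the same inoculum concentration.
    """
    if mutant.inoculum_cfu_per_ml != wild_type.inoculum_cfu_per_ml:
        raise ValueError("compare genotypes at the same inoculum concentration")
    return stable_per_transient(mutant) / stable_per_transient(wild_type)


# Hypothetical values: the mutant's stable frequency is halved, but so is its
# transient frequency, so the normalized ratio shows no integration-specific defect.
wt = Assay(inoculum_cfu_per_ml=1e6, transient=0.8, stable=0.4)
mut = Assay(inoculum_cfu_per_ml=1e6, transient=0.4, stable=0.2)
print(round(mutant_vs_wild_type(mut, wt), 3))
```

Repeating the comparison at each inoculum concentration in a dilution series shows whether a difference between genotypes appears only at low inoculum, which is the pattern expected if high inoculum saturates the response.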
In light of these numerous variables and limitations, it may not be surprising that different laboratories have come to different conclusions with regard to the importance of various plant NHEJ genes for T-DNA integration (or rather, for most studies, stable transformation). Several reports indicate that mutation of the Arabidopsis or rice classical (c)NHEJ pathway genes Ku70 or DNA ligase IV (Lig4) resulted in lower stable transformation frequencies. These studies suggest that these cNHEJ genes are important for T-DNA integration. Other publications indicated that such mutations had little or no effect on stable transformation. These studies suggest that these cNHEJ genes are not essential for T-DNA integration. Still other publications, using both Arabidopsis and N. benthamiana, showed that mutation or down-regulation of several cNHEJ genes, including Ku70, and of the gene encoding DNA ligase VI (Lig6), increased both stable transformation and T-DNA integration into non-selected plant cells. These studies suggest that expression of these genes inhibits T-DNA integration, perhaps by speeding the repair of double-strand DNA breaks required for T-DNA integration.
Similarly, individual mutation of two genes associated with MMEJ, XRCC1 and PARP2, did not decrease stable transformation of Arabidopsis root tissue (the gene now termed PARP2 was described under an earlier name in that study). Mutation of PARP2 actually increased the frequency of T-DNA integration into the genome of non-selected root cells 2- to 10-fold. The discrepancy between increased T-DNA integration frequency and similar stable transformation frequency of wild-type and parp2 mutant roots was explained by increased DNA methylation of T-DNA in the parp2 mutant plants, likely resulting in silencing of the selection genes. This result indicates the importance of investigating T-DNA integration biochemically in non-selected tissue, rather than relying on stable transformation frequency of selected tissue as a proxy for T-DNA integration.
6. The Importance of DNA Polymerase Theta for Agrobacterium Transformation and T-DNA Integration
In 2016, van Kregten et al. published a seminal paper in which they proposed an essential function for DNA polymerase θ in stable transformation of Arabidopsis and T-DNA integration into its genome. These authors examined two DNA polymerase θ (polQ) mutants, tebichi2 (teb2) and teb5. Although they could not detect differences in transient transformation between wild-type and polQ mutant plants, they were not able to obtain any stable transformants of the polQ mutants using either a flower-dip transformation protocol or a root transformation protocol requiring selection of transgenic calli and regeneration of plants from these calli. The authors noted that DNA polymerase θ can “template switch” during DNA replication, and that it can thereby generate “filler” DNA sequences, a common characteristic of T-DNA/plant DNA junctions at the break site, by copying and joining T-strand DNA and microhomologous plant DNA. They also noted that copying T-strand sequences into both ends of a plant DNA double-strand break could result in integration of T-DNA “head-to-head” (RB-to-RB) dimers, also a common characteristic of many T-DNA insertions. T-DNA integration via theta-mediated end-joining thus became the favored model for T-DNA integration into plant genomes.
Nishizawa-Yokoi et al. re-examined the role of DNA polymerase θ in T-DNA integration. Using the same Arabidopsis teb2 and teb5 mutants used by van Kregten et al., as well as three independent rice polQ mutants, this group was able to obtain stable transformants of somatic tissue in all tested polQ mutants. Similar to van Kregten et al., they were not able to transform Arabidopsis by a flower-dip protocol, except when the incoming T-DNA constitutively expressed a wild-type PolQ gene. These authors additionally showed that transient transformation of roots from the Arabidopsis polQ mutants did decrease relative to transformation of wild-type roots. T-DNA/plant DNA junctions isolated from transformed rice and Arabidopsis polQ mutant calli had characteristics similar to those isolated from wild-type tissue. Finally, the authors showed that both Arabidopsis and rice polQ mutants had growth and/or developmental defects; root segments from Arabidopsis polQ mutants did not form callus well and the calli grew slowly. Calli derived from rice polQ mutants did not regenerate plants even under non-transformation and non-selection conditions. The variable penetrance of the tebichi phenotype was recently examined and was shown to increase under stress conditions, including replication stress. Similar to the situation with Arabidopsis flower-dip transformation, rice polQ mutants could be transformed and regenerated into plants if the incoming T-DNA contained a constitutively expressed PolQ gene. Thus, transformation and developmental deficiencies resulting from mutation of PolQ could be complemented by transient expression of a wild-type PolQ gene in both Arabidopsis and rice.