TAL (transcription activator-like) effectors (often referred to as TALEs, but not to be confused with the three amino acid loop extension homeobox class of proteins) are proteins secreted by Xanthomonas bacteria via their type III secretion system when they infect various plant species. These proteins can bind promoter sequences in the host plant and activate the expression of plant genes that aid bacterial infection. They recognize plant DNA sequences through a central repeat domain consisting of a variable number of ~34 amino acid repeats. There appears to be a one-to-one correspondence between the identity of two critical amino acids in each repeat and each DNA base in the target sequence. These proteins are interesting to researchers both for their role in diseases of important crop species and for the relative ease of retargeting them to bind new DNA sequences. Similar proteins can be found in the pathogenic bacterium Ralstonia solanacearum and in Burkholderia rhizoxinica, as well as in yet unidentified marine microorganisms. The term TALE-likes is used to refer to the putative protein family encompassing the TALEs and these related proteins.
Xanthomonas are Gram-negative bacteria that can infect a wide variety of plant species including pepper, rice, citrus, cotton, tomato, and soybeans. Some types of Xanthomonas cause localized leaf spot or leaf streak while others spread systemically and cause black rot or leaf blight disease. They inject a number of effector proteins, including TAL effectors, into the plant via their type III secretion system. TAL effectors have several motifs normally associated with eukaryotes, including multiple nuclear localization signals and an acidic activation domain. When injected into plants, these proteins can enter the nucleus of the plant cell, bind plant promoter sequences, and activate transcription of plant genes that aid in bacterial infection. Plants have developed a defense mechanism against type III effectors that includes R (resistance) genes triggered by these effectors. Some of these R genes appear to have evolved to contain TAL effector binding sites similar to the site in the intended target gene. This competition between pathogenic bacteria and the host plant has been hypothesized to account for the apparently malleable nature of the TAL effector DNA binding domain.
The most distinctive characteristic of TAL effectors is a central repeat domain containing between 1.5 and 33.5 repeats that are usually 34 residues in length (the C-terminal repeat is generally shorter and referred to as a “half repeat”). A typical repeat sequence is LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG, but the residues at the 12th and 13th positions are hypervariable (these two amino acids are also known as the repeat variable diresidue or RVD). Two separate groups have shown that there is a simple relationship between the identity of these two residues in sequential repeats and sequential DNA bases in the TAL effector’s target site. The first group, headed by Adam Bogdanove, broke this code computationally by searching for patterns in protein sequence alignments and DNA sequences of target promoters. The second group deduced the code through molecular analysis of the TAL effector AvrBs3 and its target DNA sequence in the promoter of a pepper gene activated by AvrBs3. The experimentally validated code between RVD sequence and target DNA base can be expressed as NI = A, HD = C, NG = T, NN = R (G or A), and NS = N (A, C, G, or T). Further studies have shown that the RVD code NG (but not HD) can target 5-methyl-C. Also, the RVD NK can target G, although TAL effector nucleases (TALEN) that exclusively use NK instead of NN to target G can be less active. The crystal structure of a TAL effector bound to DNA indicates that each repeat comprises two alpha helices and a short RVD-containing loop where the second residue of the RVD makes sequence-specific DNA contacts while the first residue of the RVD stabilizes the RVD-containing loop. Target sites of TAL effectors also tend to include a thymine flanking the 5’ base targeted by the first repeat; this appears to be due to a contact between this T and a conserved tryptophan in the region N-terminal of the central repeat domain. 
However, this "zero" position does not always contain a thymine, as some scaffolds are more permissive.
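The RVD code and the position-zero thymine described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names RVD_CODE, extract_rvd, predicted_target, and site_matches are hypothetical, degenerate bases are written with IUPAC letters (R = A or G, N = any base), and real binding is graded rather than the all-or-nothing match used here.

```python
# RVD-to-base code as listed in the text; R and N are IUPAC degenerate letters.
RVD_CODE = {"NI": "A", "HD": "C", "NG": "T", "NN": "R", "NS": "N"}

# Concrete bases allowed by each letter that can appear in a predicted target.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}


def extract_rvd(repeat):
    """Return the residues at positions 12 and 13 (1-based) of one repeat."""
    if len(repeat) < 13:
        raise ValueError("repeat is too short to contain an RVD")
    return repeat[11:13]


def predicted_target(rvds, leading_t=True):
    """Translate an ordered list of RVDs into a (possibly degenerate) DNA pattern.

    leading_t prepends the thymine that often precedes the first targeted
    base; set it to False for scaffolds that are more permissive.
    """
    pattern = "".join(RVD_CODE[rvd] for rvd in rvds)
    return ("T" + pattern) if leading_t else pattern


def site_matches(rvds, site, leading_t=True):
    """Simplified yes/no check of a concrete DNA site against the pattern."""
    pattern = predicted_target(rvds, leading_t)
    site = site.upper()
    if len(site) != len(pattern):
        return False
    return all(base in IUPAC[letter] for letter, base in zip(pattern, site))


# The typical repeat quoted in the text carries the RVD "HD".
print(extract_rvd("LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG"))
print(predicted_target(["NI", "HD", "NG", "NN", "NS"]))
```

The alternative RVD NK mentioned above is left out of the table on purpose; it could be added as an extra entry mapping to G.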
This simple correspondence between amino acids in TAL effectors and DNA bases in their target sites makes them useful for protein engineering applications. Numerous groups have designed artificial TAL effectors capable of recognizing new DNA sequences in a variety of experimental systems. Such engineered TAL effectors have been used to create artificial transcription factors that can be used to target and activate or repress endogenous genes in tomato, Arabidopsis thaliana, and human cells.
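To illustrate why this correspondence makes retargeting straightforward, the following minimal sketch goes from a desired DNA site to an ordered list of RVDs. It is not a published design tool: the function name design_rvds and its parameters are hypothetical, G is targeted with NN by default (which also tolerates A), and the site is assumed to begin with the thymine that usually precedes the first targeted base.

```python
# One RVD per base, following the RVD code; NN is the default choice for G.
BASE_TO_RVD = {"A": "NI", "C": "HD", "T": "NG", "G": "NN"}


def design_rvds(site, g_rvd="NN", require_5prime_t=True):
    """Return the RVDs, in order, for a DNA site written 5' to 3'.

    g_rvd selects the RVD used for G ("NN" or "NK").
    require_5prime_t treats the first base as the position-zero thymine,
    which is checked and then skipped because no repeat targets it.
    """
    if g_rvd not in ("NN", "NK"):
        raise ValueError("g_rvd must be 'NN' or 'NK'")
    site = site.upper()
    if require_5prime_t:
        if not site.startswith("T"):
            raise ValueError("site should start with T at position zero")
        site = site[1:]
    table = dict(BASE_TO_RVD, G=g_rvd)
    try:
        return [table[base] for base in site]
    except KeyError as err:
        raise ValueError(f"unsupported base: {err.args[0]}") from None


print(" ".join(design_rvds("TACGT")))  # prints "NI HD NN NG"
```

Passing g_rvd="NK" swaps in the alternative RVD for G, and require_5prime_t=False models a scaffold that does not need the leading thymine.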
Genetic constructs to encode TAL effector-based proteins can be made using either conventional gene synthesis or modular assembly. A plasmid kit for assembling custom TALEN® and other TAL effector constructs is available through the public, not-for-profit repository Addgene. Webpages providing access to public software, protocols, and other resources for TAL effector-DNA targeting applications include the TAL Effector-Nucleotide Targeter and taleffectors.com.
TAL effectors can induce susceptibility genes that are members of the NODULIN3 (N3) gene family. These genes are essential for the development of the disease. In rice, two genes, Os-8N3 and Os-11N3, are induced by TAL effectors. Os-8N3 is induced by PthXo1, and Os-11N3 is induced by PthXo3 and AvrXa7. Two hypotheses exist about possible functions for N3 proteins:
Engineered TAL effectors can also be fused to the cleavage domain of FokI to create TAL effector nucleases (TALEN) or to meganucleases (nucleases with longer recognition sites) to create "megaTALs." Such fusions share some properties with zinc finger nucleases and may be useful for genetic engineering and gene therapy applications.
TALEN-based approaches are used in the emerging fields of gene editing and genome engineering. TALEN fusions show activity in a yeast-based assay, at endogenous yeast genes, in a plant reporter assay, at an endogenous plant gene, at endogenous zebrafish genes, at an endogenous rat gene, and at endogenous human genes. The human HPRT1 gene has been targeted at detectable but unquantified levels. In addition, TALEN constructs containing the FokI cleavage domain fused to a smaller portion of the TAL effector that still contains the DNA binding domain have been used to target the endogenous NTF3 and CCR5 genes in human cells with efficiencies of up to 25%. TAL effector nucleases have also been used to engineer human embryonic stem cells and induced pluripotent stem cells (iPSCs) and to knock out the endogenous ben-1 gene in C. elegans.