Sleeping Beauty (SB) is a transposon system that has been widely used as a genetic engineering tool. Central to the development of any transposon as a research tool is the ability to integrate a foreign piece of DNA into the cellular genome. Driven by the need for efficient transposon-based gene vector systems, extensive studies have largely elucidated the molecular actors and actions taking place during SB transposition. Close transposon relatives and other recombination enzymes, including retroviral integrases, have served as useful models to infer functional information relevant to SB. Recently obtained structural data on the SB transposase enable a direct insight into the workings of this enzyme. These efforts cumulatively allowed the development of novel variants of SB that offer advanced possibilities for genetic engineering due to their hyperactivity, integration deficiency, or targeting capacity. However, many aspects of the process of transposition remain poorly understood and require further investigation. We anticipate that continued investigations into the structure–function relationships of SB transposition will enable the development of new generations of transposition-based vector systems, thereby facilitating the use of SB in preclinical studies and clinical trials.
The capacity of nucleic acids to move around and integrate into a new locus has evolved in manifold ways. Different systems have gained the capacity to process nucleic acids and integrate them, including retroviruses, endogenous retroviruses, and homologous recombination repair mechanisms. Among them, the large family of transposons, first described by Barbara McClintock in maize, has the ability to move its genetic information within the genome.
Transposable elements (TEs) can be classified into two groups according to their mechanism of movement. Class I TEs, also called retrotransposons, follow a copy-and-paste mechanism. After transcription of their DNA genome to RNA, a reverse transcription step back into DNA is performed, and a reintegration into the genome occurs . This process has certain similarities with retroviruses. Class I retrotransposons can be further subdivided into long terminal repeat (LTR) retrotransposons; retroviruses  and endogenous retroviruses (ERV) ; and non-LTR retrotransposons, including long interspersed nuclear elements (LINEs, such as the L1 element ) and short interspersed nuclear elements (SINEs, such as the Alu element ).
Class II transposons are DNA transposons solely relying on DNA intermediates in their transposition process. They can be subdivided into two subclasses. Subclass I follows a cut-and-paste mechanism, during which the transposon is excised from one genomic location and reintegrates somewhere else . In contrast, Subclass II transposons, such as members of the Helitron superfamily , follow a copy-and-paste mechanism, during which the element generates copies of itself which integrate into the genome. However, unlike with retrotransposons, the copying mechanism does not involve an RNA intermediate. Subclass I DNA transposons include the superfamilies Transib , piggyBac , PIF/Harbinger , and Tc1/mariner .
All the members of the Tc1/mariner superfamily have in common that these elements are flanked by terminal inverted repeats (TIRs), and contain a gene encoding a transposase, an enzymatic factor catalyzing the transposition reaction . The transposase binds to the TIRs, excises the transposon from the donor locus, and reintegrates it adjacent to a TA target sequence, leading to a TA target site duplication . Members of the Tc1/mariner family are ubiquitous in eukaryotes .
Because the TIRs and the transposase are considered to constitute the minimally required components for the transposition reaction, a transposon that contains all these elements is therefore considered autonomous . However, many autonomous TEs have given rise to non-autonomous derivatives by mutations, insertions, or deletions in their transposase coding regions. These non-autonomous TEs can still be mobilized, but need a functional transposase expressed by another element in the same cell . It is this trans-complementarity between two functional components (the transposase and the specific TIRs that are recognized and mobilized by the transposase) that serves as the basis of turning transposons into genetic vector systems suitable for moving any gene of interest into the genome of a host cell. The Sleeping Beauty (SB) transposon system  is widely used as a genetic engineering tool (recently reviewed in Amberger et al. ). The structural features and mechanistic steps and processes taking place in the life cycle of SB from DNA binding up to integration are described in the following sections.
The SB transposase (Figure 1a) is composed of an N-terminal DNA binding domain (DBD) (amino acids (aa) 1–110) and a C-terminal catalytic domain (DDE) (aa 114–340) connected by a flexible linker region harboring a nuclear localization signal (NLS) (aa 97–123) . The DBD consists of the two subdomains PAI and RED (PAIRED-like DBD) connected by a linker . Each subdomain is predicted to consist of three α-helices forming a helix-turn-helix (HTH) motif which is found in many DNA binding proteins . The predicted HTH motif was confirmed by the NMR structure of the DBD subdomains  (Figure 1b). The NMR structure shows that the three helices of the PAI subdomain are located in the residues aa 12–22, aa 29–33, and aa 39–55, which are tightly packed. The HTH motif is between the second and third helices . Around 30% of the PAI subdomain consists of positively charged amino acids, mainly arginines and lysines, leading to electrostatic repulsion and the destabilization of the structure in the presence of physiological salt concentration and the absence of the TIRs . The three helices of the RED subdomain are located in the residues aa 67–77, aa 84–93, and aa 100–109 . Helices 1 and 2 pack against each other in an antiparallel arrangement, whereas helix 3 is located on top of them . The HTH motif is between helices 2 and 3; however, in contrast to the PAI subdomain, it does not show a canonical β-turn connecting both helices, but a variation in the β-turn with a longer turn-motif . Additionally, helix 3 in the PAI subdomain is one turn longer . Similarly to the PAI subdomain, the RED subdomain is highly positively charged, enhancing its DNA binding .
Figure 1. Structural features of the Sleeping Beauty transposable element. (a) Schematic drawing of the domain structure of the SB transposase. The SB transposase has an N-terminal bipartite, paired-like DNA binding domain (green box) with the helix-turn-helix PAI subdomain (light green box) and RED subdomain (red box) and a GRRR AT-hook motif. It is followed by a bipartite nuclear localization signal (NLS, yellow boxes) and a C-terminal catalytic domain (orange box), with the DDE amino acid triad catalyzing the DNA cleavage and joining reactions. The clamp loop, which is important for protein–protein interactions, overlaps with a glycine-rich box (light orange box). (b) NMR structure of the PAI and RED subdomains of the SB transposase. Reprinted from Protein Science with permission from the publisher. (c) Crystal structure of the catalytic domain of the SB transposase with the catalytic triad (DDE) and the clamp loop. Reprinted from Nature Communications with permission from the publisher. (d) Schematic drawing of the autonomous SB transposable element with the transposase coding region (yellow box) and the TIRs (blue arrows). An untranslated region (UTR, green box) is situated between the left TIR and the transposase coding region. The TIRs contain two binding sites for the transposase (orange arrows) represented by short direct repeats (DRs), one inner and one outer DR per TIR. In addition, the left TIR contains a “half-DR” sharing sequence similarities with the DRs. The DR core sequence, with which the PAI subdomain of the SB transposase interacts, is typed in red.
The catalytic domain is predicted to have an RNaseH-like fold, similar to other DDE recombinases. The catalytic triad of three acidic residues (DDE), which gives the domain its name, catalyzes both the DNA hydrolysis required for excision and the transesterification taking place in the integration reaction, in a two-metal-ion-dependent manner. Crystallographic structure analysis confirmed the predicted RNaseH-like fold, consisting of a central five-stranded β-sheet surrounded by five α-helices (Figure 1c). The three catalytic residues (D153, D244, and E279) are in close proximity, making up the active site of the enzyme. The clamp loop (aa 159–190) between β1 and β2 includes a glycine-rich strip (aa 183–190), which is curved and pivots on three consecutive glycines (aa 188–190), leading to an extended protein–protein surface. The tip of the clamp loop has two short antiparallel β-strands (aa 169–174 and aa 174–176), forming a β-hairpin which is important for the protein–protein interaction with the inter-domain linker (aa 119–122) of a partner SB transposase molecule.
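The residue ranges quoted above for the DNA binding and catalytic domains can be collected into a small lookup table, which makes overlaps (for example, between the NLS-containing linker and both domains) easy to inspect. The following is a minimal sketch; the dictionary layout and the function name `features_at` are illustrative assumptions, and only the numeric ranges come from the text.

```python
# Residue ranges as quoted in the text (1-based, inclusive).
# The data layout and names are illustrative, not a published annotation format.
SB_FEATURES = {
    "DNA-binding domain (DBD)": (1, 110),
    "PAI helix 1": (12, 22),
    "PAI helix 2": (29, 33),
    "PAI helix 3": (39, 55),
    "RED helix 1": (67, 77),
    "RED helix 2": (84, 93),
    "RED helix 3": (100, 109),
    "NLS-containing linker": (97, 123),
    "catalytic domain (DDE)": (114, 340),
    "clamp loop": (159, 190),
    "glycine-rich strip": (183, 190),
}

# Catalytic triad residues (position -> one-letter amino acid code).
CATALYTIC_TRIAD = {153: "D", 244: "D", 279: "E"}


def features_at(position):
    """Return the names of all annotated features covering a residue position."""
    return [
        name
        for name, (start, end) in SB_FEATURES.items()
        if start <= position <= end
    ]


if __name__ == "__main__":
    for pos in (105, 153, 188):
        print(pos, features_at(pos))
```

For instance, residue 105 falls within the DBD, RED helix 3, and the NLS-containing linker at once, which reflects the overlapping ranges given for these regions.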
In addition to the transposase, the TIRs flanking both ends of the SB transposon (Figure 1d) are also critically required for the transposition process. When SB is used as a gene delivery tool, any genetic cargo can be placed between the TIRs and mobilized by the transposase. The TIRs are ~220 bp in length and contain two direct repeats (DRs), one outer and one inner, serving as binding sites for the SB transposase. This TIR arrangement has been called the IR/DR structure. Notably, the four DRs of SB are not identical: the outer DRs are longer than the inner DRs by 2 bp (Figure 1d), and even slight variations in the DR sequences can have a severe effect on the transposition efficiency. The left and right TIRs are not identical either; the left TIR has an extra “half-DR” element showing sequence similarities to the transposase binding site (Figure 1d), which acts as a transpositional enhancer. Downstream of the left TIR is an untranslated region (Figure 1d) that contributes to the transcriptional regulation of the transposase.
The transposition life cycle begins with binding of the transposase to the transposon DNA (Figure 2a). The DNA binding domain of the transposase is mainly responsible for DNA recognition. Of the two subdomains (PAI and RED), the PAI subdomain has the dominant role in base-specific DNA binding. The 3′-part of the transposase binding site, which contains a core sequence conserved in all four DRs, is recognized by the PAI subdomain. The DNA binding region of the PAI subdomain is located in the residues aa 28, 29, 31, 33–36, 38–43, and 47, which are situated on the second and third α-helices and on the loop connecting these helices of the HTH motif, consistent with the role of HTH motifs in DNA binding. The RED subdomain interacts with the 5′-part of the DR adjacent to the core sequence. This interaction of the RED subdomain with DNA occurs only in the outer DRs and not the inner DRs. Residues located on the third helix of the RED subdomain have been identified as primarily responsible for the DNA recognition of this subdomain; however, helix 1 is also highly positively charged and therefore potentially capable of binding DNA. All four transposase binding sites in the IR/DR structure of the TIRs are necessary for SB transposition. An important aspect for the next steps in the life cycle of SB transposition is the formation of a transposase tetramer in a complex with the transposase binding sites. The inner DRs are bound by the transposase with a higher affinity than the outer DRs, which was also confirmed by the NMR data on the PAI subdomain. Additionally, the “half-DR” in the left TIR is bound by the PAI subdomain and mediates protein–protein interactions with other transposase subunits. The PAI subdomain therefore fulfills three important functions: interaction with the DRs, interaction with the “half-DR”, and transposase oligomerization. A GRRR amino acid motif contributes as an AT-hook to specific substrate recognition.
In domain swapping experiments, it was shown that primary DNA binding is not sufficient to determine the specificity of the transposition reaction . These experiments indicate that the RED subdomain enforces specificity at a later step in transposition and therefore prevents the mobilization of the SB transposon by transposases expressed by other, closely related subfamilies in the same genome. It was also shown that the RED subdomain is involved in protein–protein interactions and forms dimers upon DNA binding . Helix 2 of the RED subdomain has neutral or negative electrostatic potential and therefore could mediate protein–protein interactions . All these observations of the DNA-binding are consistent with the crystal structures of protein-DNA complexes of closely related Tc1/mariner family members such as Tc3 and Mos1 transposases . Because the Tc3 and Mos1 transposons do not have an IR/DR-like structure of their TIRs (instead, these transposons have a single binding site for their transposases at each end of their short TIRs), the presence and strict requirement for IR/DR in SB transposition suggests a regulatory role, which is discussed in the next section.
Figure 2. Schematic drawing of Sleeping Beauty transposition. (a) The SB transposase (blue circle) binds to the DRs (orange arrows) within the TIRs. (b) The TIRs are brought together by SB transposase molecules in a synaptic complex. Excision of the SB transposon takes place from the donor DNA indicated by yellow flashes. (c) The excised transposon integrates into a TA site in the target DNA (green box) that is afterwards duplicated and flanks the new target site.
The next step required in the life cycle of SB transposition is the formation of a nucleoprotein complex called the synaptic complex (Figure 2b and Figure 3). In this complex, both ends of the transposon are paired and held together by transposase subunits. For the formation of a synaptic complex, the complete TIRs with four transposase binding sites (DRs) and tetramerization-competent SB transposase are required. The “half-DR” motif in the left TIR is not essential for transposition, but functions as an enhancer of the transposition together with the PAI subdomain. It likely stabilizes the complexes formed by a transposase tetramer bound at the TIRs .
Figure 3. Schematic drawing of the synaptic complex formation. (a) At first, the SB transposase binds at the inner DR of the left TIR and forms dimers at this site. The SB dimer then captures the inner DR of the other TIR. Two additional SB transposase molecules are recruited to the nucleoprotein complex, leading to an incorporation of the outer DRs into the synaptic complex. (b) Protein–DNA and protein–protein interactions in the SB synaptic complex. The PAI subdomain of the N-terminal DNA-binding domain of the SB transposase interacts with the DR core sequence at both the inner and outer DRs. The RED subdomain contributes to DNA binding only at the outer DRs. At the inner DRs, the RED subdomain contributes to transposase dimerization. The relative positions of the four transposase monomers within the complex are arbitrarily drawn. Based on the structure of the Mos1 synaptic complex , it is likely that the catalytic DDE domains are acting in trans—that is, the DDE domain of an SB monomer bound at the left TIR executes cleavage at the right TIR and vice versa.
For the formation of the synaptic complex, it has been proposed that a defined order of protein–DNA and protein–protein interactions is important  (Figure 3a). In this process, the assembly is mainly orchestrated by the interplay of the IR/DR structure and the PAIRED-like DNA binding domain of the SB transposase. The specific primary DNA recognition is performed by the PAI subdomain at an inner DR, which is bound at a higher affinity than the outer DRs . The contribution of the RED subdomain to the DNA binding at the inner DR is limited, hence the transposase forms dimers through the protein–protein interaction of the RED-RED interface located in helix 2 . The SB transposase could also bind to the inner DR as a preformed dimer. Once bound, this nucleoprotein complex captures the inner DR from the other TIR (Figure 3a). The incorporation of an outer DR into the synaptic complex by the transposase bound at the inner DR of the opposite TIR does not result in productive transposition. In the next step, two additional SB transposase molecules are recruited to the complex through the PAI-PAI protein interaction interface (Figure 3a,b). This leads to the incorporation of the outer DRs in the synaptic complex  (Figure 3a,b). In this step, the RED subdomain is required to complete the assembly process by recognizing the outer DRs, thereby preparing the complex for strand cleavage executed by the catalytic domain  (Figure 3b). This whole process is assisted by a host-encoded cofactor called HMGB1, which is recruited by the SB transposase to the TIRs . HMGB1 facilitates DNA bending at the inner DR, which could enhance the capture of the inner DR on the other TIR . However, the transposition reaction works also in the absence of HMGB1 to a lower extent . This ordered assembly is an important quality control leading to functional transposition intermediates. 
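The proposed order of events can be written down as a set of prerequisites, which makes the "quality control" idea concrete: a step is only productive if the steps it depends on have already occurred. The following is a toy model only. The step names, the `PREREQUISITES` table, and the function `first_violation` are illustrative assumptions, and the model ignores HMGB1 and all kinetics.

```python
# Toy model of the proposed assembly order of the SB synaptic complex.
# Each step maps to the set of steps that must already have happened.
# Binding and dimerization have no mutual prerequisite because the text
# notes that the transposase could also bind as a preformed dimer.
PREREQUISITES = {
    "bind_inner_DR": set(),
    "dimerize_via_RED": set(),
    "capture_other_inner_DR": {"bind_inner_DR", "dimerize_via_RED"},
    "recruit_two_transposases": {"capture_other_inner_DR"},
    "incorporate_outer_DRs": {"recruit_two_transposases"},
    "strand_cleavage": {"incorporate_outer_DRs"},
}


def first_violation(events):
    """Return the first event whose prerequisites are unmet, or None.

    `events` is a sequence of step names from PREREQUISITES.
    """
    done = set()
    for event in events:
        if not PREREQUISITES[event] <= done:
            return event
        done.add(event)
    return None


if __name__ == "__main__":
    ordered = [
        "bind_inner_DR",
        "dimerize_via_RED",
        "capture_other_inner_DR",
        "recruit_two_transposases",
        "incorporate_outer_DRs",
        "strand_cleavage",
    ]
    print(first_violation(ordered))  # None: order matches the proposal
    # Capturing an outer DR before the inner DRs are paired is flagged.
    print(first_violation(["bind_inner_DR", "dimerize_via_RED",
                           "incorporate_outer_DRs"]))
```

In this sketch, incorporating an outer DR before the inner DRs are paired is reported as a violation, mirroring the statement that such an event does not result in productive transposition.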
It is important to note that if the ends of the SB transposon are too close to each other (for example, in a circular DNA molecule), the efficiency of transposition decreases . Indeed, it has been established that efficient SB transposition requires at least ~300 bp DNA bridging the TIRs . A possible explanation for this observation is that a certain length of DNA might be necessary to accommodate the multimeric transposases and the host factor HMGB1 during the formation of the synaptic complex. This orchestrated assembly of the synaptic complex shows that an alteration in the DNA binding affinity of the SB transposase to the DRs does not necessarily enhance the transposition reaction as a whole. Indeed, the replacement of the outer DR with the sequence from the inner DR leads to insufficient SB transposition . The ordered assembly functions therefore as a “built-in” regulatory checkpoint mechanism, enforcing synaptic complex formation before excision and ensuring that DNA cleavage occurs only at the outer DRs, thereby leading to a higher level of accuracy and fidelity in contrast to other transposons with simply structured TIRs .
It is notable that the mechanistic assembly of synaptic complexes is analogous between SB transposition and V(D)J recombination. The sequences recognized by the RAG1/2 recombinase are related and binding is assisted by HMGB1 . The regulation of an ordered assembly of nucleoprotein complexes by somewhat dissimilar recombination sites is also seen in V(D)J recombination , except that V(D)J recombination occurs between heterologous partner sites (following the so-called 12/23 rule), whereas SB transposition involves homologous sequences.
Following the assembly of the synaptic complex, the SB transposon is excised from the donor locus, and the resulting DNA double-strand break (DSB) at the excision site is repaired (Figure 2b and Figure 4). The excision step is crucial for the later integration step, because it exposes a free 3′–OH group at each transposon end, which is required for the strand transfer reactions taking place at the integration site (Figure 4). The first catalytic step in all transposition reactions is a Mg²⁺-dependent hydrolysis of the phosphodiester bond in the DNA backbone. This process is catalyzed by all DDE recombinases in a similar way: first-strand cleavage generates a single-strand nick through the nucleophilic attack of a water molecule, resulting in a free 3′–OH group. The nicking of the first strand is followed by the cleavage of the complementary DNA strand, resulting in a DSB that liberates the transposon from the donor DNA. To catalyze second-strand cleavage, DDE enzymes have evolved versatile strategies. Most DDE transposases, including piggyBac, Tn10, hAT, and the RAG1/2 recombinase catalyzing V(D)J recombination, use a single active site to cleave both DNA strands at one transposon end via a DNA hairpin intermediate, either on the transposon end or on the flanking donor DNA. However, members of the Tc1/mariner family do not transpose via a hairpin intermediate, indicating that double-strand cleavage is the result of two sequential hydrolysis reactions by the transposase. Indeed, it has recently been shown that all the chemical steps of mariner transposition are executed by a single transposase dimer, in which one monomer performs two sequential strand cleavage reactions and one strand transfer reaction at the same transposon end.
The Mos1 mariner transposase cleaves the non-transferred strand first , and we infer that the first cleavage event during SB transposition also occurs at the non-transferred strand of the SB transposon (Figure 4). The first nick introduced by the SB and mariner transposases occurs three nucleotides inside the element  (Figure 4), which, following second strand cleavage at the exact tip of the transposon, generates three-nucleotide-long 3′–overhangs at the ends of both the excised transposon and those of the flanking donor DNA. The DSBs can be repaired by the non-homologous end joining (NHEJ) or homologous recombination (HR) DNA repair pathways . The dominant way to repair transposon excision sites in somatic mammalian cells is NHEJ, which leads to transposon “footprints” being identical to the 3′–overhangs left at the donor site after SB excision  (Figure 4). Factors including Ku70 and DNA-PKcs of the NHEJ pathway have been shown to be required for SB transposition, because they are key contributors to the NHEJ repair of the excision site . A physical interaction of Ku70 with the SB transposase has been observed , suggesting the active recruitment of repair factors to transposon excision sites by the transposase. NHEJ components have also been shown to be required for efficient retroelement integration and V(D)J recombination . However, in contrast to V(D)J recombination, HR-dependent repair at the excision site can also occur in SB transposition . The interaction of different repair factors at DNA DSBs generated by DNA transposition, retroviral integration, or V(D)J recombination probably defines how mechanistically very similar processes can lead to different products.
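The geometry of the staggered cuts described above can be illustrated on a toy sequence. The sketch below assumes that, at each end, the strand carrying the element's 5′ end (the non-transferred strand) is nicked three nucleotides inside the element, and that the other strand is cut at the exact tip. The donor sequence, the function names, and the dictionary keys are hypothetical and chosen for illustration; the toy element is not the real SB TIR sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def excision_overhangs(donor_top, start, end, offset=3):
    """Model staggered excision of the element donor_top[start:end].

    donor_top is the top strand written 5'->3'. The non-transferred strand
    is nicked `offset` nt inside the element and the other strand is cut at
    the exact tip, so `offset`-nt 3' overhangs remain on both the donor
    flanks and the excised element. All overhangs are returned 5'->3'.
    """
    element = donor_top[start:end]
    return {
        # Top strand of the left flank keeps the element's first nt.
        "donor_left": element[:offset],
        # Bottom strand of the right flank keeps the complement of the last nt.
        "donor_right": revcomp(element[-offset:]),
        # Bottom strand of the element protrudes at the left end.
        "element_left": revcomp(element[:offset]),
        # Top strand of the element protrudes at the right end.
        "element_right": element[-offset:],
    }


if __name__ == "__main__":
    # Hypothetical element with inverted ends, flanked by TA dinucleotides.
    element = "CAGTTG" + "AAACCC" + "CAACTG"
    donor = "GGTA" + element + "TACC"
    print(excision_overhangs(donor, 4, 4 + len(element)))
```

Because the two ends of the toy element are inverted repeats of each other, the overhangs left on the two donor flanks have the same sequence when each is read 5′→3′ on its own strand.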
Figure 4. Molecular events leading towards the formation of transposon footprints and target site duplications in Sleeping Beauty transposition. The SB transposase excises the transposon with staggered cuts and reintegrates it at a TA target dinucleotide. The single-stranded gaps at the integration site and the double-strand DNA breaks at the donor DNA are repaired by the host DNA repair machinery. After repair, the target TA is duplicated at the integration site, and a small footprint is left behind at the site of excision. Reprinted from CMLS  with permission from the publisher.
CpG methylation of chromosomal DNA, leading to the formation of heterochromatin, decreases the transposition activity of different transposons . However, in the case of SB transposition, CpG methylation in mouse embryonic stem (ES) cells leads to an enhanced transposition activity . This effect is not restricted to SB transposons but is a feature that transposons with the characteristic IR/DR structure share . A possible explanation for the enhanced transposition activity upon CpG methylation could be that due to the formation of a tight chromatin structure at the donor site, the SB transposase can more efficiently bring the distant DR sites in the TIRs closely together.
The free 3′–OH groups exposed at the ends of the excised transposon are essential for the integration step because they act as nucleophiles attacking the phosphodiester bond of the target DNA (Figure 2c). This reaction can be chemically defined as a transesterification reaction that results in a covalent coupling of the transposon ends to the target DNA. In Tc1/mariner transposition, the transposon ends attack the double-stranded target DNA in staggered positions, displaced from one another by 2 bp on the opposite strands. Thus, integration of the two ends of the transposon with 3′-overhangs at staggered positions in the target DNA results in single-stranded gaps which are filled in by the DNA repair machinery (Figure 4). This leads to a duplication of the target site flanking the element, called target site duplication (TSD), which is commonly observed with many transposons. In the case of SB, integration occurs at TA dinucleotides, leading to a characteristic TA TSD, although SB integration can occasionally occur at non-TA target sites.
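The outcome of integration at a TA dinucleotide, after the host has repaired the single-stranded gaps, can be sketched at the sequence level. The example below is a simplified illustration that only tracks the top strand; the function names and the toy sequences are assumptions, and non-TA integration is deliberately not modeled.

```python
def ta_sites(target):
    """0-based start positions of every TA dinucleotide in the top strand.

    TA is its own reverse complement, so scanning one strand finds all sites.
    """
    return [i for i in range(len(target) - 1) if target[i:i + 2] == "TA"]


def integrate(target, transposon, site):
    """Return the repaired product of integration at the TA starting at `site`.

    The staggered 2-bp attack plus gap repair leaves one TA on each side of
    the element (the target site duplication).
    """
    if target[site:site + 2] != "TA":
        raise ValueError("this sketch only models canonical TA target sites")
    return target[:site] + "TA" + transposon + "TA" + target[site + 2:]


if __name__ == "__main__":
    target = "GGCTAGCC"            # hypothetical target with a single TA
    transposon = "CAGTTGCAACTG"    # hypothetical element, not the SB sequence
    site = ta_sites(target)[0]
    print(integrate(target, transposon, site))
```

The product is longer than the target by the length of the element plus 2 bp, the extra 2 bp being the duplicated TA.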
Additional molecular mechanisms involved in the integration of SB remain largely unknown. However, studies on related transposases such as Mu and the Tc1/mariner superfamily member Mos1 can be related to the integration mechanism of SB. In the case of Mu transposition, the target DNA has to be bent by 140°. This bend is promoted by extended interactions along the DNA backbone and by a C-terminal coiled-coil domain, reducing the electrostatic repulsion between the target DNA arms. Additionally, a sharp bend of 147° was observed in the Mos1 complex. It is important to note that the Mos1 post-excision complex has a protein and transposon DNA arrangement equivalent to that of the strand transfer complex occurring in the integration step. This implies that target DNA binding and integration occur without major changes in the rest of the complex. Hence, target DNA bending is important to bring the phosphate group into the active site of the preassembled transposase. This then allows the 3′-OH group of the transposon end to attack the phosphate group of the target DNA. Another important aspect of target DNA bending is that, after integration, the DNA possibly snaps away from the active site, making the reaction irreversible. This product escape has been observed in different strand-transfer complexes. In addition, the different spacing of the transposon ends with respect to the target DNA, which in the case of Tc1/mariner transposases is a TA dinucleotide pair, requires a different degree of target DNA bending. It is therefore expected that the SB transposase, like the Mos1 transposase, should be able to severely deform the DNA double helix by >140°. Furthermore, it is likely that certain sequence-specific features at integration sites contribute to target DNA bending. Alternating pyrimidine-purine bases, known to be associated with bendable DNA structures, are often enriched in the insertion sites of most transposases and integrases.
Biochemical studies have indeed shown that flexible, bent, or mismatched sites are more suitable targets for integration . The model of the SB target capture complex also revealed that only bent target DNA can fulfill the requirement for staggered integration  (Figure 5). Although the integration pattern of SB on the genome level is close to random , a direct interaction with the conserved TA target site has to occur. Additionally, the Mos1 strand transfer complex structure can serve here as a model for SB transposition, because it revealed a direct interaction with the adenine in the conserved TA target dinucleotide . The structure shows that the adenine flips out into the extra-helical space and forms base-specific contacts with a valine (V214) of the transposase. The deformed DNA backbone is stabilized by salt bridges and hydrogen bonds with the transposase.
Figure 5. Model of the Sleeping Beauty strand transfer complex. Cartoon representation of the model: SB100X dimer (blue), transposon ends (TIRs, grey), and bent target DNA substrate (tDNA, dark grey). Close up of the target site showing the 3′-OH group attacking the phosphate of the TA target DNA in a staggered way. Reprinted from Nature Communications  with permission from the publisher.
Although 75% of SB transposon excision events are coupled to chromosomal integration, there is a loss of 25% of the events, which are not detectable as extrachromosomal molecules . A possible explanation for this is the suicidal autointegration of the transposon into itself. This suicidal autointegration has been observed in the SB transposon  but also in other transposons such as Tn10  or Mu . The efficacy of transposition usually negatively correlates with the increasing size of the transposon . One possible explanation for this drop in efficacy is the increased numbers of target sites within the transposon itself, which can lead to a higher frequency of autointegration . A host factor called barrier-to-autointegration factor (BAF or BANF1) that has been identified to protect retroviruses  from autointegration was shown to interact with the SB transposase in human cells and found to inhibit the autointegration of SB .
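The argument that larger transposons contain more internal target sites can be illustrated by counting TA dinucleotides. The sketch below counts TA sites in a given sequence and estimates the expected count for a sequence of a given length, assuming independent bases with equal frequencies; this composition model and the function names are assumptions for illustration, and the count by itself says nothing about the actual frequency of autointegration.

```python
def count_ta(seq):
    """Number of TA dinucleotides (potential SB target sites) in a sequence."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "TA")


def expected_ta(length, p_t=0.25, p_a=0.25):
    """Expected TA count for a random sequence of the given length.

    Assumes independent positions with base frequencies p_t and p_a, so each
    of the (length - 1) dinucleotide positions is TA with probability p_t * p_a.
    """
    return max(length - 1, 0) * p_t * p_a


if __name__ == "__main__":
    for size in (2_000, 5_000, 10_000):
        print(size, expected_ta(size))
```

Under this simple model, the expected number of internal TA sites grows linearly with transposon length, which is in line with the proposed link between cargo size and autointegration.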
The molecular mechanisms involved in SB transposition also have a dramatic impact on the distribution of integrations across the genome. Indeed, although SB integration is close to random over the genome when transposition is launched out of extrachromosomal plasmids , target site distribution is fundamentally different when the SB transposon is mobilized out of a chromosomal site. When mobilized from a chromosome, an effect called “local hopping” can be observed. Local hopping is a phenomenon where transposition out of a chromosome leads to preferred integration into cis-linked sites in the close vicinity of the donor locus. This feature seems to be shared by all transposons following the cut-and-paste mechanism, but the extent of this effect varies between different transposons. In the case of the P-element transposon from Drosophila, the rate to insert within a window of 100 kb from the donor site is ~50-fold higher than in regions outside this window . Chromosomal SB transposition results in 30–80% of re-integrations occurring locally , but in a larger (up to 15 Mb) window around the donor site . The extent of local hopping is not only divergent between different transposons but is also dependent on the host genome and the donor locus itself . The underlying mechanism of this effect remains unknown, but a potential explanation could be varying affinities of the transposase for chromatin-associated factors in different hosts and locations within the chromosome or the instability of the post-excision complex itself, which could limit the diffusion of the complex away from the donor locus.