Transposable Elements in Land Plants: Comparison
Please note this is a comparison between Version 1 by Corinne Mhiri and Version 2 by Dean Liu.

Transposable elements (TEs) are important components of most plant genomes. These mobile repetitive sequences are highly diverse in terms of abundance, structure, transposition mechanisms, activity and insertion specificities across plant species.

  • transposable element
  • transposition control
  • plant genome
  • TE classification

1. Introduction

Transposable elements (TEs) can be defined as repetitive DNA sequences able to move (transpose) throughout their host genome. They were first discovered in maize by Barbara McClintock in the 1940s as controlling elements able to modify gene expression and change their location upon genomic stress, such as chromosomal double-strand breaks [1]. With the development of molecular biology and sequencing technologies, mobile genetic elements have since been detected in almost all living organisms. Their high abundance in some genomes (>80% in maize) [2] and their extreme diversity in transposition modes and insertion profiles have progressively drawn the research community towards studying the biology of these repetitive sequences and the way they interact and coevolve with their host genome. Initially seen as invading parasitic sequences because of their proliferative and mutational abilities [3,4], TEs have progressively come to be considered important components of eukaryotic genomes since the discovery of some ‘useful’ TEs contributing to gene expression regulation or enzymatic functions [5,6]. This shift in opinion, towards considering TEs as ‘facilitators of evolution’ for their host organisms, has been well documented [7,8,9,10].

2. Plant TE Landscape

2.1. A Highly Variable TE Abundance

Due to their ubiquity, abundance and transposition activity, TEs have been proposed as major contributors to genome size, along with other mechanisms such as recombination rate and polyploidization [11]. Among living organisms, land plants (comprising Bryophytes, Pteridophytes and seed plants), and especially flowering plants (Angiosperms), display one of the widest ranges of genome size, exceeding 2400-fold, with C-values ranging from 0.07 pg (65 Mb/1 C) for the small carnivorous plant Genlisea tuberosa to 152.23 pg (149 Gb/1 C) for the monocot lily species Paris japonica (accessed on 10 March 2022). Indeed, TEs seem to account for a variable proportion of the plant genomes sequenced to date, from ~3% in the small 82 Mb genome of the carnivorous Utricularia gibba [12] to ~85% in allohexaploid wheat (Triticum aestivum) [13] or the maize genome [2].
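The figures quoted above can be related to each other with a little arithmetic. The sketch below is only illustrative: it assumes the widely used conversion of 1 pg of DNA to roughly 978 Mb (a value not stated in this text), and the function names are hypothetical. Small differences from the rounded values quoted above are expected, because the published C-values are themselves rounded.

```python
# Assumed conversion factor: 1 pg of DNA corresponds to roughly 978 Mb.
PG_TO_MB = 978.0


def c_value_to_mb(c_value_pg: float) -> float:
    """Convert a 1C value given in picograms to megabase pairs."""
    return c_value_pg * PG_TO_MB


def te_content_mb(genome_size_mb: float, te_fraction: float) -> float:
    """Return the amount of TE-derived sequence (Mb) for a TE fraction in [0, 1]."""
    if not 0.0 <= te_fraction <= 1.0:
        raise ValueError("te_fraction must be between 0 and 1")
    return genome_size_mb * te_fraction


if __name__ == "__main__":
    # Paris japonica: 152.23 pg per 1C, as quoted in the text.
    print(f"Paris japonica: {c_value_to_mb(152.23) / 1000:.0f} Gb")
    # Utricularia gibba: ~3% TEs in an 82 Mb genome, as quoted in the text.
    print(f"Utricularia gibba TE content: {te_content_mb(82, 0.03):.2f} Mb")
```

Under these assumptions, a ~3% TE fraction in an 82 Mb genome corresponds to only about 2.5 Mb of TE-derived sequence.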

2.2. Challenging Evaluation of Plant TE Diversity and Classification

Our knowledge of TE structure, organization and transposition mechanisms has greatly progressed since the discovery of the first TE sequences in plants (the maize Ac/Ds elements) in 1984 by two different laboratories [14,15]. Transposable elements are extremely diverse and use various mechanisms to move. The so-called autonomous elements encode all the specific functions needed for their mobility, while non-autonomous elements borrow mobility proteins from autonomous copies, or from other TEs, in order to transpose.
Several types of TEs might co-exist in a given genome, and each TE type can harbor multiple TE copies clustered into different families according to their sequence similarity. As some transposition mechanisms are prone to generate mutations, each family might have evolved over time, displaying a continuum of more or less diverged copies composed of both autonomous and defective elements [7,8,10,16]. This behaviour makes TE identification and classification a difficult task.
In 1989, the first TE nomenclature organized TEs into two classes according to their transposition intermediate [17]: (1) an RNA intermediate for retrotransposons (class I elements) that move via a replicative “copy-and-paste” mechanism, where a “mother” copy gives rise to several “daughter” copies without excising itself; (2) a DNA intermediate for DNA transposons (class II elements) that use a conservative “cut-and-paste” mechanism for their transposition, where the “mother” copy excises from its location to insert elsewhere in the genome (“jumping gene” concept).
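The difference between the two mechanisms can be made concrete with a toy model. The sketch below is a deliberate simplification, not a biological simulation: a genome is reduced to the set of positions occupied by one TE family, and the function names are hypothetical.

```python
from typing import FrozenSet


def copy_and_paste(sites: FrozenSet[int], source: int, target: int) -> FrozenSet[int]:
    """Class I (replicative): the 'mother' copy stays, a 'daughter' copy is added."""
    if source not in sites:
        raise ValueError("source position does not carry a TE copy")
    return sites | {target}


def cut_and_paste(sites: FrozenSet[int], source: int, target: int) -> FrozenSet[int]:
    """Class II (conservative): the 'mother' copy is excised and reinserted."""
    if source not in sites:
        raise ValueError("source position does not carry a TE copy")
    return (sites - {source}) | {target}


if __name__ == "__main__":
    start = frozenset({100})
    print(sorted(copy_and_paste(start, 100, 500)))  # two copies after the event
    print(sorted(cut_and_paste(start, 100, 500)))   # still one copy, at a new site
```

In this model the copy number rises with each copy-and-paste event, whereas a cut-and-paste event only relocates the element.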
This two-class scheme was further refined in 2007 (1) by creating subclasses in the class II group to include TEs that use a DNA replicative mechanism, such as Helitrons, and (2) by setting the “80-80-80” rule to define the identity percentage that TE copies should share to belong to the same family, i.e., sequences over 80 bp sharing at least 80% sequence identity in at least 80% of their internal domain or terminal repeats (or both) [16,18,19]. This hierarchical classification splits TEs into the two previously cited classes (Class I and Class II), then into subclasses, orders and superfamilies [18]. It is based on the presence and order of coding regions for specific proteins or structural motifs present in TE sequences, and on their transposition specificities (presence and sequence of target site duplications (TSDs), i.e., short 2–11 bp DNA duplications generated at the new insertion site upon TE transposition). For example, the presence of long terminal repeats (LTRs) in direct orientation at the 5′ and 3′ ends of a TE is the signature of LTR-retrotransposons (LTR-RTs). The occurrence of a reverse transcriptase (RT) domain in the TE ORF is a hallmark of most, but not all, Class I retrotransposons. Like other self-replicating entities such as viruses, TEs evolve in a modular fashion, exchanging essential or facultative protein-coding domains, which may blur TE classification [19].
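The “80-80-80” rule lends itself to a simple check. The sketch below is one possible reading of the rule as stated above, not a reference implementation: the function names and the tuple layout are hypothetical, and treating the thresholds as inclusive (at least 80 bp) is an assumption.

```python
from typing import Optional, Tuple

# (aligned length in bp, % sequence identity, % of the region covered)
Hit = Tuple[int, float, float]


def passes_80_80_80(aligned_bp: int, identity_pct: float, coverage_pct: float) -> bool:
    """Check the three thresholds for one compared region (assumed inclusive)."""
    return aligned_bp >= 80 and identity_pct >= 80.0 and coverage_pct >= 80.0


def same_family(internal: Optional[Hit] = None, terminal: Optional[Hit] = None) -> bool:
    """Two copies are grouped together if the internal domain, the terminal
    repeats, or both satisfy the rule."""
    hits = [hit for hit in (internal, terminal) if hit is not None]
    return any(passes_80_80_80(*hit) for hit in hits)


if __name__ == "__main__":
    # Diverged internal domain, but well-conserved terminal repeats.
    print(same_family(internal=(1200, 75.0, 90.0), terminal=(300, 92.0, 100.0)))
```

In practice the three numbers would come from a pairwise alignment tool; here they are passed in directly so that the rule itself stays visible.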
Although these first four classification levels are generally well accepted, the need for extra levels to reach the TE family level is still unclear and may vary according to the TE considered. For example, the superfamily subgroups “chromovirus” and “non-chromovirus” have been introduced for Gypsy LTR-RTs [20]. An increase in the number of families/lineages, or even new superfamilies, will probably arise with the accumulation of genome sequencing data and the improvement of bioinformatic pipelines for TE detection and annotation. This hierarchical classification also does not include some non-autonomous TEs, i.e., TEs that are still able to transpose but need extra functions supplied in trans by another TE (phylogenetically related or not). These include LARDs (large retrotransposon derivatives), TRIMs (terminal repeat retrotransposons in miniature) and SMARTs (small LTR-retrotransposons) for Class I TEs, and MITEs (miniature inverted-repeat transposable elements), SNACs (small non-autonomous CACTA) or MULEs (Mu-like elements) for class II TEs. Many such non-autonomous elements share enough sequence similarity or motifs to be easily linked to a candidate “helper” autonomous TE family, as reported in rice with the isolation of the complete RIRE2 Gypsy retrotransposon, whose LTRs are highly similar to the sequence extremities of the defective Dasheng retrotransposon [21]. Another example is the description of both complete and truncated, but nevertheless active, CACTA Caspar elements in Triticeae [16,22]. Some other non-autonomous TEs may not present any clear feature allowing them to be classified, such as some short non-autonomous TIR-harboring TEs that may share only a few bases of homology with the autonomous helper [16]. In the classification we propose, we choose to include the non-autonomous SINEs as a full order, as these TEs do not correspond to deleted versions of autonomous class I TEs.
Table 1 and Figure 1 present, respectively, the up-to-date classification and structures of plant transposable elements adapted from Wicker et al. [16]. It is important to note that only a subset of the TE superfamilies described across all living organisms (as reported in Repbase, accessed on 10 March 2022) has been detected in land plants (~30% of class II superfamilies and ~17% of class I superfamilies, representing 20% of all described superfamilies). Plant TEs fall into six different orders: four orders of class I elements (LTR-RTs, Penelope-like elements (PLEs), long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs)) and two orders of class II elements (terminal inverted repeat (TIR) transposons and Helitrons) (Table 1).
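The top of this hierarchy is small enough to be written down as a lookup table. The sketch below encodes only the two classes and six orders named in the paragraph above; the dictionary and function names are hypothetical, and the lower levels (superfamilies, lineages) are deliberately left out.

```python
# Classes and orders of plant TEs as listed in the text above.
PLANT_TE_ORDERS = {
    "Class I": ["LTR-RT", "PLE", "LINE", "SINE"],
    "Class II": ["TIR", "Helitron"],
}


def class_of(order: str) -> str:
    """Return the TE class that a given order belongs to."""
    for te_class, orders in PLANT_TE_ORDERS.items():
        if order in orders:
            return te_class
    raise KeyError(f"unknown plant TE order: {order}")


if __name__ == "__main__":
    total = sum(len(orders) for orders in PLANT_TE_ORDERS.values())
    print(f"{total} orders in total")
    print("Helitron ->", class_of("Helitron"))
```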
Figure 1. Structure and organization of plant transposable element superfamilies (adapted from [16]). Schemes are not to scale. Protein-coding domains: APE = apurinic endonuclease, CHR = chromodomain, EN = endonuclease, GAG = capsid protein, HEL = helicase, INT = integrase, PROT = proteinase, RH = RNase H, RPA = replication protein A, RT = reverse transcriptase, eORF = extra open reading frame (unknown function), Tpase = transposase (* with DDE motif), YR = tyrosine recombinase, Y2 = YR with YY motif, ◊ = different possible locations of an additional cellular-like ribonuclease H (aRH) specific to the Tat lineages (see Table 1). Optional protein-coding domains present only in some superfamily lineages are indicated in brackets. Some structural features are also represented. Terminal repeats in the same or reverse orientation are indicated by black arrows, and purple rectangles refer to diagnostic sequences present in non-coding sequences. The specific terminal bases of some TEs are also indicated. PBS = primer binding site, PPT = polypurine tract. The interrupted line in the Helitron representation means that the region may contain one or more additional ORFs.
Table 1. Plant transposable element (TE) classification compiled from [16], with updates from [23] for Copia lineages, [20] for Gypsy LTR retrotransposons (LTR-RTs), [24,25,26] for Penelope-like elements (PLEs), [27] for long interspersed nuclear elements (LINEs), [28,29] for short interspersed nuclear elements (SINEs), and [30,31] for Sola elements.

Class; Order (Non-Autonomous TE Name); Superfamily; Family/Lineage; Plant Family Examples

Class I (retrotransposons)
  LTR-Retrotransposons (LARD) (TRIM/SMART)
    Copia
      Osser: Volvox carteri
      Bryco: representatives in moss species
      Lyco: representatives in clubmoss species
      Gymco-I: representatives in gymnosperm species
      Gymco-II: representatives in gymnosperm species
      Gymco-III: representatives in gymnosperm species
      Gymco-IV: representatives in gymnosperm species
      […]: Oryza longistaminata, Oryza sativa, Oryza sativa Oryko1-1 and Ilona, Hordeum vulgare, Nicotiana tabacum
      Ikeros: Zea mays Sto-4
      […]: Nicotiana tabacum Tnt1, Tto1 and Tnt2, Solanum lycopersicum, Ipomoea batatas
      Alesia: low copy number representatives in many Angiosperms, close to the Ale lineage
      […]: Triticum aestivum, Oryza sativa, Hordeum vulgare, Arabidopsis thaliana, Solanum lycopersicum, Zea mays, Glycine max, […] spp. Houba and Osr-1, Arabidopsis thaliana
    Gypsy (Chromovirus)
      Galadriel: Solanum esculentum, Monkey, Tntom1
      […]: Hordeum vulgare, Arabidopsis thaliana Legolas Peabody, Oryza sativa, Lilium henryi
      Reina: Zea mays Reina, Arabidopsis thaliana Gloin or Gimli
      CRM: Zea mays CRM (centromeric retrotransposon of maize), Beta vulgaris, Oryza sativa
    Gypsy […]
      […]: Physcomitrella patens Chr21 (4035670,4045566), Selaginella moellendorffii, Arabidopsis thaliana Athila4-1, Diaspora, Hordeum vulgare, Selaginella moellendorffii, Picea abies, Picea glauca, Picea abies, Picea glauca
      Ogre/TatIV + TatV: Pisum sativum
      Retand/TatVI: Zea mays Cinful-1, Arabidopsis thaliana, Oryza sativa, Sorghum bicolor, Silene latifolia
  Non-LTR retrotransposons
  […]
    […]: Pinus taeda (loblolly pine) and Picea abies (Norway spruce) […] PLEs by horizontal transfer
    […]: Selaginella moellendorffii spike moss, Pinus taeda, Picea abies
  LINE
    L1
      Llb: sweet potato Llb
      […]: Beta vulgaris, Cannabis sativa, Beta vulgaris Belline2, Belline5, Beta vulgaris, Carica papaya, Solanum tuberosum, Vitis vinifera
      Cin4: Zea mays Cin4
      […]: Oryza sativa, Oryza sativa LINE-1 or OSLINE1-4, Zea mays L1-2_ZM
    RTE
      plant RTE: Malus x domestica, Solanum tuberosum
  […]
    […]: Nicotiana tabacum TS, Au, Solanales SolS-II, Brassicales BraS-I, SB families, mainly found in Angiosperms
Class II
  Subclass 1
    TIR (MITE) (MULE)
      Tc1-Mariner: Stowaway (MITE): Sorghum bicolor
      hAT: Zea mays Ac/Ds, Antirrhinum majus, Nicotiana tabacum
      […]: Sola1, found also in Capsicum annuum, C. baccatum
      MuDR-Foldback (MULE): Zea mays Mu, MULEs
      PIF-Harbinger: Zea mays PIFa, Oryza sativa Pong; Tourist (MITE): mPing/Ping; mPIF/PIFa
      CACTA: Zea mays En/Spm, Arabidopsis thaliana CAC1, Antirrhinum majus Tam1, Petunia hybrida PsI
  Subclass 2
    Helitron: Oryza sativa, Arabidopsis thaliana AthE1 Atrep, Ipomoea tricolor
In plants, class II elements (subclass 1) belonging to the TIR order (also called DNA transposons) fall into six superfamilies, based on the structure and sequence of their transposase and on the sequence of their terminal inverted repeats (TIRs) (Figure 1). Transposase is the protein catalyzing their transposition, while TIRs harbor key sequences recognized by the DNA-binding domains of the transposase during a transposition event. Some TIR elements also harbor additional coding sequences, such as the maize MuDR and plant CACTA or PIF/Harbinger elements [16]. Most of these superfamilies are also characterized by specific target site duplication (TSD) lengths; TSDs arise when the staggered DNA nicks made by the transposase at the integration site are filled in. Their transposition is not always strictly conservative and could lead to an increase in copy number if it occurs before a DNA replication fork [32].
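The way a target site duplication arises can be sketched on plain strings. The model below is a simplification that ignores strands and enzymes: the staggered cut is represented only by its length, and the gap filling is represented by copying the target site to both sides of the element. The function name and example sequences are hypothetical.

```python
def insert_with_tsd(target: str, pos: int, te: str, tsd_len: int) -> str:
    """Insert `te` at a staggered cut spanning target[pos:pos + tsd_len].

    After gap filling, the cut site is present on both sides of the
    element, which is the target site duplication (TSD).
    """
    if tsd_len < 0 or not 0 <= pos <= len(target) - tsd_len:
        raise ValueError("cut site must lie inside the target sequence")
    tsd = target[pos:pos + tsd_len]
    # left flank + TSD, then the element, then the duplicated TSD + right flank
    return target[:pos + tsd_len] + te + tsd + target[pos + tsd_len:]


if __name__ == "__main__":
    # Lower case marks the inserted element; 'ATT' becomes the TSD.
    print(insert_with_tsd("GGGATTACAGCCC", pos=3, te="cactacaagtg", tsd_len=3))
```

With a TSD length of zero the function reduces to a plain insertion without any duplication.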
Replicating plant TEs fall into two major groups: (1) Helitrons (class II-subclass 2, see Table 1) replicate through a rolling-circle (RC) mechanism from one DNA strand, without generating TSDs, by using a RepHel protein with RC replication initiator (Rep) and DNA helicase (Hel) domains, in association with an ssDNA-binding “replication protein A” (RPA) [33]. (2) Class I retrotransposons replicate from RNA templates by reverse transcription using a TE-encoded reverse transcriptase (RT) and use at least one additional protein, such as an endonuclease (EN) or a DDE integrase (INT), to mediate their insertion into the host genome. We do not include the DIRS superfamily among land plant TEs, as DIRS elements have only been found in green algae until now [34].
Among the four retrotransposon orders present in plants, SINEs occupy a particular place, as these small non-coding and non-autonomous elements of a few hundred base pairs exploit the transposition machinery of LINEs to ensure their amplification. Plant SINEs are derived from tRNAs [28]. They are transcribed by RNA polymerase III, harbor short degenerate internal promoters (A and B boxes), and mostly end with an A tail at the 3′-end. Apart from these small structural domains, SINEs display a high sequence diversity that hinders their detection and characterization. Recently, a 37 bp Angio-domain located at the 3′-end has been reported in many Angiosperm SINEs [29].
The second non-LTR retrotransposon order present in plants, LINEs, contains elements belonging to the L1 and the RTE (retrotransposable element) superfamilies (Table 1 and Figure 1), which are two of the five known superfamilies of LINEs detected in eukaryotes [16]. RTE and L1 LINEs have one or two open reading frames (ORFs), respectively, and code for proteins required for retrotransposition, such as an endonuclease (EN), an RT, and often a ribonuclease H (RNase H, RH). The L1 ORF1 is involved in the binding, protection and transport of the RNA intermediate used for retrotransposition. At their 3′-end lies a stretch of (A)n for L1 or (GTT)n for RTE, involved in the initiation of reverse transcription. A recent study shows that plant LINEs extracted from 23 genomes fall into only seven L1 and one RTE families/lineages/subclades [27]. As reverse transcription starts from the 3′-end of LINEs and does not always reach the 5′-end, many incomplete daughter copies can be generated.
Between their bordering direct repeats 5′-LTR and 3′-LTR, autonomous LTR-retrotransposons (LTR-RTs) code for structural capsid-like (GAG) and functional (POL) proteins needed for their retrotransposition cycle (RT = reverse transcriptase, RH, INT = integrase), resembling the replication cycle of retroviruses. Only two out of the five superfamilies found in eukaryotes are represented in plants [16]. Plant LTR-RTs are further classified into Copia/Ty1 or Gypsy/Ty3 superfamilies according to the order of their coding pol domains. Recently, a systematic survey of plant LTR-RTs in 80 plant genomes refined the classification by introducing 16 lineages/families into the Copia/Ty1 superfamily and 14 lineages/families into the Gypsy/Ty3 group (six with a chromo-domain and eight without) [20]. Two Gypsy lineages, Chlamyvir and Tcn1, having only representatives in algae and non-Viridiplantae species, have not been included in Table 1. Non-autonomous derivatives of variable size (from a few hundred bp up to 25 kb) have been characterized in plants, containing between both LTRs a DNA sequence of variable length, either non-coding or reminiscent of some retrotransposon internal domains. Large internal sequences (>4 kb) define LARDs (large retrotransposon derivatives), and short ones (<4 kb) are often called TRIM (terminal repeat retrotransposon in miniature) [16].
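The two criteria in this paragraph can be turned into small helper functions. This is a sketch under stated assumptions: the text does not spell out the pol domain orders, so the rule used here (integrase upstream of RT for Copia/Ty1, downstream of RT for Gypsy/Ty3) comes from the commonly described organization of these superfamilies, and the function names are hypothetical. The 4 kb threshold is the one quoted above; an internal region of exactly 4 kb is left unassigned because the text does not cover that case.

```python
from typing import Sequence


def classify_ltr_superfamily(pol_domains: Sequence[str]) -> str:
    """Assign a superfamily from the 5'-to-3' order of pol domains.

    Assumption: INT upstream of RT indicates Copia/Ty1,
    INT downstream of RT indicates Gypsy/Ty3.
    """
    order = [domain.upper() for domain in pol_domains]
    if "INT" not in order or "RT" not in order:
        return "unclassified"
    return "Copia/Ty1" if order.index("INT") < order.index("RT") else "Gypsy/Ty3"


def classify_nonautonomous(internal_bp: int) -> str:
    """Name a non-autonomous LTR derivative from its internal region length."""
    if internal_bp > 4000:
        return "LARD"
    if internal_bp < 4000:
        return "TRIM"
    return "unclassified"


if __name__ == "__main__":
    print(classify_ltr_superfamily(["PROT", "INT", "RT", "RH"]))
    print(classify_ltr_superfamily(["PROT", "RT", "RH", "INT"]))
    print(classify_nonautonomous(5500), classify_nonautonomous(350))
```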
Retrotransposons belonging to the Penelope-like element (PLE) order are also found in some plant genomes, but with a patchy distribution. PLEs encode an RT domain related to telomerase, a highly specialized class of non-mobile RTs responsible for chromosome end maintenance in most eukaryotes. Some PLEs also carry a second EN domain with a specific GIY-YIG motif. PLEs are bordered by repeats in direct or reverse orientation and are often subjected to 5′ truncation upon retrotransposition, like non-LTR retrotransposons. EN(+)PLEs (Dryad elements belonging to the Penelope/Poseidon group) and EN(-)PLEs have been found in some Conifer genomes (Table 1 and Figure 1), and some of them were presumably derived from a horizontal transfer (HT) event [26].
The accumulation of sequenced genomes and TE detection pipelines allows the analysis and comparison of TE composition and diversity across plant genera. Figure 2 presents a heatmap of the genomic percentage of four types of TEs (LTR-RTs, LINEs, SINEs and TIR DNA transposons) across 74 Angiosperm species displaying variable genome sizes (data collected from Supplementary Tables S1 and S2 of [35]). Among the different types of TEs, LTR-RTs (Copia/Ty1 and Gypsy/Ty3) occupy the largest proportion of these genomes, reaching up to 80% in Zea mays. Such an increase can result from the rapid amplification of only a few families. For example, Oryza australiensis has undergone a recent burst of transposition involving only three families (one Copia, RIRE1, and two Gypsy, Wallabi and Kangourou), which compose 60% of its genome [36]. Genomes of the legume tribe Fabeae are also dominated by the Ty3/Gypsy Ogre family/lineage, which accounts for 57% of genome size variation on average in this clade [37]. The predominance of one type of LTR-RT may vary depending on the taxa considered. For example, in Gossypium species, Copia LTR-RTs have accumulated in the small genome of G. raimondii (880 Mb), while Gypsy LTR-RTs (mainly Gorge3) have proliferated in the large-genome lineages of G. herbaceum (1667 Mb) and G. exiguum (2460 Mb) [38]. Plant species belonging to the same order (see Brassicales and Poales, Figure 2) might display different TE compositions and genome sizes. Some plant species also harbor a specific TE composition, as shown in Figure 2, with a dominance of LINE retrotransposons or TIR DNA transposons in the aquatic plant coontail Ceratophyllum demersum (33.6% of total genome size) and the small herbaceous plant Trichopus zeylanicus (~27.3% of total genome size). Non-LTR retrotransposons have been shown to be abundant (~11.7%) in other plant genomes, such as Arachis ipaensis, one of the peanut parental genomes [39].
Figure 2. Transposable element (TE) profiles in some land plant genomes. Species are clustered according to their TE profiles. TE percentages, plant orders and genome size estimations for 74 land plant species have been collected from [35] (Supporting Information, Tables S1 and S2). Some plant orders, such as Poales and Brassicales, are highlighted in color (green and red, respectively) to underline the diversity of TE composition between species belonging to the same plant order. Plants belonging to different orders, such as Dioscoreales and Ceratophyllales (in blue), can share similar TE compositions.