2. Telomeres as Steps in Species Evolution
Initially, particular telomere DNA sequences were regarded as characteristic of large groups of organisms, e.g., TTAGGG in vertebrates, TTTAGGG in plants and TTAGG in insects/arthropods
[74,114,115,116,117][14][15][16][17][18] (see
[75][19] for review). The majority of identified telomere repeats are of minisatellite size and are maintained by a special enzyme, telomerase ([76,77,78][20][21][22], reviewed in, e.g., [7,11,79][7][11][23]), except for well-known examples of non-telomerase alternatives from Diptera ([80,81,82,83,84][24][25][26][27][28], reviewed in
[11,75,98,99][11][19][29][30]). This conservation has proven advantageous in microscopy studies and telomeric probes are second only to rDNA probes
[32,33,118][31][32][33], e.g., to distinguish and study telocentric chromosomes, to recognize Rabl-like or bouquet organization or various chromosomal aberrations
[119,120,121,122,123,124,125,126][34][35][36][37][38][39][40][41]. Numerous reports that characterized typical telomeric sequences in an increasing number of species seemed to confirm the telomere consensus T(x)A(y)G(z), where x, y and z denote the number of copies of each nucleotide. Telomeric sequences in yeast models, e.g., TG(1–3) in budding yeast [127,128][42][43], T(1–2)ACA(0–1)C(0–1)G(1–6) in fission yeast [129][44] and 8–25 bp-long repeats in Kluyveromyces and Candida [130,131][45][46], were treated as an interesting departure from the general repeat unit T(x)A(y)G(z) that was special to yeast. Current research on Saccharomycotina [132,133][47][48] has revealed even more telomeric variants; despite their considerable divergence, all of these telomere sequences have guanines (Gs) as one of their most conserved features
[131,132,133,134][46][47][48][49].
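The consensus described above can be illustrated with a short pattern check. This is a minimal sketch, not a method from the cited studies: the function name `fits_consensus` is hypothetical, and requiring at least one T, one A and one G is an assumption, since the text gives only the general form of the consensus and no numeric bounds.

```python
import re

# Assumption: x, y and z are each at least 1; the text states the consensus
# form only, so these bounds are illustrative rather than taken from the source.
CONSENSUS = re.compile(r"T+A+G+")


def fits_consensus(unit: str) -> bool:
    """Return True if one repeat unit reads as a run of T, then A, then G."""
    return CONSENSUS.fullmatch(unit.upper()) is not None


if __name__ == "__main__":
    # Repeat units quoted in the text, plus one budding-yeast-like unit (TGGG).
    for unit in ["TTAGGG", "TTTAGGG", "TTAGG", "TTTTAGGG", "TCAGG", "TGGG"]:
        print(unit, fits_consensus(unit))
```

Under this assumption, the vertebrate, plant and insect repeats fit the pattern, whereas the beetle variant TCAGG and yeast-like units without an A do not.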
Missing signals using telomere probes in in situ hybridization experiments were the first hints towards identifying organisms that do not possess typical telomeres formed by the expected repeat, e.g., plants
Allium (Asparagales,
[114][15]),
Cestrum (Solanales,
[135][50]), some beetles and the spider
Tegenaria ferruginea [117][18]. Over the following years, detailed studies gradually revealed more species with unknown telomeres among plants
[136,137][51][52] and insects
[138][53]. This led to a breakthrough in the general view of telomeres. Studies that mapped telomere sequences in plants, animals and algae identified evolutionary switchpoints in which sequences typical to one group were replaced by other variants
[30,135,139,140,141,142,143,144,145][50][54][55][56][57][58][59][60][61]. For example, a group of species from the plant order Asparagales changed their telomeric sequence from the
Arabidopsis-type repeat TTTAGGG to the human-type TTAGGG. An elusive, highly divergent telomere repeat was finally identified in
Allium (Amaryllidaceae, Allioideae,
[146][62], see
Section 7), one of the largest monocotyledonous genera with an estimated 800–900 species
[147][63]. Similar step changes were found in green algae, in which the transitions from TTTAGGG to novel types TTAGGG, TTTTAGGG or TTTTAGG allowed the grouping of species with the same telomere in distinct phylogeny clades
[30,142,148][54][58][64]. A similar switch was identified in beetles where the repeat TTAGG was replaced with the TCAGG repeat
[143][59]. A broad experimental study of algal telomeres, accompanied by the identification of candidate telomeric sequences from genomic databases of various species across the eukaryotic tree of life, showed that TTAGGG and TTTAGGG are the predominant telomeric types
[30][54]. Fulneckova and colleagues
[30][54] mapped the occurrence of telomeric sequences onto the phylogeny, revealing the TTAGGG repeat as the likely ancestral eukaryotic telomere, and current phylogeny
[149,150][65][66] still supports this hypothesis (see
[7] for review). Interestingly, just as many telomere variants were experimentally verified, many more species and groups with unknown telomeres were discovered
[11,30,142,148,151][11][54][58][64][67]. Telomere sequence variants and their evolution in plants and algae are described in detail in a review by Peska and Garcia
[7]. Progress in insect telomere identification is reviewed in Mason et al.
[11] and recent findings are mapped in
[151,152,153,154][67][68][69][70].
3. Telomere Minisatellites Are Much like Any Other DNA Sequences
When exploring the occurrence of telomere minisatellite repeats in the genome,
we should keep in mind that telomere-like sequences can occur in locations other than in the telomere. Such sequences are called interstitial telomeric sequences (ITSs) and can be classified as part of several groups according to their length, occurrence and structure (recently reviewed in
[155][71],
Figure 21). ITSs can have the same sequences as telomeres or they can have variant telomere-like repeats. For example, budding yeast has the telomeric sequence TG(1–3), yet interstitial tracts of TTAGGG repeats are present in subtelomeric and other regions
[156][72]. ITSs can occur as a few copies across the genome, including regions that are proximal to genes, but also in clusters found frequently in pericentromeric or subtelomeric regions. The arrangement of ITS sites can also be classified with respect to the orientation and composition of the telomere-like sequences: head-to-tail or head-to-head, homogeneous or degenerate tandem repeats, and with or without linker sequence(s) (
Figure 21b). When ITSs occur in a head-to-head orientation with a linker sequence, these can be amplified using a single-primer PCR reaction
[157][73] (
Figure 21c). ITSs can be unique or part of longer repetitive sequences and are a suitable genetic marker for mapping
[157,158,159][73][74][75]. ITSs in clusters usually contain a large portion of degenerate telomeric motifs and could be interspersed with other repetitive sequences
[158,159,160][74][75][76].
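The orientation-based classification of ITS sites can be sketched in code. The example below is a simplified illustration under stated assumptions: it finds only perfect tandem copies of a chosen repeat unit (degenerate repeats are ignored), the function names `find_blocks` and `classify_pairs` and the `max_linker` threshold are hypothetical, and any two neighbouring blocks in opposite orientation are labelled head-to-head.

```python
import re


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_blocks(seq: str, unit: str = "TTTAGGG", min_copies: int = 2):
    """Return sorted (start, end, strand) tuples for perfect tandem runs.

    "+" marks runs of the unit itself, "-" runs of its reverse complement.
    """
    blocks = []
    for strand, motif in (("+", unit), ("-", revcomp(unit))):
        pattern = re.compile(f"(?:{motif}){{{min_copies},}}")
        for match in pattern.finditer(seq.upper()):
            blocks.append((match.start(), match.end(), strand))
    return sorted(blocks)


def classify_pairs(blocks, max_linker: int = 500):
    """Label neighbouring blocks separated by at most max_linker bases."""
    labels = []
    for (_, end1, strand1), (start2, _, strand2) in zip(blocks, blocks[1:]):
        if start2 - end1 > max_linker:
            continue  # too far apart to treat as one ITS site
        labels.append("head-to-tail" if strand1 == strand2 else "head-to-head")
    return labels


if __name__ == "__main__":
    # Made-up test sequences, not real genomic data.
    inverted = "ACGT" + "TTTAGGG" * 3 + "GATTACA" + "CCCTAAA" * 3 + "ACGT"
    direct = "TTTAGGG" * 2 + "GACGAC" + "TTTAGGG" * 2
    print(classify_pairs(find_blocks(inverted)))
    print(classify_pairs(find_blocks(direct)))
```

In the inverted arrangement, a single telomeric primer can anneal on both strands flanking the linker, which is why such sites yield double-stranded products in single-primer PCR.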
Figure 21. Experimental examination of ITS and telomeric repeats. (
a) Telomere repeats are strand oriented. (
b) Telomere-like repeats in telomeres or internal sites may form clusters or short stretches. Single-primed PCR distinguishes between these using an extension reaction with a single telomeric oligonucleotide primer (C-rich primer is shown, triangles). Telomeric sequences, short and clustered ITSs produce a smear of ssDNA products visible after hybridization with a radioactively (*) labeled probe (right, e.g., from
Chlorella vulgaris, experiment performed as in
[135][50]). Cloneable dsDNA products visible in an ethidium-bromide stained agarose gel (etd) are produced when ITSs occur in head-to-head orientation. When dGTP is omitted, bands are not produced by ssDNA or short ITSs, but ssDNA from a telomere is elongated until primer extension stops at the first G in the subtelomere. This reaction showed the
Arabidopsis- and human-type telomere repeats are absent in
Allium and
Cestrum [135,140,161][50][56][77]. (
c) Different patterns of ITSs amplified from four
Cestrum species in single-primed PCR using C-rich and G-rich primers for the
Arabidopsis-type telomeric repeat
[135,157][50][73]. (
d) The specific pattern of ITS-associated sequence BR23 (green) was visualized on
Cestrum elegans chromosomes using FISH. The high-copy repeat BR23 shows dispersed and clustered signals (5S rDNA in red, counterstained with DAPI; adapted from
[157][73]). (
e) Allotetraploid
Cardamine scutata, a hybrid of
C. parviflora and
C. amara with the parental origin of chromosomes visualized by GISH (left panel, GISH) and telomeric probe (TEL) that detects differing pericentromeric ITS clusters (adapted from
[159][75]; modified). (
f) FISH of the 180-bp centromeric satellite (
CEN180), retroelement
ATHILA and
TEL on pachytene chromosomes of
A. thaliana. Interstitial telomeric locus in the pericentromeric region of chromosome Ch1 is marked by an arrow (adapted from
[160][76]; modified). (
g) TRF (terminal restriction fragment) method visualizes telomeric and ITS fragments from
A. thaliana after restriction digestion of gDNA with
MseI. (
h) Schema illustrating the effect of Bal31 nuclease digestion on telomeric, subtelomeric, ITS and internal genomic sequences. After DNA isolation, DNA is fragmented and Bal31 nuclease gradually shortens these fragments from the end. Bal31-digested samples can be used for specific telomere-subtelomere PCR (left, see below). Further restriction digestion (right, H) results in the visualization of TRF signal (h-tel) shortening and verification of the terminal position of a candidate sequence. (Left) PCR/qPCR investigation of genomes with short telomeres (e.g.,
A. thaliana, see results in (
i–
k) adapted from
[162][78]) proving subtelomeric position of candidate sequences (A,B). When the telomere is completely digested, PCR with a C-rich primer cannot amplify the product (tel-a, tel-b), and further digestion results in a loss of amplification signal from subtelomere regions proximal to telomeres (A) in contrast to ITS (C, pericentromeric ITS in
A. thaliana, see schemas in (
j)) or control sequences (D,E). Bal31 nuclease also degrades ssDNA (F) and some dsDNA sites with altered structures (G). (
i) Dynamics of Bal31 digestion monitored by qPCR. Short gDNA exposure to Bal31 results in a sudden, seemingly non-specific decrease in gDNA amount followed by a gradual decrease over a prolonged time. (
j) Bal31-sensitivity of specific subtelomeric sequences from chromosome arm 2R (pat and gal2) and the resistance of the centromeric ITS region to Bal31 digestion resolved by PCR. gDNA integrity was monitored by amplification of 5 kb-long fragments of the
TERT gene. (
k) qPCR analysis of specific subtelomere (gal2, pat, gal5), ITS and control sequences documented a decrease of subtelomeric sequences in relation to their position in the subtelomere. Relative DNA levels were calculated by the ΔCt method (
i) or ΔΔCt method
[163][79] using ubiquitin-10 as a reference gene relative to the nontreated DNA sample (
k). Color coding is the same for (
h–
k). Pictures were adapted by courtesy of Dr. Terezie Mandáková (
e,
f) and Prof. Andrew Leitch (
d), scale bars are 10 µm.
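The ΔΔCt calculation mentioned in the caption can be illustrated with a short worked example. This is a sketch of the standard ΔΔCt formula, assuming perfect doubling per PCR cycle (the `efficiency` parameter); the function name and the Ct values are made up for illustration and are not data from the cited experiments.

```python
def relative_level_ddct(ct_target_treated: float,
                        ct_ref_treated: float,
                        ct_target_untreated: float,
                        ct_ref_untreated: float,
                        efficiency: float = 2.0) -> float:
    """Relative amount of a target sequence in treated vs. untreated DNA.

    Each sample is first normalized to the reference gene (delta Ct), then
    the treated sample is compared with the untreated one (delta-delta Ct).
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_untreated = ct_target_untreated - ct_ref_untreated
    return efficiency ** -(delta_treated - delta_untreated)


if __name__ == "__main__":
    # Hypothetical Ct values: the subtelomeric target amplifies two cycles
    # later after Bal31 treatment, while the reference gene is unchanged.
    level = relative_level_ddct(24.0, 20.0, 22.0, 20.0)
    print(f"relative level: {level:.2f}")  # prints "relative level: 0.25"
```

A two-cycle delay relative to the reference thus corresponds to a fourfold decrease of the target sequence, as expected for a subtelomeric region being digested from the chromosome end.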
When ITS clusters are large enough, they can be detected by FISH (
Figure 21d–f) and distinguished from telomeres (e.g.,
[49,157,159,164,165,166,167,168,169][73][75][80][81][82][83][84][85][86]). If they are shorter than the detection limit of this method, they can still show a positive signal when investigated by Southern hybridization or primer extension (
Figure 21g). The origin, evolution and function of ITSs are still subject to much discussion
[120,155,169,170,171,172,173][35][71][86][87][88][89][90]. The massive areas of ITSs often found in pericentromeric regions can be explained as the result of mechanisms such as unequal gene conversion, crossing-over, DNA replication slippage and rolling circle replication of extrachromosomal circular DNA. Some ITSs co-localize with sites of chromosomal breakage and are described as remnants of ancient chromosomal rearrangements, such as during primate evolution
[174][91]. A similar view holds for human ITSs arranged as head-to-head blocks of telomeric repeats that seem to result from the terminal fusion of ancestor chromosomes
[126,175][41][92].
Researchers are still far from understanding the interplay of mechanisms that are activated during genome instability. It has long been considered that overall changes in chromosome architecture can result from breakage-fusion-bridge cycles, a phenomenon first described in maize (
[176][93], reviewed in
[177][94]). The classic theory behind this is that a chromosome with one end broken during meiotic crossing-over can fuse with another such broken chromosome, leading to the formation of a “bridge” conformation chromosome with two centromeres during the subsequent cycle of meiosis. This bridged chromosome is then ultimately cleaved into two daughter chromatids, but not necessarily at the site of the original breakage. This can lead to sequence deletion or duplication on subsequently healed daughter chromatids
[176][93]. Experimental examination of this theory in
Caenorhabditis elegans revealed evidence of such cycles, but also suggested more complex chromatin rearrangements can arise
[178][95]. These more extensive rearrangements are proposed to arise from stalled replication events followed by template switching as may occur in areas with high-homology satellite sequences
[178][95]. A simpler phenomenon is non-reciprocal translocation arising during break-induced DNA replication (
[179][96], reviewed in
[180][97]). Broken chromosomes are proposed to invade intact chromosomes with areas of homology during the G1 or G2 phase of the cell cycle, initiating DNA repair with the sequence from the other chromosome arm, possibly acquiring new genes and a telomere in the process
[179][96]. Similar genome instability is also possible when telomeres are lost, making chromosome ends indistinguishable from double-strand breaks
[181][98].
Telomerase, and possibly ITSs, appear to play an important role in chromosome rearrangement. For example, the recovery of tobacco cells to full viability after extensive chromatin fragmentation induced by cadmium stress was accompanied by an increase in telomerase activity
[182][99]. Wheat chromosome end healing after gametocidal gene-induced breakage, efficient telomere healing by telomerase and stabilization of holocentric chromosomes in irradiated
Luzula elegans plants were also previously reported
[183,184][100][101]. Interestingly, when constructs containing telomeric arrays are introduced into mammalian or plant cells, the sites of integration become fragile, chromosomal breakage is induced and the new ends are stabilized
[185,186,187][102][103][104]. Telomere-mediated chromosomal truncation has even been employed as a chromosome engineering technique
[188,189,190,191][105][106][107][108]. All this supports the hypothesis that ITSs are preferred sites for breakage and that telomere-like repeats at a break site may favor chromosomal healing
[170][87].
4. Telomere Proteins
Chromosomal DNA in cells associates with proteins that fold these long polymeric molecules into condensed, ordered forms. Most of the DNA sequence, including genes, subtelomeric satellites and the proximal sections of telomeres is folded into a series of compact but dynamic protein-DNA complexes called nucleosomes
[269][109]. In 2001, Fajkus and Trifonov
[270][110] proposed that telomeric nucleosomes are packed in a variant, columnar chromatin structure. Recently, the formation of this structure was confirmed experimentally using cryo-electron microscopy
[271][111]. The ends of telomeres associate with a more diverse, organism-dependent set of proteins that maintain a 3′ single-stranded overhang (also known as the G-overhang) by recruiting enzymes to lengthen the 3′ strand and shorten the 5′ strand, and that induce and stabilize t-loop formation (reviewed in
[9,272][9][112]). These mechanisms protect telomeric DNA, prevent aberrant DNA repair and mediate interactions with telomerase (see above,
[272][112]). In
Arabidopsis and
Chlamydomonas, some telomere ends are instead blunt, with little or no 3′ overhang, although it is unknown whether this is a special feature of these organisms or a more widespread characteristic
[195,273,274][113][114][115].
Of principal interest to telomere researchers are the specialist proteins that interact with the distal sections of telomeres at the ends of chromosomes. Two major telomere protecting complexes have been described, CST and shelterin. These were initially thought to be alternative mutually exclusive systems, but the search for homologues revealed that many eukaryotes, including humans, had both systems able to work in parallel
[275,276,277,278][116][117][118][119]. Subsequent research focused on looking for homologues of human systems across all eukaryotes; however, this approach has had only partial success (reviewed in
[279][120]). The CST complex is largely conserved in eukaryotes
[280][121] in terms of function, if not necessarily the sequence of its components
[281,282,283][122][123][124]. CST binds ssDNA and recruits DNA polymerase α-primase for C-rich strand synthesis and also has a role in preventing stalled replication forks (for recent advances see
[278][119] and references therein). In comparison, shelterin (reviewed in
[272][112]) coats telomeric DNA generally and interacts with telomerase for G-rich strand synthesis. Shelterin is not present in all eukaryotes, although most have an identifiable protein family that occupies the same role (
Figure 52). In addition to these larger end-protection protein complexes, there is a highly conserved heterodimer of proteins called Ku70/Ku80 that is normally involved in non-homologous end-joining events, but which also has an enigmatic role in telomeres. This complex binds dsDNA ends non-specifically, but is known to interact with components of shelterin in mammals and telomerase RNA in yeast (reviewed in
[284,285][125][126]).
Figure 52. Telomere protection by protein complexes. (
a) The six core units of shelterin
[272][112] form a complex coating distal telomeres, although stoichiometries of subunits may vary. TRF2 forms T-loops and binds the double-stranded vertebrate telomeric sequence. TRF1 assists this binding, TIN2 and TPP1 form the core of the complex and control other protein-protein interactions and POT1 can bind single-stranded telomeric repeats to stabilize the T-loop. (
b) Fission yeast shelterin
[286][127] is analogous to vertebrates but differs in stoichiometries of proteins. (
c)
Drosophila terminin
[95][128] has a similar function to shelterin although the precise roles of components that share little homology with shelterin components are speculative. (
d) Budding yeast telosomes
[287][129]. Rap1 binds telomeric DNA and can be complexed into dimers by Rif2 or tetramers by Rif1. The entire assembly is proposed to form a velcro-like coating of telomeres although to date structural studies of this complex are on dsDNA only, so any interaction with 3′ overhangs is speculative.
Shelterin has a dynamic composition and variant complexes bind the entire length of distal telomeres; there are six core protein components in humans, which are thought to be largely conserved in mammals
[272,288][112][130]. Telomeric repeat binding factor 2 (TRF2) binds the telomeric DNA motif with nanomolar affinity via a SANT/Myb domain sometimes termed the telobox in older literature
[289][131], not to be confused with interstitial telomeric motifs, which are also called teloboxes
[172][89]. TRF2 binds dsDNA, and homodimerization enhances this binding. It is also proposed to have helicase-like activity, wrapping dsDNA from near the telomere end around itself and thereby generating torsional strain that encourages T-loop formation. Consistent with this, TRF2 is both necessary for T-loop formation by shelterin and capable of forming T-loops in the absence of any other shelterin components
[290][132]. TRF1 is highly homologous to TRF2 but only binds telomeric repeats and lacks T-loop-forming ability. Both proteins (possibly as homodimers) bind TRF1-interacting protein (TIN2) to form the core dsDNA-binding subunit of shelterin
[291][133]. TIN2 binds TINT1/PIP1/PTOP1 (TPP1) which in turn binds protection of telomeres 1 (POT1), a protein with multiple OB-fold domains that can bind ssDNA and which is thought to be the main interactor with the 3′ overhang in the complete shelterin complex. TRF2 alone can also recruit repressor/activator protein 1 (RAP1) as the sixth member of core shelterin and the interactions between shelterin subunits can generally occur across multiple protein surfaces
[291][133]. TPP1 in complex with POT1 interacts with telomerase as part of the coordination of telomerase and shelterin protein complexes
[278,292][119][134].
Unsurprisingly,
Drosophila has evolved a separate group of proteins in a complex called terminin to protect the retrotransposon-derived sequences at the ends of its chromosomes. Terminin was identified from the larval brain cells of mutant flies with end-fused chromosomes and consists of a core of heterochromatin protein 1/origin recognition complex-associated protein (HOAP), Modigliani (Moi) and an OB-fold protein called Verrocchio (Ver)
[293][135]. Whilst fission yeast has a shelterin complex made from homologues of human proteins [294,295][136][137], budding yeast instead has a velcro-like network of proteins called the telosome. This consists of Rap1, a general transcription factor which coats double-stranded telomeric DNA, Rif1 which binds DNA via a Myb domain and Rif2 which binds DNA via an AAA+ domain. Rif1 and Rif2 can bind four or two molecules of Rap1, respectively, through binding domains attached to long disordered chains to form a dense protein network (
[296][138], reviewed in
[297][139]). The system in plants is not yet clear (reviewed in
[298][140]).
Although plant proteins that share some sequence homology to human shelterin proteins have been identified (summarized and reviewed in
[299,300][141][142]), including those with C-terminal Myb domains similar to TRF1 and TRF2, these do not have any obvious end-protection role
[301][143]. The only definitive double-stranded telomeric DNA binding proteins so far characterized in plants are the telomere repeat binding proteins (TRB1–3)
[302,303,304][144][145][146]. These proteins bind to
Arabidopsis telomeres in vivo
[304,305][146][147], and TRB1 colocalizes with telomeres when introduced into
Nicotiana benthamiana in live cell imaging studies, suggesting a general role for these proteins at plant telomeres
[306,307][148][149]. TRBs have histone-like domains that allow multimerization and binding to telobox-related DNA motifs at a multitude of chromosome sites, and N-terminal Myb domains that specifically bind double-stranded telomeric DNA
[302,303,304,308,309][144][145][146][150][151]. Like TPP1 in human shelterin, TRBs can interact with telomerase; together with their DNA binding and multimerization, this makes it easy to draw parallels with other end-protecting proteins
[289,299,303,304][131][141][145][146]. It can be speculated that in addition to their other regulatory DNA-binding roles
[308,309,310][150][151][152], TRBs could form some sort of end-protection framework, similar to the telosome in yeast. Alternatively, it could simply be that any end-protection proteins in plants are sufficiently divergent from those of other organisms to have eluded discovery so far.
One final quirk in plant telomere biology is the occurrence of blunt-ended telomeres. Some blunt DNA ends in
Arabidopsis [311][153] are known to at least temporarily bind Ku70/80, a ubiquitous DNA end-protecting protein complex that is part of the normal double-strand break repair mechanism. Studies in budding yeast and human cells revealed that Ku can interact with telomeric chromatin either by directly binding to telomeric DNA or via interaction with telomere-associated proteins, including shelterin subunits such as TRF1, TRF2 and Rap1
[312,313,314][154][155][156]. Studies in mice revealed considerable telomere abnormalities when Ku is knocked out, but the phenotypes are complex enough that a specific role is difficult to ascertain
[315,316][157][158]. In yeast, Ku also binds the telomerase RNA
TLC1 separately from telomere ends in a mutually exclusive fashion, and is required to maintain levels and nuclear localization of
TLC1. YKu association with telomeres is independent of its association with
TLC1 RNA and occurs throughout the cell cycle
[317,318][159][160]. As with other eukaryotic systems, the Ku heterodimer in
Arabidopsis forms a tube that slides onto and encircles the double-stranded telomere from one free end, providing simple end-protection without translocating inward
[274,311][115][153]. It is so far unknown whether Ku-protected blunt ends in
Arabidopsis and
Chlamydomonas are unique to these organisms or whether a more widespread phenomenon is yet to be found in other eukaryotes. It is possible that these are an evolutionary step that limits the amount of work that telomerase has to conduct or provides cells without telomerase more stability during proliferation
[319][161].
5. How to Find a Telomere Candidate
Experimental approaches that have been used successfully in the past to characterize telomeres de novo (summarized in [206][162]) comprise proof of the end-protection function of a newly discovered sequence in vivo [208,209][163][164], genomic DNA library screening with verification of terminal position by Bal31 digestion and Southern hybridization [49,115,116,146,202,205][16][17][62][80][165][166], cloning of telomerase products [30,139,142,148,229,230][54][55][58][64][167][168] and a novel combination of genomic and transcriptomic studies with classical methods [146,204,205,206][62][162][166][169]. Today, raw data or assembled contigs generated by researchers or obtained from public NGS (next-generation sequencing) datasets can be mined for repetitive sequences using, e.g., Tandem Repeats Finder [234][170] and/or RepeatExplorer [39][171]. New in silico analyses combined with experimental approaches have been used to identify and verify novel telomere sequences, e.g., in the yeast Lachancea sp. [132][47], the beetle Anoplotrupes stercorosus [151][67], Zostera marina (Alismatales), a plant with a human-like telomere sequence [251][172], and Allium cepa, a plant with an unusual telomere type [146][62]. Moreover, a comparative transcriptome study led to the identification of telomerase RNA (TR) subunits and telomeric repeats across the entire land plant phylogeny [204][169]. Subsequently, a new bioinformatic approach based on prediction of TR subunits in combination with results from Tandem Repeats Finder resulted in a broad identification of telomere sequences in green algae, ciliates and stramenopiles, including the novel types TATAGGG, TGTTAGGG and TGTAAGGG, and demonstrated the deep evolutionary origin of TR in the megagroup Diaphoretickes [252][173].
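The idea of mining sequencing reads for candidate telomere repeats can be sketched as follows. This toy example is not how Tandem Repeats Finder or RepeatExplorer work; it only counts perfect tandem copies of short units and merges rotations and reverse complements of the same repeat. All function names, the unit-length range and the `min_copies` threshold are assumptions made for illustration, and any candidate found this way would still need experimental verification of its terminal position.

```python
from collections import Counter


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def canonical(unit: str) -> str:
    """Alphabetically smallest rotation of the unit or its reverse complement.

    TTAGGG, GGGTTA and CCCTAA all describe the same repeat, so they are
    collapsed into a single key.
    """
    forms = []
    for s in (unit, revcomp(unit)):
        forms.extend(s[i:] + s[:i] for i in range(len(s)))
    return min(forms)


def tandem_units(read: str, k: int, min_copies: int = 3):
    """Yield (canonical unit, copies) for perfect tandem runs of k-mers."""
    read = read.upper()
    i = 0
    while i + k * min_copies <= len(read):
        unit = read[i:i + k]
        copies = 1
        while read[i + copies * k:i + (copies + 1) * k] == unit:
            copies += 1
        # Skip homopolymers and units containing ambiguous bases.
        if copies >= min_copies and set(unit) <= set("ACGT") and len(set(unit)) > 1:
            yield canonical(unit), copies
            i += copies * k
        else:
            i += 1


def rank_candidates(reads, k_values=range(5, 9), min_copies: int = 3):
    """Rank repeat units by total tandem copy number across all reads."""
    counts = Counter()
    for read in reads:
        for k in k_values:
            for unit, copies in tandem_units(read, k, min_copies):
                counts[unit] += copies
    return counts.most_common()


if __name__ == "__main__":
    # Made-up reads carrying the human-type repeat on either strand.
    reads = ["ACGT" + "TTAGGG" * 5 + "AC", "CCCTAA" * 4 + "GT"]
    print(rank_candidates(reads))
```

Because the canonical key is chosen alphabetically, the human-type repeat is reported as AACCCT here; converting it back to the conventional G-rich form TTAGGG is a matter of presentation.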