Well-characterized telomeric satellites of the human genome can also be applied broadly as informative markers for studying a variety of hominoid species, owing to their multiallelic variation and high degree of heterozygosity [70]. The MsH42 locus shows high similarity to immunoglobulin regions and is involved in recombination events, promoting high rates of unequal crossover [78,84,85]. Short stretches of telomere-like repeats, termed interstitial telomeric sequences (ITSs), also occur at sites far from the chromosome ends. To trace the evolutionary origin of these sequences in NHP genomes, 22 ITS loci from the human genome were compared with their orthologs in 12 NHPs representing great apes, gibbons, Old World monkeys, and New World monkeys. The sequence comparisons indicated that, unlike other microsatellites, these ITSs did not arise by expansion of pre-existing TTAGGG monomers but instead emerged abruptly during primate genome evolution as a result of double-strand break repair [86]. Similar findings were obtained for a chimpanzee-specific ITS. A universal satDNA classification is still debated; most commonly, however, satDNA is grouped according to its position on, and association with, particular chromosomal loci. SatDNA is primarily clustered within the heterochromatic regions of primate chromosomes. Heterochromatin is mainly localized in centromeric and telomeric regions and sometimes in interstitial regions of the chromosomes [87]; within it, satDNA is mostly concentrated in centromeric regions, whereas the adjacent pericentromeres may be enriched with TEs. The different types of primate satDNA are discussed below and summarized in Supplementary Table S1.
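As a rough illustration of why multiallelic loci make informative markers, the expected heterozygosity of a locus under Hardy–Weinberg assumptions is H = 1 − Σ p_i², where p_i are the allele frequencies; H rises as alleles become more numerous and more evenly distributed. The Python sketch below computes H; the function name and the allele frequencies are hypothetical and are not data from the studies cited above.

```python
from collections.abc import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity H = 1 - sum(p_i^2), assuming Hardy-Weinberg proportions."""
    total = sum(freqs)
    if total <= 0:
        raise ValueError("allele frequencies must sum to a positive value")
    # Normalize in case the supplied frequencies do not sum exactly to 1
    p = [f / total for f in freqs]
    return 1.0 - sum(x * x for x in p)


# Hypothetical allele frequencies, for illustration only
biallelic = [0.5, 0.5]
multiallelic = [0.25, 0.25, 0.25, 0.25]
print(expected_heterozygosity(biallelic))     # 0.5
print(expected_heterozygosity(multiallelic))  # 0.75
```

With two equally frequent alleles H is 0.5, whereas four equally frequent alleles give 0.75, which is why loci with many alleles tend to be more informative as markers.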
3.1. Centromeric and Pericentromeric satDNA: Primate-Specific Alpha Satellites and HORs
The centromere cores of human chromosomes consist of long, highly enriched stretches of satDNA and are surrounded by heterochromatin containing a mixture of short satDNA sequences and retroelements [29,88]. Such centromeric regions are sometimes termed “satellite centromeres” [89]. The centromere is essential for preserving genetic material and plays a critical role in chromosome segregation, cell division, kinetochore organization, and spindle attachment [89,90,91,92]. In primates, the bulk of the centromere is composed of the pancentromeric alpha satellite (AS), organized as 171 bp monomers arranged head-to-tail in arrays extending from ~250 kbp to ~5 Mbp per chromosome [93,94,95,96] (a(i)). This organization has been reported across diverse groups, including great apes, Old World monkeys, and New World monkeys [96,97,98,99,100,101,102]. These centromere-associated satellites are grouped into superfamilies (SFs) that can be orthologous between human and gorilla [60]. The surrounding pericentromeric satDNA is essential for stabilizing DNA–protein binding and regulating chromosome segregation [58,61]. Pericentromeric satellites vary greatly across NHP species; they may be conserved among closely related species or be species-specific [20,103]. For instance, a large pericentromeric block of human chromosome 9 enriched with satellite III (SatIII) shares close homology with the corresponding gorilla sequence [104]. The Y chromosomes of NHPs may carry more copies of SatIII sequences than the human Y chromosome [105]. FISH mapping of the pericentromeric SatIII DNA pW-1 on the chromosomes of various NHP species indicated that these sequences might be absent from the genomes of the squirrel monkey (Saimiri sciureus) and baboon (Papio hamadryas) [105]. Centromeric satellites can likewise vary substantially across species, although centromere domains may also contain species-specific or even highly conserved satDNA [20,103]. For example, two major families of centromeric satellites, termed C1 and C2, detected in the Old World monkeys crested mona monkey (Cercopithecus pogonias) and sun-tailed monkey (Cercopithecus solatus), have remained highly conserved [48]. The genomes of Old World monkeys, apes, and humans each harbor evolutionarily distinct AS monomers [106]. Although most primate centromeres are enriched with satellite repeats, certain orangutan chromosomes have centromeres that lack repeats [92,107,108,109,110]. Such centromeres may resemble the neocentromeres that form in humans following disruption of the original centromeric region [92,111]. These repeat-free centromeres are likely evolutionary new centromeres (ENCs), that is, neocentromeres that may subsequently have gained repeat sequences, stabilizing the genome, and become fixed in populations. ENCs also occur in several non-primate species, such as horse and chicken [112,113]. In the following, we focus mainly on AS repeats, the predominant centromeric satDNA in primate genomes.
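Because each AS monomer is ~171 bp, the array sizes given earlier in this section (~250 kbp to ~5 Mbp per chromosome) correspond to roughly 1,500 to 29,000 tandem monomers. The back-of-the-envelope Python sketch below makes the arithmetic explicit; it counts only complete monomers and, as a simplifying assumption, ignores interrupting elements and monomer length variation.

```python
MONOMER_BP = 171  # alpha satellite monomer length in bp


def approx_monomer_count(array_bp: int, monomer_bp: int = MONOMER_BP) -> int:
    """Number of complete monomers that fit in an array of the given length."""
    if array_bp < 0 or monomer_bp <= 0:
        raise ValueError("array length must be non-negative and monomer length positive")
    # Integer division discards any partial monomer at the end of the array
    return array_bp // monomer_bp


for label, size in [("~250 kbp", 250_000), ("~5 Mbp", 5_000_000)]:
    print(label, approx_monomer_count(size))
# ~250 kbp 1461
# ~5 Mbp 29239
```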
AS repeats were first identified as tandem repeats in the genome of the African green monkey (Chlorocebus aethiops) [93], and homologous repeats were subsequently found in New World monkeys and apes [96,114]. These sequences are considered critical for the various functions of primate centromeres [94]. Previous studies suggest that AS sequences helped stabilize ENCs after their emergence in primate genomes [109,115]. Human and macaque chromosomes contain a total of 14 ENCs, of which nine, in the macaque genome, show abundant AS arrays [109]. Interestingly, ENCs occur on macaque chromosome 4 and human chromosome 6, which are orthologous to each other (a(ii)) [109,116,117].
The AS monomer is 171 bp long; monomers are arranged tandemly, head-to-tail, and share ~70% or greater sequence similarity. Groups of monomers are in turn repeated as multimeric units, termed higher-order repeats (HORs), which build long arrays spanning an uninterrupted 250–5000 kb of repeated satellite sequence (a(iii)). Some monomers within HORs carry a 17 bp motif, termed the CENP-B box, which serves as the binding site for the centromeric protein CENP-B in primates. Although the Human Genome Project was declared complete in 2003, it could not recover a large proportion of centromeric and other repeats, amounting to more than 10% of the whole genome and concentrated mainly on the sex chromosomes. Subsequent technological developments, however, enabled assembly of the entire centromere of the human Y chromosome [62,118]. This Y chromosome assembly could serve as a reference for extending evolutionary insights into the centromeric repeats of NHPs whose Y chromosomes have not yet been assembled.
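The HOR concept can be illustrated with a toy model. Assume (hypothetically) that each AS monomer in an array has already been assigned a label by clustering monomers on sequence similarity; an HOR then corresponds to the shortest block of labels that repeats along the array. The minimal Python sketch below finds that block by exact matching; the function name and labels are illustrative only.

```python
def hor_period(labels: list[str]) -> int:
    """Smallest period p such that labels[i] == labels[i % p] for every i.

    A result equal to len(labels) means no shorter repeating unit was found.
    A trailing partial copy of the unit is allowed.
    """
    n = len(labels)
    for p in range(1, n + 1):
        if all(labels[i] == labels[i % p] for i in range(n)):
            return p
    return n  # only reached for an empty list


# Hypothetical array: a 3-monomer HOR (m1-m2-m3) repeated four times
toy_array = ["m1", "m2", "m3"] * 4
period = hor_period(toy_array)
print(period, toy_array[:period])  # 3 ['m1', 'm2', 'm3']
```

Real HOR copies are highly similar but not identical, so practical analyses must tolerate mismatches; this exact-match version only illustrates the definition.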
In primates, the centromere core contains specialized HOR arrays, whereas in the flanking regions AS sequences are organized as unstructured, heterogeneous repeats, forming distinctive pericentromeres. In these pericentromeres, AS repeats are arranged as monomers rather than HORs and are interrupted by other elements, mainly retrotransposable elements in humans [119] (a(iii)); this arrangement may also be common to other primate genomes. The pericentromeres of certain human chromosomes may also be enriched in several other repeat sequences, including satellite II and III sequences, which are built from 5 bp units [103,120]. AS monomers within the same array vary in sequence, with nucleotide identity ranging from 70% to 90%, whereas a monomer in one array may show up to 95% similarity to its counterpart unit in another array at the same locus [63,121,122]. In the human genome, the organization of HORs and their monomer units has been studied extensively [65,97,123,124], revealing various subfamilies of chromosome-specific AS sequences. HORs in great apes, such as orangutan, gorilla, and chimpanzee, show less sequence variation than those in the human genome [125,126,127,128]. HOR organization was initially presumed to be restricted to hominids; however, HORs were subsequently detected in the genomes of gibbons [101,102,129] and of Old World and New World monkeys [48,102,130]. During primate genome evolution, the ~171 bp AS monomer underwent a series of sequence changes [87]. A novel 189 bp AS monomer type was discovered in gorilla centromeres [60]. Chromosome-specific subfamilies have been reported to be absent in Old World and New World monkeys as well as in gibbons [87,101,106]. Recently, however, Cacheux et al. [49] investigated the evolutionary dynamics and diversity of AS repeats in the Old World monkeys Cercopithecus pogonias and C. solatus using targeted sequencing and FISH mapping, and reported evidence of chromosome-specific subfamilies that might have evolved through homogenization. Cloning, sequencing, and hybridization of acrocentric chromosomes revealed novel AS repeats in Azara’s owl monkey (Aotus azarae), a New World monkey species [22,23]. These include three megasatellites, OwlRep, OwlAlp1, and OwlAlp2, which range from 184 to 344 bp in size and were identified in centromeric and pericentromeric regions. Three-dimensional FISH analysis of retina samples revealed that OwlRep is the major component of heterochromatin, suggesting a role in the evolution of night vision in this species [131,132]. OwlRep shows ~82% sequence similarity to HSAT6, a 126 bp tandem centromeric repeat. HSAT6 was also detected in the owl monkey genome, and comparative analysis revealed its broad distribution among hominoids and New World and Old World monkeys. Phylogenetic analysis indicated that OwlRep evolved from HSAT6 [132].
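The identity values quoted above (70% to 90% between monomers of the same array, and up to 95% between counterpart units) are typically obtained as percent identities over sequence alignments. Assuming two monomers have already been aligned to equal length, a minimal Python sketch of that calculation follows; the gap convention used here (gap columns count as mismatches) is one of several in use, and the toy fragments are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns at which two equal-length sequences match.

    Assumes the sequences are already aligned; columns containing a gap ('-')
    are counted as mismatches, including gap-gap columns.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Toy 10 bp fragments (hypothetical, not real alpha satellite sequence)
print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))  # 80.0
```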
In addition to AS, another satellite family, termed beta satellite, is distributed in primate heterochromatin [133,134,135]. Beta satDNA consists of ~68 bp monomers, organized predominantly on the short arms of acrocentric chromosomes in arrays several kb long [136,137,138,139]. Beta satDNA repeats can form complexes with arrays of other specific repeats, termed D4Z4 repeats, at certain loci, such as 4q35 and 10q26 [140,141]. Evolutionary analyses involving cloning and FISH experiments have suggested that the D4Z4-containing 4q35 region might represent an ancestral locus whose sequences radiated extensively after the divergence of hominoids and Old World monkeys [142,143,144]. The origin and evolution of beta satDNA differ among hominids such as humans, chimpanzee, and gorilla [145,146]. FISH mapping data confirm that D4Z4 is also conserved in Old World and New World monkeys, whereas in primates distantly related to humans (e.g., lemurs), the sequence has retained its tandem organization but its conservation is limited to promoter regions [147]. Genomic analysis of orangutan has traced the origin of beta satDNA to early hominoid ancestors and shows that these repeats are preferentially located in pericentromeres [135]. That study concluded that the repeats originated as low-copy sequences, remained non-duplicated in early ape ancestors, and later evolved into duplicons, acquiring the typical characteristics of classical satellites in humans and other primates. Adjacent to AS arrays, the classical non-alphoid satDNA families I, II, and III are located in the pericentromeres of human chromosomes [95]. The human SatIII family is composed of GGAAT and GGAGT repeat units in varying proportions. SatIII is mainly localized on the short arms of acrocentric chromosomes in humans and other primate species, and is also present in the chimpanzee, gorilla, and orangutan genomes [148,149]. The chromosomal organization of this satellite family has provided interesting evolutionary insights into primate genomes [105]. Sequence comparisons have detected variation across primate species and suggest that the SatIII family might have appeared ~16–23 million years ago in Hominoidea [105]. The evolutionary origin and extensive diversification of centromeric satellites in primate genomes remain unclear; however, it has been speculated that TEs are possible progenitors of novel satellites, which may form through TE insertion into existing satellite regions [119].
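The SatIII composition described above (GGAAT and GGAGT units in varying proportions) could be quantified by tiling a sequence into consecutive 5 bp units and counting each motif. The Python sketch below assumes, for simplicity, that the input is on the motif-bearing strand and begins at the first base of a repeat unit; real reads would also require handling of phase, the opposite strand, and sequencing errors. The function name and fragment are hypothetical.

```python
from collections import Counter


def pentamer_fractions(seq: str, motifs: tuple[str, ...] = ("GGAAT", "GGAGT")) -> dict[str, float]:
    """Fraction of consecutive, non-overlapping 5 bp units in seq matching each motif.

    Assumes seq starts at the first base of a repeat unit (i.e., is in phase);
    any trailing bases shorter than 5 bp are ignored.
    """
    seq = seq.upper()
    units = [seq[i:i + 5] for i in range(0, len(seq) - 4, 5)]
    if not units:
        return {m: 0.0 for m in motifs}
    counts = Counter(units)
    return {m: counts[m] / len(units) for m in motifs}


# Hypothetical in-phase fragment: three GGAAT units and one GGAGT unit
print(pentamer_fractions("GGAATGGAATGGAGTGGAAT"))  # {'GGAAT': 0.75, 'GGAGT': 0.25}
```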
3.2. Telomeric and Subtelomeric satDNA
The telomere is located at the end of the chromosome and is enriched with non-coding, repetitive DNA sequences. The 500 kb region at the end of each chromosome arm is termed the subtelomeric region [150]. Both telomeres and subtelomeres have a high density of satDNA repeats. Telomeric regions of primate genomes show a high frequency of minisatellites, which also occur at other chromosomal loci [67,151]. In humans, the bulk of telomere-specific sequence consists of (TTAGGG)n microsatellites [79]. Adjacent to the telomere, the subtelomere is mostly enriched in rapidly evolving satellite repeats that vary in repetitiveness and size [57,152,153]. Although these subtelomeric satellites can be species-specific and often chromosome-specific, some remain highly conserved [154]. The microsatellites (CCCTAA)n, (CCCCAA)n, and (CCCTCA)n are present in primate telomeres [155], whereas (CCCGAA)n is restricted to subtelomeres [156] (b). In New World monkeys, subtelomeres can carry novel satDNA sequences. The subtelomeric regions of callitrichid monkeys harbor a satellite termed MarmoSAT, composed of a 171 bp motif [157]. MarmoSAT occurs as a monomer, whereas in the common marmoset (Callithrix jacchus) it is organized in HORs with a 338 bp repeat unit. Recently, an intriguing group of AT-rich satDNA sequences, termed StSats, has been reported in humans and great apes, including bonobo, chimpanzee, gorilla, and orangutan [47]; these repeats lie in close proximity to telomeric regions [158,159,160]. Strikingly, StSats are far more abundant in the gorilla and chimpanzee genomes than in the human genome [47]. It was previously hypothesized that these repeats arose in hominid ancestors and were lost in humans [158,159,160]. The abundance of StSat repeats in the bonobo, chimpanzee, and gorilla genomes suggests that these sequences might contribute to important genomic functions in these species. Proposed functions include roles in meiosis, telomere clustering, and control of replication duration in telomeric regions [158,159,160].
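The (CCCTAA)n repeat listed above is the reverse complement of (TTAGGG)n, i.e., the same telomeric repeat read from the opposite strand. The Python sketch below, using hypothetical helper names and a toy sequence, computes reverse complements and reports the longest uninterrupted tandem run of a hexamer in either orientation.

```python
import re

# Translation table for DNA complementation (upper- and lowercase)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_run(seq: str, motif: str) -> int:
    """Copies in the longest uninterrupted tandem run of motif or its reverse complement."""
    if not motif:
        raise ValueError("motif must not be empty")
    seq = seq.upper()
    motif = motif.upper()
    best = 0
    # A set avoids scanning twice when the motif is its own reverse complement
    for m in {motif, reverse_complement(motif)}:
        for match in re.finditer(f"(?:{re.escape(m)})+", seq):
            best = max(best, len(match.group(0)) // len(m))
    return best


print(reverse_complement("TTAGGG"))  # CCCTAA

# Toy sequence: 5 copies of TTAGGG, a spacer, then 3 copies of CCCTAA
toy = "ACGT" + "TTAGGG" * 5 + "GATC" + "CCCTAA" * 3
print(longest_run(toy, "TTAGGG"))  # 5
```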