Human Satellite DNA Families | Encyclopedia MDPI

Human Satellite DNA Families: History

Please note this is an old version of this entry, which may differ significantly from the current revision.

Subjects: Genetics & Heredity

Contributor: Mariana Lopes

In the 1960s, the discovery and classification of three clearly distinguishable human genomic DNA fractions in Cs₂SO₄ gradients established the identity of the corresponding classical satellite DNAs I, II, and III. More precisely, each gradient fraction was found to comprise a set of repetitive sequences with similar buoyant densities. These DNA fractions showed a characteristic inter-sequence heterogeneity, which led to a new classification in 1987, when a prime family of simple repeats was identified for each fraction. The three families were described as satellite DNA families I, II, and III and were first reported to be present in all acrocentric chromosomes, as well as in chromosomes 3 and 4. The centromeric alpha (α) satellite DNA family was also identified and described, soon becoming the most intensively studied human satDNA sequence. Later on, gamma (γ) and beta (β) satellites were likewise found among the diverse families of human satellite DNAs.

satellite DNA families
satellite DNA characterization

2.1. α Satellite DNA

Initially, α satellite DNA (αSAT) was isolated from a highly repetitive fraction present in the African green monkey genome [53]. Subsequently, α satellite repeats were shown to be present in all human centromeres and to be composed of tandem repeats of an AT-rich 171 bp-long monomer [54,55,56,57]. Alphoid monomers can form higher-order repeats (HORs) composed of n-mer repeats (n being the number of monomers) or be organized in a non-HOR manner as simple monomeric repeats [28]. HORs can be formed by 2 to 34 monomers [28,56,58,59]. Some monomers within α satellite HORs have a 17 bp sequence motif called the Centromere Protein B (CENP-B) box because of the ability of CENP-B to recognize and bind to these regions [60]. The location of the CENP-B box is structurally related to the chromosome-specific HOR array and varies accordingly [5]. Moreover, the CENP-B box is present with a high degree of conservation in other mammalian genomes [61]. Studies show that an active CENP-B box is required for de novo centromere assembly in humans, acting in the recruitment of Centromere Protein A (CENP-A) and the stabilization of Centromere Protein C (CENP-C) [62], both related to an active kinetochore and proper chromosome segregation, which might explain its retention in different genomes. Each human chromosome contains one or more exclusive α HOR arrays, except for chromosomes 13/21 and 14/22, each pair of which shares the same HOR array [46,50,63]. Regarding acrocentric chromosomes, a variety of α satellite subfamilies can be found in the vicinity of the centromere: pTRA-1, pTRA-2, pTRA-4, and pTRA-7, all of them present in chromosomes 13, 14, and 21 [46,64]. These subfamilies are part of a catalog of 28 clone-isolated α subfamilies from all human chromosomes [64], although an accurate genomic analysis is still required to avoid redundant classifications.
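
As a rough illustration of the sizes involved, the nominal length of one HOR unit can be estimated by multiplying the 171 bp monomer length by the number of monomers. The sketch below is only an approximation under that assumption: the function name is hypothetical, and real monomers may deviate slightly from 171 bp, so actual HOR units need not match these values exactly.

```python
MONOMER_BP = 171  # α satellite monomer length given in the text


def hor_unit_length(n_monomers: int, monomer_bp: int = MONOMER_BP) -> int:
    """Nominal length (bp) of one HOR unit built from n_monomers monomers.

    Assumes every monomer has exactly monomer_bp bases, which is a
    simplification made for illustration only.
    """
    if n_monomers < 1:
        raise ValueError("n_monomers must be a positive integer")
    return n_monomers * monomer_bp


# HORs are reported to contain between 2 and 34 monomers.
shortest = hor_unit_length(2)
longest = hor_unit_length(34)
print(shortest, longest)  # 342 5814
```

Under this simplification, HOR units would span roughly 0.3 to 5.8 kb.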

The α satellite soon became the model for hierarchical HOR organization [29]. Alphoid sequences are closely tied to proper cell division (being the foundation for kinetochore formation), to the occurrence of active centromeres, and, therefore, to centromere identity. It is possible to distinguish human centromeres based on their specificity-conferring α HOR composition, namely, the number and order of monomers (which share 50–70% identity) [65]. By defining α monomer consensus sequences, it is possible to discern five suprachromosomal groups or subfamilies, based on the possible monomer combinations [65,66] (reviewed in [5,66]). The main suprachromosomal subfamilies (SF1-3) correspond to the kinetochore formation region and are associated with centromere functionality [67,68]. Hybridization studies performed at high stringency allow the mapping of individual HORs to specific chromosomes [56] because of sequence polymorphisms found between them [5]. At low stringency, subsets of HOR arrays co-hybridize, allowing one to study how suprachromosomal subfamilies relate to each other [58,69]. Beyond the occurrence of α HORs, α monomeric repeats are present in transitory, array-adjacent pericentromeric regions, plausibly evolving non-homogeneously from homogeneous HORs [59,70]. The relative mutation rate of centromeric α satellite sequences (accelerated compared with unique genomic portions) is consistent with a layered and symmetric evolution in the following direction: active HOR repeats, then ancient HOR repeats, then monomeric repeats [71]. In fact, closeness to the functional core centromere is a determining factor for HOR homogenization, as distant monomers are considered older, more variable, and a trace of primate centromere evolution [72]. Therefore, HOR array chromosome specificity results from intrachromosomal homogenization [13].
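
To make the notion of monomer identity concrete, the following minimal sketch computes the percentage of matching positions between two sequences. It assumes the sequences are already aligned and of equal length, with no gaps; the function name and the toy sequences are invented for illustration and are not real α satellite monomers.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences match.

    Assumes equal-length, gap-free sequences; real monomer comparisons
    would normally require a proper alignment step first.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Toy 10-base sequences (hypothetical) differing at two positions.
toy_a = "ACGTACGTAC"
toy_b = "ACGTTCGTAA"
print(percent_identity(toy_a, toy_b))  # 80.0
```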

2.2. Satellite DNA I

SatDNA I (SATI) is distinguished by the presence of 42 bp repeats, consecutively arranged in units of two types, A (17 bp) and B (25 bp) repeat units [29], which can tandemly organize in ABABA constructs [30,73]. SATI repeats can form HORs of 2.97 kb [74]. The amplification of these sequence arrays arranged in a head-to-tail fashion resulted in the current complexity of the SATI DNA family [73]. SATI is the most AT-rich fraction of the human genome and also the least abundant classical satellite [51]. This classical satellite was first described using a probe (pTRI-6) that hybridizes with all acrocentric chromosomes at low stringency and only with chromosomes 13 and 21 at high stringency [46,74]. To date, pTRI-6 remains the only sequence described as a SATI subfamily. In 1986, experiments with the restriction enzyme RsaI allowed for the detection of the ABABA construct [30]. In this study, the A-B 42 bp form was considered predominant, although a possible second form (B-B dimers) was also observed. Later, Meyne et al. determined that B-B repeats hybridize with chromosome 3 and acrocentric chromosomes. The predominant ABABA construct showed a chromosomal location analogous to that of the pTRI-6 subfamily (chromosomes 3 and 4 and the acrocentric chromosomes). Acrocentric hybridization signals could be found in two locations: proximal pericentromeric and more distal short arm regions. The authors highlighted the need for high-resolution molecular studies for variant analysis [73], a requirement that remains pressing a quarter of a century later.

2.3. Satellite DNA II/III

SatDNA II (SATII) associates with a poorly conserved repeat unit (ATTCC), and satDNA III (SATIII) was shown to be composed of pentameric repeats of the same motif (well-conserved and interspersed with a specific 10 bp sequence) [29,75]. The inconsistent arrangement of satellite II/III in complex repeats (as opposed to tandem repeats) has led to a poor characterization of these satellite families [41]. SATII and SATIII probably arose from the same pentameric repeat [30], yet today these sequences locate to different genomic regions [41,48].
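
As a minimal sketch of how pentameric repeats of this kind might be looked for in a sequence, the snippet below reports the longest run of perfect, back-to-back ATTCC copies. The function name and the toy sequence are hypothetical, and counting only perfect copies is a deliberate simplification: it would underestimate poorly conserved SATII arrays and ignores the 10 bp sequence interspersed in SATIII.

```python
import re


def longest_tandem_run(seq: str, unit: str = "ATTCC") -> int:
    """Largest number of consecutive perfect copies of `unit` in `seq`.

    Only exact, uninterrupted copies are counted; diverged repeats
    break a run. This is an illustrative simplification.
    """
    pattern = f"(?:{re.escape(unit.upper())})+"
    runs = re.findall(pattern, seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Toy sequence (hypothetical): four perfect copies, one diverged copy
# (ATGCC), then two more perfect copies.
toy = "GG" + "ATTCC" * 4 + "ATGCC" + "ATTCC" * 2 + "TT"
print(longest_tandem_run(toy))  # 4
```

A single diverged pentamer is enough to split a run here, which hints at why short, irregular repeats are difficult for exact-match approaches.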

SATII repeats were initially reported to localize predominantly to chromosome 1 [45,48] and chromosome 16 [76,77]. In particular, the chromosome 1 SATII array represents a chromosome-specific 1.77 kb unit [48]. To a lesser extent, SATII was also found in pericentromeric regions of chromosomes 2 and 10 [45]. In 2014, three different SATII subfamilies were analyzed, presenting different sequence compositions and genomic locations [41]. Today, the chromosomal location of the SATII family is more broadly recognized, supported by a growing number of genomic and bioinformatic studies.

SATIII was localized to chromosomes 1, 9, and Y [76,77,78], as well as to acrocentric short arms [79]. SATIII repeats have likewise been progressively found in additional chromosomal locations (e.g., chromosomes 5, 10, 17, and 20) [29,45]. SATIII presence in the Y chromosome long arm is distinguished by a male-specific 3.6 kb repeat unit [78,80,81]. With respect to SATIII acrocentric repeats, eleven different subfamilies have been identified and characterized: pTRS-47 [82], pTRS-63 [83], pTR9-s3 [31], pTRS-2 [46], pE-1, pE-2, pR-1, pR-2, pR-4, pK-1, and pW-1 [84]. The pTRS-47 and pTRS-63 subfamilies, located at chromosomes 14/22 and 14, respectively, seem to be particularly significant in a clinical context for their involvement in the breakpoints of human Robertsonian translocations [85]. Interestingly, computational clustering analysis of human sequences also identified a total of eleven SATIII subfamilies [41]. Although the number of identified subfamilies matches that of previous clone hybridization studies on acrocentric chromosomes, the predicted chromosomal locations do not seem to match (not all identified subfamilies locate to acrocentric chromosomes). A comprehensive sequence analysis is essential, since the different sequence compositions and physical locations observed in both approaches point to the existence of a higher number of SATIII subfamilies. Gaps in our understanding of SATII/SATIII repeats are strongly associated with the limitations of bioinformatic/sequencing approaches, owing to the short, irregular nature of these repeats [41] as well as their close sequence relationship.

2.4. β Satellite DNA

β satDNA (βSAT) was initially named the Sau3A satDNA family [86] and formally termed β satellite in 1989 [87]. β satellite repeats consist of tandem arrays of a 68 bp monomer organized in multimeric HORs, described to be present in all acrocentric chromosomes and chromosomes 1, 3, 9, 19, and Y [47,79,86,87,88,89], predominantly in pericentromeric regions [90]. β satellite was divided into two different types of HORs (pB3 and pB4), composed of non-overlapping arrays with distinct genomic locations. pB3 localizes specifically to chromosome 9 and is represented 50–100 times per haploid genome. The second type of HOR, pB4, is five times more abundant per haploid genome and is located in acrocentric chromosomes, where β satellite was found early on to map distally and proximally to rDNA [87]. Recently, β satellite was shown to be present in multiple eukaryotic taxa and to be the object of horizontal transfer (HT) events, contradicting previous claims of its exclusive presence in primates [91].

2.5. γ Satellite DNA

Originally, γ satDNA (γSAT) was isolated from a chromosome 8-specific clone [92]. Later on, another γ subfamily was described in chromosome X [93]. Known γ satellite subfamilies (GSAT, GSATX, and GSATII, with ~60% identity) are GC-rich tandem pericentromeric repeats of a vastly diverged 220 bp monomer and have been identified in all human chromosomes [40,94], usually forming clusters of 2–10 kb [92]. Kim et al. [94] proposed that γ satellite repeats may work as barriers to heterochromatin expansion into chromosomal arms, being functionally similar to genomic insulators. This hypothesis is in accordance with previous statements regarding the existence of structural and functional constraints related to γ satellite [29].

Regardless of their common satDNA composition, centromeric and pericentromeric chromatin are structurally different, essentially because centromeres are epigenetically compatible with kinetochore assembly and chromosome segregation, while pericentromeric regions show typical heterochromatic behavior [26]. Thus, the ubiquitous centromeric presence of α satellite sequences contrasts with the nature of pericentromeric satellite families, which clearly behave in a more non-homogeneous manner [23,29,55], frequently leading to inconsistencies about their overall existence and location in the human genome [95]. Human centromeres are not only composed of satellite sequences but also contain mobile elements, including LINEs and SINEs (Long/Short Interspersed Nuclear Elements), already described both in HOR arrays and monomeric repeats [13,95]. Hence, the centromeric region of human chromosomes is mostly composed of α HORs, occasionally punctuated by transposable elements (TEs) [96,97], and progressively replaced by pericentromeric satellite families (classical satellites and β/γ satellites) [23]. Table 1 presents a summary of the available information about human satellite families.

Table 1. Summary of the features of currently recognized human satDNA families. Different satDNA families present distinct traits and can be divided into AT-rich or GC-rich satellites [29,33,41,47,74,87,88,89,94,95,98,99]. ¹ SATII presents large blocks on chromosomes 1 and 16. ² SATIII is widely represented on chromosome 9 [45].

	Repeat Unit Size	Identified Subfamilies	HOR Formation	Chromosomal Presence	Genome Representativity	Base Composition
αSAT	171 bp	SFs; 28 identified (e.g., pTRA-1/2/4/7)	✓	All	3–5%	AT-rich
SATI	42 bp	pTRI-6	✓	3; 4; All acrocentric	0.12%	AT-rich
SATII	5 bp	3 mentioned, no name identified	✓	1¹; 2; 5; 7; 10; 13–17; 21; 22	1.5% (together w/ SATIII)	GC-rich
SATIII	5 bp	pTRS-47; pTRS-63; pTR9-s3; pTRS-2; pE-1/2; pR-1/2/4; pK-1; pW-1	✓	Y; 1; 3–5; 7; 9²; 10; 13–18; 20–22	1.5% (together w/ SATII)
βSAT	68 bp	pB3/4	✓	Y; 1; 3; 9; 19; All acrocentric	0.02%
γSAT	220 bp	GSAT; GSATX; GSATII	-	All	0.13%

This entry is adapted from the peer-reviewed paper 10.3390/ijms22094707

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.