The three domains of life have clearly diverged in how they organize their chromosomes. However, some proteins, such as histones, HU, and SMC proteins, are used in more than one domain. In this section, we focus on the roles of individual chromosomal proteins to understand the principles of chromosome folding.
1.1. Histone
Histones are the most fundamental building blocks of eukaryotic chromosomes. In 1990, the first archaeal histone was discovered in
Methanothermus fervidus, a methanogenic archaeon
[1]. Later, it became evident that species in Euryarchaeota (a major phylum in Archaea), except for
Thermoplasmata, encode proteins homologous to eukaryotic histones. Species in other phyla, such as Nanoarchaeota, Thaumarchaeota, and Lokiarchaeota, also encode histones
[2]. Archaeal histones are smaller (~6 kDa) than eukaryotic histones, and they do not possess the tail region that undergoes post-translational modification and is involved in the regulation of chromosome structure and gene expression in eukaryotes
[3]. A recent report on the lysine-acetylated proteome of
Thermococcus gammatolerans suggests that archaeal histones undergo acetylation
[4]. However, it needs to be clarified whether this modification really takes place in the cell and has a physiological role. Phylogenetic analysis indicates that there are two paralogues of archaeal histones: histone A and histone B. In contrast to the strict subunit composition of eukaryotic histones (a histone octamer consisting of an H3/H4 tetramer with two H2A/H2B dimers), which is defined by the structure of the four-helix bundle (4HB), archaeal histones can form either homo- or hetero-dimers
[2][3]. Studies using AFM have shown that the size of a single archaeal “nucleosome” is approximately 9 nm
[5].
Early crosslinking and electron microscopic studies indicated that archaeal histones exist as dimers in solution and as stable tetramers in the presence of DNA
[6]. However, careful observation of the MNase-digested pattern of DNA from
Thermococcus kodakarensis (a minimum fragment of 60 bp and a ladder with 30 bp steps) suggested the presence of structures other than the histone tetramer. If archaeal chromatin were a simple beads-on-a-string structure formed by nucleosomes consisting only of histone tetramers with linker DNA between them, the sizes of MNase-protected DNA would be multiples of 60 bp, without the 30 bp steps that were actually observed
[5].
In 2013, analysis of the DNA sequences associated with the MNase-protected ladder showed that the archaeal “nucleosome” is not restricted to a tetramer but is a flexible structure composed of histone dimer units that can pile up to form larger complexes
[7]. In these structures, the histone dimer is a unit that binds ~30 bp of DNA. It was proposed that archaeal histone dimers can stack on one another via their flexible 4HB domains, resulting in an archaea-specific nucleosome with a variable number of histone dimer units
[7].
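The difference between the two models can be made concrete with a little arithmetic. The following minimal sketch compares the MNase-protected fragment sizes expected from a tetramer-only model with those expected from a dimer-stacking model. It assumes an idealized footprint of exactly 30 bp per histone dimer and a minimal particle of two dimers; the function names and the 180 bp cutoff are illustrative choices, not values taken from the cited studies.

```python
# Idealized footprint per histone dimer (~30 bp in the text; exact value assumed here).
DIMER_FOOTPRINT_BP = 30


def tetramer_only_ladder(max_bp):
    """Protected sizes if chromatin contained only tetramers (two dimers each)."""
    step = 2 * DIMER_FOOTPRINT_BP
    return list(range(step, max_bp + 1, step))


def dimer_stacking_ladder(max_bp, min_dimers=2):
    """Protected sizes if dimers stack one at a time onto a minimal particle."""
    start = min_dimers * DIMER_FOOTPRINT_BP
    return list(range(start, max_bp + 1, DIMER_FOOTPRINT_BP))


tetramer_sizes = tetramer_only_ladder(180)
stacking_sizes = dimer_stacking_ladder(180)

# Sizes that only the dimer-stacking model can explain.
only_in_stacking = sorted(set(stacking_sizes) - set(tetramer_sizes))

print("tetramer-only:", tetramer_sizes)
print("dimer stacking:", stacking_sizes)
print("explained only by stacking:", only_in_stacking)
```

Under these assumptions, the intermediate fragments (those offset by 30 bp from a multiple of 60 bp) appear only in the dimer-stacking ladder, which is the feature of the observed MNase pattern that the tetramer-only model cannot account for.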
Since the first proposal of an alternative chromatin structure model specific to archaea
[7], a number of experimental studies have supported it. For example, crystallographic analysis of archaeal nucleosomes revealed that certain amino acids within the 4HB domains are responsible for the multiplex stacking of dimers; mutations in these amino acids resulted in the loss of the multiplex ladder pattern in the MNase assay
[8]. These mutations also changed the expression of several genes
[8], but the biological significance of this structure is not yet fully understood. The flexible multimeric structures formed by archaeal histones have been given various names, including “hypernucleosome”
[2], “archaeasome”
[9], and “archaeal histone-based chromatin polymers (AHCP)”
[10]. In this review, the term “hypernucleosome” is adopted.
The structure of the hypernucleosome may dynamically change depending on the state of the cell and the environment, thereby regulating gene expression in response to changes in the environment
[8][10]. An important open question is the extent to which hypernucleosomes are present on chromosomes in living cells. Furthermore, since hypernucleosomes are not found in all archaeal species
[11], it is possible that they play a role in the evolution of certain lineages of archaea.
Interestingly, although a majority of eukaryotic nucleosomes contain regular histone octamers, several types of non-canonical nucleosomes exist. For example, histone H3 is replaced with the centromere-specific histone CENP-A in eukaryotic centromeres
[12]. Moreover, other structural variants of the nucleosome have been found, such as overlapping dinucleosomes, which are formed by nucleosome collision during chromatin remodeling, and hexasomes, which are formed during transcription by the removal of one H2A/H2B dimer from a nucleosome
[12]. The discovery of a flexible hypernucleosome structure in archaea raises the possibility that the archaeal ancestor of eukaryotes already had the ability to regulate chromatin structure through modulation of histone-based chromatin structure
[3].
1.2. Alba
Alba is a 10-kDa DNA/RNA-binding protein found widely in archaea including Euryarchaeota, Crenarchaeota, and newly proposed phyla such as Nanoarchaeota, Korarchaeota, Thaumarchaeota, and Lokiarchaeota
[6]. Alba undergoes post-translational modifications, including acetylation and methylation
[13], and has several different modes of interaction with DNA such as DNA stiffening or bridging
[14]. AFM revealed that Alba forms a ~10 nm fiber without wrapping DNA
[5][15].
A comparison of chromosome structures of different archaeal lineages using a combination of in-vitro reconstitution and AFM revealed that Alba-mediated chromosome structures may differ depending on the type of other chromosomal proteins expressed in the cell
[15]. For example, Alba and histone are co-expressed in several archaeal lineages, including some euryarchaea
[13]. In-vitro reconstitution and competition experiments combined with AFM showed that increasing the amount of Alba decreases the extent of DNA wrapping by histones
[15]. This suggests that Alba plays a role in regulating transcription and chromosome structure by modulating the extent of DNA wrapping, and hence the DNA topology, imposed by histone binding.
Alba superfamily proteins are also found in eukaryotes and have been suggested to be involved in RNA metabolism
[13]. It remains to be seen whether eukaryotic Alba plays a role in genome folding.
1.3. HU
HU is a major bacterial NAP that bends DNA. AFM analysis of lysed
E. coli cells has shown that the nucleoid is composed of 30–80 nm structures and that RNase A treatment reduced these structures to 10 nm fibers, indicating that RNA is involved in the formation of the 30–80 nm higher-order structure, but not in that of the 10 nm fiber
[16]. AFM analysis of nucleoid fibers derived from
Escherichia coli strains that lack a single NAP gene indicates that there is no single NAP that is responsible for the formation of the 10 nm fiber. However, it is clear that proteins (NAPs or topoisomerase) are involved in the formation of the 10 nm fiber, because naked DNA appears only after proteinase K treatment of the nucleoid
[17].
HU protein is associated with the survival of some bacterial species and regulates a variety of cellular processes, such as growth and SOS response
[18]. Single-molecule tracking has shown that HU has several different modes of DNA binding and indicated that HU plays a dual role: it compacts the nucleoid through specific DNA structure binding and decondenses the nucleoid through nonspecific, weak interactions with the genomic DNA
[19].
In the case of archaea, species in Thermoplasmatales are unique in that, although they belong to Euryarchaeota, they lack histones. Instead, they encode HTa, a protein homologous to the bacterial HU
[20]. Early studies have shown that HTa is associated with
Thermoplasma genomic DNA and protects about 40 bp of DNA at a minimum
[21]. Although phylogenetic analysis suggests that HTa was horizontally transferred from bacteria to archaea at some point
[20], it remains unclear whether HTa plays the same role as bacterial HU (i.e., to bend DNA). Several recent studies on HTa have shown that, unlike bacterial HU, HTa in
Thermoplasma wraps DNA, forming particles of approximately 6 nm
[15][20]. Therefore, DNA wrapping seems to be a common requirement for DNA folding in Euryarchaeota, regardless of the protein used (histone or HTa).
We propose that, at some point after the horizontal transfer of the bacterial HU to the ancestor of Thermoplasma, HTa acquired the ability to wrap DNA; when and how this shift occurred should be a topic for future studies. After HTa adapted to the archaeal environment, including acquiring the ability to wrap DNA, the histone gene was lost. For the chromosomes of Euryarchaeota, DNA wrappers may be indispensable for maintaining or relaxing the topological state of DNA, or for other unknown reasons. If so, Euryarchaea would need to encode a protein that fulfills this role, but that protein does not necessarily have to be a histone.
1.4. Suppression of Horizontally Transferred Genes by Global Regulatory Proteins
Horizontal gene transfer (HGT) is fundamental to archaeal and bacterial evolution
[22][23]. Various mechanisms of gene flow in archaea have been revealed, for example, transformation in Euryarchaeota, vesicle transport in Thermococcales, transduction (via virus), conjugation in
Sulfolobus, cell fusion in Haloarchaea, and chromosomal DNA exchange in Crenarchaea
[22][23].
H-NS is an NAP that functions as a global repressor of transcription in bacteria. It preferentially binds to AT-rich sequences and can bridge two DNA strands together
[24]. H-NS can suppress the expression of downstream genes by binding to the promoter region and forming a filamentous structure or forming loops
[25]. This characteristic of H-NS is involved in virulence of some bacteria, as well as in the suppression of horizontally transferred genes, a function called “xenogeneic silencing”
[26]. When a foreign DNA element is incorporated into the genome, H-NS covers the region, thereby suppressing the expression of potentially harmful genes. H-NS detects foreign DNA by differences in GC content compared to the host genomes
[26]. This is important in driving prokaryotic evolution by promoting the horizontal transfer of genes while suppressing their toxic effects
[27]. Although HGT frequently occurs between archaea and bacteria, such xenogeneic silencing had not been reported in archaea.
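As an illustration of how a difference in GC content can mark a foreign segment, the following minimal sketch scans a sequence with a sliding window and flags windows whose GC fraction falls well below the genome-wide average. It is a computational analogy only, not a model of H-NS binding; the window size, step, margin, and the toy sequences are all assumptions chosen for readability.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def at_rich_windows(genome, window=20, step=10, margin=0.3):
    """Return (start, gc) for windows whose GC fraction is at least
    `margin` below the genome-wide average (all thresholds are assumed)."""
    baseline = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if baseline - gc >= margin:
            flagged.append((start, gc))
    return flagged


# Toy sequences: a GC-rich "host" flanking an AT-rich "foreign" island.
host = "GCGGCCGCATGCGGCCGCAT" * 3    # 60 bp, 80% GC
island = "ATTATAAATTTATAATATTA" * 2  # 40 bp, 0% GC
genome = host + island + host

flagged = at_rich_windows(genome)
print([start for start, _ in flagged])
```

In this toy genome, only the windows lying entirely within the AT-rich island (positions 60 to 100) are flagged, whereas windows that straddle the island boundary fall short of the chosen margin.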
Recently, a model was proposed that the archaeal chromosomal protein TrmBL2 plays a role similar to that of H-NS in suppressing horizontally transferred genes
[28]. TrmBL2 is a transcription factor-like protein with a helix-turn-helix DNA-binding motif, which was initially identified as an abundant chromosomal protein that forms a thick (~14 nm) nucleoprotein filament on DNA without shortening (wrapping) DNA. Interestingly, TrmBL2 binds to both coding and non-coding regions and suppresses gene expression when bound to the promoter region
[5]. In contrast to TrmBL2, which can bind to genomic DNA without sequence specificity, the (archaeal) histone binding site on the genomic DNA is more strictly defined by the histone-positioning signal
[7][29]. Single-molecule analysis using magnetic tweezers revealed that the TrmBL2 protein of
T. kodakarensis competes with archaeal histones for DNA binding
[30]. Therefore, TrmBL2 may be located where there is no histone positioning signal on the DNA, and when the binding happens to be in the promoter region, the gene expression is suppressed. Indeed, histone frequency is lower in the promoter region of
T. kodakarensis [7][29]. We propose that genes on DNA segments that are horizontally transferred from bacteria or non-histone-coding archaea can be suppressed by TrmBL2 in this manner (
Figure 1). These observations support the idea that TrmBL2 is the missing xenogeneic silencer in archaea.
Figure 1. A model of how the transcription factor-like chromosomal protein TrmBL2 represses the expression of horizontally transferred genes in archaea. The chromosome of T. kodakarensis is shown as an example. (a) The genomic sequence and the state of chromosomal protein binding. Histone localization is determined by the histone-binding signals on genomic DNA, and TrmBL2 binds to genomic regions with low histone localization because of its low sequence specificity. (b) When foreign DNA is obtained from bacteria, TrmBL2 essentially covers the entire region, because bacterial DNA does not have a histone-binding signal. In this way, the expression of potentially harmful genes is repressed in the early stages of horizontal gene transfer.
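The logic of this model can be written down as a deliberately simplified toy calculation. The sketch below assumes a binary rule in which histones occupy every position carrying a positioning signal and TrmBL2 occupies every position without one; real binding is competitive and quantitative, and the signal pattern, segment length, and function names are hypothetical.

```python
def trmbl2_occupancy(histone_signal):
    """Toy rule: TrmBL2 occupies every position lacking a histone-positioning signal."""
    return [not has_signal for has_signal in histone_signal]


def insert_foreign(histone_signal, position, length):
    """Insert a foreign segment assumed to carry no histone-positioning signal."""
    return histone_signal[:position] + [False] * length + histone_signal[position:]


def covered_fraction(occupancy, start, end):
    """Fraction of positions in [start, end) occupied by TrmBL2."""
    segment = occupancy[start:end]
    return sum(segment) / len(segment)


# Hypothetical host region: True marks a histone-positioning signal.
host_signal = [True, True, True, False, True, True, True, True, False, True]

# Acquire a 4-position foreign segment at position 5.
with_island = insert_foreign(host_signal, position=5, length=4)

host_cov = covered_fraction(trmbl2_occupancy(host_signal), 0, len(host_signal))
island_cov = covered_fraction(trmbl2_occupancy(with_island), 5, 9)

print("TrmBL2 coverage of host region:", host_cov)
print("TrmBL2 coverage of foreign segment:", island_cov)
```

Under this rule, TrmBL2 covers only the signal-free gaps of the host region but the whole of the foreign segment, which corresponds to panels (a) and (b) of Figure 1.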
As discussed above, proteins that serve as transcriptional repressors and genome folding factors also play an important role in suppressing horizontally transferred genes. Are there any such silencer proteins in eukaryotes? The horizontal transfer of genes from bacteria to eukaryotes is known, and several mechanisms have been proposed for how the expression of these genes is initially regulated, but it is not assumed that transcriptional repressors are involved
[31]. In fact, to the best of our knowledge, general xenogeneic silencers such as H-NS in bacteria and TrmBL2 in archaea are absent in eukaryotes. Since eukaryotes have a different gene structure from archaea and bacteria (e.g., eukaryotes have introns), they may handle this problem in a way that is different from that of prokaryotes
[31]. In summary, the coupling of chromatin structural proteins and the silencing of horizontally transferred genes may be unique to archaea and bacteria.
1.5. SMC Proteins Are Involved in 3D Structure Formation in the Three Domains of Life
SMC family proteins are essential for higher-order chromosome folding in the three domains of life. In general, SMC proteins are composed of ATPase domains, coiled-coil regions, and a hinge domain (
Figure 2a). SMC proteins usually form larger functional complexes with accessory proteins
[32]. In eukaryotes, condensin and cohesin complexes play roles in mitotic chromosome condensation and sister chromatid cohesion, respectively. The Smc5/6 complex plays a role in the cellular response to DNA damage
[33]. Single-molecule analysis using AFM revealed that condensin heterodimer forms a head–tail structure and that the ATPase activity of condensin is regulated by the binding of non-SMC trimer to the head of the SMC heterodimer
[34](
Figure 2b). AFM also showed that condensin but not cohesin induces DNA reannealing through protein–protein assembly
[35]. Loop extrusion by condensin and cohesin has been proposed as a mechanism underlying their function in genome organization
[36]. Detailed analysis using protein engineering and mutagenesis in conjunction with single-molecule experiments revealed that loop extrusion depends on five DNA binding sites in the case of cohesin
[37]. Single-particle tracking using photoactivated localization microscopy in live fission yeast revealed that Smc5/6 localizes in the nucleus throughout the cell cycle but exhibits a more dynamic association with chromatin than cohesin
[33].
Figure 2. SMC proteins are commonly involved in 3D structure formation in the three domains of life. (
a) Comparison of SMC proteins in the three domains. Their domain structures and molecular weight are shown. (
b) AFM images of fission yeast condensin heterodimers with a head–tail structure (left). The head (filled triangle) consists of four globular ATPase domains, and the tip of the tail (open triangle) represents the hinge regions. Scale bar, 100 nm. As shown in the hypothetical model (right), the coiled-coil region (green) is folded back at the hinge (purple), and four globular domains (blue) are assembled. The AFM images were taken from Yoshimura et al.
[34]. (
c) Although various proteins are involved in genome folding at the basic level, SMC proteins (marked in light blue) are commonly involved in the highest level of folding. In eukaryotes, the beads-on-a-string structure folds into 30 nm fibers with the help of linker histone H1. Condensin contributes to the formation of mitotic chromosomes. In bacteria, conserved HU forms 10 nm fibers, and then other NAPs and various types of RNAs contribute to the formation of 30 and 80 nm fibers; MukBEF (SMC complex in
E. coli) is involved in higher-order structuring of the nucleoid; Dps is responsible for stress-induced compaction. In archaea, regardless of the mode of basic folding of the genome (either DNA wrapping in Euryarchaeota or non-DNA wrapping in Crenarchaeota), the genomes are commonly folded into 30–40 nm globular structures
[15]. In Crenarchaeota, the SMC protein ClsN is implicated in chromosome compartmentalization. It remains to be elucidated whether SMC proteins play a similar role in Euryarchaeota.
Bacterial species encode proteins homologous to eukaryotic SMCs. Bacterial SMCs have a domain structure similar to that of eukaryotic condensin and cohesin (
Figure 2a), and they play a role in bacterial nucleoid organization. For example, MukBEF in
E. coli and Smc-ScpAB in
Bacillus subtilis are involved in the formation of higher-order nucleoid structures
[38] (
Figure 2c). Single-molecule tracking of
B. subtilis nucleoids revealed that SMC operates by different patterns of motion compared to other DNA-condensing NAPs such as gyrase and HBsu (an HU family protein)
[39][40]. Electron cryomicroscopy (cryo-EM) single-particle analysis revealed the detailed mechanism of how
E. coli MukBEF entraps two distinct DNA strands when bound to the unloader MatP
[41].
The condensin complex is highly conserved across all three domains of life and plays an important role in higher-order chromosome folding
[42] (
Figure 2c). Although proteins homologous to condensin are found in archaea,
Sulfolobus species lack condensin. Recently, coalescin (ClsN), a novel
Sulfolobus-encoded SMC protein, was shown to be involved in the organization of chromosomes into a two-domain compartment structure resembling the eukaryotic A/B compartments
[43]. ClsN is shorter than condensins and has a Zn hook in the middle, as in the case of eukaryotic Rad50 (
Figure 2a). Comparison of ClsN binding (ChIP-seq) and transcriptional activity (RNA-seq and RNA polymerase localization) indicated that ClsN is associated with the transcriptionally inactive B-compartment
[43].
Hi-C analysis of euryarchaeal chromosomes showed slightly different structures. The chromosomes of
Haloferax volcanii, an extremely halophilic euryarchaeon, and
T. kodakarensis contain multiple large chromatin loops and self-interacting domains similar to those of bacteria and eukaryotes. Unlike
Sulfolobus species, these euryarchaeal chromosomes are not organized in a way that separates transcriptionally active and inactive compartments
[44]. It remains to be elucidated whether there is a general rule for chromosome compartmentalization in archaea, which proteins are responsible for higher-order genome folding in each archaeal lineage, and what physiological roles these higher-order structures play.
Structurally diverse SMC proteins were found to play similar roles in higher-order genome folding in the three domains. Interestingly, the absence of a eukaryotic condensin-like protein and the use of a lineage-specific SMC (i.e., ClsN) coincide with the absence of DNA wrapping at the fundamental level of genome folding in Sulfolobus. Investigating this association between fundamental and higher-order chromosome structures in Sulfolobus and comparing it to other lineages of life may provide clues as to how different genome folding mechanisms have evolved.