Protein molecules can be further modified after translation. Post-translational modifications (PTMs) are responsible for most of the proteome diversity and often modulate critical protein functions in health and disease. Histone H1 is a chromatin structural protein that carries many PTMs, which may alter chromatin structure and function. In this entry, we present a summary of the PTMs identified to date in histone H1, from lower eukaryotes to humans. We also highlight the use of different proteomic strategies, as well as the technical challenges involved in mapping H1 PTMs.
Histone H1 is a key chromatin structural protein, which mediates higher-order chromatin folding. H1 is also emerging as an important epigenetic mark and regulator of gene expression and cellular differentiation. Metazoan H1 has three structural domains: a short N-terminal domain (NTD), a central globular domain (GD), and a long C-terminal domain (CTD). The globular domain contains a winged-helix motif and is responsible for the binding of H1 to nucleosomal DNA. Both terminal domains are intrinsically disordered and have a positive net charge in physiological conditions due to the abundance of lysine residues [1,2]. The CTD is considered the main determinant of H1-driven chromatin compaction [3]. This domain is also responsible for the preference for scaffold-associated regions (SAR-DNA) [4], for the interaction with the apoptotic nuclease DFF40 [5] and with the β-amyloid peptide, and for the formation of amyloid-like fibers [6].
Histone H1, also known as linker histone, is the most divergent and heterogeneous group of histones. It has been suggested that the first H1-like proteins may have appeared early in evolution in eubacteria, while the winged-helix motif present in the GD appeared much later, in protists [7]. Detection of putative recombination points suggests that recombination may have been involved in the acquisition of the H1 tripartite structure [8]. In lower eukaryotes, histone H1 is very heterogeneous. In some protists, like Tetrahymena thermophila, H1 is only lysine-rich, with a sequence composition similar to bacterial H1-like proteins and the CTD of metazoans, while the Saccharomyces cerevisiae version of H1, HhoI, contains a second winged-helix motif at its C-terminal end [7,9]. However, it has been reported more recently that the high mobility group protein HMO1 functions as a linker histone in Saccharomyces cerevisiae [10].
In mammals, 11 H1 subtypes have been identified [11]. Subtypes H1.0–H1.5 and H1x are differentially expressed in somatic cells, while subtypes H1t, HILS1, and H1T2 are expressed in male germinal cells, and H1oo is expressed in oocytes [12]. Orthologous genes are readily identified across mammalian species, but they are rarely detected in moderately distant phyla. Mammalian H1 subtypes differ in their evolutionary rates [7,13], expression patterns [12], chromatin binding affinity [14–16], and genomic distribution [17–20], among other features. The presence of post-translational modifications (PTMs) adds a new level of complexity to H1 diversity, as individual PTMs or their interplay can modulate protein structure and function. In this entry, we present a comprehensive summary of the PTMs identified in H1, from lower eukaryotes to humans.
The first post-translational modification in H1 was described in the 1970s, when phosphorylated H1 was identified in different species throughout the tree of life. In a relatively short period, H1 phosphorylation was reported in protists such as Physarum polycephalum [21] and Tetrahymena sp. [22,23], and also in animals, including Drosophila melanogaster [24], chicken erythrocytes [25], and mammalian cell lines (rat hepatoma cells (HTC) and Chinese hamster ovary cells (CHO)) [26,27]. Phosphorylation was identified based on changes in H1 electrophoretic mobility and on the incorporation of 32P and/or phosphatase treatment. At the time, mapping the phosphate groups to individual residues was still elusive, and significant amounts of the modified protein were necessary for its detection.
The development of mass spectrometry-based proteomics has represented a significant breakthrough in the identification of post-translational modifications, allowing the mapping of PTMs to specific residues as well as the detection of low-abundance modified species. There are three main types of proteomic strategies, depending on the molecule analyzed by mass spectrometry: top-down, middle-down, and bottom-up. Top-down proteomics analyzes intact proteins, while middle-down and bottom-up proteomics analyze large protein fragments/domains and small peptides, respectively [28]. PTM mapping and identification are mostly carried out by bottom-up proteomics. In this strategy, intact proteins are digested into small peptides, which are analyzed by tandem mass spectrometry (MS/MS). During MS/MS, ionized peptides are selected in the first mass analyzer and fragmented, and the m/z of the product ions is determined in the second mass analyzer. Mass differences between product ions allow for peptide sequencing and PTM assignment. The availability of mass analyzers of high mass accuracy has allowed for the distinction between the quasi-isobaric PTM pairs trimethylation/acetylation and dimethylation/formylation [29,30].
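The reason high mass accuracy is needed for quasi-isobaric PTMs can be illustrated with a short calculation. The sketch below computes the monoisotopic mass added by each modification from its net elemental composition and estimates the accuracy, in ppm, at which the two members of a pair differ. The function names and the 1500 Da peptide mass are illustrative assumptions, not values taken from the cited studies.

```python
# Monoisotopic masses of the elements involved (Da).
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

# Net elemental composition added to an amino group by each modification.
DELTAS = {
    "acetylation": {"C": 2, "H": 2, "O": 1},
    "trimethylation": {"C": 3, "H": 6},
    "formylation": {"C": 1, "O": 1},
    "dimethylation": {"C": 2, "H": 4},
}


def delta_mass(name):
    """Monoisotopic mass shift (Da) caused by the named modification."""
    return sum(MONO[element] * count for element, count in DELTAS[name].items())


def required_ppm(mod_a, mod_b, peptide_mass):
    """Relative mass difference (ppm) between two modified forms of one peptide."""
    gap = abs(delta_mass(mod_a) - delta_mass(mod_b))
    return gap / peptide_mass * 1e6


if __name__ == "__main__":
    for pair in (("trimethylation", "acetylation"), ("dimethylation", "formylation")):
        gap = abs(delta_mass(pair[0]) - delta_mass(pair[1]))
        # 1500 Da is a hypothetical peptide mass chosen only for illustration.
        print(pair, round(gap, 4), "Da;", round(required_ppm(*pair, 1500.0), 1), "ppm")
```

Both pairs differ by about 0.036 Da, so the larger the peptide, the tighter the mass accuracy required to tell the two modifications apart.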
Analysis of histone H1 PTMs by bottom-up proteomics has represented a technical challenge for two main reasons. First, H1 amino acid composition is characterized by a high content of lysine residues. Therefore, digestion with trypsin yields peptides that are relatively small and hydrophilic, which are difficult to detect by MS due to poor retention in the C18 Reverse-Phase Ion-Pairing High Performance Liquid Chromatography (RP-HPLC) column. As a result, regions with the highest density of lysine residues, such as the CTD, tend to have low coverage. This problem has been addressed by propionylation of amine groups in the protein (N-terminal amines, and free and monomethylated lysine ε-amino groups) before or after tryptic digestion [31,32]. This modification has significantly improved the coverage of H1 by MS.
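The effect of lysine density on peptide length can be sketched with an in silico digestion. The example below assumes the commonly used rule that trypsin cleaves after K or R unless the next residue is P. The sequence is a hypothetical lysine-rich stretch written for illustration, not a real H1 sequence, and the function names are our own.

```python
import re


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    # The zero-width pattern marks each cleavage position; empty pieces are dropped.
    return [piece for piece in re.split(r"(?<=[KR])(?!P)", sequence) if piece]


def short_peptides(sequence, max_length=5):
    """Peptides of at most max_length residues, which tend to be poorly retained."""
    return [p for p in tryptic_digest(sequence) if len(p) <= max_length]


if __name__ == "__main__":
    # Hypothetical lysine-rich sequence, for illustration only.
    toy = "SETAPAKKAAKKPGAGAAKRKATGPPVSELITK"
    print(tryptic_digest(toy))
    print(short_peptides(toy))
```

Blocking lysine ε-amino groups, as propionylation does, restricts tryptic cleavage at those residues and therefore yields longer peptides. This sketch does not model that step.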
The second challenging aspect is the presence, in variable quantities, of multiple subtypes or variants in higher eukaryotes. In particular, mammalian subtypes H1.1–H1.5 have more than 60% sequence identity, and the sequence of the GD is almost identical [33]. Thus, many promiscuous peptides, matching the sequence of several subtypes, are often found in bottom-up studies. Several strategies have been used for the assignment of subtype-specific PTMs. Separation of individual subtypes before proteolytic digestion has been performed by capillary electrophoresis [34] and by 2D-electrophoresis. The latter used acetic-urea (AU) or triton-acetic-urea (TAU) electrophoresis in the first dimension, and Sodium Dodecyl Sulfate PolyAcrylamide Gel Electrophoresis (SDS-PAGE) in the second dimension [35–37]. Combining bottom-up and top-down proteomics has also been used to assign PTMs to individual subtypes [31,38–42].
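The problem of promiscuous peptides can be made concrete with a minimal sketch that checks which subtypes contain a given peptide. The two sequences below are hypothetical toy strings that share a common region. They are not real H1 sequences, and the function names are assumptions made for this example.

```python
def subtypes_matching(peptide, subtype_sequences):
    """Names of all subtypes whose sequence contains the peptide."""
    return sorted(name for name, seq in subtype_sequences.items() if peptide in seq)


def is_subtype_specific(peptide, subtype_sequences):
    """True when the peptide maps to exactly one subtype."""
    return len(subtypes_matching(peptide, subtype_sequences)) == 1


# Hypothetical toy sequences: distinct N-terminal ends, shared C-terminal stretch.
TOY_SUBTYPES = {
    "subtype_A": "SETAPAAKGTLVQTK",
    "subtype_B": "SEAPVEKGTLVQTK",
}

if __name__ == "__main__":
    for peptide in ("SETAPAAK", "GTLVQTK"):
        print(peptide, subtypes_matching(peptide, TOY_SUBTYPES))
```

A PTM found on a peptide that maps to several subtypes cannot be assigned to one of them from bottom-up data alone, which is why subtypes are separated beforehand or the data are combined with top-down analysis.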
Bottom-up proteomics has identified and mapped a wide variety of PTMs in several eukaryotes (summarized in Tables S1–S3). To our knowledge, at least 13 PTM types have been identified in H1, including phosphorylation, methylation, acetylation, citrullination, crotonylation, ubiquitylation, formylation, 2-hydroxyisobutyrylation, and ADP-ribosylation (parylation) [29,30,32,42–45]. This section gives a brief overview of the PTMs identified in several model organisms. For consistency, in all cases, the modified residue number corresponds to the mature protein, which lacks the initial methionine. Therefore, for some modifications, the position number is not the same as the number in the original reference.
Phosphorylation of histone H1 in Tetrahymena thermophila was detected during the 1970s. However, the complete characterization of phosphorylation of macronuclear H1 by mass spectrometry, in vegetatively growing cells and starved cells, was not performed until 2006 [38] (Figure 1, Table S1). This study confirmed the five phosphosites previously identified by Edman sequencing and peptide microsequencing [46,47], which included three consensus cyclin-dependent kinase (CDK) sites with the sequence S/T-P-X-K/R (T34, T46, and T53) and two non-canonical CDK sites (S4 and S5). Additionally, the analysis identified two novel phosphorylation sites, S42 and S44, and two novel acetylation sites, K77/78 and K154 (Figure 1, Table S1).
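Consensus CDK sites of the form S/T-P-X-K/R can be located in a sequence with a simple pattern search. The sketch below reports positions in mature-protein numbering, as used throughout this entry, by removing an initial methionine when one is present. The test sequence is a hypothetical string written for illustration, and the function names are our own.

```python
import re

# S or T, then P, then any residue, then K or R. The lookahead lets
# overlapping motifs be reported as separate sites.
CDK_MOTIF = re.compile(r"(?=([ST])P.[KR])")


def mature_sequence(precursor):
    """Drop the initial methionine, if present, to obtain the mature sequence."""
    return precursor[1:] if precursor.startswith("M") else precursor


def cdk_sites(precursor):
    """Consensus CDK sites as residue plus 1-based position in the mature protein."""
    mature = mature_sequence(precursor)
    return [f"{m.group(1)}{m.start() + 1}" for m in CDK_MOTIF.finditer(mature)]


if __name__ == "__main__":
    # Hypothetical precursor sequence, for illustration only.
    print(cdk_sites("MAASPKKTPAKAA"))
```

A match only indicates that a residue lies in a consensus motif. Whether it is actually phosphorylated has to be established experimentally, and non-canonical sites such as S4 and S5 are not found by this pattern.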
Separation of the phosphorylated species, from unphosphorylated up to heptaphosphorylated H1, by cation-exchange chromatography, combined with bottom-up and top-down proteomics, revealed the precise hierarchy of phosphorylation in this organism, where the first phosphorylated residues corresponded to the CDK consensus motifs. Finally, relative quantification of the phosphorylated species by stable-isotope labeling, Immobilized Metal Affinity Chromatography (IMAC), and mass spectrometry showed that phosphorylated H1 was more abundant in growing Tetrahymena cells than in starved cells, and also confirmed the hierarchy of phosphorylation. In yeast, another unicellular eukaryote, the protein HhoI is generally considered the equivalent of metazoan histone H1. In HhoI, only three phosphorylation sites have been identified by mass spectrometry, at S141, S172, and S173 (Figure 1, Table S1). The first residue, S141, is located between the two globular domains, while S172 and S173 are located in the second globular domain, WHD2 [48].
Drosophila melanogaster is a model organism widely used to study differentiation, control of gene expression, and several diseases [49]. In Drosophila, there is only one somatic linker histone (dH1), and one germ-line specific H1 (dBigH1), which is present during early embryogenesis until the zygote genome is activated [49]. The first proteomic analysis of PTMs in dH1 found modifications in the first ten amino acids during embryonic development, consisting of N-terminal acetylation, mono- and diphosphorylation (Figure 1, Table S1) [50]. Analysis of the phosphorylated positions showed that the main phosphorylation site was S10 and that the amount of this modification decreased as the embryos matured. Another four residues, S1, S3, T7, and S8 were also found phosphorylated, albeit in lower proportions than S10.
Analysis of PTMs from cultured Drosophila S2 cells showed the presence of additional modifications, including two new phosphorylated positions, T19 and S67, as well as eight methylation sites, three acetylation sites, and four ubiquitylation sites (Table S1) [39]. In some cases, more than one PTM was mapped to the same lysine residue. In this study, bottom-up proteomics was complemented with the analysis of the N-terminal domain and of the intact protein in order to determine which PTMs coexisted in the same dH1 molecule. Top-down experiments showed the existence of multiple proteoforms for dH1, containing different arrays of the PTMs identified in the bottom-up approach. Middle-down analysis of the N-terminal domain identified S8 and S10 as the positions modified in the diphosphorylated species, in agreement with the previous study [50]. Other species were detected with combinations of Nα-terminal acetylation, mono- and diphosphorylation, and dimethylation. Tri-, tetra-, and pentaphosphorylated species were also detected in low proportions.
Figure 1. Modified positions in histone H1 of Tetrahymena thermophila, Saccharomyces cerevisiae, and Drosophila melanogaster. The positions refer to the mature protein, which lacks the initial methionine. Highlighted in yellow, phosphorylation in cyclin-dependent-kinase (CDK) consensus motifs. Question marks are included in post-translational modifications (PTMs) of ambiguous assignment.
Chicken erythrocytes are a model system to study chromatin structure [51–53]. In chicken erythrocytes, there are six H1 subtypes, H1.01, H1.02, H1.03, H1.10, H1.1L, and H1.1R, which amount to 40% of the linker histones. H1 subtypes have more than 85% sequence identity and lack clear orthologs among mammalian subtypes [33,54]. The high sequence identity can impair the assignment of PTMs in chicken H1 subtypes because the modified peptides are sometimes shared between several subtypes. Histone H5, which is considered orthologous to mammalian H1.0, represents the remaining 60% of the linker histones [33].
Analysis of PTMs by bottom-up proteomics identified five modification types in chicken erythrocyte linker histones: acetylation, phosphorylation, methylation, formylation, and deamidation (Figure 2, Figure S1, and Table S2) [55,56]. All H1 subtypes were N-terminally acetylated. One additional acetylated residue, K17, was found in the NTD of both H1.02 and H1.1R. The NTD was also modified by phosphorylation in the first and/or the third residue, depending on the subtype. Two acetylations and one phosphorylation mapped to the GD, K34ac, K90ac, and S39p (referred to the H1.01 sequence), were in peptides common to all H1 subtypes. The two acetylated sites of the GD were also found formylated. This modification was also detected in additional sites of the GD. The mass shift caused by formylation is quasi-isobaric with that of dimethylation; thus, in some cases, the type of modification was not determined [56]. Additionally, one of the asparagine residues of the GD was found deamidated. Up to four acetylated sites were found in the CTD, depending on the subtype. As in the GD, all the acetylated sites in this domain were found in peptides common to more than one subtype. Two monomethylated peptides belonging to the CTD were detected. One of these peptides is shared by two subtypes and was also acetylated. Finally, only one of the three CDK-consensus motifs in the CTD of H1 subtypes (S155, referred to H1.01) or in the CTD of H5 (S129) was phosphorylated. This result was expected, as phosphorylation in H1 decreases during erythrocyte terminal differentiation [56]. In H5, most of the PTMs found were in the NTD. They consisted of five phosphorylations (T1, S3, S7, S22, and S24), three acetylations (T1, K12, and K14), and one monomethylation (K12). Furthermore, the CTD of H5 was phosphorylated in S129 and acetylated in K150. PTMs were differentially distributed among soluble and insoluble chromatin fractions, as shown by relative quantification [55].
Figure 2. Modified positions in linker histones of chicken erythrocytes. (A) PTMs identified in H1 subtypes are shown in the sequence alignment. (B) PTMs identified in H5. The residues of the globular domain are shown inside the box. In yellow, phosphorylated residues located at CDK-consensus motifs. The positions refer to the mature protein, which lacks the initial methionine. The complete sequences and the original sequence alignment are shown in Figure S1.
Identification of PTMs in histone H1 in mammals has been carried out mostly in humans and mice, although some PTMs were characterized in rat testis (Figure 3, Figure S2, Table S3). The most extensive bottom-up study identified multiple PTM types in human cell lines and several mouse tissues [29]. Other studies have targeted specific PTMs, including phosphorylation [42], methylation [43], formylation [30], 2-hydroxyisobutyrylation [44], and crotonylation [32], or specific cell lines [35,36], tissues [34], processes [57], and subtypes [58].
Figure 3. Modified positions in mammalian H1 subtypes. The PTMs are shown based on the sequence alignment of the human, mouse, and rat sequence for each subtype. The residues of the globular domain are shown in the box. In yellow, phosphorylated residues located at CDK-consensus motifs. In blue, PTM-hotspots in the globular domain. The positions refer to the mature protein, which lacks the initial methionine. The complete sequences and the original alignments are shown in Figure S2.
From the accumulated data, several conclusions about the abundance and distribution of PTM types can be drawn. All somatic subtypes are post-translationally modified. However, the number of modified positions appears to be determined by the abundance of each subtype. Therefore, many PTMs have been identified in the most abundant subtypes, H1.2 and H1.4, while very little information is available for low-abundance subtypes such as H1.0 and H1x. H1.1 has very restricted expression, but PTM mapping has been performed in the testis and in mouse embryonic stem cells (mESCs), where this subtype is present in significant proportions [59]. In some cases, PTMs in subtypes with high sequence identity, especially in the GD, like H1.2–H1.5 and to a lesser extent H1.1, might be overestimated. However, modifications in the terminal domains are often subtype-specific, as the sequence identity between subtypes is much lower in the terminal domains [33].
Although new PTM types like formylation, crotonylation, or citrullination have been described, the most widespread modifications in histone H1 are phosphorylation, methylation, and acetylation. H1 subtypes contain between 27 and 44 residues that can be phosphorylated, of which up to 40% have been found to be phosphorylated (Table S4). Phosphorylated positions have been detected in all structural domains, but most of them are in the terminal domains. The NTD of subtypes H1.1–H1.5 contains two phosphorylation hotspots, the SET motif, and a CDK consensus motif or another residue phosphorylated during the cell cycle. The rest of the CDK consensus motifs (three or four, depending on the subtype) are in the CTD. Phosphorylation is highly abundant in H1, as up to 75% of the proteins are phosphorylated in mitosis [60].
All H1 subtypes are rich in basic amino acids, mostly lysine. Because this residue can acquire different PTMs, H1 subtypes are heavily modified. Acetylation and methylation sites are quite abundant in H1. In human cell lines, acetylation sites were more abundant than methylation sites, while the opposite was true for mouse and human tissues [29,43]. A different study also showed that the H1 acetylation level in mESCs was higher than that of differentiated cells [35]. There are also differences regarding the localization of methylation and acetylation. Lysine residues in the NTD are often methylated, whereas acetylation is predominant in the GD [29,35,43].
Distinct residues of mammalian H1 subtypes have been found modified by formylation, crotonylation, ubiquitylation, citrullination, 2-hydroxyisobutyrylation, and parylation (Table S3). It can be observed in Figure 3 that most of the formylated, crotonylated, and ubiquitylated sites are located in the GD in lysine residues where other PTMs have been mapped. Thus, most lysine residues of the GD including those directly involved in DNA binding can be considered PTM-hotspots as they appear to be targeted by many different modifications, which may modulate the binding properties of this domain in response to different situations and stimuli (Figure 3).
In mammals, there are four H1 germline-specific subtypes. Several studies have characterized the H1 complement in testis, thus allowing the exploration of H1 PTMs in the male germline-specific subtypes [34,57,58,61,62] (Figure 3, Table S3). Extensive characterization of H1 PTMs has been carried out in rat testis [34,58,61]. Perchloric acid extracts were analyzed by mass spectrometry using two separation methods, capillary electrophoresis (CESI-MS) and nano-HPLC (LC-ESI-MS/MS). This procedure allowed the identification of modifications in somatic H1 subtypes H1.0–H1.5 and one of the male germline-specific subtypes, H1t. Multiple PTMs were detected in subtypes H1.1, H1.3, H1.4, and H1t, while few modified sites were found in H1.0, H1.2, and H1.5 (Table S3). Most of the sites were identified by both CE and LC, but a few were detected by only one method [34]. The most abundant PTM was phosphorylation, but a few acetylated residues were also found. Acetylation was found at the N-terminus of all the detected subtypes, and also at other positions in subtypes H1.1, H1.3, and H1t. Multiple phosphorylated sites were detected in all subtypes, including most of the CDK motifs present in subtypes H1.1, H1.3, H1.4, and H1t. In agreement with mouse and human data, most of the H1t PTMs were located in the CTD, but no methylated residues were detected in this species. More PTMs were detected during mouse spermatogenesis than in mature human sperm [57] (Table S3). In H1t, as in somatic subtypes, the predominant modification types included phosphorylation, methylation, and acetylation. However, the PTMs of H1t were mainly located in the CTD, whereas in somatic subtypes PTMs have been mapped in the three structural domains of H1 (Figure 3).
Endogenous HILS1, another male germline-specific subtype, was separated from the rest of the H1 subtypes by reversed-phase high-performance liquid chromatography (RP-HPLC), allowing the identification of 15 PTMs including acetylation and phosphorylation. In particular, phosphorylation appeared to be abundant in this protein, as over 40% of the S/T/Y residues were found phosphorylated, with most of the phosphorylation sites located in the GD (Figure 3, Figure S2, Tables S3 and S4) [58]. Interestingly, tyrosine phosphorylation was detected for the first time in linker histones at Y78 of this subtype, a residue located in the globular domain. HILS1Y78p appears in early elongating spermatids, where it co-localized with Transition Protein 2 (TP2), and disappears from the head region of condensing spermatids, remaining only in the tail region [58]. While multiple PTMs were detected in human, mouse, and rat H1t as well as in rat HILS1, no post-translational modifications have been detected in the third male-germline specific subtype, H1T2.
PTMs have been detected in nine out of eleven mammalian H1 subtypes, depending on the species. At least 13 modification types have been described in H1, with phosphorylation, methylation, and acetylation the most abundant. Almost 400 positions, in the consensus sequence for each subtype, have been found modified. This number is a rough estimation of the extent of modification of H1 subtypes because some residues can have alternative PTMs, and also because promiscuous peptides are mapped to more than one subtype. However, the number of PTMs whose function has been described is quite small in comparison.
H1 PTMs have been characterized from lower eukaryotes to humans. The most abundant modification is phosphorylation, followed by methylation and acetylation. PTMs in H1 are variable, depending on the cell type and cell-cycle phase. Post-translational modifications in H1 may favor H1 binding to or dissociation from chromatin, thereby altering chromatin compaction, which is a key factor in most nuclear processes. In organisms with more than one H1 subtype, the presence of subtype-specific PTMs can be associated with subtype functional differentiation.