Mass Spectrometry and Chromatin Compaction

Subjects: Biology

Contributor: Simone Sidoli

Chromatin accessibility is a major regulator of gene expression. Histone writers/erasers have a critical role in chromatin compaction, as they “flag” chromatin regions by catalyzing/removing covalent post-translational modifications on histone proteins. Anomalous chromatin decondensation is a common phenomenon in cells experiencing aging and viral infection. Moreover, about 50% of cancers have mutations in enzymes regulating chromatin state. Numerous genomics methods have evolved to characterize chromatin state, but the analysis of (in)accessible chromatin from the protein perspective is not yet in the spotlight.

chromatin
DNA methylation
histone
mass spectrometry
post-translational modification
proteome

1. Introduction

Chromatin accessibility has a fundamental role in a wide range of biological processes, including gene regulation and DNA repair. While the dogma that open chromatin equals transcribed genes still stands, much remains unexplored about which molecular mechanisms regulate chromatin accessibility, which are inherited, and which fail in disease pathogenesis [1]. Numerous reviews discuss the genomics strategies and the know-how on chromatin accessibility in great detail, including an excellent recent review by Klemm et al. [2]. However, most of the available data come from DNA-centric approaches, i.e., high-throughput sequencing. Protein-centric studies are far less common. One exception might appear to be chromatin immunoprecipitation-sequencing (ChIP-seq) [3,4,5], as it maps protein occupancy on chromatin; however, it still provides DNA reads rather than protein data.

Proteins are critical players in chromatin state and dynamics. In eukaryotic cells, DNA exists in close association with histone proteins, which form nucleosomes that each wrap ≈147 bp of DNA. The canonical histones are H2A, H2B, H3, and H4, and nucleosomes are additionally bound, especially in silenced heterochromatic domains, by the linker histone H1 [6]. Histones are highly decorated by post-translational modifications (PTMs), which contribute to chromatin accessibility directly through their charge state or indirectly by recruiting other proteins involved in chromatin modulation (Figure 1). An example of the indirect route is histone methylation, which occurs on lysine and arginine residues and leaves their positive charge intact. In mammalian cells, some of the best-studied marks are H3K9me2/3 (histone H3 tails di/trimethylated on lysine residue 9), which are mainly found in constitutive heterochromatin [7]. An example of the direct route is histone acetylation, or rather acylation; acyl groups added to lysine residues neutralize the positive charge of the amino acid, which reduces the electrostatic interaction with the negatively charged DNA. For instance, acetylation of histone H3 is associated with active chromatin and plays a fundamental role in transcriptional activation [8]. Besides the abundant acetylation, histone propionylation, malonylation, crotonylation, butyrylation, succinylation, glutarylation, 2-hydroxyisobutyrylation, and β-hydroxybutyrylation are other examples of acylations detected on histone proteins [9]. Proteins with domains that bind those modifications are defined as “readers”. Some of them are transcription factors, whose role is to recruit RNA polymerases for transcription [10]. Selected readers contribute to chromatin remodeling, i.e., rearranging DNA from a compacted to a transcriptionally accessible state, or vice versa. 
These chromatin remodelers may directly bind DNA motifs rather than histone modifications, e.g., the SWI/SNF complex has high affinity for DNA and is required for the enhancement of transcription by many transcriptional activators in yeast [11]. Likewise, other chromatin readers recognize silencing marks and contribute to chromatin compaction. An example of a protein involved in maintaining condensed heterochromatin is Heterochromatin Protein 1 (HP1), which recognizes and binds H3K9me3 [12]. The spatial positioning of chromatin in the cell nucleus also contributes to its accessibility [13]. For instance, Lamina-Associated Domains, or LADs, are heterochromatic domains sequestered at the nuclear periphery. Interestingly, those domains are heavily decorated by a selected histone mark, H3K9me2 [14]. Similarly, the nuclear pore complex associates primarily with DNA regions silenced by the Polycomb Repressive Complex, at least in Drosophila [15]. In summary, finely tuned protein–DNA interactions, protein–protein interactions, and protein modifications (especially on histones) are critical contributors to chromatin accessibility. Intuitively, mutations and anomalous protein regulation can have a dramatic effect on the cell phenotype.
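
The charge-neutralization argument for acetylation can be made concrete with a rough calculation. The following is a minimal sketch, not a biophysical model: it assumes integer side-chain charges at near-neutral pH, ignores histidine and the peptide termini, and uses the first 20 residues of histone H4 as commonly reported (an assumption that should be checked against a sequence database). The function name `net_charge` is hypothetical.

```python
# Illustrative only: integer side-chain charges at roughly neutral pH.
POSITIVE = set("KR")  # lysine and arginine (histidine ignored for simplicity)
NEGATIVE = set("DE")  # aspartate and glutamate

# First 20 residues of histone H4 without the initiator Met (assumed sequence).
H4_TAIL = "SGRGKGGKGLGKGGAKRHRK"


def net_charge(sequence, acetylated_positions=()):
    """Approximate net charge; acetylated_positions are 1-based lysine sites."""
    for position in acetylated_positions:
        if sequence[position - 1] != "K":
            raise ValueError(f"position {position} is not a lysine")
    charge = 0
    for position, residue in enumerate(sequence, start=1):
        if residue in POSITIVE:
            if residue == "K" and position in acetylated_positions:
                continue  # an acetylated lysine no longer carries a charge
            charge += 1
        elif residue in NEGATIVE:
            charge -= 1
    return charge


unmodified = net_charge(H4_TAIL)
hyperacetylated = net_charge(H4_TAIL, acetylated_positions={5, 8, 12, 16})
print("unmodified tail:", unmodified)
print("K5/K8/K12/K16 acetylated:", hyperacetylated)
```

Under these simplifications, each acetyl group removes one positive charge from the tail, which is the basis for the weaker electrostatic interaction with DNA described above.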

Figure 1. Chromatin dynamics. Chromatin state is modulated by histone post-translational modifications (PTMs) (purple and red dots). Part of the condensed heterochromatin is located at the nuclear periphery and is enriched in methylation of histone H3 at residue K9 (H3K9me3). Readers of these modifications, like Lamin B and HP1, maintain the chromatin in a compacted state. Euchromatin is shown as open and accessible chromatin, enriched for histone acetylation (H3 acetyl) and prone to transcription.

Events like UV exposure, smoking, viral infection, cancer, and aging correlate with chromatin decondensation, i.e., large and small heterochromatic domains become euchromatic (accessible and prone to transcription) [16]. Chromatin decondensation is not only an issue in terms of uncontrolled gene expression; chromatin domains decorated by H3K9me2/3 are rich in repetitive DNA units such as transposons, Alu elements, and other satellite regions [17], and their accessibility to the transcriptional machinery is harmful for the cell [18]. Together, DNA methylation, histone PTMs, and non-coding regions ensure proper chromatin conformation and promote genome stability [19].

One of the main features of cancer is (epi)genome instability. The transition from normal tissue to cancer is sometimes characterized by changes in the distribution of H3K9me2/3, in HP1 expression levels [12], and by regional loss of heterochromatin which, in turn, becomes euchromatin [16]. Additionally, DNA hypomethylation of CpG dinucleotides in the pericentromeric region of the chromosome might be involved in many types of tumors [20]. All these changes directly affect transcriptional activity and genomic stability, leading to uncontrolled cellular proliferation and metastasis. Interestingly, elevated levels of methylation, especially in gene promoter regions, are related to aberrant silencing of transcription and inactivation of tumor-suppressor genes [20].

Reduced global heterochromatin, altered histone marks, and global hypomethylation of DNA have also been associated with aging. Significant changes in global nuclear architecture during physiological aging, as well as altered gene expression, might be triggered by the loss of heterochromatin domains [21]. This global loss was observed in old human fibroblasts and in fibroblasts from Hutchinson–Gilford progeria syndrome (HGPS) patients, indicating that several components are shared between normal aging and accelerated aging syndromes [22]. Senescent cells, during aging, present different levels of histone variants. An example is the loss of canonical histones H3.1 and H3.2 and the increase of the histone variant H3.3, which is incorporated into the genome in a replication-independent manner and plays a key role in chromatin maintenance when cells are no longer dividing [21]. MacroH2A is another histone variant that promotes transcriptional silencing and is abundant in SAHF (senescence-associated heterochromatin foci), in addition to being a critical regulator of chromatin dynamics during senescence [21,23]. In fact, chromatin structure undergoes dynamic changes throughout the entire life span of an organism. Among the histone modifications that are known to affect longevity, the most important ones are acetylation and methylation of lysine residues. Increased levels of H4K16ac lead to more open chromatin and, in old yeast cells, they correlate with decreased silencing of reporter genes and shortened lifespan. Conversely, reduced levels of H4K16ac are beneficial for longevity in yeast, due to a more closed global chromatin structure [24].

External events also affect chromatin state. DNA damage accumulates with age, but the process can be accelerated by reactive oxygen species (ROS), exposure to UV irradiation, and alcohol intake. Oxidative stress occurs because of ROS accumulation, affecting chromatin and chromatin-modifying enzymes. In general, it can stimulate global heterochromatin loss and modify histone folding and stability, as well as histone PTMs, influencing the expression of genes that are normally in a silenced state [25,26,27]. Besides the effects of oxidative stress, exposure to ionizing radiation leads to less compact heterochromatin, which adopts a looser structure [28]. At the same time, there is evidence that radiation induces global compaction of chromatin, indicating a potential mechanism to protect genome integrity [29,30]. Metabolites generated during ethanol metabolism can also impact chromatin structure. Animal experiments demonstrated that excessive alcohol intake modifies the mechanisms regulating chromatin remodeling and gene expression by altering the levels of histone acetylation as well as DNA methylation [31].

2. Mass Spectrometry to Study Chromatin State: First Steps with Nucleotide Modifications

DNA methylation is a known marker of DNA silencing, which regulates gene expression and epigenetic inheritance [56,57]. Chromatin domains with methylated DNA are associated with compacted heterochromatin states. This prompted the need for genome-wide DNA methylation analysis and resulted in the continuous evolution of various analytical methods involving bisulfite reactions, methylation-sensitive restriction enzymes, radiolabeling, immunoassays, methylation-specific PCR, microarray technology, next-generation sequencing, thin layer chromatography (TLC), reversed-phase high-pressure liquid chromatography (RP-HPLC), and mass spectrometry [58]. However, the two most popular methods remain ELISA and mass spectrometry [59]. Immunoassays are arguably faster and simpler, but they tend to be more variable due to non-specificity issues. Mass spectrometry is considered the “gold standard” due to the high sensitivity and specificity of the technique, but it is not as robust and straightforward.

Paper, thin layer, ion exchange, and gas chromatography coupled to either UV detection or mass spectrometry are historical methods to separate and quantify the four major DNA bases (G, C, A, T) and the modified DNA bases (5mdC and 5hmdC). In 1980, Kuo and colleagues successfully used C₁₈ chromatography coupled to UV detection to quantify 1–2% 5mdC from DNA of calf thymus and salmon sperm [60]. The DNA was first digested into individual nucleosides using DNase I, nuclease P1, and alkaline phosphatase. Another study reported the use of electrophoretic derivatization and electron-capture negative chemical ionization combined with moving belt liquid chromatography-mass spectrometry to quantify 5mdC and 5hmdC [61]. The method later evolved to include a combination of (1) DNA hydrolysis using the HpaII and MspI restriction nucleases, (2) electron ionization gas chromatography, (3) C₁₈ chromatography, and (4) hybridization analysis using a series of probes. Together, these were used to map the methylated regions of DNA containing an actively and differentially expressed somatic H1 histone gene from sperm, embryo, and adult tissues of the Chaetopterus worm [62].

DNA methylation was quantified in disease states like leukemia using urine samples [63]. Chromatographic separation required optimization, mostly because small molecules like nucleosides have weak hydrophobic interactions with reversed-phase C₁₈ columns. Further optimization included varying the methanol content to decrease surface tension and adding acetic acid for protonation; ammonium acetate/methanol (88:12 v/v) was found to be the best mobile phase for both chromatographic separation and detection by electrospray ionization mass spectrometry [63]. The addition of two different RNases and re-precipitation of DNA were discussed as important steps to minimize possible interference from 5-methylcytidine residues of tRNA and rRNA contaminants [64]. Song and colleagues [65] reported a chromatographic method for the efficient separation and detection of 5mdC and 5hmdC from the other four deoxyribonucleosides and from methylated RNA nucleoside contaminants in genomic DNA, using electrospray ionization tandem mass spectrometry on a triple quadrupole instrument. Methylated DNA was also analyzed using mass spectrometry in embryonic stem cells by measuring 5hmdC and 5mdC [66].
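
Global cytosine modification levels such as the "1–2% 5mdC" quoted above are typically expressed relative to total cytosine. The sketch below shows one common convention, assuming that chromatographic peak areas have already been converted to molar amounts through calibration curves; the function name and the example amounts are hypothetical and are not taken from any of the cited studies.

```python
def percent_of_total_cytosine(amounts):
    """Return each modified nucleoside as a percentage of total cytosine.

    amounts: dict of molar amounts (same unit) keyed by nucleoside name;
    must contain unmodified 'dC' plus any modified forms (e.g. '5mdC').
    """
    if "dC" not in amounts:
        raise ValueError("unmodified dC is required to compute the total")
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("total cytosine amount must be positive")
    return {
        name: 100.0 * value / total
        for name, value in amounts.items()
        if name != "dC"
    }


# Hypothetical calibrated amounts in pmol, for illustration only.
sample = {"dC": 96.0, "5mdC": 3.8, "5hmdC": 0.2}
for name, percent in percent_of_total_cytosine(sample).items():
    print(f"{name}: {percent:.2f}% of total cytosine")
```

Normalizing to total cytosine rather than to an absolute amount makes the value independent of how much DNA was digested and injected.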

Ito and colleagues discovered that 5mC is converted by TET proteins not only to 5hmC but also to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) [67]. Those modified nucleosides represented a greater challenge for quantification due to their lower abundance: 1–20 per 10⁶ cytosines for 5fC and 3 per 10⁶ cytosines for 5caC. In fact, they required modifications to the chromatographic setup to achieve sufficient sensitivity [59]. Chemical derivatization using 2-bromo-1-(4-dimethylamino-phenyl)-ethanone (BDAPE) was introduced to selectively label 5mdC, 5hmdC, 5fdC, and 5cadC [68]. This enhanced sensitivity by 35–123-fold compared to underivatized cytosine modifications [59,68]. More recently, hydrophilic interaction liquid chromatography (HILIC) [59,69] and porous graphitic carbon (PGC) [70] have emerged as potential alternatives to C₁₈-based chromatography for nucleosides. However, a robust platform for nucleoside quantification using mass spectrometry is still less common in research labs than one could expect. In summary, quantification of DNA modifications pioneered the analysis of chromatin state by mass spectrometry, starting about 40 years ago. However, for a comprehensive overview of the molecular components of accessible and inaccessible chromatin, a protein perspective is required.

3. Multi-Dimensional Histone Modification Analysis Using Mass Spectrometry

Mass spectrometry is currently the only technique that can identify and quantify the relative abundance of histone PTMs on a large scale [71]. Other techniques, such as Western blotting, ELISA, or immunohistochemistry, may be used for histone PTM quantification. However, all of these methods rely on specific antibodies, which may not be readily available for some PTMs or may provide inaccurate quantitation due to cross-reactivity or epitope masking [72,73]. Mass spectrometry is used to quantify both single and combinatorial histone PTMs [74], although it is important to note that extracted histones are decoupled from their original DNA location, and thus this analysis cannot define the genome-wide distribution of histone marks. Creative approaches have been exploited in mass spectrometry to look beyond the sole abundance of histone PTMs; e.g., the group of Anja Groth used metabolic labeling in cell culture to monitor whether newly synthesized or recycled histones were transferred onto the newly replicated DNA [75]. Nevertheless, this approach cannot define the location of these histones nor provide direct information about their accessibility.

The traditional peptide-centric analysis of histone proteins (bottom-up) is similar to the typical proteomics pipeline. Specifically, proteins are usually digested with the protease trypsin (which cleaves after lysine and arginine residues) into short (4–20 aa) peptides for analysis, as both chromatographic separation and detection by mass spectrometry are more robust and sensitive for peptides than for intact proteins (top-down). However, histones are very basic proteins, i.e., rich in arginine and lysine residues, and thus trypsin digestion would result in peptides too short to be assigned to a position on the protein. For this reason, lysine side chains are frequently derivatized so that trypsin can cleave only after arginine residues and generate peptides of proper size [71]. Since 2004, the sample preparation strategy has been periodically optimized, and different laboratories have applied different derivatization methods to generate peptides of proper size, i.e., from the use of D₆-acetic anhydride [76], to propionic anhydride [77], to NHS-propionate [78], to phenyl isocyanate [8]. Independently of the protocol applied, it is now clear that chemical labeling of lysine residues enables more confident and accurate PTM quantification. An overview of the different sample preparation strategies, including their advantages and disadvantages, is described elsewhere [79,80,81,82,83].
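
The effect of lysine derivatization on peptide length can be illustrated with an in-silico digestion. The following is a minimal sketch under stated assumptions: the sequence is the commonly reported first 50 residues of human histone H3.1 without the initiator methionine (to be verified against a sequence database), cleavage is modeled as occurring after the listed residues unless the next residue is proline, and missed cleavages are ignored. The `digest` helper is hypothetical and is not part of any proteomics library.

```python
# Assumed N-terminal 50 residues of human histone H3.1 (initiator Met removed).
H3_TAIL = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALRE"


def digest(sequence, cleave_after, block_before_proline=True):
    """Split a sequence after each residue in cleave_after (no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence[:-1]):
        if residue not in cleave_after:
            continue
        if block_before_proline and sequence[i + 1] == "P":
            continue  # simplified rule: no cleavage when proline follows
        peptides.append(sequence[start:i + 1])
        start = i + 1
    peptides.append(sequence[start:])
    return peptides


# Plain trypsin: cleaves after lysine (K) and arginine (R).
tryptic = digest(H3_TAIL, cleave_after="KR")
# Lysines chemically blocked: trypsin can cleave only after arginine.
arg_only = digest(H3_TAIL, cleave_after="R")

print("trypsin:", tryptic)
print("after lysine derivatization:", arg_only)
print("median length:",
      sorted(map(len, tryptic))[len(tryptic) // 2], "vs",
      sorted(map(len, arg_only))[len(arg_only) // 2])
```

In this simplified model, plain trypsin fragments the tail into many peptides of only one to five residues, whereas restricting cleavage to arginine yields fewer and longer peptides, each spanning several modifiable lysines.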

While histone PTM quantification per se does not provide direct information about chromatin state, the relative abundance of selected histone marks is used to estimate how accessible chromatin is overall. For instance, hyperacetylation of histone H4 on the residues K5/K8/K12/K16 (in particular K16 [84]) indicates that chromatin is relatively unfolded. Likewise, an increase in the abundance of selected silencing marks has been interpreted as a “restricted” chromatin environment, e.g., in schizophrenia [85]. Those are indirect inferences about the chromatin state, and we should not forget that hundreds of histone marks have been identified but not yet assigned to either accessible or inaccessible chromatin. The advent of “middle-down” mass spectrometry revealed an even more complicated picture; this strategy is so named because it is a compromise between the peptide-based approach (bottom-up) and the analysis of intact undigested proteins (top-down). Middle-down utilizes proteases that cleave at amino acid residues that are rare in histone sequences, i.e., aspartic and glutamic acid, to generate intact histone N-terminal tails (50–60 aa) [50]. Identifying and quantifying these long polypeptides is equivalent to mapping co-existing PTMs on the same histone protein, i.e., this approach can be used to define combinatorial PTM codes [50]. Notably, other strategies not based on mass spectrometry have been implemented to study combinatorial modifications; Shema and co-workers developed an antibody-based imaging platform to map co-existing modifications on nucleosomes [86], while Sadeh and colleagues established a method named Combinatorial-iChIP to map the genome-wide co-occurrence of two histone PTMs instead of the typical single PTM analysis of canonical ChIP-seq [87]. These methods offer undeniable advantages; on the other hand, middle-down mass spectrometry is independent of antibodies and is not limited in the number of co-existing modifications it can quantify on a single polypeptide. 
From middle-down data, it rapidly became clear that histones in cells are very rarely decorated with only one or two modifications; rather, they carry 5–8 co-existing marks on the same histone protein [88]. Frequently, those PTMs have unknown biological function or presumed opposite roles on chromatin. Why do they co-exist then? This is still an unanswered question, as there is currently no technology that can define the chromatin accessibility of hypermodified histone codes.

The differential turnover of nucleosomes has been described in multiple publications [89,90,91], firmly establishing that nucleosomes are exchanged on chromatin multiple times within a cell cycle. This opens an opportunity for mass spectrometry, as protein turnover can be quantified by metabolic labeling (e.g., Zee et al. [92]). In recent work, we showed that metabolic labeling of histones can be used to determine whether a certain modification resides on actively transcribed or inaccessible chromatin [93]. The principle is based on cell cultures feeding on stable isotope-labeled amino acids, which are partially incorporated into the histone amino acid sequence (Figure 2). Histones with higher recycling rates will have a relatively higher heavy/light ratio, and recycling is more frequent on chromatin domains with active transcription. Interestingly, this labeling has the potential to be utilized for middle-down analysis [94], paving the way to determining the chromatin accessibility of combinatorial histone codes. However, part of the current challenge is developing the proper bioinformatics to discriminate signals corresponding to combinatorial modifications from those of partial metabolic labeling.
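
The reasoning behind the heavy/light readout can be sketched in a few lines. This is an illustration under stated assumptions, not the published analysis pipeline: the peptide names and intensities below are hypothetical, and the heavy fraction H / (H + L) is used as a simple proxy for the share of newly synthesized histone carrying a given modification.

```python
def heavy_fraction(heavy, light):
    """Fraction of signal from the heavy-labeled (newly synthesized) form."""
    total = heavy + light
    if total <= 0:
        raise ValueError("summed intensity must be positive")
    return heavy / total


# Hypothetical (heavy, light) intensities for three modified peptide forms.
measurements = {
    "peptide_A (mark X)": (6.0e6, 4.0e6),
    "peptide_B (mark Y)": (2.0e6, 8.0e6),
    "peptide_C (unmodified)": (4.5e6, 5.5e6),
}

# Rank forms from fastest to slowest apparent turnover.
ranked = sorted(
    measurements,
    key=lambda name: heavy_fraction(*measurements[name]),
    reverse=True,
)
for name in ranked:
    heavy, light = measurements[name]
    print(f"{name}: heavy fraction = {heavy_fraction(heavy, light):.2f}")
```

Under the principle described above, forms near the top of such a ranking would be interpreted as residing on more frequently exchanged, and therefore more accessible, chromatin, while forms near the bottom would point to condensed domains.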

Figure 2. Metabolic labeling of histone peptides. Cells in culture are fed with media containing stable isotope-labeled amino acids and are maintained for a certain interval of time to produce about 50% newly synthesized histones. This interval of time is contingent on their proliferation rate, as the population needs to undergo at least one cell cycle for proper labeling. Heavy amino acids are incorporated into the histone amino acid sequence, and then cells are processed for histone PTM analysis. Accessible chromatin (euchromatin, in orange) is labeled at a higher rate than condensed heterochromatin (in yellow). Isotopic labeling is represented in blue.

4. Quantifying the Chromatin-State Dependent Proteome with Mass Spectrometry

Proteomics has become a discipline with many applications, most of them contingent on appropriate sample preparation. The routine procedure of sample preparation for proteomics is arguably one of the simplest among the -omics; most extracted or purified proteins are soluble in water, and they can be prepared for mass spectrometry with a rapid three-step procedure, i.e., reduction, alkylation, and digestion into peptides. For this reason, a myriad of alternative procedures have been engineered to enhance sensitivity, specificity, and quantitative dimensions (time, localization, interactions, turnover rate). In other words, we can analyze the proteome of a specific chromatin state if we establish a dedicated chromatin fractionation procedure that makes it possible to physically isolate chromatin fractions and analyze them separately by mass spectrometry, or to selectively label those domains (Figure 3). In 2003, Shiio and colleagues, in a pioneering study, developed a method to identify and quantify chromatin-associated regulatory factors by a combination of chromatin isolation and mass spectrometry analysis [95]. A recent approach was named “gradient-seq”, a method in which chromatin is cross-linked and afterwards fractionated over a sucrose gradient based on its resistance to sonication [96]. Cross-linked heterochromatic domains generate larger macromolecular structures, which can be separated from smaller accessible euchromatic domains. However, this method is unable to define the histone PTMs associated with more subtle differences in chromatin compaction, since it is largely limited to resolving heterochromatin from euchromatin. Hybridization capture of chromatin-associated proteins for proteomics (HyCCAPP) is an approach developed to identify the protein components of alphoid chromatin, which is rich in a highly repetitive class of DNA [97]. Using this method coupled to mass spectrometry, Buxton and colleagues were able to analyze human protein–alpha satellite interactions. 
Moreover, locus-specific proteomics was performed by exploiting the pull-down of a specific DNA region; two protocols named proteomics of isolated chromatin segments (PICh) [98] and insertional chromatin immunoprecipitation (iChIP) [99] were optimized for the direct identification of the bound proteome. PICh is based on nucleic acid probes, while iChIP utilizes antibodies to precipitate specific proteins marking the locus of interest, e.g., CTCF was targeted to identify insulator complexes, which function as boundaries of chromatin domains. Alternatively, synthetic chromatin carrying defined histone PTMs was engineered by ligation to purify proteins binding to selected histone marks [100].

Figure 3. Chromatin-state proteome analysis. To differentially identify and quantify proteins from accessible vs. inaccessible chromatin, the chromatin is physically fractionated into separate tubes. DNA is cross-linked and the extracted chromatin is fractionated based on its resistance to sonication. Larger macromolecules are the result of sonicated heterochromatin (in yellow), while accessible chromatin is sheared into smaller fractions, i.e., euchromatin (in orange). Those fractions can be separated using gels or centrifugation.

This entry is adapted from the peer-reviewed paper 10.3390/biology9060140

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license.