The positions of enhancers and promoters on genomic DNA remain poorly understood. Individual chromosomes cannot be observed directly for most of the cell cycle because the genome forms chromatin that spreads throughout the nucleus. However, high-throughput chromosome conformation capture (Hi-C) measures physical interactions between genomic regions. In previous studies, DNA extrusion loops were derived directly from Hi-C heat maps. Here, we apply multidimensional scaling (MDS) to Hi-C data to locate enhancers and promoters more precisely.
1. Introduction
For cells to utilize genetic information, many genes must be expressed in a coordinated manner. The accessibility of genomic information depends on how DNA is packed into chromatin. Chromatin is the basis of various biological processes, including cell cycle regulation and DNA replication, repair, and maintenance [1]. Euchromatin is a genome region consisting of DNA with a relatively loose structure. This open structure allows RNA polymerase and other proteins to access the genome for transcription. Within euchromatin, enhancers and promoters come close together to form DNA loops. Gene expression is controlled by promoters near the gene and by gene regulatory sites, called enhancers, that are distant from the gene. However, how promoters and enhancers interact with each other to regulate gene expression is not well understood. High-throughput chromosome conformation capture (Hi-C) can be used to analyze the 3D structure of a genome by detecting, through next-generation sequencing, genomic regions that are spatially close to each other [2]. Conventionally, the genome structure has been approximated from the Hi-C heat map [3]. Here, we demonstrate the potential of Hi-C data for identifying enhancers and promoters by applying multidimensional scaling (MDS).
2. Hypothetical Chromosomes
Figure 1 shows a heat map of the Hi-C data after weighting them as shown in Equation (1).

where i and j are coordinates, and d_ij is the number of Hi-C detections between them. Pairs with large values in the matrix are region pairs with a high contact probability. Because MDS operates on distances (dissimilarities), whereas Hi-C detection counts are similarities, the inverse of the Hi-C data was used as the distance data. MDS was then applied to these data, and the resulting hypothetical chromosomes are shown in Figure 3. The euchromatin region was identified (Figure 2).
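The inversion and embedding steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in this study: it assumes classical (Torgerson) MDS, which the text does not specify, it omits the weighting of Equation (1), and the function names, the pseudocount `eps`, and the toy contact counts are all hypothetical.

```python
import numpy as np

def contacts_to_distances(contacts, eps=1.0):
    """Turn a symmetric Hi-C contact-count matrix into distances by inversion.

    eps is a hypothetical pseudocount that avoids division by zero
    for bin pairs with no detected contacts.
    """
    contacts = np.asarray(contacts, dtype=float)
    dist = 1.0 / (contacts + eps)
    np.fill_diagonal(dist, 0.0)  # a bin is at distance zero from itself
    return dist

def classical_mds(dist, n_components=3):
    """Classical (Torgerson) MDS: find coordinates whose Euclidean
    distances approximate the given distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centred squared distances give a Gram (inner-product) matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues arise when the input is not exactly Euclidean; clip them.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy 4-bin contact matrix with made-up counts.
contacts = np.array([
    [0, 9, 1, 8],
    [9, 0, 9, 1],
    [1, 9, 0, 9],
    [8, 1, 9, 0],
])
coords = classical_mds(contacts_to_distances(contacts), n_components=3)
```

Each row of `coords` is the 3D position of one genomic bin; bin pairs with many detections end up close together because their inverted counts are small.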
Figure 1. Heat map of Hi-C data after weighting.
Figure 2. Distance plot between coordinates (the red line is the threshold, and the blue points are roots).
Figure 3. Hypothetical chromosome 18 (0 bp–86,000 kbp).
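The thresholding illustrated in Figure 2 can be sketched in a few lines. The selection rule here is an assumption made for illustration: a pair of coordinates counts as a candidate loop root when the two bins are separated along the sequence but lie closer in the embedded structure than a threshold. The function name, `min_separation`, the threshold value, and the toy coordinates are hypothetical.

```python
from math import dist

def candidate_loop_roots(coords, threshold, min_separation=2):
    """Return (i, j, distance) for bin pairs that are at least
    min_separation bins apart along the sequence but spatially
    closer than threshold in the embedded structure."""
    hits = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            d = dist(coords[i], coords[j])
            if d < threshold:
                hits.append((i, j, d))
    return hits

# Toy hairpin-shaped path of six bins (made-up coordinates):
# the chain runs out along one line and folds back along a parallel one.
coords = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
    (2.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0),
]
roots = candidate_loop_roots(coords, threshold=1.2)
```

In this toy path, the pairs that face each other across the fold are the ones reported, which mirrors the idea of reading loop roots off the points that fall below the threshold line.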
3. Enrichment Analysis
We used biomaRt [4] in R to retrieve genes from the obtained coordinates. The genes in the obtained euchromatin regions were then subjected to enrichment analysis using g:Profiler [5]. The results are presented in Table 1 and Table 2. The enriched terms included functions and processes involved in transcription, such as the pre-transcriptional initiation complex and the RNA polymerase II initiation complex, as well as transcription factors involved in cancer, such as cAMP responsive element binding protein 3 (CREB3) [6] and forkhead box M1 (FOXM1) [7].
Table 1. Results of enrichment analysis of 90 min Hi-C data by g:Profiler.
Table 2. Results of enrichment analysis of 120 min Hi-C data by g:Profiler.
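Before genes can be retrieved, the selected coordinates have to be expressed as genomic intervals. The helper below is a hypothetical sketch of that bookkeeping step: it merges consecutive bin indices and formats each run as a colon-separated `chromosome:start:end` string, the form that, to our understanding, region-based gene queries such as biomaRt's chromosomal region filter expect. The bin size, the function name, and the example bins are assumptions; the resolution of the Hi-C data is not stated here.

```python
def bins_to_regions(bin_indices, chromosome, bin_size):
    """Merge consecutive bin indices and format each run as
    'chromosome:start:end' with 1-based, inclusive coordinates."""
    regions = []
    run_start = prev = None
    for b in sorted(set(bin_indices)):
        if prev is not None and b == prev + 1:
            prev = b  # extend the current run of adjacent bins
            continue
        if run_start is not None:
            # Close the previous run before starting a new one.
            regions.append(
                f"{chromosome}:{run_start * bin_size + 1}:{(prev + 1) * bin_size}"
            )
        run_start = prev = b
    if run_start is not None:
        regions.append(
            f"{chromosome}:{run_start * bin_size + 1}:{(prev + 1) * bin_size}"
        )
    return regions

# Hypothetical example: bins 3-5 and 10 on chromosome 18 at 100 kbp resolution.
regions = bins_to_regions([3, 4, 5, 10], chromosome="18", bin_size=100_000)
```

Merging adjacent bins first keeps the number of region queries small when a euchromatin region spans many bins.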
4. Comparison with Previous Studies
Several studies have used MDS to reproduce 3D genome structures accurately from Hi-C data. StoHi-C [8] is a framework that instead predicts 3D genome structures using t-distributed stochastic neighbor embedding (t-SNE). MDS has inherent problems with very sparse, high-dimensional Hi-C datasets, whereas t-SNE overcomes these limitations; on yeast Hi-C data, StoHi-C reproduces the characteristics of 3D chromosome structures more clearly than MDS. The distances between the coordinates obtained from the 3D structure reproduced by StoHi-C are shown in Figure 4. Attempting to reproduce the 3D structure precisely resulted in no significant difference in the distances between coordinates, even when DNA loops were acquired using a threshold value. Therefore, enhancers and promoters cannot be identified precisely with that approach. Because the goal of this study was to identify enhancers and promoters, we focused on coordinate pairs with a large number of Hi-C detections, even when the distance between the coordinates is large. For this reason, we added weights as shown in Equation (1).
Figure 4. Distance plot between coordinates by StoHi-C.
Our results suggest that automatically visualizing the chromosome structure using MDS, as performed in this study, is a useful way to obtain DNA loops.