2. Models of Distance Interactions between Regulatory Elements
Two models have recently been proposed to explain long-range interactions in the genome. The main model is based on the findings that originate from mammalian Hi-C and ChIP-seq studies and indicate that the cohesin complex, together with CTCF, forms most of the enhancer–promoter interactions and boundaries of topology-associated domains (TADs)
[7,8,9,10]. Inactivation of the cohesin complex or CTCF results in partial disruption of chromosome organization into TADs
[11,12,13]. The cohesin complex is highly conserved in eukaryotes, and its main function is to hold sister chromatids together during mitosis and meiosis
[14,15]. The cohesin complex consists of four subunits, which form a ring around the two DNA strands by using the energy of ATP
[15]. A cluster of 11 zinc finger domains of the C2H2 type is a characteristic feature of the CTCF protein
[16,17,18]. Five C2H2 domains of CTCF specifically bind to a 15 bp motif, which is conserved in animals and determines most of the functional properties of this architectural protein
[19]. A conserved motif interacting with the cohesin complex was found at the N-terminus of human CTCF
[20]. A classical model suggests that, once fixed on chromatin, the cohesin complex begins ATP-dependent DNA extrusion with the formation of a chromatin loop
[21]. CTCF blocks the movement of the cohesin complex, thus leading to fixation of the boundaries of chromatin loops at the CTCF sites
[22].
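The extrusion-and-blocking logic of the classical model can be illustrated with a toy one-dimensional sketch. It is not a published simulation: the function name `extrude_loop`, the bin-based coordinates, and the rule that any CTCF-bound bin stops cohesin regardless of motif orientation are all simplifying assumptions made for illustration only.

```python
def extrude_loop(load_site, ctcf_sites, length, max_steps=10_000):
    """Toy 1D loop extrusion (illustrative assumption, not a published model).

    The chromatin fiber is a row of `length` bins. Cohesin loads at
    `load_site`, and its two sides move outward one bin per step.
    Each side stops when it reaches a CTCF-bound bin or the fiber end.
    Returns the (left, right) anchors of the resulting loop.
    """
    barriers = set(ctcf_sites)
    left = right = load_site
    for _ in range(max_steps):
        # A side can move only if it is not at a barrier or the fiber end.
        can_left = left > 0 and left not in barriers
        can_right = right < length - 1 and right not in barriers
        if not (can_left or can_right):
            break  # both sides are stalled: loop boundaries are fixed
        if can_left:
            left -= 1
        if can_right:
            right += 1
    return left, right


# Cohesin loaded between two CTCF sites ends up anchored at both of them,
# wherever it was loaded.
print(extrude_loop(50, [20, 80], 100))
print(extrude_loop(30, [20, 80], 100))
```

In this sketch the loop anchors depend only on the positions of the flanking CTCF sites, not on the loading position, which mirrors the idea that CTCF fixes the loop boundaries.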
An alternative group of models is based on the studies of mammalian LIM domain-binding factor 1 (LDB1)
[23], the Drosophila architectural C2H2 proteins [24,25], and the Drosophila proteins that preferentially regulate the activity of housekeeping promoters
[26,27].
In mammals, the C-terminal domain of LDB1 interacts with DNA-binding transcription factors of the LIM family
[23]. The N-terminal domain of LDB1 forms a stable homodimer
[28] to maintain long-range interactions between enhancers and gene promoters
[29,30].
In
Drosophila, several architectural C2H2 proteins have been characterized and shown to preferentially bind to gene promoters and known insulators
[17,24,25]. The architectural proteins of this group have clusters of C2H2 domains, some of which specifically bind to motifs of 12 to 18 bp in length
[17,24,25]. Most of the
Drosophila C2H2 architectural proteins have structured domains that form homodimers at the N-terminus
[31,32,33]. Interestingly, unstructured homodimerization domains are found at the N-terminus in the CTCF proteins of various animals, including
Drosophila and mammals
[34]. The domain is required for functional activity of
Drosophila CTCF (dCTCF)
[35], while the role of similar domains in mammalian CTCFs remains unstudied. In
Drosophila, dCTCF, Pita, and Su(Hw) are the best-characterized architectural C2H2 proteins and determine the activity of most of the known
Drosophila insulators
[36,37,38]. Binding sites for these proteins can support long-distance interactions between regulatory elements in model transgenic lines
[33,39,40].
The CP190, Chromator, Z4, and BEAF proteins preferentially bind to insulators and promoters of housekeeping genes, which are at the boundaries of most
Drosophila TADs
[26,27,41,42,43]. The proteins interact with each other and contain homodimerization domains
[44,45,46,47,48], suggesting their likely involvement in maintaining long-distance interactions. Like mammalian LDB1, CP190 is recruited to regulatory elements through interactions with DNA-binding transcription factors including dCTCF, Pita, and Su(Hw)
[49].
Neither model by itself can explain a number of experimental results. For example, it was shown using Micro-C that inactivation of CTCF or cohesin does not affect the formation of chromatin loops between regulatory elements in mouse embryonic stem cells
[50]. On the other hand, the alternative models do not explain how distant chromatin regions initially find each other to form a stable pairing, which is necessary for the organization of chromatin loops. The most obvious solution is a combination of the two models, which would explain most of the current experimental data in both mammals and
Drosophila (
Figure 1A). In
Drosophila, ChIP-seq data show that motifs recognized by different architectural C2H2 proteins are combined in many insulators and promoters
[51,52]. Recent studies in mammals showed that, in well-studied genomic regions, CTCF binds in cooperation with the other C2H2 proteins ZNF143, MAZ, and WIZ
[53,54,55,56], which are involved in the formation of long-distance interactions. MAZ and WIZ were shown to interact with the cohesin complex
[54,55]. The cohesin complex likely interacts with a large number of C2H2 proteins. It can be assumed that the movement of cohesin complexes is most efficiently blocked in the chromatin regions that are associated with groups of C2H2 proteins. As a result, cohesin brings the regulatory elements together in space, and their pairing is additionally stabilized by multiple interactions between the homodimerization domains of C2H2 architectural proteins and their associated partner proteins, such as CP190, Z4, and Chromator.
Figure 1. Combination of two models of distance interactions. (A) Local interaction between regulatory elements. Various combinations of architectural proteins bind to insulators or tethering elements. The same associated proteins (such as CP190, Z4, and Chromator) bind to different combinations of architectural proteins. The specificity of distance interactions between tethering elements/insulators is determined by the number of C2H2 proteins associated with different elements that are capable of interacting with each other. (B) Two copies of an insulator interact in head-to-head orientation.
The specificity and stability of the interaction between two regulatory elements are determined by the number of involved proteins whose domains are capable of forming homodimers (
Figure 1A). Studies in transgenic
Drosophila lines showed that two identical copies of any of the insulators tested pair in a head-to-head orientation
[39,57]. When two identical insulators were oriented head-to-head, the configuration of the resulting chromatin loop was favorable for the interaction between a promoter and an enhancer located outside the loop (
Figure 1B). When the insulators were in the same orientation, the enhancer could only stimulate the promoter when it was inside the loop. Such orientation-dependent interaction between identical copies of insulators is consistent with the model that regulatory elements consist of binding sites for several C2H2 architectural proteins, each of which can support long-distance interactions via its homodimerization domains. A direct consequence of the model is that inactivation of any architectural protein should not significantly affect the organization of chromosome architecture but may disrupt the individual local interactions between enhancers and promoters.
3. Current Models of Enhancer–Promoter Communication
Enhancers average about 500 bp in size and consist of combinations of motifs recognized by DNA-binding transcription factors (TFs), which suppress or activate enhancer activity (
Figure 2A)
[58]. Enhancers can be assembled into large modular super-enhancers, which range in size from 5 to 50 kb
[59]. The main function of enhancers is to mediate the recruitment of the mediator complex to promoters, resulting in transcriptional activation
[60,61].
Figure 2. Model of promoter activation by an enhancer. (A) Activation or suppression of enhancers. The concentration of activators and repressors determines the fate of the enhancer in a particular nucleus. The mediator complex is recruited to the active enhancer. TFs can still bind to a repressed enhancer. In this case, Polycomb proteins play an important role in the suppression of enhancer activity. Alternatively, compaction of chromatin leads to dissociation of TFs from the enhancer. (B) Possible mechanism of functional interaction between an enhancer and promoters at a distance. Tethering elements or insulators form a chromatin loop that brings promoters into the active zone of the enhancer. The mediator complexes bind to the promoters located in the area of the enhancer. The level of transcription depends on the properties of a particular promoter.
The mediator complex is conserved in eukaryotes and consists of 26 subunits in mammals. The subunits are grouped in three modules, which are called the head, middle and tail (
Figure 2A). A core part of the mediator interacts with the kinase module, which can function both as part of the complex and separately
[60]. The head and middle modules provide interaction with RNA polymerase II; the tail module is responsible for the binding of the mediator with TFs on enhancers and the main TFIID complex on promoters
[61] (
Figure 2B). By binding to the mediator complex, the kinase module blocks its interaction with RNA polymerase II. The tail module is the most flexible and can take on various conformations
[62,63]. The mediator complex binds to the non-phosphorylated carboxy-terminal domain (CTD) of RNA polymerase II, and the binding changes the conformation of the tail module. Next, RNA polymerase II is released from the complex with the mediator after CTD phosphorylation at the promoter, which changes the conformation of the tail module again. It is likely that different conformations of the tail module determine the specificity of binding of the mediator with TFs on enhancers or TFIID on promoters.
Several complexes with enzymatic activities are also recruited to enhancers: acetyltransferase (p300/CBP), methyltransferase (Mll3/Mll4/COMPASS), and deubiquitinase
[64]. Mll3/Mll4 and p300/CBP are responsible for histone H3 monomethylation at lysine 4 (H3K4me1) and acetylation at lysine 27 (H3K27ac), respectively. The H3K27ac and H3K4me1 modifications of histone H3 are thought to reduce the stability of nucleosomes, resulting in the formation of open chromatin
[65]. In addition, the enzymatic complexes can introduce modifications into TFs that bind to enhancers and gene promoters, thereby stimulating their activity
[66,67]. For example, p300/CBP may play an important role in acetylation of the TFs involved in the pre-initiation complex formation
[66]. Acetylation of different domains in the p53 protein usually positively regulates its activity
[68]. Methylation of p53 at K327 increases its stability and ability to stimulate transcription
[69]. There are other examples of the positive role of TF methylation and acetylation, but this area remains poorly studied in general.
In addition to transcription activators, repressors are recruited to enhancers to suppress their activity in cells where the enhancers should not function (
Figure 2A). Complexes with deacetylase and, less commonly, demethylase activities are recruited to enhancers by repressors
[70]. Deacetylation of TFs on enhancers probably decreases their ability to attract enzymatic and mediator complexes. In addition, histone deacetylation increases chromatin compaction, thereby reducing the ability of TFs to bind enhancers
[71]. Thus, enhancer activity in a particular cell is determined by the concentration of TFs interacting with activator and repressor complexes (
Figure 2A).
The Polycomb proteins play an important role in the suppression of enhancer activity
[71,72,73]. Two main Polycomb complexes are known in
Drosophila, of which one has ubiquitinating activity (Polycomb repression complex 1, PRC1) and the other has methyltransferase activity (Polycomb repression complex 2, PRC2)
[73]. PRC1 and PRC2 can be recruited directly to enhancers and promoters through interactions with DNA-binding TFs
[74]. A large number of variants of these two basic Polycomb complexes have been found in mammals, which probably reflects the need to finely regulate numerous groups of enhancers and promoters during development and cell differentiation
[71]. The most studied mechanism of repression is the formation of inactive chromatin through the introduction of H3K27me3 and H2AK119ub modifications into nucleosomes mediated by Polycomb complexes
[71,73]. Methylation and ubiquitination of key TFs may also serve as a mechanism to suppress enhancers and promoters. For example, methylation of lysine 99 in the coactivator BRD4 negatively regulates its activity in transcription
[75].
Recruitment of the Polycomb complexes to enhancers can lead to their transformation into silencers that repress transcription of adjacent genes
[76,77,78].
Drosophila has well-characterized, specialized regulatory elements that specifically recruit PRC1 and PRC2, which are called Polycomb response elements (PREs)
[79]. Such regulatory elements can function as specific silencers, increasing the efficiency of the complete repression of the enhancers and promoters that should be completely turned off in a certain group of cells during development
[80,81].
Two recent studies
[82,83] investigated the compatibility of enhancers and promoters. It was found that enhancers preferentially activate weak promoters rather than strong promoters, which normally determine the transcription of housekeeping and cell cycle genes. In general, it was shown that most of the enhancers tested can activate almost every promoter. A lack of specificity of interactions between enhancers and promoters presumably increases the role of insulators and TADs in limiting enhancer–promoter interactions.
However, recent studies have shown that TADs do not block long-range interactions between enhancers and promoters
[50]. It was shown using
Drosophila transgenic model systems that chromatin loops formed by interacting insulators cannot effectively block the interaction between enhancers and promoters
[40,84,85]. Thus, there are no strict structural restrictions to block the co-localization of enhancers and promoters belonging to different regulatory domains. Using Micro-C, intense contacts were detected in the genome between certain genomic sites including enhancers, promoters, and insulators that do not coincide with TAD boundaries
[50,86,87,88]. A special class of regulatory elements, called tethering elements, was identified in
Drosophila. The elements occur next to enhancers and promoters and form stable chromatin loops between them
[86]. Ultra-high resolution microscopy showed that some functionally interacting enhancers and promoters are relatively far away from each other
[3,89].
It can be assumed that mediator complexes are concentrated on enhancers as a result of multiple interactions between subunits of the tail module and unstructured domains of enhancer-associated TFs
[61] (
Figure 2B). In the next stage, the mediator leaves the enhancer as a result of a change in the conformation of the tail module. Conformational changes in the tail module are possibly a result of methylation (Mll3/Mll4/COMPASS?), acetylation (p300/CBP?), or phosphorylation (the kinase module?) of subunits of the mediator complex. However, this issue has not yet been studied. In the new conformation, the tail module has greater affinity for the TFIID complex on the promoter, resulting in pre-initiation complex formation and the recruitment of RNA polymerase II. The enhancer-bound p300/CBP complex can simultaneously acetylate TFs to activate them. Increasing concentrations of active forms of the mediator complex and TFs should stimulate the promoters located in a certain active zone around the enhancer
[89,90]. It does not matter to such a trans-activation mechanism whether the enhancer and promoter are in close contact, interact briefly, or are at some distance from each other. Interactions between insulators and/or tethering elements lead to the formation of chromatin loops, which form a region in which enhancers stimulate a specific group of promoters. In some cases, chromatin loops can reduce the likelihood of promoter localization in the nuclear region where the enhancer functions.
4. Interacting Insulators Form an Autonomous Regulatory Domain of the eve Gene
The regulation of the pair-rule gene
even-skipped (
eve) is one of the best studied in
Drosophila (
Figure 3A)
[91,92,93,94]. Eve belongs to a group of primary pair-rule factors whose stripe-pattern expression starts in early embryonic development
[95,96]. The
eve gene is in the center of a 16 kb domain surrounded by housekeeping genes, which are active in all cells.
Figure 3. Model of transcriptional regulation of the pair-rule gene eve in early Drosophila embryos. (A) Schematic representation of the eve regulatory region that is flanked by the Homie and NHomie insulators. (B) Transcriptional activation model of the endogenous eve gene and the reporter transgene in the stripe 7 of early embryos. The interaction between the Homie and NHomie insulators forms a zone in which the activated eve enhancer can stimulate transcription of the endogenous eve promoter and the reporter gene promoter. Identical copies of the Homie insulator located in the endogenous eve locus and the transgene interact in head-to-head orientation, which brings only the reporter located on the head side of the insulator into the active eve enhancer zone.
The body is divided into segments with certain morphological differences in
Drosophila, like in all insects
[97]. Segments formed at the embryonic stage are called parasegments (PSs). During the early development of an embryo, 14 PSs are formed, corresponding to anatomical structures of the larva. PSs are initially determined by the products of the maternal genes
Bicoid (
Bcd),
Hunchback (
Hb), and
Caudal (
Cad), which precisely regulate the expression levels of gap group genes, including
hunchback (
hb),
Kruppel (
Kr),
knirps (
kni), and
giant (
gt) [98,99,100,101]. In early embryos, the maternal and gap genes cooperatively regulate the expression of a large group of pair-rule genes, including
eve and
fushi tarazu (
ftz)
[102,103]. The
eve gene is expressed in seven broad stripes along the anteroposterior (AP) axis of the embryo during its early development (
Figure 3A). At this stage,
eve expression is controlled by five enhancers that are active in separate stripes
[95,96]. The stripes that express
eve subsequently become thinner with clear anterior and posterior borders
[104]. Expression of the
eve gene at this stage is controlled by a single enhancer, which is bound with the early pair-rule proteins paired, runt, and sloppy-paired
[105]. At late stages of embryonic development,
eve expression loses its characteristic pattern and is controlled by several tissue-specific enhancers.
The
eve enhancers contain binding sites for ubiquitous transcriptional activators, such as STAT and Zelda (Zld), and the maternal Bicoid activator
[70,106,107,108]. Repression of the enhancers is controlled by the Kr, Kni, and Gt proteins, which recruit the CtBP repressor complex
[70]. CtBP-dependent repressor complexes have deacetylase activity. Finally, the Hb protein can recruit activators or repressors to the enhancers, depending on the nearby partner proteins
[109]. For example, the stripe 3 + 7 enhancer is stimulated by the activators Zld and STAT and repressed by Hb and Kni
[107,108]. At the same time, the stripe 2 enhancer is controlled positively by Zld, Hb, and Bcd and negatively by Gt and Kr.
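The combinatorial inputs listed above can be summarized in a small lookup table. The sketch below is only an illustration: the activator and repressor sets are taken from the text, but the Boolean rule (an enhancer is active when at least one activator and no repressor is present) and the names `ENHANCERS` and `enhancer_is_active` are assumptions. Real enhancers respond to TF concentrations rather than to simple presence or absence.

```python
# Activators and repressors of two eve stripe enhancers, as listed in the text.
ENHANCERS = {
    "stripe 3+7": {"activators": {"Zld", "STAT"}, "repressors": {"Hb", "Kni"}},
    "stripe 2": {"activators": {"Zld", "Hb", "Bcd"}, "repressors": {"Gt", "Kr"}},
}


def enhancer_is_active(name, factors_present):
    """Toy Boolean rule (an assumption): the enhancer is active if at least
    one of its activators and none of its repressors is present."""
    spec = ENHANCERS[name]
    present = set(factors_present)
    has_activator = bool(spec["activators"] & present)
    has_repressor = bool(spec["repressors"] & present)
    return has_activator and not has_repressor


# Hb acts as an activator input for stripe 2 but as a repressor for stripe 3+7.
nucleus = {"Zld", "Hb", "Bcd"}
print(enhancer_is_active("stripe 2", nucleus))
print(enhancer_is_active("stripe 3+7", nucleus))
```

Even this crude rule reproduces the context dependence of Hb: the same set of factors switches one enhancer on and keeps the other off.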
Each stripe enhancer has a specific set of activator and repressor motifs, which are arranged in a specific sequence and orientation. Each stripe enhancer shows more efficient recruitment of activator (acetylase activity) or suppressor (deacetylase activity) complexes, depending on the concentration of gap repressors in the nucleus. TF acetylation/deacetylation is likely to stabilize the active/inactive status of each stripe enhancer. Deacetylation of nucleosomes also leads to the formation of more stable local chromatin, which blocks the binding of activators to enhancers. This possibility is consistent with the finding that the Zelda and Hb proteins cannot stably bind to their sites on chromatin
[110].
The complex regulatory region of the
eve gene (
Figure 3A) is flanked by housekeeping genes, which are expressed in all cells
[111,112]. The housekeeping gene
TER94 is on one side of the regulatory region of the
eve gene and is actively transcribed in all cells. The other side is flanked by the 3′ region of the
CG12134 gene, which shows ubiquitous but weaker expression.
A 368 bp insulator (
Figure 3A) was found immediately upstream of the core promoter of the
TER94 gene
[111,112]. The insulator efficiently blocks the activity of embryonic enhancers in model transgenic lines. When the insulator was inserted into the P-transposon, the construct was found to preferentially integrate into the genomic region near the
eve locus
[111,112]. This effect is called homing and is explained as follows. When DNA of the P-transposon with the insulator is injected, proteins are assembled on the insulator to form a complex, which interacts with a similar complex on the endogenous insulator to increase the specific integration of the P-transposon. The insulator was therefore named Homie. The function of Homie in vivo is currently unknown since its deletion has not been obtained. It is likely that Homie performs many functions, one of which is to be the distal part of the
TER94 gene promoter since deletion of the insulator significantly reduced
TER94 expression in transgenic lines
[111].
A PRE was found next to the insulator (
Figure 3A); its function is to negatively regulate the
eve gene enhancers at the late stages of embryogenesis
[111]. Homie was assumed to protect
TER94 expression from the PRE, which represses
TER94 transcription in oocytes and late embryos in transgenic lines
[111]. A second insulator (
Figure 3A), named new Homie (NHomie), was found between the 3′ UTR of the
CG12134 gene and the regulatory region of the
eve locus
[113]. Interestingly, both insulators are bound with the Su(Hw)
[114] and Ibf1/2
[115] proteins. The proteins can be involved in recruiting CP190 and Mod(mdg4)-67.2 to Homie and NHomie
[115,116]. Homie additionally binds Pita, which is another architectural C2H2 protein that also interacts with CP190
[52,117]. Thus, the Homie insulator has binding sites for two architectural C2H2 proteins. In Micro-C studies, Homie and NHomie efficiently interacted to form a small TAD in embryos
[86].
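The protein composition of the two insulators described above can be compared with a short sketch. The binding data come from the text, but counting shared proteins as a proxy for pairing potential is a deliberate simplification, and the names `BOUND_PROTEINS` and `shared_proteins` are hypothetical. Ibf1/2 is treated here as a single entry, and every protein is assumed to contribute equally.

```python
# DNA-binding proteins reported on each insulator in the text.
BOUND_PROTEINS = {
    "Homie": {"Su(Hw)", "Ibf1/2", "Pita"},
    "NHomie": {"Su(Hw)", "Ibf1/2"},
}


def shared_proteins(element_a, element_b, binding=BOUND_PROTEINS):
    """Return the proteins bound to both elements (a crude, assumed proxy
    for the number of possible homotypic contacts between them)."""
    return binding[element_a] & binding[element_b]


# Homie and NHomie share two of the listed proteins, whereas two copies
# of Homie share all three.
print(sorted(shared_proteins("Homie", "NHomie")))
print(len(shared_proteins("Homie", "Homie")))
```

Under these assumptions, two identical copies of Homie offer more potential contacts than the Homie and NHomie pair, which fits the idea that pairing between elements depends on how many interacting proteins they share.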