Endogenous Retroviruses Expression Regulation Mechanisms: Comparison
Please note this is a comparison between Version 1 by Gkikas Magiorkinis and Version 2 by Wendy Huang.

Human endogenous retroviruses (HERVs) are the result of retroviral infections acquired millions of years ago; nowadays, they compose around 8% of human DNA. Multiple mechanisms have been employed to deactivate endogenous retroviruses, rendering them defective for replication and retrotransposition, while some of them have been co-opted to the host's evolutionary advantage. A plethora of mechanisms maintains the delicate balance of HERV expression in modern humans: epigenetic modifications, such as DNA and histone methylation and histone acetylation, are recruited together with cytosine deamination, chromatin remodeling, and even post-transcriptional control.

  • HERVs
  • expression
  • regulation
  • methylation
  • DNA
  • histone
  • acetylation
  • deamination
  • chromatin

1. Introduction

Human endogenous retroviruses (HERVs) are the result of retroviral endogenization that took place millions of years ago, and they currently comprise about 8% of the human genome. HERVs are classified as retrotransposons; they originally used reverse transcription to duplicate and reintegrate at other sites of the human genome [1]. In humans, ERVs have lost their capacity for replication and retrotransposition [2], and the majority of their loci have lost their coding ability through the accumulation of mutations, including nonsense mutations that introduce premature stop codons and insertions or deletions that cause frameshifts [3]. Nevertheless, some HERV loci have retained intact ORFs with coding capacity [3]. The full-length provirus resembles the typical structure of a simple retrovirus, with two LTRs flanking the gag, pro, pol, and env genes (reviewed in Mao et al., 2021 [4]) (Figure 1), which encode the matrix, capsid, and nucleocapsid proteins (gag), the protease (pro), the reverse transcriptase and integrase (pol), and the envelope (env) [4,5]. HERVs were originally classified into families named after the tRNA that binds the primer binding site (PBS), located downstream of the 5′ LTR and upstream of gag [6] (for example, the K in HERV-K refers to a lysine tRNA); however, this categorization does not always reflect the actual phylogenetic relationships among members of the same family [7].
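As a toy illustration of how nonsense and frameshift mutations erode coding capacity, the sketch below scans the forward strand of a DNA sequence for the longest ATG-initiated open reading frame (ORF) that ends in a standard stop codon (TAA, TAG, TGA). It is a simplified sketch under stated assumptions, not a HERV annotation method: it ignores the reverse strand, splicing, and alternative genetic codes, the function name `longest_orf` is hypothetical, and the sequences are invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> int:
    """Length in nucleotides (start and stop codons included) of the longest
    ATG...stop open reading frame on the forward strand, over all three frames.

    Hypothetical helper for illustration; ORFs that run off the end of the
    sequence without reaching a stop codon are not counted.
    """
    s = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # position of the ATG opening the current ORF, if any
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # close this ORF and look for the next ATG
    return best


# Invented example: Met followed by five Gln (CAG) codons and a stop codon
intact = "ATG" + "CAG" * 5 + "TAA"
# Nonsense mutation: the second CAG becomes TAG (a single C-to-T change)
nonsense = intact[:6] + "TAG" + intact[9:]
# Frameshift: a single-base deletion inside the first CAG codon
frameshift = intact[:4] + intact[5:]

for name, s in [("intact", intact), ("nonsense", nonsense), ("frameshift", frameshift)]:
    print(name, longest_orf(s))
```

With these inputs the longest complete ORF shrinks from 21 nt (intact) to 9 nt (nonsense), and the frameshifted copy contains no complete ORF at all (0).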
Figure 1. HERV genes and solo-LTR formation. The HERV provirus contains four main genes: Gag, encoding the matrix, nucleocapsid, and capsid proteins; Pro, encoding the protease; Pol, encoding the reverse transcriptase and integrase; and Env, encoding the envelope. The genes are flanked by two LTRs with enhancer and promoter activity. Solo LTRs arise through homologous recombination between the two LTRs, which deletes the internal region containing the genes.
Throughout evolution, certain HERV-derived proteins appear to have been co-opted by mammals to acquire new functions that benefit the host [8], most importantly the syncytins (HERV-W envelope: syncytin-1; HERV-FRD envelope: syncytin-2), which contribute substantially to successful placentation [9]. Furthermore, expression of HERVs, notably HERV-H, HERV-L, and HERV-K, is higher in stem cells and during embryonic development than in differentiated cells, suggesting a potential role in maintaining pluripotency [10]. Additionally, full-length HERV sequences, and mainly LTRs, can function as regulatory elements in the human genome, providing promoter, enhancer, and repressor activity, polyadenylation signals, and alternative splice sites [11,12,13].
To maintain the physiological functions of the host and keep HERV expression at levels suited to each cell type and developmental stage, multiple mechanisms have evolved. The following sections explore the main mechanisms that help maintain host homeostasis.

2. Methylation

Methylation, both of the retrovirus-derived DNA sequences themselves and of histones, is employed for HERV transcriptional repression: DNA methylation is mainly a characteristic of somatic tissues, whereas histone 3 lysine 9 trimethylation (H3K9me3) is mainly used as a silencing mechanism in embryonic stem cells (ESCs) [14][18]. DNA methylation serves to prevent cell reprogramming, while histone methylation is more commonly associated with preventing unfavorable gene activation [15][19]. Strikingly, lower methylation levels and increased HERV expression are found in differentiating tissues, indicating the importance of regulating endogenous retroviral elements during the transition from pluripotency to differentiation [10][16][10,20]. Interestingly, the type of epigenetic machinery employed appears to depend on the evolutionary age of the HERV insertions: DNA methylation occurs mainly on evolutionarily younger insertions, whereas histone methylation occurs on intermediate-age and older insertions, which have lower CpG density [17][21].
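The link between insertion age and CpG density can be made concrete with the CpG observed/expected (o/e) ratio, a common way to quantify CpG density. A frequently cited explanation for CpG loss in older insertions is that methylated cytosines tend to deaminate spontaneously to thymine, turning CpG into TpG over time. The minimal sketch below assumes, purely for illustration, that every CpG is methylated and eventually mutates, which real sequences do not do; the helper names `cpg_obs_exp` and `erode_methylated_cpgs` and the sequences are hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: (CpG count * length) / (C count * G count).

    Follows the widely used Gardiner-Garden and Frommer formulation;
    returns 0.0 when the sequence contains no C or no G.
    """
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count returns the exact dinucleotide count
    cpg = s.count("CG")
    return cpg * len(s) / (c * g)


def erode_methylated_cpgs(seq: str) -> str:
    """Toy ageing model (an assumption, not a measured process): every CpG is
    treated as methylated and its cytosine as deaminated to thymine (CG -> TG)."""
    return seq.upper().replace("CG", "TG")


young = "ACGTGCAACGTGCA"            # invented "young" insertion with two CpGs
old = erode_methylated_cpgs(young)  # same sequence after toy CpG decay

print(young, round(cpg_obs_exp(young), 2))
print(old, round(cpg_obs_exp(old), 2))
```

In this toy case the ratio drops from 1.75 to 0.0 while the C and G content is only partly reduced, which is the signature of CpG-specific decay rather than general base-composition change.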

2.1. DNA Methylation

DNA methyltransferases (DNMTs) drive cytosine methylation. In mammals, three DNMTs (DNMT1, DNMT3A, and DNMT3B) have been extensively evaluated. Studies in mice indicate that DNMT3A and DNMT3B function is important in gametogenesis and early embryonic development, whereas somatic cells express these enzymes at low levels, suggesting that DNMT-mediated methylation patterns are necessary to orchestrate the epigenetic hallmarks of pluripotency and differentiation. DNMT3L, although it is a catalytically inactive DNMT3 homolog, seems to play an ancillary role in the DNMT3 machinery [18][22]. DNMT1 selectively silences retrotransposon sequences in the genome of the early mouse embryo (four- and eight-cell stages), mainly LINE-1 (long interspersed nuclear element 1) and ERV-K sequences; DNMT1-knockdown mouse embryos show enrichment of the corresponding transcripts, and DNMT1 appears to be non-redundant during embryogenesis [19][23].

2.2. Histone Methylation

Zinc-finger proteins (ZFPs): Kruppel-associated box (KRAB) zinc-finger proteins (KRAB-ZFPs) are crucial regulators of gene expression in mammals. They are structurally characterized by two domains: the N-terminal KRAB domain, which acts by recruiting other cellular factors, and the C-terminal C2H2 zinc-finger domain, which binds specific DNA sequences to regulate transcription. KRAB-ZFPs appear to play an important role in silencing transposable elements (TEs), including HERVs [20][24], as well as nearby genes, by regulating TE-embedded enhancers and promoters [21][22][25,26]. The KRAB domain recruits the transcriptional corepressor TRIM28 (tripartite motif-containing 28, also known as KRAB-associated protein 1, KAP1), which acts as a scaffold for chromatin-modifying proteins, including heterochromatin protein 1 (HP1) and the histone-lysine N-methyltransferase SETDB1, to control transcription [23][27]; this machinery appears to be pivotal for TE silencing in early embryos [24][28]. Interestingly, TRIM28 and SETDB1 acting together appear to repress HERV transcription more strongly than either factor alone [18][22]. KRAB-TRIM28 can also regulate expression over long distances by spreading heterochromatin: TRIM28 binds to the 3′ end of genes, and H3K9me3 and HP1β then spread toward the 5′ end [25][29]. The interactions between KRAB and TRIM28 are summarized in Figure 2.
Figure 2. Zinc-finger protein pathway. Kruppel-associated box zinc-finger proteins (KRAB-ZFPs) act as recruiters of other proteins that silence transposable elements. Specifically, the N-terminal KRAB domain of the KRAB-ZFP recruits tripartite motif-containing 28 (TRIM28), which acts as a scaffold for SETDB1, a methyltransferase that trimethylates histone H3 on lysine 9, and HP1, an H3K9me3-binding protein.
Current evidence suggests that ZFPs have been evolutionarily selected to recognize foreign sequences and silence them. This has been described for ZFP809, which appears to act specifically in embryonic stem cells: although it recognizes many genomic sites, high-affinity binding and heterochromatin formation take place at primer binding sites (PBSs) associated with ERVs. ZFP809 expression is higher in pluripotent than in somatic cells, and its depletion in embryonic stem cells leads to a loss of H3K9me3 and transcriptional derepression of these ERVs [14][18]. The importance of the KRAB-ZFP system in antiviral host protection has been demonstrated in mice with ZFP961, a PBS-Lys-binding protein that restrains both endogenous and exogenous retroviruses. Using chromatin immunoprecipitation sequencing (ChIP-seq), the same study identified similar PBS-Lys-binding proteins in humans, ZNF417 and ZNF587, which, besides silencing HERVs by promoting their methylation, appear to restrain HIV infectivity by interfering with viral transcription and integration [26][30]. The importance of ZFP-mediated control of HERV transcription is further shown by the TRIM28-mediated silencing of HERV-T and HERV-S by ZNF genes that appear to participate in the regulation of innate immunity. These ZNFs retain their methylation-inducing function in adult human peripheral blood mononuclear cells, where they control interferon responses [27][31]. In tumors, increased expression of HERV elements leads to upregulated transcription of ZFP genes, indicating a two-way circuit between HERV transcription and ZFP gene activation. In these cells, increased HERV and ZFP expression was linked to more favorable prognostic features, including reduced growth, migratory potential, and invasiveness [28][32].
Despite the importance of ZFPs and the TRIM28-SETDB1 network in HERV transcriptional control, SETDB1 and H3K9me3 appear to be equally important, and to function independently, in the epigenetic silencing of HERVs in differentiated cells [14][18].
The human silencing hub (HUSH) complex: the HUSH complex is another mechanism that defends the mammalian genome against “invading” sequences, mainly LINE-1 elements and HERVs, by facilitating histone H3 lysine 9 trimethylation (H3K9me3) [29][33]. HUSH has been proposed as a “universal, cell-autonomous genome-surveillance system” and could be considered a weapon of innate immunity. HUSH recognizes and targets long, intronless transcripts, treating the absence of an exon-intron organization as a conserved hallmark of non-mammalian origin; consequently, these “foreign” DNA elements must be transcribed for HUSH-mediated H3K9me3 deposition to occur [29][33]. The complex consists of three nuclear proteins: transcription activation suppressor (TASOR), M-phase phosphoprotein 8 (MPP8), and periphilin [30][34]; TASOR and periphilin exist in different isoforms as a result of genetic conflict [31][35] (Figure 3). HUSH recruits MORC family CW-type zinc finger 2 (MORC2), an ATP-dependent chromatin remodeler that compacts chromatin, and SETDB1, which trimethylates H3K9 at the target sequence [32][36]. Studies in mice reveal a significant contribution of the HUSH complex to the de novo repression of retrotransposons, especially ERVs and LINE-1, with increased specificity for evolutionarily young integrations through TRIM28 activation. Similar HUSH-mediated silencing has been described in humans, at least for LINE-1 elements [33][37].
Figure 3. The HUSH complex. The HUSH complex is made up of three proteins: M-phase phosphoprotein 8 (MPP8), transcription activation suppressor (TASOR), and periphilin. Periphilin binds the intronless RNA molecule, while TASOR suppresses its expression. Once transcription has stopped, MPP8 interacts with ZCCHC8, a zinc-finger protein of the nuclear exosome targeting (NEXT) complex, leading to the recruitment of the RNA helicase MTR4 and the RNA-binding protein RBM7. The involvement of the NEXT complex results in the decay of the HERV transcript.
Finally, these two mechanisms (DNA and histone methylation) do not function separately and independently; rather, they interact and synergize to silence genomic regions, especially those of retroviral origin, thereby preserving host integrity. For example, in mouse embryonic stem cells, de novo methylation of ERV sequences depended on the presence of KRAB-ZFPs able to recognize and act on these sequences [34][38].

3. Histone Acetylation

Acetylation of the ε-amino group of lysine residues is catalyzed by histone/lysine acetyltransferases (HATs/KATs); it leads to a looser, more “unstable” chromatin structure in which sequences are more accessible to the transcriptional machinery, and these effects are reversed by histone deacetylases (HDACs) [35][39]. Acetylation can also target non-histone proteins and is involved in regulating various cellular processes, including steps of cell division, differentiation, and neuronal function [36][40]. Similarly, HDACs can deacetylate both histone and non-histone proteins; besides altering transcription in the former case, deacetylation allows other post-translational modifications (PTMs), such as methylation, to be established on the freed lysines [35][39], indicating that HDACs also mediate the regulatory pathways described previously. After HDAC inhibitors (HDACis) were considered as a means to induce expression of latent HIV-1 proviruses, with the aim of eradicating HIV-infected cell reservoirs, interest turned to the potential effects of such agents on HERV expression, given the multiple implications of HERVs in human disease. The results of such studies suggest that acetylation has a rather weak effect on HERV expression [37][38][41,42]. Although multiple HERV loci were differentially transcribed following HDACi treatment of HIV-positive primary CD4+ T cells, no specific HERV expression pattern, such as global HERV upregulation, was found: the ERV-L family was significantly downregulated, whereas HERV-9 LTR-12 elements were significantly enriched [37][41]. In contrast, another study found no substantial change in the expression of HERV-K (HML-2) (HK2) env and pol, syncytin-1, or syncytin-2 [38][42].

4. Cytosine Deamination

The APOBEC3 (A3) family of cytidine deaminases mediates deamination, a mechanism controlling HERV expression that acts on both DNA and RNA molecules [39][43]. This system, comprising seven A3 family members in humans, is considered part of the human innate immune machinery that protects the host against exogenous and endogenous retroviruses [40][44]. This protein family has been shown to repress retroelement expression by binding to their genomic sites in human DNA [39][43], while A3G exerts its antiretroviral function by inhibiting retroviral reverse transcription, thereby restricting retroviral replication [41][45]. The evolutionary expansion of the APOBEC protein family, inferred from comparisons with other mammals such as rodents, could be considered a milestone for the human genome in terms of protection against endogenous pathogens, mainly through A3A- and A3B-mediated regulation of LINE-1 elements and HERVs [42][46]. Multi-species studies show that A3 genes have been amplified in mammalian genomes, that this expansion correlates positively with the germline expansion of ERVs, and that the amplification coincides in time with ancient retroviral invasions. Finally, deamination mediated by the A3 system increases the number of G-to-A point mutations in ERV sequences (cytidines deaminated in the minus-strand DNA appear as G-to-A substitutions in the plus strand), rendering them more susceptible to inactivation, for example through premature stop codons [42][46].
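The link between A3 activity and loss of coding capacity can be sketched in a few lines. Because A3 enzymes deaminate cytidines in the minus-strand DNA synthesized during reverse transcription, their footprint is usually read as G-to-A substitutions in the plus strand, and a tryptophan codon (TGG) needs only one such change to become a stop codon (TAG or TGA). The minimal sketch below assumes two already aligned, gap-free sequences of equal length read in frame 0; the function names `count_g_to_a` and `premature_stops` and the sequences are hypothetical, not taken from the cited studies.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_g_to_a(reference: str, query: str) -> int:
    """Count sites where the reference carries G and the query carries A.

    Assumes both sequences are already aligned, gap-free, and of equal length.
    """
    if len(reference) != len(query):
        raise ValueError("reference and query must be aligned to the same length")
    return sum(
        1 for r, q in zip(reference.upper(), query.upper()) if r == "G" and q == "A"
    )


def premature_stops(seq: str) -> list[int]:
    """Return 0-based indices of stop codons in frame 0, excluding the last codon
    (assumed to be the natural stop of the reading frame)."""
    s = seq.upper()
    codons = [s[i:i + 3] for i in range(0, len(s) - 2, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


reference = "ATGTGGTGGAAATAA"  # invented: Met-Trp-Trp-Lys-stop
query = "ATGTAGTGAAAATAA"      # same sequence with two G-to-A changes

print(count_g_to_a(reference, query), premature_stops(reference), premature_stops(query))
```

Here the two G-to-A changes turn both tryptophan codons into premature stops (codon indices 1 and 2), while the reference has none.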

5. Chromatin Remodeling

Chromatin remodeling, the repositioning or restructuring of nucleosomes that renders DNA regions accessible (or inaccessible) to transcription and/or epigenetic modification, is another HERV silencing mechanism [43][47]. One mediator of such changes is the SWItch/sucrose non-fermentable (SWI/SNF) complex. Dysregulation of polybromo 1 (PBRM1), a member of the SWI/SNF complex, leads to deregulated expression of HERV-ERI in clear cell renal carcinoma through a still unknown mechanism; repression of hypoxia-inducible factor 1α (HIF-1α), however, seems to be part of this SWI/SNF-mediated process [44][48]. Another important protein family in the epigenetic control of HERVs is the MORC proteins, which reduce DNA accessibility to the transcriptional machinery and then assist H3K9 trimethylation through the HUSH complex and DNA methylation by DNMTs [45][49]. Interestingly, MORC3 function is pivotal for ERV chromatin regulation in mice, as global ERV dysregulation was described in MORC3-knockout mice. The HUSH complex also recruits the nuclear exosome targeting (NEXT) complex, which degrades intronless RNAs after their synthesis as a post-transcriptional control mechanism. HUSH and NEXT are recruited simultaneously in the nucleus, where, once bound, they regulate ERV expression at both the transcriptional and post-transcriptional levels [46][50] (Figure 3).