After intensive research, there is a consensus that FSHD is caused by the aberrant expression of the full-length isoform of the double homeobox 4 (DUX4) transcription factor, particularly in skeletal muscle nuclei [
9].
DUX4 is normally expressed in the early stages of development, in stem cells and in the germ line, especially in the testis, whereas it is repressed via a repeat-mediated epigenetic silencing (methylation) mechanism [
10] during cell differentiation [
11] and in most adult somatic tissues, including muscle [
12,
13], except for the thymus [
14] and keratinocytes [
15]. However, the precise mechanism by which this gene induces dystrophic changes, as well as the changes themselves induced at the cellular level, are still controversial and under investigation [
12,
15]. Tassin et al. have proposed a dynamic model for DUX4 protein expression in FSHD myotubes. In this model, the DUX4 transcription factor, initially expressed in a few nuclei, diffuses to many nuclei of the myotube and thus activates a transcriptional deregulation cascade in each nucleus that it reaches. The resulting dystrophic phenotype compromises the muscle tissue. The researchers hypothesized that even low levels of DUX4 could result in the formation of amorphous muscle cells [
16]. The relatively higher presence of DUX4 in myotubes as opposed to proliferating myoblasts may imply that
DUX4 transcription is induced during differentiation.
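The spreading behavior described in this dynamic model can be illustrated with a deliberately simple toy simulation. It is only a sketch and not the model of Tassin et al.: it assumes that the nuclei of one myotube sit in a single row and that, at each step, DUX4 protein reaches the immediate neighbors of every DUX4-positive nucleus. The function name `spread_dux4` and all parameter values are hypothetical.

```python
def spread_dux4(n_nuclei, initial_positive, steps):
    """Toy model: list the DUX4-positive nuclei after each step.

    Assumption (illustrative only): nuclei are arranged in a row inside one
    myotube, and at every step DUX4 protein reaches the immediate neighbours
    of each nucleus that is already positive.
    """
    positive = set(initial_positive)
    history = [sorted(positive)]
    for _ in range(steps):
        reached = set(positive)
        for i in positive:
            for j in (i - 1, i + 1):
                if 0 <= j < n_nuclei:  # stay inside the myotube
                    reached.add(j)
        positive = reached
        history.append(sorted(positive))
    return history


if __name__ == "__main__":
    # One initially positive nucleus in the middle of a 9-nucleus myotube.
    for step, nuclei in enumerate(spread_dux4(9, [4], 3)):
        print(step, nuclei)
```

In this sketch a single positive nucleus is enough for the whole row to become positive after a few steps, which mirrors the idea that expression in a few nuclei can affect many nuclei of the same myotube.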
2. Myogenesis and Muscle Regeneration
Skeletal muscle, one of the three major muscle types, is a contractile tissue responsible for movement, maintaining posture, supporting soft tissues, and maintaining temperature. Tendons are bundles of collagen fibers that connect skeletal muscles to bones, skin, and other muscles. Skeletal muscle is composed of multinuclear cells called myofibers [
17], which are formed by the fusion of myoblasts during development [
18]. When a muscle is injured, it activates a complex response that leads to tissue regeneration [
19,
20,
21]. Skeletal muscle regeneration is primarily mediated by satellite cells (SCs) that receive signals from the surrounding environment [
17,
22,
23,
24] and that replenish myogenic progenitor cells and differentiate into new myofibers for muscle repair in response to injury [
17,
25,
26]. Muscle regeneration and differentiation are initiated by changes in the expression of specific genes and proteins, the myogenic regulatory factors (MRFs, summarized in
Table 1).
Table 1. Myogenic regulatory factors (MRFs) related to FSHD.
The term myogenesis refers to the complex cellular process that mediates the formation of a skeletal muscle fiber starting from a myogenic precursor during embryonic development as well as in adult tissue repair [
29] (
Figure 2). Myogenesis starts in the first week of embryonic development at the level of the myotome, the sub-medial area of the somite, whose cells split off from it, elongate, and gradually differentiate into myoblasts [
30]. The fusion of myoblasts into myotubes precedes the development of mature muscle fibers, so that the entire locomotor apparatus originates from the myotomes of the various subtypes. Myoblasts and maturing muscle fibers are also surrounded by a fibroblastic connective tissue scaffold that guides their development and spatial organization [
31]. Other factors, such as the surrounding environment and paracrine factors, actively control and regulate the process by activating muscle-specific genes. The genes encoding the factors responsible for myogenic differentiation begin to be expressed in a coordinated manner in proliferating myoblasts [
32]. MRFs are products of these genes (
Figure 2): their expression is mediated by paracrine factors and the factors present in the surrounding microenvironment, which first activate the paired box gene 3 (Pax3) transcriptional factor [
33].
Figure 2. Myogenic markers and stage-specific expression of the major proteins involved in muscle differentiation.
The major MRFs are transcription factors belonging to the family of basic helix–loop–helix (bHLH) myogenic proteins: myoblast determination protein 1 (MyoD) and myogenic factor 5 (Myf5), which are responsible for myoblast commitment, and myogenic regulatory factor 4 (MRF4, also known as Myf6) and myogenin, which are required to keep myotubes differentiated [
28]. Pax3 directly activates the transcription of MyoD and Myf5. Myoblasts are myotome cells that produce bHLH proteins and can proliferate in the presence of specific growth factors. The depletion of these factors is responsible for cell proliferative arrest, fibronectin secretion, and the expression of integrin receptors. The fibronectin–integrin adhesion signal is required for myoblasts to start the differentiation process. Cell–cell recognition is the event that causes the arrest of the cell cycle [
34,
35].
Cell fusion can then begin, giving rise to myotubes. At this point, cells are unable to respond to mitogenic stimuli, and during the late stages of the process, myoblasts that have already fused secrete factors that promote the fusion of additional myoblasts to the forming myotube. During embryonic myogenesis, mesoderm-derived structures generate the first muscle fibers of the body proper, and in subsequent waves, additional fibers are generated along these template fibers [
36,
37]. Although the mesoderm is the only germ layer of a trilaminar embryo capable of generating skeletal muscle, the exact sites of origin and regulators of body muscle vary, depending on the specific embryonic muscle group, i.e., dorsal or ventral trunk muscle, limb muscle, and head or neck muscle [
38]. Pax3-positive myogenic stem cells, named SCs, are located between the basal lamina and the sarcolemma of their associated myofibers and ensure adult muscle growth. These cells can both replicate themselves (self-renewal) and, after activation, exit their quiescent state and give rise to proliferating myoblasts by re-entering the cell cycle.
Skeletal muscle can regenerate itself on a daily basis as well as in response to injury [
39]. This ability is due, at least partly, to the adult stem cell population that has been named SCs because of their location at the periphery of mature skeletal myofibers. Muscle regeneration depends on a balance between pro-inflammatory and anti-inflammatory factors, which determine whether the damage is repaired with muscle fiber replacement and the reconstitution of a functional contractile apparatus, or with scar formation [
40]. Muscle tissue repair following damage can be thought of as a process with two interdependent phases: degeneration and regeneration.
The first event, degeneration, is characterized by the disruption of myofibers. In the early stages of muscle injury, inflammatory cells usually infiltrate the damaged muscle. Among the primary immune cells involved, macrophages play a critical role: after the infiltration, macrophages phagocytose cellular debris and remove disrupted myofilaments, other cytosolic structures, and the damaged sarcolemma. Following injury, muscle repair processes are activated and quiescent SCs enter a massive proliferation phase, allowing the myogenic cell population to expand. This proliferation is characterized by an asymmetric cell division in which SCs replicate themselves for their self-renewal and can generate proliferating myoblasts for the development of the new muscle fibers. Myoblasts can differentiate and fuse in order to repair or to generate new fibers [
41]. Muscle regeneration is regulated by a family of muscle-specific, basic helix–loop–helix transcription factors called MRFs, including MRF4, myogenin, MyoD, and Myf5. After muscle injury, Myf5 and MyoD are typically the first MRFs to be expressed in the regenerating muscle cells, followed by myogenin, and finally MRF4. However, MyoD and Myf5 play different roles in the process of muscle regeneration. While MyoD promotes SCs’ progression to terminal differentiation, Myf5 promotes SCs’ self-renewal [
28].
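The temporal order of MRF expression described above can be written down as a small lookup structure. This is only an illustrative sketch: the grouping into "waves", the role labels, and the names `MRF_WAVES`, `MRF_ROLES`, `expression_wave`, and `expressed_before` are assumptions introduced here to restate the text, not an established data model.

```python
# Order of MRF expression after muscle injury, as described in the text:
# Myf5 and MyoD first, then myogenin, and finally MRF4.
MRF_WAVES = [("Myf5", "MyoD"), ("myogenin",), ("MRF4",)]

# Roles summarized from the text (simplified labels chosen for this sketch).
MRF_ROLES = {
    "Myf5": "promotes satellite cell self-renewal",
    "MyoD": "promotes progression to terminal differentiation",
    "myogenin": "keeps myotubes differentiated",
    "MRF4": "keeps myotubes differentiated",
}


def expression_wave(mrf):
    """Return the 1-based wave in which an MRF is first expressed."""
    for wave, members in enumerate(MRF_WAVES, start=1):
        if mrf in members:
            return wave
    raise KeyError(f"unknown MRF: {mrf}")


def expressed_before(first, second):
    """True if `first` belongs to an earlier wave than `second`."""
    return expression_wave(first) < expression_wave(second)


if __name__ == "__main__":
    for name, role in MRF_ROLES.items():
        print(expression_wave(name), name, "-", role)
```

MyoD and Myf5 share the first wave in this sketch, so `expressed_before("MyoD", "Myf5")` is false even though the two factors play different roles.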
3. Myoblast Fusion
Myoblast fusion is the process that results in the generation of syncytial muscle cells. It can occur between myoblasts (primary fusion) or between myoblasts and existing myotubes (secondary fusion), and it also takes place during muscle regeneration. Injury is sufficient to activate SCs, which can produce new myoblasts after an asymmetric cell division that is necessary to maintain the SC pool [
42]. Recent studies in a variety of model organisms have uncovered many molecular components required for myoblast fusion. Mechanistic studies of these components suggest that muscle cells go through at least three consecutive steps before forming a fusion pore [
43]. The first step is the exit of myoblasts from the cell cycle: as long as growth factors (particularly fibroblast growth factors) are present, myoblasts proliferate without differentiating. The second step is cell recognition, which involves the alignment of myoblasts into chains. The third step is the cell fusion event itself. Many steps in this process are facilitated by the actin cytoskeleton. Myoblasts need cytoskeletal shape changes to migrate toward their sites of fusion, and a reorganization of the actin cytoskeleton is required for the subsequent steps: recognition, adhesion, and vesicle transport [
44]. Membrane lipids are also involved: although glycolipids and cholesterol are less abundant than phospholipids, they play an important role in regulating membrane polarity and fluidity [
45]. Moreover, cholesterol is required for the formation of specialized membrane regions responsible for the regulation of fusion signaling, such as lipid rafts and caveolae [
46]. Another fundamental class of molecules comprises the proteins involved in recognition and adhesion. This step requires specific integrin family members and cell adhesion molecules (CAMs). Recognition is also mediated by cell membrane glycoproteins, including several cadherins [
47]. Among these, M-cadherin (M-cad) plays an important role in mammalian myoblast regeneration, but it has also been found in developing muscle, even though its expression is more evident after the initial fusion steps have been completed. M-cad is also expressed in SCs and the sarcolemma. Once fusion occurs, M-cad signaling is switched off by the movement of M-cad into caveolae. M-cad is thus sequestered from the plasma membrane and subsequently transported to the proteasome for degradation [
48]. Furthermore, recent studies have identified specific cell signaling pathways whose activation results in the expression of genes required for the fusion process and cytoskeleton rearrangement regulation.
4. Genetics of FSHD
On the basis of their underlying epigenetic mechanism, two well-defined subtypes of FSHD exist, namely FSHD1 and FSHD2. Both of them often show an autosomal dominant pattern of inheritance and result in chromatin relaxation and abnormal
DUX4 expression in skeletal muscle, leading to progressive muscle weakness and atrophy [
49,
50].
FSHD1 represents the most common form, accounting for about 95% of all FSHD cases, and is caused by the partial deletion (shortening or contraction) of the macrosatellite D4Z4 repeat, located in the subtelomeric region of chromosome 4 (4q35) (
Figure 3) [
16,
51,
52,
53]. The D4Z4 macrosatellite, consisting of repeated units of 3.3 kb, is highly polymorphic: in the healthy population, the number of copies varies between 11 and 150, whereas in affected individuals, the number of copies ranges between 1 and 10. This contraction results in a partial loss of D4Z4 DNA methylation, which ultimately leads to
DUX4 transcription in skeletal muscle [
54]. The severity of the disease increases as the number of repeat units decreases [
55,
56]. The decrease in D4Z4 units leads to chromatin relaxation and hypomethylation, allowing DUX4 transcription in muscle cells [
53]. In addition to
DUX4, other genes located in the 4q35 region proximal to the D4Z4 repeat array, such as the FSHD region genes 1 and 2 (
FRG1,
FRG2), adenine nucleotide translocator 1 (
ANT1) and FAT atypical cadherin 1 (
FAT1), seem to be inappropriately overexpressed in affected muscles [
57], but their role in both the onset and severity of disease is still controversial.
FRG1 has been considered a candidate gene because a phenotype similar to that of FSHD develops in a murine model overexpressing
FRG1 [
57], and because it has been linked to muscle development [
10]. Indeed, although
FRG1 expression is subject to extreme variability, it has been found to be upregulated in affected patients.
FRG2 lies 37 kb proximal to D4Z4 and is specifically upregulated in differentiating FSHD muscle cells [
58]. This gene does not appear to be expressed in some FSHD patients who have an extended deletion at the proximal portion of the macrosatellite, implying that its dysregulation is more likely the result of epigenetic changes than a direct cause of pathology [
59,
60]. In support of this,
FRG2 overexpression in mouse models did not result in the development of muscular dystrophy [
61,
62].
ANT1 encodes a mitochondrial homodimeric protein that is localized asymmetrically on the inner mitochondrial membrane. The dimer forms a membrane channel through which cytoplasmic ADP is exchanged for matrix ATP, and it is thus essential for cellular oxidative metabolism. ANT1 protein levels appear to be higher in FSHD muscles than in healthy controls or patients with Duchenne muscular dystrophy [
57], making muscle cells more susceptible to oxidative stress and apoptosis [
63]. While the involvement of
FRG1,
FRG2, and
ANT1 in FSHD pathogenesis is still debated, the role of the
FAT1 gene in this disease has been confirmed by independent studies [
64,
65,
66]. FAT1 is a member of the cadherin-like protein family and is involved in the regulation of tissue growth, morphogenesis, and polarity during development [
67]. The first association between
FAT1 and FSHD was reported in
Fat1-deficient mice, which showed muscular and non-muscular phenotypes resembling FSHD symptoms and pathological features [
64]. Other authors observed a lower expression of
FAT1 in diseased adult muscles than in matched controls, and this expression did not appear to be regulated by DUX4 [
65]. They also found that
FAT1 is expressed at lower levels in muscles that are affected early in FSHD than in muscles that are affected later or not at all, in control human fetal biopsies or developing mouse embryos [
65]. Additional experimental research and case reports have further confirmed
FAT1 as a gene involved in disease onset and severity [
64,
66]. However, further in-depth studies are needed to clearly understand its role along with the cellular and molecular mechanisms leading to its altered expression in FSHD cells [
64,
66,
68].
Figure 3. Representation of the FSHD locus. The D4Z4 repeat array is located in the subtelomere of chromosome 4 and can vary between 11 and 100 copies in healthy individuals. In FSHD patients, the structure of D4Z4 adopts a more open configuration and has fewer copies (between 1 and 10).
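The repeat-count thresholds given in the text (1 to 10 units in affected individuals, 11 or more in the healthy population, 3.3 kb per unit) can be restated as a short sketch. The function and constant names are hypothetical, and the classification looks at the repeat count only; it is not a diagnostic rule, since the text makes clear that other genetic and epigenetic factors also matter.

```python
D4Z4_UNIT_KB = 3.3          # size of one D4Z4 repeat unit, per the text
CONTRACTED_MAX_UNITS = 10   # 1-10 units: range reported in affected individuals


def d4z4_array_size_kb(units):
    """Approximate size of the D4Z4 array in kb for a given number of units."""
    return units * D4Z4_UNIT_KB


def classify_d4z4(units):
    """Classify a 4q35 D4Z4 array by repeat count only (illustrative).

    No upper bound is enforced for the normal range.
    """
    if units < 1:
        raise ValueError("the D4Z4 array must contain at least one unit")
    if units <= CONTRACTED_MAX_UNITS:
        return "contracted (FSHD1 range)"
    return "normal range"


if __name__ == "__main__":
    for n in (3, 10, 11, 60):
        print(n, round(d4z4_array_size_kb(n), 1), "kb", classify_d4z4(n))
```

For example, an array of 10 units spans roughly 33 kb and falls in the contracted range, whereas an array of 11 units is classified as normal in this sketch.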
The rarer FSHD2 form, which accounts for the remaining 5% of FSHD patients, has been attributed to variants in D4Z4 chromatin repressors, mainly in the structural maintenance of chromosomes flexible hinge domain-containing 1 (
SMCHD1) gene, which encodes a chromatin remodeling factor important for DNA methylation [
69,
70]. Interestingly,
SMCHD1 mutations have also been reported to act as modifiers of disease severity in patients with FSHD1 [
71,
72], suggesting that FSHD types 1 and 2 form a disease continuum rather than separate entities [
73]. Rare heterozygous variants in the DNA methyltransferase 3β (
DNMT3B) gene have also been associated with FSHD2 manifestation and penetrance [
74]. Intriguingly, biallelic
DNMT3B mutations are also responsible for immunodeficiency, centromeric instability, and facial anomalies syndrome type 1 (ICF1) [
75]. Moreover, a homozygous mutation in the
LRIF1 gene (encoding ligand-dependent nuclear receptor-interacting factor 1) that suppresses the long isoform of this protein has recently been detected in a patient with FSHD2.
Rare FSHD cases have been linked to uncommon DNA changes leading to D4Z4 chromatin relaxation, thus allowing for
DUX4 transcription [
76,
77,
78]. Intronic mutations in
SMCHD1 influencing mRNA splicing [
79], partial deletions of the D4Z4 macrosatellite repeat array extending proximally into surrounding non-D4Z4 sequences [
61,
80], and small duplications of the D4Z4 macrosatellite repeat region [
78,
81] are examples of these alterations.
However, disease-causing mutations have not been found in a small subset of FSHD patients with a normal-sized but transcription-permissive D4Z4 macrosatellite repeat, suggesting the existence of additional FSHD-related modifiers, probably other D4Z4 chromatin modifiers [
82].
FSHD has been proven to be a genetically heterogeneous disorder involving both genetic and epigenetic alterations. A thorough genomic analysis of the 4q35 region resulted in the identification of several haplotypes. In particular, 15 single-nucleotide polymorphisms (SNPs) were discovered in a region near D4Z4 (the D4F10S1 region), and a second large region with sequence variants (alleles A, B, and C) was also identified distal to D4Z4. Based on these differences, 4q alleles can be classified into 18 haplotype variants, with macrosatellite deletions being pathogenic in only a few of them (4qA161, 4qA159, and 4qA168) [
55,
83]. In permissive haplotypes (the 4qA allele), a SNP creates an ATTAAA polyadenylation signal. In non-permissive haplotypes, the sequence is instead ATCAAA, which is not a functional polyadenylation signal. Transfection experiments that introduced the functional site into non-permissive alleles and removed it from permissive alleles confirmed the importance of the polyadenylation site [
83]. A stable
DUX4 transcript was detected solely in the presence of the polyadenylation site, and this transcript appears to directly affect the etiopathogenesis of FSHD. D4Z4 deletions in permissive alleles, on the other hand, are not always sufficient to cause FSHD; in fact, asymptomatic carriers have been observed. This suggests that these haplotypes are a permissive condition for dystrophy development rather than its sole cause. More generally, the large number of D4Z4-homologous sequences found in the genome, along with the complexity of the subtelomeric 4q region, has always made the molecular mechanisms underlying FSHD particularly difficult to understand [
55,
84].
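The role of the polyadenylation signal can be summarized in a few lines of code. The sketch below only restates the logic of the text under simplifying assumptions: a stable DUX4 transcript is taken to require both relaxed D4Z4 chromatin and the functional ATTAAA hexamer. The names `pas_status` and `permissive_for_stable_dux4` are hypothetical, and a positive result means "permissive", not "affected", because asymptomatic carriers exist.

```python
FUNCTIONAL_PAS = "ATTAAA"      # permissive haplotypes (4qA), per the text
NONFUNCTIONAL_PAS = "ATCAAA"   # non-permissive haplotypes, per the text


def pas_status(hexamer):
    """Classify the hexamer distal to D4Z4 as a polyadenylation signal."""
    seq = hexamer.strip().upper()
    if seq == FUNCTIONAL_PAS:
        return "functional"
    if seq == NONFUNCTIONAL_PAS:
        return "non-functional"
    return "unknown"


def permissive_for_stable_dux4(chromatin_relaxed, hexamer):
    """True if both conditions for a stable DUX4 transcript are met.

    Assumption (simplified from the text): relaxed D4Z4 chromatin allows
    transcription, and the functional polyadenylation signal stabilizes
    the transcript. This is a necessary condition, not a diagnosis.
    """
    return bool(chromatin_relaxed) and pas_status(hexamer) == "functional"


if __name__ == "__main__":
    print(permissive_for_stable_dux4(True, "ATTAAA"))
    print(permissive_for_stable_dux4(True, "ATCAAA"))
    print(permissive_for_stable_dux4(False, "ATTAAA"))
```

Only the first call returns True in this sketch: relaxed chromatin on a non-permissive allele, or a permissive allele with normally repressed chromatin, does not satisfy both conditions.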