DUX4 rearrangement (DUX4r) is a recently discovered recurrent genomic lesion reported in 4–7% of B-cell acute lymphoblastic leukaemia (B-ALL) cases. The gene fusion most commonly links the hypervariable IGH gene to DUX4, a gene located within the D4Z4 macrosatellite repeat on chromosome 4. DUX4r is cryptic to most standard diagnostic techniques and difficult to identify even with next-generation sequencing assays.
B-cell acute lymphoblastic leukaemia (B-ALL) is a malignant disorder of the bone marrow resulting in overproliferation of immature B lymphoblasts. The disease can manifest at any age, but the majority of patients are children, making B-ALL the most common childhood malignancy. This heterogeneous disease is characterised by a variety of genomic alterations, including changes in chromosome number, chromosomal translocations and single nucleotide variants (SNV). Detection of the underlying genomic alterations assists clinical risk stratification and therapeutic triage. Cytogenetic analysis has proven adept at identifying several recurrent genomic alterations that result in diseases with distinct gene expression profiles (GEP) and defined prognosis. These include high hyperdiploidy, hypodiploidy, the translocations t(12;21) [ETV6-RUNX1], t(9;22) [BCR-ABL1] and t(1;19) [TCF3-PBX1], and alterations of chromosome 11q23 resulting in rearrangement of KMT2A/MLL. These alterations account for approximately 60% of pediatric B-ALL cases. The remaining patients were historically classified as B-other and demonstrated highly variable prognosis and treatment response.
Molecular studies involving GEP and next-generation sequencing (NGS) have subsequently identified a number of additional recurrent molecular alterations not detectable with standard cytogenetics, several of which may be targetable by precision medicine approaches. These include the newly recognized subtype of Philadelphia chromosome-like (Ph-like) ALL, characterized by a gene expression profile similar to that of cases with a BCR-ABL1 translocation but instead carrying one of multiple kinase-activating lesions. One example is rearrangement of the cytokine receptor gene CRLF2 (CRLF2r), commonly with concurrent Janus Kinase 2 (JAK2) mutations, affecting 5–7% of children with B-ALL. NGS has also recently identified a rearrangement of the gene encoding the homeodomain-containing Double Homeobox 4 (DUX4) transcription factor with the immunoglobulin heavy chain (IGH) locus, which results in a distinct genetic subtype.
As early as 2002, researchers identified a novel B-ALL subtype with a distinct microarray GEP, not associated with any known recurrent genomic alteration, that appeared to confer a good prognosis. Interrogation of overexpressed genes in these patients failed to uncover a causative lesion. Follow-up studies involving copy number alteration (CNA) analysis revealed that many of the patients with the distinct expression profile also carried a deletion of the ERG gene (ETS transcription factor ERG), a genomic alteration absent in almost all other subtypes. ERG deletion was consequently proposed as the driving lesion in this subtype. Multiple studies have subsequently demonstrated that monoallelic deletion of ERG is observed in only a subset of patients with this GEP. Furthermore, ERG deletions were subclonal in several patients at diagnosis and either altered or absent at relapse.
In 2016, two independent studies identified rearrangement of the DUX4 locus (DUX4r), most commonly partnered with IGH, in patients with the previously detected GEP. Transduction of the DUX4 fusion transcript into NIH3T3 fibroblasts resulted in cellular transformation, demonstrating the oncogenic potential of this alteration. Multiple studies have subsequently confirmed the unique GEP of DUX4r cases and identified the rearrangement in 4–7% of B-ALL patients, as well as in the NALM6 cell line. Leukaemic cells carrying the DUX4r also display a unique methylation profile, associated with widespread hypomethylation, and express a specific non-coding RNA signature. Consequently, DUX4r, also reported in the literature as DUX4/ERG, has increasingly been accepted as a distinct molecular subtype of B-ALL. In this review, we present the current understanding of the molecular structure and biological effects of the DUX4r.
The DUX4 gene is present within each repeat of the D4Z4 tandem array located in the subtelomeric region of chromosome 4q, with an almost identical locus (>98% nucleotide identity) on 10q. The D4Z4 array is polymorphic in length, containing between 11 and 100 copies of the 3.3 kb repeat in healthy adults (Figure 1). In healthy tissue, transcription of DUX4 is restricted to germline cells of the testes. Transcription has also been observed in induced pluripotent stem cells, suggesting a role for DUX4 in germline development. Expression of the full-length DUX4 transcript is epigenetically silenced in somatic tissue. Only the first exon of the spliced transcript contains coding sequence for the protein, which consists of two N-terminal homeodomains capable of DNA binding and a C-terminal transactivation domain. The DUX4 protein is capable of binding to, and upregulating expression of, multiple genes, as well as initiating expression from alternate promoters, producing non-canonical transcript isoforms.
Contraction of the D4Z4 region resulting in fewer than 10 repeats is associated with facioscapulohumeral muscular dystrophy (FSHD), a genetically inherited disorder that initially manifests as progressive weakening of the facial, shoulder and upper arm muscles. Partial deletion of the D4Z4 array is associated with hypomethylation and loss of repressive histone modifications, which are believed to reduce chromatin packing of the subtelomeric region and so allow DUX4 expression. Intriguingly, FSHD only manifests in patients who demonstrate D4Z4 contraction on chromosome 4q and not the homologous array on chromosome 10q. Furthermore, contraction of an alternative chromosome 4q allele (4qB) does not result in disease. Sequencing efforts have subsequently revealed that the permissive 4qA allele carries a polymorphism in the region immediately distal to the final repeat of the array (Figure 1). This polymorphism creates a canonical polyadenylation signal in the 3’ UTR of DUX4, enabling expression of a stable mRNA transcript. The translated protein then binds multiple target genes, resulting in widespread changes in gene expression that are ultimately cytotoxic.
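The conditions described above can be summarised as a simple rule: disease is associated with an array that is contracted, located on chromosome 4q, and carried on the permissive 4qA allele. The following Python sketch illustrates that rule and the corresponding array lengths. The class and function names are hypothetical, the threshold is taken directly from the text ("fewer than 10 repeats"), and the logic is a deliberate simplification for illustration rather than a diagnostic criterion.

```python
from dataclasses import dataclass

REPEAT_UNIT_KB = 3.3          # size of one D4Z4 repeat, as stated in the text
CONTRACTION_THRESHOLD = 10    # "fewer than 10 repeats", as stated in the text


@dataclass(frozen=True)
class D4Z4Allele:
    """Hypothetical record describing one D4Z4 array (illustrative only)."""
    chromosome: str   # "4q" or "10q"
    haplotype: str    # "4qA" or "4qB" for 4q arrays; "" for 10q
    repeats: int      # number of D4Z4 repeat units


def array_length_kb(allele: D4Z4Allele) -> float:
    """Approximate array length in kb from the repeat count."""
    return round(allele.repeats * REPEAT_UNIT_KB, 1)


def is_fshd_permissive_contraction(allele: D4Z4Allele) -> bool:
    """True only if all three conditions from the text hold:
    contracted array, chromosome 4q, and the 4qA allele."""
    contracted = allele.repeats < CONTRACTION_THRESHOLD
    return contracted and allele.chromosome == "4q" and allele.haplotype == "4qA"


if __name__ == "__main__":
    examples = [
        D4Z4Allele("4q", "4qA", 6),    # contracted, permissive allele
        D4Z4Allele("4q", "4qB", 6),    # contracted, non-permissive allele
        D4Z4Allele("10q", "", 6),      # contracted, homologous 10q array
        D4Z4Allele("4q", "4qA", 30),   # permissive allele, not contracted
    ]
    for allele in examples:
        print(allele, array_length_kb(allele), is_fshd_permissive_contraction(allele))
```

Only the first example satisfies all three conditions. The other three each fail one of them, mirroring the observation that contraction alone is not sufficient for disease.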
DUX4 has also been implicated in several cancers involving rearrangements that produce chimeric proteins with altered transcriptional activity. For example, a recurrent translocation between Capicua Transcriptional Repressor (CIC) and DUX4 occurs in a proportion of patients with Ewing-like sarcoma. This chromosomal rearrangement produces an in-frame transcript that contains the first 20 exons of CIC but replaces the terminal exons with the 3’ portion of DUX4. Translation of the chimeric transcript produces a protein that retains the majority of CIC, including the N-terminal DNA binding domains, but replaces the C-terminus with the DUX4 transactivation domain. As a result, the chimeric CIC-DUX4 protein acts as an oncogenic transcriptional activator. In B-ALL, translocation of DUX4 results in a different chimeric protein, but one that again acts as an oncogenic transcriptional activator. In all but one reported case, the 5’ coding sequence of DUX4 is cryptically inserted into an alternate genomic location, resulting in expression of a chimeric transcript that retains the sequence encoding the N-terminus of DUX4 but replaces the 3’ coding sequence (Figure 2). While multiple potential fusion partners have been identified, including ERG, DUX4 is most commonly inserted into the IGH locus.
Multiple rearrangements of IGH (IGHr) have been reported in B-ALL, resulting in expression or overexpression of genes with oncogenic potential. The most common of these is a translocation between chromosome 14 and CRLF2 in the pseudoautosomal region of chromosome X/Y, resulting in increased expression of cytokine receptor-like factor 2. In the case of DUX4, most analyses report that a portion of the D4Z4 array on 4q or the homologous region on 10q, consisting of either a partial copy of DUX4 or one complete and one partial D4Z4 repeat, is inserted into the IGH locus, placing it close to the IGH enhancer (E). As with other IGHr, the presence of the enhancer induces expression of the translocated gene. Repeats containing DUX4 can be inserted in either orientation, resulting in expression from the positive or negative strand. In some cases, a more complex rearrangement involving sequences from a third genomic location has been reported (Figure 1C). Alternatively, Hi-C data from the NALM6 cell line suggest that a reciprocal translocation can occur in which the telomeric ends of 4/10q are exchanged with 14q.
IGH breakpoints are enriched in the 3.5 kb region preceding the IGHM constant allele and overlapping the IGH D-J junctions, but can occur throughout the locus. Breakpoint locations for the DUX4 gene are harder to define given the repetitive nature of the D4Z4 array, but most commonly occur within the 5’ region upstream of DUX4 and within the 3’ coding region of exon 1. This results in a DUX4 transcript which maintains the homeodomains encoded at the 5’ end of the transcript, fused to sequence that usually derives from the IGH-JH or IGH-DH regions but can also come from another genomic location. The resulting protein thus maintains its ability to bind DUX4 targets but possesses a truncated C-terminus that includes some amino acids encoded by the alternate locus. As the genomic breakpoints for this rearrangement are highly variable, the length and amino acid sequence of the C-terminal domain of the DUX4 fusion vary considerably between patients, but the fusion consistently retains the DNA binding homeodomains (Figure 2).
Figure 1. Potential chromosomal rearrangements involving IGH and DUX4. (A) Ideogram of chromosome 4 indicating the location of the D4Z4 array, and a depiction of the two alleles, which vary in the sequence distal to the final repeat (repeats indicated by open triangles). This includes the permissive 4qA allele, which can result in FSHD when contracted to fewer than 10 repeats. (B) Ideogram of chromosome 10 indicating the location of the homologous D4Z4 array with 98% identical sequence. This chromosome is associated with a non-permissive allele which does not result in FSHD. (C) Schematic diagram of the 3.3 kb repeat sequence indicating the location and exons of the DUX4 gene. (D) Ideogram of chromosome 14 and depiction of the IGH locus indicating constant (CH), joining (JH), diversity (DH) and variable (VH) alleles. (E) Schematic diagram depicting possible rearrangements resulting from cryptic insertion of DUX4 from either chromosome 4 or 10 into the IGH locus. DUX4 can be inserted in either orientation, can include only a partial copy or one complete and one partial copy of the repeat, and can also be inserted with sequence from a third genomic location.