1. Molecular Biology for Sarcoma Diagnosis
Cancer diagnosis is classically based on pathology, so cancers are usually classified according to their organ and/or supposed tissue of origin. However, cancer is primarily a genetic disease, and it has become clear that pathologically homogeneous cancers can harbor considerable heterogeneity in their underlying genetic make-up. Since the genetic alterations leading to oncogenesis determine the behavior of the tumor, it has become increasingly essential to characterize them for better diagnosis and prognosis, and potentially for treatment guidance.
Soft tissue sarcomas are no exception: their classification is historically based on histological characteristics, but molecular biology has refined the diagnostic nosology of this large and heterogeneous group of tumors. For simplicity, sarcomas are classically divided into two groups based on genomic characteristics: (1) sarcomas with a single driver molecular alteration (or sarcomas with “simple genetics”) and (2) sarcomas with a complex genomic profile (sarcomas with “complex genetics”)
[1][3]. The former group comprises sarcomas that are defined by specific driver molecular alterations, mainly oncogenic gene fusions, but also activating or inactivating mutations, or gene amplifications. Their overall genomic profile is usually “simple” with near-diploid karyotypes, meaning that there are few genomic alterations other than the driver alteration. Although the oncogenic properties of all the gene fusions found in rare sarcomas have not yet been assessed in relevant models, their similarities in terms of structure, the homogeneity of the gene expression profiles of tumors with a given fusion, and the scarcity of other genomic alterations found in their genomes suggest that these molecular alterations are a very early driver event in the oncogenesis of these tumors. This contrasts with the second group of sarcomas, which harbor highly rearranged genomic profiles, with large numbers of chromosomal and copy number alterations as well as point mutations, including in tumor-suppressor genes, often reflecting genomic instability. This binary classification is probably an oversimplification and may be misleading: for instance, dedifferentiated liposarcoma is characterized by a driver alteration (
MDM2 amplification), but it also has a highly rearranged genomic profile
[2][4].
For the group of sarcomas with a driver alteration, molecular biology is logically essential for their accurate diagnosis and characterization. For other sarcomas, it also has the potential to inform diagnosis, especially as a useful tool to distinguish them from morphologically similar benign tumors.
2. Sarcomas with “Simple Genetics”
Sarcomas with a simple genetic driver alteration represent 30% to 40% of soft tissue sarcomas. They are characterized by specific molecular alterations that are usually pathology-defining; molecular biology is therefore essential to make the diagnosis. Classically, these molecular alterations are divided into oncogenic gene fusions, activating and inactivating point mutations, and gene amplifications.
2.1. Gene Fusions
The most common driver alterations in sarcomas are gene fusions. A large number of sarcomas are translocation related, i.e., the result of a chromosomal translocation giving rise to a fusion gene encoding an oncogenic fusion protein, usually a chimeric transcription factor
[3][5]. The paradigm of this model of oncogenesis is Ewing sarcoma
[4][6]: this tumor, which arises in bone but also in soft tissues in adolescents and young adults, is characterized by a translocation between chromosomes 11 and 22, giving rise to a fusion gene,
EWS-FLI1, leading to a chimeric transcription factor with oncogenic properties
[5][7]. In recent years, dozens of other sarcoma-defining gene fusions have been described, thus extending the number of subtypes of oncogenic fusion-driven sarcomas and refining the classification of often similar-looking but biologically different tumors. Most gene fusions involve transcription factors, though some may lead to constitutive activation of a receptor tyrosine kinase or growth factor.
In clinical practice, the oncogenic fusion is detected using molecular techniques such as fluorescence in situ hybridization (FISH), reverse transcription–polymerase chain reaction (RT-PCR), or targeted RNA sequencing
[6][8]. FISH detects rearrangement of the genes involved in the fusion, while RT-PCR and targeted RNA sequencing search for the resulting RNA transcript in tumor cells. Although these methods are highly sensitive, specific, and accessible in most routine labs, they are targeted assays, and they require a good
a priori knowledge of the differential diagnoses.
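The dependence of targeted assays on prior knowledge can be illustrated with a minimal sketch that treats a targeted test as a lookup against a predefined panel. The panel contents, the function name `targeted_call`, and the input format are hypothetical choices made for illustration; they do not describe any real assay or software.

```python
# Hypothetical panel: only fusions listed here can be reported by the targeted test.
# The two entries reuse fusions named in this text; a real panel would differ.
TARGETED_PANEL = {
    frozenset({"CIC", "DUX4"}),
    frozenset({"EWSR1", "SSX1"}),
}


def targeted_call(gene_a, gene_b, panel=TARGETED_PANEL):
    """Return True only if the gene pair is covered by the (hypothetical) panel."""
    return frozenset({gene_a, gene_b}) in panel


# A fusion covered by the panel is reported, whatever the order of the partners.
print(targeted_call("DUX4", "CIC"))   # True
# A fusion absent from the panel is missed, even if it is present in the tumor.
print(targeted_call("CIC", "NUTM1"))  # False
```

In this toy model, a tumor carrying a fusion outside the panel returns a negative result, which is why the differential diagnosis has to be well defined before a targeted assay is chosen.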
In contrast, a more recent technique based on next-generation sequencing and increasingly used for diagnosis of sarcomas is whole transcriptome profiling (RNA sequencing, RNA-seq). Using this unsupervised technique, a single assay can detect every possible gene fusion leading to a fusion transcript, including as yet undescribed oncogenic fusion transcripts. In addition to its powerful fusion detection capacity, profiling the whole transcriptome helps refine classification on the basis of transcriptomic similarity to other sarcomas. In this way, novel entities with homogeneous transcriptomic profiles and specific gene fusions have been described. For instance, Watson et al. used RNA-seq to characterize a group of 180 sarcomas for which no diagnosis could be made using FISH or RT-PCR
[7][9]. A gene fusion was detected in more than half of the cases, including several previously uncharacterized fusion transcripts. Moreover, whole-transcriptome profiling allowed high-dimensional clustering of sarcomas, showing that most fusion genes are associated with a characteristic transcriptomic profile, and that some sarcomas with differing fusion transcripts can be grouped into transcriptomically homogeneous entities, such as
CIC-fused sarcomas which comprise
CIC-DUX4,
CIC-FOXO4, and
CIC-NUTM1 sarcomas. Thus, transcriptomic profiling, and more generally molecular profiling, allows a grouping of sarcomas that may differ from simple pathological diagnosis or gene fusion detection: one can envision that techniques such as RNA-seq could lead to a novel classification of sarcomas complementary to the present pathologically oriented classification. Indeed, some centers such as the Institut Curie are using RNA-seq to help in the diagnosis of sarcomas, primarily for gene fusion detection but also for transcriptomic clustering. Of note, whereas initial use of RNA-seq was restricted to fresh frozen tissues, it has now evolved and can also be performed on paraffin-embedded tissues
[8][10]. RNA-seq has since allowed the characterization of novel fusion genes such as
CIC-NUTM1 [9][11],
TFCP2 rearrangements
[10][12],
EWSR1-SSX1 [11][13], as well as the identification of
NTRK-rearranged sarcomas
[12][14] or
NRG1-fused sarcomas
[13][15]. It has also led to the identification of different molecular subgroups of entities previously considered as pathologically homogeneous, for instance pediatric and spindle cell rhabdomyosarcomas
[14][15][16,17]. These molecular alterations defining homogeneous groups of sarcomas have mostly been integrated into the current classification scheme as essential information complementary to pathology
[16][1].
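The idea of grouping tumors by transcriptomic similarity can be sketched with a deliberately simplified stand-in. The code below does not reproduce the high-dimensional clustering described above; it swaps in a nearest-centroid assignment based on Pearson correlation, which is only one of many possible similarity approaches. The group names, the four-gene profiles, and all function names are invented for illustration and are not real data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def centroid(profiles):
    """Gene-wise mean of a list of expression vectors."""
    return [sum(values) / len(values) for values in zip(*profiles)]


def nearest_group(sample, reference_groups):
    """Return the group whose centroid correlates best with the sample, plus all scores."""
    scores = {
        name: pearson(sample, centroid(profiles))
        for name, profiles in reference_groups.items()
    }
    return max(scores, key=scores.get), scores


# Invented toy expression values for four hypothetical genes (not real data).
reference_groups = {
    "group_A": [[9.0, 1.0, 8.0, 2.0], [8.5, 1.5, 7.5, 2.5]],
    "group_B": [[1.0, 9.0, 2.0, 8.0], [1.5, 8.0, 2.5, 7.0]],
}
unclassified = [8.0, 2.0, 9.0, 1.0]

best, scores = nearest_group(unclassified, reference_groups)
print(best)  # group_A
```

A real analysis would work on thousands of genes after normalization and would typically rely on unsupervised clustering rather than on predefined reference groups; the sketch only shows how a similarity score can place an unclassified tumor next to transcriptomically related ones.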
2.2. Mutations
While gene fusions constitute the most frequent molecular alterations in sarcomas, some subtypes are characterized by mutations of specific genes, either oncogenesis “driver” genes (activating mutations), or tumor suppressor genes (inactivating mutations).
2.2.1. Activating Mutations
Though few subtypes are concerned, some sarcomas present activating mutations in “driver” genes as their primary oncogenic mechanism. The paradigm is gastrointestinal stromal tumors (GISTs), which are characterized by gain-of-function mutations of the
KIT gene (85%), and less often the
PDGFRA gene (5%); these mutations are mutually exclusive and lead to constitutive activation of these transmembrane receptors and their downstream signaling pathways
[17][18][19][20][18,19,20,21]. GISTs are the most common mesenchymal tumors of the gastrointestinal tract and molecular diagnosis has transformed their management. In clinical practice, these diagnosis-defining mutations are detected in tumor DNA by Sanger sequencing or gene panel targeted next-generation sequencing.
2.2.2. Inactivating Mutations
Several sarcomas are associated with inactivating mutations of tumor suppressor genes. As in most cancers, genes such as
TP53 and
PTEN are frequently mutated during the course of oncogenesis
[2][21][22][4,22,23], but some inactivating mutations constitute the primary molecular alteration. For instance, malignant peripheral nerve sheath tumors (MPNST) are characterized by mutations in the
NF1 tumor suppressor gene (50%)
[23][24]. Perivascular epithelioid cell tumors (PEComas) are associated with mutations in
TSC1 and
TSC2 with subsequent activation of the mTOR pathway
[24][25][25,26]. Another group of sarcomas, BAF-deficient sarcomas, harbors mutations in genes of the BAF (also called SWI/SNF) complex: epithelioid sarcomas
[26][27] and malignant rhabdoid tumors including atypical teratoid/rhabdoid tumors (ATRTs) of the central nervous system (
SMARCB1 mutations)
[27][28], small cell carcinomas of the ovary, hypercalcemic type (SCCOHT), and SMARCA4-deficient thoracic sarcomas (
SMARCA4 mutations)
[28][29][29,30]. It has been shown recently that a subgroup of ATRTs has mutations of
SMARCA4 and is distinct from classical
SMARCB1-mutated ATRTs
[30][31]. The BAF complex is involved in chromatin remodeling, which highlights the essential role of epigenetics in the pathogenesis of sarcomas. In clinical practice, these mutations can be found in tumor DNA by Sanger sequencing or gene panel targeted next-generation sequencing. Moreover, loss of proteins of the BAF complex can be shown using immunohistochemistry.
2.3. Gene Amplifications
A significant proportion of sarcomas harbor gene amplifications, the most frequent of which is the 12q amplification characteristic of adipocytic tumors: atypical lipomatous tumors (ALT), well-differentiated liposarcomas (WDLPS), and dedifferentiated liposarcomas (DDLPS)
[31][32]. Less often, the same amplification can be found in other tumors such as intimal sarcomas
[32][33]. The 12q amplicon can be different in length and composition from one tumor to another, but it invariably contains the
MDM2 gene, which is an antagonist of
TP53 and promotes oncogenesis through suppression of the activity of the p53 protein
[33][34], as well as through its direct binding to the chromatin to promote serine metabolism dependency
[34][35]. DDLPS are tumors that contain two compartments: one is composed of adipocytic tumor cells and is similar to WDLPS, while the dedifferentiated compartment consists of undifferentiated high-grade tumor cells that may be confused with other high-grade non-lipogenic sarcomas such as undifferentiated pleomorphic sarcoma (UPS) or MPNST, or sometimes show heterologous differentiation with features of osteogenic or myogenic differentiation. Thus,
MDM2 amplification is an essential marker for the diagnosis of these liposarcomas, and in practice it can be detected with FISH
[35][36]. Other techniques that can be used are comparative genomic hybridization (CGH) and whole exome sequencing. When using these techniques, it is common to find a large number of genomic rearrangements in DDLPS
[36][37], highlighting the limits of classifying sarcomas into sarcomas with simple or complex genetics.
3. Sarcomas with “Complex Genetics”
Genomically complex sarcomas represent more than 50% of soft tissue sarcomas in adults. In contrast to sarcomas with simple genetics, they do not harbor specific and characteristic molecular alterations. Indeed, they show large numbers of genomic rearrangements, copy number variations and point mutations, sometimes dubbed “genomic chaos”. While some recurrent mutations can be found in tumor suppressor genes such as
TP53,
RB1, and
ATRX [2][4], molecular biology techniques are less essential for the diagnosis of these sarcomas, which are still predominantly defined by pathology combined with immunohistochemistry. However, molecular analysis can still help in difficult situations, for instance in differentiating a benign from a similar-looking malignant tumor. One example is the distinction to be made between benign leiomyomas and malignant leiomyosarcomas in smooth muscle tumors of the uterus. Microscopic features such as mitoses and tumor necrosis are classically used to distinguish between benign and malignant tumors, but they may sometimes be difficult to assess, leading to the diagnosis of uterine smooth muscle tumors of unknown malignant potential (STUMPs). Genomic analysis with CGH array or whole exome sequencing can be used in these cases to detect malignant tumors that show a genomic index (score of genomic rearrangement) of more than ten
[37][38].
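The text does not state how the genomic index is computed. One definition used in the array-CGH literature is GI = A²/C, where A is the number of segmental gains and losses and C is the number of chromosomes involved; the sketch below assumes that definition. The input format, the function name `genomic_index`, and the two toy profiles are invented for illustration and are not patient data.

```python
def genomic_index(alterations):
    """Assumed definition: GI = A**2 / C.

    alterations: list of (chromosome, kind) tuples, one per segmental gain or loss.
    A is the number of alterations, C the number of distinct chromosomes involved.
    """
    if not alterations:
        return 0.0
    n_alterations = len(alterations)
    n_chromosomes = len({chrom for chrom, _ in alterations})
    return n_alterations ** 2 / n_chromosomes


THRESHOLD = 10  # cut-off quoted in the text above

# Invented toy profiles (not real tumors).
few = [("1", "gain"), ("1", "loss"), ("13", "loss")]  # 3 alterations, 2 chromosomes
many = (
    [(str(c), "loss") for c in range(1, 9)]    # 8 losses on chromosomes 1 to 8
    + [(str(c), "gain") for c in range(1, 5)]  # 4 gains on chromosomes 1 to 4
)                                              # 12 alterations, 8 chromosomes

for name, profile in (("few", few), ("many", many)):
    gi = genomic_index(profile)
    print(name, gi, "above threshold" if gi > THRESHOLD else "not above threshold")
# few 4.5 not above threshold
# many 18.0 above threshold
```

Squaring the number of alterations makes the score rise quickly with the overall burden of rearrangement, while dividing by the number of chromosomes penalizes profiles in which the same count of alterations is spread thinly across the genome.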