Repetitive sequences represent about half of the human genome. They are actively transcribed and play a role during development and in epigenetic regulation. Altered activity of repetitive sequences can lead to genomic instability and can contribute to the establishment or progression of degenerative diseases and to cancer transformation. Increased levels of heterochromatic repetitive satellite-coded RNAs in mammary glands induce breast tumor formation in mice by altering the BRCA1-associated protein networks required for the proper stabilization of DNA replication forks, which in turn leads to genomic instability. In humans, patients with breast cancer who express high levels of alpha satellite-derived RNA have an increased risk of developing multiple cancers.
1. Breast Cancer Classification
Breast cancer is still the leading cause of mortality among the female population in developed countries. In post-menopausal women, it accounts for 23% of all cancer deaths
[1]. Breast cancers can be classified following anatomical, histological and molecular features
[1], and their classification is a dynamic process, as stated in the last World Health Organization classification of tumors of the breast
[2], and novel entities are added to the classification year by year following the increase in the knowledge of the disease
[3]. Breast cancer is a heterogeneous disease with different clinical and pathological features, variable therapeutic approaches and responses, and different outcomes even within the same class of breast cancer, suggesting that the current classifications are far from exhaustive. To follow the original classification of the cohort used in this study
[4,5], the breast cancer specimens have been classified as: luminal-A, luminal-B (HER2-negative), luminal-B (HER2-positive), HER2-enriched and triple-negative breast cancers. This classification is based on immunohistochemical markers and was recommended by the St. Gallen Expert Consensus
[6] and it has become a standard in routine clinical analysis since then
[6,7,8]. For detailed reviews, please refer to Hennigs et al.
[8] and Prat et al.
[9].
2. Non-Coding RNA in Mammals
The mammalian genome is pervasively transcribed and only a small portion of the transcriptional output has protein-coding potential
[10]. Non-coding RNAs (ncRNAs) can be categorized by size and function into classes such as small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs), long non-coding RNAs (lncRNAs) and many others. The best-studied class of ncRNAs is probably the microRNAs (miRNAs). Many studies have identified or suggested roles for them in human health and disease, including aging
[11] and cancer
[12,13], for diagnostic or prognostic purposes
[14,15] and as therapeutic agents
[16], either per se or within complex cross-regulatory networks known as competing endogenous RNAs (ceRNAs)
[17,18,19,20,21,22,23,24]. An abundant class of ncRNAs with variable functions that are still not fully understood is represented by transcripts arising from non-coding DNA sequences that are repeated along the genome in multiple copies. Even if transcription from a single copy is negligible, the sum of transcripts arising from thousands or millions of copies can be massive. These transcripts are described in detail in the following sections.
3. Repetitive DNA Sequence Classification
A multifaceted category of ncRNAs, of growing interest due to their roles in human health and disease, is represented by the transcripts arising from repetitive DNA sequences (RS), i.e., DNA sequences that are present in multiple copies in the genome and have low or nonexistent coding potential. RS represent about 45% of the human genome and are differentially transcribed in many tissues
[25]. In mammals, RS have many roles in development and epigenetic regulation, as well as in diseases such as cancer transformation
[26,27,28,29,30] and degenerative diseases
[31]; however, they are notoriously difficult to study
[32]. Based on their nature, length and origin, RS can be roughly classified as: (i) satellite repeats: tandem arrays of simple or complex sequence repeats, abundant in heterochromatic regions, including the alpha satellite repeats that represent the main DNA component of human centromeres; (ii) long interspersed nuclear elements (LINEs): retrotransposons devoid of long terminal repeats (non-LTR), including some that are still able to retrotranspose; (iii) short interspersed nuclear elements (SINEs): non-autonomous retrotransposons, including the Alu elements in humans, which are often involved in genomic rearrangements; (iv) integrated LTR retroviruses, mainly represented by the human endogenous retrovirus (HERV) families; and (v) the families of DNA transposons, which are usually not active in humans (Figure 1). The roles of RS are only beginning to be understood. For example, in the human brain, LINE-1 retrotransposons are actively transcribed and mobilized, and they are suggested to play a role in shaping the adult human brain
[33]; a role for RS has also been suggested in a model of human brain aging
[34].
4. Repetitive DNA Sequence and Cancer
Increased levels of heterochromatic repetitive satellite-coded RNAs in mammary glands induce breast tumor formation in mice by altering the BRCA1-associated protein networks required for the proper stabilization of DNA replication forks, which in turn leads to genomic instability
[35]. In humans, patients with breast cancer that express high levels of RNA derived from alpha satellite have an increased risk of developing multiple cancers
[36].
It is known that LINE-1-encoded retrotranscription activity is widespread and its inhibition can reduce the rate of proliferation and promote the differentiation of breast cancer cells
[37]. Hypomethylation of LINE-1 (and Alu), which suggests increased transcription of these elements in cancer cells and thus their mobilization, has been associated with the HER2-enriched subtype of breast cancer with the worst prognosis
[38,39,40]. In transgenic mice from a well-defined model of breast cancer progression, LINE-1 is upregulated at a very early stage of tumorigenesis
[41]. Indeed, altered expression patterns of the LINE-1-coded ORF1 and ORF2 proteins, with differences in overall patient survival, have been reported in invasive breast cancers
[42]. In specific cases, pesticide exposure induces LINE-1 reactivation, suggesting a role for LINE transcription in pesticide-induced breast cancer progression
[43], and MET-LINE-1 chimeric transcripts identify a subgroup of aggressive triple-negative breast cancers
[44]. Overall, it has been suggested that LINE-1 may contribute to the origin or progression of breast cancers
[45].
Many reports describe Alu and other SINE elements within or surrounding the BRCA1 and BRCA2 genes that are essential to genomic rearrangements or genetic mutations with etiopathogenic, prognostic or predisposing significance for breast cancers, in both somatic and germ lines
[46,47,48,49,50,51]; indeed, the demethylation of Alu sequences may induce, at the same time, both transcription and rearrangements of Alu sequences. Thus, Alu transcription is a marker of increased susceptibility to Alu-mediated genomic rearrangement or genetic mutation at Alu sites. As for a direct effect of Alu transcription, it is noteworthy that heterogeneous nuclear ribonucleoprotein C (HNRNPC) is essential for breast cancer cell survival because it inhibits the double-stranded-RNA (dsRNA)-induced interferon response. Indeed, dsRNA in this setting is highly enriched in Alu sequences
[52], suggesting that overexpression of Alu sequences is characteristic of many breast cancers and may have lethal effects in cancer cells if not controlled.
There is significant evidence regarding the use of HERV-K-coded proteins as tumor markers and immunologic targets
[52,53,54,55,56,57,58] and in influencing cancer stemness
[59]. It has even been suggested that they could act as etiological agents
[60,61]. Indeed, the expression of HERV-K is upregulated and associated with the basal-like breast cancer phenotype
[62], and a HERV-derived long non-coding RNA (namely, TROJAN) promotes triple-negative breast cancer progression
[63]. HERV can directly contribute to cancer progression by activating the ERK pathway and inducing migration and invasion
[64]; it has even been suggested that the activation of HERV-K may be essential for the tumorigenesis and metastasis of breast cancer
[65]. Indeed, HERV-K-derived RNAs and antibodies against HERV-K-coded proteins are elevated in the blood of patients at an early stage of breast cancer
[66].
DNA transposons are the least active and least studied class of RS in humans. Nevertheless, a few reports suggest a role for them in breast cancer
[67], although these findings were not investigated further. In addition, a mechanism of BRCA1 mutation in three unrelated French breast/ovarian cancer families, possibly generated by an abortive integration of the human Tigger1 DNA transposon, has been postulated
[68].
Figure 1. Repetitive sequences (RS) represent about half of the human genome. The panel reports RS activities associated with breast cancer. In orange: Satellite repeats
[35,36]. In red: Long interspersed nuclear elements (LINEs)
[37,42,45]. In yellow: Short interspersed nuclear elements (SINEs)
[46,47,48,49,50,51]. In blue: Human endogenous retrovirus (HERV)
[62,63,64,66]. In green: DNA transposons
[68].