6.1. TCGA Data Analysis
We identified 370 pseudogene transcripts associated with HNC, where SPATA31D5P (SPATA31 subfamily D member 5 pseudogene), HERC2P3 (hect domain and RLD2 pseudogene 3), SPATA31C2 (SPATA31 subfamily C member 2), MAGEB6P1 (melanoma antigen family B6 pseudogene 1), SLC25A51P1 (solute carrier family 25 member 51 pseudogene 1), BAGE2 (B melanoma antigen family member 2), DNM1P47 (DNM1 pseudogene 47), SPATA31C1 (SPATA31 subfamily C member 1), ZNF733P (zinc finger protein 733 pseudogene), and OR2W5 (olfactory receptor, family 2, subfamily W, member 5) were found to be the most deregulated in HNC. The ten most deregulated pseudogene transcripts, chromosome location, gene family function, and involvement in cancer can be found in Table 2.
When we stratified patients’ information by tumor location, we found a few differences in pseudogene transcripts’ pattern that could be explained by different cancer behaviors
[43], but some of the identified pseudogenes were present in more than one tumor location.
In oral cancer (n = 62), SPATA31D5P, NBPF25P (neuroblastoma breakpoint family member 25 pseudogene), HSP90AB2P (heat shock protein 90 alpha class B member 2 pseudogene), NXF4 (nuclear RNA export factor 4 pseudogene), FOLH1B (folate hydrolase 1B), DNM1P47, BNIP3P1 (BCL2/adenovirus E1B 19 interacting protein 3 pseudogene 1), PKD1L2 (polycystin 1-like 2 pseudogene), BAGE2, and ZNF658B (zinc finger protein 658B, pseudogene) were found to be the 10 most deregulated of 154 pseudogene transcripts identified (Table 2).
In oropharyngeal cancer (n = 51), POTEA (POTE ankyrin domain family, member A), MROH5 (maestro heat-like repeat family member 5), MSL3P1 (male-specific lethal 3 homolog pseudogene 1), HLA-H (major histocompatibility complex class I, H pseudogene), TUBB8P7 (tubulin beta 8 class VIII pseudogene 7), SLC7A5P2 (solute carrier family 7 member 5 pseudogene 2), DPY19L2P1 (DPY19L2 pseudogene 1), TSSC2 (tumor suppressing sub-transferable candidate 2 pseudogene), SPATA31C2, and NXF4 were found to be the 10 most deregulated of 109 pseudogene transcripts identified (Table 2 ).
In hypopharyngeal cancer (n = 8), DPY19L2P3 (DPY19L2 pseudogene 3), SPATA31D5P, GBA3 (glucosidase beta acid 3), PLEKHM1P (pleckstrin homology domain containing, family M member 1 pseudogene), DPY19L2P1, MST1P2 (macrophage-stimulating 1 pseudogene 2), RP11-44F14.1, ADAM21P1 (ADAM metallopeptidase domain 21 pseudogene 1), MAGEB6P1, and OR12D2 (olfactory receptor family 12 subfamily D member 2) were found to be the 10 most deregulated of 21 pseudogene transcripts identified (Table 2 ).
In laryngeal cancer (n = 98), HERC2P3, SPATA31D5P, SPATA31C2, SLC25A51P1, MAGEB6P1, SPATA31C1, BAGE2, PNLIPRP2 (pancreatic lipase-related protein 2), ZNF733P and DNM1P47 were found to be the 10 most deregulated of 287 pseudogene transcripts identified (Table 2 ).
Only one pseudogene identified in our analysis,
DPY19L2P1, has already been associated with HNC.
DPY19L2P1 was considered an independent prognosis predictor of laryngeal cancer, where its higher expression was associated with the worst OS, although the exact mechanism was not explored
[36]. In our analysis,
DPY19L2P1 was included in the ten most deregulated pseudogenes in oropharyngeal and hypopharyngeal cancers. However, it was identified only as the 54th most deregulated pseudogene in laryngeal cancer, and its prognostic value could not be evaluated. Other described pseudogenes have already been associated with carcinogenesis, although the exact mechanisms were not reported or detailed in most studies.
HERC2P3 upregulation was associated with gastric cancer cell growth and migration by interacting with the Akt signaling pathway
[44].
BAGE2 was related to the tumor-specific antigen profile
[45].
FOLH1B was found to be involved in metallopeptidase activity, and its upregulation was associated with aggressiveness and metastasis in prostate cancer
[46].
BNIP3P1 upregulation was found in breast cancer brain metastases
[47].
PKD1L2 upregulation was associated with a good prognosis in breast cancer patients
[48] and colorectal cancer risk and poor survival in obese patients
[49].
POTEA upregulation was associated with increased colorectal cancer risk
[50].
MSL3P1 upregulation was considered a non-invasive biomarker of renal cell carcinoma
[51].
HLA-H upregulation was associated with cervical
[52] and lung
[53] carcinomas.
GBA3 lower expression was associated with poor prognosis of hepatocellular carcinoma patients
[54].
MST1P2, binding to miR-133b, affected the chemoresistance of bladder cancer cells to cisplatin-based therapy
[55] and promoted cervical cancer progression
[56].
PNLIPRP2 lower expression was found in pancreatic ductal adenocarcinoma
[57].
We also identified 993 somatic genetic alterations in the 370 pseudogene transcripts identified in HNC, and SNV was the most common type (96.8%), followed by deletions (1.9%) and insertions (1.3%). The genetic variations of the 31 most deregulated pseudogenes from HNC and its subtypes. The pseudogenes SPATA31D5P, HERC2P3, MAGEB6P1, SPATA31C2, SPATA31C1, SLC25A51P1, BAGE2, HSP90AB2P, OR2W5 and REG1P were the most genetically altered by several SNVs in the HNC patients.
From these identified pseudogenes, only
BAGE2 was already studied for genomic mutation profile and its copy number variation may be associated with the Robertsonian Down syndrome
[58]. Pseudogenes’ genetic alterations, their potential of interfering in pseudogene transcription, and carcinogenesis have not been explored in the literature yet.
Table 2. Most deregulated pseudogene transcripts in head and neck cancer patients identified from The Cancer Genome Atlas (TCGA) database, chromosome location, gene family function, and studies in cancer.
Tumor Location and Pseudogene Transcript |
Chromosome Location |
Gene Family Function |
Studies in Cancer |
Reference |
Head and neck (n = 219) |
|
|
|
SPATA31D5P |
9q21.32 |
UV response and DNA repair |
None |
[59] |
HERC2P3 |
15q11.1 |
Cell growth and migration |
Gastric |
[44] |
SPATA31C2 |
9q22.1 |
UV response and DNA repair |
None |
[59] |
MAGEB6P1 |
Xp21.3 |
Tumor-specific antigen |
None |
[60] |
SLC25A51P1 |
6q12 |
Mitochondrial NAD+ transporter |
None |
[61] |
BAGE2 |
21p11.2 |
Tumor-specific antigen |
Lung, colon, and breast |
[62] |
DNM1P47 |
15q26.3 |
Mitochondrial division |
None |
[63] |
SPATA31C1 |
9q22.1 |
UV response and DNA repair |
None |
[59] |
ZNF733P |
7q11.21 |
Transcription factor |
None |
[64] |
OR2W5 |
1q44 |
Cellular signaling |
None |
[65] |
Oral cavity (n = 62) |
|
|
|
|
SPATA31D5P |
9q21.32 |
UV response and DNA repair |
None |
[59] |
NBPF25P |
1q21.1 |
Neuronal modulation |
None |
[66] |
HSP90AB2P |
4p15.33 |
Cell proteostasis |
None |
[67] |
NXF4 |
Xq22.1 |
RNA export from nucleus |
None |
[68] |
FOLH1B |
11q14.3 |
Metallopeptidase activity |
Prostate |
[46] |
DNM1P47 |
15q26.3 |
Mitochondrial division |
None |
[63] |
BNIP3P1 |
14q12 |
Autophagy and apoptosis |
Breast cancer brain metastases |
[47] |
PKD1L2 |
16q23.2 |
Transmembrane protein |
Colorectal and breast |
[48][49] |
BAGE2 |
21p11.2 |
Tumor-specific antigen |
Lung, colon, and breast |
[62] |
ZNF658B |
9p12 |
Transcription factor |
None |
[69] |
Oropharynx (n = 51) |
|
|
|
|
POTEA |
8p11.1 |
Apoptosis |
Colorectal |
[50] |
MROH5 |
8q24.3 |
Uncertain |
None |
|
MSL3P1 |
2q37.1 |
Transcription regulation |
Renal and gastric |
[51][70] |
HLA-H |
6p22.1 |
Immune homeostasis |
Cervical and lung |
[52][53] |
TUBB8P7 |
16q24.3 |
Oocyte maturation |
None |
[71] |
SLC7A5P2 |
16p12.2 |
Amino acid transporter |
None |
[72] |
DPY19L2P1 |
7p14.2 |
Transmembrane protein |
Larynx |
[36] |
TSSC2 |
11p15.4 |
Tumor suppressor |
None |
[73] |
SPATA31C2 |
9q22.1 |
UV response and DNA repair |
None |
[59] |
NXF4 |
Xq22.1 |
RNA export from nucleus |
None |
[68] |
Hypopharynx (n = 8) |
|
|
|
|
DPY19L2P3 |
7p14.3 |
Transmembrane protein |
None |
[36] |
SPATA31D5P |
9q21.32 |
UV response and DNA repair |
None |
[59] |
GBA3 |
4p15.2 |
Glucosylceramide hydrolysis |
Liver |
[54] |
PLEKHM1P |
17q24.1 |
Autophagy |
None |
[74] |
DPY19L2P1 |
7p14.2 |
Transmembrane protein |
Larynx |
[36] |
MST1P2 |
1p36.13 |
Cell invasion and apoptosis |
Bladder and cervical |
[55][56] |
RP11-44F14.1 |
16q12.2 |
Unknown |
None |
|
ADAM21P1 |
14q24.2 |
Cell adhesion and proliferation |
None |
[75] |
MAGEB6P1 |
Xp21.3 |
Tumor-specific antigen |
None |
[60] |
OR12D2 |
6p22.1 |
Cellular signaling |
None |
[65] |
Larynx (n = 98) |
|
|
|
|
HERC2P3 |
15q11.1 |
Cell growth and migration |
Gastric |
[44] |
SPATA31D5P |
9q21.32 |
UV response and DNA repair |
None |
[59] |
SPATA31C2 |
9q22.1 |
UV response and DNA repair |
None |
[59] |
SLC25A51P1 |
6q12 |
Mitochondrial NAD+ transporter |
None |
[61] |
MAGEB6P1 |
Xp21.3 |
Tumor-specific antigen |
None |
[60] |
SPATA31C1 |
9q22.1 |
UV response and DNA repair |
None |
[59] |
BAGE2 |
21p11.2 |
Tumor-specific antigen |
Lung, colon, and breast |
[62] |
PNLIPRP2 |
10q25.3 |
Lipase activity |
Pancreas |
[57] |
ZNF733P |
7q11.21 |
Transcription factor |
None |
[64] |
DNM1P47 |
15q26.3 |
Mitochondrial division |
None |
[63] |
6.2. Co-Expression Networks and GO Enrichment Analysis
We performed co-expression network and GO enrichment analyses to map the relationship between pseudogenes and other genes in the genome. After analyzing 19,406 protein-coding genes and 31 pseudogenes, our co-expression networks contained 14,379 genes organized in 36 modules, with an average of 399 genes per module (median: 140 genes; range: 35–3645 genes;).
From the 31 most deregulated pseudogenes from HNC and its subtypes, 14 of them (SPATA31D5P, SPATA31C2, DNM1P47, NBPF25P, HSP90AB2P, NXF4, ZNF658B, POTEA, MROH5, HLA-H, DPY19L2P1, DPY19L2P3, GBA3, and PNLIPRP2) were present in our network across 8 modules (light yellow, red, dark magenta, orange, black, salmon, ivory, and dark olive green). Although the other pseudogenes of interest did not show a co-expression profile similar enough to other genes to be assigned to a module according to the parameters used to build this network, they are still important for further studies in HNC based on our previous TCGA data analysis.
We chose to highlight three modules (light yellow, red, and dark magenta) that contain genes well-known to be involved in head and neck carcinogenesis based on two important reviews
[76][77]. However, the other five modules also presented relevant gene associations that should be evaluated in further studies.
The light-yellow module contained the greatest number of pseudogenes (
NBPF25P, HSP90AB2P,
ZNF658B, and
DPY19L2P3). Interestingly, this module also contains 12 genes known to participate in head and neck carcinogenesis (
CASP8, involved in apoptosis;
NOTCH2 and
NOTCH3, involved in cell differentiation;
TRAF3, involved in antiviral response;
RB1 and
HRAS, involved in cell cycle control and proliferation;
PTEN, involved in apoptosis and cell cycle control;
CUL3 and
NFE2L2, involved in oxidative stress response;
EP300, involved in chromatin remodeling, and the transcription factors
IGF1R and
TP63)
[76][77], suggesting that these genes may directly or indirectly interact with the pseudogenes in this module and play a role in HNC (
Figure 2A). In addition, the light-yellow module is enriched in GO terms related to tumor development, such as regulation of cell communication, cellular response to stress, and intracellular transport (
Figure 2B). The potential relationship between the described pseudogenes and genes in this module has not been explored in the literature yet.
Figure 2. Co-expression network and GO enrichment analysis. Light-yellow module containing NBPF25P, HSP90AB2P, ZNF658B and DPY19L2P3 pseudogenes from co-expression network of 213 head and neck cancer (HNC) patients and gene ontology (GO) enrichment analysis. (A) Network highlighting pseudogenes (red letter) and HNC-related genes (blue letter) constructed using the Weighted Correlation Network Analysis (WGCNA) R software package. (B) Enriched GO terms identified by GOATOOLS Python library and summarized by the web tool REVIGO. Circles colored with darker red indicate GO terms with lower p-values in our enrichment analysis, while the size of the circles indicates the frequency of a GO term in the GO annotation database. GO terms that are highly similar have thicker lines.
The red module contains the pseudogene
DNM1P47 and the tumor suppressor
TP53 gene, mainly involved in cell cycle control and apoptosis
[76][77] (
Figure 3A), and is enriched in GO terms associated with regulation of cell proliferation, response to stress, and cell death (
Figure 3B). The potential relationship between the
DNM1P47 pseudogene and the
TP53 gene has not been explored in the literature yet.
Figure 3. Co-expression network and GO enrichment analysis. Red module containing DNM1P47 pseudogene from co-expression network of 213 head and neck cancer (HNC) patients and gene ontology (GO) enrichment analysis. (A) Network highlighting pseudogene (red letter) and HNC-related gene (blue letter) constructed using the Weighted Correlation Network Analysis (WGCNA) R software package. (B) Enriched GO terms identified by GOATOOLS Python library and summarized by the web tool REVIGO. Circles colored with darker red indicate GO terms with lower p-values in our enrichment analysis, while the size of the circles indicates the frequency of a GO term in the GO annotation database. GO terms that are highly similar have thicker lines.
The dark magenta module contains the known tumor genes
HLA-A and
HLA-B, involved in immune response
[77], very closely associated with the
HLA-H pseudogene (
Figure 4A), and is also enriched in GO terms important to cancer biology, such as cell proliferation, regulation of the immune system, regulation of gene expression, and Wnt signaling pathway (
Figure 4B). The relationship of the
HLA-H pseudogene and
HLA-A and
HLA-B genes has been described in the literature, and even related to lung cancer in the Asian population
[53]. The SNV rs12333226, capable of modulating
HLA-A and
HLA-H expression levels, was suggested to exert an effect on lung cancer through these two immune-related genes and the pseudogene
[53].
Figure 4. Co-expression network and GO enrichment analysis. Dark magenta module containing HLA-H pseudogene from co-expression network of 213 head and neck cancer (HNC) patients and gene ontology (GO) enrichment analysis. (A) Network highlighting pseudogene (red letter) and HNC-related genes (blue letter) constructed using the Weighted Correlation Network Analysis (WGCNA) R software package. (B) Enriched GO terms identified by GOATOOLS Python library and summarized by the web tool REVIGO. Circles colored with darker red indicate GO terms with lower p-values in our enrichment analysis, while the size of the circles indicates the frequency of a GO term in the GO annotation database. GO terms that are highly similar have thicker lines.
The other five modules consist of an orange module containing DPY19L2P1, a black module containing NXF4, MROH5, and SPATA31D5P, a salmon module containing PNLIPRP2, an ivory module containing SPATA31C2, and a dark olive-green module containing POTEA and GBA3 pseudogenes.
6.3. Survival Analysis
To investigate the prognostic value of the pseudogenes identified by the TCGA database in HNC, we assessed the online database The Kaplan–Meier Plotter. In this analysis, only the prognostic value of the ten most deregulated pseudogenes identified in HNC general samples was assessed (SPATA31D5P, HERC2P3, SPATA31C2, MAGEB6P1, SLC25A51P1, BAGE2, DNM1P47, SPATA31C1, ZNF733P and OR2W5) because the online database does not allow to stratify the patients by tumor location.
We observed that lower expression of SPATA31D5P, SPATA31C2, BAGE2, SPATA31C1, ZNF733P and OR2W5 pseudogene transcripts was associated with the worst RFS of HNC patients. Patients with lower expression presented 2.56, 2.63, 2.33, 3.57, 3.03, and 3.03 more chances respectively, to relapse, compared to other patients (Figure 5A).
Figure 5. Pseudogene transcripts and head and neck cancer (HNC) patients’ survival. Prognostic value of deregulated pseudogene transcripts in HNC patients by Kaplan–Meier Plotter online database. (A) Lower expression of SPATA31D5P, SPATA31C2, BAGE2, SPATA31C, ZNF733P and OR2W5 pseudogene transcripts indicated worst relapse-free survival in HNC patients. (B) Higher expression of SPATA31D5P, ZNF733P, and OR2W5 pseudogene transcripts and lower expression of SPATA31C2, BAGE2 and SPATA31C1 pseudogene transcripts were associated with worst overall survival of HNC patients.
In addition, we observed that higher expression of
SPATA31D5P,
ZNF733P and
OR2W5 pseudogene transcripts was associated with the worst OS of HNC patients. Patients with higher expression presented 1.39, 1.56, and 1.53 more chances respectively, of dying, compared to other patients. Moreover, lower expression of
SPATA31C2,
BAGE2 and
SPATA31C1 pseudogene transcripts was associated with the worst OS of HNC patients. Patients with lower expression presented 1.43, 1.39, and 1.43 more chances respectively, of dying, compared to other patients (
Figure 5B).
MAGEB6P1,
SLC25A51P1 and
DNM1P47 pseudogene transcripts were not available in The Kaplan–Meier Plotter database, so we could not perform survival analysis. The
HERC2P3 pseudogene transcript did not influence RFS or OS in the HNC patients. However, in the literature,
HERC2P3 upregulation was associated with cell migration in gastric cancer, an independent prognosis factor
[44]. The prognostic values of
SPATA31D5P,
SPATA31C2,
BAGE2,
SPATA31C1, ZNF733P and
OR2W5 have not been explored in the literature yet.