6.1. TCGA Data Analysis
We identified 370 pseudogene transcripts associated with HNC, where SPATA31D5P (SPATA31 subfamily D member 5 pseudogene), HERC2P3 (hect domain and RLD2 pseudogene 3), SPATA31C2 (SPATA31 subfamily C member 2), MAGEB6P1 (melanoma antigen family B6 pseudogene 1), SLC25A51P1 (solute carrier family 25 member 51 pseudogene 1), BAGE2 (B melanoma antigen family member 2), DNM1P47 (DNM1 pseudogene 47), SPATA31C1 (SPATA31 subfamily C member 1), ZNF733P (zinc finger protein 733 pseudogene), and OR2W5 (olfactory receptor, family 2, subfamily W, member 5) were found to be the most deregulated in HNC. The ten most deregulated pseudogene transcripts, chromosome location, gene family function, and involvement in cancer can be found in Table 2.
When we stratified patients’ information by tumor location, we found a few differences in pseudogene transcripts’ pattern that could be explained by different cancer behaviors
[43], but some of the identified pseudogenes were present in more than one tumor location.
In oral cancer (n = 62), SPATA31D5P, NBPF25P (neuroblastoma breakpoint family member 25 pseudogene), HSP90AB2P (heat shock protein 90 alpha class B member 2 pseudogene), NXF4 (nuclear RNA export factor 4 pseudogene), FOLH1B (folate hydrolase 1B), DNM1P47, BNIP3P1 (BCL2/adenovirus E1B 19 interacting protein 3 pseudogene 1), PKD1L2 (polycystin 1-like 2 pseudogene), BAGE2, and ZNF658B (zinc finger protein 658B, pseudogene) were found to be the 10 most deregulated of 154 pseudogene transcripts identified (Table 2).
In oropharyngeal cancer (n = 51), POTEA (POTE ankyrin domain family, member A), MROH5 (maestro heat-like repeat family member 5), MSL3P1 (male-specific lethal 3 homolog pseudogene 1), HLA-H (major histocompatibility complex class I, H pseudogene), TUBB8P7 (tubulin beta 8 class VIII pseudogene 7), SLC7A5P2 (solute carrier family 7 member 5 pseudogene 2), DPY19L2P1 (DPY19L2 pseudogene 1), TSSC2 (tumor suppressing sub-transferable candidate 2 pseudogene), SPATA31C2, and NXF4 were found to be the 10 most deregulated of 109 pseudogene transcripts identified (Table 2 ).
In hypopharyngeal cancer (n = 8), DPY19L2P3 (DPY19L2 pseudogene 3), SPATA31D5P, GBA3 (glucosidase beta acid 3), PLEKHM1P (pleckstrin homology domain containing, family M member 1 pseudogene), DPY19L2P1, MST1P2 (macrophage-stimulating 1 pseudogene 2), RP11-44F14.1, ADAM21P1 (ADAM metallopeptidase domain 21 pseudogene 1), MAGEB6P1, and OR12D2 (olfactory receptor family 12 subfamily D member 2) were found to be the 10 most deregulated of 21 pseudogene transcripts identified (Table 2 ).
UV response and DNA repair |
None |
[ |
59 |
] |
NXF4 |
Xq22.1 |
RNA export from nucleus |
None |
[ |
68 |
] |
Hypopharynx ( |
n |
= 8) |
|
|
|
|
DPY19L2P3 |
7p14.3 |
Transmembrane protein |
None |
[36] |
SPATA31D5P |
9q21.32 |
UV response and DNA repair |
None |
[59] |
GBA3 |
4p15.2 |
Glucosylceramide hydrolysis |
Liver |
[54] |
PLEKHM1P |
17q24.1 |
Autophagy |
None |
[74] |
DPY19L2P1 |
7p14.2 |
Transmembrane protein |
Larynx |
[36] |
MST1P2 |
1p36.13 |
Cell invasion and apoptosis |
Bladder and cervical |
[55][56] |
RP11-44F14.1 |
16q12.2 |
Unknown |
None |
|
ADAM21P1 |
14q24.2 |
Cell adhesion and proliferation |
None |
[75] |
MAGEB6P1 |
Xp21.3 |
Tumor-specific antigen |
None |
[60] |
OR12D2 |
6p22.1 |
Cellular signaling |
None |
[65] |
Larynx (n = 98) |
|
|
|
|
HERC2P3 |
15q11.1 |
Cell growth and migration |
Gastric |
[44] |
SPATA31D5P |
9q21.32 |
UV response and DNA repair |
None |
[59] |
SPATA31C2 |
9q22.1 |
UV response and DNA repair |
None |
[59] |
SLC25A51P1 |
6q12 |
Mitochondrial NAD+ transporter |
None |
[61] |
MAGEB6P1 |
Xp21.3 |
Tumor-specific antigen |
None |
[60] |
SPATA31C1 |
9q22.1 |
UV response and DNA repair |
None |
[59] |
BAGE2 |
21p11.2 |
Tumor-specific antigen |
Lung, colon, and breast |
[62] |
PNLIPRP2 |
10q25.3 |
Lipase activity |
Pancreas |
[57] |
ZNF733P |
7q11.21 |
Transcription factor |
None |
[64] |
DNM1P47 |
15q26.3 |
Mitochondrial division |
None |
[63] |
6.2. Co-Expression Networks and GO Enrichment Analysis
We performed co-expression network and GO enrichment analyses to map the relationship between pseudogenes and other genes in the genome. After analyzing 19,406 protein-coding genes and 31 pseudogenes, our co-expression networks contained 14,379 genes organized in 36 modules, with an average of 399 genes per module (median: 140 genes; range: 35–3645 genes;).
From the 31 most deregulated pseudogenes from HNC and its subtypes, 14 of them (SPATA31D5P, SPATA31C2, DNM1P47, NBPF25P, HSP90AB2P, NXF4, ZNF658B, POTEA, MROH5, HLA-H, DPY19L2P1, DPY19L2P3, GBA3, and PNLIPRP2) were present in our network across 8 modules (light yellow, red, dark magenta, orange, black, salmon, ivory, and dark olive green). Although the other pseudogenes of interest did not show a co-expression profile similar enough to other genes to be assigned to a module according to the parameters used to build this network, they are still important for further studies in HNC based on our previous TCGA data analysis.
We chose to highlight three modules (light yellow, red, and dark magenta) that contain genes well-known to be involved in head and neck carcinogenesis based on two important reviews
[76][77]. However, the other five modules also presented relevant gene associations that should be evaluated in further studies.
The light-yellow module contained the greatest number of pseudogenes (
NBPF25P, HSP90AB2P,
ZNF658B, and
DPY19L2P3). Interestingly, this module also contains 12 genes known to participate in head and neck carcinogenesis (
CASP8, involved in apoptosis;
NOTCH2 and
NOTCH3, involved in cell differentiation;
TRAF3, involved in antiviral response;
RB1 and
HRAS, involved in cell cycle control and proliferation;
PTEN, involved in apoptosis and cell cycle control;
CUL3 and
NFE2L2, involved in oxidative stress response;
EP300, involved in chromatin remodeling, and the transcription factors
IGF1R and
TP63)
[76][77], suggesting that these genes may directly or indirectly interact with the pseudogenes in this module and play a role in HNC (
Figure 2A). In addition, the light-yellow module is enriched in GO terms related to tumor development, such as regulation of cell communication, cellular response to stress, and intracellular transport (
Figure 2B). The potential relationship between the described pseudogenes and genes in this module has not been explored in the literature yet.
Figure 2. Co-expression network and GO enrichment analysis. Light-yellow module containing NBPF25P, HSP90AB2P, ZNF658B and DPY19L2P3 pseudogenes from co-expression network of 213 head and neck cancer (HNC) patients and gene ontology (GO) enrichment analysis. (A) Network highlighting pseudogenes (red letter) and HNC-related genes (blue letter) constructed using the Weighted Correlation Network Analysis (WGCNA) R software package. (B) Enriched GO terms identified by GOATOOLS Python library and summarized by the web tool REVIGO. Circles colored with darker red indicate GO terms with lower p-values in our enrichment analysis, while the size of the circles indicates the frequency of a GO term in the GO annotation database. GO terms that are highly similar have thicker lines.
The red module contains the pseudogene
DNM1P47 and the tumor suppressor
TP53 gene, mainly involved in cell cycle control and apoptosis
[76][77] (
Figure 3A), and is enriched in GO terms associated with regulation of cell proliferation, response to stress, and cell death (
Figure 3B). The potential relationship between the
DNM1P47 pseudogene and the
TP53 gene has not been explored in the literature yet.
Figure 3. Co-expression network and GO enrichment analysis. Red module containing DNM1P47 pseudogene from co-expression network of 213 head and neck cancer (HNC) patients and gene ontology (GO) enrichment analysis. (A) Network highlighting pseudogene (red letter) and HNC-related gene (blue letter) constructed using the Weighted Correlation Network Analysis (WGCNA) R software package. (B) Enriched GO terms identified by GOATOOLS Python library and summarized by the web tool REVIGO. Circles colored with darker red indicate GO terms with lower p-values in our enrichment analysis, while the size of the circles indicates the frequency of a GO term in the GO annotation database. GO terms that are highly similar have thicker lines.
The dark magenta module contains the known tumor genes
HLA-A and
HLA-B, involved in immune response
[77], very closely associated with the
HLA-H pseudogene (
Figure 4A), and is also enriched in GO terms important to cancer biology, such as cell proliferation, regulation of the immune system, regulation of gene expression, and Wnt signaling pathway (
Figure 4B). The relationship of the
HLA-H pseudogene and
HLA-A and
HLA-B genes has been described in the literature, and even related to lung cancer in the Asian population
[53]. The SNV rs12333226, capable of modulating
HLA-A and
HLA-H expression levels, was suggested to exert an effect on lung cancer through these two immune-related genes and the pseudogene
[53].
Figure 4. Co-expression network and GO enrichment analysis. Dark magenta module containing HLA-H pseudogene from co-expression network of 213 head and neck cancer (HNC) patients and gene ontology (GO) enrichment analysis. (A) Network highlighting pseudogene (red letter) and HNC-related genes (blue letter) constructed using the Weighted Correlation Network Analysis (WGCNA) R software package. (B) Enriched GO terms identified by GOATOOLS Python library and summarized by the web tool REVIGO. Circles colored with darker red indicate GO terms with lower p-values in our enrichment analysis, while the size of the circles indicates the frequency of a GO term in the GO annotation database. GO terms that are highly similar have thicker lines.
The other five modules consist of an orange module containing DPY19L2P1, a black module containing NXF4, MROH5, and SPATA31D5P, a salmon module containing PNLIPRP2, an ivory module containing SPATA31C2, and a dark olive-green module containing POTEA and GBA3 pseudogenes.
6.3. Survival Analysis
To investigate the prognostic value of the pseudogenes identified by the TCGA database in HNC, we assessed the online database The Kaplan–Meier Plotter. In this analysis, only the prognostic value of the ten most deregulated pseudogenes identified in HNC general samples was assessed (SPATA31D5P, HERC2P3, SPATA31C2, MAGEB6P1, SLC25A51P1, BAGE2, DNM1P47, SPATA31C1, ZNF733P and OR2W5) because the online database does not allow to stratify the patients by tumor location.
We observed that lower expression of SPATA31D5P, SPATA31C2, BAGE2, SPATA31C1, ZNF733P and OR2W5 pseudogene transcripts was associated with the worst RFS of HNC patients. Patients with lower expression presented 2.56, 2.63, 2.33, 3.57, 3.03, and 3.03 more chances respectively, to relapse, compared to other patients (Figure 5A).
Figure 5. Pseudogene transcripts and head and neck cancer (HNC) patients’ survival. Prognostic value of deregulated pseudogene transcripts in HNC patients by Kaplan–Meier Plotter online database. (A) Lower expression of SPATA31D5P, SPATA31C2, BAGE2, SPATA31C, ZNF733P and OR2W5 pseudogene transcripts indicated worst relapse-free survival in HNC patients. (B) Higher expression of SPATA31D5P, ZNF733P, and OR2W5 pseudogene transcripts and lower expression of SPATA31C2, BAGE2 and SPATA31C1 pseudogene transcripts were associated with worst overall survival of HNC patients.
In addition, we observed that higher expression of
SPATA31D5P,
ZNF733P and
OR2W5 pseudogene transcripts was associated with the worst OS of HNC patients. Patients with higher expression presented 1.39, 1.56, and 1.53 more chances respectively, of dying, compared to other patients. Moreover, lower expression of
SPATA31C2,
BAGE2 and
SPATA31C1 pseudogene transcripts was associated with the worst OS of HNC patients. Patients with lower expression presented 1.43, 1.39, and 1.43 more chances respectively, of dying, compared to other patients (
Figure 5B).
MAGEB6P1,
SLC25A51P1 and
DNM1P47 pseudogene transcripts were not available in The Kaplan–Meier Plotter database, so we could not perform survival analysis. The
HERC2P3 pseudogene transcript did not influence RFS or OS in the HNC patients. However, in the literature,
HERC2P3 upregulation was associated with cell migration in gastric cancer, an independent prognosis factor
[44]. The prognostic values of
SPATA31D5P,
SPATA31C2,
BAGE2,
SPATA31C1, ZNF733P and
OR2W5 have not been explored in the literature yet.