Among the ~22,000 human genes, very few remain that have unknown functions. One such example is suprabasin (SBSN). Although SBSN was originally described as a component of the cornified envelope, the function of this protein, which is expressed in stratified epithelia, is unknown. Both the lack of knowledge about the role of the gene under physiological conditions and the emerging link of SBSN to various human diseases, including cancer, attract research interest. The association of SBSN expression with poor prognosis of patients suffering from oesophageal carcinoma, glioblastoma multiforme, and myelodysplastic syndromes suggests that SBSN may play a role in human tumourigenesis. The three SBSN isoforms encode secreted proteins with a putative function as signalling molecules, yet their effects are poorly described.
Since its original description in human and mouse keratinocyte differentiation, suprabasin (SBSN) has been associated with multiple diseases, including cancer. SBSN isoforms are putative signalling molecules that induce cellular signalling (AKT, WNT/β-catenin, and/or p38MAPK signalling) and various cellular processes, such as migration, proliferation, neovascularization, and resistance to therapy, apoptosis, and immune attack. Therefore, SBSN is considered an oncogene and is a proposed biomarker in two diseases, lung carcinoma and myelodysplastic syndromes (MDS). The apparent need for a deeper understanding of the nature and function of SBSN prompted us to compile all current knowledge of SBSN together with suggestions for future research.
The SBSN gene is located on human chromosome 19 (chr 19: 35,523,367–35,528,351 reverse strand; GRCh38:CM000681.2; band 19q13.1) close to other keratinocyte differentiation-associated genes, dermokine-α/β and KDAP. In mice, it is located on chromosome 7 (7:30,751,471–30,756,134 forward strand; GRCm38:CM001000.2; band 7B2–7B3). The SBSN gene is part of a coordinately expressed, newly described stratified epithelium-related gene cluster, tentatively named the stratified epithelium secreted peptides complex (SSC). SBSN has been studied mostly in humans and mice; however, other mammals possess SBSN homologs. The predicted SBSN peptide encoded in the genome of Gorilla gorilla gorilla shows 97.6% amino acid sequence similarity to human SBSN-1. The relatively high amino acid sequence identity (58.9%) between the mouse and human orthologs of the largest SBSN isoform (SBSN isoform 1; SBSN-1) suggests strong gene integrity and conserved function among species. No paralogs of SBSN have been defined, and no genes with confidently high sequence identity have been identified. The Ensembl database (Ensembl Genome Browser version 101, accessed on 21 August 2020) lists 106 SBSN orthologs (26/26 primates; 30/32 rodents and related; 39/45 Laurasiatheria; 0/19 Sauropsida; 2/86 fish; and nine Monotremata and Marsupialia; Figure 1), 96 of which have a Gene Order Conservation Score of 100 (identical four closest genes), implying true orthology. Note that the two putative SBSN fish genes have Gene Order Conservation Scores of 0 and show 2.1% and 8.6% sequence identity to the human ortholog, respectively. Altogether, these observations suggest that SBSN is likely a mammalian-specific orphan gene.
Figure 1. The gene tree of suprabasin (SBSN). The gene tree of SBSN was generated with MEGA X software using data available in Ensembl (version 101). The total sum of branch lengths (SBL) is >9.477. Distance scale = 0.10.
The human SBSN gene consists of five exons and four introns (Figure 2a). Human SBSN mRNA can be alternatively spliced, producing three known isoforms: SBSN-1 (transcript length: 1946 bp; ENST00000452271.7), SBSN-2 (957 bp; ENST00000518157.1), and SBSN-3 (593 bp; ENST00000588674.5). Importantly, SBSN-2 represents a fully spliced isoform, whereas SBSN-1 contains an in-frame retention of the first intron. SBSN-3 is also a fully spliced isoform but lacks exon 2. Strikingly, a mouse (Mus musculus) isoform corresponding to human SBSN-2 has not been identified. This may restrict the applicability of mouse models for functional studies of SBSN isoforms. Interestingly, a sequence with high identity (81.4%) to human exon two is present within exon one of murine Sbsn-1, and the major difference between the human intron one/exon two junction and the homologous murine sequence (CACGAGGCCGGG vs. AACCAGGGTCAA) implies that in mice the splice site was likely never established, or was lost. The putative murine exon two is spliced together with the entire exon one in murine Sbsn-2, which therefore resembles human isoform three, based on amino acid sequence identity analysis (44.5% for human SBSN-2 vs. murine Sbsn-2, and 65.5% for human SBSN-3 vs. murine Sbsn-2). Multiple other rodents lack a homolog corresponding to the human SBSN-2 isoform, and interestingly, several primates (e.g., Macaca mulatta, Pan paniscus, and Microcebus murinus) lack a putative SBSN-2 as well. However, this may reflect the fact that the isoform has not yet been identified in these species, rather than sequence divergence.
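The junction comparison above can be made concrete with a position-by-position count. The following is a minimal sketch, assuming a simple ungapped comparison of the two 12-nt sequences exactly as quoted in the text; the function name is illustrative, and this is not the alignment procedure behind the percent identities reported in this section.

```python
def positional_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences match.

    Ungapped comparison only; real homology analysis would use a
    proper alignment tool instead of this simple count.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# Sequences as quoted in the text (human junction vs. homologous murine region).
human_junction = "CACGAGGCCGGG"
murine_region = "AACCAGGGTCAA"

identity = positional_identity(human_junction, murine_region)
print(f"{identity:.1%} identical positions")
```

For these two sequences, the ungapped comparison finds 5 matching positions out of 12, which illustrates how strongly the murine sequence deviates at the position of the human splice junction.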
The UniProtKB database (UniProt release 2020_4, accessed 21 August 2020) refers to three human SBSN isoforms (Figure 2b). SBSN-1 (Q6UWP8-1; 590 aa, predicted mass 60.541 kDa) and SBSN-2 (Q6UWP8-2; 247 aa, predicted mass 25.335 kDa) are well defined. However, SBSN-3 (K7ESC4) is described as a 149 aa long peptide with a predicted mass of 15.318 kDa, which is incorrect because the database entry lacks the putative N-terminal signal peptide (i.e., the N-terminal sequence MHLARLVGSCSLLLLLGALS is missing; see also the sequence alignment in Figure 2a). In fact, the SBSN-3 coding sequence is 507 nt long and translates into a 169 aa long peptide. All SBSN isoforms possess a putative N-terminal signal peptide (aa 1–25), which confers the secretory nature and extracellular localization of SBSN.
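The length correction for SBSN-3 can be checked with simple arithmetic. The sketch below uses only the numbers quoted above and assumes that the 507 nt figure is counted without a stop codon, which the text does not state explicitly; the helper name is illustrative.

```python
# Values quoted in the text for human SBSN-3 (UniProtKB K7ESC4).
UNIPROT_LENGTH_AA = 149
MISSING_N_TERMINUS = "MHLARLVGSCSLLLLLGALS"
CDS_LENGTH_NT = 507  # assumed here to exclude the stop codon


def codons_in(cds_length_nt: int) -> int:
    """Number of codons in a coding sequence of the given length."""
    if cds_length_nt % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return cds_length_nt // 3


# Length after restoring the N-terminal residues missing from the entry.
restored_length = UNIPROT_LENGTH_AA + len(MISSING_N_TERMINUS)
# Length expected from the coding sequence.
translated_length = codons_in(CDS_LENGTH_NT)

print(restored_length, translated_length)
```

Under that assumption, both routes give the same peptide length, consistent with the 169 aa stated above.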
Figure 2. The sequences of SBSN isoforms. (a) The amino acid sequences of human SBSN isoforms. (b) The gene organization of human SBSN isoforms.
In contrast to the other two isoforms, SBSN-1 is rich in alanine (14%), glycine (20%), and histidine (10%) due to the presence of short tandem Glycine-x-Histidine-Histidine repeats (GxHH repeats, where x stands for any classical amino acid) encoded by the retained intron one. Note that structural proteins often show compositional biases similar to that of the SBSN-1 Ala-/Gly-/His-rich domain (aa 26–524), and based on these observations, SBSN was proposed to be a component of the cornified envelope (CE), the essential structure of corneocytes responsible for the protective function of the skin. Notably, no other sequence similar to that encoding the SBSN-1 Ala-/Gly-/His-rich domain is present in the human genome, indicating an evolutionarily unique and conserved role. This is further supported by a comparison of SBSN among species, which shows remarkable conservation of the Ala-/Gly-/His-rich domain, as well as retention of the first intron encoding the domain. Human SBSN-2 and SBSN-3 lack most of the Ala-/Gly-/His-rich domain due to splicing of intron one; however, a short repeat is located at the 3′ end of exon one. Like the Ala-/Gly-/His-rich domain of SBSN-1, the C-terminal sequence of all isoforms is of unknown function. As mentioned, SBSN-3 lacks exon two, but the remaining exons are present in all human SBSN isoforms.
The mouse homolog of SBSN possesses similar features. An N-terminal signal peptide was predicted (aa 1–23), and the compositional bias of the Ala-/Gly-/His-rich domain (aa 97–478) of SBSN-1 is also present in the mouse homolog (Q8CIT9; 700 aa; predicted mass 72.334 kDa). The current description of the other murine Sbsn isoforms is rather confusing. Originally, the murine Sbsn-1 isoform was identified as a 700 aa long peptide. A shorter isoform, lacking the internal repeats specific to the longest isoform, is the putative Sbsn-2 (Q8CIT9-3; 164 aa; predicted mass 16.967 kDa), which corresponds to human SBSN isoform 3 according to sequence analysis. Besides these two isoforms, Ensembl (version 101) identifies two other protein-coding isoforms, supported by one and no EST, respectively. The UniProtKB database refers to six mouse Sbsn isoforms, including Sbsn-1 and Sbsn-2; the other isoforms are either duplications of Sbsn-1 or Sbsn-2, or unreviewed isoforms.
Secreted murine Sbsn-2 was detected in overexpression experiments, showing a mobility shift on immunoblots that indicates a post-translational modification. Indeed, murine Sbsn-1 is a substrate of tissue transglutaminase (Tgm) 2 and epithelial Tgm3, resulting in intermolecular crosslinking, and this represents the only post-translational modification verified in vitro. Its physiological relevance remains undetermined. Citrullination of Sbsn was observed in the blood proteome in a rat model of pre-motor Parkinson’s disease. Additionally, glycosylation of SBSN-1 was predicted, and indeed, using a mass spectrometry approach, threonine 59 of human SBSN was found to undergo O-glycosylation in various human cell lines. Three protein kinase C (PKC) phosphorylation sites in the C-terminus and two casein kinase II phosphorylation sites were predicted, together with 73 potential N-myristoylation sites, in the mouse homolog; however, these predictions require experimental validation.
The three-dimensional structure of SBSN is still unknown. We performed Phyre2-based structure prediction of all three human SBSN isoforms. In the case of SBSN-1, the predicted model consisted predominantly of disordered regions. The crystal structure of the D337A mutant of Pseudomonas sp. MIS38 lipase (PDB: 2ZJ6) served as the template for structure prediction, with 12% sequence identity and 99.9% prediction confidence. The collagen α chain was among the other possible templates, with slightly lower confidence levels. The methyl-accepting chemotaxis protein of Escherichia coli (PDB: 1QU7), showing 13% sequence identity with SBSN-2, was used as the template for the structure prediction of this isoform with 98.5% confidence. Interestingly, the methyl-accepting chemotaxis protein of Thermotoga maritima (PDB: 2CH7) was suggested as a second template with 98.4% confidence. The model of the SBSN-3 isoform was predicted from the structure of human micelle-bound α-synuclein (PDB: 1XQ8), which showed 18% sequence identity.
SBSN expression is tightly associated with stratified epithelia, but other expression sites have also been defined. The mechanisms of SBSN transcriptional control are not clearly described. We utilized the JASPAR algorithm to predict transcription factor binding sites in the human (Supplementary Table S1) and mouse (Supplementary Table S2) SBSN proximal promoter (2 kbp upstream) and downstream coding region. Additionally, we verified putative binding sites. Based on these results, we established a model of the human SBSN promoter (Figure 3). Of the transcription factor binding sites predicted within the human SBSN proximal promoter region by in silico analysis, the SOX2 binding site is currently the only one that has been experimentally validated (Figure 3). In addition, a Brother of the Regulator of Imprinted Sites (BORIS) binding site within the coding region was experimentally verified. Importantly, aberrant changes in the expression of SBSN isoforms are associated with atopic dermatitis (see below), but the mechanisms responsible for the differential expression of isoforms under physiological and pathological conditions are not known.
SBSN is physiologically expressed in mouse stratified epithelia, including the suprabasal epithelial layer of the epidermis, tongue, oesophagus, palate, stomach, uterus, thyroid, trachea, lung, vagina, thymus, and urinary tract. In humans, SBSN expression is associated with the epidermis, thymus, uterus, tonsils, vagina, and oesophagus. Originally, there was no evidence of SBSN expression in the mouse embryonic and adult brain, but recent immunofluorescence studies showed SBSN expression in human astrocytes and its elevation under pathological conditions. Transcriptome analysis of the human brain revealed SBSN mRNA in the basal ganglia, but evaluation of SBSN protein presence in the human brain is needed to confirm this expression site. In contrast, the mouse brain does not show the presence of Sbsn mRNA; neither do several other mouse tissues and organs such as heart, kidney, or smooth muscle. Therefore, the SBSN expression sites seem to be conserved between mouse and human, except for the brain.
Figure 3. SBSN promoter. The human SBSN proximal promoter (2 kbp upstream) and downstream sequence with predicted transcription factor binding sites. The SOX2 binding site and the Brother of the Regulator of Imprinted Sites (BORIS) binding sites are currently the only experimentally validated sites.
Mouse Sbsn mRNA was detected on day seven of embryonic development, when the expression likely originates from extra-embryonic tissue. Expression in the embryo itself was first detected on day 15 of development, which coincides with epidermal stratification. Sbsn mRNA levels then peak on day 17, and SBSN expression is associated with the expression of dermokine-α/β, components of the stratified epithelium secreted peptides complex. Similarly, SBSN mRNA was elevated, together with dermokine and keratin 5, in skeletal muscle cells of Alaskan sled dogs during the acute response (2 h post-exercise) after prolonged endurance training. At the same time point, transcripts of genes involved in inflammation, oxidative stress, intermediary metabolism, immune response, and cellular compromise, e.g., S100A8, were also upregulated. The role of inflammation, immune, and stress responses in SBSN expression is supported by a microarray analysis of therapy-resistant cancer cells in vitro, which showed elevated transcripts of innate immune response genes and SBSN following 5-aza-2′-deoxycytidine (5-AC) treatment or γ-radiation. Notably, activation of the ERK pathway downstream of IFN signalling emerged as a direct activator of SBSN expression. Therefore, epidermal differentiation and the response to inflammation provide hints about the processes that induce SBSN expression.
The set of transcription factors and stimuli responsible for SBSN transcription is not specifically defined, but some candidates have been suggested. The Sbsn transcript is elevated in differentiating mouse keratinocytes in vitro, whereas several genes of the cornified envelope are downregulated upon SBSN knockdown. This provides additional support for a coordinated gene expression program during skin differentiation. Targets of the ERK pathway, i.e., the components of the AP-1 transcription factor complex c-FOS, FRA-1, FRA-2, c-JUN, JUND, and JUNB, are differentially expressed in keratinocytes during their terminal differentiation in organotypic cultures, and AP-1 proteins are differentially expressed in the human epidermis. This supports a role of ERK in SBSN expression; however, studies of the role of ERK in keratinocyte differentiation have yielded contradictory results. Additionally, MAL/SRF signalling also results in elevation of JUNB, which plays an essential role in epidermal differentiation. Inhibition of the BCR-RHOA-MAL/SRF pathway resulted in the reduction of JUNB and SBSN transcripts, together with disruption of keratinocyte granulation and of the development of the stratum corneum in an organotypic model of the human epidermis. Mice bearing a conditional knockout of Srf in basal cells showed reduced Sbsn transcript levels, and indeed, SRF binding sites were predicted within the SBSN proximal promoter region of both human and mouse: three binding sites in mice (−1994/−1978, −1036/−1023, and −789/−773; relative scores 0.83, 0.8, and 0.8, respectively) and one binding site in human (−1777/−1760; relative score 0.82). Changes in the actin cytoskeleton and MAL/SRF promote physical stimuli-induced keratinocyte differentiation via JUNB. Note that in dog muscle, the JUNB transcript is elevated 2 h post-exercise, while the Fos transcript is downregulated.
ERK-mediated regulation of SBSN is further supported by increased expression of Sbsn in murine endothelial cells following treatment with Egf, but not bFgf. Furthermore, phorbol 12-myristate 13-acetate (PMA)-mediated ERK activation efficiently enhanced SBSN expression, though this effect might be partially mediated by PKC, since calcium-induced SBSN expression during differentiation of primary epidermal keratinocytes in vitro was suppressed by a PKC inhibitor. Therefore, multiple pathways activated during keratinocyte differentiation may promote SBSN expression, likely, but not exclusively, via AP-1-mediated transcription. Indeed, the SBSN promoter region contains multiple AP-1 binding sites. In total, a JASPAR search predicted 72 AP-1 binding sites with extensive sequence overlaps; hence, a smaller number of distinct regions is more likely. Five regions showed relative scores >0.9 (−93/−102, −112/−118, −131/−137, −194/−206, −1147/−1153).
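The step from many overlapping predicted sites to a smaller number of distinct regions can be illustrated by merging overlapping coordinate intervals. The following is a minimal sketch of that idea; the function name and the example coordinates are hypothetical and are not the sites predicted in this analysis.

```python
from typing import List, Tuple

Site = Tuple[int, int]  # (start, end) relative to the transcription start site


def merge_sites(sites: List[Site]) -> List[Site]:
    """Collapse overlapping predicted sites into non-overlapping regions."""
    # Normalize each pair so that start <= end, then sort by start position.
    normalized = sorted((min(s), max(s)) for s in sites)
    merged: List[Site] = []
    for start, end in normalized:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it if necessary.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical overlapping predictions (illustrative values only).
example_sites = [(-206, -194), (-200, -190), (-137, -131), (-135, -128), (-118, -112)]
regions = merge_sites(example_sites)
print(regions)
```

In this toy input, five predicted sites collapse into three regions. Whether overlapping predictions reflect one functional element or several would still need experimental evidence.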
A lower temperature (33 °C) is frequently used in biotechnology for the culture and propagation of Chinese hamster ovary cells. In a recent study, a list of cold-induced genes, which included SBSN, was established using RNAseq 48 h after the change of conditions. This was further supported by the ectopic expression of luciferase driven by the SBSN promoter in cells exposed to lower temperatures. S100A4 was identified as a cold-induced gene, together with S100A6, which is supported by a previous study. Furthermore, a JunD binding site was predicted among selected cold-regulated promoters, including that of SBSN, and JUND transcript levels were elevated following cold treatment.
High-altitude acclimatization is a response to the hypoxic stress of lower oxygen tension and hypobaric pressure; it involves enhanced hematopoiesis, increased blood volume, and neoangiogenesis to redistribute blood flow to vital organs, including the brain. Blood flow to the brain is mediated by the carotid arteries. Upon acclimatization to high-altitude-associated long-term hypoxia, the SBSN transcript was one of 58 significantly upregulated transcripts in the carotid arteries of sheep. Interestingly, most of the altered genes were associated with cell migration, growth and proliferation, and angiogenesis. The authors also noted that some of the regulated genes are common targets of treatment with lipopolysaccharide (LPS), again supporting the contribution of the innate immune response to SBSN expression. Note that the upstream pathway, ERK, and a target of SBSN signalling, AKT, showed increased activation in sheep carotid arteries under long-term hypoxia.
The proximal region of the SBSN gene promoter was originally described as AT-rich, lacking CpG islands, and containing a canonical TATA box; however, in the analysis presented here (Figure 3), we were not able to identify these features within 250 bp upstream of the +1 site of the human SBSN promoter region. Instead, we observed an initiator element at +9/+15 (Figure 3). Multiple NF-kappaB binding sites were predicted within a 2 kbp region upstream of the SBSN transcription start site in both human and mouse. We predicted thirteen NF-kappaB binding sites within this region (Figure 3). The most prominent sites are depicted (relative score > 0.92; −1921/−1930, > 0.8; −48/−39). Note that both PMA and LPS are potent inducers of the NF-kappaB pathway. Furthermore, binding sites for several other transcription factors, such as SP1, TFAP2A, MYC, SMAD2, and FOXO1/FOXO4, were predicted within the SBSN proximal promoter. Indeed, SBSN transcription in confluent human adipose tissue-derived stem cells (ASCs) is mediated by FOXO1, since downregulation of FOXO1 with silencing RNA reduced SBSN transcript levels. We predicted four FOXO1 binding sites within the 2 kbp SBSN promoter region (relative score > 0.84; −1616/−1609, −1668/−1661, −1692/−1685, −1930/−1923; Figure 3). Interestingly, SBSN promotes aromatase expression, and SBSN was shown to be induced by 17beta-estradiol treatment. This is supported by the prediction of estrogen receptor (ER) binding sites within the promoter. We identified thirteen ER binding sites within the 2 kbp upstream region of the SBSN promoter and selected the three highest-scoring regions for depiction (relative score > 0.89; −438/−427, > 0.86; −578/−568, > 0.85; −1902/−1888; Figure 3). Altogether, dozens of transcription factor binding sites are predicted. Needless to say, confirming the function of the predicted binding regions requires further investigation.
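The relative scores quoted for the predicted sites can be read as a rescaling of the raw position weight matrix (PWM) score of a candidate site between the lowest and highest scores the matrix can produce. The following is a minimal sketch of that calculation with a hypothetical four-position toy matrix; it is not a real JASPAR profile, and the function names are illustrative rather than part of any JASPAR tool.

```python
from typing import Dict, List, Tuple

PWM = List[Dict[str, float]]  # one column of log-odds weights per motif position

# Hypothetical toy matrix (illustrative weights only).
TOY_PWM: PWM = [
    {"A": 1.2, "C": -0.8, "G": -1.5, "T": -0.3},
    {"A": -1.0, "C": 1.5, "G": -0.5, "T": -1.2},
    {"A": -0.7, "C": -1.1, "G": 1.4, "T": -0.2},
    {"A": -1.3, "C": -0.4, "G": -0.9, "T": 1.1},
]


def relative_score(pwm: PWM, site: str) -> float:
    """Rescale the raw PWM score of a site to the range 0..1."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    raw = sum(column[base] for column, base in zip(pwm, site))
    best = sum(max(column.values()) for column in pwm)
    worst = sum(min(column.values()) for column in pwm)
    return (raw - worst) / (best - worst)


def scan(pwm: PWM, sequence: str, threshold: float = 0.8) -> List[Tuple[int, str, float]]:
    """Return (offset, site, relative score) for windows at or above threshold.

    Expects an uppercase A/C/G/T sequence and scans one strand only.
    """
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = relative_score(pwm, window)
        if score >= threshold:
            hits.append((i, window, score))
    return hits


hits = scan(TOY_PWM, "TTACGTAA", threshold=0.99)
print(hits)
```

A threshold such as 0.8 or 0.9 then selects only windows close to the best possible match. A high relative score indicates sequence similarity to the motif, not that the factor actually binds at that position.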
Multiple studies described aberrant elevation of SBSN in human malignancies, but the mechanisms responsible for SBSN upregulation under pathological conditions are not understood. As mentioned, SBSN expression is limited to specific tissues. Bisulfite sequencing of 11 healthy human lung tissue samples revealed methylation of SBSN promoter CpG islands. This indicates that promoter methylation, and the likely associated transcriptional repression, is responsible for SBSN silencing. Indeed, treatment of normal human bronchial epithelial cells and human small airway epithelial cells with the demethylating agent 5-AC and the histone deacetylase inhibitor trichostatin A (TSA) elevated SBSN transcript levels. This is further supported by hypomethylation of the SBSN promoter in approximately 46% (13/28) of primary non-small cell lung carcinoma (NSCLC) samples observed in the same study. Hypomethylation of the SBSN promoter was an effect of a dysregulated proto-oncogenic zinc finger transcription factor, CTCFL/Brother of the Regulator of Imprinted Sites (BORIS). Similarly, SBSN transcription can be induced in vitro in a salivary gland adenoid cystic carcinoma (ACC) cell line with 5-AC and TSA, resulting in hypomethylation of the CpG island. SBSN transcripts are also induced with 5-AC in human cancer cell lines such as DU-145, MCF-7, and HeLa. Notably, SBSN proteins were only detectable in a low-adherent subfraction of therapy-resistant 5-AC-treated cells with stem cell-like properties, indicating post-transcriptional regulation of SBSN expression. The SBSN gene contains two experimentally confirmed CTCFL/BORIS binding sites: one in the first exon close to the transcription start site (+202/+374) and one in the second intron in front of a CpG island (+2678/+2840).
BORIS-mediated induction of SBSN is associated with demethylation of a CpG island in the second intron of SBSN and with changes in histone marks, comprising elevation of the active H3K4me3 and H3K14Ac marks and downregulation of the repressive H3K9me3 mark. Importantly, the elevation of SBSN transcript levels mediated by BORIS is dose-dependent. Relatively low BORIS levels resulted in significantly higher SBSN transcript levels than high BORIS levels, which were associated with re-methylation of the second intron of SBSN and increased nucleosome occupancy of the SBSN transcription start site. The repressive histone mark H3K9me3 mirrored the methylation pattern of the SBSN second intron. Interestingly, CTCF and BORIS compete for the same binding sites in the SBSN promoter region, indicating that the epigenetic and chromatin states play essential roles in SBSN expression. Hence, methylation of regulatory sites represents the main feature responsible for the regulation (suppression) of SBSN transcription.
The previously suggested connection between cell stemness and SBSN expression is supported by the identification of SOX2, a stem cell factor commonly upregulated in cancer, as a regulator of SBSN expression in oesophageal squamous cell carcinoma (ESCC). The SOX2 binding site (−1566/−1559) in the proximal region of human SBSN was confirmed by chromatin immunoprecipitation (ChIP), and it remains the only region of the SBSN promoter with an experimentally validated transcription factor binding site. Furthermore, cardiac microvascular endothelial cells from Klf2/Klf4 double-knockout mice showed significantly reduced Sbsn transcript levels compared to those from control mice. We identified 25 putative KLF4 binding sites and 30 putative KLF2 binding sites in the human SBSN promoter region; the two most prominent are depicted (KLF4 relative score > 0.9; −250/−239, KLF2 relative score > 0.9; −1993/−1983; Figure 3). These observations strengthen the importance of stemness factors in the regulation of SBSN expression.