2. Major Histocompatibility Complex (HLA) Locus: Population Aspects
The HLA region is one of the most complex and polymorphic regions of the human genome and comprises more than 200 genes on the short arm of chromosome 6 (6p21.3). The structure of these genes shapes each individual's repertoire and affinity of T cell receptors, which in turn affects the functioning of the immune system. HLA genes fall into three classes, I, II, and III, which differ in function. HLA class I molecules are transmembrane glycoproteins found on the surface of all nucleated cells. HLA class II molecules are expressed on B lymphocytes, macrophages, dendritic cells, Langerhans cells, and thymic epithelial cells. The HLA class III region encodes several molecules important in inflammation, including complement components C2, C4, and factor B; tumor necrosis factor (TNF)-alpha; lymphotoxin; and three heat shock proteins [17].
Genetic predisposition to T1DM is determined primarily by HLA class II [18][19]. The HLA class II region has a complex structure; it contains three loci, DR, DQ, and DP, each of which includes a variable number of α- and β-chain genes. HLA-DRB is the most polymorphic locus. It comprises the HLA-DRB1 gene and, depending on which of the 13 gene haplotypes is present, may also include the genes HLA-DRB3, HLA-DRB4, and HLA-DRB5 and the pseudogenes HLA-DRB2, HLA-DRB6, HLA-DRB7, HLA-DRB8, and HLA-DRB9 [20].
Three genes are of the greatest importance in clinical practice: DRB1, with more than 400 allelic variants; DQA1, with 25 allelic variants; and DQB1, with 57 allelic variants. The frequency and spectrum of HLA haplotypes differ markedly between world populations and also between European populations living in different regions of Europe. For example, the most common HLA haplotype is A*01-B*08 in most European populations, A*03-B*35 among the Finns, and A*02-B*51 in populations of southeastern Europe [21].
Most HLA risk haplotypes have been identified in cohorts of European ancestry, and thus they may differ somewhat from those in populations of Asian and African ancestry. In particular, a comparative analysis by Harrison et al., 2020, showed that the DR3-DR3 genotype was far more strongly associated with T1DM in Indians than in Europeans (odds ratio (OR) = 148.8 versus OR = 16.9), whereas DR4-DR4 was not associated with a high T1DM risk in Indians [22].
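The odds ratios quoted in this section come from case-control comparisons. As a rough illustration of how such a figure and its confidence interval can be obtained, the sketch below computes an OR from a 2×2 table using the Woolf (log) method. The function name and the counts are hypothetical, and the cited studies may have used other estimators (for example, logistic regression with covariates).

```python
import math


def odds_ratio_with_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table using the Woolf log method.

    z = 1.96 corresponds to a 95% confidence interval.
    """
    cells = [case_carriers, case_noncarriers,
             control_carriers, control_noncarriers]
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero,
    # so that the ratio and its standard error stay finite.
    if any(count == 0 for count in cells):
        cells = [count + 0.5 for count in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR); small cells inflate it and widen the interval.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from any cited study.
or_value, lower, upper = odds_ratio_with_ci(30, 70, 5, 195)
print(f"OR = {or_value:.1f}, 95% CI {lower:.2f}-{upper:.2f}")
```

Because the standard error depends on the reciprocal of each cell count, a rare genotype in either group yields a wide interval even when the point estimate is large.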
Next-generation sequencing (NGS) of DNA samples from T1DM patients of the JDRF/Wellcome Trust Diabetes and Inflammation Laboratory and from controls of the British 1958 Birth Cohort showed that carriers of the alleles DRB1*03:01 and DRB3*02:02 had an independently increased risk of developing T1DM compared with carriers of the allele DRB3*01:01 [23]. The haplotype DRB1*03:01-DRB3*02:02 confers a high risk of developing T1DM (OR = 25.5, 95% CI 3.43–189.2) [23]. Another study of people of European ancestry was performed by Zhao et al., 2016. The patients, aged from 9 months to 18 years, were participants in the nationwide Swedish Better Diabetes Diagnosis study. The researchers found that among 25 alleles of the HLA-DRB1 gene, only four were associated with T1DM: DRB1*03:01:01, DRB1*04:01:01, DRB1*04:04:01, and DRB1*04:05:01. Moreover, for the DRB4 gene, the variant DRB4*01:03:01 was associated with T1DM, whereas no association was identified for the variant DRB4*01:01:01 [24].
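The allele names above follow the standard HLA nomenclature, in which colon-separated fields after the asterisk give, in order, the allele group, the specific protein, synonymous coding changes, and non-coding differences. The sketch below splits a name into these fields and checks whether two names refer to the same protein. The helper names `parse_hla` and `same_protein` are hypothetical, and the pattern deliberately ignores expression suffixes such as N or L, so it is an illustration rather than a full validator.

```python
import re
from typing import NamedTuple, Optional


class HlaAllele(NamedTuple):
    gene: str                   # e.g. "DRB1"
    allele_group: str           # field 1
    protein: Optional[str]      # field 2: specific HLA protein
    synonymous: Optional[str]   # field 3: synonymous coding substitutions
    noncoding: Optional[str]    # field 4: differences in non-coding regions


# Optional "HLA-" prefix, gene name, "*", then one to four numeric fields.
_ALLELE_RE = re.compile(
    r"^(?:HLA-)?([A-Z]+[0-9]*)\*(\d+)(?::(\d+))?(?::(\d+))?(?::(\d+))?$"
)


def parse_hla(name: str) -> HlaAllele:
    """Split an allele name such as 'DRB1*03:01:01' into its fields."""
    match = _ALLELE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised HLA allele name: {name!r}")
    return HlaAllele(*match.groups())


def same_protein(a: str, b: str) -> bool:
    """True if both names resolve to the same gene, allele group and protein."""
    x, y = parse_hla(a), parse_hla(b)
    if x.protein is None or y.protein is None:
        return False  # one-field names do not identify a protein
    return (x.gene, x.allele_group, x.protein) == (y.gene, y.allele_group, y.protein)


print(parse_hla("DRB1*03:01:01"))
print(same_protein("DRB4*01:03:01", "DRB4*01:01:01"))
```

Comparing at two-field resolution is one way to reconcile studies that report DRB1*03:01 with studies that report DRB1*03:01:01.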
Some HLA-DQA1 variants, such as DQA1*02:01, which is protective against T1DM in European populations [5][25], are among the most common alleles in Brazilian cohorts [26]. Gomes et al., 2023, found that the risk of T1DM in Brazilians may be determined by the risk haplotypes typical of European populations; in subjects of African origin, however, the same risk variants had a protective effect [27].
Ancestral history shapes patterns of genetic drift and selection around the world and thereby influences the risk of developing multifactorial diseases. Allele frequencies, including those in genomic regions that influence the risk of T1DM, are a product of evolutionary history [28]. In particular, genetic diversity is known to be greatest in African populations, but because modern DNA testing emphasizes HLA risk haplotypes specific to European populations, there is a high potential for misidentifying T1DM risk in populations of non-European ancestry. For example, the HLA-DRB1*04:03 allele, which is quite common among East Asians and Hispanic Americans, is protective and counteracts the high-risk variants HLA-DQ8 and HLA-DQ2 characteristic of Europeans. Conversely, HLA-DR-DQ haplotypes that occur at low frequency in Europeans are associated with the risk of T1DM in populations from Africa and the Middle East [28].
Such differences raise the question of whether population-specific genetic associations, if they are secondary to environmental factors, depend on the geographical location of populations. Data indicate that the frequencies of HLA alleles and haplotypes in European populations were shaped by strong selection pressures, among which the medieval bubonic plague epidemic was one of the most significant. Compared with individuals from burials dating to the plague epidemic, modern Europeans carry the HLA-DRB1*13 variant more than twice as often, whereas HLA-B alleles encoding isoleucine at position 80 (I-80+), HLA-C*06:02, and HLA-DPB1 alleles encoding histidine at position 9 are half as frequent. Thus, significant shifts in HLA allele frequencies may indicate natural selection for resistance to a specific pathogen [29].
Recent studies report that the frequency distribution of HLA genotypes associated with T1DM has changed significantly over the past 50 years [30][31], which may indicate a shift in evolutionary selection and an increase in environmental pressure that raises the penetrance of the disease [32].
Thus, investigating how the prevalence of particular HLA haplotypes varies among the world's populations is of paramount importance for assessing the risk of developing T1DM.
3. Candidate Gene Research
The candidate gene approach identified several genes in which variants are associated with T1DM. In particular, Bottini et al., 2004, found that the rs2476601 polymorphism in the PTPN22 gene, which encodes the lymphoid protein tyrosine phosphatase (LYP), was associated with T1DM in North Americans of non-Hispanic European ancestry; the polymorphism impairs formation of the LYP–CSK complex, whose biological function is to inhibit T cell activation [33]. This polymorphic variant was investigated by Russian researchers as well. Ivanova et al., 2013, tested the rs2476601 polymorphism for association with T1DM in Bashkirs, Yakuts, Buryats, Udmurts, and Russians and found that the 1858T variant of the PTPN22 gene was associated with T1DM in the Udmurt, Russian, and Bashkir populations, while no such association was found in the Yakuts and Buryats [34].
Gene mapping has shown that CTLA-4, as well as its adjoining genes, may be involved in susceptibility to T1DM [35]. Kavvoura et al., 2005, pooled studies retrieved from the MEDLINE and EMBASE databases, which together provided genotype data for 5637 people with T1DM and 6759 healthy controls, and found that the A49G polymorphism of the CTLA-4 gene was associated with T1DM. Although the G allele is more common in Asian populations than in Europeans, the risk effect of the G allele proved to be independent of race and ethnicity [36].
Lowe et al., 2007, resequenced the interleukin-2 receptor alpha gene IL2RA and identified an association of T1DM with two independent groups of SNPs spanning regions of 14 and 40 kilobases, which include intron 1 of the IL2RA gene and the 5′ regions of the IL2RA and RBM17 genes. Among them, rs11594656 was associated with lower circulating levels of IL-2RA (p = 6.28 × 10⁻²⁸). The authors suggested that a genetically determined low immune response predisposes to T1DM [37].
Thus, years of research into candidate T1DM genes identified several significant loci, which were replicated by several research groups.
4. Multicenter and Genome-Wide Association Studies (GWAS)
Genome-wide association studies (GWAS), as well as a number of large multicenter studies, have significantly expanded knowledge about the genetic basis of T1DM; they have identified about 70 highly significant risk single nucleotide polymorphisms (SNPs) located outside the HLA genes.
Over the past 15 years, such work has been carried out by large national and international collaborations, including the Type 1 Diabetes Genetics Consortium (T1DGC) [37][38], The Environmental Determinants of Diabetes in the Young (TEDDY) [39], the Diabetes Autoimmunity Study in the Young (DAISY) [40], the Diabetes in the Newborn Study (BABYDIAB) [41], the Wellcome Trust Case Control Consortium (WTCCC) [42], the Diabetes Prevention Trial-Type 1 (DPT-1) [5], the International TrialNet Research Network [43][44], the Finnish Diabetic Nephropathy Study (FinnDiane) [31], the Multinational European Project on Latent Autoimmune Diabetes in Adults (Action LADA) [45], and the Multicentre Clinical Study of the Vienna Eurodiab Center [46]. These studies have accumulated an enormous amount of diverse data, much of which remains to be validated, since differences in ethnic composition and in the age groups studied can produce rather contradictory results.
In particular, the WTCCC identified relatively few new T1DM risk markers. Among them are variants in the following genes: ERBB3 (a receptor of the epidermal growth factor receptor family), SH2B3 (adapter protein), and CLEC16A (C-type lectin domain family 16 member A) [42]. A genome-wide association study and subsequent meta-analysis by Barrett et al., 2009, based on a sample of 7514 T1DM cases and 9045 healthy controls, showed an association of more than 40 SNPs (p < 10⁻⁶) with type 1 diabetes. After loci previously associated with T1DM were excluded, the remaining 27 loci were examined further in an independent sample of 4267 T1DM cases, 4463 healthy controls, and 2319 siblings. In this replication stage, more than 15 loci retained a statistically significant association with T1DM (p < 0.01; overall p < 5 × 10⁻⁸). The most significant SNPs are located in the interleukin genes IL10, IL19, IL20, and IL27, as well as in the gene for the transcription factor GLIS3 and in the T cell activation antigen gene CD69 [47]. Mutations in the GLIS3 gene were identified in children from three different consanguineous families with neonatal diabetes, concomitant congenital hypothyroidism, and other clinical complications [48]. The 12p13.31 region contains a number of immunoregulatory genes, including CD69, which is induced by T cell activation and functions in the egress of cells from the thymus; these genes belong to the calcium-dependent (C-type) lectin (CLEC) domain family and have immune functions. The authors concluded that the relative risks for non-HLA loci were reduced in carriers of high-risk HLA haplotypes, which confirms the polygenic and genetically heterogeneous structure of T1DM [49].
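The two-stage logic quoted above for the Barrett et al. meta-analysis (a discovery threshold, a replication threshold, and an overall genome-wide threshold) can be expressed as a simple filter. The sketch below is illustrative only: the record layout, the function name, and the example p-values are hypothetical, and the actual study combined evidence with its own statistical procedures.

```python
from dataclasses import dataclass


@dataclass
class SnpResult:
    snp_id: str
    p_discovery: float    # p-value in the discovery meta-analysis
    p_replication: float  # p-value in the independent replication sample
    p_combined: float     # overall p-value across both stages


def passes_two_stage(result: SnpResult,
                     discovery_alpha: float = 1e-6,
                     replication_alpha: float = 0.01,
                     genome_wide_alpha: float = 5e-8) -> bool:
    """Apply the three thresholds quoted in the text to one SNP."""
    return (result.p_discovery < discovery_alpha
            and result.p_replication < replication_alpha
            and result.p_combined < genome_wide_alpha)


# Hypothetical results; identifiers and p-values are placeholders.
results = [
    SnpResult("snp_A", 3e-8, 2e-4, 1e-10),  # passes all three thresholds
    SnpResult("snp_B", 5e-7, 0.20, 4e-7),   # fails replication
    SnpResult("snp_C", 2e-5, 1e-3, 6e-8),   # fails discovery and overall
]
confirmed = [r.snp_id for r in results if passes_two_stage(r)]
print(confirmed)
```

The overall threshold of 5 × 10⁻⁸ is the conventional genome-wide significance level, which corresponds roughly to a Bonferroni correction of 0.05 for about one million independent tests.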
Although most polymorphic variants associated with T1DM are located in non-coding regions, variants in the coding regions of DNA are of particular interest, since they not only affect gene expression in the pancreas but also significantly change the structure of signaling proteins in immune-competent cells. Onengut-Gumuscu et al., 2015, identified coding variants associated with T1DM in seven genes: PTPN22 (tyrosine phosphatase, previously identified in a candidate gene study), IFIH1 (a RIG-I-like receptor), SH2B3 (adapter protein), CD226, TYK2 (tyrosine kinase 2), FUT2 (galactoside-2-alpha-L-fucosyltransferase 2), and SIRPG (signal regulatory protein gamma) [50].
In addition, SNPs were identified that overlap potential enhancers near the genes CTLA4 (cytotoxic T-lymphocyte-associated protein 4), CTSH (cathepsin H), and UBASH3A (ubiquitin-associated and SH3 domain-containing protein A) [50]. The authors emphasized that most markers located in enhancer sequences actively affected gene expression in thymus cells, T and B cells, and CD34+ stem cells. According to their preliminary inferences, enhancer–promoter interactions can now be analyzed in these cell types to determine which genes and regulatory sequences are causal, that is, which constitute the first links in the pathogenesis of type 1 diabetes.
It is worth noting that the variants rs2476601 in the PTPN22 gene and rs11203203 in the UBASH3A gene are associated with the emergence of autoimmunity to pancreatic β-cells, while polymorphic variants of the INS, UBASH3A, and IFIH1 genes are associated with the transition from an autoimmune reaction against pancreatic β-cells to clinical diabetes. Similar results were obtained in participants of the TEDDY study, in whom high-risk HLA haplotypes and four risk variants, namely rs2476601 in PTPN22, rs2292239 in ERBB3, rs3184504 in SH2B3, and rs1004446 in INS, were significantly associated with the development of an autoimmune reaction to pancreatic islet β-cells [51]. Data from the Finnish Childhood Diabetes Registry showed that the DR3-DQ2/DR4-DQ8 genotype affected the production of islet β-cell autoantibodies but not the subsequent development of T1DM [52].
Genome-wide association studies, together with their meta-analysis, replication, and bioinformatic processing, have revealed an enormous number of previously unknown DNA markers, which confirmed the complex, heterogeneous, and polygenic structure of T1DM. A pressing question now is how the large number of variants with small effect sizes (odds ratios, OR), scattered throughout the genome, contribute to the heterogeneity of the disease and to the clinical outcomes of T1DM.
5. The Polygenic Risk Score in Individuals with Type 1 Diabetes Mellitus
One promising method of bioinformatic analysis for assessing the hereditary risk of multifactorial diseases is the polygenic risk score (PRS), which gives an individual coefficient of susceptibility to a specific disease phenotype based on the analysis of a large number of polymorphic variants. An obvious application of PRS in the diagnosis of T1DM is assessing the combined risk contribution of HLA risk haplotypes together with other DNA markers. The score is calculated as a weighted sum of the individual risk alleles significantly associated with the trait [53].
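The weighted sum described above can be sketched as follows. This is a minimal illustration, assuming that each SNP's weight is the natural logarithm of its odds ratio (a common convention that the text does not specify). The function name is hypothetical, the SNP identifiers are ones mentioned in this review, and the odds ratios are placeholder values rather than estimates from the cited studies.

```python
import math


def polygenic_risk_score(dosages, odds_ratios):
    """Weighted sum of risk-allele dosages: sum over SNPs of ln(OR) * dosage.

    dosages:     SNP id -> number of risk alleles carried (0, 1 or 2)
    odds_ratios: SNP id -> per-allele odds ratio used as the weight source
    """
    score = 0.0
    for snp_id, odds_ratio in odds_ratios.items():
        # Simplification: a missing genotype is treated as 0 risk alleles.
        dosage = dosages.get(snp_id, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {snp_id} must be 0, 1 or 2")
        score += math.log(odds_ratio) * dosage
    return score


# Placeholder odds ratios for illustration only.
ILLUSTRATIVE_ODDS_RATIOS = {
    "rs2476601": 2.0,  # PTPN22
    "rs2292239": 1.3,  # ERBB3
    "rs3184504": 1.2,  # SH2B3
}

person = {"rs2476601": 1, "rs2292239": 2, "rs3184504": 0}
print(round(polygenic_risk_score(person, ILLUSTRATIVE_ODDS_RATIOS), 3))
```

In practice HLA haplotypes are often handled separately from single SNPs because their effects interact, so a purely additive sum like this one is best read as the non-HLA part of a score.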
Winkler et al., 2014, applied multivariate logistic regression to data from the Type 1 Diabetes Genetics Consortium (T1DGC; N = 5781) and found that adding 40 SNPs of genes located outside the HLA locus to the model significantly improved the prediction of T1DM compared with models that included only HLA alleles and haplotypes. By stepwise inclusion and exclusion of genetic predictors, the authors selected a model with optimal predictive value, which involved the following genes: HLA, PTPN22, INS, IL2RA, ERBB3, ORMDL3, BACH2, IL27, GLIS3, and RNLS (AUC = 0.86, 95% CI 0.84–0.88) [54].
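The AUC reported for such a model can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition. The function name and the scores are hypothetical, and the pairwise loop is meant for clarity rather than efficiency.

```python
def auc_from_scores(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case ranked above control
            elif case == control:
                wins += 0.5   # tie counts as half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks, for illustration only.
cases = [0.91, 0.75, 0.62, 0.40]
controls = [0.55, 0.30, 0.22, 0.10]
print(auc_from_scores(cases, controls))
```

An AUC of 0.5 corresponds to a model that ranks cases and controls no better than chance, and 1.0 to perfect separation.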
In light of these results, DNA screening to identify groups at high risk of T1DM has become important. Screening for T1DM with a variety of diagnostic tests at multiple time points during the first years of a person's life is feasible but quite expensive. DNA analysis could solve this problem, because a genotype does not depend on age or environmental factors and does not change over time. Introducing PRS-based prediction models into newborn screening could therefore identify individuals at high risk for closer monitoring and prevention of the severe consequences of T1DM.
6. Functional Role of DNA Risk Loci Related to Developing T1DM in Signaling Pathway Changes
Once risk loci have been identified, the next step is to investigate their role in the functional and molecular alterations of cell signaling pathways that lead to T1DM. Shapiro et al., 2021, suggested that dysfunction of FUT2 (galactoside-2-alpha-L-fucosyltransferase 2) caused by mutation of the gene leads to a lack of secretion of ABO blood group antigens through the intestinal mucosa. That, in turn, may impair the immune barrier of intestinal epithelial cells, which increases susceptibility to certain viral infections and changes the composition of the microbiome and of microbial metabolites, especially short-chain fatty acids. The expression of ABO antigens in the intestinal mucosa affects the binding of exogenous pathogens and commensal microbiota. The fecal microbiota of individuals with the rs601338 A/A genotype was found to contain, on average, fewer probiotic bifidobacteria, which produce immunoregulatory short-chain fatty acids and promote intestinal barrier integrity, which is critical for preventing commensal-induced autoimmunity [55]. The rs601338 risk allele is associated with a sharp deterioration in the first-phase insulin response in children with multiple T1DM-associated autoantibodies [55]. This could explain the relationship between patient age and secretor status at the time of diagnosis. Studies indicate that therapy that positively affects FUT2 is likely to be required at an early age in patients with a specific genotype [56][57].
In the context of the pathogenesis of T1DM, TYK2 (a member of the JAK family) enhances antigen presentation by stimulating the expression of HLA class I genes and promotes the expression of the chemokine CXCL10, which activates T cells and recruits them toward the pancreatic islets, thereby increasing the risk of developing the autoimmune process [58]. Moreover, this effect may be significantly complicated by the biological activity of cathepsin H (CTSH). Shapiro et al., 2021, made a number of assumptions regarding the importance of this molecule. CTSH is a lysosomal proteinase that plays a role in protein recycling, prohormone processing, and HLA class II antigen presentation, and it may also antagonize CXCL10. Although CTSH is expressed ubiquitously, its expression is most pronounced in type II lung alveolar cells during the maturation of surfactant protein [59]. Allele C of the rs2289702 locus in exon 1 of the CTSH gene proved to be protective against T1DM; it can affect the cleavage of cathepsin to its active form and its delivery to lysosomes [60]. The T allele of this locus is associated with the prevention of an early onset of T1DM, especially in patients younger than seven years [61]. One possible explanation for this finding is that decreased CTSH gene expression may reduce the N-terminal cleavage of Toll-like receptor 3 (TLR3), impairing TLR3 function and reducing type I interferon (TI-IFN) expression in response to viral infections in early childhood [61]. However, the relationship between CTSH expression and the risk of developing T1DM is further complicated by the report that CTSH overexpression induces intrinsic β-cell protection from cytokine-mediated damage and stimulates insulin production [62]. Functional studies support continued investigation of CTSH modulation as a potential means of preventing T1DM, with particular attention to the off-target effects of therapies that target the expression of this protein.
The polymorphic variant rs2476601 in exon 14 of the PTPN22 gene leads to the replacement of arginine with tryptophan at position 620 and is one of the loci most significantly associated with T1DM, second in importance only to the HLA and INS variants. The gene encodes non-receptor lymphoid tyrosine phosphatase type 22 (LYP), which is responsible for the dephosphorylation of signaling proteins. LYP is one of the most powerful inhibitors of T cell activation. The substitution affects the interaction between the LYP proline-rich motif and CSK tyrosine kinase, impairing the modulation of signal transduction. One study indicates that the variant is associated with the synthesis of autoantibodies to insulin, which appear more rapidly in children carrying high-risk HLA haplotypes or in first-degree relatives of patients with type 1 diabetes [63]. Nevertheless, the role of rs2476601 in enhancing T cell activity remains controversial, since it is uncertain how this variant affects the functional activity of the tyrosine phosphatase [64]. It is well established, however, that the rs2476601 polymorphism impairs the interaction between LCK and LYP, which is accompanied by a decrease in LYP phosphorylation and ultimately contributes to a gain-of-function inhibition of T cell activation [65]. It is believed that a gain in LYP activity may predispose to autoimmunity through the decreased activation of regulatory T cells, which are required to suppress autoreactivity [66]. A complete loss of tyrosine phosphatase activity, by contrast, would impair the signaling apparatus of T cell receptors, leading to less effective dephosphorylation of signaling proteins and increased activation of effector T cells [67].
UBASH3A, a T cell ubiquitin ligand protein, reduces T cell receptor signaling. Most of the T1DM-associated variants of the UBASH3A gene known to date are intronic. It is known that UBASH3A regulates the NF-κB signaling pathway through a ubiquitin-dependent mechanism and that the T1DM-associated risk alleles rs11203203 and rs80054410 increase the expression of the UBASH3A gene in primary human CD4+ T cells upon stimulation of T cell receptors, which reduces NF-κB signaling through the IκB kinase complex and diminishes IL2 gene expression [68]. Suomi et al., 2023, published a summary of the results of transcriptomic profiling of whole blood samples from patients with T1DM as part of the INNODIA study. They found that the expression of some genes and the activity of signaling pathways involved in innate immunity were reduced during the first year after diagnosis. A significant change in gene expression was associated with positive ZnT8A autoantibody status [69].
On the other hand, the SIRPG, STXBP1, and UBASH3A genes showed an inverse correlation with positive ZnT8A autoantibody status. T1DM-associated SNPs near the SIRPG gene have been shown to modulate disease risk by controlling the alternative splicing of the gene. STXBP1 encodes syntaxin-binding protein 1, which regulates the docking and fusion of vesicles with the plasma membrane during exocytosis and is important for the cytotoxic activity of CD8+ T cells and NK cells. The UBASH3A genetic variant is associated with the development of T1DM in children from the DAISY and BABYDIAB cohorts. Type 1 diabetes-associated variants of the human UBASH3A gene caused higher levels of gene expression and decreased NF-κB signaling and IL2 expression in CD4+ T cells [69].