Generalizability of GWA-Identified Genetic Risk Variants

The Arabian Peninsula, located at the nexus of Africa, Europe, and Asia, was implicated in early human migration. The Arab population is characterized by consanguinity and endogamy leading to inbreeding. Global genome-wide association (GWA) studies on metabolic traits under-represent the Arab population. Replicability of GWA-identified association signals in the Arab population has not been satisfactorily explored. It is important to assess how well GWA-identified findings generalize if their clinical interpretations are to benefit the target population. 

  • risk loci
  • metabolic traits
  • GWAS
  • transferability of risk loci
  • population diversity
  • Arab ancestry

1. Introduction

Over the past few years, a multitude of global genome-wide association (GWA) studies have identified genetic risk variants associated with metabolic traits and related disorders. Efforts to translate GWAS findings into polygenic risk scores (PRS) across populations to decipher their clinical interpretation are gaining momentum [1,2]. Our recent examination of the GWAS Catalog [3] against 313 search terms relating to four classes of metabolic traits (namely anthropometry, glycemia, lipid, and blood pressure) found 7668 genetic variants from ~4000 genes associated with metabolic traits [4]; association signals involving 4746 (i.e., 62%) of the 7668 variants were at genome-wide significance (p-values ≤ 5.0 × 10⁻⁸). A majority of these studies were performed on populations of European, East Asian, and African ancestries. Arab populations from the Middle East are among the most underrepresented in genetic studies [5,6,7,8]. Indeed, of the GWA studies summarized in the GWAS Catalog, 88.65% were on Europeans, while only 7.02% were on Asians and 4.33% on African, Latin, and other populations [9] (www.gwasdiversitymonitor.com, accessed on 21 July 2021). The clinical applicability of such GWA-identified risk loci and PRS to ethnic populations that are under-represented in global studies depends on the generalizability of the underlying association signals to the target populations.

By virtue of being situated between Africa, Europe, and South Asia, the Arabian Peninsula forms an important region in the history of early human migrations and admixtures [5,6,8]. We and other researchers have illustrated that major sources of ancestry forming the modern Arab population are from sub-Saharan/Western Africa and from West Eurasia [8,10,11,12]. The region had several humid periods resulting in a “green Arabia”, which facilitated human dispersals and migrations [8]. The onset of the current desert climate is thought to have started around six thousand years ago [13]. Eventually, the inhabitants of the Peninsula region adapted to the hot and dry environment. The adaptation and natural selection shaped the extant human populations of the Arabian Peninsula region [8,14]; for example, we demonstrated that a haplotype overlapping TNKS showed strong signals of positive selection in the Arab cohort and proposed that this haplotype under selection potentially conferred a fitness advantage to the Kuwaiti ancestors for surviving in the harsh environment while posing a major health risk to present-day Kuwaitis [14].

The Arab population is characterized by unique features such as large families, consanguinity, endogamy, and first-cousin marriages, which have resulted in the creation of inbreeding communities. Such inbreeding communities are expected to have increased homozygosity at risk variants for both monogenic and polygenic diseases as well as an accumulation of deleterious recessive alleles in the gene pool; our previous GWA study under a genetic model based on the recessive mode of inheritance pinpointed 16 novel risk variants associated with plasma TG levels in Arab individuals from Kuwait [15,16]. Familial aggregation of hypercholesterolemia [17], type 2 diabetes [18,19,20,21], and type 1 diabetes [22] is prominent among Arabs. Further, the exceptional growth in prosperity in the Arabian Peninsula during the rich post-oil era brought rapid changes in lifestyles (such as urbanization, dietary changes, low levels of physical activity, and high levels of sedentary behavior) leading to chronic metabolic disorders [23]. These rapid lifestyle changes are expected to have an impact on gene–environment interactions; several diet–genetics–disease relationships in the region have been discussed as contributing to the increased prevalence of metabolic disorders and micronutrient deficiencies [24]. Furthermore, the Arab populations appear to have a higher genetic risk for metabolic disorders such as diabetes. For example, a study on Arab immigrants in the USA found that they had a higher risk of type 2 diabetes than native inhabitants [25]. Another study of Middle Eastern immigrants in Sweden found that the immigrants had a two- to threefold higher risk of type 2 diabetes than native Swedes [26]. We have discussed, in our earlier publication [27], that combinations of such lifestyle changes, gene–environment interactions, and genetic predispositions have probably led to the dramatic increase in the prevalence of obesity, diabetes, and dyslipidemia in Arabs.

The replicability in the Arab population of association signals for metabolic traits identified by global GWA studies has not been satisfactorily explored. It is important to assess how well the GWA-identified risk loci generalize if a target population is to benefit from clinical interpretation of global GWA findings.

2. GWA Studies for Metabolic Traits on Arab Populations

Our 2019 literature review [27] reported that only 25 GWA-identified risk loci for metabolic traits had been replicated, largely by targeted genotyping studies, in Arab populations. Our recent genome-wide imputation and meta-analysis study from Kuwait [4] used a cohort of 2732 Arab individuals and observed that association signals involving only 304 (6.4%) of the 4746 metabolic risk variants identified at genome-wide significance in global GWA studies were replicable in the Kuwaiti cohort. These 304 variants are from 151 distinct genes (Supplementary Table S1). Of these 304 variants, 178 were observed by GWA studies in more than one population. The GWA study cohorts for these 304 variants were largely of European ancestry (for 260 of the 304 variants). Many of these transferable GWA-identified signals were observed in the Kuwaiti cohort at borderline significance, suggestive of association. In the same study [4], we further performed power calculations that considered effect sizes of GWA-identified risk variants and their allele frequencies in the Kuwaiti cohorts and projected that a sample size of at least 10,000 is needed to observe these 304 GWA-identified association signals for metabolic traits at genome-wide significance in the Arab population. The study further projected a necessary sample size of 20,000 in order to observe the other GWA-identified association signals in the Arab population.
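To illustrate how a power calculation of this kind combines effect size, allele frequency, and the significance threshold, the following is a minimal Python sketch. It is not the procedure used in [4]; it assumes a standardized quantitative trait, an additive model, Hardy-Weinberg proportions, and a two-sided one-degree-of-freedom test approximated by a normal distribution. The function names and the example effect sizes and allele frequencies are hypothetical.

```python
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5.0e-8  # genome-wide significance threshold quoted in the text
_STD_NORMAL = NormalDist()


def variance_explained(beta: float, maf: float) -> float:
    """Fraction of trait variance explained by one additive variant.

    Assumption: the trait is standardized (variance 1) and beta is the
    per-allele effect in standard-deviation units.
    """
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def power(n: int, beta: float, maf: float, alpha: float = GENOME_WIDE_ALPHA) -> float:
    """Approximate power of a two-sided 1-df association test with n samples."""
    q2 = variance_explained(beta, maf)
    shift = (n * q2 / (1.0 - q2)) ** 0.5        # sqrt of the non-centrality parameter
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2.0)  # two-sided critical value
    upper_tail = 1.0 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def required_sample_size(beta: float, maf: float, target_power: float = 0.8,
                         alpha: float = GENOME_WIDE_ALPHA) -> int:
    """Smallest n whose approximate power reaches target_power."""
    low, high = 1, 2
    while power(high, beta, maf, alpha) < target_power:
        high *= 2                               # grow the bracket until it contains the answer
    while low < high:                           # binary search; power increases with n
        mid = (low + high) // 2
        if power(mid, beta, maf, alpha) >= target_power:
            high = mid
        else:
            low = mid + 1
    return low


if __name__ == "__main__":
    # Hypothetical variants: (per-allele effect in SD units, minor allele frequency)
    for beta, maf in [(0.10, 0.30), (0.10, 0.05), (0.05, 0.30)]:
        print(beta, maf, required_sample_size(beta, maf))
```

Under these assumptions, the required sample size grows as the effect size or the minor allele frequency shrinks, which is why the same variant can need a larger cohort in a population where its risk allele is rarer.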

In a recent study, Thareja et al. [28] performed genome-wide association tests to delineate risk variants for 45 clinically relevant traits using a discovery set of whole-genome sequences of 6218 Qatari individuals. The examined traits included two (namely anthropometry and lipid) of the four classes of metabolic traits examined in our study [4]. Though Thareja et al. used a large sample size of 6218, nearing two-thirds of the 10,000 projected by our study, they observed only four GWA-identified association signals relating to anthropometric traits and 26 GWA-identified association signals relating to lipid traits at genome-wide significance. One of these four anthropometric trait association signals and 22 of the 26 lipid trait association signals were observed in our study [4] (Table 1). These 23 association signals for lipid traits comprised 18 distinct variants from 15 genes (Table 2); 10 of these 18 distinct variants are “low-frequency” (MAF < 5%) variants in one of the examined populations while “common” (MAF > 5%) in other populations.

3. Generalizability of GWA-Identified Association Signals in Arab Populations

The results presented by the above-mentioned two studies [4,28] from Kuwait and Qatar indicate that the generalizability of GWA-identified association signals in the Arab population is still an “open” question. Though it is possible that the limited sample sizes and differences in study designs contribute to the observed low extent of transferability, the role of differences in factors such as phenotypic variance due to unique environmental conditions, allele frequencies, and linkage disequilibrium profiles cannot be ruled out. Thareja et al. [28], using variants with minor allele frequency (MAF) > 1%, derived reasonable heritability (h²) values for obesity traits (height = 0.59; BMI = 0.31) and lipid traits (TC = 0.22; HDL-C = 0.41; LDL-C = 0.21; TG = 0.31) in the Qatari cohort. Further, they demonstrated a high overall correlation in heritability with European (r² = 0.81) populations compared to a low, yet reasonable, correlation with African (r² = 0.44) populations, suggesting that many of the association signals seen in Europeans are transferable to Arabs. However, the heritability values for obesity and lipid traits, when individually examined, were significantly lower in the Qatari cohort compared to Europeans, suggesting that much of the heritability of obesity and lipid traits is still not explained by the study. Since a great proportion of phenotypic variance for complex traits is contributed by rare variants (MAF < 1%) [29], an effective study of heritability requires a still larger cohort. These variations in heritability also warrant more Mendelian randomization studies to pinpoint the environmental factors causally linked to trait associations.

In our study from Kuwait [4], we observed that only those GWA-identified variants with larger effect sizes replicate well in the Arab population; failure to replicate the variants with small effect sizes could be due to the modest sizes of our study cohorts. Thareja et al. [28] found significant differences in both effect size and allele frequency of variants associated with replicated risk loci and emphasized the need for larger GWAS to determine accurate PRS in the Arab population. Complex metabolic disorders are influenced by multiple common genetic variants with small effect sizes; hence, meaningful polygenic risk scores (PRS) are derived by inspecting the cumulative effect of multiple variants. The genetic variants used to build PRS can differ in allele frequencies across populations for reasons such as natural selection and population expansion leading to adaptation to local environmental factors. A recent study from Iran [30] found multiple T2D-risk SNPs that were significantly depleted or enriched in at least one of the five populations of the 1000 Genomes Project (African, American, East Asian, European, and South Asian) as well as the Iranian population. They further found that a PRS built using the enriched risk alleles in Iran was significantly associated with type 2 diabetes incidence in their longitudinal cohort study. The global GWA studies are highly Eurocentric. As a result, PRS developed using risk variants identified through such global studies do not predict individual risk accurately in non-Europeans [31]. To realize the full and equitable potential of PRS in ethnic populations such as Arabs, there is a need to prioritize greater diversity in global genetic studies.
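The cumulative nature of a PRS, and its sensitivity to allele frequency differences, can be sketched in a few lines of Python. This is an illustrative, simplified weighted-sum score and not the scoring method of any study cited here; the variant identifiers, weights, and frequencies are hypothetical, and Hardy-Weinberg proportions are assumed when computing the population mean.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant).

    weights maps each variant to its per-allele effect size, e.g. a GWAS beta.
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for variants: {sorted(missing)}")
    return sum(weights[v] * dosages[v] for v in weights)


def expected_population_score(risk_allele_freqs: Mapping[str, float],
                              weights: Mapping[str, float]) -> float:
    """Mean score in a population; the expected dosage of a variant is 2 * frequency."""
    return sum(2.0 * risk_allele_freqs[v] * weights[v] for v in weights)


if __name__ == "__main__":
    # Hypothetical weights and risk-allele frequencies, for illustration only
    weights = {"variant_A": 0.12, "variant_B": 0.05, "variant_C": 0.08}
    freqs_population_1 = {"variant_A": 0.40, "variant_B": 0.25, "variant_C": 0.10}
    freqs_population_2 = {"variant_A": 0.15, "variant_B": 0.30, "variant_C": 0.35}

    person = {"variant_A": 2, "variant_B": 1, "variant_C": 0}
    print("individual score:", polygenic_risk_score(person, weights))
    print("mean, population 1:", expected_population_score(freqs_population_1, weights))
    print("mean, population 2:", expected_population_score(freqs_population_2, weights))
```

With the same weights, the two hypothetical populations have different mean scores purely because of their allele frequencies, so a threshold calibrated in one population need not carry the same meaning in the other.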

Differences in linkage disequilibrium (LD) patterns among populations appear to play a role in predictive differences [32,33]. A GWA variant strongly associated with a trait in one population may not have a detectable association in another, as the LD with the (unknown) causal variant may be much weaker [34]. LD for each Middle East population decayed faster than in European and East Asian populations but slower than in African populations [35]. The study by Thareja et al. [28] showed marked differences in linkage disequilibrium and allele frequencies among the European, East Asian, and Qatari populations. We found in our earlier study that, though the LD decay patterns seem to exhibit similar rates across the populations, the conservation values differ at any given distance: the population subgroups from Kuwait showed lower conservation values than the European French population [12].
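The dependence of a tag variant's signal on its LD with the causal variant can be made concrete with the standard r² statistic. The Python sketch below computes r² from phased haplotypes and applies the widely used approximation that a tag variant behaves like the causal variant tested in about N × r² samples. The haplotype data and function names are hypothetical, and the sketch assumes biallelic sites coded 0/1.

```python
from typing import Sequence, Tuple


def ld_r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Squared correlation (r^2) between two biallelic sites.

    Each haplotype is a pair (allele at tag site, allele at causal site),
    with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(a for a, _ in haplotypes) / n             # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n             # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0.0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    d = p_ab - p_a * p_b                                # LD coefficient D
    return d * d / denominator


def effective_sample_size(n_samples: int, r_squared: float) -> float:
    """Approximate sample size 'seen' by a tag variant in LD r^2 with the causal one."""
    return n_samples * r_squared


if __name__ == "__main__":
    # Hypothetical haplotypes for the same pair of sites in two populations
    strong_ld = [(1, 1)] * 45 + [(0, 0)] * 45 + [(1, 0)] * 5 + [(0, 1)] * 5
    weak_ld = [(1, 1)] * 30 + [(0, 0)] * 30 + [(1, 0)] * 20 + [(0, 1)] * 20
    for label, haps in [("strong LD", strong_ld), ("weak LD", weak_ld)]:
        r2 = ld_r_squared(haps)
        print(label, r2, effective_sample_size(6218, r2))
```

The cohort size passed in the usage example is borrowed from the Qatari discovery set mentioned earlier only to give a sense of scale; the resulting numbers describe the hypothetical haplotypes, not any real locus.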

Recently, the omnigenic model has been proposed [36,37] as a framework to explain the observed low transferability of polygenic scores and the variations in effect sizes across populations. The model explains how the interaction network comprising ‘core’ genes of GWAS findings and ‘peripheral’ (to the core) genes (participating in the pathway) ultimately leads to causality of phenotype through gene × environment interactions. The Arab population went through ‘rapid’ lifestyle changes in the post-oil era. Further, the European and Arab populations differ considerably in climate conditions. Even with consistency in effect sizes between European and Arab populations, the effect of ‘core’ genes on phenotype via the ‘peripheral’ gene network can differ because of differences in gene × environment interactions; thus, the predictive power of polygenic scores can differ substantially across these two population groups. On the other hand, heterogeneity in effect size (or even direction) at GWA loci transferable to the Arab population could be due to differences in LD structure and allele frequency. Often, direct estimates of cross-population genetic correlations are less than one. Although the difference in the contribution of ‘core’ genes to the loss of variance at the PRS level is small, much of the variance loss is likely due to differences in LD, allele frequency, and the causal effect of gene × environment interactions at ‘peripheral’ genes [36]. Hence, the predictive power of polygenic risk scores decreases more severely than would be expected from the differences in allele frequency and LD structure at ‘core’ genes alone.

This entry is adapted from the peer-reviewed paper 10.3390/genes12101637
