Whole-exome sequencing (WES) is now used for large-scale screening of structural and regulatory genes to determine the allele frequencies of disease-associated polymorphisms in various populations and thus to detect pathogenic genetic changes (mutations or polymorphisms) that lead to malfunctioning proteins. Exome sequencing already allows both the diagnosis of monogenic diseases (MDs) and the examination of seemingly healthy populations to reveal a wide range of potential risks before disease manifestation; in the future, it may outpace costly and less informative genome sequencing to become the first-line examination technique. This review establishes the human genetic passport as a new WES-based clinical concept for the identification of new candidate genes, gene variants, and molecular mechanisms in the diagnosis, prediction, and treatment of monogenic, oligogenic, and multifactorial diseases. Various diseases are addressed to demonstrate the extensive potential of WES and to consider its advantages and disadvantages. Thus, WES can become a general test with a broad spectrum of applications, including opportunistic screening.
1. Human Monogenic Diseases
The OMIM database (as of 1 June 2023) includes entries for 7377 hereditary diseases and syndromes, as well as their molecular associations [1]. These include 6305 phenotypes associated with a single gene, i.e., showing the monogenic nature of a genetic trait or syndrome. This was largely achieved through the active implementation of WES and the work of the exome consortium [2][3][4].
2. Population Genetic Research on Monogenic Diseases
The genetic structure of human populations has been extensively studied worldwide. The frequency of a nonreference (i.e., non-wild-type) allele in a particular population is one of the most important factors in the clinical interpretation of a genetic variant. Genetic variability in many regions of the world remains poorly understood despite the very large number of exomes (125,748) in the Genome Aggregation Database (gnomAD, v2.1). Wenhao Zhou et al. analyzed the prevalence of cystic fibrosis (CF) in China using 30,951 WES samples (20,909 pediatric and 10,042 parental) and compared the results with those for Caucasians [5]. After filtration, 477 variants of the cystic fibrosis transmembrane conductance regulator (CFTR) gene remained, of which 53 were annotated as pathogenic/likely pathogenic (P/LP). Based on the annotated variants, the authors estimated the prevalence of CF in China at 1/128,434. Only 39.6% (21/53) of these variants are used to screen for CF in Caucasians, so relying on that variant set alone underestimates the prevalence of CF in China both among children (1/143,171 with all P/LP variants vs. 1/1,387,395 with the Caucasian screening variants, p = 5 × 10⁻²⁴) and in the adult population (1/110,127 vs. 1/872,437, p = 7 × 10⁻¹⁰). The allele frequencies of six pathogenic variants (L88X, M469V, G622D, G970D, D979A, and 1898+5G->T) were higher in the Chinese population than in the gnomAD non-Finnish European population (all p < 0.1). Haplotype analysis showed greater haplotype diversity in the Chinese population than in Caucasians. The founder mutations in the Chinese and Caucasian populations were G970D and F508del, respectively, with two SNPs (rs213950–rs1042077) in an exonic region identified as the related genotypes.
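The link between prevalence, allele frequency, and carrier frequency for a recessive disease such as CF can be sketched as follows. This is a minimal illustration that assumes Hardy-Weinberg equilibrium (prevalence = q², carrier frequency = 2pq); the text above does not state which model the cited authors used, and the function names are hypothetical. Only the prevalence figures are taken from the paragraph above.

```python
import math


def allele_freq_from_prevalence(prevalence: float) -> float:
    """Aggregate pathogenic allele frequency q implied by a recessive prevalence q**2."""
    return math.sqrt(prevalence)


def carrier_freq(q: float) -> float:
    """Heterozygous carrier frequency 2pq under Hardy-Weinberg equilibrium (assumed)."""
    return 2.0 * q * (1.0 - q)


# Prevalence estimates quoted in the text above.
overall = 1 / 128_434            # China, all annotated P/LP variants
children_full = 1 / 143_171      # children, all P/LP variants
children_panel = 1 / 1_387_395   # children, Caucasian screening variants only

q_overall = allele_freq_from_prevalence(overall)
print(f"aggregate allele frequency: {q_overall:.5f}")
print(f"carrier frequency: about 1 in {1 / carrier_freq(q_overall):.0f}")

# Share of the aggregate pathogenic allele frequency that the
# Caucasian screening variants capture in the pediatric samples.
captured = (allele_freq_from_prevalence(children_panel)
            / allele_freq_from_prevalence(children_full))
print(f"allele frequency captured by the screening variants: {captured:.0%}")
```

Because prevalence scales with the square of the allele frequency, missing a share of the pathogenic alleles lowers the prevalence estimate disproportionately, which is why a population-specific variant list matters.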
In exome data from Russia, the researchers did not identify prevalent pathogenic SNPs missing from ClinVar or dbSNP in genes that cause autosomal recessive diseases. This indicates that most disease alleles are shared between Russian and European populations, at least for disorders with recessive inheritance patterns. These results made it possible to suggest preliminary estimates of the prevalence of monogenic disorders in the region, based on the identified exome variants (Table 1).
Table 1. MD prevalence in Russia and globally determined by the frequencies of pathogenic SNPs [6].
Disease/Condition | Gene | Allele Count | Carrier Frequency (Lower/Upper CI) | Disease Frequency (Lower/Upper CI) | Known Frequency | References |
---|---|---|---|---|---|---|
Retinal dystrophy, Stargardt disease | ABCA4 | 13 (23) | 0.0350 (0.0206/0.0589) | 3.1 × 10⁻⁴ (1.1 × 10⁻⁴/8.8 × 10⁻⁴) | 1 in 10,000; 1 in 8000 | [7]; [8] |
Cystic fibrosis | CFTR | 11 (19) | 0.0296 (0.0167/0.0522) | 2.2 × 10⁻⁴ (6.9 × 10⁻⁵/6.9 × 10⁻⁴) | 1 in 10,000; 1 in 3000–16,000 | Reported carrier frequency of 0.032 [9]; [10] |
Phenylketonuria | PAH | 11 (18) | 0.0296 (0.0167/0.0522) | 2.2 × 10⁻⁴ (6.9 × 10⁻⁵/6.9 × 10⁻⁴) | 1 in 10,000; 1 in 4500 [Italy]–1 in 125,000 [Japan] | Reported carrier frequency of 0.029 [11]; [12] |
Wilson disease | ATP7B | 4 (6) | 0.0108 (0.0042/0.0274) | 2.9 × 10⁻⁵ (4.3 × 10⁻⁶/1.9 × 10⁻⁴) | 1 in 30,000; 1 in 30,000 | Similar global incidence reported [13][14] |
Galactosemia | GALT | 4 (5) | 0.0108 (0.0042/0.0274) | 2.9 × 10⁻⁵ (4.3 × 10⁻⁶/1.9 × 10⁻⁴) | 1 in 20,000; 1 in 48,000 | Reported carrier frequency of 0.006 [9][15] |
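The disease-frequency point estimates in Table 1 appear consistent with a simple Hardy-Weinberg conversion in which the pathogenic allele frequency is approximated as half the carrier frequency and the disease frequency as its square. The sketch below checks that reading against the table; the conversion is an assumption made here for illustration (the source does not describe how the column was computed), and `disease_frequency` is a hypothetical helper name.

```python
def disease_frequency(carrier_frequency: float) -> float:
    """Expected frequency of affected homozygotes, taking q as carrier frequency / 2."""
    q = carrier_frequency / 2.0
    return q * q


# (carrier frequency, reported disease frequency) copied from Table 1.
table_1 = {
    "ABCA4": (0.0350, 3.1e-4),
    "CFTR": (0.0296, 2.2e-4),
    "PAH": (0.0296, 2.2e-4),
    "ATP7B": (0.0108, 2.9e-5),
    "GALT": (0.0108, 2.9e-5),
}

for gene, (carrier, reported) in table_1.items():
    estimate = disease_frequency(carrier)
    print(f"{gene}: estimated {estimate:.1e}, reported {reported:.1e}, "
          f"about 1 in {1 / estimate:,.0f}")
```

Expressing each estimate as "1 in N" makes it directly comparable with the Known Frequency column.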