Human Exome Sequencing and Prospects for Predictive Medicine: Comparison
Please note this is a comparison between Version 1 by Alexander N Chernov and Version 2 by Catherine Yang.

Today, whole-exome sequencing (WES) is used for the large-scale screening of structural and regulatory genes in order to determine the allele frequencies of disease-associated polymorphisms in various populations and thus detect pathogenic genetic changes (mutations or polymorphisms) that lead to malfunctioning proteins. With its extensive capabilities, exome sequencing allows both the diagnosis of monogenic diseases (MDs) and the examination of seemingly healthy populations to reveal a wide range of potential risks prior to disease manifestation. In the future, exome sequencing may outpace costly and less informative genome sequencing to become the first-line examination technique. This research establishes the human genetic passport as a new WES-based clinical concept for the identification of new candidate genes, gene variants, and molecular mechanisms in the diagnosis, prediction, and treatment of monogenic, oligogenic, and multifactorial diseases. Various diseases are addressed to demonstrate the extensive potential of WES and to consider its advantages as well as its disadvantages. Thus, WES can become a general test with a broad spectrum of applications, including opportunistic screening.

  • human monogenic diseases
  • cystic fibrosis
  • phenylketonuria
  • Wilson disease
  • galactosemia
  • whole-exome sequencing

1. Human Monogenic Diseases

The OMIM database (as of 1 June 2023) includes entries for 7377 hereditary diseases and syndromes, as well as their molecular associations [1][27]. These include 6305 phenotypes associated with one single gene, i.e., showing the monogenic nature of a genetic trait or syndrome. This was largely achieved due to the active implementation of WES and the exome consortium [2][3][4][28,29,30].

2. Population Genetic Research for Monogenic Diseases

The genetic structure of human populations has been extensively studied worldwide. The frequency of a nonreference (i.e., non-wild-type) allele in a particular population is one of the most important factors influencing the clinical interpretation of a genetic variant. Genetic variability in many regions of the world remains poorly understood despite the very large number of exomes (125,748) in the Genome Aggregation Database (gnomAD, v. 2.1). Wenhao Zhou et al. estimated the prevalence of cystic fibrosis (CF) in China using 30,951 WES samples (20,909 pediatric and 10,042 parental) and compared the variant spectrum with that of Caucasians [5][31]. After filtration, 477 variants of the cystic fibrosis transmembrane conductance regulator (CFTR) gene remained, 53 of which were annotated as pathogenic/likely pathogenic (P/LP). From these annotated variants, the authors estimated the prevalence of CF in China at 1/128,434. Only 39.6% (21/53) of the variants are used to screen for CF in Caucasians; relying on those variants alone underestimates the prevalence of CF in China among children (1/1,387,395 versus 1/143,171, p = 5 × 10⁻²⁴) and adults (1/872,437 versus 1/110,127, p = 7 × 10⁻¹⁰). The allele frequencies of six pathogenic variants (L88X, M469V, G622D, G970D, D979A, and 1898+5G->T) were higher in the Chinese population than in the gnomAD non-Finnish European population (all p < 0.1). Using haplotype analysis, the researchers showed greater haplotype diversity in the Chinese population than in Caucasians. The founder mutations of the Chinese and Caucasian populations were G970D and F508del, respectively, with two SNPs (rs213950–rs1042077) identified as related genotypes in an exon region.
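The size of this underestimate can be made concrete with a little arithmetic. The sketch below takes the four prevalence figures quoted above and computes the fold difference between them. It then assumes Hardy-Weinberg proportions (an assumption introduced here, not stated in the text), under which the prevalence of a recessive disease is the square of the summed pathogenic allele frequency, to estimate what share of that allele frequency the Caucasian screening variants capture. The function names are illustrative.

```python
import math

# Prevalence figures quoted in the text, stored as the N in "1 in N".
PREVALENCE_ONE_IN = {
    "children": {"all_variants": 143_171, "caucasian_panel": 1_387_395},
    "adults": {"all_variants": 110_127, "caucasian_panel": 872_437},
}


def fold_underestimate(all_variants_n, panel_n):
    """How many times lower the panel-based prevalence is."""
    return panel_n / all_variants_n


def captured_allele_fraction(all_variants_n, panel_n):
    """Share of the summed pathogenic allele frequency seen by the panel.

    Assumes Hardy-Weinberg proportions: prevalence = q_total ** 2,
    so q_panel / q_total = sqrt(prevalence_panel / prevalence_all).
    """
    return math.sqrt(all_variants_n / panel_n)


for group, n in PREVALENCE_ONE_IN.items():
    fold = fold_underestimate(n["all_variants"], n["caucasian_panel"])
    share = captured_allele_fraction(n["all_variants"], n["caucasian_panel"])
    print(f"{group}: {fold:.1f}-fold underestimate, "
          f"panel captures about {share:.0%} of pathogenic allele frequency")
```

Under this assumption, a panel containing 39.6% of the variants by count captures only about a third of the pathogenic allele frequency, because prevalence scales with the square of that frequency.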
In a Russian exome cohort, the researchers did not identify prevalent pathogenic SNPs missing from ClinVar or dbSNP in genes causing autosomal recessive disease. This indicates that the majority of disease alleles are shared by the Russian and European populations, at least for disorders with recessive inheritance patterns. These results allowed preliminary estimates of the prevalence of monogenic disorders to be made from the exome variants identified for the region (Table 1).
Table 1. MD prevalence in Russia and globally determined by the frequencies of pathogenic SNPs [6][32].

| Disease/Condition | Gene | Allele Count | Carrier Frequency (Lower/Upper CI) | Disease Frequency (Lower/Upper CI) | Known Frequency | References |
|---|---|---|---|---|---|---|
| Retinal dystrophy, Stargardt disease | ABCA4 | 13 (23) | 0.0350 (0.0206/0.0589) | 3.1 × 10⁻⁴ (1.1 × 10⁻⁴/8.8 × 10⁻⁴) | 1 in 10,000; 1 in 8000 | [7]; [8] |
| Cystic fibrosis | CFTR | 11 (19) | 0.0296 (0.0167/0.0522) | 2.2 × 10⁻⁴ (6.9 × 10⁻⁵/6.9 × 10⁻⁴) | 1 in 10,000; 1 in 3000–16,000 | Reported carrier frequency of 0.032 [9]; [10] |
| Phenylketonuria | PAH | 11 (18) | 0.0296 (0.0167/0.0522) | 2.2 × 10⁻⁴ (6.9 × 10⁻⁵/6.9 × 10⁻⁴) | 1 in 10,000; 1 in 4500 [Italy]–1 in 125,000 [Japan] | Reported carrier frequency of 0.029 [11]; [12] |
| Wilson disease | ATP7B | 4 (6) | 0.0108 (0.0042/0.0274) | 2.9 × 10⁻⁵ (4.3 × 10⁻⁶/1.9 × 10⁻⁴) | 1 in 30,000; 1 in 30,000 | Similar global incidence reported [13][14] |
| Galactosemia | GALT | 4 (5) | 0.0108 (0.0042/0.0274) | 2.9 × 10⁻⁵ (4.3 × 10⁻⁶/1.9 × 10⁻⁴) | 1 in 20,000; 1 in 48,000 | Reported carrier frequency of 0.006 [9][15] |

Abbreviations: ABCA4, ATP binding cassette subfamily A member 4; ATP7B, P-type cation transport ATPase family 7B; CFTR, cystic fibrosis transmembrane conductance regulator; and GALT, galactose-1-phosphate uridylyltransferase.
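The disease frequencies in Table 1 appear to follow from the carrier frequencies under Hardy-Weinberg equilibrium for an autosomal recessive disorder: the pathogenic allele frequency q is taken as roughly half the carrier frequency, and the expected disease frequency is q². The sketch below reproduces the point estimates in the table under that assumption. The source does not state the exact formula used, and the helper name is hypothetical.

```python
def disease_frequency(carrier_frequency):
    """Expected frequency of affected individuals for a recessive disorder.

    Assumes Hardy-Weinberg equilibrium and approximates the pathogenic
    allele frequency q as half the carrier frequency (valid when q is small).
    """
    q = carrier_frequency / 2.0
    return q * q


# Carrier frequencies (point estimates) taken from Table 1.
TABLE_1_CARRIER_FREQUENCIES = {
    "Stargardt disease (ABCA4)": 0.0350,
    "Cystic fibrosis (CFTR)": 0.0296,
    "Phenylketonuria (PAH)": 0.0296,
    "Wilson disease (ATP7B)": 0.0108,
    "Galactosemia (GALT)": 0.0108,
}

for disease, carrier in TABLE_1_CARRIER_FREQUENCIES.items():
    freq = disease_frequency(carrier)
    print(f"{disease}: {freq:.1e} (about 1 in {round(1 / freq):,})")
```

Applying the same function to the lower and upper carrier-frequency bounds gives values close to the disease-frequency intervals listed in the table.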
Although the small sample size does not allow the extent of discordance to be determined reliably, the researchers' results for CF and phenylketonuria are consistent with published estimates for these genes [6][32]. Remarkably, the findings show that, in Northwest Russia, Stargardt disease is more prevalent than cystic fibrosis, contrary to earlier belief [6][32].
The research also examined pathogenic variants for a number of human diseases. The results are summarized in Table 1. The disease with the highest prevalence is Stargardt disease, caused by mutations in the ABCA4 (MIM#601691) gene, which has also been reported previously [7][33]. The results are concordant with earlier large-scale research into the incidence of pathogenic alleles associated with cystic fibrosis in a non-Finnish European population [16][42]. The estimates of CF, phenylketonuria, and galactosemia prevalence were concordant with those of other genetic studies [6][17][32,43].
Thus, these results indicate the need to create genetic population databases for the interpretation of variants and the identification of disease risk factors.

3. WES Application to Identify New Variants in the Genomes of Patients

WES allows for the identification of new gene variants in patients with MDs. Daniel Trujillano, Rami Abou Jamra, et al. used WES to sequence 2819 samples from 1000 patients in 54 countries with a wide phenotypic spectrum. Overall, they identified 320 pathogenic (P) or likely pathogenic (LP) variants, 303 of them unique, among the 1000 patients undergoing clinical WES, 307 (30.7%) of whom had a positive gene finding. The findings included ethylmalonic encephalopathy (ETHE1), Niemann–Pick disease type C2 (NPC2), Temtamy syndrome, pyruvate dehydrogenase E1-alpha deficiency (PDHA1), galactosemia (GALT), propionic acidemia (PCCA), homocystinuria (CBS), CF, long QT syndrome, and polycystic kidney disease. This supports the idea that highly heterogeneous pathologies can be effectively detected using WES. New disease genes were also detected, such as non-receptor protein tyrosine phosphatase type 23 (PTPN23), associated with brain developmental delay and atrophy; potassium channel tetramerization domain containing 3 (KCTD3), causing severe intellectual disability and seizures; the alpha 3 subunit of the sodium voltage-gated channel (SCN3A), associated with autosomal dominant encephalopathy; protoporphyrinogen oxidase (PPOX), causing variegate porphyria and developmental delay; and FERM and PDZ domain-containing 4 protein (FRMPD4), implicated in X-linked intellectual disability as well as recessive Dravet syndrome. The total WES diagnostic rate was 31% [18][44].
In another study, Joanne Trinh et al. sequenced 26,119 exome samples from 4351 patients with neurodevelopmental disorders (NDDs), such as global developmental and motor delay, macrocephaly, microcephaly, seizures, and delayed speech and language development. The researchers identified 65 rare variants in 14 genes. Fourteen of these variants were classified as P or LP and involved cyclin dependent kinase 13 (CDK13), chromodomain helicase DNA binding protein 4 (CHD4), potassium voltage-gated channel subfamily Q member 3 (KCNQ3), lysine methyltransferase 5B (KMT5B), transcription factor 20 (TCF20), and C2H2-type zinc finger protein (ZBTB18). The 51 remaining variants (78%) belonged to the variant of uncertain significance (VUS) category. Two of the patients had multiple molecular diagnoses, including P/LP variants in the forkhead box G1 transcription factor (FOXG1), CDK13, transmembrane protein 237 (TMEM237), and KMT5B genes. The total WES diagnostic rate was 31% [19][45]. Zhang Q et al. sequenced 1360 patients and identified 604 genetic pathologies associated with 150 genetic syndromes, 510 genes, and 718 variants. In this cohort, the overall WES positive identification rate for disease-related gene alterations was 44.41%. By phenotype, positive findings were obtained for growth abnormalities in 49.37% (118/239), seizures in 44.54% (102/229), autism spectrum disorder in 32.76% (38/116), global developmental delay in 54.84% (51/93), motor deterioration in 48.06% (99/206), abnormalities of the respiratory system in 40.61% (67/165), cerebral palsy in 41.26% (59/143), and abnormalities of the head or neck in 55.52% (161/290), the skin in 53.70% (58/108), the endocrine system in 49.78% (112/225), hearing or vision in 58.51% (55/94), the skeletal system in 53.95% (116/215), and the cardiovascular system in 43.20% (54/125) of samples [20][46].
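Diagnostic yield here is simply the number of patients with a positive finding divided by the number tested in each phenotype group. The following minimal sketch recomputes the percentages from the counts quoted above for the cohort of 1360 patients and ranks the groups. The dictionary keys are shorthand labels introduced for illustration.

```python
# (positive findings, patients tested) per phenotype group, from the text.
PHENOTYPE_COUNTS = {
    "growth abnormalities": (118, 239),
    "seizures": (102, 229),
    "autism spectrum disorder": (38, 116),
    "global developmental delay": (51, 93),
    "motor deterioration": (99, 206),
    "respiratory system": (67, 165),
    "cerebral palsy": (59, 143),
    "head or neck": (161, 290),
    "skin": (58, 108),
    "endocrine system": (112, 225),
    "hearing or vision": (55, 94),
    "skeletal system": (116, 215),
    "cardiovascular system": (54, 125),
}


def yield_percent(positive, tested):
    """Diagnostic yield as a percentage of tested patients."""
    return 100.0 * positive / tested


# Rank phenotype groups from highest to lowest diagnostic yield.
ranked = sorted(
    PHENOTYPE_COUNTS.items(),
    key=lambda item: yield_percent(*item[1]),
    reverse=True,
)

for phenotype, (positive, tested) in ranked:
    print(f"{phenotype:28s} {yield_percent(positive, tested):6.2f}% "
          f"({positive}/{tested})")

# Overall rate for the cohort: 604 positive findings in 1360 patients.
print(f"overall: {yield_percent(604, 1360):.2f}%")
```

Because a patient can present with several phenotypes, the group denominators overlap and do not sum to the cohort size.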
WES allows for the identification of new and highly diverse variants in various populations. WES enabled the identification of new variants in the low-density lipoprotein receptor (LDLR) gene in 59 Russian patients with a history of familial hypercholesterolemia (FH) [21][47]. FH results from genetic variants in the LDLR, apolipoprotein B (APOB), and proprotein convertase subtilisin/kexin type 9 (PCSK9) genes [22][48]. FH-associated variants were found in 25 children and 18 adults, giving mutation detection rates of 89% and 58% for the children and adults, respectively. Among the adults, 13 patients had variants in the LDLR gene, 3 had APOB variants, and 2 had mutations in the ATP-binding cassette subfamily G member 5/8 (ABCG5/ABCG8) genes. Twenty-one children had FH-associated variants in the LDLR gene; see Table 1. The study identified seven novel pathogenic or likely pathogenic LDLR variants (Table 2). Among them, four missense variants were located in protein-coding regions, and two were frameshift mutations responsible for the production of truncated proteins. Each of these mutations was found in only one patient, whereas an intron 6 splicing variant (c.940+1_c.940+4delGTGA) was detected in four unrelated individuals. Variant p.Gly592Glu in the LDLR gene was identified in six (10%) Russian patients and may constitute the main FH variant in the Russian population.
Table 2. Pathogenicity of novel LDLR gene variants.

| Gene | Patient ID | Exon/Intron | Variant | Allele Frequency in GnomAD | Allele Frequency in [23] | Variant Pathogenicity Classification by ACMG |
|---|---|---|---|---|---|---|
| LDLR | G31 | 4 | c.316_328delCCCAAGACGTGCT p.(Lys107Argfs*95) | Not found | Not found | P (PVS1 PS1 PM1 PM2 PP3) |
| LDLR | G29 | 4 | c.325T>G p.(Cys109Gly) | Not found | Not found | LP (PS1 PM1 PM2 PM5 PP3) |
| LDLR | G36 | 4 | c.401G>C (p.Cys134Ser) | Not found | Not found | LP (PS1 PM1 PM2 PM5 PP3) |
| LDLR | 1 | 4 | c.433_434insG p.(Val145Glyfs*35) | Not found | Not found | P (PVS1 PM2 PP3) |
| LDLR | G18 | 4 | c.616A>C (p.Ser206Arg) | Not found | Not found | Uncertain significance (PM2 PP1 PP3) |
| LDLR | G21 | IVS6 | c.940+1_c.940+4delGTGA (g.18154_18157delGTGA) | Not found | Not found | P (PVS1 PM1 PM2 PP3) |
| LDLR | 32 | 8 | c.1186G>C p.(Gly396Arg) | Not found | Not found | P (PVS1 PM1 PM2 PM5 PP3) |
| LDLR | G26 | IVS8 | c.1186+1G>T (g.22279G>T) | Not found | Not found | P (PVS1 PM2 PP3) |
| LDLR | G17 | 11 | c.1684_1691delTGGCCCAA p.(Pro563Hisfs*14) | Not found | Not found | P (PVS1 PM1 PM2 PP3) |

Abbreviations: LDLR, low-density lipoprotein receptor; ACMG, American College of Medical Genetics; P, pathogenic; and LP, likely pathogenic.

FH is a common, underdiagnosed, and undertreated genetic disease worldwide [24][50]. Therefore, WES data can be used to detect new candidate genes.
Current sequencing methods allow several hereditary diseases to be detected in a single individual, which gives them unprecedented significance. Such cases are not as rare as they may seem. For instance, researchers have described a case of the coinheritance of X-linked and dominant forms of ichthyosis [25][51]. This information may be valuable for genetic counseling because the clinical symptoms are similar. It is therefore necessary to analyze both the steroid sulfatase (STS) and filaggrin (FLG) genes to exclude combined forms of ichthyosis. Notably, NGS makes it possible to identify P or LP SNPs in genes that were earlier believed to carry mutations of a single type [26][52].
For a set of disorders, adequate therapy is the most critical outcome of NGS examination. In male probands with delayed growth and bone age, intellectual impairment, skeletal and facial features, and partial responses to hormone treatment, researchers identified a c.7466C>G (p.Ser2489*) heterozygous pathogenic mutation in the last exon of the SRCAP (Snf2 related CREBBP activator protein) gene, thus suggesting a new model of Floating-Harbor syndrome (FHS) pathogenesis. Such mutations have dominant-negative effects that explain the limited efficacy of growth hormone treatment in FHS [27][53].

4. General Strategy and Algorithm of WES Implementation in Human Genetic Pathology Diagnostics

WES provides a robust technique for MD diagnosis in humans. Yingchao Liu et al. utilized WES to study 169 children with critical disorders (median age = 10.5 months) and MDs [28][18]. Monogenic disorders were diagnosed in 43 (25%) patients. Pathologies with the highest incidences included metabolic (33%) and neuromuscular (19%) diseases, as well as multiple deformities (14%). The efficacy of diagnoses in children with metabolic disorders, growth impairment, or ocular abnormalities improved once thorough clinical data were available. WES data enabled adjustments to clinical management in 30 (70%) of the diagnosed cases, including disease monitoring initiation in 41.9% (18 cases), rehabilitation and palliative care in 27.9% (12 cases), the modification of ongoing treatment in 25.6% (11 cases), other comprehensive evaluation procedures in 7% (3 cases), and family intervention in 4.7% (2 cases).
Tasja Scholz et al. studied the diagnostic efficacy of WES for MDs to identify phenotypes in 61 infants with critical idiopathic disorders [29][54]. Investigators performed one single WES, two duo-WES, and fifty-nine trio-WES. The diagnostic rate was 46% (28/61) overall and 50% (15/30) in the neonate subgroup. The data showed that WES is a noninvasive diagnostic tool with a high rate of MD identification in neonates and infants. Thus, the evidence justifies the application of WES as a first-line examination for preconception genetic diagnosis and in idiopathic disorders in probands with a “blurred” phenotype.
To ensure efficiency (see Table 3), the following cost-effective strategy is suggested for the genetic diagnosis of MODY, WD, and other MDs associated with major mutations. Researchers also show additional benefits of the application of WES in disease diagnosis.

Table 3. Most efficient diagnostic strategies for hereditary diseases.

| Nosology | Efficiency of Diagnostics Prior to NGS, % | Efficiency of Diagnostics after WES, % | Efficiency of Diagnostics with Novel Variants Considered, % | Reference |
|---|---|---|---|---|
| Cystic fibrosis | 45–55 (1 mutation); 58 (35 mutations) | 67–80 | - | Unpublished |
| WD | Up to 75 (4 mutations); Up to 86 (12 mutations) | Up to 96 | 97 | [30] |
| MODY | 15–35 | 40–50 | 55 | [31] |
| Genetically heterogeneous condition | 28% | 65% | | [32] |
| Neurometabolic disorder | 24% | 35% | | [32] |
| Single anomaly of the fetuses | | 6% | | [33] |
| Two and more anomalies of the fetuses | | 35% | | [33] |
| Anomalies of the fetuses | | 10.3–18.9% | | [34] |
| Anomalies of the fetuses | | 8.5–15.4% | | [35] |
| Anomalies of the fetuses | | 6.2–80% | | [36] |

Abbreviations: MODY, maturity onset diabetes of the young; WD, Wilson’s disease; WES, whole-exome sequencing; and NGS, next-generation sequencing.
The obtained data are concordant with the global assumptions (7.7 × 10⁻⁶) [5][31].
It should be noted that NGS does not always suffice to formulate a diagnosis; hence, in some cases, concurrent or subsequent Sanger sequencing is required to detect the second pathogenic variant. In patients with a blurred clinical picture, differential diagnosis with WES is necessary to identify the root cause of a disease. For example, NGS was used to analyze the hotspot region in the promoter of the RMRP gene (RNA component of mitochondrial RNA processing endoribonuclease) in a proband with an extremely rare autosomal recessive skeletal chondrodysplasia (anauxetic dysplasia, AD). A heterozygous rs387906533 (n.91_92delinsGC) variant (chr9:35657924-35657925delCTinsGC) was detected in exon 1 of the RMRP gene, together with a previously unreported n.–6_–5insTCTCAGCTTCAC insertion (chr9:g.35658020_35658021insTCTCAGCTTCAC) in the gene promoter region. The latter variant is a 12-nucleotide insertion between the TATA box and the transcription start site [37][61].
The n.–6_–5insTCTCAGCTTCAC mutation was found to be of paternal origin and the n.91_92delinsGC mutation of maternal origin. An insertion in the RMRP gene promoter region had not previously been reported as a cause of AD with no extraskeletal manifestations (typical of carriers of similar mutations) [37][61].
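Parental origin is what establishes that the two heterozygous variants sit on different copies of the gene (in trans), which is the configuration expected for a recessive disorder. The following minimal sketch of that inference assumes that each variant is heterozygous in the proband and was inherited from exactly one identified parent. The function and labels are hypothetical and do not describe the authors' pipeline.

```python
def infer_phase(variant_a_parent, variant_b_parent):
    """Infer the phase of two heterozygous proband variants.

    Each argument names the parent who transmitted the variant
    ('father' or 'mother'), or None when the origin is unresolved.
    Returns 'trans', 'cis', or 'unknown'.
    """
    valid = {"father", "mother"}
    if variant_a_parent not in valid or variant_b_parent not in valid:
        return "unknown"
    # Variants from different parents lie on different chromosomes.
    return "trans" if variant_a_parent != variant_b_parent else "cis"


# Parental origins reported in the text for the RMRP proband.
proband_variants = {
    "n.-6_-5insTCTCAGCTTCAC (promoter)": "father",
    "n.91_92delinsGC (exon 1)": "mother",
}

phase = infer_phase(*proband_variants.values())
print(f"RMRP variants are in {phase}")
```

A result of "trans" is consistent with compound heterozygosity; an "unknown" result is the situation in which additional testing, such as parental Sanger sequencing, would be needed.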