Sequencing for Inherited Retinal Diseases

Subjects: Genetics & Heredity

Contributor: Adrian Dockery

Inherited retinal diseases (IRDs) represent a collection of phenotypically and genetically diverse conditions. IRD phenotypes can be isolated to the eye or can involve multiple tissues. These conditions are associated with diverse forms of inheritance, and variants within the same gene can often be associated with multiple distinct phenotypes. These features make establishing a genetic diagnosis in patients difficult. Here we provide an overview of cutting-edge next-generation sequencing techniques and strategies currently in use to maximise the effectiveness of IRD gene screening. These techniques have helped researchers globally to find elusive causes of IRDs, including copy number variants, structural variants, new IRD genes and deep intronic variants, among others.

genetic diagnosis
inherited retinal disease
rare disease
retina
sequencing
diagnostics
macula
genomics
variant interpretation
eye

1. Introduction

A primary focus in ocular genetics globally is accurate genotyping of patients with rare inherited retinal diseases (IRDs). Next-generation sequencing (NGS) has been a common strategy employed in many countries to achieve this goal for several years [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]. This review focuses on the various methods and strategies that are being implemented to elucidate the genetic pathogenesis of IRDs and provides an overview of how these approaches have evolved. IRDs have an estimated prevalence of 1 in 4000 [17]. With a current global population of approximately 7.8 billion [18], it is estimated that approximately 2 million people currently have some form of IRD.
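
The prevalence estimate above is simple arithmetic. The short Python sketch below reproduces it from the two figures quoted in the text; the function name is illustrative only.

```python
def estimated_affected(population: float, one_in_n: float) -> float:
    """Expected number of affected people for a prevalence of '1 in one_in_n'."""
    return population / one_in_n


# Figures quoted in the text: ~7.8 billion people and a prevalence of ~1 in 4000.
affected = estimated_affected(7.8e9, 4000)
print(f"{affected:,.0f}")  # 1,950,000, i.e. roughly 2 million
```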

As a global community involved in ocular genetics, the common goal is to achieve a diagnostic success rate of 100% for all IRD patients enrolled in clinical studies. This objective, however, presents several challenges. Firstly, over 270 genes have been associated with the aetiologies of IRDs (RetNet, Retinal Information Network, https://sph.uth.edu/retnet/ accessed on 20 April 2021) [19]. Secondly, mutations even within a single IRD gene can produce extensive diversity in clinical presentation, and overlapping clinical phenotypes and phenocopies are common. Mutations in disease genes may affect the retina in isolation or may have more systemic effects. For example, there are 80 systemic conditions with a retinal phenotype, and 200 genes affect not only retinal health but also the central nervous system, kidneys or heart [20]. Such complexity makes it near-impossible in most instances to achieve a diagnosis solely on the basis of disease phenotype [2,21,22,23]. Moreover, even a single pathogenic variant can manifest with phenotypic variability [24]. For some IRDs, modifier loci have been identified, somewhat blurring the borders between Mendelian and polygenic forms of IRD and mirroring similar observations with other disease aetiologies [25,26].

In this review, we aim to provide an overview of the NGS strategies employed globally to maximise the detection of IRD-causing mutations. This includes the use of targeted gene panels for all IRD phenotypes or phenotypic subsets; whole-exome sequencing (WES); whole-gene sequencing, whereby an IRD gene’s 5′ and 3′ sequences, exons and introns are interrogated; whole-genome sequencing (WGS); and bespoke methods to complement other strategies, such as structural variant (SV) detection and copy number variant (CNV) detection, among many others. In parallel with the use of the above technologies for the identification of candidate IRD-causing sequence variants, a wide array of methods to explore the functional effects of sequence variants has also been developed, and these are discussed.

While NGS technologies have enabled rapid characterisation of the genetic architecture of IRDs in many disease cohorts, with diagnostic rates often approximating 70% [27], much remains to be optimised. Strategies currently under development to further improve diagnostic rates are reviewed herein, as are the approaches being employed to interpret novel coding and non-coding candidate variants. Additionally, given the increasing number of WGS datasets from IRD and control populations, greater focus is now being placed on elucidating the genetic modifier loci that may influence the effect(s) of the primary disease-causing mutations. An overview of the findings to date is provided.

2. IRDs: Targeted Panels and Whole-Exome Studies

To elucidate the genetic contributions to IRDs, DNA, typically isolated from saliva or peripheral blood samples, is analysed. Optimal processing of the sample will depend on which form of NGS is to be employed. In order of increasing cost and amount of data generated, the main NGS methods are targeted sequencing (TS), WES and WGS. WES captures only the protein-coding exons, which account for approximately 1% of the genome. It is important to note that exon-based sequencing is also likely to reliably detect intronic variants located close to the targeted exons, such as non-canonical splice site variants, which are known causes of IRDs [28,29,30]. WGS is significantly more comprehensive, including introns, promoters and intergenic regions; in principle, it sequences every accessible nucleotide in a sample. TS typically captures the smallest amount of genetic information but does so in a completely customisable manner. For example, some IRD phenotypes are associated with pathogenic variants in a very small number of genes, some of which may also be known to harbour pathogenic deep-intronic variants. In this case, a TS approach that also targets those intronic regions would be more fruitful than WES. Arguably, WGS could also be used for this purpose, but it would generate more off-target data requiring significantly greater levels of analysis and storage, and the on-target read depth would likely be lower than with a TS approach.

The benefit of TS is that it economically focuses sequencing capacity on smaller genomic regions, including non-coding regions, thereby maximising the coverage of clinically relevant genes. Enhanced coverage translates to greater sequencing read depth, which is valuable, for example, for increasing the resolution of variant detection and for detecting lower levels of heteroplasmy in the mitochondrial genome or mosaicism in the nuclear genome [31,32,33,34]. By reducing the size of the genomic region sequenced per sample, a greater number of samples can be multiplexed together and processed in the same sequencing run. TS offers further cost savings: smaller file sizes allow for cheaper storage and faster processing. Moreover, targeting only regions of the genome previously implicated in IRDs can greatly reduce the risk of detecting secondary or incidental findings.
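
The multiplexing advantage can be made concrete with a back-of-the-envelope calculation. The Python sketch below is a rough estimate only: the run yield, target sizes, depths and on-target fraction are hypothetical round numbers chosen for illustration, not values from the studies cited, and the function name is our own.

```python
import math


def samples_per_run(run_output_bp: float, target_size_bp: float,
                    mean_depth: float, on_target_fraction: float) -> int:
    """Rough number of samples that fit in one sequencing run.

    Each sample needs target_size_bp * mean_depth on-target bases. Because only
    on_target_fraction of the sequenced bases fall on target, the raw yield
    needed per sample is scaled up by 1 / on_target_fraction.
    """
    bases_per_sample = target_size_bp * mean_depth / on_target_fraction
    return math.floor(run_output_bp / bases_per_sample)


# Hypothetical run yielding 120 Gb, with 50% of reads landing on target.
panel = samples_per_run(120e9, target_size_bp=0.5e6, mean_depth=500, on_target_fraction=0.5)
exome = samples_per_run(120e9, target_size_bp=50e6, mean_depth=100, on_target_fraction=0.5)
print(panel, exome)  # 240 12
```

Under these assumptions, the hypothetical 0.5 Mb panel is sequenced five times deeper than the exome yet still fits twenty times as many samples on the same run.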

For these reasons, TS strategies have frequently been employed for IRD screening for many years. A shortcoming of TS is that it often involves multiple gene panels for different conditions, and if new IRD genes are identified, or new variant associations are made for genes outside the panel, the panel must be redesigned to include them. It is possible to use TS to detect indicators of large structural variants; however, the genomic breakpoints would likely have to fall within the captured loci, which reduces the likelihood of identification [1].

The customisable element of TS has become increasingly valuable with the recent detection of several population-enriched rare pathogenic variants, likely due to founder effects. For example, a novel PDE6B variant was observed in the Māori IRD participant group and is likely to account for 16% of all recessive IRDs in that population [35]. Similarly, EYS gene variants were found to be causative in 51% of a retinitis pigmentosa (RP) cohort from Japan [36]. This discovery is not unique, as several other studies have revealed similar founder mutations in their target populations, for example: Belgium, RAX2 [37]; Costa Rica, RPE65 [38]; Finland, CERKL [39]; Japan, EYS [36]; Spain, RP1 [40] and ABCA4 [41]; the Jewish community in Caucasia, PDE6B [42]; Pakistan, ABCA4 and NMNAT1 [43]; Guyana, BBS9 [44]; and the Faroe Islands, MERTK [45]. The enrichment of these variants, several of which are large structural variants, emphasises the value of population-specific TS panels to target and detect mutations and mutational breakpoints that may be missed by generic commercial gene panel sequencing or even by WES.

The use of WES has increased in popularity in recent years (Table 1) compared with previously reported metadata [27], and it has many advantages over the TS approach. An effective TS panel design benefits from prior knowledge of the spectrum of mutations capable of causing the patient’s condition. This includes, but is not limited to, knowledge of possible founder mutations in the population, all possible genotype–phenotype associations and the breakpoint locations of any large structural variants that may exist. WES is agnostic to these issues. Although WES cannot detect deep intronic mutations without modification of the method, it enables exonic variants to be detected even if their relevance is not fully understood at the time of capture. This allows WES data to be reinterrogated as new IRD genes are discovered and, importantly, offers the potential for future resolution of previously unsolved diagnoses.

Table 1. Screening studies of inherited retinal disease (IRD) populations. CRD = cone–rod dystrophy; LCA = Leber congenital amaurosis; IRD = inherited retinal dystrophy; MD = macular dystrophy; RP = retinitis pigmentosa; TC = target capture (the number of genes is given where reported); WES = whole-exome sequencing; WGS = whole-genome sequencing.

Country	Author	Year	Pedigrees	Solve Rate (%)	Cohort Details	TC (Genes)	WES	WGS
Australia	Thompson [46]	2017	34	90	LCA	Yes	-	-
Brazil	Motta [47]	2018	559	72	IRD	Yes	Yes	-
China	Liu [48]	2020	800	60	RP	Yes	Yes	-
China	Gao [49]	2019	1243	72	RP	586	-	-
China	Liu [50]	2020	182	48	IRD	Yes	Yes	-
China (Han)	Huang [51]	2017	98	41	RP	-	Yes	-
China	Dan [52]	2020	76	57	IRD	Yes	Yes	-
China	Wang [53]	2018	319	39	IRD	Yes	Yes	-
Finland	Avela [54]	2019	53	77	IRD	Yes	-	-
Germany	Weisschuh [55]	2020	1785	69	IRD	Yes	-	-
Germany	Birtel [56]	2018	251	74	MD/CRD	Yes	-	-
Iran	Tayebi [57]	2019	50	72	IRD	Yes	-	-
Ireland	Whelan [58]	2020	710	70	IRD	Yes	-	-
Israel	Sharon [59]	2020	2420	56	IRD	Yes	Yes	Yes
Japan	Koyanagi [60]	2019	1204	30	RP	Yes	-	-
Japan	Numa [36]	2020	220	45	RP	Yes	-	Yes
Korea	Surl [61]	2020	50	78	LCA	Yes	Yes	-
Korea	Kim [62]	2019	86	44	IRD	Yes	-	-
Mexico	Zenteno [63]	2019	143	66	IRD	Yes	-	-
Norway	Holtan [17]	2020	650	32	IRD	Yes	-	-
Poland	Wawrocka [64]	2018	18	39	CRD	Yes	Yes	-
Polynesian and Māori	Vincent [35]	2017	16	44	IRD	Yes	-	-
Spain	Perea-Romero [41]	2021	3951	53	IRD	Yes	Yes	Yes
Spain	Martin-Merida [65]	2019	877	38	RP	Yes	-	-
Spain	Gonzàlez-Duarte [66]	2019	73	85	IRD	Yes	-	-
Spain	Diñeiro [67]	2020	100	45	IRD	Yes	-	-
Taiwan	Chen [68]	2020	60	53	IRD	Yes	-	-
Tunisia	Habibi [69]	2020	73	68	IRD	-	Yes	-
UAE	Khan [70]	2020	71	100	Pediatric IRD	Yes	Yes	-
UAE	Patel [71]	2018	75	82	IRD	Yes	Yes	-
UK	Jiman [72]	2020	106	49	Syndromic IRD	Yes	-	-
UK	Shah [73]	2020	655	43	IRD	Yes	-	-
UK	Carss [74]	2017	722	56	IRD	-	Yes	Yes
UK	Lenassi [75]	2020	201	64	Pediatric IRD	Yes	-	-
UK	Patel [76]	2019	277	25	Pediatric IRD	Yes	-	-
UK	Taylor [77]	2017	85	79	Pediatric IRD	Yes	-	-
USA and Canada	Goetz [78]	2020	5385	62	IRD	Yes	-	-
USA	Stone [79]	2017	1000	76	IRD	-	Yes	-
USA	Bryant [80]	2018	69	64	IRD	-	Yes	-
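
Table 1 can be summarised in more than one way, and the choice matters. The Python sketch below computes both an unweighted mean of the solve rates and a mean weighted by the number of pedigrees in each study. We do not know how the 43% unresolved figure quoted in Section 3 was derived; weighting by pedigree count is our assumption, the solve rates are the rounded values shown in the table, and cohort definitions differ between studies.

```python
# (pedigrees, solve rate %) for each row of Table 1, in table order.
TABLE_1 = [
    (34, 90), (559, 72), (800, 60), (1243, 72), (182, 48), (98, 41),
    (76, 57), (319, 39), (53, 77), (1785, 69), (251, 74), (50, 72),
    (710, 70), (2420, 56), (1204, 30), (220, 45), (50, 78), (86, 44),
    (143, 66), (650, 32), (18, 39), (16, 44), (3951, 53), (877, 38),
    (73, 85), (100, 45), (60, 53), (73, 68), (71, 100), (75, 82),
    (106, 49), (655, 43), (722, 56), (201, 64), (277, 25), (85, 79),
    (5385, 62), (1000, 76), (69, 64),
]


def mean_solve_rates(rows):
    """Return (unweighted mean, pedigree-weighted mean) of the solve rates."""
    unweighted = sum(rate for _, rate in rows) / len(rows)
    total_pedigrees = sum(n for n, _ in rows)
    weighted = sum(n * rate for n, rate in rows) / total_pedigrees
    return unweighted, weighted


unweighted, weighted = mean_solve_rates(TABLE_1)
print(f"unweighted: {unweighted:.1f}% solved")
print(f"weighted:   {weighted:.1f}% solved, {100 - weighted:.1f}% unresolved")
```

With these inputs the unweighted mean is about 59.4% and the pedigree-weighted mean about 57.4%, leaving roughly 43% of pedigrees unresolved; large cohorts with moderate solve rates pull the weighted figure down.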

Further to this point, many phenotype-based gene panels are highly specific and therefore typically target only a small number of genes, which does not allow for new genotype–phenotype correlations or for ambiguous phenotypes. Indeed, a recent study established that 23% of the cases analysed would not have been resolved had they been sequenced with a commercial panel designed specifically for the patient’s phenotype [81]. The same study also found that, for 26% of participants, the cheapest applicable commercial gene panel would have been more costly than performing WES.

Several large IRD screening studies in recent years have sought to identify the genes responsible for the largest proportions of their cohorts’ IRDs. In the UK, over 3000 pedigrees were reviewed, and 135 IRD genes were found to contribute to the genetic pathogenesis of the cohort. Interestingly, 70% of resolved cases had causative mutations in only 20 genes [82]. Similarly, in over 5000 pedigrees with genetic eye conditions from Canada and the US, 68% of pathogenic or likely pathogenic mutations were identified in just 10 genes [78]. Both of these large studies identified ABCA4, USH2A and RPGR as the top three genes contributing to IRDs.
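
The statistic reported in these studies (the share of solved cases explained by the most frequently implicated genes) can be sketched as follows. This is a minimal Python illustration; the per-gene counts are hypothetical and are not data from the cited studies.

```python
from collections import Counter


def fraction_explained_by_top_genes(solved_cases_per_gene: Counter, n: int) -> float:
    """Fraction of solved cases attributable to the n most frequently implicated genes."""
    total = sum(solved_cases_per_gene.values())
    top_n = sum(count for _, count in solved_cases_per_gene.most_common(n))
    return top_n / total


# Hypothetical counts of solved cases per gene (not data from the studies cited).
counts = Counter({"ABCA4": 40, "USH2A": 25, "RPGR": 15, "PRPH2": 10, "BEST1": 5, "CHM": 5})
print(fraction_explained_by_top_genes(counts, 3))  # 0.8
```

A steep curve of this kind is what motivates the argument that a small first-tier panel could resolve a large share of cases.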

Although clearly not 100% effective, smaller whole-gene panels may be well suited as a first-tier screening approach. Several gene associations have recently been flagged as unlikely to be as pathogenic as initially reported. Nineteen percent of queried autosomal dominant retinitis pigmentosa (adRP) genes were deemed to harbour variants unlikely to be disease-causing, for reasons relating to their allele frequencies or to variant interpretation at the time [83]. Such “false-positive” associations are a pitfall of the diagnostic odyssey, and they imply that an initial screening procedure may not need to cover as many of the genes and variants as are typically included in large gene panel screening studies.

It is also important to note that there have been many reports of multiple IRDs occurring within the same family, or even within a single individual. Although each IRD is individually rare, the co-occurrence of multiple IRDs in a patient or pedigree represents another diagnostic challenge. Our team has previously reported a pedigree in which five affected family members were broadly categorised as having RP phenotypes. Genetic investigation revealed that four of these individuals were homozygous for a FLVCR1 variant, while the remaining affected individual was compound heterozygous for pathogenic variants in NR2E3 [84]. Similarly, a US study of three IRD pedigrees, each given an initial diagnosis of RP (one with dominant RP and the other two with dominant, incompletely penetrant RP), found that multiple IRD genes were responsible for disease in different affected individuals in each family: USH2A and RP1 were both segregating in one family; PRPH2 and CRX in a second; and PRPH2, PRPF8 and USH2A in the third [85]. These studies are, however, surpassed in complexity by Birtel et al.’s analysis of a single family with four different IRDs, each caused by distinct pathogenic variants with distinct inheritance patterns: the father, RHO, dominant RP; the mother, ABCA4 and CACNA1F, recessive Stargardt disease and congenital stationary night blindness (CSNB); the first son, CACNA1F, CSNB; and the second son, MITF, dominant Waardenburg syndrome [86]. These are a few of the many examples that illustrate the complexity of IRD screening and reinforce the necessity of thorough clinical and genetic investigation prior to genetic counselling [87].

3. Expanding IRD Diagnosis via Whole-Gene or WGS

For any laboratory electing to use NGS, it is essential that the limitations of the chosen approach are known. Failure to adequately sequence the target genomic sites will clearly limit the success rate from the outset. Three consistent biases exist for WES but not for WGS: strand bias, unevenness of coverage and a lower proportion of transcripts covered in their entirety. It has also been found that WGS provides 3 percentage points better coverage of exons than WES (98% versus 95%) [88]. Another study found that WGS offered superior detection of structural variants, variants in regulatory regions and variants in GC-rich regions compared with WES [74]. However, this superior detection comes at a significant financial cost. A review of studies that used WES and WGS in clinical practice found that WGS was approximately five times more expensive than WES on a per-sample basis [89].

Costs incurred by WGS include not only the upfront cost of sequencing but also additional downstream expenses. WGS produces vastly more data, immediately requiring additional computational power, person-hours and storage to process. Although storage costs may be a limiting factor in the budgets of most research groups, data storage policies can readily be tailored to each group, as the needs of two research groups will rarely be the same. Such a bespoke approach is advisable to avoid issues such as inadequate infrastructure and overspending. Raw sequencing files (such as .fastq files) and output files (variant call files, .vcf) are relatively small in comparison to the alignment files (such as .bam files) that are produced as part of the analysis [90,91,92]. Given this, the capacity required for long-term storage of sequencing data can be reduced by discarding the alignment files but keeping the input and output files, so that the analysis can be repeated at a later date and the outputs compared for discrepancies and new findings. However, additional storage will again be required upon reanalysis, as new alignment files will be created as part of the process.

Another viable way of reducing the disk footprint of alignment files is to use more highly compressed formats, such as .cram files. This format uses reference-based compression, storing only the base calls that differ from the reference genome used. The compression can be lossless or lossy; lossy compression reduces the resolution of base quality scores according to the level of compression. Even the lossless format offers a 40–50% reduction in the space required compared with BAM [93]. In addition, using WGS as a second-tier approach, for cases that remain genetically unresolved after first-tier sequencing, will decrease data storage demands. More research groups are now moving to cloud-based storage for their NGS data, and minimising the amount of data stored has a direct impact on cost [94]. It is important to note that sample processing and analysis are also available via cloud-based solutions, which may be an attractive option for research groups lacking the necessary in-house infrastructure to process NGS data [95].
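
To make the idea of storing only the bases that differ from the reference concrete, here is a toy Python sketch. It is not the CRAM format: it assumes an ungapped alignment lying entirely within the reference and ignores insertions, deletions, clipping and quality scores, all of which real CRAM encodes. The function names and sequences are hypothetical.

```python
def encode_read(reference: str, start: int, read: str) -> list[tuple[int, str]]:
    """Return (offset, base) pairs where the read differs from the reference.

    Assumes an ungapped alignment lying entirely within the reference.
    """
    segment = reference[start:start + len(read)]
    if len(segment) != len(read):
        raise ValueError("read extends past the end of the reference")
    return [(i, base) for i, (base, ref_base) in enumerate(zip(read, segment))
            if base != ref_base]


def decode_read(reference: str, start: int, length: int,
                diffs: list[tuple[int, str]]) -> str:
    """Rebuild the read from the reference plus the stored differences."""
    bases = list(reference[start:start + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTTAGCCGATAGGCT"
read = "ACGTTCGTTA"                      # differs from reference[0:10] at offset 4
diffs = encode_read(reference, 0, read)
print(diffs)                              # [(4, 'T')]
print(decode_read(reference, 0, len(read), diffs) == read)  # True
```

Because most reads match the reference almost everywhere, the stored differences are far smaller than the reads themselves. The trade-off is that the exact same reference sequence must be available to decode the data.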

The additional cost of larger-scale analysis is not the only hazard associated with this data management. Both WES and WGS carry an increased likelihood of incidental or secondary findings (SFs) unrelated to the initial indication for sequencing, together with an intrinsic responsibility to manage them. For example, Hart et al. (2018) found that an SF is detected in 1.7% of patients who undergo WES [96]. Some IRD studies have employed a nested targeted approach, wherein the entire genome of an individual is sequenced but only variants in genes relevant to the IRD phenotype are interrogated, by filtering variants with a virtual gene panel. This still provides benefits over traditional targeted panels, as it also includes sequencing of non-coding regions and allows an expanded panel to be analysed in the future. For example, Carss et al. (2017) performed WES on 117 individuals and identified pathogenic variants in 59 cases [74]. Forty-five of the unresolved cases then underwent WGS, and candidate pathogenic variants were identified in an additional 14 cases. Such approaches are likely chosen because of the immense volume of data produced by WGS and WES and the need to analyse the most relevant data more rapidly.
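
A virtual gene panel is, at its core, a filter applied to an already-annotated call set. The minimal Python sketch below assumes that each variant has already been annotated with a gene symbol by an upstream tool. The `Variant` class, the coordinates and the panel contents are hypothetical placeholders, not real variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str  # gene symbol, assumed to come from an upstream annotation step


def apply_virtual_panel(variants: list[Variant], panel_genes: set[str]) -> list[Variant]:
    """Keep only variants annotated to a gene on the virtual panel.

    All variants remain in the stored call set, so the panel can later be
    widened and the same data re-filtered without re-sequencing.
    """
    panel = {g.upper() for g in panel_genes}
    return [v for v in variants if v.gene.upper() in panel]


# Placeholder calls with made-up coordinates, for illustration only.
calls = [
    Variant("1", 1000, "G", "A", "ABCA4"),
    Variant("17", 2000, "C", "T", "BRCA1"),
    Variant("1", 3000, "T", "C", "USH2A"),
]
ird_panel = {"ABCA4", "USH2A", "RPGR"}
print([v.gene for v in apply_virtual_panel(calls, ird_panel)])  # ['ABCA4', 'USH2A']
```

The off-panel BRCA1 call is never surfaced for review, which is how this design limits exposure to secondary findings while keeping the full data available for later reanalysis.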

This approach also limits the possibility of detecting SFs. In SF v1.0, the American College of Medical Genetics and Genomics (ACMG) recommended analysis of 56 medically actionable gene–phenotype pairs; this was updated to a panel of 59 genes in v2.0 [97,98]. ACMG SF v3.0, which recommends the analysis of SFs in 73 gene–phenotype pairs, was very recently released [99,100]. Of particular interest to the ocular genetics community, ACMG SF v3.0 now includes the RPE65 gene, on the basis that an FDA-approved gene therapy now exists for biallelic RPE65-retinopathies and that patients may derive additional benefit from earlier detection and therapeutic intervention. The ACMG recommends applying these SF guidelines in a clinical rather than a research setting. Nonetheless, as with all genomic testing, the patient’s interests must be at the forefront. The ACMG currently recommends that patients/guardians have the choice to opt out of SF testing. This highlights the necessity of an appropriate and comprehensive consent procedure, including, but not limited to, information on what will not be disclosed should the patient/guardian choose to opt out of SF analysis, together with thorough pre-test and post-test counselling.

Despite the additional costs, there may be diagnostic benefits to employing WGS to resolve genotypes. Lionel et al. investigated 103 cases of diverse genetic disorders, comparing WGS with targeted panel sequencing. Not only was the solve rate higher with WGS (41% versus 24%), but 18 diagnoses were made on the basis of structural variants or intronic variants that were not captured by the TS method [101]. Despite the substantial number of genes identified and targeted by TS, studies to date (Table 1) suggest that the genetic cause remains unknown for 43% of all IRD patients, which argues for more studies to employ WGS. These missing genetic aberrations may reside in introns or intergenic regions, both of which are captured by WGS. WGS also facilitates the discovery of novel IRD genes. In addition, the superior uniformity of genome coverage enabled by WGS allows greater sensitivity when detecting copy number variants (CNVs), which are notoriously difficult to detect by TS and WES.

A cost-effective alternative that retains many of the benefits of WGS is whole-gene sequencing (GS). GS captures the exonic, intronic and 5′ and 3′ regulatory regions of a target gene but shares many of the limitations of the TS approach, including strand bias and reduced capture efficiency in GC-rich regions. GS has been used very successfully in cohorts whose phenotypes have monogenic or near-monogenic causes. For example, individuals affected with incomplete congenital stationary night blindness (icCSNB) present with a recognisable phenotype, which is primarily associated with mutations in the CACNA1F gene. In a recent large genotyping study of icCSNB–CACNA1F patients (n = 189), 4% of causative CACNA1F variants were intronic or synonymous [102]. It is also probable that additional intronic variants, yet to be designated as pathogenic, remain in the unresolved portion of this cohort.

Similarly, Khan et al. investigated 1054 unresolved Stargardt disease cases using a GS approach. Stargardt disease is predominantly caused by biallelic variants in the ABCA4 gene. The authors used single-molecule molecular inversion probes (smMIPs), which proved reliable and cost-effective. Their study revealed pathogenic SVs and deep-intronic variants in 25% of biallelic cases [103]. The smMIPs method is gaining popularity given its superior target capture and low cost compared with other TS capture methods. In a recent comparative study, 176 IRD patients were analysed with both smMIPs and TS. The smMIPs approach demonstrated enhanced target coverage (97.3% versus 93.9%) and was five times more cost-effective when more than 500 samples were analysed [104].

The GS approach has also been combined with probes for other IRD genes to investigate whether this combinatorial strategy could significantly improve diagnostic rates for a range of IRDs compared with traditional exon-based TS IRD panels. The study used a second-tier design for patients in whom a single variant had previously been found in USH2A, ABCA4 or CEP290. These whole genes, plus the exons of 76 additional IRD genes and pathogenic intronic regions of two IRD genes, were sequenced in an effort to resolve these “one-hit” patients. An overall diagnostic rate of 58.6% was achieved, and two CNVs were detected in USH2A [105]. Although this diagnostic rate was no higher than that of the average study (Table 1), the design illustrates improvements that can be made to address the large proportion of patients left unresolved by standard screening studies. The structural variants identified in this study would likely not have been detected with a more traditional, purely exon-targeting design.

An improved GS study design, as outlined above, may have additional advantages. RPGR is one example of an IRD gene with regions that are challenging to sequence comprehensively with traditional TS or WES; sequencing through ORF15 is impeded by its low-complexity sequence composition [106]. However, capturing this gene is vital, as it accounts for nearly 40% of X-linked retinopathies in the UK, making pathogenic variants in RPGR the third most prevalent cause of IRDs in this population [82]. In an Italian cohort of 48 RPGR-related RP cases, approximately half had mutations in ORF15 and presented with a more severe phenotype than patients with causative variants in exons 1–14 of RPGR [107]. It has been suggested that the sequence coverage of ORF15 could be optimised by modifying NGS library preparation, reducing false negatives, miscalled variants and false positives compared with traditional methods [108]. For this reason, many recent NGS screening studies have adopted bespoke approaches to sequencing RPGR, including entirely separate analyses or spiking the NGS libraries with separately generated RPGR amplicons [50,53,56]. Another study reported improved alignment of sequencing reads mapping to the ORF15 region using a de novo assembly approach. Because males have only one copy of RPGR, sequencing accuracy can be quickly assessed in males: a true variant should be present in essentially every read mapping to the region, whereas a variant called in error will likely not be, so heterozygous calls can readily be identified as errors. This de novo assembly approach reduced the number of false heterozygous calls in males and improved the accuracy of indel calls [109].
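
The hemizygosity check described above can be expressed as a simple allele-fraction filter. The Python sketch below is one hypothetical way to do it, not the method of the cited study: it assumes per-call read counts are available, that the calls lie outside the pseudoautosomal regions, and it uses an arbitrary 0.9 threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XCall:
    pos: int        # position within the non-pseudoautosomal X region
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the alternate allele


def flag_suspect_calls(calls: list[XCall], min_alt_fraction: float = 0.9) -> list[XCall]:
    """Flag calls in a male sample whose alternate allele is not in (nearly) all reads.

    A male carries one X outside the pseudoautosomal regions, so a genuine
    variant in a gene such as RPGR should be supported by essentially every
    read; a heterozygous-looking allele fraction suggests a sequencing or
    alignment artefact. The 0.9 threshold is an arbitrary illustrative choice.
    """
    flagged = []
    for call in calls:
        depth = call.ref_reads + call.alt_reads
        if depth and call.alt_reads / depth < min_alt_fraction:
            flagged.append(call)
    return flagged


# Hypothetical calls from a male sample (positions are placeholders).
calls = [XCall(101, 0, 40), XCall(202, 22, 18), XCall(303, 2, 38)]
print([c.pos for c in flag_suspect_calls(calls)])  # [202]
```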

Genes encoding the opsins OPN1LW (red cones) and OPN1MW (green cones) also benefit from a tailored GS design. These genes encode photopigments in the retina and pose an interesting challenge to sequencing efforts: they are 96% homologous, so short-read sequencing may be unable to determine the correct alignment when reads are mapped back to the genome [110]. A new two-step method from Atilano et al. has demonstrated that long-range PCR can generate specific long amplicons that are more readily mapped back to the genome [111]. These amplicons can be analysed separately, by direct sequencing, or as part of a larger panel if long-read sequencing (LRseq) is used.
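
Why high homology defeats short reads can be shown with a toy example. The Python sketch below places a read on two hypothetical paralogues that differ at a single base and reports every gene tied for the fewest mismatches. It is not how production aligners score reads (they use gapped alignment and mapping-quality estimates), the sequences are not real opsin sequence, and it is not the method of Atilano et al.

```python
def min_mismatches(read: str, seq: str) -> int:
    """Smallest Hamming distance between the read and any same-length window of seq."""
    if len(read) > len(seq):
        raise ValueError("read is longer than the sequence")
    return min(
        sum(a != b for a, b in zip(read, seq[start:start + len(read)]))
        for start in range(len(seq) - len(read) + 1)
    )


def best_matching_genes(read: str, genes: dict[str, str]) -> list[str]:
    """Names of all genes tied for the best (fewest-mismatch) placement of the read."""
    scores = {name: min_mismatches(read, seq) for name, seq in genes.items()}
    best = min(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


# Toy paralogues differing at a single base (index 13); not real opsin sequence.
genes = {
    "gene_A": "ACGTACGTACGTAAGT",
    "gene_B": "ACGTACGTACGTACGT",
}
print(best_matching_genes("ACGTACGT", genes))    # ['gene_A', 'gene_B'] (ambiguous)
print(best_matching_genes("GTACGTAAGT", genes))  # ['gene_A'] (unique)
```

Only a read that spans a distinguishing base can be placed unambiguously, which is why longer amplicons or long reads help for near-identical genes.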

Sequencing the entirety of a gene also facilitates the detection of variants in upstream regulatory regions, which have previously been implicated in retinal disease, for example, in blue-cone monochromacy (BCM) [112]. In this condition, a c.−71A>C promoter mutation was initially thought to decrease expression and cause a deutan colour vision deficiency. However, functional analyses revealed that the mutation results in more than double the wild-type expression level of the gene [113]. Other deletions in this area have also been shown to result in BCM phenotypes, suggesting that this gene is sensitive to both under- and overexpression [114,115]. Similarly, Radziwon et al. used luciferase reporter assays to assess upstream variants detected in the CHM gene in patients with choroideremia. The variants in the two probands affected the same position: c.−98C>A and c.−98C>T. Both variants reduced luciferase activity, and the CHM promoter was defined as the region encompassing nucleotides c.−119 to c.−76 [116].

Regulatory mutations are often difficult to interpret, particularly in genes associated with recessive inheritance. Consanguineous pedigrees have previously been useful for identifying homozygous variants in such cases, for example, in NMNAT1-related Leber congenital amaurosis (LCA) [117]. Variant interpretation can be further complicated because such variants may not have strong effects on gene expression. A recent study of promoter variants in ELOVL4 identified two variants, c.−236C>T (rs240307) and c.−90G>C (rs62407622), which reduced expression by 18% and 14%, respectively. However, the patient in question, who carried the variants in trans, had a severe phenotype, much more severe than would have been expected from the modest effect of each variant analysed separately [118]. This detrimental synergistic effect may reflect the threshold sensitivity of retinal tissues and cell components to the dosage of this protein and its downstream effects.
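
One way to see why the ELOVL4 result is surprising is to compute what a naive model would predict. The Python sketch below assumes that each allele contributes half of the total expression and that each promoter variant reduces only its own allele's output. This additive model is an illustrative simplification of our own, not the model used in the cited study.

```python
def expected_expression(reduction_allele_1: float, reduction_allele_2: float) -> float:
    """Expression relative to wild type under a simple additive model.

    Assumes each allele contributes half of total expression and that a
    promoter variant reduces only its own allele's output (an illustrative
    simplification, not the model used in the cited study).
    """
    return ((1 - reduction_allele_1) + (1 - reduction_allele_2)) / 2


# Reductions reported in the text: 18% and 14%.
print(round(expected_expression(0.18, 0.14), 2))  # 0.84
```

Under this model, the compound genotype would retain about 84% of wild-type expression. A severe phenotype at that level is consistent with the threshold-sensitivity interpretation above, rather than with a simple proportional dosage effect.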

4. Copy Number and Structural Variants

As discussed above, TS and WES are the most widely used methods, yet they are largely incapable of detecting large CNVs, SVs and chromosomal rearrangements. In 2018, an extensive literature-mining endeavour revealed that 1345 CNVs (317 unique variants) had been reported in 81 distinct IRD genes. Further analysis showed that gene size correlated with the number of CNVs reported for that gene. Additionally, many of these large variants affected non-coding and potential cis-regulatory elements [119]. The relevance of such variant types is now recognised, and guidelines have been published to assist in their interpretation and classification, similar to those published in recent years for single-nucleotide variants [120,121].

CNVs and SVs can also vary significantly in the complexity of their rearrangements. Gross deletions have previously been detected in many genes, including BEST1, EYS, MERTK and USH2A, with many more reported in the aforementioned study alone [119]. Large deletions have also been reported in RPGR [122], CHM [58], OPN1LW/OPN1MW [123] and USH1C [1], to name but a few. Deletions are likely to be the most detectable CNV type, given that most studies employ WES or TS to detect mutations. Homozygous deletions are the most readily detectable with these methods, as read coverage over the deleted region is zero because no template exists for capture or amplification. Heterozygous deletions may be under-reported with WES or TS if significant amplification has occurred, which can inadvertently normalise the read depth over the deleted region. For similar reasons, duplications can be very difficult to detect with these methods. Such mutations can, however, be more readily detected by WGS, owing to its superior and more even coverage, or by more specific approaches such as targeted locus amplification [119]. Some regions of the genome, such as the RP17 locus, have been shown to harbour many complex CNVs and SVs associated with IRDs. These convoluted rearrangements disrupted the surrounding genome architecture, altering enhancer–promoter interactions and resulting in aberrant gene expression [124].
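
The depth-based reasoning above can be sketched as a ratio test. The Python sketch below compares per-region depth in a sample against a control profile, normalises by the median ratio (assuming most regions are copy-neutral) and rounds to an integer copy number. Real CNV callers additionally correct for GC content and capture efficiency and segment adjacent regions. The depths are hypothetical, control depths are assumed to be non-zero, and the function name is our own.

```python
from statistics import median


def copy_number_calls(sample_depth: list[float], control_depth: list[float],
                      ploidy: int = 2) -> list[str]:
    """Classify regions by read-depth ratio against a control profile.

    Ratios are median-normalised so that overall library size cancels out;
    this assumes most regions are copy-neutral and control depths are > 0.
    """
    ratios = [s / c for s, c in zip(sample_depth, control_depth)]
    scale = median(ratios)
    labels = []
    for r in ratios:
        copies = round(ploidy * r / scale)
        if copies == 0:
            labels.append("homozygous deletion")
        elif copies < ploidy:
            labels.append("heterozygous deletion")
        elif copies == ploidy:
            labels.append("normal")
        else:
            labels.append("duplication")
    return labels


# Hypothetical mean depths for seven regions in one sample and in pooled controls.
sample = [100, 98, 102, 50, 0, 150, 101]
controls = [100] * 7
print(copy_number_calls(sample, controls))
```

The homozygous deletion stands out with zero depth, whereas the heterozygous deletion (ratio 0.5) and the duplication (ratio 1.5) rely on depth being proportional to copy number. Amplification bias that pushes these ratios towards 1 is what makes them easy to miss.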

Genomic rearrangements are more likely to be detected through broken (split or discordant) sequencing reads when the reads are aligned back to the reference genome. These occur when a read, or a read pair, aligns partly to one part of the genome and partly to a quite different region. This signal is particularly relevant for translocations and inversions because, unlike CNVs, they are not expected to alter read depth. Given the abundance of retrotransposon sequence in the human genome, it is not surprising that several retrotransposon insertions have been reported to disrupt the function of IRD genes [125,126,127]. The BBS1 gene in particular has recently been reported to harbour disease-causing retrotransposon insertions [128,129]. Retrotransposons, like other large genomic insertions, can be difficult to detect because they are unlikely to alter read depth in the region into which they have inserted, since alignment tools will map reads derived from the insert to the element's original positions in the reference genome. Broken reads may, however, indicate that a rearrangement has occurred. If breakpoints are detected, the region can be sequenced directly to shed light on the nature of the SV. Alternatively, a de novo assembly approach may be used to reconstruct the queried genome [130]. This approach is likely to be beneficial only for WGS, since TS or WES will probably not have sequenced the insert, as its original genomic region was not an intended target.
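As a rough sketch of the broken-read signals described above, the Python below flags a read pair as evidence of a rearrangement if the read is heavily soft-clipped (part of it did not align, as at a breakpoint), if its mate maps to another chromosome, if both mates align in the same orientation, or if the mates lie unusually far apart. The `ReadPair` record, the thresholds and the toy coordinates are hypothetical simplifications of what an aligner reports in SAM/BAM fields, and the sketch assumes a standard paired-end library in which mates normally align to opposite strands.

```python
import re
from dataclasses import dataclass

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

@dataclass
class ReadPair:
    """Hypothetical, simplified summary of one aligned read and its mate."""
    chrom: str
    pos: int             # leftmost aligned position of this read
    cigar: str           # CIGAR string of this read, e.g. "60M40S"
    reverse: bool        # this read aligned to the reverse strand
    mate_chrom: str
    mate_pos: int
    mate_reverse: bool

def soft_clipped_bases(cigar):
    """Total soft-clipped (S) bases: sequence present in the read but not aligned."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")

def rearrangement_evidence(pair, min_clip=20, max_distance=1000):
    """List the breakpoint signals shown by one read pair (illustrative thresholds)."""
    signals = []
    if soft_clipped_bases(pair.cigar) >= min_clip:
        signals.append("split read")
    if pair.chrom != pair.mate_chrom:
        signals.append("inter-chromosomal pair")   # e.g. translocation or mobile element
    elif pair.reverse == pair.mate_reverse:
        signals.append("same-orientation pair")    # e.g. inversion
    elif abs(pair.mate_pos - pair.pos) > max_distance:
        signals.append("over-long pair")           # e.g. deletion between the mates
    return signals

# Toy coordinates, not real loci.
normal = ReadPair("chr1", 10_000, "100M", False, "chr1", 10_300, True)
clipped = ReadPair("chr1", 10_000, "60M40S", False, "chr1", 10_300, True)
translocated = ReadPair("chr1", 10_000, "100M", False, "chr7", 55_000, True)
for p in (normal, clipped, translocated):
    print(p.cigar, p.mate_chrom, rearrangement_evidence(p))
```

A retrotransposon insertion would typically appear as clipped reads at the insertion site whose mates map to copies of the element elsewhere in the genome, which this sketch would report as an inter-chromosomal or over-long pair.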

In one investigation of PRPF31-related disease, 45% of probands (10 of 22) tested positive for a CNV. The PRPF31 gene lacks obvious sequence elements that might make it particularly susceptible to genomic rearrangement, such as long interspersed nuclear elements (LINEs) and long terminal repeat (LTR) elements [131]. The study emphasises the importance of integrating CNV and SV detection into screening protocols, even for genes that do not appear conventionally susceptible to genomic rearrangement. The estimated prevalence of causative SVs in IRD cases is roughly 10% [51,131,132,133]. This is similar to findings from other rare disease cohorts: an estimated 12% of developmental disorders are caused by pathogenic CNVs, and CNV and SV detection is therefore recommended as part of first-tier testing for those conditions [134]. In a large hearing loss screening study of over 1000 patients, 18% of resolved patients were found to have causative CNVs [135]. CNV detection has also proven very useful in diagnosing atypical syndromic IRD cases, resulting in novel genotype–phenotype associations and the refinement of complex phenotypes in multiple cases [136].
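Because the PRPF31 figure comes from only 22 probands, the 45% point estimate is imprecise. As an illustrative calculation (not one reported in the cited study), the Wilson score interval for a binomial proportion can be computed as below; for 10 of 22 it gives an approximate 95% interval of about 27% to 65%. The choice of the Wilson interval and of z = 1.96 are assumptions made for this example.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

low, high = wilson_interval(10, 22)
print(f"10/22 = {10 / 22:.0%}, 95% CI {low:.0%} to {high:.0%}")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within 0 to 1 and behaves reasonably for small cohorts such as this one.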

Many of the NGS methods discussed so far rely on short-read sequencing; however, long-read sequencing is arguably the superior approach for detecting SVs and CNVs. Short-read sequencing is generally preferred because it reliably produces high-quality data [137]. However, it is greatly hindered by features of the genome such as repetitive elements, which are not only abundant in the human genome but also known to increase the likelihood of an SV event occurring in IRD genes [123,138]. Long-read sequencing resolves such regions more reliably and offers a chance to recapitulate patients’ true genomic sequences more accurately through de novo assembly [139,140,141]. Several studies to date have revealed IRD-causing SVs using long-read sequencing and, in some cases, concluded that the SV was so complex that it likely could not have been fully resolved by short-read sequencing [44,142].
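Why read length matters in repetitive regions can be illustrated with a toy, error-free model in which a read can be placed unambiguously only if its sequence occurs exactly once in the reference. The Python below builds a random reference containing two identical copies of a 100-base repeat and reports, for several read lengths, the fraction of all possible reads that match exactly one position. The random sequence, the repeat length and the exact-matching rule are simplifying assumptions; real aligners tolerate mismatches, and long reads have their own error profiles.

```python
import random

def placements(reference, read):
    """Count exact occurrences of `read` in `reference`, allowing overlaps."""
    count, start = 0, reference.find(read)
    while start != -1:
        count += 1
        start = reference.find(read, start + 1)
    return count

def fraction_uniquely_placed(reference, read_length):
    """Fraction of all error-free reads of this length that match exactly one position."""
    reads = [reference[i:i + read_length]
             for i in range(len(reference) - read_length + 1)]
    return sum(placements(reference, r) == 1 for r in reads) / len(reads)

def random_dna(length, rng):
    """Random A/C/G/T string used as stand-in unique sequence."""
    return "".join(rng.choice("ACGT") for _ in range(length))

rng = random.Random(42)                      # fixed seed for reproducibility
repeat = random_dna(100, rng)                # one 100-base repeat unit
reference = (random_dna(200, rng) + repeat + random_dna(200, rng)
             + repeat + random_dna(200, rng))

for read_length in (20, 80, 150, 300):
    print(read_length, round(fraction_uniquely_placed(reference, read_length), 3))
```

Reads shorter than the repeat that fall entirely within one copy match both copies and cannot be placed, whereas reads longer than the repeat carry enough unique flanking sequence to be placed uniquely in this toy reference.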

Another useful application of long-read sequencing is determining the phase of potentially causative recessive variants. Phase is of critical importance: two heterozygous variants can explain a recessive condition only if they lie on different alleles (in trans); if they lie on the same allele (in cis), a causative genotype has not been established. Phasing is challenging in IRD cases for two main reasons. Firstly, variants causing Mendelian IRDs are extremely rare, which usually prevents the use of known haplotypes or complex alleles that might otherwise indicate that the two detected variants are in cis. Secondly, many IRD screening programmes are still in their infancy, so the patients being screened span the widest possible age range; even patients with paediatric onset may be elderly by the time they are screened. This can make segregation analysis difficult, as many of their close family members may be immobile or deceased. Long-read sequencing gives the interpreter a greater chance of capturing both variants of interest within the same sequenced molecule, and therefore of determining their phase without the need for additional family members [140].
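Read-backed phasing, as described in the paragraph above, can be sketched as a simple count. For two heterozygous variants, each read spanning both sites records which allele it carries at each site; reads carrying both alternate alleles (or both reference alleles) support cis, while reads carrying one alternate allele each support trans. The input format, the function name and the thresholds below are hypothetical, and the sketch ignores sequencing errors and mapping quality.

```python
from collections import Counter

def phase_from_reads(observations, min_reads=3, min_fraction=0.9):
    """Infer cis/trans for two heterozygous variants from reads spanning both sites.

    observations: iterable of (allele_at_site1, allele_at_site2), each "ref" or "alt".
    """
    counts = Counter(observations)
    # cis: one haplotype carries both variants, the other carries neither
    cis = counts[("alt", "alt")] + counts[("ref", "ref")]
    # trans: each haplotype carries exactly one of the two variants
    trans = counts[("alt", "ref")] + counts[("ref", "alt")]
    total = cis + trans
    if total < min_reads:
        return "undetermined: too few spanning reads"
    if cis / total >= min_fraction:
        return "cis"
    if trans / total >= min_fraction:
        return "trans"
    return "undetermined: conflicting reads"

# Toy example: 11 long reads span both variant sites.
reads = [("alt", "ref")] * 6 + [("ref", "alt")] * 5
print(phase_from_reads(reads))  # trans
```

Short reads rarely span two variants that lie further apart than the sequenced fragment, so for distant variants the list of spanning reads is often empty; longer reads make such spanning observations more likely.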

This entry is adapted from the peer-reviewed paper 10.3390/ijms22115684

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.