Single-Chromosome Sequencing: History

Sequencing of DNA from single isolated chromosomes (ChromSeq) is an elegant approach for determining chromosome content and assigning genome assemblies to chromosomes, thus bridging the gap between cytogenetics and genomics.


1. Introduction

Single-chromosome sequencing has been previously referred to as ChromSeq in plant genome studies.

2. ChromSeq Workflow 

The ChromSeq workflow consists of three main steps: (i) physical chromosome isolation; (ii) high-throughput sequencing of the isolated chromosomal DNA; and (iii) bioinformatic analysis of the sequencing data (Figure 1).

Figure 1. Schematic representation of the ChromSeq workflow. Briefly, chromosomes are isolated via either flow sorting or microdissection (only mechanical microdissection is shown). After isolation, Whole-Genome Amplification (WGA) is performed on the chromosomal DNA. Optionally, the amplified DNA can be labeled with fluorochromes and hybridized onto metaphases of the target species to confirm the identity of the isolated chromosomes. WGA products are then sequenced with next-generation sequencing technologies. Sequencing data can be mapped onto the reference genome of the target species or assembled de novo. The latter approach has proven successful when high-throughput chromosome isolation (millions of copies) is combined with long-read sequencing.

Two main methods for physical chromosome isolation are currently available: flow sorting and microdissection [38]. Both approaches require the preparation of a metaphase chromosome suspension. Other chromosome isolation methods based on microfluidics have been developed over the last decade (e.g., [39,40,41]); however, they have not been as widely adopted as flow sorting and microdissection.

In flow sorting, chromosomal DNA is labeled with two different fluorochromes specific for GC- and AT-rich regions. The labeled chromosomes are carried in a narrow stream of liquid, which is broken into fine droplets. The fluorescence intensity of each chromosome contained in a droplet is measured, and the measurements are visualized as a flow karyotype. Ideally, each chromosome forms a distinct peak in the flow karyotype, whose position is determined by its GC- and AT-specific fluorescence intensities; these reflect the chromosome's DNA content (a relative measure of its size) and its base composition. Peaks can be gated for a specific range of fluorescence intensities, and droplets containing single chromosomes within the gate are electrically charged and deflected into collection tubes by an electrostatic field [32,42,43]. An advantage of this method is that large numbers of copies of a specific chromosome can be isolated. However, its key disadvantages are the difficulty of separating chromosomes of similar size and sample impurity caused by chromosome fragmentation.
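The gating step described above can be illustrated with a minimal Python sketch. It assumes that each droplet event is reduced to two numbers (AT-specific and GC-specific fluorescence) and that a peak is selected with a simple rectangular gate; the class and function names are hypothetical, and real sorters rely on instrument software with polygon gates, doublet discrimination, and purity settings that are not modeled here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    """One droplet event (hypothetical model): AT- and GC-specific fluorescence."""
    at_signal: float
    gc_signal: float


@dataclass
class RectGate:
    """Hypothetical rectangular gate drawn around one peak of the flow karyotype."""
    at_min: float
    at_max: float
    gc_min: float
    gc_max: float

    def contains(self, event: Event) -> bool:
        return (self.at_min <= event.at_signal <= self.at_max
                and self.gc_min <= event.gc_signal <= self.gc_max)


def flow_karyotype(events: list[Event], bin_size: float) -> Counter:
    """Bin events into a 2D histogram; densely populated bins form the chromosome peaks."""
    return Counter(
        (int(e.at_signal // bin_size), int(e.gc_signal // bin_size)) for e in events
    )


def events_to_sort(events: list[Event], gate: RectGate) -> list[int]:
    """Indices of the droplets whose signals fall inside the gate (the ones to deflect)."""
    return [i for i, e in enumerate(events) if gate.contains(e)]


# Toy data: two events from the same peak, one from another peak, one low-intensity event.
events = [Event(100, 80), Event(102, 83), Event(150, 60), Event(40, 30)]
gate = RectGate(at_min=95, at_max=110, gc_min=75, gc_max=90)
print(flow_karyotype(events, bin_size=10))
print(events_to_sort(events, gate))  # [0, 1]
```

Only the two events falling in the gated peak are selected for sorting; chromosomes of similar size and base composition would fall into overlapping bins, which is why they are hard to separate.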

Microdissection uses metaphase spreads on slides. Metaphases are observed under an inverted microscope, and single chromosomes are either physically isolated with a micromanipulator fitted with thin glass needles (mechanical microdissection) or cut out with a laser beam (laser microdissection). In laser microdissection, slides are covered with special membranes that allow the chromosome to be cut out [38,44]. This method yields relatively contamination-free samples and can be used to isolate not only whole chromosomes but also specific target regions [45,46]. However, microdissection is labor-intensive and is usually limited to isolating no more than a dozen copies of each chromosome type. Moreover, part of the chromosomal DNA can be damaged or lost during isolation.

Both flow sorting and microdissection yield quantities of DNA that are too low for high-throughput sequencing on their own. For this reason, Whole-Genome Amplification (WGA) is performed on the chromosomal DNA prior to sequencing, using either degenerate oligonucleotide-primed PCR (DOP-PCR, Ref. [47]) or multiple displacement amplification (MDA, Refs. [48,49]). An aliquot of the amplified DNA can be used to produce chromosome paints [50]. For this purpose, the amplified DNA is labeled with fluorochromes and then hybridized onto metaphases of the target species to confirm the identity of the isolated chromosomes (e.g., [51]). Once fluorescence in situ hybridization (FISH) with the chromosome paints has established a clear correspondence between the isolated chromosomal DNA and the species karyotype, the amplified DNA can be used to prepare libraries for high-throughput sequencing according to the manufacturer's protocols. Short-read sequencing is currently the preferred approach for ChromSeq (e.g., [52,53,54,55,56]), but long-read approaches have also been employed (e.g., [57]).
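A back-of-the-envelope calculation shows why amplification is needed. The Python sketch below assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, a length of roughly 249 Mb for human chromosome 1, and a hypothetical library-preparation input of 1 ng; actual input requirements depend on the kit used, and the function names are illustrative.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumption: approximate average mass of one double-stranded base pair


def dna_mass_pg(length_bp: float, copies: int = 1) -> float:
    """Mass, in picograms, of `copies` double-stranded DNA molecules of `length_bp`."""
    grams = length_bp * copies * BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e12


def fold_shortfall(length_bp: float, copies: int, required_ng: float) -> float:
    """How many times more DNA a (hypothetical) library input needs than the copies provide."""
    return required_ng * 1000.0 / dna_mass_pg(length_bp, copies)


CHR1_BP = 249e6  # approximate length of human chromosome 1
print(f"one copy: {dna_mass_pg(CHR1_BP):.2f} pg")                      # about 0.27 pg
print(f"12 copies: {dna_mass_pg(CHR1_BP, 12):.1f} pg")                 # about 3.2 pg
print(f"shortfall vs 1 ng: {fold_shortfall(CHR1_BP, 12, 1.0):.0f}x")  # a few hundred-fold
```

Under these assumptions, a dozen microdissected copies of a large chromosome provide only a few picograms of DNA, a few hundred-fold less than a nanogram-scale input.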

Sequencing data generated from isolated chromosomes can be processed using a wide variety of approaches, which fall into two main categories: (i) alignment to a reference genome and (ii) de novo assembly of chromosome-specific sequencing data. When DOP-PCR or MDA is used before chromosome-specific library sequencing, the sequencing data must be pre-processed to trim primers and/or adapters, regardless of the downstream approach.
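The primer-trimming step can be sketched in Python by converting a degenerate primer written with IUPAC codes into a regular expression and removing it from the 5' end of each read. The primer sequence below is an assumption (the commonly used 6-MW DOP-PCR primer) and should be replaced by the one from the actual protocol; the sketch handles exact 5' matches only, whereas dedicated trimmers such as cutadapt also tolerate mismatches and partial or 3' adapter occurrences.

```python
import re

# IUPAC nucleotide codes translated into regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Assumption: the 6-MW DOP-PCR primer; replace with the primer used in your protocol.
DOP_PRIMER = "CCGACTCGAGNNNNNNATGTGG"


def primer_regex(primer: str) -> re.Pattern:
    """Compile a pattern in which each degenerate base becomes a character class."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_5prime(read: str, pattern: re.Pattern) -> str:
    """Strip the primer from the 5' end if it matches exactly; otherwise keep the read."""
    match = pattern.match(read)  # Pattern.match only tries a match at the start of the string
    return read[match.end():] if match else read


pattern = primer_regex(DOP_PRIMER)
read_with_primer = "CCGACTCGAG" + "ACGTAC" + "ATGTGG" + "TTGACCAGTA"
print(trim_5prime(read_with_primer, pattern))  # TTGACCAGTA
print(trim_5prime("TTGACCAGTA", pattern))      # TTGACCAGTA (no primer, unchanged)
```

The six N positions match any base, so every primer variant in the degenerate pool is recognized by a single pattern.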

Reference-based analysis consists of aligning chromosome-specific reads to a reference genome and is the most commonly used approach so far. Based on the alignment data, reference genome scaffolds are assigned to specific chromosomes: for example, if reads obtained from chromosome 1 map onto three different scaffolds of the reference genome, those three scaffolds are inferred to be parts of chromosome 1. If no rearrangements are expected between the reference genome and the sampled chromosome, any statistic of mapped read density can be used to rank scaffolds and retain those assigned to the chromosome. The problem becomes more complicated if rearrangements between the target species chromosome and the reference genome are possible. To predict rearrangement breakpoints, several methods have been developed based on various statistical approaches and read density metrics, including maximum likelihood based on read counts per kb [29,58], and circular binary segmentation [56] or clustering [59] of the distances between non-overlapping read mappings. DOPseq is the only software developed specifically for ChromSeq data analysis; it combines chromosome region detection with upstream processing in an automated and reproducible pipeline [56]. The main disadvantages of a reference-based approach are read mapping errors and sample contamination, which can lead to misinterpretation of the alignment data. It is therefore crucial to separate the true chromosome assignment signal from background noise.
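The two tasks described above, ranking scaffolds by read density and locating a breakpoint, can be illustrated with a minimal Python sketch. The breakpoint search is a simplified stand-in that assumes reads are binned into equal-size windows along a scaffold and modeled as Poisson counts with one rate on each side of a single breakpoint; it is not the maximum-likelihood method of [29,58] nor circular binary segmentation [56], and the `min_density` threshold and all function names are hypothetical.

```python
import math


def reads_per_kb(read_counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Mapped-read density of each scaffold, in reads per kilobase."""
    return {s: read_counts.get(s, 0) / (length / 1000) for s, length in lengths_bp.items()}


def assign_scaffolds(read_counts, lengths_bp, min_density: float) -> list[str]:
    """Rank scaffolds by density and keep those at or above a (hypothetical) threshold."""
    density = reads_per_kb(read_counts, lengths_bp)
    ranked = sorted(density, key=density.get, reverse=True)
    return [s for s in ranked if density[s] >= min_density]


def poisson_loglik(counts: list[int]) -> float:
    """Poisson log-likelihood of window counts under one shared rate (MLE = mean),
    omitting the constant -sum(log(c!)) term, which cancels when comparing splits."""
    total, n = sum(counts), len(counts)
    if total == 0:
        return 0.0
    rate = total / n
    return total * math.log(rate) - n * rate


def best_breakpoint(counts: list[int], min_windows: int = 2):
    """Split index k maximizing the log-likelihood gain of a two-rate model over a
    one-rate model; returns (None, 0.0) if no split improves the fit."""
    base = poisson_loglik(counts)
    best_k, best_gain = None, 0.0
    for k in range(min_windows, len(counts) - min_windows + 1):
        gain = poisson_loglik(counts[:k]) + poisson_loglik(counts[k:]) - base
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain


# Toy example: mapped read counts and scaffold lengths.
counts = {"scaf1": 500, "scaf2": 10, "scaf3": 300}
lengths = {"scaf1": 100_000, "scaf2": 200_000, "scaf3": 50_000}
print(assign_scaffolds(counts, lengths, min_density=1.0))  # ['scaf3', 'scaf1']

# Reads per window along one scaffold: high coverage, then background.
windows = [20, 22, 19, 21, 1, 0, 2, 1]
print(best_breakpoint(windows)[0])  # 4
```

In practice, a gain threshold or permutation test would be needed to decide whether a breakpoint is real, and the search would be repeated on each segment to detect multiple breakpoints.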

A de novo assembly approach can also be applied to ChromSeq data. In this case, chromosome-specific assemblies are produced independently for each chromosome. This approach requires cross-contamination checks among all chromosome-specific data pools, as well as removal of repetitive sequences to increase assembly contiguity (e.g., [60]). Sequencing data derived from only a few isolated chromosomes are usually highly fragmented, and a de novo approach may produce assemblies with a low contig N50 (e.g., [61]). However, this problem can be circumvented either by sequencing a very large number of chromosome copies (up to millions) or by supplementing the sequencing data with long reads. For instance, Kuderna et al. [62] successfully assembled human chromosome 1 de novo by combining high-throughput chromosome isolation (10 million copies) with Oxford Nanopore sequencing. The resulting assembly had an N50 of 10.5 Mb and allowed the identification of structural variants. The gorilla Y chromosome was also successfully assembled de novo using a combination of short- and long-read sequencing [57].
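Because assembly contiguity is reported here as N50, a short Python sketch of the metric may help: N50 is the length L such that contigs of length L or longer together contain at least half of the total assembly length. The function name and toy contig lengths are illustrative.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of an assembly, or 0 for an empty input."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:  # accumulate from the longest contig downwards
        running += length
        if running >= half_total:
            return length
    return 0


# A fragmented toy assembly versus a more contiguous one of the same total size (38).
print(n50([2, 3, 4, 5, 6, 8, 10]))  # 6
print(n50([30, 4, 2, 2]))           # 30
```

Both toy assemblies contain the same amount of sequence, but the more contiguous one has a much higher N50, which is the effect that long reads and high chromosome copy numbers aim for.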

This entry is adapted from the peer-reviewed paper 10.3390/genes12010124
