1. General Process of scRNA-seq
Although dozens of different scRNA-seq methods have been published
[1][2][3][4], they all follow a similar general process, which is shown in
Figure 1. In short, it includes single-cell dissociation and isolation, cell lysis and RNA release, reverse transcription (RT), second-strand cDNA synthesis, amplification, library preparation, sequencing and data analysis. This section introduces some technical points and solutions in sequencing library construction.
Figure 1. General workflow of scRNA-seq. Firstly, single-cell suspensions are obtained from cultured cells or tissue blocks. Then, cells are isolated and lysed. The released RNAs are reverse-transcribed into cDNAs. After second-strand synthesis and amplification, sequencing adapters are added to both ends of the cDNAs to construct the final sequencing library. Finally, bioinformatics pipelines are used to analyze the sequencing data and re-establish gene expression signatures of single cells.
Specifically, there are two ways of single-cell dissociation: enzymatic and mechanical dissociation
[5]. The dosage, digestion time, temperature, and other parameters of enzymatic digestion need to be adjusted according to the properties of the tissue to maintain the integrity and activity of cells as much as possible. In mechanical dissociation by laser capture microdissection (LCM), a microscope and laser beam are used to carefully dissect single cells from frozen tissue sections
[6]. This technology was developed at the National Cancer Institute (NCI), originally for the study of heterogeneous cell populations in tumors
[7]. LCM has been widely used for single-cell analysis
[8][9][10][11], with the advantage of preserving the spatial information of single cells in tissues and facilitating the analysis of cell–cell and cell–environment interactions
[12][13][14]. However, low throughput is the main factor limiting the application of LCM in single-cell sequencing. In contrast, enzymatic digestion readily yields a large number of single cells and is therefore suitable for high-throughput analysis, although the spatial information of the cells is inevitably lost in the dissociation step
[15]. In addition, it is difficult to obtain intact single cells from some samples (e.g., brain tissue), so single nuclei must be extracted for sequencing instead. At present, many sequencing technologies
[16][17][18][19] have been optimized for single-nucleus sequencing.
Single-cell isolation is a crucial point in single-cell sequencing. In the early days, the common method of isolating single cells was plate-based isolation. Single cells are picked manually with microcapillary pipettes under the microscope or sorted by flow cytometry, then distributed into individual wells of a 96/384-well capture plate and prepared for the subsequent operations
[20]. Manual cell picking is time-consuming, low-throughput, inefficient, and requires a certain degree of micromanipulation skill. Fluorescence-activated cell sorting (FACS), a widely used technique for cell sorting, can process thousands of cells in a short time, which greatly increases the throughput, but it is difficult for non-specialists to operate. In addition, >10,000 cells are required as a starting input
[21]; therefore, FACS should be used conservatively when analyzing rare samples. Currently, microfluidic-based single-cell manipulation methods have become the mainstream methods of cell isolation, including droplet-based methods, microwell-based methods and the commercialized Fluidigm C1 integrated fluidic circuit (IFC) system.
Due to the limited amount of nucleic acid in a single cell, the cDNA produced by RT needs to be amplified to meet the requirements of sequencing. The general methods of amplification include exponential amplification based on polymerase chain reaction (PCR), and linear amplification based on in vitro transcription (IVT). PCR-based amplification can easily amplify a large number of cDNAs in a short time. However, because PCR is an exponential process, it introduces amplification bias: the excessive amplification of some sequences and insufficient amplification of others result in inaccurate transcript quantification, and amplification errors will be propagated permanently if not properly corrected
[22][23]. Linear amplification is thought to be more accurate and reproducible than PCR
[24]. The second method, IVT, was first introduced by Eberwine
[25]. In this strategy, the RT primer carries an upstream T7 promoter sequence, which is used to initiate IVT. After RT and second-strand cDNA synthesis, T7 polymerase recognizes the T7 promoter and catalyzes transcription to produce more RNA molecules. The amplified RNAs then need to be converted back into cDNA for sequencing library construction. As a result, this method is time-consuming (~28 h starting from IVT) and more complex to perform, so it is not as widely used as PCR
[1].
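The contrast between exponential and linear amplification described above can be illustrated numerically. The following is a minimal sketch under a simplified model that is not taken from any specific protocol: each PCR cycle is assumed to multiply the copy number by (1 + efficiency), whereas IVT is assumed to produce a fixed number of transcripts per template. The function names, efficiencies, cycle count and yields are all hypothetical illustration values.

```python
def pcr_yield(start_copies, efficiency, cycles):
    """Exponential model: every cycle multiplies the copy number by (1 + efficiency)."""
    return start_copies * (1.0 + efficiency) ** cycles


def ivt_yield(start_copies, transcripts_per_template):
    """Linear model: each cDNA template is transcribed a fixed number of times."""
    return start_copies * transcripts_per_template


# Two transcripts with the same true abundance (hypothetical value).
START = 10

# Assumed per-cycle PCR efficiencies that differ slightly between the two sequences.
pcr_a = pcr_yield(START, efficiency=0.95, cycles=20)
pcr_b = pcr_yield(START, efficiency=0.85, cycles=20)
pcr_bias = pcr_a / pcr_b

# Assumed IVT yields with a comparable sequence-dependent difference.
ivt_a = ivt_yield(START, transcripts_per_template=95)
ivt_b = ivt_yield(START, transcripts_per_template=85)
ivt_bias = ivt_a / ivt_b

print(f"PCR fold bias after 20 cycles: {pcr_bias:.2f}")
print(f"IVT fold bias: {ivt_bias:.2f}")
```

Under these assumptions, the small per-cycle difference compounds over 20 cycles into a nearly threefold apparent difference between two equally abundant transcripts, while the linear model keeps the ratio close to 1.1.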
In addition to the technical difficulties in library construction, the development of analysis pipelines that are valid for scRNA-seq data is also an important technical challenge. Summaries of sequencing data analysis can be found in other reviews
[26][27][28], and are not covered here.
2. Developmental Course of scRNA-seq
2.1. Scaling of Sequencing Throughput
Since Tang et al. first performed RNA sequencing in single mouse blastomeres in 2009
[35], many complex questions about living systems composed of cells have been answered by scRNA-seq
[20][29][30]. One of the main focal points is how many types of cells are present in functional tissues. Identifying all cell types in an unbiased and accurate manner, especially those present in low proportions, requires the analysis of a large number of cells
[31]. With the efforts of many researchers, scRNA-seq technology has developed from low-throughput mode to high-throughput mode.
Figure 2 shows several landmark techniques that mark the development of scRNA-seq technology in terms of throughput.
Figure 2. Timeline of the development of scRNA-seq technology. The upper half of the graph shows the main events marking the development of sequencing throughput, and the bottom half the improvement in sensitivity.
The initial scRNA-seq methods could only analyze a few cells at a time, because manual cell isolation was time-consuming and laborious and a separate sequencing library had to be constructed for each cell. In order to pool all of the transcripts from different cells for library construction and sequencing, while preserving information about their cellular origin, protocols such as STRT-seq
[32] and CEL-Seq
[1] introduced cell-specific barcodes. All transcripts from a single cell are labeled with the same barcode, which is unique for each cell. The barcode is a short oligonucleotide sequence that can be identified through sequencing. According to the barcode information, transcripts can be easily assigned to the corresponding cells. Barcoding is an excellent strategy to achieve parallel processing. However, the introduction of barcodes alone cannot solve the difficulty of isolating a large number of cells. Later, by combining FACS with automatic liquid handling, MARS-Seq
[33] successfully sequenced thousands of cells at one time. Three levels of barcodes are used to tag mRNAs, cells and plates, respectively, to pool all the materials for the subsequent automated processing. Similarly, STRT-Seq-2i
[34] utilizes a specialized FACS and barcoding protocol to increase the scale of sequencing. A custom aluminum plate with 9600 wells arranged in 96 subarrays is constructed for two rounds of barcode addition, allowing 9600 cells to be sequenced in parallel. However, these plate-based methods are not easy to carry out, and the number of cells analyzed is still limited.
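The way cell barcodes allow pooled transcripts to be assigned back to their cells of origin can be sketched as a simple lookup. The example below is a minimal illustration, not the implementation of any of the cited protocols: the 6-nt barcode sequences, the cell names, the toy reads and the `demultiplex` function are all hypothetical, and the gene assigned to each read is assumed to come from a prior alignment step.

```python
from collections import Counter, defaultdict

# Hypothetical 6-nt cell barcodes; real protocols define their own barcode sets.
BARCODE_TO_CELL = {
    "ACGTAC": "cell_1",
    "TGCATG": "cell_2",
    "GATCGA": "cell_3",
}
BARCODE_LEN = 6


def demultiplex(reads, barcode_to_cell, barcode_len=BARCODE_LEN):
    """Assign (barcode_read, gene) pairs to cells and count reads per gene."""
    counts = defaultdict(Counter)
    unassigned = 0
    for barcode_read, gene in reads:
        cell = barcode_to_cell.get(barcode_read[:barcode_len])
        if cell is None:
            unassigned += 1  # barcode not in the known set, e.g. a sequencing error
            continue
        counts[cell][gene] += 1
    return dict(counts), unassigned


# Toy pooled library containing reads from three cells and one unreadable barcode.
reads = [
    ("ACGTAC", "Actb"),
    ("ACGTAC", "Actb"),
    ("TGCATG", "Gapdh"),
    ("ACGTAC", "Gapdh"),
    ("GATCGA", "Actb"),
    ("NNNNNN", "Actb"),
]

counts, unassigned = demultiplex(reads, BARCODE_TO_CELL)
for cell, gene_counts in sorted(counts.items()):
    print(cell, dict(gene_counts))
print("unassigned reads:", unassigned)
```

A multi-level scheme such as the mRNA, cell and plate barcodes mentioned above could be handled in the same way by using a tuple of barcodes as the lookup key.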
With the introduction of microfluidic technology, the technical barriers to high-throughput single-cell operations have been fundamentally overcome. The first commercial automated microfluidic platform, the Fluidigm C1 system, enables 96 single cells to be automatically processed at one time
[35]. Nonetheless, its processing capacity is far from meeting the needs of large-scale analysis. In 2015, the emergence of two droplet-based scRNA-seq technologies
[36][2] was a revolutionary breakthrough in the single-cell sequencing field, enabling the simultaneous processing of thousands of cells, and truly realizing high-throughput parallel sequencing. Based on microfluidic droplets, the commercial platform 10x Genomics Chromium
[37] was rapidly developed and can characterize tens of thousands of cells at a time. The three technologies will be described in detail below.
More recently, sci-RNA-seq
[38] and SPLiT-seq
[39], which use a combinatorial indexing strategy to label cells, were published in succession. Instead of physically compartmentalizing single cells, these methods can label more than 100,000 single-cell transcriptomes at one time simply through multiple rounds of splitting and pooling. These methods are very easy to operate, have high cell labeling efficiencies, and can considerably reduce the cost of sequencing. Furthermore, the more cells sequenced at one time, the lower the cost of sequencing per cell.
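The labeling capacity of split-and-pool indexing follows from simple counting: the number of distinct labels is the product of the number of barcodes used in each round. The sketch below illustrates this together with a rough estimate of how often two cells would receive the same label. It assumes that every cell receives a uniformly random combination; the choice of 96 barcodes per round, the number of rounds and the function names are hypothetical and are not taken from the cited protocols.

```python
import math


def barcode_combinations(barcodes_per_round):
    """Number of distinct barcode combinations after all split-pool rounds."""
    return math.prod(barcodes_per_round)


def collision_fraction(n_cells, n_combinations):
    """Expected fraction of cells that share their combination with at least
    one other cell, assuming uniformly random and independent assignment."""
    return 1.0 - (1.0 - 1.0 / n_combinations) ** (n_cells - 1)


N_CELLS = 100_000  # scale mentioned in the text

three_rounds = barcode_combinations([96, 96, 96])
four_rounds = barcode_combinations([96, 96, 96, 96])

collision_3 = collision_fraction(N_CELLS, three_rounds)
collision_4 = collision_fraction(N_CELLS, four_rounds)

print(f"3 rounds: {three_rounds:,} combinations, collision fraction {collision_3:.4f}")
print(f"4 rounds: {four_rounds:,} combinations, collision fraction {collision_4:.4f}")
```

Under these assumptions, three rounds of 96 barcodes would leave roughly one cell in ten sharing a label with another cell at this scale, whereas a fourth round reduces this to about 0.1%, which illustrates why additional rounds are an inexpensive way to scale up.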
2.2. Improvement in Sensitivity
In addition to the capacity to sequence multiple cells in parallel, sensitivity, accuracy, repeatability, technical noise, cost and other features are also important aspects to be considered in the design of scRNA-seq methods. Of these, sensitivity is the most critical, being a fundamental indicator of the performance of a method. The main events marking the development of sensitivity are shown in Figure 2.
Sensitivity can be interpreted as the probability of capturing a particular transcript and eventually detecting it by sequencing
[40]. For some transcripts with low expression levels, drop-out events occur frequently in low-sensitivity sequencing platforms
[41]. Low sensitivity will also reduce the accuracy and repeatability of transcript quantification, which is detrimental for the distinction of subtle differences between cell subpopulations and accurate cell type classification
[42]. Every step of sequencing library construction may cause the loss of transcripts, thereby impairing the sensitivity of sequencing methods. The primary aim of all methods is to convert mRNA into cDNA for amplification. Most methods, including Smart-Seq
[43] and CEL-Seq
[1], use poly(T) primers to capture mRNA through the poly(A) tail to initiate RT reactions. While this approach selectively captures mRNA and easily filters out the numerous rRNA molecules, it also excludes some important transcripts without a poly(A) tail, such as circRNA, miRNA and nascent RNA. By contrast, MATQ-seq
[44] uses random primers to capture transcripts, which can not only detect all types of transcripts, but also improve the mRNA capture efficiency, and thus achieve higher sensitivity. Another whole-transcriptome analysis method, SUPeR-seq
[45], also takes advantage of random primers. After first-strand cDNA synthesis, in those methods that use PCR for cDNA amplification, the addition of the second PCR handle is also a key step in determining the efficiency of conversion. Some methods use a transferase to add a homopolymer tail, such as the poly(A) tail in SUPeR-seq or the poly(C) tail in MATQ-seq, to the 3′ end of the first-strand cDNA. A poly(T) or poly(G) primer containing the second PCR handle then anneals to the homopolymer tail for second-strand synthesis. Smart-Seq uses a more convenient approach, known as template-switching, to incorporate the second PCR handle. This method can obtain full-length transcripts and reduce the 3′-end bias that exists in the homopolymer tailing approach. However, neither of these reactions is 100% efficient, and a proportion of cDNAs will inevitably be lost. To address this problem, Seq-Well S3
[46] uses random primers to initiate second-strand cDNA synthesis, and recovers most cDNA molecules without the second PCR handle. In addition, avoiding the loss of nucleic acid molecules during operation contributes substantially to the improvement in sensitivity. This is particularly evident in high-throughput methods. Cell fixation is necessary in some sequencing methods, resulting in the loss of transcripts and impaired sensitivity. For this reason, methods based on combinatorial indexing, such as sci-RNA-seq and SPLiT-seq, cannot completely replace the microfluidic-based method, despite having many benefits.
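The definition of sensitivity given above, the probability of capturing and detecting a particular transcript, can be linked to drop-out events with a small calculation. The sketch below assumes a deliberately simplified model that is not taken from the cited studies: every mRNA molecule of a gene is captured independently with the same probability, so a gene drops out when none of its molecules is captured. The function names and the numerical values are hypothetical.

```python
def dropout_probability(capture_probability, n_molecules):
    """Probability that none of the n molecules of a gene is captured."""
    return (1.0 - capture_probability) ** n_molecules


def detection_probability(capture_probability, n_molecules):
    """Probability that at least one molecule of the gene is captured."""
    return 1.0 - dropout_probability(capture_probability, n_molecules)


# A lowly expressed gene with 5 mRNA copies per cell (hypothetical).
N_COPIES = 5

# Two assumed per-molecule capture probabilities: a low- and a high-sensitivity method.
low_sensitivity_dropout = dropout_probability(0.10, N_COPIES)
high_sensitivity_dropout = dropout_probability(0.40, N_COPIES)

print(f"drop-out at 10% capture: {low_sensitivity_dropout:.3f}")
print(f"drop-out at 40% capture: {high_sensitivity_dropout:.3f}")
```

In this model, drop-out falls off geometrically with the copy number, which is consistent with the observation above that lowly expressed transcripts are the ones most affected on low-sensitivity platforms.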
The microfluidics method reduces the reaction volume from microliters to nanoliters, which increases the concentration of the targets, i.e., the transcriptomes of single cells, and thereby improves sensitivity. However, when optimized, reactions in tubes can reach the same sensitivity as those using microfluidics. The application of microfluidics in scRNA-seq is key to improving the throughput. To improve the sensitivity, researchers still need to focus on the fundamental chemistry used in scRNA-seq methods. The key to improving the sensitivity of scRNA-seq can be split into two aspects: (1) increasing the capture efficiency of RNA during first-strand synthesis; (2) increasing the efficiency with which cDNA is converted into amplifiable products, whether by second-strand synthesis or by the template-switching activity of RT enzymes. By integrating optimized scRNA-seq chemistry with microfluidics, higher sensitivity can be achieved more easily, because the target concentration is greatly increased and the background signal is reduced. Other strategies for improving accuracy, reducing cost and so on will be discussed in the introduction of specific methods.
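The two effects discussed in this paragraph, the concentration gain from a smaller reaction volume and the two-step view of sensitivity, can be put into rough numbers. The sketch below is only an illustration: the number of mRNA molecules per cell, the 10 µL and 1 nL volumes, the step efficiencies and the function names are hypothetical, and treating overall efficiency as the product of the capture and conversion efficiencies is a simplifying assumption.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molar_concentration(n_molecules, volume_liters):
    """Concentration in mol/L of n molecules dissolved in the given volume."""
    return n_molecules / (AVOGADRO * volume_liters)


def overall_efficiency(capture_efficiency, conversion_efficiency):
    """Fraction of transcripts that become amplifiable products, assuming
    the two steps act independently."""
    return capture_efficiency * conversion_efficiency


N_MRNA = 100_000        # assumed mRNA molecules in one cell
TUBE_VOLUME = 10e-6     # 10 microliters, assumed tube reaction
DROPLET_VOLUME = 1e-9   # 1 nanoliter, assumed droplet reaction

tube_conc = molar_concentration(N_MRNA, TUBE_VOLUME)
droplet_conc = molar_concentration(N_MRNA, DROPLET_VOLUME)
fold_increase = droplet_conc / tube_conc

# Improving either step raises the overall yield (hypothetical efficiencies).
baseline = overall_efficiency(0.5, 0.4)
better_conversion = overall_efficiency(0.5, 0.8)

print(f"concentration gain: {fold_increase:.0f}-fold")
print(f"overall efficiency: {baseline:.2f} -> {better_conversion:.2f}")
```

With these assumed volumes, the same transcriptome is about 10,000 times more concentrated in the droplet, while the product form shows that the overall yield is capped by whichever of the two chemical steps is less efficient.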