A mechanistic, genome-scale understanding of germ cell formation can aid the development of novel therapeutic strategies for infertility. Germ cell formation is a complex process governed by multiple mechanisms, including epigenetic regulation, germ cell-specific gene transcription, and meiosis.
The fusion of male and female germ cells to form a zygote is called fertilization. The development of fertilization-competent germ cells involves complex regulatory processes, including germ cell-specific cell division (meiosis), re-establishment of sex-specific genomic imprinting, and acquisition of sexually dimorphic characteristics [1–3]. Various studies have sought to elucidate the mechanisms underlying germ cell development using several model systems, and the key biological pathways and molecules involved in germ cell development and fertilization have been identified. In reproductive medicine, these molecules serve as diagnostic and therapeutic biomarkers for patients with reproductive disorders [4,5].
Genome-scale analyses of germ cells provide promising insights into developmental biology and reproductive medicine. However, developing and meiotic germ cells are available only in limited numbers, so conventional genome analysis approaches are poorly suited to delineating genomic, transcriptomic, and epigenomic regulation at single-cell resolution. In conventional bulk sequencing, which most studies have adopted, numerous heterogeneous cells are pooled and sequenced together. This approach captures global or representative gene expression patterns or chromatin conformations of the pooled cells, but it does not account for cell-to-cell heterogeneity. The differentiation of immature germ cells, including PGC precursors (pre-PGCs) and primordial germ cells (PGCs), into mature germ cells involves multiple steps [1,6]. Thus, even a small degree of epigenomic heterogeneity could lead to distinct cell fates that bulk sequencing cannot capture. To overcome this limitation, single-cell sequencing (SC-seq) was developed within the last decade [7]. SC-seq can identify the developmental fate of each cell. The technique was first developed using germ cells (oocytes) and preimplantation embryos (blastocysts), and subsequent studies have improved single-cell isolation and sequencing library preparation. Currently, the most common SC-seq method is single-cell RNA sequencing (scRNA-seq), which can identify cell-to-cell heterogeneity within a mixed cell population without averaging gene expression levels across cells. scRNA-seq also enables cell lineage tracing analysis. Cell heterogeneity in scRNA-seq data can be visualized using principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), or uniform manifold approximation and projection (UMAP) [8,9]; in these plots, cells with similar sequencing read profiles appear as clusters.
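To make the visualization step concrete, the sketch below computes a two-dimensional PCA embedding of a tiny, made-up cell-by-gene expression matrix using only the Python standard library (power iteration with deflation). The matrix, the four unnamed genes, and all function names are hypothetical; real analyses use dedicated packages and thousands of genes and cells, and the non-linear methods t-SNE and UMAP are not shown here.

```python
import math
import random


def center(matrix):
    """Subtract each gene's mean (column mean) from every cell (row)."""
    n_cells, n_genes = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_cells for j in range(n_genes)]
    return [[row[j] - means[j] for j in range(n_genes)] for row in matrix]


def covariance(centered):
    """Gene-by-gene sample covariance of a centered cell-by-gene matrix."""
    n, g = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(g)]
            for a in range(g)]


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def top_components(cov, k=2, iters=200, seed=0):
    """Leading k eigenvectors of a symmetric matrix via power iteration + deflation."""
    rng = random.Random(seed)
    cov = [row[:] for row in cov]  # work on a copy
    comps = []
    for _ in range(k):
        v = [rng.uniform(0.1, 1.0) for _ in range(len(cov))]
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, matvec(cov, v)))
        comps.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return comps


def pca_embed(matrix, k=2):
    """Project each cell (row) onto the top-k principal components."""
    centered = center(matrix)
    comps = top_components(covariance(centered), k)
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps]
            for row in centered]


# Hypothetical log-scale expression of four unnamed genes in six cells:
# cells 1-3 express genes A and B, cells 4-6 express genes C and D.
cells = [
    [9, 8, 1, 0],
    [8, 9, 0, 1],
    [9, 9, 1, 1],
    [1, 0, 8, 9],
    [0, 1, 9, 8],
    [1, 1, 9, 9],
]

for i, (pc1, pc2) in enumerate(pca_embed(cells), start=1):
    print(f"cell {i}: PC1={pc1:+.2f} PC2={pc2:+.2f}")
```

The two groups fall on opposite sides of PC1 (the overall sign of a principal component is arbitrary), which is the kind of separation a PCA plot displays as two clusters.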
Analyzing a sufficient number of cells can reveal their lineage trajectory, which provides valuable information for low-input and complex samples. scRNA-seq is therefore a useful tool for analyzing rare and scarce target cells. Bulk sequencing of such cells requires prior isolation by cell sorting techniques, such as fluorescence-activated cell sorting (FACS) and magnetic-activated cell sorting (MACS). However, rare cell types within mixed populations are difficult to sort in sufficient numbers, and the few recovered cells yield only a small library for bulk sequencing. Provided that the rare cells are not damaged during sample processing, scRNA-seq can bypass cell sorting and isolation and still capture their unique characteristics. Therefore, scRNA-seq is well suited to studies of germ cells, zygotes, and preimplantation embryos.
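The way bulk averaging hides a rare population can be illustrated with a toy calculation. The sketch below uses hypothetical counts for a single marker gene in 1,000 cells and compares a population in which 2% of cells express the marker strongly with one in which every cell expresses it weakly; the counts, the detection threshold, and the function names are illustrative assumptions.

```python
def bulk_mean(counts):
    """What a pooled (bulk) measurement reports: the average count per cell."""
    return sum(counts) / len(counts)


def expressing_fraction(counts, threshold):
    """Per-cell view: fraction of cells whose count reaches the threshold."""
    return sum(1 for c in counts if c >= threshold) / len(counts)


# Hypothetical marker-gene counts for two populations of 1,000 cells each.
rare_subpopulation = [50] * 20 + [0] * 980  # 2% of cells express strongly
uniform_low = [1] * 1000                    # every cell expresses weakly

print(bulk_mean(rare_subpopulation), bulk_mean(uniform_low))
print(expressing_fraction(rare_subpopulation, 10),
      expressing_fraction(uniform_low, 10))
```

Both populations give the same bulk mean (1.0 count per cell), whereas the per-cell view shows that only the first contains a distinct, marker-positive subpopulation. The threshold of 10 counts is arbitrary.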
scRNA-seq was first used to examine the transcriptomes of mouse oocytes and blastocysts and to identify aberrantly expressed genes in Dicer1- or Ago2-knockout oocytes and blastocysts [7]. The study reported that scRNA-seq identified more differentially expressed genes (DEGs) than microarray analysis. Subsequent studies have modified and improved the scRNA-seq protocol; the resulting methods include Smart-seq [10,11], CEL-seq [12,13], Quartz-seq [14], MARS-seq [15], Cyto-seq [16], SUPeR-seq [17], Drop-seq [18], InDrop [19], MATQ-seq [20], Chromium [21], sci-RNA-seq [22], Seq-Well [23], DroNC-seq [24], and SPLiT-seq [25] (Table 1). Generally, scRNA-seq involves the following steps: preparation of in vitro or in vivo samples, dissociation of the sample into single cells, barcoding of individual cells and reverse transcription, library preparation, massively parallel sequencing, and downstream bioinformatics analysis (Figure 1). The various scRNA-seq methods differ in at least one of these steps. Some protocols, including Drop-seq [18], InDrop [19], and Chromium [21], use droplet-based technologies in which dissociated individual cells are encapsulated in aqueous droplets within an oil emulsion, and their transcripts are barcoded and amplified with the aid of microfluidic devices [26]. These methods are suitable for analyzing samples containing mixed cell populations, examining transcriptomic heterogeneity within such populations, and performing cell lineage tracing experiments. When Tang et al. first introduced scRNA-seq [7], the method did not involve microfluidic manipulation; individual oocytes or preimplantation embryos were selected manually under the microscope. In addition to manual single-cell isolation, conventional cell separation techniques, including FACS, MACS, and laser capture microdissection, have been used to separate and harvest single cells.
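In droplet-based methods, cells are encapsulated at random, so many droplets receive no cell and a few receive more than one; this is one reason why one-to-one labeling of cells and barcodes cannot be guaranteed. A common simplifying assumption is that the number of cells per droplet follows a Poisson distribution with mean λ (the loading rate). The sketch below, with hypothetical function names and loading rates, computes the resulting fraction of occupied droplets that contain more than one cell.

```python
import math


def poisson_pmf(k, lam):
    """P(exactly k cells in a droplet) under the assumed Poisson loading."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def loading_stats(lam):
    """Empty, single-cell, and multi-cell droplet fractions for loading rate lam."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    occupied = 1.0 - empty
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Among droplets that contain any cell, how many contain >1 cell.
        "multiplet_fraction_of_occupied": multiple / occupied,
    }


for lam in (0.05, 0.1, 0.3):  # hypothetical loading rates
    s = loading_stats(lam)
    print(f"lambda={lam:.2f}: empty={s['empty']:.1%}, "
          f"multiplets among occupied={s['multiplet_fraction_of_occupied']:.1%}")
```

Under this assumption, a loading rate of λ = 0.1 gives roughly 5% multiplets among occupied droplets while leaving about 90% of droplets empty, which illustrates the trade-off between multiplet rate and wasted droplets when choosing a cell concentration.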
Sequencing read coverage also varies among scRNA-seq methods. Smart-seq [10], MATQ-seq [20], and SUPeR-seq [17] can sequence nearly full-length transcripts, whereas other methods sequence either the 5′ end (STRT-seq) or the 3′ end (Drop-seq [18], DroNC-seq [24], Seq-Well [23], and SPLiT-seq [25]) of transcripts. Full-length methods can detect splice variants and, depending on the protocol, strand-specific transcripts, which gives them an advantage over methods that sequence only the 5′ or 3′ ends. MATQ-seq [20] and SUPeR-seq [17], which are reported to detect polyA(+) and polyA(−) transcripts simultaneously, are well suited to examining non-coding RNAs.
Table 1. Summary of technical features of the scRNA-seq methods described in the entry.

Methods | Summary | Advantages | Challenges |
---|---|---|---|
Smart-seq [10,11] | 10²–10³ cells/run; detects full-length transcripts; addition of a few cytosines at the 5′ end of the full-length transcript allows hybridization with an oligonucleotide primer | Commercial kits available; detects different splice variants | Does not detect the strand-specific nature of mRNAs |
CEL-seq [12,13] | 10²–10³ cells/run; 3′-tag transcripts only; one single cell pipetted per tube | Improved accuracy; strand specificity and efficient barcoding | Difficult to distinguish splice variants; less sensitive |
Quartz-seq [14] | 10⁴–10⁵ cells/run; cell isolation using FACS; cell barcoding and first round of PCR performed on each individual cell | High UMI conversion efficiency; low cost per cell/run | High amplification error rate; preference for smaller fragments |
MARS-seq [15] | 10³–5 × 10³ cells/run; cell isolation using FACS; cell barcoding and first round of PCR performed on each individual cell; 3′-tag transcripts only | Low reaction volume; low noise; strand specificity | Not suitable for identifying splice variants; limited to polyA RNAs; requires FACS |
Cyto-seq [16] | 10²–10⁴ cells/run; 3′-tag transcripts only; PCR amplification using gene-specific primers; beads with unique barcodes used for barcoding and transcript amplification | High throughput; no restriction on cell size | Time-consuming; trade-off between sequencing depth and detection of differential gene expression |
SUPeR-seq [17] | ~10 cells/run (micromanipulation); individual cell processing; random primers with a universal anchor sequence used for PCR amplification | Detects circular RNAs; 3′ bias avoidable | Low throughput |
Drop-seq [18] | Split-and-pool synthesis of cell barcodes and UMIs on primer beads; cDNA amplification of each cell's transcripts carried within droplets; 3′-tag transcripts only | Low cost; robust cell processing (10⁴ cells/day); high yield; customizable cell barcodes | Highly dependent on microfluidics |
InDrop [19] | 3′-tag transcripts only; polyacrylamide hydrogels carrying barcoded ssDNA primers with polyT tails; each cell suspended in a droplet with a hydrogel, and cell lysis proceeds within the droplet | Low cost per cell/run; robust cell processing; high yield; customizable cell barcodes | Low mRNA capture efficiency; one-to-one labeling of cells and barcodes not guaranteed; highly dependent on microfluidics |
MATQ-seq [20] | ~10² cells/run; cells mouth-pipetted into individual PCR tubes; barcodes incorporated into transcripts via G-enriched primers that bind to a polyC tail | Captures both polyA and non-polyA RNAs; low 3′-end bias | Low throughput |
Chromium [21] | 10²–10⁴ cells/run; 3′-tag transcripts only; barcoded gel beads, cells, and enzymes partitioned by oil | Robust cell processing; automated procedures; relatively high cell capture efficiency | Highly dependent on microfluidics |
sci-RNA-seq [22] | Methanol fixation of cells; 3′-tag transcripts only; reverse transcription incorporates a UMI and a barcode into each cell; transposase used prior to library amplification | Minimal perturbation of cell state or RNA integrity; a FACS step can be incorporated | Low throughput |
Seq-Well [23] | Largely follows the Drop-seq method; cells loaded into subnanoliter wells by gravity | Independent of microfluidic devices; potential for multi-omics measurement at single-cell scale | Not fully automated |
DroNC-seq [24] | Largely follows the Drop-seq method; 3′-tag transcripts only; new microfluidic design and nuclei isolation added to the original Drop-seq method | Reduced nuclei isolation time; minimized RNA degradation | Highly dependent on microfluidics |
SPLiT-seq [25] | ~5 × 10⁴ cells/run; cells or nuclei fixed with formaldehyde; 3′-tag transcripts only; transcriptomes identified by four rounds of combinatorial barcoding; barcoded samples PCR-amplified and pooled for sequencing | Minimal perturbation of cell state or RNA integrity; independent of microfluidic devices | Low average number of reads per cell; low resolution for distinguishing cell types |
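The combinatorial barcoding used by SPLiT-seq can be reasoned about with simple arithmetic: if each round distributes cells among a fixed number of wells, the number of distinct barcode combinations is the product of the well counts, and the chance that a cell shares its combination with another cell falls as rounds are added. The sketch below assumes a hypothetical 96 wells per round and uniform, independent assignment of cells to wells; these are illustrative values, not the published SPLiT-seq configuration, and the function names are hypothetical.

```python
def barcode_space(wells_per_round):
    """Number of distinct barcode combinations after all split-pool rounds."""
    total = 1
    for wells in wells_per_round:
        total *= wells
    return total


def expected_collision_fraction(n_cells, n_barcodes):
    """Probability that a given cell shares its combination with >= 1 other cell,
    assuming every cell draws a combination uniformly and independently."""
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


n_cells = 50_000  # on the order of the ~5 x 10^4 cells/run listed in Table 1
for rounds in (3, 4):
    space = barcode_space([96] * rounds)  # hypothetical 96 wells per round
    frac = expected_collision_fraction(n_cells, space)
    print(f"{rounds} rounds: {space:,} combinations, "
          f"{frac:.3%} of cells expected to share a barcode")
```

With these hypothetical numbers, three rounds would leave roughly 5% of 50,000 cells sharing a barcode combination, whereas a fourth round reduces that to below 0.1%.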
The signal-to-noise ratio of scRNA-seq is low because of the small amount of input RNA. To overcome this limitation, a normalization method for measuring endogenous transcript levels should be employed. Currently, unique molecular identifiers (UMIs) or spike-in controls are used for normalization [27]. UMIs are used to estimate absolute transcript levels: because each captured molecule is tagged with a random UMI before amplification, reads sharing the same UMI can be collapsed into a single molecule. Spike-ins, such as the External RNA Controls Consortium (ERCC) controls, whose sequences derive from other species and whose concentrations are known, are used to calculate the relative levels of endogenous transcripts. Previous studies have demonstrated that UMIs (approximately 5 bp in length) can reduce technical noise and help fit sequencing reads to statistical models [28–30]. With spike-in controls of known concentration, the difference between the expected and observed expression of the spike-ins is used to estimate a cell-specific factor that adjusts for this difference; this factor is then applied to obtain normalized levels of endogenous transcripts. Spike-in normalization has been used successfully to develop statistical models that can be applied to various scRNA-seq experiments [31–33].
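The spike-in logic described above can be sketched as follows. The same known amount of spike-in RNA is assumed to be added to every cell, so the ratio of observed spike-in counts to expected spike-in molecules estimates that cell's capture efficiency, and dividing endogenous counts by this efficiency gives estimated molecule numbers that are comparable across cells. The spike-in amounts, counts, gene name, and function names are hypothetical, and this simple ratio is only one of several possible spike-in scaling schemes.

```python
def capture_efficiency(observed_spike_counts, expected_spike_molecules):
    """Fraction of the added spike-in molecules recovered in one cell's library."""
    return sum(observed_spike_counts) / sum(expected_spike_molecules)


def normalize_cell(endogenous_counts, observed_spike_counts, expected_spike_molecules):
    """Scale one cell's endogenous counts by its spike-in capture efficiency."""
    eff = capture_efficiency(observed_spike_counts, expected_spike_molecules)
    if eff <= 0:
        raise ValueError("no spike-in reads observed; cannot normalize this cell")
    return {gene: count / eff for gene, count in endogenous_counts.items()}


# Hypothetical: three spike-in species added at the same amounts to every cell.
expected = [1000, 500, 100]

# Cell B's library captured twice as efficiently as cell A's.
cell_a = normalize_cell({"GeneX": 30}, [100, 50, 10], expected)
cell_b = normalize_cell({"GeneX": 60}, [200, 100, 20], expected)
print(cell_a, cell_b)
```

Although cell B shows twice as many raw GeneX counts, both cells yield an estimate of about 300 GeneX molecules once the difference in capture efficiency is removed.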
Figure 1. Schematic illustration of the scRNA-seq procedure for gonadal tissues. Reproductive tissues are isolated and enzymatically dissociated. Highly pure single-cell populations are obtained by conventional cell sorting methods such as fluorescence-activated cell sorting (FACS) or magnetic-activated cell sorting (MACS). Microfluidics-based scRNA-seq methods require uniquely barcoded beads. Ideally, each cell is paired with one bead, after which the cell is lysed to release its mRNAs. The captured mRNAs are then reverse-transcribed. Finally, scRNA-seq libraries containing bead-specific oligonucleotide sequences and unique molecular identifiers (UMIs) are generated.
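The final library structure shown in Figure 1, in which each read carries a bead-specific cell barcode and a UMI, leads to a simple counting step: reads are grouped by cell barcode and gene, and reads sharing a UMI are collapsed into one molecule. The sketch below assumes a Drop-seq-like read 1 layout (a 12-nt cell barcode followed by an 8-nt UMI) and assumes that gene assignment from read 2 has already been performed. Other platforms use different layouts, and real pipelines also correct sequencing errors in barcodes and UMIs, which this sketch omits; the read data and function names are hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 12  # assumed cell-barcode length (Drop-seq-like layout)
UMI_LEN = 8       # assumed UMI length


def parse_read1(seq):
    """Split read 1 into (cell barcode, UMI) under the assumed layout."""
    if len(seq) < BARCODE_LEN + UMI_LEN:
        raise ValueError("read 1 is shorter than barcode + UMI")
    return seq[:BARCODE_LEN], seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]


def count_matrix(reads):
    """reads: iterable of (read1_sequence, gene) pairs, gene already assigned
    from read 2. Returns {cell_barcode: {gene: number_of_unique_UMIs}}."""
    umis = defaultdict(set)
    for read1, gene in reads:
        cell, umi = parse_read1(read1)
        umis[(cell, gene)].add(umi)  # PCR duplicates share a UMI and collapse
    matrix = defaultdict(dict)
    for (cell, gene), umi_set in umis.items():
        matrix[cell][gene] = len(umi_set)
    return dict(matrix)


# Hypothetical reads: three PCR copies of one Dazl molecule plus a second
# Dazl molecule in cell 1, and one Ddx4 molecule in cell 2.
example_reads = [
    ("AAAACCCCGGGG" + "TTTTAAAA", "Dazl"),
    ("AAAACCCCGGGG" + "TTTTAAAA", "Dazl"),
    ("AAAACCCCGGGG" + "TTTTAAAA", "Dazl"),
    ("AAAACCCCGGGG" + "GGGGCCCC", "Dazl"),
    ("CCCCAAAAGGGG" + "ACGTACGT", "Ddx4"),
]
print(count_matrix(example_reads))
```

Five reads reduce to three molecules: two Dazl molecules in the first cell and one Ddx4 molecule in the second, which is how UMIs remove amplification bias from the counts.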
Mouse and human germ cells are unipotent cells that can differentiate into oocytes or sperm [1,34,35]. In mice, germ cells first arise during gastrulation as a subset of specialized cells of mesodermal origin, called PGCs, at the extraembryonic region of the epiblast (Figure 2). The specified PGCs then migrate to and colonize the genital ridge. During migration, PGCs reportedly undergo epigenetic reprogramming, including global DNA demethylation, imprinting erasure and re-establishment, and changes in histone methylation (H3K9me2 and H3K27me3) [36–38]. The bone morphogenetic protein (BMP)–SMAD (small mothers against decapentaplegic) signaling axis mediates PGC specification by activating critical transcription factors (TFs), including BLIMP1, PRDM14, and TFAP2C [39,40] (Figure 2). This TF-regulated transcriptional circuit activates germ cell-specific gene expression and represses somatic lineage-specific gene expression [41–44]. Loss of any one of these key TFs impairs PGC specification and the formation of mature germ cells.
Figure 2. Human and mouse germ cell development and associated genes. Primordial germ cells (PGCs, marked in green) are first recognizable during gastrulation at the extraembryonic region of the epiblast in mouse (at ~E6.25) and in a layer between the epiblast and visceral endoderm in human (at ~2 to 3 weeks of gestation). These cells migrate toward the genital ridge during embryo turning and simultaneously undergo extensive epigenetic reprogramming. Upon arrival at the genital ridge, PGCs are dispersed in the female genital ridge and organized into a winding tubular pattern in the male genital ridge. Multiple scRNA-seq studies at various stages of germ cell development have been performed to elucidate cellular diversity and critical gene expression signatures in developing germ cells, cells terminating mitosis, and cells entering meiosis. Stage-specific genes identified by scRNA-seq are noted. SSC: spermatogonial stem cell; diff-SPG: differentiating spermatogonium.
Male and female germ cells undergo dimorphic differentiation after they reach the genital ridge [45]. There, male germ cells become mitotically quiescent (arrested at the G0/G1 phase) after several cell divisions and resume proliferation after birth [46]. The proliferating male germ cells colonize the base of the seminiferous tubules and become spermatogonial stem cells, diploid cells that give rise to mature spermatozoa [47]. In contrast, female PGCs enter meiosis I after reaching the genital ridge and arrest at the diplotene stage of meiotic prophase I. From puberty onward, female germ cells resume meiosis I, enter meiosis II, and complete meiosis II only after fertilization [48].
Studies using embryonic stem cell (ESC)-derived in vitro germ cell differentiation models have demonstrated that TF-mediated transcriptional regulation is conserved. However, the downstream gene networks in humans differ from those in mice. For example, a group of pluripotency genes (Sox2, Esrrb, and Klf2) is expressed in mouse PGCs, whereas KLF4 and TFCP2L1 are expressed in human PGC (hPGC)-like cells (Figure 2). SOX17 upregulates the expression of BLIMP1 and TFAP2C in hPGCs, which is not observed in mouse PGCs, and loss of SOX17 hinders the formation of PGC-like cells from ESCs [34]. These studies therefore suggest that PGC development involves both common and species-specific TF circuits.