Single Cell Transcriptomics to Understand Hematopoietic Stem Cells

Single Cell Transcriptomics to Understand Hematopoietic Stem Cells: Comparison

Please note this is a comparison between Version 3 by Jessie Wu and Version 4 by Jessie Wu.

Aging leads to a decline in the functions of the hematopoietic and immune system, which in the elderly results in an increased risk of infection, poor vaccination efficacy, anemia and blood cancers. It is now well established that age-related dysfunction of the entire hematopoietic system originates from hematopoietic stem cells (HSCs), which lose their fitness over time. Single-cell transcriptomic technologies enable the uncovering and characterization of cellular heterogeneity and pave the way for studies aiming at understanding the origin and consequences of it. The hematopoietic system is in essence a very well adapted model system to benefit from this technological advance because it is characterized by different cellular states. Each cellular state, and its interconnection, may be defined by a specific location in the global transcriptional landscape sustained by a complex regulatory network. This transcriptomic signature is not fixed and evolved over time to give rise to less efficient hematopoietic stem cells (HSC), leading to a well-documented hematopoietic aging.

single cell transcriptomic
HSC
Aging

1. Understanding Hematopoietic Stem Cell Heterogeneity to Grasp Aged Hematopoietic Stem Cell Deregulations

The very large number of scRNA-seq studies carried out over the last ten years has greatly increased people's knowledge of different biological processes, particularly in the context of hematopoiesis. The interest of laboratories working on hematopoietic stem cells (HSCs) for scRNA-seq approaches is explained by the need to understand, at the molecular level, the heterogeneity of HSCs and its impact on their progeny. It is now admitted that each individual HSC, although sharing the same cell surface marker combination, differs in terms of functional outputs and molecular signatures ^[1][2][71,72]. The numerous recent scRNA-seq studies, which have been enriched with the increase in the number of cells sequenced, have challenged the traditional, hierarchical view of hematopoiesis and have led to the reconsideration of blood cell relationships and the routes by which lineage differentiation occurs ^[3][73]. It became evident that hematopoiesis is a continuous differentiation model with a lack of clear delineation between the different subtypes of murine HSCs ^[4][5][54,74] and with the identification of an early lineage priming in subsets of HSCs ^[6][7][55,75]. The continuum of differentiation in the HSC population has also been demonstrated in humans, where the transcriptomes of the HSC and MPP (CD34+ cells) compartment were found to be very similar, suggesting a cloud of HSCs differentiating directly into unipotent progenitors ^[8][9][76,77]. This model has been related to Waddington’s landscapes of cell differentiation in which the differentiation of a cell is represented by a bead going down a hill into diverging valleys, each of which ultimately leads to a different cell type ^[10][11][78,79].

The value of using scRNA-seq to identify and characterize rare populations has been demonstrated in studies seeking to understand the cellular and mechanistic evolution of HSC during aging. The transcriptomic analyses at the single cell level have particularly been useful to clarify the origin of the myeloid bias found in aged HSCs. Although a first study demonstrated a cell autonomous impaired lymphoid differentiation potential of aged MPP4 pinpointing the cellular compartment responsible for lymphoid cell loss with aging ^[12][80], most of the following studies suggested a gain in myeloid differentiation potential of aged HSCs. This gain was mainly due to an expansion of identified platelet-primed or megakaryocyte-primed HSCs ^[6][13][55,81]. In line with the increase in myeloid potential observed in old HSCs, several studies demonstrated an expansion with age of HSCs primed to respond to an inflammatory stimulus. Three studies highlighted a clear subpopulation of HSCs primed for interferon stimulus that were prepared to strongly respond to future stress or injuries that are expanded with aging ^[6][14][15][55,82,83]. In addition, a new group of MPPs has been described ^[16][84], which ultimately resembles from a transcriptomic point of view the group of interferon-primed progenitors described in ^[6][15][55,83]. This specific age-related amplification of LTHSCs with mis-regulated interferon signaling is consistent with the concept of inflammaging ^[17][85]. Another interesting HSC group, amplified during aging, is the cluster of LTHSCs presenting a TGF signature ^[6][55] that may correspond to accumulation of HSC subtypes with differential responses to the TGF that was previously identified ^[18][86].

The accumulation of non-functional HSCs during aging has also been extensively studied and is thought to be related to the history of HSC cycling activity and an imbalance between self-renewal and the initiation of HSC differentiation ^[19][87]. Concerning the regulation of the balance between population maintenance and differentiation, a study based on scRNA-seq suggested an increase in self-renewal, which would be linked to a shortening of the G1 phase of the cell cycle ^[20][37]. However, another study showed that the accumulation of non-functional aged HSCs, seemed to originate rather from a blockage of the differentiation of quiescent-aged HSCs biased towards the myeloid lineage ^[6][55]. The latter study also showed an increase in the TGF signature in these cells, suggesting a role for TGF signaling in the accumulation of aged HSCs. In parallel, another study explained the accumulation of the aged HSC population by the activation of the JAK-STAT pathway and p53 ^[21][88]. Interestingly, these two studies agreed on the markers of this aged HSC population that shared the same key genes (Hes1, KLF factors, JunB, Nr4a1, Cdkn1a). EGR1 was found to be upregulated in aged human hematopoietic stem cells, independent of cell cycle phase progression, but was associated with loss of CDK6 and CCND2 during S phase, which would disrupt HSC cell cycle ^[22][89]. These studies converge to show an accumulation of cells with several quiescent markers, such as Nr4a1, Junb, and Cdkn1a, which are compatible with observations on the quiescent state of HSCs ^[23][24][90,91]. They are also consistent with the distinction between quiescent and active HSCs and with the role of retinoic acid signaling in maintaining the hypoxic dormant state ^[25][26][92,93]. In addition, a study that implies an important role for Cdc42 activity/polarity in HSCs for driving the symmetric/asymmetric mode of division revealed that the frequency of polar HSCs decreases upon aging, which results in more symmetric divisions with daughter stem cells, with impaired potential ^[27][94].

It is important to note that all of the studies discussed above only partially overlap, with some features of aging not replicated across studies. One source of discrepancy may lie in how and when they handle cell cycle bias in their analysis. Indeed, the cell cycle induces variations in the whole transcriptome, confounding signals of interest, such as cell differentiation ^[28][95]. Thus, young and aged HSC/MPP heterogeneity in term of priming for specific lineage seems to be better resolved with a regression of cell-cycle effect before dimension reduction ^[6][55] than without ^[14][20][37,82]. The partial overlap between studies could also very well be a consequence of heterogeneous aging capture, consistent with the theory of clonal hematopoiesis, which shows that competitive clones emerge and amplify at the expense of others during an individual’s lifetime ^[29][96]. One way forward would be the construction of a large single-cell atlas collecting hundreds or even millions of thousands or even millions of cells to provide an in-depth phenotypic description of biological tissues across a wide range of pathological conditions, similar to the human cell atlas ^[30][97] or the tumor infiltrating lymphocytes atlas using samples from hundreds of donors ^[31][98].

2. Coupling the Transcriptome to the Cell Fate

An important issue is to capture the temporal dynamics of the cell population by evaluating the future state of an individual cell. One strategy is to use RNA velocity analysis, which is based on calculating the time derivative of the gene expression state by distinguishing newly transcribed (spliced) mRNAs from mature (unspliced) mRNAs by the presence of introns. The combination of velocities between genes is then extrapolated to estimate the future state of each cell in transcriptome space ^[32][33][99,100]. Although promising, this bioinformatics approach seems poorly suited to the study of hematopoietic cells in which changes in mRNA processing and stability may be key factors in HSC activation ^[34][101]. This is observed experimentally by a boost in RNA transcription that induces unexpected and probably erroneous projections of future cell states due to biased velocity estimates ^[35][102]. To address the challenge of correctly inferring the cell differentiation trajectory, the best strategy seems to combine scRNA-seq with other single-cell omics technologies and/or lineage tracing approaches ^[36][103]. Lineage tracing methods have proven effective in understanding the heterogeneity of the initial HSC population, as well as the clonal relationships between individual HSCs and their progeny ^[36][103]. These methods are based on a library of DNA barcodes expressed within a transgene that are stably integrated into the genomes of the HSCs under study. The offspring after multiple divisions of a particular HSC, thus, inherits its own barcode, allowing the assessment of transcriptional changes and functional potential of each cell in the same experiment. In this way, it is possible to trace the progeny of a HSC in an inferred pseudo differentiation trajectory and present an unbiased view of differentiation. Altogether, lineage tracing studies interestingly showed that rather few mature cells had integrated barcodes at the HSC level (apart from the megakaryocytic lineage), suggesting that MPPs rather than HSCs were the main contributors to undisturbed hematopoiesis (reviewed in ^[37][104]). When coupled with scRNA-seq, they appeared to be the right way to follow the fate of a given HSC in relation to its transcriptomic signature and have deepened researchers' understanding of HSC differentiation potential. They confirmed the early priming of HSCs to different lineages under physiological conditions ^[7][38][75,105]. They additionally showed, using computational methods of dynamic inference that fate choice occurs earlier than detected by the algorithms and that cells progress smoothly in the differentiation with precise and consistent dynamics ^[38][105]. These results were later strengthened by the addition of CRISPR-seq to lineage tracing and scRNA-seq, which clarified that the HSC subpopulation with high self-renewal potential contributed very little to hematopoiesis ^[39][106]. Single-cell epigenomic approaches could also be useful in determining the fate of an HSC. This hypothesis is supported by a HSC lineage tracing study, combined with extensive transcriptomic and epigenomic analyses, which demonstrated that epigenetic features, in contrast to the transcriptome, are consistently correlated with the cell fate ^[40][107]. Altogether, these approaches reveal the limits of using scRNA-seq alone to distinguish functionally heterogeneous HSC states and emphasized that transcriptionally similar cells can have cell-autonomous bias toward different fate choices ^[37][104].

3. Network-Based Dynamic Modeling: A Successful Approach to Decipher Hematopoiesis

As discussed above, the HSC fate is driven by complex interaction networks involving signaling, transcriptional and also epigenetic regulations. To decipher such large and complex biological networks and to cope with and take advantage of multiple layers of information, explanatory and predictive mathematical models are beneficial. Because of the ease of collecting blood or bone marrow samples, which facilitated data generation, the hematopoietic system was early on the subject of mathematical modeling studies ^[41][108]. Initially, studies at the level of the production of the different blood cell populations and their interactions were modeled using differential equations ^[42][43][44][109,110,111]. However, this mathematical formalism requires precise parameters, such as reaction constants or other initial conditions that are important because the behavior of the system can be very sensitive to them, but their knowledge is often limited or incomplete. Qualitative approaches are more suitable to model gene regulatory networks and decipher their biological complexity ^[45][112]. More economical in parameters, they gain in flexibility by allowing the variables to represent different biological realities (activation of a gene or phosphorylation of a protein for example). Logical models are therefore much more abstract and qualitative representations of biological systems than models based on systems of differential equations ^[46][113]. They have demonstrated their effectiveness in a variety of biological systems ^[47][114], such as the regulation of the cellular response to DNA damage ^[48][115] or the combinatorial effect of mutations in tumorigenesis ^[49][116]. The emergence of next generation sequencing and the amount of data generated have facilitated the study of molecular regulatory networks helping the construction of numerous logical models, among others related to hematopoiesis. These models have been useful in understanding regulatory events in hematopoiesis, such as those focused on T cell differentiation ^[50][51][117,118] and their activation ^[52][119], which have contributed to the understanding of cancer resistance to immunotherapies ^[53][120]. Logical models have been particularly helpful to describe the dynamical behavior and differentiation of the progenitor and stem cell compartment. A logical model of differentiation starting from the CMP toward erythrocytic, megakaryocytic, granulocytic and monocytic lineages recapitulated the differentiation hierarchy characterized by the presence of an initial branching between granulocyte-monocyte progenitor (GMP) and megakaryocytes-erythrocyte progenitor (MEP) ^[54][121]. Further upstream in hematopoiesis, a logical model of HSPC regulation was established, presenting a stability in the dynamics that reflects the heterogeneity of the hematopoietic stem and progenitor cell (HSPC) population at different stages of activation, observed in single cell expression data ^[55][122]. The analysis of the model showed that some transient external activations allow the modelled system to escape from this stability zone and reach a differentiated state (for example, the activation of GATA1 allowed system to reach an erythroid state). Another model described a differentiation pathway starting from MPP states and reaching two stable states, corresponding to lymphoid and myeloid progenitors ^[56][123]. Interestingly, stimulations of the model with cytokines suggested a possible reprogramming of pre-B cells into macrophages by transient activation of CEBPA ^[56][123]. These studies demonstrated that in addition to providing a global mechanistic vision of the observed phenomena (progenitor differentiation), the construction and analysis of logic models also allow the prediction of new local regulations and additional mechanisms. Logical modeling has also been very useful to understand the effect of extrinsic signals on HSC behavior. This can involve modeling the interaction between two types of cells. For example, a logical model of the dialogue between HSCs and mesenchymal stromal cells (MSCs) revealed attractors representing states of the system where HSC and MSC are attached and detached, respectively. The model highlighted the role of aberrant NF-kB expression in the creation of a tumor microenvironment ^[57][124]. Another recent model of a molecular regulatory network governing HSC quiescence and activation has been elaborated ^[58][125]. In the model, synchronous simulations provided stable LTHSC, STHSC, and proliferating HSC states, depending on a combination of niche signals that promoted quiescence or cell cycle activation. This model uncovered a novel regulatory mechanism of p53 in homeostasis, involving ROS- and RAS-activated TF regulators ^[58][125]. Such qualitative analysis of the dynamics opens up interesting avenues for studying age-related extrinsic deregulations of HSCs and the key players that control the resulting changes.

4. Single Cell Data and Boolean Networks Inference to Understand Hematopoietic Stem Cell Aging

One of the current challenges is to adapt mathematical methods to single cell data in order to integrate as much information as possible extracted from the data into the Boolean model. This relies on inference methods, which have a two-fold purpose: the construction of the influence graph and the characterization of the Boolean parameters (Figure 1). This involves reverse engineering approaches.

Figure 1. Boolean Network inference. Workflow of Boolean network inference from scRNA-seq data. (a) scRNA-seq data can be used to infer transcriptional interactions between TFs and complement influence graph constructed from prior knowledge of the searched BN. (b) The pseudo trajectories issued from the scRNA-seq data, can be discretized and translated in discrete observations of the searched BN. (c) Given these inputs, constraint programming can be used to infer the logical rules of the BN.

4.1. Gene Regulatory Network Inference

The issue of inferring the molecular interaction network underlying the biological process has been addressed in the past from bulk expression data ^[59][126]. Nowadays, the amount of information provided by single-cell data has greatly improved the quality of inferring interactions between biological components of these networks, most notably for regulations between TFs and their target genes based on expression dependencies observed in the data. Numerous mathematical methods have been proposed for this purpose ^[60][127] and evaluated ^[61][128], relying on regression approaches ^[62][63][129,130], expression correlations ^[64][131] and information theory ^[65][66][132,133]. These approaches can be used to infer TF influence graph from single cell data and complement prior knowledge of gene regulatory network of the modelled biological system (Figure 1a). Inference of influence graphs from scRNA-seq data is clearly improved by adding additional levels of information, such as epigenetics and genomics. Epigenetic data from ChIP-seq experiments of TFs and histone marks, as well as ATAC-seq, have identified regulatory regions with DNA binding motifs for hundreds of TFs. These motifs were used to identify sets of genes coregulated by the same TF ^[67][134]. This strategy has been developed by an application, SCENIC, which searches for motif enrichments in the regulatory regions of a TF’s target genes by regression trees ^[68][60]. The combination of chromatin landscape, transcriptome, and other omics analyses at the single-cell level promises new developments in methods for inferring contextualized and accurate influence graphs of transcriptional interactions ^[60][127]. For example, scATAC-seq data have been used to link DNA regulatory elements to their potential target genes to reconstruct TF networks ^[69][135].

4.2. Boolean Network Inference

Mathematical models present the specificity of integrating the dynamics with the influence networks, thus enabling a better understanding of how the network of regulations gives rise to the global observed behavior. Model inference, therefore, requires the determination of the dynamical parameters, consistent with the influence graph. In the case of Boolean networks (BNs), this consists in inferring, for each node of the influence graph, a logical rule describing the changes of its state according to the state of its regulators. The trajectories of the global system will be defined from this set of logical rules. The pseudo-trajectories constructed from the scRNA-seq data can be interpreted as observations of some trajectories generated by a BN (Figure 1b). The challenge is then to find the logical rules from these “partial dynamics” that are sets of transitions between states observed in the data (Figure 1c). Several methods have been proposed to address this inference problem thanks to constraint programming, model checking and the introduction of novel semantics ^{[70][71][72][73]}[34,136,137,138]. In practice, BN inference from scRNA-seq data have been successfully implemented to define a BN of embryonic hematopoietic development in humans ^[70][34], as well as a BN of HSC differentiation into MPP and MEP configurations ^[73][138]. Incorporation of new information from scRNA-seq into BN inference has necessitated, and thus enabled, the emergence of new analysis methods that are proving interesting and effective in capturing physiological changes, such as aging. Hence, Boolean models of aging have started to emerge. Schwab et al. ^[74][139] proposed an original analysis of single cell data exploiting population heterogeneity to sample time series data, from which sets of Boolean models were inferred. Then, by studying some topological properties, their analysis captured the dynamical heterogeneity, occurring with HSCs aging. Hérault et al. ^[75][140] presented an original inference methodology to construct a Boolean model allowing a better understanding of the mechanisms and factors controlling the effects of aging on HSCs. They developed an inference strategy that consists in a specific combination of different methods ranging from transcriptional regulation analysis for influence graph inference to constraint programming ^[71][136] and model checking techniques for logical rule inference. They succeeded in obtaining a Boolean model which agrees with most prior experimental observations of HSC biology. They provided expected and new predictions that could explain the myeloid bias in aged HSC differentiation, including the overactivation of Egr1 and Junb or/and the loss of Cebpa activation by Gata2 ^[75][140].