2. TCR Repertoire Analysis
It allows the definition of these types of “barcodes” in different contexts, revealing, for example, important information about a successful antitumor T-cell response, on how to improve efficacy and safety of immune checkpoint inhibitors (ICI), on the tumor microenvironment (TME) characterization, on the immune response during disease development and treatment, on minimal residual disease (MRD) assessment, transplantation, autoimmune disease and infectious disease characterization and therapy
[1][4][8][13][28][29][30].
In this entry, researchers describe TCR repertoire sequencing strategies and applications (Figure 1) in individuals exposed to infectious agents, such as HIV, HBV, HCV and SARS-CoV-2, and to cancer, with an updated overview of the available technologies.
Figure 1. Schematic representation of TCR repertoire analysis and applications.
3. TCR and HLA
As already mentioned, T-cell activation occurs as a consequence of the specific recognition between TCR and foreign antigen peptides presented by the MHC molecules (Figure 2), which are transmembrane glycoprotein complexes expressed on the cell surface.
Figure 2. Schematic representation of TCR repertoire generation upon exposure to infectious agents and cancer neoantigens.
MHCs in humans are coded by the highly polymorphic human leukocyte antigen (HLA) gene family located on chromosome 6 and involved in the identification of self versus non-self.
Three subclasses of HLA molecules are expressed in various human tissues: HLA class I, class II and non-classical HLA molecules; some of them make up the class III region.
All nucleated cells express HLA class I proteins on the plasma membrane, where they expose peptides derived from intracellular antigens to monitoring by CD8 T-cells via TCR interaction. As a result, cells expressing viral or mutated non-self-antigens are killed directly to restrain infection and prevent further cell transformation
[31]. There are three main HLA types within this class encoded by the HLA-A, HLA-B and HLA-C loci.
HLA class II proteins are constitutively expressed by professional APCs, including B-cells, and their expression can be upregulated on activated immune cells. They bind peptides derived from antigens captured from outside the cell and present these ‘exogenous’ peptides to CD4 T-cells
[31]. The three main types of HLA in class II are encoded by the HLA-DR, HLA-DQ and HLA-DP loci. APCs can also present externally captured antigens on MHC class I through a process of cross-priming/cross-presentation.
More than 30,000 HLA variants among class I and II have been determined so far, and their permutations raise the number of possible combinations to astronomical numbers, making it unlikely that the individual’s resulting HLA type would be shared with an unrelated individual and defining the subset of peptide epitopes that could be presented for immune surveillance
[32].
The class III alleles encode factors involved in the inflammation process, leukocyte differentiation and the complement system
[33].
In addition to the role of HLA proteins in T-cell activation, HLA type plays an important role in driving positive and negative T-cell selection in the thymus, thereby also shaping the naive T-cell repertoire.
The importance of HLA is evident in the context of organ and bone marrow transplantation, where it is responsible for the rejection process, but many studies now also link HLA type with disease susceptibility or development and with response to therapy for many diseases.
The first link found in this context was the association between HLA-B and Hodgkin lymphoma
[34], and, since then, the MHC has been considered the genome region with the greatest number of associations with human diseases
[35] (some examples are shown in
Table 2).
Table 2. Examples of association linking HLA type and disease.
Type of HLA Alleles Association | HLA Typing Future Opportunities | Example |
With specific infectious diseases or the severity of infection | To provide insight into differences in T-cell repertoires in infectious disease and patterns of T-cell targeting | Heterozygous individuals progress less rapidly to AIDS than HLA homozygous individuals after HIV infection [31]. Kaslow et al. found that HLA-B27 and B57 were strongly associated with slow progression to AIDS [36]. |
With increased risk of or protection from various autoimmune disorders | To clarify a subject’s disease state and potentially stratify patients for treatment studies. | Association of the HLA class I region has been detected for several autoimmune diseases (AIDs); some examples are: HLA-B with type 1 diabetes (T1D) [37]; HLA-C with multiple sclerosis (MS) and Graves’ disease (GD) [37]; HLA-B27 with ankylosing spondylitis (AS) [38]; HLA-DRB1, in particular the HLA-DRB1*04 and *10 alleles [39], with rheumatoid arthritis (RA); HLA-G with Crohn’s disease (CD) [40]. |
With cancer therapy outcomes | To understand and infer the efficacy of immunotherapy in specific individuals | Higher heterozygosity in HLA has been linked to a better response to anti-cancer treatments [41]. |
An example of such complementary data can be found in De Witt III et al.; in their work, they analyzed TCRs from a cohort of 666 healthy volunteer donors to find links between TCR profiling and HLA associations to disease
[32]. Starting from the analysis of the TCRs common across the whole cohort, they studied TCR co-occurrence patterns and observed a strong influence of the HLA allele distribution, in accordance with the fact that most αβ TCRs are HLA-restricted.
Additional analyses revealed that significant TCR clusters, shared within the cohort, may represent markers of immunological memory and showed that most highly HLA-associated TCRs are related to common viral infections, such as influenza virus and Epstein–Barr virus (EBV).
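One common way to test whether a TCR co-occurs with an HLA allele across a cohort is a one-sided Fisher's exact test on a 2×2 table of donors. The sketch below is only an illustration of that idea using the Python standard library; the function name and the donor counts are hypothetical, and it is not claimed to be the statistical procedure used in the cited study.

```python
from math import comb

def fisher_enrichment_p(both, tcr_only, hla_only, neither):
    """One-sided Fisher's exact test (enrichment) on a 2x2 donor table.

    both     : donors carrying the HLA allele AND the TCR
    tcr_only : donors with the TCR but without the allele
    hla_only : donors with the allele but without the TCR
    neither  : donors with neither
    Returns P(X >= both) under the hypergeometric null of no association.
    """
    n_tcr = both + tcr_only                      # donors with the TCR
    n_hla = both + hla_only                      # donors with the allele
    total = both + tcr_only + hla_only + neither
    max_both = min(n_tcr, n_hla)
    tail = sum(
        comb(n_hla, k) * comb(total - n_hla, n_tcr - k)
        for k in range(both, max_both + 1)
    )
    return tail / comb(total, n_tcr)

# Hypothetical counts for a cohort of 666 donors (illustrative only).
p_value = fisher_enrichment_p(both=30, tcr_only=5, hla_only=120, neither=511)
print(f"one-sided p-value: {p_value:.3g}")
```

In a real repertoire study, thousands of TCR-allele pairs would be tested, so the resulting p-values would need a multiple-testing correction.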
Moreover, they further analyzed CDR3 sequence–HLA allele correlations, identifying a significant negative association between CDR3 and peptide charges, which suggests that the maintenance of charge complementarity across the TCR-MHC complex is a relevant feature of binding.
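To make the idea of charge complementarity concrete, the net charge of a CDR3 or peptide sequence can be approximated by counting basic and acidic residues. The sketch below assumes a simple rule (Lys/Arg count as +1, Asp/Glu as −1, histidine and termini ignored); the function name and the example sequences are illustrative and are not taken from the cited work.

```python
POSITIVE = set("KR")   # lysine, arginine
NEGATIVE = set("DE")   # aspartate, glutamate

def net_charge(sequence):
    """Approximate net charge of an amino acid sequence at neutral pH.

    Simplifying assumption: only K/R (+1) and D/E (-1) contribute.
    """
    seq = sequence.upper()
    positives = sum(aa in POSITIVE for aa in seq)
    negatives = sum(aa in NEGATIVE for aa in seq)
    return positives - negatives

# Illustrative sequences: a CDR3 with two arginines and one glutamate,
# and a peptide carrying a single glutamate.
cdr3_charge = net_charge("CASSRRGNTEAFF")
peptide_charge = net_charge("ELAGIGILTV")
print(cdr3_charge, peptide_charge)
```

A negative association of the kind described above would show up as positively charged CDR3s pairing preferentially with negatively charged peptides, and vice versa.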
These results demonstrate the potential of applying statistical tools to TCR repertoires and immune exposure data, as sequences from the clusters can be used to infer the driver of a TCR expansion.
Thus, TCR sequence–disease associations are complicated by their dependence on individual HLA type, which characterizes TCR-HLA interactions. Therefore, it is crucial to understand antigen discrimination by T-cells and to deepen researchers' comprehension of the interplay and associations among individual HLA type, TCR sequences and disease. In this respect, the implications for the development of novel therapeutics are clear and translate to many disease settings, including infectious diseases, autoimmune diseases and oncology.
4. TCR Repertoire via HTS: When Details Matter
High-throughput sequencing (HTS) has emerged as a suitable method for evaluating TCR diversity, allowing the characterization of immune repertoires with massively parallel sequencing at a deeper and finer level
[8][42]. This technique combines the resolution of individually decoded TCR nucleotide sequences with the ability to read millions of sequences simultaneously
[43]. Traditional strategies, such as spectratyping, Sanger sequencing and other assays, such as flow cytometry
[30], are time-consuming and insufficient for generating a deep analysis of the immune repertoire.
To perform a TCR repertoire analysis, many aspects must be taken into account, such as the kind of starting material for the library preparation, the method for sequencing
[8] and the following data analysis pipeline.
First, following nucleic acid extraction from the sample cohort of interest, genomic DNA (gDNA) and messenger RNA (mRNA) can both be used for library preparation
[44].
The amount of gDNA is proportional to the number of analyzed cells, with a 1:1 ratio between templates and cells (1 gDNA template per cell), allowing researchers to determine the relative abundance of sequences in a sample at the cost of unavoidably detecting potentially irrelevant and non-expressed sequences that must be removed through post-processing bioinformatic analysis
[45][46].
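The post-processing step mentioned above often includes discarding non-productive rearrangements. A minimal sketch of such a filter is given below; it assumes that each CDR3 nucleotide sequence starts in frame (at the first base of its first codon) and treats a sequence as non-productive if its length is not a multiple of three or if it contains an in-frame stop codon. The function names and example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def is_productive(cdr3_nt):
    """Return True if a CDR3 nucleotide sequence is in frame and stop-free.

    Assumption: the sequence begins at the first base of a codon.
    """
    seq = cdr3_nt.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False                      # empty or out of frame
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)

def keep_productive(template_counts):
    """Filter a dict of CDR3 nucleotide sequence -> template count."""
    return {seq: n for seq, n in template_counts.items() if is_productive(seq)}

# Illustrative gDNA-derived counts (one template per cell).
counts = {
    "TGTGCCAGCAGTTTCGAGCAGTACTTC": 12,   # in frame, no stop codon
    "TGTGCCAGCAGTTTCGAGCAGTACTT": 3,     # out of frame (26 nt)
    "TGTGCCAGCTGATTC": 1,                # in-frame TGA stop codon
}
print(keep_productive(counts))
```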
In contrast, mRNA is related to cell function/activation
[1], and RNA-based methods are more sensitive due to the presence of multiple copies of the transcript of interest per cell. Thus, a more comprehensive recognition of both unique receptor variants and functionally expressed TCRs can be obtained using RNA, as it allows the detection of very rare clones and reveals sequences that are effectively transcribed and thus more likely to yield functional TCRs
[7][45].
Further, gDNA as input material does not require the reverse transcription step, minimizing the possible biases introduced in cDNA synthesis
[5], while starting from already spliced mRNA converted into cDNA holds the advantage that fewer reverse primers are sufficient for C region amplification, reducing PCR biases from multiplexed J primers
[30][42], yielding a higher detection sensitivity and removing the need for adapter sequences
[47].
Additionally, RNA-based methods allow the implementation of unique molecular identifiers (UMIs), which consist of random DNA sequences added during cDNA synthesis in order to label individual cDNA molecules, correcting for amplification and sequencing errors
[48]. However, since RNA-based approaches are affected by the relative expression of TCRs in the cells and not only by the number of cells expressing the same TCR, those methods are believed to be less reliable in describing the relative abundance of clonotypes in a cell population. The advantages and disadvantages of strategies based on gDNA and mRNA are listed in
Table 3.
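The UMI-based correction described above can be sketched as follows. This is a simplified illustration, not a specific published pipeline: reads are grouped by UMI, the most frequent CDR3 within each UMI group is taken as the consensus for that molecule, and clonotypes are then counted in molecules rather than reads. It assumes exact UMI matching (no correction of errors inside the UMI itself), and the function name and read data are hypothetical.

```python
from collections import Counter, defaultdict

def umi_collapse(reads):
    """Count cDNA molecules per clonotype from (umi, cdr3) read pairs.

    Reads sharing a UMI are assumed to derive from one original molecule;
    the majority CDR3 within a UMI group is used as its consensus sequence.
    """
    by_umi = defaultdict(Counter)
    for umi, cdr3 in reads:
        by_umi[umi][cdr3] += 1

    molecules = Counter()
    for cdr3_counts in by_umi.values():
        consensus, _ = cdr3_counts.most_common(1)[0]
        molecules[consensus] += 1
    return dict(molecules)

# Illustrative reads: one UMI is over-amplified and carries a sequencing error.
reads = (
    [("AAAC", "CASSLGF")] * 5
    + [("AAAC", "CASSLGY")]        # erroneous read within the same UMI group
    + [("GGTA", "CASSLGF")] * 2
    + [("TTCG", "CASRDF")]
)
print(umi_collapse(reads))
```

Counting nine raw reads would overstate the first clonotype; counting UMIs reports two molecules for it and one for the second, and the erroneous read disappears into its group consensus.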
Table 3. DNA-based vs. RNA-based approaches, choosing the right starting material for TCR profiling.
Advantages | gDNA | mRNA |
| easier to obtain; no requirement for reverse transcription (RT); better reflect the number of analyzed cells; accurate measurement of clonality without bias caused by variable expression levels in different cells. | higher number of copies in a single cell; large information at the gene transcription level; reduced interference of non-coding signals after the splicing process [50]; overall length sequence in the CDR region is easily available; non-productive receptor transcripts are underrepresented [51]; close proximity of V and C regions after the splicing process facilitates PCR amplification [13]. |
Disadvantages | gDNA | mRNA |
| | |
The choice between gDNA and mRNA as a source for TCR repertoire sequencing depends on the quantity of nucleic acids required to start a specific workflow, on the options for library preparation and on the type of results required at the end of the process. Currently, peripheral blood is the most used starting material because sampling is easy and non-invasive, especially for cohorts of healthy subjects, even though peripheral blood lymphocytes are estimated to be only 2% of the total lymphocytes in the body
[1].
To obtain robust and comparable data, it is important to process all samples as uniformly as possible, starting from a defined amount of input material for each sample and analyzing all samples with the same parameters, such as a comparable number of reads chosen according to the depth of analysis to be achieved in the experiment
[53].
The sequencing depth can be adjusted according to the sample type and experimental goals. Deeper sequencing is appropriate when analyzing samples with large or diverse cell populations, at the expense of sample throughput.
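One way to compare samples at a comparable number of reads is to downsample each repertoire to a common depth before computing any metric. The sketch below draws reads at random without replacement; it is an illustration under that assumption, and the function name, seed parameter and counts are hypothetical.

```python
import random

def downsample(clonotype_counts, depth, seed=0):
    """Randomly draw `depth` reads without replacement from a repertoire.

    clonotype_counts : dict of clonotype -> read count
    depth            : number of reads to keep (must not exceed the total)
    seed             : seed for reproducible subsampling
    """
    pool = [clone for clone, n in clonotype_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("requested depth exceeds the number of available reads")
    rng = random.Random(seed)
    sampled = {}
    for clone in rng.sample(pool, depth):
        sampled[clone] = sampled.get(clone, 0) + 1
    return sampled

# Two illustrative samples sequenced at different depths, brought to 40 reads.
sample_a = {"CASSLGF": 50, "CASRDF": 30, "CASSPGQ": 20}
sample_b = {"CASSLGF": 25, "CASRDF": 15}
print(downsample(sample_a, 40), downsample(sample_b, 40))
```

Expanding every read into a list is fine for a small example; for real data with millions of reads, a sampler that works directly on the counts would be more memory-efficient.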
The majority of TCR repertoire profiling studies are based on the analysis of the CDR3 region; however, full-length sequencing includes additional regions, such as CDR1 and CDR2, involved in antigen receptor binding affinity and/or downstream signaling, and makes it possible to directly clone and express the selected receptors for further experiments. This aspect is crucial when the identification of therapeutic candidate TCRs is the goal of the analysis
[45].
TCR HTS methods can be divided into bulk sequencing, for the evaluation of T-cell populations, and single-cell sequencing, for the analysis of individual T-cells
[2]. The choice between these two analyses depends on the goal of the experiment and on other factors, such as sample requirements, hands-on and total workflow time, degree of polymerase chain reaction (PCR) bias, quantifiability, immune repertoire coverage, ease of data analysis and cost. Generally, for the analysis of immune repertoire diversity in health and disease, a bulk sequencing approach is used because it allows the sampling of many more sequences in a single experiment, even if information about αβ-TCR pairing is lost and undetected low-frequency TCRs could mislead diagnostic outcomes
[54].
However, a single-cell approach is preferred in experiments set to investigate the specificity of a TCR for an antigen of interest, for capturing the paired αβ-chains information and producing complete antigen receptors and/or characterizing their function.
Using mRNA, single-cell TCR sequencing makes it possible to evaluate cell transcriptional heterogeneity down to the single-nucleotide level and gene expression variability at the single-cell level, bringing the study of phenotypically different cell populations to an unprecedented resolution
[55][56]. Compared with TCR bulk sequencing, the number of cells sequenced using the single-cell approach drops to 10²–10³ instead of up to 10⁶ [54]. Isolating single cells can also be challenging, and obtaining viable cells at the end of the process requires careful and quick work, with a consequent decrease in the number of samples analyzed and an increase in the workflow-related variability of any single-cell study. If starting from cryopreserved cells, it must be tested whether cryopreservation has altered cell viability and/or phenotype. Moreover, data analysis requires specific tools and expertise regarding the most appropriate analytical approaches. Because of the above-mentioned considerations, single-cell sequencing is still a more expensive method compared to other sequencing techniques.
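The effect of the number of cells analyzed on the chance of observing a rare clone can be illustrated with a simple sampling model. Assuming cells are drawn independently and at random, the probability of capturing at least one cell of a clone with frequency f among n cells is 1 − (1 − f)^n. The sketch below applies this formula to the cell numbers quoted above; the clone frequency is a hypothetical value chosen for illustration.

```python
def detection_probability(clone_frequency, cells_sampled):
    """P(at least one cell of the clone is sampled) = 1 - (1 - f)^n.

    Assumes independent random sampling of cells from a large population.
    """
    if not 0.0 <= clone_frequency <= 1.0:
        raise ValueError("clone_frequency must be between 0 and 1")
    if cells_sampled < 0:
        raise ValueError("cells_sampled must be non-negative")
    return 1.0 - (1.0 - clone_frequency) ** cells_sampled

# Hypothetical clone present in 1 out of 10,000 T-cells.
frequency = 1e-4
for n_cells in (10**2, 10**3, 10**6):
    p = detection_probability(frequency, n_cells)
    print(f"{n_cells:>9} cells: P(detect) = {p:.3f}")
```

Under this model a clone at that frequency is usually missed at single-cell scale and almost always captured at bulk scale, which is one reason the two approaches are often combined.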
Researchers have often performed an initial bulk analysis and moved to single-cell after selecting features of interest, such as binding affinity
[45], even if recently developed commercial single-cell sequencing solutions are starting to provide full-length paired αβ-chain sequencing of many T-lymphocytes
[2]. These latest technologies are based, for example, on barcoded gel beads mixed with cells, enzymes and partitioning oil, used to generate V(D)J gene expression libraries (e.g., 10X Genomics, Pleasanton, CA, USA). A major improvement in the throughput came with emulsion-based approaches
[7] in which single cells are encapsulated in water-in-oil emulsions, where cDNA is synthesized using TCR primers and RT-PCR reagents and then sequenced
[5], maintaining native αβ-chains pairing while sequencing both chains. Quantitative transcriptomics is used to analyze TCRs and other cell markers, and since each cell is individually barcoded, amplification bias is not an issue. Using the same principle of the methods reported above, microfluidic platforms reliant on individual cell compartmentation in microwells or droplets have been applied in single-cell isolation
[57].
Droplet-based instruments require a dedicated hardware platform
[58], and encapsulation efficiency is variable depending on which method is used, as thousands of cells can be encapsulated, but rare clones can still be missed, while costs remain elevated and hinder a broad application of those approaches
[59]. To improve single-cell TCR sequencing, cell sorting by FACS can be used to enrich populations of interest on the basis of surface markers, which helps in analyzing rare subsets of T-cells
[5].
Library construction and data analysis of bulk and single-cell sequencing approaches share essentially the same principles and workflows
[60].
Multiplex PCR represents the most used approach to prepare sequencing libraries for TCR repertoire analysis, and consists of two rounds of PCR using multiple primers, specifically a set of forward primers for V genes and a set of reverse primers for either J or C genes, according to the template used
[5].
At first, receptor locus amplification takes place with the addition of known sequences: all the possible recombination events of the receptor sequences are captured using a pool of V and J primers in the first PCR, while the additional sequences are fundamental for the incorporation of sequencing adaptors and indexes into each amplicon during the second PCR. Notably, multiple rounds of PCR before sequencing could introduce biases, because the priming sequences at the 3′ and 5′ ends of the first PCR overlap significantly between different sites, which requires a pool of slightly different primers to amplify different TCR sequences. Importantly, receptor sequences that share similarity with the primers used could be recognized more effectively, skewing clone frequencies, with a negative impact on research outcomes
[45]. However, it is possible to quantify templates before and after multiplex PCR using synthetic TCR molecules targeted by the multiplexed primers pool, with primer concentration optimization and correction of potential biases
[61], and many assays commercially available for library preparation already contain validated internal controls that correct for these biases.
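The principle of correcting multiplex PCR bias with synthetic templates can be sketched as follows. The example assumes that synthetic templates for each V gene were added in equal amounts, so any difference in their read counts reflects primer efficiency; sample counts are then divided by the resulting per-gene factor. This is an illustration of the general idea only, not the procedure of the cited work or of any commercial kit, and the function names and read counts are hypothetical.

```python
def amplification_factors(spike_in_reads):
    """Relative amplification efficiency per V gene.

    spike_in_reads : dict of V gene -> reads from synthetic templates that
                     were added in equal amounts (assumption of this sketch).
    A factor above 1 means the gene is over-amplified relative to the mean.
    Every gene must have at least one spike-in read.
    """
    mean_reads = sum(spike_in_reads.values()) / len(spike_in_reads)
    return {gene: n / mean_reads for gene, n in spike_in_reads.items()}

def correct_counts(sample_reads, factors):
    """Divide observed sample reads by the per-gene amplification factor."""
    return {gene: n / factors[gene] for gene, n in sample_reads.items()}

# Hypothetical read counts.
spike_in = {"TRBV5-1": 200, "TRBV7-2": 100, "TRBV20-1": 300}
sample = {"TRBV5-1": 400, "TRBV7-2": 100, "TRBV20-1": 600}

factors = amplification_factors(spike_in)
print(correct_counts(sample, factors))
```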
Rapid amplification of cDNA ends (RACE) is a library preparation approach used only for RNA templates; it relies on a template-switching mechanism, an intrinsic property of certain reverse transcriptases (RTs)
[62]. This method avoids amplification bias between V regions introduced by multiplex PCR
[47], and it is applicable to both 5′ and 3′ ends. The 3′ RACE approach takes advantage of the mRNA poly(A) tail at the 3′ end by using it as a generic priming site for the PCR amplification step following reverse transcription, targeting the region of interest between a known exon and the 3′ end
[63].
The 5′ end of mRNA does not present any generic priming site; therefore, an adapter sequence must be incorporated at the end of the first-strand cDNA that corresponds to the mRNA 5′ end. When the RT reaches this end, it adds non-templated nucleotides; a template-switching oligo complementary to these nucleotides then hybridizes to them, enabling the RT to switch templates and incorporate the adapter sequence, which serves the next two PCRs
[45]. Consequently, targeting the 5′ adaptor sequence and the C region with just one pair of primers is enough to amplify all TCR rearrangements
[5], thus reducing PCR errors and ensuring that the TCR repertoire profile reflects the original sample rather than the primer design. Additionally, a high on-target rate is guaranteed by the second semi-nested PCR, with a decrease in sequencing costs
[45].
Some library preparation protocols still employ a ligation reaction to anchor adapters and barcodes to the amplicons, even if the suboptimal ligation efficiency of the adapters could represent a limiting factor of this choice
[62][63], impacting the accuracy of the quantification, especially for low-frequency TCRs, which may explain why 5′ RACE is less reproducible than multiplex PCR
[8][64].
Once library preparation is complete, the samples are sequenced, in most cases on Illumina platforms. The TCR reads obtained at the end of the workflow are nucleotide sequences that first have to be aligned to V(D)J reference sequences and then grouped by shared CDR3 in order to evaluate clonotypes
[65].
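The grouping step described above can be sketched in a few lines. The example assumes that alignment has already been done by an external tool and that each read is available as a record with its V gene, J gene and CDR3 amino acid sequence; the field names, function name and records are hypothetical and do not correspond to the output format of any specific aligner.

```python
from collections import Counter

def call_clonotypes(aligned_reads):
    """Group aligned reads by CDR3 and report counts and frequencies.

    aligned_reads : iterable of dicts with a 'cdr3' key (hypothetical format)
    Returns a list of (cdr3, read_count, frequency), most abundant first.
    """
    counts = Counter(read["cdr3"] for read in aligned_reads)
    total = sum(counts.values())
    return [(cdr3, n, n / total) for cdr3, n in counts.most_common()]

# Illustrative aligned reads.
aligned = [
    {"v": "TRBV5-1", "j": "TRBJ2-7", "cdr3": "CASSLGQYEQYF"},
    {"v": "TRBV5-1", "j": "TRBJ2-7", "cdr3": "CASSLGQYEQYF"},
    {"v": "TRBV5-1", "j": "TRBJ2-7", "cdr3": "CASSLGQYEQYF"},
    {"v": "TRBV7-2", "j": "TRBJ1-1", "cdr3": "CASSPDRNTEAFF"},
]
for cdr3, n_reads, freq in call_clonotypes(aligned):
    print(cdr3, n_reads, f"{freq:.2f}")
```

Many analyses define a clonotype more strictly, for example by the combination of V gene, CDR3 and J gene; that variant only requires changing the grouping key.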
The use of algorithms, such as IgBLAST
[66], IMGT/HighV-QUEST
[67], MiXCR
[68], immuneSIM
[69] and RTCR
[70], allows the evaluation of similarities and differences among TCR sequences as compared to publicly available TCR databases.
TCR repertoire analysis is becoming more and more accessible to the scientific community and the pharma industry as a way to unravel TCR specificities, clonality, diversity and the intensity of response associated with treatments and disease states. The range of TCR sequencing applications on the market is broad and complex, with many companies proposing specific protocols according to the specificities previously defined. To help orient researchers in the choice of the best-suited approach for TCR sequencing in the present scenario, researchers provide, to the best of their knowledge, a comprehensive and concise description of the kits and services currently accessible on the market.