Cell free circulating DNA (cfDNA) refers to DNA fragments present outside of cells in body fluids such as plasma, urine, and cerebrospinal fluid (CSF). cfDNA was first identified in 1948 in the plasma of healthy individuals. Subsequent studies showed that the quantity of cfDNA in the blood increases under pathological conditions such as autoimmune diseases and cancers.
In 1989, Philippe Anker and Maurice Stroun, from the University of Geneva, demonstrated that cfDNA from cancer patients carries the characteristics of the DNA of tumor cells [1]. The search for genomic anomalies specific to a cancer type in circulating DNA, such as NRAS and KRAS mutations or HER-2 amplifications [2][3][4], then started to expand, and the term circulating tumor DNA (ctDNA) appeared for the first time.
Technologies for detecting and analyzing ctDNA range from point mutation analysis using PCR-based methods to whole-genome analysis using NGS-based methods. The choice of method depends on the application and on the sensitivity required.
Targeted approaches can detect already known recurrent mutations with high sensitivity and specificity, quickly and cost-effectively. These hotspot mutations occur frequently in a specific type of tumor and can, most of the time, be targeted by a therapy. Targeted approaches are therefore very useful for following minimal residual disease, in order to detect relapse early or to track resistance mutations. Conversely, untargeted approaches are less sensitive but are useful for discovering new DNA mutations and genome-wide alterations such as copy number variations (CNV, also called copy number alterations, CNA).
Several parameters of the sequencing workflow can affect the sensitivity of detection. For example, since 10 ng of DNA contains around 3000 haploid genome copies, approximately 60 ng of cfDNA is required for a sensitivity of 0.01% (one rare event in 10,000 molecules), which is often challenging to obtain, all the more so because more than one observation is needed to call a true variant. Amplification steps cannot compensate for a low cfDNA input, because the polymerase introduces errors that increase the risk of false positive variants. Sensitivity can also be improved by monitoring multiple alterations simultaneously, which increases the chances of detecting ctDNA.
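The relationship between input mass and achievable sensitivity can be sketched with a short calculation. The example below is only an illustration: the mass of 3.3 pg per haploid genome is an assumed approximation (it reproduces the figure of around 3000 copies in 10 ng), the function names are hypothetical, and the Poisson model ignores extraction and library conversion losses.

```python
import math

# Assumption: one haploid human genome weighs roughly 3.3 pg,
# which gives about 3000 copies in 10 ng of DNA.
HAPLOID_GENOME_PG = 3.3


def genome_copies(input_ng: float) -> float:
    """Approximate number of haploid genome copies in a DNA mass given in ng."""
    return input_ng * 1000.0 / HAPLOID_GENOME_PG


def expected_mutant_copies(input_ng: float, allele_fraction: float) -> float:
    """Expected number of mutant molecules for a given variant allele fraction."""
    return genome_copies(input_ng) * allele_fraction


def prob_at_least(k: int, mean: float) -> float:
    """Poisson probability of sampling at least k mutant molecules."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    mean = expected_mutant_copies(60, 1e-4)  # 60 ng of cfDNA, 0.01% allele fraction
    print(f"expected mutant copies: {mean:.2f}")
    print(f"P(at least 1 copy):  {prob_at_least(1, mean):.2f}")
    print(f"P(at least 2 copies): {prob_at_least(2, mean):.2f}")
```

Under these assumptions, 60 ng holds fewer than two expected mutant copies at an allele fraction of 0.01%, which illustrates why requiring more than one observation makes such a sensitivity difficult to reach in practice.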
PCR-based methods, such as the derivatives of quantitative PCR (qPCR) and digital PCR, are fast, cost-effective, and relatively simple to carry out and analyze. They allow detection of a single mutation or a few mutations at low variant allele frequencies, down to 0.1% or less, with high specificity.
At first, the qPCR method, by measuring the fluorescence emitted by a labeled probe during amplification of a targeted gene, was used to estimate the concentration of cfDNA in plasma of patients with cancer [5]. Later, qPCR assays were developed to detect mutations in tumoral cfDNA and the sensitivity of detection was improved by promoting the specific amplification of the mutant allele.
ARMS-PCR (amplification-refractory mutation system) is a simple method for detecting point mutations or small deletions, in which DNA is amplified by allele-specific primers. Because Taq polymerase lacks 3′ to 5′ exonuclease proofreading activity, a mismatch at the 3′ end of the primer cannot be corrected, which dramatically reduces extension and hence amplification. Although the method has been improved, the false positive rate remains high, with a limit of detection of around 0.5 to 1% in plasma samples [6][7]. This limit can go down to 0.015% with ARMS-plus, which includes a “wild-type blocker” and in which amplicons are shortened to 50–80 bp, limiting non-specific amplification and thus increasing detection specificity [8].
PNA-LNA (peptide nucleic acid-locked nucleic acid) clamp PCR uses a blocking synthetic nucleic acid analog complementary to the wild-type sequence to favor amplification of the mutant allele. This method is used in particular in non-small cell lung cancer (NSCLC) to detect EGFR mutations, especially T790M mutations in tumors resistant to EGFR-TKIs (tyrosine kinase inhibitors), where cfDNA could be an alternative to re-biopsy. By using smaller PCR products and increasing the number of cycles, Watanabe and colleagues reached a detection limit below 0.1% [9].
COLD-PCR (co-amplification at lower denaturation temperature-PCR) is an amplification method that selectively enriches low-abundance variant alleles from a mixture of wild-type and variant-containing DNA, irrespective of mutation type and position, by exploiting the critical denaturation temperature. The use of a lower denaturation temperature results in selective denaturation of wild-type/mutant heteroduplexes, which are then amplified. COLD-PCR has been used to improve the reliability of a number of assays that traditionally rely on conventional PCR, such as Sanger sequencing, pyrosequencing, or qPCR, greatly increasing their sensitivity. This method can thus detect mutant allele fractions down to 0.1% [10][11].
Digital PCR or droplet digital PCR (dPCR, ddPCR) allows clonal amplification of single DNA molecules, based on water-oil emulsion droplet technology. The PCR solution is partitioned into tens of thousands of nanoliter-sized droplet reactors in which nucleic acid molecules are randomly distributed. A specific fluorescent amplification of the template (wild-type or mutated DNA) occurs in each individual droplet, and positive droplets are subsequently counted to give an absolute target quantification using Poisson statistics. The limit of detection of hotspot mutations with dPCR is 0.1% to 0.01% [12][13][14].
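The Poisson correction mentioned above can be illustrated as follows. Because a droplet may contain more than one template molecule, the mean number of copies per droplet is estimated from the fraction of positive droplets as λ = −ln(1 − p). This is a minimal sketch: the function names are hypothetical, and the default droplet volume of 0.85 nL is an assumed value that depends on the instrument.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Mean template copies per droplet from the positive-droplet fraction.

    Uses the Poisson relation P(droplet is empty) = exp(-lambda).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (saturated runs cannot be quantified)")
    return -math.log(1.0 - positive / total)


def copies_per_microliter(positive: int, total: int, droplet_volume_nl: float = 0.85) -> float:
    """Absolute concentration in the reaction; the droplet volume is an assumption."""
    return copies_per_droplet(positive, total) / (droplet_volume_nl * 1e-3)


def mutant_allele_fraction(mutant_positive: int, wildtype_positive: int, total: int) -> float:
    """Mutant fraction from the droplets positive in each fluorescence channel."""
    lam_mut = copies_per_droplet(mutant_positive, total)
    lam_wt = copies_per_droplet(wildtype_positive, total)
    return lam_mut / (lam_mut + lam_wt)


if __name__ == "__main__":
    # Hypothetical run: 20,000 droplets, 20 mutant-positive, 15,000 wild-type-positive.
    print(f"{mutant_allele_fraction(20, 15000, 20000):.5f}")
```

The correction matters most for the abundant wild-type target: with 75% of droplets positive, the naive ratio of positive droplets would overestimate the mutant fraction, since many wild-type droplets contain several copies.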
As an example in lymphoma, this technique has a potential clinical use in diffuse large B cell lymphoma (DLBCL), as co-occurring mutations in MYD88 and CD79B can predict response to ibrutinib treatment, thus providing a predictive molecular tool for patient and therapy selection [15]. Likewise, in primary central nervous system lymphoma (PCNSL), the MYD88 L265P mutation was identified by ddPCR in cerebrospinal fluid or vitreous fluid with a higher sensitivity than qPCR [16][17]. Since this mutation is found in up to 85% of PCNSL cases and not in non-hematological brain tumors, this ddPCR assay may be a promising technique for minimally invasive confirmation of PCNSL diagnosis.
BEAMing (beads, emulsion, amplification, magnetics) is a highly sensitive digital PCR method that combines emulsion PCR and flow cytometry to identify and quantify specific somatic mutations present in DNA [18]. Diehl and coworkers used a BEAMing approach to detect mutations in cfDNA from patients with colorectal cancer, showing that ctDNA dynamics reflect tumor response and progression, and that ctDNA detection after surgery is a marker of residual disease [19]. This method, mainly used so far in solid tumors such as colorectal [20], breast [21], and lung cancers [22], is highly sensitive, detecting variant allele fractions as low as 0.01%.
Although ddPCR allows for quantitative assessment of mutant frequencies in cfDNA, it is limited by the number of fluorescent probes that can be used in one assay (up to five) [23][24].
Copy number variations have also been investigated in cfDNA using ddPCR. Even if the number of targets is limited, it can be a useful tool for detecting, simply and rapidly, some gains or losses that are associated with poor prognosis at diagnosis or during follow-up [25][26].
ddPCR can also be suitable for detecting chromosomal rearrangements, especially in hematological malignancies. Among others, assays have been developed for translocation t(11;14) deregulating the CCND1 gene and translocation t(14;18) deregulating the BCL2 gene, which are frequently observed in mantle cell lymphoma (MCL) and follicular lymphoma (FL), respectively [27][28]. The sensitivity of these techniques can go down to 0.01%.
The major limitation of the previous PCR-based approaches is their very limited multiplexing ability. Methods that couple PCR to an alternative readout, such as surface-enhanced Raman spectroscopy (SERS) and the mass spectrometry-based UltraSEEK, are adaptations of conventional PCR with a unique advantage in multiplexing: they detect ctDNA mutations at low frequency with a low input amount of cfDNA and a fast turnaround time.
SERS is a surface-sensitive technique that enhances Raman scattering by molecules adsorbed on rough metal surfaces or by nanostructures such as plasmonic-magnetic silica nanotubes [29]. Detection of target-specific DNA relies on labeled nanotags (Raman reporters) and on measuring the shift in the spectrum of the Raman reporter, which can provide information about low-frequency transitions in molecules. Mutation status is then read from the SERS spectrum, in which unique spectral peaks indicate the presence of the targeted mutations. Multiplex PCR/SERS assays identifying three hotspot mutations have been developed in melanoma and colorectal cancer, with a limit of detection as low as 0.1% [30][31].
The UltraSEEK chemistry can interrogate multiple informative variants within a single reaction. In this method, the mutant allele is specifically targeted by a primer extension step that omits the wild-type allele. In NSCLC, the limit of detection of the UltraSEEK Lung Panel, which consists of 73 variants, was 0.125–1% with a low input of tumoral cfDNA fragments, measured beforehand with the LiquidIQ Panel [32]. Of note, this study showed the importance of preanalytical cfDNA quality control and input amount for the accuracy of liquid biopsy testing.
Overall, these PCR-based assays are very effective tools for detecting mutations at a relatively low cost, which makes them feasible in routine clinical practice. Their main limitation is their limited multiplexing ability, which restricts the number of possible targets and can lead to greater consumption of material. Furthermore, the alterations to be detected, such as hotspot mutations, must be known beforehand, which makes these assays better suited to monitoring minimal residual disease than to diagnosis.
Targeted deep sequencing techniques are still limited to a certain number of regions but can cover entire genes or entire coding regions of genes. Thus, they are suitable for genes without hotspot mutations, which is often the case for loss of function mutations in tumor suppressor genes.
Targeted enrichment in library construction can be achieved by direct amplification (amplicon or multiplex PCR) or hybridization capture (hybrid capture) of the DNA regions of interest. Techniques using multiplex PCR-based methods are more dependent on the length of the fragments and may require several simultaneous reactions for target enrichment to cover a large region of a gene, consuming more DNA. Hybrid capture methods employ custom RNA probes complementary to targeted regions and are able to detect both single nucleotide variants (SNV) and structural variants [33]. In this method of enrichment, the fragmentation of cfDNA can lead to a heterogeneous coverage across targeted exons with a lower fragment depth in the edge regions of exons, which must be taken into consideration when designing the panels for ctDNA sequencing [34].
Although they have high sensitivity and specificity, NGS platforms show a random error rate of between 0.1 and 1.5% per base call; library preparation protocols have therefore been upgraded to improve the detection of rare variants [35][36]. In targeted DNA sequencing, using few DNA molecules combined with ultra-deep sequencing increases the risk of reading the same molecule several times, and polymerase errors can be introduced at any step of the NGS process, which makes it impossible to call rare variants confidently. One of the major recent technological advances is the use of molecular barcodes, which are random sequences introduced before any amplification step. They allow original DNA molecules to be counted instead of PCR duplicates, thereby enabling digital sequencing and resulting in unbiased and accurate mutation profiles with increased sensitivity [37][38][39][40].
Tam-Seq is an amplicon method that uses a target enrichment array with barcoded primers to prepare the amplicon library for NGS. First, an initial targeted preamplification step is carried out, followed by selective amplification of the regions of interest in single-plex reactions. Then, sequencing adaptors and sample-specific barcodes are attached to the amplicons in a further PCR. The original method detected mutations in circulating DNA with high sensitivity and specificity (>97%) at allele frequencies as low as 2% [41]. The technique has recently been improved (enhanced Tam-Seq, eTam-Seq) with a primer design strategy that allows amplification of highly fragmented DNA, a workflow that reduces the background error rate, and a more efficient calling algorithm with better detection of SNV, indels (insertions/deletions), and CNV [42]. This assay, using an optimal amount of DNA, detected 94% of mutations at 0.25–0.33% allele fraction (AF), with a limit of detection down to 0.02% AF and high per-base specificity (99.9997%). In this study, comparison of eTam-Seq with dPCR showed good concordance between the two techniques, demonstrating the quantitative accuracy of eTam-Seq for reliable detection of mutations at low allele frequency [42].
Safe-SeqS (Safe-Sequencing System) is an amplicon method originally described by the group of Bert Vogelstein [37]. It was the first approach to use molecular barcodes in DNA sequencing to increase the sensitivity of massively parallel sequencing. This method corrects amplification and sequencing errors and can quantify rare mutations with a sensitivity of 0.05% allele fraction. Safe-SeqS showed high performance in detecting mutations in cfDNA from patients with solid tumors, for molecular profiling as well as real-time monitoring of minimal residual disease [43]. A recent study of three independent cohorts of nonmetastatic colorectal cancer showed a median mutant allele frequency of 0.046%, with a minimum of 0.01% [44].
Duplex sequencing is an improvement of the Safe-SeqS technique [45][46]. In this method, a semi-degenerate, double-stranded, uniquely barcoded adapter is ligated to a target double-stranded DNA. Thus, in addition to removing PCR and sequencing errors, this technique can identify artifacts due to sample alterations [47], because it examines both strands individually and damage to the two strands is usually not identical (error correction by double-stranded consensus sequence). The theoretical sensitivity of this approach for discovering mutants is one molecule among 10^7, which is far more accurate than conventional next-generation sequencing methods [45][46].
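The double-stranded consensus principle can be sketched in a few lines. This is an illustrative simplification rather than the published duplex sequencing pipeline: the function names and the thresholds (`min_reads`, `min_agreement`) are assumptions, and the bases from the bottom strand are assumed to be already expressed in the same orientation as the top strand.

```python
from collections import Counter
from typing import Optional, Sequence


def single_strand_consensus(
    bases: Sequence[str], min_reads: int = 3, min_agreement: float = 0.9
) -> Optional[str]:
    """Consensus base of one strand family, or None if support is insufficient."""
    if len(bases) < min_reads:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_agreement else None


def duplex_consensus(
    top_bases: Sequence[str],
    bottom_bases: Sequence[str],
    min_reads: int = 3,
    min_agreement: float = 0.9,
) -> Optional[str]:
    """Call a base only when both strand consensuses exist and agree."""
    top = single_strand_consensus(top_bases, min_reads, min_agreement)
    bottom = single_strand_consensus(bottom_bases, min_reads, min_agreement)
    if top is None or bottom is None or top != bottom:
        return None
    return top


if __name__ == "__main__":
    print(duplex_consensus("TTTT", "TTT"))  # both strands support T
    print(duplex_consensus("TTTT", "CCC"))  # strands disagree: likely damage, no call
```

A change seen on only one strand, as expected for DNA damage or an early PCR error, is rejected because the complementary strand does not confirm it.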
Several studies, in various types of cancers, applied this method on plasma cfDNA. In combination with target enrichment using hybrid capture, this approach allowed detection of tumoral fraction at 0.1% and below with high sensitivity and specificity, providing a powerful tool for diagnosis as well as longitudinal monitoring of disease [48][49][50].
In this technique, molecular barcoding is also used to facilitate the discrimination between true mutations and false positive variants. This combination of barcodes allows keeping track of each fragment as they are sequenced around 30,000 times [38]. The analytical sensitivity was 100% and 89% for detecting mutations present at 0.2% and 0.1%, respectively, using minimum thresholds of 0.05% in hot-spot positions and 0.1% at all other locations, resulting in a sensitivity of 97.4% overall, and without detection of false positives (less than one error in three million bases sequenced).
This approach uses only one gene-specific primer (GSP) to amplify each genomic region, which makes it less dependent on DNA fragment size than PCR using two primers and provides uniform coverage. As for capture, the first step is a fragmentation step in which the buffer used inhibits fragmentation of long DNA fragments such as contaminating [...] These adapters are used for amplification of the targeted region (together with the GSP) and contain the degenerate molecular barcodes (UMI, Unique Molecular Index). Moreover, because this UMI is 12 bases long, it allows a large number of combinations and very little risk of redundancy [39].
This deep sequencing technique, which uses molecular barcodes to improve the accuracy of variant detection, has been used at diagnosis to identify actionable genetic alterations for which targeted therapies are available, or hotspot mutations to be tracked with ddPCR during follow-up, with detection of variant allele frequencies down to 1–5% [51][52]. Further investigations are needed to establish the real limit of detection of this technology, which may be below 1%, as with other techniques using molecular barcoding.
This approach also allowed detection of CNV. In PCR-based library construction, amplification biases downstream read counts because the amplification factor depends on many parameters, such as library size, GC content, region length, or competition between primers overlapping the same locus. The use of UMI via the mCNA tool therefore allows targeted DNA molecules to be counted directly, before any amplification, and CNV to be detected in a robust and sensitive way [53].
In CAPP-Seq (cancer personalized profiling by deep sequencing), the first important step is to query cancer databases to identify known recurrent mutations for a particular cancer type. Then, biotinylated oligonucleotide probes, named “Selector”, are designed to target large segments of the regions concerned. Sensitivity is also improved by the ability to detect various types of alterations simultaneously: single nucleotide variants, rearrangements, insertions/deletions, and copy number alterations. The technique was originally described to detect and monitor lung cancer but was successfully adapted to a broad range of cancers, including different types of solid tumors as well as hematological malignancies such as DLBCL, FL, and Hodgkin lymphoma (HL) [54][55][56][57][58].
With this method, ctDNA was detected in the blood of NSCLC patients with 96% specificity for mutant allele fractions down to 0.02%. An enhanced version, iDES-enhanced CAPP-Seq (integrated digital error suppression), combines CAPP-Seq with duplex barcoding sequencing technology and with a computational algorithm that removes stereotypical errors associated with the CAPP-Seq hybridization step. This improved version of CAPP-Seq has shown high sensitivity in the detection of EGFR mutations in cfDNA of NSCLC patients, with variant allele frequencies as low as 0.004% and >99.99% specificity. Moreover, using duplex sequencing and covering a large number of mutations (≥200), the authors outperformed iDES and managed to detect ctDNA down to 0.00025%, with an input of only 32 ng of cfDNA [59].
This test was specifically developed for MRD in hematological malignancies. In this method, ultra-deep sequencing of genomic DNA, with a set of locus-specific multiplex PCR covering all possible rearranged IgH, IgK, and IgL receptor gene sequences, first identifies the tumor-specific clonotype. This clonotype can then be tracked as a specific fingerprint to quantify ctDNA in lymphoma disease monitoring, with a sensitivity of approximately 10^-6 [60][61]. This technique has some technical limitations, including the need for a tissue biopsy to identify the clonotype and difficulties in identifying clonotype sequences in some lymphoma types, such as DLBCL of the germinal center type and FL, because of somatic hypermutation (SHM). Nevertheless, this method has shown high performance in ctDNA surveillance after complete remission, identifying the risk of recurrence before any clinical evidence of disease in most patients (with a median of 3.5 months).
This approach was also used for MRD monitoring in DLBCL patients after CAR-T cell therapy, showing correlation with clinical and radiologic outcomes for all the patients tested [62].
As mentioned previously, untargeted approaches, namely whole exome and whole genome sequencing (WES, WGS), are less sensitive than targeted approaches. The sensitivity of these techniques on cfDNA is estimated around 5–10%, as compared to less than 0.1% for a targeted sequencing approach [63], making it difficult to detect rare events, especially in situations of early detection or minimal residual disease. However, these approaches may be necessary for the discovery of new alterations in the context of initial profiling at diagnosis, to provide information for the use of more sensitive targeted techniques during disease monitoring. Even if they are not suitable to detect subclonal events, they may be useful, considering intra tumoral heterogeneity, to highlight new drug targets or to track drug resistance clones [64].
WES is, most of the time, limited to coding regions and splicing sites of genes. The technical feasibility of WES on cfDNA has been demonstrated in various solid tumors and some hematological malignancies [64]. The low coverage and sensitivity of WES compared to targeted NGS technologies do not allow the detection of rare variants, but WES of cfDNA is suitable for mutational analysis of patients with advanced tumors and increased ctDNA fractions (>5% mutant allele fraction). In metastatic patients, WES could serve as a surrogate for tumor genome analysis, considering the difficulty of performing multiple biopsies and the high ctDNA allele frequencies [65].
Additionally, given intratumoral heterogeneity, analyses comparing mutational profiles between tumor and cfDNA mostly identified more mutations in cfDNA, with a high prevalence of targetable genes. Beyond SNV detection, WES of cfDNA also allowed analysis of mutational signatures, copy number variations, fusion genes, rearrangements, predicted neoantigens, and tumor mutational burden [64].
In contrast to WES, WGS technology is more suitable for detecting ctDNA through structural and non-coding variations such as genome-wide copy number aberrations, methylation profiles, and fragmentation patterns.
To overcome the cost and analysis time limitations of WGS, Heitzer and colleagues developed a shallow genome-wide sequencing approach called Plasma-Seq [66]. This technique does not have sufficient sequencing resolution to identify SNV but is able to detect CNV in cfDNA at a depth of 0.1x, with a specificity >80% when the ctDNA fraction is ≥10%. Recently, this shallow WGS approach has been successfully used in cfDNA of DLBCL and HL patients to identify copy number patterns that can differentiate the two diseases at diagnosis [67]. These copy number aberrations were also correlated with clinical parameters, and longitudinal analyses showed correlation with disease status. [...] than in tumor, as for mutation detection [54][68][67].
Aneuploidy has also been explored with WGS-derived techniques such as Fast-SeqS (Fast Aneuploidy Screening Test-Sequencing System) and WALDO (Within Sample Aneuploidy Detection), which use a single specific primer pair to amplify retrotransposon regions dispersed throughout the genome (long interspersed nuclear elements, LINEs) [69][70]. In simulations with synthetic DNA, the bioinformatic tool WALDO showed high performance in detecting individual chromosome arm gains or losses with a ctDNA fraction >5%, and down to a tumoral fraction of 1% with a sensitivity of 78%. However, due to their mechanism of detection, these techniques are limited to cancers presenting aneuploidy.
In order to detect genomic rearrangements, Leary et al. developed a technique called PARE (personalized analysis of rearranged ends), which uses WGS mate-paired analysis of the tumoral DNA to identify patient-specific genomic rearrangements. Analyses in breast and colorectal cancers suggest that ctDNA concentrations at levels >0.75% could be detected in the cfDNA of patients with a sensitivity >90% and a specificity >99%, and that even a single copy of a rearrangement from ctDNA can be detected without false positives [71]. In a recent study, PARE was employed to detect rearrangements in gastric tumors, which were used to design a quantitative PCR assay targeting rearranged loci for quantitative monitoring in cfDNA [72].
WGS, combined with artificial intelligence, can also identify genome-wide fragmentation patterns in cfDNA. Several studies in different cancer types have shown that these patterns can be used to detect ctDNA in body fluids and with very low plasma ctDNA fraction [73][74]. Indeed, the cfDNA fragmentation landscape represents a nucleosome footprint reflecting the cell and tissue of origin, potentially enabling non-invasive diagnosis of cancer type [73]. This nucleosome footprinting firstly identified by WGS represents nucleosome depletion at transcription start sites of highly expressed genes and the capture of this chromatin accessibility profile was used by CAPP-Seq technology to define gene expression differences and thus determine the cell-of-origin in DLBCL subjects from cfDNA [75].
Among epigenetic alterations, aberrant DNA methylation events can also represent an ideal biomarker for detection and classification of early stage cancer, as they occur early in cancer development, sometimes before the acquisition of SNVs. Multiple liquid biopsy studies have been performed using DNA methylation markers in various cancer types [76]. These strategies are either very targeted, as methylation events of interest occur at known, stereotyped positions [77], or broader, in order to identify methylation patterns, which have been shown to enable accurate determination of cell-of-origin from cfDNA and non-invasive cancer classification. In lymphoma, aberrant promoter methylation patterns detected in cfDNA have been shown to be an independent and significant poor prognostic factor for 5-year overall survival in DLBCL, outperforming existing clinical risk parameters [78][79].
Moreover, as healthy cells also contribute to epigenetic changes, their signal may need to be distinguished from that of cancer cells [76]. Thus, it could be of major interest to combine epigenetic analysis of the entire cfDNA pool with mutational analysis of ctDNA molecules.
While common bioinformatics strategies allow variant identification down to 2–5% allele frequency, in most cases ctDNA accounts for a small fraction of total cfDNA, since most cfDNA is derived from non-cancer cells, especially blood cells. The ctDNA fraction can be lower than 0.1%, so that somatic mutations are detected at the same level as the sequencing noise. In silico strategies are therefore required to distinguish true positive variant calls from sequencing noise.
It has been reported from healthy controls that, below an allele fraction of 0.02%, more than 50% of sequenced genomic positions had sequencing artifacts [59]. These errors are mainly due to library preparation, the error rates of NGS technologies, and the physical characteristics of cfDNA fragments.
In addition, there are many tools and therefore many bioinformatics parameters that need to be optimized when analyzing cfDNA samples. While major progress has been made in the harmonization of tumor analyses with the GATK4 Best Practices Workflows [80], there is not yet an international consensus for bioinformatic cfDNA analysis, and research in this area remains very active.
The quality of cfDNA analysis is particularly affected by adapter contamination. cfDNA fragments can be shorter than the read length, in which case the number of sequencing cycles exceeds the fragment length and the adapters are sequenced. Consequently, these reads may be unmappable to the reference genome or may have a lower alignment score. These alignment scores are taken into account by a large number of bioinformatic tools and can ultimately affect the results of variant calling algorithms.
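To make the read-through problem concrete, the sketch below removes an adapter from the 3′ end of a read using exact matching only. It is a deliberately naive illustration and not the algorithm of any published trimming tool, which typically also tolerate mismatches and use base qualities. The function name and the `min_overlap` parameter are hypothetical, and the adapter sequence is only an example value.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read (exact matching only, illustrative sketch).

    Handles two cases: the full adapter occurs inside the read, or only the
    beginning of the adapter was sequenced at the very end of the read.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")

    # Case 1: the whole adapter is present; keep what precedes it.
    index = read.find(adapter)
    if index != -1:
        return read[:index]

    # Case 2: a prefix of the adapter ends the read; try the longest overlap first.
    longest = min(len(read), len(adapter) - 1)
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]

    return read  # no adapter found


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example adapter sequence (assumption)
    print(trim_3prime_adapter("ACGTACGT" + adapter + "TT", adapter))  # full adapter
    print(trim_3prime_adapter("ACGTACGTAGATC", adapter))              # partial adapter
```

The `min_overlap` threshold reflects a trade-off: a very short required overlap removes genuine bases that match the adapter start by chance, whereas a long one leaves short adapter remnants in place.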
Many software tools have been developed to find and trim adapters, such as Cutadapt [81], TagCleaner [82], Trim Galore [83], or Trimmomatic [84]. In general, these tools also integrate the trimming of low-quality nucleotides and the extraction of molecular barcodes.
The amplification of libraries by PCR introduces many biases when counting mutated reads, because the number of aligned reads is no longer directly proportional to the number of initial unique targeted DNA fragments. The amplification factor of each region is unknown and depends on many parameters such as library size, GC content, or fragment length. Recent advances in library preparation allow the addition of Unique Molecular Identifiers (UMI) to each read. UMI are especially useful for correcting library amplification biases, because they make each original DNA molecule in a population of reads distinct.
There are two main bioinformatic approaches to use UMI for cfDNA analysis.
The first one consists in grouping PCR duplicates prior to any downstream analysis by merging sequences harboring the same UMI tag. The advantage of this approach is that it allows the use of classic bioinformatic pipelines after deduplicating the reads. It erases amplification biases due to cfDNA characteristics. However, it no longer provides access to essential information such as the amplification factor of each UMI or the discordant mutation calls of reads having the same UMI.
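This first strategy can be sketched as follows. The example groups reads sharing the same position and UMI into a family and replaces each family with one consensus read obtained by majority vote. It is a minimal sketch under simplifying assumptions: UMIs must match exactly (no tolerance for errors within the UMI itself), reads of a family start at the same position, and the `Read` structure and function name are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    chrom: str
    start: int
    umi: str
    sequence: str


def collapse_by_umi(reads: Iterable[Read]) -> List[Read]:
    """Merge PCR duplicates: one majority-vote consensus read per UMI family."""
    families: Dict[Tuple[str, int, str], List[str]] = defaultdict(list)
    for read in reads:
        families[(read.chrom, read.start, read.umi)].append(read.sequence)

    collapsed = []
    for (chrom, start, umi), sequences in families.items():
        length = min(len(s) for s in sequences)  # compare only the shared positions
        consensus = "".join(
            Counter(s[i] for s in sequences).most_common(1)[0][0]
            for i in range(length)
        )
        collapsed.append(Read(chrom, start, umi, consensus))
    return collapsed


if __name__ == "__main__":
    reads = [
        Read("chr1", 100, "AACG", "ACGT"),
        Read("chr1", 100, "AACG", "ACGT"),
        Read("chr1", 100, "AACG", "ACTT"),  # PCR or sequencing error in one duplicate
        Read("chr1", 100, "GGTA", "ACGT"),
    ]
    for read in collapse_by_umi(reads):
        print(read)
```

As noted above, the collapsed output no longer records how many reads supported each family or whether they disagreed, which is precisely the information that the second family of approaches retains.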
More recent approaches consist in using new bioinformatic algorithms for variant and CNV calling which are able to take into account the information carried by the UMIs after alignment, i.e., at the end of data processing.
For example, the UMI-VarCal algorithm [85] quantifies the number of concordant and discordant UMIs for each candidate variant during the variant calling process. Concordant UMIs are defined as the number of unique UMIs for which all the reads carrying these UMIs validate the presence of the variant. Conversely, discordant UMIs quantify the number of abnormal substitutions such as sequencing or PCR errors. The smCounter algorithm uses a barcode-level allele probability and UMI counts to reject candidate mutations lacking enough barcodes with good read evidence.
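The notion of concordant and discordant UMIs can be illustrated with a small counting function. This is not the UMI-VarCal implementation: it is a sketch of the definition given above, with one added assumption, namely that a UMI is counted as discordant when only some of its reads carry the candidate allele. The function name and input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def classify_umis(observations: Iterable[Tuple[str, str]], alt: str) -> Tuple[int, int]:
    """Count concordant and discordant UMIs for one candidate variant.

    observations: (umi, base) pairs observed at the candidate position.
    Returns (concordant, discordant), where a UMI is concordant if all of its
    reads carry the alternate base, and discordant if only some of them do.
    """
    bases_by_umi: Dict[str, List[str]] = defaultdict(list)
    for umi, base in observations:
        bases_by_umi[umi].append(base)

    concordant = discordant = 0
    for bases in bases_by_umi.values():
        alt_reads = sum(base == alt for base in bases)
        if alt_reads == len(bases):
            concordant += 1
        elif alt_reads > 0:
            discordant += 1  # mixed evidence: likely PCR or sequencing error
    return concordant, discordant


if __name__ == "__main__":
    observations = [
        ("U1", "T"), ("U1", "T"),  # all reads support the variant
        ("U2", "T"), ("U2", "C"),  # reads disagree within the family
        ("U3", "C"), ("U3", "C"),  # reference only
        ("U4", "T"),               # single-read family supporting the variant
    ]
    print(classify_umis(observations, alt="T"))
```

A caller could then require a minimum number of concordant UMIs, or a minimum ratio of concordant to discordant UMIs, before accepting a variant; such thresholds are choices of the pipeline and are not specified here.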
Many biases due to the amplification step during sequencing library preparation prevent the direct quantification of locus copy number [86]. cfDNA fragments are often shorter than DNA extracted from tissue, which makes it impossible to use conventional approaches for CNV detection such as read-depth algorithms. Recent approaches, like mCNA [53], use UMI counts instead of read counts to improve high-resolution detection of gene copy number variations.
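The idea of counting molecules rather than reads can be sketched as a normalized log2 ratio of unique UMI counts between a sample and a reference. This is not the mCNA algorithm, only an assumed simplification: the gene names and counts are invented for illustration, the reference is taken to be a copy-neutral control processed with the same panel, and normalization by the panel total is a basic choice that slightly dampens large gains.

```python
import math
from typing import Dict


def log2_copy_ratios(
    sample_umis: Dict[str, int], reference_umis: Dict[str, int]
) -> Dict[str, float]:
    """Log2 ratio of normalized unique-UMI counts per gene (sample vs reference)."""
    genes = sorted(sample_umis.keys() & reference_umis.keys())
    if any(sample_umis[g] <= 0 or reference_umis[g] <= 0 for g in genes):
        raise ValueError("all UMI counts must be positive")

    sample_total = sum(sample_umis[g] for g in genes)
    reference_total = sum(reference_umis[g] for g in genes)

    ratios = {}
    for gene in genes:
        sample_share = sample_umis[gene] / sample_total
        reference_share = reference_umis[gene] / reference_total
        ratios[gene] = math.log2(sample_share / reference_share)
    return ratios


if __name__ == "__main__":
    # Hypothetical unique-UMI counts per gene.
    sample = {"GENE_A": 200, "GENE_B": 100, "GENE_C": 100}
    reference = {"GENE_A": 100, "GENE_B": 100, "GENE_C": 100}
    for gene, ratio in log2_copy_ratios(sample, reference).items():
        print(f"{gene}: {ratio:+.2f}")
```

Because each UMI corresponds to one original molecule, the ratio is not distorted by the unequal amplification of different regions, which is the bias that affects read-depth methods on amplified libraries.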
There is not yet an international consensus for bioinformatic cfDNA analysis pipeline. The bioinformatics tools and parameters must be adapted to the nature of the sequenced samples (quantity of DNA, quality of extraction, integrity of the cfDNA, etc.), to the kits used to prepare libraries, to the presence of UMI in library construction or not, and finally to the sequencing depth. In addition, sequencing biases are often sample specific which requires an objective assessment of sequencing noise at sample level.
Some early tools, like UMI-Gen [87], make it possible to create in silico alignment datasets to evaluate the performance of variant calling and filtering tools. UMI-Gen is a UMI-based read simulator that reproduces targeted sequencing paired-end alignment files (BAM) by estimating sequencing noise from a set of reference BAM files. It is particularly useful for evaluating the performance of variant calling tools because many parameters can be varied (sequencing depth, number of initial UMI, etc.) and variants can be inserted at frequencies of interest during the simulation. It thus makes it possible to optimize bioinformatic pipelines according to the targeted panels or the sequencing technology.