1. Application of DDA for Proteomics of Infectious Diseases
LC-MS-based proteomics has evolved into two analytical methods: (1) discovery proteomics and (2) targeted proteomics
[1]. Data-dependent acquisition (DDA), also called “shotgun proteomics”, is well suited to discovery studies because it allows the comprehensive identification of bacterial proteins. In traditional DDA, protein samples are digested with trypsin, after which the peptide mixtures are fractionated and analyzed by LC-MS/MS. The most abundant precursor ions in a given survey spectrum are then selected and fragmented for MS/MS analysis
[2]. Various protein identification programs have been developed
[3][4]. The most common approach for protein identification is sequence database matching, in which experimental spectra obtained from MS/MS analysis are compared with in silico spectra derived from the peptide sequences in a reference database. Therefore, the choice of search algorithm and reference database largely determines the quality of DDA search results. The high accessibility and wide coverage of DDA have made it the most widely used method (
Table 1). However, stochastic sampling is the main limitation of DDA; it complicates the identification of low-abundance proteins in complex samples, and such proteins are frequently missed
[5]. As a result, DDA datasets contain numerous missing values that require imputation, and statistical power is lost as the sample size increases. The depletion of abundant proteins or the fractionation of protein mixtures has commonly been applied to overcome this technical limitation
[6]. Label-based protein quantification methods, such as tandem mass tags and isobaric tags for relative and absolute quantitation, are also routinely applied to the comparative quantitative analysis of the infected host proteome
[7][8]. Optimizing MS/MS measurement conditions in LC-MS is also considered an important factor in expanding the usefulness of DDA
[9][10].
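The abundance bias described above can be illustrated with a minimal sketch of top-N precursor selection. This is not the acquisition logic of any particular instrument; the function name `select_top_n`, the peak values, the tolerance, and the choice of N = 3 are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def select_top_n(
    ms1_peaks: List[Peak],
    n: int,
    excluded_mz: Sequence[float] = (),
    tol: float = 0.01,
) -> List[Peak]:
    """Return the n most intense survey-scan peaks that are not excluded.

    excluded_mz mimics a dynamic exclusion list: precursors fragmented
    recently are skipped so that less intense ions get a chance.
    """
    candidates = [
        peak
        for peak in ms1_peaks
        if not any(abs(peak[0] - mz) <= tol for mz in excluded_mz)
    ]
    # Sort by intensity, highest first, and keep the first n peaks.
    return sorted(candidates, key=lambda peak: peak[1], reverse=True)[:n]


# Hypothetical survey scan: three abundant ions and two low-abundance ions.
survey_scan = [
    (445.12, 9.0e6),
    (512.77, 4.0e6),
    (630.31, 2.5e5),
    (701.84, 8.0e4),
    (388.20, 6.0e6),
]

selected = select_top_n(survey_scan, n=3)
missed = [peak for peak in survey_scan if peak not in selected]
print("fragmented:", selected)
print("never fragmented in this cycle:", missed)
```

In this toy scan the two low-intensity precursors are never fragmented, which is the origin of the missing values discussed above. Passing `excluded_mz=[445.12]` shows how an exclusion list lets the next most intense ion be sampled instead.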
Table 1. Characteristics of LC-MS acquisition methods.
There have been several important studies regarding the discovery of proteins of pathogen origin from clinical samples of infectious diseases. Kashino and colleagues applied the DDA approach to urine proteomics in patients with pulmonary tuberculosis (TB)
[18]. The urine samples were prepared by filtration through a 5 kDa molecular weight cut-off (MWCO) filter. They found four proteins (MT_1721, MT_1694, MT_3444, and MT_2462) of Mycobacterium tuberculosis (
Mtb) from nine patients with culture-confirmed pulmonary TB. Further validation of these proteins was performed by western blotting using anti-sera from patients with TB. Pollock and colleagues selected one candidate protein (MT_1721 or Rv1681) for further study
[19]. This protein was confirmed by LC-MS analysis, and the full length of the target protein was validated using immunoaffinity precipitation MS analysis
[19]. However, because of the low sensitivity of the DDA approach used in this study, the detection rate of the target protein (MT_1721) in the patient group was low (less than 20%). An antibody against the target protein (MT_1721) was then used in an enzyme-linked immunosorbent assay (ELISA) on approximately 100 clinical samples. The ELISA showed a detection rate of <50%; however, the authors confirmed the complete absence of urine reactivity in the negative controls. Young and colleagues also performed urine proteomics to discover TB-specific biomarkers using clinical samples obtained from patients with TB (
n = 63)
[20]. TB patients were categorized as having definite TB (
n = 21), presumed latent TB (
n = 24), or presumed non-TB (
n = 18). The clinical samples were pretreated by filtration (50 kDa MWCO filter) and concentration (3 kDa MWCO filter) to deplete highly abundant proteins before proteomic analysis. Using the DDA approach, the authors discovered 16 proteins originating from
Mtb. Additionally, 27 human proteins were selectively identified in patients with active pulmonary TB.
Although the body fluid proteomics studies described above succeeded in identifying bacteria-derived markers, in many cases researchers have failed to identify bacterial proteins because of the intrinsic limitations of DDA, the low abundance of target proteins relative to host proteins, and/or the absence of target proteins from existing databases, as mentioned above
[21]. Spectral library searching is an alternative method for overcoming sensitivity-related limitations
[4]. This is described in more detail in the next section. In brief, this technique is typically more sensitive and faster than the sequence database searching approach because it directly matches the spectra of peptide ions to spectra contained in libraries
[22][23]. Hentschker and colleagues reported improved and faster results based on the proteome and phosphoproteome of pneumococci
[24]. They applied a spectral library instead of a sequence database to identify bacterial proteins that would otherwise remain unidentified. The spectral library was derived from MS/MS analysis of cultured cells and was validated using synthetic peptides. They identified 76% of the theoretical proteome and 128 phosphorylated proteins in Streptococcus pneumoniae. This method is expected to be useful for body fluid proteomics.
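The idea of matching a query spectrum directly against library spectra can be sketched with a normalized dot product (cosine similarity) over binned peaks. This is a simplified illustration, not the scoring scheme of any specific search tool; the bin width, the peptide names, and the peak lists are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Spectrum = List[Tuple[float, float]]  # list of (m/z, intensity) peaks


def bin_spectrum(spectrum: Spectrum, bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = {}
    for mz, intensity in spectrum:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a: Spectrum, b: Spectrum, bin_width: float = 1.0) -> float:
    """Normalized dot product of two binned spectra (0 = no overlap, 1 = identical shape)."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def best_library_match(query: Spectrum, library: Dict[str, Spectrum]) -> Tuple[str, float]:
    """Return the library entry with the highest similarity to the query."""
    scored = ((name, cosine_similarity(query, spec)) for name, spec in library.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical library of two reference spectra.
library = {
    "peptide_A": [(175.1, 100.0), (304.2, 50.0), (433.2, 80.0)],
    "peptide_B": [(147.1, 100.0), (276.2, 70.0), (405.2, 30.0)],
}
# Query with the same fragment pattern as peptide_A at half the intensity.
query = [(175.1, 50.0), (304.2, 25.0), (433.2, 40.0)]

name, score = best_library_match(query, library)
print(name, round(score, 3))
```

Because the score depends on relative fragment intensities as well as fragment masses, a library built from real measurements carries information that in silico spectra lack, which is one reason the approach can be more sensitive.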
2. Application of DIA for Proteomics of Infectious Diseases
Following its introduction in 2004, data-independent acquisition (DIA) has become a new strategy for systematically analyzing complex protein mixtures
[25]. Unlike DDA, in DIA all ions present in a certain range of the m/z window are co-fragmented and collectively analyzed. The DIA approach makes it possible to expand the profiles of proteomes and accurately quantify targeted proteins. This method can result in better experimental reproducibility than DDA methods
[14][26][27][28]. DIA combines the merits of DDA and targeted approaches (selected reaction monitoring [SRM]/multiple reaction monitoring [MRM] and parallel reaction monitoring [PRM]). Therefore, it has become a popular technology in proteomics research
[29]. However, it still cannot match the depth of proteome coverage of DDA or the accuracy of MRM or PRM in measuring very low-abundance proteins (
Table 1). High-resolution MS/MS acquisition at fast scan speeds is required for DIA-MS experiments. The widely used hybrid instruments Q Exactive and Q Exactive Plus are believed to have sufficient performance for DIA analysis. Although DIA is an extremely powerful method, it is more complex than DDA because of the difficulties of MS/MS spectral data analysis. Conventional peptide identification algorithms are not appropriate for DIA because of the complexity of DIA MS/MS spectra
[12][4]. To deconvolute these complex spectra, spectral libraries are essential as reference databases. In general, spectral libraries contain intensity and peak information of non-canonical fragment ions generated by multiple DDA analyses of target samples
[13]. Unfortunately, standardized pipelines have not yet been established
[12][3]. The contents necessary for the practical application of DIA have been described in more detail in recent review papers
[28][30].
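To make the contrast with DDA concrete, the following minimal sketch assigns precursors to fixed-width isolation windows; every precursor that falls in the same window is fragmented together, which is why the resulting MS/MS spectra are multiplexed. The m/z range, the 25 m/z window width, and the precursor values are assumptions chosen for illustration, not recommended acquisition settings.

```python
from typing import Dict, List, Tuple

Window = Tuple[float, float]  # (lower m/z bound, upper m/z bound)


def build_windows(start: float, end: float, width: float) -> List[Window]:
    """Tile the m/z range [start, end) with consecutive fixed-width windows."""
    windows = []
    low = start
    while low < end:
        windows.append((low, min(low + width, end)))
        low += width
    return windows


def group_by_window(precursors: List[float], windows: List[Window]) -> Dict[Window, List[float]]:
    """Map each precursor m/z to the window in which it would be co-fragmented."""
    groups: Dict[Window, List[float]] = {window: [] for window in windows}
    for mz in precursors:
        for low, high in windows:
            if low <= mz < high:
                groups[(low, high)].append(mz)
                break
    return groups


# Hypothetical acquisition scheme and precursor list.
windows = build_windows(400.0, 500.0, 25.0)
precursors = [402.3, 418.9, 431.5, 476.0, 499.9]

for window, members in group_by_window(precursors, windows).items():
    print(window, members)
```

Unlike top-N selection, no precursor inside the covered range is skipped, regardless of its intensity. The cost is that each window's spectrum mixes fragments from several peptides and must be deconvoluted against a spectral library.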
3. Application of Targeted-MS for Proteomics of Infectious Diseases
DDA has been routinely used to discover biomarkers from clinical samples, with further validation being achieved through rigorous statistical methods. This validation process requires accurate, reproducible, and highly robust methods for quantifying candidate biomarkers. However, as noted above, the major limitations of DDA, namely irreproducibility and imprecision, stem from its stochastic precursor sampling. Targeted proteomics methods, meanwhile, have been devised for the precise quantitative analysis of specific proteins or protein complexes. Representative targeted methods include SRM, MRM, and PRM
[31][32]. SRM/MRM technology excludes most non-targeted ions from detection, which can reduce noise and improve detection sensitivity. In general, a triple quadrupole instrument is used for these technologies. Monitoring specific transitions (narrow m/z ranges for precursor/fragment ion pairs) results in increased selectivity and sensitivity compared with DDA and DIA approaches. Targeted methods are reported to be at least 5–10 times more sensitive than DDA when analyzing whole-cell lysates
[26][33] (
Table 1). However, the bottleneck in the development of SRM/MRM-based assays is the complicated optimization process
[34][35][36][37]. For example, it is important to choose proteotypic peptides, which are unique peptides that empirically have a high chance of being observed.
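The uniqueness part of peptide selection can be sketched as an in silico tryptic digest followed by a check that each peptide maps to a single protein. This covers only sequence uniqueness; the empirical observability of a peptide must still come from experimental data. The protein names and sequences are invented, and the cleavage rule (after K or R, but not before P) and the minimum length of six residues are common conventions assumed here.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tryptic_digest(sequence: str, min_length: int = 6) -> List[str]:
    """Cleave after K or R unless the next residue is P; drop short peptides.

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return [peptide for peptide in peptides if len(peptide) >= min_length]


def unique_peptides(proteome: Dict[str, str]) -> Dict[str, Set[str]]:
    """Return, for each protein, the peptides found in no other protein."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for name, sequence in proteome.items():
        for peptide in tryptic_digest(sequence):
            owners[peptide].add(name)

    result: Dict[str, Set[str]] = {name: set() for name in proteome}
    for peptide, names in owners.items():
        if len(names) == 1:
            result[next(iter(names))].add(peptide)
    return result


# Hypothetical toy sequences that share the peptide TAYIAK.
proteome = {
    "protein_1": "MKTAYIAKQRLSDEGTFVKAAGHLLPEK",
    "protein_2": "MRTAYIAKWNDELLGYR",
}

candidates = unique_peptides(proteome)
for name, peptides in candidates.items():
    print(name, sorted(peptides))
```

The shared peptide TAYIAK is rejected for both proteins because it cannot distinguish between them. The same uniqueness test, applied across whole proteomes, underlies the selection of species-unique peptides for pathogen identification.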
PRM technology has been optimized on quadrupole-Orbitrap instruments to deliver an improved version of targeted proteomics. Unlike SRM/MRM, PRM involves the acquisition of full MS/MS scans of all product ions in the Orbitrap, rather than selected fragment ions from predefined precursor ions. Therefore, this technology is more convenient because it does not require the selection and optimization of fragment ions. It can also be used for qualitative purposes, as in DDA approaches, to avoid false positives. In summary, this technique provides simplified and robust workflows but requires time-consuming optimization steps. Therefore, it is not suitable for discovery-based applications but is very useful for validation applications targeting low-abundance proteins present in body fluids
[38]. Targeted MS-based diagnosis has an inherent advantage over immunoassays: it can analyze multiple targets in a multiplexed manner with high selectivity and sensitivity, without antibodies, and at low cost, provided that the laboratory has appropriate instruments and an established assay
[39][40].
Several studies have successfully employed targeted proteomics to quantify infectious disease biomarkers present in body fluids. Kruh-Garcia and colleagues first developed an MRM assay for the mycobacterial antigen 85 complex (Ag85) proteins, which are potential diagnostic biomarkers for TB. They compared the amount of the Ag85 complex (represented by the Ag85A, Ag85B, and Ag85C proteins) in the secretome of various clades of
Mtb, revealing precise discrimination among those highly homologous proteins
[41]. In a further study, they expanded their proteomic results in the secretome to include TB patient serum
[42]. They identified 250 targeted peptides using DDA proteomics of
Mtb-infected macrophages and a mouse model. After a thorough optimization process aided by in silico analysis, they selected 76 peptides as target peptides, representing 33 mycobacterial proteins (including Ag85). They then performed an MRM assay using serum exosomes from TB patients as clinical samples. As a result, they proposed, for the first time, 20 mycobacterial proteins present in the serum exosomes of TB patients as potential biomarkers (
n = 41). The same research team developed refined MRM assays using isotope-labeled peptide standards
[43]; these assays can detect mycobacterial proteins in serum exosomes in the attomolar to femtomolar range.
Karlsson and colleagues successfully selected species-unique peptides of the Mitis group of the genus
Streptococcus, using proteogenomic analysis. They characterized and identified more than 200 unique peptides from cell lysates of cultured cells using DDA proteomics
[44]. They then expanded their platform to discover peptide biomarkers of representative respiratory tract pathogens, including
S. pneumoniae, Haemophilus influenzae, Moraxella catarrhalis, and
Staphylococcus aureus. For the discovery phase, representative genetic variations were preselected as MS-inclusion lists and validated in bacterial culture proteomics. Finally, the targeted peptides of each of the four pathogens were confirmed in 218 clinical samples
[45].
Wang and colleagues used a similar approach to identify five gram-negative pathogens in bronchoalveolar lavage fluid (BALF), namely
Acinetobacter baumannii, M. catarrhalis, Pseudomonas aeruginosa, Stenotrophomonas maltophilia, and
Klebsiella pneumoniae [46]. Bardet and colleagues, meanwhile, developed an SRM-based method to rapidly and reliably identify pathogens in endotracheal aspirate samples from patients with ventilator-associated pneumonia (VAP)
[47]. Based on the high ionization yields of the unique peptides confirmed in DDA experiments, 97 species-specific peptides from the six most frequent bacterial species (
A. baumannii, Escherichia coli, H. influenzae, Pseudomonas aeruginosa, S. aureus, and
S. pneumoniae) responsible for VAP were selected and monitored using the developed SRM assay.
4. Application of LC-MS/MS for COVID-19 Diagnosis
The current COVID-19 pandemic, which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has underscored the need for diagnostic technology for infectious diseases. Molecular diagnostics such as polymerase chain reaction (PCR) have to date been used as the gold standard for the detection of SARS-CoV-2. However, proteomics researchers have introduced novel LC-MS/MS-based diagnostic approaches for COVID-19 as alternatives.
Gouveia and colleagues first reported 101 tryptic peptides derived from six viral proteins identified from SARS-CoV-2-infected Vero E6 cells, using DDA analysis
[48]. Through further curation, 14 peptides from the nucleocapsid phosphoprotein (N protein), spike protein (S protein), and membrane glycoprotein (M protein) of the virus were recommended for further targeted MS. In a subsequent study, they proposed a time-efficient LC-MS/MS-based diagnostic method for COVID-19 clinical samples as an alternative to PCR or immunodiagnostic assays. They evaluated the 14 peptides using artificial nasopharyngeal swabs. Among these 14 peptides, two peptides of the N protein were selected as attractive candidates
[49]. Interestingly, the same target peptides were confirmed by independent groups using the PRM method
[50][51]. However, neither approach could overcome the problems of a low detection rate (approximately 20% of that of the PCR assay) and low throughput (20 min per sample). Thus, further investigations should aim to improve practical usability. Singh and colleagues also reported MRM assays using two other peptides derived from the S protein and replicase polyprotein, achieving 100% specificity and 90.5% sensitivity with a 2 min gradient run (
n = 103)
[52]. However, MRM measurements are limited by their low resolution, which makes it impossible to verify the peptide spectrum itself.
Cazares and colleagues reported a PRM assay for the detection of viral proteins in virus-spiked mucus samples and found that the limit of detection (LOD) and limit of quantitation (LOQ) were approximately 200 and 390 attomoles, respectively
[53]. These values indicated that the assay could detect approximately 2 × 10^5 viral particles/mL in a sample, showing comparable performance to the RT-PCR method.
Fully automated sample preparation and sample-cleanup methods combined with high-resolution MS appear to overcome these problems. Cardozo and colleagues developed a fully automated magnetic-based sample preparation method for nasopharyngeal and oropharyngeal swabs that could be completed within 4 h using a robotic liquid handler. Turbulent flow chromatography coupled with tandem mass spectrometry (TFC-MS/MS) can provide an efficient online sample cleanup method. This workflow can analyze four samples in a row within 10 min (in other words, more than 500 samples per day). The authors evaluated the target peptides of the SARS-CoV-2 N protein qualitatively and quantitatively using PRM methods. The LOD and LOQ were reported to be 2–3 and 4–6 ng/mL, respectively. Compared to an RT-PCR-validated cohort, this workflow could detect up to 84% of the positive cases with a specificity of up to 97% (
n = 985)
[54]. Renuse and colleagues introduced automated immunoaffinity-based sampling combined with targeted high field asymmetric waveform ion mobility spectrometry (FAIMS)
[55]. Acquired PRM data were used to model an “ensemble” machine learning-based classification method. This method obtained high-quality results, delivering 98% (86/88) sensitivity and 100% (88/88) specificity
[55].
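The performance figures quoted in this paragraph follow from simple arithmetic, sketched below. The counts are taken from the text (86/88 and 88/88 for the FAIMS-PRM classifier; four samples per 10 min for the TFC-MS/MS workflow). The function names are hypothetical, and the daily throughput assumes uninterrupted 24 h instrument operation, which the source does not state.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of true positive cases that the assay detects."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of true negative cases that the assay correctly rejects."""
    return true_negatives / (true_negatives + false_positives)


def samples_per_day(samples_per_batch: int, minutes_per_batch: int) -> int:
    """Daily throughput, assuming continuous 24 h operation (an assumption)."""
    minutes_per_day = 24 * 60
    return samples_per_batch * minutes_per_day // minutes_per_batch


# 86 of 88 positives detected; 88 of 88 negatives correctly classified.
sens = sensitivity(true_positives=86, false_negatives=2)
spec = specificity(true_negatives=88, false_positives=0)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")

# Four samples analyzed within 10 min.
print("samples per day:", samples_per_day(samples_per_batch=4, minutes_per_batch=10))
```

The throughput of 576 samples per day is consistent with the "more than 500 samples per day" stated above; real-world throughput would be lower once calibration, blanks, and maintenance are included.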
Rajczewski and colleagues thoroughly evaluated 636 viral peptides identified in datasets using Galaxy-based workflows
[56]. Galaxy is a web-based platform that supports reproducible computational research and provides numerous bioinformatics tools. Using in vitro and clinical source datasets deposited in the public repository ProteomeXchange, they selected four peptides derived from the N and M proteins. These peptides were consistently detected across all datasets used in the study and were proposed as potential diagnostic biomarkers.
Additional studies from nasopharyngeal swabs, gargle solutions, or other human samples have also been published
[53][57][58][59][60]. However, except for nasopharyngeal swabs, the results remain limited compared with those of PCR-based studies
[61][62]. During the initial phase, Ihling and colleagues reported PRM-based identification of N protein from patient gargle solutions
[60]. Recently, Kipping and colleagues proposed an improved sample preparation protocol and developed MRM methods using a synthetic peptide library to target the N protein from gargle solutions and saliva
[63]. Based on these results, LC-MS-based diagnostics appear to be at an early stage, except for applications using nasopharyngeal swabs. The SARS-CoV-2 peptides that have been introduced as potential biomarkers in recent studies are summarized in two review papers
[64][65].