1. Introduction
Since the term proteomics was first coined in 1994 by Marc Wilkins, then a PhD student at Macquarie University in Sydney, Australia [18], the technology has seen many exciting developments. Immediately following the initial announcement of the Human Genome Project, it was realized that populating the human proteome was essential for a comprehensive understanding of the pathophysiologic mechanisms behind human health and disease, and for using that knowledge to advance health treatment [19], with cancer recognized as a major priority. With this goal, a number of initiatives were developed, including the Human Proteome Organization (HUPO), the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC), the Early Detection Research Network (EDRN) and the SEER cancer database, the Applied Proteogenomics Organizational Learning and Outcomes (APOLLO) network and the International Cancer Proteogenome Consortium (ICPC: Cancer Moonshot). More recently, companies such as Grail (www.grail.com: proteomics, accessed on 1 March 2021), Freenome (www.freenome.com: multiomics, accessed on 1 March 2021), SomaLogic (www.somalogic.com: aptamer technology, accessed on 1 March 2021) and Olink (www.olink.com: Proximity Extension Assay, accessed on 1 March 2021) have been established.
HUPO was created in 2001 with the goal of “Translating the code of life”: achieving a deep understanding of biology by advancing proteomics through enhanced international cooperation and by facilitating the development of advanced technologies. In 2010, the Human Proteome Project (HPP) was launched to ensure quality assurance, data sharing, global cooperation and high-stringency annotation of the genome-encoded proteome. The HPP has two separate approaches, chromosome based (C-HPP) and biology and disease based (BD-HPP), backed up by four pillars: mass spectrometry resources, antibody technologies, knowledgebase (bioinformatics) and, more recently (2018), pathology. The human proteome is currently at >90% completion [11].
Mass spectrometry (MS) remains the key platform for proteomics analysis, with shotgun (bottom-up) proteomics the most frequently utilized mode. MS-based proteomics relies on success in three main areas: sample pretreatment, sample analysis and data analysis. Two-dimensional gel electrophoresis (2-DE) and sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) were the original mainstays for sample separation before MS analysis, with the ability to separate over 10,000 proteoforms [13], and indeed these systems are still in use [20]. In the cited example, proteome information from tumor tissues and normal tissues was obtained by SDS-PAGE for a comparative proteomic analysis of different stages of BC. A gel-eluted liquid fractionation entrapment electrophoresis (GELFREE) system was used to separate and fractionate the extracted proteins.
More recently, chromatographic methods have gained recognition for their particular advantages in sample manipulation, recovery and automation. Multidimensional purification has been found to be particularly efficacious, giving high purification factors and reducing sample complexity prior to MS analysis, which enables deeper mining of the proteome [13,21,22,23,24]. As exemplars, Kaur et al. designed a simple fractionation workflow to extend the coverage of the plasma proteome [25]. In a similar approach, Ahn et al. [26] combined ultradepletion of high-abundance proteins (Agilent MARS-14 and an in-house IgY depletion column), multidimensional peptide fractionation (SCX, SAX, high pH and SEC) and sequential window acquisition of all theoretical mass spectra (SWATH-MS) to screen for and identify biomarkers whose expression was altered in colorectal cancer (CRC) tissues relative to healthy controls.
There have been many instrumental advances over recent years, with improvements in mass accuracy, speed and resolution. More powerful MS instruments such as the Q-TOF, TOF/TOF and the Orbitrap have been developed, allowing deep mining of the proteome in time frames from tens of minutes to a few hours [27]. In particular, techniques for sensitive quantitative analysis have matured. In data-dependent analysis (DDA), the sample is digested into peptides, ionized and analyzed by MS. In targeted proteomics (selected reaction monitoring (SRM), multiple reaction monitoring (MRM) and parallel reaction monitoring (PRM)), proteotypic peptides representing the proteins of interest are used to develop rapid and sensitive assays for individual proteins or panels of proteins [28]. This approach is particularly suited to biomarker analysis, and a compendium has been developed [29] that describes protocols for quantitation of over 99% of the annotated human proteins. However, the method of choice is increasingly data-independent analysis (DIA) [30], in particular SWATH-MS [31]. In this approach, all peptides within a defined mass-to-charge (m/z) window are fragmented. Because the mass spectrometer cycles repeatedly through windows that together cover the full m/z range, fragment data are collected for the total proteome content of the sample.
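To make the windowing idea concrete, the following is a minimal Python sketch of how a DIA method might tile a precursor m/z range with fixed-width isolation windows. The function names (`dia_windows`, `windows_for_precursor`) and the numerical values (400 to 1200 m/z, 25 m/z windows, an optional 1 m/z overlap) are illustrative assumptions, not parameters taken from the studies cited here; real acquisition schemes, including variable-width windows, are instrument and method specific.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (lower m/z bound, upper m/z bound)


def dia_windows(mz_start: float, mz_end: float,
                width: float, overlap: float = 0.0) -> List[Window]:
    """Tile [mz_start, mz_end] with fixed-width precursor isolation windows.

    Consecutive windows share `overlap` m/z units at their edges.
    All parameter values are hypothetical and chosen for illustration only.
    """
    if mz_end <= mz_start:
        raise ValueError("mz_end must be greater than mz_start")
    if width <= 0 or overlap < 0 or overlap >= width:
        raise ValueError("require width > 0 and 0 <= overlap < width")

    windows: List[Window] = []
    lo = mz_start
    while lo < mz_end:
        hi = min(lo + width, mz_end)  # clip the last window to the range end
        windows.append((lo, hi))
        if hi >= mz_end:
            break
        lo = hi - overlap  # next window starts inside the previous one
    return windows


def windows_for_precursor(windows: List[Window], precursor_mz: float) -> List[int]:
    """Return indices of all windows whose [lo, hi) interval contains the precursor."""
    return [i for i, (lo, hi) in enumerate(windows) if lo <= precursor_mz < hi]


if __name__ == "__main__":
    scheme = dia_windows(400.0, 1200.0, width=25.0)
    print(len(scheme), scheme[0], scheme[-1])
    # A precursor at m/z 512.3 is co-fragmented with everything else in its window.
    print(windows_for_precursor(scheme, 512.3))
```

With these assumed values the range is covered by 32 contiguous windows, and one pass through all of them constitutes a single acquisition cycle. Every precursor falls into at least one window regardless of its intensity, which is the property that distinguishes DIA from intensity-driven precursor selection in DDA.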
2. Proteomics, the Current Status