1. Introduction
The metabolome contains a wide variety of compounds, but no current technology can identify all of them fully and efficiently. Metabolomics, an important branch of the life sciences and systems biology, has contributed significantly to the study of bioactive ingredients in food and their effects on human health by analyzing changes in metabolites
[1]. Metabolomics is used to separate and identify small molecules in blood, urine, feces, cells, culture media, or food ingredients, and to study the related pathways
[2]. The molecular weights of these small molecules are generally under 1000 Da
[3].
The procedure of metabolomics analysis mainly involves the following steps: sample preparation, metabolite extraction, derivatization, separation and detection, and data processing (as shown in Figure 1)
[3]. Small changes in any of these steps may have a significant impact on the final results.
Figure 1. The process of metabolomics analysis. (Some picture elements are from BioRender.)
2. Sample Preparation
Sample preparation is the first step in metabolomics study. The quality of the prepared samples is the key to the success of metabolomics analysis, so it is very important to choose the appropriate preparation method for different samples.
The preparation methods are different for samples of various sources
[4]. The two main challenges in sample preparation are keeping the metabolite composition of the original sample as unchanged as possible and matching the preparation to a suitable detection platform. After pretreatment, choosing a suitable detection method improves the repeatability of the results and the extraction efficiency
[5]. For example, 4-chlorophenylalanine can be used to normalize samples before GC-MS-based metabolomics analysis, improving extraction accuracy and efficiency
[5]. Before LC-MS-based analysis, metabolites can be divided into fractions by mixed-mode solid-phase extraction, and each fraction can be analyzed on an appropriate column to widen the range of detectable metabolites
[6]. Methanol extraction can improve the quality of NMR spectra
[7]. With the optimization of pretreatment technology, some special sample preparation methods have also been proposed. Solid phase microextraction (SPME) is widely accepted as a non-destructive method for the preparation of liquid samples
[8]. Tijana Vasiljevic and colleagues proposed a method for preparing small samples using miniaturized SPME tips coated with hydrophilic-lipophilic balance (HLB) particles
[9]. It was the first study to successfully analyze caviar samples using miniaturized SPME and LC, and it achieved good extraction efficiency
[9]. Wan Chan and colleagues compared the performance of several different serum preparation methods based on UPLC-MS, and found that serum samples prepared with methanol generated more accurate data
[10]. Another study showed that the speed of ultracentrifugation had a significant impact on the metabolic profile of fecal water; in particular, the concentration of p-cresol changed as the rotational speed increased
[11]. However, this method is only suitable for NMR metabolomics studies at present.
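The normalization step mentioned earlier in this paragraph (4-chlorophenylalanine before GC-MS) can be illustrated as follows. This is a minimal sketch, assuming the compound is spiked into every sample at the same amount as an internal standard and that each sample is a mapping from metabolite name to peak area; the function name and the peak areas are hypothetical and are not taken from the cited study.

```python
def normalize_to_internal_standard(peak_areas, standard="4-chlorophenylalanine"):
    """Divide every metabolite peak area by the internal-standard peak area.

    Assumption: the standard was added at the same amount to every sample,
    so the ratio corrects for losses during extraction and injection.
    """
    standard_area = peak_areas.get(standard)
    if not standard_area or standard_area <= 0:
        raise ValueError(f"internal standard {standard!r} missing or non-positive")
    return {
        name: area / standard_area
        for name, area in peak_areas.items()
        if name != standard
    }


# Hypothetical peak areas for one sample (arbitrary units).
sample = {"4-chlorophenylalanine": 2000.0, "alanine": 500.0, "glucose": 8000.0}
normalized = normalize_to_internal_standard(sample)
print(normalized)
```

The ratios, rather than the raw areas, are then compared across samples.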
For solid samples, freeze drying and grinding are required to reduce moisture in the samples and increase the release of metabolites, respectively. Quenching is an important step that stops metabolic processes; common quenching methods include adding liquid nitrogen, freezing, heating, and adding acid
[12]. If this step is omitted, residual enzymes may alter the metabolite composition. The timing of this step must also be controlled
[12][13]. Sample preparation is a key step in metabolomics analysis
[14]. Preparing samples quickly and reproducibly without changing the original metabolite composition remains a problem to be solved.
3. Metabolite Extraction
Metabolite extraction is generally the rate-limiting step in metabolomics analysis
[15]. Extraction methods differ with the type of sample and aim to maximize the number, variety, and concentration of the target metabolites. The choice of extraction solvent also has a significant effect on the recovery rate and the metabolic profile. Commonly used extraction solvents include water, chloroform, perchloric acid, methanol, and acetonitrile
[16][17]. It is necessary to choose hydrophilic solvents such as water-alcohol solutions for polar metabolites and hydrophobic solvents for non-polar metabolites. Estelle Martineau et al. compared the extraction efficiency of methanol/CHCl₃/H₂O, acetonitrile/H₂O, methanol/H₂O, and perchloric acid on mammalian cell metabolites, and found that extraction with methanol/CHCl₃/H₂O recovered more metabolites, with good repeatability
[15]. Karsten Seeger proposed a new method that extracts metabolites directly from NMR tubes by slice selection after centrifugation, offering a new approach to the rapid determination of metabolites
[18]. The key advantage of this method is that it extracts as many stable target metabolites as possible without adversely affecting subsequent analytical experiments
[18].
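Repeatability of an extraction protocol, as compared above, is commonly summarized as the relative standard deviation (RSD) of a metabolite signal across replicate extractions. The following is a minimal sketch of that calculation; the solvent labels and replicate values are invented for illustration and do not reproduce any result from the cited studies.

```python
import statistics


def relative_standard_deviation(values):
    """Return the RSD in percent: 100 * sample standard deviation / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; RSD is undefined")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical peak areas of one metabolite from three replicate extractions.
replicates = {
    "solvent_A": [98.0, 100.0, 102.0],
    "solvent_B": [80.0, 100.0, 120.0],
}
rsd = {name: relative_standard_deviation(v) for name, v in replicates.items()}
for name, value in rsd.items():
    print(f"{name}: RSD = {value:.1f}%")
```

A lower RSD indicates better repeatability; in this invented example, solvent_A would be preferred on that criterion alone.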
4. Derivatization
This step is not always necessary. When GC-MS is used, derivatization is generally required to convert non-volatile compounds into volatile ones, which facilitates analysis and improves the detectability of the metabolites
[19]. For example, chemical derivatization can change the physicochemical properties of compounds with low ionization efficiency and thereby improve their ionization. Sezin Erarpat and colleagues used ultrasound-assisted ethyl chloroformate derivatization of L-methionine extracted from human plasma and, using GC-MS, obtained recoveries of 97.8% to 100.5%; the method can be regarded as green and economical
[20]. Stable isotope labeling derivatization (SILD) is a sample pretreatment technology proposed in recent years, with great potential in LC-MS-based food metabolomics research
[21]. Shuyun Zhu et al. investigated a derivatization method based on quadruplex stable isotope labeling and developed the 3-N-(D₀-/D₃-methyl-, and D₀-/D₅-ethyl-)-2’-carboxyl chloride rhodamine 6G derivatization reagents, which can quickly and accurately quantify panaxadiol and panaxatriol in food
[22]. Several studies have shown that derivatization can improve metabolite detection
[23][24].
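Recovery figures such as those quoted above are conventionally obtained from spiked samples. The sketch below assumes the usual spike-recovery definition, that is, the amount found in the spiked sample minus the amount found in the unspiked sample, divided by the amount added. The cited study may have used a different protocol, and the numbers are invented.

```python
def spike_recovery_percent(measured_spiked, measured_unspiked, amount_added):
    """Spike recovery (%) = 100 * (spiked - unspiked) / amount added.

    All three quantities must be expressed in the same concentration unit.
    """
    if amount_added <= 0:
        raise ValueError("amount_added must be positive")
    return 100.0 * (measured_spiked - measured_unspiked) / amount_added


# Hypothetical concentrations in arbitrary units.
recovery = spike_recovery_percent(
    measured_spiked=14.5, measured_unspiked=5.0, amount_added=10.0
)
print(f"recovery = {recovery:.1f}%")
```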
5. Separation and Detection
Separation and detection are important steps in metabolomics analysis. In the field of food nutrition, common separation technologies mainly include GC, LC, and capillary electrophoresis
[25]. The separation of compounds is based on the adsorption of each molecule on the stationary phase, and it also depends on the choice of column, eluent, stationary phase, flow rate, and other parameters
[26]. In order to separate more metabolites, it is necessary to choose appropriate separation modes according to the polarity of the compounds. The separation technology is usually combined with high throughput detection technology to obtain large amounts of data. The commonly used detection techniques are NMR and MS
[25]. Although the sensitivity of NMR is low, it can be used for non-invasive, rapid, and repeated analysis of a variety of metabolites at μM levels
[27]. It is simple to operate and suitable for high-throughput untargeted metabolomics analysis
[28]. Both primary metabolites, including amino acids, sugars, lipids, and organic acids, and secondary metabolites, including flavonoids and alkaloids, can be detected by NMR. By contrast, the sensitivity of MS is much higher, and it requires only a few μL of sample for analysis. MS can be coupled with different separation techniques, or used in tandem, depending on the sample type
[25]. GC-MS is mainly used to identify volatile and semi-volatile metabolites, while non-volatile substances need to be derivatized and separated before detection by GC-MS. However, GC-MS is poorly suited to many secondary metabolites
[14]. Unlike GC-MS, LC-MS does not require complex pretreatment of samples, and it can directly separate and detect metabolites after extraction
[14]. LC-MS is more comprehensive in metabolite identification and can determine secondary metabolites such as flavonoids as well as primary metabolites such as amino acids in plants.
Current metabolomics studies generally rely on a single detection tool, but each technique has its own advantages and disadvantages. To identify and characterize more metabolites, combining NMR and MS may give better results. Manuja Kaluarachchi and colleagues identified metabolites in human plasma and serum by combining ¹H NMR and UPLC-MS
[29]. They identified 4 metabolites that differed significantly between plasma and serum by 1D NMR, and 10 other significantly different metabolites by UPLC-MS, most of which were glycerophospholipids
[29]. Dong-sheng Zhao et al. investigated the mechanisms of Dioscorea bulbifera rhizome (DBR)-induced hepatotoxicity in rats by integrating GC-MS and ¹H NMR, and identified a new potential therapeutic target, demonstrating an effective application of multi-platform metabolomics technology
[30]. In addition, the introduction of chemicals in NMR tubes increased the likelihood of identifying compounds with specific physical and chemical properties; the ¹⁵N-edited NMR enabled specific binding to compounds containing free carbonyl
[31]. A metabolic fingerprinting method based on ultra-high-performance liquid chromatography–high-resolution mass spectrometry (UHPLC-HRMS) was optimized using an ethylene-bridged hybrid C18 column, which showed good chromatographic resolution and enabled effective detection of infection-related metabolites in wheat
[32]. The optimization of data-processing parameters has also been studied. One study compared Isotopologue Parameter Optimization (IPO) with manual processing of raw HPLC-TOF-MS data; the parameters selected by IPO gave higher repeatability, so IPO can be used to select optimal XCMS parameters
[33]. However, IPO can take several days or even weeks to compute the optimized parameters. In contrast, AutoTuner gives more robust and high-fidelity results
[34]. MetaboAnalystR 3.0 has been proposed as a new optimized workflow, which can not only optimize and correct parameters effectively, but also predict active pathways accurately
[35]. In recent years, hybrid metabolomics methods based on mass spectrometry have also attracted much attention. By combining the advantages of targeted and untargeted metabolomics, they can give more accurate results and detect more metabolites
[36].
Different analytical instruments have different emphases. Considering the characteristics of samples and different analytical methods, a variety of separation and detection instruments can be used together to make the obtained metabolic data more comprehensive.
6. Data Processing
Data processing is an essential step in the process of metabolite screening, through which the changes of metabolites can be visualized and the possible metabolic pathways leading to these changes can be investigated using the KEGG database. Statistical analysis can help us to understand the metabolites in food and their impact on human health. Identification of metabolites is the most challenging step in metabolomics analysis
[37]. Metabolites in the samples are identified by comparing the acquired data with entries in reference databases. Choosing the right data processing method can greatly improve the accuracy of data analysis. There are many metabolome databases, such as METLIN
[29], Human Metabolome Database (HMDB)
[38], KNApSAcK database
[39], and MassBank
[40]. After matching against these databases and applying multivariate statistical analysis, the raw data can be converted into more meaningful conclusions, such as candidate biomarkers.
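The database comparison described above often begins by matching an observed m/z value to reference masses within a tolerance expressed in parts per million (ppm). The following is a minimal sketch under stated assumptions: positive-mode [M+H]+ ions only, a three-entry in-memory table in place of a real database such as HMDB or MassBank, and a hypothetical function name and tolerance. Real annotation also uses retention time, isotope patterns, and MS/MS spectra.

```python
PROTON_MASS = 1.007276  # Da, added to the neutral mass for an [M+H]+ ion

# Tiny illustrative reference table of monoisotopic masses (Da).
# Assumption: a real workflow would query a metabolome database instead.
REFERENCE_MASSES = {
    "L-alanine": 89.04768,
    "D-glucose": 180.06339,
    "citric acid": 192.02700,
}


def match_mz(observed_mz, tolerance_ppm=10.0, reference=REFERENCE_MASSES):
    """Return (name, ppm error) pairs whose [M+H]+ m/z lies within tolerance."""
    hits = []
    for name, mono_mass in reference.items():
        theoretical_mz = mono_mass + PROTON_MASS
        ppm_error = (observed_mz - theoretical_mz) / theoretical_mz * 1e6
        if abs(ppm_error) <= tolerance_ppm:
            hits.append((name, ppm_error))
    # Best candidates (smallest absolute error) first.
    return sorted(hits, key=lambda hit: abs(hit[1]))


hits = match_mz(181.0707)
print(hits)
```

A mass match alone yields only candidate identities, because isomers share the same m/z; this is one reason why identification remains the most challenging step.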
Multivariate statistical analysis methods include principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), orthogonal partial least squares discriminant analysis (OPLS-DA), least absolute shrinkage and selection operator (LASSO), linear discriminant analysis (LDA), and so on
[41][42]. Among these, PCA and PLS-DA are the most commonly used statistical methods in metabolomics. PCA is an unsupervised dimensionality reduction method that projects the data set onto fewer dimensions while retaining as much of the variance as possible
[43]. It helps to visualize metabolic data, trends, and clusters. It was recently reported that PCA combined with quadratic discriminant analysis (PCA-QDA) was used to classify MS data of cancer samples, with accuracy and specificity above 90%, making it a satisfactory classification model
[44]. PLS-DA is a supervised statistical analysis method that maximizes the covariance between the measured variables and the class labels, and it is often used to screen metabolites and to analyze overall metabolic changes between groups
[45]. PLS-DA is widely applicable and can handle multiple dependent categorical variables simultaneously. However, PLS-DA is prone to overfitting
[45]. To mitigate this problem, OPLS-DA, which builds on PLS-DA, divides the variation in the data into Y-related variation and Y-independent orthogonal variation and eliminates the variation unrelated to the experiment
[46]. The R² and Q² parameters are used to evaluate the goodness of fit and the predictive ability of the OPLS-DA model, respectively, and variable importance in the projection (VIP) scores can be generated from the model. Variables with VIP > 1.0 are regarded as important potential biomarkers in the OPLS-DA model
[47]. One study compared PLS-DA with OPLS-DA in terms of model fit and interpretability; although both are applicable, OPLS-DA was more interpretable
[48]. At present, combined analysis with PCA and OPLS-DA has become the mainstream approach for discriminating samples in metabolomics. Combining more analytical models may be a future direction. Currently, OPLS-DA is mainly applied to screen and identify biomarkers through S-plots/S-lines, permutation tests, and VIP
[49][50]. Using these methods to investigate the changes of metabolites may be a development direction in the future.
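The PCA projection described above can be sketched without external libraries. This minimal example mean-centres a small intensity table and finds the first principal component by power iteration on the covariance matrix. Power iteration is used here only to keep the sketch dependency-free; an established statistics package would normally be used. The data are synthetic and the function name is hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loading vector, sample scores) of PC1 for mean-centred data."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in rows]
    # Sample covariance matrix (p x p).
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # provided the start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector is orthogonal to PC1 or data are constant")
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centred]
    return v, scores


# Synthetic table: 6 samples x 3 metabolites; rows 0-2 and rows 3-5 are two groups.
profiles = [
    [1.0, 2.0, 0.5],
    [1.2, 2.1, 0.4],
    [0.9, 1.9, 0.6],
    [3.0, 4.0, 0.5],
    [3.1, 4.2, 0.6],
    [2.9, 3.9, 0.4],
]
loading, scores = first_principal_component(profiles)
print("PC1 loading:", [round(x, 3) for x in loading])
print("PC1 scores:", [round(s, 3) for s in scores])
```

In this synthetic table the two groups fall on opposite sides of zero along PC1, which is the kind of clustering a PCA score plot is used to reveal.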
LASSO is a model selection method, and it can predict the phenotype by regression analysis of metabolites
[51]. LDA can classify the samples according to the source and maximize the linear separation of the classes
[52]. Kaitlyn M Mazzilli and colleagues evaluated the effects of different daily dietary intakes on serum metabolism using LASSO and found 102 related metabolites
[53]. Virgilio Gavicho Uarrota et al. identified the metabolic components of cassava postharvest physiological deterioration (PPD) through PCA and PLS-DA models and achieved good sample prediction
[42]. The results provided good evidence for the metabolic differentiation of cassava during PPD. Moreover, LDA and PCA, used in cluster analysis, were considered suitable methods for distinguishing sex differences from organ differences
[54]. For example, argininosuccinate showed significant differences between males and females in kidney tissue, and in the ventricle, males had significantly higher levels of free carnitine and total esterified carnitines than females
[54]. Thus, targeted metabolomics is a good technique for testing sex differences.
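To illustrate how LASSO acts as a model selection method, the following sketch fits a LASSO regression by coordinate descent with soft-thresholding. It assumes centred predictors and response with no intercept, the objective (1/2n)·||y − Xb||² + penalty·||b||₁, and synthetic data; the function names are hypothetical, and this is not the procedure of any cited study.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; exactly 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, sweeps=100):
    """Minimise (1/2n)*||y - X b||^2 + penalty*||b||_1 by cyclic coordinate descent.

    Assumptions: columns of X and y are centred, and no column is all zeros.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Prediction from all features except j (partial residual).
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, penalty) / scale
    return beta


# Synthetic data: the phenotype y follows metabolite 0; metabolite 1 is noise.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-6.1, -2.9, 0.0, 3.1, 5.9]
beta = lasso_coordinate_descent(X, y, penalty=0.5)
print(beta)
```

The L1 penalty sets the coefficient of the uninformative metabolite exactly to zero, which is how LASSO selects a subset of metabolites while fitting the regression.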