HM is a biofluid whose composition varies dynamically according to several factors, including lactation time, time of day, the course of each feed, maternal status, and environmental exposure. Although compositional variation has mainly been studied with regard to the protein content of HM [37], changes in other compound classes such as fat or vitamins have also been reported [38][39]. Considering the intrinsic variability of HM, the complexity of obtaining representative HM samples is not negligible. Sources of variation related to sample manipulation and compositional variation can be minimized using standard operating procedures (SOPs). SOPs are fundamental to maintaining quality assurance (QA) and quality control (QC) processes and facilitate repeatable and reproducible research within and across laboratories. However, biologically meaningful results across studies will only be obtained if several key factors during the sample collection process are successfully controlled. This is of special importance in untargeted approaches, where the interpretation of results is especially challenging and confounding factors introduced by an incomplete sampling protocol can be wrongly attributed to differences between subjects of a studied population. Conversely, biologically meaningful information can be missed because of unwanted bias introduced during sample collection.
Liquid-liquid extraction (LLE) is the classical extraction method employed in metabolomics and lipidomics. This method, developed by Folch et al. [40] in 1957, uses a chloroform-methanol mixture (2:1, v/v), which results in two distinct phases: an upper phase containing polar metabolites and a lower phase containing nonpolar metabolites. Subsequently, in 1959, Bligh and Dyer [41] developed a modified method that uses a miscible chloroform-methanol-water mixture, which is later separated into two phases by adding chloroform or water. Both approaches separate polar and nonpolar metabolites, thus allowing the analysis of a wide range of metabolites and making the extracts compatible with several analytical platforms. While Bligh and Dyer LLE is widely used in HM metabolomics studies (see Table 1) [13][16][17][18][19][24][25][29][32], only Andreas et al. [28] used a modified Folch extraction protocol for processing HM samples.
Methyl tert-butyl ether (MTBE) in combination with methanol has recently been proposed for single-phase extraction [27]. MTBE is a nontoxic and noncarcinogenic solvent and is therefore considered a safe and environmentally friendly alternative to the harmful solvents employed in traditional LLE methods, such as chloroform, which is a suspected human carcinogen. In this extraction method, a single phase containing both polar and nonpolar metabolites is obtained, with a protein pellet at the bottom (see Figure 3). Thus, the simultaneous analysis of the lipidome and metabolome in a very small amount of biological sample is achievable. This method has been successfully employed to determine polar metabolites and fatty acids (FAs) in HM by GC-MS [27][28], as well as lipids and polar metabolites by LC-MS [15][27][28], thus increasing metabolome coverage through the combined use of complementary analytical platforms.
Precipitation with organic solvents separates the polar and nonpolar metabolites from the proteins, which are pelleted by centrifugation and can then be easily removed. This simple method has been employed for the analysis of polar metabolites by GC-MS after derivatization [36] as well as for the analysis of polar and nonpolar metabolites by LC-MS without further pre-processing [18]. Furthermore, this approach has been implemented in more sophisticated workflows, as recently shown by Hewelt-Belka et al. [35]. Here, the authors combined LLE with a protein precipitation and solid-phase extraction (SPE) procedure to prepare HM samples, thereby enabling the detection of high- and low-abundance lipid species (e.g., glycerolipids and phospholipids) in one LC-MS run.
3. The HM Metabolome: Compound Annotation and Coverage
As in other areas of metabolomic research, compound identification is still a major bottleneck in data analysis and interpretation. The Metabolomics Standards Initiative (MSI) defines four levels of metabolite identification: identified metabolites (level 1); putatively annotated compounds (level 2); putatively characterized compound classes (level 3); and unknown compounds (level 4) [44]. Because the pure analytical standards required to reach level 1 are of limited availability, biological databanks and spectral databases are the most important resources for metabolite annotation (levels 2 and 3). A large number of databases are available today, providing different levels of information and complementary data on chemical structures, physicochemical properties, biological functions, and pathway mapping of metabolites [45]. The metabolomics community classifies these resources into several categories: (i) chemical databases; (ii) spectral libraries; (iii) pathway databases; (iv) knowledge databases; and (v) reference repositories [46].
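The four MSI levels can be read as a simple decision order, from the strongest evidence to the weakest. The following minimal Python sketch illustrates that ordering only; the `Evidence` flags and the `msi_level` function are hypothetical names introduced here, and the criteria are deliberately simplified (for example, they do not encode how many orthogonal properties must match a standard).

```python
from dataclasses import dataclass
from enum import IntEnum


class MSILevel(IntEnum):
    IDENTIFIED = 1         # identified metabolite
    PUTATIVE_COMPOUND = 2  # putatively annotated compound
    PUTATIVE_CLASS = 3     # putative compound class
    UNKNOWN = 4            # unknown compound


@dataclass
class Evidence:
    # Hypothetical, simplified evidence flags for one detected feature.
    matches_authentic_standard: bool = False  # pure standard was available
    matches_library_entry: bool = False       # database or spectral-library hit
    matches_class_pattern: bool = False       # only class-level characteristics


def msi_level(ev: Evidence) -> MSILevel:
    """Return the best (lowest-numbered) level supported by the evidence."""
    if ev.matches_authentic_standard:
        return MSILevel.IDENTIFIED
    if ev.matches_library_entry:
        return MSILevel.PUTATIVE_COMPOUND
    if ev.matches_class_pattern:
        return MSILevel.PUTATIVE_CLASS
    return MSILevel.UNKNOWN
```

Under these assumptions, a feature with only a database hit, `msi_level(Evidence(matches_library_entry=True))`, is assigned level 2, which is the typical outcome when no pure standard is available.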
Regarding HM metabolomics, the most frequently used databases and libraries are: Human Metabolome Database (HMDB) [47], Metabolite and Chemical Entity Database (METLIN) [48], National Institute of Standards and Technology (NIST) library, Fiehn RTL Library [49], LipidMAPS Structure Database (LMSD) [50], Milk Composition Database (MCDB) [51][52], Kyoto Encyclopedia of Genes and Genomes (KEGG) [53], MyCompoundID with the evidence-based metabolome library (EML) [54], Chenomx NMR Suite Profiles, and other online university databases, such as CEU Mass Mediator [55][56].
Metabolite assignment in NMR spectra has been performed based on literature data and commercial resonance databases, such as Chenomx NMR Suite Profiles. Metabolite annotations were checked against in-house libraries containing pure compound spectra. Some of the proposed assignments were confirmed by two-dimensional NMR experiments, such as Correlation Spectroscopy (COSY) [13][29][31][32], Total Correlation Spectroscopy (TOCSY) [13][31][32][34], Diffusion-Ordered Spectroscopy (DOSY) [32], Heteronuclear Single Quantum Coherence Spectroscopy (HSQC) [32][34], and Heteronuclear Multiple Bond Correlation (HMBC) [32].
In LC-MS and CE-MS-based studies of the HM metabolome, tentative metabolite annotation has been carried out by matching accurate masses, isotopic profiles, and/or fragmentation patterns to candidate metabolites in online databases such as KEGG, METLIN, LipidMAPS, and HMDB [18][24][25][27][28][35]. In-house databases generated by the analysis of commercial standards are also commonly employed [24][25]. In GC-MS, retention index (RI) corrections are made by analyzing a fatty acid methyl ester (FAME) mixture standard solution and assigning a match score between the experimental RI values of the FAME mixture and the theoretical RI values contained in the Fiehn RTL library. In addition, metabolites have been annotated by comparing their mass fragmentation patterns with those available in the Fiehn RTL and NIST libraries [13][17][18][19][27][28][36].
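The accurate-mass matching step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of any cited study: the candidate list is a tiny stand-in for a real database query, the function names are hypothetical, a 5 ppm tolerance is assumed, and only the protonated adduct [M+H]+ is considered. The neutral monoisotopic masses are calculated from the molecular formulas given in the comments.

```python
PROTON_MASS = 1.007276  # Da, added to the neutral mass for the [M+H]+ adduct

# Neutral monoisotopic masses (Da) calculated from the molecular formulas.
# This small dictionary is a stand-in for a database such as HMDB.
CANDIDATES = {
    "glucose": 180.06339,      # C6H12O6
    "galactose": 180.06339,    # C6H12O6, isomer of glucose
    "tyrosine": 181.07389,     # C9H11NO3
    "creatine": 131.06948,     # C4H9N3O2
    "citric acid": 192.02700,  # C6H8O7
}


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate(mz: float, tol_ppm: float = 5.0, adduct_shift: float = PROTON_MASS):
    """Return (name, ppm error) pairs for candidates within tol_ppm of mz."""
    hits = []
    for name, neutral_mass in CANDIDATES.items():
        theoretical_mz = neutral_mass + adduct_shift
        err = ppm_error(mz, theoretical_mz)
        if abs(err) <= tol_ppm:
            hits.append((name, round(err, 2)))
    # Smallest absolute error first.
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

With this candidate list, `annotate(181.0707)` returns both glucose and galactose, because isomers share the same accurate mass. This is why mass matching alone yields only a tentative annotation and is usually combined with isotopic profiles, fragmentation patterns, or in-house standards.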
A comprehensive list of annotated and/or identified metabolites in HM from untargeted metabolomics studies [14][15][17][18][19][21][22][23][24][25][26][27][28][29][31][32][33][34][35][36] is reported. The corresponding table contains information about the metabolites reported in each reference, such as their molecular formula, IDs (LipidMAPS and/or HMDB IDs), the extraction procedure performed, the analytical platform used, and the detected metabolite class. Readers can select metabolites dynamically by filtering the data according to these fields. A total of 1187, 111, and 128 metabolites were reported using LC-MS, GC-MS, and NMR, respectively (see Figure 2). As shown in the Venn diagram, LC-MS and GC-MS allowed the detection of 36 common metabolites (mainly carbohydrates and FAs); a total of 29 metabolites overlapped between LC-MS and NMR (principally oligosaccharides); and 21 metabolites (predominantly amino acids and organic acids) were commonly reported in GC-MS and NMR-based studies. Only 13 metabolites were reported by all three platforms, i.e., creatine, tyrosine, arabinose, galactose, glucose, lactose, maltose, capric acid/caprate, caprylic acid/caprylate, citric acid/citrate, pyruvic acid/pyruvate, hippuric acid/hippurate, and myo-inositol. These metabolites were assigned to different classes including amino acids, carbohydrates, FAs, and organic acids.
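The platform overlaps summarized in the Venn diagram reduce to set operations on the per-platform metabolite lists. The sketch below shows one way to compute them in Python. The 13 shared metabolites are those named in the text; every other entry is a hypothetical placeholder, and `venn_regions` is an illustrative helper rather than part of any published workflow. The helper separates metabolites shared by exactly two platforms from those shared by all three, since the text does not state whether the pairwise counts include the 13 common metabolites.

```python
# The 13 metabolites listed in the text as reported by all three platforms.
SHARED_BY_ALL = {
    "creatine", "tyrosine", "arabinose", "galactose", "glucose", "lactose",
    "maltose", "capric acid", "caprylic acid", "citric acid", "pyruvic acid",
    "hippuric acid", "myo-inositol",
}

# Hypothetical placeholder names stand in for all remaining entries.
lcms = SHARED_BY_ALL | {"lcms_gcms_example", "lcms_nmr_example", "lcms_only_example"}
gcms = SHARED_BY_ALL | {"lcms_gcms_example", "gcms_nmr_example", "gcms_only_example"}
nmr = SHARED_BY_ALL | {"lcms_nmr_example", "gcms_nmr_example", "nmr_only_example"}


def venn_regions(a: set, b: set, c: set) -> dict:
    """Split three sets into the seven disjoint regions of a Venn diagram."""
    return {
        "all_three": a & b & c,
        "a_and_b_only": (a & b) - c,
        "a_and_c_only": (a & c) - b,
        "b_and_c_only": (b & c) - a,
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
    }


regions = venn_regions(lcms, gcms, nmr)
# Number of metabolites in each region, e.g. 13 for "all_three".
region_sizes = {name: len(members) for name, members in regions.items()}
```

In practice, metabolite names would need to be harmonized first (for example, via HMDB or LipidMAPS IDs), because the same compound may appear as an acid in one study and as its conjugate base in another.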
Based on the available data from the literature, the distribution of metabolite classes present in HM according to each technique was assessed. As can be seen in Figure 3, the metabolite classes detected by LC-MS differ markedly from those detected by GC-MS and NMR. Using GC-MS and NMR, carbohydrates are the most frequently reported metabolites in HM, followed by amino acids, organic acids, organooxygen compounds, and organoheterocyclic compounds, with all these metabolite classes being less frequently reported in LC-MS studies. In the case of NMR, organonitrogen compounds have also been reported, as well as nucleosides and nucleotides on a smaller scale. In the case of lipid classes, fatty acyls have been identified by LC-MS and GC-MS with similar incidence and to a lesser extent by NMR. Lipid classes are clearly studied more comprehensively by LC-MS assays, where glycerophospholipids, glycerolipids, and fatty acyls are reported most frequently, followed by sphingolipids, sterol lipids, and, to a lesser extent, prenol lipids.
Table 2 shows a list of metabolites reported in > 80% of studies employing either LC-MS, GC-MS, or NMR-based assays. This table is intended to aid method development of future untargeted metabolomics workflows tailored to the study of the HM metabolome, as it provides a shortlist of metabolites that should be detected by each platform regardless of the instrumental settings employed. It should be noted that, owing to the high versatility of LC-MS, there is greater variation in the metabolites recorded and, in turn, the list of metabolites consistently reported in HM across studies is shorter than for NMR and GC-MS, where differences in experimental conditions, detection parameters, and instruments are smaller. Again, this table illustrates the high orthogonality between the metabolites detected using NMR and LC-MS. While LC-MS is clearly advantageous for the measurement of different lipids, NMR provides information on amino acids and small organic acids. The metabolome coverage provided by GC-MS falls between those of the other two platforms, consistently providing information on lipids, sugars, amino acids, and organic acids.
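A shortlist of this kind can be derived by counting, for each platform, the fraction of studies that report a given metabolite and keeping those above the 80% cut-off. The following Python sketch shows the calculation under stated assumptions: the study labels and metabolite lists are hypothetical, and `consistently_reported` is an illustrative helper, not the procedure used to build Table 2.

```python
from collections import Counter


def consistently_reported(studies: dict, threshold: float = 0.8) -> dict:
    """Return {metabolite: fraction of studies} for fractions above threshold."""
    n_studies = len(studies)
    if n_studies == 0:
        return {}
    # set() ensures a metabolite is counted at most once per study.
    counts = Counter(m for reported in studies.values() for m in set(reported))
    return {
        metabolite: count / n_studies
        for metabolite, count in counts.items()
        if count / n_studies > threshold
    }


# Hypothetical example: five studies on one platform.
example_studies = {
    "study_A": ["lactose", "citrate", "alanine"],
    "study_B": ["lactose", "citrate"],
    "study_C": ["lactose", "citrate", "alanine"],
    "study_D": ["lactose", "alanine"],
    "study_E": ["lactose", "citrate", "alanine"],
}
shortlist = consistently_reported(example_studies)
```

In this toy example only lactose is retained: citrate and alanine each appear in four of five studies (exactly 80%), which does not exceed the strict "> 80%" criterion.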