Soy has been recognized as a medicinal plant because its various parts contain several bioactive compounds. For example, bioactive peptides found in soybeans have been linked to human health benefits, with potential anti-hypertensive, anti-cancer, and anti-inflammatory properties. Anthocyanins, another type of bioactive compound identified in soybeans, have shown anti-obesity and anti-inflammatory properties. Isoflavonoids, the best-known class of compounds found in all parts of soy, have been studied for their potential protective effects related to chronic diseases, cancer, osteoporosis, and menopausal symptoms. Different factors modulate a plant's metabolism, and metabolomics can measure these variations qualitatively and quantitatively by analyzing the production and turnover of primary and secondary (specialized) metabolites. In soy, metabolomics studies have identified four main causes of metabolic change: genetic modifications, organism interactions, growth stages, and abiotic factors.
Plants have been used to produce food, feed, energy, and biomaterials, and also as a source of bioactive compounds. Metabolomics has emerged as one of the principal contributors to the identification of these compounds, generating innovative discoveries and supporting the development of novel products. Advances in efficient extraction techniques, such as ultrasound-, microwave-, and pulsed-electric-field-assisted extraction, as well as supercritical fluid and pressurized liquid extraction, among others, yield extracts with higher yield and bioactivity. These extracts can then be analyzed with one or more powerful chromatographic and/or electrophoretic techniques coupled to high-resolution mass spectrometry (MS) or nuclear magnetic resonance (NMR), producing accurate chemical information on a vast number of compounds. For metabolite identification, databases are increasingly updated and cross-linked with information from different libraries; Sorokina and Steinbeck list almost one hundred databases useful for natural product research. In addition, Global Natural Products Social Molecular Networking (GNPS) and Small Molecule Accurate Recognition Technology (SMART 2.0) are examples of bio-cheminformatics tools for the analysis of MS and NMR data, respectively. Together, these modern techniques and tools continue to push the frontiers of metabolomics.
In 2019, 8.3 billion metric tons of cereals, oil crops, roots and tubers, sugar crops, and vegetables were produced. However, an estimated one-third of food production is lost or wasted, a problem addressed by Target 12.3 of the Sustainable Development Goals (SDGs), the set of 17 goals adopted by the United Nations (UN). In this context, foodomics has shown the potential of not only foods but also their by-products as sources of compounds with human health benefits (Figure 1). For example, Katsinas et al. used supercritical carbon dioxide and pressurized liquid extraction to valorize olive pomace, a by-product of the olive oil industry, identifying several phenolic compounds and generating bioactive extracts. Assirati et al. applied a metabolomics approach to the chemical investigation of the three major solid by-products of sugarcane (Saccharum officinarum), identifying up to 111 metabolites in a single matrix; several of these compounds, such as 1-octacosanol, octacosanal, orientin, and apigenin-6-C-glucosylrhamnoside, are already known for their potent bioactivity. Sánchez-Martínez et al. showed that terpenes from orange (Citrus sinensis) juice by-products have antioxidant and neuroprotective potential in in vitro assays. Moreover, some terpenes of the orange extract showed a high capacity to cross the blood–brain barrier, a critical requirement for treating Alzheimer's disease.
Figure 1. Foodomics proposes a holistic approach to develop ingredients and products with health benefits from foods and their by-products.
Soy, also known as soybean (Glycine max (L.) Merr.), originated in China and East Asia. It is the major oilseed crop worldwide, with a world production of 362, 254, and 61 million metric tons of soy grains, meal, and oil, respectively, in 2020/21. In the same period, the global harvested area was 1.28 million km², about 2.5 times the area of Spain. Figure 2 shows soybean production from 2000/01 to 2020/21, which grew consistently with only a few declines. However, this production involves just one part of G. max: the beans. Krisnawati and Adie analyzed 29 soybean genotypes and found an average straw:grain ratio of 1.65. Applying this ratio to the 2020/21 grain production (1.65 × 362 million metric tons), about 597 million metric tons of soy branches, leaves, pods, and roots are estimated to have been left on the ground after harvest. Figure 3 shows the soil of a no-tillage soybean field, a system in which all underused soy parts are left on the ground. Keeping these materials on the soil contributes to its mineral content, organic matter, and moisture. However, higher weed and disease infestations, as well as greenhouse gas emissions from the decomposition of organic matter, call for alternative management of agricultural straw. By applying a biorefinery approach, such by-products could be transformed into raw material for the extraction of several bioactive compounds.
Figure 2. World soybean production 2000–2020, in million metric tons.
Figure 3. Underused soy parts left on the soil just after the soybean harvest.
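The residue and area figures quoted for 2020/21 follow from simple arithmetic. The short Python sketch below reproduces them; the grain production, straw:grain ratio, and harvested area are taken from the text, while the area of Spain (about 505,990 km²) is an assumed reference value that the source does not state.

```python
# Worked check of the 2020/21 residue and area estimates discussed above.
# From the text: grain production, straw:grain ratio, harvested area.
# ASSUMPTION: Spain's total area is taken as ~505,990 km² (not given in the source).

GRAIN_PRODUCTION_MT = 362.0   # million metric tons of soy grains, 2020/21
STRAW_TO_GRAIN_RATIO = 1.65   # mean of 29 genotypes (Krisnawati and Adie)
HARVESTED_AREA_KM2 = 1.28e6   # global harvested area, 2020/21
SPAIN_AREA_KM2 = 505_990      # assumed reference value for Spain


def residue_estimate(grain: float, ratio: float) -> float:
    """Residue (straw) mass implied by a straw:grain ratio, in the units of `grain`."""
    return grain * ratio


residue_mt = residue_estimate(GRAIN_PRODUCTION_MT, STRAW_TO_GRAIN_RATIO)
area_ratio = HARVESTED_AREA_KM2 / SPAIN_AREA_KM2

print(f"Estimated residue: {residue_mt:.1f} million metric tons")
print(f"Harvested area / area of Spain: {area_ratio:.2f}")
```

Multiplying grain mass by the straw:grain ratio gives roughly 597 million metric tons, and the area ratio rounds to about 2.5, matching the figures in the text.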
Inspired by the potential of underused soy parts, this review aims to show how metabolomics has been applied to soy analysis, highlighting the potential of these by-products as sources of high-added-value compounds and the factors that affect their production.
Genetic modifications can involve different soybean species and cultivars or varieties. Lu et al. investigated the metabolic changes in two soybean species (Glycine max and Glycine soja) under salt stress. Using gas chromatography coupled to mass spectrometry (GC–MS) and liquid chromatography coupled to Fourier transform mass spectrometry (LC–FT/MS), the authors found higher contents of hormones, reactive oxygen species, and other substances related to salt stress. In another study, a ¹H NMR-based metabolomics approach revealed that Glycine max and Glycine gracilis present different secondary metabolite profiles during growth. Advances in molecular biology have enabled the development of a wide range of soybean cultivars or varieties, including plants resistant to insects, abiotic stress, and other factors; a search for "soybean cultivar" or "soybean variety" in the United States Patent and Trademark Office (USPTO) database returns 4869 patents. Differently colored soybeans, such as brown, yellow, or black, present specific metabolite profiles. Isoflavones could be the substrate for proanthocyanidin production in the seed coat, a possible cause of the brown color of the Mallikong mutant cultivar. Yang et al. found higher levels of anthocyanin and protein in the yellow-cotyledon seeds of black soybean, whereas higher levels of isoflavone, stearic acid, and polysaccharide were associated with its green-cotyledon seeds. Two Korean soy cultivars, Sojeongja and Haepum, presented different levels of soyasaponins Aa and Ab, whose production is related to specific gene variations. Transgenic soybeans are another important aspect of genetic modification. García-Villalba et al. used capillary electrophoresis time-of-flight mass spectrometry (CE-TOF-MS) to measure the metabolites of transgenic and non-transgenic soybeans qualitatively and quantitatively and found similar types and amounts of metabolites in both. Harrigan et al. and Clarke et al. reached the same conclusion. However, transgenic soybeans have been reported to be less affected by generational effects and to present more secondary metabolites, such as prenylated isoflavones.
The interaction between soy and microorganisms, nematodes, aphids, and other insects also causes distinct metabolic responses, and metabolomics is a valuable approach for understanding such changes, providing insights to improve soy's response to biotic factors. Recent works used GNPS to identify metabolite variations in soy infected by the fungus Phakopsora pachyrhizi and the nematode Aphelenchoides besseyi. Both pathogens increased the production of bioactive compounds such as flavonoids, isoflavonoids, and terpenoids.
Distinct metabolic responses have also been reported for each soybean growth stage. During germination, 58 metabolites, such as phytosterols, isoflavones, and soyasaponins, were reported in the separated parts of soy sprouts. As described by Song et al., the production of secondary metabolites such as daidzein, genistein, and coumestrol also changed between the vegetative and reproductive stages of soybean.
The presence of soybean crops across a wide range of latitudes and longitudes reflects several adaptive changes in their metabolism. Brazil, the largest soybean producer, has diverse soil and climate types; even so, soy is produced in all its regions, which illustrates the high performance of soybean under varied abiotic conditions. In addition, fertilizers and other agricultural inputs have been tested for cultivating soybeans under unfavorable conditions, causing additional modifications in soy metabolism. As an example of such external treatments, ethylene application on soybean leaves increased the production of genistin, daidzin, malonylgenistin, and malonyldaidzin. Using two ionization methods, electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI), coupled to Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS), Yilmaz et al. analyzed the metabolite profile of soy leaves from midsummer to autumn and found decreasing levels of chlorophyll-related metabolites and increasing levels of disaccharides over that period. Another metabolomics study analyzed soy leaves from crops in different geographical locations and identified different amounts of metabolites such as pinitol and flavonoids. An excellent review by Feng et al. summarizes the use of metabolomics in soy under abiotic stress.
In addition to the four main causes of change in soy metabolism mentioned above, qualitative and quantitative metabolic variations among soy organs are expected. To present an overview of the metabolite profiles of underused soy parts, we selected metabolomics and related works that used various approaches to analyze them. Using JChem (JChem for Excel 126.96.36.1997, ChemAxon (https://www.chemaxon.com, accessed on 8 April 2021)) and ClassyFire, we organized and classified the metabolites identified in soy roots, leaves, branches, and pods. Figure 4 summarizes the best-known classes of bioactive compounds identified in underused soy parts. Carboxylic acids and derivatives, such as amino acids, peptides, and analogues, are the most frequently reported class of compounds. This class is mainly composed of primary metabolites, but it also contains several bioactive compounds. Similarly, the organooxygen compounds and fatty acyls include metabolites with human health benefits. Isoflavonoids, the most frequently reported class of secondary metabolites, as well as prenol lipids and flavonoids, have been suggested to have a wide range of medicinal uses. Among secondary metabolites, prenol lipids are the most frequently identified class in soy roots, where several soyasaponins have been found. In soy leaves, different subclasses of isoflavonoids have been found, such as isoflavonoid O-glycosides, isoflavans, and isoflav-2-enes. The metabolite profiles of soy branches and pods have been less studied; however, approximately 20 flavonoids and isoflavonoids have been identified in each. Other classes of compounds, such as steroids and steroid derivatives, coumarins and derivatives, and cinnamic acids and derivatives, have also been found in underused soy parts.
Figure 4. Classification of the metabolites identified in soy roots, leaves, branches, and pods according to ClassyFire.
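The organization step behind Figure 4 amounts to grouping each reported metabolite by organ and chemical class and then counting. The Python sketch below illustrates such a tally under stated assumptions: the records are a small, hypothetical subset built from compounds named in this review, the class labels are assumed ClassyFire class-level names, and the function names are illustrative. It does not reproduce the data or counts behind Figure 4 and does not call JChem or ClassyFire.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL subset: (organ, compound, assumed ClassyFire class) for compounds
# named in this review. It is NOT the dataset behind Figure 4.
RECORDS = [
    ("roots", "soyasapogenol A", "Prenol lipids"),
    ("roots", "linoleic acid", "Fatty Acyls"),
    ("roots", "naringenin", "Flavonoids"),
    ("roots", "formononetin-7-O-glucoside", "Isoflavonoids"),
    ("leaves", "apigenin", "Flavonoids"),
    ("leaves", "glyceollin I", "Isoflavonoids"),
    ("leaves", "soyasaponin I", "Prenol lipids"),
    ("branches", "shikimic acid", "Organooxygen compounds"),
    ("branches", "stearic acid", "Fatty Acyls"),
    ("branches", "glycitin", "Isoflavonoids"),
    ("pods", "citric acid", "Carboxylic acids and derivatives"),
    ("pods", "quercetin", "Flavonoids"),
    ("pods", "hexadecanoic acid", "Fatty Acyls"),
]


def tally_by_organ(records):
    """Count distinct metabolites per chemical class for each organ."""
    seen = set()
    counts = defaultdict(Counter)
    for organ, compound, chem_class in records:
        key = (organ, compound.lower())
        if key in seen:  # a compound reported twice for one organ counts once
            continue
        seen.add(key)
        counts[organ][chem_class] += 1
    return counts


def tally_overall(counts_by_organ):
    """Sum the per-organ class counts into a single Counter."""
    total = Counter()
    for organ_counts in counts_by_organ.values():
        total.update(organ_counts)
    return total


by_organ = tally_by_organ(RECORDS)
for organ, class_counts in by_organ.items():
    print(organ, class_counts.most_common())
print("all parts", tally_overall(by_organ).most_common())
```

Keying the deduplication on (organ, compound) means a metabolite reported by several studies for the same organ is counted once, while a compound reported in every organ, such as daidzein, would be counted once per organ.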
Table 1 presents 38 isoflavonoids identified in one or more of the underused soy parts mentioned above. Eight of them (daidzein, genistein, glycitein, daidzin, genistin, glycitin, malonyldaidzin, and malonylgenistin) were reported in all soy organs. Recent works showed promising biological activities of daidzein against colon cancer and hepatitis C virus. Daidzin, a glyco-conjugate of daidzein, presented therapeutic properties against multiple myeloma and epilepsy. Bioactivity studies of the other compounds listed above also found properties against chronic vascular inflammation, human gastric cancer, breast cancer, and degenerative joint diseases. Biochanin A, coumestrol, glyceollin, medicarpin, and ononin are further examples of widely known bioactive isoflavonoids found in different soy organs (see Table 1 for a summary). Carneiro et al. quantified six isoflavones in soy branches, leaves, pods, and beans collected just after mechanical harvesting. Almost 3 kg of isoflavones were found per metric ton of soy leaves, whereas less than 1 kg per metric ton was found in soy branches and pods. Soybeans, the main product of the soy plant, contained approximately 2 kg per metric ton.
Table 1. Isoflavonoids identified in soy branches (B), leaves (L), pods (P), and roots (R).
|biochanin A 7-O-D-glucoside||C22H22O10||X|||
|biochanin A 7-O-glucoside-6′′-O-malonate||C25H24O13||X|||
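Because 1 kg per metric ton equals 1 mg/g (0.1% w/w), the isoflavone contents reported by Carneiro et al. can be restated in units common in phytochemical work. The Python sketch below performs these conversions and compares leaves with beans, assuming the rounded values quoted in the text; the value of 1 kg/t for branches and pods is only the stated upper bound, and the dictionary and function names are illustrative.

```python
# Rounded isoflavone contents quoted in the text (kg per metric ton; basis not stated).
# ASSUMPTION: "almost 3" -> 3.0, "approximately 2" -> 2.0, "less than 1" -> upper bound 1.0.
ISOFLAVONES_KG_PER_T = {
    "leaves": 3.0,
    "beans": 2.0,
    "branches (upper bound)": 1.0,
    "pods (upper bound)": 1.0,
}

G_PER_METRIC_TON = 1_000_000  # 1 t = 1,000 kg = 1,000,000 g
MG_PER_KG = 1_000_000         # 1 kg = 1,000,000 mg


def to_mg_per_g(kg_per_t: float) -> float:
    """Convert kg of analyte per metric ton of material to mg per g."""
    return kg_per_t * MG_PER_KG / G_PER_METRIC_TON


def to_percent_w_w(kg_per_t: float) -> float:
    """Convert kg per metric ton (1,000 kg) to a mass percentage."""
    return kg_per_t / 1_000 * 100


for part, value in ISOFLAVONES_KG_PER_T.items():
    print(f"{part}: {to_mg_per_g(value):.1f} mg/g, {to_percent_w_w(value):.2f} % w/w")

leaf_to_bean = ISOFLAVONES_KG_PER_T["leaves"] / ISOFLAVONES_KG_PER_T["beans"]
print(f"leaves vs. beans: {leaf_to_bean:.1f}x")
```

Under these assumptions, leaves hold roughly 1.5 times as much isoflavone per unit mass as the beans, about 0.3% versus 0.2% w/w.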
Several prenol lipids recognized for their bioactivity have already been identified in soy. Tsuno et al. identified several soyasaponins, sapogenins, and isoflavones in soy root exudates. Soyasaponins have been linked to anti-obesity, anti-oxidative stress, and anti-inflammatory properties, as well as to preventive effects on hepatic triacylglycerol accumulation. Omar et al. reported potent inhibitory effects of soyasapogenol A, a triterpenoid, against aggressive p53-deficient malignancies. In addition, compounds of other classes, such as fatty acyls, isoflavonoids, and flavonoids, have been identified in soy roots. Linoleic acid, naringenin, and formononetin-7-O-glucoside, examples of these classes, have been related to cardiovascular health, neuroprotective effects, and anti-inflammatory properties. The chemical structures of these bioactive compounds are presented in Figure 5.
Figure 5. Chemical structures of soyasapogenol A, linoleic acid, naringenin, and formononetin-7-O-glucoside, which are examples of bioactive compounds identified in soy roots.
Leaves and roots are the most-studied underused soy parts. A total of 259 metabolites from 32 classes have been identified in soy leaves, almost 90 of which are flavonoids, isoflavonoids, or prenol lipids. Widely known bioactive flavonoids such as apigenin, kaempferol, and rutin were also identified. Apigenin has been suggested as a potential anticancer agent. Glyceollin I, an isoflavonoid, and soyasaponin I, a prenol lipid, presented activities against breast cancer and Parkinson's disease, respectively. Different soyasaponins and even trigonelline, an alkaloid, were also found in leaves. Trigonelline has been reported to have potential for lung cancer therapy, memory function recovery, and anti-obesity effects. Figure 6 shows the chemical structures of these metabolites.
Figure 6. Chemical structures of apigenin, glyceollin I, soyasaponin I, and trigonelline, which are examples of bioactive compounds identified in soy leaves.
In soy branches, 197 compounds have been identified so far. The most widely reported class among these metabolites is the organooxygen compounds (53 compounds), which include alcohols and polyols, carbohydrates and carbohydrate conjugates, and carbonyl compounds. Shikimic acid, an organooxygen compound, has been linked to therapeutic effects in osteoarthritis. Metabolites of other classes, such as succinic and stearic acids, presented an apoptotic effect in T-cell acute lymphoblastic leukemia and antifibrotic activity, respectively. Flavonoids and isoflavonoids, such as 7,4′-dihydroxyflavone and glycitin, presented activity against lung diseases. The chemical structures of shikimic acid, stearic acid, 7,4′-dihydroxyflavone, and glycitin are shown in Figure 7.
Figure 7. Chemical structures of shikimic acid, stearic acid, 7,4′-dihydroxyflavone, and glycitin, which are examples of bioactive compounds identified in soy branches.
As with branches, few metabolomics works have identified pod metabolites. Amino acids, peptides, and mono-, di-, and tricarboxylic acids and their derivatives are the most frequently reported compounds in pods, and some of them, such as citric and fumaric acids, are already widely used in industry. Moreover, specialized metabolites identified in soy pods, such as camphene and α-pinene, have shown effects against skeletal muscle atrophy and neuroprotective effects, respectively. Quercetin, a widely known flavonoid, may be a potential anti-inflammatory treatment for patients with COVID-19, as described by Saeedi-Boroujeni and Mahmoudian-Sani. Hexadecanoic acid, a fatty acyl, presented an inhibitory effect on HT-29 human colon cancer cells. Figure 8 presents the chemical structure of one compound from each of the classes mentioned. In addition, fatty acyls, flavonoids, isoflavonoids, and other classes of compounds have been identified in pods.
Figure 8. Chemical structures of citric acid, camphene, quercetin, and hexadecanoic acid, which are examples of bioactive compounds identified in soy pods.