2. Major Cash Crops with High PA Contents
A total of 10 high-PA-content cash crops with currently available genomic databases were selected: almond (Prunus dulcis), apple (Malus domestica), blueberry (Vaccinium corymbosum), cacao bean (Theobroma cacao), common bean (Phaseolus vulgaris), grape (Vitis vinifera), peanut (Arachis hypogaea), soybean (Glycine max), strawberry (Fragaria × ananassa), and tea tree (Camellia sinensis) (Table 1). These crops play a critical role in the global food market and cosmetic industry. According to statistical data from the Food and Agriculture Organization of the United Nations (www.fao.org, accessed on 5 February 2021), apples, grapes, and soybeans ranked among the top 20 commodities by production in the USA in 2019. Almonds and blueberries were also produced in the USA in 2019, at about 193.7 kt and 30.9 kt, respectively. Peanuts and strawberries are important commercial crops in several countries, including China, India, Indonesia, and the USA. In China, which has the largest market, the trade volumes of peanuts and strawberries were USD 471 million and USD 75 million, respectively, in 2019. Cacao beans are consumed worldwide as an ingredient in a variety of processed products, such as chocolate, cocoa powder, and cocoa butter, at a volume of 3991 kt in 2016 (the latest available data), as reported by Statista (www.statista.com/, accessed on 6 February 2021).
The PA content ranged from 145.0 to 3532.2 mg/100 g among the 10 species (Table 1). Grape seeds have been reported to have the highest level of PAs (3532.2 mg/100 g), followed by cacao and common beans (1460.0 mg/100 g and 1000.1 mg/100 g, respectively) [2][14][23]. The levels of PAs detected in the 10 species can vary depending on the cultivars or measuring conditions used in each study, and the amount actually absorbed by humans can change depending on how the food is consumed. For example, fresh grapes have much lower levels of PAs (0.05 g/100 g) than grape seeds (3532.2 mg/100 g) [2], and in soybeans, seed coats contained significantly higher contents of PAs than the embryos [24]. In common beans, the levels of anthocyanins and PAs vary with the color of the bean coat, which depends on the genotype [23][25].
Table 1. List of 10 major crops containing high levels of proanthocyanidins and their reference genomic information. Among major crops with high contents of PAs, the species with currently available genomic databases were selected for this study.
| Species | Proanthocyanidin Content (mg/100 g) | Reference (PA Content) | Genome Database | Reference (Genome) | Assembly Size (Mb) | Coverage (%) | Contig N50 (kb) | Number of Genes Predicted |
|---|---|---|---|---|---|---|---|---|
| Almond (Prunus dulcis) | 184 | Prior & Gu, 2005 [14] | Prunus dulcis Lauranne Genome v1.0 (http://rosaceae.org/, accessed on 12 February 2021) | Alioto et al., 2020 [26] | 227 | 95 | 103 | 27,969 |
| Apple tree (Malus domestica) | 162 | Hellström et al., 2009 [2] | (iris.angers.inra.fr/gddh13/, accessed on 12 February 2021) | Daccord et al., 2017 [27] | 643 | 100 | 620 | 42,140 |
| Blueberry (Vaccinium corymbosum) | 255 | Prior & Gu, 2005 [14] | V_corymbosum v1.0 (http://gigadb.org/, accessed on 12 February 2021) | Colle et al., 2019 [28] | 1680 | 102 | 15 | 32,140 |
| Cacao bean (Theobroma cacao) | 1460 | Hellström et al., 2009 [2] | Cacao Matina1-6 Genome v2.1 (http://cacaogenomedb.org, accessed on 12 February 2021) | Publication in progress (http://cacaogenomedb.org) | 346 | 80 | 1080 | 27,379 |
| Common bean (Phaseolus vulgaris) | 1000 | Kan et al., 2016 [23] | Phaseolus vulgaris v2.1 (http://phytozome.jgi.doe.gov/, accessed on 12 February 2021) | Schmutz et al., 2014 [29] | 600 | 80 | 1900 | 27,433 |
| Grape (Vitis vinifera) | 3532 | Prior & Gu, 2005 [14] | Vitis vinifera v2.1 (http://phytozome.jgi.doe.gov/, accessed on 12 February 2021) | Jaillon et al., 2007 [30] | 487 | 102 | 566 | 26,346 |
| Peanut (Arachis hypogaea) | 186 | Hellström et al., 2009 [2] | (http://peanutgr.fafu.edu.cn/, accessed on 12 February 2021) | Zhuang et al., 2019 [31] | 2538 | 94 | 1509 | 83,709 |
| Soybean (Glycine max) | 300 | Lee et al., 2017 [24] | Glycine max Wm82.a4 (http://www.soybase.org, accessed on 12 February 2021) | Schmutz et al., 2010 [32] | 1150 | 95 | 1492 | 46,430 |
| Strawberry (Fragaria × ananassa) | 145 | Prior & Gu, 2005 [14] | (https://datadryad.org/, accessed on 12 February 2021) | Edger et al., 2019 [33] | 813 * | 99 | 79 | 108,087 |
| Tea tree (Camellia sinensis) | 189 | Engelhardt et al., 2003 [12] | (http://tpia.teaplant.org/, accessed on 12 February 2021) | Xia et al., 2019 [34] | 2890 | 95 | 67 | 53,512 |
* The assembly size given for Fragaria × ananassa is the haploid genome size.
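The PA ranking and the coverage figures can be recomputed directly from Table 1. The following is a minimal sketch, assuming the PA content and coverage values are transcribed as listed in the table; the names `TABLE1`, `rank_by_pa`, and `coverage_summary` are hypothetical and are not part of the original study.

```python
# Values transcribed from Table 1: (species, PA content in mg/100 g, coverage in %).
TABLE1 = [
    ("almond", 184, 95),
    ("apple", 162, 100),
    ("blueberry", 255, 102),
    ("cacao bean", 1460, 80),
    ("common bean", 1000, 80),
    ("grape", 3532, 102),
    ("peanut", 186, 94),
    ("soybean", 300, 95),
    ("strawberry", 145, 99),
    ("tea tree", 189, 95),
]


def rank_by_pa(rows):
    """Return species names ordered from highest to lowest PA content."""
    return [name for name, pa, _ in sorted(rows, key=lambda r: r[1], reverse=True)]


def coverage_summary(rows):
    """Return lowest, highest, and mean coverage, with the species at each extreme."""
    coverages = [cov for _, _, cov in rows]
    low, high = min(coverages), max(coverages)
    return {
        "lowest": (low, sorted(n for n, _, c in rows if c == low)),
        "highest": (high, sorted(n for n, _, c in rows if c == high)),
        "mean": sum(coverages) / len(coverages),
    }


if __name__ == "__main__":
    print(rank_by_pa(TABLE1)[:3])
    print(coverage_summary(TABLE1))
```

Keeping the table as plain tuples makes it easy to re-check the summary whenever a newer assembly replaces one of the rows.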
3. Identification of Orthologous Genes Involved in PA Biosynthesis in Major Cash Crops
Despite the nutritional importance of flavonoids, key enzymes involved in these pathways have not been identified in many major food crops with high flavonoid contents (Table 1). To characterize the PA biosynthetic pathways in the selected species, orthologous genes encoding the key enzymes in these pathways were searched in the latest reference genomic database of each species (Table 1). N50 and genome coverage are two of the most important metrics for evaluating the quality of a genome assembly. N50 is the length of the shortest sequence (contig, scaffold, super-scaffold, or pseudomolecule) in the set of longest sequences that together account for at least 50% of the total assembly length, and coverage is the total assembly size expressed as a percentage of the reference genome size. In this study, contig N50 varied widely, ranging from 15 kb (blueberry) to 1900 kb (common bean), and the highest, lowest, and mean genome coverage were 102% (blueberry and grape), 80% (common bean and cacao), and 94%, respectively (Table 1).
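To make the two quality metrics concrete, the sketch below computes N50 using its commonly used definition (the length of the shortest sequence among the longest sequences that together cover at least half of the assembly) and coverage as a simple percentage. The function names and the contig lengths in the usage example are hypothetical and are not taken from any of the assemblies in Table 1.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running sum reaches at least half of the total length; the length
    of the sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length


def coverage_percent(assembly_size, genome_size):
    """Return the assembly size as a percentage of the reference genome size."""
    return 100.0 * assembly_size / genome_size


if __name__ == "__main__":
    # Hypothetical contig lengths in kb (total 290, half 145).
    contigs = [80, 70, 50, 40, 30, 20]
    print(n50(contigs))
    print(coverage_percent(95, 100))
```

Because the assembly size can exceed the estimated genome size, for example when haplotypes are not fully collapsed, coverage values above 100% are possible.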
To identify orthologous genes involved in the PA pathway in the major crops, a list of key enzymes (F3′5′H, F3′H, F3H, DFR, LAR, ANS, and ANR) was collected based on their EC numbers in the KEGG database. A total of 92 genes were identified from eight species, including two wild relatives of strawberry and peanut (Arachis duranensis: 6, Arabidopsis thaliana: 6, Fragaria vesca: 4, G. max: 20, M. domestica: 12, P. vulgaris: 12, T. cacao: 9, V. vinifera: 23) (Table 2). The protein sequences were downloaded from the NCBI GenBank database (www.ncbi.nlm.nih.gov, accessed on 4 March 2021) and cross-checked against the reference genomic database of each species (Table 2). The sequences of three genes, XP_014632702.1 (G. max), XP_007138248 (P. vulgaris), and XP_017985307.1 (T. cacao), were not available in the latest reference annotation data (Supplementary Table S1). In total, 83 genes were confirmed in the reference genomes, and 14 paralogs that had not been reported in these pathways in the KEGG database were additionally detected in two species (A. hypogaea: 5 and F. × ananassa: 9; Table 2). Using these 97 genes, orthologous genes were searched for in the eight crop species, and 105 genes were newly identified as candidate genes involved in the flavonoid biosynthetic pathway in five species (A. hypogaea: 1, C. sinensis: 19, F. × ananassa: 7, P. dulcis: 4, and V. corymbosum: 74) (Table 2, Supplementary Table S2). The annotation data currently available for the newly identified orthologous and paralogous genes agreed well with the EC numbers from the KEGG database. In C. sinensis, P. dulcis, and V. corymbosum, a total of 97 orthologs were identified for the first time in our study; they have not been reported to be involved in the phenylpropanoid pathway in the KEGG database.
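A similarity-based search of this kind usually ends with a filtering step over the BLASTP results. The sketch below is only an illustration under stated assumptions: it parses the standard 12-column tabular BLAST output (`-outfmt 6`) and keeps the best-scoring hit per query. The E-value and identity cutoffs, the sequence identifiers, and the function names are hypothetical, since the thresholds used in the study are not given in this section. A best hit is a candidate ortholog only; it does not establish orthology by itself.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_outfmt6(text):
    """Parse BLAST tabular output with the 12 default columns of -outfmt 6."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        # Default columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        hits.append(Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11])))
    return hits


def best_hits(hits, max_evalue=1e-10, min_identity=40.0):
    """Keep the highest-bitscore hit per query that passes both cutoffs.

    The default cutoffs are illustrative assumptions, not the study's values.
    """
    best = {}
    for h in hits:
        if h.evalue > max_evalue or h.pident < min_identity:
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


# Hypothetical BLASTP rows; the identifiers are placeholders, not real accessions.
EXAMPLE = "\n".join([
    "query_DFR\tcandidate_1\t82.5\t340\t55\t2\t1\t340\t1\t338\t1e-150\t520",
    "query_DFR\tcandidate_2\t61.0\t330\t120\t5\t3\t332\t4\t330\t1e-90\t310",
    "query_LAR\tcandidate_3\t35.0\t120\t70\t4\t10\t129\t8\t125\t1e-12\t60",
])

if __name__ == "__main__":
    for query, hit in best_hits(parse_outfmt6(EXAMPLE)).items():
        print(query, hit.subject, hit.pident, hit.evalue)
```

In this example, `query_LAR` is dropped because its only hit falls below the assumed identity cutoff, which mirrors how weak matches would be excluded from a candidate list.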
Table 2. Numbers of genes involved in proanthocyanidins production in the 11 species. Based on sequence similarity, orthologous genes were searched using BLASTP.
| Species | Number of Genes from KEGG Pathway (a) | Number of Genes Confirmed from Reference Database (b) | Number of Genes Newly Identified Using a (c) | Number of Orthologous Genes Newly Identified Using b + c |
|---|---|---|---|---|
| Arachis hypogaea | 6 * | 6 | 5 | 1 |
| Arabidopsis thaliana | 6 | 6 | - | - |
| Fragaria × ananassa | 4 * | 4 | 9 | 7 |
| Glycine max | 20 | 19 | - | - |
| Camellia sinensis | - | - | - | 19 |
| Malus domestica | 12 | 12 | - | - |
| Phaseolus vulgaris | 12 | 11 | - | - |
| Theobroma cacao | 9 | 8 | - | - |
| Vitis vinifera | 23 | 17 | - | - |
| Prunus dulcis | - | - | - | 4 |
| Vaccinium corymbosum | - | - | - | 74 |
| Total | 92 | 83 | 14 | 105 |
* The genes involved in the biosynthetic pathways have been reported in wild relative species of strawberry (Fragaria vesca) and peanut (Arachis duranensis). a, the number of genes identified from the KEGG database; b, the number of genes confirmed in the reference database; c, the number of genes newly identified from the reference genome using a.
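The totals in Table 2 can be re-derived from the per-species counts. The following minimal sketch assumes the counts are transcribed exactly as listed, with "-" entries stored as `None` and treated as zero; the names `TABLE2` and `column_totals` are hypothetical.

```python
# Per-species counts from Table 2, in column order:
# (a) KEGG, (b) confirmed in reference database, (c) newly identified using a,
# and orthologs newly identified using b + c. None stands for "-" in the table.
TABLE2 = {
    "Arachis hypogaea": (6, 6, 5, 1),
    "Arabidopsis thaliana": (6, 6, None, None),
    "Fragaria x ananassa": (4, 4, 9, 7),
    "Glycine max": (20, 19, None, None),
    "Camellia sinensis": (None, None, None, 19),
    "Malus domestica": (12, 12, None, None),
    "Phaseolus vulgaris": (12, 11, None, None),
    "Theobroma cacao": (9, 8, None, None),
    "Vitis vinifera": (23, 17, None, None),
    "Prunus dulcis": (None, None, None, 4),
    "Vaccinium corymbosum": (None, None, None, 74),
}


def column_totals(table):
    """Sum each column, counting missing ("-") entries as zero."""
    totals = [0, 0, 0, 0]
    for counts in table.values():
        for i, value in enumerate(counts):
            totals[i] += value or 0
    return tuple(totals)


if __name__ == "__main__":
    kegg, confirmed, new_paralogs, new_orthologs = column_totals(TABLE2)
    print(kegg, confirmed, new_paralogs, new_orthologs)
    # Genes used for the ortholog search: confirmed genes plus new paralogs.
    print(confirmed + new_paralogs)
```

The sum of columns b and c gives the 97 genes used for the ortholog search described in the text.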
4. Conclusions
Although different types of PAs are present in many cash crops, PAs have been relatively underinvestigated compared to other flavonoids, such as anthocyanins. This study identified species-specific biosynthesis pathways for PAs and a list of the corresponding orthologs in the reference genomic data of each species. The competition between parallel pathways may represent a significant regulatory mechanism not only for PA content, but also for the types of PAs, depending on the species. These results may support molecular breeding to improve the nutritional quality of dietary food sources. Beyond the food market, PAs have been receiving attention in the pharmacological and cosmetic industries because of their therapeutic potential for humans. Furthermore, molecular engineering of the parallel pathways could be a promising approach to regulating PA and anthocyanin levels, depending on the desired purpose.