1. Introduction
Proanthocyanidins (PAs), or condensed tannins, are oligomeric or polymeric end-products of flavonoid metabolism, starting from the central phenylpropanoid pathway [
1]. PAs are brown-pigmented and present in the seed coats or seeds, fruits, bark, and leaves of a wide range of plant species, including important cash crops, such as apples, grapes, soybeans, common beans, cereals, and most berries [
2]. These phytochemicals increase plant resistance to herbivory and protect plants from biotic and abiotic stresses, such as pathogens, insect attacks, and ultraviolet (UV)-B radiation [
3,
4].
In plants, PAs play important roles in resistance to several abiotic and biotic stresses [
3,
4]; several studies have reported that PAs increase tolerance to severe environmental conditions such as low temperature [
5], drought [
6], and UV-B radiation [
7]. Furthermore, PAs impart astringency and bitterness to young leaves and fruits, deterring herbivory [
8]. Accumulation of PAs also enhances tolerance to infection by biotrophic fungi [
9,
10] and other plant pathogens [
7]. PA accumulation also increases after mechanical wounding and herbivore attack [
8].
PAs have attracted much attention because of their biological and therapeutic potential in humans. PAs contribute distinctive flavors and astringency to many foods and drinks, such as chocolate, fruit juice, tea, and wine [
11]. Tea, which is the most commonly consumed beverage worldwide, is rich in PAs [
12]. Cacao is a major food source with high PA content in the confectionery industry [
13]. Pharmacological studies have shown beneficial effects of PAs in humans, such as antimicrobial, antidiabetic, antiaging, antioxidant, anticancer, and anti-inflammatory effects [
14,
15]. In addition to being bioavailable, PAs have been reported to improve eyesight and neuroprotective functions and to promote joint flexibility and blood circulation [
16]. Given these benefits for human health, PAs are considered important food-derived bioactive compounds in the pharmacological and cosmetic industries. The bark of common cinnamon is rich in PAs and is used as a folk medicine or supplement [
17]. In soybeans, a cultivar containing high PA levels has been used as an ingredient for cosmetic products [
18]. In grapes, because PAs are intensively accumulated in their seeds, seed oil products have been used as supplements for health promotion [
19].
In the flavonoid biosynthetic pathway, PAs are the end-products of a branch of the anthocyanidin biosynthetic pathway. PAs and anthocyanins are derived from the same precursor—anthocyanidin—and share a common biosynthetic process for the conversion of phenylalanine to anthocyanidin [
3]. PAs are oligomers and polymers of flavan-3-ols, the most common subclass of flavonoids. The type of PA depends on its constituent monomers, which include catechin, epicatechin, gallocatechin, epigallocatechin, afzelechin, and epiafzelechin, all commonly found in the plant kingdom [
2,
20,
21,
22].
Because of their beneficial properties, the accumulation of PAs has become a target of breeding and genetic engineering in a few model species; however, the key genes involved in the pathway remain unclear in most cash crops [
18]. In this study, species-specific biosynthetic pathways of PAs are reported based on metabolic intermediates, and orthologous genes of the key enzymes involved in each pathway are identified in 10 major cash crops.
2. Major Cash Crops with High PA Contents
In this study, a total of 10 high-PA-content cash crops with currently available genomic databases were selected: almond (
Prunus dulcis), apple
(Malus domestica), blueberry (
Vaccinium corymbosum), cacao bean (
Theobroma cacao), common bean (
Phaseolus vulgaris), grape (
Vitis vinifera), peanut (
Arachis hypogaea), soybean (
Glycine max), strawberry (
Fragaria ×
ananassa), and tea tree (
Camellia sinensis) (
Table 1). These crops play a critical role in the global food market and cosmetic industry; according to statistical data from the Food and Agriculture Organization of the United Nations (
www.fao.org, accessed on 5 February 2021), apples, grapes, and soybeans were ranked in the top 20 for commodity production in the USA in 2019. Almonds and blueberries were also produced in the USA in 2019, at about 193.7 kt and 30.9 kt, respectively. Peanuts and strawberries are important commercial crops in several countries, including China, India, Indonesia, and the USA. In China, which has the largest market, the trade values of peanuts and strawberries were USD 471 million and USD 75 million, respectively, in 2019. Cacao beans are consumed worldwide as an ingredient in a variety of processed products, such as chocolate, cocoa powder, and cocoa butter, at a volume of 3991 kt (in 2016, the latest year with available data), as reported by Statista (
www.statista.com/, accessed on 6 February 2021).
The PA content ranged from 145.0 to 3532.2 mg/100 g among the 10 species (
Table 1). Grape seeds have been reported to have the highest level of PAs (3532.2 mg/100 g), followed by cacao and common beans (1460.0 mg/100 g and 1000.1 mg/100 g, respectively) [
2,
14,
34]. The PA levels reported for the 10 species can vary with the cultivars and measurement conditions used in each study, and the amount actually absorbed by humans depends on how the food is consumed. For example, fresh grapes have much lower levels of PAs (0.05 g/100 g) than grape seeds (3532.2 mg/100 g) [
2]; in soybeans, seed coats contain significantly higher PA contents than the embryos [
38]. In common beans, the levels of anthocyanin and PAs vary according to the color features of the bean coats, depending on the genotypes [
34,
85].
Table 1. List of 10 major crops containing high levels of proanthocyanidins and their reference genomic information. Among major crops with high contents of PAs, the species with currently available genomic databases were selected for this study.
| Species | PA Content (mg/100 g) | PA Content Reference | Genome Database | Genome Reference | Assembly Size (Mb) | Coverage (%) | Contig N50 (kb) | Number of Genes Predicted |
|---|---|---|---|---|---|---|---|---|
| Almond (Prunus dulcis) | 184 | Prior & Gu, 2005 [14] | Prunus dulcis Lauranne Genome v1.0 (http://rosaceae.org/, accessed on 12 February 2021) | Alioto et al., 2020 [31] | 227 | 95 | 103 | 27,969 |
| Apple tree (Malus domestica) | 162 | Hellström et al., 2009 [2] | (iris.angers.inra.fr/gddh13/, accessed on 12 February 2021) | Daccord et al., 2017 [32] | 643 | 100 | 620 | 42,140 |
| Blueberry (Vaccinium corymbosum) | 255 | Prior & Gu, 2005 [14] | V_corymbosum v1.0 (http://gigadb.org/, accessed on 12 February 2021) | Colle et al., 2019 [33] | 1680 | 102 | 15 | 32,140 |
| Cacao bean (Theobroma cacao) | 1460 | Hellström et al., 2009 [2] | Cacao Matina1-6 Genome v2.1 (http://cacaogenomedb.org, accessed on 12 February 2021) | Publication in progress (http://cacaogenomedb.org) | 346 | 80 | 1080 | 27,379 |
| Common bean (Phaseolus vulgaris) | 1000 | Kan et al., 2016 [34] | Phaseolus vulgaris v2.1 (http://phytozome.jgi.doe.gov/, accessed on 12 February 2021) | Schmutz et al., 2014 [35] | 600 | 80 | 1900 | 27,433 |
| Grape (Vitis vinifera) | 3532 | Prior & Gu, 2005 [14] | Vitis vinifera v2.1 (http://phytozome.jgi.doe.gov/, accessed on 12 February 2021) | Jaillon et al., 2007 [36] | 487 | 102 | 566 | 26,346 |
| Peanut (Arachis hypogaea) | 186 | Hellström et al., 2009 [2] | (http://peanutgr.fafu.edu.cn/, accessed on 12 February 2021) | Zhuang et al., 2019 [37] | 2538 | 94 | 1509 | 83,709 |
| Soybean (Glycine max) | 300 | Lee et al., 2017 [38] | Glycine max Wm82.a4 (http://www.soybase.org, accessed on 12 February 2021) | Schmutz et al., 2010 [39] | 1150 | 95 | 1492 | 46,430 |
| Strawberry (Fragaria × ananassa) | 145 | Prior & Gu, 2005 [14] | (https://datadryad.org/, accessed on 12 February 2021) | Edger et al., 2019 [40] | 813 * | 99 | 79 | 108,087 |
| Tea tree (Camellia sinensis) | 189 | Engelhardt et al., 2003 [12] | (http://tpia.teaplant.org/, accessed on 12 February 2021) | Xia et al., 2019 [41] | 2890 | 95 | 67 | 53,512 |
* The assembly size given for Fragaria × ananassa is the haploid genome size.
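As a quick consistency check, the PA contents and coverage values in Table 1 can be summarized programmatically. The following is a minimal Python sketch: the dictionary only transcribes two columns of Table 1, and the names `TABLE_1` and `summarize` are illustrative and not part of the original analysis.

```python
# Values transcribed from Table 1: (PA content in mg/100 g, genome coverage in %).
TABLE_1 = {
    "Almond": (184, 95),
    "Apple tree": (162, 100),
    "Blueberry": (255, 102),
    "Cacao bean": (1460, 80),
    "Common bean": (1000, 80),
    "Grape": (3532, 102),
    "Peanut": (186, 94),
    "Soybean": (300, 95),
    "Strawberry": (145, 99),
    "Tea tree": (189, 95),
}


def summarize(table):
    """Rank crops by PA content and compute basic coverage statistics."""
    ranked = sorted(table, key=lambda crop: table[crop][0], reverse=True)
    coverages = [coverage for _, coverage in table.values()]
    return {
        "ranked_by_pa": ranked,
        "coverage_min": min(coverages),
        "coverage_max": max(coverages),
        "coverage_mean": sum(coverages) / len(coverages),
    }


summary = summarize(TABLE_1)
print(summary["ranked_by_pa"][:3])          # three crops with the highest PA content
print(round(summary["coverage_mean"], 1))   # mean genome coverage in percent
```

With the values as listed, the three highest-ranked crops are grape, cacao bean, and common bean, and the mean coverage is about 94%.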
3. Identification of Orthologous Genes Involved in PA Biosynthesis in Major Cash Crops
Despite the nutritional importance of flavonoids, the key enzymes involved in these pathways have not been identified in many major food crops with high flavonoid contents (Table 1). To characterize the PA biosynthetic pathways in the selected species, orthologous genes encoding the key enzymes involved in all pathways were searched in the latest reference genomic database of each species (Table 1). N50 and genome coverage are two of the most important metrics for evaluating the quality of a genome assembly. N50 is the length of the shortest sequence (contig, scaffold, super-scaffold, or pseudomolecule) in the set of longest sequences that together make up at least 50% of the total assembly length, and coverage is the total assembly size expressed as a percentage of the reference genome size. In this study, contig N50 varied widely, ranging from 79 kb (strawberry) to 1900 kb (common bean), and the highest, lowest, and mean genome coverage were 102% (blueberry and grape), 80% (common bean and cacao), and 94%, respectively (Table 1).
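The N50 statistic, in its commonly used form, can be made concrete with a short Python sketch. The contig lengths and genome sizes below are hypothetical and chosen only for illustration; `n50` and `coverage_percent` are illustrative names, and the assemblies in Table 1 were not necessarily evaluated with this code.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated; N50 is
    the length of the sequence at which the running total first reaches
    at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def coverage_percent(assembly_size, genome_size):
    """Assembly size as a percentage of the reference (estimated) genome size."""
    return 100.0 * assembly_size / genome_size


# Hypothetical contig lengths in kb (total = 300 kb).
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))                  # 80 + 70 = 150 kb reaches half of 300 kb
print(coverage_percent(950, 1000))   # hypothetical sizes in Mb
```

Because the cumulative sum is taken from the longest sequence downward, a few very long contigs can yield a high N50 even when many short contigs remain, which is one reason N50 is usually reported together with coverage.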
To identify orthologous genes involved in the PA pathway in the major crops, we collected a list of key enzymes (F3′5′H, F3′H, F3H, DFR, LAR, ANS, and ANR) based on their EC numbers in the KEGG database. A total of 92 genes were identified from eight species, including two wild relatives of strawberry and peanut (
Arachis duranensis: 6,
A. thaliana: 6,
Fragaria vesca: 4,
G. max: 20,
M. domestica: 12,
P. vulgaris: 12,
T. cacao: 9,
V. vinifera: 23) (
Table 2). The protein sequences were downloaded from the NCBI GenBank database (
www.ncbi.nlm.nih.gov, accessed on 4 March 2021) and cross-checked using the reference genomic database of each species (
Table 2). We found that the sequences of three genes—XP_014632702.1 (
G. max), XP_007138248 (
P. vulgaris), and XP_017985307.1 (
T. cacao)—were not available in the latest reference annotation data (
Supplementary Table S1). In total, 83 genes were identified in the reference genome, and 14 paralogs, which had not been reported to be involved in the pathways in the KEGG database, were additionally detected in two species (
A. hypogaea: 5 and
F. ananassa: 9;
Table 2). Using these 97 genes, orthologous genes were identified in the eight crop species, and 105 genes were newly identified as candidate genes involved in the flavonoid biosynthetic pathway from five species (
A. hypogaea: 1,
C. sinensis: 19,
F. ananassa: 7,
P. dulcis: 4, and
V. corymbosum: 74) (
Table 2,
Supplementary Table S2). The annotation data currently available for the newly identified orthologous and paralogous genes agreed well with the EC numbers from the KEGG database. In
C. sinensis,
P. dulcis, and
V. corymbosum, a total of 97 orthologs were identified for the first time in this study; none has been reported in the KEGG database as being involved in the phenylpropanoid pathway.
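The similarity-based ortholog search described above can be illustrated with a short sketch that filters BLASTP results in the standard 12-column tabular format (`-outfmt 6`) and keeps the best-scoring hit per query. The identity and E-value thresholds, the sequence identifiers, and the example rows are all hypothetical, since the search parameters used in this study are not stated here; a best-hit filter is also only a simple proxy for orthology.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "query subject pident length evalue bitscore")


def parse_outfmt6(text):
    """Parse BLAST tabular output (12 tab-separated columns per line)."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        float(f[10]), float(f[11])))
    return hits


def best_hits(hits, min_identity=50.0, max_evalue=1e-10):
    """Keep the highest-bitscore hit per query that passes both thresholds.

    The default thresholds are assumptions made for this illustration.
    """
    best = {}
    for hit in hits:
        if hit.pident < min_identity or hit.evalue > max_evalue:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


# Hypothetical BLASTP rows; identifiers are placeholders, not real accessions.
rows = [
    ("ANR_query", "crop_gene_1", "78.5", "330", "71", "0", "1", "330", "5", "334", "1e-150", "520"),
    ("ANR_query", "crop_gene_2", "42.0", "310", "180", "3", "10", "319", "12", "321", "1e-40", "160"),
    ("LAR_query", "crop_gene_3", "65.2", "340", "118", "1", "1", "340", "3", "342", "3e-120", "410"),
    ("LAR_query", "crop_gene_4", "70.1", "120", "36", "0", "50", "169", "40", "159", "1e-05", "60"),
]
example = "\n".join("\t".join(row) for row in rows)
result = best_hits(parse_outfmt6(example))
for query, hit in sorted(result.items()):
    print(query, hit.subject, hit.pident, hit.evalue)
```

In practice, candidate orthologs found this way are usually checked further, for example by reciprocal searches or by comparing annotations, as was done here against the EC numbers from the KEGG database.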
Table 2. Numbers of genes involved in proanthocyanidin production in the 11 species. Orthologous genes were searched using BLASTP, based on sequence similarity.
| Species | (a) Number of Genes from KEGG Pathway | (b) Number of Genes Confirmed from Reference Database | (c) Number of Genes Newly Identified Using (a) | Number of Orthologous Genes Newly Identified Using (b) + (c) |
|---|---|---|---|---|
| Arachis hypogaea | 6 * | 6 | 5 | 1 |
| Arabidopsis thaliana | 6 | 6 | - | - |
| Fragaria × ananassa | 4 * | 4 | 9 | 7 |
| Glycine max | 20 | 19 | - | - |
| Camellia sinensis | - | - | - | 19 |
| Malus domestica | 12 | 12 | - | - |
| Phaseolus vulgaris | 12 | 11 | - | - |
| Theobroma cacao | 9 | 8 | - | - |
| Vitis vinifera | 23 | 17 | - | - |
| Prunus dulcis | - | - | - | 4 |
| Vaccinium corymbosum | - | - | - | 74 |
| Total | 92 | 83 | 14 | 105 |
* The genes involved in the biosynthetic pathways have been reported in wild relative species of strawberry (Fragaria vesca) and peanut (Arachis duranensis). (a) The number of genes identified from the KEGG database; (b) the number of genes confirmed in the reference database; (c) the number of genes newly identified in the reference genome using (a).
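The totals in Table 2 can be re-derived from the per-species counts. The sketch below is a minimal Python check that transcribes the table, treating "-" entries as missing; `COUNTS` and `column_totals` are illustrative names and not part of the original analysis.

```python
# Per-species counts from Table 2, in column order:
# (KEGG genes, confirmed in reference database, new paralogs, new orthologs).
# None stands for a "-" entry in the table.
COUNTS = {
    "Arachis hypogaea": (6, 6, 5, 1),
    "Arabidopsis thaliana": (6, 6, None, None),
    "Fragaria x ananassa": (4, 4, 9, 7),
    "Glycine max": (20, 19, None, None),
    "Camellia sinensis": (None, None, None, 19),
    "Malus domestica": (12, 12, None, None),
    "Phaseolus vulgaris": (12, 11, None, None),
    "Theobroma cacao": (9, 8, None, None),
    "Vitis vinifera": (23, 17, None, None),
    "Prunus dulcis": (None, None, None, 4),
    "Vaccinium corymbosum": (None, None, None, 74),
}


def column_totals(counts):
    """Sum each column, skipping missing entries."""
    totals = [0, 0, 0, 0]
    for row in counts.values():
        for index, value in enumerate(row):
            if value is not None:
                totals[index] += value
    return totals


totals = column_totals(COUNTS)
search_set = totals[1] + totals[2]  # confirmed genes plus newly found paralogs
print(totals)      # column totals in table order
print(search_set)  # number of genes used as queries for the ortholog search
```

The column totals reproduce the "Total" row (92, 83, 14, and 105), and the confirmed genes plus the new paralogs give the 97 genes used for the ortholog search.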
4. Conclusions
Although different types of PAs are present in many cash crops, PAs have been relatively underinvestigated compared to other flavonoids, such as anthocyanins. This study identified species-specific biosynthetic pathways for PAs and a list of the corresponding orthologs in the reference genomic data of each species. The competition between parallel pathways may represent a significant regulatory mechanism not only for PA content, but also for the types of PAs, depending on the species. Our results may support molecular breeding to improve the nutritional quality of dietary food sources. Beyond the food market, PAs have been receiving attention in the pharmacological and cosmetic industries because of their therapeutic potential for humans. Furthermore, molecular engineering of the parallel pathways could be a promising approach to regulate PA and anthocyanin levels, depending on the desired purpose.