Legume Pangenome for Crop Improvement

Legume Pangenome for Crop Improvement: Comparison

Please note this is a comparison between Version 1 by Uday Jha and Version 2 by Amina Yu.

Legume crops play a crucial role in ensuring global food security. Legume reference genomes have been constructed for soybean, chickpea, common bean, pigeonpea, pea, lupin, peanut, cowpea, and mungbean. However, pangenomes are needed to obtain insights into the genome dynamics, gene-content variation, genetic basis of agronomic-trait variation, and evolutionary relationship in various legumes.

pangenome
legumes
structural variation
genome sequence
gene

1. Soybean (Glycine max L.)

In soybean, Li et al. ^[1][17] performed pangenome analysis by constructing a de novo genome assembly of seven accessions of Glycine soja collected from different geographic regions to shed light on the genetic basis of untapped genetic diversity and evolution-related novel insights. Comparing the genome sequences obtained from this pangenome analysis and a reference genome sequence of G. max W82 revealed a plethora of loss or gain of copy number variations (CNVs) in multiple genes, ranging from biotic-stress tolerance (including R genes) to various transcription factors contributing to local-environmental acclimation against invading pathogens ^[1][17]. In addition, comparing the G. max and G. soja genomes provided novel insights into the FLOWERING LOCUS T gene controlling flowering time, PHYA-regulating light receptor, and LEAFY genes via various ‘indels’ and nonsynonymous SNPs in the given loci in the G. soja genome ^[1][17]. In Arabidopsis, 1332 genes related to seed protein and oil contents were annotated in the G. max W82 genome based on homology to genes related to lipid-acyl pathways ^[2][53]. A comparative genomics analysis revealed that the G. max and G. soja genomes differ for several indels and large-effect single-nucleotide variants (SNPs) and CNVs for these genes attributing to oil-seed content ^[1][17]. A de novo assembly of 26 soybean accessions (including 14 cultivated, 3 wild, and 9 landraces) from 2898 deeply sequenced soybean genomes and pangenome analysis provided novel insights into the thousands of structural variations related to various candidate genes of agronomic interest (including iron-deficiency chlorosis, seed luster, and flowering) and involved in domestication and evolution ^[3][54] (Figure 12). The assembled sequences contained 20,623 gene families designated as core genes, with 8163 gene families in 25–26 accessions designated soft-core genes, and 28,679 families in 2–24 accessions designated dispensable genes ^[3][54] (Table 1).

Figure 12.

Legume pangenomes and key learnings for crop improvement.

A comparative analysis of this pangenome with the previously assembled reference genomes for Zhonghuang ^[4][5][63,64], W05 ^[6][65], and Wm82 ^[7][33] indicated the presence of myriad presence–absence variations (PAVs) and CNVs in various accessions of current soybean accessions, demanding a thorough pangenome analysis of diverse and geographically distinct soybean accessions ^[3][54]. Torkamaneh et al. ^[8][16] developed the soybean pangenome known as ‘Pansoy’ by constructing a de novo assembly of 204 geographically diverse soybean accessions following the ‘map to pan’ approach using the EUPAN pipeline ^[9][66]. This pangenome contained 108 Mbp novel genome sequences—with 1659 novel genes not present in the reference Wm82 genome—revealing 49,431 hard-core genes, 1401 soft-core genes, 3402 shell genes, and 297 cloud genes ^[8][16]. Thus, the Pansoy resource provided novel insights into intraspecific genetic variation in G. max and offered a platform for genomic diversity and evolutionary and domestication studies for soybeans.

Subsequently, a pangenome measuring 1213 Mbp was constructed from >1000 accessions collected from the USDA Soybean Germplasm collection using the chromosome level ‘Lee’ soybean genome assembly as the reference sequence ^[10][55]. The results identified 3765 genes missing in the ‘Lee’ genome sequence assembly, indicating a loss of genetic diversity during domestication and breeding activity ^[10][55]. This pangenome also contained an additional 198.34 Mbp sequence assembly compared to the ‘Lee’ reference genome. The PAV analysis revealed 86.8% core genes and 13.2% dispensable genes ^[10][55]. The authors also uncovered 110 genomic regions containing 1266 protein-coding genes in a ‘domestication-selective sweep’ and 86 genomic regions containing 1434 genes in a ‘breeding-selective sweep’ ^[10][55]. Based on this pangenome analysis, the authors concluded that wild soybean had higher nucleotide diversity than landraces, old cultivars, and modern cultivars due to the loss of diversity during domestication and breeding efforts.

2. Chickpea (Cicer arietinum L.) Pangenome

Chickpea is a major pulse crop, ranking third among pulses for global production and serving as a plant source of protein and many essential micronutrients ^[11][59]. In the last decade, unprecedented advancements in bioinformatics and NGS technologies allowed for the assembly of a chickpea genome sequence ^[12][13][34,67], which has served as a tool for chickpea improvement. To further build chickpea-genomic resources and to study speciation, phylogenetic analysis, genomic diversity of cultivated species and its wild progenitor, and migration of cultivated chickpea species, Varshney et al. ^[11][59] constructed the first chickpea pangenome (Figure 12) (measuring 592.58 Mb with 29,870 genes) by sequencing 3171 cultivated and 195 wild chickpea accessions using iterative mapping and an assembly approach. From this, 124,833 SNPs were selected to form linkage-disequilibrium blocks; these SNPs and the phenotyping data for 100 seed weight (100 SW) and yield per plant (YPP) from 2980 genotypes were used for genomic prediction to improve these traits ^[11][59]. Likewise, 3.94 million SNPs and the phenotyping data for 16 phenological and yield-related traits in 2980 genotypes revealed associations between 205 SNPs and 11 traits, with 152 SNPs associated with 79 unique genes governing seed size and development ^[11][59].

3. Cowpea (Vigna ungiculata L.) Pangenome

Cowpea is an important warm season, climate-resilient, multipurpose legume crop in sub-Saharan Africa ^[14][68]. Significant progress has been achieved in developing genomics resources for cowpeas. Lonardi et al. ^[15][42] developed the genome assembly of cowpea genotype IT97K-499-35. Liang et al. ^[16][57] constructed the first cowpea pangenome by producing de novo assemblies of six cowpea accessions with a mean size of 449.91 Mb, identifying 21,330 core and 23,531 non-core genes, with the large number of frameshift variations in the non-core genes attributing large diversity in domesticated cowpeas. Comparative and genomic synteny analyses of the IT97K-499-35 reference genome and the genome assemblies of the six accessions indicated the presence of numerous structural rearrangements, including PAVs, inversions, indels, and CNVs ^[16][57]. The authors confirmed that the existing PAVs are responsible for the black seed-coat color in cowpeas.

4. Pigeonpea (Cajanus cajan L.) Pangenome

Pigeonpea is an important grain legume, mostly grown in India and Africa, with an essential role in ensuring protein nutrition security across the semi-arid tropics ^[17][36]. The pigeonpea reference genome sequence ^[17][36] has been used to improve various traits in pigeonpea; however, the pangenome assembly of pigeonpea provides further opportunities to harness untapped genetic variation not present in the reference genome sequence. Zhao et al. ^[18][58] built the pigeonpea pangenome (measuring 622 Mbp) using iterative mapping and an assembly approach ^[19][13], sequencing 89 pigeonpea accessions collected across the globe and the reference pigeonpea with >9.5× coverage. The pangenome contained 55,512 more genes than the previously assembled reference genome (53,612), with 48,067 core genes and 7445 accessory genes ^[18][58]. A functional analysis of the ‘variable genes’ indicated their possible role in proteolysis, pollen tube reception, and the signal transduction process ^[18][58]. Of 909 identified R genes, 836 were core genes and 73 were variable genes. To confirm the role of PAVs attributing phenotypic diversity, the authors detected several additional SNPs for various phenotypic traits (seed weight, days to 50% flowering, and plant height) compared to Varshney et al. ^[20][70], who found a significant association of 1 SNP for pods/plant, 1 SNP for primary branches/plant, and 3 SNP for secondary branches/plant from the study of whole genome resequencing of 292 pigeonpea accessions.

5. Mungbean (Vigna radiata L.) Pangenome

Mungbean is an important grain legume crop with high nutritional benefits, mostly grown in south and southeast Asian countries ^[21][71]. Draft genome assemblies are available for mungbean genotypes VC1973A ^[22][43] and JL7 ^[23][56]. Liu et al. ^[23][56] assembled the first mungbean pangenome (measuring 762.92 Mb), with 43,462 annotated genes, by deep-sequencing JL7 and 217 mungbean accessions, including cultivars and landraces. The predicted genes comprised 33,258 hard-core genes, 2872 soft-core genes, 7154 shell genes, and 178 cloud genes ^[23][56]. Of the nine PAVs controlling flowering regulation, three genes (jg13350, jg13746, and Pang80812) were present in Chinese landraces and breeding lines ^[23][56], while six genes (jg1521, jg5273, jg5274, jg5281, jg5284, and Pang68295 were absent but present in non-Chinese lines ^[23][56].

6. White Lupin (Lupinus alba L.) Pangenome

White lupin (Lupinus albus L.) is a protein-rich grain legume crop domesticated in the Mediterranean region ^[24][25][72,73]. However, the presence of quinolizidine alkaloids in lupin seed causes a bitter taste and toxicity; thus, lowering seed alkaloids is a prime objective of lupin-breeding programs. Hufnagel et al. ^[26][40] and Xu et al. ^[27][74] constructed the white-lupin genome assembly. Hufnagel et al. ^[28][61] established the first white-lupin pangenome by sequencing a diverse set of 39 white-lupin accessions using a ‘map-to-pan’ approach ^[9][66]. The pangenome comprised 32,068 (78.5%) core genes, 6046 soft-core genes, and 8776 shell genes ^[28][61]. Several selection sweeps related to breeding for low-alkaloid accessions were identified on chromosomes 18, 15, 1, and 3, overlapping the low-alkaloid-content QTLs previously reported by Ksiazkiewicz et al. ^[29][75] and Lin et al. ^[30][76].

7. Barrel Medic (Medicago truncatula) Pangenome

Barrel medic (Medicago truncatula) is an important model legume crop used for investigating symbiotic relationships with rhizobia and mycorrhizae and root development ^[31][32][77,78]. Zhou et al. ^[33][60] developed the first M. truncatula pangenome (measuring 388–428 Mbp, with an additional 63 Mbp novel sequence) by de novo assembly of 15 M. truncatula accessions, adding 16% genome space to the reference genome sequence ^[34][79] of this model legume. This pangenome sheds light on causative structural variants, including transposable elements in different gene families, such as the nucleotide-binding site leucine-rich repeat rapid evolvement.

8. Lentil (Lens culinaris) Pangenome

Lentil is an important source of dietary protein, carbohydrates, and iron for humans ^[35][80]. Due to the large size of the genome (4 Gb) and the presence of a high number repetitive elements, the completion of a lentil genome assembly has taken some time to complete. However, an incomplete-draft genome sequence of Lens culinaris cv. CDC Redberry v2.0 ^[36][37][81,82] has facilitated new directions in lentil genomics research. Thus, in the absence of a complete reference genome sequence, pan-transcriptomes ^[38][83] could provide a platform to establish the lentil pangenome. Guerra-Garcia et al. ^[39][84] have used an exome capture array built on the draft lentil genome and transcriptomic data ^[40][85] to examine copy number variation across cultivated lentil germplasm. HereThin, it ws work has uncovered a previously hidden amount of variation across lentils ^[39][84]. Furthermore, Gutierrez-Gonzalez et al. ^[41][62] assembled and annotated the transcriptomes of eight lentil accessions, including cultivated and wild species (L. orientalis, L. tomentosus, L. ervoides, L. lamottei, L. nigricans, and two L. odemensis), identifying 15,910 core genes and 24,226 accessory genes. A comparative analysis of these assembled transcriptomes with the incomplete CDC Redberry reference genome assembly indicated that transcripts of L. culinaris had the greatest similarity with CDC Redberry transcripts, while transcripts of L. lamottei and L. nigricans had the least ^[41][62]. Further long-read-based sequencing efforts could assist in decoding the complete genome and pangenome assembly of lentils.

9. Pea (Pisum sativum L.) Pangenome

Pea is the fourth-most globally important grain legume, enriched with protein, starch, and minerals ^[42][43][38,86], and is considered a ‘founder crop’ member; it is also one of the oldest domesticated crops ^[44][87]. Despite having an assembled pea reference genome ^[45][37], genetic diversity in cultivated and wild pea and the domestication history of pea remain elusive. Yang et al. ^[42][38] constructed a second pea de novo genome assembly of ZW6 and a pangenome assembly of 116 accessions containing cultivated and wild pea genotypes, identifying 112,117 pangenes, including 15,470 core genes, 6170 soft-core genes, 41,028 shell genes, and 50,108 cloud genes. The study confirmed that P. abyssinicum is separate from P. fulvum and P. sativum. The pangenes from P. abyssinicum were unique in stimulus and chemical response, while those of P. fulvum were involved in growth, tropism, and development; thus, the novel genes harboring in these species could be used to improve disease resistance and yield in modern pea cultivars ^[42][38].

10. Pangenome-Derived PAVs for GWAS Analysis to Explore Novel Genes

Pangenome analysis allows the discovery of a plethora of novel structural variations (SVs)SVs, including gene PAVs that may be missing in the reference genome assembly ^[8][23][46][16,27,56]. Thus, these pangenome-derived PAVs can be used in a GWAS to uncover a novel agronomic trait gene allele(s). Liu et al. ^[23][56] uncovered five novel genes missing in the mungbean reference genome assembly that contributed to bruchid resistance (Pang34265, Pang44622, Pang57772, Pang58608, and Pang64254). Zhao et al. ^[18][58] identified 225 significant SNPs associated with various phenological and yield-related traits (e.g., days to 50% flowering, days to maturity, pods/plant, and yield/plant) from SNPs and PAVs derived from a pigeonpea pangenome analysis, which overlapped those previously identified by Varshney et al. ^[20][70]. However, several new SNPs were associated with these studied traits ^[18][58]. Likewise, Varshney et al. ^[11][59] identified 205 significantly associated SNPs for agronomic traits in a GWAS study on 16 traits in a panel of 2980 chickpea accessions using 3.94 million SNPs, of which 152 SNPs were in 79 unique genes related to seed size and seed development ^[11][59]. GWAS analysis using SVs derived from 2898 soybean accessions elucidated the genetic basis of soybean seed luster, uncovering the presence of 10 kb PAVs in the hydrophobic protein-encoding gene causing seed luster ^[3][54].