Lotus Genomics and the Contribution to Its Breeding: Comparison
Please note this is a comparison between Version 1 by Huanhuan Qi and Version 2 by Lindsay Dong.

Genomics is the cornerstone of breeding, and studies based on whole-genome sequencing and genome-wide association studies (GWAS) have greatly advanced genomics-assisted breeding in many crops. Lotus (Nelumbo nucifera), a member of the family Nelumbonaceae, is a relict plant of considerable scientific and economic value. As a basal eudicot species, lotus plays an essential role in studies of plant evolution and phylogeny. It is adapted to aquatic environments, whereas its relatives are terrestrial shrubs or trees. The water lily, which lies near the base of the angiosperm phylogeny, has similar living conditions and flowers; however, the two genomes are vastly different. Lotus has unique features such as water-repellent self-cleaning leaves, multi-seed production, and flower thermogenesis, which may be related to protogyny or may provide a warm environment for pollination.

  • lotus
  • genome
  • variant
  • germplasm
  • breeding
  • omics

1. Introduction

Like Ginkgo biloba, Liriodendron, and Metasequoia glyptostroboides, lotus is one of the relict plants that have retained the original morphology of their ancestors. It belongs to the genus Nelumbo of the family Nelumbonaceae, which includes two species: Asian lotus (Nelumbo nucifera Gaertn.) and American lotus (Nelumbo lutea Pear.). The two species are named after their geographical distributions: Asian lotus occurs mainly in Asia and northern Oceania, whereas American lotus occurs in North and South America. Their plant morphology also differs. Asian lotus is a tall plant with oval leaves and seeds and red or white flowers, whereas American lotus is a short plant with nearly round, dark green leaves, spherical seeds, and yellow flowers [1][16]. There is no strict reproductive isolation between them, and both have a similar life cycle of about five months. Asian lotus, commonly called simply lotus, has been cultivated as a horticultural crop for more than 3000 years [2][17]. Lotus seeds and rhizomes are highly nutritious and have health-promoting properties. Lotus seeds contain starch, proteins, amino acids, polysaccharides, polyphenols, alkaloids, and mineral elements, and lotus rhizome has a high vitamin C content. Through long-term domestication and artificial selection, about 4500 lotus cultivars have been developed to date [3][18]. These cultivars are grown to produce vegetables, snacks, beverages, restorative materials, and ornamental flowers, and thus contribute to daily life and economic development. The lotus industry is also important for rural revitalization in the Yangtze River, Pearl River, and Huang-Huai River basins.

The cultivated lotus is generally divided into rhizome lotus, seed lotus, and flower lotus according to usage. Rhizome lotus is characterized by an enlarged rhizome and few flowers. It can be further divided into powdery and crisp types according to the eating quality of the rhizome, and different varieties have been bred to suit regional taste preferences or other uses. The main breeding goal for rhizome lotus is to improve rhizome yield and quality. Seed lotus is grown mainly for seed production, with high yield, good quality, and disease resistance as the breeding goals. Flower lotus is grown for ornamental use and has distinctive flower colors and shapes. During long cultivation, ornamental lotus with different flower forms have been obtained, including few-petaled, double-petaled, petaloid, and thousand-petaled flowers, with red, pink, yellow, and white as the main flower colors. Current breeding objectives mainly target flower shape and color, the yield and quality of seeds and rhizomes, and broad adaptability.

2. Sequencing, Assembly and Annotation of Lotus Genome

Lotus occupies a crucial phylogenetic position among flowering plants, so a high-quality reference genome is vital both for studying the origin of eudicots and for lotus molecular breeding. Over the last decade, several lotus accessions have been sequenced on different platforms, resulting in multiple versions of the genome assembly and annotation (Table 1). Using next-generation sequencing (NGS), the wild lotus “China Antique” (“CA”) was sequenced and assembled [4][20]. The assembled “CA” genome is 804 Mb, of which 543.4 Mb (67.6%) is anchored to nine megascaffolds, with a contig N50 of 38.8 kb and a scaffold N50 of 3.4 Mb. The heterozygosity of the “CA” genome is 0.03%, and repetitive sequences account for about 57%. A total of 26,685 protein-coding genes were predicted, with an average gene length of 6561 bp. At the same time, another wild lotus, “Chinese Taizi”, was assembled with NGS, yielding a 792 Mb genome with a contig N50 of 39.3 kb and a scaffold N50 of 986.5 kb [5][21]. Transposable elements span 392 Mb (49.48%), and 36,385 protein-coding genes were annotated. Both studies predicted a single whole-genome duplication (WGD) event, λ, in lotus, rather than the paleo-hexaploidy (γ) event that occurred in core eudicots [4][5][20,21]. These two genomes were later anchored to eight pseudo-chromosomes using a higher-resolution genetic map and physical maps [6][22].
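Contig and scaffold N50 are the main contiguity metrics used in this section and in Table 1. The Python sketch below illustrates the standard N50 calculation on hypothetical contig lengths (illustrative values, not data from any lotus assembly) and recomputes the anchoring percentage quoted for the first “CA” assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that contigs (or scaffolds) of length >= L
    together contain at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, summing lengths,
    # and stop at the first one that brings the sum to >= 50% of the total.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in kb (illustrative only).
contigs_kb = [80, 70, 50, 40, 30, 20, 10]
print("N50 (kb):", n50(contigs_kb))

# Anchoring rate reported for the first "CA" assembly: 543.4 of 804 Mb.
anchored_fraction = 543.4 / 804
print(f"Anchored: {anchored_fraction:.1%}")
```

In this toy example the two longest contigs (80 + 70 = 150 kb) already reach half of the 300 kb total, so the N50 is 70 kb; the anchored fraction prints as 67.6%, matching the value in the text.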
Table 1.
Comparison of assembled lotus genomes.
Genome | Year | Sequencing technology | Assembly size (Mb) | Contig N50 | Number of genes | Repeat sequences | Ref.
China Antique v1.0 | 2013 | Illumina, 454 | 804 | 38.8 kb | 26,685 | 57% | [4][20]
Taizi | 2013 | Illumina HiSeq 2000 | 792 | 39.3 kb | 40,348 | 49.48% (TEs) | [5][21]
China Antique v2.0 | 2020 | PacBio Sequel, Illumina | 821.2 | 484.3 kb | 32,124 | 58.5% | [7][23]
Taikonglian No. 3 | 2022 | Nanopore | 807 | 5.1 Mb | 28,274 | 63.11% | [8][24]
American lotus | 2022 | PacBio RSII, Hi-C | 843 | 1.34 Mb | 31,382 | 81% | [9][25]

With the advent of long-read sequencing, the “CA” genome was re-assembled using 11.9 Gb of PacBio Sequel long reads together with 94.2 Gb of previously generated short reads [7][23]. The new “CA” assembly is 807.6 Mb with a contig N50 of 484.3 kb, a substantial improvement over the 38.8 kb of the first version. The proportion of repetitive sequences (58.5%) is similar to that of the first version. In addition, the cultivated lotus “Taikonglian No. 3” was assembled from 57.9 Gb of Oxford Nanopore raw data, with a contig N50 of 5.1 Mb, and its eight chromosomes were anchored using high-throughput chromatin conformation capture (Hi-C) data [8][24]. Most recently, the genome of the other lotus species, American lotus, was assembled using PacBio RSII (74.6 Gb raw data) and Hi-C (50.32 Gb raw data), yielding an 843 Mb assembly with a contig N50 of 1.34 Mb [9][25].
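The raw data volumes quoted above are easier to compare as approximate sequencing depth, i.e. raw bases divided by assembly size. The sketch below uses the long-read data volumes from this section and assembly sizes from the text and Table 1 (807.6 Mb from the text for “CA” v2.0). Two simplifying assumptions: the assembly size stands in for the true genome size, and read filtering is ignored, so the values overestimate usable coverage. The 94.2 Gb of short reads for “CA” and the Hi-C data are left out.

```python
def depth(data_gb: float, assembly_mb: float) -> float:
    """Approximate fold coverage: raw bases divided by assembly size."""
    return data_gb * 1000 / assembly_mb  # 1 Gb = 1000 Mb


# (raw long-read data in Gb, assembly size in Mb), as quoted in the text/Table 1.
datasets = {
    "CA v2.0, PacBio Sequel": (11.9, 807.6),
    "Taikonglian No. 3, Nanopore": (57.9, 807.0),
    "American lotus, PacBio RSII": (74.6, 843.0),
}

for name, (gb, mb) in datasets.items():
    print(f"{name}: ~{depth(gb, mb):.1f}x")
```

This gives roughly 15x, 72x, and 88x long-read depth, respectively, which helps explain why the later assemblies reach much higher contig N50 values than the short-read-dominated first versions.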

3. Potential Adaptive Evolution and Domestication of Lotus

The availability of lotus reference genomes has facilitated the re-sequencing of diverse lotus germplasm, and several studies have examined how the lotus genome has been shaped by adaptive evolution and artificial selection. The genus contains only two species, Asian lotus and American lotus, and apart from flower color, their plant architecture and morphology are very similar. Molecular phylogenetic analyses have nevertheless confirmed significant genetic differentiation between American and Asian lotus [9][10][11][12][25,26,27,28]. De novo deep sequencing of American lotus yielded an 843 Mb genome containing approximately 81% repetitive sequences (Table 1), making it larger than the Asian lotus genome. This marked difference in repeat content is worth investigating, because most protein-coding genes show a one-to-one syntenic relationship between the two species. A total of 29,533 structural variations (SVs) were detected between the two species, and SV-associated genes were enriched in the terms ‘regulation of mitotic cell cycle’ and ‘protein transporter activity’ [9][25]. The same study also showed that selection on an MYB gene might contribute to the difference in flower color between Asian and American lotus [9][25]. It remains an open question when the two species diverged and how they have maintained such high similarity despite evolving independently in separate geographical regions. Wild lotus is widely distributed around the world and retains higher genomic diversity than cultivated lotus. Tropical and temperate lotus are the two ecotypes of Asian lotus.
A genomic comparison of these two ecotypes identified 453 genes under selection, including cyp714a genes that may be related to rhizome morphogenesis, as well as a 10-Mb region on chromosome 1 that might play a key role in environmental adaptation and contains a homolog of the Arabidopsis gene at5g2394 encoding an acyltransferase [8][24]. Comparison of expression patterns further suggested that genes encoding granule-bound starch synthases, together with genes related to storage organ development, the CONSTANS-like gene family, vernalization, and cold response, may be associated with ecotypic differentiation [10][26]. Knowing the genetic background of parental lines is very important in breeding, and the origin, classification, and evolution of cultivated lotus have been investigated through population re-sequencing. A total of 18 lotus accessions, covering American, seed, rhizome, flower, wild, and Thai lotus, were re-sequenced and used to construct a phylogenetic tree. The results indicated that rhizome lotus is more closely related to wild lotus, whereas seed and flower lotus are admixed [10][26]. This pattern was supported by re-sequencing of a larger population of 296 accessions, including 58 wild, 163 rhizome, 39 flower, and 32 seed lotus varieties [12][28]. Further re-sequencing of 69 lotus accessions showed that flower lotus might be admixed with rhizome or seed lotus [11][27].
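The selection scans described above are not detailed here, so the statistic used in the cited studies is not known from this text. One statistic commonly used in plant re-sequencing studies is the ratio of nucleotide diversity (π) between two groups in genomic windows, because a selective sweep lowers diversity in the selected group. The sketch below computes π from hypothetical allele counts; the window size, sample size, counts, and group labels are illustrative assumptions, not data from any lotus study.

```python
def site_pi(alt_count: int, n_chrom: int) -> float:
    """Expected heterozygosity at one biallelic SNP (sample-size corrected)."""
    p = alt_count / n_chrom
    return 2 * p * (1 - p) * n_chrom / (n_chrom - 1)


def window_pi(alt_counts, n_chrom: int, window_bp: int) -> float:
    """Nucleotide diversity per bp: summed site diversity over window length."""
    return sum(site_pi(c, n_chrom) for c in alt_counts) / window_bp


# Hypothetical alternate-allele counts at the SNPs of one 10-kb window,
# each group sampled at 10 chromosomes (illustrative numbers only).
N_CHROM = 10
WINDOW_BP = 10_000
group_a = [5, 4, 6, 5, 3]    # hypothetical diverse group (e.g. wild)
group_b = [0, 1, 0, 10, 0]   # hypothetical group with reduced diversity

pi_a = window_pi(group_a, N_CHROM, WINDOW_BP)
pi_b = window_pi(group_b, N_CHROM, WINDOW_BP)
ratio = pi_a / pi_b  # a high ratio hints at a sweep in group B
print(f"pi_A = {pi_a:.2e}, pi_B = {pi_b:.2e}, pi_A/pi_B = {ratio:.1f}")
```

In a genome-wide scan, the ratio would be computed for every window and the most extreme tail (for example, the top 5%) treated as candidate selected regions; the cutoff is a study-specific choice.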

4. Identification of Genes with Potential Application in Lotus Breeding

Flower color, flower shape, and flowering time are important traits that determine the ornamental value of lotus. Lotus flowers come in three basic colors: red and white in Asian lotus and yellow in American lotus. The red color of Asian lotus is determined by anthocyanin content [13][14][40,41], which is controlled by the enzyme-encoding genes of the anthocyanin biosynthetic pathway and by the transcription factors (TFs) that regulate them, such as MYB, basic helix-loop-helix (bHLH), and WD40 proteins. Among the enzyme-encoding genes in this pathway, NnANS and NnUFGT appear to be the two decisive genes [15][16][42,43]. A transcriptome analysis suggested that several TFs, including five MYB, two bHLH, and one WD-repeat gene, may be involved in regulating anthocyanin biosynthesis in lotus [16][43]. Among them, the bHLH gene NnTT8 was verified to regulate anthocyanin biosynthesis [17][44]. By contrast, the yellow color of American lotus is determined by carotenoids, and no anthocyanin was detected in it [9][18][25,45].

The rhizome is the main edible part of lotus, so understanding the mechanisms of rhizome formation and enlargement is important for rhizome lotus breeding. Comparative transcriptomic and proteomic analyses of rhizome development have been conducted to identify the key genes and pathways involved in this process [19][20][21][52,53,54]. Re-sequencing of natural and F2 genetic populations has also identified several genomic regions and candidate genes that might be involved in lotus rhizome enlargement [22][55]. One candidate, CONSTANS-LIKE 5 (NnCOL5), was analyzed systematically: functional analysis in potato indicated that NnCOL5 might be positively associated with rhizome enlargement by regulating the expression of CO-FT genes and the GA signaling pathway [23][56]. In addition, a SNP in another candidate gene, NnADAP of the AP2 subfamily, was found to be closely associated with both the rhizome enlargement phenotype and soluble sugar content [24][57]. Temperate and tropical lotus differ markedly, especially in rhizome morphology.
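The NnADAP finding is an example of a SNP whose genotype tracks a phenotype. The statistical model behind it is not given here; as a minimal illustration, the sketch below fits an ordinary least-squares regression of a hypothetical rhizome-enlargement score on allele dosage (0, 1, or 2 copies of the alternate allele) and reports the slope and r². Published association studies typically use mixed linear models that also correct for population structure and kinship, which this toy version omits.

```python
from statistics import mean


def single_marker_test(dosages, phenotypes):
    """Ordinary least-squares fit of phenotype on allele dosage.

    Returns (slope, r_squared): the slope is the estimated phenotype change
    per copy of the alternate allele; r_squared is the variance explained.
    """
    x_bar, y_bar = mean(dosages), mean(phenotypes)
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    syy = sum((y - y_bar) ** 2 for y in phenotypes)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


# Hypothetical genotypes (0/1/2 alternate alleles) and a hypothetical
# rhizome-enlargement score for six accessions.
dosages = [0, 0, 1, 1, 2, 2]
scores = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope, r_squared = single_marker_test(dosages, scores)
print(f"slope = {slope:.2f} per allele, r^2 = {r_squared:.3f}")
```

Here each extra alternate allele raises the toy score by about 1.0 and the SNP explains most of the variance; in real data, a significance test and correction for multiple testing across all SNPs would also be required.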

Lotus seeds are rich in nutrients and functional compounds such as alkaloids, flavonoids, and polyphenols [25][26][58,59], and they are consumed “as both food and medicine” [27][60]. Increasing the yield and nutritional value of lotus seed is therefore essential. Seed yield is determined mainly by seed size and the number of seeds per seedpod. Transcriptome analysis of the cotyledons of “CA” and “Jianxuan-17” (“JX-17”) seeds at different developmental stages identified 8437 differentially expressed genes (DEGs). Many of these DEGs are involved in the brassinosteroid biosynthesis pathway, and further analysis predicted two ADP-glucose pyrophosphorylase (AGPase) genes as candidates affecting lotus seed yield [28][61]. These results suggest that phytohormones are involved in lotus seed development. Combined metabolomic and proteomic analyses revealed that 15 days after pollination (DAP) is the point at which seeds switch from a physiologically active stage to a nutrient accumulation stage [29][62]. Starch is the primary nutritional component of mature lotus seed [30][63]: its content largely determines the nutritional value of the cotyledon, and the proportion of amylose to amylopectin largely determines its taste. AGPase plays an important role in regulating starch biosynthesis.
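At its simplest, calling DEGs between developmental stages means comparing each gene's expression across conditions and keeping genes whose change passes a threshold. The sketch below shows only a fold-change filter on hypothetical normalized expression values (gene names, stages, and numbers are illustrative assumptions); dedicated tools such as DESeq2 or edgeR additionally model count variability across replicates, test statistical significance, and control the false discovery rate.

```python
import math


def log2_fold_change(before: float, after: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of expression after vs before; the pseudocount avoids log(0)."""
    return math.log2((after + pseudocount) / (before + pseudocount))


def call_degs(expression, min_abs_lfc: float = 1.0):
    """Return {gene: log2FC} for genes changing at least 2-fold (|log2FC| >= 1)."""
    degs = {}
    for gene, (early, late) in expression.items():
        lfc = log2_fold_change(early, late)
        if abs(lfc) >= min_abs_lfc:
            degs[gene] = round(lfc, 2)
    return degs


# Hypothetical normalized expression (e.g. counts per million) of three genes
# in cotyledons at an early and a late developmental stage.
expression = {
    "gene_1": (10.0, 90.0),
    "gene_2": (50.0, 55.0),
    "gene_3": (120.0, 20.0),
}

print(call_degs(expression))
```

With these toy values, gene_1 is up-regulated and gene_3 down-regulated at the later stage, while gene_2 changes too little to pass the 2-fold threshold.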

5. Conclusions

New lotus varieties with high yield, wide adaptability, and stress resistance play a vital role in increasing the economic value of this important horticultural crop. In recent years, the identification of genetic variation, the cloning of functional genes, and the analysis of metabolite differences among diverse germplasm resources have been driven by progressively improved genome information, which can facilitate breeding practice in lotus (Figure 1). However, the quality of the reference genome remains a limiting factor for molecular breeding. Further improvement of the lotus reference genome will be essential in the future, because it directly affects the accuracy of molecular markers and the efficiency of cloning functional genes. Gapless reference genomes and pan-genomes have become the new standard, enabling the exploration of further genomic information such as open chromatin and more complete variant catalogs. With the explosive growth of large-scale omics data, deep learning could be used to mine biological information and decipher gene regulatory networks.

Figure 1.
Flowchart of the molecular breeding process of lotus.