Developing Genomic Resources for Crop Improvement


Please note this is a comparison between Version 2 by Fanny Huang and Version 1 by Pradeep Ruperao.

The emerging sequencing technologies target generating more data with fewer inputs and at lower costs. This has also translated to an increase in the number and type of corresponding applications in genomics besides enhanced computational capacities (both hardware and software). Alongside the evolving DNA sequencing landscape, bioinformatics research teams have also evolved to accommodate the increasingly demanding techniques used to combine and interpret data, leading to many researchers moving from the lab to the computer.

sequencing technologies
assemblies
crop
plant genomic

1. Introduction

With more than 40 years of remarkable improvements in DNA sequencing, the development of cost-reducing, higher-throughput sequencing technologies, along with relevant bioinformatics tools, has made it possible to produce high-quality genome assemblies in a much shorter time. This has subsequently led to the mapping of genetic variation in thousands of individuals, providing genetic insights into population histories and domestication events. The Earth BioGenome Project (EBP), a multinational and multi-institutional consortium, aims to unify the phylogenetic networks across all eukaryotic life derived from their complete de novo genomes [1,2]^[1][2]. This illustrates how far the standardization of genome data generation, assembly, storage, retrieval, and analysis has advanced, with more expected and required as massive genomic data are generated from species bridging the phylogenetic gaps between currently sequenced genomes.

Complete reference genome assemblies of the entire plant kingdom will open new scientific views on the evolution and speciation events on earth and genetic control of plant traits, both at intra- and inter-species levels. They will also enhance the understanding of how plants function in ecosystems, lead to the discovery of natural botanical compounds for human medicine, and will aid an increase in food production to curb global hunger while respecting planetary boundaries and adapting to climate change.

2. Plant Genomic Resources (Big Data Generation)

Sequencing technologies, mainly high-throughput NGS sequencers, generate significant amounts of data. For example, the recent Illumina sequencer (NovaSeq 6000) has a higher output than earlier generations of sequencing machines, producing between 1300 and 20,000 million reads (65 Gb to 3 Tb). Long reads from PacBio reach a maximum of 300 Kb, and the data generated with Sequel I, II (CLR), and II (HiFi) range from 0.5 million to 400 million reads (15 Gb to 100 Gb), while nanopore sequencing (MinION and PromethION) yields 2.5–12 million reads (40 Gb to 180 Gb).

With this capacity, sequencing land plants, which span a wide range of genome sizes, can in theory generate good coverage of the entire genome. For example, the corkscrew plant Genlisea margaretae with a 1C value of 0.07 pg (65 Mb) and the canopy plant Paris japonica with a 1C value of 152.2 pg (148.9 Gb) are equally accessible in terms of raw sequence generation and coverage [53]^[3] (https://cvalues.science.kew.org/). Generating several-fold coverage of a genome produces potentially massive datasets, ranging from Gb to Tb of sequence information. Depending on the scope of the project, handling such large datasets is a major concern for small (or even big) research labs. Decades ago, geneticists were mostly involved in lab work; now, the most limiting factor is the computational analysis of the data to derive meaning from it. Understanding the algorithms and processing the data are a crucial part of genetics and genomics data analysis when searching for biological meaning.

Genomic sequencing is a field where handling and processing big data requires a suitable storage and data transfer platform, such as those offered by cloud technologies. These are extensively applied to make the data available to all researchers in a project and indeed to researchers worldwide. The genome sequence data generated for a crop genome project are immense; for example, a single Sorghum genome sequence contains over 50 gigabytes of raw data (depending on the data format generated). Processing such data for large population-wide studies that seek deeper scientific insights, such as marker–trait association, diversity and domestication analyses, and the assessment of data from gene-editing technologies, requires robust storage and computing capacities.

To maintain the uniformity of the data in the global databases, the members’ databases (GenBank, EMBL, DDBJ, CNGBdb, IBDC) of the International Nucleotide Sequence Database Collaboration (INSDC) [54]^[4] share and update genomic data periodically.

The recent statistics release of GenBank reports 16.7 trillion nucleotide bases for 1.7 million whole genome sequences (as of June 2022) (GenBank and WGS Statistics (ncbi.nlm.nih.gov)). Of these, green plant data (Viridiplantae) alone account for 93.8 million sequences from 2324 genomes (including variants of the same plant species genome), comprising 33.4 million genomic DNA/RNA sequences, 41.5 million mRNA sequences, and 80,709 rRNA sequences.

With the increasing complexity of genomic data themselves, the major databases also integrate other genomic features and provide tools to search and retrieve these datasets. The Entrez system of NCBI is one such tool allowing users to search, view, and download the sequences from GenBank. Other modes of data accessibility allow for downloading from the FTP site (ftp.ncbi.nlm.nih.gov) or downloading data programmatically with the provided public API to the Entrez system (https://eutils.ncbi.nlm.nih.gov).
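As an illustration of programmatic access, the sketch below only builds request URLs for the public E-utilities `esearch` and `efetch` endpoints and does not send them. The helper names, the `nuccore` database choice, and the example search term are assumptions made for this example; real use should follow NCBI's usage guidelines (rate limits, API keys).

```python
from urllib.parse import urlencode

# Base path of the NCBI E-utilities service mentioned in the text.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(db: str, ids: list[str], rettype: str = "fasta") -> str:
    """Build an EFetch URL that returns the given records as plain text."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query: nucleotide records for sorghum.
    print(build_esearch_url("nuccore", "Sorghum bicolor[Organism]", retmax=5))
```

The IDs returned by the search request would then be passed to `build_efetch_url` to retrieve the sequences themselves.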

Numerous databases have been developed for genomic data to suit a variety of different purposes. Depending on their data catchment, databases range from global repositories holding the sequences of all species, like Ensembl Plants, the National Center for Biotechnology Information (NCBI), PlantGDB, and the Plant Genome Database Japan (PGDBj), through medium-sized databases hosting only plant genome assemblies/annotations, like Phytozome and the Legume Information System (LIS) (https://www.legumeinfo.org), to smaller databases containing crop/plant-specific information, such as the chickpea SSR database (https://cegresources.icrisat.org/CicArMiSatDB/index.html) [55]^[5] and the chickpea SNP and indel database (https://cegresources.icrisat.org/cicarvardb/) [56]^[6]. However, the medium to smaller databases are limited to the scope of species-level data, like the LIS and the proposed angiosperms database [57]^[7], and may not need powerful bioinformatics tools and computational resources to explore terabytes of genomic data. Many such databases were discussed earlier in [58]^[8].

3. Plant Genome Assemblies

Genome assembly refers to aligning the small fragments of a DNA sequence to reconstruct the genome sequence in the original order and orientation. High-throughput sequencing through first- and second-generation sequencing has enabled the assembly of many plant genomes. The highly fragmented genome assemblies generated with short reads have been improved with long-read sequence assemblies, simplifying and improving the ability to generate chromosome-level assemblies with reduced reliance on dedicated research experts.

Thanks to the NGS technology and increased computational power, the standard of the genome assemblies available has improved significantly. Genomics has accelerated its growth in the past decade from draft-level genome assemblies to reference-level genome assemblies [78,79,80]^[9][10][11].

The plant genomes assembled in the first-generation sequencing (FGS) era faced significant throughput issues and were limited by a read length of around 1 Kb. This necessitated approaches such as BAC-end reads and BAC barcoding to allow contigs to be linked and positioned through genetic mapping. Far fewer plant genomes were assembled in the FGS era than in the second-generation (SGS) and third-generation (TGS) sequencing eras, primarily due to the lower throughput and high cost of FGS. The situation changed sharply with SGS, as the volume of sequence (although not the read length) increased significantly. Long-read sequencing technologies play a crucial role in genome assembly projects by helping to scaffold contig sequences, and thus many genome projects were initiated with combined SGS and TGS technologies. With the advent of advanced sequencing technologies such as PacBio HiFi sequencing, which produces 10 to 30 Kb circular consensus sequences (CCS) with reduced error rates [11]^[12], Oxford Nanopore long-read protocols [81]^[13], Hi-C scaffolding [32]^[14], and optical mapping technologies, such as Bionano [82]^[15], it is possible to assemble complex genomes. The emerging third-generation sequence data have boosted genome assembly quality, enabling chromosome-level assemblies by overcoming the limitations of short-read assembly, particularly in plants, where islands of repeat sequences need to be bridged between the gene-rich regions of the chromosomes. With low-cost and high-throughput sequence data generation, at least 1143 plant reference assemblies have been published (www.plabipd.de). Depending on the availability of funds and the feasibility of high-volume sequence data generation, multiple individuals of the same species have been de novo assembled, e.g., potato [83]^[16], or the genome assembly of the same variety has been improved, such as for chickpea [84,85]^[17][18] and sesame [86]^[19].

The development of long-read technologies as part of TGS allowed for a relatively simple assembly of smaller genomes. With optical and chromatin-based methods, such as Bionano and Hi-C, far more comprehensive and larger genome assemblies are now possible, based on a range of techniques, including the integration of scaffolds into chromosomes through genetic mapping.

In recent years, gold-standard and platinum-standard chromosome-level genome assemblies have been achieved in prominent model crop plants [87,88,89,90,91,92]^[20][21][22][23][24][25]. Here, a gold-standard assembly refers to cases where the number of superscaffolds matches the number of haploid chromosomes, yielding a chromosome-level assembly; a platinum-standard assembly refers to a telomere-to-telomere (T2T) assembly with the final scaffolds matching the number of haploid chromosomes. Publications reporting crop assemblies that meet these standards are continuing to appear [93]^[26]. The importance of platinum-standard reference genome assemblies, and of comparing cultivated species with their wild relatives, has been documented for rice [94]^[27].

Chromosome-level genome assemblies were initiated with Arabidopsis in 2000 [95]^[28] and later with rice in 2005 [96]^[29]. These assemblies were generated with the traditional, expensive, and low-throughput Sanger sequencing method. With current third-generation approaches (such as PacBio HiFi, Hi-C, and optical mapping methods), it is possible to generate chromosome-level pseudomolecules [97]^[30]. With PacBio sequence data, a chromosome-level assembly was first achieved for Arabidopsis [98]^[31], followed by Oropetium [99]^[32]. Similar to PacBio long reads, Oxford Nanopore Technologies (ONT) generates reads of around 200 Kb in length, which are highly suitable for bacterial genome assembly [100]^[33]. Synthetic long reads (SLR) are long sequences reconstructed from Illumina short-read data [101]^[34]. In total, 113 plant species have published chromosome-level genome assemblies (as of the end of 2022) (www.plabipd.de) out of a total of 1143 flowering plant assemblies, and 125 assemblies are from non-flowering plants. Most of these near-complete plant genomes were produced with sequence data generated from multiple technologies. The long-read 10× Genomics data with short-read Illumina data were used to assemble the blueberry genome [102]^[35]. PacBio and Hi-C sequencing technologies were used for assembling the octoploid sugarcane genome [103]^[36], allotetraploid peanut [104]^[37], and teff [105]^[38].

Several novel technologies have emerged, such as optical mapping [106]^[39] (e.g., the Irys system by BioNano Genomics, www.bionanogenomics.com) and chromosome conformation capture sequencing (Hi-C) [32]^[14], to improve scaffolding without depending on genetic mapping. These advances in genome assembly have recently been taken further to generate telomere-to-telomere (T2T) assemblies, first implemented in 2020 for the X chromosome of the human genome [107]^[40] and later adapted to plants, such as Arabidopsis [108,109]^[41][42], rice [110]^[43], and banana [111]^[44]. The combination of PacBio data with a modified Hi-C protocol from Dovetail Genomics has improved the assembly contiguity for A. alpina [112]^[45]. High-resolution, gap-free T2T genome assemblies aim to capture all repetitive sequences and genomic variants without misassemblies.

The greatest bioinformatics challenge in sequencing plant genomes has been repetitive sequences, which lead to sequencing errors and to assembly errors that go unrecognized at earlier stages of assembly computation. As plant genome size, ploidy, or repeat content increases, so does the complexity of assembling the sequence reads correctly, and thus the genome projects concerned needed increasingly sophisticated strategies (such as the chromosome flow sorting methods used in wheat) to handle such challenges. Additionally, handling and storing terabytes of sequence data, managing computing clusters, and dealing with the complexity of the algorithms also need to be addressed.

In addition to improving the quality of reference genomes to platinum standard, present-day technologies have paved the way for a transformational shift from a single representative genotype's genome sequence to the pan-genome sequence as a reference for a better understanding of the variability present within a species [113]^[46]. The advantages of the pan-genome reference are being realized in generating novel insights and identifying the genes or genomic regions underlying important agronomical traits and the domestication process [86,114,115,116,117,118]^[19][47][48][49][50][51].

4. Genome Assemblers

As sequencing technology evolved, assembly approaches also had to evolve. The Celera Assembler and Arachne were developed to handle the genomes of the fruit fly (Drosophila melanogaster) and human in 2000–2003; later, AMOS was launched under an open-source framework. These assemblers were based on overlap–layout–consensus on an overlap graph [120]^[52], in which the nodes are the reads and the edges represent the shared sequence between reads. This type of assembler is suitable for assembling FGS reads produced by the dideoxy termination method (Sanger sequencing). As massively parallel high-throughput sequencing technology (SGS) was developed to produce millions of bases, the reads became shorter and more error-prone, with higher genome coverage. Illumina, the leading SGS/NGS technology, yields 35–150 bp paired-end reads from fragments with a 200–300 bp insert size. Such high-throughput data required a new approach, and thus de Bruijn graph-based assembly was developed [121,122]^[53][54], where the nodes represent fixed-length strings drawn from a larger set of strings, and the edges represent perfect shared sequences. However, de Bruijn graph-based assemblers have difficulties handling sequencing errors and need high computational power (100+ GB of memory). The challenges of uneven genome coverage and reads too short to span repeated regions can be addressed by combining many short reads with fewer longer reads or mate–pair reads (Sanger, 454, and Illumina sequencing methods). Multiplex de Bruijn graphs automate the assembly of long HiFi reads [123]^[55], and the recently updated Minimap2 version can be used for long read assembly [124]^[56]. Newbler, released in 2004, was the first assembler for 454 sequence data, followed by a hybrid version of the MIRA assembler for 454 data mixed with Sanger reads.

After Illumina sequencing technology was upgraded from the initial 36-base reads to reads over 100 bases in length, the sequence produced became suitable for de novo assembly. After the release of the SHARCGS assembler for Solexa reads, other assemblers were released and became the most popular assembly tools.
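The de Bruijn graph idea described above can be illustrated with a toy sketch. This is a simplified, error-free illustration under stated assumptions, not the method of any particular assembler: nodes are (k-1)-mers, each k-mer in a read contributes one edge, and the function names and example reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix node to its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph: dict[str, set[str]], start: str) -> str:
    """Follow edges from `start` while the path is unambiguous (one successor)."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in visited:  # stop on a cycle, e.g. a repeat
            break
        contig += next_node[-1]
        visited.add(next_node)
        node = next_node
    return contig


if __name__ == "__main__":
    # Hypothetical overlapping reads drawn from the sequence ATGGCGTACC.
    reads = ["ATGGCG", "GGCGTA", "CGTACC"]
    graph = build_de_bruijn(reads, k=4)
    print(extend_contig(graph, "ATG"))
```

Real assemblers add error correction, handling of branching nodes caused by repeats, and compact graph representations; the walk here simply stops at the first branch or cycle, which is where short reads fail to resolve repeats.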

Plant genome assembly was initiated with Arabidopsis thaliana in December 2000 [95]^[28], using an approach that relied on overlapping, end-sequenced bacterial artificial chromosome (BAC) clones; the same approach was applied to the crop plant rice [125,126]^[57][58]. Later, the emerging whole genome shotgun (WGS) strategy was applied to black cottonwood [127]^[59], followed by two versions of the grapevine genome sequence in 2007 [128,129]^[60][61]. Assembling the short sequence reads proved more difficult and challenging, and resulted in more fragmented genome assemblies. A hybrid approach was adopted to sequence cucumber with Illumina and Sanger sequencing technology, indicating the feasibility of using this approach for plant genome sequencing [130]^[62]. With the change in technology, 454 combined with the Sanger sequencing approach was applied to the genomes of apple [131]^[63], cocoa [132]^[64], and muskmelon [133]^[65]. In 2011, strawberry became the first plant genome sequenced using SGS technology, combining the 454, Illumina, and SOLiD platforms [134]^[66]; Chinese cabbage [135]^[67], potato [136]^[68], chickpea [137]^[69], pigeonpea [138]^[70], and watermelon [139]^[71] followed.

The advances in sequencing technology (SGS and TGS) and assembly approaches have removed the limitations on genome sequencing not only for crops with small genomes but also for crops with large genomes, like wheat (~17 Gbp) [87,140,141]^[20][72][73], barley (5.1 Gbp) [142]^[74], rye (~7–8 Gbp) [143]^[75], and tea (~3.8–4.0 Gbp) [144]^[76], which are important for animal feed and human nutrition.

Genome assembly quality has improved as sequencing technologies and assembly tools have improved, especially when multiple TGS technologies are combined, as the following examples illustrate.

The initial version of the sorghum genome assembly, released in 2009 [145]^[77] and based on shotgun sequencing and BAC library data, captured 738.5 Mb of sequence in 12,873 contigs (scaffolded to 3304 sequences). It is more fragmented than the chromosome-scale assembly of the sorghum genome built from nanopore sequencing and optical mapping data, a hybrid assembly of 29 scaffolds capturing 661.16 Mbp [146]^[78].
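Fragmentation of the kind compared above is often summarized with a contiguity statistic. The sketch below computes N50, a commonly used metric that the original text does not name; it is included only as an illustrative assumption, and the contig lengths in the usage note are made up.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Accumulate from the longest sequence downwards.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical contig lengths (in kb) for a small, fragmented assembly.
    print(n50([2, 3, 4, 5, 6]))
```

A higher N50 for the same total length indicates that the sequence is held in fewer, longer pieces, which is the direction of improvement described for the two sorghum assemblies.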

For rye (Secale cereale), which has a large genome (~8 Gb), a virtual linear gene order model (22,426 genes) was initially established with high-throughput transcript mapping and chromosome survey sequencing [147]^[79]. Subsequently, a shotgun de novo reference genome assembly produced 1.29 million scaffolds capturing 2.8 Gbp of sequence [148]^[80], and later a chromosome-scale genome assembly built with 10×, Hi-C, Bionano optical genome mapping, and chromosome-specific shotgun (CSS) reads produced 6.74 Gb (of an estimated 7.9 Gb) [149]^[81].

In addition to the chromosome-scale assemblies, TGS has enabled the assembly of polyploid genomes, such as bread wheat [87]^[20], potato [150]^[82], and peanut [151]^[83].

5. Advancements in Plant Genomics

With the emerging sequence technology and bioinformatics tools, it is possible to assemble a nearly complete genome sequence. With cytogenetic advances to measure the genome size (such as flow cytometry), a genome size estimation is a useful first step in a complete genome sequencing project. The amount of sequencing data required to produce a given level of coverage depends on the 1C amount of DNA per cell (including ploidy level), and for most species, this can be found in the Kew Plant Genome Database. Most plant genome assemblies are smaller than the cytogenetic genome estimation size; this may be because of assembly errors or difficult-to-approach genomic regions, like centromeric and repetitive regions in the plant genome, where assemblers struggle (physical maps, such as Bionano, resolve such issues). Some of the assembled plant genome sizes are quite close to the cytogenetic estimated size, indicating the assembler has captured the majority of the genome content. Assemblies above the estimated size, however, may need refinement to reduce contaminants or alter the assembly parameters.
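The comparison between an assembled size and a cytogenetic estimate can be expressed as a simple ratio. The sketch below is illustrative only; the function names and the 5% tolerance used to call an assembly "close" to the estimate are hypothetical choices, not thresholds taken from the text.

```python
def assembled_fraction(assembled_size: float, estimated_size: float) -> float:
    """Fraction of the estimated genome size captured by the assembly."""
    if estimated_size <= 0:
        raise ValueError("estimated_size must be positive")
    return assembled_size / estimated_size


def classify_assembly(assembled_size: float, estimated_size: float,
                      tolerance: float = 0.05) -> str:
    """Label an assembly relative to the estimate (tolerance is an assumption)."""
    fraction = assembled_fraction(assembled_size, estimated_size)
    if fraction < 1 - tolerance:
        return "below estimate"   # e.g. unresolved repeats or centromeres
    if fraction > 1 + tolerance:
        return "above estimate"   # e.g. contaminants or uncollapsed haplotypes
    return "close to estimate"


if __name__ == "__main__":
    # Rye chromosome-scale assembly: 6.74 Gb of an estimated 7.9 Gb (from the text).
    print(f"{assembled_fraction(6.74, 7.9):.1%}", classify_assembly(6.74, 7.9))
```

Both sizes must be given in the same unit; the labels are only a first-pass flag for assemblies that may need refinement.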

The genome assembly provides the coordinate system for gene models and other genomic features, such as SNPs, indels, and SSRs. Predicting gene models by ab initio gene finding, with supporting evidence in the form of RNA data, increases accuracy. However, a single assembly may not list the complete complement of genes of a species; resequencing a wide range of diverse accessions reveals further genes that are genotype-specific. For example, the resequencing of >1000 wild and cultivated rice accessions has predicted the presence of thousands of genes with lower sequence diversity in cultivated rice, indicating a genetic bottleneck during rice domestication [114,152]^[47][84]. Moreover, genetic diversity is often reduced during domestication, and resequencing a single individual may not capture the species-wide gene content. Thus, the concept of the pan-genome was developed and adapted to plant genomes to identify the species-wide gene content. A pan-genome comprises the core genome, usually defined as the housekeeping genes (which must be present for the organism to survive and reproduce), and the variable/dispensable genes (which are present or absent in a particular cultivar/accession of a species) that reflect the gene diversity or variability in a species. The first plant pan-genomes appeared in 2007, describing the variable genes in rice and maize genomes, and the approach was later adapted to a wide range of plant genomes [153]^[85], including banana [154]^[86], white lupin [155]^[87], barley [156]^[88], wheat [156]^[88], wheat panache [157]^[89], and sorghum [158]^[90].
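The split between core and variable genes can be sketched from presence/absence data. The example below is a simplified illustration: it treats a gene as core when it is present in every accession, which is one common operational definition and an assumption here; the accession and gene names are hypothetical.

```python
def split_pan_genome(presence: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Split genes into core (present in all accessions) and variable (the rest).

    `presence` maps an accession name to the set of genes detected in it.
    """
    if not presence:
        return set(), set()
    gene_sets = [set(genes) for genes in presence.values()]
    pan_genome = set().union(*gene_sets)     # every gene seen in any accession
    core = set.intersection(*gene_sets)      # genes shared by all accessions
    return core, pan_genome - core


if __name__ == "__main__":
    # Hypothetical presence/absence calls for three accessions.
    calls = {
        "acc1": {"g1", "g2", "g3"},
        "acc2": {"g1", "g3"},
        "acc3": {"g1", "g3", "g4"},
    }
    core, variable = split_pan_genome(calls)
    print(sorted(core), sorted(variable))
```

In practice, presence calls come from assembly or read-mapping evidence, and studies often relax the "all accessions" rule to tolerate missing data.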

The most common downstream analysis with pan-genome assemblies is the identification of genetic variation in any DNA segment of a genome or a gene (including gene fragments) that can be used as a marker for genotyping. Bioinformatics resources enhancing crop genomics for downstream analysis cover copy number variations (CNV), the identification of variations based on length (SNP, SSR, indels), sets of SNPs used as a unit in the form of a haplotype to increase the resolution of GWAS, k-mer analysis, linkage disequilibrium (LD), presence–absence variations, pan-genome-wide association studies (PWAS), genotyping-by-sequencing, reduced representation sequencing, and domestication and diversity analysis. With these bioinformatics tools, genomic data also assist plant phylogenomic research with useful information, such as genome diversity and speciation events. Bioinformatics has therefore become an essential part of plant genomics research.

High-throughput genotyping enables the genotyping of thousands of targeted loci (genetic markers) on thousands of samples. Depending on the number of markers and the sample size, different genotyping techniques can call genotypes in different ranges. Some of the technologies include Illumina GoldenGate, Affymetrix SNP, reduced-representation genome sequencing, exome-seq, Fluidigm (https://investors.fluidigm.com/node/13686/pdf), IntelliQube (https://www.myebpl.com/intelliqube.html), MassARRAY [185]^[91], MassEXTEND, GeneChip [186]^[92], APEX-Seq [187]^[93], BeadArray (https://www.illumina.com/science/technology/microarray.html), TaqMan [188]^[94], and DArT (https://www.diversityarrays.com/). Genotyping by sequencing (GBS) is a highly multiplexed system for constructing reduced representation libraries for sequencing platforms at low cost, with reduced sample handling and no need for a reference genome. GBS approaches (including single-digest RAD, double-digest RAD, and skim-sequencing) are tools for genomics-assisted breeding in a range of plant species through applications such as SNP identification, gene/QTL mapping, molecular diversity, GWAS, construction of high-density genome maps, haplotype maps, phylogenetics, identification of candidate genes, genetic linkage analysis, molecular marker discovery, and genome sequencing and selection. Such genetic resources assist in predicting the genetic value of selection candidates based on genomic estimated breeding values (GEBV) from high-density, high-quality markers. Genomic selection (GS) is an approach that exploits genetic markers in marker-based models to increase the genetic gain of complex traits in breeding programs. High-throughput marker technologies have changed the entire scenario of marker applications and enabled the routine use of GS for crop improvement.
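The use of GEBVs to rank candidates can be sketched as a weighted sum of marker genotypes. This is a minimal illustration under stated assumptions: genotypes are coded as allele counts (0, 1, 2), marker effects are taken as already estimated by some training model, and all names and numbers below are hypothetical.

```python
def gebv(genotype: list[int], marker_effects: list[float],
         intercept: float = 0.0) -> float:
    """Genomic estimated breeding value as intercept + sum(genotype * effect)."""
    if len(genotype) != len(marker_effects):
        raise ValueError("genotype and marker_effects must have the same length")
    return intercept + sum(g * b for g, b in zip(genotype, marker_effects))


def rank_candidates(genotypes: dict[str, list[int]],
                    marker_effects: list[float]) -> list[str]:
    """Return candidate names ordered from highest to lowest GEBV."""
    scores = {name: gebv(g, marker_effects) for name, g in genotypes.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical effects for three markers and allele counts for three lines.
    effects = [0.5, -0.2, 0.1]
    lines = {"line_A": [2, 0, 1], "line_B": [0, 2, 0], "line_C": [1, 1, 1]}
    print(rank_candidates(lines, effects))
```

Estimating the marker effects themselves (for example with penalized regression on a phenotyped training population) is the statistically demanding step and is not shown here.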

Plant phenotyping through conventional methods relies on manual measurements, which are laborious, error-prone, and time-consuming. Similar to genotyping, high-throughput phenotyping (HTP) (“phenomics”) has unique advantages in facilitating accurate, automated, high-quality data collection through techniques including visible light imaging, X-ray computed tomography, visible and near-infrared spectroscopy, multispectral imaging, chlorophyll fluorescence, fluorescence imaging, and nuclear magnetic resonance (NMR) [189]^[95]. These tools are generally used to obtain high-resolution images of samples, from which features are extracted with image processing algorithms. Machine learning algorithms are mostly used to process the data robustly and to produce accurate and time-efficient phenotypes of plants [190]^[96]. Highly accurate genotypic and phenotypic data need appropriate statistical methods to identify true associations between genetic and phenotypic variation. Plant phenotyping systems, imaging techniques, data collection methods, analysis techniques, and their challenges and applications have been reviewed elsewhere [191,192,193]^[97][98][99]. GWAS has high efficiency and high resolution and is conducted on a genome-wide scale with statistical programs. Some of the R packages developed for association analysis are GAPIT [194]^[100], qqman [195]^[101], gwasrapidd [196]^[102], eQTpLot [197]^[103], Postgwas [198]^[104], GWASTools [199]^[105], and IntAssoPlot [200]^[106].

References

Blaxter, M.; Archibald, J.M.; Childers, A.K.; Coddington, J.A.; Crandall, K.A.; Di Palma, F.; Durbin, R.; Edwards, S.V.; Graves, J.A.M.; Hackett, K.J.; et al. Why Sequence All Eukaryotes? Proc. Natl. Acad. Sci. USA 2022, 119, e2115636118.
Lewin, H.A.; Richards, S.; Aiden, E.L.; Allende, M.L.; Archibald, J.M.; Bálint, M.; Barker, K.B.; Baumgartner, B.; Belov, K.; Bertorelle, G.; et al. The Earth BioGenome Project 2020: Starting the Clock. Proc. Natl. Acad. Sci. USA 2022, 119, e2115635118.
Pellicer, J.; Leitch, I.J. The Plant DNA C-Values Database (Release 7.1): An Updated Online Repository of Plant Genome Size Data for Comparative Studies. New Phytol. 2020, 226, 301–305.
Arita, M.; Karsch-Mizrachi, I.; Cochrane, G. The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res. 2021, 49, D121–D124.
Doddamani, D.; Katta, M.A.V.S.K.; Khan, A.W.; Agarwal, G.; Shah, T.M.; Varshney, R.K. CicArMiSatDB: The Chickpea Microsatellite Database. BMC Bioinform. 2014, 15, 212.
Doddamani, D.; Khan, A.W.; Katta, M.A.V.S.K.; Agarwal, G.; Thudi, M.; Ruperao, P.; Edwards, D.; Varshney, R.K. CicArVarDB: SNP and InDel Database for Advancing Genetics Research and Breeding Applications in Chickpea. Database 2015, 2015, bav078.
Chen, F.; Dong, W.; Zhang, J.; Guo, X.; Chen, J.; Wang, Z.; Lin, Z.; Tang, H.; Zhang, L. The Sequenced Angiosperm Genomes and Genome Databases. Front. Plant Sci. 2018, 9, 418.
Chen, F.; Song, Y.; Li, X.; Chen, J.; Mo, L.; Zhang, X.; Lin, Z.; Zhang, L. Genome Sequences of Horticultural Plants: Past, Present, and Future. Hortic. Res. 2019, 6, 112.
Wang, W.; Mauleon, R.; Hu, Z.; Chebotarov, D.; Tai, S.; Wu, Z.; Li, M.; Zheng, T.; Fuentes, R.R.; Zhang, F.; et al. Genomic Variation in 3,010 Diverse Accessions of Asian Cultivated Rice. Nature 2018, 557, 43–49.
Ballouz, S.; Dobin, A.; Gillis, J.A. Is It Time to Change the Reference Genome? Genome Biol. 2019, 20, 159.
Varshney, R.K.; Sinha, P.; Singh, V.K.; Kumar, A.; Zhang, Q.; Bennetzen, J.L. 5Gs for Crop Genetic Improvement. Curr. Opin. Plant Biol. 2020, 56, 190–196.
Wenger, A.M.; Peluso, P.; Rowell, W.J.; Chang, P.C.; Hall, R.J.; Concepcion, G.T.; Ebler, J.; Fungtammasan, A.; Kolesnikov, A.; Olson, N.D.; et al. Accurate Circular Consensus Long-Read Sequencing Improves Variant Detection and Assembly of a Human Genome. Nat. Biotechnol. 2019, 37, 1155–1162.
Dumschott, K.; Schmidt, M.H.W.; Chawla, H.S.; Snowdon, R.; Usadel, B. Oxford Nanopore Sequencing: New Opportunities for Plant Genomics? J. Exp. Bot. 2020, 71, 5313–5322.
Lieberman-Aiden, E.; Van Berkum, N.L.; Williams, L.; Imakaev, M.; Ragoczy, T.; Telling, A.; Amit, I.; Lajoie, B.R.; Sabo, P.J.; Dorschner, M.O.; et al. Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science 2009, 326, 289–293.
Belser, C.; Istace, B.; Denis, E.; Dubarry, M.; Baurens, F.C.; Falentin, C.; Genete, M.; Berrabah, W.; Chèvre, A.M.; Delourme, R.; et al. Chromosome-Scale Assemblies of Plant Genomes Using Nanopore Long Reads and Optical Maps. Nat. Plants 2018, 4, 879–887.
Tang, D.; Jia, Y.; Zhang, J.; Li, H.; Cheng, L.; Wang, P.; Bao, Z.; Liu, Z.; Feng, S.; Zhu, X.; et al. Genome Evolution and Diversity of Wild and Cultivated Potatoes. Nature 2022, 606, 535–541.
Jain, M.; Misra, G.; Patel, R.K.; Priya, P.; Jhanwar, S.; Khan, A.W.; Shah, N.; Singh, V.K.; Garg, R.; Jeena, G.; et al. A Draft Genome Sequence of the Pulse Crop Chickpea (Cicer arietinum L.). Plant J. 2013, 74, 715–729.
Parween, S.; Nawaz, K.; Roy, R.; Pole, A.K.; Venkata Suresh, B.; Misra, G.; Jain, M.; Yadav, G.; Parida, S.K.; Tyagi, A.K.; et al. An Advanced Draft Genome Assembly of a Desi Type Chickpea (Cicer arietinum L.). Sci. Rep. 2015, 5, 12806.
Wang, H.; Yang, J.; Zhang, Y.; Qian, J.; Wang, J. Reconstruct High-Resolution 3D Genome Structures for Diverse Cell-Types Using FLAMINGO. Nat. Commun. 2022, 13, 2645.
Alonge, M.; Shumate, A.; Puiu, D.; Zimin, A.V.; Salzberg, S.L. Chromosome-Scale Assembly of the Bread Wheat Genome Reveals Thousands of Additional Gene Copies. Genetics 2020, 216, 599–608.
Zhang, S.V.; Zhuo, L.; Hahn, M.W. AGOUTI: Improving Genome Assembly and Annotation Using Transcriptome Data. Gigascience 2016, 5, 31.
Mamidi, S.; Healey, A.; Huang, P.; Grimwood, J.; Jenkins, J.; Barry, K.; Sreedasyam, A.; Shu, S.; Lovell, J.T.; Feldman, M.; et al. A Genome Resource for Green Millet Setaria viridis Enables Discovery of Agronomically Valuable Loci. Nat. Biotechnol. 2020, 38, 1203–1210.
Zhou, Y.; Zhao, X.; Li, Y.; Xu, J.; Bi, A.; Kang, L.; Xu, D.; Chen, H.; Wang, Y.; Wang, Y.G.; et al. Triticum Population Sequencing Provides Insights into Wheat Adaptation. Nat. Genet. 2020, 52, 1412–1422.
Zhu, H.Z.; Zhang, Z.F.; Zhou, N.; Jiang, C.Y.; Wang, B.J.; Cai, L.; Wang, H.M.; Liu, S.J. Bacteria and Metabolic Potential in Karst Caves Revealed by Intensive Bacterial Cultivation and Genome Assembly. Appl. Environ. Microbiol. 2021, 87, e02440-20.
Kille, B.; Balaji, A.; Sedlazeck, F.J.; Nute, M.; Treangen, T.J. Multiple Genome Alignment in the Telomere-to-Telomere Assembly Era. Genome Biol. 2022, 23, 182.
Zhang, F.; Xue, H.; Dong, X.; Li, M.; Zheng, X.; Li, Z.; Xu, J.; Wang, W.; Wei, C. Long-Read Sequencing of 111 Rice Genomes Reveals Significantly Larger Pan-Genomes. Genome Res. 2022, 32, 853–863.
Mussurova, S.; Al-Bader, N.; Zuccolo, A.; Wing, R.A. Potential of Platinum Standard Reference Genomes to Exploit Natural Variation in the Wild Relatives of Rice. Front. Plant Sci. 2020, 11, 579980.
Kaul, S.; Koo, H.L.; Jenkins, J.; Rizzo, M.; Rooney, T.; Tallon, L.J.; Feldblyum, T.; Nierman, W.; Benito, M.I.; Lin, X.; et al. Analysis of the Genome Sequence of the Flowering Plant Arabidopsis thaliana. Nature 2000, 408, 796–815.
Matsumoto, T.; Wu, J.; Kanamori, H.; Katayose, Y.; Fujisawa, M.; Namiki, N.; Mizuno, H.; Yamamoto, K.; Antonio, B.A.; Baba, T.; et al. The Map-Based Sequence of the Rice Genome. Nature 2005, 436, 793–800.
Michael, T.P.; VanBuren, R. Building Near-Complete Plant Genomes. Curr. Opin. Plant Biol. 2020, 54, 26–33.
Berlin, K.; Koren, S.; Chin, C.-S.; Drake, J.P.; Landolin, J.M.; Phillippy, A.M. Erratum: Corrigendum: Assembling Large Genomes with Single-Molecule Sequencing and Locality-Sensitive Hashing. Nat. Biotechnol. 2015, 33, 1109.
Vanburen, R.; Bryant, D.; Edger, P.P.; Tang, H.; Burgess, D.; Challabathula, D.; Spittle, K.; Hall, R.; Gu, J.; Lyons, E.; et al. Single-Molecule Sequencing of the Desiccation-Tolerant Grass Oropetium Thomaeum. Nature 2015, 527, 508–511.
Loman, N.J.; Quick, J.; Simpson, J.T. A Complete Bacterial Genome Assembled de Novo Using Only Nanopore Sequencing Data. Nat. Methods 2015, 12, 733–735.
McCoy, R.C.; Taylor, R.W.; Blauwkamp, T.A.; Kelley, J.L.; Kertesz, M.; Pushkarev, D.; Petrov, D.A.; Fiston-Lavier, A.S. Illumina TruSeq Synthetic Long-Reads Empower de Novo Assembly and Resolve Complex, Highly-Repetitive Transposable Elements. PLoS ONE 2014, 9, e106689.
Colle, M.; Leisner, C.P.; Wai, C.M.; Ou, S.; Bird, K.A.; Wang, J.; Wisecaver, J.H.; Yocca, A.E.; Alger, E.I.; Tang, H.; et al. Haplotype-Phased Genome and Evolution of Phytonutrient Pathways of Tetraploid Blueberry. Gigascience 2019, 8, giz012.
Zhang, L.M.; Leng, C.Y.; Luo, H.; Wu, X.Y.; Liu, Z.Q.; Zhang, Y.M.; Zhang, H.; Xia, Y.; Shang, L.; Liu, C.M.; et al. Sweet Sorghum Originated through Selection of Dry, a Plant-Specific Nac Transcription Factor Gene. Plant Cell 2018, 30, 2286–2307.
Bertioli, D.J.; Jenkins, J.; Clevenger, J.; Dudchenko, O.; Gao, D.; Seijo, G.; Leal-Bertioli, S.C.M.; Ren, L.; Farmer, A.D.; Pandey, M.K.; et al. The Genome Sequence of Segmental Allotetraploid Peanut Arachis Hypogaea. Nat. Genet. 2019, 51, 877–884.
VanBuren, R.; Wai, C.M.; Pardo, J.; Yocca, A.E.; Wang, X.; Wang, H.; Chaluvadi, S.R.; Bryant, D.; Edger, P.P.; Bennetzen, J.L.; et al. Exceptional Subgenome Stability and Functional Divergence in Allotetraploid Teff, the Primary Cereal Crop in Ethiopia. bioRxiv 2019, 580720.
Lam, E.T.; Hastie, A.; Lin, C.; Ehrlich, D.; Das, S.K.; Austin, M.D.; Deshpande, P.; Cao, H.; Nagarajan, N.; Xiao, M.; et al. Genome Mapping on Nanochannel Arrays for Structural Variation Analysis and Sequence Assembly. Nat. Biotechnol. 2012, 30, 771–776.
Miga, K.H.; Koren, S.; Rhie, A.; Vollger, M.R.; Gershman, A.; Bzikadze, A.; Brooks, S.; Howe, E.; Porubsky, D.; Logsdon, G.A.; et al. Telomere-to-Telomere Assembly of a Complete Human X Chromosome. Nature 2020, 585, 79–84.
Naish, M.; Alonge, M.; Wlodzimierz, P.; Tock, A.J.; Abramson, B.W.; Schmücker, A.; Mandáková, T.; Jamge, B.; Lambing, C.; Kuo, P.; et al. The Genetic and Epigenetic Landscape of the Arabidopsis Centromeres. Science 2021, 374, eabi7489.
Wang, B.; Yang, X.; Jia, Y.; Xu, Y.; Jia, P.; Dang, N.; Wang, S.; Xu, T.; Zhao, X.; Gao, S.; et al. High-Quality Arabidopsis Thaliana Genome Assembly with Nanopore and HiFi Long Reads. Genom. Proteom. Bioinform. 2022, 20, 4–13.
Song, J.M.; Xie, W.Z.; Wang, S.; Guo, Y.X.; Koo, D.H.; Kudrna, D.; Gong, C.; Huang, Y.; Feng, J.W.; Zhang, W.; et al. Two Gap-Free Reference Genomes and a Global View of the Centromere Architecture in Rice. Mol. Plant 2021, 14, 1757–1767.
Belser, C.; Baurens, F.-C.; Noel, B.; Martin, G.; Cruaud, C.; Istace, B.; Yahiaoui, N.; Labadie, K.; Hřibová, E.; Doležel, J.; et al. Telomere-to-Telomere Gapless Chromosomes of Banana Using Nanopore Sequencing. Commun. Biol. 2021, 4, 1047.
Jiao, W.B.; Accinelli, G.G.; Hartwig, B.; Kiefer, C.; Baker, D.; Severing, E.; Willing, E.M.; Piednoel, M.; Woetzel, S.; Madrid-Herrero, E.; et al. Improving and Correcting the Contiguity of Long-Read Genome Assemblies of Three Plant Species Using Optical Mapping and Chromosome Conformation Capture Data. Genome Res. 2017, 27, 778–786.
Zhao, J.; Bayer, P.E.; Ruperao, P.; Saxena, R.K.; Khan, A.W.; Golicz, A.A.; Nguyen, H.T.; Batley, J.; Edwards, D.; Varshney, R.K. Trait Associations in the Pangenome of Pigeon Pea (Cajanus cajan). Plant Biotechnol. J. 2020, 18, 1946–1954.
Huang, X.; Kurata, N.; Wei, X.; Wang, Z.X.; Wang, A.; Zhao, Q.; Zhao, Y.; Liu, K.; Lu, H.; Li, W.; et al. A Map of Rice Genome Variation Reveals the Origin of Cultivated Rice. Nature 2012, 490, 497–501.
Montenegro, J.D.; Golicz, A.A.; Bayer, P.E.; Hurgobin, B.; Lee, H.T.; Chan, C.K.K.; Visendi, P.; Lai, K.; Doležel, J.; Batley, J.; et al. The Pangenome of Hexaploid Bread Wheat. Plant J. 2017, 90, 1007–1013.
Gao, L.; Gonda, I.; Sun, H.; Ma, Q.; Bao, K.; Tieman, D.M.; Burzynski-Chang, E.A.; Fish, T.L.; Stromberg, K.A.; Sacks, G.L.; et al. The Tomato Pan-Genome Uncovers New Genes and a Rare Allele Regulating Fruit Flavor. Nat. Genet. 2019, 51, 1044–1051.
Kou, Y.; Liao, Y.; Toivainen, T.; Lv, Y.; Tian, X.; Emerson, J.J.; Gaut, B.S.; Zhou, Y. Evolutionary Genomics of Structural Variation in Asian Rice (Oryza sativa) Domestication. Mol. Biol. Evol. 2020, 37, 3507–3524.
Li, H.; Wang, S.; Chai, S.; Yang, Z.; Zhang, Q.; Xin, H.; Xu, Y.; Lin, S.; Chen, X.; Yao, Z.; et al. Graph-Based Pan-Genome Reveals Structural and Sequence Variations Related to Agronomic Traits and Domestication in Cucumber. Nat. Commun. 2022, 13, 682.
Myers, E.W. The Fragment Assembly String Graph. Bioinformatics 2005, 21, ii79–ii85.
Idury, R.M.; Waterman, M.S. A New Algorithm for DNA Sequence Assembly. J. Comput. Biol. 1995, 2, 291–306.
Pevzner, P.A.; Tang, H.; Waterman, M.S. An Eulerian Path Approach to DNA Fragment Assembly. Proc. Natl. Acad. Sci. USA 2001, 98, 9748–9753.
Bankevich, A.; Bzikadze, A.V.; Kolmogorov, M.; Antipov, D.; Pevzner, P.A. Multiplex de Bruijn Graphs Enable Genome Assembly from Long, High-Fidelity Reads. Nat. Biotechnol. 2022, 40, 1075–1081.
Sadasivan, H.; Maric, M.; Dawson, E.; Iyer, V.; Israeli, J.; Narayanasamy, S. Accelerating Minimap2 for Accurate Long Read Alignment on GPUs. bioRxiv 2022, 6, 13–23.
Goff, S.A.; Ricke, D.; Lan, T.H.; Presting, G.; Wang, R.; Dunn, M.; Glazebrook, J.; Sessions, A.; Oeller, P.; Varma, H.; et al. A Draft Sequence of the Rice Genome (Oryza sativa L. Ssp. japonica). Science 2002, 296, 92–100.
Yu, J.; Hu, S.; Wang, J.; Wong, G.K.S.; Li, S.; Liu, B.; Deng, Y.; Dai, L.; Zhou, Y.; Zhang, X.; et al. A Draft Sequence of the Rice Genome (Oryza sativa L. Ssp. indica). Science 2002, 296, 79–92.
Tuskan, G.A.; DiFazio, S.; Jansson, S.; Bohlmann, J.; Grigoriev, I.; Hellsten, U.; Putnam, M.; Ralph, S.; Rombauts, S.; Salamov, A.; et al. The Genome of Black Cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313, 1596–1604.
Jaillon, O.; Aury, J.M.; Noel, B.; Policriti, A.; Clepet, C.; Casagrande, A.; Choisne, N.; Aubourg, S.; Vitulo, N.; Jubin, C.; et al. The Grapevine Genome Sequence Suggests Ancestral Hexaploidization in Major Angiosperm Phyla. Nature 2007, 449, 463–467.
Velasco, R.; Zharkikh, A.; Troggio, M.; Cartwright, D.A.; Cestaro, A.; Pruss, D.; Pindo, M.; FitzGerald, L.M.; Vezzulli, S.; Reid, J.; et al. A High Quality Draft Consensus Sequence of the Genome of a Heterozygous Grapevine Variety. PLoS ONE 2007, 2, e1326.
Huang, S.; Li, R.; Zhang, Z.; Li, L.; Gu, X.; Fan, W.; Lucas, W.J.; Wang, X.; Xie, B.; Ni, P.; et al. The Genome of the Cucumber, Cucumis sativus L. Nat. Genet. 2009, 41, 1275–1281.
Velasco, R.; Zharkikh, A.; Affourtit, J.; Dhingra, A.; Cestaro, A.; Kalyanaraman, A.; Fontana, P.; Bhatnagar, S.K.; Troggio, M.; Pruss, D.; et al. The Genome of the Domesticated Apple (Malus × Domestica Borkh.). Nat. Genet. 2010, 42, 833–839.
Argout, X.; Salse, J.; Aury, J.M.; Guiltinan, M.J.; Droc, G.; Gouzy, J.; Allegre, M.; Chaparro, C.; Legavre, T.; Maximova, S.N.; et al. The Genome of Theobroma Cacao. Nat. Genet. 2011, 43, 101–108.
Garcia-Mas, J.; Benjak, A.; Sanseverino, W.; Bourgeois, M.; Mir, G.; Gonźalez, V.M.; Heńaff, E.; Camȃra, F.; Cozzuto, L.; Lowy, E.; et al. The Genome of Melon (Cucumis melo L.). Proc. Natl. Acad. Sci. USA 2012, 109, 11872–11877.
Shulaev, V.; Sargent, D.J.; Crowhurst, R.N.; Mockler, T.C.; Folkerts, O.; Delcher, A.L.; Jaiswal, P.; Mockaitis, K.; Liston, A.; Mane, S.P.; et al. The Genome of Woodland Strawberry (Fragaria vesca). Nat. Genet. 2011, 43, 109–116.
Wang, X.; Wang, H.; Wang, J.; Sun, R.; Wu, J.; Liu, S.; Bai, Y.; Mun, J.H.; Bancroft, I.; Cheng, F.; et al. The Genome of the Mesopolyploid Crop Species Brassica Rapa. Nat. Genet. 2011, 43, 1035–1039.
Xu, X.; Pan, S.; Cheng, S.; Zhang, B.; Mu, D.; Ni, P.; Zhang, G.; Yang, S.; Li, R.; Wang, J.; et al. Genome Sequence and Analysis of the Tuber Crop Potato. Nature 2011, 475, 189–195.
Varshney, R.K.; Song, C.; Saxena, R.K.; Azam, S.; Yu, S.; Sharpe, A.G.; Cannon, S.; Baek, J.; Rosen, B.D.; Tar’an, B.; et al. Draft Genome Sequence of Chickpea (Cicer arietinum) Provides a Resource for Trait Improvement. Nat. Biotechnol. 2013, 31, 240–246.
Varshney, R.K.; Chen, W.; Li, Y.; Bharti, A.K.; Saxena, R.K.; Schlueter, J.A.; Donoghue, M.T.A.; Azam, S.; Fan, G.; Whaley, A.M.; et al. Draft Genome Sequence of Pigeonpea (Cajanus cajan), an Orphan Legume Crop of Resource-Poor Farmers. Nat. Biotechnol. 2012, 30, 83–89.
Xu, Y.; Wang, J.; Guo, S.; Zhang, J.; Sun, H.; Salse, J.; Lucas, W.J.; Zhang, H.; Zheng, Y.; Mao, L.; et al. The Draft Genome of Watermelon (Citrullus lanatus) and Resequencing of 20 Diverse Accessions. Nat. Genet. 2013, 45, 51–58.
Zimin, A.V.; Puiu, D.; Hall, R.; Kingan, S.; Clavijo, B.J.; Salzberg, S.L. The First Near-Complete Assembly of the Hexaploid Bread Wheat Genome, Triticum aestivum. Gigascience 2017, 6, gix097.
Sato, K.; Abe, F.; Mascher, M.; Haberer, G.; Gundlach, H.; Spannagl, M.; Shirasawa, K.; Isobe, S. Chromosome-Scale Genome Assembly of the Transformation-Amenable Common Wheat Cultivar ‘Fielder’. DNA Res. 2021, 28, dsab008.
Mascher, M.; Gundlach, H.; Himmelbach, A.; Beier, S.; Twardziok, S.O.; Wicker, T.; Radchuk, V.; Dockter, C.; Hedley, P.E.; Russell, J.; et al. A Chromosome Conformation Capture Ordered Sequence of the Barley Genome. Nature 2017, 544, 427–433.
Li, G.; Wang, L.; Yang, J.; He, H.; Jin, H.; Li, X.; Ren, T.; Ren, Z.; Li, F.; Han, X.; et al. A High-Quality Genome Assembly Highlights Rye Genomic Characteristics and Agronomically Important Genes. Nat. Genet. 2021, 53, 574–584.
Zhang, X.; Chen, S.; Shi, L.; Gong, D.; Zhang, S.; Zhao, Q.; Zhan, D.; Vasseur, L.; Wang, Y.; Yu, J.; et al. Haplotype-Resolved Genome Assembly Provides Insights into Evolutionary History of the Tea Plant Camellia Sinensis. Nat. Genet. 2021, 53, 1250–1259.
Paterson, A.H.; Bowers, J.E.; Bruggmann, R.; Dubchak, I.; Grimwood, J.; Gundlach, H.; Haberer, G.; Hellsten, U.; Mitros, T.; Poliakov, A.; et al. The Sorghum Bicolor Genome and the Diversification of Grasses. Nature 2009, 457, 551–556.
Deschamps, S.; Zhang, Y.; Llaca, V.; Ye, L.; Sanyal, A.; King, M.; May, G.; Lin, H. A Chromosome-Scale Assembly of the Sorghum Genome Using Nanopore Sequencing and Optical Mapping. Nat. Commun. 2018, 9, 4844.
Martis, M.M.; Zhou, R.; Haseneyer, G.; Schmutzer, T.; Vrána, J.; Kubaláková, M.; König, S.; Kugler, K.G.; Scholz, U.; Hackauf, B.; et al. Reticulate Evolution of the Rye Genome. Plant Cell 2013, 25, 3685–3698.
Bauer, E.; Schmutzer, T.; Barilar, I.; Mascher, M.; Gundlach, H.; Martis, M.M.; Twardziok, S.O.; Hackauf, B.; Gordillo, A.; Wilde, P.; et al. Towards a Whole-Genome Sequence for Rye (Secale cereale L.). Plant J. 2017, 89, 853–869.
Rabanus-Wallace, M.T.; Hackauf, B.; Mascher, M.; Lux, T.; Wicker, T.; Gundlach, H.; Baez, M.; Houben, A.; Mayer, K.F.X.; Guo, L.; et al. Chromosome-Scale Genome Assembly Provides Insights into Rye Biology, Evolution and Agronomic Potential. Nat. Genet. 2021, 53, 564–573.
Freire, R.; Weisweiler, M.; Guerreiro, R.; Baig, N.; Hüttel, B.; Obeng-Hinneh, E.; Renner, J.; Hartje, S.; Muders, K.; Truberg, B.; et al. Chromosome-Scale Reference Genome Assembly of a Diploid Potato Clone Derived from an Elite Variety. G3 Genes Genomes Genet. 2021, 11, jkab330.
Bertioli, D.J.; Cannon, S.B.; Froenicke, L.; Huang, G.; Farmer, A.D.; Cannon, E.K.S.; Liu, X.; Gao, D.; Clevenger, J.; Dash, S.; et al. The Genome Sequences of Arachis Duranensis and Arachis Ipaensis, the Diploid Ancestors of Cultivated Peanut. Nat. Genet. 2016, 48, 438–446.
Xu, X.; Liu, X.; Ge, S.; Jensen, J.D.; Hu, F.; Li, X.; Dong, Y.; Gutenkunst, R.N.; Fang, L.; Huang, L.; et al. Resequencing 50 Accessions of Cultivated and Wild Rice Yields Markers for Identifying Agronomically Important Genes. Nat. Biotechnol. 2012, 30, 105–111.
Bayer, P.E.; Golicz, A.A.; Scheben, A.; Batley, J.; Edwards, D. Plant Pan-Genomes Are the New Reference. Nat. Plants 2020, 6, 914–920.
Rijzaani, H.; Bayer, P.E.; Rouard, M.; Doležel, J.; Batley, J.; Edwards, D. The Pangenome of Banana Highlights Differences between Genera and Genomes. Plant Genome 2022, 15, e20100.
Hufnagel, B.; Soriano, A.; Taylor, J.; Divol, F.; Kroc, M.; Sanders, H.; Yeheyis, L.; Nelson, M.; Péret, B. Pangenome of White Lupin Provides Insights into the Diversity of the Species. Plant Biotechnol. J. 2021, 19, 2532–2543.
Kamal, N.; Lux, T.; Jayakodi, M.; Haberer, G.; Gundlach, H.; Mayer, K.F.X.; Mascher, M.; Spannagl, M. The Barley and Wheat Pan-Genomes. In Methods in Molecular Biology; Springer: Berlin/Heidelberg, Germany, 2022; Volume 2443.
Bayer, P.E.; Petereit, J.; Durant, É.; Monat, C.; Rouard, M.; Hu, H.; Chapman, B.; Li, C.; Cheng, S.; Batley, J.; et al. Wheat Panache: A Pangenome Graph Database Representing Presence–Absence Variation across Sixteen Bread Wheat Genomes. Plant Genome 2022, 15, e20221.
Ruperao, P.; Thirunavukkarasu, N.; Gandham, P.; Selvanayagam, S.; Govindaraj, M.; Nebie, B.; Manyasa, E.; Gupta, R.; Das, R.R.; Odeny, D.A.; et al. Sorghum Pan-Genome Explores the Functional Utility for Genomic-Assisted Breeding to Accelerate the Genetic Gain. Front. Plant Sci. 2021, 12, 666342.
Ellis, J.A.; Ong, B. The MassARRAY® System for Targeted SNP Genotyping. In Methods in Molecular Biology; Springer: Berlin/Heidelberg, Germany, 2017; Volume 1492.
Dalma-Weiszhausz, D.D.; Warrington, J.; Tanimoto, E.Y.; Miyada, C.G. The Affymetrix GeneChip® Platform: An Overview. Methods Enzymol. 2006, 410, 3–28.
Fazal, F.M.; Han, S.; Parker, K.R.; Kaewsapsak, P.; Xu, J.; Boettiger, A.N.; Chang, H.Y.; Ting, A.Y. Atlas of Subcellular RNA Localization Revealed by APEX-Seq. Cell 2019, 178, 473–490.e26.
Mealer, M.; Moss, M. TaqMan® Small RNA Assays. Appl. Biosyst. 2018, 44, 4398987.
Xiao, Q.; Bai, X.; Zhang, C.; He, Y. Advanced High-Throughput Plant Phenotyping Techniques for Genome-Wide Association Studies: A Review. J. Adv. Res. 2022, 35, 215–230.
Mochida, K.; Koda, S.; Inoue, K.; Hirayama, T.; Tanaka, S.; Nishii, R.; Melgani, F. Computer Vision-Based Phenotyping for Improvement of Plant Productivity: A Machine Learning Perspective. Gigascience 2018, 8, giy153.
Tsaftaris, S.A.; Minervini, M.; Scharr, H. Machine Learning for Plant Phenotyping Needs Image Processing. Trends Plant Sci. 2016, 21, 989–991.
Lee, U.; Chang, S.; Putra, G.A.; Kim, H.; Kim, D.H. An Automated, High-Throughput Plant Phenotyping System Using Machine Learning-Based Plant Segmentation and Image Analysis. PLoS ONE 2018, 13, e0196615.
Kolhar, S.; Jagtap, J. Plant Trait Estimation and Classification Studies in Plant Phenotyping Using Machine Vision—A Review. Inf. Process. Agric. 2021, 10, 114–135.
Wang, J.; Zhang, Z. GAPIT Version 3: Boosting Power and Accuracy for Genomic Association and Prediction. Genom. Proteom. Bioinform. 2021, 19, 629–640.
Turner, S.D. Qqman: An R Package for Visualizing GWAS Results Using Q-Q and Manhattan Plots. J. Open Source Softw. 2018, 3, 731.
Magno, R.; Maia, A.T. Gwasrapidd: An R Package to Query, Download and Wrangle GWAS Catalog Data. Bioinformatics 2020, 36, 649–650.
Drivas, T.G.; Lucas, A.; Ritchie, M.D. EQTpLot: A User-Friendly R Package for the Visualization of Colocalization between EQTL and GWAS Signals. BioData Min. 2021, 14, 32.
Hiersche, M.; Rühle, F.; Stoll, M. Postgwas: Advanced GWAS Interpretation in R. PLoS ONE 2013, 8, e71775.
Gogarten, S.M.; Bhangale, T.; Conomos, M.P.; Laurie, C.A.; McHugh, C.P.; Painter, I.; Zheng, X.; Crosslin, D.R.; Levine, D.; Lumley, T.; et al. GWASTools: An R/Bioconductor Package for Quality Control and Analysis of Genome-Wide Association Studies. Bioinformatics 2012, 28, 3329–3331.
He, F.; Ding, S.; Wang, H.; Qin, F. IntAssoPlot: An R Package for Integrated Visualization of Genome-Wide Association Study Results With Gene Structure and Linkage Disequilibrium Matrix. Front. Genet. 2020, 11, 260.