Plant breeding is a long and tedious process that involves generating large populations through controlled crosses and then selecting the top individuals, the future new varieties. The process can take from 5 years for horticultural crops to 15 years for perennial fruit crops, or 25 years for forest species. Plant breeding is an applied science, insofar as it focuses on solving specific problems such as productivity, resistance to biotic and abiotic stresses, fruit quality, postharvest performance and sensory attributes. In this context, a critical decision is the choice of the genotypes used as parents. In addition, the management, phenotyping and selection of the resulting seedlings are the main factors limiting the generation of new cultivars. To improve the efficiency and robustness of plant breeding programs in parent and seedling selection, the implementation of molecular tools, including the development of Marker Assisted Selection (MAS) strategies, is an essential requirement. On the other hand, we are facing a new molecular-biological perspective based on new methodologies that are affecting genetic theory, as well as the definition of the gene and the Central Dogma of Molecular Biology (CDMB). This new molecular perspective opens new possibilities for improving the use of molecular tools in plant breeding. The goal of this review is to discuss this new perspective on plant breeding in the context of the present postgenomic era.
The postgenomic era has distinctive features both from a methodological point of view and from a global perspective. Methodologically, researchers in this era have incorporated new high-throughput methods for sequencing DNA and RNA, complete reference genomes that allow molecular results to be precisely referenced, and high-throughput methods such as Genotyping by Sequencing (GBS) for genomic analysis (SNPs, Single Nucleotide Polymorphisms) and RNA-seq for transcriptomic analysis (DEGs, Differentially Expressed Genes). The massive generation of genetic and phenotypic information (through high-throughput sequencing and high-throughput phenotyping) and the creation of databases holding billions of these data result in what is known in research as “Big Data”, which also requires new mathematical algorithms (such as machine learning models) and a large set of bioinformatic tools. From a global perspective, the center of gravity of molecular processes has shifted to gene expression and the way in which that expression is regulated. These features have turned researchers’ attention towards the study of gene regulatory networks, and in particular to the role of RNA and other non-DNA biomolecules [1].
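To make the idea of SNP discovery more concrete, the following Python sketch calls SNPs from reads that have already been placed on a reference sequence. It is a toy illustration under stated assumptions (reads aligned without insertions or deletions, a single homozygous sample, fixed depth and allele-fraction thresholds); the function `call_snps` and its parameters are hypothetical, and real GBS pipelines rely on dedicated aligners and variant callers that model sequencing errors and heterozygosity.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=3, min_alt_fraction=0.8):
    """Toy SNP caller: report positions where reads consistently differ from the reference.

    aligned_reads is a list of (start, sequence) pairs, assumed to be already
    placed on the reference without insertions or deletions. Only homozygous
    differences are reported; heterozygous sites are ignored in this sketch.
    """
    # Build a pileup: counts of each base observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        top_base, top_count = counts.most_common(1)[0]
        if top_base != reference[pos] and top_count / depth >= min_alt_fraction:
            snps.append((pos, reference[pos], top_base, depth))
    return snps


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (3, "TTCGTAC")]
    print(call_snps(reference, reads))  # [(4, 'A', 'T', 3)]
```

Here all three reads covering position 4 show T where the reference has A, so that position is reported as a candidate SNP with a depth of 3.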
At present, more than 550 plant species have been sequenced and their reference genomes are available, including those of the most important crops [2] (Table 1). For crops, this work started in 2002 with the reference genome of rice. In recent years, for instance, several important crops have been sequenced, including hazelnut, mulberry, pistachio, poplar and almond. The availability of complete genomes is making any organism accessible and amenable to many kinds of studies [1]: it allows molecular results to be precisely referenced and supports high-throughput methods for genomic analysis of the most abundant genetic variation (SNPs) and for transcriptomic analysis at the gene expression level (DEGs) within a new postgenomic perspective [3].
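As a rough illustration of what a DEG analysis measures, the sketch below normalizes RNA-seq read counts to counts per million (CPM) and flags genes whose mean expression changes by at least a chosen log2 fold change between two conditions. It is a simplified screen, not the statistical testing performed by dedicated tools such as DESeq2 or edgeR; the function names, the pseudocount of 1 and the default threshold of 1 (a two-fold change) are assumptions made for illustration.

```python
import math


def cpm(sample_counts):
    """Counts per million: rescale one sample's read counts to a common library size."""
    total = sum(sample_counts.values())
    return {gene: 1e6 * n / total for gene, n in sample_counts.items()}


def fold_change_screen(control, treated, min_abs_log2fc=1.0, pseudocount=1.0):
    """Flag candidate DEGs whose mean CPM changes by at least min_abs_log2fc (log2 scale).

    control and treated are lists of replicate samples (dict gene -> read count).
    No statistical test is applied, so replicate variability is ignored.
    """
    control_cpm = [cpm(s) for s in control]
    treated_cpm = [cpm(s) for s in treated]
    candidates = {}
    for gene in control_cpm[0]:
        mean_c = sum(s[gene] for s in control_cpm) / len(control_cpm)
        mean_t = sum(s[gene] for s in treated_cpm) / len(treated_cpm)
        # The pseudocount avoids division by zero for genes silent in one condition.
        log2fc = math.log2((mean_t + pseudocount) / (mean_c + pseudocount))
        if abs(log2fc) >= min_abs_log2fc:
            candidates[gene] = round(log2fc, 2)
    return candidates


if __name__ == "__main__":
    control = [{"geneA": 98, "geneB": 300, "other": 999602},
               {"geneA": 100, "geneB": 300, "other": 999600}]
    treated = [{"geneA": 398, "geneB": 300, "other": 999302},
               {"geneA": 400, "geneB": 300, "other": 999300}]
    print(fold_change_screen(control, treated))  # {'geneA': 2.0}
```

In this made-up example, geneA rises from a mean of 99 to 399 CPM, a four-fold (log2 = 2) increase, while geneB is unchanged and is not reported.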
Table 1. Most important crops sequenced with available reference genome at https://www.ncbi.nlm.nih.gov/genome [2].
Crop | Year | Crop | Year
Kiwifruit | 2013 | Lettuce | 2011
Pineapple | 2016 | Apple | 2010
Peanut | 2018 | Cassava | 2014
Argan | 2018 | Mulberry | 2019
Asparagus | 2016 | Tobacco | 2014
Oats | 2018 | Olive | 2017
Sugar beet | 2013 | Rice | 2002
Rapeseed | 2014 | Avocado | 2018
Cabbage | 2014 | Bean | 2013
Mustard | 2011 | Date palm | 2009
Tea | 2019 | Pine | 2015
Hemp | 2011 | Pistachio | 2019
Pepper | 2013 | Pea | 2018
Safflower | 2016 | Poplar | 2019
Chickpea | 2012 | Cherry | 2017
Watermelon | 2011 | Almond | 2019
Clementine | 2013 | Peach | 2011
Orange | 2012 | Pomegranate | 2017
Coffee | 2018 | Oak | 2018
Hazelnut | 2019 | Rose | 2018
Muskmelon | 2012 | Rye (Secale cereale) | 2016
Cucumber | 2007 | Tomato | 2009
Pumpkin | 2017 | Eggplant | 2014
Carrot | 2016 | Potato | 2011
Strawberry | 2010 | Sorghum | 2009
Soybean | 2014 | Cacao | 2016
Cotton | 2015 | Wheat | 2010
Sunflower | 2017 | Bean | 2015
Rubber | 2013 | Cowpea | 2016
Barley | 2011 | Grape | 2007
Walnut | 2015 | Maize | 2010
The new Big Data biology, which combines high-throughput sequencing of DNA (and of RNA, as cDNA) with bioinformatic analysis and databases holding billions of records, has made it possible to access genetic knowledge at the level of each nucleotide. In this new methodological perspective, millions of sequences are available from a single experiment, together with detailed information on complete genomes (defined as the DNA organized into separate chromosomes inside the nucleus of a cell) and transcriptomes (described as the complete set of all types of RNA molecules). Several authors have even characterized this data-intensive biology as a new kind of science, a science of information management, distinct from traditional biology.
On the other hand, high-throughput sequencing technologies have brought great advances in the development and application of MAS strategies. The first generation of sequencers had an average cost of 0.5 euro per base (nucleotide), which was unaffordable for most laboratories. Later, second-generation or high-throughput sequencing techniques emerged for sequencing DNA (DNA-Seq, in 2005) and cDNA derived from RNA (RNA-Seq, in 2008), based on running thousands of parallel sequencing reactions immobilized on a solid surface. This technology lowered the cost of sequencing by reducing reagent requirements, and a cost of 0.01 euro per Mb in 2018 facilitated the widespread use of massive sequencing. In addition, the development of high-throughput methods for genomic analysis (SNPs, Single Nucleotide Polymorphisms) and transcriptomic analysis (DEGs, Differentially Expressed Genes) on platforms such as FLUIDIGM (https://www.fluidigm.com/) is another relevant advance.
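The two price points quoted above can be put on a common scale with a little arithmetic: 0.5 euro per base corresponds to 0.5 × 10^6 = 500,000 euro per Mb, so the 2018 price of 0.01 euro per Mb represents a reduction of roughly 5 × 10^7-fold. The Python sketch below reproduces this calculation. The 1,000 Mb (1 Gb) data volume is a hypothetical example, and a flat per-Mb price is a simplifying assumption that ignores library preparation, coverage requirements and other real-world costs.

```python
# Price figures quoted in the text; everything else here is illustrative.
FIRST_GEN_EUR_PER_BASE = 0.5
SECOND_GEN_2018_EUR_PER_MB = 0.01
BASES_PER_MB = 1_000_000


def first_gen_eur_per_mb():
    """Convert the first-generation price from euro per base to euro per Mb."""
    return FIRST_GEN_EUR_PER_BASE * BASES_PER_MB


def fold_reduction():
    """How many times cheaper one Mb became between the two price points."""
    return first_gen_eur_per_mb() / SECOND_GEN_2018_EUR_PER_MB


def sequencing_cost_eur(megabases, eur_per_mb):
    """Cost of producing a given amount of raw sequence at a flat per-Mb price."""
    return megabases * eur_per_mb


if __name__ == "__main__":
    data_mb = 1_000  # hypothetical: 1 Gb of raw sequence
    print(f"first generation: {sequencing_cost_eur(data_mb, first_gen_eur_per_mb()):,.0f} EUR")
    print(f"2018 second generation: {sequencing_cost_eur(data_mb, SECOND_GEN_2018_EUR_PER_MB):,.2f} EUR")
    print(f"fold reduction: {fold_reduction():.1e}")
```

Run as a script, it reports 500,000,000 EUR for 1 Gb at the first-generation price versus 10.00 EUR at the 2018 price, a 5.0e+07-fold reduction.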
This new perspective, integrating available reference genomes with new sequencing and bioinformatic methodologies, will allow the implementation of new MAS strategies to accelerate the breeding process, as well as new genome editing strategies [3][4].
Additionally, from a global point of view, the postgenomic era is characterized by a change in perspective on trait expression derived from the Encyclopedia of DNA Elements (ENCODE) project, carried out in humans, which identified the study of RNA rather than DNA as the linchpin of trait expression processes [5]. In fact, the genomes of eukaryotic organisms are almost entirely transcribed, giving rise to an enormous number of non-protein-coding RNAs. In this new context, a recent concept, pervasive (interleaved) transcription, is being applied to the whole transcription process. Pervasive transcription has been described as the transcription of interspersed genes embedded within the normal coding genes: virtually the entire genome, including stretches once called “junk DNA”, is transcribed, whether or not it codes for a particular protein. As a consequence, coding sequences do not always lie side by side; they may also overlap one another. Jarvis and Robertson [6] describe this pervasive transcription of noncoding RNA as the dark matter emerging in the postgenomic era.
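The ideas of pervasive transcription and overlapping transcripts can be illustrated with interval arithmetic on genome coordinates. The sketch below assumes transcripts given as hypothetical (name, start, end) half-open intervals on a single chromosome and ignores strand; it computes the fraction of positions covered by at least one transcript and lists the pairs of transcripts that overlap. The names and coordinates are invented for illustration and do not come from any annotation.

```python
def transcribed_fraction(transcripts, genome_length):
    """Fraction of genome positions covered by at least one transcript.

    transcripts: iterable of (name, start, end) tuples, half-open [start, end).
    """
    intervals = sorted((start, end) for _, start, end in transcripts)
    covered = 0
    merged_start = merged_end = None
    for start, end in intervals:
        if merged_end is None or start > merged_end:
            # Disjoint from the current merged block: close it and open a new one.
            if merged_end is not None:
                covered += merged_end - merged_start
            merged_start, merged_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            merged_end = max(merged_end, end)
    if merged_end is not None:
        covered += merged_end - merged_start
    return covered / genome_length


def overlapping_pairs(transcripts):
    """Names of transcript pairs that share at least one genome position."""
    ordered = sorted(transcripts, key=lambda t: t[1])
    pairs = []
    for i, (name_a, _, end_a) in enumerate(ordered):
        for name_b, start_b, _ in ordered[i + 1:]:
            if start_b >= end_a:
                break  # later transcripts start even further to the right
            pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    transcripts = [
        ("mRNA_1", 0, 40),
        ("lncRNA_1", 30, 55),
        ("antisense_1", 50, 60),
        ("mRNA_2", 70, 90),
    ]
    print(transcribed_fraction(transcripts, 100))  # 0.8
    print(overlapping_pairs(transcripts))  # [('mRNA_1', 'lncRNA_1'), ('lncRNA_1', 'antisense_1')]
```

In this toy chromosome of 100 positions, 80% is transcribed, and the hypothetical lncRNA overlaps both a coding transcript and an antisense transcript.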
The new concept of the gene as a process has been applied to the ENCODE project, which has reconsidered the role of RNA relative to DNA [3]. The translation of transcribed messenger RNA (mRNA) into protein, together with non-coding, non-regulatory RNAs such as rRNA and tRNA, and post-transcriptional and post-translational regulation by small non-coding regulatory RNAs (miRNA, siRNA, piRNA or snoRNA), is key to how DNA is transcribed and expressed in a particular phenotype. The new research results emphasize the importance of all types of RNA in the final expression of all DNA, including what was formerly called “junk DNA”. In addition, ENCODE tipped the balance for the first time, albeit slightly, in favor of RNA, emphasizing the need for resources to study these mechanisms of life, understood as processes, in greater depth. This new concept definitively rejects the notion of “junk DNA”. In the same vein, a large number of RNA classes with little or no coding capacity have been discovered quite recently, suggesting that the gene transcript is not the fundamental unit operating in genetics. Moreover, Gerstein et al. [6] defined the gene as a DNA or RNA sequence that encodes, either directly or through overlapping regions, a functional RNA or protein.
In addition, in the postgenomic era numerous authors have reported important experimental phenomena affecting the following concepts: the gene (gene fusion and pleiotropy); the CDMB (copy-number variation, epigenetic regulation of DNA expression, long noncoding RNA, pervasive transcription, DNA damage); the origin of genetic variation (adaptive mutations, epigenetic inheritance or inheritance of acquired traits); and the effect of the environment on these variations (transgenerational transmission of environmental information). These phenomena are new exemplars that fall outside the classical genetics paradigm. The field has evolved at an ever-increasing speed in recent years, with significant new experimental findings such as the epitranscriptome (epigenetic regulation of RNA), epimutations, long noncoding RNA, DNA damage and the transgenerational transmission of environmental information (Table 2).
Table 2. Experimental phenomena evidenced in the postgenomic era.
Experimental phenomenon | Year | Reference
Adaptive mutations | 2005 | [7]
Copy-number variation | 2005 | [8]
Gene fusions | 2006 | [9]
Pleiotropy | 2006 | [10]
Epigenetic regulation of DNA expression | 2010 | [11]
Epigenetic inheritance | 2010 | [12]
Pervasive transcription | 2011 | [13]
Epigenetic regulation of RNA | 2012 | [14]
Epimutations | 2014 | [15]
Long noncoding RNA | 2015 | [16]
DNA damage | 2017 | [17]
Transgenerational transmission of environmental information | 2017 | [18]
In this context, genomic studies (DNA analysis, including copy-number variation, gene fusions and pleiotropy) for the development of MAS strategies are particularly useful when evaluating the trait is expensive or time-consuming, or when the species has a long juvenile period. In addition, proteomic (proteins and enzymes), transcriptomic (mRNA, lncRNA) and epigenetic (DNA methylation, histone modifications, epimutations) studies are being applied to breeding programs. These integrated approaches have classically been used for the genetic characterization of plant material. At present, however, they must be used to clarify the genomic basis of agronomic traits and to develop more efficient markers suitable for selection.
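A minimal sketch of the selection step in MAS, assuming each seedling has already been genotyped at markers known to be linked to the target traits: seedlings lacking the favorable allele at any target marker are discarded before costly field evaluation. The marker names (Rx_SNP1, Fq_SNP2), the genotype encoding and the functions below are hypothetical, and real programs depend on validated markers and may use more elaborate selection criteria than this all-or-nothing filter.

```python
def carries(genotype, allele, homozygous=False):
    """True if a genotype string such as "AG" (one character per allele) carries the allele."""
    if genotype is None:
        return False  # missing call: the favorable allele cannot be confirmed
    copies = genotype.count(allele)
    return copies == len(genotype) if homozygous else copies > 0


def mas_select(seedlings, favorable_alleles, homozygous=False):
    """Return IDs of seedlings carrying the favorable allele at every target marker."""
    return [
        seedling_id
        for seedling_id, genotypes in seedlings.items()
        if all(carries(genotypes.get(marker), allele, homozygous)
               for marker, allele in favorable_alleles.items())
    ]


if __name__ == "__main__":
    # Hypothetical markers linked to disease resistance (Rx_SNP1) and fruit quality (Fq_SNP2).
    targets = {"Rx_SNP1": "A", "Fq_SNP2": "T"}
    seedlings = {
        "S01": {"Rx_SNP1": "AA", "Fq_SNP2": "CT"},
        "S02": {"Rx_SNP1": "GG", "Fq_SNP2": "TT"},
        "S03": {"Rx_SNP1": "AG", "Fq_SNP2": "TT"},
        "S04": {"Rx_SNP1": "AA"},  # Fq_SNP2 not genotyped
        "S05": {"Rx_SNP1": "AA", "Fq_SNP2": "TT"},
    }
    print(mas_select(seedlings, targets))                   # ['S01', 'S03', 'S05']
    print(mas_select(seedlings, targets, homozygous=True))  # ['S05']
```

Only the seedlings that pass the marker screen would then go on to phenotypic evaluation, which is where MAS saves cost and time for traits that are expensive to measure or slow to express.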
These new perspectives of plant molecular breeding include the application of the different markers developed (at the genomic, transcriptomic and epigenetic levels) in both the development and the exploitation of new varieties (Figure 1).
Figure 1. Schematic illustration of DNA, RNA and epigenetic markers applied in plant breeding in the development and exploitation of new varieties.
Plant productivity to feed a growing human population, the study and management of plant diseases in the unpredictable context of global climate change, and the genetic improvement of fruit and vegetable quality are particularly relevant aspects. Therefore, classical breeding must be combined with MAS using the diverse molecular tools available in the postgenomic era. This does not mean that MAS will replace conventional breeding; rather, MAS is a necessary complement to it. The practical knowledge provided by conventional breeding cannot be matched by molecular support alone. In this context, there has been a significant shift of conventional breeders toward molecular breeding. However, in this joint strategy (conventional and molecular), the target must be well defined within a complete, integrated work plan. While the ability of breeders to generate large new breeding populations is almost unlimited, the evaluation and selection of the most promising seedlings is the main limiting factor, because it is costly and time-consuming. In this context, genomic studies at the DNA level are especially useful for the development of MAS strategies. In addition, proteomic (proteins and enzymes), transcriptomic (RNA) and epigenetic (DNA methylation and histone modifications) studies are being applied to breeding programs. Finally, these strategies at the genomic, epigenomic, transcriptomic and proteomic levels should all be integrated for a better understanding of the molecular mechanisms involved in the most important plant breeding aspects. This will facilitate the development and optimization of molecular markers for the field exploitation of new varieties, providing complete, integrated technological packages.