Mu, H.,  Wang, B., & Yuan, F. (2022, December 01). Bioinformatics in Plant Breeding and Disease Resistance. In Encyclopedia.
Bioinformatics in Plant Breeding and Disease Resistance

In plant breeding, bioinformatics supports genetic and genomic selection by identifying the combination of genotypes most likely to produce a desired phenotype, which expedites the isolation of new varieties. Bioinformatics is also instrumental in collecting and processing plant phenotype data. Robots that use automated and digital technologies can monitor the environment in which plants grow, analyze the environmental stresses the plants face, and promptly correct suboptimal or adverse growth conditions, which has advanced plant research and reduced the demand for manual labor.

Keywords: plant resistance; plant breeding; bioinformatics; automated robots; precision fertilization

1. Introduction

Broadly speaking, bioinformatics covers the interdisciplinary study of biological objects (including genes, proteins, and physiological indices) using informatics methods, such as algorithms and statistical methods. Specifically, complex biological data can be processed with computational tools, which is common practice for dedicated resources such as nucleic acid databases, protein databases, and custom functional databases [1]. Bioinformatics tools reduce the cost of complex analyses and thereby enhance research into topics such as sustainable agriculture [2]. Understanding how bioinformatics can be applied to plant biology is therefore important for researchers in the life sciences. This entry describes these tools and their applications, focusing on plant breeding and research on disease resistance. For example, from the VPg gene sequence of potato virus Y (PVY) isolated from potato, combined with all published sequences in GenBank, a Bayesian phylodynamic framework can be used to infer both the evolutionary rate of PVY and the time to its most recent common ancestor, which advances disease resistance studies in potato [3][4][5]. Multifactorial traits involved in resistance and quality are extremely difficult to improve, especially in combination, and the genomes of some major crops, such as maize, rice, wheat, sorghum, and barley, and of the forage legumes soybean and alfalfa, are too large to be analyzed readily by whole-genome sequencing. Attention has therefore focused on comparative genomic approaches for producing seeds with desirable traits [6][7][8].
The typical datasets generated by plant researchers contain morphological, physiological, molecular, and genetic information that describes the entire plant life cycle. Bioinformatics tools process the collected data and extract key indices and trends, so that hypotheses can be generated quickly and accurately and solutions can then be proposed. For example, phenotypes and genotypes can be combined to reveal underlying mechanisms, as in the study of plant rejuvenation [9]. Future growth patterns can be predicted from past growth trends, as in a plant growth prediction system built with deep learning [10]. The comparison of multiple genomes can be used to infer evolutionary relationships, as in the study of Amphicarpaea edgeworthii [11].
In agricultural applications, the wide use of bioinformatics can assist with efficient crop breeding and the improvement of plant resistance against pathogens [12]. In particular, scientists are committed to breeding and modifying crop species to improve yield and quality, as well as to creating new varieties with qualities that benefit human nutrition and health. Bioinformatics accelerates the generation and deployment of these new varieties. Indeed, genes associated with specific traits can be analyzed in silico before being introduced into a plant, and the results can guide which genes to introduce next for a precise phenotypic analysis. Maize (Zea mays L.) kernels rich in lysine [13], lettuce (Lactuca sativa) high in vitamin C [14], and the recently developed vitamin D-rich tomato [15] are examples of the implementation of such pipelines.
Bioinformatics plays a critical role in data integration, analysis, and model prediction, as well as in managing the massive amounts of data resulting from new, high-throughput approaches [16]. Classical biological experiments, such as the visualization of mitosis, meiosis, and pollen tube growth, are being explored in greater depth and at higher throughput thanks to bioinformatics and time-lapse microscopy [17][18]. Plant growth can be predicted from the available wealth of physiological and phenotypic data, enabling the generation of a virtual plant that can accurately predict growth patterns and the consequences of interactions with diseases or pests [19]. Bioinformatics also has wide applications in the analysis of plant resistance to various stresses [20]. The molecular mechanisms underlying plant responses to abiotic stress have been studied in depth, and they can open new avenues in agriculture when combined with the predictive power of bioinformatics [21]. In addition, bioinformatics has been applied in plant pathology, for example to identify and predict the “effector” proteins that plant pathogens produce in order to manipulate their host plants. Functional annotation of pathogen sequences to predict virulence is a critical step in translating sequence data into potential applications in plant pathology [22]. A bioinformatics framework has also been proposed to enable stakeholders to make more informed decisions. In this way, a shared biosecurity infrastructure can be established to support sustainable global food and fiber production in the context of global climate change and the increased risk of accidental disease introductions through the global plant trade [23].

2. Databases Provide Abundant Gene and Pathway Information to Study Plant Biology

Thanks to large-scale sequencing technologies, vast amounts of data are released continuously and are often uploaded to a specific database. Depending on the species they represent, databases can be formally classified as general or species-specific databases.
General databases include those that integrate information about genomes, proteins, and metabolic pathways (Table 1). Genome databases represent a centralized and public collection of all published data, so researchers can easily obtain information concerning their gene or protein of interest. For example, UniProt offers a comprehensive resource for protein sequences and functional annotation. The database can be queried with a specific gene/protein name or with keywords of interest to sort through the catalogued data, but it is also possible to perform a protein BLAST (basic local alignment search tool) and download the sequence of the new protein of interest [24]. In addition, general databases compile various biological pathways, such as those represented in Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), EuKaryotic Orthologous Groups (KOG), and metabolic pathways, which can be used to determine if a candidate protein belongs to one of many known pathways.
Table 1. General databases used for data integration and presentation.
As one example, a bioinformatic analysis of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) from multiple C3 plant species included an in silico characterization of RuBisCO and its interacting proteins. Their structures and functions were predicted with the ProtParam, SOPMA, Predotar 1.03, SignalP 4.1, TargetP 1.1, and TMHMM 2.0 tools, which are all accessible from the ExPASy database. A MEME and MAST analysis of RuBisCO from all C3 plants, combined with a phylogenetic tree constructed with MEGA 6.06 software from a ClustalW sequence alignment, illustrated the high sequence identity shared by RuBisCO from different C3 plant species, supporting the notion that they originated from a common ancestor [25]. A list of these databases and how they are used is provided in Table 1.
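The notion of sequence identity used above can be made concrete with a minimal sketch. It assumes two sequences that have already been aligned (for example with ClustalW) and counts identical residues over the columns where neither sequence has a gap. The function name `percent_identity` and the short peptide fragments are hypothetical and are not real RuBisCO data; alignment tools differ in which columns they count, so values from other software may not match this convention.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
seq_1 = "MKTAYIAKQR-ISFVK"
seq_2 = "MKTAYLAKQRQISFIK"
identity = percent_identity(seq_1, seq_2)
print(f"{identity:.2f}% identity")
```

Here 13 of the 15 gap-free columns are identical, so the sketch reports 86.67% identity.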
The model plant Arabidopsis thaliana is one of a few plant species with its own databases due to its widespread use in plant research (Table 2). These databases are rich in resources and can help researchers quickly obtain the latest Arabidopsis genome information. For example, The Arabidopsis Information Resource (TAIR) database allows users to download gene sequences in bulk, while the SeqViewer in TAIR also provides a simple tool to visualize the genes. In addition, TAIR has a powerful function for displaying various expression maps, each representing expression data during Arabidopsis development or under different growth conditions [26].
Table 2. Databases specific for Arabidopsis.
Most major crops have dedicated databases, including rice (Oryza sativa), wheat (Triticum aestivum), barley (Hordeum vulgare), maize, soybean (Glycine max), cotton (Gossypium hirsutum), and sorghum (Sorghum bicolor) (Table 3). For example, the Rice Mutant Database (RMD) contains mutants for the identification of new genes and regulatory elements, as well as a list of lines for the ectopic expression of target genes in specific tissues or at specific growth stages, providing rich data resources for the study of different rice mutants [27]. The Wheat Genomic Variation Database (WGVD) compiles all published single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (InDels), and selective sweeps, together with a BLAST search tool for wheat [28]. Researchers at Huazhong Agricultural University have developed the ZEAMAP database for maize, which includes multiple omics data resources, such as genomes, transcriptomes, genetic variation, phenotypic data, metabolome studies, and genetic maps. The database also provides access to a variety of data on complex traits and offers rich online capabilities for data retrieval, analysis, and visualization [29].
Table 3. Databases used for major crops.
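The variant classes that a resource such as WGVD catalogues (SNPs and InDels) can be told apart from the reference and alternative alleles alone. The sketch below is a simplified illustration under that assumption: the function `classify_variant`, the record layout, and the example coordinates are hypothetical and do not reflect the actual WGVD schema or data.

```python
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Label a biallelic variant by comparing reference and alternative allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "other"  # equal-length, multi-base substitution


# Hypothetical records: (chromosome, position, reference allele, alternative allele).
variants = [
    ("chr1A", 1045, "A", "G"),
    ("chr1A", 2210, "C", "CTT"),
    ("chr2B", 587, "GAT", "G"),
    ("chr3D", 9921, "T", "C"),
    ("chr3D", 12044, "AC", "GT"),
]

counts = Counter(classify_variant(ref, alt) for _, _, ref, alt in variants)
print(dict(counts))
```

Insertions and deletions are reported separately here; summing the two gives the InDel count.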

3. Various Algorithms Create Possibilities for Customized Analysis

Bioinformatics tools or websites can be used to predict protein structure, to look for conserved domains in a protein, or to annotate genes (Table 4). Data visualization and presentation are an integral part of bioinformatics analysis [30]. For example, the biggest advantages of TBtools are batch processing and data visualization, and the interactive graphics it generates are rich in editable features that give users maximum flexibility [31]. Protter supports protein data analysis and prediction by visualizing the features of an annotated sequence, together with associated experimental proteomic data, in the context of protein topology. Protter is particularly useful for the comprehensive visualization of membrane proteins and the selection of peptides for targeted proteomics [32].
Table 4. Bioinformatics tools and websites that can be used in plant research.
Many tools are also tailored for specific applications (Table 5), including the prediction of transcription factor binding sites and the exploration of large-scale genomic variation data. For example, the PlantPAN database hosts a comprehensive list of transcription factors and their cognate binding sites. TRANSFAC and PlnTFDB are comprehensive transcription factor databases, and AGRIS contains a database of Arabidopsis transcription factors; these resources can be used to predict transcription factor binding sites in plant promoter regions [33]. SnpHub can be used to retrieve, analyze, and visualize large-scale genomic variation data for user-specified samples and genomic regions [34].
Table 5. Bioinformatics tools specific to applications in plant research.
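As a simple illustration of the binding-site prediction mentioned above, a promoter sequence can be scanned for a consensus motif written in IUPAC nucleotide codes. This is only a sketch of consensus matching on the forward strand, not the method used by PlantPAN or the other databases named here. The function `find_motif`, the promoter sequence, and the degenerate pattern `CACGTGS` are hypothetical; `CACGTG` is the G-box core element.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(promoter: str, consensus: str) -> list:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", promoter.upper())]


# Hypothetical promoter fragment, for illustration only.
promoter = "TTGACACGTGGCAATCCACGTGTCAAAGT"
g_box_sites = find_motif(promoter, "CACGTG")
degenerate_sites = find_motif(promoter, "CACGTGS")
print(g_box_sites, degenerate_sites)
```

A realistic scan would also search the reverse complement and would typically score sites with position weight matrices rather than an exact consensus.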
Outside of dedicated web tools, general-purpose programming languages such as Python, R, and Perl can be used to implement algorithms for data integration and analysis. Python and R are perhaps more widely used in bioinformatics than Perl. R has powerful statistical functions, which are very useful for processing large experimental datasets, together with graphics facilities for data exploration [35]. Python is better suited to building databases, web applications, and utilities [36]. Introductory programming in R relies largely on functions supplied by user-written packages, whereas Python is a general-purpose language that emphasizes program design and control flow. Although R code might not be as human-readable as Python code, R is overall better suited to biologists without a strong programming background. Based on these languages, various scripts have been developed to analyze data efficiently. For example, in R, the kmeans function performs clustering analysis, and the qqman package draws Manhattan plots from the results of genome-wide association studies (GWASs) [37].
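To show what a k-means clustering step does, the following sketch implements plain Lloyd's algorithm using only the Python standard library. It is not a reimplementation of R's kmeans function, which defaults to the Hartigan-Wong algorithm and random starting centers; here the first k points serve as initial centroids so that the result is deterministic. The function name `kmeans` and the trait measurements are hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical measurements per plant: (height in cm, leaf area in cm^2).
traits = [
    (10.0, 1.0), (30.0, 5.0), (11.0, 1.2),
    (31.0, 5.5), (9.5, 0.9), (29.0, 4.8),
]
labels, centroids = kmeans(traits, k=2)
print(labels)
print(centroids)
```

With these values the small and large plants separate into two clusters after a single update. In practice, traits on different scales are usually standardized before clustering, and several random starts are compared.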

4. Application of Bioinformatics in Plant Breeding

Plant breeding aims to produce new plant varieties. This long-term activity begins with basic research and often takes many years, thus necessitating a significant financial investment [38]. Genomics-assisted breeding is an effective and economical strategy and is thus widely applied in crop breeding. Genomics may help to understand the organization and function of biological systems and has the potential to track molecular changes during development under different conditions, such as changes in plant physiology, pathogen pressure, or the environment [39]. Samples for genomics studies can be collected from the same or different individuals of one species or from different species [40]. In addition, comparative genomics allows the study of specific traits in related plants by capitalizing on sequence conservation between species with small genomes (easier to study) and those with large and complex genomes (more difficult to study, but including most current crop species). For example, in Chrysanthemum, GWASs have been used to explore genetic patterns and identify favorable alleles for several ornamental and resistance traits, including plant architecture and inflorescence traits, waterlogging tolerance, aphid resistance, and drought tolerance [41]. Su et al. converted a major SNP co-segregating with waterlogging tolerance in Chrysanthemum into a PCR-based derived cleaved amplified polymorphic sequence (dCAPS) marker with an accuracy of 78.9%, which was verified in 52 cultivars or progenitors [42]. Chong et al. developed two dCAPS markers associated with flowering stage and flower head diameter in Chrysanthemum. These dCAPS markers have potential applications in the molecular breeding of Chrysanthemum [43] and will provide powerful new tools for future Chrysanthemum breeding.
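The accuracy of a diagnostic marker, as reported for the dCAPS marker above, can be understood as the share of validation lines in which the phenotype predicted from the marker genotype agrees with the observed phenotype. The sketch below assumes this simple concordance definition, which may differ from the one used in the cited study. The function `marker_accuracy` and the ten-line panel are hypothetical and do not reproduce the published data.

```python
def marker_accuracy(records):
    """Fraction of lines whose marker-predicted phenotype matches the observed one."""
    if not records:
        raise ValueError("no records to evaluate")
    correct = sum(1 for predicted, observed in records if predicted == observed)
    return correct / len(records)


# Hypothetical validation panel:
# (phenotype predicted from the marker allele, phenotype observed in the trial).
panel = [
    ("tolerant", "tolerant"),
    ("tolerant", "tolerant"),
    ("tolerant", "sensitive"),
    ("sensitive", "sensitive"),
    ("sensitive", "sensitive"),
    ("tolerant", "tolerant"),
    ("sensitive", "tolerant"),
    ("sensitive", "sensitive"),
    ("tolerant", "tolerant"),
    ("sensitive", "sensitive"),
]

accuracy = marker_accuracy(panel)
print(f"marker accuracy: {accuracy:.1%}")
```

In this made-up panel the marker agrees with the phenotype in 8 of 10 lines, giving an accuracy of 80.0%.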


  1. Chen, C.; Huang, H.; Wu, C.H. Protein Bioinformatics Databases and Resources. Methods Mol. Biol. 2017, 1558, 3–39.
  2. Małyska, A.; Jacobi, J. Plant breeding as the cornerstone of a sustainable bioeconomy. New Biotechnol. 2018, 40, 129–132.
  3. Mao, Y.; Sun, X.; Shen, J.; Gao, F.; Qiu, G.; Wang, T.; Nie, X.; Zhang, W.; Gao, Y.; Bai, Y. Molecular Evolutionary Analysis of Potato Virus Y Infecting Potato Based on the VPg Gene. Front. Microbiol. 2019, 10, 1708.
  4. Blätke, M.A.; Szymanski, J.J.; Gladilin, E.; Scholz, U.; Beier, S. Advances in Applied Bioinformatics in Crops. Front. Plant Sci. 2021, 12, 640394.
  5. Kushwaha, U.K.S.; Deo, I.; Jaiswal, J.P.; Prasad, B. Role of Bioinformatics in Crop Improvement. Glob. J. Sci. Front. Res. 2017, 17, 13–23.
  6. Alemu, K. The role and application of bioinformatics in plant disease management. Adv. Life Sci. Technol. 2015, 28, 28–33.
  7. Khan, A.; Singh, S.; Singh, V.K. Bioinformatics in Plant Pathology. In Emerging Trends in Plant Pathology; Singh, K.P., Jahagirdar, S., Sarma, B.K., Eds.; Springer: Singapore, 2021.
  8. Vassilev, D.; Leunissen, J.; Atanassov, A.; Nenov, A.; Dimov, G. Application of Bioinformatics in Plant Breeding. Biotechnol. Biotechnol. Equip. 2005, 19, 139–152.
  9. Zhang, Z.; Sun, Y.; Li, Y. Plant rejuvenation: From phenotypes to mechanisms. Plant Cell Rep. 2020, 39, 1249–1262.
  10. Yasrab, R.; Zhang, J.; Smyth, P.; Pound, M.P. Predicting Plant Growth from Time-Series Data Using Deep Learning. Remote Sens. 2021, 13, 331.
  11. Song, T.; Zhou, M.; Yuan, Y.; Yu, J.; Cai, H.; Li, J.; Chen, Y.; Bai, Y.; Zhou, G.; Cui, G. First high-quality reference genome of Amphicarpaea edgeworthii. bioRxiv 2020.
  12. Gomez-Casati, D.F.; Busi, M.V.; Barchiesi, J.; Peralta, D.A.; Hedin, N.; Bhadauria, V. Applications of Bioinformatics to Plant Biotechnology. Curr. Issues Mol. Biol. 2018, 27, 89–104.
  13. Houmard, N.M.; Mainville, J.L.; Bonin, C.P.; Huang, S.; Luethy, M.H.; Malvar, T.M. High-lysine corn generated by endosperm-specific suppression of lysine catabolism using RNAi. Plant Biotechnol. J. 2007, 5, 605–614.
  14. Guo, X.; Liu, R.H.; Fu, X.; Sun, X.; Tang, K. Over-expression of l-galactono-γ-lactone dehydrogenase increases vitamin C, total phenolics and antioxidant activity in lettuce through bio-fortification. Plant Cell Tissue Organ Cult. (PCTOC) 2013, 114, 225–236.
  15. Li, J.; Scarano, A.; Gonzalez, N.M.; D’Orso, F.; Yue, Y.; Nemeth, K.; Saalbach, G.; Hill, L.; de Oliveira Martins, C.; Moran, R.; et al. Biofortified tomatoes provide a new route to vitamin D sufficiency. Nat. Plants 2022, 8, 611–616.
  16. Ko, G.; Kim, P.-G.; Yoon, J.; Han, G.; Park, S.-J.; Song, W.; Lee, B. Closha: Bioinformatics workflow system for the analysis of massive sequencing data. BMC Bioinform. 2018, 19, 43.
  17. Luciano, A.M.; Peluso, J.J. PGRMC1 and the faithful progression through mitosis and meiosis. Cell Cycle 2016, 15, 2239–2240.
  18. Hoffmann, R.D.; Portes, M.T.; Olsen, L.I.; Damineli, D.S.C.; Hayashi, M.; Nunes, C.O.; Pedersen, J.T.; Lima, P.T.; Campos, C.; Feijó, J.A.; et al. Plasma membrane H+-ATPases sustain pollen tube growth and fertilization. Nat. Commun. 2020, 11, 2395.
  19. Kim, T.; Lee, S.-H.; Kim, J.-O. A Novel Shape Based Plant Growth Prediction Algorithm Using Deep Learning and Spatial Transformation. IEEE Access 2022, 10, 37731–37742.
  20. İncili, Ç.Y.; Arslan, B.; Çelik, E.N.Y.; Ulu, F.; Horuz, E.; Baloglu, M.C.; Çağlıyan, E.; Burcu, G.; Bayarslan, A.U.; Altunoglu, Y.C. Comparative bioinformatics analysis and abiotic stress responses of expansin proteins in Cucurbitaceae members: Watermelon and melon. Protoplasma 2022.
  21. Aditya Shastry, K.; Sanjay, H.A. Hybrid prediction strategy to predict agricultural information. Appl. Soft Comput. 2020, 98, 106811.
  22. Cock, P.J.A.; Grüning, B.A.; Paszkiewicz, K.; Pritchard, L. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology. PeerJ 2013, 1, e167.
  23. Lucaciu, R.; Pelikan, C.; Gerner, S.M.; Zioutis, C.; Köstlbacher, S.; Marx, H.; Herbold, C.W.; Schmidt, H.; Rattei, T. A Bioinformatics Guide to Plant Microbiome Analysis. Front. Plant Sci. 2019, 10, 1313.
  24. Pundir, S.; Martin, M.J.; O’Donovan, C.; UniProt Consortium. UniProt Tools. Curr. Protoc. Bioinform. 2016, 53, 1.29.1–1.29.15.
  25. Darabi, M.; Seddigh, S. Computational study of biochemical properties of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) enzyme in C3 plants. J. Plant Biol. 2017, 60, 35–47.
  26. Reiser, L.; Subramaniam, S.; Li, D.; Huala, E. Using the Arabidopsis Information Resource (TAIR) to Find Information About Arabidopsis Genes. Curr. Protoc. Bioinform. 2017, 60, 1.11.1–1.11.45.
  27. Zhang, J.; Li, C.; Wu, C.; Xiong, L.; Chen, G.; Zhang, Q.; Wang, S. RMD: A rice mutant database for functional analysis of the rice genome. Nucleic Acids Res. 2006, 34, D745–D748.
  28. Wang, J.; Fu, W.; Wang, R.; Hu, D.; Cheng, H.; Zhao, J.; Jiang, Y.; Kang, Z. WGVD: An integrated web-database for wheat genome variation and selective signatures. Database J. Biol. Databases Curation 2020, 2020, baaa090.
  29. Gui, S.; Yang, L.; Li, J.; Luo, J.; Xu, X.; Yuan, J.; Chen, L.; Li, W.; Yang, X.; Wu, S.; et al. ZEAMAP, a Comprehensive Database Adapted to the Maize Multi-Omics Era. iScience 2020, 23, 101241.
  30. Liu, W.; Xie, X.; Ma, X.; Li, J.; Chen, J.; Liu, Y.-G. DSDecode: A Web-Based Tool for Decoding of Sequencing Chromatograms for Genotyping of Targeted Mutations. Mol. Plant 2015, 8, 1431–1433.
  31. Chen, C.; Chen, H.; Zhang, Y.; Thomas, H.R.; Frank, M.H.; He, Y.; Xia, R. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Mol. Plant 2020, 13, 1194–1202.
  32. Omasits, U.; Ahrens, C.H.; Müller, S.; Wollscheid, B. Protter: Interactive protein feature visualization and integration with experimental proteomic data. Bioinformatics 2013, 30, 884–886.
  33. Chang, W.-C.; Lee, T.-Y.; Huang, H.-D.; Huang, H.-Y.; Pan, R.-L. PlantPAN: Plant promoter analysis navigator, for identifying combinatorial cis-regulatory elements with distance constraint in plant gene groups. BMC Genom. 2008, 9, 561.
  34. Wang, W.; Wang, Z.; Li, X.; Ni, Z.; Hu, Z.; Xin, M.; Peng, H.; Yao, Y.; Sun, Q.; Guo, W. SnpHub: An easy-to-set-up web server framework for exploring large-scale genomic variation data in the post-genomic era with applications in wheat. GigaScience 2020, 9, giaa060.
  35. Ristić, M.M. R programming for bioinformatics. J. Appl. Stat. 2009, 36, 925.
  36. Bioinformatics: Biological models in Python. Nat. Methods 2013, 10, 384.
  37. Chen, X.-W.; Gao, J.X. Big Data Bioinformatics. Methods 2016, 111, 1–2.
  38. Smith, A.M. FAO should focus on real not nominal food prices. Nature 2022, 602, 33.
  39. Nordin, S.M.; Zolkepli, I.A.; Rizal, A.R.A.; Tariq, R.; Mannan, S.; Ramayah, T. Paving the way to paddy food security: A multigroup analysis of agricultural education on Circular Economy Adoption. J. Clean. Prod. 2022, 375, 134089.
  40. Esposito, A.; Colantuono, C.; Ruggieri, V.; Chiusano, M.L. Bioinformatics for agriculture in the Next-Generation sequencing era. Chem. Biol. Technol. Agric. 2016, 3, 9.
  41. Li, P.; Su, J.; Guan, Z.; Fang, W.; Chen, F.; Zhang, F. Association analysis of drought tolerance in cut chrysanthemum (Chrysanthemum morifolium Ramat.) at seedling stage. 3 Biotech 2018, 8, 226.
  42. Su, J.; Zhang, F.; Chong, X.; Song, A.; Guan, Z.; Fang, W.; Chen, F. Genome-wide association study identifies favorable SNP alleles and candidate genes for waterlogging tolerance in chrysanthemums. Hortic. Res. 2019, 6, 21.
  43. Chong, X.; Su, J.; Wang, F.; Wang, H.; Song, A.; Guan, Z.; Fang, W.; Jiang, J.; Chen, S.; Chen, F.; et al. Identification of favorable SNP alleles and candidate genes responsible for inflorescence-related traits via GWAS in chrysanthemum. Plant Mol. Biol. 2019, 99, 407–420.
Subjects: Biology
Update Date: 02 Dec 2022