Cite
Yoosefzadeh Najafabadi, M.; Hesami, M.; Eskandari, M. Machine Learning Basis and Function. Encyclopedia. Available online: https://encyclopedia.pub/entry/43245 (accessed on 14 October 2024).
Machine Learning Basis and Function

Machine learning, as an important subfield of AI, has been widely used in different aspects of our lives, such as communication and agriculture, among many others. In agriculture, ML algorithms can be used for crop-yield prediction, crop-growth monitoring, precision agriculture, and automated irrigation.

Keywords: data-integration strategies; deep learning

1. Machine Learning: Basis 

Machine learning, as an important subfield of AI, has been widely used in many aspects of our lives, including communication and agriculture, among many others [1][2]. In agriculture, ML algorithms can be used for crop-yield prediction, crop-growth monitoring, precision agriculture, and automated irrigation [3]. ML algorithms are typically divided into three subgroups: supervised learning, unsupervised learning, and reinforcement learning. These subgroups are extensively reviewed in Hesami et al. [4]; therefore, only a brief explanation of each is provided in this entry. In supervised learning, the algorithm is trained on a labeled dataset to make predictions based on the data [5]. The model learns by being given a set of inputs and associated outputs and then adjusting its internal parameters to produce the desired output. Supervised learning is the most common subgroup of ML algorithms in plant breeding, where it is frequently used to predict complex traits at an early growth stage [6], detect genomic regions associated with a specific trait [7], and select superior genotypes via genomic selection [8].
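The supervised-learning setting described above can be sketched in a few lines. This is a minimal, hypothetical example, not from the entry: the features, coefficients, and the use of a random forest are all assumptions, and the data are synthetic stand-ins for a labeled breeding dataset.

```python
# Hypothetical sketch of supervised learning: fit a model on labeled
# (input, output) pairs, then score it on unseen data. All data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # e.g. 200 genotypes x 5 measured features
# A yield-like trait with a known (synthetic) dependence on the features
y = X @ np.array([1.5, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)          # adjust internal parameters to the labels
r2 = model.score(X_test, y_test)     # evaluate on data the model never saw
print(f"R^2 on unseen data: {r2:.2f}")
```

The held-out test split mirrors the entry's emphasis on evaluating against unseen data rather than the training set itself.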

2. Function

Unsupervised learning is used when data are not labeled, and the algorithm finds patterns and similarities in the dataset on its own [9]. The model learns by identifying structure in the data, such as clusters or groups. In plant breeding, unsupervised learning is usually implemented to find possible associations among genotypes within a breeding population, design kinship matrices, and categorize unstructured datasets [9]. Reinforcement learning is another ML paradigm, in which the model is exposed to an environment and receives feedback in the form of rewards or penalties based on its actions [10]. The model learns by taking actions and adjusting its parameters to maximize the total reward received. Reinforcement learning is a relatively new area in plant breeding, and its applications need to be explored further.
Several important factors need to be taken into account for the successful use of ML algorithms in predicting a given complex trait. These factors include, but are not limited to, data collection, pre-processing, feature extraction, model training, model evaluation, hyperparameter tuning, model deployment, and model monitoring [11][12]. These factors are intensively reviewed in several studies and review papers [11][12][13]. In brief, (1) data collection is the process of gathering data from different sources (environments, genotypes, etc.) in different formats, such as images, text, numerical/categorical datasets, or video, for use in model training [14]; (2) the pre-processing step is defined as the cleaning, transforming, and organizing of data to make it more suitable for ML algorithms [11]; (3) feature extraction is the process in which features/variables are extracted from the data to be represented in a form that is more suitable for ML algorithms [15]; (4) model training uses different ML algorithms to fit models to the data [7]; (5) model evaluation is the process of assessing the accuracy and errors of the algorithm against unseen data [16]; (6) the hyperparameter tuning step contains a second round of adjusting the parameters of tested ML algorithms to achieve the best performance [4][17]; (7) model deployment is summarized as the process of deploying a developed model in production, usually in the form of an application [13]; and (8) model monitoring is the process of tracking model performance over time to ensure it remains accurate [7].
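Steps (2), (4), and (5) of the workflow above can be chained into a single sketch. This is an assumed illustration, not the entry's pipeline: the scaler, the ridge model, and the synthetic data are all placeholders for whatever pre-processing and algorithm a given study uses.

```python
# Assumed sketch of pre-processing -> model training -> model evaluation,
# chained so the same transformations are applied inside each CV fold.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 8))
y = 2.0 * X[:, 0] - X[:, 3] + rng.normal(scale=0.2, size=150)

pipe = Pipeline([
    ("scale", StandardScaler()),   # step (2): pre-processing
    ("model", Ridge(alpha=1.0)),   # step (4): model training
])
scores = cross_val_score(pipe, X, y, cv=5)  # step (5): evaluation on held-out folds
print(f"mean CV R^2: {scores.mean():.2f}")
```

Bundling pre-processing into the pipeline keeps the evaluation honest: the scaler is refit on each training fold, so no information from the held-out fold leaks into training.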
In plant breeding, data collection is an essential step involving the collection of data for target traits from a wide range of environments, trials, and plant populations. Plant breeders often work in different environmental settings in order to gain an accurate understanding of the genotype-by-environment interaction in different trials within each environment. Additionally, they measure different traits in order to establish accurate multi-trait breeding strategies, such as tandem selection, independent culling levels, and selection index. As such, any collected data must be precise, accurate, and pre-processed using various packages and software in order to be suitable for plant breeding programs. Recently, the AllInOne R-shiny package was introduced as an open-source, breeder-friendly, analytical R package for pre-processing phenotypic data [18]. The basis of AllInOne is to utilize various R packages and develop a pipeline for pre-processing phenotypic datasets in an accurate, easy, and timely manner without any coding skills required. A brief introduction to AllInOne is available at https://github.com/MohsenYN/AllInOne/wiki (accessed on 15 February 2023). Feature extraction is another critical step in determining the most relevant variables for further analysis. For example, recursive feature elimination applied to 250 spectral properties of a soybean population revealed the significance of the 395 nm band, in addition to four other bands in the blue, green, red, and near-infrared regions, in predicting soybean yield [19]. This spectral band can be used to complement other important bands to enhance the accuracy of soybean-yield prediction at an early stage. Furthermore, another study investigated the potential of 34 commonly used spectral indices in predicting the soybean yield and biomass of a Canadian soybean panel, in which the Normalized Difference Vegetation Index (NDVI) was identified as the most pivotal index in predicting soybean yield and biomass concurrently [6].
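Recursive feature elimination, the technique cited above for the 250 spectral bands, can be sketched as follows. The data here are synthetic (20 stand-in "bands" instead of 250, with two informative ones planted at known positions), and the linear base estimator is an assumption for illustration.

```python
# Hedged sketch of recursive feature elimination (RFE): repeatedly fit a
# model, drop the weakest feature, and refit until only the most relevant
# variables remain. Data are synthetic, not the soybean dataset.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 20))   # e.g. 20 stand-in spectral bands
# Only bands 4 and 10 actually drive the (synthetic) yield signal
y = 3.0 * X[:, 4] + 1.5 * X[:, 10] + rng.normal(scale=0.1, size=100)

rfe = RFE(LinearRegression(), n_features_to_select=2).fit(X, y)
selected = np.where(rfe.support_)[0]
print("selected bands:", selected)
```

In the cited study the surviving features were specific wavelengths; here the planted informative columns play that role.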
Plant breeding involves a series of tasks and data analyses that are carried out over multiple years, and, therefore, repeatability and reproducibility are two important factors to consider when establishing a plant breeding program. Plant breeders may be reluctant to use sophisticated algorithms, such as ML algorithms, for analyzing their trials because of ambiguity regarding whether the results will be reproducible and repeatable. Therefore, it is of the utmost importance to ensure proper model training, evaluation, hyperparameter tuning, deployment, and monitoring when developing an algorithm. To further improve model training in plant breeding, larger datasets from different locations and years, as well as plant populations with different genetic backgrounds, should be collected [20]. Automated tuning methods can be used to optimize hyperparameters in plant breeding datasets. As an example, grid search is a popular automated tuning method based on an exhaustive search for optimal parameter values [21]. Grid search works by training and evaluating a model for each combination of parameter values specified in a grid, then selecting the combination with the best results [21]. Bayesian optimization is another automated tuning method that uses Bayesian probability theory to determine the best set of parameters for a given problem [22]. Bayesian optimization works by constructing a probabilistic model of an objective function based on previously evaluated values. This model is then used to predict the optimal set of parameters for the given problem [22]. It then evaluates the performance of the system with the predicted parameters and updates the model with new information. This process is repeated to optimize the model's performance for the given dataset. Bayesian optimization is useful for optimizing complex problems with many variables or where the cost of evaluating the objective function is high [22].
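The grid-search procedure described above (train and evaluate one model per parameter combination, keep the best) can be sketched directly. The parameter grid, the ridge model, and the synthetic data are assumptions for illustration only.

```python
# Assumed sketch of grid search: exhaustively cross-validate every
# hyper-parameter combination in the grid and keep the best one.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=120)

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}  # 4 combinations to try
search = GridSearchCV(Ridge(), param_grid, cv=5).fit(X, y)
print("best parameters:", search.best_params_)
print(f"best CV score: {search.best_score_:.2f}")
```

Bayesian optimization would replace the exhaustive loop with a probabilistic model that proposes the next parameters to try, which matters when each evaluation is expensive; scikit-learn itself only ships the exhaustive and randomized variants.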
As plant breeders work with different omics datasets, all of which fall into the big-data category, a developed algorithm can be deployed at scale through cloud-based services such as the Google Cloud Platform [23]. To ensure optimal performance, model performance should be monitored over time and analyzed with metrics such as accuracy and precision, along with anomaly detection, to identify areas for improvement [24].
There are other components/methods that are important in reducing possible errors and increasing the ultimate accuracy of ML algorithms, including transfer learning, feature engineering, dimensionality reduction, and ensemble learning. Transfer learning is an ML technique in which a model pre-trained on one task is reused as the starting point for a model on a second task [25]. Transfer learning reduces the amount of data and computation needed to train a model, and it is particularly helpful for improving a model's performance when the amount of training data for the second task is small [25]. Feature engineering is the process of using domain knowledge of the data to create features (variables) for the ML pipeline. Feature engineering is an informal topic, but it is considered essential in applied machine learning [26]. It can help increase the accuracy of machine-learning models by creating features from raw data that help the model learn more effectively and accurately. Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables [27]. It can be divided into feature selection and feature extraction. Feature selection is the process of selecting a subset of relevant features for use in model construction [28]. Feature extraction is the process of combining or transforming existing features into more informative representations that are more useful for a given task [29]. Ensemble learning is an ML technique that combines multiple models to create more powerful and accurate models, improving the accuracy and robustness of ML predictions [28]. It combines multiple weak learners to form a strong learner that can make more accurate predictions than any single model. The most common ensemble-learning techniques are the bagging, boosting, and stacking algorithms [28].
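Stacking, one of the three ensemble techniques named above, can be sketched as follows: several base learners are fit, and a meta-model learns how to combine their predictions. The choice of base learners, the meta-model, and the synthetic nonlinear target are all assumptions for illustration.

```python
# Hedged sketch of ensemble learning via stacking: combine base learners
# with a meta-model (here a ridge regression). Data are synthetic.
import numpy as np
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=5)),
        ("knn", KNeighborsRegressor()),
    ],
    final_estimator=Ridge(),   # meta-model that weighs the base learners
).fit(X_tr, y_tr)
score = stack.score(X_te, y_te)
print(f"stacked R^2 on unseen data: {score:.2f}")
```

Bagging and boosting differ in how the base learners are built (resampled in parallel versus fit sequentially on residual errors), but all three share the same goal of outperforming any single constituent model.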

References

  1. Falk, K.G.; Jubery, T.Z.; Mirnezami, S.V.; Parmley, K.A.; Sarkar, S.; Singh, A.; Ganapathysubramanian, B.; Singh, A.K. Computer vision and machine learning enabled soybean root phenotyping pipeline. Plant Methods 2020, 16, 1–19.
  2. Jafari, M.; Shahsavar, A. The application of artificial neural networks in modeling and predicting the effects of melatonin on morphological responses of citrus to drought stress. PLoS ONE 2020, 15, e0240427.
  3. Yoosefzadeh-Najafabadi, M.; Rajcan, I.; Vazin, M. High-throughput plant breeding approaches: Moving along with plant-based food demands for pet food industries. Front. Vet. Sci. 2022, 9, 1467.
  4. Hesami, M.; Alizadeh, M.; Jones, A.M.P.; Torkamaneh, D. Machine learning: Its challenges and opportunities in plant system biology. Appl. Microbiol. Biotechnol. 2022, 106, 3507–3530.
  5. Hesami, M.; Jones, A.M.P. Application of artificial intelligence models and optimization algorithms in plant cell and tissue culture. Appl. Microbiol. Biotechnol. 2020, 104, 9449–9485.
  6. Yoosefzadeh-Najafabadi, M.; Tulpan, D.; Eskandari, M. Using hybrid artificial intelligence and evolutionary optimization algorithms for estimating soybean yield and fresh biomass using hyperspectral vegetation indices. Remote Sens. 2021, 13, 2555.
  7. Yoosefzadeh-Najafabadi, M.; Eskandari, M.; Torabi, S.; Torkamaneh, D.; Tulpan, D.; Rajcan, I. Machine-learning-based genome-wide association studies for uncovering QTL underlying soybean yield and its components. Int. J. Mol. Sci. 2022, 23, 5538.
  8. Yoosefzadeh-Najafabadi, M.; Rajcan, I.; Eskandari, M. Optimizing genomic selection in soybean: An important improvement in agricultural genomics. Heliyon 2022, 8, e11873.
  9. Mahesh, B. Machine learning algorithms-a review. Int. J. Sci. Res. 2020, 9, 381–386.
  10. Lee, D.; Seo, H.; Jung, M.W. Neural basis of reinforcement learning and decision making. Annu. Rev. Neurosci. 2012, 35, 287.
  11. Vabalas, A.; Gowen, E.; Poliakoff, E.; Casson, A.J. Machine learning algorithm validation with a limited sample size. PLoS ONE 2019, 14, e0224365.
  12. Sakib, S.; Fouda, M.M.; Fadlullah, Z.M.; Nasser, N. Migrating intelligence from cloud to ultra-edge smart IoT sensor based on deep learning: An arrhythmia monitoring use-case. In Proceedings of the 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, 15–19 June 2020; pp. 595–600.
  13. Derakhshan, B.; Mahdiraji, A.R.; Rabl, T.; Markl, V. Continuous Deployment of Machine Learning Pipelines. OpenProceedings 2019, 36, 397–408.
  14. Xu, Y.; Zhang, X.; Li, H.; Zheng, H.; Zhang, J.; Olsen, M.S.; Varshney, R.K.; Prasanna, B.M.; Qian, Q. Smart breeding driven by big data, artificial intelligence and integrated genomic-enviromic prediction. Mol. Plant 2022.
  15. Zhang, N.; Gupta, A.; Chen, Z.; Ong, Y.-S. Evolutionary machine learning with minions: A case study in feature selection. IEEE Trans. Evol. Comput. 2021, 26, 130–144.
  16. Niazian, M.; Niedbała, G. Machine learning for plant breeding and biotechnology. Agriculture 2020, 10, 436.
  17. Pepe, M.; Hesami, M.; Jones, A.M.P. Machine Learning-Mediated Development and Optimization of Disinfection Protocol and Scarification Method for Improved In Vitro Germination of Cannabis Seeds. Plants 2021, 10, 2397.
  18. Yoosefzadeh-Najafabadi, M.; Heidari, A.; Rajcan, I. AllInOne. 2022. Available online: https://github.com/MohsenYN/AllInOne/wiki (accessed on 15 February 2023).
  19. Yoosefzadeh-Najafabadi, M.; Earl, H.J.; Tulpan, D.; Sulik, J.; Eskandari, M. Application of machine learning algorithms in plant breeding: Predicting yield from hyperspectral reflectance in soybean. Front. Plant Sci. 2021, 11, 624273.
  20. Beyene, Y.; Semagn, K.; Mugo, S.; Tarekegne, A.; Babu, R.; Meisel, B.; Sehabiague, P.; Makumbi, D.; Magorokosho, C.; Oikeh, S. Genetic gains in grain yield through genomic selection in eight bi-parental maize populations under drought stress. Crop Sci. 2015, 55, 154–163.
  21. Fu, W.; Nair, V.; Menzies, T. Why is differential evolution better than grid search for tuning defect predictors? arXiv 2016, arXiv:1609.02613.
  22. Wang, X.; Jin, Y.; Schmitt, S. Recent Advances in Bayesian Optimization. arXiv 2022, arXiv:2206.03301v2.
  23. Harfouche, A.L.; Jacobson, D.A.; Kainer, D.; Romero, J.C.; Harfouche, A.H.; Mugnozza, G.S.; Moshelion, M.; Tuskan, G.A.; Keurentjes, J.J.; Altman, A. Accelerating climate resilient plant breeding by applying next-generation artificial intelligence. Trends Biotechnol. 2019, 37, 1217–1235.
  24. Laptev, N.; Amizadeh, S.; Flint, I. Generic and scalable framework for automated time-series anomaly detection. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 1939–1947.
  25. Agarwal, N.; Sondhi, A.; Chopra, K.; Singh, G. Transfer learning: Survey and classification. Smart Innovations in Communication and Computational Sciences; Springer: Singapore, 2021; pp. 145–155.
  26. Turner, C.R.; Fuggetta, A.; Lavazza, L.; Wolf, A.L. A conceptual basis for feature engineering. J. Syst. Softw. 1999, 49, 3–15.
  27. Huang, X.; Wu, L.; Ye, Y. A review on dimensionality reduction techniques. Int. J. Pattern Recognit. Artif. Intell. 2019, 33, 1950017.
  28. Najafabadi, M.Y. Using Advanced Proximal Sensing and Genotyping Tools Combined with Bigdata Analysis Methods to Improve Soybean Yield. Ph.D. Thesis, University of Guelph, Guelph, ON, Canada, 2021.
  29. Christ, M.; Braun, N.; Neuffer, J.; Kempa-Liehr, A.W. Time series feature extraction on basis of scalable hypothesis tests (tsfresh–a python package). Neurocomputing 2018, 307, 72–77.
Subjects: Agronomy
Update Date: 20 Apr 2023