Soybean Monitoring and Management: History

Interest in deep learning in agriculture has grown continuously since the inception of these techniques in the early 2010s. Soybean, one of the most important agricultural commodities, has frequently been the target of efforts in this regard. In this context, it can be challenging to keep track of a constantly evolving state of the art.

  • deep learning
  • digital images
  • crops

1. Introduction

Soybean (Glycine max (L.) Merr.) has become the most important oilseed crop and one of the five most important food crops worldwide [1,2]. Its high protein content makes soybean a prime source of feed for livestock, and soybean oil is used for both human consumption and industrial applications [1]. While the demand for soybeans continues to grow worldwide [3], environmental pressures due to climate change are becoming more widespread and extreme [4]. For soybean yield to keep up with demand, new solutions to current production limitations are needed. Although extensive breeding efforts have led to the development of varieties that are quite robust to different conditions, soybean crops remain vulnerable to many factors. Stresses caused by diseases, pests, unfavorable weather, nutrition imbalances, and other factors are responsible for losses that can easily surpass 20% of total world production [2]. Although completely eliminating losses is likely unfeasible, closely monitoring each of the relevant variables can greatly mitigate the problem [5]. However, continuous monitoring may require too large a workforce unless some type of automation is employed. In this context, artificial intelligence techniques emerge as powerful aiding tools for farm monitoring and management [6].
One of the possible definitions of artificial intelligence (AI) states that it is “a computational data-driven approach capable of performing tasks that normally require human intelligence to independently detect, track, or classify objects” [7]. Techniques fitting this definition have existed for many decades, including expert systems, neural networks, and other types of machine learning algorithms. With the inception of deep learning models in the first half of the 2010s, applications of artificial intelligence have grown steeply in both number and scope. This is certainly true in agriculture, where applications such as plant disease recognition [5], yield estimation [8], plant nutrition status assessment [9], and biomass estimation [10], among many others, have seen a surge in the number of articles employing artificial intelligence. Among AI techniques, deep learning has been particularly successful and well adapted to difficult classification problems. One reason for this success is that, with deep learning, the explicit extraction of features from the data is no longer required [11,12], making the classification process more straightforward, less biased, and more robust to different types of conditions [13].
While the leap from academic research to practical solutions has been successfully made in some cases (e.g., weed detection and control), in most cases real-world conditions and variability are too challenging for techniques and models that are, more often than not, trained on data representing only a small fraction of reality [14]. The most direct way to address this problem is to expand the datasets used to train the models. This is by no means a trivial task, especially considering that the variability involved in some classification problems may require numbers of images on the order of millions. Increased data sharing and the exploration of citizen science concepts can help reduce the problem, but in many situations all-encompassing datasets may be unfeasible [14]. If supervised learning is adopted, there is the additional challenge of data annotation, a process that is often expensive, time-consuming, and error-prone [15]. New annotation strategies capable of speeding up the process are already being studied [16], but these were still incipient when this article was written.
The failure of AI models when presented with new data with a distinct statistical distribution, a phenomenon often called “covariate shift” [17], is arguably the most important hurdle to the more effective use of artificial intelligence-based technologies in agriculture, but other factors are almost always present. Each application has its own challenges, so systematically understanding how data and technical issues affect the performance of the models is fundamental to the construction of suitable solutions. Many of these challenges have already been experienced in previous studies and reported in the literature, so a proper understanding of the current state of the art is critical both to the novelty of new research and to avoiding repeated mistakes. Research on deep learning applied to crop management has been extensive across different types of crops, so including all of it would make the article somewhat redundant and impractically long. Soybean, being a major agricultural commodity, has received considerable attention from researchers, to the point that it encapsulates most of the approaches adopted across different crops. This was the main motivation for narrowing the scope of the review to research related to this crop only.
The use of deep learning for soybean monitoring started to gain momentum after 2015. Early research was mostly dedicated to disease and pest detection, but soon applications such as phenotyping, seed counting, cultivar identification, and yield prediction began to be explored. From the beginning, studies have focused on investigating different deep learning models and architectures in the context of each application and domain. Although this type of research has yielded relevant results, few technologies are being used effectively in practice. One important exception is weed detection, as machinery from different manufacturers can already not only detect weeds but also actuate to eliminate the problem. For most applications, there are still significant challenges that require more suitable solutions. New approaches emphasizing model interpretability and fine-tuning are beginning to be explored [18,19,20], but the research gap is still substantial.

2. Definitions and Acronyms

Most of the definitions are adapted from [7,21]. A list of acronyms used in this article with the respective meanings is given in Table 1.
Table 1. Acronyms used in this review.
Artificial intelligence: a computational data-driven approach capable of performing tasks that normally require human intelligence to independently detect, track, or classify objects.
Data annotation: the process of adding metadata to a dataset such as indicating the objects of interest in an image. This is typically performed manually by human specialists.
Deep learning: a special case of machine learning that utilizes artificial neural networks with many layers of processing to implicitly extract features from the data and recognize patterns of interest. Deep learning is appropriate for large datasets with complex features and where there are unknown relationships within the data.
Domain adaptation: techniques that have the objective of adapting the knowledge learned in a source domain to apply it to a different but related target domain.
Image augmentation: process of applying different image processing techniques to alter existing images in order to create more data for training the model.
Machine learning: application of artificial intelligence (AI) algorithms that underpin the ability to learn characteristics of the classes of interest via extraction of features from a dataset. Once the model is developed, it can be used to predict the desired output on test data or unknown images.
Model: a representation of what a machine learning program has learned from the data.
Overfitting: when a model fits the training data closely but fails to generalize to the testing data.
Proximal images: images captured in close proximity to the objects of interest.
Segmentation: the process of partitioning a digital image containing the objects of interest into multiple segments of similarity or classes either automatically or manually. In the latter case, the human-powered task is also called image annotation in the context of training AI algorithms.
Semi-supervised learning: a combination of supervised and unsupervised learning in which a small portion of the data is used for a first supervised training, and the remainder of the process is carried out with unlabeled data.
Supervised learning: a machine learning approach in which a model, trained on a known labeled dataset, is able to predict a class label (classification) or a numeric value (regression) for new, unknown data.
Transfer learning: a machine learning technique that transfers knowledge learned in one domain to another. Weight fine-tuning and domain adaptation are arguably the most employed transfer learning techniques.
Unsupervised learning: machine learning that finds patterns in unlabeled data.
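To make the image augmentation definition above more concrete, the sketch below (a minimal illustration using NumPy; the function name `augment` and the specific transformations are only examples, not from the original article) generates flipped, rotated, and brightness-jittered variants of an image array, the kind of alterations used to enlarge a training dataset:

```python
import numpy as np

def augment(image, rng):
    """Return simple augmented variants of an image array of shape (H, W, C)."""
    variants = [image]                 # keep the original
    variants.append(np.fliplr(image))  # horizontal flip
    variants.append(np.flipud(image))  # vertical flip
    variants.append(np.rot90(image))   # 90-degree rotation
    # Brightness jitter: scale pixel values by a random factor, then clip to [0, 255]
    factor = rng.uniform(0.8, 1.2)
    jittered = np.clip(image.astype(float) * factor, 0, 255).astype(np.uint8)
    variants.append(jittered)
    return variants

rng = np.random.default_rng(0)
# A synthetic stand-in for a proximal crop image
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
augmented = augment(img, rng)
print(len(augmented))  # 5 variants, including the original
```

In practice, augmentation pipelines apply many more transformations (cropping, color shifts, noise) and are typically provided by deep learning frameworks rather than written by hand; the point here is only to illustrate how one labeled image can yield several training samples.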

This entry is adapted from the peer-reviewed paper 10.3390/seeds2030026
