Autonomous Navigation of Agricultural Robots

  • object detection
  • precision agriculture
  • agricultural robots
  • crop identification

1. Introduction

In the last hundred years, the world’s population has more than quadrupled: in 1915, the planet had 1.8 billion inhabitants, while the latest UN estimate puts the global population at 8 billion, with projections of 9.7 billion by 2050 and 10.4 billion by 2100 [1]. Population growth, compounded by mounting challenges to global food security, such as increasing reliance on animal-based foods, shrinking water and land resources, and the impacts of climate change, is amplifying the urgent global demand for food [2].
Demand for food is expected to keep growing over the next 30 years, with global food crop production in 2050 projected to be 46.8% higher than in 2020 [2][3]. Farmers worldwide therefore need to boost crop production, either by expanding the agricultural land under cultivation or by improving productivity on existing farmland through fertilization and irrigation. One of the most promising strategies to emerge in recent years, however, is the adoption of precision agriculture (PA), an approach characterized by the use of modern information and communication technologies to enhance agricultural productivity and profitability [4]. PA integrates sensors, information systems, sophisticated machinery, and informed decision making, and aims to enhance agricultural production by effectively managing the variation and uncertainty inherent in agricultural systems.

2. Significance of Technology and Automation in Agriculture

One of the elements with the most potential for applying advanced PA techniques is robotic systems, whose use in crop fields has increased in recent years [5]. These robots are designed to perform all kinds of field tasks, such as guidance and mapping, automated harvesting, site-specific fertilization, environmental condition monitoring, livestock monitoring, pesticide spraying, and precision weed management, among others. Localization and guidance techniques in agriculture encompass a variety of approaches, such as the Extended Kalman Filter (EKF) [6], the Particle Filter (PF) [7], and Visual Odometry (VO) [8], among others, which enable robots to determine their position within the agricultural environment. Mapping applications involve the creation of maps using techniques like metric-semantic mapping, Light Detection and Ranging (LiDAR) mapping, and the fusion of point cloud maps, enabling robots to navigate and interact with the agricultural landscape while facilitating specific tasks like fruit monitoring or weed control [9]. Agricultural harvesting robots typically consist of mobile platforms carrying robotic arms [10]. These robots require advanced vision systems, employing adaptive thresholding algorithms as well as texture-based methods and color and shape feature extraction, to identify target fruits. The integration of color and depth data, the use of reinforcement learning methods [11][12], and the application of deep Convolutional Neural Networks (CNNs) for image segmentation [13] are among the strategies currently in common use for separating foreground targets from the background. Additionally, these robots rely on human–robot interaction strategies with 3D visualization, systematic operational planning, efficient grasping methods, and well-designed grippers [14]. Among the applications attracting the most interest in automation are pesticide spraying [15] and weed management in general [16]. These robots are designed to minimize pesticide wastage by targeting pest-affected areas on plants, and CNN-based strategies have proven highly beneficial for accurately identifying pests on crops, ensuring precise pesticide application.

Moreover, several autonomous commercial robots designed for weed removal and precision agriculture tasks have emerged in recent years. The Small Robot Company [17] has introduced three robots: Tom digitizes the field, Dick zaps weeds with electricity, and Harry sows and records seed locations, reducing the need for chemicals and heavy machinery. EcoRobotix [18], a Swiss prototype, employs computer vision to identify weeds and selectively spray them with a small dose of herbicide, significantly reducing herbicide usage. AVO [19] uses machine learning for centimeter-precise weed detection and spraying, cutting herbicide volume by over 95% while preserving crop yield. Tertill, by Franklin Robotics [20], recognizes and cuts weeds using sensors and operates on solar power. TerraSentia [21] is an agricultural robot designed for autonomous weed detection. One of the key enablers for bringing mobile robots into the field to execute precision agriculture tasks is Global Navigation Satellite Systems (GNSS) [22]. Currently, commercial mobile robots perform tasks where full-coverage maps are generally used, such as land preparation and seeding [23].
Many other robots focus on perennial crops, because navigating arable land with crops at an early stage of growth is challenging, especially if a precise map of the location of each plant is not available, which creates a risk of damaging the plants.
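
As an illustration of how localization techniques such as the EKF operate on a field robot, the following minimal sketch fuses a unicycle motion model with GNSS position fixes. It is a generic textbook formulation, not the method of any work cited above; the state layout [x, y, heading] and all noise values are assumptions of this example.

```python
# Minimal EKF localization sketch for a wheeled field robot.
# State: [x, y, heading]; all noise parameters are illustrative.
import numpy as np

def ekf_predict(x, P, v, omega, dt, Q):
    """Propagate state and covariance with a unicycle motion model."""
    theta = x[2]
    x_pred = x + dt * np.array([v * np.cos(theta), v * np.sin(theta), omega])
    # Jacobian of the motion model with respect to the state
    F = np.array([[1.0, 0.0, -dt * v * np.sin(theta)],
                  [0.0, 1.0,  dt * v * np.cos(theta)],
                  [0.0, 0.0,  1.0]])
    return x_pred, F @ P @ F.T + Q

def ekf_update_gnss(x, P, z, R):
    """Correct the state with a GNSS position fix z = [x_meas, y_meas]."""
    H = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])   # GNSS observes position only
    y = z - H @ x                     # innovation
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    return x + K @ y, (np.eye(3) - K @ H) @ P

# Example: drive forward at 1 m/s while receiving noisy GNSS fixes
x, P = np.zeros(3), np.eye(3) * 0.1
Q, R = np.eye(3) * 1e-3, np.eye(2) * 0.25
for step in range(10):
    x, P = ekf_predict(x, P, v=1.0, omega=0.05, dt=0.1, Q=Q)
    z = x[:2] + np.random.randn(2) * 0.5   # simulated GNSS measurement
    x, P = ekf_update_gnss(x, P, z, R)
print(x)  # estimated [x, y, heading]
```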

3. Machine Vision in Agriculture

In the past five decades, AI has demonstrated its resilience and ubiquity across various domains, and agriculture is no exception [24]. Agriculture currently faces numerous challenges, such as crop disease control, management of pesticide use, ecological weed control, and efficient irrigation management, among many others. Machine learning (ML), a subset of AI, has contributed to addressing all of these challenges [25]. AI-based systems, especially those utilizing artificial neural networks, have become reliable and effective solutions for agricultural purposes thanks to their predictive capabilities, parallel reasoning, and ability to adapt through training [26]. They excel in complex mapping tasks when provided with a reliable set of variables [27], such as forecasting water resource variables [28] or predicting nutrition levels in crops [29]. Neural networks have played a pivotal role in driving substantial advances in machine vision, leading to a remarkable surge in its use within agriculture. Machine vision facilitates precise crop identification and assessment, delivering significant benefits [30]. Capturing insights into crop development enables actions in accordance with the stage of the agricultural cycle. Moreover, it aids in detecting plant pests and diseases, assessing fertilizer requirements, and optimizing irrigation management [31]. To leverage machine vision for identification, close-range or even overhead views are often needed. Additionally, machine vision offers the valuable capability of accurate crop localization [32]. This feature empowers various on-field tasks, including treatments at distinct crop growth stages and efficient harvesting. Furthermore, precise crop location data prove advantageous for field navigation, mitigating the risk of crop damage during agricultural operations [22]. Many tasks in smart agriculture, such as fruit harvesting and crop yield tracking, demand the involvement of agricultural robots [33]. Consequently, agricultural robots and their associated technologies hold a crucial role within the domain of smart agriculture [34]. Among these technologies, automatic navigation stands out as the fundamental and central function of autonomous agricultural robots [35]. In contrast to alternative navigation technologies, machine vision has witnessed growing adoption in autonomous agricultural robots. This preference arises from its merits, including cost-effectiveness, simplicity of maintenance, versatile applicability, and a high degree of intelligence [36]. Recent years have seen a proliferation of novel approaches, technologies, and platforms within machine vision, all of which have found pertinent applications in agricultural robotics [36]. Li et al. [37] proposed a navigation line extraction method specifically designed to cater to the various stages of cotton plant growth, from emergence to blooming. Zhang et al. [38] investigated navigation in greenhouses by employing contour and height data of crop rows; the fusion of these features generated a confidence density image, which aided in determining heading and lateral errors. Experiments conducted in an indoor simulation environment showed that agricultural robots can maneuver through rows of crops along S- and O-shaped paths.
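
To make the heading and lateral errors used for visual guidance concrete, the sketch below fits a navigation line through the per-row centroids of a binary vegetation mask and measures the line's tilt and its offset at the bottom of the image. This is an illustrative simplification, not the confidence-density method of Zhang et al.; the mask format and the image-center reference are assumptions.

```python
# Illustrative navigation-line extraction from a binary crop-row mask.
import numpy as np

def navigation_line(mask):
    """Fit x = a*y + b through vegetation centroids of a binary mask."""
    ys, cxs = [], []
    for y in range(mask.shape[0]):
        xs = np.flatnonzero(mask[y])
        if xs.size:                     # skip image rows with no vegetation
            ys.append(y)
            cxs.append(xs.mean())
    a, b = np.polyfit(ys, cxs, 1)       # least-squares line fit
    return a, b

def guidance_errors(mask):
    """Heading error (rad) and lateral error (px) w.r.t. the image center."""
    a, b = navigation_line(mask)
    heading_error = np.arctan(a)                        # tilt of the row line
    bottom = mask.shape[0] - 1                          # row nearest the robot
    lateral_error = (a * bottom + b) - mask.shape[1] / 2
    return heading_error, lateral_error

# Example with a synthetic, slightly slanted crop row
mask = np.zeros((100, 100), dtype=np.uint8)
for y in range(100):
    mask[y, 45 + y // 20 : 50 + y // 20] = 1
print(guidance_errors(mask))
```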
In the agricultural field, image analysis stands as a significant research domain, where intelligent data analysis methods are actively deployed for tasks such as image recognition, classification, anomaly detection, and more across diverse agricultural applications [39].

4. Crop Detection and Localization

Numerous algorithms have been devised and tested to facilitate autonomous robot navigation, and new solutions are still being sought [36], many of which use crop row detection as a guide [40]. The challenge in row recognition lies in identifying robust features that remain unchanged under different environmental conditions. The complexity of this task is heightened by factors like incomplete rows, missing plants, irregular plant size and shape, and variability in light intensity, which is one of the significant challenges of computer vision in open environments. Additionally, the presence of weeds within the row can introduce noise and disrupt the row recognition process. Research in this area mainly focuses on advancing image segmentation techniques to extract orientation cues for crop row applications [41]. When dealing with crops in their early growth stages, characterized by considerable spacing between individual plants, segmentation techniques may not be the most suitable approach. While proficient at delineating the entire plant’s surface, image segmentation models face limitations in accurately capturing the morphology of plants during their initial growth phases [42]. Moreover, the presence of weeds and other vegetation can introduce inaccuracies into the results generated by segmentation algorithms. Another drawback lies in the intricacies of data preparation for effective model training. In such scenarios, it proves advantageous to employ object detection techniques [43], which facilitate the precise identification of each plant and subsequently enable the crop lines to be estimated. While segmentation models require delineating all plant edges, a task further complicated by the limited availability of open databases for training purposes, object detection models only require bounding boxes around the objects of interest [44], simplifying the annotation procedure. Object detection models have traditionally found their primary agricultural applications in tasks such as fruit detection [45]; however, their distinct advantages over segmentation models, particularly when crops exhibit spacing between individual plants, position them as a compelling alternative [36]. The versatility and accuracy offered by object detection models in identifying and localizing individual plants in such settings underscore their potential utility for agricultural applications beyond the conventional use cases.
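
A minimal sketch of the detection-based approach follows: plant bounding boxes are reduced to their centers, and a crop line is fitted through them with a RANSAC-style loop so that off-row weed detections do not skew the estimate. The box format [x_min, y_min, x_max, y_max], the tolerance, and the iteration count are assumptions of this example, not taken from any cited work.

```python
# Estimate a crop line from object-detection output.
import numpy as np

def box_centers(boxes):
    """Centers of axis-aligned boxes given as [x_min, y_min, x_max, y_max]."""
    boxes = np.asarray(boxes, dtype=float)
    return np.column_stack(((boxes[:, 0] + boxes[:, 2]) / 2,
                            (boxes[:, 1] + boxes[:, 3]) / 2))

def fit_crop_line(boxes, n_iters=200, inlier_tol=10.0, seed=0):
    """RANSAC line fit (x = a*y + b) over box centers, robust to weeds."""
    pts = box_centers(boxes)
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iters):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if abs(y2 - y1) < 1e-6:         # degenerate pair, skip
            continue
        a = (x2 - x1) / (y2 - y1)
        b = x1 - a * y1
        residuals = np.abs(pts[:, 0] - (a * pts[:, 1] + b))
        inliers = residuals < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refine with least squares on the inlier set
    a, b = np.polyfit(pts[best_inliers, 1], pts[best_inliers, 0], 1)
    return a, b

# Example: four plants roughly in a row plus one off-row weed detection
boxes = [[48, 10, 58, 20], [50, 40, 60, 50], [52, 70, 62, 80],
         [54, 95, 64, 105], [10, 55, 20, 65]]
print(fit_crop_line(boxes))
```

A plain least-squares fit would suffice in weed-free fields; the RANSAC loop only matters when spurious off-row detections are expected.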

5. Object Detection

Object detection algorithms represent a vital computer vision technique within Artificial Intelligence and machine learning. Their fundamental purpose is to identify and localize objects within images or video frames. These algorithms play a pivotal role in recognizing multiple objects of interest within a given image while providing crucial information about their positions and shapes [25]. Over the past two decades, object detection has seen a remarkable technological evolution, significantly impacting the entire field of computer vision. This evolution has transitioned from the traditional detection models of the early 2000s to the profound influence of deep learning, notably with the advent of Convolutional Neural Networks (CNNs) [46]. The introduction of regions with CNN features (RCNN) by R. Girshick in 2014 [44] marked a pivotal moment in the rapid advancement of object detection. In the era of deep learning, object detectors fall into two categories: “two-stage detectors”, which follow a “coarse to fine” approach, and “one-stage detectors”, which aim to “complete in one step”. Examples of two-stage detectors include RCNN, SPPNet (Spatial Pyramid Pooling Network), Fast RCNN, Faster RCNN, and Feature Pyramid Networks (FPN). One-stage detectors encompass models like You Only Look Once (YOLO), Single Shot MultiBox Detector (SSD), CenterNet, and DEtection TRansformer (DETR) [46][47]. YOLO has emerged as a widely adopted and influential algorithm in object detection, distinguished by its model compactness and rapid computation speed [48]. YOLO gained prominence with its first version, introduced by Redmon et al. in 2015 [49], and has since seen several subsequent versions published by scholars. Comparisons of open-source object detection models, such as the one by B. Jabir et al. [50], have underscored the speed and lightweight nature of YOLO v5. The Mean Average Precision (mAP) metric is commonly used to evaluate the performance of a trained model. mAP is a standard evaluation metric widely employed in machine learning models and benchmark challenges such as the Pattern Analysis, Statistical Modelling, and Computational Learning Visual Object Classes (PASCAL VOC) challenge [51]; Common Objects in Context (COCO) [52]; ImageNet (a database for visual object recognition software) [53]; and the Google Open Images Challenge [54]. mAP offers a comprehensive assessment of a model’s performance, mainly in tasks involving multi-class or multi-label classification, such as object detection and image segmentation, by considering precision and recall across multiple classes or labels. To determine the success of object detection models, it is crucial to establish the level of overlap between the predicted bounding box and the ground truth. This is measured with Intersection over Union (IoU): mAP50 refers to mAP computed at an IoU threshold of 50%, so a detection counts as successful when the predicted box overlaps the ground truth by more than 50%.
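
The overlap criterion can be stated in a few lines of code. The sketch below computes IoU for two axis-aligned boxes (format [x_min, y_min, x_max, y_max] assumed) and applies the 50% threshold that underlies mAP50.

```python
# IoU of two axis-aligned bounding boxes and the mAP50 success criterion.

def iou(box_a, box_b):
    """Intersection over Union of two boxes [x_min, y_min, x_max, y_max]."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

pred, truth = [10, 10, 50, 50], [20, 20, 60, 60]
print(iou(pred, truth))        # approx. 0.39
print(iou(pred, truth) > 0.5)  # False: this detection misses the mAP50 bar
```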

6. Crop Identification

Crop identification using computer vision is a research field that merges image processing, machine learning, and agronomy to recognize diverse crop types in images taken mainly by color cameras, with the objective of facilitating crop management and protection. Computer vision originated in the 1960s [55], when early studies acknowledged the necessity of aligning two-dimensional image features with three-dimensional object representations, which were applied to pattern recognition systems. Since then, significant progress has been made across many sectors, among which agriculture stands out, where computer vision has been applied to plant recognition with significant results. In the last decade, some works have identified plants and their positions by relying on the plants’ outline characteristics [56]. CNNs have also been used for soybean image recognition [57]. Other methods have combined deep CNNs and binary hash codes for field weed identification [58]. Moreover, some methods have focused on identifying color characteristics to discern between sugar beet plants and weeds, reaching accuracies greater than 90% [59]. Further work on sugar beet recognition [60] reports the use of local features such as speeded-up robust features (SURF), scale-invariant feature transform (SIFT), and twin leaf region (TLR) in place of characteristics describing the complete plant outline. The method used SIFT to describe points of interest previously found by applying the Hessian–Laplace detector [61], and can distinguish thistles from sugar beets with an accuracy close to 100%. Significant advances have also been made in identifying maize for weeding tasks. This crop is challenging to identify because it has thin and sparsely distributed leaves [62], which penalizes both computation time and the robustness of the algorithms [63][64]. Some recent proposals [65] place the primary emphasis on comprehensive image processing and centroid detection strategies: the images are first processed to separate the green color using the RGB color model and Otsu’s threshold method [66], and the positions of the maize plants are then computed from the pixel projection histogram. One challenge in this research is the constrained capacity to differentiate among plant species. This suffices for locating crop plants during weeding, especially with mechanical tools, where everything that is not a crop is destroyed. However, weeding tasks that require distinguishing each weed species in order to apply a specific treatment (herbicides, laser, etc.) need innovative techniques able to manage a significant variety of plant species.
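
A minimal sketch of that green-separation and plant-positioning pipeline, assuming a BGR input image as used by OpenCV, is shown below; the excess-green (ExG) index, the peak-grouping heuristic, and the parameter values are illustrative assumptions rather than details of the cited proposal.

```python
# Separate vegetation with excess-green + Otsu, then locate plants
# along the row from the column-wise pixel projection histogram.
import cv2
import numpy as np

def plant_positions(bgr_image, min_peak_fraction=0.5):
    b, g, r = cv2.split(bgr_image.astype(np.float32))
    exg = 2 * g - r - b                              # excess-green index
    exg = cv2.normalize(exg, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, mask = cv2.threshold(exg, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    projection = (mask > 0).sum(axis=0)              # vegetation pixels per column
    threshold = min_peak_fraction * projection.max()
    columns = np.flatnonzero(projection > threshold) # densely vegetated columns
    # group adjacent columns into plants and return each group's center
    groups = np.split(columns, np.flatnonzero(np.diff(columns) > 1) + 1)
    return [int(grp.mean()) for grp in groups if grp.size]

# Example on a synthetic image with two green "plants"
img = np.zeros((80, 120, 3), dtype=np.uint8)
img[:, 20:30, 1] = 200   # first plant (green channel)
img[:, 70:80, 1] = 200   # second plant
print(plant_positions(img))   # approx. [24, 74]
```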

7. Determination of Crop Growth Stage

Until a few years ago, the estimation of crop growth stages using computer vision techniques showed low precision rates, with the limitation that the experiments reported in the literature were carried out with very few images or covered very few growth stages [67]. For example, the study reported in [68] estimated two growth stages of maize (emergence and three leaves) using an image segmentation method combined with affinity propagation clustering, training on a small number of samples. Within these restrictions, the algorithm achieved a classification accuracy greater than 96%. Other methods tested include regression analyses of 2D images to model rice panicles [69]; this study focused only on the last growth stage, achieving an accuracy greater than 90%. Likewise, color histograms and a Support Vector Machine (SVM) classifier were studied to estimate four different stages of rice growth from RGB images [70]. Regarding maize, a previous study examined early growth (6 days) using RGB images [71]. The process involved capturing digital images, converting them from RGB to grayscale, cropping the image, and then estimating plant growth using a region-growing approach; the results showed that region growing can be used to estimate plant length growth from length and time parameters. Advances in machine learning, especially deep CNN processing, now allow crop growth to be estimated with reasonable precision from images. One applied method [72] uses low-level, scale-invariant image feature extraction, a mid-level representation, and an SVM to learn and classify wheat growth stages, although the work focuses on estimating only two growth stages for six wheat varieties.
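
In the spirit of the color-histogram-plus-SVM approach mentioned above, the following self-contained sketch trains an SVM on per-channel color histograms of synthetic images whose green content increases with growth stage; the binning, the synthetic data, and the RBF kernel are assumptions of this example, not details of the cited study.

```python
# Classify growth stages from RGB color histograms with an SVM (synthetic data).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def color_histogram(rgb_image, bins=8):
    """Concatenate per-channel histograms into one normalized feature vector."""
    feats = [np.histogram(rgb_image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    feats = np.concatenate(feats).astype(float)
    return feats / feats.sum()

# Synthetic stand-in for labeled field images: later stages are greener
rng = np.random.default_rng(0)
features, labels = [], []
for stage in range(4):
    for _ in range(30):
        img = rng.integers(0, 100, size=(32, 32, 3), dtype=np.uint8)
        img[..., 1] = np.clip(img[..., 1].astype(int) + 40 * stage, 0, 255)
        features.append(color_histogram(img))
        labels.append(stage)

X_train, X_test, y_train, y_test = train_test_split(
    np.array(features), np.array(labels), test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```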