UAV-Based Computer Vision for tree inventory: History

Accurate and efficient orchard tree inventories are essential for acquiring up-to-date information, which is necessary for effective treatments and crop insurance purposes. Surveying orchard trees, including tasks such as counting, locating, and assessing health status, plays a vital role in predicting production volumes and facilitating orchard management.

  • unmanned aerial vehicle (UAV)
  • DeepForest
  • YOLO

1. Background

Tree diseases can have a significant impact on orchard quality and productivity, which is a major concern for the agricultural industry. Unhealthy or stressed orchard trees are more susceptible to pests, diseases, and environmental stressors, which can reduce the yield and quality of fruit and lead to financial losses for growers. Therefore, developing methods for surveying and monitoring tree health and production quality is essential for orchard management. This can help growers make informed decisions about practices such as irrigation, fertilization, and pest control, optimize orchard yield and quality, reduce input usage, and improve the long-term sustainability of orchard production systems.
Traditional methods of monitoring orchard tree health, such as manual inspection and visual examination, rely on human expertise to determine quantitative orchard tree parameters. These methods are labor-intensive, time-consuming, costly, and error-prone. In recent years, remote sensing platforms such as satellites, airplanes, and unmanned aerial vehicles (UAVs) [1,2,3,4] have provided new tools that offer an alternative to traditional methods. Deep neural networks (DNNs) [5] have also emerged as a powerful tool in the field of machine learning. The high spatial resolution of UAV images, combined with computer vision algorithms, has enabled tremendous advances in several domains such as forestry [6], agriculture [7], geology [8], surveillance [9], and traffic monitoring [10].

2. Tree Detection

Over the years, both classical machine learning and deep learning methods have been extensively explored to address the tree detection problem.
Classical object detection methods often rely on handcrafted features combined with machine learning algorithms. Local binary patterns (LBP) [11], the scale-invariant feature transform (SIFT) [12,13], and the histogram of oriented gradients (HOG) [14,15] are the most frequently used handcrafted features in object detection. For example, the work in [16] presented a traditional method for walnut, mango, orange, and apple tree detection. It applies a template-matching approach to very high resolution (VHR) Google Earth images acquired over a variety of orchard trees; the template is based on a geometrical optical model built from parameters such as illumination angles, maximum and ambient radiance, and tree size specifications. In [17,18], the authors detected palm trees in UAV RGB images by extracting a set of keypoints using SIFT. The keypoints are then analyzed with an extreme learning machine (ELM) classifier trained beforehand on a set of palm and non-palm tree keypoints. Similarly, ref. [19] employed a support vector machine (SVM) to classify image patches into vegetation and non-vegetation. The HOG feature extractor was then applied to the vegetation patches, and the extracted features were used to train an SVM to distinguish palm tree images from background regions. The study in [20] proposed an object detection method using shape features for detecting and counting palm trees: the circular autocorrelation of the polar shape (CAPS) matrix representation serves as the shape feature, a linear SVM standardizes and reduces its dimensionality, and a local maximum detection algorithm based on the spatial distribution of the standardized features detects the palm trees. The work in [7] presented a method to detect apple trees using multispectral UAV images.
The authors identified trees by applying thresholding techniques to Normalized Difference Vegetation Index (NDVI) and entropy images, since trees are chlorophyllous bodies with high NDVI values and heterogeneous textures with high entropy. The work in [21] proposed an automated approach to detect and count individual palm trees from UAV images in two processing steps: first, the NDVI is used to classify image features as trees and non-trees; then, palm trees are identified through texture analysis using the Circular Hough Transform (CHT) and morphological operators. In [22], the authors applied k-means to perform color-based clustering followed by a thresholding technique to segment out the green portion of the image. Trees were then identified by applying an entropy filter and morphological operations to the segmented image.
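As an illustration of this family of methods, the NDVI thresholding step can be sketched in a few lines; the threshold of 0.4 and the toy reflectance values below are illustrative assumptions rather than values taken from the cited studies.

```python
import numpy as np

def ndvi(nir, red, eps=1e-8):
    # NDVI = (NIR - R) / (NIR + R); chlorophyllous canopy pixels score high
    return (nir - red) / (nir + red + eps)

# Toy 4x4 "multispectral" patch: top half vegetation-like, bottom half soil-like.
nir = np.array([[0.8] * 4] * 2 + [[0.3] * 4] * 2)
red = np.array([[0.1] * 4] * 2 + [[0.25] * 4] * 2)

# The 0.4 cutoff is an illustrative choice, not a value from the cited works.
mask = ndvi(nir, red) > 0.4
```

In the cited pipelines, a texture step (entropy filtering or morphological operations) would then refine this mask into individual tree detections.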
On the other hand, numerous studies have investigated the use of deep-learning algorithms to detect trees in UAV RGB imagery. For instance, ref. [23] detected citrus and other crop trees from UAV images using a CNN algorithm applied to four spectral bands (i.e., green, red, near infrared, and red edge). The initial detection was followed by a classification refinement procedure using superpixels derived from a Simple Linear Iterative Clustering (SLIC) algorithm and a thresholding technique to address the confusion between trees and weeds and the difficulty of distinguishing small trees from large ones. In [24], the authors adopted a sliding-window approach for oil palm tree detection: a sliding window was integrated with a pre-trained AlexNet classifier to scan the input image and identify regions containing trees. The work in [25] exploited state-of-the-art CNNs, including YOLO-v5 with its four sub-versions, YOLO-v3, YOLO-v4, and SSD300, for detecting date palm trees. Similarly, in [26], the authors explored the use of YOLO-v5 with its sub-versions and DeepForest for the detection of orchard trees. In [27], three state-of-the-art object detection methods were evaluated for the detection of law-protected tree species: the Faster Region-based Convolutional Neural Network (Faster R-CNN) [28], YOLO-v3 [29], and RetinaNet [30]. Similarly, the work in [31] explored the use of the Faster R-CNN, Single Shot Multi-Box Detector (SSD) [32], and R-FCN [33] architectures to detect seedlings.
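The sliding-window scheme of [24] can be sketched as follows; the window size, stride, and the brightness-based stand-in scorer are illustrative assumptions, with the scorer taking the place of the pre-trained AlexNet classifier.

```python
import numpy as np

def sliding_windows(img, win=64, stride=32):
    """Yield (row, col, patch) crops, i.e., the regions a classifier would score."""
    h, w = img.shape[:2]
    for r in range(0, h - win + 1, stride):
        for c in range(0, w - win + 1, stride):
            yield r, c, img[r:r + win, c:c + win]

def detect(img, score_fn, win=64, stride=32, thresh=0.5):
    # score_fn stands in for a pre-trained CNN classifier (AlexNet in [24])
    return [(r, c) for r, c, patch in sliding_windows(img, win, stride)
            if score_fn(patch) > thresh]

# Toy usage: a brightness-based stand-in scorer flags the bright quadrant.
img = np.zeros((128, 128))
img[:64, :64] = 1.0
hits = detect(img, score_fn=lambda p: p.mean())
```

In practice, overlapping detections from adjacent windows are merged with non-maximum suppression before counting trees.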
Most of these works fine-tuned state-of-the-art object detectors for tree detection, taking a model pre-trained on benchmark datasets [34,35,36] and adapting it specifically to the task of detecting trees. However, applying these methods to UAV images poses particular challenges [37] compared to conventional object detection tasks. For example, UAV images often have a large field of view with complex background regions, which can significantly disrupt detection accuracy. Furthermore, the objects of interest are often not uniformly distributed with respect to the background regions, creating an imbalance between positive and negative examples. Data imbalance can also be observed between easy and hard negative examples, since in UAV images a large part of the background has regular patterns and is easy to analyze. We believe that applying deep learning detection algorithms directly in these situations is not an optimal choice [38], as they mostly assign the same weight to all training examples, so during training easy examples may dominate the total loss and reduce training efficiency.
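One common way to keep easy examples from dominating the loss is the focal loss introduced with RetinaNet [30], in which a modulating factor down-weights well-classified examples. A minimal sketch, with toy probabilities chosen purely for illustration:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Focal loss: the (1 - p_t)^gamma factor shrinks the loss of
    well-classified (easy) samples so they cannot dominate training."""
    p_t = np.where(y == 1, p, 1 - p)  # probability assigned to the true class
    return -((1 - p_t) ** gamma) * np.log(np.clip(p_t, 1e-12, None))

# An easy negative (score 0.01, label 0) contributes far less to the total
# loss than a hard negative (score 0.6, label 0).
easy = focal_loss(np.array([0.01]), np.array([0]))[0]
hard = focal_loss(np.array([0.6]), np.array([0]))[0]
```

With gamma = 0 the expression reduces to ordinary cross-entropy, recovering the equal-weight behavior criticized above.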
To mitigate this issue, hard negative mining (HNM) can be adopted for object detection. Various HNM approaches [37,39,40] iteratively bootstrap a small set of negative examples by selecting those that trigger a false positive alarm in the detector. For example, ref. [41] trained a state-of-the-art face detector by iteratively updating a Faster R-CNN-based model with hard negatives harvested from a large set of background examples; their method outperforms state-of-the-art detectors on the Face Detection Data Set and Benchmark (FDDB). Similarly, an improved version of Faster R-CNN is proposed in [42], using hard negative sample mining for object detection on the PASCAL VOC dataset [36]. Likewise, ref. [43] used the bootstrapping of hard negatives to improve face detection performance on the WIDER FACE dataset [44]: the authors pre-trained Faster R-CNN to mine hard negatives before retraining the model. The work in [45] presented a cascaded Boosted Forest for pedestrian detection, which performs effective hard negative mining and sample reweighting to classify the region proposals generated by a region proposal network (RPN). The A-Fast-RCNN method, described in [46], takes a different approach to generating hard negative samples, using occlusion and spatial deformations produced through an adversarial process; the authors conducted their experiments on the PASCAL VOC and MS-COCO datasets. Another approach to HNM with the Single Shot Multi-Box Detector (SSD) is proposed in [47], where the authors use medium priors, i.e., anchor boxes with 20% to 50% overlap with ground truth boxes, to enhance detector performance on the PASCAL VOC dataset. The proposed framework updates the loss function so that it considers anchor boxes with partial and marginal overlap.
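The common core of these HNM approaches, scoring background regions with the current detector and keeping the highest-scoring false positives for retraining, can be sketched as follows; the 1-D features, the threshold, and the stand-in scorer are illustrative assumptions.

```python
import numpy as np

def mine_hard_negatives(score_fn, background, thresh=0.5, k=8):
    """One hard negative mining step: run the current detector over
    background regions and keep the top-k false positives (score > thresh)."""
    scores = np.array([score_fn(x) for x in background])
    order = np.argsort(scores)[::-1]            # highest-scoring negatives first
    keep = [i for i in order if scores[i] > thresh][:k]
    return [background[i] for i in keep]

# Toy usage: 1-D "features" and a stand-in scorer that fires on large values.
background = [0.1, 0.9, 0.2, 0.7, 0.05]
hard = mine_hard_negatives(lambda x: x, background)
```

In practice the mined negatives are added to the training set, the detector is retrained, and the mine-and-retrain cycle repeats until few new hard negatives are found.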

3. Tree Health Classification

Vegetation indices (VIs) have been introduced as indicators of vegetation status, as they provide information on the physiological and biochemical status of trees. These mathematical combinations of reflectance measurements are sensitive to different vegetation parameters, such as chlorophyll content, leaf area, and water stress. It has been shown through many studies [48,49,50] that by analyzing these indices, we can gain insights into the health and vitality of trees.
For example, the work in [51] presented a framework for orchard tree segmentation and health assessment, applied to five orchard tree species: plum, apricot, walnut, olive, and almond. Two vegetation indices, the visible atmospherically resistant index (VARI) and the green leaf index (GLI), were used together with the standard score (also known as the z-score) for tree health assessment. The study in [52] proposed a process workflow for mapping and monitoring olive orchards at tree-scale detail. Five VIs were investigated: the normalized difference vegetation index (NDVI), the modified soil adjusted vegetation index 2 (MSAVI 2), the normalized difference red edge vegetation index (NDRE), the modified chlorophyll absorption ratio index improved (MCARI2), and NDVI2. The authors applied statistical analyses to all calculated VIs. Similarly, ref. [53] presented an approach for detecting Huanglongbing (HLB) disease on citrus trees. First, the trees were segmented using thresholding techniques applied to the NDVI. Then, for each segmented tree, a total of thirteen spectral features were computed, comprising six spectral bands and seven vegetation indices: NDVI, the green normalized difference vegetation index (GNDVI), the soil-adjusted vegetation index (SAVI), near infrared (NIR) minus red (R), R/NIR, green (G)/R, and NIR/R. An SVM classifier was then applied to distinguish between healthy and HLB-infected trees. The work in [54] presented a method for identifying stress in olive trees, applying an SVM model to VIs to classify each tree pixel as healthy or stressed. The work in [48] presented a method to monitor grapevine diseases affecting European vineyards. The authors explored the use of different features, including spectral bands, vegetation indices, and biophysical parameters.
They conducted a statistical analysis to select the best discriminating variables for separating symptomatic vines, affected by Flavescence dorée (FD) or grapevine trunk diseases (GTD), from asymptomatic vines (Case 1), and FD vines from GTD ones (Case 2).
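The index-plus-standard-score pattern used in [51] can be sketched as follows; the VARI and GLI formulas are standard, while the per-tree index values and the stress cutoff are toy assumptions.

```python
import numpy as np

def vari(r, g, b, eps=1e-8):
    # Visible Atmospherically Resistant Index: RGB-only greenness proxy
    return (g - r) / (g + r - b + eps)

def gli(r, g, b, eps=1e-8):
    # Green Leaf Index
    return (2 * g - r - b) / (2 * g + r + b + eps)

def z_scores(per_tree_index):
    """Standard score of each tree's mean index against the whole orchard;
    a strongly negative z flags a tree as potentially stressed."""
    v = np.asarray(per_tree_index, dtype=float)
    return (v - v.mean()) / v.std()

# Toy per-tree mean VARI values: the last tree is noticeably less green.
trees = [0.42, 0.40, 0.41, 0.12]
z = z_scores(trees)
```

Trees whose z-score falls well below zero (here, below -1.5) would be flagged for closer inspection, while trees near or above the orchard mean are treated as healthy.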

This entry is adapted from the peer-reviewed paper 10.3390/rs15143558
