Tree Detection and Crown Delineation using UAV-SfM Data



Subjects: Remote Sensing

Contributors: Felix Bachmann, Friederike Metz, Marius G. Heidenreich, Franziska Koebsch, Sören Hese, Clémence Dubois, Christian Thiel

Accurate detection and delineation of individual trees and their crowns in dense forest environments are essential for forest management and ecological applications. This research explores the potential of combining leaf-off and leaf-on structure from motion (SfM) data products from unoccupied aerial vehicles (UAVs) equipped with RGB cameras.

unoccupied aerial vehicle (UAV)
structure from motion (SfM)
leaf-off
leaf-on
deciduous forest
individual tree crown delineation (ITCD)
tree detection

1. UAV Imagery Processing Using Structure from Motion

One major advantage of UAV imagery is that 3D information, such as 3D point clouds, can be derived from the 2D images through photogrammetric processing, commonly known as structure from motion (SfM) [16]. During data acquisition, highly overlapping images are captured, providing different perspectives on the same ground spots. Prominent feature points are extracted from each image, and features corresponding to the same 3D point are matched in the overlapping regions of different images, creating tracks from each set of matched features. Aerial triangulation, such as bundle adjustment, is then applied to these tracks to estimate camera positions and orientations and to obtain the 3D geometry. Based on the estimated camera positions and orientations, densification algorithms (dense stereo matching) can be used to generate dense 3D point clouds. From the point cloud, a digital surface model (DSM) can be derived and, by projecting the single images onto the DSM, an orthomosaic can be generated [16,17]. These data products can then be used to extract specific forest parameters. The UAV-SfM approach can thus provide both geometric and spectral datasets as input for various forest parameter extraction algorithms. While methods integrating geometric point data and spectral image data are increasingly used in forestry, most studies rely on LiDAR data rather than on UAV-SfM point clouds [18,19].

2. UAV-Data-Based Products for Tree Crown Delineation

To analyze tree parameters at the level of individual trees within a forest stand, single trees must first be segmented. This involves delineating the projected tree crowns, which can be identified in the orthomosaic or in height models, as separate objects. This individual tree crown delineation (ITCD) serves as the foundation for various subsequent analysis steps, including tree species classification, environmental and forest monitoring at the tree level, and the extraction of individual tree parameters directly from remote sensing data [20,21,22,23]. Over the past few decades, numerous ITCD methods have been developed, based on generalized characteristics of trees or forests. Most of these methods can be applied to 2D rasters such as orthomosaics, 2.5D rasters such as canopy height models (CHMs), or 3D data products such as point clouds. However, certain methods rely specifically on 3D data and cannot be used when only 2D data are available. These methods often originate from the field of laser scanning and are increasingly being tested on point clouds derived from SfM as well [24].
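
As a minimal sketch of how a CHM relates to the SfM products, the example below assumes the common convention that a CHM is the DSM minus a digital terrain model (DTM), cell by cell. The function name `canopy_height_model`, the nested-list raster layout, and the clamping of negative differences to zero are assumptions made for illustration and are not taken from the studies cited here.

```python
def canopy_height_model(dsm, dtm, nodata=None):
    """Subtract terrain height from surface height, cell by cell.

    Both rasters are nested lists of equal shape (an assumed layout).
    """
    if len(dsm) != len(dtm) or any(len(a) != len(b) for a, b in zip(dsm, dtm)):
        raise ValueError("DSM and DTM must have the same shape")
    chm = []
    for dsm_row, dtm_row in zip(dsm, dtm):
        row = []
        for surface, terrain in zip(dsm_row, dtm_row):
            if nodata is not None and (surface == nodata or terrain == nodata):
                row.append(nodata)
            else:
                # Small negative differences are treated as noise (assumption).
                row.append(max(surface - terrain, 0.0))
        chm.append(row)
    return chm


# Elevations in metres above sea level (made-up values).
dsm = [[212.0, 225.5], [230.0, 211.8]]
dtm = [[211.0, 211.5], [212.0, 212.0]]
chm = canopy_height_model(dsm, dtm)
print(chm)
```

In practice, both rasters would first need to share the same grid and coordinate reference system.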

Most methods assume a similar tree shape, hemispherical to conical, with a single tree top located at the center of the crown. Due to its exposed position, the top of the tree receives the highest solar radiation, resulting in the highest intensity and brightness values [20,22]. Algorithms are expected to yield better results for forests with a sparse canopy, lower species diversity, and a similar age structure. These characteristics are more commonly found in managed forests and in coniferous, savanna, or tundra forest systems [22].

The analysis is often divided into two parts: tree detection, which involves identifying the position of the tree trunk, and crown delineation, which involves outlining the entire tree crown [20]. Some of the more commonly used methods for tree detection and delineation are presented below.

Local maxima and region growing: This method builds upon the previously mentioned canopy characteristics. Initially, individual trees are detected by identifying local maxima, which ideally represent tree midpoints. These maxima can be based on both CHM values and brightness values [20,22]. Two difficulties arise: the search radius for local maxima, which is derived from the pixel size and the average crown diameter, is hard to define, and real tree crowns are not arranged symmetrically around a central point. Smoothing filters can suppress spurious maxima caused by noise [23,25,26,27]. Wulder et al. [28] propose using varying local maxima search radii, each based on the semivariance of the pixel.
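
The detection step can be sketched as a fixed-window local maximum filter on a CHM. The function names, the nested-list raster, the minimum height threshold, and the rule used to turn crown diameter and pixel size into a window radius are illustrative assumptions, not the procedure of any cited study.

```python
def window_radius(avg_crown_diameter_m, pixel_size_m):
    """Hypothetical rule: half the average crown diameter, in pixels."""
    return max(1, round(avg_crown_diameter_m / (2 * pixel_size_m)))


def local_maxima(chm, radius=1, min_height=2.0):
    """Return (row, col) cells that are the strict maximum of their window."""
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:  # ignore ground and low vegetation
                continue
            is_top = True
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    if (rr, cc) != (r, c) and chm[rr][cc] >= h:
                        is_top = False
                        break
                if not is_top:
                    break
            if is_top:
                tops.append((r, c))
    return tops


# Toy CHM in metres with two crowns.
chm = [
    [0, 0, 0, 0, 0, 0],
    [0, 8, 9, 0, 5, 0],
    [0, 9, 12, 0, 6, 7],
    [0, 0, 0, 0, 7, 15],
    [0, 0, 0, 0, 0, 0],
]
tree_tops = local_maxima(chm, radius=1)
print(tree_tops)
```

Because the comparison is strict, a flat plateau of equal heights yields no maximum at all, which is one reason a smoothed CHM is often preferred as input.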

Starting from these initial seed points, neighboring pixels or objects that exhibit similarity are added to the crown objects until a termination criterion is met, indicating that the crown has been delineated. This process is known as region growing [20].
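
A minimal region-growing sketch on a CHM is shown below. The similarity and termination rules used here (a minimum height, a minimum fraction of the seed height, and no growth uphill) are assumptions chosen for illustration; published methods use a variety of criteria.

```python
from collections import deque


def grow_crowns(chm, seeds, rel_height=0.4, min_height=2.0):
    """Grow one labelled region per seed; label 0 means unassigned."""
    rows, cols = len(chm), len(chm[0])
    labels = [[0] * cols for _ in range(rows)]
    queue = deque()
    for label, (r, c) in enumerate(seeds, start=1):
        labels[r][c] = label
        queue.append((r, c, chm[r][c]))  # carry the seed (tree top) height
    while queue:
        r, c, top_height = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < rows and 0 <= cc < cols):
                continue
            if labels[rr][cc] != 0:
                continue
            h = chm[rr][cc]
            # Assumed termination criteria: too low in absolute terms,
            # too low relative to the tree top, or higher than the
            # cell we are growing from.
            if h < min_height or h < rel_height * top_height or h > chm[r][c]:
                continue
            labels[rr][cc] = labels[r][c]
            queue.append((rr, cc, top_height))
    return labels


chm = [
    [0, 0, 0, 0, 0, 0],
    [0, 8, 9, 0, 5, 0],
    [0, 9, 12, 0, 7, 7],
    [0, 0, 0, 0, 7, 15],
    [0, 0, 0, 0, 0, 0],
]
crown_labels = grow_crowns(chm, seeds=[(2, 2), (3, 5)])
for row in crown_labels:
    print(row)
```

The breadth-first queue grows all crowns in turn, so neighboring crowns meet roughly halfway rather than one seed claiming all contested cells.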

Valley-following approach: The valley-following approach, initially introduced by Gougeon [29], consists of two parts. The first part classifies the areas between the individual tree canopies, while the second part uses a rule-based method to refine these classified areas [29]. These intermediate areas are referred to as valleys. They are characterized by stronger shading, which results in lower intensity and brightness values than in the surrounding areas, or they correspond to local minima [25,30]. According to Ke and Quackenbush [20] and Workie [23], relying solely on this shade-following approach often leads to incomplete separation of individual trees because, depending on tree density, not all crowns are adequately separated from each other by “valleys”.

Watershed segmentation: Watershed segmentation, first described by Beucher and Lantuéjoul [31], is an image segmentation technique. It also draws on the topographic analogy of the canopy used above, in which crowns are treated as hills and the gaps between them as valleys. In this method, the values of a gray-scale layer are inverted, so that tree tops become local minima, and the surface is then “flooded” starting from these minima. The resulting individual watersheds are separated from each other by dams, creating distinct segments [21,22,32]. Markers can be used in this process to define the local minima from which the “flooding” originates; ideally, each marker represents a tree top. The resulting segments are intended to delimit the individual tree crowns [23]. According to Derivaux et al. [32], a common challenge in this method is over-segmentation, which can be mitigated through targeted marker placement, the application of smoothing filters, or combination with region-merging techniques and other methods.
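
The flooding idea can be sketched as a simplified marker-controlled watershed that uses a priority queue. This is a priority-flood variant written for illustration: it assigns every cell to a marker and does not build explicit dam lines, and the function name, the 4-neighborhood, and the height threshold are assumptions.

```python
import heapq


def marker_watershed(chm, markers, min_height=2.0):
    """Flood the inverted CHM from the markers; label 0 means unassigned."""
    rows, cols = len(chm), len(chm[0])
    labels = [[0] * cols for _ in range(rows)]
    heap = []
    order = 0  # tie-breaker: equal heights are processed first in, first out
    for label, (r, c) in enumerate(markers, start=1):
        labels[r][c] = label
        # Negating the height inverts the surface: tree tops flood first.
        heapq.heappush(heap, (-chm[r][c], order, r, c))
        order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < rows and 0 <= cc < cols):
                continue
            if labels[rr][cc] != 0 or chm[rr][cc] < min_height:
                continue
            labels[rr][cc] = labels[r][c]
            heapq.heappush(heap, (-chm[rr][cc], order, rr, cc))
            order += 1
    return labels


# A single CHM transect (metres) crossing two crowns with a gap at index 4.
profile = [[3, 6, 10, 6, 4, 7, 12, 8, 3]]
segments = marker_watershed(profile, markers=[(0, 2), (0, 6)])
print(segments)
```

Supplying one marker per expected tree is what limits over-segmentation here: cells can only join a basin that started from a marker.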

Template matching: Template matching can be used to detect individual tree crowns when the crowns have similar shapes and comparable spectral values. Templates are built from a gray-value layer and represent typical, mostly averaged, patterns of crown shape and pixel values. Both radiometric and geometric properties of the crown are used, and different viewing angles can be considered. The template is compared with the image at all candidate tree positions, and positions whose correlation exceeds a defined threshold are taken to represent individual trees [21,22,25,33].
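
One common way to compare a template with image patches is normalized cross-correlation, sketched below. The choice of this similarity measure, the function names, the toy template, and the threshold value are assumptions for illustration; the cited studies may use other measures.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(
        sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t)
    )
    return num / den if den > 0 else 0.0  # flat patches carry no pattern


def match_template(image, template, threshold=0.9):
    """Return (row, col, score) for window centers scoring above threshold."""
    th, tw = len(template), len(template[0])
    hits = []
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score >= threshold:
                hits.append((r + th // 2, c + tw // 2, score))
    return hits


# Hypothetical averaged crown template: bright center, darker edges.
template = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
image = [
    [1, 2, 1, 0, 0, 0, 0],
    [2, 4, 2, 0, 0, 0, 0],
    [1, 2, 1, 0, 2, 4, 2],
    [0, 0, 0, 0, 4, 8, 4],
    [0, 0, 0, 0, 2, 4, 2],
]
hits = match_template(image, template)
centers = {(r, c) for r, c, _ in hits}
print(sorted(centers))
```

Because the correlation is normalized, the second crown in the toy image matches although it is twice as bright as the template.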

Deep learning methods: Deep learning methods, such as instance segmentation, are increasingly being used for the detection and delineation of single tree crowns [34]. Commonly used model architectures include Mask R-CNN [35], artificial neural networks [15], and U-Nets [36]. These methods offer advantages such as the ability to use multi-band images as input, instead of relying solely on a single band, and a focus on textural features, which is beneficial because adjacent trees might have very similar spectral properties. A disadvantage, however, is the need for training data, which are typically obtained through time-consuming manual crown delineation and/or field work.

Point-cloud-based methods: Terrestrial or airborne laser scanning and UAV-SfM provide 3D point cloud data that can be used directly for single crown delineation. Most studies focusing on 3D methods use LiDAR data, as UAV-SfM does not penetrate dense crown structures well, especially during the leaf-on season. As a result, UAV-SfM leaf-on point clouds tend to be less dense in the lower canopy and on the forest floor. Various point-cloud-based approaches have been applied for ITCD, including K-means clustering [37], template matching [38], voxel space approaches [39], and mean-shift segmentation [40]. Deep learning models trained on reference point clouds have also been used to segment single trees [39], employing frameworks such as PointNet [41].
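
As a minimal sketch of point-cloud clustering, the example below runs K-means on the horizontal coordinates of (x, y, z) points. Initializing the centroids from previously detected tree positions and ignoring height during assignment are assumptions made for illustration, not the configuration used in the cited work.

```python
def kmeans_xy(points, seeds, iterations=20):
    """Cluster (x, y, z) points by horizontal distance to moving centroids."""
    centroids = [(float(x), float(y)) for x, y in seeds]
    assignment = [None] * len(points)
    for _ in range(iterations):
        new_assignment = [
            min(
                range(len(centroids)),
                key=lambda k: (x - centroids[k][0]) ** 2
                + (y - centroids[k][1]) ** 2,
            )
            for x, y, _z in points
        ]
        if new_assignment == assignment:  # converged
            break
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return assignment, centroids


# Made-up canopy points (metres) from two neighboring trees.
points = [
    (0.8, 1.1, 14.0), (1.2, 0.9, 15.5), (1.0, 1.3, 13.2),
    (6.1, 5.0, 18.0), (5.8, 5.3, 17.1), (6.3, 4.8, 16.4),
]
assignment, centroids = kmeans_xy(points, seeds=[(0, 0), (7, 6)])
print(assignment)
```

The number of clusters is fixed by the number of seeds, so missed or false tree detections carry over directly into the segmentation.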

Several recent studies applied one or multiple of the above-mentioned methods to detect tree positions and perform crown delineation using UAV-based imagery data products. A selection of relevant studies is presented in Table 1.

Table 1. Selection of recent studies on tree detection and crown delineation in forest ecosystems using UAV optical data.

Paper	Forest Characteristics	UAV Imagery Type	Camera Angle	Tree Detection Algorithm	Crown Delineation Algorithm	Description
[42]	Dry conifer forest, low tree density	DJI Phantom 4 Pro (RGB)	Nadir & oblique	Local maxima on CHM	-	Various flight altitudes, patterns, and camera angles were tested, and better accuracy values were achieved by combining crosshatch flight patterns with nadir camera angles. The maximum F-score values ranged between 0.429 and 0.771.
[43]	Broadleaf forest, test sites with different stand densities	DJI Phantom 4 Pro (RGB)	Nadir	Local maxima on CHM	Region growing + inverse watershed segmentation	The highest overall accuracy (F-score = 0.79) was obtained for the low-density stand by applying a region growing algorithm on the CHM. Accuracy also varied among different tree species, with the best results obtained for Caspian poplar and the lowest for Persian ironwood. In high-density stands, the crown delineation results could be improved by applying weak Gaussian filtering to the CHM.
[44]	Mixed conifer forest, open canopy	DJI Phantom 3 (RGB)	Nadir	Local maxima on CHM	-	Fixed window sizes were used in the local maxima filter, and it was observed that accuracy decreased as the filter sizes exceeded 1 × 1 m. Challenges in tree detection were specifically noted in steep areas and regions with high canopy closure. DTMs obtained through SfM tended to overestimate height in dense vegetation in comparison to DTMs derived from airborne laser scanning.
[45]	Mixed-conifer forest, moderate density	DJI Phantom 4 (RGB)	Nadir, oblique & composite	Variable window filter (VWF), 3D point cloud-based algorithms	-	Different flight parameters (altitude, camera angle, and image overlap), SfM processing settings (depth filtering, alignment, and dense cloud quality), and tree detection algorithms (CHM smoothing, VWF parameters, and point-cloud-based methods) were investigated. Higher accuracies were achieved at high flight altitudes (120 m) and with high image overlap (90%). The combination of nadir and oblique imagery resulted in detection rates worse than using only nadir data. CHM-based VWF methods produced the most accurate results, with F-scores up to 0.664 (trees > 10 m) and 0.826 (trees > 20 m).
[46]	Pine tree plantations	DJI Mavic Pro (RGB)	Oblique	Local maxima on CHM	-	Prior to tree detection using local maxima, the CHM was mean-filtered with a user-defined filter size. Accuracy of up to 0.78 (F-score) was achieved.
[47]	Spruce-pine forest	DJI Phantom 4 Pro, Parrot Disco-Pro Ag & DJI Matrice 210 (RGB & multispectral)	Nadir	Local maxima on CHM	Watershed segmentation	Using consumer-grade cameras yielded higher tree detection rates and more accurate crown diameters compared to multispectral cameras. Cameras with higher spatial resolution performed better at higher flight altitudes, whereas the opposite was observed for cameras with lower resolution. The best results were achieved with the DJI Phantom 4 RGB drone, detecting 84% of the trees correctly. The mean absolute error of crown diameters derived was 0.79–0.99 m (Phantom 4, RGB) and 0.88–1.16 m (Zenmuse X5S).
[48]	Mixed conifer forest, open canopy	DJI Phantom 3 (RGB)	Nadir	Local maxima on CHM	-	Different window sizes for local maxima detection were tested, and the performance of smoothed and non-smoothed CHM was compared. Lower window sizes for local maxima and smoothing proved to be more successful in detecting trees. The overall F-score value was 0.86.
[49]	Forest plantations, high canopy density	DJI M600 Pro (5-lens oblique)	Oblique	Adaptive-/fixed-kernel bandwidth mean-shift (AMS/FMS), region growing on CHM	AMS, FMS, region growing on CHM	Kernel bandwidth was defined based on canopy properties and applied to the mean-shift tree detection and delineation algorithm. The AMS method outperformed FMS and seed-based region growing methods, achieving an overall accuracy of ≥0.72 for tree detection and a relative RMSE of ≤0.13 for crown width.
[50]	Orchard yard/naturally wooded pasture/urban trees, low tree density	DJI Phantom 4 (RGB)		Single shot detector (SSD) deep learning model on raster data (returning bounding boxes around tree position)		Several additional datasets were derived from the RGB and DSM data and were used for training purposes. SSD was used for tree detection and species classification. Ensemble models with different input datasets generally demonstrated higher performance compared to models based on only one type of input data.
[51]	Conifer and mixed regenerating forest stands, low tree density	DJI Phantom 4 Pro (RGB)	Nadir	-	Mask R-CNN on raster data	Mask R-CNN models were trained using manually delineated crowns; pretrained networks were also incorporated. An average F1-score of 0.91 was achieved. ITCD remained more challenging for heterogeneous and denser forest stands, as well as for smaller crowns.
[35]	Conifer plantation, moderate tree densities	DJI Phantom 4 (multispectral)		Local maxima on RGB and CHM	Marker-controlled watershed segmentation, Mask R-CNN on raster data	The performance of local maxima, marker-controlled watershed segmentation, and Mask R-CNN was compared. Local maxima and watershed algorithms scored the best results when applied to the CHM. Overall, Mask R-CNN outperformed the classic algorithms.
[52]	Subtropical broadleaf forest, moderate tree density	DJI Matrice 600 (RGB & hyperspectral)		-	Watershed-spectral-texture-controlled normalized cut	An extension was made to the CHM-based watershed segmentation to reduce over-segmentation. Objects segmented by the watershed algorithm were clustered based on the normalized cut criterion, considering both spectral and textural information.

As emphasized by several of these studies, the detection and delineation of tree crowns in complex forest structures with dense canopy closure and overlapping tree crowns remain challenging [35,51]. Some studies focus on ITCD in forests with low to moderate canopy closure or in open forest stands [42,48,53], as well as in forest plantations characterized by more regular tree spacing and structure [44,46]. In these forest types, crown boundaries and tree counts are easier to determine, as most algorithms perform better in homogeneous forest stands with lower canopy closure [19].
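
Many of the accuracies in Table 1 are reported as F-scores. The sketch below uses the standard definition, the harmonic mean of precision and recall, computed from counts of correctly detected trees, false detections, and missed trees. How a detection is matched to a reference tree differs between studies and is not modeled here; the function name and the counts are made up for illustration.

```python
def detection_scores(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f_score) for tree detection counts."""
    detected = true_positives + false_positives
    reference = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / reference if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Made-up example: 80 trees found correctly, 20 false detections, 20 missed.
precision, recall, f_score = detection_scores(80, 20, 20)
print(round(precision, 3), round(recall, 3), round(f_score, 3))
```

An F-score penalizes both over-detection (low precision) and omission (low recall), which makes it more informative than a raw detection rate alone.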

This entry is adapted from the peer-reviewed paper 10.3390/rs15184366

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.