Low-Cost Relative Positioning Methods Based on Visual-LiDAR Fusion: History

Unmanned Ground Vehicles (UGVs) and Unmanned Aerial Vehicles (UAVs) are commonly used for various purposes, and their cooperative systems have been developed to enhance their capabilities.

 

  • deep learning
  • UAV tracking
  • object detection
  • linear Kalman filter
  • LiDAR-inertial odometry

1. Introduction

Multi-robot systems, particularly those involving UAVs and UGVs, have gained significant attention in various field robotics applications due to their advantages in reliability, adaptability, and robustness [1]. A heterogeneous multi-robot system is one composed of robots that differ in type, hardware, or operating environment. In a UAV-UGV combination, the system includes both UAVs and UGVs working together towards a common goal [2]. The ground-air configuration of UGV/UAV heterogeneous systems offers wide coverage of a working space, making it an attractive solution for tasks such as precision farming [1], automated construction [3], search and rescue operations [4], firefighting [5], air quality sensing in smart cities [6], and many others.
A critical aspect of building an effective heterogeneous system is the development of a robust relative positioning method [1]. Relative positioning plays a fundamental role in coordinating the movements and interactions among UAVs and UGVs within the system. Traditional approaches to relative positioning have predominantly relied on visual methods or distance measurement techniques. However, these methods often encounter challenges such as limited accuracy, susceptibility to environmental conditions, and difficulties in handling dynamic scenarios. To address these challenges, the fusion of visual and LiDAR data has emerged as a promising approach to relative positioning for UAV/UGV heterogeneous systems [7,8]. By leveraging the advantages of both sensing modalities, visual-LiDAR fusion can provide a more comprehensive and accurate perception capability for the system. Visual sensors such as cameras capture rich information about the environment and support object detection, tracking, and scene understanding. LiDAR sensors, in turn, provide precise 3D point cloud data, enabling accurate localization, mapping, and obstacle detection even in low-light or adverse conditions [9,10].
The fusion of visual and LiDAR data has been widely explored in various robotics applications, spanning human tracking [11,12,13], 3D object detection [14,15], and dynamic object tracking [10]. These studies have demonstrated the effectiveness of integrating visual and LiDAR sensing for improved perception and localization. Leveraging the strengths of both modalities, the proposed low-cost relative positioning method for UAV/UGV coordinated heterogeneous systems aims to enhance the system's reliability, adaptability, and robustness within the operational environment.

2. Visual-Based Positioning Techniques

Visual-based positioning techniques utilize cameras mounted on a vehicle to extract visual information from the environment and estimate their positions. These techniques often involve marker-based and learning-based methods. Several studies have explored the application of visual-based techniques for relative positioning in robot-coordinated systems.
A common approach in visual-based positioning is the use of markers or landmarks for position estimation. Hartmann et al. [18] used marker-based video tracking in conjunction with inertial sensors to estimate the 3D position of a vehicle. Eberli et al. [19] focused on a vision-based position control method for MAVs (Micro Air Vehicles) using a single circular landmark: the landmark is detected and tracked in the camera images, and its known geometry is exploited for position estimation. It is worth noting that the relative position between the marker and the object must be calibrated in advance; if the marker shifts during operation, the object's position estimate becomes erroneous. Some researchers prefer LEDs as markers because they are small and easy to detect [16,17]. However, this introduces another problem: the LEDs must be powered by the target object, which shortens its operating endurance.
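To make the geometric principle behind such marker-based methods concrete, the following minimal sketch detects a circular landmark of known radius with OpenCV and recovers its 3D position from the pinhole camera model. It is an illustration rather than the implementation of [18] or [19]; the camera intrinsics and marker radius are hypothetical placeholders, and the depth formula assumes the marker faces the camera roughly head-on.

```python
import cv2
import numpy as np

# Hypothetical camera intrinsics (pixels) and known marker radius (metres).
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0
MARKER_RADIUS_M = 0.10

def estimate_marker_position(gray_image):
    """Detect a circular marker and estimate its 3D position in the camera frame."""
    circles = cv2.HoughCircles(gray_image, cv2.HOUGH_GRADIENT, dp=1.2, minDist=50,
                               param1=100, param2=30, minRadius=5, maxRadius=200)
    if circles is None:
        return None
    u, v, r_px = circles[0][0]           # strongest circle: centre (u, v), radius in pixels
    z = FX * MARKER_RADIUS_M / r_px      # pinhole model: depth from apparent radius
    x = (u - CX) * z / FX                # lateral offset in the camera frame
    y = (v - CY) * z / FY                # vertical offset in the camera frame
    return np.array([x, y, z])
```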
On the other hand, some studies focused on learning-based object detection algorithms for position estimation. Chang et al. [20] focused on the development of a proactive guidance system for accurate UAV landing on a dynamic platform using a visual–inertial approach. Additionally, a mono-camera and machine learning were used to estimate and track the 3D position of a surface vehicle [21]. These studies highlight the potential of learning-based methods in visual positioning tasks and demonstrate their applicability in various domains.
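A similar back-projection step applies to learning-based detection: once a detector returns a 2D bounding box and the target's physical size is known, a rough 3D position follows from the pinhole model. The sketch below illustrates this under assumed intrinsics, target height, and box format; it is not the method of [20] or [21].

```python
import numpy as np

# Hypothetical intrinsics (pixels) and the target's known physical height (metres).
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0
TARGET_HEIGHT_M = 0.35

def box_to_position(bbox):
    """Lift a detector's 2D box (x1, y1, x2, y2) in pixels to a rough 3D position.

    Assumes an axis-aligned box from a YOLO-style detector and a known target
    height, so depth follows from the apparent height via the pinhole model.
    """
    x1, y1, x2, y2 = bbox
    h_px = max(y2 - y1, 1e-6)                  # apparent height in pixels
    z = FY * TARGET_HEIGHT_M / h_px            # depth from apparent height
    u, v = 0.5 * (x1 + x2), 0.5 * (y1 + y2)    # box centre
    return np.array([(u - CX) * z / FX, (v - CY) * z / FY, z])
```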

3. LiDAR-Based Positioning Techniques

LiDAR-based positioning techniques utilize LiDAR sensors to capture the surrounding environment and estimate the positions of vehicles. LiDAR sensors provide precise 3D point cloud data, enabling accurate localization and tracking. In the context of UAV/UGV coordinated systems, researchers have explored both learning-based and non-learning-based approaches for LiDAR-based positioning.
Non-learning-based LiDAR positioning techniques rely primarily on the geometric characteristics of LiDAR data. Quentel [23] developed a scanning LiDAR system for long-range detection and tracking of UAVs, providing a reliable and accurate means of positioning UAVs in GNSS (Global Navigation Satellite System)-denied environments. Additionally, Qingqing et al. [24] proposed an adaptive LiDAR scan frame integration technique for tracking known MAVs in 3D point clouds. By dynamically adjusting the scan frame integration strategy, this method improves the accuracy and efficiency of LiDAR-based UAV tracking, contributing to precise positioning of UAVs in coordinated systems. However, the geometric properties of a UAV depend on its shape and size; without prior information, it is difficult to hand-craft features that reliably detect small UAVs (e.g., a DJI F330 drone) from point clouds alone.
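The geometric idea behind such non-learning methods can be sketched as a clustering step over raw LiDAR returns: a small airframe produces a compact cluster of points, whose centroid serves as the relative position. The example below uses DBSCAN from scikit-learn with illustrative thresholds; it is a simplified sketch, not the adaptive scan frame integration of [24].

```python
import numpy as np
from sklearn.cluster import DBSCAN

def track_uav_in_cloud(points, eps=0.3, min_samples=5, max_cluster_size=200):
    """Return the centroid of the nearest compact cluster in an N x 3 point cloud.

    Very large clusters (walls, ground) are discarded because a small airframe
    only produces a handful of returns. All thresholds are illustrative and
    would need tuning for a real sensor and platform.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    candidates = []
    for label in set(labels) - {-1}:             # label -1 marks noise points
        cluster = points[labels == label]
        if len(cluster) <= max_cluster_size:
            candidates.append(cluster.mean(axis=0))
    if not candidates:
        return None
    return min(candidates, key=np.linalg.norm)   # nearest compact cluster centroid
```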
Learning-based LiDAR positioning techniques leverage machine learning and deep learning algorithms to extract latent, meaningful information from LiDAR data. Qi et al. [25] proposed a point-to-box network for 3D object tracking in point clouds. This method represents objects, including cars, as 3D bounding boxes, enabling accurate and robust tracking from LiDAR data. Their approach achieves satisfactory speed and accuracy on a single NVIDIA 1080Ti GPU, but it degrades in CPU-only settings, which are common in UAV/UGV heterogeneous systems. Inspired by learning-based visual object detection, some researchers have converted point cloud information into images and applied state-of-the-art visual detectors, such as YOLO or other CNN-based networks, to find the 2D position of objects; depth information from the point cloud is then used to recover the 3D position [8,26]. The UAV tracking system proposed in [8] is consistent with our idea. However, their method is limited to position estimation based on the output of object detection. Learning-based object detection is time-consuming, which either reduces the overall positioning rate or increases the computational power requirements. Delegating object detection to more specialized vision algorithms and decoupling 3D position estimation from the detector can preserve accuracy while greatly reducing computational cost.
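The point-cloud-to-image strategy mentioned above can be summarized as: project LiDAR points into the camera image, run a 2D detector, and assign the detection a depth from the points falling inside its box. The sketch below illustrates this lifting step under an assumed calibration (intrinsics K and extrinsic transform T_cam_lidar) and box format; it is not the exact pipeline of [8] or [26].

```python
import numpy as np

def lift_detection_to_3d(bbox, lidar_points, K, T_cam_lidar):
    """Lift a 2D detection to 3D by borrowing depth from projected LiDAR points.

    bbox: (x1, y1, x2, y2) pixel box from a 2D detector.
    lidar_points: N x 3 array in the LiDAR frame.
    K: 3 x 3 camera intrinsic matrix; T_cam_lidar: 4 x 4 LiDAR-to-camera transform
    (both assumed known from calibration).
    """
    # Transform points into the camera frame and keep those in front of the camera.
    pts_h = np.hstack([lidar_points, np.ones((len(lidar_points), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]

    # Project the remaining points into pixel coordinates.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]

    # Take the median depth of the points that fall inside the detected box.
    x1, y1, x2, y2 = bbox
    inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    if not np.any(inside):
        return None
    z = np.median(pts_cam[inside, 2])

    # Back-project the box centre at that depth.
    u, v = 0.5 * (x1 + x2), 0.5 * (y1 + y2)
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
```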

4. Visual-LiDAR Fusion Approaches

In the realm of UAV/UGV coordinated heterogeneous systems, the integration of visual and LiDAR sensors holds great promise for achieving accurate and robust position estimation. However, each sensor modality has its limitations and strengths. Visual sensors provide high-resolution imagery and semantic information but are sensitive to lighting conditions and susceptible to occlusions. LiDAR sensors, on the other hand, offer accurate 3D point cloud data but struggle in low-texture environments and are affected by adverse weather. To overcome these limitations and leverage the strengths of both sensor types, visual-LiDAR fusion approaches have emerged as a compelling solution.
When computational power constraints are set aside, some learning-based visual-LiDAR fusion frameworks have achieved satisfactory results. Among them, some frameworks extract visual and LiDAR information simultaneously [10,15], while others explore multi-frame learning with joint 2D and 3D backbones [14]. In contrast, Dieterle et al. [11] presented a sensor data fusion method based on a recursive Bayesian estimation algorithm, namely the Joint Probabilistic Data Association Filter (JPDAF). However, although traditional visual and point cloud processing can improve the detection capability of a single sensor, accuracy remains a key issue because the RGB information is not used to assist object detection.
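As a concrete, if much simplified, example of recursive Bayesian fusion, the sketch below implements a linear constant-velocity Kalman filter that combines 3D position measurements from camera- and LiDAR-based pipelines. It is far simpler than the JPDAF of [11] (there is no data association step), and the noise parameters are illustrative placeholders.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal linear Kalman filter with a constant-velocity model for fusing
    3D position measurements from camera- and LiDAR-based pipelines.

    State vector: [x, y, z, vx, vy, vz]. Noise levels are illustrative only.
    """
    def __init__(self, q=0.1, r_cam=0.5, r_lidar=0.1):
        self.x = np.zeros(6)                 # state estimate
        self.P = np.eye(6)                   # state covariance
        self.q = q                           # process noise intensity
        self.R_cam = r_cam * np.eye(3)       # camera measurement noise
        self.R_lidar = r_lidar * np.eye(3)   # LiDAR measurement noise
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])   # position is observed

    def predict(self, dt):
        F = np.eye(6)
        F[:3, 3:] = dt * np.eye(3)           # position += velocity * dt
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + self.q * dt * np.eye(6)

    def _update(self, z, R):
        y = z - self.H @ self.x                          # innovation
        S = self.H @ self.P @ self.H.T + R               # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P

    def update_camera(self, z):
        self._update(np.asarray(z, dtype=float), self.R_cam)

    def update_lidar(self, z):
        self._update(np.asarray(z, dtype=float), self.R_lidar)
```

In a typical loop, predict(dt) runs at a fixed rate while update_camera and update_lidar are called whenever the corresponding measurement arrives; assigning the LiDAR a smaller measurement covariance lets it dominate the estimate when both modalities are available.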

This entry is adapted from the peer-reviewed paper 10.3390/aerospace10110924
