Off-Road Detection Analysis for AGV: A Review

Off-Road Detection Analysis for AGV: A Review: Comparison

Please note this is a comparison between Version 1 by Fahmida Islam and Version 2 by Rita Xu.

When it comes to some essential abilities of autonomous ground vehicles (AGV), detection

is one of them. In order to safely navigate through any known or unknown environment, AGV must

be able to detect important elements on the path. Detection is applicable both on-road and off-road,

but they are much different in each environment. The key elements of any environment that AGV

must identify are the drivable pathway and whether there are any obstacles around it. Many works

have been published focusing on different detection components in various ways. In this paper, a

survey of the most recent advancements in AGV detection methods that are intended specifically

for the off-road environment has been presented. For this, we divided the literature into three major

groups: drivable ground and positive and negative obstacles. Each detection portion has been further

divided into multiple categories based on the technology used, for example, single sensor-based,

multiple sensor-based, and how the data has been analyzed. Furthermore, it has added critical

findings in detection technology, challenges associated with detection and off-road environment, and

possible future directions. Authors believe this work will help the reader in finding literature who

are doing similar works.

autonomous ground vehicles
off-road environment
drivable ground
positive obstacles
negative obstacles

1. Introduction

For years, industry and academia have been interested in developing autonomous technologies, such as driverless vehicles. Driverless vehicles are also known as autonomous ground vehicles (AGV) [1], unmanned ground vehicles (UGV) [2], autonomous guided vehicles [3], or autonomous land vehicles (ALV) [4]. These terms refer to vehicles that can navigate without or with minimal human assistance [5]. They are one of the first modern applications in robotics research. From the early days, AGV has been constantly being improvised to provide advanced driver assistance, road safety, and collision avoidance ^[6][7][8][6,7,8]. However, research in the unstructured environment still falls behind compared to structured environments. Many uncertain factors are responsible for this, like lack of labeled dataset, accessibility of data, lack of applicable data [9], etc.

Off-road environments are regions of suburban or non-urban, non-structured, or weakly structured road areas that lack well-defined routes and driving instructions like road signs, traffic signals, etc. Some good examples of off-road environments are forests, country roads, muddy or sandy roads, or terrain covered by emergent plants [10]. Oliver et al. [11] defined unstructured environments as situations or environments that “have not been previously adjusted to make it easier for a robot to complete a task”. In layman’s terms, an off-road environment can be any environment that does not have basic driving facilities, road instructions, and more challenging than usual conditions. Figure 1 shows some examples of off-road environments that have unstructured roadways and without proper driving facilities. Figure 1a–c shows rocky, muddy, and sandy road environments, respectively.

Figure 1. Examples of some off-road environments with unstructured roadways; The figure presents (a) rocky, (b) muddy, and (c) sandy road environments. Images are collected from the autonomous testing ground of the Center for Advanced Vehicular Systems (CAVS) at Mississippi State University (MSU).

The on-road environment is an urban area with structured roadways, along with necessary driving instructions, e.g., road signs, street markings, etc. On-road detection techniques involve lanes, traffic signals, road signs, pedestrians, vehicles, and building detection. Most of the objects on-road are specific and identifiable with a vision-based method. However, in off-road scenarios, there may not have any specific element. Therefore, it is very difficult to sense the environment. In the off-road environment, two major detections are essential to navigate smoothly. At first, it needs to identify the traversable or drivable ground where nothing can block the movement of the vehicle. Secondly, the vehicle needs to look for obstacles to find the appropriate path. Any prior knowledge about the environment facilitates the vehicle system to reach its goal at the minimum cost with the shortest distance.

2. Ground or Drivable Pathway Detection

One of the most important criteria in autonomous driving is identifying drivable regions. On-road pathways are easily identifiable because of their structure, color, smoothness, and horizontality. Off-road areas are more challenging, requiring advanced sensing equipment and techniques because, in off-road areas, roads are not the same in all areas. Instead, certain areas of the ground may be unsmooth, sloppy, and bumpy. In addition, having dense vegetation, grass, sand, or dirt, the driving pathway may not be distinguishable by a visual identifier. As a result, the AGV may lose its autonomous navigation capability.

2.1. Single Sensor-Based Detection

Liu et al. ^[12][19] used only 3D lidar to detect drivable ground, positive, negative, and hanging obstacles. They primarily detected obstacles to uncovering the traversable region in their work. They used 3D lidar points and analyzed the radial and transverse features. These features detect the obstacles, and then the leftover areas are defined as drivable regions for traversing AGV. Gao et al. ^[13][20] also used lidar and proposed a deep-learning approach where they used the vehicle trails as input and obstacle marks as the label for the network. The suggested network topology was created considering the obscure and uncertain zone in the off-road environment. For their network, no human involvement is required to label the obstacles as it is done automatically. The main advantage of this network is that it takes data without labeling or weakly labeling but still provides a satisfactory result. Chen et al. ^[14][21] detected traversable road sections and obstacles in one unified model. They collected lidar image data and converted it to a lidar point cloud. A histogram map has been generated with the lidar point cloud, where the traversable road area can be visible in front of the vehicle. Apart from the traversable path, the obstacles are also visible around it. Katramados et al. ^[15][22] created a “traversability map” by extracting color and texture from images. For that, they collected camera data, mounted it on the top of the vehicle, and removed some unnecessary information. Then, they generated the map and adjusted the image’s temporal filtering, which helped to detect edges from blurry images. They removed lighting effects like shadows and reflections from the dataset, making the final detection result accurate. Shaban et al. ^[16][23] developed a deep learning model named Bird’s Eye View Network (BEVNet). This model took aerial images from the lidar sensor and semantically segmented them into four terrain classes- “free, low-cost, medium-cost, and obstacle”. The advantage of this model is that it can fill any gap with information using previous knowledge, and thus, it can overcome the problem of missing values for those areas where no lidar hits are found. In another work, Gao et al. ^[17][24] suggested an approach based on contrastive learning using camera images. In contrastive learning, a single feature is trained for classification. They used a set of human-labeled bounding boxes as features and detected different traversable areas for this work. Those areas are semantically segmented to generate an understandable map of the environment. Overall, lidar and camera sensors have primarily been used for traversable ground detection when using only a single sensor. Furthermore, the deep learning method is quite interesting to the researchers, as they can overcome some limitations like uncertain zone, missing data, etc.

2.2. Multi-Sensor-Based Detection

Zhu et al. ^[18][25] combined three lidars to detect three different types of obstacles and traversable paths. They used lidar odometry, which converts the detection output into a structured form. They recorded several findings, combining each with the following result. This combination uses the Bayesian theory, which provides a reward. However, this method will not work for long-distance traversability. A drivable ground detection method in a dynamic environment has been presented by Dahlkamp et al. ^[19][26]. This method has been applied to the vehicle that participated and won in the “DARPA Grand Challenge robot race”. DARPA is the “defense advanced research projects agency”. A “drivability map” is created by a laser range finder and camera sensor with an appearance model. This vehicle was able to navigate through the desert terrain very fast. Mei et al. ^[20][27] designed an algorithm to detect the traversable area in the off-road environment. They captured images with a monocular camera. The image of the same area is also captured by a lidar-based and observed by a human. The far-field capability is also measured. The final traversable region is defined by comparing the image data with the three measurements. An unsupervised learning-based method has been developed by Tang et al. ^[21][28] for segmenting passable areas in unstructured environments. They used a deep convolutional neural network to classify free, obstacle, and unknown road areas. They used both camera and laser sensors for training data and generated automatic labeling. For testing data, only a monocular camera is sufficient. Semantic segmentation, also known as image segmentation, is the process of allocating one of N predefined classes to every pixel of an image ^[22][29]. It is a deep learning algorithm that depends on a large set of labeled datasets. Dabbiru et al. [9] published a labeled dataset using semantic segmentation of three different vehicle types for an off-road environment. They used two 2D cameras and a 3D lidar sensor for data collection and annotated them based on the vehicle class. Reina et al. ^[23][30] detected drivable ground by combining lidar and stereo data and two classifiers. Each classifier takes data from each sensor, and then the classification result is fused to get the final result. Likewise, Sock et al. ^[24][31] used lidar and camera data to measure road boundaries and shape. They generated two probabilistic maps with two different sensors and then classified the traversable region with a linear support vector machine (SVM). The two classification results have been fused with the Bayes rule. McDaniel et al. ^[25][32] proposed a method for detecting tree stems using an SVM classifier. They used a lightweight lidar scanner from a single point of view. This method has two steps where the non-ground points have been filtered out in the first step, and then SVM classifies the points that belong to the ground from the remaining. In summary, fusing multiple sensors can provide a better detection result for drivable ground. Lidar and camera fusion is the most common fusion method. The classification result from two different sensors is being compared with the Bayesian rule. However, using multiple sensors may be costly. Table 1 encapsulates the works presented in next section.

Table 1. The methods used for traversable ground detection.

Literature	Sensors	Method	Detection
Gao et al. ^[13], 2019	Gao et al. [20], 2019	Lidar	Deep Learning	Drivable ground
Chen et al. ^[14], 2017	Chen et al. [21], 2017	Lidar	Lidar-histogram	Obstacles and drivable ground
Liu et al. ^[12], 2019 Zhu et al. ^[18], 2019	Liu et al. [19], 2019 Zhu et al. [25], 2019	Lidar	Radial and Transverse feature Bayesian Network	Obstacles and drivable ground
Katramados et al. ^[15], 2009	Katramados et al. [22], 2009	Camera	Vision/Traversability map	Drivable ground
Shaban et al. ^[16], 2021	Shaban et al. [23], 2021	Lidar Camera	Semantic Segmentation	Drivable ground
Gao et al. ^[17], 2021 Dabbiru et al. ^[9], 2021	Gao et al. [24], 2021 Dabbiru et al. [9], 2021	Lidar Camera	Semantic Segmentation	Drivable ground
Tang et al. ^[21], 2018	Tang et al. [28], 2018	Laser + Camera	Unsupervised Learning	Drivable ground
Reina et al. ^[23], 2016 Dahlkamp et al. ^[19], 2006	Reina et al. [30], 2016 Dahlkamp et al. [26], 2006	Lidar/Laser + Camera	Supervised Learning/SVM	Drivable ground
Sock et al. ^[24], 2016 McDaniel et al. ^[25], 2012	Sock et al. [31], 2016 McDaniel et al. [32], 2012	Lidar/Laser + Camera	Supervised Learning/SVM	Drivable ground

3. Positive Obstacles Detection and Analysis

AGV considers any particle as an obstacle that obstructs its smooth driving. Navigating securely without colliding with objects or falling into gaps is a key criterion for AGV. Obstacle detection and navigation in an unknown environment are one of the major capabilities of AGV. Obstacles are multiple objects that hinder the usual speed of the vehicles or make it a complete stop. Dima et al. ^[26][33] defined obstacles as “a region that cannot or should not be traversed by the vehicle”. Pedestrians, vehicles, houses, trees, animals, boulders, giant cracks, vast quantities of water, snow, etc., can be considered obstacles.

3.1. Single Sensor Based

Huertas et al. ^[27][34], determined the diameters of trees from the stereo camera image to assess whether they constitute traversable obstacles using edge contours. Edge contours are generated from the stereo pair (left and right) images, which is called 3D fragment information. Then, this information is coded with orientation to confine processing to object borders to match the edges on opposing tree trunk boundaries.

Maturana et al. ^[28][35] aim to differentiate small objects from large obstacles. They built a 2.5D grid map where they labeled terrain elevation, trail, and grass information. A 2.5D grid map is an image type where the height and depth information is provided. That information has been collected through lidar and other image data. The semantic map dataset is further trained and tested with a customized CNN for path planning and cost calculation.

Manderson et al. ^[29][36] presented a reinforcement learning method using labeled images. They collected images from the front end and overhead with various obstacles like vegetation, different rock kinds, and sandy paths. These images are used as inputs, and the labeling process is self-supervised. Value Prediction Networks (VPN) ^[30][37] have been used as a network. VPN is a hybrid network consisting of model-based and model-free architecture.

Nadav and Katz ^[30][37] presented a system with low computational cost. They used a smartphone to collect images, and they converted the images into a 3D point cloud model and processed it to detect obstacles and distance information. Another benefit of this system is that it can operate individually without other sensors.

Broggi et al. ^[31][38] presented a system that provides real-time obstacle detection using “stereoscopic images”. These images are collected from a moving vehicle through two cameras on the left and right sides. Then, the system calculates “V-disparity”, which determines the pitch oscillation of the camera from vehicle movement. V-disparity is a method that uses a single pair of stereo pictures to determine the camera’s pitch angle at the moment of acquisition ^[32][39]. The obstacles are then identified and mapped in real-world coordinates.

Foroutan et al. ^[33][40] presented a different approach than typical obstacle detection. They look into the effect of understory vegetation density on obstacle detection and use a machine learning-based framework. For that, they take point cloud data from the lidar sensor using an autonomous driving simulator. If the understory vegetation increases, the classification performance decreases.

A laser-based system with the Sober algorithm and the Gaussian kernel function has been applied by Chen et al. ^[34][41]. The goal is to group each obstacle’s point clouds, so the super-voxel has been optimized with the Euclidean clustering technique. Then, “the Levenberg–Marquardt back-propagation (LM-BP) neural network” has been applied to extract the features of the obstacles.

Zhang et al. ^[35][42] provided a faster detection method using stereo images. The detection method has two stages. In the first stage, it rapidly identifies the easily visible obstacles, and then in the second stage, it uses space-variant resolution (SVR) ^[35][42] to improve small obstacle detection. SVR is an algorithm that analyzes the geometric features and the level of interest in each area. However, SVR has a very high computation cost.

Overall, cameras are widely used sensors for obstacle detection. However, lidar and laser also provide good detection. While identifying the obstacles from the image, different image processing techniques and detection algorithms have been applied in different works. Some took different approaches by considering small objects.

3.2. Multi-Sensor Based

Kuthirummal et al. ^[36][43] presented an approach that can be applied to lidar and camera sensors. They created a grid-based obstacle map to define traversability. To map the obstacles in each cell, they calculate the elevation histogram and plot them on the graph with the label. Thus, the information about obstacles can be known.

Manduchi et al. ^[37][44] presented a sensor processing algorithm using two sensors. A color stereo camera is used for detecting obstacles based on color and terrain type. A single-axis ladar classifies the traversable and large obstacles. The camera is mounted on the vehicle’s top, while the ladar is placed in the lower portion. These two systems provide better navigation together.

Reina et al. ^[38][45] built perception algorithms to improve the autonomous terrain traversability of AGV. They presented two approaches. One deals with the stereo data to classify drivable ground and uses the self-learning method. The other one uses a radar-stereo integrated system to detect and classify obstacles. They have done field experiments in different environments, including rural areas, agricultural land, etc.

A low-cost and multi-sensor-based obstacle detection method has been developed by Giannì et al. ^[39][46]. They have unified three sensing technologies: radar, lidar, and sonar. Then, the data is “sieved” and passed to the Kalman filter, and this technique estimates the distance of the obstacle accurately. Meichen et al. ^[40][47] take a similar Kalman filter-based approach. They have selected IMU (Inertial Measurement Unit) and lidar sensors to collect the coordinates of the obstacle, and both coordinates are fused to get the obstacle position.

Kragh and Underwood ^[41][48] used semantic segmentation by fusing lidar and camera sensors. The appearance information comes from a 2D camera, and the geometry information comes from 3D lidar data, and both information has been fused to get the final detection result. The advantage of this model is that it can differentiate between traversable overgrown grass or fallen leaves and non-traversable trees and bushes. Furthermore, using this method, ground, sky, vegetation, and object can be classifiable.

Ollis and Jochem ^[42][49] used a set of sensors to generate a “density map” to detect different obstacles and classify terrain. They used various ladar and radar sensors that can update 70 times in a second. The terrain has been divided into six subclasses based on the obstacle. They also classified traversable regions based on the difficulty level, such as non-traversable, traversable, partially traversable, etc.

Bradley et al. ^[43][50] utilized the infrared ray for detecting vegetation that has previously been used to detect chlorophyll. A near-infrared and a video camera are fused to capture image data. The visible light has been removed from each pixel of the image by applying a threshold value. Thus, the presence of chlorophyll or vegetation can be known.

4. Negative Obstacles Detection and Analysis

Positive obstacles are those objects with a positive height above the ground, while negative obstacles are those with a negative height below the ground [4]. Positive obstacles are easily visible and captured by the sensors. Because of the negative height and position below the ground, negative obstacles are somewhat challenging. Furthermore, a regular vision-based system may not measure their depth and area. It could be unsafe for the vehicle if the negative obstacles are not identified correctly.

4.1. Missing Data Analysis

A common practice in negative obstacle detection is dealing with missing data from sensor signals. Many pieces of literature have worked with three-dimensional Lidar data. For example, Larson and Trivedi ^[44][52] have presented an algorithm for negative obstacle detection. They used a 3D laser to collect point cloud data. Two algorithms have been used for the classification method—Negative Obstacle DetectoR (NODR) and support vector machine (SVM). NODR, a geometry-based classifier, works by identifying missing information, which can lead to a negative obstacle. On the other SVM classifies the rays that return from Lidar.

Sinha and Papadakis ^[45][53] also considered the information gap from 2D morphological images for negative obstacle detection. The advantage of their approach is that they process in real-time and provide a high-level accuracy without analyzing the 3D scene. The difference from other works is that they denoised the signals, extracted features through principal component analysis (PCA), and classified them according to the area.

Similarly, Heckman et al. ^[46][54] have also considered missing data from a 3D laser. By identifying areas where data is missing, they aim to detect the negative obstacles which could be a reason for missing data. The benefit of this method is that it can be applied to a complex environment even with sloped ground.

Analyzing missing data from the sensors has been a unique but effective method for detecting negative obstacles. The data come from the point cloud, lidar, or laser sensor for missing data analysis.

4.2. Vision-Based Detection

Much literature focuses on stereo vision to detect negative obstacles like the Lidar sensor. Karunasekera et al. ^[47][55] utilized the stereo camera’s 3D and color information. They generated a disparity map, where v-disparity and u-disparity have been applied to identify road profiles. U-V-disparity is an algorithm for understanding a 3D road scene, where it can classify different features of that scene ^[48][56]. Negative obstacles can be identified by scanning through every pixel of the disparity map.

Shang et al. [4] also used Lidar to collect data about negative obstacles. They mounted the sensor in an upright position in their work, which has a great advantage. The vehicle collects more information about its blind spot and data in this position. The width and the background information have been fused with the Bayesian network. Then, they used SVM classifiers to get the final detection.

Bajracharya et al. ^[49][57] used stereo vision for a special vehicle to detect sparse vegetation and negative obstacles. So, they build a terrain map and walk along with it. The system uses spatial and temporal information, making detection results more accurate. This system can work in dynamic locations and weather, like rain, snow, and even at nighttime.

Hu et al. ^[50][58] fuse different geometric cues from stereo cameras in the image sequence to detect negative obstacles. The stereo images contain range, color, and geometric information, and a Bayesian network calculates the probability of detection. The benefit of this method is that the obstacles can be detected from a far distance.

In ^[51][59], the performance of seven obstacle identification methods with 21 obstacles has been examined. Among these, two of them are negative obstacles. One was detected with a stereo image, and another was on the local map through software.

Overall, negative obstacle detection using vision is a difficult task, which has been possible with some image processing techniques and stereo vision.