Road Anomaly Detection, Deep Learning-Based Methods and Visual-SLAM: History

The proliferation of autonomous vehicles (AVs) emphasises the pressing need to navigate challenging road networks riddled with anomalies like unapproved speed bumps, potholes, and other hazardous conditions, particularly in low- and middle-income countries. These anomalies not only contribute to driving stress, vehicle damage, and financial implications for users but also elevate the risk of accidents. A significant hurdle for AV deployment is the vehicle’s environmental awareness and the capacity to localise effectively without excessive dependence on pre-defined maps in dynamically evolving contexts.

  • autonomous vehicles
  • deep learning
  • road anomaly detection
  • visual SLAM

1. Road Anomaly Detection

1.1. Traditional Methods

Road surface characterisation involves identifying unique features of the road infrastructure, including instances that deviate from the standard road setting. In real-world applications, these road anomalies include potholes, cracks, swellings, stripping, and unmarked speed bumps [9]. Detecting and avoiding these anomalies is crucial, since late detection can lead to vehicle damage or road accidents. An intelligent transportation system must therefore identify not only the presence of anomalies on the road but also their location [20]. This information helps transport authorities obtain real-time knowledge of the road infrastructure, enables a vehicle to manoeuvre around these anomalies via a suitable control or navigation method, and facilitates the incorporation of the detection model into a relevant visual odometry technique.
In the literature, several methods have been implemented for detecting road anomalies. One such technique is the use of sensor data for anomaly identification: the vehicle uses sensors such as accelerometers, gyroscopes, lidar, ultrasonic, and radar sensors to perceive its environment and detect road anomalies. In some cases, these sensors are combined or fused so that different sensor readings yield more accurate results.
In [21], a crowd-sensing application was designed for detecting road conditions. The technology estimates the position of potholes and speed bumps using acceleration data from road users’ cell phones. The program, dubbed CRATER, detects speed bumps and potholes with success rates of 95% and 90%, respectively, but also exhibits false detection rates of 5% for speed bumps and 10% for potholes.
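Accelerometer-based detectors of this kind typically flag segments where the vertical acceleration deviates sharply from gravity. The sketch below is a simplified illustration of that idea (not CRATER’s actual algorithm; the threshold value and synthetic signal are assumptions) and marks candidate bump events in a z-axis trace:

```python
import numpy as np

G = 9.81  # gravity, m/s^2

def detect_bumps(az: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Return sample indices where vertical acceleration deviates from
    gravity by more than `threshold` m/s^2 (candidate bump/pothole events)."""
    return np.flatnonzero(np.abs(az - G) > threshold)

# Synthetic z-axis accelerometer trace: smooth road plus one bump at sample 120.
rng = np.random.default_rng(0)
az = G + 0.2 * rng.standard_normal(300)
az[120:124] += np.array([4.0, 6.0, -5.0, 3.5])  # bump signature

events = detect_bumps(az)
print(events)  # indices clustered around the bump
```

In practice, smartphone detectors add filtering and reorientation steps on top of this, since the phone’s axes rarely align with the vehicle’s.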
For the detection of speed bumps and the lowering of vehicle speed, an intelligent system based on smartphone technology was devised in [22]. The device detects speed bumps with a gravity sensor, and the third equation of motion (v² = u² + 2as) was used to compute the speed reduction. Data were collected using crowdsourcing and a variety of vehicles. Although the system produced good results, a limitation of this study is that the width and depth of potholes and speed bumps were not considered.
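As a concrete illustration of how the third equation of motion, v² = u² + 2as, supports speed reduction, the sketch below (a hypothetical example, not the code from [22]) computes the deceleration required to reach a safe speed before a detected bump:

```python
def required_deceleration(u: float, v: float, s: float) -> float:
    """Deceleration (m/s^2) needed to go from speed u to speed v over
    distance s, from the third equation of motion: v^2 = u^2 + 2*a*s."""
    if s <= 0:
        raise ValueError("distance must be positive")
    return (v ** 2 - u ** 2) / (2 * s)

# Example: a speed bump is detected 50 m ahead; slow from 20 m/s to 5 m/s.
a = required_deceleration(u=20.0, v=5.0, s=50.0)
print(f"required deceleration: {a:.2f} m/s^2")  # negative value => braking
```

A negative result indicates braking; a controller would compare it against the vehicle’s comfortable braking limit to decide how urgently to slow down.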
In [23], a non-compression auto-encoder (NCAE) was used to identify road surface anomalies from vehicle driving noise. The proposed NCAE deep-learning-based anomaly detection platform was cost-effective and operated in real time. Through convolutional operations, the platform can model backward and forward time-series causality in the data, and the architecture outperforms the compared anomaly detection methods in terms of the Area Under the Receiver Operating Characteristic Curve (AUROC). Compared to vision-based approaches, however, gathering such sensor data is costly and complex, and this strategy makes it difficult to determine the type of anomaly.
Additionally, [24] devised a system for detecting speed bumps using accelerometric characteristics and a Genetic Algorithm (GA). The approach used a GPS sensor, an accelerometer, and a gyroscope mounted in a vehicle, with data collected as the vehicle drove through numerous streets. A GA was then used to fit a logistic model that identifies road anomalies, validated with a cross-validation technique. In a blind evaluation, the model achieved an accuracy of 0.9714, a false positive rate below 0.018, and an AUROC of 0.9784. However, the aforementioned limitations of sensor-based methods apply here as well.
The authors in [25] created a technique for identifying road bumps using an accelerometer-based Android application, which analyses accelerometric sensor data collected from several roadways to assess the accuracy of the proposed approach. The study applied a noise threshold to distinguish phone shaking from genuine road-induced accelerations, which is not an efficient process.
Furthermore, [26] focuses on the development of a new algorithm for detecting and characterising potholes and bumps from signals acquired using an accelerometer. The proposed algorithm utilises a wavelet-transformation-based filter to decompose the signals into multiple scales and then applies a spatial filter to the coefficients to detect road anomalies. The characterisation of these anomalies is achieved using unique features extracted from the filtered wavelet coefficients. The results of the analyses show the effectiveness of the proposed algorithm in accurately detecting and characterising potholes and bumps.
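The wavelet-based idea in [26] can be illustrated with a minimal sketch (an assumed simplification using a single-level Haar transform in NumPy, not the authors’ multi-scale algorithm): a sharp bump or pothole lip shows up as large detail coefficients, while smooth road texture does not:

```python
import numpy as np

def haar_detail(x: np.ndarray) -> np.ndarray:
    """Single-level Haar detail coefficients: scaled differences of adjacent pairs."""
    x = x[: len(x) // 2 * 2]              # truncate to even length
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def wavelet_anomalies(x: np.ndarray, k: float = 5.0) -> np.ndarray:
    """Flag coefficient positions whose magnitude exceeds k * median magnitude."""
    d = haar_detail(x)
    thresh = k * np.median(np.abs(d))
    return np.flatnonzero(np.abs(d) > thresh)

# Smooth synthetic road profile with an abrupt step near sample 100.
t = np.arange(256)
x = 0.05 * np.sin(2 * np.pi * t / 64)     # gentle undulation
x[100] += 1.0                              # abrupt step from a pothole lip

idx = wavelet_anomalies(x)
print(idx)  # coefficient index 50 corresponds to samples 100-101
```

The median-based threshold makes the detector robust to the overall roughness level, which is one reason wavelet-domain thresholding is popular for this task.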
In [27], the detection and identification of road anomalies and obstacles using data from an Inertial Measurement Unit (IMU) installed in a vehicle is presented. The authors evaluate a Convolutional Neural Network (CNN) for this task, using a time-frequency representation (spectrogram) as input to the CNN instead of the original time-domain data. The approach was tested on an experimental dataset collected from 12 vehicles driving over 40 km of road and showed improved results compared to previous shallow machine learning algorithms and to a CNN on time-domain data, reaching an identification accuracy of 97.2% after extensive optimisation of the CNN and the spectrogram implementation.
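The spectrogram preprocessing step can be sketched as follows (a minimal example using SciPy and a synthetic signal, assumed rather than taken from [27]): the 1-D accelerometer trace is converted into a 2-D time-frequency image that a CNN can consume like any other image:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 100.0                                 # sampling rate, Hz
t = np.arange(0, 10, 1 / fs)               # 10 s of accelerometer data
# Synthetic vertical acceleration: road texture plus a transient "impact" burst.
rng = np.random.default_rng(1)
az = 0.1 * rng.standard_normal(t.size)
az[500:520] += np.sin(2 * np.pi * 25 * t[500:520])  # 25 Hz impact transient

# Time-frequency representation used as 2-D CNN input instead of the raw series.
f, ts, Sxx = spectrogram(az, fs=fs, nperseg=64)
print(Sxx.shape)  # (frequency bins, time frames)
```

The resulting array (33 frequency bins by 17 frames here) localises the impact in both time and frequency, which is what lets an image-style CNN outperform one fed the raw time series.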

1.2. Deep Learning-Based Methods

Several research studies have implemented road anomaly detection using machine learning (ML) methods, applied to both sensor-based and visual data. An ML method for determining road surface anomalies using smartphone sensors was presented in [12]. Using gyroscope, GPS, and accelerometer data obtained from cellphones, the author investigated different supervised ML approaches for efficiently classifying road surface conditions. The work concentrated on three major labels: smooth roads, potholes, and deep transverse cracks. The study also found that using characteristics from all three axes of the sensors produced more accurate results than using only one axis. Furthermore, the model performance was compared against deep neural networks: the Decision Tree (DT) and Support Vector Machine (SVM) methods had smaller classification times than the neural-network-based methods. Losses of accuracy and precision were observed, resulting from the small dataset and the disproportionate distribution of class instances. In addition, the use of three types of sensors increases the complexity of the system.
In [13], an asphalt pavement pothole and crack detection system was created using multispectral images from unmanned aerial vehicles (UAVs). The approach exploited the spectral and spatial characteristics of road abnormalities, and ML techniques such as SVM, ANN, and random forest were used to differentiate between undamaged and damaged pavements. The classification accuracy was 98.3 percent; however, the system was unable to identify cracks less than 13.54 mm wide due to the limited spatial resolution of the UAV pavement images.
ML techniques have shown impressive results in vehicle perception tasks. However, in ML methods based on classification such as Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), the effectiveness of the algorithm relies on the input data representation and the feature extraction method implemented [28]. To tackle the limitations of traditional ML algorithms, deep learning (DL) has been widely implemented in classification and object detection tasks, especially in the area of AV perception.
The authors in [29] developed a dual-stage YOLO v2-based road marker detector capable of operating in real time with lightweight, spatial-transformation-invariant categorisation. They presented a two-stage technique to handle distorted road marker recognition and to balance performance metrics such as recall and precision. The spatial transformation layer in the second stage was able to tolerate distorted road markings, resulting in enhanced accuracy. The network ran at 58 FPS on a single GTX 1070 under varied scenarios. The two-stage model achieved an 86.5 percent mean average precision, whereas the RM-Net model achieved 97.5 percent accuracy; both metrics were shown to be superior to standard classification and detection approaches. Lanes and road boundaries were not taken into account in this study, leaving potential for further research in that area.
In addition, [10] developed a road environment categorisation model using street-level photos and geographical information. Based on a deep convolutional neural network (CNN), the research presents a novel framework for autonomous systems capable of identifying street-level photos in a road scene. The model was pre-trained on the ImageNet dataset before being fine-tuned on the KITTI dataset using transfer learning. The model classified urban, rural, and highway street photos with an accuracy of 86 percent. The approach assessed the various types of roadways; however, road conditions were not investigated, leaving a gap for future research.
A technique for road crack identification using multi-scale Retinex combined with the wavelet transform was developed in [30]. To eliminate the halo produced by the Retinex technique and to reduce picture distortion, the wavelet transform was incorporated into the standard multi-scale Retinex method. The system had a recognition accuracy of 95.8 percent, higher than the traditional algorithm’s 75.1 percent. However, the approach targets only cracks, not other road anomalies such as potholes or speed bumps.
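The Retinex idea can be sketched as follows (a plain multi-scale Retinex illustration in NumPy/SciPy for a grayscale image; the sigmas and synthetic patch are assumptions, and the wavelet refinement from [30] is omitted): the illumination component is estimated by Gaussian smoothing and removed in the log domain, enhancing local contrast such as crack edges:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img: np.ndarray, sigma: float = 15.0) -> np.ndarray:
    """Reflectance estimate: log(image) - log(Gaussian-smoothed illumination)."""
    img = img.astype(np.float64) + 1.0           # avoid log(0)
    return np.log(img) - np.log(gaussian_filter(img, sigma))

def multi_scale_retinex(img, sigmas=(15.0, 80.0, 250.0)):
    """Average of single-scale outputs over several smoothing scales."""
    return sum(single_scale_retinex(img, s) for s in sigmas) / len(sigmas)

# Synthetic pavement patch: bright background with a dark crack line.
img = np.full((64, 64), 180.0)
img[30:34, :] = 40.0                             # horizontal crack
out = multi_scale_retinex(img)
print(out.shape)  # crack pixels come out strongly negative, background near zero
```

In the enhanced output, the crack stands out from the uneven illumination of the original image, which is the property the crack detector in [30] builds on.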
Furthermore, a technique for road anomaly and drivable area detection using a dynamic fusion module (DFM) was developed in [11]. The work created a road anomaly and drivable area detection benchmark for mobile robots by comparing existing single-modal and data-fusion semantic segmentation CNNs using six visual feature modalities. In addition, a novel module, the DFM, was developed that can be easily integrated into existing data-fusion networks to fuse diverse types of visual characteristics effectively and efficiently. The approach was capable of distinguishing drivable areas from those with road abnormalities. When tested against other published methodologies on the KITTI dataset, the model had an average accuracy of 94.05 percent. Vehicles and pedestrians were the anomalies studied in this work; road surface defects were not examined.
The authors in [31] developed a CNN-based pothole detection system using thermal imaging and examined the feasibility of thermal imaging for pothole identification. A comparison of a self-built CNN and existing pre-trained models was also performed, with pictures accurately recognised at a highest accuracy of 97.08 percent using one of the pre-trained CNN-based residual network models. The pothole identification problem was modelled as a classification task rather than an object detection operation; as a result, the potholes could not be localised in the image.
Similarly, in [32], a CNN for detecting potholes in vital road infrastructure was demonstrated. The research proposes a novel application of a CNN to accelerometer data for pothole identification. Data are captured by an iOS-based smartphone mounted on a car’s dashboard and running a specialised app. The results reveal that the proposed CNN technique outperforms previous models in terms of computational complexity and accuracy, reaching 98 percent accuracy. Here too, pothole identification was modelled as a classification task rather than an object detection operation, so the potholes could not be localised in the image. Furthermore, obtaining sensor data is costly and complex compared to vision-based solutions.
To address the challenges posed by anomalies in road surfaces, [33] proposed a deep learning approach that uses various models including convolutional neural networks, LSTM networks, and reservoir computing models to automatically identify different types of road surfaces and to distinguish potholes from other destabilisations caused by speed bumps or driver actions. The experiments conducted using real-world data showed promising results and a high level of accuracy in solving both problems.
Furthermore, [20] built a DL-based edge-AI method for the automatic identification and categorisation of road irregularities in VANETs. The authors introduced a new approach based on VANET and edge AI for the automated identification of road irregularities by AVs and the communication of relevant information to oncoming vehicles. ResNet-18 and VGG-11 are used to identify and classify roads with anomalies and plain roads without abnormalities. The model exhibited accuracy, precision, and recall values of 99.92, 99.85, and 99.85 percent, respectively. The study modelled the pothole detection problem as a classification task rather than an object detection operation; thus, the potholes could not be localised in the image.

2. Visual-SLAM

V-SLAM is a technology in which an autonomous navigation system employs a vision sensor to build and update a map of an unfamiliar area while tracking its location and orientation inside that environment [17,34]. Camera data, as opposed to other sensor data such as lidar, may give rich and extensive information, which improves high-level operations [35]. In a world reference frame, the camera path is represented as a collection of relative positions. Landmarks, which are objects or keypoint elements in each frame, reflect the surroundings. In static situations, landmarks stay stationary, but in dynamic environments, landmarks change location [35]. In this research, two main challenges have been highlighted regarding V-SLAM’s applicability [36]:
  • Reliability in Outdoor Environments: V-SLAM’s reliability, especially in outdoor settings, requires further enhancement. The limitations of lidar and radar sensors in extreme climatic conditions, combined with their exorbitant prices, make them poorly suited to outdoor deployment [14,15,16]. Although laser scans offer high precision and strong resistance to interference, they cannot provide semantic details about the surroundings [17]. While V-SLAM methods aim to address these gaps, they remain vulnerable to environmental factors such as varying light conditions [37].
  • Operability in Dynamic Scenes: Conventional SLAM and V-SLAM methods typically operate under the presumption of a static environment. This assumption is often not accurate, leading to V-SLAM methods that are designed for static scenes to falter in dynamic settings [38]. Such dynamic scenes often feature moving elements that must be accounted for during localisation and mapping processes. For instance, ORB-SLAM cannot differentiate whether the feature points extracted belong to stationary or moving objects [17]. Even though extensive research has been directed towards object detection [18] and semantic segmentation [19], the exploration of V-SLAM in highly fluid environments like roads and highways remains insufficient. There remains a pertinent need for autonomous systems to gain a comprehensive understanding of dynamic scenarios and to interact appropriately with moving elements [19,39].
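The camera-path representation mentioned above, a collection of relative poses chained into a world reference frame, can be made concrete with a small example (an illustrative planar SE(2) sketch in NumPy; a real V-SLAM system estimates these relative motions from image features and refines them jointly):

```python
import numpy as np
from functools import reduce

def se2(x: float, y: float, theta: float) -> np.ndarray:
    """Homogeneous 2-D rigid transform: translation (x, y), rotation theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0,  0, 1]])

# Relative poses between consecutive frames, e.g. from visual odometry:
# 1 m forward, then 1 m forward while turning 90 deg left, then 1 m forward.
relative = [se2(1, 0, 0), se2(1, 0, np.pi / 2), se2(1, 0, 0)]

# World pose of the latest frame = composition of all relative motions.
world = reduce(np.matmul, relative, np.eye(3))
print(np.round(world[:2, 2], 3))  # final camera position: [2. 1.]
```

Because each relative pose carries estimation error, errors accumulate along the chain; this drift is what loop closure and landmark-based map updates in V-SLAM are designed to correct.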


This entry is adapted from the peer-reviewed paper 10.3390/wevj14090265
