Deep Learning Models for Road Pothole Detection: Comparison
Please note this is a comparison between Version 1 by Eun-Sung Jung and Version 2 by Sirius Huang.

For self-driving cars, crack detection is crucial because these vehicles rely on sensors to perceive and navigate the environment. Methods for visualizing cracks include deep-learning architectures capable of processing images at multiple scales. When the neural network cannot tell a pothole from a patch, the vehicle may brake suddenly, or fail to brake, at the wrong place. In this regard, detecting potholes for self-driving vehicles and road maintenance is vital for future intelligent transportation systems.

  • distributed deep-learning
  • distributed edge AI/ML
  • distributed hybrid model training

1. Introduction

A self-driving automobile employs an artificial intelligence (AI) system to evaluate data from sensors and make judgments while driving [1]. The deployment of smart cars acts as a fast catalyst for the future of intelligent transportation systems by decreasing pollution, accidents, and traffic congestion [2]. Self-driving vehicles are expected to reduce road accidents markedly by combining human input with AI programs. For self-driving cars, crack detection is crucial because these vehicles rely on sensors to perceive and navigate the environment. If cracks are not detected and repaired promptly, they can interfere with the vehicle’s perception systems, leading to incorrect or incomplete information about the road ahead. Methods for visualizing cracks include deep-learning architectures capable of processing images at multiple scales [3] and the detection of strains in columns via a mark-free vision methodology [4]. However, substantial doubts about reliability, regulation, and predictive detection have been raised [5]. Most reported accidents of self-driving cars were due to inappropriate or heavy-weight neural network training on edge devices, resulting in heating issues [6]. When the neural network cannot tell a pothole from a patch, the vehicle may brake suddenly, or fail to brake, at the wrong place [7]. In this regard, detecting potholes for self-driving vehicles and road maintenance is vital for future intelligent transportation systems. The requirements of pothole-detecting AI systems include the following: (1) a lightweight distributed neural network, (2) high-quality input images for training in a distributed edge cloud environment, and (3) reliable communication for exchanging information among the distributed deep-learning components [8].
Over the past few years, distributed deep learning has emerged as a promising area of research, supported by scalable cutting-edge technology and driven by the need to tackle large-scale datasets and complex problems. Numerous state-of-the-art studies have been conducted to develop innovative techniques and frameworks for optimizing distributed deep-learning systems. For instance, exploring data and model parallelism has led to significant advancements in the scalability and efficiency of training large neural networks [9]. Additionally, researchers have been investigating the impact of communication strategies, such as gradient compression [10] and decentralized optimization [11], to reduce the communication overhead and latency associated with the distributed training process. Furthermore, novel approaches, such as federated learning [12], have been proposed to enable collaborative learning among multiple devices while preserving data privacy. These studies reflect ongoing efforts to develop more efficient, scalable, and privacy-preserving distributed deep-learning systems, ultimately contributing to the broader applicability of deep learning in various domains.
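The federated learning idea mentioned above can be illustrated with a minimal sketch. It assumes a one-parameter linear model y ≈ w·x, simulated clients, and weight averaging by sample count (in the style of federated averaging); the function names, learning rate, and data are hypothetical and are not taken from the cited studies.

```python
# Federated averaging sketch: clients keep their raw samples private and
# send only an updated model weight to the server. All names and values
# here are illustrative assumptions.

def local_update(w, samples, lr=0.05, steps=5):
    """Run a few gradient-descent steps on one client's private samples."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """One round: clients train locally, server averages weights by sample count."""
    updates = [(local_update(global_w, samples), len(samples)) for samples in clients]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

# Two simulated clients whose data follow y = 2x exactly.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
]

w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
```

Only the scalar weight crosses the client boundary in this sketch, which is the property that makes the approach attractive when raw data must stay on the device.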
There are two main distributed learning strategies. The first is data parallelism [13]. Modern deep learning commonly relies on extensive datasets for more accurate results, and such datasets often do not fit in the memory of a single device. To overcome this issue, the large dataset is divided into small batches whose gradients are calculated individually on different GPUs; the final result is the weighted average of all the calculated gradients. The second is model parallelism [14], which is required when the model (its layers) or its parameters are too large to fit in memory. The deep-learning model is divided into pieces; a few consecutive layers are placed on a single node, and computation passes from node to node in the forward direction, with gradients flowing back in reverse. Synchronous [15] and asynchronous [16] training are the typical ways of coordinating updates under data or model parallelism.
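The weighted average used in data parallelism can be checked on a toy problem. The sketch below assumes a one-parameter linear model with mean-squared-error loss and simulates the workers in a single process; the helper names and data are hypothetical. It shows that averaging per-shard gradients, weighted by shard size, reproduces the gradient over the full dataset.

```python
# Data-parallel gradient averaging for a 1-D linear model y ≈ w * x.
# Workers are simulated; names and data are illustrative assumptions.

def mse_gradient(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def split_into_shards(data, num_workers):
    """Deal samples out round-robin; shard sizes may differ by one."""
    return [data[i::num_workers] for i in range(num_workers)]

def data_parallel_gradient(w, data, num_workers):
    """Each worker computes a local gradient; combine by size-weighted average."""
    shards = [s for s in split_into_shards(data, num_workers) if s]
    local_grads = [mse_gradient(w, shard) for shard in shards]
    total = sum(len(shard) for shard in shards)
    return sum(g * len(shard) for g, shard in zip(local_grads, shards)) / total

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
w = 0.5

full = mse_gradient(w, data)                               # single-device gradient
parallel = data_parallel_gradient(w, data, num_workers=2)  # shards of size 3 and 2
```

Because the shards here have unequal sizes, a plain unweighted mean of the two local gradients would not match `full`; weighting by shard size does.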
Two main libraries support distributed learning: TensorFlow and PyTorch. TensorFlow is widely used in industrial products and provides distributed APIs for data distribution across multiple devices (e.g., GPU and TPU). Users can distribute data and create a training pipeline with minimal changes in the code. A significant drawback of TensorFlow’s distributed APIs is that their support for model distribution comes with many limitations. On the other hand, PyTorch’s distributed APIs are growing quickly for both model and data distribution [17]. PyTorch offers various model and data parallelism options according to the user’s requirements [18] and also gives users the flexibility to develop their own model and data distribution training pipelines.
However, the increase in computational capabilities is significantly outpaced by the expansion of the datasets and models [19], which is why, even after achieving the distribution scenarios of training, the production deployment of these networks remains premature [20]. Consequently, the memory capacity and communication overhead can limit the scaling of data parallelism.

2. Deep Learning Models for Road Pothole Detection

The detection of road potholes using computer vision and machine learning approaches can be a valuable tool to assist with visual challenges [21]. Potholes pose a significant risk to autonomous vehicles: they can damage sensors or suspensions and can lead to accidents or disruptions in traffic flow. The automatic detection of pothole distress in asphalt pavement using improved convolutional neural networks (CNNs) is a promising approach for identifying and addressing potholes in a timely manner, since potholes can cause significant damage to vehicles, disrupt traffic flow, and pose safety hazards to drivers and pedestrians alike [22]. Another line of work rethinks road surface 3D reconstruction and pothole detection, moving from perspective transformation to disparity map segmentation. The traditional method of pothole detection involves using cameras to capture images of the road surface, followed by perspective transformation to create a 3D surface model. However, this method can be time-consuming and computationally expensive [23]. 3Pod is a federated learning-based system for 3D pothole detection in smart transportation. It uses a distributed approach in which data is collected from various sensors installed in the vehicles and then sent to a centralized server for processing using federated learning techniques. This approach helps improve the accuracy and efficiency of pothole detection while ensuring data privacy. One drawback is that the system requires a large amount of computational power and data storage to process and store the 3D point clouds [24]. Traditional distributed deep-learning pothole detection systems may not be accurate or reliable enough for use in self-driving cars, as they may be affected by various factors, such as lighting conditions, weather, and road surface variations.
Moreover, the system’s reliability depends on both hardware and software. The hardware components, such as the sensors and processors, must capture and process data accurately for the software to analyze and interpret it effectively. Therefore, it is essential to ensure that the hardware is of high quality and meets the necessary specifications. Cutting-edge development also calls for high-end distributed strategies. Developing a high-end distributed environment for pothole and road distress detection, as a use case of self-driving cars, requires an in-depth understanding of distributed deep learning. Distributed model analysis is the foundation of an Oracle tool that can help to identify the limitations and bottlenecks of various parallelism approaches as they scale. Oracle was assessed using six parallelization algorithms, four CNN models, and different datasets (2D and 3D) on up to 1024 GPUs. Compared to empirical results, the Oracle tool has an average accuracy of roughly 86.74% and a data parallelism accuracy of up to 97.57% [25]. However, GPU processing performance and training throughput are severely limited by excessive memory consumption. To tackle this issue, a framework named Hippie was proposed [26]. Hippie is a hybrid parallel training framework that combines pipeline and data parallelism to increase the memory efficiency and scalability of massive DNN training. Its hybrid parallel approach hides gradient communication to boost training throughput and scalability. Hippie was implemented on top of PyTorch and NCCL. According to tests on diverse models, Hippie achieved above 90% scaling efficiency on a 16-GPU architecture. Hippie also boosts performance by up to 80% while reducing memory overhead by 57%, resulting in a memory efficiency of 4.18×.
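To make the scaling-efficiency figure concrete, the sketch below assumes the common definition, efficiency = speedup / number of devices. The source does not state which definition the Hippie authors used, so the resulting speedup is an illustration under that assumption, not a reported result. The function names are hypothetical.

```python
# Scaling efficiency under the assumed definition: speedup / device count.

def scaling_efficiency(speedup, num_devices):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / num_devices

def speedup_from_efficiency(efficiency, num_devices):
    """Speedup over one device implied by a given scaling efficiency."""
    return efficiency * num_devices

# 90% efficiency on 16 GPUs corresponds to roughly a 14.4x speedup over
# one GPU, so "above 90%" would imply more than that under this definition.
implied_speedup = speedup_from_efficiency(0.90, 16)
```

Under the same definition, perfect linear scaling on 16 GPUs would give a speedup of 16 and an efficiency of 1.0.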
However, significant speed-up problems were observed for inherently sequential tasks. HyPar-Flow is a single API for data, model, and hybrid parallel training at scale for any Keras model. To accumulate and average gradients across model replicas, it employs the all-reduce algorithm. HyPar-Flow offers several notable benefits. First, it offers up to 1.6 times the speed of Horovod-based data-parallel training in sequential tasks (Horovod is an open-source package that addresses scaling challenges in inter-GPU communication [27]). Second, HyPar-Flow can achieve 110 times the speed of a single node when deployed. Lastly, for ResNet-1001, an ultra-deep model, HyPar-Flow reaches 481 times the speed of single-node performance when implemented on 512 Frontera nodes. These results highlight the performance and scalability of HyPar-Flow, but they do not address the potential increase in communication overhead caused by combining data and model parallelism, which could reduce its overall efficiency in specific scenarios. Communication overhead is one of the most significant roadblocks to training big deep-learning models at scale. Gradient sparsification is a promising technique for reducing the amount of data transmitted, but two difficulties remain. First, developing a scalable and efficient sparse all-reduce method has proven to be complex. Second, the sparsification overhead limits the potential speed improvement. Both issues were addressed for big and distributed deep-learning models by Ok-TOPK, a distributed training system with sparse gradients [28].
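The basic idea behind gradient sparsification can be sketched as follows. This is a generic top-k selection with a naive aggregation step, written for illustration only; it is not the sparse all-reduce algorithm used by Ok-TOPK, and the function names and gradient values are hypothetical.

```python
# Top-k gradient sparsification: each worker sends only its k
# largest-magnitude gradient entries as (index, value) pairs.

def top_k_sparsify(gradient, k):
    """Keep the k largest-magnitude entries, returned in index order."""
    ranked = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)
    kept = sorted(ranked[:k])
    return [(i, gradient[i]) for i in kept]

def sparse_average(sparse_grads, length):
    """Naive aggregation: sum sparse contributions, divide by worker count."""
    dense = [0.0] * length
    for pairs in sparse_grads:
        for i, value in pairs:
            dense[i] += value
    return [value / len(sparse_grads) for value in dense]

# Simulated gradients from two workers (illustrative values).
worker_grads = [
    [0.9, -0.1, 0.05, -1.2, 0.0, 0.3],
    [0.8, 0.2, -0.02, -1.0, 0.1, -0.4],
]

sparse = [top_k_sparsify(g, k=2) for g in worker_grads]
averaged = sparse_average(sparse, length=6)
```

Each worker transmits two index-value pairs instead of six dense values. Practical systems usually also keep the dropped entries locally and add them to later gradients; that step is omitted here.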
Ok-TOPK combines a decentralized parallel stochastic gradient descent (SGD) optimizer with a novel sparse all-reduce technique (less than 6k communication volume, which is asymptotically optimal). According to empirical results, Ok-TOPK achieves model accuracy comparable to dense all-reduce. Ok-TOPK is also more scalable and significantly improves training performance compared to optimized dense and state-of-the-art sparse all-reduces (e.g., 3.29×–12.95× improvement for BERT on 256 GPUs). Furthermore, a distributed framework was introduced for air quality prediction, featuring Busan, Republic of Korea, as its model city. To forecast the intensity of particle pollution (PM2.5 and PM10), a deep-learning model was trained on a distributed system using data parallelism [29]. To determine how the air quality particles are connected in space and time across the distributed dataset, multiple one-dimensional CNN layers are combined with a stacked attention-based BiLSTM layer to extract local spatial features. The hybrid approach in that research distributed the data and the model asynchronously within the same algorithm. For instance, the data was first distributed and trained with the undistributed model, after which the model was distributed and trained with undistributed data. Additionally, the communication overhead between the GPUs was a more significant concern than the training and epoch time. The research also lacked practical comparisons: the training times of the developed algorithms were analyzed but not compared to state-of-the-art APIs.