LiDAR Point Clouds Semantic Segmentation in Autonomous Driving

Although semantic segmentation of 2D images is crucial to attaining scene understanding, visual sensors still have limitations, such as poor performance under insufficient light, the lack of depth information, and a limited field of view. In contrast, LiDAR obtains accurate depth information with high density and a wide field of view regardless of lighting conditions, which makes it a more reliable source of information for environmental perception.

Keywords: LiDAR point clouds; semantic segmentation; deep learning; asymmetric convolution

1. Introduction

Scene understanding is one of the most critical tasks in autonomous driving. With the challenges introduced by autonomous driving, a detailed and accurate understanding of the road scene has become a core component of any outdoor autonomous robotic system. Although semantic segmentation of 2D images is crucial to attaining scene understanding, visual sensors still have limitations, such as poor performance under insufficient light, the lack of depth information, and a limited field of view. In contrast, LiDAR obtains accurate depth information with high density and a wide field of view regardless of lighting conditions, which makes it a more reliable source of information for environmental perception. Therefore, scene understanding of LiDAR point clouds via semantic segmentation has become a focal point in autonomous driving.
According to how the point clouds are encoded, current LiDAR point cloud semantic segmentation methods can be divided into three categories: point-based methods, voxel-based methods, and projection-based methods. In terms of speed, point-based and voxel-based methods incur heavy computation and memory consumption, which makes it difficult to achieve real-time performance on an on-board computing platform. In autonomous driving, real-time performance should be given a higher priority than segmentation accuracy. In contrast, projection-based methods are lightweight and fast, so real-time performance can be achieved during deployment. In terms of segmentation accuracy, projection-based methods have shown some success; however, since the point cloud information is not fully utilized during feature extraction, there is still room to improve segmentation accuracy.
While maintaining real-time performance, improving segmentation accuracy is of great relevance in autonomous driving scenarios. PolarNet [1] is the baseline network of ACPNet; it encodes point clouds through a polar bird's-eye-view (BEV) representation. A BEV is a top-down perspective that views an object or scene from above, like a bird looking down at the ground; it is also known as the God's-eye view and is commonly used as a coordinate system for describing perception of the surrounding world. Using a polar BEV has two advantages. First, in terms of point allocation within grid cells, the polar BEV assigns point clouds to their respective grid cells more evenly. Second, since this partitioning yields a more balanced distribution of points, the theoretical upper bound on the prediction accuracy of point cloud semantic classification is raised, thereby improving the performance of downstream semantic segmentation models [1]. In ACPNet, the encoded point cloud features are fed into an Asymmetric Convolution Backbone Network (ACBN) for feature extraction. The features extracted by the backbone are then input to a Contextual Feature Enhancement Module (CFEM) for further mining of contextual features. Moreover, global scaling and global translation are used as Enhanced Data Augmentation (EDA) during ACPNet training.
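To make the polar partitioning concrete, the following is a minimal NumPy sketch of how points could be binned into a polar BEV grid in the spirit of PolarNet [1]; the grid resolution and radial range here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def polar_bev_bins(points, num_r=480, num_theta=360, r_max=50.0):
    """Assign each LiDAR point (x, y, z, ...) to a polar BEV grid cell.

    A toy sketch of polar partitioning; grid resolution and radial
    range are illustrative assumptions.
    """
    x, y = points[:, 0], points[:, 1]
    r = np.sqrt(x ** 2 + y ** 2)        # radial distance from the sensor
    theta = np.arctan2(y, x)            # azimuth angle in [-pi, pi]

    # Quantize to grid indices, clipping points beyond the radial range.
    r_idx = np.clip((r / r_max * num_r).astype(int), 0, num_r - 1)
    t_idx = ((theta + np.pi) / (2 * np.pi) * num_theta).astype(int) % num_theta
    return r_idx, t_idx

# Usage: bin a random cloud and count points per cell.
pts = np.random.randn(1000, 4) * 10
r_idx, t_idx = polar_bev_bins(pts)
counts = np.zeros((480, 360), dtype=int)
np.add.at(counts, (r_idx, t_idx), 1)
```

Because cells near the sensor are small and cells far away are large, this polar quantization spreads the (dense near, sparse far) LiDAR points more evenly across cells than a Cartesian grid would.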

2. LiDAR Point Clouds Semantic Segmentation in Autonomous Driving Based on Asymmetrical Convolution

Due to the sparsity and disorder of point clouds, encoding the input point cloud is a crucial issue when using convolutional neural networks for semantic segmentation of 3D point clouds. Existing point cloud encoding methods can be divided into three categories: point-based methods, voxel-based methods, and projection-based methods.

2.1. Point-Based Methods

PointNet [2] learns point cloud features point-wise and uses max pooling to integrate a global feature. PointNet++ [3] extends PointNet and strengthens the extraction of local information at different scales. PointConv [4] proposes a spatially continuous convolution, which effectively reduces the memory consumption of the algorithm. For semantic segmentation of large-scale point cloud scenes, SPG [5] represents the point clouds as interconnected superpoint graphs and then uses PointNet to learn the features of the superpoint graph. RandLA-Net [6] designs an attention-based module to integrate local features, achieving efficient segmentation on large-scale point clouds. KPConv [7] further improves segmentation performance with a novel spatial kernel-based point convolution. Lu et al. [8] suggested the use of distinct aggregation strategies for within-category and between-category data. Applying aggregation or enhancement techniques to local features [9] can effectively improve the perception of fine details. Furthermore, to learn features effectively from extensive point clouds encompassing diverse target types, Fan et al. [10] introduced SCF-Net, which incorporates a dual-distance attention mechanism and global contextual features to enhance semantic segmentation performance.
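To illustrate the point-wise learning idea behind PointNet [2], the following is a minimal PyTorch sketch of a shared per-point MLP followed by max pooling into a single global feature; the layer widths are illustrative assumptions, and the sketch omits PointNet's input and feature transform networks.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Toy PointNet-style encoder: a shared per-point MLP followed by
    max pooling over points to form one global feature vector.
    Layer widths are illustrative, not those of the original paper."""

    def __init__(self, in_dim=3, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(              # applied identically to every point
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, points):                 # points: (B, N, 3)
        per_point = self.mlp(points)           # (B, N, feat_dim)
        global_feat, _ = per_point.max(dim=1)  # symmetric op -> order-invariant
        return global_feat                     # (B, feat_dim)

x = torch.randn(2, 1024, 3)                    # 2 clouds of 1024 points each
print(TinyPointNet()(x).shape)                 # torch.Size([2, 128])
```

The max pooling is the key design choice: as a symmetric function, it makes the output invariant to the ordering of the input points, which directly addresses the disorderliness of point clouds noted above.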
Point-based methods work directly on the raw point clouds without excessive initial transformation steps. However, when handling large point cloud scenes, local nearest-neighbor search is inevitably involved, which is computationally inefficient. Thus, there is still clear room for improvement in point-based methods.

2.2. Voxel-Based Methods

In voxel-based methods, the point cloud is regularly divided into cubic 3D voxels and features are extracted with 3D convolution. SEGCloud [11] is one of the earlier methods for semantic segmentation based on a voxel representation. To use 3D convolution efficiently and expand the receptive field, Minkowski CNN [13] employs 3D sparse convolution [12], which reduces the computational complexity of the convolution. In pursuit of higher segmentation accuracy, SPVNAS [14], a model based on neural architecture search (NAS), trades high computational cost for accuracy. To fit the spatial distribution of LiDAR point clouds, Cylinder3D [15] proposes a cylindrical voxel partition, which yields high accuracy. To streamline computation and sharpen the details of smaller instances, Cheng et al. [16] proposed an attention-focused feature fusion module and an adaptive feature selection module. To improve the speed of voxel-based networks, PVKD [17] proposes point-to-voxel knowledge distillation to achieve model compression.
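To make the voxel encoding concrete, the following is a minimal NumPy sketch of the cubic voxelization step that voxel-based methods share before applying (sparse) 3D convolution; the voxel size is an illustrative assumption.

```python
import numpy as np

def voxelize(points, voxel_size=0.2):
    """Quantize points into cubic voxels of edge length `voxel_size` (meters).

    Returns each point's integer voxel coordinate, the set of occupied
    voxels (what a sparse 3D convolution would operate on), and an
    index mapping each point back to its voxel."""
    coords = np.floor(points[:, :3] / voxel_size).astype(np.int32)
    occupied, inverse = np.unique(coords, axis=0, return_inverse=True)
    return coords, occupied, inverse

# Usage: voxelize a random cloud and inspect the occupied voxels.
pts = np.random.uniform(-10, 10, size=(5000, 3))
_, occupied, inverse = voxelize(pts)
print(occupied.shape)   # (num_occupied_voxels, 3)
```

Sparse convolution libraries exploit exactly this structure: since LiDAR scans occupy only a small fraction of the 3D grid, computing only on the occupied voxels avoids the cubic memory blow-up of dense 3D convolution.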
Voxel-based methods typically achieve high segmentation accuracy. However, they inevitably rely on 3D convolution, resulting in significant memory occupation and high computational cost.

2.3. Projection-Based Methods

The basic idea behind projection-based methods is to transform point clouds into images on which 2D convolution can operate. The SqueezeSeg series of algorithms [18][19][20], based on SqueezeNet [21], performs semantic segmentation after projecting the point clouds. RangeNet++ [22] implements semantic segmentation on a DarkNet53 [23] backbone and proposes a K-Nearest-Neighbor (KNN) post-processing step to improve segmentation accuracy. 3D-MiniNet [24] builds its network on a lightweight backbone, achieving faster speed. PolarNet [1] proposes a polar BEV representation that uses a simplified PointNet to encode the point cloud of each polar grid cell into a pseudo-image, so that KNN post-processing is no longer needed. Peng et al. [25] introduced a multi-attention mechanism to enhance driving-scene understanding, focusing on dense top-view semantic segmentation from sparse LiDAR data. SalsaNext [26] introduced a new context module, replaced the ResNet encoder blocks with a residual convolution stack with increasing receptive fields, and incorporated a pixel-shuffle layer into the decoder. MINet [27] employed multiple paths at varying scales to distribute computational resources effectively across scales. FIDNet-Point [28] designed a fully interpolation decoding module that directly upsamples multi-resolution feature maps using bilinear interpolation. CENet+KNN [29] incorporated convolutional layers with larger kernels in place of MLPs and integrated multiple auxiliary segmentation heads into its architecture.
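To illustrate the projection step itself, the following is a minimal NumPy sketch of a spherical (range-image) projection in the spirit of the SqueezeSeg/RangeNet++ line of work [18][22]; the image resolution and vertical field of view are illustrative values for a 64-beam rotating LiDAR, not parameters from any specific paper.

```python
import numpy as np

def range_projection(points, H=64, W=1024, fov_up=3.0, fov_down=-25.0):
    """Project a LiDAR cloud onto a 2D range image via spherical angles.

    H, W and the vertical field of view (degrees) are illustrative
    values for a 64-beam rotating LiDAR."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1) + 1e-8

    yaw = np.arctan2(y, x)                         # horizontal angle
    pitch = np.arcsin(z / depth)                   # vertical angle
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)

    u = 0.5 * (1.0 - yaw / np.pi) * W              # column from azimuth
    v = (1.0 - (pitch - fov_down_r) / (fov_up_r - fov_down_r)) * H  # row

    u = np.clip(np.floor(u), 0, W - 1).astype(int)
    v = np.clip(np.floor(v), 0, H - 1).astype(int)

    img = np.full((H, W), -1.0, dtype=np.float32)  # -1 marks empty pixels
    img[v, u] = depth                              # later points overwrite earlier
    return img

# Usage: project a random cloud into a 64 x 1024 range image.
pts = np.random.randn(2048, 3) * 10
print(range_projection(pts).shape)                 # (64, 1024)
```

Note the overwrite in the last assignment: several 3D points can fall onto the same pixel, which is one source of the information loss these methods suffer and one reason RangeNet++ [22] adds KNN post-processing to recover per-point labels.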
Projection-based methods have clear advantages in computational complexity and speed. Therefore, improving the segmentation accuracy of projection-based methods is important for their practical application in autonomous driving.

References

  1. Zhang, Y.; Zhou, Z.; David, P.; Yue, X.; Xi, Z.; Gong, B.; Foroosh, H. Polarnet: An improved grid representation for online lidar point clouds semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9601–9610.
  2. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
  3. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30, 5099–5108.
  4. Wu, W.; Qi, Z.; Fuxin, L. Pointconv: Deep convolutional networks on 3d point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9621–9630.
  5. Landrieu, L.; Simonovsky, M. Large-scale point cloud semantic segmentation with superpoint graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4558–4567.
  6. Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Randla-net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11108–11117.
  7. Thomas, H.; Qi, C.R.; Deschaud, J.-E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420.
  8. Lu, T.; Wang, L.; Wu, G. Cga-net: Category guided aggregation for point cloud semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11693–11702.
  9. Qiu, S.; Anwar, S.; Barnes, N. Semantic segmentation for real point cloud scenes via bilateral augmentation and adaptive fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1757–1767.
  10. Fan, S.; Dong, Q.; Zhu, F.; Lv, Y.; Ye, P.; Wang, F.-Y. SCF-Net: Learning spatial contextual features for large-scale point cloud segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14504–14513.
  11. Tchapmi, L.; Choy, C.; Armeni, I.; Gwak, J.; Savarese, S. Segcloud: Semantic segmentation of 3d point clouds. In Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017; pp. 537–547.
  12. Graham, B.; Engelcke, M.; Van Der Maaten, L. 3d semantic segmentation with submanifold sparse convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9224–9232.
  13. Choy, C.; Gwak, J.; Savarese, S. 4d spatio-temporal convnets: Minkowski convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3075–3084.
  14. Tang, H.; Liu, Z.; Zhao, S.; Lin, Y.; Lin, J.; Wang, H.; Han, S. Searching efficient 3d architectures with sparse point-voxel convolution. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 685–702.
  15. Zhou, H.; Zhu, X.; Song, X.; Ma, Y.; Wang, Z.; Li, H.; Lin, D. Cylinder3d: An effective 3d framework for driving-scene lidar semantic segmentation. arXiv 2020, arXiv:2008.01550.
  16. Cheng, R.; Razani, R.; Taghavi, E.; Li, E.; Liu, B. (AF)2-S3Net: Attentive feature fusion with adaptive feature selection for sparse semantic segmentation network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12547–12556.
  17. Hou, Y.; Zhu, X.; Ma, Y.; Loy, C.C.; Li, Y. Point-to-voxel knowledge distillation for lidar semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8479–8488.
  18. Wu, B.; Wan, A.; Yue, X.; Keutzer, K. Squeezeseg: Convolutional neural nets with recurrent crf for real-time road-object segmentation from 3d lidar point cloud. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 1887–1893.
  19. Wu, B.; Zhou, X.; Zhao, S.; Yue, X.; Keutzer, K. Squeezesegv2: Improved model structure and unsupervised domain adaptation for road-object segmentation from a lidar point cloud. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 4376–4382.
  20. Xu, C.; Wu, B.; Wang, Z.; Zhan, W.; Vajda, P.; Keutzer, K.; Tomizuka, M. Squeezesegv3: Spatially-adaptive convolution for efficient point-cloud segmentation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 1–19.
  21. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
  22. Milioto, A.; Vizzo, I.; Behley, J.; Stachniss, C. Rangenet++: Fast and accurate lidar semantic segmentation. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 4213–4220.
  23. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
  24. Alonso, I.; Riazuelo, L.; Montesano, L.; Murillo, A.C. 3d-mininet: Learning a 2d representation from point clouds for fast and efficient 3d lidar semantic segmentation. IEEE Robot. Autom. Lett. 2020, 5, 5432–5439.
  25. Peng, K.; Fei, J.; Yang, K.; Roitberg, A.; Zhang, J.; Bieder, F.; Heidenreich, P.; Stiller, C.; Stiefelhagen, R. MASS: Multi-attentional semantic segmentation of LiDAR data for dense top-view understanding. IEEE Trans. Intell. Transp. Syst. 2022, 23, 15824–15840.
  26. Cortinhal, T.; Tzelepis, G.; Erdal Aksoy, E. Salsanext: Fast, uncertainty-aware semantic segmentation of lidar point clouds. In Proceedings of the Advances in Visual Computing: 15th International Symposium, ISVC 2020, San Diego, CA, USA, 5–7 October 2020; Proceedings, Part II 15. pp. 207–222.
  27. Li, S.; Chen, X.; Liu, Y.; Dai, D.; Stachniss, C.; Gall, J. Multi-scale interaction for real-time lidar data segmentation on an embedded platform. IEEE Robot. Autom. Lett. 2021, 7, 738–745.
  28. Zhao, Y.; Bai, L.; Huang, X. Fidnet: Lidar point cloud semantic segmentation with fully interpolation decoding. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 4453–4458.
  29. Cheng, H.X.; Han, X.F.; Xiao, G.Q. Cenet: Toward concise and efficient lidar semantic segmentation for autonomous driving. In Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME), Taipei, Taiwan, 18–22 July 2022; pp. 1–6.