Point Cloud Semantic Segmentation

For autonomous vehicles driving in off-road environments, a sensitive environmental perception capability is crucial. However, semantic segmentation in complex scenes remains a challenging task, and most current methods for off-road environments are limited to a single scene type and achieve only low accuracy.

  • point cloud
  • semantic segmentation
  • multi-scale feature fusion

1. Introduction

Autonomous vehicles are an innovative mode of transportation based on advanced sensors, computer vision, and artificial intelligence. They perform driving tasks without human intervention and can therefore operate in various environments more safely and efficiently. In addition to driving on regular, structured roads, autonomous vehicles sometimes need to work in off-road scenes such as battlefields or post-disaster areas. In these cases the road is usually unstructured, which is what characterizes an off-road environment. It is therefore especially important to establish a perception algorithm for autonomous vehicles driving in off-road environments, which will help the unmanned platform achieve full-scene perception.
Environmental perception is an important part of the unmanned platform. Cameras and LiDAR (light detection and ranging) are the two main sensors used to obtain environmental information. Camera-based semantic segmentation algorithms mostly rely on the texture or color features of roads, such as boundaries [1], lane lines [2], or vanishing points [3]. Some depth-camera-based methods [2] use depth information as an auxiliary cue to perform semantic segmentation in off-road environments. Although they achieve good segmentation results, they are not robust to illumination changes. In essence, camera-based environmental perception relies on color and texture, which are strongly affected by lighting, so these methods fail at night. Since off-road missions often require night operation, the camera is not suitable as the main sensor. As an active detection sensor, LiDAR is also widely used in environmental perception algorithms. Compared to other onboard sensors, LiDAR provides richer environmental information [4]. Yu et al. [5] used LiDAR to detect street light poles, and Liu et al. [6] used LiDAR to greatly extend the detection range for vehicles, pedestrians, and cyclists. Unlike a passive receiving device such as a camera, LiDAR remains usable on rainy or foggy days and under drastic changes in light intensity, because its sensitivity to ambient conditions is extremely low. LiDAR has therefore been widely adopted for vehicle-side environmental perception, and its robustness makes it very suitable for off-road environments.

2. Structured Road Scene 3D Semantic Segmentation

Semantic segmentation of a point cloud refers to assigning each input point to its corresponding class so that different types of objects can be distinguished. For the semantic segmentation of 3D point clouds in structured outdoor scenes, the input point cloud can be encoded in three ways: voxel-based, point-based, and projection-based.
In projection-based algorithms, the 3D point cloud is projected into 2D space, and a semantic segmentation network is applied to the resulting “pseudo image”. The segmentation result is then back-projected into the coordinate space of the 3D point cloud by interpolation to realize the semantic segmentation of the original points. Among them, SqueezeSeg [7], SqueezeSegV2 [8], SqueezeSegV3 [9], SalsaNext [10], etc., use spherical projection, while PolarNet [11] and VD3D-FCN [12] use bird’s-eye-view projection for feature extraction.
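As a rough illustration of the spherical projection used by the SqueezeSeg family, the following minimal sketch (in Python/NumPy) maps each 3D point to a pixel of a 2D range image via its azimuth and elevation angles. The image size and vertical field-of-view limits are assumed values typical of a rotating LiDAR, not parameters taken from the cited papers.

    import numpy as np

    def spherical_projection(points, H=64, W=1024, fov_up=3.0, fov_down=-25.0):
        """Project an (N, 3) LiDAR point cloud onto an H x W range image.

        fov_up/fov_down are vertical field-of-view limits in degrees;
        all default values here are illustrative assumptions.
        """
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        depth = np.linalg.norm(points, axis=1)
        yaw = np.arctan2(y, x)                          # azimuth in [-pi, pi]
        pitch = np.arcsin(z / np.maximum(depth, 1e-8))  # elevation angle

        fov_total = np.radians(fov_up - fov_down)
        u = 0.5 * (1.0 - yaw / np.pi) * W                            # column from azimuth
        v = (1.0 - (pitch - np.radians(fov_down)) / fov_total) * H   # row from elevation
        u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
        v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

        range_image = np.zeros((H, W), dtype=np.float32)
        range_image[v, u] = depth        # later points overwrite earlier ones in a cell
        return range_image, (v, u)       # (v, u) allows back-projection of 2D results

The returned pixel indices are what makes the back-projection step possible: labels predicted on the range image are simply read back at (v, u) for each original point.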
Voxel-based semantic segmentation algorithms re-encode 3D space with voxels. For example, VoxelNet [13] is a typical algorithm that uses voxels to achieve semantic segmentation of 3D point clouds. It divides 3D space into equally spaced cells called voxels. Each voxel is converted into a unified feature representation vector by a VFE (voxel feature encoding) layer, and feature extraction is performed on this representation.
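The core of voxel-based encoding is the regular grid index. The minimal sketch below groups points into voxels the way a VFE stage expects its input; it covers the grouping only, since the learned encoding layer is a network in its own right, and the 0.2 m voxel size is an assumed value.

    import numpy as np
    from collections import defaultdict

    def voxelize(points, voxel_size=0.2):
        """Group an (N, 3) point cloud into cubic voxels of the given edge length.

        Returns a dict mapping an integer voxel index (i, j, k) to the array of
        points inside that voxel. A VFE-style layer would then encode each
        group into a single feature vector.
        """
        indices = np.floor(points[:, :3] / voxel_size).astype(np.int32)
        voxels = defaultdict(list)
        for idx, pt in zip(map(tuple, indices), points):
            voxels[idx].append(pt)
        return {k: np.stack(v) for k, v in voxels.items()}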
Point-based semantic segmentation directly extracts features from the original unordered point set: a multi-layer perceptron performs semantic encoding and spatial position computation on the points themselves. For example, PointNet [14], PointNet++ [15], RandLA-Net [16], and KPConv [17] are point-based algorithms.
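Point-based methods apply the same multi-layer perceptron to every point and then aggregate with a symmetric function, so the result is invariant to the ordering of the points. The sketch below shows this PointNet-style pattern with assumed layer widths; the assertion at the end demonstrates the order invariance.

    import numpy as np

    def shared_mlp(points, w1, b1, w2, b2):
        """PointNet-style encoder: per-point MLP followed by max pooling.

        points: (N, 3); w1: (3, 64); b1: (64,); w2: (64, 128); b2: (128,).
        The layer widths are illustrative, not those of the cited networks.
        """
        h = np.maximum(points @ w1 + b1, 0.0)   # same ReLU MLP applied to every point
        h = np.maximum(h @ w2 + b2, 0.0)
        return h.max(axis=0)                    # symmetric aggregation over points

    rng = np.random.default_rng(0)
    pts = rng.normal(size=(1000, 3))
    params = [rng.normal(size=s) for s in [(3, 64), (64,), (64, 128), (128,)]]
    f1 = shared_mlp(pts, *params)
    f2 = shared_mlp(pts[rng.permutation(1000)], *params)
    assert np.allclose(f1, f2)                  # permuting the points changes nothing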
These three feature extraction methods have their own advantages and disadvantages. Projection-based methods are usually faster than methods that extract features directly in three-dimensional space, but the precision loss caused by the forward and backward projections cannot be ignored. Voxel-based methods are also widely used: after voxel encoding, both deep learning and traditional clustering algorithms can effectively perform recognition or segmentation tasks. However, 3D convolution becomes less efficient as the data size increases. Point-based algorithms are computationally efficient but have poor locality and lose features easily, which makes it difficult to segment small objects from large ones. The PVCNN [18] algorithm addressed this by fusing voxel and point representations for feature extraction, greatly improving accuracy and efficiency. Subsequently, RPVNet [19] fused all three feature extraction methods and obtained excellent semantic segmentation results.
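The point-voxel fusion idea behind PVCNN can be summarized as follows: a coarse voxel branch captures neighborhood structure cheaply, a fine per-point branch preserves detail, and the two are combined per point. The sketch below is a strong simplification that mean-pools voxel features and devoxelizes by nearest-voxel lookup; PVCNN itself uses learned 3D convolutions and trilinear interpolation.

    import numpy as np

    def point_voxel_fusion(points, point_feats, voxel_size=0.5):
        """Fuse a fine point branch with a coarse voxel branch (simplified).

        points: (N, 3); point_feats: (N, C) features from a per-point MLP.
        The voxel branch here is a mean pool per voxel, and the voxel size
        is an assumed value.
        """
        idx = np.floor(points / voxel_size).astype(np.int32)
        keys, inverse = np.unique(idx, axis=0, return_inverse=True)
        inverse = inverse.reshape(-1)

        # Voxel branch: average the features of all points in each voxel.
        voxel_feats = np.zeros((len(keys), point_feats.shape[1]))
        np.add.at(voxel_feats, inverse, point_feats)
        voxel_feats /= np.bincount(inverse, minlength=len(keys))[:, None]

        # Devoxelize (nearest voxel) and fuse by per-point addition.
        return point_feats + voxel_feats[inverse]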

3. Off-Road Point Cloud Semantic Segmentation

The above methods for structured roads provide many valuable references for off-road segmentation. The biggest difference between off-road environments and structured road scenes is that the drivable area has no lane lines, no obvious road boundaries, and often no regular shape at all. Off-road terrain differs so much from structured roads that 3D point cloud semantic segmentation algorithms designed for structured scenes are difficult to apply directly. At present, semantic segmentation algorithms for off-road scenes fall into three main categories: feature engineering based on point clouds, weakly supervised learning, and transfer learning.
Feature engineering algorithms based on point clouds perform road segmentation by extracting the geometric features of roads in off-road scenes. Liu et al. [20] focus on identifying negative obstacles on the road. First, three LiDARs are installed directly above and on both sides of the vehicle. A mathematical model of the LiDAR scan line is then established, and an adaptive filtering algorithm based on this model identifies negative obstacles. Finally, the results of the three LiDARs are fused to detect the drivable area and negative obstacles. Chen et al. [21] instead project the LiDAR point cloud onto a two-dimensional image plane and generate a histogram from it. Water, positive obstacles, and drivable areas in off-road scenes are detected from the histogram, and the result is back-projected into the LiDAR coordinate system. Although feature engineering has achieved good results in specific off-road scenes, it has significant constraints: it can classify only a few specific elements of a scene and fails to adapt to varied off-road environments.
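As a much-reduced illustration of the histogram idea in [21], the sketch below takes the dominant bin of a point-height histogram as the local ground level and labels points relative to it. The cited work operates on a row-wise histogram of a projected range image, which this simplification does not reproduce; the bin width and margin are assumed values.

    import numpy as np

    def label_by_height_histogram(points, bin_width=0.1, obstacle_margin=0.3):
        """Coarsely classify points as drivable, positive obstacle, or below ground.

        The most populated height bin is taken as the ground level. Both
        parameters (in metres) are illustrative assumptions.
        """
        z = points[:, 2]
        edges = np.arange(z.min(), z.max() + bin_width, bin_width)
        hist, edges = np.histogram(z, bins=edges)
        ground = edges[np.argmax(hist)] + 0.5 * bin_width   # dominant height = ground

        labels = np.full(len(points), "drivable", dtype=object)
        labels[z > ground + obstacle_margin] = "positive_obstacle"
        labels[z < ground - obstacle_margin] = "below_ground"  # candidate water/negative obstacle
        return labels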
Gao et al. [22] projected the original point cloud onto the image plane through a bird’s-eye view and used the GPS information of the moving vehicle to obtain the driving trajectory. On the projected image, a region-growing algorithm seeded on the driving trajectory automatically generates labels for the drivable region; combined with a small amount of manually labeled data as the training set, a good segmentation result is achieved while the workload of manual annotation is greatly reduced. Holder et al. [23] use an existing CNN framework pre-trained on a dataset of urban structured road scenes and then use a small dataset of off-road scenes to re-determine the segmentation classes for transfer learning. While achieving good results, this effectively reduces the labeling of off-road scene LiDAR point cloud data.
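A minimal sketch of trajectory-seeded region growing on a bird’s-eye-view height map: starting from grid cells the vehicle is known to have driven over, neighboring cells are added while the height step between them stays small. The grid representation, seed format, and 0.15 m threshold are illustrative assumptions, not the exact procedure of [22].

    import numpy as np
    from collections import deque

    def grow_drivable(height_map, seeds, max_step=0.15):
        """Flood-fill a drivable-region mask on a 2D BEV height map.

        height_map: (H, W) mean point height per cell (NaN where empty).
        seeds: iterable of (row, col) cells lying on the GPS driving trajectory.
        max_step: largest allowed height difference between neighbors (assumed).
        """
        H, W = height_map.shape
        mask = np.zeros((H, W), dtype=bool)
        queue = deque()
        for s in seeds:
            if not np.isnan(height_map[s]):
                mask[s] = True
                queue.append(s)

        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connected growth
                nr, nc = r + dr, c + dc
                if (0 <= nr < H and 0 <= nc < W and not mask[nr, nc]
                        and not np.isnan(height_map[nr, nc])
                        and abs(height_map[nr, nc] - height_map[r, c]) <= max_step):
                    mask[nr, nc] = True
                    queue.append((nr, nc))
        return mask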
To sum up, the main obstacle to designing semantic segmentation algorithms for off-road scenes is the lack of datasets. Existing algorithms mainly use geometric features or combine specific algorithms with a small amount of data to perform semantic segmentation, but low accuracy remains a major problem. Therefore, on one hand, research should focus on how to obtain a large amount of high-quality data: relying on computer simulation, typical off-road scenes can be built to produce large, accurately labeled datasets. On the other hand, more targeted algorithms should be designed according to the characteristics of off-road scenes. Both aspects have important engineering value and academic significance for improving the semantic segmentation accuracy of off-road scenes.

References

  1. Yuan, Y.; Jiang, Z.; Wang, Q. Video-based Road detection via online structural learning. Neurocomputing 2015, 168, 336–347.
  2. Broggi, A.; Cardarelli, E.; Cattani, S.; Sabbatelli, M. Terrain mapping for off-road Autonomous Ground Vehicles using rational B-Spline surfaces and stereo vision. In Proceedings of the 2013 IEEE Intelligent Vehicles Symposium (IV), Gold Coast, Australia, 23–26 June 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 648–653.
  3. Wang, P.-S.; Liu, Y.; Sun, C.; Tong, X. Adaptive O-CNN: A Patch-based Deep Representation of 3D Shapes. ACM Trans. Graph. 2018, 37, 1–11.
  4. Xia, X.; Bhatt, N.P.; Khajepour, A.; Hashemi, E. Integrated Inertial-LiDAR-Based Map Matching Localization for Varying Environments. IEEE Trans. Intell. Veh. 2023, 1–12.
  5. Yu, Y.; Li, J.; Guan, H.; Wang, C.; Yu, J. Semiautomated extraction of street light poles from mobile LiDAR point-clouds. IEEE Trans. Geosci. Remote Sens. 2014, 53, 1374–1386.
  6. Liu, H.; Lin, C.; Gong, B.; Wu, D. Extending the Detection Range for Low-Channel Roadside LiDAR by Static Background Construction. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–12.
  7. Wu, B.; Wan, A.; Yue, X.; Keutzer, K. Squeezeseg: Convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3d lidar point cloud. In Proceedings of the ICRA, IEEE International Conference on Robotics and Automation, Brisbane, Australia, 21–25 May 2018; pp. 1887–1893.
  8. Wu, B.; Zhou, X.; Zhao, S.; Yue, X.; Keutzer, K. Squeezesegv2: Improved model structure and unsupervised domain adaptation for road-object segmentation from a lidar point cloud. In Proceedings of the IEEE International Conference on Robotics and Automation, Montreal, QC, Canada, 20–24 May 2019; pp. 4376–4382.
  9. Xu, C.; Wu, B.; Wang, Z.; Zhan, W.; Vajda, P.; Keutzer, K.; Tomizuka, M. Squeezesegv3: Spatially-adaptive convolution for efficient point-cloud segmentation. arXiv 2020, arXiv:2004.01803.
  10. Cortinhal, T.; Tzelepis, G.; Erdal Aksoy, E. SalsaNext: Fast, uncertainty-aware semantic segmentation of LiDAR point clouds. In Proceedings of the International Symposium on Visual Computing, San Diego, CA, USA, 5–7 October 2020; Springer: Cham, Switzerland, 2020; pp. 207–222.
  11. Zhang, Y.; Zhou, Z.; David, P.; Yue, X.; Xi, Z.; Gong, B.; Foroosh, H. Polarnet: An improved grid representation for online lidar point clouds semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9601–9610.
  12. Li, B.; Zhang, T.; Xia, T. Vehicle detection from 3d lidar using fully convolutional network. arXiv 2016, arXiv:1608.07916.
  13. Zhou, Y.; Tuzel, O. Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499.
  14. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
  15. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. arXiv 2017, arXiv:1706.02413.
  16. Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Randla-net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11108–11117.
  17. Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420.
  18. Liu, Z.; Tang, H.; Lin, Y.; Han, S. Point-voxel cnn for efficient 3d deep learning. arXiv 2019, arXiv:1907.03739.
  19. Xu, J.; Zhang, R.; Dou, J.; Zhu, Y.; Sun, J.; Pu, S. Rpvnet: A deep and efficient range-point-voxel fusion network for lidar point cloud segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 16024–16033.
  20. Liu, T.; Liu, D.; Yang, Y.; Chen, Z. Lidar-based traversable region detection in off-road environment. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 4548–4553.
  21. Chen, L.; Yang, J.; Kong, H. Lidar-histogram for fast road and obstacle detection. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1343–1348.
  22. Gao, B.; Xu, A.; Pan, Y.; Zhao, X.; Yao, W.; Zhao, H. Off-road drivable area extraction using 3D LiDAR data. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1505–1511.
  23. Holder, C.J.; Breckon, T.P.; Wei, X. From on-road to off: Transfer learning within a deep convolutional neural network for segmentation and classification of off-road scenes. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 149–162.