Self-Attention-Based 3D Object Detection for Autonomous Driving

Autonomous vehicles (AVs) play a crucial role in enhancing urban mobility within the context of a smarter and more connected urban environment. Three-dimensional object detection in AVs is an essential task for comprehending the driving environment to contribute to their safe use in urban environments. 

  • smart cities
  • 3D object detection
  • semantic feature learning
  • self-attention

1. Introduction

Smart sustainable cities use ICT for efficient operations, information sharing, better government services, and citizen well-being, prioritizing technological efficiency over availability to improve urban life [1][2][3][4]. Autonomous vehicles offer immersive user experiences, shaping future human–machine interactions in smart cities [5][6]. Mobility as a service is set to transform urban mobility in terms of sustainability [7]. Cities seek smart mobility solutions to address transport issues [8]. The perceived benefits of AVs drive their adoption and help mitigate safety concerns. AVs promise traffic improvements, enhanced public transport, safer streets, and a better quality of life in eco-conscious digital cities [9].
At the core of AV technology lies 3D object detection, a fundamental capability that enables AVs to perceive their surroundings in three dimensions and is vital for safe autonomous vehicle navigation in smart cities [10][11]. It identifies and comprehends surrounding objects in 3D, enabling obstacle avoidance, path planning, and collision prevention [12]. Advancements in this technology enhance urban life through improved autonomous vehicle perception [13][14]. Autonomous vehicles are equipped with various sensors, including cameras, LiDAR (light detection and ranging), radar, and sometimes ultrasonic sensors. These sensors capture data about the surrounding environment [15].
Recent advancements in autonomous driving technology have significantly propelled the development of sustainable smart cities [16][17][18]. Notably, 3D object detection has emerged as a pivotal element within autonomous vehicles, forming the basis for efficient planning and control processes in alignment with smart city principles of optimization and enhancing citizens’ quality of life, particularly in ensuring the safe navigation of autonomous vehicles (AVs) [19][20][21]. LiDAR, an active sensor that scans the environment with laser beams, is extensively integrated into AVs to provide 3D perception in urban environments. Various autonomous driving datasets, such as KITTI, have been developed to enable mass mobility in smart cities [22][23]. Although 3D LiDAR point cloud data are rich in depth and spatial information and less susceptible to lighting variations, they are irregular and sparse, particularly at longer distances, which can jeopardize the safety of pedestrians and cyclists. Traditional methods for learning point cloud features struggle to comprehend the geometrical characteristics of smaller and distant objects in AVs [24][25].
To overcome geometric challenges and facilitate the use of deep neural networks (DNNs) for processing 3D smart city datasets to ensure safe autonomous vehicle (AV) navigation, custom discretization or voxelization techniques are employed [26][27][28][29][30][31][32][33][34]. These methods convert 3D point clouds into voxel representations, enabling the application of 2D or 3D convolutions. However, they may compromise geometric data and suffer from quantization loss and computational bottlenecks, posing sustainability challenges for AVs in smart cities. Region proposal network (RPN) backbones exhibit high accuracy and recall but struggle with average precision (AP), particularly for distant or smaller objects. The poor AP score hinders AV integration in sustainable smart cities due to its direct impact on object detection at varying distances [35][36].
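To illustrate the discretization step described above, the following minimal NumPy sketch voxelizes a raw point cloud into a fixed-resolution grid; the voxel size, detection range, and function names are illustrative assumptions rather than the exact pipeline of any cited method.

```python
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.4),
             pc_range=(0.0, -40.0, -3.0, 70.4, 40.0, 1.0)):
    """Assign each LiDAR point (x, y, z) to an integer voxel coordinate.

    Returns per-point voxel coordinates and a dict mapping each occupied
    voxel to the indices of the points it contains. Voxel size and range
    are illustrative (KITTI-like) assumptions.
    """
    points = np.asarray(points, dtype=np.float32)
    mins = np.array(pc_range[:3], dtype=np.float32)
    maxs = np.array(pc_range[3:], dtype=np.float32)

    # Keep only points inside the detection range.
    mask = np.all((points[:, :3] >= mins) & (points[:, :3] < maxs), axis=1)
    kept = points[mask]

    # Integer voxel coordinates along x, y, z.
    coords = np.floor((kept[:, :3] - mins) / np.array(voxel_size)).astype(np.int32)

    # Group point indices by voxel (occupied voxels only).
    voxels = {}
    for idx, c in enumerate(map(tuple, coords)):
        voxels.setdefault(c, []).append(idx)
    return coords, voxels

pts = np.random.uniform((0.0, -40.0, -3.0), (70.4, 40.0, 1.0), size=(10000, 3))
coords, voxels = voxelize(pts)
print(f"{len(voxels)} occupied voxels for {len(pts)} points")
```

In practice, detectors additionally encode per-voxel features (e.g., with a small PointNet-style network) before applying dense or sparse 2D/3D convolutions to the resulting grid.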

2. Sustainable Transportation and Urban Planning

Sustainability has become a paramount concern across industries, with particular focus on the transportation sector. Numerous studies have addressed the implications of autonomous vehicles (AVs) and their potential to revolutionize urban living in smart cities [1][2][3][4][5][6][7][8][9][11]. Shi et al. [2] introduced a semantic understanding framework that enhances detection accuracy and scene comprehension in smart cities. Yigitcanlar et al. [6] highlighted the need for urban planners and managers to formulate AV strategies for addressing the challenges of vehicle automation in urban areas. Manfreda et al. [8] emphasized that the perceived benefits of AVs play a significant role in their adoption, especially with respect to safety concerns. Campisi et al. [9] discussed the potential of the AV revolution to improve traffic flow, enhance public transport, optimize urban space, and increase safety for pedestrians and cyclists, ultimately enhancing the quality of life in cities. Duarte et al. [10] explored the impact of AVs on road infrastructure and how they could reshape urban living and city planning, akin to the transformative shift brought about by the automobile in the past. Heinrichs [11] examined the unique characteristics and prospective applications of autonomous transportation, which has the potential to influence land use and urban planning in distinct ways. Stead et al. [18] conducted scenario studies to analyze the intricate effects of AVs on urban structure, including factors such as population density, functional diversity, urban layout, and access to public transit. Li et al. [26] proposed a deep learning method combining LiDAR and camera data for precise object detection. Seuwou et al. [37] examined smart mobility initiatives and challenges within smart cities, focusing on connected and autonomous vehicles and emphasizing the significance of CAVs in sustainable development within intelligent transportation systems. Xu et al. [38] introduced a fusion strategy utilizing LiDAR and camera data to enhance object detection in dense urban areas. These studies collectively underscore the importance of developing 3D object detection methods to ensure safe and efficient transportation systems in smart cities, addressing critical sustainability challenges.

3. Point Cloud Representations for 3D Object Detection

LiDAR is vital for AVs, generating unstructured, unordered, and irregular point clouds. Processing these raw points conventionally is challenging. Numerous 3D object detection methods have emerged in recent years [2][26][27][28][29][30][31][33][34][39][40][41][42][43][44]. These methods are categorized based on their approach to handling 3D LiDAR point cloud input.

3.1. Voxel-Based Methods

Studies have aimed to convert irregular point clouds into regular voxel grids and use CNNs to learn geometric patterns [25][30][34][39]. Early research used high-density voxelization and CNNs for voxel data analysis [26][43][44]. Yan et al. introduced the SECOND architecture, which improves memory and computational efficiency using 3D sub-manifold sparse convolution [34]. PointPillars simplified the voxel representation to pillars [39]. Existing single-stage and two-stage detectors often lack accuracy, especially for small objects [29][32]. ImVoxelNet by Rukhovich et al. projects image features into a voxel grid, which increases memory and computational costs [25]. Zhou et al. transformed point clouds into regularly arranged 3D voxels and added a 3D CNN for object detection [30]. Noh et al. integrated voxel-based and point-based features for efficient single-stage 3D object detection [43]. Shi et al. proposed a voxel-based roadside LiDAR feature encoding module that voxelizes and projects raw point clouds into a bird’s-eye view (BEV) for dense feature representation with reduced computational overhead [2]. Voxel-based approaches offer reasonable 3D object detection performance with good efficiency but may suffer from quantization loss and structural complexity, making it challenging to determine an optimal resolution for local geometry and the related context.
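As a rough sketch of the pillar-style encoding mentioned above (in the spirit of PointPillars [39], but not its actual implementation), the snippet below bins points into a BEV grid and scatters simple hand-crafted statistics into a dense pseudo-image; the grid resolution, ranges, and per-cell features are assumptions chosen only for illustration.

```python
import numpy as np

def points_to_bev(points, grid=(0.16, 0.16),
                  x_range=(0.0, 69.12), y_range=(-39.68, 39.68)):
    """Scatter LiDAR points into a bird's-eye-view (BEV) grid.

    Each cell stores the mean height and the point count, a crude stand-in
    for the learned per-pillar features used by pillar-based detectors.
    Grid size and ranges are illustrative assumptions.
    """
    nx = int(round((x_range[1] - x_range[0]) / grid[0]))
    ny = int(round((y_range[1] - y_range[0]) / grid[1]))
    bev = np.zeros((2, ny, nx), dtype=np.float32)   # channels: mean z, count

    xi = ((points[:, 0] - x_range[0]) / grid[0]).astype(np.int32)
    yi = ((points[:, 1] - y_range[0]) / grid[1]).astype(np.int32)
    valid = (xi >= 0) & (xi < nx) & (yi >= 0) & (yi < ny)
    xi, yi, z = xi[valid], yi[valid], points[valid, 2]

    # Accumulate height sums and counts, then normalize to a mean height map.
    np.add.at(bev[1], (yi, xi), 1.0)
    np.add.at(bev[0], (yi, xi), z)
    occupied = bev[1] > 0
    bev[0][occupied] /= bev[1][occupied]
    return bev

pts = np.random.uniform((0.0, -39.68, -3.0), (69.12, 39.68, 1.0),
                        size=(20000, 3)).astype(np.float32)
print(points_to_bev(pts).shape)   # (2, 496, 432) pseudo-image for a 2D CNN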

3.2. Point-Based Methods

In contrast to voxel-based methods, point-based methods detect 3D objects by directly learning unstructured geometry from raw point clouds [28][42]. To deal with the unordered nature of 3D point clouds, point-based methods incorporate PointNet [41] and its variants [29][45] to aggregate point-wise features using symmetric functions. Shi et al. [29] presented PointRCNN, a two-stage, region-proposal-based 3D object detection framework: it first generates object proposals from foreground point segments and then exploits local spatial and semantic features to regress high-quality 3D bounding boxes.
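The order-invariant aggregation that PointNet-style backbones rely on can be sketched as a shared per-point transformation followed by a symmetric max-pool; in the toy example below, a random linear map stands in for the learned per-point MLP, so this is an illustration of the principle rather than the published architecture.

```python
import numpy as np

def pointnet_style_features(points, weights, bias):
    """Order-invariant feature vector for a point set.

    A shared linear map (stand-in for a learned per-point MLP) is applied
    to every point, then max-pooling acts as the symmetric function, so
    permuting the input points leaves the output unchanged.
    """
    per_point = np.maximum(points @ weights + bias, 0.0)  # shared "MLP" + ReLU
    return per_point.max(axis=0)                          # symmetric aggregation

rng = np.random.default_rng(0)
pts = rng.normal(size=(1024, 3)).astype(np.float32)
W = rng.normal(size=(3, 64)).astype(np.float32)
b = np.zeros(64, dtype=np.float32)

f1 = pointnet_style_features(pts, W, b)
f2 = pointnet_style_features(pts[rng.permutation(1024)], W, b)
print(np.allclose(f1, f2))   # True: output is invariant to point order
```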
Qi et al. [46] proposed VoteNet, a single-stage point-based 3D object detector that uses deep Hough voting to predict instance centroids. Yang et al. [47] proposed 3DSSD, a single-stage 3D object detection framework that adopts a fusion sampling strategy combining distance-based farthest point sampling (FPS) in Euclidean space with feature-based sampling. Point-GNN [48] is a generalized graph neural network for 3D object detection. Point-based methods are less resource intensive than voxel-based methods; they are intuitive and straightforward, require no extra pre-processing, and simply take raw point clouds as input. Their drawback is limited efficiency and insufficient learning ability.
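Because 3DSSD and several other point-based detectors build on farthest point sampling, a plain O(N·M) implementation of distance-based FPS in Euclidean space is sketched below; it is a generic reference implementation, not the sampling code of the cited works.

```python
import numpy as np

def farthest_point_sampling(points, m):
    """Select m points that greedily maximize coverage of the cloud.

    Starts from an arbitrary seed, then repeatedly picks the point whose
    distance to the already-selected set is largest (distance-based FPS).
    """
    n = points.shape[0]
    selected = np.zeros(m, dtype=np.int64)
    dist = np.full(n, np.inf)
    selected[0] = 0                     # arbitrary seed point
    for i in range(1, m):
        # Update each point's squared distance to the nearest selected point.
        diff = points - points[selected[i - 1]]
        dist = np.minimum(dist, np.einsum('ij,ij->i', diff, diff))
        selected[i] = int(np.argmax(dist))
    return selected

pts = np.random.rand(4096, 3).astype(np.float32)
idx = farthest_point_sampling(pts, 512)
print(idx.shape)   # (512,) indices of the sampled subset
```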

3.3. Weak Semantic Information for 3D Object Detection

In autonomous driving, point cloud sampling often yields sparse coverage. For example, when aligning KITTI dataset color images with raw point clouds, only about 3% of pixels have corresponding points [49][50]. This extreme sparsity challenges high-level semantic perception from point clouds. Existing 3D object detection methods [29][30][31][33][34][39] typically extract local features from raw point clouds but struggle to capture comprehensive feature information and feature interactions. Sparse point cloud data, limitations in local feature extraction, and insufficient feature interactions lead to weak semantic information in 3D object detection models, notably affecting performance for distant and smaller objects.
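To make the sparsity argument concrete, the sketch below projects a synthetic scan onto an image plane using a hypothetical 3×4 projection matrix (a stand-in for the real KITTI calibration, which is an assumption here) and measures the fraction of pixels that receive at least one point; since a single scan contains far fewer points than the image has pixels, coverage is necessarily small.

```python
import numpy as np

def pixel_coverage(points_cam, P, img_h=375, img_w=1242):
    """Fraction of image pixels hit by at least one projected LiDAR point.

    points_cam: Nx3 points in the camera frame (z pointing forward).
    P: hypothetical 3x4 projection matrix standing in for a real calibration.
    Coverage is bounded above by N / (img_h * img_w), so a sparse scan can
    only ever touch a small fraction of the image.
    """
    homog = np.hstack([points_cam, np.ones((points_cam.shape[0], 1))])  # Nx4
    proj = homog @ P.T                                                  # Nx3
    in_front = proj[:, 2] > 0
    uv = proj[in_front, :2] / proj[in_front, 2:3]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    valid = (u >= 0) & (u < img_w) & (v >= 0) & (v < img_h)
    hit = np.zeros((img_h, img_w), dtype=bool)
    hit[v[valid], u[valid]] = True
    return hit.mean()

# Hypothetical pinhole calibration and a synthetic ~20k-point forward frustum.
P = np.array([[700.0, 0.0, 620.0, 0.0],
              [0.0, 700.0, 187.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
pts = np.random.uniform((-20.0, -2.0, 2.0), (20.0, 2.0, 70.0), size=(20000, 3))
print(f"pixel coverage: {pixel_coverage(pts, P):.1%}")
```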
Both voxel-based [30][34][39] and point-based [29][41] methods face weak semantic information challenges in sparse point clouds. For example, Chen et al. [51] proposed focal sparse convolution with multi-modal expansion, but at high computational cost and complexity. Chen et al. [52] introduced a sparse activation map (SAM) for voxel-based techniques, and Sun et al. [53] developed range sparse net (RSN) for real-time 3D object detection from range images, though it struggles with spatial depth information. Ren et al. [54] introduced a sparse blocks network (SBNet) for voxel-based methods. Shi et al. [2] incorporated multi-head self-attention and deformable cross-attention to model feature interactions between vehicles. Existing methods focus on downstream tasks, under-utilize object feature information, and are often limited to either voxel-based or point-based models, reducing their generalizability.

4. Self-Attention Mechanism

The recent success of transformers in various computer vision domains [49][55] has led to a new paradigm in object detection. Transformers have proven highly effective in learning local context-aware representations. DETR [55] introduced this paradigm by treating object detection as a set prediction problem and employing transformers with parallel decoding to detect objects in 2D images. The application of point transformers [49] in self-attention networks for 3D point cloud processing and object classification has recently gained attention. In particular, the point cloud transformer (PCT) framework [21] has been utilized for learning from point clouds and improving embedded input; PCT incorporates essential functionalities such as farthest-point sampling and nearest-neighbor searching. In the context of 3D object detection, Bhattacharyya et al. [56] proposed two variants of self-attention for contextual modeling; these variants augment convolutional features with self-attention features to enhance overall 3D object detection performance. Mao et al. [57] introduced the voxel transformer (VoTr), a novel and effective voxel-based transformer backbone specifically designed for point cloud 3D object detection. Shi et al. [2] employed multi-head self-attention and cross-attention to establish a dense feature representation through feature re-weighting.
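For reference, the core operation shared by the transformer-based detectors discussed in this section is scaled dot-product self-attention over a set of point or voxel tokens; the NumPy sketch below shows a single head with random projection matrices, which are illustrative assumptions rather than any cited model's parameters.

```python
import numpy as np

def self_attention(features, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over N feature vectors.

    Every output feature is a weighted sum of all input features, so each
    point/voxel token can aggregate context from the entire scene.
    """
    Q, K, V = features @ Wq, features @ Wk, features @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # NxN pairwise affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
N, d = 256, 64   # e.g., 256 point/voxel tokens with 64-dimensional features
feats = rng.normal(size=(N, d)).astype(np.float32)
Wq, Wk, Wv = (rng.normal(size=(d, d)).astype(np.float32) for _ in range(3))
print(self_attention(feats, Wq, Wk, Wv).shape)   # (256, 64)
```

Multi-head variants, positional encodings, and local or sparse attention windows, as used in voxel transformers, build on this same primitive.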
Overall, these studies highlight the importance of 3D object detection techniques in enhancing the perception capabilities of autonomous vehicles and contribute to the development of safer and more efficient transportation systems in smart cities. 

References

  1. Mitieka, D.; Luke, R.; Twinomurinzi, H.; Mageto, J. Smart Mobility in Urban Areas: A Bibliometric Review and Research Agenda. Sustainability 2023, 15, 6754.
  2. Shi, H.; Hou, D.; Li, X. Center-Aware 3D Object Detection with Attention Mechanism Based on Roadside LiDAR. Sustainability 2023, 15, 2628.
  3. Lee, H.K. The Relationship between Innovative Technology and Driver’s Resistance and Acceptance Intention for Sustainable Use of Automobile Self-Driving System. Sustainability 2022, 14, 10129.
  4. Zhang, D.; Li, Y.; Li, Y.; Shen, Z. Service Failure Risk Assessment and Service Improvement of Self-Service Electric Vehicle. Sustainability 2022, 14, 3723.
  5. Xia, T.; Lin, X.; Sun, Y.; Liu, T. An Empirical Study of the Factors Influencing Users’ Intention to Use Automotive AR-HUD. Sustainability 2023, 15, 5028.
  6. Yigitcanlar, T.; Wilson, M.; Kamruzzaman, M. Disruptive Impacts of Automated Driving Systems on the Built Environment and Land Use: An Urban Planner’s Perspective. J. Open Innov. Technol. Mark. Complex. 2019, 5, 24.
  7. Musa, A.A.; Malami, S.I.; Alanazi, F.; Ounaies, W.; Alshammari, M.; Haruna, S.I. Sustainable Traffic Management for Smart Cities Using Internet-of-Things-Oriented Intelligent Transportation Systems (ITS): Challenges and Recommendations. Sustainability 2023, 15, 9859.
  8. Manfreda, A.; Ljubi, K.; Groznik, A. Autonomous vehicles in the smart city era: An empirical study of adoption factors important for millennials. Int. J. Inf. Manag. 2021, 58, 102050.
  9. Campisi, T.; Severino, A.; Al-Rashid, M.A.; Pau, G. The Development of the Smart Cities in the Connected and Autonomous Vehicles (CAVs) Era: From Mobility Patterns to Scaling in Cities. Infrastructures 2021, 6, 100.
  10. Duarte, F.; Ratti, C. The Impact of Autonomous Vehicles on Cities: A Review. J. Urban Technol. 2018, 25, 3–18.
  11. Heinrichs, D. Autonomous Driving and Urban Land Use. In Autonomous Driving: Technical, Legal and Social Aspects; Maurer, M., Gerdes, J.C., Lenz, B., Winner, H., Eds.; Springer: Berlin/Heidelberg, Germany, 2016; pp. 213–231.
  12. Leonard, J.; How, J.; Teller, S.; Berger, M.; Campbell, S.; Fiore, G.; Fletcher, L.; Frazzoli, E.; Huang, A.; Karaman, S.; et al. A Perception-Driven Autonomous Urban Vehicle. In The DARPA Urban Challenge: Autonomous Vehicles in City Traffic; Springer: Berlin/Heidelberg, Germany, 2009; pp. 163–230.
  13. Dai, D.; Chen, Z.; Bao, P.; Wang, J. A review of 3d object detection for autonomous driving of electric vehicles. World Electr. Veh. J. 2021, 12, 139.
  14. Wang, K.; Zhou, T.; Li, X.; Ren, F. Performance and Challenges of 3D Object Detection Methods in Complex Scenes for Autonomous Driving. IEEE Trans. Intell. Veh. 2022, 8, 1699–1716.
  15. Rosique, F.; Navarro, P.J.; Fernández, C.; Padilla, A. A systematic review of perception system and simulators for autonomous vehicles research. Sensors 2019, 19, 648.
  16. Rahman, M.M.; Thill, J.C. What Drives People’s Willingness to Adopt Autonomous Vehicles? A Review of Internal and External Factors. Sustainability 2023, 15, 11541.
  17. Yao, L.Y.; Xia, X.F.; Sun, L.S. Transfer Scheme Evaluation Model for a Transportation Hub based on Vectorial Angle Cosine. Sustainability 2014, 6, 4152–4162.
  18. Stead, D.; Vaddadi, B. Automated vehicles and how they may affect urban form: A review of recent scenario studies. Cities 2019, 92, 125–133.
  19. Pham Do, M.S.; Kemanji, K.V.; Nguyen, M.D.V.; Vu, T.A.; Meixner, G. The Action Point Angle of Sight: A Traffic Generation Method for Driving Simulation, as a Small Step to Safe, Sustainable and Smart Cities. Sustainability 2023, 15, 9642.
  20. Feng, D.; Haase-Schütz, C.; Rosenbaum, L.; Hertlein, H.; Gläser, C.; Timm, F.; Wiesbeck, W.; Dietmayer, K. Deep Multi-Modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges. IEEE Trans. Intell. Transp. Syst. 2021, 22, 1341–1360.
  21. Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L.; Bennamoun, M. Deep Learning for 3D Point Clouds: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 4338–4364.
  22. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.U.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
  23. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
  24. Yu, H.; Luo, Y.; Shu, M.; Huo, Y.; Yang, Z.; Shi, Y.; Guo, Z.; Li, H.; Hu, X.; Yuan, J.; et al. DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; Volume 2022.
  25. Rukhovich, D.; Vorontsova, A.; Konushin, A. ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022.
  26. Li, S.; Geng, K.; Yin, G.; Wang, Z.; Qian, M. MVMM: Multi-View Multi-Modal 3D Object Detection for Autonomous Driving. IEEE Trans. Ind. Inform. 2023, 1–9.
  27. Xie, L.; Xiang, C.; Yu, Z.; Xu, G.; Yang, Z.; Cai, D.; He, X. PI-RCNN: An efficient multi-sensor 3D object detector with point-based attentive cont-conv fusion module. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12460–12467.
  28. Zhao, X.; Liu, Z.; Hu, R.; Huang, K. 3D object detection using scale invariant and feature reweighting networks. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2019; Volume 33, pp. 9267–9274.
  29. Shi, S.; Wang, X.; Li, H. PointRCNN: 3D object proposal generation and detection from point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; Volume 2019.
  30. Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
  31. Yang, B.; Luo, W.; Urtasun, R. Pixor: Real-time 3d object detection from point clouds. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7652–7660.
  32. Xu, W.; Hu, J.; Chen, R.; An, Y.; Xiong, Z.; Liu, H. Keypoint-Aware Single-Stage 3D Object Detector for Autonomous Driving. Sensors 2022, 22, 1451.
  33. Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3D object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; Volume 2017.
  34. Yan, Y.; Mao, Y.; Li, B. Second: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337.
  35. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 2017, 30.
  36. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; Volume 2017.
  37. Seuwou, P.; Banissi, E.; Ubakanma, G. The Future of Mobility with Connected and Autonomous Vehicles in Smart Cities. In Digital Twin Technologies and Smart Cities; Farsi, M., Daneshkhah, A., Hosseinian-Far, A., Jahankhani, H., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 37–52.
  38. Xu, X.; Dong, S.; Xu, T.; Ding, L.; Wang, J.; Jiang, P.; Song, L.; Li, J. FusionRCNN: LiDAR-Camera Fusion for Two-Stage 3D Object Detection. Remote Sens. 2023, 15, 1839.
  39. Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. Pointpillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; Volume 2019.
  40. Liang, M.; Yang, B.; Wang, S.; Urtasun, R. Deep Continuous Fusion for Multi-sensor 3D Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Volume 11220. LNCS.
  41. Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D Object Detection from RGB-D Data. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
  42. Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D Proposal Generation and Object Detection from View Aggregation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018, Madrid, Spain, 1–5 October 2018.
  43. Noh, J.; Lee, S.; Ham, B. HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021.
  44. Liu, Z.; Zhao, X.; Huang, T.; Hu, R.; Zhou, Y.; Bai, X. TANet: Robust 3D object detection from point clouds with triple attention. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020.
  45. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph Cnn for learning on point clouds. ACM Trans. Graph. 2019, 38, 1–12.
  46. Qi, C.R.; Litany, O.; He, K.; Guibas, L. Deep hough voting for 3D object detection in point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; Volume 2019.
  47. Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3DSSD: Point-based 3d single stage object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
  48. Shi, S.; Guo, C.; Jiang, L.; Wang, Z.; Shi, J.; Wang, X.; Li, H. PV-RCNN: Point-voxel feature set abstraction for 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
  49. Engel, N.; Belagiannis, V.; Dietmayer, K. Point transformer. IEEE Access 2021, 9, 16259–16268.
  50. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
  51. Chen, Y.; Li, Y.; Zhang, X.; Sun, J.; Jia, J. Focal Sparse Convolutional Networks for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5418–5427.
  52. Chen, Q.; Li, P.; Xu, M.; Qi, X. Sparse Activation Maps for Interpreting 3D Object Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 20–25 June 2021; pp. 76–84.
  53. Sun, P.; Wang, W.; Chai, Y.; Elsayed, G.; Bewley, A.; Zhang, X.; Sminchisescu, C.; Anguelov, D. RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 20–25 June 2021.
  54. Ren, M.; Pokrovsky, A.; Yang, B.; Urtasun, R. SBNet: Sparse Blocks Network for Fast Inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
  55. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv 2020, arXiv:2010.04159.
  56. Bhattacharyya, P.; Huang, C.; Czarnecki, K. SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3022–3031.
  57. Mao, J.; Niu, M.; Bai, H.; Liang, X.; Xu, H.; Xu, C. Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021.