The Detection of Lanes and Lane Markings

Please note this is a comparison between Version 2 by Wendy Huang and Version 1 by Xianwang Yu.

Vision-based identification of lane areas and lane markings on the road is an indispensable function for intelligent driving vehicles, especially for localization, mapping and planning tasks. However, due to the increasing complexity of traffic scenes, such as occlusion and discontinuity, detecting lanes and lane markings from an image captured by a monocular camera remains challenging. Lanes and lane markings have a strong positional correlation and are constrained by the spatial geometry prior of the driving scene. Most existing studies explore only a single task, i.e., either lane marking detection or lane detection, and neither consider the inherent connection between the two elements nor model this relationship to improve the detection performance of both tasks.

lane marking detection
lanes
deep learning
multi-task
autonomous driving

1. Introduction

Lanes and lane markings are essential road information for intelligent driving vehicles. Lane marking detection aims to accurately locate road elements such as lane lines, crosswalks, and stop zones, while lane detection focuses on segmenting the lane-level areas where vehicles can drive. Owing to their low cost and rich representation of scene information, optical sensors such as on-board cameras are widely adopted for road information perception. By applying lane and lane marking detection approaches, visual features of road symbols, arrows, lane markings, pedestrian crosswalks, vehicle drivable areas, etc., are extracted from the image. These features are indispensable both for high-level autonomous driving and for general advanced driver-assistance systems (ADAS). They can serve as elements in high-definition map construction, or be further converted into the information required by the planning and control system to assist the driving behavior of vehicles, especially in applications such as adaptive cruise control (ACC), driving route navigation, and lane keeping assistance (LKA), thus ensuring driving safety and reliability ^[1][2][3].

Generally, approaches to the detection of lanes and lane markings fall into two categories: the traditional paradigm ^[4][5] and the deep learning paradigm ^[6][7][8]. Traditional methods rely on hand-crafted features and elaborately designed rules that exploit color-space or shape-structure information to detect lanes and lane markings. Due to their poor feature representability, these methods scale poorly to varied scenes. In recent years, deep learning approaches in computer vision have achieved remarkable progress, especially in object detection and semantic segmentation tasks. Since lanes and lane lines are typically long and thin, and sometimes irregular in shape, the difficulty lies in learning effective representations of their complex structures.

In current studies, the detection of lanes and the detection of lane markings are treated as two separate tasks. Lane detection is typically interpreted as a pixel-wise semantic segmentation problem, while lane markings can be predicted with various formulations such as instance segmentation ^[9][10], point regression ^[11][12], curve parameter estimation ^[13][14][15], etc. Although both tasks have seen steady progress in recent years, especially on public benchmarks ^[9][16][17], one neglected fact is that the information on lanes and lane markings in a road scene is complementary. For instance, on structured roads, the associated lane lines can be used to identify the lane boundaries, while in scenarios where lane lines or crosswalks are partially missing or broken (e.g., due to occlusion), they can still be inferred from the width of the lane. Thus, the detection of lanes and of lane markings is inherently correlated through their spatial connectivity. In real driving scenarios, a single task on its own is not robust and is easily disturbed by the disappearance of visual markings, e.g., due to occlusions. By modeling the spatial connectivity between lanes and lane markings, however, the detection robustness can be improved; this has not been studied in existing methods.
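The idea that a missing lane line can be inferred from the lane width can be illustrated with a small geometric sketch. It assumes that the visible lane line is already available as a polyline in top-view coordinates (metres, x pointing right, y pointing forward) and that the lane width is known. The function name `offset_polyline` and the 3.5 m width are hypothetical choices for illustration and are not taken from the cited works.

```python
import math

def offset_polyline(points, lane_width):
    """Shift a top-view polyline sideways by lane_width (positive = to the right).

    points: list of (x, y) tuples with at least two distinct points.
    """
    shifted = []
    last = len(points) - 1
    for i, (x, y) in enumerate(points):
        # Estimate the local driving direction from the neighbouring points.
        x0, y0 = points[max(i - 1, 0)]
        x1, y1 = points[min(i + 1, last)]
        dx, dy = x1 - x0, y1 - y0
        norm = math.hypot(dx, dy)
        # The right-hand normal of the direction (dx, dy) is (dy, -dx).
        nx, ny = dy / norm, -dx / norm
        shifted.append((x + lane_width * nx, y + lane_width * ny))
    return shifted

# A straight left lane line; the occluded right line is assumed to run parallel.
left_line = [(0.0, 0.0), (0.0, 10.0), (0.0, 20.0)]
right_line = offset_polyline(left_line, 3.5)
print(right_line)
```

For the straight line above, every inferred point lies 3.5 m to the right of the detected one. A learned model would have to capture this kind of constraint implicitly, which is the motivation for modeling the two tasks jointly.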

2. Lane Marking Detection

As mentioned above, traditional lane marking detection approaches generally rely on sophisticated model design and hand-crafted features, involving color conversion ^[4], a combination of Kalman and particle filters ^[18], bar filters ^[19] and the Hough transform ^[5]. These approaches directly output lane segments, which are then post-processed to remove false positives and grouped to form the lane markings. Aly ^[20] proposed a robust real-time lane marking detection method, which first generated a top-view image by projective transformation and then extracted lane markings using a bar filter and a simple Hough transform. Assidiq et al. ^[21] detected edges with the Canny operator and extracted line features through the Hough transform. The lane marking was obtained by fitting a line to selected pixels. However, limited by their poor feature representation, traditional methods lack robustness in complex scenarios, such as broken lane markings or occlusion by vehicles and pedestrians. In recent years, deep learning techniques have significantly boosted lane marking detection performance. According to the modeling strategy, such approaches can be classified into four categories: segmentation-based, anchor-based, row-wise detection, and parametric prediction methods. The segmentation-based methods commonly adopt semantic segmentation or instance segmentation to make pixel-wise predictions ^[6][9][10]. Supervised by a sufficient amount of labeled data, these approaches show advantages in detecting various kinds of lane markings. The Aerial LaneNet ^[22] proposed a fully convolutional neural network with a symmetrical structure, enhanced by wavelet transforms, for lane marking segmentation in aerial imagery. Guan et al. ^[23] incorporated the attention mechanism into FPN networks to extract better road marking segmentation results from high-resolution UAV images.
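The Hough-transform step used by the traditional pipelines described above can be sketched in a few lines. This is a minimal illustration, not the implementation of any cited method: it assumes the edge pixels have already been extracted (for example by an edge detector) and are given as (x, y) coordinates, and the function name and bin sizes are hypothetical.

```python
import math
from collections import Counter

def hough_strongest_line(edge_points, theta_steps=180, rho_resolution=1.0):
    """Vote in (rho, theta) space and return the line with the most votes.

    A line is parameterised as rho = x * cos(theta) + y * sin(theta).
    Returns (rho, theta in degrees, number of votes).
    """
    votes = Counter()
    for x, y in edge_points:
        for k in range(theta_steps):
            theta = math.pi * k / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            # Quantise rho so that nearly collinear points share a bin.
            votes[(round(rho / rho_resolution), k)] += 1
    (rho_bin, k), count = votes.most_common(1)[0]
    return rho_bin * rho_resolution, math.degrees(math.pi * k / theta_steps), count

# Ten edge pixels on the vertical line x = 5, e.g. one straight lane marking.
points = [(5, y) for y in range(0, 100, 10)]
print(hough_strongest_line(points))  # (5.0, 0.0, 10)
```

Every edge pixel votes for all lines that could pass through it, so a lane marking shows up as a peak in the accumulator. The same voting scheme also explains the weakness noted above: broken or occluded markings contribute fewer votes and are easily lost among spurious peaks.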
The anchor-based methods borrow the anchor concept from conventional object detection, but differ by taking into account the shape characteristics of lane markings. For instance, PointLaneNet ^[7] and CurveLane-NAS ^[24] define anchors as vertical lines, while Line-CNN ^[11] and LaneATT ^[12] adopt the Line Proposal Unit, which resembles the Region Proposal Network (RPN) of Faster R-CNN ^[25]. The row-wise detection approaches make full use of the shape prior of lane markings as well as their spatial distribution characteristics. They divide the image into grids and make row-wise predictions to locate the lane markings ^[26][27][28]. In contrast, the parametric prediction methods define lane markings (especially lane lines) as curve functions with a set of parameters, such as polynomials ^[13][14] and Bézier curves ^[15]. Their formulation differs significantly from the above-mentioned methods, and the corresponding curve parameters are difficult to learn. In addition, to handle difficult scenes for lane marking detection such as occlusion and low visibility, Wang et al. ^[29] proposed a dynamic data augmentation framework based on imitating real scenes.
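The row-wise formulation described above can be made concrete with a small decoding sketch. It assumes that, for each selected image row, a network outputs one score per grid cell plus one extra score meaning "no lane marking in this row"; this layout, the function name and the numbers are illustrative assumptions rather than the exact output format of the cited methods.

```python
def decode_row_wise(row_scores, image_width):
    """Turn per-row grid scores into x positions of one lane marking.

    row_scores: one list of scores per row; the last entry of each list is the
    "no lane marking" class, the others correspond to grid cells from left to right.
    Returns the x coordinate of the winning cell centre per row, or None.
    """
    num_cells = len(row_scores[0]) - 1
    cell_width = image_width / num_cells
    xs = []
    for scores in row_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == num_cells:
            xs.append(None)  # the marking is absent or occluded in this row
        else:
            xs.append((best + 0.5) * cell_width)
    return xs

scores = [
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.10, 0.20, 0.60, 0.05, 0.05],
    [0.00, 0.10, 0.10, 0.10, 0.70],
]
print(decode_row_wise(scores, image_width=400))  # [150.0, 250.0, None]
```

Compared with pixel-wise segmentation, only one classification per row is needed for each lane marking, which is why this family of methods is attractive when inference speed matters.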

3. Lane Detection

The task of lane detection is also known as drivable area detection and is currently formulated mainly as a segmentation task. Owing to the success of deep learning, many methods based on semantic segmentation and instance segmentation can be transferred to drivable area detection. The FCN ^[30] is the first work to introduce the fully convolutional network to semantic segmentation, which makes CNN-based methods widely applicable for lane detection. The UNet ^[31] further constructs an encoder–decoder framework to extract lane semantic information from high-dimensional features. The DeepLabV3 ^[32] combines atrous convolutions ^[33] with different atrous rates in a feature pyramid, namely ASPP (atrous spatial pyramid pooling), to obtain different receptive fields on the feature maps. The PSPNet ^[34] proposes the pyramid pooling module for feature extraction at various scales, which enhances the accuracy of the model. It is also worth noting that both DeepLabV3 and PSPNet leverage the fusion of multi-scale feature information to improve the segmentation performance. He et al. ^[35] embedded the Swin transformer into the classical UNet to improve the semantic segmentation performance for remote sensing images. Xie et al. ^[36] presented a segmentation method for RGB-D data and adopted motion detection to improve the inference accuracy. Meyer et al. ^[37] extended the Cityscapes dataset ^[38] with lane-level annotations and presented a novel lane detection pipeline, which used a stereo system to convert the front-view segmentation results into a 3D point cloud and projected it to the top view. Sun et al. ^[39] proposed to leverage crowd-sourced GPS data to extract roads from aerial images, which achieved improved road segmentation compared to previous works. Fontanelli et al. ^[40] performed lane detection in the front-view image and projected it to the top view to construct the path, which is used to plan the future motion of the robot.
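The effect of the atrous rate on the receptive field, which the ASPP design mentioned above relies on, can be shown with a one-dimensional toy example. This is a simplified sketch in plain Python and not the DeepLabV3 implementation; the function names are hypothetical.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid 1-D atrous (dilated) convolution: kernel taps are `rate` samples apart."""
    span = (len(kernel) - 1) * rate
    return [
        sum(w * signal[start + j * rate] for j, w in enumerate(kernel))
        for start in range(len(signal) - span)
    ]

def receptive_field(kernel_size, rate):
    """Number of input samples covered by one output of an atrous kernel."""
    return kernel_size + (kernel_size - 1) * (rate - 1)

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
print(atrous_conv1d(signal, kernel, rate=1))  # [-2, -2, -2, -2, -2]
print(atrous_conv1d(signal, kernel, rate=2))  # [-4, -4, -4]
print([receptive_field(3, r) for r in (1, 2, 6)])  # [3, 5, 13]
```

The same three weights cover 3, 5 or 13 input samples depending on the rate, so running several rates in parallel and fusing the results gives multi-scale context without adding parameters per branch.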

4. Multi-Task Approaches

Although previous studies have achieved excellent performance in a single detection or segmentation task, a multi-task architecture for processing perception information is better suited to practical applications. The goal of multi-task approaches is to establish a trade-off between detection performance and computational complexity by sharing feature information and model structure. The MultiNet ^[41] was the first to introduce a multi-task architecture into the autonomous driving perception task. The architecture adopts a shared backbone and three decoders to perform the tasks of road segmentation, vehicle detection, and scene classification simultaneously. The DLT-Net ^[42] inherits the encoder–decoder architecture with a shared backbone and multi-task decoders. It passes information from the drivable area decoder, namely the context tensor, to both the lane marking decoder and the traffic object decoder, thus sharing decoder information to a certain extent. The RBNet ^[43] proposes a multi-task neural network model for unified detection of the road and the road boundary, which combines the input image, road and road boundary as three nodes in a Bayesian network. Zhang et al. ^[44] considered the geometric constraint between the road and its boundaries and constructed interlinked sub-networks to improve the overall performance of both detection tasks. The RoadNet ^[45] develops a multi-task convolutional neural network to simultaneously predict road boundaries, surfaces, and centerlines from high-resolution remote sensing images. The HYDRO-3D ^[46] combines object detection features with historical object tracking information to improve the performance of both tasks, which achieves robust object detection. Xia et al. ^[47] proposed a platform for automated driving system data acquisition and analysis, which presents a holistic pipeline for data processing based on connected automated vehicles.
However, the interaction between lane and lane marking information remains insufficiently explored in the above-mentioned studies.
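The shared-backbone pattern with information passed from one decoder to another, as described for the architectures above, can be sketched structurally as follows. Everything here is a toy stand-in: simple thresholds replace learned networks, and the names `run_multitask`, `toy_backbone`, `toy_area_head` and `toy_lane_head` are hypothetical and do not correspond to any cited implementation.

```python
def run_multitask(image, backbone, area_head, dependent_heads):
    """Compute shared features once, then feed them to several task heads."""
    features = backbone(image)            # shared computation
    area, context = area_head(features)   # first task also emits a context
    outputs = {"drivable_area": area}
    for name, head in dependent_heads.items():
        outputs[name] = head(features, context)  # other tasks reuse the context
    return outputs

def toy_backbone(image):
    # Stand-in for a CNN encoder: scale grey values to [0, 1].
    return [[pixel / 255 for pixel in row] for row in image]

def toy_area_head(features):
    # Stand-in for a drivable-area decoder: dark pixels count as road surface.
    mask = [[value < 0.5 for value in row] for row in features]
    return mask, mask

def toy_lane_head(features, context):
    # Stand-in for a lane-marking decoder: keep a bright pixel only if a
    # horizontal neighbour belongs to the drivable area.
    result = []
    for r, row in enumerate(features):
        width = len(row)
        result.append([
            value > 0.8 and any(context[r][c2] for c2 in (c - 1, c + 1) if 0 <= c2 < width)
            for c, value in enumerate(row)
        ])
    return result

image = [
    [255, 255, 255, 255, 255],  # bright region without road, e.g. sky
    [50, 250, 50, 50, 50],      # road with one bright marking pixel
    [50, 250, 50, 50, 50],
]
outputs = run_multitask(image, toy_backbone, toy_area_head, {"lane_marking": toy_lane_head})
print(outputs["lane_marking"])
```

In this toy case the bright top row is rejected because no drivable area is adjacent to it, while the bright column next to the road surface is kept. It illustrates, under strong simplifications, how context from the lane task can suppress false positives in the lane marking task.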

References

[1] HERE. HERE HD Live Map: The Most Intelligent Sensor for Autonomous Driving. 2022. Available online: https://www.here.com/platform/automotive-services/hd-maps (accessed on 1 December 2022).
[2] TomTom. HD Maps—Highly Accurate Border-to-Border Model of the Road. 2022. Available online: https://www.tomtom.com/products/hd-map (accessed on 1 December 2022).
[3] Homayounfar, N.; Ma, W.C.; Liang, J.; Wu, X.; Fan, J.; Urtasun, R. DAGmapper: Learning to map by discovering lane topology. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2911–2920.
[4] Chiu, K.Y.; Lin, S.F. Lane detection using color-based segmentation. In Proceedings of the IEEE Intelligent Vehicles Symposium, Las Vegas, NV, USA, 6–8 June 2005; pp. 706–711.
[5] Satzoda, R.K.; Sathyanarayana, S.; Srikanthan, T.; Sathyanarayana, S. Hierarchical additive Hough transform for lane detection. IEEE Embed. Syst. Lett. 2010, 2, 23–26.
[6] Hou, Y.; Ma, Z.; Liu, C.; Loy, C.C. Learning lightweight lane detection cnns by self attention distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1013–1021.
[7] Chen, Z.; Liu, Q.; Lian, C. Pointlanenet: Efficient end-to-end cnns for accurate real-time lane detection. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 2563–2568.
[8] Tian, W.; Ren, X.; Yu, X.; Wu, M.; Zhao, W.; Li, Q. Vision-based mapping of lane semantics and topology for intelligent vehicles. Int. J. Appl. Earth Obs. Geoinf. 2022, 111, 102851.
[9] Pan, X.; Shi, J.; Luo, P.; Wang, X.; Tang, X. Spatial as deep: Spatial cnn for traffic scene understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
[10] Zheng, T.; Fang, H.; Zhang, Y.; Tang, W.; Yang, Z.; Liu, H.; Cai, D. Resa: Recurrent feature-shift aggregator for lane detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 3547–3554.
[11] Li, X.; Li, J.; Hu, X.; Yang, J. Line-cnn: End-to-end traffic line detection with line proposal unit. IEEE Trans. Intell. Transp. Syst. 2019, 21, 248–258.
[12] Tabelini, L.; Berriel, R.; Paixao, T.M.; Badue, C.; De Souza, A.F.; Oliveira-Santos, T. Keep your eyes on the lane: Real-time attention-guided lane detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 294–302.
[13] Tabelini, L.; Berriel, R.; Paixao, T.M.; Badue, C.; De Souza, A.F.; Oliveira-Santos, T. Polylanenet: Lane estimation via deep polynomial regression. In Proceedings of the International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2021; pp. 6150–6156.
[14] Liu, R.; Yuan, Z.; Liu, T.; Xiong, Z. End-to-end lane shape prediction with transformers. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Online, 5–9 January 2021; pp. 3694–3702.
[15] Feng, Z.; Guo, S.; Tan, X.; Xu, K.; Wang, M.; Ma, L. Rethinking Efficient Lane Detection via Curve Modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17062–17070.
[16] Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2636–2645.
[17] TuSimple. 2022. Available online: http://benchmark.tusimple.ai/ (accessed on 1 December 2022).
[18] Loose, H.; Franke, U.; Stiller, C. Kalman particle filter for lane recognition on rural roads. In Proceedings of the IEEE Intelligent Vehicles Symposium, Xi’an, China, 3–5 June 2009; pp. 60–65.
[19] Teng, Z.; Kim, J.H.; Kang, D.J. Real-time lane detection by using multiple cues. In Proceedings of the International Conference on Control Automation and Systems (ICCAS 2010), Suwon, Republic of Korea, 27–30 October 2010; pp. 2334–2337.
[20] Aly, M. Real time detection of lane markers in urban streets. In Proceedings of the IEEE Intelligent Vehicles Symposium, Eindhoven, The Netherlands, 4–6 June 2008; pp. 7–12.
[21] Assidiq, A.A.; Khalifa, O.O.; Islam, M.R.; Khan, S. Real time lane detection for autonomous vehicles. In Proceedings of the International Conference on Computer and Communication Engineering, Kuala Lumpur, Malaysia, 13–15 May 2008; pp. 82–88.
[22] Azimi, S.M.; Fischer, P.; Körner, M.; Reinartz, P. Aerial LaneNet: Lane-marking semantic segmentation in aerial imagery using wavelet-enhanced cost-sensitive symmetric fully convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2018, 57, 2920–2938.
[23] Guan, H.; Lei, X.; Yu, Y.; Zhao, H.; Peng, D.; Junior, J.M.; Li, J. Road marking extraction in UAV imagery using attentive capsule feature pyramid network. Int. J. Appl. Earth Obs. Geoinf. 2022, 107, 102677.
[24] Xu, H.; Wang, S.; Cai, X.; Zhang, W.; Liang, X.; Li, Z. Curvelane-nas: Unifying lane-sensitive architecture search and adaptive point blending. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 689–704.
[25] Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28.
[26] Qin, Z.; Wang, H.; Li, X. Ultra fast structure-aware deep lane detection. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 276–291.
[27] Yoo, S.; Lee, H.S.; Myeong, H.; Yun, S.; Park, H.; Cho, J.; Kim, D.H. End-to-end lane marker detection via row-wise classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 1006–1007.
[28] Liu, L.; Chen, X.; Zhu, S.; Tan, P. Condlanenet: A top-to-down lane detection framework based on conditional convolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3773–3782.
[29] Wang, Q.; Wang, L.; Chi, Y.; Shen, T.; Song, J.; Gao, J.; Shen, S. Dynamic Data Augmentation Based on Imitating Real Scene for Lane Line Detection. Remote Sens. 2023, 15, 1212.
[30] Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
[31] Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
[32] Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
[33] Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
[34] Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
[35] He, X.; Zhou, Y.; Zhao, J.; Zhang, D.; Yao, R.; Xue, Y. Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15.
[36] Xie, W.; Liu, P.X.; Zheng, M. Moving object segmentation and detection for robust RGBD-SLAM in dynamic environments. IEEE Trans. Instrum. Meas. 2020, 70, 1–8.
[37] Meyer, A.; Salscheider, N.O.; Orzechowski, P.F.; Stiller, C. Deep semantic lane segmentation for mapless driving. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 869–875.
[38] Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
[39] Sun, T.; Di, Z.; Che, P.; Liu, C.; Wang, Y. Leveraging crowdsourced GPS data for road extraction from aerial imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7509–7518.
[40] Fontanelli, D.; Moro, F.; Rizano, T.; Palopoli, L. Vision-based robust path reconstruction for robot control. IEEE Trans. Instrum. Meas. 2013, 63, 826–837.
[41] Teichmann, M.; Weber, M.; Zoellner, M.; Cipolla, R.; Urtasun, R. Multinet: Real-time joint semantic reasoning for autonomous driving. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Suzhou, China, 26–30 June 2018; pp. 1013–1020.
[42] Qian, Y.; Dolan, J.M.; Yang, M. DLT-Net: Joint detection of drivable areas, lane lines, and traffic objects. IEEE Trans. Intell. Transp. Syst. 2019, 21, 4670–4679.
[43] Chen, Z.; Chen, Z. Rbnet: A deep neural network for unified road and road boundary detection. In Proceedings of the International Conference on Neural Information Processing, Long Beach, CA, USA, 4–9 December 2017; pp. 677–687.
[44] Zhang, J.; Xu, Y.; Ni, B.; Duan, Z. Geometric constrained joint lane segmentation and lane boundary detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 486–502.
[45] Liu, Y.; Yao, J.; Lu, X.; Xia, M.; Wang, X.; Liu, Y. RoadNet: Learning to comprehensively analyze road networks in complex urban scenes from high-resolution remotely sensed images. IEEE Trans. Geosci. Remote Sens. 2018, 57, 2043–2056.
[46] Meng, Z.; Xia, X.; Xu, R.; Liu, W.; Ma, J. HYDRO-3D: Hybrid Object Detection and Tracking for Cooperative Perception Using 3D LiDAR. IEEE Trans. Intell. Veh. 2023, 1–13.
[47] Xia, X.; Meng, Z.; Han, X.; Li, H.; Tsukiji, T.; Xu, R.; Zheng, Z.; Ma, J. An automated driving systems data acquisition and analytics platform. Transp. Res. Part C Emerg. Technol. 2023, 151, 104120.