Lane Detection | Encyclopedia MDPI

Subjects: Transportation Science & Technology

Contributors: Tao Xie, Mingfeng Yin, Xinyu Zhu, Jin Sun, Cheng Meng, Shaoyi Bei

Lane detection is a vital component of intelligent driving systems, offering indispensable functionality to keep the vehicle within its designated lane, thereby reducing the risk of lane departure.

Keywords: lane detection; re-parameterization; attention mechanism

1. Introduction

In recent years, intelligent transport systems (ITS) have developed rapidly and now play a key role in traffic safety [1]. Among the features of these systems, lane detection has received widespread attention as an important component of assisted driving. Lane lines clearly delineate the driving zones for different types of vehicles, which helps reduce road congestion and avoid collisions, thus improving road safety [2].

In practical driving, the complexity and diversity of traffic scenarios make lane detection challenging. For example, capturing the full shape of lane lines is difficult under glare or insufficient lighting, and the thin, elongated appearance of lane lines makes them easy for surrounding vehicles to occlude. Because drivers need timely feedback about road conditions, driver assistance systems must locate lane markings quickly. A central challenge in this area is therefore to balance lane detection accuracy against real-time responsiveness.

Lane detection technologies can be classified into two main categories: those relying on conventional image processing techniques and those relying on deep learning. Traditional lane detection algorithms primarily use computer vision and image processing techniques to distinguish the color [3,4], texture [5,6], and other features of lane lines from the surrounding road surface. Edge operators such as Sobel [7] and Canny [8] are employed to extract the boundaries of lane lines. Methods such as the Hough Transform [9,10] or Random Sample Consensus (RANSAC) [11,12] can then be applied to refine the detection results. For instance, Cai et al. [13] proposed using a Gaussian Statistical Color Model (G-SCM) to extract regions of interest based on lane line color characteristics, and then applied an improved Hough Transform to detect lanes within the extracted image region. Guo et al. [14] proposed combining an improved version of RANSAC with the least squares method to optimize model parameters, achieving better lane fitting results. However, traditional lane detection methods require manual feature selection and extraction. In intricate driving scenarios, these methods often struggle to discern clear lane lines, especially when structured lane lines are absent or lighting conditions vary.
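
The combination of RANSAC and least squares mentioned above can be illustrated with a minimal sketch. It is not the improved RANSAC of Guo et al. [14]; it is plain RANSAC on a straight-line model x = a*y + b (lane lines are close to vertical in the image, so x is modelled as a function of y), followed by a least-squares refit on the inliers. The function names, the tolerance, the iteration count, and the synthetic edge pixels are all assumptions made for illustration.

```python
import random


def fit_least_squares(points):
    """Fit x = a*y + b to (x, y) points by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    if var_y == 0:
        raise ValueError("all points lie on one image row; slope is undefined")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov / var_y
    return a, mean_x - a * mean_y


def ransac_lane_line(points, iterations=200, tolerance=2.0, seed=0):
    """Return ((a, b), inliers) for the line with the largest consensus set."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_inliers = []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if y1 == y2:
            continue  # a horizontal pair cannot define x as a function of y
        a = (x2 - x1) / (y2 - y1)
        b = x1 - a * y1
        inliers = [(x, y) for x, y in points if abs(x - (a * y + b)) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consensus line found")
    # Least-squares refinement on the consensus set only.
    return fit_least_squares(best_inliers), best_inliers


# Synthetic edge pixels (assumed data): 20 on the line x = 0.5*y + 100, plus clutter.
lane_pixels = [(0.5 * y + 100.0, float(y)) for y in range(0, 200, 10)]
clutter = [(300.0, 20.0), (10.0, 150.0), (250.0, 90.0), (40.0, 60.0), (330.0, 180.0)]
(slope, intercept), inliers = ransac_lane_line(lane_pixels + clutter)
print(round(slope, 3), round(intercept, 3), len(inliers))
```

The random sampling step discards clutter that would bias a direct least-squares fit, and the final refit uses every inlier rather than only the two sampled points.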

In contrast to conventional lane detection algorithms, deep learning techniques can automatically extract and learn features, continually updating model parameters through training on large-scale datasets [15]. This narrows the gap between predicted and actual results and addresses the difficulty of extracting lane features in complex scenarios. Nevertheless, deep learning demands a large volume of training data and high computational performance, so model complexity must be weighed carefully in practical applications.

Currently, deep-learning-based lane detection methods fall into three categories: segmentation-based [16,17,18,19,20,21], parameter-regression-based [22,23,24], and anchor-based [25,26,27,28]. Segmentation-based methods can be further divided into semantic segmentation and instance segmentation. Semantic segmentation classifies each pixel as either lane or background. Instance segmentation not only identifies the category of each pixel but also distinguishes between different instances of the same category, which makes it useful for detecting multiple lane lines, especially when their number varies. However, segmentation typically involves extensive computation, which makes it hard to meet the real-time requirements of driver assistance systems. Parameter-regression-based methods use a neural network to predict the parameters of a curve equation that represents each lane line. While these algorithms can describe lane lines of varying shape, their predictions are sensitive to errors in the regressed parameters, which leads to poorer generalization. Among anchor-based approaches, row-anchor-based methods use prior knowledge of lane line shapes and divide the image into a grid of cells along predefined rows. A classifier then selects, for each row, the cells that contain lanes. Although this approach offers relatively fast inference, its accuracy is not always optimal.
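
To make the row-anchor idea concrete, the sketch below decodes per-row classification scores into lane points. It assumes that each row anchor receives one score per grid cell plus one extra "no lane" class in the last position, and that a selected cell is mapped to the x coordinate of its centre. The function name, the score layout, and the example values are hypothetical and are not taken from any specific published implementation.

```python
def decode_row_anchors(scores, row_ys, image_width):
    """Turn per-row class scores into (x, y) lane points.

    scores[i] holds num_cells + 1 values for row anchor i; the last value
    is the assumed "no lane in this row" class.
    """
    points = []
    for row_scores, y in zip(scores, row_ys):
        num_cells = len(row_scores) - 1
        best = max(range(len(row_scores)), key=row_scores.__getitem__)
        if best == num_cells:
            continue  # the classifier chose "no lane" for this row
        cell_width = image_width / num_cells
        points.append(((best + 0.5) * cell_width, y))  # centre of the chosen cell
    return points


# Three row anchors, four grid cells each, on an 800-pixel-wide image (assumed values).
scores = [
    [0.1, 2.0, 0.3, 0.0, -1.0],  # cell 1 wins
    [0.0, 0.2, 3.1, 0.1, 0.5],   # cell 2 wins
    [0.1, 0.0, 0.2, 0.3, 4.0],   # "no lane" wins
]
lane_points = decode_row_anchors(scores, row_ys=[300, 400, 500], image_width=800)
print(lane_points)
```

Under these assumptions the classifier makes one decision per row anchor instead of one per pixel, which is where the reduction in computation relative to segmentation comes from.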

2. Lane Detection Based on Deep Learning

To cope with complex and ever-changing driving scenarios, researchers have applied deep-learning-based feature extraction methods to lane detection. Neven et al. [17] presented the LaneNet model, which consists of an embedding vector branch and a semantic segmentation network. The model uses an encoder-decoder structure to map the input image to high-dimensional features and back to the input resolution, and then determines for each pixel whether it belongs to a lane line. Seeking stronger semantic information extraction, Pan et al. [18] introduced a new network architecture, SCNN, which incorporates a spatial convolution layer to propagate information both vertically and horizontally. This layer contains connections in four directions (left, right, up, and down), which strengthens the correlation of long-distance spatial information. However, the overall structure of the model is complex and requires substantial computational resources, so both training and inference are time-consuming. Hou et al. [19] incorporated Self-Attention Distillation (SAD) into Convolutional Neural Networks (CNNs). This method performs knowledge distillation between different layers of the same network, making use of information from different layers to capture critical features. Because SAD is involved only in the training phase, it does not increase inference time, but it does increase the computational cost of training. Tabelini et al. [22] designed a parameter-based lane detection model, PolyLaneNet, which represents lane line shapes with polynomial curves. As a regression model, it is faster than segmentation models, but it lacks the ability to refine its predictions, and its detection precision is limited. Qin et al. [27] proposed a row-anchor-based lane detection method that transforms pixel-level classification into row-wise selection based on global features, thus reducing the computational load during inference. However, because the network architecture is simple, its detection accuracy can fall short. Tabelini et al. [25] proposed an anchor-based lane detection method. It extracts features for each anchor from the feature maps generated by the backbone network and then combines them with global features produced by an attention module. As a result, the model can connect information from multiple lanes, improving its detection accuracy compared to other anchor-based lane detection methods.
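
The polynomial representation used by parameter-regression models can be illustrated as follows. The sketch assumes that a network has already regressed the coefficients of a curve x = p(y) and only shows how such coefficients are turned into lane points. The coefficient ordering (highest degree first), the vertical range arguments, and the example coefficients are assumptions for illustration, not the PolyLaneNet output format.

```python
def sample_polynomial_lane(coeffs, y_start, y_end, num_points):
    """Evaluate x = p(y) at evenly spaced y values and return (x, y) pairs.

    coeffs lists the polynomial coefficients from highest degree to lowest.
    """
    if num_points < 2:
        raise ValueError("need at least two sample points")
    step = (y_end - y_start) / (num_points - 1)
    points = []
    for i in range(num_points):
        y = y_start + i * step
        x = 0.0
        for c in coeffs:
            x = x * y + c  # Horner's rule
        points.append((x, y))
    return points


# Hypothetical regressed coefficients for x = 0.5*y^2 - y + 2.
lane = sample_polynomial_lane([0.5, -1.0, 2.0], y_start=0.0, y_end=4.0, num_points=3)
print(lane)
```

Because the whole lane is determined by a handful of coefficients, a small error in a high-order coefficient shifts every sampled point, which is consistent with the sensitivity of regression-based methods noted above.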

3. Re-Parameterization

With the continuous development of CNNs, a series of high-precision models have emerged. These models often have deeper layers and more complex modules to achieve better prediction and recognition capabilities. However, this complexity frequently leads to significant computational resource consumption, making real-time inference challenging. To achieve faster inference while maintaining high precision, strategies based on structural re-parameterization have been widely adopted. For example, ACNet [30] uses asymmetric convolutions to construct the network, improving the robustness of the model to rotational distortion without increasing the computational cost of deployment. The RepVGG [31] model has different structures in its training and inference phases. During training, it uses a multi-branch topology to capture information at multiple scales. During inference, it uses a single-branch architecture reminiscent of VGG [32], consisting only of 3 × 3 convolutions and ReLU, to ensure efficiency. The Diverse Branch Block (DBB) [33] is a multi-branch structure similar to Inception that can replace any K × K convolution in a model during training. Its branches capture multi-scale features, enriching the extracted image information, and the block can be converted back into a single convolution for inference.
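
The core of structural re-parameterization is that parallel linear branches can be merged into one kernel. The sketch below demonstrates this for a RepVGG-style block with a 3 × 3 branch, a 1 × 1 branch, and an identity branch. It is heavily simplified: it uses a single channel, omits batch normalization (which in practice is folded into the convolution weights before merging), and relies on hypothetical helper names and example values.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (odd square kernel)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < h and 0 <= c < w:
                        total += image[r][c] * kernel[di][dj]
            out[i][j] = total
    return out


def fuse_branches(kernel3x3, weight1x1):
    """Merge the 3x3, 1x1, and identity branches into one 3x3 kernel."""
    fused = [row[:] for row in kernel3x3]
    # The 1x1 branch and the identity branch both act only on the centre tap.
    fused[1][1] += weight1x1 + 1.0
    return fused


image = [[1.0, 2.0, 0.0, 1.0],
         [0.0, 1.0, 3.0, 1.0],
         [2.0, 1.0, 0.0, 0.0],
         [1.0, 0.0, 1.0, 2.0]]
kernel3x3 = [[0.0, 1.0, 0.0],
             [1.0, -4.0, 1.0],
             [0.0, 1.0, 0.0]]
weight1x1 = 0.5

# Training-time form: three parallel branches whose outputs are summed.
branch3 = conv2d_same(image, kernel3x3)
branch1 = conv2d_same(image, [[weight1x1]])
multi_branch = [[branch3[i][j] + branch1[i][j] + image[i][j] for j in range(4)]
                for i in range(4)]

# Inference-time form: a single 3x3 convolution with the fused kernel.
fused = fuse_branches(kernel3x3, weight1x1)
single_branch = conv2d_same(image, fused)

max_diff = max(abs(multi_branch[i][j] - single_branch[i][j])
               for i in range(4) for j in range(4))
print(max_diff)
```

The two forms produce the same output because convolution is linear, so the inference-time model pays for one convolution instead of three branches.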

4. Attention Mechanisms

The attention mechanism dynamically changes the weight of each feature in the image, mimicking the selective perception of the human visual system. It focuses on the critical areas of the image and suppresses irrelevant information. SENet [34] was the first to introduce attention into the channel dimension. It models the dependencies between convolutional feature channels through squeeze and excitation operations, allowing the model to learn a weight for each channel and make better use of important features. ECANet [35] is an adaptive channel attention mechanism. It does not rely on fully connected layers and considers only the interaction between neighboring channels, reducing computational cost and memory consumption. To strengthen feature extraction further, researchers have considered both channel and spatial dependencies and designed fusions of different attention mechanisms. For example, CBAM [36] combines information from the channel and spatial dimensions, enabling the network to extract more comprehensive features. DANet [37] places a position attention module and a channel attention module in parallel, enabling local features to establish rich contextual dependencies and effectively improving detection results.
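
The squeeze and excitation operations can be sketched in a few lines. The example below assumes the common form of channel attention: global average pooling per channel (squeeze), followed by two small fully connected layers with ReLU and sigmoid activations (excitation), and finally a per-channel rescaling. Biases are omitted, and the function name, weight matrices, and feature values are hypothetical.

```python
import math


def squeeze_excite(feature_maps, w_reduce, w_expand):
    """Apply channel attention to a list of C feature maps (each H x W).

    w_reduce has shape (hidden, C) and w_expand has shape (C, hidden).
    Returns the rescaled feature maps and the per-channel weights.
    """
    # Squeeze: one average value per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_maps]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
               for row in w_expand]
    # Scale: multiply every value in a channel by that channel's weight.
    scaled = [[[v * wt for v in row] for row in ch]
              for ch, wt in zip(feature_maps, weights)]
    return scaled, weights


# Two 2x2 channels and a one-unit bottleneck (assumed values).
feature_maps = [[[1.0, 2.0], [3.0, 4.0]],
                [[0.0, 0.0], [0.0, 8.0]]]
w_reduce = [[1.0, -1.0]]
w_expand = [[2.0], [-2.0]]
scaled, channel_weights = squeeze_excite(feature_maps, w_reduce, w_expand)
print([round(w, 4) for w in channel_weights])
```

An ECANet-style variant would replace the two fully connected layers with a small 1D convolution over the squeezed channel vector, so that each channel weight depends only on neighboring channels.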

This entry is adapted from the peer-reviewed paper 10.3390/s23198285

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.