Article

A Fast and Robust Lane Detection via Online Re-Parameterization and Hybrid Attention

Tao Xie, Mingfeng Yin, Xinyu Zhu, Jin Sun, Cheng Meng and Shaoyi Bei
School of Automobile and Traffic Engineering, Jiangsu University of Technology, Changzhou 213001, China
* Author to whom correspondence should be addressed.
Sensors 2023, 23(19), 8285; https://doi.org/10.3390/s23198285
Submission received: 21 August 2023 / Revised: 22 September 2023 / Accepted: 5 October 2023 / Published: 7 October 2023
(This article belongs to the Section Vehicular Sensing)

Abstract

Lane detection is a vital component of intelligent driving systems, offering indispensable functionality to keep the vehicle within its designated lane and thereby reducing the risk of lane departure. However, the complexity of the traffic environment, coupled with the rapid movement of vehicles, creates many challenges for detection tasks. Current lane detection methods suffer from issues such as low feature extraction capability, poor real-time performance, and inadequate robustness. To address these issues, this paper proposes a lane detection algorithm that combines an online re-parameterization ResNet with a hybrid attention mechanism. First, we replace standard convolution with online re-parameterization convolution, simplifying the convolutional operations during the inference phase and thereby reducing detection time. To improve the performance of the model, a hybrid attention module is incorporated to strengthen the focus on elongated targets. Finally, a row anchor lane detection method is introduced to analyze the existence and location of lane lines row by row in the image and output the predicted lane positions. The experimental results show that the model achieves F1 scores of 96.84% and 75.60% on the publicly available TuSimple and CULane lane datasets, respectively. Moreover, the inference speed reaches a notable 304 frames per second (FPS). The overall performance outperforms other detection models and fulfills the real-time and robustness requirements of lane detection tasks.

1. Introduction

In recent years, intelligent transport systems (ITS) have developed rapidly and now play a key role in traffic safety [1]. Among the features of these systems, lane detection technology has received widespread attention as an important component of assisted driving. Lane lines clearly delineate driving zones for various types of vehicles, which helps reduce road congestion and avoid collisions, thus ensuring road safety [2].
In practical driving, the complexity and diversity of traffic scenarios challenge lane detection. For example, capturing the full shape of lane lines is difficult under conditions of dazzle or insufficient lighting. The thin and elongated appearance of lane lines makes them susceptible to obscuration by surrounding vehicles. Drivers need timely feedback about road conditions. Thus, driver assistance systems must swiftly ascertain the location of lane markings. A formidable challenge in this area is to achieve a balance between lane detection accuracy and real-time responsiveness.
Lane detection technologies can be classified into two main categories: one relying on conventional image processing techniques, and the other on deep learning approaches. Traditional lane detection algorithms primarily use computer vision techniques alongside image processing methodologies to discern the color [3,4], texture [5,6], and other features of lane lines against the surrounding road surface. Algorithms such as Sobel [7] and Canny [8] are employed to extract the boundaries of lane lines. Furthermore, incorporating methodologies such as the Hough Transform [9,10] or Random Sample Consensus (RANSAC) [11,12] can further refine the detection results. For instance, Cai et al. [13] suggested using a Gaussian Statistical Color Model (G-SCM) to extract areas of interest based on lane line color characteristics, which was then combined with an improved Hough Transform for lane detection within the extracted image region. Guo et al. [14] suggested combining an improved RANSAC variant with the Least Squares method to optimize model parameters, achieving enhanced lane fitting results. However, traditional lane detection methods require manual feature selection and extraction. In intricate driving scenarios, these methods often struggle to discern clear lane lines, especially in the absence of structured lane lines or under variable lighting conditions.
Contrary to the conventional lane detection algorithms, deep learning techniques can automatically extract and learn features, continually updating model parameters through training on large-scale datasets [15]. This narrows the gap between predictive outcomes and actual results, addressing the challenges of lane feature extraction in complex scenarios. Nevertheless, deep learning demands a vast volume of training data and high computational performance. Therefore, the complexity of the model requires thorough consideration for its practical applications.
Currently, deep-learning-based methods for lane detection consist of three categories: those founded on segmentation [16,17,18,19,20,21], parameter regression [22,23,24], and anchor-based methods [25,26,27,28]. Segmentation-based detection methods can be further divided into semantic segmentation and instance segmentation. Pixels are classified by semantic segmentation in order to identify lanes and backgrounds as separate categories. On the other hand, instance segmentation not only identifies the category of each pixel, but also distinguishes between different instances of objects, making it useful for detecting multiple lane lines, especially when their count varies. However, segmentation tasks typically involve extensive computation, posing challenges to the real-time requirements of driver assistance systems. Parameter regression-based methods use neural network regression to predict parameters. These parameters are then used to construct a curve equation representing the lane lines. While these algorithms can identify lane lines with changing shapes, their predictions are significantly influenced by regression parameters, leading to poorer model generalization. Row-anchor-based methods use prior knowledge of lane line shapes and divide the image into location grids oriented in the row direction. A classifier then returns grids containing lanes. Although this method provides relatively quick inference speeds, its accuracy might not always be optimal.
Building on prior work in lane detection and considering the requirements for real-time and accurate performance, this paper refines the row-anchor-based detection method. First, drawing on the concept of online re-parameterization [29], we convert the multi-branch convolution into a single-structured online re-parameterization convolution. Using this convolution in the network not only preserves detection accuracy, but also reduces both training cost and inference time. To further improve detection accuracy, we design a hybrid attention module that combines positional and channel attention mechanisms. This module captures the spatial relationships and channel dependencies in the image information, thereby enhancing the performance of the network. The main contributions of this paper are outlined as follows:
  • We propose a lane detection model that integrates online re-parameterized ResNet and row-anchor classification. This model possesses efficient inference speed, ensuring real-time detection under various complex traffic scenarios.
  • A hybrid attention module combining position and channel attention is designed, which captures feature information more comprehensively, enabling the model to focus on the slender lane line details in the image.
  • Comparative experiments are performed on the TuSimple and CULane datasets with other lane detection models. Our model achieves better detection results. The experiments demonstrate that the proposed model meets the accuracy and robustness requirements for lane detection.

2. Related Work

2.1. Lane Detection Based on Deep Learning

To cope with the complex and ever-changing driving scenarios, researchers have applied deep-learning-based feature extraction methods to lane detection. Neven et al. [17] present the LaneNet model, which consists of an embedding vector branch and a semantic segmentation network. This model employs an encoding–decoding operation to transform input images into high-dimensional feature vectors and back to the original image, successively determining whether each pixel belongs to the lane line. Seeking enhanced semantic information extraction capabilities, Pan et al. [18] introduced an original network architecture, SCNN, which incorporates a spatial convolution layer to facilitate both vertical and horizontal information propagation. The convolution layer contains connections in four directions: left, right, up and down, thereby enhancing the correlation of long-distance spatial information. However, the overall structure of the model is complex, requiring substantial computational resources and time. Consequently, the training and inference processes are significantly time-consuming. Hou et al. [19] incorporated Self-Attention Distillation (SAD) into Convolutional Neural Networks (CNNs). This innovative method facilitates knowledge distillation between different layers, enabling efficient utilization of information from varying layers to capture critical feature information. It is important to note that while SAD is only involved in the training phase and does not increase inference time, it inevitably escalates the computational cost of model training. Tabelini et al. [22] designed a parameter-based lane line detection model, PolyLaneNet, which represents lane line shapes through polynomial curves. As a regression model, it boasts a faster detection speed compared to segmentation models, but its refining ability is inadequate, and the detection precision is lacking. Qin et al. [27] suggested a row-anchor-based lane detection method, transforming pixel-level classification into global row selection classification, thus reducing the computational load during the inference process. However, due to the simplicity of the network architecture, the lane detection results may be somewhat deficient. Tabelini et al. [25] proposed an anchor point-based lane detection method. This method extracts features from each anchor point using feature maps generated by the main network and then combines these features with the global ones produced by the attention module. As a result, the model can connect information from multiple lanes, improving its detection accuracy compared to other anchor-based lane line detection methods.

2.2. Re-Parameterization

With the continuous development of CNNs, a series of high-precision models have emerged. These models often have deeper layers and more complex modules to achieve better prediction and recognition capabilities. However, the complexity of these models frequently leads to significant computational resource consumption, making real-time inference challenging. To enable models to achieve faster inference speeds while maintaining high precision, a strategy based on structural re-parameterization has been widely adopted. For example, ACNet [30] utilizes asymmetric convolution to construct the network, improving the robustness of the model to rotational distortion without increasing the computational cost of deployment. The RepVGG [31] model features different structures in its training and inference phases. During training, the model leverages a multi-branch topology structure to capture information at multiple scales. In contrast, during inference, it employs a single-branch architecture reminiscent of VGG [32], consisting of 3 × 3 convolutions and ReLU, to ensure efficient inference. The Diverse Branch Block (DBB) [33], a structure paralleling the Inception model, incorporates a multi-branch design. This design permits the substitution of any K × K convolution within the model throughout the training phase, capturing multi-scale features and thereby enriching the image information extracted.

2.3. Attention Mechanisms

The attention mechanism dynamically changes the weight of each feature in the image, mimicking the selective perception of the human visual system. It focuses on the critical areas of the image and suppresses irrelevant information. SENet [34] is the first to introduce attention into the channel dimension. It establishes the dependency relationship between convolutional feature channels through squeeze and excitation operations, allowing the model to learn to allocate weights to different channels and improve the utilization efficiency of important features. ECANet [35] is an adaptive channel attention mechanism. It does not depend on the full connection operation and focuses only on the cross-channel interaction of neighboring channels, reducing computational cost and memory consumption. To augment feature extraction, researchers consider the dependency relationships of channels and space, and design a fusion of different attention mechanisms. For example, CBAM [36] concurrently incorporates information from the primary dimensions of channels and spatial contexts, thereby enabling the network to extract more comprehensive features. DANet [37] designs parallel structure position attention modules and channel attention modules, enabling local features to establish rich context dependencies and effectively improving the detection results.

3. Proposed Method

Figure 1 illustrates a lane detection model that integrates online re-parameterization convolution and a hybrid attention module, which is primarily composed of an encoding network and a decoding network. The encoding network uses an online re-parameterization ResNet as its backbone to extract image features and incorporates a hybrid attention mechanism to enhance its focus on elongated lane line information. The decoding network processes the deep features from the encoding network, converts them into a flattened structure, and then feeds them into the Multi Layer Perceptron (MLP) classifier. This network utilizes a row-anchor classification detection method, classifying the image based on anchor points in each row, and ultimately outputs the lane line positions via the existence and localization branches.

3.1. Online Re-Parameterization

In response to the latency issues in real-time lane detection, we adopt a convolutional structure re-parameterization method to reduce detection time. The core idea involves employing multi-branch model structures during the training process. When transitioning to actual deployment, this intricate structure is condensed into a singular architecture through equivalent transformations. While this approach reduces the inference time without compromising the model’s performance, the model is quite complex during the training phase and thus has considerable training costs. To mitigate this, we further incorporate an online re-parameterization strategy in our model design. This allows the multi-branch structure to be reparameterized in real-time during the training phase, ensuring efficient operation even with limited hardware resources. The structure transformation of online re-parameterization is illustrated in Figure 2.
The online re-parameterization process unfolds in two stages: linearization and block squeezing. During training, the normalization layer is nonlinear, complicating the merging of intermediate layers into a singular convolutional layer. In the initial stage, all nonlinear normalization layers are removed, and in their place, linear scaling layers are introduced. Functionally akin to normalization layers, these scaling layers foster diversity in the optimization across different branches. To cap off the process, a normalization layer is reintroduced post-module, which serves to hasten convergence and stabilize the model throughout its training.
After linearization, block squeezing is carried out, simplifying the complex multi-branch topology into a single convolutional layer. We will now detail the simplification process. The operation of a two-dimensional convolution kernel can be described as follows:
$Y = W \ast X$ (1)
where $X$ and $Y$, respectively, denote the input and output tensors, $W$ denotes the weight of the convolution, and $\ast$ denotes the convolution operation.
For a multi-branch topology, we first simplify the serial structure. Multiple sequential convolutional layers are represented as follows:
$Y = W_N \ast (W_{N-1} \ast (\cdots (W_2 \ast (W_1 \ast X))))$ (2)
In the sequential convolutional architecture, the channel count remains consistent. By leveraging the associative property of convolution, we can combine multiple convolutional layers into a single layer by first convolving the kernels. The conversion process can be illustrated as follows:
$Y = (W_N \ast (W_{N-1} \ast (\cdots (W_2 \ast W_1)))) \ast X = W_e \ast X$ (3)
where $W_e$ denotes the end-to-end mapping weight of the entire sequential structure.
In the subsequent steps, we simplify the parallel structure. Leveraging the linear superposition property of the convolution operation, the linear combination of multiple convolutions is equivalent to performing the linear combination first and then the convolution. The process is specifically demonstrated as follows:
$Y = \sum_{m=1}^{M} (W_m \ast X) = \left( \sum_{m=1}^{M} W_m \right) \ast X$ (4)
where $W_m$ denotes the weight of the $m$-th branch, and $\sum_{m=1}^{M} W_m$ denotes the total weight of the convolutional network.
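To make these two squeezing rules concrete, the short PyTorch sketch below (illustrative only, not the paper's implementation; a 1 × 1 followed by a 3 × 3 convolution stands in for the general serial case) verifies numerically that convolving the kernels reproduces the serial structure and that summing the kernels reproduces the parallel structure:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 8, 32, 32)          # dummy feature map

# Serial squeezing: a 1x1 convolution followed by a 3x3 convolution
w1 = torch.randn(8, 8, 1, 1)           # first layer, 1x1 kernel
w2 = torch.randn(8, 8, 3, 3)           # second layer, 3x3 kernel
y_serial = F.conv2d(F.conv2d(x, w1), w2, padding=1)
w_e = F.conv2d(w2, w1.permute(1, 0, 2, 3))   # convolve the kernels -> end-to-end weight W_e
y_merged = F.conv2d(x, w_e, padding=1)
print(torch.allclose(y_serial, y_merged, atol=1e-4))    # True

# Parallel squeezing: the sum of branch outputs equals one convolution
# with the summed kernel
wa = torch.randn(8, 8, 3, 3)
wb = torch.randn(8, 8, 3, 3)
y_parallel = F.conv2d(x, wa, padding=1) + F.conv2d(x, wb, padding=1)
y_summed = F.conv2d(x, wa + wb, padding=1)
print(torch.allclose(y_parallel, y_summed, atol=1e-4))  # True
```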
Based on the two simplification principles mentioned above, the complex multi-branch topology can be compressed into a single linear block. After this transformation, the complex module only requires one convolution, converting operations on the intermediate feature maps into operations on the convolutional kernel. This results in obtaining end-to-end mapping weights, thereby reducing the training cost. Consequently, complex multi-branch structures can be used in model design to ensure accurate detection and achieve faster inference speeds. As illustrated in Figure 3, this paper builds upon the DBB [33] structure by integrating a depthwise separable convolution branch [38] and frequency filter branch [39], leading to the creation of an online re-parameterization convolution module. This enhanced structure showcases notable generalization capabilities. Furthermore, the online re-parameterization convolution module replaces the standard convolution in ResNet18, and the improved network is termed OREP_ResNet.
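The reduced sketch below illustrates the online aspect of this idea: the branch weights are squeezed into a single 3 × 3 kernel at every forward pass, so the feature map is only convolved once even during training. It keeps just a 3 × 3 branch, a 1 × 1 branch, per-branch scaling layers, and a post-module normalization; the full branch set of the actual OREP module (DBB, depthwise separable, and frequency filter branches) is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleOnlineRepConv(nn.Module):
    """Simplified sketch of an online re-parameterized convolution block."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.w3x3 = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.1)
        self.w1x1 = nn.Parameter(torch.randn(out_ch, in_ch, 1, 1) * 0.1)
        # linear scaling layers standing in for the removed per-branch normalization
        self.s3x3 = nn.Parameter(torch.ones(out_ch, 1, 1, 1))
        self.s1x1 = nn.Parameter(torch.ones(out_ch, 1, 1, 1))
        self.post_bn = nn.BatchNorm2d(out_ch)   # normalization re-introduced after the block

    def merged_weight(self):
        # parallel squeezing: pad the 1x1 kernel to 3x3 and sum the scaled branches
        w1 = F.pad(self.s1x1 * self.w1x1, [1, 1, 1, 1])
        return self.s3x3 * self.w3x3 + w1

    def forward(self, x):
        # a single convolution with the merged kernel, during training and inference alike
        return self.post_bn(F.conv2d(x, self.merged_weight(), padding=1))

block = SimpleOnlineRepConv(64, 64)
out = block(torch.randn(2, 64, 56, 56))   # -> (2, 64, 56, 56)
```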

3.2. Hybrid Attention Module

To more comprehensively capture the fundamental feature information in complex scenarios, we contemplate integrating an attention mechanism into the lane detection model [40]. We designed a Hybrid Attention Module (HAM) comprising the Efficient Channel Attention Module (ECAM) and the Positional Attention Module (PAM). By integrating these two mechanisms, the model is better equipped to attain a multi-dimensional feature representation, leading to heightened accuracy and enhanced robustness.
As illustrated in Figure 4, ECAM first applies Global Average Pooling (GAP) to the input feature map, resulting in a 1 × 1 × C feature vector. This operation reduces the size of the input features and effectively simplifies the complex spatial characteristics. The subsequent step is the processing of channel feature maps. ECAM replaces the Fully Connected Layer (FC) commonly found in traditional channel attention modules with a one-dimensional convolution operation. The output of the one-dimensional convolution is passed through the sigmoid activation function to obtain the weights for each channel. Notably, the size of this one-dimensional convolution kernel can be dynamically adjusted. The convolution kernel size is denoted as ‘k’, which signifies the number of adjacent channels involved in attention prediction. Typically, dynamic kernel sizes are selected in accordance with the principle that layers hosting a larger quantity of channels necessitate a larger span of channel interaction, thereby employing a larger convolution kernel. Conversely, layers with fewer channels utilize a smaller kernel. The specific kernel size is dictated by the kernel adaptive Formula (5). Lastly, the 1 × 1 × C feature map is element-wise multiplied with the original H × W × C input feature map, realizing adaptive channel weighting.
$k = \psi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{odd}$ (5)
where $|t|_{odd}$ denotes the odd number closest to $t$, $C$ denotes the number of channels, and $\gamma$ and $b$ dictate the ratio between the channel number and the convolution kernel size. Conventionally, the values selected for $\gamma$ and $b$ are 2 and 1, respectively.
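As a reference point, a minimal ECAM sketch following the ECA-Net formulation is given below; the kernel size follows Formula (5), while the exact layer placement inside the paper's module is not reproduced here.

```python
import math
import torch
import torch.nn as nn

def eca_kernel_size(channels, gamma=2, b=1):
    # nearest odd number to log2(C)/gamma + b/gamma, as in Formula (5)
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 else t + 1

class ECAM(nn.Module):
    """Minimal efficient channel attention sketch."""
    def __init__(self, channels):
        super().__init__()
        k = eca_kernel_size(channels)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                          # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))                     # global average pooling -> (N, C)
        w = self.conv(w.unsqueeze(1)).squeeze(1)   # 1-D convolution across channels
        w = torch.sigmoid(w)                       # per-channel weights
        return x * w.unsqueeze(-1).unsqueeze(-1)   # adaptive channel weighting

print(eca_kernel_size(64), eca_kernel_size(512))   # 3 5
```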
As illustrated in Figure 5, PAM first convolves the input feature map $X \in \mathbb{R}^{C \times H \times W}$ to produce three feature maps: A, B, and C. Each of these feature maps is subsequently reshaped to $\mathbb{R}^{C \times N}$ ($N = H \times W$). Following this, the transpose of A is matrix-multiplied with B, and the resulting product is fed into a softmax layer to obtain $S \in \mathbb{R}^{N \times N}$, which represents the correlation between two positions on the feature map, as detailed below:
$S_{ji} = \frac{\exp(A_i \cdot B_j)}{\sum_{i=1}^{N} \exp(A_i \cdot B_j)}$ (6)
Next, S is matrix-multiplied with C, and the resulting output is added to the original feature map to obtain the final positional weights, as represented below:
$Y_j = \lambda \sum_{i=1}^{N} (S_{ji} C_i) + X_j$ (7)
where $\lambda$ denotes a learnable weight that is initialized to 0 and gradually increases during training. Thus, a different weight can be obtained for each position, realizing adaptive position weighting.
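A compact PAM sketch that follows Equations (6) and (7) is shown below; using plain 1 × 1 convolutions for the three projections and a single learnable $\lambda$ is an assumption, since those layer details are not spelled out here.

```python
import torch
import torch.nn as nn

class PAM(nn.Module):
    """Minimal position attention sketch in the style of DANet."""
    def __init__(self, channels):
        super().__init__()
        self.proj_a = nn.Conv2d(channels, channels, 1)   # produces A
        self.proj_b = nn.Conv2d(channels, channels, 1)   # produces B
        self.proj_c = nn.Conv2d(channels, channels, 1)   # produces C
        self.lam = nn.Parameter(torch.zeros(1))          # lambda, initialized to 0

    def forward(self, x):                                # x: (N, C, H, W)
        n, c, h, w = x.shape
        a = self.proj_a(x).flatten(2).transpose(1, 2)    # (N, HW, C) -> rows are A_i
        b = self.proj_b(x).flatten(2)                    # (N, C, HW) -> columns are B_j
        s = torch.softmax(a @ b, dim=1)                  # S_ji, normalized over i as in Eq. (6)
        v = self.proj_c(x).flatten(2)                    # (N, C, HW) -> columns are C_i
        y = (v @ s).view(n, c, h, w)                     # Y_j = sum_i S_ji * C_i
        return self.lam * y + x                          # Eq. (7), residual connection
```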
The final step is the fusion of the two attention output feature maps. This fusion process integrates the positional data derived from the global contextual information, while assigning channel weights enriched with more target features, making the model more sensitive to detail information.
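Building on the two sketches above, the fusion could be written as a small wrapper; treating the fusion as an element-wise sum of the two attention outputs (as in DANet) is an assumption, since the exact fusion operator is not spelled out here.

```python
import torch.nn as nn

class HAM(nn.Module):
    """Hybrid attention sketch combining the ECAM and PAM sketches above."""
    def __init__(self, channels):
        super().__init__()
        self.ecam = ECAM(channels)   # channel attention branch
        self.pam = PAM(channels)     # position attention branch

    def forward(self, x):
        # assumed fusion: element-wise sum of the two attention outputs
        return self.ecam(x) + self.pam(x)
```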

3.3. Row Anchor Classification

In the decoding phase, we use a detection method based on row anchor classification [41] to predict the position of the lane lines, given that lane lines in the input images are largely continuous in the column direction and, being slender, occupy only a small area in the row direction. Furthermore, to ensure the real-time performance of the model, we eliminate the segmentation branch during the decoding stage, thereby streamlining the model. As shown in Figure 6, the $H \times W$ image undergoes a grid division process to obtain $h \times (w + 1)$ cells containing position information. Compared with the $H \times W \times (n + 1)$ classification computations required by the segmentation method, the $h \times (w + 1) \times n$ classification computations required by row anchor classification are significantly fewer. This reduction in computational complexity enables the model to more effectively satisfy the real-time demands of lane detection.
In the context of row anchor classification, the mathematical formula utilized for predicting the lane lines in each cell is as outlined below:
$P_{i,j} = f^{ij}(X), \quad i \in [1, n], \; j \in [1, h]$ (8)
where $P_{i,j}$ denotes the probability that each cell within the image contains lane lines, $n$ denotes the total count of lane lines present in the image, and $h$ denotes the quantity of row anchors.
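A quick back-of-the-envelope comparison of the two output sizes is given below; the image size, grid configuration, and lane count are illustrative assumptions rather than the paper's exact settings.

```python
# Illustrative comparison of classification outputs for one image
H, W = 288, 800        # assumed image height and width
h, w = 56, 100         # assumed number of row anchors and grid cells per row
n = 4                  # assumed maximum number of lanes

segmentation_outputs = H * W * (n + 1)   # per-pixel classification
row_anchor_outputs = h * (w + 1) * n     # per-row-anchor classification
print(segmentation_outputs)              # 1152000
print(row_anchor_outputs)                # 22624
print(segmentation_outputs // row_anchor_outputs)   # roughly 50x fewer outputs
```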

3.4. Loss Function

First, we obtain the predicted values at each position using a classifier and calculate the loss for the predicted values compared to the true labels. The corresponding formula is as follows:
$L_{cls} = \sum_{i=1}^{n} \sum_{j=1}^{h} L_{CE}(P_{i,j}, T_{i,j})$ (9)
where $L_{CE}$ denotes the cross-entropy loss, and $T_{i,j}$ denotes the true class label.
Additionally, we add the loss calculation of the lane line existence branch, thereby filtering out the coordinate points without lane lines. The remaining coordinates constitute the lane lines, and the corresponding formula is as follows:
$L_{ext} = \sum_{i=1}^{n} \sum_{j=1}^{h} L_{CE}(E_{i,j}, TE_{i,j})$ (10)
where $E_{i,j}$ denotes the predicted existence value, and $TE_{i,j}$ denotes the true existence label.
To sum up, the total loss associated with the model is illustrated as follows:
$L_{total} = \alpha L_{cls} + \beta L_{ext}$ (11)
where $\alpha$ and $\beta$ denote the loss weight coefficients of the two terms; in the experiments, both are set to 1.
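A hedged sketch of this total loss with $\alpha = \beta = 1$ is shown below; the tensor shapes (location logits over $w+1$ grid cells per row anchor, and two existence classes per anchor) are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

n_lanes, h_anchors, w_cells = 4, 56, 100                        # assumed configuration
loc_logits = torch.randn(8, w_cells + 1, h_anchors, n_lanes)    # batch of 8, location branch
loc_target = torch.randint(0, w_cells + 1, (8, h_anchors, n_lanes))
ext_logits = torch.randn(8, 2, h_anchors, n_lanes)              # existence branch (lane / no lane)
ext_target = torch.randint(0, 2, (8, h_anchors, n_lanes))

l_cls = F.cross_entropy(loc_logits, loc_target)   # classification loss, Eq. (9)
l_ext = F.cross_entropy(ext_logits, ext_target)   # existence loss, Eq. (10)
alpha, beta = 1.0, 1.0
l_total = alpha * l_cls + beta * l_ext            # total loss, Eq. (11)
```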

4. Experiment

4.1. Datasets

In this study, we focus our experimental design and analysis on two publicly available lane line datasets: TuSimple and CULane. The TuSimple dataset, which is widely used in the field of autonomous driving, consists of 6408 images of highway driving, encompassing various traffic conditions and road structures. The images in this dataset have a resolution of 1280 × 720 pixels; this high resolution facilitates the detection of lane lines at greater distances during training. The CULane dataset, a large-scale dataset for general lane line detection, consists of 133,235 images annotated for lane lines with a resolution of 1640 × 590 pixels, covering a diverse range of lighting conditions, road types, and complex traffic environments. The detailed information about these two datasets is presented in Table 1.

4.2. Experimental Environment

The experimental environment is set up using the PyTorch 1.11.0 deep learning framework on the Ubuntu 20.04 operating system, complemented by CUDA 11.3. The hardware configuration includes an NVIDIA GeForce RTX 2080Ti graphics card and an Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50 GHz. For model optimization, we employ the Stochastic Gradient Descent (SGD) algorithm with a batch size of 16, a learning rate of 0.01, and a weight decay coefficient of 0.0001.
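For reference, the optimizer configuration described above might be set up as follows; the momentum value and the placeholder model are assumptions, since they are not specified in the text.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 8, 3)   # placeholder; stands in for the lane detection network
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,              # learning rate as stated
    momentum=0.9,         # assumed; not stated in the text
    weight_decay=0.0001,  # weight decay coefficient as stated
)
# The training DataLoader would use batch_size=16 as stated, e.g.:
# loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)
```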

4.3. Evaluation Indicators

The official CULane dataset provides the F1 score as a performance evaluation metric for models. The calculation formula is presented in Equation (12). The evaluation first computes the Intersection over Union (IoU) between the ground-truth and predicted lane line pixels: predictions with an IoU greater than 0.5 are counted as True Positives (TP), while those with an IoU less than 0.5 are counted as False Positives (FP). Undetected lane segments are counted as False Negatives (FN). The F1 score is the harmonic mean of precision and recall; a higher F1 score indicates better overall model performance, signifying a reduced probability of false and missed detections.
$Recall = \frac{TP}{TP + FN}, \quad Precision = \frac{TP}{TP + FP}, \quad F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$ (12)
The official TuSimple dataset provides Accuracy, FP and FN as the primary evaluation indicators. ACC reflects the ratio of samples that are correctly predicted to the total number of samples. FP refers to instances where the model incorrectly categorizes a negative sample as positive, whereas FN signifies cases where actual positives are misidentified as negatives. A decrease in these two metrics indicates an enhancement in the model’s accuracy. The calculation formulas are as follows:
$Accuracy = \frac{\sum_{clip} C_{clip}}{\sum_{clip} S_{clip}}$ (13)
$FP = \frac{F_{pred}}{N_{pred}}$ (14)
$FN = \frac{M_{pred}}{N_{gt}}$ (15)
where $C_{clip}$ represents the number of accurately predicted lane points and $S_{clip}$ represents the total number of actual lane points. $F_{pred}$ is the number of erroneously predicted lane points, while $N_{pred}$ is the total number of predicted lane points. $M_{pred}$ stands for the number of lane points that were not predicted, and $N_{gt}$ denotes the genuine count of existing lane points.
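For clarity, the metrics above reduce to a few lines of arithmetic, as in the sketch below; the function and argument names are illustrative, not the benchmark's API.

```python
def f1_score(tp, fp, fn):
    """F1 from Eq. (12): harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def tusimple_metrics(correct_pts, gt_pts, wrong_pts, pred_pts, missed_pts):
    """Accuracy, FP, and FN rates from Eqs. (13)-(15) over per-clip totals."""
    accuracy = correct_pts / gt_pts
    fp_rate = wrong_pts / pred_pts
    fn_rate = missed_pts / gt_pts
    return accuracy, fp_rate, fn_rate

print(round(f1_score(tp=90, fp=5, fn=8), 4))   # 0.9326
```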
Furthermore, in our experiments, the F1 score was also employed for the TuSimple dataset to provide a comprehensive evaluation of the overall capabilities of the model. Concurrently, FPS represents the number of frames the model can process within a single second. A higher FPS value signifies a swifter inference capability of the model, serving as a measure of its real-time performance.
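One common way to measure FPS is to time repeated forward passes after a warm-up, as in the hedged sketch below; the input resolution and iteration counts are assumptions, and the paper's exact timing protocol may differ.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, input_size=(1, 3, 288, 800), warmup=20, runs=200):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    for _ in range(warmup):          # warm-up iterations are excluded from timing
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return runs / (time.time() - start)
```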

4.4. Module Comparison Experiment

To demonstrate the effectiveness of online re-parameterization in reducing training costs, the ResNet18 model was tested both before and after the use of online re-parameterization during the training phase. The experiment assessed the number of floating-point operations (FLOPs), parameter quantity (Params), and params size, as shown in Table 2. Typically, FLOPs are employed to gauge the number of floating-point operations necessary for a model to execute a forward propagation. The introduction of online re-parameterization considerably curtails the volume of convolution computations, leading to a drastic drop in the FLOPs value, thereby economizing the computational resources requisitioned by the model. Moreover, there is a discernible reduction in the number of model parameters. The params size descends to 88% of the initial model, which means a reduced demand for memory throughout the training process, thereby facilitating successful model operation within constrained hardware resources.
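For readers who want to reproduce this kind of accounting, one common option is the THOP counter, sketched below on a stock torchvision ResNet18; the input resolution is an assumption, and the paper's OREP_ResNet and measurement tool are not reproduced here.

```python
import torch
from thop import profile               # pip install thop
from torchvision.models import resnet18

model = resnet18()                     # stand-in backbone, not the paper's OREP_ResNet
dummy = torch.randn(1, 3, 288, 800)    # assumed input resolution
macs, params = profile(model, inputs=(dummy,))
print(f"MACs: {macs / 1e9:.3f} G, Params: {params / 1e6:.3f} M")
# THOP reports multiply-accumulate counts, which are often quoted as FLOPs;
# for 32-bit weights the parameter size in MB is roughly params * 4 / 2**20.
```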
To demonstrate the efficacy of the proposed hybrid attention mechanism, a comparison between individual attention mechanisms (PAM [37], ECANet [35]) and hybrid attention mechanisms (CBAM [36], HAM) was performed on the TuSimple dataset. As evidenced by Table 3, the hybrid attention mechanism, by taking into account feature dependencies across various dimensions, places a stronger emphasis on crucial information. This leads to a noticeable improvement in detection performance compared to individual positional or channel attention. Additionally, the HAM effectively captures long-distance contextual relationships. In the lane detection task, it achieves the best results across all indicators compared to the other attention modules.

4.5. Ablation Experiment

To substantiate the advantages of the improved lane detection model, the experiment designed detection schemes with different module combinations and calculated their respective F1 scores, ACC, and FPS values. The results for each metric are presented in Table 4. First, we replaced the ordinary convolutions in the ResNet model with online re-parameterization convolutions. Because the simpler convolution structure is used in the inference phase, the FPS value in the experimental results was significantly improved. The re-parameterized multi-branch structure in the training phase more effectively obtained deep feature information, resulting in a certain increase in model detection accuracy. The HAM was then tested for its ability to focus on important information in images. The experimental results showed that the addition of the HAM improved both F1 and accuracy compared to the baseline model, but it also slightly increased the computational complexity. After combining the improvements from both modules, the final F1 score and accuracy increased by 0.7% and 0.5%, respectively, compared to the baseline model. Moreover, the inference speed reached 304 FPS, indicating that the improved model achieved the objective of fast inference and high detection accuracy.

4.6. Performance Comparison of Different Models

This section presents a comparison of the test results between our model and other lane detection models on the TuSimple dataset, as detailed in Table 5. The comparison encompasses segmentation-based detection models (LaneNet [17], SCNN [18], SAD [19]), a polynomial regression-based detection model (PolyLaneNet [22]), and anchor-based detection models (LaneATT [25], UFLD [27]).
Among these models, the segmentation-based detection model, which classifies each pixel to achieve accuracy, results in a marked increase in computation, thereby posing significant drawbacks in terms of detection speed. The model proposed in this study, on the other hand, employs an online re-parameterization structure during the encoding phase to simplify the inference process, and incorporates a row anchor classification strategy during the decoding phase, thereby achieving a noticeably superior detection speed compared to that of segmentation-based lane line detection models. The final model can reach an inference speed of up to 304 FPS. Furthermore, the inclusion of a hybrid attention module within the feature extraction network enhances the ability to focus on detailed features, leading to an improvement in accuracy by 0.53% compared to the anchor-based LaneATT model, and 2.74% compared to the polynomial regression-based PolyLaneNet model. Compared to the similar row anchor classification model UFLD, the model proposed in this study demonstrates improvements across all evaluation indicators. The experimental findings indicate that our model outperforms other advanced lane detection models in terms of F1 score, FN, and FPS. Overall, the improved model demonstrates substantial advantages in terms of the accuracy and real-time performance of lane line detection, ensuring strong competitiveness.
In order to provide a more intuitive demonstration of the performance of the proposed lane detection method, we have selected the segmentation model LaneNet and the row anchor classification model UFLD for a comparative visualization of the results.
As illustrated in Figure 7, all models perform well in the straightforward road scenes. However, in curved road detection, the LaneNet model experiences some drift, and the UFLD model fails to capture the curve trend. In contrast, our proposed model successfully detects the upcoming turns. When faced with near-field occlusion scenarios where lane markings are obscured by nearby vehicles, the LaneNet model exhibits a failure to identify the obscured lane markings, whereas the UFLD model only partially detects them. Our proposed model is able to completely identify the lane markings. Additionally, in situations with shadow occlusions, the other two models display less distinct detection markings, while our model maintains a desirable level of detection performance. In summary, when comparing results across various scenarios, our proposed model demonstrates a clear advantage over the other methods, characterized by high accuracy and low rates of missed detection.

4.7. Robustness Testing

During actual driving, the lane detection model faces the challenge of dealing with lane detection in different complex scenarios, which requires the model to have sufficient robustness. In this section, we conducted robustness testing experiments on the CULane dataset, comparing the detection results of our model with other lane detection models in nine different driving scenarios. The comparison reports F1 scores for complex environments such as night, arrow, and dazzle scenes, the number of false positives (FP) for the cross scenario, and a comprehensive total F1 score.
Table 6 lists the F1 scores for each lane detection method in different scenarios. The results show that our model achieved the best total F1 score, an improvement of 13.8 over the LaneNet model and 0.9 over UFLD, a row-anchor method of the same type. The proposed model outperforms the other lane detection models in seven driving scenarios, including normal, crowded, and night conditions. However, as the model does not specifically address the challenges associated with strong light interference and the absence of lane lines at junctions, the results for these two scenarios are slightly worse than those of some algorithms. Despite these shortcomings, the overall experimental results validate the robustness of the improved model in detecting lane lines in various complex environments. The model can handle the majority of driving scenarios, thus meeting the robustness requirements for practical lane line detection.
To more intuitively reflect the detection performance of the model in different complex scenarios, Figure 8 visualizes the results of lane line detection across nine different driving scenarios from the CULane dataset. Within each category, the first row displays the original images from various driving scenarios, the second row indicates the lane line positions with red lines according to the actual labels, and the third row uses green lines to indicate the lane line positions predicted by the model. As illustrated in the figures, compared to the actual labels, the proposed model is capable of accurately detecting lane markings even in challenging conditions where the lanes are obscured, such as crowded and shadowed scenes. In scenarios with extreme variations in lighting conditions, such as dazzle or night, the model effectively discerns the lane positions, thereby enhancing driving safety. Furthermore, the model demonstrates strong robustness by distinguishing other road surface features and extracting lane information even in other special scenarios.

5. Conclusions

To meet the real-time and robustness requirements of lane detection tasks, we present an advanced lane detection model that integrates an online re-parameterization ResNet with a hybrid attention mechanism. By re-parameterizing the ResNet structure, our model streamlines multi-branch topologies into a single-branch structure. In comparison with other complex network architectures, this method not only reduces training overhead, but also boosts inference speed significantly, achieving an impressive 304 FPS, which surpasses current advanced lane detection algorithms. With the further inclusion of the hybrid attention module, our model forms an effective connection between spatial locations and feature channels, thereby improving the extraction of critical information. When tested on two public lane detection datasets, our model achieved F1 scores of 96.84% and 75.60%, showing outstanding detection performance across various challenging scenarios. In summary, our proposed method is well suited for real-time lane detection in complex environments. In future work, considering the model’s suboptimal detection results under dazzle conditions, we plan to investigate adding image preprocessing before the feature network to correct overexposed areas, aiming for better detection results in dazzle situations.

Author Contributions

Conceptualization, T.X. and M.Y.; methodology, T.X.; software, T.X. and X.Z.; validation, T.X., X.Z. and J.S.; formal analysis, M.Y.; investigation, X.Z. and C.M.; resources, M.Y.; data curation, T.X.; writing—original draft preparation, T.X.; writing—review and editing, T.X. and M.Y.; visualization, T.X. and J.S.; supervision, M.Y.; project administration, M.Y.; funding acquisition, S.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62103192), Natural Science Research Programme for Higher Education Institutions in Jiangsu Province (20KJB520015), Changzhou Applied Basic Research Programme Project (medium subsidy) (CJ20200039).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lamssaggad, A.; Benamar, N.; Hafid, A.S.; Msahli, M. A survey on the current security landscape of intelligent transportation systems. IEEE Access 2021, 9, 9180–9208. [Google Scholar] [CrossRef]
  2. Kumar, S.; Jailia, M.; Varshney, S. A Comparative Study of Deep Learning based Lane Detection Methods. In Proceedings of the 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 23–25 March 2022; pp. 579–584. [Google Scholar]
  3. He, Y.; Wang, H.; Zhang, B. Color-based road detection in urban traffic scenes. IEEE Trans. Intell. Transp. Syst. 2004, 5, 309–318. [Google Scholar]
  4. Chiu, K.; Lin, S. Lane detection using color-based segmentation. In Proceedings of the IEEE Intelligent Vehicles Symposium, Las Vegas, NV, USA, 6–8 June 2005; pp. 706–711. [Google Scholar]
  5. Tapia-Espinoza, R.; Torres-Torriti, M. A comparison of gradient versus color and texture analysis for lane detection and tracking. In Proceedings of the 2009 6th Latin American Robotics Symposium (LARS 2009), Valparaiso, Chile, 29–30 October 2009; pp. 1–6. [Google Scholar]
  6. Li, Z.; Ma, H.; Liu, Z. Road lane detection with gabor filters. In Proceedings of the 2016 International Conference on Information System and Artificial Intelligence (ISAI), Hong Kong, China, 24–26 June 2016; pp. 436–440. [Google Scholar]
  7. Gao, W.; Zhang, X.; Yang, L.; Liu, H. An improved Sobel edge detection. In Proceedings of the 2010 3rd International Conference on Computer Science and Information Technology, Chengdu, China, 9–11 July 2010; pp. 67–71. [Google Scholar]
  8. Xuan, L.; Hong, Z. An improved canny edge detection algorithm. In Proceedings of the 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 24–26 November 2017; pp. 275–278. [Google Scholar]
  9. Luo, S.; Zhang, X.; Hu, J.; Xu, J. Multiple lane detection via combining complementary structural constraints. IEEE Trans. Intell. Transp. Syst. 2020, 22, 7597–7606. [Google Scholar] [CrossRef]
  10. Bisht, S.; Sukumar, N.; Sumathi, P. Integration of Hough Transform and Inter-Frame Clustering for Road Lane Detection and Tracking. In Proceedings of the 2022 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Ottawa, ON, Canada, 16–19 May 2022; pp. 1–6. [Google Scholar]
  11. Kim, J.; Lee, M. Robust lane detection based on convolutional neural network and random sample consensus. In Neural Information Processing, Proceedings of the 21st International Conference, ICONIP 2014, Kuching, Malaysia, 3–6 November 2014; Proceedings, Part I 21; Springer: Berlin/Heidelberg, Germany, 2014; pp. 454–461. [Google Scholar]
  12. Sukumar, N.; Sumathi, P. A Robust Vision-based Lane Detection using RANSAC Algorithm. In Proceedings of the 2022 IEEE Global Conference on Computing, Power and Communication Technologies (GlobConPT), New Delhi, India, 23–25 September 2022; pp. 1–5. [Google Scholar]
  13. Cai, H.; Hu, Z.; Huang, G.; Zhu, D. Robust road lane detection from shape and color feature fusion for vehicle self-localization. In Proceedings of the 2017 4th International Conference on Transportation Information and Safety (ICTIS), Banff, AB, Canada, 8–10 August 2017; pp. 1009–1014. [Google Scholar]
  14. Guo, J.; Wei, Z.; Miao, D. Lane detection method based on improved RANSAC algorithm. In Proceedings of the 2015 IEEE Twelfth International Symposium on Autonomous Decentralized Systems, Taichung, Taiwan, 25–27 March 2015; pp. 285–288. [Google Scholar]
  15. Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  16. Lee, S.; Kim, J.; Shin Yoon, J.; Shin, S.; Bailo, O.; Kim, N.; Lee, T.; Seok Hong, H.; Han, S.; So Kweon, I. Vpgnet: Vanishing point guided network for lane and road marking detection and recognition. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1947–1955. [Google Scholar]
  17. Neven, D.; De Brabandere, B.; Georgoulis, S.; Proesmans, M.; Van Gool, L. Towards end-to-end lane detection: An instance segmentation approach. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 286–291. [Google Scholar]
  18. Pan, X.; Shi, J.; Luo, P.; Wang, X.; Tang, X. Spatial as deep: Spatial cnn for traffic scene understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  19. Hou, Y.; Ma, Z.; Liu, C.; Loy, C.C. Learning lightweight lane detection cnns by self attention distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1013–1021. [Google Scholar]
  20. Xu, H.; Wang, S.; Cai, X.; Zhang, W.; Liang, X.; Li, Z. Curvelane-nas: Unifying lane-sensitive architecture search and adaptive point blending. In Computer Vision-ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XV 16; Springer: Berlin/Heidelberg, Germany, 2020; pp. 689–704. [Google Scholar]
  21. Ko, Y.; Lee, Y.; Azam, S.; Munir, F.; Jeon, M.; Pedrycz, W. Key points estimation and point instance segmentation approach for lane detection. IEEE Trans. Intell. Transp. Syst. 2021, 23, 8949–8958. [Google Scholar] [CrossRef]
  22. Tabelini, L.; Berriel, R.; Paixao, T.M.; Badue, C.; De Souza, A.F.; Oliveira-Santos, T. Polylanenet: Lane estimation via deep polynomial regression. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 6150–6156. [Google Scholar]
  23. Feng, Z.; Guo, S.; Tan, X.; Xu, K.; Wang, M.; Ma, L. Rethinking efficient lane detection via curve modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 17062–17070. [Google Scholar]
  24. Wang, J.; Ma, Y.; Huang, S.; Hui, T.; Wang, F.; Qian, C.; Zhang, T. A keypoint-based global association network for lane detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 1392–1401. [Google Scholar]
  25. Tabelini, L.; Berriel, R.; Paixao, T.M.; Badue, C.; De Souza, A.F.; Oliveira-Santos, T. Keep your eyes on the lane: Real-time attention-guided lane detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 294–302. [Google Scholar]
  26. Liu, L.; Chen, X.; Zhu, S.; Tan, P. Condlanenet: A top-to-down lane detection framework based on conditional convolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 3773–3782. [Google Scholar]
  27. Qin, Z.; Zhang, P.; Li, X. Ultra fast structure-aware deep lane detection. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 276–291. [Google Scholar]
  28. Zheng, T.; Huang, Y.; Liu, Y.; Tang, W.; Yang, Z.; Cai, D.; He, X. Clrnet: Cross layer refinement network for lane detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 898–907. [Google Scholar]
  29. Hu, M.; Feng, J.; Hua, J.; Lai, B.; Huang, J.; Gong, X.; Hua, X. Online convolutional re-parameterization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 568–577. [Google Scholar]
  30. Ding, X.; Guo, Y.; Ding, G.; Han, J. Acnet: Strengthening the kernel skeletons for powerful cnn via asymmetric convolution blocks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1911–1920. [Google Scholar]
  31. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. Repvgg: Making vgg-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 13733–13742. [Google Scholar]
  32. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  33. Ding, X.; Zhang, X.; Han, J.; Ding, G. Diverse branch block: Building a convolution as an inception-like unit. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 10886–10895. [Google Scholar]
  34. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 22 June 2018; pp. 7132–7141. [Google Scholar]
  35. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 11534–11542. [Google Scholar]
  36. Woo, S.; Park, J.; Lee, J.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  37. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–19 June 2019; pp. 3146–3154. [Google Scholar]
  38. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  39. Qin, Z.; Zhang, P.; Wu, F.; Li, X. Fcanet: Frequency channel attention networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 783–792. [Google Scholar]
  40. Ren, F.; Zhou, H.; Yang, L.; Liu, F.; He, X. ADPNet: Attention based dual path network for lane detection. J. Vis. Commun. Image Represent. 2022, 87, 103574. [Google Scholar] [CrossRef]
  41. Song, Y.; Huang, T.; Fu, X.; Jiang, Y.; Xu, J.; Zhao, J.; Yan, W.; Wang, X. A Novel Lane Line Detection Algorithm for Driverless Geographic Information Perception Using Mixed-Attention Mechanism ResNet and Row Anchor Classification. ISPRS Int. J. Geo-Inf. 2023, 12, 132. [Google Scholar] [CrossRef]
Figure 1. Overall structure of the lane detection model.
Figure 2. Online re-parameterization conversion process.
Figure 3. Online re-parameterization convolution module.
Figure 4. Detailed structure of the Efficient Channel Attention Module.
Figure 5. Detailed structure of the Position Attention Module.
Figure 6. Row anchor classification diagram.
Figure 7. The detection results for the three models are presented. The first row is straight road scenes, the second row is distant curved road scenes, the third row is near-field occlusion scenes, and the fourth row is multiple occlusion scenes.
Figure 8. Lane detection visualization results across nine distinct traffic scenarios.
Table 1. Dataset information and partitioning.
Dataset | Frame | Train | Validation | Test | Resolution
TuSimple | 6408 | 3268 | 358 | 2782 | 1280 × 720
CULane | 133,235 | 88,880 | 9675 | 34,680 | 1640 × 590
Table 2. Results of computational volume evaluation metrics before and after the use of online re-parameterization.
Model | FLOPs/G | Params/M | Params Size/MB
Resnet18 | 9.389 | 96.369 | 367.62
Resnet_OREPA | 0.235 | 85.375 | 325.68
Table 3. Attention module performance comparison. Bold numbers are the best.
Module | ACC | FP | FN | F1
PAM [37] | 95.81 | 2.77 | 4.57 | 96.31
ECANet [35] | 95.87 | 2.81 | 4.61 | 96.27
CBAM [36] | 95.91 | 2.71 | 4.55 | 96.35
HAM | 96.03 | 2.68 | 4.28 | 96.50
Table 4. Ablation experiment results. Bold numbers are the best.
Resnet18 | OREP | HAM | F1 | ACC | FPS
✓ |  |  | 96.16 | 95.65 | 282
✓ | ✓ |  | 96.11 | 95.86 | 338
✓ |  | ✓ | 96.50 | 96.03 | 250
✓ | ✓ | ✓ | 96.84 | 96.10 | 304
Table 5. Comparison of performance with different methods on TuSimple. Red, green, and blue numbers are the three results in descending order of optimality.
Method | F1 | Acc | FP | FN | FPS
LaneNet [17] | 94.80 | 96.38 | 7.80 | 2.44 | 44
SCNN [18] | 95.97 | 96.53 | 6.17 | 1.80 | 7.5
SAD [19] | 95.92 | 96.64 | 6.02 | 2.05 | 75
LaneATT [25] | 96.71 | 95.57 | 3.56 | 3.01 | 250
PolyLaneNet [22] | 90.62 | 93.36 | 9.42 | 9.33 | 115
UFLD [27] | 96.16 | 95.65 | 3.06 | 4.61 | 282
Ours | 96.84 | 96.10 | 2.29 | 4.00 | 304
Table 6. Comparison of performance with different methods on CULane. Red, green, and blue numbers are the three results in descending order of optimality.
Method | Normal | Crowded | Night | Noline | Shadow | Arrow | Dazzle | Curve | Cross | Total
LaneNet [17] | 82.9 | 61.1 | 53.4 | 37.7 | 56.2 | 72.2 | 54.5 | 59.3 | 5928 | 61.8
SCNN [18] | 90.6 | 69.7 | 66.1 | 43.4 | 66.9 | 84.1 | 58.5 | 64.4 | 1990 | 71.6
SAD [19] | 90.1 | 68.8 | 66.0 | 41.6 | 65.9 | 84.0 | 60.2 | 65.7 | 1998 | 70.8
PINet [21] | 85.8 | 67.1 | 61.7 | 44.8 | 63.1 | 79.6 | 59.4 | 63.3 | 1534 | 69.4
CurveLane [20] | 88.3 | 68.6 | 66.2 | 47.9 | 68.0 | 82.5 | 63.2 | 66.0 | 2817 | 71.4
LaneATT [25] | 91.1 | 72.9 | 68.9 | 48.3 | 70.9 | 85.4 | 65.7 | 63.3 | 1170 | 75.1
UFLD [27] | 91.7 | 73.0 | 70.2 | 47.2 | 74.7 | 87.6 | 64.6 | 68.7 | 1998 | 74.7
Ours | 92.1 | 74.1 | 71.3 | 48.4 | 77.1 | 88.3 | 63.1 | 69.3 | 1909 | 75.6
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
