1. Introduction
Recent progress in electric vehicles, telecommunications, machine learning, tracking technologies, and robotics has opened opportunities for deploying autonomous vehicles (AVs) on urban roads and highways.
Current green automotive policies seek to reduce greenhouse gas emissions to zero. This has led to electric vehicles becoming the standard platform for AV development [1,2]. Many countries, such as Canada and the United States, have aimed for electric, hybrid, or hydrogen-powered vehicles to dominate the market by 2035 [3].
Unfortunately, AVs face specific challenges, including high energy consumption, navigation system reliability, and road safety. Any inadequate motion execution on the road may lead to an accident and increase energy consumption.
The various sensors required, such as light detection and ranging (LiDAR) or radar, and the processing of their data are responsible for the high energy consumption of AVs, which considerably reduces the vehicles' autonomy. One proposed solution is to use a single type of sensor, namely a camera [4].
Regarding the navigation system, because it relies heavily on visual perception to interpret the surrounding environment, any unknown element on the road makes it challenging to recognize the driving area and can cause the system to fail.
Although lane departure warning/prevention (LDW/LDP) systems can prevent 297,000 lane departure accidents annually, the rate of such accidents remains very high: in 2015, the United States recorded 13,000 fatalities due to unintentional lane departure [5].
For any autonomous navigation module, two challenges must be tackled: (i) the perception of the environment, specifically the road markings and signs, and (ii) the planning and execution of an appropriate motion for driving the vehicle safely to a given destination.
Different external variables cause visual conditions to be less than ideal, thus affecting the perception of road markings and signs. Variables such as weather (snow, fog, rain), time of day, and lighting changes increase the detection complexity [6,7].
In countries where winter involves snow, both people and AVs have difficulty perceiving and recognizing road markings and signs due to snowfall and ice accumulation. Drivers must rely on their intuition and memory of the road to keep to their lanes because they cannot distinguish road markings whose color is diminished by snow [8]. AVs do not have specific skills for driving in snow; they use camera-based techniques, so their performance decreases significantly when the lane is not easily visible.
Researchers should strive to optimize AVs’ systems, such as navigation and driver assistance, and ensure they function correctly without affecting the vehicle’s autonomy.
This paper provides an experimental analysis of the effects of several challenging environmental driving conditions on the performance of regression-based neural networks applied to lane recognition. A benchmark of six commonly used architectures (AlexNet, ResNet-18, ResNet-50, SqueezeNet, VGG16, and NASNet) is performed, using their performance under normal environmental conditions as a reference.
We observe how much the accuracy of these algorithms in predicting lane lines decreases as the road scenario varies. The challenging conditions considered include snow, different times of day, and shaded areas.
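To make the regression setting concrete, the following is a minimal sketch, assuming a PyTorch/torchvision implementation, of how a pretrained backbone can be adapted to regress lane-line coordinates and trained with an MSE loss (reported as RMSE); the output size, backbone choice, and training details are illustrative placeholders, not the exact configuration used in our experiments.

```python
# Minimal sketch (illustrative, not the exact experimental setup): adapting a
# pretrained CNN backbone for lane-line coordinate regression.
import torch
import torch.nn as nn
from torchvision import models

NUM_OUTPUTS = 16  # hypothetical: (x, y) coordinates of 8 lane-boundary points

backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_OUTPUTS)  # regression head

criterion = nn.MSELoss()  # RMSE is obtained as the square root of the MSE
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def train_step(images, targets):
    """One optimization step on a batch of road images and lane-point targets."""
    optimizer.zero_grad()
    preds = backbone(images)              # (batch, NUM_OUTPUTS)
    loss = criterion(preds, targets)
    loss.backward()
    optimizer.step()
    return loss.item(), torch.sqrt(loss).item()

# Random tensors used only to show the expected shapes.
mse, rmse = train_step(torch.randn(4, 3, 224, 224), torch.randn(4, NUM_OUTPUTS))
```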
The contributions of our paper can be summarized as follows:
Many lane recognition studies assume that deep learning algorithms can perform well in scenarios with some degree of similarity to the training scenario. This work aims to verify and demonstrate experimentally that the performance of these algorithms is limited: because the networks are mainly based on feature extraction from the inputs and on statistical dependencies, they do not adapt well when these data change or become noisy.
Since it is unknown to what extent snow accumulation on the road makes lane prediction difficult, we delve deeper into this scenario. Three scenarios with different amounts of snow on the lane, during both day and night, are included among the test conditions. Future work may use the limitations encountered here to develop approaches to address them.
The rest of the paper is arranged as follows.
Section 2 reviews the background of lane detection methods. Section 3 presents our experiment with six deep learning algorithms. The study's results are displayed in Section 4, and the discussion is in Section 5. Section 6 consists of the conclusions and future work.
2. Related Work
The lane detection system is one of the AVs' perception modules and is essential to ensure safe navigation. In recent years, different methods have been proposed and developed for lane detection and prediction [9]. Several surveys cover vision-based lane recognition systems, advances in this area, and LDW systems.
Narote et al. [10] addressed the different modules of vision-based lane recognition systems in their survey, such as video capture, lane modeling, feature extraction, and lane tracking. Tang et al. [7] also included optimization strategies for these methods, which aim to obtain good performance with a smaller dataset or to avoid post-processing. Ghani et al. [11] reviewed different lane detection algorithms, considering weather conditions such as fog, haze, or rain; additionally, they proposed a new contour angle method for lane marker classification. Xing et al. [12] analyzed such systems in detail, including methodologies that integrate lane recognition algorithms with sensors or other systems. They also analyzed a lane detection framework based on the ACP (artificial society, computational experiments, and parallel execution) theory, which consists of constructing parallel virtual scenarios for model training, and highlighted it as a possible way to solve generalization problems.
Specifically to recognize the driving zone, researchers use both traditional and deep learning-based methods. Among the former is geometric modeling of traffic lines [7,13], which uses information such as color, texture, edges, and gradient. Energy minimization modeling has also been used.
Jung and Bae [14] presented a prototype for real-time road lane detection using data acquired through a 3D LiDAR sensor and for the automatic generation of lane-level maps. The authors initially distinguished traffic signal points on the ground from the LiDAR data. They used an expectation-maximization method to detect parallel lines and update the 3D line parameters as the vehicle moved forward. Finally, they built a lane-level digital map from these parameters and a GPS (global positioning system)/INS (inertial navigation system) sensor.
Many deep learning-based algorithms have been used in the field of lane detection. These include early CNN-based methods, encoder–decoder CNNs [15,16], fully convolutional neural networks (FCNs), and combinations of CNNs and recurrent neural networks (RNNs) [9,17]. Likewise, generative adversarial networks (GANs) have been shown to be suitable for expressing complex line shapes [7,18].
According to Tang et al. [7], the lane detection task can be approached from three perspectives. The first approach is classification-based methods, which use prior information to obtain the lane position or discriminate the road boundary type [19,20]. The second approach consists of object detection-based methods, where feature points for each lane segment and coordinate regression are used [21,22]. The third perspective encompasses segmentation-based methods [15,18,23,24,25], which classify the pixels of an input image into individual classes. All of these rely on learned feature extractors.
Since the victory of the AlexNet network in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2012, deep learning, and especially CNNs, has gained significant momentum and become a promising tool in several application fields, including lane marking detection [7], where it has effectively improved the accuracy and robustness of this task [26].
Neven et al. [15] proposed LaneNet, a fast lane detection algorithm that can identify a variable number of lanes and thus deal with lane changes. The authors employed an instance segmentation method that can be trained end-to-end, where each lane forms its own instance. They also proposed using a learned perspective transformation to parameterize the segmented lane instances, thus obtaining a lane fit that is robust to road plane changes.
Several strategies have been proposed to optimize deep learning methods by avoiding post-processing or obtaining good performance with less data. One of these strategies is to use pre-trained models and to configure and adapt them according to the problem of interest. There are two approaches: transfer learning and knowledge distillation methods [7,27]. These take advantage of the pre-trained layers of the original model, using the first layers to recognize the most generic features [28]. Subsequent layers are kept, removed, or replaced with new ones depending on the problem to be addressed and the expected results [27].
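As a hedged illustration of this idea (a generic sketch, not the procedure of the cited works), transfer learning typically freezes the early, generic layers of a pretrained model and trains only a newly attached task-specific head:

```python
# Sketch of transfer learning: the generic convolutional layers of a
# pretrained model are frozen, and only the new task-specific head is trained.
# Layer sizes and the output dimension are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights="IMAGENET1K_V1")

for param in model.features.parameters():  # early layers: generic edge/texture filters
    param.requires_grad = False            # kept frozen

# Replace the original classifier with a head suited to the new task
# (for instance, regressing lane-line coordinates).
model.classifier = nn.Sequential(
    nn.Linear(512 * 7 * 7, 256),
    nn.ReLU(inplace=True),
    nn.Linear(256, 16),                    # hypothetical output size
)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```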
Kim and Park [16] presented a sequential transfer learning method for ego-lane estimation that considers all the information of an input image during training. Their method builds on the SegNet network, an end-to-end deep encoder–decoder architecture for region segmentation. The authors reported direction estimation accuracies with augmented test data between 66% and 69% for a single transfer learning network and between 76% and 81% for a sequential transfer learning network. In [29], Zhang et al. developed an adaptive learning algorithm for lane features using the KITTI and Caltech datasets. They built a two-stage learning network based on the YOLO v3 (You Only Look Once, v3) algorithm, whose parameters were modified to suit lane detection. The authors reported accuracies of 79.26% for the KITTI dataset and 81.75% for the Caltech dataset, whereas other lane detection algorithms, such as Fast RCNN or sliding window & CNN, showed accuracies of 49.87% and 68.98%, respectively, on the KITTI dataset. Hou et al. [30] used the self-attention distillation (SAD) method, which they incorporated into a CNN for lane detection. This method consists of a model learning from itself, continuously improving without additional labels or supervision. The authors validated the method on three road lane databases: TuSimple, CULane, and BDD100K. They used the ENet, ResNet-18, and ResNet-34 architectures, obtaining accuracies of 92.69%, 92.84%, and 93.02%, respectively. Hou et al. found that the ENet–SAD model performed well, with an accuracy of 96.64%, surpassing existing algorithms while being faster and requiring fewer parameters. Transfer learning has shown great promise, as it helps decrease training time, reduce computation, and make training more efficient [27,28].
Table 1 summarizes some CNN-based lane detection methods in terms of the network used, the dataset, and the metrics selected for evaluation.
Concerning other deep learning models, RNNs, which contain feedback loops, are used to analyze time series data, e.g., for speech recognition, language recognition, or handwriting. They present the long-term dependency problem: they cannot remember information over long periods. However, one type of these networks, the long short-term memory (LSTM) network [35], can deal with this problem. In the literature, studies that have used these networks for lane detection combine them with other techniques, such as CNNs, which serve as feature extractors.
Zou et al. [9] developed a deep hybrid architecture for lane recognition that combines a CNN, which abstracts information from each frame of a driving scene, with an RNN, which predicts lane lines based on the features extracted by the CNN. This method proved adequate, outperforming other lane detection methods even in difficult situations. Kortli et al. [36] proposed a real-time lane detection system based on a CNN encoder–decoder network and an LSTM network. The encoder extracts the features and reduces the dimensionality, the decoder maps those features, and the LSTM network processes the data to improve the detection rate. The study showed good performance, with 96.36% accuracy. Wu et al. [37] presented an approach that uses a gradient map to emphasize lane features such as edges or color differences instead of using RGB images as inputs. This approach was used to train a CNN based on VGG-16, yielding an increase in accuracy and a reduction in training and inference time compared with networks trained on RGB images. Similarly, an LSTM network was trained with the gradient map as input to improve the detection of lanes obscured by occlusions or varying lighting conditions, and this combination performed better than the CNN alone.
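To illustrate how such hybrid pipelines are typically wired together (a hedged sketch under our own assumptions, not the architecture of the works cited above), a CNN can encode each frame of a short driving clip into a feature vector, and an LSTM can aggregate the sequence before a small head predicts the lane:

```python
# Hedged sketch of a CNN + LSTM lane detection pipeline over image sequences.
# The backbone, feature sizes, and output dimension are illustrative choices.
import torch
import torch.nn as nn
from torchvision import models

class CnnLstmLaneNet(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_outputs=16):
        super().__init__()
        cnn = models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # per-frame features
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_outputs)

    def forward(self, frames):                                  # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).flatten(1)   # (B*T, feat_dim)
        feats = feats.view(b, t, -1)                            # (B, T, feat_dim)
        _, (h_n, _) = self.lstm(feats)                          # temporal aggregation
        return self.head(h_n[-1])                               # lane prediction for the clip

model = CnnLstmLaneNet()
out = model(torch.randn(2, 5, 3, 224, 224))  # two clips of five frames each
print(out.shape)                              # torch.Size([2, 16])
```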
Transformers are architectures that adopt the attention mechanism, focusing on specific parts of an input according to their importance [38]. They accept sequential input data but do not necessarily process them in order, which reduces training time thanks to parallelization. These models are often used in language and image processing, where they have obtained results comparable to those of CNNs [39].
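As a minimal, generic illustration of the attention mechanism mentioned above (not the architecture of any cited work), scaled dot-product self-attention scores every position of a sequence against every other position in parallel and uses those scores to weight the information that is passed on:

```python
# Minimal scaled dot-product self-attention over a sequence of feature
# vectors; all positions are processed in parallel. Shapes are illustrative.
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(k.shape[-1])   # pairwise importance of positions
    weights = F.softmax(scores, dim=-1)         # attention weights per query position
    return weights @ v                          # importance-weighted combination

x = torch.randn(10, 64)                         # e.g., 10 patch or lane-point embeddings
w_q, w_k, w_v = (torch.randn(64, 32) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)          # (10, 32)
```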
There is not yet much lane detection research that relies on transformers; however, the study of Liu et al. [40] stands out. They use transformers to predict the polynomial parameters of a lane shape model, with a self-attention mechanism that focuses on the long, thin structures of the lanes and on the global context. This research yielded good accuracy when the model was tested on the TuSimple dataset (96.18%), while being fast and small in size. It also demonstrated good adaptability to nighttime scenarios and scenes with occluded parts; however, it does not address complex lane detection tasks. Likewise, in [41], Han et al. developed Laneformer, a transformer-based architecture adapted to lane detection. It better captures the lane shape characteristics and the global semantic context of the series of points defining the lanes by using a row and column self-attention mechanism. Unlike [40], this model predicts the points of each lane to adapt to more complex scenarios. The architecture includes a ResNet-50 backbone to extract basic features and was evaluated on the CULane and TuSimple datasets, performing well both in normal environmental conditions and in night or glare scenarios.
Among other lane detection methods is the algorithm proposed by Cao et al. [26]. This study covers road environments with poor illumination, winding roads, and background interference, including highway, mountain, and tunnel roads. Using the random sample consensus algorithm, the authors obtained an aerial view of the road and fitted the lane line curves based on a third-order B-spline curve model. The method obtained a 98.42% detection accuracy.
Despite this research, there is still little work on lane recognition on winter roads, both by day and by night, and little research evaluating the performance of neural networks in such situations. Some works cover different lighting conditions, and others focus on fog or rain; however, the conditions covered in this work have not been addressed to a large extent.
5. Discussion
In this work, we experimentally analyzed the extent to which different challenging environmental driving conditions affect the performance of deep learning methods, comparing six varied and commonly used network architectures for lane detection. For reference, each method was applied to normal daytime driving conditions, and this performance was compared with that obtained under the challenging situations.
From a broad point of view, the networks could not transfer the performance obtained under normal daytime conditions to the other situations: in all cases, the accuracy percentage dropped significantly and the RMSE increased.
Table 5 and Table 6 show this decay in terms of accuracy and RMSE for each situation and each network evaluated. The least challenging situation for the architectures was the daytime road with light snow, followed closely by the daytime road with moderate snow, although the decreases in accuracy were still 23.9% and 26.11%, respectively; the corresponding increases in RMSE were 87.64% and 95.67%. The road with occluded areas and the night road with snow were the most challenging for the networks, with drops in prediction accuracy of 55.95% and 75.18%, respectively, and increases in RMSE of 230.19% and 301.12%. This behavior can be seen in Figure 15.
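For clarity, the relative changes quoted here are assumed to be computed with respect to the normal daytime baseline, i.e., accuracy drop = (Acc_normal − Acc_challenging) / Acc_normal × 100% and RMSE increase = (RMSE_challenging − RMSE_normal) / RMSE_normal × 100%.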
The network least affected across all situations was AlexNet, which showed the smallest differences in accuracy percentages. Although its differences in RMSE with respect to its performance under normal conditions were not the lowest, AlexNet obtained the lowest RMSE values in each situation, as shown in Table 4. The networks that suffered the most in all situations were NASNet and ResNet-50, in contrast to their performance under normal daytime conditions, where they obtained the best prediction accuracies.
As for the performance of the networks in each individual challenging situation, ResNet-18 had the best prediction accuracy, while NASNet and ResNet-50 had the lowest.
The fact that the performance achieved under normal daytime conditions could not be satisfactorily transferred to situations only slightly different from the training scenarios demonstrates a generalization problem.
Lane non-visibility, illumination changes, false lane lines, and road contrast variations modify the image features and add noise. These changes shift the stochastic feature distributions away from those corresponding to normal daytime driving conditions, which degrades the performance of the deep learning methods and drastically reduces their accuracy.
Generally speaking, the performance of all deep learning-based methods decreases, regardless of their nature (segmentation, regression, or classification), because they rely on feature extractors: any statistical change in the input data, or noise affecting the spatial features, makes them vulnerable. It is essential to consider these varied road conditions when training the networks in order to improve the robustness and inclusiveness of lane detection methods and thus avoid a decrease in performance.
In the broader context of autonomous navigation, this drop in accuracy impacts both vehicle safety and energy consumption. If the vehicle cannot detect the lane, it can become a potential obstacle and cause severe road accidents: the car will not be able to stay in its lane and may steer away from it [98]. Likewise, it will not be able to change lanes properly. Any improper planning and execution of movement on the road can lead to an accident, such as a rollover or a collision with another vehicle, a stationary object, or pedestrians [99]. Similarly, the energy budget is affected when the navigation system has to move the vehicle back to its correct lane; it is better to stay in the lane than to return to it after leaving. For these energy and safety reasons, significant effort must be made to optimize these autonomous driving components and ensure their correct operation in different driving conditions.
Finding alternative ways to help mitigate performance degradation in challenging environments is essential. In the study by Boisclair et al. [100], the fusion of different image modalities in a single network is proposed as a solution to the performance degradation of neural networks under adverse weather conditions such as snowfall or rainfall. The authors adopt a parallel multimodal network that can receive any image modality and indicate that depth images, which do not need color information, or thermal images, which are not affected by precipitation or fog, show good performance. In the article by Sattar and Mo [101], image overlay and alignment are used to find the lane position on roads where the lane is not fully visible. Their method uses vehicle position information and camera orientation to locate and access alternative images of the visible road and attempts detection on one of these. Feature-based matching is used to find corresponding feature points between the evaluated image and the visual database image that most closely resembles it; from the common regions found, pixel-based matching is then used to refine the alignment. In the work of Sajeed et al. [98], a geographic information system (GIS) and an inertial navigation system are combined to locate the vehicle on the road without relying on the visibility of the lane markers, overcoming the limitations of the vision camera, the LiDAR, and the GPS. The OpenStreetMap (OSM) database was used to obtain geospatial data on street attributes and geometry, and the authors extracted road segments with geographic latitude–longitude points located at the center of the road.
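As a rough illustration of how such road-centerline geometry can be retrieved from OSM (a hedged sketch based on the osmnx library, not the pipeline of Sajeed et al. [98]), the road segments around a GPS position can be queried and their centerline coordinates extracted:

```python
# Hedged sketch: fetching road-segment centerline points (latitude/longitude)
# around a given position from OpenStreetMap with the osmnx library.
# The coordinates and search radius below are placeholders.
import osmnx as ox

lat, lon = 48.42, -71.06  # hypothetical vehicle GPS position
G = ox.graph_from_point((lat, lon), dist=300, network_type="drive")

edges = ox.graph_to_gdfs(G, nodes=False)      # GeoDataFrame of road segments
for _, edge in edges.iterrows():
    name = edge.get("name", "unnamed road")
    # Each segment geometry is a LineString following the road centerline.
    centerline = [(y, x) for x, y in edge.geometry.coords]  # (lat, lon) pairs
    print(name, centerline[:3])
```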
Other possible solutions to the decrease in lane detection accuracy under challenging conditions are proposed in the future work section, considering the above alternatives.
6. Conclusions and Future Work
In this paper, we presented an experimental analysis of the impact of several challenging environmental driving conditions on the performance of regression-based neural networks for lane detection. Six varied and commonly used architectures (AlexNet, ResNet-18, ResNet-50, SqueezeNet, VGG16, and NASNet) were compared under challenging and normal environmental conditions.
Based on the evaluation and analysis of the benchmark, we found that none of the networks could extend the performance achieved under normal conditions to the other situations. The daytime snowy road conditions affected performance the least, and the nighttime snowy road affected it the most. The accuracy percentage decreased considerably in all cases, and the RMSE increased drastically.
Overall, we confirm that scenarios where the road conditions have little or no similarity to what has been previously learned pose challenges for deep learning algorithms. The stochastic distributions of the environment features vary in the presence of factors such as noise, snow, or shadows, preventing the networks from adapting adequately, as they base their predictions on feature extraction and statistical dependencies.
In future work, we propose to include sets of road images taken under various environmental conditions when training deep learning-based methods. The goal is to test whether the networks can delimit the driving area by matching the common static features of the environment, such as trees or buildings, between similar images. In this way, we will try to address the generalization problem and make our methods more stable and reliable. Likewise, a fusion of sensors (radar, thermal camera, visible camera, and LiDAR) is proposed to complement the previous point, providing more information about the environment and allowing more common static features to be found with greater reliability. Another suggested option is to generate lane lines virtually by applying augmented reality to road images taken under adverse conditions. Visual and geolocation information from road images in visible and non-visible situations would be used to infer where the lane is located on roads with snow accumulation. The visual information encompasses the static reference features of the environment, which could be extracted using the encoder portion of an autoencoder. Additionally, using the approach of Sajeed et al. [98], the position of the lane lines could be inferred and a virtual lane generated, based on geographic latitude–longitude points located at the center of the road and extracted from the OSM database.