1. Introduction
The environmental perception system is indispensable for enabling autonomous driving capabilities in intelligent vehicles [1]. However, in conditions of reduced visibility such as haze, visual sensors are limited by both external environmental factors and their own inherent constraints. These limitations yield ambiguous information about the vehicle's external environment, characterized by degraded image quality and diminished contrast, and together they significantly impair the precision of target detection [2]. Therefore, to ensure safe navigation and mitigate the risk of traffic accidents, it is imperative that intelligent vehicles can accurately perceive their surroundings even under the challenging conditions presented by haze.
Object detection in hazy environments is a multifaceted challenge that can be broadly divided into two critical stages: image defogging and subsequent target detection [3]. Image defogging research comprises two primary categories: model-based and model-free defogging algorithms [4]. Model-based approaches examine the causes of image degradation and the atmospheric conditions at the time of image acquisition, and construct mathematical models to emulate this degradation; prominent among these are the atmospheric scattering model [5,6] and Retinex theory [7,8]. Li et al. [9] introduced an end-to-end defogging network that merged the atmospheric light value and atmospheric transmittance, derived from the scattering model, into a single function K(x), which a neural network learns in order to restore the fog-free image. Zhang Qian et al. [10] enhanced the AOD-Net neural network defogging algorithm, addressing issues of inferior defogging quality and chromatic aberration: they normalized convolutional layers by position to accelerate convergence, used multi-scale convolutional kernels for feature extraction, and adjusted network channel weights for more efficient feature utilization. In contrast, model-free algorithms bypass the complexities of model establishment and parameter estimation, removing fog by directly learning the mapping between fogged and fog-free imagery. Wang et al. [11] proposed an architecture for inhomogeneous defogging that incorporates a haze imaging mechanism to establish a high-dimensional, nonlinear mapping. Bharath et al. [12] showed that an enhanced conditional generative adversarial network (GAN) with a hybrid weighted loss function can be used for defogging to refine output image quality.
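To make the model-based formulation concrete, the sketch below synthesizes haze with the atmospheric scattering model, I(x) = J(x)t(x) + A(1 - t(x)), and recovers the clear scene with the AOD-Net reparameterization J(x) = K(x)I(x) - K(x) + b. The closed-form K(x) here is derived algebraically for illustration; in AOD-Net itself, K(x) is what the neural network estimates, and the toy values are ours, not the paper's.

```python
import numpy as np

def hazy(J, t, A):
    """Synthesize a hazy image via the scattering model I = J*t + A*(1 - t)."""
    return J * t + A * (1.0 - t)

def K_closed_form(I, t, A, b=1.0):
    """Closed-form K(x) that AOD-Net's network approximates (valid for I != 1)."""
    return ((I - A) / t + (A - b)) / (I - 1.0)

def dehaze(I, K, b=1.0):
    """AOD-Net output module: J(x) = K(x)*I(x) - K(x) + b."""
    return K * I - K + b

J = np.array([0.2, 0.5, 0.7])   # toy clear-scene intensities
t, A = 0.8, 0.9                  # toy transmittance and atmospheric light
I = hazy(J, t, A)                # hazy observation
J_rec = dehaze(I, K_closed_form(I, t, A))  # exact recovery with the true K(x)
```

With the exact K(x), the recovery is lossless; the learned network only approximates it, which is why feature-extraction capacity (addressed by our HDC improvement) matters.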
Traditional image enhancement methods, including certain model-free defogging algorithms, are often constrained by high computational demands, leading to protracted processing times unsuitable for real-time defogging requirements in vehicular applications. In this study, we adopt the AOD-Net neural network defogging algorithm, a model-based approach that facilitates end-to-end defogging and is amenable to integration with target detection models, thereby harmonizing the processes of defogging and target detection.
In the realm of intelligent vehicle systems, computer vision techniques are pivotal for processing environmental data and generating target detection information. Deep learning-based target detection algorithms are conventionally divided into two primary categories: single-stage algorithms that treat detection as a regression problem, and two-stage algorithms that rely on region proposals (RP) [13]. The R-CNN (Region-based Convolutional Neural Network) series, proposed by Girshick et al. [14,15,16], epitomizes the two-stage approach: these algorithms split detection into the generation of regions of interest (ROI) followed by target classification and bounding box regression. In contrast, single-stage methods, such as the YOLO (You Only Look Once) series [17,18,19] and SSD (Single-Shot MultiBox Detector) [20], perform target detection as an end-to-end task in real time, eschewing a separate region-proposal step. Among these, the YOLO series, exemplified by YOLOv5, is particularly well suited to target detection on mobile platforms. By treating detection as a regression problem, YOLOv5 achieves a commendable balance between detection speed and accuracy, surpassing two-stage models in processing speed. This study adopted YOLOv5 as the foundational target detection network, with the aim of enhancing its performance to ensure precise target identification in intelligent vehicles. The YOLOv5s variant was selected because it suits the real-time constraints and computational efficiency required for in-vehicle applications.
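The "detection as regression" idea can be illustrated by how a single-stage head turns raw network outputs into absolute boxes. The constants below follow the decoding used in the public YOLOv5 code; treat them as an illustrative assumption rather than a definitive reference, and the grid/anchor/stride values are invented for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, grid_xy, anchor_wh, stride):
    """Decode one raw prediction t = (tx, ty, tw, th) into a (cx, cy, w, h) box.

    YOLOv5-style decoding (an assumption here):
      center = (2*sigmoid(txy) - 0.5 + grid_xy) * stride
      size   = (2*sigmoid(twh))**2 * anchor_wh
    """
    txy, twh = t[:2], t[2:]
    cxy = (2.0 * sigmoid(txy) - 0.5 + grid_xy) * stride
    wh = (2.0 * sigmoid(twh)) ** 2 * anchor_wh
    return np.concatenate([cxy, wh])  # pixel coordinates

# Raw logits of 0 place the box at the cell center with the anchor's size:
box = decode_box(np.zeros(4), grid_xy=np.array([10.0, 5.0]),
                 anchor_wh=np.array([16.0, 30.0]), stride=8.0)
```

Because every grid cell regresses its boxes in a single forward pass, there is no separate region-proposal stage, which is the source of the single-stage speed advantage noted above.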
Ensuring the safe operation of intelligent vehicles necessitates keen consideration of the vehicle’s operational state, the dynamic driving environment, and the precision of real-time target detection. The accuracy and speed of target detection are paramount for the safe navigation of autonomous vehicles, and thus, the enhancements to the YOLOv5s network presented in this study are designed to meet these stringent requirements.
In this research, we address the critical issue of target detection in hazy environments for intelligent vehicles. The improved AOD-Net model defogs the foggy input image, and its output is fed to the improved YOLOv5s model for target detection. This study was motivated by the need for real-time, accurate perception in autonomous driving systems, which is often hindered by poor visibility conditions.
The AOD-Net model, while capable of meeting real-time requirements for image defogging, has limitations in feature extraction and does not account for human visual perception. Similarly, the YOLO algorithm, despite its strengths in real-time performance and accuracy, lacks specific optimization for defogging and vehicular applications. To overcome these limitations, we introduce the following contributions:
- (1)
Enhanced Image Defogging Algorithm: We propose an improved AOD-Net algorithm that incorporates a hybrid dilated convolution (HDC) module to broaden the receptive field for feature extraction and refine the details of defogged images. Additionally, we refine the loss function to expedite model convergence and improve generalization. These enhancements significantly improve the defogging performance of the AOD-Net neural network.
- (2)
Optimized YOLOv5s Detection Algorithm: An enhanced YOLOv5s target detection algorithm is introduced, featuring the ShuffleNetv2 lightweight network module to reduce complexity and improve efficiency. We also refine the feature pyramid network (FPN) and introduce global shuffle convolution (GSconv) to balance accuracy with the parameter count. The convolutional block attention module (CBAM) is incorporated to heighten target attention without compromising network accuracy, making it suitable for in-vehicle mobile devices.
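The lightweight modules in contribution (2) rely on standard building blocks. As one concrete example, the channel shuffle operation at the heart of ShuffleNetv2 can be sketched in a few lines; this is a minimal NumPy sketch of the published operation, not our network's implementation.

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNetv2-style channel shuffle for an (N, C, H, W) tensor.

    Reshape channels into (groups, C // groups), transpose the two axes,
    and flatten back, so information mixes across the channel groups of a
    grouped convolution at negligible cost.
    """
    n, c, h, w = x.shape
    assert c % groups == 0
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

# With 6 channels tagged 0..5 and 2 groups, the shuffle interleaves the
# two halves: [0, 1, 2, 3, 4, 5] -> [0, 3, 1, 4, 2, 5].
x = np.arange(6, dtype=float).reshape(1, 6, 1, 1)
y = channel_shuffle(x, groups=2)
```

This cheap cross-group mixing is what lets grouped and depthwise convolutions cut parameters without isolating channel groups, the trade-off the contribution above exploits for in-vehicle deployment.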
The structure of this study is as follows: Section 2 establishes the foundation by detailing the structure and principles of the AOD-Net dehazing algorithm, followed by our proposed improvements. Section 3 focuses on enhancing the YOLOv5s model through modifications to the backbone, neck network, and attention mechanism. Section 4 presents experimental validation of the proposed methods and compares them with mainstream algorithms. Finally, Section 5 concludes the study with a summary of findings and future research directions.