Article

Feature Fusion Image Dehazing Network Based on Hybrid Parallel Attention
Hong Chen, Mingju Chen, Hongyang Li, Hongming Peng and Qin Su

1
School of Automation and Information Engineering, Sichuan University of Science and Engineering, Yibin 644005, China
2
Artificial Intelligence Key Laboratory of Sichuan Province, Sichuan University of Science and Engineering, Yibin 644005, China
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(17), 3438; https://doi.org/10.3390/electronics13173438
Submission received: 3 July 2024 / Revised: 25 August 2024 / Accepted: 26 August 2024 / Published: 30 August 2024

Abstract

Most existing dehazing methods ignore global and local detail information when processing images and fail to fully combine feature information at different levels, which leads to contrast imbalance and residual haze in the dehazed images. To this end, this article proposes an image dehazing network based on hybrid parallel attention feature fusion, called the HPA-HFF network. This network is an optimization of the basic network, FFA-Net. First, the hybrid parallel attention (HPA) module is introduced; it uses parallel connections to mix different types of attention mechanisms, which not only enhances the extraction and fusion of global spatial context information but also strengthens feature expression, yielding a better dehazing effect on unevenly distributed haze. Second, the hierarchical feature fusion (HFF) module is introduced, which dynamically fuses feature maps from different paths to adaptively enlarge the receptive field and to refine and enhance image features. Experimental results on the public RESIDE dataset show that, compared with eight mainstream dehazing networks, the proposed HPA-HFF network achieves the highest PSNR (39.41 dB) and SSIM (0.9967) and obtains a good dehazing effect in subjective visual comparisons.

1. Introduction

Image dehazing plays a preprocessing role in the field of computer vision and is particularly important for improving the visibility of images captured under non-optimal atmospheric conditions (e.g., fog, haze, and smog). The degradation caused by these conditions (such as reduced visibility, color distortion, poor contrast, and blurring [1]) not only reduces the aesthetic quality of the image but also has a negative effect on downstream computer vision tasks. Image dehazing aims to restore clear, haze-free images through appropriate algorithms and techniques. It improves image quality and facilitates a variety of vision applications such as target detection, autonomous driving, remote sensing, and outdoor surveillance systems [2,3]. Therefore, image dehazing has been widely noticed and studied in computer vision in recent years [4,5].
Haze is an atmospheric scattering phenomenon. In hazy weather, because of the large number of particles suspended in the air, light interacts with these particles during propagation and is scattered, which seriously degrades the contrast, saturation, and clarity of the image acquired by the imaging device. Therefore, most early image dehazing methods are based on analyzing the imaging principle of hazy images and establishing corresponding hazy image degradation models. The most widely used is the atmospheric scattering model (ASM) [6,7,8], described in Formula (1). It uses prior knowledge and assumptions to solve for the intermediate parameters and thus infer the haze-free image. Rewriting Formula (1) as Formula (2) shows the expression of the haze-free image more clearly.
$$I(x) = J(x)\,t(x) + A\bigl(1 - t(x)\bigr) \tag{1}$$
$$J(x) = \frac{I(x) - A}{t(x)} + A \tag{2}$$
Among them, I(x) represents the hazy image; J(x) denotes the haze-free image to be restored; x represents the pixel position in the image; A represents the global atmospheric illumination value; $t(x) = e^{-\beta d(x)}$ denotes the transmission rate of the medium; β is the scattering coefficient of the atmosphere; and d(x) represents the scene depth. Early methods proposed the dark channel prior [9] and the color attenuation prior to estimate A and t(x). For example, He et al. [10] estimated the haze concentration in the image by combining an atmospheric light prior with dark channel estimation, thereby restoring the haze-free image by correcting each pixel. Zhu et al. [11] found, from statistics over a large number of hazy images, that the haze concentration is proportional to the difference between the brightness and the saturation of the image, which led to the color attenuation prior algorithm and a linear dehazing model. Prior-based image dehazing algorithms build on the atmospheric scattering model, and significant progress has been made with them. However, problems such as color cast, insufficient brightness, and detail loss remain serious, the generalization ability is still poor, and the restored clear images are often unsatisfactory.
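For illustration only, the following minimal NumPy sketch inverts the atmospheric scattering model as in Formula (2); the atmospheric light A and the transmission map t(x) are placeholder values standing in for whatever a prior-based estimator would supply, not part of the methods discussed here.

```python
import numpy as np

def recover_with_asm(hazy, A, t, t_min=0.1):
    """Invert the atmospheric scattering model as in Formula (2):
    J(x) = (I(x) - A) / max(t(x), t_min) + A."""
    t = np.clip(t, t_min, 1.0)                               # avoid division by near-zero transmission
    return np.clip((hazy - A) / t[..., None] + A, 0.0, 1.0)

# Placeholder estimates of A and t(x); a prior-based method such as the dark
# channel prior would normally supply these values.
hazy = np.random.rand(240, 240, 3).astype(np.float32)       # stand-in hazy image in [0, 1]
A = np.array([0.9, 0.9, 0.9], dtype=np.float32)             # assumed atmospheric light
t = np.full((240, 240), 0.6, dtype=np.float32)               # assumed transmission map
dehazed = recover_with_asm(hazy, A, t)
```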
Since physical models are not universal in real scenarios, dehazing methods based on deep learning have emerged, which learn haze-free images from large amounts of image data. They fall into two groups. The first does not break away from the atmospheric scattering model but uses neural networks instead of prior information to estimate A and t(x). Representative methods include the following: Cai et al. [12] used the estimated medium transmittance together with the atmospheric scattering model for dehazing and proposed a trainable dehazing network, DehazeNet; Li et al. [13] reformulated the atmospheric scattering model by combining transmittance and atmospheric light into a single variable for training and proposed the integrated dehazing network AOD-Net. The second group uses a complete neural network to directly generate clear, haze-free images without additional prior knowledge. Representative methods include the following: Dong et al. [14] proposed FD-GAN based on the generative adversarial network (GAN) [15]; this network uses frequency information as prior information for a fusion discriminator that guides the network to generate more natural and realistic dehazed images. Liu et al. [16] divided the network into three modules, preprocessing, backbone, and post-processing, and proposed an attention-based multi-scale network, GridDehazeNet. Qin et al. [17] used residual nested attention for dehazing and proposed the feature fusion attention dehazing network FFA-Net. Although these image dehazing algorithms have improved the dehazing effect to a certain extent, they still face issues such as insufficient global information, loss of local details, decreased contrast, and limited multi-scale processing.
To address the above problems, an image dehazing algorithm should fully capture multi-scale information, cope with unevenly distributed haze, and fuse information across the pixel, channel, and spatial dimensions to obtain richer global features while enhancing local detail information. To this end, this paper makes targeted improvements to the baseline model FFA-Net and proposes a feature fusion image dehazing network based on hybrid parallel attention. The main innovations are as follows:
  • A hybrid parallel attention (HPA) module is proposed to replace the FA module. This module combines pixel attention, channel attention, and spatial attention mechanisms in a parallel connection, which not only enhances the extraction and fusion of global spatial context information but also yields a more comprehensive and accurate feature expression, giving a better dehazing effect on unevenly distributed haze.
  • A hierarchical feature fusion (HFF) module with an adaptively expanded receptive field is introduced, which dynamically fuses feature mappings from distinct paths to capture the trade-off between local and global features, and refines and enhances image features to improve the dehazing effect.
  • A hybrid loss function is used: a perceptual loss is added to the original loss to improve the brightness and contrast of the dehazed image. The smooth L1 loss aims to retain image edge information while reducing noise and artifacts, comparing the difference between the generated dehazed image and the real image pixel by pixel; the perceptual loss focuses more on the perceptual quality of the image, extracting its texture and structural features and restoring its high-frequency information.

2. Related Work

In recent decades, image dehazing has been a research hotspot in image processing and has attracted wide attention. Existing methods can generally be categorized into two main groups: prior-based dehazing methods and deep learning-based dehazing methods. This section begins with a brief summary of these two types of methods and then focuses on FFA-Net, the baseline model used in this paper.

2.1. Dehazing Methods Based on Priors

Dehazing methods based on priors all rely on the atmospheric scattering model. They first use prior knowledge or assumptions as constraints to estimate A and t(x) in the atmospheric scattering model and then recover the hazy image as a haze-free image. Zhao et al. [18] proposed an improved prior-based dehazing model that transforms the localized medium transmittance estimation into an improved global fusion parameter estimation problem, mitigating image oversaturation. Chen et al. [19] proposed an image fusion dehazing method combined with sky segmentation, in which A and t(x) are first estimated and corrected, and then the images are fused to improve image detail and attenuate image distortion. Berman et al. [20] found that hundreds of approximate colors can be used to describe the color of a clean image, forming color clusters in RGB space, and proposed a non-local prior: in hazy images these colors form haze lines that can be used to remove the haze. Although the above methods achieve a certain degree of dehazing, they all depend on the accuracy of the prior knowledge. Without additional constraints, the parameter estimation is usually not very accurate, so the haze-free images restored from prior knowledge often do not achieve good visual effects.

2.2. Deep Learning-Based Dehazing Methods

Deep learning-based dehazing methods obtain haze-free images by learning a mapping function from large datasets with a neural network. Early work relied on physical models to estimate A and t(x) and then reconstruct the haze-free image. Zhang et al. [21], relying on the atmospheric scattering model, proposed a GAN-based edge-preserving densely connected pyramid encoder-decoder network (DCPDN) and designed a new edge-preserving loss function. Ren et al. [22] designed a holistic edge-guided multi-scale deep neural network, MSCNN, to estimate the medium transmittance and refine local features. These methods use neural networks to estimate intermediate parameters, but it is difficult to obtain accurate information in real scenes. Owing to the limitations of physical models, end-to-end neural network dehazing methods were designed as complete networks that do not estimate any intermediate parameters and directly restore the hazy image to a haze-free image. Qu et al. [23] designed a network called EPDN (Enhanced Pix2pix Dehazing Network) based on a generative adversarial network (GAN) [14,15]; it adopts the idea of hierarchical restoration, refining the image from coarse to fine through a multi-resolution generator and an enhancer. Based on the Swin Transformer, Song et al. [24] adapted the activation function, spatial feature aggregation scheme, and normalization layer to the characteristics of image dehazing, proposed DehazeFormer, and achieved good dehazing performance. In addition, deep learning based on non-local models and physics has recently emerged: Dutta et al. [25] proposed a deep neural network called DIVA that processes non-local image structures through patch interactions and a Hamiltonian operator. Although these deep learning-based algorithms improve dehazing performance to a certain extent, their network structures do not fully consider the characteristics of haze. In areas with large scene depth the dehazing performance is poor, and color fidelity and detail restoration still need improvement.

2.3. Baseline Model FFA-Net

FFA-Net is an end-to-end fully neural dehazing algorithm with a relatively good image dehazing effect in recent years. It includes three main design points. First, a feature attention (FA) module composed of pixel attention (PA) and channel attention (CA) modules is designed so that the network can treat different features and pixels unequally and thus better distinguish and process different types of image information. Second, a basic block structure (Block) composed of the FA module and a local residual learning structure is designed to bypass less important information, such as mist or low-frequency areas, so that the main network can pay more attention to effective information. Third, a group architecture with multiple skip connections is designed; each group consists of multiple basic block structures connected in series together with a skip connection, so that the network can adaptively learn weights for important features such as thick fog while retaining shallow image information, allowing the haze-free output to preserve the non-haze characteristics of the original image as much as possible. The overall structure of the network is shown in Figure 1, and the basic block structure is shown in Figure 2.
However, the FFA-Net dehazing method still has shortcomings and certain limitations regarding the loss of image detail information and the extraction and fusion of global image information. Therefore, the main work of this paper is to optimize FFA-Net as the basic architecture. In this architecture, a parallel attention module combining pixel, channel, and spatial attention is designed to enhance the feature expression ability of the network, so that the module can extract global information while retaining and enhancing local detail information; then, a hierarchical feature fusion module with an adaptively expanded receptive field is added, allowing the network to dynamically select convolution kernels of different sizes to capture the trade-off between local and global features and thus refine and enhance image features.

3. HPA-HFF Network

In this section, the overall architecture of the hybrid parallel attention-based feature fusion image dehazing network (HPA-HFF) proposed in this paper is firstly described, as shown in Figure 3. This network structure is an overall improvement of FFA-Net and mainly consists of two modules: 1. The HPA module that can obtain more accurate and comprehensive feature expressions and generate different weights for each feature; 2. The HFF module that can fuse feature maps from different paths and refine and enhance image features. The details of the HPA module are then further introduced. This module can extract global shared information and location-related local information about the original features in parallel. Then the HFF module is introduced, which gives different attention to the skip connection and the main branch and dynamically fuses them according to their respective attention. Finally, the loss function used in this network is briefly explained.
The input to the HPA-HFF network is a hazy image, which first goes through a convolutional layer for shallow feature extraction. It is then passed to a group structure composed of multiple skip connections and block structures (B1, B2, …, BN) along with convolutional layers. The group structure uses the proposed HPA module to fuse the output features of the N basic block structures. These feature maps are then fused through a concatenation operation to form a rich feature map. One path continues through residual learning to the HPA module on the main branch to further enhance high-frequency information, while another path bypasses the main branch and is transmitted directly to the HFF module. This allows the network to further fuse multi-level feature information, balance local and global features, and further refine and optimize the features. Finally, the dehazed image is generated by passing the result through several convolutional layers and adding it element-wise to the initial input image.
As shown in Figure 3, the improved network architecture is obtained from the original FFA-Net structure by replacing the FA module with the HPA module and adding the HFF module together with the convolutional layers associated with these two modules. Each of the group structure blocks G1, G2, and G3 is composed of multiple basic block structures connected in series combined with a skip connection structure, so that the network can perform adaptive weight learning on important features such as thick haze, obtain a more comprehensive and accurate feature expression, and achieve a better dehazing effect on unevenly distributed haze.
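To make the data flow of Figure 3 concrete, the following PyTorch skeleton (an illustrative sketch, not the authors' released code) wires placeholder group, HPA, and HFF blocks together in the order described above; the internals of the real modules are detailed in Sections 3.1 and 3.2.

```python
import torch
import torch.nn as nn

class PlaceholderBlock(nn.Module):
    """Stand-in for a group structure or attention module; the real internals
    are sketched in Sections 3.1 and 3.2."""
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)

    def forward(self, x):
        return x + self.conv(x)

class HPAHFFSkeleton(nn.Module):
    """Illustrative HPA-HFF data flow: shallow conv -> G1/G2/G3 in series ->
    concatenation and 1x1 fusion -> HPA on the main branch -> HFF stage that
    also receives the skip branch -> reconstruction conv -> global residual."""
    def __init__(self, dim=64):
        super().__init__()
        self.head = nn.Conv2d(3, dim, 3, padding=1)            # shallow feature extraction
        self.g1, self.g2, self.g3 = (PlaceholderBlock(dim) for _ in range(3))
        self.fuse = nn.Conv2d(3 * dim, dim, 1)                 # merge the three group outputs
        self.hpa = PlaceholderBlock(dim)                        # hybrid parallel attention (Section 3.1)
        self.hff_mix = nn.Conv2d(2 * dim, dim, 1)               # stand-in for the HFF fusion (Section 3.2)
        self.tail = nn.Conv2d(dim, 3, 3, padding=1)             # reconstruction layer

    def forward(self, hazy):
        shallow = self.head(hazy)
        f1 = self.g1(shallow)
        f2 = self.g2(f1)
        f3 = self.g3(f2)
        fused = self.fuse(torch.cat([f1, f2, f3], dim=1))       # rich multi-group feature map
        main = self.hpa(fused)                                  # main branch enhancement
        mixed = self.hff_mix(torch.cat([main, shallow], dim=1)) # skip branch joins at the HFF stage
        return hazy + self.tail(mixed)                          # element-wise sum with the input image

x = torch.randn(1, 3, 240, 240)
print(HPAHFFSkeleton()(x).shape)  # torch.Size([1, 3, 240, 240])
```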

3.1. HPA Module

As can be seen from Figure 4, each basic block structure is made up of a local residual learning structure and an HPA module. The basic block structure first bypasses less important information features, such as low-frequency regions or thin haze, through local residual learning. Then the CBAM module within the HPA module better utilizes the spatial and channel information in the input image to acquire useful positional information, adaptively readjusting the channel features according to the importance of different feature channels. The SPA and PA modules are designed to make the network more concerned with information features such as dense haze of pixels and the higher frequency image regions. The parallel connection of three attention mechanisms in the HPA module allows simultaneous multi-scale feature extraction from the image, enabling the acquisition of global information while preserving detail information and focusing more on uneven haze distribution. This approach addresses the limitations of the original feature attention (FA) module, which, due to its standard serial connection, did not fully consider issues such as non-uniform haze distribution and global information acquisition, resulting in better dehazing performance for the network.
Although the FA module in FFA-Net significantly improves the dehazing effect, the serial design of the CA and PA modules that constitute it does not specifically target global spatial information and does not fully consider the uneven haze distribution. A structure that can extract global spatial information while collecting local details and multi-scale information should therefore further improve the dehazing ability. To this end, this paper proposes a hybrid parallel attention (HPA) module to enhance the extraction and fusion of global spatial context information. As shown in Figure 5, the HPA module uses parallel connections to mix different kinds of attention mechanisms: a simple pixel attention, a CBAM attention that combines channel and spatial attention, and a pixel attention. First, let x be the feature map and normalize it using Batch Norm to obtain x̂ = BatchNorm(x); then, through the mixture of attention mechanisms, attend to the correlation of spatial pixels and the connections between channels, learn multi-scale information, and compensate for the lack of global context. Finally, the features are recovered directly to the original size for feature supplementation.
Pixel attention [26] can not only effectively extract position-related information features but also address the uneven haze distribution across different images, making the network focus more on features such as dense haze pixels and high-frequency image regions. The simple pixel attention (SPA) module is composed of two branches, PL1 and PA1, as shown in Figure 6 (I). PL1 is a feature extraction branch, and PA1 is a pixel gating branch; PA1 serves as the pixel gating signal for PL1.
$$P_{L1} = \mathrm{Conv}\bigl(\mathrm{Conv1}(\hat{x})\bigr) \tag{3}$$
$$P_{A1} = \sigma\bigl(\mathrm{Conv1}(\hat{x})\bigr) \tag{4}$$
$$L_1 = P_{L1} \otimes P_{A1} \tag{5}$$
Among them, L1 represents the final output of the SPA module, Conv1 is a 1 × 1 convolution layer, Conv is a 3 × 3 convolution layer, σ is the sigmoid activation function, and ⊗ denotes element-wise multiplication. The pixel attention (PA) module contains a PA2 branch that extracts global pixel gating features, as shown in Figure 6 (III).
$$P_{A2} = \sigma\Bigl(\mathrm{Conv1}\bigl(\delta(\mathrm{Conv1}(\hat{x}))\bigr)\Bigr) \tag{6}$$
$$L_2 = \hat{x} \otimes P_{A2} \tag{7}$$
Here, L2 represents the final output of the PA module, and δ represents the ReLU function; the other symbols have the same meanings as in Formulas (3) to (5). The PA module uses Conv1-δ-Conv1 to fit the features, and σ extracts the global pixel gating features. PA2 is then used as the global pixel gating signal for x̂.
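A minimal PyTorch sketch of the SPA and PA branches in Formulas (3)-(7) might look as follows; the channel counts and the reduction ratio inside PA are assumptions for illustration rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class SPA(nn.Module):
    """Simple pixel attention, Formulas (3)-(5): a feature branch PL1 is gated
    element-wise by a sigmoid pixel map PA1."""
    def __init__(self, dim):
        super().__init__()
        self.pl1 = nn.Sequential(nn.Conv2d(dim, dim, 1),
                                 nn.Conv2d(dim, dim, 3, padding=1))     # Conv(Conv1(x_hat))
        self.pa1 = nn.Sequential(nn.Conv2d(dim, dim, 1), nn.Sigmoid())  # sigma(Conv1(x_hat))

    def forward(self, x_hat):
        return self.pl1(x_hat) * self.pa1(x_hat)                        # L1 = PL1 ⊗ PA1

class PA(nn.Module):
    """Pixel attention, Formulas (6)-(7): Conv1-ReLU-Conv1 followed by a sigmoid
    produces a global pixel gate that is applied to the input."""
    def __init__(self, dim, reduction=8):   # reduction ratio is an assumption
        super().__init__()
        self.pa2 = nn.Sequential(
            nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, 1), nn.Sigmoid())

    def forward(self, x_hat):
        return x_hat * self.pa2(x_hat)                                  # L2 = x_hat ⊗ PA2
```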
CBAM [27] consists of channel attention and spatial attention, which better utilize the spatial and channel information in the input image to acquire useful positional information. This accurately highlights thick haze or high-frequency regions, improving the properties and generalization capability of the model. CBAM’s channel attention module uses both maximal pooling and average pooling to calculate the channel attention graph, while the spatial attention module performs maximal pooling and average pooling along the channel axis, then concatenates the two spatial attention maps. LCBAM is used as the global position gating signal for x ^ , as shown in Figure 6 (II).
$$L_{CAM} = \sigma\bigl(\mathrm{MLP}(\mathrm{Avg}(\hat{x})) + \mathrm{MLP}(\mathrm{Max}(\hat{x}))\bigr) \tag{8}$$
$$L_{SAM} = \sigma\Bigl(\mathrm{Conv7}\bigl([\mathrm{Avg}(L'_{CAM});\ \mathrm{Max}(L'_{CAM})]\bigr)\Bigr) \tag{9}$$
$$L_{CBAM} = L'_{CAM} \otimes L_{SAM} \tag{10}$$
In Formulas (8) to (10), LCAM is the single-sided channel attention output, $L'_{CAM} = \hat{x} \otimes L_{CAM}$ denotes the output of the entire channel attention module, LSAM denotes the spatial attention output, and LCBAM denotes the output of the entire CBAM module. The output of the whole HPA module is given by Formulas (11) and (12).
$$L = \mathrm{Concat}(L_1, L_2, L_{CBAM}) \tag{11}$$
$$Y = x + \mathrm{Conv1}\bigl(\mathrm{GELU}(\mathrm{Conv1}(L))\bigr) \tag{12}$$
First, the three attention gating results are concatenated along the channel dimension; then a multi-layer perceptron (MLP) with the structure Conv1-GELU-Conv1 reduces the concatenated feature channels back to the same dimension as the input x̂; finally, the output of the MLP is added element-wise to the original input x to give the module output Y.
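Putting the three branches together, a possible sketch of the parallel fusion in Formulas (8)-(12) is shown below; it reuses the SPA and PA sketches above together with a compact CBAM stand-in, and all layer widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Compact CBAM stand-in, Formulas (8)-(10): channel attention from average-
    and max-pooled descriptors, followed by a 7x7 spatial attention map."""
    def __init__(self, dim, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(inplace=True),
                                 nn.Conv2d(dim // reduction, dim, 1))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x_hat):
        avg = torch.mean(x_hat, dim=(2, 3), keepdim=True)
        mx = torch.amax(x_hat, dim=(2, 3), keepdim=True)
        ca = torch.sigmoid(self.mlp(avg) + self.mlp(mx))                 # L_CAM
        x_ca = x_hat * ca                                                # channel-attended features
        sa_in = torch.cat([x_ca.mean(1, keepdim=True),
                           x_ca.amax(1, keepdim=True)], dim=1)
        sa = torch.sigmoid(self.spatial(sa_in))                          # L_SAM
        return x_ca * sa                                                 # L_CBAM

class HPA(nn.Module):
    """Hybrid parallel attention: SPA, PA, and CBAM run in parallel on the
    normalized input; their outputs are concatenated and reduced by a
    Conv1-GELU-Conv1 MLP, with a residual back to the raw input (Formula (12))."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.BatchNorm2d(dim)
        self.spa, self.pa, self.cbam = SPA(dim), PA(dim), CBAM(dim)      # sketches from above
        self.mlp = nn.Sequential(nn.Conv2d(3 * dim, dim, 1), nn.GELU(),
                                 nn.Conv2d(dim, dim, 1))

    def forward(self, x):
        x_hat = self.norm(x)
        l = torch.cat([self.spa(x_hat), self.pa(x_hat), self.cbam(x_hat)], dim=1)
        return x + self.mlp(l)                                           # Y = x + Conv1(GELU(Conv1(L)))
```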
In the hybrid parallel attention (HPA) module, the global atmospheric illumination value A is a shared global variable, while the medium transmittance t(x) depends on position. Pixel attention is better at extracting location-related information and thus encodes t(x), while CBAM both extracts shared global information and attends to location information, and thus encodes A. Based on a review of related work, this paper argues that extracting both globally shared information and position-dependent information directly from the original features allows global optimization of the attention mechanism. When two different attentions are connected in series, however, such global optimization cannot be achieved, because the subsequent attention extracts features from the output of the preceding attention rather than directly from the original features. This paper therefore expects the hybrid parallel attention module to remove haze features more effectively, and the subsequent experiments confirm that it is indeed better suited to image dehazing.

3.2. HFF Module

The hierarchical feature fusion (HFF) module integrates two different types of attention mechanisms: the selective kernel fusion (SKF) module, which provides channel-scale attention, and the gated convolution (GC) module, which provides pixel-scale attention. The HFF module is mainly designed to dynamically fuse feature maps from different paths, maintain the integrity of the image information, avoid losing large amounts of image detail, and refine and enhance image features to improve the dehazing capability. As shown in Figure 7, the HFF module has two different paths. Let x1 be the feature map from the main path and x2 the feature map from the skip connection. First, the main path x1 and the skip connection x2 are added to obtain x, which is then fed separately into the GC module and the SKF module.
The GC module includes two branches, a feature extraction branch F_P and a gating signal branch W_P, as shown in Figure 8. The GC module provides functionality equivalent to pixel-scale attention with nonlinear activation, allowing the model to effectively extract position-dependent feature information and to attend more to detail information such as unevenly distributed haze. Formula (13) gives the calculation of the GC module, where x_p represents its final output, σ represents the sigmoid activation, Conv1 represents a convolutional layer with a kernel size of 1, and DWConv5 represents a depthwise convolutional layer with a kernel size of 5.
$$F_P = \mathrm{DWConv5}\bigl(\mathrm{Conv1}(x)\bigr), \qquad W_P = \sigma\bigl(\mathrm{Conv1}(x)\bigr), \qquad x_p = F_P \otimes W_P \tag{13}$$
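A hedged PyTorch reading of Formula (13), with the channel count as an assumption, is:

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    """GC module, Formula (13): a 1x1 conv followed by a depthwise 5x5 conv
    forms the feature branch F_P, gated element-wise by a sigmoid branch W_P."""
    def __init__(self, dim):
        super().__init__()
        self.feat = nn.Sequential(nn.Conv2d(dim, dim, 1),
                                  nn.Conv2d(dim, dim, 5, padding=2, groups=dim))  # DWConv5
        self.gate = nn.Sequential(nn.Conv2d(dim, dim, 1), nn.Sigmoid())

    def forward(self, x):
        return self.feat(x) * self.gate(x)          # x_p = F_P ⊗ W_P
```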
The SKF module is inspired by the selective kernel (SK) module [28], which provides channel attention and then fuses multiple branches through that attention; similar concepts can be found in MIRNet [29,30]. It can efficiently extract global information and change the channel dimension of the features. Its structure is shown in Figure 7. First, the fusion weights λ1 and λ2 are obtained using global average pooling GAP(·), a multi-layer perceptron (Conv1-ReLU-Conv1) LMLP(·), a SoftMax function, and a Split operation, as in Formula (14). Then, the fusion weights λ1 and λ2 are multiplied by x1 and x2, respectively, to obtain the channel attention feature xc = λ1x1 + λ2x2.
$$\{\lambda_1, \lambda_2\} = \mathrm{Split}\Bigl(\mathrm{SoftMax}\bigl(L_{MLP}(\mathrm{GAP}(x))\bigr)\Bigr) \tag{14}$$
Finally, at the end of the hierarchical feature fusion module, the outputs x_p and x_c from the GC module and the SKF module are added together to obtain the output y of the hierarchical feature fusion (HFF) module, as expressed in Formula (15).
$$y = x_p + x_c \tag{15}$$
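Combining the GC sketch above with a selective-kernel-style fusion gives one possible rendering of Formulas (14) and (15); the reduction ratio of the internal MLP is an assumption.

```python
import torch
import torch.nn as nn

class HFF(nn.Module):
    """Hierarchical feature fusion: the main-path and skip features are summed,
    then the GC branch produces pixel-scale attention x_p while an SKF-style
    branch produces channel weights lambda1, lambda2 (Formula (14)) that mix the
    two inputs into x_c; the output is y = x_p + x_c (Formula (15))."""
    def __init__(self, dim, reduction=8):   # reduction ratio is an assumption
        super().__init__()
        self.gc = GatedConv(dim)                                       # GC sketch above
        self.mlp = nn.Sequential(nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(inplace=True),
                                 nn.Conv2d(dim // reduction, 2 * dim, 1))

    def forward(self, x1, x2):
        x = x1 + x2                                                    # merge main path and skip connection
        xp = self.gc(x)                                                # pixel-scale attention
        w = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))          # GAP -> Conv1-ReLU-Conv1
        w = torch.softmax(w.view(x.size(0), 2, -1, 1, 1), dim=1)       # SoftMax over the two paths
        lam1, lam2 = w[:, 0], w[:, 1]                                  # Split into lambda1, lambda2
        xc = lam1 * x1 + lam2 * x2                                     # channel attention feature x_c
        return xp + xc                                                 # y = x_p + x_c
```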

3.3. Loss Function

To optimize the dehazing model, this paper uses the smooth L1 loss and the perceptual loss as the optimization targets of the network. The baseline model only uses the smooth L1 loss, so a perceptual loss is added. The perceptual loss improves perceptual quality by extracting high-level features of the image through a pre-trained neural network and computing the difference between the two images in that feature space. Based on these two loss functions, a multi-task loss function is defined, as shown in Formula (16).
$$L = L_m + \eta L_P \tag{16}$$
Among them, L represents the multi-task loss, Lm represents the smooth L1 loss, LP represents the perceptual loss, and η represents the relative weight between the two loss functions, which is set to 0.04. The two loss functions are introduced below.
The smooth L1 loss measures the difference between the haze-free image output by the model and the real image by comparing them pixel by pixel. In addition, it handles gradient explosion better and is less sensitive to outliers than the L2 loss during optimization, which improves the stability of the optimization. Therefore, this paper uses the smooth L1 loss to constrain the network, as expressed in Formula (17).
$$L_m = \frac{1}{N}\sum_{x=1}^{N} S_{l1}\bigl(I_o(x) - I_{gt}(x)\bigr), \qquad S_{l1}(z) = \begin{cases} 0.5\,z^2, & |z| < 1,\\ |z| - 0.5, & \text{otherwise} \end{cases} \tag{17}$$
In the above formula, Io is the estimated image, Igt is the corresponding real image, N is the total number of pixels, and z represents the pixel-wise difference between the estimated image and the corresponding real image.
Perceptual loss is a loss function commonly used in deep learning-based image-style transfer methods. Compared with the traditional mean square error loss function, perceptual loss pays more attention to the perceived quality of the image and is more in line with the human eye’s perception of image quality. Perceptual loss calculates the difference between the output haze-free image and the real image through a neural network that has been pre-trained on a large-scale dataset. This paper uses the first three pooling layers of the VGG16 network to extract the texture and structure information of the image. The perceptual loss function is shown in Formula (18).
$$L_p = \sum_{k=1}^{3} \frac{1}{C_k W_k H_k}\,\bigl\lVert \phi_k(I_o) - \phi_k(I_{gt}) \bigr\rVert_2^2 \tag{18}$$
where Io is the estimated image, Igt is the corresponding real image, and ϕk(Io) and ϕk(Igt) represent the feature maps extracted by the VGG16 model for the estimated image and the real image, respectively. Ck, Wk, and Hk represent the number of channels, the width, and the height of the k-th feature map, respectively, where k = 1, 2, 3.
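As a sketch of how Formulas (16)-(18) could be implemented, the following code uses torchvision's pretrained VGG16 and taps the outputs of its first three pooling layers; the exact feature indices and the use of torchvision are assumptions about tooling, not taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    """Formula (18): squared L2 distance between VGG16 features of the output
    and the ground truth, taken after the first three pooling layers (feature
    indices 4, 9, and 16 of torchvision's vgg16().features) and averaged over
    C_k, W_k, and H_k."""
    def __init__(self):
        super().__init__()
        feats = vgg16(pretrained=True).features.eval()
        self.stages = nn.ModuleList([feats[:5], feats[5:10], feats[10:17]])
        for p in self.parameters():
            p.requires_grad_(False)                    # the VGG16 extractor stays frozen

    def forward(self, out, gt):
        loss = 0.0
        for stage in self.stages:
            out, gt = stage(out), stage(gt)            # features accumulate stage by stage
            loss = loss + torch.mean((out - gt) ** 2)  # mean over C*W*H matches the 1/(C_k W_k H_k) factor
        return loss

# Multi-task loss of Formula (16): smooth L1 plus the weighted perceptual term.
smooth_l1 = nn.SmoothL1Loss()                          # Formula (17)
perceptual = PerceptualLoss()
eta = 0.04

def total_loss(output, target):
    return smooth_l1(output, target) + eta * perceptual(output, target)
```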

4. Experimental Results and Analysis

To verify the effectiveness of the HPA-HFF network for image dehazing, the dehazing algorithm proposed in this paper is compared with eight other advanced algorithms on the RESIDE dataset: the prior-based dehazing method DCP [10] and the neural network-based dehazing methods AOD-Net [13], DehazeNet [12], RefineDNet [31], GridDehazeNet [16], MSBDN [4], FFA-Net [17], and DehazeFormer-T [24]. The remainder of this section introduces the dataset and experimental configuration and then presents the experimental results from quantitative and qualitative perspectives. Finally, ablation experiments verify the usefulness of each module.

4.1. Dataset

Since it is currently hard to capture paired hazy and haze-free images in the real world, the atmospheric scattering model with appropriate parameters is used to synthesize hazy images. This paper selects the RESIDE dataset for network training. RESIDE (REalistic Single Image DEhazing) is a single-image dehazing benchmark with a large-scale, comprehensive training set and two datasets designed for objective and subjective quality evaluation, divided into five subsets, three of which are used in this paper: the indoor training set ITS and the outdoor training set OTS for network training, and the synthetic objective test set SOTS for testing the dehazing effect. The indoor training set ITS contains 13,990 synthetic hazy images generated from 1399 clear images taken from the indoor depth dataset NYU2 and the Middlebury Stereo Database; the relevant parameters are set in the atmospheric scattering model, and 10 hazy images are generated from each clear image. The outdoor training set OTS contains 296,695 synthetic hazy images, produced by estimating depth from outdoor images and then synthesizing the haze. The synthetic objective test set SOTS includes 500 indoor and 500 outdoor hazy images, synthesized by the same process as the training set, in order to cover multiple evaluation viewpoints.

4.2. Experimental Setup

The deep learning environment used in this paper is Python 3.7 with PyTorch 1.11.0, and the experiments are conducted on an NVIDIA GeForce RTX 4090 GPU with 24 GB of memory. The Adam optimizer is used for parameter tuning, with β1 and β2 taking the default values of 0.9 and 0.999, respectively. The initial learning rate is set to 10−4, and a cosine annealing strategy updates it step by step, gradually reducing the learning rate from the initial value to 0. During training, to augment the dataset, this paper randomly crops the images to a size of 240 × 240, performs random rotations of 90°, 180°, and 270° as well as horizontal and vertical flips, and then feeds the augmented images into the network for training.
The HPA-HFF network proposed in this paper will be trained separately on the indoor dataset ITS and the outdoor dataset OTS in the RESIDE dataset. The training rounds of the indoor dataset are 50,000, and the optimal results of PSNR and SSIM parameters in the current step are recorded every 5000 rounds during the training process. The training rounds of the outdoor dataset are 500,000, and the optimal results of PSNR and SSIM parameters in the current step are recorded every 50,000 rounds during the training process.
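The optimization and augmentation settings described above could be set up roughly as follows; the model and dataset handling, as well as the helper names, are placeholders for illustration.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

def make_optimizer(model, total_steps):
    """Adam with beta1 = 0.9, beta2 = 0.999 and an initial learning rate of 1e-4,
    decayed to zero with cosine annealing over the whole training run."""
    opt = Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
    sched = CosineAnnealingLR(opt, T_max=total_steps, eta_min=0.0)
    return opt, sched

def augment(hazy, clear, size=240):
    """Paired augmentation: a random 240x240 crop, a rotation by a random multiple
    of 90 degrees, and random horizontal/vertical flips, applied identically to
    the hazy image and its ground truth (both are CxHxW tensors)."""
    _, h, w = hazy.shape
    top = torch.randint(0, h - size + 1, (1,)).item()
    left = torch.randint(0, w - size + 1, (1,)).item()
    hazy = hazy[:, top:top + size, left:left + size]
    clear = clear[:, top:top + size, left:left + size]
    k = torch.randint(0, 4, (1,)).item()               # 0, 90, 180, or 270 degrees
    hazy, clear = torch.rot90(hazy, k, (1, 2)), torch.rot90(clear, k, (1, 2))
    if torch.rand(1) < 0.5:                            # horizontal flip
        hazy, clear = hazy.flip(2), clear.flip(2)
    if torch.rand(1) < 0.5:                            # vertical flip
        hazy, clear = hazy.flip(1), clear.flip(1)
    return hazy, clear
```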

4.3. Quantitative Analysis

Objective evaluation quantitatively assesses image quality through mathematical models and algorithms. This paper uses the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as objective indicators for quantitative analysis. PSNR is a metric for evaluating image quality: the larger its value, the smaller the error between the restored dehazed image and the original clear image. SSIM measures image similarity in terms of brightness, contrast, and structure: the closer its value is to 1, the more similar the structure of the restored dehazed image is to that of the original clear image.
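For reference, PSNR and SSIM on a restored image can be computed as in the sketch below; scikit-image is used here only as an assumed tool, since the paper does not state which implementation was employed.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(dehazed, clear):
    """dehazed and clear are HxWx3 float arrays scaled to [0, 1]."""
    psnr = peak_signal_noise_ratio(clear, dehazed, data_range=1.0)
    ssim = structural_similarity(clear, dehazed, channel_axis=-1, data_range=1.0)
    return psnr, ssim

# Example with stand-in images: a lightly perturbed copy scores high on both metrics.
clear = np.random.rand(240, 240, 3)
dehazed = np.clip(clear + 0.01 * np.random.randn(240, 240, 3), 0.0, 1.0)
print(evaluate_pair(dehazed, clear))
```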
The quantitative comparison of the dehazing algorithm proposed in this paper with the eight advanced algorithms mentioned at the beginning of Section 4 on the SOTS indoor test set (SOTS indoor) and SOTS outdoor test set (SOTS outdoor) is shown in Table 1. FFA-Net is the baseline model for this paper, and the last row shows the quantitative results of the HPA-HFF network proposed in this paper.
As can be seen from Table 1, the PSNR and SSIM values of the DCP, AOD-Net, and DehazeNet algorithms on the indoor and outdoor test sets are relatively low, which means that their output dehazed images differ considerably from the original clear images and show a certain degree of color distortion and noise. In comparison, RefineDNet improves only slightly in PSNR but shows a clear improvement in SSIM, indicating that its dehazed images are structurally very close to the originals. The dehazing capabilities of GridDehazeNet, MSBDN, FFA-Net, and DehazeFormer-T are significantly better: their PSNR increases by nearly 10 dB and their SSIM reaches almost 0.98, which means that the dehazing effect is relatively good and the distortion relative to the original clear images is small. Compared with these networks, the HPA-HFF network proposed in this paper achieves the best PSNR and SSIM on both the SOTS indoor and SOTS outdoor datasets, indicating a better dehazing effect and less distortion than the other networks.

4.4. Qualitative Analysis

At the subjective visual level, several representative algorithms were selected for comparison, as shown in Figure 9 and Figure 10. Figure 9 shows the test results of each algorithm on the SOTS indoor dataset, and Figure 10 shows the test results on the SOTS outdoor dataset. To better verify the advantages of the proposed algorithm, dark, bright, and well-lit pictures were selected from the indoor test set, and light-haze, medium-haze, and heavy-haze pictures were selected from the outdoor test set, so that the effects could be compared from multiple angles.
Through comparison, it was found that although several algorithms can achieve dehazing effects, the results are not always satisfactory. AOD-Net tends to produce images with heavy color saturation and some distortion, with residual thin haze in dimly lit and thick haze images, leading to poor visual effects. GridDehazeNet leaves obvious haze residues; for example, the red wall in the second image appears white, retaining a layer of thin haze, and there are noticeable color artifacts, such as in the sky region of the fifth image. MSBDN produces images with clearer textures, but due to insufficient global information during dehazing, the brightness of the dehazed images is higher than that of the real images, as evident in the second and fourth images. FFA-Net has issues with unevenness and blurriness in restoring image details and colors, such as the blurred texture of items on the table in the first image when zoomed in, indicating incomplete dehazing. Compared with other algorithms, DehazeFormer-T shows significant improvement in dehazing effects, but it still falls short in detail restoration and brightness recovery compared to our proposed network. Our network effectively removes haze, restores clear images, and preserves texture and color information, resulting in the best visual effects that are closest to the real images.
To further verify the effectiveness of the proposed algorithm, several hazy images from real scenes are tested. Since it is difficult to obtain corresponding haze-free images in real scenes, only qualitative analysis is performed. Figure 11 shows the comparison of the dehazing effects of several algorithms on real hazy images. As shown in Figure 11, similar to the indoor and outdoor test sets, the colors of the images dehazed by AOD-Net and GridDehazeNet are somewhat distorted, the sky color is too dark or too light, and residual haze remains; after MSBDN dehazing, the images show some overexposure and shadows; some of FFA-Net's dehazed images are not ideal and retain considerable haze; the dehazed images of DehazeFormer-T are somewhat dark in color while the brightness is relatively high; compared with the other algorithms, the proposed algorithm restores the images with higher fidelity and achieves a better overall effect.

4.5. Ablation Experiments

In order to demonstrate the effectiveness of each module in the dehazing method proposed in this article, ablation experiments were conducted on the SOTS test set for the two proposed modules. First, a base network for the ablation experiments was constructed. Then, the different modules were added to the base network in turn. The combinations included are as follows:
  • Model A (Base): this is the base network, which is the FFA-Net model.
  • Model B (Base + HPA): the FA module in the base network is replaced with the Hybrid Parallel Attention (HPA) module.
  • Model C (Base + HFF): the hierarchical feature fusion (HFF) module is added to the base network.
  • Model D (Base + HPA + HFF): both the HPA module and the HFF module are added to the base network.
  • Model E (Base + HPA + HFF + LP): the complete model with the addition of the perceptual loss function, representing the network proposed in this article.
In the ablation experiments, the training configuration of all models is the same as that of the HPA-HFF network. The results of their subjective comparison are shown in Figure 12, and the results of the objective comparison are shown in Table 2.
From Figure 12, when no modules are added (Model A), the dehazed images show overly deep color restoration, especially in the lake and sky areas, and the detail processing is poor. For models that add only one enhancement structure, there is some improvement in detail restoration, but some haze remains around objects (such as the branches in the second set of images), making it difficult to remove haze from structurally complex areas of the image. Adding both the HPA module and the HFF module facilitates information exchange between different layers and cross-layer fusion of multi-scale features. This enhances the ability to remove haze, retain details, and exploit original feature information (such as color), resulting in dehazed images that visually resemble the original clear images more closely. The details of the trees in the lake reflection are more complete, and the image contrast is enhanced, achieving a realistic and natural restoration effect with more pronounced details.
As shown in Table 2, using the HPA module or the HFF module alone on the basic network (Base) already enhances its feature extraction capability (Models B and C). Model B improves PSNR and SSIM by 2.14 dB and 0.0026 on the indoor test set and by 0.95 dB and 0.0017 on the outdoor test set. Model C improves PSNR by 0.16 dB but decreases SSIM by 0.0077 on the indoor test set, and improves PSNR by 0.10 dB but decreases SSIM by 0.0005 on the outdoor test set. Although Model C's PSNR gain is small and its SSIM drops slightly, combining the two modules (Model D) makes up for this decline and further improves the feature extraction ability of the network: compared with the basic network, PSNR and SSIM improve by 2.58 dB and 0.0048 on the indoor test set and by 1.44 dB and 0.0013 on the outdoor test set. Under the joint action of the HPA module and the HFF module, the increase in PSNR and SSIM exceeds that of using only one module, and the test results are better. In addition, the PSNR and SSIM values of Model E show that adding the perceptual loss to the original loss function helps improve the dehazing effect. Finally, a comprehensive comparison of the models shows that the different modules and loss functions all contribute to the quality of the restored images, enabling the network to achieve the best dehazing effect and proving the effectiveness of each strategy.
In addition, ablation experiments are conducted on the relative weight η in the multi-task loss function to determine its optimal value. A review of related studies shows that most values lie between 0.01 and 0.1. Therefore, η is varied from 0.01 to 0.1 in steps of 0.01, and the network is trained on the indoor dataset for each value. The resulting PSNR and SSIM curves are shown in Figure 13. As can be seen from Figure 13, both PSNR and SSIM reach their maximum at η = 0.04, so η is finally set to 0.04.

5. Conclusions

In view of the problems of detail degradation, reduced contrast, and residual haze in current image dehazing algorithms, this paper makes targeted improvements based on the FFA-Net network and proposes a feature fusion image dehazing network based on hybrid parallel attention (the HPA-HFF network) to improve single-image dehazing performance. First, a hybrid parallel attention (HPA) module is proposed to replace the FA module in the original network. This module combines pixel attention, channel attention, and spatial attention to extract and fuse features, making information flow through the network more efficiently and accurately; it also handles some special edge features well, helping to reconstruct and restore the dehazed image. Second, the hierarchical feature fusion (HFF) module is introduced. This module dynamically fuses feature maps from different paths and refines and enhances image features to improve the dehazing effect, effectively alleviating the loss of image detail caused by the skip connections in the FFA-Net network and the HPA module. Finally, a perceptual loss is added to the original L1 loss; it pays more attention to the perceptual quality of the image, extracts texture and structural information, restores high-frequency information, and greatly improves the learning ability of the network. The experimental results show that the proposed algorithm achieves good results: compared with other algorithms, PSNR and SSIM are greatly improved, and a good subjective visual effect is obtained. However, the proposed algorithm still has shortcomings, and there is room for improvement in the restoration of image texture information. In the future, the algorithm can be improved by combining it with traditional image processing methods, using them to extract high-frequency features of the image and repair texture information.

Author Contributions

H.C.: conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing—original draft; M.C.: writing—review, editing, supervision, project administration, resources; H.L.: data curation, writing—review, supervision, resources; H.P.: formal analysis, writing—review, resources; Q.S.: investigation, software, supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by Zigong City Key Science and Technology Project under grant number 2020YGJC25, in part by the Opening Fund of Artificial Intelligence Key Laboratory of Sichuan Province under grant number 2023RYY07, in part by the 2022 Graduate Innovation Fund of Sichuan University of Science and Engineering under grant number Y2023272.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yi, W.; Dong, L.; Liu, M.; Hui, M.; Kong, L.; Zhao, Y. Towards compact single image dehazing via task-related contrastive network. Expert Syst. Appl. 2024, 235, 121130. [Google Scholar] [CrossRef]
  2. Chow, T.-Y.; Lee, K.-H.; Chan, K.-L. Detection of targets in road scene images enhanced using conditional gan-based dehazing model. Appl. Sci. 2023, 13, 5326. [Google Scholar] [CrossRef]
  3. Kim, W.Y.; Hum, Y.C.; Tee, Y.K.; Yap, W.-S.; Mokayed, H.; Lai, K.W. A modified single image dehazing method for autonomous driving vision system. Multimed. Tools Appl. 2024, 83, 25867–25899. [Google Scholar] [CrossRef]
  4. Dong, H.; Pan, J.; Xiang, L.; Hu, Z.; Zhang, X.; Wang, F.; Yang, M.-H. Multi-scale boosted dehazing network with dense feature fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2157–2167. [Google Scholar]
  5. Li, S.; Zhou, Y.; Ren, W.; Xiang, W. Pfonet: A progressive feedback optimization network for lightweight single image dehazing. IEEE Trans. Image Process. 2023, 32, 6558–6569. [Google Scholar] [CrossRef] [PubMed]
  6. McCartney, E.J. Optics of the Atmosphere: Scattering by Molecules and Particles; John Wiley and Sons, Inc.: New York, NY, USA, 1976. [Google Scholar]
  7. Nayar, S.K.; Narasimhan, S.G. Vision in bad weather. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; pp. 820–827. [Google Scholar]
  8. Narasimhan, S.G.; Nayar, S.K. Vision and the atmosphere. Int. J. Comput. Vis. 2002, 48, 233–254. [Google Scholar] [CrossRef]
  9. Liu, H.; Yang, J.; Wu, Z.; Zhang, Q.; Deng, Y. A fast single image dehazing method based on dark channel prior and Retinex theory. Acta Autom. Sin. 2015, 41, 1264–1273. [Google Scholar]
  10. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353. [Google Scholar] [PubMed]
  11. Zhu, Q.; Mai, J.; Shao, L. A fast single image haze removal algorithm using color attenuation prior. IEEE Trans. Image Process. 2015, 24, 3522–3533. [Google Scholar] [PubMed]
  12. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. Dehazenet: An end-to-end system for single image haze removal. IEEE Trans. Image Process. 2016, 25, 5187–5198. [Google Scholar] [CrossRef] [PubMed]
  13. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. Aod-net: All-in-one dehazing network. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4770–4778. [Google Scholar]
  14. Dong, Y.; Liu, Y.; Zhang, H.; Chen, S.; Qiao, Y. FD-GAN: Generative adversarial networks with fusion-discriminator for single image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 10729–10736. [Google Scholar]
  15. Mehta, A.; Sinha, H.; Narang, P.; Mandal, M. Hidegan: A hyperspectral-guided image dehazing gan. In Proceedings of the IEEE/CVF Conference On Computer Vision And Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 212–213. [Google Scholar]
  16. Liu, X.; Ma, Y.; Shi, Z.; Chen, J. Griddehazenet: Attention-based multi-scale network for image dehazing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October—2 November 2019; pp. 7314–7323. [Google Scholar]
  17. Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. FFA-Net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 11908–11915. [Google Scholar]
  18. Zhao, X. A modified prior-based single-image dehazing method. Signal Image Video Process. 2022, 16, 1481–1488. [Google Scholar] [CrossRef]
  19. Chen, X.; Li, H.; Li, C.; Jiang, W.; Zhou, H. Single Image Dehazing Based on Sky Area Segmentation and Image Fusion. IEICE TRANSACTIONS Inf. Syst. 2023, 106, 1249–1253. [Google Scholar] [CrossRef]
  20. Berman, D.; Treibitz, T.; Avidan, S. Single image dehazing using haze-lines. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 42, 720–734. [Google Scholar] [CrossRef] [PubMed]
  21. Zhang, H.; Patel, V.M. Densely connected pyramid dehazing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3194–3203. [Google Scholar]
  22. Ren, W.; Pan, J.; Zhang, H.; Cao, X.; Yang, M.-H. Single image dehazing via multi-scale convolutional neural networks with holistic edges. Int. J. Comput. Vis. 2020, 128, 240–259. [Google Scholar] [CrossRef]
  23. Qu, Y.; Chen, Y.; Huang, J.; Xie, Y. Enhanced pix2pix dehazing network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 8160–8168. [Google Scholar]
  24. Song, Y.; He, Z.; Qian, H.; Du, X. Vision transformers for single image dehazing. IEEE Trans. Image Process. 2023, 32, 1927–1941. [Google Scholar] [CrossRef] [PubMed]
  25. Dutta, S.; Basarab, A.; Georgeot, B.; Kouamé, D. DIVA: Deep unfolded network from quantum interactive patches for image restoration. Pattern Recognit. 2024, 2024, 110676. [Google Scholar] [CrossRef]
  26. Zhao, H.; Kong, X.; He, J.; Qiao, Y.; Dong, C. Efficient image super-resolution using pixel attention. In Proceedings of the Computer Vision–ECCV 2020 Workshops: Glasgow, UK, 23–28 August 2020; Proceedings, Part III 16; Springer: Cham, Switzerland, 2020; pp. 56–72. [Google Scholar]
  27. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  28. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar]
  29. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H.; Shao, L. Learning enriched features for fast image restoration and enhancement. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1934–1948. [Google Scholar] [CrossRef] [PubMed]
  30. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H.; Shao, L. Learning enriched features for real image restoration and enhancement. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXV 16; Springer: Cham, Switzerland, 2020; pp. 492–511. [Google Scholar]
  31. Zhao, S.; Zhang, L.; Shen, Y.; Zhou, Y. RefineDNet: A weakly supervised refinement framework for single image dehazing. IEEE Trans. Image Process. 2021, 30, 3391–3404. [Google Scholar] [CrossRef] [PubMed]
Figure 1. FFA-Net structure diagram.
Figure 2. Basic block structure diagram of FFA-Net.
Figure 3. HPA-HFF network structure.
Figure 4. Basic block structure.
Figure 5. HPA module.
Figure 6. Schematic diagram of SPA, CBAM, and PA.
Figure 7. HFF module.
Figure 8. GC module.
Figure 9. Comparison of dehazing results on the SOTS Indoor test set.
Figure 10. Comparison of dehazing results on the SOTS Outdoor test set.
Figure 11. Comparison of dehazing effects of real hazy images.
Figure 12. Subjective comparison of ablation experiments.
Figure 13. Image index change curve when η is 0.01–0.1.
Table 1. Quantitative comparison results of different algorithms on the SOTS test set.

| Method | SOTS Indoor PSNR (dB) | SOTS Indoor SSIM | SOTS Outdoor PSNR (dB) | SOTS Outdoor SSIM | Param (M) | Latency (ms) |
|---|---|---|---|---|---|---|
| DCP | 16.73 | 0.8617 | 19.15 | 0.8146 | – | – |
| AOD-Net | 19.06 | 0.8504 | 20.29 | 0.8765 | 0.002 | 0.351 |
| DehazeNet | 20.13 | 0.8457 | 22.16 | 0.8233 | 0.009 | 0.899 |
| RefineDNet | 23.23 | 0.9431 | 23.84 | 0.9324 | – | – |
| GridDehazeNet | 32.25 | 0.9837 | 30.86 | 0.9819 | 0.956 | 9.345 |
| MSBDN | 33.67 | 0.9860 | 33.48 | 0.9820 | 31.35 | 13.254 |
| FFA-Net | 36.39 | 0.9886 | 33.57 | 0.9840 | 4.456 | 49.397 |
| DehazeFormer-T | 35.34 | 0.9831 | 33.15 | 0.9716 | 0.686 | 16.278 |
| HPA-HFF | 39.41 | 0.9967 | 35.52 | 0.9887 | 5.541 | 15.648 |
Table 2. Ablation experiment results.

| Model | SOTS Indoor PSNR (dB) | SOTS Indoor SSIM | SOTS Outdoor PSNR (dB) | SOTS Outdoor SSIM | Param (M) | Latency (ms) |
|---|---|---|---|---|---|---|
| Model A | 36.39 | 0.9886 | 33.57 | 0.9840 | 4.456 | 49.397 |
| Model B | 38.53 | 0.9912 | 34.52 | 0.9857 | 5.538 | 21.482 |
| Model C | 36.55 | 0.9809 | 33.67 | 0.9835 | 3.415 | 17.391 |
| Model D | 38.97 | 0.9934 | 35.01 | 0.9853 | 5.183 | 16.869 |
| Model E | 39.41 | 0.9967 | 35.52 | 0.9887 | 5.541 | 15.648 |