Article

Feature Reduction Networks: A Convolution Neural Network-Based Approach to Enhance Image Dehazing

Haoyang Yu, Xiqin Yuan, Ruofei Jiang, Huamin Feng, Jiaxing Liu and Zhongyu Li
1 School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 Beijing Electronic Science and Technology Institute, Beijing 100070, China
3 Institute of Electronic Computing Technology, China Academy of Railway Science Co., Ltd., Beijing 100081, China
4 School of Cyber Science and Technology, University of Science and Technology of China, Hefei 230026, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(24), 4984; https://doi.org/10.3390/electronics12244984
Submission received: 9 November 2023 / Revised: 1 December 2023 / Accepted: 5 December 2023 / Published: 12 December 2023
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network)

Abstract

Image dehazing represents a dynamic area of research in computer vision. With the rapid development of deep learning, particularly convolutional neural networks (CNNs), innovative and effective image dehazing techniques have surfaced. However, in stark contrast to the majority of computer vision tasks employing CNNs, the output of a dehazing model is often treated as uninformative noise, even though the model’s filters are engineered to extract pertinent features from the images. The standard end-to-end approach to dehazing removes noise from the hazy image to obtain a clear one. Consequently, the model’s dehazing capacity diminishes, as the noise is progressively filtered out during propagation. This motivates the feature reduction network (FRNet), a distinctive CNN architecture that incrementally eliminates informative features so that the output is noise. Our experimental results indicate that the CNN-driven FRNet surpasses previous state-of-the-art (SOTA) methods on the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) metrics across various image dehazing datasets, and it does so with reduced overhead, affirming the efficacy of CNNs in image dehazing tasks.

1. Introduction

The task of single image dehazing, a complex issue in computer vision, is traditionally regarded as an ill-posed problem [1]. The aim of this task is to generate a clear, haze-free image from an observed hazy one [2,3,4]. Historically, most image dehazing methodologies have been based on the atmospheric scattering model, which can be mathematically represented as follows:
I = Jt + A(1 − t)    (1)
In the context of our study, I symbolizes the hazy image, J denotes the haze-free image, A refers to the global atmospheric illumination of the medium, and t represents the transmission map of the medium. As per Equation (1), we can deduce the global atmospheric light and medium transmission map from a hazy image, thereby extracting the inherent haze-free image from the observed hazy one.
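As an illustration of Equation (1), the following sketch synthesizes a hazy image from a clear one. The transmission model t = exp(−β·d) and the values of β and A are common choices in the literature, not specifics of this paper.

```python
import numpy as np

def synthesize_haze(J: np.ndarray, depth: np.ndarray,
                    beta: float = 1.0, A: float = 0.9) -> np.ndarray:
    """Apply I = J * t + A * (1 - t) with t = exp(-beta * depth)."""
    t = np.exp(-beta * depth)[..., None]   # per-pixel transmission map
    return J * t + A * (1.0 - t)

# Usage: a clear RGB image in [0, 1] of a flat scene 2 units away.
J = np.random.rand(256, 256, 3)
I = synthesize_haze(J, depth=np.full((256, 256), 2.0))
```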
Recently, specialized neural networks designed for image dehazing tasks [5] have gained popularity due to their superior end-to-end performance. These techniques employ a deep model to analyze the noise distribution in the hazy image, which is subsequently used to remove the haze and recover the haze-free image—this is a process known as dehazing. This technique has proven effective in enhancing image visibility. These methods successfully identify and eliminate the haze from the image, thus resulting in a clearer and more precise picture.
The feature fusion attention network (FFA-Net) [6] uses CNNs to create a feature fusion attention mechanism, thus enabling the adaptive processing of distinct channel and pixel information. The AECR-Net [7] outperforms the FFA-Net by implementing downsampling and contrastive learning techniques. DehazeFormer [8], based on the Swin transformer architecture, demonstrated significant performance improvements over earlier CNN-based methods when tested on the SOTS [5] dataset.
The aforementioned methods rely on supervised learning, thus utilizing synthetic data to train the dehazing networks. Hazy images are created by adding noise to clear images, and these are then input into the dehazing networks. Consequently, the dehazing networks can be seen as the inverse process of adding noise to clear images.
However, these end-to-end approaches may overlook the issue pertaining to the value domain of the model output, which is typically regarded as noise (a shorthand term for haze distribution). In these dehazing networks, the output is added to the input, which is a hazy image, to produce a haze-free image. This process parallels the original synthesis procedure.
Consequently, to derive a haze-free image, the output of the end-to-end model must take negative values, which makes it considerably sensitive to the choice of activation function [8]. Moreover, this value domain issue has ramifications for model design:
On the one hand, negative values are often treated as data to be eliminated or suppressed; activation functions such as ReLU and LeakyReLU are designed specifically for this purpose, so a dehazing model designed along these lines would also remove them. On the other hand, the output of modern end-to-end dehazing neural networks is noise devoid of any informative content, despite the images passing through numerous feature extraction layers that glean useful information.
To address the above challenge, we have developed the FRNet, which can both output noise and fully utilize its feature extraction capabilities. Since the image features contain rich information while the noise has zero informational content, the FRNet utilizes this property by continuously extracting and removing information from the network’s hidden layers. Consequently, the model outputs noise containing no information, thereby achieving the goal of dehazing.
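The distinction can be made concrete with a toy PyTorch snippet (not the paper’s architecture): with the conventional additive shortcut, the backbone must emit negative values to cancel haze, whereas the FRNet-style subtractive shortcut lets it emit the noise directly.

```python
import torch
import torch.nn as nn

# Toy contrast between the two shortcut conventions (illustrative only).
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))

hazy = torch.rand(1, 3, 64, 64)
noise = backbone(hazy)
clear_additive = hazy + noise      # conventional: noise must be negative
clear_subtractive = hazy - noise   # FRNet-style: model outputs the noise itself
```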
In summary, our contributions in this work are as follows:
  • We propose a novel FRNet for single-image dehazing. The network leverages a unique frequency residual block (FRBlock) to accurately capture and gradually eliminate noise features.
  • We demonstrate, through extensive evaluation, that our FRNet achieves superior performance in terms of both image dehazing and computational efficiency compared to current state-of-the-art methods.
  • We illustrate the potential of our FRNet for other low-level vision tasks, thus suggesting its scalability and adaptability for broader applications in the field of computer vision.
The remainder of this paper is organized as follows: In Section 2, we review related work, including existing dehazing methods and their limitations. Section 3 presents the details of our proposed FRNet, including its overall architecture and the design of the FRBlock. In Section 4, we describe the experimental setup, datasets, and evaluation metrics; present and discuss the results of our model compared to state-of-the-art methods; and report an ablation study validating the effectiveness of each proposed component of the FRNet. Finally, Section 5 concludes the paper, summarizes our findings, and discusses potential directions for future research.

2. Related Works

Image dehazing techniques, which are essential for image processing and enhancement, aim to mitigate the “haze” effect in images. These techniques are broadly categorized into three groups: prior-based methods, CNN-based methods, and transformer-based methods. Each category has unique characteristics, advantages, and limitations.
Prior-based methods fundamentally rely on the atmospheric scattering model, which is a mathematical construct describing light scattering due to atmospheric particles. These methods also use handcrafted priors, which are predetermined assumptions aiding in the estimation of haze-free images. Several noteworthy prior-based techniques have been reported in the literature.
For instance, He and colleagues [9] proposed the dark channel prior (DCP) approach, thus leveraging the dark channel prior to estimate the medium’s transmission map, which is a critical element in the dehazing process. The dark channel prior is based on the observation that most nonsky regions in outdoor, haze-free images usually contain pixels with very low intensities in at least one color channel.
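For intuition, the dark channel itself can be computed in a few lines; the 15 × 15 patch size below is the value commonly used in the DCP literature, not taken from this paper.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image: np.ndarray, patch: int = 15) -> np.ndarray:
    """Dark channel of an HWC RGB image: per-pixel channel minimum,
    followed by a local minimum filter over each patch."""
    per_pixel_min = image.min(axis=2)              # min over R, G, B
    return minimum_filter(per_pixel_min, size=patch)
```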
In another study, Zhu et al. [10] introduced the color attenuation prior, thereby using a linear model to represent scene depth. This method hinges on the observation that the scene depth and the air light color can be estimated through the brightness and saturation difference between a hazy image and its haze-free counterpart [11,12].
Additionally, Berman et al. [13] applied a nonlocal prior to approximate haze-free image colors, which was founded on the assumption that colors in a haze-free image can be accurately approximated by a relatively small number of distinct colors, thereby resulting in compact clusters in the RGB color space.
Despite their efficacy, prior-based methods do have limitations. Their most significant drawback is their sensitivity to different scenarios [14]. As these methods rely on specific priors, their performance can significantly degrade when these priors do not hold. For example, the dark channel prior may not be valid for images with a lot of white or bright objects. Therefore, the robustness of these methods is questionable, and they may not yield satisfactory results in all cases.
Conversely, CNN-based methods exploit the high semantic abstraction capability of CNNs. For example, DehazeNet [15] employed CNNs to estimate the medium transmission map, which is a key component in determining the haze level in an image. By estimating this map, DehazeNet was able to restore hazy images using the atmospheric scattering model, which is a physical model that describes light scattering in the atmosphere. This model is particularly effective, as it can elucidate the degradation process of hazy images, thereby enabling more accurate restoration.
The FFA-Net [6] introduced a novel feature fusion attention mechanism designed to efficiently process different channel and pixel information, thus providing a more nuanced understanding of image data. The attention mechanism allowed the model to concentrate computational resources on the most informative features, thereby enhancing the overall dehazing performance.
The AECR-Net [7], which is a recent development, improved upon the FFA-Net by incorporating downsampling and contrastive learning techniques. Downsampling helped reduce the computational complexity and enhance the model’s efficiency, while contrastive learning enabled the model to learn rich and robust representations using contrasting positive and negative samples. This approach further improved the model’s ability to handle the intricate task of image dehazing.
RefineDNet [16] is a two-stage weakly supervised dehazing framework that combines the advantages of both prior-based and learning-based approaches. In the first stage, RefineDNet collaborates with the DCP to enhance the visibility of the input hazy images by generating preliminary results. The incorporation of DCP dehazing into the framework facilitates end-to-end training and evaluation. In the second stage, RefineDNet refines the preliminary dehazed images by employing two refinement networks, thereby improving their authenticity and the quality of the transmission maps.
However, a common shortcoming of these CNN-based methods is their predominant focus on expanding the network’s depth and width, thereby often neglecting the kernel size, which plays a significant role in network performance [17]. This neglect is primarily due to concerns over computational complexity, which can potentially impair the network’s efficiency.
Recently, transformer-based methods have gained prominence in computer vision tasks, including single image dehazing. This shift is largely driven by the success of vision transformers (ViTs) [18]. A case in point is DehazeFormer [8], which uses the Swin transformer as its backbone and has demonstrated superior performance over previous CNN-based methods on the SOTS dataset.
Novel methods [19,20] have also emerged, which transform traditional CNNs into architectures that resemble Transformers, thereby showing promising results across multiple domains. However, these methods often allocate excessive resources for token processing, thereby neglecting the varying information weights of different channels and leading to network inefficiency. Addressing this issue is crucial for improving the efficiency and performance of the network.

3. Method

3.1. Overall Architecture

The primary aim of our work is to design a deep learning model for the restoration of hazy images, thereby effectively transforming them into their haze-free counterparts. This goal is achieved by constructing a deep model specifically designed to address noise perception. As shown in Figure 1, the FRNet is patterned after the U-Net [21] architecture and is a multi-scale hierarchical framework composed of several FRBlocks. These blocks generate feature maps of varying sizes to extract multi-scale features and iteratively remove useful information during propagation.
Upon inputting a hazy image into the FRNet, the model undertakes the task of restoring the corresponding haze-free image. Initially, the hazy input image I undergoes a 3 × 3 convolutional operation to derive a low-level feature embedding. To achieve increased network depth while minimizing the escalation of parameters and computational expenses, we utilize depth-wise separable convolution, as proposed in reference [22]. This approach allows for efficient information fusion and feature transformation. Moreover, we incorporate local and global residuals into the network.
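A depthwise separable convolution of this kind can be sketched in PyTorch as follows; the 7 × 7 kernel reflects the window size adopted in the design steps below, while the channel count is arbitrary.

```python
import torch.nn as nn

# Sketch of the depthwise separable convolution [22] used to deepen the
# network cheaply: a per-channel spatial convolution followed by a 1x1
# pointwise convolution that mixes channels.
class DepthwiseSeparableConv(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```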
We then employ attention mechanisms to enhance the network’s expressive capacity by extracting, from shallow feature embeddings, global information that is correlated with the surrounding values. We delegate this task to the fusion module, which is based on the selective kernel (SK) module [23]. This module dynamically merges feature maps from different branches using a channel attention mechanism.
Finally, we incorporate the gating mechanism [24] to allow the network to focus more on informative features in the convolution block. It serves as the pixel attention module, in addition to its role as a nonlinear activation function.
The design process of our FRNet for single-image dehazing can be broken down into several key steps:
  • Incorporation of FRBlock: The most critical step in our design process is the integration of frequency residual blocks (FRBlocks). These blocks are unique to our network and are designed to accurately capture and gradually eliminate noise features in hazy images. Unlike the addition shortcuts used in traditional networks, our FRBlocks utilize subtraction shortcuts, thereby enhancing the accuracy of the noise distribution representation.
  • Adjustment of Convolution Window Size: We experiment with different convolution window sizes to optimize the performance of our FRNet. Specifically, we replace the 3 × 3 depthwise separable convolution with a larger 7 × 7 convolution window. This modification plays a significant role in enhancing the image reconstruction quality.
  • SK Fusion: The SK fusion layer is designed to reduce the number of parameters, thereby enhancing the model’s computational efficiency. The SK fusion layer contributes to the FRNet by facilitating the efficient fusion of feature maps from different paths of the network. By using selective weighting, it ensures that the most relevant features from both paths are incorporated into the final output. This mechanism helps enhance the network’s ability to capture and restore fine details in the dehazing process.
  • Training Strategy: The final step in our design process involves validating the performance of our FRNet and conducting necessary adjustments for optimization. We evaluate the model on the indoor dataset of RESIDE and compare its performance against state-of-the-art methods.
Through this stepwise design process, we have developed an efficient and effective FRNet for single-image dehazing. The unique features of our network, such as the FRBlocks and the larger convolution window size, contribute significantly to its superior performance, as demonstrated in the evaluation results.

3.2. Feature Reduction Convolution Block

As depicted in Figure 1, the FRBlock is a variant of the gUNet [25]. It enhances the network’s ability to preserve additional spatial structural information through gating mechanisms.
Given the feature map x, we first apply batch normalization, denoted as x̂ = BatchNorm(x), to initiate the normalization process. Then, we extract the spatial dimensional information:

x1 = Sigmoid(Conv1×1(PWConv(DWConv(x̂)))),
x2 = Sigmoid(Conv1×1(DWConv(PWConv(x̂)))),
In this context, PWConv represents a pointwise convolutional layer, and DWConv corresponds to a depthwise convolutional layer.
We first use PWConv and DWConv to extract features, reversing the order of the PWConv and DWConv layers between the two branches so that they extract different features. This rearrangement is crucial for avoiding gradient dispersion, which can arise when x1 and x2 are overly similar. We then apply Conv1×1, followed by the sigmoid, to introduce nonlinearity.
We treat x1 as a gating signal for x2. The block’s result is then obtained by subtracting their product from the identity shortcut x.
The use of gating mechanisms [26] as a nonlinear activation function enhances the expressive capability of the network. Numerically, multiplying sigmoid-activated branches keeps the feature values from falling below 0, thereby mitigating vanishing gradients.
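For concreteness, the following is a minimal PyTorch sketch of an FRBlock along these lines. The channel width, the 7 × 7 depthwise kernel, and the exact layer layout are illustrative assumptions rather than the published configuration.

```python
import torch
import torch.nn as nn

class FRBlock(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 7):
        super().__init__()
        self.norm = nn.BatchNorm2d(dim)
        # Branch 1: DWConv -> PWConv -> Conv1x1 (order used for x1).
        self.branch1 = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim),
            nn.Conv2d(dim, dim, 1),
            nn.Conv2d(dim, dim, 1))
        # Branch 2: PWConv -> DWConv -> Conv1x1 (reversed order for x2).
        self.branch2 = nn.Sequential(
            nn.Conv2d(dim, dim, 1),
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim),
            nn.Conv2d(dim, dim, 1))

    def forward(self, x):
        h = self.norm(x)
        x1 = torch.sigmoid(self.branch1(h))  # gating signal
        x2 = torch.sigmoid(self.branch2(h))
        return x - x1 * x2                   # subtraction shortcut removes information
```

Unlike a conventional residual block, the block ends with x − x1·x2, so stacking FRBlocks progressively strips information from the features.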

3.3. SK Fusion Layer

The SK fusion layer is a straightforward tweak of the SK module [23]; analogous ideas appear in MIRNet [27]. It fuses two feature maps, x1 and x2, where x1 originates from the skip connection and x2 from the main path.
Initially, we employ a PWConv layer f(·) to project x1 to x1′ = f(x1). Subsequently, we leverage the global average pooling operation GAP(·), the PPS (PWConv–PWConv–Sigmoid) denoted F_PPS(·), the softmax function, and a split operation to compute the fusion weights:

a1, a2 = Split(Softmax(F_PPS(GAP(x1′ + x2)))).

Finally, x1′ and x2 are merged into y = a1 · x1′ + a2 · x2. To curtail the number of parameters, the two PWConv layers of the PPS serve as dimensionality-reducing and dimensionality-increasing layers, respectively, in line with the traditional channel attention mechanism [28].
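A minimal sketch of this fusion step is given below; the reduction ratio r is an assumption, since the text only states that the two PWConv layers of the PPS reduce and then restore the channel dimension.

```python
import torch
import torch.nn as nn

class SKFusion(nn.Module):
    def __init__(self, dim: int, r: int = 8):
        super().__init__()
        self.project = nn.Conv2d(dim, dim, 1)       # f(.) on the skip path
        self.pps = nn.Sequential(                   # PWConv-PWConv-Sigmoid
            nn.Conv2d(dim, max(dim // r, 4), 1),
            nn.Conv2d(max(dim // r, 4), 2 * dim, 1),
            nn.Sigmoid())

    def forward(self, x1, x2):
        x1 = self.project(x1)                                  # x1' = f(x1)
        gap = torch.mean(x1 + x2, dim=(2, 3), keepdim=True)    # GAP(x1' + x2)
        w = self.pps(gap).view(x1.size(0), 2, x1.size(1), 1, 1)
        a = torch.softmax(w, dim=1)                 # softmax, then split into a1, a2
        return a[:, 0] * x1 + a[:, 1] * x2          # y = a1*x1' + a2*x2
```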

3.4. Training Strategy

In training the FRNet, we employ several techniques that are not commonly used in training image restoration networks. Synchronized batch normalization shares the mean and variance across multiple GPUs, thus enabling the normalization batch size to match the minibatch size in data-parallel training; this has been found to yield the best performance when the normalization batch size is 16 or larger [29]. Frozen batch normalization employs previously computed population statistics during training, rendering it a constant affine transform; we activate FrozenBN in the final epochs, when the learning rate is low, to maintain training–test consistency and prevent gradient explosion [30]. Mixed precision training allows low-precision computation in certain layers to reduce computational cost and memory usage without affecting the model’s performance [31]; we use it to reduce the training time and increase the minibatch size. Additionally, owing to our relatively large initial learning rate and the use of mixed precision training, the model may produce NaN values during training, so we apply a linear warmup strategy to reduce the risk of model collapse.
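As an illustration, the warmup and mixed precision parts of this recipe might look as follows in PyTorch; the warmup length and the surrounding training loop are assumptions.

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def warmup_lr(optimizer, base_lr, step, warmup_iters=500):
    # Linear warmup: scale the learning rate up during the first iterations.
    if step < warmup_iters:
        for group in optimizer.param_groups:
            group["lr"] = base_lr * (step + 1) / warmup_iters

def train_step(model, optimizer, hazy, clear):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():             # low-precision forward pass
        loss = torch.nn.functional.l1_loss(model(hazy), clear)
    scaler.scale(loss).backward()               # loss scaling avoids fp16 underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```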

4. Experimental Section

4.1. Experiment Setups

Following previous studies [8,25], the experiments were implemented in PyTorch 1.10 on an NVIDIA RTX 4090 GPU. To enrich the training data and accommodate the GPU memory limit, the images were randomly cropped into 256 × 256 patches during training. The AdamW optimizer [32] was used to optimize our FRNet, with the exponential decay rates β1 and β2 set to 0.9 and 0.999, respectively. The initial learning rate was set to 0.0005 and adjusted with the cosine annealing strategy, gradually decreasing from the initial value to 7 × 10⁻⁶. The batch size was set to 8. We employed only the L1 loss to optimize our FRNet.
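A sketch of this optimization setup, with a stand-in model and an assumed iteration count for the cosine schedule, is:

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)    # stand-in for the FRNet
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4,
                              betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=1000, eta_min=7e-6)       # decay 5e-4 -> 7e-6
criterion = torch.nn.L1Loss()
```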

4.2. Datasets and Metrics

Experiments were conducted on the RESIDE [5], Haze4K [33], and RS-Haze [8] datasets. For the RESIDE dataset, we followed the setup of FFA-Net [6], which comprises two main experimental configurations: RESIDE-IN and RESIDE-OUT. Specifically, our model was trained on the RESIDE indoor training set (ITS, 13,990 image pairs) and the RESIDE outdoor training set (OTS, 313,950 image pairs) and subsequently tested on the indoor (500 image pairs) and outdoor (500 image pairs) collections of the RESIDE Synthetic Objective Testing Set (SOTS).
In the case of the Haze4K dataset, we followed the settings of PMNet [34]. The Haze4K dataset comprises 4000 image pairs, with 3000 pairs used for training and the remaining 1000 pairs for testing. Compared to RESIDE, Haze4K combines indoor and outdoor scene images, with a more realistic synthesis pipeline.
For the RS-Haze dataset, we followed the guidelines of DehazeFormer [8]. It consists of 54,000 images, with 51,300 allocated for training and the remaining 2700 for testing. RS-Haze is a remote sensing dehazing dataset in which the haze exhibits high heterogeneity, in stark contrast to RESIDE and Haze4K.
The sample images are as shown in Figure 2, and the detailed information on the experimental dataset is shown in Table 1.
All models used their original training strategies, and we report the best results obtained. Following the conventional evaluation protocol [6], the performance of the FRNet model and the compared SOTA methods was assessed using the PSNR and the SSIM.
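For reference, the two metrics can be computed with scikit-image as in the following sketch, assuming the dehazed and ground-truth images are float arrays in [0, 1]:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(dehazed: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """PSNR and SSIM between a dehazed output and its ground truth (HWC)."""
    psnr = peak_signal_noise_ratio(gt, dehazed, data_range=1.0)
    ssim = structural_similarity(gt, dehazed, data_range=1.0,
                                 channel_axis=2)   # color images, channels last
    return psnr, ssim
```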

4.3. Compared Methods

In our study, we compared our proposed FRNet with several state-of-the-art dehazing methods, as shown in Table 2. DCP, a classic prior-based method built on a physical model with zero learnable parameters, stands alone as one category. The second group consists of recent deep learning models, listed in order of publication year. Since this paper also employs a deep learning model, the comparison in this section focuses on the following methods:
  • DCP (dark channel prior) [9]: DCP is a classic haze removal algorithm that utilizes the dark channel prior in an image to estimate the haze density and haze image in a scene, thereby removing haze from the image.
  • DehazeNet [15]: DehazeNet is a deep learning-based haze removal model that trains a convolutional neural network to learn the mapping from hazy images to dehazed images, thereby restoring clear images.
  • MSCNN (multiscale CNN) [35]: MSCNN is a multiscale convolutional neural network model used for haze removal in images. It extracts features at different scales and combines global and local information for haze processing.
  • AOD-Net (all-in-one dehazing network) [36]: AOD-Net is a deep learning-based haze removal model that estimates the atmospheric light and atmospheric light transmission in an image to remove haze and restore image clarity.
  • GFN (gated fusion network) [37]: GFN employs an end-to-end trainable neural network with an encoder–decoder structure to restore clear images from hazy inputs. The network adaptively fuses multiple derived inputs using pixelwise confidence maps, thereby leveraging differences in appearance from image enhancements to preserve well-visible regions and gate important features.
  • GCANet (gated context aggregation network) [38]: GCANet proposes an end-to-end gated context aggregation network, which leverages smoothed dilated convolutions and multilevel feature fusion to directly restore haze-free images, thereby outperforming previous methods that rely on handcrafted priors; the network’s effectiveness is further demonstrated through state-of-the-art performance on the image deraining task.
  • GridDehazeNet [39]: GridDehazeNet employs an end-to-end trainable CNN with learned input preprocessing, a grid backbone for attention-based multiscale estimation to avoid bottlenecks, and artifact-reducing postprocessing to deliver state-of-the-art dehazing performance on synthetic and real images without relying on atmospheric scattering models. Experiments show atmospheric scattering models are not necessarily beneficial for image dehazing even on synthetic data, as they can force dimension reduction at the cost of useful information.
  • MSBDN (multiscale boosted dehazing network) [40]: The proposed multiscale boosted dehazing Network incorporates boosting and error feedback principles into a U-Net architecture through a boosted decoder and dense feature fusion module to progressively restore haze-free images while preserving spatial information from multiscale features; extensive evaluations show state-of-the-art dehazing performance on benchmarks and real-world images.
  • PFDN (physics-based feature dehazing network) [41]: The proposed physics-based feature dehazing network incorporates an effective feature dehazing unit based on the haze formation model into an encoder–decoder architecture with residual learning for end-to-end feature-space haze removal and clear image reconstruction, thereby outperforming state-of-the-art methods by effectively exploring useful dehazing features. Residual learning and explicit modeling of the haze physics in a deep feature space enable effective end-to-end training for haze removal.
  • FFA-Net (feature fusion and attention network) [6]: FFA-Net is a feature fusion and attention mechanism network model that enhances haze removal by introducing attention mechanisms and feature fusion modules.
  • AECR-Net (contrastive learning for compact single-image dehazing) [7]: The proposed AECR-Net exploits contrastive regularization built on contrastive learning to utilize both hazy images as negative samples and clear images as positive samples, thus pulling the restored image closer to the clear image and pushing it away from the hazy one in the representation space; furthermore, an autoencoder-like framework with adaptive mixup and dynamic feature enhancement improves dehazing capability while balancing performance and memory requirements, thus surpassing state-of-the-art approaches for synthetic and real-world datasets.

4.4. Results Analysis

In our study, we categorized dehazing algorithms into two main types: prior-based methods and learning-based methods. This grouping is designed to facilitate readers’ understanding and comparison of different dehazing approaches. Prior-based methods typically rely on the statistical characteristics of images, thus having fewer parameters and lower computational latency. However, they may be limited in applicability when dealing with complex and variable real-world scenes. On the other hand, learning-based methods, especially deep learning methods—although requiring a large number of parameters and longer computation time—often achieve better performance due to their ability to learn complex models from large-scale data.
The performance of various dehazing methods differs significantly in terms of speed and the number of learning parameters. For instance, while GridDehazeNet and MSBDN offer impressive speed, they possess relatively high numbers of learning parameters. In contrast, GCANet and PMNet achieve a desirable balance between speed and the number of learning parameters. Our proposed variations of FRNet, namely, FRNet-Tiny, FRNet-Small, and FRNet-Big, demonstrated competitive performance across most datasets. We categorize these models as Tiny, Small, and Big based on the depth and complexity of the network. FRNet-Tiny has fewer parameters and lower latency, at the cost of performance slightly inferior to that of FRNet-Big, making it a good choice for applications with high real-time requirements but relatively modest performance demands. In contrast, although FRNet-Big has more parameters and longer computation latency, it achieves a very high level of performance, which suits applications with extremely high performance requirements. This design allows our model to address various application needs and environments. We followed the settings of the large models in gUNet [25] and DehazeFormer [8] and employed different convolutional channel numbers for the FRBlock: the Tiny, Small, and Big models use N, 2N, and 4N convolutional channels (N = 24), respectively, resulting in an exponential growth trend in model parameters.
FRNet-Big stands out among the variations. It outperformed state-of-the-art methods in terms of PSNR and SSIM values on most datasets while maintaining relatively low latency and fewer learning parameters. It achieved high PSNR and SSIM values across most datasets, with a PSNR of about 39.17 dB on RESIDE-IN. This affirms that our method maintains image quality during the dehazing process and effectively restores clarity and details. Despite its high accuracy, FRNet-Big keeps the processing latency low, at approximately 5.461 milliseconds, making it suitable for real-time or speed-sensitive applications.
On the other hand, FRNet-Small made significant strides in terms of the number of learning parameters, using only about 2.174 million. This resulted in a smaller model size and higher computational efficiency, thereby making it highly suitable for resource-constrained environments.
FRNet-Tiny, the most lightweight version of our model, is designed to operate with the least computational resources. We achieved this by minimizing the number of FRBlocks and reducing the depth of the network. Despite its simplicity, FRNet-Tiny still delivered competitive performance, thereby making it a suitable choice for mobile and edge computing devices where computational resources are limited.
In summary, the proposed FRNet demonstrates competitiveness in the field of image dehazing across multiple metrics: accuracy, speed, and the number of learning parameters. It shows potential for practical applications, although further evaluations may be necessary to validate its stability and adaptability across a broader range of datasets and scenarios. Overall, FRNet is a promising image dehazing method that performs well across multiple metrics, making it a strong candidate for future research and applications.

4.5. Quantitative Comparison

We quantitatively compared the performance of our proposed model with several baseline models, and the results are presented in Table 2. Our proposed model demonstrated superior image dehazing performance and efficiency under various parameter settings. Specifically, our model compares favorably with previous SOTA methods across most datasets, while maintaining fewer parameters. We observed significant improvements in PSNR evaluation metrics on the RESIDE-Indoor, RESIDE-Outdoor, Haze-4K, and RS-Haze datasets compared to previous SOTA methods.
Interestingly, our smaller models, despite using fewer parameters, achieved results closely comparable with those of larger parameter settings. This suggests that our model does not require an extensive parameter budget to yield satisfactory results. For instance, even though PMNet has 6.42 times the parameters of FRNet-Tiny, its performance on RESIDE-IN exceeded that of FRNet-Tiny by only 0.43 dB.
On the RESIDE-Outdoor and RS-Haze datasets, our model outperformed both the PMNet and DehazeFormer, thereby achieving the highest scores on both the PSNR and SSIM metrics. Given that PMNet’s parameter count is several times that of our model, we posit that our network achieves a superior parameter–performance trade-off.
On the Haze-4K dataset, our model also slightly outperformed the SOTA models while demonstrating higher efficiency. We attribute the smaller margin to the fact that the training set images were resized, leading to a high-frequency information distribution inconsistent with that of the test set images.
We also qualitatively compared our FRNet with the SOTA image dehazing methods, and the visualization results are shown in Figure 3. DehazeNet failed to effectively remove the atmospheric haze from images, while FFA-Net performed well in outdoor scenarios but caused color distortion in indoor scenes. AOD-Net worked well in indoor scenes but caused color oversaturation outdoors. In contrast, the FRNet produced high-quality restored images with accurate brightness and color tones that closely matched the ground truth. It did not produce any artifacts, thus ensuring a faithful restoration of the original image. We argue that the purpose of filtering is to remove unimportant frequency components and emphasize significant ones. However, if the model’s output includes a hazy signal that is not significant, this can lead to artifacts or color distortions, which are issues observed in the other methods.

4.6. Ablation Study

To verify the effectiveness of each proposed component in the FRNet, we conducted an ablation study. We first constructed a baseline network as our reference. The ablation study focused on two aspects: convolution window size, where we replaced the 3 × 3 depthwise separable convolution with a 7 × 7 version, and SK fusion, which we replaced with MLP. Accordingly, different modules were added to the baseline network to create two distinct variants: (1) the baseline with a large window, where the convolution window in the baseline was replaced with a 7 × 7 window, and (2) the baseline with MLP, where the SK fusion in the baseline was replaced with MLP. We evaluated these models on the indoor dataset of RESIDE. The performance of these models is summarized in Table 3.
The results of the ablation study demonstrate the effectiveness of each proposed component in the FRNet. By replacing the 3 × 3 depthwise separable convolution with a 7 × 7 version, we observed an improvement in the model performance, thereby indicating that larger convolution window sizes can enhance image reconstruction quality. Compared to the baseline model, the variant with a larger convolution window showed an improvement of approximately 1.74 dB in PSNR and 0.001 in SSIM, albeit with an increase in model parameters from 1.102 M to 1.897 M.
However, replacing the SK fusion with MLP resulted in a slight decrease in performance, thus suggesting that the introduction of MLP may cause information loss or incomplete reconstruction. Compared to the baseline model, the variant with MLP showed a decrease of approximately 1.8 dB in PSNR and 0.006 in SSIM, but it showed an increase in the number of model parameters from 1.102 M to 1.289 M.
In conclusion, based on the results of the ablation study, we can deduce that using larger convolution window sizes improves image reconstruction quality at the cost of increased model complexity. Conversely, replacing SK fusion with MLP leads to performance degradation despite an increase in the number of model parameters. Therefore, in practical applications, it is necessary to balance model performance and complexity based on specific requirements.

5. Conclusions

We proposed the FRNet as a novel approach to tackle the problem of single-image dehazing. Our network incorporates the frequency residual block (FRBlock), which is designed to accurately capture and gradually eliminate noise features using subtraction shortcuts rather than addition. This unique design ensures a more accurate representation and elimination of noise distribution in hazy images. Evaluation results reveal that our proposed FRNet offers a superior balance between performance and efficiency when compared against current state-of-the-art methods. It not only outperformed these methods in terms of key evaluation metrics like PSNR and SSIM, but also achieved this with a considerably smaller number of parameters, thereby improving computational efficiency. Importantly, our FRNet presents a scalable architecture that has potential for extension to other low-level vision tasks. Its performance, efficiency, and scalability make it a promising candidate for wider application in the field of computer vision. We look forward to further exploring its potential in future research.

Author Contributions

Conceptualization, Z.L. and X.Y.; methodology, Z.L. and H.Y.; writing—original draft preparation, Z.L. and H.Y.; writing—review and editing, H.F. and R.J.; supervision, H.F. and J.L.; project administration, Z.L.; funding acquisition, R.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the China State Railway Group Co., Ltd. Science and Technology Research and Development Plan Project (N2021S016), the China State Railway Group Co., Ltd. Science and Technology Research and Development Plan Project (N2023S016), and the BUPT Excellent Ph.D. Students Foundation (CX2021124).

Data Availability Statement

All data underlying the results are available as part of the article and no additional source data are required.

Conflicts of Interest

Author Ruofei Jiang was employed by the company Institute of Electronic Computing Technology, China Academy of Railway Science Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Correction Statement

This article has been republished with a minor correction to the title. This change does not affect the scientific content of the article.

References

  1. Tarel, J.P.; Hautiere, N.; Caraffa, L.; Cord, A.; Halmaoui, H.; Gruyer, D. Vision enhancement in homogeneous and heterogeneous fog. IEEE Intell. Transp. Syst. Mag. 2012, 4, 6–20. [Google Scholar] [CrossRef]
  2. Jiang, X.; Sun, J.; Li, C.; Ding, H. Video image defogging recognition based on recurrent neural network. IEEE Trans. Ind. Inform. 2018, 14, 3281–3288. [Google Scholar] [CrossRef]
  3. Ren, W.; Zhang, J.; Xu, X.; Ma, L.; Cao, X.; Meng, G.; Liu, W. Deep video dehazing with semantic segmentation. IEEE Trans. Image Process. 2018, 28, 1895–1908. [Google Scholar] [CrossRef] [PubMed]
  4. Liu, W.; Hou, X.; Duan, J.; Qiu, G. End-to-end single image fog removal using enhanced cycle consistent adversarial networks. IEEE Trans. Image Process. 2020, 29, 7819–7833. [Google Scholar] [CrossRef]
  5. Li, B.; Ren, W.; Fu, D.; Tao, D.; Feng, D.; Zeng, W.; Wang, Z. Benchmarking single-image dehazing and beyond. IEEE Trans. Image Process. 2018, 28, 492–505. [Google Scholar] [CrossRef] [PubMed]
  6. Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. Ffa-net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11908–11915. [Google Scholar]
  7. Wu, H.; Qu, Y.; Lin, S.; Zhou, J.; Qiao, R.; Zhang, Z.; Xie, Y.; Ma, L. Contrastive learning for compact single image dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 10551–10560. [Google Scholar]
  8. Song, Y.; He, Z.; Qian, H.; Du, X. Vision transformers for single image dehazing. IEEE Trans. Image Process. 2023, 32, 1927–1941. [Google Scholar] [CrossRef] [PubMed]
  9. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353. [Google Scholar] [PubMed]
  10. Zhu, Q.; Mai, J.; Shao, L. A fast single image haze removal algorithm using color attenuation prior. IEEE Trans. Image Process. 2015, 24, 3522–3533. [Google Scholar] [PubMed]
  11. Fu, X.; Fan, Z.; Ling, M.; Huang, Y.; Ding, X. Two-step approach for single underwater image enhancement. In Proceedings of the 2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), Xiamen, China, 6–9 November 2017; pp. 789–794. [Google Scholar]
  12. Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Bekaert, P. Color balance and fusion for underwater image enhancement. IEEE Trans. Image Process. 2017, 27, 379–393. [Google Scholar] [CrossRef] [PubMed]
  13. Berman, D.; Avidan, S.; Avidan, S. Non-local image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1674–1682. [Google Scholar]
  14. Peng, Y.T.; Cao, K.; Cosman, P.C. Generalization of the dark channel prior for single image restoration. IEEE Trans. Image Process. 2018, 27, 2856–2868. [Google Scholar] [CrossRef] [PubMed]
  15. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. Dehazenet: An end-to-end system for single image haze removal. IEEE Trans. Image Process. 2016, 25, 5187–5198. [Google Scholar] [CrossRef] [PubMed]
  16. Zhao, S.; Zhang, L.; Shen, Y.; Zhou, Y. RefineDNet: A weakly supervised refinement framework for single image dehazing. IEEE Trans. Image Process. 2021, 30, 3391–3404. [Google Scholar] [CrossRef] [PubMed]
  17. Pang, Y.; Nie, J.; Xie, J.; Han, J.; Li, X. BidNet: Binocular image dehazing without explicit disparity estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5931–5940. [Google Scholar]
  18. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
  19. Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the CVPR, New Orleans, LA, USA, 19–24 June 2022; pp. 11976–11986. [Google Scholar]
  20. Ding, X.; Zhang, X.; Han, J.; Ding, G. Scaling up your kernels to 31 × 31: Revisiting large kernel design in cnns. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11963–11975. [Google Scholar]
  21. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the CVPR, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  22. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  23. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar]
  24. Narang, S.; Chung, H.W.; Tay, Y.; Fedus, W.; Fevry, T.; Matena, M.; Malkan, K.; Fiedel, N.; Shazeer, N.; Lan, Z.; et al. Do transformer modifications transfer across implementations and applications? arXiv 2021, arXiv:2102.11972. [Google Scholar]
  25. Song, Y.; Zhou, Y.; Qian, H.; Du, X. Rethinking Performance Gains in Image Dehazing Networks. arXiv 2022, arXiv:2209.11448. [Google Scholar]
  26. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. Maxim: Multi-axis mlp for image processing. In Proceedings of the CVPR, New Orleans, LA, USA, 19–24 June 2022; pp. 5769–5780. [Google Scholar]
  27. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H.; Shao, L. Learning Enriched Features for Real Image Restoration and Enhancement. In Proceedings of the ECCV, Glasgow, UK, 23–28 August 2020. [Google Scholar]
  28. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  29. Peng, C.; Xiao, T.; Li, Z.; Jiang, Y.; Zhang, X.; Jia, K.; Yu, G.; Sun, J. Megdet: A large mini-batch object detector. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6181–6189. [Google Scholar]
  30. Wu, Y.; Johnson, J. Rethinking “batch” in batchnorm. arXiv 2021, arXiv:2105.07576. [Google Scholar]
  31. He, T.; Zhang, Z.; Zhang, H.; Zhang, Z.; Xie, J.; Li, M. Bag of tricks for image classification with convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 558–567. [Google Scholar]
  32. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  33. Liu, Y.; Zhu, L.; Pei, S.; Fu, H.; Qin, J.; Zhang, Q.; Wan, L.; Feng, W. From synthetic to real: Image dehazing collaborating with unlabeled real data. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 50–58. [Google Scholar]
  34. Ye, T.; Zhang, Y.; Jiang, M.; Chen, L.; Liu, Y.; Chen, S.; Chen, E. Perceiving and modeling density for image dehazing. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 130–145. [Google Scholar]
  35. Ren, W.; Liu, S.; Zhang, H.; Pan, J.; Cao, X.; Yang, M.H. Single image dehazing via multi-scale convolutional neural networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 154–169. [Google Scholar]
  36. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. Aod-net: All-in-one dehazing network. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4770–4778. [Google Scholar]
  37. Ren, W.; Ma, L.; Zhang, J.; Pan, J.; Cao, X.; Liu, W.; Yang, M.H. Gated fusion network for single image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3253–3261. [Google Scholar]
  38. Chen, D.; He, M.; Fan, Q.; Liao, J.; Zhang, L.; Hou, D.; Yuan, L.; Hua, G. Gated context aggregation network for image dehazing and deraining. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 June 2019; pp. 1375–1383. [Google Scholar]
  39. Liu, X.; Ma, Y.; Shi, Z.; Chen, J. Griddehazenet: Attention-based multi-scale network for image dehazing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7314–7323. [Google Scholar]
  40. Dong, H.; Pan, J.; Xiang, L.; Hu, Z.; Zhang, X.; Wang, F.; Yang, M.H. Multi-scale boosted dehazing network with dense feature fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2157–2167. [Google Scholar]
  41. Dong, J.; Pan, J. Physics-Based Feature Dehazing Networks. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 188–204. [Google Scholar]
Figure 1. The architecture of the FRNet and the corresponding structure of the FRBlock.
Figure 2. Dataset schematic; best viewed in fullscreen mode.
Figure 3. Qualitative comparison of various dehazing methods on the RESIDE dataset. The first row shows an outdoor scene; the last row shows an indoor scene. The first column shows the hazy input images; the last column shows the corresponding ground-truth images. The results are compared in terms of the visual quality of the dehazed images.
Table 1. Detailed information on the experimental datasets.

Datasets | Experimental Setup/Source | Image Size | Number  | Purpose
RESIDE   | Follow FFA-Net [6]        | Uncertain  | 328,940 | Train/Test
Haze4K   | Follow PMNet [34]         | Uncertain  | 4000    | Train/Test
RS-Haze  | Follow DehazeFormer [8]   | 512 × 512  | 54,000  | Test
Table 2. A quantitative comparative analysis of image dehazing techniques trained on disparate datasets, covering the accuracy of the dehazing results (PSNR/SSIM), the number of learning parameters, and the speed of the dehazing process. Bold and underline mark the best and second-best methods, respectively.

Methods                      | RESIDE-IN PSNR/SSIM | RESIDE-OUT PSNR/SSIM | Haze-4K PSNR/SSIM | RS-Haze PSNR/SSIM | #Param (M) | Latency (ms)
DCP (TPAMI’10) [9]           | 16.77/0.836         | 19.17/0.825          | 14.15/0.767       | 18.06/0.754       | -          | -
DehazeNet (TIP’16) [15]      | 19.89/0.827         | 24.95/0.944          | 19.30/0.857       | 23.34/0.833       | 0.009      | 0.934
MSCNN (ECCV’16) [35]         | 19.97/0.835         | 22.26/0.910          | 14.15/0.530       | 22.88/0.825       | 0.008      | 0.621
AOD-Net (ICCV’17) [36]       | 20.66/0.819         | 24.20/0.935          | 17.22/0.832       | 25.05/0.835       | 0.002      | 0.408
GFN (CVPR’18) [37]           | 22.43/0.884         | 21.69/0.869          | -                 | 29.27/0.905       | 0.499      | 3.170
GCANet (WACV’19) [38]        | 30.48/0.982         | -                    | -                 | 34.56/0.958       | 0.702      | 2.889
GridDehazeNet (ICCV’19) [39] | 32.25/0.982         | 31.25/0.990          | 23.40/0.941       | 36.51/0.971       | 0.956      | 7.536
MSBDN (CVPR’20) [40]         | 33.74/0.986         | 33.59/0.984          | 23.16/0.859       | 38.66/0.967       | 31.35      | 9.871
PFDN (ECCV’20) [41]          | 32.77/0.982         | -                    | -                 | 36.12/0.956       | 11.27      | 3.469
FFA-Net (AAAI’20) [6]        | 36.48/0.990         | 33.60/0.983          | 26.97/0.953       | 39.37/0.971       | 4.456      | 39.158
AECR-Net (CVPR’21) [7]       | 37.25/0.990         | -                    | -                 | 35.71/0.962       | 2.611      | 4.150
PMNet [34]                   | 38.44/0.990         | 34.89/0.982          | 33.51/0.973       | -                 | 18.90      | 18.572
DehazeFormer-B [8]           | 37.96/0.994         | 34.96/0.984          | -                 | 39.82/0.973       | 2.514      | 17.459
FRNet-Tiny (Ours)            | 38.01/0.992         | 34.71/0.983          | 32.11/0.985       | 39.05/0.969       | 1.102      | 2.710
FRNet-Small (Ours)           | 38.76/0.992         | 34.92/0.985          | 32.37/0.986       | 39.27/0.971       | 2.174      | 3.654
FRNet-Big (Ours)             | 39.17/0.993         | 35.67/0.987          | 33.19/0.990       | 39.91/0.983       | 4.079      | 5.461
To facilitate a more lucid comparison of the computational overheads of the models, some earlier approaches have been omitted from consideration.
Table 3. Ablation study on FRNet with different architectures.

Model               | PSNR  | SSIM  | #Param
Base                | 38.01 | 0.992 | 1.102 M
Base + Large Window | 39.75 | 0.993 | 1.897 M
Base + MLP          | 36.21 | 0.986 | 1.289 M
