Article

Nighttime Image Dehazing Based on Multi-Scale Gated Fusion Network

Bo Zhao, Han Wu, Zhiyang Ma, Huini Fu, Wenqi Ren and Guizhong Liu
1 School of Information and Communication Engineering, Xi’an Jiaotong University, Xi’an 710049, China
2 China North Vehicle Research Institute, Beijing 100072, China
3 School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen Campus, Shenzhen 528406, China
* Author to whom correspondence should be addressed.
Electronics 2022, 11(22), 3723; https://doi.org/10.3390/electronics11223723
Submission received: 22 October 2022 / Revised: 2 November 2022 / Accepted: 7 November 2022 / Published: 14 November 2022
(This article belongs to the Special Issue Advances in Image Enhancement)

Abstract

In this paper, we propose an efficient algorithm to directly restore a clear image from a hazy input, which can be adapted for nighttime image dehazing. The proposed algorithm hinges on a trainable neural network realized in an encoder–decoder architecture. The encoder is exploited to capture the context of the derived input images, while the decoder is employed to estimate the contribution of each input to the final dehazed result using the learned representations attributed to the encoder. The constructed network adopts a novel fusion-based strategy which derives three inputs from an original input by applying white balance (WB), contrast enhancement (CE), and gamma correction (GC). We compute pixel-wise confidence maps based on the appearance differences between these inputs to blend their information and preserve the regions with pleasant visibility. The final clear image is generated by gating the important features of the derived inputs. To train the network, we introduce a multi-scale approach to avoid halo artifacts. Extensive experimental results on both synthetic and real-world images demonstrate that the proposed algorithm performs favorably against state-of-the-art dehazing methods on nighttime images.

1. Introduction

Single-image dehazing [1,2] aims to estimate the unknown clean scene given a hazy or foggy image. This is a classical image processing problem that has received active research attention in the computer vision community [3]. Early dehazing methods focus on exploiting hand-crafted features based on the statistics of clean images, such as the dark channel prior [1] and local max contrast [4]. To avoid hand-crafted priors, recent work [5,6,7] automatically learns haze-relevant features using convolutional neural networks (CNNs). In the dehazing literature, under the assumption of spatially invariant atmospheric light, the haze formation process is usually modeled as [1],
I(x) = J(x)t(x) + A(1 − t(x)), (1)
where J(x) and I(x) denote the haze-free scene radiance and the observed hazy image, respectively, A is the global atmospheric light, and t(x) is the scene transmission describing the portion of light that is not scattered and reaches the camera sensor. To recover the clear scene from a hazy input, most dehazing methods estimate the transmission t(x) and the atmospheric light A from the given hazy image.
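For illustration, the sketch below shows how a conventional method would invert (1) once t(x) and A have been estimated. It is a minimal NumPy sketch; the lower bound t0 on the transmission is an assumption added for numerical stability rather than part of the model itself.

```python
import numpy as np

def recover_scene(I, t, A, t0=0.1):
    """Invert the haze model I = J*t + A*(1 - t) for the scene radiance J.

    I:  hazy image, float array in [0, 1] of shape (H, W, 3)
    t:  estimated transmission map of shape (H, W)
    A:  estimated global atmospheric light, shape (3,)
    t0: lower bound on t to avoid amplifying noise in dense haze (assumed)
    """
    t = np.clip(t, t0, 1.0)[..., None]   # broadcast over the color channels
    J = (I - A) / t + A                  # J = (I - A) / t + A
    return np.clip(J, 0.0, 1.0)
```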
Estimating transmission from hazy images is a severely ill-posed problem. Some approaches try to use visual cues to capture statistical properties of hazy images [8,9]. However, these transmission approximations are inaccurate, especially for the scenes where the colors of objects are inherently similar to those of atmospheric lights. Note that such an erroneous transmission estimation directly affects the quality of the dehazed image, resulting in undesired haze artifacts. Instead of using hand-crafted features, CNN-based approaches [5,7] are proposed to estimate the transmissions. However, these methods still follow the conventional dehazing methods in estimating atmospheric lights to recover clean images. Thus, if the transmission maps are not estimated well, they will interfere with the following airlight estimation and thereby lead to low-quality dehazed results.
In addition, even state-of-the-art deep-learning-based methods need to compute the atmospheric light [5,7,10] or reformulated variables that depend on the atmospheric light [6,11]. These approaches suffer from important limitations on nighttime hazy scenes, mainly because the multiple light sources cause a strongly non-uniform illumination of the scene. Moreover, we note that only a few works address nighttime dehazing.
To address the above issues, we propose a novel trainable neural network that does not explicitly estimate the transmission and atmospheric light. Thus, the artifacts arising from transmission and airlight estimation errors can be alleviated in the final restored results. The proposed neural network is built on a fusion strategy which aims to seamlessly blend several input images by preserving only the specific features of the composite output image.
We derive several inputs based on two major factors in nighttime hazy images that need to be dealt with. The first is the color cast introduced by the environmental light. The second is the lack of visibility due to attenuation. We therefore tackle these two problems by deriving three inputs from the original degraded image, with the aim of recovering the visibility of the scene in at least one of them. The first input ensures a natural rendition (second column of Figure 1) of the output by eliminating chromatic casts caused by the atmospheric or environmental light. The second, contrast-enhanced input yields a better holistic appearance, but mainly in the thick hazy regions; contrast-enhanced images are too dark in the lightly hazy regions. To recover the lightly hazy regions, we find that gamma-corrected images restore their information well. Consequently, the three derived inputs are gated by three confidence maps (fifth, sixth, and seventh columns of Figure 1), which aim to preserve the regions with good visibility. In addition, for nighttime scenes we propose to use the normalization (NM) of the hazy image, in place of gamma correction, to provide detailed scene information.
This paper is an extension of our preliminary version [12], which concentrates on daytime dehazing. In this paper, we first improve the network architecture (Section 3.2) and then adapt our network to work effectively on nighttime hazy scenes (Section 4). The contributions of this paper are summarized as follows:
  • We propose a deep trainable neural network that restores clear images without assuming restrictions on scene transmission and atmospheric light.
  • We demonstrate the effectiveness of a gated fusion network for single nighttime image dehazing by leveraging the derived inputs from an original input.
  • We train the proposed model with a multi-scale approach to eliminate the halo artifacts that hurt image recovery.
  • We show that the proposed algorithm can effectively process nighttime hazy images, which are not well handled by most dehazing methods, and that it performs favorably against state-of-the-art approaches.

2. Related Work

2.1. Day-Time Image Dehazing

Tang et al. [13] combined four types of haze-relevant features with Random Forest to estimate the transmission. Zhu et al. [14] introduced a linear model and learned the parameters of the model in a supervised manner under a color attenuation prior. However, these methods are still developed based on hand-crafted features.
Recently, CNNs have also been used for haze removal and related problems [15,16,17,18]. Cai et al. [5] proposed DehazeNet with a BReLU layer to estimate the transmissions from hazy inputs. In [7], a coarse-scale network was first used to learn the mapping between hazy inputs and their transmissions, and a fine-scale network was then exploited to refine the transmission. Zhang and Patel [10] proposed a densely connected encoder–decoder structure for jointly estimating the transmission map and atmospheric light. Yang and Sun [11] built a deep architecture incorporating prior learning for single image dehazing. In the recent level-aware progressive network (LAP-Net) model, an image is restored by fusing results at various haze levels at different stages. However, one problem of these CNN-based methods [5,7] is that all of these models require accurate transmission and atmospheric light estimation steps to restore clear images. Although the AOD-Net [6] method bypasses these estimation steps, it still needs to compute an additional variable K(x) that integrates both the transmission t(x) and the atmospheric light A. Thus, AOD-Net remains a physics-based model in the form of (1) and encounters the same ill-posedness issues. To alleviate these problems, several end-to-end networks [19,20,21,22] have recently been proposed to directly filter the input image.
Different from these CNN-based approaches, our proposed network is built on the principle of image fusion, and it is trained to produce the sharp image directly without estimating transmission and atmospheric light. The main idea of image fusion is to combine several images into a single one, retaining only the most significant features. This idea has been used in a number of applications such as image editing [23] and video super-resolution [24].

2.2. Nighttime Dehazing

Different from common image dehazing, nighttime hazy images often include visible man-made light sources with varying colors and non-uniform illumination [25]. These light sources may introduce noticeable amounts of glow that are not present in hazy images taken in the daytime, which makes the estimation of atmospheric light inaccurate and causes some priors derived from sharp images to become invalid. Nevertheless, in recent years, the community has paid relatively little research attention to the nighttime haze removal problem.
Pei and Lee [26] estimate the ambient illumination and the haze thickness by transferring the hazy input into a grayish one; they then recover the dehazed result using the DCP refined by a bilateral filter together with local contrast correction. Zhang et al. [27] build a new imaging model for nighttime conditions and remove the haze by using the DCP along with estimating the point-wise environmental light. Based on the proposed physics model, they estimate the ambient illumination and transmission by combining a maximum reflectance prior (MRP) [28]. However, MRP shares the common limitations of most statistical prior-based methods: when the scene objects inherently have a single dominant color, the maximum reflectance prior becomes invalid in nighttime scenes. In [29], Li et al. also introduce a nighttime haze model that is a linear combination of the direct transmission, airlight, and glow. Using this physics model, the authors first reduce the effect of the glow and then recover the final dehazed result. Nevertheless, this approach tends to generate halo artifacts in the dehazed results. Ancuti et al. [30] assume that the brightest pixels of local patches filtered by a minimal operator can capture the properties of the atmospheric light, and they use a multi-scale fusion approach to obtain a visibility-enhanced image.
Similar to [25,30], we also propose a multi-scale fusion network for nighttime dehazing. In contrast, without any tedious estimation of contrast, saturation, saliency, and airlight, we directly predict the weight maps for each derived input with the trainable network.

3. Multi-Scale Gated Fusion Network Architecture

This section presents the details of our multi-scale gated fusion network that employs an original degraded image and three derived images as inputs. We refer to this network as multi-scale GFN, or MSGFN, as shown in Figure 2. The central idea is to learn the confidence maps to combine several input images into a single one by keeping only the most significant features of them.

3.1. Derived Inputs

We derive several inputs based on the following observations. The first is that the colors in hazy images often change due to the influence of the atmospheric light. The second is the lack of visibility in distant regions due to scattering and attenuation. Based on these observations, we generate three inputs that recover the color and visibility of the entire image from the original hazy image. We first estimate the white-balanced (WB) image I_wb of the hazy input I to recover the latent color of the scene. Then, we extract visible information, including the contrast-enhanced (CE) input I_ce and the gamma-corrected (GC) input I_gc, to improve the holistic quality.
White balanced input. Our first input is a white-balanced image which aims to eliminate chromatic casts caused by the atmospheric color. Over the past decades, a number of white balancing approaches [31,32] have been proposed. In this paper, we use a technique based on the gray world assumption [33]. Despite its simplicity, this low-level approach has been shown to generate results comparable to those of more complex white balance methods [3]. The gray world assumption states that, given an image with a sufficient quantity of color variations, the average values of the Red, Green, and Blue components of the image should average out to a common gray value. This assumption is in general valid in any real-world scene since the variations in colors are random and independent, so given a large number of samples, the average tends to converge to the mean value, which is gray. White balancing algorithms can exploit this assumption by forcing images to have a uniform average gray value for the R, G, and B channels. For example, if an image is shot under a hazy weather condition, the captured image will have the atmospheric light A cast over the entire image. This atmospheric light cast disturbs the gray world assumption of the original image. By imposing the assumption on the captured image, we can remove the atmospheric light cast and re-acquire the colors of the original scene. Figure 3b demonstrates such an effect.
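As a concrete reference, a compact gray-world white balance can be written as follows. This is a minimal sketch under the assumption that per-channel gains are computed from the global channel means; it is not necessarily the exact implementation used in our experiments.

```python
import numpy as np

def gray_world_white_balance(I, eps=1e-6):
    """Gray-world white balance: scale each channel so its mean matches
    the mean gray value of the whole image.

    I: hazy input, float array in [0, 1] of shape (H, W, 3)
    """
    channel_means = I.reshape(-1, 3).mean(axis=0)   # per-channel averages
    gray = channel_means.mean()                     # common gray value
    gains = gray / (channel_means + eps)            # per-channel gains
    return np.clip(I * gains, 0.0, 1.0)
```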
Although white balancing could discard the color shifting caused by the atmospheric light, the results still present low contrast. To enhance the contrast, we introduce the following two derived inputs.
Contrast-enhanced input. Similar to prior dehazing methods [34,35], our second input is a contrast-enhanced image of the original hazy input. Ancuti and Ancuti [34] derived a contrast-enhanced image by subtracting the average luminance value Ī of the entire image I from the hazy input and then using a factor μ to linearly increase the luminance in the recovered hazy regions as follows:
I_ce = μ(I − Ī), (2)
where μ = 2(0.5 + Ī). Although Ī is a good indicator of image brightness, this input is problematic, especially in denser haze regions. The main reason is that the negative values of (I − Ī) may dominate the contrast-enhanced input as Ī increases. As shown in Figure 3c, the dark image regions tend to become black after contrast enhancement.
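A minimal sketch of (2) is given below. Computing the average luminance Ī as the global mean over all pixels and channels, and clipping the result to [0, 1], are our assumptions.

```python
import numpy as np

def contrast_enhanced(I):
    """Contrast-enhanced input I_ce = mu * (I - I_mean), mu = 2 * (0.5 + I_mean), as in (2).

    I: hazy input, float array in [0, 1] of shape (H, W, 3)
    """
    I_mean = I.mean()                      # average luminance of the entire image (assumed scalar)
    mu = 2.0 * (0.5 + I_mean)
    return np.clip(mu * (I - I_mean), 0.0, 1.0)
```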

3.2. Network Architecture

Using only one scale is prone to halo artifacts in the dehazed results, particularly at strong transitions within the confidence maps [34,35]. Hence, we perform the estimation by varying the image resolution in a coarse-to-fine manner to prevent halo artifacts. The multi-scale approach is motivated by the fact that the human visual system is sensitive to local changes (e.g., edges) over a wide range of scales. As a merit, the multi-scale approach provides a convenient way to incorporate local image details over varying resolutions.
The proposed multi-scale GFN is shown in Figure 2. Finer-level networks have basically the same structure as the coarsest network. However, the first convolutional layer takes the dehazed output from the previous stage as well as its own hazy image and derived inputs in concatenated form. Each input size is twice that of its coarser-scale network. As shown in Figure 2, an up-sampling layer resizes the coarser output before the next stage. At the finest scale, the original full-resolution image is recovered.
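The coarse-to-fine pass can be sketched as follows. Here, single_scale_net is a hypothetical callable standing in for the per-scale network (in the full model it predicts confidence maps and applies the gating defined later in this section); the bilinear interpolation mode and the reuse of the hazy image as a placeholder previous estimate at the coarsest scale are our assumptions.

```python
import torch
import torch.nn.functional as F

def multiscale_forward(single_scale_net, hazy, derived, num_scales=3):
    """Coarse-to-fine pass over the input pyramid (a sketch, not the authors' code).

    single_scale_net: network shared across scales; assumed to map the concatenated
                      inputs plus the previous estimate to a dehazed image
    hazy:    full-resolution hazy image, tensor of shape (B, 3, H, W)
    derived: list of derived inputs [I_wb, I_ce, I_gc or I_nm], each (B, 3, H, W)
    """
    outputs, prev = [], None
    for k in reversed(range(num_scales)):               # k = 2 (coarsest) ... 0 (finest)
        size = (hazy.shape[-2] // 2 ** k, hazy.shape[-1] // 2 ** k)
        inp = [F.interpolate(x, size=size, mode='bilinear', align_corners=False)
               for x in [hazy] + derived]
        if prev is None:
            prev_k = inp[0]                             # coarsest scale has no earlier estimate;
                                                        # reuse the hazy image (assumption)
        else:
            prev_k = F.interpolate(prev, size=size, mode='bilinear', align_corners=False)
        prev = single_scale_net(torch.cat(inp + [prev_k], dim=1))
        outputs.append(prev)                            # coarse-to-fine pyramid of dehazed images
    return outputs
```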
We use an encoder–decoder network at each scale, which has been shown to produce good results for a number of generative tasks. In particular, we choose a variation of the residual encoder–decoder block for image dehazing. We use skip connections between the encoder and decoder halves of the network, where features from the encoder side are concatenated and fed to the decoder. This significantly accelerates the convergence and helps generate a much clearer dehazed image. In addition, we improve the encoder–decoder modules by using residual blocks [36] after each convolution layer. We use shared weights across scales, which operates in a way similar to using data multiple times [37] (i.e., data augmentation with respect to scales) and reduces the number of parameters that need to be learned.
We perform an early fusion by concatenating the original hazy image and the three derived inputs in the input layer. Rectification layers are added after each convolutional or deconvolutional layer. The convolutional layers act as a feature extractor that preserves the primary information of the scene colors in the input layer while eliminating the unimportant colors from the inputs. The deconvolutional layers are then combined to recover the weight maps of the three derived inputs. In other words, the outputs of the deconvolutional layers are the confidence maps of the derived input images I_wb, I_ce, and I_gc.
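A compact PyTorch sketch of one scale of this encoder–decoder is given below. The layer settings (a 5 × 5 first layer, three stride-2 down-convolutions with ResBlocks, and mirrored deconvolutions with skip connections) follow this section, while the channel widths, the single-channel confidence maps, and the requirement that the input size be a multiple of 8 are our assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 3x3 convolutions with an identity shortcut."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))

class SingleScaleGFN(nn.Module):
    """One scale of the gated fusion network (a sketch, not the authors' code).

    Input: the hazy image concatenated with its three derived inputs (12 channels).
    Output: one confidence map per derived input (assumed single-channel each).
    """
    def __init__(self, in_ch=12, base=32):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(in_ch, base, 5, padding=2),
                                  nn.ReLU(inplace=True))
        # Encoder: three stride-2 down-convolutions, each followed by a ResBlock.
        self.down = nn.ModuleList()
        ch = base
        for _ in range(3):
            self.down.append(nn.Sequential(
                nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
                ResBlock(ch * 2)))
            ch *= 2
        # Decoder: three deconvolutions; encoder features are concatenated back in.
        self.up, self.fuse = nn.ModuleList(), nn.ModuleList()
        for _ in range(3):
            self.up.append(nn.Sequential(
                nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            self.fuse.append(nn.Sequential(
                nn.Conv2d(ch, ch // 2, 3, padding=1),  # ch = (ch//2 up) + (ch//2 skip)
                nn.ReLU(inplace=True),
                ResBlock(ch // 2)))
            ch //= 2
        self.tail = nn.Conv2d(base, 3, 3, padding=1)    # one map per derived input

    def forward(self, x):
        feats = [self.head(x)]
        for blk in self.down:
            feats.append(blk(feats[-1]))
        y = feats[-1]
        for i in range(3):
            y = self.up[i](y)
            y = torch.cat([y, feats[-2 - i]], dim=1)    # skip connection from the encoder
            y = self.fuse[i](y)
        return torch.sigmoid(self.tail(y))              # confidence maps in [0, 1]
```

Given the concatenated inputs, the predicted maps would then gate the derived inputs as in the gating function defined below.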
We use three down-convolutional blocks and three deconvolutional blocks in each scale. The stride of each down-convolution layer is two, which down-samples the feature maps to half size and doubles the number of channels of the previous layer. Each of the following ResBlocks contains two convolution layers. Every convolutional layer uses a 3 × 3 kernel except the first layer, which operates on the input image with a 5 × 5 kernel. In this work, we demonstrate that explicitly modeling confidence maps has several advantages, which are discussed in Section 7.1. Once the confidence maps for the derived inputs are predicted, we fuse the different inputs using the proposed gating method, as illustrated in Figure 2:
J^k = Gating(I_wb^k, I_ce^k, I_gc^k), (3)
where J^k is the gated result at scale k. The gating function is defined by
Gating(x, y, z) = C_x ∘ x + C_y ∘ y + C_z ∘ z, (4)
where ∘ denotes element-wise multiplication, and C_(·) is the confidence map for the corresponding input.
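In code, the gating of (4) amounts to a confidence-weighted sum of the derived inputs. The sketch below assumes single-channel confidence maps that broadcast across the RGB channels; this broadcasting is an assumption rather than a fixed design choice of the method.

```python
import torch

def gated_fusion(inputs, conf_maps):
    """Eq. (4): blend the derived inputs with element-wise confidence weighting.

    inputs:    list of derived inputs [I_wb, I_ce, I_gc], each of shape (B, 3, H, W)
    conf_maps: list of confidence maps [C_wb, C_ce, C_gc], each of shape (B, 1, H, W)
    """
    fused = sum(c * x for c, x in zip(conf_maps, inputs))  # element-wise multiply and sum
    return fused.clamp(0.0, 1.0)
```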
The multi-scale approach requires that the output at each scale be a clear image at the corresponding resolution. Thus, we train our network so that all intermediate dehazed images form a pyramid of the sharp image. The MSE criterion is applied at every level of the pyramid. In particular, given a collection of N training pairs I_i and J_i, where I_i is a hazy image and J_i is the clean version serving as the ground truth, the loss function at the k-th scale is defined as follows:
L(Θ, k) = (1/N) ∑_{i=1}^{N} ‖ F(I_{i,k}, Θ, k) − J_{i,k} ‖², k ∈ {1, 2, 3}, (5)
where Θ denotes the weights of the convolutional and deconvolutional kernels, F is the network output, and I_{i,k} and J_{i,k} are the hazy input and its ground truth at scale k.
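A sketch of this multi-scale loss is shown below; building the ground-truth pyramid by bilinear down-sampling and weighting all scales equally are our assumptions.

```python
import torch
import torch.nn.functional as F

def multiscale_loss(predictions, target):
    """MSE summed over the output pyramid, as in (5); equal scale weights assumed.

    predictions: list of dehazed outputs from coarse to fine, each (B, 3, h_k, w_k)
    target:      full-resolution ground-truth image, (B, 3, H, W)
    """
    loss = 0.0
    for pred in predictions:
        gt_k = F.interpolate(target, size=pred.shape[-2:], mode='bilinear',
                             align_corners=False)       # ground-truth pyramid level
        loss = loss + F.mse_loss(pred, gt_k)
    return loss
```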

4. Nighttime Image Dehazing

Since nighttime scenes usually have artificial light sources that generate a glow effect in hazy images, most state-of-the-art dehazing methods based on (1) suffer from significant limitations on nighttime hazy scenes. Although several physics-based models [28,29] are developed to relax those strict constraints in (1) (e.g., homogeneous atmosphere illumination, unique extinction coefficient), a straightforward extension of common hazy image modeling to nighttime scenes cannot always hold in real cases. This is why our approach does not resort to an explicit inversion of the nighttime light propagation model in [28,29].

Fusion Process of Nighttime Dehazing

In this paper, we demonstrate that the proposed MSGFN can also effectively enhance nighttime hazy images. We employ the strategy described in Figure 2 to remove haze in nighttime images. For the derived inputs, we again use WB and CE to perform color correction and visibility enhancement, respectively. However, there is another problem in nighttime hazy images that needs to be dealt with, i.e., non-uniform illumination caused by multiple light sources in the low-light environment. Therefore, we derive a third input, the normalization (NM) of the nighttime hazy image, to obtain an illumination-balanced result and enhance the finest details in the nighttime scene.
The NM operation linearly stretches all pixel values to fit the interval [0, 1]. In this way, we achieve better illumination by contrast-stretching the range of intensities of the hazy input. The main advantage of this operation is that it requires no parameters to be tuned and therefore introduces no information loss in the derived input. As shown in Figure 3d, the NM operation shifts and scales all the color pixel intensities of the input so that the pixel values cover the entire available dynamic range and yield a balanced illumination.
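The NM input can be sketched as a simple min-max stretch; whether the stretch is applied globally or per channel is an assumption here (the sketch uses a global stretch).

```python
import numpy as np

def normalize_input(I, eps=1e-6):
    """NM input: linearly stretch intensities to cover the full [0, 1] range.

    I: nighttime hazy input, float array of shape (H, W, 3)
    """
    return (I - I.min()) / (I.max() - I.min() + eps)
```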
Similar to the dehazing approach described in Section 3, we use the proposed MSGFN to predict three confidence maps for the derived inputs to ensure that regions of high contrast or high saliency will receive greater weights in the gated fusion process:
J^k = Gating(I_wb^k, I_ce^k, I_nm^k), (6)
where I_nm^k is the normalized version of the nighttime hazy input at scale k.

5. Nighttime Dehazing Results

We evaluate the proposed algorithm with the nighttime configuration on synthetic and real-world night hazy scenes, comparing it with state-of-the-art methods both quantitatively and in terms of visual effect.

5.1. Training Data

Owing to the difficulty of obtaining realistic nighttime training data, we adopt a strategy similar to that of daytime methods [38] to synthesize nighttime hazy scenes. Specifically, we select 4500 clear nighttime scenes from the KAIST dataset [39] and use the method proposed in [40], which has been demonstrated to be effective for nighttime scenes, to estimate depth maps. We then synthesize 4500 nighttime hazy images according to (1). Note that although some nighttime hazy imaging models [28,29] have been proposed to account for artificial light sources, our synthesized nighttime hazy images based on (1) look natural, as shown in Figure 4, since the models proposed in [28,29] reduce to (1) when the illumination is assumed to be constant.
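A sketch of this synthesis step is shown below. It assumes the transmission is obtained from the estimated depth through the standard relation t(x) = exp(−β d(x)); the scattering coefficient β and atmospheric light A used here are illustrative values, not the settings used to build our training set.

```python
import numpy as np

def synthesize_hazy(J, depth, beta=1.0, A=(0.7, 0.7, 0.7)):
    """Render a hazy image from a clear night scene and its estimated depth via (1).

    J:     clear image, float array in [0, 1] of shape (H, W, 3)
    depth: relative depth map of shape (H, W), e.g., from a single-view depth estimator
    beta:  scattering coefficient controlling haze concentration (illustrative value)
    A:     atmospheric light (illustrative value)
    """
    d = depth / (depth.max() + 1e-6)          # normalize the relative depth
    t = np.exp(-beta * d)[..., None]          # transmission t(x) = exp(-beta * d(x))
    A = np.asarray(A, dtype=J.dtype)
    return np.clip(J * t + A * (1.0 - t), 0.0, 1.0)
```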

5.2. Quantitative Evaluation

For quantitative performance evaluation, we construct a new dataset of synthesized nighttime hazy images. We select 100 clear nighttime scenes (different from those used for training) from the KAIST dataset [39] and synthesize 500 hazy images using different scattering coefficients to obtain different haze concentrations. Figure 5 shows some dehazed images produced by the evaluated methods. The nighttime dehazing methods MRP [28] and GMLC [29] generate results with significant color distortions. The dehazed images by the deep learning approaches MSCNN [7], GCAN [19], and GDN [41] still contain significant haze residuals. In contrast, our algorithm restores these images well. Overall, the dehazed results of the proposed algorithm have higher visual quality and fewer color distortions. The visual results in Figure 5 match the quantitative results shown in Table 1.
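For reference, the PSNR/SSIM scores of a dehazed/ground-truth pair can be computed with scikit-image as sketched below (a recent version that accepts channel_axis is assumed); this is a generic evaluation sketch, not our exact evaluation script.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(dehazed, ground_truth):
    """Compute PSNR and SSIM for one dehazed/ground-truth pair of float images in [0, 1]."""
    psnr = peak_signal_noise_ratio(ground_truth, dehazed, data_range=1.0)
    ssim = structural_similarity(ground_truth, dehazed, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```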

5.3. Qualitative Evaluation

To demonstrate that the proposed method generalizes well in real-world nighttime hazy scenes, we use real-world hazy images for experiments against the state-of-the-art dehazing algorithms designed for nighttime scenes, i.e., Maximum Reflectance Prior (MRP) [28] as well as Glow and Multiple Light Colors (GMLC) [29], and daytime scenarios, i.e., DCP [42], MSCNN [7], GCAN [19], and GDN [41].
Figure 6b,c show the results of the recent nighttime dehazing methods MRP [28] and GMLC [29]. The MRP method [28] tends to darken the hazy inputs in some regions; for example, the road regions in the first image are much darker than those obtained by other methods. In addition, the GMLC model [29] generates some artifacts in sky regions, e.g., in the first and third images of Figure 6c. Figure 6d–g demonstrate the limitations of the daytime dehazing approaches DCP [42], MSCNN [7], GCAN [19], and GDN [41] when applied to nighttime hazy inputs. Both the prior-based [42] and CNN-based [7,19,41] methods cannot recover colors well and only slightly reduce the haze in these night scenes. In contrast, our algorithm generates dehazed results with clearer and sharper details and without artifacts in the sky regions, as shown in Figure 6h.

6. Further Experiments

6.1. Comparison on O-Haze

We evaluate the proposed algorithm on all 45 hazy images from the O-HAZE dataset [43] against the state-of-the-art methods. In addition, we retrain the proposed MSGFN using the same 40 training images as in the NTIRE 2018 challenge [2] and compare it with the winning methods of [2] on the five test images. As shown in Table 2, our method performs favorably against the winning methods of the NTIRE 2018 challenge [2] and achieves the highest SSIM score.

6.2. Mixed Training Strategy

To demonstrate the robustness of the proposed MSGFN under different training strategies, we train an additional network on all three datasets (daytime, nighttime, and underwater) together. We refer to this network as “all-in-one” and to the networks trained separately on each dataset as “separate”.
As shown in Table 3, the proposed model performs better on the daytime (SOTS and O-Haze) and nighttime datasets with the “separate” training strategy. Meanwhile, the performance on the underwater dataset becomes better with the “all-in-one” training strategy. Since the underwater inputs in the UIEB dataset are real-world images, the main reason may be that more types of training data benefit real-world image reconstruction.

7. Analysis and Discussions

7.1. Effectiveness of Fusion Strategy

Image fusion is a method to blend several images into a single one by retaining only the most useful features. To effectively blend the information of the derived inputs, we filter their important information by computing the corresponding confidence maps. Consequently, in our gated fusion network, the derived inputs are gated by three pixel-wise confidence maps that aim to preserve the regions with good visibility. Our fusion network has two advantages: first, it can reduce patch-based artifacts (e.g., those of the dark channel prior [1]) through single-pixel operations; second, it eliminates the influence of transmission and atmospheric light estimation errors.
To show the effectiveness of the fusion network, we also train an end-to-end network without a fusion process for the dehazing task. This network has the same architecture as MSGFN, except that the input is the hazy image and the output is the dehazed result, without confidence-map learning at each scale. In addition, we conduct an experiment with an equal-weight fusion strategy, i.e., all three derived inputs are weighted equally by 1/3. Figure 7 shows visual comparisons on two real-world examples under these settings. In these examples, the approach without gating generates dark images (Figure 7b), and the method with equal-weight fusion produces results with color distortion and dark regions (Figure 7c). In contrast, our results retain most scene details and maintain the original colors, which demonstrates the effectiveness of the learned confidence maps.

7.2. Effectiveness of Derived Inputs

We can design different inputs for different enhancement tasks. In practice, it is difficult to entirely remove the haze effects of hazy images with a single enhancement operation. Therefore, the input generation process seeks to recover sharp regions in at least one of the derived inputs, as analyzed in Section 3. The derived inputs complement each other nicely and help the gated fusion network dehaze, as shown in Table 4.
Although we do not claim that these are the optimal inputs, our experiments show that the three derived inputs form a minimal set: using two or fewer of them does not generate better results in the proposed network (Table 4) for nighttime image dehazing. In future work, we will explore more effective derived inputs or directly learn the derived inputs within the fusion network. A comparison of network parameters is given in Table 5.

8. Conclusions

In this paper, we addressed nighttime image dehazing via a multi-scale gated fusion network (MSGFN), a fusion-based encoder–decoder architecture that learns confidence maps for derived inputs. Compared with previous methods that impose restrictions on transmission and atmospheric light, the proposed MSGFN is easy to implement and reproduce since it does not rely on estimates of the transmission or the atmospheric/environmental light. In this approach, we first apply white balancing to recover the scene colors and then generate two contrast-enhanced inputs for better visibility. The MSGFN then estimates a confidence map for each derived input, and the confidence maps and derived inputs are finally fused to render the result. Experimental results on synthetic and real-world nighttime images demonstrate the effectiveness of the proposed approach.

Author Contributions

Supervision, W.R. and G.L.; Validation, Z.M.; Visualization, H.F.; Writing—original draft, B.Z.; Writing—review & editing, H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009. [Google Scholar]
  2. Ancuti, C.; Ancuti, C.O.; Timofte, R. Ntire 2018 challenge on image dehazing: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 891–901. [Google Scholar]
  3. Li, Y.; You, S.; Brown, M.S.; Tan, R.T. Haze visibility enhancement: A survey and quantitative benchmarking. Comput. Vis. Image Underst. 2017, 165, 1–16. [Google Scholar] [CrossRef] [Green Version]
  4. Tan, R.T. Visibility in bad weather from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, Alaska, USA, 24–26 June 2008. [Google Scholar]
  5. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. Dehazenet: An end-to-end system for single image haze removal. IEEE Trans. Image Process. 2016, 25, 5187–5198. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. Aod-net: All-in-one dehazing network. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017. [Google Scholar]
  7. Ren, W.; Liu, S.; Zhang, H.; Pan, J.; Cao, X.; Yang, M.H. Single image dehazing via multi-scale convolutional neural networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016. [Google Scholar]
  8. Berman, D.; Avidan, S. Non-local image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  9. Fattal, R. Single image dehazing. In Proceedings of the SIGGRAPH, Los Angeles, CA, USA, 11–15 August 2008. [Google Scholar]
  10. Zhang, H.; Patel, V.M. Densely connected pyramid dehazing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3194–3203. [Google Scholar]
  11. Yang, D.; Sun, J. Proximal dehaze-net: A prior learning-based deep network for single image dehazing. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
  12. Ren, W.; Ma, L.; Zhang, J.; Pan, J.; Cao, X.; Liu, W.; Yang, M.H. Gated fusion network for single image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3253–3261. [Google Scholar]
  13. Tang, K.; Yang, J.; Wang, J. Investigating Haze-Relevant Features in a Learning Framework for Image Dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014. [Google Scholar]
  14. Zhu, Q.; Mai, J.; Shao, L. A Fast Single Image Haze Removal Algorithm Using Color Attenuation Prior. IEEE Trans. Image Process. 2015, 24, 3522–3533. [Google Scholar] [PubMed] [Green Version]
  15. Zhang, H.; Patel, V.M. Density-aware Single Image De-raining using a Multi-stream Dense Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  16. Li, R.; Pan, J.; Li, Z.; Tang, J. Single image dehazing via conditional generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  17. Chen, W.T.; Ding, J.J.; Kuo, S.Y. PMS-Net: Robust Haze Removal Based on Patch Map for Single Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11681–11689. [Google Scholar]
  18. Qu, Y.; Chen, Y.; Huang, J.; Xie, Y. Enhanced pix2pix dehazing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  19. Chen, D.; He, M.; Fan, Q.; Liao, J.; Zhang, L.; Hou, D.; Yuan, L.; Hua, G. Gated context aggregation network for image dehazing and deraining. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Waikoloa Village, HI, USA, 7–11 January 2019; pp. 1375–1383. [Google Scholar]
  20. Hong, M.; Xie, Y.; Li, C.; Qu, Y. Distilling Image Dehazing With Heterogeneous Task Imitation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3462–3471. [Google Scholar]
  21. Zhang, H.; Sindagi, V.; Patel, V.M. Multi-scale single image dehazing using perceptual pyramid deep network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 902–911. [Google Scholar]
  22. Sim, H.; Ki, S.; Choi, J.S.; Seo, S.; Kim, S.; Kim, M. High-resolution image dehazing with respect to training losses and receptive field sizes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 912–919. [Google Scholar]
  23. Pérez, P.; Gangnet, M.; Blake, A. Poisson image editing. ACM Trans. Graph. 2003, 22, 313–318. [Google Scholar] [CrossRef]
  24. Liu, D.; Wang, Z.; Fan, Y.; Liu, X.; Wang, Z.; Chang, S.; Huang, T. Robust video super-resolution with learned temporal dynamics. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017. [Google Scholar]
  25. Ancuti, C.; Ancuti, C.O.; De Vleeschouwer, C.; Bovik, A.C. Day and night-time dehazing by local airlight estimation. IEEE Trans. Image Process. 2020, 29, 6264–6275. [Google Scholar] [CrossRef] [PubMed]
  26. Pei, S.C.; Lee, T.Y. Nighttime haze removal using color transfer pre-processing and dark channel prior. In Proceedings of the IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012. [Google Scholar]
  27. Zhang, J.; Cao, Y.; Wang, Z. Nighttime haze removal based on a new imaging model. In Proceedings of the ICIP, Paris, France, 27–30 October 2014. [Google Scholar]
  28. Zhang, J.; Cao, Y.; Fang, S.; Kang, Y.; Chen, C.W. Fast haze removal for nighttime image using maximum reflectance prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 June 2017. [Google Scholar]
  29. Li, Y.; Tan, R.T.; Brown, M.S. Nighttime Haze Removal with Glow and Multiple Light Colors. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015. [Google Scholar]
  30. Ancuti, C.; Ancuti, C.O.; De Vleeschouwer, C.; Bovik, A.C. Night-time dehazing by fusion. In Proceedings of the IEEE International Conference on Image Processing, Phoenix, AZ, USA, 25–28 September 2016. [Google Scholar]
  31. Ancuti, C.; Ancuti, C.O.; Haber, T.; Bekaert, P. Enhancing underwater images and videos by fusion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012. [Google Scholar]
  32. Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Sbert, M. Color channel compensation (3C): A fundamental pre-processing step for image enhancement. IEEE Trans. Image Process. 2019, 29, 2653–2665. [Google Scholar] [CrossRef] [PubMed]
  33. Reinhard, E.; Adhikhmin, M.; Gooch, B.; Shirley, P. Color transfer between images. Comput. Graph. Appl. 2001, 21, 34–41. [Google Scholar] [CrossRef]
  34. Ancuti, C.O.; Ancuti, C. Single image dehazing by multi-scale fusion. IEEE Trans. Image Process. 2013, 22, 3271–3282. [Google Scholar] [CrossRef] [PubMed]
  35. Choi, L.K.; You, J.; Bovik, A.C. Referenceless prediction of perceptual fog density and perceptual image defogging. IEEE Trans. Image Process. 2015, 24, 3888–3901. [Google Scholar] [CrossRef] [PubMed]
  36. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26–30 June 2016; pp. 770–778. [Google Scholar]
  37. Tao, X.; Gao, H.; Shen, X.; Wang, J.; Jia, J. Scale-recurrent network for deep image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8174–8182. [Google Scholar]
  38. Li, B.; Ren, W.; Fu, D.; Tao, D.; Feng, D.; Zeng, W.; Wang, Z. Benchmarking single-image dehazing and beyond. IEEE Trans. Image Process. 2018, 28, 492–505. [Google Scholar] [CrossRef]
  39. Hwang, S.; Park, J.; Kim, N.; Choi, Y.; Kweon, I.S. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
  40. Li, Z.; Snavely, N. MegaDepth: Learning Single-View Depth Prediction from Internet Photos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  41. Liu, X.; Ma, Y.; Shi, Z.; Chen, J. GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019. [Google Scholar]
  42. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 2341–2353. [Google Scholar]
  43. Ancuti, C.O.; Ancuti, C.; Timofte, R.; De Vleeschouwer, C. O-HAZE: A dehazing benchmark with real hazy and haze-free outdoor images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 754–762. [Google Scholar]
  44. Mondal, R.; Santra, S.; Chanda, B. Image dehazing by joint estimation of transmittance and airlight using bi-directional consistency loss minimized FCN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 920–928. [Google Scholar]
  45. Ki, S.; Sim, H.; Choi, J.S.; Kim, S.; Kim, M. Fully end-to-end learning based conditional boundary equilibrium gan with receptive field sizes enlarged for single ultra-high resolution image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 817–824. [Google Scholar]
  46. Engin, D.; Genç, A.; Kemal Ekenel, H. Cycle-dehaze: Enhanced cyclegan for single image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 825–833. [Google Scholar]
  47. Galdran, A.; Alvarez-Gila, A.; Bria, A.; Vazquez-Corral, J.; Bertalmío, M. On the duality between retinex and image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8212–8221. [Google Scholar]
  48. Shao, Y.; Li, L.; Ren, W.; Gao, C.; Sang, N. Domain Adaptation for Image Dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2808–2817. [Google Scholar]
Figure 1. We exploit a multi-scale gated fusion network for nighttime haze removal. The first column gives degraded inputs. The second, third, and fourth columns show derived inputs for original images. The learned confidence maps for the derived inputs are shown in the fifth, sixth, and seventh columns, respectively. The last column shows our results by the proposed algorithm.
Figure 2. The architecture of the proposed multi-scale GFN, which takes a hazy image pyramid and the corresponding three enhanced versions as the input and outputs a latent image pyramid. These three derived inputs are weighted by the three confidence maps in each scale learned by our network, and the full-resolution output is the final dehazed result. The network contains layers of symmetric encoders and decoders. Skip shortcuts are connected from the convolutional feature maps to the deconvolutional feature maps.
Figure 3. We derive three enhanced versions from nighttime hazy images. These derived inputs contain different important visual cues of the input hazy images. (a) Inputs; (b) WB; (c) CE; (d) NM.
Figure 4. The proposed method for synthesizing nighttime hazy images. The first row shows original clear night scenes from [39], and the second row shows the synthesized hazy images.
Figure 5. Dehazed results on synthetic nighttime images. The results by learning-based methods of MSCNN [7], GCAN [19], and GDN [41] have some remaining haze, while the nighttime dehazing methods of MRP [28] and GMLC [29] tend to generate some color distortions. In contrast, the dehazed results by our algorithm are close to the ground-truth images. (a) Hazy inputs; (b) DCP [42]; (c) MRP [28]; (d) GMLC [29]; (e) MSCNN [7]; (f) GCAN [19]; (g) GDN [41]; (h) Our results; (i) Ground truth.
Figure 6. Qualitative comparison of different methods on real-world images. (a) Hazy inputs; (b) MRP [28]; (c) GMLC [29]; (d) DCP [42]; (e) MSCNN [7]; (f) GCAN [19]; (g) GDN [41]; (h) Our results.
Figure 7. Effectiveness of the gated fusion network. (a) Hazy inputs; (b) w/o fusion; (c) Equivalent fusion; (d) MSGFN.
Table 1. Average PSNR/SSIM of dehazed results by state-of-the-art dehazing methods on nighttime hazy images.
| Input | DCP [42] | MSCNN [7] | MRP [28] | GMLC [29] | GCAN [19] | GDN [41] | MSGFN |
| 13.70/0.6063 | 24.94/0.902 | 17.45/0.7113 | 16.49/0.6936 | 14.49/0.552 | 19.18/0.8133 | 21.03/0.8916 | 30.92/0.9492 |
Table 2. Average PSNR/SSIM of dehazed results on the 5 test images in the O-Haze [43] dataset. Although our algorithm ranks third in terms of PSNR, our method achieves the highest SSIM score.
| Ranking in [2] | Method | PSNR | SSIM |
| 1 | BJTU | 24.598 | 0.777 |
| 2 | KAIST-VICLAB [22] | 24.232 | 0.687 |
| – | Ours (MSGFN) | 24.054 | 0.787 |
| 3 | Scarlet Knights [21] | 24.029 | 0.775 |
| 4 | FKS | 23.877 | 0.775 |
| 5 | Dq-hisfriends | 23.207 | 0.770 |
| 6 | Ranjanisi [44] | 23.180 | 0.705 |
| 7 | Mt.Phoenix | 23.124 | 0.755 |
| 8 | Ranjanisi [44] | 22.997 | 0.701 |
| 9 | KAIST-VICLAB [45] | 22.705 | 0.707 |
| 10 | Mt.Phoenix | 22.080 | 0.731 |
| 11 | IVLab | 21.750 | 0.717 |
| 12 | CLEAR | 20.291 | 0.683 |
| 13 | CLFStudio | 20.230 | 0.722 |
| 14 | SiMiT-Lab [46] | 19.628 | 0.674 |
| 15 | AHappyFaceI | 18.494 | 0.669 |
| 16 | ASELSAN | 18.123 | 0.675 |
| 17 | Dehazing-by-retinex [47] | 17.547 | 0.652 |
| 18 | IMCL | 16.527 | 0.616 |
| – | baseline (hazy images) | 15.784 | 0.634 |
Table 3. Comparison of MSGFN using different training strategies (“separate” vs. “all-in-one”).
| Dataset | Separate (PSNR/SSIM) | All-in-One (PSNR/SSIM) |
| Daytime (SOTS) | 25.37/0.93 | 23.19/0.94 |
| Daytime (O-Haze) | 21.21/0.76 | 19.05/0.74 |
| Nighttime | 30.92/0.95 | 22.69/0.86 |
| Underwater (UIEB) | 17.61/0.86 | 21.99/0.91 |
Table 4. Average PSNR/SSIM using different derived inputs. The method only using the original image means that we directly learn the mapping from degraded images to the clear ones.
| Original | WB | CE | GC/NM | PSNR/SSIM |
| ✓ | × | × | × | 22.38/0.90 |
| ✓ | × | ✓ | ✓ | 24.83/0.92 |
| ✓ | ✓ | × | ✓ | 23.54/0.92 |
| ✓ | ✓ | ✓ | × | 23.96/0.89 |
| ✓ | ✓ | ✓ | ✓ | 25.37/0.93 |
Table 5. Comparison of MSGFN and state-of-the-art dehazing approaches with respect to parameters.
| Model | Parameters |
| AOD-Net [6] | 1.83 × 10³ |
| MSCNN [7] | 8.01 × 10³ |
| DehazeNet [5] | 8.24 × 10³ |
| Domain adaptation [48] | 2.27 × 10⁵ |
| PMS-Net [17] | 2.44 × 10⁵ |
| GCAN [19] | 7.03 × 10⁵ |
| EPDN [18] | 1.74 × 10⁷ |
| DCPDN [10] | 6.69 × 10⁷ |
| CGAN [16] | 1.23 × 10⁸ |
| Ours | 5.15 × 10⁵ |