Article

Single-Image Defogging Algorithm Based on Improved Cycle-Consistent Adversarial Network

School of Measurement-Control Technology and Communications Engineering, Harbin University of Science and Technology, Harbin 150080, China
*
Author to whom correspondence should be addressed.
Electronics 2023, 12(10), 2186; https://doi.org/10.3390/electronics12102186
Submission received: 23 April 2023 / Revised: 6 May 2023 / Accepted: 9 May 2023 / Published: 11 May 2023

Abstract

With the wave of artificial intelligence and deep learning sweeping the world, many deep-learning-based algorithms have been proposed for image defogging. However, problems such as serious color distortion, contrast reduction, and incomplete fog removal remain. To solve these problems, this paper proposes an improved image defogging network based on the traditional cycle-consistent adversarial network. We add a self-attention module and an atrous convolution multi-scale feature fusion module to the traditional CycleGAN network to enhance its feature extraction capability. A perceptual loss function is introduced into the loss function of the model to enhance the texture of the generated image. Finally, by comparing several typical defogging algorithms, the superiority of the defogging model proposed in this paper is proved qualitatively and quantitatively. On the indoor synthetic data set, the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measurement (SSIM) of the network designed by us reach 23.22 and 0.8809, respectively. On the outdoor synthetic data set, the PSNR and SSIM of our designed network reach 25.72 and 0.8859, respectively. On the real data set, the PSNR and SSIM of our designed network reach 21.02 and 0.8166, respectively. These results show that the defogging network in this paper has good practicability and universality.

1. Introduction

As an important carrier for recording the real world, images contain abundant information. With the vigorous progress of computer vision technology, image processing [1,2] is widely used in urban intelligent transportation, the military [3], medical imaging research [4], video surveillance security, and other important fields. Due to rapid population growth and industrialization, the environment has become increasingly harsh, and foggy and hazy weather frequently appears in various regions. In bad weather, there is a large amount of suspended particulate matter (such as fog and haze) in the air. During outdoor shooting, the light reflected from objects is affected by this suspended particulate matter and by ambient light before it is collected by the imaging equipment [5], which seriously degrades the quality of the captured image. The reduced image quality brings a series of problems, such as tone deviation, decreased visibility, and blurred details, especially for objects deep in the scene. This greatly reduces the useful information obtained from images and seriously limits their application in various fields. Therefore, as a pre-processing step for other computer vision tasks, image defogging is of self-evident importance, and studying it is of great significance.

1.1. Fog Removal Methods

Image defogging algorithms can be divided into three categories: defogging algorithms based on image enhancement, defogging algorithms based on image restoration, and defogging algorithms based on deep learning. The following subsections introduce the research status of these three kinds of image defogging algorithms.

1.1.1. Image Enhancement Defogging Algorithm

An image enhancement defogging algorithm achieves a defogging effect by improving image contrast, saturation, and clarity. Kim et al. [6] proposed a histogram equalization method with local sub-block overlap to increase image contrast and thereby obtain defogged images. The algorithm uses a dynamic moving step size and an averaged gray value to balance processing quality and operating efficiency, and the block effect can be overcome by a block-effect elimination filter. However, the algorithm is complex and computationally expensive. Stark [7] proposed an adaptive image contrast enhancement method based on histogram equalization to improve poor local contrast. Image enhancement algorithms also include the wavelet transform [8] and homomorphic filtering [9]. By integrating spatial and temporal information, the wavelet transform enables the constructed features to express locality and multiple scales. Homomorphic filtering reduces the influence of uneven illumination intensity by combining frequency-domain filtering and gray-level transformation. However, this kind of image enhancement algorithm does not consider the imaging principle of foggy images, so the effect is not ideal.
Image enhancement methods obtain a fog-free image by enhancing its contrast and texture. Since such methods do not start from the real cause of the degradation of foggy images, scene depth is poorly considered during enhancement, which results in a less-than-ideal defogging effect.

1.1.2. Image Restoration Defogging Algorithm

The image restoration approach analyzes the formation principle of a foggy image, introduces prior knowledge obtained by observation, and estimates the unknown variables in a physical model to finally obtain the fog-free image, or sets restriction conditions for some variables in the model and then derives the corresponding key variables. In recent years, many image defogging algorithms based on the atmospheric scattering model [10,11] have appeared. Tan [12] realized image defogging by maximizing the local contrast of the image; however, the color of the restored image is over-saturated, color distortion can even appear, and edges are blurred. Fattal [13] proposed a method to estimate the light transmission of a single image in foggy scenes, which can improve image visibility and restore contrast. However, the assumption used to estimate the optical transmission rate does not always hold, so the universality is poor: the method is only applicable to mist and fails when processing gray images. Tarel et al. [14] proposed a defogging algorithm applicable to gray and color images that applies white balance processing to bring the light color gradually closer to pure white; the value of atmospheric light can then be obtained by calculating the minimum color channel and smoothing with a median filter. He et al. [15] proposed the dark channel prior (DCP) theory, which estimates the dark channel map and deduces the transmittance combined with the atmospheric scattering model; guided filtering is then used to optimize the transmittance map. The principle of this algorithm is simple, its operating efficiency is high, and it is widely applied. On this basis, many scholars proposed a variety of fog removal algorithms [16,17,18]. Nishino et al. [19] used factorial Markov random fields (MRF) to estimate natural statistical priors on reflectance and depth of field. This method can obtain finer details and higher saturation, but small halos still appear in some scenes. Meng et al. [20] added boundary constraints on the basis of the dark channel prior method, which significantly improved the processing of sky regions in the restored images; however, some details in the restored image are also lost.
In summary, image restoration defogging algorithms rely on prior knowledge and remove fog from the image under certain constraints. Their processing is often complicated because the model generally contains multiple unknown variables, and edge halos can appear in the defogged result.

1.2. Deep Learning Defogging Algorithm

The deep learning method mainly obtains clear images by fitting the relevant parameters of the atmospheric scattering model, or directly treats image defogging as an image reconstruction problem in which the model restores the foggy input to obtain a clear image, realizing end-to-end single-image defogging. Cai et al. [21] introduced a convolutional neural network into the image defogging task for the first time and proposed an end-to-end trainable defogging network, DehazeNet. DehazeNet uses multi-scale convolution operations to extract haze features and obtain the transmission estimate map of the hazy image, which greatly improves defogging performance compared with traditional methods. Ren et al. [22] used multi-scale convolutional neural networks (MSCNN) to predict the atmospheric transmittance of foggy images; by learning the mapping between the foggy image and the transmittance map, the edges of the transmittance map are refined through a global edge-guided network. AOD-Net, designed by Li et al. [23], is a lightweight network that can directly generate clear images. The innovation of this network is to integrate several intermediate variables of the atmospheric scattering model into one trainable parameter, which effectively improves the quality of the defogged image. Yang et al. [24] integrated the imaging model of foggy images and image priors into a single defogging network and obtained clear images by combining different prior knowledge. Zhang et al. [25] proposed an end-to-end pyramidal defogging network with dense connections, which jointly learns atmospheric transmittance, atmospheric light, and the defogged image. Ren et al. [26] proposed a single-image gated fusion defogging network (GFN) that performs defogging through an encoder–decoder network that learns input weights. In the encoding stage, feature coding is applied to the foggy image and several of its transformed versions; in the decoding stage, the weights of the transformed images are estimated. During image fusion, the estimated weight matrix is used to guide the final defogged image. Wang et al. [27] proposed a fog-concentration-adaptive defogging network that mainly includes a pyramid feature extractor, a feature enhancement module, and a multi-scale feature focus module. First, the pyramid feature extractor uses complementary features from different convolutional layers to help the network recover clear images. Then, the feature enhancement module fuses images with four different fog concentrations, guiding the network to adaptively perceive images under different fog concentrations. Finally, the multi-scale feature focus module helps the network generate clearer, more detailed images and simplifies network training. Feng et al. [28] proposed a residual defogging network based on U-Net; the encoder module of the network uses hybrid convolution, combining standard and dilated convolution, to expand the receptive field and extract more detailed image features. Li et al. [29] proposed an end-to-end defogging algorithm in which the shallow features of the hazy image are first extracted by a preprocessing module, convolutional neural networks then capture the local and global features of the hazy image, and finally the features obtained by the encoder and decoder are fused to obtain richer feature information.
Wang et al. [30] proposed a recurrent context aggregation network (RCAN) to effectively remove haze in images; RCAN uses a recursion mechanism to improve defogging performance without introducing additional parameters. Li et al. [31] proposed an unsupervised and untrained neural network called you only look yourself (YOLY) for image defogging, which produces defogged images from the input hazy images in a self-supervised way, thereby avoiding the routine training of deep models on synthetic data sets. In essence, it is a deep-learning-based haze transfer solution, and the idea of unsupervised defogging training is worth recognizing; however, the defogging effect still needs to be verified and improved. Sun et al. [32] designed a defogging network to address the difficulty of extracting haze features under poor visibility; by using the encoding and decoding of Swin transformer blocks, they obtain high-quality haze-free output and achieve a good defogging effect. For remote sensing images, Zhu et al. [33] proposed a remote sensing image defogging network based on dual self-attention boosted residual octave convolution (DOC). The dual self-attention module is applied to enhance the output feature maps of the encoding stage, thereby obtaining refined feature maps. The strengthen–operate–subtract (SOS) boosted module is used to fuse the refined feature maps of each network layer with the up-sampled feature maps from the corresponding decoding stage.
Although deep-learning-based image defogging algorithms have greatly improved on traditional algorithms, many problems remain unsolved, leaving considerable room for exploration and research.
Therefore, in order to solve the problems of color distortion, contrast reduction, and poor defogging in current defogging algorithms, a single-image defogging algorithm based on cycle-consistent adversarial networks (CycleGAN) is proposed, to which a self-attention (SA) module and an atrous convolution multi-scale feature fusion (ACMS) module are added. Perceptual loss is introduced into the loss function, which enhances the texture of the generated image. Then, to demonstrate the superiority of the proposed method in generalization performance and visual effect, we conducted a qualitative and quantitative evaluation of the network as a whole, comparing it with several classical and recent effective defogging algorithms on synthetic and real data sets. Finally, an ablation experiment proves that each improved module has a positive effect on the overall defogging network and achieves a good defogging effect.

2. Theoretical Basis of Generative Adversarial Networks

2.1. Research Status of Generative Adversarial Network

Goodfellow et al. [34] proposed generative adversarial networks (GAN). Zhu et al. [35] proposed learning a mapping from a source image domain to a target image domain without requiring paired examples. On this basis, researchers applied generative adversarial networks to the image defogging task, producing more deep learning methods based on the adversarial idea. Engin et al. [36] introduced cycle-consistency loss to improve the texture information of cycle-consistent adversarial networks (CycleGAN) and generate more realistic defogged images. Building on this, Zhao et al. [37] proposed the refinement dehazing network (RefineDNet), a two-stage weakly supervised refinement framework based on CycleGAN; the two-stage refinement network is inspired by target detection tasks. Firstly, dark channel theory is used to restore the visibility of foggy images, and then weakly supervised adversarial training is used to learn the relationship between hazy and clear images and improve the realism of the results. However, the logic of the network design is overly complicated. Mirza [38] added additional conditional information to the model and proposed the conditional generative adversarial network (CGAN), thus improving the effective learning efficiency of generators. Qu et al. [39] simplified image defogging to an image-to-image translation problem [40] and proposed an enhanced pix2pix dehazing network (EPDN). Liu et al. [41] embedded the atmospheric degradation principle and sky priors into CycleGAN and used a two-stage mapping strategy in each path to improve the defogging effect. Dong et al. [42] used an encoder–decoder structure as the generator and fed image frequency information into the discriminator as an additional prior, so as to restore a clearer, more natural image with less color bias and fewer artifacts. Chen et al. [43] proposed an effective end-to-end unpaired image defogging method that embeds the dark channel prior in the recovery phase to constrain the network and generate preliminary defogged images; the defogged map is then refined to explore the potential relationship between scene depth and transmittance, so as to better refine the results of the previous stage and further restore the details of deep regions of the scene.

2.2. Working Principle of Generative Adversarial Networks

Generative adversarial networks (GANs) are a deep learning method that learns in an unsupervised mode. A GAN consists of two neural networks: a generator and a discriminator. The generator attempts to capture the distribution of real examples so that new data examples can be generated. The discriminator is usually a binary classifier that distinguishes the generated examples from the real ones as accurately as possible. The generator and discriminator are trained simultaneously until the generated data distribution matches the target distribution. The generator and discriminator play a game with each other and finally reach a Nash equilibrium.
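To make this adversarial training loop concrete, the following is a minimal PyTorch sketch of one GAN training step. It uses placeholder fully connected networks and random data rather than the defogging networks described later; all shapes and hyperparameters here are illustrative assumptions, not the settings used in this paper.

```python
import torch
import torch.nn as nn

# Placeholder generator and discriminator (illustrative MLPs, not the defogging networks).
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

bce = nn.BCEWithLogitsLoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(16, 784)   # placeholder batch of "real" samples
z = torch.randn(16, 64)       # latent noise fed to the generator
fake = G(z)

# Discriminator step: push D(real) toward 1 and D(fake) toward 0.
loss_D = bce(D(real), torch.ones(16, 1)) + bce(D(fake.detach()), torch.zeros(16, 1))
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

# Generator step: push D(fake) toward 1 so that generated samples fool the discriminator.
loss_G = bce(D(fake), torch.ones(16, 1))
opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```

Repeating these two alternating steps over many batches is what drives the two networks toward the equilibrium described above.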

2.3. Principle of Defogging Cycle-Consistent Adversarial Networks

The cycle-consistent adversarial network (CycleGAN) defogging framework is shown in Figure 1. CycleGAN has two branches:
(1)
Fog–fog branch: foggy image x → fog-free image G(x) → reconstructed foggy image F(G(x)). The fog–fog branch is shown by the red connecting line in Figure 1;
(2)
No fog–no fog branch: fog-free image y → foggy image F(y) → reconstructed fog-free image G(F(y)). The no fog–no fog branch is shown by the yellow connecting line in Figure 1.
As shown in Figure 1, when a foggy image x is input into CycleGAN's generator G, a fog-free image G(x) is generated. The foggy image F(G(x)) is then reconstructed by generator F, and F(G(x)) is called the cyclic image of the original image x. Similarly, a fog-free image y can be input into generator F to generate a foggy image F(y); the fog-free image G(F(y)) is then reconstructed by generator G, and G(F(y)) is called the cyclic image of the original image y. Discriminator Dx is used to judge whether an input foggy image is real or generated, and discriminator Dy is used to judge whether an input fog-free image is real or generated.
Although defogging algorithms based on the GAN framework have achieved good results, many problems remain unsolved, leaving considerable room for research. Therefore, in order to solve the problem of incomplete fog removal, we make several improvements on the basis of the traditional CycleGAN network. The third section introduces the generator, discriminator, loss function, and experimental procedure of the improved CycleGAN network in detail.

3. Improved Cycle-Consistent Adversarial Network Model

3.1. Improved Generator Structure

The generator of the traditional CycleGAN defogging network is only composed of three stages: encoding, residual network, and decoding. The feature extraction ability of the network is poor, and the defogging effect is not satisfactory. In order to enhance the robustness of the defogging network and improve the defogging effect of the network, we optimized the traditional CycleGAN network. The optimized generator is composed of five modules: encoding, self-attentional mechanism, residual network, atrous convolution multi-scale fusion, and decoding. The optimized network was named I-CycleGAN.
The purpose of the generator is to take the input image and convert it into the corresponding output image. The generator of I-CycleGAN contains two generators G and F with the same structure, but the input of the two generators is different. The function of generator G is to change the image with fog into the image without fog; the structure of improved generator G is shown in Figure 2.
The function of generator F is to change the image without fog into the image with fog; the structure of improved generator F is shown in Figure 3.
The five modules of the I-CycleGAN generator (encoding, self-attention (SA) mechanism, residual network, atrous convolution multi-scale (ACMS) fusion, and decoding) are introduced in detail in turn. To demonstrate the working principle of our defogging network more conveniently and specifically, the framework of the cycle-consistent adversarial network defogging algorithm is written out in Algorithm 1.
Algorithm 1 Cycle-consistent adversarial network defogging algorithm
A:= foggy_image
B:= fog_free_image
for epoch in range(0, epochs):
for batch in range(0, maxbatch):
1     same_B = Generator_A2B(real_B)
2     same_A = Generator_B2A(real_A)
3     loss_identity = criterion_identity(same_A, real_A) + criterion_identity(same_B, real_B)
4     fake_B = Generator_A2B(real_A)
5     pred_fake_B = Discriminator_B(fake_B)
6     fake_A = Generator_B2A(real_B)
7     pred_fake_A = Discriminator_A(fake_A)
8     loss_GAN = criterion_GAN(pred_fake_A, target_real_A) + criterion_GAN(pred_fake_B, target_real_B)
9     recovered_A = Generator_B2A(fake_B)
10    loss_cycle_ABA = criterion_cycle(recovered_A, real_A)
11    recovered_B = Generator_A2B(fake_A)
12    loss_cycle_BAB = criterion_cycle(recovered_B, real_B)
13    Total_loss = loss_identity + loss_GAN + loss_cycle_ABA + loss_cycle_BAB
14    Total_loss.backward()

3.1.1. Coding Module

The generator takes the foggy image x as input, which first goes through the coding phase. The encoder consists of three layers, starting with an initial convolution block consisting of a 7 × 7 convolution, instance normalization, and a ReLU activation function. During feature extraction, convolution is also used as the down-sampling layer to continuously reduce the resolution of the feature map, so as to extract features from the foggy image. Image defogging can be regarded as a domain adaptation problem, and each image can be regarded as a domain. Instance normalization mainly operates on the data of a single image, so it is more suitable for the image defogging task than the commonly used batch normalization. The ReLU activation function sets all negative values to 0. Detailed encoder structure parameters are shown in Table 1.
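As a concrete illustration of this coding stage, the following is a minimal PyTorch sketch. The channel widths (3 → 32 → 64 → 256) are assumed so as to mirror the decoder description in Section 3.1.5, and the padding choices are assumptions as well; Table 1 gives the authoritative parameters.

```python
import torch.nn as nn

def conv_in_relu(in_ch, out_ch, kernel, stride, padding):
    # convolution + instance normalization + ReLU: the basic encoder block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel, stride=stride, padding=padding),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# Three-layer encoder: an initial 7 x 7 block followed by two stride-2
# down-sampling convolutions that halve the feature-map resolution each time.
encoder = nn.Sequential(
    conv_in_relu(3, 32, 7, stride=1, padding=3),
    conv_in_relu(32, 64, 3, stride=2, padding=1),
    conv_in_relu(64, 256, 3, stride=2, padding=1),
)
```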

3.1.2. Self-Attention Module

When the CycleGAN network is used for image defogging, the extracted features cannot always provide beneficial information because of noise and other factors, and during training this noise adversely affects the feature learning of the discriminator network. In order to make the defogging map more coherent and uniform and enhance the generator's ability to capture the dependencies between different positions in the input image, we add a self-attention (SA) module to the generator. The SA module computes the response at one position as a weighted sum of the features at all positions, where the weights are obtained at only a small computational cost. Most CycleGAN generators rely heavily on convolutional layers to model the correlation between different image regions; since the convolution operator has a local receptive field, long-distance dependencies can only be captured after several convolution layers. The SA module instead uses a non-local model, which enables the generator and discriminator to effectively construct relationships between regions and directly compute the relationship between two pixels. Because the self-attention mechanism computes the response of a single location as the weighted sum of the features at all locations, the network can focus on areas that are scattered across different positions but are structurally related. The self-attention module in the I-CycleGAN generator is shown in Figure 4:
As shown in Figure 4, the SA module takes the feature map created by the convolutional neural network as input and transforms it into three feature spaces, namely f(x), g(x), and h(x). All three feature spaces apply a 1 × 1 convolution kernel to the original feature map and differ only in the number of output channels. The features of the previous hidden layer are fed into the two feature spaces f and g to calculate the self-attention weights, as shown in Formula (1):
$$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{i=1}^{N}\exp(s_{ij})} \qquad (1)$$
where β_{j,i} represents the correlation between the two feature spaces. The output of f(x) is transposed and matrix-multiplied with the output of g(x) (this matrix multiplication is denoted by the multiplication symbol in Figure 4), producing s_{ij}, which is then normalized by softmax; s_{ij} is calculated as shown in Formula (2):

$$s_{ij} = f(x_i)^{T} g(x_j) \qquad (2)$$
The attention map is multiplied by h(x) to generate the attention feature map, which represents the global spatial information, as shown in Formula (3):

$$O_j = \sum_{i=1}^{N}\beta_{j,i}\, h(x_i) \qquad (3)$$

where O_j stands for the global information. Formula (4) gives the output of the final self-attention module:

$$y_i = \alpha\, O_j + x_i \qquad (4)$$

where y_i represents the integration of global spatial and local information, and the parameter α is initialized to 0. The idea is that the network first focuses on nearby information and then gradually distributes its weight to more distant features.
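The following PyTorch sketch implements the SA module as Formulas (1)–(4) describe. The 1/8 channel reduction used for the f and g branches is an assumption, since the text only states that the three 1 × 1 convolutions differ in their output channels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Self-attention module following Formulas (1)-(4)."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Conv2d(channels, channels // 8, 1)  # feature space f(x)
        self.g = nn.Conv2d(channels, channels // 8, 1)  # feature space g(x)
        self.h = nn.Conv2d(channels, channels, 1)       # feature space h(x)
        self.alpha = nn.Parameter(torch.zeros(1))       # alpha initialized to 0, Eq. (4)

    def forward(self, x):
        b, c, hgt, wid = x.shape
        f = self.f(x).view(b, -1, hgt * wid)            # B x C' x N
        g = self.g(x).view(b, -1, hgt * wid)            # B x C' x N
        h = self.h(x).view(b, -1, hgt * wid)            # B x C  x N
        s = torch.bmm(f.transpose(1, 2), g)             # s_ij = f(x_i)^T g(x_j), Eq. (2)
        beta = F.softmax(s, dim=1)                      # attention weights, Eq. (1)
        o = torch.bmm(h, beta).view(b, c, hgt, wid)     # O_j = sum_i beta_ji h(x_i), Eq. (3)
        return self.alpha * o + x                       # y = alpha * O + x, Eq. (4)
```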

3.1.3. Residual Network Module

A residual connection is a technique often used in deep neural networks to make the network easier to train and optimize, enhance the feature extraction capability of the generator, and prevent problems such as vanishing and exploding gradients. Because the result of image defogging needs to retain features of the original image, such as shape and color, residual blocks are well suited to this transformation. The structure of a residual network differs from that of a plain network in that a shortcut is added so that the input can be added to the output of the convolutional layers.
As shown in Figure 5, the residual module of the I-CycleGAN defogging network generator uses nine residual blocks, each of which includes two convolution kernels of size 3 × 3, where x is the input data and F(x) represents the double-layer convolution operation; the residual block adds the input data to the features extracted by the double-layer convolution, which is expressed as F(x) + x. The reason for using a residual network is that it increases network depth and improves the accuracy of the defogging network by directly learning residuals, bypassing some optimization problems of deep networks.
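A minimal sketch of one such residual block is given below; the use of instance normalization inside the block and the 256-channel width are assumptions carried over from the surrounding encoder and decoder descriptions.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One of the nine residual blocks: two 3 x 3 convolutions with a shortcut."""
    def __init__(self, channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)   # F(x) + x
```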

3.1.4. Atrous Convolution Multi-Scale Fusion Module

Since fog images contain both extensive contour information and small texture feature information, we designed an atrous convolution multi-scale (ACMS) fusion module in order to capture feature information of different scales in input images, increase network capacity, improve the robustness and effectiveness of captured internal information, and, thus, improve the generator effect. The specific structure is shown in Figure 6.
The process is as follows: first, a convolutional layer reduces the number of channels of the input feature map to 1/8 of the original number, and then three atrous convolution operations are performed in sequence. Before any atrous convolution, the receptive field is smallest; after one atrous convolution it becomes larger, after two it grows further, and it is largest after all three. Therefore, four feature maps with different receptive fields are obtained from the three atrous convolution layers. These four feature maps are then concatenated, and the number of channels is restored to the original value by one convolutional layer. Finally, the concatenated feature map is added to the input feature map to obtain the final output result.
Generally speaking, the usual way to increase the receptive field and reduce computation in deep neural networks is downsampling; however, downsampling sacrifices spatial resolution and some input information. Pooling can also enlarge the receptive field, but again at reduced spatial resolution. In contrast, atrous convolution can expand the receptive field without losing resolution, while maintaining the relative spatial positions of pixels. Simply put, atrous convolution controls both the receptive field and the resolution. On the one hand, the enlarged receptive field can be used to detect large segmentation targets; on the other hand, the preserved resolution allows more accurate target localization than downsampling. As the receptive field becomes larger, multi-scale information is obtained.
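The following sketch shows one possible implementation of the ACMS module described above. The dilation rate of 2 and the 3 × 3 kernel size of the atrous convolutions are assumptions; the text fixes only the 1/8 channel reduction, the three successive atrous convolutions, the concatenation of the four scales, and the residual addition to the input.

```python
import torch
import torch.nn as nn

class ACMSFusion(nn.Module):
    """Atrous convolution multi-scale fusion module (Figure 6), sketched."""
    def __init__(self, channels=256, dilation=2):
        super().__init__()
        mid = channels // 8
        self.reduce = nn.Conv2d(channels, mid, 1)             # shrink channels to 1/8
        self.atrous = nn.ModuleList([
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation)
            for _ in range(3)                                  # three stacked atrous convolutions
        ])
        self.expand = nn.Conv2d(mid * 4, channels, 1)         # restore the channel count

    def forward(self, x):
        y = self.reduce(x)
        feats = [y]                       # smallest receptive field (before any atrous conv)
        for conv in self.atrous:
            y = conv(y)                   # each pass enlarges the receptive field
            feats.append(y)
        fused = self.expand(torch.cat(feats, dim=1))
        return x + fused                  # residual addition to the input feature map
```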

3.1.5. Decoding Module

The decoder stage restores low-level features from the feature vector output by the converter stage, uses up-sampling to restore the original features of the image, and finally generates the fog-free image. Just as in the encoder, each convolution operation of the decoder includes convolution, instance normalization, and an activation function. The input channel of the first layer of the decoder is 256, the output channel is 64, the convolution kernel size is 3 × 3, and the stride is 2. In the second layer, the input channel is 64, the output channel is 32, the convolution kernel size is 3 × 3, and the stride is 2. Finally, the fog-free image is generated by a 7 × 7 convolution kernel and a TanH activation function at the end of the decoding stage. Table 2 lists the detailed decoder structure parameters.
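A sketch of this decoding stage is shown below. Transposed convolutions with output_padding=1 are assumed for the two up-sampling layers; the channel widths, kernel sizes, strides, and the final 7 × 7 convolution with TanH follow the description above, with Table 2 being authoritative.

```python
import torch.nn as nn

# Decoder: two stride-2 up-sampling blocks followed by a 7 x 7 convolution and TanH.
decoder = nn.Sequential(
    nn.ConvTranspose2d(256, 64, 3, stride=2, padding=1, output_padding=1),
    nn.InstanceNorm2d(64),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1),
    nn.InstanceNorm2d(32),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 3, 7, padding=3),   # final 7 x 7 convolution
    nn.Tanh(),                        # TanH activation producing the fog-free image
)
```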

3.2. Discriminator Structure

Discriminators are used to distinguish real images from generated ones. In I-CycleGAN, there are two discriminators, Dx and Dy. They have the same structure but receive different input data. Discriminator Dx is used to distinguish the generated foggy image from the real foggy image; its structure is shown in Figure 7.
Discriminator Dy is used to distinguish the generated fog-free image from the real fog-free image. The structure of discriminator Dy is shown in Figure 8. The discriminator constantly updates its network parameters in order to classify the images accurately.
As shown in Figure 7 and Figure 8, except for the first and last layers of the discriminator, each remaining layer is composed of a convolution, an instance normalization layer, and a leaky ReLU activation layer. The first layer includes only the convolution and the leaky ReLU activation function, without instance normalization, and the last layer contains only a convolution operation. The first four convolutional layers of the discriminator network extract features, and the last convolutional layer judges whether the generated image is real or fake. Leaky ReLU assigns a non-zero slope to negative values while retaining the benefits of ReLU. Typically, the last layer of a discriminator uses a sigmoid activation function to produce the final output; however, to keep the GAN training process stable, this paper follows the training idea of WGAN [44] and removes the sigmoid activation from the final output layer. The detailed parameters of the discriminator are shown in Table 3.
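The following sketch assembles a discriminator with that layout: a first convolution + leaky ReLU layer without instance normalization, three convolution + instance normalization + leaky ReLU layers, and a final convolution producing the real/fake score without a sigmoid. The channel widths (64–128–256–512) and 4 × 4 kernels follow the common PatchGAN layout and are assumptions; Table 3 gives the exact parameters.

```python
import torch.nn as nn

discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1),    # layer 1: no instance normalization
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, 4, stride=2, padding=1),
    nn.InstanceNorm2d(128),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(128, 256, 4, stride=2, padding=1),
    nn.InstanceNorm2d(256),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(256, 512, 4, stride=1, padding=1),
    nn.InstanceNorm2d(512),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(512, 1, 4, stride=1, padding=1),   # layer 5: real/fake score map, no sigmoid
)
```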

3.3. Improved Loss Function

In order to improve the quality of the generated images, perceptual loss is introduced into the I-CycleGAN network to strengthen the constraints on generated-image quality. Perceptual loss constrains the perceptual details of generated images from a high-level semantic perspective. In addition, adversarial loss and cycle-consistency loss are the inherent loss functions of CycleGAN networks, which allow models to be trained on unpaired data. Therefore, the loss functions of the I-CycleGAN defogging network include the adversarial loss, the cycle-consistency loss, and the perceptual loss. The adversarial loss constrains image generation during adversarial training, and the cycle-consistency loss constrains the conversion between different domains of unpaired data. These three loss functions are introduced in the following section.
(1)
Adversarial loss: The goal of generator G is to learn the mapping from X to Y. The input of generator G is the image x with fog and the output is the image G(x) without fog. The discriminator Dy is used to judge whether the input image is real non-foggy data y or generated non-foggy data G(x). The adversarial loss of generator G and discriminator Dy is expressed as:
$$L_{GAN}(G, D_y, X, Y) = \mathbb{E}_{y \sim P_{data}(y)}[\log D_y(y)] + \mathbb{E}_{x \sim P_{data}(x)}[\log(1 - D_y(G(x)))] \qquad (5)$$
At the same time, the learning objective of generator F is to transform the samples in the fog-free image space Y into samples in the foggy image space X. The input of generator F is the image y without fog, and the output is the image F(y) with fog. The discriminator Dx is used to judge whether the input image is real foggy data x or generated foggy data F(y). The adversarial loss of generator F and discriminator Dx is expressed as:
$$L_{GAN}(F, D_x, X, Y) = \mathbb{E}_{x \sim P_{data}(x)}[\log D_x(x)] + \mathbb{E}_{y \sim P_{data}(y)}[\log(1 - D_x(F(y)))] \qquad (6)$$
The total adversarial loss can be expressed as:
$$L_{adv}(G, F, x, y) = L_{GAN}(F, D_x, X, Y) + L_{GAN}(G, D_y, X, Y) \qquad (7)$$
where L_adv(G, F, x, y) represents the overall adversarial loss.
(2)
Cycle-consistency loss: CycleGAN introduces the cycle-consistency loss to solve the problem that, with only the adversarial loss, the output distribution cannot be forced to match the target distribution. For every foggy image x, F(G(x)) should return the generator's result G(x) to the original image; likewise, for a fog-free image y, G(F(y)) should return F(y) to the original image y. F(G(x)) is the cyclic image of the original image x, and G(F(y)) is the cyclic image of the original image y. During training, the closer F(G(x)) is to x and the closer G(F(y)) is to the original image y, the better. To train generators G and F simultaneously and improve their performance, the cycle-consistency loss is designed with two constraints: F(G(x)) ≈ x and G(F(y)) ≈ y. Therefore, the cycle-consistency loss is computed as the L1 norm between the input of the defogging network and its cyclic image.
$$L_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G(F(y)) - y \rVert_1\big] \qquad (8)$$
where L_cyc(G, F) represents the overall cycle-consistency loss.
(3)
Perceptual feature loss: With only the cycle-consistency loss and the adversarial loss, it is difficult to recover all the texture information in the image. Therefore, we use a perceptual loss to make the generated image semantically closer to the real target image. Under the constraints of generator G and generator F, the perceptual loss is defined as:
$$L_{perceptual}(G, F) = \lVert \phi(x) - \phi(F(G(x))) \rVert_2^2 + \lVert \phi(y) - \phi(G(F(y))) \rVert_2^2 \qquad (9)$$
where ‖·‖₂ denotes the L2 norm. We use the VGG16 architecture, initialized with the ImageNet pre-trained model; φ is the feature extractor of the VGG16 network, and the features extracted from the second and fifth pooling layers of VGG16 are combined. L_perceptual(G, F) represents the overall perceptual feature loss. Therefore, the improved total loss function can be expressed as:
$$L(G, D, X, Y) = L_{adv}(G, F, x, y) + \lambda L_{cyc}(G, F) + \alpha L_{perceptual}(G, F) \qquad (10)$$
where L(G, D, X, Y) represents the total loss function, λ is the weight of the cycle-consistency loss (usually 10), and α is the weight of the perceptual feature loss.
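To show how the three terms combine in practice, the following PyTorch sketch computes the total loss of Formula (10) for one unpaired batch, with the two generators and two discriminators passed in as modules. The VGG16 pooling-layer indices (9 and 30 in torchvision's vgg16().features for the second and fifth pooling layers) and the least-squares form of the adversarial term are assumptions of this sketch; λ = 10 and α = 0.0001 follow the training settings in Section 4.1.1.

```python
import torch
import torch.nn as nn
from torchvision import models

mse, l1 = nn.MSELoss(), nn.L1Loss()
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def vgg_feats(img, taps=(9, 30)):
    # Collect intermediate VGG16 feature maps (assumed pool2 and pool5 positions).
    feats, y = [], img
    for i, layer in enumerate(vgg):
        y = layer(y)
        if i in taps:
            feats.append(y)
    return feats

def total_loss(x, y, G, F_, D_x, D_y, lam=10.0, alpha=1e-4):
    fake_y, fake_x = G(x), F_(y)              # G: foggy -> clear, F: clear -> foggy
    cyc_x, cyc_y = F_(fake_y), G(fake_x)      # cyclic reconstructions
    pred_y, pred_x = D_y(fake_y), D_x(fake_x)
    # Adversarial term (least-squares form used here as a stand-in for Eq. (7)).
    adv = mse(pred_y, torch.ones_like(pred_y)) + mse(pred_x, torch.ones_like(pred_x))
    # Cycle-consistency term, Eq. (8): L1 between inputs and their cyclic images.
    cyc = l1(cyc_x, x) + l1(cyc_y, y)
    # Perceptual term, Eq. (9): L2 distance between VGG16 features.
    per = sum(torch.mean((a - b) ** 2) for a, b in zip(vgg_feats(x), vgg_feats(cyc_x))) + \
          sum(torch.mean((a - b) ** 2) for a, b in zip(vgg_feats(y), vgg_feats(cyc_y)))
    return adv + lam * cyc + alpha * per      # Eq. (10)
```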

4. Experiment Results and Discussion

In order to qualitatively and quantitatively evaluate the performance of the I-CycleGAN defogging network, we compared it with five classical and advanced defogging algorithms (DCP [15], DehazeNet [21], AOD-Net [23], GFN [26], and the original CycleGAN [35]), of which the last four are also deep-learning-based defogging algorithms. The defogging effect of the generated images was evaluated using the peak signal-to-noise ratio (PSNR [45]) and the structural similarity index (SSIM [46]). Then, because the five compared algorithms are all relatively classical and effective defogging networks, we additionally compared against the recently released defogging algorithm YOLY [31] to show that our designed network is robust and effective. Finally, an ablation study proves that I-CycleGAN performs well in defogging and that each module in the model plays an irreplaceable role.
A few words about these two metrics. PSNR and SSIM are commonly used in image processing. PSNR evaluates the overall similarity of images mainly by calculating the error between corresponding pixels of the defogged image and the fog-free reference image, and it focuses on the color deviation and degree of distortion of the image. The larger the PSNR value, the smaller the distortion of the defogged image and the closer it is to the fog-free image. SSIM measures the similarity between the defogged image and the fog-free reference image using three criteria: brightness, contrast, and structure. The larger the SSIM value, the more similar the two images are, and the better the defogging effect.
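The two metrics can be computed, for example, with scikit-image as in the sketch below; the paper does not state which implementation was actually used, so this is purely illustrative.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(dehazed: np.ndarray, ground_truth: np.ndarray):
    # Both images are assumed to be 8-bit RGB arrays of the same shape.
    psnr = peak_signal_noise_ratio(ground_truth, dehazed, data_range=255)
    ssim = structural_similarity(ground_truth, dehazed, channel_axis=-1, data_range=255)
    return psnr, ssim
```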

4.1. Data Description and Experimental Details

4.1.1. Data Description

The training set used by the model consists of 13,990 images in total from the RESIDE indoor training set (ITS), and the training data are unpaired. In the training phase, the Adam optimizer was used to update the gradients, with the learning rate set to 0.0001 and the batch size set to 1. The weight λ of the cycle-consistency loss and the weight α of the perceptual loss in Formula (10) are set to 10 and 0.0001, respectively. The leaky ReLU coefficient in the discriminator is set to 0.2. The update ratio between generator and discriminator is 1. Our experimental environment is as follows: the system is Ubuntu 18.04, the GPU is an Nvidia GTX 1080Ti, and the deep learning framework is PyTorch.

4.1.2. Experimental Details

We used 1000 images from SOTS [47] to test on the indoor and outdoor composite data sets, and used 45 images from O-HAZE [48], generated with a professional haze machine by scholars from Timişoara Polytechnic University in Romania, to simulate real-world data sets.
When we obtained the images, we first preprocessed them. Each image was scaled to 256 × 256 and then scaled to 286 × 286 by bicubic interpolation. The 286 × 286 image was then randomly cropped, a random horizontal flip was applied, a normalization operation was performed, and finally the data set was shuffled.
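A sketch of this preprocessing pipeline using torchvision transforms is given below; the per-channel normalization mean and standard deviation of 0.5 are assumptions, and the file name is hypothetical.

```python
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),                                                  # initial scaling
    transforms.Resize((286, 286), interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.RandomCrop(256),            # random 256 x 256 crop of the 286 x 286 image
    transforms.RandomHorizontalFlip(),     # random horizontal flip
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),                          # assumed values
])

img = preprocess(Image.open("foggy.jpg").convert("RGB"))  # hypothetical file path
```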
The training process of I-CycleGAN first defines the generator, discriminator, and loss functions of the defogging network. We first calculated the loss between the generated foggy image and the real foggy image, and the loss between the generated fog-free image and the real fog-free image. The discriminator evaluates the loss between the generated fog-free image and the real fog-free image and between the generated foggy image and the real foggy image. Then, we calculated the loss between the reconstructed foggy image and the real foggy image, and between the reconstructed fog-free image and the real fog-free image. Finally, the total loss was computed, its gradients calculated, and the parameters updated by back-propagation.

4.2. Experimental Evaluation on Synthetic Data Sets

4.2.1. Experimental Evaluation of Indoor Synthetic Data Sets

Indoor synthetic data set: To prove that the I-CycleGAN network designed in this paper has strong robustness, the composite data set SOTS test set is used to compare the defogging effect of different algorithms, and the number of images is 500. Figure 9 shows the defogging effect diagrams of three randomly selected SOTS indoor test set samples under different algorithms. Figure 9a is the original image with indoor fog, Figure 9b is the DCP [15] algorithm, Figure 9c is the DehazeNet [21] algorithm, Figure 9d is the AOD-Net [23] algorithm, Figure 9e is the GFN [26] algorithm, Figure 9f is the CycleGAN [35] algorithm, Figure 9g is the I-CycleGAN algorithm, and Figure 9h is the original fog-free image.
As shown in Figure 9, it can be clearly seen that the deep-learning-based defogging algorithms are superior to the traditional defogging algorithm in subjective visual evaluation. In Figure 9b, which uses the DCP algorithm, the best-performing of the traditional algorithms, color distortion is clearly visible to the naked eye and the defogged image is overly bright, especially on the desktop of the first picture and the wall of the second picture. In Figure 9c, a large amount of residual fog remains in the DehazeNet result, and the defogging effect is relatively ordinary. In Figure 9d, the wall of the second picture and the table and chairs of the third picture clearly show that the defogged image is dark; a large amount of fog remains, so the generated image is relatively blurry. Figure 9e, using the GFN defogging algorithm, and Figure 9f, showing the CycleGAN defogging algorithm, exhibit relatively good defogging effects, but incomplete defogging also occurs: in Figure 9e,f, the desktop area in the first picture is still blurry. Therefore, the proposed I-CycleGAN defogging result is the best in terms of color gloss and image sharpness.
Table 4 shows the objective evaluation of different defogging algorithms used in Figure 9. By comparing the quantitative results of the image quality evaluation index on the SOTS indoor test set, it can be seen that the PSNR of the I-CycleGAN defogging network designed by us is 0.54 higher than that of the GFN network with the best defogging effect. SSIM increases by 0.0056. It can be concluded that the I-CycleGAN defogging method not only achieves good defogging results on the indoor composite test set, but also comes closest to clear images in subjective visual aspects such as color, saturation, and sharpness.

4.2.2. Experimental Evaluation of Outdoor Synthetic Data Sets

Outdoor composite data set: Figure 10 shows the defogging effect on three picture samples of SOTS outdoor test set extracted under different algorithms. Among them, Figure 10a is the original image with outdoor fog, Figure 10b is the DCP [15] defogging algorithm, Figure 10c is the DehazeNet [21] defogging algorithm, Figure 10d is the AOD-Net [23] defogging algorithm, and Figure 10e is the GFN [26] defogging method. Figure 10f is the CycleGAN [35] dehazing algorithm, Figure 10g is the I-CycleGAN dehazing algorithm, and Figure 10h is the original fog-free image.
As shown in Figure 10, the deep-learning-based defogging algorithms are superior to the traditional defogging algorithm in subjective visual evaluation. According to Figure 10b, color distortion exists in the defogging result of the DCP method, which is especially obvious in the sky area. As can be seen in Figure 10c, DehazeNet's defogging effect is mediocre, with a large amount of residual fog, and the clarity of the generated image is relatively low. In Figure 10d, residual fog remains in the defogged image, and the generated result is dark in brightness; the dark colors of the AOD-Net result are obvious on the buildings in the second and third pictures. The GFN defogging algorithm in Figure 10e and the CycleGAN defogging algorithm in Figure 10f achieve relatively good effects, and the generated images are relatively clear, but incomplete defogging also occurs: in the roof area of the first picture, both GFN and CycleGAN leave residual fog. In contrast, the I-CycleGAN algorithm proposed in this paper performs best in terms of color glossiness and image clarity.
Table 5 shows the objective evaluation of different defogging algorithms used in Figure 10. In terms of PSNR and SSIM indexes, the I-CycleGAN network designed by us is significantly improved compared to the GFN network with the best fog removal, among which, the PSNR increases by 2.21 and SSIM increases by 0.0158. As a result, the I-CycleGAN defogging method not only achieves good defogging results on the outdoor composite test set, but also came closest to a clear image in subjective visual terms such as color, saturation, and sharpness.

4.3. Experimental Evaluation of Real Data Sets

In the real world, captured foggy images are not completely uniform. In this section, real data sets are used to compare the different defogging algorithms above. Since fog-free images corresponding to real-world foggy images generally cannot be obtained, the experimental results can only be evaluated subjectively. To facilitate quantitative analysis as well, the O-HAZE outdoor real data set is also used in this section. The O-HAZE [48] data set was captured with artificial fog-making equipment, which ensures that real-world foggy images are collected together with fog-free images of the corresponding scenes.
For the convenience of comparison, this section puts together the defogging images of different algorithms of the O-HAZE outdoor real data set and the real-world data set. Figure 11 shows the defogging effect diagrams of one real-world data set sample and two O-HAZE outdoor real test set samples under different algorithms. Figure 11a is the original image with fog, Figure 11b is the DCP [15] algorithm, Figure 11c is the DehazeNet [21] algorithm, Figure 11d is the AOD-Net [23] algorithm, Figure 11e is the GFN [26] algorithm, Figure 11f is the CycleGAN [35] algorithm, and Figure 11g is the I-CycleGAN algorithm.
Because the actual fog-free images corresponding to real-world foggy images cannot be obtained, only subjective evaluation is possible for those samples. As shown in Figure 11, the proposed method achieves a good defogging effect on both the real-world sample and the O-HAZE outdoor real data set.
As can be seen from Figure 11b, there is a strong halo and obvious color distortion in the sky area after DCP processing; the overall color is bluish, and some areas are too dark, resulting in unbalanced contrast. As can be seen from Figure 11c–f, the deep-learning-based defogging algorithms all show good defogging effects. However, after DehazeNet defogging, obvious fog remains and the defogging is incomplete. After AOD-Net defogging, the colors of the picture are darker; it can be clearly seen from the third picture in Figure 11d that the trees become darker, and detail retention needs to be improved. The I-CycleGAN algorithm proposed in this paper, as well as the GFN and CycleGAN algorithms, achieve relatively good defogging effects.
In order to reflect the superiority of the I-CycleGAN network more clearly and intuitively, this paper carries out a quantitative evaluation on the O-HAZE outdoor test set. Table 6 shows that the I-CycleGAN network proposed in this paper is slightly superior to the other typical defogging algorithms in terms of the PSNR and SSIM indexes. Compared with the GFN network, the PSNR index of the I-CycleGAN network increases by 0.44 and the SSIM index increases by 0.0035.

4.4. Experimental Evaluation of Fog Removal Algorithms in Recent Years

To further demonstrate the validity and robustness of our proposed I-CycleGAN defogging network, we compared it with a recently proposed defogging algorithm called YOLY [31]. We selected one image from each of the indoor, outdoor, and real data sets for comparison. As shown in Figure 12, in order to see the defogging effect more clearly, the defogging results were partially enlarged: the green box in the upper right corner and the red box in the lower right corner of each image are the results of zooming in on different areas. Figure 12a is the original image with fog, Figure 12b is the YOLY [31] algorithm, Figure 12c is the I-CycleGAN algorithm, and Figure 12d is the original fog-free image.
Since there is no corresponding fog-free image for the real-world sample, no fog-free reference is shown for that row of Figure 12. As shown in Figure 12, the desktop, floor, and valley areas in Figure 12b are dark, and some areas in the YOLY result are incompletely defogged. In comparison, the I-CycleGAN network designed by us achieves a relatively good defogging effect. Comparing the brightness, color, and saturation of the images, the I-CycleGAN designed in this paper has the better defogging effect.
In order to reflect the superiority of the I-CycleGAN network more clearly and intuitively, the indoor and outdoor data sets are quantitatively evaluated in this paper. As can be seen from Table 7, the I-CycleGAN network proposed in this paper slightly outperforms the YOLY defogging algorithm on the PSNR and SSIM indexes. On the indoor data set, the I-CycleGAN network improves the PSNR index by 2.71 and the SSIM index by 0.0118 compared to the YOLY network. On the outdoor data set, the PSNR index of the I-CycleGAN network increases by 1.80 and the SSIM index increases by 0.0034.

4.5. Ablation Experiment

In deep learning, the construction and training of network structures are sometimes based on experience and experiment. In order to fully illustrate the effectiveness of a particular design or method, controlled-variable experiments need to be carried out, comparing the results obtained after removing a module with those obtained with the module, so as to verify its effectiveness and practicability.
The SOTS test set was used for the ablation study to certify the validity of each module in the I-CycleGAN. This includes a comparative analysis of the following four situations:
(1)
Primitive CycleGAN network;
(2)
A self-attention (SA) module was added to the original CycleGAN network to verify the validity of the atrous convolution multi-scale (ACMS) feature fusion module;
(3)
An atrous convolution multi-scale (ACMS) feature fusion module was added to the original CycleGAN network to verify the validity of the self-attention (SA) module;
(4)
The I-CycleGAN network designed in this paper fused self-attention (SA) module and atrous convolution multi-scale (ACMS) feature fusion module into the original CycleGAN network for global evaluation.
Results of the ablation study are shown in Figure 13 below. Figure 13a is the original image with fog, Figure 13b is the CycleGAN defogging algorithm, Figure 13c is the CycleGAN and SA defogging algorithm, Figure 13d is the CycleGAN and ACMS defogging algorithm, Figure 13e is the CycleGAN, SA, and ACMS defogging algorithm, and Figure 13f is the original fog-free image.
As can be seen from Figure 13, the I-CycleGAN network designed in this paper significantly improves the defogging performance compared with the traditional CycleGAN network. The generated defogging map is more coherent, uniform, and realistic, and the generated image has higher clarity and clearer texture.
Table 8 shows the quantitative evaluation of the ablation study in Figure 13. From Table 8, we can see that the I-CycleGAN network has certain improvements compared with the original CycleGAN network in terms of SSIM and PSNR. PSNR increases by 1.55, the SSIM index increases by 0.0216, and the addition of each module in the I-CycleGAN model improves the performance of the network and generates images with clear texture and reasonable color transition. The results show that the proposed I-CycleGAN network can effectively remove fog.

5. Summary and Prospect

In view of the problems of existing defogging algorithms, such as color distortion, contrast reduction, and incomplete defogging, a defogging algorithm was designed based on the CycleGAN network, adding a self-attention (SA) module and an atrous convolution multi-scale (ACMS) fusion module to the traditional CycleGAN network. These modules enhance the algorithm's ability to capture information at different scales in the input image, making the defogged image more coherent and uniform and the generated image more realistic. Finally, the superiority of the I-CycleGAN model was proven by the ablation study: the PSNR of the I-CycleGAN model increases by 1.55 and the SSIM increases by 0.0216 compared with the original CycleGAN network.
Nowadays, most defogging algorithms cannot guarantee the defogging effects of indoor, outdoor, synthetic, and real data sets at the same time. In order to better reconstruct and restore the details of defogged images, this paper introduces perceptual loss into the I-CycleGAN model loss function to reduce the texture deformation of defogged images. After qualitative and quantitative analysis, we compared several classical algorithms when removing fog in the indoor synthetic data set. PSNR improves by 0.54 compared with the GFN network with the best fog removal. SSIM increases by 0.0056. In the outdoor synthetic data set, PSNR increases by 2.21 and SSIM increases by 0.0158 compared with the GFN network with the best fog removal. In the outdoor real data set, PSNR increases by 0.44 and SSIM increases by 0.0035 compared with the GFN network with the best defogging.
To demonstrate the universality and practicality of our algorithm, we compared it with a recent image defogging algorithm (YOLY). On the indoor data set, PSNR improves by 2.71 and SSIM increases by 0.0118 compared with the YOLY network. On the outdoor synthetic data set, PSNR increases by 1.8 and SSIM increases by 0.0034 compared with the YOLY network. On the outdoor real data set, PSNR increases by 0.44 and SSIM increases by 0.0034 compared with the YOLY network. Therefore, the network designed in this paper has a good defogging effect on indoor, outdoor, and real data sets.
Outlook: Due to the limited amount of data used to train the I-CycleGAN network, it cannot be guaranteed to cover all real-world scenes, so the generalization ability of the model needs to be improved, and subsequent training of multi-scene models can be strengthened.
In the improved I-CycleGAN, the number of network layers increases, which leads to a longer training time. Therefore, how to realize a lightweight network while maintaining the defogging effect is an important direction for our future research.

Author Contributions

Conceptualization, J.Z.; methodology, X.S.; software, J.Z.; validation, X.S. and J.Z.; formal analysis, Y.D.; investigation, Y.C.; resources, Y.W.; data curation, J.Z.; writing—original draft preparation, J.Z.; writing—review and editing, Y.D.; visualization, Y.C.; supervision, Y.D.; project administration, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

Thanks to all the authors for writing this paper, we will make continuous efforts to write better papers.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Garcia-Mateos, G.; Hernandez-Hernandez, J.L.; Escarabajal-Henarejos, D.; Jaen-Terrones, S.; Molina-Martinez, J.M. Study and comparison of color models for automatic image analysis in irrigation management applications. Agric. Water Manag. 2015, 151, 158–166. [Google Scholar] [CrossRef]
  2. Hernandez-Hernandez, J.L.; Garcia-Mateos, G.; Gonzalez-Esquiva, J.M.; Escarabajal-Henarejos, D.; Ruiz-Canales, A.; Molina-Martinez, J.M. Optimal color space selection method for plant/soil segmentation in agriculture. Comput. Electron. Agric. 2016, 122, 124–132. [Google Scholar] [CrossRef]
  3. Sanaullah, M.; Akhtaruzzaman, M.; Hossain, M.A. Land-robot technologies: The integration of cognitive systems in military and defense. NDC E-J. 2022, 2, 123–156. [Google Scholar]
  4. Currie, G.; Rohren, E. Intelligent imaging in nuclear medicine: The principles of artificial intelligence, machine learning and deep learning. Semin. Nucl. Med. 2021, 51, 102–111. [Google Scholar] [CrossRef]
  5. Kumar, R.; Kaushik, B.K.; Balasubramanian, R. Multispectral transmission map fusion method and architecture for image dehazing. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2019, 27, 2693–2697. [Google Scholar] [CrossRef]
  6. Kim, J.Y.; Kim, L.S.; Hwang, S.H. An advanced contrast enhancement using partially overlapped sub-block histogram equalization. IEEE Trans. Circuits Syst. Video Technol. 2001, 11, 475–484. [Google Scholar]
  7. Stark, J.A. Adaptive image contrast enhancement using generalizations of histogram equalization. IEEE Trans. Image Process. 2000, 9, 889–896. [Google Scholar] [CrossRef]
  8. Daubechies, I. The wavelet transform, time-frequency localization and signal analysis. IEEE Trans. Inf. Theory 1990, 36, 961–1005. [Google Scholar] [CrossRef]
  9. Seow, M.J.; Asari, V.K. Ratio rule and homomorphic filter for enhancement of digital colour image. Neurocomputing 2006, 69, 954–958. [Google Scholar] [CrossRef]
  10. Nayar, S.K.; Narasimhan, S.G. Vision in bad weather. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 820–827. [Google Scholar]
  11. Narasimhan, S.G.; Nayar, S.K. Contrast restoration of weather degraded images. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 713–724. [Google Scholar] [CrossRef]
  12. Tan, R.T. Visibility in bad weather from a single image. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA, 23–28 June 2008; pp. 1–8. [Google Scholar]
  13. Fattal, R. Single image dehazing. ACM Trans. Graph. 2008, 27, 1–9. [Google Scholar] [CrossRef]
  14. Tarel, J.P.; Hautiere, N. Fast visibility restoration from a single color or gray level image. In Proceedings of the 2009 IEEE International Conference on Computer Vision (ICCV), Kyoto, Japan, 29 September–2 October 2009; pp. 2201–2208. [Google Scholar]
  15. He, K.; Sun, J.; Tang, X. Single Image Haze Removal Using Dark Channel Prior. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 2341–2353. [Google Scholar] [PubMed]
  16. Singh, D.; Kumar, V. Dehazing of remote sensing images using improved restoration model based dark channel prior. Imaging Sci. J. 2017, 65, 282–292. [Google Scholar] [CrossRef]
  17. Anan, S.; Khan, M.I.; Kowsar, M.M.S.; Deb, K. Image defogging framework using segmentation and the dark channel prior. Entropy 2021, 23, 285. [Google Scholar] [CrossRef]
  18. Zhang, L.; Wang, S.; Wang, X. Single image dehazing based on bright channel prior model and saliency analysis strategy. IET Image Process. 2021, 15, 1023–1031. [Google Scholar] [CrossRef]
  19. Nishino, K.; Kratz, L.; Lombardi, S. Bayesian defogging. Int. J. Comput. Vis. 2012, 98, 263–278. [Google Scholar] [CrossRef]
  20. Meng, G.; Wang, Y.; Duan, J.; Xiang, S.; Pan, C. Efficient image dehazing with boundary constraint and contextual regularization. In Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV), Sydney, NSW, Australia, 1–8 December 2013; pp. 617–624. [Google Scholar]
  21. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. DehazeNet: An end-to-end system for single image haze removal. IEEE Trans. Image Process. 2016, 25, 5187–5198. [Google Scholar] [CrossRef]
  22. Ren, W.; Liu, S.; Zhang, H.; Pan, J.; Cao, X.; Yang, M.H. Single image dehazing via multi-scale convolutional neural networks. In Proceedings of the 2016 European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 154–169. [Google Scholar]
  23. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. AOD-Net: All-in-one dehazing network. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4780–4788. [Google Scholar]
  24. Yang, D.; Sun, J. Proximal Dehaze-Net: A prior learning-based deep network for single image dehazing. In Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 702–717. [Google Scholar]
  25. Zhang, H.; Patel, V. Densely connected pyramid dehazing network. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3194–3203. [Google Scholar]
  26. Ren, W.; Ma, L.; Zhang, J.; Pan, J.; Cao, X.; Liu, W.; Yang, M.H. Gated fusion network for single image dehazing. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3253–3261. [Google Scholar]
  27. Wang, T.; Zhao, L.; Huang, P.; Zhang, X.; Xu, J. Haze concentration adaptive network for image dehazing. Neurocomputing 2021, 439, 75–85. [Google Scholar] [CrossRef]
  28. Feng, T.; Wang, C.; Chen, X.; Fan, H.; Zeng, K.; Li, Z. URNet: A U-Net based residual network for image dehazing. Appl. Soft Comput. 2021, 102, 106884. [Google Scholar] [CrossRef]
  29. Li, S.; Yuan, Q.; Zhang, Y.; Lv, B.; Wei, F. Image dehazing algorithm based on deep learning coupled local and global features. Appl. Sci. 2022, 12, 8552. [Google Scholar] [CrossRef]
  30. Wang, C.; Chen, R.; Lu, Y.; Yan, Y.; Wang, H. Recurrent context aggregation network for single image dehazing. IEEE Signal Process. Lett. 2021, 28, 419–423. [Google Scholar] [CrossRef]
  31. Li, B.; Gou, Y.; Gu, S.; Liu, J.; Zhou, J.; Peng, X. You only look yourself: Unsupervised and untrained single image dehazing neural network. Int. J. Comput. Vision 2021, 129, 1754–1767. [Google Scholar] [CrossRef]
  32. Sun, Z.; Liu, C.; Qu, H.; Xie, G. A novel effective vehicle detection method based on swin transformer in hazy scenes. Mathematics 2022, 10, 2199. [Google Scholar] [CrossRef]
  33. Zhu, Z.; Luo, Y.; Qi, G.; Meng, J.; Li, Y.; Mazur, N. Remote sensing image defogging networks based on dual self-attention boost residual octave convolution. Remote Sens. 2021, 13, 3104. [Google Scholar] [CrossRef]
  34. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S. Generative adversarial nets. In Proceedings of the International Conference on Neural Information Processing Systems, Cambridge, MA, USA, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
  35. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation using cycle-consistent Adversarial Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2242–2251. [Google Scholar]
  36. Engin, D.; Genc, A.; Kemal Ekenel, H. Cycle-Dehaze: Enhanced CycleGAN for single image dehazing. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–23 June 2018; pp. 9380–9388. [Google Scholar]
  37. Zhao, S.; Zhang, L.; Shen, Y.; Zhou, Y. RefineDNet: A weakly supervised refinement framework for single image dehazing. IEEE Trans. Image Process. 2021, 30, 3391–3404. [Google Scholar] [CrossRef]
  38. Mirza, M.; Osindero, S. Conditional generative adversarial nets. Comput. Sci. 2014, 2672–2680. [Google Scholar] [CrossRef]
  39. Qu, Y.; Chen, Y.; Huang, J.; Xie, Y. Enhanced Pix2pix Dehazing Network. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8152–8160. [Google Scholar]
  40. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976. [Google Scholar]
  41. Liu, W.; Hou, X.; Duan, J.; Qiu, G. End-to-End single image fog removal using enhanced cycle consistent adversarial networks. IEEE Trans. Image Process. 2020, 29, 7819–7833. [Google Scholar] [CrossRef]
  42. Dong, Y.; Liu, Y.; Zhang, H.; Chen, S.; Qiao, Y. FD-GAN: Generative adversarial networks with Fusion-Discriminator for single image dehazing. Assoc. Adv. Artif. Intell. 2020, 34, 10729–10736. [Google Scholar] [CrossRef]
  43. Chen, X.; Li, Y.; Kong, C.; Dai, L. Unpaired image dehazing with physical-guided restoration and depth-guided refinement. IEEE Signal Process. Lett. 2022, 29, 587–591. [Google Scholar] [CrossRef]
  44. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of wasserstein GANs. Adv. Neural Inf. Process. Syst. 2017, 30, 5767–5777. [Google Scholar]
  45. Huynh-Thu, Q.; Ghanbari, M. Scope of validity of PSNR in image/video quality assessment. Electron. Lett. 2008, 44, 800–801. [Google Scholar] [CrossRef]
  46. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  47. Li, B.; Ren, W.; Fu, D.; Tao, D.; Feng, D.; Zeng, W.; Wang, Z. Benchmarking single image dehazing and beyond. IEEE Trans. Image Process. 2018, 28, 492–505. [Google Scholar] [CrossRef]
  48. Ancuti, C.O.; Ancuti, C.; Timofte, R.; De Vleeschouwer, C. O-HAZE: A dehazing benchmark with real hazy and haze-free outdoor images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 754–762. [Google Scholar]
Figure 1. Overall structure of CycleGAN defogging network.
Figure 2. Improved generator G overall structure diagram.
Figure 3. Improved generator F overall structure diagram.
Figure 4. Self-attention module structure diagram.
Figure 5. Residual blocks structure diagram.
Figure 6. Structure diagram of atrous convolution multi-scale fusion module.
Figure 7. Discriminator Dx overall structure diagram.
Figure 8. Discriminator Dy overall structure diagram.
Figure 9. Defogging results on the SOTS indoor synthetic data set: (a) image with fog; (b) DCP; (c) DehazeNet; (d) AOD-Net; (e) GFN; (f) CycleGAN; (g) I-CycleGAN; (h) fog-free image.
Figure 10. Defogging results on the SOTS outdoor synthetic data set: (a) image with fog; (b) DCP; (c) DehazeNet; (d) AOD-Net; (e) GFN; (f) CycleGAN; (g) I-CycleGAN; (h) fog-free image.
Figure 11. Defogging results on the real-world data set and the O-HAZE test set: (a) image with fog; (b) DCP; (c) DehazeNet; (d) AOD-Net; (e) GFN; (f) CycleGAN; (g) I-CycleGAN.
Figure 12. Comparison with a recent defogging algorithm: (a) image with fog; (b) YOLY; (c) I-CycleGAN; (d) original fog-free image.
Figure 13. Ablation study: (a) original image with fog; (b) CycleGAN; (c) CycleGAN and SA; (d) CycleGAN and ACMS; (e) CycleGAN, SA, and ACMS; (f) fog-free image.
Table 1. Encoder structure parameters.
Layer | Convolution | Stride | Input Channels | Output Channels
Layer 1 | 7 × 7 | 1 | 3 | 64
Layer 2 | 3 × 3 | 2 | 64 | 128
Layer 3 | 3 × 3 | 2 | 128 | 256
Table 2. Decoder structure parameters.
Layer | Convolution | Stride | Input Channels | Output Channels
Layer 1 | 3 × 3 | 2 | 256 | 64
Layer 2 | 3 × 3 | 2 | 64 | 32
Layer 3 | 7 × 7 | 1 | 32 | 3
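Tables 1 and 2 can be read as the down-sampling and up-sampling halves of the generator. The sketch below wires them up in PyTorch; the use of transposed convolutions for up-sampling, instance normalization, ReLU activations, and the final Tanh are assumptions, and the residual/attention blocks that sit between encoder and decoder are omitted.

```python
import torch.nn as nn

# Encoder per Table 1: 7x7 s1 (3 -> 64), 3x3 s2 (64 -> 128), 3x3 s2 (128 -> 256).
encoder = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=1, padding=3),
    nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
    nn.InstanceNorm2d(128), nn.ReLU(inplace=True),
    nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
    nn.InstanceNorm2d(256), nn.ReLU(inplace=True),
)

# Decoder per Table 2: 3x3 s2 (256 -> 64), 3x3 s2 (64 -> 32), 7x7 s1 (32 -> 3).
# Transposed convolutions are assumed for the two stride-2 up-sampling layers.
decoder = nn.Sequential(
    nn.ConvTranspose2d(256, 64, kernel_size=3, stride=2, padding=1, output_padding=1),
    nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
    nn.InstanceNorm2d(32), nn.ReLU(inplace=True),
    nn.Conv2d(32, 3, kernel_size=7, stride=1, padding=3),
    nn.Tanh(),  # assumed output activation mapping to [-1, 1]
)
```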
Table 3. Discriminator structure parameters.
Layer | Convolution | Stride | Input Channels | Output Channels
Layer 1 | 4 × 4 | 2 | 3 | 64
Layer 2 | 4 × 4 | 2 | 64 | 128
Layer 3 | 4 × 4 | 2 | 128 | 256
Layer 4 | 4 × 4 | 2 | 256 | 512
Layer 5 | 4 × 4 | 1 | 512 | 1
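To make Table 3 concrete, the following is a minimal PyTorch sketch of a PatchGAN-style discriminator with the kernel sizes, strides, and channel widths listed above; the normalization and activation choices (InstanceNorm, LeakyReLU) are assumptions, since the table does not specify them.

```python
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """PatchGAN-style discriminator following Table 3 (norm/activation are assumptions)."""

    def __init__(self, in_channels: int = 3):
        super().__init__()

        def block(c_in, c_out, stride, norm=True):
            layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(in_channels, 64, stride=2, norm=False),           # Layer 1: 4x4, s2, 3 -> 64
            *block(64, 128, stride=2),                                # Layer 2: 4x4, s2, 64 -> 128
            *block(128, 256, stride=2),                               # Layer 3: 4x4, s2, 128 -> 256
            *block(256, 512, stride=2),                               # Layer 4: 4x4, s2, 256 -> 512
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),    # Layer 5: 4x4, s1, 512 -> 1
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.model(x)  # per-patch real/fake scores
```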
Table 4. Results of quantitative analysis on the SOTS indoor test set.
Quantitative Evaluation | DCP | DehazeNet | AOD-Net | GFN | CycleGAN | I-CycleGAN
PSNR | 16.57 | 18.63 | 20.72 | 22.68 | 21.67 | 23.22
SSIM | 0.8022 | 0.8432 | 0.8398 | 0.8753 | 0.8593 | 0.8809
Table 5. Results of quantitative analysis on the SOTS outdoor test set.
Quantitative Evaluation | DCP | DehazeNet | AOD-Net | GFN | CycleGAN | I-CycleGAN
PSNR | 18.54 | 21.13 | 20.19 | 23.51 | 22.98 | 25.72
SSIM | 0.8135 | 0.8509 | 0.8611 | 0.8701 | 0.8445 | 0.8859
Table 6. Results of quantitative analysis on the O-HAZE test set.
Quantitative Evaluation | DCP | DehazeNet | AOD-Net | GFN | CycleGAN | I-CycleGAN
PSNR | 15.74 | 17.78 | 18.17 | 20.58 | 19.89 | 21.02
SSIM | 0.7011 | 0.7689 | 0.7715 | 0.8131 | 0.8045 | 0.8166
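Tables 4–6 report PSNR and SSIM between each defogged image and its fog-free ground truth. The paper's exact evaluation script is not given; the following scikit-image sketch is an illustrative way such per-image metrics can be computed.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def evaluate_pair(restored: np.ndarray, ground_truth: np.ndarray):
    """Compute PSNR and SSIM for one defogged image against its fog-free reference.

    Both images are expected as H x W x 3 uint8 arrays.
    """
    psnr = peak_signal_noise_ratio(ground_truth, restored, data_range=255)
    ssim = structural_similarity(ground_truth, restored, channel_axis=-1, data_range=255)
    return psnr, ssim
```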
Table 7. Quantitative analysis results compared with YOLY defogging algorithm.
Data Set | Quantitative Evaluation | YOLY | I-CycleGAN
Indoor data set | PSNR | 20.51 | 23.22
Indoor data set | SSIM | 0.8691 | 0.8809
Outdoor data set | PSNR | 23.92 | 25.72
Outdoor data set | SSIM | 0.8825 | 0.8859
Table 8. Quantitative results of the ablation study.
Model | PSNR | SSIM
CycleGAN | 21.67 | 0.8593
CycleGAN and SA | 22.83 | 0.8764
CycleGAN and ACMS | 22.43 | 0.8696
CycleGAN, SA, and ACMS | 23.22 | 0.8809
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
