1. Introduction
Haze and fog, which often appear in bad weather and result in dull colors and reduced contrast in captured images, are common atmospheric phenomena. Image dehazing helps to improve the clarity and visibility of such degraded images, and it is therefore widely used in autonomous driving, security monitoring, aerospace, and other fields to enhance their performance and safety. Since 2004, the number of publications on image defogging or dehazing has been growing steadily, and a large number of new methods have been proposed.
The essence of image defogging is to remove the adverse visual effects caused by haze or fog. The most intuitive way to address this issue is to employ traditional image enhancement techniques [1,2,3,4,5,6,7,8] to improve the contrast and saturation of foggy images. However, these methods do not work well on foggy images, because they ignore the fact that the degradation of a hazy image depends on the haze concentration. To alleviate this problem, numerous atmospheric scattering model (ASM)-based [9] image dehazing techniques [10,11,12,13,14,15,16,17,18,19,20,21] have been developed. These methods mainly rely on prior knowledge, such as the dark channel prior [10] and the color attenuation prior [12], to reduce the uncertainty of the ASM. They then use the estimated parameters to inversely restore the high-quality scene from a single image.
Recently, with the rapid development of artificial intelligence (AI), many deep learning-based image dehazing methods built on AI frameworks have been proposed. Compared with physical-model-based dehazing methods, this type of algorithm can achieve better results because of its powerful fitting ability. In general, these image dehazing methods either make use of different types of network architectures or employ different loss functions during training. Therefore, they can be roughly divided by network architecture and by training strategy. Within the training-strategy category, according to whether supervision information is available, they can be further divided into three classes: unsupervised image dehazing methods [22,23,24,25,26,27,28,29,30], supervised image dehazing methods [31,32,33,34,35,36], and semi-supervised image dehazing methods [37,38].
Although great progress has been made in image dehazing or defogging, to the best of our knowledge there are few survey papers in this field. Liu et al. [39] provide a detailed summary of classical defogging methods, including depth estimation, wavelets, enhancement, and filtering, but lack a detailed discussion of the latest neural network models. Xu et al. [40] highlight image recovery algorithms, contrast enhancement algorithms, and fusion-based defogging algorithms. Additionally, they describe current video-defogging algorithms while still omitting an introduction to deep learning-based defogging algorithms. Although Gui et al. [41] give a full summary and discussion of neural networks and loss functions, they lack a discussion of traditional defogging algorithms and a systematic classification of defogging methods. Ancuti et al. [42] review some supervised defogging models; however, they do not cover recent applications of unsupervised methods.
In this paper, we conduct a comprehensive overview of image dehazing or defogging techniques. Unlike the aforementioned surveys, this review first categorizes classic and state-of-the-art fog/haze removal algorithms in detail, covering traditional image enhancement approaches, physical-model-based defogging, network-architecture-based dehazing models, and training-strategy-based models. Moreover, we conduct qualitative and quantitative comparisons of each type of algorithm, aiming to point out their advantages and disadvantages and to offer an outlook that may advance the image defogging field.
The remainder of this paper is organized as follows. Following the introduction, Section 2 and Section 3 introduce non-deep learning defogging and deep learning defogging, respectively. In detail, Section 2 describes traditional image defogging algorithms and physical-model-based defogging, while Section 3 illustrates the network architectures and training strategies used for deep learning defogging. Section 4 conducts extensive performance evaluations of the state-of-the-art approaches mentioned above and illustrates the advantages and disadvantages of each type of algorithm. Finally, the conclusions and an outlook for the future are drawn in Section 5.
2. Non-Deep Learning Defogging
Early image defogging either simply enhances local or global contrast, or exploits hand-crafted priors on foggy images to achieve fog removal. The former, namely traditional image defogging algorithms, directly apply classical techniques, e.g., histogram equalization, signal analysis methods, and other contrast enhancement approaches. The latter, namely physical-model-based defogging, relies on a physical imaging model to estimate the imaging parameters and thereby achieves high-quality fog removal.
2.1. Traditional Image Defogging Algorithm
2.1.1. Histogram Equalization
The core idea of histogram equalization (HE) is to redistribute an image's intensity levels. The method first computes the histogram of the image and then generates the cumulative distribution function (CDF). Subsequently, an approximately uniform distribution is obtained by mapping the original pixel values to new values according to the CDF. As a result, the pixel values are effectively spread out across the intensity spectrum, thereby enhancing the visibility of image details. Considering the mechanism of histogram equalization, the relationship between gray levels and pixels can be written as
$$ s_k = T(r_k) = (L-1)\sum_{j=0}^{k} p_r(r_j), \qquad p_r(r_j) = \frac{n_j}{N}, \quad k = 0, 1, \ldots, L-1, $$
where $N$ is the total number of pixels in the image, $L$ represents the total number of gray levels, $\sum_{j=0}^{k} p_r(r_j)$ is the value of the cumulative distribution function corresponding to the gray level $r_k$, $p_r(r_j)$ is the probability density function value of the gray level $r_j$ in the original image ($n_j$ being the number of pixels with gray level $r_j$), $s_k$ is the equalized output level, and $T$ is the transform function [1].
Although HE is able to improve the global contrast of an image, it lacks a selection mechanism for the processed signal and therefore tends to amplify noise. Subsequently, adaptive HE (AHE) was proposed; it partitions the image into blocks and then applies the equalization to each block separately [2,3]. However, this brings a new issue, i.e., a heavy computational burden that reduces its real-time performance. Contrast-limited AHE (CLAHE), an improved version of AHE, uses bilinear interpolation [4] to mitigate these issues. Unfortunately, CLAHE still exhibits noticeable block artifacts and substantial changes in overall brightness.
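As a concrete illustration of these techniques, the following sketch applies global HE and CLAHE to the luminance channel of a foggy image using OpenCV; the file names and the CLAHE parameters (clip limit and tile size) are illustrative assumptions rather than settings taken from the cited works.

```python
import cv2

# Load a foggy image (path is illustrative) and work on the luminance channel,
# so that chrominance stays untouched and color casts are reduced.
bgr = cv2.imread("foggy.jpg")
ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
y, cr, cb = cv2.split(ycrcb)

# Global histogram equalization: redistributes intensities via the CDF mapping.
y_he = cv2.equalizeHist(y)

# CLAHE: block-wise equalization with a clip limit and bilinear interpolation
# between tiles, which limits the noise amplification of plain AHE.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
y_clahe = clahe.apply(y)

he_result = cv2.cvtColor(cv2.merge([y_he, cr, cb]), cv2.COLOR_YCrCb2BGR)
clahe_result = cv2.cvtColor(cv2.merge([y_clahe, cr, cb]), cv2.COLOR_YCrCb2BGR)
cv2.imwrite("foggy_he.jpg", he_result)
cv2.imwrite("foggy_clahe.jpg", clahe_result)
```

Working on the luminance channel rather than on each RGB channel independently is one common way to reduce the color shifts discussed above.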
2.1.2. Signal Analysis-Based Approach
The most representative method of this type is the well-known homomorphic filtering. In homomorphic filtering, the input image is decomposed into two components. The first component represents the spatially varying incident illumination, which changes slowly and mainly occupies the low-frequency regions of the foggy image. The second component encapsulates the scene reflectance perceived by the human eye, which carries the intricacies and details of the scene. To balance these two components, a logarithmic transformation is applied, and the key idea is to suppress the low-frequency (illumination) component while enhancing the high-frequency (reflectance) component [5]. Mathematically, homomorphic filtering can be formulated as
$$ g(x, y) = \exp\Bigl\{ \mathcal{F}^{-1}\bigl[ H(u, v)\, F(u, v) \bigr] \Bigr\}, $$
where $g(x, y)$ is the pixel value of the output image, $F(u, v)$ is the Fourier transform of the log-transformed input image in the frequency domain, $H(u, v)$ is the frequency response function of the homomorphic filter, and $(u, v)$ are the frequency-domain variables. $H(u, v)$ controls the degree of mixing of the low- and high-frequency components and is typically constructed from a low-pass filter $H_L(u, v)$ and a high-pass filter $H_H(u, v)$. The advantages of homomorphic filtering lie in the removal of multiplicative noise, the enhancement of contrast in adjacent regions, and the compression of the overall dynamic range of the image. However, there are still several notable disadvantages, e.g., it cannot deal with scenes under severe fog conditions and lacks the ability to preserve local details during processing. Therefore, to address such limitations, the Fourier transform is replaced by the wavelet transform in homomorphic filtering for high-quality fog removal. Compared with the Fourier transform, the wavelet transform provides a multi-dimensional representation connecting the spatial, temporal, and frequency domains. By incorporating the localized adaptation of the short-time Fourier transform and utilizing finite-length decaying wavelets, the wavelet transform significantly enhances the capacity to process non-stationary signals [6]:
$$ W_f(a, t) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(x)\, \psi^{*}\!\left(\frac{x - t}{a}\right) \mathrm{d}x, $$
where the parameter $a$ controls the wavelet's contraction (scale), while $t$ controls its translation. The wavelet transform outperforms Fourier-based homomorphic filtering in handling local image details and enriching the overall information, but it may alter image brightness and cause distortions such as blurred edges. To address these issues, the two-dimensional wavelet transform and threshold functions have been introduced. This combination separates high- and low-frequency components, emphasizing or eliminating detail levels to enhance the useful information. While it improves the contrast and information content of defogged images, it does not fully address image distortion and edge blurring.
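A minimal homomorphic-filtering sketch is given below for reference; the Gaussian high-emphasis transfer function and the gain and cutoff values used here are common textbook choices, not parameters prescribed by the surveyed papers.

```python
import numpy as np

def homomorphic_filter(gray, gamma_l=0.5, gamma_h=2.0, d0=30.0, c=1.0):
    """Suppress slowly varying illumination (low frequencies) and boost
    reflectance detail (high frequencies) of a grayscale image in [0, 1]."""
    rows, cols = gray.shape
    log_img = np.log1p(gray.astype(np.float64))        # I = i * r  ->  log i + log r
    spectrum = np.fft.fftshift(np.fft.fft2(log_img))   # centered spectrum

    # Gaussian high-emphasis transfer function H(u, v): gain gamma_l < 1 for
    # low frequencies, gamma_h > 1 for high frequencies, cutoff d0.
    u = np.arange(rows) - rows / 2.0
    v = np.arange(cols) - cols / 2.0
    d2 = u[:, None] ** 2 + v[None, :] ** 2
    h = (gamma_h - gamma_l) * (1.0 - np.exp(-c * d2 / (d0 ** 2))) + gamma_l

    filtered = np.fft.ifft2(np.fft.ifftshift(h * spectrum)).real
    return np.clip(np.expm1(filtered), 0.0, 1.0)       # back from the log domain
```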
2.1.3. Other Traditional Image Enhancement Methods Used for Image Defogging
In addition to the aforementioned image enhancement algorithms, there are other competitive alternatives that utilize partial differential equations (PDEs) [7]. These PDE-based methods integrate Laplace operators with Retinex algorithms. A notable advantage of PDE-based approaches is that they offer a more physically interpretable and principled formulation than the alternatives. By leveraging models of light propagation and scattering, PDE-based algorithms achieve a globally consistent fog removal effect. However, these methods also have certain disadvantages. In contrast to the histogram equalization and signal analysis-based defogging algorithms mentioned earlier, PDE-based methods exhibit higher computational complexity. Moreover, when handling complex scenes, the defogging process may cause overcompensation, which leaves some areas excessively bright or dark.
The Laplace operator-based algorithm aims to enhance image contrast through second-order differentiation for sharpening. The main advantage of this method is its low complexity. However, it proves less effective in challenging defogging scenarios, often resulting in severe noise and artifacts. Another approach, the Retinex algorithm, enhances images by separating them into albedo and illumination components, thereby emphasizing details and contrast to achieve fog removal. Implementations of Retinex include single-scale Retinex (SSR) [8] and multi-scale Retinex (MSR) [4]. MSR applies a convolution kernel at multiple scales, convolving the image to generate reflectance images across these scales. These images are subsequently fused, weighted, and averaged to produce the final defogging output. While the multi-scale Retinex algorithm enhances defogging capabilities for diverse scenes and objects compared with the single-scale approach, it also significantly increases computational complexity and may introduce noise and artifacts.
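To make the MSR procedure concrete, the following sketch averages single-scale Retinex outputs over several Gaussian scales; the scales and equal weights are typical defaults assumed here for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multi_scale_retinex(img, sigmas=(15, 80, 250), weights=None, eps=1e-6):
    """MSR on a float image in (0, 1]: weighted average of single-scale Retinex
    outputs, each being log(image) - log(Gaussian-blurred image)."""
    img = img.astype(np.float64) + eps
    weights = weights or [1.0 / len(sigmas)] * len(sigmas)
    out = np.zeros_like(img)
    for sigma, w in zip(sigmas, weights):
        # Blur spatially only; do not smear color channels into each other.
        s = (sigma, sigma, 0) if img.ndim == 3 else sigma
        blurred = gaussian_filter(img, sigma=s)
        out += w * (np.log(img) - np.log(blurred + eps))
    # Stretch back to [0, 1] for display.
    return (out - out.min()) / (out.max() - out.min() + eps)
```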
While these image enhancement-based algorithms have shown some effectiveness in defogging, their results still leave room for improvement. Consequently, researchers have tried to introduce imaging models or deep learning theory to achieve better restoration performance.
2.2. Physical-Model-Based Defogging
This type of algorithm belongs to the non-deep learning family, and its essence is to estimate the imaging parameters that are then used for fog removal. Figure 1 showcases the schematic diagram of physical-model-based defogging. As shown, hand-crafted prior knowledge, e.g., the dark channel prior (DCP), the color attenuation prior (CAP), and the gamma correction prior (GCP), is first imposed on the atmospheric scattering model (ASM) [9] to derive the transmission map $t$ and the atmospheric light $A$. Then, these estimated parameters, along with the original foggy image, are fed into the ASM to recover the haze-free scene.
2.2.1. Atmospheric Scattering Model (ASM)
Before describing physical-model-based defogging approaches, it is necessary to introduce the well-known ASM. Formally, the ASM can be expressed as
$$ I(x) = J(x)\,t(x) + A\bigl(1 - t(x)\bigr), $$
where $A$ is the global atmospheric light, $I$ is the hazy image, $J$ is the haze-free scene to be restored, and $t$ is the transmission. In detail, the transmission can be written as
$$ t(x) = e^{-k\,d(x)}, $$
where $k$ and $d(x)$ represent the atmospheric scattering coefficient and the scene depth, respectively. It is obvious from this equation that, in fog-free scenes, $k$ is very close to 0, which leads to $t(x) \approx 1$ and hence $I(x) \approx J(x)$ according to the ASM. When taking pictures in foggy scenes, $k$ cannot be ignored, and the light received by the detector is interfered with by fog. In this case, the collected light primarily originates from two sources: one is the target-reflected light attenuated by particles before reaching the sensor, while the other is the atmospheric light produced by particle scattering of the light source. Once the two parameters $A$ and $t$ have been determined, the haze-free scene can be easily restored by
$$ J(x) = \frac{I(x) - A}{t(x)} + A, $$
where a useful way of obtaining $A$ and $t$ is to impose prior knowledge or extra information on the ASM. In the following, several classic defogging algorithms are briefly outlined.
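Before turning to the individual priors, the recovery step that they all share can be sketched as follows; the lower bound t0 on the transmission is a common numerical safeguard assumed here, not part of the formulation above.

```python
import numpy as np

def recover_scene(hazy, transmission, atmospheric_light, t0=0.1):
    """Invert I = J*t + A*(1 - t):  J = (I - A) / max(t, t0) + A.
    hazy: HxWx3 float image in [0, 1]; transmission: HxW map;
    atmospheric_light: length-3 vector; t0 avoids division by near-zero t."""
    t = np.clip(transmission, t0, 1.0)[..., None]
    a = np.asarray(atmospheric_light, dtype=np.float64).reshape(1, 1, 3)
    recovered = (hazy.astype(np.float64) - a) / t + a
    return np.clip(recovered, 0.0, 1.0)
```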
2.2.2. Dark Channel Prior Image Dehazing
The dark channel prior (DCP) defogging algorithm, introduced by He et al. [10], is known for its superior performance compared with other prior-based defogging methods. He et al. observe that, in most non-sky regions of a fog-free image, at least one color channel contains pixels with very low intensity, e.g., due to shadows, colorful objects, or dark surfaces; this statistic is known as the DCP. Formally, it can be defined as
$$ J^{\mathrm{dark}}(x) = \min_{y \in \Omega(x)} \Bigl( \min_{c \in \{r, g, b\}} J^{c}(y) \Bigr), $$
where $c$ is the color index, $\Omega(x)$ represents the neighborhood centered at $x$, $J^{c}$ stands for each color channel, and $J^{\mathrm{dark}}$ represents the dark channel map. Apart from the sky regions, the intensity of $J^{\mathrm{dark}}$ is low and close to 0. Combining the DCP and the ASM, the transmission can be computed by
$$ t(x) = 1 - \omega \min_{y \in \Omega(x)} \Bigl( \min_{c} \frac{I^{c}(y)}{A^{c}} \Bigr), $$
where $\omega$ denotes the fog retention coefficient, which aims to preserve a small amount of fog in the distant regions of the original image. In this expression, the value of $\omega$ is set to 0.95, and the atmospheric light is estimated from the brightest pixels in the dark channel map. Once these imaging parameters are determined, the clear version can be recovered from a single foggy image by inverting the ASM as described above. It should be pointed out that the transmission obtained through this method generally offers high accuracy; however, its high complexity limits its practical application. To address this limitation, He et al. proposed the guided filter (GIF) [11], which improves the computational efficiency. Nevertheless, this approach still has some other drawbacks, e.g., it cannot properly handle images containing sky regions.
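A compact sketch of this pipeline (dark channel, atmospheric light from the brightest dark-channel pixels, and transmission with ω = 0.95) is given below; the patch size, the 0.1% selection ratio, and the use of a plain minimum filter instead of soft matting or guided filtering are simplifications assumed for illustration.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """Per-pixel minimum over color channels, then a local minimum filter."""
    return minimum_filter(img.min(axis=2), size=patch)

def estimate_atmospheric_light(img, dark, top_ratio=0.001):
    """Pick the brightest hazy-image pixel among the largest dark-channel values."""
    n_top = max(1, int(top_ratio * dark.size))
    idx = np.argsort(dark.ravel())[-n_top:]
    candidates = img.reshape(-1, 3)[idx]
    return candidates[candidates.sum(axis=1).argmax()]

def estimate_transmission(img, A, omega=0.95, patch=15):
    """t(x) = 1 - omega * dark_channel(I / A)."""
    return 1.0 - omega * dark_channel(img / A.reshape(1, 1, 3), patch)
```

Combined with the recovery function sketched in Section 2.2.1, this yields a basic single-image dehazer; the original method additionally refines the transmission map, e.g., with the guided filter [11].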
2.2.3. Color Attenuation Prior Image Defogging
As is well known, human perception can swiftly discern areas with or without fog, as well as distinguish between near and far distances, without relying heavily on additional data. Building upon this observation, Zhu et al. [12] propose the color attenuation prior (CAP) by assuming that the haze concentration correlates with the disparity between brightness and saturation. Mathematically, the CAP is expressed as
$$ d(x) = \theta_0 + \theta_1 v(x) + \theta_2 s(x) + \varepsilon(x), $$
where $\theta_0$, $\theta_1$, and $\theta_2$ are unknown linear coefficients, and $\varepsilon(x)$ represents a random variable that captures the inherent error of the model. Additionally, $d(x)$, $v(x)$, and $s(x)$ correspond to scene depth, brightness, and saturation, respectively. For simplicity, this approach further assumes a Gaussian distribution for $\varepsilon$ with zero mean and variance $\sigma^2$ (i.e., $\varepsilon \sim N(0, \sigma^2)$). By leveraging the properties of the Gaussian distribution, the above model can be rewritten as
$$ p\bigl(d(x) \mid x, \theta_0, \theta_1, \theta_2, \sigma^2\bigr) = N\bigl(\theta_0 + \theta_1 v(x) + \theta_2 s(x), \sigma^2\bigr). $$
To learn the coefficients $\theta_0$, $\theta_1$, and $\theta_2$ accurately, Zhu et al. further construct a joint conditional probability based on this distribution, i.e.,
$$ L = \prod_{i=1}^{n} p\bigl(d(x_i) \mid x_i, \theta_0, \theta_1, \theta_2, \sigma^2\bigr), $$
where $n$ is the total number of pixels within the training hazy images, $d(x_i)$ is the depth of the $i$-th scene point, and $L$ is the likelihood. The main advantage of this method is its highly effective performance. However, it may fail in certain cases, e.g., scenes with strong lighting or complex colors.
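The CAP depth estimate is linear in brightness and saturation and can be sketched as follows; the coefficient values below are illustrative placeholders, whereas the actual θ0, θ1, θ2, and σ are learned from training data as described above.

```python
import cv2
import numpy as np
from scipy.ndimage import minimum_filter

def cap_depth(bgr, theta0=0.12, theta1=0.96, theta2=-0.78, patch=15):
    """Estimate relative scene depth with the CAP linear model
    d(x) = theta0 + theta1 * v(x) + theta2 * s(x); a local minimum filter
    suppresses outliers. The coefficients here are illustrative placeholders."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float64) / 255.0
    s, v = hsv[..., 1], hsv[..., 2]
    depth = theta0 + theta1 * v + theta2 * s
    return minimum_filter(depth, size=patch)
```

The resulting depth map can then be converted into a transmission map via $t(x) = e^{-k\,d(x)}$ and fed into the recovery step of Section 2.2.1.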
2.2.4. Gamma Correction Prior Image Defogging
As discussed above, most currently available methods fail to accurately estimate the scene depth that is needed for transmission estimation. To this end, Ju et al. [13] first introduce a novel pre-processing technique called gamma correction preprocessing (GCP), which applies a gamma correction to each color channel of the hazy image to produce a virtual result. Having both the input image and the preprocessed result, single-image defogging can be subtly recast as multi-image defogging. Taking the ASM as the underlying theory, the imaging equations of the input image and the preprocessed result can be written jointly, where $s$ denotes the atmospheric light of the virtual result. By solving these equations, the scene depth can be computed in closed form; two very small positive constants are introduced in this expression, one to prevent the numerator from exceeding the definition domain of the function and the other to ensure that the denominator is not zero. For simplicity, Ju et al. further assume that the weather conditions do not change spatially. By substituting this assumption and the ASM into the derivation, the dehazed result can be expressed in terms of a coefficient whose range is restricted to $[0, 1]$ in order to prevent pixel overflow. Consequently, the modified expression for reconstructing the scene contents can be formulated with an albedo restoring function, abbreviated as dehaze(·). Note that dehaze(·) is a function of four parameters: the hazy input, a term that can be easily calculated from the reconstruction expression itself, the depth ratio obtained in the derivation above, and the only remaining unknown parameter, which is defined as a ratio of model constants. To estimate this parameter with low complexity but high accuracy, a global optimization function is designed, in which a vision indicator constructed via single- or multiple-image priors is evaluated on a version of the image down-sampled with coefficient $n$. With this estimated parameter, the clear version can be directly recovered from the reconstruction expression. Unlike other defogging methods that employ pixel-wise, patch-wise, scene-wise, non-local, or learning-based strategies, this technique makes use of a global strategy to achieve high-quality image defogging. Nevertheless, because it assumes that the weather conditions do not change spatially, it fails to deal with images containing non-homogeneous fog.
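Only the preprocessing step of this method is easy to sketch without reproducing the full derivation; the snippet below generates the virtual image by per-channel gamma correction, with the gamma value being an assumed illustrative choice, while the subsequent depth-ratio and global-parameter estimation follow the derivation in [13].

```python
import numpy as np

def gamma_correction_preprocess(hazy, gamma=2.0):
    """Create a 'virtual' hazy image by per-channel gamma correction, so that
    single-image defogging can be recast as a multi-image problem.
    hazy: float image in [0, 1]; gamma is an illustrative value."""
    hazy = np.clip(hazy.astype(np.float64), 0.0, 1.0)
    return hazy ** gamma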
2.2.5. Physical-Model-Based Defogging Using Other Prior Knowledge
Tan [15] assumes a fixed atmospheric light value in a local region and employs a Markov random field framework to maximize local contrast in foggy images. This is achieved by developing a cost function and estimating the optimal atmospheric light using graph segmentation knowledge. The algorithm effectively enhances image contrast and improves visibility; however, it may lead to color over-saturation after fog removal and introduce halo effects at certain boundaries.
Fattal [16] assumes that the reflectance in a local region remains constant and that the surface shading is locally statistically uncorrelated with the medium transmission. Nevertheless, accurate estimation can be challenging when the relevant components lack noticeable variation or when color information is limited.
Tarel et al. [17] proposed a fast fog removal algorithm that estimates the atmospheric veil (dissipation function) by means of median filtering. Regrettably, inappropriate parameter configurations of the median-filter-based estimation can introduce halo artifacts.
Ju et al. [18] explore a region line prior (RLP): when the image is divided into $n$ regions, the brightness of the hazy image and that of the fog-free image in each region are positively correlated with the scene depth. Combining the RLP with the ASM, they further propose a defogging algorithm. To solve the dim appearance of the results and better simulate outdoor hazy scenes, Ju et al. [43] also developed a simple yet effective image enhancement technique based on the gray-world hypothesis and an enhanced ASM.
Berman et al. [19] proposed a non-local image defogging algorithm based on the assumption that the colors of a haze-free image can be well approximated by a few hundred distinct colors. Since this algorithm operates on pixels rather than patches, it is highly efficient and exhibits a high-quality enhancement effect.
Oakley et al. [20] postulated the availability of scene depth information and utilized Gaussian functions to restore scene contrast by predicting the optical path. Importantly, their approach did not necessitate any weather-related predictions. However, the implementation conditions were demanding, requiring specific hardware devices for obtaining the depth of field (DOF).
Kopf et al. [21] employed a combination of hardware and software devices to acquire auxiliary information, thereby facilitating the collection of depth-of-field (DOF) and texture data. Despite their development of a novel system, it still fails to address the limitations associated with the requirement for specific equipment to obtain the DOF.
4. Experiment
In this section, to intuitively illustrate the advantages and disadvantages of different types of algorithms, their recovery performances are evaluated from qualitative and quantitative perspectives. First, we conducted a comparison of classic traditional methods (HE, AHE, CLAHE, Homomorphic Filtering [53], SSR, MSR, and Laplace [54]) to reveal the shortcomings of simply increasing contrast (we note that, although these traditional methods come from older literature, they represent the early evolution of image enhancement). Then, the results restored by physical-model-based methods (CAP [12], DCP [10], IDE [43], IDRLP [18], NLP [19], and TERAL [17]) were evaluated and compared. Subsequently, we quantitatively and qualitatively compared the results obtained by state-of-the-art deep learning-based technologies, including deep models employing different networks (AOD, Cycledehaze, DehazeNet, and GridDehazeNet) and deep models using different training strategies (C2P-Net, FFA, Taylor, UHD, Vision, DE, USID, SLA, and SDA-GAN), on various challenging hazy images. For fairness, the codes of the selected techniques were downloaded from the authors' homepages, and the parameters used in these techniques were optimized according to the corresponding references. Note that all of the experiments were implemented on a PC with an Intel(R) Core(TM) i5-4210U CPU @ 1.70 GHz, 8.00 GB RAM, and an NVIDIA 3090 Ti GPU (more detailed configurations of the compared algorithms are shown in Table 1), and the hazy images used in the experiments were collected from publicly available datasets (I-haze [55], O-haze [56], and SOTS [57]).
4.1. Performance Description of Non-Deep Learning Defogging
As discussed above, non-deep learning defogging mainly includes two types: traditional image defogging and physical-model-based defogging. To investigate the advantages and disadvantages of these two types of algorithms, we conducted extensive experiments on the O-haze, D-haze [42], I-haze, and DN-haze [58] datasets. We remark that these datasets were chosen because they contain different real-world scenes with different haze thickness distributions, which allows the performance of different algorithms to be examined more thoroughly.
4.1.1. Limitations of Traditional Image Defogging
Qualitative comparison: In this subsection, seven representative techniques, i.e., HE, AHE, CLAHE, Homomorphic Filtering, SSR, MSR, and Laplace, were selected to examine their performance on a variety of challenging hazy images. The comparison results are illustrated in Figure 5. As seen in Figure 5, HE is capable of dealing with most scenes; however, it may produce darkened results and suffer from severe artifacts. AHE may suffer from color cast, and CLAHE still leaves a significant amount of residual haze. The Homomorphic Filtering method causes severe color distortion on the given examples and yields disastrous results when processing high-brightness regions. SSR and MSR exhibit over-enhancement of the sky regions and color distortions in misty scenes. Although Laplace effectively enhances edges to a certain extent, it introduces noise and blurring.
Quantitative comparison: To reach a more comprehensive evaluation, the Peak Signal-to-Noise Ratio (PSNR) [59] and Structural Similarity (SSIM) [60] values of the representative algorithms (HE, AHE, CLAHE, Homomorphic Filtering, SSR, MSR, and Laplace) on the SOTS dataset are shown in Table 2. As seen from this table, all of the selected traditional image enhancement algorithms have a fast processing speed. However, their PSNR and SSIM scores remain low, which reveals that these methods lack the ability to remove the haze covering an image. Taking CLAHE as an example, on a few samples this method can reduce the effect caused by haze thanks to its local contrast enhancement capability, yet its enhanced results still exhibit blur and color cast.
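For reproducibility, the full-reference scores used throughout this section can be computed with scikit-image as sketched below; the file names are placeholders, and the channel_axis argument of structural_similarity requires scikit-image 0.19 or newer.

```python
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Placeholder file names: a dehazed result and its haze-free ground truth.
result = cv2.cvtColor(cv2.imread("dehazed.png"), cv2.COLOR_BGR2RGB)
gt = cv2.cvtColor(cv2.imread("ground_truth.png"), cv2.COLOR_BGR2RGB)

psnr = peak_signal_noise_ratio(gt, result, data_range=255)
ssim = structural_similarity(gt, result, channel_axis=2, data_range=255)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```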
4.1.2. Limitations of Physical-Model-Based Defogging
Qualitative comparison: Figure 6 shows the results dehazed by six representative techniques, including CAP, DCP, IDE, IDRLP, NLP, and TERAL. Note that the images used for comparison were also picked from the above datasets. As shown in Figure 6, the DCP-based approach shows its advantages on different datasets; however, the DCP algorithm can lead to over-saturation and color distortion. Among the other mainstream algorithms, CAP, IDE, and IDRLP all overmagnify the details of the image content and thus produce some undesirable artifacts in the dehazed results.
Quantitative comparison: To obtain a more reliable conclusion, the calculated PSNR and SSIM values for the CAP, DCP, IDE, IDRLP, NLP, and TERAL algorithms are summarized in Table 3. By comparison, it can be found that IDRLP has the best performance on the four datasets; in particular, its PSNR and SSIM scores are excellent and its running time is very short. However, this does not mean that it can serve as an ideal candidate for the image fog removal task, because the prior knowledge employed by these physical-model-based defogging methods may fail in some cases, thereby introducing negative effects into the enhanced results.
4.2. Performance Description of Deep Learning Defogging
4.2.1. Performance Analysis of Network Architecture
Qualitative comparison: In this subsection, we further examined the performance of different image defogging methods, including DehazeNet (using a CNN architecture), GridDehazeNet (using an attention architecture), AOD (using a ResNet architecture), and Cycledehaze (using a GAN architecture), on various challenging synthetic images. The corresponding experimental results of the dehazing models using different architectures are shown in Figure 7. It is easily noted from this figure that, regardless of the network architecture used, deep learning defogging tends to achieve better processing performance than non-deep learning defogging techniques. The key to this success can be attributed to the strong fitting ability of deep models. However, these architectures still have their drawbacks, e.g., the CNN cannot work well on dark regions and the GAN fails to deal with sky regions.
Quantitative comparison: To obtain a more reliable conclusion, the calculated PSNR and SSIM values for DehazeNet, GridDehazeNet, AOD, and Cycledehaze are summarized in Table 4. Note that these metric scores were averaged over all the results from the used datasets. Upon comparison, it is obvious that DehazeNet, which uses a CNN architecture, generally attains the best scores, while Cycledehaze, which employs a GAN, ranks last. This suggests that the unpaired data used to train the GAN cannot provide adequate fog features to the defogging model. On the other hand, the values in this table further evidence that deep learning defogging outperforms non-deep learning defogging techniques in terms of visual quality and quantitative scores.
4.2.2. Performance Analysis of Training Mode
Qualitative comparison: In the above, we have experimentally shown the impact of different network architectures on defogging performance. In fact, the training mode is also crucial to image fog removal models. Therefore, nine state-of-the-art methods (i.e., UHD, C2P-Net, Taylor, Vision, FFA, USID, DE, SLA, and SDA-GAN) were selected to examine the impact of different training modes. The corresponding results dehazed by these techniques are given in Figure 8. As expected, most of the selected methods can remove the haze cover in an image to some extent. However, they all have their own limitations. The methods using the supervised mode struggle to balance the enhancement quality between thick-haze and mist images, whereas the methods exploiting the unsupervised mode lack the ability to handle regions whose brightness is similar to the atmospheric light.
Quantitative comparison: To provide a more comprehensive evaluation, the calculated PSNR and SSIM values for the nine algorithms are summarized in Table 5 and Table 6. As can be seen from the tables, the defogging networks using the supervised mode are more robust than the ones using the unsupervised mode. However, their computational complexity is significantly higher than that of the networks using the unsupervised and semi-supervised modes.
Overall, non-deep learning defogging methods leverage either statistical image properties (traditional image enhancement) or the atmospheric scattering model combined with prior knowledge (physical-model-based defogging) to realize image fog removal. This makes them relatively straightforward to implement, with low algorithmic complexity and modest computational resource consumption. However, because of the limitations of statistical image properties and prior knowledge, they often fail to deal with complex scenes, especially images with uneven fog. For deep learning defogging, different network architectures used in defogging algorithms may exhibit different defogging effects, e.g., a CNN is able to effectively extract local features, while a transformer can exploit global features to enhance a single foggy image. Moreover, currently available defogging networks generally make use of supervised, unsupervised, or semi-supervised modes to train the designed network. According to the experimental results, the networks using the supervised mode achieve reliable performance on synthetic datasets, while they may fail in real-world scenarios. On the contrary, the models employing unsupervised and semi-supervised modes work well on scenes collected from the real world, yet they fail to handle synthetic images well.