4.1.2. Synthetic Datasets
Foggy Road Image DAtabase 2 (FRIDA2) [36] and D-HAZY [38] are two synthetic datasets that contain pairs of ground truth reference images and hazy images of the same scene. The FRIDA2 dataset contains 264 synthetic hazy images of 66 road scenes, all generated entirely by computer graphics. Each road scene comprises a ground truth reference and four associated hazy images corresponding to four different kinds of haze: homogeneous, heterogeneous, cloudy homogeneous, and cloudy heterogeneous. This dataset is designed for Advanced Driver Assistance Systems (ADAS) [38]. Since FRIDA2 solely contains computer-generated scenes, it may not be representative of real scenarios. Accordingly, the D-HAZY dataset, consisting of real ground truth reference images and synthesized hazy ones, is also used. It contains over 1400 images of real scenes and their corresponding depth maps captured by a Microsoft Kinect camera. The atmospheric scattering model is then applied, under the assumptions that the atmospheric light and the haze density are both uniform, to synthesize the hazy images.
As these datasets have ground truth reference images, Structural SIMilarity (SSIM) [41], Tone-Mapped image Quality Index (TMQI) [42], and Feature SIMilarity extended to color images (FSIMc) [43] are employed in the evaluation. Additionally, Fog Aware Density Evaluator (FADE) [44], a learned metric predicting the haze density in a scene, is also used to assess the visibility of the scene after haze removal. In this context, the smaller the FADE value, the larger the amount of haze that has been removed.
Suppose that $x$ and $y$ are two nonnegative image luminance signals. The SSIM measure of two images is calculated as follows:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $(\mu_x, \mu_y)$ and $(\sigma_x, \sigma_y)$ are the local averages and standard deviations of $x$ and $y$, respectively; $C_1$ and $C_2$ are constants for avoiding instability when $(\mu_x^2 + \mu_y^2)$ and $(\sigma_x^2 + \sigma_y^2)$ are very close to zero; and $\sigma_{xy}$ is the correlation coefficient between $x$ and $y$. The value of SSIM ranges from 0 to 1, in which a higher SSIM means the two images are more structurally similar.
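As an illustrative sketch, the SSIM formula can be computed with NumPy. This simplified version (the helper name `ssim_global` is ours, not from the cited works) evaluates the statistics globally over the whole image, whereas the standard SSIM of [41] uses local sliding windows and averages the resulting map:

```python
import numpy as np

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM over the whole image (no sliding window).

    c1 and c2 are the usual stabilizing constants for 8-bit images.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()  # cross-covariance sigma_xy
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return num / den
```

For identical images the numerator and denominator coincide, so the score is exactly 1; it decreases as the luminance, contrast, or structure of the two signals diverge.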
Moreover, Equation (28) gives the formula of TMQI, where $X$ and $Y$ now denote two color images:

$$\mathrm{TMQI}(X, Y) = a\,S^{\alpha} + (1 - a)\,N^{\beta} \quad (28)$$

where $N$ is the statistical naturalness measure, $S$ is the overall structural fidelity score, $a$ is used to adjust the relative importance of these two components, and $\alpha$ and $\beta$ determine their sensitivities. Even though $X$ and $Y$ are two color images, TMQI only evaluates their grayscale channel to assess how well structural information is preserved. TMQI takes on values within the range [0, 1], in which the higher, the better.
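The final TMQI score is just the weighted combination of Equation (28). A minimal sketch, assuming the structural fidelity $S$ and the naturalness $N$ have already been computed, and using the parameter values $a = 0.8012$, $\alpha = 0.3046$, $\beta = 0.7088$ reported in [42]:

```python
def tmqi_combine(s, n, a=0.8012, alpha=0.3046, beta=0.7088):
    """Combine structural fidelity s and statistical naturalness n
    into the TMQI score (Equation (28))."""
    return a * s**alpha + (1 - a) * n**beta
```

Since the weights sum to one, perfect fidelity and naturalness (s = n = 1) yield a score of exactly 1.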
FSIMc is developed based on the observation that the human visual system perceives an image mainly according to its low-level features, such as the phase congruency, the image gradient magnitude, and the chrominance similarity. FSIMc is calculated by:

$$\mathrm{FSIMc} = \frac{\sum_{x \in \Omega} S_L(x)\,[S_C(x)]^{\lambda}\,PC_m(x)}{\sum_{x \in \Omega} PC_m(x)} \quad (29)$$

where $S_L$ denotes the combined similarity measure of the phase congruency and the gradient magnitude similarities between two images, $S_C$ the chrominance similarity measure, $PC_m$ the coefficient weighting the importance of each location in the overall similarity between the two images, and $\lambda$ is a positive constant for adjusting the importance of the chrominance component. In Equation (29), $\Omega$ denotes the whole image domain. Since both SSIM and TMQI are designed to evaluate grayscale images only, FSIMc is additionally used to conduct a thorough assessment. Similarly, FSIMc lies between 0 and 1, where a higher FSIMc means that the compared image resembles the reference image to a greater degree.
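The pooling step of Equation (29) reduces the three maps to a single score. A sketch assuming $S_L$, $S_C$, and $PC_m$ are already available as arrays (computing phase congruency itself is beyond this snippet; $\lambda = 0.03$ is the value used in [43], and `fsimc_pool` is a hypothetical helper name):

```python
import numpy as np

def fsimc_pool(s_l, s_c, pc_m, lam=0.03):
    """Weighted pooling of precomputed FSIMc similarity maps (Equation (29))."""
    s_l, s_c, pc_m = (np.asarray(m, dtype=np.float64) for m in (s_l, s_c, pc_m))
    # Each pixel's similarity is weighted by its phase congruency importance.
    return (s_l * np.power(s_c, lam) * pc_m).sum() / pc_m.sum()
```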
FADE is a referenceless evaluation metric based on natural scene statistics (NSS), fog aware statistical features (FASFs), and the Mahalanobis distance. The feature domain is defined based on a total of twelve features, including NSS and FASFs [44]. Choi et al. [44] first collected 500 hazy images and another 500 haze-free images. The features extracted from these two image groups were then individually fitted to a multivariate Gaussian (MVG) model to calculate their corresponding mean vectors $(\nu_f, \nu_{ff})$ and covariance matrices $(\Sigma_f, \Sigma_{ff})$, respectively. To assess the perceptual haze density of a particular input image, its extracted features are also fitted to an MVG model. The resulting mean vector $(\nu)$ and covariance matrix $(\Sigma)$ are then employed to calculate the Mahalanobis-like distances from the input image to the hazy image group $(D_f)$ and the haze-free image group $(D_{ff})$, as shown in Equations (30) and (31):

$$D_f = \sqrt{(\nu - \nu_f)^T \left(\frac{\Sigma + \Sigma_f}{2}\right)^{-1} (\nu - \nu_f)} \quad (30)$$

$$D_{ff} = \sqrt{(\nu - \nu_{ff})^T \left(\frac{\Sigma + \Sigma_{ff}}{2}\right)^{-1} (\nu - \nu_{ff})} \quad (31)$$

Finally, FADE is calculated as the ratio in Equation (32):

$$\mathrm{FADE} = \frac{D_f}{D_{ff} + 1} \quad (32)$$

It therefore takes positive values, where smaller is better.
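The distance and ratio computations translate directly into NumPy. In this sketch, `mvg_distance` and `fade_score` are hypothetical names, and the twelve-dimensional feature extraction of [44] is assumed to have been done already:

```python
import numpy as np

def mvg_distance(nu1, nu2, sigma1, sigma2):
    """Mahalanobis-like distance between two MVG fits (Equations (30), (31))."""
    d = np.asarray(nu1, dtype=np.float64) - np.asarray(nu2, dtype=np.float64)
    pooled = (np.asarray(sigma1) + np.asarray(sigma2)) / 2.0  # averaged covariance
    return float(np.sqrt(d @ np.linalg.inv(pooled) @ d))

def fade_score(nu, sigma, nu_f, sigma_f, nu_ff, sigma_ff):
    """Perceptual haze density: ratio of the distances to the hazy and
    haze-free corpus models (Equation (32))."""
    d_f = mvg_distance(nu, nu_f, sigma, sigma_f)
    d_ff = mvg_distance(nu, nu_ff, sigma, sigma_ff)
    return d_f / (d_ff + 1.0)
```

An image whose feature statistics coincide with the hazy corpus gives $D_f = 0$ and hence a FADE of 0 is never reached in practice; the +1 in the denominator keeps the ratio finite even for images statistically identical to the haze-free corpus.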
Table 2 and Table 3 display the average SSIM, TMQI, FSIMc, and FADE results on the FRIDA2 and D-HAZY datasets, respectively. The best result is marked in bold. Since the luminance information contributes toward the calculation of SSIM, TMQI, and FSIMc, the noticeable background noise of HPM and the halo artifacts of FFD increase these quantitative scores to a certain extent. Thus, they exhibit fairly good performance in terms of SSIM, TMQI, and FSIMc, but not FADE. Alternatively, CAP possesses the best FADE score and high TMQI and FSIMc. This is because CAP often excessively removes haze from hazy images, bringing about better colorfulness restoration. Removing too much haze, however, also causes a loss of image details, resulting in CAP's low SSIM. Regarding DCP, since the FRIDA2 dataset contains images of road scenes with sky regions, DCP shows poor performance. The proposed ICAP is the best performing method under the SSIM, TMQI, and FSIMc metrics and the second best in terms of FADE. This is attributed to the adaptive constraints on the lower limit of the transmission map, which avoid the excessive haze removal that occurs in CAP. According to Table 3, however, ICAP is only the third best under the SSIM and FSIMc metrics, though the best under the FADE metric. This is because D-HAZY is a biased dataset that solely consists of indoor images, while our proposed ICAP has been tuned to achieve the best performance in both outdoor and indoor scenes. The fact that DCP performs best on the D-HAZY dataset shows that DCP is a fine prior for indoor dehazing, and our experimental results are consistent with those presented by Ancuti et al. [38].
4.1.3. Real Datasets
The IVC [37], O-HAZE [39], and I-HAZE [40] image datasets are used in this section to assess the dehazing performance on hazy images of real scenes. The IVC dataset contains 25 real hazy images of people, animals, landscapes, and indoor and outdoor scenes, without their corresponding ground-truth haze-free images. Conversely, the O-HAZE dataset consists of 45 pairs of hazy and haze-free images of outdoor scenes, and the I-HAZE dataset comprises 30 pairs of hazy and haze-free images of indoor scenes. The haze in the O-HAZE and I-HAZE scenes was created by a vapor generator.
Due to the lack of reference images in the IVC dataset, the rate of new visible edges $(e)$ and the quality of the contrast restoration $(\bar{r})$, proposed by Hautiere et al. [45], together with FADE, are used to evaluate the performance of dehazing approaches. Since O-HAZE and I-HAZE possess reference images, SSIM, TMQI, FSIMc, and FADE are used as in the preceding section. The values of $e$ and $\bar{r}$ are determined by the following equations:

$$e = \frac{n_r - n_o}{n_o} \quad (33)$$

$$\bar{r} = \exp\!\left[\frac{1}{n_r} \sum_{i \in \wp_r} \log r_i\right] \quad (34)$$

where $n_r$ and $n_o$ denote the numbers of visible edges in the restored image and the original image, respectively. Thus, $e$ assesses the ability of an algorithm to restore edges that are visible in the restored image but not in the original one [45]. In Equation (34), $\wp_r$ denotes the set of visible edges in the restored image, and $r_i$ is the ratio determining the improvement of the visibility level. Therefore, higher values of $e$ and $\bar{r}$ are desired in an image processing task.
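The two blind metrics can be sketched as follows, assuming the visible-edge sets have already been extracted (e.g., with the 5% local-contrast criterion of [45]); `edge_metrics` is a hypothetical helper, and `ratios` holds the visibility-level ratio $r_i$ of each visible edge in the restored image, so its length equals $n_r$:

```python
import numpy as np

def edge_metrics(n_o, n_r, ratios):
    """Rate of new visible edges e (Equation (33)) and geometric-mean
    contrast restoration r_bar (Equation (34))."""
    ratios = np.asarray(ratios, dtype=np.float64)
    e = (n_r - n_o) / n_o
    # Mean of log ratios over the n_r visible edges, i.e., their geometric mean.
    r_bar = float(np.exp(np.log(ratios).mean()))
    return e, r_bar
```

Because $\bar{r}$ is a geometric mean of contrast ratios, a value above 1 indicates that visibility has improved on average across the restored edges.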
Table 4, Table 5 and Table 6 show the average evaluation metric results on the IVC, O-HAZE, and I-HAZE datasets. Similarly, the best results are shown in bold. The values of $e$ and $\bar{r}$ are directly proportional to the number of newly restored edges in the dehazed image. However, Hautiere et al. [45] stated that any set of edges with a local contrast above 5% is assumed to be visible. Accordingly, in methods susceptible to background noise, such as FFD and HPM, noise may be misinterpreted as edges and thereby adversely affect the accuracy of $e$ and $\bar{r}$. Hence, even though ICAP has lower $e$ and $\bar{r}$ results than FFD and HPM, it may still be the best performing method when its resistance to background noise is taken into account. This can be observed in the qualitative evaluation in the next section. On the O-HAZE dataset, our proposed ICAP is the best method in terms of TMQI and FADE, and the second best under the SSIM and FSIMc metrics. The dehazing performance is even more impressive on the I-HAZE dataset, where the proposed ICAP exhibits the best dehazing power under the SSIM, TMQI, and FSIMc metrics. The clear superiority of ICAP over CAP is attributed to all the improvements presented in Section 3. The adaptive linear weight for handling color distortion makes ICAP outperform CAP in dark areas, resulting in a smaller FADE, except on the I-HAZE dataset, where FADE = 0.8053 may be too low, implying a loss of image details. This, coupled with the adaptive constraints on the transmission map and the background noise reduction, leads to the higher TMQI and FSIMc. In addition, the adaptive tone remapping post-processing also contributes toward the high TMQI and FSIMc scores.