Article

SCL-Dehaze: Toward Real-World Image Dehazing via Semi-Supervised Codebook Learning

1 College of Artificial Intelligence, Shenyang Aerospace University, Shenyang 110136, China
2 College of Electrical Engineering, Shenyang University of Technology, Shenyang 110870, China
3 College of Automation, Shenyang Aerospace University, Shenyang 110136, China
* Authors to whom correspondence should be addressed.
Electronics 2024, 13(19), 3826; https://doi.org/10.3390/electronics13193826
Submission received: 26 August 2024 / Revised: 24 September 2024 / Accepted: 25 September 2024 / Published: 27 September 2024
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)

Abstract

Existing dehazing methods struggle with real-world hazy images, especially scenes with thick haze. A main reason is the lack of real-world paired data and robust priors. To improve dehazing in real-world scenes, we propose a semi-supervised codebook learning dehazing method. The codebook serves as a strong prior to guide the recovery of hazy images. However, two issues arise when a codebook is applied to image dehazing: (1) latent-space features encoded from degraded hazy images suffer from matching errors during nearest-neighbor matching, and (2) it is difficult to maintain a good balance between recovery quality and fidelity for heavily degraded, dense-haze images. To reduce the nearest-neighbor matching error rate in the vector quantization stage of VQGAN, we design the unite dual-attention residual transformer module (UDART) to correct the latent-space features so that they lie closer to those of the corresponding clear image. To balance the quality and fidelity of the dehazing result, we design a haze density guided weight adaptive module (HDGWA), which adaptively adjusts the multi-scale skip connection weights according to haze density. In addition, we use mean teacher, a semi-supervised learning strategy, to bridge the domain gap between synthetic and real-world data and enhance generalization to real-world scenes. Comparative experiments show that our method achieves improvements of 0.003, 2.646, and 0.019 over the second-best method for the no-reference metrics FADE, MUSIQ, and DBCNN, respectively, on the real-world dataset URHI.

1. Introduction

The presence of haze often degrades images captured by outdoor optical sensors, such as those used in video surveillance and remote sensing, leading to a loss of texture details and reduced contrast. This degradation adversely affects the subsequent use of the collected data. For example, the complexity of the physical environment at urban road intersections is crucial for navigation tasks [1]. Consequently, image dehazing research has gained significant attention in recent years.
The emergence of deep learning has significantly improved the performance of dehazing methods. Methods employing CNN or transformer architectures perform end-to-end mapping from hazy to clear images, but they only perform well on synthesized hazy images or low-haze-density scenes. In real-world or thick-haze conditions, these end-to-end dehazing methods [2,3,4], which lack additional guidance, usually yield sub-optimal recovery results. To improve recovery quality and reduce the uncertainty of the mapping between hazy and clear images, some deep learning dehazing methods introduce traditional priors to guide the dehazing process. For example, PSD [5] employs a prior loss committee in the fine-tuning process, including constraints such as the dark channel prior (DCP) [6] and the bright channel prior (BCP). Fusing traditional priors can partly reduce the uncertainty of the end-to-end mapping. However, most of these traditional priors have limitations and do not provide sufficient guidance for recovering hazy images; the DCP assumption, for instance, does not hold at all in sky regions. Thus, this kind of method is not robust, especially when the prior assumption fails, and the mapping uncertainty remains.
Recent studies of discrete generative prior codebooks [7,8,9,10,11,12] have shown exciting performance in image restoration, exhibiting fewer mode collapses and more stable training. The learnable codebook provides a strong prior for compressing and reconstructing natural images. However, the intrinsic textures of low-quality inputs are usually corrupted: information loss and diversity degradation of the input image inevitably distort the feature distribution and hinder accurate feature matching. The work in [9] demonstrated that features of degraded images do not cluster well around the exact code but spread to other nearby code clusters.
Therefore, the codes obtained by nearest-neighbor matching are unreliable during the vector quantization stage of image recovery. To improve fidelity, discrete generative priors in codebook form require skip connections between the encoder and decoder. However, these skip connections introduce artifacts in the result when the input is severely degraded. In short, codebook learning-based approaches face two difficulties when applied to image restoration tasks: (1) finding accurate latent vectors in degraded content to match with the codebook, and (2) trading off the fidelity and quality of the restored result.
Early codebook learning was mainly applied to image compression: efficient compression can be achieved by mapping image blocks to sparse representations in the codebook, and during decompression an approximate original image can be reconstructed, effectively reducing storage space and transmission bandwidth while maintaining image quality. For example, VQVAE [13] achieves efficient image compression and reconstruction by learning a discrete codebook and using a variational autoencoder to learn the mapping between the image and the codebook. With the development of deep learning techniques, codebook learning is gradually being applied to image restoration to reconstruct blurred, noisy, or corrupted images [7,8,9,14]. Codebook learning has shown advantages in tasks such as denoising, deblurring, and image reconstruction, mainly due to its ability to deal with uncertainty in the input data, thus producing high-quality results with more realistic and complex textures. For example, VQGAN [15] achieves high-quality image generation and reconstruction by learning a context-rich codebook through a convolutional neural network and combining adversarial training with perceptual loss.
Moreover, in real-world hazy scenes, the domain gap between synthesized and real-world haze images presents a significant challenge. Supervised methods attain notable performance on synthesized datasets, but they overfit the training data and generalize poorly to real-world scenes. The advantage of unsupervised dehazing methods [5,16,17,18,19,20] lies mainly in eliminating the dependence on paired image data: such algorithms can be trained with unlabeled real-world data and generalize well to real-world hazy scenes. However, unsupervised methods rely heavily on prior knowledge of image scenes and lack strong constraints during training, resulting in suboptimal dehazing effects.
In this work, we propose a semi-supervised codebook learning framework for dehazing. To address the problems of codebook learning in image restoration, we make the following design choices. In the discretization stage, to reduce the error rate of nearest-neighbor matching between the codebook and the latent features of degraded images, we design the unite dual-attention residual transformer module (UDART). The UDART corrects the latent features at the encoding stage so that their distribution is more consistent with that of the clear image. At the same time, the latent-space recovery results obtained from nearest-neighbor matching are corrected in the decoding stage, with the aim of obtaining recovery results that are more consistent with the clear image. With these two corrections, our method produces fewer haze residuals in the dehazing results than existing methods. To ensure the fidelity and quality of the recovery results, we design the haze density guided weight adaptive (HDGWA) module to adaptively adjust the weights of skip connections based on the haze density of the input image. To bridge the domain gap between synthesized and real-world data, we adopt the mean teacher strategy for semi-supervised dehazing learning. Inspired by semi-supervised underwater image restoration [21], we establish an optimal pseudo-label bank for unlabeled real-world data, storing the best dehazing results obtained during training; these results serve as pseudo-labels for the consistency constraint in the unsupervised branch. No-reference evaluation metrics are selected through experimentation to evaluate the dehazing results. In summary, our main contributions are as follows:
1. We design a dehazing network for real-world data that uses a codebook as a strong prior to guide the image dehazing process.
2. To address erroneous nearest-neighbor matching caused by features extracted from low-quality hazy images, we devise the UDART module to obtain high-quality latent features.
3. To achieve a flexible trade-off between restoration quality and fidelity, we propose the HDGWA module, which adaptively adjusts the weights of skip connections based on the haze density of the input image.
4. To bridge the domain gap between synthesized and real-world haze images, we adopt the mean teacher strategy for semi-supervised dehazing learning. The real-world unlabeled images used during training enhance the model's generalization to real-world scenes.
In Section 2, we introduce the codebook learning algorithm and the mean teacher strategy that most inspired the method presented in this paper. In Section 3, we elaborate on the proposed SCL-Dehaze, UDART, and HDGWA. In Section 4, we perform quantitative and qualitative comparisons against current state-of-the-art algorithms and conduct ablation studies on the proposed network structure. In Section 5, we summarize our findings and outline potential directions for future research.

2. Related Work

2.1. Codebook Learning

In the VQGAN (Vector Quantized Generative Adversarial Network) [15] model, the “codebook” refers to a collection of vectors that are encoded through learning the features of the input dataset. The fundamental principle of the codebook involves mapping image feature representations to a discrete, predefined codebook. Initially, an encoder transforms the image into low-dimensional feature representations. These features are then quantized, meaning each feature vector is mapped to the nearest codeword in a fixed-size codebook, effectively discretizing the continuous feature space into a finite set of codes. The quantized features are subsequently decoded back into image space by the decoder, aiming to reconstruct an image as close to the original as possible. During training, the generator and discriminator engage in adversarial training to optimize the model. Simultaneously, the codebook’s codewords are adjusted based on the features’ needs, enhancing the quality of image reconstruction. Thus, the codebook in the VQGAN aids in preserving crucial visual information and improving the quality and stability of the generated images by discretizing the feature representations.
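To make the quantization step above concrete, the following minimal PyTorch sketch maps encoder features to their nearest codebook entries; the tensor shapes, function name, and toy dimensions are illustrative assumptions rather than the exact VQGAN implementation.

```python
import torch

def nearest_neighbour_quantize(z_hat: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map each latent vector in z_hat (B, N, C) to its nearest codeword in codebook (K, C)."""
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 - 2ab + ||b||^2, shape (B, N, K).
    dists = (
        z_hat.pow(2).sum(-1, keepdim=True)
        - 2 * z_hat @ codebook.t()
        + codebook.pow(2).sum(-1)
    )
    indices = dists.argmin(dim=-1)   # nearest-neighbour code index per latent vector
    return codebook[indices]         # quantized features, shape (B, N, C)

# Toy usage: a 4x4 latent grid (16 vectors) with 64-dim features and a 512-entry codebook.
z_hat = torch.randn(2, 16, 64)
codebook = torch.randn(512, 64)
print(nearest_neighbour_quantize(z_hat, codebook).shape)  # torch.Size([2, 16, 64])
```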
This work draws inspiration from the use of codebook learning in image restoration and applies it to the task of image dehazing. We employ a two-stage approach for dehazing. In the first stage, the VQGAN is used to train the codebook and decoder on large datasets, establishing a prior for the dehazing process. In the second stage, we introduce the SCL-Dehaze method to address the specific challenges encountered when applying the codebook to image dehazing. This method effectively removes haze from images and restores clear, real-world scenes.

2.2. Mean Teachers

In the realm of semi-supervised learning, the mean teacher approach has garnered considerable attention for its innovative training strategy. Built upon the “teacher–student” model framework, this method effectively leverages both labeled and unlabeled data through the construction of two structurally identical networks: the student network and the teacher network. During training iterations, the parameters of the student network are updated, while the parameters of the teacher network are updated by the exponential moving average (EMA) of the student’s weights, ensuring training stability. The mean teacher method minimizes the consistency loss between the outputs of these two networks on unlabeled data, thereby enhancing generalization ability to unseen samples. Notable advantages of this approach include improved generalization performance, training stability, wide applicability, and efficient utilization of unlabeled data. Particularly in scenarios where labeled data are scarce and unlabeled data are abundant, the mean teacher method shows significant potential in enhancing model performance. Its concise and efficient strategy has led to successful applications in various deep learning tasks, such as image classification and speech recognition, demonstrating adaptability and effectiveness.
In the field of image dehazing, integrating the mean teacher method into the training process offers a fresh approach. The strategy harnesses the large volume of unlabeled hazy image data that are readily available in the real world, unlike labeled data comprising corresponding clear images, which can be challenging to obtain. The mean teacher method provides a way to obtain pseudo-labels for unlabeled hazy images and utilizes a consistency loss to improve the accuracy and robustness of the network. Specifically, it constructs a teacher model that outperforms the student model through an exponential moving average (EMA) strategy, and the teacher's predictions are used as pseudo-labels to guide the student's training.

3. Methods

In this section, we introduce the dehazing architecture designed in our work. To enhance output quality, reduce uncertainty in the haze-to-clear image mapping, and supplement high-quality details, we design a dehazing network based on codebook priors. The backbone network structure of our method is illustrated in Figure 1. Addressing the issue of erroneous nearest-neighbor matching caused by features extracted from degraded haze images, we devised UDART to obtain high-quality latent features, thereby minimizing incorrect nearest-neighbor matches. To better focus on thick-haze pixels and essential channel information, the UDART module is composed of four dual-attention residual transformer blocks, specifically designed in this study to enhance the dehazing process performance. Additionally, to balance fidelity and restoration quality in the recovered images, we designed the HDGWA to adjust the weights of skip connections flexibly. In terms of learning strategy, to bridge the domain gap between synthesized and real-world data, we adopt the mean teacher strategy for semi-supervised dehazing learning. The overall Semi-supervised Codebook Learning framework (SCL-Dehaze) of the proposed strategy is depicted in Figure 2.

3.1. Adaptive Dehazing Network with Codebook Learning

The encoder–decoder architecture has achieved significant success in image restoration and enhancement. Typically, an encoder–decoder progressively maps the input to a low-resolution representation and then gradually maps it back to the original resolution. Although this enables learning rich contextual information by reducing the spatial resolution of feature maps, a drawback is that fine spatial details are hard to recover during the decoding stage, leading to loss and corruption of local textures and details for low-quality hazy inputs. This challenge is particularly critical in tasks like dehazing, where preserving fine details is crucial for generating high-quality images. To address this problem, the VQGAN [15] introduces a codebook to reduce the uncertainty of the recovery mapping and improve the quality of the recovered images; adversarial supervision is also introduced to further improve the perceptual quality of the reconstruction. The learned discrete codebook helps to improve the performance of many low-level vision tasks, including the reconstruction of blurred, noisy, or corrupted images [11,12,13,22]. The VQGAN's ability to preserve fine details and handle uncertainty makes it a suitable choice for our dehazing task, as it effectively addresses the common issue of texture loss in low-quality inputs. We therefore use VQGAN as the architecture to obtain pre-trained discrete codebooks that help improve dehazing performance.
In stage 1 of our training process, we train VQGAN [15] on a large dataset of clear images to obtain a rich contextual codebook, which enhances the network’s expressiveness and robustness. In the second stage, we freeze the codebook obtained in the first stage along with its corresponding decoder and train the dehazing model on a dataset of hazy images.
Stage 1 Codebook Learning: We begin with a concise overview of the VQGAN [15]. As illustrated in Figure 3, the clean input image $I_c$ is initially fed into the encoder E to produce its output feature $\hat{z}$. Subsequently, the discrete representation of $\hat{z}$ is computed by finding the nearest neighbor of each element in the codebook Z:
$z_q = q(\hat{z}) = \arg\min_{z_i \in Z} \lVert \hat{z} - z_i \rVert_2$ (1)
The reconstruction $I_{rec} \approx I_c$ is then given by:
$I_{rec} = G(z_q) = G(q(E(I_c)))$ (2)
Training Objective: To train the autoencoder and codebook, we employ three image-level losses: reconstruction loss $\mathcal{L}_1$, perceptual loss $\mathcal{L}_{per}$ [23], and adversarial loss $\mathcal{L}_{adv}$ [22], computed using $I_c$ and $I_{rec}$. Since the quantization in Equation (1) is non-differentiable, we employ the straight-through gradient estimator from [13,15], which directly copies gradients from the decoder G to the encoder E. This approach facilitates backward propagation and enables end-to-end training using the loss function:
$\mathcal{L}_{VQ}(E, G, Z) = \lVert I_c - I_{rec} \rVert_2 + \lVert \mathrm{sg}[E(I_c)] - z_q \rVert_2^2 + \beta \lVert \mathrm{sg}[z_q] - E(I_c) \rVert_2^2$ (3)
where $\mathrm{sg}[\cdot]$ denotes the stop-gradient operation, $\mathcal{L}_{rec} = \lVert I_c - I_{rec} \rVert_2$ corresponds to the first term, and $\beta = 0.25$ is a hyperparameter controlling the frequency of codebook updates.
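As an illustration of the straight-through trick and the code-level terms of Equation (3), here is a minimal PyTorch sketch; the function name and the use of averaged squared errors for the squared-norm terms are assumptions for readability, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def stage1_vq_loss(z_hat, z_q, i_c, i_rec, beta=0.25):
    """Sketch of Equation (3) plus the straight-through gradient estimator.

    z_hat: encoder output E(I_c); z_q: quantized features;
    i_c / i_rec: the clean input and its reconstruction.
    """
    # Straight-through estimator: the forward pass uses z_q, gradients flow back to z_hat.
    z_q_st = z_hat + (z_q - z_hat).detach()

    rec = torch.norm(i_c - i_rec, p=2)                     # ||I_c - I_rec||_2
    codebook_term = F.mse_loss(z_q, z_hat.detach())        # averaged ||sg[E(I_c)] - z_q||^2 term
    commit_term = beta * F.mse_loss(z_hat, z_q.detach())   # averaged commitment term, weighted by beta
    return z_q_st, rec + codebook_term + commit_term
```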
With the above image-level and code-level losses, we can summarize the training objective in Stage I as:
$\mathcal{L}_{stage1} = \mathcal{L}_{VQ} + \mathcal{L}_{per} + \lambda \mathcal{L}_{adv}$ (4)
where the loss weight $\lambda$ is set to 0.1.
Stage 2 Dehazing Model Learning: Owing to the texture loss in hazy images, the nearest-neighbor (NN) matching in Equation (1) often fails to find accurate recovery codes. To alleviate this issue, we design the UDART module to predict latent features closer to those of clear images. Additionally, after codebook matching, we incorporate another UDART module to obtain higher-quality latent-space recovery results. To achieve a flexible balance between restoration quality and fidelity in the decoder stage, we propose the HDGWA module, which adaptively adjusts skip connection weights based on the haze density of the input image.
Training Objective: In this stage, we keep the codebook Z and the decoder fixed, and train the encoder and the UDART modules. From the first stage, we have the encoded result $\hat{z}$ of clear images and the discretized features $z_q$, which serve as the ground truth for training UDART. We employ an $L_2$ loss and a Gram matrix loss for training:
$\mathcal{L}_{UDART(E)} = \beta \lVert \hat{z}' - \hat{z} \rVert_2^2 + \alpha \lVert \psi(\hat{z}') - \psi(\hat{z}) \rVert_2^2$ (5)
The loss of the decoding stage can be expressed similarly:
$\mathcal{L}_{UDART(D)} = \beta \lVert z_q' - z_q \rVert_2^2 + \alpha \lVert \psi(z_q') - \psi(z_q) \rVert_2^2$ (6)
where $\hat{z}'$ and $z_q'$ denote the features corrected by UDART at the encoding and decoding stages, respectively, $\psi(\cdot)$ computes the Gram matrix of the features, and $\alpha$ and $\beta$ are the respective weights.
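The Gram-matrix (style) term in Equations (5) and (6) can be sketched in PyTorch as follows; the flattening convention, normalization, and default weights are assumptions made for illustration.

```python
import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a feature map of shape (B, C, H, W), normalized by its size."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # (B, C, C)

def udart_feature_loss(pred, target, alpha=1.0, beta=1.0):
    """L2 term plus Gram-matrix (style) term used to supervise corrected latent features."""
    l2 = torch.mean((pred - target) ** 2)
    style = torch.mean((gram_matrix(pred) - gram_matrix(target)) ** 2)
    return beta * l2 + alpha * style
```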
The Gram matrix loss, also known as style loss, has been demonstrated to be beneficial in restoring textures [24]. Additionally, we used reconstruction loss, perceptual loss and GAN loss to calculate the input and output losses of the backbone. In summary, the total loss for the dehazing model learning stage is:
$\mathcal{L}_{stage2} = \mathcal{L}_{rec} + \mathcal{L}_{UDART(E)} + \mathcal{L}_{UDART(D)} + \mathcal{L}_{per} + \lambda \mathcal{L}_{adv}$ (7)
where the loss weight $\lambda$ is set to 0.1.

3.2. Unite Dual-Attention Residual Transformer

In the backbone network, the latent-space feature reconstruction is obtained by nearest-neighbor feature matching (NNFM). During NNFM, the latent-space features obtained from low-quality images produce matching errors. We therefore design the UDART module, composed of four dual-attention residual transformer (DART) blocks, to correct the latent-space features obtained at the dehazing stage. The schematic diagram of the DART is shown in Figure 4.
In the DART block, the feature X is first passed through a simple convolutional structure, and the convolutional result X' is processed by two attention branches in the spatial and channel dimensions. Branch 1 exploits the spatial dependencies of the convolutional features to generate a spatial attention map and uses it to recalibrate X'. Branch 2 exploits the channel-wise relationships of the convolutional feature maps through squeeze-and-excitation operations. The parallel Branches 1 and 2 make pixel-wise features more valuable for dehazing, and residual connections are introduced to preserve part of the input information of the UDART.
Because convolution is inherently local and has limited global context modeling, which can restrict dehazing performance, we introduce transformer blocks after the parallel Branches 1 and 2. Inspired by [25], which uses the residual Swin transformer block (RSTB) for deep feature extraction, we also employ the RSTB in the DART block. Finally, we combine the RSTB, with its strong global context modeling capability, the convolutional structure, and the parallel Branches 1 and 2 to form the DART. Ablation experiments demonstrate that the UDART module composed of DART blocks reduces the NNFM error rate and gains a strong ability to remove dense haze.
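The block described above can be summarized with the schematic PyTorch sketch below; the kernel sizes, reduction ratio, and the identity placeholder standing in for the RSTB are assumptions, since the exact configuration is given only in Figure 4.

```python
import torch
import torch.nn as nn

class DARTBlock(nn.Module):
    """Schematic DART block: conv -> parallel spatial/channel attention -> RSTB -> residual."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Branch 1: spatial attention map generated from the convolutional features.
        self.spatial = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())
        # Branch 2: squeeze-and-excitation channel attention.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.rstb = nn.Identity()  # placeholder for the residual Swin transformer block [25]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.conv(x)
        attended = feat * self.spatial(feat) + feat * self.channel(feat)
        return x + self.rstb(attended)  # residual path preserves part of the input information
```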

3.3. Haze Density Guided Weight Adaptive Module

Due to the information loss caused by vector quantization, the fidelity of the results is often reduced. To enhance fidelity, it is common to introduce skip connections between the encoder and decoder [26,27,28]. However, this design may introduce artifacts when the hazy image is severely degraded. To ensure that the final results maintain high fidelity and remain free from artifacts, we design a haze density guided weight adaptive module (HDGWA) that adaptively adjusts the skip connection weights.
Specifically, during the training process, we establish skip connections between features of different scales from the encoder stage and their corresponding scale features from the decoder. We leverage the Fog Aware Density Evaluator (FADE) [29] and convolution to learn the guided weight map of these skip connections. This approach allows for adaptive control of the skip connection weights, accommodating images with varying haze densities. The visualization results of the weight map are shown in Figure 5.
To learn the haze density guided weight map, we first crop the input 256 × 256 image into 16 × 16 patches. We then compute the haze density of each patch with FADE, assuming local consistency of haze density within each patch. These patches are tokenized and fed into the HDGWA. Through training, we obtain a non-uniform, continuous haze density map that more accurately describes the distribution of haze in real-world hazy images. The haze density guided weight map is computed as follows:
$F(x) = \begin{cases} 1/score, & score \ge 1 \\ 1/\lceil score \rceil, & score < 1 \end{cases}$ (8)
where score is the value of FADE(x) and takes positive floating-point values. To adapt the score for the guided weight map, we take its reciprocal. Moreover, experimental results indicate that for scores less than 1, taking the ceiling of the score before the reciprocal yields better performance.
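A minimal sketch of Equation (8) applied patch-wise is given below; the function names and grid dimensions are illustrative, and the FADE scores are assumed to come from an external FADE implementation.

```python
import math

def guided_weight(score: float) -> float:
    """Equation (8): reciprocal of the FADE score, with scores below 1 rounded up first."""
    return 1.0 / score if score >= 1.0 else 1.0 / math.ceil(score)

def guided_weight_map(patch_scores):
    """patch_scores: FADE values for the grid of 16 x 16 patches of a 256 x 256 input."""
    return [[guided_weight(s) for s in row] for row in patch_scores]

# Toy usage with illustrative FADE scores (not values from the paper).
print(guided_weight_map([[0.4, 1.8], [2.5, 0.9]]))  # [[1.0, 0.555...], [0.4, 1.0]]
```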
During inference, our dehazing network can adaptively adjust the skip connection weights, enabling a better balance between fidelity and restoration quality in the restored image.

3.4. Semi-Supervised Learning Strategy

The mean teacher semi-supervised learning framework has achieved significant results in underwater image restoration [21], and our dehazing method also builds on mean teacher. Specifically, in our semi-supervised strategy, the teacher network and the student network are identical to the proposed backbone shown in Figure 2; the two networks differ primarily in how their weights are updated. The student network weights $\theta_s$ are updated by gradient descent, while the teacher network weights $\theta_t$ are updated as the exponential moving average (EMA) of $\theta_s$:
$\theta_t = \eta \theta_t + (1 - \eta) \theta_s$ (9)
where $\eta \in (0, 1)$ represents the momentum.
This updating strategy enables the teacher model to promptly aggregate previous learning weights after each training step. As described in [30], compared to standard gradient descent, temporal weight averaging can stabilize the training process and improve the performance of the teacher model. Our teacher network processes unlabeled images to obtain relatively clear results, which then serve as pseudo-labels for training the student network on unlabeled data.
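A minimal sketch of the EMA update in Equation (9) is shown below; the momentum value and function name are illustrative assumptions.

```python
import torch

@torch.no_grad()
def update_teacher(teacher: torch.nn.Module, student: torch.nn.Module, eta: float = 0.999):
    """theta_t <- eta * theta_t + (1 - eta) * theta_s, applied parameter-wise."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(eta).add_(p_s, alpha=1.0 - eta)
```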
Additionally, to enhance the effectiveness of the consistency loss in the mean teacher framework, a reliable bank is established to store the best predictions of the teacher network during training. To assess the quality of dehazing results, we select four no-reference evaluation metrics through experiments; the experimental details are given in Section 4.3.

4. Experiments

In this section, we describe the dataset and provide implementation details first. We also evaluate the results of SCL-Dehaze, both qualitatively and quantitatively, comparing them with the state-of-the-art methods. Finally, the dehazing performance of the proposed method on real-world image datasets is reported.

4.1. Implementation Details

Datasets

In stage 1, we use the DIV2K dataset [31] (containing 1000 images), Flickr2K [32] (containing 4250 images), and haze-free images from the Outdoor Training Set (OTS) [33] (containing 8970 images) as our dataset. The ratio of training images, validation images, and test images is 8:1:1.
In stage 2, we utilize the Outdoor Training Set (OTS) and Unannotated Real Hazy Images (URHI) to train our semi-supervised SCL-Dehaze framework. Both datasets are subsets of the RESIDE dataset [33]. We randomly selected 3500 pairs of synthetic haze images from OTS for the supervised branch and 3500 haze images from URHI for the unsupervised branch. The ratio of training images to validation images is 9:1. To enhance data variability, we randomly cropped each image to 256 × 256 pixels.
We randomly selected a subset of 500 images from the URHI dataset to serve as the test set, excluding the 3500 images used for training. This subset is referred to as the TEST500 dataset.

4.2. Implementation Settings

Stage 1 and stage 2 are trained for 1.5 M and 200 K iterations, respectively. The learning rate is fixed at 0.0001, and the batch size is set to 16. The proposed network is implemented with the PyTorch 1.10 framework and trained on two NVIDIA GeForce RTX 3090 GPUs.

4.3. No-Reference IQA Indicator Assessment Experiments

Because haze-free ground truth is unavailable, the full-reference SSIM and PSNR metrics are not suitable for real-world dehazing, so we conducted an experiment to find suitable no-reference evaluation metrics.
Firstly, we randomly select 500 haze-free images from the SOTS dataset; then, for each haze-free image, we synthesize 10 hazy images with scattering coefficients ranging from 0.01 to 0.1 in steps of 0.01. Secondly, eight no-reference evaluation metrics are used to evaluate the quality of each synthesized hazy image: FADE [29], BRISQUE [34], DBCNN [35], MUSIQ [36], NIMA [37], NIQE [38], NRQM [39], and PAQ2PIQ [40]. Finally, for each metric, 10 average scores, each computed by averaging the metric scores of the 500 synthesized images with the same scattering coefficient, are used to show the relationship between that metric and haze density.
Among these metrics, the scores of FADE, BRISQUE, and NIQE are inversely proportional to image quality, whereas the scores of DBCNN, MUSIQ, NIMA, NRQM, and PAQ2PIQ are proportional to image quality. To facilitate comparison, we take the reciprocal of the average scores of the metrics that are proportional to image quality and then normalize the scores of all metrics to the range from 0 to 1. The normalized scores are plotted in Figure 6. We observe that the scores of FADE, DBCNN, MUSIQ, and PAQ2PIQ vary almost linearly with increasing haze density. We therefore selected these four metrics as the evaluation criteria for the dehazing effect, as they best reflect haze density.
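The reciprocal-and-normalize step used to place all metrics on a common scale can be sketched as below; the function name and example values are illustrative, not data from the paper.

```python
import numpy as np

def normalize_curve(avg_scores: np.ndarray, proportional_to_quality: bool) -> np.ndarray:
    """Map the 10 average scores of one metric to [0, 1].

    Metrics proportional to image quality are inverted first, so that every
    curve increases with haze density before min-max normalization.
    """
    s = 1.0 / avg_scores if proportional_to_quality else avg_scores.astype(float)
    return (s - s.min()) / (s.max() - s.min())

# Illustrative usage with placeholder values (one per scattering coefficient).
fade_like = np.linspace(0.3, 2.5, 10)  # grows with haze density
print(normalize_curve(fade_like, proportional_to_quality=False))
```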

4.4. Experiments on Real-World Data

We compared the performance of the proposed method with several state-of-the-art dehazing methods from both quantitative and qualitative perspectives on real-world data.
For quantitative comparison: since there are no corresponding haze-free images in TEST500, we use the four no-reference metrics selected in Section 4.3 to quantitatively evaluate the dehazing performance of the proposed method and six dehazing methods on TEST500. The experimental results are shown in Table 1. The FADE metric assesses the haze density of the image; DBCNN primarily evaluates the type and degree of distortion present; MUSIQ comprehensively evaluates image quality across various dimensions, including detail, fidelity, and resolution; and PaQ2PiQ focuses on the quality of local details within an image. On the PaQ2PiQ metric, our dehazing results score slightly lower than RIDCP. The main reason is that PaQ2PiQ infers global image quality from local quality, so its evaluation accuracy is not stable and it sometimes reports better values on distorted data. Our results rank first on the other three metrics, which indicates that the proposed SCL-Dehaze method performs well in maintaining image fidelity, local details, resolution, and overall quality.
For qualitative comparison: Figure 7 shows the comparison of our method with six state-of-the-art dehazing methods on four real-world hazy images from TEST500. As observed, the dehazing results of FFANet [2], Dehamer [41], and MB-TaylorFormer [42] still exhibit significant residual haze, further confirming the poor generalization of supervised methods to real-world hazy images. The unsupervised method DAD [16] tends to darken the images and fails to eliminate haze, producing color distortion and unwanted halo artifacts; this is because results obtained from a conventional GAN may have biased color reproduction or even introduce artifacts. PSD [5] yields brighter results due to its prior loss regularization; however, its backbone network was not optimized, which leaves significant residual haze. Although RIDCP [43] is also based on a codebook prior, during the nearest-neighbor matching phase it empirically determines a fixed weight distribution parameter using only 200 images to re-calculate the distances between features and the codebook, so it has weaker generalization ability. In contrast, our learnable UDART module with its dual-attention residual transformer block structure adaptively corrects latent features and matches them accurately to the codebook, giving the proposed algorithm a strong ability to remove dense haze. As shown in Figure 7, in dense-haze areas, the proposed method achieves better haze-free results with higher contrast and color saturation than RIDCP. However, on the less robust PaQ2PiQ metric, some haze-free results with low contrast and globally similar saturation occasionally obtain higher scores than ours, such as the dehazing results in Figure 7f.
Judging from the experimental results, compared to six state-of-the-art methods, the proposed SCL-Dehaze architecture obtains the most stable quantitative results and produces the most natural and visually pleasing dehazing results, even under dense-haze conditions.

4.5. Ablation Experiments

To further investigate the effectiveness of the proposed SCL-Dehaze, we conducted extensive ablation studies to discuss the impacts of the UDART and HDGWA.
Effectiveness of UDART: To assess the efficacy of the UDART in performing feature corrections (FCs), we conducted training experiments with three variations: (a) without the FC module, (b) with the FC module consisting of STL layers from the Swin transformer, and (c) with our UDART module as the FC. The experimental results presented in Figure 8 reveal that the absence of latent feature correction (a) results in greater haze residue in the recovery outcomes and compromised detail quality. Although using modules composed of STL layers for feature correction (b) is effective, they still leave more haze residue compared to our UDART module (c). The UDART module, as shown in Figure 8, demonstrates minimal haze residue even in dense haze conditions and achieves superior restoration of high-quality details.
To rigorously validate the UDART module's effectiveness, we performed quantitative testing on TEST500; the outcomes are presented in Table 2. Table 2 and Figure 8b demonstrate that, although adding STL improves the no-reference metric values compared to the variant without FC, its effect on haze removal is minimal, with the haze density metric FADE showing a slight increase instead. The results demonstrate that, with the proposed UDART, the network achieves the lowest FADE score, indicating the least visual haze residual, and the highest MUSIQ, DBCNN, and PAQ2PIQ scores, signifying the best visual detail quality. We therefore conclude that, for feature correction, the proposed UDART has a significant effect on restoring high-quality dehazing results.
Effectiveness of HDGWA: To obtain the guided weight map that adaptively adjusts skip connections, we initially used the evaluated haze density of each image as a globally consistent weight. However, experiments revealed that with this globally consistent weight the dehazing results are not satisfactory, mainly because the haze in a degraded image is generally non-uniform. To obtain more accurate guiding weights, we employ a non-consistent, patch-wise guided weight map: within one patch the weight values are the same, while across patches the weight values are determined by the haze density of the corresponding patch. The studies include the following ablated schemes:
1. No skip connection → V1
2. Base network (conventional skip connection) → V2
3. V2 + W (global uniform weights) → V3
4. V2 + Map (local uniform weights) → V4
5. V2 + Map (local uniform weights) + Equation (8) → V5
These variants are retrained in the same way as before. The compared results and the metrics evaluation values of these variants are shown in Figure 9 and Table 3.
In the V2 variant, artifacts appear in the results when no weights are applied. To address this, we introduce the weight W in the V3 variant to adaptively adjust the skip connection; however, V3 usually generates degraded dehazing images. Acknowledging that haze is not globally consistent, we employ a patch-wise haze density guided weight map to provide locally consistent weights, yet the V4 variant cannot mitigate the artifacts either. Upon data analysis, when scores below 1 are simply inverted, the V4 variant fails to yield a density guided weight map that accurately represents the haze. Hence, we apply Equation (8) to round up values less than 1. Consequently, the V5 variant restores a haze-free image with high color fidelity and eliminates the artifacts.

5. Conclusions

In this paper, a semi-supervised codebook learning dehazing method, SCL-Dehaze, is proposed. The method first leverages a pre-trained VQGAN to establish the codebook. The UDART module is designed to adaptively correct the latent features, and the HDGWA module is devised to optimize multi-scale skip connections according to the haze density of the input. To bridge the gap between synthetic and real-world hazy images, the mean teacher semi-supervised framework is introduced. Comparative experiments show that, on the TEST500 subset of the real-world dataset URHI, the proposed model achieves improvements of 0.003, 2.646, and 0.019 over the second-best method for the no-reference metrics FADE, MUSIQ, and DBCNN, respectively. The comparative experiments on real-world hazy image datasets validate that the proposed method achieves significant performance improvements over state-of-the-art methods in terms of both dehazing quality and generalization, especially under thick-haze conditions.

Author Contributions

Conceptualization, T.C. and Q.D.; methodology, T.C. and Q.D.; validation, Q.D. and M.Z.; formal analysis, Q.D. and K.L.; investigation, K.L. and X.J.; resources, T.C. and K.L.; data curation, Q.D.; writing—original draft preparation, T.C. and Q.D.; writing—review and editing, T.C. and Q.D.; visualization, M.Z. and X.J.; supervision, T.C.; project administration, T.C.; funding acquisition, T.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Liaoning Provincial Department of Education project, LJKZZ20220033; the Doctoral Research Foundation of Liaoning Science and Technology Department (2022-BS-220); the Natural Science Foundation of Liaoning (Grant No. 2022-MS-267); and LiaoNing Revitalization Talents Program (Grant No. XLYC2203104).

Data Availability Statement

Data supporting the results of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Guan, F.; Fang, Z.; Zhang, X.; Zhong, H.; Zhang, J.; Huang, H. Using street-view panoramas to model the decision-making complexity of road intersections based on the passing branches during navigation. Comput. Environ. Urban Syst. 2023, 103, 101975. [Google Scholar] [CrossRef]
  2. Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. FFA-Net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11908–11915. [Google Scholar]
  3. Zhao, L.; Zhang, Y.; Cui, Y. An attention encoder-decoder network based on generative adversarial network for remote sensing image dehazing. IEEE Sens. J. 2022, 22, 10890–10900. [Google Scholar] [CrossRef]
  4. Song, T.; Fan, S.; Li, P.; Jin, J.; Jin, G.; Fan, L. Learning an effective transformer for remote sensing satellite image dehazing. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
  5. Chen, Z.; Wang, Y.; Yang, Y.; Liu, D. PSD: Principled synthetic-to-real dehazing guided by physical priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 7180–7189. [Google Scholar]
  6. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353. [Google Scholar] [PubMed]
  7. Chen, C.; Shi, X.; Qin, Y.; Li, X.; Han, X.; Yang, T.; Guo, S. Real-world blind super-resolution via feature matching with implicit high-resolution priors. In Proceedings of the 30th ACM International Conference on Multimedia, Lisbon, Portugal, 10 October 2022; pp. 1329–1338. [Google Scholar]
  8. Zhou, S.; Chan, K.; Li, C.; Loy, C.C. Towards robust blind face restoration with codebook lookup transformer. Adv. Neural Inf. Process. Syst. 2022, 35, 30599–30611. [Google Scholar]
  9. Liu, K.; Jiang, Y.; Choi, I.; Gu, J. Learning image-adaptive codebooks for class-agnostic image restoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 5373–5383. [Google Scholar]
  10. Zou, W.; Gao, H.; Ye, T.; Chen, L.; Yang, W.; Huang, S.; Chen, H.; Chen, S. VQCNIR: Clearer Night Image Restoration with Vector-Quantized Codebook. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26–27 February 2024; Volume 38, pp. 7873–7881. [Google Scholar]
  11. Wang, L.; Xu, X.; An, S.; Han, B.; Guo, Y. CodeUNet: Autonomous underwater vehicle real visual enhancement via underwater codebook priors. ISPRS J. Photogramm. Remote Sens. 2024, 215, 99–111. [Google Scholar] [CrossRef]
  12. Chen, S.; Mahdizadeh, M.; Yu, C.; Fan, J.; Chen, T. Through the Real World Haze Scenes: Navigating the Synthetic-to-Real Gap in Challenging Image Dehazing. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 7265–7272. [Google Scholar]
  13. Van Den Oord, A.; Vinyals, O. Neural discrete representation learning. Adv. Neural Inf. Process. Syst. 2017, 30, 6306–6315. [Google Scholar]
  14. Wang, Z.; Zhang, J.; Chen, R.; Wang, W.; Luo, P. Restoreformer: High-quality blind face restoration from undegraded key-value pairs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17512–17521. [Google Scholar]
  15. Esser, P.; Rombach, R.; Ommer, B. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 12873–12883. [Google Scholar]
  16. Shao, Y.; Li, L.; Ren, W.; Gao, C.; Sang, N. Domain adaptation for image dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2808–2817. [Google Scholar]
  17. Engin, D.; Genç, A.; Kemal Ekenel, H. Cycle-dehaze: Enhanced cyclegan for single image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 825–833. [Google Scholar]
  18. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  19. Yang, Y.; Wang, C.; Liu, R.; Zhang, L.; Guo, X.; Tao, D. Self-augmented unpaired image dehazing via density and depth decomposition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2037–2046. [Google Scholar]
  20. Wang, Z.; Zhao, H.; Peng, J.; Yao, L.; Zhao, K. ODCR: Orthogonal Decoupling Contrastive Regularization for Unpaired Image Dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 25479–25489. [Google Scholar]
  21. Huang, S.; Wang, K.; Liu, H.; Chen, J.; Li, Y. Contrastive semi-supervised learning for underwater image restoration via reliable bank. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 18145–18155. [Google Scholar]
  22. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
  23. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part II 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 694–711. [Google Scholar]
  24. Gondal, M.W.; Schölkopf, B.; Hirsch, M. The unreasonable effectiveness of texture transfer for single image super-resolution. In Proceedings of the Computer Vision–ECCV 2018 Workshops, Munich, Germany, 8–14 September 2018; Proceedings, Part V 15. Springer: Berlin/Heidelberg, Germany, 2019; pp. 80–97. [Google Scholar]
  25. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  26. Chan, K.C.; Wang, X.; Xu, X.; Gu, J.; Loy, C.C. Glean: Generative latent bank for large-factor image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 14245–14254. [Google Scholar]
  27. Wang, X.; Li, Y.; Zhang, H.; Shan, Y. Towards real-world blind face restoration with generative facial prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 9168–9178. [Google Scholar]
  28. Yang, T.; Ren, P.; Xie, X.; Zhang, L. Gan prior embedded network for blind face restoration in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 672–681. [Google Scholar]
  29. Choi, L.K.; You, J.; Bovik, A.C. Referenceless prediction of perceptual fog density and perceptual image defogging. IEEE Trans. Image Process. 2015, 24, 3888–3901. [Google Scholar] [CrossRef] [PubMed]
  30. Polyak, B.T.; Juditsky, A.B. Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 1992, 30, 838–855. [Google Scholar] [CrossRef]
  31. Agustsson, E.; Timofte, R. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 126–135. [Google Scholar]
  32. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
  33. Li, B.; Ren, W.; Fu, D.; Tao, D.; Feng, D.; Zeng, W.; Wang, Z. Benchmarking single-image dehazing and beyond. IEEE Trans. Image Process. 2018, 28, 492–505. [Google Scholar] [CrossRef] [PubMed]
  34. Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 2012, 21, 4695–4708. [Google Scholar] [CrossRef] [PubMed]
  35. Zhang, W.; Ma, K.; Yan, J.; Deng, D.; Wang, Z. Blind image quality assessment using a deep bilinear convolutional neural network. IEEE Trans. Circuits Syst. Video Technol. 2018, 30, 36–47. [Google Scholar] [CrossRef]
  36. Ke, J.; Wang, Q.; Wang, Y.; Milanfar, P.; Yang, F. Musiq: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 5148–5157. [Google Scholar]
  37. Talebi, H.; Milanfar, P. NIMA: Neural image assessment. IEEE Trans. Image Process. 2018, 27, 3998–4011. [Google Scholar] [CrossRef] [PubMed]
  38. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
  39. Ma, C.; Yang, C.Y.; Yang, X.; Yang, M.H. Learning a no-reference quality metric for single-image super-resolution. Comput. Vis. Image Underst. 2017, 158, 1–16. [Google Scholar] [CrossRef]
  40. Ying, Z.; Niu, H.; Gupta, P.; Mahajan, D.; Ghadiyaram, D.; Bovik, A. From patches to pictures (PaQ-2-PiQ): Mapping the perceptual space of picture quality. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3575–3585. [Google Scholar]
  41. Guo, C.L.; Yan, Q.; Anwar, S.; Cong, R.; Ren, W.; Li, C. Image dehazing transformer with transmission-aware 3d position embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5812–5820. [Google Scholar]
  42. Qiu, Y.; Zhang, K.; Wang, C.; Luo, W.; Li, H.; Jin, Z. MB-TaylorFormer: Multi-branch efficient transformer expanded by Taylor formula for image dehazing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 12802–12813. [Google Scholar]
  43. Wu, R.Q.; Duan, Z.P.; Guo, C.L.; Chai, Z.; Li, C. Ridcp: Revitalizing real image dehazing via high-quality codebook priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Paris, France, 1–6 October 2023; pp. 22282–22291. [Google Scholar]
Figure 1. The architecture of the proposed backbone network.
Figure 2. Semi-supervised codebook learning dehazing framework: SCL-Dehaze.
Figure 3. Model structure of VQGAN for stage 1 of pre-training.
Figure 4. Dual-attention residual transformer block (DART).
Figure 5. The guided weight map obtained by HDGWA. The first row shows the real-world hazy image, and the second row displays the guided weight map.
Figure 6. Variation in no-reference image quality assessment (IQA) indicators with haze density.
Figure 7. The dehazing results on the TEST500 dataset. The no-reference metrics FADE and PAQ2PIQ are used to quantitatively evaluate the quality of the dehazing results. Red and green represent the best and second best, respectively.
Figure 8. Visual results of ablation studies on the UDART module; red represents the best.
Figure 9. The visual results of the ablation studies on the HDGWA module. From (b) to (f): the results of V1, V2, V3, V4, and V5, respectively. Red boxes indicate areas of significant contrast.
Table 1. Quantitative results using NR-IQA metrics on the TEST500 dataset. Red, green, and blue represent the best, second-best, and third-best results, respectively. Regarding the metric scores, ↓ means lower is better, and ↑ means higher is better.
Method      Reference    FADE (TIP'15) ↓    MUSIQ (ICCV'21) ↑    DBCNN (TCSVT'20) ↑    PAQ2PIQ (CVPR'20) ↑
Haze        —            2.470              60.625               0.489                 66.679
FFAnet      AAAI'20      1.954              60.238               0.490                 67.103
Dehamer     CVPR'22      1.326              62.052               0.527                 69.260
MBTaylor    ICCV'23      1.596              60.773               0.507                 67.575
PSD         CVPR'21      0.947              59.123               0.440                 71.202
DAD         CVPR'20      0.797              54.430               0.349                 68.823
RIDCP       CVPR'23      0.564              63.463               0.541                 71.961
Ours        —            0.561              66.109               0.560                 71.902
Table 2. Ablation analysis on the TEST500 dataset. Regarding the metric scores, ↓ means lower is better, and ↑ means higher is better; red represents the best.
Metric       w/o FC    FC with STL    FC with UDART
FADE ↓       0.6334    0.664          0.561
MUSIQ ↑      60.625    64.703         66.109
DBCNN ↑      0.489     0.540          0.560
PAQ2PIQ ↑    66.679    71.137         71.961
Table 3. Ablation analysis of different skip connection variants on the TEST500 dataset. The ↓ means lower is better, and ↑ means higher is better; red represents the best.
Variants        V1       V2       V3       V4       V5
Addition        w/o      w/       w/       w/       w/
W               w/o      w/o      w/       w/o      w/o
Map             w/o      w/o      w/o      w/       w/
Equation (8)    w/o      w/o      w/o      w/o      w/
FADE ↓          0.583    0.634    0.6379   0.535    0.561
MUSIQ ↑         54.332   64.364   63.846   59.769   66.109
DBCNN ↑         0.363    0.547    0.488    0.510    0.560
PAQ2PIQ ↑       69.290   71.761   70.838   70.965   71.961