Article

PerNet: Progressive and Efficient All-in-One Image-Restoration Lightweight Network

1 State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 School of Automation and Electrical Engineering, Shenyang Ligong University, Shenyang 110159, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(14), 2817; https://doi.org/10.3390/electronics13142817
Submission received: 7 June 2024 / Revised: 13 July 2024 / Accepted: 16 July 2024 / Published: 17 July 2024
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)

Abstract

Existing image-restoration methods are effective only for specific degradation tasks, yet the type of degradation encountered in practical applications is unknown, and a mismatch between the model and the actual degradation leads to a decline in performance. Attention mechanisms play an important role in image-restoration tasks; however, existing attention mechanisms struggle to effectively exploit the continuous correlation information of image noise. To address these problems, we propose a Progressive and Efficient All-in-one Image-Restoration Lightweight Network (PerNet). The network is built around a Plug-and-Play Efficient Local Attention Module (PPELAM), which is composed of multiple Efficient Local Attention Units (ELAUs). PPELAM effectively exploits the global information and the horizontal and vertical spatial correlations of image degradation features, reducing information loss while keeping the parameter count small. PerNet learns the degradation properties of images well, which allows it to reach an advanced level in image-restoration tasks. Experiments show that PerNet delivers excellent results on typical restoration tasks (image deraining, image dehazing, image desnowing and underwater image enhancement), and the strong performance of ELAU combined with a Transformer in the ablation experiments further demonstrates the efficiency of ELAU.

1. Introduction

The purpose of image-restoration is to enhance image quality and recover image information, thereby improving visual perception and recognition capabilities and providing better image-processing and analysis results for various application scenarios. Images captured by cameras are often affected by noise such as snowflakes, rain streaks, fog and underwater blur. Such noise can significantly reduce the clarity and quality of the images, thereby affecting the accuracy of downstream tasks such as object detection [1], image classification [2], stereo matching [3] and image recognition [4]. This useless noise not only degrades the image quality but also misleads deep learning models [5,6,7,8], leading to incorrect decisions and analysis.
To address the limitations imposed by image noise on downstream tasks, early researchers employed various traditional methods to remove noise from images. These methods include statistical models [9], total variation [10], least squares [11], nonlinear methods [12] and dictionary-based approaches [13]. Although these traditional methods can achieve image-restoration to a certain extent, they still have drawbacks such as insufficient modeling capability for complex image structures, sensitivity to noise and missing data, high computational complexity and low generalization ability.
In contrast, deep learning methods in image-restoration tasks typically capture high-level features and complex structures better, offering stronger generalization capability and adaptability. In recent years, with the rapid development of deep learning, image-restoration technologies [14] have also made significant progress. However, these methods are limited to handling specific types of image degradation, making it difficult to achieve ideal restoration effects for various degradation types.
Therefore, it is highly necessary to develop a universal network capable of handling multiple types of image degradation. Different types of image degradation possess unique characteristics, and the mapping relationship between these characteristics and the model is often difficult to match. Since most degradations exhibit continuous spatial correlations both horizontally and vertically, the spatial information of these various degradation features is challenging to utilize effectively, making it difficult for the network to fully optimize. Consequently, unified image-restoration technology for different degradation features faces numerous challenges.
Figure 1 illustrates the horizontal and vertical correlation features of rain streaks. This paper focuses on the learning process of rain streak degradation features. The restoration result of our network, obtained by subtracting the rain streak features in Figure 1b from the image features in Figure 1a, is shown in Figure 1d. It is evident from Figure 1b that the spatial information of rain streaks is coherent, presenting a regular continuity and correlation between pixel points rather than a scattered distribution. We found that this correlation is prevalent across various degradation types. If the network can fully exploit this characteristic of degradation, it can restore the image better. Because our network considers the continuous correlation of degradation features in both the horizontal and vertical directions, it can also predict other types of degradation present in the real image. Therefore, compared with the real image in Figure 1c, the restoration result in Figure 1d offers a better visual experience. Specifically, although Figure 1c is the real clear image, it is not as visually satisfying as our restoration result in Figure 1d. In addition to removing rain streaks and coming close to the clear real image in Figure 1c, our restoration result in Figure 1d also eliminates the blurry degradation present in the real clear image in Figure 1c. It is worth mentioning that in the snowflake example (e–h), our method also predicts other degradation features in the snowflake-degraded image. Specifically, the enlarged image block e is a small portion of the label image (e) containing the snowflake degradation; similarly, the enlarged image block f is a small portion of the learned degradation image (f), the enlarged image block g is a small portion of the clear label image (g) and the enlarged image block h is a small portion of the restored image (h) of image (e). Black degradation is still present in small block g of the clear label image but is not preserved in small block h, suggesting that our network can also eliminate other types of degradation; moreover, the color and color saturation of small block h are closer to the original label image (e).
According to the analysis above, we propose a progressive and efficient all-in-one image-restoration lightweight network, PerNet, with the following main contributions:
  • We design a simple and efficient adaptive progressive network architecture, which has excellent progressive stability and can be easily plugged with any module.
  • We devise a PPELAM, composed of multiple ELAUs, which can fully exploit the continuous spatial correlation of degradation in both horizontal and vertical directions, thereby achieving a high match with different types of degradation.
  • Our method shows excellent recovery effects on seven types of image degradation datasets, and our model achieves good lightweight effects.

2. Related Works

Image-restoration refers to the process of recovering the original or near-original image from a damaged or degraded image. In cases of image damage or degradation, various factors such as noise, blur, distortion and degradation may affect the image. The goal of image-restoration is to use algorithms or techniques to minimize or eliminate these adverse factors as much as possible, thereby restoring the quality of the image in terms of clarity, contrast, details, color and other aspects.

2.1. Image-Restoration

Haris et al. [15] introduced Deep Back-Projection Networks for efficient image super-resolution, which constructs multi-level back-projection structures and adopts an end-to-end training strategy to effectively enhance resolution while preserving image details. Zhang et al. [16] presented a Residual Dense Network (RDN), an efficient structure containing dense connections and a residual learning mechanism. RDN enhances feature reuse through dense connections and alleviates training difficulties through residual learning, achieving efficient image super-resolution.
These methods have achieved satisfactory results in image-restoration, but they share a common problem: a large number of parameters. To address this problem, a series of lightweight image-restoration methods have been developed.

2.2. Lightweight Image-Restoration

Lightweight design has become a mature approach that significantly reduces model parameters and computational complexity, enabling models to be deployed on small servers. This approach has been widely applied across various domains of deep learning. Fu et al. [17] proposed a network with fewer parameters, shallow depth and a simple structure by combining convolutional neural networks with classic pyramid techniques. The network integrates multiscale techniques, recursion and residual learning while removing batch normalization. Hu et al. [18] utilized dilated convolutions and the Convolutional Block Attention Module (CBAM) to construct a backbone network, which enlarges the receptive field and gradually extracts spatial information from local to global scales. A lightweight CBAM is selected to guide rain streak removal in both the channel and spatial dimensions. The backbone network consists of five blocks and two convolutional operations, each block comprising a CBAM module and a dilated convolution module for rain streak extraction and removal.
Lightweight operations can indeed significantly reduce the number of model parameters, but they generally pay insufficient attention to local information. Attention-based image-restoration methods can focus closely on local, specific information, allowing the model to learn it well.

2.3. Attention-Based Image-Restoration

Mou et al. [19] proposed a novel network architecture that utilizes windowed attention to mimic the selective focusing mechanism of the human eye. By dynamically adjusting the receptive field, it effectively integrates information from various sources (long sequences, local and global regions, feature dimensions and positional dimensions). The LongIR attention mechanism achieves a balance between efficiency and performance in long-sequence image-restoration to address restoration challenges. Sen et al. [20] introduced two sub-networks with integrated loss functions for the first time. Channel attention combines Squeeze-and-Excitation (SE) operations with residual blocks to fully leverage spatial contextual information for rain streak learning. The Generative Image Inpainting with Contextual Attention model proposed by Yu et al. [21] uses the self-attention mechanism to generate more natural restoration results by learning the contextual relationships of images. On this basis, Wang et al. [22] introduced a multi-scale and cross-attention mechanism to further improve the quality and detail fidelity of image-restoration.
Attention-based image-restoration addresses the insufficient attention to image-specific information in common methods, but these methods still do not achieve good results across different types of image degradation. To solve this problem, a series of all-in-one image-restoration methods have been proposed.

2.4. All-in-One Image-Restoration

All-in-one image-restoration refers to integrating multiple image-restoration tasks into a unified framework, handling different types of image degradation problems through a single model or network. This approach aims to address the limitations of existing methods, which are often tailored to specific types of image degradation, thus enhancing the model’s generalization ability and applicability. All-in-one image-restoration methods typically combine various techniques and models to address various issues in images such as noise, blur, distortion, etc., thereby achieving more comprehensive and accurate image-restoration results. Siddiqua et al. [23] utilized learning-based hints to enable a single model to effectively handle multiple image degradation tasks. They designed multiple modules to aggregate multiscale features and adaptively restore various types of degradation efficiently. Mei et al. [24] proposed a reference-based task-adaptive degradation modeling method. By introducing additional external reference images, they achieved adaptive construction of different degradation matrices, enhancing modeling accuracy. They also designed a degradation prior emission mechanism to further bridge the semantic gap between target images and reference images. Chen et al. [25] introduced Neural Degradation Representation to represent the latent features and statistical characteristics of various degradations. Zheng et al. [21] proposed a joint framework capable of simultaneous image denoising and restoration, enabling multi-task learning through a shared encoder-decoder structure. Liu et al. [26] further proposed a unified model that enables knowledge sharing between different tasks, improving the performance of individual tasks.
Our method combines the advantages of the above approaches in PerNet, which focuses attention on local, specific degradation information, learns different types of degradation and at the same time remains lightweight.

3. Method

We will begin by presenting our comprehensive network architecture in Section 3.1, followed by a detailed exploration of our PPELAM in Section 3.2 and finally our loss function and evaluation metrics in Section 3.3.

3.1. Overall Network Architecture

The overall network comprises a progressive structure and the PPELAM, as depicted in Figure 2. To facilitate computation, the input image is pre-processed into multiple small patches. To thoroughly investigate the impact of PPELAM on the restoration task, the internal structure of the network uses only one convolutional layer for coarse feature extraction. Subsequently, the features pass through the PPELAM module consisting of multiple ELAUs, followed by another convolutional operation. To prevent network degradation, the features after convolution are combined with the coarsely extracted features to form a residual structure. These combined features then pass through another convolutional layer, which integrates and compresses the spatial information of the earlier hierarchical feature maps into a more global and comprehensive representation suited to the outputs of different degradation tasks. Finally, the output is element-wise subtracted from the network input, which guides training to emphasize learning the degradation rather than other features.
The network is concise and efficient, and its stable parameter settings together with its progressive nature enhance its stability. The plug-and-play design increases the flexibility of the architecture, allowing experimentation with various network models. The PPELAM module, being plug-and-play in our architecture, fully considers the continuous correlation of degradation in both the spatial horizontal and vertical directions. Our PerNet can be represented mathematically as follows:
$F_{\mathrm{out}} = X - F_{\mathrm{de}},$
where $F_{\mathrm{out}}$ represents our final restoration result, $X$ denotes the input image feature containing degradation characteristics and $F_{\mathrm{de}}$ represents the learned and predicted degradation features of the input image. The mathematical expression for $F_{\mathrm{de}}$ is as follows:
$F_{\mathrm{de}} = \mathrm{conv2d}\Big(\mathrm{PReLU}\big(\mathrm{conv2d}(x)\big) + \mathrm{BN}\Big(\mathrm{conv2d}\big(\mathrm{PPELAM}\big(\mathrm{PReLU}(\mathrm{conv2d}(x))\big)\big)\Big)\Big),$
where conv2d represents a 2D convolution operation and PReLU denotes an activation function that, unlike a traditional ReLU, allows a small slope for negative inputs instead of zeroing them out. BatchNorm2d (BN) refers to batch normalization, which enhances the model's generalization capability by normalizing the mean and variance of each input feature, thereby accelerating convergence. PPELAM represents our plug-and-play efficient local attention module. The mathematical expressions for PReLU and BN are as follows:
$\mathrm{PReLU}(x) = \max(0, x) + \alpha \min(0, x),$
$\mathrm{BN}(x) = \gamma \frac{x - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta,$
where $x$ represents the input feature and $\alpha$ is a learnable parameter that gives negative inputs a small slope. Usually, $\alpha$ is initialized to a small positive value of 0.25. When the input is negative, the output of PReLU is a small negative value instead of exactly 0, which helps improve the model's expressive power and generalization ability. $\mu$ represents the mean of the input feature, while $\sigma^2$ represents its variance. $\gamma$ is a learnable scale factor with an initial value of 1, $\beta$ is a learnable offset with an initial value of 0 and $\varepsilon$ is a very small constant used to prevent division by zero, initialized to 0.00001. It is worth noting that $\gamma$, $\beta$ and $\varepsilon$ are handled by the underlying PyTorch code: $\gamma$ and $\beta$ are updated automatically as the network is trained, while $\varepsilon$ remains fixed, so we simply use them. The mathematical expressions for the mean $\mu$ and variance $\sigma^2$ of the input feature are as follows:
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,$
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2,$
where $N$ represents the number of input features and $x_i$ denotes the $i$-th value of the input feature.
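To make the progressive residual pipeline above concrete, the following is a minimal PyTorch sketch of the outer PerNet structure implied by the two equations for F_out and F_de. The channel width, kernel sizes and the way PPELAM is passed in are our illustrative assumptions rather than the authors' exact configuration; any nn.Module (for example, the PPELAM sketched in Section 3.2, or nn.Identity for a quick smoke test) can be supplied.

```python
import torch
import torch.nn as nn

class PerNetSketch(nn.Module):
    """Minimal sketch of the outer PerNet pipeline (hyperparameters are assumptions)."""

    def __init__(self, ppelam: nn.Module, channels: int = 64):
        super().__init__()
        # Coarse feature extraction: PReLU(conv2d(x)).
        self.head = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.PReLU())
        # Plug-and-play attention module built from stacked ELAUs (Section 3.2).
        self.ppelam = ppelam
        # Convolution followed by BatchNorm, applied to the attention output.
        self.mid = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                 nn.BatchNorm2d(channels))
        # Final convolution mapping features back to the predicted degradation F_de.
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.head(x)                                   # PReLU(conv2d(x))
        f_de = self.tail(feat + self.mid(self.ppelam(feat)))  # predicted degradation
        return x - f_de                                       # F_out = X - F_de

# Smoke test with an identity module standing in for PPELAM.
out = PerNetSketch(nn.Identity())(torch.randn(1, 3, 64, 64))
```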

3.2. Plug-and-Play Efficient Local Attention Module

Figure 3 illustrates the structure of each identical small ELAU within the PPELAM. To ensure the lightness and efficiency of the model, the module first performs two convolutions for coarse feature extraction. The batch size, channel count, height and width are then read from the tensor size; these four values are not computed sequentially but are obtained only to facilitate the later operations. Here, the feature map is handled by PyTorch, so its shape is (B, C, H, W) rather than the (B, H, W, C) layout of a normal image. The feature tensor is then averaged separately along its height and its width to obtain two averaged tensors. These two averaged tensors are reshaped into three-dimensional form, and a convolution is performed on each using a one-dimensional grouped convolutional layer. One of the two reshaped 3D tensors focuses only on the degradation features along the height of the image (vertically), and the other only on the degradation features along the width of the image (horizontally). Therefore, our network further enhances the model's utilization of the continuous correlation in the horizontal and vertical directions of the image, and at the same time, focusing on specific features also contributes to the lightness and efficiency of the model. To maintain consistency of the feature distribution across channels, we apply group normalization to these two feature maps separately. Next, to focus the model more on the horizontal and vertical directions of the image, we use the Sigmoid activation function to map the features into the range of 0 to 1. The processed features are reshaped into four-dimensional form for element-wise multiplication. Finally, the two feature maps are multiplied element-wise, multiplied with the coarsely extracted input tensor and then added to the tensor before coarse extraction to form a residual structure, preventing network degradation. This gives the output of the ELAU.
The ELAU is a small unit of PerNet designed to fully exploit the continuous correlations of image degradation in the spatial horizontal and vertical directions. The design intention of this unit is to address the locality of degraded features in image-restoration tasks, as degradation often manifests as continuous changes in the horizontal and vertical directions in actual images. By focusing on these adjacent and correlated regions in the image, the unit can more effectively capture subtle changes and structures in the image, thereby providing more accurate and refined processing for image-restoration tasks. This ELAU enables the network to gain a deeper understanding of the degradation in the image, thus better guiding the restoration process and improving the accuracy and effectiveness of image-restoration. Within the entire image-restoration network, ELAUs are used multiple times, adding reliability and stability to the final image-recovery results.
The ELAU has four sets of parameters: T, B, S and L. Here, we employ L, with specific distinctions explained in the ablation experiments. The mathematical expression for the PPELAM can be represented as follows:
$F_n = f(f(f(f(x_0)))),$
$f(x_0) = h(x_0) \times w(x_0) \times \mathrm{conv2d}\big(\mathrm{conv2d}(x_0)\big) + x_0,$
$h(x_0) = \mathrm{view}_{11}\Big(\mathrm{Sigmoid}\Big(\mathrm{GN}\Big(\mathrm{conv1d}\big(\mathrm{view}_1\big(\mathrm{mean}_1(\mathrm{conv2d}(\mathrm{conv2d}(x_0)))\big)\big)\Big)\Big)\Big),$
$w(x_0) = \mathrm{view}_{22}\Big(\mathrm{Sigmoid}\Big(\mathrm{GN}\Big(\mathrm{conv1d}\big(\mathrm{view}_2\big(\mathrm{mean}_2(\mathrm{conv2d}(\mathrm{conv2d}(x_0)))\big)\big)\Big)\Big)\Big),$
where $F_n$ represents the final output of the PPELAM, $x_0$ denotes the input features, $f(x_0)$ is the output of one pass of the ELAU applied to the input $x_0$, conv2d denotes two-dimensional convolution, $h(x_0)$ and $w(x_0)$ represent features learned in the spatial vertical and horizontal directions, $\mathrm{mean}_1$ and $\mathrm{mean}_2$ compute mean values along the width and height, $\mathrm{view}_1$ and $\mathrm{view}_2$ reshape the features into different shapes, conv1d denotes one-dimensional grouped convolution, GN represents group normalization, Sigmoid denotes the activation function and $\mathrm{view}_{11}$ and $\mathrm{view}_{22}$ reshape the features back along different dimensions, with their computation methods being the same. For the function mean, its mathematical expression is as follows:
$\mathrm{mean}(y) = \frac{1}{n}\sum_{i=1}^{n} y_i,$
where $y$ is the input tensor, $n$ is the total number of elements in the tensor and $y_i$ is the $i$-th element of the tensor. For the functions $\mathrm{view}_1$ and $\mathrm{view}_2$, their mathematical expressions are as follows:
$\mathrm{view}_1 = x.\mathrm{view}(B, C \times W, H),$
$\mathrm{view}_2 = x.\mathrm{view}(B, C \times H, W),$
where $B$ represents the batch size, $C$ the number of channels, $H$ the height, $W$ the width and $x$ the original tensor. $\mathrm{view}_1$ and $\mathrm{view}_2$ are the reshaped tensors, with shapes $(B, C \times W, H)$ and $(B, C \times H, W)$, respectively.
For the GN function, given an input tensor $x$ with shape $(N, C, H, W)$, where $N$ is the batch size, $C$ is the number of channels and $H$ and $W$ are the height and width of the image, respectively, the $C$ channels are divided into $G$ groups, with each group containing $C/G$ channels. Let $x_{n,c,h,w}$ denote the value of the input tensor $x$ for the $n$-th sample, $c$-th channel, $h$-th row and $w$-th column. The output $y_{n,c,h,w}$ of the GN function is computed as follows.
$y_{n,c,h,w} = \gamma_c \, \hat{x}_{n,c,h,w} + \beta_c,$
where $\gamma_c$ and $\beta_c$ are learnable scale and shift parameters of shape $(1, C, 1, 1)$; $\gamma_c$ scales the normalized value and $\beta_c$ shifts it, allowing the network to learn appropriate feature representations. $\hat{x}_{n,c,h,w}$ represents the result of normalizing each pixel value $(h, w)$ using the mean $\mu_c$ and variance $\sigma_c^2$ computed for each channel $c$. The calculation formula for $\hat{x}_{n,c,h,w}$ is as follows:
$\hat{x}_{n,c,h,w} = \frac{x_{n,c,h,w} - \mu_c}{\sqrt{\sigma_c^2 + \varepsilon}},$
where $x_{n,c,h,w}$ represents the pixel value in the $c$-th channel, $h$-th row and $w$-th column of the $n$-th sample in the input tensor $x$, and $\varepsilon$ is a small constant used to prevent division by zero. $\sigma_c^2$ and $\mu_c$ respectively denote the variance and mean calculated for each channel $c$ and each sample $n$, as follows:
$\sigma_c^2 = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W} (x_{n,c,h,w} - \mu_c)^2,$
$\mu_c = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W} x_{n,c,h,w},$
where $H$ and $W$ represent the height and width of the image, respectively, and $x_{n,c,h,w}$ is the pixel value at the $h$-th row and $w$-th column of the $c$-th channel of the $n$-th sample in the input tensor $x$.
In the feature extraction stage, we use regular convolution. In the ELAU, one-dimensional grouped convolution is employed to extract features from the one-dimensional feature tensors in the spatial horizontal and vertical directions. The concepts of regular convolution and grouped convolution are illustrated in Figure 4, with a convolutional stride of 1 and no padding applied to the edges. Figure 4a depicts the detailed process of regular convolution, while Figure 4b gives a brief overview of regular convolution. Figure 4c demonstrates the brief process of grouped two-dimensional convolution, and Figure 4d presents the brief process of one-dimensional grouped convolution. In Figure 4a, we perform feature extraction using 6, 64 and 64 convolutional kernels of size (3 × 3) in three example convolutional layers, producing 6, 64 and 64 feature tensors, respectively. Figure 4b shows the input image before and after the extraction of 6 and 64 feature tensors using 6 (3 × 3) and 64 (3 × 3) convolutional kernels, respectively. In Figure 4c,d, we group the features obtained from one convolutional layer into pairs and perform convolution individually within each group. The difference in Figure 4d is that the grouped feature is not a two-dimensional tensor but a one-dimensional vector reshaped from a two-dimensional tensor. At the core of the ELAU, our network uses a one-dimensional grouped convolution as in Figure 4d; because its input is the output of multiple ordinary convolutions such as those in Figure 4a,b, this input is a block containing multiple feature maps.
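As a concrete illustration of the steps just described, the following PyTorch sketch reconstructs one ELAU and stacks identical units to form the PPELAM. The kernel size and group count of the 1D grouped convolution, the number of groups in the group normalization and the stacking depth are our assumptions for illustration rather than values taken from the paper; the unit follows the pooling, reshaping and directional-attention flow of Figure 3 and the equations above.

```python
import torch
import torch.nn as nn

class ELAU(nn.Module):
    """Sketch of one Efficient Local Attention Unit (kernel/group settings assumed)."""

    def __init__(self, channels: int = 64, kernel_size: int = 7, groups: int = 16):
        super().__init__()
        # Two plain convolutions for coarse feature extraction.
        self.coarse = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # One-dimensional grouped convolution applied along the height and width axes.
        self.conv1d = nn.Conv1d(channels, channels, kernel_size,
                                padding=kernel_size // 2, groups=groups)
        self.gn = nn.GroupNorm(groups, channels)
        self.act = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.size()
        feat = self.coarse(x)
        # Average along the width -> (B, C, H); average along the height -> (B, C, W).
        x_h = feat.mean(dim=3)
        x_w = feat.mean(dim=2)
        # Grouped 1D convolution + GroupNorm + Sigmoid in each direction,
        # then reshape to 4D so the two maps can be broadcast-multiplied.
        a_h = self.act(self.gn(self.conv1d(x_h))).view(b, c, h, 1)
        a_w = self.act(self.gn(self.conv1d(x_w))).view(b, c, 1, w)
        # Directional attention applied to the coarse features, plus the residual input.
        return a_h * a_w * feat + x

class PPELAM(nn.Sequential):
    """PPELAM as a stack of identical ELAUs, i.e. F_n = f(f(...f(x0)))."""

    def __init__(self, channels: int = 64, num_units: int = 16):
        super().__init__(*[ELAU(channels) for _ in range(num_units)])

# Example: plug into the outer network sketch from Section 3.1.
# net = PerNetSketch(PPELAM(channels=64, num_units=16))
```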

3.3. Loss Function and Evaluation Metrics

To encourage minimal pixel-wise differences between predicted results and ground truth, conducive to producing clearer and more faithful images, we utilize the L 1 loss function to train the network. This loss function also aids in preserving edge information in the images, which is beneficial for our image-restoration task. The mathematical expression is as follows:
$L = \frac{1}{N}\sum_{i=1}^{N} \left| I_i - GT_i \right|,$
where $I_i$ represents the predicted value of the $i$-th image during the training phase of our model, and $GT_i$ represents the ground-truth value of the $i$-th image.
During the testing phase, we evaluate the performance of our experimental results using Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). PSNR is a metric used to measure the quality of images or videos and is commonly employed to assess the performance of compression algorithms, image-restoration algorithms and image-processing algorithms. PSNR quantifies the level of distortion in an image, which reflects the similarity between the original image and the processed image. Generally, higher PSNR values indicate better image quality. PSNR is measured in decibels (dB), and values between 30 dB and 50 dB are considered good image quality, while values exceeding 50 dB are indicative of lossless image quality. SSIM is a metric used to measure the similarity between two images, with values ranging from −1 to 1. A value closer to 1 indicates greater similarity between the two images. SSIM considers three aspects of the images: luminance, contrast and structure, thereby better reflecting the characteristics of image quality perceived by the human eye. The mathematical expression for PSNR is as follows:
$\mathrm{PSNR} = 10 \times \log_{10}\!\left(\frac{r^2}{\mathrm{MSE}}\right),$
where $r$ represents the range of the image data, and in our case $r$ is 255. MSE stands for Mean Squared Error, whose formula is as follows:
$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\big(I_{\mathrm{true}}(i, j) - I_{\mathrm{test}}(i, j)\big)^2,$
where $M$ and $N$ respectively denote the height and width of the image, and $I_{\mathrm{true}}(i, j)$ and $I_{\mathrm{test}}(i, j)$ represent the grayscale values at row $i$ and column $j$ of the original and test images. For SSIM, our mathematical expression is as follows:
$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$
where $x$ and $y$ represent the restored image and the ground-truth image, respectively, $\mu_x$ and $\mu_y$ are the luminance means of images $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ are the luminance variances of images $x$ and $y$, $\sigma_{xy}$ is the luminance covariance between images $x$ and $y$, and $C_1$ and $C_2$ are constants used to prevent division by zero. The calculations for $\mu_x$ and $\mu_y$ are the same. The formula for $\mu_x$ is as follows:
$\mu_x(i, j) = \frac{1}{N}\sum_{k=-r}^{r}\sum_{l=-r}^{r} x(i+k, j+l),$
where $\mu_x(i, j)$ is the pixel value at $(i, j)$ in the output image, $x(i+k, j+l)$ is the pixel value at position $(i+k, j+l)$ in the input image, $N$ is the number of pixels within the window and $r$ is the radius of the window. The calculations for $\sigma_x^2$ and $\sigma_y^2$ are the same. The formula for $\sigma_x^2$ is as follows:
$\sigma_x^2(i, j) = \mu_{x^2}(i, j) - \mu_x^2(i, j).$
The formula for the covariance $\sigma_{xy}$ is as follows:
$\sigma_{xy}(i, j) = \mu_{xy}(i, j) - \mu_x(i, j)\,\mu_y(i, j).$
The calculation formulas for the constants $C_1$ and $C_2$ are the same, as follows:
$C_1 = (K_1 \times R)^2,$
$C_2 = (K_2 \times R)^2.$
$R$ represents the data range, which is 255 in this case, while the constants $K_1$ and $K_2$ are used for stability and are typically set to 0.01 and 0.03, respectively.
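As a practical reference, the metrics above can be computed with common Python libraries instead of re-implementing the windowed statistics by hand. The sketch below assumes 8-bit RGB images with a data range of 255 and uses scikit-image's implementations as a cross-check; this is our convenience choice, not necessarily the evaluation code used by the authors.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(restored: np.ndarray, ground_truth: np.ndarray):
    """Compute PSNR/SSIM for one 8-bit RGB image pair of shape (H, W, 3)."""
    restored = restored.astype(np.float64)
    ground_truth = ground_truth.astype(np.float64)

    # PSNR = 10 * log10(r^2 / MSE) with r = 255 for 8-bit images.
    mse = np.mean((ground_truth - restored) ** 2)
    psnr = 10.0 * np.log10((255.0 ** 2) / mse)

    # Library implementations for comparison; channel_axis=-1 treats the last
    # axis as the color channel (recent scikit-image versions).
    psnr_lib = peak_signal_noise_ratio(ground_truth, restored, data_range=255)
    ssim = structural_similarity(ground_truth, restored, data_range=255, channel_axis=-1)
    return psnr, psnr_lib, ssim
```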

4. Experiments and Analysis

In Section 4.1, we will present the preparation for our experiments. Subsequently, in Section 4.2, subjective evaluations of the model will be conducted for tasks such as rain removal, snow removal, fog removal and underwater enhancement. Objective evaluations for these tasks will be performed in Section 4.3. Lightweighting experiments will be showcased in Section 4.4. Additionally, the results of our ablation experiments will be presented in Section 4.5.

4.1. Experimental Setup

In Section 4.1.1, we will introduce our datasets, while our execution details will be presented in Section 4.1.2.

4.1.1. Datasets

We used seven public datasets, which are divided into training sets and test sets, as shown in Table 1.
For the Rain200H dataset, the first 1800 of the total 2000 image pairs (degraded and clean) were used as the training dataset, and the remaining 200 pairs were used as the test dataset. We split the Rain200L dataset in the same way as the Rain200H dataset. For the Rain800 dataset, we used the first 700 of the total 800 image pairs (degraded and clean) as the training dataset and the remaining 100 pairs as the test dataset. For the Snow100K dataset, we used 50,000 of the total 100,000 image pairs (degraded and clean) as the training dataset and the remaining 50,000 pairs as the test dataset, which were further divided into 16,801, 16,588 and 16,611 pairs according to the "L", "M" and "S" splits, respectively. For the CSD dataset, we used 1000 image pairs (degraded and clean) as the test dataset and the remaining 7000 pairs as the training dataset. For the RSID dataset, we used 900 of the total 1000 image pairs (degraded and clean) as the training dataset and the remaining 100 pairs as the test dataset. For the EUVP dataset, we used 11,435 paired images (degraded and clean) as the training dataset and the remaining 515 pairs as the test dataset.

4.1.2. Execution Details

Training Setup and Optimization. During the training phase, we utilized the Adam optimizer to update, adjust and optimize the model parameters. The Adam optimizer, known for its combination of momentum and adaptive learning rate properties, effectively adjusts the learning rate for each parameter, leading to faster convergence to the optimal solution.
Hardware Configuration. Our training was conducted using an NVIDIA 3060 GPU and an Intel i5-12490f CPU. Additionally, the memory and SSD used were sufficient to support the GPU training process.
Training Strategy. In order to speed up the training and improve the generalization ability of the model, we divide the input image into multiple small chunks, and then set the step size. Multiple blocks are processed in each batch, which enhances the diversity of input images, thereby improving the generalization ability of the model. The training process consists of 150 epochs with an initial learning rate set at 0.001, an empirically chosen value that helps to quickly converge to a relatively good solution. At the 80th epoch we reduced the learning rate to 0.0001.
Learning Rate Scheduling. In addition to using a fixed learning rate, we employed a learning rate scheduler to dynamically adjust the learning rate, further enhancing the model's performance and convergence speed. We utilized the MultiStepLR scheduler, an epoch-based learning rate decay strategy that uses milestone epochs as adjustment points. Specifically, at the 80th epoch we reduced the learning rate to 0.1 times its previous value, i.e., from the initial learning rate of 0.001 to 0.0001. This step-decay strategy keeps the learning rate appropriate throughout training, gradually decreasing it to stabilize the model's convergence to the optimal solution. By implementing the MultiStepLR scheduler, we managed the learning rate changes effectively, preventing instability from a high learning rate and slow training from a low learning rate. This approach significantly improved training efficiency and performance, resulting in superior evaluation outcomes.
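The training configuration described above maps onto a standard PyTorch setup. The loop below is a minimal sketch: it assumes that `model` is an instance of PerNet and that `train_loader` yields (degraded, clean) patch pairs, and it uses Adam's default betas, which the paper does not specify.

```python
import torch
import torch.nn as nn

# Assumed to exist: model (PerNet) and train_loader yielding (degraded, clean) patches.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# MultiStepLR with a single milestone: lr drops from 1e-3 to 1e-4 at epoch 80.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80], gamma=0.1)
l1_loss = nn.L1Loss()  # pixel-wise L1 loss used for training

for epoch in range(150):
    for degraded, clean in train_loader:
        optimizer.zero_grad()
        restored = model(degraded)
        loss = l1_loss(restored, clean)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the learning rate according to the milestone schedule
```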

4.2. Subjective Evaluation

In Section 4.2.1, we will analyze the subjective evaluation of the task of deraining. Section 4.2.2 will be dedicated to the subjective evaluation of the task of desnowing. The subjective evaluation of the task of dehazing will be covered in Section 4.2.3. Lastly, the subjective analysis of the underwater image enhancement task will be presented in Section 4.2.4.

4.2.1. Deraining Task

Datasets. For the deraining task, we utilized three public datasets: Rain200L [27], Rain200H [27] and Rain800 [28].
Comparison Models. For the deraining task, we compared the PerNet model against several state-of-the-art methods. The comparative methods on the Rain200L, Rain200H and Rain800 datasets include DSC [33], DiG-CoM [34], DerainCycleGAN [35], SPD-Net [36], NLCL [37], Syn2Real [38], SIRR [39], JRGB [40], DDN [41], Air-Net [42], DID-MDN [43], RESCAN [44], RainDiffusion [45], PreNet [46] and MSPFN [47].
Results. The test results on Rain200H are shown in Figure 5, Rain200L results in Figure 6 and Rain800 results in Figure 7. We randomly selected some recovery results, which clearly show that our algorithm exhibits excellent restoration performance across all three datasets. Particularly on the Rain200H dataset, PerNet demonstrates superior image-restoration results, maintaining color saturation close to the original image and achieving high-quality recovery. In contrast, DSC performs poorly on Rain200H, while DiG-CoM and DerainCycleGAN also show subpar results. DiG-CoM tends to produce darker images overall, while DerainCycleGAN results in brighter images overall. On the Rain200L and Rain800 datasets, recent advanced algorithms generally perform well. However, the experimental results on Rain200L indicate that DSC still performs poorly. In DiG-CoM, SPD-Net and NLCL, black streaks appear near the fish’s head region, and DSC introduces colored artifacts on the cloud near the penguin’s head. Our algorithm, on the other hand, achieves visually pleasing results without such artifacts. Nevertheless, all algorithms struggle to completely remove the rain streaks near the fish’s head. The Rain800 dataset shows that earlier algorithms such as DSC, DiG-CoM, DerainCycleGAN and SPD-Net fail to effectively remove rain streaks from the sky and lawn areas. Notably, in the park image, the original rain-free image’s upper right corner features white misty blobs, which are removed by both our algorithm and MSPFN’s algorithm. While this removal improves visual quality, it does not benefit the PSNR values. The RESCAN algorithm on Rain800 introduces scattered misty blobs in the sky of the park image and white streaks in the air of the city image. In the climbing image, all algorithms retain large white streaks in the annotated region, even though the rain-free label image does not contain such large streaks. DSC and DiG-CoM even retain two such large streaks.

4.2.2. Desnowing Task

Datasets. For the desnowing task, we conducted experiments using two public datasets: Snow100K-L [29] and CSD [30].
Comparison Models. For the desnowing task, we compared the PerNet model against a range of state-of-the-art methods. The comparative methods on the Snow100K and CSD datasets include CycleGAN [48], RESCAN [44], DesnowNet [29], ALL in one [49], JSTASR [50], HDCW-Net [30], DDMSNet [51], MPRNet [52], TransWeather [53], SMGARN [54], TKL [55], WeatherDiff128 [56], MSP-Former [57], Uformer [58], WeatherDiff64 [56], Restormer [59], SnowDiff128 [56], NAFNet [60], DGUNet [61], SnowDiff64 [56] and GridFormer-S [62].
Results. In this phase, we performed desnowing experiments on the Snow100K-L and CSD datasets. The experimental results on the Snow100K dataset are shown in Figure 8. and the results on the CSD dataset are shown in Figure 9. According to the experimental results, on the Snow100K dataset, the methods CycleGAN, RESCAN, DesnowNet, ALL in one, JSTASR and HDCW-Net showed insufficient snow-restoral capabilities, leaving noticeable large areas of snow. In the image of the tree branches, apart from our algorithm, other methods retained noticeable snowflakes on the branches. In the skiing image, all algorithms exhibited a large snow patch on the palm, with our algorithm showing the least severe snow patch. In the ocean image, all methods struggled to distinguish between white waves and white snowflakes, resulting in thin local snow streaks. In the lighthouse image, both SnowDiff64 and our algorithm removed the snowflakes cleanly, leaving only a small patch of snow at the base of the lighthouse. Various methods produced small black spots in the restored results of the lighthouse image. On the CSD dataset, all methods showed less than ideal snow-restoral performance in the sky of the road image, retaining block-like areas resembling localized haze. In the city image of the CSD dataset, Restormer, SnowDiff64, MSP-Former, SMGARN and our algorithm achieved the best snow removal in the sky, while other methods left small patches of light snow. Notably, in the arch image of the CSD dataset, CycleGAN, RESCAN and ALL in one introduced a small black shadow in the upper left corner.

4.2.3. Dehazing Task

Datasets. For the dehazing task, we conducted experiments using the RSID [31] dataset.
Comparative Models. For the defogging task, we compared the PerNet model against a range of state-of-the-art methods. The comparative methods on the RSID dataset include Cycle-SNSPGAN [63], ZID [64], FCTF-Net [65], FFA-Net [66], TCN [67], EVPM [68], IDeRs [69], GRS-HTM [70], SDCP [71], UHD [72], DeHamer [73], Dehaze-cGAN [74], STD [75], Zero-restore [76] and ROP [77].
Results. In this phase, we performed defogging experiments on the RSID dataset. The experimental results on the RSID dataset are shown in Figure 10. The results demonstrate that our PerNet model achieves very good performance in the defogging task. While it may not completely restore the original colors, it reaches a high standard of quality. First Image: Cycle-SNSPGAN introduces a purple color bias, FFA-Net introduces a light blue color bias, ZID introduces a brown color bias and DeHamer, IDeRs and TCN result in an overall whitish appearance. Our method's restored result is the closest to the original clear label image, although the overall color is slightly darker. Second Image: Cycle-SNSPGAN and FFA-Net exhibit severe purple tinting, and their haze-removal effect is not satisfactory. ZID, FCTF-Net, TCN and EVPM perform poorly in haze removal. DeHamer shows an overall cyan color bias.

4.2.4. Underwater Enhancement Task

Datasets. For the underwater enhancement task, we conducted experiments using the EUVP [32] public dataset.
Comparison Models. For the underwater enhancement task, our PerNet model was compared with a range of state-of-the-art methods. The comparative methods on the EUVP dataset include PRWNet [78], ShallowUW [79], UWCNN [80], FUnIE-GAN [32], UT-UIE [81], WaterNet [82], RAUNE-Net [83], CPDM [84], SyreaNet [85], SGUIE-Net [86] and Cycle-GAN [87].
Results. In this phase, we conducted underwater enhancement experiments on the EUVP dataset. The experimental results on the EUVP dataset are shown in Figure 11. The results indicate that our algorithm achieves the best overall enhancement performance. First Image: PRWNet, ShallowUW and FUnIE-GAN all exhibit a red color bias. UWFormer shows good color restoration but is slightly blurry. The results of other methods are similar to ours, with our method producing the best balance of clarity and color accuracy. Second Image: The enhancement results of all methods are relatively ideal, with UWFormer displaying colored water droplets at the corner of the fish’s mouth and the tip of its back, which is not present in other methods. Third Image: PRWNet, ShallowUW and UT-UIE exhibit a slight white tint in their overall color. Other methods, including ours, achieve satisfactory enhancement results. Fourth Image: FUnIE-GAN’s color is the closest to the clear label image but does not achieve the best enhancement result. UT-UIE, RAUNE-Net, SyreaNet, Cycle-GAN and our method all produce very clear results, sometimes appearing even cleaner than the ground truth image. This phenomenon can be attributed to the removal of some pseudo-degradation, which, although beneficial in subjective evaluation, may not be ideal from an objective evaluation perspective.

4.3. Objective Evaluation

In Section 4.3.1, we will present the objective evaluation of the deraining task. In Section 4.3.2, we will discuss the objective evaluation of the desnowing task. In Section 4.3.3, we will delve into the objective evaluation of the dehazing task. Finally, in Section 4.3.4, we will elaborate on the objective evaluation of the underwater enhancement task.

4.3.1. Deraining Task

We conducted a quantitative comparison on the Rain200L [27], Rain200H [27] and Rain800 [28] public datasets, selecting 200 pairs of images from Rain200L and Rain200H each as test samples and 100 pairs of images from the Rain800 dataset as test samples. The compared methods include DSC [33], DiG-CoM [34], DerainCycleGAN [35], SPD-Net [36], NLCL [37], Syn2Real [38], SIRR [39], JRGB [40], DDN [41], Air-Net [42], DID-MDN [43], RESCAN [44], RainDiffusion [45], PreNet [46] and MSPFN [47].
As shown in Table 2, early traditional deraining methods, such as DSC, exhibit relatively low values in deraining tasks. This is mainly due to limitations in utilizing image priors and interpolation methods, which fail to learn detailed rain streaks as effectively as deep learning approaches. These methods also struggle to distinguish between rain streaks and non-rain-streak details, leading to suboptimal deraining results that fall short of the performance achieved by deep deraining methods. From the numerical results, it is evident that our algorithm achieves the highest SSIM values across all three datasets. Additionally, our method achieves the highest PSNR values on the Rain200H and Rain200L datasets, while RainDiffusion achieves the highest PSNR value on the Rain800 dataset. On the Rain200H dataset, PerNet improves the PSNR value by 2.6 percent and the SSIM value by 0.6 percent compared with MSPFN. On the Rain200L dataset, PerNet improves the PSNR value by 0.75 percent and the SSIM value by 1 percent compared with MSPFN. On the Rain800 dataset, PerNet improves the SSIM value by 1.6 percent compared to RainDiffusion.

4.3.2. Desnowing Task

In the snow-restoral task phase, we used three public datasets: Snow100K-S [29], Snow100K-L [29] and CSD [30]. We selected 16,801 pairs of images from Snow100K-L, 16,611 pairs from Snow100K-S and 2000 pairs from the CSD dataset as test samples. The compared methods include CycleGAN [48], RESCAN [44], DesnowNet [29], ALL in one [49], JSTASR [50], HDCW-Net [30], DDMSNet [51], MPRNet [52], TransWeather [53], SMGARN [54], TKL [55], WeatherDiff128 [56], MSP-Former [57], Uformer [58], WeatherDiff64 [56], Restormer [59], SnowDiff128 [56], NAFNet [60], DGUNet [61], SnowDiff64 [56] and GridFormer-S [62].
As shown in Table 3, our PerNet demonstrates outstanding performance on the CSD, Snow100K-S and Snow100K-L datasets. Our method achieves the highest PSNR and SSIM values across all three test sets. It is worth noting that other methods also show their respective strengths. On the Snow100K-S test set, the second-best PSNR and SSIM values are achieved by GridFormer-S and DGUNet, respectively. On the Snow100K-L test set, Uformer and NAFNet exhibit the second-best PSNR and SSIM values, respectively. On the CSD test set, the second-best PSNR and SSIM values are achieved by Restormer and SnowDiff64, respectively. On the Snow100K-S dataset, PerNet improves the PSNR value by 0.82 percent compared to GridFormer-S and improves the SSIM value by 0.3 percent compared to DGUNet. On the Snow100K-L dataset, PerNet improves the PSNR value by 1 percent compared to Uformer and improves the SSIM value by 1.4 percent compared to NAFNet. On the CSD dataset, PerNet improves the PSNR value by 1.2 percent compared to Restormer and improves the SSIM value by 0.3 percent compared to SnowDiff64.

4.3.3. Dehazing Task

For the dehazing task, we utilized the RSID [31] dataset, from which we selected 100 pairs of images denoted as R100. The compared methods include Cycle-SNSPGAN [63], ZID [64], FCTF-Net [65], FFA-Net [66], TCN [67], EVPM [68], IDeRs [69], GRS-HTM [70], SDCP [71], UHD [72], DeHHamer [73], Dehaze-cGAN [74], STD [75], Zero-restore [76] and ROP [77].
As shown in Table 4, our PerNet achieved the highest PSNR and SSIM values on the RSID test set compared to the other methods. Our PerNet exhibited excellent performance in the dehazing task. The method with the second-best PSNR and SSIM values was UHD. On the RSID dataset, PerNet improves the PSNR value by 0.5 percent and the SSIM value by 1.3 percent compared to UHD.

4.3.4. Underwater Enhancement Task

For the underwater enhancement task, our testing dataset comprised 515 pairs of test samples from the EUVP [32] dataset. The comparison methods included PRWNet [78], ShallowUW [79], UWCNN [80], FUnIE-GAN [32], UT-UIE [81], WaterNet [82], RAUNE-Net [83], CPDM [84], SyreaNet [85], SGUIE-Net [86] and Cycle-GAN [87].
As shown in Table 5, our PerNet achieved excellent restoration performance on the EUVP dataset, with the highest SSIM value of 0.913 and the second-best PSNR value of 25.592. The method with the best PSNR value was RAUNE-Net, while CPDM exhibited the second-best SSIM results. On the EUVP dataset, PerNet improves the SSIM value by 1.3 percent compared to CPDM.

4.4. Lightweight Experiment

Our PerNet leverages the ELAU, which endows it with outstanding lightweight capabilities, effectively reducing the model's parameter count. We compared PerNet with several networks to demonstrate its advantage in parameter count. The comparative results are presented in Figure 12, where "Param" denotes the parameter count of the network model. From Figure 12, it can be observed that our network exhibits strong lightweight characteristics. With only 1.29 million parameters, PerNet has the lowest parameter count among the image-restoration methods listed in Figure 12.
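For reproducibility, the parameter count reported in Figure 12 can be obtained for any PyTorch model with a simple count over its trainable parameters; the helper below is generic and assumes only that the model is an nn.Module.

```python
import torch.nn as nn

def count_parameters_m(model: nn.Module) -> float:
    """Return the number of trainable parameters in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Example with the sketches from Section 3:
# print(f"{count_parameters_m(PerNetSketch(PPELAM())):.2f} M parameters")
```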

4.5. Ablation Experiment

To validate the impact of ELAU on our network's performance and explore the most suitable settings for image-restoration tasks, we conducted ablation experiments by varying the four parameter groups (T, B, S and L) of ELAU along with the number of ELAUs. We performed these experiments on the Rain800 dataset; in the tables, red indicates the best values and blue indicates the second-best values. The experimental results are presented in Table 6 and Table 7, where Table 6 focuses on PSNR and Table 7 on SSIM. According to the experimental results, when the number of ELAUs is 8, parameter group T exhibits the best PSNR and SSIM values, reaching 25.362 and 0.852, respectively. On the other hand, parameter group L shows the poorest PSNR and SSIM values, at 24.992 and 0.843, respectively. When the number of ELAUs is increased to 16, parameter group T achieves PSNR and SSIM values of 25.444 and 0.856, respectively, indicating the worst performance, whereas parameter group L achieves PSNR and SSIM values of 25.993 and 0.889, respectively, demonstrating the best performance. As the number of ELAUs increases further, parameter group T consistently exhibits the worst PSNR and SSIM values, while parameter group L consistently demonstrates the best. This suggests that parameter group T is most suitable for lightweight small-scale image-restoration networks, while parameter groups B and S are suitable for non-large-scale deep image-restoration networks and parameter group L is suitable for extra-large-scale image-restoration networks.
To explore the advantages of the ELAU, we separately added the ELAU, the Convolutional Block Attention Module (CBAM) [88] and the Squeeze-and-Excitation (SE) [89] module to our network and then compared their combinations with the Sparse Transformer (ST) [90]. We set the ELAU parameter group to L, with 16 ELAU modules. The results are shown in Table 8. Red values indicate the best values, while blue values indicate the second-best values. From the table, it can be seen that when attention modules are added individually, ELAU achieves the best performance, with PSNR and SSIM values of 25.993 and 0.889, respectively, both being the highest. The second-best performance is observed when CBAM is added alone, with PSNR and SSIM values of 25.675 and 0.856, respectively. When combined with the ST module, the combination of ELAU and ST shows the best performance, with PSNR and SSIM values of 26.291 and 0.912, respectively. The combination of CBAM and ST exhibits the second-best performance, with PSNR and SSIM values of 26.186 and 0.908, respectively. Overall, ELAU performs the best in image-restoration tasks, followed by CBAM, while SE performs the worst. This indicates that ELAU has the potential to replace CBAM in certain domains.

5. Conclusions

This paper introduces a versatile lightweight image-restoration network called PerNet, designed to effectively balance efficiency and accuracy in image-restoration tasks. The network leverages an efficient local attention mechanism, thoroughly exploring the continuous correlations in both horizontal and vertical spatial dimensions of images. To better adapt to different types of image degradation, a PPELAM module is designed, which effectively matches the model to various types of degradation. The innovation of PerNet lies in combining the efficient local attention mechanism with a progressive mode, allowing the network to accurately capture image details while maintaining a lightweight structure. Specifically, the efficient local attention mechanism significantly enhances the network’s performance in handling complex scenes, while the progressive mode refines features layer by layer to gradually restore image details. Furthermore, the introduction of the PPELAM module enables PerNet to highly match different types of image degradation, further enhancing the network’s applicability and performance in practical scenarios. To validate PerNet’s performance, we conducted numerous ablation experiments. The results demonstrate that integrating PPELAM with a Transformer yields significantly better restoration effects compared to other methods, proving the efficiency and applicability of PPELAM. Especially in addressing complex image degradation issues, PerNet exhibits outstanding performance, showing excellent restoration results in tasks such as de-raining, de-snowing, de-hazing and underwater enhancement. Overall, PerNet effectively addresses the balance between efficiency and accuracy in image-restoration tasks. By combining the advantages of an efficient local attention mechanism and progressive processing, it offers a feasible and efficient solution for image-restoration.

Author Contributions

Conceptualization, W.L.; methodology, G.Z.; software, S.L.; validation, W.L. and S.L.; formal analysis, W.L.; investigation, Y.T.; resources, S.L.; data curation, G.Z.; writing—original draft preparation, W.L.; writing—review and editing, G.Z.; visualization, W.L.; supervision, S.L.; project administration, Y.T.; funding acquisition, Y.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Major Program of National Natural Science Foundation of China (Grant Number: 61991413).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Rain200H, Rain200L and Rain800 datasets are open datasets and can be downloaded at https://github.com/nnUyi/DerainZoo/blob/master/DerainDatasets.md (accessed on 1 May 2024). The Snow100K dataset is an open dataset and can be downloaded at https://sites.google.com/view/yunfuliu/desnownet (accessed on 1 May 2024). The CSD dataset is an open dataset and can be downloaded at https://ccncuedutw-my.sharepoint.com/:u:/g/personal/104501531_cc_ncu_edu_tw/EfCooq0sZxxNkB7F8HgCyKwB-sJQtVE59_Gpb9soatYi5A?e=5NjDhb (accessed on 1 May 2024). The RSID dataset is an open dataset and can be downloaded at https://github.com/Shan-rs/DCI-Net (accessed on 1 May 2024). The EUVP dataset is an open dataset and can be downloaded at http://irvlab.cs.umn.edu/resources/euvp-dataset (accessed on 1 May 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclatures

DSC: Discriminative Sparse Coding
DiG-CoM: Directional Gradient, Constraints-based Model
SPD-Net: Structure-Preserving Deraining Network
NLCL: Non-Local Contrastive Learning
SIRR: Single Image Rain Removal
JRGB: Joint Rain Generation and removal for Both the real and synthetic image
DDN: Deep Detail Network
Air-Net: Auxiliary image reconstruction Network
DID-MDN: DensIty-aware De-raining using a Multi-stream Dense Network
RESCAN: RecurrEnt Squeeze-and-excitation Context Aggregation Network
PreNet: Progressive image deraining Networks
MSPFN: Multi-Scale Progressive Fusion Network
JSTASR: Joint Size and Transparency-Aware Snow Removal
HDCW-Net: Hierarchical Dual-tree Complex Wavelet representation Network
DDMSNet: Deep Dense Multi-Scale Network
MPRNet: Multi-stage Progressive image-restoration Net
SMGARN: Snow Mask Guided Adaptive Residual Network
TKL: Two-stage Knowledge Learning
MSP-Former: Multi-Scale Projection transFormer
Uformer: U-shaped transformer
NAFNet: Nonlinear Activation Free Network
DGUNet: Deep Generalized Unfolding Networks
Cycle-SNSPGAN: Cycle Spectral Normalized Soft likelihood estimation Patch GAN
ZID: Zero-shot Image Dehazing
FCTF-Net: First-Coarse-Then-Fine Network
FFA-Net: Feature Fusion Attention Network
TCN: Triple-Convolutional Network
EVPM: dEhazing Values Prior Model
IDeRs: Iterative Dehazing method for single Remote sensing image
GRS-HTM: Ground Radiance Suppressed Haze Thickness Map
SDCP: Sphere model improved Dark Channel Prior
UHD: Ultra-High-Definition
DeHamer: DeHazing transformer
STD: Structure layer according To the Distribution
Zero-restore: Zero-shot single image-restoration
ROP: Rank-One Prior
PRWNet: Progressively Refine Wavelet Network
ShallowUW: Shallow UnderWater
UWCNN: UnderWater image enhancement Convolutional Neural Network
FUnIE-GAN: Fast underwater Image Enhancement Generative Adversarial Network
UT-UIE: U-shape Transformer for Underwater Image Enhancement
Water-Net: underWater image enhancement Network
RAUNE-Net: Residual and Attention-driven Underwater eNhancEment Network
CPDM: Content-Preserving Diffusion Model
SyreaNet: Synthetic and real images Network
SGUIE-Net: Semantic attention Guided Underwater Image Enhancement Network
Cycle-GAN: Cycle-consistent Generative Adversarial Networks
CSD: Comprehensive Snow Dataset
RSID: Remote Sensing Image Dataset
EUVP: Enhancing Underwater Visual Perception

References

  1. Liu, Q.; Liu, Y.; Lin, D. Revolutionizing Target Detection in Intelligent Traffic Systems: YOLOv8-SnakeVision. Electronics 2023, 12, 4970. [Google Scholar] [CrossRef]
  2. Zhou, X.; Duan, Y.; Ding, R.; Wang, Q.; Wang, Q.; Qin, J.; Liu, H. Bit-Weight Adjustment for Bridging Uniform and Non-Uniform Quantization to Build Efficient Image Classifiers. Electronics 2023, 12, 5043. [Google Scholar] [CrossRef]
  3. Hirschmuller, H.; Scharstein, D. Evaluation of cost functions for stereo matching. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8. [Google Scholar]
  4. Hu, H.; Zhang, Z.; Xie, Z.; Lin, S. Local relation networks for image recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3464–3473. [Google Scholar]
  5. Li, P.; Tian, J.; Tang, Y.; Wang, G.; Wu, C. Model-based deep network for single image deraining. IEEE Access 2020, 8, 14036–14047. [Google Scholar] [CrossRef]
  6. Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90. [Google Scholar] [CrossRef]
  7. Guo, Y.; Liu, Y.; Oerlemans, A.; Lao, S.; Wu, S.; Lew, M.S. Deep learning for visual understanding: A review. Neurocomputing 2016, 187, 27–48. [Google Scholar] [CrossRef]
  8. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  9. Zhang, J.; Zhao, D.; Xiong, R.; Ma, S.; Gao, W. Image-restoration using joint statistical modeling in a space-transform domain. IEEE Trans. Circuits Syst. Video Technol. 2014, 24, 915–928. [Google Scholar] [CrossRef]
  10. Chambolle, A.; Lions, P.L. Image recovery via total variation minimization and related problems. Numer. Math. 1997, 76, 167–188. [Google Scholar] [CrossRef]
  11. Podilchuk, C.I.; Mammone, R.J. Image recovery by convex projections using a least-squares constraint. JOSA A 1990, 7, 517–521. [Google Scholar] [CrossRef]
  12. Chen, Y.; Pock, T. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective Image-restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1256–1272. [Google Scholar] [CrossRef]
  13. Liu, Q.; Wang, S.; Ying, L.; Peng, X.; Zhu, Y.; Liang, D. Adaptive dictionary learning in sparse gradient domain for image recovery. IEEE Trans. Image Process. 2013, 22, 4652–4663. [Google Scholar] [CrossRef]
  14. Yu, H.; Yuan, X.; Jiang, R.; Feng, H.; Liu, J.; Li, Z. Feature Reduction Networks: A Convolution Neural Network-Based Approach to Enhance Image Dehazing. Electronics 2023, 12, 4984. [Google Scholar] [CrossRef]
  15. Haris, M.; Shakhnarovich, G.; Ukita, N. Deep back-projection networks for super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1664–1673. [Google Scholar]
  16. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2472–2481. [Google Scholar]
  17. Fu, X.; Liang, B.; Huang, Y.; Ding, X.; Paisley, J. Lightweight pyramid networks for image deraining. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 1794–1807. [Google Scholar] [CrossRef] [PubMed]
  18. Hu, M.; Yang, J.; Ling, N.; Liu, Y.; Fan, J. Lightweight single image deraining algorithm incorporating visual saliency. IET Image Process. 2022, 16, 3190–3200. [Google Scholar] [CrossRef]
  19. Mou, C.; Zhang, J.; Fan, X.; Liu, H.; Wang, R. COLA-Net: Collaborative attention network for Image-restoration. IEEE Trans. Multimed. 2021, 24, 1366–1377. [Google Scholar] [CrossRef]
  20. Deng, S.; Wei, M.; Wang, J.; Feng, Y.; Liang, L.; Xie, H.; Wang, F.L.; Wang, M. Detail-recovery image deraining via context aggregation networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 14560–14569. [Google Scholar]
  21. Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T.S. Generative image inpainting with contextual attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5505–5514. [Google Scholar]
  22. Wang, Y.; Tao, X.; Qi, X.; Shen, X.; Jia, J. Image inpainting via generative multi-column convolutional neural networks. Adv. Neural Inf. Process. Syst. 2018, 329–338. [Google Scholar]
  23. Siddiqua, M.; Belhaouari, S.B.; Akhter, N.; Zameer, A.; Khurshid, J. MACGAN: An all-in-one Image-restoration under adverse conditions using multidomain attention-based conditional GAN. IEEE Access 2023, 11, 70482–70502. [Google Scholar] [CrossRef]
  24. Mei, Y.; Fan, Y.; Zhang, Y.; Yu, J.; Zhou, Y.; Liu, D.; Fu, Y.; Huang, T.S.; Shi, H. Pyramid attention network for image-restoration. Int. J. Comput. Vis. 2023, 131, 3207–3225. [Google Scholar] [CrossRef]
  25. Chen, S.; Ye, T.; Liu, Y.; Chen, E. Dual-former: Hybrid self-attention transformer for efficient image restoration. Digit. Signal Process. 2024, 149, 104485. [Google Scholar] [CrossRef]
  26. Liu, G.; Reda, F.A.; Shih, K.J.; Wang, T.C.; Tao, A.; Catanzaro, B. Image inpainting for irregular holes using partial convolutions. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 85–100. [Google Scholar]
  27. Yang, W.; Tan, R.T.; Feng, J.; Liu, J.; Guo, Z.; Yan, S. Deep joint rain detection and removal from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1357–1366. [Google Scholar]
  28. Zhang, H.; Sindagi, V.; Patel, V.M. Image de-raining using a conditional generative adversarial network. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 3943–3956. [Google Scholar] [CrossRef]
  29. Liu, Y.F.; Jaw, D.W.; Huang, S.C.; Hwang, J.N. Desnownet: Context-aware deep network for snow removal. IEEE Trans. Image Process. 2018, 27, 3064–3073. [Google Scholar] [CrossRef] [PubMed]
  30. Chen, W.T.; Fang, H.Y.; Hsieh, C.L.; Tsai, C.C.; Chen, I.; Ding, J.J.; Kuo, S.Y. All snow removed: Single image desnowing algorithm using hierarchical dual-tree complex wavelet representation and contradict channel loss. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 4196–4205. [Google Scholar]
  31. Zhang, L.; Wang, S. Dense haze removal based on dynamic collaborative inference learning for remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5631016. [Google Scholar] [CrossRef]
  32. Islam, M.J.; Xia, Y.; Sattar, J. Fast underwater image enhancement for improved visual perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234. [Google Scholar] [CrossRef]
  33. Luo, Y.; Xu, Y.; Ji, H. Removing rain from a single image via discriminative sparse coding. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3397–3405. [Google Scholar]
  34. Ran, W.; Yang, Y.; Lu, H. Single image rain removal boosting via directional gradient. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, 6–10 July 2020; pp. 1–6. [Google Scholar]
  35. Wei, Y.; Zhang, Z.; Wang, Y.; Xu, M.; Yang, Y.; Yan, S.; Wang, M. Deraincyclegan: Rain attentive cyclegan for single image deraining and rainmaking. IEEE Trans. Image Process. 2021, 30, 4788–4801. [Google Scholar] [CrossRef] [PubMed]
  36. Yi, Q.; Li, J.; Dai, Q.; Fang, F.; Zhang, G.; Zeng, T. Structure-preserving deraining with residue channel prior guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 4238–4247. [Google Scholar]
  37. Ye, Y.; Yu, C.; Chang, Y.; Zhu, L.; Zhao, X.L.; Yan, L.; Tian, Y. Unsupervised deraining: Where contrastive learning meets self-similarity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 5821–5830. [Google Scholar]
  38. Yasarla, R.; Sindagi, V.A.; Patel, V.M. Syn2real transfer learning for image deraining using gaussian processes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2726–2736. [Google Scholar]
  39. Wang, H.; Yue, Z.; Xie, Q.; Zhao, Q.; Zheng, Y.; Meng, D. From rain generation to rain removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 14791–14801. [Google Scholar]
  40. Ye, Y.; Chang, Y.; Zhou, H.; Yan, L. Closing the loop: Joint rain generation and removal via disentangled image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 2053–2062. [Google Scholar]
  41. Fu, X.; Huang, J.; Zeng, D.; Huang, Y.; Ding, X.; Paisley, J. Removing rain from single images via a deep detail network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3855–3863. [Google Scholar]
  42. Gui, D.; Song, Q.; Song, B.; Li, H.; Wang, M.; Min, X.; Li, A. AIR-Net: A novel multi-task learning method with auxiliary image reconstruction for predicting EGFR mutation status on CT images of NSCLC patients. Comput. Biol. Med. 2022, 141, 105157. [Google Scholar] [CrossRef]
  43. Zhang, H.; Patel, V.M. Density-aware single image de-raining using a multi-stream dense network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 695–704. [Google Scholar]
  44. Li, X.; Wu, J.; Lin, Z.; Liu, H.; Zha, H. Recurrent squeeze-and-excitation context aggregation net for single image deraining. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 254–269. [Google Scholar]
  45. Wei, M.; Shen, Y.; Wang, Y.; Xie, H.; Qin, J.; Wang, F.L. Raindiffusion: When unsupervised learning meets diffusion models for real-world image deraining. arXiv 2023, arXiv:2301.09430. [Google Scholar]
  46. Ren, D.; Zuo, W.; Hu, Q.; Zhu, P.; Meng, D. Progressive image deraining networks: A better and simpler baseline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3937–3946. [Google Scholar]
  47. Jiang, K.; Wang, Z.; Yi, P.; Chen, C.; Huang, B.; Luo, Y.; Ma, J.; Jiang, J. Multi-scale progressive fusion network for single image deraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 8346–8355. [Google Scholar]
  48. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  49. Li, R.; Tan, R.T.; Cheong, L.F. All in one bad weather removal using architectural search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3175–3185. [Google Scholar]
  50. Chen, W.T.; Fang, H.Y.; Ding, J.J.; Tsai, C.C.; Kuo, S.Y. JSTASR: Joint size and transparency-aware snow removal algorithm based on modified partial convolution and veiling effect removal. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXI 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 754–770. [Google Scholar]
  51. Zhang, K.; Li, R.; Yu, Y.; Luo, W.; Li, C. Deep dense multi-scale network for snow removal using semantic and depth priors. IEEE Trans. Image Process. 2021, 30, 7419–7431. [Google Scholar] [CrossRef] [PubMed]
  52. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H.; Shao, L. Multi-stage progressive image-restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 14821–14831. [Google Scholar]
  53. Valanarasu, J.M.J.; Yasarla, R.; Patel, V.M. Transweather: Transformer-based restoration of images degraded by adverse weather conditions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2353–2363. [Google Scholar]
  54. Cheng, B.; Li, J.; Chen, Y.; Zeng, T. Snow mask guided adaptive residual network for image snow removal. Comput. Vis. Image Underst. 2023, 236, 103819. [Google Scholar] [CrossRef]
  55. Chen, W.T.; Huang, Z.K.; Tsai, C.C.; Yang, H.H.; Ding, J.J.; Kuo, S.Y. Learning multiple adverse weather removal via two-stage knowledge learning and multi-contrastive regularization: Toward a unified model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17653–17662. [Google Scholar]
  56. Özdenizci, O.; Legenstein, R. Restoring vision in adverse weather conditions with patch-based denoising diffusion models. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10346–10357. [Google Scholar] [CrossRef]
  57. Chen, S.; Ye, T.; Liu, Y.; Liao, T.; Jiang, J.; Chen, E.; Chen, P. Msp-former: Multi-scale projection transformer for single image desnowing. In Proceedings of the ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
  58. Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A general u-shaped transformer for image-restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17683–17693. [Google Scholar]
  59. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739. [Google Scholar]
  60. Chen, L.; Chu, X.; Zhang, X.; Sun, J. Simple baselines for image-restoration. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 17–33. [Google Scholar]
  61. Mou, C.; Wang, Q.; Zhang, J. Deep generalized unfolding networks for image-restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17399–17410. [Google Scholar]
  62. Wang, T.; Zhang, K.; Shao, Z.; Luo, W.; Stenger, B.; Lu, T.; Kim, T.K.; Liu, W.; Li, H. Gridformer: Residual dense transformer with grid structure for image restoration in adverse weather conditions. arXiv 2023, arXiv:2305.17863. [Google Scholar] [CrossRef]
  63. Wang, Y.; Yan, X.; Guan, D.; Wei, M.; Chen, Y.; Zhang, X.P.; Li, J. Cycle-snspgan: Towards real-world image dehazing via cycle spectral normalized soft likelihood estimation patch gan. IEEE Trans. Intell. Transp. Syst. 2022, 23, 20368–20382. [Google Scholar] [CrossRef]
  64. Li, B.; Gou, Y.; Liu, J.Z.; Zhu, H.; Zhou, J.T.; Peng, X. Zero-shot image dehazing. IEEE Trans. Image Process. 2020, 29, 8457–8466. [Google Scholar] [CrossRef] [PubMed]
  65. Li, Y.; Chen, X. A coarse-to-fine two-stage attentive network for haze removal of remote sensing images. IEEE Geosci. Remote Sens. Lett. 2020, 18, 1751–1755. [Google Scholar] [CrossRef]
  66. Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. FFA-Net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11908–11915. [Google Scholar]
  67. Shin, J.; Park, H.; Paik, J. Region-based dehazing via dual-supervised triple-convolutional network. IEEE Trans. Multimed. 2021, 24, 245–260. [Google Scholar] [CrossRef]
  68. Han, J.; Zhang, S.; Fan, N.; Ye, Z. Local patchwise minimal and maximal values prior for single optical remote sensing image dehazing. Inf. Sci. 2022, 606, 173–193. [Google Scholar] [CrossRef]
  69. Xu, L.; Zhao, D.; Yan, Y.; Kwong, S.; Chen, J.; Duan, L.Y. IDeRs: Iterative dehazing method for single remote sensing image. Inf. Sci. 2019, 489, 50–62. [Google Scholar] [CrossRef]
  70. Liu, Q.; Gao, X.; He, L.; Lu, W. Haze removal for a single visible remote sensing image. Signal Process. 2017, 137, 33–43. [Google Scholar] [CrossRef]
  71. Li, J.; Hu, Q.; Ai, M. Haze and thin cloud removal via sphere model improved dark channel prior. IEEE Geosci. Remote Sens. Lett. 2018, 16, 472–476. [Google Scholar] [CrossRef]
  72. Zheng, Z.; Ren, W.; Cao, X.; Hu, X.; Wang, T.; Song, F.; Jia, X. Ultra-high-definition image dehazing via multi-guided bilateral learning. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 16180–16189. [Google Scholar]
  73. Guo, C.L.; Yan, Q.; Anwar, S.; Cong, R.; Ren, W.; Li, C. Image dehazing transformer with transmission-aware 3d position embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5812–5820. [Google Scholar]
  74. Li, R.; Pan, J.; Li, Z.; Tang, J. Single image dehazing via conditional generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8202–8211. [Google Scholar]
  75. Mi, Z.; Li, Y.; Jin, J.; Liang, Z.; Fu, X. A generalized enhancement framework for hazy images with complex illumination. IEEE Geosci. Remote Sens. Lett. 2021, 19, 3079456. [Google Scholar] [CrossRef]
  76. Kar, A.; Dhara, S.K.; Sen, D.; Biswas, P.K. Zero-shot single image-restoration through controlled perturbation of koschmieder’s model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 16205–16215. [Google Scholar]
  77. Liu, J.; Liu, R.W.; Sun, J.; Zeng, T. Rank-one prior: Real-time scene recovery. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 8845–8860. [Google Scholar] [CrossRef]
  78. Huo, F.; Li, B.; Zhu, X. Efficient wavelet boost learning-based multi-stage progressive refinement network for underwater image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1944–1952. [Google Scholar]
  79. Naik, A.; Swarnakar, A.; Mittal, K. Shallow-uwnet: Compressed model for underwater image enhancement (student abstract). In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; Volume 35, pp. 15853–15854. [Google Scholar]
  80. Li, C.; Anwar, S.; Porikli, F. Underwater scene prior inspired deep underwater image and video enhancement. Pattern Recognit. 2020, 98, 107038. [Google Scholar] [CrossRef]
  81. Peng, L.; Zhu, C.; Bian, L. U-shape transformer for underwater image enhancement. IEEE Trans. Image Process. 2023, 29, 4376–4389. [Google Scholar]
  82. Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 2019, 29, 4376–4389. [Google Scholar] [CrossRef] [PubMed]
  83. Peng, W.; Zhou, C.; Hu, R.; Cao, J.; Liu, Y. RAUNE-Net: A Residual and Attention-Driven Underwater Image Enhancement Method. arXiv 2023, arXiv:2311.00246. [Google Scholar]
  84. Shi, X.; Wang, Y.G. CPDM: Content-Preserving Diffusion Model for Underwater Image Enhancement. arXiv 2024, arXiv:2401.15649. [Google Scholar]
  85. Wen, J.; Cui, J.; Zhao, Z.; Yan, R.; Gao, Z.; Dou, L.; Chen, B.M. Syreanet: A physically guided underwater image enhancement framework integrating synthetic and real images. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 5177–5183. [Google Scholar]
  86. Qi, Q.; Li, K.; Zheng, H.; Gao, X.; Hou, G.; Sun, K. SGUIE-Net: Semantic attention guided underwater image enhancement with multi-scale perception. IEEE Trans. Image Process. 2022, 31, 6816–6830. [Google Scholar] [CrossRef]
  87. Li, C.; Guo, J.; Guo, C. Emerging from water: Underwater image color correction based on weakly supervised color transfer. IEEE Signal Process. Lett. 2018, 25, 323–327. [Google Scholar] [CrossRef]
  88. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  89. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  90. Chen, X.; Li, H.; Li, M.; Pan, J. Learning a sparse transformer network for effective image deraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 5896–5905. [Google Scholar]
Figure 1. (a) Image with rain streaks, (b) Extracted rain degradation features, (c) Clear image, (d) Restored result from the image with rain streaks. (e) Image with snowflakes, (f) Extracted snowflake degradation features, (g) Clear image, (h) Restored result from the image with snowflakes.
Figure 2. Schematic diagram of the overall network. The output of the formula for the important stages is marked in red font in the figure, corresponding to the formula to be introduced below.
Figure 3. Schematic diagram of the ELAU. The output of the formula for the important stages is marked in red font in the figure, corresponding to the formula to be introduced below.
Figure 4. Schematic diagram of the grouped convolution. (a,b) depict standard convolution. (c,d) illustrate grouped convolution. The same color represents features within the same group. In (a,b), all features are of one color, indicating no grouping. In (c,d), the features are grouped in pairs, so every two feature maps share the same color.
Figure 5. Restoration results on Rain200H dataset. The colored boxes in the figure represent some local information in the image. These local details are magnified to clearly display the image-restoration effects of different algorithms.
Figure 6. Restoration results on Rain200L dataset. The colored boxes in the figure represent some local information in the image. These local details are magnified to clearly display the image-restoration effects of different algorithms.
Figure 7. Restoration results on Rain800 dataset. The colored boxes in the figure represent some local information in the image. These local details are magnified to clearly display the image-restoration effects of different algorithms.
Figure 8. Restoration results on Snow100K-L dataset.
Figure 9. Restoration results on CSD dataset.
Figure 10. Restoration results on RSID dataset. The colored boxes in the figure represent some local information in the image. These local details are magnified to clearly display the image-restoration effects of different algorithms.
Figure 11. Restoration results on EUVP dataset.
Figure 12. Comparison of the number of parameters of different methods.
Table 1. The seven datasets are divided into two parts according to training and testing.

Datasets | Training Set/Pairs | Test Set/Pairs
Rain200H [27] | 1800 | 200
Rain200L [27] | 1800 | 200
Rain800 [28] | 700 | 100
Snow100K [29] | 50,000 | 50,000
CSD [30] | 7000 | 1000
RSID [31] | 900 | 100
EUVP [32] | 11,435 | 515
Table 2. Average PSNR and SSIM values for the deraining task. The arrows in the table indicate that higher PSNR and SSIM values correspond to better performance. Red and blue values represent the best and second-best results, respectively.

Methods | Rain200L PSNR↑ | Rain200L SSIM↑ | Rain200H PSNR↑ | Rain200H SSIM↑ | Rain800 PSNR↑ | Rain800 SSIM↑
DSC [33] | 27.163 | 0.866 | 14.735 | 0.382 | 14.935 | 0.468
DiG-CoM [34] | 30.782 | 0.854 | 19.332 | 0.767 | 22.535 | 0.833
DerainCycleGAN [35] | 31.491 | 0.936 | 24.321 | 0.842 | 24.293 | 0.859
SPD-Net [36] | 31.591 | 0.919 | 26.071 | 0.857 | 24.372 | 0.861
NLCL [37] | 31.741 | 0.935 | 22.312 | 0.728 | 24.461 | 0.821
Syn2Real [38] | 34.391 | 0.965 | 25.761 | 0.837 | 23.741 | 0.799
SIRR [39] | 34.471 | 0.969 | 26.551 | 0.846 | 24.361 | 0.859
JRGB [40] | 34.512 | 0.967 | 24.621 | 0.849 | 24.621 | 0.828
DDN [41] | 34.683 | 0.976 | 26.053 | 0.806 | 24.234 | 0.468
Air-Net [42] | 34.901 | 0.969 | 25.482 | 0.829 | 23.771 | 0.833
DID-MDN [43] | 35.401 | 0.961 | 25.612 | 0.854 | 21.891 | 0.795
RESCAN [44] | 36.094 | 0.970 | 26.751 | 0.835 | 24.332 | 0.823
RainDiffusion [45] | 36.851 | 0.972 | 26.021 | 0.862 | 26.491 | 0.875
PReNet [46] | 37.802 | 0.866 | 14.735 | 0.382 | 14.935 | 0.468
MSPFN [47] | 38.581 | 0.983 | 29.361 | 0.903 | 23.332 | 0.803
PerNet | 39.591 | 0.989 | 29.582 | 0.912 | 25.993 | 0.889
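As a reminder of how the two metrics reported in Tables 2-5 are defined, the snippet below is a minimal sketch assuming 8-bit RGB images and scikit-image (version 0.19 or newer, for the channel_axis argument). The paper does not publish its evaluation script, so this is only the standard textbook computation, not the authors' code.

```python
import numpy as np
from skimage.metrics import structural_similarity  # scikit-image >= 0.19 assumed


def psnr(reference: np.ndarray, restored: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR = 10 * log10(MAX^2 / MSE), the first metric reported in the tables."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((max_val ** 2) / mse)


def ssim(reference: np.ndarray, restored: np.ndarray) -> float:
    # channel_axis=-1 treats the last dimension as the RGB channels.
    return structural_similarity(reference, restored, channel_axis=-1, data_range=255)


# Example with dummy 8-bit RGB images; a real evaluation would loop over a test set
# and average the two scores per dataset, as done for each column of the tables.
gt = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
out = np.clip(gt.astype(np.int16) + np.random.randint(-5, 6, gt.shape), 0, 255).astype(np.uint8)
print(psnr(gt, out), ssim(gt, out))
```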
Table 3. Average PSNR and SSIM values for the desnowing task. The arrows in the table indicate that higher PSNR and SSIM values correspond to better performance. Red and blue values represent the best and second-best results, respectively.

Methods | Snow100K-S PSNR↑ | Snow100K-S SSIM↑ | Snow100K-L PSNR↑ | Snow100K-L SSIM↑ | CSD PSNR↑ | CSD SSIM↑
CycleGAN [48] | 28.513 | 0.902 | 23.596 | 0.883 | 20.981 | 0.801
RESCAN [44] | 31.512 | 0.903 | 26.080 | 0.811 | 22.031 | 0.812
DesnowNet [29] | 32.332 | 0.950 | 27.173 | 0.898 | 20.131 | 0.812
ALL in one [49] | 31.231 | 0.923 | 28.331 | 0.882 | 26.312 | 0.873
JSTASR [50] | 31.401 | 0.901 | 25.321 | 0.808 | 27.961 | 0.883
HDCW-Net [30] | 31.542 | 0.952 | 27.236 | 0.886 | 29.061 | 0.910
DDMSNet [51] | 34.342 | 0.995 | 28.851 | 0.877 | 30.201 | 0.923
MPRNet [52] | 35.872 | 0.962 | 31.023 | 0.913 | 33.981 | 0.972
TransWeather [53] | 32.512 | 0.934 | 29.312 | 0.888 | 31.761 | 0.932
SMGARN [54] | 33.854 | 0.950 | 29.312 | 0.890 | 31.931 | 0.952
TKL [55] | 35.213 | 0.963 | 31.001 | 0.919 | 33.891 | 0.963
WeatherDiff128 [56] | 35.023 | 0.952 | 29.582 | 0.849 | 33.463 | 0.968
MSP-Former [57] | 35.421 | 0.936 | 30.312 | 0.913 | 33.751 | 0.961
Uformer [58] | 35.512 | 0.963 | 31.301 | 0.923 | 33.801 | 0.961
WeatherDiff64 [56] | 35.831 | 0.957 | 30.092 | 0.904 | 33.631 | 0.962
Restormer [59] | 36.081 | 0.959 | 30.281 | 0.912 | 35.431 | 0.972
SnowDiff128 [56] | 36.092 | 0.955 | 30.283 | 0.900 | 35.134 | 0.974
NAFNet [60] | 36.123 | 0.970 | 31.263 | 0.924 | 35.132 | 0.973
DGUNet [61] | 36.312 | 0.971 | 31.204 | 0.922 | 34.741 | 0.973
SnowDiff64 [56] | 36.591 | 0.963 | 30.431 | 0.915 | 35.231 | 0.976
GridFormer-S [62] | 36.681 | 0.960 | 30.782 | 0.917 | 33.903 | 0.963
PerNet | 36.982 | 0.974 | 31.623 | 0.937 | 35.861 | 0.979
Table 4. Average PSNR and SSIM values for the dehazing task. The arrows in the table indicate that higher PSNR and SSIM values correspond to better performance. Red and blue values represent the best and second-best results, respectively.

Methods | R100 PSNR↑ | R100 SSIM↑
Cycle-SNSPGAN [63] | 18.344 | 0.729
ZID [64] | 18.992 | 0.727
FCTF-Net [65] | 19.306 | 0.856
FFA-Net [66] | 24.052 | 0.899
TCN [67] | 14.208 | 0.606
EVPM [68] | 15.579 | 0.689
IDeRs [69] | 13.604 | 0.644
GRS-HTM [70] | 14.800 | 0.519
SDCP [71] | 16.055 | 0.691
UHD [72] | 26.659 | 0.923
DeHamer [73] | 23.752 | 0.899
Dehaze-cGAN [74] | 18.703 | 0.743
STD [75] | 16.258 | 0.559
Zero-restore [76] | 16.648 | 0.717
ROP [77] | 15.575 | 0.750
PerNet | 26.794 | 0.935
Table 5. Average PSNR and SSIM values for underwater enhancement task. The arrows in the table indicate that higher PSNR and SSIM values correspond to better performance. Red and blue values represent the best and second-best results, respectively.

Methods | EUVP(515) PSNR↑ | EUVP(515) SSIM↑
PRWNet [78] | 25.441 | 0.843
ShallowUW [79] | 24.551 | 0.852
UWCNN [80] | 17.725 | 0.704
FunIE-GAN [32] | 24.077 | 0.794
UT-UIE [81] | 25.214 | 0.813
Water-Net [82] | 25.285 | 0.833
RAUNE-Net [83] | 26.331 | 0.845
CPDM [84] | 23.243 | 0.901
SyreaNet [85] | 17.721 | 0.743
SGUIE-Net [86] | 19.187 | 0.760
Cycle-GAN [87] | 17.963 | 0.709
PerNet | 25.592 | 0.913
Table 6. Ablation experiments on the Rain800 dataset regarding the parameters T, B, S, L and the number of ELAU modules, evaluated in terms of PSNR. Red and blue values represent the best and second-best results, respectively.

Parameter | ELAU = 8 | ELAU = 16 | ELAU = 24 | ELAU = 32
T | 25.362 | 25.444 | 25.597 | 25.662
B | 25.271 | 25.685 | 25.768 | 25.832
S | 25.021 | 25.791 | 25.862 | 25.883
L | 24.992 | 25.993 | 26.012 | 26.241
Table 7. Ablation experiments on the Rain800 dataset regarding the parameters T, B, S, L and the number of ELAU modules, evaluated in terms of SSIM. Red and blue values represent the best and second-best results, respectively.

Parameter | ELAU = 8 | ELAU = 16 | ELAU = 24 | ELAU = 32
T | 0.852 | 0.856 | 0.862 | 0.868
B | 0.848 | 0.869 | 0.879 | 0.887
S | 0.844 | 0.873 | 0.884 | 0.894
L | 0.843 | 0.889 | 0.893 | 0.899
Table 8. Ablation experiments of ELAU, SE and CBAM on the Rain800 dataset. Red and blue values represent the best and second-best results, respectively.

Method | PSNR↑ | SSIM↑
ELAU | 25.993 | 0.889
SE | 25.586 | 0.852
CBAM | 25.675 | 0.856
ELAU+ST | 26.291 | 0.912
SE+ST | 26.021 | 0.901
CBAM+ST | 26.186 | 0.908
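For context on the baselines in Table 8, the following is a minimal sketch of the SE block [89] that ELAU is compared against. SE attends only to channels through a global average pool and therefore discards the row/column structure that a directional unit keeps; the reduction ratio of 16 follows the original SE paper and is an assumption here, not a value taken from this article.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention [89], one of the baselines in Table 8."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global spatial average
        self.fc = nn.Sequential(                 # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                             # channel-wise re-weighting


# Usage example on a dummy feature map.
se = SEBlock(channels=64)
y = se(torch.randn(1, 64, 128, 128))
```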
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
