Semantic-Guided Iterative Detail Fusion Network for Single-Image Deraining

School of Information Engineering, Chang’an University, Xi’an 710064, China
Shandong Hi-Speed Group Co., Ltd., Innovation Research Institute, Jinan 271039, China
Author to whom correspondence should be addressed.
Electronics 2024, 13(18), 3634;
Submission received: 18 July 2024 / Revised: 2 September 2024 / Accepted: 9 September 2024 / Published: 12 September 2024


Existing approaches for image deraining often rely on synthetic or unpaired real-world rainy datasets, leading to sub-optimal generalization ability when processing the complex and diverse real-world rain degradation. To address these challenges, we propose a novel iterative semantic-guided detail fusion model with implicit neural representations (INR-ISDF). This approach addresses the challenges of complex solution domain variations, reducing the usual negative impacts found in these situations. Firstly, the input rainy images are processed through implicit neural representations (INRs) to obtain normalized images. Residual calculations are then used to assess the illumination inconsistency caused by rain degradation, thereby enabling an accurate identification of the degradation locations. Subsequently, the location information is incorporated into the detail branch of the dual-branch architecture, while the normalized images obtained from the INR are used to enhance semantic processing. Finally, we use semantic clues to iteratively guide the progressive fusion of details to achieve improved image processing results. To tackle the partial correspondence between real rain images and the given ground truth, we propose a two-stage training strategy that utilizes adjustments in the semantic loss function coefficients and phased freezing of the detail branch to prevent potential overfitting issues. Extensive experiments verify the effectiveness of our proposed method in eliminating the degradation in real-world rainy images.

1. Introduction

Images captured in outdoor rainy conditions often suffer from various forms of degradation, such as raindrops, rain streaks, and fog. These degradations pose significant obstacles to subsequent image processing tasks [1,2,3]. Convolutional neural networks (CNNs) have significantly advanced a broad spectrum of image processing tasks [4], notably including image deraining [5,6,7,8,9], ushering in an era marked by unprecedented efficiency, speed, and automation.
Due to the potential presence of multi-scale degradation in images, some approaches employ cross-scale and fusion methods [10,11] to capture and utilize features of different scales. The Laplacian pyramid is also widely applied to explore multi-scale features [12,13]. However, convolution-based approaches frequently face challenges in effectively modeling the complex interactions among widely distributed but interconnected degradations. This limitation is primarily due to the constrained receptive field of convolutional kernels, which hampers their capacity to capture long-range dependencies [14]. Recent studies [15,16] have employed deeper network architectures or operations similar to dilated convolutions to expand the receptive field. Nevertheless, these approaches often only add to the network’s complexity without effectively establishing cross-scale long-range feature connections, which can increase the risk of overfitting [17].
To overcome these limitations and better manage multitasking recovery, transformers, which are capable of establishing long-range dependencies, have shown remarkable results [18,19,20,21,22]. In the context of multitask rain removal, structural search [23] has been employed to identify potentially optimal model architectures for various types of degradation. However, pre-designed limited structures often struggle to handle the diverse types of degradation effectively. Other studies have leveraged uncertainty [20,24] to enhance the identification of raindrop locations in rainy images. For instance, a sparse sampling transformer with uncertainty-driven ranking [20] can selectively focus on key positions in the image. These methods do not directly address the fitting of degradation locations and lack a specific focus on restoring the original semantic information in the image, leading to suboptimal performance when dealing with variable forms of degradation. Additionally, obtaining pairwise images from real scenes for training is challenging in these approaches. Consequently, most researchers rely on synthetic rain maps to facilitate training. The advantage of this approach is the ease of acquiring fully matched pairwise images for loss computation, enabling straightforward pixel-by-pixel comparisons. However, it has notable drawbacks:
  • The degradation forms, shapes, and distributions of synthetic rain maps are significantly less diverse and complex than those found in real rain maps. As a result, networks trained on synthetic data often exhibit reduced robustness when faced with actual rain conditions.
  • Pixel-by-pixel comparisons between the output image and ground truth tend to cause overfitting, preventing the network from effectively learning the degradation patterns and semantic information inherent in real scenes.
To address the limitations associated with overly simplistic degradation patterns in synthetic maps, Zhang et al. [25] developed the paired dataset JRSRD-real, which consists of real rain maps to mitigate the inaccuracies introduced by artificial rain lines and raindrops. This approach is designed to improve the network’s generalization capability and enhance its performance in real-world scenarios. However, when comparing synthetic maps with real images, differences in lighting, shadows, and local details, arising from temporal disparities between the images, are inevitable. These discrepancies can introduce representation deviations.
To address the issues of incomplete matching in real images and domain deviation caused by various degradations, we propose an iterative semantic-guided detail fusion model named INR-ISDF. Our algorithmic process is as follows. Firstly, we leverage implicit neural representation (INR) to obtain a normalized output, enhancing the model’s capability to handle diverse degradations encountered in practical scenarios. Next, we assess the disparity between the input image and the neurally fitted image, which serves as a crucial reference for the spatial region-enhanced attention mask; this mask in turn guides the input weights of the detail branch more effectively. Subsequently, we perform semantic and detail extraction at three scales, applying semantic-aware loss functions to regulate the performance of the semantic extraction process. This approach encourages the semantic branch to isolate and extract semantic information free from degradation by comparing its outputs with those obtained from real images using an identical semantic extraction framework. During the fusion phase, iterative semantic information guides the progressive integration of details, thereby ensuring the recovery quality of image details. Finally, a unified decoder is employed to obtain multi-scale residuals of the restored image, with these multi-scale processing results contributing to the loss constraints.
The contributions of this paper are as follows:
  • We utilize neural representations to obtain normalized degraded images and measure the gap between the fitted and original images to derive a degradation position indication matrix that guides detail extraction.
  • We compute a semantic loss function using a specialized semantic information extraction branch, which is designed to better capture partially obscured semantic information within the image content.
  • We design an iterative semantic-guided detail fusion module that progressively introduces details guided by the inherent information within the image itself, facilitating detail integration.
  • We present an effective training strategy for handling imperfectly matched real images, leveraging the semantic loss function and freezing the detail branch to prevent overfitting issues that may arise from pixel-wise comparisons.

2. Related Work

2.1. Single Image Deraining

Single-image deraining is both a significant and challenging problem. Early traditional algorithms [26,27] addressed rain-induced image degradation by leveraging prior knowledge. However, these priors often relied on specific assumptions, limiting the ability of traditional methods to handle complex real-world scenarios [17,28,29].
In recent years, learning-based methods have achieved superior results compared with traditional algorithms for single-image rain removal. Recognizing the uncertainty in raindrop positions within real rain images, Shao et al. [24] incorporated uncertainty modeling to enhance raindrop removal. In the same year, Quan et al. [23] presented an attention-based framework that simultaneously addresses rain streaks and raindrops. To adapt to the differences between these two types of degradation, Quan et al. employed an adaptive architecture search to find the optimal structure. Chen et al. [20] leveraged uncertainty-driven sparse sampling transformers to concurrently address raindrop and rain line artifacts. Sparse sampling, a technique that selectively attends to critical locations within the image, not only reduces the network’s parameters but also enhances focus on areas prone to degradation. Meanwhile, Chen et al. [30] incorporated a learnable top-k selection operator, which retains the most salient features within the transformer’s query, enabling superior feature aggregation. Recently, Chen et al. [22] introduced a multi-scale, end-to-end transformer architecture that harnesses the power of multi-scale representations through a closed-loop bidirectional operation. This approach optimizes the utilization of information across various scales, effectively capturing degradation patterns. However, a commonality among these methods lies in their primary focus on simulating degradation and targeting uncertain degradation locations, often overlooking the restoration of stable semantic information.
In this paper, we explore the untapped potential of the transformer’s inherent flexibility in representing multi-scale features to extract and harness semantic information. We propose a novel approach that combines the degraded features with semantic cues, aiming to enhance the overall performance of rain removal techniques while preserving crucial semantic details.

2.2. Transformer and Attention

Inspired by the remarkable achievements of Transformer in natural language processing (NLP) and advanced visual tasks [31,32,33], Transformer has also gained widespread adoption in the field of image restoration [20,30,34] due to its flexibility and its ability to serve as a communication tool that attends to information across spatial dimensions. Among the various approaches for image deraining, techniques such as information sparsification [20], extraction of the top-k most salient features [30], and end-to-end multi-scale collaborative representation [22] have emerged as promising directions. These methods prioritize retaining the most crucial information while minimizing the number of parameters required, aiming to achieve a balance between performance and efficiency. Typically, weights are optimized through learning procedures, and the required parameters are selectively chosen through sorting mechanisms. On this basis, we utilize the brightness variation differences induced by implicit neural representations to derive pixel-level masks. This approach enables precise sorting and facilitates the acquisition of pixel-level masks that serve as learning references during the normalization process, ultimately enhancing the effectiveness of image deraining.

2.3. Neural Representation for Image Restoration

Implicit neural representations, a novel and robust technology, represent continuous signals using coordinate-based multi-layer perceptrons (MLPs). In image processing tasks, they are utilized to describe images and have been widely applied in areas such as image compression [35], 3D image tasks [36,37], and video processing [38,39]. Recently, in image restoration, Chen et al. [22] employed implicit neural representations to improve the continuous function representation of common rain degradation effects. Furthermore, Yang et al. [40] utilized the controllable fitting capabilities of implicit neural representations to address the challenge of low-light image enhancement. Building on these advancements, our model leverages the normalization characteristics of neural representations in terms of brightness [40] to identify potential attention regions and extract semantic information from these regions.

3. Proposed Method

3.1. Architecture

The structure of our INR-ISDF is depicted in Figure 1. Given a naturally rain-degraded image as input, the first step involves performing implicit neural representation, computing the difference and score between the normalized output and the real rainy image to obtain a degradation location mask. In the second step, a dual-branch structure is employed to separately extract detail and semantic information, with the semantic information guiding the progressive integration of detail information for comprehensive feature fusion. The third step involves decoding the features and connecting them with the residual of the input rainy image to produce the final clean image.

3.2. Mask Based on Implicit Neural Representations

Motivation. Images captured under genuine rainy conditions inherently exhibit a mixture of various degradations, which are particularly characterized by differing luminance distributions. As shown in Figure 2, this phenomenon is evident in the luminance channel of rainy images, displaying distinct distribution patterns corresponding to different types of degradations. The distribution positions of these diverse degradations tend to approximate randomness, presenting a challenge for processing within a unified framework. Drawing inspiration from Yang et al. [40], our experimental results reveal that utilizing implicit neural representations (INRs) not only normalizes luminance deviations across image sets but also harmonizes luminance variations within different degradation regions of individual images. Consequently, we have developed a modified version of INR (illustrated in Figure 1) as a preliminary normalization strategy, aimed at addressing luminance inconsistencies arising from multiple degradation sources.
We perform preliminary feature extraction on the input degraded image $I_D \in \mathbb{R}^{H \times W \times C}$ to obtain a feature map $I_F \in \mathbb{R}^{H \times W \times C}$, where $H$ and $W$ denote the image resolution. The coordinate information of the image is represented as $X \in \mathbb{R}^{H \times W \times 2}$, with 2 indicating the horizontal and vertical coordinates. Leveraging both features and coordinate information to represent the degraded image $I_D$, as depicted in the neural representation module of Figure 1, we fuse the coordinates and features through a multi-layer perceptron (MLP) to output the image, formulated as follows:
$$I_{INR}[i,j] = \mathrm{MLP}(I_F[i,j], X[i,j]),$$
where $[i,j]$ represents the pixel position, and $I_{INR}[i,j]$ is the fitted RGB value of the degraded image at that position.
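The per-pixel mapping above can be illustrated with a minimal, untrained sketch. Everything beyond the mapping itself is an assumption made for illustration: the helper names `make_mlp`, `mlp_forward`, and `inr_render`, the random weights, the hidden width, the coordinate normalization to [-1, 1], and the sigmoid output are not taken from the paper.

```python
import math
import random


def make_mlp(sizes, seed=0):
    """Build random (untrained) weights for an MLP; sizes = [in, hidden..., out]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def mlp_forward(layers, x):
    """Apply the MLP to one input vector: ReLU on hidden layers, sigmoid at the output."""
    for k, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if k < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    # Assumption: a sigmoid keeps the fitted RGB values inside (0, 1).
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def inr_render(features, layers):
    """Compute I_INR[i][j] = MLP(I_F[i][j], X[i][j]) for every pixel."""
    height, width = len(features), len(features[0])
    image = []
    for i in range(height):
        row = []
        for j in range(width):
            # Assumption: coordinates are normalized to [-1, 1] along each axis.
            coord = [2.0 * i / max(height - 1, 1) - 1.0,
                     2.0 * j / max(width - 1, 1) - 1.0]
            row.append(mlp_forward(layers, features[i][j] + coord))
        image.append(row)
    return image


# Usage: a 2 x 3 feature map with 4 channels is mapped to a 2 x 3 RGB image.
features = [[[0.1, 0.2, 0.3, 0.4] for _ in range(3)] for _ in range(2)]
layers = make_mlp([4 + 2, 8, 3])
fitted = inr_render(features, layers)
```

In a trained model the weights would be learned jointly with the rest of the network; the sketch only shows how features and coordinates are concatenated per pixel before the MLP.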
Spatial Region Enhancement Attention Mask. Distinct from [40], we further exploit the results of the implicit neural representation by translating the implicit brightness suppression process into an explicit spatial distribution, thereby guiding the model to acquire precise location masks of degradations. We compute $\mathrm{Dis}[i,j] = |I_{INR}[i,j] - I_D[i,j]|$, representing the absolute difference between the normalized result and the original degraded image. To derive the attention mask, we employ a nonlinear scoring method, where a predefined hyperparameter $\gamma$ is used to select positions with large discrepancies, assigning a value of 1 to the mask at those locations. Specifically, the calculation is as follows:
$$\mathrm{Score}[i,j] = \frac{1}{2\sigma} \exp\left(\frac{\mathrm{Dis}[i,j]}{\sigma}\right),$$
$$\mathrm{mask}[i,j] = \begin{cases} 1, & \mathrm{Score}[i,j] > \gamma, \\ 0.7, & \text{otherwise}, \end{cases}$$
where $\sigma$ represents a learnable variance parameter and $\mathrm{Score}$ denotes the computed discrepancy score. $\mathrm{mask}$ is subsequently derived from the $\mathrm{Score}$, serving as a spatial regional attention mask that facilitates the network to focus more intently on potential degraded areas. The visualization of the intermediate attention $\mathrm{mask}$ outputs during the testing process is presented in Figure 3. It can be observed that the degradation of raindrops and rain streaks attains higher attention.
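As a rough illustration of the mask computation, the sketch below works on a single-channel image stored as nested lists. It assumes the score grows with the discrepancy, in line with the statement that positions with large discrepancies receive a mask value of 1; the function name `attention_mask`, the fixed `sigma=1.0` (learnable in the paper), and the single-channel simplification are assumptions for illustration only.

```python
import math


def attention_mask(inr_img, degraded_img, sigma=1.0, gamma=0.6, low=0.7):
    """Return a mask that is 1.0 where the INR output deviates strongly
    from the degraded input, and `low` (0.7) elsewhere."""
    mask = []
    for row_inr, row_deg in zip(inr_img, degraded_img):
        mask_row = []
        for fitted, observed in zip(row_inr, row_deg):
            dis = abs(fitted - observed)                      # Dis[i][j]
            score = math.exp(dis / sigma) / (2.0 * sigma)     # Score[i][j], assumed increasing in Dis
            mask_row.append(1.0 if score > gamma else low)
        mask.append(mask_row)
    return mask


# Usage: the second pixel differs by 0.4, so its score exceeds gamma = 0.6.
mask = attention_mask([[0.5, 0.5]], [[0.5, 0.9]])
```

Because the non-selected positions keep a weight of 0.7 instead of 0, the detail branch still sees the whole image and is only biased toward the suspected degradation regions.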

3.3. Iterative Semantic-Guided Detail Fusion Module

As depicted in Figure 1, the proposed module features a dual-path structure, comprising concurrent semantic and detail branches, both adhering to a consistent multi-scale framework. The semantic branch ingests $I_{INR} \in \mathbb{R}^{H \times W \times C}$, which has undergone neural representation-based luminance normalization, while the detail branch processes the original input image $I_D \in \mathbb{R}^{H \times W \times C}$, enhanced with the $\mathrm{mask} \in \mathbb{R}^{H \times W \times C}$ to emphasize intricate features:
$$\mathrm{latent}_s[i,j] = \mathrm{branch}_s(I_{INR}[i,j]),$$
$$\mathrm{latent}_d[i,j] = \mathrm{branch}_d(I_D[i,j] \odot \mathrm{mask}[i,j]).$$
To effectively integrate the features derived from the semantic and detail branches, namely $\mathrm{latent}_s$ and $\mathrm{latent}_d$, we devise an iterative process guided by semantic information, incrementally incorporating a greater proportion of detail features in each iteration. This iterative fusion process is illustrated in Figure 1.
During the first iteration, the output of the semantic branch serves as a guiding signal. For the semantic guidance $l_s^i$ at the $i$-th iteration, we compute a channel attention map $a_s^i = \mathrm{CA}(l_s^i)$, which subsequently modulates the detail features, yielding a weighted contribution:
$$l_{da}^i = \mathrm{ratio}_i \cdot (a_s^i \cdot l_d^i),$$
where $\mathrm{ratio}_i$ denotes the scaling factor for the current iteration. The original semantic guidance $l_s^i$ is then concatenated with the weighted detail features $l_{da}^i$, and a standard convolutional module (two convolutions with a ReLU activation in between and a final sigmoid activation) produces the updated semantic guidance for the next iteration:
$$l_s^{i+1} = \mathrm{sigmoid}(\mathrm{conv}(\mathrm{ReLU}(\mathrm{conv}(\mathrm{concat}(l_s^i, l_{da}^i))))).$$
In this manner, we progressively enhance the utilization of detail information by incrementally increasing the $\mathrm{ratio}_i$ attributed to the detail branch. Furthermore, we conduct experiments investigating the influence of iteration count on the fusion outcome, with the detailed results being presented in Section 4.
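The control flow of this iterative fusion can be sketched as follows. This is a deliberately simplified toy, not the paper's module: features are lists of channels holding flat activation lists, `channel_attention` is a stand-in (sigmoid of the channel mean), `simple_mix` replaces the learned two-convolution module with a single fixed 1 x 1 channel mix followed by ReLU and sigmoid, the same detail features are reused in every iteration, and the `ratios` values are hypothetical since the paper does not list them.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(feat):
    """Stand-in for CA: one weight per channel, the sigmoid of the channel mean."""
    return [sigmoid(sum(ch) / len(ch)) for ch in feat]


def simple_mix(stacked):
    """Stand-in for the learned conv module: average each semantic channel
    with its weighted detail channel (fixed 1x1 mix), then ReLU, then sigmoid."""
    c = len(stacked) // 2
    out = []
    for k in range(c):
        merged = [0.5 * (s + d) for s, d in zip(stacked[k], stacked[k + c])]
        out.append([sigmoid(max(0.0, v)) for v in merged])
    return out


def fuse_step(l_s, l_d, ratio, mix=simple_mix):
    """One iteration: l_da = ratio * (a_s * l_d), then update the semantic guidance."""
    a_s = channel_attention(l_s)
    l_da = [[ratio * a * v for v in ch] for a, ch in zip(a_s, l_d)]
    return mix(l_s + l_da)  # list concatenation stacks the channels (C -> 2C)


def iterative_fusion(l_s, l_d, ratios=(0.25, 0.5, 1.0)):
    """Run the fusion with an increasing detail ratio (values are hypothetical)."""
    for ratio in ratios:
        l_s = fuse_step(l_s, l_d, ratio)
    return l_s


# Usage: two channels with two activations each, fused over three iterations.
semantic = [[0.0, 1.0], [2.0, -1.0]]
detail = [[1.0, 1.0], [0.5, 0.5]]
fused = iterative_fusion(semantic, detail)
```

The point of the sketch is the loop structure: the semantic guidance gates the detail features through channel attention, and the share of detail admitted grows from one iteration to the next.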

3.4. Loss Function

Our INR-ISDF calculates semantic loss and fitting loss to leverage the controllable fitting capability of INR. The overall loss function of the network is defined as follows:
$$L_{total} = L_{PSNR}(I_{GT}, I_{pre}) + \lambda_1 L_{INR} + \lambda_2 L_{semantic},$$
where $I_{GT}$ represents the ground truth and $I_{pre}$ represents the predicted result corresponding to the ground truth. $L_{INR}$ calculates the L1 loss between $I_{INR}$ and $I_D$. $L_{semantic}$ calculates the semantic perceptual loss between the input rainy image and the original image, specifically defined as follows:
$$L_{semantic} = \ell_1(\mathrm{branch}_s(I_{rain}), \mathrm{branch}_s(I_{GT})),$$
which quantifies the difference between the semantic representations extracted by the model’s semantic branch during inference on the rainy image $I_{rain}$ and the corresponding representations when the ground truth image $I_{GT}$ is fed into the same branch.
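A minimal sketch of how the three terms combine is given below. The helper names `l1_loss` and `total_loss` are hypothetical, the inputs are flat lists of values, and the PSNR-based term is passed in as a precomputed number because its exact form is not spelled out in the text; the default weights 0.3 and 0.5 are the values reported in the implementation details.

```python
def l1_loss(a, b):
    """Mean absolute difference between two equally sized flat sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def total_loss(l_psnr, l_inr, l_semantic, lam1=0.3, lam2=0.5):
    """Weighted sum: L_total = L_PSNR + lam1 * L_INR + lam2 * L_semantic."""
    return l_psnr + lam1 * l_inr + lam2 * l_semantic


# Usage with toy values: the INR term compares the fitted and degraded images,
# the semantic term compares semantic features of the rainy and clean images.
l_inr = l1_loss([0.2, 0.4], [0.3, 0.4])          # fitted vs. degraded pixels
l_sem = l1_loss([1.0, 0.0], [0.5, 0.5])          # semantic features (toy numbers)
loss = total_loss(l_psnr=1.0, l_inr=l_inr, l_semantic=l_sem)
```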

4. Experiments

4.1. Implementation Specifications

Our model is trained on an NVIDIA GeForce RTX 4060 GPU for a total of 2000 epochs, utilizing input images cropped to a resolution of 128 × 128 pixels. The network architecture comprises a 3-level encoder-decoder structure, whose levels have channel depths of 16, 32, and 64, respectively, and process patches with edge lengths of 128, 64, and 32. The hidden layer iterations are set to 3, and the hyperparameter $\gamma$ for mask computation is tuned to 0.6. The loss function incorporates sparsity terms with $\lambda_1$ and $\lambda_2$ values of 0.3 and 0.5, respectively.
To manage the learning rate, we employ the Adam optimizer with an initial learning rate of 0.0007, which gradually decays to $1 \times 10^{-8}$ through a cosine annealing schedule. Data augmentation techniques, including random rotation and horizontal flipping, are applied to enrich the dataset and enhance the model’s performance.
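For reference, the standard closed form of cosine annealing can be written as a small function. The sketch assumes a single annealing cycle spanning all 2000 epochs with no warm restarts, and reads the final learning rate as 1e-8; the function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(epoch, total_epochs=2000, lr_max=7e-4, lr_min=1e-8):
    """Cosine annealing from lr_max at epoch 0 down to lr_min at total_epochs."""
    t = min(max(epoch, 0), total_epochs)  # clamp to the schedule range
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / total_epochs))


# Usage: learning rate at the start, midpoint, and end of training.
lr_start, lr_mid, lr_end = cosine_lr(0), cosine_lr(1000), cosine_lr(2000)
```

The rate falls slowly at first, fastest around the midpoint, and flattens again near the end, which keeps late updates small.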
The training strategy for real-world datasets. Given the fact that paired images in real-world datasets do not perfectly match pixel-by-pixel, we leverage the dual-branch architecture of our proposed INR-ISDF algorithm. In the initial 1500 epochs, we leverage a synthetic dataset to refine the detail-oriented branch, where the pixel-wise comparison loss function is employed to achieve precise detail matching. Subsequently, for the remaining 500 epochs, we switch to the real-world dataset, where we freeze the detail-oriented branches and instead concentrate on fine-tuning the model using semantically relevant loss functions. To support this approach, we adjust the hyperparameters, setting the coefficient $\lambda_2$ for the semantic loss function $L_{semantic}$ to 0.5 during the first 1500 epochs and increasing it to 1 for the final 500 epochs.
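The two-stage schedule can be summarized as a per-epoch configuration lookup. This is only a sketch of the schedule described above: the function name `stage_config` and the dictionary keys are hypothetical, and how the freezing is applied to the actual network is left out.

```python
def stage_config(epoch, switch_epoch=1500, total_epochs=2000):
    """Return the training settings for a given epoch (0-based)."""
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch outside the training schedule")
    if epoch < switch_epoch:
        # Stage 1: synthetic pairs, detail branch trainable, lambda_2 = 0.5.
        return {"dataset": "synthetic", "freeze_detail_branch": False, "lambda2": 0.5}
    # Stage 2: real-world pairs, detail branch frozen, lambda_2 raised to 1.
    return {"dataset": "real", "freeze_detail_branch": True, "lambda2": 1.0}


# Usage: settings just before and just after the switch.
before, after = stage_config(1499), stage_config(1500)
```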

4.2. Datasets

4.2.1. RainDS

The RainDS dataset [23] encompasses both synthetic and real-world images. Each subset comprises paired images with three degradation combinations: rain streaks only, raindrops only, and a mixture of both. Specifically, RainDS-Syn contains 3600 image pairs, of which 3000 are utilized for network training, while the remaining 600 pairs are reserved for testing. RainDS-Real, on the other hand, comprises 750 image pairs, with 450 used for training and 300 for testing purposes.

4.2.2. Real-World Rainy Images Dataset

The Real-World Rainy Images Dataset [41] features high diversity in image content, rain intensity, direction, and other factors. This dataset includes a total of 50 images and is used exclusively for testing purposes.

4.2.3. RSVD

The RSVD dataset [42] employs Unreal Engine and video enhancement techniques to create realistic and diverse snowy and foggy images. This dataset comprises 80 training videos and 30 testing videos, with each set containing anywhere from 30 to 200 video frames. A subset of image pairs from these scenes has been carefully selected for training and testing purposes, with 5786 image pairs allocated to the training dataset and 266 image pairs designated for the testing dataset.

4.2.4. RainDS-Low-Light

To further evaluate the model’s genuine deraining capability under varying lighting conditions or camera sensitivity settings, we simulated different intensities of low-light scenarios on the previously mentioned RainDS-real dataset [23], resulting in our RainDS-low-light dataset. Specifically, we applied gamma transformation and linear scaling to the rainy images to simulate low-light conditions, with gamma adjustment values being set to 0.5, 0.6, 0.7, and 0.8. Our RainDS-low-light dataset comprises 600 image pairs for training and 392 image pairs for testing.

4.3. Comparisons to Existing Methods

Table 1 presents the results of the comparative experiments. Our INR-ISDF demonstrates exceptional performance in terms of both peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) on both the RainDS-syn and RainDS-real datasets. Notably, our method exhibits a more significant improvement in real-world images compared with synthetic ones, demonstrating that our network is tailored to address more realistic rainy weather degradations.
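For readers unfamiliar with the metric, PSNR follows directly from the mean squared error between the restored image and the ground truth. The sketch below uses the standard definition on flat lists of pixel values in [0, 1]; the function name `psnr` and the `max_val` default are illustrative choices, and the evaluation code used for the reported numbers may differ in details such as color space.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# Usage: a uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01.
value = psnr([0.0, 0.0], [0.1, 0.1])
```

Higher PSNR means a smaller pixel-wise error, which is why it rewards exact matches and is complemented by SSIM for structural fidelity.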
Furthermore, we provide visual comparisons on the RainDS-real dataset [23], including DRSformer [30], GT-Rain [45], NAFNet [46], NeRD-Rain [22], Restormer [19], UDR-S²Former [20], and our INR-ISDF approach in Figure 4 and Figure 5. It can be observed that our proposed method reconstructs rain-free images with the lowest error compared with the other methods.
To validate the performance of our network on real-world rainy images, we conducted tests using the unpaired Real-World Rainy Images Dataset [41]. As depicted in Figure 6, the visual outcomes clearly demonstrate that our network outperforms other state-of-the-art methods in terms of rain removal while preserving finer details and color fidelity when processing real rainy images.
In addition, to validate the robustness and scalability of the proposed INR-ISDF algorithm, we conducted experiments on both the RSVD dataset [42] and the RainDS-low-light dataset, which we constructed based on the RainDS dataset [23]. Figure 7 and Figure 8 present visual exemplars that demonstrate the algorithm’s performance in challenging environments. Additionally, Table 2 showcases the quantitative results of our INR-ISDF algorithm compared with the state-of-the-art, evaluated on the RSVD dataset. These tests demonstrate the algorithm’s capability to effectively handle diverse and demanding visual inputs.

4.4. Ablation Studies

Table 3 demonstrates the impact of the number of iterations for the INR module, mask operation, and recurrent iteration module on the network performance. The results of the ground truth image are presented on the right side of Table 3, while the visual representation is showcased in Figure 9. Table 4 presents the impact of our specific design choices in loss functions on network performance. Ablation experiments were conducted on the RainDS-real dataset by individually setting the coefficients $\lambda_1$ for $L_{INR}$, $\lambda_2$ for $L_{semantic}$, and both $\lambda_1$ and $\lambda_2$ to zero. These experiments elucidate the individual contributions of each loss component to the overall performance. Analysis of the experimental results reveals that the proposed components and loss function significantly enhance the restoration of real-world images.
Furthermore, we conducted experiments with iteration counts ranging from 0 to 6. When the iteration count is 0, dimension matching is directly achieved through concatenation and dimensionality reduction operations. The experimental results on the RainDS-real dataset, as illustrated in Figure 10, demonstrate that the utilization of semantic features by the iterative module attains an overall optimal performance at three iterations. Notably, excessive iterations may lead to the loss of semantic information, underscoring the indispensability of semantic information for image restoration.

5. Conclusions

In this work, our proposed INR-ISDF approach harnesses the controllable fitting capabilities of INR to provide normalization effects and positioning references for various types of rain degradation. Through the iterative fusion of detail and semantic branches, we uncover relatively definitive semantic information amidst the uncertain degradation fitting, thereby guiding the restoration of image quality to a superior level. Extensive experiments conducted on multiple real-world rain image datasets demonstrate that the INR-ISDF method outperforms state-of-the-art methods on these datasets. Despite its effectiveness, the model struggles to maintain detail and color recovery under low-light and extreme rain conditions and has real-time processing limitations due to Transformer complexity. Future work will focus on expanding data diversity with semi-supervised learning and optimizing the network for real-time efficiency.

Author Contributions

Conceptualization, Z.W. and L.X.; methodology, L.X.; validation, Z.W., W.R. and X.Y.; resources, T.C.; data curation, P.Z. and Y.C.; writing—original draft preparation, L.X. and Z.W.; writing—review and editing, Z.W., W.R., and X.Y.; visualization, Y.C. All authors have read and agreed to the published version of the manuscript.


Funding

This research was funded by the National R&D Program of China (Grant No. 2023YFB2504703), the Shaanxi International S&T Cooperation Program Project (2024GH-YBXM-24), the National Natural Science Foundation of China (52172379), and the Fundamental Research Funds for the Central Universities (300102242901).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Zijian Wang, Wen Rong and Xinpeng Yao were employed by the company Shandong Hi-Speed Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


Figure 1. The overview of our INR-ISDF network, which includes three main steps. Step 1: Implicit neural representation and mask extraction. Step 2: Dual-branch structure and semantic information-guided detail feature fusion. Step 3: Decoding output.
Figure 2. Comparison of pixel value distributions in the Y channel of multiple degraded rainy images before (left) and after INR normalization (right). The degraded rainy image samples are sourced from the RainDS dataset [23] and include both rain streak and raindrop degradations. It is evident that after INR normalization, the luminance disparities have been somewhat normalized.
Figure 3. The visualization of intermediate results of the attention mask on the RainDS-real [23].
Figure 4. Visual comparison of real-world rain image restoration on RainDS-real [23], including using NeRD-Rain [22], Restormer [19], UDR-S²Former [20], and our proposed INR-ISDF. The left column displays the unprocessed rainy images, while the right column presents the corresponding ground truth.
Figure 5. Visual comparison of real-world rain image restoration on RainDS-real [23], including using DRSformer [30], GT-Rain [45], NAFNet [46], and our proposed INR-ISDF. The left column displays the unprocessed rainy images, while the right column presents the corresponding ground truth.
Figure 6. Visual comparison of the Real-World Rainy Images Dataset [41], including NAFNet [46], NeRD-Rain [22], Restormer [19], UDR-S²Former [20], and our proposed INR-ISDF method. The left column displays the unprocessed rainy images.
Figure 7. Visual examples showcasing the effectiveness of our INR-ISDF method on the RainDS-low-light dataset, which is derived from the RainDS dataset [23]. The odd-numbered rows display the original degraded images. The even-numbered rows present the corresponding results achieved by our INR-ISDF method.
Figure 8. Visual examples showcasing the effectiveness of our INR-ISDF method on the RSVD snowy dataset [42]. The odd-numbered rows display the original degraded images. The even-numbered rows present the corresponding results achieved by our INR-ISDF method.
Figure 9. Visual comparison of real-world rain image restoration on RainDS-real [23]. The sequence from left to right showcases the rain-image, baseline, integration of three IF-Block modules, enhancement with an INR module, our proposed INR-ISDF (INR + IF-Block × 3), and ground truth.
Figure 10. Image restoration results from 0 to 6 iterations, with iterations indicated on the horizontal axis. The left vertical axis (blue curve) represents the peak signal-to-noise ratio (PSNR), which peaks at iteration 3. The right vertical axis (orange curve) denotes the structural similarity index measure (SSIM), which achieves its best score at iteration 1.
Table 1. Comparison of image deraining results on the RainDS datasets [23]. All model-based algorithms have been retrained under the same conditions. Our proposed INR-ISDF method achieves the top performance, showcasing a substantial improvement over existing approaches in addressing real-world rainy scenarios.
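| Method | Syn RS PSNR | Syn RS SSIM | Syn RD PSNR | Syn RD SSIM | Syn RDS PSNR | Syn RDS SSIM | Real RS PSNR | Real RS SSIM | Real RD PSNR | Real RD SSIM | Real RDS PSNR | Real RDS SSIM | Params | Computation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GMM (2016) [27] | 26.66 | 0.781 | 23.04 | 0.793 | 21.50 | 0.669 | 23.73 | 0.560 | 18.60 | 0.554 | 21.35 | 0.576 | (traditional method) | |
| JCAS (2017) [43] | 26.46 | 0.786 | 23.15 | 0.811 | 20.91 | 0.671 | 24.04 | 0.556 | 18.18 | 0.555 | 21.22 | 0.585 | (traditional method) | |
| JRSRD (2021) [25] | 29.24 | 0.905 | 28.52 | 0.926 | 23.67 | 0.758 | 20.17 | 0.688 | 20.26 | 0.672 | 18.41 | 0.605 | 7.20 M | 24.60 G |
| IDT (2022) [44] | 36.56 | 0.972 | 33.97 | 0.975 | 29.74 | 0.924 | 26.88 | 0.741 | 24.64 | 0.695 | 18.48 | 0.552 | 16.00 M | 61.19 G |
| CCN (2021) [23] | 35.12 | 0.970 | 33.29 | 0.975 | 28.75 | 0.921 | 26.83 | 0.737 | 24.81 | 0.701 | 18.74 | 0.556 | 3.75 M | 245.85 G |
| DRSformer (2023) [30] | 29.32 | 0.921 | 27.96 | 0.901 | 24.15 | 0.731 | 26.63 | 0.703 | 24.25 | 0.696 | 18.96 | 0.525 | 33.7 M | 242.9 G |
| GT-Rain (2022) [45] | 26.23 | 0.741 | 25.74 | 0.722 | 22.19 | 0.672 | 26.98 | 0.712 | 24.59 | 0.699 | 19.07 | 0.508 | 2.29 M | 29.6 G |
| NAFNet (2022) [46] | 36.34 | 0.968 | 32.33 | 0.975 | 29.00 | 0.868 | 26.76 | 0.737 | 24.93 | 0.704 | 19.56 | 0.607 | 40.60 M | 16.19 G |
| NeRD-Rain (2024) [22] | 37.33 | 0.978 | 35.49 | 0.976 | 31.63 | 0.939 | 26.43 | 0.736 | 24.95 | 0.705 | 20.17 | 0.622 | 10.53 M | 79.2 G |
| Restormer (2022) [19] | 36.86 | 0.977 | 34.97 | 0.976 | 31.43 | 0.936 | 26.66 | 0.740 | 25.02 | 0.702 | 21.57 | 0.632 | 26.10 M | 140.99 G |
| UDR-S²Former (2023) [20] | 37.28 | 0.976 | 34.96 | 0.979 | 32.56 | 0.961 | 27.29 | 0.739 | 25.63 | 0.708 | 22.05 | 0.635 | 8.53 M | 21.58 G |
| INR-ISDF | | | | | | | | | | | | | 12.28 M | 135.86 G |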
Table 2. Comparison of image desnowing results on the RSVD dataset [42].
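| Method | PSNR | SSIM |
|---|---|---|
| NeRD-Rain [22] | 23.327 | 0.904 |
| AirNet [47] | 23.530 | 0.898 |
| AIRFormer [49] | 24.132 | 0.904 |
| Uformer [51] | 24.327 | 0.900 |
| WeatherDiff [53] | 24.428 | 0.910 |
| UDR-S²Former [20] | 24.655 | 0.914 |
| DEA-Net [48] | 24.746 | 0.909 |
| SnowFormer [50] | 24.891 | 0.908 |
| DyNet [52] | 24.949 | 0.916 |
| INR-ISDF | 25.240 | 0.915 |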
Table 3. Ablation study of the individual components. Every proposed component plays an indispensable role in real-world rainy image restoration.
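| Configuration | PSNR | SSIM | PSNR | SSIM |
|---|---|---|---|---|
| + INR | 32.23 (+0.02) | 0.959 (+0.002) | 23.17 (+1.28) | 0.633 (+0.007) |
| + IF-Block × 1 | 32.34 (+0.13) | 0.960 (+0.003) | 22.46 (+0.57) | 0.642 (+0.016) |
| + IF-Block × 3 | 32.67 (+0.46) | 0.958 (+0.001) | 22.98 (+1.09) | 0.638 (+0.012) |
| INR-ISDF | 32.68 (+0.47) | 0.960 (+0.003) | 23.72 (+1.83) | 0.650 (+0.024) |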
Table 4. Ablation study of $L_{INR}$ and $L_{semantic}$.
| Loss | PSNR | SSIM | PSNR | SSIM |
|---|---|---|---|---|
| $L_{PSNR}$ | 32.53 | 0.958 | 23.61 | 0.648 |
| + $L_{INR}$ | 32.62 (+0.09) | 0.959 (+0.001) | 23.65 (+0.05) | 0.648 (+0) |
| + $L_{semantic}$ | 32.60 (+0.07) | 0.960 (+0.002) | 23.64 (+0.04) | 0.649 (+0.001) |
| INR-ISDF | 32.68 (+0.15) | 0.960 (+0.002) | 23.72 (+0.11) | 0.650 (+0.002) |

