Article

Multi-Scale Detail–Noise Complementary Learning for Image Denoising

1 College of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing 210038, China
2 School of Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
3 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
4 Jiangsu Province Engineering Research Center of Advanced Computing and Intelligent Services, Nanjing University of Information Science and Technology, Nanjing 210044, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 7044; https://doi.org/10.3390/app14167044
Submission received: 2 July 2024 / Revised: 4 August 2024 / Accepted: 8 August 2024 / Published: 11 August 2024
(This article belongs to the Special Issue Advances in Neural Networks and Deep Learning)

Abstract

Deep convolutional neural networks (CNNs) have demonstrated significant potential for improving image denoising performance. However, most denoising methods fuse features from different levels through long and short skip connections, which easily introduces redundant information, weakens the complementarity between levels, and ultimately causes the loss of image details. In this paper, we propose a multi-scale detail–noise complementary learning (MDNCL) network for additive white Gaussian noise removal and real-world noise removal. The MDNCL network comprises two branches, namely the Detail Feature Learning Branch (DLB) and the Noise Learning Branch (NLB). Specifically, a loss function is applied to guide the complementary learning of image detail features and noise mappings in these two branches. This learning approach effectively balances noise reduction and detail restoration, especially at high noise levels. To enhance the complementarity of features between different network layers and avoid redundant information, we design a Feature Subtraction Unit (FSU) to capture the differences in features across the DLB network layers. Extensive experimental evaluations demonstrate that MDNCL achieves impressive denoising performance and outperforms other popular denoising methods.

1. Introduction

Image denoising has become increasingly critical in computer vision tasks, such as satellite remote sensing [1], video object detection [2], industrial inspection [3], and medical imaging [4]. This is primarily due to the heightened quality requirements for input images in these domains. The objective of image denoising is to recover a clean image, denoted as x, from the model x = y − n, where y represents the noisy image and n represents the additive white Gaussian noise (AWGN) with a standard deviation of σ [5].
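For concreteness, the AWGN degradation used throughout the synthetic-noise experiments can be simulated as follows. This is a minimal PyTorch sketch (not taken from the authors' code), assuming images normalized to [0, 1] and σ given on the 0–255 intensity scale:

```python
import torch

def add_awgn(x: torch.Tensor, sigma: float) -> torch.Tensor:
    """Corrupt a clean image x (values in [0, 1]) with AWGN of standard deviation sigma,
    following the degradation model y = x + n; sigma is given on the 0-255 scale."""
    noise = torch.randn_like(x) * (sigma / 255.0)
    return x + noise

# Example: synthesize a noisy grayscale training patch at sigma = 25.
clean = torch.rand(1, 1, 58, 58)
noisy = add_awgn(clean, sigma=25)
```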
Traditional image denoising methods leverage the inherent structural characteristics of images, such as sparsity, low rank, and non-local similarity. Aharon et al. [6] addressed the limited adaptability of fixed dictionaries by successfully applying the K-singular value decomposition (K-SVD) algorithm to construct sparse dictionaries. Dabov et al. [7] introduced the block-matched three-dimensional (BM3D) filtering method, which combines non-local self-similarity and frequency-domain transformation to search for similar blocks. Wu et al. [8] proposed an enhanced Weighted Nuclear Norm Minimization (WNNM) algorithm that incorporates adaptive median filtering and the properties of WNNM to suppress noise by constraining the parameters of low-rank matrices. Although traditional image denoising methods can deliver satisfactory denoising results, they often involve computationally expensive procedures and rely heavily on prior knowledge of the image.
Compared to traditional denoising algorithms, deep convolutional neural networks (CNNs) have achieved significant improvements in image denoising by automatically learning network parameters from large datasets without relying on prior assumptions. DnCNN [9] generates residual images containing only noise through residual learning and improves denoising performance by combining residual learning with batch normalization (BN). Building on DnCNN, Zhang et al. [10] subsequently proposed a fast and flexible denoising network (FFDNet), which takes a noise-level map as an additional input and can therefore address the more complex problem of real-world image denoising. However, as network depth increases, deep layers may learn similar or duplicate feature representations, resulting in redundant features and degraded denoising performance [11]. Researchers have therefore denoised images by increasing the width of the network or by using skip connections. Gurrola-Ramos et al. [12] proposed a residual dense U-Net (RDUNet) based on a densely connected hierarchical network for image denoising; its encoding and decoding layers are composed of densely connected convolutional layers that reuse feature maps and apply local residual learning, which avoids vanishing gradients and speeds up training. Qu et al. [13] proposed a simple and effective multi-scale denoising back-projection feature fusion network to recover underwater images affected by complex degradation, addressing the limitations of traditional encoder–decoder methods in detail recovery and in handling complex degradation. Liu et al. [14] constructed a wide CNN (WCNN) framework that reorganizes several convolutional layers; its subnetworks are trained in parallel with different loss functions so that the WCNN becomes a truly wide network, which greatly shortens the training time. Wu et al. [15] proposed a dual convolutional neural network with attention (DCANet) for blind image denoising, the first network to combine dual CNNs and attention mechanisms for this task; DCANet can suppress not only single noise types but also mixed noise and real-world noise.
Although the denoising algorithms described above achieve good performance, most of them fuse features from different levels through long and short skip connections, which easily introduces redundant information, weakens the complementarity between levels, and leads to the loss of image details [11]. Effectively retaining the detail features of an image therefore remains a challenge in denoising tasks. To address these limitations, we propose a new multi-scale detail–noise complementary learning (MDNCL) network for AWGN and real-world noise removal. Our approach involves several steps. First, we employ a high-pass filter to separate the input image into a detail layer and a base layer. The detail layer contains edge and texture features along with the noise mappings, while the base layer represents the approximate structure of the image. The MDNCL network consists of two branches, namely the detail feature learning branch (DLB) and the noise learning branch (NLB). The detail layer serves as the input to both branches, which separately learn image detail features and noise mappings. In the DLB, we apply a 2D Discrete Cosine Transform (DCT) to transform the detail layer from the spatial domain to the frequency domain. We also introduce the feature subtraction unit (FSU) to obtain the differences between detail features of different network layers. These difference features are then fused to produce rich multi-scale difference features, enhancing the complementarity between different network layers. Finally, the learned detail features are combined with the base layer to reconstruct the denoised image. We train the MDNCL network with appropriate loss functions to balance noise removal and detail preservation. The proposed MDNCL effectively retains more image details and yields favorable denoising performance.
Overall, the contributions of the proposed MDNCL can be summarized as follows:
  • A novel multi-scale detail–noise complementary learning (MDNCL) network is introduced for the removal of AWGN and real-world noise. The MDNCL network incorporates two branches that extract rich detail features, effectively striking a balance between noise reduction and detail retention.
  • Complementary learning is approached from two perspectives. Firstly, the MDNCL network facilitates the complementary learning of detail features and noise distribution, enabling more effective noise removal while preserving important image details. Secondly, the network facilitates the complementary learning of features between different network layers within the detail feature learning branch.
  • The proposed FSU is proven to be an effective component in the MDNCL network. The FSU captures and leverages difference features within and between network layers. By avoiding the redundancy in image features and enhancing the complementarity between different network layers, the FSU further enhances the denoising performance.
The remainder of this paper is structured as follows. Section 2 presents a review of related work. Section 3 introduces the proposed MDNCL approach in detail, describing its architecture and learning strategies. Section 4 describes the datasets used in our study, presents the implementation details, and reports the experimental results. Section 5 concludes the study and summarizes the key findings and contributions of our research.

2. Related Work

2.1. Multi-Scale Differential Feature Extraction

In the field of image denoising, the ability of a denoising algorithm to preserve the detailed features of the original image is a crucial consideration. Scale cues play an important role in capturing the contextual information of images. Multi-scale differential feature extraction is particularly valuable in computer vision and image processing, as it yields more discriminative and robust features and ultimately improves model performance. Huang et al. [16] proposed the Multi-Scale Feature Subtraction Fusion (MFSF) module and the Feature Depth Supervision (FDS) module, which not only reduce redundant features but also provide additional supervision over the scale features in the decoder, thereby improving the training efficiency and performance of the network. Gan et al. [17] achieved a global understanding of image depth through a multi-scale structure, subtracting two feature maps at each level to extract differential features. This approach is similar to U-Net [18], where upsampling and downsampling are used to compute and extract multi-scale differences. Inspired by these methods, the U-shaped structure and the long and short skip connections of the DLB are borrowed from U-Net++ [19]. The FSU in the DLB obtains difference features by subtracting feature maps of different scales, providing reliable image features and improving denoising performance.

2.2. Deep CNNs for Image Denoising

With the development of deep learning, CNN-based methods have gradually become the mainstream in image denoising. Tian et al. [20] proposed a denoising convolutional neural network (ADNet) that uses the principle of nonlinear diffusion to decompose a noisy image into a noise part and a signal part; the noise part is removed by nonlinear diffusion, and the signal part is extracted by a deep CNN. Zhang et al. [21] proposed a robust deformable denoising CNN. The algorithm addresses offset pixels in the feature maps of noisy images by integrating deformable convolution into the CNN, and it combines dilated convolution with BN and the rectified linear unit (ReLU) to realize feature interaction, improving the denoising performance of the model. To address the loss of high-frequency features caused by feature scaling and the neglect of in-scale features, Shen et al. [22] adopted dynamic convolution to improve the learning of high-frequency and multi-scale features, achieving better denoising performance with lower computational complexity. Deep CNN models are able to learn complex image features and perform well when dealing with highly noisy images.

2.3. Multi-Branch-Based Network for Image Application

Multi-branch networks have demonstrated strong expressive power and the ability to capture nonlinear features in hidden spaces. Researchers have utilized such networks to tackle various computer vision tasks by assigning different branches to learn different content. For instance, Neshatavar et al. [23] proposed a three-branch, self-supervised denoising network based on a cyclic multivariate function, in which each branch learns a specific component: the predicted clean image, the signal-dependent noise, or the signal-independent noise. Li et al. [24] developed a three-branch, omni-frequency region-adaptive network (ORNet) consisting of low-frequency, medium-frequency, and high-frequency branches; each branch enhances the corresponding frequency components to recover texture details. Wang et al. [25] implemented a simple lightweight network (LWNET) for image denoising with a two-branch structure, in which the decoder in the upper branch learns a quarter of the channel features, whereas the decoder in the lower branch learns the remaining three-quarters. In our denoising network, we adopt a two-branch architecture: one branch directly learns the detail features from the detail layer, while the other learns the noise mappings from the detail layer. This design allows for the effective extraction and preservation of important image details.

3. Proposed Method

Many existing denoising networks struggle to properly capture the detailed textures of an image. Additionally, as network depth increases, computational requirements and redundant image features tend to grow, weakening the complementarity among features in different network layers and causing a loss of fine-grained details. To address these issues, we propose the MDNCL approach, which consists of a detail feature learning branch (DLB), a noise learning branch (NLB), and a fusion module (FM). The architecture of the proposed MDNCL approach is depicted in Figure 1. The DLB focuses on learning detail features, whereas the NLB learns the noise mappings from the detail layer. The FM combines and fuses the features learned from both branches. Finally, we employ element-wise addition with the base layer to reconstruct the clean image, incorporating the information gained from both the detail and noise components.
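The overall data flow can be summarized by the following hypothetical skeleton. It reflects our reading of Figure 1 rather than the authors' released implementation, and the submodules it wires together (high-pass decomposition, DLB, NLB, and fusion) are sketched in the subsections below:

```python
import torch
import torch.nn as nn

class MDNCLSkeleton(nn.Module):
    """Hypothetical top-level wiring of the pipeline in Figure 1; the submodules
    are passed in as callables and are sketched in the following subsections."""

    def __init__(self, high_pass, dlb, nlb, fusion):
        super().__init__()
        self.high_pass = high_pass   # noisy image -> detail layer
        self.dlb = dlb               # detail layer -> predicted detail (frequency branch)
        self.nlb = nlb               # detail layer -> predicted detail + noise map
        self.fusion = fusion         # fuse both detail predictions, add base layer

    def forward(self, noisy: torch.Tensor) -> torch.Tensor:
        detail = self.high_pass(noisy)            # high-frequency components
        base = noisy - detail                     # low-frequency approximation
        detail_dlb = self.dlb(detail)             # DLB output
        detail_nlb, _noise = self.nlb(detail)     # NLB outputs (detail, noise map)
        return self.fusion(detail_nlb, detail_dlb, base)
```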

3.1. Noise Learning Branch

To preserve the high-frequency information in noisy images, we begin by applying a high-pass filter to emphasize edges and details. The filtering convolves the noisy image with a high-pass filter kernel, i.e., a matrix with a negative center value and positive surrounding values; the convolution suppresses the low-frequency part of the image and highlights the high-frequency part. The resulting output, known as the detail layer, isolates the high-frequency components. Next, we subtract the detail layer from the noisy image to obtain the base layer, which can be represented as follows:
$I_{Detail} = f_{HF}(I_N), \quad I_{Base} = I_N - I_{Detail},$  (1)
where $I_N$ represents the noisy image, while $I_{Detail}$ and $I_{Base}$ denote the detail layer obtained through the high-pass filter and the base layer, respectively. The base layer is calculated as the difference between the noisy image and the detail layer, and $f_{HF}$ denotes the function of the high-pass filter.
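A minimal sketch of this decomposition is given below. The exact high-pass kernel is not specified in the paper, so a 3 × 3 Laplacian-style kernel (negative center, positive surround) is assumed purely for illustration:

```python
import torch
import torch.nn.functional as F

# Assumed 3x3 high-pass kernel: negative center, positive surround, zero sum.
HP_KERNEL = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]])

def decompose(noisy: torch.Tensor):
    """Split a noisy image (N, C, H, W) into detail and base layers as in Eq. (1)."""
    c = noisy.shape[1]
    weight = HP_KERNEL.view(1, 1, 3, 3).repeat(c, 1, 1, 1)   # one kernel per channel
    detail = F.conv2d(noisy, weight, padding=1, groups=c)    # I_Detail = f_HF(I_N)
    base = noisy - detail                                    # I_Base = I_N - I_Detail
    return detail, base
```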
The NLB is responsible for learning the noise mappings and producing a predicted detail layer. The structure of the NLB, illustrated in Figure 1a, comprises two 1 × 1 convolutional layers and six dense blocks (DBs). Each DB consists of three 3 × 3 convolutional layers, each followed by a ReLU, and one 3 × 3 convolutional layer followed by BN and a ReLU. A detailed schematic of a DB is depicted in Figure 1b. Collectively, this process can be represented as follows:
$\hat{N} = f_{DB}^{6}(Conv(I_{Detail})), \quad I_d' = Conv(Conv(I_{Detail}) - \hat{N}), \quad N' = Conv(\hat{N}),$  (2)
where $\hat{N} \in \mathbb{R}^{H \times W \times 64}$ and $N' \in \mathbb{R}^{H \times W \times C}$ denote the noise mappings, with $C$ equal to 1 or 3 depending on whether the noisy image is grayscale or color; $N'$ is used for the loss calculation of MDNCL in Section 3.4. $I_d' \in \mathbb{R}^{H \times W \times C}$ represents the predicted detail features learned by the NLB, where $C$ is again 1 or 3. $f_{DB}^{6}$ denotes six cascaded DBs, and $Conv$ denotes a 1 × 1 convolution.
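The following sketch illustrates one plausible PyTorch realization of a DB and of the NLB wiring in Equation (2). The dense connections drawn in Figure 1b and the exact number and placement of the 1 × 1 convolutions are not fully specified in the text, so this should be read as an assumption-laden illustration rather than the authors' implementation:

```python
import torch.nn as nn

class DenseBlock(nn.Module):
    """DB as described: three 3x3 conv + ReLU layers, then one 3x3 conv + BN + ReLU."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class NLB(nn.Module):
    """Noise Learning Branch per Eq. (2): a 1x1 conv, six DBs, and 1x1 output convs."""
    def __init__(self, in_ch: int = 1, feat: int = 64):
        super().__init__()
        self.head = nn.Conv2d(in_ch, feat, 1)                     # Conv(I_Detail)
        self.dbs = nn.Sequential(*[DenseBlock(feat) for _ in range(6)])
        self.to_detail = nn.Conv2d(feat, in_ch, 1)                # -> I_d'
        self.to_noise = nn.Conv2d(feat, in_ch, 1)                 # -> N'

    def forward(self, detail):
        feat = self.head(detail)
        noise_feat = self.dbs(feat)                               # N_hat (64 channels)
        pred_detail = self.to_detail(feat - noise_feat)           # Conv(Conv(I_Detail) - N_hat)
        pred_noise = self.to_noise(noise_feat)                    # N' = Conv(N_hat)
        return pred_detail, pred_noise
```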

3.2. Detail Learning Branch

The detail layer obtained from the high-pass filter is typically sparser compared to the noisy image. To facilitate feature extraction, we apply the DCT to further decompose the detail layer and convert it from the spatial domain to the frequency domain. The DCT process can be mathematically represented as follows:
$\hat{I}_d = f_{DCT}(I_{Detail}),$  (3)
where $f_{DCT}$ represents the DCT, whose kernel size is set to 8. $\hat{I}_d \in \mathbb{R}^{H \times W \times 64}$ denotes the detail layer in the frequency domain, with a channel number of 64.
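An 8 × 8 block DCT can be realized as a convolution with 64 fixed orthonormal basis filters. The sketch below uses non-overlapping blocks (stride 8) for simplicity, which is an assumption; the paper keeps the spatial size H × W, which suggests densely sliding windows instead. The transposed convolution with the same basis provides the corresponding inverse transform used later in Equation (7):

```python
import math
import torch
import torch.nn.functional as F

def dct_basis(k: int = 8) -> torch.Tensor:
    """Orthonormal 2-D DCT-II basis as k*k convolution filters of shape (k*k, 1, k, k)."""
    n = torch.arange(k, dtype=torch.float32)
    # 1-D DCT-II matrix: rows are frequencies u, columns are positions x.
    C = torch.cos(math.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * k))
    C[0] *= 1.0 / math.sqrt(2.0)
    C *= math.sqrt(2.0 / k)
    return torch.einsum('ux,vy->uvxy', C, C).reshape(k * k, 1, k, k)

def block_dct(detail: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Detail layer (N, 1, H, W) -> k*k frequency channels; H, W assumed multiples of k."""
    return F.conv2d(detail, dct_basis(k).to(detail.device), stride=k)

def block_idct(freq: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Exact inverse of block_dct thanks to the orthonormal basis."""
    return F.conv_transpose2d(freq, dct_basis(k).to(freq.device), stride=k)
```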
The DLB is designed with a U-shaped structure to directly learn the predicted detail layer. As shown in Figure 2, the DLB consists of three main components: an encoder, a Feature Subtraction Module (FSM), and a decoder. The encoder is composed of four DBs along with four 3 × 3 convolutional operations. To mitigate the loss of image detail features during downsampling, the four DBs in the encoder adjust the channel numbers: their input channels are 64, 128, 256, and 512, and the corresponding output channels are 128, 256, 512, and 1024. After the four encoder stages, the receptive field of the last-layer feature map is 31. Note that a 1 × 1 convolutional operation is applied to each DB layer to reduce the number of channels to 64 and thereby decrease the number of parameters. The encoding process can be represented as follows:
$\hat{I}_d^i = Conv(f_{DB}(\hat{I}_d)), \quad i = 1, 2, 3, 4,$  (4)
where $\hat{I}_d^i$ denotes the detail features learned from the $i$-th $f_{DB}$. The number of channels of all detail features is 64.
The FSU emphasizes the difference and complementary information among features and removes redundant features. A series of differential features with different scales and receptive fields is calculated by connecting multiple FSUs both horizontally and vertically: vertical connections capture large-span, cross-layer (inter-layer) difference features, while horizontal connections naturally achieve multi-scale, intra-layer feature fusion. In this way, redundant information is removed and the complementarity of features at different levels is enhanced, improving the denoising performance of the network. The structure of an FSU is illustrated in Figure 3. The FSU takes two feature maps, $F_A$ and $F_B$, from different layers as inputs. We extract features at different scales from these feature maps using 1 × 1, 3 × 3, and 5 × 5 convolutional operations. By element-wise subtraction, we obtain three differential feature maps, where each map represents the difference between the outputs of the same convolutional operation applied to $F_A$ and $F_B$. Finally, these differential feature maps are fused by element-wise addition, resulting in a multi-scale differential feature map. The FSU effectively captures complementary detail features between $F_A$ and $F_B$, emphasizing their differences in detail and providing rich features for the decoder. The process of the FSU can be described as follows:
$FSU_B^A = \big(Conv_{1\times1}(F_A) - Conv_{1\times1}(F_B)\big) + \big(Conv_{3\times3}(F_A) - Conv_{3\times3}(F_B)\big) + \big(Conv_{5\times5}(F_A) - Conv_{5\times5}(F_B)\big),$  (5)
where $FSU_B^A$ represents the multi-scale difference feature map obtained from the $A$-th and $B$-th layers, and $Conv_{1\times1}$, $Conv_{3\times3}$, and $Conv_{5\times5}$ represent the 1 × 1, 3 × 3, and 5 × 5 convolutional operations, respectively.
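A compact sketch of the FSU is shown below. Following the description above, the same convolution is applied to both inputs at each scale; whether these weights are actually shared in the authors' implementation is our assumption:

```python
import torch.nn as nn

class FSU(nn.Module):
    """Feature Subtraction Unit, Eq. (5): multi-scale differences between two feature maps."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # One convolution per scale (1x1, 3x3, 5x5), applied to both inputs.
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, k, padding=k // 2) for k in (1, 3, 5)])

    def forward(self, f_a, f_b):
        # Element-wise subtraction at each scale, then element-wise addition to fuse.
        diffs = [conv(f_a) - conv(f_b) for conv in self.convs]
        return diffs[0] + diffs[1] + diffs[2]
```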
To aggregate the multi-scale differential features from different layers, we employ element-wise addition to enhance and fuse the difference feature maps of each layer with the corresponding feature maps from the next layer. This fusion process results in a complementary fusion feature. Subsequently, we design the decoder to decode the complementary fusion feature. The structure of the decoder, as shown in Figure 2 (decoder part), includes three feature decoders (FDs). Each FD consists of a one-layer 3 × 3 convolution, followed by a ReLU activation and another one-layer 3 × 3 convolution with batch normalization (BN) and a ReLU. This decoding process can be summarized as follows:
$FD_3 = f_{FD}(\hat{I}_d^4 + FSU_4^3), \quad FD_2 = f_{FD}(FD_3 + FSU_3^2 + FSU_{4,3}^2), \quad FD_1 = f_{FD}(FD_2 + FSU_2^1 + FSU_{3,2}^1 + FSU_{4,3,2}^1),$  (6)
where $f_{FD}$ denotes the function of the FD. Each $FD_i \in \mathbb{R}^{H \times W \times 64}$ (for $i = 1, 2, 3$) represents the corresponding decoding result, with 64 channels.
Based on Equation (6), $FD_1$ serves as the final output of the DLB. To transform $FD_1$ from the frequency domain back to the spatial domain, we employ the inverse Discrete Cosine Transform (IDCT), which can be represented as follows:
$I_d'' = f_{IDCT}(FD_1),$  (7)
where $f_{IDCT}$ denotes the inverse DCT, whose kernel size is also set to 8. $I_d'' \in \mathbb{R}^{H \times W \times C}$ represents the predicted detail features learned by the DLB, where $C$ is 1 or 3.

3.3. Fusion Module

We obtain $I_d'$ and $I_d''$ from the two branches, corresponding to $detail_1$ and $detail_2$ in Figure 1, respectively. To obtain the final detail feature map, we apply a fusion operation consisting of a concatenation followed by a 1 × 1 convolution. Subsequently, to reconstruct the denoised image, we perform element-wise addition between the base layer and the final detail feature map. The specific process is as follows:
$\tilde{I}_d = Conv(Concat(I_d', I_d'')), \quad \tilde{I}_c = \tilde{I}_d + I_{Base},$  (8)
where $Concat$ and $Conv$ denote the concatenation and 1 × 1 convolution operations, respectively. $\tilde{I}_d \in \mathbb{R}^{H \times W \times C}$ represents the final detail feature map, and $\tilde{I}_c \in \mathbb{R}^{H \times W \times C}$ represents the denoised image. Here, $C$ is either 1 or 3.
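The fusion step of Equation (8) is straightforward; a minimal sketch (with C = 1 for grayscale or 3 for color) is:

```python
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """Fusion Module, Eq. (8): concatenate both predicted detail maps, fuse them with a
    1x1 convolution, and add the base layer to reconstruct the denoised image."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, detail_a, detail_b, base):
        fused_detail = self.fuse(torch.cat([detail_a, detail_b], dim=1))   # final detail map
        return fused_detail + base                                         # denoised image
```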

3.4. Loss Function

To constrain the training of MDNCL, we adopt the mean squared error ($L_2$) loss [26]. The total loss $L$ can be expressed as follows:
$L = L_{L_2}^{detail} + \mu L_{L_2}^{noise},$  (9)
where $L_{L_2}^{detail}$ represents the mean squared error between the ground-truth detail and the predicted detail learned by the DLB, $L_{L_2}^{noise}$ denotes the mean squared error between the ground-truth noise and the predicted noise learned by the NLB, and $\mu$ is the loss weight associated with $L_{L_2}^{noise}$. The specific expressions for $L_{L_2}^{detail}$ and $L_{L_2}^{noise}$ are as follows:
$I_d = I_{Detail} - N, \quad L_{L_2}^{detail} = \mathbb{E}_{(I_d, I_d'')}\!\left[\frac{1}{N_p}\|I_d - I_d''\|^2\right], \quad L_{L_2}^{noise} = \mathbb{E}_{(N, N')}\!\left[\frac{1}{N_p}\|N - N'\|^2\right],$  (10)
where $N$ and $I_d$ denote the ground-truth noise mapping and the ground-truth detail, respectively, and $N_p$ represents the number of pixels.
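In training code, the loss of Equations (9) and (10) reduces to two mean-squared-error terms. A minimal sketch, assuming the detail term is computed on the DLB output as described above, is:

```python
import torch.nn.functional as F

def mdncl_loss(pred_detail, pred_noise, noisy_detail, gt_noise, mu: float = 1.0):
    """Eq. (9)/(10): MSE on the predicted detail plus mu times MSE on the predicted noise."""
    gt_detail = noisy_detail - gt_noise          # I_d = I_Detail - N
    loss_detail = F.mse_loss(pred_detail, gt_detail)
    loss_noise = F.mse_loss(pred_noise, gt_noise)
    return loss_detail + mu * loss_noise
```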

4. Experiments

4.1. Dataset

  • Training datasets: To train MDNCL for AWGN removal, we utilized the Berkeley segmentation dataset (BSD) [27], which consists of 432 natural images with dimensions of 481 × 321. To enlarge the training set, each image was divided into 128 patches with dimensions of 58 × 58, resulting in a total of 55,296 image patches. Random data augmentation, including vertical flipping and rotation, was then applied to the training patches to enhance the generalization ability of MDNCL (a sketch of the patch extraction and augmentation is given after this list). For grayscale image denoising, we trained two versions of MDNCL, namely MDNCL, which uses known noise levels, and MDNCL-B, which performs blind denoising. Similarly, for color images, we conducted experiments using CMDNCL and CMDNCL-B. Known noise levels were set to σ = 15, 25, and 50, where σ represents the standard deviation. The blind noise level was set within the range σ ∈ [0, 55].
    To train MDNCL for real-world noise removal, we used the sRGB track of the Smartphone Image Denoising Dataset (SIDD) [28], which contains 160 scene instances, each including two pairs of high-resolution images. Each pair contains one noisy image and the corresponding clean image; in all, 320 pairs of images were used to train for real-world denoising.
  • Test datasets: For grayscale testing of MDNCL, we evaluated its denoising performance using two datasets, namely BSD68 [29] and Set12 [29]; BSD68 consists of 68 images, while Set12 contains 12 images. These datasets were specifically used for AWGN removal. For color image testing, we employed the CBSD68 [29], McMaster [30], and Kodak24 [31] datasets, which consist of 68, 18, and 24 images, respectively. To assess real-world denoising performance, we used the SIDD validation set and the Darmstadt Noise Dataset (DND) [32]. The DND dataset comprises 50 pairs of images but does not provide ground-truth images; to evaluate the denoising results, we uploaded the denoised images to the official DND website, which returns the PSNR [33] and SSIM [34] measurements.
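The patch preparation referenced in the training-data item above can be sketched as follows; the random cropping strategy and the specific augmentations beyond flips and 90-degree rotations are assumptions for illustration:

```python
import random
import numpy as np

def random_augment(patch: np.ndarray) -> np.ndarray:
    """Random vertical flip and 90-degree rotation, as used to augment training patches."""
    if random.random() < 0.5:
        patch = np.flipud(patch)
    return np.rot90(patch, k=random.randint(0, 3)).copy()

def extract_patches(image: np.ndarray, size: int = 58, n: int = 128) -> list:
    """Crop n random size x size patches from one BSD training image."""
    h, w = image.shape[:2]
    patches = []
    for _ in range(n):
        top, left = random.randint(0, h - size), random.randint(0, w - size)
        patches.append(random_augment(image[top:top + size, left:left + size]))
    return patches
```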

4.2. Implementation Details

MDNCL was implemented in Python 3.7 and PyTorch 1.11.0 on an Intel(R) Core(TM) i7-9700K CPU with an Nvidia GeForce RTX 2080Ti GPU. The training parameters are as follows: a batch size of 16, 90 epochs, and a learning rate that decays from 0.0001 to 0.00002 as the epochs progress. Optimization was carried out using the Adam algorithm, with β1 and β2 set to 0.9 and 0.999, respectively. Additionally, the weight μ in Equation (9) was set to 1. More details can be found in Table 1.
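A hypothetical optimizer setup mirroring Table 1 is shown below; the exact shape of the decay from 0.0001 to 0.00002 over the 90 epochs is not stated in the paper, so a linear schedule is assumed here for illustration:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 64, 3, padding=1)   # stand-in for the MDNCL network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.2, total_iters=90)   # 1e-4 -> 2e-5

for epoch in range(90):
    # ... one pass over the 55,296 training patches with batch size 16 ...
    optimizer.step()        # placeholder update; real training computes the loss first
    scheduler.step()
```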

4.3. Comparative Experiments

To assess the denoising performance of MDNCL, we conducted a comparison with several advanced denoising methods. The reference methods used in the comparison were BM3D [7], WNNM [35], DnCNN [9], FFDNet [10], ADNet [20], BRDNet [36], BUIFD [37], CDNet [38], MWDCNN [39], DRFENet [40], CBDNet [41], AirNet [42], HI-GAN [43], and TBSN [44]. Among these methods, BM3D [7] and WNNM [35] are traditional denoising techniques, while the remaining approaches are state-of-the-art denoising methods based on deep learning. For quantitative evaluation, we utilized the PSNR and SSIM metrics to measure the performance of MDNCL relative to the other methods.
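For reference, the PSNR values reported in Tables 2–5 follow the standard definition below (SSIM is typically computed with an off-the-shelf routine such as skimage's structural_similarity); this snippet is illustrative rather than the evaluation code used by the authors:

```python
import numpy as np

def psnr(reference: np.ndarray, estimate: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between a clean image and its denoised estimate."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```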
During the grayscale testing of MDNCL, we employed the Set12 and BSD68 datasets to evaluate its denoising performance. As illustrated in Table 2 and Table 3, MDNCL achieved the highest average PSNR and SSIM results on Set12 and BSD68 at noise levels of 15, 25, and 50. Specifically, when examining the “Barbara” image from the Set12 dataset (Table 3), MDNCL’s denoising performance ranked second, just behind WNNM. MDNCL also demonstrated impressive denoising performance in blind denoising (MDNCL-B) compared to other state-of-the-art denoising methods. For instance, the PSNR and SSIM results of MDNCL-B on the Set12 dataset surpassed those of both DnCNN and DRFENet, as shown in Table 3. Detailed PSNR results for each image and the average PSNR and SSIM scores for each method on the Set12 dataset at three different noise levels can be found in Table 3.
To assess the denoising performance of MDNCL on noisy color images, we conducted evaluations using the CBSD68, Kodak24, and McMaster datasets. As indicated in Table 4, CMDNCL achieved the highest PSNR results across all three datasets, surpassing the existing reference methods. Even at a noise level as high as 75, CMDNCL still obtained the best denoising metrics, which demonstrates its superior denoising ability. Furthermore, for blind denoising on color images, CMDNCL-B also demonstrated impressive performance, effectively removing noise from color images and further affirming the effectiveness of CMDNCL in various denoising scenarios.
To evaluate the denoising performance of MDNCL in removing real-world noise, we conducted tests using the SIDD validation and DND datasets. As illustrated in Table 5, MDNCL achieved the highest PSNR and SSIM results, indicating its effectiveness in removing real-world noise. This demonstrates that MDNCL is well-suited for real-world denoising applications.
After the quantitative analysis of MDNCL’s denoising performance, we selected several popular reference denoising methods, including BM3D, DnCNN, FFDNet, ADNet, BRDNet, and MWDCNN, for a visual qualitative comparison. We magnified the same regions of interest in the denoised images to observe the differences among the methods. Figure 4 and Figure 5 showcase the grayscale image denoising performance: one image from each of the Set12 and BSD68 datasets was chosen for visual comparison at noise levels of 15 and 25, respectively. In Figure 4, the image “House” exhibits more preserved striped details when denoised by MDNCL. In Figure 5, MDNCL produces the clearest outline of the castle compared to the other methods. These visual comparisons indicate that MDNCL achieves the best visual quality and preserves more detailed features than the reference methods.
For color image denoising, we selected two images from the CBSD68 dataset at a noise level of 25 and one image from each of the Kodak24 and McMaster datasets at a noise level of 50. In Figure 6, the selected images of zebras with many stripes and a lion with intricate hair textures from CBSD68 demonstrate that CMDNCL excels at handling texture details. Additionally, in Figure 7 and Figure 8, when the noise level is set to 50, CMDNCL effectively preserves the stripe details in the window and the outline of the cherry.
For real-world denoising, several denoised images from the SIDD dataset are shown in Figure 9, demonstrating the effective removal of real-world noise. Based on both quantitative and visual analysis, we conclude that MDNCL exhibits superior visual denoising performance compared to the reference denoising methods.

4.4. Ablation Experiment

  • Role of network framework: To verify the effectiveness of the complementary learning branches and the FSU module in MDNCL, we conducted ablation experiments on the Set12 dataset. For the complementary learning branches, MDNCL was compared to corresponding networks with only one branch; for the FSU module, MDNCL was compared to a corresponding network without the FSU. Specifically, we compared MDNCL with MDNCL-NLB, which only learns image details; MDNCL-DLB, which only learns noise; and MDNCL-FSU, which does not have an FSU. To ensure a fair comparison, all networks were trained using the same framework shown in Figure 1. The quantitative results of the ablation experiments are shown in Table 6, highlighting the effectiveness of MDNCL.
For visual comparison, Figure 10 shows the results of the ablation experiment. When a single branch is used to learn detail features or noise mappings, the denoising results suffer from varying degrees of detail loss. When the FSU module is removed, feature artifacts appear in the denoised image, and the noise is not completely removed. Overall, the strong denoising performance of MDNCL results from the joint action of the two branches and the FSU module.
  • Role of loss function: The μ hyperparameter in the loss function was determined by a series of experiments. We conducted experiments on the Set12 dataset, changing the value of the hyperparameter to 0.4, 0.6, 0.8, 1.0, and 1.2. To ensure fair comparisons, all networks were trained using the same framework shown in Figure 1, and quantitative results of the ablation experiments are shown in Table 7.
As can be seen from Table 7, the best denoising performance was obtained by setting the μ hyperparameter to 1.0. When μ is greater than 1.0, the network pays more attention to removing noise and neglects the retention of details; when μ is less than 1.0, the network cannot effectively remove the noise. Therefore, it is important to strike a balance between detail retention and noise removal.
To more intuitively demonstrate why setting the μ hyperparameter to 1.0 is a reasonable choice, we compared denoised images with different μ values. As can be seen in Figure 11, when μ is less than 1, the zoomed-in detail areas, such as the white feathers, still contain a significant amount of noise. This indicates that the denoising effect is insufficient and smoothing is inadequate. On the other hand, when μ is greater than 1, the noise is reduced, but the smoothing is too strong, leading to a loss of detail information. Through this comparison, we can see that when μ equals 1.0, it achieves a good balance between noise removal and detail preservation.

4.5. Model Complexity Analysis

Running time and the number of parameters are important indicators of the complexity of a model. In this section, we compare MDNCL with BM3D [7], TWSC [45], MCWNNM [46], IRCNN [47], DnCNN-B [9], FFDNet [10], BUIFD [37], ADNet [20], SADNet [48], CycleISP [49], AirNet [42], VDN [50], AINDNet [51], VDIR [52], RDUNet [12], ATDNet [53], CTNet [54], and TBSN [44] in terms of running time and model parameters. Among these models, BM3D, TWSC, and MCWNNM were implemented in the Matlab (R2020a) environment, and the other denoising methods were implemented in PyCharm (2021.2.1). We randomly selected images with sizes of 256 × 256, 512 × 512, and 1024 × 1024 at a noise level of 25. Table 8 records the running time of each tested model averaged over 20 runs, ignoring the memory transfer time between the CPU and GPU. Table 9 shows the complexity of the compared models.
The performance of the MDNCL algorithm is highlighted on multiple dimensions, showing its superiority over the compared denoising models. For color images, the running times for each image size are 0.316 s, 0.574 s, and 2.174 s. When processing large-size color images, MDNCL is much faster than models such as AINDNet, BM3D, and MCWNNM; AINDNet took 9.573 s, BM3D took 12.788 s, and MCWNNM took 1120.332 s, indicating the efficiency of MDNCL when processing large-size images.
As can be seen from Table 9, MDNCL has 6.71 M parameters, indicating that it can still achieve excellent noise reduction performance while maintaining low computational complexity. In contrast, the number of parameters of CTNet is as high as 49.03 M. While the denoising performance of CTNet may be better, it can be prohibitive in practical applications due to its huge computing requirements. MDNCL provides a balanced solution that optimizes performance and computational efficiency for environments with limited computing resources.

5. Conclusions

In this paper, we propose a multi-scale detail–noise complementary learning (MDNCL) network for image denoising. The two-branch architecture of MDNCL strikes a balance between noise reduction and detail preservation, with particularly noticeable performance gains at higher noise levels. Furthermore, the proposed FSU addresses the redundant information and computational complexity that arise during denoising, resulting in more effective preservation of detailed features. To evaluate the denoising capabilities of MDNCL, we conducted experiments on various grayscale and color image datasets, as well as ablation experiments on the Set12 dataset. The experimental results consistently confirm the strong denoising performance of MDNCL. The proposed MDNCL nevertheless has some limitations: it is suited to single noise types (such as AWGN) and real-world noise, but it may not be effective for complex mixed noise or noise in complex environments. In future work, we will investigate combining the idea of generative models with MDNCL to further improve denoising performance.

Author Contributions

Conceptualization, Y.C. and J.J.; methodology, J.J. and M.S.; validation, M.S.; formal analysis, M.S. and Y.C.; investigation, Y.C. and J.J.; resources, Y.C.; writing—original draft preparation, M.S. and Y.C.; writing—review and editing, Y.C. and M.S.; visualization, J.J.; funding acquisition, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62001236, in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant 20KJA520003, and in part by the Six Talent Peaks Project of Jiangsu Province under Grant JY-051.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found at https://paperswithcode.com/datasets (accessed on 26 December 2023).

Acknowledgments

The authors would like to thank Haidong Yang and Yanqiong Zhang. Haidong Yang contributed to software and project administration, and Yanqiong Zhang contributed to data curation and supervision. Their work improved the quality and clarity of the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yang, X.; Yan, J.; Liao, W.; Yang, X.; Tang, J.; He, T. Scrdet++: Detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2384–2399. [Google Scholar] [CrossRef] [PubMed]
  2. Yazdi, M.; Bouwmans, T. New trends on moving object detection in video images captured by a moving camera: A survey. Comput. Sci. Rev. 2018, 28, 157–177. [Google Scholar] [CrossRef]
  3. Zatar, W.; Chen, G.; Nghiem, H. Ultrasonic Pulse-Echo Signals for Quantitative Assessment of Reinforced Concrete Anomalies. Appl. Sci. 2024, 14, 4860. [Google Scholar] [CrossRef]
  4. Huang, Z.; Zhang, J.; Zhang, Y.; Shan, H. DU-GAN: Generative Adversarial Networks with Dual-Domain U-Net-Based Discriminators for Low-Dose CT Denoising. IEEE Trans. Instrum. Meas. 2021, 71, 1–12. [Google Scholar] [CrossRef]
  5. Tian, C.; Fei, L.; Zheng, W.; Xu, Y.; Zuo, W.; Lin, C.W. Deep learning on image denoising: An overview. Neural Netw. 2020, 131, 251–275. [Google Scholar] [CrossRef] [PubMed]
  6. Aharon, M.; Elad, M.; Bruckstein, A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 2006, 54, 4311–4322. [Google Scholar] [CrossRef]
  7. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095. [Google Scholar] [CrossRef] [PubMed]
  8. Wu, J.; Lee, X. An improved WNNM algorithm for image denoising. J. Phys. Conf. Ser. 2019, 1237, 022037. [Google Scholar] [CrossRef]
  9. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef]
  10. Zhang, K.; Zuo, W.; Zhang, L. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Trans. Image Process. 2018, 27, 4608–4622. [Google Scholar] [CrossRef]
  11. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  12. Gurrola-Ramos, J.; Dalmau, O.; Alarcón, T.E. A Residual Dense U-Net Neural Network for Image Denoising. IEEE Access 2021, 9, 31742–31754. [Google Scholar] [CrossRef]
  13. Qu, Q.; Song, Y.; Chen, J. Denoising Multiscale Back-Projection Feature Fusion for Underwater Image Enhancement. Appl. Sci. 2024, 14, 4395. [Google Scholar] [CrossRef]
  14. Liu, G.; Dang, M.; Liu, J. True wide convolutional neural network for image denoising. Inf. Sci. 2022, 610, 171–184. [Google Scholar] [CrossRef]
  15. Wu, W.; Ge, A.; Lv, G. DCANet: Dual Convolutional Neural Network with Attention for Image Blind Denoising. arXiv 2023, arXiv:2304.01498. [Google Scholar]
  16. Huang, Z.; You, H. MFSFNet: Multi-scale feature subtraction fusion network for remote sensing image change detection. Remote Sens. 2023, 15, 3740. [Google Scholar] [CrossRef]
  17. Gan, Y.; Xu, X.; Sun, W.; Lin, L. Monocular depth estimation with affinity, vertical pooling, and label enhancement. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 224–239. [Google Scholar]
  18. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCA, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  19. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Granada, Spain, 20 September 2018; Springer: Cham, Switzerland, 2018; pp. 3–11. [Google Scholar]
  20. Tian, C.; Xu, Y.; Li, Z.; Zuo, W.; Fei, L.; Liu, H. Attention-guided CNN for image denoising. Neural Netw. 2020, 124, 117–129. [Google Scholar] [CrossRef]
  21. Zhang, Q.; Xiao, J.; Tian, C.; Chun-Wei Lin, J. A Robust Deformed Convolutional Neural Network (CNN) for Image Denoising. CAAI Trans. Intell. Technol. 2023, 8, 331–342. [Google Scholar] [CrossRef]
  22. Shen, H.; Zhao, Z.Q.; Zhang, W. Adaptive dynamic filtering network for image denoising. In Proceedings of the AAAI Conference on Artificial Intelligence, Seattle, WA, USA, 7–14 February 2023; Volume 37, pp. 2227–2235. [Google Scholar]
  23. Neshatavar, R.; Liu, P.; Pang, J.; Mei, T.; Barnes, N.; Petersson, L.; Harandi, M. CVF-SID: Cyclic multi-variate function for self-supervised image denoising by disentangling noise from image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 17583–17591. [Google Scholar]
  24. Li, X.; Zhang, K.; Zuo, W.; Zhang, L.; Zhang, L.; Zhu, X.; Zhang, Y.; Zhang, L. Learning omni-frequency region-adaptive representations for real image super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence; Association for the Advancement of Artificial Intelligence: Washington, DC, USA, 2021; Volume 35, pp. 1975–1983. [Google Scholar]
  25. Wang, J.; Lu, Y.; Lu, G. Lightweight image denoising network with four-channel interaction transform. Image Vis. Comput. 2023, 137, 104766. [Google Scholar] [CrossRef]
  26. Allen, D.M. Mean square error of prediction as a criterion for selecting variables. Technometrics 1971, 13, 469–475. [Google Scholar] [CrossRef]
  27. Li, H.; Cai, J.; Nguyen, T.N.A.; Zheng, J. A benchmark for semantic image segmentation. In Proceedings of the IEEE International Conference on Multimedia and Expo, San Jose, CA, USA, 15–19 July 2013; pp. 1–6. [Google Scholar]
  28. Abdelhamed, A.; Lin, S.; Brown, M.S. A high-quality denoising dataset for smartphone cameras. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1692–1700. [Google Scholar]
  29. Li, H.; Xia, S.; Zhou, B.; Peng, J. The growth mechanism of grain boundary carbide in Alloy 690. Mater. Charact. 2013, 81, 1–6. [Google Scholar] [CrossRef]
  30. Zhang, L.; Wu, X.; Buades, A.; Li, X. Color demosaicking by local directional interpolation and nonlocal adaptive thresholding. J. Electron. Imaging 2011, 20, 023016. [Google Scholar]
  31. Franzen, R. Kodak lossless true color image suite. Photocd pcd0992 1999, 4, 2. [Google Scholar]
  32. Plotz, T.; Roth, S. Benchmarking denoising algorithms with real photographs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2750–2759. [Google Scholar]
  33. Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the international conference on pattern recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar]
  34. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  35. Gu, S.; Zhang, L.; Zuo, W.; Feng, X. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2862–2869. [Google Scholar]
  36. Tian, C.; Xu, Y.; Zuo, W. Image denoising using deep CNN with batch renormalization. Neural Netw. 2020, 121, 461–473. [Google Scholar] [CrossRef] [PubMed]
  37. El Helou, M.; Süsstrunk, S. Blind Universal Bayesian Image Denoising with Gaussian Noise Level Learning. IEEE Trans. Image Process. 2020, 29, 4885–4897. [Google Scholar] [CrossRef] [PubMed]
  38. Quan, Y.; Chen, Y.; Shao, Y.; Teng, H.; Xu, Y.; Ji, H. Image denoising using complex-valued deep CNN. Pattern Recognit. 2021, 111, 107639. [Google Scholar] [CrossRef]
  39. Tian, C.; Zheng, M.; Zuo, W.; Zhang, B.; Zhang, Y.; Zhang, Y. Multi-stage image denoising with the wavelet transform. Pattern Recognit. 2023, 134, 109050. [Google Scholar] [CrossRef]
  40. Zhong, R.; Zhang, Q. DRFENet: An Improved Deep Learning Neural Network via Dilated Skip Convolution for Image Denoising Application. Appl. Sci. 2022, 13, 28. [Google Scholar] [CrossRef]
  41. Guo, S.; Yan, Z.; Zhang, K.; Zuo, W.; Zhang, L. Toward convolutional blind denoising of real photographs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 1712–1722. [Google Scholar]
  42. Li, B.; Liu, X.; Hu, P.; Wu, Z.; Lv, J.; Peng, X. All-in-one image restoration for unknown corruption. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 17452–17462. [Google Scholar]
  43. Vo, D.M.; Nguyen, D.M.; Le, T.P.; Lee, S.W. HI-GAN: A hierarchical generative adversarial network for blind denoising of real photographs. Inf. Sci. 2021, 570, 225–240. [Google Scholar] [CrossRef]
  44. Li, J.; Zhang, Z.; Zuo, W. TBSN: Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising. arXiv 2024, arXiv:2404.07846. [Google Scholar]
  45. Wu, W.; Ge, A.; Lv, G.; Xia, Y.; Zhang, Y. Two-stage Progressive Residual Dense Attention Network for Image Denoising. arXiv 2024, arXiv:2401.02831. [Google Scholar]
  46. Xu, J.; Zhang, L.; Zhang, D.; Feng, X. Multi-channel weighted nuclear norm minimization for real color image denoising. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1096–1104. [Google Scholar]
  47. Zhang, K.; Zuo, W.; Gu, S.; Zhang, L. Learning deep CNN denoiser prior for image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3929–3938. [Google Scholar]
  48. Chang, M.; Li, Q.; Feng, H.; Xu, Z. Spatial-adaptive network for single image denoising. In Proceedings of the Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 171–187. [Google Scholar]
  49. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H.; Shao, L. Cycleisp: Real image restoration via improved data synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2696–2705. [Google Scholar]
  50. Yue, Z.; Yong, H.; Zhao, Q.; Meng, D.; Zhang, L. Variational denoising network: Toward blind noise modeling and removal. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
  51. Kim, Y.; Soh, J.W.; Park, G.Y.; Cho, N.I. Transfer learning from synthetic to real-noise denoising with adaptive instance normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3482–3492. [Google Scholar]
  52. Soh, J.W.; Cho, N.I. Variational deep image restoration. IEEE Trans. Image Process. 2022, 31, 4363–4376. [Google Scholar] [CrossRef] [PubMed]
  53. Kim, Y.; Soh, J.W.; Cho, N.I. Adaptively tuning a convolutional neural network by gate process for image denoising. IEEE Access 2019, 7, 63447–63456. [Google Scholar] [CrossRef]
  54. Tian, C.; Zheng, M.; Zuo, W.; Zhang, S.; Zhang, Y.; Lin, C.W. A cross Transformer for image denoising. Inf. Fusion 2024, 102, 102043. [Google Scholar] [CrossRef]
Figure 1. The overall framework of the proposed method. (a) Network architecture of MDNCL. (b) Dense block (DB).
Figure 2. Schematic diagram of DLB.
Figure 3. Schematic diagram of FSU.
Figure 4. Denoising results of “House” from Set12 with a noise level of 15. (a) clean image; (b) noisy image; (c) BM3D/34.93 dB; (d) DnCNN/34.99 dB; (e) FFDNet/35.07 dB; (f) ADNet/35.22 dB; (g) BRDNet/35.27 dB; (h) MDNCL (Ours)/35.46 dB.
Figure 5. Denoising results of images from BSD68 with a noise level of 25. (a) clean image; (b) noisy image; (c) BM3D/29.53 dB; (d) DnCNN/30.17 dB; (e) FFDNet/30.02 dB; (f) ADNet/30.24 dB; (g) BRDNet/30.27 dB; (h) MWDCNN/30.14 dB; and (i) MDNCL (ours)/30.39 dB.
Figure 6. Denoising results of two images from CBSD68 with a noise level of 25. (a) clean image; (b) noisy image; (c) FFDNet/30.54 dB; (d) ADNet/30.74 dB; (e) MDNCL (Ours)/31.10 dB; (f) clean image; (g) noisy image; (h) FFDNet/29.85 dB; (i) ADNet/29.92 dB; (j) MDNCL (Ours)/30.07 dB.
Figure 7. Denoising results of one image from Kodak24 with a noise level of 50. (a) clean image; (b) noisy image; (c) DnCNN/26.13 dB; (d) ADNet/26.22 dB; (e) FFDNet/26.23 dB; (f) CMDNCL (Ours)/26.48 dB.
Figure 8. Denoising results of one image from the McMaster dataset with a noise level of 50. (a) clean image; (b) noisy image; (c) DnCNN/31.13 dB; (d) ADNet/31.84 dB; (e) FFDNet/31.82 dB; (f) CMDNCL (Ours)/32.08 dB.
Figure 9. Denoising results on SIDD. The noisy images are arranged at the top, and the denoised images are presented at the bottom.
Figure 10. Denoising results of one image from Set12 with a noise level of 15. (a) clean image; (b) noisy image; (c) MDNCL-NLB/30.19 dB; (d) MDNCL-DLB/31.27 dB; (e) MDNCL-FSU/28.79 dB; (f) MDNCL/32.08 dB.
Figure 11. Image denoising result with a noise level of 25 on Set12 when μ takes different values. (a) clean image; (b) noisy image; (c) μ = 0.6; (d) μ = 0.8; (e) μ = 1.2; (f) μ = 1.0.
Table 1. Experimental parameters.
Parameter | Value
Batch size | 16
Epochs | 90
Initial learning rate | 0.0001
β1 | 0.9
β2 | 0.999
μ | 1
Xavier initialization | Draw weights from a normal distribution with a mean of 0
Table 2. Average PSNR (dB)/SSIM results of different methods on BSD68 (The superior results are emphasized in bold).
Method | σ = 15 | σ = 25 | σ = 50
BM3D [7] | 31.07/0.8729 | 28.57/0.8042 | 25.62/0.6817
WNNM [35] | 31.37/0.8777 | 28.83/0.8103 | 25.87/0.6975
DnCNN [9] | 31.72/0.8902 | 29.23/0.8288 | 26.23/0.7180
FFDNet [10] | 31.63/0.8901 | 29.19/0.8298 | 26.29/0.7262
ADNet [20] | 31.74/0.8910 | 29.25/0.8285 | 26.29/0.7211
CDNet [38] | 31.74/0.8915 | 29.28/0.8312 | 26.36/0.7229
BRDNet [36] | 31.72/0.8910 | 29.24/0.8295 | 26.28/0.7232
DRFENet [40] | 31.76/0.8920 | 29.26/0.8301 | 26.29/0.7230
MWDCNN [39] | 31.77/0.8919 | 29.28/0.8305 | 26.29/0.7229
MDNCL (ours) | 31.82/0.8923 | 29.30/0.8316 | 26.38/0.7242
MDNCL-B (ours) | 31.76/0.8919 | 29.25/0.8304 | 26.31/0.7234
Table 3. Average PSNR (dB)/SSIM results of different methods for Set12 (The superior results are emphasized in bold).
Image | C.man | House | Peppers | Starfish | Monarch | Airplane | Parrot | Lena | Barbara | Boat | Man | Couple | Average

Noise Level σ = 15
BM3D [7] | 31.91 | 34.93 | 32.69 | 31.14 | 31.85 | 31.07 | 31.37 | 34.26 | 33.10 | 32.13 | 31.92 | 32.10 | 32.37/0.8953
WNNM [35] | 32.17 | 35.13 | 32.99 | 31.82 | 32.71 | 31.39 | 31.62 | 34.27 | 33.60 | 32.27 | 32.11 | 32.17 | 32.70/0.8944
DnCNN [9] | 32.59 | 34.99 | 33.24 | 32.13 | 33.25 | 31.67 | 31.88 | 34.58 | 32.61 | 32.42 | 32.43 | 32.43 | 32.85/0.9029
FFDNet [10] | 32.43 | 35.07 | 33.24 | 31.99 | 32.66 | 31.57 | 31.87 | 34.62 | 32.54 | 32.38 | 32.41 | 32.46 | 32.77/0.9027
CDNet [38] | 32.64 | 34.94 | 33.28 | 32.17 | 33.30 | 31.72 | 31.93 | 34.52 | 32.66 | 32.36 | 32.49 | 32.47 | 32.87/0.9034
ADNet [20] | 32.81 | 35.22 | 33.49 | 32.17 | 33.17 | 31.86 | 31.96 | 34.71 | 32.80 | 32.57 | 32.47 | 32.58 | 32.98/0.9039
BRDNet [36] | 32.80 | 35.27 | 33.47 | 32.24 | 33.35 | 31.82 | 32.00 | 34.75 | 32.93 | 32.55 | 32.50 | 32.62 | 33.03/0.9038
DRFENet [40] | 32.60 | 35.19 | 33.42 | 32.19 | 33.41 | 31.76 | 31.96 | 34.72 | 32.73 | 32.54 | 32.45 | 32.55 | 32.96/0.9045
MWDCNN [39] | 32.53 | 35.09 | 33.29 | 32.28 | 33.20 | 31.74 | 31.97 | 34.64 | 32.65 | 32.49 | 32.46 | 32.52 | 32.91/0.9043
MDNCL (ours) | 32.91 | 35.46 | 33.53 | 32.31 | 33.68 | 31.94 | 32.08 | 34.81 | 33.15 | 32.62 | 32.56 | 32.68 | 33.14/0.9065
MDNCL-B (ours) | 32.79 | 35.34 | 33.46 | 32.18 | 33.29 | 31.84 | 32.01 | 34.73 | 32.77 | 32.52 | 32.50 | 32.63 | 33.01/0.9056

Noise Level σ = 25
BM3D [7] | 29.45 | 32.85 | 30.16 | 28.56 | 29.25 | 28.42 | 28.93 | 32.07 | 30.71 | 29.90 | 29.61 | 29.71 | 29.97/0.8499
WNNM [35] | 29.64 | 33.22 | 30.42 | 29.03 | 29.84 | 28.69 | 29.15 | 32.24 | 31.24 | 30.03 | 29.76 | 29.82 | 30.26/0.8451
DnCNN [9] | 30.18 | 33.06 | 30.87 | 29.41 | 30.28 | 29.13 | 29.43 | 32.44 | 30.00 | 30.21 | 30.10 | 30.12 | 30.43/0.8608
FFDNet [10] | 30.10 | 33.28 | 30.93 | 29.32 | 30.08 | 29.04 | 29.44 | 32.57 | 30.01 | 30.25 | 30.11 | 30.20 | 30.44/0.8638
CDNet [38] | 30.21 | 32.98 | 30.78 | 29.51 | 30.35 | 29.20 | 29.53 | 32.40 | 30.05 | 30.15 | 30.08 | 30.11 | 30.53/0.8646
ADNet [20] | 30.34 | 33.41 | 31.14 | 29.41 | 30.39 | 29.17 | 29.49 | 32.61 | 30.25 | 30.37 | 30.08 | 30.24 | 30.58/0.8637
BRDNet [36] | 31.39 | 33.41 | 31.04 | 29.46 | 30.50 | 29.20 | 29.55 | 32.65 | 30.34 | 30.33 | 30.14 | 30.28 | 30.61/0.8639
DRFENet [40] | 30.26 | 33.41 | 31.07 | 29.49 | 29.49 | 29.12 | 29.46 | 32.57 | 30.02 | 30.29 | 30.07 | 30.17 | 30.54/0.8642
MWDCNN [39] | 30.19 | 33.33 | 30.85 | 29.66 | 30.55 | 29.16 | 29.48 | 32.67 | 30.21 | 30.28 | 30.10 | 30.13 | 30.55/0.8645
MDNCL (ours) | 30.52 | 33.66 | 31.18 | 29.69 | 30.75 | 29.34 | 29.68 | 32.84 | 30.81 | 30.48 | 30.24 | 30.44 | 30.80/0.8679
MDNCL-B (ours) | 30.46 | 33.55 | 31.11 | 29.43 | 30.52 | 29.23 | 29.59 | 32.69 | 30.34 | 30.35 | 30.16 | 30.32 | 30.65/0.8673

Noise Level σ = 50
BM3D [7] | 26.13 | 29.69 | 26.68 | 25.04 | 25.82 | 25.10 | 25.90 | 29.05 | 27.22 | 26.78 | 26.81 | 26.46 | 26.72/0.7654
WNNM [35] | 26.45 | 30.33 | 26.95 | 25.44 | 26.32 | 25.42 | 26.14 | 29.25 | 27.79 | 26.97 | 26.94 | 26.64 | 27.05/0.7568
DnCNN [9] | 27.03 | 30.00 | 27.32 | 25.70 | 26.78 | 25.87 | 26.48 | 29.39 | 26.22 | 27.20 | 27.24 | 26.90 | 27.18/0.7829
FFDNet [10] | 27.05 | 30.37 | 27.54 | 25.75 | 26.81 | 25.89 | 26.57 | 29.66 | 26.45 | 27.33 | 27.29 | 27.08 | 27.32/0.7916
CDNet [38] | 27.02 | 30.48 | 27.40 | 25.93 | 27.12 | 25.85 | 26.58 | 29.70 | 26.50 | 27.15 | 27.37 | 27.05 | 27.38/0.7924
ADNet [20] | 27.31 | 30.59 | 27.69 | 25.70 | 26.90 | 25.88 | 26.56 | 29.59 | 26.64 | 27.35 | 27.17 | 27.07 | 27.37/0.7875
BRDNet [36] | 27.44 | 30.53 | 27.67 | 25.77 | 26.97 | 25.93 | 26.66 | 29.73 | 26.85 | 27.38 | 27.27 | 27.17 | 27.45/0.7898
DRFENet [40] | 27.10 | 30.57 | 27.54 | 25.83 | 26.89 | 25.83 | 26.42 | 29.60 | 26.53 | 27.31 | 27.27 | 27.06 | 27.33/0.7894
MWDCNN [39] | 26.99 | 30.58 | 27.34 | 25.85 | 27.02 | 25.93 | 26.48 | 29.63 | 26.60 | 27.23 | 27.27 | 27.11 | 27.34/0.7897
MDNCL (ours) | 27.64 | 30.96 | 27.79 | 26.00 | 27.15 | 26.08 | 26.80 | 29.95 | 27.40 | 27.55 | 27.39 | 27.37 | 27.67/0.7955
MDNCL-B (ours) | 27.51 | 30.53 | 27.69 | 25.74 | 26.92 | 25.91 | 26.67 | 29.67 | 26.79 | 27.36 | 27.28 | 27.14 | 27.43/0.7923
Table 4. Average PSNR (dB) results of different methods on the CBSD68, Kodak24, and McMaster datasets (The superior results are emphasized in bold).
Dataset | Method | σ = 15 | σ = 25 | σ = 50 | σ = 75
CBSD68 | CBM3D [7] | 33.52/0.925 | 30.71/0.872 | 27.38/0.767 | 25.74/-
CBSD68 | CDnCNN-B [9] | 33.95/0.929 | 31.29/0.883 | 28.01/0.790 | -
CBSD68 | FFDNet [10] | 33.86/0.929 | 31.18/0.882 | 27.95/0.789 | 26.24/0.593
CBSD68 | ADNet [20] | 34.02/0.933 | 31.34/0.889 | 28.05/0.797 | 26.33/0.606
CBSD68 | BUIFD [37] | 33.65/0.930 | 30.76/0.882 | 26.61/0.777 | -
CBSD68 | AirNet [42] | 33.92/0.933 | 31.26/0.888 | 28.01/0.798 | -
CBSD68 | CMDNCL (ours) | 34.21/0.939 | 31.53/0.892 | 28.26/0.802 | 26.49/0.613
CBSD68 | CMDNCL-B (ours) | 34.12/0.934 | 31.46/0.889 | 28.22/0.802 | 26.41/0.609
Kodak24 | CBM3D [7] | 34.28/0.916 | 31.68/0.868 | 28.46/0.775 | 26.82/-
Kodak24 | CDnCNN-B [9] | 34.73/0.920 | 32.23/0.876 | 29.02/0.791 | -
Kodak24 | FFDNet [10] | 34.55/0.922 | 32.11/0.878 | 28.99/0.794 | 27.25/0.733
Kodak24 | ADNet [20] | 34.76/0.924 | 32.26/0.882 | 29.10/0.798 | 27.40/0.739
Kodak24 | BUIFD [37] | 34.41/0.923 | 31.77/0.879 | 27.74/0.786 | -
Kodak24 | AirNet [42] | 34.68/0.924 | 32.21/0.882 | 29.06/0.799 | -
Kodak24 | MWDCNN [39] | 34.91/0.927 | 32.40/0.886 | 29.26/0.806 | 27.55/0.749
Kodak24 | CMDNCL (ours) | 35.09/0.934 | 32.57/0.889 | 29.46/0.813 | 28.61/0.759
Kodak24 | CMDNCL-B (ours) | 35.01/0.930 | 32.49/0.885 | 29.37/0.809 | 28.02/0.751
McMaster | CBM3D [7] | 34.06/0.915 | 31.66/0.874 | 28.51/0.793 | -
McMaster | CDnCNN-B [9] | 34.80/0.904 | 32.47/0.869 | 29.21/0.799 | -
McMaster | FFDNet [10] | 34.47/0.922 | 32.25/0.886 | 29.14/0.815 | -
McMaster | ADNet [20] | 34.93/0.927 | 32.56/0.894 | 29.36/0.825 | -
McMaster | BUIFD [37] | 33.84/0.901 | 31.06/0.847 | 26.20/0.733 | -
McMaster | AirNet [42] | 34.70/0.925 | 32.44/0.891 | 29.26/0.822 | -
McMaster | CMDNCL (ours) | 35.19/0.939 | 32.92/0.913 | 29.69/0.839 | -
McMaster | CMDNCL-B (ours) | 35.13/0.930 | 32.86/0.907 | 29.59/0.833 | -
Table 5. Denoising results of different methods on real-world noise datasets (The superior results are emphasized in bold).
Test Data | Metric | BM3D [7] | WNNM [35] | CBDNet [41] | HI-GAN [43] | TBSN [44] | MDNCL
SIDD | PSNR | 25.65 | 25.78 | 38.68 | 38.47 | 37.78 | 39.31
SIDD | SSIM | 0.685 | 0.685 | 0.809 | 0.900 | 0.940 | 0.928
DND | PSNR | 34.51 | 34.67 | 38.06 | 39.32 | 39.08 | 39.37
DND | SSIM | 0.851 | 0.865 | 0.942 | 0.952 | 0.945 | 0.959
Table 6. Average PSNR (dB)/SSIM results of ablation experiments on Set12 (The superior results are emphasized in bold).
Method | σ = 15 | σ = 25 | σ = 50
MDNCL-DLB | 32.46/0.8913 | 30.19/0.8589 | 27.17/0.7821
MDNCL-NLB | 31.93/0.8794 | 29.82/0.8497 | 26.89/0.7690
MDNCL-FSU | 28.89/0.8623 | 29.48/0.8446 | 24.11/0.7413
MDNCL | 33.14/0.9065 | 30.80/0.8679 | 27.67/0.7955
Table 7. Average PSNR (dB)/SSIM results on Set12 for different values of the loss weight μ (The superior results are emphasized in bold).
μ | σ = 15 | σ = 25 | σ = 50
0.4 | 31.90/0.8419 | 29.56/0.8449 | 26.47/0.7553
0.6 | 32.77/0.8814 | 29.46/0.8577 | 27.45/0.7690
0.8 | 32.57/0.8763 | 29.80/0.8593 | 27.54/0.7723
1.0 | 33.14/0.9065 | 30.80/0.8679 | 27.67/0.7955
1.2 | 31.78/0.8801 | 29.40/0.8470 | 26.45/0.7529
Table 8. Running time (in seconds) of the compared models on three sizes of grayscale and color noisy images.
Device | Model | 256 × 256 (Gray) | 256 × 256 (Color) | 512 × 512 (Gray) | 512 × 512 (Color) | 1024 × 1024 (Gray) | 1024 × 1024 (Color)
CPU | BM3D [7] | 0.446 | 0.589 | 2.274 | 3.661 | 9.687 | 12.788
CPU | TWSC [45] | 12.294 | 34.361 | 53.011 | 140.922 | 220.997 | 608.477
CPU | MCWNNM [46] | - | 62.732 | - | 278.073 | - | 1120.332
GPU | IRCNN [47] | 0.030 | 0.030 | 0.030 | 0.030 | 0.030 | 0.030
GPU | DnCNN-B [9] | 0.032 | 0.032 | 0.037 | 0.037 | 0.057 | 0.057
GPU | FFDNet [10] | 0.031 | 0.030 | 0.031 | 0.030 | 0.032 | 0.030
GPU | BUIFD [37] | 0.035 | 0.037 | 0.050 | 0.053 | 0.112 | 0.123
GPU | ADNet [20] | 0.031 | 0.033 | 0.035 | 0.045 | 0.051 | 0.093
GPU | SADNet [48] | 0.030 | 0.030 | 0.043 | 0.044 | 0.101 | 0.102
GPU | CycleISP [49] | - | 0.055 | - | 0.156 | - | 0.545
GPU | AirNet [42] | - | 0.143 | - | 0.498 | - | 2.501
GPU | VDN [50] | 0.144 | 0.162 | 0.607 | 0.597 | 2.367 | 2.376
GPU | AINDNet [51] | - | 0.531 | - | 2.329 | - | 9.573
GPU | VDIR [52] | - | 0.385 | - | 1.622 | - | 6.690
GPU | MDNCL | 0.279 | 0.316 | 0.574 | 0.697 | 2.013 | 2.174
Table 9. Complexity of different denoising methods.
Method | Parameters | FLOPs
DnCNN [9] | 0.56 M | 0.891 G
ADNet [20] | 0.52 M | 0.832 G
ATDNet [53] | 9.45 M | -
RDUNet [12] | 166 M | 48 G
CTNet [54] | 49.03 M | 6.91 G
TBSN [44] | 12.97 M | 607.08 G
AirNet [42] | 8.72 M | -
MDNCL | 6.71 M | 9.34 G
