Article

Multi-Scale Feature Learning Convolutional Neural Network for Image Denoising
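Shuo Zhang, Chunyu Liu, Yuxin Zhang, Shuai Liu and Xun Wang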

1 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
2 University of Chinese Academy of Sciences, Beijing 100039, China
3 Key Laboratory of Space-Based Dynamic & Rapid Optical Imaging Technology, Chinese Academy of Sciences, Changchun 130033, China
* Author to whom correspondence should be addressed.
Sensors 2023, 23(18), 7713; https://doi.org/10.3390/s23187713
Submission received: 16 July 2023 / Revised: 24 August 2023 / Accepted: 1 September 2023 / Published: 6 September 2023
(This article belongs to the Section Sensing and Imaging)

Abstract

Owing to the hardware conditions and environment of imaging, images often contain substantial noise. The presence of noise diminishes image quality and compromises an image's effectiveness in real-world applications, so reducing image noise and improving image quality are essential. Although current denoising algorithms can reduce noise to some extent, the removal process may destroy intricate details and degrade overall image quality. Hence, to enhance denoising effectiveness while preserving intricate image details, this article presents a multi-scale feature learning convolutional neural network denoising algorithm (MSFLNet), which consists of three feature learning (FL) modules, a reconstruction and generation (RG) module, and a residual connection. The three FL modules help the algorithm learn image feature information and improve denoising efficiency; the residual connection carries the shallow information the model has learned to the deep layers; and the RG module helps the algorithm reconstruct and generate the image. Finally, our experiments indicate that the proposed denoising method is effective.

1. Introduction

Because of the impact of hardware devices and their surrounding conditions, noise is inevitably introduced during image acquisition and transmission, potentially degrading image quality. Image denoising is a low-level vision task and an essential step for high-level vision tasks, and it holds crucial significance in satellite remote sensing, medicine, the military, and internet technology [1,2]. Mathematically, the image degradation model can be expressed as y = x + n, where y represents the observed noisy image, x corresponds to the noise-free clean image, and n represents the noise component.
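As a concrete illustration of this degradation model, the following sketch (our own example for exposition, not code from the paper) synthesizes a noisy observation y from a clean image x by adding white Gaussian noise; sigma = 25 matches one of the noise levels used in the experiments below.

```python
import numpy as np

def add_gaussian_noise(x: np.ndarray, sigma: float = 25.0) -> np.ndarray:
    """Synthesize a noisy observation y = x + n with additive white Gaussian
    noise of standard deviation `sigma`; `x` holds pixel values in [0, 255]."""
    n = np.random.normal(0.0, sigma, size=x.shape)  # the noise component n
    return np.clip(x + n, 0.0, 255.0)               # the noisy observation y

# Example: corrupt a flat gray test image at noise level 25.
clean = np.full((180, 180), 128.0)
noisy = add_gaussian_noise(clean, sigma=25.0)
```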
Algorithms for image denoising can be broadly classified into three categories: filter-based, model-based, and learning-based methods. Filter-based approaches employ manually designed filters to suppress image noise; well-known examples include the adaptive Wiener filter [3], the bilateral filter, the Gaussian filter, and the median filter [4]. Nevertheless, these algorithms require manual parameter tuning, and they risk losing image details during the denoising process [5,6,7,8,9].
Model-based techniques model the distributions of the image and the noise and then solve an optimization problem that uses these distributions as priors to recover a clear image. The first phase therefore captures the noise characteristics embedded in the image, after which prior knowledge about the image is exploited to remove the noise efficiently. The non-local means (NLM) algorithm eliminates noise by taking a weighted average of mutually similar image blocks [10]. The BM3D algorithm achieves denoising by enhancing sparsity [8]. Unlike general low-rank clustering algorithms, WNNM [11] assigns distinct weights to singular values, leveraging prior information to define the nuclear norm used in denoising and thereby making fuller use of prior knowledge. However, the shortcomings of these algorithms are also obvious: the noise level must be identified in advance, and the intricate optimization problems make the testing phase time-consuming, prolonging the time needed to reach an optimal result. To enhance denoising capability, the CSF algorithm combines the statistical characteristics of random-field models with the optimization power of the half-quadratic algorithm [12]. By performing a predetermined number of gradient descent iterations, the TNRD algorithm [13] progressively updates the denoised image, iteratively reducing noise and enhancing quality. While CSF and TNRD each have unique strengths, both are essentially limited to fixed priors and are tailored to specific noise levels, so their performance on blind noise is not ideal.
Thanks to AlexNet [14], ResNet [15], and related models, learning-based denoising algorithms process images very effectively, and image denoising based on convolutional neural networks (CNNs) has demonstrated remarkable performance and achieved significant advances [16,17,18,19,20,21]. For instance, the feed-forward denoising convolutional neural network (DnCNN [22]) combines residual learning and batch normalization in an end-to-end network; it learns the noise in noisy images and thereby markedly improves denoising. Zhang et al. introduced a denoising algorithm characterized by its speed and flexibility (FFDNet) [23]. Tian et al. proposed ECNDNet [24], which uses residual learning and batch normalization (BN) to ease model training and dilated convolutions to extract more contextual information. Tian et al. also proposed ADNet [25], which strengthens the influence of shallow features on deep features through four dedicated modules, and BRDNet [26], which combines two networks to increase network width. Kligvasser et al. proposed xUnit [27], a denoising algorithm with a new activation function that reduces the model's parameters as much as possible while keeping performance unchanged. Although these techniques successfully reduce noise, their feature extraction relies on fixed-scale approaches, which restricts their ability to fully exploit the rich information present in the image. Gou et al. introduced a noteworthy improvement with their multi-scale adaptive network (MSANet [28]), whose multi-scale design considers both within-scale characteristics and cross-scale complementarity and effectively improves denoising performance. However, that algorithm still does not account for the loss of image details.
Building upon the aforementioned challenges, this paper introduces a denoising algorithm based on the FL module and the RG module. The algorithm improves the overall denoising process by transferring shallow information to the deeper layers of the network. The FL module makes full use of the Res2Net module [29] to extract image features from different dimensions while preserving detailed information. The residual connection transfers shallow information to the deep network, helping the algorithm combine global and local information, improving its performance, and reducing model complexity.
The main contributions of this paper are as follows:
(1) This algorithm uses the Res2Net structure to design the FL module and the RG module. The FL module fully extracts image information at different scales, the RG module reconstructs a clean image, and together they improve the denoising performance of the algorithm while ensuring that detailed information in the image is preserved.
(2) This paper incorporates residual connections, enabling the transfer of information from shallow layers to deep layers. This combination of global and local features enhances the noise reduction efficiency of the algorithm and also reduces model complexity, making it more computationally efficient.
(3) This paper presents experimental results on several datasets to validate the proposed approach for image denoising. The results demonstrate that MSFLNet achieves strong denoising quality, as evidenced by excellent peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) values.
The remainder of this article is organized as follows. Section 2 discusses existing work related to the proposed method. Section 3 details the proposed method, presenting the algorithm and network architecture. Section 4 reports extensive experimental results. Section 5 concludes with the key findings, contributions, and implications of the research.

2. Related Work

2.1. Residual Connection

As the number of network layers increases, an algorithm's effectiveness can improve to a certain extent, but deeper networks can suffer from exploding gradients. To overcome this challenge and improve performance, the residual block proposed in ResNet adds the block's input to the output of several stacked layers and feeds the sum to the following layer. Residual connections play a vital role in transferring information from shallow layers to the deep network, helping the algorithm combine local and global information, and they alleviate a series of problems that arise as the number of layers grows.
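A minimal PyTorch sketch of such a residual block is given below. This is our own illustrative rendering, not the exact block from ResNet or from this paper: the input is added to the output of two stacked Conv+BN layers, so shallow information is carried directly to the next layer.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Illustrative residual block: output = ReLU(input + F(input))."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection carries shallow information forward unchanged.
        return torch.relu(x + self.body(x))
```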

2.2. Res2Net

Multi-scale feature learning methods differ from fixed-scale extraction methods: a multi-scale module excels at extracting image information from diverse dimensions, improving the efficiency of image denoising. Based on this, Res2Net builds a multi-scale module inside the residual block to form receptive fields of different sizes and obtain features of different granularity. As shown in Figure 1, after the input passes through the 1 × 1 convolutional layer, the feature information is segmented into $s$ subsets, denoted $x_i$, where $s$ is the number of subsets. Every subset has the same spatial size, but its number of channels is $1/s$ of the input feature map of the previous layer. Each subset has a corresponding 3 × 3 convolution $h_i(\cdot)$, and $y_i$ denotes the corresponding output. The subsets are fused with one another after passing through their convolutional layers, so the network learns image information from different scale dimensions. As shown in Formula (1), the input subsets $x_i$ are transformed into the learned outputs $y_i$:
$$y_i = \begin{cases} x_i, & i = 1 \\ h_i(x_i), & i = 2 \\ h_i(x_i + y_{i-1}), & 2 < i \le s \end{cases} \tag{1}$$
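The following PyTorch sketch renders Formula (1) directly. It is a simplified illustration of a Res2Net-style block under our own assumptions (for example, the 1 × 1 fusion convolutions carry no normalization), not the reference implementation.

```python
import torch
import torch.nn as nn

class Res2NetBlock(nn.Module):
    """Simplified Res2Net-style block implementing Formula (1)."""

    def __init__(self, channels: int = 64, scales: int = 4):
        super().__init__()
        assert channels % scales == 0
        self.scales = scales
        width = channels // scales  # each subset keeps 1/s of the channels
        self.conv_in = nn.Conv2d(channels, channels, 1)
        # One 3x3 convolution h_i for every subset except the first.
        self.convs = nn.ModuleList(
            [nn.Conv2d(width, width, 3, padding=1) for _ in range(scales - 1)]
        )
        self.conv_out = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xs = torch.chunk(self.conv_in(x), self.scales, dim=1)
        ys = [xs[0]]                                    # y_1 = x_1
        for i in range(1, self.scales):
            inp = xs[i] if i == 1 else xs[i] + ys[-1]   # x_i, or x_i + y_(i-1)
            ys.append(self.convs[i - 1](inp))           # y_i = h_i(...)
        return self.conv_out(torch.cat(ys, dim=1)) + x  # fuse scales, keep residual
```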

3. Network Structure

This section introduces the algorithm, which is composed of three FL modules, one RG module, and a residual connection. The FL module makes full use of multi-scale feature learning to obtain image information, extracting noise and details from different dimensions of the image. The RG module uses the information learned by the FL modules to reconstruct and generate clean images.

3.1. MSFLNet Module

The network structure of MSFLNet is visually represented in Figure 2. First, the noisy image is input to the first Conv layer, which performs initial feature extraction, and the resulting information is then passed through three FL modules and one RG module. Each FL module contains two multi-scale feature modules (Res2Net) and three Conv+BN+ReLU stages (convolutional layer + batch normalization layer + activation function). A network this deep may suffer exploding gradients during training, which would harm the algorithm's performance. Therefore, to address exploding gradients, expedite convergence, and ease training, MSFLNet incorporates batch normalization (BN) layers that normalize the data flowing through the convolutional (Conv) layers. This centering and scaling prevents the gradients from becoming excessively large or small during training; by keeping gradients stable, BN accelerates convergence and makes training more efficient. The ReLU activation lets MSFLNet capture and represent complex, non-linear relationships within the data, enriching the feature representation and increasing the network's expressive power. The FL module fully utilizes the multi-scale feature learning of Res2Net to extract more feature information. Unlike the three FL modules, the RG module is composed of two multi-scale feature modules (Res2Net), two Conv+BN+ReLU stages, one Conv+BN stage, and one Conv layer, and it helps the algorithm reconstruct and generate a clean image. Finally, the residual connection combines the original shallow features with the information propagated through the second FL module, establishing a direct pathway for information transfer.

3.2. FL Module

In image denoising algorithms, the key challenge is to extract the noise from the image while preserving the essential information of the clean image. To this end, the algorithm uses an FL feature extraction module composed of two multi-scale feature modules (Res2Net) and three Conv+BN+ReLU (convolution layer + batch normalization + activation function) stages. The two connected multi-scale modules leverage the inherent strengths of multi-scale feature learning and extract image information from different dimensions, while the three Conv+BN+ReLU stages added after them improve the extraction ability of the FL module. Within each Conv+BN+ReLU stage, the convolution layer extracts image feature information; BN applies batch normalization to these features, which significantly expedites network convergence and mitigates exploding gradients; and the ReLU activation provides non-linearity and accelerates training. Together, these components effectively enhance the network's feature extraction capability. Suppose the output of the model's first 3 × 3 convolutional layer is passed to the FL module. The first multi-scale module extracts the image information as shown in Equation (2), where $X$ denotes the image information and $R_1$ the output of the first multi-scale module:
$$R_1 = R(\mathrm{Conv}_{3 \times 3}(X)) \tag{2}$$
The first multi-scale module then transfers the learned information to the second multi-scale module, whose output is denoted $R_2$:
$$R_2 = R(R_1(X)) \tag{3}$$
After the second multi-scale module learns the image information, the result is passed to the first Conv+BN+ReLU stage (convolution layer + batch normalization + activation function), whose output is denoted $CBR$:
$$CBR = \mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}_{3 \times 3}(R_2(R_1(X))))) \tag{4}$$
The information then passes through the second and third Conv+BN+ReLU stages, and the FL module finally outputs $REC$, where $CBR_1$, $CBR_2$, and $CBR_3$ denote the outputs of the first, second, and third Conv+BN+ReLU stages, respectively:
$$REC = CBR_3(CBR_2(CBR_1(R_2(R_1(X))))) \tag{5}$$
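Under these equations, the FL module can be sketched in PyTorch as below, reusing the `Res2NetBlock` from the Section 2.2 sketch; the channel width of 64 is our own assumption.

```python
import torch.nn as nn

def conv_bn_relu(channels: int = 64) -> nn.Sequential:
    """One Conv+BN+ReLU stage (convolution + batch normalization + activation)."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, padding=1),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
    )

class FLModule(nn.Module):
    """FL module: two Res2Net blocks followed by three Conv+BN+ReLU stages."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            Res2NetBlock(channels),  # R_1, Equation (2)
            Res2NetBlock(channels),  # R_2, Equation (3)
            conv_bn_relu(channels),  # CBR_1, Equation (4)
            conv_bn_relu(channels),  # CBR_2
            conv_bn_relu(channels),  # CBR_3 -> REC, Equation (5)
        )

    def forward(self, x):
        return self.body(x)
```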

3.3. RG Module

After the network model has learned all the image information, it must be reconstructed into a clean image. We therefore designed the RG reconstruction module shown in Figure 3, which consists of two multi-scale feature modules (Res2Net), two Conv+BN+ReLU (convolution layer + batch normalization layer + activation function) stages, one Conv+BN stage, and one Conv layer. The two multi-scale feature modules help the algorithm distill the information learned by the network and ultimately pass everything to the last Conv layer, which generates the clean image.
After the last FL module learns the image information, it transfers everything to the RG module, which reconstructs and generates a clean image. Here $R_3$ and $R_4$ denote the outputs of the first and second multi-scale feature modules (Res2Net) in the RG module, $x$ is the output of the previous module, $CBR_4$ and $CBR_5$ denote the outputs of the first and second Conv+BN+ReLU stages, $CB$ denotes the output of the Conv+BN stage, $C_2$ the input of the final Conv layer, and $C_3$ the output of the RG module:
$$C_2 = CB(CBR_5(CBR_4(R_4(R_3(x))))) \tag{6}$$
$$C_3 = \mathrm{Conv}_{3 \times 3}(C_2) \tag{7}$$
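The RG module and the resulting end-to-end assembly can be sketched in the same style, reusing `Res2NetBlock`, `conv_bn_relu`, and `FLModule` from the earlier sketches. This is our reading of Figures 2 and 3: the channel widths and the placement of the residual connection after the second FL module follow Section 3.1 but remain assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class RGModule(nn.Module):
    """RG module: two Res2Net blocks, two Conv+BN+ReLU stages, Conv+BN, Conv."""

    def __init__(self, channels: int = 64, out_channels: int = 1):
        super().__init__()
        self.body = nn.Sequential(
            Res2NetBlock(channels),                       # R_3
            Res2NetBlock(channels),                       # R_4
            conv_bn_relu(channels),                       # CBR_4
            conv_bn_relu(channels),                       # CBR_5
            nn.Conv2d(channels, channels, 3, padding=1),  # Conv+BN -> C_2, Eq. (6)
            nn.BatchNorm2d(channels),
        )
        self.tail = nn.Conv2d(channels, out_channels, 3, padding=1)  # C_3, Eq. (7)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.tail(self.body(x))

class MSFLNet(nn.Module):
    """End-to-end assembly: Conv head, three FL modules, residual link, RG tail."""

    def __init__(self, in_channels: int = 1, channels: int = 64):
        super().__init__()
        self.head = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.fl1 = FLModule(channels)
        self.fl2 = FLModule(channels)
        self.fl3 = FLModule(channels)
        self.rg = RGModule(channels, out_channels=in_channels)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        shallow = self.head(y)                        # initial feature extraction
        deep = self.fl2(self.fl1(shallow)) + shallow  # shallow-to-deep residual
        return self.rg(self.fl3(deep))                # reconstructed clean image
```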

3.4. Loss Functions and Optimizers

The convolutional neural network utilizes a loss function to quantify the disparity between the actual value and the predicted value; a smaller loss indicates better performance. The smooth curve of the mean squared error (MSE) loss facilitates network training, so this algorithm adopts the MSE loss, also referred to as the L2 loss. In Equation (8), $N$ represents the total number of images in the training set, $x_i$ the image the network produces from a noisy input, and $y_i$ the clean image corresponding to that noisy input.
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( x_i - y_i \right)^2 \tag{8}$$
Throughout training, the optimizer plays a crucial role in updating parameters and guiding the model towards its optimum. The Adam optimizer combines the strengths of AdaGrad (adaptive gradient) and RMSProp (root mean square propagation): it calculates the update step size from comprehensive estimates of the first- and second-order moments of the gradient. Adam is simple to implement, memory-efficient, and particularly well suited to models with large-scale data and many parameters. Hence, this article chooses Adam to train the model to its optimal solution.
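A minimal training step combining the MSE loss of Equation (8) with the Adam optimizer might look as follows; the learning rate matches Section 4.1, while the rest is an illustrative assumption rather than the authors' training script.

```python
import torch
import torch.nn as nn

model = MSFLNet()                 # from the Section 3.3 sketch
criterion = nn.MSELoss()          # Equation (8), averaged over the batch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(noisy: torch.Tensor, clean: torch.Tensor) -> float:
    """One optimization step: predict a clean image and minimize the MSE."""
    optimizer.zero_grad()
    pred = model(noisy)            # x_i: the network's output for a noisy input
    loss = criterion(pred, clean)  # y_i: the corresponding clean target
    loss.backward()
    optimizer.step()
    return loss.item()
```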

4. Experimental Results and Analysis

In this section, we present the algorithm's experiments on several image test sets and provide quantitative and qualitative analyses of the experimental settings and results.

4.1. Experimental Environment

To obtain the best performance from our model, the learning rate is initially set to 0.0001 and multiplied by 0.2 every 30 epochs. During training, the batch size is set to 128, the patch size to 40 × 40, and the Adam optimizer is used. Training is conducted in a deep learning environment based on PyTorch 1.11.0 and Python 3.8 on an Ubuntu 20.04 system. The GPU is an NVIDIA GeForce RTX 3080, with CUDA 11.3 and cuDNN 8.2.1 used to accelerate training on the GPU.
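In PyTorch, this schedule corresponds to a step decay, as in the sketch below; `model`, `train_step`, `train_loader`, and `num_epochs` are assumed from the surrounding setup and are not specified by the paper.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Multiply the learning rate by 0.2 every 30 epochs, per Section 4.1.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.2)

for epoch in range(num_epochs):
    for noisy, clean in train_loader:  # batches of 128 patches of 40 x 40 pixels
        train_step(noisy, clean)
    scheduler.step()
```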

4.2. Training Dataset

The datasets used by the algorithm are Train400 [23], DIV2K [30], and SIDD. Train400 consists of 400 images from the Berkeley segmentation dataset: 400 clear 180 × 180 grayscale images rich in content, including various animals, landscapes, faces, and more. To improve the denoising performance of our algorithm, 800 images from the DIV2K dataset, a common benchmark in super-resolution reconstruction, are added. To facilitate training, each DIV2K image is rescaled to 180 × 180, and the dataset is expanded by rotating the images by 90°, 180°, and 270° and by scaling. To train the MSFLNet model, we use Gaussian noise at levels 15, 25, and 50 with a patch size of 40 × 40, which finally yields 715,200 patches for training. For real-noise denoising, the algorithm uses the SIDD dataset, a smartphone image denoising training set that includes paired clean and noisy images. We chose 140 images, cropped them to 1024 × 1024, and applied data augmentation to enlarge the dataset.

4.3. Test Dataset

To validate the efficacy of our algorithm in removing noise, BSD68 [23] and Set12 [23] are selected. BSD68 contains 68 grayscale images with rich content, and Set12 contains 12 grayscale images; we run experiments on both test sets at noise levels of 15, 25, and 50. For real-noise experiments, we selected images from the SIDD and PolyU datasets; PolyU is a large-scale dataset of real-world noisy images. We selected 14 images from SIDD and 16 images from PolyU, each cropped to 1024 × 1024. To test infrared image denoising, the algorithm uses the TNO dataset, which integrates infrared and visible-light images from military, security, and other scenarios; we cropped 19 images from TNO for these tests.

4.4. Experimental Analysis

We test DnCNN, xUnit, ECNDNet, ADNet, MSANet, and our algorithm on BSD68 and Set12. We first compare results on the BSD68 test set: as shown in Table 1 and Table 2, our algorithm outperforms the other algorithms in PSNR and SSIM on BSD68.
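For reference, PSNR — the primary metric in the tables below — can be computed from the per-image mean squared error as in this small sketch (for 8-bit images with a peak value of 255); SSIM is typically taken from an off-the-shelf implementation such as scikit-image's `structural_similarity`.

```python
import numpy as np

def psnr(clean: np.ndarray, denoised: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a clean and a denoised image."""
    mse = np.mean((clean.astype(np.float64) - denoised.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```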
As shown in Table 3, we also evaluated the algorithms on the Set12 test set, testing every algorithm on each picture and recording its PSNR value. As shown in Table 4, our algorithm exhibits higher SSIM indicators compared with the other algorithms. Table 3 shows that our algorithm performs best in the denoising experiments at noise level 50 and also performs well at the other noise levels.
We selected one image each from the BSD68 and Set12 datasets and compare the denoising results of our algorithm against the others. As Figures 4 and 5 show, our algorithm produces notably clearer denoised results while effectively preserving image details, and its PSNR and SSIM values are also higher.
For the infrared image denoising experiment, we selected 19 images from the TNO dataset and cropped them to 256 × 256. We tested DnCNN, xUnit, ECNDNet, ADNet, MSANet, and our algorithm on this test set; as shown in Table 4, our algorithm performs well in PSNR and SSIM.
We selected one image from the TNO dataset and compare the denoising results of our algorithm against the others. As Figure 6 shows, our algorithm achieves clearer denoising results and preserves the details of the image.
We also tested the denoising of real noisy images on SIDD and PolyU using DnCNN, xUnit, ECNDNet, ADNet, MSANet, and our algorithm. As shown in Table 5 and Table 6, our algorithm performs well in PSNR and SSIM.
We selected one image from the SIDD dataset and list the denoising results of each algorithm. As can be seen from Figure 7, our algorithm's denoised result is clearer and preserves the details of the picture.

4.5. Ablation Experiment

To verify the rationality of our design, we ran the ablation experiments shown in Table 7. On real noise images, we performed denoising experiments with the 'baseline model', 'RG+baseline', 'FL+baseline', 'RG+FL1', 'RG+FL2', and 'RG+FL' (MSFLNet), in that order. The 'baseline model' replaces the proposed modules with the same number of plain convolutional layers. 'RG+baseline' and 'FL+baseline' add the corresponding block on top of the baseline: 'RG+baseline' adds the RG module, and 'FL+baseline' adds the FL module. Building on 'RG+baseline', 'RG+FL1' and 'RG+FL2' additionally employ one and two FL modules, respectively.

4.6. Model Complexity

The total number of model parameters (Parameters) and the amount of model computation (FLOPs) reflect the complexity of a model to a certain extent; if either is too large, the model is unsuitable for practical applications. Therefore, to verify the rationality of the model, we calculated the parameters and FLOPs of each algorithm, as shown in Table 8. The table shows that the totals for our model are reasonable, so the model can effectively remove image noise in practical applications.
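The parameter totals in Table 8 can be reproduced for any PyTorch model with a one-line count, as in this sketch; FLOPs require a separate profiling tool and are omitted here.

```python
def count_parameters(model) -> float:
    """Total trainable parameters in millions (the 'Parameters' row of Table 8)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
```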

5. Conclusions

In this paper, we introduce a denoising algorithm built upon the MSFLNet network, which includes the three proposed FL modules and the RG module. It uses multi-scale feature extraction to gather image information from different dimensions and combines shallow and deep information to help the network learn, significantly enhancing denoising effectiveness and improving the algorithm's ability to preserve image details. The experiments prove the effectiveness of the MSFLNet algorithm in image denoising.

Author Contributions

Conceptualization, S.Z. and C.L.; methodology, S.Z. and C.L.; writing—original draft preparation, S.Z. and C.L.; writing—review and editing, S.Z., C.L., Y.Z., S.L. and X.W.; funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 41974210.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Public datasets were used in this study; no new data were created or analyzed, and data sharing is not applicable to this article. The SIDD dataset can be found here (http://www.cs.yorku.ca/~kamel/sidd/dataset.php, accessed on 23 August 2023). The PolyU dataset can be found here (https://gitcode.net/mirrors/csjunxu/PolyUDataset, accessed on 23 August 2023). The DIV2K dataset can be found here (https://cv.snu.ac.kr/research/EDSR/DIV2K.tar, accessed on 23 August 2023). The TNO dataset can be found here (https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/1475454/TNO_Image_Fusion_Dataset.zip, accessed on 23 August 2023).

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Abdulmaged, A.; Baykara, M. Digital image denoising techniques based on multi-resolution wavelet domain with spatial filters: A review. Trait. Du Signal 2021, 38, 639–651. [Google Scholar]
  2. Arivazhagan, S.; Sugitha, N.; Vijay, A. A novel image denoising scheme based on fusing multiresolution and spatial filters. Signal Image Video Process. 2015, 9, 885–892. [Google Scholar] [CrossRef]
  3. Chen, J.; Benesty, J.; Huang, Y.; Doclo, S. New insights into the noise reduction wiener filter. IEEE Trans. Audio Speech Lang. Process. 2006, 14, 1218–1234. [Google Scholar] [CrossRef]
  4. Chen, T.; Ma, K.-K.; Chen, L.-H. Tri-state median filter for image denoising. IEEE Trans. Image Process. 1999, 8, 1834–1838. [Google Scholar] [CrossRef] [PubMed]
  5. Buades, A.; Coll, B.; Morel, J.-M. A non-local algorithm for image denoising. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 2, pp. 60–65. [Google Scholar]
  6. Tao, H.; Guo, W.; Han, R.; Yang, Q.; Zhao, J. Rdasnet: Image denoising via a residual dense attention similarity network. Sensors 2023, 23, 3. [Google Scholar] [CrossRef]
  7. Buades, A.; Coll, B.; Morel, J.M. Nonlocal image and movie denoising. Int. J. Comput. Vis. 2008, 76, 2. [Google Scholar] [CrossRef]
  8. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095. [Google Scholar] [CrossRef]
  9. Singh, L.; Janghel, R. Image denoising techniques: A brief survey. In Harmony Search and Nature Inspired Optimization Algorithms; Springer: Singapore, 2019; pp. 731–740. [Google Scholar]
  10. Buades, A.; Coll, B.; Morel, J.-M. Non-Local Means Denoising. Image Process. Line 2011, 1, 208–212. [Google Scholar] [CrossRef]
  11. Gu, S.; Zhang, L.; Zuo, W.; Feng, X. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2862–2869. [Google Scholar]
  12. Schmidt, U.; Roth, S. Shrinkage fields for effective image restoration. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2774–2781. [Google Scholar]
  13. Chen, Y.; Yu, W.; Pock, T. On learning optimized reaction diffusion processes for effective image restoration. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5261–5269. [Google Scholar]
  14. Krizhevsky, A.; Sutskever, I.; Hinton, G. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
  15. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  16. Plötz, T.; Roth, S. Neural nearest neighbors networks. arXiv 2018, arXiv:cs.CV/1810.12575. [Google Scholar]
  17. Anwar, S.; Barnes, N. Real image denoising with feature attention. arXiv 2020, arXiv:cs.CV/1904.07396. [Google Scholar]
  18. Guo, S.; Yan, Z.; Zhang, K.; Zuo, W.; Zhang, L. Toward convolutional blind denoising of real photographs. arXiv 2019, arXiv:cs.CV/1807.04686. [Google Scholar]
  19. Lee, W.; Son, S.; Lee, K.M. Ap-bsn: Self-supervised denoising for real-world images via asymmetric pd and blind-spot network. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–20 June 2022; pp. 17704–17713. [Google Scholar]
  20. Chang, Y.; Yan, L.; Liu, L.; Fang, H.; Zhong, S. Infrared aerothermal nonuniform correction via deep multiscale residual network. IEEE Geosci. Remote Sens. Lett. 2019, PP, 1–5. [Google Scholar] [CrossRef]
  21. Liu, Y.; Anwar, S.; Qin, Z.; Ji, P.; Caldwell, S.; Gedeon, T. Disentangling noise from images: A flow-based image denoising neural network. arXiv 2021, arXiv:cs.CV/2105.04746. [Google Scholar] [CrossRef]
  22. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef] [PubMed]
  23. Zhang, K.; Zuo, W.; Zhang, L. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Trans. Image Process. 2018, 27, 4608–4622. [Google Scholar] [CrossRef] [PubMed]
  24. Tian, C.; Xu, Y.; Fei, L.; Wang, J.; Wen, J.; Luo, N. Enhanced CNN for image denoising. CAAI Trans. Intell. Technol. 2019, 4, 17–23. [Google Scholar] [CrossRef]
  25. Tian, C.; Xu, Y.; Li, Z.; Zuo, W.; Liu, H. Attention-guided cnn for image denoising. Neural Netw. 2020, 124, 117–129. [Google Scholar] [CrossRef] [PubMed]
  26. Tian, C.; Xu, Y.; Zuo, W. Image denoising using deep cnn with batch renormalization. Neural Netw. 2020, 121, 461–473. [Google Scholar] [CrossRef] [PubMed]
  27. Kligvasser, I.; Shaham, T.R.; Michaeli, T. xunit: Learning a spatial activation function for efficient image restoration. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2433–2442. [Google Scholar]
  28. Gou, Y.; Hu, P.; Lv, J.; Zhou, J.T.; Peng, X. Multi-scale adaptive network for single image denoising. arXiv 2022, arXiv:eess.IV/2203.04313. [Google Scholar]
  29. Gao, S.H.; Cheng, M.M.; Zhao, K.; Zhang, X.Y.; Yang, M.H.; Torr, P. Res2Net: A new multi-scale backbone architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 652–662. [Google Scholar] [CrossRef] [PubMed]
  30. Agustsson, E.; Timofte, R. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1110–1121. [Google Scholar]
Figure 1. Res2Net module. Res2Net proposes a multi-scale module built inside residual blocks to form receptive fields of different sizes and obtain different fine-grained features.

Figure 2. FL module. The FL module is composed of two multi-scale feature modules (Res2Net) and three Conv+BN+ReLU stages.

Figure 3. RG module. The RG module is composed of two multi-scale feature modules (Res2Net), two Conv+BN+ReLU (convolution layer + batch normalization layer + activation function) stages, one Conv+BN stage, and one Conv module.

Figure 4. Results of selecting an image from the Set12 test set and denoising it with different algorithms when the noise level is 15.

Figure 5. Results of selecting an image from BSD68 and denoising it with different algorithms when the noise level is 50.

Figure 6. Results of selecting an image from the TNO test set and denoising it with different algorithms.

Figure 7. Results of selecting an image from the SIDD test set and denoising it with different algorithms.
Table 1. The average value of PSNR of different algorithms on the BSD68 test set at noise levels of 15, 25, and 50.

| Data Set | Algorithm | Sigma = 15 | Sigma = 25 | Sigma = 50 |
|---|---|---|---|---|
| BSD68 | DnCNN | 31.584 | 29.058 | 26.003 |
| | xUnit | 31.522 | 29.078 | 26.072 |
| | ECNDNet | 31.549 | 29.024 | 25.996 |
| | ADNet | 31.579 | 29.058 | 26.057 |
| | MSANet | 31.592 | 29.079 | 26.061 |
| | MSFLNet | **31.594** | **29.096** | **26.139** |

The bold one in the table is the best indicator.
Table 2. The average value of SSIM of different algorithms on the BSD68 test set at noise levels of 15, 25, and 50.

| Data Set | Algorithm | Sigma = 15 | Sigma = 25 | Sigma = 50 |
|---|---|---|---|---|
| BSD68 | DnCNN | 0.9416 | 0.9028 | 0.8265 |
| | xUnit | 0.9410 | 0.9035 | 0.8293 |
| | ECNDNet | 0.9414 | 0.9023 | 0.8268 |
| | ADNet | 0.9417 | 0.9027 | 0.8278 |
| | MSANet | **0.9420** | 0.9033 | 0.8285 |
| | MSFLNet | **0.9420** | **0.9042** | **0.8314** |

The bold one in the table is the best indicator.
Table 3. PSNR value and average value of each picture on Set12 for different algorithms.

Noise Level sigma = 15

| Images | C.man | House | Peppers | Star. | Mon. | Air. | Parrot | Lena | Barbara | Boat | Man | Couple | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DnCNN | 32.664 | 35.003 | 33.260 | 32.144 | 33.260 | 31.696 | 31.908 | 34.560 | 32.668 | 32.417 | 32.435 | 32.452 | 32.872 |
| xUnit | 32.524 | 34.894 | 33.165 | 32.009 | 33.108 | 31.634 | 31.869 | 34.467 | 32.422 | 32.359 | 32.389 | 32.361 | 32.767 |
| ECNDNet | 32.536 | 34.939 | 33.208 | 32.118 | 33.157 | 31.624 | 31.825 | 34.505 | 32.435 | 32.401 | 32.408 | 32.393 | 32.796 |
| ADNet | **32.781** | **35.192** | **33.466** | 32.100 | 33.247 | 31.790 | 31.979 | **34.698** | **32.841** | **32.597** | 32.473 | **32.578** | **32.979** |
| MSANet | 32.699 | 35.159 | 33.236 | 32.120 | 33.196 | 31.819 | 31.944 | 34.691 | 32.676 | 32.551 | 32.482 | **32.578** | 32.929 |
| MSFLNet | 32.777 | 35.137 | 33.408 | **32.223** | **33.311** | **31.832** | **32.011** | 34.638 | 32.765 | 32.497 | **32.497** | 32.560 | 32.972 |

Noise Level sigma = 25

| Images | C.man | House | Peppers | Star. | Mon. | Air. | Parrot | Lena | Barbara | Boat | Man | Couple | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DnCNN | 30.264 | 33.138 | 30.810 | 29.394 | 30.455 | 29.087 | 29.444 | 32.422 | 30.057 | 30.217 | 30.085 | 30.091 | 30.455 |
| xUnit | 30.259 | 33.127 | 30.832 | 29.427 | 30.456 | 29.093 | 29.459 | 32.468 | 30.061 | 30.231 | 30.101 | 30.137 | 30.471 |
| ECNDNet | 30.138 | 33.009 | 30.764 | 29.361 | 30.374 | 29.041 | 29.419 | 32.356 | 29.902 | 30.171 | 30.056 | 30.023 | 30.385 |
| ADNet | 30.397 | 33.373 | **31.077** | 29.339 | 30.429 | 29.143 | 29.543 | **32.624** | **30.316** | **30.388** | 30.114 | 30.253 | **30.583** |
| MSANet | 30.251 | **33.395** | 30.936 | **29.522** | 30.409 | 29.148 | 29.424 | 32.604 | 30.236 | 30.340 | **30.143** | **30.279** | 30.557 |
| MSFLNet | **30.428** | 33.296 | 30.989 | 29.428 | **30.522** | **29.183** | **29.606** | 32.538 | 30.161 | 30.265 | 30.130 | 30.250 | 30.567 |

Noise Level sigma = 50

| Images | C.man | House | Peppers | Star. | Mon. | Air. | Parrot | Lena | Barbara | Boat | Man | Couple | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DnCNN | 27.348 | 30.081 | 27.410 | 25.645 | 26.829 | 25.848 | 26.468 | 29.352 | 26.201 | 27.209 | 27.196 | 26.892 | 27.207 |
| xUnit | 27.362 | 30.171 | 27.438 | 25.716 | 26.906 | 25.859 | 26.396 | 29.479 | 26.298 | 27.268 | 27.233 | 27.001 | 27.261 |
| ECNDNet | 27.166 | 29.965 | 27.244 | 25.681 | 26.815 | 25.785 | 26.277 | 29.287 | 26.219 | 27.172 | 27.175 | 26.871 | 27.138 |
| ADNet | 27.410 | 30.417 | 27.603 | 25.685 | 26.888 | 25.866 | 26.642 | 29.606 | 26.563 | **27.391** | 27.237 | 27.088 | 27.366 |
| MSANet | 27.207 | 30.507 | 27.556 | **26.019** | 26.819 | 25.884 | 26.448 | 29.568 | **26.841** | 27.347 | **27.292** | 27.127 | 27.384 |
| MSFLNet | **27.505** | **30.509** | **27.629** | 25.85 | **27.023** | **25.960** | **26.716** | **29.643** | 26.809 | 27.38 | 27.284 | **27.165** | **27.456** |

The bold one in the table is the best indicator.
Table 4. The average value of PSNR and SSIM of different algorithms on the TNO test set at noise levels of 15, 25, and 50.

| Data Set | Algorithm | Sigma = 15 | Sigma = 25 | Sigma = 50 |
|---|---|---|---|---|
| TNO | DnCNN | 34.2508/0.9381 | 32.5658/0.9181 | 30.1155/0.8799 |
| | xUnit | 34.2516/0.9387 | 32.5917/0.9194 | 30.2127/0.8835 |
| | ECNDNet | 34.2549/0.9387 | 32.5721/0.9194 | 30.144/0.8832 |
| | ADNet | **34.3319/0.9393** | 32.6105/0.9200 | 30.2417/0.8852 |
| | MSANet | 34.2822/0.9389 | 32.5634/0.9189 | 30.0448/0.8795 |
| | MSFLNet | 34.3009/0.9392 | **32.6394/0.9203** | **30.2829/0.8861** |

The bold one in the table is the best indicator.
Table 5. The average value of PSNR and SSIM of different algorithms on the SIDD dataset.

| Data Set | DnCNN | xUnit | ECNDNet | ADNet | MSANet | MSFLNet |
|---|---|---|---|---|---|---|
| SIDD | 37.66/0.939 | 34.77/0.886 | 28.85/0.668 | 36.44/0.910 | 38.557/**0.956** | **38.634**/0.953 |

The bold one in the table is the best indicator.
Table 6. The average value of PSNR and SSIM of different algorithms on the PolyU dataset.

| Data Set | DnCNN | xUnit | ECNDNet | ADNet | MSANet | MSFLNet |
|---|---|---|---|---|---|---|
| PolyU | 36.85/0.962 | 37.00/0.964 | 35.86/0.940 | 36.80/0.954 | 35.81/0.967 | **37.24/0.973** |

The bold one in the table is the best indicator.
Table 7. The average value of PSNR and SSIM of different module combinations on the SIDD dataset.

| Data Set | Baseline Model | RG+Baseline | FL+Baseline | RG+FL1 | RG+FL2 | MSFLNet |
|---|---|---|---|---|---|---|
| SIDD | 37.06/0.930 | 37.88/0.943 | 37.56/0.947 | 38.31/0.952 | 38.46/0.952 | **38.63/0.953** |

The bold one in the table is the best indicator.
Table 8. The total number of model parameters and the amount of model computation for each algorithm.

| Metric | DnCNN | xUnit | ECNDNet | ADNet | MSANet | MSFLNet |
|---|---|---|---|---|---|---|
| FLOPs | 7.1 G | 4.1 G | 6.6 G | 6.7 G | 27.1 G | 7.3 G |
| Parameters | 0.14 M | 0.08 M | 0.13 M | 0.13 M | 7.99 M | 0.14 M |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
