Article

A RAW Image Noise Suppression Method Based on BlockwiseUNet

Jing Xu, Yifeng Liu and Ming Fang
1 School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
2 Zhongshan Institute of Changchun University of Science and Technology, Zhongshan 528403, China
3 School of Artificial Intelligence, Changchun University of Science and Technology, Changchun 130022, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(20), 4346; https://doi.org/10.3390/electronics12204346
Submission received: 7 September 2023 / Revised: 16 October 2023 / Accepted: 17 October 2023 / Published: 19 October 2023
(This article belongs to the Special Issue Advances in Image Processing and Detection)

Abstract

Given the challenges encountered by industrial cameras, such as the randomness of sensor components, scattering and polarization caused by optical defects, environmental factors, and other variables, the resulting noise hinders image recognition and leads to errors in subsequent image processing. In this study, we propose a RAW image denoising method based on BlockwiseUNet. By enabling local feature extraction and fusion, this approach enhances the network's capability to capture and suppress noise across multiple scales. We conducted extensive experiments on the SIDD benchmark (Smartphone Image Denoising Dataset), where the PSNR/SSIM reached 51.25/0.992, exceeding current mainstream denoising methods. Our method is also robust to different noise levels and generalizes well across datasets, and it likewise shows advantages on the DND benchmark (Darmstadt Noise Dataset).

1. Introduction

The field of industrial cameras, as a significant application area of image processing technology, has gained widespread attention and application. Through the acquisition and processing of image data, industrial cameras are extensively utilized in automation inspection, quality control, logistics, and other domains. However, the images captured by industrial cameras often contain noise due to various factors, such as the influence of sensors and internal components of the imaging system [1]. The distribution and magnitude of this noise are non-uniform, severely impacting the retrieval of image information. The removal of image noise has become an indispensable step in image processing. Furthermore, while removing image noise, it is essential to ensure the preservation of complete image information. Therefore, this paper focuses on exploring the characteristics of noise in images captured by industrial cameras, as well as the contributions and issues in the field of image denoising in recent years.
In recent years, image denoising has become a prominent research area in computer vision and image processing. Research methods can be broadly categorized into two groups: traditional denoising methods and deep learning-based methods. A representative traditional method is the non-local means (NLM) algorithm proposed by Buades et al. [2], which removes noise by exploiting the similarity between pixels in an image. Subsequently, Dabov, Foi, and colleagues [3] introduced block matching and 3D filtering (BM3D), which identifies blocks similar to the current block through block matching and applies 3D filtering to these blocks, yielding a denoised image. Mairal, Bach, and colleagues [4] proposed dictionary learning-based sparse representation and non-local self-similarity methods, both of which exploit properties of the image itself to eliminate noise [5]. Other popular image denoising methods include, but are not limited to, denoising with Markov random fields [6,7,8], gradient-based denoising [9,10], and total variation denoising [11,12,13].
The aforementioned methods are conventional denoising approaches that, at the time, achieved decent results. However, they inevitably suffer from several issues: (1) reliance on manual design and prior knowledge, (2) the need for extensive parameter tuning, and (3) difficulty in handling complex noise. In contrast, deep learning-based image denoising methods exhibit strong learning capabilities, enabling them to fit complex noise distributions and achieve excellent results through parallel computing and GPU utilization, effectively reducing resource consumption. Initially, multilayer perceptrons (MLPs) [14,15] and autoencoders [16,17,18] were employed for denoising tasks, but due to their limited network capacity and inability to effectively capture noise characteristics, they fell short of the performance achieved by traditional methods. The introduction of ResNet by He et al. [19] addressed these issues, enabling continuous improvement in the performance of deep learning-based denoising and gradually establishing its dominance in the field. In recent years, the emergence of ViT [20,21,22] brought Transformers to computer vision, yielding remarkable outcomes. Nowadays, Transformers [23,24,25,26] and CNNs [27,28,29] are the mainstream methods in image denoising.
In the realm of deep learning-based denoising methods, noise can be effectively modeled. However, current mainstream deep learning approaches primarily focus on denoising RGB images. As shown in Figure 1, many types of noise are generated during the camera imaging process [30]. Hence, denoising methods designed for RGB images perform unsatisfactorily when applied to RAW images. In recent years, an increasing number of papers have proposed methods for denoising RAW images. Zhang et al. [1] modeled the noise in RAW images and made assumptions about the different types of noise present, enabling the network to learn the noise distribution more effectively. Subsequently, Wei et al. [31] presented a model for synthesizing real noise images, forming an extreme low-light RAW noise formation model, and proposed a noise parameter correction scheme. Meanwhile, in the field of RGB denoising, Brooks et al. [32] introduced an approach that converts RGB images to RAW format for denoising and then back to RGB, yielding significant improvements.
Inspired by the aforementioned works, this paper focuses on the denoising problem of RAW images. The research is primarily conducted in three aspects: data preprocessing, network architecture, and loss function. Additionally, fine-tuning is incorporated into the validation process to enhance the denoising performance of the network. The contributions of this paper can be summarized as follows:
(1) This paper proposes a U-shaped convolutional neural network model named BlockwiseUNet, specifically designed for denoising RAW images. The proposed method is evaluated on the SIDD benchmark [33] and DND benchmark [34]. Objective evaluation metrics such as PSNR and SSIM are used, and the experimental results demonstrate significant improvements compared to recent denoising algorithms. Furthermore, the proposed method not only removes noise from the images but also preserves the integrity of the image information without introducing artifacts.
(2) Currently, mainstream deep learning methods primarily focus on denoising RGB images. RAW images refer to unprocessed raw camera data, which present greater challenges in noise handling due to their larger dynamic range and diverse noise characteristics. To address these challenges, this paper converts the images into a 4-channel representation and applies a unified Bayer pattern for image processing. Additionally, a combination of Charbonnier loss [35] and PSNR loss is introduced. Experimental results illustrate that the proposed method outperforms other denoising approaches, effectively removing noise from images.

2. Related Work

2.1. RAW Image

Compared to sRGB images processed through the Image Signal Processing (ISP) unit of a camera, direct processing of RAW images is superior. ISP processing includes steps such as white balance, denoising, gamma correction, color channel compression, and demosaicking, which cause information loss in high spatial frequencies and dynamic range; moreover, these steps make the nature of image noise more complex and harder to handle. Throughout the camera imaging process, noise is introduced at each conversion stage: from photons to electrons, from electrons to voltage, and from voltage to digital signals. These noise sources primarily include thermal noise, photon shot noise, readout noise, and quantization noise. As the signal passes through the ISP modules, noise continues to be generated and amplified, or its statistical characteristics are altered, amplifying the impact on image quality and making the noise characteristics increasingly uncontrollable. Therefore, it is more feasible and effective to perform denoising on RAW images before ISP processing. In their experiments, Brooks et al. [32] inverted RGB images to RAW images and introduced the estimated noise into the network for denoising; their approach achieved promising results in the field of RGB image denoising.

2.2. RAW Image Denoising

The study of image denoising has always been an essential component of computer vision. With the advent of deep learning, deep neural networks have become the mainstream approach to denoising. Early deep learning denoising methods primarily focused on removing additive white Gaussian noise from RGB images. However, on the RAW image denoising benchmark established in 2017 [34], these methods, while outperforming some traditional approaches, performed poorly on raw image data. In RAW images, neighboring pixels belong to different color channels and exhibit weak correlation, lacking the traditional notion of pixel smoothness. Furthermore, since each pixel in RAW data contains information for only one color channel, denoising algorithms designed for color images are not directly applicable. In recent years, noise modeling methods, such as those proposed by Wang et al. [36] and Wei et al. [31], have simulated the noise distribution generated along the image signal processing (ISP) pipeline and achieved promising denoising results through network-based learning. Feng et al. [37] introduced a method that decomposes real noise into shot noise and read noise, improving the accuracy of data mapping. Zhang et al. [38] further extended this by decomposing noise synthesis into signal-independent and signal-dependent components, implemented with different methods. Noise modeling can capture the statistical characteristics and distribution patterns of noise well, which aids removal. In practice, however, noise is diverse and influenced by many factors, such as sensor temperature and environmental lighting, and a single noise model cannot describe all noise situations, leading to unsatisfactory denoising in specific scenarios. Therefore, this study employs a real noise dataset to fit image noise, aiming to accurately reflect noise conditions in the real world. This enables the algorithm to learn more types and features of noise, enhancing its generalization capability and adaptability, making it applicable to a wider range of scenarios and more closely aligned with real-world applications.

3. Proposed Methods

3.1. Image Preprocessing

Due to the limitations of conventional photodetectors, which can only sense the intensity of light and cannot differentiate wavelengths, image sensors require color filter arrays (CFAs) to obtain color information for each pixel. Different CFAs vary in their actual sampling and output pixel configurations, and among them the Bayer array is the most commonly used. The Bayer array converts grayscale information into color by arranging pixels in a 1-red, 2-green, 1-blue pattern, reflecting the human eye's higher sensitivity to green; in other words, the number of green pixels equals the sum of red and blue pixels. Since RAW image data are stored in matrix format and different images may use different Bayer patterns, the matrix must be standardized by cropping to unify the Bayer arrays. As illustrated in Figure 2, all Bayer patterns are converted to the B, G, G, R arrangement (see the sketch after this paragraph): the G, R, B, G pattern is cropped by removing the first and last rows; the G, B, R, G pattern by removing the first and last columns; and the R, G, G, B pattern by removing the outermost ring of pixels. After standardizing the Bayer array, the matrix is cropped while maintaining the image size. Then, through linear interpolation, the R, G, and B color components are combined to obtain a demosaicked color image with four channels.
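To make these cropping rules concrete, here is a minimal NumPy sketch of the Bayer unification and the four-channel packing and unpacking. The function names are illustrative rather than taken from the paper's code, and the input is assumed to be a single-channel mosaic whose 2 × 2 tile pattern is known.

```python
import numpy as np

def unify_bayer(raw: np.ndarray, pattern: str) -> np.ndarray:
    """Crop a single-channel Bayer mosaic so its 2x2 pattern becomes BGGR.
    Cropping shifts the sampling grid by one pixel along the trimmed axis."""
    if pattern == "BGGR":
        return raw
    if pattern == "GRBG":            # drop first and last rows -> BGGR
        return raw[1:-1, :]
    if pattern == "GBRG":            # drop first and last columns -> BGGR
        return raw[:, 1:-1]
    if pattern == "RGGB":            # drop the outermost ring of pixels -> BGGR
        return raw[1:-1, 1:-1]
    raise ValueError(f"unsupported Bayer pattern: {pattern}")

def pack_bggr(raw: np.ndarray) -> np.ndarray:
    """Pack an (H, W) BGGR mosaic into a (4, H/2, W/2) array,
    one channel per Bayer sample position."""
    h, w = raw.shape
    raw = raw[: h - h % 2, : w - w % 2]   # ensure even dimensions
    return np.stack([raw[0::2, 0::2],     # B
                     raw[0::2, 1::2],     # G1
                     raw[1::2, 0::2],     # G2
                     raw[1::2, 1::2]])    # R

def unpack_bggr(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_bggr: (4, H/2, W/2) -> (H, W) BGGR mosaic."""
    _, h2, w2 = packed.shape
    raw = np.empty((h2 * 2, w2 * 2), dtype=packed.dtype)
    raw[0::2, 0::2], raw[0::2, 1::2] = packed[0], packed[1]
    raw[1::2, 0::2], raw[1::2, 1::2] = packed[2], packed[3]
    return raw
```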

3.2. Network Architecture

In this paper, we propose a denoising network for RAW images named BlockwiseUNet. The overall network structure is illustrated in Figure 3. The network follows a U-shaped architecture and comprises a shallow feature extraction module, a multilayer feature fusion residual module, a deep feature fusion module, and several skip connections. During the data preprocessing stage, the 1 × 512 × 512 image undergoes Bayer array unification and is packed into a 4 × 256 × 256 image with channels representing B, G, G, and R. These packed images are then used as inputs to the network.
The image first undergoes shallow feature extraction through the shallow feature module. It then passes through the multilayer feature fusion residual module, which employs a multilayer encoder-decoder structure to extract and fuse multilevel features. The fused deep features are subsequently combined with the shallow features to generate an output image, also of size 4 × 256 × 256 with channels B, G, G, and R. In the final data postprocessing stage, the output is converted back from a four-channel image to a 1 × 512 × 512 image, consistent with the initial input. A minimal sketch of this end-to-end flow follows.
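Below is a minimal PyTorch sketch of the overall flow. The U-shape, skip connections, shallow 3 × 3 head, global residual, and block counts (2, 3, 4, and 5 per level with 6 middle blocks, per Section 4.1) follow the paper; the base width of 16 channels and the simple ConvBlock stand-in, which the multiscale residual block of Section 3.2.3 replaces in the actual model, are assumptions.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Stand-in for the multiscale residual block of Section 3.2.3."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.GELU())
    def forward(self, x):
        return x + self.body(x)

class BlockwiseUNetSketch(nn.Module):
    def __init__(self, in_ch=4, width=16, enc_blocks=(2, 3, 4, 5), mid_blocks=6):
        super().__init__()
        self.shallow = nn.Conv2d(in_ch, width, 3, padding=1)   # shallow feature extraction
        self.encs, self.downs = nn.ModuleList(), nn.ModuleList()
        self.ups, self.decs = nn.ModuleList(), nn.ModuleList()
        ch = width
        for n in enc_blocks:
            self.encs.append(nn.Sequential(*[ConvBlock(ch) for _ in range(n)]))
            self.downs.append(nn.Conv2d(ch, ch * 2, 2, stride=2))  # halve H,W; double C
            ch *= 2
        self.middle = nn.Sequential(*[ConvBlock(ch) for _ in range(mid_blocks)])
        for n in reversed(enc_blocks):
            # 1x1 conv + pixel shuffle: double H,W; halve C (Section 3.2.2)
            self.ups.append(nn.Sequential(nn.Conv2d(ch, ch * 2, 1), nn.PixelShuffle(2)))
            ch //= 2
            self.decs.append(nn.Sequential(*[ConvBlock(ch) for _ in range(n)]))
        self.tail = nn.Conv2d(width, in_ch, 3, padding=1)

    def forward(self, x):
        feat = self.shallow(x)
        skips = []
        for enc, down in zip(self.encs, self.downs):
            feat = enc(feat)
            skips.append(feat)                       # saved for the skip connection
            feat = down(feat)
        feat = self.middle(feat)
        for up, dec, skip in zip(self.ups, self.decs, reversed(skips)):
            feat = up(feat) + skip                   # fuse same-level encoder features
            feat = dec(feat)
        return x + self.tail(feat)                   # global residual predicts the clean image

noisy = torch.randn(1, 4, 256, 256)
print(BlockwiseUNetSketch()(noisy).shape)            # torch.Size([1, 4, 256, 256])
```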

3.2.1. Shallow Feature Extraction Module

As shown in Figure 3, during the initial stage of the network, the image undergoes shallow feature extraction through a module consisting of 3 × 3 convolutions. This module is designed to extract shallow features from the noisy image. Since the image has not undergone multiple convolutions, it retains the original pixel information, higher resolution, and a significant amount of positional and detailed information. Additionally, employing a shallow network with 3 × 3 convolutional kernels enables the generation of feature maps that capture more information. The output of the shallow feature extraction module serves as the input to the multi-layer feature fusion residual module, acting as a preliminary step for the final fusion with deep features.

3.2.2. Multi-Layer Feature Fusion Residual Module

The multi-layer feature fusion residual module is a U-shaped network comprising an encoder and a decoder. The module has four layers, with the encoder and decoder built from repetitions of the blocks shown in Figure 4; the number of blocks increases from top to bottom. For the input that has undergone shallow feature extraction, the channel dimensions are increased during the encoder stage and downsampling reduces the spatial size, progressively compressing the image from 4 × 256 × 256 to 64 × 64 × 64. It then enters the decoder stage, where upsampling increases the spatial size while reducing the channel dimensions, expanding the image back from 64 × 64 × 64 to 4 × 256 × 256. During this process, skip connections link the decoder to the corresponding encoder outputs at the same layer, enabling the fusion of features across multiple layers. Through this module, the denoised image is gradually restored.
Downsampling is performed by a 2 × 2 convolution with a stride of 2, which halves the spatial size of the image and doubles the number of channels. Upsampling uses a 1 × 1 convolution followed by channel rearrangement (pixel shuffle), which doubles the spatial size and halves the number of channels, as in the sketch above.

3.2.3. Multiscale Residual Block

The blocks reused in the encoder and decoder (Figure 4) are Multiscale Residual Blocks (MSRBs). The MSRB operates as follows: first, the feature map is divided into four sections along the channel dimension, and each section passes through its own convolution and activation function. The outputs of the four sections are then merged, and the merged output is added to a residual of the feature map taken before the split, as illustrated in Figure 5. Different convolutional kernel sizes are employed in the MSRB to establish distinct receptive fields for the input feature map, so that both local and global information can be captured. The locally and globally processed information from the different scales is then fused to generate a new feature map with the same number of channels as the original. Finally, the new feature map is connected to the residual of the original feature map to prevent vanishing gradients. A sketch follows.
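The sketch below implements this structure. The four-way channel split, per-branch convolution and activation, merge, and outer residual follow the description above; the specific kernel sizes (1, 3, 5, 7), the GELU activation, and the 1 × 1 fusion convolution are assumptions, since the paper does not list exact values.

```python
import torch
import torch.nn as nn

class MSRB(nn.Module):
    """Multiscale residual block sketch (kernel sizes and activation assumed)."""
    def __init__(self, ch):
        super().__init__()
        assert ch % 4 == 0, "channel count must be divisible by 4"
        g = ch // 4
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(g, g, k, padding=k // 2), nn.GELU())
            for k in (1, 3, 5, 7)            # one receptive field per channel group
        ])
        self.fuse = nn.Conv2d(ch, ch, 1)     # merge groups back to the original width

    def forward(self, x):
        groups = torch.chunk(x, 4, dim=1)    # split the feature map along channels
        out = torch.cat([b(g) for b, g in zip(self.branches, groups)], dim=1)
        return x + self.fuse(out)            # residual connection prevents vanishing gradients
```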

3.3. Loss Function

To ensure that high-quality denoised images are obtained from the network, the loss function chosen in this paper combines the Charbonnier loss with a PSNR loss, as shown in Equation (1):
$$L = L_c + \alpha L_{PSNR} \tag{1}$$
where $L_c$ denotes the Charbonnier loss and $L_{PSNR}$ denotes the PSNR loss. To ensure that the two loss functions play a balanced role in adjusting the model weights during training, we introduce a balancing factor $\alpha$ that sets their relative weights; by tuning it, we control how much each loss contributes to the overall loss. After experimentation and tuning, we set its value to $5 \times 10^{-5}$.
The Charbonnier loss is an improvement on the $L_1$ loss obtained by adding a constant term. The advantage is that the loss curve becomes smoother and, thanks to the constant term, is differentiable at zero, avoiding vanishing and exploding gradients. The $L_1$ loss is shown in Equation (2) and the Charbonnier loss in Equation (3).
$$L_1(y, f(x)) = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - f(x_i)\right| \tag{2}$$
$$L_c(y, f(x)) = \frac{1}{n}\sum_{i=1}^{n}\sqrt{\left(y_i - f(x_i)\right)^2 + \epsilon^2} \tag{3}$$
In Equations (2) and (3), $L(y, f(x))$ measures the difference between the predicted value $f(x)$ and the true value $y$; $n$ denotes the number of samples, $x_i$ the $i$-th noisy image, $f(x_i)$ the $i$-th denoised image, and $y_i$ the $i$-th ground-truth image. In Equation (3), the added constant term is $\epsilon$, whose value is $10^{-3}$.
$L_{PSNR}$ is a loss function built on PSNR: it takes the negative of the PSNR between the denoised image and the ground-truth image. Computing it first requires the MSE, given in Equation (4):
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - f(x_i)\right)^2 \tag{4}$$
where $MSE$ denotes the mean squared error, $n$ the number of samples, $y_i$ the $i$-th ground-truth image, and $f(x_i)$ the $i$-th denoised image.
$$PSNR = 10 \log_{10}\left(\frac{255^2}{MSE}\right) \tag{5}$$
$$L_{PSNR} = -PSNR \tag{6}$$
The calculation of PSNR is given in Equation (5), and $L_{PSNR}$ is its negative, as in Equation (6). As the difference between the denoised image and the ground-truth image decreases, the PSNR increases and $L_{PSNR}$ decreases.
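Equations (1)-(6) transcribe directly into code. The minimal PyTorch sketch below assumes a peak value of 255 as in Equation (5); data normalized to [0, 1] would use peak = 1.0.

```python
import torch

def charbonnier(pred, target, eps=1e-3):
    # Equation (3): smooth L1 variant, differentiable at zero
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def psnr_loss(pred, target, peak=255.0):
    # Equations (4)-(6): negative PSNR, so minimizing the loss raises PSNR
    mse = ((pred - target) ** 2).mean()
    return -10.0 * torch.log10(peak ** 2 / mse)

def total_loss(pred, target, alpha=5e-5):
    # Equation (1): Charbonnier term plus the alpha-weighted PSNR term
    return charbonnier(pred, target) + alpha * psnr_loss(pred, target)
```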

4. Experiments

4.1. Implementation Details

To validate the efficacy of the proposed method in this study, we conducted performance testing on two benchmark datasets: SIDD and DND. The generated results were evaluated through both qualitative and quantitative assessments. Objective evaluation criteria included metrics such as PSNR and SSIM. Additionally, we compared our method with recent image-denoising approaches in the literature.
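For reference, the two metrics can be computed with scikit-image; the short sketch below is a generic illustration rather than the paper's evaluation code, and `data_range` should match the data scale (255 for 8-bit images, 1.0 for normalized data).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(clean: np.ndarray, denoised: np.ndarray, data_range: float = 255.0):
    """Return (PSNR, SSIM) for one ground-truth/denoised image pair."""
    psnr = peak_signal_noise_ratio(clean, denoised, data_range=data_range)
    ssim = structural_similarity(clean, denoised, data_range=data_range)
    return psnr, ssim
```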
During the experiments, the following training parameters were employed. The input channels were set to 4, and the encoder and decoder consisted of blocks with a progressive count of [2, 3, 4, 5] from top to bottom; the middle layer contained 6 blocks. The model used the Adam optimizer with default parameters, together with the CosineAnnealingLR learning rate schedule and an initial learning rate of $3 \times 10^{-4}$. The network was implemented in the PyTorch framework and trained on an NVIDIA GeForce RTX 3080 GPU. A sketch of this setup follows.
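The sketch below wires these settings together, reusing `BlockwiseUNetSketch` and `total_loss` from the earlier sketches; `train_loader` and the epoch count are placeholders, as neither is specified above.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

model = BlockwiseUNetSketch()                  # architecture sketch from Section 3.2
optimizer = Adam(model.parameters(), lr=3e-4)  # default betas; initial LR from the paper
total_epochs = 100                             # placeholder: epoch count is not reported
scheduler = CosineAnnealingLR(optimizer, T_max=total_epochs)

for epoch in range(total_epochs):
    for noisy, clean in train_loader:          # assumed loader of packed 4-channel pairs
        optimizer.zero_grad()
        loss = total_loss(model(noisy), clean)   # combined loss from Section 3.3
        loss.backward()
        optimizer.step()
    scheduler.step()                           # cosine-annealed learning rate per epoch
```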

4.2. Data Augmentation

During preprocessing, random rotation and horizontal or vertical flipping were used for data augmentation. In addition, this paper draws on the idea of hard example mining [39]. During validation, images whose PSNR does not reach expectations are iteratively retrained with a lower learning rate; if the PSNR consistently falls short, additional tuning schemes are applied. In short, our hard example retraining addresses images with deviating PSNR through fine-tuning, as sketched below.
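One possible form of this fine-tuning loop is sketched here. The PSNR threshold, fine-tuning learning rate, and step count are illustrative placeholders, since the paper states only that a lower learning rate is used; `psnr_loss` and `total_loss` are the Section 3.3 sketches.

```python
import torch

def psnr(pred, target):
    return -psnr_loss(pred, target)              # PSNR from the Section 3.3 sketch

def retrain_hard_examples(model, val_pairs, psnr_target=50.0, ft_lr=3e-5, steps=50):
    """Fine-tune on validation images whose PSNR falls short of expectations."""
    opt = torch.optim.Adam(model.parameters(), lr=ft_lr)  # lower LR than training
    for noisy, clean in val_pairs:
        with torch.no_grad():
            if psnr(model(noisy), clean) >= psnr_target:
                continue                         # image already meets expectations
        for _ in range(steps):                   # iterate on this hard example
            opt.zero_grad()
            loss = total_loss(model(noisy), clean)
            loss.backward()
            opt.step()
```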

4.3. Experimental Results

To validate the superiority and robustness of the proposed method, this study conducted ablation experiments on modules such as Bayer array standardization, hard example retraining, and multi-scale residual modules within the network architecture to assess the effectiveness of each module.
Module A refers to Bayer array unification, Module B corresponds to hard example retraining, and Module C represents the multi-scale residual module. These modules were individually evaluated and combined. The training was performed on the SIDD benchmark, ensuring consistent hyperparameter settings and using PSNR and SSIM as objective evaluation metrics. Table 1 presents the quantitative results of different modules on the SIDD benchmark.
By comparing the PSNR and SSIM metrics of different modules and combination methods, it can be seen that all three modules can bring about improvements in PSNR and SSIM metrics.
Firstly, the combination of Module A and Module B, denoted as A + B, demonstrates improved performance compared to using Module A alone or Module B alone. This improvement can be attributed to the synergistic effect of both modules. Module A focuses on Bayer array standardization, which enhances the consistency and quality of the raw color information. Module B, on the other hand, addresses the challenging examples through retraining, allowing the network to better handle complex noise patterns. The complementary nature of these modules results in a significant boost in denoising performance, as indicated by the higher PSNR and SSIM values achieved by the A + B combination.
Furthermore, the addition of the multi-scale residual module, Module C, further enhances the denoising capabilities. This module introduces the concept of multi-scale processing, allowing the network to capture and leverage information at different scales. By incorporating multi-scale residual connections, the model can effectively exploit both local and global contextual information, leading to improved denoising accuracy. The superiority of the A + B + C combination over the A + B combination underscores the importance of multi-scale processing in preserving image details and reducing noise artifacts.
In conclusion, the experimental results clearly demonstrate the advantages of using multiple modules in the proposed method. The combination of Module A, Module B, and Module C yields superior denoising performance compared to individual module usage. This comprehensive approach effectively addresses the challenges posed by Bayer array non-unification, hard examples, and multi-scale information processing, resulting in enhanced image quality and noise reduction.

4.4. Analysis of Results

In our study, we compared the performance of several image denoising methods, namely DnCNN [40], MWCNN [41], VDNet [42], DANet [43], PRIDNet [44], RDUNet [45] and CBDNet [46]. These methods were specifically selected as they represent the prominent approaches in the field of image denoising in recent years. By evaluating the performance of these methods on both the SIDD benchmark and DND benchmark, we can effectively assess the efficacy of our proposed method in comparison to these established denoising techniques.
The comparative results presented in Table 2 demonstrate the superior performance of our proposed method compared to other mainstream denoising techniques, both in quantitative metrics and visual quality. Our method achieves higher PSNR and SSIM and lower MSE, indicating enhanced denoising accuracy and superior preservation of image details. Moreover, qualitative assessments indicate that the proposed method generates visually appealing denoised images with reduced noise artifacts and improved overall image quality.
Table 3 presents the running time of different denoising methods on the SIDD benchmark when processing each pair of samples. Our method demonstrates superior performance in terms of evaluation metrics compared to other methods. However, it is worth noting that due to block reuse and the inclusion of numerous layers and parameters, our method exhibits higher model complexity, resulting in longer computational runtime compared to some other approaches. DnCNN, CBDNet, DANet, and VDNet benefit from having fewer parameters, resulting in shorter running time. On the other hand, RDUNet, PRIDNet, and MWCNN methods are more complex and require longer running time.
Figure 6 presents a subjective visual comparison between the proposed method and other comparative methods on the DND benchmark. It is evident from the visual inspection that the proposed method outperforms the other denoising methods in terms of subjective visual quality.
In Figure 6, we magnify the image details within the red box and place them in the top left corner. The denoised images obtained using the proposed method exhibit remarkable improvements in terms of noise reduction and preservation of fine image details. The images appear visually cleaner, with reduced noise artifacts and enhanced clarity. The proposed method effectively restores the natural appearance of the images, producing visually pleasing results with improved texture and sharpness.
In contrast, denoised images obtained using the comparative methods often exhibit residual noise, blurring, and loss of details. From Figure 6, it can be observed that the DnCNN image exhibits a color cast, while the MWCNN image appears excessively smoothed, reducing clarity and sharpness. DANet, PRIDNet, and RDUNet show deficiencies in handling edge textures, while VDNet still retains some noise, failing to achieve complete denoising. These methods struggle to handle complex noise patterns effectively and may introduce unnecessary artifacts or smoothing effects, leading to a decrease in visual quality.
These findings highlight the importance of subjective visual evaluation in assessing the quality of denoised images. The subjective visual comparison further strengthens the claim that the proposed method surpasses the comparative methods, not only in terms of quantitative metrics but also in terms of perceptually pleasing denoised results.
Figure 7 illustrates the coordinate representations of our proposed method and other comparative techniques on the SIDD and DND benchmarks. The graph on the left corresponds to the SIDD benchmark, while the one on the right represents the DND benchmark. The horizontal axis indicates PSNR, the vertical axis represents SSIM, and the size of each circle corresponds to the model’s parameter size. The coordinate graph clearly demonstrates the exceptional performance of our proposed method across both benchmarks.
Notably, when examining the PSNR values along the horizontal axis, it is evident that our proposed method outperforms the other approaches, as indicated by the rightmost positioning of our data points. This reflects the superior precision and accuracy of our method in reducing noise. Moreover, the vertical axis representing SSIM also showcases the excellent performance of our approach: the higher SSIM values signify that our method effectively preserves the structure and content of the images, achieving successful noise reduction while maintaining image quality.
Furthermore, the circle sizes reflect the model parameter counts. Although not the smallest, our approach achieves excellent denoising results while maintaining a comparatively small model size. This implies that our method can reduce computational resource requirements in practical applications, thereby enhancing the algorithm's utility and efficiency.
In conclusion, the coordinate graph clearly demonstrates the outstanding performance of our proposed method on the SIDD and DND benchmarks. It outperforms other comparative methods in terms of PSNR and SSIM metrics while achieving remarkable denoising results with relatively smaller model parameters.

5. Conclusions

This paper presents a novel approach for denoising RAW images, employing a U-shaped convolutional neural network model named BlockwiseUNet. The methodology incorporates data augmentation and Bayer array unification to preprocess the dataset. Additionally, the U-shaped network architecture and the multiscale residual fusion module are introduced to enhance the denoising capabilities of the network. Comparative experiments and ablation studies convincingly demonstrate the efficacy of the proposed method in removing noise from RAW images while preserving crucial edge information. Future work will focus on runtime, with an emphasis on reducing model complexity while maintaining denoising performance in order to achieve faster processing.

Author Contributions

Conceptualization, J.X. and Y.L.; methodology, J.X. and Y.L.; software, Y.L.; validation, Y.L. and M.F.; formal analysis, Y.L.; investigation, J.X.; resources, J.X.; data curation, Y.L. and M.F.; writing—original draft preparation, Y.L.; writing—review and editing, J.X.; visualization, Y.L.; supervision, J.X.; project administration, J.X.; funding acquisition, M.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Laboratory Project of Optoelectronic Information Control and Security Technology: 2021JCJQLB055011.

Data Availability Statement

The dataset used in this article can be found here: SIDD and DND.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Zhang, Y.; Qin, H.; Wang, X.; Li, H. Rethinking noise synthesis and modeling in raw denoising. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 4593–4601.
2. Buades, A.; Coll, B.; Morel, J.M. Non-local means denoising. Image Process. Online 2011, 1, 208–212.
3. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image denoising with block-matching and 3D filtering. In Image Processing: Algorithms and Systems, Neural Networks, and Machine Learning; SPIE: Bellingham, WA, USA, 2006; Volume 6064, pp. 354–365.
4. Mairal, J.; Bach, F.; Ponce, J.; Sapiro, G. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009; pp. 689–696.
5. Gu, S.; Zhang, L.; Zuo, W.; Feng, X. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2862–2869.
6. Malfait, M.; Roose, D. Wavelet-based image denoising using a Markov random field a priori model. IEEE Trans. Image Process. 1997, 6, 549–565.
7. Cao, Y.; Luo, Y.; Yang, S. Image denoising based on hierarchical Markov random field. Pattern Recognit. Lett. 2011, 32, 368–374.
8. Li, Y.; Li, C.; Li, X.; Wang, K.; Rahaman, M.M.; Sun, C.; Chen, H.; Wu, X.; Zhang, H.; Wang, Q. A comprehensive review of Markov random field and conditional random field approaches in pathology image analysis. Arch. Comput. Methods Eng. 2022, 29, 609–639.
9. Zanella, R.; Boccacci, P.; Zanni, L.; Bertero, M. Efficient gradient projection methods for edge-preserving removal of Poisson noise. Inverse Probl. 2009, 25, 045010.
10. Zeng, N.; Zhang, H.; Li, Y.; Liang, J.; Dobaie, A.M. Denoising and deblurring gold immunochromatographic strip images via gradient projection algorithms. Neurocomputing 2017, 247, 165–172.
11. Beck, A.; Teboulle, M. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 2009, 18, 2419–2434.
12. Chan, T.F.; Chen, K. An optimization-based multilevel algorithm for total variation image denoising. Multiscale Model. Simul. 2006, 5, 615–645.
13. Frohn-Schauf, C.; Henn, S.; Witsch, K. Nonlinear multigrid methods for total variation image denoising. Comput. Vis. Sci. 2004, 7, 199–206.
14. Burger, H.C.; Schuler, C.J.; Harmeling, S. Image denoising: Can plain neural networks compete with BM3D? In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2392–2399.
15. Fan, L.; Zhang, F.; Fan, H.; Zhang, C. Brief review of image denoising techniques. Vis. Comput. Ind. Biomed. Art 2019, 2, 1–12.
16. Bajaj, K.; Singh, D.K.; Ansari, M.A. Autoencoders based deep learner for image denoising. Procedia Comput. Sci. 2020, 171, 1535–1541.
17. Gondara, L. Medical image denoising using convolutional denoising autoencoders. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), Barcelona, Spain, 12–16 December 2016; pp. 241–246.
18. Cho, K. Boltzmann machines and denoising autoencoders for image denoising. arXiv 2013, arXiv:1301.3468.
19. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
20. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
21. Chen, X.; Hsieh, C.J.; Gong, B. When vision transformers outperform ResNets without pre-training or strong data augmentations. arXiv 2021, arXiv:2106.01548.
22. Steiner, A.; Kolesnikov, A.; Zhai, X.; Wightman, R.; Uszkoreit, J.; Beyer, L. How to train your ViT? Data, augmentation, and regularization in vision transformers. arXiv 2021, arXiv:2106.10270.
23. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image restoration using Swin Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1833–1844.
24. Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A general U-shaped transformer for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17683–17693.
25. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739.
26. Fan, C.M.; Liu, T.J.; Liu, K.H. SUNet: Swin transformer UNet for image denoising. In Proceedings of the 2022 IEEE International Symposium on Circuits and Systems (ISCAS), Austin, TX, USA, 28 May–1 June 2022; pp. 2333–2337.
27. Chen, L.; Chu, X.; Zhang, X.; Sun, J. Simple baselines for image restoration. In Proceedings of the Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Part VII; Springer Nature: Cham, Switzerland, 2022; pp. 17–33.
28. Chen, L.; Lu, X.; Zhang, J.; Chu, X.; Chen, C. HINet: Half instance normalization network for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 182–192.
29. Mou, C.; Zhang, J.; Fan, X.; Liu, H.; Wang, R. COLA-Net: Collaborative attention network for image restoration. IEEE Trans. Multimed. 2021, 24, 1366–1377.
30. Konnik, M.; Welsh, J. High-level numerical simulations of noise in CCD and CMOS photosensors: Review and tutorial. arXiv 2014, arXiv:1412.4031.
31. Wei, K.; Fu, Y.; Yang, J.; Huang, H. A physics-based noise formation model for extreme low-light raw denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2758–2767.
32. Brooks, T.; Mildenhall, B.; Xue, T.; Chen, J.; Sharlet, D.; Barron, J.T. Unprocessing images for learned raw denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11036–11045.
33. Abdelhamed, A.; Lin, S.; Brown, M.S. A high-quality denoising dataset for smartphone cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1692–1700.
34. Plotz, T.; Roth, S. Benchmarking denoising algorithms with real photographs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1586–1595.
35. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Fast and accurate image super-resolution with deep Laplacian pyramid networks. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2599–2613.
36. Wang, Y.; Huang, H.; Xu, Q.; Liu, J.; Liu, Y.; Wang, J. Practical deep raw image denoising on mobile devices. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 1–16.
37. Feng, H.; Wang, L.; Wang, Y.; Huang, H. Learnability enhancement for low-light raw denoising: Where paired real data meets noise modeling. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 1436–1444.
38. Zhang, F.; Xu, B.; Li, Z.; Liu, X.; Lu, Q.; Gao, C.; Sang, N. Towards general low-light raw noise synthesis and modeling. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 10820–10830.
39. Shrivastava, A.; Gupta, A.; Girshick, R. Training region-based object detectors with online hard example mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 761–769.
40. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155.
41. Liu, P.; Zhang, H.; Zhang, K.; Lin, L.; Zuo, W. Multi-level wavelet-CNN for image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 773–782.
42. Yue, Z.; Yong, H.; Zhao, Q.; Meng, D.; Zhang, L. Variational denoising network: Towards blind noise modeling and removal. Adv. Neural Inf. Process. Syst. 2019, 32, 1690–1701.
43. Yue, Z.; Zhao, Q.; Zhang, L.; Meng, D. Dual adversarial network: Towards real-world noise removal and noise generation. In Proceedings of the Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part X; Springer International Publishing: New York, NY, USA, 2020; pp. 41–58.
44. Zhao, Y.; Jiang, Z.; Men, A.; Ju, G. Pyramid real image denoising network. In Proceedings of the 2019 IEEE Visual Communications and Image Processing (VCIP), Sydney, Australia, 1–4 December 2019; pp. 1–4.
45. Gurrola-Ramos, J.; Dalmau, O.; Alarcón, T.E. A residual dense U-Net neural network for image denoising. IEEE Access 2021, 9, 31742–31754.
46. Guo, S.; Yan, Z.; Zhang, K.; Zuo, W.; Zhang, L. Toward convolutional blind denoising of real photographs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1712–1722.
Figure 1. Imaging noise generating processes.
Figure 2. Bayer array unification by trimming.
Figure 3. Overall network structure.
Figure 4. Reused blocks in encoder and decoder.
Figure 5. Multiscale residual block.
Figure 6. Qualitative evaluation of different denoising methods on the DND benchmark.
Figure 7. Coordinate representation of different denoising methods on the SIDD and DND benchmarks.
Table 1. Quantitative results on the SIDD benchmark with different modules.

Method       PSNR (dB)    SSIM (%)
A            50.82        99.15
B            50.87        99.16
C            50.74        99.16
A + B        50.92        99.17
A + C        51.08        99.21
B + C        51.16        99.23
A + B + C    51.25        99.22

Note: In the original article, the optimal value in each column is shown in red and the second-best is underlined.
Table 2. The quantitative results of different denoising methods on the SIDD benchmark and DND benchmark.

              ------------- SIDD -------------    -------------- DND --------------
Method        PSNR (dB)  SSIM (%)  MSE (×10⁻⁵)    PSNR (dB)  SSIM (%)  MSE (×10⁻⁵)
DnCNN         38.97      95.17     12.68          37.62      93.71     17.30
MWCNN         47.79      95.45     1.66           44.38      94.24     3.65
RDUNet        50.08      98.08     0.98           47.83      97.80     1.65
CBDNet        48.75      98.87     1.33           47.49      97.81     1.78
DANet         48.98      99.14     1.26           46.99      97.45     2.00
PRIDNet       48.48      98.06     1.42           45.69      96.86     2.70
VDNet         50.97      99.19     0.80           48.17      97.97     1.52
Ours          51.25      99.22     0.75           48.15      98.19     1.53

Note: In the original article, the optimal value in each column is shown in red and the second-best is underlined.
Table 3. Running time of different denoising methods on the SIDD benchmark.

Method    DnCNN   MWCNN   CBDNet   RDUNet   PRIDNet   DANet   VDNet   Ours
Time (s)  2.17    47.73   3.38     26.99    32.65     3.84    2.65    31.85

Note: In the original article, the shortest time is shown in red and the next shortest is underlined.

