Article

SwinDenoising: A Local and Global Feature Fusion Algorithm for Infrared Image Denoising

1 Guangdong Provincial Key Laboratory of Cyber-Physical System, School of Automation, Guangdong University of Technology, Guangzhou 510006, China
2 School of Computer, Guangdong University of Technology, Guangzhou 510006, China
3 School of Physics and Electronic Engineering, Hanshan Normal University, Chaozhou 521041, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(19), 2968; https://doi.org/10.3390/math12192968
Submission received: 24 August 2024 / Revised: 17 September 2024 / Accepted: 22 September 2024 / Published: 24 September 2024
(This article belongs to the Special Issue Deep Learning and Adaptive Control, 3rd Edition)

Abstract: Infrared image denoising is a critical task in various applications, yet existing methods often struggle with preserving fine details and managing complex noise patterns, particularly under high noise levels. To address these limitations, this paper proposes a novel denoising method based on the Swin Transformer architecture, named SwinDenoising. This method leverages the powerful feature extraction capabilities of Swin Transformers to capture both local and global image features, thereby enhancing the denoising process. The proposed SwinDenoising method was tested on the FLIR and KAIST infrared image datasets, where it demonstrated superior performance compared to state-of-the-art methods. Specifically, SwinDenoising achieved a PSNR improvement of up to 2.5 dB and an SSIM increase of 0.04 under high levels of Gaussian noise (50 dB), and a PSNR increase of 2.0 dB with an SSIM improvement of 0.03 under Poisson noise (λ = 100). These results highlight the method’s effectiveness in maintaining image quality while significantly reducing noise, making it a robust solution for infrared image denoising.

1. Introduction

As a detection technology that extends the human visual system into the IR band, IR sensors output IR images that reflect the temperature differences between objects in the scene [1]. IR sensors use passive imaging and operate around the clock. Therefore, they are widely used in scientific research [2], military detection [3], fire monitoring [4], fault diagnosis [5], medical analysis [6,7], and remote sensing [8]. In general, however, IR images suffer from low resolution, blurred edges, loss of detail, low contrast, and background noise due to factors including the principles of IR imaging technology, distortion from the external environment, and thermal motion of the sensor itself. To keep noise from restricting the development of IR imaging technology, IR denoising is particularly important.
In infrared image denoising, the goal is to enhance image quality by suppressing noise while preserving essential features such as edges, textures, and structural information. Mathematically, infrared denoising can be described as a feature extraction problem, where the key task is to separate noise components from the underlying image features. Let an observed noisy infrared image be represented as I = S + N, where S is the clean image and N is the noise. The objective of denoising is to estimate S by extracting relevant features that reflect the true structure of the scene while minimizing the impact of N.
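The additive observation model above can be sketched in a few lines of NumPy; the frame size and noise level sigma are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy clean infrared frame S with values in [0, 1]
S = rng.random((64, 64))

sigma = 0.1                                 # illustrative noise level
N = rng.normal(0.0, sigma, size=S.shape)    # additive noise
I = S + N                                   # observed noisy image, I = S + N

# A denoiser produces an estimate S_hat of S from I alone; with the true
# noise known, subtracting it recovers S (up to floating point), which is
# the ideal any estimator tries to approach.
S_hat = I - N
mse = float(np.mean((S_hat - S) ** 2))      # ~0 for this oracle estimate
```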
Over the past few decades, numerous methods have been proposed to tackle infrared image denoising. These methods can be broadly categorized into two groups: non-deep-learning (traditional) methods and deep-learning-based methods.
Non-Deep-Learning Methods: Traditional approaches include filtering techniques and transform-based methods. For example, Yan et al. [9] proposed a denoising method using the improved Anscombe transformation in the wavelet domain to suppress heavy Poisson noise and preserve edges in low-light conditions. Chen et al. [10] introduced a variance-stabilizing transform (VST) with a dual-domain filter (DDF) for Poisson noise suppression. Other methods, such as the algorithm based on feature analysis in the shearlet domain by Jiang [11] and adaptive contrast enhancement by Liu et al. [12], aim to enhance image features while reducing noise. However, these traditional methods often struggle to effectively balance noise reduction and detail preservation, resulting in oversmoothing and the loss of fine details.
Deep-Learning-Based Methods: More recently, deep learning has been employed to handle the complexity of infrared images. Li et al. [13] proposed a deep learning method to address the problem of noise interference in infrared thermal imaging. Zhang et al. [14] introduced a deep reinforcement learning model for star target extraction and denoising in infrared star images. Despite their advanced capabilities, deep-learning-based methods face challenges in managing diverse noise patterns and often rely on large, annotated datasets for training, which may not always be available.
To address the limitations of existing denoising methods, this paper introduces a novel approach named SwinDenoising, which is based on the Swin Transformer architecture. Unlike traditional methods that often struggle with preserving fine details while effectively reducing noise, SwinDenoising leverages the powerful hierarchical structure of the Swin Transformer to capture both local and global features within an image. This capability allows the method to balance noise suppression with detail preservation more effectively than conventional techniques.
In recent years, the Swin Transformer architecture has gained prominence for its ability to capture both local and global image features effectively. The Swin Transformer [15] employs a hierarchical structure with shifted windows, allowing for more efficient computation while maintaining fine-grained feature extraction. Its multi-head self-attention mechanism enables it to model long-range dependencies in an image, making it particularly suitable for tasks such as image denoising. However, while the Swin Transformer provides a robust framework for feature extraction, directly applying it to infrared image denoising presents several challenges. Infrared images often contain complex noise patterns that require a tailored approach for effective noise suppression and detail preservation. Our method, SwinDenoising, builds upon the Swin Transformer architecture by introducing a novel fusion of local and global features specifically designed for infrared image denoising. Unlike the standard Swin Transformer, our approach incorporates an additional image restoration module to adaptively combine the extracted features, thereby enhancing its denoising capabilities under various noise conditions. This study aims to address these challenges by leveraging the Swin Transformer’s powerful feature extraction capabilities while introducing modifications that tailor it to the specific needs of infrared image denoising. Detailed discussions of related methods and further analysis of our approach’s distinctiveness are provided in Sections 2 and 3. The proposed SwinDenoising method significantly improves upon the shortcomings of traditional methods by incorporating a global–local feature fusion strategy through the multi-head self-attention mechanism. This approach enhances the model’s ability to handle complex and diverse noise patterns typical of infrared imagery.
Moreover, SwinDenoising is designed to be robust across various noise levels, maintaining high performance in terms of PSNR and SSIM, even in challenging scenarios where traditional methods tend to oversmooth the image and lose critical structural information. Through extensive experiments on infrared image datasets, SwinDenoising has demonstrated superior denoising performance, showcasing its effectiveness in maintaining image quality while significantly reducing noise. This makes it a more reliable and advanced solution for infrared image denoising compared to existing methods. Figure 1 shows denoising results under 50 dB Gaussian noise, which is difficult to denoise; the proposed method better preserves image details while removing the noise. The main contributions of this study are as follows:
  • Hierarchical Feature Extraction: SwinDenoising introduces a hierarchical approach to feature extraction, which enables the model to capture image features at multiple scales. This is particularly beneficial for infrared images, where both small-scale details and broader structures need to be preserved.
  • Global–Local Feature Fusion: The method integrates both local and global features through the Swin Transformer’s multi-head self-attention mechanism. This fusion is critical for enhancing the model’s ability to handle diverse noise patterns, ensuring that noise is effectively reduced without compromising the integrity of the image’s structural details.
  • Improved Robustness in Denoising: SwinDenoising demonstrates significant improvements in robustness, particularly under high levels of Gaussian and Poisson noise. The method’s ability to maintain high PSNR and SSIM values across various noise levels underscores its effectiveness in real-world infrared image denoising scenarios.

2. Related Work

The field of infrared image denoising has seen significant advancements, with research efforts focusing on various methods to enhance the quality and applicability of denoised images. The following is a categorized overview of the most relevant work in this area.

2.1. Traditional and Hybrid Denoising Techniques

Earlier research primarily targeted the transformation of multiplicative noise into additive noise to simplify the removal process [16]. This foundational approach paved the way for more sophisticated methods, including the use of nonparametric models with the Expectation–Maximization (EM) algorithm. Song and Huang [17] proposed a framework that simultaneously removes stripe noise and denoises images by decomposing the problem into two subproblems: computing the conditional expectation of the true image and estimating the column mean of the residual image.
Building upon these traditional methods, recent studies have explored hybrid approaches. For example, He et al. [18] introduced a hybrid denoiser that combines frequency-domain and spatial-domain advantages to handle multiple types of noise in thermal infrared images. This approach demonstrates significant improvements in stripe noise removal by incorporating spatial and frequency domain priors.

2.2. Deep-Learning-Based Methods

Deep learning has revolutionized the field, particularly through convolutional neural networks (CNNs) and their variants. Tang and Jian [19] proposed a denoising CNN (DeDn-CNN) with deformable convolution modules for thermal fault diagnosis in complex electrical equipment. This method was enhanced with an improved RetinaNet for detecting densely packed or tilted electrical components, illustrating the versatility and effectiveness of deep learning models.
In the context of explainable AI methods that perform image denoising through feature extraction, some studies have proposed models that utilize feature guidance to enhance denoising performance. For example, in [20], a feature-extraction-guided approach was employed to effectively remove noise while preserving important image details. This method demonstrates the effectiveness of combining feature extraction with deep learning for robust denoising, highlighting the importance of feature-based techniques in various imaging modalities.
Other studies have targeted joint tasks to prevent error accumulation. For instance, Li et al. [21] developed a model that jointly performs denoising and demosaicking, ensuring that error propagation between these processes is minimized. Moreover, Shi et al. [22] explored neural networks that treat small targets as noise, effectively leveraging this concept for more accurate detection in infrared images.
The integration of multimodal data has become a prominent trend in recent years. Li et al. [23] utilized a combination of magnetic and infrared data to enhance diagnostic accuracy, showcasing the benefits of leveraging multiple data sources. Meanwhile, deep reinforcement learning has been applied for more precise restoration tasks, as demonstrated by Zhang et al. [14], who introduced a model capable of adapting to various noise levels and types in infrared images.
Optimization algorithms have also been employed to automate parameter selection and improve model performance. Liu et al. [24] proposed the use of fractional-order calculus in an optimization algorithm, which significantly enhances the denoising performance by automatically tuning critical parameters.
Practical applications often involve challenges such as atmospheric disturbances, which can degrade image quality. Cao et al. [25] addressed this by developing a successive approximation algorithm tailored to real-world infrared imaging conditions, effectively mitigating the impact of environmental factors.
Additionally, pre-processing techniques have been studied to improve the fusion of infrared images with visible images. Budhiraja et al. [26] explored the effects of various pre-processing steps on the fusion quality, providing valuable insights for applications requiring image fusion.
Innovations in spectral imaging have also contributed to the advancement of infrared image denoising. Yang et al. [27] introduced a mid-wave infrared snapshot compressive spectral imager (MWIR-SCSI) that uses deep infrared denoising prior to image reconstruction. This system addresses challenges such as inconsistent noise levels and data loss, providing a robust solution for high-dimensional data.
While existing infrared image denoising methods have achieved notable progress, they often fall short in effectively addressing complex noise with multiple distributions. Such limitations are particularly evident when dealing with real-world scenarios where noise characteristics vary widely across spatial and temporal dimensions. The method proposed in this study leverages the global and local feature modeling capabilities of transformers, allowing for a more nuanced and comprehensive approach to denoising. By integrating these strengths, our approach has demonstrated superior performance in reducing complex noise, thereby improving the overall quality of infrared images. This advancement not only enhances the practical applicability of denoised infrared images but also sets a new benchmark for future research in this field.

3. Method

The IR denoising method includes a multi-head self-attention neural network comprising three modules: a local feature extraction module, a global feature extraction module, and an IR image denoising restoration module, as shown in Figure 2. The local feature extraction module feeds the IR image I_0 into the network F_local and extracts the local feature information I_local from the IR image. The global feature extraction module F_global passes the extracted local features to the multi-head self-attention network; this module extracts the global features of the image using multi-head self-attention (MSA), yielding I_global. The image restoration module fuses the local features I_local and the global features I_global to obtain I_fusion, which is processed by the neural network F_rec, restoring the encoded feature information to a noise-free IR image I_rec. Together, these three modules restore the image collected by the IR thermal imaging device to a noise-free image, thereby removing the noise.

3.1. The Local Feature Extraction Module

The local feature extraction module, denoted F_local, consists of two convolutional layers with a kernel size of K = 3. It maps the IR image I_0 ∈ R^(H×W×C_in), with C_in input channels, to a feature tensor I_local ∈ R^(H×W×C) with C = 96 channels, where H and W are the image height and width. The mathematical model is expressed as Equation (1):
I_local = F_local(I_0)
Through local feature extraction, in the process of local feature encoding, the image signal information will be strengthened, and the proportion of noise information will be reduced to achieve the purpose of preliminary noise reduction.
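As a rough illustration of this module, the sketch below implements two “same”-padded K = 3 convolutions in plain NumPy, lifting a toy single-channel frame into a 96-channel tensor. The input channel count, frame size, and random weights are assumptions for the example; a real implementation would use a deep learning framework rather than explicit loops:

```python
import numpy as np

def conv2d(x, w, b):
    """'Same'-padded 2D convolution: x is (H, W, Cin), w is (K, K, Cin, Cout)."""
    K = w.shape[0]
    p = K // 2
    H, W, _ = x.shape
    Cout = w.shape[-1]
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((H, W, Cout))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + K, j:j + K, :]              # (K, K, Cin)
            out[i, j] = np.tensordot(patch, w, axes=3) + b
    return out

rng = np.random.default_rng(0)
C_in, C = 1, 96                        # C = 96 as in the paper; C_in = 1 is an assumption
I0 = rng.random((16, 16, C_in))        # toy 16x16 frame instead of a full image
w1 = rng.normal(0, 0.1, (3, 3, C_in, C)); b1 = np.zeros(C)
w2 = rng.normal(0, 0.1, (3, 3, C, C));    b2 = np.zeros(C)

# F_local: two K = 3 convolutions (with a ReLU between them, an assumption)
# lifting the image into a 96-channel feature tensor I_local
I_local = conv2d(np.maximum(conv2d(I0, w1, b1), 0.0), w2, b2)
```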

3.2. Global Feature Extraction Module

The feature tensor I_local obtained by local feature extraction is passed to the global feature extraction module F_global for processing. The global feature processing module comprises four multi-head self-attention blocks with the same structure, as shown in Figure 3. The calculation process of each multi-head self-attention block is shown in Figure 4; window multi-head self-attention (W-MSA) and sliding-window multi-head self-attention (SW-MSA) strategies are employed to reduce computational complexity. The global feature I_global ∈ R^(H×W×C) is extracted by the global feature extraction module, and its mathematical model is as Equation (2):
I_global = F_global(I_local)
In Equation (2), F_global represents the global feature extraction module, which includes N multi-head self-attention blocks (MSABs) and a convolution operation with a convolution kernel size of K. The features I_1, I_2, …, I_N extracted by each block and the global feature I_global are calculated by the MSABs. The mathematical model is expressed as Equation (3):
I_i = F_MSAB(I_{i-1}), i = 1, 2, …, N
I_global = F_CONV(I_N)
where F_MSAB represents the multi-head self-attention calculation module, and F_CONV represents the convolution operation. To reduce the amount of computation, the method adopts a sliding-window multi-head self-attention mechanism. The attention mechanism first divides the input I_local ∈ R^(H×W×C) into non-overlapping windows of size M × M, so that the feature size becomes (HW/M²) × M² × C, where HW/M² is the total number of windows. Standard multi-head self-attention is then computed within each window. For a feature X ∈ R^(M²×C) in a window, its query, key, and value are computed as Equation (4):
Q = X P_Q, K = X P_K, V = X P_V
In the above equation, P_Q, P_K, and P_V are the mapping matrices in the window, and different windows share the same mapping matrices. After this transformation, Q, K, V ∈ R^(M²×d), and the self-attention is calculated as Equation (5):
Attention(Q, K, V) = SoftMax(QKᵀ/√d + B)V
where B represents a learnable positional encoding. Next, the features are further transformed using a multilayer perceptron. The mathematical model is as Equation (6):
X = W-MSA(LN(X)) + X
X = MLP(LN(X)) + X
where W-MSA represents the multi-head self-attention calculation within the window, MLP represents the fully connected multilayer perceptron, and LN denotes the added Layer Normalization. Then, the sliding-window self-attention calculation is performed as Equation (7):
X = SW-MSA(LN(X)) + X
X = MLP(LN(X)) + X
where SW-MSA represents the multi-head self-attention calculation after the sliding-window shift. The sliding-window operation can extract features across different windows; each window is shifted by (⌊M/2⌋, ⌊M/2⌋). The self-attention mechanism plays a crucial role in preserving image information by modeling long-range dependencies between pixels. Unlike traditional convolutional methods that use fixed-size kernels, the self-attention mechanism dynamically weighs the relationships between all pixels in a given window. This allows the model to capture intricate patterns and structures, which are vital for maintaining details during denoising. The use of sliding-window feature extraction further enhances this capability. By applying self-attention within shifted windows, the model effectively extracts both local and global features while considering spatial correlations. This shifting operation enables the preservation of fine details and edges that might otherwise be lost in the denoising process. Consequently, the combination of self-attention and sliding-window techniques helps maintain the integrity of the image’s structural information while reducing noise, which is particularly beneficial for complex infrared images.
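A minimal NumPy sketch of the window partition, the per-window attention of Equations (4) and (5), and the cyclic shift used by SW-MSA might look as follows. The window size M, head dimension d, and toy feature map are illustrative assumptions, and the attention mask for wrapped-around pixels after the shift is omitted:

```python
import numpy as np

def window_partition(x, M):
    """Split an (H, W, C) feature map into non-overlapping M x M windows: (H*W/M^2, M^2, C)."""
    H, W, C = x.shape
    x = x.reshape(H // M, M, W // M, M, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, M * M, C)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def window_attention(X, PQ, PK, PV, B):
    """Per-window attention as in Equations (4)-(5): X is (M^2, C); PQ/PK/PV map C -> d."""
    Q, K, V = X @ PQ, X @ PK, X @ PV
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d) + B) @ V   # (M^2, d)

rng = np.random.default_rng(0)
H = W = 8; C = 96; M = 4; d = 32          # toy sizes chosen for the sketch
x = rng.random((H, W, C))

# W-MSA: attention inside each window; all windows share the same mapping matrices
PQ, PK, PV = (rng.normal(0, 0.1, (C, d)) for _ in range(3))
B = np.zeros((M * M, M * M))              # learnable positional bias, zero-initialised here
windows = window_partition(x, M)          # (4, 16, 96)
attended = np.stack([window_attention(w, PQ, PK, PV, B) for w in windows])

# SW-MSA: cyclically shift the map by (M//2, M//2) before partitioning so that
# attention crosses the borders of the previous window grid
x_shifted = np.roll(x, shift=(-(M // 2), -(M // 2)), axis=(0, 1))
shifted_windows = window_partition(x_shifted, M)
```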

3.3. Image Recovery Module

The image restoration module contains two convolution operations that restore the local and global features into a noise-free image. The mathematical model of this process is expressed as Equation (8):
I_rec = F_rec(I_local + I_global)
where F r e c represents the image restoration module. The image restoration module is a learnable convolutional neural network, which fuses and reconstructs the feature information after the noise removal of the first two modules, restores it to image information, and further filters out the noise information.
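Equation (8)’s fuse-then-reconstruct step can be sketched as below, with a per-pixel linear map (equivalent to a 1 × 1 convolution) standing in for the module’s two learnable convolutions; the feature sizes and random values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, C_out = 16, 16, 96, 1     # C = 96 as in the paper; the rest are toy sizes

# Stand-ins for the outputs of the first two modules
I_local = rng.random((H, W, C))
I_global = rng.random((H, W, C))

# Equation (8): fuse the features by addition, then map them back to image
# space. A 1x1 convolution (per-pixel linear map) stands in for F_rec here.
W_rec = rng.normal(0, 0.1, (C, C_out))
I_rec = (I_local + I_global) @ W_rec   # (H, W, C_out)
```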

4. Experiment

4.1. Experiment Setup

The study utilized a computational platform featuring an Intel i7-10900F CPU and an Nvidia GeForce RTX 3090 GPU. Two established datasets, FLIR and KAIST, served as the cornerstone for model training and validation, offering thermal imagery for diverse environmental adaptations. The datasets are divided into training and test sets with an 8:2 ratio. The same trained model is used in the subsequent Gaussian and Poisson noise denoising experiments.
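A minimal sketch of the 8:2 split; the file names, dataset size, and seed are hypothetical placeholders for the FLIR/KAIST frames:

```python
import random

# Hypothetical file names standing in for the FLIR/KAIST frames
images = [f"frame_{i:04d}.png" for i in range(1000)]

random.seed(0)
random.shuffle(images)

split = int(0.8 * len(images))     # 8:2 training-to-test ratio from the paper
train_set, test_set = images[:split], images[split:]
```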

4.2. Experimental Results for Infrared Image Denoising under Complex Noise

We conducted a series of experiments to evaluate the performance of our infrared image denoising method under Gaussian noise levels of 15 dB, 25 dB, and 50 dB. Our method (referred to as “Ours”) is compared with several state-of-the-art techniques: Restormer [28], NAFNet [29] in its 32 and 64 width configurations, and HINet [30], versions 0.5 and 1.0. The results can be summarized as follows:
15 dB Gaussian Noise: At a noise level of 15 dB, our method demonstrates superior denoising capabilities, maintaining more structural details and achieving a cleaner output compared to the other tested methods. Specifically, our method exhibits exceptional retention of fine details with minimal noise artifacts. In contrast, Restormer provides good noise reduction but with some loss of detail. NAFNet (widths 32 and 64) is effective in denoising but introduces slight blurring, while HINet (versions 0.5 and 1.0) shows noticeable denoising with some finer details compromised.
The results indicate that our method outperforms the others by providing a balanced approach to denoising while preserving the essential features of the image. The visualization result is shown in Figure 5.
25 dB Gaussian Noise: At a noise level of 25 dB, our method continues to excel, outperforming the comparative methods in both noise suppression and detail preservation. Our method delivers the best performance, producing clear and detailed outputs. In comparison, Restormer exhibits adequate performance, though some areas retain residual noise. NAFNet (widths 32 and 64) effectively reduces noise but is slightly less detailed, while HINet (versions 0.5 and 1.0) performs well overall, albeit with minor loss of detail.
Our method consistently provides superior results, indicating its robustness across different noise levels. The visualization result is shown in Figure 6.
50 dB Gaussian Noise: At the highest noise level of 50 dB, our method significantly outperforms the other techniques, delivering the cleanest and most detailed images. Specifically, our method exhibits remarkable performance with excellent detail retention and minimal noise. In contrast, Restormer provides adequate performance but leaves some residual noise. NAFNet (widths 32 and 64) is effective but results in notable blurring and detail loss, while HINet (versions 0.5 and 1.0) achieves adequate denoising but with a significant loss of fine details.
Even under extreme noise conditions, our method proves to be the most effective, providing clear and accurate denoised images. The visualization result is shown in Figure 7.
In the numerical comparison, we evaluated the performance of several denoising methods, including the proposed SwinDenoising algorithm, using 100 images selected from the FLIR and KAIST datasets as the test set. The images were subjected to varying levels of Gaussian noise, and the denoising results were quantified using Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) metrics, as shown in Table 1.
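The PSNR metric and the two noise models can be reproduced in a few lines of NumPy. The noise parameters and synthetic test image below are illustrative assumptions; SSIM, which involves local luminance, contrast, and structure statistics, is omitted from this sketch:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB for images with the given peak value."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(64, 64)).astype(np.float64)  # synthetic stand-in frame

# Gaussian noise (sigma = 25 is an illustrative choice)
noisy_gauss = np.clip(clean + rng.normal(0, 25, clean.shape), 0, 255)

# Poisson noise at intensity lambda: pixel values scaled to photon counts and back
lam = 100
noisy_poisson = np.clip(rng.poisson(clean / 255.0 * lam) / lam * 255.0, 0, 255)

print(f"PSNR (Gaussian): {psnr(clean, noisy_gauss):.2f} dB")
print(f"PSNR (Poisson):  {psnr(clean, noisy_poisson):.2f} dB")
```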
The results demonstrate that SwinDenoising consistently outperforms the other methods across all noise levels. Specifically, the method achieves higher PSNR and SSIM scores, indicating its superior ability to preserve image quality while effectively reducing noise. The SwinDenoising method’s advantage is particularly pronounced at higher noise levels, where traditional methods struggle to maintain image structure. This superior performance is attributed to the method’s utilization of the Swin Transformer architecture, which adeptly captures both local and global features, enhancing its robustness in denoising infrared images.
The experimental results highlight our denoising method’s superiority across varying Gaussian noise levels. Our approach consistently outperforms other state-of-the-art noise reduction and detail preservation techniques, making it a highly effective solution for infrared image denoising.
We also conducted a series of experiments to evaluate the performance of our infrared image denoising method at Poisson noise levels of λ = 50 and λ = 100. Our method (referred to as “Ours”) is again compared with several state-of-the-art techniques: Restormer, NAFNet in its 32 and 64 width configurations, and HINet, versions 0.5 and 1.0. The results can be summarized as follows:
Poisson Noise λ = 50: At a noise level of λ = 50, our method demonstrates superior denoising capabilities, maintaining more structural details and achieving a cleaner output compared to the other tested methods. Specifically, our method exhibits exceptional retention of fine details with minimal noise artifacts. In contrast, Restormer provides good noise reduction but with some loss of detail. NAFNet (widths 32 and 64) is effective in denoising but introduces slight blurring, while HINet (versions 0.5 and 1.0) shows noticeable denoising with some finer details compromised. The visualization result is shown in Figure 8.
Poisson Noise λ = 100: At a noise level of λ = 100, our method continues to excel, outperforming the comparative methods in both noise suppression and detail preservation. Our method delivers the best performance, producing clear and detailed outputs. In comparison, Restormer exhibits adequate performance, though some areas retain residual noise. NAFNet (widths 32 and 64) effectively reduces noise but is slightly less detailed, while HINet (versions 0.5 and 1.0) performs well overall, albeit with minor loss of detail. The visualization result is shown in Figure 9.
Table 2 presents the comparative performance of the denoising methods under Poisson noise conditions, again using 100 images from the FLIR and KAIST datasets as the test set. The noise was introduced at two different intensity levels, λ = 50 and λ = 100, to simulate real-world conditions.
The proposed SwinDenoising method exhibits superior performance in both scenarios, achieving higher PSNR and SSIM values compared to other methods. Notably, at the higher noise intensity ( λ = 100), SwinDenoising significantly outperforms the competing methods, demonstrating its ability to effectively mitigate Poisson noise while preserving critical image details. This enhanced performance underscores the efficacy of the Swin Transformer’s attention mechanism, which enables precise feature extraction and noise suppression, particularly in the challenging context of infrared imagery.
The experimental results highlight the superiority of our denoising method across varying levels of Poisson noise. Our approach consistently outperforms other state-of-the-art techniques in both noise reduction and detail preservation, making it a highly effective solution for high-quality infrared image denoising.
To further demonstrate the effectiveness of our approach in handling both Gaussian and Poisson infrared noise, we present additional results across various scenes (as shown in Figure 9, Figure A1, Figure A2, Figure A3 and Figure A4). In these figures, our proposed method, SwinDenoising, is labeled as “Proposed” in the legends, and it exhibits superior performance in preserving scene details under different noise conditions.

5. Conclusions

In this paper, we have introduced SwinDenoising, a novel local and global feature fusion algorithm for IR image denoising. Our method leverages the multi-head self-attention mechanism, originally developed for natural language processing, to effectively model both local and global features. This approach addresses the limitations of current denoising methods, such as artifact generation and oversmoothing, by exploiting the mechanism’s robust global and local feature extraction capabilities, making it well-suited to handle the complex noise distributions typically found in IR imagery.
We have conducted extensive experiments to evaluate the effectiveness of our proposed method. Specifically, we constructed a noisy infrared image dataset dominated by sea scenes to test the denoising algorithm. Comparative analysis demonstrates that SwinDenoising consistently outperforms state-of-the-art techniques in both noise reduction and detail preservation, as evidenced by quantitative metrics such as SSIM and PSNR. For instance, under Poisson noise with λ = 50 and λ = 100, our method achieved SSIM scores of 0.927 and 0.892, respectively, and PSNR values of 37.17 dB and 36.06 dB, outperforming competing methods.
In conclusion, leveraging the multi-head self-attention mechanism for IR image denoising significantly enhances image quality by effectively addressing the complex noise characteristics inherent in IR images. This advancement underscores the potential of integrating sophisticated feature extraction techniques to overcome the limitations of traditional denoising methods, thereby promoting the continued development and application of IR imaging technology. Future work could focus on further optimizing the network architecture and exploring its applicability to other imaging modalities and more challenging noise scenarios.

Author Contributions

Conceptualization, W.W. and L.C.; Methodology, W.W.; Software, W.W. and R.L.; Writing—original draft, W.W.; Writing—review and editing, X.D., H.C. and L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by a Guangdong Provincial Marine Electronic Information Special Project (GDNRC[2024]19), Innovation Teams of Ordinary Universities in Guangdong Province projects (2021KCXTD038, 2023KCXTD022), a Key Laboratory of Ordinary Universities in Guangdong Province project (2022KSYS003), a China University Industry, University, and Research Innovation Fund project (2022XF058), Key Discipline Research Ability Improvement Project of Guangdong Province projects (2021ZDJS043, 2022ZDJS068), the Chaozhou Engineering Technology Research Center and Chaozhou Science and Technology Plan project (202102GY17), and Special Projects in Key Fields of Ordinary Universities in Guangdong Province (2022ZDZX3011, 2023ZDZX2038).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. Denoising effect of 15 dB Gaussian noise, zoom to see details.
Figure A2. Denoising effect of 25 dB Gaussian noise, zoom to see details.
Figure A3. Denoising effect of 50 dB Gaussian noise, zoom to see details.
Figure A4. Denoising effect of Poisson noise, λ = 50, zoom to see details.
Figure A5. Denoising effect of Poisson noise, λ = 100, zoom to see details.

Figure 1. The denoising results of the proposed method for difficult noise.
Figure 2. The overall architecture of the proposed network SwinDenoising for IR image denoising.
Figure 3. The construction of the MSABs. Each module consists of 4 multi-head self-attention computations.
Figure 4. The construction of the MSA when calculating multi-head self-attention.
Figure 5. Visualization effect of 15 dB Gaussian noise denoising. The red boxes show magnified sections of the images to their left.
Figure 6. Visualization effect of 25 dB Gaussian noise denoising. The red boxes show magnified sections of the images to their left.
Figure 7. Visualization effect of 50 dB Gaussian noise denoising. The red boxes show magnified sections of the images to their left.
Figure 8. Visualization effect of λ = 50 Poisson noise denoising. The red boxes show magnified sections of the images, shown on the right.
Figure 9. Visualization effect of λ = 100 Poisson noise denoising. The red boxes show magnified sections of the images to their left.
Table 1. Performance comparison of tested approaches with 15 dB, 25 dB, and 50 dB Gaussian noise. The results are the average test results of the test set. The best and second best scores are highlighted.

Method                    15 dB            25 dB            50 dB
                          SSIM    PSNR     SSIM    PSNR     SSIM    PSNR
HINet0.5x                 0.850   33.50    0.708   27.73    0.704   29.21
HINet1x                   0.848   33.38    0.789   31.58    0.708   29.31
NAFNet32                  0.841   33.30    0.786   31.54    0.691   28.90
NAFNet64                  0.826   32.52    0.792   31.64    0.690   28.89
Restormer                 0.835   33.14    0.681   29.02    0.714   29.39
SwinDenoising (Ours)      0.869   33.74    0.801   31.92    0.726   29.42
Table 2. Comparison of the denoising effect with Poisson noise with λ = 50 and λ = 100, and the result is the average test result of the test set. The best and second best scores are highlighted.

Method                    λ = 50           λ = 100
                          SSIM    PSNR     SSIM    PSNR
HINet0.5x                 0.916   36.12    0.885   34.64
HINet1x                   0.919   36.32    0.888   34.74
NAFNet32                  0.921   36.57    0.890   35.10
NAFNet64                  0.918   36.39    0.891   35.14
Restormer                 0.914   36.09    0.871   34.18
SwinDenoising (Ours)      0.927   37.17    0.892   36.06
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
