Article

A Novel Framework for Real ICMOS Image Denoising: LD-NGN Noise Modeling and a MAST-Net Denoising Network

1 National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Beijing Key Laboratory of Space Environment Exploration, Beijing 100190, China
4 Key Laboratory of Science and Technology on Space Environment Situational Awareness, Beijing 100190, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(7), 1219; https://doi.org/10.3390/rs17071219
Submission received: 16 January 2025 / Revised: 24 March 2025 / Accepted: 27 March 2025 / Published: 29 March 2025

Abstract: Intensified complementary metal-oxide semiconductor (ICMOS) sensors involve multiple steps, including photoelectric conversion and photoelectron multiplication, each of which introduces noise that significantly impacts image quality. To address the issues of insufficient denoising performance and poor model generalization in ICMOS image denoising, this paper proposes a systematic solution. First, we established an experimental platform to collect real ICMOS images and introduced a novel noise generation network (LD-NGN) that accurately simulates the strong sparsity and spatial clustering of ICMOS noise, generating a multi-scene paired dataset. Additionally, we proposed a new noise evaluation metric, KL-Noise, which allows a more precise quantification of noise distribution. Based on this, we designed a denoising network specifically for ICMOS images, MAST-Net, and trained it using the multi-scene paired dataset generated by LD-NGN. By capturing multi-scale features of image pixels, MAST-Net effectively removes complex noise. The experimental results show that, compared to traditional methods and denoisers trained with other noise generators, our method outperforms both qualitatively and quantitatively. The denoised images achieve a peak signal-to-noise ratio (PSNR) of 35.38 dB and a structural similarity index (SSIM) of 0.93. This optimization provides support for tasks such as image preprocessing, target recognition, and feature extraction.


1. Introduction

Deep space ultraviolet detection technology is a critical tool for human exploration of the solar system and celestial bodies. To meet the unique demands of deep space exploration, detectors must exhibit high resolution, radiation resistance, and high quantum efficiency under extreme environmental conditions. Among the commonly used detectors, doped solid-state detectors are limited by the intrinsic properties of semiconductor materials, resulting in a quantum efficiency of less than 10% in the ultraviolet range [1]. Position-sensitive anode detectors are constrained by noise in anode decoding algorithms, leading to limited spatial resolution [2]. In contrast, ICMOS (intensified complementary metal-oxide semiconductor) detectors leverage low-noise CMOS readout circuit designs to achieve micron-level spatial resolution detection. Through the synergistic effect of image intensifiers and photocathodes, the system can achieve quantum efficiencies of over 70% in the ultraviolet range [3,4]. Additionally, radiation-resistant process optimization and hardware reinforcement ensure the reliability of ICMOS detectors in extreme environments. These characteristics make ICMOS detectors the optimal technology solution for deep space ultraviolet detection.
However, the imaging process of ICMOS sensors involves steps such as photoelectric conversion and photoelectron multiplication, each of which introduces specific noise sources. In particular, microchannel plate (MCP) multiplication and the stage in which output electrons bombard the phosphor screen generate a large amount of spatially clustered random noise. Such noise differs markedly from that of conventional CMOS sensors, particularly in how it disrupts the structural information of images [5,6]. It not only causes the loss of image detail but also significantly degrades overall imaging quality. Consequently, effectively suppressing this noise has emerged as a significant challenge in ICMOS imaging technology. Furthermore, beyond denoising effectiveness, ICMOS denoising methods must possess strong generalization capabilities, an aspect notably lacking in current approaches. Therefore, the key challenge for ICMOS image denoising is not only how to remove noise accurately but also how to design a denoising algorithm that can cope with varied noise patterns and scene changes, improving the universality and robustness of the method.
In recent years, convolutional neural networks (CNNs) have demonstrated exceptional performance in image denoising tasks, thanks to their innovative network structures and efficient training strategies. At present, deep learning-based image denoising techniques primarily operate by learning the mapping relationships between numerous noisy–clean image pairs, achieving efficient denoising results. Although this training strategy can yield excellent performance, it requires substantial data, often leading to prolonged training periods and high energy consumption. Additionally, ICMOS systems are typically used in extremely low-light conditions, where image capture is particularly challenging, making this large-scale data training approach less suitable.
In this paper, we introduce a novel approach that leverages deep learning-based noise modeling to learn the noise characteristics of ICMOS sensors and generate realistic noise images to compensate for the lack of real data. By synthesizing noise and pairing it with clean images to form training datasets, we effectively address the challenges in denoising ICMOS images.
A large number of studies have focused on developing noise simulation models for conventional CMOS imaging systems to generate realistic noisy images, and these models are used to construct paired datasets with clean images for denoising [7,8,9,10,11,12,13]. However, due to the complex noise model of the ICMOS system and the stringent data collection conditions, simulation models specifically for ICMOS system noise have not been thoroughly investigated, and the related field is still in an exploratory stage.
In this paper, we specifically address the issue of random clustered noise in ICMOS images and propose a novel real noise model that incorporates neighborhood pixel awareness to simulate realistic ICMOS noise. Our goal is not only to simulate the general noise found in ICMOS images but also to capture the additional structural information caused by random clustered noise. This realistic noise model significantly bridges the gap between conventional synthetic noise and the actual noise in ICMOS images. Furthermore, based on the proposed noise model, we synthesized “real” noisy images for supervised image denoising. The experimental results demonstrate that our approach achieves significant improvements in denoising performance and effectively tackles the challenges associated with ICMOS image denoising. Our main contributions can be summarized as follows:
(1) We built a comprehensive experimental testing platform and collected an ICMOS image dataset spanning various illumination levels and diverse scene conditions, addressing the shortcomings of existing methods in multi-scene noise modeling.
(2) We propose a novel ICMOS image noise modeling framework, LD-NGN, along with a new noise evaluation method, KL-Noise, which accurately simulates the inherent sparsity and spatial clustering characteristics of ICMOS noise. This approach more precisely characterizes the noise distribution across different images, providing abundant and realistic training datasets for ICMOS image denoising tasks.
(3) We propose an image denoising network, MAST-Net, for ICMOS sensor images, which achieves excellent results on real noise datasets.

2. Related Work

2.1. Noise Image Synthesis

In recent years, deep learning has made significant progress in the field of noise image synthesis. The primary challenge lies in the complexity and diversity of noise as well as the ability of models to generalize to unknown feature spaces while maintaining accuracy. Wei et al. [14] identified CMOS sensor noise as a complex combination of shot noise, read noise, and row noise. Chang et al. [15] proposed a noise generation model based on real-world noise and camera perception. Sun et al. [10] improved the quality of canine low-dose CT images using a generative adversarial network with an anti-aliasing generator and a multi-scale discriminator. Chen et al. [16] introduced a GAN-based model that learns noise distributions by cropping uniform noisy image patches. Li et al. [17] used a Cycle-GAN-based generative adversarial network (GAN) to synthesize the oblique stripe noise of spaceborne remote sensing images. Zhang et al. [18] proposed a novel sensor-related image synthesis framework for remote sensing image synthesis, aimed at ship detection.
Although these methods have achieved notable progress in noise modeling, no prior work has focused on simulating ICMOS image noise. Most existing deep learning-based noise simulation methods primarily consider independent and identically distributed noise and fail to account for the spatial aggregation and strong local correlation characteristics of ICMOS image noise. As a result, the complex structural and distributional characteristics of ICMOS noise remain an unexplored research area.

2.2. ICMOS Image Denoising

Image denoising, as a core technique in computer vision, aims to recover high-quality, noise-free images from noisy ones. The imaging process of the ICMOS system involves steps such as photoelectric conversion and electron multiplication, each of which introduces specific noise sources; in particular, MCP multiplication and the stage where output electrons strike the phosphor screen generate a large amount of randomly distributed, spatially aggregated noise. This type of noise severely disrupts image structural information and differs significantly from the noise characteristics of conventional CMOS sensors. The challenge in ICMOS image denoising lies in the strong randomness and strong local spatial aggregation of the noise as well as in how denoising methods can generalize to unknown feature spaces while maintaining accuracy.
ICMOS image denoising methods can be categorized into traditional denoising algorithms and deep learning-based denoising algorithms based on their principles. Traditional denoising algorithms are further divided into spatial domain methods and transform domain methods. Spatial-domain denoising algorithms remove noise by analyzing the characteristics of pixels and their neighborhoods. These methods are suitable for scenarios with distinct local features and simple noise distributions. However, their effectiveness is limited for ICMOS images, which exhibit complex spatial aggregation and widely distributed random noise. For example, Yan Wang et al. [19] used temporal and spatial photon counting to filter out aggregated noise and obtained accurate photon counts, but they did not further process the residual noise. Transform-domain denoising algorithms, on the other hand, convert images from the spatial domain to the frequency domain or other transform domains (e.g., Fourier transform, wavelet transform [20], and BM3D [21]). These methods leverage the differences between signal and noise characteristics in the transform domain for denoising. However, the features of spatially aggregated noise are not easily distinguishable in the transform domain, and the global filtering nature of transform domain algorithms struggles to adapt to locally strong correlations. For instance, Fei Wang et al. [1] and Meng Yang et al. [2] achieved image denoising by decomposing the image into flat patches and structural patches combined with sparse coding algorithms. Although some improvements have been proposed, no significant breakthroughs have been made in enhancing the denoising performance.
Deep learning-based denoising algorithms learn end-to-end mapping relationships from large amounts of noisy–clean image pairs, demonstrating superior denoising performance and real-time processing capabilities under ideal data conditions. However, these methods are highly sensitive to the completeness of training samples. Due to the extremely low-light imaging characteristics of ICMOS images, the stringent imaging conditions result in a scarcity of high-quality paired samples. Typical of relevant studies, Wang Xia et al. [22] constructed pseudo-ground-truth images through multi-frame averaging and implemented a dual-residual encoder–decoder network for denoising in specific scenarios. Similarly, Xin Zhang et al. [23] designed a cross-scale transformer architecture, achieving excellent denoising performance in limited scenarios. Although these methods have shown significant results on specific datasets, they are limited by the scenario-specific nature of the training data, making it difficult to establish a robust noise mapping model across scenes. As a result, the generalization performance of existing methods significantly decreases in practical applications. The advantages and disadvantages of various ICMOS denoising algorithms are compared in Table 1.

3. Noise Analysis of the Intensified CMOS Imaging System

The structure of the ICMOS system is shown in Figure 1a. The front end of the system consists of components such as a photocathode, a microchannel plate (MCP), a phosphor screen, coupling elements, and a CMOS sensor. Among these, the signal amplification components, including the photocathode, MCP, and phosphor screen, are collectively referred to as the “image intensifier”, which serves as the core device for signal amplification and enhancement. The real ICMOS imaging system is shown in Figure 1b. As the ICMOS system involves multiple stages and components, each step introduces different types of noise, which significantly impact the final image quality. Therefore, understanding and analyzing the noise characteristics within the ICMOS system is essential for optimizing the performance of the ICMOS imaging system. In the following sections, we will provide a detailed discussion of the primary noise sources in the ICMOS system and their characteristics.
(1) Photocathode: Designed around the external photoelectric effect, the photocathode is the core component of the detector, responsible for converting incident photons into electrons; it also determines the detector's photoelectric conversion efficiency for specific wavelength bands. The photocathode introduces various types of noise during operation, including dark current noise and photon shot noise.
Dark current noise $N_{dark}$ is caused by the dark current generated by thermal excitation, typically manifested as dark counts in the ICMOS system, and is determined by the dark current intensity and the exposure time:

$$N_{dark} = I_d \cdot t \quad (1)$$

where $I_d$ represents the dark current per unit time, which is positively correlated with temperature, and $t$ is the exposure time.
Photon shot noise $N_{photon}$ is determined by fundamental physical laws: the variance of its signal fluctuations equals the mean accumulated charge, following a Poisson distribution:

$$N_{photon} \sim P(\lambda_{photon}) \quad (2)$$

where $\lambda_{photon}$ represents the number of incident photons, and $P(\cdot)$ denotes the Poisson distribution.
(2) Microchannel plate (MCP): The MCP is composed of millions of microchannel glass tubes with a diameter of about 10 microns. Each microchannel can be regarded as an independent electron multiplier. An example of the MCP is shown in Figure 2. Its primary function is to amplify the primary electrons generated by the photocathode. When these primary electrons strike the walls of the microchannels, secondary electrons are emitted, enabling the process of electron multiplication.
The primary noise in the MCP is gain non-uniformity noise, which is similar to the non-uniformity noise of CMOS sensors. The gain of the MCP depends on the diameter and length of each channel and on the number of electron collisions, so different channels have different electron multiplication capabilities. This is a multiplicative noise, expressed as follows:

$$N_{MCP} = E_{i,j} \times N_{nu} \quad (3)$$

where $N_{MCP}$ represents the MCP non-uniformity noise, $E_{i,j}$ denotes the number of electrons entering the MCP, and $N_{nu}$ indicates the channel-dependent gain of the MCP.
(3) Phosphor screen: The phosphor screen converts the electron signals amplified by the microchannel plate (MCP) into visible light through its phosphor coating. A physical image of the phosphor screen is shown in Figure 3. The performance of the phosphor screen directly impacts the image quality of the ICMOS system.
During the imaging process, the electrons amplified by the MCP strike the surface of the phosphor screen, forming discrete charge clusters. These charges are distributed on the phosphor screen in a nonlinear manner, resulting in localized bright or dark spots. Due to the diffusion and scattering effects of electron motion, the MCP's electron output does not create a precise single-photon point on the phosphor screen but rather forms a clustered electron cloud around the photon point. The noise of the electron point cloud follows a Poisson distribution [24], and its overall noise model can be expressed as follows:

$$N_{screen}(x, y) \sim P(\lambda(x, y)) \quad (4)$$

where $\lambda(x, y)$ is the average number of generated electrons, determined by the photon energy, the MCP amplification factor, and the electron diffusion effect.
In addition, the ICMOS system is influenced by factors such as electronic scattering effects, coupling system alignment errors, and CMOS imaging system noise. Among these, the contributions of coupling system errors and CMOS imaging system noise to the overall system noise are relatively minor and generally negligible. The electronic scattering effect causes the exit angle of the emitted electrons from the MCP channel to vary, resulting in a dispersed spot on the phosphor screen [25]. Consequently, the total noise in the ICMOS system can be modeled as a nonlinear coupling effect between the photoelectric cathode noise, the MCP noise, and the phosphor screen noise under localized electron scattering. This relationship can be represented as follows:
$$N_{total} = f_{\Omega}(N_{dark}, N_{photon}, N_{MCP}, N_{screen}) \quad (5)$$

where $f(\cdot)$ represents the nonlinear coupling relationship between the different noise sources, and $\Omega$ denotes the electron scattering effect within a local region.
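To make the coupling in Equation (5) concrete, the following NumPy sketch chains toy versions of the four noise stages. All parameter values, the count rescaling, and the Gaussian blur standing in for the local scattering kernel $\Omega$ are illustrative assumptions, not a calibrated ICMOS model:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def simulate_icmos_noise(photon_map, dark_rate=5.0, t_exp=0.02,
                         mcp_gain=1e3, gain_sigma=0.05, scatter_sigma=1.5):
    """photon_map holds the expected photon counts per pixel (lambda_photon)."""
    # Eqs. (1)-(2): photocathode dark counts N_dark = I_d * t plus Poisson shot noise
    electrons = (rng.poisson(photon_map)
                 + rng.poisson(dark_rate * t_exp, size=photon_map.shape))
    # Eq. (3): multiplicative MCP gain with per-channel non-uniformity N_nu
    gain = mcp_gain * (1.0 + gain_sigma * rng.standard_normal(photon_map.shape))
    amplified = electrons * gain
    # Eq. (4): phosphor screen emits a Poisson electron cloud with mean lambda(x, y)
    # (rescaled by 1e-3 only to keep the toy counts manageable)
    screen = rng.poisson(np.clip(amplified * 1e-3, 0.0, None))
    # Eq. (5): local electron scattering couples neighbouring pixels, which is
    # what produces the spatially clustered noise discussed in this section
    return gaussian_filter(screen.astype(float), sigma=scatter_sigma)
```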
In Figure 4, we compare ICMOS noisy images from the dataset used in this paper with images containing additive Gaussian noise. The clean images in the figure were obtained through the method of integrating 500 image frames. The results show that in ICMOS noisy images, random-scale abnormal texture distributions can be clearly observed. These abnormal noise features are typically caused by the complex interference mechanisms within the system. In contrast, in images with additive Gaussian noise, the original structural information remains well preserved with good integrity and clarity. This comparison highlights the complexity of internal noise sources in the ICMOS system and their unique impact on imaging quality. Therefore, in subsequent research, we designed and optimized a specialized denoising algorithm aimed at suppressing random textures caused by complex noise while preserving the original structural information of the image, thereby effectively improving the imaging quality of the ICMOS system.

4. Method

4.1. Problem Formulation

In this section, we propose a noise model for the real noise in ICMOS images. In common image degradation models, the noisy image is typically composed of the clean image combined with a noise function, where the noise function is often represented by heteroscedastic Gaussian noise:
$$y = f(v_i, x) \quad (6)$$

where $f(\cdot)$ represents the nonlinear relationship between the clean image $x$ and the pixel-level noise $v_i$, and $v_i$ denotes the pixel-level noise, which has been modeled in many earlier studies as a Gaussian distribution $N(0, \sigma^2)$.
However, existing noise models are insufficient to accurately simulate the real noise of ICMOS sensors. As analyzed in Section 3, the ICMOS system has unique structural characteristics, with the photoelectrons exhibiting randomness and uncertainty during the microchannel plate (MCP) multiplication process. This leads to an uneven spatial distribution of electrons after multiplication, and the random amplification effect causes noise to cluster spatially. Additionally, the electron scattering effect causes the electrons emitted from the MCP to form a scattering circle on the phosphor screen rather than a single landing point. This scattering effect results in the noise in the image no longer being independently distributed but instead exhibiting significant spatial correlation. Therefore, the distribution of real noise in the ICMOS system not only depends on the image signal but also displays spatial clustering characteristics rather than following an independent distribution.
We attribute this discrepancy to the improper implementation of noise in Equation (6); specifically, the noise is sampled independently of the underlying noise model in the spatial domain. Common noise models, such as additive white Gaussian noise (AWGN) and heteroscedastic Gaussian noise, assume that the noise distribution of each pixel is independent, and the noise is directly sampled from the given distribution, neglecting the correlation between neighboring pixels. To address this issue, we introduce the relationship between the noise value and the neighboring pixels and describe the spatial aggregation of the noise using the following equation:
$$n_i = f(x_i) + \sum_{j \in \Omega} w_j \cdot f(x_j) + \epsilon_i \quad (7)$$

where $n_i$ represents the spatially aggregated noise, $f(\cdot)$ denotes the underlying per-pixel noise model, $w_j$ is the weight coefficient of neighboring pixel $j$ in the neighborhood $\Omega$, reflecting the spatial correlation between pixels, and $\epsilon_i$ is an additional noise term used to model residual uncertainty or randomness in the noise.
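A minimal NumPy sketch of Equation (7) is given below; the 3 × 3 weight kernel, the residual noise level, and the heteroscedastic base model in the usage comment are all illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import convolve

def aggregated_noise(x, base_noise_fn, eps_sigma=0.05, seed=0):
    # Illustrative neighbourhood weights w_j over Omega; the zero centre is
    # deliberate, since the centre pixel is covered by the f(x_i) term
    w = np.array([[0.05, 0.10, 0.05],
                  [0.10, 0.00, 0.10],
                  [0.05, 0.10, 0.05]])
    rng = np.random.default_rng(seed)
    f = base_noise_fn(x)                        # f(x_i): per-pixel base noise
    neigh = convolve(f, w, mode="reflect")      # sum_j w_j * f(x_j)
    eps = rng.normal(0.0, eps_sigma, x.shape)   # epsilon_i: residual randomness
    return f + neigh + eps

# Example with a heteroscedastic Gaussian base model, as in Section 4.1:
# n = aggregated_noise(x, lambda im: np.random.default_rng(1)
#                      .normal(0.0, 0.1 * np.sqrt(im + 1e-6)))
```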
The main research approach in this paper can be divided into three parts: noise image synthesis, image denoising, and synthetic noise evaluation. The overall structural block diagram is shown in Figure 5. First, based on the noise characteristics of the ICMOS system, a novel noise synthesis architecture is proposed to generate synthetic noise images with realistic noise distribution features, providing data support for subsequent denoising experiments. Next, a new noise distribution evaluation method is designed to effectively address the limitations of current evaluation methods, providing reliable quantitative evidence for the accuracy of noise modeling. Finally, a dedicated denoising network is designed for ICMOS noise images, trained with both synthetic and real noise images, significantly enhancing noise suppression and detail preservation. Through these three research components, we have developed a complete technical framework for noise modeling and image denoising, offering a systematic solution for the processing of ICMOS noise images.

4.2. Noise Synthesis Architecture

Using paired real ICMOS noisy images and clean images (y, x), we developed a novel noise synthesis architecture that represents the complex noise distribution discussed in Section 3 through the complementary learning of content and noise.

4.2.1. Overall Pipeline of the LD-NGN Network

Figure 6 illustrates the overall architecture of the proposed local dependency noise generation network (LD-NGN), which consists of two parts: a noise distribution estimation network and a local pixel dependency network.
The noise distribution estimation network adopts a dual-encoder architecture, consisting of a content encoder, E1, and a noise encoder, E2. The dual-encoder structure enables feature decoupling, ensuring the independent encoding of content and noise information, thereby avoiding feature confusion. The content encoder, E1, focuses on extracting semantic and structural information from the image, while the noise encoder, E2, specializes in capturing the statistical properties and distribution patterns of the noise. Additionally, through the feature complementary learning mechanism, the network can fully leverage complementary feature information from different data, enhancing the feature representation capability. In the decoding phase, the noise and content features are combined and input into the decoder. Skip connections and upsampling operations are employed to map the fused features back to the image space, generating synthetic noise images that closely resemble the real noise distribution.
Subsequently, a local pixel dependency network is used to simulate the spatially aggregated noise of ICMOS. Unlike traditional independent and identically distributed noise models, this paper proposes a novel noise modeling approach based on the theoretical derivation of ICMOS system noise in Section 3 and an analysis of existing models in Section 4.1. This method takes pixel values and their local neighborhood pixels as inputs, combining multi-scale feature extraction and attention mechanisms to effectively learn spatially aggregated noise distributions with neighboring correlations and signal dependencies. By more accurately simulating the local characteristics of real noise, it provides a new solution for the precise representation of complex noise distributions. This approach not only overcomes the limitations of traditional models in capturing the spatial correlations of noise but also improves the accuracy and robustness of noise modeling, offering more reliable technical support for image processing tasks in complex noise environments.
(1) Noise Distribution Estimation Network. The noise distribution estimation network consists of three main components: Encoder1 (E1), Encoder2 (E2), and Decoder1 (D1). E1 and E2 utilize the same processing module. Both start by performing feature extraction through convolutional and downsampling operations, followed by two Res2Net residual blocks to capture features from receptive fields of different scales. The difference is that E2 performs four additional residual operations on top of the previous steps to estimate the noise feature kernel k. The input of Decoder1 (D1) is obtained by fusing the noise feature kernel k with the encoded content features obtained by E1. D1 progressively reconstructs the spatial structure and details of the noise image through multiple convolutional layers while precisely estimating the noise distribution. This process effectively combines the noise information with the content representation, providing crucial support for subsequent decoding and reconstruction tasks. In this way, the noise generator is able to accurately capture the complex noise patterns in ICMOS images and significantly improve the precision and robustness of noise modeling.
(2) Local Pixel Dependent Network. The local pixel dependency network (LPD-Net) serves as a neighborhood correlation operator. By taking the noise values and their local neighboring pixel noise as input, it learns the spatially aggregated noise distribution with local dependencies and signal correlations. The core architecture of LPD-Net combines Res2Net and Transformer, effectively leveraging the advantages of both. The multi-scale convolutions of Res2Net effectively extract local features from the image and capture noise patterns at different scales, thereby enhancing the diversity and detail representation of noise generation. Meanwhile, Transformer utilizes self-attention mechanisms to model the global dependencies between pixels over long distances, enabling noise generation to not only rely on local information but also to fully account for the interactions between neighboring pixels. LPD-Net efficiently captures local neighborhood information while maintaining a global contextual awareness, thus enabling precise learning and the estimation of complex noise patterns.

4.2.2. Loss Function

We introduce four loss functions: (1) the noise level loss $L_{nl}$; (2) the adversarial losses $L_{wgan1}$ and $L_{wgan2}$; and (3) the stability loss $L_{stb}$.
Noise Level Loss. The noise distribution estimation network yields an estimate of the noise distribution; however, because real noise level values are unavailable, we approximate the noise level with a local-region estimation approach. Specifically, we divide the image into several small patches (such as 3 × 3 or 5 × 5 sliding windows) and calculate the noise level within each patch. This method estimates noise characteristics over local ranges at multiple scales, providing an effective estimate for overall noise modeling:

$$L_{nl} = \mathbb{E}\left[ \left\| \hat{m} - m \right\|^2 \right] \quad (8)$$

The noise level is estimated as $\frac{1}{m} \sum_i \sqrt{\mathrm{mean}(\Omega_i^2) - \mathrm{mean}(\Omega_i)^2}$, where $\Omega_i$ refers to patches of size 3 × 3, 5 × 5, and 7 × 7. By applying the noise estimation method at multiple scales and averaging the results, we can more comprehensively capture the noise characteristics.
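The multi-scale patch statistic above can be computed efficiently with average pooling; the following PyTorch sketch is one plausible implementation of the estimate feeding $L_{nl}$, not the authors' exact code:

```python
import torch
import torch.nn.functional as F

def local_noise_level(img, scales=(3, 5, 7)):
    """Per-pixel noise-level estimate: patch std = sqrt(E[x^2] - E[x]^2),
    averaged over 3x3, 5x5 and 7x7 windows. img has shape (B, 1, H, W)."""
    levels = []
    for k in scales:
        mean = F.avg_pool2d(img, k, stride=1, padding=k // 2)
        mean_sq = F.avg_pool2d(img * img, k, stride=1, padding=k // 2)
        levels.append((mean_sq - mean * mean).clamp_min(0.0).sqrt())
    return torch.stack(levels, dim=0).mean(dim=0)

# L_nl is then the squared error between the estimates for the synthesised
# and the real noisy images:
# loss_nl = F.mse_loss(local_noise_level(fake), local_noise_level(real))
```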
Wasserstein GAN Loss. To ensure that the generated noise has a distribution similar to the real noise, we introduce two adversarial losses. The first adversarial loss, $L_{wgan1}$, is applied between the final synthesized noise $\hat{n}$ and the real noise $n$, enforcing that the generated noise exhibits a degree of spatial correlation similar to the real noise. The second adversarial loss, $L_{wgan2}$, further optimizes the noise generation by comparing the synthesized intermediate noise $\hat{v}$ with the real noise $n$. Since the synthesized intermediate noise $\hat{v}$ lacks spatial correlation among local pixels, we adopt the PixelUnshuffle method inspired by AP-BSN [26] to disrupt the spatial correlation in the real noise image $n$. By comparing the two down-sampled versions, $(n)_s$ and $\hat{v}_s$, the loss evaluation emphasizes the overall noise structure rather than local dependencies. This strategy adjusts the spatial distribution of the noise, making the generated noise more hierarchical and consistent. We adopt the WGAN adversarial loss function proposed by Arjovsky et al. [27]:

$$L_{wgan1} = -\mathbb{E}_{n}\left[ D_1(n) \right] + \mathbb{E}_{\hat{n}}\left[ D_1(\hat{n}) \right] \quad (9)$$

$$L_{wgan2} = -\mathbb{E}_{n}\left[ D_2((n)_s) \right] + \mathbb{E}_{\hat{v}}\left[ D_2(\hat{v}_s) \right] \quad (10)$$

where $\mathbb{E}[\cdot]$ denotes the expectation operator (the WGAN objective approximates the earth-mover (EM) distance), $D_1$ is the discriminator for the final synthesized noise, responsible for evaluating its realism, and $D_2$ is the discriminator for the synthesized intermediate noise, tasked with assessing its quality and engaging in adversarial training with the real noise, thereby encouraging the generated intermediate noise to better match the distribution characteristics of the real noise.
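A minimal sketch of the two critic objectives follows, assuming `D1` and `D2` are critic networks returning per-sample scores and using torch's built-in PixelUnshuffle; the sign convention matches the reconstructed Equations (9) and (10):

```python
import torch.nn.functional as F

def critic_losses(D1, D2, real_n, fake_n, inter_v, s=2):
    # Eq. (9): D1 scores the final synthesised noise n_hat against real noise n
    l_wgan1 = -D1(real_n).mean() + D1(fake_n).mean()
    # Eq. (10): the intermediate noise v_hat carries no local spatial
    # correlation, so the real noise is PixelUnshuffled to (n)_s first
    real_s = F.pixel_unshuffle(real_n, s)
    inter_s = F.pixel_unshuffle(inter_v, s)
    l_wgan2 = -D2(real_s).mean() + D2(inter_s).mean()
    return l_wgan1, l_wgan2
```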
Stabilizing Loss. To prevent the generated noise from deviating from the image grayscale value, thus negatively affecting the performance of the overall framework, we define an additional stability loss term, L s t b . This loss is designed to constrain the noise generation process to ensure that the generated noise does not cause a grayscale value shift in the image:
$$L_{stb} = \left| \frac{1}{N} \sum_{i} \hat{n}_i \right| \quad (11)$$
Finally, the complete loss function of the framework is described as follows:
$$L_{all} = \lambda_1 L_{wgan1} + \lambda_2 L_{wgan2} + \lambda_3 L_{nl} + \lambda_4 L_{stb} \quad (12)$$

4.3. Denoise Architecture

4.3.1. Overall Pipeline of MAST-Net

To better handle the complex noise in ICMOS images while preserving key structural details, this study proposes an image denoising network, the multi-scale attentive stage and transformer network (MAST-Net), as shown in Figure 7. The entire architecture follows an encoder–decoder framework, consisting of input and output projection, upsampling, downsampling, and processing modules. This design combines multi-scale feature extraction and attention mechanisms, enabling the effective capture of both local details and global contextual information within the image, thereby enhancing the denoising performance.
Specifically, given the input image $I_{in} \in \mathbb{R}^{H \times W \times 1}$, the feature map $G_{in}$ is first extracted through the input projection as follows:

$$G_{in} = \varphi(f_{c_{in}}(I_{in})) \quad (13)$$

where $\varphi$ denotes the LeakyReLU activation function, $f_{c_{in}}$ represents the input convolutional feature extraction network, and $G_{in} \in \mathbb{R}^{C \times H \times W}$.
Then, the feature map is fed into the encoder, which consists of the attentive stage block (ASB) and the downsampling layers proposed in this paper, with a total of four sets. The ASB captures the structural features of the image by combining pixel-level and channel-level attention mechanisms, thereby enhancing the feature extraction capability of the network. The downsampling is performed using PixelUnshuffle, and the calculation is given by the following:
$$G_{down} = \varphi(f_{\downarrow}(f_{ASB}(G_l))) \in \mathbb{R}^{4C \times H/2 \times W/2}, \quad G_l \in \mathbb{R}^{C \times H \times W} \quad (14)$$

where $f_{\downarrow}$ represents the downsampling layer with a downsampling factor of 2, and $f_{ASB}$ denotes the attentive stage block. This multi-scale encoding structure allows for the stepwise extraction of both low-frequency and high-frequency features from the image, while the attention mechanism optimizes the feature representations.
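One encoder stage can therefore be sketched as follows, assuming `asb` is any attentive-stage-block module (a sketch of one appears later in this section); PixelUnshuffle performs the (C, H, W) → (4C, H/2, W/2) downsampling of Equation (14):

```python
import torch.nn as nn

class EncoderStage(nn.Module):
    """One MAST-Net encoder stage per Eq. (14): ASB features followed by
    PixelUnshuffle downsampling."""
    def __init__(self, asb):
        super().__init__()
        self.asb = asb                    # f_ASB: attentive stage block
        self.down = nn.PixelUnshuffle(2)  # downsampling with factor 2
        self.act = nn.LeakyReLU(0.2)      # phi

    def forward(self, g):                 # g: (B, C, H, W)
        return self.act(self.down(self.asb(g)))  # (B, 4C, H/2, W/2)
```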
The feature vectors of different scales produced by the encoder are fed into the decoder. Similarly, the decoder consists of the channel decoder block (CDB) and the upsampling layers proposed in this paper, with a total of three sets. The CDB captures the global dependencies between long-range pixels, while upsampling is performed using PixelShuffle, which increases the resolution while reducing the channel dimensionality. The calculation is given by the following:

$$G_{up} = \varphi(f_{\uparrow}(f_{CDB}(G_m))) \in \mathbb{R}^{C \times H \times W}, \quad G_m \in \mathbb{R}^{4C \times H/2 \times W/2} \quad (15)$$

where $f_{\uparrow}$ represents the upsampling layer with an upsampling factor of 2, and $f_{CDB}$ denotes the channel decoder block.
Finally, the different levels of features between the encoder and decoder are fused through skip connections to enhance the interaction between global and local information. The denoised image I o u t reconstructed by the output network is represented as follows:
$$I_{out} = \varphi(f_{c_{out}}(G_{last})) + I_{in} \quad (16)$$

where $f_{c_{out}}$ represents the output convolutional feature extraction network, and $G_{last}$ denotes the final output of the encoder–decoder.
Attentive Stage Block. As shown in Figure 7B, the attentive stage block integrates pixel attention and channel attention mechanisms. The advantage of this dual attention design lies in the following: the pixel attention module enhances the focus on key regions by computing weights for spatial locations, effectively capturing local structural information, while the channel attention module optimizes global feature representation by assigning weights to different feature channels. The collaboration between these two modules significantly enhances the network’s ability to preserve high-frequency structures (such as edges and textures) with clarity while also improving the model’s capability to model complex features.
The input feature vector $F_{in} \in \mathbb{R}^{C \times H \times W}$ is split into two parts, $F_{in1}, F_{in2} \in \mathbb{R}^{C/2 \times H \times W}$, after convolution. These two parts are then separately fed into the pixel attention module and the channel attention module. The resulting feature maps $F_{out1}$ and $F_{out2}$ are concatenated along the channel dimension and passed through a convolutional module. The final output is obtained by adding a residual connection with $F_{in}$:

$$F_{out} = F_{in} + \varphi(\mathrm{Conv}_{3 \times 3}(F_{out1} \,©\, F_{out2})) \quad (17)$$

where $\mathrm{Conv}_{3 \times 3}$ is the convolutional module, and $©$ represents concatenation along the channel dimension. The final output is $F_{out} \in \mathbb{R}^{C \times H \times W}$.
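The following PyTorch sketch illustrates the ASB structure of Equation (17); the pixel- and channel-attention branches are simple stand-ins for the modules in Figure 7B, not the authors' exact design (the channel count `c` is assumed even):

```python
import torch
import torch.nn as nn

class AttentiveStageBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.split_conv = nn.Conv2d(c, c, 1)
        # pixel attention: one weight per spatial location
        self.pa = nn.Sequential(nn.Conv2d(c // 2, 1, 1), nn.Sigmoid())
        # channel attention: one weight per channel from global pooling
        self.ca = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(c // 2, c // 2, 1), nn.Sigmoid())
        self.fuse = nn.Sequential(nn.Conv2d(c, c, 3, padding=1),
                                  nn.LeakyReLU(0.2))

    def forward(self, x):
        f1, f2 = self.split_conv(x).chunk(2, dim=1)  # F_in1, F_in2
        f1 = f1 * self.pa(f1)                        # pixel attention branch
        f2 = f2 * self.ca(f2)                        # channel attention branch
        # Eq. (17): concatenate, fuse with a 3x3 conv, add the residual
        return x + self.fuse(torch.cat([f1, f2], dim=1))
```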
Channel Decoder Block. As shown in Figure 7C, the channel decoder block effectively captures the multi-scale local features and long-range global dependencies of the image by combining CNN and Transformer modules. The design principle behind this hybrid architecture is as follows: the CNN-based Res2Net module excels at extracting local details and multi-scale features, while the LeWin Transformer module is capable of modeling long-range global contextual information. The combination of these two modules enables the network to simultaneously capture fine-grained local features and global structural information, significantly enhancing feature discrimination capability in complex scenarios.
The input feature vector $D_{in} \in \mathbb{R}^{C \times H \times W}$ is passed through two layers of the Res2Net module for multi-scale feature extraction. The intermediate vector $D_m$ is obtained through residual connections:

$$D_m = \varphi(f_{res2}(f_{res2}(\varphi(D_{in})))) + D_{in} \quad (18)$$

where $f_{res2}$ denotes the Res2Net module, and $D_m$ represents the intermediate vector obtained through residual connections.
Finally, to enhance global feature extraction, we further introduce a Transformer module to more efficiently capture contextual information:
$$D_{out} = \varphi(f_{lewin}(\varphi(f_{lewin}(\mathrm{BN}(D_m))))) + D_m \quad (19)$$

where $\mathrm{BN}$ denotes the batch normalization function, $f_{lewin}$ represents the LeWin Transformer module, and $D_{out}$ is the output feature of the channel decoder block (CDB).
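A structural sketch of the CDB is shown below; `make_res2net` and `make_lewin` are assumed factories for the Res2Net and LeWin Transformer modules cited in the text, which are not reimplemented here:

```python
import torch.nn as nn

class ChannelDecoderBlock(nn.Module):
    def __init__(self, c, make_res2net, make_lewin):
        super().__init__()
        self.res1, self.res2 = make_res2net(c), make_res2net(c)
        self.lewin1, self.lewin2 = make_lewin(c), make_lewin(c)
        self.bn = nn.BatchNorm2d(c)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, d_in):
        # Eq. (18): two Res2Net passes wrapped in a residual connection
        d_m = self.act(self.res2(self.res1(self.act(d_in)))) + d_in
        # Eq. (19): two LeWin Transformer passes on the normalised features
        return self.act(self.lewin2(self.act(self.lewin1(self.bn(d_m))))) + d_m
```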

4.3.2. Loss Function

To make the output of the denoising network closer to real images in terms of both pixel-level accuracy and perceptual realism, we introduce the Charbonnier loss and the SSIM (structural similarity) index. These are used for the supervised optimization of the estimated denoised results, with the clean image serving as the reference.
Charbonnier loss. Charbonnier loss is a loss function used to evaluate the error between the predicted image and the ground-truth image. Compared to traditional mean squared error (MSE) loss, Charbonnier loss is more robust in handling noise and outliers, effectively avoiding issues with unstable gradients. $l_{Charbonnier}$ can be expressed as follows:

$$l_{Charbonnier} = \frac{1}{N} \sum \sqrt{(I_{pre} - I_{target})^2 + \epsilon^2} \quad (20)$$

where $N$ denotes the number of training images, $I_{pre}$ represents the predicted image, $I_{target}$ represents the ground-truth image, and $\epsilon^2$ is a small constant ensuring numerical stability, empirically set to $1 \times 10^{-6}$.
SSIM loss. SSIM loss is a loss function used to evaluate the similarity between the predicted image and the ground-truth image. The SSIM value ranges from −1 to 1, where a higher value indicates greater similarity between the two images. It can be written as follows:
$$l_{SSIM} = \frac{1}{N} \sum \left( 1 - \mathrm{SSIM}(I_{pre}, I_{target}) \right) \quad (21)$$
The final loss can be written as follows:
$$l_{all} = \lambda_{Charbonnier} \, l_{Charbonnier} + \lambda_{SSIM} \, l_{SSIM} \quad (22)$$
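A compact PyTorch sketch of the training objective follows; `ssim_fn` stands for any differentiable SSIM implementation returning values in [−1, 1] (an assumption, since the paper does not specify one), and the weights are those reported in Section 5.1:

```python
import torch

def charbonnier_loss(pred, target, eps2=1e-6):
    """Eq. (20): a smooth, outlier-robust L1 variant (eps^2 = 1e-6)."""
    return torch.sqrt((pred - target) ** 2 + eps2).mean()

def denoise_loss(pred, target, ssim_fn, w_charb=1.0, w_ssim=0.01):
    """Eq. (22): weighted sum of the Charbonnier and SSIM (Eq. 21) terms."""
    return (w_charb * charbonnier_loss(pred, target)
            + w_ssim * (1.0 - ssim_fn(pred, target)))
```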

4.4. Noise Evaluation Method

Noise evaluation plays a crucial role in the noise simulation and generation process. Accurate noise evaluation helps to realistically simulate the noise characteristics of an image, thereby improving the reliability and effectiveness of the algorithm. In image noise analysis, Kullback–Leibler (KL) divergence is commonly used to compare the histograms of two noisy images, assessing the similarity of their noise distributions. However, this method has significant limitations when dealing with complex noise types. First, KL divergence is not sensitive enough to the differences in the tails of the distribution, and it is difficult to accurately capture significant differences in the tails of the noise distribution. Secondly, even if the noise distributions differ in shape, the histograms of noisy images may appear similar, leading to similar KL divergence values. Therefore, while KL divergence can effectively evaluate the similarity of noise distributions in certain cases, it may yield inaccurate results when dealing with noise that has complex tail behavior or local structures. This can result in situations where different noise distributions produce small KL divergence values.
For example, in Figure 8, when we add Gaussian noise with a standard deviation of 0.1 and uniformly distributed noise to the image, the calculated KL divergence of the two noisy images is 0.02. According to conventional evaluation standards, these two noise distributions appear to be highly similar. But in fact, they are fundamentally different types of noise. This result highlights the limitations of KL divergence in evaluating complex noise distributions, particularly its inability to accurately capture significant differences in the tail behavior and local structures of the noise distributions.
In this paper, we improve the traditional method for calculating the KL divergence of image histograms by proposing a novel adaptive noise evaluation method, KL-Noise, based on a single image. This method extracts local features, such as luminance, gradient, and texture, from the image in blocks and estimates the image noise level comprehensively. It then calculates the KL divergence based on the noise estimate histogram. Compared to traditional methods, KL-Noise can perform the evaluation without prior noise information and adaptively predict the noise level through the feature variations of local blocks.

4.4.1. Adaptive Noise Estimation

The estimation of the noise level relies on local image features, such as luminance, gradient, and texture. We adopt a sliding window approach to divide the image into several smaller blocks of size N × N. The features within each block, including luminance, the variance in grayscale values, gradient, and texture variations, are used for noise level estimation. For each block, we compute the following features:
Grayscale Variance: Grayscale variance reflects the degree of variation in pixel values within an image. Since noise typically manifests as random fluctuations in grayscale values, grayscale variance is one of the important indicators for noise evaluation:
$$\sigma_I^2 = \frac{1}{N^2} \sum_{i=1}^{N^2} (I_i - \mu)^2 \quad (23)$$
Gradient Features: Gradient features reflect the direction and rate of change in pixel values within an image. A larger gradient value indicates more significant pixel value changes, typically corresponding to edge regions of the image. By calculating the gradient variance G x , y within a local region, the degree of variation in that region can be quantified, further characterizing the structural features of the image:
$$G(x, y) = \sqrt{\left( \frac{\partial I(x, y)}{\partial x} \right)^2 + \left( \frac{\partial I(x, y)}{\partial y} \right)^2} \quad (24)$$
Texture Features: Texture features reflect the structural complexity of a local region, and noise typically disrupts the regularity of texture patterns. Therefore, texture variance can serve as an important indicator for noise evaluation. We use the local binary pattern (LBP) to describe the texture features. Given a radius R and P neighboring points, the LBP descriptor T(x,y) is defined as follows:
$$T(x, y) = \sum_{p=0}^{P-1} s(I_p - I(x, y)) \cdot 2^p \quad (25)$$

where $s(\cdot)$ is the sign function. The texture variation within the local block can be described by the texture variance $\sigma_{T(x,y)}^2$, which quantifies the texture changes in the region.
Edge Density: The edge density quantifies the impact of noise on the edges by calculating the proportion of edge pixels within a local block:
$$ED = \frac{\sum_{i=1}^{N^2} \mathbb{1}(G_i > T_{threshold})}{N^2} \quad (26)$$

where $\mathbb{1}(\cdot)$ is the indicator function, and $G_i$ represents the gradient image generated by the Sobel operator.
Based on the above features, we have constructed a single-image adaptive noise estimation model, as shown in Equation (27). This model combines local statistical features, such as brightness, gray-level variance, gradient, texture, and edge density, to achieve accurate noise estimation within each sliding window. Histogram statistical analysis of these estimates is then performed to extract the statistical properties of the noise distribution:
$$\sigma = 0.5\,\sigma_I^2 + 0.25\,\sigma_G^2 + 0.15\,\sigma_T^2 + 0.1\,ED \quad (27)$$
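The block-wise estimator can be sketched as follows; the block size, edge threshold, and LBP settings are illustrative assumptions, and the image is assumed to be a grayscale array normalized to [0, 1]:

```python
import numpy as np
from scipy.ndimage import sobel
from skimage.feature import local_binary_pattern

def noise_estimates(img, n=16, t_edge=0.1):
    """Per-block noise estimates per Eqs. (23)-(27); their histogram is the
    input to the KL-Noise comparison in Section 4.4.2."""
    gx, gy = sobel(img, axis=1), sobel(img, axis=0)
    grad = np.hypot(gx, gy)                   # gradient magnitude, Eq. (24)
    lbp = local_binary_pattern(img, 8, 1)     # LBP texture map, Eq. (25)
    scores = []
    h, w = img.shape
    for i in range(0, h - n + 1, n):
        for j in range(0, w - n + 1, n):
            sl = np.s_[i:i + n, j:j + n]
            ed = (grad[sl] > t_edge).mean()   # edge density, Eq. (26)
            scores.append(0.5 * img[sl].var()     # weighted fusion, Eq. (27)
                          + 0.25 * grad[sl].var()
                          + 0.15 * lbp[sl].var()
                          + 0.1 * ed)
    return np.asarray(scores)
```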

4.4.2. Noise Level Evaluation

Through adaptive noise estimation, we can accurately assess the noise level of the image. Specifically, we first compute the noise residuals and estimate the noise level based on their histograms. Then, we calculate the Kullback–Leibler (KL) divergence to measure the similarity between the distributions of different noise residuals. By comparing the histogram shapes, we can effectively estimate the similarity of the noise distributions.
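Building on the `noise_estimates` sketch above, KL-Noise then reduces to a KL divergence between the histograms of the two sets of block estimates; the bin count and smoothing constant below are assumptions:

```python
import numpy as np

def kl_noise(img_a, img_b, bins=64):
    ea, eb = noise_estimates(img_a), noise_estimates(img_b)
    lo, hi = min(ea.min(), eb.min()), max(ea.max(), eb.max())
    pa, _ = np.histogram(ea, bins=bins, range=(lo, hi))
    pb, _ = np.histogram(eb, bins=bins, range=(lo, hi))
    pa = pa.astype(float) + 1e-10   # smooth to avoid log(0)
    pb = pb.astype(float) + 1e-10
    pa, pb = pa / pa.sum(), pb / pb.sum()
    return float(np.sum(pa * np.log(pa / pb)))
```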
To validate the effectiveness of the proposed method, we performed further testing on the images in Figure 8, adding Gaussian noise images with identical distributions, Poisson noise images, and salt-and-pepper noise images for comparison. Using the adaptive noise estimation method, we successfully estimated the noise levels of each image and compiled the histograms of the estimated noise, with the results shown in Figure 9. Visually, the estimated results for the Gaussian noise images with the same distribution are nearly identical, whereas obvious deviations are observed for images with other noise distributions, confirming the sensitivity of our method to different noise types. Quantitatively, the KL-Noise calculated from the noise histogram of the Gaussian noise images with the same distribution is 0.04, while Gaussian–uniform, Gaussian–Poisson, and Gaussian–salt-and-pepper pairs yielded values of 3.6, 5.4, and 7.6, respectively. These results demonstrate that the proposed KL-Noise method effectively evaluates noise distribution similarity and accurately distinguishes noise types, overcoming the limitations of traditional KL divergence methods that directly compute image histogram differences.

5. Experiment

5.1. Experimental Setup

To validate the effectiveness of the proposed method, we designed two sets of experiments. First, noise images were generated using the proposed network to evaluate the quality of the generated noise. Second, the performance of the generated noise images in downstream image denoising tasks was tested.
Experimental Platform: The experimental platform, illustrated in Figure 10, consists of an ICMOS imaging system, a data acquisition system, an adjustable xenon lamp, and a computer. The ICMOS system was developed by the National Space Science Center (NSSC), Chinese Academy of Sciences (CAS). It employs an 18 mm image intensifier directly coupled to a CMOS sensor via a fiber optic taper, with a peak response wavelength of 500 nm and a spatial resolution of 750 × 580 pixels. The experiments were conducted in both indoor and outdoor environments.
  • Indoor experiments: All indoor experiments were conducted under optical darkroom conditions to minimize the interference of ambient light on the experimental results. A xenon lamp with adjustable illumination was used to precisely control the lighting levels of the imaging environment, with illuminance set at $10^{-2}$ lx and $10^{-3}$ lx, simulating imaging scenarios under extremely low-light conditions. The experimental scenes included typical indoor environments, such as laboratories and offices, to ensure the diversity and representativeness of the dataset.
  • Outdoor experiments: In the outdoor experiments, real-world datasets were collected under natural conditions, covering an illuminance range from $10^{-2}$ lx to $10^{-4}$ lx, fully reflecting low-light conditions in real-world scenarios. An illuminance meter was employed to quantitatively calibrate the illumination level of the imaging scenes, ensuring the reliability and reproducibility of the experimental data. The experimental scenes included typical outdoor environments, such as urban streets and natural landscapes.
The experimental design comprehensively considers the performance of the imaging system under multi-scene and multi-illumination conditions, providing rigorous experimental support for the analysis of results.
Noise Image Dataset: In this study, we collected a real ICMOS image dataset as the training set. The ICMOS imaging system has a spatial resolution of 750 × 580 pixels, and a total of 2000 images from different scenes were collected according to the experimental setup described above, covering both indoor and outdoor environments. This dataset comprehensively reflects the performance of the ICMOS imaging system under various illumination and scene conditions.
To obtain ground-truth images, we applied the frame integration algorithm. The experimental results demonstrate that integrating 500 frames of the same scene can significantly suppress noise, thereby generating high-quality, noise-free clean images. Ultimately, the dataset consists of 2000 ground-truth images, with each clean image corresponding to 20 noisy images with different noise distributions, resulting in a total of 40,000 noisy images. Among these, 80% were used as the training set, and 20% were used as the test set.
Metrics: We used four metrics to evaluate the performance of the proposed method: KL divergence, KL-Noise, the peak signal-to-noise ratio (PSNR), and the structural similarity index (SSIM) [28]. Among them, KL divergence and KL-Noise were used to measure the similarity between the distribution of real noise and generated noise, while PSNR and SSIM were used to assess the denoising performance. Higher PSNR and SSIM values indicate better denoising performance, while lower KL divergence suggests that the generative model more accurately reflects the real noise distribution, representing superior noise synthesis quality. Additionally, KL-Noise offers a more detailed measure of the consistency between the generated noise and the real noise distribution, making it a critical metric for evaluating the effectiveness of the proposed method.
Implementation Details: All the networks were optimized using the Adam optimizer with a batch size of 32. To enhance training efficiency, the images were cropped to a fixed size of 128 × 128 pixels. Both the noise generation network and the denoising network were trained for 900 iterations, with a learning rate of $1 \times 10^{-4}$, which was gradually reduced as training progressed. The values of $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ in the noise generation network loss function were set to 0.1, 0.1, 30, and 0.01, respectively, and the values of $\lambda_{Charbonnier}$ and $\lambda_{SSIM}$ for the denoising network were set to 1 and 0.01.

5.2. Noise Synthesis Results

Baseline: In this study, the proposed method was compared with several common noise models, including additive white Gaussian noise (AWGN), C2N [29], N2N [30], and NeCA [31]. To synthesize AWGN, we evaluated the noise level of each noisy image using a noise estimation method and added the estimated noise to the corresponding clean image, ensuring that the comparison with real noise remains representative. For C2N and NeCA noise, we used the networks provided by the respective authors and trained them under the same experimental settings and dataset as the proposed network. This benchmarking design ensures fairness and reliability in the comparison, enabling a comprehensive evaluation of the proposed method’s performance and quantification of its advantages in noise modeling and image denoising tasks.
Quantitative Evaluations: We evaluated the noise synthesis results of the proposed network and used both KL divergence and KL-Noise as metrics to assess the quality of the synthesized noise. Table 2 shows the mean KL divergence and KL-Noise results computed for various baseline methods on the validation set used in this study. The proposed method demonstrates significant advantages in both metrics, achieving the lowest values among all the methods. This indicates that the synthesized noise generated by our method exhibits the highest similarity to the statistical distribution of real noise data. Specifically, our approach shows exceptional robustness and realism in simulating complex spatially aggregated noise. Figure 11 illustrates the noise evaluation histograms obtained through KL-Noise for various noise synthesis methods, including C2N, Gaussian noise, NeCA, N2N, and the proposed method. “Real” in the table refers to the results obtained by calculating the noise images of different frames under the same experimental conditions. By computing KL and KL-Noise for real noise images (which share the same distribution characteristics), we established reference metrics for noise with identical real distributions. As shown, the noise distribution generated by our proposed model almost perfectly overlaps with the real noise distribution, confirming the superior realism and accuracy of the proposed noise synthesis framework.
Furthermore, we evaluated the efficiency of various models. To ensure fairness in the efficiency comparison, we adopted the inference time and the number of parameters as evaluation metrics. Specifically, we tested all models on the same computer device (equipped with an NVIDIA RTX 4090 GPU), and the results are presented in Table 3.
Qualitative Comparison. Figure 12 and Figure 13 further present a qualitative comparison of noise images generated by various methods. Through visual analysis, it is evident that the noise images generated by the proposed method exhibit high consistency with real noise images in terms of texture details and spatial distribution. Specifically, the noise generated by our method significantly outperforms others, such as C2N and NeCA, in terms of randomness, local spatial clustering, and overall structural consistency. This highlights the superior capability of our framework in capturing complex noise patterns. Additionally, from a visual perspective, the noise generated by other methods often appears overly smooth or the noise points are independent and lack spatial correlation, failing to accurately reflect the distribution characteristics of real noise. In contrast, the noise generated by LD-NGN not only accurately reproduces its complex patterns but also better preserves the diversity and local features of the noise, making it almost identical to the original ICMOS noise images.
In summary, the noise images generated by the LD-NGN proposed in this paper perform excellently in both quantitative and qualitative evaluations. Whether in terms of statistical distribution similarity (measured by KL divergence and KL-Noise) or visual realism (observed through qualitative comparisons), our model exhibits significant advantages. These results indicate that the proposed method not only generates synthetic noise highly consistent with real noise but also provides higher-quality training data for downstream tasks such as image denoising.

5.3. Real Image Denoising Results

Baseline. We designed several sets of experiments to validate the effectiveness of the noise synthesis network LD-NGN in downstream real-image denoising tasks and to assess the performance of the image denoising network MAST-Net.
First, to validate the effectiveness of the noise synthesis network LD-NGN, we constructed a synthetic dataset, where the clean images were obtained from the captured real dataset and the noisy images were generated using different noise modeling methods, including C2N [29], N2N Flow [30], NeCA [31], and the proposed LD-NGN. Using these synthetic noisy–clean image pairs, we trained MAST-Net on these datasets and evaluated its performance on a test set.
Second, to verify the effectiveness of the denoising network MAST-Net, we used the noisy–clean image pairs generated by LD-NGN to train multiple denoising networks, including the real image denoising model SRM [32], the low-light image enhancement model LLFLOW [33], the complex image denoising model CTNet [34], and MAST-Net. The performance of each model was evaluated on the test set.
Additionally, we conducted comparative evaluations with other methods, including classical denoising algorithms (e.g., BM3D [21], WNNM [35], and K-SVD [36]) and self-supervised image denoising methods (e.g., APBSN [26]). The superiority of the proposed method in denoising tasks was validated through the various abovementioned methods.
Quantitative Evaluations. Table 4 shows the mean denoising results of various baseline methods on different metrics, evaluated on the validation set used in this study, where “Generation-based methods with MAST-Net” compares the effectiveness of the noise synthesis network LD-NGN and “Supervised learning methods with LD-NGN” compares the effectiveness of the denoising network MAST-Net. Table 5 presents the efficiency comparison of different methods. We adopted inference time and the number of parameters as evaluation metrics and conducted the assessment on the same computer device (equipped with an NVIDIA RTX 4090 GPU).
Clearly, the MAST-Net denoising network, trained with the LD-NGN noise model proposed in this paper, achieves excellent performance in both PSNR and SSIM among all the compared methods.
Specifically, these excellent evaluation results not only validate the effectiveness and accuracy of the LD-NGN noise generation model in generating highly realistic and consistent noisy images but also fully demonstrate the outstanding performance of MAST-Net in denoising tasks. LD-NGN can accurately capture the sparsity and spatial clustering characteristics of ICMOS noise, thereby generating training data highly similar to real ICMOS noise, providing a solid foundation for performance improvement in denoising networks. MAST-Net, with its ability to model multi-scale features and long-range dependencies, effectively suppresses complex noise while preserving image detail and structural integrity.
In contrast, other denoising methods, such as SRM and LLFLOW, exhibit certain limitations in denoising performance. Their results often fail to sufficiently remove complex noise and tend to over-smooth the image. Although CTNet is close to our method in terms of the SSIM metric, its larger number of parameters and longer runtime limit its efficiency in practical applications. Denoising networks trained with other noise generators (e.g., C2N, N2N Flow, and NeCA) suffer from inaccurate noise distribution, which leads to insufficient learning of noise characteristics during training and, consequently, poorer denoising performance.
Qualitative Evaluations. To provide a clearer comparison of the effectiveness of the above baseline methods, Figure 14 presents the visual denoising results of different methods in the dataset. It can be observed that denoising networks trained with other noise generation methods (such as C2N, N2N Flow, and NeCA) exhibit significant shortcomings in denoising performance. Specifically, the denoised images generated by these methods still contain many unreasonable noise spots, and their ability to restore image details is poor. The main reason for this phenomenon is that these compared noise generation models fail to adequately simulate the complex characteristics of real ICMOS noisy images, leading to insufficient learning of noise distribution during the training phase of the denoising network, ultimately resulting in poor denoising performance.
Additionally, different denoising networks (such as SRM, LLFLOW, and CTNet) have certain limitations when handling noisy images. These methods typically struggle to balance noise removal and detail preservation, leading to issues such as blurry image details or incomplete noise removal.
In contrast, the training results of the noise generation network LD-NGN and the denoising network MAST-Net proposed in this paper achieve optimal denoising performance. The denoised results are close to the real images, removing noise while preserving finer details and clarity. This can be attributed to the accuracy of LD-NGN in modeling noisy images and the effective handling of both global features and local details by MAST-Net in the denoising task.

5.4. Ablation Study

We conducted ablation experiments to analyze the effectiveness of each component in the LD-NGN network. Through quantitative evaluations on the dataset, we validated the contributions of different loss functions, the noise distribution estimation network, and the local pixel dependency network. In the following, we will provide a detailed discussion on the impact of each design on the LD-NGN network model.
(1) Effect of loss functions: This section first investigates the effectiveness of various loss functions, focusing on the impact of L w g a n 2 and L s t b losses on the performance of the network model. We first use the basic network containing adversarial losses L w g a n 1 and L n l as the benchmark and then add different loss functions on this basis for experiments. Specifically: (1) Base + L w g a n 2 : adding the WGAN adversarial loss, and (2) Base + L s t b : adding the stability loss. Table 6 summarizes the quantitative results of the ablation study. Compared to the baseline network, the models with the added L w g a n 2 and L s t b losses demonstrate significant performance improvements across all evaluation metrics. Specifically, the introduction of L w g a n 2 significantly enhances the generator’s ability to model the noise distribution, making the synthetic noise more statistically consistent with real noise and significantly reducing the distribution gap. The addition of L s t b further stabilizes the model’s training process, effectively improving the spatial consistency of the generated noise and eliminating grayscale discrepancies between the generated noise images and real images. This improvement makes the generated noise visually closer to real noise while significantly enhancing the performance of downstream tasks, such as image denoising.
(2) Effect of encoder E2: Encoder E2 effectively enhances the noise distribution extraction capability of LPD-Net. To verify its effectiveness, we conducted a comparative experiment by removing the E2 module from the complete network. The quantitative evaluation results are shown in Table 7, where the performance of KL and KL-Noise significantly decreased. Visual comparison results are shown in Figure 15, where the noise images generated after removing E2 exhibit deviations in noise structure modeling compared to real images, with some regions showing excessive noise aggregation. In contrast, the output noise images from our original complete network are closer to the real noise images. These results indicate that encoder E2 indeed improves the network’s noise distribution extraction capability.
(3) Effect of the local pixel dependency network (LPD-Net): To verify the contribution of LPD-Net, we designed a dual comparative experiment: 1) replacing LPD-Net with a standard U-Net; 2) removing LPD-Net from the complete network. The quantitative evaluation results are shown in Table 7, and the visual comparison is shown in Figure 15. The results indicate that the introduction of LPD-Net significantly improves the overall performance of the model. Compared to the traditional U-Net, LPD-Net demonstrates a stronger ability to capture the spatial dependencies between local pixels and complex noise patterns. Specifically, LPD-Net is more adept at handling complex spatially aggregated noise, and the noise it generates is highly consistent with real noise in detail and structure. In contrast, removing the LPD module results in noise images with noticeable granularity, which significantly deviates from the real ICMOS noise distribution, further validating the rationality of the network design proposed in this paper. These results indicate that LPD-Net effectively models the nonlinear dependencies between pixels, providing a theoretically explainable solution for extracting complex noise patterns.

6. Conclusions

This paper addresses the issues of poor denoising algorithm performance and insufficient model generalization in ICMOS systems by combining noise modeling and image denoising algorithms to enhance ICMOS image denoising. The main contributions of this work are as follows:
(1) We constructed an experimental platform to capture real ICMOS images and proposed a novel noise synthesis network, LD-NGN, which accurately simulates the strong sparsity and spatial clustering characteristics of ICMOS noise to generate multi-scene paired ICMOS datasets. Additionally, we introduced a new noise evaluation metric, KL-Noise, which enables more precise quantification of noise distribution. The experimental results show that the proposed noise modeling approach reduces KL divergence by 0.03 compared to state-of-the-art noise generation methods, achieving superior performance in capturing fine noise details.
(2) We propose a denoising network, MAST-Net, specifically designed for ICMOS images. By training it on the multi-scene paired dataset generated by LD-NGN, MAST-Net leverages a multi-scale attention mechanism to capture image pixel features and effectively suppress complex noise. The experimental results demonstrate that, compared to original ICMOS noisy images, the proposed method improves the SNR by 8.37 dB and structural similarity (SSIM) by 0.254. Additionally, compared to state-of-the-art denoising algorithms, it outperforms state-of-the-art denoising algorithms by enhancing SNR by 0.22 dB and SSIM by 0.026, with notably superior qualitative visual performance.
For practical system implementation, the deployment path can be divided into two stages. First, the proposed network can be deployed on desktop computing platforms (e.g., GPU) to perform end-to-end denoising on ICMOS noisy images. Second, through model quantization and pruning techniques, the network can be compressed to a scale suitable for FPGA/DSP embedded hardware, and a dedicated AI acceleration core can be integrated into the sensor readout circuit to achieve real-time on-chip denoising. This method can be widely applied in scenarios such as deep-space ultraviolet detection and night vision detection, effectively improving target recognition capabilities and detail resolution. It provides strong technical support for advancements in related fields.
Although the proposed method demonstrates significant performance advantages in ICMOS image denoising tasks, it also has certain limitations. First, the method is currently trained and inferred on desktop-level computing platforms (e.g., GPU) and has not yet been deployed on embedded mobile platforms; it thus lacks real-time image denoising capabilities. Second, the current method primarily focuses on image denoising under weak signal conditions, and its effectiveness in single-photon-level extremely low-light scenarios requires further investigation. In the future, we plan to explore additional tasks. On the one hand, we will consider more extremely low-light scenarios to further optimize denoising performance; on the other hand, we aim to achieve lightweight network deployment on mobile platforms (e.g., FPGA) to enable the real-time processing of ICMOS images and extend the method to dynamic video denoising.

Author Contributions

Conceptualization, Y.L. and L.F.; methodology, all authors; software, Y.L., T.Z. and B.Z.; validation, R.L. and N.J.; formal analysis, Y.L. and L.F.; investigation, N.J.; resources, L.F. and N.J.; data curation, Y.L.; writing—original draft preparation, Y.L.; visualization, Y.L., T.Z. and B.Z.; supervision, L.F. and N.J.; project administration, L.F.; funding acquisition, L.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 42174226 and the National Key R & D Program of China (Grant 2022YFF0503901).

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Acknowledgments

The authors thank the researchers who provided the ICMOS system, and the authors also wish to thank the editors and the reviewers for their valuable suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Tuttle, S.; Matsumura, M.; Ardila, D.R.; Chen, P.; Davis, M.; Ertley, C.; Farr, E.; Fleming, B.; France, K.; Froning, C.; et al. Ultraviolet Technology To Prepare For The Habitable Worlds Observatory. arXiv 2024, arXiv:2408.07242. [Google Scholar]
  2. Martin, C.; Jelinsky, P.; Lampton, M.; Malina, R.F.; Anger, H.O. Wedge-and-strip Anodes for Centroid-finding Position-sensitive Photon and Particle Detectors. Rev. Sci. Instrum. 1981, 52, 1067–1074. [Google Scholar] [CrossRef]
  3. Siegmund, O.H.W.; McPhate, J.B.; Vallerga, J.V.; Tremsin, A.S.; Jelinsky, S.R.; Frisch, H.J. Novel Large Format Sealed Tube Microchannel Plate Detectors for Cherenkov Timing and Imaging. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip. 2011, 639, 165–168. [Google Scholar] [CrossRef]
  4. Tremsin, A.S.; Siegmund, O.H.W. UV Radiation Resistance and Solar Blindness of CsI and KBr Photocathodes. IEEE Trans. Nucl. Sci. 2001, 48, 421–425. [Google Scholar] [CrossRef]
  5. Yang, M.; Wang, F.; Wang, Y.; Zheng, N. A Denoising Method for Randomly Clustered Noise in ICCD Sensing Images Based on Hypergraph Cut and Down Sampling. Sensors 2017, 17, 2778. [Google Scholar] [CrossRef]
  6. Wang, F.; Wang, Y.; Yang, M.; Zhang, X.; Zheng, N. A Denoising Scheme for Randomly Clustered Noise Removal in ICCD Sensing Image. Sensors 2017, 17, 233. [Google Scholar] [CrossRef]
  7. Yu, Y.; Pan, E.; Ma, Y.; Mei, X.; Chen, Q.; Ma, J. UnmixDiff: Unmixing-Based Diffusion Model for Hyperspectral Image Synthesis. IEEE Trans. Geosci. Remote Sens. 2024, 62, 3425517. [Google Scholar] [CrossRef]
  8. Zhong, Y.; Zhang, S.; Liu, Z.; Zhang, X.; Mo, Z.; Zhang, Y.; Hu, H.; Chen, W.; Qi, L. Unsupervised Fusion of Misaligned PAT and MRI Images via Mutually Reinforcing Cross-Modality Image Generation and Registration. IEEE Trans. Med. Imaging 2024, 43, 1702–1714. [Google Scholar] [CrossRef]
  9. Ma, R.; Ma, T.; Guo, D.; He, S. Novel View Synthesis and Dataset Augmentation for Hyperspectral Data Using NeRF. IEEE Access 2024, 12, 45331–45341. [Google Scholar] [CrossRef]
  10. Son, Y.; Jeong, S.; Hong, Y.; Lee, J.; Jeon, B.; Choi, H.; Kim, J.; Shim, H. Improvement in Image Quality of Low-Dose CT of Canines with Generative Adversarial Network of Anti-Aliasing Generator and Multi-Scale Discriminator. Bioengineering 2024, 11, 944. [Google Scholar] [CrossRef]
  11. Zheng, Z.; Wang, M.; Zhao, X.; Weng, Z. Adltformer Team-Training with Detr: Enhancing Cattle Detection in Non-Ideal Lighting Conditions Through Adaptive Image Enhancement. Animals 2024, 14, 3635. [Google Scholar] [CrossRef] [PubMed]
  12. Yu, C.; Han, C.; Zhang, C. Multi-Source Training-Free Controllable Style Transfer via Diffusion Models. Symmetry 2025, 17, 290. [Google Scholar] [CrossRef]
  13. Guo, L.; Huang, S.; Liu, H.; Wen, B. Toward Robust Image Denoising via Flow-Based Joint Image and Noise Model. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 6105–6115. [Google Scholar] [CrossRef]
  14. Wei, K.; Fu, Y.; Yang, J.; Huang, H. A Physics-Based Noise Formation Model for Extreme Low-Light Raw Denoising. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2755–2764. [Google Scholar]
  15. Chang, K.-C.; Wang, R.; Lin, H.-J.; Liu, Y.-L.; Chen, C.-P.; Chang, Y.-L.; Chen, H.-T. Learning Camera-Aware Noise Models. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 343–358. [Google Scholar]
  16. Chen, J.; Chen, J.; Chao, H.; Yang, M. Image Blind Denoising with Generative Adversarial Network Based Noise Modeling. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3155–3164. [Google Scholar]
  17. Li, B.; Xie, D.; Wu, Y.; Zheng, L.; Xu, C.; Zhou, Y.; Fu, Y.; Wang, C.; Liu, B.; Zuo, X. Synthesis and Detection Algorithms for Oblique Stripe Noise of Space-Borne Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 3360268. [Google Scholar] [CrossRef]
  18. Zhang, W.; Zhang, R.; Wang, G.; Li, W.; Liu, X.; Yang, Y.; Hu, D. Physics Guided Remote Sensing Image Synthesis Network for Ship Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 3248106. [Google Scholar] [CrossRef]
  19. Wang, Y.; Qian, Y.; Kong, X. Photon Counting Based on Solar-Blind Ultraviolet Intensified Complementary Metal-Oxide-Semiconductor (ICMOS) for Corona Detection. IEEE Photonics J. 2018, 10, 2876514. [Google Scholar] [CrossRef]
  20. Aimin, Z.; Range, T. Denoising and Fusion Method of Night Vision Image Based on Wavelet Transform. Electron. Meas. Technol. 2015, 38, 38–40. [Google Scholar] [CrossRef]
  21. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095. [Google Scholar] [CrossRef]
  22. Wu, W.; Liu, S.; Zhou, Y.; Zhang, Y.; Xiang, Y. Dual Residual Attention Network for Image Denoising. Pattern Recognit. 2024, 149, 110291. [Google Scholar]
  23. Zhang, X.; Wang, X.; Yan, C. LL-CSFormer: A Novel Image Denoiser for Intensified CMOS Sensing Images under a Low Light Environment. Remote Sens. 2023, 15, 2483. [Google Scholar] [CrossRef]
  24. Lubberts, G. Random Noise Produced by X-Ray Fluorescent Screens. J. Opt. Soc. Am. 1968, 58, 1475. [Google Scholar] [CrossRef]
  25. Edgar, M.L.; Kessel, R.; Lapington, J.S.; Walton, D.M. Spatial Charge Cloud Distribution of Microchannel Plates. Rev. Sci. Instrum. 1989, 60, 3673–3680. [Google Scholar] [CrossRef]
  26. Lee, W.; Son, S.; Lee, K.M. AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 17704–17713. [Google Scholar]
  27. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN 2017. arXiv 2017, arXiv:1701.07875. [Google Scholar]
  28. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  29. Jang, G.; Lee, W.; Son, S.; Lee, K. C2N: Practical Generative Noise Modeling for Real-World Denoising. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 10–17 October 2021; pp. 2330–2339. [Google Scholar]
  30. Maleky, A.; Kousha, S.; Brown, M.S.; Brubaker, M.A. Noise2NoiseFlow: Realistic Camera Noise Modeling without Clean Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 4–18 June 2022. [Google Scholar]
  31. Fu, Z.; Guo, L.; Wen, B. sRGB Real Noise Synthesizing with Neighboring Correlation-Aware Noise Model. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 1683–1691. [Google Scholar]
  32. Fan, C.-M.; Liu, T.-J.; Liu, K.-H.; Chiu, C.-H. Selective Residual M-Net for Real Image Denoising. In Proceedings of the 2022 30th European Signal Processing Conference (EUSIPCO), Belgrade, Serbia, 29 August–2 September 2022; pp. 469–473. [Google Scholar]
  33. Wang, Y.; Wan, R.; Yang, W.; Li, H.; Chau, L.-P.; Kot, A.C. Low-Light Image Enhancement with Normalizing Flow. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021. [Google Scholar]
  34. Tian, C.; Zheng, M.; Zuo, W.; Zhang, S.; Zhang, Y.; Lin, C.W. A Cross Transformer for Image Denoising. Inf. Fusion 2024, 102, 102043. [Google Scholar]
  35. Gu, S.; Zhang, L.; Zuo, W.; Feng, X. Weighted Nuclear Norm Minimization with Application to Image Denoising. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2862–2869. [Google Scholar]
  36. Aharon, M.; Elad, M.; Bruckstein, A. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Trans. Signal Process. 2006, 54, 4311–4322. [Google Scholar] [CrossRef]
Figure 1. (a) Schematic of the ICMOS image intensifier structure; (b) the ICMOS imaging system.
Figure 1. (a) Schematic of the ICMOS image intensifier structure; (b) the ICMOS imaging system.
Remotesensing 17 01219 g001
Figure 2. Physical image of the microchannel plate (MCP).
Figure 2. Physical image of the microchannel plate (MCP).
Remotesensing 17 01219 g002
Figure 3. Physical image of the phosphor screen.
Figure 3. Physical image of the phosphor screen.
Remotesensing 17 01219 g003
Figure 4. Comparison between the ICMOS image and the Gaussian white noise image.
Figure 4. Comparison between the ICMOS image and the Gaussian white noise image.
Remotesensing 17 01219 g004
Figure 5. ICMOS noise processing technical framework diagram.
Figure 5. ICMOS noise processing technical framework diagram.
Remotesensing 17 01219 g005
Figure 6. (A) Overview of the proposed LD-NGN network, which includes the noise distribution estimation network and the local pixel dependency network (LPD-Net). (B) The residual block, Res2Net module and the LeWin module.
Figure 6. (A) Overview of the proposed LD-NGN network, which includes the noise distribution estimation network and the local pixel dependency network (LPD-Net). (B) The residual block, Res2Net module and the LeWin module.
Remotesensing 17 01219 g006
Figure 7. (A) Overview of the proposed MAST-Net. PixelShuffle and pixelUnshuffle are the downsampling layer with a factor of 2 and the upsampling layer with a factor of 2. (B) Structure of the proposed Attentive Stage Block. (C) Structure of the proposed Channel Decoder Block.
Figure 7. (A) Overview of the proposed MAST-Net. PixelShuffle and pixelUnshuffle are the downsampling layer with a factor of 2 and the upsampling layer with a factor of 2. (B) Structure of the proposed Attentive Stage Block. (C) Structure of the proposed Channel Decoder Block.
Remotesensing 17 01219 g007
Figure 8. (a) Clean image; (b) Gaussian noise; (c) uniform noise image; (d) comparison of Gaussian noise and uniform noise histograms; (e) comparison of histograms for an image with added Gaussian noise and an image with added uniform noise.
Figure 8. (a) Clean image; (b) Gaussian noise; (c) uniform noise image; (d) comparison of Gaussian noise and uniform noise histograms; (e) comparison of histograms for an image with added Gaussian noise and an image with added uniform noise.
Remotesensing 17 01219 g008
Figure 9. The noise level evaluation values obtained through adaptive noise assessment histogram statistics after adding different types of noise to the images in Figure 8.
Figure 9. The noise level evaluation values obtained through adaptive noise assessment histogram statistics after adding different types of noise to the images in Figure 8.
Remotesensing 17 01219 g009
Figure 10. The ICMOS imaging system data acquisition experimental platform.
Figure 10. The ICMOS imaging system data acquisition experimental platform.
Remotesensing 17 01219 g010
Figure 11. The KL-Noise proposed in this paper is used to calculate the noise distribution evaluation for various noise simulation methods, and the noise evaluation results are statistically analyzed through histograms.
Figure 11. The KL-Noise proposed in this paper is used to calculate the noise distribution evaluation for various noise simulation methods, and the noise evaluation results are statistically analyzed through histograms.
Remotesensing 17 01219 g011
Figure 12. Under 10 2 lx attenuation lighting conditions, the visual comparison of synthetic noise samples using various methods. Below each image is the corresponding residual noise map.
Figure 12. Under 10 2 lx attenuation lighting conditions, the visual comparison of synthetic noise samples using various methods. Below each image is the corresponding residual noise map.
Remotesensing 17 01219 g012
Figure 13. Under 10 3 lx attenuation lighting conditions, the visual comparison of synthetic noise samples using various methods. Below each image is the corresponding residual noise map.
Figure 13. Under 10 3 lx attenuation lighting conditions, the visual comparison of synthetic noise samples using various methods. Below each image is the corresponding residual noise map.
Remotesensing 17 01219 g013
Figure 14. Visual comparison of denoising results; the denoising methods include a traditional denoising algorithm, an image self-supervised denoising algorithm, and a generation-based supervised denoising algorithm.
Figure 14. Visual comparison of denoising results; the denoising methods include a traditional denoising algorithm, an image self-supervised denoising algorithm, and a generation-based supervised denoising algorithm.
Remotesensing 17 01219 g014
Figure 15. Visual comparison of the effectiveness of each module in the proposed LD-NGN network.
Figure 15. Visual comparison of the effectiveness of each module in the proposed LD-NGN network.
Remotesensing 17 01219 g015
Table 1. Comparative analysis of different ICMOS denoising methods.
Table 1. Comparative analysis of different ICMOS denoising methods.
Method CategoryAdvantagesDisadvantagesReferences
Traditional MethodsWide applicability1. High computation complexity;
2. Limited denoising performance
[1,2]
Deep Learning MethodsGood denoising effect1. Heavy reliance on large training datasets;
2. Poor model generalization
[22,23]
Table 2. Quantitative results of synthesized noise: KL divergence and KL-Noise between the generated noise maps and the real noise maps.
Table 2. Quantitative results of synthesized noise: KL divergence and KL-Noise between the generated noise maps and the real noise maps.
MethodKLKL-NoiseReference
AWGN1.438110.1913-
C2N0.16522.1157[29]
N2N0.37861.5824[30]
NeCA0.04220.7465[31]
Ours0.01210.1644This work
Real0.00740.0941-
Table 3. Comparison of model parameters and inference time across different noise synthesis methods. The results from all the compared methods in this table are obtained by inferring an image with a size of 512 × 512 on the same GPU device (i.e., an NVIDIA RTX 4090 GPU).
Table 3. Comparison of model parameters and inference time across different noise synthesis methods. The results from all the compared methods in this table are obtained by inferring an image with a size of 512 × 512 on the same GPU device (i.e., an NVIDIA RTX 4090 GPU).
MethodC2NN2NNeCAOurs
Param.2.15M0.7M8.07M5.58M
Inference time78 ms1.9 ms18 ms25 ms
Table 4. Quantitative evaluation of denoising performance, including traditional methods, self-supervised methods, and a two-stage pipeline: “Generation-based methods with MAST-Net” and “Supervised learning methods with LD-NGN”.
Table 4. Quantitative evaluation of denoising performance, including traditional methods, self-supervised methods, and a two-stage pipeline: “Generation-based methods with MAST-Net” and “Supervised learning methods with LD-NGN”.
MethodPSNR (dB)SSIMReference
Traditional MethodBM3D31.4480.862[21]
WNNM28.720.749[35]
K-SVD27.890.705[36]
Self-supervisedAP-BSN32.120.859[26]
Generation-basedC2N + MAST-Net30.530.871[29]
N2N + MAST-Net30.810.860[30]
NeCA + MAST-Net32.450.879[31]
Supervised LearningSRM + LD-NGN34.860.904[32]
LLFLOW + LD-NGN34.360.908[33]
CTNet + LD-NGN35.160.928[34]
Ours
(LD-NGN+ MAST-Net)
35.380.930This work
Noise27.010.676-
Table 5. Comparison of model parameters and inference time across different denoise methods. The results from all the compared methods in this table are obtained by inferring an image with a size of 512 × 512 on the same GPU device (i.e., an NVIDIA RTX 4090 GPU).
Table 5. Comparison of model parameters and inference time across different denoise methods. The results from all the compared methods in this table are obtained by inferring an image with a size of 512 × 512 on the same GPU device (i.e., an NVIDIA RTX 4090 GPU).
MetricsBM3DAP-BSNSRMLLFLOWCTNetOurs
Param.-3.66 M37.59 M5.43 M54.5 M27.5 M
Inference time3059 ms2.21 ms347 ms48 ms2343 ms246 ms
Table 6. Quantitative comparison of loss functions in the proposed LD-NGN model.
Table 6. Quantitative comparison of loss functions in the proposed LD-NGN model.
MethodKLKL-Noise
Base0.09720.5146
Base+ Lwgan20.06920.1813
Base+ Lstb0.09370.3115
all0.01210.1644
Table 7. Comparison of the effectiveness of each module in the proposed LD-NGN network.
Table 7. Comparison of the effectiveness of each module in the proposed LD-NGN network.
MethodKLKL-Noise
Without decoder E20.12860.3401
Replace LPD-Net With U-Net0.02380.1964
Without LPD-Net0.32851.4345
ours0.01210.1644
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Luo, Y.; Zhang, T.; Li, R.; Zhang, B.; Jia, N.; Fu, L. A Novel Framework for Real ICMOS Image Denoising: LD-NGN Noise Modeling and a MAST-Net Denoising Network. Remote Sens. 2025, 17, 1219. https://doi.org/10.3390/rs17071219

AMA Style

Luo Y, Zhang T, Li R, Zhang B, Jia N, Fu L. A Novel Framework for Real ICMOS Image Denoising: LD-NGN Noise Modeling and a MAST-Net Denoising Network. Remote Sensing. 2025; 17(7):1219. https://doi.org/10.3390/rs17071219

Chicago/Turabian Style

Luo, Yifu, Ting Zhang, Ruizhi Li, Bin Zhang, Nan Jia, and Liping Fu. 2025. "A Novel Framework for Real ICMOS Image Denoising: LD-NGN Noise Modeling and a MAST-Net Denoising Network" Remote Sensing 17, no. 7: 1219. https://doi.org/10.3390/rs17071219

APA Style

Luo, Y., Zhang, T., Li, R., Zhang, B., Jia, N., & Fu, L. (2025). A Novel Framework for Real ICMOS Image Denoising: LD-NGN Noise Modeling and a MAST-Net Denoising Network. Remote Sensing, 17(7), 1219. https://doi.org/10.3390/rs17071219

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop