Article

Physics-Based Noise Modeling and Deep Learning for Denoising Permanently Shadowed Lunar Images

College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(5), 2358; https://doi.org/10.3390/app15052358
Submission received: 1 February 2025 / Revised: 12 February 2025 / Accepted: 21 February 2025 / Published: 22 February 2025
(This article belongs to the Section Applied Physics General)

Abstract

The Narrow-Angle Cameras (NACs) onboard the Lunar Reconnaissance Orbiter Camera (LROC) capture lunar images that play a crucial role in current lunar exploration missions. Among these images, those of the Moon’s permanently shadowed regions (PSRs) are highly noisy, obscuring the lunar topographic features within these areas. While significant advancements have been made in denoising techniques based on deep learning, the direct acquisition of paired clean and noisy images from the PSRs of the Moon is costly, making dataset acquisition expensive and hindering network training. To address this issue, we employ a physical noise model based on the imaging principles of the LROC NACs to generate noisy pairs of images for the Moon’s PSRs, simulating realistic lunar imagery. Furthermore, inspired by the ideas of full-scale skip connections and self-attention models (Transformers), we propose a denoising method based on deep information convolutional neural networks. Using a dataset synthesized through the physical noise model, we conduct a comparative analysis between the proposed method and existing state-of-the-art denoising approaches. The experimental results demonstrate that the proposed method can effectively recover topographic features obscured by noise, achieving the highest quantitative metrics and superior visual results.

1. Introduction

The Moon, as a core target for human deep-space exploration, holds irreplaceable value in scientific research, resource exploration, and mission planning through its high-resolution surface images. These images can accurately reveal the Moon’s topographic and geological features, providing critical data to study its evolutionary history, crustal composition, and resource distribution, while also laying the foundation for the safe landing and execution of future missions. Currently, high-resolution images captured by the Lunar Reconnaissance Orbiter Camera (LROC) Narrow-Angle Cameras (NACs) have become a vital data source for lunar research, offering important insights for topographic analysis, geological composition studies, and geomorphic process exploration, significantly advancing the study of the Moon’s formation history, resource assessment, and precision landing missions [1,2]. Moreover, the permanently shadowed regions (PSRs) of the Moon, due to their unique polar lighting conditions, have garnered significant attention. Due to the Moon’s small axial tilt and low solar elevation angle, deep craters in the polar regions receive no direct sunlight year-round. They primarily rely on weak scattered light from nearby terrain and starlight, resulting in a very limited number of photons detected by cameras. Consequently, photon noise and sensor noise dominate the LROC NAC images [3,4], obscuring key topographic information and urgently requiring effective denoising techniques to improve image quality. Enhancing the clarity of PSR images is crucial for advancing lunar exploration.
In the field of traditional image denoising, several classic methods have been proposed, which are generally categorized into three main types: filtering methods, variational methods, and statistical modeling methods [5,6,7,8,9]. Among the filtering methods, the non-local means filter, introduced by Buades et al. (2005), utilizes the self-similarity of images by calculating similarity weights between pixel blocks and performing weighted averaging to achieve denoising. This method is effective in preserving image details and textures [10]. In the realm of variational methods, the total variation denoising model proposed by Rudin et al. (1992) removes noise by minimizing the total variation of the image. This model assumes that natural images exhibit piecewise smooth characteristics, allowing it to smooth noise in homogeneous regions while preserving edge information. It is particularly suitable for images with prominent edge structures [11]. In the domain of statistical modeling methods, Portilla et al. (2003) introduced the Gaussian scale mixture model for wavelet-domain denoising. By applying wavelet transforms to decompose the image and statistically modeling the wavelet coefficients, they assumed the coefficients followed a Gaussian scale mixture distribution and optimized them through Bayesian estimation. This significantly enhanced denoising performance, especially in terms of preserving image details and textures [12]. Additionally, Dabov et al. (2007) proposed the block-matching and 3D transform-domain filtering method, which groups similar image blocks and performs collaborative filtering in the transform domain. This method leverages the sparsity of the transform coefficients to remove noise, making it particularly effective for Gaussian noise removal [13]. These traditional denoising methods have demonstrated excellent performance across various noise models and are capable of effectively preserving both image details and structure.
With the development of deep learning technologies, an increasing number of studies have adopted deep learning methods to tackle denoising problems; these methods typically perform well under various noise conditions [14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]. Convolutional Neural Networks are the most commonly used denoising models. Among these, Huang et al. (2021) first proposed a random neighbor sub-sampler to generate training image pairs, then trained a denoising network on the subsampled training pairs from the first stage and introduced an additional loss function for better performance. This method eliminates the need for noise-clean image pairs, multiple noise observations, or explicit noise modeling [14]. Cheng et al. (2020) designed a non-local subspace attention module for learning basis generation and subspace projection, training a network that separates signal and noise by learning a set of reconstruction bases in the feature space. This method successfully recovers high-quality images in scenes with weak textures or high-frequency details [15]. Liu et al. (2021) proposed a lightweight neural network solution to handle complex noise in real images, such as shot noise and companding noise. The network is reversible and suitable for denoising tasks, improving performance through deep supervised learning. Its lightweight design makes it applicable to embedded systems and mobile devices while achieving efficient denoising without losing original image information [16]. Byun et al. (2021) focused on addressing Poisson–Gaussian mixed noise, proposing a fast blind denoiser that effectively removes mixed noise from images without prior knowledge of the noise level [17].
In summary, traditional denoising strategies are computationally efficient and perform well under relatively simple conditions but often struggle with more complex noise distributions. On the other hand, deep learning-based methods, although often requiring more computational resources, exhibit strong capabilities in extracting significant features and handling complex noise types [21]. Given these advantages, data-intensive projects increasingly prefer deep learning methods in scenarios requiring higher-quality outputs, especially when ample data and computational power are available.
Although significant progress has been made in image denoising, existing research primarily focuses on Earth images, with relatively little attention given to denoising lunar PSR images. Compared to Earth images, lunar PSR images have significantly reduced contrast due to limited lighting conditions, making it difficult to separate topographic features and geological structures, thus increasing the complexity of subsequent analysis. Additionally, the working conditions of the LROC NAC imaging system differ from those of Earth cameras, and the images may contain various noise components, such as Poisson noise and stripe noise, significantly increasing denoising difficulty. Furthermore, the variation in solar incident angle and the diversity of terrain lead to uneven lighting, with some areas being dim while others are relatively bright, further complicating image processing.
Therefore, applying traditional models trained on Earth image denoising datasets to lunar PSR images presents challenges in effectively removing photon noise and stripe noise. This study proposes a denoising method for lunar PSR images that combines a physical noise model and deep learning techniques [29,30,31,32,33,34]. The main contributions of this study can be summarized as follows:
(1)
This study integrates the physical noise model with deep learning, simulating the noise distribution and intensity in LROC NAC lunar PSR images, enhancing the denoising capability of the model and providing a physics-based simulation method for constructing training datasets, thereby improving the physical interpretability of the network model.
(2)
This study proposes a full-scale feature refinement and local enhancement shifted window multi-head self-attention Transformer (FRET) network, which effectively captures multi-resolution noise and details through full-scale skip connections and enhances denoising accuracy and detail recovery using the self-attention mechanism of Transformers.
(3)
This study proposes the REWin Transformer module, which combines the shifted window multi-head self-attention mechanism (SW-MSA), the gated feature refinement feedforward neural network (GFRN), and the gated convolution-based local enhancement feedforward neural network (GCEN). By refining feature maps, removing redundant information, and enhancing detail recovery, the denoising performance and texture reconstruction ability are significantly improved.
The structure of this paper is arranged as follows: Section 2 provides an overview of the dataset used in this study. Section 3 elaborates on the methods employed. Section 4 presents a comparative analysis of the experimental results for different methods, highlighting their key differences, along with an ablation study to analyze the contribution of each proposed improvement. Section 5 discusses the significance and versatility of the proposed method, as well as the limitations of the physical noise model used in this study. Finally, Section 6 summarizes the main contributions and findings of this study.

2. Datasets

2.1. LROC NAC Data Information

The LROC NAC dataset is captured by two high-resolution NACs onboard the LROC, providing a resolution ranging from 0.5 to 2.0 m per pixel. The image size can reach up to 10,000 × 52,000 pixels, covering ground areas from approximately 5 km × 26 km to 20 km × 100 km [35]. From September 2009 to December 2011, LRO operated in a circular polar orbit at an altitude of 50 km, with an image resolution of approximately 0.5 m per pixel. After December 2011, LRO transitioned to an elliptical polar orbit, where the resolution for the northern hemisphere images decreased to 2.0–1.0 m per pixel, while the resolution for the southern hemisphere improved to 1.0–0.4 m per pixel [35]. Additionally, the LROC NAC supports a “summation” mode, where pixel merging is used to reduce the image width to 2532 pixels, thereby doubling the pixel scale to enhance the signal-to-noise ratio under low-light conditions [35].

2.2. Simulated Data of Lunar PSRs

The core challenge in obtaining high-quality permanent shadow region images lies in the scarcity of photon reflections from these low-light areas, where photon noise and sensor noise dominate [3]. Therefore, in this study, we adopted the physical noise model proposed by Moseley et al. (2021) [29] to train the deep learning model. This model, based on the imaging principles of the LROC NAC charge-coupled device (CCD), is used to simulate images of the permanent shadow regions. The specific noise synthesis formula is shown in Equation (1), and the flowchart of the process is illustrated in Figure 1.
I = N(F × (S + N_p)) + N_d(T) + N_b + N_r + N_c        (1)
In the above equation, the mean photon signal S represents the ideal image without interference, while I denotes the simulated low-light image, and N_p represents photon noise. The flat-field response is described by F, and N captures the nonlinear response of the camera. The term N_d(T) represents dark-current noise, where T is the temperature of the CCD in the NAC camera. Additionally, N_b stands for dark bias noise, N_r for read noise, and N_c for companding noise.
Specifically, photon noise N_p arises from the random arrival of photons at the CCD sensor and follows a Poisson distribution. When recording digital number (DN) values, the LROC NAC responds linearly to the photon signal S. However, when DN values fall below 600, the response becomes nonlinear, necessitating the application of a nonlinear response correction function. The flat-field correction F addresses pixel-to-pixel sensitivity differences in the NACs and is applied by multiplying the forward response by the flat-field response. In the noise superposition process, only noiseless sunlit images are used; these contain only the forward response and must therefore be multiplied by the flat-field response F. The flat-field response is generated from a specialized calibration image and normalized so that the mean value of its central region is 1.0 [36]. Read noise N_r is the random system noise generated when carriers are converted into analog voltage signals. Dark noise consists of two components: dark-current noise N_d(T) and dark bias noise N_b. Dark-current noise originates from thermally excited carriers in the CCD and manifests as horizontal and vertical stripes, while dark bias noise is deterministic and appears primarily as vertical stripes; dark noise can therefore be regarded as a form of stripe noise. To simulate dark noise, we randomly selected real dark frames containing dark-current noise, dark bias noise, and read noise; companding noise N_c was not introduced at this step, as these frames use a one-to-one mapping from the 12-bit image to the 8-bit format. Companding noise N_c is the noise generated when a NAC image undergoes lossy compression from 12-bit to 8-bit, reflecting the core limitation of DN resolution. For the NACs, a compression scheme specifically designed for images with DN values lower than 500 is used. The noise generation process applies this compression function to the synthetic noisy image and then inverts it. Figure 2 illustrates the specific process by which the compression function reduces 12-bit values to 8-bit values [29].
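To make the synthesis pipeline concrete, the sketch below strings the components of Equations (1) and (2) together in NumPy. It only illustrates the ordering (photon noise, flat field, nonlinear response, dark frame, companding): the gain, the nonlinearity below 600 DN, and the square-root companding curve are placeholder assumptions rather than the calibrated LROC NAC functions, and the real pipeline draws F and the dark frame from actual calibration files.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_noisy_psr(clean, flat_field, dark_frame, gain=20.0):
    """Illustrative sketch of Equation (2): noisy = N(F * (S + N_p)) + D + N_c.

    `clean` is a sunlit NAC image in DN, `flat_field` the per-pixel response F,
    and `dark_frame` a real calibration frame bundling dark-current, dark-bias
    and read noise.  `gain` and the nonlinearity/companding shapes below are
    placeholders, not the calibrated LROC NAC values.
    """
    # Photon (shot) noise: Poisson statistics around the mean photon signal S
    photons = rng.poisson(np.clip(clean, 0, None) * gain) / gain
    # Flat-field response F models per-pixel sensitivity differences
    signal = flat_field * photons
    # Nonlinear response N: the NAC deviates from linear below ~600 DN (placeholder curve)
    signal = np.where(signal < 600, signal * (0.9 + 0.1 * signal / 600.0), signal)
    # Add a measured dark frame (stripe-like dark current + dark bias + read noise)
    noisy = signal + dark_frame
    # Companding noise N_c: lossy 12-bit -> 8-bit -> 12-bit round trip (placeholder scheme)
    compressed = np.round(np.sqrt(np.clip(noisy, 0, 4095)) * (255.0 / np.sqrt(4095)))
    noisy = (compressed * (np.sqrt(4095) / 255.0)) ** 2
    return noisy
```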
The main difference between sunlight-exposed images and PSR images lies in the lighting environment. Sunlight-exposed images capture areas directly exposed to sunlight, while PSR images record areas illuminated only by indirect light. To better simulate the lighting conditions of the PSRs in the training data, we selected sunlight-exposed images whose secondary incident light angles are similar to the expected angles in the PSRs. These angles were determined by analyzing secondary incident light angles across several craters, which allowed for the specification of the lighting range for the training data. Figure 3 illustrates the distribution of incident angles for the selected training data and their corresponding range [29].
In summary, the clean image S̃ represents the average photon signal, and its relationship to the noisy image Ĩ is given by Equation (2). Figure 4 provides a schematic illustration of the synthetic samples: Figure 4a shows a noise-free image captured under adequate lighting, while Figure 4b displays the noisy image generated by adding noise from the physical noise model to the image in Figure 4a. Notably, the image in Figure 4b exhibits prominent horizontal and vertical stripes caused by dark noise, brightness variations due to the flat-field and nonlinear responses, as well as Poisson-distributed photon noise mixed with read noise and companding noise.
Ĩ = N(F × (S̃ + N_p)) + D + N_c        (2)
Here, D denotes the combined dark noise (dark-current, dark bias, and read noise) taken from real dark frames.
We selected 1093 original lunar surface images, spanning from 75° to −75° lunar latitude, which meet the physical noise model requirements as baseline inputs for noise synthesis. After systematic cropping and noise addition, the synthetic dataset consisted of 77,310 pairs of aligned images. Data partitioning followed a stratified distribution, with 10% retained for evaluation, while the remaining 90% was split 4:1 for model training and validation.
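The partitioning arithmetic is straightforward; a small sketch using torch.utils.data.random_split on dummy indices is shown below (the stratification detail is ignored and the seed is an arbitrary assumption).

```python
import torch
from torch.utils.data import TensorDataset, random_split

total = 77_310                           # synthesized image pairs reported above
n_test = int(0.10 * total)               # 10% retained for evaluation
n_val = int((total - n_test) * 0.20)     # remaining 90% split 4:1 (train:validation)
n_train = total - n_test - n_val

# Dummy stand-in for the real paired dataset (indices only, for illustration).
indices = TensorDataset(torch.arange(total))
g = torch.Generator().manual_seed(42)    # seed value is an arbitrary choice
train_set, val_set, test_set = random_split(indices, [n_train, n_val, n_test], generator=g)
print(len(train_set), len(val_set), len(test_set))   # 55664 13915 7731
```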
To validate the effectiveness of training on the simulated dataset, we selected unprocessed real lunar PSR images of the Moon’s south polar region as the real-world test set. Figure 5 presents some representative samples from this test set.

3. Methods

In this section, we first describe the overall pipeline and hierarchical structure of the FRET network for image denoising. We then provide detailed information about the key component of the proposed network, the REWin block.

3.1. Overall Pipeline

As shown in Figure 6, the proposed FRET is a hierarchical network with a U-shaped structure, where full-scale skip connections are employed between the decoder and encoder to expand the receptive field and capture multi-scale information. The specific process is as follows:
Given a noisy lunar PSR image Ĩ ∈ ℝ^{1×H×W}, FRET first applies a convolution operation to map the input image to low-level features. In the encoding phase, following the U-shaped design, the feature map passes through N encoding layers with downsampling, where each encoding layer is a REWin Transformer block.
For the decoding phase, the proposed FRET consists of N−1 decoding layers, each also being a REWin Transformer block. Because simple skip connections cannot fully capture information from all scales, we introduce full-scale skip connections in the decoding part of the network to enhance its ability to learn multi-scale information. The full-scale skip connections modify the interconnections between the encoder and decoder, as well as within the decoder subnetwork [51]. This approach allows each decoding layer to merge smaller- and same-scale feature maps from the encoder with larger-scale feature maps from the decoder. As a result, the feature map of each decoder layer captures complex and extensive semantics across all scales. As shown in Figure 6, when i denotes the i-th downsampling layer along the encoding direction, the output feature map X_De^i can be expressed by Equation (3):
X_De^i = X_En^i,                                                                  i = N
X_De^i = F_LWin([ C(D(X_En^k))_{k=1..i−1}, C(X_En^i), C(U(X_De^k))_{k=i+1..N} ]),  i = 1, …, N−1        (3)
In this equation, C(·) denotes a convolution operation, F_LWin(·) denotes a REWin Transformer block, and D(·) and U(·) denote the downsampling and upsampling operations, respectively. The notation [·] indicates concatenation along the channel dimension: the first two groups gather encoder features from scales 1 through i, and the last group gathers decoder features from scales (i+1) through N.
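As an illustration of Equation (3), the sketch below resamples encoder features from scales 1..i and decoder features from scales (i+1)..N to the resolution of stage i, projects each with a 1×1 convolution, and concatenates them along the channel dimension. The channel widths, the bilinear resampling, and the plain 3×3 convolution standing in for the REWin block F_LWin are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullScaleFusion(nn.Module):
    """Sketch of the full-scale skip connection of Equation (3) for decoder stage i."""
    def __init__(self, enc_channels, dec_channels, i, out_channels=64):
        super().__init__()
        self.i, self.N = i, len(enc_channels)
        # channel widths of encoder scales 1..i and decoder scales (i+1)..N
        all_channels = enc_channels[:i] + dec_channels[i:]
        self.proj = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in all_channels])
        self.fuse = nn.Conv2d(out_channels * self.N, out_channels, 3, padding=1)  # stand-in for F_LWin

    def forward(self, enc_feats, dec_feats):
        # enc_feats[k] / dec_feats[k] hold the (k+1)-th scale feature maps (index 0 = finest scale);
        # dec_feats is assumed to already contain the coarser, previously decoded stages.
        h, w = enc_feats[self.i - 1].shape[-2:]              # target spatial size of stage i
        sources = enc_feats[: self.i] + dec_feats[self.i :]  # scales 1..i (encoder) and i+1..N (decoder)
        resized = [
            F.interpolate(proj(x), size=(h, w), mode="bilinear", align_corners=False)
            for proj, x in zip(self.proj, sources)
        ]
        return self.fuse(torch.cat(resized, dim=1))          # concatenate along channels, then fuse
```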
After the N−1 decoding stages, we apply a convolution to obtain the residual image R ∈ ℝ^{1×H×W}. The recovered image is then obtained as Î = Ĩ + R. We use the Mean Squared Error (MSE) loss to train FRET, where D represents the size of the dataset:
Loss(Î, S̃) = (1/D) Σ_{i=1}^{D} (Î_i − S̃_i)²        (4)
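A minimal sketch of this residual formulation and the loss of Equation (4), assuming `model` is any image-to-image network whose output matches the input shape:

```python
import torch.nn as nn

mse = nn.MSELoss()

def training_step(model, noisy, clean):
    """Residual learning as described above: the network predicts R,
    the restored image is noisy + R, and the loss is the MSE of Equation (4)."""
    residual = model(noisy)        # R in R^{1 x H x W}
    restored = noisy + residual    # recovered image
    return mse(restored, clean)
```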

3.2. REWin Transformer Block

Due to the limitations of traditional Transformers in multi-scale information extraction and local information processing [37,38,39,40], we propose the REWin Transformer module, which aims to reduce redundancy, refine multi-scale information, and further enhance the refined local features, thus achieving superior image denoising performance. The REWin Transformer module integrates the self-attention mechanism to capture global information, while utilizing a feedforward neural network to extract and learn key local contextual information. The module consists of three core components: SW-MSA, GFRN, and GCEN. Figure 7 illustrates the specific structure of this module. When the input feature map is passed through the REWin Transformer module, its computational process can be expressed as follows:
X_l′ = SW-MSA(LN(X_l)) + X_l
X_l″ = GFRN(LN(X_l′)) + X_l′
X_l‴ = SW-MSA(LN(X_l″)) + X_l″
X_{l+1} = GCEN(LN(X_l‴)) + X_l‴        (5)
The three key modules, SW-MSA, GFRN, and GCEN, will be introduced separately below. Their specific structures are shown in Figure 8, Figure 9 and Figure 10, respectively.
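Before detailing each module, the following minimal sketch shows how Equation (5) composes them with pre-norm residual connections; SW-MSA, GFRN, and GCEN are treated as black-box callables here, and the token layout is an assumption.

```python
import torch.nn as nn

class REWinBlock(nn.Module):
    """Minimal sketch of the composition in Equation (5); the attention and
    feedforward modules are passed in as callables and described below."""
    def __init__(self, dim, swmsa1, gfrn, swmsa2, gcen):
        super().__init__()
        self.norm1, self.attn1 = nn.LayerNorm(dim), swmsa1
        self.norm2, self.ffn1 = nn.LayerNorm(dim), gfrn
        self.norm3, self.attn2 = nn.LayerNorm(dim), swmsa2
        self.norm4, self.ffn2 = nn.LayerNorm(dim), gcen

    def forward(self, x):                     # x: (B, H*W, C) token sequence
        x = self.attn1(self.norm1(x)) + x     # SW-MSA + residual
        x = self.ffn1(self.norm2(x)) + x      # GFRN + residual
        x = self.attn2(self.norm3(x)) + x     # second SW-MSA + residual
        x = self.ffn2(self.norm4(x)) + x      # GCEN + residual
        return x
```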
Firstly, SW-MSA performs self-attention within non-overlapping local windows. Compared to global self-attention, SW-MSA significantly reduces computational cost. Given a two-dimensional input feature X ∈ ℝ^{C×H×W}, SW-MSA initially divides X into non-overlapping M × M windows. Subsequently, the features of each window are flattened and transposed, resulting in X^i ∈ ℝ^{M²×C} for each window. Self-attention is then applied to the features within each window. The calculation procedure for the k-th head is described in Equation (6) [19].
X = [X^1, X^2, …, X^N],  N = HW / M²
Y_k^i = Attention(X^i W_k^Q, X^i W_k^K, X^i W_k^V),  i = 1, …, N
X̂_k = [Y_k^1, Y_k^2, …, Y_k^N]        (6)
Here, W_k^Q, W_k^K, and W_k^V ∈ ℝ^{C×d_k} are the projection matrices for Q, K, and V. Subsequently, the outputs from all heads are merged and linearly projected to produce the final output of SW-MSA. As in earlier research [41,42], relative position encoding is integrated into this module. Consequently, the self-attention calculation can be represented by Equation (7) [19].
Attention(Q, K, V) = Softmax(QKᵀ / √d_k + B) V        (7)
Here, B denotes the relative position bias, which contains learnable parameters [41,42].
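A single-head sketch of Equations (6) and (7) on already-projected window tokens is given below; the shapes and the toy bias initialization are illustrative rather than the exact FRET implementation.

```python
import torch
import torch.nn.functional as F

def window_attention(q, k, v, bias):
    """Single-head windowed attention per Equation (7).

    q, k, v: (num_windows, M*M, d_k) projections of the tokens in each window;
    bias:    (M*M, M*M) learned relative position bias B."""
    d_k = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d_k**0.5 + bias   # (nW, M*M, M*M)
    return F.softmax(scores, dim=-1) @ v                 # (nW, M*M, d_k)

# Toy usage: 4 windows of 8x8 tokens with d_k = 32
q = k = v = torch.randn(4, 64, 32)
bias = torch.zeros(64, 64, requires_grad=True)           # learnable B
out = window_attention(q, k, v, bias)
print(out.shape)  # torch.Size([4, 64, 32])
```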
In the process of enhancing feature representation through the self-attention mechanism, conventional feedforward neural networks (FFNs) [38] play a key role in independently processing per-position information. However, directly using multi-scale information may introduce redundancy that degrades feature quality. Designing an efficient FFN to enhance feature representation is therefore crucial for improving image recovery performance. When SW-MSA is used as the basic feature-extraction module, the resulting feature maps often contain redundant information that interferes with noise removal. To address this issue, we introduce GFRN to simplify and optimize the feature transformation. Specifically, GFRN incorporates PConv [43] and DWConv [44] to reduce redundant computation and refine feature information, and a gating mechanism is introduced to further alleviate the burden of processing redundant information. GFRN can be expressed by the following equation:
[X_1, X_2] = R(X̂)
X̂′ = GELU(W_1 F(PConv(X_1) ⊕ X_2))
[X̂_1, X̂_2] = X̂′
X̂_r = X̂_2 ⊗ F(GELU(DWConv(R(X̂_1))))
X̂_out = W_2 X̂_r        (8)
Here, X represents the 2D feature map and X̂ denotes the feature sequence; W_1 and W_2 are linear projections; [·,·] refers to the channel-wise slicing operation; R(·) and F(·) are the reshaping and flattening operations, which convert the sequence input into a 2D feature map and vice versa. These operations are crucial for introducing locality into the architecture [39]. PConv(·) and DWConv(·) refer to partial convolution and depthwise convolution, respectively; ⊕ denotes channel stacking, and ⊗ denotes matrix multiplication.
Overall, GFRN can extract representative key features from the information flow while effectively simplifying redundant features, thereby significantly enhancing feature representation capability. Moreover, GFRN also provides the model with the ability to clear non-informative features along the channel dimension, further optimizing the effectiveness of the features and improving the overall performance of the model.
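A rough PyTorch sketch of the GFRN idea follows. The hidden width, kernel sizes, the even channel split, and the element-wise gating (used here in place of the ⊗ of Equation (8)) are all assumptions made for illustration, and the partial convolution is approximated by convolving only half of the channels.

```python
import torch
import torch.nn as nn

class GFRNSketch(nn.Module):
    """Rough sketch of Equation (8): refine part of the channels with a "partial"
    3x3 convolution, then gate the projected features with a depthwise-conv branch."""
    def __init__(self, dim, hidden=None):          # dim is assumed even
        super().__init__()
        hidden = hidden or dim * 2
        self.pconv = nn.Conv2d(dim // 2, dim // 2, 3, padding=1)   # acts on half the channels only
        self.w1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden // 2, hidden // 2, 3, padding=1, groups=hidden // 2)
        self.act = nn.GELU()
        self.w2 = nn.Linear(hidden // 2, dim)

    def forward(self, x, h, w):                    # x: (B, H*W, C) token sequence
        b, n, c = x.shape
        fmap = x.transpose(1, 2).reshape(b, c, h, w)                 # R(.): sequence -> 2D map
        x1, x2 = fmap.chunk(2, dim=1)                                # channel-wise slicing
        fmap = torch.cat([self.pconv(x1), x2], dim=1)                # refine one part, keep the other
        tokens = self.act(self.w1(fmap.flatten(2).transpose(1, 2)))  # F(.), then W1 and GELU
        t1, t2 = tokens.chunk(2, dim=-1)                             # split for the gate
        local = self.dwconv(t1.transpose(1, 2).reshape(b, -1, h, w)) # local context for the gate
        gate = self.act(local).flatten(2).transpose(1, 2)
        return self.w2(t2 * gate)                                    # gated refinement, then W2
```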
After the feature map undergoes feature extraction through SW-MSA and redundancy removal and feature refinement via GFRN, the resulting feature map contains the key local information required for image recovery. To further enhance and extract this information, we introduce GCEN. Specifically, GCEN combines DWConv and GELU to construct a gated channel attention mechanism based on nearest-neighbor features, enabling the enhancement and extraction of local information. GCEN can be expressed by the following equation:
[X̂_1, X̂_2] = W_1 X̂
X̂_e = X̂_2 ⊗ F(GELU(DWConv(R(X̂_1))))
X̂_out = W_2 X̂_e        (9)
The meanings and operations of the relevant variables and functions in the equation are consistent with those described in the GFRN section above.
Overall, GCEN is capable of further extracting effective features from the refined information while significantly enhancing feature representation capability. Additionally, the structural design of GCEN maintains the same network depth as multilayer perceptrons and gated linear units, ensuring its efficiency and stability during backpropagation.
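A corresponding sketch of GCEN (Equation (9)) is shown below, with the same caveats: the hidden width and the element-wise gating are assumptions.

```python
import torch
import torch.nn as nn

class GCENSketch(nn.Module):
    """Rough sketch of Equation (9): project with W1, split channels, build a gate
    from a depthwise conv + GELU over one half, modulate the other half, project back."""
    def __init__(self, dim, hidden=None):
        super().__init__()
        hidden = hidden or dim * 2
        self.w1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden // 2, hidden // 2, 3, padding=1, groups=hidden // 2)
        self.act = nn.GELU()
        self.w2 = nn.Linear(hidden // 2, dim)

    def forward(self, x, h, w):                     # x: (B, H*W, C)
        b = x.shape[0]
        x1, x2 = self.w1(x).chunk(2, dim=-1)        # [X1, X2] = W1 X
        local = self.dwconv(x1.transpose(1, 2).reshape(b, -1, h, w))  # R(.) + depthwise conv
        gate = self.act(local).flatten(2).transpose(1, 2)             # GELU, then F(.) back to tokens
        return self.w2(x2 * gate)                   # gated enhancement, then W2
```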

3.3. Evaluating Indicator

To evaluate image quality, we employed a combination of reference and non-reference metrics. For the simulated noiseless images, we used Mean Absolute Error (MAE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Measure (SSIM) [45]. MAE quantifies pixel-level discrepancies between denoised and original images, while PSNR measures the disparity between the processed and original images, with higher values indicating better quality. SSIM offers a more holistic evaluation by comparing structural similarities, including luminance and contrast, and aligns well with human visual perception. For real lunar PSR images, which lack ground-truth references, we used three non-reference metrics: Integrated Local Natural Image Quality Evaluator (IL-NIQE) [46], Non-Reference Quality Metric (NRQM) [47], and Multi-Scale Image Quality (MUSIQ) [48]. IL-NIQE assesses image naturalness by comparing it to typical natural scene statistics, making it useful for evaluating natural feature preservation. NRQM provides a general image quality score based on perceptual quality, while MUSIQ simulates human visual perception using multi-scale information, reflecting subjective image texture across different spatial scales.
As no-reference image quality metrics, the specific values of IL-NIQE, NRQM, and MUSIQ depend on the intrinsic characteristics of each image, so denoising quality is typically assessed by the change in these metrics before and after denoising. Using the original noisy image as the baseline, a lower IL-NIQE value for the denoised image is considered better, while higher NRQM and MUSIQ values for the denoised image are preferred [46,47,48].
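For the reference metrics on the simulated pairs, a small sketch using scikit-image is shown below; the no-reference metrics IL-NIQE, NRQM, and MUSIQ require their respective reference implementations and are omitted here, and the 0-255 data range is an assumption.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def reference_metrics(denoised, clean, data_range=255):
    """MAE / PSNR / SSIM for a simulated pair; images are assumed to be
    2D arrays on a 0-255 scale (adjust `data_range` otherwise)."""
    mae = np.mean(np.abs(denoised.astype(np.float64) - clean.astype(np.float64)))
    psnr = peak_signal_noise_ratio(clean, denoised, data_range=data_range)
    ssim = structural_similarity(clean, denoised, data_range=data_range)
    return mae, psnr, ssim
```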

4. Results

4.1. Experimental Environment and Model Training

The network is developed using the PyTorch 2.2.2 framework. For the comparative experiments, the Adam optimizer is employed for backpropagation, with training conducted for 200 epochs, a batch size of 16, and a learning rate of 0.0001, along with the use of automatic mixed precision. The training is performed on an NVIDIA RTX 3090 GPU with 24 GB of memory. For the ablation experiments, the setup is almost identical to that of the comparative experiments, with the only difference being that the training duration is reduced to 55 epochs.
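A minimal training-loop sketch matching these settings (Adam, learning rate 1e-4, 200 epochs, automatic mixed precision; the batch size of 16 is assumed to be set in the DataLoader) is given below; the loader yielding (noisy, clean) pairs is hypothetical.

```python
import torch
from torch import nn, optim

def train(model, train_loader, epochs=200, lr=1e-4, device="cuda"):
    """Illustrative training loop for a residual denoiser under the reported settings."""
    model = model.to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr)
    scaler = torch.cuda.amp.GradScaler()       # automatic mixed precision
    criterion = nn.MSELoss()
    for epoch in range(epochs):
        for noisy, clean in train_loader:      # loader assumed to yield (noisy, clean) pairs
            noisy, clean = noisy.to(device), clean.to(device)
            optimizer.zero_grad(set_to_none=True)
            with torch.cuda.amp.autocast():
                restored = noisy + model(noisy)    # residual prediction, as in Section 3.1
                loss = criterion(restored, clean)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
```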
This paper conducts both quantitative and qualitative experiments on simulated and real-world datasets to compare and analyze the proposed method with five recent state-of-the-art image denoising approaches. These include: U-Net (2019) [49,50], which utilizes an encoder-decoder architecture with skip connections to fuse multi-scale features for end-to-end noise suppression; UNet3Plus (2020) [51], which aggregates deep semantic and shallow details through a full-scale skip connection mechanism to enhance texture restoration capability; DRANet (2024) [18], which incorporates a dual residual attention subnetwork for collaborative optimization along with a global feature fusion strategy to improve noise modeling robustness; UFormer (2022) [19], which integrates window-based multi-head self-attention modules and local enhancement feed-forward networks for collaborative modeling of local-global feature interactions; and AST (2024) [20], which constructs a lightweight adaptive Transformer framework through structural sparsification and attention feature optimization components to balance computational complexity with denoising accuracy.

4.2. Experimental Results of Simulated Images

Table 1 presents the quantitative results of different denoising techniques on simulated images, with the best results highlighted in bold. All methods achieved very high performance metrics; however, the proposed method demonstrated effective improvements. Compared to the weakest baseline method, UNet, the proposed method reduced the MAE from 3.7986 to 2.5834, a decrease of 1.2152. Additionally, the SSIM improved from 0.9942 to 0.9973, an increase of 0.0031, and the PSNR rose from 60.6075 to 64.0642, an improvement of 3.4567. Moreover, the proposed method slightly outperformed DRANet in terms of SSIM, with an increase of 0.0004, and was 0.1763 lower than AST in MAE, with PSNR also showing an improvement of 0.3721. However, it is important to note that these differences are relatively small, and since the evaluation metrics are nearing saturation, further validation of the experimental results will be conducted through qualitative evaluations of denoising on simulated images using different networks, to provide supplementary insights to these quantitative findings.
Figure 11 shows the results of different denoising techniques applied to the simulated images labeled as a, b, c, and d. By comparing the regions within the red boxes in the figure, the following observations can be made: Although UNet is able to remove some noise, a significant amount of noise remains in the image. In contrast, UNet3Plus, with full-scale skip connections, demonstrates better denoising performance. However, as shown in Figure 11b,c, UNet3Plus smooths out terrain texture details during the denoising process, leading to the loss of topographical information. DRANet, while effective at noise removal, causes blurring along the edges of small pits in Figure 11a after denoising. UFormer still leaves considerable noise, especially in Figure 11b. The AST network almost eliminates all noise, but it still lacks sufficient texture detail retention capability, with very small pits or hills being reconstructed rather vaguely. In comparison, the method proposed in this study not only effectively preserves the details of the image during denoising but also removes a large amount of noise, while better restoring textures and topographical features obscured by noise, thus demonstrating the highest visual quality.

4.3. Experimental Results of Real Images

Table 2 presents the quantitative denoising results of different methods applied to real images. The evaluation metrics used include IL-NIQE, NRQM, and MUSIQ, with the results reflecting the average values of all processed real images. The best denoising results are emphasized in bold. According to Table 2, the proposed method performs excellently, achieving the lowest IL-NIQE score and the highest NRQM and MUSIQ scores. Specifically, compared to the worst-performing UNet network, the proposed method improves the IL-NIQE, NRQM, and MUSIQ scores by 3.875, 0.122, and 0.987, respectively. Furthermore, when compared to the better-performing AST network, the proposed method improves the IL-NIQE, NRQM, and MUSIQ scores by 0.604, 0.024, and 0.213, respectively. Although our proposed method shows improvements over other networks in these three no-reference metrics, the differences are relatively small. Therefore, we will also conduct qualitative experiments using real images to compare the denoising results of different networks, in order to provide supplementary insights to the current quantitative findings.
Figure 12 shows the results of applying different denoising techniques to real images labeled as a, b, c, and d. By comparing the regions within the red boxes in the figure, the following observations can be made: UNet performs reasonably well on real images, but its robustness is poor, with denoising failure occurring in Figure 12c. While UNet3Plus demonstrates better denoising performance than UNet, in Figure 12a, the black regions in the noisy image negatively affect the denoising results, leading to the appearance of holes. Additionally, oversmoothing is observed, with significant brightness differences between some regions, particularly in Figure 12b. DRANet, UFormer, and AST perform well in denoising, but as shown in Figure 12c, these networks do not effectively remove vertical stripe noise, leaving it still visible. Furthermore, Figure 12a–c reveal that DRANet introduces stripes at the image edges, while UFormer and AST fail to recover fine texture details in the image. In contrast, the method proposed in this study not only effectively removes a significant amount of noise but also, as shown in Figure 12c, effectively eliminates vertical stripe noise. While maintaining excellent visual quality, it recovers the most texture details and topographical features, preserving the image information after denoising to the greatest extent.

4.4. Analysis of the Impact of Different Modules on FRET

To validate the effectiveness of the full-scale skip connections in FRET and to assess whether the order of applying GFRN and GCEN affects the network’s performance, we conducted a series of experiments using the same PSR simulation dataset and experimental settings, with 55 training epochs. The experimental results are presented in Table 3. In this experiment, “Baseline” refers to the UFormer model, “A” indicates the introduction of full-scale skip connections, “B” refers to the addition of the GFRN module, and “C” denotes the inclusion of the GCEN module. Since the RE-Win module contains two sets of SW-MSA and FFN, “B+B” signifies that both FFNs in the two networks are constructed with GFRN, while “C+C” means both FFNs in the two networks are constructed with GCEN. “C+B” represents the configuration where the FFNs in the two networks first use GCEN and then GFRN, and “B+C” indicates the configuration where the FFNs in the two networks first use GFRN followed by GCEN.
The experimental results show that after incorporating full-scale skip connections (A), the MAE decreases from 3.3185 to 3.2900, a reduction of 0.0285, while SSIM remains at 0.9957, and PSNR increases from 61.8474 to 61.9904, an improvement of 0.143. This indicates that full-scale skip connections help to enhance the fusion of information across different scales, improving the model’s ability to handle complex data, especially reflected in the significant reduction in MAE.
For Baseline+A, after the introduction of the GFRN module, Baseline+A+B+B results in a MAE of 2.9918, a reduction of 0.2982. SSIM improves from 0.9957 to 0.9966, with an increase of 0.0009, and PSNR rises from 61.8474 to 62.7690, an increase of 0.9216. This demonstrates that the GFRN module significantly improves MAE by reducing redundant computations and optimizing feature information, while also enhancing image quality, especially in PSNR and SSIM metrics, validating its crucial role in optimizing feature information and improving image quality.
Compared to Baseline+A+C+C (using the GCEN module), Baseline+A+B+B shows superior performance. Baseline+A+C+C achieves a MAE of 2.9525, a reduction of 0.3375, SSIM increases from 0.9957 to 0.9966, an increase of 0.0009, and PSNR increases from 61.8474 to 62.7622, an improvement of 0.9148. Although the GCEN module excels in enhancing local feature extraction, the GFRN module shows stronger advantages in optimizing global features. Therefore, the overall performance of Baseline+A+B+B exceeds that of Baseline+A+C+C.
Using a combination of the GCEN and GFRN modules (Baseline+A+C+B) also yielded gains on most metrics. Compared to Baseline+A+B+B, the MAE rises slightly to 2.9976, an increase of 0.0058, while SSIM increases from 0.9966 to 0.9967, a gain of 0.0001, and PSNR increases from 62.7690 to 62.8227, an improvement of 0.0537. This combination enhances local feature extraction through GCEN and further optimizes feature information through GFRN, reducing redundant computations and achieving a balanced performance improvement.
Although Baseline+A+C+B performs well, Baseline+A+B+C demonstrates even better performance. The MAE in Baseline+A+B+C decreases to 2.8575, 0.1401 lower than Baseline+A+C+B’s 2.9976. SSIM is 0.9968, slightly higher than 0.9967 in Baseline+A+C+B, with an increase of 0.0001. PSNR increases to 63.0946, 0.2720 higher than Baseline+A+C+B’s 62.8227. This difference arises from the order and complementarity of the modules. In Baseline+A+B+C, the GFRN module is applied first, enabling earlier optimization of feature information and reducing redundant computations. The GCEN module then further strengthens local feature extraction. In contrast, in Baseline+A+C+B, the GCEN module is applied first, enhancing local features, but the subsequent GFRN module’s optimization effect is somewhat diminished. Therefore, Baseline+A+B+C achieves a better balance between global feature optimization and local feature enhancement, ultimately showcasing superior overall performance.
Finally, the Baseline+A+B+C combination achieves the best MAE value of 2.8575, a reduction of 0.4610 compared to the Baseline. SSIM is 0.9968, an increase of 0.0011, and PSNR is 63.0946, an increase of 1.2472. These results indicate that the combination of the GFRN and GCEN modules not only optimizes feature information and local feature extraction but also significantly improves image quality, achieving the best performance across all three metrics.

5. Discussion

5.1. The Significance of PSR Image Denoising

Clear PSR images are significant for both scientific exploration and mission operations. The advanced denoising techniques we provide can effectively enhance the clarity of these images, enabling researchers to identify obstacles and terrain hazards in poorly lit areas. This directly contributes to the safety of lunar PSR missions and reduces uncertainties in rover and human traversal plans. Additionally, it helps scientists identify surface-exposed water ice in PSRs [52], providing crucial data for studying resource distribution on the Moon. Therefore, by offering better denoising techniques for PSR images captured by LROC NAC, we establish a more reliable observational foundation for formulating lunar exploration strategies, optimizing lunar traversal paths, and prioritizing scientific sampling sites for the next phase of human lunar exploration.

5.2. Scalability of the Proposed Method

Our proposed denoising method exhibits strong scalability. The full-scale skip connections in our approach effectively integrate multi-scale information, while the SW-MSA, GFRN, and GCEN modules within the Transformer architecture excel at capturing complex noise patterns. This not only makes the method well-suited for the low-light conditions of the lunar environment but also highlights its potential applicability to denoising tasks in low-light terrestrial conditions or heavily degraded images. By adjusting the noise characteristics of the training dataset, this method holds promise for broader applications, such as processing extraterrestrial imagery or addressing image denoising tasks under various noise conditions.

5.3. Limitations of Physical Noise Models

The physical noise model used in this study simulates extreme low-light PSR images by tightly coupling the hardware parameters of the LROC NAC camera with environmental conditions. However, it still has certain limitations. This highly customized design makes the model difficult to generalize to other satellites or probes equipped with CCD sensors, as its noise parameter estimation heavily depends on the unique physical characteristics of LROC NAC.
For instance, the photon flux calculation is directly based on LROC NAC’s orbital altitude and short exposure time. In contrast, other imaging systems, such as the OHRC camera on Chandrayaan-2 or the optical payloads on the Chang’e series, have different orbital parameters (such as altitude and inclination) and exposure strategies. These differences cause significant deviations in photon statistical distributions and noise coupling mechanisms from the model’s predefined conditions, leading to potential failures in noise estimation.
Such limitations indicate that the model’s physical assumptions are strongly tied to the hardware parameters of LROC NAC. Applying it directly to other CCD systems would require reconstructing the noise generation pipeline and revalidating the model, which incurs substantial engineering costs.

6. Conclusions

This study expands the existing research on the denoising of lunar PSR images by combining physical noise models with deep learning techniques. The physical noise model is used to simulate the distribution and intensity of noise, and a training dataset is constructed based on this model. In addition, this article proposes an improved network architecture aimed at enhancing denoising capabilities. By testing simulated and actual images of the lunar PSR, the following conclusions were drawn:
(1)
The physical noise model effectively simulated the noise in the lunar permanent shadow area images obtained from LROC NAC. More importantly, this model not only provides an effective solution for constructing training samples, but also enhances the physical interpretability of the network.
(2)
This study proposes a FRET network that combines full-scale skip connections with Transformer structures. By using full-scale skip connections and Transformer structures to enhance the network’s ability to obtain multi-scale information and extract key information, the denoising effect on complex noisy images has been significantly improved.
(3)
This study proposes a REWin Transformer block consisting of SW-MSA, GFRN, and GCEN modules to remove redundant information from feature maps and reduce interference with network learning. Meanwhile, fine processing and local enhancement are applied to the remaining key information, further enhancing the network’s ability to recover texture details from noisy images.
Although the current model outperforms mainstream methods on both simulated and real datasets, some residual noise remains when handling strong stripe noise. Given the limitations of the current approach, improvements to the denoising method may involve integrating frequency-domain attention-based denoising techniques, particularly optimized for stripe noise, and developing noise estimation models to enhance the recognition of stripe features. Furthermore, in future research, we will focus on leveraging high-quality PSR images to explore geometric consistency constraints in multi-view PSR imagery, optimizing the accuracy of 3D point cloud reconstruction based on the texture details of the denoised results. Finally, by employing cross-modal feature alignment techniques, we aim to establish a multi-modal data fusion framework that integrates complementary information from Mini-RF radar data and optical imagery, thereby enhancing the spatial resolution of lunar PSR 3D reconstruction. This approach not only provides sub-meter precision 3D geographic information for lunar polar ice detection but also facilitates the transition of deep space exploration image processing from 2D to 3D.

Author Contributions

Conceptualization, H.P. and R.Z.; methodology, H.P., B.C. and R.Z.; software, B.C.; validation, H.P.; formal analysis, H.P.; investigation, B.C.; resources, R.Z.; data curation, B.C.; writing—original draft preparation, R.Z. and B.C.; writing—review and editing, R.Z., B.C. and H.P.; visualization, H.P.; supervision, R.Z. and H.P.; project administration, R.Z.; funding acquisition, R.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China under Grant 42241164.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the National Aeronautics and Space Administration at https://www.lroc.asu.edu/data/, accessed on 12 February 2024.

Acknowledgments

We thank the National Aeronautics and Space Administration for providing the Lunar Reconnaissance Orbiter Camera data that made this article possible.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LROC: Lunar Reconnaissance Orbiter Camera
NACs: Narrow-Angle Cameras
PSRs: Permanently shadowed regions
FRET: Full-scale feature refinement and local enhancement shifted window multi-head self-attention Transformer
SW-MSA: Shifted window multi-head self-attention
GFRN: Gated feature refinement feedforward neural network
GCEN: Gated convolution-based local enhancement feedforward neural network
CCD: Charge-coupled device
DN: Digital number
MSE: Mean Squared Error
FFN: Feedforward neural network
MAE: Mean Absolute Error
SSIM: Structural Similarity Index Measure
PSNR: Peak Signal-to-Noise Ratio
IL-NIQE: Integrated Local Natural Image Quality Evaluator
NRQM: Non-Reference Quality Metric
MUSIQ: Multi-Scale Image Quality

References

  1. Gordon, S.; Brylow, M.; Foote, J.; Garvin, J.; Kasper, J.; Keller, M.; Litvak, I.; Mitrofanov, D.; Paige, K.; Raney, M.; et al. Lunar reconnaissance orbiter overview: The instrument suite and mission. Space Sci. Rev. 2007, 129, 391–419. [Google Scholar]
  2. Robinson, M.S.; Brylow, S.M.; Tschimmel, M.E.; Humm, D.; Lawrence, S.J.; Thomas, P.C.; Denevi, B.W.; Bowman-Cisneros, E.; Zerr, J.; Ravine, M.A.; et al. Lunar reconnaissance orbiter camera (LROC) instrument overview. Space Sci. Rev. 2010, 150, 81–124. [Google Scholar] [CrossRef]
  3. Chromey, F.R. To Measure the Sky; Cambridge University Press: Cambridge, UK, 2016. [Google Scholar]
  4. Konnik, M.; Welsh, J. High-level numerical simulations of noise in CCD and CMOS photosensors: Review and tutorial. arXiv 2014, arXiv:1412.4031. [Google Scholar]
  5. Buades, A.; Coll, B.; Morel, J.M. A review of image denoising algorithms, with a new one. Multiscale Model. Simul. 2005, 4, 490–530. [Google Scholar] [CrossRef]
  6. Goyal, B.; Dogra, A.; Agrawal, S.; Sohi, B.S.; Sharma, A. Image denoising review: From classical to state-of-the-art approaches. Inf. Fusion 2020, 55, 220–244. [Google Scholar] [CrossRef]
  7. Fan, L.; Zhang, F.; Fan, H.; Zhang, C. Brief review of image denoising techniques. Vis. Comput. Ind. Biomed. Art 2019, 2, 7. [Google Scholar] [CrossRef] [PubMed]
  8. Gu, S.; Timofte, R. A Brief Review of Image Denoising Algorithms and Beyond. In Inpainting and Denoising Challenges; Part of the The Springer Series on Challenges in Machine Learning Book Series (SSCML); Springer: Cham, Switzerland, 2019. [Google Scholar]
  9. Jebur, R.S.; Zabil, M.H.B.M.; Hammood, D.A.; Cheng, L.K. A comprehensive review of image denoising in deep learning. Multimed. Tools Appl. 2024, 83, 58181–58199. [Google Scholar] [CrossRef]
  10. Buades, A.; Coll, B.; Morel, J.M. A non-local algorithm for image denoising. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–25 June 2005; Volume 2, pp. 60–65. [Google Scholar]
  11. Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 1992, 60, 259–268. [Google Scholar] [CrossRef]
  12. Portilla, J.; Strela, V.; Wainwright, M.J.; Simoncelli, E.P. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Process. 2003, 12, 1338–1351. [Google Scholar] [CrossRef] [PubMed]
  13. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095. [Google Scholar] [CrossRef] [PubMed]
  14. Huang, T.; Li, S.; Jia, X.; Lu, H.; Liu, J. Neighbor2Neighbor: Self-Supervised Denoising from Single Noisy Images. arXiv 2021, arXiv:2101.02824. [Google Scholar]
  15. Cheng, S.; Wang, Y.; Huang, H.; Liu, D.; Fan, H.; Liu, S. NBNet: Noise Basis Learning for Image Denoising with Subspace Projection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
  16. Liu, Y.; Qin, Z.; Anwar, S.; Ji, P.; Kim, D.; Caldwell, S.; Gedeon, T. Invertible Denoising Network: A Light Solution for Real Noise Removal. arXiv 2021, arXiv:2104.10546. [Google Scholar]
  17. Byun, J.; Cha, S.; Moon, T. FBI-Denoiser: Fast Blind Image Denoiser for Poisson-Gaussian Noise. arXiv 2021, arXiv:2105.10967. [Google Scholar]
  18. Wu, W.; Liu, S.; Xia, Y.; Zhang, Y. Dual residual attention network for image denoising. Pattern Recognit. 2024, 149, 110291. [Google Scholar] [CrossRef]
  19. Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A General U-Shaped Transformer for Image Restoration. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 17662–17672. [Google Scholar]
  20. Zhou, S.; Chen, D.; Pan, J.; Shi, J.; Yang, J. Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024. [Google Scholar]
  21. Kong, Z.; Deng, F.; Zhuang, H.; Yu, J.; He, L.; Yang, X. A Comparison of Image Denoising Methods. arXiv 2023, arXiv:2304.08990. [Google Scholar]
  22. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef] [PubMed]
  23. Mao, X.J.; Shen, C.; Yang, Y.B. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Proceedings of the NIPS’16: Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 2810–2818. [Google Scholar]
  24. Tai, Y.; Yang, J.; Liu, X.; Xu, C. MemNet: A Persistent Memory Network for Image Restoration. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017. [Google Scholar]
  25. Guo, S.; Yan, Z.; Zhang, K.; Zuo, W.; Zhang, L. Toward convolutional blind denoising of real photographs. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; IEEE Computer Society: Washington, DC, USA, 2019. [Google Scholar]
  26. Lehtinen, J.; Munkberg, J.; Hasselgren, J.; Laine, S.; Karras, T.; Aittala, M.; Aila, T. Noise2Noise: Learning image restoration without clean data. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; International Machine Learning Society (IMLS): Stroudsburg, PA, USA, 2018. [Google Scholar]
  27. Krull, A.; Buchholz, T.O.; Jug, F. Noise2void-Learning denoising from single noisy images. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2124–2132. [Google Scholar]
  28. Lempitsky, V.; Vedaldi, A.; Ulyanov, D. Deep Image Prior. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9446–9454. [Google Scholar]
  29. Moseley, B.; Bickel, V.; López-Francos, I.G.; Rana, L. Extreme Low-Light Environment-Driven Image Denoising over Permanently Shadowed Lunar Regions with a Physical Noise Model. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 6313–6323. [Google Scholar]
  30. Chen, C.; Chen, Q.; Xu, J.; Koltun, V. Learning to See in the Dark. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  31. Wei, K.; Fu, Y.; Yang, J.; Huang, H. A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  32. Xu, K.; Yang, X.; Yin, B.; Lau, R.W.H. Learning to restore low-light images via decomposition-and-enhancement. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2278–2287. [Google Scholar]
  33. Maharjan, P.; Li, L.; Li, Z.; Xu, N.; Ma, C.; Li, Y. Improving extreme low-light image denoising via residual learning. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 8–12 July 2019; IEEE Computer Society: Washington, DC, USA, 2019. [Google Scholar]
  34. Szeliski, R. Computer Vision: Algorithms and Applications; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
  35. Working with Lunar Reconnaissance Orbiter LROC Narrow. Available online: http://www.lroc.asu.edu/data/support/downloads/LROC_NAC_Processing_Guide.pdf (accessed on 21 November 2023).
  36. Humm, D.C.; Tschimmel, M.; Brylow, S.M.; Mahanti, P.; Tran, T.N.; Braden, S.E.; Wiseman, S.; Danton, J.; Eliason, E.M.; Robinson, M.S. Flight Calibration of the LROC Narrow Angle Camera. Space Sci. Rev. 2016, 200, 431–473. [Google Scholar] [CrossRef]
  37. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the 8th International Conference on Learning Representations, ICLR, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
  38. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the NeurIPS, Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  39. Li, Y.; Zhang, K.; Cao, J.; Timofte, R.; Van Gool, L. LocalViT: Bringing Locality to Vision Transformers. arXiv 2021, arXiv:2104.05707. [Google Scholar]
  40. Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. CvT: Introducing convolutions to vision transformers. arXiv 2021, arXiv:2103.15808. [Google Scholar]
  41. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv 2021, arXiv:2103.14030. [Google Scholar]
  42. Shaw, P.; Uszkoreit, J.; Vaswani, A. SelfAttention with Relative Position Representations. arXiv 2018, arXiv:1803.02155. [Google Scholar]
  43. Chen, J.; Kao, S.; He, H.; Zhuo, W.; Wen, S.; Lee, C.; Chan, S.G. Run, don’t walk: Chasing higher flops for faster neural networks. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
  44. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  45. Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  46. Zhang, L.; Zhang, L.; Bovik, A. A Feature-Enriched Completely Blind Image Quality Evaluator. IEEE Trans. Image Process. 2015, 24, 2579–2591. [Google Scholar] [CrossRef] [PubMed]
  47. Wang, Z.; Sheikh, H.; Bovik, A. No-reference perceptual quality assessment of JPEG compressed images. In Proceedings of the International Conference on Image Processing, Rochester, NY, USA, 22–25 September 2002. [Google Scholar]
  48. Ke, J.; Wang, Q.; Wang, Y.; Milanfar, P.; Yang, F. MUSIQ: Multi-scale Image Quality Transformer. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 5128–5137. [Google Scholar]
  49. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597. [Google Scholar]
  50. Dua, P. Image Denoising Using a U-Net. 2019. Available online: http://stanford.edu/class/ee367/Winter2019/dua_report.pdf (accessed on 15 April 2023).
  51. Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.W.; Wu, J. UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 1055–1059. [Google Scholar]
  52. Li, S.; Lucey, P.G.; Milliken, R.E.; Hayne, P.O.; Fisher, E.; Williams, J.P.; Hurley, D.M.; Elphic, R.C. Direct evidence of surface exposed water ice in the lunar polar regions. Proc. Natl. Acad. Sci. USA 2018, 115, 8907–8912. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The flowchart of data synthesis.
Figure 2. Compression function.
Figure 3. Original image selection.
Figure 4. Illustration of simulated images. (a) Noiseless images with sufficient lighting; (b) Synthesized noisy images.
Figure 5. Real images of PSRs.
Figure 6. The structure of the FRET.
Figure 7. The structure of the REWin block.
Figure 8. The structure of the SW-MSA.
Figure 9. The structure of the GFRN.
Figure 10. The structure of the GCEN.
Figure 11. Denoising results of the simulated images. (a) Latitude: −72.78°, Longitude: 310.81°; (b) Latitude: 57.36°, Longitude: 194.01°; (c) Latitude: −58.12°, Longitude: 153.74°; (d) Latitude: −32.57°, Longitude: 153.23°.
Figure 12. Denoising results of the real images. (a) Latitude: −88.13°, Longitude: 216.34°; (b) Latitude: −70.26°, Longitude: 187.91°; (c) Latitude: −83.63°, Longitude: 16.12°; (d) Latitude: −83.52°, Longitude: 56.23°.
Table 1. Quantitative evaluation results for the simulated image experiment.

Evaluating Indicator | UNet | UNet3Plus | DRANet | UFormer | AST | This Study
MAE  | 3.7986  | 3.1189  | 2.8718  | 2.8728  | 2.7597  | 2.5834
SSIM | 0.9942  | 0.9958  | 0.9969  | 0.9967  | 0.9968  | 0.9973
PSNR | 60.6075 | 61.5843 | 62.6493 | 62.8602 | 63.6921 | 64.0642
Table 2. Quantitative evaluation results for the real image experiment.

Evaluating Indicator | Original Image | UNet | UNet3Plus | DRANet | UFormer | AST | This Study
IL-NIQE | 143.279 | 97.981 | 95.895 | 97.522 | 97.085 | 94.710 | 94.106
NRQM    | 2.762   | 2.946  | 2.969  | 3.015  | 2.988  | 3.044  | 3.068
MUSIQ   | 25.272  | 26.791 | 26.810 | 26.134 | 27.203 | 27.565 | 27.778
Table 3. Quantitative evaluation results for the ablation experiments.

Evaluating Indicator | Baseline | Baseline+A | Baseline+A+B+B | Baseline+A+C+C | Baseline+A+C+B | Baseline+A+B+C
MAE  | 3.3185  | 3.2900  | 2.9918  | 2.9525  | 2.9976  | 2.8575
SSIM | 0.9957  | 0.9957  | 0.9966  | 0.9966  | 0.9967  | 0.9968
PSNR | 61.8474 | 61.9904 | 62.7690 | 62.7622 | 62.8227 | 63.0946
