Article

Residual-Based Implicit Neural Representation for Synthetic Aperture Radar Images

School of Computing, Kyung Hee University, Yongin-si 17104, Gyeonggi-do, Republic of Korea
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(23), 4471; https://doi.org/10.3390/rs16234471
Submission received: 26 September 2024 / Revised: 27 October 2024 / Accepted: 8 November 2024 / Published: 28 November 2024

Abstract

Implicit neural representations (INRs) are a new way to represent all kinds of signals ranging from 1D audio to 3D shape signals, among which 2D images are the most widely explored due to their ubiquitous presence. Image INRs utilize a neural network to learn a continuous function that takes pixel coordinates as input and outputs the corresponding pixel values. The continuous representation of synthetic aperture radar (SAR) images using INRs has not yet been explored. Existing INR frameworks developed on natural images show reasonable performance, but this performance suffers when capturing fine details. This can be attributed to INR’s prioritization of learning inter-pixel relationships, which harms intra-pixel mapping in those regions that require fine detail. To address this, we decompose the target image into an artificial uniform noise component (intra-pixel mapping) and a residual image (inter-pixel relationships). Rather than directly learning the INRs for the target image, we propose a noise-first residual learning (NRL) method. The NRL first learns the uniform noise component, then gradually incorporates the residual into the optimization target using a sine-adjusted incrementation scheme as training progresses. Given that some SAR images inherently contain significant noise, which can facilitate learning the intra-pixel independent mapping, we propose a gradient-based dataset separation method. This method distinguishes between clean and noisy images, allowing the model to learn directly from the noisy images. Extensive experimental results show that our method achieves competitive performance, indicating that learning the intra-pixel independent mapping first, followed by the inter-pixel relationship, can enhance model performance in learning INR for SAR images.

1. Introduction

Recently, a new approach using neural networks to represent signals continuously has garnered significant attention from researchers [1]. Unlike traditional discrete representations, this continuous approach has become a powerful and versatile tool for solving inverse problems such as image super-resolution and novel view synthesis [2,3]. The implicit representation of an image signal is expressed as a continuous function composed of pixel coordinates and their corresponding pixel values. When learning an implicit neural representation (INR) for an image, the model learns a function that takes pixel coordinates as input and outputs their values [4,5]. To accurately represent image signals, SIREN proposes using sinusoidal functions as activation functions to learn the image signal [2,6,7]. Moreover, some prior studies have found that applying complex mappings to the input signals allows the model to learn their INR more accurately [8,9,10].
Synthetic aperture radar (SAR) is a technology that leverages the relative motion between the radar and the target to process radar echoes, effectively emulating a large antenna aperture. This contrasts with traditional radar systems, which rely on the physical size of the antenna for resolution [11]. SAR systems can be deployed on various platforms and operate across different frequency bands to suit specific needs. However, SAR images differ significantly from natural images due to the imaging technique employed [12,13,14,15]. For instance, in scenarios requiring foliage penetration, SAR uses lower carrier frequencies, resulting in reduced bandwidths and lower resolution. Additionally, SAR images inherently contain speckle noise due to the interference of multiple radar signals scattered by rough surfaces or atmospheric conditions [16,17].
Why INRs for SAR images? Conventionally, images are represented by discrete grids of pixels, while INRs parameterize an image as a continuous function that maps the image’s pixel coordinates to the corresponding pixel intensity values. By default, SAR images are likewise represented by discrete pixel grids under the conventional approach. Compared to the conventional approach, INRs have multiple benefits worth highlighting. First, the continuous nature of INRs decouples them from the number of discrete pixels; they can be sampled at arbitrary spatial resolutions, which gives them “infinite resolution”. Moreover, the memory required to parameterize the image no longer depends on the spatial resolution; instead, it only depends on the complexity of the image when it is perceived as a function. Despite extensive investigation of INRs on various types of optical images, including natural images [18], CT images [19], and tumor images [20], there has been little research on the continuous representation of SAR images. To fill this gap, the current work pioneers the investigation of INRs on SAR images.
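To make this resolution-independence concrete, the short sketch below (our own illustrative example, not code from this work) builds a normalized coordinate grid at an arbitrary output size and queries a trained coordinate-to-intensity network; the `model` argument is assumed to be any such INR.

```python
import torch

def sample_inr(model, height, width, device="cpu"):
    """Query a trained coordinate-to-intensity INR at an arbitrary resolution.

    Because the representation is continuous, `height` and `width` need not
    match the resolution of the image the INR was originally fitted on.
    """
    ys = torch.linspace(-1.0, 1.0, height, device=device)
    xs = torch.linspace(-1.0, 1.0, width, device=device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack([grid_x, grid_y], dim=-1).reshape(-1, 2)  # (H*W, 2)
    with torch.no_grad():
        values = model(coords)                                     # (H*W, C)
    return values.reshape(height, width, -1)
```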
Image INRs perform a pixel-wise mapping from the coordinate values to their corresponding intensity values, but with a shared INR model weight. In other words, in addition to pixel-wise independent learning, the INR model also needs to exploit the relationship between different pixels (most image regions being smooth, for instance). However, it is nontrivial to learn the inter-pixel relationships, which is conjectured to jeopardize the learning of intra-pixel independent coordinate-to-value mapping. Therefore, we propose to first learn the intra-pixel independent mapping by learning a noise component in the image, which minimizes the learning of the inter-pixel relationship because the values in the noise component are independent. The residual component is then gradually added to the optimization target so that the full image can be learned. Specifically, we decompose the target image into an artificial uniform noise and a residual image. Instead of directly learning the INR for the target image, we propose a noise-first residual learning method which first learns the uniform noise component and then gradually incorporates the residual into the optimization target as the training epochs increase. Moreover, we employ a sine-adjusted incrementation scheme to control the addition of the residual component, ensuring a smooth transition throughout the learning process. Target SAR images used for learning may already contain substantial noise, which can directly facilitate the model’s learning of the intra-pixel independent mapping. Empirically, we find that our proposed method significantly improves clean images, but at the cost of a performance drop for noisy images. Therefore, we propose a gradient-based dataset separation method to distinguish between clean images and noisy images. For noisy SAR images, the model learns directly using the default target, while the proposed noise-first learning method is applied only to clean SAR images. Overall, our contributions are summarized as follows:
  • We pioneer the exploration of continuous representation with INRs for SAR images, finding that INRs can benefit from first learning the intra-pixel independent mapping.
  • We propose a noise-first residual learning process which first learns the uniform noise (intra-pixel independent mapping), then gradually incorporates the residual image (inter-pixel relationships) into the optimization target.
  • Extensive experiments demonstrate that our noise-first residual learning approach significantly improves performance over multiple state-of-the-art INR baseline methods.

2. Related Work

2.1. Implicit Neural Representation

Implicit neural representations (INRs) utilize multi-layer perceptrons (MLPs) [21] to map spatial coordinates to their corresponding values, offering a compact and flexible means of representing diverse types of data. They have been applied to a wide array of domains, such as audio processing [6], image generation [1,22], point cloud encoding [23], and complex 3D scene modeling [2,24,25]. INRs have also been explored in medical image representation, particularly for enhancing image quality in modalities such as MRI [4] and CT scans [19]. Moreover, extensive research has explored transforming input formats [26] and network architectures [6,27] in INRs to improve the networks’ ability to learn and represent signals across a diverse range of tasks. SIREN [6] uses sinusoidal activations to represent high-frequency details in signals, while FINER [28] tunes the spectral bias through variable-periodic activation functions. DINER [26] introduces a disorder-invariant representation that remaps input coordinates to make signals easier to fit, while WIRE [27] adopts wavelet-based activation functions to efficiently capture both global structure and local detail. NeRF [2] is a widely used model for 3D scene representation. MetaSDF [29] applies meta-learning to fit signed distance functions, while NeRF-W extends NeRF to unconstrained in-the-wild image collections. TensoRF [30] leverages tensor decomposition to significantly enhance memory efficiency and inference speed in implicit neural representations, further showcasing the potential of INRs in computationally demanding environments.

2.2. Synthetic Aperture Radar Images

Synthetic aperture radar (SAR) is an advanced radar imaging technique that leverages the motion of the radar platform to create high-resolution images. SAR synthesizes a large antenna aperture by transmitting pulses and listening for echoes called backscatter, enabling detailed imaging over wide areas [31]. This imaging process involves sophisticated signal processing techniques such as pulse compression and Doppler processing to achieve fine spatial resolution [32,33]. SAR systems can be deployed on various platforms for wide-area imaging, including aircraft and satellites, or can be configured as ground-based systems for high-resolution close-range observations [34,35]. SAR technology has extensive applications across various domains [36]. In maritime surveillance, SAR is pivotal for monitoring and tracking vessels, leading to significant enhancements in maritime safety and security [37]. In environmental monitoring, SAR is crucial for tracking changes such as deforestation and natural disasters [38]. In agriculture, SAR assists in evaluating crop health and managing resources [39]. Additionally, SAR supports urban planning by providing detailed imagery for infrastructure development [40]. In defense and security, SAR is widely used for reconnaissance and surveillance [41]. Prominent SAR datasets that advance research and development include SARDet-100k [42], a comprehensive dataset featuring over 100,000 images and 245,653 labeled instances, which serves as a benchmark for SAR object detection across various categories such as aircraft, ships, cars, tanks, bridges, and harbors [43]. Additionally, the MSTAR dataset [44] provides extensive SAR imagery specifically designed for target recognition and classification. Deep learning techniques such as CNN-based despeckling methods [45], support vector machines [46], generative adversarial network (GAN)-based image augmentation [47,48], and multistream networks [49] have unlocked new capabilities of SAR imagery under various operational conditions [50,51]. The complex signals and speckle noise in synthetic aperture radar (SAR) images pose unique challenges for traditional representation methods. With SAR images being applied to a wide range of tasks, our work pioneers the use of INRs for their continuous representation.

3. Background and Method

3.1. Background

Image INR. Unlike traditional discrete representations of images, such as pixels [52], implicit neural representations (INRs) enable the model to represent these signals continuously. For an image I, we typically learn a continuous function Φ using a neural network, which takes pixel coordinates (P_x, P_y) as input and outputs the corresponding value v ∈ ℝ³, formulated as
\[ \Phi(P_x, P_y) = I(P_x, P_y). \]
A multi-layer perceptron (MLP) is typically employed for this, consisting of an input layer, hidden layers, and an output layer, as shown in Figure 1. The MLP learns the continuous function Φ(P_x, P_y) by optimizing its parameters θ to minimize a loss function L, which measures the difference between the predicted and actual pixel values:
\[ \theta^{*} = \arg\min_{\theta} \, \mathcal{L}\big(\Phi_{\theta}(P_x, P_y),\, I(P_x, P_y)\big). \]
The loss function L typically uses the mean squared error (MSE), which measures the squared differences between predicted and true pixel values. The MSE loss is provided by
\[ \mathrm{MSE}(\hat{y}, y) = \frac{1}{W \cdot H} \sum_{i=1}^{W} \sum_{j=1}^{H} \big(\hat{y}(i,j) - y(i,j)\big)^{2}, \]
where W and H are the image’s width and height, respectively, and (i, j) denotes the pixel location. The terms ŷ(i, j) and y(i, j) respectively represent the predicted and actual pixel values.
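For concreteness, a minimal PyTorch sketch of this coordinate-to-value setup is given below; the layer sizes, ReLU activations, and training hyperparameters are illustrative assumptions rather than the configuration used in this paper.

```python
import torch
import torch.nn as nn

class CoordinateMLP(nn.Module):
    """Plain MLP mapping a (P_x, P_y) pixel coordinate to a pixel value."""
    def __init__(self, in_dim=2, hidden_dim=256, out_dim=1, num_layers=4):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(num_layers - 1):
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
            dim = hidden_dim
        layers.append(nn.Linear(dim, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, coords):
        return self.net(coords)

def fit_image(model, coords, targets, epochs=500, lr=1e-4):
    """Fit the INR by minimizing the MSE between predicted and true pixel values."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(coords), targets)
        loss.backward()
        optimizer.step()
    return model
```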
Seminal Work. SIREN (sinusoidal representation networks) [6] is a notable prior work for image INRs. A defining feature of SIREN is the use of periodic sine functions as the activation functions, which yields a smooth mapping between the inputs and outputs, as shown in Figure 2. The periodic nature of SIREN enables it to effectively capture fine variations in natural signals, thereby enhancing its ability to model high-frequency details with precision. SIREN takes as input a set of element coordinates x ∈ ℝ^d representing the spatial or temporal locations within a continuous data domain. These coordinates are mapped to corresponding signal attributes y ∈ ℝ^k, such as color values, intensity levels, or other features, through a learnable function Φ: ℝ^d → ℝ^k, which is parameterized by θ. The function Φ(x; θ) is designed to approximate the mapping between the coordinate space and its associated attributes. Specifically, the transformation from x_i to y_i is achieved through a recursive series of nonlinear operations in each layer of the network:
\[ h_0 = x, \qquad h_{l+1} = \sin\!\big(W_l h_l + b_l\big), \]
where W_l and b_l are respectively the weight matrix and bias of the l-th layer, while h_l is the output of the previous layer. Through these recursive transformations, the network iteratively adjusts its internal parameters to refine the mapping, capturing both low- and high-frequency variations in the underlying data as the function Φ(x; θ) evolves over the course of training. To ensure stable signal propagation across layers, SIREN incorporates a specific weight initialization strategy. The weights are initialized as follows:
\[ \theta_i \sim \mathcal{U}\!\left(-\frac{\omega_0}{n}, \frac{\omega_0}{n}\right), \]
where θ_i represents the weights of the i-th layer, n is the number of input neurons from the previous layer, and ω_0 is a frequency scaling parameter. This initialization ensures that the network’s activations span multiple periods over the interval [−1, 1], effectively preventing issues such as vanishing or exploding gradients. Given its popularity and simplicity, by default our work uses the seminal SIREN as a baseline for learning image INRs.
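The sketch below illustrates a SIREN-style layer. For the weight initialization it follows the bounds of the publicly released SIREN scheme (first layer U(−1/n, 1/n), later layers U(−√(6/n)/ω₀, √(6/n)/ω₀)); this is an assumption on our part and may differ in form from the expression stated above.

```python
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Linear layer followed by a sine activation, in the style of SIREN."""
    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            if is_first:
                bound = 1.0 / in_features                       # first layer spans [-1, 1]
            else:
                bound = math.sqrt(6.0 / in_features) / omega_0  # keeps activations stable
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))
```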

3.2. Our Proposed Method

As shown in Figure 3, our proposed method for learning INR for SAR images introduces a noise-first residual learning (NRL) method. More specifically, instead of directly learning the implicit neural representations (INRs) for the target image, NRL first learns the uniform noise component (intra-pixel independent mapping) and gradually incorporates the residual image (inter-pixel relationships) into the optimization target as the training progresses. Additionally, because some images in the SAR dataset already contain significant noise, we propose a gradient-based dataset separation method to identify heavily noisy images and directly learn from them rather than using NRL.

3.2.1. Noise-First Residual Learning for INR

Target Decomposition. In learning INR for images, the model takes pixel coordinates as input and outputs their corresponding values. This means that the model must learn both the individual value of each pixel and the relationships between different pixels. However, learning the inter-pixel relationships is challenging and can potentially hinder the learning of the intra-pixel coordinate-to-value mapping. Given a target image for learning, the pixels are often interrelated to some extent, making it challenging for the model to focus solely on the intra-pixel independent mapping. To this end, we decompose the optimization target into two components: an artificial uniform noise I_noise and a residual image I_residual. The uniform noise is generated by randomly sampling values within a certain range (by default, we set it to [−1, 1]). The residual image is obtained by subtracting the uniform noise from the original target.
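A minimal sketch of this decomposition, assuming the target image tensor is already normalized, could be:

```python
import torch

def decompose_target(image, noise_range=1.0):
    """Split a target image into artificial uniform noise and a residual image.

    `image` is a tensor of pixel values; `noise_range` is the half-width of the
    uniform noise interval (the default corresponds to [-1, 1]).
    """
    noise = torch.empty_like(image).uniform_(-noise_range, noise_range)
    residual = image - noise   # the residual carries the inter-pixel structure
    return noise, residual
```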
Roles of Decomposed Targets. Instead of directly learning the INR for the target image, we propose a noise-first residual learning method which first learns the uniform noise component and gradually incorporates the residual into the optimization target as the training epochs increase. As a result, the optimization target in our method starts with uniform noise, changes at each step, and eventually converges to the target image. With this noise-first residual learning approach, our proposed method first facilitates the learning of intra-pixel independence, as the value of each pixel is random and independent from the others in the case of uniform noise. By setting uniform noise as the learning target, the model can easily learn the intra-pixel independent mapping without being affected by inter-pixel relationships. The residual image is then gradually incorporated into the optimization target, as the full image needs to be reconstructed for representation. As the optimization approaches the end, the ratio of residual image over uniform noise increases, which allows the INR model to gradually exploit the inter-pixel relationship.
Figure 3. Overview of our proposed method. The gradient-based dataset separation method distinguishes between noisy images and clean images; for clean images, we adopt NRL to learn their INRs, while noisy images are learned directly without additional operations.
Moreover, we propose a sine-based incremental adjustment method in our noise-first residual learning method for INR in order to more smoothly incorporate I_residual into I_noise. Specifically, when adding I_noise and I_residual to generate I_target, we introduce an adjustment factor k^γ to control the contribution of I_residual in I_target, formulated as follows:
\[ k = \sin\!\left(\frac{\pi}{2} \cdot \frac{e}{E}\right), \; e < E, \qquad I_{\mathrm{target}} = I_{\mathrm{noise}} + k^{\gamma} \cdot I_{\mathrm{residual}}, \]
where γ controls the rate at which k changes with the number of training epochs, set to 0.5 by default. The parameter e corresponds to the current training epoch, while E represents the total number of epochs in the training stage. As e increases from 0, k follows a sine curve from 0 to 1, thereby increasing the contribution of I_residual to I_target. With k starting from 0, I_target is nearly identical to I_noise; as k approaches 1, I_target converges to the original image I_clean. Based on the growth of k with e, the transformation of k^γ is shown in Figure 4. The optimization model based on our NRL method can be seen as learning a continuous function of pixel coordinates P_xy and their corresponding pixel values I_target, with the MLP learning the parameters θ of the function Φ(P_xy) as follows:
\[ \theta^{*} = \arg\min_{\theta} \, \mathcal{L}\big(\Phi_{\theta}(P_{xy}),\, I_{\mathrm{noise}} + k^{\gamma} \cdot I_{\mathrm{residual}}\big), \]
where L represents the MSE loss function. We set k to 1 for the last R epochs of training (see Section 4.5.2) to allow the model to fully learn the original image I_clean.
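The schedule can be written as a small helper that builds the epoch-dependent optimization target; the function below is a sketch under the assumptions stated above (γ = 0.5 by default, k clamped to 1 for the final epochs), and the commented training skeleton is illustrative rather than the authors’ code.

```python
import math
import torch
import torch.nn.functional as F

def nrl_target(noise, residual, epoch, total_epochs, gamma=0.5, final_epochs=50):
    """Epoch-dependent optimization target for noise-first residual learning."""
    if epoch >= total_epochs - final_epochs:
        k = 1.0                                              # learn the full image at the end
    else:
        k = math.sin(0.5 * math.pi * epoch / total_epochs)   # k = sin(pi/2 * e/E)
    return noise + (k ** gamma) * residual

# Illustrative training skeleton (model, coords, optimizer, noise, residual assumed given):
# for epoch in range(total_epochs):
#     target = nrl_target(noise, residual, epoch, total_epochs)
#     loss = F.mse_loss(model(coords), target)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```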

3.2.2. Gradient-Based Dataset Separation

In SAR image datasets, around 30% of the images already contain significant noise. These noisy images differ from clean images in that their inherent noise can itself provide the intra-pixel independent mapping. We explored the image representation results of our noise-first residual learning (NRL) method for both clean and noisy images. Specifically, we selected ten noisy images and ten clean images, separated them using the gradient-based dataset separation method, and trained models using both SIREN and NRL-SIREN with these images as targets. As shown in Table 1, our NRL-SIREN method significantly improves the representation of clean images compared to SIREN, boosting PSNR by 24.50 dB and SSIM by 0.304. However, it provides no benefit for the representation of noisy images, and even causes degradation, reducing PSNR by 1.75 dB and SSIM by 0.001. These results suggest that for noisy images with rich intra-pixel independent mapping, it is unnecessary to first learn the intra-pixel independent mapping separately. Figure 5 visualizes the SIREN and NRL-SIREN representation results for the clean and noisy images. It can be seen that applying NRL to clean images improves the representation of fine details; however, for noisy images the detail representation shows no noticeable visual difference compared to SIREN. This further underscores the importance of initially learning the intra-pixel independent mapping and demonstrates the effectiveness of our gradient-based dataset separation method in learning SAR images.
Overall, our proposed noise-first residual learning approach yields significant performance gains for the majority of SAR images, but at the cost of a minor performance drop when significant noise occurs in the original target images. Therefore, we propose a gradient-based dataset separation method to differentiate between clean images and noisy images in the SAR dataset, as shown in Figure 6, with our noise-first residual learning only being applied to the clean images. Noisy images exhibit significant local pixel value variations due to the presence of high-frequency noise components, while clean images are mostly flat and have smaller local pixel value variations. For an input image I, the gradient map is derived by computing the difference between neighboring pixels:
\[ \mathrm{Grad}_x(x, y) = I(x+1, y) - I(x, y), \qquad \mathrm{Grad}_y(x, y) = I(x, y+1) - I(x, y), \]
\[ G(I) = \sqrt{\mathrm{Grad}_x(x, y)^{2} + \mathrm{Grad}_y(x, y)^{2}}, \]
where G(I) represents the gradient map. The elements of this map capture the strength of gradient changes for pixels at coordinates (x, y). As depicted in Figure 6, gradient maps can effectively highlight noise in the image, for instance by assigning large values to those pixels with high contrast compared to their neighbors. We calculate the total gradient G_total(I) for each SAR image from its gradient map, which can be formulated as follows:
\[ G_{\mathrm{total}}(I) = \frac{1}{W \cdot H} \sum_{i=1}^{W} \sum_{j=1}^{H} G_{i,j}(I), \qquad I_{\mathrm{clean}} = \{ I \mid G_{\mathrm{total}}(I) \le \alpha \}, \quad I_{\mathrm{noisy}} = \{ I \mid G_{\mathrm{total}}(I) > \alpha \}. \]
The threshold value α is set to 0.15 to classify each SAR dataset image as either a clean or noisy image based on its G_total(I), as shown in Figure 7.
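A sketch of this separation rule, assuming grayscale image tensors scaled so that the α = 0.15 threshold is meaningful, might look like the following:

```python
import torch

def total_gradient(image):
    """Mean gradient magnitude of a 2D image tensor of shape (H, W)."""
    grad_x = image[:, 1:] - image[:, :-1]   # horizontal differences, (H, W-1)
    grad_y = image[1:, :] - image[:-1, :]   # vertical differences,   (H-1, W)
    # Crop both maps to the shared (H-1, W-1) region before combining them.
    grad_map = torch.sqrt(grad_x[:-1, :] ** 2 + grad_y[:, :-1] ** 2)
    return grad_map.mean().item()

def separate_images(images, alpha=0.15):
    """Split a list of image tensors into clean and noisy subsets by mean gradient."""
    clean = [img for img in images if total_gradient(img) <= alpha]
    noisy = [img for img in images if total_gradient(img) > alpha]
    return clean, noisy
```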

4. Experiments

4.1. Implementation Details and Evaluation Metrics

In our experiments, we used the seminal SIREN method as our baseline model and adopted the same initialization as in SIREN. We used the Adam optimizer with the learning rate set to 0.0001, following the settings in [6]. Our experiments were conducted with the widely used PyTorch deep learning framework on an NVIDIA RTX A2000 GPU. To evaluate performance, we utilized two widely recognized metrics for image quality evaluation, namely the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [53]. The PSNR measures the fidelity of reconstructed or compressed signals, while the SSIM compares the structural similarity between reference images and their test counterparts, which closely mirrors the sensitivity of the human visual system (HVS). The PSNR and SSIM metrics are mathematically defined as follows:
\[ \mathrm{PSNR}(x, y) = 10 \cdot \log_{10} \frac{255^{2}}{\frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} (x_{ij} - y_{ij})^{2}}, \]
\[ \mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^{2} + \mu_y^{2} + C_1)(\sigma_x^{2} + \sigma_y^{2} + C_2)}, \]
where x and y are the model prediction values and ground truth values, respectively, μ_x and μ_y represent the respective mean values of x and y, σ_x² and σ_y² denote their respective variances, σ_xy is the covariance between x and y, and C_1 and C_2 are predefined constants.
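For reference, the PSNR can be computed directly as in the short sketch below (assuming 8-bit images in the 0–255 range); for SSIM one would typically rely on an existing implementation such as skimage.metrics.structural_similarity rather than re-deriving it here.

```python
import numpy as np

def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio between two arrays of identical shape."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```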

4.2. SARDet-100K Dataset

The SARDet-100K dataset [42] represents a significant advancement in the field of SAR image analysis, particularly in the domain of object detection. Unlike previous datasets that are primarily mono-category, SARDet-100K is a large-scale multiclass dataset that is comparable in both scale and diversity to the COCO dataset for RGB images. This makes it a valuable resource for training and evaluating deep learning models specifically designed for SAR image analysis. One of the key challenges in SAR image analysis is the substantial disparity between pretraining models on RGB datasets and fine-tuning them on SAR datasets. This disparity arises from differences in data domain and model structure. To address this issue, the creators of SARDet-100k introduced multi-stage filter augmentation (MSFA). MSFA is a technique that involves applying a sequence of feature extraction filters to SAR images in order to create a more diverse and representative dataset. The SARDet-100k dataset comprises 116,598 SAR images and 245,653 annotated instances covering six object categories: Aircraft, Ship, Car, Bridge, Tank, and Harbor. The images have been acquired using various SAR systems with different frequencies and polarizations, providing a comprehensive spectrum of imaging conditions. The resolution of the images ranges from 0.1 × 0.1 m to 25 × 25 m per pixel, addressing the limitations of prior datasets that had coarser resolutions. To make the data more manageable, images exceeding 1000 × 1000 pixels are cropped into overlapping patches of 512 × 512 pixels. This dataset serves as a valuable resource for researchers and practitioners working on SAR image analysis, providing a large-scale, diverse, and well-annotated benchmark for training and evaluating deep learning models.

4.3. SAR Image Representation

The proposed gradient-based method was employed to distinguish clean images from noisy images in the SARDet-100k dataset. Subsequently, we randomly selected 20 clean images, irrespective of category, to serve as input for our image representation experiment. Following the settings of SIREN, we applied random center-cropping to 321 × 321 pixels and then resized the images to 256 × 256 pixels. Using this set of inputs, we combined our NRL with several state-of-the-art INR backbones, including position encoding and ReLU activation function-based INRs (PEMLP) [8], sinusoidal representation networks using sine periodic activation functions (SIREN) [6], Gaussian activation function-based INRs (Gauss) [54], and flexible spectral-bias tuning in implicit neural representation (FINER) [28]. When combined with our proposed NRL, these models are respectively referred to as NRL-PEMLP, NRL-SIREN, NRL-Gauss, and NRL-FINER. We adhered to the experimental settings specified in the original papers, including the number of layers and the respective training strategies. Table 2 provides a quantitative comparison between our method and the other methods. The models trained with our NRL method show significant improvements in both PSNR and SSIM, demonstrating the effectiveness of using noise to enhance the learning of SAR image representations. Notably, the SIREN method achieves the greatest improvement when using NRL, with the PSNR increasing by 18.26 dB and the SSIM by 0.116. Moreover, we achieve the best overall performance with NRL-FINER, obtaining a PSNR of 53.24 dB and an SSIM of 0.998, the best results among all our investigations. Figure 8 visualizes the representation results. It can be observed that adopting our NRL approach to learning INR for SAR images, starting with the artificial uniform noise and then gradually learning the target SAR image, can effectively improve the model’s ability to represent fine details such as objects and subtle noise patterns.
The SAR dataset is divided into six categories, with each category containing images of different objects captured in various scenarios, leading to significant variations between the images. As shown in Table 3, adopting NRL across various models consistently enhances performance in learning INR for different categories of SAR images. These results demonstrate that our NRL method is well suited for various categories of SAR images.

4.4. High-Resolution SAR Image Representation

We additionally conducted image representation on the CARABAS-II [55] dataset, which consists of various challenging high-resolution SAR images. This dataset includes 24 images captured using HH-polarized radio waves within the 20–90 MHz frequency range. Each image is characterized by a high resolution, with a pixel size of 1 m × 1 m, resulting in dimensions of 3000 rows by 2000 columns and covering an area of 2 km × 3 km. The dataset features images of 25 military vehicles, categorized into three types: TGB11, TGB30, and TGB40. These images have been collected under 12 different settings that vary in terms of factors such as incidence angle and radio frequency interference (RFI). The results for five randomly selected clean images are shown in Table 4, while the qualitative results are shown in Figure 9. It can be observed that our proposed NRL method consistently improves upon various baselines.

4.5. Ablation Study

In our ablation studies, we used SIREN as our baseline and trained the model for 500 epochs, then compared the image representation performance with different hyperparameters in NRL.

4.5.1. Ablation Study of Focusing Parameter γ

The focusing parameter γ controls the growth trend of the adjustment factor k^γ in the sine-based incremental adjustment method, thereby affecting the process of adding the residual image. With γ set to 0, 0.5, 1, and 2, Figure 10 shows the corresponding approximation curves of k^γ as the number of training epochs increases. When γ is 0, this corresponds to the baseline approach, which directly learns from the target image. As γ decreases, the curve of k^γ rises more quickly, indicating that the optimization target contains more residual information during the early stages of training and that by the end of training there is less noise and a smoother overall change in the optimization target when using our NRL method. As shown in Table 5, setting γ to 0.5 achieves the best performance, with a PSNR of 50.94 dB and an SSIM of 0.997. Therefore, we set γ = 0.5 as the default for our NRL method.

4.5.2. Ablation Study on Learning Epochs for Target Image

In our noise-first residual learning (NRL) approach, after first learning the artificial uniform noise and residual image, we then instruct the model to use the target image as the optimization target for several training epochs R. We conducted an ablation study by using different values of R to train separate models and comparing their image representation results. As shown in Table 6, our NRL achieves a PSNR of 42.38 dB and an SSIM of 0.972 when R is set to 0, surpassing the baseline. This result demonstrates the effectiveness of the NRL method. Moreover, our model’s performance improves as R increases and peaks when R reaches 50, with a PSNR of 50.94 dB and an SSIM of 0.997. Further increases in R lead to a decline in performance, with the PSNR dropping to 49.12 dB and the SSIM to 0.991 when R reaches 210. Figure 11 presents the approximation curves of various R along with the PSNR and SSIM values for target image representations versus the number of optimization steps. It can be seen that when the model begins to use the target image as the optimization target for learning, the PSNR and SSIM values representing the image increase the fastest, then slow down after a few epochs.

4.5.3. Ablation Study on Noise Level

The uniform noise is generated using a random number generator that produces uniformly distributed random values within the range of −R to R. We conducted an ablation study with multiple uniform noise ranges, setting R to 1.3, 1.1, 1, 0.9, and 0.7. The results in Table 7 show that the performance of our proposed method is consistently superior to that of the baseline, suggesting that our proposed method is robust to various noise levels.

5. Discussion

Figure 12 visualizes the SIREN and NRL-SIREN representation error maps of both clean and noisy images, showing that the existing INR frameworks struggle to capture fine details of SAR images. On the other hand, our proposed method significantly mitigates this issue.
Notably, INRs are not the only way to represent images in a continuous manner; multiplicative filter networks (MFNs) [56] represent an alternative approach. In Table 8, we empirically show that MFNs do not work as well as INRs for representing SAR images.
It is worth noting that our proposed method does not explicitly take advantage of the special characteristics of SAR images. Nonetheless, it improves upon the performance of multiple state-of-the-art INR baseline methods by a significant margin. We leave the investigation of exploiting the special characteristics of SAR images for further performance improvement to future work. Even though our proposed method does not exploit special characteristics such as complex-valued information, it can be easily extended from magnitude values to complex-valued information based on the results shown in Table 9.

6. Conclusions

As a pioneering attempt to learn INRs for SAR images, this work demonstrates that INRs can benefit from first learning the intra-pixel independent mapping. We propose noise-first residual learning (NRL), which begins by learning the uniform noise (intra-pixel independent mapping) and then gradually adds the residual image (inter-pixel relationships) into the optimization target. We demonstrate that our NRL method can improve model performance. Moreover, we observe that some SAR images inherently contain significant noise, allowing the model to learn the intra-pixel independent mapping directly from these images. Thus, we propose a gradient-based dataset separation method to select noisy images, with which we demonstrate that directly learning their representations can yield competitive results. Extensive experimental results demonstrate the superiority of our method in learning INRs for SAR images.

Author Contributions

Conceptualization, D.H. and C.Z.; methodology, D.H. and C.Z.; software, D.H.; validation, D.H.; formal analysis, D.H. and C.Z.; investigation, D.H. and C.Z.; resources, D.H. and C.Z.; data curation, D.H.; writing—original draft preparation, D.H. and C.Z.; writing—review and editing, D.H. and C.Z.; visualization, D.H.; supervision, C.Z.; project administration, C.Z.; funding acquisition, C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by the ITRC (Information Technology Research Center) support program (IITP-2024-RS-2023-00259004) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation) and by another IITP grant funded by the Korea government (MSIT) (IITP-2022-II220078, Explainable Logical Reasoning for Medical Knowledge Generation).

Data Availability Statement

The SARDet-100k dataset is available at https://github.com/zcablii/SARDet_100K?tab=readme-ov-file, accessed on 25 April 2024; The high-resolution CARABAS-II dataset is available at https://www.sdms.afrl.af.mil/index.php?collection=mstar&page=targets, accessed on 1 September 2024.

Acknowledgments

We thank the handling Associate Editor and the anonymous reviewers for their valuable comments and suggestions for this paper. We also thank the Institute of Information & Communications Technology Planning & Evaluation (IITP) for the financial support, funded by IITP-2024-RS-2023-00259004 and IITP-2022-II220078.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
INRs    Implicit Neural Representations
SAR     Synthetic Aperture Radar
NRL     Noise-first Residual Learning
MLP     Multi-Layer Perceptron
MSE     Mean Squared Error
SIREN   Sinusoidal Representation Network
PEMLP   Position Embedding Multi-Layer Perceptron
FINER   Flexible spectral-bias tuning in Implicit Neural Representation
WIRE    Wavelet Implicit Neural Representation
PSNR    Peak Signal-to-Noise Ratio
SSIM    Structural Similarity Index
MSFA    Multi-Stage Filter Augmentation
CNN     Convolutional Neural Network
ReLU    Rectified Linear Unit
HVS     Human Visual System

References

  1. Park, J.J.; Florence, P.; Straub, J.; Newcombe, R.; Lovegrove, S. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 165–174. [Google Scholar]
  2. Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
  3. Mescheder, L.; Oechsle, M.; Niemeyer, M.; Nowozin, S.; Geiger, A. Occupancy networks: Learning 3D reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4460–4470. [Google Scholar]
  4. Molaei, A.; Aminimehr, A.; Tavakoli, A.; Kazerouni, A.; Azad, B.; Azad, R.; Merhof, D. Implicit neural representation in medical imaging: A comparative survey. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 2381–2391. [Google Scholar]
  5. Sitzmann, V.; Martel, J.N.; Bergman, A.W.; Lindell, D.B.; Wetzstein, G. Light field networks: Neural scene representations with single-evaluation rendering. Adv. Neural Inf. Process. Syst. 2021, 34, 19313–19325. [Google Scholar]
  6. Sitzmann, V.; Martel, J.; Bergman, A.; Lindell, D.; Wetzstein, G. Implicit neural representations with periodic activation functions. Adv. Neural Inf. Process. Syst. 2020, 33, 7462–7473. [Google Scholar]
  7. Rahaman, N.; Baratin, A.; Arpit, D.; Draxler, F.; Lin, M.; Hamprecht, F.; Bengio, Y.; Courville, A. On the spectral bias of neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 5301–5310. [Google Scholar]
  8. Tancik, M.; Srinivasan, P.; Mildenhall, B.; Fridovich-Keil, S.; Raghavan, N.; Singhal, U.; Ramamoorthi, R.; Barron, J.; Ng, R. Fourier features let networks learn high frequency functions in low dimensional domains. Adv. Neural Inf. Process. Syst. 2020, 33, 7537–7547. [Google Scholar]
  9. Sitzmann, V.; Zollhöfer, M.; Wetzstein, G. Scene representation networks: Continuous 3d-structure-aware neural scene representations. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
  10. Martel, J.N.; Lindell, D.B.; Lin, Z.; Chan, E.R.; Wetzstein, G. ACORN: Adaptive coordinate networks for neural scene representation. arXiv 2021, arXiv:2104.09575. [Google Scholar] [CrossRef]
  11. Passah, A.; Sur, S.N.; Paul, B.; Kandar, D. SAR image classification: A comprehensive study and analysis. IEEE Access 2022, 10, 20385–20399. [Google Scholar] [CrossRef]
  12. Chen, S.; Wang, H. SAR target recognition based on deep learning. In Proceedings of the 2014 International Conference on Data Science and Advanced Analytics (DSAA), IEEE, Shanghai, China, 30 October–1 November 2014; pp. 541–547. [Google Scholar]
  13. Denis, L.; Dalsasso, E.; Tupin, F. A review of deep-learning techniques for SAR image restoration. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, IEEE, Brussels, Belgium, 11–16 July 2021; pp. 411–414. [Google Scholar]
  14. Skrunes, S.; Johansson, A.M.; Brekke, C. Synthetic aperture radar remote sensing of operational platform produced water releases. Remote Sens. 2019, 11, 2882. [Google Scholar] [CrossRef]
  15. Zhang, T.; Zeng, T.; Zhang, X. Synthetic aperture radar (SAR) meets deep learning. Remote Sens. 2023, 15, 303. [Google Scholar] [CrossRef]
  16. Lee, J.S.; Jurkevich, L.; Dewaele, P.; Wambacq, P.; Oosterlinck, A. Speckle filtering of synthetic aperture radar images: A review. Remote Sens. Rev. 1994, 8, 313–340. [Google Scholar] [CrossRef]
  17. Chierchia, G.; Scarpa, G.; Poggi, G.; Verdoliva, L.; Ciotola, M. Speckle2Void: Deep Self-Supervised SAR Despeckling with Blind-Spot Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2021, 59, 9814–9826. [Google Scholar]
  18. Liu, K.; Liu, F.; Wang, H.; Ma, N.; Bu, J.; Han, B. Partition speeds up learning implicit neural representations based on exponential-increase hypothesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 5474–5483. [Google Scholar]
  19. Shen, L.; Pauly, J.; Xing, L. NeRP: Implicit neural representation learning with prior embedding for sparsely sampled image reconstruction. IEEE Trans. Neural Networks Learn. Syst. 2022, 35, 770–782. [Google Scholar] [CrossRef]
  20. Fang, W.; Tang, Y.; Guo, H.; Yuan, M.; Mok, T.C.; Yan, K.; Yao, J.; Chen, X.; Liu, Z.; Lu, L.; et al. CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 11631–11641. [Google Scholar]
  21. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  22. Chen, Z.; Zhang, H. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5939–5948. [Google Scholar]
  23. Genova, K.; Cole, F.; Sud, A.; Sarna, A.; Funkhouser, T. Deep structured implicit functions. arXiv 2019, arXiv:1912.06126. [Google Scholar]
  24. Niemeyer, M.; Mescheder, L.; Oechsle, M.; Geiger, A. Differentiable volumetric rendering: Learning implicit 3D representations without 3D supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3504–3515. [Google Scholar]
  25. Liu, L.; Lin, W.; Bao, Y.; Bai, X.; Kavan, L.; Wu, J.; Tong, X. Neural sparse voxel fields. Adv. Neural Inf. Process. Syst. 2020, 33, 15651–15663. [Google Scholar]
  26. Xie, S.; Zhu, H.; Liu, Z.; Zhang, Q.; Zhou, Y.; Cao, X.; Ma, Z. DINER: Disorder-invariant implicit neural representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 6143–6152. [Google Scholar]
  27. Saragadam, V.; LeJeune, D.; Tan, J.; Balakrishnan, G.; Veeraraghavan, A.; Baraniuk, R.G. Wire: Wavelet implicit neural representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18507–18516. [Google Scholar]
  28. Liu, Z.; Zhu, H.; Zhang, Q.; Fu, J.; Deng, W.; Ma, Z.; Guo, Y.; Cao, X. FINER: Flexible spectral-bias tuning in Implicit NEural Representation by Variable-periodic Activation Functions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 2713–2722. [Google Scholar]
  29. Chen, Z.; Zhang, K.; Gao, S.; Zhang, H.; Tong, X. MetaSDF: Meta-learning signed distance functions. Adv. Neural Inf. Process. Syst. 2022, 33, 10136–10147. [Google Scholar]
  30. Chen, A.; Xu, Z.; Geiger, A.; Yu, J.; Su, H. TensoRF: Tensorial radiance fields. arXiv 2022, arXiv:2203.09517. [Google Scholar]
  31. Schowengerdt, R.A. Remote Sensing: Models and Methods for Image Processing; Elsevier: Amsterdam, The Netherlands, 2006. [Google Scholar]
  32. Moreira, A.; Prats-Iraola, P.; Younis, M.; Krieger, G.; Hajnsek, I.; Papathanassiou, K.P. A tutorial on synthetic aperture radar. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–43. [Google Scholar] [CrossRef]
  33. Elachi, C.; Van Zyl, J.J. Introduction to the Physics and Techniques of Remote Sensing; John Wiley & Sons: Hoboken, NJ, USA, 2021. [Google Scholar]
  34. Navalgund, R.R.; Jayaraman, V.; Roy, P. Remote sensing applications: An overview. Curr. Sci. 2007, 93, 1747–1766. [Google Scholar]
  35. Monserrat, O.; Crosetto, M.; Luzi, G. A review of ground-based SAR interferometry for deformation measurement. ISPRS J. Photogramm. Remote Sens. 2014, 93, 40–48. [Google Scholar] [CrossRef]
  36. Wang, S.; He, F.; Dong, Z. A Novel Intrapulse Beamsteering SAR Imaging Mode Based on OFDM-Chirp Signals. Remote Sens. 2023, 16, 126. [Google Scholar] [CrossRef]
  37. Chang, Y.L.; Anagaw, A.; Chang, L.; Wang, Y.C.; Hsiao, C.Y.; Lee, W.H. Ship detection based on YOLOv2 for SAR imagery. Remote Sens. 2019, 11, 786. [Google Scholar] [CrossRef]
  38. Koo, V.; Chan, Y.K.; Vetharatnam, G.; Chua, M.Y.; Lim, C.H.; Lim, C.S.; Thum, C.; Lim, T.S.; bin Ahmad, Z.; Mahmood, K.A.; et al. A new unmanned aerial vehicle synthetic aperture radar for environmental monitoring. Prog. Electromagn. Res. 2012, 122, 245–268. [Google Scholar] [CrossRef]
  39. McNairn, H.; Shang, J. A review of multitemporal synthetic aperture radar (SAR) for crop monitoring. In Multitemporal Remote Sensing: Methods and Applications; Springer: Cham, Switzerland, 2016; pp. 317–340. [Google Scholar]
  40. Henderson, F.M.; Xia, Z.G. SAR applications in human settlement detection, population estimation and urban land use pattern analysis: A status report. IEEE Trans. Geosci. Remote Sens. 1997, 35, 79–85. [Google Scholar] [CrossRef]
  41. Melillos, G.; Themistocleous, K.; Papadavid, G.; Agapiou, A.; Kouhartsiouk, D.; Prodromou, M.; Michaelides, S.; Hadjimitsis, D.G. Using field spectroscopy combined with synthetic aperture radar (SAR) technique for detecting underground structures for defense and security applications in Cyprus. In Proceedings of the Detection and Sensing of Mines, Explosive Objects, and Obscured Targets XXII, SPIE, Anaheim, CA, USA, 10–12 April 2017; Volume 10182, pp. 23–35. [Google Scholar]
  42. Li, Y.; Li, X.; Li, W.; Hou, Q.; Liu, L.; Cheng, M.M.; Yang, J. Sardet-100k: Towards open-source benchmark and toolkit for large-scale sar object detection. arXiv 2024, arXiv:2403.06534. [Google Scholar]
  43. Chi, M.; Plaza, A.; Benediktsson, J.A.; Sun, Z.; Shen, J.; Zhu, Y. Big data for remote sensing: Challenges and opportunities. Proc. IEEE 2016, 104, 2207–2219. [Google Scholar] [CrossRef]
  44. Blasch, E.; Majumder, U.; Zelnio, E.; Velten, V. Review of recent advances in AI/ML using the MSTAR data. In Proceedings of the Algorithms for Synthetic Aperture Radar Imagery XXVII, Online Only, 27 April–8 May 2020; Volume 11393, pp. 53–63. [Google Scholar]
  45. Wang, P.; Zhang, H.; Patel, V.M. SAR image despeckling using a convolutional neural network. IEEE Signal Process. Lett. 2017, 24, 1763–1767. [Google Scholar] [CrossRef]
  46. Gao, F.; Huang, T.; Sun, J.; Wang, J.; Hussain, A.; Yang, E. A new algorithm for SAR image target recognition based on an improved deep convolutional neural network. Cogn. Comput. 2019, 11, 809–824. [Google Scholar] [CrossRef]
  47. Zheng, C.; Jiang, X.; Liu, X. Semi-supervised SAR ATR via multi-discriminator generative adversarial network. IEEE Sensors J. 2019, 19, 7525–7533. [Google Scholar] [CrossRef]
  48. Zheng, S.; Hao, X.; Zhang, C.; Zhou, W.; Duan, L. Towards Lightweight Deep Classification for Low-Resolution Synthetic Aperture Radar (SAR) Images: An Empirical Study. Remote Sens. 2023, 15, 3312. [Google Scholar] [CrossRef]
  49. Zhao, P.; Liu, K.; Zou, H.; Zhen, X. Multi-stream convolutional neural network for SAR automatic target recognition. Remote Sens. 2018, 10, 1473. [Google Scholar] [CrossRef]
  50. Ma, L.; Liu, Y.; Zhang, X.; Ye, Y.; Yin, G.; Johnson, B.A. Deep learning in remote sensing applications: A meta-analysis and review. ISPRS J. Photogramm. Remote Sens. 2019, 152, 166–177. [Google Scholar] [CrossRef]
  51. Zhang, L.; Zhang, L.; Du, B. Deep learning for remote sensing data: A technical tutorial on the state of the art. IEEE Geosci. Remote Sens. Mag. 2016, 4, 22–40. [Google Scholar] [CrossRef]
  52. Van Den Oord, A.; Kalchbrenner, N.; Kavukcuoglu, K. Pixel recurrent neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA, 19–24 June 2016; pp. 1747–1756. [Google Scholar]
  53. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  54. Ramasinghe, S.; Lucey, S. Beyond periodicity: Towards a unifying framework for activations in coordinate-mlps. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 142–158. [Google Scholar]
  55. Lundberg, M.; Ulander, L.M.; Pierson, W.E.; Gustavsson, A. A challenge problem for detection of targets in foliage. In Proceedings of the Algorithms for Synthetic Aperture Radar Imagery XIII, SPIE, Kissimmee, FL, USA, 17–20 April 2006; Volume 6237, pp. 160–171. [Google Scholar]
  56. Fathony, R.; Sahu, A.K.; Willmott, D.; Kolter, J.Z. Multiplicative filter networks. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020. [Google Scholar]
Figure 1. Framework for learning INR for image signals. An MLP-based network maps pixel coordinates to pixel intensity values, then is optimized by minimizing the MSE loss between the target pixel intensity values and their predicted ones.
Figure 2. Plot of the periodic activation function in SIREN.
Figure 4. Plot of the curve of k^γ versus e/E as e increases during the training process.
Figure 5. Visualization of the representation results on both clean and noisy images. The ground truth image is presented in the first column for reference. For each of the SIREN and NRL-SIREN methods, the top row displays the represented full image and the bottom row shows a detailed portion of the image.
Figure 6. Example of a clean image and a noisy image, along with their respective gradient maps.
Figure 7. Illustration showing the use of a threshold α to distinguish clean and noisy images.
Figure 8. Qualitative comparison of SAR image representation results. Our NRL method demonstrates competitive performance in capturing fine details.
Figure 9. Example results of image representation produced by various methods. The ground truth image is presented in the left-most column for reference. For each method, the top row displays the reconstructed full image, while the bottom row shows a detailed portion of the image.
Figure 10. Plot of the curve of k^γ versus e/E for various γ as e increases during the training process.
Figure 11. Performance analysis of models trained on the target image with varying numbers of epochs. Each subfigure illustrates the progression of the PSNR and SSIM metrics during training.
Figure 12. Qualitative comparison of error maps. SIREN fails to capture fine details such as the white regions, while our proposed method helps to mitigate this problem.
Table 1. Quantitative comparison of image representation with and without NRL. Metrics include PSNR (dB) and SSIM.

Image   Metrics     SIREN    NRL-SIREN
Clean   PSNR (dB)   30.12    54.62
        SSIM        0.692    0.996
Noisy   PSNR (dB)   45.62    43.87
        SSIM        0.998    0.997
Table 2. Experimental comparison of SAR image representation performance with and without NRL. The best results are highlighted in bold.

Metrics     PEMLP             Gauss             SIREN             FINER
            w/o NRL   w NRL   w/o NRL   w NRL   w/o NRL   w NRL   w/o NRL   w NRL
PSNR (dB)   24.80     32.21   28.13     31.50   32.68     50.94   35.91     53.24
SSIM        0.626     0.763   0.628     0.754   0.881     0.997   0.928     0.998
Table 3. Quantitative performance comparison of different classes of SAR image representation with and without NRL. The best results are highlighted in bold.

Category   Metrics     PEMLP             Gauss             SIREN             FINER
                       w/o NRL   w NRL   w/o NRL   w NRL   w/o NRL   w NRL   w/o NRL   w NRL
Aircraft   PSNR (dB)   25.85     34.15   29.22     35.68   38.54     50.21   39.71     51.64
           SSIM        0.648     0.895   0.740     0.905   0.945     0.993   0.950     0.994
Ship       PSNR (dB)   24.92     33.67   28.05     34.32   37.95     49.67   39.03     50.92
           SSIM        0.637     0.891   0.720     0.899   0.940     0.992   0.946     0.993
Car        PSNR (dB)   25.30     34.12   28.50     35.50   38.21     50.32   39.45     51.75
           SSIM        0.646     0.893   0.735     0.902   0.943     0.993   0.948     0.994
Bridge     PSNR (dB)   23.70     31.12   26.90     32.28   35.82     47.45   36.97     48.89
           SSIM        0.625     0.880   0.705     0.888   0.925     0.985   0.930     0.987
Tank       PSNR (dB)   24.88     33.88   28.17     34.92   38.08     49.34   39.30     50.78
           SSIM        0.628     0.889   0.728     0.898   0.941     0.992   0.947     0.993
Harbor     PSNR (dB)   23.55     30.34   26.75     31.28   35.45     46.42   36.61     48.54
           SSIM        0.620     0.875   0.700     0.883   0.923     0.979   0.928     0.986
Table 4. Experimental comparison of image representation accuracy after training with and without NRL. The best results are highlighted in bold.

Metrics     PEMLP             Gauss             SIREN             FINER
            w/o NRL   w NRL   w/o NRL   w NRL   w/o NRL   w NRL   w/o NRL   w NRL
PSNR (dB)   18.10     19.19   19.69     20.71   20.17     21.32   20.41     21.98
SSIM        0.132     0.169   0.398     0.431   0.410     0.441   0.423     0.459
Table 5. Ablation study of the focusing parameter γ. The best results are highlighted in bold.

Metrics     γ = 0   γ = 0.5   γ = 1   γ = 2
PSNR (dB)   32.68   50.94     48.95   46.73
SSIM        0.881   0.997     0.995   0.990
Table 6. Ablation study on learning epochs for the target image. The best results are highlighted in bold.

Metrics     Baseline   R = 0   R = 50   R = 180   R = 210
PSNR (dB)   32.68      42.38   50.94    50.24     49.12
SSIM        0.881      0.972   0.997    0.995     0.991
Table 7. Ablation study of different uniform noise ranges. The best results are highlighted in bold.

Metrics     Baseline   R = 1.3   R = 1.1   R = 1   R = 0.9   R = 0.7
PSNR (dB)   32.68      38.91     47.91     50.94   46.72     42.12
SSIM        0.881      0.942     0.991     0.997   0.987     0.962
Table 8. Comparison with the non-INR multiplicative filter network (MFN) method. The best results are highlighted in bold.

Metrics     MFNs    SIREN   NRL-SIREN
PSNR (dB)   28.93   32.68   50.94
SSIM        0.847   0.881   0.997
Table 9. Quantitative comparison of image reconstruction with and without NRL. Metrics include PSNR and SSIM. The best results are highlighted in bold.

Image       Metrics     SIREN   NRL-SIREN
Magnitude   PSNR (dB)   44.19   53.19
            SSIM        0.973   0.992
Real        PSNR (dB)   43.75   52.63
            SSIM        0.961   0.990
Imaginary   PSNR (dB)   44.53   53.29
            SSIM        0.975   0.992
