Article

NBDNet: A Self-Supervised CNN-Based Method for InSAR Phase and Coherence Estimation

1 Department of Space Microwave Remote Sensing System, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
2 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(7), 1181; https://doi.org/10.3390/rs17071181
Submission received: 25 February 2025 / Revised: 21 March 2025 / Accepted: 24 March 2025 / Published: 26 March 2025

Abstract
Phase denoising constitutes a critical component of the synthetic aperture radar interferometry (InSAR) processing chain, where noise suppression and detail preservation are two mutually constraining objectives. Recently, deep learning has attracted considerable interest due to its promising performance in the field of image denoising. In this paper, a Neighbor2Neighbor denoising network (NBDNet) is proposed, which is capable of simultaneously estimating phase and coherence in both single-look and multi-look cases. Specifically, repeat-pass PALSAR real interferograms encompassing a diverse range of coherence, fringe density, and terrain features are used as the training dataset, and the novel Neighbor2Neighbor self-supervised training framework is leveraged. The Neighbor2Neighbor framework eliminates the necessity of noise-free labels, simplifying the training process. Furthermore, rich features can be learned directly from real interferograms. In order to validate the denoising capability and generalization ability of the proposed NBDNet, simulated data, repeat-pass data from Sentinel-1 Interferometric Wide (IW) swath mode, and single-pass data from Hongtu-1 stripmap mode are used for phase denoising experiments. The results demonstrate that NBDNet performs well in terms of noise suppression, detail preservation and computation efficiency, validating its potential for high-precision and high-resolution topography reconstruction.

1. Introduction

Synthetic aperture radar interferometry (InSAR) is an effective remote sensing technique that utilizes the phase information between two coherent synthetic aperture radar (SAR) images to reconstruct a digital elevation model (DEM) and monitor deformation [1]. However, the interferometric phase is inevitably contaminated by random noise, which not only makes the subsequent phase unwrapping step challenging but also reduces the accuracy of height or deformation measurements. Therefore, phase denoising (or phase filtering) represents a crucial stage in the InSAR processing chain, which aims to suppress the random phase noise in the interferograms while preserving as much detailed information as possible.
In the last few decades, plenty of phase denoising algorithms have been proposed. The most straightforward way to reduce phase noise is boxcar averaging, which convolves the interferogram with a box filter. Boxcar averaging is easy to implement but is far from satisfactory due to its limited noise suppression ability and strong resolution loss. Lee et al. [2] replaced the rectangle window with predefined directional windows to better preserve edge information. However, the limited number of windows may introduce artifacts in the denoised interferogram. The intensity-driven adaptive-neighborhood (IDAN) [3] technique iteratively selects similar adjacent pixels to construct an adaptive neighborhood, which improves flexibility compared to predefined windows but suffers from a selection bias. In addition to spatial averaging, some researchers attempt to split signal and noise in the transformed domain (e.g., frequency domain [4] and wavelet domain [5]) and achieve promising results.
Non-local means (NL-means) [6] is an extension of local averaging. For a specific pixel, NL-means calculates the weighted average of all pixels within a relatively extensive search window, where the weights are set based on the similarity measured by patch-wise Euclidean distance. Compared with local averaging, NL-means can achieve better noise reduction and detail preservation at the cost of increased computational burden. Deledalle et al. [7] introduced the concept of NL-means to the field of InSAR denoising and proposed NL-InSAR to jointly estimate the reflectivity, coherence, and interferometric phase. Later, they proposed a generalized framework named NL-SAR [8], which can perform denoising on an arbitrary number of SAR images. Baier et al. [9] implemented NL-InSAR and NL-SAR on graphics processing units (GPUs), improving the processing speed for large-scale interferograms. Zhu et al. [10] conducted a comprehensive survey of the capabilities and limitations of NL-means InSAR denoising in the context of generating high-resolution DEM using TanDEM-X interferograms. They highlighted that NL-InSAR results in terrace-like artifacts in steep regions, while NL-SAR tends to blur the edge information. To address the aforementioned issues, Baier et al. [11] designed a refined non-local InSAR filter by introducing a fringe frequency compensation step and dynamically changing the patch size based on local scene heterogeneity. Furthermore, Sica et al. [12] proposed a powerful InSAR phase denoising algorithm named InSAR-BM3D, which is a combination of NL-means and transform-domain filtering. InSAR-BM3D and its refined version OC-InSAR-BM3D [13] perform well in terms of both noise suppression and detail preservation. Kang et al. [14] proposed an innovative algorithm named complex convolutional sparse coding (ComCSC) as well as its gradient regularized version (ComCSC-GR), which achieved outstanding denoising performance via convolutional sparse representation in the complex domain. For further information, Xu et al. [15] provided a comprehensive survey of established and emerging phase denoising algorithms.
Recently, deep learning (DL), especially convolutional neural networks (CNNs), has demonstrated considerable success in the field of image denoising. For instance, Zhang et al. [16] developed a denoising convolutional neural network (DnCNN) to tackle additive white Gaussian noise for grayscale and color images. The supervised learning scheme was employed, whereby the denoising network was tasked with predicting the output as close as possible to the given clean target images. Specifically, 400 images of size 180 × 180 serve as clean target images, and simulated Gaussian noise is added to clean images to create noisy images. The experiment on the Berkeley segmentation dataset (BSD68) shows that DnCNN outperforms BM3D by about 0.6dB on the peak-signal-to-noise ratio (PSNR). An improved version of DnCNN called FFDNet was then proposed [17]. On the one hand, FFDNet takes an adjustable noise level map as an extra input channel, which makes its denoising performance slightly better than DnCNN. On the other hand, FFDNet introduces a reversible downscaling operator, which expands the receptive field and reduces the computation burden. Nevertheless, the noise present in real images may deviate from the Gaussian distribution, which would result in performance degradation of the trained network when applied to real noisy images. To overcome the limitation of supervised learning, efforts have been made to develop unsupervised learning or self-supervised learning algorithms. Lehtinen et al. [18] introduced the Noise2Noise scheme in which the parameters of the network are optimized to learn a mapping function between independent noisy image pairs with the same ground truth. In other words, the optimized parameters of the denoising network theoretically remain unchanged if the clean target images are replaced with independent noisy images. The Noise2Noise framework obviates the necessity for the availability of clean images, but collecting noisy image pairs in some research domains is still infeasible. Subsequently, Huang et al. [19] proposed the Neighbor2Neighbor framework, which uses a random neighbor sub-sampler to construct pairs of noisy images from single noisy images. Similar strategies were used to denoise seismic data and achieved excellent results [20].
DL techniques have attracted much attention in the domain of InSAR phase denoising. Since noise-free interferograms are unavailable, most existing DL-based InSAR denoising methods use simulated datasets for training. Sica et al. [21] employed an innovative strategy to generate a simulated training dataset and trained a robust InSAR denoising network named  Φ -Net to jointly estimate the interferometric phase and coherence. Specifically, the amplitude, coherence, and phase patterns were classified into six representative categories. Then, simulated noise is added based on the complex circular Gaussian signal model. Experiments on synthesized data and real data from the TanDEM-X and Sentinel-1 acquisitions demonstrated the potential of  Φ -Net in high-quality DEM generation. Vitale et al. [22] designed a novel multiobjective cost function considering the fringe patterns, edge preservation and statistical distribution, and they trained an efficient phase denoising network named InSAR-MONet. Only recently have some self-supervised methods been proposed. Sica et al. [23] used TanDEM-X images as the training dataset and employed a blind spot technique to train a denoising network, but experimental results showed that its performance was slightly inferior to OC-InSAR-BM3D [13]. Geara et al. [24] verified the feasibility of self-supervised denoising with the simulated dataset. Besides phase denoising, DL has also been widely used in other steps of the InSAR processing chain. Wu et al. [25] developed a versatile network named SSENet to estimate stereo-radargrammetric shift, which can be utilized to improve coregistration accuracy, facilitate phase unwrapping and estimate absolute phase offset. Wu et al. [26] proposed two networks named DDNet and PUNet, the former for the fast identification of rapid deformation from large interferograms and the latter for the phase unwrapping of the detected interferogram patches. Chen et al. [27] designed a U-shaped network named ARU-Net to remove the atmospheric phase screen (APS) in repeat-pass interferograms.
In this paper, we propose a Neighbor2Neighbor denoising network (NBDNet) for InSAR phase and coherence estimation. NBDNet leverages the Neighbor2Neighbor framework, and the training dataset is constructed using Advanced Land Observing Satellite Phased Array-type L-band Synthetic Aperture Radar (ALOS PALSAR) repeat-pass interferograms. A total of 15 interferometric pairs were selected, encompassing a diverse range of coherences, fringe densities, and terrain features. The main advantages of NBDNet are as follows:
  • Capable of suppressing the phase noise in the interferograms while preserving the detail information;
  • Capable of estimating the interferometric phase and coherence simultaneously;
  • Capable of performing denoising in both single-look and multi-look cases.
The remainder of the paper is structured as follows. A brief survey of the relevant works is provided in Section 2. Section 3 provides an in-depth description of the proposed NBDNet. Experimental results are presented in Section 4. Finally, discussions and conclusions are summarized in Section 5 and Section 6.

2. Related Works

This section provides a brief overview of two works that inspired the proposed NBDNet, namely, Noise2Noise and Neighbor2Neighbor.

2.1. Noise2Noise

Traditional supervised learning uses noisy–clean pairs  ( y , x )  to estimate the optimal parameters  θ  of a denoising network  f θ , which can be formulated as the following regression problem:
$$\arg\min_{\theta}\; \mathbb{E}_{x,y}\,\big\| f_{\theta}(y) - x \big\|_{p},\tag{1}$$
where $\mathbb{E}$ denotes the mathematical expectation, and $\|\cdot\|_{p}$ denotes the $L_{p}$-norm.
The Noise2Noise [18] framework points out that corrupting the clean targets with independent zero-mean noise during training does not change the optimized parameters of the network. Provided with two independent noisy images named y and z corresponding to the common ground truth named x, the objective function of Noise2Noise is defined as follows:
$$\arg\min_{\theta}\; \mathbb{E}_{x,y,z}\,\big\| f_{\theta}(y) - z \big\|_{p}.\tag{2}$$
Indeed, the statistical distributions of y and z can differ, as long as $\mathbb{E}_{y|x}(y) = \mathbb{E}_{z|x}(z) = x$. The Noise2Noise framework manages to train the denoising network by looking only at corrupted images, achieving performance comparable to that of classical supervised learning. However, it is sometimes expensive or even impractical to acquire noisy pairs, which limits the application of the Noise2Noise framework.

2.2. Neighbor2Neighbor

Neighbor2Neighbor [19] is a variant of the Noise2Noise framework, which requires only single noisy images to conduct self-supervised training. As illustrated in Figure 1, a random neighbor subsampler $G = (g_1, g_2)$ is designed to construct a noisy pair $(g_1(y), g_2(y))$ from one single noisy image y. The detailed steps of the random neighbor subsampler are as follows (a code sketch is given after the list).
  • The single noisy image y of size $W \times H$ is uniformly divided into $\frac{W}{2} \times \frac{H}{2}$ cells of size $2 \times 2$;
  • For the cell in the i-th row and j-th column, two neighboring positions are randomly picked and placed at the $(i,j)$-th positions of the subsamplers $g_1$ and $g_2$, respectively;
  • Execute step 2 for all cells to generate the full subsampler $G = (g_1, g_2)$. Then, the noisy pair $(g_1(y), g_2(y))$ of size $\frac{W}{2} \times \frac{H}{2}$ can be obtained.
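For concreteness, the random neighbor subsampler can be written in a few lines of NumPy, as sketched below. The cell handling follows the steps above; the function name, the restriction to 4-connected neighbor pairs, and the complex-valued input are illustrative assumptions rather than the authors' released implementation.

```python
import numpy as np

def neighbor_subsample(y, rng=None):
    """Minimal sketch of the random neighbor subsampler G = (g1, g2).

    y : 2-D array (H x W, H and W even), e.g. a noisy complex interferogram.
    Returns g1(y), g2(y) of size H/2 x W/2, where each output pixel pair is
    taken from two 4-connected neighboring positions of one 2 x 2 cell.
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = y.shape
    g1 = np.empty((H // 2, W // 2), dtype=y.dtype)
    g2 = np.empty_like(g1)
    # 4-connected neighbor pairs inside a 2 x 2 cell, flattened row-major:
    #   0 1
    #   2 3
    pairs = np.array([(0, 1), (0, 2), (1, 3), (2, 3)])
    for i in range(H // 2):
        for j in range(W // 2):
            cell = y[2 * i:2 * i + 2, 2 * j:2 * j + 2].reshape(-1)
            k1, k2 = pairs[rng.integers(4)]
            if rng.random() < 0.5:      # random order of the two neighbors
                k1, k2 = k2, k1
            g1[i, j], g2[i, j] = cell[k1], cell[k2]
    return g1, g2
```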
The Neighbor2Neighbor framework aims to acquire independent noisy pairs via the random neighbor sampler G, and in this case, the minimization problem shown in (2) can be reformulated as follows:
$$\arg\min_{\theta}\; \mathbb{E}_{x,y}\,\big\| f_{\theta}(g_{1}(y)) - g_{2}(y) \big\|_{p}.\tag{3}$$
However, it must be stressed that (3) only holds if the two conditions of Noise2Noise are satisfied: (1) the noise in $g_1(y)$ and $g_2(y)$ is independent; (2) the ground truths of $g_1(y)$ and $g_2(y)$ are the same, i.e., $\mathbb{E}_{y|x}(g_1(y)) - \mathbb{E}_{y|x}(g_2(y)) = 0$. The first condition is satisfied if we assume that the noise in y is pixel-wise independent (i.e., spatially uncorrelated). For the second condition, since $g_1(y)$ and $g_2(y)$ are originally neighboring, the gap between them should be small when a certain degree of spatial smoothness of the image is assumed. Moreover, in order to relax this requirement, a regularization term is added to address the gap between the ground truths of $g_1(y)$ and $g_2(y)$, resulting in the following minimization problem:
$$\arg\min_{\theta}\; \mathbb{E}_{x,y}\,\big\| f_{\theta}(g_{1}(y)) - g_{2}(y) \big\|_{p} + \alpha\, \mathbb{E}_{x,y}\,\big\| f_{\theta}(g_{1}(y)) - g_{2}(y) - g_{1}(f_{\theta}(y)) + g_{2}(f_{\theta}(y)) \big\|_{p},\tag{4}$$
where  α  serves to balance the reconstruction term and the regularization term. As analyzed in [19], the absence of the regularization term will lead to over-smoothing, and setting  α = 2  can achieve the balance between noise suppression and detail preservation.
In short, Neighbor2Neighbor is a self-supervised denoising framework which employs a random neighbor subsampler to construct training pairs from only noisy images. The pixel-wise independence of noise is assumed, and the regularization term is employed to compensate for the ground truth gap.

3. Methodology

This section presents the methodology of the proposed NBDNet. First, the employed InSAR signal model is introduced. Second, the strategy for training dataset generation is described. Third, the network architecture is presented. Fourth, the self-supervised training process is expounded. Finally, the inference details are discussed.

3.1. Signal Model

Let  z 1  and  z 2  be two pixels at the same position in two coregistered single-look complex (SLC) SAR images. Based on Goodman’s theory [28],  z 1  and  z 2  obey a zero-mean complex circular Gaussian (CCG) distribution, which can be written as follows:
$$p(z_{1}, z_{2} \mid \Sigma) = \frac{1}{\pi^{2}\det(\Sigma)} \exp\!\left( -\begin{bmatrix} z_{1}^{*} & z_{2}^{*} \end{bmatrix} \Sigma^{-1} \begin{bmatrix} z_{1} \\ z_{2} \end{bmatrix} \right),\tag{5}$$
where  Σ  is the  2 × 2  covariance matrix, which has the following form:
$$\Sigma = \begin{bmatrix} A^{2} & A^{2}\rho\exp(j\phi) \\ A^{2}\rho\exp(-j\phi) & A^{2} \end{bmatrix},\tag{6}$$
where A is the true amplitude (assuming the true amplitudes of the two SLCs are equal),  ρ  is the true coherence, which represents the coherence influenced by baseline decorrelation, temporal decorrelation, thermal noise, etc., and  ϕ  is the true interferometric phase which is a blend of several contributions such as topographic phase, deformation phase and atmospheric phase.
The unbiased maximum likelihood estimators of the elements of  Σ  are provided by [29]
$$\hat{\sigma}_{11} = \frac{1}{2N}\left( \sum_{n=1}^{N} \big| z_{1}(n) \big|^{2} + \sum_{n=1}^{N} \big| z_{2}(n) \big|^{2} \right),\tag{7}$$
$$\hat{\sigma}_{12} = \frac{1}{N} \sum_{n=1}^{N} z_{1}(n)\, z_{2}^{*}(n),\tag{8}$$
where $\sigma_{pq}$ denotes the element in the p-th row and q-th column of the matrix $\Sigma$, the hat $\hat{(\cdot)}$ denotes an estimate, $z_p(n)$ denotes the n-th sample of $z_p$, and N denotes the number of samples. In practice, the average operation is usually performed on spatially neighboring pixels, which is called multi-looking in the InSAR field.
To obtain the sample estimates of  ρ  and  ϕ , we calculate the following:
$$\gamma_{m} = \frac{\hat{\sigma}_{12}}{\hat{\sigma}_{11}} = \hat{\rho}\exp(j\hat{\phi}),\tag{9}$$
where  γ m  is the multi-look normalized complex interferogram. In this way, the magnitude and angle of the multi-look normalized complex interferogram provide the sample estimate of the coherence and interferometric phase.
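As a minimal illustration of (7)–(9), the following NumPy sketch computes the multi-look normalized complex interferogram from two coregistered SLCs. The non-overlapping block averaging and the function name are assumptions; the code simply realizes the sample estimators defined above.

```python
import numpy as np

def multilook_interferogram(slc1, slc2, la, lr):
    """Sketch of (7)-(9): multi-look normalized complex interferogram.

    slc1, slc2 : coregistered complex SLCs of identical shape.
    la, lr     : azimuth and range multi-looking factors.
    Returns gamma_m, whose magnitude is the coherence estimate and whose
    angle is the interferometric phase estimate.
    """
    H, W = slc1.shape
    H, W = (H // la) * la, (W // lr) * lr          # drop incomplete cells
    s1 = slc1[:H, :W].reshape(H // la, la, W // lr, lr)
    s2 = slc2[:H, :W].reshape(H // la, la, W // lr, lr)
    # (7): average power of the two channels
    sigma11 = 0.5 * ((np.abs(s1) ** 2).mean(axis=(1, 3))
                     + (np.abs(s2) ** 2).mean(axis=(1, 3)))
    # (8): cross-correlation term
    sigma12 = (s1 * np.conj(s2)).mean(axis=(1, 3))
    # (9): normalized complex interferogram
    return sigma12 / sigma11

# usage: gamma_m = multilook_interferogram(slc1, slc2, 2, 8)
#        rho_hat, phi_hat = np.abs(gamma_m), np.angle(gamma_m)
```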
Despite the loss of resolution associated with multi-looking, it presents the following advantages:
  • Multi-looking can suppress noise to some extent in advance, making it easier for subsequent denoising steps.
  • Multi-looking reduces the size of the interferogram by a factor of  L a  and  L r  in the azimuth and range direction, respectively, where  L a  and  L r  are the azimuth and range multi-looking factors. Therefore, the computational burden is reduced.
  • Since SAR images are usually oversampled in both range and azimuth directions (typical oversampling rate ranges from 1.1 to 1.4), there is a correlation between neighboring pixels. Multi-looking has been demonstrated to be an effective method of reducing the spatial correlation of noise [30], which facilitates the application of the Neighbor2Neighbor framework since the noise in  γ m  can be approximated as pixel-wise independent.
Consequently, the multi-look normalized complex interferogram  γ m  is employed as the input of our denoising network. In other words, our denoising network is designed for multi-look interferograms instead of single-look ones. However, a special technique to be mentioned in Section 3.5 allows the denoising network to be used in single-look cases.

3.2. Training Dataset Generation

The proposed NBDNet leverages the Neighbor2Neighbor framework mentioned in Section 2.2, which requires only noisy interferograms to train the denoising network. The Phased Array-type L-Band Synthetic Aperture Radar (PALSAR) equipped on the Advanced Land Observing Satellite (ALOS) worked from 2006 to 2011 and collected a large amount of data on a global scale, which could be used for repeat-pass interferometry. In order to obtain a robust and efficient denoising network, the interferograms used for training should encompass a diverse range of coherence, fringe density and terrain features. The global seasonal Sentinel-1 interferometric coherence dataset [31], though not the same satellite, is used to guide the selection of PALSAR interferometric pairs. A total of 15 interferometric pairs were selected, and their main parameters are listed in Table 1. Figure 2 shows the phase, coherence and amplitude of each interferometric pair. In fact, the coherence is spatially varying. Figure 3 provides the statistical histogram of the coherence of the 15 selected interferograms. It can be seen that the number of samples with coherence lower than 0.2 is relatively small, but in general, regions with too low coherence are usually masked in InSAR processing. Furthermore, there is also a limited number of samples with coherence higher than 0.95 due to the fact that it is difficult to achieve such a high coherence with repeat-pass interferometry. Overall, the distribution of coherence basically covers the range from 0 to 1, which satisfies the training requirements.
The basic processing chain of the training data generation is as follows.
  • SAR Image Focusing: The level 1.0 products downloaded from Alaska Satellite Facility (ASF) website are unfocused signal data, so they have to be focused to obtain the SLCs.
  • Coregistration and Cropping: For each interferometric pair, the auxiliary SLC is coregistered to the primary SLC using the polynomial offset models. Afterward, the common area of two coregistered SLCs is cropped.
  • Flat Earth Phase Removal: Flat Earth phase refers to the phase trend corresponding to the curved Earth, which is calculated and added to the phase of the coregistered auxiliary SLC to reduce the fringe frequency in the interferogram to be generated.
  • Multi-look Normalized Interferogram Generation: The multi-look normalized interferograms can be generated according to (7), (8), and (9). In order to guarantee the robustness and practicality of our denoising network, interferograms corresponding to 12 different multi-looking factor combinations are generated for each pair, as depicted in Figure 4. The guideline for setting the ratio of azimuth and range multi-looking factors is to make the azimuth resolution and ground range resolution of the multi-look interferogram as close as possible. The maximum multi-looking factor is set to  4 × 9 = 36  to ensure that there is no excessive loss of resolution, and the minimum multi-looking factor is set to  2 × 2 = 4  to ensure the pixel-wise independence of noise.
In summary, 15 PALSAR interferometric pairs are selected, and for each pair, 12 interferograms with different multi-looking factor combinations are generated. For each interferogram, 1800 patches of size  120 × 120  are uniformly cropped. Therefore, a total number of  15 × 12 × 1800 = 324,000  patches of size  120 × 120  are generated as the training dataset. The training set is very complete, incorporating a wide range of coherence, fringe density, terrain features, and multi-looking factors. Consequently, the network can learn rich features from the dataset during training and, thus, is expected to be competent in interferogram denoising in a variety of scenarios.
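A possible realization of the uniform patch cropping is sketched below. The 120 × 120 patch size and the total of 1800 patches per interferogram follow the text, while the particular grid split (40 × 45) and the function name are assumptions.

```python
import numpy as np

def crop_patches(gamma_m, patch=120, n_rows=40, n_cols=45):
    """Uniformly crop n_rows * n_cols patches of size patch x patch from a
    multi-look interferogram (sketch; only the total count 40 * 45 = 1800
    follows the text, the grid split itself is an assumption)."""
    H, W = gamma_m.shape
    rows = np.linspace(0, H - patch, n_rows).astype(int)
    cols = np.linspace(0, W - patch, n_cols).astype(int)
    patches = [gamma_m[r:r + patch, c:c + patch] for r in rows for c in cols]
    return np.stack(patches)      # shape: (n_rows * n_cols, patch, patch)
```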

3.3. Network Architecture

The network architecture of the proposed NBDNet is depicted in Figure 5, which is modified from DnCNN [16] and FFDNet [17]. For the input layer, the real and imaginary components of the multi-look normalized complex interferogram are concatenated into a tensor of size $H \times W \times 2$. Afterward, a reversible downscaling operator is applied, which rearranges blocks of spatial data into the third dimension and thus reshapes the input tensor into a downscaled tensor of size $\frac{H}{2} \times \frac{W}{2} \times 8$, as depicted in Figure 6. As analyzed in [32], the combination of the downscaling and upscaling operators is an effective way of reducing computation burden without deteriorating the denoising capability. Subsequently, the denoising network with depth D consists of 3 categories of modules: (i) Conv + ReLU: for the first module, M convolution (Conv) filters of size $3 \times 3 \times 8$ are employed to create M feature maps, followed by rectified linear units (ReLU) [33] activation function for nonlinearity. (ii) Conv + BN + ReLU: for modules 2 to $(D-1)$, a cascaded structure is employed. For each module, M filters of size $3 \times 3 \times M$ are employed, and batch normalization (BN) [34] is inserted between Conv and ReLU. (iii) Conv: for the last module, 8 filters of size $3 \times 3 \times M$ are employed. Moreover, a residual connection is added between the input and output of the denoising network, i.e., the downscaled output is given by subtracting the output of the denoising network from the downscaled input. This strategy is called residual learning [35], which can address the performance degradation problem and facilitate network training. Finally, an upscaling operator is applied to rearrange the data into an output denoised result of size $H \times W \times 2$.
The proposed NBDNet has two main parameters that need to be determined, namely the depth D (i.e., the number of convolution layers) and the number of feature maps M. In [17], D is set to 15 and M is set to 64 for grayscale denoising, and D is set to 12 and M is set to 96 for color denoising. As for the parameter settings of the proposed NBDNet, the following criteria have been considered.
  • Compared with the additive white Gaussian noise that FFDNet deals with, the noise in real interferograms is more intricate, so the capacity of the network needs to be enlarged by increasing either the depth D or the number of feature maps M.
  • The real and imaginary parts of the multi-look normalized complex interferogram are concatenated as the input of our network, and there is a dependency between the two channels. As analyzed in [17], the exploitation of inter-channel dependency is facilitated by implementing a small depth D and a large number of feature maps M.
  • The appropriate receptive field for image denoising ranges from  35 × 35  to  61 × 61 , and larger depth D will increase the computation burden with little performance improvement [17].
Consequently, considering the network capacity, receptive field, inter-channel dependency and computation burden, we empirically set the parameters  D = 13  and  M = 128 , with the corresponding receptive field size of  54 × 54 .
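The described architecture can be summarized in the following Keras sketch with D = 13 and M = 128. It is an interpretation of Figure 5 under the stated parameters, not the authors' released code; layer naming and the exact placement of biases are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_nbdnet(depth=13, filters=128):
    """Sketch of the NBDNet denoiser. Input: real/imaginary pair of a
    multi-look normalized interferogram; H and W must be even."""
    x_in = layers.Input(shape=(None, None, 2))
    # reversible downscaling: (H, W, 2) -> (H/2, W/2, 8)
    x = layers.Lambda(lambda t: tf.nn.space_to_depth(t, 2))(x_in)
    skip = x
    # module 1: Conv + ReLU
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    # modules 2 ... (D-1): Conv + BN + ReLU
    for _ in range(depth - 2):
        x = layers.Conv2D(filters, 3, padding='same', use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    # last module: Conv back to 8 channels (predicted noise, residual learning)
    noise = layers.Conv2D(8, 3, padding='same')(x)
    x = layers.Subtract()([skip, noise])
    # reversible upscaling: (H/2, W/2, 8) -> (H, W, 2)
    x_out = layers.Lambda(lambda t: tf.nn.depth_to_space(t, 2))(x)
    return tf.keras.Model(x_in, x_out, name='NBDNet')
```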

3.4. Self-Supervised Training

The process of self-supervised training is depicted in Figure 7. Given the noisy interferogram  γ m , the random neighbor subsampler  G = ( g 1 , g 2 )  shown in Figure 1 is applied to obtain  g 1 ( γ m )  and  g 2 ( γ m ) , and the denoising network  f θ  shown in Figure 5 is applied to  g 1 ( γ m )  to obtain  f θ ( g 1 ( γ m ) ) . Furthermore,  f θ  and G are successively applied to  γ m  to obtain  g 1 ( f θ ( γ m ) )  and  g 2 ( f θ ( γ m ) ) . Afterward, the loss function  L  is calculated, and then the weights  θ  are updated using the backpropagation algorithm. Once the training is completed, the denoising network  f θ  will be able to estimate the mathematical expectation of the noisy interferogram  γ m . In other words, NBDNet can conduct simultaneous estimation of the interferometric phase and coherence.
The loss function is designed under the Neighbor2Neighbor framework mentioned in Section 2.2. Given a multi-look normalized complex interferogram  γ m , the loss function is defined as follows:
$$\begin{aligned} \mathcal{L} ={}& \big\| \mathfrak{R}\big( f_{\theta}(g_{1}(\gamma_{m})) - g_{2}(\gamma_{m}) \big) \big\|_{1} + \big\| \mathfrak{I}\big( f_{\theta}(g_{1}(\gamma_{m})) - g_{2}(\gamma_{m}) \big) \big\|_{1} \\ &+ \alpha \Big( \big\| \mathfrak{R}\big( f_{\theta}(g_{1}(\gamma_{m})) - g_{2}(\gamma_{m}) - g_{1}(f_{\theta}(\gamma_{m})) + g_{2}(f_{\theta}(\gamma_{m})) \big) \big\|_{1} + \big\| \mathfrak{I}\big( f_{\theta}(g_{1}(\gamma_{m})) - g_{2}(\gamma_{m}) - g_{1}(f_{\theta}(\gamma_{m})) + g_{2}(f_{\theta}(\gamma_{m})) \big) \big\|_{1} \Big), \end{aligned}\tag{10}$$
where  R  and  I  refer to extracting the real and imaginary components of a complex number, respectively,  G = ( g 1 , g 2 )  is the random neighbor sampler,  f θ  is the denoising network, and the  L 1 -norm  | | · | | 1  is selected based on preliminary experiments. The hyperparameter  α  determines the strength of the regularization term and is set to 2 as suggested in [19].
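A sketch of the loss in (10), written with TensorFlow operations on complex-valued tensors, is given below. The argument names and the mean-based reduction of the L1-norm are assumptions.

```python
import tensorflow as tf

def nbd_loss(f_theta, g2_y, g1_fy, g2_fy, alpha=2.0):
    """Sketch of (10). All arguments except alpha are complex tensors:
    f_theta      : network output for g1(gamma_m)
    g2_y         : second subsampled noisy image g2(gamma_m)
    g1_fy, g2_fy : subsampled versions of the denoised full interferogram
    """
    def l1(z):      # L1-norm applied separately to real and imaginary parts
        return (tf.reduce_mean(tf.abs(tf.math.real(z)))
                + tf.reduce_mean(tf.abs(tf.math.imag(z))))

    rec = l1(f_theta - g2_y)                       # reconstruction term
    reg = l1(f_theta - g2_y - g1_fy + g2_fy)       # regularization term
    return rec + alpha * reg
```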

3.5. Inference Details

As mentioned in Section 3.4, the network is capable of denoising multi-look interferograms of a wide range of multi-looking factors. Due to the downscaling and upscaling operators in the network structure, the size of the input multi-look interferogram must be even. However, this limitation can be easily removed by padding one line symmetrically after the end of the multi-look interferogram along dimensions of an odd size. The term “symmetrically” means padding with mirror reflections of the array along the border.
Due to the presence of spatial correlation in single-look interferograms, the denoising network cannot be applied directly in single-look cases. To address this issue, a special technique is designed, which includes the following steps (a code sketch follows the list):
  • Padding: Padding is performed along both the azimuth and range dimensions. Take the azimuth dimension as an example, and denote the number of azimuth lines as H. Without loss of generality, we assume that $H \equiv 3 \pmod{4}$ (otherwise the primary and auxiliary SLCs can be padded symmetrically to satisfy this requirement). Pad one line symmetrically both before the beginning and after the end of the primary and auxiliary SLCs, which makes the new number of azimuth lines satisfy $H \equiv 1 \pmod{4}$. As shown in Figure 8, the position of the original SLCs is indicated by solid pixels, while pixels with diagonal pattern denote the padded area.
  • Multi-looking: In this step, 4 multi-look interferograms with factor  2 × 2  are generated. The area for each multi-looking operation is shown in Figure 8a–d, where colored and gray pixels represent pixels used for multi-looking and pixels not used for multi-looking, respectively.
  • Denoising: The 4 interferograms generated in step 2 are denoised using the trained network.
  • Aggregation: The solid circles shown in Figure 8a–d indicate the center position of each pixel of the multi-look interferograms, each of which is located exactly at one corner of one single-look pixel. For each single-look pixel, the denoised complex values at its four corners are obtained, so the denoised complex value of each single-look pixel can be estimated by averaging the denoised complex values at its four corners.
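The procedure can be sketched as follows, assuming a trained denoiser denoise_fn that maps a complex multi-look interferogram to its denoised counterpart; the exact padding and corner indexing are one possible reading of Figure 8 and are not the authors' implementation.

```python
import numpy as np

def denoise_single_look(slc1, slc2, denoise_fn):
    """Sketch of the single-look technique of Section 3.5.

    slc1, slc2 : coregistered single-look complex images whose numbers of
                 lines/columns satisfy H = 3 (mod 4); pad beforehand otherwise.
    denoise_fn : trained denoiser, complex 2-D array in -> denoised array out.
    """
    H, W = slc1.shape
    # 1) pad one line/column symmetrically on every side
    s1 = np.pad(slc1, 1, mode='symmetric')
    s2 = np.pad(slc2, 1, mode='symmetric')

    acc = np.zeros((H, W), dtype=complex)
    cnt = np.zeros((H, W))
    # 2)-4) four shifted 2 x 2 multi-looks, denoising, corner aggregation
    for da in (0, 1):
        for dr in (0, 1):
            a1 = s1[da:da + H + 1, dr:dr + W + 1]
            a2 = s2[da:da + H + 1, dr:dr + W + 1]
            na, nr = (H + 1) // 2, (W + 1) // 2
            b1 = a1[:2 * na, :2 * nr].reshape(na, 2, nr, 2)
            b2 = a2[:2 * na, :2 * nr].reshape(na, 2, nr, 2)
            s11 = 0.5 * ((np.abs(b1) ** 2).mean(axis=(1, 3))
                         + (np.abs(b2) ** 2).mean(axis=(1, 3)))
            gamma = (b1 * np.conj(b2)).mean(axis=(1, 3)) / s11
            den = denoise_fn(gamma)
            # each multi-look pixel center is a corner shared by the
            # single-look rows (da + 2i - 1, da + 2i) and the analogous columns
            for i in range(na):
                for j in range(nr):
                    for r in (da + 2 * i - 1, da + 2 * i):
                        for c in (dr + 2 * j - 1, dr + 2 * j):
                            if 0 <= r < H and 0 <= c < W:
                                acc[r, c] += den[i, j]
                                cnt[r, c] += 1
    return acc / cnt
```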

4. Results

In this section, a comprehensive validation of the proposed NBDNet is presented. First, the training details are introduced. Afterward, the comparison methods are listed. Subsequently, denoising experiments are conducted on simulated data. Finally, Sentinel-1 repeat-pass data and Hongtu-1 (HT-1) single-pass data are used to assess the performance of the proposed NBDNet on real data.

4.1. Training Details

As mentioned in Section 3.2 and Section 3.4, NBDNet is trained using the dataset constructed from PALSAR real acquisitions under the Neighbor2Neighbor framework. More specifically, a total number of  15 × 12 × 1800 = 324,000  patches of size  120 × 120  are generated, and the loss function defined in (10) is employed for training.
The network weights are initialized by the orthogonal method [36] and optimized by the Adam algorithm [37]. The initial learning rate is  l r = 10 3 , which reduces to  10 4  after 30 epochs, as shown in Figure 9a. The mini-batch size is set as 128, and data augmentation techniques, including rotation and flipping, are employed throughout the training process. The network is implemented in Python 3.7 using the Tensorflow 2.11 framework and is carried out on a Tesla P100 GPU for a total time of around 16 hours. After 50 epochs, the loss function converges, as shown in Figure 9b.

4.2. Comparison Methods

We compare the proposed NBDNet with several existing methods: boxcar averaging, NL-InSAR [7], NL-SAR [8], OC-InSAR-BM3D [13], ComCSC-GR [14],  Φ -Net [21], and InSAR-MONet [22]. The following experiments are conducted on a PC with Intel Core (TM) i5-13400F CPU at 2.50 GHz. The codes provided by the authors are used, and the parameters are set as suggested in the references. Specifically, the parameter settings are as follows:
  • Boxcar averaging with dimWindow = 5;
  • NL-InSAR with dimPatch = 7 and dimWindow = 21;
  • NL-SAR with dimWindowMax = 25 and dimPatchMax = 11;
  • OC-InSAR-BM3D with dimPatch = 8 and dimWindow = 21;
  • ComCSC-GR with number of filters set to 96, filter size set to 20,  λ = 2.5  and  μ = 5 ;
  • Φ -Net and InSAR-MONet with the trained model provided by the authors, which have no adjustable parameters.
Furthermore, the above-mentioned methods differ in terms of application cases and estimated parameters, as detailed in Table 2. It can be seen that boxcar, NL-SAR, and the proposed NBDNet can estimate coherence and phase simultaneously in both single-look and multi-look cases, while the other methods are either inapplicable in the multi-look case or unable to conduct coherence estimation.

4.3. Simulated Assessment

In order to validate the effectiveness and robustness of the proposed NBDNet, denoising experiments on simulated data are carried out. The application of an InSAR system for topographic mapping is considered. The main parameters of the simulated single-pass InSAR system are listed in Table 3, and the ALOS global digital surface model (AW3D30) [38] is employed as the terrain height. The noise-free unwrapped interferometric phase  ψ  is then calculated by
$$\psi = \frac{2\pi B h}{\lambda R \sin\theta_{\mathrm{inc}}},\tag{11}$$
where B is the perpendicular baseline length, h is the terrain height,  λ  is the wavelength, R is the slant range, and  θ inc  is the incidence angle. The wrapped phase  ϕ  is then obtained by
$$\phi = \arg\big( \exp(j\psi) \big) \in [-\pi, \pi).\tag{12}$$
In order to assess the denoising performance on different noise levels and fringe densities, nine patterns are constructed by combining three coherence values  ρ = { 0.9 , 0.6 , 0.3 }  and three baseline values  B = { 500 m , 1000 m , 1500 m } . Low coherence corresponds to a high noise level, and a long baseline is indicative of dense fringes, both of which can bring challenges to the phase denoising process.
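To make the simulation protocol concrete, the sketch below draws a pair of correlated single-look SLCs with a prescribed true phase and coherence under the CCG model of (5) and (6). The unit-amplitude assumption and the function name are ours, not part of the original simulation code.

```python
import numpy as np

def simulate_slc_pair(phi, rho, rng=None):
    """Generate two correlated single-look SLCs with unit amplitude,
    true coherence rho and true interferometric phase phi (sketch,
    based on the CCG model of (5)-(6))."""
    rng = np.random.default_rng() if rng is None else rng
    shape = np.shape(phi)
    ccg = lambda: (rng.standard_normal(shape)
                   + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)
    n1, n2 = ccg(), ccg()
    z1 = n1
    # mixing coefficient chosen so that E[z1 * conj(z2)] = rho * exp(j * phi)
    z2 = rho * np.exp(-1j * phi) * n1 + np.sqrt(1.0 - rho ** 2) * n2
    return z1, z2

# e.g. single-look noisy interferometric phase: np.angle(z1 * np.conj(z2))
```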
As listed in Table 2, all the aforementioned methods can be employed for phase denoising in a single-look case, so the single-look case is considered. For each pattern, ten Monte-Carlo simulations are made, and one of the realizations is depicted in Figure 10. All the methods are then tested on the simulated interferograms. To quantitatively evaluate the noise suppression ability, the root mean square error (RMSE) of the denoised phase is employed as the metric. The average RMSE results for ten Monte Carlo simulations are calculated and listed in Table 4, with the best values highlighted in bold. As can be observed, the proposed NBDNet outperforms all other methods for most patterns, verifying the superiority of the NBDNet in terms of noise suppression ability.
Moreover, the edge preservation index (EPI) [39] is employed as a metric of detail preservation. Given the denoised image  x ^  and the noise-free image x, the EPI value is defined as follows:
$$\mathrm{EPI}(\hat{x}, x) = \frac{\big\langle \Delta x - \overline{\Delta x},\; \Delta\hat{x} - \overline{\Delta\hat{x}} \big\rangle}{\sqrt{\big\langle \Delta x - \overline{\Delta x},\; \Delta x - \overline{\Delta x} \big\rangle \cdot \big\langle \Delta\hat{x} - \overline{\Delta\hat{x}},\; \Delta\hat{x} - \overline{\Delta\hat{x}} \big\rangle}},\tag{13}$$
where $\Delta$ denotes the $3 \times 3$ Laplacian filter used to emphasize the edges, $\overline{\Delta x}$ and $\overline{\Delta\hat{x}}$ denote the average values of the images $\Delta x$ and $\Delta\hat{x}$, and
$$\langle x_{1}, x_{2} \rangle = \sum_{i,j} x_{1}(i,j) \cdot x_{2}(i,j)\tag{14}$$
denotes the inner product of two matrices $x_1$ and $x_2$. The closer the EPI value is to 1, the higher the correlation between $\Delta x$ and $\Delta\hat{x}$, indicating superior detail preservation. It is worth mentioning that it is improper to calculate the EPI value for the wrapped denoised phase $\hat{\phi}$ because $2\pi$-jumps due to the wrapping phenomenon do not represent real edges. Consequently, the "optimal" unwrapped phase $\hat{\psi}$ is calculated based on the reference unwrapped phase $\psi$,
$$\hat{\psi} = \hat{\phi} + \mathrm{round}\big( (\psi - \hat{\phi}) / (2\pi) \big) \cdot 2\pi,\tag{15}$$
and then the EPI values of  ψ ^  are calculated and listed in Table 5. Similarly, the proposed NBDNet demonstrates superior performance in comparison to alternative methods for the majority of patterns, thereby substantiating its efficacy in detail preservation.
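For reference, the EPI computation of (13)–(15) can be written as in the following sketch; the particular 3 × 3 Laplacian kernel and the use of scipy.ndimage for the convolution are assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def epi_wrapped(phi_hat, psi):
    """EPI of (13) evaluated on the 'optimal' unwrapped phase of (15).
    phi_hat : wrapped denoised phase, psi : reference unwrapped phase."""
    # (15): align the denoised phase to the reference by integer cycles
    psi_hat = phi_hat + np.round((psi - phi_hat) / (2.0 * np.pi)) * 2.0 * np.pi
    # assumed 3 x 3 Laplacian kernel used to emphasize edges
    lap = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], float)
    dx, dxh = convolve(psi, lap), convolve(psi_hat, lap)
    dx, dxh = dx - dx.mean(), dxh - dxh.mean()
    # (13)-(14): normalized correlation of the edge maps
    return float(np.sum(dx * dxh)
                 / np.sqrt(np.sum(dx * dx) * np.sum(dxh * dxh)))
```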
In order to demonstrate the visual differences between the results with different methods, the denoised phase and the absolute phase error map for three representative patterns, namely the least, medium and most challenging ones, are shown in Figure 11. The denoised phase by boxcar averaging looks grainy, which will introduce undesirable artifacts in the subsequent products (e.g., topographic maps). Both NL-InSAR and NL-SAR perform poorly on the simulated interferograms, especially for the most challenging pattern, possibly due to the rare patch effect [11] in such heterogeneous scenes. In contrast, OC-InSAR-BM3D produces stable results in terms of both noise suppression and detail preservation. Residual texture features can be clearly observed in the phase error map of ComCSC-GR and  Φ -Net, implying that the two methods tend to over-smooth the interferometric phase and thus blur some detail structures. InSAR-MONet is capable of obtaining favorable results for the least and medium challenging patterns, but the performance decreases significantly for the most challenging pattern. Moreover, some artifacts can be observed at the 2 π -jump areas. Finally, the proposed NBDNet provides the most visually appealing denoised phase among all the aforementioned methods, suppressing the noise effectively without too much loss of detailed information for all patterns.
In addition, some methods can give estimates of the coherence during the denoising process. The average RMSE results of the estimated coherence of different methods for interferograms with different patterns are listed in Table 6, with the best values highlighted in bold. It can be seen that our proposed NBDNet also exhibits an excellent level of coherence estimation. The estimated coherence and the absolute coherence error map on the three representative patterns are shown in Figure 12. Similar to the denoised phase, the coherence estimated by boxcar averaging also looks grainy. As can be seen in the error map, NL-InSAR and NL-SAR suffer from an estimation bias.  Φ -Net produces better results, but the NBDNet achieves the best performance.

4.4. Real Assessment

In order to further validate the denoising capability and generalization ability of the proposed NBDNet, cross-sensor denoising experiments on Sentinel-1 repeat-pass data and HT-1 single-pass data are carried out. Note that NBDNet is trained using only PALSAR interferograms, and Sentinel-1 and HT-1 interferograms are not included in the training dataset.
The first experiment was conducted on a dataset obtained from Sentinel-1 using the Interferometric Wide (IW) swath mode. The images were acquired at Maqin County, Golog Tibetan Autonomous Prefecture, Qinghai Province, China, with an incidence angle of 39.2° on 18 April 2017 and 30 April 2017. The coregistered SLCs were cropped to a size of  512 × 2048 , and a multi-look interferogram with factor  2 × 8  was generated, whose size was thus reduced to  256 × 256 . The original noisy interferometric phase and the denoised phase with different methods are shown in Figure 13, and the corresponding number of residues is listed in Table 7. As can be observed, boxcar averaging failed to retain the original fringe structure when the noise level or fringe density was high. Plenty of residual noise in the denoised phase and a large number of residues demonstrated the poor performance of NL-SAR in this scene. ComCSC-GR effectively suppressed the phase noise, but the detail structures were severely blurred. The denoised phase using InSAR-MONet is satisfactory at first glance, but the loss of resolution can be noticed. Finally, the proposed NBDNet is the best in terms of detail preservation, and the phase noise is also effectively suppressed.
In addition, the coherence estimation abilities of different methods are also evaluated. As shown in Figure 14, the sample estimate of coherence is very noisy due to the insufficient number of samples. The coherence given by boxcar averaging is severely biased because the topographic phase is not compensated. The coherence provided by NL-SAR is noisy, while the proposed NBDNet provides a smooth result.
The second experiment was performed on a dataset obtained from the HT-1 constellation. As a spaceborne single-pass multi-baseline InSAR system, HT-1 consists of one primary satellite (A) and three auxiliary satellites (B, C, and D). The study area was located in Larestan County, Fars Province, Iran. The bistatic pair formed by A and B was selected for the experiment, and the main parameters are listed in Table 8. A patch of 3000 × 3000 pixels was extracted from the coregistered SLCs, and no multi-looking operation was performed. The SAR amplitude image and the interferometric phase are depicted in Figure 15a and b, respectively. It can be seen that the study area contained many steep mountains. Consequently, abrupt phase and coherence changes were present, which brought challenges to the denoising process. After applying the proposed NBDNet, the estimated coherence and the denoised phase are depicted in Figure 15c and d, respectively. Thanks to the advanced InSAR system, precise imaging and the elimination of temporal decorrelation, the coherence is extremely high throughout the interferogram except for layover and shadow areas. In such cases, the preservation of image details becomes essential for digital surface model (DSM) reconstruction at a high level of precision and resolution.
In order to evaluate the performance of the proposed NBDNet, the methods listed in Table 2 are all executed. To demonstrate the noise suppression and detail preservation ability of each method, the four denoised phase patches in the position indicated by the red squares in Figure 15a are amplified in Figure 16. It is obvious that boxcar averaging creates grainy artifacts. NL-InSAR suffers from the staircase effect [40], which is clearly visible in P1, P2, and P3. As for NL-SAR and OC-InSAR-BM3D, residual noise can be observed in P1 and P4, implying their limited denoising abilities. ComCSC-GR severely blurs the fringes, especially in P1 and P4.  Φ -Net has two main drawbacks. On the one hand, the over-smoothing effect can be observed in P1 and P4. On the other hand,  Φ -Net sometimes introduces artifacts, such as abrupt changes in P2. As for the InSAR-MONet, the grainy effect similar to boxcar averaging can be noticed. In comparison, the proposed NBDNet can effectively mitigate the noise while preserving the detailed structure. The veins of the mountains are distinctly visible in P1 and P4 without noticeable grainy or staircase effects.
In addition, the number of residues and the execution time of different methods are listed in Table 9. NL-InSAR, NL-SAR, and OC-InSAR-BM3D produce a large number of residues, which suggests that these methods are weak in noise reduction.  Φ -Net has the least number of residues due to its over-smoothing phenomenon. As for the execution time, the proposed NBDNet demonstrates the highest computation efficiency except for boxcar, verifying its convenience and practicability.
As a secondary product of phase denoising, the four estimated coherence patches indicated in Figure 15a are shown in Figure 17. It is evident that boxcar averaging causes a grainy effect and resolution loss due to the insufficient number of samples. Furthermore, the results of NL-InSAR and NL-SAR look noisy. As for  Φ -Net, some undesirable holes can be noticed in P1, P2, and P3. Finally, the proposed NBDNet produces the smoothest results.
We further assess the advantage of the proposed NBDNet for high-precision and high-resolution DSM generation. Firstly, the denoised phase is unwrapped by the SNAPHU [41,42,43] method. Owing to the effective phase denoising step, there are few phase unwrapping errors. Subsequently, the unwrapped phase is converted to topographic height and then geocoded to a geographic grid with a 4 m pixel spacing. The rendered DSM is depicted in Figure 18, with World Imagery (Copyright: ESRI) as the background.
Again, four patches indicated by the blue squares in Figure 18 are amplified in Figure 19 to compare the visual differences between the DSMs reconstructed with different methods. The aforementioned drawbacks of each method can be observed in the DSM patches. Q1 is a relatively flat area where a grainy effect can be noticed in the boxcar, NL-InSAR, OC-InSAR-BM3D, ComCSC-GR, and InSAR-MONet results. Residual noise can be observed in the Q2 and Q4 patches generated by boxcar, NL-InSAR, NL-SAR, and OC-InSAR-BM3D. As for Φ-Net, although there is little noise, the detail structures are severely blurred, especially in Q2 and Q3. The DSM patches generated by the proposed NBDNet are very promising, verifying its potential for high-precision and high-resolution topography reconstruction.
To quantitatively evaluate the height accuracy of the reconstructed DSM, the land and vegetation height (ATL08) [44] product provided by the Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) is employed as the reference height. Specifically, the ATL08 product contains along-track geolocated terrain height, canopy height, and canopy cover, which are provided as segments with a step size of 100 m along the ground track. Considering the segment-dependent height accuracy of ATL08 data and the limited penetration depth of X-band radar waves into vegetation, the following strategy is employed to extract the reference height from ATL08 data (a small filtering sketch is given after the list):
  • Segments with uncertainty of the mean terrain height (h_te_uncertainty) larger than 20 m are excluded;
  • Segments with uncertainty of the relative canopy height (h_canopy_uncertainty) larger than 20 m are excluded;
  • Within each segment, the optical centroid of all photons classified as either canopy or ground points (centroid_height) is employed as the reference height.
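A minimal sketch of this selection is given below, assuming the relevant ATL08 segment attributes have already been read into NumPy arrays; the variable names mirror the product fields listed above, and the threshold is the 20 m value stated in the text.

```python
import numpy as np

def select_atl08_reference(h_te_uncertainty, h_canopy_uncertainty,
                           centroid_height, max_unc=20.0):
    """Filter ATL08 segments and return the reference heights (sketch).
    Inputs are 1-D arrays with one entry per 100 m ATL08 segment."""
    valid = (h_te_uncertainty <= max_unc) & (h_canopy_uncertainty <= max_unc)
    return centroid_height[valid], valid
```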
The selected ATL08 segments are plotted in Figure 18. The DSM reconstructed by HT-1 is interpolated to obtain the elevation values at the geographic positions of the center-most signal photon within each ATL08 segment, and the RMSEs of the reconstructed DSM based on different methods are then calculated and listed in Table 9. Basically, the height accuracies of the DSMs generated by all denoising methods are around four meters, and there is not much difference between them. There are two reasons to account for this phenomenon. Firstly, the horizontal resolution and vertical accuracy of the ATL08 product are not compatible with the DSM product generated by the HT-1 InSAR system, and the X-band phase center may not correspond to the optical centroid of the photons, so the evaluation of height accuracy is not precise enough. Secondly, the pixel spacing of the DSM product is approximately twice the ground resolution of the SAR image, which weakens the influence of denoising method selection.
In summary, the proposed NBDNet trained by PALSAR data exhibits excellent performance in cross-sensor denoising experiments on Sentinel-1 and HT-1 data, verifying its generalization ability. Both quantitative and qualitative metrics validate that NBDNet surpasses mainstream phase denoising methods.

5. Discussion

The main contribution of this paper is the adaptation of the Neighbor2Neighbor self-supervised framework to the InSAR denoising field. Specifically, the multi-look normalized complex interferogram  γ m  is selected as the input and output format of the network based on the following considerations. First, the multi-looking operation can effectively reduce the spatial correlation of noise, which guarantees the pixel-wise independence requirement of the Neighbor2Neighbor framework. Second, the noise in interferometric phase cannot be regarded as zero-mean due to its inherent wrapping property, while handling interferograms in a complex format can avoid the phase wrapping phenomenon. Third, the magnitude and angle of  γ m  give the sample estimate of coherence  ρ  and the interferometric phase  ϕ , which enables the denoising network to estimate both the coherence and the interferometric phase. Fourth, different multi-looking factor combinations are generated in the training dataset, which enables the network to be applied in multi-look cases. In addition, in order to allow the proposed NBDNet to be applied when spatial correlation is present, especially in single-look cases, a special technique is designed in Section 3.5.
We also constructed a complete training dataset based on PALSAR repeat-pass data. Indeed, the performance of denoising networks depends largely on the quality of the training dataset, and real datasets are advantageous over simulated datasets in many aspects. On the one hand, real interferograms comprise diverse combinations of coherence and phase patterns, and their underlying relevance is also considered. On the other hand, real interferograms can reflect the noise properties of real scenes, including both Gaussian and non-Gaussian characteristics, which are difficult to construct through simulation. Consequently, the network is able to learn rich features from real interferograms during training and demonstrates higher adaptability and robustness when employed for denoising tasks in real scenes. Experimental results show that the proposed NBDNet demonstrates superior denoising performance in comparison to existing algorithms when applied to the denoising of interferograms acquired by different sensors at different geographic locations, which verifies the superiority of using real interferograms as the training dataset.
Moreover, the proposed NBDNet has the potential advantage of being specialized for a particular sensor. Specifically, for a particular interferometric SAR sensor, a sufficient number of real interferograms captured can be used to construct the training dataset, and then the denoising network can be trained under the Neighbor2Neighbor framework. The model obtained in this manner should achieve higher denoising performance than the cross-sensor model when employed in the denoising tasks for that sensor.
One of the limitations of the proposed NBDNet is that it is designed for single-baseline InSAR. Multi-baseline and multi-frequency InSAR technologies [45] are capable of acquiring multiple interferograms with different heights of ambiguity simultaneously. Although we can individually apply the NBDNet to each interferogram, the exploitation of the internal relationships between multiple interferograms can potentially improve the phase denoising performance. It is worth mentioning that the self-supervised training strategy has the potential to be transplanted into the field of multi-baseline denoising, but there are some issues that need to be addressed. First, the training dataset should contain various baseline ratios, which will bring challenges to the training dataset generation process. Second, if the interferograms of different baselines are concatenated in the channel dimension as the input of the network, then one specific trained model can only be applied to InSAR systems with a fixed number of channels, which will be inconvenient for both training and application.

6. Conclusions

Decorrelation is an inherent problem of InSAR technology, which is manifested as random noise at the interferometric phase level. Accordingly, phase denoising plays an important part in the InSAR processing chain. By suppressing the noise and enhancing the signal, phase denoising effectively improves the phase accuracy and fringe visibility of the interferogram. Besides, as a secondary product of phase denoising, the accurate estimation of coherence can be employed not only as a weight map for phase unwrapping but also for the evaluation of phase uncertainty. In this paper, a novel CNN-based phase denoising method named NBDNet is proposed. The repeat-pass PALSAR real interferograms encompassing a diverse range of coherence, fringe density and terrain features are used as the training dataset, and the novel Neighbor2Neighbor self-supervised training framework is leveraged. Simulated data, Sentinel-1 repeat-pass data and HT-1 single-pass data are used to assess the performance of the proposed NBDNet, and the experimental results validate that NBDNet performs well in terms of noise suppression, detail preservation and computation efficiency. To sum up, the proposed NBDNet demonstrates its superiority in several aspects. Firstly, it manages to simultaneously estimate the phase and coherence in either single-look or multi-look cases, which makes it applicable to a variety of tasks. Secondly, NBDNet is capable of mitigating the phase noise while preserving the detailed structure, which confirms its potential for high-precision and high-resolution topography reconstruction. Thirdly, NBDNet can be applied to interferograms collected by different sensors such as PALSAR, Sentinel-1, and HT-1, despite the fact that only PALSAR data are used for training. Finally, as a DL-based method, only the structure and weights of the network need to be saved after training, and the denoising can be efficiently performed even without GPUs.
Although the proposed NBDNet is designed for single-baseline InSAR, it can potentially be extended to multi-baseline InSAR denoising. Our future work will study how to efficiently construct training datasets for multi-baseline denoising and design a DL-based multi-baseline InSAR denoising framework that can be applied to an arbitrary number of channels.

Author Contributions

Conceptualization, H.L. and J.W.; methodology, H.L.; software, H.L., J.W., and Y.W.; validation, H.L., J.W., and Y.W.; formal analysis, H.L.; investigation, H.L. and J.W.; resources, J.W., C.A., and X.R.; data curation, H.L. and J.W.; writing—original draft preparation, H.L.; writing—review and editing, H.L., J.W., and Y.W.; visualization, H.L.; supervision, J.W.; project administration, J.W.; funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Science Fund under Grant 62201548.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would like to thank the Japan Aerospace Exploration Agency (JAXA) for providing the Advanced Land Observing Satellite Phased Array type L-band Synthetic Aperture Radar (ALOS PALSAR) data, the European Space Agency (ESA) for providing Sentinel-1 data, and the PIESAT Information Technology Co., Ltd. for providing the Hongtu-1 (HT-1) data.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Bamler, R.; Hartl, P. Synthetic aperture radar interferometry. Inverse Probl. 1998, 14, R1–R54.
2. Lee, J.S.; Papathanassiou, K.; Ainsworth, T.; Grunes, M.; Reigber, A. A new technique for noise filtering of SAR interferometric phase images. IEEE Trans. Geosci. Remote Sens. 1998, 36, 1456–1465.
3. Vasile, G.; Trouve, E.; Lee, J.S.; Buzuloiu, V. Intensity-driven adaptive-neighborhood technique for polarimetric and interferometric SAR parameters estimation. IEEE Trans. Geosci. Remote Sens. 2006, 44, 1609–1621.
4. Goldstein, R.M.; Werner, C.L. Radar interferogram filtering for geophysical applications. Geophys. Res. Lett. 1998, 25, 4035–4038.
5. Lopez-Martinez, C.; Fabregas, X. Modeling and reduction of SAR interferometric phase noise in the wavelet domain. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2553–2566.
6. Buades, A.; Coll, B.; Morel, J.M. A non-local algorithm for image denoising. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 20–25 June 2005; Volume 2, pp. 60–65.
7. Deledalle, C.A.; Denis, L.; Tupin, F. NL-InSAR: Nonlocal Interferogram Estimation. IEEE Trans. Geosci. Remote Sens. 2011, 49, 1441–1452.
8. Deledalle, C.A.; Denis, L.; Tupin, F.; Reigber, A.; Jäger, M. NL-SAR: A Unified Nonlocal Framework for Resolution-Preserving (Pol)(In)SAR Denoising. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2021–2038.
9. Baier, G.; Zhu, X.X. GPU-based nonlocal filtering for large scale SAR processing. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 7608–7611.
10. Zhu, X.X.; Baier, G.; Lachaise, M.; Shi, Y.; Adam, F.; Bamler, R. Potential and limits of non-local means InSAR filtering for TanDEM-X high-resolution DEM generation. Remote Sens. Environ. 2018, 218, 148–161.
11. Baier, G.; Rossi, C.; Lachaise, M.; Zhu, X.X.; Bamler, R. A Nonlocal InSAR Filter for High-Resolution DEM Generation From TanDEM-X Interferograms. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6469–6483.
12. Sica, F.; Cozzolino, D.; Zhu, X.X.; Verdoliva, L.; Poggi, G. InSAR-BM3D: A Nonlocal Filter for SAR Interferometric Phase Restoration. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3456–3467.
13. Sica, F.; Cozzolino, D.; Verdoliva, L.; Poggi, G. The Offset-Compensated Nonlocal Filtering of Interferometric Phase. Remote Sens. 2018, 10, 1359.
14. Kang, J.; Hong, D.; Liu, J.; Baier, G.; Yokoya, N.; Demir, B. Learning Convolutional Sparse Coding on Complex Domain for Interferometric Phase Restoration. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 826–840.
15. Xu, G.; Gao, Y.; Li, J.; Xing, M. InSAR Phase Denoising: A Review of Current Technologies and Future Directions. IEEE Geosci. Remote Sens. Mag. 2020, 8, 64–82.
16. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155.
17. Zhang, K.; Zuo, W.; Zhang, L. FFDNet: Toward a Fast and Flexible Solution for CNN-Based Image Denoising. IEEE Trans. Image Process. 2018, 27, 4608–4622.
18. Lehtinen, J.; Munkberg, J.; Hasselgren, J.; Laine, S.; Karras, T.; Aittala, M.; Aila, T. Noise2Noise: Learning Image Restoration without Clean Data. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 2965–2974.
19. Huang, T.; Li, S.; Jia, X.; Lu, H.; Liu, J. Neighbor2Neighbor: Self-Supervised Denoising from Single Noisy Images. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 14776–14785.
20. Shao, D.; Zhao, Y.; Li, Y.; Li, T. Noisy2Noisy: Denoise Pre-Stack Seismic Data Without Paired Training Data With Labels. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
21. Sica, F.; Gobbi, G.; Rizzoli, P.; Bruzzone, L. Φ-Net: Deep Residual Learning for InSAR Parameters Estimation. IEEE Trans. Geosci. Remote Sens. 2021, 59, 3917–3941.
22. Vitale, S.; Ferraioli, G.; Pascazio, V.; Schirinzi, G. InSAR-MONet: Interferometric SAR Phase Denoising Using a Multiobjective Neural Network. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14.
23. Sica, F.; Schmitt, M. On the estimation of InSAR phase and coherence through self-supervised learning. In Proceedings of EUSAR 2024: 15th European Conference on Synthetic Aperture Radar, Munich, Germany, 23–26 April 2024; pp. 736–739.
24. Geara, C.; Gelas, C.; De Vitry, L.; Colin, E.; Tupin, F. InSAR2InSAR: A Self-Supervised Method for InSAR Parameters Estimation. In Proceedings of the 2024 32nd European Signal Processing Conference (EUSIPCO), Lyon, France, 26–30 August 2024; pp. 651–655.
25. Wu, Y.; Wang, J.; Zhang, H.; Zhao, F.; Xiang, W.; Li, H.; Wang, H.; An, L. SSENet: A Multiscale 3-D Convolutional Neural Network for InSAR Shift Estimation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15.
26. Wu, Z.; Wang, T.; Wang, Y.; Wang, R.; Ge, D. Deep Learning for the Detection and Phase Unwrapping of Mining-Induced Deformation in Large-Scale Interferograms. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–18.
27. Chen, Y.; Bruzzone, L.; Jiang, L.; Sun, Q. ARU-Net: Reduction of Atmospheric Phase Screen in SAR Interferometry Using Attention-Based Deep Residual U-Net. IEEE Trans. Geosci. Remote Sens. 2021, 59, 5780–5793.
28. Goodman, N.R. Statistical Analysis Based on a Certain Multivariate Complex Gaussian Distribution (An Introduction). Ann. Math. Stat. 1963, 34, 152–177.
29. Tough, J.A.; Blacknell, D.; Quegan, S. A statistical description of polarimetric and interferometric synthetic aperture radar data. Proc. R. Soc. Lond. A 1995, 449, 567–589.
30. Lee, J.S.; Hoppel, K.; Mango, S.; Miller, A. Intensity and phase statistics of multilook polarimetric and interferometric SAR imagery. IEEE Trans. Geosci. Remote Sens. 1994, 32, 1017–1028.
31. Kellndorfer, J.; Cartus, O.; Lavalle, M.; Magnard, C.; Milillo, P.; Oveisgharan, S.; Osmanoglu, B.; Rosen, P.; Wegmüller, U. Global seasonal Sentinel-1 interferometric coherence and backscatter data set. Sci. Data 2022, 9, 73.
32. Tassano, M.; Delon, J.; Veit, T. An Analysis and Implementation of the FFDNet Image Denoising Method. Image Process. On Line 2019, 9, 1–25.
33. Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML'10), Madison, WI, USA, 21–24 June 2010; pp. 807–814.
34. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; Volume 37, pp. 448–456.
35. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
36. Saxe, A.M.; McClelland, J.L.; Ganguli, S. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. In Proceedings of the 2nd International Conference on Learning Representations (ICLR 2014), Banff, AB, Canada, 14–16 April 2014.
37. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015.
38. Takaku, J.; Tadono, T.; Doutsu, M.; Ohgushi, F.; Kai, H. Updates of 'AW3D30' ALOS global digital surface model with other open access datasets. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, XLIII-B4-2020, 183–189.
39. Sattar, F.; Floreby, L.; Salomonsson, G.; Lovstrom, B. Image enhancement based on a nonlinear multiscale method. IEEE Trans. Image Process. 1997, 6, 888–895.
40. Baier, G.; Zhu, X.X.; Lachaise, M.; Breit, H.; Bamler, R. Nonlocal InSAR Filtering for DEM Generation and Addressing the Staircasing Effect. In Proceedings of EUSAR 2016: 11th European Conference on Synthetic Aperture Radar, Hamburg, Germany, 6–9 June 2016; pp. 1–4.
41. Chen, C.W.; Zebker, H.A. Network approaches to two-dimensional phase unwrapping: Intractability and two new algorithms. J. Opt. Soc. Am. A 2000, 17, 401–414.
42. Chen, C.W.; Zebker, H.A. Two-dimensional phase unwrapping with use of statistical models for cost functions in nonlinear optimization. J. Opt. Soc. Am. A 2001, 18, 338–351.
43. Chen, C.W.; Zebker, H.A. Phase unwrapping for large SAR interferograms: Statistical segmentation and generalized network models. IEEE Trans. Geosci. Remote Sens. 2002, 40, 1709–1719.
44. Neuenschwander, A.; Pitts, K. The ATL08 land and vegetation product for the ICESat-2 Mission. Remote Sens. Environ. 2019, 221, 247–259.
45. Gini, F.; Lombardini, F. Multibaseline cross-track SAR interferometry: A signal processing perspective. IEEE Aerosp. Electron. Syst. Mag. 2005, 20, 71–93.
Figure 1. Schematic of the random neighbor subsampler in the Neighbor2Neighbor framework. Pixels filled in red and blue are randomly picked from each 2 × 2 cell in the single noisy image y, and then a pair of subsampled images g_1(y) and g_2(y) are formed.
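For readers who prefer code to diagrams, the subsampling rule in Figure 1 can be written in a few lines of NumPy. The function below is our own minimal sketch (the name `neighbor_subsample` and its interface are not taken from the paper); it draws two distinct pixels from every 2 × 2 cell of a noisy image y and returns the pair g_1(y), g_2(y).

```python
import numpy as np

def neighbor_subsample(y, rng=None):
    """Minimal sketch of the Neighbor2Neighbor random neighbor subsampler.
    y is a 2-D (real or complex) noisy image; returns two half-resolution
    images whose pixels come from different positions of the same 2x2 cell."""
    rng = np.random.default_rng() if rng is None else rng
    H, W = y.shape
    # group pixels into (H//2, W//2) cells of four neighbors each
    cells = (y[:H // 2 * 2, :W // 2 * 2]
             .reshape(H // 2, 2, W // 2, 2)
             .transpose(0, 2, 1, 3)
             .reshape(H // 2, W // 2, 4))
    # pick two *different* positions (0..3) inside every cell
    i1 = rng.integers(0, 4, size=(H // 2, W // 2))
    i2 = (i1 + rng.integers(1, 4, size=(H // 2, W // 2))) % 4
    r, c = np.indices(i1.shape)
    return cells[r, c, i1], cells[r, c, i2]
```

Applied to a complex interferogram, the same routine yields the subsampled pair used during self-supervised training.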
Figure 2. Training dataset for NBDNet. (a–o) are the 15 selected PALSAR interferometric pairs, whose phase, coherence, and amplitude are displayed from top to bottom.
Figure 3. Statistical histogram of the coherence of the training dataset.
Figure 4. Azimuth and range multi-looking factors used for training interferogram generation. The blue asterisks indicate the multi-looking factor combinations.
Figure 5. The architecture of the denoising network f_θ. R and I denote extracting the real and imaginary components of a complex number, respectively. S_1, S_2, S_3, and S_4 represent the downscaling operators shown in Figure 6. Boxes in three different colors represent the three categories of modules. Conv, BN, and ReLU denote the convolution layer, batch normalization layer, and rectified linear unit, respectively. The connecting line from "Downscaled Input" to the "minus" symbol represents subtraction.
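The Conv/BN/ReLU modules named in the Figure 5 caption are standard CNN building blocks. The PyTorch sketch below shows how such a residual denoiser is typically assembled; the layer count, channel width, and the multiscale S_1 to S_4 branches of the actual NBDNet are not reproduced here, so treat it as an illustration of the module types rather than the authors' network.

```python
import torch
import torch.nn as nn

class ConvBNReLUStack(nn.Module):
    """DnCNN-style residual denoiser built from the Conv/BN/ReLU modules
    named in Figure 5. Depth and channel width are illustrative only."""
    def __init__(self, in_ch=2, feat=64, depth=6):
        super().__init__()
        layers = [nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(feat, feat, 3, padding=1),
                       nn.BatchNorm2d(feat),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(feat, in_ch, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # x: real/imaginary channels of the (downscaled) complex interferogram;
        # the stack predicts the noise, which is subtracted from the input.
        return x - self.body(x)
```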
Figure 6. Diagram of the downscaling and upscaling operators.
Figure 7. The process of self-supervised training. Given the noisy interferogram γ_m, the random neighbor subsampler G = (g_1, g_2) and the denoising network f_θ are applied to obtain f_θ(g_1(γ_m)), g_2(γ_m), g_1(f_θ(γ_m)), and g_2(f_θ(γ_m)), which are used to calculate the loss function L.
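Figure 7 lists the four quantities that enter the loss L. In the original Neighbor2Neighbor formulation (Huang et al. [19]) these are combined as a reconstruction term plus a consistency regularizer; the sketch below follows that published objective and is only assumed to match NBDNet's exact weighting (the hyperparameter `gamma` is ours).

```python
import torch

def neighbor2neighbor_loss(f_g1y, g2y, g1_fy, g2_fy, gamma=2.0):
    """Loss built from the four tensors shown in Figure 7:
    f_g1y = f_theta(g1(y)),  g2y = g2(y),
    g1_fy = g1(f_theta(y)),  g2_fy = g2(f_theta(y)).
    Follows the original Neighbor2Neighbor objective [19]; the weighting
    used in NBDNet may differ."""
    l_rec = torch.mean((f_g1y - g2y) ** 2)
    # consistency term; the full-image prediction is treated as a constant
    l_reg = torch.mean((f_g1y - g2y - (g1_fy.detach() - g2_fy.detach())) ** 2)
    return l_rec + gamma * l_reg
```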
Figure 8. Illustration of the special technique for single-look denoising. (a–d) show the 4 multi-look areas designed for single-look denoising. The position of the original SLCs is indicated by solid pixels, while pixels with a diagonal pattern denote the padded area. Colored and gray pixels represent pixels used for multi-looking and pixels not used for multi-looking, respectively. The solid circles indicate the center position of each pixel of the multi-look interferograms, each of which is located exactly at one corner of one single-look pixel.
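Our reading of Figure 8 is that the single-look interferogram is turned into four 2 × 2 multi-look interferograms computed on grids shifted by one pixel in azimuth and/or range, with a one-pixel pad handling the borders. The sketch below implements that interpretation only; the function name, the reflection padding, and the coherence normalization are our assumptions, not the authors' code.

```python
import numpy as np

def four_shifted_multilooks(s1, s2):
    """Form four 2x2 multi-looked complex coherence maps on grids shifted by
    (0,0), (0,1), (1,0), and (1,1), so that every multi-look sample is
    centred on a corner of a single-look pixel (our reading of Figure 8)."""
    ifg = np.pad(s1 * np.conj(s2), 1, mode="reflect")   # complex interferogram
    p1 = np.pad(np.abs(s1) ** 2, 1, mode="reflect")
    p2 = np.pad(np.abs(s2) ** 2, 1, mode="reflect")

    def look(x, dr, dc):
        # crop to an even number of rows/cols at the given offset,
        # then average non-overlapping 2x2 blocks
        nr = (x.shape[0] - dr) // 2 * 2
        nc = (x.shape[1] - dc) // 2 * 2
        x = x[dr:dr + nr, dc:dc + nc]
        return x.reshape(nr // 2, 2, nc // 2, 2).mean(axis=(1, 3))

    looks = []
    for dr, dc in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        num = look(ifg, dr, dc)
        den = np.sqrt(look(p1, dr, dc) * look(p2, dr, dc))
        looks.append(num / np.maximum(den, 1e-12))      # 2x2-look complex coherence
    return looks
```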
Figure 9. Learning rate and loss curve. (a) Learning rate schedule. (b) Loss function curve.
Figure 10. Simulated interferograms used for denoising in the single-look case. Each row shows the phase corresponding to a specific baseline. The first column shows the noise-free phase, and the remaining columns show the noisy phase under different coherence levels.
Figure 11. The denoised phase (color-coded) based on different methods and the absolute phase error map (grayscale) in the single-look simulated case. The least, medium, and most challenging patterns are depicted in the top, middle, and bottom rows, respectively.
Figure 12. The estimated coherence (color-coded) based on different methods and the absolute coherence error map (grayscale) in the single-look simulated case. The least, medium, and most challenging patterns are depicted in the top, middle, and bottom rows, respectively.
Figure 13. The denoised phase based on different methods for the Sentinel-1 real interferogram.
Figure 14. The estimated coherence based on different methods for the Sentinel-1 real interferogram. Note that ComCSC-GR and InSAR-MONet are unable to estimate coherence.
Figure 15. The HT-1 real data. (a) SAR amplitude. (b) Noisy interferometric phase. (c) Coherence estimated by NBDNet. (d) Phase denoised by NBDNet. The red squares P1, P2, P3, and P4 in (a) identify the locations of the four selected patches to be enlarged.
Figure 16. The enlarged denoised phase patches indicated in Figure 15a, obtained with the different methods for the HT-1 real interferogram.
Figure 17. The enlarged estimated coherence patches indicated in Figure 15a, obtained with the different methods for the HT-1 real interferogram. Note that OC-InSAR-BM3D, ComCSC-GR, and InSAR-MONet are unable to estimate coherence.
Figure 18. The rendered DSM generated by the proposed NBDNet using HT-1 real data, with World Imagery (Copyright: ESRI) as the background. The blue squares Q1, Q2, Q3, and Q4 identify the locations of the four selected patches to be enlarged. The scattered points indicate the heights and positions of the ICESat-2 ATL08 segments.
Figure 19. The enlarged shaded reliefs of the four patches indicated in Figure 18, generated by the different methods using HT-1 real data.
Table 1. Parameters of the selected PALSAR interferometric pairs.
| Scene ID (Primary/Auxiliary) | Acquisition Date (Primary/Auxiliary) | Perpendicular Baseline (m) | Average Coherence |
|---|---|---|---|
| ALPSRP107516870/ALPSRP114226870 | 31 January 2008/17 March 2008 | −569 | 0.54 |
| ALPSRP267110800/ALPSRP273820800 | 29 January 2011/16 March 2011 | 712 | 0.44 |
| ALPSRP102950220/ALPSRP116370220 | 30 December 2007/31 March 2008 | 643 | 0.61 |
| ALPSRP096210670/ALPSRP102920670 | 14 November 2007/30 December 2007 | −278 | 0.40 |
| ALPSRP212120720/ALPSRP225540720 | 17 January 2010/19 April 2010 | 728 | 0.38 |
| ALPSRP107740690/ALPSRP114450690 | 1 February 2008/18 March 2008 | 49 | 0.51 |
| ALPSRP153980780/ALPSRP160690780 | 14 December 2008/29 January 2009 | 413 | 0.58 |
| ALPSRP264340710/ALPSRP271050710 | 10 January 2011/25 February 2011 | 662 | 0.52 |
| ALPSRP265360640/ALPSRP272070640 | 17 January 2011/4 March 2011 | 475 | 0.72 |
| ALPSRP215610680/ALPSRP222320680 | 10 February 2010/28 March 2010 | 266 | 0.31 |
| ALPSRP099060680/ALPSRP105770680 | 4 December 2007/19 January 2008 | 483 | 0.49 |
| ALPSRP261850730/ALPSRP268560730 | 24 December 2010/8 February 2011 | 582 | 0.25 |
| ALPSRP020250460/ALPSRP026960460 | 11 June 2006/27 July 2006 | −2459 | 0.57 |
| ALPSRP210870750/ALPSRP217580750 | 8 January 2010/23 February 2010 | 675 | 0.86 |
| ALPSRP214146710/ALPSRP220856710 | 31 January 2010/18 March 2010 | −501 | 0.84 |
Table 2. Comparison of the capabilities of different methods.
| Method | Single-Look Case | Multi-Look Case | Coherence Estimation | Phase Estimation |
|---|---|---|---|---|
| Boxcar | ✓ | ✓ | ✓ | ✓ |
| NL-InSAR | ✓ | × | ✓ | ✓ |
| NL-SAR | ✓ | ✓ | ✓ | ✓ |
| OC-InSAR-BM3D | ✓ | × | × | ✓ |
| ComCSC-GR | ✓ | ✓ | × | ✓ |
| Φ-Net | ✓ | × | ✓ | ✓ |
| InSAR-MONet | ✓ | ✓ | × | ✓ |
| NBDNet | ✓ | ✓ | ✓ | ✓ |
Table 3. Main parameters of the simulated InSAR system.
| Parameter | Value |
|---|---|
| Carrier Frequency [GHz] | 1.27 |
| Incidence Angle [°] | 30 |
| Slant Range [km] | 600 |
| Coherence | {0.9, 0.6, 0.3} |
| Baseline Length [m] | {500, 1000, 1500} |
Table 4. Average RMSE results of the denoised phase with different methods for simulated interferograms with different patterns in the single-look case. Values presented in bold signify the best performance.
| Method | B1, ρ1 | B1, ρ2 | B1, ρ3 | B2, ρ1 | B2, ρ2 | B2, ρ3 | B3, ρ1 | B3, ρ2 | B3, ρ3 |
|---|---|---|---|---|---|---|---|---|---|
| Boxcar | 0.121 | 0.227 | 0.579 | 0.209 | 0.301 | 0.671 | 0.316 | 0.426 | 0.829 |
| NL-InSAR | 0.163 | 0.341 | 0.459 | 0.184 | 0.492 | 0.995 | 0.237 | 0.866 | 1.498 |
| NL-SAR | 0.285 | 0.366 | 0.465 | 0.339 | 0.545 | 0.884 | 0.396 | 0.839 | 1.347 |
| OC-InSAR-BM3D | 0.112 | 0.196 | 0.354 | 0.162 | 0.263 | 0.488 | 0.207 | 0.321 | 0.683 |
| ComCSC-GR | 0.180 | 0.236 | 0.410 | 0.255 | 0.346 | 0.615 | 0.322 | 0.447 | 0.861 |
| Φ-Net | 0.206 | 0.294 | 0.425 | 0.303 | 0.465 | 0.667 | 0.344 | 0.572 | 0.865 |
| InSAR-MONet | 0.117 | 0.222 | 0.702 | 0.148 | 0.275 | 0.813 | **0.176** | 0.319 | 0.908 |
| NBDNet | **0.094** | **0.168** | **0.308** | **0.143** | **0.230** | **0.413** | 0.205 | **0.277** | **0.550** |
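For completeness, the phase RMSE values in Table 4 are conventionally computed on the wrapped phase difference so that 2π jumps are not penalized; the small helper below shows that convention (the authors' exact metric definition appears earlier in the paper and is assumed, not quoted, here).

```python
import numpy as np

def phase_rmse(est, ref):
    """RMSE between two wrapped-phase images, computed on the wrapped
    difference so that a 2*pi jump does not count as an error."""
    diff = np.angle(np.exp(1j * (est - ref)))
    return float(np.sqrt(np.mean(diff ** 2)))
```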
Table 5. Average EPI results of the denoised phase with different methods for simulated interferograms with different patterns in the single-look case. Values presented in bold signify the best performance.
| Method | B1, ρ1 | B1, ρ2 | B1, ρ3 | B2, ρ1 | B2, ρ2 | B2, ρ3 | B3, ρ1 | B3, ρ2 | B3, ρ3 |
|---|---|---|---|---|---|---|---|---|---|
| Boxcar | 0.246 | 0.117 | 0.033 | 0.291 | 0.179 | 0.057 | 0.254 | 0.160 | 0.070 |
| NL-InSAR | 0.266 | 0.061 | 0.012 | 0.332 | 0.163 | 0.019 | 0.355 | 0.070 | 0.068 |
| NL-SAR | 0.051 | 0.023 | 0.010 | 0.135 | 0.043 | 0.017 | 0.173 | 0.048 | 0.049 |
| OC-InSAR-BM3D | 0.369 | 0.258 | 0.090 | 0.587 | 0.482 | 0.124 | 0.681 | 0.474 | 0.097 |
| ComCSC-GR | 0.417 | 0.247 | 0.040 | 0.523 | 0.319 | 0.059 | 0.482 | 0.210 | 0.068 |
| Φ-Net | 0.153 | 0.085 | 0.103 | 0.270 | 0.111 | 0.031 | 0.334 | 0.106 | 0.054 |
| InSAR-MONet | 0.544 | 0.331 | 0.030 | **0.645** | 0.448 | 0.060 | **0.708** | 0.530 | 0.088 |
| NBDNet | **0.613** | **0.412** | **0.154** | 0.644 | **0.546** | **0.203** | 0.633 | **0.578** | **0.152** |
Table 6. Average RMSE results of the estimated coherence with different methods for simulated interferograms with different patterns in the single-look case. Values presented in bold signify the best performance.
| Method | B1, ρ1 | B1, ρ2 | B1, ρ3 | B2, ρ1 | B2, ρ2 | B2, ρ3 | B3, ρ1 | B3, ρ2 | B3, ρ3 |
|---|---|---|---|---|---|---|---|---|---|
| Boxcar | 0.046 | 0.094 | 0.119 | 0.129 | 0.122 | 0.117 | 0.253 | 0.184 | 0.120 |
| NL-InSAR | 0.032 | 0.065 | 0.065 | **0.059** | 0.134 | 0.163 | **0.086** | 0.353 | 0.215 |
| NL-SAR | 0.122 | 0.116 | 0.080 | 0.199 | 0.229 | 0.161 | 0.270 | 0.321 | 0.202 |
| Φ-Net | 0.040 | **0.039** | **0.031** | 0.104 | 0.106 | 0.075 | 0.148 | 0.157 | 0.114 |
| NBDNet | **0.027** | 0.047 | 0.056 | 0.061 | **0.067** | **0.074** | 0.112 | **0.100** | **0.102** |
Table 7. Number of residues of the denoised phase based on different methods for the Sentinel-1 real interferogram.
| Method | Number of Residues |
|---|---|
| Boxcar | 176 |
| NL-SAR | 1767 |
| ComCSC-GR | 75 |
| InSAR-MONet | 47 |
| NBDNet | 211 |
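The residue counts reported in Table 7 (and later in Table 9) follow the standard InSAR definition: a residue is a 2 × 2 loop of wrapped-phase pixels whose wrapped differences do not sum to zero. A minimal counting sketch, under the assumption that this standard definition is the one used, is given below.

```python
import numpy as np

def count_residues(phase):
    """Count phase residues: for every 2x2 loop of wrapped-phase pixels, sum
    the wrapped differences around the loop; a non-zero sum (+/-2*pi) marks a
    residue. Illustrates the metric in Tables 7 and 9; the authors' exact
    implementation is not given in this section."""
    wrap = lambda x: np.angle(np.exp(1j * x))
    d1 = wrap(phase[:-1, 1:] - phase[:-1, :-1])   # top edge, left -> right
    d2 = wrap(phase[1:, 1:] - phase[:-1, 1:])     # right edge, top -> bottom
    d3 = wrap(phase[1:, :-1] - phase[1:, 1:])     # bottom edge, right -> left
    d4 = wrap(phase[:-1, :-1] - phase[1:, :-1])   # left edge, bottom -> top
    loop = d1 + d2 + d3 + d4                      # approximately 0 or +/-2*pi
    return int(np.sum(np.abs(loop) > np.pi))
```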
Table 8. Main parameters of the HT-1 InSAR system.
| Parameter | Value |
|---|---|
| Carrier Frequency [GHz] | 9.6 |
| Incidence Angle [°] | 34.6 |
| Slant Range [km] | 623.2 |
| Perpendicular Baseline [m] | −211.6 |
| Height of Ambiguity [m] | 51.3 |
| Range Resolution [m] | 1.05 |
| Azimuth Resolution [m] | 1.98 |
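The height-of-ambiguity entry is what links the phase quality discussed above to the DSM accuracy reported in Table 9: one 2π phase cycle corresponds to 51.3 m of height. Under the standard single-baseline relation (a textbook formula, not one quoted from this section), residual phase noise maps to height error as

```latex
\sigma_h \approx \frac{h_{\mathrm{amb}}}{2\pi}\,\sigma_\varphi, \qquad
h_{\mathrm{amb}} = 51.3\ \text{m},\ \sigma_\varphi = 0.5\ \text{rad}
\;\Rightarrow\; \sigma_h \approx 4.1\ \text{m},
```

which is the same order of magnitude as the DSM RMSE values listed in Table 9.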
Table 9. Number of residues (NR), execution time, and RMSE of the reconstructed DSM compared with ICESat-2 of different methods in the HT-1 real experiment.
| Method | NR | Execution Time (s) | RMSE (m) |
|---|---|---|---|
| Boxcar | 27,047 | 0.5 | 4.89 |
| NL-InSAR | 37,514 | 5044.8 | 5.02 |
| NL-SAR | 38,767 | 328.6 | 4.22 |
| OC-InSAR-BM3D | 38,320 | 920.0 | 5.42 |
| ComCSC-GR | 10,074 | 24,660.1 | 4.71 |
| Φ-Net | 3513 | 3774.3 | 3.28 |
| InSAR-MONet | 14,773 | 41.9 | 4.46 |
| NBDNet | 10,365 | 16.1 | 3.84 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
