3.1.1. The SMD-Net Architecture

Recently, several DL-based phase filtering methods [29–31] have achieved better filtering performance than conventional filtering approaches. Nevertheless, they are purely data-driven: these networks rely on a huge volume of data, and their underlying structures are difficult to interpret. In addition, such networks generally consist of many layers and learn a large number of parameters, which greatly increases the computational burden. Considering that SR algorithms model the physical process underlying the problem with only a few parameters, we designed the SMD-Net by unrolling the ISTA algorithm. Unlike the existing DL-based phase filtering methods, the SMD-Net is built on the interferometric phase filtering model rather than relying entirely on data fitting, and its network structure is simple. It is thus expected to improve the filtering performance and the computational efficiency at the same time. The architecture of the SMD-Net is shown in Figure 2.

**Figure 2.** The architecture of the SMD-Net.

In the SMD-Net, each network block is equivalent to one iteration of the traditional ISTA algorithm. To improve the filtering performance and computational efficiency, we employ a CNN module that automatically learns the sparse transform, replacing the hand-crafted sparse transform of the traditional ISTA algorithm. The CNN module is shown in Figure 3.

**Figure 3.** The CNN module in the *k*th block.

In each block in Figure 2, the functions ℵ(·) and ℵ<sup>−1</sup>(·) in the CNN module replace the sparse basis ψ and the conjugate transpose of ψ in the traditional ISTA, respectively. Thus, Equation (9) is transformed into

$$\hat{\mathbf{p}}\_c = \underset{\mathbf{p}\_c}{\operatorname{argmin}} \frac{1}{2} \|\boldsymbol{\Phi}\mathbf{p}\_c - \mathbf{p}\|\_2^2 + \lambda \|\aleph(\mathbf{p}\_c)\|\_1 \tag{15}$$
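For intuition, the objective in Equation (15) can be evaluated directly. The NumPy sketch below treats the learned transform as a generic callable `aleph`; the function name and arguments are illustrative placeholders rather than the paper's implementation:

```python
import numpy as np

def objective(p_c, Phi, p, lam, aleph):
    """Value of the Equation (15) objective: data-fidelity term plus
    an l1 sparsity penalty in the learned transform domain."""
    fidelity = 0.5 * np.linalg.norm(Phi @ p_c - p) ** 2
    sparsity = lam * np.sum(np.abs(aleph(p_c)))
    return fidelity + sparsity
```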

Each iteration of the ISTA thus becomes one block operation in the SMD-Net. We take the *k*th block as an example for a detailed analysis.

Step 1: The SMD-Net transfers the model parameters of block *k* − 1, which correspond to the ISTA algorithm parameters, to block *k* via back-propagation. Then, **h**<sup>(*k*)</sup> is updated by

$$\mathbf{h}^{(k)} = \mathbf{p}\_c^{(k-1)} - \alpha \boldsymbol{\Phi}^\mathbf{H} \left(\boldsymbol{\Phi} \mathbf{p}\_c^{(k-1)} - \mathbf{p}\right) \tag{16}$$

where *α* indicates the step size; **Φ**<sup>H</sup> is the conjugate transpose of **Φ**; and **h**<sup>(*k*)</sup> is the residual error in the *k*th block.
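As a concrete illustration, the update of Equation (16) is a single gradient step on the data-fidelity term. A minimal NumPy sketch follows, where the measurement matrix `Phi`, the noisy phase vector `p`, and the step size `alpha` are hypothetical toy values, not quantities from the paper:

```python
import numpy as np

def gradient_step(p_c_prev, Phi, p, alpha):
    """One ISTA gradient step, Equation (16):
    h = p_c - alpha * Phi^H (Phi p_c - p)."""
    residual = Phi @ p_c_prev - p              # data-fidelity residual
    return p_c_prev - alpha * (Phi.conj().T @ residual)

# Toy example with a hypothetical 8x8 complex measurement matrix.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
p = rng.standard_normal(8) + 1j * rng.standard_normal(8)
h_k = gradient_step(np.zeros(8, dtype=complex), Phi, p, alpha=0.01)
```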

Step 2: To satisfy the second term of Equation (15) (i.e., the sparsity constraint), the first stage of the CNN module sparsely represents **h**<sup>(*k*)</sup> as ℵ(**h**<sup>(*k*)</sup>), where ℵ(·) is a function that automatically learns the sparse domain.
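The paper does not specify the exact layer configuration of ℵ(·). The PyTorch sketch below assumes a minimal two-layer convolutional stack (the 32 feature channels and 3 × 3 kernels are illustrative choices) simply to show how a learnable sparse transform can be realized:

```python
import torch.nn as nn

class ForwardTransform(nn.Module):
    """A minimal learnable sparse transform, standing in for aleph(.);
    channel width and depth are illustrative assumptions."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, h):      # h: (batch, 1, H, W) residual map
        return self.net(h)     # sparse-domain representation aleph(h)
```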

Step 3: The CNN module takes ℵ(**h**<sup>(*k*)</sup>) and *λ* as inputs. The *k*th filtered result in the sparse domain is calculated by

$$\mathbf{s}^{(k)} = \text{soft}\left(\aleph\left(\mathbf{h}^{(k)}\right), \lambda\right) = \text{sign}\left(\aleph\left(\mathbf{h}^{(k)}\right)\right) \max\left\{ \left| \aleph\left(\mathbf{h}^{(k)}\right) \right| - \lambda, 0 \right\} \tag{17}$$
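Equation (17) is the standard elementwise soft-threshold (shrinkage) operator. A minimal PyTorch version might read:

```python
import torch

def soft_threshold(x, lam):
    """Elementwise soft threshold of Equation (17):
    sign(x) * max(|x| - lam, 0)."""
    return torch.sign(x) * torch.clamp(torch.abs(x) - lam, min=0.0)
```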

Step 4: The ℵ<sup>−1</sup>(·) function is designed under the constraint ℵ<sup>−1</sup>(ℵ(·)) = I to obtain the *k*th filtered result in the spatial domain. The result is obtained by

$$\mathbf{p}\_c^{(k)} = \aleph^{-1} \left( \mathbf{s}^{(k)} \right) \tag{18}$$

where **s**<sup>(*k*)</sup> is the filtered result of the *k*th block in the sparse domain and *λ* is a tunable soft-threshold parameter. In the traditional ISTA, ψ**p***c* is a sparse representation of **p***c*, where ψ denotes a fixed transform such as the wavelet or Fourier transform; in the SMD-Net, this hand-crafted ψ is replaced by the learned ℵ(·).
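The constraint ℵ<sup>−1</sup>(ℵ(·)) = I is usually not enforced exactly; a common choice in unrolled networks, assumed here, is to encourage it with an auxiliary penalty during training. The weight `gamma` in the sketch below is a hypothetical hyperparameter:

```python
import torch

def symmetry_loss(forward_t, inverse_t, h, gamma=0.01):
    """Penalize deviation from aleph_inv(aleph(h)) = h, so the learned
    inverse approximately undoes the learned forward transform."""
    return gamma * torch.mean((inverse_t(forward_t(h)) - h) ** 2)
```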

The SMD-Net combines the merits of the modern CNN and the ISTA algorithm. On the one hand, the CNN quickly processes the operations between network layers, which compensates for the traditional ISTA algorithm's reliance on a large number of iterations and thus improves the computational efficiency. On the other hand, the interpretability of the ISTA algorithm and its few parameters with specific physical meanings overcome the drawback that a CNN must learn abundant parameters from a large amount of training data, improving the accuracy and the stability of the method.
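Putting the four steps together, one block of the SMD-Net can be sketched as a single module, reusing the `ForwardTransform` and `soft_threshold` helpers defined above. Everything here (class name, learnable scalar step size and threshold, the mirrored inverse transform, and the simplifying choice **Φ** = **I**) is an illustrative assumption consistent with Equations (16)–(18), not the authors' released code:

```python
import torch
import torch.nn as nn

class SMDNetBlock(nn.Module):
    """One unrolled ISTA iteration: gradient step (16), learned
    transform, soft threshold (17), learned inverse transform (18)."""
    def __init__(self, channels=32):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.1))   # step size (alpha)
        self.lam = nn.Parameter(torch.tensor(0.01))    # soft threshold (lambda)
        self.transform = ForwardTransform(channels)    # aleph(.)
        self.inverse = nn.Sequential(                  # aleph^{-1}(.)
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, p_c, p):
        # Step 1: gradient step of Equation (16); Phi = I is assumed
        # here for a denoising-style setting, which may differ from
        # the paper's measurement matrix.
        h = p_c - self.alpha * (p_c - p)
        # Steps 2-4: transform, shrink, inverse-transform.
        s = soft_threshold(self.transform(h), self.lam)
        return self.inverse(s)
```

Stacking several such blocks and training them end-to-end reproduces the unrolled structure shown in Figure 2, with each block carrying its own learned step size and threshold.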
