#### *7.3. Super Resolution*

We compare our SSM-NTF with the exemplar-based scheme of [37] on the image super-resolution (SR) Problem (41) with a bicubic filter. Figure 6 shows the 15 natural test images [18], which are rich in both texture and structure. All schemes are applied to the illumination channel with a scale factor of 3. We always use 3 × 3 low-resolution (LR) patches with an overlap of 1 pixel between adjacent patches, corresponding to 9 × 9 high-resolution (HR) patches with an overlap of 3 pixels. In these experiments, we use the following parameters: *A* = 0.8, *B* = 1.8, *γ*<sup>1</sup> = 1.1 and *γ*<sup>3</sup> = 1.2 *L*/*M*, where *L* and *M* are the sample and dictionary size, respectively. In our scheme, dictionary learning is performed between HR images and middle-level (MR) images, where the MR images are the first- and second-order derivatives of the LR image upsampled by a factor of 2. The four 1D filters used to extract the derivatives are:

$$\begin{array}{ll} \mathbf{f_1} = [-1, 0, 1], & \mathbf{f_2} = \mathbf{f_1}^T\\ \mathbf{f_3} = [1, 0, -2, 0, 1], & \mathbf{f_4} = \mathbf{f_3}^T \end{array} \tag{45}$$
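As an illustration of how the filters in Equation (45) produce the four feature images, the following minimal sketch applies them with NumPy. The function names `filter_rows` and `feature_maps`, the zero padding at the image borders, and the ordering of the output maps are assumptions made for this example and are not specified in the text.

```python
import numpy as np

# The two distinct tap sets of Eq. (45); f2 and f4 are the transposes of
# f1 and f3, i.e. the same taps applied along the vertical direction.
F1 = np.array([-1.0, 0.0, 1.0])
F3 = np.array([1.0, 0.0, -2.0, 0.0, 1.0])


def filter_rows(image, taps):
    """Correlate every row of `image` with `taps` (zero padding, same size)."""
    # np.convolve flips its kernel, so flipping the taps first yields correlation.
    return np.apply_along_axis(
        lambda row: np.convolve(row, taps[::-1], mode="same"), 1, image
    )


def feature_maps(upsampled):
    """Return the four derivative feature images, stacked as (f1, f2, f3, f4)."""
    img = np.asarray(upsampled, dtype=float)
    horizontal = [filter_rows(img, taps) for taps in (F1, F3)]
    # Filtering the rows of the transposed image is filtering the columns.
    vertical = [filter_rows(img.T, taps).T for taps in (F1, F3)]
    return np.stack([horizontal[0], vertical[0], horizontal[1], vertical[1]])
```

On a horizontal ramp image, for example, the interior response of the first filter is constant and the second-order responses vanish, which is a quick sanity check on the orientation of the filters.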

We train two pairs of HR/LR dictionaries {**Φ***h*, **Ψ***h*} and {**Φ***l*, **Ψ***l*} from 100,000 HR/LR patch pairs [**X***h*, **X***l*] randomly sampled from the collected natural images that are also used in [37], where **X***h* is sampled from the HR images and **X***l* is sampled from the four feature images. The feature images are obtained by applying the four filters to the upsampled LR image. Given **Φ**, **Ψ** and the four MR feature images, the sparse coefficients **Y** and the threshold value *λ* can be calculated by Algorithm 1. Following the theory in [37], the HR image can then be recovered via Algorithm 3. In this experiment, our HR dictionary pair is of size 81 × 450 and the MR pair is of size 144 × 450. The dictionary sizes of [37] are 81 × 1024 (HR) and 144 × 1024 (MR) at its best performance, as stated in that paper. Thus, the dictionaries of [37] are larger than our dictionary pair combined.

Table 3 shows the objective evaluation results of our proposed SSM-NTF compared with bicubic interpolation and [37]. On average, our SSM-NTF achieves the best PSNR. Figure 7 presents the corresponding visual comparison of the illumination SR results for Image 12. We can observe that the result of bicubic interpolation is too smooth and that the result of [37] suffers from obvious ringing artifacts and noise. The HR reconstruction of our SSM-NTF method provides clearer details.
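The dictionary dimensions quoted above are consistent with the patch geometry. The check below assumes, which this section does not state explicitly, that each 3 × 3 LR patch corresponds to a 6 × 6 patch in the feature images upsampled by a factor of 2, and that the four feature patches are concatenated into one vector:

$$
\underbrace{9 \times 9}_{\text{HR patch}} = 81, \qquad \underbrace{4 \times (6 \times 6)}_{\text{four MR feature patches}} = 144, \qquad \underbrace{450 + 450}_{\text{our dictionary pair}} = 900 < 1024 .
$$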

**Figure 6.** Test images for image super-resolution performance evaluation [18].

**Figure 7.** Visual quality comparison of SR results for Image 12 corresponding to Table 3. From left to right: Original image, result of bicubic interpolation (*PSNR* = 30.28), [32] (*PSNR* = 30.62) and our SSM-NTF method (*PSNR* = 30.99), respectively.


**Table 3.** PSNR (dB) for 3× SR reconstruction results.
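For reference, the PSNR values reported in this section can be computed as in the sketch below. The peak value of 255 assumes 8-bit images, which the text does not state, and the function name `psnr` is chosen for illustration only.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal size."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)  # mean squared error over all pixels
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```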

#### *7.4. Image Inpainting*

To illustrate the applicability of our proposed SSM-NTF model to image inpainting, we apply it to text removal. In these experiments, we use the following parameters: *A* = 0.8, *B* = 1.8, *γ*<sup>1</sup> = 1.1 and *γ*<sup>3</sup> = 1.2 *L*/*M*, where *L* and *M* are the sample and dictionary size, respectively. We operate on the images 'Adar', 'Lena', 'Couple' and 'Hill' with superimposed text in various fonts.

In this experiment, we apply our SSM-NTF model to image inpainting in a way similar to the non-blind KSVD inpainting algorithm [9], which requires knowing which pixels are corrupted and need inpainting. Accordingly, only the non-corrupted pixels are used to train the dictionary pair and to inpaint the images. Our method operates on patches of size 10 × 10 that are extracted from the images with an overlap of 1 pixel between adjacent patches. The trained dictionary pair is of size 100 × 200. The KSVD algorithm in this experiment operates on patches of size 8 × 8 that are extracted from the images with an overlap of 1 pixel between adjacent patches. Its dictionary size is 64 × 256 at its best performance according to [9]. The patch inpainting stage is followed by solving Problem (41). Table 4 shows the objective evaluation results of our proposed SSM-NTF compared with DCT and KSVD [9]. The visual comparisons are shown in Figures 8 and 9. We find that the proposed SSM-NTF method removes the text in various fonts completely, whereas the KSVD result looks dull. Our SSM-NTF method achieves better performance in terms of both subjective and objective quality.
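The non-blind setting described above, in which only the known pixels drive the sparse coding, can be sketched as follows. This is a generic masked orthogonal matching pursuit on a single vectorised patch, in the spirit of KSVD-style inpainting; it is not the SSM-NTF algorithm itself. The function names, the fixed sparsity level and the random dictionary in the usage note are assumptions for illustration.

```python
import numpy as np


def masked_omp(dictionary, patch, known, sparsity):
    """Greedy sparse coding of `patch` using only the pixels where `known` is True."""
    D = dictionary[known]                # dictionary rows of the observed pixels
    y = patch[known]
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0.0] = 1.0            # guard: atom with no observed energy
    support, coeffs, residual = [], np.zeros(0), y.copy()
    for _ in range(sparsity):
        scores = np.abs(D.T @ residual) / norms
        if support:
            scores[support] = -1.0       # never select the same atom twice
        support.append(int(np.argmax(scores)))
        # Least-squares refit on the observed pixels for the current support.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    code = np.zeros(dictionary.shape[1])
    code[support] = coeffs
    return code


def inpaint_patch(dictionary, patch, known, sparsity=2):
    """Keep the known pixels and fill the others from the masked sparse code."""
    estimate = dictionary @ masked_omp(dictionary, patch, known, sparsity)
    return np.where(known, patch, estimate)
```

A call such as `inpaint_patch(D, y, known)` with a 100 × 200 dictionary `D`, a vectorised 10 × 10 patch `y` and a Boolean mask `known` mirrors the patch and dictionary sizes used in this experiment. In a full pipeline, the overlapping restored patches would then be averaged back into the image.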

**Figure 8.** Visual quality comparison of text image inpainting results. From left to right: Original image, text image, result of DCT (*PSNR* = 27.11), KSVD (*PSNR* = 28.01) and our SSM-NTF method (*PSNR* = 28.95), respectively.

**Figure 9.** Visual quality comparison of scratch image inpainting results. From left to right: Original image, scratch image, result of DCT (*PSNR* = 30.81), KSVD (*PSNR* = 31.69), and our SSM-NTF method (*PSNR* = 32.02), respectively.


**Table 4.** PSNR (dB) for image inpainting results.

#### **8. Conclusions**

In this paper, we propose a stable sparse model with non-tight frame (SSM-NTF) and further formulate a dictionary pair learning model to stably recover signals. We theoretically analyze the rationality of the approximation for RIP under the non-tight frame condition. The proposed SSM-NTF has the RIP and a closed-form expression for the sparse coefficients, which together ensure stable recovery, especially for heavily noisy images. The proposed SSM-NTF contains both a synthesis sparse system and an analysis system, which share common sparse coefficients when thresholding is not taken into account. We also propose an efficient dictionary pair learning algorithm by developing an explicit analytical expression for the inherent relation between the two dictionaries. The proposed algorithm is capable of approximating the structures of signals via a pair of adaptive dictionaries. The effectiveness of the proposed SSM-NTF and its corresponding algorithms is demonstrated in image denoising, image super-resolution and image inpainting. The numerical experiments show that the proposed SSM-NTF is superior to the compared methods in objective and subjective quality in most cases.

On the other hand, our proposed SSM-NTF is a 1D sparse model, which suffers from high memory and computational costs, especially when handling high-dimensional data. An MD frame can be expressed as the Kronecker product of a series of 1D frames. Exploiting this property, in future work we will extend our stable sparse model to an MD stable sparse model. Moreover, the proposed SSM-NTF is not effective enough at removing other kinds of noise (e.g., salt-and-pepper noise), because its loss function is Gaussian. We would like to improve the performance of our model by changing the loss function.
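The benefit of the Kronecker structure mentioned above can be seen in a small 2D example. The sketch below uses the identity vec(**B X A**ᵀ) = (**A** ⊗ **B**) vec(**X**) with column-major vectorisation; the matrices `A`, `B` and `X` and their sizes are arbitrary illustrative choices, not frames from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))   # 1D frame acting along the second dimension
B = rng.standard_normal((5, 3))   # 1D frame acting along the first dimension
X = rng.standard_normal((3, 4))   # 2D coefficient array

# Explicit 2D frame: a (5*6) x (3*4) matrix applied to the column-major vec(X).
big = np.kron(A, B)
via_kron = big @ X.flatten(order="F")

# Same result from the two small factors, without ever forming `big`.
via_factors = (B @ X @ A.T).flatten(order="F")

stored_kron = big.size             # 30 * 12 = 360 entries
stored_factors = A.size + B.size   # 24 + 15 = 39 entries
```

Both routes give the same vector, but the factored form stores and multiplies only the small 1D frames, and the gap in storage grows quickly with the number of dimensions.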

**Author Contributions:** M.Z. derived the theory, analyzed the data, performed the experiments and wrote the original draft; Y.S. and N.Q. researched the relevant theory, participated in discussions of the work and revised the manuscript; B.Y. supervised the project. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by NSFC (No. 61672066, 61976011, U1811463, U19B2039, 61906008, 61906009), Beijing Municipal Science and Technology Commission (No. Z171100004417023) and the Scientific Research Common Program of Beijing Municipal Commission of Education (KM202010005018).

**Acknowledgments:** This work was supported by Beijing Advanced Innovation Center for Future Internet Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology. We deeply appreciate the organizations mentioned above.

**Conflicts of Interest:** The authors declare no conflict of interest.
