#### **4. Results**

This section presents an experimental study of the performance of the proposed algorithms. The tests covered several image completion problems and used the following RGB images: Barbara, Lena, Peppers, and Monarch, which are presented in Figure 1. Each image has a resolution of 512 × 512 pixels.

**Figure 1.** Original images: Barbara, Lena, Peppers, and Monarch (from left to right).

#### *4.1. Setup*

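The incomplete images were obtained by removing some entries from the tensors representing the original images. The following test cases were analyzed:

- Test A: 90% randomly missing pixels in the "Barbara" image.
- Test B: 95% randomly missing tensor entries in the "Lena" image.
- Test C: 200 missing circles with a radius of at most 10 pixels in the "Peppers" image.
- Test D: resolution up-scaling of the "Monarch" image.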
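As a rough illustration of how incomplete images of this kind can be produced, the following Python sketch builds binary observation masks for randomly missing pixels and for missing circles. The function names, the seeding, and the way circle centres and radii are drawn are assumptions made for this example; they are not taken from the original experiments, and the up-scaling pattern is left out because its exact layout is not specified here.

```python
import random


def random_pixel_mask(height, width, missing_ratio, seed=0):
    """Return a height x width grid of booleans; True marks an observed pixel."""
    rng = random.Random(seed)
    n_total = height * width
    n_missing = round(missing_ratio * n_total)
    # Draw the missing positions without replacement.
    missing = set(rng.sample(range(n_total), n_missing))
    return [[(r * width + c) not in missing for c in range(width)]
            for r in range(height)]


def circle_mask(height, width, n_circles, max_radius, seed=0):
    """Remove n_circles discs; centres and radii (1..max_radius) are random."""
    rng = random.Random(seed)
    mask = [[True] * width for _ in range(height)]
    for _ in range(n_circles):
        cr, cc = rng.randrange(height), rng.randrange(width)
        radius = rng.randint(1, max_radius)
        # Only scan the bounding box of the disc, clipped to the image.
        for r in range(max(0, cr - radius), min(height, cr + radius + 1)):
            for c in range(max(0, cc - radius), min(width, cc + radius + 1)):
                if (r - cr) ** 2 + (c - cc) ** 2 <= radius ** 2:
                    mask[r][c] = False
    return mask
```

Calling `random_pixel_mask(512, 512, 0.9)` gives a mask in which roughly 10% of the pixels are observed. Applying an independent mask to each color channel would remove individual tensor entries rather than whole pixels.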


We compared the proposed methods with the following: FAN (filtering by adaptive normalization) and EFAN (efficient filtering by adaptive normalization) [60], SmPC-QV (smooth PARAFAC tensor completion with quadratic variation) [38], LRTV (low-rank total-variation) [61], TMac-inc (low-rank tensor completion by parallel matrix factorization with the rank-increasing strategy) [62], C-SALSA (constrained split augmented Lagrangian shrinkage algorithm) [63], fALS (filtered alternating least-squares) [64], and KA-TT (ket augmentation tensor train) [65]. FAN and EFAN are based on adaptive Gaussian low-pass filtration. SmPC-QV performs low-rank tensor completion with smoothness-penalized CP decomposition and a gradually increasing CP rank. LRTV accomplishes low-rank matrix completion using total variation regularization. TMac-inc also belongs to the family of low-rank tensor completion methods: an incomplete tensor is unfolded with respect to all modes, and the resulting matrices are completed by applying low-rank matrix factorizations together with an adaptive rank-adjusting strategy. C-SALSA performs image completion using the variable splitting approach to solve an LS image reconstruction problem with a strong nonsmooth regularizer. fALS and KA-TT combine low-pass filtration with the standard Tucker decomposition and tensor train models, respectively.

The proposed algorithm is referred to as Tensorial Interpolation for Image Completion (TI-IC), and it is presented in Algorithm 1. It combines two strategies: RBF interpolation with an exponential function, and multivariate polynomial regression. To emphasize the importance of both terms in the model (16), we also present the results obtained separately for each of them, using the same partitioning strategy in each case. The TI-IC algorithm with only the exponential term is referred to as TI-IC(Exp). When only the polynomial regression is used, the algorithm is denoted as TI-IC(Poly).
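To make the two ingredients of TI-IC more concrete, the sketch below builds, for a single mode, an exponential RBF matrix and a polynomial basis matrix in Python. The kernel form exp(-d/tau), the absolute-difference distance, the rescaling of indices to [0, 1], and all function names are assumptions made for illustration; the exact definitions are those of the model (16), which this sketch does not claim to reproduce.

```python
import math


def exponential_rbf_matrix(n, tau):
    """n x n matrix with entries exp(-|i - j| / tau) (assumed kernel form)."""
    return [[math.exp(-abs(i - j) / tau) for j in range(n)] for i in range(n)]


def polynomial_basis_matrix(n, degree=2):
    """n x (degree + 1) matrix with columns 1, t, ..., t**degree.

    The index i is rescaled to t = i / (n - 1) in [0, 1]; this rescaling is
    an assumption of the sketch and requires n > 1.
    """
    rows = []
    for i in range(n):
        t = i / (n - 1)
        rows.append([t ** k for k in range(degree + 1)])
    return rows
```

With `degree=2`, the polynomial matrix has three columns. Using only the first matrix would loosely correspond to the TI-IC(Exp) variant, and using only the second to TI-IC(Poly).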

TI-IC is flexible with respect to the choice of the distance function $d^{(n)}(\cdot, \cdot)$, the degrees of the interpolation polynomials, and the partitioning and overlapping rates. Since $\{i_n, j_n\}$ lie on a line, $d^{(n)}(i_n, j_n) = |i_n - j_n|$ seems to be the best choice. The factor matrices $\{P^{(n)}\}$ were determined by quadratic polynomials, hence $\forall n: R_n = 3$. Higher-order polynomials result in ill-conditioning of the system matrix in (19) and do not noticeably improve the performance. The partitioning and overlapping rates were set experimentally to $[S_1, S_2, S_3] = [32, 32, 1]$ and $[\theta_1, \theta_2, \theta_3] = [33.33, 33.33, 0]$. As the resolution of $\mathcal{M}$ is 512 × 512, the overlapping amounts to 5 pixels across the first and second modes for each subtensor $\hat{\mathcal{Y}} \in \mathbb{R}^{16 \times 16 \times 3}$. For larger subtensors, the computational time increased considerably, and we did not observe a noticeable improvement in the quality of the recovered images. For smaller subtensors, the performance decreased. The scaling factor in the exponential RBFs was also determined experimentally, and to compute $F^{(n)}$ we set $\tau = 3$ for TI-IC and $\tau = 5$ for TI-IC(Exp).
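The block geometry implied by these settings can be checked with a few lines of Python. The sketch below divides one mode of length 512 into 32 blocks of 16 entries and rounds 33.33% of the block size to a 5-entry overlap. Extending every block by the overlap on both sides (clipped at the borders) is only one possible reading of the overlapping scheme, and the function name and return format are hypothetical.

```python
def block_ranges(length, n_blocks, overlap_percent):
    """Split range(length) into n_blocks blocks and extend each by an overlap.

    Returns (base, overlap, ranges); every entry of ranges is a tuple
    (start, stop, core_start, core_stop) with half-open index intervals.
    """
    base = length // n_blocks
    overlap = round(base * overlap_percent / 100)
    ranges = []
    for s in range(n_blocks):
        core_start = s * base
        # The last block absorbs any remainder of the division.
        core_stop = length if s == n_blocks - 1 else core_start + base
        start = max(0, core_start - overlap)    # assumed: extend to the left
        stop = min(length, core_stop + overlap)  # assumed: extend to the right
        ranges.append((start, stop, core_start, core_stop))
    return base, overlap, ranges
```

For the first two modes, `block_ranges(512, 32, 33.33)` yields a block size of 16 and an overlap of 5; for the third mode, `block_ranges(3, 1, 0)` keeps all three color channels in a single block.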

In the iterative algorithms, the maximum number of iterations was set to 1000, and the threshold for the residual error was equal to $10^{-12}$. The maximum rank was limited to 50.

The algorithms were implemented in MATLAB 2016a and run on the distributed cluster server in the Wroclaw Centre for Networking and Supercomputing (WCSS) (https://www.wcss.pl/en/) using PLGRID (http://www.plgrid.pl/en) queues and parallel workers. The resources were limited to 10 cores (ncpus) and 32 GB RAM (mem). The workers can be employed to run each algorithm for various initializations in parallel, or they can be used to process the subtensors $\hat{\mathcal{Y}}^{(s_1, \ldots, s_N)}$ in Algorithm 1 in parallel, in such a way that each subtensor is processed by one CPU core. The block partitioning procedure was implemented with the blockproc function in MATLAB 2016a, which has an option to run the computation across the available workers. We analyzed both options, i.e., when it was enabled and when it was disabled.
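The idea of processing independent subtensors with a pool of workers can be sketched in Python as follows. This is not the MATLAB blockproc implementation used in the experiments: `process_block` is a hypothetical placeholder that fills missing values with the block mean, and the thread pool only illustrates the structure of the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def process_block(block):
    """Placeholder for the per-subtensor completion step (hypothetical).

    Missing entries are encoded as None and replaced by the block mean.
    """
    known = [v for v in block if v is not None]
    fill = sum(known) / len(known) if known else 0.0
    return [fill if v is None else v for v in block]


def complete_blocks(blocks, use_parallel=True, max_workers=10):
    """Process every block, either serially or with a pool of workers."""
    if not use_parallel:
        return [process_block(b) for b in blocks]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves the input order, so results line up with blocks.
        return list(pool.map(process_block, blocks))
```

Because the blocks do not depend on one another, the serial and parallel paths return identical results. For CPU-bound work in CPython, a process pool would be the more likely choice to obtain an actual speed-up.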

#### *4.2. Image Completion*

The recovered images were validated quantitatively using the signal-to-interference ratio (SIR) measure [31], defined as $\mathrm{SIR} = 20 \log_{10} \frac{\|\mathcal{M}\|_F}{\|\mathcal{M} - \mathcal{Y}\|_F}$. The SIR values were averaged over the colormaps. The speeds of the algorithms were compared by measuring the averaged runtime of each test.
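A minimal Python sketch of the SIR measure is given below. It assumes that each color channel is stored as a list of rows and that averaging over the colormaps means averaging the per-channel SIR values; the function names are hypothetical.

```python
import math


def frobenius_norm(channel):
    """Frobenius norm of a 2-D channel stored as a list of rows."""
    return math.sqrt(sum(v * v for row in channel for v in row))


def sir(original, recovered):
    """SIR = 20 * log10(||M||_F / ||M - Y||_F) for a single channel."""
    diff = [[o - r for o, r in zip(orow, rrow)]
            for orow, rrow in zip(original, recovered)]
    err = frobenius_norm(diff)
    if err == 0.0:
        return math.inf  # perfect reconstruction
    # Assumes the original channel is not identically zero.
    return 20.0 * math.log10(frobenius_norm(original) / err)


def mean_sir(original_channels, recovered_channels):
    """Average the per-channel SIR values (assumed averaging scheme)."""
    scores = [sir(o, r) for o, r in zip(original_channels, recovered_channels)]
    return sum(scores) / len(scores)
```

For instance, a channel with Frobenius norm 10 that is recovered with an error of norm 1 gives an SIR of 20.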

Figure 2 illustrates the incomplete image (top left) used in Test A and the results obtained with the algorithms: FAN, EFAN, SmPC-QV, LRTV, C-SALSA, TMac-inc, fALS, KA-TT, TI-IC(Exp), TI-IC(Poly), and TI-IC. The images reconstructed in Tests B, C, and D with the same algorithms are depicted in Figures 3–5, respectively. Due to the random initialization of some baseline algorithms, all the tests were repeated 100 times, and the SIR samples are presented in Figure 6 in the form of box-plots, separately for each test. The mean runtime of the evaluated algorithms and the corresponding standard deviations for each test case are listed in Table 1. The algorithms run on a parallel pool of MATLAB workers are denoted with an asterisk.
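The repeated-run protocol can be outlined with a small Python timing harness. The helper names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used for Table 1.

```python
import statistics
import time


def time_runs(func, n_runs):
    """Call func n_runs times and return the wall-clock duration of each call."""
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of measurements."""
    return statistics.mean(samples), statistics.stdev(samples)
```

A call such as `summarize(time_runs(run_one_test, 100))`, where `run_one_test` stands for a single completion run, would produce the two numbers reported per algorithm and test case.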

**Figure 2.** Test A (90% randomly missing pixels) for the image "Barbara".

**Figure 3.** Test B (95% randomly missing entries in the incomplete tensor) for the image "Lena".

**Figure 4.** Test C (200 missing circles of maximum 10-pixel radius) for the image "Peppers".

**Figure 5.** Test D: resolution up-scaling for the image "Monarch".

**Figure 6.** Box-plots of signal-to-interference ratio (SIR) performance for the tests (**A**–**D**) with the algorithms 1 = FAN, 2 = EFAN, 3 = SmPC-QV, 4 = LRTV, 5 = C-SALSA, 6 = TMac-inc, 7 = fALS, 8 = KA-TT, 9 = TI-IC(Exp), 10 = TI-IC(Poly), 11 = TI-IC.

**Table 1.** Mean runtime (in seconds) of the algorithms and the corresponding standard deviations for each test case. An asterisk denotes the use of parallel processing with a parallel pool of workers in MATLAB.


#### *4.3. Discussion*

The experiments were carried out for typical but challenging image completion problems. In Test A, we knew only 10% of the pixels in the "Barbara" image, and the aim was to recover the 90% missing pixels. The results illustrated in Figures 2 and 6A show that good quality reconstructions were obtained when the EFAN, SmPC-QV, fALS, KA-TT, TI-IC(Exp), and TI-IC algorithms were used, but the image recovered with TI-IC had the highest SIR score. TI-IC was also quite fast in this test (see Table 1). It lost in the runtime category only to FAN and EFAN, but its parallel version (TI-IC\*) was more than 250 times faster than SmPC-QV. The latter performs the CP decomposition, but the difference in computational speed comes from the fact that in our method, the factor matrices are precomputed, and only the core tensor in the Tucker decomposition is estimated using the data.

The results obtained in Test B are presented in Figures 3 and 6B. They confirm the conclusions drawn from Test A, but it should be noted that TI-IC strengthened its leading position in the SIR performance. Moreover, its runtime was shorter than that in the previous test because only 5% of the entries were known, and hence the system matrix in (19) was smaller.

Test C compared the algorithms for the completion of many small-scale missing regions (holes) distributed across the image. The results presented in Figures 4 and 6C show that EFAN and SmPC-QV failed to provide satisfactory reconstructions in this test, whereas TI-IC outperformed the other algorithms considerably. As expected, the lower number of missing pixels in the image to be completed resulted in a noticeable increase in the runtime of TI-IC, but it was still below the runtime of the low-rank tensor completion methods, such as SmPC-QV, TMac-inc, fALS, and LRTV.

In Test D, only 50% of the pixels were unknown, but not all the tested algorithms handled this case well. In this test, TI-IC also yielded the best reconstruction (see Figures 5 and 6D), but it was only slightly better than that obtained with TI-IC(Exp). Hence, the low-degree polynomial regression does not affect the result considerably in Test D, and it can be omitted to save computation time. In the other tests, both approaches (RBF interpolation and polynomial regression), combined appropriately, were essential to yielding high-quality results.

#### **5. Conclusions**

In this study, we showed the relationship between the models of RBF interpolation and Tucker decomposition (Remark 1). We combined the exponential RBF interpolation and polynomial regression in one model and experimentally demonstrated that such a hybrid method achieved the highest SIR scores in all the tests. The proposed algorithm (TI-IC) can be applied to a wide spectrum of image-completion problems. The incomplete images can contain many single missing entries or missing pixels distributed across the image, a large number of small-scale regions (holes), or regularly shaped missing regions, such as in resolution up-scaling problems.

The TI-IC algorithm is also computationally efficient. It provides reconstructions of the highest quality in a much shorter time than the tested low-rank tensor image-completion methods. Its runtime depends on the number of missing entries in an input tensor, and it is shorter if more entries are unknown. The computational complexity of the proposed method can be controlled by the block partitioning strategy, as proven in Remark 2. By using overlapping blocks in this partitioning strategy, we avoided visible disturbances around the boundary entries of the blocks, which are an intrinsic effect of RBF interpolation methods. Furthermore, due to the use of RBFs, the overlapping blocks can be processed on parallel computer architectures, and our experiments demonstrated that the use of a parallel pool of workers in MATLAB considerably shortened the runtime of the proposed algorithm.

Summing up, the proposed algorithm outperforms all the tested image completion methods for a wide spectrum of tests. Its computational runtime is also satisfactory and considerably shorter than that for the low-rank tensor decompositions. The proposed algorithm can also be efficiently implemented on parallel computer architectures.

**Author Contributions:** Conceptualization, R.Z.; Methodology, R.Z.; Software, R.Z. and T.S.; Investigation, R.Z. and T.S.; Validation, R.Z. and T.S.; Visualization, T.S.; Writing—original draft, R.Z.; Writing—review and editing, R.Z.; Funding acquisition, R.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by grant 2015/17/B/ST6/01865, funded by the National Science Center in Poland. Calculations were performed at the Wroclaw Centre for Networking and Supercomputing under grant no. 127.

**Conflicts of Interest:** The authors declare no conflict of interest.
