**1. Introduction**

In recent years, many types of tensor data have emerged alongside the significant progress of modern sensor technology, such as color images [1], videos [2], functional MRI data [3], hyper-spectral images [4], point cloud data [5], and traffic stream data [6]. Thanks to their multi-way nature, tensor-based methods have a natural advantage over vector- and matrix-based methods in analyzing and processing ubiquitous modern multi-way data, and they have found extensive applications in computer vision [1,7], data mining [5], machine learning [2], and signal processing [8], to name a few. In real applications, the acquired tensor data often suffer from noise and gross corruptions for many different reasons, such as sensor failure, lens pollution, communication interference, occlusion in videos, or abnormalities in a sensor network [9]. At the same time, many real-world tensor data, such as face images or videos, have been shown to possess low-dimensional structure and can be well approximated by a small number of "principal components" [8]. A question then naturally arises: how can we pursue the principal components of observed tensor data in the presence of both noise and gross corruptions? We answer this question in this paper and refer to the proposed methodology as Stable Tensor Principal Component Pursuit (STPCP).

Tensor low-rankness is an idealized model of the property that tensor data can be well approximated by a small number of principal components [8]. In the last decade, low-rank tensor models have attracted much attention in many fields [10]. There are multiple low-rank tensor models, since there exist different definitions of tensor rank. Among these models, the low-CP-rank model [11] and the low-Tucker-rank model [1] are the two most famous. The low-CP-rank model approximates the underlying tensor by the sum of a small number of rank-1 tensors, whereas the low-Tucker-rank model assumes that the unfolding matrix along each mode is low-rank. To estimate an unknown low-rank tensor from corrupted observations, a natural option is to consider the rank minimization problem, which chooses the tensor of lowest rank from a certain feasible set. However, tensor rank minimization, even in its 2-way (matrix) case, is generally NP-hard [12], and it is even harder in higher-way cases [13]. For tractable solutions, researchers turn to a variety of convex surrogates for the tensor rank [1,14–18] to replace it in the rank minimization problem. Methods based on surrogates for the tensor CP rank and Tucker rank have been extensively explored on both the theoretical and the application side [14,17,19–24].

Recently, the low-tubal-rank model [16,25] has shown better performance than traditional low-rank tensor models in many tensor recovery tasks, such as image/video inpainting/denoising/sensing [2,25,26], moving object detection [27], multi-view learning [28], seismic data completion [29], WiFi fingerprinting [30], MRI imaging [16], point cloud data inpainting [31], and so on. The tubal rank is a complexity measure of tensors defined through the framework of tensor singular value decomposition (t-SVD) [32,33]. At the core of existing low-tubal-rank models is the tubal nuclear norm (TNN), a convex surrogate for the tubal rank. In contrast to CP-based or Tucker-based tensor nuclear norms, which model low-rankness in the original domain, TNN models low-rankness in the Fourier domain. It is pointed out in [25,34,35] that TNN has an advantage over traditional tensor nuclear norms in exploiting the ubiquitous "spatial-shifting" property of real-world tensor data.

Inspired by the superior performance of TNN, this paper adopts TNN as the low-rank regularizer in the proposed STPCP model. Specifically, the proposed STPCP aims to estimate the underlying tensor $\underline{\mathbf{L}}_0 \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ from an observed tensor $\underline{\mathbf{M}}$ polluted by both small dense noise and sparse gross corruptions as follows:

$$
\underline{\mathbf{M}} = \underline{\mathbf{L}}_0 + \underline{\mathbf{S}}_0 + \underline{\mathbf{E}}_0 \tag{1}
$$

where $\underline{\mathbf{S}}_0$ is a tensor denoting the sparse corruptions and $\underline{\mathbf{E}}_0$ is a tensor representing small dense noise. Model (1) is also known as robust tensor decomposition [36,37].

Our STPCP model is first formulated as a TNN-based convex program. Our theoretical analysis then gives an upper bound on the estimation errors of $\underline{\mathbf{L}}_0$ and $\underline{\mathbf{S}}_0$. In contrast to the analysis in [37], the proposed STPCP can exactly recover the underlying tensor $\underline{\mathbf{L}}_0$ and the sparse corruption tensor $\underline{\mathbf{S}}_0$ when the noise term $\underline{\mathbf{E}}_0$ vanishes. To solve the proposed STPCP model efficiently, we develop two algorithms, with extensions to a more challenging scenario in which missing observations are also considered. The first is an ADMM algorithm, and the second accelerates it using tensor factorization. Experiments show the effectiveness and efficiency of the designed algorithms.

We organize the rest of this paper as follows. In Section 2, we briefly introduce basic preliminaries for t-SVD and some related works. The proposed STPCP model is formulated and analyzed theoretically in Section 3. We design two algorithms in Section 4 and report experimental results in Section 5. This work is concluded in Section 6. The proofs of theorems, propositions, and lemmas are given in the appendix.

### **2. Preliminaries and Related Works**

In this section, some preliminaries of t-SVD are first introduced. Then, the related works are presented.

**Notations.** We denote vectors by bold lowercase letters, e.g., $\mathbf{a} \in \mathbb{R}^{n}$, matrices by bold uppercase letters, e.g., $\mathbf{A} \in \mathbb{R}^{n_1 \times n_2}$, and tensors by underlined bold uppercase letters, e.g., $\underline{\mathbf{A}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$. For a given 3-way tensor, a *fiber* is a vector obtained by fixing all indices but one, and a *slice* is a matrix obtained by fixing all indices but two. For a given 3-way tensor $\underline{\mathbf{A}}$, we use $\underline{\mathbf{A}}_{ijk}$ to denote its $(i, j, k)$-th element, and $\mathbf{A}^{(k)} := \underline{\mathbf{A}}(:, :, k)$ to denote its $k$-th frontal slice. $\widetilde{\underline{\mathbf{A}}}$ denotes the tensor obtained by performing the 1D Discrete Fourier Transform (DFT) on all tube fibers $\underline{\mathbf{A}}(i, j, :)$ of $\underline{\mathbf{A}}$, $\forall i = 1, 2, \cdots, n_1$, $j = 1, 2, \cdots, n_2$, which can be efficiently computed by the Matlab command $\widetilde{\underline{\mathbf{A}}} = \texttt{fft}(\underline{\mathbf{A}}, [\,], 3)$. We use $\mathrm{dft3}(\cdot)$ and $\mathrm{idft3}(\cdot)$ to represent the 1D DFT and inverse DFT along the tube fibers of 3-way tensors, i.e., $\mathrm{dft3}(\underline{\mathbf{A}}) := \texttt{fft}(\underline{\mathbf{A}}, [\,], 3)$ and $\mathrm{idft3}(\underline{\mathbf{A}}) := \texttt{ifft}(\underline{\mathbf{A}}, [\,], 3)$.
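These transforms are the basic computational primitives used throughout the paper. For readers who prefer Python to Matlab, a minimal numpy sketch is given below; the names `dft3`/`idft3` mirror the operators above (they are our naming, not from the paper), and `axis=2` is the 0-indexed analogue of Matlab's third mode:

```python
import numpy as np

def dft3(A):
    # 1D DFT along the tube (third) dimension; the numpy analogue of
    # the Matlab command fft(A, [], 3) used in the text.
    return np.fft.fft(A, axis=2)

def idft3(A_tilde):
    # Inverse DFT along the tube dimension, i.e., ifft(A, [], 3).
    return np.fft.ifft(A_tilde, axis=2)

A = np.random.randn(4, 5, 6)
assert np.allclose(idft3(dft3(A)).real, A)  # the round trip recovers A
```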

For a given matrix $\mathbf{M} \in \mathbb{R}^{n_1 \times n_2}$, define the nuclear norm and the spectral norm of $\mathbf{M}$, respectively, as:

$$\|\mathbf{M}\|_* := \sum_{i=1}^{p} \sigma_i(\mathbf{M}), \quad \text{and} \quad \|\mathbf{M}\|_{\mathrm{sp}} := \max_i \{\sigma_i(\mathbf{M})\},$$

where $p = \min\{n_1, n_2\}$ and $\sigma_1(\mathbf{M}) \geq \cdots \geq \sigma_p(\mathbf{M})$ are the singular values of $\mathbf{M}$ in non-ascending order. The $l_0$-norm, $l_1$-norm, Frobenius norm, and $l_\infty$-norm of a tensor $\underline{\mathbf{A}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ are defined as:

$$\|\underline{\mathbf{A}}\|_0 := \sum_{ijk} \mathbf{1}(\underline{\mathbf{A}}_{ijk} \neq 0), \quad \|\underline{\mathbf{A}}\|_1 := \sum_{ijk} |\underline{\mathbf{A}}_{ijk}|, \quad \|\underline{\mathbf{A}}\|_{\mathrm{F}} := \sqrt{\sum_{ijk} \underline{\mathbf{A}}_{ijk}^2}, \quad \|\underline{\mathbf{A}}\|_{\infty} := \max_{ijk} |\underline{\mathbf{A}}_{ijk}|,$$

where 1(*C*) is an indicator function whose value is 1 if the condition *C* is true, and 0 otherwise.
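These entrywise norms translate directly into code; a small sketch in numpy with an arbitrary example tensor:

```python
import numpy as np

A = np.random.randn(3, 4, 5)
l0   = np.count_nonzero(A)      # l0-norm: number of non-zero entries
l1   = np.abs(A).sum()          # l1-norm: sum of absolute entries
fro  = np.sqrt((A ** 2).sum())  # Frobenius norm
linf = np.abs(A).max()          # l_infinity-norm: largest absolute entry
```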

Given two matrices $\mathbf{A} = (a_{ij}) \in \mathbb{C}^{n_1 \times n_2}$ and $\mathbf{B} = (b_{ij}) \in \mathbb{C}^{n_1 \times n_2}$, we define their inner product as follows:

$$\langle \mathbf{A}, \mathbf{B} \rangle := \mathrm{tr}(\mathbf{A}^{\mathrm{H}} \mathbf{B}) = \sum_{ij} \bar{a}_{ij} b_{ij},$$

where $\mathbf{A}^{\mathrm{H}}$ denotes the conjugate transpose of the matrix $\mathbf{A}$ and $\bar{a}_{ij}$ denotes the conjugate of the complex number $a_{ij}$. Given two 3-way tensors $\underline{\mathbf{A}}, \underline{\mathbf{B}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, we define their inner product as follows:

$$
\langle \underline{\mathbf{A}}, \underline{\mathbf{B}} \rangle := \sum_{ijk} \underline{\mathbf{A}}_{ijk} \underline{\mathbf{B}}_{ijk}.
$$

### *2.1. Tensor Singular Value Decomposition*

We first define three operators based on block matrices, which were introduced in [33]. For a given 3-way tensor $\underline{\mathbf{A}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, we define its block vectorization $\mathrm{bvec}(\cdot)$ and the inverse operation $\mathrm{bvfold}(\cdot)$ as follows:

$$\mathrm{bvec}(\underline{\mathbf{A}}) := \begin{bmatrix} \mathbf{A}^{(1)} \\ \mathbf{A}^{(2)} \\ \vdots \\ \mathbf{A}^{(n_3)} \end{bmatrix} \in \mathbb{R}^{n_1 n_3 \times n_2}, \quad \mathrm{bvfold}(\mathrm{bvec}(\underline{\mathbf{A}})) = \underline{\mathbf{A}}.$$

We further define the block circulant matrix $\mathrm{bcirc}(\cdot)$ of any 3-way tensor $\underline{\mathbf{A}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ as follows:

$$\mathrm{bcirc}(\underline{\mathbf{A}}) := \begin{bmatrix} \mathbf{A}^{(1)} & \mathbf{A}^{(n_3)} & \cdots & \mathbf{A}^{(2)} \\ \mathbf{A}^{(2)} & \mathbf{A}^{(1)} & \cdots & \mathbf{A}^{(3)} \\ \vdots & \ddots & \ddots & \vdots \\ \mathbf{A}^{(n_3)} & \mathbf{A}^{(n_3 - 1)} & \cdots & \mathbf{A}^{(1)} \end{bmatrix} \in \mathbb{R}^{n_1 n_3 \times n_2 n_3}.$$

Equipped with the operators defined above, we are now in a position to define the t-product of 3-way tensors.

**Definition 1** (t-product [33])**.** *Given two tensors* $\underline{\mathbf{A}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ *and* $\underline{\mathbf{B}} \in \mathbb{R}^{n_2 \times n_4 \times n_3}$*, the t-product of* $\underline{\mathbf{A}}$ *and* $\underline{\mathbf{B}}$ *is a new 3-way tensor* $\underline{\mathbf{C}}$ *of size* $n_1 \times n_4 \times n_3$*:*

$$
\underline{\mathbf{C}} = \underline{\mathbf{A}} * \underline{\mathbf{B}} := \mathrm{bvfold}\left(\mathrm{bcirc}(\underline{\mathbf{A}})\,\mathrm{bvec}(\underline{\mathbf{B}})\right). \tag{2}
$$
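The construction in Definition 1 can be implemented literally. The sketch below (our helper names, not code from [33]) builds `bvec`, `bcirc`, and the t-product exactly as in Equation (2); it is meant for illustration rather than efficiency, since the Fourier-domain route introduced later is much cheaper:

```python
import numpy as np

def bvec(A):
    # Stack the frontal slices A[:, :, k] vertically into an (n1*n3) x n2 matrix.
    n1, n2, n3 = A.shape
    return np.concatenate([A[:, :, k] for k in range(n3)], axis=0)

def bvfold(M, n1, n3):
    # Inverse of bvec: split the tall matrix back into frontal slices.
    return np.stack([M[k * n1:(k + 1) * n1, :] for k in range(n3)], axis=2)

def bcirc(A):
    # Block circulant matrix: the k-th block column is bvec of A with its
    # frontal slices circularly shifted down by k positions.
    n3 = A.shape[2]
    return np.concatenate([bvec(np.roll(A, k, axis=2)) for k in range(n3)],
                          axis=1)

def tprod(A, B):
    # t-product of Definition 1: C = bvfold(bcirc(A) @ bvec(B)).
    n1, _, n3 = A.shape
    return bvfold(bcirc(A) @ bvec(B), n1, n3)

A = np.random.randn(3, 4, 5)
B = np.random.randn(4, 2, 5)
assert tprod(A, B).shape == (3, 2, 5)   # n1 x n4 x n3, as in Definition 1
```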

A more intuitive interpretation of the t-product is as follows [33]. If we treat a 3-way tensor $\underline{\mathbf{A}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ as an $n_1 \times n_2$ "matrix" whose entries are tube fibers, then the t-product can be understood analogously to matrix multiplication, with the standard scalar product replaced by the circular convolution of tubes (i.e., vectors):

$$\underline{\mathbf{C}} = \underline{\mathbf{A}} * \underline{\mathbf{B}} \Leftrightarrow \underline{\mathbf{C}}(i, j, :) = \sum_{k=1}^{n_2} \underline{\mathbf{A}}(i, k, :) \circledast \underline{\mathbf{B}}(k, j, :), \quad \forall i = 1, 2, \cdots, n_1, \ j = 1, 2, \cdots, n_4, \tag{3}$$

where $\circledast$ represents the circular convolution [33] of two vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^{n_3}$, defined as $(\mathbf{a} \circledast \mathbf{b})_j = \sum_{k=1}^{n_3} \mathbf{a}_k \mathbf{b}_{1 + (j-k) \bmod n_3}$.

We also define the block diagonal matrix $\mathrm{bdiag}(\cdot)$ of any 3-way tensor $\underline{\mathbf{A}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and its inverse $\mathrm{bdfold}(\cdot)$ as follows:

$$\mathrm{bdiag}(\underline{\mathbf{A}}) := \begin{bmatrix} \mathbf{A}^{(1)} & & \\ & \ddots & \\ & & \mathbf{A}^{(n_3)} \end{bmatrix} \in \mathbb{R}^{n_1 n_3 \times n_2 n_3}, \quad \mathrm{bdfold}(\mathrm{bdiag}(\underline{\mathbf{A}})) = \underline{\mathbf{A}}.$$

We also use $\overline{\mathbf{A}}$ to denote the block diagonal matrix of the tensor $\widetilde{\underline{\mathbf{A}}} = \mathrm{dft3}(\underline{\mathbf{A}})$ (i.e., the Fourier version of $\underline{\mathbf{A}}$), i.e.,

$$\overline{\mathbf{A}} := \mathrm{bdiag}(\widetilde{\underline{\mathbf{A}}}) = \begin{bmatrix} \widetilde{\mathbf{A}}^{(1)} & & \\ & \ddots & \\ & & \widetilde{\mathbf{A}}^{(n_3)} \end{bmatrix} \in \mathbb{C}^{n_1 n_3 \times n_2 n_3}.$$

Then, the relationship between the DFT and circular convolution further indicates that conducting the t-product in the original domain is equivalent to performing the standard matrix product on the Fourier-domain block diagonal matrices [33]. Since the matrix product of the Fourier-domain block diagonal matrices can be written, in parallel, as matrix products of all the frontal slices in the Fourier domain, we have the following relationships:

$$
\underline{\mathbf{C}} = \underline{\mathbf{A}} * \underline{\mathbf{B}} \Leftrightarrow \overline{\mathbf{C}} = \overline{\mathbf{A}}\,\overline{\mathbf{B}} \Leftrightarrow \widetilde{\mathbf{C}}^{(k)} = \widetilde{\mathbf{A}}^{(k)} \widetilde{\mathbf{B}}^{(k)}, \quad k = 1, 2, \cdots, n_3. \tag{4}
$$
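Equation (4) is easy to verify numerically. In the sketch below, the hypothetical `tprod_fft` implements the Fourier-domain route of Equation (4), and `tprod_conv` implements the tube-wise circular convolutions of Equation (3); both names are ours, and the two agree up to floating-point error:

```python
import numpy as np

def tprod_fft(A, B):
    # t-product via Equation (4): slice-wise matrix products in the Fourier domain.
    Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Cf = np.einsum('ipk,pjk->ijk', Af, Bf)   # C~(k) = A~(k) B~(k) for every k
    return np.fft.ifft(Cf, axis=2).real

def tprod_conv(A, B):
    # Reference implementation of Equation (3): circular convolution of tubes.
    n1, n2, n3 = A.shape
    n4 = B.shape[1]
    C = np.zeros((n1, n4, n3))
    for i in range(n1):
        for j in range(n4):
            for k in range(n2):
                a, b = A[i, k, :], B[k, j, :]
                C[i, j, :] += np.array([sum(a[l] * b[(t - l) % n3]
                                            for l in range(n3))
                                        for t in range(n3)])
    return C

A = np.random.randn(3, 4, 5)
B = np.random.randn(4, 2, 5)
assert np.allclose(tprod_fft(A, B), tprod_conv(A, B))
```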

The relationship between the t-product and the FFT also indicates that the inner product of two 3-way tensors $\underline{\mathbf{A}}, \underline{\mathbf{B}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and the inner product of their Fourier-domain block diagonal matrices $\overline{\mathbf{A}}, \overline{\mathbf{B}} \in \mathbb{C}^{n_1 n_3 \times n_2 n_3}$ satisfy the following relationship:

$$
\langle \underline{\mathbf{A}}, \underline{\mathbf{B}} \rangle = \frac{1}{n_3} \left\langle \widetilde{\underline{\mathbf{A}}}, \widetilde{\underline{\mathbf{B}}} \right\rangle = \frac{1}{n_3} \left\langle \overline{\mathbf{A}}, \overline{\mathbf{B}} \right\rangle.
$$

When $\underline{\mathbf{A}} = \underline{\mathbf{B}} = \underline{\mathbf{X}}$, one has:

$$\|\underline{\mathbf{X}}\|_{\mathrm{F}} = \frac{1}{\sqrt{n_3}} \|\overline{\mathbf{X}}\|_{\mathrm{F}}.$$
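Both identities are one-line numerical checks, since the Frobenius norm of $\overline{\mathbf{X}}$ equals the Frobenius norm of $\widetilde{\underline{\mathbf{X}}}$ viewed as a tensor. A quick sketch with arbitrary sizes:

```python
import numpy as np

n1, n2, n3 = 3, 4, 5
X = np.random.randn(n1, n2, n3)
Xf = np.fft.fft(X, axis=2)   # ||bdiag(Xf)||_F equals ||Xf||_F as a tensor
assert np.allclose(np.linalg.norm(X), np.linalg.norm(Xf) / np.sqrt(n3))
```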

We further define the concepts of tensor transpose, identity tensor, f-diagonal tensor and orthogonal tensor as follows.

**Definition 2** (tensor transpose [33])**.** *Given a tensor* $\underline{\mathbf{A}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$*, its transpose tensor* $\underline{\mathbf{A}}^{\top}$ *of size* $n_2 \times n_1 \times n_3$ *is formed by first transposing all the frontal slices of* $\underline{\mathbf{A}}$ *and then exchanging the k-th transposed frontal slice with the* $(n_3 + 2 - k)$*-th one for all* $k = 2, 3, \cdots, n_3$*.*

For example, consider a 3-way tensor $\underline{\mathbf{A}} = [\mathbf{A}^{(1)} | \mathbf{A}^{(2)} | \mathbf{A}^{(3)} | \mathbf{A}^{(4)}] \in \mathbb{R}^{n_1 \times n_2 \times 4}$ with four frontal slices; the tensor transpose $\underline{\mathbf{A}}^{\top}$ of $\underline{\mathbf{A}}$ is:

$$\underline{\mathbf{A}}^{\top} = [(\mathbf{A}^{(1)})^{\top} | (\mathbf{A}^{(4)})^{\top} | (\mathbf{A}^{(3)})^{\top} | (\mathbf{A}^{(2)})^{\top}] \in \mathbb{R}^{n_2 \times n_1 \times 4}.$$
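In code, the tensor transpose is a slice-wise transpose followed by a reversal of slices $2, \ldots, n_3$. A minimal sketch (the function name `ttranspose` is ours):

```python
import numpy as np

def ttranspose(A):
    # Transpose every frontal slice, then send slice k to slice n3 + 2 - k
    # for k = 2, ..., n3 (Definition 2); slice 1 stays in place.
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

A = np.random.randn(3, 2, 4)
T = ttranspose(A)
assert np.allclose(T[:, :, 1], A[:, :, 3].T)  # slice 2 of A^T is (A^(4))^T
```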

**Definition 3** (identity tensor [33])**.** *The identity tensor* $\underline{\mathbf{I}} \in \mathbb{R}^{n \times n \times n_3}$ *is the tensor whose first frontal slice is the* $n \times n$ *identity matrix and whose other frontal slices are all zero matrices.*

**Definition 4** (f-diagonal tensor [33])**.** *We call a 3-way tensor f-diagonal if all of its frontal slices are diagonal matrices.*

**Definition 5** (orthogonal tensor [33])**.** *We call a tensor* $\underline{\mathbf{Q}} \in \mathbb{R}^{n \times n \times n_3}$ *an orthogonal tensor if the following equations hold:*

$$
\underline{\mathbf{Q}}^{\top} * \underline{\mathbf{Q}} = \underline{\mathbf{Q}} * \underline{\mathbf{Q}}^{\top} = \underline{\mathbf{I}}.
$$

Then, the tensor singular value decomposition (t-SVD) can be given as follows.

**Definition 6** (Tensor singular value decomposition (t-SVD) [38])**.** *Given any 3-way tensor* $\underline{\mathbf{X}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$*, it admits the following factorization, called the tensor singular value decomposition (t-SVD):*

$$
\underline{\mathbf{X}} = \underline{\mathbf{U}} * \underline{\mathbf{\Sigma}} * \underline{\mathbf{V}}^{\top}, \tag{5}
$$

*where the left and right factor tensors* $\underline{\mathbf{U}} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ *and* $\underline{\mathbf{V}} \in \mathbb{R}^{n_2 \times n_2 \times n_3}$ *are orthogonal, and the middle tensor* $\underline{\mathbf{\Sigma}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ *is a rectangular f-diagonal tensor.*

A visual illustration of the t-SVD is shown in Figure 1. It can be computed efficiently via the FFT and inverse FFT in the Fourier domain according to Equation (4). For more details, see [2].

**Figure 1.** A visual illustration of t-SVD.
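Concretely, the t-SVD is obtained from ordinary matrix SVDs of the Fourier-domain frontal slices. The sketch below (our naming, not the reference implementation of [2,38]) exploits the conjugate symmetry of the DFT of a real tensor so that the inverse FFT returns real factor tensors:

```python
import numpy as np

def tsvd(X):
    # t-SVD of Equation (5) via slice-wise SVDs in the Fourier domain.
    n1, n2, n3 = X.shape
    p = min(n1, n2)
    Xf = np.fft.fft(X, axis=2)
    Uf = np.zeros((n1, n1, n3), dtype=complex)
    Sf = np.zeros((n1, n2, n3), dtype=complex)
    Vf = np.zeros((n2, n2, n3), dtype=complex)
    for k in range(n3 // 2 + 1):
        M = Xf[:, :, k]
        if k == 0 or (n3 % 2 == 0 and k == n3 // 2):
            M = M.real                     # these Fourier slices are real for real X
        U, s, Vh = np.linalg.svd(M)
        Uf[:, :, k], Vf[:, :, k] = U, Vh.conj().T
        Sf[np.arange(p), np.arange(p), k] = s
    for k in range(n3 // 2 + 1, n3):       # remaining slices by conjugate symmetry
        Uf[:, :, k] = Uf[:, :, n3 - k].conj()
        Sf[:, :, k] = Sf[:, :, n3 - k].conj()
        Vf[:, :, k] = Vf[:, :, n3 - k].conj()
    ifft_r = lambda T: np.fft.ifft(T, axis=2).real
    return ifft_r(Uf), ifft_r(Sf), ifft_r(Vf)

# Sanity check: reconstruct X from its t-SVD in the Fourier domain, cf. Eq. (4).
X = np.random.randn(4, 3, 5)
U, S, V = tsvd(X)
Uf, Sf, Vf = (np.fft.fft(T, axis=2) for T in (U, S, V))
Xf_rec = np.einsum('iak,abk,jbk->ijk', Uf, Sf, Vf.conj())
assert np.allclose(np.fft.ifft(Xf_rec, axis=2).real, X)
```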

**Definition 7** (Tensor tubal rank [38])**.** *The tensor tubal rank of any 3-way tensor* $\underline{\mathbf{X}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ *is defined as the number of non-zero tubes of* $\underline{\mathbf{\Sigma}}$ *in its t-SVD in Equation (5), i.e.,*

$$r_{\mathrm{tubal}}(\underline{\mathbf{X}}) := \sum_{i} \mathbf{1}\left(\underline{\mathbf{\Sigma}}(i, i, :) \neq \mathbf{0}\right). \tag{6}$$

**Definition 8** (Tubal average rank [38])**.** *The tubal average rank* $r_{\mathrm{a}}(\underline{\mathbf{A}})$ *of any 3-way tensor* $\underline{\mathbf{A}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ *is defined as the average of the ranks of all frontal slices of* $\widetilde{\underline{\mathbf{A}}}$*, as follows:*

$$r_{\mathrm{a}}(\underline{\mathbf{A}}) := \frac{1}{n_3} \sum_{k=1}^{n_3} \mathrm{rank}\left(\widetilde{\mathbf{A}}^{(k)}\right). \tag{7}$$
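Both ranks are straightforward to compute from the Fourier-domain singular values; a sketch with hypothetical helper names:

```python
import numpy as np

def tubal_rank(X, tol=1e-10):
    # Equation (6): a tube Sigma(i, i, :) is non-zero iff the i-th singular
    # value of some Fourier-domain frontal slice is non-zero.
    Xf = np.fft.fft(X, axis=2)
    s = np.stack([np.linalg.svd(Xf[:, :, k], compute_uv=False)
                  for k in range(X.shape[2])], axis=1)   # p x n3 values
    return int(np.sum(s.max(axis=1) > tol))

def tubal_average_rank(X, tol=1e-10):
    # Equation (7): the average rank of the Fourier-domain frontal slices.
    Xf = np.fft.fft(X, axis=2)
    n3 = X.shape[2]
    return sum(np.linalg.matrix_rank(Xf[:, :, k], tol=tol)
               for k in range(n3)) / n3
```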

**Definition 9** (Tensor operator norm [2,38])**.** *The tensor operator norm* $\|\underline{\mathbf{F}}\|_{\mathrm{op}}$ *of any 3-way tensor* $\underline{\mathbf{F}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ *is defined as follows:*

$$\|\underline{\mathbf{F}}\|_{\mathrm{op}} := \sup_{\|\underline{\mathbf{A}}\|_{\mathrm{F}} \leq 1} \|\underline{\mathbf{F}} * \underline{\mathbf{A}}\|_{\mathrm{F}}. \tag{8}$$

The relationship between t-product and FFT indicates that

$$\|\underline{\mathbf{F}}\|_{\mathrm{op}} := \sup_{\|\underline{\mathbf{A}}\|_{\mathrm{F}} \leq 1} \|\underline{\mathbf{F}} * \underline{\mathbf{A}}\|_{\mathrm{F}} = \sup_{\|\overline{\mathbf{A}}\|_{\mathrm{F}} \leq \sqrt{n_3}} \frac{1}{\sqrt{n_3}} \|\overline{\mathbf{F}}\,\overline{\mathbf{A}}\|_{\mathrm{F}} = \|\overline{\mathbf{F}}\|_{\mathrm{sp}}. \tag{9}$$

**Definition 10** (Tensor spectral norm [38])**.** *The tensor spectral norm* $\|\underline{\mathbf{A}}\|_{\mathrm{sp}}$ *of any 3-way tensor* $\underline{\mathbf{A}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ *is defined as the matrix spectral norm of* $\overline{\mathbf{A}}$*, i.e.,*

$$\|\underline{\mathbf{A}}\|_{\mathrm{sp}} := \|\overline{\mathbf{A}}\|_{\mathrm{sp}}. \tag{10}$$

We further define the tubal nuclear norm.

**Definition 11** (Tubal nuclear norm [2])**.** *For any tensor* $\underline{\mathbf{A}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ *with t-SVD* $\underline{\mathbf{A}} = \underline{\mathbf{U}} * \underline{\mathbf{\Sigma}} * \underline{\mathbf{V}}^{\top}$*, the tubal nuclear norm (TNN) of* $\underline{\mathbf{A}}$ *is defined as:*

$$\|\underline{\mathbf{A}}\|_{\mathrm{TNN}} := \langle \underline{\mathbf{\Sigma}}, \underline{\mathbf{I}} \rangle = \sum_{i=1}^{r} \underline{\mathbf{\Sigma}}(i, i, 1), \tag{11}$$

*where* $r = r_{\mathrm{tubal}}(\underline{\mathbf{A}})$*.*

To understand the tubal nuclear norm, first note that

$$r_{\mathrm{tubal}}(\underline{\mathbf{A}}) = \sum_{i} \mathbf{1}\left(\underline{\mathbf{\Sigma}}(i, i, :) \neq \mathbf{0}\right) \stackrel{(i)}{=} \sum_{i} \mathbf{1}\left(\widetilde{\underline{\mathbf{\Sigma}}}(i, i, :) \neq \mathbf{0}\right) \stackrel{(ii)}{=} \sum_{i} \mathbf{1}\left(\|\widetilde{\underline{\mathbf{\Sigma}}}(i, i, :)\|_1 \neq 0\right) \stackrel{(iii)}{=} \sum_{i} \mathbf{1}\left(\underline{\mathbf{\Sigma}}(i, i, 1) \neq 0\right), \tag{12}$$

where (i) holds by the definition of the DFT [2], (ii) holds by the property of the $l_1$-norm, and (iii) follows from the DFT [2], since $\underline{\mathbf{\Sigma}}(i, i, 1) = \frac{1}{n_3} \sum_{k} \widetilde{\underline{\mathbf{\Sigma}}}(i, i, k)$ with $\widetilde{\underline{\mathbf{\Sigma}}}(i, i, k) \geq 0$. Thus, the tubal rank of $\underline{\mathbf{A}}$ is also the number of non-zero entries of $\underline{\mathbf{\Sigma}}(i, i, 1)$, i.e., the diagonal of the first frontal slice of $\underline{\mathbf{\Sigma}}$ in the t-SVD of $\underline{\mathbf{A}}$. Similar to matrix singular values, the values $\underline{\mathbf{\Sigma}}(i, i, 1)$, $i = 1, 2, \cdots, \min\{n_1, n_2\}$, are also called the singular values of the tensor $\underline{\mathbf{A}}$. As the matrix nuclear norm is the sum of the matrix singular values, the tubal nuclear norm can similarly be understood as the sum of the tensor singular values.

One can also verify by the property of DFT [2] that:

$$\|\underline{\mathbf{A}}\|_{\mathrm{TNN}} = \sum_{i=1}^{r} \underline{\mathbf{\Sigma}}(i, i, 1) = \frac{1}{n_3} \sum_{k=1}^{n_3} \sum_{i=1}^{r} \widetilde{\underline{\mathbf{\Sigma}}}(i, i, k) = \frac{1}{n_3} \sum_{k=1}^{n_3} \|\widetilde{\mathbf{A}}^{(k)}\|_* = \frac{1}{n_3} \|\overline{\mathbf{A}}\|_* \tag{13}$$

which indicates that the TNN of $\underline{\mathbf{A}} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is also the average of the nuclear norms of all frontal slices of $\widetilde{\underline{\mathbf{A}}}$. Thus, TNN indeed models low-rankness in the Fourier domain.
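Equation (13) also gives the cheapest way to evaluate TNN in practice: sum the singular values of every Fourier-domain slice and divide by $n_3$. A sketch (the name `tnn` is ours):

```python
import numpy as np

def tnn(A):
    # Equation (13): TNN equals the average of the nuclear norms of the
    # Fourier-domain frontal slices, i.e., (1/n3) * ||Abar||_*.
    Af = np.fft.fft(A, axis=2)
    n3 = A.shape[2]
    return sum(np.linalg.svd(Af[:, :, k], compute_uv=False).sum()
               for k in range(n3)) / n3
```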

Now, we show that the low-tubal-rank model is well suited to some real-world tensor data, such as color images and videos.

First, we consider a natural image of size 256 × 256 × 3, shown in Figure 2a. In Figure 2b, we plot the distribution of its singular values, i.e., the values of $\underline{\mathbf{\Sigma}}(i, i, 1)$ against the index $i$. As can be seen from Figure 2b, only a small number of singular values have large magnitude, and most of the singular values are close to 0. We can therefore say that some natural color images are approximately of low tubal rank.

**Figure 2.** The distribution of the tensor singular values $\underline{\mathbf{\Sigma}}(i, i, 1)$ of a natural color image. (**a**) the sample image, (**b**) the distribution of $\underline{\mathbf{\Sigma}}(i, i, 1)$.

Then, consider a commonly used YUV sequence, *Mother-daughter\_qcif* (these data can be downloaded from https://sites.google.com/site/subudhibadri/fewhelpfuldownloads), whose first frame is shown in Figure 3a. We use the Y components of the first 30 frames to obtain a tensor of size 144 × 176 × 30 and show the distribution of its tensor singular values in Figure 3b. Similar to Figure 2b, only a small number of singular values have large magnitude, and most of the singular values are close to 0. We can therefore say that some videos are also approximately of low tubal rank.

**Figure 3.** The distribution of the tensor singular values $\underline{\mathbf{\Sigma}}(i, i, 1)$ of a video sequence. (**a**) the first frame of the video, (**b**) the distribution of $\underline{\mathbf{\Sigma}}(i, i, 1)$.

For TNN and tensor spectral norm, we highlight the following two lemmas.

**Lemma 1.** *[2] TNN is the convex envelope of the tensor average rank within the unit ball of the tensor spectral norm* $\{\underline{\mathbf{T}} \in \mathbb{R}^{n_1 \times n_2 \times n_3} \mid \|\underline{\mathbf{T}}\|_{\mathrm{sp}} \leq 1\}$*.*

**Lemma 2.** *[2] The TNN and the tensor spectral norm are dual norms to each other.*

### *2.2. Related Works*

In this subsection, we briefly review some related works. The proposed STPCP is closely related to Tensor Robust Principal Component Analysis (TRPCA), which aims to recover a low-rank tensor $\underline{\mathbf{L}}_0$ and a sparse tensor $\underline{\mathbf{S}}_0$ from their sum $\underline{\mathbf{M}} = \underline{\mathbf{L}}_0 + \underline{\mathbf{S}}_0$. This is the special case of our measurement Model (1) in which the noise tensor $\underline{\mathbf{E}}_0$ is zero.

In [39], the SNN-based TRPCA model is proposed by modeling the underlying tensor as one of low Tucker rank:

$$\min_{\underline{\mathbf{L}}, \underline{\mathbf{S}}} \ \|\underline{\mathbf{L}}\|_{\mathrm{SNN}} + \|\underline{\mathbf{S}}\|_1 \quad \text{s.t.} \ \underline{\mathbf{L}} + \underline{\mathbf{S}} = \underline{\mathbf{M}}, \tag{14}$$

where the SNN (Sum of Nuclear Norms) is defined as $\|\underline{\mathbf{L}}\|_{\mathrm{SNN}} := \sum_{k=1}^{K} \alpha_k \|\mathbf{L}_{(k)}\|_*$, where $\alpha_k > 0$ and $\mathbf{L}_{(k)}$ is the mode-$k$ matricization of $\underline{\mathbf{L}}$ [40].

Model (14) indeed assumes the underlying tensor to be of low Tucker rank, which can be too strong an assumption for some real tensor data. The TNN-based TRPCA model instead uses TNN to impose low-rankness on the solution $\underline{\mathbf{L}}$ as follows:

$$\min_{\underline{\mathbf{L}}, \underline{\mathbf{S}}} \ \|\underline{\mathbf{L}}\|_{\mathrm{TNN}} + \lambda \|\underline{\mathbf{S}}\|_1 \quad \text{s.t.} \quad \underline{\mathbf{L}} + \underline{\mathbf{S}} = \underline{\mathbf{M}}. \tag{15}$$

As shown in [2], when the underlying tensor $\underline{\mathbf{L}}_0$ satisfies the tensor incoherence conditions, solving Problem (15) with the parameter $\lambda = 1/\sqrt{\max\{n_1, n_2\}\, n_3}$ recovers the underlying tensors $\underline{\mathbf{L}}_0$ and $\underline{\mathbf{S}}_0$ exactly with high probability.
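The workhorse of ADMM-type solvers for Problem (15), and for TNN-regularized models more generally, is the proximal operator of TNN, often called tensor singular value thresholding (t-SVT) in the literature: by Equation (13), it reduces to soft-thresholding the singular values of every Fourier-domain frontal slice. A minimal sketch follows (our naming; an illustration, not the solver of [2]):

```python
import numpy as np

def tsvt(Y, tau):
    # Proximal operator of tau * ||.||_TNN: soft-threshold the singular
    # values of every Fourier-domain frontal slice, then invert the FFT.
    Yf = np.fft.fft(Y, axis=2)
    Xf = np.zeros_like(Yf)
    for k in range(Y.shape[2]):
        U, s, Vh = np.linalg.svd(Yf[:, :, k], full_matrices=False)
        Xf[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vh
    return np.fft.ifft(Xf, axis=2).real

def soft(S, tau):
    # Proximal operator of tau * ||.||_1, used for the sparse component.
    return np.sign(S) * np.maximum(np.abs(S) - tau, 0.0)
```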

When the noise tensor $\underline{\mathbf{E}}_0$ is not zero, a robust tensor decomposition based on SNN is proposed in [36] as follows:

$$\min_{\underline{\mathbf{L}}, \underline{\mathbf{S}}} \ \frac{1}{2} \|\underline{\mathbf{M}} - \underline{\mathbf{L}} - \underline{\mathbf{S}}\|_{\mathrm{F}}^2 + \lambda_1 \|\underline{\mathbf{L}}\|_{\mathrm{SNN}} + \lambda_2 \|\underline{\mathbf{S}}\|_1, \tag{16}$$

where $\lambda_1$ and $\lambda_2$ are positive regularization parameters. The estimation errors of $\underline{\mathbf{L}}$ and $\underline{\mathbf{S}}$ are analyzed, with an upper bound, in [36].

In [37], a TNN-based robust tensor decomposition (RTD) model is proposed as follows:

$$\min_{\underline{\mathbf{L}}, \underline{\mathbf{S}}} \ \frac{1}{2} \|\underline{\mathbf{M}} - \underline{\mathbf{L}} - \underline{\mathbf{S}}\|_{\mathrm{F}}^2 + \lambda_1 \|\underline{\mathbf{L}}\|_{\mathrm{TNN}} + \lambda_2 \|\underline{\mathbf{S}}\|_1 \quad \text{s.t.} \ \|\underline{\mathbf{L}}\|_{\infty} \leq \alpha, \tag{17}$$

where $\alpha$ is an upper estimate of the $l_\infty$-norm of the underlying tensor $\underline{\mathbf{L}}_0$. An upper bound on the estimation error is also established. However, in the analysis of Model (17), the error bound does not vanish as the noise tensor $\underline{\mathbf{E}}_0$ vanishes, which means the analysis cannot guarantee exact recovery in the noiseless setting (a guarantee that the analysis of the TNN-based TRPCA (15) by Lu et al. [2] does provide).

The Bayesian approach has also been used for robust tensor recovery. CP decomposition under sparse corruption and small dense noise is considered in [41], where tensor rank estimation is achieved with a Bayesian approach. In [42], CP decomposition under missing values and small dense noise is considered, with rank estimation similar to [41]. A sparse Bayesian CP model is proposed in [43] to recover a tensor with missing values, outliers, and noise. In [44], a fully Bayesian treatment is proposed to recover a low-tubal-rank tensor corrupted by both noise and outliers.

### **3. Theoretical Guarantee for Stable Tensor Principal Component Pursuit**

In this section, we formulate the proposed STPCP model and present the main theoretical result, which upper-bounds the estimation error and guarantees exact recovery in the noiseless setting.
