
Randomized Average Kaczmarz Algorithm for Tensor Linear Systems

College of Science, China University of Petroleum, Qingdao 266580, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(23), 4594; https://doi.org/10.3390/math10234594
Submission received: 9 October 2022 / Revised: 29 November 2022 / Accepted: 30 November 2022 / Published: 4 December 2022
(This article belongs to the Section Computational and Applied Mathematics)

Abstract

For solving tensor linear systems under the tensor–tensor t-product, we propose the randomized average Kaczmarz (TRAK) algorithm, the randomized average Kaczmarz algorithm with random sampling (TRAKS), and their Fourier versions, which can be implemented effectively in a distributed environment. We analyze the relationships between the update formulas of the original algorithms and their Fourier versions in detail and prove that these new algorithms converge to the unique least F-norm solution of consistent tensor linear systems. Extensive numerical experiments show that they significantly outperform the tensor randomized Kaczmarz (TRK) algorithm in terms of both iteration counts and computing times and have potential for real-world data, such as video data, CT data, etc.

1. Introduction

In this paper, we focus on computing the least F-norm solution for consistent tensor linear systems of the form
$$\mathcal{A} * \mathcal{X} = \mathcal{B}, \qquad (1)$$
where $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$, $\mathcal{X}\in\mathbb{R}^{N_2\times K\times N_3}$, and $\mathcal{B}\in\mathbb{R}^{N_1\times K\times N_3}$ are third-order tensors, and $*$ is the t-product proposed by Kilmer and Martin in [1]. The problem has various applications, such as tensor neural networks [2], tensor dictionary learning [3], medical imaging [4], etc.
The randomized Kaczmarz (RK) algorithm is an iterative method for approximating solutions to linear systems of equations. Due to its simplicity and efficiency, the RK method has attracted widespread attention and has been widely developed in many applications, including ultrasound imaging [5] and seismic imaging [6]. Many developments [7,8,9,10,11,12,13,14] of the RK method were obtained, including block Kaczmarz methods. The block Kaczmarz methods [12,13,14], which utilize several rows of the coefficient matrix at each iterate, can be implemented more efficiently in many computer architectures. However, each iteration of the block Kaczmarz method needs to apply the pseudoinverse of the submatrix to a vector, which is expensive. To solve this problem, Necoara [10] proposed the randomized average block Kaczmarz (RABK) method, which utilizes a combination of several RK updates. This method can be implemented effectively in distributed computing units.
Recently, the RK algorithm was extended to solve systems of tensor equations [15,16,17,18]. Ma and Molitor extended the RK method to solve consistent tensor linear systems under the t-product and proposed a Fourier domain version in [15]. Tang et al. [16] presented sketch-and-project methods for tensor linear systems, which require the pseudoinverse of a submatrix. In [17], Chen and Qin proposed the regularized Kaczmarz algorithm, which avoids the calculation of the pseudoinverse for tensor recovery problems. Wang et al. [18] proposed a randomized Kaczmarz-like algorithm and its relaxed version for systems of tensor equations with nonsingular coefficient tensors under general tensor–vector multiplication.
In this paper, inspired by [10,17], we explore the tensor randomized average Kaczmarz (TRAK) method, which is pseudoinverse-free and accelerates the TRK method for solving the tensor linear system (1). However, the entries of each block are fixed in advance, which has a significant effect on the behavior of the TRAK method. Thus, we propose the tensor randomized average Kaczmarz algorithm with random sampling (TRAKS), which gives an optimized selection of the entries in each iteration. Meanwhile, since circulant matrices are diagonalized by the discrete Fourier transform (DFT), we discuss the Fourier domain versions of the TRAK and TRAKS methods. The corresponding convergence analyses are provided, and numerical experiments are given to illustrate our theoretical results.
The rest of this paper is organized as follows. In Section 2, we introduce some notation and tensor basics. Then, we describe the new algorithms and give their convergence theories in Section 3, Section 4 and Section 5. In Section 6, some numerical experiments are presented to illustrate our theoretical results. Finally, we give a brief conclusion in Section 7.

2. Preliminaries

In this section, we clarify the notation and briefly review fundamental concepts of tensor algebra and some existing algorithms.

2.1. Notation

Throughout this paper, we use calligraphic capital letters for tensors, capital letters for matrices, and lowercase letters for scalars. For any matrix $A$, we use $A^{T}$, $A^{H}$, $A^{\dagger}$, $\|A\|_F$, $\|A\|_2$, and $\sigma_{\min}(A)$ to denote the transpose, the conjugate transpose, the Moore–Penrose pseudoinverse, the Frobenius norm, the Euclidean norm, and the minimum nonzero singular value of $A$, respectively. For an integer $m$, let $[m] = \{1, 2, \ldots, m\}$. For a set $C$, we define $|C|$ as the cardinality of $C$. We denote by $\mathbb{E}[\psi]$ and $\mathbb{E}[\psi \mid \zeta]$ the expectation of $\psi$ and the conditional expectation of $\psi$ given $\zeta$, respectively. By the law of total expectation, we have $\mathbb{E}[\mathbb{E}[\psi \mid \zeta]] = \mathbb{E}[\psi]$.

2.2. Tensor Basics

In this subsection, we provide some key definitions and facts in tensor algebra, which can be found in [1,15,16,19,20].
For a third-order tensor $\mathcal{A}\in\mathbb{C}^{N_1\times N_2\times N_3}$, as in [15,16], we denote its $(i,j,k)$ entry by $\mathcal{A}_{i,j,k}$ and use $\mathcal{A}_{i,:,:}$, $\mathcal{A}_{:,j,:}$ and $\mathcal{A}_{:,:,k}$ to denote the $i$th horizontal slice, the $j$th lateral slice and the $k$th frontal slice, respectively. To condense notation, $A_k$ represents the $k$th frontal slice of $\mathcal{A}$. We define the block circulant matrix $\mathrm{bcirc}(\mathcal{A})$ of $\mathcal{A}$ as
$$\mathrm{bcirc}(\mathcal{A}) = \begin{bmatrix} A_1 & A_{N_3} & \cdots & A_2 \\ A_2 & A_1 & \cdots & A_3 \\ \vdots & \vdots & \ddots & \vdots \\ A_{N_3} & A_{N_3-1} & \cdots & A_1 \end{bmatrix} \in \mathbb{C}^{N_1 N_3 \times N_2 N_3}.$$
Definition 1 
(DFT matrix). The N × N DFT matrix is defined as
$$F_N = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega & \omega^{2} & \cdots & \omega^{N-1} \\ 1 & \omega^{2} & \omega^{4} & \cdots & \omega^{2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{N-1} & \omega^{2(N-1)} & \cdots & \omega^{(N-1)^{2}} \end{bmatrix},$$
where $\omega = e^{-2\pi i/N}$.
The inverse of the DFT matrix $F_N$ (the IDFT matrix) is $F_N^{-1} = \frac{F_N^{H}}{N}$.
Lemma 1 
([1]). Suppose $\mathcal{A}$ is an $N_1\times N_2\times N_3$ tensor and $F_{N_3}$ is the $N_3\times N_3$ DFT matrix. Then
$$\mathrm{bdiag}(\hat{\mathcal{A}}) = (F_{N_3}\otimes I_{N_1})\,\mathrm{bcirc}(\mathcal{A})\,\frac{F_{N_3}^{H}\otimes I_{N_2}}{N_3} = \begin{bmatrix} (\hat{\mathcal{A}})_1 & & & \\ & (\hat{\mathcal{A}})_2 & & \\ & & \ddots & \\ & & & (\hat{\mathcal{A}})_{N_3} \end{bmatrix}, \qquad (2)$$
where $\otimes$ is the Kronecker product, $I_N$ denotes the $N\times N$ identity matrix, $(\hat{\mathcal{A}})_k$ is the $k$th frontal slice of $\hat{\mathcal{A}}$, which is obtained by applying the DFT to $\mathcal{A}$ along the third dimension, and $\mathrm{bdiag}(\hat{\mathcal{A}})$ is the block diagonal matrix formed by the frontal slices of $\hat{\mathcal{A}}$.
We define the operator $\mathrm{unfold}(\cdot)$ and its inverse $\mathrm{fold}(\cdot)$ by
$$\mathrm{unfold}(\mathcal{A}) = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_{N_3} \end{bmatrix} \in \mathbb{C}^{N_1 N_3 \times N_2}, \qquad \mathrm{fold}(\mathrm{unfold}(\mathcal{A})) = \mathcal{A}.$$
Definition 2 
(t-product [1]). The tensor–tensor t-product is defined as
$$\mathcal{A} * \mathcal{B} = \mathrm{fold}\big(\mathrm{bcirc}(\mathcal{A})\,\mathrm{unfold}(\mathcal{B})\big) \in \mathbb{C}^{N_1\times K\times N_3},$$
where $\mathcal{A}\in\mathbb{C}^{N_1\times N_2\times N_3}$ and $\mathcal{B}\in\mathbb{C}^{N_2\times K\times N_3}$.
For $\mathcal{A}\in\mathbb{C}^{N_1\times N_2\times N_3}$ and $\mathcal{B}\in\mathbb{C}^{N_2\times K\times N_3}$, it holds that
$$(\mathcal{A} * \mathcal{B})_{i,:,:} = \mathcal{A}_{i,:,:} * \mathcal{B}, \qquad (3)$$
and
$$\mathcal{A} * \mathcal{B} = \sum_{j=1}^{N_2} \mathcal{A}_{:,j,:} * \mathcal{B}_{j,:,:}. \qquad (4)$$
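For concreteness, the following is a minimal NumPy sketch of the t-product of Definition 2, computed in the Fourier domain via Lemma 1 (an FFT along the third dimension, independent slice-wise matrix products, and an inverse FFT). It is not the authors' implementation; the experiments in this paper use the tensor t-product toolbox [24], and the function name t_product is our own choice.

```python
import numpy as np

def t_product(A, B):
    """Sketch of the t-product A * B (Definition 2) via Lemma 1:
    FFT along the third dimension, slice-wise matrix products,
    then the inverse FFT.  A: (N1, N2, N3), B: (N2, K, N3)."""
    N3 = A.shape[2]
    A_hat = np.fft.fft(A, axis=2)   # frontal slices of A in the Fourier domain
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.empty((A.shape[0], B.shape[1], N3), dtype=complex)
    for t in range(N3):             # the block-diagonal structure decouples the slices
        C_hat[:, :, t] = A_hat[:, :, t] @ B_hat[:, :, t]
    # for real-valued A and B the result is real up to round-off
    return np.real(np.fft.ifft(C_hat, axis=2))
```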
Definition 3 
(conjugate transpose [15]). The conjugate transpose of a tensor $\mathcal{A}\in\mathbb{C}^{N_1\times N_2\times N_3}$ is denoted by $\mathcal{A}^{H}$ and is produced by taking the conjugate transpose of all frontal slices and reversing the order of frontal slices $2$ through $N_3$.
For $\mathcal{A}\in\mathbb{C}^{N_1\times N_2\times N_3}$, we have
$$(\mathcal{A}_{j,:,:})^{H} = (\mathcal{A}^{H})_{:,j,:}. \qquad (5)$$
For $\mathcal{A}\in\mathbb{C}^{N_1\times N_2\times N_3}$, the block circulant operator $\mathrm{bcirc}(\cdot)$ commutes with the conjugate transpose, that is,
$$\mathrm{bcirc}(\mathcal{A}^{H}) = \mathrm{bcirc}(\mathcal{A})^{H}. \qquad (6)$$
For $\mathcal{A}\in\mathbb{C}^{N_1\times N_2\times N_3}$ and $\mathcal{B}\in\mathbb{C}^{N_2\times K\times N_3}$, we have
$$(\mathcal{A}*\mathcal{B})^{H} = \mathcal{B}^{H} * \mathcal{A}^{H}. \qquad (7)$$
Definition 4 
(identity tensor). The identity tensor $\mathcal{I}\in\mathbb{R}^{N\times N\times N_3}$ is the tensor whose first frontal slice is the $N\times N$ identity matrix and whose other frontal slices are all zeros.
For $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$, it holds that
$$\mathcal{A}_{i,:,:} = \mathcal{I}_{i,:,:} * \mathcal{A}. \qquad (8)$$
Definition 5 
(Moore–Penrose inverse [20]). Let $\mathcal{X}\in\mathbb{R}^{N_1\times N_2\times N_3}$. If there exists $\mathcal{Y}\in\mathbb{R}^{N_2\times N_1\times N_3}$ such that
$$\mathcal{X}*\mathcal{Y}*\mathcal{X} = \mathcal{X}, \quad \mathcal{Y}*\mathcal{X}*\mathcal{Y} = \mathcal{Y}, \quad (\mathcal{X}*\mathcal{Y})^{T} = \mathcal{X}*\mathcal{Y}, \quad (\mathcal{Y}*\mathcal{X})^{T} = \mathcal{Y}*\mathcal{X},$$
then $\mathcal{Y}$ is called the Moore–Penrose inverse of $\mathcal{X}$ and is denoted by $\mathcal{X}^{\dagger}$.
Definition 6 
(inner product). The inner product between $\mathcal{A}$ and $\mathcal{B}$ in $\mathbb{C}^{N_1\times N_2\times N_3}$ is defined as
$$\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i,j,k} \mathcal{A}_{i,j,k}\,\overline{\mathcal{B}_{i,j,k}},$$
where $\overline{\mathcal{B}_{i,j,k}}$ is the conjugate of $\mathcal{B}_{i,j,k}$.
For $\mathcal{A}\in\mathbb{C}^{N_1\times N_2\times N_3}$, $\mathcal{B}\in\mathbb{C}^{N_2\times K\times N_3}$ and $\mathcal{C}\in\mathbb{C}^{N_1\times K\times N_3}$, it holds that
$$\langle \mathcal{A}*\mathcal{B}, \mathcal{C} \rangle = \langle \mathcal{B}, \mathcal{A}^{H}*\mathcal{C} \rangle. \qquad (9)$$
Definition 7 
(spectral norm and Frobenius norm). The spectral norm and Frobenius norm of $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$ are defined as
$$\|\mathcal{A}\|_2 = \|\mathrm{bcirc}(\mathcal{A})\|_2 \qquad (10)$$
and
$$\|\mathcal{A}\|_F = \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle} = \sqrt{\sum_{i,j,k} (\mathcal{A}_{i,j,k})^{2}}, \qquad (11)$$
respectively.
For $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$ and $\mathcal{B}\in\mathbb{R}^{N_2\times K\times N_3}$, it holds that
$$\|\mathcal{A}*\mathcal{B}\|_F \le \|\mathcal{A}\|_2\,\|\mathcal{B}\|_F. \qquad (12)$$
Definition 8 
(K-range space, K-null space [1]). For $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$, define
$$\mathrm{range}_K(\mathcal{A}) = \{\mathcal{A}*\mathcal{Y} \mid \mathcal{Y}\in\mathbb{R}^{N_2\times K\times N_3}\}, \qquad \mathrm{null}_K(\mathcal{A}) = \{\mathcal{X}\in\mathbb{R}^{N_2\times K\times N_3} \mid \mathcal{A}*\mathcal{X} = 0\}.$$
It is easy to see that $(\mathrm{range}_K(\mathcal{A}^{T}))^{\perp} = \mathrm{null}_K(\mathcal{A})$.
For $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$ and all $\mathcal{X}\in\mathrm{range}_K(\mathcal{A})$, it holds that
$$\|\mathcal{A}^{T}*\mathcal{X}\|_F^2 \ge \sigma_{\min}^{2}(\mathrm{bcirc}(\mathcal{A}))\,\|\mathcal{X}\|_F^2. \qquad (13)$$
Theorem 1 
([20]). Let $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$, $\mathcal{X}\in\mathbb{R}^{N_2\times K\times N_3}$ and $\mathcal{B}\in\mathbb{R}^{N_1\times K\times N_3}$. Then the minimum F-norm solution of a consistent system $\mathcal{A}*\mathcal{X}=\mathcal{B}$ is unique, and it is $\mathcal{A}^{\dagger}*\mathcal{B}$.
Theorem 2. 
Let $\mathcal{A}*\mathcal{X}=\mathcal{B}$ with $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$, $\mathcal{X}\in\mathbb{R}^{N_2\times K\times N_3}$ and $\mathcal{B}\in\mathbb{R}^{N_1\times K\times N_3}$ be consistent. Then there is only one solution to $\mathcal{A}*\mathcal{X}=\mathcal{B}$ in $\mathrm{range}_K(\mathcal{A}^{T})$, which is the minimum F-norm solution $\mathcal{A}^{\dagger}*\mathcal{B}$.
Proof. 
Suppose that $\mathcal{X}_1, \mathcal{X}_2 \in \mathrm{range}_K(\mathcal{A}^{T})$ are solutions of $\mathcal{A}*\mathcal{X}=\mathcal{B}$; then we have
$$\mathcal{A}*\mathcal{X}_1 = \mathcal{B}, \qquad \mathcal{A}*\mathcal{X}_2 = \mathcal{B}.$$
Note that the above equations can be written as
$$\mathcal{A}*(\mathcal{X}_1 - \mathcal{X}_2) = \mathcal{A}*\mathcal{X}_1 - \mathcal{A}*\mathcal{X}_2 = 0.$$
Therefore,
$$\mathcal{X}_1 - \mathcal{X}_2 \in \mathrm{null}_K(\mathcal{A}) = (\mathrm{range}_K(\mathcal{A}^{T}))^{\perp}.$$
Since $\mathcal{X}_1, \mathcal{X}_2 \in \mathrm{range}_K(\mathcal{A}^{T})$, we also know that
$$\mathcal{X}_1 - \mathcal{X}_2 \in \mathrm{range}_K(\mathcal{A}^{T}).$$
Thus,
$$\mathcal{X}_1 - \mathcal{X}_2 = 0, \quad \text{that is, } \mathcal{X}_1 = \mathcal{X}_2.$$
From Equation (7) and Definition 5, we obtain
$$\mathcal{A}^{\dagger}*\mathcal{B} = \mathcal{A}^{\dagger}*(\mathcal{A}*\mathcal{X}) = (\mathcal{A}^{\dagger}*\mathcal{A})^{T}*\mathcal{X} = \mathcal{A}^{T}*(\mathcal{A}^{\dagger})^{T}*\mathcal{X} \in \mathrm{range}_K(\mathcal{A}^{T}).$$
Finally, by Theorem 1, we obtain the desired result.    □

2.3. Randomized Average Block Kaczmarz Algorithm

For computing the least-norm solution of a large consistent linear system $Ax = b$ ($A\in\mathbb{R}^{m\times n}$, $x\in\mathbb{R}^{n}$ and $b\in\mathbb{R}^{m}$), Necoara [10] developed the randomized average block Kaczmarz (RABK) algorithm, which projects the current iterate onto each individual row of the chosen submatrix and then averages these projections with weights. The update formula of the RABK algorithm with a constant stepsize can be written as
$$x^{(k)} = x^{(k-1)} - \alpha \sum_{i\in J_{i_k}} \omega_i\,\frac{A_{i,:}x^{(k-1)} - b_i}{\|A_{i,:}\|_2^2}\,A_{i,:}^{T},$$
where the weights satisfy $0 < \omega_i < 1$ for all $i$ and sum to 1.
In each iteration, if we average the obtained projections with the weights $\omega_i = \frac{\|A_{i,:}\|_F^2}{\|A_{J_{i_k},:}\|_F^2}$, $i\in J_{i_k}$, we obtain Algorithm 1. Unlike the randomized block Kaczmarz algorithm [14], this algorithm avoids applying the pseudoinverse of a submatrix to a vector at each iterate and is therefore cheaper.
Algorithm 1 Randomized average block Kaczmarz (RABK) algorithm
Input: $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^{m}$, $x^{(0)}\in\mathbb{R}^{n}$, stepsize $\alpha > 0$, maximum number of iterations $M$ and partition of $[m]$: $\{J_i\}_{i=1}^{s}$
Output: last iterate $x^{(k)}$
1: for $k = 1, 2, \ldots, M$ do
2:   Pick $i_k\in[s]$ with probability $\|A_{J_{i_k},:}\|_F^2 / \|A\|_F^2$
3:   Set $x^{(k)} = x^{(k-1)} - \frac{\alpha}{\|A_{J_{i_k},:}\|_F^2}\,(A_{J_{i_k},:})^{T}(A_{J_{i_k},:}x^{(k-1)} - b_{J_{i_k}})$
4: end for
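As an illustration of the RABK update, here is a minimal NumPy sketch of Algorithm 1; it assumes the partition is supplied as a list of index arrays, and the function and variable names are ours, not from [10].

```python
import numpy as np

def rabk(A, b, partition, alpha, max_iter, seed=None):
    """Sketch of Algorithm 1 (RABK): sample a block J_i with probability
    ||A_{J_i,:}||_F^2 / ||A||_F^2 and apply the averaged row projection."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    block_norms = np.array([np.linalg.norm(A[J], 'fro') ** 2 for J in partition])
    probs = block_norms / block_norms.sum()   # ||A_{J_i,:}||_F^2 / ||A||_F^2
    for _ in range(max_iter):
        i = rng.choice(len(partition), p=probs)
        J = partition[i]
        residual = A[J] @ x - b[J]
        x -= (alpha / block_norms[i]) * (A[J].T @ residual)
    return x
```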

2.4. Randomized Regularized Kaczmarz Algorithm

Chen and Qin [17] proposed the randomized regularized Kaczmarz algorithm, based on the RK algorithm, for the consistent tensor recovery problem:
$$\mathcal{X}^{\star} = \arg\min_{\mathcal{X}\in\mathbb{R}^{N_2\times K\times N_3}} f(\mathcal{X}), \quad \text{s.t. } \mathcal{A}*\mathcal{X}=\mathcal{B},$$
where the objective function $f$ is strongly convex, $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$ and $\mathcal{B}\in\mathbb{R}^{N_1\times K\times N_3}$.
Let $f^{*}(\mathcal{X}) = \sup_{\mathcal{Z}\in\mathbb{R}^{N_2\times K\times N_3}}\{\langle\mathcal{X},\mathcal{Z}\rangle - f(\mathcal{Z})\}$ and $\nabla f^{*}(\mathcal{X})$ be the convex conjugate function of $f$ and the gradient of $f^{*}$ at $\mathcal{X}\in\mathbb{R}^{N_2\times K\times N_3}$, respectively. Then the randomized regularized Kaczmarz algorithm is obtained as follows (Algorithm 2).
Algorithm 2 Randomized Regularized Kaczmarz Algorithm
Input: $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$, $\mathcal{B}\in\mathbb{R}^{N_1\times K\times N_3}$, stepsize $\alpha > 0$ and maximum number of iterations $M$
Output: last iterate $\mathcal{X}^{(k)}$
1: Initialize: $\mathcal{Z}^{(0)}\in\mathrm{range}_K(\mathcal{A}^{T})$ and $\mathcal{X}^{(0)} = \nabla f^{*}(\mathcal{Z}^{(0)})$
2: for $k = 1, 2, \ldots, M$ do
3:   Pick $i_k\in[N_1]$ with probability $\|\mathcal{A}_{i_k,:,:}\|_F^2 / \|\mathcal{A}\|_F^2$
4:   Set $\mathcal{Z}^{(k)} = \mathcal{Z}^{(k-1)} - \frac{\alpha}{\|\mathcal{A}_{i_k,:,:}\|_F^2}\,(\mathcal{A}_{i_k,:,:})^{T} * (\mathcal{A}_{i_k,:,:} * \mathcal{X}^{(k-1)} - \mathcal{B}_{i_k,:,:})$
5:   $\mathcal{X}^{(k)} = \nabla f^{*}(\mathcal{Z}^{(k)})$
6: end for
For the least F-norm problem of the consistent tensor linear system (1), we have
$$f(\mathcal{X}) = \tfrac{1}{2}\|\mathcal{X}\|_F^2, \qquad f^{*}(\mathcal{X}) = \tfrac{1}{2}\|\mathcal{X}\|_F^2, \qquad \nabla f^{*}(\mathcal{X}) = \mathcal{X}.$$
In this case, Algorithm 2 becomes the tensor randomized Kaczmarz (TRK) algorithm for solving problem (1) with the update
$$\mathcal{X}^{(k)} = \mathcal{X}^{(k-1)} - \frac{\alpha}{\|\mathcal{A}_{i_k,:,:}\|_F^2}\,(\mathcal{A}_{i_k,:,:})^{T} * (\mathcal{A}_{i_k,:,:} * \mathcal{X}^{(k-1)} - \mathcal{B}_{i_k,:,:}), \qquad (14)$$
which is pseudoinverse-free and different from the algorithm proposed in [15]. The new algorithms provided next are based on the TRK update (14).
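To make the TRK update (14) concrete, the following is a minimal NumPy sketch of a single TRK step. It reuses the t_product helper sketched in Section 2.2 and implements the tensor transpose of Definition 3; both helper names are our own.

```python
import numpy as np

def t_transpose(A):
    """Tensor transpose (Definition 3): transpose every frontal slice and
    reverse the order of frontal slices 2, ..., N3."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def trk_step(A, B, X, i, alpha):
    """One TRK update (14) on the i-th horizontal slice of A
    (assumes the t_product helper from the sketch in Section 2.2)."""
    Ai = A[i:i + 1, :, :]                  # horizontal slice kept as a 1 x N2 x N3 tensor
    Bi = B[i:i + 1, :, :]
    residual = t_product(Ai, X) - Bi
    slice_norm2 = np.linalg.norm(Ai) ** 2  # ||A_{i,:,:}||_F^2
    return X - (alpha / slice_norm2) * t_product(t_transpose(Ai), residual)
```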

3. Tensor Randomized Average Kaczmarz Algorithm

In this section, similar to the RABK algorithm, the TRAK algorithm is designed to pursue the minimum F-norm solution of the consistent tensor linear system (1) by taking a combination of several TRK updates, i.e.,
$$\begin{aligned}
\mathcal{X}^{(k)} &= \mathcal{X}^{(k-1)} - \sum_{i\in J_{i_k}} \frac{\|\mathcal{A}_{i,:,:}\|_F^2}{\|\mathcal{A}_{J_{i_k},:,:}\|_F^2}\,\alpha\,(\mathcal{A}_{i,:,:})^{T} * \frac{\mathcal{A}_{i,:,:}*\mathcal{X}^{(k-1)} - \mathcal{B}_{i,:,:}}{\|\mathcal{A}_{i,:,:}\|_F^2} \\
&\overset{(5)}{=} \mathcal{X}^{(k-1)} - \sum_{i\in J_{i_k}} \alpha\,\frac{(\mathcal{A}^{T})_{:,i,:} * (\mathcal{A}_{i,:,:}*\mathcal{X}^{(k-1)} - \mathcal{B}_{i,:,:})}{\|\mathcal{A}_{J_{i_k},:,:}\|_F^2} \\
&\overset{(4)}{=} \mathcal{X}^{(k-1)} - \alpha\,\frac{(\mathcal{A}^{T})_{:,J_{i_k},:} * (\mathcal{A}_{J_{i_k},:,:}*\mathcal{X}^{(k-1)} - \mathcal{B}_{J_{i_k},:,:})}{\|\mathcal{A}_{J_{i_k},:,:}\|_F^2} \\
&\overset{(5)}{=} \mathcal{X}^{(k-1)} - \alpha\,\frac{(\mathcal{A}_{J_{i_k},:,:})^{T} * (\mathcal{A}_{J_{i_k},:,:}*\mathcal{X}^{(k-1)} - \mathcal{B}_{J_{i_k},:,:})}{\|\mathcal{A}_{J_{i_k},:,:}\|_F^2}.
\end{aligned} \qquad (15)$$
We give the method in Algorithm 3 and emphasize that it can be implemented on distributed computing units. If the number of partitions of $[N_1]$ is $N_1$, the TRAK algorithm reduces to the TRK algorithm.
Algorithm 3 Tensor randomized average Kaczmarz (TRAK) algorithm
Input: $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$, $\mathcal{X}^{(0)}\in\mathrm{range}_K(\mathcal{A}^{T})\subset\mathbb{R}^{N_2\times K\times N_3}$, $\mathcal{B}\in\mathbb{R}^{N_1\times K\times N_3}$, stepsize $\alpha > 0$, maximum number of iterations $M$ and partition of $[N_1]$: $\{J_i\}_{i=1}^{s}$
Output: last iterate $\mathcal{X}^{(k)}$
1: for $k = 1, 2, \ldots, M$ do
2:   Pick $i_k\in[s]$ with probability $\|\mathcal{A}_{J_{i_k},:,:}\|_F^2 / \|\mathcal{A}\|_F^2$
3:   $\mathcal{X}^{(k)} = \mathcal{X}^{(k-1)} - \frac{\alpha}{\|\mathcal{A}_{J_{i_k},:,:}\|_F^2}\,(\mathcal{A}_{J_{i_k},:,:})^{T} * (\mathcal{A}_{J_{i_k},:,:} * \mathcal{X}^{(k-1)} - \mathcal{B}_{J_{i_k},:,:})$
4: end for
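A minimal NumPy sketch of the TRAK iteration (Algorithm 3) is given below; it reuses the t_product and t_transpose helpers from the earlier sketches and assumes the partition is supplied as a list of index arrays.

```python
import numpy as np

def trak(A, B, partition, alpha, max_iter, seed=None):
    """Sketch of Algorithm 3 (TRAK): sample a block J_i with probability
    ||A_{J_i,:,:}||_F^2 / ||A||_F^2 and apply the averaged update (15)."""
    rng = np.random.default_rng(seed)
    X = np.zeros((A.shape[1], B.shape[1], A.shape[2]))   # X^(0) = 0 lies in range_K(A^T)
    block_norms = np.array([np.linalg.norm(A[J, :, :]) ** 2 for J in partition])
    probs = block_norms / block_norms.sum()
    for _ in range(max_iter):
        i = rng.choice(len(partition), p=probs)
        AJ, BJ = A[partition[i], :, :], B[partition[i], :, :]
        residual = t_product(AJ, X) - BJ
        X = X - (alpha / block_norms[i]) * t_product(t_transpose(AJ), residual)
    return X
```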
Remark 1. 
Since $\mathcal{X}^{(0)}\in\mathrm{range}_K(\mathcal{A}^{T})$ and
$$\begin{aligned}
\mathcal{X}^{(k)} &= \mathcal{X}^{(k-1)} - \frac{\alpha}{\|\mathcal{A}_{J_{i_k},:,:}\|_F^2}\,(\mathcal{A}_{J_{i_k},:,:})^{T} * (\mathcal{A}_{J_{i_k},:,:} * \mathcal{X}^{(k-1)} - \mathcal{B}_{J_{i_k},:,:}) \\
&\overset{(8)}{=} \mathcal{X}^{(k-1)} - \frac{\alpha}{\|\mathcal{A}_{J_{i_k},:,:}\|_F^2}\,(\mathcal{I}_{J_{i_k},:,:} * \mathcal{A})^{T} * (\mathcal{A}_{J_{i_k},:,:} * \mathcal{X}^{(k-1)} - \mathcal{B}_{J_{i_k},:,:}) \\
&\overset{(7)}{=} \mathcal{X}^{(k-1)} - \frac{\alpha}{\|\mathcal{A}_{J_{i_k},:,:}\|_F^2}\,\mathcal{A}^{T} * (\mathcal{I}_{J_{i_k},:,:})^{T} * (\mathcal{A}_{J_{i_k},:,:} * \mathcal{X}^{(k-1)} - \mathcal{B}_{J_{i_k},:,:}),
\end{aligned}$$
we obtain that $\{\mathcal{X}^{(k)}\}_{k=0}^{\infty} \subset \mathrm{range}_K(\mathcal{A}^{T})$ by induction. Therefore, if the iteration sequence generated by the TRAK algorithm converges to a solution $\mathcal{X}^{\star}$, then $\mathcal{X}^{\star}$ is the least F-norm solution $\mathcal{A}^{\dagger}*\mathcal{B}$ by Theorem 2.
Next, we analyze the convergence of the TRAK algorithm with a constant stepsize.
Theorem 3. 
Let the tensor linear system (1) be consistent. Assume that $\{J_i\}_{i=1}^{s}$ is a partition of $[N_1]$. Let $\xi = \max_{i\in[s]}\frac{\|\mathcal{A}_{J_i,:,:}\|_2^2}{\|\mathcal{A}_{J_i,:,:}\|_F^2}$ and $\alpha\in(0, \frac{2}{\xi})$. Then the iteration sequence $\{\mathcal{X}^{(k)}\}_{k=0}^{\infty}$ generated by the TRAK algorithm with $\mathcal{X}^{(0)}\in\mathrm{range}_K(\mathcal{A}^{T})$ converges in expectation to the unique least F-norm solution $\mathcal{X}^{\star} = \mathcal{A}^{\dagger}*\mathcal{B}$. Moreover, the expected norm of the solution error obeys
$$\mathbb{E}\,\|\mathcal{X}^{(k+1)} - \mathcal{X}^{\star}\|_F^2 \le \Big(1 - (2\alpha - \alpha^{2}\xi)\,\frac{\sigma_{\min}^{2}(\mathrm{bcirc}(\mathcal{A}))}{\|\mathcal{A}\|_F^2}\Big)^{k+1}\,\mathbb{E}\,\|\mathcal{X}^{(0)} - \mathcal{X}^{\star}\|_F^2.$$
Proof. 
Subtracting $\mathcal{X}^{\star}$ from both sides of the TRAK update given in Equation (15), we have
$$\mathcal{X}^{(k+1)} - \mathcal{X}^{\star} = \mathcal{X}^{(k)} - \mathcal{X}^{\star} - \frac{\alpha}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}\,(\mathcal{A}_{J_{i_{k+1}},:,:})^{T} * (\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{X}^{(k)} - \mathcal{B}_{J_{i_{k+1}},:,:}).$$
To simplify notation, write $\mathcal{E}^{(k)} = \mathcal{X}^{(k)} - \mathcal{X}^{\star}$ (so that $\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{X}^{(k)} - \mathcal{B}_{J_{i_{k+1}},:,:} = \mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)}$). Then,
$$\begin{aligned}
\|\mathcal{E}^{(k+1)}\|_F^2 &= \Big\|\mathcal{E}^{(k)} - \tfrac{\alpha}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}\,(\mathcal{A}_{J_{i_{k+1}},:,:})^{T} * (\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)})\Big\|_F^2 \\
&\overset{(11)}{=} \Big\langle \mathcal{E}^{(k)} - \tfrac{\alpha}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}\,(\mathcal{A}_{J_{i_{k+1}},:,:})^{T} * (\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)}),\ \mathcal{E}^{(k)} - \tfrac{\alpha}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}\,(\mathcal{A}_{J_{i_{k+1}},:,:})^{T} * (\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)})\Big\rangle \\
&= \|\mathcal{E}^{(k)}\|_F^2 - \tfrac{2\alpha}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}\,\big\langle \mathcal{E}^{(k)},\ (\mathcal{A}_{J_{i_{k+1}},:,:})^{T} * (\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)})\big\rangle + \tfrac{\alpha^{2}}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^4}\,\big\|(\mathcal{A}_{J_{i_{k+1}},:,:})^{T} * (\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)})\big\|_F^2 \\
&\overset{(9)}{=} \|\mathcal{E}^{(k)}\|_F^2 - \tfrac{2\alpha}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}\,\big\langle \mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)},\ \mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)}\big\rangle + \tfrac{\alpha^{2}}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^4}\,\big\|(\mathcal{A}_{J_{i_{k+1}},:,:})^{T} * (\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)})\big\|_F^2 \\
&\overset{(12)}{\le} \|\mathcal{E}^{(k)}\|_F^2 - \tfrac{2\alpha}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}\,\|\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)}\|_F^2 + \tfrac{\alpha^{2}}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^4}\,\|(\mathcal{A}_{J_{i_{k+1}},:,:})^{T}\|_2^2\,\|\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)}\|_F^2 \\
&\overset{(10),(6)}{=} \|\mathcal{E}^{(k)}\|_F^2 - \tfrac{2\alpha}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}\,\|\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)}\|_F^2 + \tfrac{\alpha^{2}\,\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_2^2}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}\cdot\tfrac{\|\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)}\|_F^2}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}.
\end{aligned} \qquad (16)$$
Since $\xi = \max_{i\in[s]}\frac{\|\mathcal{A}_{J_i,:,:}\|_2^2}{\|\mathcal{A}_{J_i,:,:}\|_F^2}$, we obtain
$$\|\mathcal{E}^{(k+1)}\|_F^2 \le \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi)\,\frac{\|\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)}\|_F^2}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}.$$
Taking the conditional expectation conditioned on $\mathcal{E}^{(k)}$, we have
$$\begin{aligned}
\mathbb{E}\big[\|\mathcal{E}^{(k+1)}\|_F^2 \,\big|\, \mathcal{E}^{(k)}\big] &\le \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi)\,\mathbb{E}\Big[\frac{\|\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)}\|_F^2}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}\,\Big|\,\mathcal{E}^{(k)}\Big] \\
&= \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi)\sum_{i_{k+1}\in[s]}\frac{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}{\|\mathcal{A}\|_F^2}\cdot\frac{\|\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)}\|_F^2}{\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2} \\
&= \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi)\sum_{i_{k+1}\in[s]}\frac{\|\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{E}^{(k)}\|_F^2}{\|\mathcal{A}\|_F^2} \\
&\overset{(3)}{=} \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi)\sum_{i_{k+1}\in[s]}\frac{\|(\mathcal{A} * \mathcal{E}^{(k)})_{J_{i_{k+1}},:,:}\|_F^2}{\|\mathcal{A}\|_F^2} \\
&= \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi)\,\frac{\|\mathcal{A} * \mathcal{E}^{(k)}\|_F^2}{\|\mathcal{A}\|_F^2}.
\end{aligned}$$
By Remark 1 and Theorem 2, we have $\mathcal{E}^{(k)} = \mathcal{X}^{(k)} - \mathcal{X}^{\star} = \mathcal{X}^{(k)} - \mathcal{A}^{\dagger}*\mathcal{B} \in \mathrm{range}_K(\mathcal{A}^{T})$.
Therefore,
$$\mathbb{E}\big[\|\mathcal{E}^{(k+1)}\|_F^2 \,\big|\, \mathcal{E}^{(k)}\big] \overset{(13)}{\le} \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi)\,\frac{\sigma_{\min}^{2}(\mathrm{bcirc}(\mathcal{A}))\,\|\mathcal{E}^{(k)}\|_F^2}{\|\mathcal{A}\|_F^2} = \Big(1 - (2\alpha - \alpha^{2}\xi)\,\frac{\sigma_{\min}^{2}(\mathrm{bcirc}(\mathcal{A}))}{\|\mathcal{A}\|_F^2}\Big)\|\mathcal{E}^{(k)}\|_F^2.$$
By the law of total expectation, we have
$$\mathbb{E}\big[\|\mathcal{E}^{(k+1)}\|_F^2\big] \le \Big(1 - (2\alpha - \alpha^{2}\xi)\,\frac{\sigma_{\min}^{2}(\mathrm{bcirc}(\mathcal{A}))}{\|\mathcal{A}\|_F^2}\Big)\,\mathbb{E}\big[\|\mathcal{E}^{(k)}\|_F^2\big].$$
Finally, unrolling the recurrence gives the desired result.    □
When $N_3 = 1$, the tensors $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times 1}$, $\mathcal{X}\in\mathbb{R}^{N_2\times K\times 1}$ and $\mathcal{B}\in\mathbb{R}^{N_1\times K\times 1}$ degenerate to an $N_1\times N_2$ matrix $A$, an $N_2\times K$ matrix $X$ and an $N_1\times K$ matrix $B$, respectively. Problem (1) then becomes the problem of finding the least F-norm solution of the consistent matrix linear system
$$AX = B, \qquad (17)$$
where $A\in\mathbb{R}^{N_1\times N_2}$, $X\in\mathbb{R}^{N_2\times K}$ and $B\in\mathbb{R}^{N_1\times K}$.
Then, Algorithm 3 becomes Algorithm 4.
Algorithm 4 Matrix randomized average Kaczmarz (MRAK) algorithm
Input: $A\in\mathbb{R}^{N_1\times N_2}$, $B\in\mathbb{R}^{N_1\times K}$, $X^{(0)}\in\mathrm{range}_K(A^{T})\subset\mathbb{R}^{N_2\times K}$, stepsize $\alpha > 0$, maximum number of iterations $M$ and partition of $[N_1]$: $\{J_i\}_{i=1}^{s}$
Output: last iterate $X^{(k)}$
1: for $k = 1, 2, \ldots, M$ do
2:   Pick $i_k\in[s]$ with probability $\|A_{J_{i_k},:}\|_F^2 / \|A\|_F^2$
3:   Set $X^{(k)} = X^{(k-1)} - \frac{\alpha}{\|A_{J_{i_k},:}\|_F^2}\,(A_{J_{i_k},:})^{T}(A_{J_{i_k},:}X^{(k-1)} - B_{J_{i_k},:})$
4: end for
In this setting, Theorem 3 reduces to the following result.
Corollary 1. 
Let the matrix linear system (17) be consistent. Assume that $\{J_i\}_{i=1}^{s}$ is a partition of $[N_1]$. Let $\xi = \max_{i\in[s]}\frac{\|A_{J_i,:}\|_2^2}{\|A_{J_i,:}\|_F^2}$ and $\alpha\in(0, \frac{2}{\xi})$. Then the iteration sequence $\{X^{(k)}\}_{k=0}^{\infty}$ generated by the MRAK algorithm with $X^{(0)}\in\mathrm{range}_K(A^{T})$ converges in expectation to the unique least F-norm solution $X^{\star} = A^{\dagger}B$. Moreover, the expected norm of the solution error obeys
$$\mathbb{E}\,\|X^{(k+1)} - X^{\star}\|_F^2 \le \Big(1 - (2\alpha - \alpha^{2}\xi)\,\frac{\sigma_{\min}^{2}(A)}{\|A\|_F^2}\Big)^{k+1}\,\mathbb{E}\,\|X^{(0)} - X^{\star}\|_F^2.$$

4. Tensor Randomized Average Kaczmarz Algorithm with Random Sampling (TRAKS)

In Algorithm 3, the partition is given before the TRAK algorithm starts to run, so the entries of each block are fixed throughout the iteration. However, the partition has a significant effect on the behavior of the TRAK method, as shown in [14]. Motivated by [21], we use a small portion of the horizontal slices of $\mathcal{A}$ to estimate the whole and propose the tensor randomized average Kaczmarz algorithm with random sampling (TRAKS).
In the TRAKS method, we first randomly sample from the population, which is assumed to follow a normal distribution $N(\mu, \sigma^{2})$. Next, in order to avoid an unreasonable sample that cannot estimate the whole well, we use the "Z" test [22] to evaluate each random sample. Precisely, let $\{\omega_1, \omega_2, \ldots, \omega_\beta\}$ be $\beta$ random samples from all horizontal slices of $\mathcal{A}$. Then the significance of the difference between the sample and the population can be judged by the Z-score:
$$Z = \frac{\bar{\omega} - \mu}{s/\sqrt{\beta}},$$
where $\mu = \frac{\sum_{i=1}^{N_1}\|\mathcal{A}_{i,:,:}\|_F^2}{N_1}$ is the population mean, $\bar{\omega} = \frac{\sum_{i=1}^{\beta}\|\omega_i\|_F^2}{\beta}$ is the sample mean and $s = \sqrt{\frac{\sum_{i=1}^{\beta}(\|\omega_i\|_F^2 - \bar{\omega})^{2}}{\beta}}$ is the sample standard deviation. If $|Z| < 1.96$, the probability of a significant difference is no more than $5\%$ and we accept the sample; otherwise, we resample from the population. The TRAKS method is listed in Algorithm 5.
Algorithm 5 Tensor randomized average Kaczmarz algorithm with random sampling (TRAKS)
Input: $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$, $\mathcal{X}^{(0)}\in\mathrm{range}_K(\mathcal{A}^{T})\subset\mathbb{R}^{N_2\times K\times N_3}$, $\mathcal{B}\in\mathbb{R}^{N_1\times K\times N_3}$, stepsize $\alpha > 0$, sample size $\beta$ and maximum number of iterations $M$
Output: last iterate $\mathcal{X}^{(k)}$
1: Compute the population mean $\mu = \frac{\sum_{i=1}^{N_1}\|\mathcal{A}_{i,:,:}\|_F^2}{N_1}$
2: for $k = 1, 2, \ldots, M$ do
3:   Randomly select $\beta$ horizontal slices of $\mathcal{A}$ as samples, $\mathcal{A}_{\tau_k,:,:}$
4:   Compute $\bar{\omega}_k = \frac{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}{\beta}$, $s_k = \sqrt{\frac{\sum_{i\in\tau_k}(\|\mathcal{A}_{i,:,:}\|_F^2 - \bar{\omega}_k)^{2}}{\beta}}$, $Z_k = \frac{\bar{\omega}_k - \mu}{s_k/\sqrt{\beta}}$
5:   while $|Z_k| \ge 1.96$ do
6:     Randomly select $\beta$ horizontal slices of $\mathcal{A}$ as samples, $\mathcal{A}_{\tau_k,:,:}$
7:     Recalculate $\bar{\omega}_k$, $s_k$ and $Z_k$ as in step 4
8:   end while
9:   $\mathcal{X}^{(k)} = \mathcal{X}^{(k-1)} - \frac{\alpha}{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}\,(\mathcal{A}_{\tau_k,:,:})^{T} * (\mathcal{A}_{\tau_k,:,:} * \mathcal{X}^{(k-1)} - \mathcal{B}_{\tau_k,:,:})$
10: end for
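The resampling loop in steps 3–8 can be written compactly as a helper that keeps drawing until the "Z" test is passed. Below is a minimal NumPy sketch of this sampling step (our own helper, using the two-sided 5% criterion |Z| < 1.96 as described above).

```python
import numpy as np

def sample_slices_with_z_test(A, beta, seed=None, z_crit=1.96):
    """Sketch of the sampling step of Algorithm 5 (TRAKS): draw beta horizontal
    slices of A and redraw until the Z-score of the sampled squared F-norms
    passes the 5% "Z" test against the population mean."""
    rng = np.random.default_rng(seed)
    N1 = A.shape[0]
    slice_norms = np.array([np.linalg.norm(A[i, :, :]) ** 2 for i in range(N1)])
    mu = slice_norms.mean()                  # population mean
    while True:
        tau = rng.choice(N1, size=beta, replace=False)
        omega_bar = slice_norms[tau].mean()  # sample mean
        s = np.sqrt(((slice_norms[tau] - omega_bar) ** 2).mean())
        z = (omega_bar - mu) / (s / np.sqrt(beta))
        if abs(z) < z_crit:                  # accept the sample
            return tau
```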
In order to prove the convergence of Algorithm 5, we first need the following two lemmas.
Lemma 2 
([23]). If $a = \{a_1, a_2, \ldots, a_n\}$ and $b = \{b_1, b_2, \ldots, b_n\}$ are two arrays of real numbers satisfying $a_j \ge 0$, $b_j > 0$, $j\in\{1, 2, \ldots, n\}$, then the following inequality holds:
$$\sum_{j=1}^{n}\frac{a_j}{b_j} \ge \frac{\sum_{j=1}^{n} a_j}{\sum_{j=1}^{n} b_j}. \qquad (18)$$
Lemma 3 
([22] (Chebyshev's law of large numbers)). Suppose that $\{z_k\}_{k=1}^{\infty}$ is a sequence of independent random variables with expectations $E(z_k)$ and variances $D(z_k)$. If there is a constant $C$ such that $D(z_k) \le C$, then for any small positive number $\epsilon$,
$$\lim_{n\to\infty} P\Big\{\Big|\frac{1}{n}\sum_{k=1}^{n} z_k - \frac{1}{n}\sum_{k=1}^{n} E(z_k)\Big| < \epsilon\Big\} = 1.$$
Lemma 3 indicates that if the sample size is large enough, the sample mean approaches the population mean.
Next, we discuss the convergence of the TRAKS method.
Theorem 4. 
Let the tensor linear system (1) be consistent. Assume that $\beta$ is the sample size and $\beta_1$ is the cardinality of the sample set accepted by the "Z" test. Let $\xi_1 = \max_{\tau_k\subset[N_1],\,|\tau_k|=\beta}\frac{\|\mathcal{A}_{\tau_k,:,:}\|_2^2}{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}$, $\alpha\in(0, \frac{2}{\xi_1})$ and $0\le\epsilon_k, \tilde{\epsilon}_k \ll 1$. Then the iteration sequence $\{\mathcal{X}^{(k)}\}_{k=0}^{\infty}$ generated by the TRAKS algorithm with $\mathcal{X}^{(0)}\in\mathrm{range}_K(\mathcal{A}^{T})$ converges in expectation to the unique least F-norm solution $\mathcal{X}^{\star} = \mathcal{A}^{\dagger}*\mathcal{B}$. Moreover, the expected norm of the solution error obeys
$$\mathbb{E}\big[\|\mathcal{X}^{(k+1)} - \mathcal{X}^{\star}\|_F^2\big] \le \Big(1 - (2\alpha - \alpha^{2}\xi_1)\,\frac{\sigma_{\min}^{2}(\mathrm{bcirc}(\mathcal{A}))(1\pm\epsilon_k)}{\beta_1\,\|\mathcal{A}\|_F^2\,(1\pm\tilde{\epsilon}_k)}\Big)\,\mathbb{E}\big[\|\mathcal{X}^{(k)} - \mathcal{X}^{\star}\|_F^2\big].$$
Proof. 
From the last line of (16) in the proof of Theorem 3, we have
$$\|\mathcal{E}^{(k+1)}\|_F^2 \le \|\mathcal{E}^{(k)}\|_F^2 - \frac{2\alpha}{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}\,\|\mathcal{A}_{\tau_k,:,:} * \mathcal{E}^{(k)}\|_F^2 + \frac{\alpha^{2}\,\|\mathcal{A}_{\tau_k,:,:}\|_2^2}{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}\cdot\frac{\|\mathcal{A}_{\tau_k,:,:} * \mathcal{E}^{(k)}\|_F^2}{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}.$$
Since $\xi_1 = \max_{\tau_k\subset[N_1],\,|\tau_k|=\beta}\frac{\|\mathcal{A}_{\tau_k,:,:}\|_2^2}{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}$, we obtain
$$\|\mathcal{E}^{(k+1)}\|_F^2 \le \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi_1)\,\frac{\|\mathcal{A}_{\tau_k,:,:} * \mathcal{E}^{(k)}\|_F^2}{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}.$$
Taking the conditional expectation conditioned on $\mathcal{E}^{(k)}$, we have
$$\mathbb{E}\big[\|\mathcal{E}^{(k+1)}\|_F^2 \,\big|\, \mathcal{E}^{(k)}\big] \le \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi_1)\,\mathbb{E}\Big[\frac{\|\mathcal{A}_{\tau_k,:,:} * \mathcal{E}^{(k)}\|_F^2}{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}\,\Big|\,\mathcal{E}^{(k)}\Big].$$
Assume that the sample set accepted by the "Z" test is $C$ with $|C| = \beta_1$. Then
$$\begin{aligned}
\mathbb{E}\big[\|\mathcal{E}^{(k+1)}\|_F^2 \,\big|\, \mathcal{E}^{(k)}\big] &\le \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi_1)\,\frac{1}{\beta_1}\sum_{\mathcal{A}_{\tau_k,:,:}\in C}\frac{\|\mathcal{A}_{\tau_k,:,:} * \mathcal{E}^{(k)}\|_F^2}{\|\mathcal{A}_{\tau_k,:,:}\|_F^2} \\
&\overset{(18)}{\le} \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi_1)\,\frac{1}{\beta_1}\,\frac{\sum_{\mathcal{A}_{\tau_k,:,:}\in C}\|\mathcal{A}_{\tau_k,:,:} * \mathcal{E}^{(k)}\|_F^2}{\sum_{\mathcal{A}_{\tau_k,:,:}\in C}\|\mathcal{A}_{\tau_k,:,:}\|_F^2} \\
&\overset{(3)}{=} \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi_1)\,\frac{1}{\beta_1}\,\frac{\sum_{\mathcal{A}_{\tau_k,:,:}\in C}\|(\mathcal{A} * \mathcal{E}^{(k)})_{\tau_k,:,:}\|_F^2}{\sum_{\mathcal{A}_{\tau_k,:,:}\in C}\|\mathcal{A}_{\tau_k,:,:}\|_F^2} \\
&= \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi_1)\,\frac{1}{\beta_1}\,\frac{\sum_{\mathcal{A}_{\tau_k,:,:}\in C}\frac{\|(\mathcal{A} * \mathcal{E}^{(k)})_{\tau_k,:,:}\|_F^2}{\beta}}{\sum_{\mathcal{A}_{\tau_k,:,:}\in C}\frac{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}{\beta}}.
\end{aligned} \qquad (19)$$
According to Lemma 3, when $N_1$ is large enough and $\beta$ is sufficiently large, there exist $0\le\epsilon_k, \tilde{\epsilon}_k \ll 1$ such that
$$\frac{\|(\mathcal{A} * \mathcal{E}^{(k)})_{\tau_k,:,:}\|_F^2}{\beta} = \frac{\|\mathcal{A} * \mathcal{E}^{(k)}\|_F^2}{N_1}\,(1\pm\epsilon_k) \qquad (20)$$
and
$$\frac{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}{\beta} = \frac{\|\mathcal{A}\|_F^2}{N_1}\,(1\pm\tilde{\epsilon}_k). \qquad (21)$$
Combining (19), (20) and (21), we have
$$\mathbb{E}\big[\|\mathcal{E}^{(k+1)}\|_F^2 \,\big|\, \mathcal{E}^{(k)}\big] \le \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi_1)\,\frac{1}{\beta_1}\,\frac{\|\mathcal{A} * \mathcal{E}^{(k)}\|_F^2\,(1\pm\epsilon_k)}{\|\mathcal{A}\|_F^2\,(1\pm\tilde{\epsilon}_k)}.$$
By Remark 1 and Theorem 2, we have $\mathcal{E}^{(k)} = \mathcal{X}^{(k)} - \mathcal{X}^{\star} = \mathcal{X}^{(k)} - \mathcal{A}^{\dagger}*\mathcal{B} \in \mathrm{range}_K(\mathcal{A}^{T})$.
Therefore,
$$\mathbb{E}\big[\|\mathcal{E}^{(k+1)}\|_F^2 \,\big|\, \mathcal{E}^{(k)}\big] \overset{(13)}{\le} \|\mathcal{E}^{(k)}\|_F^2 - (2\alpha - \alpha^{2}\xi_1)\,\frac{1}{\beta_1}\,\frac{\sigma_{\min}^{2}(\mathrm{bcirc}(\mathcal{A}))\,\|\mathcal{E}^{(k)}\|_F^2\,(1\pm\epsilon_k)}{\|\mathcal{A}\|_F^2\,(1\pm\tilde{\epsilon}_k)} = \Big(1 - (2\alpha - \alpha^{2}\xi_1)\,\frac{\sigma_{\min}^{2}(\mathrm{bcirc}(\mathcal{A}))(1\pm\epsilon_k)}{\beta_1\,\|\mathcal{A}\|_F^2\,(1\pm\tilde{\epsilon}_k)}\Big)\|\mathcal{E}^{(k)}\|_F^2.$$
By the law of total expectation, we have
$$\mathbb{E}\big[\|\mathcal{E}^{(k+1)}\|_F^2\big] \le \Big(1 - (2\alpha - \alpha^{2}\xi_1)\,\frac{\sigma_{\min}^{2}(\mathrm{bcirc}(\mathcal{A}))(1\pm\epsilon_k)}{\beta_1\,\|\mathcal{A}\|_F^2\,(1\pm\tilde{\epsilon}_k)}\Big)\,\mathbb{E}\big[\|\mathcal{E}^{(k)}\|_F^2\big].$$
   □
For solving (17), Algorithm 5 becomes Algorithm 6.
Algorithm 6 Matrix randomized average Kaczmarz algorithm with random sampling (MRAKS)
Input: $A\in\mathbb{R}^{N_1\times N_2}$, $X^{(0)}\in\mathrm{range}_K(A^{T})\subset\mathbb{R}^{N_2\times K}$, $B\in\mathbb{R}^{N_1\times K}$, stepsize $\alpha > 0$, sample size $\beta$ and maximum number of iterations $M$
Output: last iterate $X^{(k)}$
1: Compute the population mean $\mu = \frac{\sum_{i=1}^{N_1}\|A_{i,:}\|_F^2}{N_1}$
2: for $k = 1, 2, \ldots, M$ do
3:   Randomly select $\beta$ rows of $A$ as samples, $A_{\tau_k,:}$
4:   Compute $\bar{\omega}_k = \frac{\|A_{\tau_k,:}\|_F^2}{\beta}$, $s_k = \sqrt{\frac{\sum_{i\in\tau_k}(\|A_{i,:}\|_F^2 - \bar{\omega}_k)^{2}}{\beta}}$, $Z_k = \frac{\bar{\omega}_k - \mu}{s_k/\sqrt{\beta}}$
5:   while $|Z_k| \ge 1.96$ do
6:     Randomly select $\beta$ rows of $A$ as samples, $A_{\tau_k,:}$
7:     Recalculate $\bar{\omega}_k$, $s_k$ and $Z_k$ as in step 4
8:   end while
9:   $X^{(k)} = X^{(k-1)} - \frac{\alpha}{\|A_{\tau_k,:}\|_F^2}\,(A_{\tau_k,:})^{T}(A_{\tau_k,:}X^{(k-1)} - B_{\tau_k,:})$
10: end for
Theorem 4 reduces to the following result.
Corollary 2. 
Let the matrix linear system (17) be consistent. Assume that $\beta$ is the sample size and $\beta_1$ is the cardinality of the sample set accepted by the "Z" test. Let $\xi_1 = \max_{\tau_k\subset[N_1],\,|\tau_k|=\beta}\frac{\|A_{\tau_k,:}\|_2^2}{\|A_{\tau_k,:}\|_F^2}$, $\alpha\in(0, \frac{2}{\xi_1})$ and $0\le\epsilon_k, \tilde{\epsilon}_k \ll 1$. Then the iteration sequence $\{X^{(k)}\}_{k=0}^{\infty}$ generated by the MRAKS algorithm with $X^{(0)}\in\mathrm{range}_K(A^{T})$ converges in expectation to the unique least F-norm solution $X^{\star} = A^{\dagger}B$. Moreover, the expected norm of the solution error obeys
$$\mathbb{E}\big[\|X^{(k+1)} - X^{\star}\|_F^2\big] \le \Big(1 - (2\alpha - \alpha^{2}\xi_1)\,\frac{\sigma_{\min}^{2}(A)(1\pm\epsilon_k)}{\beta_1\,\|A\|_F^2\,(1\pm\tilde{\epsilon}_k)}\Big)\,\mathbb{E}\big[\|X^{(k)} - X^{\star}\|_F^2\big].$$

5. The Fourier Version of the Algorithms

We first introduce two additional notations that are used throughout this section. For $\mathcal{X}\in\mathbb{R}^{N_1\times N_2\times N_3}$, $\mathrm{fft}(\mathcal{X},[\,],3)$ denotes the mode-3 fast Fourier transform of $\mathcal{X}$, which can also be written as $\mathrm{fft}(\mathcal{X},[\,],3) = \mathrm{fold}\big((F_{N_3}\otimes I_{N_1})\,\mathrm{unfold}(\mathcal{X})\big)$ ($F_{N_3}$ is defined in Definition 1). Similarly, $\mathrm{ifft}(\mathcal{X},[\,],3)$ denotes the mode-3 inverse fast Fourier transform of $\mathcal{X}$ and $\mathrm{ifft}(\mathcal{X},[\,],3) = \mathrm{fold}\big(\frac{F_{N_3}^{H}\otimes I_{N_1}}{N_3}\,\mathrm{unfold}(\mathcal{X})\big)$. By Definition 1, it is easy to see that $\mathrm{ifft}(\mathrm{fft}(\mathcal{X},[\,],3),[\,],3) = \mathrm{fft}(\mathrm{ifft}(\mathcal{X},[\,],3),[\,],3) = \mathcal{X}$. Furthermore, for convenience, we recall the tensor operator "bdiag". For $\mathcal{X}\in\mathbb{R}^{N_1\times N_2\times N_3}$, $\mathrm{bdiag}(\mathcal{X})$ is the block diagonal matrix formed by the frontal slices of $\mathcal{X}$, i.e.,
$$\mathrm{bdiag}(\mathcal{X}) = \begin{bmatrix} X_1 & & & \\ & X_2 & & \\ & & \ddots & \\ & & & X_{N_3} \end{bmatrix}.$$
By Lemma 1, the tensor linear system $\mathcal{A}*\mathcal{X}=\mathcal{B}$ ($\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$, $\mathcal{X}\in\mathbb{R}^{N_2\times K\times N_3}$ and $\mathcal{B}\in\mathbb{R}^{N_1\times K\times N_3}$) can be reformulated as
$$\begin{bmatrix} (\hat{\mathcal{A}})_1 & & & \\ & (\hat{\mathcal{A}})_2 & & \\ & & \ddots & \\ & & & (\hat{\mathcal{A}})_{N_3} \end{bmatrix} \begin{bmatrix} (\hat{\mathcal{X}})_1 \\ (\hat{\mathcal{X}})_2 \\ \vdots \\ (\hat{\mathcal{X}})_{N_3} \end{bmatrix} = \begin{bmatrix} (\hat{\mathcal{B}})_1 \\ (\hat{\mathcal{B}})_2 \\ \vdots \\ (\hat{\mathcal{B}})_{N_3} \end{bmatrix}, \qquad (22)$$
where $(\hat{\mathcal{A}})_k$, $(\hat{\mathcal{X}})_k$ and $(\hat{\mathcal{B}})_k$ are the frontal slices of $\hat{\mathcal{A}} = \mathrm{fold}\big((F_{N_3}\otimes I_{N_1})\,\mathrm{unfold}(\mathcal{A})\big)$, $\hat{\mathcal{X}} = \mathrm{fold}\big((F_{N_3}\otimes I_{N_2})\,\mathrm{unfold}(\mathcal{X})\big)$ and $\hat{\mathcal{B}} = \mathrm{fold}\big((F_{N_3}\otimes I_{N_1})\,\mathrm{unfold}(\mathcal{B})\big)$, respectively ($F_{N_3}$ is the $N_3\times N_3$ DFT matrix defined in Definition 1).
Before we present the algorithms in the Fourier domain, we provide a key theorem.
Theorem 5. 
Let $J_i\subset[N_1]$ ($i = 1, 2, \ldots$) and $\tau_i = \{(t-1)N_1 + J_i \mid t = 1, 2, \ldots, N_3\}\subset[N_1 N_3]$ ($i = 1, 2, \ldots$) be a set of horizontal-slice indices of $\mathcal{A}$ and the corresponding set of row indices of $\mathrm{bdiag}(\hat{\mathcal{A}})$, respectively. Assume that the sequence $\{\mathrm{unfold}(\hat{\mathcal{X}}^{(k)})\}_{k=0}^{\infty}$ is generated by
$$\mathrm{unfold}(\hat{\mathcal{X}}^{(k)}) = \mathrm{unfold}(\hat{\mathcal{X}}^{(k-1)}) - \frac{\alpha}{\|(\mathrm{bdiag}(\hat{\mathcal{A}}))_{\tau_k,:}\|_F^2}\,\big((\mathrm{bdiag}(\hat{\mathcal{A}}))_{\tau_k,:}\big)^{H}\big((\mathrm{bdiag}(\hat{\mathcal{A}}))_{\tau_k,:}\,\mathrm{unfold}(\hat{\mathcal{X}}^{(k-1)}) - (\mathrm{unfold}(\hat{\mathcal{B}}))_{\tau_k,:}\big)$$
with $\mathcal{X}^{(0)}\in\mathrm{range}_K(\mathcal{A}^{T})\subset\mathbb{R}^{N_2\times K\times N_3}$ and $\mathrm{unfold}(\hat{\mathcal{X}}^{(0)}) = \mathrm{unfold}(\mathrm{fft}(\mathcal{X}^{(0)},[\,],3))$ for solving Equation (22). Then, for the tensor linear system (1), the iteration scheme of the TRAK and TRAKS algorithms in the Fourier domain is
$$(\hat{\mathcal{X}})_t^{(k+1)} = (\hat{\mathcal{X}})_t^{(k)} - \frac{\alpha}{\|(\hat{\mathcal{A}})_{J_{i_{k+1}},:,:}\|_F^2}\,\big((\hat{\mathcal{A}})_{J_{i_{k+1}},:,t}\big)^{H}\big((\hat{\mathcal{A}})_{J_{i_{k+1}},:,t}\,(\hat{\mathcal{X}})_t^{(k)} - (\hat{\mathcal{B}})_{J_{i_{k+1}},:,t}\big), \qquad (23)$$
for $t = 1, 2, \ldots, N_3$. Moreover, it holds that
$$\mathrm{ifft}(\hat{\mathcal{X}}^{(k)},[\,],3) \in \mathbb{R}^{N_2\times K\times N_3}, \quad \text{for } k\in[0, \infty).$$
Proof. 
$$\begin{aligned}
\mathrm{unfold}(\hat{\mathcal{X}}^{(k+1)}) &= \mathrm{unfold}(\hat{\mathcal{X}}^{(k)}) - \frac{\alpha}{\|(\mathrm{bdiag}(\hat{\mathcal{A}}))_{\tau_{i_{k+1}},:}\|_F^2}\,\big((\mathrm{bdiag}(\hat{\mathcal{A}}))_{\tau_{i_{k+1}},:}\big)^{H}\big((\mathrm{bdiag}(\hat{\mathcal{A}}))_{\tau_{i_{k+1}},:}\,\mathrm{unfold}(\hat{\mathcal{X}}^{(k)}) - (\mathrm{unfold}(\hat{\mathcal{B}}))_{\tau_{i_{k+1}},:}\big) \\
&= \mathrm{unfold}(\hat{\mathcal{X}}^{(k)}) - \frac{\alpha}{\|(\hat{\mathcal{A}})_{J_{i_{k+1}},:,:}\|_F^2}\,\big(\mathrm{bdiag}((\hat{\mathcal{A}})_{J_{i_{k+1}},:,:})\big)^{H}\big(\mathrm{bdiag}((\hat{\mathcal{A}})_{J_{i_{k+1}},:,:})\,\mathrm{unfold}(\hat{\mathcal{X}}^{(k)}) - \mathrm{unfold}((\hat{\mathcal{B}})_{J_{i_{k+1}},:,:})\big),
\end{aligned} \qquad (24)$$
where the second equality follows from $(\mathrm{bdiag}(\hat{\mathcal{A}}))_{\tau_{i_{k+1}},:} = \mathrm{bdiag}((\hat{\mathcal{A}})_{J_{i_{k+1}},:,:})$,
$\|\mathrm{bdiag}((\hat{\mathcal{A}})_{J_{i_{k+1}},:,:})\|_F^2 = \|(\hat{\mathcal{A}})_{J_{i_{k+1}},:,:}\|_F^2$ and $(\mathrm{unfold}(\hat{\mathcal{B}}))_{\tau_{i_{k+1}},:} = \mathrm{unfold}((\hat{\mathcal{B}})_{J_{i_{k+1}},:,:})$.
With the use of the block-diagonal structure, Equation (24) can be reformulated as
$$(\hat{\mathcal{X}})_t^{(k+1)} = (\hat{\mathcal{X}})_t^{(k)} - \frac{\alpha}{\|(\hat{\mathcal{A}})_{J_{i_{k+1}},:,:}\|_F^2}\,\big((\hat{\mathcal{A}})_{J_{i_{k+1}},:,t}\big)^{H}\big((\hat{\mathcal{A}})_{J_{i_{k+1}},:,t}\,(\hat{\mathcal{X}})_t^{(k)} - (\hat{\mathcal{B}})_{J_{i_{k+1}},:,t}\big),$$
for $t = 1, 2, \ldots, N_3$.
Since $\mathrm{unfold}(\hat{\mathcal{A}}) = (F_{N_3}\otimes I_{N_1})\,\mathrm{unfold}(\mathcal{A})$ and $\mathrm{unfold}(\hat{\mathcal{B}}) = (F_{N_3}\otimes I_{N_1})\,\mathrm{unfold}(\mathcal{B})$, we have
$$(\hat{\mathcal{A}})_{J_{i_{k+1}},:,t} = \big(\widehat{\mathcal{A}_{J_{i_{k+1}},:,:}}\big)_t, \quad \text{for } t\in[N_3], \qquad (25)$$
and
$$\mathrm{unfold}((\hat{\mathcal{B}})_{J_{i_{k+1}},:,:}) = (F_{N_3}\otimes I_{|J_{i_{k+1}}|})\,\mathrm{unfold}(\mathcal{B}_{J_{i_{k+1}},:,:}), \qquad (26)$$
where $|J_{i_{k+1}}|$ is the cardinality of $J_{i_{k+1}}$.
Therefore, it holds that
$$\mathrm{bdiag}((\hat{\mathcal{A}})_{J_{i_{k+1}},:,:}) = \mathrm{bdiag}\big(\widehat{\mathcal{A}_{J_{i_{k+1}},:,:}}\big) \overset{(2)}{=} (F_{N_3}\otimes I_{|J_{i_{k+1}}|})\,\mathrm{bcirc}(\mathcal{A}_{J_{i_{k+1}},:,:})\,\frac{F_{N_3}^{H}\otimes I_{N_2}}{N_3}.$$
Combining (24), (25) and (26), we have
$$\mathrm{unfold}(\hat{\mathcal{X}}^{(k+1)}) = \mathrm{unfold}(\hat{\mathcal{X}}^{(k)}) - \frac{\alpha}{\|(\hat{\mathcal{A}})_{J_{i_{k+1}},:,:}\|_F^2}\,(F_{N_3}\otimes I_{N_2})\,\big(\mathrm{bcirc}(\mathcal{A}_{J_{i_{k+1}},:,:})\big)^{T}\Big(\mathrm{bcirc}(\mathcal{A}_{J_{i_{k+1}},:,:})\,\frac{F_{N_3}^{H}\otimes I_{N_2}}{N_3}\,\mathrm{unfold}(\hat{\mathcal{X}}^{(k)}) - \mathrm{unfold}(\mathcal{B}_{J_{i_{k+1}},:,:})\Big). \qquad (27)$$
Since $\mathrm{unfold}(\hat{\mathcal{X}}^{(0)}) = (F_{N_3}\otimes I_{N_2})\,\mathrm{unfold}(\mathcal{X}^{(0)})$ and $\mathcal{X}^{(0)}\in\mathbb{R}^{N_2\times K\times N_3}$, the recurrence formula (27) yields
$$\mathrm{ifft}(\hat{\mathcal{X}}^{(k)},[\,],3) = \mathrm{fold}\Big(\frac{F_{N_3}^{H}\otimes I_{N_2}}{N_3}\,\mathrm{unfold}(\hat{\mathcal{X}}^{(k)})\Big) \in \mathbb{R}^{N_2\times K\times N_3}.$$
This completes the proof.    □
By the above analysis, we propose the Fourier version of the TRAK and TRAKS algorithms, i.e., Algorithms 7 and 8.
Algorithm 7 TRAK algorithm in the Fourier domain ( TRAK F )
Input: $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$, $\mathcal{X}^{(0)}\in\mathrm{range}_K(\mathcal{A}^{T})\subset\mathbb{R}^{N_2\times K\times N_3}$, $\mathcal{B}\in\mathbb{R}^{N_1\times K\times N_3}$, stepsize $\alpha > 0$, maximum number of iterations $M$ and partition of $[N_1]$: $\{J_i\}_{i=1}^{s}$
Output: last iterate $\mathcal{X}^{(k)}$
1: $\hat{\mathcal{A}} \leftarrow \mathrm{fft}(\mathcal{A},[\,],3)$, $\hat{\mathcal{X}}^{(0)} \leftarrow \mathrm{fft}(\mathcal{X}^{(0)},[\,],3)$, $\hat{\mathcal{B}} \leftarrow \mathrm{fft}(\mathcal{B},[\,],3)$
2: for $k = 1, 2, \ldots, M$ do
3:   Pick $i_k\in[s]$ with probability $\|\hat{\mathcal{A}}_{J_{i_k},:,:}\|_F^2 / \|\hat{\mathcal{A}}\|_F^2$
4:   for $t = 1, 2, \ldots, N_3$ do
5:     $(\hat{\mathcal{X}})_t^{(k)} = (\hat{\mathcal{X}})_t^{(k-1)} - \frac{\alpha}{\|(\hat{\mathcal{A}})_{J_{i_k},:,:}\|_F^2}\,\big((\hat{\mathcal{A}})_{J_{i_k},:,t}\big)^{H}\big((\hat{\mathcal{A}})_{J_{i_k},:,t}\,(\hat{\mathcal{X}})_t^{(k-1)} - (\hat{\mathcal{B}})_{J_{i_k},:,t}\big)$
6:   end for
7: end for
8: $\mathcal{X}^{(k)} \leftarrow \mathrm{ifft}(\hat{\mathcal{X}}^{(k)},[\,],3)$
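For illustration, the following is a minimal NumPy sketch of Algorithm 7 (TRAK_F): the data are moved to the Fourier domain once with an FFT along the third dimension, the slice-wise update (23) is applied (the frontal slices are independent and could be processed in parallel), and the result is transformed back at the end. Function names and the partition format are our own choices, not the authors' code.

```python
import numpy as np

def trak_fourier(A, B, partition, alpha, max_iter, seed=None):
    """Sketch of Algorithm 7 (TRAK_F) using the slice-wise update (23)."""
    rng = np.random.default_rng(seed)
    N1, N2, N3 = A.shape
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    X_hat = np.zeros((N2, B.shape[1], N3), dtype=complex)
    block_norms = np.array([np.linalg.norm(A_hat[J, :, :]) ** 2 for J in partition])
    probs = block_norms / block_norms.sum()
    for _ in range(max_iter):
        i = rng.choice(len(partition), p=probs)
        J = partition[i]
        for t in range(N3):                      # independent frontal slices
            At = A_hat[J, :, t]
            residual = At @ X_hat[:, :, t] - B_hat[J, :, t]
            X_hat[:, :, t] -= (alpha / block_norms[i]) * (At.conj().T @ residual)
    return np.real(np.fft.ifft(X_hat, axis=2))   # real by Theorem 5 (up to round-off)
```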
Algorithm 8 TRAKS algorithm in the Fourier domain ( TRAKS F )
Input: $\mathcal{A}\in\mathbb{R}^{N_1\times N_2\times N_3}$, $\mathcal{X}^{(0)}\in\mathrm{range}_K(\mathcal{A}^{T})\subset\mathbb{R}^{N_2\times K\times N_3}$, $\mathcal{B}\in\mathbb{R}^{N_1\times K\times N_3}$, stepsize $\alpha > 0$, sample size $\beta$ and maximum number of iterations $M$
Output: last iterate $\mathcal{X}^{(k)}$
1: $\hat{\mathcal{A}} \leftarrow \mathrm{fft}(\mathcal{A},[\,],3)$, $\hat{\mathcal{X}}^{(0)} \leftarrow \mathrm{fft}(\mathcal{X}^{(0)},[\,],3)$, $\hat{\mathcal{B}} \leftarrow \mathrm{fft}(\mathcal{B},[\,],3)$
2: Compute the population mean $\mu = \frac{\sum_{i=1}^{N_1}\|\hat{\mathcal{A}}_{i,:,:}\|_F^2}{N_1}$
3: for $k = 1, 2, \ldots, M$ do
4:   Randomly select $\beta$ horizontal slices of $\hat{\mathcal{A}}$ as samples, $\hat{\mathcal{A}}_{\tau_k,:,:}$
5:   Compute $\bar{\omega}_k = \frac{\|\hat{\mathcal{A}}_{\tau_k,:,:}\|_F^2}{\beta}$, $s_k = \sqrt{\frac{\sum_{i\in\tau_k}(\|\hat{\mathcal{A}}_{i,:,:}\|_F^2 - \bar{\omega}_k)^{2}}{\beta}}$, $Z_k = \frac{\bar{\omega}_k - \mu}{s_k/\sqrt{\beta}}$
6:   while $|Z_k| \ge 1.96$ do
7:     Randomly select $\beta$ horizontal slices of $\hat{\mathcal{A}}$ as samples, $\hat{\mathcal{A}}_{\tau_k,:,:}$
8:     Recalculate $\bar{\omega}_k$, $s_k$ and $Z_k$ as in step 5
9:   end while
10:   for $t = 1, 2, \ldots, N_3$ do
11:     $(\hat{\mathcal{X}})_t^{(k)} = (\hat{\mathcal{X}})_t^{(k-1)} - \frac{\alpha}{\|(\hat{\mathcal{A}})_{\tau_k,:,:}\|_F^2}\,\big((\hat{\mathcal{A}})_{\tau_k,:,t}\big)^{H}\big((\hat{\mathcal{A}})_{\tau_k,:,t}\,(\hat{\mathcal{X}})_t^{(k-1)} - (\hat{\mathcal{B}})_{\tau_k,:,t}\big)$
12:   end for
13: end for
14: $\mathcal{X}^{(k)} \leftarrow \mathrm{ifft}(\hat{\mathcal{X}}^{(k)},[\,],3)$
Remark 2. 
Theorem 5 shows that the sequence generated by the iteration scheme (23) can be transformed back to real space by applying a mode-3 inverse fast Fourier transform, which guarantees that Algorithms 7 and 8 work well for the real-valued tensor linear system (1).
By multiplying both sides of (27) by the IDFT matrix (Definition 1) and folding, we obtain
$$\mathcal{X}^{(k+1)} = \mathcal{X}^{(k)} - \frac{\alpha}{\|(\hat{\mathcal{A}})_{J_{i_{k+1}},:,:}\|_F^2}\,(\mathcal{A}_{J_{i_{k+1}},:,:})^{T} * (\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{X}^{(k)} - \mathcal{B}_{J_{i_{k+1}},:,:}) = \mathcal{X}^{(k)} - \frac{\alpha}{N_3\,\|\mathcal{A}_{J_{i_{k+1}},:,:}\|_F^2}\,(\mathcal{A}_{J_{i_{k+1}},:,:})^{T} * (\mathcal{A}_{J_{i_{k+1}},:,:} * \mathcal{X}^{(k)} - \mathcal{B}_{J_{i_{k+1}},:,:}). \qquad (28)$$
If we select $\alpha_{\mathrm{NoFourier}} = \frac{\alpha_{\mathrm{Fourier}}}{N_3}$, where $\alpha_{\mathrm{NoFourier}}$ and $\alpha_{\mathrm{Fourier}}$ are the stepsizes of the algorithms outside and inside the Fourier domain, respectively, Equation (28) coincides with the update (15) of the algorithms that are not in the Fourier domain. However, the TRAK$_F$ and TRAKS$_F$ algorithms make use of the block-diagonal structure, so they can be implemented more efficiently than the TRAK and TRAKS algorithms; moreover, they parallelize well.
Taking advantage of the relationship between the TRAK and TRAK F updates and the TRAKS and TRAKS F updates, the convergence guarantees of the TRAK F and TRAKS F algorithms are generated as follows.
Theorem 6. 
Assume that the tensor linear system (1) is consistent and that $\{J_i\}_{i=1}^{s}$ is a partition of $[N_1]$. Let $\xi = \max_{i\in[s]}\frac{\|\mathcal{A}_{J_i,:,:}\|_2^2}{\|\mathcal{A}_{J_i,:,:}\|_F^2}$ and $\alpha\in(0, \frac{2N_3}{\xi})$. Then the iteration sequence $\{\mathcal{X}^{(k)}\}_{k=0}^{\infty}$ generated by the TRAK$_F$ algorithm with $\mathcal{X}^{(0)}\in\mathrm{range}_K(\mathcal{A}^{T})$ converges in expectation to the unique least F-norm solution $\mathcal{X}^{\star} = \mathcal{A}^{\dagger}*\mathcal{B}$. Moreover, the expected norm of the solution error obeys
$$\mathbb{E}\,\|\mathcal{X}^{(k+1)} - \mathcal{X}^{\star}\|_F^2 \le \Big(1 - (2\alpha N_3 - \alpha^{2}\xi)\,\frac{\sigma_{\min}^{2}(\mathrm{bcirc}(\mathcal{A}))}{N_3^{2}\,\|\mathcal{A}\|_F^2}\Big)^{k+1}\,\mathbb{E}\,\|\mathcal{X}^{(0)} - \mathcal{X}^{\star}\|_F^2.$$
Theorem 7. 
Let the tensor linear system (1) be consistent. Assume that $\beta$ is the sample size and $\beta_1$ is the cardinality of the sample set accepted by the "Z" test. Let $\xi_1 = \max_{\tau_k\subset[N_1],\,|\tau_k|=\beta}\frac{\|\mathcal{A}_{\tau_k,:,:}\|_2^2}{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}$, $\alpha\in(0, \frac{2N_3}{\xi_1})$ and $0\le\epsilon_k, \tilde{\epsilon}_k \ll 1$. Then the iteration sequence $\{\mathcal{X}^{(k)}\}_{k=0}^{\infty}$ generated by the TRAKS$_F$ algorithm with $\mathcal{X}^{(0)}\in\mathrm{range}_K(\mathcal{A}^{T})$ converges in expectation to the unique least F-norm solution $\mathcal{X}^{\star} = \mathcal{A}^{\dagger}*\mathcal{B}$. Moreover, the expected norm of the solution error obeys
$$\mathbb{E}\big[\|\mathcal{X}^{(k+1)} - \mathcal{X}^{\star}\|_F^2\big] \le \Big(1 - (2\alpha N_3 - \alpha^{2}\xi_1)\,\frac{\sigma_{\min}^{2}(\mathrm{bcirc}(\mathcal{A}))(1\pm\epsilon_k)}{N_3^{2}\,\beta_1\,\|\mathcal{A}\|_F^2\,(1\pm\tilde{\epsilon}_k)}\Big)\,\mathbb{E}\big[\|\mathcal{X}^{(k)} - \mathcal{X}^{\star}\|_F^2\big].$$

6. Numerical Experiments

In this section, we compare the performance of the TRK, TRAK, TRAKS, TRAK$_F$ and TRAKS$_F$ algorithms on some numerical examples. The tensor t-product toolbox [24] is used in our experiments. The stopping criterion is
$$\mathrm{RSE} = \frac{\|\mathcal{X}^{(k)} - \mathcal{A}^{\dagger}*\mathcal{B}\|_F^2}{\|\mathcal{A}^{\dagger}*\mathcal{B}\|_F^2} = \frac{\|\hat{\mathcal{X}}^{(k)} - \widehat{\mathcal{A}^{\dagger}*\mathcal{B}}\|_F^2}{\|\widehat{\mathcal{A}^{\dagger}*\mathcal{B}}\|_F^2} < \text{tolerance},$$
or the number of iteration steps exceeds $10^{5}$. In our implementations, all computations start from $\mathcal{X}^{(0)} = 0$. IT and CPU(s) denote the medians of the required iteration steps and of the elapsed CPU times over 10 repeated runs of the corresponding method, respectively. In the following experiments, we use the structural similarity index (SSIM) between two images $X$ and $Y$ to evaluate the quality of the recovered images, which is defined as
$$\mathrm{SSIM} = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^{2} + \mu_y^{2} + C_1)(\sigma_x^{2} + \sigma_y^{2} + C_2)},$$
where $\mu_x$ and $\sigma_x$ ($\mu_y$ and $\sigma_y$) are the mean and standard deviation of image $X$ (image $Y$), $\sigma_{xy}$ is the cross-covariance between $X$ and $Y$, and $C_1$ and $C_2$ are the luminance and contrast constants. SSIM values are obtained with the MATLAB function ssim.
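As a small usage note, the RSE stopping criterion above only requires the least F-norm solution, which is known in the synthetic experiments; a minimal NumPy helper (our own, not part of the toolbox [24]) could look as follows.

```python
import numpy as np

def rse(X_k, X_star):
    """Relative solution error used as the stopping criterion,
    assuming the least F-norm solution X_star is available."""
    return np.linalg.norm(X_k - X_star) ** 2 / np.linalg.norm(X_star) ** 2
```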
All experiments were carried out using MATLAB R2020b on a laptop with an Intel Core i7 processor, 16GB memory, and Windows 11 operating system.
For the TRAK and TRAK$_F$ algorithms, we consider the following partition of $[N_1]$, which is proposed in [14]:
$$J_i = \Big\{\varpi(k) : k = (i-1)\Big\lceil\tfrac{N_1}{s}\Big\rceil + 1, (i-1)\Big\lceil\tfrac{N_1}{s}\Big\rceil + 2, \ldots, i\Big\lceil\tfrac{N_1}{s}\Big\rceil\Big\},\ i\in[s-1], \qquad J_s = \Big\{\varpi(k) : k = (s-1)\Big\lceil\tfrac{N_1}{s}\Big\rceil + 1, (s-1)\Big\lceil\tfrac{N_1}{s}\Big\rceil + 2, \ldots, N_1\Big\},$$
where $\varpi$ is a permutation of $[N_1]$ chosen uniformly at random.
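For completeness, a minimal NumPy sketch of this random partition (our own helper) is given below; it permutes $[N_1]$ uniformly at random and cuts the permutation into $s$ consecutive blocks of size $\lceil N_1/s\rceil$, with the last block taking the remainder. For example, random_partition(500, 4) produces a partition that can be passed directly to the trak and trak_fourier sketches above.

```python
import numpy as np

def random_partition(N1, s, seed=None):
    """Random partition of [N1] into s blocks, following the construction in [14]."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(N1)               # uniformly random permutation of [N1]
    size = int(np.ceil(N1 / s))
    return [perm[i * size: min((i + 1) * size, N1)] for i in range(s)]
```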

6.1. Synthetic Data

We generate the sensing tensor A and the acquired measurement tensor B as follows:
$$\mathcal{A} = \texttt{randn}(N_1, N_2, N_3), \qquad \mathcal{B} = \mathcal{A} * \texttt{randn}(N_2, K, N_3).$$
We set $N_1 = 500$, $N_2 = 200$, $N_3 = 50$ and $K = 50$.
In the TRAK, TRAKS, TRAK$_F$, and TRAKS$_F$ algorithms, the parameters $s$, $\beta$, and $\alpha$ affect the numerical results, so we first show how they impact the performance of our methods in Figure 1, Figure 2, Figure 3 and Figure 4. In Figure 1, we plot the CPU times and IT of the TRAK and TRAK$_F$ algorithms with the number of partitions $s = 10$ and different stepsizes $\alpha$ for different tolerances. With $s = 10$ fixed, the convergence of the TRAK and TRAK$_F$ algorithms becomes faster as $\alpha$ increases, but slows down again after the fastest convergence rate is reached. Moreover, the TRAK and TRAK$_F$ algorithms also converge for $\alpha_{TRAK} \ge 2/\xi$ and $\alpha_{TRAK_F} \ge 2N_3/\xi$, and converge much faster when an appropriate extrapolated stepsize is used. In Figure 2, we plot the curves of the TRAKS and TRAKS$_F$ algorithms with different stepsizes when $\beta = 100$. In this experiment, we approximate $\xi_1 = \max_{\tau_k\subset[N_1],\,|\tau_k|=\beta}\frac{\|\mathcal{A}_{\tau_k,:,:}\|_2^2}{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}$ by the maximum value of $\frac{\|\mathcal{A}_{\tau_k,:,:}\|_2^2}{\|\mathcal{A}_{\tau_k,:,:}\|_F^2}$ over 100 random samples. From this figure, we observe that the convergence rates of the TRAKS and TRAKS$_F$ algorithms first increase and then decrease as $\alpha$ grows.
In Figure 3, we depict the curves of the average number of iterations and computing times of the TRAK and TRAK$_F$ algorithms versus the number of partitions. In Figure 4, we plot the curves of the average CPU times and average IT of the TRAKS and TRAKS$_F$ algorithms for different $\beta$. We use $\alpha_{TRAKS} = 1.95/\xi_1$, $\alpha_{TRAKS_F} = 1.95N_3/\xi_1$, $\alpha_{TRAK} = 1.95/\xi$ and $\alpha_{TRAK_F} = 1.95N_3/\xi$. For all cases, we set the tolerance to $10^{-5}$. From these figures, we see that the CPU times first decrease and then increase as the parameters grow. For both TRAK and TRAK$_F$, the number of iteration steps always increases with $s$, while for the TRAKS and TRAKS$_F$ algorithms the number of iteration steps behaves in the opposite way. In general, from Figure 1, Figure 2, Figure 3 and Figure 4, there is an optimal combination of parameters that makes our algorithms perform best. In the following experiments, we use the optimal parameters obtained by trial and error.
To compare the performance of these algorithms, we use $\alpha_{TRK} = 1.3$, which makes the TRK algorithm achieve its best performance. For the TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms, $\alpha_{TRAK} = 2.5/\xi$, $\alpha_{TRAK_F} = 2.5N_3/\xi$, $s_{TRAK} = s_{TRAK_F} = 4$, $\alpha_{TRAKS} = 2.7/\xi_1$, $\alpha_{TRAKS_F} = 2.7N_3/\xi_1$ and $\beta_{TRAKS} = \beta_{TRAKS_F} = 160$ are used. In Figure 5, we depict the curves of tolerance versus the iteration steps and the computing times. We observe that the TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms converge much faster than the TRK algorithm, and that the original algorithms and their Fourier versions require the same number of iteration steps for the same tolerance. The TRAKS$_F$ algorithm has the best performance in terms of CPU time.

6.2. 3D MRI Image Data

This experiment considers the image data $\mathcal{X}\in\mathbb{R}^{128\times 128\times 27}$ from the data set mri in MATLAB. We randomly generate a Gaussian tensor $\mathcal{A}\in\mathbb{R}^{1000\times 128\times 27}$ and form the measurement tensor $\mathcal{B}$ by $\mathcal{B} = \mathcal{A}*\mathcal{X}$. The parameters are tuned to achieve the best performance of all algorithms. We use $\alpha_{TRK} = 1.2$, $\alpha_{TRAK} = 2.3/\xi$, $\alpha_{TRAK_F} = 2.3N_3/\xi$, $s_{TRAK} = s_{TRAK_F} = 4$, $\alpha_{TRAKS} = 2.4/\xi_1$, $\alpha_{TRAKS_F} = 2.4N_3/\xi_1$ and $\beta_{TRAKS} = \beta_{TRAKS_F} = 160$. The results are reported in Figure 6. From the figure, we find that the TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms significantly outperform the TRK algorithm. Meanwhile, it is easy to see that the TRAKS$_F$ algorithm performs better than the other algorithms in terms of CPU time.

6.3. Video Data

This experiment was implemented on video data, where the frontal slices of the tensor $\mathcal{X}$ are the first 60 frames of the 1929 film "Finding His Voice" [25]. Each video frame has $240\times 320$ pixels. Similar to the experiment in Section 6.2, we randomly generate a Gaussian tensor $\mathcal{A}\in\mathbb{R}^{1000\times 240\times 60}$ and form the measurement tensor $\mathcal{B}$ by $\mathcal{B} = \mathcal{A}*\mathcal{X}$. In this test, we used $\alpha_{TRK} = 1.3$, $\alpha_{TRAK} = 2.3/\xi$, $\alpha_{TRAK_F} = 2.3N_3/\xi$, $s_{TRAK} = s_{TRAK_F} = 4$, $\alpha_{TRAKS} = 2.8/\xi_1$, $\alpha_{TRAKS_F} = 2.8N_3/\xi_1$ and $\beta_{TRAKS} = \beta_{TRAKS_F} = 170$, which made all algorithms achieve their best performance. In Figure 7, we plot the curves of the average RSE versus IT and CPU times for all algorithms. From this figure, we observe that the TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms significantly outperformed the TRK algorithm in terms of IT and CPU times, and the TRAKS$_F$ algorithm had a clear advantage in computing time. In addition, we ran the TRK, TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms for 50, 15, 15, 19, and 19 iterations, respectively. Their reconstructions are reported in Figure 8. From Figure 8, we observe that the SSIM of the TRK algorithm was much smaller than that of the TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms, which implies that the reconstruction results of the TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms are noticeably better than that of the TRK algorithm. Moreover, the TRAKS$_F$ algorithm requires less CPU time than the other algorithms.

6.4. CT Data

In this experiment, we test the performance of the algorithms on a real-world CT data set. The underlying signal $\mathcal{X}$ is a tensor of size $512\times 512\times 42$, where each frontal slice is a $512\times 512$ image of the C2 vertebra. The images were taken from the Laboratory of Human Anatomy and Embryology, University of Brussels (ULB), Belgium [26]. We also randomly generate a Gaussian tensor $\mathcal{A}\in\mathbb{R}^{1000\times 512\times 42}$ and set $\mathcal{B} = \mathcal{A}*\mathcal{X}$. Let $\alpha_{TRK} = 1.3$, $\alpha_{TRAK} = 2.3/\xi$, $\alpha_{TRAK_F} = 2.3N_3/\xi$, $s_{TRAK} = s_{TRAK_F} = 5$, $\alpha_{TRAKS} = 2.6/\xi_1$, $\alpha_{TRAKS_F} = 2.6N_3/\xi_1$ and $\beta_{TRAKS} = \beta_{TRAKS_F} = 250$. We run the TRK, TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms for 40, 15, 15, 13, and 13 iterations, respectively. The numerical results are presented in Figure 9 and Figure 10. From Figure 9, we see again that the TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms converge faster than the TRK algorithm. From Figure 10, we observe that the SSIMs of the TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms are much closer to 1 than that of the TRK algorithm, which implies that the TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms achieve better reconstruction results. In addition, the TRAKS$_F$ algorithm requires less CPU time than the other algorithms.

7. Conclusions

In this paper, we propose the TRAK and TRAKS algorithms and discuss their Fourier domain versions. The new algorithms can be efficiently implemented in distributed computing units. Numerical results show that the new algorithms perform better than the TRK algorithm. Meanwhile, we note that the sample size, the number of partitions, and the stepsizes play important roles in guaranteeing the fast convergence of the new methods. Therefore, in future work, we will investigate how to choose more appropriate parameters.

Author Contributions

Conceptualization, W.B. and F.Z.; Methodology, W.B. and F.Z.; Validation, W.B. and F.Z.; Writing—original draft preparation, F.Z.; Writing—review and editing, W.B., F.Z., W.L., Q.W. and Y.G.; Software, Q.W. and Y.G.; Visualization, F.Z., Q.W. and Y.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Shandong Province (grant number ZR2020MD060), the Fundamental Research Funds for the Central Universities (grant number 20CX05011A), and the Major Scientific and Technological Projects of CNPC (grant number ZD2019-184-001).

Data Availability Statement

The datasets that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors are thankful to the referees for their constructive comments and valuable suggestions, which have greatly improved the original manuscript of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Kilmer, M.E.; Martin, C.D. Factorization strategies for third-order tensors. Linear Algebra Its Appl. 2011, 435, 641–658.
2. Newman, E.; Horesh, L.; Avron, H.; Kilmer, M. Stable tensor neural networks for rapid deep learning. arXiv 2018, arXiv:1811.06569.
3. Soltani, S.; Kilmer, M.E.; Hansen, P.C. A tensor-based dictionary learning approach to tomographic image reconstruction. BIT Numer. Math. 2016, 56, 1425–1454.
4. Zhou, H.; Li, L.; Zhu, H. Tensor regression with applications in neuroimaging data analysis. J. Am. Stat. Assoc. 2013, 108, 540–552.
5. Andersen, A.H.; Kak, A.C. Simultaneous algebraic reconstruction technique (SART): A superior implementation of the ART algorithm. Ultrason. Imaging 1984, 6, 81–94.
6. Peterson, J.E.; Paulsson, B.N.; McEvilly, T.V. Applications of algebraic reconstruction techniques to crosshole seismic data. Geophysics 1985, 50, 1566–1580.
7. Zouzias, A.; Freris, N.M. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl. 2013, 34, 773–793.
8. Needell, D. Randomized Kaczmarz solver for noisy linear systems. BIT Numer. Math. 2010, 50, 395–403.
9. Moorman, J.D.; Tu, T.K.; Molitor, D.; Needell, D. Randomized Kaczmarz with averaging. BIT Numer. Math. 2021, 61, 337–359.
10. Necoara, I. Faster randomized block Kaczmarz algorithms. SIAM J. Matrix Anal. Appl. 2019, 40, 1425–1452.
11. Miao, C.Q.; Wu, W.T. On greedy randomized average block Kaczmarz method for solving large linear systems. J. Comput. Appl. Math. 2022, 413, 114372.
12. Elfving, T. Block-iterative methods for consistent and inconsistent linear equations. Numer. Math. 1980, 35, 1–12.
13. Eggermont, P.P.B.; Herman, G.T.; Lent, A. Iterative algorithms for large partitioned linear systems with applications to image reconstruction. Linear Algebra Its Appl. 1981, 40, 37–67.
14. Needell, D.; Tropp, J.A. Paved with good intentions: Analysis of a randomized block Kaczmarz method. Linear Algebra Its Appl. 2014, 441, 199–221.
15. Ma, A.; Molitor, D. Randomized Kaczmarz for tensor linear systems. BIT Numer. Math. 2022, 62, 171–194.
16. Tang, L.; Yu, Y.; Zhang, Y.; Li, H. Sketch-and-project methods for tensor linear systems. arXiv 2022, arXiv:2201.00667.
17. Chen, X.; Qin, J. Regularized Kaczmarz algorithms for tensor recovery. SIAM J. Imaging Sci. 2021, 14, 1439–1471.
18. Wang, X.; Che, M.; Mo, C.; Wei, Y. Solving the system of nonsingular tensor equations via randomized Kaczmarz-like method. J. Comput. Appl. Math. 2023, 421, 114856.
19. Kilmer, M.E.; Braman, K.; Hao, N.; Hoover, R.C. Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging. SIAM J. Matrix Anal. Appl. 2013, 34, 148–172.
20. Jin, H.; Bai, M.; Benítez, J.; Liu, X. The generalized inverses of tensors and an application to linear models. Comput. Math. Appl. 2017, 74, 385–397.
21. Jiang, Y.; Wu, G.; Jiang, L. A Kaczmarz method with simple random sampling for solving large linear systems. arXiv 2020, arXiv:2011.14693.
22. Carlton, M.A. Probability and Statistics for Computer Scientists. Am. Stat. 2008, 62, 271–272.
23. Wang, Q.; Li, W.; Bao, W.; Gao, X. Nonlinear Kaczmarz algorithms and their convergence. J. Comput. Appl. Math. 2022, 399, 113720.
24. Lu, C. Tensor-Tensor Product Toolbox. Carnegie Mellon University, 2018. Available online: https://github.com/canyilu/tproduct (accessed on 17 July 2022).
25. Finding His Voice. Available online: https://archive.org/details/FindingH1929 (accessed on 15 August 2022).
26. Bone and Joint CT-Scan Data. Available online: https://isbweb.org/data/vsj/ (accessed on 25 August 2022).
Figure 1. CPU and IT of the TRAK and TRAK F algorithms with the number of partitions s = 10 and different stepsizes α for different tolerances. Upper: TRAK. Lower: TRAK F .
Figure 2. CPU and IT of the TRAKS and TRAKS F algorithms with β = 100 and different stepsizes α for different tolerances. Upper: TRAKS. Lower: TRAKS F .
Figure 3. CPU and IT of the TRAK and TRAK F algorithms with extrapolated stepsize and different number of partitions. Upper: TRAK. Lower: TRAK F .
Figure 4. CPU and IT of the TRAKS and TRAKS F algorithms for different β . Upper: TRAKS. Lower: TRAKS F .
Figure 5. Pictures of tolerance versus IT and CPU times for the TRK, TRAK, TRAK F , TRAKS, and TRAKS F algorithms when X is synthetic data.
Figure 6. Pictures of tolerance versus IT and CPU times for the TRK, TRAK, TRAK F , TRAKS, and TRAKS F algorithms when X comes from the 3D MRI image data set.
Figure 7. Pictures of the average RSE versus IT and CPU times for the TRK, TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms when $\mathcal{X}$ is the video data.
Figure 8. The 50th frame of the clean film and the images recovered by the TRK, TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms. CPU$_{TRK}$ = 6.6113, CPU$_{TRAK}$ = 6.1381, CPU$_{TRAK_F}$ = 3.9756, CPU$_{TRAKS}$ = 6.0760, CPU$_{TRAKS_F}$ = 3.8270.
Figure 9. Pictures of the average RSE versus IT and CPU times for the TRK, TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms when $\mathcal{X}$ is the CT data.
Figure 10. The 39th slice of the clean image sequence and the images recovered by the TRK, TRAK, TRAK$_F$, TRAKS, and TRAKS$_F$ algorithms. CPU$_{TRK}$ = 12.0177, CPU$_{TRAK}$ = 9.3976, CPU$_{TRAK_F}$ = 7.5307, CPU$_{TRAKS}$ = 9.2294, CPU$_{TRAKS_F}$ = 7.3547.

