1. Introduction
The problem of matrix completion (MC) has generated a great deal of interest over the last decade [
1], and several variant problems have been considered, such as non-negative matrix completion (NMC) [
2], structured matrix completion [
3,
4] (including Hankel matrices [
5]), and low-rank matrix completion (LRMC) [
6,
7]. Because of its wide applications in sensor network localization [
8], system identification [
9], machine learning [
10,
11], computer vision [
12], recommendation systems [
13], etc., LRMC has drawn a great deal of attention. Let $M \in \mathbb{R}^{m \times n}$ be an observed matrix and $\Omega \subseteq \{1,\ldots,m\} \times \{1,\ldots,n\}$ be the index set of the observed positions. Then, the desired low-rank matrix X can be recovered by solving the following rank minimization problem [14,15]:
$$\min_{X \in \mathbb{R}^{m \times n}} \ \operatorname{rank}(X) \quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(M), \tag{1}$$
where $[P_\Omega(X)]_{ij} = X_{ij}$ if $(i,j) \in \Omega$, and 0, otherwise. Unfortunately, the rank minimization problem (1) is NP-hard (non-deterministic polynomial-time hard), and all known algorithms require time doubly exponential in the dimension n.
To overcome this limitation, many approaches have been proposed [
13]. For instance, Candès and Recht [
16] replaced the rank function with the nuclear norm, and (1) can be rewritten as
$$\min_{X \in \mathbb{R}^{m \times n}} \ \|X\|_* \quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(M),$$
where $\|X\|_* = \sum_{i} \sigma_i(X)$ and $\sigma_i(X)$ represents the i-th largest non-zero singular value. They proved that if the number of observed entries $|\Omega|$ obeys $|\Omega| \ge c\, n^{1.2}\, r \log n$, with c being some positive constant and r being the rank of X, then most matrices of rank
r can be perfectly recovered with very high probability by solving a simple convex optimization program. However, when the size of the matrix is large, the computation is still burdensome. To mitigate the computational burden, Cai et al. [
17] introduced the singular value thresholding algorithm. The key idea of this approach is to add a regularization term to the objective function of the nuclear norm minimization problem. On the other hand, given the rank of a matrix, Lee and Bresler [
18] replaced the rank function with the Frobenius norm, and (1) can be rewritten as
$$\min_{X \in \mathbb{R}^{m \times n}} \ \|P_\Omega(X) - P_\Omega(M)\|_F \quad \text{s.t.} \quad \operatorname{rank}(X) \le r.$$
According to matrix theory, a matrix $M \in \mathbb{R}^{m\times n}$ of rank r can be decomposed into two matrices $X \in \mathbb{R}^{m\times r}$ and $Y \in \mathbb{R}^{r\times n}$ such that $M = XY$. A straightforward method is to determine X and Y by minimizing the residual between the original M and the recovered one (that is, XY) on the sampling set [19,20]:
$$\min_{X \in \mathbb{R}^{m\times r},\, Y \in \mathbb{R}^{r\times n}} \ \frac{1}{2}\,\|P_\Omega(XY) - P_\Omega(M)\|_F^2.$$
To solve this multiple objective optimization program, one can employ the alternating minimization technique: (i) fix X and determine Y by minimizing the residual; (ii) fix Y and determine X in the same way. A minimal sketch of this scheme is given below.
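The following is a minimal NumPy sketch of the alternating scheme (not the paper's MATLAB code); the small ridge term `lam` and the per-row/per-column least-squares solves are implementation choices of this sketch rather than details taken from [19,20].

```python
import numpy as np

def als_matrix_completion(M, mask, r, n_iters=50, lam=1e-6):
    """Alternating minimization for P_Omega(XY) ~ P_Omega(M).

    M    : (m, n) observed matrix (unobserved entries arbitrary)
    mask : (m, n) boolean array, True where an entry is observed
    r    : target rank
    lam  : small ridge term added only for numerical stability (an assumption)
    """
    m, n = M.shape
    rng = np.random.default_rng(0)
    X = rng.standard_normal((m, r))
    Y = rng.standard_normal((r, n))
    for _ in range(n_iters):
        # (i) fix X, update each column of Y from the entries observed in that column
        for j in range(n):
            rows = np.where(mask[:, j])[0]
            if rows.size:
                A = X[rows, :]
                Y[:, j] = np.linalg.solve(A.T @ A + lam * np.eye(r), A.T @ M[rows, j])
        # (ii) fix Y, update each row of X from the entries observed in that row
        for i in range(m):
            cols = np.where(mask[i, :])[0]
            if cols.size:
                B = Y[:, cols].T
                X[i, :] = np.linalg.solve(B.T @ B + lam * np.eye(r), B.T @ M[i, cols])
    return X, Y
```

Each inner update is an ordinary least-squares problem restricted to the observed entries of the corresponding row or column.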
To accelerate the completion process, a novel way to exploit the rank information is to equip the search space with an inner product and a differentiable structure, which casts the problem as an optimization program on a manifold [21,22]. Then, one can compute the Riemannian gradient and Hessian matrix to solve the following problem [14,23]:
$$\min_{X \in \mathcal{M}_r} \ \frac{1}{2}\,\|P_\Omega(X) - P_\Omega(M)\|_F^2,$$
where $\mathcal{M}_r = \{X \in \mathbb{R}^{m\times n} : \operatorname{rank}(X) = r\}$. In particular, Mishra et al. [24] discussed singular value decomposition, rank factorization, and QR factorization on manifolds. Following this line, Cambier and Absil [25] simultaneously considered singular value decomposition and regularization with a regularization parameter, yet the improvement in accuracy is not remarkable. More recently, Dong et al. [26] devised a preconditioned gradient descent algorithm for the rank factorization problem
$$\min_{(X,\, Y) \in \mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}} \ \frac{1}{2}\,\|P_\Omega(XY) - P_\Omega(M)\|_F^2,$$
which is a multiple objective problem on a product space that can be endowed with a manifold structure. Although it shows good performance in comparison to single-objective formulations, the algorithm hardly exploits the structure of each factor matrix individually.
In this paper, we consider QR factorization on manifolds. Different from single-objective optimization on the manifold [
24,
27,
28], we study LRMC using multiple objective optimization in the product space $\mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}$. During each iteration, we first obtain the gradient of the objective function in the tangent space and then retract with the help of QR factorization. In particular, we introduce a measure that characterizes the degree of orthogonality of Q for the retraction, based on which we design two fast algorithms and show their advantage in comparison to rank factorization [
26].
The paper is organized as follows. In
Section 2, we introduce some preliminaries, including basic notation, the problem formulation, and the elements of manifold optimization. In
Section 3, we present the algorithms, covering the choice of initial point, descent direction, step size, and retraction. In
Section 4, we prove convergence and analyze the reason why the proposed algorithms outperform those in [
26]. In
Section 5, we demonstrate the superior performance of the proposed algorithms using numerical experiments. Finally, in
Section 6 we provide a conclusion.
2. Preliminaries
Notation. The Euclidean inner product and norm for the product space $\mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}$, respectively denoted with $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$, are defined by
$$\langle x, y \rangle = \operatorname{tr}\big(X_1^\top Y_1\big) + \operatorname{tr}\big(X_2^\top Y_2\big), \qquad \|x\| = \sqrt{\langle x, x \rangle},$$
for any pair of points $x = (X_1, X_2)$ and $y = (Y_1, Y_2)$.
Problem statement. The purpose of this paper is to solve the problem (5). With QR factorization $M \approx QR$, it becomes:
$$\min_{Q \in \mathbb{R}^{m\times r},\, R \in \mathbb{R}^{r\times n}} \ f(Q, R) := \frac{1}{2}\,\|P_\Omega(QR) - P_\Omega(M)\|_F^2.$$
QR factorization. QR factorization [
29] can be carried out using Householder transformations, Givens rotations, the Gram–Schmidt process, and their variants. In this paper, we choose the modified Gram–Schmidt algorithm because it is numerically more reliable (see Algorithm 7 for details).
Geometric elements on $\mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}$. The tangent space (see Figure 1) at a point $x = (Q, R)$ is the finite Cartesian product of the tangent spaces of the two component matrix spaces. Then,
$$T_x\big(\mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}\big) \simeq T_Q\mathbb{R}^{m\times r} \times T_R\mathbb{R}^{r\times n} \simeq \mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}$$
(see Section 3.5.2, [21]), where $X \simeq Y$ indicates that there is a homeomorphism between the topological spaces X and Y.
Comparing the performance of several metrics, we consider the following. Given two tangent vectors $\xi = (\xi_Q, \xi_R)$ and $\eta = (\eta_Q, \eta_R)$ (see Section 4, [26]) at a point $x = (Q, R)$, the preconditioned metric is
$$g_x(\xi, \eta) = \operatorname{tr}\big(\xi_Q^\top \eta_Q\,(R R^\top + \delta I_r)\big) + \operatorname{tr}\big((Q^\top Q + \delta I_r)\,\xi_R\,\eta_R^\top\big),$$
where $\delta > 0$ is a constant, which keeps the metric well defined and positive definite even if Q or R does not have full rank. Furthermore, if $\xi = \eta$, one can write $\|\xi\|_x = \sqrt{g_x(\xi, \xi)}$ as a kind of norm at the point x.
Definition 1. For a point $x$, the gradient of $f$ at $x$ is the unique vector in $T_x\big(\mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}\big)$, denoted with $\operatorname{grad} f(x)$, such that
$$g_x\big(\operatorname{grad} f(x), \xi\big) = \mathrm{D} f(x)[\xi] \quad \text{for all tangent vectors } \xi,$$
where $\mathrm{D} f(x)[\xi]$ is the directional derivative defined [21] by
$$\mathrm{D} f(x)[\xi] = \lim_{t \to 0} \frac{f(x + t\xi) - f(x)}{t}.$$
Combining Equations (8) and (11), it follows that
$$\operatorname{grad} f(x) = \Big( \nabla_Q f(x)\,(R R^\top + \delta I_r)^{-1},\ \ (Q^\top Q + \delta I_r)^{-1}\,\nabla_R f(x) \Big),$$
where $\nabla_Q f(x) = P_\Omega(QR - M)\,R^\top$ and $\nabla_R f(x) = Q^\top P_\Omega(QR - M)$ are the Euclidean partial gradients.
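As an illustration, a NumPy sketch of this gradient computation is given below; it follows the preconditioned expression above, with `delta` playing the role of the metric constant (function and variable names are ours).

```python
import numpy as np

def riemannian_grad(Q, R, M, mask, delta=1e-8):
    """Preconditioned Riemannian gradient of f(Q, R) = 0.5 * ||P_Omega(QR - M)||_F^2."""
    r = Q.shape[1]
    residual = np.where(mask, Q @ R - M, 0.0)        # P_Omega(QR - M)
    grad_Q_euc = residual @ R.T                       # Euclidean partial gradient w.r.t. Q
    grad_R_euc = Q.T @ residual                       # Euclidean partial gradient w.r.t. R
    S_R = R @ R.T + delta * np.eye(r)                 # preconditioner for the Q block
    S_Q = Q.T @ Q + delta * np.eye(r)                 # preconditioner for the R block
    grad_Q = np.linalg.solve(S_R, grad_Q_euc.T).T     # grad_Q_euc @ inv(S_R), S_R symmetric
    grad_R = np.linalg.solve(S_Q, grad_R_euc)         # inv(S_Q) @ grad_R_euc
    return grad_Q, grad_R
```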
3. Algorithms
Initial point $x_0$. Following the widely used spectral initialization [30], we apply a rank-r (truncated) SVD to the zero-filled matrix $P_\Omega(M)$ and obtain three matrices $U \in \mathbb{R}^{m\times r}$, $\Sigma \in \mathbb{R}^{r\times r}$, and $V \in \mathbb{R}^{n\times r}$ such that $P_\Omega(M) \approx U \Sigma V^\top$. Then, the initial point is set as (see Algorithm 1 for details)
$$x_0 = (Q_0, R_0) = \big(U,\ \Sigma V^\top\big).$$
Algorithm 1 Initialization
Input: data M and rank r. Output: initial point $x_0$.
1: Use singular value decomposition (SVD) to compute $U$, $\Sigma$, $V$ (that satisfy $P_\Omega(M) = U \Sigma V^\top$).
2: Trim the matrices to the leading r singular values and vectors: $U_r$, $\Sigma_r$, $V_r$.
3: Set $x_0 = (Q_0, R_0) = (U_r,\ \Sigma_r V_r^\top)$.
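A corresponding sketch of the spectral initialization follows; the split $Q_0 = U_r$, $R_0 = \Sigma_r V_r^\top$ is the natural QR-compatible choice assumed here (so that $Q_0$ has orthonormal columns).

```python
import numpy as np

def spectral_init(M, mask, r):
    """Spectral initialization: rank-r SVD of the zero-filled observed matrix."""
    M_fill = np.where(mask, M, 0.0)                  # zero-filled observed matrix
    U, s, Vt = np.linalg.svd(M_fill, full_matrices=False)
    Q0 = U[:, :r]                                    # trimmed left singular vectors
    R0 = np.diag(s[:r]) @ Vt[:r, :]                  # Sigma_r V_r^T
    return Q0, R0
```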
Descent direction $\eta_k$. Here, we consider two kinds of directions, the steepest descent (SD) direction (see Algorithm 2) and the conjugate descent (CD) direction (see Algorithm 3 and Figure 1), defined, respectively, by
$$\eta_k^{\mathrm{SD}} = -\operatorname{grad} f(x_k), \qquad \eta_k^{\mathrm{CD}} = -\operatorname{grad} f(x_k) + \beta_k\, \eta_{k-1}.$$
Although there are several ways of computing $\beta_k$, we adopt the one from [31] because it outperforms the others.
Algorithm 2 Steepest descent (SD) direction of the function in (9) with orthogonality of Q
Input: data M, iterate $x_k = (Q_k, R_k)$, rank r, and metric constant $\delta$. Output: SD direction $\eta_k$.
1: Compute the residual $P_\Omega(Q_k R_k - M)$ and the Euclidean partial gradients.
2: Form the Riemannian gradient $\operatorname{grad} f(x_k)$ under the preconditioned metric.
3: Set $\eta_k = -\operatorname{grad} f(x_k)$.
Algorithm 3 Conjugate descent (CD) direction of the function in (9)
Input: last conjugate direction $\eta_{k-1}$ (set $\eta_{-1} = 0$), iterate $x_k$. Output: conjugate direction $\eta_k$.
1: Compute the SD direction $-\operatorname{grad} f(x_k)$ using Algorithm 2.
2: Set $\eta_k = -\operatorname{grad} f(x_k) + \beta_k\, \eta_{k-1}$.
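The following sketch assembles the CD direction; since the specific $\beta_k$ of [31] is not reproduced here, the code uses the Polak–Ribière+ rule on the flattened gradients purely as a stand-in.

```python
import numpy as np

def cd_direction(grad, grad_prev, eta_prev):
    """Conjugate descent direction eta_k = -grad_k + beta_k * eta_{k-1}.

    grad, grad_prev, eta_prev are (grad_Q, grad_R)-style tuples of arrays.
    beta_k here is a Polak-Ribiere+ stand-in, not the rule of [31].
    """
    g = np.concatenate([grad[0].ravel(), grad[1].ravel()])
    g_prev = np.concatenate([grad_prev[0].ravel(), grad_prev[1].ravel()])
    beta = max(0.0, g @ (g - g_prev) / max(g_prev @ g_prev, 1e-32))
    eta_Q = -grad[0] + beta * eta_prev[0]
    eta_R = -grad[1] + beta * eta_prev[1]
    return eta_Q, eta_R
```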
Stepsize s. For the SD direction, we apply exact line search (ELS) [22] (see Algorithm 4). Let $\eta_k = (\eta_Q, \eta_R)$ be a given descent direction; then,
$$\varphi(s) := f(x_k + s\,\eta_k) = \frac{1}{2}\,\big\|A + sB + s^2 C\big\|_F^2,$$
where $A = P_\Omega(Q_k R_k - M)$, $B = P_\Omega(Q_k \eta_R + \eta_Q R_k)$, and $C = P_\Omega(\eta_Q \eta_R)$. The differential of the formula above reads as
$$\varphi'(s) = 2\|C\|_F^2\, s^3 + 3\langle B, C\rangle\, s^2 + \big(\|B\|_F^2 + 2\langle A, C\rangle\big)\, s + \langle A, B\rangle = 0.$$
As a cubic equation, one can obtain its roots easily. The step size s is exactly the real positive root.
Algorithm 4 Exact line search
Input: data M, iterate $x_k$, descent direction $\eta_k$. Output: step size s.
1: Set $A = P_\Omega(Q_k R_k - M)$, $B = P_\Omega(Q_k \eta_R + \eta_Q R_k)$, and $C = P_\Omega(\eta_Q \eta_R)$.
2: Solve the cubic equation $\varphi'(s) = 0$.
3: Let the smallest absolute value s among its real solutions be the step size.
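A sketch of the exact line search follows; it forms the cubic $\varphi'(s)$ from the matrices A, B, and C defined above and picks a real positive root (falling back to the smallest-magnitude real root).

```python
import numpy as np

def exact_line_search(Q, R, eta_Q, eta_R, M, mask):
    """Exact line search along (eta_Q, eta_R): phi(s) = 0.5 * ||A + s B + s^2 C||_F^2."""
    P = lambda Z: np.where(mask, Z, 0.0)             # projection onto observed entries
    A = P(Q @ R - M)
    B = P(Q @ eta_R + eta_Q @ R)
    C = P(eta_Q @ eta_R)
    ip = lambda X, Y: float(np.sum(X * Y))
    # phi'(s) = 2<C,C> s^3 + 3<B,C> s^2 + (<B,B> + 2<A,C>) s + <A,B>
    coeffs = [2 * ip(C, C), 3 * ip(B, C), ip(B, B) + 2 * ip(A, C), ip(A, B)]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-10].real
    if real.size == 0:
        return 0.0
    positive = real[real > 0]
    s = positive.min() if positive.size else real[np.argmin(np.abs(real))]
    return float(s)
```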
For the CD direction, we apply the inexact line search (IELS) [32] (see Algorithm 5). For this purpose, we consider candidate step sizes of the form $s = \bar{s}\,\tau^{j}$, where $\bar{s} > 0$ is a constant and $\tau \in (0, 1)$. Then, the step size $s_k$ at the k-th iteration is the largest one in the set $\{\bar{s}\,\tau^{j} : j = 0, 1, 2, \ldots\}$ that satisfies the sufficient decrease condition, and therefore
$$f(x_k + s_k\,\eta_k) \le f(x_k) + \sigma\, s_k\, g_{x_k}\!\big(\operatorname{grad} f(x_k),\, \eta_k\big),$$
where $\sigma \in (0, 1)$.
Algorithm 5 Inexact line search
Input: data M, iterate x, constant $\bar{s}$, iteration limit, parameter $\tau$, SD direction, and CD direction. Output: step size s.
1: Set $s = \bar{s}$ and $j = 0$.
2: while the sufficient decrease condition fails and $j$ is below the iteration limit do
3:   Shrink the step size: $s \leftarrow \tau s$, $j \leftarrow j + 1$.
4: end while
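A generic backtracking version of the inexact line search is sketched below; the default values of $\bar{s}$, $\tau$, and the sufficient decrease constant are illustrative, not the paper's settings.

```python
def inexact_line_search(f, x, eta, grad, metric, s_bar=1.0, tau=0.5,
                        sigma=1e-4, max_iter=50):
    """Backtracking (Armijo-type) inexact line search.

    f      : callable returning the objective at a point (Q, R)
    x, eta : current point and search direction, as (Q, R)-style tuples
    grad   : Riemannian gradient at x
    metric : callable metric(x, xi, eta) giving the inner product at x
    """
    slope = metric(x, grad, eta)                     # directional derivative term (negative for descent)
    f0 = f(x)
    s = s_bar
    for _ in range(max_iter):
        trial = (x[0] + s * eta[0], x[1] + s * eta[1])
        if f(trial) <= f0 + sigma * s * slope:       # sufficient decrease holds
            break
        s *= tau                                     # otherwise shrink the step
    return s
```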
Retraction. With the descent direction
and stepsize
s, one can apply retraction (see
Figure 1 and Algorithm 6). For this purpose, we introduce the concept of the degree of orthogonality.
Definition 2. For a matrix Q, we define its degree of orthogonality as a scalar measure of how far $Q^\top Q$ deviates from the identity matrix.

Algorithm 6 Retraction with QR factorization
Input: iterate $x_k$, direction $\eta_k$, stepsize s, and parameter $\epsilon$. Output: next iterate $x_{k+1}$.
1: if the updated factor $Q_k + s\,\eta_Q$ has good orthogonality (see (18)) then
2:   Set $x_{k+1} = x_k + s\,\eta_k$.
3: else {obtain $\tilde{Q}$ and $\tilde{R}$ from $Q_k + s\,\eta_Q$ using Algorithm 7}
4:   Set $x_{k+1} = \big(\tilde{Q},\ \tilde{R}\,(R_k + s\,\eta_R)\big)$.
5: end if
Algorithm 7 Modified Gram–Schmidt algorithm
Input: $A \in \mathbb{R}^{m\times r}$ with full column rank. Output: $\tilde{Q} \in \mathbb{R}^{m\times r}$ with orthonormal columns and upper triangular $\tilde{R} \in \mathbb{R}^{r\times r}$ such that $A = \tilde{Q}\tilde{R}$.
1: for $k = 1, \ldots, r$ do
2:   Normalize the k-th column to obtain the k-th column of $\tilde{Q}$.
3:   for $j = k+1, \ldots, r$ do
4:     Orthogonalize the j-th column against the k-th column of $\tilde{Q}$, storing the coefficient in $\tilde{R}$.
5:   end for
6: end for
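For reference, a compact NumPy version of the modified Gram–Schmidt factorization is given below.

```python
import numpy as np

def modified_gram_schmidt(A):
    """Modified Gram-Schmidt QR: A = Q @ R with orthonormal Q and upper triangular R."""
    A = np.array(A, dtype=float)
    m, r = A.shape
    Q = A.copy()
    R = np.zeros((r, r))
    for k in range(r):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] = Q[:, k] / R[k, k]                  # normalize the k-th column
        for j in range(k + 1, r):
            R[k, j] = Q[:, k] @ Q[:, j]              # coefficient against the current column
            Q[:, j] = Q[:, j] - R[k, j] * Q[:, k]    # remove that component (MGS update)
    return Q, R
```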
Given a small parameter $\epsilon > 0$, we say that the matrix $Q_k + s\,\eta_Q$ has good orthogonality if its degree of orthogonality does not exceed $\epsilon$. Then, we adopt $x_k + s\,\eta_k$ as the value for the next iterate. On the contrary, we have to decompose $Q_k + s\,\eta_Q = \tilde{Q}\tilde{R}$ [33] and obtain $\tilde{Q}$, $\tilde{R}$, and hence, the next iteration point $x_{k+1} = \big(\tilde{Q},\ \tilde{R}\,(R_k + s\,\eta_R)\big)$.
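The retraction step can be sketched as follows; the measure $\|Q^\top Q - I_r\|_F$ used in the code is an assumed stand-in for the degree of orthogonality of Definition 2, and `modified_gram_schmidt` refers to the sketch after Algorithm 7.

```python
import numpy as np

def qr_retraction(Q, R, eta_Q, eta_R, s, eps=1e-6):
    """Retraction with an orthogonality check on the updated Q factor."""
    Q_new = Q + s * eta_Q
    R_new = R + s * eta_R
    r = Q.shape[1]
    ortho = np.linalg.norm(Q_new.T @ Q_new - np.eye(r))   # assumed orthogonality measure
    if ortho <= eps:
        return Q_new, R_new                                # good orthogonality: keep the update
    Q_tilde, R_tilde = modified_gram_schmidt(Q_new)        # otherwise re-orthogonalize
    return Q_tilde, R_tilde @ R_new                        # preserve the product Q_new @ R_new
```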
In summary, we present Algorithms 8 and 9 as the whole processes of solving the optimization problem (9) with the SD and the CD direction, respectively.
Algorithm 8 QR Riemannian gradient descent (QRRGD)
Input: function f (see (9)), initial point $x_0$ (generated by Algorithm 1), tolerance parameter. Output: recovered point $x_k = (Q_k, R_k)$.
1: Set $k = 0$.
2: Compute the gradient by Algorithm 2.
3: while the stopping criteria are not met do
4:   Find the step size by Algorithm 4.
5:   Update via retraction (Algorithm 6).
6:   Set $k \leftarrow k + 1$.
7:   Compute the steepest direction by Algorithm 2.
8: end while
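A compact driver corresponding to Algorithm 8 is sketched below, assuming the helper functions from the earlier sketches (`spectral_init`, `riemannian_grad`, `exact_line_search`, `qr_retraction`) are in scope; the stopping rule and parameter defaults are illustrative only.

```python
import numpy as np

def qrrgd(M, mask, r, tol=1e-6, max_iter=500, delta=1e-8, eps=1e-6):
    """QR Riemannian gradient descent (QRRGD) driver built from the earlier sketches."""
    Q, R = spectral_init(M, mask, r)
    for _ in range(max_iter):
        gQ, gR = riemannian_grad(Q, R, M, mask, delta)
        eta_Q, eta_R = -gQ, -gR                              # steepest descent direction
        s = exact_line_search(Q, R, eta_Q, eta_R, M, mask)
        Q, R = qr_retraction(Q, R, eta_Q, eta_R, s, eps)
        residual = np.where(mask, Q @ R - M, 0.0)
        if np.linalg.norm(residual) / np.sqrt(max(mask.sum(), 1)) < tol:
            break                                            # illustrative stopping criterion
    return Q, R
```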
Algorithm 9 QR Riemannian conjugate gradient (QRRCG)
Input: function f (see (9)), initial point $x_0$ (generated by Algorithm 1), tolerance parameter. Output: recovered point $x_k = (Q_k, R_k)$.
1: Set $k = 0$.
2: Compute the gradient using Algorithm 2.
3: while the stopping criteria are not met do
4:   Find the step size using Algorithm 5.
5:   Update via retraction (Algorithm 6).
6:   Set $k \leftarrow k + 1$.
7:   Compute the conjugate direction using Algorithm 3.
8: end while
5. Numerical Experiments
This section shows a numerical comparison of our algorithms with the recent RGD/RCG algorithms [
26], which outperform existing matrix factorization models on manifolds. The experiments are divided into two parts: in the first part, we test our algorithms on synthetic data, whereas in the second part, we provide the results on the empirical dataset PeMS Traffic [
37].
To assess the algorithmic performance, we use the root mean square error (RMSE). Given a matrix M observed on $\Omega$, the RMSE of the recovered matrix $X_{\mathrm{opt}}$ with respect to M is defined by
$$\mathrm{RMSE}(X_{\mathrm{opt}}, M) = \frac{\|X_{\mathrm{opt}} - M\|_F}{\sqrt{mn}}.$$
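In code, the RMSE can be computed as follows (the all-entries normalization matches the definition given above; a masked variant is included for evaluating on a subset of entries).

```python
import numpy as np

def rmse(X, M, mask=None):
    """Root mean square error of X against M.

    With mask=None this is the all-entries RMSE assumed above; passing a
    boolean mask restricts the error to those entries instead.
    """
    if mask is None:
        return np.linalg.norm(X - M) / np.sqrt(M.size)
    diff = np.where(mask, X - M, 0.0)
    return np.linalg.norm(diff) / np.sqrt(mask.sum())
```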
Other parameters used in the experiments are as follows: (1) p is the probability of an entry being observed; (2) the stopping parameter is one of the two stopping criteria: the iteration terminates when the RMSE falls below it; (3) the iteration budget parameter is the other: the iteration terminates once the number of iterations exceeds it; (4) the metric parameter $\delta$ keeps the metric well defined; (5) the orthogonality parameter $\epsilon$ is used to judge whether a matrix has good orthogonality; and (6) the oversampling factor (OSF), following [14], is defined by $\mathrm{OSF} = \frac{|\Omega|}{r(m+n-r)}$, which decides the difficulty of the problem.
In our experiments, we first fix the values of m, n, and p. Next, we determine the difficulty of recovery, which can be characterized by the oversampling factor (OSF). Following [14], we set the OSF within a prescribed range. Finally, we determine the value of the rank r from the relation $|\Omega| = \mathrm{OSF}\cdot r(m+n-r)$. To ensure that the matrix M is low ranked, there are two methods. One is setting m and n as small as possible given the value of p; for example, given the other parameters, the values of m and n are about 250, and because of the small size, the problem is trivial. The other is letting p be smaller given larger values of m and n. This is what was performed in our experiments; for example, for the setting shown in Figure 2, we choose p accordingly and obtain the corresponding rank (a sketch of this rank computation is given below).
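The rank determination can be sketched as follows, using the OSF definition given above; the example values in the comment are illustrative, not taken from the paper.

```python
import numpy as np

def rank_from_osf(m, n, p, osf):
    """Solve OSF = p*m*n / (r*(m+n-r)) for the rank r.

    r*(m+n-r) is the number of degrees of freedom of a rank-r m x n matrix.
    """
    dof = p * m * n / osf                    # required degrees of freedom r*(m+n-r)
    disc = (m + n) ** 2 - 4 * dof            # discriminant of r^2 - (m+n) r + dof = 0
    r = ((m + n) - np.sqrt(disc)) / 2        # take the smaller root
    return int(np.floor(r))

# e.g. rank_from_osf(5000, 5000, 0.1, 8)  (illustrative values only)
```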
All numerical experiments were performed on a desktop with a 16-core Intel i7-10700F CPU and 32 GB of memory running Windows 10 and MATLAB R2022b. The source code is available at
https://github.com/Cz1544252489/qrcode (accessed on 14 February 2023).
5.1. Synthetic Data
Initially, we provide some comments about the rank chosen for the synthetic data: as described above, we fix m, n, and p, set the OSF following [14], and then determine the rank accordingly.
We generate two observed matrices, $M_1$ and $M_2$, each observed with probability p, that is, the ratio of observed entries $p = |\Omega|/(mn)$. Each ground-truth matrix is the product of two factor matrices whose columns are i.i.d. Gaussian vectors. The reason why we generate two of them is to test our algorithms on different scales of entries, as measured by the average magnitude of randomly chosen entries. A sketch of this data generation is given below.
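The following sketch generates one such instance; the `scale` knob is an illustrative way to produce matrices with different entry magnitudes, and the seed and parameter names are ours.

```python
import numpy as np

def synthetic_instance(m, n, r, p, scale=1.0, seed=0):
    """Generate a rank-r ground truth with Gaussian factors and a random observation mask."""
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((m, r))          # left factor, i.i.d. Gaussian columns
    Rf = rng.standard_normal((r, n))         # right factor, i.i.d. Gaussian columns
    M = scale * (L @ Rf)                     # rank-r ground truth
    mask = rng.random((m, n)) < p            # each entry observed with probability p
    return M, mask
```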
Table 3 and Table 4 show the results for two fixed matrix sizes, and Table 5 shows the results for a range of matrix sizes.
5.2. Empirical Data
In this part, we test our algorithm on the PeMS Traffic [
37] dataset. It is a matrix containing traffic occupancy rates (between 0 and 1) recorded across time by sensors placed along different lanes of freeways in the San Francisco Bay Area. The recordings are sampled every 10 minutes, covering a period of 15 months. The column index set corresponds to the time domain and the row index set corresponds to geographical points (sensors), which are referred to as the spatial domain. In the experiment, we use a part of the test dataset; it has 173 rows and 6837 columns.
Table 6 shows the results on the empirical data.
As shown above, solid lines represent the results of our algorithms with QR factorization, whereas dashed lines correspond to those of the algorithms with rank factorization [
26]. For synthetic data, our algorithms either yield better solutions or run with less time in comparison to [
26] in most cases, whereas for the empirical dataset, which has only weak low-rank structure, our algorithms show a slight advantage. It has been demonstrated that the algorithms in [
26] outperform the state-of-the-art methods using alternating minimization and the manifold concept.
Furthermore, we briefly measure the speedup over the compared algorithm. It is defined as the mean of the single speedups over all our experiments, that is, $\overline{S} = \frac{1}{|E|}\sum_{e \in E} S_e$, where E is the set of experiments and a single speedup $S_e$ is defined from $t_{\mathrm{QR}}$ and $\mathrm{RMSE}_{\mathrm{QR}}$ (the time in seconds and the RMSE of the QR method, respectively) and $t_{\mathrm{cmp}}$ and $\mathrm{RMSE}_{\mathrm{cmp}}$ (the time in seconds and the RMSE of the compared method). Finally, we obtain the overall mean speedup from this definition.