1. Introduction
The problem of matrix completion (MC) has generated a great deal of interest over the last decade [
1], and several variant problems have been considered, such as non-negative matrix completion (NMC) [
2], structured matrix completion [
3,
4] (including Hankel matrices [
5]), and low-rank matrix completion (LRMC) [
6,
7]. Because of its wide applications in sensor network localization [
8], system identification [
9], machine learning [
10,
11], computer vision [
12], recommendation systems [
13], etc., LRMC has drawn a great deal of attention. Let $M \in \mathbb{R}^{m \times n}$ be an observed matrix and $\Omega \subseteq \{1,\ldots,m\} \times \{1,\ldots,n\}$ be the index set of the observed positions. Then, the desired low-rank matrix X can be recovered by solving the following rank minimization problem [14,15]:
$$\min_{X \in \mathbb{R}^{m \times n}} \ \operatorname{rank}(X) \quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(M), \tag{1}$$
where $[P_\Omega(X)]_{ij} = X_{ij}$ if $(i,j) \in \Omega$, and 0, otherwise. Unfortunately, the rank minimization problem (1) is NP-hard (non-deterministic polynomial-time hard), and all known algorithms require time doubly exponential in the dimension n.
To overcome this limitation, many approaches have been proposed [
13]. For instance, Candès and Recht [
16] replaced the rank function with the nuclear norm, and (1) can be rewritten as
$$\min_{X \in \mathbb{R}^{m \times n}} \ \|X\|_* \quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(M),$$
where $\|X\|_* = \sum_{i} \sigma_i(X)$ and $\sigma_i(X)$ represents the i-th largest non-zero singular value. They proved that if the number of observed entries $|\Omega|$ obeys $|\Omega| \ge c\, n^{1.2}\, r \log n$, with c being some positive constant and r being the rank of X, then most matrices of rank
r can be perfectly recovered with very high probability by solving a simple convex optimization program. However, when the size of the matrix is large, the computation is still burdensome. To mitigate the computational burden, Cai et al. [
17] introduced the singular value thresholding algorithm. The key idea of this approach is to add a regularization term to the objective function of the nuclear norm minimization problem. On the other hand, given the rank of a matrix, Lee and Bresler [
18] replaced the rank function with the Frobenius norm, and (1) can be rewritten as
$$\min_{X \in \mathbb{R}^{m \times n}} \ \|P_\Omega(X) - P_\Omega(M)\|_F \quad \text{s.t.} \quad \operatorname{rank}(X) \le r.$$
According to matrix theory, a matrix $M \in \mathbb{R}^{m\times n}$ of rank r can be decomposed into two matrices $X \in \mathbb{R}^{m\times r}$ and $Y \in \mathbb{R}^{r\times n}$ such that $M = XY$. A straightforward method is to determine X and Y by minimizing the residual between the original M and the recovered one (that is, XY) on the sampling set [19,20]:
$$\min_{X \in \mathbb{R}^{m\times r},\, Y \in \mathbb{R}^{r\times n}} \ \frac{1}{2}\,\|P_\Omega(XY) - P_\Omega(M)\|_F^2.$$
To solve this multiple objective optimization program, one can employ the alternating minimization technique: (i) fix X and determine Y by minimizing the residual; (ii) fix Y and determine X in the same way. A minimal sketch of this scheme is given below.
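The following is a minimal NumPy sketch of the alternating scheme (not the paper's MATLAB code); the small ridge term `lam` and the per-row/per-column least-squares solves are implementation choices of this sketch rather than details taken from [19,20].

```python
import numpy as np

def als_matrix_completion(M, mask, r, n_iters=50, lam=1e-6):
    """Alternating minimization for P_Omega(XY) ~ P_Omega(M).

    M    : (m, n) observed matrix (unobserved entries arbitrary)
    mask : (m, n) boolean array, True where an entry is observed
    r    : target rank
    lam  : small ridge term added only for numerical stability (an assumption)
    """
    m, n = M.shape
    rng = np.random.default_rng(0)
    X = rng.standard_normal((m, r))
    Y = rng.standard_normal((r, n))
    for _ in range(n_iters):
        # (i) fix X, update each column of Y from the entries observed in that column
        for j in range(n):
            rows = np.where(mask[:, j])[0]
            if rows.size:
                A = X[rows, :]
                Y[:, j] = np.linalg.solve(A.T @ A + lam * np.eye(r), A.T @ M[rows, j])
        # (ii) fix Y, update each row of X from the entries observed in that row
        for i in range(m):
            cols = np.where(mask[i, :])[0]
            if cols.size:
                B = Y[:, cols].T
                X[i, :] = np.linalg.solve(B.T @ B + lam * np.eye(r), B.T @ M[i, cols])
    return X, Y
```

Each inner update is an ordinary least-squares problem restricted to the observed entries of the corresponding row or column.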
To accelerate the completion process, a novel way to exploit the rank information is to equip the search space with an inner product and a differentiable structure, which casts the problem as an optimization program on a manifold [21,22]. Then, one can compute the Riemannian gradient and Hessian matrix to solve the following problem [14,23]:
$$\min_{X \in \mathcal{M}_r} \ \frac{1}{2}\,\|P_\Omega(X) - P_\Omega(M)\|_F^2,$$
where $\mathcal{M}_r = \{X \in \mathbb{R}^{m\times n} : \operatorname{rank}(X) = r\}$. In particular, Mishra et al. [24] discussed singular value decomposition, rank factorization, and QR factorization on manifolds. Following this line, Cambier and Absil [25] simultaneously considered singular value decomposition and regularization with a regularization parameter, yet the improvement in accuracy is not remarkable. More recently, Dong et al. [26] devised a preconditioned gradient descent algorithm for the rank factorization problem
$$\min_{(X,\, Y) \in \mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}} \ \frac{1}{2}\,\|P_\Omega(XY) - P_\Omega(M)\|_F^2,$$
which is a multiple objective problem on a product space that can be endowed with a manifold structure. Although it shows good performance in comparison to single-objective formulations, the algorithm hardly exploits the structure of each factor matrix individually.
In this paper, we consider QR factorization on manifolds. Different from single-objective optimization on the manifold [
24,
27,
28], we study LRMC using multiple objective optimization in the product space $\mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}$. During each iteration, we first obtain the gradient of the objective function in the tangent space and then retract with the help of QR factorization. In particular, we introduce a measure that characterizes the degree of orthogonality of Q for the retraction, based on which we design two fast algorithms and show their advantage in comparison to rank factorization [
26].
The paper is organized as follows. In
Section 2, we introduce some preliminaries, including basic notation, the problem formulation, and the elements of manifold optimization. In
Section 3, we present the algorithms, covering the choice of initial point, descent direction, step size, and retraction. In
Section 4, we prove convergence and analyze the reason why the proposed algorithms outperform those in [
26]. In
Section 5, we demonstrate the superior performance of the proposed algorithms using numerical experiments. Finally, in
Section 6 we provide a conclusion.
2. Preliminaries
Notation. The Euclidean inner product and norm for the product space $\mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}$, respectively denoted with $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$, are defined by
$$\langle x, y \rangle = \operatorname{tr}\big(X_1^\top Y_1\big) + \operatorname{tr}\big(X_2^\top Y_2\big), \qquad \|x\| = \sqrt{\langle x, x \rangle},$$
for any pair of points $x = (X_1, X_2)$ and $y = (Y_1, Y_2)$.
Problem statement. The purpose of this paper is to solve the problem (5). With QR factorization $M \approx QR$, it becomes:
$$\min_{Q \in \mathbb{R}^{m\times r},\, R \in \mathbb{R}^{r\times n}} \ f(Q, R) := \frac{1}{2}\,\|P_\Omega(QR) - P_\Omega(M)\|_F^2.$$
QR factorization. QR factorization [
29] can be carried out using Householder transformations, Givens rotations, the Gram–Schmidt process, and their variants. In this paper, we choose the modified Gram–Schmidt algorithm because it is numerically more reliable (see Algorithm 7 for details).
Geometric elements on $\mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}$. The tangent space (see Figure 1) at a point $x = (Q, R)$ is the finite Cartesian product of the tangent spaces of the two component matrix spaces. Then,
$$T_x\big(\mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}\big) \simeq T_Q\mathbb{R}^{m\times r} \times T_R\mathbb{R}^{r\times n} \simeq \mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}$$
(see Section 3.5.2, [21]), where $X \simeq Y$ indicates that there is a homeomorphism between the topological spaces X and Y.
Comparing the performance of several metrics, we consider the following. Given two tangent vectors $\xi = (\xi_Q, \xi_R)$ and $\eta = (\eta_Q, \eta_R)$ (see Section 4, [26]) at a point $x = (Q, R)$, the preconditioned metric is
$$g_x(\xi, \eta) = \operatorname{tr}\big(\xi_Q^\top \eta_Q\,(R R^\top + \delta I_r)\big) + \operatorname{tr}\big((Q^\top Q + \delta I_r)\,\xi_R\,\eta_R^\top\big),$$
where $\delta > 0$ is a constant, which keeps the metric well defined and positive definite even if Q or R does not have full rank. Furthermore, if $\xi = \eta$, one can write $\|\xi\|_x = \sqrt{g_x(\xi, \xi)}$ as a kind of norm at the point x.
Definition 1. For a point $x$, the gradient of $f$ at $x$ is the unique vector in $T_x\big(\mathbb{R}^{m\times r} \times \mathbb{R}^{r\times n}\big)$, denoted with $\operatorname{grad} f(x)$, such that
$$g_x\big(\operatorname{grad} f(x), \xi\big) = \mathrm{D} f(x)[\xi] \quad \text{for all tangent vectors } \xi,$$
where $\mathrm{D} f(x)[\xi]$ is the directional derivative defined [21] by
$$\mathrm{D} f(x)[\xi] = \lim_{t \to 0} \frac{f(x + t\xi) - f(x)}{t}.$$
Combining Equations (8) and (11), it follows that
$$\operatorname{grad} f(x) = \Big( \nabla_Q f(x)\,(R R^\top + \delta I_r)^{-1},\ \ (Q^\top Q + \delta I_r)^{-1}\,\nabla_R f(x) \Big),$$
where $\nabla_Q f(x) = P_\Omega(QR - M)\,R^\top$ and $\nabla_R f(x) = Q^\top P_\Omega(QR - M)$ are the Euclidean partial gradients.
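As an illustration, a NumPy sketch of this gradient computation is given below; it follows the preconditioned expression above, with `delta` playing the role of the metric constant (function and variable names are ours).

```python
import numpy as np

def riemannian_grad(Q, R, M, mask, delta=1e-8):
    """Preconditioned Riemannian gradient of f(Q, R) = 0.5 * ||P_Omega(QR - M)||_F^2."""
    r = Q.shape[1]
    residual = np.where(mask, Q @ R - M, 0.0)        # P_Omega(QR - M)
    grad_Q_euc = residual @ R.T                       # Euclidean partial gradient w.r.t. Q
    grad_R_euc = Q.T @ residual                       # Euclidean partial gradient w.r.t. R
    S_R = R @ R.T + delta * np.eye(r)                 # preconditioner for the Q block
    S_Q = Q.T @ Q + delta * np.eye(r)                 # preconditioner for the R block
    grad_Q = np.linalg.solve(S_R, grad_Q_euc.T).T     # grad_Q_euc @ inv(S_R), S_R symmetric
    grad_R = np.linalg.solve(S_Q, grad_R_euc)         # inv(S_Q) @ grad_R_euc
    return grad_Q, grad_R
```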
3. Algorithms
Initial point $x_0$. Following the widely used spectral initialization [30], we apply a rank-r (truncated) SVD to the zero-filled matrix $P_\Omega(M)$ and obtain three matrices $U \in \mathbb{R}^{m\times r}$, $\Sigma \in \mathbb{R}^{r\times r}$, and $V \in \mathbb{R}^{n\times r}$ such that $P_\Omega(M) \approx U \Sigma V^\top$. Then, the initial point is set as (see Algorithm 1 for details)
$$x_0 = (Q_0, R_0) = \big(U,\ \Sigma V^\top\big).$$
Algorithm 1 Initialization
Input: data M and rank r. Output: initial point $x_0$.
1: Use singular value decomposition (SVD) to compute $U$, $\Sigma$, $V$ (that satisfy $P_\Omega(M) = U \Sigma V^\top$).
2: Trim the matrices to the leading r singular values and vectors: $U_r$, $\Sigma_r$, $V_r$.
3: Set $x_0 = (Q_0, R_0) = (U_r,\ \Sigma_r V_r^\top)$.
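A corresponding sketch of the spectral initialization follows; the split $Q_0 = U_r$, $R_0 = \Sigma_r V_r^\top$ is the natural QR-compatible choice assumed here (so that $Q_0$ has orthonormal columns).

```python
import numpy as np

def spectral_init(M, mask, r):
    """Spectral initialization: rank-r SVD of the zero-filled observed matrix."""
    M_fill = np.where(mask, M, 0.0)                  # zero-filled observed matrix
    U, s, Vt = np.linalg.svd(M_fill, full_matrices=False)
    Q0 = U[:, :r]                                    # trimmed left singular vectors
    R0 = np.diag(s[:r]) @ Vt[:r, :]                  # Sigma_r V_r^T
    return Q0, R0
```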
Descent direction $\eta_k$. Here, we consider two kinds of directions, the steepest descent (SD) direction (see Algorithm 2) and the conjugate descent (CD) direction (see Algorithm 3 and Figure 1), defined, respectively, by
$$\eta_k^{\mathrm{SD}} = -\operatorname{grad} f(x_k), \qquad \eta_k^{\mathrm{CD}} = -\operatorname{grad} f(x_k) + \beta_k\, \eta_{k-1}.$$
Although there are several ways of computing $\beta_k$, we adopt the one from [31] because it outperforms the others.
Algorithm 2 Steepest descent (SD) direction of the function in (9) with orthogonality of Q
Input: data M, iterate $x_k = (Q_k, R_k)$, rank r, and metric constant $\delta$. Output: SD direction $\eta_k$.
1: Compute the residual $P_\Omega(Q_k R_k - M)$ and the Euclidean partial gradients.
2: Form the Riemannian gradient $\operatorname{grad} f(x_k)$ under the preconditioned metric.
3: Set $\eta_k = -\operatorname{grad} f(x_k)$.
Algorithm 3 Conjugate descent (CD) direction of the function in (9)
Input: last conjugate direction $\eta_{k-1}$ (set $\eta_{-1} = 0$), iterate $x_k$. Output: conjugate direction $\eta_k$.
1: Compute the SD direction $-\operatorname{grad} f(x_k)$ using Algorithm 2.
2: Set $\eta_k = -\operatorname{grad} f(x_k) + \beta_k\, \eta_{k-1}$.
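The following sketch assembles the CD direction; since the specific $\beta_k$ of [31] is not reproduced here, the code uses the Polak–Ribière+ rule on the flattened gradients purely as a stand-in.

```python
import numpy as np

def cd_direction(grad, grad_prev, eta_prev):
    """Conjugate descent direction eta_k = -grad_k + beta_k * eta_{k-1}.

    grad, grad_prev, eta_prev are (grad_Q, grad_R)-style tuples of arrays.
    beta_k here is a Polak-Ribiere+ stand-in, not the rule of [31].
    """
    g = np.concatenate([grad[0].ravel(), grad[1].ravel()])
    g_prev = np.concatenate([grad_prev[0].ravel(), grad_prev[1].ravel()])
    beta = max(0.0, g @ (g - g_prev) / max(g_prev @ g_prev, 1e-32))
    eta_Q = -grad[0] + beta * eta_prev[0]
    eta_R = -grad[1] + beta * eta_prev[1]
    return eta_Q, eta_R
```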
Stepsize s. For the SD direction, we apply exact line search (ELS) [22] (see Algorithm 4). Let $\eta_k = (\eta_Q, \eta_R)$ be a given descent direction; then,
$$\varphi(s) := f(x_k + s\,\eta_k) = \frac{1}{2}\,\big\|A + sB + s^2 C\big\|_F^2,$$
where $A = P_\Omega(Q_k R_k - M)$, $B = P_\Omega(Q_k \eta_R + \eta_Q R_k)$, and $C = P_\Omega(\eta_Q \eta_R)$. The differential of the formula above reads as
$$\varphi'(s) = 2\|C\|_F^2\, s^3 + 3\langle B, C\rangle\, s^2 + \big(\|B\|_F^2 + 2\langle A, C\rangle\big)\, s + \langle A, B\rangle = 0.$$
As a cubic equation, one can obtain its roots easily. The step size s is exactly the real positive root.
Algorithm 4 Exact line search
Input: data M, iterate $x_k$, descent direction $\eta_k$. Output: step size s.
1: Set $A = P_\Omega(Q_k R_k - M)$, $B = P_\Omega(Q_k \eta_R + \eta_Q R_k)$, and $C = P_\Omega(\eta_Q \eta_R)$.
2: Solve the cubic equation $\varphi'(s) = 0$.
3: Let the smallest absolute value s among its real solutions be the step size.
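A sketch of the exact line search follows; it forms the cubic $\varphi'(s)$ from the matrices A, B, and C defined above and picks a real positive root (falling back to the smallest-magnitude real root).

```python
import numpy as np

def exact_line_search(Q, R, eta_Q, eta_R, M, mask):
    """Exact line search along (eta_Q, eta_R): phi(s) = 0.5 * ||A + s B + s^2 C||_F^2."""
    P = lambda Z: np.where(mask, Z, 0.0)             # projection onto observed entries
    A = P(Q @ R - M)
    B = P(Q @ eta_R + eta_Q @ R)
    C = P(eta_Q @ eta_R)
    ip = lambda X, Y: float(np.sum(X * Y))
    # phi'(s) = 2<C,C> s^3 + 3<B,C> s^2 + (<B,B> + 2<A,C>) s + <A,B>
    coeffs = [2 * ip(C, C), 3 * ip(B, C), ip(B, B) + 2 * ip(A, C), ip(A, B)]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-10].real
    if real.size == 0:
        return 0.0
    positive = real[real > 0]
    s = positive.min() if positive.size else real[np.argmin(np.abs(real))]
    return float(s)
```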
For the CD direction, we apply the inexact line search (IELS) [32] (see Algorithm 5). For this purpose, we consider candidate step sizes of the form $s = \bar{s}\,\tau^{j}$, where $\bar{s} > 0$ is a constant and $\tau \in (0, 1)$. Then, the step size $s_k$ at the k-th iteration is the largest one in the set $\{\bar{s}\,\tau^{j} : j = 0, 1, 2, \ldots\}$ that satisfies the sufficient decrease condition, and therefore
$$f(x_k + s_k\,\eta_k) \le f(x_k) + \sigma\, s_k\, g_{x_k}\!\big(\operatorname{grad} f(x_k),\, \eta_k\big),$$
where $\sigma \in (0, 1)$.
Algorithm 5 Inexact line search
Input: data M, iterate x, constant $\bar{s}$, iteration limit, parameter $\tau$, SD direction, and CD direction. Output: step size s.
1: Set $s = \bar{s}$ and $j = 0$.
2: while the sufficient decrease condition fails and $j$ is below the iteration limit do
3:   Shrink the step size: $s \leftarrow \tau s$, $j \leftarrow j + 1$.
4: end while
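A generic backtracking version of the inexact line search is sketched below; the default values of $\bar{s}$, $\tau$, and the sufficient decrease constant are illustrative, not the paper's settings.

```python
def inexact_line_search(f, x, eta, grad, metric, s_bar=1.0, tau=0.5,
                        sigma=1e-4, max_iter=50):
    """Backtracking (Armijo-type) inexact line search.

    f      : callable returning the objective at a point (Q, R)
    x, eta : current point and search direction, as (Q, R)-style tuples
    grad   : Riemannian gradient at x
    metric : callable metric(x, xi, eta) giving the inner product at x
    """
    slope = metric(x, grad, eta)                     # directional derivative term (negative for descent)
    f0 = f(x)
    s = s_bar
    for _ in range(max_iter):
        trial = (x[0] + s * eta[0], x[1] + s * eta[1])
        if f(trial) <= f0 + sigma * s * slope:       # sufficient decrease holds
            break
        s *= tau                                     # otherwise shrink the step
    return s
```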
Retraction. With the descent direction
and stepsize
s, one can apply retraction (see
Figure 1 and Algorithm 6). For this purpose, we introduce the concept of the degree of orthogonality.
Definition 2. For a matrix Q, we define its degree of orthogonality as a scalar measure of how far $Q^\top Q$ deviates from the identity matrix.

Algorithm 6 Retraction with QR factorization
Input: iterate $x_k$, direction $\eta_k$, stepsize s, and parameter $\epsilon$. Output: next iterate $x_{k+1}$.
1: if the updated factor $Q_k + s\,\eta_Q$ has good orthogonality (see (18)) then
2:   Set $x_{k+1} = x_k + s\,\eta_k$.
3: else {obtain $\tilde{Q}$ and $\tilde{R}$ from $Q_k + s\,\eta_Q$ using Algorithm 7}
4:   Set $x_{k+1} = \big(\tilde{Q},\ \tilde{R}\,(R_k + s\,\eta_R)\big)$.
5: end if
Algorithm 7 Modified Gram–Schmidt algorithm
Input: $A \in \mathbb{R}^{m\times r}$ with full column rank. Output: $\tilde{Q} \in \mathbb{R}^{m\times r}$ with orthonormal columns and upper triangular $\tilde{R} \in \mathbb{R}^{r\times r}$ such that $A = \tilde{Q}\tilde{R}$.
1: for $k = 1, \ldots, r$ do
2:   Normalize the k-th column to obtain the k-th column of $\tilde{Q}$.
3:   for $j = k+1, \ldots, r$ do
4:     Orthogonalize the j-th column against the k-th column of $\tilde{Q}$, storing the coefficient in $\tilde{R}$.
5:   end for
6: end for
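For reference, a compact NumPy version of the modified Gram–Schmidt factorization is given below.

```python
import numpy as np

def modified_gram_schmidt(A):
    """Modified Gram-Schmidt QR: A = Q @ R with orthonormal Q and upper triangular R."""
    A = np.array(A, dtype=float)
    m, r = A.shape
    Q = A.copy()
    R = np.zeros((r, r))
    for k in range(r):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] = Q[:, k] / R[k, k]                  # normalize the k-th column
        for j in range(k + 1, r):
            R[k, j] = Q[:, k] @ Q[:, j]              # coefficient against the current column
            Q[:, j] = Q[:, j] - R[k, j] * Q[:, k]    # remove that component (MGS update)
    return Q, R
```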
Given a small parameter $\epsilon > 0$, we say that the matrix $Q_k + s\,\eta_Q$ has good orthogonality if its degree of orthogonality does not exceed $\epsilon$. Then, we adopt $x_k + s\,\eta_k$ as the value for the next iterate. On the contrary, we have to decompose $Q_k + s\,\eta_Q = \tilde{Q}\tilde{R}$ [33] and obtain $\tilde{Q}$, $\tilde{R}$, and hence, the next iteration point $x_{k+1} = \big(\tilde{Q},\ \tilde{R}\,(R_k + s\,\eta_R)\big)$.
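The retraction step can be sketched as follows; the measure $\|Q^\top Q - I_r\|_F$ used in the code is an assumed stand-in for the degree of orthogonality of Definition 2, and `modified_gram_schmidt` refers to the sketch after Algorithm 7.

```python
import numpy as np

def qr_retraction(Q, R, eta_Q, eta_R, s, eps=1e-6):
    """Retraction with an orthogonality check on the updated Q factor."""
    Q_new = Q + s * eta_Q
    R_new = R + s * eta_R
    r = Q.shape[1]
    ortho = np.linalg.norm(Q_new.T @ Q_new - np.eye(r))   # assumed orthogonality measure
    if ortho <= eps:
        return Q_new, R_new                                # good orthogonality: keep the update
    Q_tilde, R_tilde = modified_gram_schmidt(Q_new)        # otherwise re-orthogonalize
    return Q_tilde, R_tilde @ R_new                        # preserve the product Q_new @ R_new
```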
In summary, we present Algorithms 8 and 9 as the whole processes of solving the optimization problem (9) with the SD and the CD direction, respectively.
Algorithm 8 QR Riemannian gradient descent (QRRGD)
Input: function f (see (9)), initial point $x_0$ (generated by Algorithm 1), tolerance parameter. Output: recovered point $x_k = (Q_k, R_k)$.
1: Set $k = 0$.
2: Compute the gradient by Algorithm 2.
3: while the stopping criteria are not met do
4:   Find the step size by Algorithm 4.
5:   Update via retraction (Algorithm 6).
6:   Set $k \leftarrow k + 1$.
7:   Compute the steepest direction by Algorithm 2.
8: end while
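A compact driver corresponding to Algorithm 8 is sketched below, assuming the helper functions from the earlier sketches (`spectral_init`, `riemannian_grad`, `exact_line_search`, `qr_retraction`) are in scope; the stopping rule and parameter defaults are illustrative only.

```python
import numpy as np

def qrrgd(M, mask, r, tol=1e-6, max_iter=500, delta=1e-8, eps=1e-6):
    """QR Riemannian gradient descent (QRRGD) driver built from the earlier sketches."""
    Q, R = spectral_init(M, mask, r)
    for _ in range(max_iter):
        gQ, gR = riemannian_grad(Q, R, M, mask, delta)
        eta_Q, eta_R = -gQ, -gR                              # steepest descent direction
        s = exact_line_search(Q, R, eta_Q, eta_R, M, mask)
        Q, R = qr_retraction(Q, R, eta_Q, eta_R, s, eps)
        residual = np.where(mask, Q @ R - M, 0.0)
        if np.linalg.norm(residual) / np.sqrt(max(mask.sum(), 1)) < tol:
            break                                            # illustrative stopping criterion
    return Q, R
```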
Algorithm 9 QR Riemannian conjugate gradient (QRRCG)
Input: function f (see (9)), initial point $x_0$ (generated by Algorithm 1), tolerance parameter. Output: recovered point $x_k = (Q_k, R_k)$.
1: Set $k = 0$.
2: Compute the gradient using Algorithm 2.
3: while the stopping criteria are not met do
4:   Find the step size using Algorithm 5.
5:   Update via retraction (Algorithm 6).
6:   Set $k \leftarrow k + 1$.
7:   Compute the conjugate direction using Algorithm 3.
8: end while
5. Numerical Experiments
This section shows a numerical comparison of our algorithms with the recent RGD/RCG algorithms [
26], which outperform existing matrix factorization models on manifolds. The experiments are divided into two parts: in the first part, we test our algorithms on synthetic data, whereas in the second part, we provide the results on the empirical dataset PeMS Traffic [
37].
To assess the algorithmic performance, we use the root mean square error (RMSE). Given a matrix M observed on $\Omega$, the RMSE of the recovered matrix $X_{\mathrm{opt}}$ with respect to M is defined by
$$\mathrm{RMSE}(X_{\mathrm{opt}}, M) = \frac{\|X_{\mathrm{opt}} - M\|_F}{\sqrt{mn}}.$$
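In code, the RMSE can be computed as follows (the all-entries normalization matches the definition given above; a masked variant is included for evaluating on a subset of entries).

```python
import numpy as np

def rmse(X, M, mask=None):
    """Root mean square error of X against M.

    With mask=None this is the all-entries RMSE assumed above; passing a
    boolean mask restricts the error to those entries instead.
    """
    if mask is None:
        return np.linalg.norm(X - M) / np.sqrt(M.size)
    diff = np.where(mask, X - M, 0.0)
    return np.linalg.norm(diff) / np.sqrt(mask.sum())
```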
Other parameters used in the experiments are as follows: (1) p is the probability of an entry being observed; (2) the stopping parameter is one of the two stopping criteria: the iteration terminates when the RMSE falls below it; (3) the iteration budget parameter is the other: the iteration terminates once the number of iterations exceeds it; (4) the metric parameter $\delta$ keeps the metric well defined; (5) the orthogonality parameter $\epsilon$ is used to judge whether a matrix has good orthogonality; and (6) the oversampling factor (OSF), following [14], is defined by $\mathrm{OSF} = \frac{|\Omega|}{r(m+n-r)}$, which decides the difficulty of the problem.
In our experiments, we first fix the values of m, n, and p. Next, we determine the difficulty of recovery, which can be characterized by the oversampling factor (OSF). Following [14], we set the OSF within a prescribed range. Finally, we determine the value of the rank r from the relation $|\Omega| = \mathrm{OSF}\cdot r(m+n-r)$. To ensure that the matrix M is low ranked, there are two methods. One is setting m and n as small as possible given the value of p; for example, given the other parameters, the values of m and n are about 250, and because of the small size, the problem is trivial. The other is letting p be smaller given larger values of m and n. This is what was performed in our experiments; for example, for the setting shown in Figure 2, we choose p accordingly and obtain the corresponding rank (a sketch of this rank computation is given below).
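The rank determination can be sketched as follows, using the OSF definition given above; the example values in the comment are illustrative, not taken from the paper.

```python
import numpy as np

def rank_from_osf(m, n, p, osf):
    """Solve OSF = p*m*n / (r*(m+n-r)) for the rank r.

    r*(m+n-r) is the number of degrees of freedom of a rank-r m x n matrix.
    """
    dof = p * m * n / osf                    # required degrees of freedom r*(m+n-r)
    disc = (m + n) ** 2 - 4 * dof            # discriminant of r^2 - (m+n) r + dof = 0
    r = ((m + n) - np.sqrt(disc)) / 2        # take the smaller root
    return int(np.floor(r))

# e.g. rank_from_osf(5000, 5000, 0.1, 8)  (illustrative values only)
```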
All numerical experiments were performed on a desktop with a 16-core Intel i7-10700F CPU and 32 GB of memory running Windows 10 and MATLAB R2022b. The source code is available at
https://github.com/Cz1544252489/qrcode (accessed on 14 February 2023).
5.1. Synthetic Data
Initially, we provide some comments about the rank chosen for the synthetic data: as described above, we fix m, n, and p, set the OSF following [14], and then determine the rank accordingly.
We generate two observed matrices, $M_1$ and $M_2$, each observed with probability p, that is, the ratio of observed entries $p = |\Omega|/(mn)$. Each ground-truth matrix is the product of two factor matrices whose columns are i.i.d. Gaussian vectors. The reason why we generate two of them is to test our algorithms on different scales of entries, as measured by the average magnitude of randomly chosen entries. A sketch of this data generation is given below.
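The following sketch generates one such instance; the `scale` knob is an illustrative way to produce matrices with different entry magnitudes, and the seed and parameter names are ours.

```python
import numpy as np

def synthetic_instance(m, n, r, p, scale=1.0, seed=0):
    """Generate a rank-r ground truth with Gaussian factors and a random observation mask."""
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((m, r))          # left factor, i.i.d. Gaussian columns
    Rf = rng.standard_normal((r, n))         # right factor, i.i.d. Gaussian columns
    M = scale * (L @ Rf)                     # rank-r ground truth
    mask = rng.random((m, n)) < p            # each entry observed with probability p
    return M, mask
```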
Table 3 and Table 4 show the results for two fixed matrix sizes, and Table 5 shows the results for a range of matrix sizes.
5.2. Empirical Data
In this part, we test our algorithm on the PeMS Traffic [
37] dataset. It is a matrix containing traffic occupancy rates (between 0 and 1) recorded across time by sensors placed along different lanes of freeways in the San Francisco Bay Area. The recordings are sampled every 10 minutes, covering a period of 15 months. The column index set corresponds to the time domain and the row index set corresponds to geographical points (sensors), which are referred to as the spatial domain. In the experiment, we use a part of the test dataset; it has 173 rows and 6837 columns.
Table 6 shows the results on the empirical data.
As shown above, solid lines represent the results of our algorithms with QR factorization, whereas dashed lines correspond to those of the algorithms with rank factorization [
26]. For synthetic data, our algorithms either yield better solutions or run with less time in comparison to [
26] in most cases, whereas for the empirical dataset, which has only weak low-rank structure, our algorithms show a slight advantage. It has been demonstrated that the algorithms in [
26] outperform the state-of-the-art methods using alternating minimization and the manifold concept.
Furthermore, we briefly measure the speedup over the compared algorithm. It is defined as the mean of the single speedups over all our experiments, that is, $\overline{S} = \frac{1}{|E|}\sum_{e \in E} S_e$, where E is the set of experiments and a single speedup $S_e$ is defined from $t_{\mathrm{QR}}$ and $\mathrm{RMSE}_{\mathrm{QR}}$ (the time in seconds and the RMSE of the QR method, respectively) and $t_{\mathrm{cmp}}$ and $\mathrm{RMSE}_{\mathrm{cmp}}$ (the time in seconds and the RMSE of the compared method). Finally, we obtain the overall mean speedup from this definition.