Article

Accelerated Randomized Coordinate Descent for Solving Linear Systems

College of Science, China University of Petroleum, Qingdao 266580, China
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(22), 4379; https://doi.org/10.3390/math10224379
Submission received: 20 October 2022 / Revised: 11 November 2022 / Accepted: 12 November 2022 / Published: 21 November 2022
(This article belongs to the Section Computational and Applied Mathematics)

Abstract

The randomized coordinate descent (RCD) method is a simple but powerful approach to solving inconsistent linear systems. In order to accelerate this approach, the Nesterov accelerated randomized coordinate descent (NARCD) method is proposed. The randomized coordinate descent with momentum (RCDm) method was proposed by Loizou and Richtárik; we provide a new convergence bound for it. The global convergence rates of the two methods are established in this paper. In addition, we show that the RCDm method has an accelerated convergence rate when a proper momentum parameter is chosen. Finally, in numerical experiments, both the RCDm and the NARCD are faster than the RCD for uniformly distributed data. Moreover, the NARCD has a better acceleration effect than the RCDm and the Nesterov accelerated stochastic gradient descent method. The stronger the linear correlation of the columns of the matrix A, the better the NARCD acceleration.

1. Introduction

Consider a large-scale overdetermined linear system
\[
A x = b, \qquad (1)
\]
where $A \in \mathbb{R}^{m \times n}$ and $m \ge n$. We can solve the least-squares problem $\min_x \|b - Ax\|^2$. We assume that the columns of $A$ are normalized:
\[
\|A_i\| = 1. \qquad (2)
\]
This assumption has no substantial impact on the implementation costs: we could simply normalize each $A_i$ the first time the algorithm encounters it. However, we do not build assumption (2) into the algorithms, but include the factors $\|A_i\|$ as needed. Regardless of whether normalization is performed, our randomized algorithms yield the same sequence of iterates.
The coordinate descent (CD) technique [1], which can also be produced by applying the conventional Gauss–Seidel iteration method to the following normal equation [2], is one of the iteration methods that may be used to solve problem (1) cheaply and effectively:
\[
A^T A x = A^T b,
\]
and it is also equivalent to the unconstrained quadratic programming problem
\[
\min f(x) = \frac{1}{2} x^T A^T A x - b^T A x, \quad x \in \mathbb{R}^n.
\]
From [1], we can obtain
\[
x_{k+1} = x_k + \frac{\langle A_i, b - A x_k \rangle}{\|A_i\|^2} e_i.
\]
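To make this update concrete, the following is a minimal MATLAB sketch of the RCD iteration. It is illustrative only: the function name, the uniform column sampling, and the residual bookkeeping are our own choices, not the authors' code.

```matlab
% Minimal randomized coordinate descent (RCD) sketch for min ||b - A*x||.
% Illustrative only; rcd_sketch and its arguments are our own naming.
function x = rcd_sketch(A, b, K)
    [~, n] = size(A);
    x = zeros(n, 1);
    r = b - A*x;                         % residual b - A*x, kept up to date
    for k = 1:K
        i = randi(n);                    % pick a column uniformly at random
        alpha = (A(:, i)' * r) / norm(A(:, i))^2;
        x(i) = x(i) + alpha;             % coordinate update along e_i
        r = r - alpha * A(:, i);         % cheap residual update, no new A*x
    end
end
```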
The coordinate descent approach has a long history in optimization and has found applications in a wide range of fields, such as biological feature selection [3], machine learning [4], protein structure [5], and tomography [6,7]. Inspired by the randomized coordinate descent (RCD) method, many related works have been presented, such as greedy versions [8,9] and block versions [10,11,12] of the randomized coordinate descent. The coordinate descent method is a column projection method, while the Kaczmarz method [13] is a row projection method. The RCD method is inspired by the randomized Kaczmarz (RK) method [14]. A lot of relevant work has also been conducted on Kaczmarz-type approaches; readers may refer to [15,16,17,18,19,20].
In this paper, we use two methods to accelerate the RCD method for solving large systems of linear equations. First, we obtain an accelerated RCD method by adding Nesterov's acceleration mechanism to the traditional RCD algorithm, called the Nesterov accelerated randomized coordinate descent (NARCD) method. It is well known that, by using an appropriate multi-step technique [21], the traditional gradient method can be turned into a faster scheme. Nesterov later improved this accelerated format for solving unconstrained minimization problems with strongly convex objectives [22]. Second, we apply the heavy ball (momentum) method to accelerate the RCD. Polyak invented the heavy ball method [23], which is a common approach for speeding up the convergence rate of gradient-type algorithms; many researchers have looked into variations of it, see [24]. With these two accelerations of the RCD, we obtain the Nesterov accelerated randomized coordinate descent (NARCD) method and the randomized coordinate descent with momentum (RCDm) method.
In this paper, given a positive semidefinite matrix $M$, $\|x\|_M$ is defined as $\sqrt{x^T M x}$; $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ stand for the scalar product and the Euclidean norm (spectral norm for matrices). The column vector $e_i$ has a 1 at the $i$th position and 0 elsewhere. In addition, for a given matrix $A$, $A_i$, $\|A\|_F$, $\sigma_{\min}(A)$, and $A^T$ are used to denote its $i$th column, Frobenius norm, smallest nonzero singular value, and transpose, respectively. $A^+$ is the Moore–Penrose pseudoinverse of $A$. Note that $\lambda_1 = \frac{1}{\|(AA^T)^+\|}$. Let us denote by $i(k)$ the index randomly generated at iteration $k$, and let $I(k)$ denote all random indices that occurred at or before iteration $k$, so that
\[
I(k) = \{ i(k), i(k-1), \ldots, i(0) \},
\]
and the sequences $x_{k+1}$, $y_{k+1}$, $v_{k+1}$ are determined by $I(k)$. In the following proofs, we use $\mathbb{E}_{i(k)\mid I(k-1)}(\cdot)$ to denote the expectation of a random variable, conditioned on $I(k-1)$, with respect to the index $i(k)$, so that
\[
\mathbb{E}_{I(k)}(\cdot) = \mathbb{E}_{I(k-1)}\big(\mathbb{E}_{i(k)\mid I(k-1)}(\cdot)\big).
\]
The organization of this paper is as follows. In Section 2, we propose the NARCD method naturally and prove the convergence of the method. In Section 3, we propose the RCDm method and prove its convergence. In Section 4, to demonstrate the efficacy of our new methods, several numerical examples are offered. Finally, we present some brief concluding remarks in Section 5.

2. Nesterov’s Accelerated Randomized Coordinate Descent

The NARCD algorithm applies the Nesterov accelerated procedure [22], which is better known in the context of the gradient descent algorithm. The Nesterov acceleration scheme creates the sequences $\{x_k\}$, $\{y_k\}$, and $\{v_k\}$. When applied to $\min_x f(x)$, gradient descent sets $x_{k+1} = x_k - \theta_k \nabla f(x_k)$, where $\nabla f$ is the objective gradient and $\theta_k$ is the step size. We define the following iterative scheme:
\[
\begin{aligned}
y_k &= \alpha_k v_k + (1-\alpha_k) x_k, \\
x_{k+1} &= y_k - \theta_k \nabla f(y_k), \\
v_{k+1} &= \beta_k v_k + (1-\beta_k) y_k - \gamma_k \nabla f(y_k).
\end{aligned}
\]
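As a point of reference, the three-sequence scheme above can be written in MATLAB as follows. This is only an illustrative sketch of the generic accelerated gradient iteration with constant placeholder parameters (in the methods of this paper the parameters vary with $k$); the test data and parameter values are our own assumptions.

```matlab
% Generic three-sequence Nesterov scheme applied to f(x) = 0.5*||A*x - b||^2.
% Constant illustrative parameters, only meant to show the shape of the scheme.
A = unifrnd(0, 1, 200, 50);  b = randn(200, 1);
theta = 1 / norm(A)^2;                     % step size 1/L for this quadratic
alpha = 0.5;  beta = 0.9;  gamma = theta;  % placeholder parameter values
x = zeros(50, 1);  v = x;
for k = 1:1000
    y = alpha*v + (1 - alpha)*x;           % interpolation step
    g = A'*(A*y - b);                      % gradient of f at y
    x = y - theta*g;                       % gradient step
    v = beta*v + (1 - beta)*y - gamma*g;   % "estimate sequence" step
end
```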
The key feature of the above scheme is that it employs suitable values for the parameters $\alpha_k$, $\beta_k$, and $\gamma_k$, resulting in improved convergence over traditional gradient descent. In [25], the Nesterov accelerated procedure is applied to the Kaczmarz method, which is a row action method. The RCD is a column action method, and the Nesterov accelerated procedure can be applied in the same way. The relationship between the parameters $\alpha_k$, $\beta_k$, and $\gamma_k$ is given in [22,25]. Now, using the general setup of Nesterov's scheme, we can obtain the NARCD algorithm (Algorithm 1).
The framework of the NARCD method is given as follows.
Algorithm 1 Nesterov's accelerated randomized coordinate descent method (NARCD)
Input: $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $K \in \mathbb{N}$, $x_0 \in \mathbb{R}^n$, $\lambda \in [0, \lambda_1]$.
1: Initialize $v_0 = x_0$, $\gamma_{-1} = 0$, $k = 0$.
2: while $k < K$ do
3:   Choose $\gamma_k$ to be the larger root of
\[
\gamma_k^2 - \frac{\gamma_k}{n} = \Big(1 - \frac{\gamma_k \lambda}{n}\Big)\gamma_{k-1}^2. \qquad (5)
\]
4:   Set $\alpha_k$ and $\beta_k$ as follows:
\[
\alpha_k = \frac{n - \gamma_k \lambda}{\gamma_k (n^2 - \lambda)}, \qquad (6)
\]
\[
\beta_k = 1 - \frac{\lambda \gamma_k}{n}. \qquad (7)
\]
5:   Set $y_k = \alpha_k v_k + (1 - \alpha_k) x_k$.
6:   Choose $i = i(k)$ from $\{1, 2, \ldots, n\}$ with equal probability.
7:   $x_{k+1} = y_k + \frac{\langle A_i, b - A y_k\rangle}{\|A_i\|^2} e_i$.
8:   Set $v_{k+1} = \beta_k v_k + (1 - \beta_k) y_k + \gamma_k \frac{\langle A_i, b - A y_k\rangle}{\|A_i\|^2} e_i$.
9:   $k = k + 1$.
10: end while
Output: $x_K$
Remark 1.
In order to avoid computing the matrix–vector product $A y_k$ in steps 7 and 8, we adopt the following update:
\[
\begin{aligned}
Y_k &= \alpha_k V_k + (1-\alpha_k) X_k, \\
Z_k &= b - Y_k, \\
\mu_k &= \frac{\langle A_i, Z_k\rangle}{\|A_i\|^2}, \\
X_{k+1} &= Y_k + \mu_k A_i, \\
V_{k+1} &= \beta_k V_k + (1-\beta_k) Y_k + (\gamma_k \mu_k) A_i,
\end{aligned}
\]
with $X_0 = A x_0$ and $V_0 = X_0$. At the same time, we can use $r_k = b - Y_k$ to estimate the residual.
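A compact MATLAB sketch of Algorithm 1, using the Remark 1 bookkeeping so that no full matrix–vector product $A y_k$ is formed inside the loop, might look as follows. This is our own illustrative implementation (function and variable names are assumptions), not the authors' code.

```matlab
% Illustrative NARCD sketch (Algorithm 1 with the Remark 1 updates).
% lambda should lie in (0, lambda_1]; names are our own.
function x = narcd_sketch(A, b, K, lambda)
    [~, n] = size(A);
    x = zeros(n, 1);  v = x;
    X = A*x;  V = X;                      % X = A*x_k, V = A*v_k (Remark 1)
    gamma_prev = 0;                       % gamma_{-1} = 0
    for k = 1:K
        % larger root of gamma^2 - gamma/n = (1 - gamma*lambda/n)*gamma_prev^2
        c = (1 - lambda*gamma_prev^2) / n;
        gamma = (c + sqrt(c^2 + 4*gamma_prev^2)) / 2;
        alpha = (n - gamma*lambda) / (gamma*(n^2 - lambda));
        beta  = 1 - lambda*gamma/n;
        y = alpha*v + (1 - alpha)*x;
        Y = alpha*V + (1 - alpha)*X;      % Y = A*y_k without a fresh product
        i  = randi(n);
        mu = (A(:, i)' * (b - Y)) / norm(A(:, i))^2;
        x = y;              x(i) = x(i) + mu;                 % step 7
        X = Y + mu*A(:, i);
        v = beta*v + (1 - beta)*y;  v(i) = v(i) + gamma*mu;   % step 8
        V = beta*V + (1 - beta)*Y + gamma*mu*A(:, i);
        gamma_prev = gamma;
    end
end
```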
Lemma 1.
For any solution $x^*$ of $A^T A x = A^T b$, any $y \in \mathbb{R}^n$, and $P(y) = y + \frac{\langle A_i, b - A y\rangle}{\|A_i\|^2} e_i$ with $\|A_i\| = 1$, we have
\[
\mathbb{E}_i\big(\|A(P(y) - x^*)\|^2\big) = \|A(y - x^*)\|^2 - \frac{1}{n}\|A^T(Ay - b)\|^2,
\]
where the random variable $i$ follows the uniform distribution on the set $\{1, 2, \ldots, n\}$.
Proof. 
Using $\mathbb{E}_i$ to denote the expectation with respect to the index $i$, we have
\[
\begin{aligned}
\mathbb{E}_i\big(\|A(P(y)-x^*)\|^2\big)
&= \mathbb{E}_i\big(\|A(y + \langle A_i, b-Ay\rangle e_i - x^*)\|^2\big) \\
&= \mathbb{E}_i\big(\|A(y-x^*) + \langle A_i, b-Ay\rangle A_i\|^2\big) \\
&= \|A(y-x^*)\|^2 + \mathbb{E}_i\big(\|\langle A_i, b-Ay\rangle A_i\|^2\big) + 2\,\mathbb{E}_i\big\langle A(y-x^*), \langle A_i, b-Ay\rangle A_i\big\rangle \\
&= \|A(y-x^*)\|^2 + \frac{1}{n}\|A^T A(y-x^*)\|^2 + \frac{2}{n}\big\langle A(y-x^*), A A^T(b-Ay)\big\rangle \\
&= \|A(y-x^*)\|^2 - \frac{1}{n}\|A^T A(y-x^*)\|^2,
\end{aligned}
\]
where the last equality uses $A^T A x^* = A^T b$. □
Lemma 2.
For any $y \in \mathbb{R}^n$, we have
\[
\mathbb{E}_i\big(\|A_i\langle A_i, b-Ay\rangle\|^2_{(AA^T)^+}\big) \le \frac{1}{n}\|A^T A(y-x^*)\|^2,
\]
where the random variable $i$ follows the uniform distribution on the set $\{1, 2, \ldots, n\}$.
Proof. 
Let the compact singular value decomposition of $A$ be $A = U\Sigma V^T$, where $U \in \mathbb{R}^{m\times r}$, $V \in \mathbb{R}^{n\times r}$, $\Sigma \in \mathbb{R}^{r\times r}$, $r$ is the rank of $A$, $U^T U = I$, $V^T V = I$, and $\Sigma$ is diagonal with positive entries; then $(AA^T)^+ = U\Sigma^{-2}U^T$. We have
\[
\begin{aligned}
\mathbb{E}_i\big(\|A_i\langle A_i, b-Ay\rangle\|^2_{(AA^T)^+}\big)
&= \frac{1}{n}\sum_{i=1}^n \big\langle A_i\langle A_i, b-Ay\rangle, (AA^T)^+ A_i\langle A_i, b-Ay\rangle \big\rangle \\
&= \frac{1}{n}\operatorname{trace}\Big[(AA^T)^+ \sum_{i=1}^n A_i \langle A_i, b-Ay\rangle^2 A_i^T\Big] \\
&= \frac{1}{n}\operatorname{trace}\big[(AA^T)^+ A \operatorname{diag}(A^T(b-Ay))^2 A^T\big] \\
&= \frac{1}{n}\operatorname{trace}\big[U\Sigma^{-2}U^T U\Sigma V^T \operatorname{diag}(A^T(b-Ay))^2 V\Sigma U^T\big] \\
&= \frac{1}{n}\operatorname{trace}\big[U\Sigma^{-1} V^T \operatorname{diag}(A^T(b-Ay))^2 V\Sigma U^T\big] \\
&= \frac{1}{n}\operatorname{trace}\big[V^T \operatorname{diag}(A^T(b-Ay))^2 V\big] \\
&= \frac{1}{n}\big\|\operatorname{diag}(A^T(b-Ay))\, V\big\|_F^2 \\
&= \frac{1}{n}\sum_{i=1}^n \langle A_i, b-Ay\rangle^2 \|v_i\|^2 \\
&\le \frac{1}{n}\|A^T A(y-x^*)\|^2.
\end{aligned}
\]
The sixth equality is a consequence of $\operatorname{trace}(ABC) = \operatorname{trace}(BCA)$, and since $V = [v_1, v_2, \ldots, v_n]^T$ with $\|v_i\| \le 1$, the last inequality holds. □
Lemma 3.
Assume $n^2 - \lambda > 0$ and define
\[
\alpha_k = \frac{n - \gamma_k\lambda}{\gamma_k(n^2 - \lambda)}, \qquad \beta_k = 1 - \frac{\lambda\gamma_k}{n}.
\]
Then both sequences $\{\alpha_k\}$ and $\{\beta_k\}$ lie in the interval $[0, 1]$ if and only if $\gamma_k$ satisfies
\[
\frac{1}{n} \le \gamma_k \le \frac{n}{\lambda}.
\]
Moreover, with $\gamma_{-1} = 0$, if $\gamma_{k-1} \le \frac{1}{\sqrt{\lambda}}$, then $\gamma_k \in \big[\gamma_{k-1}, \frac{1}{\sqrt{\lambda}}\big]$.
Proof. 
The first part of the lemma clearly holds. For the second part, recall from (5) that $\gamma_k$ is the larger root of the following convex quadratic function:
\[
g(\gamma) = \gamma^2 - \frac{\gamma}{n}\big(1 - \lambda\gamma_{k-1}^2\big) - \gamma_{k-1}^2.
\]
We note the following:
\[
g(\gamma_{k-1}) = -\frac{\gamma_{k-1}}{n}\big(1 - \lambda\gamma_{k-1}^2\big) \le 0,
\]
\[
g\Big(\frac{1}{\sqrt{\lambda}}\Big) = \frac{1}{\lambda} - \frac{1}{n\sqrt{\lambda}}\big(1 - \lambda\gamma_{k-1}^2\big) - \gamma_{k-1}^2
= \frac{1}{\lambda} - \frac{1}{n\sqrt{\lambda}} + \gamma_{k-1}^2\Big(\frac{\sqrt{\lambda}}{n} - 1\Big)
\ge \frac{1}{\lambda} - \frac{1}{n\sqrt{\lambda}} + \frac{1}{\lambda}\Big(\frac{\sqrt{\lambda}}{n} - 1\Big) = 0,
\]
which together imply that $\gamma_k \in \big[\gamma_{k-1}, \frac{1}{\sqrt{\lambda}}\big]$. □
Lemma 4.
Let $a$, $b$, and $c$ be any vectors in $\mathbb{R}^n$; then the following identity holds:
\[
2\langle a - c, c - b\rangle = \|a - b\|^2 - \|a - c\|^2 - \|c - b\|^2.
\]
Theorem 1.
Consider the coordinate descent method with Nesterov's acceleration for solving linear equations, with $\lambda \in [0, \lambda_1]$, and let $x^*$ be the least-squares solution. Define $\sigma_1 = 1 + \frac{\sqrt{\lambda}}{2n}$ and $\sigma_2 = 1 - \frac{\sqrt{\lambda}}{2n}$. Then for all $k \ge 0$, we have
\[
\mathbb{E}\big(\|A(x_{k+1} - x^*)\|^2\big) \le \frac{4\lambda\,\|A(x_0 - x^*)\|^2}{\big(\sigma_1^{k+1} - \sigma_2^{k+1}\big)^2}
\]
and
\[
\mathbb{E}\big(\|A(v_{k+1} - x^*)\|^2_{(AA^T)^+}\big) \le \frac{4\,\|A(x_0 - x^*)\|^2_{(AA^T)^+}}{\big(\sigma_1^{k+1} + \sigma_2^{k+1}\big)^2}.
\]
Proof. 
We follow the standard notation and steps in [22,25]. By (5) and (6), the following relation holds:
\[
\frac{1 - \alpha_k}{\alpha_k} = \frac{n\gamma_{k-1}^2}{\gamma_k}. \qquad (13)
\]
From (5) and (7), we have
\[
\gamma_k^2 - \frac{\gamma_k}{n} - \beta_k\gamma_{k-1}^2 = 0. \qquad (14)
\]
Now, let us define $r_k^2 = \|A(v_k - x^*)\|^2_{(AA^T)^+}$. Then we have
\[
\begin{aligned}
r_{k+1}^2 &= \|A(v_{k+1} - x^*)\|^2_{(AA^T)^+} \\
&= \big\|A\big(\beta_k v_k + (1-\beta_k)y_k + \gamma_k\langle A_i, b - Ay_k\rangle e_i - x^*\big)\big\|^2_{(AA^T)^+} \\
&= \|A(\beta_k v_k + (1-\beta_k)y_k - x^*)\|^2_{(AA^T)^+} + \gamma_k^2\|A_i\langle A_i, b - Ay_k\rangle\|^2_{(AA^T)^+} \\
&\quad + 2\gamma_k\big\langle A(\beta_k v_k + (1-\beta_k)y_k - x^*), (AA^T)^+ A_i\langle A_i, b - Ay_k\rangle\big\rangle \\
&= \|A(\beta_k v_k + (1-\beta_k)y_k - x^*)\|^2_{(AA^T)^+} + \gamma_k^2\|A_i\langle A_i, b - Ay_k\rangle\|^2_{(AA^T)^+} \\
&\quad + 2\gamma_k\Big\langle A\Big(\beta_k\Big(\tfrac{1}{\alpha_k}y_k - \tfrac{1-\alpha_k}{\alpha_k}x_k\Big) + (1-\beta_k)y_k - x^*\Big), (AA^T)^+ A_i\langle A_i, b - Ay_k\rangle\Big\rangle \\
&= \|A(\beta_k v_k + (1-\beta_k)y_k - x^*)\|^2_{(AA^T)^+} + \gamma_k^2\|A_i\langle A_i, b - Ay_k\rangle\|^2_{(AA^T)^+} \\
&\quad + 2\gamma_k\Big\langle A(y_k - x^*) + \tfrac{1-\alpha_k}{\alpha_k}\beta_k A(y_k - x_k), (AA^T)^+ A_i\langle A_i, b - Ay_k\rangle\Big\rangle. \qquad (15)
\end{aligned}
\]
Now, we divide (15) into three parts and bound them separately. From the convexity of $\|\cdot\|^2_{(AA^T)^+}$ and the fact that $\beta_k \in [0,1]$ (Lemma 3), the first part of (15) satisfies
\[
\begin{aligned}
\|A(\beta_k v_k + (1-\beta_k)y_k - x^*)\|^2_{(AA^T)^+}
&= \|\beta_k A(v_k - x^*) + (1-\beta_k)A(y_k - x^*)\|^2_{(AA^T)^+} \\
&\le \beta_k\|A(v_k - x^*)\|^2_{(AA^T)^+} + (1-\beta_k)\|A(y_k - x^*)\|^2_{(AA^T)^+} \\
&= \beta_k\|A(v_k - x^*)\|^2_{(AA^T)^+} + \frac{\gamma_k\lambda}{n}\|A(y_k - x^*)\|^2_{(AA^T)^+} \\
&\le \beta_k\|A(v_k - x^*)\|^2_{(AA^T)^+} + \frac{\gamma_k}{n}\|A(y_k - x^*)\|^2, \qquad (16)
\end{aligned}
\]
where the last inequality makes use of $\lambda \le \lambda_1 = \frac{1}{\|(AA^T)^+\|}$. Using Lemmas 1 and 2, the second part of (15) satisfies
\[
\gamma_k^2\,\mathbb{E}_{i(k)\mid I(k-1)}\big(\|A_i\langle A_i, b - Ay_k\rangle\|^2_{(AA^T)^+}\big)
\le \frac{\gamma_k^2}{n}\|A^T A(y_k - x^*)\|^2
= \gamma_k^2\|A(y_k - x^*)\|^2 - \gamma_k^2\,\mathbb{E}_{i(k)\mid I(k-1)}\big(\|A(x_{k+1} - x^*)\|^2\big). \qquad (17)
\]
We use the identity of Lemma 4 in the last part of the proof. Taking expectations in the last part of (15), we obtain
\[
\begin{aligned}
&2\gamma_k\,\mathbb{E}_{i(k)\mid I(k-1)}\Big(\Big\langle A(y_k - x^*) + \tfrac{1-\alpha_k}{\alpha_k}\beta_k A(y_k - x_k), (AA^T)^+ A_i\langle A_i, b - Ay_k\rangle\Big\rangle\Big) \\
&\quad= 2\gamma_k\Big\langle A(y_k - x^*) + \tfrac{1-\alpha_k}{\alpha_k}\beta_k A(y_k - x_k), (AA^T)^+\,\mathbb{E}_{i(k)\mid I(k-1)}\big(A_i\langle A_i, b - Ay_k\rangle\big)\Big\rangle \\
&\quad= \frac{2\gamma_k}{n}\Big\langle A(y_k - x^*) + \tfrac{1-\alpha_k}{\alpha_k}\beta_k A(y_k - x_k), (AA^T)^+\sum_{i=1}^n A_i\langle A_i, b - Ay_k\rangle\Big\rangle \\
&\quad= \frac{2\gamma_k}{n}\Big\langle A(y_k - x^*) + \tfrac{1-\alpha_k}{\alpha_k}\beta_k A(y_k - x_k), (AA^T)^+ A A^T(b - Ay_k)\Big\rangle \\
&\quad= \frac{2\gamma_k}{n}\Big\langle A(y_k - x^*) + \tfrac{1-\alpha_k}{\alpha_k}\beta_k A(y_k - x_k), b - Ay_k\Big\rangle \\
&\quad= \frac{2\gamma_k}{n}\big\langle A(y_k - x^*), b - Ay_k\big\rangle + \frac{2\gamma_k}{n}\,\tfrac{1-\alpha_k}{\alpha_k}\beta_k\big\langle A(y_k - x_k), b - Ay_k\big\rangle \\
&\quad= -\frac{2\gamma_k}{n}\|A(y_k - x^*)\|^2 + \beta_k\gamma_{k-1}^2\big(\|A(x_k - x^*)\|^2 - \|A(y_k - x^*)\|^2 - \|A(y_k - x_k)\|^2\big) \\
&\quad\le -\Big(\frac{2\gamma_k}{n} + \beta_k\gamma_{k-1}^2\Big)\|A(y_k - x^*)\|^2 + \beta_k\gamma_{k-1}^2\|A(x_k - x^*)\|^2, \qquad (18)
\end{aligned}
\]
where the sixth equality makes use of (13) and the identity in Lemma 4. Substituting the three parts (16)–(18) into (15), we have
\[
\begin{aligned}
\mathbb{E}_{i(k)\mid I(k-1)}(r_{k+1}^2)
&\le \beta_k\|A(v_k - x^*)\|^2_{(AA^T)^+} + \frac{\gamma_k}{n}\|A(y_k - x^*)\|^2 + \gamma_k^2\|A(y_k - x^*)\|^2 \\
&\quad - \gamma_k^2\,\mathbb{E}_{i(k)\mid I(k-1)}\big(\|A(x_{k+1} - x^*)\|^2\big)
 - \Big(\frac{2\gamma_k}{n} + \beta_k\gamma_{k-1}^2\Big)\|A(y_k - x^*)\|^2 + \beta_k\gamma_{k-1}^2\|A(x_k - x^*)\|^2 \\
&= \beta_k\|A(v_k - x^*)\|^2_{(AA^T)^+} + \Big(\gamma_k^2 - \frac{\gamma_k}{n} - \beta_k\gamma_{k-1}^2\Big)\|A(y_k - x^*)\|^2 \\
&\quad - \gamma_k^2\,\mathbb{E}_{i(k)\mid I(k-1)}\big(\|A(x_{k+1} - x^*)\|^2\big) + \beta_k\gamma_{k-1}^2\|A(x_k - x^*)\|^2 \\
&= \beta_k\|A(v_k - x^*)\|^2_{(AA^T)^+} - \gamma_k^2\,\mathbb{E}_{i(k)\mid I(k-1)}\big(\|A(x_{k+1} - x^*)\|^2\big) + \beta_k\gamma_{k-1}^2\|A(x_k - x^*)\|^2, \qquad (19)
\end{aligned}
\]
where the last equality is a consequence of (14). Let us define two sequences $\{A_k\}$ and $\{B_k\}$ by
\[
B_{k+1}^2 = \frac{B_k^2}{\beta_k}, \qquad A_{k+1}^2 = \gamma_k^2 B_{k+1}^2. \qquad (20)
\]
Since $\beta_k \in (0,1]$, $B_k \ge 0$, and $B_0 \ne 0$, we have $B_{k+1} \ge B_k$. Because $\gamma_{-1} = 0$, we have $A_0 = 0$; moreover, $\gamma_k \in [\gamma_{k-1}, \frac{1}{\sqrt{\lambda}}]$ by Lemma 3, so $\{A_k\}$ is also an increasing sequence. Now, multiplying both sides of (19) by $B_{k+1}^2$ and using (20), we have
\[
B_{k+1}^2\,\mathbb{E}_{i(k)\mid I(k-1)}\|A(v_{k+1} - x^*)\|^2_{(AA^T)^+} + A_{k+1}^2\,\mathbb{E}_{i(k)\mid I(k-1)}\|A(x_{k+1} - x^*)\|^2
\le B_k^2\|A(v_k - x^*)\|^2_{(AA^T)^+} + A_k^2\|A(x_k - x^*)\|^2, \qquad (21)
\]
and then
\[
\begin{aligned}
&\mathbb{E}_{I(k)}\big(B_{k+1}^2\|A(v_{k+1} - x^*)\|^2_{(AA^T)^+} + A_{k+1}^2\|A(x_{k+1} - x^*)\|^2\big) \\
&\quad= \mathbb{E}_{I(k-1)}\big(B_{k+1}^2\,\mathbb{E}_{i(k)\mid I(k-1)}\|A(v_{k+1} - x^*)\|^2_{(AA^T)^+} + A_{k+1}^2\,\mathbb{E}_{i(k)\mid I(k-1)}\|A(x_{k+1} - x^*)\|^2\big) \\
&\quad\le \mathbb{E}_{I(k-1)}\big(B_k^2\|A(v_k - x^*)\|^2_{(AA^T)^+} + A_k^2\|A(x_k - x^*)\|^2\big) \\
&\quad\le \cdots \le \mathbb{E}_{I(0)}\big(B_1^2\|A(v_1 - x^*)\|^2_{(AA^T)^+} + A_1^2\|A(x_1 - x^*)\|^2\big) \\
&\quad\le B_0^2\|A(v_0 - x^*)\|^2_{(AA^T)^+} + A_0^2\|A(x_0 - x^*)\|^2 \\
&\quad= B_0^2\|A(v_0 - x^*)\|^2_{(AA^T)^+} = B_0^2\|A(x_0 - x^*)\|^2_{(AA^T)^+}. \qquad (22)
\end{aligned}
\]
So, by (22), we can obtain
\[
\mathbb{E}\,\|A(v_{k+1} - x^*)\|^2_{(AA^T)^+} \le \frac{B_0^2}{B_{k+1}^2}\|A(x_0 - x^*)\|^2_{(AA^T)^+},
\qquad
\mathbb{E}\,\|A(x_{k+1} - x^*)\|^2 \le \frac{B_0^2}{A_{k+1}^2}\|A(x_0 - x^*)\|^2_{(AA^T)^+}. \qquad (23)
\]
We now need to analyze the growth of the two sequences $\{A_k\}$ and $\{B_k\}$. Following the proofs in [22,26] for the Nesterov accelerated scheme and in [25] for the accelerated sampling Kaczmarz–Motzkin algorithm, we have
\[
B_k^2 = \beta_k B_{k+1}^2 = \Big(1 - \frac{\lambda\gamma_k}{n}\Big)B_{k+1}^2 = \Big(1 - \frac{\lambda A_{k+1}}{n B_{k+1}}\Big)B_{k+1}^2.
\]
This implies
\[
B_k^2 = \Big(1 - \frac{\lambda A_{k+1}}{n B_{k+1}}\Big)B_{k+1}^2 = B_{k+1}^2 - \frac{\lambda}{n}A_{k+1}B_{k+1},
\]
and then
\[
\frac{\lambda}{n}A_{k+1}B_{k+1} = B_{k+1}^2 - B_k^2 = (B_{k+1} - B_k)(B_{k+1} + B_k) \le 2B_{k+1}(B_{k+1} - B_k).
\]
Moreover, because $\{B_k\}$ and $\{A_k\}$ are increasing sequences, we can simplify and obtain
\[
B_{k+1} \ge B_k + \frac{\lambda}{2n}A_{k+1} \ge B_k + \frac{\lambda}{2n}A_k. \qquad (24)
\]
Similarly, we have
\[
\frac{A_{k+1}^2}{B_{k+1}^2} - \frac{A_{k+1}}{n B_{k+1}} = \gamma_k^2 - \frac{\gamma_k}{n} = \beta_k\gamma_{k-1}^2 = \frac{A_k^2}{B_{k+1}^2},
\]
where the second equality uses (14) and the third equality uses (20). Using the above relationship, we have
\[
\frac{1}{n}A_{k+1}B_{k+1} = A_{k+1}^2 - A_k^2 = (A_{k+1} + A_k)(A_{k+1} - A_k) \le 2A_{k+1}(A_{k+1} - A_k).
\]
Therefore,
\[
A_{k+1} \ge A_k + \frac{B_k}{2n}. \qquad (25)
\]
By combining the two expressions (25) and (24), we have
\[
\begin{pmatrix} A_{k+1} \\ B_{k+1} \end{pmatrix} \ge
\begin{pmatrix} 1 & \frac{1}{2n} \\ \frac{\lambda}{2n} & 1 \end{pmatrix}^{k+1}
\begin{pmatrix} A_0 \\ B_0 \end{pmatrix}.
\]
The Jordan decomposition of the matrix in the above expression is
\[
\begin{pmatrix} 1 & \frac{1}{2n} \\ \frac{\lambda}{2n} & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ \sqrt{\lambda} & -\sqrt{\lambda} \end{pmatrix}
\begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ \sqrt{\lambda} & -\sqrt{\lambda} \end{pmatrix}^{-1},
\]
where $\sigma_1 = 1 + \frac{\sqrt{\lambda}}{2n}$ and $\sigma_2 = 1 - \frac{\sqrt{\lambda}}{2n}$. Because $A_0 = 0$, we have
\[
\begin{pmatrix} A_{k+1} \\ B_{k+1} \end{pmatrix}
\ge \begin{pmatrix} 1 & \frac{1}{2n} \\ \frac{\lambda}{2n} & 1 \end{pmatrix}^{k+1}
\begin{pmatrix} A_0 \\ B_0 \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ \sqrt{\lambda} & -\sqrt{\lambda} \end{pmatrix}
\begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{pmatrix}^{k+1}
\begin{pmatrix} 1 & 1 \\ \sqrt{\lambda} & -\sqrt{\lambda} \end{pmatrix}^{-1}
\begin{pmatrix} 0 \\ B_0 \end{pmatrix}
= \frac{1}{2}\begin{pmatrix} \frac{(\sigma_1^{k+1} - \sigma_2^{k+1})B_0}{\sqrt{\lambda}} \\ (\sigma_1^{k+1} + \sigma_2^{k+1})B_0 \end{pmatrix}.
\]
This gives the growth bounds for the sequences $\{A_k\}$ and $\{B_k\}$. Substituting these bounds into (23), we have
\[
\mathbb{E}\,\|A(v_{k+1} - x^*)\|^2_{(AA^T)^+} \le \frac{B_0^2}{B_{k+1}^2}\|A(x_0 - x^*)\|^2_{(AA^T)^+} \le \frac{4\,\|A(x_0 - x^*)\|^2_{(AA^T)^+}}{\big(\sigma_1^{k+1} + \sigma_2^{k+1}\big)^2},
\]
\[
\mathbb{E}\,\|A(x_{k+1} - x^*)\|^2 \le \frac{B_0^2}{A_{k+1}^2}\|A(x_0 - x^*)\|^2_{(AA^T)^+} \le \frac{4\lambda\,\|A(x_0 - x^*)\|^2}{\big(\sigma_1^{k+1} - \sigma_2^{k+1}\big)^2},
\]
and the proof is complete. □
Remark 2.
From the relationship $y_k = \alpha_k v_k + (1-\alpha_k)x_k$ between $y_k$, $x_k$, and $v_k$, we know that
\[
\begin{aligned}
\mathbb{E}\,\|A(y_{k+1} - x^*)\|^2_{(AA^T)^+}
&= \mathbb{E}\,\big\|A\big(\alpha_{k+1}(v_{k+1} - x^*) + (1-\alpha_{k+1})(x_{k+1} - x^*)\big)\big\|^2_{(AA^T)^+} \\
&\le \alpha_{k+1}\,\mathbb{E}\,\|A(v_{k+1} - x^*)\|^2_{(AA^T)^+} + (1-\alpha_{k+1})\,\mathbb{E}\,\|A(x_{k+1} - x^*)\|^2_{(AA^T)^+},
\end{aligned}
\]
and
\[
\|A(x_{k+1} - x^*)\|^2_{(AA^T)^+} \le \|A(x_{k+1} - x^*)\|^2\,\|(AA^T)^+\| = \frac{\|A(x_{k+1} - x^*)\|^2}{\sigma_{\min}^4(A)},
\]
where $\sigma_{\min}(A)$ is the smallest nonzero singular value of $A$. By the above inequality and Theorem 1, we have
\[
\mathbb{E}\,\|A(y_{k+1} - x^*)\|^2_{(AA^T)^+}
\le \alpha_{k+1}\,\mathbb{E}\,\|A(v_{k+1} - x^*)\|^2_{(AA^T)^+} + \frac{1-\alpha_{k+1}}{\sigma_{\min}^4(A)}\,\mathbb{E}\,\|A(x_{k+1} - x^*)\|^2
\le \Bigg(\frac{4\alpha_{k+1}}{\big(\sigma_1^{k+1} + \sigma_2^{k+1}\big)^2} + \frac{4(1-\alpha_{k+1})\lambda}{\sigma_{\min}^4(A)\big(\sigma_1^{k+1} - \sigma_2^{k+1}\big)^2}\Bigg)\|A(x_0 - x^*)\|^2.
\]

3. Randomized Coordinate Descent with Momentum Method

The iterative formula of the gradient descent (GD) method is
\[
x_{k+1} = x_k - \lambda_k \nabla f(x_k),
\]
where $\lambda_k$ is a positive step-size parameter. Polyak [23] proposed the gradient descent method with momentum (GDm), also known as the heavy ball method, by introducing a momentum term $\delta(x_k - x_{k-1})$:
\[
x_{k+1} = x_k - \lambda_k \nabla f(x_k) + \delta(x_k - x_{k-1}),
\]
where $\delta$ is a momentum parameter. Letting $g(x_k)$ be an unbiased estimator of the true gradient $\nabla f(x_k)$, we have the stochastic gradient descent with momentum (mSGD) method:
\[
x_{k+1} = x_k - \lambda_k g(x_k) + \delta(x_k - x_{k-1}).
\]
The randomized coordinate descent with momentum (RCDm) method was proposed in [27]; we will give a new convergence bound for it. The RCDm method takes the explicit iterative form
\[
x_{k+1} = x_k + \frac{\langle A_i, b - A x_k\rangle}{\|A_i\|^2} e_i + \delta(x_k - x_{k-1}).
\]
The framework of the RCDm method is given as follows (Algorithm 2).
Algorithm 2 Randomized coordinate descent with momentum method (RCDm)
Input: $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $K \in \mathbb{N}$, $x_0 \in \mathbb{R}^n$, $\delta$.
1: Initialize $k = 0$.
2: while $k < K$ do
3:   Choose $i = i(k)$ from $\{1, 2, \ldots, n\}$ with equal probability.
4:   $x_{k+1} = x_k + \frac{\langle A_i, b - A x_k\rangle}{\|A_i\|^2} e_i + \delta(x_k - x_{k-1})$.
5:   $k = k + 1$.
6: end while
Output: $x_K$
Remark 3.
In order to avoid computing the matrix–vector product $A x_k$ in step 4, we adopt the following update:
\[
\alpha_k = \frac{\langle A_i, r_k\rangle}{\|A_i\|^2}, \qquad
r_{k+1} = (1+\delta)r_k - \alpha_k A_i - \delta r_{k-1},
\]
with $r_0 = b - A x_0$ and $r_{-1} = r_0$.
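A minimal MATLAB sketch of Algorithm 2 with the Remark 3 residual recursion might look as follows; this is our own illustrative code and naming, with $\delta$ as the momentum parameter.

```matlab
% Illustrative RCDm sketch (Algorithm 2 with the Remark 3 residual update).
function x = rcdm_sketch(A, b, K, delta)
    [~, n] = size(A);
    x = zeros(n, 1);  x_prev = x;          % x_{-1} = x_0
    r = b - A*x;      r_prev = r;          % r_{-1} = r_0 = b - A*x_0
    for k = 1:K
        i = randi(n);
        alpha = (A(:, i)' * r) / norm(A(:, i))^2;
        x_new = x;  x_new(i) = x_new(i) + alpha;
        x_new = x_new + delta*(x - x_prev);                    % momentum term
        r_new = (1 + delta)*r - alpha*A(:, i) - delta*r_prev;  % Remark 3
        x_prev = x;  x = x_new;
        r_prev = r;  r = r_new;
    end
end
```

With delta = 0 this reduces to the plain RCD iteration.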
Lemma 5
([27]). Fix $F_1 = F_0 \ge 0$ and let $\{F_k\}_{k\ge 0}$ be a sequence of nonnegative real numbers satisfying the relation
\[
F_{k+1} \le a_1 F_k + a_2 F_{k-1}, \quad k \ge 1,
\]
where $a_2 \ge 0$, $a_1 + a_2 < 1$, and at least one of the coefficients $a_1, a_2$ is positive. Then the sequence satisfies $F_{k+1} \le q^k(1+\xi)F_0$ for all $k \ge 1$, where $q = \frac{a_1 + \sqrt{a_1^2 + 4a_2}}{2}$ and $\xi = q - a_1 \ge 0$. Moreover,
\[
q \ge a_1 + a_2,
\]
with equality if and only if $a_2 = 0$ (in that case, $q = a_1$ and $\xi = 0$).
Theorem 2.
Assume $\delta \ge 0$, and that the expressions $a_1 = 1 - \frac{\sigma_{\min}^2(A)}{n} + 3\delta - \frac{3\delta\sigma_{\min}^2(A)}{n} + 2\delta^2$ and $a_2 = 2\delta^2 + \delta - \frac{\delta\sigma_{\min}^2(A)}{n}$ satisfy $a_1 + a_2 < 1$, where $\sigma_{\min}(A)$ is the smallest nonzero singular value of $A$. Let $\{x_k\}_{k=0}^{\infty}$ be the iteration sequence generated by the RCDm method starting from the initial guess $x_0 = 0$. Then it holds that
\[
\mathbb{E}\big(\|A(x_{k+1} - x^*)\|^2\big) \le q^k(1+\xi)\|A(x_0 - x^*)\|^2,
\]
where $q = \frac{a_1 + \sqrt{a_1^2 + 4a_2}}{2}$, $\xi = q - a_1 \ge 0$, and $x^*$ is the least-squares solution. Moreover, $a_1$, $a_2$, $q$ obey $a_1 + a_2 \le q < 1$.
Proof. 
From the RCDm iteration, we have
\[
\begin{aligned}
\mathbb{E}_{i(k)\mid I(k-1)}\|A(x_{k+1} - x^*)\|^2
&= \mathbb{E}_{i(k)\mid I(k-1)}\big\|A\big(x_k + \langle A_i, b - Ax_k\rangle e_i + \delta(x_k - x_{k-1}) - x^*\big)\big\|^2 \\
&= \mathbb{E}_{i(k)\mid I(k-1)}\big\|A(x_k - x^*) + A_i\langle A_i, b - Ax_k\rangle\big\|^2 + \delta^2\|A(x_k - x_{k-1})\|^2 \\
&\quad + 2\delta\,\mathbb{E}_{i(k)\mid I(k-1)}\big\langle A(x_k - x_{k-1}), A(x_k - x^*) + A_i\langle A_i, b - Ax_k\rangle\big\rangle. \qquad (28)
\end{aligned}
\]
We consider the three terms in (28) in turn. For the first term, we have
\[
\begin{aligned}
\mathbb{E}_{i(k)\mid I(k-1)}\big\|A(x_k - x^*) + A_i\langle A_i, b - Ax_k\rangle\big\|^2
&= \|A(x_k - x^*)\|^2 + \mathbb{E}_{i(k)\mid I(k-1)}\big\|A_i\langle A_i, b - Ax_k\rangle\big\|^2 \\
&\quad + 2\,\mathbb{E}_{i(k)\mid I(k-1)}\big\langle A(x_k - x^*), A_i\langle A_i, b - Ax_k\rangle\big\rangle \\
&= \|A(x_k - x^*)\|^2 + \frac{1}{n}\sum_{i=1}^n\big\|A_i\langle A_i, b - Ax_k\rangle\big\|^2 + \frac{2}{n}\sum_{i=1}^n\big\langle A(x_k - x^*), A_i\langle A_i, b - Ax_k\rangle\big\rangle \\
&= \|A(x_k - x^*)\|^2 - \frac{1}{n}\|A^T A(x_k - x^*)\|^2 \\
&\le \|A(x_k - x^*)\|^2 - \frac{\sigma_{\min}^2(A)}{n}\|A(x_k - x^*)\|^2
= \Big(1 - \frac{\sigma_{\min}^2(A)}{n}\Big)\|A(x_k - x^*)\|^2, \qquad (29)
\end{aligned}
\]
where the last inequality is a consequence of the singular value inequality ($\|Ax\|^2 \ge \sigma_{\min}^2(A)\|x\|^2$) and $n = \|A\|_F^2 \ge \sigma_{\min}^2(A)$. For the second term, we have
\[
\delta^2\|A(x_k - x_{k-1})\|^2 = \delta^2\|A(x_k - x^*) - A(x_{k-1} - x^*)\|^2 \le 2\delta^2\big(\|A(x_k - x^*)\|^2 + \|A(x_{k-1} - x^*)\|^2\big). \qquad (30)
\]
For the third term, we have
\[
\begin{aligned}
&2\delta\,\mathbb{E}_{i(k)\mid I(k-1)}\big\langle A(x_k - x_{k-1}), A(x_k - x^*) + A_i\langle A_i, b - Ax_k\rangle\big\rangle \\
&\quad= 2\delta\big\langle A(x_k - x_{k-1}), A(x_k - x^*)\big\rangle + 2\delta\,\mathbb{E}_{i(k)\mid I(k-1)}\big\langle A(x_k - x_{k-1}), A_i\langle A_i, b - Ax_k\rangle\big\rangle \\
&\quad= \delta\big(\|A(x_k - x_{k-1})\|^2 + \|A(x_k - x^*)\|^2 - \|A(x_{k-1} - x^*)\|^2\big)
 + \frac{2\delta}{n}\Big\langle A(x_k - x_{k-1}), \sum_{i=1}^n A_i\langle A_i, b - Ax_k\rangle\Big\rangle \\
&\quad= \delta\big(\|A(x_k - x_{k-1})\|^2 + \|A(x_k - x^*)\|^2 - \|A(x_{k-1} - x^*)\|^2\big)
 + \frac{2\delta}{n}\big\langle A(x_k - x_{k-1}), AA^T(b - Ax_k)\big\rangle \\
&\quad= \delta\big(\|A(x_k - x_{k-1})\|^2 + \|A(x_k - x^*)\|^2 - \|A(x_{k-1} - x^*)\|^2\big)
 - \frac{2\delta}{n}\big\langle A^T A(x_k - x_{k-1}), A^T A(x_k - x^*)\big\rangle \\
&\quad= \delta\big(\|A(x_k - x_{k-1})\|^2 + \|A(x_k - x^*)\|^2 - \|A(x_{k-1} - x^*)\|^2\big) \\
&\qquad - \frac{\delta}{n}\big(\|A^T A(x_k - x_{k-1})\|^2 + \|A^T A(x_k - x^*)\|^2 - \|A^T A(x_{k-1} - x^*)\|^2\big) \\
&\quad\le \delta\big(\|A(x_k - x_{k-1})\|^2 + \|A(x_k - x^*)\|^2 - \|A(x_{k-1} - x^*)\|^2\big) \\
&\qquad - \frac{\delta\sigma_{\min}^2(A)}{n}\big(\|A(x_k - x_{k-1})\|^2 + \|A(x_k - x^*)\|^2 - \|A(x_{k-1} - x^*)\|^2\big) \\
&\quad= \Big(\delta - \frac{\delta\sigma_{\min}^2(A)}{n}\Big)\big(\|A(x_k - x_{k-1})\|^2 + \|A(x_k - x^*)\|^2 - \|A(x_{k-1} - x^*)\|^2\big) \\
&\quad\le \Big(\delta - \frac{\delta\sigma_{\min}^2(A)}{n}\Big)\big(3\|A(x_k - x^*)\|^2 + \|A(x_{k-1} - x^*)\|^2\big), \qquad (31)
\end{aligned}
\]
where the second and fifth equalities use the identity in Lemma 4, the first inequality uses a singular value inequality, and the last inequality is a consequence of $\delta\|A(x_k - x_{k-1})\|^2 \le 2\delta\|A(x_k - x^*)\|^2 + 2\delta\|A(x_{k-1} - x^*)\|^2$. Using (29)–(31), we obtain
\[
\begin{aligned}
\mathbb{E}_{i(k)\mid I(k-1)}\|A(x_{k+1} - x^*)\|^2
&\le \Big(1 - \frac{\sigma_{\min}^2(A)}{n}\Big)\|A(x_k - x^*)\|^2 + 2\delta^2\big(\|A(x_k - x^*)\|^2 + \|A(x_{k-1} - x^*)\|^2\big) \\
&\quad + \Big(\delta - \frac{\delta\sigma_{\min}^2(A)}{n}\Big)\big(3\|A(x_k - x^*)\|^2 + \|A(x_{k-1} - x^*)\|^2\big).
\end{aligned}
\]
Taking the full expectation, we then have
\[
\mathbb{E}\,\|A(x_{k+1} - x^*)\|^2
\le \Big(1 - \frac{\sigma_{\min}^2(A)}{n} + 3\delta - \frac{3\delta\sigma_{\min}^2(A)}{n} + 2\delta^2\Big)\mathbb{E}\,\|A(x_k - x^*)\|^2
+ \Big(2\delta^2 + \delta - \frac{\delta\sigma_{\min}^2(A)}{n}\Big)\mathbb{E}\,\|A(x_{k-1} - x^*)\|^2.
\]
By Lemma 5, with $F_k = \mathbb{E}\,\|A(x_k - x^*)\|^2$, we have the relation
\[
F_{k+1} \le a_1 F_k + a_2 F_{k-1},
\]
and therefore
\[
\mathbb{E}\,\|A(x_k - x^*)\|^2 \le q^{k-1}(1+\xi)\|A(x_0 - x^*)\|^2,
\]
where $a_1 = 1 - \frac{\sigma_{\min}^2(A)}{n} + 3\delta - \frac{3\delta\sigma_{\min}^2(A)}{n} + 2\delta^2$, $a_2 = 2\delta^2 + \delta - \frac{\delta\sigma_{\min}^2(A)}{n}$, and $\xi = q - a_1 \ge 0$. This completes the proof. □
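For a concrete matrix and momentum parameter, the constants appearing in Theorem 2 can be evaluated directly. The helper below is our own sketch; it assumes $A$ has full column rank, so that min(svd(A)) is the smallest nonzero singular value.

```matlab
% Sketch: evaluate a1, a2, q and xi from Theorem 2 for given A and delta.
% Assumes A has full column rank; otherwise use the smallest *nonzero*
% singular value instead of min(svd(A)).
sn2 = min(svd(A))^2;                      % sigma_min(A)^2
n   = size(A, 2);
a1  = 1 - sn2/n + 3*delta - 3*delta*sn2/n + 2*delta^2;
a2  = 2*delta^2 + delta - delta*sn2/n;
assert(a1 + a2 < 1, 'Theorem 2 requires a1 + a2 < 1');
q   = (a1 + sqrt(a1^2 + 4*a2)) / 2;       % contraction factor
xi  = q - a1;                             % offset in the bound q^k*(1+xi)
```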
Remark 4.
Recall $a_1 = 1 - \frac{\sigma_{\min}^2(A)}{n} + 3\delta - \frac{3\delta\sigma_{\min}^2(A)}{n} + 2\delta^2$ and $a_2 = 2\delta^2 + \delta - \frac{\delta\sigma_{\min}^2(A)}{n}$. When $\delta = 0$, we obtain $a_1 = 1 - \frac{\sigma_{\min}^2(A)}{n}$ and $a_2 = 0$, which satisfy $a_1 + a_2 < 1$; when $\delta$ takes a small value, the relation $a_1 + a_2 < 1$ is still satisfied. In addition, the RCDm method degenerates to the RCD method when $\delta = 0$. The RCDm method converges faster than the RCD method if we choose a proper $\delta$; the numerical experiments will show the effectiveness of the RCDm method.
When $\delta \ge 0$, we can conclude that $a_2 \ge 0$. For the above theorem, we have to satisfy $a_1 + a_2 < 1$, where
\[
a_1 + a_2 = 4\delta^2 + 4\Big(1 - \frac{\sigma_{\min}^2(A)}{n}\Big)\delta + 1 - \frac{\sigma_{\min}^2(A)}{n}.
\]
Setting $\omega = \frac{\sigma_{\min}^2(A)}{n}$, it can be concluded that $\delta \in \big[0, \frac{\omega - 1 + \sqrt{(1-\omega)^2 + \omega}}{2}\big]$, and for $\delta$ in this range the RCDm method converges. However, in the later experiments the choice of $\delta$ may exceed this range, because a good deal of scaling was used in the derivation, so the admissible range given here is conservative.
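The admissible momentum range discussed above can be computed as follows; this is a small illustrative helper of ours, under the same full-column-rank assumption as before.

```matlab
% Sketch: theoretical upper bound on delta from Remark 4,
% with omega = sigma_min(A)^2 / n (assumes A has full column rank).
omega     = min(svd(A))^2 / size(A, 2);
delta_max = (omega - 1 + sqrt((1 - omega)^2 + omega)) / 2;
fprintf('RCDm is guaranteed to converge for delta in [0, %.4g]\n', delta_max);
```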

4. Numerical Experiments

In this section, we compare the influence of different δ on the RCDm algorithm and the effectiveness of the RCD, RCDm, and NARCD methods for solving the large linear system A x = b . All experiments were performed in MATLAB [28] (version R2018a), on a personal laptop with a 1.60 GHz central processing unit (Intel(R) Core(TM) i5-10210U CPU), 8.00 GB memory, and a Windows operating system (64 bits, Windows 10).
In all implementations, the starting point was chosen as x0 = zeros(n, 1), and the right-hand side vector was set to $b = A x^* + \epsilon$, where $\epsilon \in N(A^T)$ and $x^*$ = ones(n, 1). The relative residual error (RRE) at the $k$th iteration is defined as
\[
\mathrm{RRE} = \frac{\|b - A x_k\|^2}{\|b\|^2}.
\]
The iterations are terminated once the relative residual error satisfies $\mathrm{RRE} < 10^{-8}$ or the number of iteration steps exceeds 5,000,000; if the number of iteration steps exceeds 5,000,000, it is denoted as "-". IT and CPU denote the number of iteration steps and the CPU time (in seconds), respectively. The reported CPU and IT are the arithmetic averages of the elapsed running times and the required iteration steps over 50 repeated runs of the corresponding method. The speed-up of the RCD method against the RCDm method is defined as follows:
\[
\text{speed-up}_1 = \frac{\text{CPU of RCD}}{\text{CPU of RCDm}},
\]
and the speed-up of the RCD method against the NARCD method is defined as follows:
\[
\text{speed-up}_2 = \frac{\text{CPU of RCD}}{\text{CPU of NARCD}}.
\]
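For concreteness, the test problems and the stopping quantity described above could be set up along the following lines. This is our own sketch of the procedure described in the text (the projection used to obtain $\epsilon \in N(A^T)$ assumes $A$ has full column rank, and rcdm_sketch refers to the illustrative solver from Remark 3), not the authors' script.

```matlab
% Sketch of the experimental setup described in the text.
m = 800;  n = 300;
A = unifrnd(0, 1, m, n);            % uniformly distributed test matrix
xstar = ones(n, 1);
z = randn(m, 1);
eps_perp = z - A*(A\z);             % component of z in N(A^T) (full column rank A)
b = A*xstar + eps_perp;             % right-hand side with noise in N(A^T)

x = rcdm_sketch(A, b, 1e5, 0.3);    % any of the sketched solvers
RRE = norm(b - A*x)^2 / norm(b)^2;  % relative residual error, as defined above
```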

4.1. Experiments for Different δ on the RCDm

The matrix A is randomly generated using the MATLAB function unifrnd(0,1,m,n). We observe that RCDm with an appropriately chosen momentum parameter $0 < \delta \le 0.4$ always converges faster than its no-momentum variant. In this subsection, we let $\delta = 0, 0.1, 0.2, 0.3, 0.4$ to compare their performances. Numerical results are reported in Table 1, Table 2 and Table 3 and Figure 1. We can draw the following observation: when $\delta = 0.1, 0.2, 0.3, 0.4$, the acceleration effect is good.

4.2. Experiments for NARCD, RCDm, RCD, NASGD

Matrix A is randomly generated using the MATLAB function unifrnd(0,1,m,n). For the RCDm method, we take the momentum parameter $\delta = 0.3$; for the NARCD method, we take the Nesterov acceleration parameter $\lambda = 0.05$; for the Nesterov accelerated stochastic gradient descent (NASGD) method, the step size is $\alpha = 0.01$. We observe the performance of the RCD, RCDm, and NARCD methods for matrices A of different sizes. From Figure 2 and Table 4, Table 5, Table 6 and Table 7, we find that both the NARCD and the RCDm with appropriate momentum parameters accelerate the RCD; the NARCD and the RCDm always converge faster than the RCD. Moreover, the NARCD has a better acceleration effect than the RCDm. From Table 7, for the matrix $A \in \mathbb{R}^{8000 \times 3000}$, the NARCD method shows the best speed-up among the tested matrices, namely 3.0206. From Figure 3, we find that the speed-ups of the NARCD and RCDm methods change only gently as the matrix becomes larger, so these two methods still provide good speed-ups when the matrix is very large. From Figure 4, we find that the NARCD converges faster than the NASGD.

4.3. Experiment with Different Correlations of Matrix A

Matrix A is randomly generated using the MATLAB function unifrnd(c,1,m,n), $c \in [0, 1)$. We let $c = 0, 0.2, 0.4, 0.9$. For the RCDm method, we take the momentum parameter $\delta = 0.3$; for the NARCD method, we take the Nesterov acceleration parameter $\lambda = 0.05$. As the value of c increases, the correlation between the columns of A becomes stronger. From Table 8, Table 9, Table 10, Table 11 and Table 12, we see that as c increases, the condition number of the matrix increases; the larger the condition number, the more ill-conditioned the matrix, and the more time it takes to solve the system. From Table 10 and Table 12, we see that the acceleration effect of the RCDm does not change much as c increases, but the acceleration effect of the NARCD becomes better.

4.4. The Two-Dimensional Tomography Test Problems

In this section, we use the previously and newly proposed methods to reconstruct a 2D seismic travel-time tomography model. The 2D seismic travel-time tomography test problem is implemented in the function seismictomo(N, s, p) in the MATLAB package AIR Tools [29], which generates a sparse matrix A, an exact solution $x^*$ (shown in Figure 5a), and the right-hand side vector $b = A x^* + \epsilon$, where $\epsilon \in N(A^T)$. We set N = 20, s = 30, and p = 100 in the function seismictomo(N, s, p). We use the RCD, RCDm ($\delta = 0.3$), and NARCD ($\lambda = 0.05$) methods to solve the linear least-squares problem (1), running 90,000 iterations of each. From Figure 5, we see that, for the same number of iteration steps, the results of the NARCD method are better than those of the RCD and RCDm methods.
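A possible driver for this experiment is sketched below. It assumes AIR Tools is on the MATLAB path and that seismictomo returns [A, b, x] in that order, and it reuses the narcd_sketch routine from Section 2; the noise component in $N(A^T)$ mentioned in the text is omitted here for brevity.

```matlab
% Sketch of the seismic travel-time tomography experiment (AIR Tools).
N = 20;  s = 30;  p = 100;
[A, b, xexact] = seismictomo(N, s, p);   % sparse A, data b, exact solution x
xrec = narcd_sketch(A, b, 90000, 0.05);  % NARCD with lambda = 0.05

subplot(1, 2, 1); imagesc(reshape(xexact, N, N)); axis image; title('Exact');
subplot(1, 2, 2); imagesc(reshape(xrec,   N, N)); axis image; title('NARCD');
```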

5. Conclusions

To solve large systems of linear equations, two new acceleration methods for the RCD method are proposed, called the NARCD method and the RCDm method. Their convergence is proved, and estimates of the convergence rates of the NARCD method and the RCDm method are given, respectively. Both methods are shown to be effective in numerical experiments: for uniformly distributed data with appropriately chosen momentum parameters, the RCDm outperforms the RCD in IT and CPU, both the NARCD and the RCDm are faster than the RCD, and the NARCD has a better acceleration effect than the RCDm. In the case of an overdetermined linear system, for the NARCD method, the fatter the matrix (the larger n is relative to m), the better the acceleration. The acceleration effect of the NARCD also becomes better as c in the MATLAB function unifrnd(c, 1, m, n) increases. The block coordinate descent method is a very efficient method for solving large linear systems; in future work, it would be interesting to apply the two accelerated formats to the block coordinate descent method.

Author Contributions

Software, W.B.; Validation, F.Z.; Investigation, Q.W.; Writing—original draft, Q.W.; Writing—review and editing, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Key Research and Development program of China (2019YFC1408400).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Leventhal, D.; Lewis, A.S. Randomized methods for linear constraints: Convergence rates and conditioning. Math. Oper. Res. 2010, 35, 641–654. [Google Scholar] [CrossRef] [Green Version]
  2. Ruhe, A. Numerical aspects of Gram-Schmidt orthogonalization of vectors. Linear Algebra Its Appl. 1983, 52, 591–601. [Google Scholar] [CrossRef] [Green Version]
  3. Breheny, P.; Huang, J. Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 2011, 5, 232. [Google Scholar] [CrossRef] [Green Version]
  4. Chang, K.W.; Hsieh, C.J.; Lin, C.J. Coordinate descent method for large-scale l2-loss linear support vector machines. J. Mach. Learn. Res. 2008, 9, 1369–1398. [Google Scholar]
  5. Canutescu, A.A.; Dunbrack Jr, R.L. Cyclic coordinate descent: A robotics algorithm for protein loop closure. Protein Sci. 2003, 12, 963–972. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Bouman, C.A.; Sauer, K. A unified approach to statistical tomography using coordinate descent optimization. IEEE Trans. Image Process. 1996, 5, 480–492. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  7. Ye, J.C.; Webb, K.J.; Bouman, C.A.; Millane, R.P. Optical diffusion tomography by iterative-coordinate-descent optimization in a Bayesian framework. JOSA A 1999, 16, 2400–2412. [Google Scholar] [CrossRef]
  8. Bai, Z.Z.; Wu, W.T. On greedy randomized coordinate descent methods for solving large linear least-squares problems. Numer. Linear Algebra Appl. 2019, 26, e2237. [Google Scholar] [CrossRef]
  9. Zhang, J.; Guo, J. On relaxed greedy randomized coordinate descent methods for solving large linear least-squares problems. Appl. Numer. Math. 2020, 157, 372–384. [Google Scholar] [CrossRef]
  10. Lu, Z.; Xiao, L. On the complexity analysis of randomized block-coordinate descent methods. Math. Program. 2015, 152, 615–642. [Google Scholar] [CrossRef] [Green Version]
  11. Necoara, I.; Nesterov, Y.; Glineur, F. Random block coordinate descent methods for linearly constrained optimization over networks. J. Optim. Theory Appl. 2017, 173, 227–254. [Google Scholar] [CrossRef]
  12. Richtárik, P.; Takáč, M. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 2014, 144, 1–38. [Google Scholar] [CrossRef] [Green Version]
  13. Karczmarz, S. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Pol. Sci. Lett. Cl. Sci. Math. Nat. 1937, 35, 355–357. [Google Scholar]
  14. Strohmer, T.; Vershynin, R. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 2009, 15, 262–278. [Google Scholar] [CrossRef] [Green Version]
  15. Bai, Z.Z.; Wu, W.T. On greedy randomized Kaczmarz method for solving large sparse linear systems. SIAM J. Sci. Comput. 2018, 40, A592–A606. [Google Scholar] [CrossRef]
  16. Bai, Z.Z.; Wu, W.T. On relaxed greedy randomized Kaczmarz methods for solving large sparse linear systems. Appl. Math. Lett. 2018, 83, 21–26. [Google Scholar] [CrossRef]
  17. Liu, Y.; Gu, C.Q. Variant of greedy randomized Kaczmarz for ridge regression. Appl. Numer. Math. 2019, 143, 223–246. [Google Scholar] [CrossRef]
  18. Guan, Y.J.; Li, W.G.; Xing, L.L.; Qiao, T.T. A note on convergence rate of randomized Kaczmarz method. Calcolo 2020, 57, 1–11. [Google Scholar] [CrossRef]
  19. Du, K.; Gao, H. A new theoretical estimate for the convergence rate of the maximal weighted residual Kaczmarz algorithm. Numer. Math. Theory Methods Appl. 2019, 12, 627–639. [Google Scholar]
  20. Yang, X. A geometric probability randomized Kaczmarz method for large scale linear systems. Appl. Numer. Math. 2021, 164, 139–160. [Google Scholar] [CrossRef]
  21. Nesterov, Y. A method for unconstrained convex minimization problem with the rate of convergence O (1/k2). Dokl. Akad. Nauk Sssr 1983, 269, 543–547. [Google Scholar]
  22. Nesterov, Y. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 2012, 22, 341–362. [Google Scholar] [CrossRef] [Green Version]
  23. Polyak, B.T. Some methods of speeding up the convergence of iteration methods. Ussr Comput. Math. Math. Phys. 1964, 4, 1–17. [Google Scholar] [CrossRef]
  24. Sun, T.; Li, D.; Quan, Z.; Jiang, H.; Li, S.; Dou, Y. Heavy-ball algorithms always escape saddle points. arXiv 2019, arXiv:1907.09697. [Google Scholar]
  25. Sarowar Morshed, M.; Saiful Islam, M. Accelerated Sampling Kaczmarz Motzkin Algorithm for The Linear Feasibility Problem. J. Glob. Optim. 2019, 77, 361–382. [Google Scholar] [CrossRef]
  26. Liu, J.; Wright, S. An accelerated randomized Kaczmarz algorithm. Math. Comput. 2016, 85, 153–178. [Google Scholar] [CrossRef]
  27. Loizou, N.; Richtárik, P. Momentum and stochastic momentum for stochastic gradient, newton, proximal point and subspace descent methods. Comput. Optim. Appl. 2020, 77, 653–710. [Google Scholar] [CrossRef]
  28. Higham, D.J.; Higham, N.J. MATLAB Guide; SIAM: Philadelphia, PA, USA, 2016. [Google Scholar]
  29. Hansen, P.C.; Jørgensen, J.S. AIR Tools II: Algebraic iterative reconstruction methods, improved implementation. Numer. Algorithms 2018, 79, 107–137. [Google Scholar] [CrossRef]
Figure 1. (a,b): m = 300 rows and n = 150, 100 columns for different δ . (c,d): m = 800 rows and n = 300, 200 columns for different δ . (e,f): m = 8000 rows and n = 3000, 2000 columns for different δ .
Figure 2. (a,b): m = 4000 rows and n = 800, 1000 columns for RCD, RCDm, and NARCD. (c,d): m = 8000 rows and n = 2000, 3000 columns for RCD, RCDm, and NARCD. (e,f): m = 12,000 rows and n = 2000, 4000 columns for RCD, RCDm, and NARCD.
Figure 3. The speed-up of the RCD method against the NARCD and RCDm for matrices $A \in \mathbb{R}^{m \times n}$ with m = 300 × k and n = 100 × k.
Figure 4. m = 800 and n = 300 for NARCD and NASGD.
Figure 5. Performance of RCD, RCDm, and NARCD methods for the seismictomo test problem. (a) Exact seismic. (b) RCD. (c) RCDm. (d) NARCD.
Table 1. For different δ, IT and CPU of RCDm for matrices $A \in \mathbb{R}^{m \times n}$ with m = 8000 and different n.

            IT                                               CPU
δ       m × 3000   m × 2000   m × 1000   m × 800         m × 3000   m × 2000   m × 1000   m × 800
0       416,319    209,592    59,636     43,107          21.2625    9.9795     2.3001     1.9263
0.1     338,108    155,672    52,806     38,483          18.2890    7.8586     2.0735     1.5128
0.2     352,018    146,365    46,744     37,429          19.4076    7.3941     1.7775     1.4354
0.3     326,510    135,157    47,963     32,412          17.4489    6.8153     2.2577     1.2290
0.4     279,492    123,871    41,812     31,270          15.2037    6.2775     2.0089     1.2017
Table 2. For different δ, IT and CPU of RCDm for matrices $A \in \mathbb{R}^{m \times n}$ with m = 800 and different n.

            IT                                                  CPU
δ       800 × 300   800 × 200   800 × 100   800 × 50        800 × 300   800 × 200   800 × 100   800 × 50
0       43,760      18,200      5615        2449            0.2736      0.1239      0.0442      0.0149
0.1     39,442      16,200      5696        2188            0.2413      0.0929      0.0323      0.0131
0.2     30,900      15,724      5286        2427            0.2091      0.0861      0.0303      0.0150
0.3     28,501      14,047      4645        1858            0.2408      0.0791      0.0264      0.0118
0.4     25,918      12,211      4185        1735            0.1487      0.1374      0.0244      0.0114
Table 3. For different δ, IT and CPU of RCDm for matrices $A \in \mathbb{R}^{m \times n}$ with m = 300 and different n.

            IT                                                  CPU
δ       300 × 200   300 × 150   300 × 100   300 × 50        300 × 200   300 × 150   300 × 100   300 × 50
0       90,517      32,685      13,310      3858            0.4722      0.1976      0.0641      0.0169
0.1     67,610      30,382      13,022      3121            0.4669      0.1719      0.0566      0.0137
0.2     77,748      31,879      11,382      2654            0.4529      0.1657      0.0492      0.0126
0.3     78,023      25,130      9888        2490            0.4528      0.1540      0.0489      0.0147
0.4     66,663      18,566      7965        2344            0.4037      0.1046      0.0411      0.0115
Table 4. IT and CPU of RCD, RCDm, and NARCD for matrices $A \in \mathbb{R}^{m \times n}$ with m = 4000 and different n.

            IT                               CPU
        4000 × 800   4000 × 1000         4000 × 800   4000 × 1000
RCD     57,723       81,926              1.3939       1.9264
RCDm    44,962       66,425              1.1068       1.5824
NARCD   20,075       24,184              0.8657       0.9711
Table 5. IT and CPU of RCD, RCDm, and NARCD for matrices $A \in \mathbb{R}^{m \times n}$ with m = 8000 and different n.

            IT                                 CPU
        8000 × 2000   8000 × 3000          8000 × 2000   8000 × 3000
RCD     194,046       414,465              9.4357        24.2728
RCDm    144,592       312,665              8.0861        19.1512
NARCD   45,700        71,216               4.5801        8.0357
Table 6. IT and CPU of RCD, RCDm, and NARCD for matrices $A \in \mathbb{R}^{m \times n}$ with m = 12,000 and different n.

            IT                                     CPU
        12,000 × 2000   12,000 × 4000          12,000 × 2000   12,000 × 4000
RCD     146,007         465,484                10.8746         31.4180
RCDm    110,339         340,633                8.5894          23.4675
NARCD   43,110          89,046                 5.7312          11.0535
Table 7. The speed-up of the RCD method against the NARCD and RCDm.

              4000 × 1000   8000 × 3000   12,000 × 4000
speed-up1     1.2173        1.2674        1.3387
speed-up2     1.9837        3.0206        2.8423
Table 8. The condition number of different matrices.

                        c = 0      c = 0.2    c = 0.4     c = 0.9
cond(A_{800 × 300})     75.6431    113.8689   172.7404    1425.0834
cond(A_{1000 × 800})    452.9536   673.9111   1104.0601   8969.5561
Table 9. IT and CPU of RCD, RCDm, and NARCD for matrices $A \in \mathbb{R}^{m \times n}$ with m = 800, n = 300.

            IT                                              CPU
        c = 0    c = 0.2   c = 0.4    c = 0.9           c = 0    c = 0.2   c = 0.4   c = 0.9
RCD     34,953   68,289    150,982    -                 0.1993   0.4009    0.9189    -
RCDm    30,908   54,842    120,490    4,374,385         0.1715   0.3642    0.6787    24.3558
NARCD   8921     14,960    25,714     469,083           0.0708   0.1148    0.2134    3.5497
Table 10. The speed-up of the RCD method against the NARCD and RCDm, $A \in \mathbb{R}^{m \times n}$ with m = 800, n = 300.

              c = 0     c = 0.2    c = 0.4    c = 0.9
speed-up1     1.1620    1.1007     1.3539     -
speed-up2     2.5291    3.4921     4.3059     -
Table 11. IT and CPU of RCD, RCDm, and NARCD for matrices $A \in \mathbb{R}^{m \times n}$ with m = 1000, n = 800.

            IT                                                      CPU
        c = 0       c = 0.2     c = 0.4     c = 0.9             c = 0    c = 0.2   c = 0.4   c = 0.9
RCD     1,111,363   1,694,262   3,609,833   -                   8.0508   11.9815   25.3595   -
RCDm    746,948     1,128,363   2,190,455   -                   5.0748   7.7701    15.2623   -
NARCD   90,521      145,186     209,795     1,123,632           0.8005   1.0775    1.8670    10.4081
Table 12. The speed-up of the RCD method against the NARCD and RCDm, $A \in \mathbb{R}^{m \times n}$ with m = 1000, n = 800.

              c = 0      c = 0.2    c = 0.4    c = 0.9
speed-up1     1.5864     1.5420     1.6615     -
speed-up2     10.0572    11.1197    13.5830    -
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
