Article

An Accelerated Convex Optimization Algorithm with Line Search and Applications in Machine Learning

by
Dawan Chumpungam
1,
Panitarn Sarnmeta
2 and
Suthep Suantai
1,3,*
1
Data Science Research Center, Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
2
KOSEN-KMITL, Bangkok 10520, Thailand
3
Research Group in Mathematics and Applied Mathematics, Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(9), 1491; https://doi.org/10.3390/math10091491
Submission received: 16 March 2022 / Revised: 22 April 2022 / Accepted: 27 April 2022 / Published: 30 April 2022

Abstract

In this paper, we introduce a new line search technique and employ it to construct a novel accelerated forward–backward algorithm for solving convex minimization problems of the form of the sum of two convex functions, one of which is smooth, in a real Hilbert space. We establish weak convergence of the proposed algorithm to a solution without the Lipschitz assumption on the gradient of the objective function. Furthermore, we analyze its performance by applying it to classification problems on various data sets and comparing it with other line search algorithms. Based on the experiments, the proposed algorithm performs better than the other line search algorithms.

1. Introduction

The convex minimization problem in the form of the sum of two convex functions plays a very important role in machine learning. This problem has been analyzed and studied by many authors because of its applications in various fields such as data science, computer science, statistics, engineering, physics, and medical science. Some examples of these applications are signal processing, compressed sensing, medical image reconstruction, digital image processing, and data prediction and classification; see [1,2,3,4,5,6,7,8].
As we know, in machine learning, especially in data prediction and classification problems, the main objective is to minimize loss functions. Many loss functions can be viewed as convex functions; thus, by employing convex minimization, one can find the minimum of such functions, which in turn solves data prediction and classification problems. Many works have implemented this strategy; see [9,10,11] and the references therein for more information. In this work, we apply an extreme learning machine together with the least absolute shrinkage and selection operator to solve classification problems; more details will be given in a later section. First, we introduce the convex minimization problem, which can be formulated in the following form:
$$\min_{x \in H} \{ f(x) + g(x) \}, \qquad(1)$$
where $f : H \to \mathbb{R} \cup \{+\infty\}$ is proper, convex and differentiable on an open set containing $\mathrm{dom}(g)$, and $g : H \to \mathbb{R} \cup \{+\infty\}$ is a proper, lower semicontinuous convex function defined on a real Hilbert space H.
A solution of (1) is in fact a fixed point of the operator $\mathrm{prox}_{\alpha g}(I - \alpha \nabla f)$, i.e.,
$$x^{*} = \mathrm{prox}_{\alpha g}(I - \alpha \nabla f)(x^{*}), \qquad(2)$$
where $\alpha > 0$, and $\mathrm{prox}_{\alpha g}(I - \alpha \nabla f)(x) = \arg\min_{y \in H} \{ g(y) + \frac{1}{2\alpha} \| (x - \alpha \nabla f(x)) - y \|^{2} \}$, which is known as the forward–backward operator. In order to solve (1), the forward–backward algorithm [12] was introduced as follows:
$$x_{n+1} = \underbrace{\mathrm{prox}_{\alpha_n g}}_{\text{backward}} \underbrace{(I - \alpha_n \nabla f)}_{\text{forward}} (x_n), \quad \text{for all } n \in \mathbb{N}, \qquad(3)$$
where $\alpha_n$ is a positive number. If $\nabla f$ is L-Lipschitz continuous and $\alpha_n \in (0, \frac{2}{L})$, then a sequence generated by (3) converges weakly to a solution of (1). There are several techniques that can improve the performance of (3). For instance, we could utilize an inertial step, which was first introduced by Polyak [13], to solve smooth convex minimization problems. Since then, there have been several works that included an inertial step in their algorithms to accelerate the convergence behavior; see [14,15,16,17,18,19] for examples.
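To make iteration (3) concrete, the following is a minimal sketch in Python (not from the paper) for the common case $g(x) = \lambda \|x\|_1$, whose proximal operator is componentwise soft-thresholding; the gradient oracle, step size and iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding), applied componentwise.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def forward_backward(grad_f, x0, alpha, lam, n_iter=200):
    # x_{n+1} = prox_{alpha*g}(x_n - alpha*grad_f(x_n)) with g = lam * ||.||_1.
    x = x0.copy()
    for _ in range(n_iter):
        x = soft_threshold(x - alpha * grad_f(x), alpha * lam)  # forward step, then backward (prox) step
    return x
```

For instance, for $f(x) = \|Ax - b\|^2$ one would take grad_f = lambda x: 2 * A.T @ (A @ x - b) and a constant step $\alpha \in (0, 2/L)$ with $L = 2\|A\|^2$, matching the condition above.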
One of the most famous forward–backward-type algorithms that implements an inertial step is the fast iterative shrinkage–thresholding algorithm (FISTA) [20]. It is defined as the following Algorithm 1.
Algorithm 1. FISTA.
1: Input: Given $y_1 = x_0 \in \mathbb{R}^n$ and $t_1 = 1$. For $n \in \mathbb{N}$,
   $x_n = \mathrm{prox}_{\frac{1}{L} g}\left( y_n - \frac{1}{L} \nabla f(y_n) \right)$,
   $t_{n+1} = \frac{1 + \sqrt{1 + 4 t_n^2}}{2}$,
   $\theta_n = \frac{t_n - 1}{t_{n+1}}$,
   $y_{n+1} = x_n + \theta_n (x_n - x_{n-1})$,
where L is a Lipschitz constant of $\nabla f$.
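A compact sketch of the FISTA update in Python is given below for reference; it is not from the paper, and prox_g(v, alpha) (computing $\mathrm{prox}_{\alpha g}(v)$), grad_f and the Lipschitz constant L are assumed to be supplied by the user.

```python
import numpy as np

def fista(grad_f, prox_g, L, x0, n_iter=200):
    # FISTA: forward-backward step at the extrapolated point y_n, followed by
    # the update of t_n and the inertial parameter theta_n.
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(n_iter):
        x = prox_g(y - grad_f(y) / L, 1.0 / L)          # x_n = prox_{(1/L)g}(y_n - (1/L) grad f(y_n))
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        theta = (t - 1.0) / t_next
        y = x + theta * (x - x_prev)                    # inertial step: y_{n+1} = x_n + theta_n (x_n - x_{n-1})
        x_prev, t = x, t_next
    return x_prev
```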
The term $x_n + \theta_n (x_n - x_{n-1})$ is known as an inertial term with an inertial parameter $\theta_n$. It has been shown that FISTA performs better than (3). Later, other forward–backward-type algorithms were introduced and studied by many authors; see, for instance, [2,8,18,21,22]. However, most of these works assume that $\nabla f$ is Lipschitz continuous, and the Lipschitz constant is difficult to compute in general. Therefore, in this paper, we focus on another approach in which $\nabla f$ is not necessarily Lipschitz continuous.
In 2016, Cruz and Nghia [23] introduced a line search technique as the following Algorithm 2.
Algorithm 2. Line Search 1 $(x, \delta, \sigma, \theta)$.
1: Input: Given $x \in \mathrm{dom}(g)$, $\delta > 0$, $\sigma > 0$ and $\theta \in (0, 1)$.
2: Set $\gamma = \sigma$.
3: while $\gamma \| \nabla f(\mathrm{prox}_{\gamma g}(x - \gamma \nabla f(x))) - \nabla f(x) \| > \delta \| \mathrm{prox}_{\gamma g}(x - \gamma \nabla f(x)) - x \|$ do
4:     Set $\gamma = \theta \gamma$
5: end while
6: Output $\gamma$.
   They asserted that Line Search 1 stops after finitely many steps and proposed the following Algorithm 3.
Algorithm 3. Algorithm with Line Search 1.
1: Input: Given $x_0 \in \mathrm{dom}(g)$, $\delta \in (0, \frac{1}{2})$, $\sigma > 0$, and $\theta \in (0, 1)$. For all $n \in \mathbb{N}$,
   $x_{n+1} = \mathrm{prox}_{\gamma_n g}(I - \gamma_n \nabla f)(x_n)$,
where $\gamma_n :=$ Line Search 1 $(x_n, \delta, \sigma, \theta)$.
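The backtracking rule of Line Search 1 and the update of Algorithm 3 can be sketched as follows (Python, illustrative only; grad_f and prox_g(v, gamma), computing $\mathrm{prox}_{\gamma g}(v)$, are assumed callables, and the default parameter values are placeholders within the stated ranges).

```python
import numpy as np

def line_search_1(x, grad_f, prox_g, delta, sigma, theta):
    # Shrink gamma by the factor theta until
    # gamma * ||grad_f(p) - grad_f(x)|| <= delta * ||p - x||, where p = prox_{gamma g}(x - gamma grad_f(x)).
    gamma = sigma
    gx = grad_f(x)
    while True:
        p = prox_g(x - gamma * gx, gamma)
        if gamma * np.linalg.norm(grad_f(p) - gx) <= delta * np.linalg.norm(p - x):
            return gamma
        gamma *= theta

def algorithm_3(x0, grad_f, prox_g, delta=0.4, sigma=1.0, theta=0.5, n_iter=200):
    # x_{n+1} = prox_{gamma_n g}(x_n - gamma_n grad_f(x_n)), with gamma_n from Line Search 1.
    x = x0.copy()
    for _ in range(n_iter):
        gamma = line_search_1(x, grad_f, prox_g, delta, sigma, theta)
        x = prox_g(x - gamma * grad_f(x), gamma)
    return x
```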
They also showed that the sequence $\{x_n\}$ defined by Algorithm 3 converges weakly to a solution of (1) under Assumptions A1 and A2, where:
A1. $f, g$ are proper lower semicontinuous convex functions with $\mathrm{dom}(g) \subseteq \mathrm{dom}(f)$;
A2. $f$ is differentiable on an open set containing $\mathrm{dom}(g)$, and $\nabla f$ is uniformly continuous on any bounded subset of $\mathrm{dom}(g)$ and maps any bounded subset of $\mathrm{dom}(g)$ to a bounded set in H.
It is noted that the L-Lipschitz continuity of $\nabla f$ is not necessarily assumed. Moreover, if $\nabla f$ is L-Lipschitz continuous, then A2 is satisfied.
   In 2019, Kankam et al. [3] proposed a new line search technique, stated as the following Algorithm 4.
Algorithm 4. Line Search 2 $(x, \delta, \sigma, \theta)$.
1: Input: Given $x \in \mathrm{dom}(g)$, $\delta > 0$, $\sigma > 0$ and $\theta \in (0, 1)$. Set
   $L(x, \gamma) = \mathrm{prox}_{\gamma g}(x - \gamma \nabla f(x))$, and
   $S(x, \gamma) = \mathrm{prox}_{\gamma g}(L(x, \gamma) - \gamma \nabla f(L(x, \gamma)))$.
2: Set $\gamma = \sigma$.
3: while
   $\gamma \max \{ \| \nabla f(S(x, \gamma)) - \nabla f(L(x, \gamma)) \|, \| \nabla f(L(x, \gamma)) - \nabla f(x) \| \} > \delta ( \| S(x, \gamma) - L(x, \gamma) \| + \| L(x, \gamma) - x \| )$
   do
4:     Set $\gamma = \theta \gamma$, $L(x, \gamma) = L(x, \theta \gamma)$, $S(x, \gamma) = S(x, \theta \gamma)$
5: end while
6: Output $\gamma$.
They also asserted that Line Search 2 stops after finitely many steps and proposed the following Algorithm 5.
Algorithm 5. Algorithm with Line Search 2.
1: Input: Given $x_0 \in \mathrm{dom}(g)$, $\delta \in (0, \frac{1}{8})$, $\sigma > 0$ and $\theta \in (0, 1)$. For all $n \in \mathbb{N}$,
   $y_n = \mathrm{prox}_{\gamma_n g}(x_n - \gamma_n \nabla f(x_n))$,
   $x_{n+1} = \mathrm{prox}_{\gamma_n g}(y_n - \gamma_n \nabla f(y_n))$,
where $\gamma_n :=$ Line Search 2 $(x_n, \delta, \sigma, \theta)$.
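For comparison with the sketch given after Algorithm 3, a Python sketch of Line Search 2 and the two-step update of Algorithm 5 might look as follows (illustrative only; the same prox_g(v, gamma) and grad_f conventions are assumed).

```python
import numpy as np

def line_search_2(x, grad_f, prox_g, delta, sigma, theta):
    # Backtracking on the two-step quantities L(x, gamma) and S(x, gamma) of Line Search 2.
    gamma = sigma
    gx = grad_f(x)
    while True:
        Lx = prox_g(x - gamma * gx, gamma)                  # L(x, gamma)
        Sx = prox_g(Lx - gamma * grad_f(Lx), gamma)         # S(x, gamma)
        lhs = gamma * max(np.linalg.norm(grad_f(Sx) - grad_f(Lx)),
                          np.linalg.norm(grad_f(Lx) - gx))
        rhs = delta * (np.linalg.norm(Sx - Lx) + np.linalg.norm(Lx - x))
        if lhs <= rhs:
            return gamma
        gamma *= theta

def algorithm_5(x0, grad_f, prox_g, delta=0.1, sigma=1.0, theta=0.5, n_iter=200):
    # y_n = prox_{gamma_n g}(x_n - gamma_n grad_f(x_n)); x_{n+1} = prox_{gamma_n g}(y_n - gamma_n grad_f(y_n)).
    x = x0.copy()
    for _ in range(n_iter):
        gamma = line_search_2(x, grad_f, prox_g, delta, sigma, theta)
        y = prox_g(x - gamma * grad_f(x), gamma)
        x = prox_g(y - gamma * grad_f(y), gamma)
    return x
```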
A weak convergence result for this algorithm was obtained under Assumptions A1 and A2. Although Algorithms 3 and 5 achieve weak convergence without the Lipschitz assumption on $\nabla f$, neither algorithm utilizes an inertial step. Therefore, it is interesting to investigate whether this technique can improve their convergence behavior.
Motivated by the works mentioned earlier, we aim to introduce a new line search technique and prove that it is well defined. Then, we employ it to construct a novel forward–backward algorithm that utilizes an inertial step to accelerate its convergence. We prove a weak convergence theorem for the proposed algorithm without the Lipschitz assumption on $\nabla f$ and apply it to solve classification problems on various data sets. We also compare its performance with Algorithms 3 and 5 to show that the proposed algorithm performs better.
This work is organized as follows: In Section 2, we recall some important definitions and lemmas used in later sections. In Section 3, we introduce a new line search technique and algorithm for solving (1). Then, we analyze the convergence and complexity of the proposed algorithm under Assumptions A1 and A2. In Section 4, we apply the proposed algorithm to solve data classification problems and compare its performance with other algorithms. Finally, the conclusion of this work is presented in Section 5.

2. Preliminaries

In this section, some important definitions and lemmas, which will be used in later sections, are presented.
Let $\{x_n\}$ be a sequence in H and $x \in H$. We denote by $x_n \to x$ and $x_n \rightharpoonup x$ the strong and weak convergence of $\{x_n\}$ to x, respectively. Let $f : H \to \mathbb{R} \cup \{+\infty\}$ be a proper lower semicontinuous and convex function. We denote $\mathrm{dom}(f) = \{x \in H : f(x) < +\infty\}$.
The subdifferential of f at x is defined by
$$\partial f(x) := \{ u \in H : \langle u, y - x \rangle + f(x) \leq f(y), \ \forall y \in H \}.$$
The proximal operator $\mathrm{prox}_{\alpha f} : H \to \mathrm{dom}(f)$ is defined as follows:
$$\mathrm{prox}_{\alpha f}(x) = (I + \alpha \partial f)^{-1}(x),$$
where I is the identity mapping and $\alpha$ is a positive number. It is well known that this operator is single-valued, nonexpansive, and
$$x - \mathrm{prox}_{\alpha f}(x) \in \alpha \, \partial f(\mathrm{prox}_{\alpha f}(x)), \quad \text{for all } x \in H \text{ and } \alpha > 0; \qquad(4)$$
see [23] for more details. Next, we present some important lemmas for this work.
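As a standard worked example (not specific to this paper), the proximal operator of the $\ell_1$ term used later, $f = \lambda \| \cdot \|_1$, has a closed form given by componentwise soft-thresholding:
$$[\mathrm{prox}_{\alpha \lambda \| \cdot \|_1}(x)]_i = \operatorname{sign}(x_i) \max\{ |x_i| - \alpha\lambda, 0 \} = \begin{cases} x_i - \alpha\lambda, & x_i > \alpha\lambda, \\ 0, & |x_i| \leq \alpha\lambda, \\ x_i + \alpha\lambda, & x_i < -\alpha\lambda. \end{cases}$$
One can check directly that $x - \mathrm{prox}_{\alpha \lambda \| \cdot \|_1}(x) \in \alpha \, \partial (\lambda \| \cdot \|_1)(\mathrm{prox}_{\alpha \lambda \| \cdot \|_1}(x))$, which is exactly inclusion (4) for this choice of f.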
Lemma 1 ([24]). Let $\partial f$ be the subdifferential of f. Then, the following hold:
(i) $\partial f$ is maximal monotone;
(ii) $\mathrm{Gph}(\partial f) := \{ (x, y) \in H \times H : y \in \partial f(x) \}$ is demiclosed, i.e., for any sequence $\{(x_n, y_n)\} \subseteq \mathrm{Gph}(\partial f)$ such that $x_n \rightharpoonup x$ and $y_n \to y$, we have $(x, y) \in \mathrm{Gph}(\partial f)$.
Lemma 2 ([25]). Let $f, g : H \to \mathbb{R} \cup \{+\infty\}$ be proper lower semicontinuous convex functions with $\mathrm{dom}(g) \subseteq \mathrm{dom}(f)$ and $J(x, \alpha) = \mathrm{prox}_{\alpha g}(x - \alpha \nabla f(x))$. Then, for any $x \in \mathrm{dom}(g)$ and $\alpha_2 \geq \alpha_1 > 0$, we have
$$\frac{\alpha_2}{\alpha_1} \| x - J(x, \alpha_1) \| \geq \| x - J(x, \alpha_2) \| \geq \| x - J(x, \alpha_1) \|.$$
Lemma 3 ([26]). Let H be a real Hilbert space. Then, for all $a, b, c \in H$ and $\zeta \in [0, 1]$, the following hold:
(i) $\| a \pm b \|^2 = \| a \|^2 \pm 2 \langle a, b \rangle + \| b \|^2$;
(ii) $\| \zeta a + (1 - \zeta) b \|^2 = \zeta \| a \|^2 + (1 - \zeta) \| b \|^2 - \zeta (1 - \zeta) \| a - b \|^2$;
(iii) $\| a + b \|^2 \leq \| a \|^2 + 2 \langle b, a + b \rangle$;
(iv) $\langle a - b, b - c \rangle = \frac{1}{2} ( \| a - c \|^2 - \| a - b \|^2 - \| b - c \|^2 )$.
Lemma 4 ([8]). Let $\{a_n\}$ and $\{b_n\}$ be sequences of non-negative real numbers such that
$$a_{n+1} \leq (1 + b_n) a_n + b_n a_{n-1}, \quad \text{for all } n \in \mathbb{N}.$$
Then, the following holds:
$$a_{n+1} \leq K \cdot \prod_{j=1}^{n} (1 + 2 b_j), \quad \text{where } K = \max\{a_1, a_2\}.$$
Moreover, if $\sum_{n=1}^{+\infty} b_n < +\infty$, then $\{a_n\}$ is bounded.
Lemma 5 ([26]). Let $\{\alpha_n\}$, $\{\beta_n\}$ and $\{\gamma_n\}$ be sequences of non-negative real numbers such that
$$\alpha_{n+1} \leq (1 + \gamma_n) \alpha_n + \beta_n, \quad \text{for all } n \in \mathbb{N}.$$
If $\sum_{n=1}^{+\infty} \gamma_n < +\infty$ and $\sum_{n=1}^{+\infty} \beta_n < +\infty$, then $\lim_{n \to +\infty} \alpha_n$ exists.
Lemma 6 ([27], Opial). Let $\{x_n\}$ be a sequence in a Hilbert space H. If there exists a nonempty subset $\Omega$ of H such that the following hold:
(i) For any $x^* \in \Omega$, $\lim_{n \to +\infty} \| x_n - x^* \|$ exists;
(ii) Every weak-cluster point of $\{x_n\}$ belongs to $\Omega$.
Then, $\{x_n\}$ converges weakly to an element of $\Omega$.

3. Main Results

In this section, we define a new line search technique and a new accelerated algorithm with the new line search for solving (1). We denote by $S_*$ the set of all solutions of (1) and suppose that $f, g : H \to \mathbb{R} \cup \{+\infty\}$ are two convex functions that satisfy Assumptions A1 and A2 and that $\mathrm{dom}(g)$ is closed. Furthermore, we also suppose that $S_* \neq \emptyset$.
We first introduce a new line search technique as the following Algorithm 6.
Algorithm 6. Line Search 3 $(x, \delta, \sigma, \theta)$.
1: Input: Given $x \in \mathrm{dom}(g)$, $\delta > 0$, $\sigma > 0$ and $\theta \in (0, 1)$. Set
   $L(x, \gamma) = \mathrm{prox}_{\gamma g}(x - \gamma \nabla f(x))$, and
   $S(x, \gamma) = \mathrm{prox}_{\gamma g}(L(x, \gamma) - \gamma \nabla f(L(x, \gamma)))$.
2: Set $\gamma = \sigma$.
3: while
   $\frac{\gamma}{2} \left( \| \nabla f(S(x, \gamma)) - \nabla f(L(x, \gamma)) \| + \| \nabla f(L(x, \gamma)) - \nabla f(x) \| \right) > \delta \left( \| S(x, \gamma) - L(x, \gamma) \| + \| L(x, \gamma) - x \| \right)$,
   or $\gamma \| \nabla f(L(x, \gamma)) - \nabla f(x) \| > 4\delta \| L(x, \gamma) - x \|$
   do
4:     Set $\gamma = \theta \gamma$, $L(x, \gamma) = L(x, \theta \gamma)$, $S(x, \gamma) = S(x, \theta \gamma)$
5: end while
6: Output $\gamma$.
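A direct translation of Line Search 3 into Python might look as follows; this is a sketch only, with grad_f and prox_g(v, gamma) assumed callables, and the loop test mirrors the two conditions in step 3.

```python
import numpy as np

def line_search_3(x, grad_f, prox_g, delta, sigma, theta):
    # Shrink gamma by theta until BOTH conditions of step 3 fail, then return gamma.
    gamma = sigma
    gx = grad_f(x)
    while True:
        Lx = prox_g(x - gamma * gx, gamma)                 # L(x, gamma)
        Sx = prox_g(Lx - gamma * grad_f(Lx), gamma)        # S(x, gamma)
        cond1 = (gamma / 2.0) * (np.linalg.norm(grad_f(Sx) - grad_f(Lx))
                                 + np.linalg.norm(grad_f(Lx) - gx)) \
                > delta * (np.linalg.norm(Sx - Lx) + np.linalg.norm(Lx - x))
        cond2 = gamma * np.linalg.norm(grad_f(Lx) - gx) \
                > 4.0 * delta * np.linalg.norm(Lx - x)
        if not (cond1 or cond2):
            return gamma
        gamma *= theta
```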
We first show that Line Search 3 terminates at finitely many steps.
Lemma 7.
Line Search 3 stops at finitely many steps.
Proof. 
If $x \in S_*$, then $x = L(x, \sigma) = S(x, \sigma)$, so Line Search 3 stops with zero steps. If $x \notin S_*$, suppose by contradiction that, for all $n \in \mathbb{N}$, the following hold:
$$\frac{\sigma\theta^n}{2} \left( \| \nabla f(S(x, \sigma\theta^n)) - \nabla f(L(x, \sigma\theta^n)) \| + \| \nabla f(L(x, \sigma\theta^n)) - \nabla f(x) \| \right) > \delta \left( \| S(x, \sigma\theta^n) - L(x, \sigma\theta^n) \| + \| L(x, \sigma\theta^n) - x \| \right), \qquad(5)$$
or
$$\sigma\theta^n \| \nabla f(L(x, \sigma\theta^n)) - \nabla f(x) \| > 4\delta \| L(x, \sigma\theta^n) - x \|. \qquad(6)$$
Then, from these assumptions, we can find a subsequence $\{\sigma\theta^{n_k}\}$ of $\{\sigma\theta^n\}$ such that (5) or (6) holds. First, we show that
$$\{ \| \nabla f(L(x, \sigma\theta^n)) - \nabla f(x) \| \} \quad \text{and} \quad \{ \| \nabla f(S(x, \sigma\theta^n)) - \nabla f(L(x, \sigma\theta^n)) \| \}$$
are bounded. It follows from Lemma 2 that
$$\| L(x, \sigma\theta^n) - x \| \leq \| L(x, \sigma) - x \|,$$
for all $n \in \mathbb{N}$. In combination with A2, we conclude that $\{ \| \nabla f(L(x, \sigma\theta^n)) - \nabla f(x) \| \}$ is bounded. Next, we prove that $\{ \| \nabla f(S(x, \sigma\theta^n)) - \nabla f(L(x, \sigma\theta^n)) \| \}$ is bounded. Since $\mathrm{prox}_{\gamma g}$ is nonexpansive for any $\gamma > 0$, we have
$$\begin{aligned}
\| S(x, \sigma\theta^n) - L(x, \sigma\theta^n) \| &= \| \mathrm{prox}_{\sigma\theta^n g}(L(x, \sigma\theta^n) - \sigma\theta^n \nabla f(L(x, \sigma\theta^n))) - \mathrm{prox}_{\sigma\theta^n g}(x - \sigma\theta^n \nabla f(x)) \| \\
&\leq \| (L(x, \sigma\theta^n) - \sigma\theta^n \nabla f(L(x, \sigma\theta^n))) - (x - \sigma\theta^n \nabla f(x)) \| \\
&\leq \| L(x, \sigma\theta^n) - x \| + \sigma\theta^n \| \nabla f(L(x, \sigma\theta^n)) - \nabla f(x) \| \\
&\leq \| L(x, \sigma\theta^n) - x \| + \sigma \| \nabla f(L(x, \sigma\theta^n)) - \nabla f(x) \|,
\end{aligned}$$
for all $n \in \mathbb{N}$; hence, $\{ \| S(x, \sigma\theta^n) - L(x, \sigma\theta^n) \| \}$ is bounded. Again, it follows from A2 that $\{ \| \nabla f(S(x, \sigma\theta^n)) - \nabla f(L(x, \sigma\theta^n)) \| \}$ is bounded. To complete the proof, we consider the only two possible cases and derive a contradiction in each.
Case 1: Suppose that there exists a subsequence $\{\sigma\theta^{n_k}\}$ of $\{\sigma\theta^n\}$ such that (5) holds for all $k \in \mathbb{N}$. Then, it follows that $\| S(x, \sigma\theta^{n_k}) - L(x, \sigma\theta^{n_k}) \| \to 0$ and $\| L(x, \sigma\theta^{n_k}) - x \| \to 0$, as $k \to +\infty$. Since $\nabla f$ is uniformly continuous, we obtain
$$\| \nabla f(S(x, \sigma\theta^{n_k})) - \nabla f(L(x, \sigma\theta^{n_k})) \| \to 0 \quad \text{and} \quad \| \nabla f(L(x, \sigma\theta^{n_k})) - \nabla f(x) \| \to 0,$$
as $k \to +\infty$. Therefore, it follows from (5) that $\frac{\| L(x, \sigma\theta^{n_k}) - x \|}{\sigma\theta^{n_k}} \to 0$, as $k \to +\infty$. By (4), we obtain
$$x - \sigma\theta^{n_k} \nabla f(x) - L(x, \sigma\theta^{n_k}) \in \sigma\theta^{n_k} \partial g(L(x, \sigma\theta^{n_k})).$$
Thus, $\frac{x - L(x, \sigma\theta^{n_k})}{\sigma\theta^{n_k}} - \nabla f(x) \in \partial g(L(x, \sigma\theta^{n_k}))$. Since $L(x, \sigma\theta^{n_k}) \to x$, as $k \to +\infty$, we obtain from Lemma 1 that $0 \in \nabla f(x) + \partial g(x) \subseteq \partial (f + g)(x)$. Hence, $x \in S_*$, which is a contradiction.
Case 2: Suppose that there is a subsequence $\{\sigma\theta^{n_k}\}$ of $\{\sigma\theta^n\}$ satisfying (6) for all $k \in \mathbb{N}$. Then, $\| L(x, \sigma\theta^{n_k}) - x \| \to 0$, as $k \to +\infty$. Again, from the uniform continuity of $\nabla f$, we have
$$\| \nabla f(L(x, \sigma\theta^{n_k})) - \nabla f(x) \| \to 0,$$
as $k \to +\infty$. From (6), we conclude that
$$\frac{\| L(x, \sigma\theta^{n_k}) - x \|}{\sigma\theta^{n_k}} \to 0,$$
as $k \to +\infty$. By the same argument as in Case 1, we can show that $0 \in \partial (f + g)(x)$, and hence, $x \in S_*$, a contradiction. Therefore, we conclude that Line Search 3 stops after finitely many steps, and the proof is complete.    □
We propose a new inertial algorithm with Line Search 3 as the following Algorithm 7.
Algorithm 7. Inertial algorithm with Line Search 3.
1: Input: Given $x_0, x_1 \in \mathrm{dom}(g)$, $\alpha_n \in [0, 1]$, $\beta_n \geq 0$, $\sigma > 0$, $\theta \in (0, 1)$ and $\delta \in (0, \frac{1}{8})$. For $n \in \mathbb{N}$,
   $y_n = x_n + \beta_n (x_n - x_{n-1})$,
   $z_n = P_{\mathrm{dom}(g)}(y_n)$,
   $w_n = \mathrm{prox}_{\gamma_n g}(z_n - \gamma_n \nabla f(z_n))$,
   $x_{n+1} = (1 - \alpha_n) w_n + \alpha_n \mathrm{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n))$,
where $\gamma_n :=$ Line Search 3 $(z_n, \delta, \sigma, \theta)$, and $P_{\mathrm{dom}(g)}$ is the metric projection onto $\mathrm{dom}(g)$.
The diagram of Algorithm 7 can be seen in Figure 1.
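The main loop of Algorithm 7 can be sketched in Python as below, reusing line_search_3 from the sketch given after Algorithm 6; project_dom stands for the metric projection onto $\mathrm{dom}(g)$ (the identity map when $\mathrm{dom}(g)$ is the whole space), and the default parameter values are only placeholders consistent with the assumptions of Theorem 9.

```python
def algorithm_7(x0, x1, grad_f, prox_g, project_dom, beta, alpha=0.5,
                delta=0.1, sigma=1.0, theta=0.5, n_iter=200):
    # beta: callable n -> beta_n with a summable sequence (condition B1); alpha in [0, 1]; delta in (0, 1/8).
    x_prev, x = x0.copy(), x1.copy()
    for n in range(1, n_iter + 1):
        y = x + beta(n) * (x - x_prev)                   # inertial step y_n = x_n + beta_n (x_n - x_{n-1})
        z = project_dom(y)                               # z_n = P_{dom(g)}(y_n)
        gamma = line_search_3(z, grad_f, prox_g, delta, sigma, theta)
        w = prox_g(z - gamma * grad_f(z), gamma)         # w_n: forward-backward step at z_n
        v = prox_g(w - gamma * grad_f(w), gamma)         # second forward-backward step at w_n
        x_prev, x = x, (1.0 - alpha) * w + alpha * v     # convex combination x_{n+1}
    return x
```

As an example of a summable inertial schedule, one could take beta(n) = 0.95 for n <= 1000 and 1/n**2 afterwards, which is the choice used in the experiments of Section 4.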
Next, we prove the following lemma, which will play a crucial role in our main theorems.
Lemma 8.
Let $\gamma_n :=$ Line Search 3 $(z_n, \delta, \sigma, \theta)$. Then, for all $n \in \mathbb{N}$ and $x \in \mathrm{dom}(g)$, the following hold:
(I) $\| z_n - x \|^2 \geq \| w_n - x \|^2 + 2\gamma_n [ (f+g)(w_n) - (f+g)(x) ] + (1 - 8\delta) \| w_n - z_n \|^2$;
(II) $\| z_n - x \|^2 \geq \| v_n - x \|^2 + 2\gamma_n [ (f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x) ] + (1 - 8\delta) ( \| w_n - z_n \|^2 + \| v_n - w_n \|^2 )$,
where $v_n = \mathrm{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n))$.
Proof. 
First, we show that (I) is true. From (4), we know that
$$\frac{z_n - w_n}{\gamma_n} - \nabla f(z_n) \in \partial g(w_n), \quad \text{for all } n \in \mathbb{N}.$$
Moreover, it follows from the definitions of $\partial g(w_n)$, $\nabla f(z_n)$ and $\nabla f(w_n)$ that
$$g(x) - g(w_n) \geq \left\langle \frac{z_n - w_n}{\gamma_n} - \nabla f(z_n), x - w_n \right\rangle,$$
$$f(x) - f(z_n) \geq \langle \nabla f(z_n), x - z_n \rangle \quad \text{and} \quad f(z_n) - f(w_n) \geq \langle \nabla f(w_n), z_n - w_n \rangle,$$
for all $n \in \mathbb{N}$. Consequently,
$$\begin{aligned}
f(x) - f(z_n) + g(x) - g(w_n) &\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \langle \nabla f(z_n), w_n - z_n \rangle \\
&= \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \langle \nabla f(z_n) - \nabla f(w_n), w_n - z_n \rangle + \langle \nabla f(w_n), w_n - z_n \rangle \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle - \| \nabla f(z_n) - \nabla f(w_n) \| \| w_n - z_n \| + \langle \nabla f(w_n), w_n - z_n \rangle \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle - \frac{4\delta}{\gamma_n} \| w_n - z_n \|^2 + f(w_n) - f(z_n),
\end{aligned}$$
for all $n \in \mathbb{N}$. It follows that
$$\frac{1}{\gamma_n} \langle z_n - w_n, w_n - x \rangle \geq (f+g)(w_n) - (f+g)(x) - \frac{4\delta}{\gamma_n} \| w_n - z_n \|^2, \quad \text{for all } n \in \mathbb{N}.$$
From Lemma 3, we have $\langle z_n - w_n, w_n - x \rangle = \frac{1}{2} ( \| z_n - x \|^2 - \| z_n - w_n \|^2 - \| w_n - x \|^2 )$, and hence,
$$\frac{1}{2\gamma_n} ( \| z_n - x \|^2 - \| z_n - w_n \|^2 - \| w_n - x \|^2 ) \geq (f+g)(w_n) - (f+g)(x) - \frac{4\delta}{\gamma_n} \| w_n - z_n \|^2,$$
for all $n \in \mathbb{N}$. Then, it follows that, for any $x \in \mathrm{dom}(g)$,
$$\| z_n - x \|^2 \geq \| w_n - x \|^2 + 2\gamma_n [ (f+g)(w_n) - (f+g)(x) ] + (1 - 8\delta) \| w_n - z_n \|^2,$$
and (I) is proven. Next, we show (II). From (4), we have that
$$\frac{z_n - w_n}{\gamma_n} - \nabla f(z_n) \in \partial g(w_n), \quad \text{and} \quad \frac{w_n - v_n}{\gamma_n} - \nabla f(w_n) \in \partial g(v_n).$$
Then,
$$g(x) - g(w_n) \geq \left\langle \frac{z_n - w_n}{\gamma_n} - \nabla f(z_n), x - w_n \right\rangle, \quad \text{and}$$
$$g(x) - g(v_n) \geq \left\langle \frac{w_n - v_n}{\gamma_n} - \nabla f(w_n), x - v_n \right\rangle, \quad \text{for all } n \in \mathbb{N}.$$
Moreover,
$$f(x) - f(z_n) \geq \langle \nabla f(z_n), x - z_n \rangle,$$
$$f(x) - f(w_n) \geq \langle \nabla f(w_n), x - w_n \rangle,$$
$$f(z_n) - f(w_n) \geq \langle \nabla f(w_n), z_n - w_n \rangle, \quad \text{and}$$
$$f(w_n) - f(v_n) \geq \langle \nabla f(v_n), w_n - v_n \rangle, \quad \text{for all } n \in \mathbb{N}.$$
The above inequalities imply
$$\begin{aligned}
f(x) &- f(z_n) + f(x) - f(w_n) + g(x) - g(w_n) + g(x) - g(v_n) \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \langle \nabla f(z_n), w_n - z_n \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle + \langle \nabla f(w_n), v_n - w_n \rangle \\
&= \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \langle \nabla f(z_n) - \nabla f(w_n), w_n - z_n \rangle + \langle \nabla f(w_n), w_n - z_n \rangle \\
&\quad + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle + \langle \nabla f(w_n) - \nabla f(v_n), v_n - w_n \rangle + \langle \nabla f(v_n), v_n - w_n \rangle \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle - \| \nabla f(w_n) - \nabla f(z_n) \| \| w_n - z_n \| + \langle \nabla f(w_n), w_n - z_n \rangle \\
&\quad - \| \nabla f(v_n) - \nabla f(w_n) \| \| v_n - w_n \| + \langle \nabla f(v_n), v_n - w_n \rangle \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle - \| \nabla f(w_n) - \nabla f(z_n) \| ( \| w_n - z_n \| + \| v_n - w_n \| ) + \langle \nabla f(w_n), w_n - z_n \rangle \\
&\quad - \| \nabla f(v_n) - \nabla f(w_n) \| ( \| w_n - z_n \| + \| v_n - w_n \| ) + \langle \nabla f(v_n), v_n - w_n \rangle \\
&= \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle + \langle \nabla f(w_n), w_n - z_n \rangle + \langle \nabla f(v_n), v_n - w_n \rangle \\
&\quad - ( \| \nabla f(w_n) - \nabla f(z_n) \| + \| \nabla f(v_n) - \nabla f(w_n) \| ) ( \| w_n - z_n \| + \| v_n - w_n \| ) \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle + \langle \nabla f(w_n), w_n - z_n \rangle + \langle \nabla f(v_n), v_n - w_n \rangle - \frac{2\delta}{\gamma_n} ( \| w_n - z_n \| + \| v_n - w_n \| )^2 \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle + f(v_n) - f(z_n) - \frac{4\delta}{\gamma_n} ( \| w_n - z_n \|^2 + \| v_n - w_n \|^2 ),
\end{aligned}$$
for all $x \in \mathrm{dom}(g)$ and $n \in \mathbb{N}$. Hence,
$$\frac{1}{\gamma_n} \langle z_n - w_n, w_n - x \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, v_n - x \rangle \geq (f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x) - \frac{4\delta}{\gamma_n} \| w_n - z_n \|^2 - \frac{4\delta}{\gamma_n} \| v_n - w_n \|^2.$$
Moreover, from Lemma 3, we have, for all $n \in \mathbb{N}$,
$$\langle z_n - w_n, w_n - x \rangle = \frac{1}{2} ( \| z_n - x \|^2 - \| z_n - w_n \|^2 - \| w_n - x \|^2 ), \quad \text{and}$$
$$\langle w_n - v_n, v_n - x \rangle = \frac{1}{2} ( \| w_n - x \|^2 - \| w_n - v_n \|^2 - \| v_n - x \|^2 ).$$
As a result, we obtain
$$\frac{1}{2\gamma_n} ( \| z_n - x \|^2 - \| z_n - w_n \|^2 ) - \frac{1}{2\gamma_n} ( \| w_n - v_n \|^2 + \| v_n - x \|^2 ) \geq (f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x) - \frac{4\delta}{\gamma_n} \| w_n - z_n \|^2 - \frac{4\delta}{\gamma_n} \| v_n - w_n \|^2,$$
for all $x \in \mathrm{dom}(g)$ and $n \in \mathbb{N}$. Therefore,
$$\| z_n - x \|^2 \geq \| v_n - x \|^2 + 2\gamma_n [ (f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x) ] + (1 - 8\delta) ( \| w_n - z_n \|^2 + \| v_n - w_n \|^2 ),$$
for all $x \in \mathrm{dom}(g)$ and $n \in \mathbb{N}$, and hence, (II) is proven.    □
Next, we prove the weak convergence result of Algorithm 7.
Theorem 9.
Let $\{x_n\}$ be a sequence generated by Algorithm 7. Suppose that the following hold:
B1. $\sum_{n=1}^{+\infty} \beta_n < +\infty$;
B2. There exists $\gamma > 0$ such that $\gamma_n \geq \gamma$, for all $n \in \mathbb{N}$.
Then, $\{x_n\}$ converges weakly to some point in $S_*$.
Proof. 
Let $x^* \in S_*$; obviously, $x^* \in \mathrm{dom}(g)$. The following are direct consequences of Lemma 8:
$$\| z_n - x^* \|^2 \geq \| w_n - x^* \|^2 + 2\gamma_n [ (f+g)(w_n) - (f+g)(x^*) ] + (1 - 8\delta) \| w_n - z_n \|^2 \geq \| w_n - x^* \|^2 + (1 - 8\delta) \| w_n - z_n \|^2, \qquad(7)$$
and
$$\| z_n - x^* \|^2 \geq \| v_n - x^* \|^2 + 2\gamma_n [ (f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x^*) ] + (1 - 8\delta) ( \| w_n - z_n \|^2 + \| v_n - w_n \|^2 ) \geq \| v_n - x^* \|^2 + (1 - 8\delta) ( \| w_n - z_n \|^2 + \| v_n - w_n \|^2 ), \qquad(8)$$
where $v_n = \mathrm{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n))$. Then, we have
$$\| x_{n+1} - x^* \| \leq (1 - \alpha_n) \| w_n - x^* \| + \alpha_n \| v_n - x^* \| \leq (1 - \alpha_n) \| w_n - x^* \| + \alpha_n \| z_n - x^* \| \leq \| z_n - x^* \|. \qquad(9)$$
Next, we show that $\lim_{n \to +\infty} \| x_n - x^* \|$ exists. Since $P_{\mathrm{dom}(g)}$ is nonexpansive, we have
$$\| x_{n+1} - x^* \| \leq \| z_n - x^* \| = \| P_{\mathrm{dom}(g)} y_n - P_{\mathrm{dom}(g)} x^* \| \leq \| y_n - x^* \| \leq \| x_n - x^* \| + \beta_n \| x_n - x_{n-1} \| \leq (1 + \beta_n) \| x_n - x^* \| + \beta_n \| x_{n-1} - x^* \|, \quad \text{for all } n \in \mathbb{N}. \qquad(10)$$
By using Lemma 4, we have that $\{x_n\}$ is bounded. Consequently, $\sum_{n=1}^{+\infty} \beta_n \| x_n - x_{n-1} \| < +\infty$, and
$$\| y_n - x_n \| = \beta_n \| x_n - x_{n-1} \| \to 0, \quad \text{as } n \to +\infty.$$
By (10) together with Lemma 5, we conclude that $\lim_{n \to +\infty} \| x_n - x^* \|$ exists. Since $x_n \in \mathrm{dom}(g)$, for all $n \in \mathbb{N}$, we obtain
$$\| y_n - z_n \| \leq \| y_n - x_n \|, \quad \text{for all } n \in \mathbb{N},$$
which implies that $\lim_{n \to +\infty} \| y_n - z_n \| = 0$. Consequently, $\lim_{n \to +\infty} \| x_n - z_n \| = 0$, and hence, $\lim_{n \to +\infty} \| x_n - x^* \| = \lim_{n \to +\infty} \| z_n - x^* \|$. Now, we show that $\lim_{n \to +\infty} \| x_n - w_n \| = 0$. To do this, we consider the following two cases.
Case 1. $\limsup_{n \to +\infty} \alpha_n = c < 1$. Then, from (9), we obtain
$$\limsup_{n \to +\infty} \| w_n - x^* \| = \limsup_{n \to +\infty} \| x_n - x^* \| = \limsup_{n \to +\infty} \| z_n - x^* \|.$$
Therefore, we obtain from (7) that $\lim_{n \to +\infty} \| w_n - z_n \| = 0$. As a result, we have $\lim_{n \to +\infty} \| x_n - w_n \| = 0$.
Case 2. $\limsup_{n \to +\infty} \alpha_n = 1$. Then, it follows from (9) that
$$\limsup_{n \to +\infty} \| v_n - x^* \| = \limsup_{n \to +\infty} \| x_n - x^* \| = \limsup_{n \to +\infty} \| z_n - x^* \|.$$
It follows from (8) that $\lim_{n \to +\infty} \| w_n - z_n \| = 0$, and hence, $\lim_{n \to +\infty} \| x_n - w_n \| = 0$.
We claim that every weak-cluster point of $\{x_n\}$ belongs to $S_*$. To prove this claim, let w be a weak-cluster point of $\{x_n\}$. Then, there exists a subsequence $\{x_{n_k}\}$ of $\{x_n\}$ such that $x_{n_k} \rightharpoonup w$, and hence, $w_{n_k} \rightharpoonup w$. Next, we show that $w \in S_*$. From A2, we know that $\nabla f$ is uniformly continuous, so $\lim_{k \to +\infty} \| \nabla f(w_{n_k}) - \nabla f(z_{n_k}) \| = 0$. From (4), we also have
$$z_{n_k} - \gamma_{n_k} \nabla f(z_{n_k}) - w_{n_k} \in \gamma_{n_k} \partial g(w_{n_k}), \quad \text{for all } k \in \mathbb{N}.$$
Hence,
$$\frac{z_{n_k} - w_{n_k}}{\gamma_{n_k}} - \nabla f(z_{n_k}) + \nabla f(w_{n_k}) \in \partial g(w_{n_k}) + \nabla f(w_{n_k}) = \partial (f + g)(w_{n_k}), \quad \text{for all } k \in \mathbb{N}.$$
By letting $k \to +\infty$ in the above inclusion, we can conclude from Lemma 1 that $0 \in \partial (f + g)(w)$, and hence, $w \in S_*$. It follows directly from Lemma 6 that $\{x_n\}$ converges weakly to a point in $S_*$, and the proof is now complete.    □
If we set $\beta_n = 0$, for all $n \in \mathbb{N}$, in Algorithm 7, we obtain the following Algorithm 8.
Algorithm 8. Algorithm with Line Search 3.
1: Input: Given $x_0 \in \mathrm{dom}(g)$, $\sigma > 0$, $\theta \in (0, 1)$, $\delta \in (0, \frac{1}{8})$ and $\alpha_n \in [0, 1]$. For $n \in \mathbb{N}$,
   $w_n = \mathrm{prox}_{\gamma_n g}(x_n - \gamma_n \nabla f(x_n))$,
   $x_{n+1} = (1 - \alpha_n) w_n + \alpha_n \mathrm{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n))$,
where $\gamma_n :=$ Line Search 3 $(x_n, \delta, \sigma, \theta)$.
The diagram of Algorithm 8 can be seen in Figure 2.
We next analyze the complexity of Algorithm 8.
Theorem 10.
Let $\{x_n\}$ be a sequence generated by Algorithm 8. Suppose that there exists $\gamma > 0$ such that $\gamma_n \geq \gamma$, for all $n \in \mathbb{N}$. Then, $\{x_n\}$ converges weakly to a point in $S_*$. In addition, if $\delta \in (0, \frac{1}{16})$, then the following also holds:
$$(f+g)(x_n) - \min_{x \in H} (f+g)(x) \leq \frac{[d(x_0, S_*)]^2}{2\gamma n}, \qquad(11)$$
for all $n \in \mathbb{N}$.
Proof. 
The weak convergence of $\{x_n\}$ is guaranteed by Theorem 9. It remains to show that (11) is true. Let $v_n = \mathrm{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n))$ and $x^* \in S_*$.
We first show that $(f+g)(x_{k+1}) \leq (f+g)(x_k)$, for all $k \in \mathbb{N}$. We know that $x_k = z_k$ in Lemma 8, so for any $x \in \mathrm{dom}(g)$ and $k \in \mathbb{N}$, we have
$$\| x_k - x \|^2 \geq \| w_k - x \|^2 + 2\gamma_k [ (f+g)(w_k) - (f+g)(x) ] + (1 - 8\delta) \| w_k - x_k \|^2, \qquad(12)$$
and
$$\| x_k - x \|^2 \geq \| v_k - x \|^2 + 2\gamma_k [ (f+g)(w_k) + (f+g)(v_k) - 2(f+g)(x) ] + (1 - 8\delta) ( \| w_k - x_k \|^2 + \| v_k - w_k \|^2 ). \qquad(13)$$
Putting $x = x_k$ in (12) and (13), we obtain
$$-\| w_k - x_k \|^2 \geq 2\gamma_k [ (f+g)(w_k) - (f+g)(x_k) ] + (1 - 8\delta) \| w_k - x_k \|^2, \qquad(14)$$
and
$$-\| v_k - x_k \|^2 \geq 2\gamma_k [ (f+g)(w_k) + (f+g)(v_k) - 2(f+g)(x_k) ] + (1 - 8\delta) ( \| w_k - x_k \|^2 + \| v_k - w_k \|^2 ), \qquad(15)$$
respectively. Substituting $x$ with $w_k$ in (13), we obtain
$$\| x_k - w_k \|^2 - \| v_k - w_k \|^2 \geq 2\gamma_k [ (f+g)(v_k) - (f+g)(w_k) ] + (1 - 8\delta) ( \| w_k - x_k \|^2 + \| v_k - w_k \|^2 ). \qquad(16)$$
By summing (15) and (16), we obtain
$$(16\delta - 1) \| x_k - w_k \|^2 + (16\delta - 4) \| v_k - w_k \|^2 \geq 4\gamma_k [ (f+g)(v_k) - (f+g)(x_k) ]. \qquad(17)$$
It follows from (14) and (17) that
$$(f+g)(w_k) \leq (f+g)(x_k) \quad \text{and} \quad (f+g)(v_k) \leq (f+g)(x_k), \qquad(18)$$
respectively, for all $k \in \mathbb{N}$. Hence,
$$(f+g)(x_{k+1}) - (f+g)(x_k) \leq (1 - \alpha_k)(f+g)(w_k) + \alpha_k (f+g)(v_k) - (f+g)(x_k) \leq 0,$$
for all $k \in \mathbb{N}$. Hence, $\{(f+g)(x_k)\}$ is a non-increasing sequence. Now, put $x = x^*$ in (12) and (13); then we obtain
$$\| w_k - x^* \|^2 - \| x_k - x^* \|^2 \leq 2\gamma_k [ (f+g)(x^*) - (f+g)(w_k) ], \qquad(19)$$
and
$$\| v_k - x^* \|^2 - \| x_k - x^* \|^2 \leq 2\gamma_k [ 2(f+g)(x^*) - (f+g)(w_k) - (f+g)(v_k) ] \leq 2\gamma_k [ (f+g)(x^*) - (f+g)(v_k) ]. \qquad(20)$$
Inequalities (19) and (20) imply that
$$\begin{aligned}
\| x_{k+1} - x^* \|^2 - \| x_k - x^* \|^2 &\leq (1 - \alpha_k) \| w_k - x^* \|^2 + \alpha_k \| v_k - x^* \|^2 - \| x_k - x^* \|^2 \\
&\leq 2\gamma_k (1 - \alpha_k) [ (f+g)(x^*) - (f+g)(w_k) ] + 2\gamma_k \alpha_k [ (f+g)(x^*) - (f+g)(v_k) ] \\
&= 2\gamma_k (f+g)(x^*) - 2\gamma_k [ (1 - \alpha_k)(f+g)(w_k) + \alpha_k (f+g)(v_k) ] \\
&\leq 2\gamma_k [ (f+g)(x^*) - (f+g)(x_{k+1}) ],
\end{aligned}$$
for all $k \in \mathbb{N}$. Since $\gamma_k \geq \gamma$, we obtain
$$0 \geq (f+g)(x^*) - (f+g)(x_{k+1}) \geq \frac{1}{2\gamma_k} ( \| x_{k+1} - x^* \|^2 - \| x_k - x^* \|^2 ) \geq \frac{1}{2\gamma} ( \| x_{k+1} - x^* \|^2 - \| x_k - x^* \|^2 ),$$
for all $k \in \mathbb{N}$. Summing the above inequality over $k = 0, 1, \ldots, n - 1$, we obtain
$$n (f+g)(x^*) - \sum_{k=1}^{n} (f+g)(x_k) \geq \frac{1}{2\gamma} \left( \| x_n - x^* \|^2 - \| x_0 - x^* \|^2 \right),$$
for all $n \in \mathbb{N}$. Since $\{(f+g)(x_k)\}$ is non-increasing, we have
$$n (f+g)(x^*) - n (f+g)(x_n) \geq \frac{1}{2\gamma} \left( \| x_n - x^* \|^2 - \| x_0 - x^* \|^2 \right) \geq -\frac{1}{2\gamma} \| x_0 - x^* \|^2,$$
for all $n \in \mathbb{N}$. Hence,
$$(f+g)(x_n) - (f+g)(x^*) \leq \frac{\| x_0 - x^* \|^2}{2\gamma n}.$$
Since $x^*$ is arbitrarily chosen from $S_*$, we obtain
$$(f+g)(x_n) - \min_{x \in H} (f+g)(x) \leq \frac{[d(x_0, S_*)]^2}{2\gamma n},$$
for all $n \in \mathbb{N}$, and the proof is now complete. □

4. Some Applications on Data Classification

In this section, we apply Algorithms 3, 5, 7, and 8 to solve some classification problems based on a learning technique called extreme learning machine (ELM) introduced by Huang et al. [28]. It is formulated as follows:
Let $\{(x_k, t_k) : x_k \in \mathbb{R}^n, t_k \in \mathbb{R}^m, k = 1, 2, \ldots, N\}$ be a set of N samples, where $x_k$ is an input and $t_k$ is a target. A simple mathematical model of the output of ELM for SLFNs (single-hidden-layer feedforward networks) with M hidden nodes and activation function G is defined by
$$o_j = \sum_{i=1}^{M} \eta_i G(\langle w_i, x_j \rangle + b_i),$$
where $w_i$ is the weight that connects the i-th hidden node and the input node, $\eta_i$ is the weight connecting the i-th hidden node and the output node, and $b_i$ is the bias. The hidden layer output matrix H is defined by
$$H = \begin{bmatrix} G(\langle w_1, x_1 \rangle + b_1) & \cdots & G(\langle w_M, x_1 \rangle + b_M) \\ \vdots & \ddots & \vdots \\ G(\langle w_1, x_N \rangle + b_1) & \cdots & G(\langle w_M, x_N \rangle + b_M) \end{bmatrix}.$$
The main objective of ELM is to calculate an optimal weight $\eta = [\eta_1^T, \ldots, \eta_M^T]^T$ such that $H\eta = T$, where $T = [t_1^T, \ldots, t_N^T]^T$ is the training target. If the Moore–Penrose generalized inverse $H^{\dagger}$ of H exists, then $\eta = H^{\dagger} T$ is a solution. However, in general, $H^{\dagger}$ may not exist or may be difficult to compute. Thus, in order to avoid such difficulties, we transform the problem into a convex minimization problem and use our proposed algorithm to find the solution $\eta$ without computing $H^{\dagger}$.
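For instance, the hidden layer output matrix H with the sigmoid activation used in the experiments could be generated as follows; this is a sketch, and the uniform random initialization of $w_i$ and $b_i$ is an assumption, since the paper does not specify it.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def elm_hidden_matrix(X, M, rng=None):
    # X: (N, n) array of input samples. Returns H with H[j, i] = G(<w_i, x_j> + b_i), plus the random W and b.
    rng = np.random.default_rng() if rng is None else rng
    N, n = X.shape
    W = rng.uniform(-1.0, 1.0, size=(M, n))   # hidden-node input weights w_i
    b = rng.uniform(-1.0, 1.0, size=M)        # hidden-node biases b_i
    H = sigmoid(X @ W.T + b)                  # hidden layer output matrix, shape (N, M)
    return H, W, b
```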
In machine learning, a model can overfit in the sense that it is very accurate on a training set but inaccurate on a testing set; in other words, it cannot be used to predict unseen data. In order to prevent overfitting, the least absolute shrinkage and selection operator (LASSO) [29] is used. It can be formulated as follows:
$$\text{Minimize: } \| H\eta - T \|_2^2 + \lambda \| \eta \|_1,$$
where $\lambda$ is a regularization parameter. If we set $f(x) := \| Hx - T \|_2^2$ and $g(x) := \lambda \| x \|_1$, then this problem reduces to problem (1). Hence, we can use our algorithm as a learning method to find the optimal weight $\eta$ and solve classification problems.
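The pieces of problem (1) for this application are then $f(x) = \| Hx - T \|_2^2$ with $\nabla f(x) = 2 H^T (Hx - T)$ and $g(x) = \lambda \| x \|_1$ with the soft-thresholding prox. The following sketch (illustrative only) wires them into the earlier sketches, reusing soft_threshold, line_search_3 and algorithm_7 defined above.

```python
import numpy as np

def train_elm_lasso(H, T, lam=0.1, n_iter=200):
    # f(eta) = ||H eta - T||_2^2, grad f(eta) = 2 H^T (H eta - T); g(eta) = lam * ||eta||_1.
    grad_f = lambda eta: 2.0 * H.T @ (H @ eta - T)
    prox_g = lambda v, gamma: soft_threshold(v, gamma * lam)
    project_dom = lambda v: v                              # dom(g) is the whole space, so the projection is the identity
    beta = lambda n: 0.95 if n <= 1000 else 1.0 / n ** 2   # summable inertial schedule used in the experiments
    eta0 = np.zeros((H.shape[1], T.shape[1]))
    return algorithm_7(eta0, eta0, grad_f, prox_g, project_dom, beta, n_iter=n_iter)
```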
In the experiments, we aim to classify three data sets from https://archive.ics.uci.edu (accessed on 15 November 2021):
  • Iris data set [30]. Each sample in this data set has four attributes, and the set contains three classes with 50 samples for each type.
  • Heart disease data set [31]. This data set contains 303 samples each of which has 13 attributes. In this data set, we classified two classes of data.
  • Wine data set [32]. In this data set, we classified three classes of 178 samples. Each sample contains 13 attributes.
In all experiments, we used the sigmoid as the activation function and set the number of hidden nodes to M = 30. We calculated the accuracy of the output data by
$$\text{accuracy} = \frac{\text{correctly predicted data}}{\text{all data}} \times 100.$$
We chose control parameters for each algorithm as seen in Table 1.
In our experiments, the inertial parameters $\beta_n$ for Algorithm 7 were chosen as follows:
$$\beta_n = \begin{cases} 0.95, & \text{if } n \leq 1000, \\ \frac{1}{n^2}, & \text{if } n \geq 1001. \end{cases}$$
In the first experiment, we chose the regularization parameter $\lambda = 0.1$ for all algorithms and data sets. Then, we used 10-fold cross-validation and utilized Average ACC and ERR% for evaluating the performance of each algorithm:
$$\text{Average ACC} = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i}{y_i} \times 100\%,$$
where N is the number of folds (N = 10), $x_i$ is the number of data correctly predicted at fold i, and $y_i$ is the number of all data at fold i.
Let $\mathrm{err}_{L,\mathrm{sum}}$ be the sum of errors over all 10 training sets, $\mathrm{err}_{T,\mathrm{sum}}$ the sum of errors over all 10 testing sets, $L_{\mathrm{sum}}$ the total number of data in all 10 training sets, and $T_{\mathrm{sum}}$ the total number of data in all 10 testing sets. Then,
$$\mathrm{ERR}\% = (\mathrm{err}_L\% + \mathrm{err}_T\%)/2,$$
where $\mathrm{err}_L\% = \frac{\mathrm{err}_{L,\mathrm{sum}}}{L_{\mathrm{sum}}} \times 100\%$ and $\mathrm{err}_T\% = \frac{\mathrm{err}_{T,\mathrm{sum}}}{T_{\mathrm{sum}}} \times 100\%$.
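A small helper for these two evaluation quantities, under the definitions just stated (illustrative only):

```python
def average_acc(correct_per_fold, total_per_fold):
    # Average ACC = (1/N) * sum_i (x_i / y_i) * 100%.
    N = len(correct_per_fold)
    return sum(c / t for c, t in zip(correct_per_fold, total_per_fold)) * 100.0 / N

def err_percent(err_train_sum, train_sum, err_test_sum, test_sum):
    # ERR% = (err_L% + err_T%) / 2, with err_L% = err_{L,sum}/L_sum * 100% and err_T% analogous.
    err_L = err_train_sum / train_sum * 100.0
    err_T = err_test_sum / test_sum * 100.0
    return (err_L + err_T) / 2.0
```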
With these evaluation tools, we obtained the results for each data set as seen in Table 2, Table 3 and Table 4.
As seen in Table 2, Table 3 and Table 4, with the same regularization λ = 0.1 , Algorithms 7 and 8 perform better than Algorithms 3 and 5 in terms of accuracy, while the computation times are relatively close among the four algorithms.
In the second experiment, the regularization parameter λ for each algorithm and data set was chosen using 10-fold cross-validation. We compared the error of each model on each data set for various λ and then chose the λ that gives the lowest error (ERR%) for that particular model and data set. Hence, the parameter λ varies with the algorithm and data set. The chosen values of λ can be seen in Table 5.
With the chosen λ , we also evaluated the performance of each algorithm using 10-fold cross-validation and similar evaluation tools as in the first experiment. The results can be seen in the following Table 6, Table 7 and Table 8.
With the regularization parameters λ chosen as in Table 5, we see that the ERR% of each algorithm in Table 6, Table 7 and Table 8 is lower than in Table 2, Table 3 and Table 4. We can also see that Algorithms 7 and 8 perform better than Algorithms 3 and 5 in terms of accuracy in all experiments conducted.
In Figure 3, we show the ERR% of each algorithm in the second experiment. As we can see, Algorithms 7 and 8 have lower ERR%, which means they perform better than Algorithms 3 and 5.
From Table 6, Table 7 and Table 8, we notice that the computational time of Algorithms 7 and 8 is about 30% higher than that of Algorithm 3 at the same number of iterations. However, from Figure 3, we see that at the 120th iteration, both Algorithms 7 and 8 already have a lower ERR% than Algorithm 3 at the 200th iteration. Therefore, the time needed for Algorithms 7 and 8 to reach the same or higher accuracy than Algorithm 3 is actually lower, because 120 iterations can be computed much faster than 200 iterations.

5. Conclusions

We introduced a new line search technique and employed it to construct new algorithms, namely Algorithms 7 and 8. Algorithm 7 also utilizes an inertial step to accelerate its convergence behavior. Both algorithms converge weakly to a solution of (1) without the Lipschitz assumption on $\nabla f$. The complexity of Algorithm 8 was also analyzed. We then applied the proposed algorithms to data classification on the Iris, Heart disease, and Wine data sets and evaluated their performance against other line search algorithms, namely Algorithms 3 and 5. We observed from our experiments that Algorithm 7 achieved the highest accuracy on all data sets under the same number of iterations. Moreover, Algorithm 8, which is not an inertial algorithm, also performed better than Algorithms 3 and 5. Furthermore, from Figure 3, we see that the proposed algorithms were already more accurate at a lower number of iterations than the other algorithms at a higher number of iterations.
Based on the experiments on various data sets, we conclude that the proposed algorithms perform better than the previously established algorithms. Therefore, for our future works, we would like to implement the proposed algorithm to predict and classify the data of patients with non-communicable diseases (NCDs) collected from Sriphat Medical Center, Faculty of Medicine, Chiang Mai University, Thailand. We aim to make an innovation for screening and preventing non-communicable diseases, which will be used in hospitals in Chiang Mai, Thailand.

Author Contributions

Writing—original draft preparation, P.S.; software and editing, D.C.; supervision, review and funding acquisition, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

The NSRF via the Program Management Unit for Human Resources & Institutional Development, Research and Innovation (Grant Number B05F640183).

Data Availability Statement

All data can be obtained from https://archive.ics.uci.edu (accessed on 15 November 2021).

Acknowledgments

This research has received funding support from the NSRF via the Program Management Unit for Human Resources & Institutional Development, Research and Innovation (Grant Number B05F640183). This research was also supported by Chiang Mai University.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chen, M.; Zhang, H.; Lin, G.; Han, Q. A new local and nonlocal total variation regularization model for image denoising. Clust. Comput. 2019, 22, 7611–7627. [Google Scholar] [CrossRef]
  2. Combettes, P.L.; Wajs, V. Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 2005, 4, 1168–1200. [Google Scholar] [CrossRef] [Green Version]
  3. Kankam, K.; Pholasa, N.; Cholamjiak, C. On convergence and complexity of the modified forward–backward method involving new line searches for convex minimization. Math. Meth. Appl. Sci. 2019, 1352–1362. [Google Scholar] [CrossRef]
  4. Luo, Z.Q. Applications of convex optimization in signal processing and digital communication. Math. Program. 2003, 97, 177–207. [Google Scholar] [CrossRef]
  5. Xiong, K.; Zhao, G.; Shi, G.; Wang, Y. A Convex Optimization Algorithm for Compressed Sensing in a Complex Domain: The Complex-Valued Split Bregman Method. Sensors 2019, 19, 4540. [Google Scholar] [CrossRef] [Green Version]
  6. Zhang, Y.; Li, X.; Zhao, G.; Cavalcante, C.C. Signal reconstruction of compressed sensing based on alternating direction method of multipliers. Circuits Syst. Signal Process 2020, 39, 307–323. [Google Scholar] [CrossRef]
  7. Hanjing, A.; Bussaban, L.; Suantai, S. The Modified Viscosity Approximation Method with Inertial Technique and Forward–Backward Algorithm for Convex Optimization Model. Mathematics 2022, 10, 1036. [Google Scholar] [CrossRef]
  8. Hanjing, A.; Suantai, S. A fast image restoration algorithm based on a fixed point and optimization method. Mathematics 2020, 8, 378. [Google Scholar] [CrossRef] [Green Version]
  9. Zhong, T. Statistical Behavior and Consistency of Classification Methods Based on Convex Risk Minimization. Ann. Stat. 2004, 32, 56–134. [Google Scholar] [CrossRef]
  10. Elhamifar, E.; Sapiro, G.; Yang, A.; Sasrty, S.S. A Convex Optimization Framework for Active Learning. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 209–216. [Google Scholar] [CrossRef] [Green Version]
  11. Yuan, M.; Wegkamp, M. Classification Methods with Reject Option Based on Convex Risk Minimization. J. Mach. Learn. Res. 2010, 11, 111–130. [Google Scholar]
  12. Lions, P.L.; Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 1979, 16, 964–979. [Google Scholar] [CrossRef]
  13. Polyak, B.T. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17. [Google Scholar] [CrossRef]
  14. Attouch, H.; Cabot, A. Convergence rate of a relaxed inertial proximal algorithm for convex minimization. Optimization 2019, 69, 1281–1312. [Google Scholar] [CrossRef]
  15. Alvarez, F.; Attouch, H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 2001, 9, 3–11. [Google Scholar] [CrossRef]
  16. Van Hieu, D. An inertial-like proximal algorithm for equilibrium problems. Math. Meth. Oper. Res. 2018, 88, 399–415. [Google Scholar] [CrossRef]
  17. Chidume, C.E.; Kumam, P.; Adamu, A. A hybrid inertial algorithm for approximating solution of convex feasibility problems with applications. Fixed Point Theory Appl. 2020, 2020, 12. [Google Scholar] [CrossRef]
  18. Moudafi, A.; Oliny, M. Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 2003, 155, 447–454. [Google Scholar] [CrossRef] [Green Version]
  19. Sarnmeta, P.; Inthakon, W.; Chumpungam, D.; Suantai, S. On convergence and complexity analysis of an accelerated forward–backward algorithm with line search technique for convex minimization problems and applications to data prediction and classification. J. Inequal. Appl. 2021, 2021, 141. [Google Scholar] [CrossRef]
  20. Beck, A.; Teboulle, M. A fast iterative shrinkage–thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef] [Green Version]
  21. Boţ, R.I.; Csetnek, E.R. An inertial forward–backward-forward primal-dual splitting algorithm for solving monotone inclusion problems. Numer. Algor. 2016, 71, 519–540. [Google Scholar] [CrossRef] [Green Version]
  22. Verma, M.; Shukla, K.K. A new accelerated proximal gradient technique for regularized multitask learning framework. Pattern Recogn. Lett. 2017, 95, 98–103. [Google Scholar] [CrossRef]
  23. Bello Cruz, J.Y.; Nghia, T.T. On the convergence of the forward–backward splitting method with line searches. Optim. Methods Softw. 2016, 31, 1209–1238. [Google Scholar] [CrossRef] [Green Version]
  24. Burachik, R.S.; Iusem, A.N. Set-Valued Mappings and Enlargements of Monotone Operators; Springer: Berlin, Germany, 2008. [Google Scholar]
  25. Huang, Y.; Dong, Y. New properties of forward–backward splitting and a practical proximal-descent algorithm. Appl. Math. Comput. 2014, 237, 60–68. [Google Scholar] [CrossRef]
  26. Takahashi, W. Introduction to Nonlinear and Convex Analysis; Yokohama Publishers: Yokohama, Japan, 2009. [Google Scholar]
  27. Moudafi, A.; Al-Shemas, E. Simultaneous iterative methods for split equality problem. Trans. Math. Program. Appl. 2013, 1, 1–11. [Google Scholar]
  28. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  29. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  30. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
  31. Detrano, R.; Janosi, A.; Steinbrunn, W.; Pfisterer, M.; Schmid, J.J.; Sandhu, S.; Guppy, K.H.; Lee, S.; Froelicher, V. International application of a new probability algorithm for the diagnosis of coronary artery disease. Am. J. Cardiol. 1989, 64, 304–310. [Google Scholar] [CrossRef]
  32. Forina, M.; Leardi, R.; Armanino, C.; Lanteri, S. PARVUS: An Extendable Package of Programs for Data Exploration; Elsevier: Amsterdam, The Netherlands, 1988. [Google Scholar]
Figure 1. Diagram of Algorithm 7.
Figure 2. Diagram of Algorithm 8.
Figure 3. ERR% of each algorithm and data set of the second experiment.
Table 1. Chosen parameters of each algorithm.

              Algorithm 3   Algorithm 5   Algorithm 7   Algorithm 8
  σ           0.49          0.124         0.124         0.124
  δ           0.1           0.1           0.1           0.1
  θ           0.1           0.1           0.1           0.1
  α_n         -             -             1/2           1/3
Table 2. The performance of each algorithm in the first experiment at the 200th iteration with 10-fold cv. on the Iris data set.

               Algorithm 3          Algorithm 5          Algorithm 7          Algorithm 8
               acc.train  acc.test  acc.train  acc.test  acc.train  acc.test  acc.train  acc.test
  Fold 1       87.41      86.67     93.33      86.67     97.78      100       97.04      93.33
  Fold 2       88.15      93.33     92.59      100       96.30      100       96.30      100
  Fold 3       88.15      100       92.59      100       97.78      93.33     96.30      100
  Fold 4       88.15      100       92.59      100       97.78      100       96.30      100
  Fold 5       86.67      86.67     93.33      86.67     97.78      100       96.30      100
  Fold 6       88.15      73.33     92.59      80        99.26      86.67     97.78      86.67
  Fold 7       87.41      100       92.59      100       97.78      100       96.30      100
  Fold 8       88.15      86.67     93.33      93.33     97.04      93.33     97.78      86.67
  Fold 9       88.89      80        93.33      93.33     98.52      93.33     96.30      93.33
  Fold 10      88.15      73.33     92.59      93.33     97.78      100       95.56      100
  Average acc. 87.93      88        92.89      93.33     97.78      96.67     96.59      96
  ERR %        12.04                6.89                 2.78                 3.70
  Time         0.0609               0.0901               0.0781               0.0767
Table 3. The performance of each algorithm in the first experiment at the 200th iteration with 10-fold cv. on the Heart disease data set.

               Algorithm 3          Algorithm 5          Algorithm 7          Algorithm 8
               acc.train  acc.test  acc.train  acc.test  acc.train  acc.test  acc.train  acc.test
  Fold 1       79.85      86.67     81.32      86.67     83.15      93.33     82.05      86.67
  Fold 2       80.15      80.65     80.15      80.65     84.19      83.87     81.62      83.87
  Fold 3       81.25      77.42     82.35      77.42     84.93      77.42     83.09      80.65
  Fold 4       80.51      83.87     82.35      87.10     84.56      80.65     82.72      90.32
  Fold 5       79.85      90        81.32      90        84.98      86.67     82.42      86.67
  Fold 6       81.68      80        83.15      83.33     84.62      86.67     83.52      83.33
  Fold 7       80.22      86.67     81.68      83.33     84.25      83.33     82.05      83.33
  Fold 8       82.05      66.67     82.42      66.67     84.98      73.33     82.42      66.67
  Fold 9       81.32      70        81.68      70        86.08      73.33     82.05      70
  Fold 10      80.95      76.67     82.05      80        84.25      83.33     82.05      80
  Average acc. 80.78      79.86     81.85      80.52     84.60      82.19     82.40      81.15
  ERR %        19.67                18.81                16.61                18.21
  Time         0.0726               0.1048               0.1004               0.0921
Table 4. The performance of each algorithm in the first experiment at the 200th iteration with 10-fold cv. on the Wine data set.

               Algorithm 3          Algorithm 5          Algorithm 7          Algorithm 8
               acc.train  acc.test  acc.train  acc.test  acc.train  acc.test  acc.train  acc.test
  Fold 1       96.89      100       96.89      100       99.38      100       98.14      100
  Fold 2       96.88      100       97.50      100       99.38      100       98.13      100
  Fold 3       97.50      100       98.13      100       99.38      100       98.13      100
  Fold 4       97.50      100       96.88      100       99.38      100       98.13      100
  Fold 5       96.88      100       97.50      100       99.38      100       98.13      100
  Fold 6       97.50      94.44     96.88      100       99.38      100       98.13      100
  Fold 7       97.50      94.44     98.13      94.44     100        94.44     98.75      94.44
  Fold 8       97.50      100       96.88      100       99.38      100       98.13      100
  Fold 9       98.75      88.89     98.13      88.89     99.38      88.89     99.38      88.89
  Fold 10      98.76      88.24     98.76      88.24     99.38      100       98.14      100
  Average acc. 97.57      96.60     97.57      97.16     99.44      98.33     98.31      98.33
  ERR %        2.90                 2.62                 1.12                 1.69
  Time         0.0624               0.0997               0.0870               0.0810
Table 5. Chosen λ of each algorithm.

                 Regularization Parameter λ
                 Iris      Heart Disease   Wine
  Algorithm 3    0.001     0.003           0.02
  Algorithm 5    0.01      0.03            0.006
  Algorithm 7    0.003     0.13            0.0001
  Algorithm 8    0.01      0.008           0.003
Table 6. The performance of each algorithm in the second experiment at the 200th iteration with 10-fold cv. on the Iris data set.

               Algorithm 3          Algorithm 5          Algorithm 7          Algorithm 8
               acc.train  acc.test  acc.train  acc.test  acc.train  acc.test  acc.train  acc.test
  Fold 1       88.15      86.67     93.33      86.67     98.52      100       97.04      93.33
  Fold 2       88.15      93.33     92.59      100       98.52      100       96.30      100
  Fold 3       88.89      100       93.33      100       98.52      100       96.30      100
  Fold 4       88.15      100       92.59      100       98.52      100       96.30      100
  Fold 5       86.67      86.67     93.33      86.67     98.52      100       96.30      100
  Fold 6       88.15      73.33     93.33      80        99.26      86.67     97.78      86.67
  Fold 7       87.41      100       92.59      100       98.52      100       96.30      100
  Fold 8       88.15      86.67     93.33      93.33     97.78      100       97.78      86.67
  Fold 9       88.89      80        93.33      93.33     98.52      100       96.30      93.33
  Fold 10      88.15      73.33     92.59      93.33     98.52      100       95.56      100
  Average acc. 88.07      88        93.04      93.33     98.52      98.67     96.59      96
  ERR %        11.96                6.81                 1.41                 3.70
  Time         0.0618               0.0973               0.0793               0.0783
Table 7. The performance of each algorithm in the second experiment at the 200th iteration with 10-fold cv. on the Heart disease data set.

               Algorithm 3          Algorithm 5          Algorithm 7          Algorithm 8
               acc.train  acc.test  acc.train  acc.test  acc.train  acc.test  acc.train  acc.test
  Fold 1       79.49      86.67     82.05      86.67     84.25      80        82.05      86.67
  Fold 2       80.15      80.65     80.51      83.87     83.82      87.10     81.62      83.87
  Fold 3       81.62      77.42     81.99      80.65     84.56      80.65     83.46      80.65
  Fold 4       80.51      83.87     82.72      90.32     83.82      87.10     83.09      87.10
  Fold 5       79.85      90        82.42      86.67     86.45      76.67     82.78      86.67
  Fold 6       81.68      80        83.52      83.33     85.35      86.67     83.52      83.33
  Fold 7       80.22      86.67     81.68      83.33     84.98      73.33     82.05      83.33
  Fold 8       82.42      66.67     82.42      66.67     83.15      90        82.78      66.67
  Fold 9       80.95      70.00     82.05      70        84.62      83.33     82.42      70
  Fold 10      80.95      76.67     82.05      80        84.98      90        82.78      83.33
  Average acc. 80.78      79.86     82.14      81.15     84.60      83.48     82.66      81.16
  ERR %        19.67                18.34                15.95                18.08
  Time         0.0794               0.1129               0.1013               0.097
Table 8. The performance of each algorithm in the second experiment at the 200th iteration with 10-fold cv. on the Wine data set.

               Algorithm 3          Algorithm 5          Algorithm 7          Algorithm 8
               acc.train  acc.test  acc.train  acc.test  acc.train  acc.test  acc.train  acc.test
  Fold 1       96.89      100       97.52      100       99.38      100       98.14      100
  Fold 2       96.88      100       97.50      100       100        100       98.75      100
  Fold 3       97.50      100       97.50      100       100        100       98.13      100
  Fold 4       97.50      100       98.13      100       99.38      100       98.13      100
  Fold 5       97.50      100       98.13      100       99.38      100       98.13      100
  Fold 6       97.50      94.44     98.13      100       99.38      100       98.13      100
  Fold 7       97.50      94.44     98.75      94.44     100        94.44     98.75      94.44
  Fold 8       97.50      100       97.50      100       99.38      100       98.13      100
  Fold 9       98.75      88.89     98.75      88.89     99.38      100       99.38      88.89
  Fold 10      98.76      88.24     98.14      88.24     100        100       98.14      100
  Average acc. 97.63      96.60     98         97.16     99.63      99.44     98.38      98.33
  ERR %        2.87                 2.40                 0.47                 1.65
  Time         0.0644               0.0971               0.0874               0.0819
