Article

An Accelerated Convex Optimization Algorithm with Line Search and Applications in Machine Learning

by
Dawan Chumpungam
1,
Panitarn Sarnmeta
2 and
Suthep Suantai
1,3,*
1
Data Science Research Center, Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
2
KOSEN-KMITL, Bangkok 10520, Thailand
3
Research Group in Mathematics and Applied Mathematics, Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(9), 1491; https://doi.org/10.3390/math10091491
Submission received: 16 March 2022 / Revised: 22 April 2022 / Accepted: 27 April 2022 / Published: 30 April 2022

Abstract

In this paper, we introduce a new line search technique and employ it to construct a novel accelerated forward–backward algorithm for solving convex minimization problems of the form of the sum of two convex functions, one of which is smooth, in a real Hilbert space. We establish weak convergence of the proposed algorithm to a solution without the Lipschitz assumption on the gradient of the objective function. Furthermore, we analyze its performance by applying it to classification problems on various data sets and comparing it with other line search algorithms. Based on the experiments, the proposed algorithm performs better than the other line search algorithms.

1. Introduction

The convex minimization problem in the form of the sum of two convex functions plays a very important role in machine learning. This problem has been analyzed and studied by many authors because of its applications in various fields such as data science, computer science, statistics, engineering, physics, and medical science. Some examples of these applications are signal processing, compressed sensing, medical image reconstruction, digital image processing, and data prediction and classification; see [1,2,3,4,5,6,7,8].
As we know, in machine learning, especially in data prediction and classification problems, the main objective is to minimize loss functions. Many loss functions can be viewed as convex functions; thus, by employing convex minimization, one can find the minimum of such functions, which in turn solves data prediction and classification problems. Many works have implemented this strategy; see [9,10,11] and the references therein for more information. In this work, we apply an extreme learning machine together with the least absolute shrinkage and selection operator to solve classification problems; more details will be given in a later section. First, we introduce the convex minimization problem, which can be formulated in the following form:
$$\min_{x \in H} \{ f(x) + g(x) \}, \qquad(1)$$
where $f : H \to \mathbb{R} \cup \{+\infty\}$ is proper, convex and differentiable on an open set containing $\mathrm{dom}(g)$, and $g : H \to \mathbb{R} \cup \{+\infty\}$ is a proper, lower semicontinuous convex function defined on a real Hilbert space H.
A solution of (1) is in fact a fixed point of the operator $\mathrm{prox}_{\alpha g}(I - \alpha \nabla f)$, i.e.,
$$x^{*} = \mathrm{prox}_{\alpha g}(I - \alpha \nabla f)(x^{*}), \qquad(2)$$
where $\alpha > 0$, and $\mathrm{prox}_{\alpha g}(I - \alpha \nabla f)(x) = \arg\min_{y \in H} \{ g(y) + \frac{1}{2\alpha} \| (x - \alpha \nabla f(x)) - y \|^{2} \}$, which is known as the forward–backward operator. In order to solve (1), the forward–backward algorithm [12] was introduced as follows:
$$x_{n+1} = \underbrace{\mathrm{prox}_{\alpha_n g}}_{\text{backward}} \underbrace{(I - \alpha_n \nabla f)}_{\text{forward}} (x_n), \quad \text{for all } n \in \mathbb{N}, \qquad(3)$$
where $\alpha_n$ is a positive number. If $\nabla f$ is L-Lipschitz continuous and $\alpha_n \in (0, \frac{2}{L})$, then a sequence generated by (3) converges weakly to a solution of (1). There are several techniques that can improve the performance of (3). For instance, we could utilize an inertial step, which was first introduced by Polyak [13], to solve smooth convex minimization problems. Since then, there have been several works that included an inertial step in their algorithms to accelerate the convergence behavior; see [14,15,16,17,18,19] for examples.
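To make iteration (3) concrete, the following is a minimal sketch in Python (not from the paper) for the common case $g(x) = \lambda \|x\|_1$, whose proximal operator is componentwise soft-thresholding; the gradient oracle, step size and iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding), applied componentwise.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def forward_backward(grad_f, x0, alpha, lam, n_iter=200):
    # x_{n+1} = prox_{alpha*g}(x_n - alpha*grad_f(x_n)) with g = lam * ||.||_1.
    x = x0.copy()
    for _ in range(n_iter):
        x = soft_threshold(x - alpha * grad_f(x), alpha * lam)  # forward step, then backward (prox) step
    return x
```

For instance, for $f(x) = \|Ax - b\|^2$ one would take grad_f = lambda x: 2 * A.T @ (A @ x - b) and a constant step $\alpha \in (0, 2/L)$ with $L = 2\|A\|^2$, matching the condition above.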
One of the most famous forward–backward-type algorithms that implements an inertial step is the fast iterative shrinkage–thresholding algorithm (FISTA) [20]. It is defined as the following Algorithm 1.
Algorithm 1. FISTA.
1: Input: Given $y_1 = x_0 \in \mathbb{R}^n$ and $t_1 = 1$. For $n \in \mathbb{N}$,
   $x_n = \mathrm{prox}_{\frac{1}{L} g}\left( y_n - \frac{1}{L} \nabla f(y_n) \right)$,
   $t_{n+1} = \frac{1 + \sqrt{1 + 4 t_n^2}}{2}$,
   $\theta_n = \frac{t_n - 1}{t_{n+1}}$,
   $y_{n+1} = x_n + \theta_n (x_n - x_{n-1})$,
where L is a Lipschitz constant of $\nabla f$.
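A compact sketch of the FISTA update in Python is given below for reference; it is not from the paper, and prox_g(v, alpha) (computing $\mathrm{prox}_{\alpha g}(v)$), grad_f and the Lipschitz constant L are assumed to be supplied by the user.

```python
import numpy as np

def fista(grad_f, prox_g, L, x0, n_iter=200):
    # FISTA: forward-backward step at the extrapolated point y_n, followed by
    # the update of t_n and the inertial parameter theta_n.
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(n_iter):
        x = prox_g(y - grad_f(y) / L, 1.0 / L)          # x_n = prox_{(1/L)g}(y_n - (1/L) grad f(y_n))
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        theta = (t - 1.0) / t_next
        y = x + theta * (x - x_prev)                    # inertial step: y_{n+1} = x_n + theta_n (x_n - x_{n-1})
        x_prev, t = x, t_next
    return x_prev
```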
The term $x_n + \theta_n (x_n - x_{n-1})$ is known as an inertial term with an inertial parameter $\theta_n$. It has been shown that FISTA performs better than (3). Later, other forward–backward-type algorithms were introduced and studied by many authors; see, for instance, [2,8,18,21,22]. However, most of these works assume that $\nabla f$ is Lipschitz continuous, and the Lipschitz constant is difficult to compute in general. Therefore, in this paper, we focus on another approach in which $\nabla f$ is not necessarily Lipschitz continuous.
In 2016, Cruz and Nghia [23] introduced a line search technique as the following Algorithm 2.
Algorithm 2. Line Search 1 $(x, \delta, \sigma, \theta)$.
1: Input: Given $x \in \mathrm{dom}(g)$, $\delta > 0$, $\sigma > 0$ and $\theta \in (0, 1)$.
2: Set $\gamma = \sigma$.
3: while $\gamma \| \nabla f(\mathrm{prox}_{\gamma g}(x - \gamma \nabla f(x))) - \nabla f(x) \| > \delta \| \mathrm{prox}_{\gamma g}(x - \gamma \nabla f(x)) - x \|$ do
4:     Set $\gamma = \theta \gamma$
5: end while
6: Output $\gamma$.
   They asserted that Line Search 1 stops after finitely many steps and proposed the following Algorithm 3.
Algorithm 3. Algorithm with Line Search 1.
1: Input: Given $x_0 \in \mathrm{dom}(g)$, $\delta \in (0, \frac{1}{2})$, $\sigma > 0$, and $\theta \in (0, 1)$. For all $n \in \mathbb{N}$,
   $x_{n+1} = \mathrm{prox}_{\gamma_n g}(I - \gamma_n \nabla f)(x_n)$,
where $\gamma_n :=$ Line Search 1 $(x_n, \delta, \sigma, \theta)$.
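The backtracking rule of Line Search 1 and the update of Algorithm 3 can be sketched as follows (Python, illustrative only; grad_f and prox_g(v, gamma), computing $\mathrm{prox}_{\gamma g}(v)$, are assumed callables, and the default parameter values are placeholders within the stated ranges).

```python
import numpy as np

def line_search_1(x, grad_f, prox_g, delta, sigma, theta):
    # Shrink gamma by the factor theta until
    # gamma * ||grad_f(p) - grad_f(x)|| <= delta * ||p - x||, where p = prox_{gamma g}(x - gamma grad_f(x)).
    gamma = sigma
    gx = grad_f(x)
    while True:
        p = prox_g(x - gamma * gx, gamma)
        if gamma * np.linalg.norm(grad_f(p) - gx) <= delta * np.linalg.norm(p - x):
            return gamma
        gamma *= theta

def algorithm_3(x0, grad_f, prox_g, delta=0.4, sigma=1.0, theta=0.5, n_iter=200):
    # x_{n+1} = prox_{gamma_n g}(x_n - gamma_n grad_f(x_n)), with gamma_n from Line Search 1.
    x = x0.copy()
    for _ in range(n_iter):
        gamma = line_search_1(x, grad_f, prox_g, delta, sigma, theta)
        x = prox_g(x - gamma * grad_f(x), gamma)
    return x
```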
They also showed that the sequence $\{x_n\}$ defined by Algorithm 3 converges weakly to a solution of (1) under Assumptions A1 and A2, where:
A1. $f, g$ are proper lower semicontinuous convex functions with $\mathrm{dom}(g) \subseteq \mathrm{dom}(f)$;
A2. $f$ is differentiable on an open set containing $\mathrm{dom}(g)$, and $\nabla f$ is uniformly continuous on any bounded subset of $\mathrm{dom}(g)$ and maps any bounded subset of $\mathrm{dom}(g)$ to a bounded set in H.
It is noted that the L-Lipschitz continuity of $\nabla f$ is not necessarily assumed. Moreover, if $\nabla f$ is L-Lipschitz continuous, then A2 is satisfied.
   In 2019, Kankam et al. [3] proposed a new line search technique, stated as the following Algorithm 4.
Algorithm 4. Line Search 2 $(x, \delta, \sigma, \theta)$.
1: Input: Given $x \in \mathrm{dom}(g)$, $\delta > 0$, $\sigma > 0$ and $\theta \in (0, 1)$. Set
   $L(x, \gamma) = \mathrm{prox}_{\gamma g}(x - \gamma \nabla f(x))$, and
   $S(x, \gamma) = \mathrm{prox}_{\gamma g}(L(x, \gamma) - \gamma \nabla f(L(x, \gamma)))$.
2: Set $\gamma = \sigma$.
3: while
   $\gamma \max \{ \| \nabla f(S(x, \gamma)) - \nabla f(L(x, \gamma)) \|, \| \nabla f(L(x, \gamma)) - \nabla f(x) \| \} > \delta ( \| S(x, \gamma) - L(x, \gamma) \| + \| L(x, \gamma) - x \| )$
   do
4:     Set $\gamma = \theta \gamma$, $L(x, \gamma) = L(x, \theta \gamma)$, $S(x, \gamma) = S(x, \theta \gamma)$
5: end while
6: Output $\gamma$.
They also asserted that Line Search 2 stops after finitely many steps and proposed the following Algorithm 5.
Algorithm 5. Algorithm with Line Search 2.
1: Input: Given $x_0 \in \mathrm{dom}(g)$, $\delta \in (0, \frac{1}{8})$, $\sigma > 0$ and $\theta \in (0, 1)$. For all $n \in \mathbb{N}$,
   $y_n = \mathrm{prox}_{\gamma_n g}(x_n - \gamma_n \nabla f(x_n))$,
   $x_{n+1} = \mathrm{prox}_{\gamma_n g}(y_n - \gamma_n \nabla f(y_n))$,
where $\gamma_n :=$ Line Search 2 $(x_n, \delta, \sigma, \theta)$.
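For comparison with the sketch given after Algorithm 3, a Python sketch of Line Search 2 and the two-step update of Algorithm 5 might look as follows (illustrative only; the same prox_g(v, gamma) and grad_f conventions are assumed).

```python
import numpy as np

def line_search_2(x, grad_f, prox_g, delta, sigma, theta):
    # Backtracking on the two-step quantities L(x, gamma) and S(x, gamma) of Line Search 2.
    gamma = sigma
    gx = grad_f(x)
    while True:
        Lx = prox_g(x - gamma * gx, gamma)                  # L(x, gamma)
        Sx = prox_g(Lx - gamma * grad_f(Lx), gamma)         # S(x, gamma)
        lhs = gamma * max(np.linalg.norm(grad_f(Sx) - grad_f(Lx)),
                          np.linalg.norm(grad_f(Lx) - gx))
        rhs = delta * (np.linalg.norm(Sx - Lx) + np.linalg.norm(Lx - x))
        if lhs <= rhs:
            return gamma
        gamma *= theta

def algorithm_5(x0, grad_f, prox_g, delta=0.1, sigma=1.0, theta=0.5, n_iter=200):
    # y_n = prox_{gamma_n g}(x_n - gamma_n grad_f(x_n)); x_{n+1} = prox_{gamma_n g}(y_n - gamma_n grad_f(y_n)).
    x = x0.copy()
    for _ in range(n_iter):
        gamma = line_search_2(x, grad_f, prox_g, delta, sigma, theta)
        y = prox_g(x - gamma * grad_f(x), gamma)
        x = prox_g(y - gamma * grad_f(y), gamma)
    return x
```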
A weak convergence result for this algorithm was obtained under Assumptions A1 and A2. Although Algorithms 3 and 5 achieve weak convergence without the Lipschitz assumption on $\nabla f$, neither algorithm utilizes an inertial step. Therefore, it is interesting to investigate whether this technique can improve their convergence behavior.
Motivated by the works mentioned earlier, we aim to introduce a new line search technique and prove that it is well defined. Then, we employ it to construct a novel forward–backward algorithm that utilizes an inertial step to accelerate its convergence. We prove a weak convergence theorem for the proposed algorithm without the Lipschitz assumption on $\nabla f$ and apply it to solve classification problems on various data sets. We also compare its performance with Algorithms 3 and 5 to show that the proposed algorithm performs better.
This work is organized as follows: In Section 2, we recall some important definitions and lemmas used in later sections. In Section 3, we introduce a new line search technique and algorithm for solving (1). Then, we analyze the convergence and complexity of the proposed algorithm under Assumptions A1 and A2. In Section 4, we apply the proposed algorithm to solve data classification problems and compare its performance with other algorithms. Finally, the conclusion of this work is presented in Section 5.

2. Preliminaries

In this section, some important definitions and lemmas, which will be used in later sections, are presented.
Let $\{x_n\}$ be a sequence in H and $x \in H$. We denote by $x_n \to x$ and $x_n \rightharpoonup x$ the strong and weak convergence of $\{x_n\}$ to x, respectively. Let $f : H \to \mathbb{R} \cup \{+\infty\}$ be a proper lower semicontinuous and convex function. We denote $\mathrm{dom}(f) = \{x \in H : f(x) < +\infty\}$.
The subdifferential of f at x is defined by
$$\partial f(x) := \{ u \in H : \langle u, y - x \rangle + f(x) \leq f(y), \ \forall y \in H \}.$$
The proximal operator $\mathrm{prox}_{\alpha f} : H \to \mathrm{dom}(f)$ is defined as follows:
$$\mathrm{prox}_{\alpha f}(x) = (I + \alpha \partial f)^{-1}(x),$$
where I is the identity mapping and $\alpha$ is a positive number. It is well known that this operator is single-valued, nonexpansive, and
$$x - \mathrm{prox}_{\alpha f}(x) \in \alpha \, \partial f(\mathrm{prox}_{\alpha f}(x)), \quad \text{for all } x \in H \text{ and } \alpha > 0; \qquad(4)$$
see [23] for more details. Next, we present some important lemmas for this work.
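As a standard worked example (not specific to this paper), the proximal operator of the $\ell_1$ term used later, $f = \lambda \| \cdot \|_1$, has a closed form given by componentwise soft-thresholding:
$$[\mathrm{prox}_{\alpha \lambda \| \cdot \|_1}(x)]_i = \operatorname{sign}(x_i) \max\{ |x_i| - \alpha\lambda, 0 \} = \begin{cases} x_i - \alpha\lambda, & x_i > \alpha\lambda, \\ 0, & |x_i| \leq \alpha\lambda, \\ x_i + \alpha\lambda, & x_i < -\alpha\lambda. \end{cases}$$
One can check directly that $x - \mathrm{prox}_{\alpha \lambda \| \cdot \|_1}(x) \in \alpha \, \partial (\lambda \| \cdot \|_1)(\mathrm{prox}_{\alpha \lambda \| \cdot \|_1}(x))$, which is exactly inclusion (4) for this choice of f.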
Lemma 1 ([24]). Let $\partial f$ be the subdifferential of f. Then, the following hold:
(i) $\partial f$ is maximal monotone;
(ii) $\mathrm{Gph}(\partial f) := \{ (x, y) \in H \times H : y \in \partial f(x) \}$ is demiclosed, i.e., for any sequence $\{(x_n, y_n)\} \subseteq \mathrm{Gph}(\partial f)$ such that $x_n \rightharpoonup x$ and $y_n \to y$, we have $(x, y) \in \mathrm{Gph}(\partial f)$.
Lemma 2 ([25]). Let $f, g : H \to \mathbb{R} \cup \{+\infty\}$ be proper lower semicontinuous convex functions with $\mathrm{dom}(g) \subseteq \mathrm{dom}(f)$ and $J(x, \alpha) = \mathrm{prox}_{\alpha g}(x - \alpha \nabla f(x))$. Then, for any $x \in \mathrm{dom}(g)$ and $\alpha_2 \geq \alpha_1 > 0$, we have
$$\frac{\alpha_2}{\alpha_1} \| x - J(x, \alpha_1) \| \geq \| x - J(x, \alpha_2) \| \geq \| x - J(x, \alpha_1) \|.$$
Lemma 3 ([26]). Let H be a real Hilbert space. Then, for all $a, b, c \in H$ and $\zeta \in [0, 1]$, the following hold:
(i) $\| a \pm b \|^2 = \| a \|^2 \pm 2 \langle a, b \rangle + \| b \|^2$;
(ii) $\| \zeta a + (1 - \zeta) b \|^2 = \zeta \| a \|^2 + (1 - \zeta) \| b \|^2 - \zeta (1 - \zeta) \| a - b \|^2$;
(iii) $\| a + b \|^2 \leq \| a \|^2 + 2 \langle b, a + b \rangle$;
(iv) $\langle a - b, b - c \rangle = \frac{1}{2} ( \| a - c \|^2 - \| a - b \|^2 - \| b - c \|^2 )$.
Lemma 4 ([8]). Let $\{a_n\}$ and $\{b_n\}$ be sequences of non-negative real numbers such that
$$a_{n+1} \leq (1 + b_n) a_n + b_n a_{n-1}, \quad \text{for all } n \in \mathbb{N}.$$
Then, the following holds:
$$a_{n+1} \leq K \cdot \prod_{j=1}^{n} (1 + 2 b_j), \quad \text{where } K = \max\{a_1, a_2\}.$$
Moreover, if $\sum_{n=1}^{+\infty} b_n < +\infty$, then $\{a_n\}$ is bounded.
Lemma 5 ([26]). Let $\{\alpha_n\}$, $\{\beta_n\}$ and $\{\gamma_n\}$ be sequences of non-negative real numbers such that
$$\alpha_{n+1} \leq (1 + \gamma_n) \alpha_n + \beta_n, \quad \text{for all } n \in \mathbb{N}.$$
If $\sum_{n=1}^{+\infty} \gamma_n < +\infty$ and $\sum_{n=1}^{+\infty} \beta_n < +\infty$, then $\lim_{n \to +\infty} \alpha_n$ exists.
Lemma 6 ([27], Opial). Let $\{x_n\}$ be a sequence in a Hilbert space H. If there exists a nonempty subset $\Omega$ of H such that the following hold:
(i) For any $x^* \in \Omega$, $\lim_{n \to +\infty} \| x_n - x^* \|$ exists;
(ii) Every weak-cluster point of $\{x_n\}$ belongs to $\Omega$.
Then, $\{x_n\}$ converges weakly to an element of $\Omega$.

3. Main Results

In this section, we define a new line search technique and a new accelerated algorithm with the new line search for solving (1). We denote by $S_*$ the set of all solutions of (1) and suppose that $f, g : H \to \mathbb{R} \cup \{+\infty\}$ are two convex functions that satisfy Assumptions A1 and A2 and that $\mathrm{dom}(g)$ is closed. Furthermore, we also suppose that $S_* \neq \emptyset$.
We first introduce a new line search technique as the following Algorithm 6.
Algorithm 6. Line Search 3 $(x, \delta, \sigma, \theta)$.
1: Input: Given $x \in \mathrm{dom}(g)$, $\delta > 0$, $\sigma > 0$ and $\theta \in (0, 1)$. Set
   $L(x, \gamma) = \mathrm{prox}_{\gamma g}(x - \gamma \nabla f(x))$, and
   $S(x, \gamma) = \mathrm{prox}_{\gamma g}(L(x, \gamma) - \gamma \nabla f(L(x, \gamma)))$.
2: Set $\gamma = \sigma$.
3: while
   $\frac{\gamma}{2} \left( \| \nabla f(S(x, \gamma)) - \nabla f(L(x, \gamma)) \| + \| \nabla f(L(x, \gamma)) - \nabla f(x) \| \right) > \delta \left( \| S(x, \gamma) - L(x, \gamma) \| + \| L(x, \gamma) - x \| \right)$,
   or $\gamma \| \nabla f(L(x, \gamma)) - \nabla f(x) \| > 4\delta \| L(x, \gamma) - x \|$
   do
4:     Set $\gamma = \theta \gamma$, $L(x, \gamma) = L(x, \theta \gamma)$, $S(x, \gamma) = S(x, \theta \gamma)$
5: end while
6: Output $\gamma$.
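A direct translation of Line Search 3 into Python might look as follows; this is a sketch only, with grad_f and prox_g(v, gamma) assumed callables, and the loop test mirrors the two conditions in step 3.

```python
import numpy as np

def line_search_3(x, grad_f, prox_g, delta, sigma, theta):
    # Shrink gamma by theta until BOTH conditions of step 3 fail, then return gamma.
    gamma = sigma
    gx = grad_f(x)
    while True:
        Lx = prox_g(x - gamma * gx, gamma)                 # L(x, gamma)
        Sx = prox_g(Lx - gamma * grad_f(Lx), gamma)        # S(x, gamma)
        cond1 = (gamma / 2.0) * (np.linalg.norm(grad_f(Sx) - grad_f(Lx))
                                 + np.linalg.norm(grad_f(Lx) - gx)) \
                > delta * (np.linalg.norm(Sx - Lx) + np.linalg.norm(Lx - x))
        cond2 = gamma * np.linalg.norm(grad_f(Lx) - gx) \
                > 4.0 * delta * np.linalg.norm(Lx - x)
        if not (cond1 or cond2):
            return gamma
        gamma *= theta
```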
We first show that Line Search 3 terminates at finitely many steps.
Lemma 7.
Line Search 3 stops at finitely many steps.
Proof. 
If $x \in S_*$, then $x = L(x, \sigma) = S(x, \sigma)$, so Line Search 3 stops with zero steps. If $x \notin S_*$, suppose by contradiction that, for all $n \in \mathbb{N}$, the following hold:
$$\frac{\sigma\theta^n}{2} \left( \| \nabla f(S(x, \sigma\theta^n)) - \nabla f(L(x, \sigma\theta^n)) \| + \| \nabla f(L(x, \sigma\theta^n)) - \nabla f(x) \| \right) > \delta \left( \| S(x, \sigma\theta^n) - L(x, \sigma\theta^n) \| + \| L(x, \sigma\theta^n) - x \| \right), \qquad(5)$$
or
$$\sigma\theta^n \| \nabla f(L(x, \sigma\theta^n)) - \nabla f(x) \| > 4\delta \| L(x, \sigma\theta^n) - x \|. \qquad(6)$$
Then, from these assumptions, we can find a subsequence $\{\sigma\theta^{n_k}\}$ of $\{\sigma\theta^n\}$ such that (5) or (6) holds. First, we show that
$$\{ \| \nabla f(L(x, \sigma\theta^n)) - \nabla f(x) \| \} \quad \text{and} \quad \{ \| \nabla f(S(x, \sigma\theta^n)) - \nabla f(L(x, \sigma\theta^n)) \| \}$$
are bounded. It follows from Lemma 2 that
$$\| L(x, \sigma\theta^n) - x \| \leq \| L(x, \sigma) - x \|,$$
for all $n \in \mathbb{N}$. In combination with A2, we conclude that $\{ \| \nabla f(L(x, \sigma\theta^n)) - \nabla f(x) \| \}$ is bounded. Next, we prove that $\{ \| \nabla f(S(x, \sigma\theta^n)) - \nabla f(L(x, \sigma\theta^n)) \| \}$ is bounded. Since $\mathrm{prox}_{\gamma g}$ is nonexpansive for any $\gamma > 0$, we have
$$\begin{aligned}
\| S(x, \sigma\theta^n) - L(x, \sigma\theta^n) \| &= \| \mathrm{prox}_{\sigma\theta^n g}(L(x, \sigma\theta^n) - \sigma\theta^n \nabla f(L(x, \sigma\theta^n))) - \mathrm{prox}_{\sigma\theta^n g}(x - \sigma\theta^n \nabla f(x)) \| \\
&\leq \| (L(x, \sigma\theta^n) - \sigma\theta^n \nabla f(L(x, \sigma\theta^n))) - (x - \sigma\theta^n \nabla f(x)) \| \\
&\leq \| L(x, \sigma\theta^n) - x \| + \sigma\theta^n \| \nabla f(L(x, \sigma\theta^n)) - \nabla f(x) \| \\
&\leq \| L(x, \sigma\theta^n) - x \| + \sigma \| \nabla f(L(x, \sigma\theta^n)) - \nabla f(x) \|,
\end{aligned}$$
for all $n \in \mathbb{N}$; hence, $\{ \| S(x, \sigma\theta^n) - L(x, \sigma\theta^n) \| \}$ is bounded. Again, it follows from A2 that $\{ \| \nabla f(S(x, \sigma\theta^n)) - \nabla f(L(x, \sigma\theta^n)) \| \}$ is bounded. To complete the proof, we consider the only two possible cases and derive a contradiction in each.
Case 1: Suppose that there exists a subsequence $\{\sigma\theta^{n_k}\}$ of $\{\sigma\theta^n\}$ such that (5) holds for all $k \in \mathbb{N}$. Then, it follows that $\| S(x, \sigma\theta^{n_k}) - L(x, \sigma\theta^{n_k}) \| \to 0$ and $\| L(x, \sigma\theta^{n_k}) - x \| \to 0$, as $k \to +\infty$. Since $\nabla f$ is uniformly continuous, we obtain
$$\| \nabla f(S(x, \sigma\theta^{n_k})) - \nabla f(L(x, \sigma\theta^{n_k})) \| \to 0 \quad \text{and} \quad \| \nabla f(L(x, \sigma\theta^{n_k})) - \nabla f(x) \| \to 0,$$
as $k \to +\infty$. Therefore, it follows from (5) that $\frac{\| L(x, \sigma\theta^{n_k}) - x \|}{\sigma\theta^{n_k}} \to 0$, as $k \to +\infty$. By (4), we obtain
$$x - \sigma\theta^{n_k} \nabla f(x) - L(x, \sigma\theta^{n_k}) \in \sigma\theta^{n_k} \partial g(L(x, \sigma\theta^{n_k})).$$
Thus, $\frac{x - L(x, \sigma\theta^{n_k})}{\sigma\theta^{n_k}} - \nabla f(x) \in \partial g(L(x, \sigma\theta^{n_k}))$. Since $L(x, \sigma\theta^{n_k}) \to x$, as $k \to +\infty$, we obtain from Lemma 1 that $0 \in \nabla f(x) + \partial g(x) \subseteq \partial (f + g)(x)$. Hence, $x \in S_*$, which is a contradiction.
Case 2: Suppose that there is a subsequence $\{\sigma\theta^{n_k}\}$ of $\{\sigma\theta^n\}$ satisfying (6) for all $k \in \mathbb{N}$. Then, $\| L(x, \sigma\theta^{n_k}) - x \| \to 0$, as $k \to +\infty$. Again, from the uniform continuity of $\nabla f$, we have
$$\| \nabla f(L(x, \sigma\theta^{n_k})) - \nabla f(x) \| \to 0,$$
as $k \to +\infty$. From (6), we conclude that
$$\frac{\| L(x, \sigma\theta^{n_k}) - x \|}{\sigma\theta^{n_k}} \to 0,$$
as $k \to +\infty$. By the same argument as in Case 1, we can show that $0 \in \partial (f + g)(x)$, and hence, $x \in S_*$, a contradiction. Therefore, we conclude that Line Search 3 stops after finitely many steps, and the proof is complete.    □
We propose a new inertial algorithm with Line Search 3 as the following Algorithm 7.
Algorithm 7. Inertial algorithm with Line Search 3.
1: Input: Given $x_0, x_1 \in \mathrm{dom}(g)$, $\alpha_n \in [0, 1]$, $\beta_n \geq 0$, $\sigma > 0$, $\theta \in (0, 1)$ and $\delta \in (0, \frac{1}{8})$. For $n \in \mathbb{N}$,
   $y_n = x_n + \beta_n (x_n - x_{n-1})$,
   $z_n = P_{\mathrm{dom}(g)}(y_n)$,
   $w_n = \mathrm{prox}_{\gamma_n g}(z_n - \gamma_n \nabla f(z_n))$,
   $x_{n+1} = (1 - \alpha_n) w_n + \alpha_n \mathrm{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n))$,
where $\gamma_n :=$ Line Search 3 $(z_n, \delta, \sigma, \theta)$, and $P_{\mathrm{dom}(g)}$ is the metric projection onto $\mathrm{dom}(g)$.
The diagram of Algorithm 7 can be seen in Figure 1.
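The main loop of Algorithm 7 can be sketched in Python as below, reusing line_search_3 from the sketch given after Algorithm 6; project_dom stands for the metric projection onto $\mathrm{dom}(g)$ (the identity map when $\mathrm{dom}(g)$ is the whole space), and the default parameter values are only placeholders consistent with the assumptions of Theorem 9.

```python
def algorithm_7(x0, x1, grad_f, prox_g, project_dom, beta, alpha=0.5,
                delta=0.1, sigma=1.0, theta=0.5, n_iter=200):
    # beta: callable n -> beta_n with a summable sequence (condition B1); alpha in [0, 1]; delta in (0, 1/8).
    x_prev, x = x0.copy(), x1.copy()
    for n in range(1, n_iter + 1):
        y = x + beta(n) * (x - x_prev)                   # inertial step y_n = x_n + beta_n (x_n - x_{n-1})
        z = project_dom(y)                               # z_n = P_{dom(g)}(y_n)
        gamma = line_search_3(z, grad_f, prox_g, delta, sigma, theta)
        w = prox_g(z - gamma * grad_f(z), gamma)         # w_n: forward-backward step at z_n
        v = prox_g(w - gamma * grad_f(w), gamma)         # second forward-backward step at w_n
        x_prev, x = x, (1.0 - alpha) * w + alpha * v     # convex combination x_{n+1}
    return x
```

As an example of a summable inertial schedule, one could take beta(n) = 0.95 for n <= 1000 and 1/n**2 afterwards, which is the choice used in the experiments of Section 4.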
Next, we prove the following lemma, which will play a crucial role in our main theorems.
Lemma 8.
Let $\gamma_n :=$ Line Search 3 $(z_n, \delta, \sigma, \theta)$. Then, for all $n \in \mathbb{N}$ and $x \in \mathrm{dom}(g)$, the following hold:
(I) $\| z_n - x \|^2 \geq \| w_n - x \|^2 + 2\gamma_n [ (f+g)(w_n) - (f+g)(x) ] + (1 - 8\delta) \| w_n - z_n \|^2$;
(II) $\| z_n - x \|^2 \geq \| v_n - x \|^2 + 2\gamma_n [ (f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x) ] + (1 - 8\delta) ( \| w_n - z_n \|^2 + \| v_n - w_n \|^2 )$,
where $v_n = \mathrm{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n))$.
Proof. 
First, we show that (I) is true. From (4), we know that
$$\frac{z_n - w_n}{\gamma_n} - \nabla f(z_n) \in \partial g(w_n), \quad \text{for all } n \in \mathbb{N}.$$
Moreover, it follows from the definitions of $\partial g(w_n)$, $\nabla f(z_n)$ and $\nabla f(w_n)$ that
$$g(x) - g(w_n) \geq \left\langle \frac{z_n - w_n}{\gamma_n} - \nabla f(z_n), x - w_n \right\rangle,$$
$$f(x) - f(z_n) \geq \langle \nabla f(z_n), x - z_n \rangle \quad \text{and} \quad f(z_n) - f(w_n) \geq \langle \nabla f(w_n), z_n - w_n \rangle,$$
for all $n \in \mathbb{N}$. Consequently,
$$\begin{aligned}
f(x) - f(z_n) + g(x) - g(w_n) &\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \langle \nabla f(z_n), w_n - z_n \rangle \\
&= \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \langle \nabla f(z_n) - \nabla f(w_n), w_n - z_n \rangle + \langle \nabla f(w_n), w_n - z_n \rangle \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle - \| \nabla f(z_n) - \nabla f(w_n) \| \| w_n - z_n \| + \langle \nabla f(w_n), w_n - z_n \rangle \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle - \frac{4\delta}{\gamma_n} \| w_n - z_n \|^2 + f(w_n) - f(z_n),
\end{aligned}$$
for all $n \in \mathbb{N}$. It follows that
$$\frac{1}{\gamma_n} \langle z_n - w_n, w_n - x \rangle \geq (f+g)(w_n) - (f+g)(x) - \frac{4\delta}{\gamma_n} \| w_n - z_n \|^2, \quad \text{for all } n \in \mathbb{N}.$$
From Lemma 3, we have $\langle z_n - w_n, w_n - x \rangle = \frac{1}{2} ( \| z_n - x \|^2 - \| z_n - w_n \|^2 - \| w_n - x \|^2 )$, and hence,
$$\frac{1}{2\gamma_n} ( \| z_n - x \|^2 - \| z_n - w_n \|^2 - \| w_n - x \|^2 ) \geq (f+g)(w_n) - (f+g)(x) - \frac{4\delta}{\gamma_n} \| w_n - z_n \|^2,$$
for all $n \in \mathbb{N}$. Then, it follows that, for any $x \in \mathrm{dom}(g)$,
$$\| z_n - x \|^2 \geq \| w_n - x \|^2 + 2\gamma_n [ (f+g)(w_n) - (f+g)(x) ] + (1 - 8\delta) \| w_n - z_n \|^2,$$
and (I) is proven. Next, we show (II). From (4), we have that
$$\frac{z_n - w_n}{\gamma_n} - \nabla f(z_n) \in \partial g(w_n), \quad \text{and} \quad \frac{w_n - v_n}{\gamma_n} - \nabla f(w_n) \in \partial g(v_n).$$
Then,
$$g(x) - g(w_n) \geq \left\langle \frac{z_n - w_n}{\gamma_n} - \nabla f(z_n), x - w_n \right\rangle, \quad \text{and}$$
$$g(x) - g(v_n) \geq \left\langle \frac{w_n - v_n}{\gamma_n} - \nabla f(w_n), x - v_n \right\rangle, \quad \text{for all } n \in \mathbb{N}.$$
Moreover,
$$f(x) - f(z_n) \geq \langle \nabla f(z_n), x - z_n \rangle,$$
$$f(x) - f(w_n) \geq \langle \nabla f(w_n), x - w_n \rangle,$$
$$f(z_n) - f(w_n) \geq \langle \nabla f(w_n), z_n - w_n \rangle, \quad \text{and}$$
$$f(w_n) - f(v_n) \geq \langle \nabla f(v_n), w_n - v_n \rangle, \quad \text{for all } n \in \mathbb{N}.$$
The above inequalities imply
$$\begin{aligned}
f(x) &- f(z_n) + f(x) - f(w_n) + g(x) - g(w_n) + g(x) - g(v_n) \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \langle \nabla f(z_n), w_n - z_n \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle + \langle \nabla f(w_n), v_n - w_n \rangle \\
&= \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \langle \nabla f(z_n) - \nabla f(w_n), w_n - z_n \rangle + \langle \nabla f(w_n), w_n - z_n \rangle \\
&\quad + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle + \langle \nabla f(w_n) - \nabla f(v_n), v_n - w_n \rangle + \langle \nabla f(v_n), v_n - w_n \rangle \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle - \| \nabla f(w_n) - \nabla f(z_n) \| \| w_n - z_n \| + \langle \nabla f(w_n), w_n - z_n \rangle \\
&\quad - \| \nabla f(v_n) - \nabla f(w_n) \| \| v_n - w_n \| + \langle \nabla f(v_n), v_n - w_n \rangle \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle - \| \nabla f(w_n) - \nabla f(z_n) \| ( \| w_n - z_n \| + \| v_n - w_n \| ) + \langle \nabla f(w_n), w_n - z_n \rangle \\
&\quad - \| \nabla f(v_n) - \nabla f(w_n) \| ( \| w_n - z_n \| + \| v_n - w_n \| ) + \langle \nabla f(v_n), v_n - w_n \rangle \\
&= \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle + \langle \nabla f(w_n), w_n - z_n \rangle + \langle \nabla f(v_n), v_n - w_n \rangle \\
&\quad - ( \| \nabla f(w_n) - \nabla f(z_n) \| + \| \nabla f(v_n) - \nabla f(w_n) \| ) ( \| w_n - z_n \| + \| v_n - w_n \| ) \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle + \langle \nabla f(w_n), w_n - z_n \rangle + \langle \nabla f(v_n), v_n - w_n \rangle - \frac{2\delta}{\gamma_n} ( \| w_n - z_n \| + \| v_n - w_n \| )^2 \\
&\geq \frac{1}{\gamma_n} \langle z_n - w_n, x - w_n \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, x - v_n \rangle + f(v_n) - f(z_n) - \frac{4\delta}{\gamma_n} ( \| w_n - z_n \|^2 + \| v_n - w_n \|^2 ),
\end{aligned}$$
for all $x \in \mathrm{dom}(g)$ and $n \in \mathbb{N}$. Hence,
$$\frac{1}{\gamma_n} \langle z_n - w_n, w_n - x \rangle + \frac{1}{\gamma_n} \langle w_n - v_n, v_n - x \rangle \geq (f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x) - \frac{4\delta}{\gamma_n} \| w_n - z_n \|^2 - \frac{4\delta}{\gamma_n} \| v_n - w_n \|^2.$$
Moreover, from Lemma 3, we have, for all $n \in \mathbb{N}$,
$$\langle z_n - w_n, w_n - x \rangle = \frac{1}{2} ( \| z_n - x \|^2 - \| z_n - w_n \|^2 - \| w_n - x \|^2 ), \quad \text{and}$$
$$\langle w_n - v_n, v_n - x \rangle = \frac{1}{2} ( \| w_n - x \|^2 - \| w_n - v_n \|^2 - \| v_n - x \|^2 ).$$
As a result, we obtain
$$\frac{1}{2\gamma_n} ( \| z_n - x \|^2 - \| z_n - w_n \|^2 ) - \frac{1}{2\gamma_n} ( \| w_n - v_n \|^2 + \| v_n - x \|^2 ) \geq (f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x) - \frac{4\delta}{\gamma_n} \| w_n - z_n \|^2 - \frac{4\delta}{\gamma_n} \| v_n - w_n \|^2,$$
for all $x \in \mathrm{dom}(g)$ and $n \in \mathbb{N}$. Therefore,
$$\| z_n - x \|^2 \geq \| v_n - x \|^2 + 2\gamma_n [ (f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x) ] + (1 - 8\delta) ( \| w_n - z_n \|^2 + \| v_n - w_n \|^2 ),$$
for all $x \in \mathrm{dom}(g)$ and $n \in \mathbb{N}$, and hence, (II) is proven.    □
Next, we prove the weak convergence result of Algorithm 7.
Theorem 9.
Let $\{x_n\}$ be a sequence generated by Algorithm 7. Suppose that the following hold:
B1. $\sum_{n=1}^{+\infty} \beta_n < +\infty$;
B2. There exists $\gamma > 0$ such that $\gamma_n \geq \gamma$, for all $n \in \mathbb{N}$.
Then, $\{x_n\}$ converges weakly to some point in $S_*$.
Proof. 
Let $x^* \in S_*$; obviously, $x^* \in \mathrm{dom}(g)$. The following are direct consequences of Lemma 8:
$$\| z_n - x^* \|^2 \geq \| w_n - x^* \|^2 + 2\gamma_n [ (f+g)(w_n) - (f+g)(x^*) ] + (1 - 8\delta) \| w_n - z_n \|^2 \geq \| w_n - x^* \|^2 + (1 - 8\delta) \| w_n - z_n \|^2, \qquad(7)$$
and
$$\| z_n - x^* \|^2 \geq \| v_n - x^* \|^2 + 2\gamma_n [ (f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x^*) ] + (1 - 8\delta) ( \| w_n - z_n \|^2 + \| v_n - w_n \|^2 ) \geq \| v_n - x^* \|^2 + (1 - 8\delta) ( \| w_n - z_n \|^2 + \| v_n - w_n \|^2 ), \qquad(8)$$
where $v_n = \mathrm{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n))$. Then, we have
$$\| x_{n+1} - x^* \| \leq (1 - \alpha_n) \| w_n - x^* \| + \alpha_n \| v_n - x^* \| \leq (1 - \alpha_n) \| w_n - x^* \| + \alpha_n \| z_n - x^* \| \leq \| z_n - x^* \|. \qquad(9)$$
Next, we show that $\lim_{n \to +\infty} \| x_n - x^* \|$ exists. Since $P_{\mathrm{dom}(g)}$ is nonexpansive, we have
$$\| x_{n+1} - x^* \| \leq \| z_n - x^* \| = \| P_{\mathrm{dom}(g)} y_n - P_{\mathrm{dom}(g)} x^* \| \leq \| y_n - x^* \| \leq \| x_n - x^* \| + \beta_n \| x_n - x_{n-1} \| \leq (1 + \beta_n) \| x_n - x^* \| + \beta_n \| x_{n-1} - x^* \|, \quad \text{for all } n \in \mathbb{N}. \qquad(10)$$
By using Lemma 4, we have that $\{x_n\}$ is bounded. Consequently, $\sum_{n=1}^{+\infty} \beta_n \| x_n - x_{n-1} \| < +\infty$, and
$$\| y_n - x_n \| = \beta_n \| x_n - x_{n-1} \| \to 0, \quad \text{as } n \to +\infty.$$
By (10) together with Lemma 5, we conclude that $\lim_{n \to +\infty} \| x_n - x^* \|$ exists. Since $x_n \in \mathrm{dom}(g)$, for all $n \in \mathbb{N}$, we obtain
$$\| y_n - z_n \| \leq \| y_n - x_n \|, \quad \text{for all } n \in \mathbb{N},$$
which implies that $\lim_{n \to +\infty} \| y_n - z_n \| = 0$. Consequently, $\lim_{n \to +\infty} \| x_n - z_n \| = 0$, and hence, $\lim_{n \to +\infty} \| x_n - x^* \| = \lim_{n \to +\infty} \| z_n - x^* \|$. Now, we show that $\lim_{n \to +\infty} \| x_n - w_n \| = 0$. To do this, we consider the following two cases.
Case 1. $\limsup_{n \to +\infty} \alpha_n = c < 1$. Then, from (9), we obtain
$$\limsup_{n \to +\infty} \| w_n - x^* \| = \limsup_{n \to +\infty} \| x_n - x^* \| = \limsup_{n \to +\infty} \| z_n - x^* \|.$$
Therefore, we obtain from (7) that $\lim_{n \to +\infty} \| w_n - z_n \| = 0$. As a result, we have $\lim_{n \to +\infty} \| x_n - w_n \| = 0$.
Case 2. $\limsup_{n \to +\infty} \alpha_n = 1$. Then, it follows from (9) that
$$\limsup_{n \to +\infty} \| v_n - x^* \| = \limsup_{n \to +\infty} \| x_n - x^* \| = \limsup_{n \to +\infty} \| z_n - x^* \|.$$
It follows from (8) that $\lim_{n \to +\infty} \| w_n - z_n \| = 0$, and hence, $\lim_{n \to +\infty} \| x_n - w_n \| = 0$.
We claim that every weak-cluster point of $\{x_n\}$ belongs to $S_*$. To prove this claim, let w be a weak-cluster point of $\{x_n\}$. Then, there exists a subsequence $\{x_{n_k}\}$ of $\{x_n\}$ such that $x_{n_k} \rightharpoonup w$, and hence, $w_{n_k} \rightharpoonup w$. Next, we show that $w \in S_*$. From A2, we know that $\nabla f$ is uniformly continuous, so $\lim_{k \to +\infty} \| \nabla f(w_{n_k}) - \nabla f(z_{n_k}) \| = 0$. From (4), we also have
$$z_{n_k} - \gamma_{n_k} \nabla f(z_{n_k}) - w_{n_k} \in \gamma_{n_k} \partial g(w_{n_k}), \quad \text{for all } k \in \mathbb{N}.$$
Hence,
$$\frac{z_{n_k} - w_{n_k}}{\gamma_{n_k}} - \nabla f(z_{n_k}) + \nabla f(w_{n_k}) \in \partial g(w_{n_k}) + \nabla f(w_{n_k}) = \partial (f + g)(w_{n_k}), \quad \text{for all } k \in \mathbb{N}.$$
By letting $k \to +\infty$ in the above inclusion, we can conclude from Lemma 1 that $0 \in \partial (f + g)(w)$, and hence, $w \in S_*$. It follows directly from Lemma 6 that $\{x_n\}$ converges weakly to a point in $S_*$, and the proof is now complete.    □
If we set $\beta_n = 0$, for all $n \in \mathbb{N}$, in Algorithm 7, we obtain the following Algorithm 8.
Algorithm 8. Algorithm with Line Search 3.
1: Input: Given $x_0 \in \mathrm{dom}(g)$, $\sigma > 0$, $\theta \in (0, 1)$, $\delta \in (0, \frac{1}{8})$ and $\alpha_n \in [0, 1]$. For $n \in \mathbb{N}$,
   $w_n = \mathrm{prox}_{\gamma_n g}(x_n - \gamma_n \nabla f(x_n))$,
   $x_{n+1} = (1 - \alpha_n) w_n + \alpha_n \mathrm{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n))$,
where $\gamma_n :=$ Line Search 3 $(x_n, \delta, \sigma, \theta)$.
The diagram of Algorithm 8 can be seen in Figure 2.
We next analyze the complexity of Algorithm 8.
Theorem 10.
Let $\{x_n\}$ be a sequence generated by Algorithm 8. Suppose that there exists $\gamma > 0$ such that $\gamma_n \geq \gamma$, for all $n \in \mathbb{N}$. Then, $\{x_n\}$ converges weakly to a point in $S_*$. In addition, if $\delta \in (0, \frac{1}{16})$, then the following also holds:
$$(f+g)(x_n) - \min_{x \in H} (f+g)(x) \leq \frac{[d(x_0, S_*)]^2}{2\gamma n}, \qquad(11)$$
for all $n \in \mathbb{N}$.
Proof. 
The weak convergence of $\{x_n\}$ is guaranteed by Theorem 9. It remains to show that (11) is true. Let $v_n = \mathrm{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n))$ and $x^* \in S_*$.
We first show that $(f+g)(x_{k+1}) \leq (f+g)(x_k)$, for all $k \in \mathbb{N}$. We know that $x_k = z_k$ in Lemma 8, so for any $x \in \mathrm{dom}(g)$ and $k \in \mathbb{N}$, we have
$$\| x_k - x \|^2 \geq \| w_k - x \|^2 + 2\gamma_k [ (f+g)(w_k) - (f+g)(x) ] + (1 - 8\delta) \| w_k - x_k \|^2, \qquad(12)$$
and
$$\| x_k - x \|^2 \geq \| v_k - x \|^2 + 2\gamma_k [ (f+g)(w_k) + (f+g)(v_k) - 2(f+g)(x) ] + (1 - 8\delta) ( \| w_k - x_k \|^2 + \| v_k - w_k \|^2 ). \qquad(13)$$
Putting $x = x_k$ in (12) and (13), we obtain
$$-\| w_k - x_k \|^2 \geq 2\gamma_k [ (f+g)(w_k) - (f+g)(x_k) ] + (1 - 8\delta) \| w_k - x_k \|^2, \qquad(14)$$
and
$$-\| v_k - x_k \|^2 \geq 2\gamma_k [ (f+g)(w_k) + (f+g)(v_k) - 2(f+g)(x_k) ] + (1 - 8\delta) ( \| w_k - x_k \|^2 + \| v_k - w_k \|^2 ), \qquad(15)$$
respectively. Substituting $x$ with $w_k$ in (13), we obtain
$$\| x_k - w_k \|^2 - \| v_k - w_k \|^2 \geq 2\gamma_k [ (f+g)(v_k) - (f+g)(w_k) ] + (1 - 8\delta) ( \| w_k - x_k \|^2 + \| v_k - w_k \|^2 ). \qquad(16)$$
By summing (15) and (16), we obtain
$$(16\delta - 1) \| x_k - w_k \|^2 + (16\delta - 4) \| v_k - w_k \|^2 \geq 4\gamma_k [ (f+g)(v_k) - (f+g)(x_k) ]. \qquad(17)$$
It follows from (14) and (17) that
$$(f+g)(w_k) \leq (f+g)(x_k) \quad \text{and} \quad (f+g)(v_k) \leq (f+g)(x_k), \qquad(18)$$
respectively, for all $k \in \mathbb{N}$. Hence,
$$(f+g)(x_{k+1}) - (f+g)(x_k) \leq (1 - \alpha_k)(f+g)(w_k) + \alpha_k (f+g)(v_k) - (f+g)(x_k) \leq 0,$$
for all $k \in \mathbb{N}$. Hence, $\{(f+g)(x_k)\}$ is a non-increasing sequence. Now, put $x = x^*$ in (12) and (13); then we obtain
$$\| w_k - x^* \|^2 - \| x_k - x^* \|^2 \leq 2\gamma_k [ (f+g)(x^*) - (f+g)(w_k) ], \qquad(19)$$
and
$$\| v_k - x^* \|^2 - \| x_k - x^* \|^2 \leq 2\gamma_k [ 2(f+g)(x^*) - (f+g)(w_k) - (f+g)(v_k) ] \leq 2\gamma_k [ (f+g)(x^*) - (f+g)(v_k) ]. \qquad(20)$$
Inequalities (19) and (20) imply that
$$\begin{aligned}
\| x_{k+1} - x^* \|^2 - \| x_k - x^* \|^2 &\leq (1 - \alpha_k) \| w_k - x^* \|^2 + \alpha_k \| v_k - x^* \|^2 - \| x_k - x^* \|^2 \\
&\leq 2\gamma_k (1 - \alpha_k) [ (f+g)(x^*) - (f+g)(w_k) ] + 2\gamma_k \alpha_k [ (f+g)(x^*) - (f+g)(v_k) ] \\
&= 2\gamma_k (f+g)(x^*) - 2\gamma_k [ (1 - \alpha_k)(f+g)(w_k) + \alpha_k (f+g)(v_k) ] \\
&\leq 2\gamma_k [ (f+g)(x^*) - (f+g)(x_{k+1}) ],
\end{aligned}$$
for all $k \in \mathbb{N}$. Since $\gamma_k \geq \gamma$, we obtain
$$0 \geq (f+g)(x^*) - (f+g)(x_{k+1}) \geq \frac{1}{2\gamma_k} ( \| x_{k+1} - x^* \|^2 - \| x_k - x^* \|^2 ) \geq \frac{1}{2\gamma} ( \| x_{k+1} - x^* \|^2 - \| x_k - x^* \|^2 ),$$
for all $k \in \mathbb{N}$. Summing the above inequality over $k = 0, 1, \ldots, n - 1$, we obtain
$$n (f+g)(x^*) - \sum_{k=1}^{n} (f+g)(x_k) \geq \frac{1}{2\gamma} \left( \| x_n - x^* \|^2 - \| x_0 - x^* \|^2 \right),$$
for all $n \in \mathbb{N}$. Since $\{(f+g)(x_k)\}$ is non-increasing, we have
$$n (f+g)(x^*) - n (f+g)(x_n) \geq \frac{1}{2\gamma} \left( \| x_n - x^* \|^2 - \| x_0 - x^* \|^2 \right) \geq -\frac{1}{2\gamma} \| x_0 - x^* \|^2,$$
for all $n \in \mathbb{N}$. Hence,
$$(f+g)(x_n) - (f+g)(x^*) \leq \frac{\| x_0 - x^* \|^2}{2\gamma n}.$$
Since $x^*$ is arbitrarily chosen from $S_*$, we obtain
$$(f+g)(x_n) - \min_{x \in H} (f+g)(x) \leq \frac{[d(x_0, S_*)]^2}{2\gamma n},$$
for all $n \in \mathbb{N}$, and the proof is now complete. □

4. Some Applications on Data Classification

In this section, we apply Algorithms 3, 5, 7, and 8 to solve some classification problems based on a learning technique called extreme learning machine (ELM) introduced by Huang et al. [28]. It is formulated as follows:
Let $\{(x_k, t_k) : x_k \in \mathbb{R}^n, t_k \in \mathbb{R}^m, k = 1, 2, \ldots, N\}$ be a set of N samples, where $x_k$ is an input and $t_k$ is a target. A simple mathematical model of the output of ELM for SLFNs (single-hidden-layer feedforward networks) with M hidden nodes and activation function G is defined by
$$o_j = \sum_{i=1}^{M} \eta_i G(\langle w_i, x_j \rangle + b_i),$$
where $w_i$ is the weight that connects the i-th hidden node and the input node, $\eta_i$ is the weight connecting the i-th hidden node and the output node, and $b_i$ is the bias. The hidden layer output matrix H is defined by
$$H = \begin{bmatrix} G(\langle w_1, x_1 \rangle + b_1) & \cdots & G(\langle w_M, x_1 \rangle + b_M) \\ \vdots & \ddots & \vdots \\ G(\langle w_1, x_N \rangle + b_1) & \cdots & G(\langle w_M, x_N \rangle + b_M) \end{bmatrix}.$$
The main objective of ELM is to calculate an optimal weight $\eta = [\eta_1^T, \ldots, \eta_M^T]^T$ such that $H\eta = T$, where $T = [t_1^T, \ldots, t_N^T]^T$ is the training target. If the Moore–Penrose generalized inverse $H^{\dagger}$ of H exists, then $\eta = H^{\dagger} T$ is a solution. However, in general, $H^{\dagger}$ may not exist or may be difficult to compute. Thus, in order to avoid such difficulties, we transform the problem into a convex minimization problem and use our proposed algorithm to find the solution $\eta$ without computing $H^{\dagger}$.
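For instance, the hidden layer output matrix H with the sigmoid activation used in the experiments could be generated as follows; this is a sketch, and the uniform random initialization of $w_i$ and $b_i$ is an assumption, since the paper does not specify it.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def elm_hidden_matrix(X, M, rng=None):
    # X: (N, n) array of input samples. Returns H with H[j, i] = G(<w_i, x_j> + b_i), plus the random W and b.
    rng = np.random.default_rng() if rng is None else rng
    N, n = X.shape
    W = rng.uniform(-1.0, 1.0, size=(M, n))   # hidden-node input weights w_i
    b = rng.uniform(-1.0, 1.0, size=M)        # hidden-node biases b_i
    H = sigmoid(X @ W.T + b)                  # hidden layer output matrix, shape (N, M)
    return H, W, b
```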
In machine learning, a model can overfit in the sense that it is very accurate on a training set but inaccurate on a testing set; in other words, it cannot be used to predict unseen data. In order to prevent overfitting, the least absolute shrinkage and selection operator (LASSO) [29] is used. It can be formulated as follows:
$$\text{Minimize: } \| H\eta - T \|_2^2 + \lambda \| \eta \|_1,$$
where $\lambda$ is a regularization parameter. If we set $f(x) := \| Hx - T \|_2^2$ and $g(x) := \lambda \| x \|_1$, then this problem reduces to problem (1). Hence, we can use our algorithm as a learning method to find the optimal weight $\eta$ and solve classification problems.
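The pieces of problem (1) for this application are then $f(x) = \| Hx - T \|_2^2$ with $\nabla f(x) = 2 H^T (Hx - T)$ and $g(x) = \lambda \| x \|_1$ with the soft-thresholding prox. The following sketch (illustrative only) wires them into the earlier sketches, reusing soft_threshold, line_search_3 and algorithm_7 defined above.

```python
import numpy as np

def train_elm_lasso(H, T, lam=0.1, n_iter=200):
    # f(eta) = ||H eta - T||_2^2, grad f(eta) = 2 H^T (H eta - T); g(eta) = lam * ||eta||_1.
    grad_f = lambda eta: 2.0 * H.T @ (H @ eta - T)
    prox_g = lambda v, gamma: soft_threshold(v, gamma * lam)
    project_dom = lambda v: v                              # dom(g) is the whole space, so the projection is the identity
    beta = lambda n: 0.95 if n <= 1000 else 1.0 / n ** 2   # summable inertial schedule used in the experiments
    eta0 = np.zeros((H.shape[1], T.shape[1]))
    return algorithm_7(eta0, eta0, grad_f, prox_g, project_dom, beta, n_iter=n_iter)
```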
In the experiments, we aim to classify three data sets from https://archive.ics.uci.edu (accessed on 15 November 2021):
  • Iris data set [30]. Each sample in this data set has four attributes, and the set contains three classes with 50 samples for each type.
  • Heart disease data set [31]. This data set contains 303 samples each of which has 13 attributes. In this data set, we classified two classes of data.
  • Wine data set [32]. In this data set, we classified three classes of 178 samples. Each sample contains 13 attributes.
In all experiments, we used the sigmoid as the activation function and set the number of hidden nodes to M = 30. We calculated the accuracy of the output data by
$$\text{accuracy} = \frac{\text{correctly predicted data}}{\text{all data}} \times 100.$$
We chose control parameters for each algorithm as seen in Table 1.
In our experiments, the inertial parameters $\beta_n$ for Algorithm 7 were chosen as follows:
$$\beta_n = \begin{cases} 0.95, & \text{if } n \leq 1000, \\ \frac{1}{n^2}, & \text{if } n \geq 1001. \end{cases}$$
In the first experiment, we chose the regularization parameter $\lambda = 0.1$ for all algorithms and data sets. Then, we used 10-fold cross-validation and utilized Average ACC and ERR% for evaluating the performance of each algorithm:
$$\text{Average ACC} = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i}{y_i} \times 100\%,$$
where N is the number of folds (N = 10), $x_i$ is the number of data correctly predicted at fold i, and $y_i$ is the number of all data at fold i.
Let $\mathrm{err}_{L,\mathrm{sum}}$ be the sum of errors over all 10 training sets, $\mathrm{err}_{T,\mathrm{sum}}$ the sum of errors over all 10 testing sets, $L_{\mathrm{sum}}$ the total number of data in all 10 training sets, and $T_{\mathrm{sum}}$ the total number of data in all 10 testing sets. Then,
$$\mathrm{ERR}\% = (\mathrm{err}_L\% + \mathrm{err}_T\%)/2,$$
where $\mathrm{err}_L\% = \frac{\mathrm{err}_{L,\mathrm{sum}}}{L_{\mathrm{sum}}} \times 100\%$ and $\mathrm{err}_T\% = \frac{\mathrm{err}_{T,\mathrm{sum}}}{T_{\mathrm{sum}}} \times 100\%$.
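A small helper for these two evaluation quantities, under the definitions just stated (illustrative only):

```python
def average_acc(correct_per_fold, total_per_fold):
    # Average ACC = (1/N) * sum_i (x_i / y_i) * 100%.
    N = len(correct_per_fold)
    return sum(c / t for c, t in zip(correct_per_fold, total_per_fold)) * 100.0 / N

def err_percent(err_train_sum, train_sum, err_test_sum, test_sum):
    # ERR% = (err_L% + err_T%) / 2, with err_L% = err_{L,sum}/L_sum * 100% and err_T% analogous.
    err_L = err_train_sum / train_sum * 100.0
    err_T = err_test_sum / test_sum * 100.0
    return (err_L + err_T) / 2.0
```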
With these evaluation tools, we obtained the results for each data set as seen in Table 2, Table 3 and Table 4.
As seen in Table 2, Table 3 and Table 4, with the same regularization λ = 0.1 , Algorithms 7 and 8 perform better than Algorithms 3 and 5 in terms of accuracy, while the computation times are relatively close among the four algorithms.
In the second experiment, the regularization parameter λ for each algorithm and data set was chosen using 10-fold cross-validation. We compared the error of each model on each data set for various λ and then chose the λ that gives the lowest error (ERR%) for that particular model and data set. Hence, the parameter λ varies with the algorithm and data set. The chosen values of λ can be seen in Table 5.
With the chosen λ , we also evaluated the performance of each algorithm using 10-fold cross-validation and similar evaluation tools as in the first experiment. The results can be seen in the following Table 6, Table 7 and Table 8.
With the regularization parameters λ chosen as in Table 5, we see that the ERR% of each algorithm in Table 6, Table 7 and Table 8 is lower than in Table 2, Table 3 and Table 4. We can also see that Algorithms 7 and 8 perform better than Algorithms 3 and 5 in terms of accuracy in all experiments conducted.
In Figure 3, we show the ERR% of each algorithm in the second experiment. As we can see, Algorithms 7 and 8 have lower ERR%, which means they perform better than Algorithms 3 and 5.
From Table 6, Table 7 and Table 8, we notice that the computational time of Algorithms 7 and 8 is about 30% higher than that of Algorithm 3 at the same number of iterations. However, from Figure 3, we see that at the 120th iteration, both Algorithms 7 and 8 already have a lower ERR% than Algorithm 3 at the 200th iteration. Therefore, the time needed for Algorithms 7 and 8 to reach the same or higher accuracy than Algorithm 3 is actually lower, because 120 iterations can be computed much faster than 200 iterations.

5. Conclusions

We introduced a new line search technique and employed it to construct new algorithms, namely Algorithms 7 and 8. Algorithm 7 also utilizes an inertial step to accelerate its convergence behavior. Both algorithms converge weakly to a solution of (1) without the Lipschitz assumption on $\nabla f$. The complexity of Algorithm 8 was also analyzed. We then applied the proposed algorithms to data classification on the Iris, Heart disease, and Wine data sets and evaluated their performance against other line search algorithms, namely Algorithms 3 and 5. We observed from our experiments that Algorithm 7 achieved the highest accuracy on all data sets under the same number of iterations. Moreover, Algorithm 8, which is not an inertial algorithm, also performed better than Algorithms 3 and 5. Furthermore, from Figure 3, we see that the proposed algorithms were already more accurate at a lower number of iterations than the other algorithms at a higher number of iterations.
Based on the experiments on various data sets, we conclude that the proposed algorithms perform better than the previously established algorithms. Therefore, for our future works, we would like to implement the proposed algorithm to predict and classify the data of patients with non-communicable diseases (NCDs) collected from Sriphat Medical Center, Faculty of Medicine, Chiang Mai University, Thailand. We aim to make an innovation for screening and preventing non-communicable diseases, which will be used in hospitals in Chiang Mai, Thailand.

Author Contributions

Writing—original draft preparation, P.S.; software and editing, D.C.; supervision, review and funding acquisition, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

The NSRF via the Program Management Unit for Human Resources & Institutional Development, Research and Innovation (Grant Number B05F640183).

Data Availability Statement

All data can be obtained from https://archive.ics.uci.edu (accessed on 15 November 2021).

Acknowledgments

This research has received funding support from the NSRF via the Program Management Unit for Human Resources & Institutional Development, Research and Innovation (Grant Number B05F640183). This research was also supported by Chiang Mai University.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chen, M.; Zhang, H.; Lin, G.; Han, Q. A new local and nonlocal total variation regularization model for image denoising. Clust. Comput. 2019, 22, 7611–7627. [Google Scholar] [CrossRef]
  2. Combettes, P.L.; Wajs, V. Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 2005, 4, 1168–1200. [Google Scholar] [CrossRef] [Green Version]
  3. Kankam, K.; Pholasa, N.; Cholamjiak, C. On convergence and complexity of the modified forward–backward method involving new line searches for convex minimization. Math. Meth. Appl. Sci. 2019, 1352–1362. [Google Scholar] [CrossRef]
  4. Luo, Z.Q. Applications of convex optimization in signal processing and digital communication. Math. Program. 2003, 97, 177–207. [Google Scholar] [CrossRef]
  5. Xiong, K.; Zhao, G.; Shi, G.; Wang, Y. A Convex Optimization Algorithm for Compressed Sensing in a Complex Domain: The Complex-Valued Split Bregman Method. Sensors 2019, 19, 4540. [Google Scholar] [CrossRef] [Green Version]
  6. Zhang, Y.; Li, X.; Zhao, G.; Cavalcante, C.C. Signal reconstruction of compressed sensing based on alternating direction method of multipliers. Circuits Syst. Signal Process 2020, 39, 307–323. [Google Scholar] [CrossRef]
  7. Hanjing, A.; Bussaban, L.; Suantai, S. The Modified Viscosity Approximation Method with Inertial Technique and Forward–Backward Algorithm for Convex Optimization Model. Mathematics 2022, 10, 1036. [Google Scholar] [CrossRef]
  8. Hanjing, A.; Suantai, S. A fast image restoration algorithm based on a fixed point and optimization method. Mathematics 2020, 8, 378. [Google Scholar] [CrossRef] [Green Version]
  9. Zhong, T. Statistical Behavior and Consistency of Classification Methods Based on Convex Risk Minimization. Ann. Stat. 2004, 32, 56–134. [Google Scholar] [CrossRef]
  10. Elhamifar, E.; Sapiro, G.; Yang, A.; Sasrty, S.S. A Convex Optimization Framework for Active Learning. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 209–216. [Google Scholar] [CrossRef] [Green Version]
  11. Yuan, M.; Wegkamp, M. Classification Methods with Reject Option Based on Convex Risk Minimization. J. Mach. Learn. Res. 2010, 11, 111–130. [Google Scholar]
  12. Lions, P.L.; Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 1979, 16, 964–979. [Google Scholar] [CrossRef]
  13. Polyak, B.T. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17. [Google Scholar] [CrossRef]
  14. Attouch, H.; Cabot, A. Convergence rate of a relaxed inertial proximal algorithm for convex minimization. Optimization 2019, 69, 1281–1312. [Google Scholar] [CrossRef]
  15. Alvarez, F.; Attouch, H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 2001, 9, 3–11. [Google Scholar] [CrossRef]
  16. Van Hieu, D. An inertial-like proximal algorithm for equilibrium problems. Math. Meth. Oper. Res. 2018, 88, 399–415. [Google Scholar] [CrossRef]
  17. Chidume, C.E.; Kumam, P.; Adamu, A. A hybrid inertial algorithm for approximating solution of convex feasibility problems with applications. Fixed Point Theory Appl. 2020, 2020, 12. [Google Scholar] [CrossRef]
  18. Moudafi, A.; Oliny, M. Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 2003, 155, 447–454. [Google Scholar] [CrossRef] [Green Version]
  19. Sarnmeta, P.; Inthakon, W.; Chumpungam, D.; Suantai, S. On convergence and complexity analysis of an accelerated forward–backward algorithm with line search technique for convex minimization problems and applications to data prediction and classification. J. Inequal. Appl. 2021, 2021, 141. [Google Scholar] [CrossRef]
  20. Beck, A.; Teboulle, M. A fast iterative shrinkage–thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef] [Green Version]
  21. Boţ, R.I.; Csetnek, E.R. An inertial forward–backward-forward primal-dual splitting algorithm for solving monotone inclusion problems. Numer. Algor. 2016, 71, 519–540. [Google Scholar] [CrossRef] [Green Version]
  22. Verma, M.; Shukla, K.K. A new accelerated proximal gradient technique for regularized multitask learning framework. Pattern Recogn. Lett. 2017, 95, 98–103. [Google Scholar] [CrossRef]
  23. Bello Cruz, J.Y.; Nghia, T.T. On the convergence of the forward–backward splitting method with line searches. Optim. Methods Softw. 2016, 31, 1209–1238. [Google Scholar] [CrossRef] [Green Version]
  24. Burachik, R.S.; Iusem, A.N. Set-Valued Mappings and Enlargements of Monotone Operators; Springer: Berlin, Germany, 2008. [Google Scholar]
  25. Huang, Y.; Dong, Y. New properties of forward–backward splitting and a practical proximal-descent algorithm. Appl. Math. Comput. 2014, 237, 60–68. [Google Scholar] [CrossRef]
  26. Takahashi, W. Introduction to Nonlinear and Convex Analysis; Yokohama Publishers: Yokohama, Japan, 2009. [Google Scholar]
  27. Moudafi, A.; Al-Shemas, E. Simultaneous iterative methods for split equality problem. Trans. Math. Program. Appl. 2013, 1, 1–11. [Google Scholar]
  28. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  29. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  30. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
  31. Detrano, R.; Janosi, A.; Steinbrunn, W.; Pfisterer, M.; Schmid, J.J.; Sandhu, S.; Guppy, K.H.; Lee, S.; Froelicher, V. International application of a new probability algorithm for the diagnosis of coronary artery disease. Am. J. Cardiol. 1989, 64, 304–310. [Google Scholar] [CrossRef]
  32. Forina, M.; Leardi, R.; Armanino, C.; Lanteri, S. PARVUS: An Extendable Package of Programs for Data Exploration; Elsevier: Amsterdam, The Netherlands, 1988. [Google Scholar]
Figure 1. Diagram of Algorithm 7.
Figure 2. Diagram of Algorithm 8.
Figure 3. ERR% of each algorithm and data set of the second experiment.
Table 1. Chosen parameters of each algorithm.

              Algorithm 3   Algorithm 5   Algorithm 7   Algorithm 8
  σ           0.49          0.124         0.124         0.124
  δ           0.1           0.1           0.1           0.1
  θ           0.1           0.1           0.1           0.1
  α_n         -             -             1/2           1/3
Table 2. The performance of each algorithm in the first experiment at the 200th iteration with 10-fold cv. on the Iris data set.

               Algorithm 3          Algorithm 5          Algorithm 7          Algorithm 8
               acc.train  acc.test  acc.train  acc.test  acc.train  acc.test  acc.train  acc.test
  Fold 1       87.41      86.67     93.33      86.67     97.78      100       97.04      93.33
  Fold 2       88.15      93.33     92.59      100       96.30      100       96.30      100
  Fold 3       88.15      100       92.59      100       97.78      93.33     96.30      100
  Fold 4       88.15      100       92.59      100       97.78      100       96.30      100
  Fold 5       86.67      86.67     93.33      86.67     97.78      100       96.30      100
  Fold 6       88.15      73.33     92.59      80        99.26      86.67     97.78      86.67
  Fold 7       87.41      100       92.59      100       97.78      100       96.30      100
  Fold 8       88.15      86.67     93.33      93.33     97.04      93.33     97.78      86.67
  Fold 9       88.89      80        93.33      93.33     98.52      93.33     96.30      93.33
  Fold 10      88.15      73.33     92.59      93.33     97.78      100       95.56      100
  Average acc. 87.93      88        92.89      93.33     97.78      96.67     96.59      96
  ERR %        12.04                6.89                 2.78                 3.70
  Time         0.0609               0.0901               0.0781               0.0767
Table 3. The performance of each algorithm in the first experiment at the 200th iteration with 10-fold cv. on the Heart disease data set.

               Algorithm 3          Algorithm 5          Algorithm 7          Algorithm 8
               acc.train  acc.test  acc.train  acc.test  acc.train  acc.test  acc.train  acc.test
  Fold 1       79.85      86.67     81.32      86.67     83.15      93.33     82.05      86.67
  Fold 2       80.15      80.65     80.15      80.65     84.19      83.87     81.62      83.87
  Fold 3       81.25      77.42     82.35      77.42     84.93      77.42     83.09      80.65
  Fold 4       80.51      83.87     82.35      87.10     84.56      80.65     82.72      90.32
  Fold 5       79.85      90        81.32      90        84.98      86.67     82.42      86.67
  Fold 6       81.68      80        83.15      83.33     84.62      86.67     83.52      83.33
  Fold 7       80.22      86.67     81.68      83.33     84.25      83.33     82.05      83.33
  Fold 8       82.05      66.67     82.42      66.67     84.98      73.33     82.42      66.67
  Fold 9       81.32      70        81.68      70        86.08      73.33     82.05      70
  Fold 10      80.95      76.67     82.05      80        84.25      83.33     82.05      80
  Average acc. 80.78      79.86     81.85      80.52     84.60      82.19     82.40      81.15
  ERR %        19.67                18.81                16.61                18.21
  Time         0.0726               0.1048               0.1004               0.0921
Table 4. The performance of each algorithm in the first experiment at the 200th iteration with 10-fold cv. on the Wine data set.

               Algorithm 3          Algorithm 5          Algorithm 7          Algorithm 8
               acc.train  acc.test  acc.train  acc.test  acc.train  acc.test  acc.train  acc.test
  Fold 1       96.89      100       96.89      100       99.38      100       98.14      100
  Fold 2       96.88      100       97.50      100       99.38      100       98.13      100
  Fold 3       97.50      100       98.13      100       99.38      100       98.13      100
  Fold 4       97.50      100       96.88      100       99.38      100       98.13      100
  Fold 5       96.88      100       97.50      100       99.38      100       98.13      100
  Fold 6       97.50      94.44     96.88      100       99.38      100       98.13      100
  Fold 7       97.50      94.44     98.13      94.44     100        94.44     98.75      94.44
  Fold 8       97.50      100       96.88      100       99.38      100       98.13      100
  Fold 9       98.75      88.89     98.13      88.89     99.38      88.89     99.38      88.89
  Fold 10      98.76      88.24     98.76      88.24     99.38      100       98.14      100
  Average acc. 97.57      96.60     97.57      97.16     99.44      98.33     98.31      98.33
  ERR %        2.90                 2.62                 1.12                 1.69
  Time         0.0624               0.0997               0.0870               0.0810
Table 5. Chosen λ of each algorithm.

                 Regularization Parameter λ
                 Iris      Heart Disease   Wine
  Algorithm 3    0.001     0.003           0.02
  Algorithm 5    0.01      0.03            0.006
  Algorithm 7    0.003     0.13            0.0001
  Algorithm 8    0.01      0.008           0.003
Table 6. The performance of each algorithm in the second experiment at the 200th iteration with 10-fold cv. on the Iris data set.

               Algorithm 3          Algorithm 5          Algorithm 7          Algorithm 8
               acc.train  acc.test  acc.train  acc.test  acc.train  acc.test  acc.train  acc.test
  Fold 1       88.15      86.67     93.33      86.67     98.52      100       97.04      93.33
  Fold 2       88.15      93.33     92.59      100       98.52      100       96.30      100
  Fold 3       88.89      100       93.33      100       98.52      100       96.30      100
  Fold 4       88.15      100       92.59      100       98.52      100       96.30      100
  Fold 5       86.67      86.67     93.33      86.67     98.52      100       96.30      100
  Fold 6       88.15      73.33     93.33      80        99.26      86.67     97.78      86.67
  Fold 7       87.41      100       92.59      100       98.52      100       96.30      100
  Fold 8       88.15      86.67     93.33      93.33     97.78      100       97.78      86.67
  Fold 9       88.89      80        93.33      93.33     98.52      100       96.30      93.33
  Fold 10      88.15      73.33     92.59      93.33     98.52      100       95.56      100
  Average acc. 88.07      88        93.04      93.33     98.52      98.67     96.59      96
  ERR %        11.96                6.81                 1.41                 3.70
  Time         0.0618               0.0973               0.0793               0.0783
Table 7. The performance of each algorithm in the second experiment at the 200th iteration with 10-fold cv. on the Heart disease data set.

               Algorithm 3          Algorithm 5          Algorithm 7          Algorithm 8
               acc.train  acc.test  acc.train  acc.test  acc.train  acc.test  acc.train  acc.test
  Fold 1       79.49      86.67     82.05      86.67     84.25      80        82.05      86.67
  Fold 2       80.15      80.65     80.51      83.87     83.82      87.10     81.62      83.87
  Fold 3       81.62      77.42     81.99      80.65     84.56      80.65     83.46      80.65
  Fold 4       80.51      83.87     82.72      90.32     83.82      87.10     83.09      87.10
  Fold 5       79.85      90        82.42      86.67     86.45      76.67     82.78      86.67
  Fold 6       81.68      80        83.52      83.33     85.35      86.67     83.52      83.33
  Fold 7       80.22      86.67     81.68      83.33     84.98      73.33     82.05      83.33
  Fold 8       82.42      66.67     82.42      66.67     83.15      90        82.78      66.67
  Fold 9       80.95      70.00     82.05      70        84.62      83.33     82.42      70
  Fold 10      80.95      76.67     82.05      80        84.98      90        82.78      83.33
  Average acc. 80.78      79.86     82.14      81.15     84.60      83.48     82.66      81.16
  ERR %        19.67                18.34                15.95                18.08
  Time         0.0794               0.1129               0.1013               0.097
Table 8. The performance of each algorithm in the second experiment at the 200th iteration with 10-fold cv. on the Wine data set.

               Algorithm 3          Algorithm 5          Algorithm 7          Algorithm 8
               acc.train  acc.test  acc.train  acc.test  acc.train  acc.test  acc.train  acc.test
  Fold 1       96.89      100       97.52      100       99.38      100       98.14      100
  Fold 2       96.88      100       97.50      100       100        100       98.75      100
  Fold 3       97.50      100       97.50      100       100        100       98.13      100
  Fold 4       97.50      100       98.13      100       99.38      100       98.13      100
  Fold 5       97.50      100       98.13      100       99.38      100       98.13      100
  Fold 6       97.50      94.44     98.13      100       99.38      100       98.13      100
  Fold 7       97.50      94.44     98.75      94.44     100        94.44     98.75      94.44
  Fold 8       97.50      100       97.50      100       99.38      100       98.13      100
  Fold 9       98.75      88.89     98.75      88.89     99.38      100       99.38      88.89
  Fold 10      98.76      88.24     98.14      88.24     100        100       98.14      100
  Average acc. 97.63      96.60     98         97.16     99.63      99.44     98.38      98.33
  ERR %        2.87                 2.40                 0.47                 1.65
  Time         0.0644               0.0971               0.0874               0.0819
