A New Forward–Backward Algorithm with Line Search and Inertial Techniques for Convex Minimization Problems with Applications

Chumpungam, Dawan; Sarnmeta, Panitarn; Suantai, Suthep

doi:10.3390/math9131562

by Dawan Chumpungam ¹, Panitarn Sarnmeta ¹ and Suthep Suantai ¹,²,*

¹ Data Science Research Center, Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand

² Research Center in Mathematics and Applied Mathematics, Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand

* Author to whom correspondence should be addressed.

Mathematics 2021, 9(13), 1562; https://doi.org/10.3390/math9131562

Submission received: 17 May 2021 / Revised: 29 June 2021 / Accepted: 30 June 2021 / Published: 2 July 2021

(This article belongs to the Special Issue Nonlinear Problems and Applications of Fixed Point Theory)


Abstract:

For the past few decades, various algorithms have been proposed to solve convex minimization problems in the form of the sum of two lower semicontinuous and convex functions. The convergence of these algorithms was guaranteed under the L-Lipschitz condition on the gradient of the objective function. In recent years, an inertial technique has been widely used to accelerate the convergence behavior of an algorithm. In this work, we introduce a new forward–backward splitting algorithm using a new line search and inertial technique to solve convex minimization problems in the form of the sum of two lower semicontinuous and convex functions. A weak convergence of our proposed method is established without assuming the L-Lipschitz continuity of the gradient of the objective function. Moreover, a complexity theorem is also given. As applications, we employed our algorithm to solve data classification and image restoration by conducting some experiments on these problems. The performance of our algorithm was evaluated using various evaluation tools. Furthermore, we compared its performance with other algorithms. Based on the experiments, we found that the proposed algorithm performed better than other algorithms mentioned in the literature.

Keywords:

convex minimization problems; machine learning; forward–backward algorithm; line search; accelerated algorithm; data classification; image restoration

1. Introduction

The convex minimization problem has been studied intensively for the past few decades due to its wide range of applications in various real-world problems. Some major problems in physics, economics, data science, engineering, and medical science can be viewed as convex minimization problems, for instance a reaction–diffusion equation, which is a mathematical model describing physical phenomena, such as chemical reactions, heat flow models, and population dynamics. The problem of finding an unknown reaction term of such an equation can be formulated as a coefficient inverse problem (CIP) for a partial differential equation (PDE). Numerical methods for solving CIPs for PDEs, as well as their applications to various subjects have been widely studied and analyzed; for more comprehensive information on this topic, see [1,2,3,4,5]. To obtain a globally convergent method for solving CIPs for PDEs, many authors have employed the convexification technique, which reformulates CIPs as convex minimization problems; for a more in-depth development and discussion, we refer to [6,7,8]. Therefore, a numerical method for convex minimization problems can be applied. More examples of convex minimization problems are signal and image processing, compressed sensing, and machine learning tasks such as regression and classification; see [9,10,11,12,13,14,15,16] and the references therein for more information.

The problem is formulated as the sum of two convex functions:

\min_{x \in H} \{f (x) + g (x)\},

(1)

where $f, g : H \to \mathbb{R} \cup \{+ \infty\}$ are proper, lower semicontinuous convex functions and H is a Hilbert space.

If f is differentiable, then x solves (1) if and only if:

x = \mathrm{prox}_{α g} (I - α \nabla f) (x),

(2)

where $α > 0$, $\mathrm{prox}_{α g} (x) = J_{α}^{\partial g} (x) = (I + α \partial g)^{- 1} (x)$, I is the identity mapping, and $\partial g$ is the subdifferential of g. In other words, x is a fixed point of $\mathrm{prox}_{α g} (I - α \nabla f)$.
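The fixed-point characterization (2) can be checked numerically on a small example. The sketch below is only an illustration: the one-dimensional functions f(x) = 0.5 (x - 3)^2 and g(x) = |x|, and the helper names, are assumptions chosen here and do not come from the paper. For this choice the minimizer is x = 2 and the proximal operator of αg is soft-thresholding.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * |.| (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def grad_f(x):
    """Gradient of the illustrative smooth term f(x) = 0.5 * (x - 3)^2."""
    return x - 3.0

def forward_backward_map(x, alpha):
    """One application of prox_{alpha g}(I - alpha * grad f) with g = |.|."""
    return soft_threshold(x - alpha * grad_f(x), alpha)

# The minimizer of 0.5 * (x - 3)^2 + |x| solves (x - 3) + sign(x) = 0, i.e. x = 2.
x_star = 2.0
residuals = [abs(forward_backward_map(x_star, a) - x_star) for a in (0.1, 0.5, 1.0, 2.0)]
print(residuals)  # every entry is zero up to rounding, for each step size alpha
```

A point that does not solve the problem, such as x = 0, is moved by the same map, which is what makes iterating the map a sensible solution method.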

Under some conditions, the Picard iteration converges to the solution. As a result, the forward–backward algorithm [17], which is defined as follows:

x_{n + 1} = \mathrm{prox}_{α_{n} g} (I - α_{n} \nabla f) (x_{n}), for all n \in \mathbb{N},

(3)

where $α_{n}$ is a suitable step size, converges to a solution of (1). Due to its simplicity, this method has been intensively studied and improved by many researchers; see [10,16,18] and the references therein for more information. One well-known method, which notably improves the convergence rate of (3), is the fast iterative shrinkage-thresholding algorithm (FISTA) introduced by Beck and Teboulle [19]. To the best of our knowledge, most of these works assumed that $\nabla f$ is Lipschitz continuous. However, in general, such an assumption might be too strong, and finding a Lipschitz constant of $\nabla f$ is sometimes difficult. To relax this strong assumption, Cruz and Nghia [20] proposed a line search technique and replaced the Lipschitz continuity assumption on $\nabla f$ with weaker assumptions, as seen in the following conditions:

Assumption A1. $f, g$ are proper lower semicontinuous convex functions with $\mathrm{dom}\, g \subseteq \mathrm{dom}\, f$;

Assumption A2. f is differentiable on an open set containing $\mathrm{dom}\, g$, and $\nabla f$ is uniformly continuous on any bounded subset of $\mathrm{dom}\, g$ and maps any bounded subset of $\mathrm{dom}\, g$ to a bounded set in H.

Note that if $\nabla f$ is L-Lipschitz continuous, then A2 holds. They also established an algorithm using a line search technique and obtained a weak convergence result under Assumptions A1 and A2. In the same work, they proposed an accelerated algorithm based on FISTA and a line search technique, and showed that this accelerated algorithm performed better than their non-accelerated one.

Recently, inspired by [20], Kankam et al. [9] proposed a new line search technique and a new algorithm to solve (1). They conducted some experiments on signal processing and found that their method performed better than that of Cruz and Nghia [20].

In recent years, many authors have utilized the inertial technique in order to accelerate their algorithms. This was first introduced by Polyak [21] to solve smooth convex minimization problems. After that, many inertial-type algorithms were proposed and studied by many authors; see [22,23,24,25] and the references therein.

The algorithms introduced in [9,20] are easy to employ, since a Lipschitz constant of the gradient of an objective function f is not required to exist. Although the convergence of these algorithms is guaranteed, some improvements are still welcome, specifically utilizing an inertial technique in order to improve the performance. Hence, in this work, motivated by the works mentioned above, our main objective was to propose a new algorithm that utilizes both line search and inertial techniques, not only to guarantee its convergence to a solution of (1) without assuming L-Lipschitz continuity of $\nabla f$, but also to improve its performance by means of an inertial technique. We established a weak convergence theorem under Assumptions A1 and A2. Moreover, a complexity theorem of this new algorithm was studied. Furthermore, we employed our proposed algorithm to solve an image restoration problem, as well as data classification problems. Then, we evaluated its performance and compared it with various other algorithms. The proposed method is also of great interest in solving nonlinear coefficient inverse problems for partial differential equations, along with other possible applications of convex minimization problems.

This work is organized as follows: In Section 2, we recall important definitions and lemmas that will be used in later sections, as well as various methods introduced in [9,19,20,22]. In Section 3, we introduce a new algorithm using new line search and inertial techniques and establish its weak convergence to a solution of (1). Moreover, the complexity of this method is also analyzed. In Section 4, in order to compare the performance of the studied algorithms, some numerical experiments on data classification problems and an image restoration problem are conducted and discussed. In the last section, a brief conclusion of this paper is presented.

2. Preliminaries

Throughout this work, we write $x_{n} \to x$ and $x_{n} ⇀ x$ to indicate that the sequence $\{x_{n}\}$ converges strongly and weakly to x, respectively. For $h : H \to \mathbb{R} \cup \{+ \infty\}$, we also denote $\mathrm{dom}\, h := \{x \in H : h (x) < + \infty\}$.

First, we recall some methods for solving (1) introduced by many authors mentioned in Section 1.

The fast iterative shrinkage-thresholding algorithm (FISTA) was introduced by Beck and Teboulle [19] as follows (Algorithm 1).

Algorithm 1: FISTA.

1:: Input Given $y_{1} = x_{0} \in \mathbb{R}^{n}$ and $t_{1} = 1$, for $n \in \mathbb{N}$,

$\begin{array}{l} x_{n} = \mathrm{prox}_{\frac{1}{L} g} (y_{n} - \frac{1}{L} \nabla f (y_{n})), \\ t_{n + 1} = \frac{1 + \sqrt{1 + 4 t_{n}^{2}}}{2}, \quad θ_{n} = \frac{t_{n} - 1}{t_{n + 1}}, \\ y_{n + 1} = x_{n} + θ_{n} (x_{n} - x_{n - 1}), \end{array}$

where L is a Lipschitz constant of $\nabla f$.
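As an illustration of Algorithm 1, the following is a minimal sketch of FISTA on a lasso problem. The test problem, f(x) = 0.5 ∥Ax − b∥² with g(x) = λ∥x∥₁, the random data, and all function names are assumptions made here for the example; only the update rules follow the algorithm stated above.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(A, b, lam, x):
    """Lasso objective f(x) + g(x) used for this illustration."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def fista(A, b, lam, n_iter=1000):
    """FISTA with constant step 1/L, where L = ||A||_2^2 bounds the Lipschitz constant of grad f."""
    L = np.linalg.norm(A, 2) ** 2
    x_prev = np.zeros(A.shape[1])
    y = x_prev.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)                      # forward (gradient) step at y_n
        x = soft_threshold(y - grad / L, lam / L)      # backward (proximal) step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)    # inertial extrapolation
        x_prev, t = x, t_next
    return x_prev

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
x_hat = fista(A, b, lam=0.1)
print(objective(A, b, 0.1, x_hat))
```

Note that the constant step 1/L requires a Lipschitz constant of the gradient, which is exactly the requirement the line search methods recalled next are designed to avoid.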

In 2016, Cruz and Nghia [20] introduced a line search step as follows (Algorithm 2).

Algorithm 2: Line Search 1 $(x, σ, θ, δ)$.

1:: Input Given $x \in \mathrm{dom}\, g$, $σ > 0$, $θ \in (0, 1)$, and $δ \in (0, \frac{1}{2})$.
2:: Set $α = σ$.
3:: while $α ∥ \nabla f (\mathrm{prox}_{α g} (x - α \nabla f (x))) - \nabla f (x) ∥ > δ ∥ \mathrm{prox}_{α g} (x - α \nabla f (x)) - x ∥$ do
4:: Set $α = θ α$
5:: end while
6:: Output $α$.

They asserted that Line Search 1 stops after finitely many steps and proposed the following algorithm (Algorithm 3).

Algorithm 3.

1:: Input Given $x_{0} \in \mathrm{dom}\, g$, $σ > 0$, $θ \in (0, 1)$, and $δ \in (0, \frac{1}{2})$, for $n \in \mathbb{N}$,

$x_{n + 1} = \mathrm{prox}_{α_{n} g} (I - α_{n} \nabla f) (x_{n}),$

where $α_{n} :=$ Line Search 1 $(x_{n}, σ, θ, δ)$.

They showed that a sequence generated by Algorithm 3 converges weakly to a solution of (1) under Assumptions A1 and A2.
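The backtracking rule of Line Search 1 and the iteration of Algorithm 3 can be sketched as follows. This is a hedged illustration, not the authors' implementation: the lasso test problem, the random data, the parameter values, and the function names `line_search_1` and `forward_backward_ls1` are assumptions introduced here.

```python
import numpy as np

def line_search_1(x, sigma, theta, delta, grad_f, prox_g):
    """Shrink alpha from sigma by the factor theta until
    alpha * ||grad f(p) - grad f(x)|| <= delta * ||p - x||, with p = prox_{alpha g}(x - alpha grad f(x))."""
    alpha = sigma
    gx = grad_f(x)
    while True:
        p = prox_g(x - alpha * gx, alpha)
        if alpha * np.linalg.norm(grad_f(p) - gx) <= delta * np.linalg.norm(p - x):
            return alpha
        alpha *= theta

def forward_backward_ls1(x0, sigma, theta, delta, grad_f, prox_g, n_iter=200):
    """Forward-backward iteration whose step size comes from Line Search 1 at each iterate."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        alpha = line_search_1(x, sigma, theta, delta, grad_f, prox_g)
        x = prox_g(x - alpha * grad_f(x), alpha)
    return x

# Illustrative lasso instance (an assumption of this sketch).
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda v, alpha: np.sign(v) * np.maximum(np.abs(v) - alpha * lam, 0.0)
objective = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

x0 = np.zeros(5)
alpha0 = line_search_1(x0, 1.0, 0.5, 0.4, grad_f, prox_g)
x_out = forward_backward_ls1(x0, 1.0, 0.5, 0.4, grad_f, prox_g)
print(alpha0, objective(x0), objective(x_out))
```

No Lipschitz constant appears anywhere in the sketch; the step size is found purely by testing the stopping inequality.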

Recently, Kankam et al. [9] proposed a new line search technique as follows (Algorithm 4).

Algorithm 4: Line Search 2 $(x, σ, θ, δ)$.

1:: Input Given $x \in \mathrm{dom}\, g$, $σ > 0$, $θ \in (0, 1)$, and $δ > 0$. Set

$\begin{array}{l} L (x, α) = \mathrm{prox}_{α g} (x - α \nabla f (x)), \text{ and} \\ S (x, α) = \mathrm{prox}_{α g} (L (x, α) - α \nabla f (L (x, α))) . \end{array}$
2:: Set $α = σ$.
3:: while

$\begin{array}{l} α \max \{∥ \nabla f (S (x, α)) - \nabla f (L (x, α)) ∥, ∥ \nabla f (L (x, α)) - \nabla f (x) ∥\} \\ \quad > δ (∥ S (x, α) - L (x, α) ∥ + ∥ L (x, α) - x ∥) \end{array}$

do
4:: Set $α = θ α, L (x, α) = L (x, θ α), S (x, α) = S (x, θ α)$
5:: end while
6:: Output $α$.

They proved that Line Search 2 stops after finitely many steps and introduced the following algorithm (Algorithm 5).

Algorithm 5.

1:: Input Given $x_{0} \in \mathrm{dom}\, g$, $σ > 0$, $θ \in (0, 1)$, and $δ \in (0, \frac{1}{8})$, for $n \in \mathbb{N}$,

$\begin{array}{l} y_{n} = \mathrm{prox}_{γ_{n} g} (x_{n} - γ_{n} \nabla f (x_{n})), \\ x_{n + 1} = \mathrm{prox}_{γ_{n} g} (y_{n} - γ_{n} \nabla f (y_{n})), \end{array}$

where $γ_{n} :=$ Line Search 2 $(x_{n}, σ, θ, δ)$.

They established a weak convergence theorem for Algorithm 5 under Assumptions A1 and A2. As we can see, Algorithms 3 and 5 do not utilize an inertial step.

Inspired by Algorithm 1 (FISTA), Cruz and Nghia [20] also proposed the following accelerated algorithm (Algorithm 6).

Algorithm 6: Accelerated algorithm.

1:: Input Given $x_{0}, x_{1} \in \mathrm{dom}\, g$, $α_{- 1} > 0$, $θ \in (0, 1)$, $t_{0} = 1$, and $δ \in (0, \frac{1}{2})$, for $n \in \mathbb{N}$,

$\begin{array}{l} t_{n + 1} = \frac{1 + \sqrt{1 + 4 t_{n}^{2}}}{2}, \\ y_{n} = x_{n} + (\frac{t_{n} - 1}{t_{n + 1}}) (x_{n} - x_{n - 1}), \\ \tilde{y}_{n} = P_{\mathrm{dom}\, g} (y_{n}), \\ x_{n + 1} = \mathrm{prox}_{α_{n} g} (\tilde{y}_{n} - α_{n} \nabla f (\tilde{y}_{n})), \end{array}$

where $α_{n} :=$ Line Search 1 $(\tilde{y}_{n}, α_{n - 1}, θ, δ)$, and $P_{\mathrm{dom}\, g}$ is the metric projection onto $\mathrm{dom}\, g$.

They showed that Algorithm 6 has better complexity than Algorithm 3. However, the convergence to a solution of (1) is not guaranteed under the inertial parameter $β_{n} = \frac{t_{n} - 1}{t_{n + 1}}$.

In 2019, Attouch and Cabot [22] analyzed the convergence rate of a method called the relaxed inertial proximal algorithm (RIPA) for solving monotone inclusion problems, which are closely related to convex minimization problems. This method utilizes an inertial step $x_{n} + β_{n} (x_{n} - x_{n - 1})$ to improve its performance. It was defined as follows (Algorithm 7).

Algorithm 7: RIPA.

1:: Input Given $x_{0}, x_{1} \in H$, $β_{n} \geq 0$, $ρ_{n} \in (0, 1]$, $μ_{n} > 0$, for $n \in \mathbb{N}$,

$\begin{array}{l} y_{n} = x_{n} + β_{n} (x_{n} - x_{n - 1}), \\ x_{n + 1} = (1 - ρ_{n}) y_{n} + ρ_{n} J_{μ_{n}}^{A} (y_{n}), \end{array}$

where A is a maximal monotone operator and $J_{μ_{n}}^{A} = (I + μ_{n} A)^{- 1}$ is its resolvent.

Under mild restrictions of the control parameters, they showed that Algorithm 7 (RIPA) gives a fast convergence rate.
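To make the RIPA iteration concrete, here is a small sketch under stated assumptions: the operator A is taken to be the subdifferential of the ℓ₁ norm, whose resolvent is soft-thresholding and whose unique zero is the origin, and the parameter sequences β_n = 1/(n+1)², ρ_n = 0.9, μ_n = 1 are illustrative choices made here, not values prescribed in [22].

```python
import numpy as np

def resolvent_l1(y, mu):
    """Resolvent (I + mu * A)^{-1} for A = subdifferential of ||.||_1, i.e. soft-thresholding."""
    return np.sign(y) * np.maximum(np.abs(y) - mu, 0.0)

def ripa(x0, x1, resolvent, beta, rho, mu, n_iter=100):
    """Relaxed inertial proximal iteration: inertial step, then relaxed resolvent step."""
    x_prev = np.array(x0, dtype=float)
    x = np.array(x1, dtype=float)
    for n in range(1, n_iter + 1):
        y = x + beta(n) * (x - x_prev)                               # inertial extrapolation
        x_next = (1.0 - rho(n)) * y + rho(n) * resolvent(y, mu(n))   # relaxed resolvent step
        x_prev, x = x, x_next
    return x

x_out = ripa([5.0, -3.0], [5.0, -3.0], resolvent_l1,
             beta=lambda n: 1.0 / (n + 1) ** 2,
             rho=lambda n: 0.9,
             mu=lambda n: 1.0)
print(np.linalg.norm(x_out))  # distance to the zero of A
```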

Next, we recall some important tools that will be used in the later sections.

Definition 1.

For $x \in H$, the subdifferential of h at x is defined as follows:

\partial h (x) := \{u \in H : 〈 u, y - x 〉 + h (x) \leq h (y) \text{ for all } y \in H\} .

The following can be found in [26].

Lemma 1 ([26]).

The subdifferential $\partial h$ is maximal monotone. Moreover, the graph of $\partial h$, $\mathrm{Gph} (\partial h) := \{(u, v) \in H \times H : v \in \partial h (u)\}$, is demiclosed, i.e., for any sequence $(u_{n}, v_{n}) \in \mathrm{Gph} (\partial h)$ such that $\{u_{n}\}$ converges weakly to u and $\{v_{n}\}$ converges strongly to v, we have $(u, v) \in \mathrm{Gph} (\partial h)$.

The proximal operator, $\mathrm{prox}_{g} : H \to \mathrm{dom}\, g$ with $\mathrm{prox}_{g} (x) = (I + \partial g)^{- 1} (x)$, is single-valued with the full domain, and the following holds:

\frac{x - \mathrm{prox}_{α g} x}{α} \in \partial g (\mathrm{prox}_{α g} x), for all x \in H and α > 0 .

(4)
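Inclusion (4) can be verified directly for a function whose subdifferential is known in closed form. The sketch below assumes g(x) = |x| on the real line, chosen here only for illustration: its proximal operator is soft-thresholding, and ∂g(p) is {sign(p)} for p ≠ 0 and the interval [−1, 1] for p = 0.

```python
import numpy as np

def prox_abs(x, alpha):
    """prox_{alpha g}(x) for g = |.| (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

def in_subdifferential_abs(u, p, tol=1e-12):
    """Check u in the subdifferential of |.| at p: {sign(p)} if p != 0, else [-1, 1]."""
    if p != 0.0:
        return abs(u - np.sign(p)) <= tol
    return abs(u) <= 1.0 + tol

alpha = 0.7
points = np.linspace(-3.0, 3.0, 25)
checks = [
    in_subdifferential_abs((x - prox_abs(x, alpha)) / alpha, prox_abs(x, alpha))
    for x in points
]
print(all(checks))
```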

The following lemmas are crucial for the main results.

Lemma 2 ([27]).

Let H be a real Hilbert space. Then, the following hold, for all $x, y \in H$ and $α \in [0, 1]$:

(i): $∥ x \pm y ∥^{2} = ∥ x ∥^{2} \pm 2 〈 x, y 〉 + ∥ y ∥^{2};$
(ii): $∥ α x + (1 - α) y ∥^{2} = α ∥ x ∥^{2} + (1 - α) ∥ y ∥^{2} - α (1 - α) ∥ x - y ∥^{2};$
(iii): $∥ x + y ∥^{2} \leq ∥ x ∥^{2} + 2 〈 y, x + y 〉 .$

Lemma 3 ([16]).

Let $\{a_{n}\}$ and $\{β_{n}\}$ be sequences of non-negative real numbers such that:

a_{n + 1} \leq (1 + β_{n}) a_{n} + β_{n} a_{n - 1}, for all n \in \mathbb{N} .

Then, the following holds:

a_{n + 1} \leq K \prod_{j = 1}^{n} (1 + 2 β_{j}), where K = \max \{a_{1}, a_{2}\} .

Moreover, if $\sum_{n = 1}^{+ \infty} β_{n} < + \infty$, then $\{a_{n}\}$ is bounded.

Lemma 4 ([27]).

Let $\{a_{n}\}$, $\{b_{n}\}$ and $\{δ_{n}\}$ be sequences of non-negative real numbers such that:

a_{n + 1} \leq (1 + δ_{n}) a_{n} + b_{n}, for all n \in \mathbb{N} .

If $\sum_{n = 1}^{+ \infty} δ_{n} < + \infty$ and $\sum_{n = 1}^{+ \infty} b_{n} < + \infty$, then $\lim_{n \to + \infty} a_{n}$ exists.

Lemma 5 ([28]).

Let H be a Hilbert space and $\{x_{n}\}$ a sequence in H such that there exists a nonempty subset Ω of H satisfying the following:

(i): for any $x^{*} \in Ω$, $\lim_{n \to + \infty} ∥ x_{n} - x^{*} ∥$ exists;
(ii): every weak-cluster point of $\{x_{n}\}$ belongs to $Ω$.

Then, $\{x_{n}\}$ converges weakly to an element in Ω.

3. Main Results

In this section, we assume the existence of a solution of (1) and denote by $S_{*}$ the set of all such solutions. We modify Line Search 2 as follows (Algorithm 8).

Algorithm 8: Line Search 3 $(x, σ, θ, δ)$.

1:: Input Given $x \in \mathrm{dom}\, g$, $σ > 0$, $θ \in (0, 1)$, and $δ > 0$. Set

$\begin{array}{l} L (x, α) = \mathrm{prox}_{α g} (x - α \nabla f (x)), \text{ and} \\ S (x, α) = \mathrm{prox}_{α g} (L (x, α) - α \nabla f (L (x, α))) . \end{array}$
2:: Set $α = σ$.
3:: while

$\begin{array}{l} \frac{α}{2} (∥ \nabla f (S (x, α)) - \nabla f (L (x, α)) ∥ + ∥ \nabla f (L (x, α)) - \nabla f (x) ∥) \\ \quad > δ (∥ S (x, α) - L (x, α) ∥ + ∥ L (x, α) - x ∥) \end{array}$

do
4:: Set $α = θ α, L (x, α) = L (x, θ α), S (x, α) = S (x, θ α)$
5:: end while
6:: Output $α$.

Line Search 3 terminates no later than Line Search 2, because its stopping test is weaker: the average of two non-negative numbers never exceeds their maximum, so

\begin{array}{l} \frac{α}{2} (∥ \nabla f (S (x, α)) - \nabla f (L (x, α)) ∥ + ∥ \nabla f (L (x, α)) - \nabla f (x) ∥) \\ \quad \leq α \max \{∥ \nabla f (S (x, α)) - \nabla f (L (x, α)) ∥, ∥ \nabla f (L (x, α)) - \nabla f (x) ∥\} . \end{array}

Therefore, Line Search 3 stops after finitely many steps. We introduce an accelerated algorithm by utilizing Line Search 3 as follows (Algorithm 9).

Algorithm 9.

1:: Input Given $x_{0}, x_{1} \in \mathrm{dom}\, g$, $β_{n} \geq 0$, $σ > 0$, $θ \in (0, 1)$, and $δ \in (0, \frac{1}{8})$, for $n \in \mathbb{N}$,

$\begin{array}{l} \hat{x}_{n} = x_{n} + β_{n} (x_{n} - x_{n - 1}), \\ y_{n} = P_{\mathrm{dom}\, g} \hat{x}_{n}, \\ z_{n} = \mathrm{prox}_{γ_{n} g} (y_{n} - γ_{n} \nabla f (y_{n})), \\ x_{n + 1} = \mathrm{prox}_{γ_{n} g} (z_{n} - γ_{n} \nabla f (z_{n})), \end{array}$

where $γ_{n} :=$ Line Search 3 $(y_{n}, σ, θ, δ)$, and $P_{\mathrm{dom}\, g}$ is the metric projection onto $\mathrm{dom}\, g$.
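The structure of Algorithm 9 can be sketched in code as follows. This is a minimal illustration rather than the authors' implementation. It assumes that the step size γ_n is the output of Line Search 3 at y_n, uses the summable inertial sequence β_n = 1/(n+1)², and runs on a hypothetical lasso instance for which dom g is the whole space, so the projection is the identity; the data, parameter values, and function names are all assumptions of this sketch.

```python
import numpy as np

def line_search_3(x, sigma, theta, delta, grad_f, prox_g):
    """Backtrack until (alpha/2)(||grad f(S) - grad f(L)|| + ||grad f(L) - grad f(x)||)
    <= delta (||S - L|| + ||L - x||), with L and S the first and second prox-gradient points."""
    alpha = sigma
    gx = grad_f(x)
    while True:
        L_pt = prox_g(x - alpha * gx, alpha)
        gL = grad_f(L_pt)
        S_pt = prox_g(L_pt - alpha * gL, alpha)
        lhs = 0.5 * alpha * (np.linalg.norm(grad_f(S_pt) - gL) + np.linalg.norm(gL - gx))
        rhs = delta * (np.linalg.norm(S_pt - L_pt) + np.linalg.norm(L_pt - x))
        if lhs <= rhs:
            return alpha
        alpha *= theta

def algorithm_9(x0, x1, sigma, theta, delta, grad_f, prox_g, project, n_iter=100):
    """Inertial two-step forward-backward iteration with Line Search 3."""
    x_prev = np.array(x0, dtype=float)
    x = np.array(x1, dtype=float)
    for n in range(1, n_iter + 1):
        beta = 1.0 / (n + 1) ** 2                     # summable inertial parameter (assumed choice)
        y = project(x + beta * (x - x_prev))           # inertial step, then projection onto dom g
        gamma = line_search_3(y, sigma, theta, delta, grad_f, prox_g)
        z = prox_g(y - gamma * grad_f(y), gamma)       # first forward-backward step
        x_next = prox_g(z - gamma * grad_f(z), gamma)  # second forward-backward step
        x_prev, x = x, x_next
    return x

# Illustrative lasso instance (an assumption of this sketch); dom g = R^5.
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda v, alpha: np.sign(v) * np.maximum(np.abs(v) - alpha * lam, 0.0)
objective = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

x0 = np.zeros(5)
x_out = algorithm_9(x0, x0, 1.0, 0.5, 0.1, grad_f, prox_g, project=lambda v: v)
print(objective(x0), objective(x_out))
```

Restarting the line search from σ at every iteration, as done here, matches the statement of the algorithm; passing the previous step size instead would produce a non-increasing step-size sequence.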

Next, we establish our first theorem.

Theorem 1.

Let H be a Hilbert space, and let $f : H \to \mathbb{R} \cup \{+ \infty\}$ and $g : H \to \mathbb{R} \cup \{+ \infty\}$ be proper lower semicontinuous convex functions satisfying A1 and A2. Suppose that $\mathrm{dom}\, g$ is closed and that the following hold for all $n \in \mathbb{N}$:

C1.: $γ_{n} \geq γ > 0;$
C2.: $β_{n} \geq 0$ and $\sum_{n = 1}^{+ \infty} β_{n} < + \infty .$

Then, a sequence $\{x_{n}\}$ generated by Algorithm 9 converges weakly to a point in $S_{*}$.

Proof.

For any $x \in \mathrm{dom}\, g$ and $n \in \mathbb{N}$, we claim that:

\begin{array}{l} ∥ y_{n} - x ∥^{2} - ∥ x_{n + 1} - x ∥^{2} \geq 2 γ_{n} [(f + g) (z_{n}) + (f + g) (x_{n + 1}) - 2 (f + g) (x)] \\ \quad + (1 - 8 δ) (∥ z_{n} - y_{n} ∥^{2} + ∥ x_{n + 1} - z_{n} ∥^{2}) . \end{array}

(5)

To prove our claim, we know, from (4) and the definition of $\partial g$, that:

\frac{y_{n} - z_{n}}{γ_{n}} - \nabla f (y_{n}) \in \partial g (z_{n}), and \frac{z_{n} - x_{n + 1}}{γ_{n}} - \nabla f (z_{n}) \in \partial g (x_{n + 1}) .

Then,

g (x) - g (z_{n}) \geq 〈 \frac{y_{n} - z_{n}}{γ_{n}} - \nabla f (y_{n}), x - z_{n} 〉, and

g (x) - g (x_{n + 1}) \geq 〈 \frac{z_{n} - x_{n + 1}}{γ_{n}} - \nabla f (z_{n}), x - x_{n + 1} 〉, for all n \in \mathbb{N} .

Moreover, by the convexity of f,

f (x) - f (y_{n}) \geq 〈 \nabla f (y_{n}), x - y_{n} 〉,

f (x) - f (z_{n}) \geq 〈 \nabla f (z_{n}), x - z_{n} 〉,

f (y_{n}) - f (z_{n}) \geq 〈 \nabla f (z_{n}), y_{n} - z_{n} 〉, and

f (z_{n}) - f (x_{n + 1}) \geq 〈 \nabla f (x_{n + 1}), z_{n} - x_{n + 1} 〉, for all n \in \mathbb{N} .

From the definition of $γ_{n}$ and the above inequalities, we have, for all $x \in \mathrm{dom}\, g$ and $n \in \mathbb{N}$,

\begin{array}{l}
f (x) - f (y_{n}) + f (x) - f (z_{n}) + g (x) - g (z_{n}) + g (x) - g (x_{n + 1}) \\
\quad \geq \frac{1}{γ_{n}} 〈 y_{n} - z_{n}, x - z_{n} 〉 + 〈 \nabla f (y_{n}), z_{n} - y_{n} 〉 + \frac{1}{γ_{n}} 〈 z_{n} - x_{n + 1}, x - x_{n + 1} 〉 + 〈 \nabla f (z_{n}), x_{n + 1} - z_{n} 〉 \\
\quad = \frac{1}{γ_{n}} 〈 y_{n} - z_{n}, x - z_{n} 〉 + 〈 \nabla f (y_{n}) - \nabla f (z_{n}), z_{n} - y_{n} 〉 + 〈 \nabla f (z_{n}), z_{n} - y_{n} 〉 \\
\qquad + \frac{1}{γ_{n}} 〈 z_{n} - x_{n + 1}, x - x_{n + 1} 〉 + 〈 \nabla f (z_{n}) - \nabla f (x_{n + 1}), x_{n + 1} - z_{n} 〉 + 〈 \nabla f (x_{n + 1}), x_{n + 1} - z_{n} 〉 \\
\quad \geq \frac{1}{γ_{n}} 〈 y_{n} - z_{n}, x - z_{n} 〉 + \frac{1}{γ_{n}} 〈 z_{n} - x_{n + 1}, x - x_{n + 1} 〉 \\
\qquad - ∥ \nabla f (z_{n}) - \nabla f (y_{n}) ∥ ∥ z_{n} - y_{n} ∥ + 〈 \nabla f (z_{n}), z_{n} - y_{n} 〉 \\
\qquad - ∥ \nabla f (x_{n + 1}) - \nabla f (z_{n}) ∥ ∥ x_{n + 1} - z_{n} ∥ + 〈 \nabla f (x_{n + 1}), x_{n + 1} - z_{n} 〉 \\
\quad \geq \frac{1}{γ_{n}} 〈 y_{n} - z_{n}, x - z_{n} 〉 + \frac{1}{γ_{n}} 〈 z_{n} - x_{n + 1}, x - x_{n + 1} 〉 + 〈 \nabla f (z_{n}), z_{n} - y_{n} 〉 \\
\qquad - ∥ \nabla f (z_{n}) - \nabla f (y_{n}) ∥ (∥ z_{n} - y_{n} ∥ + ∥ x_{n + 1} - z_{n} ∥) \\
\qquad - ∥ \nabla f (x_{n + 1}) - \nabla f (z_{n}) ∥ (∥ z_{n} - y_{n} ∥ + ∥ x_{n + 1} - z_{n} ∥) + 〈 \nabla f (x_{n + 1}), x_{n + 1} - z_{n} 〉 \\
\quad = \frac{1}{γ_{n}} 〈 y_{n} - z_{n}, x - z_{n} 〉 + \frac{1}{γ_{n}} 〈 z_{n} - x_{n + 1}, x - x_{n + 1} 〉 + 〈 \nabla f (z_{n}), z_{n} - y_{n} 〉 + 〈 \nabla f (x_{n + 1}), x_{n + 1} - z_{n} 〉 \\
\qquad - (∥ \nabla f (z_{n}) - \nabla f (y_{n}) ∥ + ∥ \nabla f (x_{n + 1}) - \nabla f (z_{n}) ∥) (∥ z_{n} - y_{n} ∥ + ∥ x_{n + 1} - z_{n} ∥) \\
\quad \geq \frac{1}{γ_{n}} 〈 y_{n} - z_{n}, x - z_{n} 〉 + \frac{1}{γ_{n}} 〈 z_{n} - x_{n + 1}, x - x_{n + 1} 〉 + 〈 \nabla f (z_{n}), z_{n} - y_{n} 〉 \\
\qquad + 〈 \nabla f (x_{n + 1}), x_{n + 1} - z_{n} 〉 - \frac{2 δ}{γ_{n}} (∥ z_{n} - y_{n} ∥ + ∥ x_{n + 1} - z_{n} ∥)^{2} \\
\quad \geq \frac{1}{γ_{n}} 〈 y_{n} - z_{n}, x - z_{n} 〉 + \frac{1}{γ_{n}} 〈 z_{n} - x_{n + 1}, x - x_{n + 1} 〉 + f (x_{n + 1}) - f (y_{n}) \\
\qquad - \frac{4 δ}{γ_{n}} (∥ z_{n} - y_{n} ∥^{2} + ∥ x_{n + 1} - z_{n} ∥^{2}) .
\end{array}

Hence,

\begin{array}{l} \frac{1}{γ_{n}} 〈 y_{n} - z_{n}, z_{n} - x 〉 + \frac{1}{γ_{n}} 〈 z_{n} - x_{n + 1}, x_{n + 1} - x 〉 \\ \quad \geq (f + g) (z_{n}) + (f + g) (x_{n + 1}) - 2 (f + g) (x) - \frac{4 δ}{γ_{n}} ∥ z_{n} - y_{n} ∥^{2} - \frac{4 δ}{γ_{n}} ∥ x_{n + 1} - z_{n} ∥^{2} . \end{array}

Moreover, the following identities hold for all $n \in \mathbb{N}$:

〈 y_{n} - z_{n}, z_{n} - x 〉 = \frac{1}{2} (∥ y_{n} - x ∥^{2} - ∥ y_{n} - z_{n} ∥^{2} - ∥ z_{n} - x ∥^{2}), and

〈 z_{n} - x_{n + 1}, x_{n + 1} - x 〉 = \frac{1}{2} (∥ z_{n} - x ∥^{2} - ∥ z_{n} - x_{n + 1} ∥^{2} - ∥ x_{n + 1} - x ∥^{2}) .

As a result, we obtain:

\begin{array}{l} \frac{1}{2 γ_{n}} (∥ y_{n} - x ∥^{2} - ∥ y_{n} - z_{n} ∥^{2}) - \frac{1}{2 γ_{n}} (∥ z_{n} - x_{n + 1} ∥^{2} + ∥ x_{n + 1} - x ∥^{2}) \\ \quad \geq (f + g) (z_{n}) + (f + g) (x_{n + 1}) - 2 (f + g) (x) - \frac{4 δ}{γ_{n}} ∥ z_{n} - y_{n} ∥^{2} - \frac{4 δ}{γ_{n}} ∥ x_{n + 1} - z_{n} ∥^{2}, \end{array}

for all $x \in \mathrm{dom}\, g$ and $n \in \mathbb{N}$. Multiplying by $2 γ_{n}$ and rearranging, we get

\begin{array}{l} ∥ y_{n} - x ∥^{2} - ∥ x_{n + 1} - x ∥^{2} \geq 2 γ_{n} [(f + g) (z_{n}) + (f + g) (x_{n + 1}) - 2 (f + g) (x)] \\ \quad + (1 - 8 δ) (∥ z_{n} - y_{n} ∥^{2} + ∥ x_{n + 1} - z_{n} ∥^{2}), \end{array}

for all $x \in \mathrm{dom}\, g$ and $n \in \mathbb{N}$, which is (5). Furthermore, putting $x = x^{*} \in S_{*}$ and noting that the bracketed term is then non-negative because $x^{*}$ minimizes $f + g$, we obtain:

∥ y_{n} - x^{*} ∥^{2} - ∥ x_{n + 1} - x^{*} ∥^{2} \geq (1 - 8 δ) (∥ z_{n} - y_{n} ∥^{2} + ∥ x_{n + 1} - z_{n} ∥^{2}) .

(6)

Next, we show that $\lim_{n \to + \infty} ∥ x_{n} - x^{*} ∥$ exists. From (6), we have:

\begin{array}{l} ∥ x_{n + 1} - x^{*} ∥ \leq ∥ y_{n} - x^{*} ∥ \\ \quad = ∥ P_{\mathrm{dom}\, g} \hat{x}_{n} - P_{\mathrm{dom}\, g} x^{*} ∥ \\ \quad \leq ∥ \hat{x}_{n} - x^{*} ∥ \\ \quad \leq ∥ x_{n} - x^{*} ∥ + β_{n} ∥ x_{n} - x_{n - 1} ∥ \\ \quad \leq (1 + β_{n}) ∥ x_{n} - x^{*} ∥ + β_{n} ∥ x_{n - 1} - x^{*} ∥, \text{ for all } n \in \mathbb{N} . \end{array}

(7)

This implies by Lemma 3 and C2 that $\{x_{n}\}$ is bounded. Consequently,

\sum_{n = 1}^{+ \infty} β_{n} ∥ x_{n} - x_{n - 1} ∥ < + \infty,

and:

∥ \hat{x}_{n} - x_{n} ∥ = β_{n} ∥ x_{n} - x_{n - 1} ∥ \to 0, as n \to + \infty .

By (7) and Lemma 4, we obtain that $\lim_{n \to + \infty} ∥ x_{n} - x^{*} ∥$ exists. By the definitions of $z_{n - 1}$ and $x_{n}$, we see that $x_{n} \in \mathrm{dom}\, g$ for all $n \in \mathbb{N}$. As a result, we have:

∥ \hat{x}_{n} - y_{n} ∥ \leq ∥ \hat{x}_{n} - x_{n} ∥, for all n \in \mathbb{N},

so,

\lim_{n \to + \infty} ∥ \hat{x}_{n} - y_{n} ∥ = 0 .

Thus, we obtain from (7) that

\lim_{n \to + \infty} ∥ x_{n} - x^{*} ∥ = \lim_{n \to + \infty} ∥ y_{n} - x^{*} ∥ .

Moreover, it follows from (6) that

\lim_{n \to + \infty} ∥ z_{n} - y_{n} ∥ = 0,

and hence,

\lim_{n \to + \infty} ∥ z_{n} - x_{n} ∥ = 0 .

Now, we prove that every weak-cluster point of $\{x_{n}\}$ belongs to $S_{*}$. To this end, let w be a weak-cluster point of $\{x_{n}\}$. Then there exists a subsequence $\{x_{n_{k}}\}$ of $\{x_{n}\}$ such that $x_{n_{k}} ⇀ w$, and hence, $z_{n_{k}} ⇀ w$. Next, we prove that w belongs to $S_{*}$. Since $\nabla f$ is uniformly continuous on bounded subsets of $\mathrm{dom}\, g$, we have

\lim_{k \to + \infty} ∥ \nabla f (z_{n_{k}}) - \nabla f (y_{n_{k}}) ∥ = 0 .

From (4), we obtain:

\frac{y_{n_{k}} - γ_{n_{k}} \nabla f (y_{n_{k}}) - z_{n_{k}}}{γ_{n_{k}}} \in \partial g (z_{n_{k}}), for all k \in \mathbb{N} .

Hence,

\frac{y_{n_{k}} - z_{n_{k}}}{γ_{n_{k}}} - \nabla f (y_{n_{k}}) + \nabla f (z_{n_{k}}) \in \partial g (z_{n_{k}}) + \nabla f (z_{n_{k}}) = \partial (f + g) (z_{n_{k}}), for all k \in \mathbb{N} .

By letting $k \to + \infty$ in the above inclusion, and using C1 to see that the left-hand side converges strongly to 0, the demiclosedness of $\mathrm{Gph} (\partial (f + g))$ implies that $0 \in \partial (f + g) (w)$, and hence, $w \in S_{*}$. Therefore, every weak-cluster point of $\{x_{n}\}$ belongs to $S_{*}$.

We derive from Lemma 5 that $\{x_{n}\}$ converges weakly to a point $w^{*}$ in $S_{*}$. Therefore, $\{x_{n}\}$ converges weakly to a solution of (1), and the proof is complete. □

In the next theorem, we give a complexity result for Algorithm 9. First, we introduce the control sequence $\{t_{n}\}$ defined in [22] by:

t_{n} = 1 + \sum_{k = n}^{+ \infty} (\prod_{i = n}^{k} β_{i}), for all n \in \mathbb{N} .

(8)

This sequence is well defined if the following assumption holds:

\sum_{k = n}^{+ \infty} (\prod_{i = n}^{k} β_{i}) < + \infty, for all n \in \mathbb{N} .

Hence, from (8), we can see that:

β_{n} t_{n + 1} = t_{n} - 1, for all n \in \mathbb{N} .

(9)
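For completeness, identity (9) follows from (8) by factoring $β_{n}$ out of every product; the short derivation below only spells out this step and introduces no new notation.

```latex
\begin{aligned}
t_{n} - 1
  &= \sum_{k = n}^{+\infty} \prod_{i = n}^{k} \beta_{i}
   = \beta_{n} + \sum_{k = n + 1}^{+\infty} \beta_{n} \prod_{i = n + 1}^{k} \beta_{i} \\
  &= \beta_{n} \Bigl( 1 + \sum_{k = n + 1}^{+\infty} \prod_{i = n + 1}^{k} \beta_{i} \Bigr)
   = \beta_{n} \, t_{n + 1} .
\end{aligned}
```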

Next, we establish the following theorem.

Theorem 2.

Let $x_{0} = x_{1} \in \mathrm{dom}\, g$, and let $\{x_{n}\}$ be a sequence generated by Algorithm 9. Assume that all assumptions in Theorem 1 are satisfied. Furthermore, suppose that the following conditions hold, for all $n \in \mathbb{N}$:

D1.: $\sum_{k = n}^{+ \infty} (\prod_{i = n}^{k} β_{i}) < + \infty,$ and $2 t_{n + 1}^{2} - 2 t_{n + 1} \leq t_{n}^{2},$
D2.: $γ_{n} \geq γ_{n + 1} .$

Then,

(f + g) (x_{n + 1}) - \min_{x \in H} (f + g) (x) \leq \frac{d (x_{1}, S_{*})^{2} + 2 γ_{1} t_{1}^{2} [(f + g) (x_{1}) - \min_{x \in H} (f + g) (x)]}{2 γ t_{n + 1}^{2}},

for all $n \in \mathbb{N}$. In other words,

(f + g) (x_{n + 1}) - \min_{x \in H} (f + g) (x) = O (\frac{1}{t_{n + 1}^{2}}), for all n \in \mathbb{N} .

Proof.

For any $x \in \mathrm{dom}\, g$, we know from (5) that:

∥ y_{n} - x ∥^{2} - ∥ x_{n + 1} - x ∥^{2} \geq 2 γ_{n} [(f + g) (z_{n}) + (f + g) (x_{n + 1}) - 2 (f + g) (x)]

for all $n \in \mathbb{N}$. Since $x \in \mathrm{dom}\, g$, we obtain $∥ \hat{x}_{n} - x ∥ \geq ∥ y_{n} - x ∥$. Thus, we conclude that:

∥ \hat{x}_{n} - x ∥^{2} - ∥ x_{n + 1} - x ∥^{2} \geq 2 γ_{n} [(f + g) (z_{n}) + (f + g) (x_{n + 1}) - 2 (f + g) (x)], for all n \in \mathbb{N} .

(10)

Let $x^{*}$ be an element in $S_{*}$. We know that $x_{n}, x^{*} \in \mathrm{dom}\, g$ and $t_{n + 1} \geq 1$, so, by the convexity of $\mathrm{dom}\, g$,

(1 - \frac{1}{t_{n + 1}}) x_{n} + \frac{1}{t_{n + 1}} x^{*} \in \mathrm{dom}\, g .

Next, we put $x = (1 - \frac{1}{t_{n + 1}}) x_{n} + \frac{1}{t_{n + 1}} x^{*}$ in (10) and obtain the following:

\begin{array}{l}
∥ x_{n + 1} - (1 - \frac{1}{t_{n + 1}}) x_{n} - \frac{1}{t_{n + 1}} x^{*} ∥^{2} - ∥ \hat{x}_{n} - (1 - \frac{1}{t_{n + 1}}) x_{n} - \frac{1}{t_{n + 1}} x^{*} ∥^{2} \\
\quad \leq 2 γ_{n} [(f + g) ((1 - \frac{1}{t_{n + 1}}) x_{n} + \frac{1}{t_{n + 1}} x^{*}) - (f + g) (x_{n + 1})] \\
\qquad + 2 γ_{n} [(f + g) ((1 - \frac{1}{t_{n + 1}}) x_{n} + \frac{1}{t_{n + 1}} x^{*}) - (f + g) (z_{n})] \\
\quad \leq 2 γ_{n} [(1 - \frac{1}{t_{n + 1}}) (f + g) (x_{n}) + \frac{1}{t_{n + 1}} (f + g) (x^{*}) - (f + g) (x_{n + 1})] \\
\qquad + 2 γ_{n} [(1 - \frac{1}{t_{n + 1}}) (f + g) (x_{n}) + \frac{1}{t_{n + 1}} (f + g) (x^{*}) - (f + g) (z_{n})] \\
\quad = 2 γ_{n} [(1 - \frac{1}{t_{n + 1}}) [(f + g) (x_{n}) - (f + g) (x^{*})] - [(f + g) (x_{n + 1}) - (f + g) (x^{*})]] \\
\qquad + 2 γ_{n} [(1 - \frac{1}{t_{n + 1}}) [(f + g) (x_{n}) - (f + g) (x^{*})] - [(f + g) (z_{n}) - (f + g) (x^{*})]] \\
\quad \leq 4 γ_{n} (1 - \frac{1}{t_{n + 1}}) [(f + g) (x_{n}) - (f + g) (x^{*})] - 2 γ_{n} [(f + g) (x_{n + 1}) - (f + g) (x^{*})],
\end{array}

(11)

for all $n \in \mathbb{N}$. From D1 and D2, we know that $t_{n}^{2} \geq 2 t_{n + 1}^{2} - 2 t_{n + 1}$ and $γ_{n} \geq γ_{n + 1}$, so:

\begin{array}{l}
4 γ_{n} (1 - \frac{1}{t_{n + 1}}) [(f + g) (x_{n}) - (f + g) (x^{*})] - 2 γ_{n} [(f + g) (x_{n + 1}) - (f + g) (x^{*})] \\
\quad \leq \frac{1}{t_{n + 1}^{2}} [2 γ_{n} (2 t_{n + 1}^{2} - 2 t_{n + 1}) [(f + g) (x_{n}) - (f + g) (x^{*})] - 2 γ_{n + 1} t_{n + 1}^{2} [(f + g) (x_{n + 1}) - (f + g) (x^{*})]] \\
\quad \leq \frac{1}{t_{n + 1}^{2}} [2 γ_{n} t_{n}^{2} [(f + g) (x_{n}) - (f + g) (x^{*})] - 2 γ_{n + 1} t_{n + 1}^{2} [(f + g) (x_{n + 1}) - (f + g) (x^{*})]], \text{ for all } n \in \mathbb{N} .
\end{array}

(12)

Moreover, using (9), we also have:

\begin{array}{l}
∥ x_{n + 1} - (1 - \frac{1}{t_{n + 1}}) x_{n} - \frac{1}{t_{n + 1}} x^{*} ∥^{2} - ∥ \hat{x}_{n} - (1 - \frac{1}{t_{n + 1}}) x_{n} - \frac{1}{t_{n + 1}} x^{*} ∥^{2} \\
\quad = \frac{1}{t_{n + 1}^{2}} (∥ t_{n + 1} x_{n + 1} - (t_{n + 1} - 1) x_{n} - x^{*} ∥^{2} - ∥ t_{n + 1} x_{n} + β_{n} t_{n + 1} (x_{n} - x_{n - 1}) - (t_{n + 1} - 1) x_{n} - x^{*} ∥^{2}) \\
\quad = \frac{1}{t_{n + 1}^{2}} (∥ t_{n + 1} x_{n + 1} - (t_{n + 1} - 1) x_{n} - x^{*} ∥^{2} - ∥ (t_{n} - 1) (x_{n} - x_{n - 1}) + x_{n} - x^{*} ∥^{2}) \\
\quad = \frac{1}{t_{n + 1}^{2}} (∥ t_{n + 1} x_{n + 1} - (t_{n + 1} - 1) x_{n} - x^{*} ∥^{2} - ∥ t_{n} x_{n} - (t_{n} - 1) x_{n - 1} - x^{*} ∥^{2}),
\end{array}

(13)

for all $n \in \mathbb{N}$. By (11)–(13), we obtain:

\begin{array}{l} ∥ t_{n + 1} x_{n + 1} - (t_{n + 1} - 1) x_{n} - x^{*} ∥^{2} - ∥ t_{n} x_{n} - (t_{n} - 1) x_{n - 1} - x^{*} ∥^{2} \\ \quad \leq 2 γ_{n} t_{n}^{2} [(f + g) (x_{n}) - (f + g) (x^{*})] - 2 γ_{n + 1} t_{n + 1}^{2} [(f + g) (x_{n + 1}) - (f + g) (x^{*})], \end{array}

for all $n \in \mathbb{N}$. It follows that:

\begin{array}{l} 2 γ_{n + 1} t_{n + 1}^{2} [(f + g) (x_{n + 1}) - (f + g) (x^{*})] \leq ∥ t_{n} x_{n} - (t_{n} - 1) x_{n - 1} - x^{*} ∥^{2} \\ \quad - ∥ t_{n + 1} x_{n + 1} - (t_{n + 1} - 1) x_{n} - x^{*} ∥^{2} + 2 γ_{n} t_{n}^{2} [(f + g) (x_{n}) - (f + g) (x^{*})], \end{array}

(14)

for all $n \in \mathbb{N}$. Moreover, we can inductively prove, from (14), that:

\begin{array}{l}
2 γ_{n + 1} t_{n + 1}^{2} [(f + g) (x_{n + 1}) - (f + g) (x^{*})] \\
\quad \leq ∥ t_{n} x_{n} - (t_{n} - 1) x_{n - 1} - x^{*} ∥^{2} + 2 γ_{n} t_{n}^{2} [(f + g) (x_{n}) - (f + g) (x^{*})] \\
\quad \leq ∥ t_{n - 1} x_{n - 1} - (t_{n - 1} - 1) x_{n - 2} - x^{*} ∥^{2} + 2 γ_{n - 1} t_{n - 1}^{2} [(f + g) (x_{n - 1}) - (f + g) (x^{*})] \\
\quad \;\; \vdots \\
\quad \leq ∥ t_{1} x_{1} - (t_{1} - 1) x_{0} - x^{*} ∥^{2} + 2 γ_{1} t_{1}^{2} [(f + g) (x_{1}) - (f + g) (x^{*})],
\end{array}

for all $n \in \mathbb{N}$. From C1, we know that $γ_{n + 1} \geq γ$. Therefore, since $x_{0} = x_{1}$, we obtain, for all $n \in \mathbb{N}$,

\begin{array}{l}
(f + g) (x_{n + 1}) - \min_{x \in H} (f + g) (x) \leq \frac{1}{2 γ_{n + 1} t_{n + 1}^{2}} ∥ x_{1} - x^{*} ∥^{2} + \frac{2 γ_{1} t_{1}^{2}}{2 γ_{n + 1} t_{n + 1}^{2}} [(f + g) (x_{1}) - (f + g) (x^{*})] \\
\quad \leq \frac{∥ x_{1} - x^{*} ∥^{2} + 2 γ_{1} t_{1}^{2} [(f + g) (x_{1}) - \min_{x \in H} (f + g) (x)]}{2 γ t_{n + 1}^{2}} .
\end{array}

Since $x^{*}$ is arbitrarily chosen from $S_{*}$, we have:

(f + g) (x_{n + 1}) - \min_{x \in H} (f + g) (x) \leq \frac{d (x_{1}, S_{*})^{2} + 2 γ_{1} t_{1}^{2} [(f + g) (x_{1}) - \min_{x \in H} (f + g) (x)]}{2 γ t_{n + 1}^{2}},

for all $n \in \mathbb{N}$, and the proof is complete. □

Remark 1.

To justify that there exists a sequence

{β_{n}}

satisfying D1, we choose:

β_{n} = \frac{1}{{(n + 1)}^{2}}, for all n \in N .

We see that, for all

n \in N,

t_{n} = 1 + \frac{1}{{(n + 1)}^{2}} + \frac{1}{{(n + 1)}^{2} {(n + 2)}^{2}} + \frac{1}{{(n + 1)}^{2} {(n + 2)}^{2} {(n + 3)}^{2}} + \dots .

Therefore, we have:

t_{n + 1} \leq t_{n} for all n \in N .

Furthermore, it can be seen that

t_{n + 1} < 1 + \sum_{k = 3}^{+ \infty} \frac{1}{k^{2}} = \frac{π^{2}}{6} - \frac{1}{4}

, for all

n \in N .

Then:

\begin{matrix} 2 t_{n + 1}^{2} - 2 t_{n + 1} & = (t_{n + 1}) (2 t_{n + 1} - 2) \\ \leq (t_{n}) (\frac{π^{2}}{3} - \frac{5}{2}), f o r a l l n \in N . \end{matrix}

Since

\frac{π^{2}}{3} - \frac{5}{2} < 1,

we can conclude that

2 t_{n + 1}^{2} - 2 t_{n + 1} < t_{n} < t_{n}^{2},

for all

n \in N .

Obviously,

\sum_{k = n}^{+ \infty} (\prod_{i = n}^{k} β_{i}) < + \infty,

hence D1 is satisfied.
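As a numerical sanity check of Remark 1, the following sketch approximates t_{n} = 1 + \sum_{k = n}^{+ \infty} (\prod_{i = n}^{k} β_{i}) by truncating the series and tests the inequalities derived above for the first few indices. The function names, the truncation length and the range of n are illustrative choices, not part of the original analysis; a finite check of this kind supports, but does not replace, the argument in the remark.

```python
import math


def beta(n):
    """Inertial parameter from Remark 1: beta_n = 1 / (n + 1)^2."""
    return 1.0 / (n + 1) ** 2


def t(n, terms=60):
    """Approximate t_n = 1 + sum_{k>=n} prod_{i=n}^{k} beta_i (truncated series)."""
    total, prod = 1.0, 1.0
    for k in range(n, n + terms):
        prod *= beta(k)  # running product beta_n * ... * beta_k
        total += prod
    return total


def check_d1(max_n=50):
    """Test the inequalities of Remark 1 for n = 1, ..., max_n."""
    bound = math.pi ** 2 / 6 - 0.25
    for n in range(1, max_n + 1):
        tn, tn1 = t(n), t(n + 1)
        if not (tn1 <= tn and tn1 < bound):
            return False
        if not (2 * tn1 ** 2 - 2 * tn1 < tn < tn ** 2):
            return False
    return True


print(check_d1())
```

The series converges very quickly because each new term is multiplied by a factor of at most 1/4, so a short truncation is assumed to be adequate here.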

We also note that to obtain a sequence

{γ_{n}}

satisfying D2, one could simply modify Algorithm 9 by choosing

γ_{n} = Line Search 3 (y_{n}, γ_{n - 1}, θ, δ) .

4. Applications to Data Classification and Image Restoration Problems

In this section, the proposed algorithm is used to solve classification and image restoration problems. The performance of Algorithm 9 is evaluated and compared with Algorithms 3, 5, and 6.

4.1. Data Classification

Data classification is a major class of problems in machine learning, an application of artificial intelligence (AI) in which a system learns and improves from experience without being explicitly programmed. In this work, we focused on one particular learning technique called the extreme learning machine (ELM), introduced by Huang et al. [29]. It is defined as follows.

Let

S : = {(x_{k}, t_{k}) : x_{k} \in R^{n}, t_{k} \in R^{m}, k = 1, 2, . . ., N}

be a training set of N samples, where

x_{k}

is an input and

t_{k}

is a target. The output of ELM with M hidden nodes and activation function G is defined by:

o_{j} = \sum_{i = 1}^{M} η_{i} G (〈 w_{i}, x_{j} 〉 + b_{i}),

where

w_{i}

is the weight vector connecting the i-th hidden node and the input node,

η_{i}

is the weight vector connecting the i-th hidden node and the output node, and

b_{i}

is the bias. The hidden layer output matrix

H

is formulated as:

H = [\begin{matrix} G (〈 w_{1}, x_{1} 〉 + b_{1}) & \dots & G (〈 w_{M}, x_{1} 〉 + b_{M}) \\ ⋮ & ⋱ & ⋮ \\ G (〈 w_{1}, x_{N} 〉 + b_{1}) & \dots & G (〈 w_{M}, x_{N} 〉 + b_{M}) \end{matrix}] .
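The construction of the hidden layer output matrix can be sketched in a few lines. The snippet below is a minimal illustration in plain Python, assuming the sigmoid activation used later in the experiments; drawing the weights w_{i} and biases b_{i} uniformly from [-1, 1] and the two input vectors are assumptions made only for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_layer_matrix(X, W, b, G=sigmoid):
    """Return the N x M matrix with entries G(<w_i, x_j> + b_i)."""
    return [
        [G(sum(wk * xk for wk, xk in zip(w_i, x_j)) + b_i) for w_i, b_i in zip(W, b)]
        for x_j in X
    ]


random.seed(0)
n_features, M = 4, 3  # small sizes chosen only for illustration
# Assumption: hidden weights and biases drawn uniformly from [-1, 1].
W = [[random.uniform(-1, 1) for _ in range(n_features)] for _ in range(M)]
b = [random.uniform(-1, 1) for _ in range(M)]
X = [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]]  # two made-up input vectors
H = hidden_layer_matrix(X, W, b)  # one row per sample, one column per hidden node
```

Row j of the result corresponds to the input x_{j} and column i to the i-th hidden node, matching the layout of the matrix above.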

The main goal of ELM is to find an optimal weight

η = {[η_{1}^{T}, . . ., η_{M}^{T}]}^{T}

such that

H η = T,

where

T = {[t_{1}^{T}, . . ., t_{N}^{T}]}^{T}

is the matrix of training targets. If the Moore–Penrose generalized inverse

H^{†}

of

H

exists, then

η = H^{†} T

is the desired solution. However, in general cases,

H^{†}

may not exist or be challenging to find. Hence, to avoid such difficulties, we applied the concept of convex minimization to find

η

without relying on

H^{†}

.

To prevent overfitting, we used the least absolute shrinkage and selection operator (LASSO) [30], formulated as follows:

min_{η} {{∥ H η - T ∥}_{2}^{2} + {λ ∥ η ∥}_{1}},

(15)

where

λ

is a regularization parameter. In the setting of convex minimization, we set

f (x) = {∥ H x - T ∥}_{2}^{2}

and

g (x) = {λ ∥ x ∥}_{1}

.
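With this splitting, f is differentiable with gradient 2 H^{T} (H x - T), and the proximal operator of γ g is the componentwise soft-thresholding map with threshold γ λ. The sketch below shows one generic forward–backward step for this pair in plain Python. It assumes a single-output target (T is a vector) and a fixed step size gamma supplied by the caller; it is not Algorithm 9, whose step size comes from a line search, and the toy data are made up.

```python
import math


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def grad_f(H, x, T):
    """Gradient of f(x) = ||Hx - T||_2^2, namely 2 H^T (Hx - T)."""
    r = [hx - t for hx, t in zip(matvec(H, x), T)]
    return [2.0 * sum(H[j][i] * r[j] for j in range(len(H))) for i in range(len(x))]


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied componentwise."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]


def objective(H, x, T, lam):
    """Value of ||Hx - T||_2^2 + lam * ||x||_1."""
    r = [hx - t for hx, t in zip(matvec(H, x), T)]
    return sum(ri * ri for ri in r) + lam * sum(abs(xi) for xi in x)


def forward_backward_step(H, x, T, lam, gamma):
    """Gradient (forward) step on f followed by a proximal (backward) step on g."""
    g = grad_f(H, x, T)
    return soft_threshold([xi - gamma * gi for xi, gi in zip(x, g)], gamma * lam)


H = [[1.0, 0.0], [0.0, 1.0]]  # toy 2 x 2 example, not data from the experiments
T = [1.0, -0.05]
x0 = [0.0, 0.0]
x1 = forward_backward_step(H, x0, T, lam=0.2, gamma=0.25)
```

In this toy example the second component is set exactly to zero by the thresholding, which illustrates how the ℓ1 term promotes sparse weights.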

In the experiment, we aimed to classify three datasets from the UCI Machine Learning Repository (https://archive.ics.uci.edu, accessed on 1 May 2021):

Iris dataset [31]: Each sample in this dataset has four attributes, and the set contains three classes with 50 samples each.

Heart disease dataset [32]: This dataset contains 303 samples, each of which has 13 attributes. In this dataset, we classified two classes of data.

Wine dataset [33]: This dataset contains 178 samples, each of which has 13 attributes. In this dataset, we classified three classes of data.

In all experiments, we used the sigmoid as the activation function with the number of hidden nodes

M = 30 .

The accuracy of the output is calculated by:

accuracy = \frac{correctly predicted data}{all data} \times 100 .

We also utilized 10-fold cross-validation to evaluate the performance of each algorithm and used the average accuracy as the evaluation tool. It is defined as follows:

Average ACC = \frac{1}{N} \sum_{i = 1}^{N} \frac{x_{i}}{y_{i}} \times 100 %,

where N is the number of sets considered during cross-validation (

N = 10

),

x_{i}

is the number of correctly predicted data at fold i, and

y_{i}

is the number of all data at fold i.
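The two formulas above translate directly into code. The following sketch uses hypothetical per-fold counts, not results from the experiments, and the function names are illustrative.

```python
def accuracy(correct, total):
    """Accuracy in percent for a single fold."""
    return correct / total * 100.0


def average_accuracy(correct_per_fold, total_per_fold):
    """Mean of the per-fold accuracies x_i / y_i * 100 over all folds."""
    folds = list(zip(correct_per_fold, total_per_fold))
    return sum(accuracy(x, y) for x, y in folds) / len(folds)


# Hypothetical counts for three folds of unequal size.
correct = [9, 10, 7]
total = [10, 10, 8]
avg = average_accuracy(correct, total)
```

Note that the average is taken over per-fold percentages, so folds of different sizes contribute equally, as in the definition above.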

We used 10-fold cross-validation to split the data into training sets and testing sets; more information can be seen in Table 1.

All parameters of Algorithms 3, 5, 6 and 9 were chosen as in Table 2.

The inertial parameters

β_{n}

of Algorithm 9 may vary depending on the dataset, since some

β_{n}

work well on specific datasets. We used the following two choices of

β_{n}

in our experiments.

β_{n}^{1} = \frac{10^{8}}{∥ x_{n} - x_{n - 1} ∥^{3} + n^{3} + 10^{8}}, and β_{n}^{2} = \{\begin{matrix} 0.9, if n \leq 1000 \\ \frac{1}{{(n + 1)}^{2}}, if n \geq 1001 \end{matrix} .
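Both choices keep the inertial parameter large in the early iterations and then let it decay fast enough to be summable: the first is bounded above by 10^{8} / n^{3}, and the second equals 1 / (n + 1)^{2} from n = 1001 onward. A minimal sketch of the two rules follows, assuming the Euclidean norm and iterates stored as plain Python lists; the function names are illustrative.

```python
import math


def beta1(n, x_n, x_prev):
    """First choice: 1e8 / (||x_n - x_{n-1}||^3 + n^3 + 1e8)."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_n, x_prev)))
    return 1e8 / (diff ** 3 + n ** 3 + 1e8)


def beta2(n):
    """Second choice: 0.9 for the first 1000 iterations, then 1 / (n + 1)^2."""
    return 0.9 if n <= 1000 else 1.0 / (n + 1) ** 2
```

For example, beta1 stays close to one while n^{3} is small compared with 10^{8} and drops to roughly 10^{-4} by n = 10^{4}.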

The regularization parameters

λ

for each dataset and algorithm were chosen to prevent overfitting, i.e., the situation in which a model obtained from the algorithm achieves high accuracy on the training set but comparatively low accuracy on the testing set, so that it cannot be used to predict unknown data. It is known that when

λ

is too large, the model tends to underfit, i.e., low accuracy on the training set, and cannot be used to predict the future data. On the other hand, if

λ

is too small, then it may not be enough to prevent a model from overfitting. In our experiment, for each algorithm, we chose a set of

λ

that satisfies

| Acc_{train} - Acc_{test} | < 2 %

, where

Acc_{train}

and

Acc_{test}

are the average accuracy of the training set and testing set, respectively. Under this criterion, we can prevent the studied models from overfitting. Then, from these candidates, we chose

λ

, which yields a high

Acc_{test}

for each algorithm. Therefore, the models obtained from Algorithms 3, 5, 6 and 9 can be effectively used to predict the unknown data.
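The selection rule described above can be sketched as a simple filter followed by a maximization. The snippet assumes that "high" test accuracy is read as the highest among the admissible candidates, which is one possible interpretation; the accuracy values in the example are made up and the function name is illustrative.

```python
def select_lambda(results, tol=2.0):
    """Pick lambda from results = {lambda: (acc_train, acc_test)} in percent.

    Keep only candidates with |acc_train - acc_test| < tol, then return the
    one with the highest test accuracy (None if no candidate qualifies).
    """
    candidates = {
        lam: accs for lam, accs in results.items() if abs(accs[0] - accs[1]) < tol
    }
    if not candidates:
        return None
    return max(candidates, key=lambda lam: candidates[lam][1])


# Hypothetical average accuracies for four candidate values of lambda.
results = {
    0.001: (99.0, 92.0),  # large train/test gap: rejected as overfitting
    0.01: (97.5, 96.0),
    0.1: (95.0, 94.5),
    1.0: (80.0, 79.5),    # small gap but low accuracy: underfitting
}
best = select_lambda(results)
```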

By this process, the regularization parameters

λ

for Algorithms 3, 5, 6 and 9 were as in Table 3.

We assessed the performance of each algorithm at the 300th iteration with the average accuracy. The results can be seen in Table 4.

As we see from Table 4, from the choice of

λ

, all models obtained from Algorithms 3, 5, 6 and 9 had reasonably high average accuracy on both the training and testing sets for all datasets. Moreover, the model from Algorithm 9 achieved the highest average accuracy on the testing set for all three datasets and on the training set for Iris and Wine; on the Heart Disease training set, Algorithm 6 was marginally higher (84.42 versus 84.30).

4.2. Image Restoration

We first recall that an image restoration problem can be formulated as a simple mathematical model as follows:

A x = b + w

(16)

where

x \in R^{n \times 1}

is the original image,

A \in R^{m \times n}

is a blurring matrix, b is an observed image, and w is noise. The main objective of image restoration is to find x from given image

b,

blurring matrix A, and noise w.

In order to solve (16), one could implement LASSO [30] and reformulate the problem in the following form.

min_{x} {{∥ A x - b ∥}_{2}^{2} + {λ ∥ x ∥}_{1}},

(17)

where

λ

is a regularization parameter. Hence, it can be viewed as a convex minimization problem. Therefore, Algorithms 3, 5, 6 and 9 can be used to solve an image restoration problem.

In our experiment, we used the

256 \times 256

color image as the original image. We used Gaussian blur of size

9^{2}

and standard deviation 4 on the original image and obtained the blurred image. In order to assess the performance of each algorithm, we used the peak signal-to-noise ratio (PSNR) [34], defined by:

P S N R (x_{n}) = 10 {log}_{10} (\frac{255^{2}}{MSE}) .

(18)

For any original image x and deblurred image

x_{n}

, the mean squared error (MSE) is calculated by

MSE = \frac{1}{M} {∥ x_{n} - x ∥}^{2},

where M is the number of pixels of x. Note that a higher PSNR indicates a better restoration, so an algorithm with a higher PSNR performs better than one with a lower PSNR.
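The PSNR in (18) and the MSE can be computed as follows. This is a minimal sketch assuming images are flattened to lists of pixel values on a 0 to 255 scale; returning infinity when the two images coincide is a convention adopted here, not something specified in the text.

```python
import math


def mse(x_n, x):
    """Mean squared error between a restored image x_n and the original x."""
    return sum((a - b) ** 2 for a, b in zip(x_n, x)) / len(x)


def psnr(x_n, x, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    err = mse(x_n, x)
    if err == 0:
        return math.inf  # identical images (convention assumed here)
    return 10.0 * math.log10(peak ** 2 / err)
```

Because the MSE appears in the denominator, halving the error raises the PSNR by about 3 dB.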

The control parameters of each algorithm were chosen as

δ = θ = σ = 0.1

. As the inertial parameter

β_{n}

of Algorithm 9, we used the following:

β_{n} = \{\begin{matrix} 0.95, if n \leq 1000 \\ \frac{1}{n^{2}}, if n \geq 1001 . \end{matrix}

As for the regularization parameter

λ

, we experimented on

λ

varying from zero to one for each algorithm. In Figure 1, we show the PSNR of Algorithms 3 and 5 with respect to

λ

at the 200th iteration. In Figure 2, we show the PSNR of Algorithms 6 and 9 with respect to

λ

at the 200th iteration.

We observe from Figure 1 and Figure 2 that the PSNRs of Algorithms 3, 5, 6 and 9 increased as

λ

became smaller. Based on this, for the next two experiments, we chose

λ

to be small to obtain a high PSNR for all algorithms. Next, we observed the PSNR of each algorithm when

λ

was small (

λ < 10^{- 4}

). In Figure 3, we show the PSNR of each algorithm with respect to

λ < 10^{- 4}

at the 200th iteration.

We see from Figure 3 that Algorithm 9 offered a higher PSNR than the other algorithms.

In the next experiment, we chose

λ = 5 \times 10^{- 5}

for each algorithm and evaluated the performance of each algorithm at the 200th iteration; see Table 5 for the results.

In Figure 4, we show the PSNR of each algorithm at each step of iteration.

In Figure 5, we show the original test image, blurred image and deblurred images obtained from Algorithms 3, 5, 6 and 9. As we see from Table 5 and Figure 4, Algorithm 9 achieved the highest PSNR.

5. Conclusions

We introduced a new line search technique inspired by [9,19] and used this technique along with the inertial step to construct a new accelerated algorithm for solving convex minimization problems in the form of the sum of two lower semicontinuous convex functions. We proved its weak convergence to a solution of (1), which did not require an L-Lipschitz continuity of

∇ f,

as well as its complexity theorem. We note that many forward–backward-type algorithms require

∇ f

to be L-Lipschitz continuous in order to obtain the convergence theorem; see [10,16,18] for examples. Our proposed algorithm is easy to employ, since an L-Lipschitz constant of

∇ f

does not need to be calculated. Moreover, we also utilized an inertial technique to improve the convergence behavior of our proposed algorithm. In order to show that our algorithm performed better than other line search algorithms mentioned in the literature, we conducted some experiments on classification and image restoration problems. In our experiments, we evaluated the performance of each algorithm based on selecting its suitable parameters, especially the regularization parameter. It was evidenced that our proposed algorithm performed better than the other algorithms on both data classification and image restoration problems. Moreover, we observed from [19,20] that the inertial parameter

β_{n} = \frac{t_{n} - 1}{t_{n + 1}}

of FISTA and Algorithm 6 satisfied the condition

sup_{n} β_{n} = 1,

while our algorithm required a stricter condition,

\sum_{n = 1}^{+ \infty} β_{n} < + \infty,

to ensure its convergence to a solution of (1), which was a limitation of our algorithm. We also note that the convergence of FISTA and Algorithm 6 cannot be obtained under the condition

sup_{n} β_{n} = 1

. Therefore, it is very interesting to find a weaker condition on

β_{n}

that still ensures the convergence of the algorithm to a solution of (1).
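The gap between the two conditions can be seen numerically. The sketch below generates the FISTA inertial parameters using the standard update t_{n + 1} = (1 + \sqrt{1 + 4 t_{n}^{2}}) / 2 with t_{1} = 1 from [19], and compares them with the summable choice 1 / (n + 1)^{2} from Remark 1. The number of terms and the function names are illustrative assumptions.

```python
import math


def fista_betas(count):
    """beta_n = (t_n - 1) / t_{n+1} with the standard FISTA rule for t_n."""
    betas, t = [], 1.0
    for _ in range(count):
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        betas.append((t - 1.0) / t_next)
        t = t_next
    return betas


def summable_betas(count):
    """beta_n = 1 / (n + 1)^2 for n = 1, ..., count, as in Remark 1."""
    return [1.0 / (n + 1) ** 2 for n in range(1, count + 1)]


fb = fista_betas(1000)
sb = summable_betas(1000)
print(fb[-1], sum(fb), sum(sb))
```

The FISTA parameters approach one, so their partial sums grow without bound, whereas the partial sums of the second sequence stay below π^{2} / 6 - 1.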

Author Contributions

Writing—original draft preparation, P.S.; software and editing, D.C.; supervision, S.S. All authors read and agreed to the published version of the manuscript.

Funding

Thailand Science Research and Innovation: IRN62W0007, Chiang Mai University.

Data Availability Statement

All data are available at https://archive.ics.uci.edu (accessed on 17 May 2021).

Acknowledgments

P. Sarnmeta was supported by the Post-Doctoral Fellowship of Chiang Mai University, Thailand. This research was also supported by Chiang Mai University and Thailand Science Research and Innovation under the Project IRN62W0007.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Kaltenbacher, B.; Rundell, W. On the identification of a nonlinear term in a reaction-diffusion equation. Inverse Probl. 2019, 35, 115007.
2. Kaltenbacher, B.; Rundell, W. The inverse problem of reconstructing reaction–diffusion systems. Inverse Probl. 2020, 36, 065011.
3. Averós, J.C.; Llorens, J.P.; Uribe-Kaffure, R. Numerical simulation of non-linear models of reaction-diffusion for a DGT sensor. Algorithms 2020, 13, 98.
4. Lukyanenko, D.; Yeleskina, T.; Prigorniy, I.; Isaev, T.; Borzunov, A.; Shishlenin, M. Inverse problem of recovering the initial condition for a nonlinear equation of the reaction-diffusion-advection type by data given on the position of a reaction front with a time delay. Mathematics 2021, 9, 342.
5. Lukyanenko, D.; Borzunov, A.; Shishlenin, M. Solving coefficient inverse problems for nonlinear singularly perturbed equations of the reaction-diffusion-advection type with data on the position of a reaction front. Commun. Nonlinear Sci. Numer. Simul. 2021, 99, 105824.
6. Egger, H.; Engl, H.W.; Klibanov, M.V. Global uniqueness and Hölder stability for recovering a nonlinear source term in a parabolic equation. Inverse Probl. 2004, 21, 271.
7. Beilina, L.; Klibanov, M.V. A Globally convergent numerical method for a coefficient inverse problem. SIAM J. Sci. Comput. 2008, 31, 478–509.
8. Klibanov, M.V.; Li, J.; Zhang, W. Convexification for an inverse parabolic problem. Inverse Probl. 2020, 36, 085008.
9. Kankam, K.; Pholasa, N.; Cholamjiak, C. On convergence and complexity of the modified forward–backward method involving new line searches for convex minimization. Math. Methods Appl. Sci. 2019, 42, 1352–1362.
10. Combettes, P.L.; Wajs, V. Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 2005, 4, 1168–1200.
11. Luo, Z.Q. Applications of convex optimization in signal processing and digital communication. Math. Program. 2003, 97, 177–207.
12. Xiong, K.; Zhao, G.; Shi, G.; Wang, Y. A convex optimization algorithm for compressed sensing in a complex domain: The complex-valued split Bregman method. Sensors 2019, 19, 4540.
13. Chen, M.; Zhang, H.; Lin, G.; Han, Q. A new local and nonlocal total variation regularization model for image denoising. Cluster Comput. 2019, 22, 7611–7627.
14. Zhang, Y.; Li, X.; Zhao, G.; Cavalcante, C.C. Signal reconstruction of compressed sensing based on alternating direction method of multipliers. Circuits Syst. Signal Process 2020, 39, 307–323.
15. Parekh, A.; Selesnick, I.W. Convex fused lasso denoising with non-convex regularization and its use for pulse detection. In Proceedings of the 2015 IEEE Signal Processing in Medicine and Biology Symposium, Philadelphia, PA, USA, 12 December 2015; pp. 21–30.
16. Hanjing, A.; Suantai, S. A fast image restoration algorithm based on a fixed point and optimization method. Mathematics 2020, 8, 378.
17. Lions, P.L.; Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 1979, 16, 964–979.
18. Boţ, R.I.; Csetnek, E.R. An inertial forward–backward-forward primal-dual splitting algorithm for solving monotone inclusion problems. Numer. Algorithms 2016, 71, 519–540.
19. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202.
20. Bello Cruz, J.Y.; Nghia, T.T. On the convergence of the forward–backward splitting method with line searches. Optim. Methods Softw. 2016, 31, 1209–1238.
21. Polyak, B.T. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17.
22. Attouch, H.; Cabot, A. Convergence rate of a relaxed inertial proximal algorithm for convex minimization. Optimization 2019, 69, 1281–1312.
23. Alvarez, F.; Attouch, H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 2001, 9, 3–11.
24. Van Hieu, D. An inertial-like proximal algorithm for equilibrium problems. Math. Methods Oper. Res. 2018, 88, 399–415.
25. Chidume, C.E.; Kumam, P.; Adamu, A. A hybrid inertial algorithm for approximating solution of convex feasibility problems with applications. Fixed Point Theory Appl. 2020, 2020.
26. Burachik, R.S.; Iusem, A.N. Set-Valued Mappings and Enlargements of Monotone Operators; Springer: Berlin/Heidelberg, Germany, 2008.
27. Takahashi, W. Introduction to Nonlinear and Convex Analysis; Yokohama Publishers: Yokohama, Japan, 2009.
28. Moudafi, A.; Al-Shemas, E. Simultaneous iterative methods for split equality problem. Trans. Math. Program. Appl. 2013, 1, 1–11.
29. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
30. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B Methodol. 1996, 58, 267–288.
31. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188.
32. Detrano, R.; Janosi, A.; Steinbrunn, W.; Pfisterer, M.; Schmid, J.J.; Sandhu, S.; Guppy, K.H.; Lee, S.; Froelicher, V. International application of a new probability algorithm for the diagnosis of coronary artery disease. Am. J. Cardiol. 1989, 64, 304–310.
33. Forina, M.; Leardi, R.; Armanino, C.; Lanteri, S. PARVUS: An Extendable Package of Programs for Data Exploration; Elsevier: Amsterdam, The Netherlands, 1988.
34. Thung, K.-H.; Raveendran, P. A Survey of Image Quality Measures. In Proceedings of the IEEE Technical Postgraduates (TECHPOS) International Conference, Kuala Lumpur, Malaysia, 14–15 December 2009; pp. 1–4.

Figure 1. PSNR of Algorithm 3 (left) and Algorithm 5 (right) with respect to

λ

at the 200th iteration.

Figure 2. PSNR of Algorithm 6 (left) and Algorithm 9 (right) with respect to

λ

at the 200th iteration.

Figure 3. A graph of the PSNR of each algorithm with respect to

λ

at the 200th iteration.

Figure 4. A graph of the PSNR of each algorithm at Iteration Number 1 to 200.

Figure 5. Deblurred images of each algorithm at the 200th iteration.

Table 1. Number of samples in each fold for all datasets.

	Iris		Heart Disease		Wine
	Train	Test	Train	Test	Train	Test
Fold 1	135	15	273	30	161	17
Fold 2	135	15	272	31	160	18
Fold 3	135	15	272	31	160	18
Fold 4	135	15	272	31	160	18
Fold 5	135	15	273	30	160	18
Fold 6	135	15	273	30	160	18
Fold 7	135	15	273	30	160	18
Fold 8	135	15	273	30	160	18
Fold 9	135	15	273	30	160	18
Fold 10	135	15	273	30	161	17

Table 2. Chosen parameters for each algorithm.

	Algorithm 3	Algorithm 5	Algorithm 6	Algorithm 9
$σ$	0.49	0.124	0.49	0.124
$δ$	0.1	0.1	0.1	0.1
$θ$	0.1	0.1	0.1	0.1

Table 3. Chosen

λ

for each algorithm.

	Regularization Parameter $λ$
	Iris	Heart Disease	Wine
Algorithm 3	$0.001$	$0.003$	$0.02$
Algorithm 5	$0.01$	$0.03$	$0.006$
Algorithm 6	$0.9$	$0.2$	$0.003$
Algorithm 9	$0.003$	$0.16$	$0.17$

Table 4. Average accuracy (%) of each algorithm at the 300th iteration with 10-fold cross-validation.

	Iris		Heart Disease		Wine
	Train	Test	Train	Test	Train	Test
Algorithm 3	92.37	90.67	81.85	80.52	97.57	97.16
Algorithm 5	94.37	94.00	83.53	81.84	98.00	97.19
Algorithm 6	96.67	96.00	84.42	83.49	99.25	98.33
Algorithm 9	98.52	98.67	84.30	83.82	99.38	99.44

Table 5. PSNR of each algorithm at the 200th iteration.

	PSNR (dB)
Algorithm 3	77.62
Algorithm 5	78.55
Algorithm 6	78.95
Algorithm 9	81.29

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

