1. Introduction
Breast cancer is the most common type of cancer in Thai women. Worryingly, although breast cancer can be treated, the treatment carries a very high risk of diseases that affect the heart or blood vessels.
The three most common methods for treating breast cancer are surgery, chemotherapy and radiotherapy. However, radiotherapy often involves some incidental exposure of the heart to ionizing radiation. It was shown in [1] that exposure of the heart to ionizing radiation during therapy increases the subsequent rate of ischemic heart disease, which begins within a few years after exposure and continues for at least 20 years. Thus, women with preexisting cardiac risk factors experience higher absolute increases in risk from this therapy than other women.
Therefore, a patient who is diagnosed with heart disease early can avoid the risks of this type of treatment. Similarly, when cancer is detected at an early stage, the malignant cells can be treated before they spread to other parts of the body. To support the diagnosis of breast cancer and heart disease, our objective in this work is to develop an algorithm for predicting such patients.
It is well known that symmetry serves as the foundation for fixed-point and optimization theory and methods. We first recall the background of some mathematical models. Consider the constrained minimization problem:
  min_{x ∈ S} ω(x),  (1)
where H is a real Hilbert space, ω : H → ℝ is a strongly convex differentiable function with convexity parameter σ, and S is the nonempty set of minimizers of the unconstrained minimization problem:
  min_{x ∈ H} f(x) + g(x),  (2)
where g : H → ℝ ∪ {+∞} is a proper convex and lower semicontinuous function and f : H → ℝ is a smooth function. Problems (1) and (2) are called the outer-level and inner-level problems, respectively. In [2,3,4,5], such a problem is labeled a simple bilevel optimization problem.
In 2017, Sabach and Shtern [6] introduced the Bilevel Gradient Sequential Averaging Method (BiG-SAM) for solving (1) and (2), as defined by Algorithm 1.
Algorithm 1 BiG-SAM: Bilevel Gradient Sequential Averaging Method
1: Initial step. Let λ ∈ (0, 1/L_f] and s ∈ (0, 2/(L_ω + σ)], and let {α_k} be a sequence in (0, 1] satisfying the conditions assumed in [7]. Select an arbitrary x_1 ∈ H, where L_f is the Lipschitz constant of ∇f and L_ω is the Lipschitz constant of ∇ω.
2: Step 1. For k ≥ 1, compute
  y_k = prox_{λg}(x_k − λ∇f(x_k)),
  z_k = x_k − s∇ω(x_k),
  x_{k+1} = α_{k+1} z_k + (1 − α_{k+1}) y_k,
where ∇f and ∇ω are the gradients of f and ω, respectively.
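For readers who want to experiment, the following is a minimal NumPy sketch of one possible BiG-SAM loop for an ℓ1-regularized least-squares inner problem with outer objective ω(x) = ½‖x‖²; the data (A, b), the step sizes and the choice α_k = 1/(k + 1) are illustrative assumptions, not the exact setting of [6].

```python
import numpy as np

def soft_threshold(v, tau):
    # prox of tau*||.||_1 (soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def big_sam(A, b, lam=1e-3, n_iter=500):
    # Inner level (2): f(x) = ||Ax - b||^2 (smooth), g(x) = lam*||x||_1 (nonsmooth).
    # Outer level (1): w(x) = 0.5*||x||^2, so grad w(x) = x and L_w = sigma = 1.
    Lf = 2 * np.linalg.norm(A, 2) ** 2        # Lipschitz constant of grad f
    step = 1.0 / Lf                           # lambda in (0, 1/L_f]
    s = 0.5                                   # s in (0, 2/(L_w + sigma)] = (0, 1]
    x = np.zeros(A.shape[1])
    for k in range(1, n_iter + 1):
        alpha = 1.0 / (k + 1)                 # alpha_k -> 0, sum alpha_k = infinity
        grad_f = 2 * A.T @ (A @ x - b)
        y = soft_threshold(x - step * grad_f, step * lam)  # prox-gradient step
        z = x - s * x                         # gradient step on the outer objective
        x = alpha * z + (1 - alpha) * y       # sequential averaging
    return x
```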
They showed that BiG-SAM is simpler and cheaper than the method described in [8]. Moreover, the authors in [6] used a numerical example to show that BiG-SAM outperforms the method in [8] for solving problems (1) and (2). Up to that point, the algorithm in [6] seemed to be the most efficient method for convex simple bilevel optimization problems.
In 2019, Shehu et al. [9] utilized the inertial technique proposed by Polyak [10] to accelerate the convergence rate of BiG-SAM; the resulting method, called iBiG-SAM, is defined by Algorithm 2.
Algorithm 2 iBiG-SAM: Inertial Bilevel Gradient Sequential Averaging Method
1: Initial step. Let L_f and L_ω be the Lipschitz constants of ∇f and ∇ω, respectively. Given λ ∈ (0, 1/L_f], s ∈ (0, 2/(L_ω + σ)], let {α_k} be a sequence in (0, 1), {ε_k} a positive sequence with ε_k = o(α_k) and θ ≥ 3. Select arbitrary points x_0, x_1 ∈ H.
2: Step 1. Choose θ_k such that 0 ≤ θ_k ≤ θ̄_k, where, for k ≥ 1,
  θ̄_k = min{(k − 1)/(k + θ − 1), ε_k/‖x_k − x_{k−1}‖} if x_k ≠ x_{k−1}, and θ̄_k = (k − 1)/(k + θ − 1) otherwise.
3: Step 2. Compute
  w_k = x_k + θ_k(x_k − x_{k−1}),
  y_k = prox_{λg}(w_k − λ∇f(w_k)),
  z_k = w_k − s∇ω(w_k),
  x_{k+1} = α_k z_k + (1 − α_k) y_k,
where ∇f and ∇ω are the gradients of f and ω, respectively.
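The essential difference from Algorithm 1 is the inertial extrapolation w_k = x_k + θ_k(x_k − x_{k−1}). Below is a hedged sketch of one common way to pick θ_k with the cap (k − 1)/(k + θ − 1) and a summable safeguard ε_k; the exact constants used in [9] may differ.

```python
import numpy as np

def inertial_parameter(k, x_curr, x_prev, theta=3.0):
    # theta_k in [0, theta_bar_k]; the eps_k/||x_k - x_{k-1}|| safeguard keeps
    # the inertial term small enough to preserve convergence (assumed rule).
    cap = (k - 1) / (k + theta - 1)
    eps_k = 1.0 / k ** 2                      # summable safeguard sequence (assumption)
    diff = np.linalg.norm(x_curr - x_prev)
    return cap if diff == 0 else min(cap, eps_k / diff)

# One inertial step, then proceed exactly as in Algorithm 1:
# w = x_curr + inertial_parameter(k, x_curr, x_prev) * (x_curr - x_prev)
```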
They also proved that the sequence {x_k} generated by iBiG-SAM converges to the optimal solution of problems (1) and (2) when the sequence {α_k} satisfies the following conditions:
- (1) lim_{k→∞} α_k = 0;
- (2) Σ_{k=1}^∞ α_k = ∞.
The above assumptions are derived from [7] by relaxing some of the conditions there.
Recently, to accelerate the convergence of the iBiG-SAM algorithm, Duan and Zhang [11] proposed three inertial approximation methods based on the proximal gradient algorithm, defined by Algorithms 3–5.
Algorithm 3 aiBiG-SAM: The alternated inertial Bilevel Gradient Sequential Averaging Method
1: Initial step. Let L_f and L_ω be the Lipschitz constants of ∇f and ∇ω, respectively. Given θ ≥ 3, let {α_k} be a sequence in (0, 1) satisfying the conditions assumed in [9]. Select arbitrary points x_0, x_1 ∈ H and set k = 1.
2: Step 1.
3: When k is odd, choose θ_k such that 0 ≤ θ_k ≤ θ̄_k, with θ̄_k defined as in Algorithm 2.
4: When k is even, set θ_k = 0.
5: Step 2. Compute
  w_k = x_k + θ_k(x_k − x_{k−1}),
  y_k = prox_{λg}(w_k − λ∇f(w_k)),
  z_k = w_k − s∇ω(w_k),
  x_{k+1} = α_k z_k + (1 − α_k) y_k,
where ∇f and ∇ω are the gradients of f and ω, respectively.
6: Step 3. If x_{k+1} = x_k, then stop. Otherwise, set k := k + 1 and go to Step 1.
Algorithm 4 miBiG-SAM: The multi-step inertial Bilevel Gradient Sequential Averaging Method
1: Initial step. Let L_f and L_ω be the Lipschitz constants of ∇f and ∇ω, respectively. Given θ ≥ 3 and the number of inertial steps, let {α_k} be a sequence in (0, 1) satisfying the conditions assumed in [9]. Select arbitrary points x_0, x_1 ∈ H and set k = 1.
2: Step 1. Given the previous iterates x_k, x_{k−1}, …, compute the multi-step inertial extrapolation w_k, choosing the inertial weights θ_k such that 0 ≤ θ_k ≤ θ̄_k, with θ̄_k defined as in Algorithm 2.
3: Step 2. Compute
  y_k = prox_{λg}(w_k − λ∇f(w_k)),
  z_k = w_k − s∇ω(w_k),
  x_{k+1} = α_k z_k + (1 − α_k) y_k,
where ∇f and ∇ω are the gradients of f and ω, respectively.
4: Step 3. If x_{k+1} = x_k, then stop. Otherwise, set k := k + 1 and go to Step 1.
Algorithm 5 amiBiG-SAM: The multi-step alternated inertial Bilevel Gradient Sequential Averaging Method
1: Initial step. As in Algorithm 4, with the inertial extrapolation applied in the alternated fashion of Algorithm 3.
2: Step 1. Given the previous iterates, compute the multi-step inertial extrapolation w_k on odd iterations, choosing θ_k such that 0 ≤ θ_k ≤ θ̄_k with θ̄_k defined as in Algorithm 2, and set w_k = x_k on even iterations.
3: Step 2. Compute
  y_k = prox_{λg}(w_k − λ∇f(w_k)),
  z_k = w_k − s∇ω(w_k),
  x_{k+1} = α_k z_k + (1 − α_k) y_k,
where ∇f and ∇ω are the gradients of f and ω, respectively.
4: Step 3. If x_{k+1} = x_k, then stop. Otherwise, set k := k + 1 and go to Step 1.
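The distinguishing feature of Algorithms 3–5 is when the inertial term is applied. A schematic, self-contained sketch of the alternated rule of Algorithm 3 (extrapolation only on odd iterations), under the same illustrative safeguard as in the iBiG-SAM sketch above:

```python
import numpy as np

def alternated_inertial_parameter(k, x_curr, x_prev, theta=3.0):
    # Alternated inertia (Algorithm 3): extrapolate only on odd iterations.
    if k % 2 == 0:
        return 0.0
    cap = (k - 1) / (k + theta - 1)            # same cap as in the iBiG-SAM sketch
    diff = np.linalg.norm(x_curr - x_prev)
    return cap if diff == 0 else min(cap, (1.0 / k ** 2) / diff)
```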
The convergence behavior of Algorithms 3–5 was shown in [11] to be better than that of BiG-SAM and iBiG-SAM.
It is known that the variational inequality
  ⟨∇ω(x*), x − x*⟩ ≥ 0 for all x ∈ S
implies that x* is a solution of the convex bilevel optimization problem (1); for more details, see [12]. For recent results, see [13,14] and the references therein.
It is worth noting that x* can be described by the fixed-point equation:
  x* = prox_{λg}(x* − λ∇f(x*)),
where λ > 0 and prox is the proximity operator, which was introduced by Moreau [15]. This means that solving the bilevel problem is equivalent to finding a fixed point of the proximal operator. It is well known that fixed-point theory plays a very crucial role in solving many real-world problems, such as problems in engineering, economics, machine learning and data science; see [16,17,18,19,20,21,22,23,24] for more details. Over the past three decades, several fixed-point algorithms have been introduced and studied by many authors; see [25,26,27,28,29,30,31,32,33,34]. Some of these algorithms have been applied to various problems in image and signal processing, data classification and regression; see, for example, [19,20,21,22,23]. In addition, fuzzy classification is another important data classification mechanism; see [35,36].
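As a small numerical illustration of this fixed-point characterization, take g = λ‖·‖₁, whose proximity operator has the closed form of soft-thresholding; a point returned by the plain forward–backward iteration is left essentially unchanged by one further application of the operator. The random data below are purely illustrative.

```python
import numpy as np

def prox_l1(v, tau):
    # Moreau's proximity operator of tau*||.||_1 (soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
b = rng.standard_normal(40)
lam = 0.1
L = 2 * np.linalg.norm(A, 2) ** 2             # Lipschitz constant of grad f
FB = lambda x: prox_l1(x - (1 / L) * (2 * A.T @ (A @ x - b)), lam / L)

x = np.zeros(10)
for _ in range(2000):                          # forward-backward iteration
    x = FB(x)
print(np.linalg.norm(x - FB(x)))               # near zero: x is (numerically) a fixed point
```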
All of the works mentioned above motivate and inspire us to establish a new accelerated algorithm to solve a convex bilevel optimization problem and apply it for solving data classification problems.
We organize the paper as follows: In Section 2, we provide some basic definitions and useful lemmas used in the later sections. The main results of the paper are given in Section 3, in which we introduce and study a new accelerated algorithm for solving a convex bilevel optimization problem and prove the strong convergence of the proposed algorithm. After that, we apply our main results to solving a data classification problem in Section 4. Finally, a brief conclusion of the paper is given in Section 5.
2. Preliminaries
Throughout this paper, H denotes a real Hilbert space with inner product ⟨·,·⟩ and induced norm ‖·‖.
A mapping T : H → H is called L-Lipschitz if there exists L > 0 such that ‖Tx − Ty‖ ≤ L‖x − y‖ for all x, y ∈ H. If L ∈ [0, 1), then T is called a contraction. It is called nonexpansive if L = 1. We denote by Fix(T) the set of all fixed points of T, that is, Fix(T) = {x ∈ H : Tx = x}. For a sequence {x_k} in H, we denote the strong convergence and the weak convergence of {x_k} to x by x_k → x and x_k ⇀ x, respectively.
Let {T_k} and ℑ be families of nonexpansive operators from C into itself with ∅ ≠ Fix(ℑ) ⊆ ⋂_{k=1}^∞ Fix(T_k), where Fix(ℑ) is the set of all common fixed points of ℑ and Fix(T_k) is the set of all fixed points of T_k. The sequence {T_k} is said to satisfy the NST-condition (I) with ℑ if, for every bounded sequence {x_k} in C, lim_{k→∞} ‖x_k − T_k x_k‖ = 0 implies lim_{k→∞} ‖x_k − T x_k‖ = 0 for every T ∈ ℑ; see [37] for more details. In particular, if ℑ = {T}, then {T_k} is a sequence satisfying the NST-condition (I) with T.
Later, the NST*-condition was proposed by Nakajo et al. [38]; it is a weaker condition than the NST-condition (I). A sequence {T_k} is said to satisfy the NST*-condition if, for every bounded sequence {x_k} in C, lim_{k→∞} ‖x_k − T_k x_k‖ = 0 and lim_{k→∞} ‖x_k − x_{k+1}‖ = 0 imply ω_w(x_k) ⊆ ⋂_{k=1}^∞ Fix(T_k), where ω_w(x_k) is the set of all weak cluster points of {x_k}. It is easy to see that if {T_k} satisfies the NST-condition (I), then it satisfies the NST*-condition.
In a real Hilbert space H, the following properties hold for any x, y ∈ H and λ ∈ [0, 1]:
- (1) ‖x + y‖² ≤ ‖x‖² + 2⟨y, x + y⟩;
- (2) ‖λx + (1 − λ)y‖² = λ‖x‖² + (1 − λ)‖y‖² − λ(1 − λ)‖x − y‖².
If C is a nonempty closed convex subset of H, then for each x ∈ H, there exists a unique element in C, denoted P_C x, such that ‖x − P_C x‖ ≤ ‖x − y‖ for all y ∈ C. The mapping P_C is known as the metric projection of H onto C, and it is also nonexpansive. Moreover, ⟨x − P_C x, y − P_C x⟩ ≤ 0 holds for all x ∈ H and y ∈ C.
The following results are also essential for proving our main results.
Lemma 1 ([39]). Let {a_k} and {b_k} be sequences of nonnegative real numbers, {α_k} a sequence in (0, 1) and {t_k} a sequence of real numbers such that
  a_{k+1} ≤ (1 − α_k)a_k + α_k t_k + b_k for all k ∈ ℕ.
If all the following conditions hold:
- (1) Σ_{k=1}^∞ α_k = ∞;
- (2) lim sup_{k→∞} t_k ≤ 0;
- (3) Σ_{k=1}^∞ b_k < ∞,
then lim_{k→∞} a_k = 0.
Lemma 2 ([40]). Let H be a real Hilbert space and T : H → H a nonexpansive mapping with Fix(T) ≠ ∅. Then, for any sequence {x_k} in H, x_k ⇀ x and lim_{k→∞} ‖x_k − T x_k‖ = 0 imply x = Tx.
Lemma 3 ([41]). Let {Γ_k} be a sequence of real numbers that does not decrease at infinity, in the sense that there exists a subsequence {Γ_{k_j}} of {Γ_k} which satisfies Γ_{k_j} < Γ_{k_j + 1} for all j ∈ ℕ. Define the sequence {τ(k)} of integers as follows:
  τ(k) = max{i ≤ k : Γ_i < Γ_{i+1}},
where k ≥ k₀ is such that {i ≤ k₀ : Γ_i < Γ_{i+1}} ≠ ∅. Then, the following hold:
- (1) τ(k₀) ≤ τ(k₀ + 1) ≤ ⋯ and τ(k) → ∞;
- (2) Γ_{τ(k)} ≤ Γ_{τ(k)+1} and Γ_k ≤ Γ_{τ(k)+1} for all k ≥ k₀.
Proposition 1 ([6]). Suppose ω : H → ℝ is strongly convex with convexity parameter σ and continuously differentiable such that ∇ω is Lipschitz continuous with constant L_ω. Then, the mapping I − s∇ω is a contraction for all s ∈ (0, 2/(L_ω + σ)], where I is the identity operator.
Definition 1 ([15]). Let ψ : H → ℝ ∪ {+∞} be a proper convex and lower semicontinuous function. The proximity operator of parameter λ > 0 of ψ at x ∈ H is denoted by prox_{λψ}(x) and is defined by
  prox_{λψ}(x) = argmin_{y ∈ H} { ψ(y) + (1/(2λ))‖x − y‖² }.
The operator T = prox_{λψ}(I − λ∇ϕ) is known as the forward–backward operator of ϕ and ψ with respect to λ, where λ > 0 and ∇ϕ is the gradient operator of the function ϕ. Moreover, T is a nonexpansive mapping whenever λ ∈ (0, 2/L), where L is the Lipschitz constant of ∇ϕ.
Lemma 4 ([42]). For a real Hilbert space H, let ψ : H → ℝ ∪ {+∞} be a proper convex and lower semicontinuous function, and let ϕ : H → ℝ be convex differentiable with ∇ϕ being L-Lipschitz for some L > 0. If {T_k} is the family of forward–backward operators of ϕ and ψ with respect to a sequence {c_k} ⊂ (0, 2/L) such that {c_k} converges to c ∈ (0, 2/L), then {T_k} satisfies the NST-condition (I) with T, where T is the forward–backward operator of ϕ and ψ with respect to c.
3. Main Results
We start this section by introducing a new common fixed-point algorithm using the inertial technique together with the modified Ishikawa iteration (see [43,44,45] for more details) to obtain a strong convergence theorem for two countable families of nonexpansive mappings in a real Hilbert space, as seen in Algorithm 6.
Algorithm 6 IVAM (I): Inertial Viscosity Approximation Method for Two Families of Nonexpansive Mappings
1: Input. Let {α_k} be a positive sequence and f a contraction with constant c. Choose x_0, x_1 ∈ H.
2: Step 1. Select the inertial parameter θ_k such that, for k ≥ 1, the prescribed bound holds.
3: Step 2. Compute the next iterate via the inertial viscosity steps.
Lemma 5. Let {T_k} and {S_k} be two countable families of nonexpansive mappings from H into itself such that F ≠ ∅, where F is the set of their common fixed points, and let f be a contraction. If the parameter condition of Algorithm 6 holds, then the sequence {x_k} generated by Algorithm 6 is bounded. Furthermore, {y_k} and {z_k} are bounded.
Proof. Let p ∈ F. Then, by the definitions of y_k and z_k in Algorithm 6, for every k ∈ ℕ, we can estimate ‖y_k − p‖ and ‖z_k − p‖. It follows from (8) and (11) that ‖x_{k+1} − p‖ admits a corresponding bound. Using the contraction property of f and (7), we obtain that there exists N ∈ ℕ such that this bound holds for all k ≥ N. By mathematical induction, we conclude that ‖x_k − p‖ ≤ M for all k ∈ ℕ, where M > 0 is a suitable constant. It follows that {x_k} is bounded. This implies that the sequences {y_k} and {z_k} are bounded. □
We now prove a strong convergence theorem of the sequence generated by Algorithm 6 to solve a common fixed point problem as follows.
Theorem 1. Let {T_k} and {S_k} be two countable families of nonexpansive mappings from H into H such that F ≠ ∅. Let {x_k} be a sequence generated by Algorithm 6. Suppose {T_k} and {S_k} satisfy the NST-conditions and the following conditions hold:
- (1) …;
- (2) …;
- (3) …;
- (4) …;
- (5) …,
where the constants involved are real positive numbers. Then, {x_k} converges strongly to x* ∈ F, where x* = P_F f(x*).
Proof. Let x* ∈ F be such that x* = P_F f(x*). It follows from (11) that an estimate on ‖z_k − x*‖ holds; together with the definitions of the iterates, this gives us a bound on ‖x_{k+1} − x*‖². Because α_k → 0, there exists k₀ ∈ ℕ such that the required inequality holds for all k ≥ k₀. Putting this together with (13) and (14) yields (15). We now set Γ_k := ‖x_k − x*‖² and collect the remaining terms into an error sequence, so that (15) takes the recursive form (16). Next, we analyze the convergence of the sequence {x_k} by considering the following two cases:
Case 1. Suppose {Γ_k} is nonincreasing for some k ≥ k₀. Because {Γ_k} is bounded from below by zero, lim_{k→∞} Γ_k exists, and hence Γ_k − Γ_{k+1} → 0. To apply Lemma 1, we need to claim that the limit superior of the error term in (16) is nonpositive. Indeed, by the definition of w_k, we can bound ‖w_k − x_k‖. By Algorithm 6, (10) and (17), we obtain an estimate which implies a bound on ‖x_{k+1} − x_k‖ for any k ≥ k₀. Because α_k → 0 and Γ_k − Γ_{k+1} → 0, we derive ‖x_{k+1} − x_k‖ → 0. From this, (20) and (21), we obtain ‖w_k − x_k‖ → 0. Moreover, we have from (9), (18) and the nonexpansiveness of T_k a further estimate; together with assumptions (3) and (4), the existence of lim_{k→∞} Γ_k and ‖x_{k+1} − x_k‖ → 0, this implies ‖y_k − w_k‖ → 0. From the definition of y_k and assumption (3), it then follows from (22) and (23) that ‖x_k − T_k x_k‖ → 0. Using the definition of z_k, together with α_k → 0, (24) and the boundedness of {x_k} and {z_k}, we obtain ‖x_k − S_k x_k‖ → 0.
Let t := lim sup_{k→∞} ⟨f(x*) − x*, x_k − x*⟩. The boundedness of {x_k} implies that there exists a subsequence {x_{k_j}} such that x_{k_j} ⇀ x and t = lim_{j→∞} ⟨f(x*) − x*, x_{k_j} − x*⟩. It follows from the nonexpansiveness of T, (19) and (21) that ‖x_{k_j} − T x_{k_j}‖ → 0. Using Lemma 2, we obtain x ∈ Fix(T). Due to S being nonexpansive, we have, for any j, an analogous estimate, which implies ‖x_{k_j} − S x_{k_j}‖ → 0 by employing (22) and (23). By Lemma 2, we obtain x ∈ Fix(S). Because x ∈ F, it follows that {x_{k_j}} converges weakly to x. In addition, utilizing x* = P_F f(x*) together with (6) gives us that t ≤ 0. Invoking this and (28), we obtain that the limit superior of the error term in (16) is nonpositive. Coming back to (16), by Lemma 1, we can conclude that Γ_k → 0, that is, x_k → x*.
Case 2. Suppose that {Γ_k} is not a monotonically decreasing sequence. To apply Lemma 3, note that there exists a subsequence {Γ_{k_j}} of {Γ_k} such that Γ_{k_j} < Γ_{k_j + 1} for all j ∈ ℕ. In this case, let {τ(k)} be defined by τ(k) := max{i ≤ k : Γ_i < Γ_{i+1}}. Therefore, {τ(k)} satisfies the condition in Lemma 3. Hence, we have Γ_{τ(k)} ≤ Γ_{τ(k)+1} and Γ_k ≤ Γ_{τ(k)+1} for all k ≥ k₀. As in the proof of Case 1, we also have, for any k, the corresponding estimates along {τ(k)}. Because Γ_{τ(k)} ≤ Γ_{τ(k)+1} for all k, the above inequality leads to a bound on the consecutive differences. Using α_{τ(k)} → 0 and the boundedness of the iterates, we obtain ‖x_{τ(k)+1} − x_{τ(k)}‖ → 0. Similar to the proof of Case 1, we conclude ‖x_{τ(k)} − T_{τ(k)} x_{τ(k)}‖ → 0, and so ‖x_{τ(k)} − S_{τ(k)} x_{τ(k)}‖ → 0.
Put t := lim sup_{k→∞} ⟨f(x*) − x*, x_{τ(k)} − x*⟩. Due to {x_{τ(k)}} being bounded, there exists a subsequence of {x_{τ(k)}}, still denoted {x_{τ(k)}}, such that x_{τ(k)} ⇀ x and t = lim_{k→∞} ⟨f(x*) − x*, x_{τ(k)} − x*⟩ for some x ∈ H. The nonexpansiveness of T and S implies the estimates (35) and (36). Taking the limit as k → ∞ in (35) and (36), we derive from (30)–(33) that ‖x_{τ(k)} − T x_{τ(k)}‖ → 0 and ‖x_{τ(k)} − S x_{τ(k)}‖ → 0. By Lemma 2, we obtain x ∈ Fix(T) ∩ Fix(S) = F. Due to x* = P_F f(x*), it follows from (6) that t ≤ 0, and thus the limit superior of the error term along {τ(k)} is nonpositive. Because Γ_{τ(k)} ≤ Γ_{τ(k)+1}, as in the proof of Case 1, we have, for every k, the recursive inequality along {τ(k)}. From this and α_{τ(k)} > 0, we obtain lim sup_{k→∞} Γ_{τ(k)} ≤ 0, which implies Γ_{τ(k)} → 0. Invoking this and (39), we obtain Γ_{τ(k)+1} → 0, and hence it follows from (34) that Γ_k → 0. By Lemma 3, we obtain x_k → x*.
Therefore, {x_k} converges strongly to x* = P_F f(x*). □
We observe that Algorithm 6 reduces to Algorithm 7 by setting S_k = T_k for all k, for finding a common fixed point of a single countable family of nonexpansive mappings of H.
Corollary 1. Let {T_k} be a countable family of nonexpansive mappings from H into itself such that F := ⋂_{k=1}^∞ Fix(T_k) ≠ ∅. Suppose {T_k} satisfies the NST-conditions and the following conditions hold:
- (1) …;
- (2) …;
- (3) …;
- (4) …;
- (5) …,
where the constants involved are real positive numbers. Then, the sequence {x_k} generated by Algorithm 7 converges strongly to x* ∈ F, where x* = P_F f(x*).
Algorithm 7 IVAM (II): Inertial Viscosity Approximation Method for a Family of Nonexpansive Mappings
1: Input. Let {α_k} be a positive sequence and f a c-contraction. Choose x_0, x_1 ∈ H.
2: Step 1. Select the inertial parameter θ_k such that, for k ≥ 1, the prescribed bound holds.
3: Step 2. Compute the next iterate via the corresponding inertial viscosity steps.
4. Application to Convex Bilevel Optimization Problems
The aim of this section is to apply our proposed algorithm to solve the following convex bilevel optimization problem:
  min_{x ∈ Ω} ω(x),  (44)
where ω is strongly convex differentiable with ∇ω being L_ω-Lipschitz continuous and Ω is the set of all common minimizers of the following unconstrained minimization problems:
  min_{x ∈ H} f_i(x) + g_i(x), i = 1, 2,  (45)
where g_i, i = 1, 2, are proper convex and lower semicontinuous functions and f_i, i = 1, 2, are differentiable functions. Problem (45) reduces to (2) if f_1 = f_2 and g_1 = g_2. As in the literature, we know that x* ∈ Ω if and only if x* = prox_{c g_i}(x* − c∇f_i(x*)) for i = 1, 2, where c > 0, while ∇f_1 and ∇f_2 are the Lipschitz gradients of f_1 and f_2, respectively. In addition, x* is also a solution of problem (44) if it satisfies the following form:
  ⟨∇ω(x*), x − x*⟩ ≥ 0 for all x ∈ Ω.  (46)
Therefore, we solve the convex bilevel optimization problems (44) and (45) by finding a common fixed point x* of the two forward–backward operators, which satisfies formulation (46).
Next, we present the algorithm derived from our main result for solving the convex bilevel optimization problem as defined by Algorithm 8.
In order to solve (44) and (45), we suppose the following conditions hold:
- (1) f is a c-contraction with c ∈ [0, 1);
- (2) …;
- (3) …;
- (4) {α_k} and {β_k} are sequences in (0, 1);
- (5) g_1 and g_2 are two proper convex and lower semicontinuous functions from H into ℝ ∪ {+∞};
- (6) f_1 and f_2 are two smooth convex loss functions, differentiable with L_i-Lipschitz continuous gradients ∇f_i, respectively;
- (7) ω is strongly convex differentiable with ∇ω being L_ω-Lipschitz, where σ is a parameter such that ω is σ-strongly convex.
Theorem 2. Let {x_k} be a sequence generated by Algorithm 8 such that all conditions as in Theorem 1 hold. Let Ω be the set of all solutions of (44). Then, {x_k} converges strongly to x* ∈ Ω, which satisfies ⟨∇ω(x*), x − x*⟩ ≥ 0 for all x ∈ Ω.
Algorithm 8 iVMBi (I): Inertial Viscosity Method for Bilevel Optimization Problem (I)
1: Input. Let {α_k} be a positive sequence. Choose x_0, x_1 ∈ H.
2: Step 1. Select the inertial parameter θ_k such that, for k ≥ 1, the prescribed bound holds.
3: Step 2. Compute the next iterate via the forward–backward and viscosity steps.
Proof. Let T_1 := prox_{c_1 g_1}(I − c_1∇f_1) and T_2 := prox_{c_2 g_2}(I − c_2∇f_2) be the forward–backward operators as in Algorithm 6, where ∇f_1 and ∇f_2 are the Lipschitz gradients of f_1 and f_2, respectively. Using Proposition 1, we get that I − s∇ω is a contraction mapping. By Theorem 1, setting f = I − s∇ω, we obtain that {x_k} converges strongly to x* ∈ Ω, where x* = P_Ω(I − s∇ω)(x*). Observe that x* is then a fixed point of the projected gradient mapping. It is derived from (6) that, for any x ∈ Ω, ⟨x* − (I − s∇ω)(x*), x − x*⟩ ≥ 0. Because s > 0, we conclude ⟨∇ω(x*), x − x*⟩ ≥ 0 for all x ∈ Ω; that is, x* is an optimal solution of problem (44). Hence, we obtain the desired result. □
Furthermore, our algorithm can be applied to solving the convex bilevel optimization problems (1) and (2) by using the same proximity operator in Steps 2 and 3, as seen in Algorithm 9.
Algorithm 9 iVMBi (II): Inertial Viscosity Method for Bilevel Optimization Problem (II)
1: Input. Let {α_k} be a positive sequence. Choose x_0, x_1 ∈ H.
2: Step 1. Select the inertial parameter θ_k such that, for k ≥ 1, the prescribed bound holds.
3: Step 2. Compute the next iterate via the forward–backward and viscosity steps.
The following result is obtained immediately from Theorem 2.
Theorem 3. Let {x_k} be a sequence generated by Algorithm 9 such that all conditions as in Corollary 1 hold. Then, {x_k} converges strongly to x*, which satisfies ⟨∇ω(x*), x − x*⟩ ≥ 0 for all x ∈ Ω; that is, x* ∈ Ω, where Ω is the set of all solutions of problems (1) and (2).
Next, we use Algorithm 9 as a machine learning algorithm for solving some data classification problems, applied to the UCI datasets of breast cancer and heart disease. Moreover, we compare the performance of Algorithm 9 with BiG-SAM, iBiG-SAM, aiBiG-SAM, miBiG-SAM and amiBiG-SAM.
In order to employ Algorithm 9 for solving data classification, we need to know the objective function of the inner level. To obtain this, we use a single-hidden-layer feedforward neural network (SLFN) model and the concept of the extreme learning machine (ELM) introduced by Huang et al. [46].
In supervised learning, we start with a training set of N samples {(x_i, t_i) : i = 1, …, N}, where x_i ∈ ℝⁿ is the input data and t_i ∈ ℝᵐ is the target. The mathematical model of ELM for SLFNs with M hidden nodes and activation function G is given by
  Σ_{j=1}^M β_j G(⟨w_j, x_i⟩ + b_j) = o_i, i = 1, …, N,
where β_j is the weight vector connecting the j-th hidden node and the output node, b_j is a bias and w_j is the weight vector connecting the j-th hidden node and the input node.
Let H be the N × M matrix whose (i, j) entry is G(⟨w_j, x_i⟩ + b_j). This matrix is known as the hidden-layer output matrix.
For a prediction or classification problem using the ELM model, we want the training error to be zero, that is, Σ_{i=1}^N ‖o_i − t_i‖ = 0. Hence, Σ_{j=1}^M β_j G(⟨w_j, x_i⟩ + b_j) = t_i for i = 1, …, N. We can write the above system of linear equations in M variables and N equations as a matrix equation as follows:
  Hm = O,  (49)
where m = [β_1, …, β_M]ᵀ and O = [t_1, …, t_N]ᵀ is the training data. To solve the ELM is to find a weight m satisfying (49). If the Moore–Penrose generalized inverse H† of H exists, then m = H†O. However, in the case that H† does not exist, we can find m as the minimizer of the following convex minimization problem:
  min_m ‖Hm − O‖²₂.  (50)
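A minimal sketch of the ELM pipeline just described: random hidden parameters, the hidden-layer output matrix H, and the pseudoinverse solve m = H†O. The sizes, the seed and the default number of hidden nodes are illustrative assumptions.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def elm_train(X, T, M=100, seed=0):
    # X: (N, n) inputs, T: (N, m) targets, M: number of hidden nodes (illustrative).
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], M))   # input-to-hidden weights w_j
    bias = rng.standard_normal(M)              # hidden biases b_j
    H = sigmoid(X @ W + bias)                  # hidden-layer output matrix (N, M)
    m = np.linalg.pinv(H) @ T                  # m = H^+ O (Moore-Penrose pseudoinverse)
    return W, bias, m

def elm_predict(X, W, bias, m):
    return sigmoid(X @ W + bias) @ m
```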
Using the least squares model (50) may cause an overfitting problem. In order to prevent this, regularization methods were proposed. The classical one is Tikhonov regularization [47], which was employed to solve the following minimization problem:
  min_m ‖Hm − O‖²₂ + λ‖Km‖²₂,  (51)
where λ > 0 is the regularization parameter and K is the Tikhonov matrix. In the standard form, K is set to be the identity.
Another regularization method is the least absolute shrinkage and selection operator (LASSO), which was proposed by Tibshirani [48] for solving the following convex minimization problem:
  min_m ‖Hm − O‖²₂ + λ‖m‖₁,  (52)
where λ > 0 is the regularization parameter. In this work, we set f(m) = ‖Hm − O‖²₂ and g(m) = λ‖m‖₁. Based on model (52), we can apply Algorithm 9 for solving the convex bilevel optimization problems (1) and (2), where the outer-level objective is the strongly convex function ω. We now conduct some numerical experiments for classification on the following datasets.
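To connect model (52) with the bilevel template of problems (1) and (2), the following sketch collects the ingredients one would feed to Algorithm 9 or to any of the baselines above; H and O come from the ELM training step, and the outer objective ω(m) = ½‖m‖² is an assumption chosen here for its strong convexity, not necessarily the authors' exact choice.

```python
import numpy as np

def bilevel_ingredients(H, O, lam):
    # Inner level (52): f(m) = ||Hm - O||^2 (smooth), g(m) = lam*||m||_1 (nonsmooth).
    f_grad = lambda m: 2 * H.T @ (H @ m - O)
    g_prox = lambda v, c: np.sign(v) * np.maximum(np.abs(v) - c * lam, 0.0)
    # Outer level: w(m) = 0.5*||m||^2 is 1-strongly convex with grad w(m) = m.
    w_grad = lambda m: m
    Lf = 2 * np.linalg.norm(H, 2) ** 2         # Lipschitz constant of grad f
    return f_grad, g_prox, w_grad, Lf
```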
In these experiments, we aim to classify the datasets of breast cancer and heart disease from https://archive.ics.uci.edu, accessed on 12 June 2022.
Breast cancer dataset [49]. This dataset contains 699 samples, each of which has 11 attributes. In this dataset, we classify two classes of data.
Heart disease dataset [50]. This dataset contains 303 samples, each of which has 13 attributes. In this dataset, we classify two classes of data.
Throughout these experiments, all results were obtained in MATLAB 9.6 (R2019a) running on a MacBook Air (13.3-inch, 2020) with an Apple M1 processor, an 8-core GPU and 8 GB of RAM.
In all the experiments, the sigmoid function is used as the activation function, and we set the number of hidden nodes M accordingly. The accuracy of the data classification is given by
  Accuracy = (TP + TN)/(TP + TN + FP + FN) × 100%,
where TP (true positive) is the model successfully predicting a patient as positive, TN (true negative) denotes the model successfully predicting a patient as negative, FN (false negative) represents the prediction of a diseased patient as healthy by a negative test result and FP (false positive) means the prediction of a healthy patient as diseased by a positive test result.
We also compute the success probability of making a correct positive-class classification (precision):
  Precision = TP/(TP + FP).
In addition, we measure the sensitivity of the model toward identifying the positive class (recall):
  Recall = TP/(TP + FN).
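These three measures reduce to a few lines of code; a sketch follows, where the counts are assumed to come from a confusion matrix computed elsewhere.

```python
def classification_metrics(tp, tn, fp, fn):
    # Accuracy (in percent), precision and recall as defined above.
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Example: classification_metrics(tp=80, tn=90, fp=10, fn=20) -> (85.0, 0.888..., 0.8)
```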
The Lipschitz constant L of ∇f is computed by L = 2λ_max(HᵀH), twice the largest eigenvalue of HᵀH. When the dimension of H is very large, it is hard to compute L exactly. All parameters for each algorithm in our experiments are given in Table 1.
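When H is too large for an exact eigenvalue computation, L = 2λ_max(HᵀH) can be estimated cheaply; a sketch using a few power iterations (the iteration count is an arbitrary choice):

```python
import numpy as np

def lipschitz_estimate(H, n_iter=50, seed=0):
    # Power iteration on H^T H approximates its largest eigenvalue,
    # giving L = 2 * lambda_max(H^T H) for f(m) = ||Hm - O||^2.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(H.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = H.T @ (H @ v)
        v /= np.linalg.norm(v)
    return 2 * v @ (H.T @ (H @ v))             # Rayleigh quotient of the normalized v
```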
From Table 1, we select the best choice of parameters for each algorithm in order to achieve the highest performance. It is worth noting that all parameters satisfy the assumptions of each convergence theorem; see [6,9,11] for more details. In addition, we set the regularization parameter λ of problem (52). In Algorithm 9, we choose one parameter setting for the experimentation on the breast cancer dataset, while the classification of heart disease uses another setting. We compare the performance of each method at the 100th and 500th iterations and obtain the results seen in Table 2 and Table 3, respectively.
Table 2 shows that our algorithm achieves the best accuracy at the 100th iteration. Moreover, Table 3 shows the performance of each algorithm at the 500th iteration; we found that Algorithm 9 again has better accuracy than the others.
Next, we show the prediction performance of each algorithm in terms of the number of iterations and the training time at which each algorithm achieves its highest accuracy.
From Table 4, compared with Algorithm 1 (BiG-SAM), Algorithm 2 (iBiG-SAM), Algorithm 3 (aiBiG-SAM), Algorithm 4 (miBiG-SAM) and Algorithm 5 (amiBiG-SAM), Algorithm 9 provides a higher training accuracy. In the testing case, we found that the accuracy of Algorithm 2 (iBiG-SAM) is better than that of our algorithm on the breast cancer experiment. However, our method requires the fewest iterations and the shortest training time compared with the others.
We also perform a 10-fold cross validation to appraise the performance of each algorithm, using the average accuracy as the appraisal tool. It is defined as follows:
  Average accuracy = (1/N) Σ_{i=1}^N (c_i/n_i) × 100%,
where N is the number of folds considered during the cross validation (N = 10), c_i is the number of correctly predicted data at fold i and n_i is the number of all data at fold i.
Let Err_train be the sum of errors in all 10 training sets, Err_test the sum of errors in all 10 testing sets, D_train the sum of all data in the 10 training sets and D_test the sum of all data in the 10 testing sets. Then,
  Accuracy_train = (1 − Err_train/D_train) × 100% and Accuracy_test = (1 − Err_test/D_test) × 100%.
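A sketch of the 10-fold evaluation loop implied by these formulas; `train_and_predict` is a hypothetical stand-in for any of the compared algorithms and is not part of the paper.

```python
import numpy as np

def ten_fold_average_accuracy(X, y, train_and_predict, n_folds=10, seed=0):
    # Average accuracy = (1/N) * sum over folds of (c_i / n_i) * 100.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    accuracies = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        y_hat = train_and_predict(X[train], y[train], X[test])
        accuracies.append(np.mean(y_hat == y[test]) * 100.0)
    return float(np.mean(accuracies))
```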
We split the data into training sets and testing sets using the 10-fold cross validation, as seen in Table 5. In Table 6, we show the average accuracy of each algorithm at the 500th iteration. Table 6 demonstrates that Algorithm 9 performs better than Algorithm 1 (BiG-SAM), Algorithm 2 (iBiG-SAM), Algorithm 3 (aiBiG-SAM), Algorithm 4 (miBiG-SAM) and Algorithm 5 (amiBiG-SAM) in terms of accuracy in all the experiments conducted.