Article

A Novel Inertial Viscosity Algorithm for Bilevel Optimization Problems Applied to Classification Problems

by Kobkoon Janngam 1, Suthep Suantai 2, Yeol Je Cho 3, Attapol Kaewkhao 2 and Rattanakorn Wattanataweekul 4,*

1 Graduate Ph.D. Degree Program in Mathematics, Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
2 Research Center in Optimization and Computational Intelligence for Big Data Prediction, Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
3 Department of Mathematics Education, Gyeongsang National University, Jinju 52828, Republic of Korea
4 Department of Mathematics, Statistics and Computer, Faculty of Science, Ubon Ratchathani University, Ubon Ratchathani 34190, Thailand
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(14), 3241; https://doi.org/10.3390/math11143241
Submission received: 6 June 2023 / Revised: 16 July 2023 / Accepted: 19 July 2023 / Published: 24 July 2023

Abstract

Fixed-point theory plays an important role in many real-world problems, such as image processing and classification. This paper introduces and analyzes a new accelerated common-fixed-point algorithm using the viscosity approximation method and then employs it to solve convex bilevel optimization problems. The proposed method was applied to data classification with the Diabetes, Heart Disease UCI and Iris datasets. According to the experimental results on data classification, the proposed algorithm outperformed the others in the literature.

1. Introduction

Many real-world hierarchical problems can be modeled mathematically as bilevel problems, which appear in many practical applications. They are often encountered in the fields of production and capacity planning [1,2], traffic and transportation [3,4], chemistry [5,6] and management science [7,8], as well as energy networks and markets [9,10]. In addition, Nimana et al. [11] proposed an algorithm combining the incremental proximal gradient method with a smooth penalization technique to solve convex bilevel problems and applied it to image inpainting and binary classification problems.
Nowadays, we live in a world of various types of big data. To obtain the benefits of such data, we need to integrate advanced knowledge concerning both theory and methods from many areas, such as mathematics, computer science, statistics and medicine. In mathematics, optimization plays a very important role in classifying and predicting large amounts of data because it underlies machine learning algorithms that achieve high accuracy. Among optimization models for machine learning, the bilevel optimization approach is an efficient one that makes it possible to create intelligent machine learning algorithms for data prediction and classification.
In this work, we study a bilevel problem that is an optimization problem where the constraint is another optimization problem. This problem is formulated as follows:
$$\min_{x \in S^*} \omega(x), \qquad (1)$$
where $\omega : \mathbb{R}^n \to \mathbb{R}$ is assumed to be strongly convex and differentiable, while $S^*$ is the nonempty set of minimizers of the inner-level problem
$$\min_{x \in \mathbb{R}^n} \{ f(x) + g(x) \}, \qquad (2)$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a differentiable and convex function such that $\nabla f$ is $L_f$-Lipschitz-continuous and $g : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a lower-semicontinuous, proper, convex function.
It can be observed that the above bilevel optimization model contains both inner- and outer-level minimization problems (Equations (1) and (2)). Normally, the minimization problem in Equation (2) can be applied to data prediction and classification; see [12,13]. However, among the solutions to the inner-level problem (Equation (2)), we use the outer objective function $\omega$ to select those that are minimizers of $\omega$. This selection can provide higher accuracy for data prediction and classification than solving Equation (2) alone.
The inner-level optimization problem is a constraint on the outer-level optimization problem. There are several algorithms for solving the problem in Equation (2); see [12,14,15].
The proximal gradient (PG) method, also known as the proximal forward–backward technique, is the basic algorithm used to solve the problem in Equation (2) (see [16,17]). It is defined by
$$x_{n+1} = \mathrm{prox}_{\alpha_n g}(I - \alpha_n\nabla f)(x_n), \qquad (3)$$
where $\alpha_n > 0$ is the step size, $\mathrm{prox}_g$ is the proximity operator of $g$ and $\nabla f$ is the gradient of $f$. The algorithm in Equation (3), which is also known as the forward–backward splitting algorithm (FBSA) [14], is suitable for solving Equation (2) if $\nabla f$ is $L$-Lipschitz-continuous. The FBSA is also called an iterative denoising method [18] or a fixed-point continuation algorithm [19].
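To make the forward–backward step in Equation (3) concrete, the following minimal sketch implements it for the common special case $f(x) = \frac{1}{2}\|Ax - b\|^2$ and $g(x) = \lambda\|x\|_1$, for which $\mathrm{prox}_{\alpha g}$ is componentwise soft-thresholding; the random problem data and the particular step size below are illustrative assumptions, not part of the original text.
```python
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, b, lam, alpha, x0, n_iter=200):
    """FBSA/PG iteration x_{n+1} = prox_{alpha*g}((I - alpha*grad f)(x_n))."""
    x = x0.copy()
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                           # forward (gradient) step on f
        x = soft_threshold(x - alpha * grad, alpha * lam)  # backward (proximal) step on g
    return x

# Illustrative usage on random data; alpha is kept below 2/L with L = ||A||_2^2.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
alpha = 1.0 / np.linalg.norm(A, 2) ** 2
x_hat = proximal_gradient(A, b, lam=0.1, alpha=alpha, x0=np.zeros(10))
```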
One of the most well-known first-order optimization schemes is the fast iterative shrinkage-thresholding algorithm (FISTA). Beck and Teboulle [15] proposed the FISTA to solve the problem in Equation (2) by using an inertial technique as follows:
$$w_n = \mathrm{prox}_{\frac{1}{L}g}\Big(I - \frac{1}{L}\nabla f\Big)(x_n), \qquad p_{n+1} = \frac{1 + \sqrt{1 + 4p_n^2}}{2}, \qquad x_{n+1} = w_n + \frac{p_n - 1}{p_{n+1}}(w_n - w_{n-1}), \quad n \ge 1, \qquad (4)$$
where $x_1 = w_0 \in \mathbb{R}^N$ and $p_1 = 1$. They applied the FISTA to image restoration problems and showed that its rate of convergence was better than that of other existing algorithms. The weak convergence of the generated sequence was then proved by Liang and Schonlieb [20], who modified the FISTA by setting $p_{n+1} = \big(u + \sqrt{v + s\,p_n^2}\big)/2$, where $u, v > 0$ and $0 < s \le 4$.
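Under the same illustrative lasso assumptions ($f(x) = \frac{1}{2}\|Ax - b\|^2$, $g(x) = \lambda\|x\|_1$, $L = \|A\|_2^2$), the FISTA update in Equation (4) can be sketched as follows.
```python
import numpy as np

def fista(A, b, lam, n_iter=200):
    """Sketch of Equation (4): prox-gradient step followed by inertial extrapolation."""
    L = np.linalg.norm(A, 2) ** 2                          # Lipschitz constant of grad f
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    x = w_prev = np.zeros(A.shape[1])                      # x_1 = w_0
    p = 1.0                                                # p_1 = 1
    for _ in range(n_iter):
        w = soft(x - A.T @ (A @ x - b) / L, lam / L)       # forward-backward step
        p_next = (1.0 + np.sqrt(1.0 + 4.0 * p ** 2)) / 2.0
        x = w + (p - 1.0) / p_next * (w - w_prev)          # inertial extrapolation
        w_prev, p = w, p_next
    return w
```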
It may be noticed that the convex minimization problem and the fixed-point problem are related. If $0 < \alpha < 2/L$, then we know that the forward–backward operator $T := \mathrm{prox}_{\alpha g}(I - \alpha\nabla f)$ is nonexpansive. It is known that $\mathrm{Fix}(T) = \arg\min\{f(x) + g(x)\}$. Fixed-point problems with nonexpansive mappings have been investigated by many authors using the method of viscosity approximation [21,22,23,24]. This method provides a strong convergence result and is defined by the following:
$$x_{n+1} = \beta_n S(x_n) + (1 - \beta_n)T x_n, \quad n \ge 1, \qquad (5)$$
where $x_1 \in H$, $S : H \to H$ is a contraction on a Hilbert space $H$ and $\{\beta_n\} \subset (0,1)$. We can also call Equation (5) the viscosity forward–backward algorithm if $T := \mathrm{prox}_{\alpha g}(I - \alpha\nabla f)$.
In 2014, Beck et al. [25] introduced a new, direct first-order method to solve the problem in Equation (1) and established its convergence results under some suitable conditions, as well as the rate of convergence of the sequence of function values. After that, Sabach and Shtern [26] proposed the following algorithm, called the bilevel gradient sequential averaging method (BiG-SAM), to solve the problems in Equations (1) and (2). The iterative process can be defined as follows
$$u_n = \mathrm{prox}_{c g}\big(x_{n-1} - c\nabla f(x_{n-1})\big), \qquad v_n = x_{n-1} - \lambda\nabla\omega(x_{n-1}), \qquad x_{n+1} = \gamma_n v_n + (1 - \gamma_n)u_n, \quad n \ge 1, \qquad (6)$$
where $x_0 \in \mathbb{R}^n$, $c \in (0, 1/L_f]$ and $\lambda \in \big(0, 2/(L_\omega + \sigma)\big]$, in which $L_f$ and $L_\omega$ are the Lipschitz constants of $\nabla f$ and $\nabla\omega$, and $\{\gamma_n\}$ satisfies certain conditions from [22]. In terms of the values of the inner objective function, the authors of [26] studied and analyzed the convergence behavior of the BiG-SAM with a nonasymptotic $O(1/n)$ global rate of convergence.
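The following sketch spells out one way to run the BiG-SAM iteration of Equation (6). The inner problem is assumed to be the lasso $f(x) = \frac{1}{2}\|Ax - b\|^2$, $g(x) = \lambda\|x\|_1$, and the outer objective is assumed to be $\omega(x) = \frac{1}{2}\|x\|^2$ (so $\nabla\omega(x) = x$); these choices and the sample sequence $\{\gamma_n\}$ are assumptions made only for illustration.
```python
import numpy as np

def big_sam(A, b, lam, gammas, lam_out=0.01):
    """Sketch of Equation (6); gammas is the sequence {gamma_n}."""
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    L_f = np.linalg.norm(A, 2) ** 2
    c = 1.0 / L_f
    x = np.zeros(A.shape[1])                              # x_0
    for gamma in gammas:
        u = soft(x - c * (A.T @ (A @ x - b)), c * lam)    # prox-gradient step on the inner problem
        v = x - lam_out * x                               # gradient step on omega (grad omega = identity)
        x = gamma * v + (1.0 - gamma) * u                 # averaging step
    return x

# e.g. big_sam(A, b, 0.1, gammas=[1.0 / (n + 1) for n in range(200)])
```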
In 2019, Shehu et al. [27] introduced an inertial extrapolation step into BiG-SAM (Equation (6)), calling the result the inertial bilevel gradient sequential averaging method (iBiG-SAM), to solve the problems in Equations (1) and (2). This iterative scheme is defined by
$$s_n = x_n + \theta_n(x_n - x_{n-1}), \qquad u_n = \mathrm{prox}_{c g}(I - c\nabla f)(s_n), \qquad v_n = s_n - \lambda\nabla\omega(s_n), \qquad x_{n+1} = \gamma_n v_n + (1-\gamma_n)u_n, \quad n \ge 1. \qquad (7)$$
In their study, they presented a strong convergence analysis of an inertial algorithm that can be used to approximate fixed points of nonexpansive mappings in infinite-dimensional real Hilbert spaces. Furthermore, they converted the bilevel optimization problems into a fixed-point problem of nonexpansive mappings and showed its convergence under certain conditions.
In 2022, Duan and Zhang [28] introduced an alternated inertial step into BiG-SAM to create an alternated inertial bilevel gradient sequential averaging method (aiBiG-SAM) for solving convex bilevel optimization problems. It is defined as
$$s_n = \begin{cases} x_n & \text{if } n \text{ is even}, \\ x_n + \theta_n(x_n - x_{n-1}) & \text{if } n \text{ is odd}, \end{cases} \qquad (8)$$
and
$$u_n = \mathrm{prox}_{c g}(I - c\nabla f)(s_n), \qquad v_n = s_n - \lambda\nabla\omega(s_n), \qquad x_{n+1} = \gamma_n v_n + (1-\gamma_n)u_n, \quad n \ge 1. \qquad (9)$$
They proved that the aiBiG-SAM converges strongly to a solution for the problem and extended the method into a more general alternating inertial acceleration method.
Recently, in [29,30], the authors proposed new bilevel optimization methods within the framework of Hilbert spaces and proved the strong convergence of their algorithms using the viscosity approximation technique.
In this paper, motivated by these results, we present a novel accelerated algorithm using the viscosity approximation method and the inertial parameter of the FISTA to solve the convex bilevel optimization problem. We then demonstrate the efficacy of this algorithm in solving data classification problems.
The paper is organized as follows. In Section 2, we present the preliminaries in terms of definitions, notations and lemmas for proving the main results. The new accelerated viscosity-type algorithm is introduced and studied, and then we apply it to solving the convex bilevel optimization problems described in Section 3. Then, in Section 4, we present mathematical models for the classification of datasets and the application of the results obtained in the previous section, and we provide numerical experimental results in Section 5. Finally, we present the conclusions and future work in Section 6.

2. Preliminaries

In this section, we present fundamental ideas and principles that will be utilized in the rest of this work.
Throughout the present paper, $H$ denotes a real Hilbert space with norm $\|\cdot\|$ and inner product $\langle\cdot,\cdot\rangle$, $\mathbb{R}$ is the set of real numbers and $\mathbb{N}$ is the set of positive integers. $I$ denotes the identity operator on $H$. Let $C$ be a nonempty subset of $H$ and let $T$ be a mapping of $C$ into itself. The strong convergence of a sequence $\{x_n\}$ in $H$ to $x \in H$ is denoted by $x_n \to x$, weak convergence by $x_n \rightharpoonup x$, and $\mathrm{Fix}(T)$ denotes the set of all fixed points of $T$.
For this work, nonlinear mappings from the following classes were essentially needed.
Definition 1.
The mapping $T : C \to C$ is said to be $L$-Lipschitz with $L \ge 0$ if
$$\|Tu - Tv\| \le L\|u - v\|$$
for all $u, v \in C$.
An $L$-Lipschitz mapping $T$ is said to be a contraction mapping if $L \in [0, 1)$, and it is nonexpansive if $L = 1$.
Definition 2
([31]). Let $T$ be a nonexpansive mapping of $C$ into itself and let $T_n : C \to C$ be a family of nonexpansive mappings such that $\mathrm{Fix}(T) \supset \Gamma := \bigcap_{n=1}^{\infty}\mathrm{Fix}(T_n) \ne \emptyset$, where $\mathrm{Fix}(T_n)$ is the set of all fixed points of $T_n$ for each $n \ge 1$. Then, $\{T_n\}$ is said to satisfy the NST-condition (I) with $T$ if, for any bounded sequence $\{x_n\} \subset C$,
$$\lim_{n\to\infty}\|x_n - T_n x_n\| = 0 \quad \text{implies} \quad \lim_{n\to\infty}\|x_n - T x_n\| = 0.$$
Definition 3
([32,33]). A family $\{T_n\}$ of nonexpansive mappings $T_n : C \to C$ with $\bigcap_{n=1}^{\infty}\mathrm{Fix}(T_n) \ne \emptyset$ is said to satisfy the condition (Z) if, whenever $\{u_n\}$ is a bounded sequence in $H$ such that
$$\lim_{n\to\infty}\|u_n - T_n u_n\| = 0,$$
every weak cluster point of $\{u_n\}$ belongs to $\bigcap_{n=1}^{\infty}\mathrm{Fix}(T_n)$.
Using the demiclosedness of $I - T$, where $T : C \to C$ is a nonexpansive mapping, we obtain the following remark.
Remark 1.
Let T be a nonexpansive mapping. Then, { T n } satisfies the condition (Z) if { T n } is a family of nonexpansive mappings that satisfies NST-condition (I) with T.
The metric projection P C from H onto C is defined by
$$P_C x = \arg\min\{\|x - y\| : y \in C\}$$
for all $x \in H$, where $C$ is a nonempty closed convex subset of $H$. It is known that $v = P_C x$ if and only if $\langle x - v,\ y - v\rangle \le 0$ for all $y \in C$.
Let us recall the definition of the proximity operator and its properties.
Definition 4
([34,35]). Let $f : H \to \mathbb{R} \cup \{+\infty\}$ be a convex, proper and lower-semicontinuous function. The proximity operator of $f$, denoted by $\mathrm{prox}_f$, is defined as follows:
$$\mathrm{prox}_f(x) = \operatorname*{argmin}_{y \in H}\Big\{ f(y) + \frac{1}{2}\|x - y\|^2 \Big\},$$
and it can be formulated in the equivalent form
$$\mathrm{prox}_f = (I + \partial f)^{-1},$$
where $\partial f$ is the subdifferential of $f$ defined by
$$\partial f(x) := \{v \in H : f(x) + \langle v,\ u - x\rangle \le f(u) \text{ for all } u \in H\}$$
for all $x \in H$. For $\rho > 0$, we also know that $\mathrm{prox}_{\rho f}$ is firmly nonexpansive and
$$\mathrm{Fix}(\mathrm{prox}_{\rho f}) = \operatorname{Argmin} f := \{v \in H : f(v) \le f(u) \text{ for all } u \in H\}.$$
Let $C \subset H$ be closed and convex. In particular, if $f := i_C$, the indicator function of $C$ defined by
$$i_C(x) = \begin{cases} 0 & \text{if } x \in C, \\ +\infty & \text{otherwise}, \end{cases}$$
then $\mathrm{prox}_{\rho f} = P_C$.
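As a small illustration of Definition 4, the two proximity operators used later in this paper can be written out explicitly: the prox of $\rho\|\cdot\|_1$ is componentwise soft-thresholding, and the prox of the indicator function $i_C$ is the metric projection $P_C$ (here $C$ is taken, purely as an example, to be a closed Euclidean ball of radius $r$).
```python
import numpy as np

def prox_l1(x, rho):
    """prox of rho*||.||_1: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - rho, 0.0)

def prox_indicator_ball(x, r):
    """prox of the indicator of {y : ||y|| <= r}, i.e. the metric projection P_C."""
    nrm = np.linalg.norm(x)
    return x if nrm <= r else (r / nrm) * x

x = np.array([1.5, -0.2, 0.8])
print(prox_l1(x, 0.5))              # approximately [1.0, 0.0, 0.3]
print(prox_indicator_ball(x, 1.0))  # x rescaled onto the unit ball
```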
The following lemmas are well known; see [13,36,37].
Lemma 1
([13]). Let $g : H \to \mathbb{R} \cup \{+\infty\}$ be a lower-semicontinuous, proper and convex function and let $f : H \to \mathbb{R}$ be differentiable and convex such that $\nabla f$ is $L$-Lipschitz-continuous. Let
$$T_n := \mathrm{prox}_{\rho_n g}(I - \rho_n\nabla f) \quad \text{and} \quad T := \mathrm{prox}_{\rho g}(I - \rho\nabla f),$$
where $\rho_n, \rho \in (0, 2/L)$ with $\rho_n \to \rho$ as $n \to \infty$. Then, $\{T_n\}$ satisfies the NST-condition (I) with $T$.
Lemma 2
([36]). Let $\eta, \mu \in H$ and $\zeta \in [0, 1]$. Then, the following properties hold in $H$:
(1) $\|\eta + \mu\|^2 \le \|\eta\|^2 + 2\langle\mu,\ \eta + \mu\rangle$;
(2) $\|\eta \pm \mu\|^2 = \|\eta\|^2 \pm 2\langle\eta,\ \mu\rangle + \|\mu\|^2$;
(3) $\|\zeta\eta + (1-\zeta)\mu\|^2 = \zeta\|\eta\|^2 + (1-\zeta)\|\mu\|^2 - \zeta(1-\zeta)\|\eta - \mu\|^2$.
Lemma 3
([37]). Let $\{c_n\} \subset \mathbb{R}_+$, $\{b_n\} \subset \mathbb{R}$ and $\{t_n\} \subset (0, 1)$ be such that $\sum_{n=1}^{\infty} t_n = \infty$. Suppose that
$$c_{n+1} \le (1 - t_n)c_n + t_n b_n$$
for all $n \in \mathbb{N}$. If $\limsup_{i\to\infty} b_{n_i} \le 0$ for every subsequence $\{c_{n_i}\}$ of $\{c_n\}$ satisfying
$$\liminf_{i\to\infty}(c_{n_i+1} - c_{n_i}) \ge 0,$$
then $\lim_{n\to\infty} c_n = 0$.
In the next section, we introduce an inertial viscosity modified SP algorithm and its application to the convex bilevel optimization problem.

3. Proposed Method

We first present a new inertial viscosity algorithm and prove a strong convergence theorem under mild conditions as follows.
Let $C \subset H$ be closed and convex and let the mapping $S : C \to C$ be a $k$-contraction, where $0 < k < 1$. Let $\{T_n\}$ be a family of nonexpansive mappings of $C$ into itself satisfying the condition (Z) such that $\Gamma := \bigcap_{n=1}^{\infty}\mathrm{Fix}(T_n) \ne \emptyset$.
Many mathematicians often use inertial-type extrapolation [38,39] in optimization problems to speed up the convergence of iterative methods by adding the term $\theta_n(x_n - x_{n-1})$. The momentum term $x_n - x_{n-1}$ is controlled by the parameter $\theta_n$, known as the inertial parameter.
In 2011, Phuengrattana and Suantai [40] introduced the SP algorithm and showed that its convergence behavior is better than that of the Mann and Ishikawa iterations [41,42]. Using the idea of the SP algorithm, in this paper, we introduce an inertial viscosity modified SP algorithm (IVMSPA) for obtaining a common fixed point of $\{T_n\}$ as follows.
The following theorem establishes strong convergence for the proposed algorithm.
Theorem 1.
A sequence $\{x_n\}$ generated by Algorithm 1 converges strongly to an element $\breve{a} \in \Gamma$, where $\breve{a} = P_\Gamma S(\breve{a})$, provided that the sequences $\{\alpha_n\}$, $\{\beta_n\}$, $\{\gamma_n\}$ and $\{\tau_n\}$ satisfy the following conditions:
(C1) $0 < a_1 \le \beta_n \le a_2 < 1$;
(C2) $0 < \alpha_n, \gamma_n < 1$, $\lim_{n\to\infty}\alpha_n = 0$ and $\sum_{n=1}^{\infty}\alpha_n = \infty$;
(C3) $\lim_{n\to\infty}\tau_n = 0$.
Algorithm 1 An Inertial Viscosity Modified SP Algorithm (IVMSPA)
Initialization: Let $\{\alpha_n\}$, $\{\beta_n\}$, $\{\gamma_n\}$ and $\{\tau_n\}$ be sequences of positive real numbers. Take $x_0, x_1 \in H$ arbitrarily.
Iterative steps: For $n \ge 1$, calculate $x_{n+1}$ as follows:
Step 1. Compute the inertial parameter
$$\theta_n = \begin{cases} \min\Big\{\dfrac{p_n - 1}{p_{n+1}},\ \dfrac{\alpha_n \tau_n}{\|x_n - x_{n-1}\|}\Big\} & \text{if } x_n \ne x_{n-1}, \\[2mm] \dfrac{p_n - 1}{p_{n+1}} & \text{otherwise}, \end{cases}$$
where $p_1 = 1$ and $p_{n+1} = \dfrac{1 + \sqrt{1 + 4p_n^2}}{2}$.
Step 2. Compute
$$y_n = x_n + \theta_n(x_n - x_{n-1}), \quad z_n = (1-\alpha_n)y_n + \alpha_n S(y_n), \quad w_n = (1-\beta_n)z_n + \beta_n T_n z_n, \quad x_{n+1} = (1-\gamma_n)w_n + \gamma_n T_n w_n.$$
Proof. 
Let $\breve{a} = P_\Gamma S(\breve{a})$. Then, $\breve{a} \in \bigcap_{n=1}^{\infty}\mathrm{Fix}(T_n)$. First of all, we show that $\{x_n\}$ is bounded. From Algorithm 1, we have
$$\|w_n - \breve{a}\| \le (1-\beta_n)\|z_n - \breve{a}\| + \beta_n\|T_n z_n - \breve{a}\| \le (1-\beta_n)\|z_n - \breve{a}\| + \beta_n\|z_n - \breve{a}\| = \|z_n - \breve{a}\|$$
and
$$\|x_{n+1} - \breve{a}\| \le \gamma_n\|w_n - \breve{a}\| + (1-\gamma_n)\|T_n w_n - \breve{a}\| \le \gamma_n\|w_n - \breve{a}\| + (1-\gamma_n)\|w_n - \breve{a}\| = \|w_n - \breve{a}\| \le \|z_n - \breve{a}\|. \qquad (10)$$
From the definitions of $y_n$ and $z_n$, we obtain
$$\begin{aligned}
\|z_n - \breve{a}\| &\le \alpha_n\|S(y_n) - \breve{a}\| + (1-\alpha_n)\|y_n - \breve{a}\| \\
&\le \alpha_n\|S(y_n) - S(\breve{a})\| + \alpha_n\|S(\breve{a}) - \breve{a}\| + (1-\alpha_n)\|y_n - \breve{a}\| \\
&\le \alpha_n k\|y_n - \breve{a}\| + \alpha_n\|S(\breve{a}) - \breve{a}\| + (1-\alpha_n)\|y_n - \breve{a}\| \\
&= \big(1 - \alpha_n(1-k)\big)\|y_n - \breve{a}\| + \alpha_n\|S(\breve{a}) - \breve{a}\| \\
&\le \big(1 - \alpha_n(1-k)\big)\big(\|x_n - \breve{a}\| + \theta_n\|x_{n-1} - x_n\|\big) + \alpha_n\|S(\breve{a}) - \breve{a}\| \\
&\le \big(1 - \alpha_n(1-k)\big)\|x_n - \breve{a}\| + \alpha_n\Big(\frac{\theta_n}{\alpha_n}\|x_{n-1} - x_n\| + \|S(\breve{a}) - \breve{a}\|\Big).
\end{aligned}$$
From (C3), we have
$$\frac{\theta_n}{\alpha_n}\|x_{n-1} - x_n\| \to 0 \quad \text{as } n \to \infty. \qquad (11)$$
From Equation (11), we know that there exists $M > 0$ such that $\frac{\theta_n}{\alpha_n}\|x_{n-1} - x_n\| \le M$ for all $n \ge 1$. Thus,
$$\|z_n - \breve{a}\| \le \big(1 - (1-k)\alpha_n\big)\|x_n - \breve{a}\| + (1-k)\alpha_n\,\frac{\|S(\breve{a}) - \breve{a}\| + M}{1-k} \le \max\Big\{\|x_n - \breve{a}\|,\ \frac{\|S(\breve{a}) - \breve{a}\| + M}{1-k}\Big\}.$$
From Equation (10) and the above inequality, we obtain
$$\|x_{n+1} - \breve{a}\| \le \max\Big\{\|x_n - \breve{a}\|,\ \frac{\|S(\breve{a}) - \breve{a}\| + M}{1-k}\Big\}.$$
Using mathematical induction, we have
$$\|x_n - \breve{a}\| \le \max\Big\{\|x_1 - \breve{a}\|,\ \frac{\|S(\breve{a}) - \breve{a}\| + M}{1-k}\Big\}$$
for all $n \ge 1$. It follows that $\{x_n\}$ is bounded and, hence, $\{z_n\}$ is bounded. According to part (3) of Lemma 2, we obtain
$$\|x_{n+1} - \breve{a}\|^2 = \gamma_n\|T_n w_n - \breve{a}\|^2 + (1-\gamma_n)\|w_n - \breve{a}\|^2 - (1-\gamma_n)\gamma_n\|w_n - T_n w_n\|^2 \le (1-\gamma_n)\|w_n - \breve{a}\|^2 + \gamma_n\|w_n - \breve{a}\|^2 = \|w_n - \breve{a}\|^2 \qquad (12)$$
and
$$\|w_n - \breve{a}\|^2 = \beta_n\|T_n z_n - \breve{a}\|^2 + (1-\beta_n)\|z_n - \breve{a}\|^2 - (1-\beta_n)\beta_n\|z_n - T_n z_n\|^2 \le \|z_n - \breve{a}\|^2 - (1-\beta_n)\beta_n\|z_n - T_n z_n\|^2. \qquad (13)$$
Using Lemma 2, we obtain
$$\begin{aligned}
\|z_n - \breve{a}\|^2 &\le \|(1-\alpha_n)(y_n - \breve{a}) + \alpha_n(S(y_n) - S(\breve{a}))\|^2 + 2\alpha_n\langle S(\breve{a}) - \breve{a},\ z_n - \breve{a}\rangle \\
&\le (1-\alpha_n)\|y_n - \breve{a}\|^2 + \alpha_n\|S(y_n) - S(\breve{a})\|^2 + 2\alpha_n\langle S(\breve{a}) - \breve{a},\ z_n - \breve{a}\rangle \\
&\le (1-\alpha_n)\|y_n - \breve{a}\|^2 + \alpha_n k\|y_n - \breve{a}\|^2 + 2\alpha_n\langle S(\breve{a}) - \breve{a},\ z_n - \breve{a}\rangle \\
&= \big(1 - \alpha_n(1-k)\big)\|y_n - \breve{a}\|^2 + 2\alpha_n\langle S(\breve{a}) - \breve{a},\ z_n - \breve{a}\rangle \qquad (14)
\end{aligned}$$
and
$$\|y_n - \breve{a}\|^2 \le \|x_n - \breve{a}\|^2 + 2\theta_n\|x_n - \breve{a}\|\,\|x_{n-1} - x_n\| + \theta_n^2\|x_{n-1} - x_n\|^2. \qquad (15)$$
From Equations (12)–(15), we obtain
$$\begin{aligned}
\|x_{n+1} - \breve{a}\|^2 &\le \|z_n - \breve{a}\|^2 - (1-\beta_n)\beta_n\|z_n - T_n z_n\|^2 \\
&\le \big(1 - (1-k)\alpha_n\big)\|y_n - \breve{a}\|^2 + 2\alpha_n\langle S(\breve{a}) - \breve{a},\ z_n - \breve{a}\rangle - (1-\beta_n)\beta_n\|z_n - T_n z_n\|^2 \\
&\le \big(1 - (1-k)\alpha_n\big)\Big(\|x_n - \breve{a}\|^2 + 2\theta_n\|x_n - \breve{a}\|\,\|x_{n-1} - x_n\| + \theta_n^2\|x_{n-1} - x_n\|^2\Big) \\
&\quad + 2\alpha_n\langle S(\breve{a}) - \breve{a},\ z_n - \breve{a}\rangle - (1-\beta_n)\beta_n\|z_n - T_n z_n\|^2 \\
&\le \big(1 - (1-k)\alpha_n\big)\|x_n - \breve{a}\|^2 - (1-\beta_n)\beta_n\|z_n - T_n z_n\|^2 + (1-k)\alpha_n b_n, \qquad (16)
\end{aligned}$$
where
$$b_n = \frac{1}{1-k}\Big( 2\langle S(\breve{a}) - \breve{a},\ z_n - \breve{a}\rangle + \theta_n\|x_{n-1} - x_n\|\,\frac{\theta_n}{\alpha_n}\|x_{n-1} - x_n\| + 2\|x_n - \breve{a}\|\,\frac{\theta_n}{\alpha_n}\|x_{n-1} - x_n\| \Big).$$
It follows that
$$(1-\beta_n)\beta_n\|z_n - T_n z_n\|^2 \le \|x_n - \breve{a}\|^2 - \|x_{n+1} - \breve{a}\|^2 + (1-k)\alpha_n B, \qquad (17)$$
where $B = \sup\{b_n : n \in \mathbb{N}\}$.
We next show that $\{x_n\}$ converges strongly to $\breve{a}$. To apply Lemma 3, we set $a_n := \|x_n - \breve{a}\|^2$ and $t_n := \alpha_n(1-k)$. From Equation (16), we obtain
$$a_{n+1} \le (1 - t_n)a_n + t_n b_n.$$
Suppose that $\{a_{n_i}\}$ is a subsequence of $\{a_n\}$ such that $\liminf_{i\to\infty}(a_{n_i+1} - a_{n_i}) \ge 0$. Using Equation (17) and (C2), we obtain
$$\limsup_{i\to\infty} \beta_{n_i}(1-\beta_{n_i})\|z_{n_i} - T_{n_i} z_{n_i}\|^2 \le \limsup_{i\to\infty}\big(a_{n_i} - a_{n_i+1} + \alpha_{n_i}(1-k)B\big) = -\liminf_{i\to\infty}(a_{n_i+1} - a_{n_i}) \le 0. \qquad (18)$$
From (C1) and Equation (18), we obtain
$$\lim_{i\to\infty}\|z_{n_i} - T_{n_i} z_{n_i}\| = 0. \qquad (19)$$
Next, we show that $\limsup_{i\to\infty} b_{n_i} \le 0$. Obviously, it suffices to show that
$$\limsup_{i\to\infty}\langle S(\breve{a}) - \breve{a},\ z_{n_i} - \breve{a}\rangle \le 0.$$
Since $\{z_{n_i}\}$ is bounded, there exist a subsequence $\{z_{n_{i_j}}\}$ of $\{z_{n_i}\}$ and $y \in H$ such that $z_{n_{i_j}} \rightharpoonup y$ as $j \to \infty$ and
$$\limsup_{i\to\infty}\langle S(\breve{a}) - \breve{a},\ z_{n_i} - \breve{a}\rangle = \lim_{j\to\infty}\langle S(\breve{a}) - \breve{a},\ z_{n_{i_j}} - \breve{a}\rangle.$$
Since $\{T_n\}$ satisfies the condition (Z) and Equation (19) holds, we obtain $y \in \Gamma$. Using $\breve{a} = P_\Gamma S(\breve{a})$ and the characterization of the metric projection, we get
$$\lim_{j\to\infty}\langle S(\breve{a}) - \breve{a},\ z_{n_{i_j}} - \breve{a}\rangle = \langle S(\breve{a}) - \breve{a},\ y - \breve{a}\rangle \le 0.$$
So, we have
$$\limsup_{i\to\infty}\langle S(\breve{a}) - \breve{a},\ z_{n_i} - \breve{a}\rangle \le 0,$$
and hence $\limsup_{i\to\infty} b_{n_i} \le 0$. Thus, in view of Lemma 3, $\{x_n\}$ converges strongly to $\breve{a}$, as required.    □
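For readers who prefer to see the iteration of Algorithm 1 written out, the following schematic sketch runs the IVMSPA for user-supplied mappings: T(n, x) plays the role of $T_n$ and S the role of the contraction. The concrete choices in the usage lines (projection onto the unit ball for $T_n$, a simple $0.5$-contraction for $S$ and the displayed parameter sequences) are assumptions for illustration, not the application studied in this paper.
```python
import numpy as np

def ivmspa(T, S, x0, x1, alpha, beta, gamma, tau, n_iter):
    """Schematic run of Algorithm 1 (IVMSPA); alpha, beta, gamma, tau map n -> parameter value."""
    x_prev, x, p = np.asarray(x0, float), np.asarray(x1, float), 1.0
    for n in range(1, n_iter + 1):
        p_next = (1.0 + np.sqrt(1.0 + 4.0 * p ** 2)) / 2.0
        theta = (p - 1.0) / p_next                                 # Step 1: inertial parameter
        diff = np.linalg.norm(x - x_prev)
        if diff > 0:
            theta = min(theta, alpha(n) * tau(n) / diff)
        y = x + theta * (x - x_prev)                               # inertial step
        z = (1.0 - alpha(n)) * y + alpha(n) * S(y)                 # viscosity step
        w = (1.0 - beta(n)) * z + beta(n) * T(n, z)                # first SP step
        x_prev, x = x, (1.0 - gamma(n)) * w + gamma(n) * T(n, w)   # second SP step
        p = p_next
    return x

# Illustrative usage: T_n = projection onto the unit ball, S = a 0.5-contraction.
proj = lambda n, v: v if np.linalg.norm(v) <= 1 else v / np.linalg.norm(v)
x_star = ivmspa(proj, lambda v: 0.5 * v, np.ones(3), np.zeros(3),
                alpha=lambda n: 1 / (50 * n), beta=lambda n: 0.5,
                gamma=lambda n: 0.5, tau=lambda n: 1 / n, n_iter=100)
```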
We next establish our inertial bilevel gradient modified SP algorithm (IBiG-MSPA) to solve the convex bilevel optimization problems in Equations (1) and (2) by applying Algorithm 1 and present its strong convergence. We use the following assumptions in order to solve this problem:
• $f : H \to \mathbb{R}$ is a convex and differentiable function such that $\nabla f$ is Lipschitz-continuous with constant $L_f > 0$, and $g : H \to (-\infty, \infty]$ is a proper, lower-semicontinuous and convex function;
• $\omega : \mathbb{R}^n \to \mathbb{R}$ is strongly convex with parameter $\sigma$ such that $\nabla\omega$ is $L_\omega$-Lipschitz-continuous and $s \in \big(0, \frac{2}{L_\omega + \sigma}\big)$.
Our IBiG-MSPA algorithm is defined as follows.
The following useful result was proved by Sabach and Shtern [26].
Proposition 1.
Suppose that $\omega : \mathbb{R}^n \to \mathbb{R}$ is strongly convex with parameter $\sigma > 0$ and $\nabla\omega$ is Lipschitz-continuous with constant $L_\omega$. Then, the mapping defined by $S_s = I - s\nabla\omega$ is a contraction for all $s \in \big(0, \frac{2}{\sigma + L_\omega}\big)$; that is,
$$\|u - s\nabla\omega(u) - (v - s\nabla\omega(v))\| \le \sqrt{1 - \frac{2s\sigma L_\omega}{\sigma + L_\omega}}\,\|u - v\|$$
for all $u, v \in \mathbb{R}^n$.
Combining Theorem 1 and Proposition 1, we obtain the following result.
Theorem 2.
Let $\Lambda$ be the set of all solutions to Equation (1), let $\breve{a} = P_{S^*}(I - s\nabla\omega)(\breve{a})$ and let (C1)–(C3) in Theorem 1 hold. Then, $\{x_n\}$ generated by Algorithm 2 converges strongly to $\breve{a} \in \Lambda$.
Algorithm 2 An Inertial Bilevel Gradient Modified SP Algorithm (IBiG-MSPA)
Initialization: Let $\{\alpha_n\}$, $\{\beta_n\}$, $\{\gamma_n\}$, $\{\tau_n\}$ and $\{c_n\}$ be sequences of positive real numbers. Take $x_0, x_1 \in H$ arbitrarily.
Iterative steps: For $n \ge 1$, calculate $x_{n+1}$ as follows:
Step 1. Compute the inertial parameter
$$\theta_n = \begin{cases} \min\Big\{\dfrac{p_n - 1}{p_{n+1}},\ \dfrac{\alpha_n \tau_n}{\|x_n - x_{n-1}\|}\Big\} & \text{if } x_n \ne x_{n-1}, \\[2mm] \dfrac{p_n - 1}{p_{n+1}} & \text{otherwise}, \end{cases}$$
where $p_1 = 1$ and $p_{n+1} = \dfrac{1 + \sqrt{1 + 4p_n^2}}{2}$.
Step 2. Compute
$$y_n = x_n + \theta_n(x_n - x_{n-1}), \quad z_n = (1-\alpha_n)y_n + \alpha_n (I - s\nabla\omega)y_n, \quad w_n = (1-\beta_n)z_n + \beta_n\,\mathrm{prox}_{c_n g}(I - c_n\nabla f)z_n, \quad x_{n+1} = (1-\gamma_n)w_n + \gamma_n\,\mathrm{prox}_{c_n g}(I - c_n\nabla f)w_n.$$
Proof. 
Set $S = I - s\nabla\omega$ and $T_n = \mathrm{prox}_{c_n g}(I - c_n\nabla f)$. Then, according to Proposition 1, $S$ is a contraction mapping. We also know that $T_n$ is nonexpansive. Using Theorem 1, we can conclude that $x_n \to \breve{a} \in \Gamma$, where $\breve{a} = P_\Gamma S(\breve{a})$. It can be noted that, in this case, $\Gamma = \bigcap_{n=1}^{\infty}\mathrm{Fix}(T_n) = S^*$. Then, for all $x \in S^*$, we have
$$0 \ge \langle S(\breve{a}) - \breve{a},\ x - \breve{a}\rangle = \langle \breve{a} - s\nabla\omega(\breve{a}) - \breve{a},\ x - \breve{a}\rangle = -s\langle \nabla\omega(\breve{a}),\ x - \breve{a}\rangle.$$
Dividing the above inequality by $s$, we obtain
$$\langle \nabla\omega(\breve{a}),\ x - \breve{a}\rangle \ge 0$$
for all $x \in S^*$. Then, $x_n \to \breve{a} \in \Lambda$. This completes the proof. □
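To connect Algorithm 2 with the experiments of Section 5, the sketch below instantiates the IBiG-MSPA for an assumed lasso-type inner problem $f(K) = \|HK - T\|_2^2$, $g(K) = \lambda\|K\|_1$ and outer objective $\omega(K) = \frac{1}{2}\|K\|^2$, so that $(I - s\nabla\omega)y = (1-s)y$. The choice $\alpha_n = \frac{1}{50n}$ and the constants $s$, $\beta_n$, $\gamma_n$ follow Table 1, while $\tau_n = 1/n$ is a simplified assumption.
```python
import numpy as np

def ibig_mspa(H, T_mat, lam, n_iter, s=0.01, beta=0.5, gamma=0.5):
    """Sketch of Algorithm 2 for f(K)=||HK-T||^2, g(K)=lam*||K||_1, omega(K)=0.5*||K||^2."""
    soft = lambda V, t: np.sign(V) * np.maximum(np.abs(V) - t, 0.0)
    L_f = 2.0 * np.linalg.norm(H, 2) ** 2                        # Lipschitz constant of grad f
    c = 1.0 / L_f
    fb = lambda K: soft(K - c * 2.0 * H.T @ (H @ K - T_mat), c * lam)  # prox_{c g}(I - c grad f)
    x_prev = x = np.zeros((H.shape[1], T_mat.shape[1]))
    p = 1.0
    for n in range(1, n_iter + 1):
        alpha, tau = 1.0 / (50.0 * n), 1.0 / n
        p_next = (1.0 + np.sqrt(1.0 + 4.0 * p ** 2)) / 2.0
        theta = (p - 1.0) / p_next
        diff = np.linalg.norm(x - x_prev)
        if diff > 0:
            theta = min(theta, alpha * tau / diff)
        y = x + theta * (x - x_prev)                             # inertial step
        z = (1.0 - alpha) * y + alpha * (1.0 - s) * y            # viscosity step with (I - s grad omega)
        w = (1.0 - beta) * z + beta * fb(z)
        x_prev, x = x, (1.0 - gamma) * w + gamma * fb(w)
        p = p_next
    return x
```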
Using our main results (Theorems 1 and 2), we apply the IBiG-MSPA in the next section to solve a classification problem.

4. Applications with Classification Problems

There are several mathematical models used for the classification of datasets. For this paper, we use the extreme learning machine model and present the advantages of this model as follows.
The advantages of feedforward neural networks have led to their widespread use in diverse fields over the past few decades. Stated concisely, feedforward neural networks allow for the approximation of complex nonlinear mappings directly from input samples and provide models for numerous natural and artificial phenomena that are difficult to deal with using classical parametric techniques. However, training feedforward neural networks is time-consuming because the parameters of the different layers are interdependent and all of them must be tuned. One of the widely used feedforward neural networks is the single hidden layer feedforward neural network (SLFN). It has been widely studied in terms of both theory and application because of its learning ability and error-tolerance ability (see [43,44,45] for more detail).
In order to increase the effectiveness of SLFNs, a development model of a neural learning algorithm called the extreme learning machine (ELM) [46] was recently established. The advantage of the ELM is that hidden node learning parameters, such as input weights and biases, are generated at random and do not need to be adjusted, whereas the output weights can be obtained analytically by using a simple generalized inverse operation. The ELM has been effectively used in several real-world applications, including regression and classification problems [47,48].
We next examine some aspects of the ELM regarding the classification of data.
Let $\{(x_l, t_l) : x_l \in \mathbb{R}^n,\ t_l \in \mathbb{R}^m,\ l = 1, 2, \ldots, N\}$ be a set of training data taken from different samples with a total sample size of $N$, where $x_l = [x_{l1}, x_{l2}, \ldots, x_{ln}] \in \mathbb{R}^n$ and $t_l = [t_{l1}, t_{l2}, \ldots, t_{lm}] \in \mathbb{R}^m$ are the input data and target data, respectively. The mathematical formula for an ELM with $M$ hidden nodes is as follows:
$$\sum_{r=1}^{M} K_r\, E(\langle v_r, x_s\rangle + a_r) = o_s, \quad s = 1, \ldots, N,$$
where $E$ is the activation function, $K_r = [K_{r1}, K_{r2}, \ldots, K_{rm}]^T$ is the output weight vector connecting the $r$-th hidden node to the output nodes, $v_r = [v_{r1}, v_{r2}, \ldots, v_{rn}]^T$ is the input weight vector connecting the input nodes to the $r$-th hidden node and $a_r$ is the bias of the $r$-th hidden node.
The aim of the SLFN is to fit the $N$ training targets exactly, that is, to produce outputs $o_s$ satisfying $\sum_{s=1}^{N}\|o_s - t_s\| = 0$. This means that there exist $K_r$, $v_r$ and $a_r$ such that
$$\sum_{r=1}^{M} K_r\, E(\langle v_r, x_s\rangle + a_r) = t_s, \quad s = 1, \ldots, N.$$
The above system of linear equations can be written compactly as
$$HK = T,$$
where
$$H = \begin{bmatrix} E(\langle v_1, x_1\rangle + a_1) & \cdots & E(\langle v_M, x_1\rangle + a_M) \\ \vdots & \ddots & \vdots \\ E(\langle v_1, x_N\rangle + a_1) & \cdots & E(\langle v_M, x_N\rangle + a_M) \end{bmatrix},$$
$K = [K_1, K_2, \ldots, K_M]^T_{M \times m}$ and $T = [t_1, t_2, \ldots, t_N]^T_{N \times m}$.
As the Moore–Penrose generalized inverse $\breve{H}$ of $H$ exists, $K$ can be obtained from $K = \breve{H}T$ (see [46]). If $\breve{H}$ does not exist, then it could be impossible to find $K$ using this approach. To solve this issue, we determine $K$ as a minimizer of the ordinary least squares minimization problem (OLS):
$$\min_K \|HK - T\|_2^2, \qquad (20)$$
where $H \in \mathbb{R}^{N \times M}$ is called the hidden layer output matrix, $K \in \mathbb{R}^{M \times m}$ is the output layer weight, $T \in \mathbb{R}^{N \times m}$ is the training data target matrix, $M$ is the number of hidden nodes and $N$ is the number of training samples.
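The hidden layer output matrix $H$ can be formed directly from random input weights and biases, as the ELM prescribes; the short sketch below uses the sigmoid activation adopted in Section 5, with the uniform sampling range being an assumption.
```python
import numpy as np

def elm_hidden_matrix(X, n_hidden, seed=0):
    """Build H: entry (s, r) holds E(<v_r, x_s> + a_r) with sigmoid E."""
    rng = np.random.default_rng(seed)
    V = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights v_r
    a = rng.uniform(-1.0, 1.0, size=n_hidden)                # random biases a_r
    return 1.0 / (1.0 + np.exp(-(X @ V + a)))

# With H in hand, the OLS weight of Equation (20) is the Moore-Penrose solution:
# K = np.linalg.pinv(H) @ T_targets
```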
However, in a real situation, the use of OLS (Equation (20)) may cause an overfitting problem. To overcome such problems, the output weight K can be approximated with the least absolute shrinkage and selection operator (lasso) (see [49]):
$$\min_K \|HK - T\|_2^2 + \lambda\|K\|_1, \qquad (21)$$
where $\lambda$ is a regularization parameter. Now, let $S^*$ be the set of all solutions to Equation (21). Among the solutions in $S^*$, we would like to select a solution $K^* \in S^*$ in such a way that $K^*$ is a minimizer of
$$\min_{K \in S^*} \frac{1}{2}\|K\|^2. \qquad (22)$$
Our aim in the next section is to employ the IBiG-MSPA to solve the convex bilevel optimization problems in Equations (21) and (22) and to use the obtained optimal weight for classification of the Diabetes [50], Heart Disease UCI [51] and Iris datasets [52]. These databases are widely used as benchmarks in many research works in the area of data classification.

5. Numerical Experiments

In this section, we present the experimental results from applying our proposed algorithm to classify the Diabetes, Heart Disease UCI and Iris datasets.
We employed our algorithm (IBiG-MSPA) to solve the convex bilevel optimization problems in Equations (21) and (22) by setting $\omega(K) = \frac{1}{2}\|K\|_2^2$, $f(K) = \|HK - T\|_2^2$, $g(K) = \lambda\|K\|_1$ and the activation function $E$ as the sigmoid function.
The parameters selected for this experiment are shown in Table 1, where $L_f = 2\|H\|^2$. We measured the efficiency of each algorithm using the output data accuracy, defined as
$$\mathrm{accuracy} = 100 \times \frac{\text{correct predictions}}{\text{total cases}}.$$
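For completeness, the accuracy measure can be computed as sketched below; the decision rule (taking the class with the largest output among the $m$ columns of $HK$, with one-hot target matrices) is the usual convention and is assumed here rather than stated explicitly in the paper.
```python
import numpy as np

def accuracy(H, K, T_onehot):
    """Percentage of correctly classified cases, with argmax over the output columns."""
    predicted = np.argmax(H @ K, axis=1)
    true = np.argmax(T_onehot, axis=1)
    return 100.0 * np.mean(predicted == true)
```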
Next, we used the Diabetes, Heart Disease UCI and Iris datasets for classification, as described below.
Diabetes dataset [50]: There are nine features in the dataset. We categorized two data classes in this dataset.
Heart Disease UCI dataset [51]: The dataset has 14 features. Patients’ heart disease data are presented in this dataset. We wanted to categorize the data into two groups.
Iris dataset [52]: The dataset contains four features and three classes. We aimed to classify the data into three types of iris plants.
Testing and training data are shown in Table 2.
For each dataset, the numbers of iterations and hidden nodes were set as shown in Table 3.
The number of iterations for each dataset was chosen to produce the best results for each method under consideration.
We conducted experiments to compare the efficiency of the IBiG-MSPA with other algorithms under consideration; namely, the BiG-SAM, iBiG-SAM and aiBiG-SAM.
As representations of the accuracy of testing and training, we use the terms Ac.Test and Ac.Train, respectively, in Table 4.
The results from Table 4 show that our proposed method, the IBiG-MSPA, performed better than the BiG-SAM, iBiG-SAM and aiBiG-SAM in terms of training and testing accuracy for each dataset. Therefore, based on our study, the IBiG-MSPA could classify the chosen datasets with greater accuracy than the other methods.
We can observe that, for the Heart Disease UCI dataset, the accuracies of the existing algorithms were around 70%, while our proposed algorithm achieved accuracy over 80%. In a practical scenario, even small improvements in classification accuracy can have significant effects. For instance, in the case of medical diagnoses, for which the Heart Disease UCI dataset is often used as a benchmark, a slight increase in accuracy can lead to more reliable predictions and better patient outcomes. It may help identify more individuals at risk or improve the overall efficiency of the classification process, leading to appropriate treatments. This observation applies equally well to the other two datasets and datasets similar to them.

6. Conclusions

We first introduced an inertial viscosity modified SP algorithm (IVMSPA). Then, the strong convergence of the IVMSPA was proved under mild conditions with the control sequence. Next, we proposed the inertial bilevel gradient modified SP algorithm (IBiG-MSPA) to solve the convex bilevel optimization problem. Finally, we applied our method to classifying the Diabetes, Heart Disease UCI and Iris datasets. The numerical experiments showed that the IBiG-MSPA had higher efficiency than the BiG-SAM, iBiG-SAM and aiBiG-SAM.
The performances of the algorithms discussed in this paper depend in part on the characteristics of the datasets. In order to improve the accuracy, one needs to address issues related to the preprocessing of the data, such as feature selection, missing data and dataset imbalances. The goal of our future research is to develop new techniques or processes that can improve the efficiency of algorithms in classifying imbalanced and big datasets.

Author Contributions

Conceptualization, R.W.; Formal analysis, K.J. and R.W.; Investigation, K.J. and R.W.; Methodology, R.W.; Software, K.J.; Supervision, S.S. and Y.J.C.; Validation, S.S., Y.J.C., A.K. and R.W.; Visualization, R.W.; Writing—original draft, K.J.; Writing—review and editing, S.S., Y.J.C., A.K. and R.W. All authors have discussed the results and approved the final manuscript.

Funding

NSRF (grant number B05F640183).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets used in this paper were obtained from https://archive.ics.uci.edu (accessed on 30 October 2022).

Acknowledgments

This research received funding support from the NSRF via the Program Management Unit for Human Resources and Institutional Development, Research and Innovation (grant number B05F640183), and it was also partially supported by Chiang Mai University and Ubon Ratchathani University.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Garcia-Herreros, P.; Zhang, L.; Misra, P.; Arslan, E.; Mehta, S.; Grossmann, I.E. Mixed-integer bilevel optimization for capacity planning with rational markets. Comput. Chem. Eng. 2016, 86, 33–47. [Google Scholar] [CrossRef]
  2. Maravillo, H.; Camacho-Vallejo, J.; Puerto, J.; Labbé, M. A market regulation bilevel problem: A case study of the Mexican petrochemical industry. Omega 2019, 97, 102–105. [Google Scholar] [CrossRef] [Green Version]
  3. Marcotte, P. Network design problem with congestion effects: A case of bilevel programming. Math. Program. 1986, 34, 142–162. [Google Scholar] [CrossRef]
  4. Migdalas, A. Bilevel programming in traffic planning: Models, methods and challenge. J. Glob. Optim. 1995, 7, 381–405. [Google Scholar] [CrossRef]
  5. Clark, P.A.; Westerberg, A.W. Bilevel programming for steady-state chemical process design–I. Fundamentals and algorithms. Comput. Chem. Eng. 1990, 14, 87–97. [Google Scholar] [CrossRef]
  6. Dempe, S. Foundations of Bi-Level Programming, 1st ed.; Springer: New York, NY, USA, 2002. [Google Scholar]
  7. Bard, J.F. Coordination of a multidivisional organization through two levels of management. Omega 1983, 11, 457–468. [Google Scholar] [CrossRef]
  8. Dan, T.; Marcotte, P. Competitive facility location with selfish users and queues. Oper. Res. 2019, 67, 479–497. [Google Scholar] [CrossRef]
  9. Gabriel, S.A.; Conejo, A.J.; Fuller, J.D.; Hobbs, B.F.; Ruiz, C. Complementarity Modeling in Energy, 1st ed.; Springer: New York, NY, USA, 2012. [Google Scholar]
  10. Wogrin, S.; Pineda, S.; Tejada-Arango, D.A. Applications of bilevel optimization in energy and electricity markets. Bilevel Optim. 2020, 161, 139–168. [Google Scholar]
  11. Nimana, N.; Petrot, N. Incremental proximal gradient scheme with penalization for constrained composite convex optimization problems. Optimization 2019, 70, 1307–1336. [Google Scholar] [CrossRef]
  12. Janngam, K.; Suantai, S. An inertial modified S-Algorithm for convex minimization problems with directed graphs and their applications in classification problems. Mathematics 2022, 10, 4442. [Google Scholar] [CrossRef]
  13. Bussaban, L.; Suantai, S.; Kaewkhao, A. A parallel inertial S-iteration forward-backward algorithm for regression and classification problems. Carpathian J. Math. 2020, 36, 35–44. [Google Scholar] [CrossRef]
  14. Lions, P.L.; Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 1979, 16, 964–979. [Google Scholar] [CrossRef]
  15. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef] [Green Version]
  16. Bruck, R.E., Jr. On the weak convergence of an ergodic iteration for the solution of variational inequalities for monotone operators in Hilbert space. J. Math. Anal. Appl. 1977, 61, 159–164. [Google Scholar] [CrossRef] [Green Version]
  17. Passty, G.B. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 1979, 72, 383–390. [Google Scholar] [CrossRef] [Green Version]
  18. Figueiredo, M.; Nowak, R. An EM algorithm for wavelet-based image restoration. IEEE Trans. Image Process. 2003, 12, 906–916. [Google Scholar] [CrossRef] [Green Version]
  19. Hale, E.; Yin, W.; Zhang, Y. A Fixed-Point Continuation Method for l1-Regularized Minimization with Applications to Compressed Sensing; Department of Computational and Applied Mathematics, Rice University: Houston, TX, USA, 2007. [Google Scholar]
  20. Liang, J.; Schonlieb, C.B. Improving fista: Faster, smarter and greedier. arXiv 2018, arXiv:1811.01430. [Google Scholar]
  21. Moudafi, A. Viscosity approximation method for fixed-points problems. J. Math. Anal. Appl. 2000, 241, 46–55. [Google Scholar] [CrossRef] [Green Version]
  22. Xu, H.K. Viscosity approximation methods for nonexpansive mappings. J. Math. Anal. Appl. 2004, 298, 279–291. [Google Scholar] [CrossRef] [Green Version]
  23. Jailoka, P.; Suantai, S.; Hanjing, A. A fast viscosity forward-backward algorithm for convex minimization problems with an application in image recovery. Carpathian J. Math. 2020, 37, 449–461. [Google Scholar] [CrossRef]
  24. Tan, B.; Zhou, Z.; Li, S. Viscosity-type inertial extragradient algorithms for solving variational inequality problems and fixed point problems. J. Appl. Math. Comput. 2022, 68, 1387–1411. [Google Scholar] [CrossRef]
  25. Beck, A.; Sabach, S. A first order method for finding minimal norm-like solutions of convex optimization problems. Math. Program. 2014, 147, 25–46. [Google Scholar] [CrossRef]
  26. Sabach, S.; Shtern, S. A first order method for solving convex bilevel optimization problems. SIAM J. Optim. 2017, 27, 640–660. [Google Scholar] [CrossRef] [Green Version]
  27. Shehu, Y.; Vuong, P.T.; Zemkoho, A. An inertial extrapolation method for convex simple bilevel optimization. Optim. Methods Softw. 2019, 2019, 1–19. [Google Scholar] [CrossRef]
  28. Duan, P.; Zhang, Y. Alternated and multi-step inertial approximation methods for solving convex bilevel optimization problems. Optimization 2022, 2022, 1–30. [Google Scholar] [CrossRef]
  29. Jiang, R.; Abolfazli, N.; Mokhtari, A.; Hamedani, E.Y. A Conditional Gradient-based Method for Simple Bilevel Optimization with Convex Lower-level Problem. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, Valencia, Spain, 25–27 April 2023; pp. 10305–10323. [Google Scholar]
  30. Thongsri, P.; Panyanak, B.; Suantai, S. A New Accelerated Algorithm Based on Fixed Point Method for Convex Bilevel Optimization Problems with Applications. Mathematics 2023, 11, 702. [Google Scholar] [CrossRef]
  31. Nakajo, K.; Shimoji, K.; Takahashi, W. Strong convergence to a common fixed point of families of nonexpansive mappings in Banach spaces. J. Nonlinear Convex Anal. 2007, 8, 11–34. [Google Scholar]
  32. Aoyama, K.; Kimura, Y. Strong convergence theorems for strongly nonexpansive sequences. Appl. Math. Comput. 2011, 217, 7537–7545. [Google Scholar] [CrossRef]
  33. Aoyama, K.; Kohsaka, F.; Takahashi, W. Strong convergence theorems by shrinking and hybrid projection methods for relatively nonexpansive mappings in Banach spaces. In Nonlinear Analysis and Convex Analysis, Proceedings of the 5th International Conference on Nonlinear Analysis and Convex Analysis, Hsinchu, Taiwan, 31 May–4 June 2007; Yokohama Publishers: Yokohama, Japan, 2009; pp. 7–26. [Google Scholar]
  34. Moreau, J.J. Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes Rendus Acad. Sci. Paris Ser. A Math. 1962, 255, 2897–2899. [Google Scholar]
  35. Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces; Springer: New York, NY, USA, 2011. [Google Scholar]
  36. Takahashi, W. Introduction to Nonlinear and Convex Analysis; Yokohama Publishers: Yokohama, Japan, 2009. [Google Scholar]
  37. Saejung, S.; Yotkaew, P. Approximation of zeros of inverse strongly monotone operators in Banach spaces. Nonlinear Anal. 2012, 75, 724–750. [Google Scholar] [CrossRef]
  38. Polyak, B.T. Some methods of speeding up the convergence of iteration methods. Ussr Comput. Math. Math. Phys. 1964, 4, 1–17. [Google Scholar] [CrossRef]
  39. Nesterov, Y. A method for solving the convex programming problem with convergence rate O(1/k2). Dokl. Akad. Nauk SSSR 1983, 269, 543–547. [Google Scholar]
  40. Phuengrattana, W.; Suantai, S. On the rate of convergence of Mann, Ishikawa, Noor and SP-iterations for continuous functions on an arbitrary interval. J. Comput. Appl. Math. 2011, 235, 3006–3014. [Google Scholar] [CrossRef] [Green Version]
  41. Mann, W.R. Mean value methods in iteration. Proc. Am. Math. Soc. 1953, 4, 506–510. [Google Scholar] [CrossRef]
  42. Ishikawa, S. Fixed points by a new iteration method. Proc. Am. Math. Soc. 1974, 44, 147–150. [Google Scholar] [CrossRef]
  43. Ding, S.F.; Jia, W.K.; Su, C.Y.; Zhang, L.W. Research of neural network algorithm based on factor analysis and cluster analysis. Neural Comput. Appl. 2011, 20, 297–302. [Google Scholar] [CrossRef]
  44. Razavi, S.; Tolson, B.A. A new formulation for feedforward neural networks. IEEE Trans. Neural Netw. 2011, 22, 1588–1598. [Google Scholar] [CrossRef]
  45. Chen, Y.; Zheng, W.X. Stochastic state estimation for neural networks with distributed delays and Markovian jump. Neural Netw. 2012, 25, 14–20. [Google Scholar] [CrossRef]
  46. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  47. Rong, H.J.; Ong, Y.S.; Tan, A.H.; Zhu, Z. A fast pruned-extreme learning machine for classification problem. Neurocomputing 2008, 72, 359–366. [Google Scholar] [CrossRef]
  48. Huang, G.B.; Ding, X.; Zhou, H. Optimization method based extreme learning machine for classification. Neurocomputing 2010, 74, 155–163. [Google Scholar] [CrossRef]
  49. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  50. Smith, J.W.; Everhart, J.E.; Dickson, W.C.; Knowler, D.C.; Johannes, R.S. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care; IEEE Computer Society Press: New York, NY, USA, 1988; pp. 261–265. [Google Scholar]
  51. Lichman, M. UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu (accessed on 20 October 2022).
  52. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
Table 1. Parameter selection for the IBiG-MSPA, BiG-SAM, iBiG-SAM and aiBiG-SAM.

Methods | Settings
IBiG-MSPA | $s = 0.01$, $c_n = \frac{1}{L_f}$, $\alpha_n = \frac{1}{50n}$, $\beta_n = \gamma_n = 0.5$, $\tau_n = \frac{10^{20}}{n}$
BiG-SAM | $\lambda = 0.01$, $c = \frac{1}{L_f}$, $\gamma_n = \frac{2(0.1)}{\frac{1}{2} + \frac{cL_f}{4}}$
iBiG-SAM | $\lambda = 0.01$, $c = \frac{1}{L_f}$, $\alpha = 3$, $\gamma_n = \frac{2(0.1)}{\frac{1}{2} + \frac{cL_f}{4}}$, $\beta_n = \frac{\gamma_n}{n^{0.01}}$, $\theta_n = \min\big\{\frac{n}{n+\alpha-1},\ \frac{\beta_n}{\|x_n - x_{n-1}\|}\big\}$ if $x_n \ne x_{n-1}$, and $\theta_n = \frac{n}{n+\alpha-1}$ otherwise
aiBiG-SAM | $\lambda = 0.01$, $c = \frac{1}{L_f}$, $\gamma_n = \frac{2(0.1)}{\frac{1}{2} + \frac{cL_f}{4}}$, $\beta_n = \frac{\gamma_n}{n^{0.01}}$, $\theta_n = \min\big\{\frac{n}{n+\alpha-1},\ \frac{\beta_n}{\|x_n - x_{n-1}\|}\big\}$ if $x_n \ne x_{n-1}$, and $\theta_n = \frac{n}{n+\alpha-1}$ otherwise
Table 2. Diabetes, Heart Disease UCI and Iris datasets, with 30% of each dataset used for testing and 70% for training.

Datasets | Features | Testing Set | Training Set
Diabetes | 9 | 230 | 538
Heart Disease UCI | 14 | 90 | 213
Iris | 4 | 45 | 105
Table 3. Numbers of iterations and hidden nodes specified for each data collection.

Datasets | Number of Iterations ($\bar{I}$) | Number of Hidden Nodes (M)
Diabetes | 200 | 100
Heart Disease UCI | 100 | 60
Iris | 300 | 30
Table 4. Accuracy of predictions using various algorithms.

Dataset | IBiG-MSPA (Ac.Train / Ac.Test) | BiG-SAM (Ac.Train / Ac.Test) | iBiG-SAM (Ac.Train / Ac.Test) | aiBiG-SAM (Ac.Train / Ac.Test)
Diabetes | 77.11 / 81.08 | 71.98 / 76.13 | 72.34 / 76.58 | 70.88 / 73.42
Heart Disease UCI | 85.71 / 83.87 | 74.76 / 74.19 | 82.38 / 78.49 | 83.81 / 79.57
Iris | 99.05 / 100.00 | 94.29 / 95.56 | 94.29 / 95.56 | 94.29 / 95.56