Article

Comparative Analysis of Accelerated Models for Solving Unconstrained Optimization Problems with Application of Khan’s Hybrid Rule

by Vladimir Rakočević 1,2 and Milena J. Petrović 3,*
1 Serbian Academy of Sciences and Arts, Kneza Mihajla 35, 11000 Belgrade, Serbia
2 Faculty of Sciences and Mathematics, University of Niš, Višegradska 33, 18106 Niš, Serbia
3 Faculty of Sciences and Mathematics, University of Pristina in Kosovska Mitrovica, Lole Ribara 29, 38220 Kosovska Mitrovica, Serbia
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(23), 4411; https://doi.org/10.3390/math10234411
Submission received: 30 October 2022 / Revised: 17 November 2022 / Accepted: 21 November 2022 / Published: 23 November 2022

Abstract: In this paper, we follow the chronological development of gradient descent methods and their later accelerated variants. We specifically emphasise some contemporary approaches within this research field. Accordingly, we present a constructive overview of the class of hybrid accelerated models derived from the three-term hybridization process proposed by Khan. Extensive numerical test results illustrate the performance profiles of the hybrid and non-hybrid versions of the chosen accelerated gradient models with respect to the number of iterations, CPU time, and number of function evaluations. The favorable outcomes justify this hybrid approach as an accepted method for developing new efficient optimization schemes.
MSC:
49M15; 49M37; 65B99; 90C26; 90C30; 90C53

1. Class of Accelerated Gradient Descent Methods and Its Benefits

Many contemporary problems in science, engineering, medicine and various other research areas are closely related to mathematical optimization theory. Among them, unconstrained optimization problems are the most frequently considered [1,2,3,4,5,6,7,8]. Owing to the duality principle, an optimization problem may be viewed as a minimization problem. An unconstrained minimization problem can simply be stated as finding
$$\min f(x), \quad x \in \mathbb{R}^n,$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is the objective function; the problem is solved by the general iteration:
$$x_{k+1} = x_k + t_k d_k.$$
In (1), $x_k$ denotes the current iterative point and $x_{k+1}$ the next one; the positive iterative step length is denoted by $t_k$, while $d_k$ stands for the $k$-th search direction vector. As can be observed, the two main elements that determine the efficiency and robustness of the iterative rule (1) are an adequately calculated iterative step size $t_k$ and a properly chosen iterative search direction $d_k$. Since we are dealing with minimization problems, it is a natural choice to define the search direction so that it fulfils the descent condition, i.e.,
$$g_k^T d_k < 0,$$
where $g_k$ is the gradient of $f$ at the point $x_k$. Apart from that, we use standard notation for the gradient and the Hessian of the objective function $f$:
$$g(x) = \nabla f(x), \quad G(x) = \nabla^2 f(x), \quad g_k = \nabla f(x_k), \quad G_k = \nabla^2 f(x_k).$$
Following condition (2), it is easy to conclude that $d_k = -g_k$ produces the most certain descent direction, known as the gradient descent direction. Iteration (1) with the gradient descent direction is known as the gradient descent method (or GD method):
$$x_{k+1} = x_k - t_k g_k.$$
The step length parameter $t_k$ in iterations (1) and (4) is determined either by the exact or by some inexact line search procedure. Using the exact line search technique, the iterative step length value $t_k$ is computed by solving the following minimization task:
$$f(x_k + t_k d_k) = \min_{t > 0} f(x_k + t d_k).$$
It is clear that solving the previous minimization problem in each iteration is a time- and resource-consuming task regarding CPU time and the number of required objective function evaluations. For this reason, most contemporary optimization methods use inexact line search algorithms to calculate the iterative step size instead of the exact line search procedure. The convergence properties of line search methods for unconstrained optimization are specifically examined in [9]. Some of the frequently used inexact line search algorithms are the weak and strong Wolfe conditions [10], the Backtracking algorithm proposed in [11] with Armijo's rule [12], etc.:
  • Weak Wolfe line search:
    $$f(x_k + t_k d_k) \le f(x_k) + \delta t_k g_k^T d_k,$$
    $$g(x_k + t_k d_k)^T d_k \ge \sigma g_k^T d_k;$$
  • Strong Wolfe line search:
    $$f(x_k + t_k d_k) \le f(x_k) + \delta t_k g_k^T d_k,$$
    $$\left| g(x_k + t_k d_k)^T d_k \right| \le \sigma \left| g_k^T d_k \right|;$$
  • Backtracking algorithm (a code sketch of this procedure is given after the list):
  • The objective function $f(x)$, the search direction $d_k$ at the point $x_k$ and parameters $0 < \sigma < 0.5$ and $\beta \in (0, 1)$ are required;
  • Set $t = 1$;
  • While $f(x_k + t d_k) > f(x_k) + \sigma t g_k^T d_k$, take $t := t \beta$;
  • Return $t_k = t$.
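For illustration, the Backtracking procedure can be sketched in a few lines of Python (a minimal sketch, not the authors' Visual C++ implementation; the function handles f and grad_f and the default parameter values are placeholders):

```python
import numpy as np

def backtracking(f, grad_f, x_k, d_k, sigma=1e-4, beta=0.8):
    """Armijo backtracking: shrink t until the sufficient-decrease condition holds."""
    t = 1.0
    f_k = f(x_k)
    slope = grad_f(x_k) @ d_k          # g_k^T d_k, negative for a descent direction
    while f(x_k + t * d_k) > f_k + sigma * t * slope:
        t *= beta                      # reduce the trial step: t := t * beta
    return t

# Example: f(x) = x^T x with the steepest-descent direction d_k = -g_k.
f = lambda x: float(x @ x)
grad_f = lambda x: 2.0 * x
x = np.array([3.0, -4.0])
t_k = backtracking(f, grad_f, x, -grad_f(x))
```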
Subsequently, the Newton method with line search is given as
$$x_{k+1} = x_k - t_k G_k^{-1} g_k.$$
In (6), $G_k^{-1}$ stands for the inverse of the Hessian of the objective function, according to the previously adopted notation. The step length parameter $t_k$ is obtained by applying some chosen inexact procedure. Instead of calculating the inverse of the Hessian, which is often a demanding task, quasi-Newton methods use an adequate approximation of the Hessian (or of its inverse):
$$x_{k+1} = x_k - t_k H_k g_k.$$
Herein, $H_k = B_k^{-1}$, where $B_k$ is a symmetric positive definite approximation of the Hessian. The updating of $\{B_k\}$, $k \in \mathbb{N}$, is conducted using the quasi-Newton property, i.e., the secant equation
$$B_{k+1} s_k = y_k,$$
where $s_k$ and $y_k$ are the differences between two successive iterative points and iterative gradients, respectively, i.e., $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$. In [13], a classification of methods for updating the matrix $B_k$ is presented. Therein, three updating approaches are distinguished:
  • the matrix $B_k$ is defined as a scalar matrix, i.e., $B_k = \gamma_k I$, $\gamma_k > 0$;
  • the matrix $B_k$ is defined as a diagonal matrix, i.e., $B_k = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, $\lambda_i > 0$, $i = \overline{1, n}$;
  • the matrix $B_k$ is defined as a full matrix.
Taking the simplest updating approach, the first one, i.e., $B_k = \gamma_k I \approx G_k$, $\gamma_k > 0$, the quasi-Newton method (7) is transformed into
$$x_{k+1} = x_k - t_k \gamma_k^{-1} g_k.$$
In [14], the authors referred to the iterative methods (8) as a class of accelerated gradient methods, on account of their good convergence and performance characteristics. Previously, in [15], Andrei defined this accelerated iteration by calculating the parameter $\theta_k$ $(= \gamma_k^{-1})$ as follows:
Algorithm for generating the scalar $\theta_k$ from [15] (a code sketch follows the algorithm):
  • The objective function $f(x)$, the search direction $d_k$ at the point $x_k$ and parameters $0 < \sigma < 0.5$ and $\beta \in (0, 1)$ are required;
  • Apply the Backtracking algorithm to calculate $t_k \in (0, 1]$;
  • Compute $z = x_k - t_k g_k$, $g_z = \nabla f(z)$, $y_k = g_z - g_k$;
  • Compute $a_k = t_k g_k^T g_k$, $b_k = -t_k y_k^T g_k$;
  • Return $\theta_k = a_k / b_k$.
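The same steps can be written as a short Python sketch (an illustrative reconstruction of the algorithm above, with the minus sign in $b_k$ assumed so that $\theta_k > 0$; the step length t_k is supplied by the Backtracking algorithm):

```python
import numpy as np

def andrei_theta(grad_f, x_k, t_k):
    """Scalar theta_k of the accelerated step, given a Backtracking step length t_k."""
    g_k = grad_f(x_k)
    z = x_k - t_k * g_k            # trial steepest-descent point
    y_k = grad_f(z) - g_k          # gradient difference y_k = g_z - g_k
    a_k = t_k * (g_k @ g_k)
    b_k = -t_k * (y_k @ g_k)       # assumed sign convention, giving b_k > 0 for convex f
    return a_k / b_k               # theta_k = a_k / b_k

# Example on f(x) = x^T x, whose gradient is 2x:
grad_f = lambda x: 2.0 * x
theta_k = andrei_theta(grad_f, np.array([1.0, 2.0]), t_k=0.5)
```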
Stanimirović and Miladinović in [14] determined the parameter $\gamma_k^{-1}$ from (8) on the basis of the second-order Taylor expansion and denoted the resulting method as the SM method. Results obtained by several researchers on this topic confirm that this way of deriving the accelerated parameter (as it is named in [16]) is justified with respect to both convergence and numerical performance [17,18]. Several forms of this important variable in chosen accelerated gradient schemes are listed as expressions (5)–(9) in [19], and some other approaches are presented in [20,21,22].
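For orientation, this Taylor-based construction can be sketched as follows (a schematic reconstruction consistent with the analogous formulas quoted below for the ADD, TADSS and HSM methods; see [14] for the precise statement). Approximating the Hessian in the second-order Taylor model by $\gamma_{k+1} I$,
$$f(x_{k+1}) \approx f(x_k) + g_k^T (x_{k+1} - x_k) + \frac{1}{2}\, \gamma_{k+1} \| x_{k+1} - x_k \|^2,$$
and substituting $x_{k+1} - x_k = -t_k \gamma_k^{-1} g_k$ from (8), solving for $\gamma_{k+1}$ gives
$$\gamma_{k+1}^{SM} = 2\, \frac{f(x_{k+1}) - f(x_k) + t_k \gamma_k^{-1} \| g_k \|^2}{t_k^2\, \gamma_k^{-2}\, \| g_k \|^2}.$$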
Among the various accelerated gradient iterations, for this research we specifically chose three to which Khan's three-term hybridization process was later applied. The first of the three is the already mentioned SM method, defined by relation (8) and presented in [14]. The second one is the ADD method (i.e., the accelerated double direction method) from [16], with the iterative representation
$$x_{k+1} = x_k + \alpha_k^2 d_k - \alpha_k \gamma_k^{-1} g_k.$$
In (9), $\alpha_k > 0$ is the iterative step length value, while $d_k$ is the second search vector, calculated under the assumption $\|d_k\| = 1$, $k = 1, 2, \ldots$, by the following procedure:
$$d_k(t) = \begin{cases} d_k^*, & k \le m - 1, \\ \sum_{i=2}^{m} t^{i-1} d_{k-i+1}^*, & k \ge m, \end{cases}$$
where $d_k^*$ is the solution of the problem $\min_{d \in \mathbb{R}^n} \Phi_k(d)$,
$$\Phi_k(d) = \nabla f(x_k)^T d + \frac{1}{2}\, d^T (\gamma_{k+1} I)\, d = g(x_k)^T d + \frac{1}{2}\, \gamma_{k+1} \| d \|^2.$$
The iterative value of the accelerated parameter $\gamma_k$, obtained through the second-order Taylor expansion applied to (9), is
$$\gamma_{k+1}^{ADD} = 2\, \frac{f(x_{k+1}) - f(x_k) - \alpha_k g_k^T \left( \alpha_k d_k - \gamma_k^{-1} g_k \right)}{\alpha_k^2 \left( \alpha_k d_k - \gamma_k^{-1} g_k \right)^T \left( \alpha_k d_k - \gamma_k^{-1} g_k \right)}.$$
The positive step length value $\alpha_k$ of the ADD scheme is derived using the Backtracking algorithm, starting with the initial value $t = 1$. Taking the following substitutions in (9),
$$\alpha_k^2 \to \beta_k,$$
where $\beta_k$ is calculated by a different Backtracking procedure, and
$$d_k \to -g_k,$$
leads to the ADSS (accelerated double step size) method [17]:
$$x_{k+1} = x_k - \alpha_k \gamma_k^{-1} g_k - \beta_k g_k = x_k - \left( \alpha_k \gamma_k^{-1} + \beta_k \right) g_k.$$
Finally, under the assumption
$$\alpha_k + \beta_k = 1,$$
the ADSS iteration is transformed into the TADSS scheme [18], our third chosen accelerated gradient descent model:
$$x_{k+1} = x_k - \left[ \alpha_k \left( \gamma_k^{-1} - 1 \right) + 1 \right] g_k.$$
As in (9), the iterative step length value $\alpha_k$ in (14) is calculated on the basis of the Backtracking algorithm. The accelerated parameter of the TADSS scheme is
$$\gamma_{k+1}^{TADSS} = 2\, \frac{f(x_{k+1}) - f(x_k) + \psi_k \| g_k \|^2}{\psi_k^2 \| g_k \|^2}, \qquad \psi_k = \alpha_k \left( \gamma_k^{-1} - 1 \right) + 1.$$
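To make the update concrete, one TADSS step can be sketched in Python as follows (an illustrative sketch based on the two formulas above, not the authors' implementation; the step length alpha_k is assumed to be supplied by the Backtracking algorithm, and the positiveness safeguard anticipates the rule recalled in Proposition 1):

```python
import numpy as np

def tadss_step(f, grad_f, x_k, gamma_k, alpha_k):
    """One TADSS update and the Taylor-based refresh of the accelerated parameter."""
    g_k = grad_f(x_k)
    psi_k = alpha_k * (1.0 / gamma_k - 1.0) + 1.0    # psi_k = alpha_k*(gamma_k^{-1} - 1) + 1
    x_next = x_k - psi_k * g_k                       # x_{k+1} = x_k - psi_k * g_k
    gk2 = float(g_k @ g_k)
    gamma_next = 2.0 * (f(x_next) - f(x_k) + psi_k * gk2) / (psi_k ** 2 * gk2)
    if gamma_next <= 0.0:                            # keep the accelerated parameter positive
        gamma_next = 1.0
    return x_next, gamma_next

# Example on f(x) = x^T x:
f = lambda x: float(x @ x)
grad_f = lambda x: 2.0 * x
x1, gamma1 = tadss_step(f, grad_f, np.array([1.0, -1.0]), gamma_k=1.0, alpha_k=0.5)
```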
In the following proposition, we prove that the iterations (8), (9) and (14) are gradient descent processes.
Proposition 1.
The search directions in the iterations defined by (8), (9) and (14) fulfil the descent condition (2).
Proof. 
We separately analyze the search directions of all three listed methods.
  • According to the general iteration form (1), the search direction in the SM method, defined by relation (8), is $d_k \equiv -\gamma_k^{-1} g_k$. One of the essential properties of the accelerated parameter $\gamma_k$ is its positiveness. If in some iterative step $k$ of the accelerated gradient algorithms with leading iterative rules (8), (9) and (14) this necessary condition is not fulfilled, then the $k$-th accelerated scalar value is set to $\gamma_k = 1$. Bearing this fact in mind, we easily conclude that
    $$g_k^T d_k = g_k^T \left( -\gamma_k^{-1} g_k \right) = -\gamma_k^{-1} \| g_k \|^2 < 0,$$
    which confirms that condition (2) is fulfilled for the SM method defined by (8).
  • The accelerated double direction ADD scheme (9) contains two search vectors. The first one, denoted by $d_k$, is defined by (10). The second one is of the same form as in the SM iteration, i.e., $-\gamma_k^{-1} g_k$. In procedure (10), the crucial element $d_k^*$ in deriving the vector $d_k$ is defined as a solution of the minimization problem (11), which depends on the gradient $g_k$, under the assumption $\|d_k\| = 1$. The so-defined direction is a relaxed differentiable variant of the procedure for determining the search vector $d_k$ (rule 2) in [23] and, accordingly, $d = 0$ is a global optimum of problem (11). Subsequently, we consider only the second direction, which was already analyzed in the previous item.
  • In the TADSS scheme, the search direction can be seen as $-\left[ \alpha_k \left( \gamma_k^{-1} - 1 \right) + 1 \right] g_k$. Checking the descent condition (2), we get
    $$g_k^T \cdot \left( -\left[ \alpha_k \left( \gamma_k^{-1} - 1 \right) + 1 \right] g_k \right) = -\left[ \alpha_k \left( \gamma_k^{-1} - 1 \right) + 1 \right] \| g_k \|^2 < 0,$$
    since $\gamma_k^{-1} > 1$ and $\alpha_k \in (0, 1]$ according to the TADSS algorithm. □
Further on, in Section 2, we analyse Khan's hybridization rule applied to various gradient methods. Finally, Dolan–Moré performance profiles and comparisons between the hybrid and non-hybrid versions of the SM, ADD and TADSS schemes, obtained from large-scale numerical experiments, are presented in Section 3.

2. Three-Term Khan’s Hybridization Principle over the Accelerated Gradient Descent Models

For a nonempty convex subset $C$ of a normed space $E$, let $T : C \to C$ be a mapping defined on $C$. Then, for some sequences $\{u_k\}$, $\{v_k\}$, $\{z_k\}$ and $\{y_k\}$ defined on $C$, the Picard, Mann and Ishikawa iterative processes [24,25,26] are, respectively, given as:
$$u_1 = u \in C, \quad u_{k+1} = T u_k, \quad k \in \mathbb{N},$$
$$v_1 = v \in C, \quad v_{k+1} = (1 - \alpha_k) v_k + \alpha_k T v_k, \quad k \in \mathbb{N},$$
$$z_1 = z \in C, \quad z_{k+1} = (1 - \alpha_k) z_k + \alpha_k T y_k, \quad y_k = (1 - \beta_k) z_k + \beta_k T z_k, \quad k \in \mathbb{N}.$$
In the listed relations, the parameters $\{\alpha_k\}, \{\beta_k\} \subset (0, 1)$ are sequences of positive numbers, which in the Ishikawa process [26] fulfil the following assumptions:
  • $0 \le \alpha_k \le \beta_k \le 1$, $k \ge 0$;
  • $\lim_{k \to \infty} \beta_k = 0$;
  • $\sum_{k=1}^{\infty} \alpha_k \beta_k = \infty$.
In [27], Khan proposed a new three-term iterative process as follows
$$x_1 = x \in C, \quad x_{k+1} = T y_k, \quad y_k = (1 - \alpha_k) x_k + \alpha_k T x_k, \quad k \in \mathbb{N},$$
with the sequence of positive numbers $\{\alpha_k\} \subset (0, 1)$, which in practical numerical tests is considered as a set of constant values, i.e., $\alpha = \alpha_k \in (0, 1)$ for all $k \in \mathbb{N}$, as proposed in [27]. Khan confirmed in [27] that the process (16) converges faster than the processes of Picard, Mann and Ishikawa.
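As a quick illustration of the listed processes, the following Python sketch iterates the Picard, Mann and Khan rules for the simple contraction $T(u) = \cos(u)$ with a constant $\alpha$; the mapping and the value of $\alpha$ are illustrative choices, not taken from [27]:

```python
import math

T = math.cos        # a contraction on [0, 1], used as the guiding operator
alpha = 0.5         # constant alpha_k in (0, 1)
u = v = x = 0.0     # common starting point

for k in range(10):
    u = T(u)                               # Picard iteration
    v = (1 - alpha) * v + alpha * T(v)     # Mann iteration
    y = (1 - alpha) * x + alpha * T(x)     # Khan: inner Mann-type step ...
    x = T(y)                               # ... followed by a Picard-type step

fixed_point = 0.7390851332151607           # solution of u = cos(u)
print(abs(u - fixed_point), abs(v - fixed_point), abs(x - fixed_point))
```

In this small experiment, the Khan iterate approaches the fixed point fastest, in line with the comparison reported in [27].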
Khan developed the iterative process (16) as a hybrid variant of the previously mentioned, well-known iterations and thereby managed to improve these types of methods. Subsequently, several authors exploited the auspicious aspects of this hybrid rule and applied it to accelerated gradient optimization methods. The hybridization principle consists of taking the target accelerated iteration as the guiding operator in (16). As a result, several hybrid accelerated processes arose [28,29,30,31,32]. We list them below, together with their accelerated parameters.
  • Hybrid accelerated gradient descent method (HSM) [28]:
    $$x_{k+1} = x_k - (\alpha_k + 1)\, t_k \gamma_k^{-1} g_k,$$
    $$\gamma_{k+1}^{HSM} = 2 \gamma_k\, \frac{\gamma_k \left( f(x_{k+1}) - f(x_k) \right) + (\alpha_k + 1)\, t_k \| g_k \|^2}{(\alpha_k + 1)^2\, t_k^2\, \| g_k \|^2}.$$
  • Hybrid accelerated double direction method (HADD) [29]:
    $$x_{k+1} = x_k - \alpha\, t_k \gamma_k^{-1} g_k + \alpha\, t_k^2 d_k, \quad \alpha \in (1, 2),$$
    $$\gamma_{k+1} = 2\, \frac{f(x_{k+1}) - f(x_k) - \alpha\, g_k^T \left( t_k^2 d_k - t_k \gamma_k^{-1} g_k \right)}{\alpha^2 t_k^2 \left( t_k d_k - \gamma_k^{-1} g_k \right)^T \left( t_k d_k - \gamma_k^{-1} g_k \right)}.$$
  • Hybrid accelerated double step size method (HADSS) [30]:
    $$x_{k+1} = x_k - \alpha \left( t_k \gamma_k^{-1} + p_k \right) g_k, \quad \alpha \equiv \alpha_k + 1 \in (1, 2)\ \forall k,$$
    $$\gamma_{k+1}^{HADSS} = 2\, \frac{f(x_{k+1}) - f(x_k) + \alpha \left( t_k \gamma_k^{-1} + p_k \right) \| g_k \|^2}{\alpha^2 \left( t_k \gamma_k^{-1} + p_k \right)^2 \| g_k \|^2}.$$
  • Hybrid transformed double step size method (HTADSS) [31]:
    $$x_{k+1} = x_k - \alpha \left( t_k \left( \gamma_k^{-1} - 1 \right) + 1 \right) g_k, \quad \alpha \in (1, 2),$$
    $$\gamma_{k+1}^{HTADSS} = 2\, \frac{f(x_{k+1}) - f(x_k) + \alpha\, \varphi_k \| g_k \|^2}{\alpha^2 \varphi_k^2 \| g_k \|^2},$$
    where
    $$\varphi_k = t_k \left( \gamma_k^{-1} - 1 \right) + 1.$$
  • Hybrid gradient descent method (HGD) [32]:
    $$x_{k+1} = x_k - (\alpha_k + 1)\, t_k g_k, \quad \alpha_k \in (0, 1)\ \forall k.$$
  • Hybrid accelerated gradient descent method (HAGD) [32]:
    $$x_{k+1} = x_k - (\alpha_k + 1)\, \theta_k t_k g_k, \quad \alpha_k \in (0, 1)\ \forall k, \quad \theta_k = \frac{\gamma_k}{t_k \gamma_{k+1}}.$$
  • Hybrid modified accelerated gradient descent method (HMAGD) [32]:
    $$x_{k+1} = x_k - (\alpha_k + 1)\, \theta_k \left( t_k + t_k^2 - t_k^3 \right) g_k, \quad \alpha_k \in (0, 1)\ \forall k, \quad \theta_k = \frac{\gamma_k}{t_k \gamma_{k+1}}.$$
  • Hybrid modified improved gradient descent method (HMIGD) [32]:
    $$x_{k+1} = x_k - (\alpha_k + 1)\, \gamma_k^{-1} \left( t_k + t_k^2 - t_k^3 \right) g_k, \quad \alpha_k \in (0, 1)\ \forall k.$$
As shown above, at least eight hybrid models have arisen from Khan's hybridization rule. The convergence properties as well as the performance efficiency of these iterative schemes are presented and illustrated in the cited literature. The leading model among those listed ((17), (19), (21), (23), (26), (27), (28), (29)), and therewith the first one developed on the basis of Khan's process, is certainly the HSM method from [28]. In [28], the authors examined the performance of the defined method for various values of the so-called correction parameter $\alpha_k \in (0, 1)$, i.e., $\alpha = \alpha_k + 1 \in (1, 2)$, which is a necessary factor in all hybrid methods generated through Khan's hybridization. They experimentally concluded that the HSM method achieves its best performance characteristics when the correction parameter is taken close to its left limit. Later, in [33], the authors improved the HSM model by reducing the initial step length parameter in the Backtracking procedure.
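To illustrate how such a hybrid model is applied in practice, one HSM step can be sketched in Python as follows (an illustrative sketch of (17) and its accelerated parameter, not the implementation from [28]; t_k is assumed to come from the Backtracking algorithm, and the default value of the correction parameter alpha_k is only a placeholder chosen close to its left limit):

```python
import numpy as np

def hsm_step(f, grad_f, x_k, gamma_k, t_k, alpha_k=0.05):
    """One HSM update x_{k+1} = x_k - (alpha_k + 1) * t_k * gamma_k^{-1} * g_k,
    followed by the update of the accelerated parameter gamma_{k+1}."""
    g_k = grad_f(x_k)
    x_next = x_k - (alpha_k + 1.0) * t_k * g_k / gamma_k
    gk2 = float(g_k @ g_k)
    gamma_next = (2.0 * gamma_k
                  * (gamma_k * (f(x_next) - f(x_k)) + (alpha_k + 1.0) * t_k * gk2)
                  / ((alpha_k + 1.0) ** 2 * t_k ** 2 * gk2))
    if gamma_next <= 0.0:             # positiveness safeguard, as for the non-hybrid models
        gamma_next = 1.0
    return x_next, gamma_next

# Example on f(x) = x^T x with a fixed trial step t_k:
f = lambda x: float(x @ x)
grad_f = lambda x: 2.0 * x
x1, gamma1 = hsm_step(f, grad_f, np.array([2.0, -1.0]), gamma_k=1.0, t_k=0.4)
```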

3. Dolan–Moré Performance Profiles and Comparisons

In this section, we apply the Dolan–Moré benchmarking methodology from [34] to the hybrid and non-hybrid variants of the chosen accelerated gradient minimization models. In the conducted numerical tests, we track the performance metrics regarding the number of iterations, the CPU time and the number of function evaluations.
For all obtained numerical outcomes, the following points are valid:
  • The codes are written in Visual C++ and run on a workstation with an Intel(R) Core(TM) 2.3 GHz processor.
  • The Backtracking parameter values are $\sigma = 0.0001$ and $\beta = 0.8$. These are standard values for the Backtracking parameters applied in various optimization models with the Backtracking algorithm [2,14,15,20,21,22,28,29,30,31]. This choice of values means that a small portion of the decrease predicted by the linear approximation at the current point is accepted.
  • The stopping criteria are:
    $$\| g_k \| \le 10^{-6} \quad \text{and} \quad \frac{| f(x_{k+1}) - f(x_k) |}{1 + | f(x_k) |} \le 10^{-16}.$$
  • The chosen test functions are taken from the unconstrained optimization test function collection presented in [35]. More precisely, all specific test functions that were used for this research are listed in Listing 1.
Listing 1. Test functions.
1. Extended Penalty
2. Perturbed Quadratic
3. Raydan-1
4. Diagonal 1
5. Diagonal 3
6. Generalized Tridiagonal-1
7. Diagonal 4
8. Extended Himmelblau
9. Quadr. Diag. Perturbed
10. Quadratic QF1
11. Exten. Quadr. Penalty QP1
12. Exten. Quadr. Penalty QP2
13. Quadratic QF2
14. Extended EP1
15. Arwhead
16. Almost Perturbed Quadratic
17. Engval1
18. Quartc
19. Generalized Quartic
20. LIARWHD
21. Diagonal 6
22. Tridia
23. Indef
24. Diagonal 9
25. DIXON3DQ
26. NONSCOMP
27. BIGGSB1
28. Power (Cute)
29. Hager
30. Raydan 2
Further, by $i_{p,s}$, $t_{p,s}$ and $e_{p,s}$ we denote the number of iterations, the CPU time and the number of function evaluations, respectively, needed for solving problem $p$ when solver $s$ is applied. The main observation arises from a comparison of the performance profiles, over the analyzed metrics, of the hybrid and non-hybrid versions of the same scheme. For this investigation, we chose the following three accelerated gradient methods: SM (8), ADD (9) and TADSS (14). Accordingly, we analyzed their hybrid forms HSM (17), HADD (19) and HTADSS (23) as well. So, in these tests, the solver $s$ belongs to the six-element set $\{SM, HSM, ADD, HADD, TADSS, HTADSS\}$. We specifically chose this set of comparative non-hybrid and hybrid pairs among the others mentioned in Section 2 because the selected models are the most cited of Khan's hybrid methods.
According to the benchmark presented in [34] and regarding the comparisons obtained within this paper, for each pair of comparative hybrid and non-hybrid variants we have two solvers, i.e., $n_s = 2$, where the comparative pairs are
$$\{ (SM, HSM),\ (ADD, HADD),\ (TADSS, HTADSS) \}.$$
The total number of experiments for each pair is at least $n_p = 210$. Precisely, for the pair (SM, HSM) we conducted numerical tests on 25 test functions with 11 different numbers of variables, so $n_p = 11 \cdot 25 = 275$, and the same holds for the pair (TADSS, HTADSS). The pair (ADD, HADD) included 21 test functions with 10 different numbers of variables, so in this case $n_p = 210$. In order to apply the Dolan–Moré performance profiles to the chosen accelerated and hybrid models, we use the original outcomes presented in the papers in which these models were generated [28,29,31]. Since the numerical experiments in [28,31] included 25 test functions, while in [29] the number of tested functions is 21, the total number of tests for all test functions and all three pairs of models is $n_p^{SM,HSM} + n_p^{TADSS,HTADSS} + n_p^{ADD,HADD} = 760$.
Considering the defined parameters, we can now state the performance ratios for the number of iterations, the CPU time and the number of function evaluations, respectively, where each ratio is taken within the corresponding comparative pair:
$$r_{p,s} = \frac{i_{p,s}}{\min\{ i_{p,s'} : s' \in S \}}, \qquad r_{p,s} = \frac{t_{p,s}}{\min\{ t_{p,s'} : s' \in S \}}, \qquad r_{p,s} = \frac{e_{p,s}}{\min\{ e_{p,s'} : s' \in S \}},$$
where $S \in \{ \{SM, HSM\}, \{ADD, HADD\}, \{TADSS, HTADSS\} \}$ denotes the comparative pair containing the solver $s$.
As in [34], we define the performance profile of each solver $s$, with respect to each of the three measured metrics, as
$$\rho_s(\tau) = \frac{1}{n_p}\, \mathrm{size} \{ p \in P : r_{p,s} \le \tau \},$$
which is a cumulative distribution function. In (30), the parameter $\tau \in \mathbb{R}$, while $P$ is the set of problems.
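A compact way to compute (30) from a table of raw results is sketched below (an illustrative Python sketch of the Dolan–Moré profile, not the code used for Figures 1–9; the array costs is assumed to hold, e.g., the iteration counts of the two solvers of one comparative pair):

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More performance profile rho_s(tau).
    costs[p, s] is the metric value (iterations, CPU time or evaluations)
    of solver s on problem p; np.inf marks a failure on that problem."""
    best = costs.min(axis=1, keepdims=True)          # best solver on each problem
    ratios = costs / best                            # performance ratios r_{p,s}
    n_p = costs.shape[0]
    # rho[j, s] = fraction of problems with r_{p,s} <= taus[j]
    return np.array([(ratios <= tau).sum(axis=0) / n_p for tau in taus])

# Example for one comparative pair (two solvers, four problems):
costs = np.array([[12.0, 8.0],
                  [30.0, 30.0],
                  [7.0, 9.0],
                  [100.0, 55.0]])
rho = performance_profile(costs, taus=np.linspace(1.0, 4.0, 7))
```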
In Figure 1, Figure 2 and Figure 3, we present the performance profiles of the SM and HSM methods regarding the number of iterations, the CPU time and the number of function evaluations, respectively. Comparisons between the pairs (ADD, HADD) and (TADSS, HTADSS) regarding all three tested metrics are illustrated analogously in Figures 4–9.
From Figure 1, Figure 2 and Figure 3, we clearly observe that the HSM algorithm outperforms the non-hybrid SM model with respect to the number of iterations and the CPU time metrics, while regarding the number of evaluations metric, the hybrid and non-hybrid models perform similarly.
From Figure 5 and Figure 6, we see that the hybrid accelerated double direction model shows conspicuously better features regarding the CPU time and the number of evaluations metrics. Nevertheless, concerning the number of iterations metric, its forerunner ADD is more efficient, as shown in Figure 4.
Finally, comparisons of the TADSS and HTADSS methods are displayed in Figure 7, Figure 8 and Figure 9. From these three graphs, we observe that the hybrid version of the transformed double step size method convincingly improves upon its non-hybrid counterpart. In this case, the dominance of the hybrid variant with respect to all three analyzed metrics is more than evident.
To obtain Figures 1–9, a total of 4560 numerical outcomes were included. More precisely, for the 6 analyzed methods (SM, ADD, TADSS, HSM, HADD, HTADSS) we tracked 3 metrics (the number of iterations, the CPU time and the number of function evaluations) on 25 test functions from [35] for the SM, HSM, TADSS and HTADSS solvers and on 21 test functions for the ADD and HADD solvers. For each test function, the tests were conducted for at least 10 different numbers of variables. In addition, the execution time of each test is limited by the time-limiter parameter defined in [16].

4. Conclusions

In this research, we present an overview of two classes of gradient methods: accelerated gradient descent models and their hybrid variants derived from Khan's three-term iterative rule. This is a useful retrospective of one confirmed, efficient approach to defining robust accelerated methods for solving unconstrained optimization problems. The obtained results, based on comprehensive Dolan–Moré performance profiles [34] over a total of 4560 numerical outcomes, confirm that Khan's hybridization rule is justified as an applicable technique for generating effective minimization processes. Accordingly, this research opens new possibilities for generating similar hybridization rules and applying them to accelerated gradient schemes.

Author Contributions

Conceptualization, V.R. and M.J.P.; Methodology, V.R.; Software, M.J.P.; Validation, V.R.; Formal analysis, V.R.; Investigation, M.J.P.; Resources, V.R.; Data curation, M.J.P.; Writing–original draft, M.J.P.; Writing–review & editing, V.R.; Visualization, M.J.P.; Supervision, V.R.; Project administration, M.J.P. All authors contributed equally and significantly to the writing of this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was financially supported by internal-junior project IJ-0202, Faculty of Sciences and Mathematics, University of Priština in Kosovska Mitrovica.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data results are available on request from the corresponding author.

Acknowledgments

The authors gratefully acknowledge support from the project Grant No. 174025 by the Ministry of Education and Science of Republic of Serbia.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Powell, M.J.D. A survey of numerical methods for unconstrained optimization. SIAM Rev. 1970, 12, 79–97.
  2. Andrei, N. Nonlinear Conjugate Gradient Methods for Unconstrained Optimization; Springer: Berlin/Heidelberg, Germany, 2020.
  3. Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: New York, NY, USA, 1999.
  4. Jacoby, S.L.S.; Kowalik, J.S.; Pizzo, J.T. Iterative Methods for Nonlinear Optimization Problems; Prentice-Hall: Englewood Cliffs, NJ, USA, 1977.
  5. Dennis, J.E.; Schnabel, R.B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations; Prentice-Hall: Englewood Cliffs, NJ, USA, 1983.
  6. Fletcher, R. Practical Methods of Optimization; Wiley: New York, NY, USA, 2000.
  7. Luenberger, D.G.; Ye, Y. Linear and Nonlinear Programming; Springer Science+Business Media: Berlin/Heidelberg, Germany, 2008.
  8. Sun, W.; Yuan, Y.X. Optimization Theory and Methods: Nonlinear Programming; Springer: New York, NY, USA, 2006.
  9. Shi, Z.J. Convergence of line search methods for unconstrained optimization. Appl. Math. Comput. 2004, 151, 393–405.
  10. Wolfe, P. Convergence conditions for ascent methods. SIAM Rev. 1969, 11, 226–235.
  11. Ortega, J.M.; Rheinboldt, W.C. Iterative Solution of Nonlinear Equations in Several Variables; Academic Press: Cambridge, MA, USA, 1970.
  12. Armijo, L. Minimization of functions having Lipschitz continuous first partial derivatives. Pac. J. Math. 1966, 16, 1–3.
  13. Brezinski, C. A classification of quasi-Newton methods. Numer. Algor. 2003, 33, 123–135.
  14. Stanimirović, P.S.; Miladinović, M.B. Accelerated gradient descent methods with line search. Numer. Algor. 2010, 54, 503–520.
  15. Andrei, N. An acceleration of gradient descent algorithm with backtracking for unconstrained optimization. Numer. Algor. 2006, 42, 63–173.
  16. Petrović, M.J.; Stanimirović, P.S. Accelerated Double Direction Method for Solving Unconstrained Optimization Problems. Math. Probl. Eng. 2014, 2014, 965104.
  17. Petrović, M.J. An accelerated Double Step Size method in unconstrained optimization. Appl. Math. Comput. 2015, 250, 309–319.
  18. Stanimirović, P.S.; Petrović, M.J.; Milovanović, G.V. A Transformation of Accelerated Double Step Size Method for Unconstrained Optimization. Math. Probl. Eng. 2015, 2015, 283679.
  19. Petrović, M.J.; Valjarević, D.; Ilić, D.; Valjarević, A.; Mladenović, J. An improved modification of accelerated double direction and double step-size optimization schemes. Mathematics 2022, 10, 259.
  20. Barzilai, J.; Borwein, J.M. Two-point step size gradient methods. IMA J. Numer. Anal. 1988, 8, 141–148.
  21. Miladinović, M.; Stanimirović, P.S.; Miljković, S. Scalar correction method for solving large scale unconstrained minimization problems. J. Optim. Theory Appl. 2011, 151, 304–320.
  22. Andrei, N. A new three-term conjugate gradient algorithm for unconstrained optimization. Numer. Algor. 2014, 68, 305–321.
  23. Djuranović-Miličić, N.I.; Gardašević-Filipović, M. A multi-step curve search algorithm in nonlinear optimization: Nondifferentiable convex case. Facta Univ. Ser. Math. Inform. 2010, 25, 11–24.
  24. Picard, E. Mémoire sur la théorie des équations aux dérivées partielles et la méthode des approximations successives. J. Math. Pures Appl. 1890, 6, 145–210.
  25. Mann, W.R. Mean value methods in iteration. Proc. Am. Math. Soc. 1953, 4, 506–510.
  26. Ishikawa, S. Fixed points by a new iteration method. Proc. Am. Math. Soc. 1974, 44, 147–150.
  27. Khan, S.H. A Picard–Mann hybrid iterative process. Fixed Point Theory Appl. 2013, 2013, 69.
  28. Petrović, M.J.; Rakočević, V.; Kontrec, N.; Panić, S.; Ilić, D. Hybridization of accelerated gradient descent method. Numer. Algor. 2018, 79, 769–786.
  29. Petrović, M.J.; Stanimirović, P.S.; Kontrec, N.; Mladenović, J. Hybrid modification of accelerated double direction method. Math. Probl. Eng. 2018, 2018, 1523267.
  30. Petrović, M.J. Hybridization Rule Applied on Accelerated Double Step Size Optimization Scheme. Filomat 2019, 33, 655–665.
  31. Petrović, M.J.; Rakočević, V.; Valjarević, D.; Ilić, D. A note on hybridization process applied on transformed double step size model. Numer. Algor. 2020, 85, 449–465.
  32. Ivanov, M.J.; Stanimirović, P.S.; Milovanović, G.V.; Djordjević, S.; Brajević, I. Accelerated Multi Step-Size Methods Solving Unconstrained Optimization Problems. Optim. Method Softw. 2020, 85, 449–465.
  33. Petrović, M.J.; Panić, S.; Carerević, M.M. Initial improvement of the hybrid accelerated gradient descent process. Bull. Aust. Math. Soc. 2018, 98, 331–338.
  34. Dolan, E.; Moré, J. Benchmarking optimization software with performance profiles. Math. Program. 2002, 91, 201–213.
  35. Andrei, N. An Unconstrained Optimization Test Functions Collection. Adv. Model. Optim. 2008, 10, 147–161. Available online: https://camo.ici.ro/journal/vol10/v10a10.pdf (accessed on 16 November 2022).
Figure 1. Performance profiles for the HSM and SM methods regarding the number of iterations metric.
Figure 2. Performance profiles for the HSM and SM methods regarding the CPU time metric.
Figure 3. Performance profiles for the HSM and SM methods regarding the number of function evaluations metric.
Figure 4. Performance profiles for the HADD and ADD methods regarding the number of iterations metric.
Figure 5. Performance profiles for the HADD and ADD methods regarding the CPU time metric.
Figure 6. Performance profiles for the HADD and ADD methods regarding the number of function evaluations metric.
Figure 7. Performance profiles for the HTADSS and TADSS methods regarding the number of iterations metrics.
Figure 8. Performance profiles for the HTADSS and TADSS methods regarding the CPU time metric.
Figure 9. Performance profiles for the HTADSS and TADSS methods regarding the number of function evaluations metric.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
