A Three-Dimensional Subspace Algorithm Based on the Symmetry of the Approximation Model and WYL Conjugate Gradient Method

Wang, Guoxin; Yao, Shengwei; Pei, Mingyang; Xu, Jieqiong

doi:10.3390/sym15061207

Open AccessArticle

A Three-Dimensional Subspace Algorithm Based on the Symmetry of the Approximation Model and WYL Conjugate Gradient Method

¹

School of Mathematics and Information Science, Guangxi University, Nanning 530004, China

²

Guangxi (ASEAN) Financial Research Center, Guangxi University of Finance and Economics, Nanning 530007, China

^*

Author to whom correspondence should be addressed.

Symmetry 2023, 15(6), 1207; https://doi.org/10.3390/sym15061207

Submission received: 4 May 2023 / Revised: 30 May 2023 / Accepted: 1 June 2023 / Published: 5 June 2023

(This article belongs to the Special Issue Symmetry in Optimization Theory, Algorithm and Applications)

Download

Browse Figures

Versions Notes

Abstract

:

In this paper, a three-dimensional subspace method is proposed, in which the search direction is generated by minimizing the approximation model of the objective function in a three-dimensional subspace. The approximation model of the objective function is not unique, and alternatives can be chosen between a symmetric quadratic model and a conic model by specific criteria. Moreover, the idea of a WLY conjugate gradient method is applied to characterize the change of gradient direction between adjacent iteration points. The strategy of initial stepsize and nonmonotone line search are adopted, and the global convergence of the presented algorithm is established under mild assumptions. In numerical experiments, we use a collection of 80 unconstrained optimization test problems to show the competitive performance of the presented method.

Keywords:

unconstrained optimization; subspace; conic model; global convergence

1. Introduction

Considering the following unconstrained optimization problem:

\min_{x \in ℜ^{n}} f (x),

(1)

with an initial point

x_{0}

, the following iterative formula is often used to solve (1):

x_{k + 1} = x_{k} + α_{k} d_{k},

(2)

where

x_{k}

is the kth iteration point,

α_{k} \in ℜ

is the stepsize determined by a line search procedure, and

d_{k}

is the search direction obtained by special ways.

The search direction of the conjugate gradient (CG) methods is obtained by

d_{k + 1} = - g_{k + 1} + β_{k} d_{k}, k = 0, 1, . . .,

(3)

where

d_{0} = - g_{0}

,

g_{k + 1} = \nabla f (x_{k + 1})

and

β_{k} \in ℜ

is a scalar called the conjugate gradient parameter. Corresponding to different choices for the parameter

β_{k}

, various nonlinear conjugate gradient methods have been proposed. Some classical CG methods include HS method (Hestenes and Stiefel [1]), FR method (Fletcher and Reeves [2]), PRP method (Polak et al. [3]), CD method (Fletcher [4]), LS method (Liu and Storey [5]) and DY method (Dai and Yuan [6]).

Among these mentioned CG methods, the PRP, HS and LS methods share the common numerator

g_{k + 1}^{T} y_{k}

, where

y_{k} = g_{k + 1} - g_{k}

. When the step

x_{k + 1} - x_{k}

is small, the factor

y_{k}

tends to zero, so

β_{k}

becomes small and the next search direction is essentially the steepest descent direction which can avoid jamming. Hence, these methods have better numerical performance than those lacking in

g_{k + 1}^{T} y_{k}

. However, the PRP, HS and LS methods possess weak convergence properties. Based on the PRP method, Wei et al. [7] proposed a new formula

β_{k}^{W Y L} = \frac{g_{k + 1}^{T} y_{k}^{*}}{g_{k}^{T} g_{k}},

where

y_{k}^{*} = g_{k + 1} - \frac{∥ g_{k + 1} ∥}{∥ g_{k} ∥} g_{k}

and

∥ . ∥

denotes the Euclidean norm. When step

x_{k + 1} - x_{k}

is small,

y_{k}^{*}

also tends to zero so that the WYL method can also avoid jamming. Moreover,

y_{k}^{*}

can avoid the drawback: if the gradients of

x_{k + 1}

and

x_{k}

are significantly different, the structure of

y_{k}

makes the smaller one between

g_{k + 1}

and

g_{k}

nearly insignificant so that the method cannot make full use of the information. Furthermore, the WYL method not only has nice numerical performance but also has global convergence properties. More applications and properties of

y_{k}^{*}

can be found in [7,8,9,10,11,12,13].

The procedure for determining the stepsize

α_{k}

used in (2) is usually called line search, which can be classified into an exact one and an inexact one. One of the most used inexact line search is the so-called standard Wolfe line search [14]:

f (x_{k} + α_{k} d_{k}) \leq f (x_{k}) + δ α_{k} g_{k}^{T} d_{k},

(4)

\nabla f {(x_{k} + α_{k} d_{k})}^{T} d_{k} \geq σ g_{k}^{T} d_{k},

(5)

where

0 < δ < σ < 1

. Obviously, it is a monotone procedure that seeks a suitable

α_{k}

, making the function value decrease to some extent. Zhang and Hager [15] proposed a nonmonotone version (ZH line search) that modifies condition (4) to

f (x_{k} + α_{k} d_{k}) \leq C_{k} + δ α_{k} g_{k}^{T} d_{k},

(6)

where

C_{0} = f (x_{0})

,

Q_{0} = 1

,

C_{k + 1}

and

Q_{k + 1}

are updated by

C_{k + 1} = \frac{η_{k} Q_{k} C_{k} + f (x_{k + 1})}{Q_{k + 1}}, Q_{k + 1} = η_{k} Q_{k} + 1,

(7)

where

η_{k} \in [η_{m i n}, η_{m a x}]

and

0 \leq η_{m i n} \leq η_{m a x} \leq 1

. The choice of

η_{k}

controls the degree of nonmonotonicity. Such a line search can not only overcome some drawbacks in the monotone line search, but is particularly efficient for unconstrained problems in numerical experiments [15]. In addition to ZH line search, there are many other efficient nonmonotone line search procedures, which can be found in [16,17,18,19,20,21,22,23,24,25].

The subspace technique is one of the effective means for solving large-scale optimization problems, which is receivinging more and more attention. Yuan reviewed various subspace techniques that have been used in constructing numerical methods for solving nonlinear optimization problems in [26,27]. There are many optimization methods adopting the subspace technique, such as the limited-memory quasi-Newton method [28], the subspace trust region methods [29,30], the subspace SQP method [31], and more research can be found in [32,33,34,35,36,37,38]. Moreover, the combination between subspace technique and conjugate gradient method has been extensively studied. In the earliest research (see [39]), Yuan and Stoer computed the search direction

d_{k + 1}

by minimizing the approximation quadratic model in the two dimensional subspace spanned by

g_{k + 1}

and

s_{k}

, namely

Ω_{k + 1} = S p a n {g_{k + 1}, s_{k}}

where

s_{k} = x_{k + 1} - x_{k}

, and proposed the subspace minimization conjugate gradient method (SMCG), in which

d_{k + 1}

is formed by

d_{k + 1} = t g_{k + 1} + μ s_{k},

(8)

where t and

μ

are undetermined parameters. Based on the above idea, Andrei [40] extended the subspace to

Ω_{k + 1} = S p a n {g_{k + 1}, s_{k}, y_{k}}

and exploited the acceleration scheme, finally presenting a three-term conjugate gradient method (TTS), in which

d_{k + 1} = - g_{k + 1} + μ s_{k} + ν y_{k},

and

μ

,

ν

are also scalar parameters. Inspired by Andrei, Yang et al. [41] changed the subspace into

Ω_{k + 1} = S p a n {g_{k + 1}, s_{k}, s_{k - 1}}

, and put forward the subspace three-term conjugate gradient method (STT). For the same subspace, Li et al. [42] added more parameters to the computation of search direction so that

d_{k + 1} = t g_{k + 1} + μ s_{k} + ν s_{k - 1},

and adopted the strategy of nonmonotone line search, eventually proposing the subspace minimization conjugate gradient method with nonmonotone line search (SMCG_NLS). Huo et al. [13] combined the idea of the WYL method with subspace method, then constructed the subspace

Ω_{k + 1} = S p a n {g_{k + 1}, s_{k}, y_{k}^{*}}

, in which

d_{k + 1} = t g_{k + 1} + μ s_{k} + ν y_{k}^{*},

and finally proposed a new three-dimensional subspace conjugate gradient method (TSCG).

On the other hand, Dai and Kou [43] also focused on the analysis of Yuan and Stoer, but they paid more attention to the estimate of the parameter

ρ_{k + 1} = g_{k + 1}^{T} B_{k + 1} g_{k + 1}

during the calculation of

d_{k + 1}

. They combined the Barzilai–Borwein [44] idea and provided some efficient Barzilai–Borwein conjugate gradient methods (BBCG). It is remarkable that the idea of BBCG to estimate

ρ_{k + 1}

is employed in this paper.

It is noteworthy that all of the above mentioned subspace minimization conjugate gradient methods obtain the search direction by minimizing the approximate quadratic model of objective function in the presented subspace. However, Sun and Yuan [45,46] have pointed out that, when the current iterative point is not close to the minimizer, the quadratic model may lead to a poor prediction of the minimizer if the objective function possesses strong non-quadratic behavior. Furthermore, a quadratic model does not take into account more information instead of the gradient value in the current iteration, which means that it does not have enough degrees freedom for incorporating all of the information in the iterative procedure.

Thus, the research for approximate nonquadratic model is of the essence. Up to now, many nonquadratic models have been applied to optimization problems, such as conic model, tensor model and regularization model. The conic model can incorporate more function information than the quadratic model, and its application in unconstrained optimization was first studied by Davidon [47]. A typical conic model for unconstrained optimization is

ϕ_{k + 1} (s) = \frac{g_{k + 1}^{T} s}{1 + b_{k + 1}^{T} s} + \frac{1}{2} \frac{s^{T} B_{k + 1} s}{{(1 + b_{k + 1}^{T} s)}^{2}},

which is an approximation to

f (x_{k} + s) - f (x_{k})

, and

B_{k + 1}

is a symmetric positive definite matrix approximating to the Hessian of

f (x)

at

x_{k + 1}

which satisfies the secant equation

B_{k + 1} s_{k} = y_{k}

. The vector

b_{k + 1}

is normally called the horizontal vector satisfying

1 + b_{k + 1}^{T} s > 0

. Such a conic model has been investigated by many scholars. Sorensen [48] discussed a class of conic methods called “optimization by collinear scaling” for unconstrained optimization and shown that a particular member of this algorithm class has a Q-superlinear convergence. Ariyawansa [49] modified the procedure of Sorensen [48] and established the duality between the collinear scaling DFP and BFGS methods. Sheng [50] further discussed the interpolation properties of conic model method. Di and Sun [51] proposed a trust region method for conic models to solve unconstrained optimization problems. The trust region methods based on conic model have brought about a great number of research.

Li et al. [52] paid attention to the combination of subspace method and conic model. They considered the following conic approximation model:

ϕ_{k + 1} (d) = \frac{g_{k + 1}^{T} d}{1 + b_{k + 1}^{T} d} + \frac{1}{2} \frac{d^{T} B_{k + 1} d}{{(1 + b_{k + 1}^{T} d)}^{2}}

(9)

where

\begin{matrix} b_{k + 1} & = - \frac{1 - γ_{k + 1}}{γ_{k + 1} g_{k + 1}^{T} s_{k}} g_{k + 1}, \\ γ_{k + 1} & = \frac{- g_{k}^{T} s_{k}}{\sqrt{Δ_{k + 1}} + f_{k} - f_{k - 1}}, \\ Δ_{k + 1} & = {(f_{k} - f_{k + 1})}^{2} - (g_{k + 1}^{T} s_{k}) (g_{k}^{T} s_{k}) . \end{matrix}

In the two-dimensional subspace

Ω_{k + 1} = S p a n {g_{k + 1}, s_{k}}

, they minimized the above conic model to compute the search direction, and finally developed a subspace minimization conjugate method based on the conic model (SMCG_Conic). Sun et al. [53] extended the subspace to

Ω_{k + 1} = S p a n {g_{k + 1}, s_{k}, s_{k - 1}}

and presented a three-dimensional subspace minimization conjugate gradient method based on a conic model (CONIC_CG3).

The adoption of a nonquadratic model does not mean abandoning the quadratic model; here we refer to the analysis of Yuan [54], in which a quantity

u_{k}

is defined by

u_{k} = |\frac{2 (f_{k} - f_{k + 1} + g_{k + 1}^{T} s_{k})}{s_{k}^{T} y_{k}} - 1|,

which shows the extent of how the objective function

f (x)

is close to a quadratic on the line segment between

x_{k}

and

x_{k + 1}

. Dai et al. [55] indicate that if the following condition

u_{k} \leq c_{1} o r \max {u_{k}, u_{k - 1}} \leq c_{2}

(10)

holds, where

0 < c_{1} < c_{2}

are two small constants, then they think that

f (x)

is very close to a quadratic on the line segment between

x_{k}

and

x_{k + 1}

. If the condition (10) is satisfied, then the choice of the quadratic approximation model is preferable; otherwise, the conic model is more suitable. The utilization of such a quantity can be found in [56,57].

Since the combination of subspace technique and conic model has been investigated, we come up with the idea of whether we can make further studies. SMCG_Conic and CONIC_CG3 are efficient methods that possess good theoretical and numerical results, but they only considered the two-dimensional subspace

S p a n {g_{k + 1}, s_{k}}

and three-dimensional subspace

S p a n {g_{k + 1}, s_{k}, s_{k - 1}}

, respectively. Hence, we expect to extend the application of conic model in the subspace conjugate gradient method. In addition, the WYL method with the ingredient

y_{k}^{*}

has good performance both in theory and numerical experiment, and the subspace method TSCG based on quadratic model and subspace

S p a n {g_{k + 1}, s_{k}, y_{k}^{*}}

also is an efficient method. In this paper, we investigates a new application of WYL method and subspace method based on the conic model by extending SMCG_Conic to subspace

S p a n {g_{k + 1}, s_{k}, y_{k}^{*}}

. Furthermore, we adopt the initial stepsize strategy and nonmonotone line search, under which we establish the sufficient descent property and global convergence property. Since our method is based on three-dimensional subspace and conic model, it has efficient performance in numerical experiment and can solve more problems than subspace method based on two-dimensional subspace and quadratic model.

This paper is organized as follows: in Section 2, the search directions on the subspace

Ω_{k + 1}

based on two different models are derived, and the criteria for how to choose the approximate model and search direction are presented. In Section 3, we obtain the stepsize by the strategies of initial stepsize and nonmonotone line search, and elaborate the generated algorithm. In Section 4, we give the proofs for some important lemmas of the search direction, then state the convergence performance of the generated algorithm under suitable assumptions. In Section 5, we compare the numerical results of our algorithm with another two methods.

2. The Search Direction

For an optimization method, the search direction attaches great importance to the iterative formula, and the computation of search direction is always the first job. Hence, the main content of this section is to construct the formula for the search direction. Since the approximation model is selected between conic model and quadratic model, the computation will be divided into two parts.

2.1. Conic Model

In this subsection, we will give the computation of the search direction in the case that conic model is used.

When (10) is violated, the conic model is more suitable to approximate the objective function, so we compute the search direction by minimizing the conic model in subspace

Ω_{k + 1} = S p a n {g_{k + 1}, s_{k}, y_{k}^{*}}

. We consider the subproblem

\min_{d \in Ω_{k + 1}} ϕ_{k + 1} (d),

(11)

where

ϕ_{k + 1} (d)

is the same as (9).

The discussion will be divided into three parts depending on the three different dimensions of the subspace

Ω_{k + 1}

.

Situation 1:

d i m (Ω_{k + 1}) = 3 .

In this situation, the search direction is

d_{k + 1} = t g_{k + 1} + μ s_{k} + ν y_{k}^{*} .

(12)

By substituting (12) into (11) and using the secant equation, the problem (11) turns into

\min_{(t, μ, ν)} ϕ_{k + 1} (u) = \frac{a^{T} u}{1 + c^{T} u} + \frac{1}{2} \frac{u^{T} A_{k + 1} u}{{(1 + c^{T} u)}^{2}},

(13)

where

A_{k + 1} = (\begin{matrix} ρ_{k + 1} & g_{k + 1}^{T} y_{k} & ω_{k} \\ g_{k + 1}^{T} y_{k} & s_{k}^{T} y_{k} & y_{k}^{T} y_{k}^{*} \\ ω_{k} & y_{k}^{T} y_{k}^{*} & τ_{k} \end{matrix}), a = (\begin{matrix} ∥ g_{k + 1} ∥^{2} \\ g_{k + 1}^{T} s_{k} \\ g_{k + 1}^{T} y_{k}^{*} \end{matrix}), c = (\begin{matrix} b_{k + 1}^{T} g_{k + 1} \\ b_{k + 1}^{T} s_{k} \\ b_{k + 1}^{T} y_{k}^{*} \end{matrix}), u = (\begin{matrix} t \\ μ \\ ν \end{matrix}),

and

ρ_{k + 1} = g_{k + 1}^{T} B_{k + 1} g_{k + 1}, ω_{k} = g_{k + 1}^{T} B_{k + 1} y_{k}^{*}, τ_{k} = {y_{k}^{*}}^{T} B_{k + 1} y_{k}^{*} .

To solve (13), we first figure out the zero solution of the function

\nabla ϕ_{k + 1} (u) = \frac{1}{1 + c^{T} u} (I - \frac{c u^{T}}{1 + c^{T} u}) (a + \frac{A_{k + 1} u}{1 + c^{T} u}),

and obviously

I - \frac{c u^{T}}{1 + c^{T} u}

is invertible, so the problem is reduced to

a + \frac{A_{k + 1} u}{1 + c^{T} u} = 0

. After calculation, we can deduce that, when

A_{k + 1}

is positive definite and

1 + c^{T} A_{k + 1}^{- 1} a \neq 0

, the solution of (13) is

u_{k + 1} = \frac{- A_{k + 1}^{- 1} a}{1 + c^{T} A_{k + 1}^{- 1} a} .

(14)

By using the relationship

A_{k + 1}^{- 1} = \frac{A_{k + 1}^{*}}{| A_{k + 1} |}

, where

\begin{matrix} A_{k + 1}^{*} = (\begin{matrix} X & θ_{1} & θ_{2} \\ θ_{1} & θ & θ_{3} \\ θ_{2} & θ_{3} & Y \end{matrix}) \end{matrix}

(15)

is the adjugate matrix of

A_{k + 1}

, and

X, θ_{1}, θ_{2}, θ, θ_{3}, Y

denote the cofactor of

A_{k + 1}^{11}, A_{k + 1}^{21}

,

A_{k + 1}^{31}, A_{k + 1}^{22}, A_{k + 1}^{32}, A_{k + 1}^{33}

, respectively, in which

A_{k + 1}^{i j}

denotes the ij-th element of

A_{k + 1}

and

\begin{matrix} X & = (s_{k}^{T} y_{k}) τ_{k} - {(y_{k}^{T} y_{k}^{*})}^{2}, \\ θ_{1} & = (y_{k}^{T} y_{k}^{*}) ω_{k} - (g_{k + 1}^{T} y_{k}) τ_{k}, \\ θ_{2} & = (g_{k + 1}^{T} y_{k}) (y_{k}^{T} y_{k}^{*}) - (s_{k}^{T} y_{k}) ω_{k}, \\ θ & = ρ_{k + 1} τ_{k} - ω_{k}^{2}, \\ θ_{3} & = (g_{k + 1}^{T} y_{k}) ω_{k} - ρ_{k + 1} (y_{k}^{T} y_{k}^{*}), \\ Y & = ρ_{k + 1} (s_{k}^{T} y_{k}) - {(g_{k + 1}^{T} y_{k})}^{2}, \end{matrix}

then we finally obtain the solution of (13),

u_{k + 1} = (\begin{matrix} t_{k + 1} \\ μ_{k + 1} \\ ν_{k + 1} \end{matrix}) = - \frac{1}{D_{k + 1}} (\begin{matrix} X & θ_{1} & θ_{2} \\ θ_{1} & θ & θ_{3} \\ θ_{2} & θ_{3} & Y \end{matrix}) (\begin{matrix} ∥ g_{k + 1} ∥^{2} \\ g_{k + 1}^{T} s_{k} \\ g_{k + 1}^{T} y_{k}^{*} \end{matrix}) = \frac{1}{D_{k + 1}} (\begin{matrix} - q_{1} \\ - q_{2} \\ - q_{3} \end{matrix}),

(16)

where

\begin{matrix} | A_{k + 1} | & = ρ_{k + 1} X + θ_{1} g_{k + 1}^{T} y_{k} + θ_{2} ω_{k}, \\ D_{k + 1} & = | A_{k + 1} | + q_{1} b_{k + 1}^{T} g_{k + 1} + q_{2} b_{k + 1}^{T} s_{k} + q_{3} b_{k + 1}^{T} y_{k}^{*}, \\ q_{1} & = X ∥ g_{k + 1} ∥^{2} + θ_{1} g_{k + 1}^{T} s_{k} + θ_{2} g_{k + 1}^{T} y_{k}^{*}, \\ q_{2} & = θ_{1} {∥ g_{k + 1} ∥}^{2} + θ g_{k + 1}^{T} s_{k} + θ_{3} g_{k + 1}^{T} y_{k}^{*}, \\ q_{3} & = θ_{2} {∥ g_{k + 1} ∥}^{2} + θ_{3} g_{k + 1}^{T} s_{k} + Y g_{k + 1}^{T} y_{k}^{*} . \end{matrix}

So the solution of (11) in the three-dimensional subspace

Ω_{k + 1} = S p a n {g_{k + 1}, s_{k}, y_{k}^{*}}

is

d_{k + 1} = (\begin{matrix} g_{k + 1} & s_{k} & y_{k}^{*} \end{matrix}) u_{k + 1} .

During evaluation, it is important to find appropriate ways to estimate

ρ_{k + 1}, ω_{k}

and

τ_{k}

in order to avoid matrix–vector multiplication and improve efficiency. Yuan and Stoer [39] proposed two ways to calculate such quantities containing

B_{k + 1}

, one of which is to obtain

B_{k + 1}

by using the scaled memoryless BFGS formula. The approach of Dai and Kou [43] is to combine the Barzilai–Borwein [44] idea by approximating the Hessian by

(1 / α_{k + 1}^{B B_{1}}) I

or

(1 / α_{k + 1}^{B B_{2}}) I

, where

\begin{matrix} α_{k + 1}^{B B_{1}} = \frac{∥ s_{k} ∥^{2}}{s_{k}^{T} y_{k}}, α_{k + 1}^{B B_{2}} = \frac{s_{k}^{T} y_{k}}{∥ y_{k} ∥^{2}} . \end{matrix}

For

ω_{k}

, we utilize the memoryless BFGS formula to obtain

B_{k + 1}

so that

\begin{matrix} ω_{k} & = g_{k + 1}^{T} (I + \frac{y_{k} y_{k}^{T}}{s_{k}^{T} y_{k}} - \frac{s_{k} s_{k}^{T}}{s_{k}^{T} s_{k}}) y_{k}^{*} \\ = g_{k + 1}^{T} y_{k}^{*} + \frac{g_{k + 1}^{T} y_{k} y_{k}^{T} y_{k}^{*}}{s_{k}^{T} y_{k}} - \frac{g_{k + 1}^{T} s_{k} s_{k}^{T} y_{k}^{*}}{s_{k}^{T} s_{k}} . \end{matrix}

(17)

Then for

τ_{k}

, we combine the idea of [42,43], and estimate

τ_{k}

by

τ_{k} = ζ_{k} \frac{∥ y_{k} ∥^{2}}{s_{k}^{T} y_{k}} {∥ y_{k}^{*} ∥}^{2},

(18)

where

ζ_{k} = \{\begin{matrix} \max {0.9 ζ_{k - 1}, 1.2}, & if α_{k} > 1.0, \\ \min {1.1 ζ_{k - 1}, 1.75}, & otherwise, \end{matrix}

and

ζ_{0} = 1.5

. It is obvious that

ζ_{k} \in [1.2, 1.75]

.

For the estimate of

ρ_{k + 1}

, we adopt the idea of Li et al. [42], because it can guarantee some good properties. Firstly, according to the analysis regarding (14), the positive definitiveness of

A_{k + 1}

is essential, which requires

| A_{k + 1} | > 0

. It follows that

ρ_{k + 1} > \frac{- θ_{1} g_{k + 1}^{T} y_{k} - θ_{2} ω_{k}}{X} .

(19)

By setting

X = m_{k} s_{k}^{T} y_{k} τ_{k}

with

m_{k} ≜ 1 - \frac{{(y_{k}^{T} y_{k}^{*})}^{2}}{s_{k}^{T} y_{k} τ_{k}}

, and combining (15), we have

ρ_{k + 1} > [\frac{{(g_{k + 1}^{T} y_{k})}^{2}}{s_{k}^{T} y_{k}} + \frac{ω_{k}^{2}}{τ_{k}} - 2 \frac{g_{k + 1}^{T} y_{k} ω_{k} y_{k}^{T} y_{k}^{*}}{s_{k}^{T} y_{k} τ_{k}}] / m_{k} ≜ n_{k},

(20)

if and only if

m_{k}

is positive, which can be guaranteed by (18). In addition, the positive definitiveness of

A_{k + 1}

also requires that its first and second order leading principal minors are positive, and it means

ρ_{k + 1} > \frac{{(g_{k + 1}^{T} y_{k})}^{2}}{s_{k}^{T} y_{k}} .

(21)

Secondly,

D_{k + 1} > 0

is also a necessary condition to keep the sufficient descent property, from which follows

ρ_{k + 1} > \frac{S_{k}}{s_{k}^{T} y_{k} τ_{k}} / M_{k},

(22)

if

M_{k} > 0

, where

\begin{matrix} S_{k} = - θ_{1} g_{k + 1}^{T} y_{k} - θ_{2} ω_{k} & - b_{k + 1}^{T} g_{k + 1} (X ∥ g_{k + 1} ∥^{2} + θ_{1} g_{k + 1}^{T} s_{k} + θ_{2} g_{k + 1}^{T} y_{k}^{*}) \\ - b_{k + 1}^{T} s_{k} (θ_{1} ∥ g_{k + 1} ∥^{2} - ω_{k}^{2} g_{k + 1}^{T} s_{k} + ω_{k} g_{k + 1}^{T} y_{k} g_{k + 1}^{T} y_{k}^{*}) \\ - b_{k + 1}^{T} y_{k}^{*} (θ_{2} ∥ g_{k + 1} ∥^{2} + ω_{k} g_{k + 1}^{T} y_{k} g_{k + 1}^{T} s_{k} - {(g_{k + 1}^{T} y_{k})}^{2} g_{k + 1}^{T} y_{k}^{*}), \end{matrix}

and

M_{k} = 1 - \frac{{(y_{k}^{T} y_{k}^{*})}^{2}}{s_{k}^{T} y_{k} τ_{k}} + \frac{1 - γ_{k + 1}}{γ_{k + 1}} [2 \frac{g_{k + 1}^{T} y_{k}^{*} y_{k}^{T} y_{k}^{*}}{s_{k}^{T} y_{k} τ_{k}} - \frac{g_{k + 1}^{T} s_{k}}{s_{k}^{T} y_{k}} - \frac{{(g_{k + 1}^{T} y_{k}^{*})}^{2}}{g_{k + 1}^{T} s_{k} τ_{k}}] .

Likewise, we define the right-hand side of (22) as

N_{k}

, i.e.,

N_{k} ≜ \frac{S_{k}}{s_{k}^{T} y_{k} τ_{k}} / M_{k}

.

After the above discussion, we can estimate

ρ_{k + 1}

as follows,

ρ_{k + 1} = ζ_{k} \max {K, N_{k}, n_{k}},

(23)

where

K = K_{1} {∥ g_{k + 1} ∥}^{2}

,

K_{1} = \max \{\frac{∥ y_{k} ∥^{2}}{s_{k}^{T} y_{k}}, |\frac{1 - γ_{k + 1}}{γ_{k + 1}}| \frac{∥ g_{k + 1} ∥^{2}}{| g_{k + 1}^{T} s_{k} |}\},

and

ζ_{k}

is the same as that in (18). One of the purposes of K is to ensure (21).

Before computing

d_{k + 1}

by (12) and (16), we should verify the following conditions:

Δ_{k + 1} \geq 0,

(24)

M_{k} \geq ρ_{0},

(25)

ξ_{1} \leq \frac{s_{k}^{T} y_{k}}{∥ s_{k} ∥^{2}} \leq \frac{∥ y_{k} ∥^{2}}{s_{k}^{T} y_{k}} \leq ξ_{2},

(26)

|\frac{1 - γ_{k + 1}}{γ_{k + 1}}| \frac{∥ g_{k + 1} ∥^{2}}{| g_{k + 1}^{T} s_{k} |} \leq ξ_{3},

(27)

\frac{| A_{k + 1} |}{s_{k}^{T} y_{k} τ_{k} ρ_{k + 1}} \geq ξ_{5},

(28)

where

ρ_{0} \in (0, 1)

and

ξ_{1}, ξ_{2}, ξ_{3}, ξ_{5}

are positive constants. The expressions (24) and (25) are fundamental premises of the conic model (5) and relation (22), respectively. On the basis of the Barzilai–Borwein [44] idea, (26) might indicate the suitable condition numbers of the approximation Hessian matrix. Moreover, (27) is vital to guarantee the descent property of the search direction. As for (28), obviously it makes

A_{k + 1}

more positive definite, and is also important for establishing the sufficient descent property of the search direction.

For convenience, we call (24)–(28) the first conic model conditions. Hence, if the first conic model conditions hold, then we compute the search direction by (12) and (16).

Situation 2:

d i m (Ω_{k + 1}) = 2

or 1.

Li et al. [52] have made a deep study of the subspace conjugate gradient method based on the conic model in this case. Here we refer to their works. When

d i m (Ω_{k + 1}) = 2

, the search direction is formed by

d_{k + 1} = t g_{k + 1} + μ s_{k},

which is the same as (8). The solution of t and

μ

is

(\begin{matrix} t_{k + 1} \\ μ_{k + 1} \end{matrix}) = \frac{1}{{\bar{D}}_{k + 1}} (\begin{matrix} - c_{1} \\ - c_{2} \end{matrix}),

(29)

which is the same as (13) in [52].

In this case, conditions (24) and (26) are still essential, but before obtaining

d_{k + 1}

by (8) and (29), the following conditions are also necessary:

\begin{matrix} {\bar{m}}_{k} & \geq {\bar{ρ}}_{0}, \end{matrix}

(30)

\begin{matrix} |\frac{1 - γ_{k + 1}}{γ_{k + 1}}| \frac{∥ g_{k + 1} ∥^{2}}{| g_{k + 1}^{T} s_{k} |} & \leq ξ_{3}, if \frac{1 - γ_{k + 1}}{γ_{k + 1} g_{k + 1}^{T} s_{k}} < 0, \end{matrix}

(31)

\begin{matrix} \frac{∥ g_{k + 1} ∥^{2} ∥ y_{k} ∥ ∥ s_{k} ∥}{{(g_{k + 1}^{T} s_{k})}^{2}} & \leq ξ_{4}, if \frac{1 - γ_{k + 1}}{γ_{k + 1} g_{k + 1}^{T} s_{k}} \geq 0, \end{matrix}

(32)

where

{\bar{ρ}}_{0} \in (0, 1)

,

ξ_{4}

is a positive constant and

ξ_{3}

is identical to that in (27). We call (24), (26) and (30)–(32) the second conic model conditions. If the first conic model conditions do not all hold but the second conic model conditions hold, we compute the search direction by (8) and (29).

If the first and second conic model conditions are violated but (33) and (34) hold,

ϑ_{1} \leq \frac{s_{k}^{T} y_{k}}{∥ s_{k} ∥^{2}},

(33)

\frac{| g_{k + 1}^{T} y_{k} g_{k + 1}^{T} d_{k} |}{d_{k}^{T} y_{k} {∥ g_{k + 1} ∥}^{2}} \leq ϑ_{4},

(34)

where

ϑ_{4} \in [0, 1)

and

ϑ_{1}

are positive constants, the HS direction is considered, i.e.,

d_{k + 1} = - g_{k + 1} + β_{k}^{H S} d_{k} .

(35)

We call (33) and (34) the HS conditions.

Therefore, there are two choices of search direction when

d i m (Ω_{k + 1}) = 2

: one is to compute

d_{k + 1}

by (8) and (29) if the second conic model conditions hold; the other is to compute

d_{k + 1}

by (35) if the HS conditions hold. Otherwise, we use the negative gradient direction

- g_{k + 1}

as our search direction.

2.2. Quadratic Model

The content of this subsection is to calculate the search direction in the case that the objective function is approximated by a quadratic model.

When the condition (10) holds, the quadratic approximation model seems to perform well in approximating the objective function, so we consider the subproblem based on the quadratic approximation model as follows:

\min_{d \in Ω_{k + 1}} ψ_{k + 1} (d) = g_{k + 1}^{T} d + \frac{1}{2} d^{T} B_{k + 1} d,

(36)

and related work can be found in [13]. The corresponding results can be applied to this situation. We state the concrete results in the following.

Firstly, there are some necessary conditions:

ϑ_{1} \leq \frac{s_{k}^{T} y_{k}}{∥ s_{k} ∥^{2}} \leq \frac{∥ y_{k} ∥^{2}}{s_{k}^{T} y_{k}} \leq ϑ_{2},

(37)

ϑ_{1} \leq \frac{τ_{k}}{∥ y_{k}^{*} ∥^{2}}, \frac{4 ∥ y_{k} ∥^{4} {∥ y_{k}^{*} ∥}^{2}}{{(s_{k}^{T} y_{k})}^{2} τ_{k}} \leq ϑ_{2},

(38)

\frac{∥ s_{k} ∥^{2}}{∥ g_{k + 1} ∥^{2}} \geq ϑ_{3},

(39)

we call (37)–(39) the quadratic model conditions.

When the quadratic model conditions hold,

d_{k + 1} = t_{k + 1}^{1} g_{k + 1} + μ_{k + 1}^{1} s_{k} + ν_{k + 1}^{1} y_{k}^{*},

(40)

where

\begin{matrix} (\begin{matrix} t_{k + 1}^{1} \\ μ_{k + 1}^{1} \\ ν_{k + 1}^{1} \end{matrix}) = - \frac{1}{| A_{k + 1} |} (\begin{matrix} X & θ_{1} & θ_{2} \\ θ_{1} & θ & θ_{3} \\ θ_{2} & θ_{3} & Y \end{matrix}) (\begin{matrix} ∥ g_{k + 1} ∥^{2} \\ g_{k + 1}^{T} s_{k} \\ g_{k + 1}^{T} y_{k}^{*} \end{matrix}) . \end{matrix}

When the quadratic model conditions do not all hold but (37) holds,

d_{k + 1} = t_{k + 1}^{2} g_{k + 1} + μ_{k + 1}^{2} s_{k},

(41)

where

\begin{matrix} (\begin{matrix} t_{k + 1}^{2} \\ μ_{k + 1}^{2} \end{matrix}) = \frac{1}{| {\bar{A}}_{k + 1} |} (\begin{matrix} g_{k + 1}^{T} y_{k} g_{k + 1}^{T} s_{k} - {∥ g_{k + 1} ∥}^{2} s_{k}^{T} y_{k} \\ ∥ g_{k + 1} ∥^{2} g_{k + 1}^{T} y_{k} - ρ_{k + 1} g_{k + 1}^{T} s_{k} \end{matrix}) . \end{matrix}

{\bar{A}}_{k + 1} = (\begin{matrix} ρ_{k + 1} & g_{k + 1}^{T} y_{k} \\ g_{k + 1}^{T} y_{k} & s_{k}^{T} y_{k} \end{matrix}) .

When (37) does not holds but the HS conditions hold,

d_{k + 1} = - g_{k + 1} + β_{k}^{H S} d_{k} .

Otherwise,

d_{k + 1} = - g_{k + 1} .

To sum up, the generated directions possess several forms as follows:

: when the approximation conic model is selected, i.e., (10) is violated,
-: $d_{k + 1}$ is calculated by (12) and (16), if the first conic model conditions hold,
-: $d_{k + 1}$ is calculated by (8) and (29), if the second conic model conditions hold,
-: $d_{k + 1}$ is calculated by (35), if the HS conditions hold;
: when the approximation quadratic model is selected, i.e., (10) holds,
-: $d_{k + 1}$ is calculated by (12) and (40), if the quadratic model conditions hold,
-: $d_{k + 1}$ is calculated by (8) and (41), if only (37) holds,
-: $d_{k + 1}$ is calculated by (35), if the HS conditions hold;
: otherwise, $d_{k + 1} = - g_{k + 1}$ .

3. The Stepsize and Algorithm

In this section, we will present the obtainment of another essential ingredient in the iterative formula of optimization method, i.e., stepsize. It requires two steps: the first is the choice of initial stepsize, and the second is the line search procedure. After these discussion, then we can propose our whole algorithm.

3.1. Strategy for the Initial Stepsize

Firstly, we show how to choose the initial stepsize.

It is generally acknowledged that the initial stepsize is of great significance for an optimization method, especially for the conjugate gradient method. For Newton and quasi-Newton methods, the choice of the initial trial stepsize may always be unit step

α^{0} = 1

. However, it is different for methods that do not produce well-scaled search directions, such as the steepest descent or the conjugate gradient methods. Thus, it is significant to make a reasonable initial guess of the stepsize by considering the current information about the objective function and algorithm for such methods [58,59]. Many strategies of the initial stepsize have been proposed which can be found in the achievements of Hager and Zhang [60], Nocedal and Wright [58], Dai and Kou [19], Liu and Liu [57].

In the strategy of [19], Dai and Kou presented a condition,

\frac{| φ_{k + 1} (α_{k + 1}^{0}) - φ_{k + 1} (0) |}{ε_{1} + | φ_{k + 1} (0) |} \leq ε_{2},

(42)

where

ε_{1}, ε_{2}

are small positive constants,

α_{k + 1}^{0}

denotes the initial trial stepsize, and

φ_{k + 1} (α) = f (x_{k + 1} + α d_{k + 1})

. If condition (42) holds, it implies that the points

x_{k + 1} + α_{k + 1}^{0} d_{k + 1}

and

x_{k + 1}

are not far away from each other, so it is reasonable to use the minimizer of the quadratic interpolation function for

φ_{k + 1} (0)

,

φ_{k + 1}^{^{'}} (0)

and

φ_{k + 1} (α_{k + 1}^{0})

as the new initial stepsize, where

φ_{k + 1}^{^{'}} (0)

denotes the first derivative of

φ_{k + 1} (0)

.

In this paper, the selection of the initial stepsize has two parts, depending on whether the negative gradient direction is adopted or not, and is presented via modification of that in Liu and Liu [56].

When the search direction is negative gradient direction, according to the analysis of Liu and Liu [57], it is desirable to take the initial trial stepsize by

{\bar{\bar{α}}}_{k + 1} = \{\begin{matrix} \max {\min {s_{k}^{T} y_{k} / ∥ y_{k} ∥^{2}, λ_{m a x}}, λ_{m i n}}, & if g_{k + 1}^{T} s_{k} > 0, \\ \max {\min {∥ s_{k} ∥^{2} / s_{k}^{T} y_{k}, λ_{m a x}}, λ_{m i n}}, & if g_{k + 1}^{T} s_{k} \leq 0, \end{matrix}

(43)

where

λ_{m i n}

and

λ_{m a x}

are two positive constants controlling the initial stepsize within the interval

[λ_{m i n}, λ_{m a x}]

which is preferable in numerical experiments.

Andrei [61] thinks that, the more accurate the step length is, the faster convergence a conjugate gradient algorithm has. It therefore makes sense to verify if the initial trial stepsize

{\bar{\bar{α}}}_{k + 1}

satisfies (42) or not. If so and

d_{k} \neq - g_{k}

,

∥ g_{k + 1} ∥^{2} \leq 1

, we update the initial stepsize by

{\tilde{α}}_{k + 1} = \max {\min {{\tilde{\tilde{α}}}_{k + 1}, λ_{m a x}}, λ_{m i n}},

where

{\tilde{\tilde{α}}}_{k + 1} = \min q (φ_{k + 1} (0), φ_{k + 1}^{^{'}} (0), φ_{k + 1} ({\bar{\bar{α}}}_{k + 1})),

where

q (φ_{k + 1} (0), φ_{k + 1}^{^{'}} (0), φ_{k + 1} ({\bar{\bar{α}}}_{k + 1}))

denotes the quadratic interpolation function for

φ_{k + 1} (0)

,

φ_{k + 1}^{^{'}} (0)

and

φ_{k + 1} ({\bar{\bar{α}}}_{k + 1})

.

Therefore, the initial stepsize for the negative gradient direction is

α_{k + 1}^{0} = \{\begin{matrix} {\tilde{α}}_{k + 1}, & if (42) holds, d_{k} \neq - g_{k}, {∥ g_{k + 1} ∥}^{2} < 1 and {\tilde{\tilde{α}}}_{k + 1} > 0, \\ {\bar{\bar{α}}}_{k + 1}, & otherwise . \end{matrix}

(44)

When it comes to the search direction that is not negative gradient direction, the similarity of the calculation between our direction and that in quasi-Newton methods implies that the unit stepsize

α_{k + 1}^{0} = 1

might be a reasonable initial trial stepsize. Again we figure out the minimizer of the quadratic interpolation function

{\bar{α}}_{k + 1} = \min q (φ_{k + 1} (0), φ_{k + 1}^{^{'}} (0), φ_{k + 1} (1)) .

If

α_{k + 1}^{0} = 1

satisfies (42) and

{\bar{α}}_{k + 1} > 0

, we update the initial stepsize by

{\hat{α}}_{k + 1} = \max {\min {{\bar{α}}_{k + 1}, λ_{m a x}}, λ_{m i n}} .

Therefore, the initial stepsize for the search direction except negative gradient direction is

α_{k + 1}^{0} = \{\begin{matrix} {\hat{α}}_{k + 1}, & if (42) holds and {\bar{α}}_{k + 1} > 0, \\ 1, & otherwise . \end{matrix}

(45)

3.2. The Nonmonotone Line Search

Secondly, we introduce the line search used in our algorithm.

For the variable

η_{k}

in ZH line search, Zhang and Hager [15] proved that if

η_{m a x} = 1

, then the generated sequence

{x_{k}}

only has the property that

\underset{k \to \infty}{\lim \inf} ∥ g_{k} ∥ = 0 .

Liu and Liu [57] presented a formula of

η_{k}

,

η_{k} = \{\begin{matrix} c, mod (k, n) = n - 1, \\ 1, mod (k, n) \neq n - 1, \end{matrix}

where

0 < c < 1

and

\mod (k, n)

denotes the residue for k modulo n, and it resulted in a better convergence.

Referring to the above study, Li et al. [52] and Sun et al. [53] made some modifications so that the improved line search is more appropriate for their algorithms.

In order to gain the decent convergence result and performance, this paper modifies the improved ZH line search used in [53]. To be specific, we set

\begin{matrix} C_{k + 1} = & \{\begin{matrix} f_{k + 1} + \min {1, 0.9 (C_{k} - f_{k + 1})}, & k < 5, \\ η_{k} Q_{k} C_{k} + f (x_{k + 1})] / Q_{k + 1}, & k \geq 5, \end{matrix} \end{matrix}

(46)

\begin{matrix} Q_{k + 1} = & \{\begin{matrix} 6, & k = 4, \\ η_{k} Q_{k} + 1, & k \geq 5, \end{matrix} \end{matrix}

(47)

where

η_{k} = \{\begin{matrix} η, mod (k, n) = 0, \\ 1, mod (k, n) \neq 0, \end{matrix}

(48)

where

η = 0.999

.

3.3. Algorithm

In this subsection, we will detail our new three-dimensional subspace method based on conic model.

First, we incorporate a special restart technique proposed by Dai and Kou [19]. They defined a quantity

r_{k} = \frac{2 (f_{k + 1} - f_{k})}{α_{k} (g_{k}^{T} d_{k} + g_{k + 1}^{T} d_{k})},

where

f_{k + 1} = φ_{k} (α_{k})

and

f_{k} = φ_{k} (0)

. If

r_{k}

is close to 1, they think that the line search function

φ_{k}

is close to some quadratic function. According to their analysis, the exact approach of this quantity is that if there are continuously many iterations such that

r_{k}

is close to 1, we restart the algorithm with the negative gradient direction. In addition, if the number of the iterations since the last restart reaches the MaxRestart threshold, we also restart out algorithm.

The details of the three-dimensional subspace conjugate gradient method based on conic model are given in the next algorithm, which is Algorithm 1.

Algorithm 1 WYL_TSCO

Require: initial point

x_{0}

, initial stepsize

α_{0}^{0}

, positive constants

ϵ, ϵ_{1}, ϵ_{2}, 0 < δ < σ < 1,

ξ_{1}, ξ_{2}, ξ_{3}, ξ_{4}, ξ_{5}, ϑ_{1}, ϑ_{2}, ϑ_{3}, ϑ_{4},

ρ_{0}, {\bar{ρ}}_{0} \in (0, 1), λ_{1}, λ_{2}, λ_{m i n}, λ_{m a x}

Ensure: optimal

x^{*}

1:

Set MaxRestart: = 4n, IterRestart: = 0, IterQuad: = 0, MinQuad: = 3, Numnongrad: = 0,

C_{0} = f_{0}, d_{0} = - g_{0}

and k: = 0.

2:

If

∥ g_{0} ∥_{\infty} \leq ϵ

, stop.

3:

Calculate the stepsize

α_{k}

by (5) and (6) with

α_{k}^{0}

.

4:

Update

x_{k + 1} = x_{k} + α_{k} d_{k}

. If

∥ g_{k + 1} ∥_{\infty} \leq ϵ

, stop; otherwise, let IterRestart: = IterRestart +1. If

| r_{k} - 1 | \leq 10^{- 8}

or

| f_{k + 1} - f_{k} - 0.5 (g_{k + 1}^{T} s_{k} + g_{k}^{T} s_{k}) | \leq 6.0 \times 10^{- 8}

, IterQuad: = IterQuad+1; else, IterQuad: = 0.

5:

Calculate the search direction

d_{k}

. If Numnongrad = MaxRestart, or (IterQuad = MinQuad and IterRestart ≠IterQuad), go to 5.6; else if the condition (10) holds, go to 5.1; else, go to 5.3.

5.1:: If the quadratic model conditions hold, compute $d_{k + 1}$ by (12) and (40), set Numnongrad: = Numnongrad+1; else, go to 5.2.
5.2:: If condition (37) holds, compute $d_{k + 1}$ by (8) and (41), set Numnongrad: = Numnongrad+1; else, go to 5.5.
5.3:: If the first conic model conditions hold, compute $d_{k + 1}$ by (12) and (16), set Numnongrad: = Numnongrad+1; else, go to 5.4.
5.4:: If the second conic model conditions, compute $d_{k + 1}$ by (8) and (29), set Numnongrad: = Numnongrad+1; else, go to 5.5.
5.5:: If the HS conditions hold, compute $d_{k + 1}$ by (35), set Numnongrad: = Numnongrad +1; else, go to 5.6.
5.6:: Compute $d_{k + 1} = - g_{k + 1}$ , set Numnongrad: = 0 and IterRestart: = 0.

6:

If

d_{k + 1} = - g_{k + 1}

, calculate

α_{k + 1}^{0}

by (44); otherwise, calculate

α_{k + 1}^{0}

by (45).

7:

Update

Q_{k + 1}

and

C_{k + 1}

by (46)–(48).

8:

Set

k : = k + 1

, go to Step 3.

4. Properties of the Proposed Algorithm

The theoretical nature is an important criterion to determine the performance of an algorithm. In this section, we will give some properties of WYL_TSCO, including the sufficient descent property and the global convergence.

4.1. Some Lemmas

This subsection will make a deep discussion on our algorithm and prove some properties of the presented directions before establishing the global convergence. At first, we propose two assumptions as follows:

Assumption A1.

The objective function

f (x)

is continuously differentiable and bounded from below on

ℜ^{n}

.

Assumption A2.

The gradient function

g (x)

is Lipschitz continuous on the level set

D = {x \in ℜ^{n} : f (x) \leq f (x_{0})}

, which means that there exists a positive constant

L > 0

satisfying

∥ g (x) - g (y) ∥ \leq L ∥ x - y ∥, \forall x, y \in D,

which implies that

∥ y_{k} ∥ \leq L ∥ s_{k} ∥

.

With these assumptions, we can prove some properties of the new directions in the following.

Lemma 1.

For the search direction

d_{k + 1}

calculated by WYL_TSCO, there exists a constant

ς_{1} > 0

such that

g_{k + 1}^{T} d_{k + 1} \leq - ς_{1} {∥ g_{k + 1} ∥}^{2} .

(49)

Proof.

We will discuss in several parts based on different situations and different approximation models.

Case I: if the negative direction is adopted, i.e.,

d_{k + 1} = - g_{k + 1}

, then

g_{k + 1}^{T} d_{k + 1} = - ∥ g_{k + 1} ∥^{2} \leq - \frac{1}{2} {∥ g_{k + 1} ∥}^{2} .

Case II: if

d_{k + 1}

is determined by (35), i.e., the HS direction, combining (34), then we have

\begin{matrix} g_{k + 1}^{T} d_{k + 1} & = - ∥ g_{k + 1} ∥^{2} + β_{k}^{H S} g_{k + 1}^{T} d_{k} \\ \leq - ∥ g_{k + 1} ∥^{2} + \frac{| g_{k + 1}^{T} y_{k} g_{k + 1}^{T} d_{k} |}{d_{k}^{T} y_{k}} \\ \leq - ∥ g_{k + 1} ∥^{2} + ϑ_{4} {∥ g_{k + 1} ∥}^{2} \\ = - (1 - ϑ_{4}) {∥ g_{k + 1} ∥}^{2} . \end{matrix}

Case III (conic): if

d_{k + 1}

is calculated by (8) and (29), Li, Liu and Liu [52] have proved that

g_{k + 1}^{T} d_{k + 1} \leq - \bar{ς} {∥ g_{k + 1} ∥}^{2},

where

\bar{ς} = \min \{\frac{{\bar{ρ}}_{0}}{(8 - 6 {\bar{ρ}}_{0} \max {ξ_{2}, ξ_{4}})}, \frac{1}{6 \bar{c}}\},

and

\bar{c}

is a positive constant.

Case IV (conic): when

d_{k + 1}

is obtained by (12) and (16), we have

\begin{matrix} g_{k + 1}^{T} d_{k + 1} & = {(\begin{matrix} ∥ g_{k + 1} ∥^{2} \\ g_{k + 1}^{T} s_{k} \\ g_{k + 1}^{T} y_{k}^{*} \end{matrix})}^{T} (\begin{matrix} t_{k + 1} \\ μ_{k + 1} \\ ν_{k + 1} \end{matrix}) \\ = - \frac{1}{D_{k + 1}} {(\begin{matrix} ∥ g_{k + 1} ∥^{2} \\ g_{k + 1}^{T} s_{k} \\ g_{k + 1}^{T} y_{k}^{*} \end{matrix})}^{T} (\begin{matrix} X & θ_{1} & θ_{2} \\ θ_{1} & θ & θ_{3} \\ θ_{2} & θ_{3} & Y \end{matrix}) (\begin{matrix} ∥ g_{k + 1} ∥^{2} \\ g_{k + 1}^{T} s_{k} \\ g_{k + 1}^{T} y_{k}^{*} \end{matrix}) \\ = - \frac{∥ g_{k + 1} ∥^{4}}{D_{k + 1}} h (x, y), \end{matrix}

where

x ≜ \frac{g_{k + 1}^{T} y_{k}^{*}}{∥ g_{k + 1} ∥^{2}}

,

y ≜ \frac{g_{k + 1}^{T} s_{k}}{∥ g_{k + 1} ∥^{2}}

. Moreover,

h (x, y)

is a binary quadratic function of x and y which can be expressed as

h (x, y) = Y x^{2} + 2 θ_{3} x y + θ y^{2} + 2 θ_{2} x + 2 θ_{1} y + X .

It is easy to acquire the Hessian of

h (x, Y)

H_{h} = (\begin{matrix} 2 Y & 2 θ_{3} \\ 2 θ_{3} & 2 θ \end{matrix}),

we have

Y > 0

because

A_{k + 1}

is positive definite, and the determinant of

H_{h}

4 Y θ - 4 θ_{3}^{2} = 4 ρ_{k + 1} | A_{k + 1} |,

is also positive, so

h (x, y)

has a minimizer, that is

h {(x, y)}_{m i n} = \frac{| A_{k + 1} |}{ρ_{k + 1}} .

Therefore, we can derive

g_{k + 1}^{T} d_{k + 1} \leq - \frac{∥ g_{k + 1} ∥^{4}}{D_{k + 1}} h {(x, y)}_{m i n} \leq - \frac{| A_{k + 1} |}{D_{k + 1} ρ_{k + 1}} {∥ g_{k + 1} ∥}^{4} .

(50)

Because

ρ_{k + 1}, D_{k + 1}

and

| A_{k + 1} |

are all positive, we just need to prove that

\frac{| A_{k + 1} |}{D_{k + 1} ρ_{k + 1}} {∥ g_{k + 1} ∥}^{2}

has a lower bound. Since

D_{k + 1}

contains

ρ_{k + 1}

, we first prove that

ρ_{k + 1}

has an upper bound; that is the upper bound of

N_{k}, n_{k}

and K.

For

N_{k}

, we use the Cauchy inequality and can obtain

\begin{matrix} | N_{k} | & = |\frac{S_{k}}{s_{k}^{T} y_{k} τ_{k}}| / M_{k} \\ \leq [|\frac{1 - γ_{k + 1}}{γ_{k + 1}}| (\frac{∥ g_{k + 1} ∥^{4}}{| g_{k + 1}^{T} s_{k} |} + 4 \frac{∥ g_{k + 1} ∥^{4} ∥ y_{k} ∥^{2} {∥ y_{k}^{*} ∥}^{2}}{| g_{k + 1}^{T} s_{k} | s_{k}^{T} y_{k} τ_{k}} + 4 \frac{∥ g_{k + 1} ∥^{2} ∥ y_{k} ∥ ∥ y_{k}^{*} ∥ | ω_{k} |}{s_{k}^{T} y_{k} τ_{k}} \\ + 2 \frac{∥ g_{k + 1} ∥^{3} ∥ y_{k} ∥}{s_{k}^{T} y_{k}} + 2 \frac{∥ g_{k + 1} ∥^{3} ∥ y_{k}^{*} ∥ | ω_{k} |}{| g_{k + 1}^{T} s_{k} | τ_{k}} + \frac{∥ g_{k + 1} ∥ ∥ s_{k} ∥ ω_{k}^{2}}{s_{k}^{T} y_{k} τ_{k}}) \\ + \frac{∥ g_{k + 1} ∥^{2} {∥ y_{k} ∥}^{2}}{s_{k}^{T} y_{k}} + \frac{ω_{k}^{2}}{τ_{k}} + 2 \frac{∥ g_{k + 1} ∥ ∥ y_{k} ∥^{2} ∥ y_{k}^{*} ∥ | ω_{k} |}{s_{k}^{T} y_{k} τ_{k}}] / M_{k}, \end{matrix}

combining (17), (18) and using the Cauchy inequality again, we have

\begin{matrix} | N_{k} | \leq \frac{1}{M_{k}} {∥ g_{k + 1} ∥}^{2} [|\frac{1 - γ_{k + 1}}{γ_{k + 1}}| (\frac{∥ g_{k + 1} ∥^{2}}{| g_{k + 1}^{T} s_{k} |} + 4 \frac{∥ g_{k + 1} ∥^{2}}{| g_{k + 1}^{T} s_{k} |} + 4 \frac{2 + ξ_{2}}{ζ_{k}} \frac{∥ g_{k + 1} ∥}{∥ y_{k} ∥} \\ + 2 \frac{∥ g_{k + 1} ∥ ∥ y_{k} ∥}{s_{k}^{T} y_{k}} + 2 \frac{2 + ξ_{2}}{ζ_{k}} \frac{s_{k}^{T} y_{k}}{∥ y_{k} ∥^{2}} \frac{∥ g_{k + 1} ∥^{2}}{| g_{k + 1}^{T} s_{k} |} + \frac{{(2 + ξ_{2})}^{2}}{ζ_{k}} \frac{∥ g_{k + 1} ∥ ∥ s_{k} ∥}{∥ y_{k} ∥^{2}}) \\ + \frac{∥ y_{k} ∥^{2}}{s_{k}^{T} y_{k}} + \frac{{(2 + ξ_{2})}^{2}}{ζ_{k}} \frac{s_{k}^{T} y_{k}}{∥ y_{k} ∥^{2}} + 2 \frac{2 + ξ_{2}}{ζ_{k}}] . \end{matrix}

Under condition (26), we have

\frac{∥ y_{k} ∥ ∥ s_{k} ∥}{s_{k}^{T} y_{k}} \leq \sqrt{\frac{ξ_{2}}{ξ_{1}}}

,

\frac{∥ s_{k} ∥}{∥ y_{k} ∥} \leq \frac{1}{ξ_{1}}

, and

\frac{∥ g_{k + 1} ∥ ∥ y_{k} ∥}{s_{k}^{T} y_{k}} = \frac{∥ g_{k + 1} ∥ ∥ y_{k} ∥ | g_{k + 1}^{T} s_{k} |}{s_{k}^{T} y_{k} | g_{k + 1}^{T} s_{k} |} \leq \frac{∥ y_{k} ∥ ∥ s_{k} ∥}{s_{k}^{T} y_{k}} \frac{∥ g_{k + 1} ∥^{2}}{| g_{k + 1}^{T} s_{k} |},

and in the same way,

\frac{∥ g_{k + 1} ∥}{∥ y_{k} ∥} \leq \frac{∥ s_{k} ∥}{∥ y_{k} ∥} \frac{∥ g_{k + 1} ∥^{2}}{| g_{k + 1}^{T} s_{k} |} .

Note that

ζ_{k} \geq 1

, the above inequality of

| N_{k} |

can be simplified to

\begin{matrix} | N_{k} | \leq \frac{1}{M_{k}} {∥ g_{k + 1} ∥}^{2} [|\frac{1 - γ_{k + 1}}{γ_{k + 1}}| (5 + 4 (2 + ξ_{2}) \frac{∥ s_{k} ∥}{∥ y_{k} ∥} + 2 \frac{∥ y_{k} ∥ ∥ s_{k} ∥}{s_{k}^{T} y_{k}} + 2 (2 + ξ_{2}) \frac{s_{k}^{T} y_{k}}{∥ y_{k} ∥^{2}} \\ + {(2 + ξ_{2})}^{2} \frac{∥ s_{k} ∥^{2}}{∥ y_{k} ∥^{2}}) \frac{∥ g_{k + 1} ∥^{2}}{| g_{k + 1}^{T} s_{k} |} + {(1 + (2 + ξ_{2}) \frac{s_{k}^{T} y_{k}}{∥ y_{k} ∥^{2}})}^{2} \frac{∥ y_{k} ∥^{2}}{s_{k}^{T} y_{k}}] . \end{matrix}

Utilizing conditions (25)–(27) and the expression of K, we can obtain the upper bound of

N_{k}

,

\begin{matrix} | N_{k} | & \leq \frac{∥ g_{k + 1} ∥^{2}}{M_{k}} [(\frac{{(2 + ξ_{2})}^{2}}{ξ_{1}^{2}} + 6 \frac{(2 + ξ_{2})}{ξ_{1}} + 2 \sqrt{\frac{ξ_{2}}{ξ_{1}}} + 5) |\frac{1 - γ_{k + 1}}{γ_{k + 1}}| \frac{∥ g_{k + 1} ∥^{2}}{| g_{k + 1}^{T} s_{k} |} \\ + {(\frac{2 + ξ_{2}}{ξ_{1}} + 1)}^{2} \frac{∥ y_{k} ∥^{2}}{s_{k}^{T} y_{k}}] \\ \leq \frac{1}{ρ_{0}} (2 \frac{{(2 + ξ_{2})}^{2}}{ξ_{1}^{2}} + 8 \frac{(2 + ξ_{2})}{ξ_{1}} + 2 \sqrt{\frac{ξ_{2}}{ξ_{1}}} + 6) K_{1} {∥ g_{k + 1} ∥}^{2} \\ = (2 \frac{{(2 + ξ_{2})}^{2}}{ξ_{1}^{2}} + 8 \frac{(2 + ξ_{2})}{ξ_{1}} + 2 \sqrt{\frac{ξ_{2}}{ξ_{1}}} + 6) \frac{K}{ρ_{0}} . \end{matrix}

Next, the upper bound of

n_{k}

is acquired by the same procedure,

\begin{matrix} | n_{k} | & \leq \frac{1}{m_{k}} {∥ g_{k + 1} ∥}^{2} (\frac{∥ y_{k} ∥^{2}}{s_{k}^{T} y_{k}} + \frac{{(2 + ξ_{2})}^{2}}{ζ_{k}} \frac{s_{k}^{T} y_{k}}{∥ y_{k} ∥^{2}} + 2 \frac{(2 + ξ_{2})}{ζ_{k}}) \\ \leq \frac{1}{m_{k}} {∥ g_{k + 1} ∥}^{2} {(1 + (2 + ξ_{2}) \frac{s_{k}^{T} y_{k}}{∥ y_{k} ∥^{2}})}^{2} \frac{∥ y_{k} ∥^{2}}{s_{k}^{T} y_{k}} \\ \leq \frac{1}{1 - \frac{1}{ζ_{k}}} \frac{{(ξ_{1} + ξ_{2} + 2)}^{2}}{ξ_{1}^{2}} K_{1} {∥ g_{k + 1} ∥}^{2} \\ \leq 6 \frac{{(ξ_{1} + ξ_{2} + 2)}^{2}}{ξ_{1}^{2}} K . \end{matrix}

For

K_{1}

, it is easy to know that

K_{1} \leq \max {ξ_{2}, ξ_{3}} .

Through the above discussion, now we can give the upper bound of

ρ_{k + 1}

,

\begin{matrix} ρ_{k + 1} & = ζ_{k} \max {K, N_{k}, n_{k}} \\ \leq 2 \max \{(2 \frac{{(2 + ξ_{2})}^{2}}{ξ_{1}^{2}} + 8 \frac{(2 + ξ_{2})}{ξ_{1}} + 2 \sqrt{\frac{ξ_{2}}{ξ_{1}}} + 6) / ρ_{0}, 6 \frac{{(ξ_{1} + ξ_{2} + 2)}^{2}}{ξ_{1}^{2}}\} K_{1} {∥ g_{k + 1} ∥}^{2} \\ \leq L_{0} {∥ g_{k + 1} ∥}^{2}, \end{matrix}

where

L_{0} ≜ 2 \max \{(2 \frac{{(2 + ξ_{2})}^{2}}{ξ_{1}^{2}} + 8 \frac{(2 + ξ_{2})}{ξ_{1}} + 2 \sqrt{\frac{ξ_{2}}{ξ_{1}}} + 6) / ρ_{0}, 6 \frac{{(ξ_{1} + ξ_{2} + 2)}^{2}}{ξ_{1}^{2}}\} \max {ξ_{2}, ξ_{3}} .

Since we have found the upper bound of

ρ_{k + 1}

, then we now consider

D_{k + 1}

. According to (22),

D_{k}

can be expressed as

D_{k + 1} = s_{k}^{T} y_{k} τ_{k} ρ_{k + 1} M_{k} - S_{k} = s_{k}^{T} y_{k} τ_{k} (M_{k} ρ_{k + 1} - M_{k} N_{k}) .

(51)

Using the formula of

M_{k}

and the upper bound of

ρ_{k + 1}

, we have

\begin{matrix} | D_{k + 1} | = D_{k + 1} \leq {∥ g_{k + 1} ∥}^{2} s_{k}^{T} y_{k} τ_{k} [L_{0} + L_{0} \frac{∥ y_{k} ∥^{2} {∥ y_{k}^{*} ∥}^{2}}{s_{k}^{T} y_{k} τ_{k}} + L_{0} |\frac{1 - γ_{k + 1}}{γ_{k + 1}}| (2 \frac{∥ g_{k + 1} ∥ ∥ y_{k} ∥ ∥ y_{k}^{*} ∥^{2}}{s_{k}^{T} y_{k} τ_{k}} \\ + \frac{∥ g_{k + 1} ∥ ∥ s_{k} ∥}{s_{k}^{T} y_{k}} + \frac{∥ g_{k + 1} ∥^{2} {∥ y_{k}^{*} ∥}^{2}}{| g_{k + 1} s_{k} | τ_{k}}) + \frac{M_{k} | N_{k} |}{∥ g_{k + 1} ∥^{2}}] . \end{matrix}

Note that

\frac{∥ g_{k + 1} ∥ ∥ y_{k} ∥ ∥ y_{k}^{*} ∥^{2}}{s_{k}^{T} y_{k} τ_{k}} = \frac{∥ g_{k + 1} ∥ ∥ y_{k} ∥ ∥ y_{k}^{*} ∥^{2} | g_{k + 1} s_{k} |}{s_{k}^{T} y_{k} τ_{k} | g_{k + 1} s_{k} |} \leq \frac{∥ s_{k} ∥}{∥ y_{k} ∥} \frac{∥ g_{k + 1} ∥^{2}}{| g_{k + 1} s_{k} |},

\frac{∥ g_{k + 1} ∥ ∥ s_{k} ∥}{s_{k}^{T} y_{k}} = \frac{∥ g_{k + 1} ∥ ∥ s_{k} ∥ | g_{k + 1} s_{k} |}{s_{k}^{T} y_{k} | g_{k + 1} s_{k} |} \leq \frac{∥ s_{k} ∥^{2}}{s_{k}^{T} y_{k}} \frac{∥ g_{k + 1} ∥^{2}}{| g_{k + 1} s_{k} |},

so the above inequality of

D_{k + 1}

can be simplified to

\begin{matrix} D_{k + 1} \leq {∥ g_{k + 1} ∥}^{2} s_{k}^{T} y_{k} τ_{k} [2 L_{0} + L_{0} (2 \frac{∥ s_{k} ∥}{∥ y_{k} ∥} + \frac{∥ s_{k} ∥^{2}}{s_{k}^{T} y_{k}} + \frac{s_{k}^{T} y_{k}}{∥ y_{k} ∥^{2}}) |\frac{1 - γ_{k + 1}}{γ_{k + 1}}| \frac{∥ g_{k + 1} ∥^{2}}{| g_{k + 1} s_{k} |} \\ + (4 \frac{ξ_{2}}{ξ_{1}} + 6 \sqrt{\frac{ξ_{2}}{ξ_{1}}} + 16) K_{1}] . \end{matrix}

Finally, using (26) and (27), we can obtain

\begin{matrix} D_{k + 1} & \leq ∥ g_{k + 1} ∥^{2} s_{k}^{T} y_{k} τ_{k} [2 L_{0} + (4 \frac{L_{0}}{ξ_{1}} + 4 \frac{ξ_{2}}{ξ_{1}} + 6 \sqrt{\frac{ξ_{2}}{ξ_{1}}} + 16) K_{1}] \\ \leq ∥ g_{k + 1} ∥^{2} s_{k}^{T} y_{k} τ_{k} [2 L_{0} + (4 \frac{L_{0}}{ξ_{1}} + 4 \frac{ξ_{2}}{ξ_{1}} + 6 \sqrt{\frac{ξ_{2}}{ξ_{1}}} + 16) \max {ξ_{2}, ξ_{3}}] \\ \leq L_{1} {∥ g_{k + 1} ∥}^{2} s_{k}^{T} y_{k} τ_{k}, \end{matrix}

(52)

where

L_{1} ≜ [2 L_{0} + (4 \frac{L_{0}}{ξ_{1}} + 4 \frac{ξ_{2}}{ξ_{1}} + 6 \sqrt{\frac{ξ_{2}}{ξ_{1}}} + 16) \max {ξ_{2}, ξ_{3}}] .

With (28), (50) and (52), the sufficient descent property of

d_{k + 1}

under this case can be established by

g_{k + 1}^{T} d_{k + 1} \leq - \frac{ξ_{5}}{L_{1}} {∥ g_{k + 1} ∥}^{2} .

Case V (quadratic): when

d_{k + 1}

is generated by (8) and (41), the proof can be found in [43], in which Dai and Kou proved that

g_{k + 1}^{T} d_{k + 1} \leq - \frac{1}{2 ϑ_{2}} {∥ g_{k + 1} ∥}^{2} .

(53)

Next, if

d_{k + 1}

is generated by (12) and (40), Yang et al. [13] have drawn that

g_{k + 1}^{T} d_{k + 1} \leq - \frac{1}{16 ϑ_{2}} {∥ g_{k + 1} ∥}^{2} .

(54)

Combining (54) with (53), we finally have

g_{k + 1}^{T} d_{k + 1} \leq - \frac{1}{16 ϑ_{2}} {∥ g_{k + 1} ∥}^{2} .

After the above discussion, we can prove that there indeed exists a constant

ς_{1}

such that

g_{k + 1}^{T} d_{k + 1} \leq - ς_{1} {∥ g_{k + 1} ∥}^{2},

where

ς_{1} = \min \{\bar{ς}, \frac{1}{2}, 1 - ϑ_{4}, \frac{ξ_{5}}{L_{1}}, \frac{1}{16 ϑ_{2}}\} .

The proof is complete. □

Lemma 2.

Assume that f satisfies Assumption 2. If the search direction

d_{k + 1}

is calculated by WYL_TSCO, then there exists a constant

ς_{2} > 0

such that

∥ d_{k + 1} ∥ \leq ς_{2} ∥ g_{k + 1} ∥ .

(55)

Proof.

Similar to Lemma 1, the proof is divided into several cases.

Case I: if

d_{k + 1}

is calculated by negative gradient, then there certainly holds

∥ d_{k + 1} ∥ \leq ∥ g_{k + 1} ∥ .

Case II: if the search direction is the HS direction, according to Assumption 2 and the condition (33), we have

\begin{matrix} ∥ d_{k + 1} ∥ & = ∥ - g_{k + 1} + β_{k}^{H S} d_{k} ∥ \\ \leq ∥ g_{k + 1} ∥ + \frac{∥ g_{k + 1} ∥ ∥ y_{k} ∥ ∥ d_{k} ∥}{d_{k}^{T} y_{k}} \\ \leq (1 + \frac{L}{ϑ_{1}}) ∥ g_{k + 1} ∥ . \end{matrix}

Case III (conic): if

d_{k + 1}

is calculated by (8) and (29), from [52] we conclude that

∥ d_{k + 1} ∥ \leq (\frac{10 ξ_{2} + 5 ξ_{1} + 5 m}{{\bar{ρ}}_{0} ξ_{1}^{2}}) ∥ g_{k + 1} ∥,

where

m = 2 n_{0} \bar{ξ} / {\bar{ρ}}_{0}

with

n_{0} = \max \{4 - 3 {\bar{ρ}}_{0}, 1 + \frac{2 ξ_{2}}{ξ_{1}}\}, \bar{ξ} = \max {ξ_{2}, ξ_{3}, ξ_{4}} .

Case IV (conic): if

d_{k + 1}

is formed by (12) and (16), we have

\begin{matrix} ∥ d_{k + 1} ∥ & = ∥ t_{k + 1} g_{k + 1} + μ_{k + 1} s_{k} + ν_{k + 1} y_{k}^{*} ∥ \\ \leq \frac{1}{D_{k + 1}} (| q_{1} | ∥ g_{k + 1} ∥ + | q_{2} | ∥ s_{k} ∥ + | q_{3} | ∥ y_{k}^{*} ∥), \end{matrix}

(56)

so in order to prove Lemma 2 in this case, we first need to obtain a lower bound of

D_{k + 1}

. Combining (23), (25) and (51), and the value range of

ζ_{k}

, we can derive

\begin{matrix} D_{k + 1} & = s_{k}^{T} y_{k} τ_{k} M_{k} (ζ_{k} \max {K, N_{k}, n_{k}} - N_{k}) \\ \geq s_{k}^{T} y_{k} τ_{k} ρ_{0} (\frac{6}{5} \max {K, N_{k}, n_{k}} - N_{k}) \\ \geq s_{k}^{T} y_{k} τ_{k} ρ_{0} \frac{K}{5} . \end{matrix}

Then using the above inequality, we can transform (56) into

\begin{matrix} ∥ d_{k + 1} ∥ & \leq \frac{5}{ρ_{0} K s_{k}^{T} y_{k} τ_{k}} (| q_{1} | ∥ g_{k + 1} ∥ + | q_{2} | ∥ s_{k} ∥ + | q_{3} | ∥ y_{k}^{*} ∥) \\ \leq \frac{5}{ρ_{0} K} ∥ g_{k + 1} ∥ [∥ g_{k + 1} ∥^{2} + 4 \frac{∥ g_{k + 1} ∥^{2} ∥ y_{k} ∥^{2} {∥ y_{k}^{*} ∥}^{2}}{s_{k}^{T} y_{k} τ_{k}} + 2 \frac{∥ g_{k + 1} ∥^{2} ∥ s_{k} ∥ ∥ y_{k} ∥}{s_{k}^{T} y_{k}} \\ + 4 \frac{∥ g_{k + 1} ∥ ∥ s_{k} ∥ ∥ y_{k} ∥ ∥ y_{k}^{*} ∥ | ω_{k} |}{s_{k}^{T} y_{k} τ_{k}} + 2 \frac{∥ g_{k + 1} ∥ ∥ y_{k}^{*} ∥ | ω_{k} |}{τ_{k}} + \frac{ρ_{k + 1} {∥ s_{k} ∥}^{2}}{s_{k}^{T} y_{k}} \\ + \frac{∥ s_{k} ∥^{2} ω_{k}^{2}}{s_{k}^{T} y_{k} τ_{k}} + 2 \frac{ρ_{k + 1} ∥ s_{k} ∥ ∥ y_{k} ∥ ∥ y_{k}^{*} ∥^{2}}{s_{k}^{T} y_{k} τ_{k}} + \frac{ρ_{k + 1} {∥ y_{k}^{*} ∥}^{2}}{τ_{k}}] \\ \leq \frac{5 ∥ g_{k + 1} ∥}{ρ_{0}} \frac{∥ g_{k + 1} ∥^{2}}{K} (\frac{{(2 + ξ_{2})}^{2}}{ξ_{1}^{2}} + 6 \frac{(2 + ξ_{2})}{ξ_{1}} + 4 \frac{M_{0}}{ξ_{1}} + 2 \sqrt{\frac{ξ_{2}}{ξ_{1}}} + 5) \\ \leq \frac{∥ g_{k + 1} ∥}{ρ_{0} K_{1}} (\frac{25 ξ_{1}^{2} + 5 ξ_{2}^{2} + 30 ξ_{1} ξ_{2} + 10 ξ_{1} \sqrt{ξ_{1} ξ_{2}} + (60 + 20 M_{0}) ξ_{1} + 20 ξ_{2} + 20}{ξ_{1}^{2}}) . \end{matrix}

Since (26) implies that

K_{1} \geq ξ_{1}

, we finally obtain an upper bound of

d_{k + 1}

, that is

\begin{matrix} ∥ d_{k + 1} ∥ \leq (\frac{25 ξ_{1}^{2} + 5 ξ_{2}^{2} + 30 ξ_{1} ξ_{2} + 10 ξ_{1} \sqrt{ξ_{1} ξ_{2}} + (60 + 20 M_{0}) ξ_{1} + 20 ξ_{2} + 20}{ρ_{0} ξ_{1}^{3}}) ∥ g_{k + 1} ∥ . \end{matrix}

For convenience, we define

L_{2} ≜ (\frac{25 ξ_{1}^{2} + 5 ξ_{2}^{2} + 30 ξ_{1} ξ_{2} + 10 ξ_{1} \sqrt{ξ_{1} ξ_{2}} + (60 + 20 M_{0}) ξ_{1} + 20 ξ_{2} + 20}{ρ_{0} ξ_{1}^{3}}),

namely,

∥ d_{k + 1} ∥ \leq L_{2} ∥ g_{k + 1} ∥ .

Case V (quadratic): if

d_{k + 1}

is formed by (8) and (41), it is similar to Lemma 4 in Li, Liu and Liu [42]. According to their proof, we can deduce

∥ d_{k + 1} ∥ \leq \frac{20}{ϑ_{1}} ∥ g_{k + 1} ∥ .

If

d_{k + 1}

is calculated by (12) and (40), it follows from [13] that

∥ d_{k + 1} ∥ \leq \frac{750}{ϑ_{1}} ∥ g_{k + 1} ∥ .

Since

ϑ_{1} > 0

, we finally have

∥ d_{k + 1} ∥ \leq \frac{750}{ϑ_{1}} ∥ g_{k + 1} ∥ .

According to the above analysis, the proof of Lemma 2 is completed by setting

ς_{2} = \max \{(\frac{10 ξ_{2} + 5 ξ_{1} + 5 m}{{\bar{ρ}}_{0} ξ_{1}^{2}}), 1, 1 + \frac{L}{ϑ_{1}}, L_{2}, \frac{750}{ϑ_{1}}\} .

□

4.2. Convergence Analysis

In this subsection, we will give the global convergence of the presented algorithm for general functions.

Lemma 3.

Suppose

α_{k}

satisfies the line search conditions (5) and (6), and f satisfies Assumption 2, then

α_{k} \geq \frac{(1 - σ) | g_{k}^{T} d_{k} |}{L ∥ d_{k} ∥^{2}} .

(57)

Proof.

According to the line search condition (5), we can easily obtain

(σ - 1) g_{k}^{T} d_{k} \leq {(g_{k + 1}^{T} - g_{k})}^{T} d_{k} = y_{k}^{T} d_{k} \leq ∥ y_{k} ∥ ∥ d_{k} ∥ \leq L ∥ s_{k} ∥ ∥ d_{k} ∥ = α_{k} L {∥ d_{k} ∥}^{2},

because

σ - 1 < 0

and

g_{k}^{T} d_{k} < 0

, it is obvious that

α_{k} \geq \frac{(1 - σ) | g_{k}^{T} d_{k} |}{L ∥ d_{k} ∥^{2}} .

□

Theorem 1.

Suppose the objective function f satisfies Assumptions 1 and 2, and sequence

{x_{k}}

is generated by WYL_TSCO, then we have

\lim_{k \to \infty} ∥ g_{k} ∥ = 0 .

(58)

Proof.

According to (57), (49) and (55), it follows

δ α_{k} g_{k}^{T} d_{k} \leq - \frac{(1 - σ) δ}{L} \frac{{(g_{k}^{T} d_{k})}^{2}}{∥ d_{k} ∥^{2}} \leq - \frac{(1 - σ) δ ς_{1}^{2}}{L ς_{2}} ∥ g_{k} ∥^{2} = - H {∥ g_{k} ∥}^{2},

where

H ≜ \frac{(1 - σ) δ ς_{1}^{2}}{L ς_{2}}

, then combining (6), we have

f_{k + 1} \leq C_{k} + δ α_{k} g_{k}^{T} d_{k} \leq C_{k} - H {∥ g_{k} ∥}^{2} .

(59)

With the reasonable assumptions that

Q_{0} = 1

,

Q_{1} = 2

,

Q_{2} = 3

,

Q_{3} = 4

,

Q_{4} = 5

and

η_{0} = η_{1} = η_{2} = η_{3} = η_{4} = 1

according to the update formulas (7) and (48), we can acquire that

Q_{k + 1} = 1 + \sum_{j = 0}^{k} \prod_{i = 0}^{j} η_{k - i}, k = 0, 1, 2 . . .

Combining (48), we can derive the general formula of

Q_{k + 1}

,

Q_{k + 1} = \{\begin{matrix} 1 + η^{k / n} + n \sum_{i = 1}^{k / n} η^{i}, & mod (k, n) = 0, \\ 1 + \mod (k, n) + η^{⌊ k / n ⌋} + n \sum_{i = 1}^{⌊ k / n ⌋} η^{i}, & mod (k, n) \neq 0, \end{matrix}

where

⌊ \cdot ⌋

denotes the floor function. Thus, it is easy to obtain an upper bound of

Q_{k + 1}

,

\begin{matrix} Q_{k + 1} & \leq 1 + \mod (k, n) + η^{⌊ k / n ⌋ + 1} + n \sum_{i = 1}^{⌊ k / n ⌋ + 1} η^{i} \\ \leq 1 + n + (n + 1) \sum_{i = 1}^{⌊ k / n ⌋ + 1} η^{i} \\ \leq 1 + n + (n + 1) \sum_{i = 1}^{k + 1} η^{i} \\ \leq (1 + n) \sum_{i = 0}^{k + 1} η^{i} \\ \leq \frac{1 + n}{1 - η} \\ = W, \end{matrix}

(60)

where

W = \frac{1 + n}{1 - η}

.

When

k \geq 5

, combining (7), (59) and (60), we obtain

C_{k + 1} = C_{k} + \frac{f_{k + 1} - C_{k}}{Q_{k + 1}} \leq C_{k} - \frac{H}{W} {∥ g_{k} ∥}^{2},

which means

\frac{H}{W} {∥ g_{k} ∥}^{2} \leq C_{k} - C_{k + 1} .

(61)

When

k < 5

, (6) and (46) imply that

C_{k + 1} \leq f_{k + 1} + 0.9 (C_{k} - f_{k + 1}) = C_{k} + 0.1 (f_{k + 1} - C_{k}) < C_{k} .

Hence

C_{k}

is monotonically decreasing. According to Lemma 1.1 in [15], we have

f_{k + 1} \leq C_{k + 1}

for each

k \geq 5

, and (46) indicates that

f_{k + 1} \leq C_{k + 1}

for each

k < 5

. Thus, with the assumption that f is bounded below,

C_{k}

is certainly bounded from below.

Summing up the above analysis and (61), we can finally obtain that

\sum_{k = 0}^{\infty} \frac{H}{W} {∥ g_{k} ∥}^{2} < \infty,

which implies

\lim_{k \to \infty} {∥ g_{k} ∥}^{2} = 0 .

The proof is completed. □

5. Numerical Results

In this section, the results of the numerical experiments are shown below. The unconstrained test functions were taken from [62] with the given initial points. To prove the efficiency of the proposed WYL_TSCO algorithm, we compare its numerical performance with SMCG_Conic and TSCG. The performance profile proposed by Dolan and Moré [63] is used to evaluate the performance of these methods. The dimension of the test functions is 10,000. All the programs were written in C code.

SMCG_Conic is also a subspace minimization conjugate gradient algorithm based on the conic model, and is a pioneering one that combines the subspace technique with a conic model to seek the search direction. The numerical experiments in [52] show that the performance of SMCG_Conic is very efficient. Since the biggest difference between WYL_TSCO and SMCG_Conic is the dimension of the used subspace, the comparison between WYL_TSCO and SMCG_Conic can not only reflect the high efficiency of our algorithm, but also reveal the influence on the numerical result generated by the change of dimension for the adopted subspace in the subspace minimization conjugate gradient algorithm.

As for TSCG, it is an efficient subspace minimization conjugate gradient method, and it successfully applies the idea of subspace method and WYL method which are employed in our algorithm as well. However, TSCG only uses the quadratic model to approximate the objective function. Thus, the comparison between the numerical performance of WYL_TSCO and TSCG can reveal the influence of the approximation model in the subspace method.

For the initial stepsize of the first iteration, we adopt the adaptive strategy used in [42]. The other parameters of WYL_TSCO are selected as follows.

ϵ = 10^{- 6}, ϵ_{1} = 10^{- 3}, ϵ_{2} = 10^{- 4}, δ = 0.001, σ = 0.9999, λ_{m i n} = 10^{- 30}, λ_{m a x} = 10^{30},

ξ_{1} = 0.15 \times 10^{2}, ξ_{2} = 8.5 \times 10^{4}, ξ_{3} = 4 \times 10^{8}, ξ_{4} = 6.5 \times 10^{7}, ξ_{5} = 0.1, ρ_{0} = 0.3,

{\bar{ρ}}_{0} = 0.9, ϑ_{1} = 10^{- 7}, ϑ_{2} = 10^{4}, ϑ_{3} = 10^{- 2}, ϑ_{4} = 10^{- 4}, λ_{1} = 10^{- 7}, λ_{2} = 0.05 .

SMCG_Conic and TSCG use the original parameters in their paper respectively. In addition to stopping when the stopping criterion

∥ g_{k} ∥_{\infty} \leq ε

holds, the algorithm also stops when the number of iterations exceeds 200,000.

For the 80 test problems from [62], WYL_TSCO successfully solves 100% of them, while TSCG and SMCG_Conic both successfully solve 76 problems. We compare the numerical results of problems that are solved successfully by all the three algorithms, the number of which are 72. The numerical results of the three algorithms are presented in the following figures.

Figure 1 depicts the performance based on the number of iterations for the three methods. It shows that the performance of WYL_TSCO is similar to that of TSCG, but is a little inferior to SMCG_Conic’s only when

τ < 1.31

. However, WYL_TSCO outperforms TSCG and SMCG_Conic when

τ > 1.31

.

In Figure 2, we observe that WYL_TSCO and TSCG outperforms SMCG_Conic on the number of function evaluations, and TSCG is slightly ahead of WYL_TSCG only in the case of

τ < 1.5

.

Similar to Figure 1, Figure 3 illustrates that WYL_TSCO is at a little disadvantage compared with SMCG_Conic only when

τ < 1.318

, but is competitive with TSCG.

For the CPU time, Figure 4 shows that WYL_TSCO has an improvement on TSCG and SMCG_Conic, which shows the high efficiency of WYL_TSCO.

For the 80 test problems, the numerical results show that WYL_TSCO and TSCG may perform a little worse than SMCG_Conic when

τ

is very small, but they overall have significant improvements over SMCG_Conic, and WYL_TSCO also is superior to TSCG for almost all

τ

. Moreover, WYL_TSCO can solve more problems than TSCG and SMCG_Conic, which might show the advantage of the three-dimensional subspace method and conic model. To sum up, the WYL_TSCO is an efficient algorithm for solving the unconstrained optimization problem.

6. Conclusions

Combining the idea of WYL method, this paper proposes a three-dimensional subspace method based on conic models for unconstrained optimization (WYL_TSCO). The sufficient descent property of the search direction and the global convergence of this method are obtained under some suitable assumptions. The selection of an approximation model depends on whether certain criterions are satisfied. Furthermore, the strategies of initial stepsize and nonmonotone line search are exploited which are beneficial to the convergence and efficiency. According to the numerical results and theoretical analysis, WYL_TSCO is very competitive and promising.

Author Contributions

Conceptualization, S.Y.and J.X.; methodology, G.W.; software and validation, G.W. and M.P.; visualization and formal analysis, G.W.; writing—original draft preparation, G.W.; supervision, S.Y.; funding acquisition, S.Y. and J.X. All authors have read and agreed to the published version of the manuscript.

Funding

Chinese National Natural Science Foundation no. 12262002, Natural Science Foundation of Guangxi Province (CN) no. 2020GXNSFAA159014 and 2021GXNSFAA196076, the Program for the Innovative Team of Guangxi University of Finance and Economics, and Guangxi First-class Discipline statistics Construction Project.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Hestenes, M.R.; Stiefel, E.L. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 1952, 49, 409–436. [Google Scholar] [CrossRef]
Fletcher, R.; Reeves, C.M. Function minimization by conjugate gradients. Comput. J. 1964, 7, 149–154. [Google Scholar] [CrossRef] [Green Version]
Polyak, B.T. The conjugate gradient method in extremal problems. Jussr Comput. Math. Math. Phys. 1969, 9, 94–112. [Google Scholar] [CrossRef]
Fletcher, R. Volume 1 Unconstrained Optimization. In Practical Methods of Optimization; Wiley: New York, NY, USA, 1987. [Google Scholar]
Liu, Y.L.; Storey, C.S. Efficient generalized conjugate gradient algorithms, part 1: Theory. J. Optim. Theory Appl. 1991, 69, 129–137. [Google Scholar] [CrossRef]
Dai, Y.H.; Yuan, Y.X. A nonlinear conjugate gradient method with a strong global convergence property. Siam J. Optim. 1999, 10, 177–182. [Google Scholar] [CrossRef] [Green Version]
Wei, Z.; Yao, S.; Liu, L. The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 2006, 183, 1341–1350. [Google Scholar] [CrossRef]
Huang, H.; Wei, Z.; Yao, S. The proof of the sufficient descent condition of the Wei-Yao-Liu conjugate gradient method under the strong Wolfe-Powell line search. Appl. Math. Comput. 2007, 189, 1241–1245. [Google Scholar] [CrossRef]
Zhang, L. An improved Wei-Yao-Liu nonlinear conjugate gradient method for optimization computation. Appl. Math. Comput. 2009, 215, 2269–2274. [Google Scholar] [CrossRef]
Yao, S.; Qin, B. A hybrid of DL and WYL nonlinear conjugate gradient methods. Abstr. Appl. Anal. 2014, 2014, 279891. [Google Scholar] [CrossRef] [Green Version]
Huang, H.; Lin, S. A modified Wei-Yao-Liu conjugate gradient method for unconstrained optimization. Appl. Math. Comput. 2014, 231, 179–186. [Google Scholar] [CrossRef]
Hu, Y.; Wei, Z. Wei-Yao-Liu conjugate gradient projection algorithm for nonlinear monotone equations with convex constraints. Int. J. Comput. Math. 2015, 92, 2261–2272. [Google Scholar] [CrossRef]
Huo, J.; Yang, J.; Wang, G.; Yao, S. A class of three-dimensional subspace conjugate gradient algorithms for unconstrained optimization. Symmetry 2022, 14, 80. [Google Scholar] [CrossRef]
Wolfe, P. Convergence conditions for ascent methods. SIAM Rev. Soc. Ind. Appl. Math. 1969, 11, 226–235, Erratum in SIAM Rev. Soc. Ind. Appl. Math. 1971, 13, 185–188. [Google Scholar] [CrossRef]
Zhang, H.; Hager, W.W. A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim. 2004, 14, 1043–1056. [Google Scholar] [CrossRef] [Green Version]
Grippo, L.; Lampariello, F.; Lucidi, S. A nonmonotone line search technique for Newton’s method. SIAM J. Numer. Anal. 1986, 23, 707–716. [Google Scholar] [CrossRef]
Hager, W.W.; Zhang, H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 2005, 16, 170–192. [Google Scholar] [CrossRef] [Green Version]
Gu, N.Z.; Mo, J.T. Incorporating nonmonotone strategies into the trust region method for unconstrained optimization. Comput. Math. Appl. 2008, 55, 2158–2172. [Google Scholar] [CrossRef] [Green Version]
Dai, Y.H.; Kou, C.X. A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search. SIAM J. Optim. 2013, 23, 296–320. [Google Scholar] [CrossRef] [Green Version]
Huang, S.; Wan, Z.; Chen, X. A new nonmonotone line search technique for unconstrained optimization. Numer. Algorithms 2015, 68, 671–689. [Google Scholar] [CrossRef]
Ou, Y.; Liu, Y. A memory gradient method based on the nonmonotone technique. J. Ind. Manag. Optim. 2017, 13, 857–872. [Google Scholar] [CrossRef]
Ahookhosh, M.; Amini, K.; Peyghami, M.R. A nonmonotone trust-region line search method for large-scale unconstrained optimization. Appl. Math. Model. 2012, 36, 478–487. [Google Scholar] [CrossRef]
Ahookhosh, M.; Amini, K. An efficient nonmonotone trust-region method for unconstrained optimization. Numer. Algorithms 2012, 59, 523–540. [Google Scholar] [CrossRef]
Kimiaei, M.; Rahpeymaii, F. A new nonmonotone line-search trust-region approach for nonlinear systems. TOP 2019, 27, 199–232. [Google Scholar] [CrossRef]
Kimiaei, M. A new class of nonmonotone adaptive trust-region methods for nonlinear equations with box constraints. Calcolo 2017, 54, 769–812. [Google Scholar] [CrossRef]
Yuan, Y.X. Subspace techniques for nonlinear optimization. In Some Topics in Industrial and Applied Mathematics; Jeltsch, R., Li, D.Q., Sloan, I.H., Eds.; Higher Education Press: Beijing, China, 2007; pp. 206–218. [Google Scholar]
Yuan, Y.X. A review on subspace methods for nonlinear optimization. In Proceedings of the International Congress of Mathematics, Seoul, Republic of Korea, 13–21 August 2014; pp. 807–827. [Google Scholar]
Liu, D.C.; Nocedal, J. On the limited memory BFGS method for large scale optimization. Math. Program. 1989, 45, 503–528. [Google Scholar] [CrossRef] [Green Version]
Wang, Z.H.; Wen, Z.W.; Yuan, Y. A subspace trust region method for large scale unconstrained optimization. In Numerical Linear Algebra and Optimization; Yuan, Y., Ed.; Science Press: Beijing, China, 2004; pp. 265–274. [Google Scholar]
Wang, Z.H.; Yuan, Y. A subspace implementation of quasi-Newton trust region methods for unconstrained optimization. Numer. Math. 2006, 104, 241–269. [Google Scholar] [CrossRef]
Lee, J.H.; Jung, Y.M.; Yuan, Y.X.; Yun, S. A subspace SQP method for equality constrained optimization. Comput. Optim. Appl. 2019, 74, 177–194. [Google Scholar] [CrossRef]
Gill, P.E.; Leonard, M.W. Reduced-hessian quasi-newton methods for unconstrained optimization. SIAM J. Optim. 2001, 12, 209–237. [Google Scholar] [CrossRef]
Liu, X.; Wen, Z.; Zhang, Y. Limited memory block Krylov subspace optimization for computing dominant singular value decompositions. SIAM J. Sci. Comput. 2013, 35, A1641–A1668. [Google Scholar] [CrossRef] [Green Version]
Yuan, Y.X. Subspace methods for large scale nonlinear equations and nonlinear least squares. Optim. Eng. 2009, 10, 207–218. [Google Scholar] [CrossRef]
Dong, Q.; Liu, X.; Wen, Z.W.; Yuan, Y.X. A parallel line search subspace correction method for composite convex optimization. J. Oper. Res. Soc. China. 2015, 3, 163–187. [Google Scholar] [CrossRef]
Li, Y.; Liu, H.; Wen, Z.; Yuan, Y.X. Low-rank matrix iteration using polynomial-filtered subspace extraction. SIAM J. Sci. Comput. 2020, 42, A1686–A1713. [Google Scholar] [CrossRef]
Kimiaei, M.; Neumaier, A.; Azmi, B. LMBOPT: A limited memory method for bound-constrained optimization. Math. Prog. Comp. 2022, 14, 271–318. [Google Scholar] [CrossRef]
Cartis, C.; Roberts, L. Scalable subspace methods for derivative-free nonlinear least-squares optimization. Math. Program. 2023, 199, 461–524. [Google Scholar] [CrossRef]
Yuan, Y.X.; Stoer, J. A subspace study on conjugate gradient algorithms. Z. Angew. Math. Mech. 1995, 75, 69–77. [Google Scholar] [CrossRef]
Andrei, N. An accelerated subspace minimization three-term conjugate gradient algorithm for unconstrained optimization. Numer. Algorithms 2014, 65, 859–874. [Google Scholar] [CrossRef]
Yang, Y.; Chen, Y.; Lu, Y. A subspace conjugate gradient algorithm for large-scale unconstrained optimization. Numer. Algorithms 2017, 76, 813–828. [Google Scholar] [CrossRef]
Li, M.; Liu, H.; Liu, Z. A new subspace minimization conjugate gradient method with nonmonotone line search for unconstrained optimization. Numer. Algorithms 2018, 79, 195–219. [Google Scholar] [CrossRef]
Dai, Y.H.; Kou, C.X. A Barzilai-Borwein conjugate gradient method. Sci. China Math. 2016, 59, 1511–1524. [Google Scholar] [CrossRef]
Barzilai, J.; Borwein, J.M. Two-point step size gradient methods. IMA J. Numer. Anal. 1988, 8, 141–148. [Google Scholar] [CrossRef]
Sun, W. On nonquadratic model optimization methods. Asia. Pac. J. Oper. Res. 1996, 13, 43–63. [Google Scholar]
Sun, W.; Yuan, Y. Optimization Theory and Methods: Nonlinear Programming; Springer: New York, NY, USA, 2006. [Google Scholar]
Davidon, W.C. Conic Approximations and Collinear Scalings for Optimizers. SIAM J. Numer. Anal. 1980, 17, 268–281. [Google Scholar] [CrossRef]
Sorensen, D.C. The Q-superlinear convergence of a collinear scaling algorithm for unconstrained optimization. SIAM J. Numer. Anal. 1980, 17, 84–114. [Google Scholar] [CrossRef]
Ariyawansa, K.A. Deriving collinear scaling algorithms as extensions of quasi-Newton methods and the local convergence of DFP- and BFGS-related collinear scaling algorithms. Math. Program. 1990, 49, 23–48. [Google Scholar] [CrossRef]
Sheng, S. Interpolation by conic model for unconstrained optimization. Computing 1995, 54, 83–98. [Google Scholar] [CrossRef]
Di, S.; Sun, W. A trust region method for conic model to solve unconstraind optimizaions. Optim. Methods Softw. 1996, 6, 237–263. [Google Scholar] [CrossRef]
Li, Y.; Liu, Z.; Liu, H. A subspace minimization conjugate gradient method based on conic model for unconstrained optimization. Comput. Appl. Math. 2019, 38, 16. [Google Scholar] [CrossRef]
Sun, W.; Li, Y.; Wang, T.; Liu, H. A new subspace minimization conjugate gradient method based on conic model for large-scale unconstrained optimization. Comput. Appl. Math. 2022, 41, 178. [Google Scholar] [CrossRef]
Yuan, Y.X. A modified BFGS algorithm for unconstrained optimization. IMA J. Numer. Anal. 1991, 11, 325–332. [Google Scholar] [CrossRef]
Dai, Y.H.; Yuan, J.Y.; Yuan, Y.X. Modified two-point stepsize gradient methods for unconstrained optimization problems. Comput. Optim. Appl. 2002, 22, 103–109. [Google Scholar] [CrossRef]
Liu, Z.; Liu, H. An efficient gradient method with approximate optimal stepsize for large-scale unconstrained optimization. Numer. Algorithms 2018, 78, 21–39. [Google Scholar] [CrossRef]
Liu, H.; Liu, Z. An efficient Barzilai-Borwein conjugate gradient method for unconstrained optimization. J. Optim. Theory Appl. 2019, 180, 879–906. [Google Scholar] [CrossRef]
Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: New York, NY, USA, 2006. [Google Scholar]
Andrei, N. Nonlinear Conjugate Gradient Methods for Unconstrained Optimization; Springer: Cham, Switzerland, 2020. [Google Scholar]
Hager, W.W.; Zhang, H. Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Trans. Math. Softw. 2006, 32, 113–137. [Google Scholar] [CrossRef]
Andrei, N. Open problems in nonlinear conjugate gradient algorithms for unconstrained optimization. Bull. Malays. Math. Sci. Soc. Second Ser. 2011, 34, 319–330. [Google Scholar]
Andrei, N. An unconstrained optimization test functions collection. Adv. Model. Optim. 2008, 10, 147–161. [Google Scholar]
Dolan, E.D.; Moré, J.J. Benchmarking optimization software with performance profiles. Math. Program. 2002, 91, 201–213. [Google Scholar] [CrossRef]

Figure 1. Performance profile based on the number of iterations.

Figure 2. Performance profile based on the number of function evaluations.

Figure 3. Performance profile based on the number of gradient evaluations.

Figure 4. Performance profile based on the CPU time.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, G.; Yao, S.; Pei, M.; Xu, J. A Three-Dimensional Subspace Algorithm Based on the Symmetry of the Approximation Model and WYL Conjugate Gradient Method. Symmetry 2023, 15, 1207. https://doi.org/10.3390/sym15061207

AMA Style

Wang G, Yao S, Pei M, Xu J. A Three-Dimensional Subspace Algorithm Based on the Symmetry of the Approximation Model and WYL Conjugate Gradient Method. Symmetry. 2023; 15(6):1207. https://doi.org/10.3390/sym15061207

Chicago/Turabian Style

Wang, Guoxin, Shengwei Yao, Mingyang Pei, and Jieqiong Xu. 2023. "A Three-Dimensional Subspace Algorithm Based on the Symmetry of the Approximation Model and WYL Conjugate Gradient Method" Symmetry 15, no. 6: 1207. https://doi.org/10.3390/sym15061207

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Three-Dimensional Subspace Algorithm Based on the Symmetry of the Approximation Model and WYL Conjugate Gradient Method

Abstract

1. Introduction

2. The Search Direction

2.1. Conic Model

2.2. Quadratic Model

3. The Stepsize and Algorithm

3.1. Strategy for the Initial Stepsize

3.2. The Nonmonotone Line Search

3.3. Algorithm

4. Properties of the Proposed Algorithm

4.1. Some Lemmas

4.2. Convergence Analysis

5. Numerical Results

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI