A Sparse Quasi-Newton Method Based on Automatic Differentiation for Solving Unconstrained Optimization Problems

Cao, Huiping; An, Xiaomin

doi:10.3390/sym13112093


by Huiping Cao ¹,* and Xiaomin An ²

¹ School of Science, Xi’An Polytechnic University, Xi’an 710048, China

² School of Science, Xi’An Technological University, Xi’an 710021, China

* Author to whom correspondence should be addressed.

Symmetry 2021, 13(11), 2093; https://doi.org/10.3390/sym13112093

Submission received: 18 September 2021 / Revised: 25 October 2021 / Accepted: 29 October 2021 / Published: 4 November 2021

(This article belongs to the Section Mathematics)


Abstract

In this paper, we introduce a sparse and symmetric matrix completion quasi-Newton model based on automatic differentiation for solving unconstrained optimization problems in which the sparsity structure of the Hessian is available. The proposed method belongs to the class of matrix completion quasi-Newton methods and has several desirable properties. In particular, it preserves the sparsity of the Hessian exactly and satisfies the quasi-Newton equation approximately. Under the usual assumptions, local and superlinear convergence are established. Numerical tests show that the new method is effective and outperforms matrix completion quasi-Newton updating with the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method and the limited-memory BFGS method.

Keywords:

symmetric quasi-Newton method; unconstrained optimization problems; matrix completion; automatic differentiation; superlinear convergence; Broyden–Fletcher–Goldfarb–Shanno method

MSC:

65K05; 90C06; 90C53

1. Introduction

We consider the unconstrained optimization problem

\min_{x \in R^{n}} f (x),

(1)

where $f : R^{n} \to R$ is a twice continuously differentiable function, and $\nabla f (x)$ and $\nabla^{2} f (x)$ denote the gradient and the Hessian of f at x, respectively. The first-order necessary condition for (1) is

\nabla f (x) = 0,

which can be written as the symmetric nonlinear system

F (x) = 0,

(2)

where $F : R^{n} \to R^{n}$ is a continuously differentiable mapping whose Jacobian is symmetric, that is, $F^{'} (x) = F^{'} {(x)}^{T}$. Symmetric nonlinear systems of the form (2) are closely related to many practical problems, such as the gradient mapping of unconstrained optimization problems, the Karush–Kuhn–Tucker (KKT) systems of equality constrained optimization problems, discretized two-point boundary value problems, and saddle point problems [1,2,3,4,5].

For small- and medium-scale problems, classical quasi-Newton methods converge superlinearly without computing the Hessian [6,7]. Let $x_{k}$ be the current iterate and $B_{k}$ a symmetric approximation of the Hessian; then the iterates $\{x_{k}\}$ generated by a quasi-Newton method are

x_{k + 1} = x_{k} + α_{k} d_{k},

where $α_{k} > 0$ is a step length obtained by a line search or another strategy. The search direction $d_{k}$ is obtained by solving the linear system

B_{k} d_{k} + \nabla f (x_{k}) = 0,

where the quasi-Newton matrix $B_{k}$ approximates $\nabla^{2} f (x_{k})$, and the updated matrix is required to satisfy the secant condition

B_{k + 1} s_{k} = y_{k},

where $s_{k} = x_{k + 1} - x_{k}$ and $y_{k} = \nabla f (x_{k + 1}) - \nabla f (x_{k})$. The matrix $B_{k}$ can be updated by different formulae. The Davidon–Fletcher–Powell (DFP) update,

\begin{aligned} B_{k + 1} &= (I - \frac{y_{k} s_{k}^{T}}{y_{k}^{T} s_{k}}) B_{k} (I - \frac{s_{k} y_{k}^{T}}{y_{k}^{T} s_{k}}) + \frac{y_{k} y_{k}^{T}}{y_{k}^{T} s_{k}} \\ &= B_{k} + \frac{(y_{k} - B_{k} s_{k}) y_{k}^{T} + y_{k} {(y_{k} - B_{k} s_{k})}^{T}}{y_{k}^{T} s_{k}} - \frac{{(y_{k} - B_{k} s_{k})}^{T} s_{k}}{{(y_{k}^{T} s_{k})}^{2}} y_{k} y_{k}^{T}, \end{aligned}

was first proposed by Davidon [8] and developed by Fletcher and Powell [9]. The Broyden–Fletcher–Goldfarb–Shanno (BFGS) update,

B_{k + 1} = B_{k} - \frac{B_{k} s_{k} s_{k}^{T} B_{k}}{s_{k}^{T} B_{k} s_{k}} + \frac{y_{k} y_{k}^{T}}{y_{k}^{T} s_{k}},

was proposed independently by Broyden [10], Fletcher [11], Goldfarb [12], and Shanno [13]. One can find more on the topic in references [14,15,16,17].
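The two classical updates can be checked numerically in a few lines. The following minimal sketch (plain Python lists, no external libraries; the function names and the small test data are our own, not from the paper) implements the residual form of the DFP update and the BFGS update, and verifies that both satisfy the secant condition $B_{k + 1} s_{k} = y_{k}$ whenever $y_{k}^{T} s_{k} > 0$.

```python
# Classical DFP and BFGS updates of a Hessian approximation B (plain Python).
# A minimal sketch: matrices are lists of rows; no safeguards beyond y^T s > 0.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def bfgs_update(B, s, y):
    """BFGS: B+ = B - (B s)(B s)^T / (s^T B s) + y y^T / (y^T s)."""
    Bs = matvec(B, s)
    sBs, ys = dot(s, Bs), dot(y, s)
    if ys <= 0:
        raise ValueError("curvature condition y^T s > 0 violated")
    n = len(s)
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / ys
             for j in range(n)] for i in range(n)]


def dfp_update(B, s, y):
    """DFP in residual form: with r = y - B s,
    B+ = B + (r y^T + y r^T) / (y^T s) - (r^T s) y y^T / (y^T s)^2."""
    Bs = matvec(B, s)
    r = [yi - bi for yi, bi in zip(y, Bs)]
    ys, rs = dot(y, s), dot(r, s)
    if ys <= 0:
        raise ValueError("curvature condition y^T s > 0 violated")
    n = len(s)
    return [[B[i][j] + (r[i] * y[j] + y[i] * r[j]) / ys
             - rs * y[i] * y[j] / ys ** 2
             for j in range(n)] for i in range(n)]


# Both updates satisfy the secant condition B+ s = y.
B0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
s = [1.0, 0.5, -0.2]
y = [2.0, 0.3, 0.1]
for update in (bfgs_update, dfp_update):
    B1 = update(B0, s, y)
    print(update.__name__, matvec(B1, s))  # reproduces y up to rounding
```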

If we set $H_{k} = B_{k}^{- 1}$, then the Sherman–Morrison formula gives Broyden’s family of updates:

H_{k + 1} = H_{k} - \frac{H_{k} y_{k} y_{k}^{T} H_{k}}{y_{k}^{T} H_{k} y_{k}} + \frac{s_{k} s_{k}^{T}}{s_{k}^{T} y_{k}} + ϕ_{k} v_{k} v_{k}^{T},

where $ϕ_{k} \in [0, 1]$ is a parameter and

v_{k} = \sqrt{y_{k}^{T} H_{k} y_{k}} (\frac{s_{k}}{s_{k}^{T} y_{k}} - \frac{H_{k} y_{k}}{y_{k}^{T} H_{k} y_{k}}).

When $ϕ_{k} \equiv 1$, we obtain the BFGS update; when $ϕ_{k} \equiv 0$, we obtain the DFP update.
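As a small illustration of Broyden’s family in inverse form, the sketch below (our own helper names and data, plain Python) applies the update for several values of $ϕ_{k}$. Because $v_{k}^{T} y_{k} = 0$, every member of the family satisfies the inverse secant equation $H_{k + 1} y_{k} = s_{k}$, which the example checks.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def broyden_inverse_update(H, s, y, phi):
    """Broyden family on H = B^{-1}:
    H+ = H - (Hy)(Hy)^T/(y^T H y) + s s^T/(s^T y) + phi * v v^T,
    v  = sqrt(y^T H y) * (s/(s^T y) - Hy/(y^T H y)).
    phi = 0 gives DFP, phi = 1 gives BFGS."""
    Hy = matvec(H, y)
    yHy, sy = dot(y, Hy), dot(s, y)
    v = [math.sqrt(yHy) * (si / sy - hi / yHy) for si, hi in zip(s, Hy)]
    n = len(s)
    return [[H[i][j] - Hy[i] * Hy[j] / yHy + s[i] * s[j] / sy + phi * v[i] * v[j]
             for j in range(n)] for i in range(n)]


H0 = [[2.0, 0.5], [0.5, 1.0]]
s = [0.3, -0.1]
y = [1.0, 0.2]
for phi in (0.0, 0.5, 1.0):
    H1 = broyden_inverse_update(H0, s, y, phi)
    print(phi, matvec(H1, y))  # reproduces s for every phi (up to rounding)
```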

However, quasi-Newton methods are not well suited to large-scale problems, because they need to store the full $n \times n$ matrix $B_{k}$. To overcome this drawback, sparse quasi-Newton methods [14] have received much attention. As early as 1970, Schubert [18] proposed a sparse Broyden rank-one method. Later, Powell and Toint [19] and Toint [20] studied sparse quasi-Newton methods.

Existing sparse quasi-Newton methods usually approximate the Hessian by a sparse symmetric matrix, so that the approximation and the Hessian have the same or a similar structure. If the limited memory technique [21,22] is adopted, which stores only a few pairs $(s_{k}, y_{k})$ and constructs $H_{k}$ by updating an initial matrix $H_{0}$ m times, the resulting method can be applied to a wide range of practical optimization problems. On the other hand, many large-scale problems from scientific fields take the partially separable form

f (x) = \sum_{i = 1}^{m} f_{i} (x),

where each element function $f_{i}$, $i = 1, \dots, m$, depends on only a few variables. For partially separable unconstrained optimization problems, the partitioned BFGS method [23,24] was proposed and performs well in practice. It updates the matrix $B_{k}^{i}$ of each element function $f_{i} (x)$ separately by the BFGS formula and sums these matrices to form the next quasi-Newton matrix $B_{k + 1}$. Since each $f_{i}$ depends on far fewer than n variables, each $B_{k + 1}^{i}$ is a small matrix, and therefore $B_{k + 1}$ is sparse. The quasi-Newton direction is the solution of the linear system

(\sum_{i = 1}^{m} B_{k}^{i}) d_{k} = - \nabla f (x_{k}) .

However, the partitioned BFGS method is guaranteed to preserve the positive definiteness of $B_{k}$ only when every element function $f_{i} (x)$ is convex; for this reason, it is usually implemented with a trust region strategy [25]. Recently, for partially separable nonlinear equations, Cao and Li [26] introduced two kinds of partitioned quasi-Newton methods and established their global and superlinear convergence.
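The partitioned BFGS idea can be sketched as follows, assuming a chain structure in which element function $f_{i}$ couples only $x_{i}$ and $x_{i + 1}$. The element function $f_{i} (a, b) = (a - b)^{2} + e^{a}$ is a hypothetical strictly convex example, chosen so that every local pair satisfies $y^{T} s > 0$; all helper names are ours. The sketch updates each $2 \times 2$ element matrix by BFGS and assembles the sparse sum. Since each element matrix satisfies its own local secant equation, the assembled matrix satisfies the global one.

```python
import math


def elem_grad(a, b):
    """Gradient of the hypothetical element f_i(a, b) = (a - b)^2 + exp(a)."""
    return [2.0 * (a - b) + math.exp(a), -2.0 * (a - b)]


def global_grad(x):
    """Gradient of f(x) = sum_i f_i(x_i, x_{i+1}), assembled element by element."""
    g = [0.0] * len(x)
    for i in range(len(x) - 1):
        gi = elem_grad(x[i], x[i + 1])
        g[i] += gi[0]
        g[i + 1] += gi[1]
    return g


def bfgs_small(Bi, s, y):
    """Dense BFGS update of one small element matrix."""
    m = len(s)
    Bs = [sum(Bi[r][c] * s[c] for c in range(m)) for r in range(m)]
    sBs = sum(a * b for a, b in zip(s, Bs))
    ys = sum(a * b for a, b in zip(y, s))
    return [[Bi[r][c] - Bs[r] * Bs[c] / sBs + y[r] * y[c] / ys
             for c in range(m)] for r in range(m)]


def partitioned_bfgs_step(elems, x_old, x_new):
    """Update every element matrix with its own local (s, y) pair, then
    assemble B = sum_i U_i^T B^i U_i (element i couples variables i, i+1)."""
    n = len(x_old)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        idx = (i, i + 1)
        s_loc = [x_new[j] - x_old[j] for j in idx]
        g_new = elem_grad(x_new[i], x_new[i + 1])
        g_old = elem_grad(x_old[i], x_old[i + 1])
        y_loc = [a - b for a, b in zip(g_new, g_old)]
        elems[i] = bfgs_small(elems[i], s_loc, y_loc)
        for r in range(2):
            for c in range(2):
                B[idx[r]][idx[c]] += elems[i][r][c]
    return B


n = 5
elems = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(n - 1)]
x0 = [0.0] * n
x1 = [0.1 * (j + 1) for j in range(n)]
B1 = partitioned_bfgs_step(elems, x0, x1)  # tridiagonal by construction
```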

Another efficient class of sparse quasi-Newton methods exploits the sparsity structure of the Hessian. We assume that, for all $x \in R^{n}$,

{(\nabla^{2} f (x))}_{i j} = 0, \quad (i, j) \notin F,

where $F \subseteq \{1, \dots, n\} \times \{1, \dots, n\}$ is the sparsity pattern of the Hessian. References [27,28] proposed sparse quasi-Newton methods in which the approximate inverse Hessian $H_{k + 1}$ simultaneously satisfies the secant equation $H_{k + 1} y_{k} = s_{k}$ and the sparsity condition ${(H_{k + 1})}_{i j} = 0$, $(i, j) \notin F$. Recently, Yamashita [29] proposed another type of matrix completion quasi-Newton (MCQN) update for solving problem (1) with a sparse Hessian and proved the local and superlinear convergence of MCQN updates with the DFP method. Reference [30] established the convergence of MCQN updates with the whole of Broyden’s convex family. However, the global convergence analysis in [31] covers only two-dimensional functions with uniformly positive definite Hessians.

Another kind of quasi-Newton method for large-scale unconstrained optimization is the diagonal quasi-Newton method, in which the Hessian of the objective function is approximated by a diagonal matrix with positive diagonal entries. The first version was developed by Nazareth [32], where the quasi-Newton matrix satisfies the least-change weak secant condition [33]:

\begin{aligned} \min \quad & {‖ H_{k + 1} - H_{k} ‖}_{F} \\ \text{s.t.} \quad & y_{k}^{T} H_{k + 1} y_{k} = y_{k}^{T} s_{k}, \end{aligned}

(3)

where ${‖ \cdot ‖}_{F}$ is the Frobenius norm. Recently, Andrei [34] developed a diagonal quasi-Newton method in which the diagonal elements satisfy the least-change weak secant condition (3) and minimize the trace of the update. Many other techniques, such as forward and central finite differences, the variational principle with a weighted norm, and the generalized Frobenius norm, can also be used to derive diagonal quasi-Newton methods [35,36,37]. Under the usual assumptions, diagonal quasi-Newton methods converge linearly. The authors of [38] adopted a derivation similar to that of the DFP method and obtained a low-memory diagonal quasi-Newton method. Using the Armijo line search, they established global convergence and gave sufficient conditions for superlinear convergence.
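For the diagonal case, problem (3) has a closed-form solution when $H_{k}$ and $H_{k + 1}$ are both restricted to be diagonal. The sketch below follows from the Lagrange conditions (our own derivation; names and data are illustrative): writing $H_{k + 1} = H_{k} + \mathrm{diag} (u)$, the minimizer is $u_{i} = c \, y_{i}^{2} / \sum_{j} y_{j}^{4}$ with $c = y_{k}^{T} s_{k} - y_{k}^{T} H_{k} y_{k}$. This formula does not guarantee positive diagonal entries; practical diagonal methods add safeguards, which are omitted here.

```python
def diagonal_weak_secant_update(d, s, y):
    """Least-change update of D = diag(d) in the Frobenius norm subject to the
    weak secant condition y^T D+ y = y^T s. With D+ = D + diag(u), minimising
    sum(u_i^2) subject to sum(y_i^2 u_i) = c, where c = y^T s - y^T D y,
    gives u_i = c * y_i^2 / sum_j y_j^4."""
    c = sum(yi * si for yi, si in zip(y, s)) - sum(di * yi * yi for di, yi in zip(d, y))
    denom = sum(yi ** 4 for yi in y)
    if denom == 0.0:
        return list(d)  # y = 0 carries no curvature information
    return [di + c * yi * yi / denom for di, yi in zip(d, y)]


d0 = [1.0, 1.0, 1.0]
s = [0.5, 0.2, -0.1]
y = [1.0, 0.4, -0.3]
d1 = diagonal_weak_secant_update(d0, s, y)
print(d1)  # diagonal of H_{k+1}; satisfies y^T H_{k+1} y = y^T s
```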

The main contribution of this paper is a sparse quasi-Newton algorithm based on automatic differentiation for solving (1). First, by analogy with the derivation of the BFGS update, we construct a symmetric rank-two quasi-Newton update:

B_{k + 1} = B_{k} - \frac{B_{k} σ_{k} σ_{k}^{T} B_{k}}{σ_{k}^{T} B_{k} σ_{k}} + \frac{\nabla^{2} f (x_{k + 1}) σ_{k} σ_{k}^{T} \nabla^{2} f (x_{k + 1})}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}},

(4)

where $σ_{k} \in R^{n}$ and $B_{k + 1}$ satisfies the adjoint tangent condition [39]

σ_{k}^{T} B_{k + 1} = σ_{k}^{T} \nabla^{2} f (x_{k + 1}) .

For an $n \times n$ matrix A, we write $A ⪰ 0$ to mean that A is positive definite. Then, when $B_{k} ⪰ 0$, we have $B_{k + 1} ⪰ 0$ if and only if $σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k} > 0$, which means that the proposed update (4) preserves positive definiteness, as the BFGS update does. In particular, when $B_{0}$ is positive definite and (1) has uniformly positive definite Hessians, all matrices $\{B_{k}\}$ generated by update (4) are positive definite. In this work, we focus on the choice $σ_{k} = s_{k}$; the rank-two update (4) then satisfies

B_{k + 1} s_{k} = \nabla^{2} f (x_{k + 1}) s_{k},

which means that $B_{k + 1}$ agrees exactly with $\nabla^{2} f (x_{k + 1})$ along the direction $s_{k}$. Several lemmas establish the properties of this rank-two update formula. Second, combining it with the idea of the MCQN method [29], we propose a sparse and symmetric quasi-Newton algorithm for solving (1). Under appropriate conditions, local and superlinear convergence are established. Finally, our numerical results show that the proposed algorithm performs well.

The paper is organized as follows. In Section 2, we introduce a symmetric rank-two quasi-Newton update based on automatic differentiation and prove several of its properties. In Section 3, using the idea of matrix completion, we present a sparse quasi-Newton algorithm and establish some of its properties. In Section 4, we prove the local and superlinear convergence of the algorithm proposed in Section 3. Numerical results, reported in Section 5, indicate that the proposed algorithm is promising. Finally, we draw conclusions.

2. A New Symmetric Rank-Two Quasi-Newton Update

Following the derivation of the BFGS update, we derive a new symmetric rank-two quasi-Newton update and establish several lemmas. Let

B_{k + 1} = B_{k} + ∆_{k},

where $∆_{k}$ is a rank-two matrix and $B_{k + 1}$ satisfies the condition

σ_{k}^{T} B_{k + 1} = σ_{k}^{T} \nabla^{2} f (x_{k + 1}),

(5)

where $σ_{k} \in R^{n}$ and $σ_{k} \neq 0$. As in the derivation of BFGS, we obtain the following symmetric rank-two update:

B_{k + 1} = B_{k} - \frac{B_{k} σ_{k} σ_{k}^{T} B_{k}}{σ_{k}^{T} B_{k} σ_{k}} + \frac{\nabla^{2} f (x_{k + 1}) σ_{k} σ_{k}^{T} \nabla^{2} f (x_{k + 1})}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}} .

(6)

If we denote $H_{k} = B_{k}^{- 1}$ and $H_{k + 1} = B_{k + 1}^{- 1}$, then (6) can be expressed as

H_{k + 1} = H_{k} - \frac{H_{k} \nabla^{2} f (x_{k + 1}) σ_{k} σ_{k}^{T} + σ_{k} σ_{k}^{T} \nabla^{2} f (x_{k + 1}) H_{k}}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}} + (1 + \frac{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) H_{k} \nabla^{2} f (x_{k + 1}) σ_{k}}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}}) \frac{σ_{k} σ_{k}^{T}}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}} .

(7)

The update (6) involves the Hessian $\nabla^{2} f (x)$, but the Hessian itself never needs to be formed in practice: for given vectors x, s, and σ, the products $\nabla^{2} f (x) s$ and $σ^{T} \nabla^{2} f (x)$ can be computed exactly (up to rounding error) by the forward and reverse modes of automatic differentiation.
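The following sketch shows why such Hessian products are exact. It implements a tiny forward-over-forward automatic differentiation with hyper-dual numbers (our own minimal class, not a library API): evaluating f at $x + ε_{1} e_{i} + ε_{2} s$ places $e_{i}^{T} \nabla^{2} f (x) s$ in the $ε_{1} ε_{2}$ coefficient, so n evaluations give the whole vector $\nabla^{2} f (x) s$ without forming the Hessian. Because the Hessian is symmetric, the same routine also yields $σ^{T} \nabla^{2} f (x)$. Production AD tools obtain the product more cheaply, for example by applying forward mode to a reverse-mode gradient (in JAX, `jax.jvp` applied to `jax.grad(f)`); this sketch illustrates exactness, not efficiency. The test function is hypothetical.

```python
class HyperDual:
    """a + b*e1 + c*e2 + d*e1*e2 with e1**2 = e2**2 = 0.
    Evaluating f at x + e1*u + e2*v puts u^T (Hessian of f at x) v in d."""

    def __init__(self, a, b=0.0, c=0.0, d=0.0):
        self.a, self.b, self.c, self.d = a, b, c, d

    @staticmethod
    def lift(z):
        return z if isinstance(z, HyperDual) else HyperDual(float(z))

    def __add__(self, other):
        o = HyperDual.lift(other)
        return HyperDual(self.a + o.a, self.b + o.b, self.c + o.c, self.d + o.d)

    __radd__ = __add__

    def __neg__(self):
        return HyperDual(-self.a, -self.b, -self.c, -self.d)

    def __sub__(self, other):
        return self + (-HyperDual.lift(other))

    def __rsub__(self, other):
        return HyperDual.lift(other) + (-self)

    def __mul__(self, other):
        o = HyperDual.lift(other)
        return HyperDual(self.a * o.a,
                         self.a * o.b + self.b * o.a,
                         self.a * o.c + self.c * o.a,
                         self.a * o.d + self.b * o.c + self.c * o.b + self.d * o.a)

    __rmul__ = __mul__


def hessian_vector_product(f, x, v):
    """Exact H(x) v (up to rounding): one hyper-dual evaluation per component."""
    n = len(x)
    hv = []
    for i in range(n):
        xs = [HyperDual(x[j], 1.0 if j == i else 0.0, v[j]) for j in range(n)]
        hv.append(f(xs).d)  # e_i^T H(x) v
    return hv


def f(x):
    """Hypothetical test function with a tridiagonal Hessian."""
    total = 0.0
    for i in range(len(x)):
        total = total + (x[i] - 1.0) * (x[i] - 1.0)
    for i in range(len(x) - 1):
        r = x[i + 1] - x[i] * x[i]
        total = total + r * r
    return total


print(hessian_vector_product(f, [0.5, -1.0, 2.0], [1.0, 1.0, 1.0]))
```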

Next, several lemmas are presented.

Lemma 1.

Suppose that $B_{k} ⪰ 0$ and $B_{k + 1}$ is updated by (6). Then $B_{k + 1} ⪰ 0$ if and only if $σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k} > 0$.

Proof.

By condition (5), one has

σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k} = σ_{k}^{T} B_{k + 1} σ_{k} .

Hence, if $B_{k + 1}$ is positive definite, then $σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k} > 0$.

Conversely, let $σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k} > 0$ and $B_{k} ⪰ 0$. Then for any $d_{k} \in R^{n}$ with $d_{k} \neq 0$, it follows from (6) that

d_{k}^{T} B_{k + 1} d_{k} = d_{k}^{T} B_{k} d_{k} - \frac{{(d_{k}^{T} B_{k} σ_{k})}^{2}}{σ_{k}^{T} B_{k} σ_{k}} + \frac{{(d_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k})}^{2}}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}} .

Since $B_{k} ⪰ 0$, there is a symmetric matrix $B_{k}^{1 / 2} ⪰ 0$ such that $B_{k} = B_{k}^{1 / 2} B_{k}^{1 / 2}$. The Cauchy–Schwarz inequality then gives

\begin{aligned} {(d_{k}^{T} B_{k} σ_{k})}^{2} &= {({(B_{k}^{1 / 2} d_{k})}^{T} (B_{k}^{1 / 2} σ_{k}))}^{2} \\ &\leq {‖ B_{k}^{1 / 2} d_{k} ‖}^{2} \cdot {‖ B_{k}^{1 / 2} σ_{k} ‖}^{2} \\ &= (d_{k}^{T} B_{k} d_{k}) (σ_{k}^{T} B_{k} σ_{k}), \end{aligned}

(8)

where equality holds if and only if $d_{k} = λ_{k} σ_{k}$ for some $λ_{k} \neq 0$.

If inequality (8) holds strictly, then

d_{k}^{T} B_{k + 1} d_{k} > d_{k}^{T} B_{k} d_{k} - d_{k}^{T} B_{k} d_{k} + \frac{{(d_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k})}^{2}}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}} \geq 0 .

If equality holds in (8), i.e., $d_{k} = λ_{k} σ_{k}$ for some $λ_{k} \neq 0$, then the first two terms cancel and

d_{k}^{T} B_{k + 1} d_{k} = \frac{{(d_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k})}^{2}}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}} = λ_{k}^{2} σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k} > 0 .

In conclusion, $d_{k}^{T} B_{k + 1} d_{k} > 0$ for all $d_{k} \in R^{n}$ with $d_{k} \neq 0$. □
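Lemma 1 can be illustrated numerically. In the sketch below (helper names and data are ours), the matrix A plays the role of $\nabla^{2} f (x_{k + 1})$ and is deliberately indefinite, yet $σ_{k}^{T} A σ_{k} > 0$. Starting from $B_{k} = I$, update (6) produces a positive definite $B_{k + 1}$, detected by a successful Cholesky factorization, and $B_{k + 1} σ_{k} = A σ_{k}$ holds as required by (5). Only the product $A σ_{k}$ enters the update; in practice it would come from automatic differentiation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def rank_two_update(B, sigma, a):
    """Update (6): B+ = B - (B s)(B s)^T/(s^T B s) + a a^T/(s^T a),
    with s = sigma and a = (Hessian at x_{k+1}) @ sigma."""
    Bs = matvec(B, sigma)
    sBs, sa = dot(sigma, Bs), dot(sigma, a)
    if sa <= 0.0:
        raise ValueError("need sigma^T Hessian sigma > 0 (Lemma 1)")
    n = len(sigma)
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + a[i] * a[j] / sa
             for j in range(n)] for i in range(n)]


def is_positive_definite(M):
    """Cholesky test for a symmetric matrix: succeeds iff M is positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    return False
                L[i][i] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return True


A = [[4.0, 1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, 2.0]]  # indefinite "Hessian"
sigma = [1.0, 0.5, 0.3]
a = matvec(A, sigma)  # sigma^T A sigma is about 4.93 > 0
B0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B1 = rank_two_update(B0, sigma, a)
print(is_positive_definite(A), is_positive_definite(B1))
```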

Lemma 2.

Write update formula (7) as $H_{k + 1} = H_{k} + E$, where $H_{k}$ is symmetric and $H_{k + 1}$ satisfies $σ_{k}^{T} = σ_{k}^{T} \nabla^{2} f (x_{k + 1}) H_{k + 1}$. Then E is the solution of the following minimization problem:

\begin{aligned} \min_{E} \quad & {‖ E ‖}_{W} \\ \text{s.t.} \quad & E^{T} = E, \\ & σ_{k}^{T} \nabla^{2} f (x_{k + 1}) E = η^{T}, \end{aligned}

where $η = σ_{k} - H_{k} \nabla^{2} f (x_{k + 1}) σ_{k}$ and W is a symmetric nonsingular matrix satisfying $σ_{k}^{T} W = σ_{k}^{T} \nabla^{2} f (x_{k + 1})$.

Proof.

A suitable Lagrangian function for this convex programming problem is

φ = \frac{1}{4} \operatorname{trace} (W E^{T} W E) + \operatorname{trace} (Λ^{T} (E^{T} - E)) - λ^{T} W (E \nabla^{2} f (x_{k + 1}) σ_{k} - η),

where Λ and λ are Lagrange multipliers. Moreover,

\begin{aligned} \frac{\partial φ}{\partial E_{i j}} = {} & \frac{1}{4} (\operatorname{trace} (W e_{j} e_{i}^{T} W E) + \operatorname{trace} (W E^{T} W e_{i} e_{j}^{T})) \\ & + \operatorname{trace} (Λ (e_{j} e_{i}^{T} - e_{i} e_{j}^{T})) - λ^{T} W e_{i} e_{j}^{T} \nabla^{2} f (x_{k + 1}) σ_{k} = 0, \end{aligned}

or, using the symmetry of W and E and cyclic permutations of the trace,

\frac{1}{2} {[W E W]}_{i j} + Λ_{i j} - Λ_{j i} = {[W λ σ_{k}^{T} \nabla^{2} f (x_{k + 1})]}_{i j} .

Adding this equation to its transpose eliminates Λ and yields

W E W = W λ σ_{k}^{T} \nabla^{2} f (x_{k + 1}) + \nabla^{2} f (x_{k + 1}) σ_{k} λ^{T} W.

Since $σ_{k}^{T} W = σ_{k}^{T} \nabla^{2} f (x_{k + 1})$ and W is nonsingular, we have

E = λ σ_{k}^{T} + σ_{k} λ^{T} .

(9)

Substituting (9) into $σ_{k}^{T} \nabla^{2} f (x_{k + 1}) E = η^{T}$ and rearranging gives

λ = \frac{η - (λ^{T} \nabla^{2} f (x_{k + 1}) σ_{k}) σ_{k}}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}} .

Premultiplying by $σ_{k}^{T} \nabla^{2} f (x_{k + 1})$ gives

λ^{T} \nabla^{2} f (x_{k + 1}) σ_{k} = \frac{1}{2} \frac{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) η}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}},

so, using $η = σ_{k} - H_{k} \nabla^{2} f (x_{k + 1}) σ_{k}$, we have

λ = \frac{η - \frac{1}{2} \frac{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) η}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}} σ_{k}}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}} = \frac{σ_{k} - H_{k} \nabla^{2} f (x_{k + 1}) σ_{k} - \frac{1}{2} (1 - \frac{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) H_{k} \nabla^{2} f (x_{k + 1}) σ_{k}}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}}) σ_{k}}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}} .

Substituting this into (9) and using $H_{k + 1} = H_{k} + E$ gives (7). □

Lemma 3.

If $H_{k} = B_{k}^{- 1} > 0$ and $σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k} > 0$, then $B_{k + 1}$ given by (6) solves the variational problem

\begin{aligned} \min_{B > 0} \quad & ψ (H_{k}^{1 / 2} B H_{k}^{1 / 2}) \\ \text{s.t.} \quad & B^{T} = B, \\ & σ_{k}^{T} B = σ_{k}^{T} \nabla^{2} f (x_{k + 1}) . \end{aligned}

Proof.

Recall the function $ψ : R^{n \times n} \to R$ [40] defined by

ψ (A) = \operatorname{tr} (A) - \ln \det (A).

(10)

Then

ψ (H_{k}^{1 / 2} B H_{k}^{1 / 2}) = \operatorname{tr} (H_{k} B) - \ln (\det H_{k} \det B) = ψ (H_{k} B) = ψ (B H_{k}) .

(11)
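For a symmetric positive definite A with eigenvalues $λ_{i}$, one has $ψ (A) = \sum_{i} (λ_{i} - \ln λ_{i})$, so $ψ (A) ⩾ n$ with equality exactly when $A = I$; this is why ψ serves as a measure of how far a scaled matrix is from the identity. A minimal sketch (our own function, plain Python, restricted to symmetric positive definite input) evaluates ψ through a Cholesky factorization, using $\ln \det A = 2 \sum_{i} \ln L_{i i}$.

```python
import math


def psi(A):
    """psi(A) = tr(A) - ln det(A) for a symmetric positive definite A.
    ln det(A) is taken from the Cholesky factor L (A = L L^T): 2 * sum(ln L_ii)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    logdet = 0.0
    for i in range(n):
        for j in range(i + 1):
            acc = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(acc)
                logdet += 2.0 * math.log(L[i][i])
            else:
                L[i][j] = acc / L[j][j]
    return sum(A[i][i] for i in range(n)) - logdet


I2 = [[1.0, 0.0], [0.0, 1.0]]
A = [[2.0, 0.5], [0.5, 1.0]]
print(psi(I2), psi(A))  # n = 2 for the identity; 3 - ln(1.75) for A
```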

Consider the Lagrangian function

\begin{aligned} L (B, Λ, λ) &= \frac{1}{2} ψ (H_{k}^{1 / 2} B H_{k}^{1 / 2}) + \operatorname{trace} (Λ^{T} (B^{T} - B)) + (σ_{k}^{T} B - σ_{k}^{T} \nabla^{2} f (x_{k + 1})) λ \\ &= \frac{1}{2} (\operatorname{trace} (H_{k} B) - \ln (\det H_{k}) - \ln (\det B)) + \operatorname{trace} (Λ^{T} (B^{T} - B)) + (σ_{k}^{T} B - σ_{k}^{T} \nabla^{2} f (x_{k + 1})) λ, \end{aligned}

where Λ and λ are the Lagrange multipliers. Moreover,

\begin{aligned} \frac{\partial L}{\partial B_{i j}} &= \frac{1}{2} ({(H_{k})}_{j i} - {(B^{- 1})}_{j i}) + \operatorname{trace} (Λ^{T} (e_{j} e_{i}^{T} - e_{i} e_{j}^{T})) + σ_{k}^{T} e_{i} e_{j}^{T} λ \\ &= \frac{1}{2} ({(H_{k})}_{j i} - {(B^{- 1})}_{j i}) + Λ_{j i} - Λ_{i j} + {(σ_{k} λ^{T})}_{i j} = 0 . \end{aligned}

(12)

Adding (12) to its transpose (with the indices i and j interchanged) gives

H_{k} - B^{- 1} + σ_{k} λ^{T} + λ σ_{k}^{T} = 0, \quad \text{i.e.,} \quad B^{- 1} = H_{k} + σ_{k} λ^{T} + λ σ_{k}^{T} .

(13)

Combined with the tangent condition $σ_{k}^{T} B = σ_{k}^{T} \nabla^{2} f (x_{k + 1})$, which is equivalent to $σ_{k}^{T} = σ_{k}^{T} \nabla^{2} f (x_{k + 1}) B^{- 1}$, this gives

σ_{k}^{T} = σ_{k}^{T} \nabla^{2} f (x_{k + 1}) H_{k} + (σ_{k}^{T} \nabla^{2} f (x_{k + 1}) λ) σ_{k}^{T} + (σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}) λ^{T}.

Postmultiplying by $\nabla^{2} f (x_{k + 1}) σ_{k}$ yields

λ^{T} \nabla^{2} f (x_{k + 1}) σ_{k} = \frac{1}{2} (1 - \frac{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) H_{k} \nabla^{2} f (x_{k + 1}) σ_{k}}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}}),

and so

λ = \frac{σ_{k} - H_{k} \nabla^{2} f (x_{k + 1}) σ_{k} - \frac{1}{2} (1 - \frac{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) H_{k} \nabla^{2} f (x_{k + 1}) σ_{k}}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}}) σ_{k}}{σ_{k}^{T} \nabla^{2} f (x_{k + 1}) σ_{k}} .

Substituting λ into (13) yields formula (7).

By the Sherman–Morrison formula (applied twice), (7) is equivalent to (6). Since the function $ψ (H_{k}^{1 / 2} B H_{k}^{1 / 2})$ is strictly convex on $B ⪰ 0$, the update (6) is the unique solution of the variational problem. □
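The equivalence of (6) and (7) is easy to confirm numerically. The sketch below (our own helper names and data; A stands in for $\nabla^{2} f (x_{k + 1})$, and only the product $a = A σ_{k}$ is used) applies (6) to $B_{k}$ and (7) to $H_{k} = B_{k}^{- 1}$ and forms $B_{k + 1} H_{k + 1}$, which should equal the identity up to rounding.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def update_B(B, sigma, a):
    """Formula (6) with a = (Hessian at x_{k+1}) @ sigma."""
    Bs = matvec(B, sigma)
    sBs, sa = dot(sigma, Bs), dot(sigma, a)
    n = len(sigma)
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + a[i] * a[j] / sa
             for j in range(n)] for i in range(n)]


def update_H(H, sigma, a):
    """Formula (7), the inverse form of (6), with a = (Hessian) @ sigma."""
    Ha = matvec(H, a)
    sa, aHa = dot(sigma, a), dot(a, Ha)
    n = len(sigma)
    return [[H[i][j] - (Ha[i] * sigma[j] + sigma[i] * Ha[j]) / sa
             + (1.0 + aHa / sa) * sigma[i] * sigma[j] / sa
             for j in range(n)] for i in range(n)]


B0 = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
H0 = [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]  # exact inverse of B0
A = [[4.0, 1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, 2.0]]
sigma = [1.0, 0.5, 0.3]
a = matvec(A, sigma)
B1, H1 = update_B(B0, sigma, a), update_H(H0, sigma, a)
product = [[dot(B1[i], [H1[k][j] for k in range(3)]) for j in range(3)]
           for i in range(3)]
print(product)  # close to the 3 x 3 identity
```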

In this paper, we set $σ_{k} = s_{k}$, so that

B_{k + 1} s_{k} = \nabla^{2} f (x_{k + 1}) s_{k},

which means that $B_{k + 1}$ agrees exactly with $\nabla^{2} f (x_{k + 1})$ along the direction $s_{k}$. We then have the symmetric rank-two update formula

B_{k + 1} = B_{k} - \frac{B_{k} s_{k} s_{k}^{T} B_{k}}{s_{k}^{T} B_{k} s_{k}} + \frac{\nabla^{2} f (x_{k + 1}) s_{k} s_{k}^{T} \nabla^{2} f (x_{k + 1})}{s_{k}^{T} \nabla^{2} f (x_{k + 1}) s_{k}} .

(14)

Clearly, $B_{k + 1}$ is symmetric whenever $B_{k}$ is symmetric. If we denote $w_{k} = \nabla^{2} f (x_{k + 1}) s_{k}$, we obtain an analogue of Broyden’s convex family:

H_{k + 1} = H_{k} - \frac{H_{k} w_{k} w_{k}^{T} H_{k}}{w_{k}^{T} H_{k} w_{k}} + \frac{s_{k} s_{k}^{T}}{s_{k}^{T} w_{k}} + ϕ_{k} v_{k} v_{k}^{T},

(15)

where $ϕ_{k} \in [0, 1]$ is a parameter and

v_{k} = \sqrt{w_{k}^{T} H_{k} w_{k}} (\frac{s_{k}}{s_{k}^{T} w_{k}} - \frac{H_{k} w_{k}}{w_{k}^{T} H_{k} w_{k}}) .

(16)

The choice $ϕ_{k} \equiv 1$ corresponds to the BFGS update

\begin{aligned} H_{k + 1} &= H_{k} + (1 + \frac{w_{k}^{T} H_{k} w_{k}}{s_{k}^{T} w_{k}}) \frac{s_{k} s_{k}^{T}}{s_{k}^{T} w_{k}} - \frac{s_{k} w_{k}^{T} H_{k} + H_{k} w_{k} s_{k}^{T}}{s_{k}^{T} w_{k}} \\ &= H_{k} + \frac{(s_{k} - H_{k} w_{k}) s_{k}^{T} + s_{k} {(s_{k} - H_{k} w_{k})}^{T}}{s_{k}^{T} w_{k}} - \frac{{(s_{k} - H_{k} w_{k})}^{T} w_{k}}{{(s_{k}^{T} w_{k})}^{2}} s_{k} s_{k}^{T} . \end{aligned}

(17)
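To check (17) against (15), the sketch below (hypothetical data, our own helper names) computes $w_{k}$ from a stand-in Hessian, applies the family update (15) and (16) with $ϕ_{k} = 1$ as well as the closed form (17), and compares the two; it also verifies $H_{k + 1} w_{k} = s_{k}$.

```python
import math


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def family_update(H, s, w, phi):
    """Update (15)-(16) with w = (Hessian at x_{k+1}) @ s."""
    Hw = matvec(H, w)
    wHw, sw = dot(w, Hw), dot(s, w)
    v = [math.sqrt(wHw) * (si / sw - hi / wHw) for si, hi in zip(s, Hw)]
    n = len(s)
    return [[H[i][j] - Hw[i] * Hw[j] / wHw + s[i] * s[j] / sw + phi * v[i] * v[j]
             for j in range(n)] for i in range(n)]


def bfgs_type_update(H, s, w):
    """Closed form (17): with r = s - H w,
    H+ = H + (r s^T + s r^T)/(s^T w) - (r^T w) s s^T/(s^T w)^2."""
    Hw = matvec(H, w)
    r = [si - hi for si, hi in zip(s, Hw)]
    sw, rw = dot(s, w), dot(r, w)
    n = len(s)
    return [[H[i][j] + (r[i] * s[j] + s[i] * r[j]) / sw - rw * s[i] * s[j] / sw ** 2
             for j in range(n)] for i in range(n)]


H0 = [[1.0, 0.2], [0.2, 0.5]]
A = [[3.0, 1.0], [1.0, 2.0]]  # stands in for the Hessian at x_{k+1}
s = [0.4, -0.1]
w = matvec(A, s)  # in practice: a Hessian-vector product via AD
H_family = family_update(H0, s, w, 1.0)
H_closed = bfgs_type_update(H0, s, w)
print(H_family, H_closed)
```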

3. Algorithm and Related Properties

For the update formula (15), we adopt the idea of matrix completion: the next quasi-Newton matrix $H_{k + 1}$ is the solution of the minimization problem

\begin{aligned} \min \quad & ψ (H_{k}^{- 1 / 2} H H_{k}^{- 1 / 2}) \\ \text{s.t.} \quad & H_{i j} = H_{i j}^{A D}, \quad (i, j) \in F, \\ & {(H^{- 1})}_{i j} = 0, \quad (i, j) \notin F, \\ & H^{T} = H, \ H ⪰ 0 . \end{aligned}

(18)

When the graph $G (V, \bar{F})$ is chordal, problem (18) can be solved by solving the problem

\begin{aligned} \max \quad & \det (H) \\ \text{s.t.} \quad & H_{i j} = H_{i j}^{A D}, \quad (i, j) \in F, \\ & H^{T} = H, \ H ⪰ 0 . \end{aligned}

(19)

Then $H_{k + 1}$ can be expressed in the sparse clique-factorization form [29]. Algorithm 1 is stated as follows.

Algorithm 1 (Sparse Quasi-Newton Algorithm based on Automatic Differentiation)

Step 0. Compute $\bar{F}$ from F such that $G (V, \bar{F})$ is a chordal graph, where $V = \{1, 2, \dots, n\}$. Choose $x_{0} \in R^{n}$, $ϵ > 0$, and a matrix $H_{0} \in R^{n \times n}$, $H_{0} ⪰ 0$, with ${(H_{0}^{- 1})}_{i j} = 0$ for all $(i, j) \notin F$. Let $k := 0$.
Step 1. If $‖ \nabla f (x_{k}) ‖ ⩽ ϵ$, stop.
Step 2. Set $x_{k + 1} = x_{k} - H_{k} \nabla f (x_{k})$.
Step 3. Choose $ϕ_{k} \in [0, 1]$ and compute the entries $H_{i j}^{A D}$, $(i, j) \in F$, of the matrix given by update formula (15).
Step 4. Obtain $H_{k + 1}$ by solving the minimization problem (18). When $G (V, \bar{F})$ is a chordal graph, problem (18) can be solved by solving problem (19).
Step 5. Let $k := k + 1$ and go to Step 1.
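Step 4 is easiest to see for the simplest chordal pattern, a tridiagonal $\bar{F}$ (a path graph whose cliques are $\{i, i + 1\}$ and whose separators are $\{i\}$). For such a pattern, the inverse of the maximum-determinant positive definite completion is given by the standard clique/separator formula $H^{- 1} = \sum_{C} [H_{C C}]^{- 1} - \sum_{S} [H_{S S}]^{- 1}$ (each block padded with zeros), so it is tridiagonal, and the step $- H_{k + 1} \nabla f (x_{k + 1})$ can be computed by a tridiagonal solve. The sketch below is our own illustration of this special case, with hypothetical names and data; it does not reproduce the general sparse clique-factorization of [29].

```python
def maxdet_completion_inverse(h_diag, h_off):
    """Tridiagonal pattern (cliques {i, i+1}, separators {i}).
    Given h_diag[i] = H_ii and h_off[i] = H_{i,i+1}, return B = H^{-1} of the
    maximum-determinant positive definite completion via
        B = sum_cliques pad(inv(H_CC)) - sum_separators pad(inv(H_SS)),
    as the tridiagonal pair (b_diag, b_off)."""
    n = len(h_diag)
    if n == 1:
        return [1.0 / h_diag[0]], []
    b_diag, b_off = [0.0] * n, [0.0] * (n - 1)
    for i in range(n - 1):
        p, q, r = h_diag[i], h_diag[i + 1], h_off[i]
        det = p * q - r * r
        if det <= 0.0:
            raise ValueError("every 2x2 clique block must be positive definite")
        b_diag[i] += q / det
        b_diag[i + 1] += p / det
        b_off[i] -= r / det
    for i in range(1, n - 1):  # separators {1}, ..., {n-2}
        b_diag[i] -= 1.0 / h_diag[i]
    return b_diag, b_off


def solve_tridiagonal(b_diag, b_off, rhs):
    """Thomas algorithm for a symmetric tridiagonal system (no pivoting; SPD case)."""
    n = len(b_diag)
    cp, dp = [0.0] * n, [0.0] * n
    for i in range(n):
        sub = b_off[i - 1] if i > 0 else 0.0
        m = b_diag[i] - (sub * cp[i - 1] if i > 0 else 0.0)
        cp[i] = b_off[i] / m if i < n - 1 else 0.0
        dp[i] = (rhs[i] - (sub * dp[i - 1] if i > 0 else 0.0)) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Specified band of H^{AD} (hypothetical numbers) and a gradient g.
h_diag, h_off = [2.0, 2.0, 2.0, 2.0], [0.5, 0.5, 0.5]
b_diag, b_off = maxdet_completion_inverse(h_diag, h_off)
g = [1.0, -1.0, 0.5, 0.0]
d = [-t for t in solve_tridiagonal(b_diag, b_off, g)]  # d = -H_{k+1} g
print(d)
```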

When $H_{k}$ in Step 3 is updated by the classical Broyden class (with $y_{k}$ in place of $w_{k}$), the method reduces to the method in [29]. In the present paper, we focus on the MCQN update in which $H^{A D}$ is the matrix $H_{k + 1}$ given by (15).

In what follows, we introduce some notation for the convenience of the analysis. For a symmetric positive definite matrix P satisfying

{(P^{- 1})}_{i j} = 0, \quad \forall (i, j) \notin F,

(20)

let

{\bar{s}}_{k} = P^{- 1 / 2} s_{k}, \quad {\bar{w}}_{k} = P^{1 / 2} w_{k}, \quad {\bar{H}}_{k} = P^{- 1 / 2} H_{k} P^{- 1 / 2}, \quad {\bar{H}}^{A D} = P^{- 1 / 2} H^{A D} P^{- 1 / 2},

where $H^{A D}$ is the matrix $H_{k + 1}$ given by (15). Then it follows from (15) that

{\bar{H}}^{A D} = {\bar{H}}_{k} - \frac{{\bar{H}}_{k} {\bar{w}}_{k} {\bar{w}}_{k}^{T} {\bar{H}}_{k}}{{\bar{w}}_{k}^{T} {\bar{H}}_{k} {\bar{w}}_{k}} + \frac{{\bar{s}}_{k} {\bar{s}}_{k}^{T}}{{\bar{s}}_{k}^{T} {\bar{w}}_{k}} + ϕ_{k} {\bar{v}}_{k} {\bar{v}}_{k}^{T},

(21)

where

{\bar{v}}_{k} = \sqrt{{\bar{w}}_{k}^{T} {\bar{H}}_{k} {\bar{w}}_{k}} (\frac{{\bar{s}}_{k}}{{\bar{s}}_{k}^{T} {\bar{w}}_{k}} - \frac{{\bar{H}}_{k} {\bar{w}}_{k}}{{\bar{w}}_{k}^{T} {\bar{H}}_{k} {\bar{w}}_{k}}) .

(22)

Following [30], we define

τ_{k} = \frac{{\bar{w}}_{k}^{T} {\bar{H}}_{k} {\bar{w}}_{k}}{‖ {\bar{w}}_{k} ‖ \cdot ‖ {\bar{H}}_{k} {\bar{w}}_{k} ‖}, \quad q_{k} = \frac{{\bar{w}}_{k}^{T} {\bar{H}}_{k} {\bar{w}}_{k}}{{‖ {\bar{w}}_{k} ‖}^{2}}, \quad η_{k} = \frac{{\bar{s}}_{k}^{T} {\bar{H}}_{k} {\bar{w}}_{k}}{{\bar{s}}_{k}^{T} {\bar{w}}_{k}}, \quad m_{k} = \frac{{\bar{s}}_{k}^{T} {\bar{w}}_{k}}{{\bar{w}}_{k}^{T} {\bar{w}}_{k}},

M_{k} = \frac{{‖ {\bar{s}}_{k} ‖}^{2}}{{\bar{s}}_{k}^{T} {\bar{w}}_{k}}, \quad β_{k} = \frac{{\bar{s}}_{k}^{T} {\bar{H}}_{k}^{- 1} {\bar{s}}_{k}}{{\bar{s}}_{k}^{T} {\bar{w}}_{k}}, \quad γ_{k} = \frac{{\bar{w}}_{k}^{T} {\bar{H}}_{k} {\bar{w}}_{k}}{{\bar{s}}_{k}^{T} {\bar{w}}_{k}} .

By [41] and (21), we have

\operatorname{tr} ({\bar{H}}^{A D}) = \operatorname{tr} ({\bar{H}}_{k}) - (1 - ϕ_{k}) \frac{q_{k}}{τ_{k}^{2}} - 2 ϕ_{k} η_{k} + (1 + ϕ_{k} \frac{q_{k}}{m_{k}}) M_{k}

(23)

and

\det ({\bar{H}}^{A D}) = \det ({\bar{H}}_{k}) (1 + ϕ_{k} (β_{k} γ_{k} - 1)) / γ_{k} .

(24)

Next, we establish a relation between ${\bar{H}}_{k + 1}$ and ${\bar{H}}^{A D}$, which plays a key role in proving the local and superlinear convergence of Algorithm 1.

Proposition 1.

For Algorithm 1, we have the following relation:

\operatorname{tr} ({\bar{H}}_{k + 1}) = \operatorname{tr} ({\bar{H}}^{A D}), \quad \det ({\bar{H}}_{k + 1}) ⩾ \det ({\bar{H}}^{A D}) .

(25)

Proof.

It follows from (18) that

{(H_{k + 1})}_{i j} = {(H^{A D})}_{i j}, \quad \forall (i, j) \in F .

Combined with (20), this shows that for every pair (i, j), at least one of ${(H_{k + 1} - H^{A D})}_{i j}$ and ${(P^{- 1})}_{i j}$ equals zero. Since both matrices are symmetric, we get

\begin{aligned} \operatorname{tr} ({\bar{H}}_{k + 1} - {\bar{H}}^{A D}) &= \operatorname{tr} (P^{- 1} (H_{k + 1} - H^{A D})) \\ &= \sum_{i = 1}^{n} \sum_{j = 1}^{n} {(P^{- 1})}_{i j} {(H_{k + 1} - H^{A D})}_{i j} = 0 . \end{aligned}

(26)

Moreover, since $H_{k + 1}$ solves (19) and $H^{A D}$ is feasible for (19), we must have

\det (H_{k + 1}) ⩾ \det (H^{A D}) .

Consequently,

\det ({\bar{H}}_{k + 1}) = \det (P^{- 1}) \det (H_{k + 1}) ⩾ \det (P^{- 1}) \det (H^{A D}) = \det ({\bar{H}}^{A D}) .

(27)

□
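Identities (23) and (24) are algebraic and do not depend on the choice of P, so they can be checked numerically with $P = I$, in which case the barred and unbarred quantities coincide. The sketch below (our own names, $2 \times 2$ data chosen for illustration) applies update (21) for a given $ϕ_{k}$ and compares the directly computed trace and determinant with the right-hand sides of (23) and (24).

```python
import math


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def trace_det_check(H, s, w, phi):
    """Return ((tr direct, tr by (23)), (det direct, det by (24))) for update (21)
    with P = I, for 2 x 2 data."""
    Hw = matvec(H, w)
    wHw, sw, ww = dot(w, Hw), dot(s, w), dot(w, w)
    v = [math.sqrt(wHw) * (si / sw - hi / wHw) for si, hi in zip(s, Hw)]
    H_ad = [[H[i][j] - Hw[i] * Hw[j] / wHw + s[i] * s[j] / sw + phi * v[i] * v[j]
             for j in range(2)] for i in range(2)]
    Hinv_s = [(H[1][1] * s[0] - H[0][1] * s[1]) / det2(H),
              (H[0][0] * s[1] - H[1][0] * s[0]) / det2(H)]
    tau = wHw / (math.sqrt(ww) * math.sqrt(dot(Hw, Hw)))
    q, eta, m = wHw / ww, dot(s, Hw) / sw, sw / ww
    M, beta, gamma = dot(s, s) / sw, dot(s, Hinv_s) / sw, wHw / sw
    tr_23 = ((H[0][0] + H[1][1]) - (1 - phi) * q / tau ** 2 - 2 * phi * eta
             + (1 + phi * q / m) * M)
    det_24 = det2(H) * (1 + phi * (beta * gamma - 1)) / gamma
    return (H_ad[0][0] + H_ad[1][1], tr_23), (det2(H_ad), det_24)


H = [[1.0, 0.2], [0.2, 0.5]]
s, w = [0.4, -0.1], [1.1, 0.2]
for phi in (0.0, 0.5, 1.0):
    print(phi, trace_det_check(H, s, w, phi))
```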

Remark 1.

By the definition (10) of ψ and relation (25) between ${\bar{H}}_{k + 1}$ and ${\bar{H}}^{A D}$, one has

ψ ({\bar{H}}_{k + 1}) ⩽ ψ ({\bar{H}}^{A D}) .

(28)

Substituting (23) and (24) into (28), we find that $ψ ({\bar{H}}_{k + 1})$ and $ψ ({\bar{H}}_{k})$ satisfy

\begin{aligned} ψ ({\bar{H}}_{k + 1}) ⩽ {} & ψ ({\bar{H}}_{k}) - (1 - ϕ_{k}) \frac{q_{k}}{τ_{k}^{2}} - 2 ϕ_{k} η_{k} + (1 + ϕ_{k} \frac{q_{k}}{m_{k}}) M_{k} \\ & - \ln (1 + ϕ_{k} (β_{k} γ_{k} - 1)) + \ln γ_{k} . \end{aligned}

(29)

4. The Local and Superlinear Convergence

Based on the discussion in Section 3, we prove the local and superlinear convergence of Algorithm 1. First, we list the assumptions.

Assumption A1.

Assume that $x^{*}$ is a solution of (1) and let

Ω = \{x \in R^{n} \mid ‖ x - x^{*} ‖ ⩽ b\},

where $b > 0$.

(1): The function $f : R^{n} \to R$ is twice continuously differentiable on Ω.
(2): There exist two constants $m > 0$ and $M > 0$ such that

m {‖ u ‖}^{2} ⩽ u^{T} {(\nabla^{2} f (x))}^{- 1} u ⩽ M {‖ u ‖}^{2}, \quad \forall u \in R^{n}, \ x \in Ω .

(30)

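By Assumption A1, there is a constant $\bar{L} > 0$ such that

‖ \nabla f (x) - \nabla f (y) ‖ ⩽ \bar{L} ‖ x - y ‖, \quad \forall x, y \in Ω,

(31)

and, as is standard in local convergence analyses, we also assume that $\nabla^{2} f$ is Lipschitz continuous on Ω with a constant $L > 0$:

‖ \nabla^{2} f (x) - \nabla^{2} f (y) ‖ ⩽ L ‖ x - y ‖, \quad \forall x, y \in Ω .

(32)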

We define

ϵ_{k} = \max \{‖ x_{k} - x^{*} ‖, ‖ x_{k + 1} - x^{*} ‖\},

(33)

and obtain from (32) that

\begin{aligned} ‖ w_{k} - \nabla^{2} f (x^{*}) s_{k} ‖ &= ‖ \nabla^{2} f (x_{k + 1}) s_{k} - \nabla^{2} f (x^{*}) s_{k} ‖ \\ &⩽ ‖ \nabla^{2} f (x_{k + 1}) - \nabla^{2} f (x^{*}) ‖ \cdot ‖ s_{k} ‖ \\ &⩽ L ‖ x_{k + 1} - x^{*} ‖ \cdot ‖ s_{k} ‖ \\ &⩽ L ϵ_{k} ‖ s_{k} ‖ . \end{aligned}

(34)

If we take $P = H^{*} := {(\nabla^{2} f (x^{*}))}^{- 1}$, then it follows from (34) that

\begin{aligned} ‖ {\bar{w}}_{k} - {\bar{s}}_{k} ‖ &= ‖ P^{1 / 2} w_{k} - P^{- 1 / 2} s_{k} ‖ = ‖ {H^{*}}^{1 / 2} (w_{k} - \nabla^{2} f (x^{*}) s_{k}) ‖ \\ &⩽ {‖ H^{*} ‖}^{1 / 2} \cdot ‖ w_{k} - \nabla^{2} f (x^{*}) s_{k} ‖ ⩽ L {‖ H^{*} ‖}^{1 / 2} ϵ_{k} ‖ s_{k} ‖ . \end{aligned}

(35)

Furthermore, let

μ_{k} = \frac{2 - M_{k} - m_{k}}{m_{k}}, \quad {\hat{μ}}_{k} = \frac{{({\bar{w}}_{k} - {\bar{s}}_{k})}^{T} {\bar{H}}_{k} {\bar{w}}_{k}}{\operatorname{tr} ({\bar{H}}_{k}) \, {\bar{s}}_{k}^{T} {\bar{w}}_{k}} .

Then it is easy to deduce that there are constants $c_{1} > 0$ and $c_{2} \in (0, b)$ such that, whenever $ϵ_{k} < c_{2}$,

M_{k} - 1, \ μ_{k}, \ {\hat{μ}}_{k}, \ \ln m_{k} ⩽ \frac{1}{2} c_{1} ϵ_{k} .

(36)

We define

\begin{aligned} ρ_{k} &= q_{k} - 1 - \ln q_{k}, \\ ζ_{k} &= (1 - ϕ_{k}) q_{k} (τ_{k}^{- 2} - 1), \\ ξ_{k} &= \ln (1 + ϕ_{k} (β_{k} γ_{k} - 1)), \end{aligned}

(37)

and rewrite (29) as

\begin{aligned} ψ ({\bar{H}}_{k + 1}) ⩽ {} & ψ ({\bar{H}}_{k}) - ρ_{k} - ζ_{k} - ξ_{k} + (M_{k} - 1) \\ & + ϕ_{k} q_{k} μ_{k} + ϕ_{k} \operatorname{tr} ({\bar{H}}_{k}) {\hat{μ}}_{k} + \ln m_{k} . \end{aligned}

(38)

Since $γ_{k} = q_{k} / m_{k}$ and $0 ⩽ q_{k} ⩽ \operatorname{tr} ({\bar{H}}_{k})$, we obtain from (38) and (36) that

ψ ({\bar{H}}_{k + 1}) ⩽ ψ ({\bar{H}}_{k}) - ρ_{k} - ζ_{k} - ξ_{k} + c_{1} (1 + \operatorname{tr} ({\bar{H}}_{k})) ϵ_{k} .

(39)

Using the inequality

λ - \ln λ ⩾ \max ((1 - \frac{1}{e}) λ, 1), \quad \forall λ > 0,

(40)

applied to the eigenvalues of A, one has

ψ (A) ⩾ \max ((1 - \frac{1}{e}) \operatorname{tr} (A), n)

for every symmetric positive definite matrix A. Combining this bound with (39), we obtain

ψ ({\bar{H}}_{k + 1}) ⩽ (1 + c_{3} ϵ_{k}) ψ ({\bar{H}}_{k}) - ρ_{k} - ζ_{k} - ξ_{k},

(41)

where $c_{3} = c_{1} (\frac{1}{n} + \frac{e}{e - 1})$. Since $τ_{k}^{2} ⩽ 1$ and $β_{k} γ_{k} ⩾ 1$, the quantities $ρ_{k}$, $ζ_{k}$, and $ξ_{k}$ are nonnegative, and hence

ψ ({\bar{H}}_{k + 1}) ⩽ (1 + c_{3} ϵ_{k}) ψ ({\bar{H}}_{k}) .

(42)

The theorem below shows that Algorithm 1 converges locally and linearly; relation (42) plays an essential role in its proof.

Theorem 1.

Let Assumption A1 hold, and let the sequence $\{x_{k}\}$ be generated by Algorithm 1 with $α_{k} \equiv 1$, where $H_{k}$ is updated by (15). Then for any $ρ \in (0, 1)$, there is a constant $τ > 0$ such that, if $‖ x_{0} - x^{*} ‖ ⩽ τ$ and $‖ H_{0} - H^{*} ‖ ⩽ τ$, then

‖ x_{k + 1} - x^{*} ‖ ⩽ ρ ‖ x_{k} - x^{*} ‖, \quad k = 0, 1, \dots .

(43)

Proof.

By Lemma 4 of [29], there are constants $\bar{τ} \in (0, b)$ and $δ > 0$ such that, when $‖ x_{0} - x^{*} ‖ ⩽ \bar{τ}$ and $‖ H_{0} - H^{*} ‖ ⩽ \bar{τ}$, one has

ψ ({\bar{H}}_{0}) - n ⩽ δ / 2,

(44)

and, for any $H ⪰ 0$ with $ψ (\bar{H}) - n ⩽ δ$,

‖ H - H^{*} ‖ ⩽ ρ / (2 \bar{L}),

(45)

where $\bar{H} = {H^{*}}^{- 1 / 2} H {H^{*}}^{- 1 / 2}$. Define

τ = \min \{\bar{τ}, c_{2}, \frac{ρ}{\bar{L}}, \frac{ρ}{L M}, \frac{1 - ρ}{c_{3}} \ln (\frac{2 (n + δ)}{2 n + δ})\} .

(46)

We prove by induction that (43) and

‖ H_{k} - H^{*} ‖ ⩽ \frac{ρ}{2 \bar{L}}

(47)

hold for all $k ⩾ 0$. By the Lipschitz continuity of $\nabla^{2} f (x)$, $\nabla f (x^{*}) = 0$, and $‖ H^{*} ‖ ⩽ M$ (from (30)), we have for all $x \in Ω$

\begin{aligned} ‖ x - x^{*} - H^{*} \nabla f (x) ‖ &⩽ ‖ H^{*} ‖ \cdot \int_{0}^{1} ‖ \nabla^{2} f (x^{*} + t (x - x^{*})) - \nabla^{2} f (x^{*}) ‖ \cdot ‖ x - x^{*} ‖ \, d t \\ &⩽ \frac{1}{2} L M {‖ x - x^{*} ‖}^{2} . \end{aligned}

(48)

For k = 0, (47) follows from (44) and (45). Moreover, taking $α_{k} \equiv 1$ and substituting $x_{0}$ into (48), we obtain

\begin{aligned} ‖ x_{1} - x^{*} ‖ &= ‖ x_{0} - H_{0} \nabla f (x_{0}) - x^{*} ‖ \\ &⩽ ‖ x_{0} - x^{*} - H^{*} \nabla f (x_{0}) ‖ + ‖ (H_{0} - H^{*}) (\nabla f (x_{0}) - \nabla f (x^{*})) ‖ \\ &⩽ \frac{1}{2} L M {‖ x_{0} - x^{*} ‖}^{2} + ‖ H_{0} - H^{*} ‖ \cdot ‖ \nabla f (x_{0}) - \nabla f (x^{*}) ‖ \\ &⩽ (\frac{1}{2} L M ‖ x_{0} - x^{*} ‖ + \frac{ρ}{2}) ‖ x_{0} - x^{*} ‖ \\ &⩽ (\frac{1}{2} L M τ + \frac{ρ}{2}) ‖ x_{0} - x^{*} ‖ \\ &⩽ ρ ‖ x_{0} - x^{*} ‖ . \end{aligned}

(49)

So we have that (43) and (47) hold for

k = 1

. Assume that (43) and (47) hold for

k = 0, 1, \dots, l

; then one has

ϵ_{k} = ‖ x_{k} - x^{*} ‖, ϵ_{k} ⩽ ρ^{k} ϵ_{0} ⩽ ρ^{k} τ, k = 0, 1, \dots, l,

and

\begin{matrix} ‖ x_{l + 1} - x^{*} ‖ & = & ‖ x_{l} - H_{l} \nabla f (x_{l}) - x^{*} ‖ \\ ⩽ & ‖ x_{l} - x^{*} - H^{*} \nabla f (x_{l}) ‖ + ‖ (H_{l} - H^{*}) (\nabla f (x_{l})) - \nabla f (x^{*})) ‖ \\ ⩽ & \frac{1}{2} L M ‖ x_{l} - x^{*} ‖^{2} + ‖ H_{l} - H^{*} ‖ \cdot ‖ \nabla f (x_{l})) - \nabla f (x^{*}) ‖ \\ ⩽ & (\frac{1}{2} L M ‖ x_{l} - x^{*} ‖ + \frac{ρ}{2}) ‖ x_{l} - x^{*} ‖ \\ ⩽ & (\frac{1}{2} L M ρ^{l} τ + \frac{ρ}{2}) ‖ x_{l} - x^{*} ‖ \\ ⩽ & ρ ‖ x_{l} - x^{*} ‖ . \end{matrix}

(50)
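The induction step in (49) and (50) reduces to a scalar recursion. It combines ϵ_{k+1} ⩽ (½ L M ϵ_{k} + ρ/2) ϵ_{k} with the entry ρ/(LM) of (46). The sketch below iterates this worst case with equality and checks that ϵ_{k} stays below ρ^{k} τ. The constants L, M and ρ are illustrative values chosen for the sketch, not values from the paper.

```python
def worst_case_errors(L, M, rho, tau, steps):
    """Iterate the bound of (49)-(50) with equality:
    eps_{k+1} = (L*M*eps_k/2 + rho/2) * eps_k, starting from eps_0 = tau."""
    eps = [tau]
    for _ in range(steps):
        e = eps[-1]
        eps.append((0.5 * L * M * e + 0.5 * rho) * e)
    return eps


if __name__ == "__main__":
    L, M, rho = 2.0, 1.5, 0.5        # illustrative constants (assumed)
    tau = rho / (L * M)              # the rho/(L M) entry of (46)
    for k, e in enumerate(worst_case_errors(L, M, rho, tau, 8)):
        print(k, e, rho ** k * tau)  # eps_k should not exceed rho^k * tau
```

With τ = ρ/(LM), the first contraction factor equals ρ exactly. Every later factor is smaller, because ϵ_{k} shrinks.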

Then, by the definition (46) of τ, one has

c_{3} \sum_{k = 0}^{l} ϵ_{k} ⩽ c_{3} τ \sum_{k = 0}^{l} ρ^{k} = c_{3} τ \frac{1 - ρ^{l + 1}}{1 - ρ} ⩽ \frac{c_{3} τ}{1 - ρ} ⩽ ln \frac{2 (n + δ)}{2 n + δ} .

(51)

Combining (42) and (44), we see that

\begin{matrix} ψ ({\bar{H}}_{l + 1}) - n & ⩽ & (ψ ({\bar{H}}_{0}) - n) + (\prod_{k = 0}^{l} (1 + c_{3} ϵ_{k}) - 1) ψ ({\bar{H}}_{0}) \\ & ⩽ & \frac{δ}{2} + (n + \frac{δ}{2}) (\prod_{k = 0}^{l} e^{c_{3} ϵ_{k}} - 1) \\ & ⩽ & \frac{δ}{2} + (n + \frac{δ}{2}) (e^{c_{3} \sum_{k = 0}^{l} ϵ_{k}} - 1) \\ & ⩽ & \frac{δ}{2} + (n + \frac{δ}{2}) (\frac{2 (n + δ)}{2 n + δ} - 1) = δ . \end{matrix}

(52)

Thus, (47) holds for k = l + 1. This completes the proof. □
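The arithmetic in (51) and (52) can be checked numerically. The hypothetical helpers below evaluate τ from (46) and the first line of (52) in the worst case allowed by the second line, ψ(\bar{H}_{0}) = n + δ/2. All constants are illustrative assumptions, not values from the paper. The check works because 1 + x ⩽ e^{x}, so the product in (52) is at most e^{c_{3} \sum ϵ_{k}}.

```python
import math


def tau_from_46(tau_bar, c2, c3, rho, L_bar, L, M, n, delta):
    """Radius tau of (46); every argument is an illustrative constant."""
    log_term = math.log(2.0 * (n + delta) / (2.0 * n + delta))
    return min(tau_bar, c2, rho / L_bar, rho / (L * M), (1.0 - rho) / c3 * log_term)


def psi_gap_bound(n, delta, c3, eps):
    """First line of (52) with psi(H_0) = n + delta/2, the largest value
    permitted by psi(H_0) - n <= delta/2."""
    psi0 = n + delta / 2.0
    prod = 1.0
    for e in eps:
        prod *= 1.0 + c3 * e
    return (psi0 - n) + (prod - 1.0) * psi0


if __name__ == "__main__":
    n, delta, c3, rho = 1000, 0.1, 2.0, 0.5
    tau = tau_from_46(1.0, 1.0, c3, rho, 10.0, 10.0, 1.0, n, delta)
    eps = [rho ** k * tau for k in range(200)]  # eps_k <= rho^k tau, as in the induction
    print("tau =", tau)
    print("c3 * sum(eps) =", c3 * sum(eps),
          "log term =", math.log(2.0 * (n + delta) / (2.0 * n + delta)))
    print("psi gap bound =", psi_gap_bound(n, delta, c3, eps), "delta =", delta)
```

The last line of (52) relies on the identity δ/2 + (n + δ/2)(2(n + δ)/(2n + δ) - 1) = δ. It holds because (n + δ/2) · δ/(2n + δ) = δ/2.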

Based on the above discussion and relation (42), we can now establish the superlinear convergence of Algorithm 1.

Theorem 2.

Let Assumption A1 hold and let the sequence \{x_{k}\} be generated by Algorithm 1 with α_{k} \equiv 1, where H_{k} is updated by (15). Then there is a constant τ > 0 such that, whenever ‖ x_{0} - x^{*} ‖ ⩽ τ and ‖ H_{0} - H^{*} ‖ ⩽ τ, one has

lim_{k \to \infty} \frac{‖ (H_{k} - H^{*}) w_{k} ‖}{‖ w_{k} ‖} = 0 .

(53)

Then the sequence \{x_{k}\} converges superlinearly.

Proof.

Let τ be defined as in (46). Then, by (44) and (52), for all k one has

ψ ({\bar{H}}_{k}) - n ⩽ δ .

(54)

It follows from (41) that

ρ_{k} + ζ_{k} + ξ_{k} ⩽ (ψ ({\bar{H}}_{k + 1}) - ψ ({\bar{H}}_{k})) + c_{3} ϵ_{k} ψ ({\bar{H}}_{k}) .

Summing the above inequality over k and combining (51) and (54), we deduce

\sum_{k ⩾ 1} (ρ_{k} + ζ_{k} + ξ_{k}) ⩽ c_{3} \sum_{k ⩾ 1} ϵ_{k} ψ ({\bar{H}}_{k}) ⩽ c_{3} (n + δ) ln \frac{2 (n + δ)}{2 n + δ} < \infty,

(55)

which implies that the nonnegative quantities ρ_{k}, ζ_{k} and ξ_{k} all tend to zero as k \to + \infty. Furthermore, by the definitions in (37), we have

(1) q_{k} \to 1; (2) if ϕ_{k} ⩽ \frac{1}{2}, then τ_{k} \to 1; (3) if ϕ_{k} > \frac{1}{2}, then β_{k} γ_{k} \to 1 .

First, we have

\begin{matrix} \frac{‖ {H^{*}}^{- 1 / 2} (H_{k} - H^{*}) w_{k} ‖^{2}}{‖ {H^{*}}^{1 / 2} w_{k} ‖^{2}} & = & \frac{‖ {\bar{H}}_{k} {\bar{w}}_{k} ‖^{2} - 2 {\bar{w}}_{k}^{T} {\bar{H}}_{k} {\bar{w}}_{k} + {‖ {\bar{w}}_{k} ‖}^{2}}{‖ {\bar{w}}_{k} ‖^{2}} \\ & = & \frac{q_{k}}{τ_{k}^{2}} - 2 q_{k} + 1 . \end{matrix}

(56)
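The first equality in (56) rests on two short steps. Assume, as the notation of this section suggests, that \bar{w}_{k} = {H^{*}}^{1/2} w_{k} and \bar{H}_{k} = {H^{*}}^{-1/2} H_{k} {H^{*}}^{-1/2}, the same scaling as \bar{H} after (45). The scaled error vector and its squared norm are then:

```latex
\begin{aligned}
{H^{*}}^{-1/2} (H_{k} - H^{*}) w_{k}
  &= \left( {H^{*}}^{-1/2} H_{k} {H^{*}}^{-1/2} \right) {H^{*}}^{1/2} w_{k} - {H^{*}}^{1/2} w_{k}
   = \bar{H}_{k} \bar{w}_{k} - \bar{w}_{k}, \\
\| \bar{H}_{k} \bar{w}_{k} - \bar{w}_{k} \|^{2}
  &= \| \bar{H}_{k} \bar{w}_{k} \|^{2} - 2 \bar{w}_{k}^{T} \bar{H}_{k} \bar{w}_{k} + \| \bar{w}_{k} \|^{2}.
\end{aligned}
```

Dividing by ‖{H^{*}}^{1/2} w_{k}‖^{2} = ‖\bar{w}_{k}‖^{2} normalizes this numerator as on the left-hand side of (56).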

For the subsequence \{k_{i} : ϕ_{k_{i}} ⩽ \frac{1}{2}\}, one has q_{k_{i}} \to 1 and τ_{k_{i}} \to 1. Hence the right-hand side of (56) tends to 1 - 2 + 1 = 0, and (53) holds along this subsequence.

Moreover, it is easy to deduce that

\begin{matrix} \frac{‖ {\bar{H}}_{k} {\bar{w}}_{k} - {\bar{s}}_{k} ‖^{2}}{‖ {\bar{w}}_{k} ‖^{2}} & ⩽ & \frac{‖ {\bar{H}}_{k}^{1 / 2} ‖^{2} \cdot {‖ {\bar{H}}_{k}^{1 / 2} {\bar{w}}_{k} - {({\bar{H}}_{k})}^{- 1 / 2} {\bar{s}}_{k} ‖}^{2}}{‖ {\bar{w}}_{k} ‖^{2}} \\ & = & \frac{‖ {\bar{H}}_{k}^{1 / 2} ‖^{2} ({\bar{w}}_{k}^{T} {\bar{H}}_{k} {\bar{w}}_{k} - 2 {\bar{s}}_{k}^{T} {\bar{w}}_{k} + {\bar{s}}_{k}^{T} {({\bar{H}}_{k})}^{- 1} {\bar{s}}_{k})}{‖ {\bar{w}}_{k} ‖^{2}} \\ & = & ‖ {\bar{H}}_{k}^{1 / 2} ‖^{2} (q_{k} - 2 m_{k} + \frac{β_{k} γ_{k}}{q_{k}}) . \end{matrix}

(57)

We also have

| \frac{‖ {\bar{H}}_{k} {\bar{w}}_{k} - {\bar{w}}_{k} ‖}{‖ {\bar{w}}_{k} ‖} - \frac{‖ {\bar{H}}_{k} {\bar{w}}_{k} - {\bar{s}}_{k} ‖}{‖ {\bar{w}}_{k} ‖} | ⩽ \frac{‖ {\bar{w}}_{k} - {\bar{s}}_{k} ‖}{‖ {\bar{w}}_{k} ‖} \to 0 .

(58)

For the subsequence \{k_{i} : ϕ_{k_{i}} > \frac{1}{2}\}, one has q_{k_{i}} \to 1, β_{k_{i}} γ_{k_{i}} \to 1 and m_{k_{i}} \to 1. Then (53) holds along this subsequence by (56)–(58). Thus, relation (53) holds for the whole sequence.

Next, we show that (53) implies the sufficient condition [6]

lim_{k \to \infty} \frac{‖ (B_{k} - \nabla^{2} f (x^{*})) s_{k} ‖}{‖ s_{k} ‖} = 0

(59)

holds. By (47), there is a constant λ_{min} > 0 such that (λ_{k})_{i} ⩾ λ_{min}, i = 1, 2, \dots, n, where (λ_{k})_{i} denote the eigenvalues of H_{k}. Letting w_{k} = \nabla^{2} f (x_{k + 1}) s_{k}, one has

\begin{matrix} ‖ (H_{k} - H^{*}) w_{k} ‖ & = & ‖ (H_{k} - H^{*}) \nabla^{2} f (x^{*}) s_{k} + (H_{k} - H^{*}) (\nabla^{2} f (x_{k + 1}) - \nabla^{2} f (x^{*})) s_{k} ‖ \\ & ⩾ & ‖ H_{k} (\nabla^{2} f (x^{*}) - B_{k}) s_{k} ‖ - ‖ (H_{k} - H^{*}) (\nabla^{2} f (x_{k + 1}) - \nabla^{2} f (x^{*})) s_{k} ‖ \\ & ⩾ & λ_{min} ‖ (\nabla^{2} f (x^{*}) - B_{k}) s_{k} ‖ - ‖ (H_{k} - H^{*}) (\nabla^{2} f (x_{k + 1}) - \nabla^{2} f (x^{*})) s_{k} ‖, \end{matrix}

and

\begin{matrix} \frac{‖ (H_{k} - H^{*}) w_{k} ‖}{‖ w_{k} ‖} & ⩾ & \frac{λ_{min} ‖ (\nabla^{2} f (x^{*}) - B_{k}) s_{k} ‖}{‖ \nabla^{2} f (x_{k + 1}) s_{k} ‖} - \frac{‖ (H_{k} - H^{*}) (\nabla^{2} f (x_{k + 1}) - \nabla^{2} f (x^{*})) s_{k} ‖}{‖ \nabla^{2} f (x_{k + 1}) s_{k} ‖} \\ & ⩾ & \frac{λ_{min} ‖ (\nabla^{2} f (x^{*}) - B_{k}) s_{k} ‖}{\frac{1}{λ_{min}} ‖ s_{k} ‖} - \frac{‖ (H_{k} - H^{*}) (\nabla^{2} f (x_{k + 1}) - \nabla^{2} f (x^{*})) s_{k} ‖}{\frac{1}{λ_{min}} ‖ s_{k} ‖} \\ & = & λ_{min}^{2} \frac{‖ (\nabla^{2} f (x^{*}) - B_{k}) s_{k} ‖}{‖ s_{k} ‖} - λ_{min} \frac{‖ (H_{k} - H^{*}) (\nabla^{2} f (x_{k + 1}) - \nabla^{2} f (x^{*})) s_{k} ‖}{‖ s_{k} ‖} . \end{matrix}

Let k \to \infty. Since x_{k} \to x^{*}, the last term above vanishes by (47) and the continuity of \nabla^{2} f, so we obtain from (53) that

lim_{k \to \infty} \frac{‖ (B_{k} - \nabla^{2} f (x^{*})) s_{k} ‖}{‖ s_{k} ‖} = 0,

which is the well-known Dennis–Moré condition. Hence, \{x_{k}\} converges superlinearly. □
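The lower bound for ‖(H_{k} - H^{*}) w_{k}‖ in the proof of Theorem 2 uses two facts. Assume, as the quasi-Newton notation indicates, that B_{k} = H_{k}^{-1} and H^{*} = (\nabla^{2} f (x^{*}))^{-1}. Assume also that H_{k} is symmetric positive definite with all eigenvalues at least λ_{min}. The two facts can then be written as:

```latex
\begin{aligned}
(H_{k} - H^{*}) \nabla^{2} f (x^{*})
  &= H_{k} \nabla^{2} f (x^{*}) - I
   = H_{k} \left( \nabla^{2} f (x^{*}) - H_{k}^{-1} \right)
   = H_{k} \left( \nabla^{2} f (x^{*}) - B_{k} \right), \\
\| H_{k} v \| &\geq λ_{min} \| v \| \quad \text{for all } v \in R^{n}.
\end{aligned}
```

The first identity turns the triangle-inequality step into the Dennis–Moré quantity ‖(\nabla^{2} f (x^{*}) - B_{k}) s_{k}‖. The second supplies the factor λ_{min}.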

5. Numerical Experiments

The results in [29] show that the MCQN update with the BFGS formula performs better numerically than the MCQN update with the DFP formula. Hence, we compare Algorithm 1 with the MCQN update with BFGS method and with the limited-memory BFGS method.
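For reference, the limited-memory BFGS comparator builds its search direction from only the m most recent pairs (s_{i}, y_{i}). The sketch below uses the standard two-loop recursion (see [21,22]) to do this without forming a matrix. The initial scaling H_{0} = γI with γ = s^{T} y / y^{T} y from the newest pair is a common choice, assumed here, since the paper does not state which scaling its L-BFGS implementation uses. The function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: return d = -H g, where H is the L-BFGS inverse-Hessian
    approximation built from the stored pairs (oldest first) with H0 = gamma * I.
    Pass only the m most recent pairs, e.g. s_list[-m:], y_list[-m:]."""
    q = list(grad)
    history = []  # (rho_i, alpha_i, s_i, y_i), newest pair first
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        history.append((rho, alpha, s, y))
    # Assumed initial scaling gamma = s^T y / y^T y from the newest pair.
    gamma = dot(s_list[-1], y_list[-1]) / dot(y_list[-1], y_list[-1]) if s_list else 1.0
    r = [gamma * qi for qi in q]
    for rho, alpha, s, y in reversed(history):  # oldest pair first
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]
```

With the memory parameter used in the experiments, a call would read `lbfgs_direction(g, s_hist[-15:], y_hist[-15:])`, where `s_hist` and `y_hist` are hypothetical histories of the correction pairs. By construction, the implicit matrix satisfies the secant equation H y = s for the newest pair.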

The 24 test problems, taken from [29,42,43,44], are listed with their initial points in Table 1. All of them have Hessians with special sparsity structures, such as band matrices, so a chordal extension of the sparsity pattern can be obtained easily. Hence, H_{k + 1} in Algorithm 1 can be written in the sparse clique-factorization form.
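For a band sparsity pattern, the chordal extension needs no fill-in. The natural ordering is already a perfect elimination ordering, and the maximal cliques are the index windows \{i, \dots, i + b\}, where b is the half-bandwidth. The sketch below checks this and lists the cliques, which are the index sets a clique-based factorization works with. The helper names are hypothetical, and the code covers only the graph step, not the clique-factorization formula of Algorithm 1.

```python
from itertools import combinations


def band_pattern(n, b):
    """Adjacency sets (0-based) of the sparsity graph of an n x n band matrix
    with half-bandwidth b: i ~ j iff 0 < |i - j| <= b."""
    return [{j for j in range(max(0, i - b), min(n, i + b + 1)) if j != i}
            for i in range(n)]


def is_perfect_elimination_ordering(adj, order):
    """True if the later neighbours of every vertex form a clique; a graph that
    admits such an ordering is chordal, i.e. it is its own chordal extension."""
    pos = {v: k for k, v in enumerate(order)}
    for v in order:
        later = [u for u in adj[v] if pos[u] > pos[v]]
        if any(c not in adj[a] for a, c in combinations(later, 2)):
            return False
    return True


def maximal_cliques_from_peo(adj, order):
    """Maximal cliques of a chordal graph: each has the form {v} plus the later
    neighbours of v under a perfect elimination ordering."""
    pos = {v: k for k, v in enumerate(order)}
    cands = {frozenset([v] + [u for u in adj[v] if pos[u] > pos[v]]) for v in order}
    return sorted(sorted(c) for c in cands if not any(c < d for d in cands))


if __name__ == "__main__":
    adj = band_pattern(6, 1)                      # tridiagonal pattern
    order = list(range(6))
    print(is_perfect_elimination_ordering(adj, order))
    print(maximal_cliques_from_peo(adj, order))
```

For a tridiagonal pattern (b = 1), the cliques are the consecutive pairs \{i, i + 1\}. A non-chordal pattern such as a 4-cycle fails the ordering test and would need fill-in edges.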

All methods were coded in MATLAB R2016a and run on a PC with an Intel Core i5 processor. Automatic differentiation was performed with ADMAT 2.0, which is available on the Cayuga Research GitHub page. Table 1 lists the test problems, and Tables 2–4 and Figures 1 and 2 report the numerical performance of the three methods. We use the following notation in the numerical results:

Pro: the problem number;
Dim: the dimension of the test problem;
Init: the initial point;
Method: the algorithm used to solve the problem;
MCQN-BFGS: the MCQN update with the BFGS method;
L-BFGS: the limited-memory BFGS method.

We adopted the following termination criterion:

\frac{‖ \nabla f (x) ‖}{n} ⩽ 10^{- 5} or ite ⩾ 5000,

where ite denotes the number of iterations.
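The stopping rule transcribes directly into code. The Euclidean norm is an assumption, since the text does not name the norm, and `should_stop` is a hypothetical helper.

```python
import math


def should_stop(grad, ite, tol=1e-5, max_iter=5000):
    """Stopping rule of the experiments: ||grad f(x)|| / n <= tol or ite >= max_iter.
    The Euclidean norm is assumed; the text does not specify the norm."""
    n = len(grad)
    gnorm = math.sqrt(sum(g * g for g in grad))
    return gnorm / n <= tol or ite >= max_iter
```

Because ‖\nabla f (x)‖ is divided by n, the rule is equivalent to ‖\nabla f (x)‖ ⩽ n · 10^{-5}. For large n this is looser than an unscaled tolerance of 10^{-5}.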

First, we tested all three methods on the 24 problems above with dimensions 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, and 10,000. We set the memory parameter m = 15 in the limited-memory BFGS method. Tables 2 and 3 give the numbers of iterations of the three methods on the test problems. In terms of the total number of iterations, Algorithm 1 outperformed the MCQN update with BFGS method on 11 problems (2, 4, 5, 7, 9, 10, 12, 14, 18, 23, 24). It outperformed the limited-memory BFGS method on 13 problems (1, 2, 3, 7, 9, 12, 15, 16, 18, 19, 20, 21, 23).
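The per-problem comparison can be reproduced from Tables 2 and 3. The sketch assumes that "outperformed" means a smaller total number of iterations summed over the ten dimensions. It uses three rows of Table 2 as an illustrative subset, and the helper names are hypothetical.

```python
# Iteration counts from Table 2 for problems 1, 2 and 4
# (columns: n = 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10,000).
TABLE2 = {
    1: {"Algorithm 1": [30, 38, 51, 78, 96, 146, 217, 301, 424, 527],
        "MCQN-BFGS": [29, 38, 51, 72, 95, 146, 192, 298, 424, 528],
        "L-BFGS": [26, 39, 96, 158, 360, 864, 1042, 1759, 3153, 3152]},
    2: {"Algorithm 1": [49, 90, 166, 308, 595, 1345, 2699, 5437, 3218, 2725],
        "MCQN-BFGS": [60, 95, 200, 384, 683, 1668, 3249, 6486, 4562, 3207],
        "L-BFGS": [59, 113, 260, 504, 999, 2481, 4947, 9887, 24732, 49391]},
    4: {"Algorithm 1": [31, 25, 34, 49, 43, 44, 43, 49, 52, 53],
        "MCQN-BFGS": [30, 29, 43, 45, 49, 58, 61, 62, 63, 56],
        "L-BFGS": [21, 27, 40, 54, 41, 38, 38, 56, 52, 50]},
}


def totals(table):
    """Total iterations over all dimensions, per problem and method."""
    return {p: {m: sum(v) for m, v in row.items()} for p, row in table.items()}


def wins(table, rival, method="Algorithm 1"):
    """Problems on which `method` needs fewer total iterations than `rival`."""
    tot = totals(table)
    return sorted(p for p, row in tot.items() if row[method] < row[rival])


if __name__ == "__main__":
    print(totals(TABLE2))
    print("beats MCQN-BFGS on:", wins(TABLE2, "MCQN-BFGS"))
    print("beats L-BFGS on:", wins(TABLE2, "L-BFGS"))
```

On this subset, the sketch reports problems 2 and 4 against MCQN-BFGS and problems 1 and 2 against L-BFGS, consistent with the lists above.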

For a more precise comparison, we adopted the performance profiles of [45], which are distribution functions of a performance metric. Let P and S denote the test set and the set of solvers, and let N_{p} and N_{s} denote the number of problems and the number of solvers, respectively. For a solver s \in S and a problem p \in P, let t_{p, s} denote the number of iterations (or the number of function evaluations) required to solve problem p with solver s. Using the performance ratio

r_{p, s} = \frac{t_{p, s}}{min \{t_{p, q} : q \in S\}},

we define

ρ_{s} (t) = \frac{1}{N_{p}} size \{p \in P : r_{p, s} \leq t\},

where r_{p, s} \leq r_{M} for some constant r_{M} for all p and s, with equality if and only if solver s cannot solve problem p. Therefore, ρ_{s} : R \to [0, 1] is the probability that the performance ratio r_{p, s} of solver s \in S is within a factor t \in R of the best possible ratio.
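The profile ρ_{s} (t) can be computed directly from a table of t_{p, s} values. In the sketch below, a failure is encoded as an infinite cost, which plays the role of the cap r_{M} for every finite t. The example data are an illustrative subset (Table 2, n = 1000, problems 1, 2 and 4), not the full experiment, and `performance_profile` is a hypothetical helper.

```python
import math


def performance_profile(times, ts):
    """Dolan-More performance profile.

    times[p][s] is t_{p,s} (e.g. iterations) for problem p and solver s; use
    math.inf when s fails on p. Returns {s: [rho_s(t) for t in ts]}, the fraction
    of problems with r_{p,s} <= t. Assumes each problem is solved by some solver."""
    solvers = list(next(iter(times.values())))
    ratios = {s: [] for s in solvers}
    for row in times.values():
        best = min(row[s] for s in solvers)
        for s in solvers:
            ratios[s].append(row[s] / best)  # performance ratio r_{p,s}
    n_p = len(times)
    return {s: [sum(r <= t for r in ratios[s]) / n_p for t in ts] for s in solvers}


if __name__ == "__main__":
    times = {1: {"Algorithm 1": 217, "MCQN-BFGS": 192},
             2: {"Algorithm 1": 2699, "MCQN-BFGS": 3249},
             4: {"Algorithm 1": 43, "MCQN-BFGS": 61}}
    print(performance_profile(times, [1.0, 1.2, 1.5]))
```

The value ρ_{s} (1) is the fraction of problems on which solver s is the best. The limit of ρ_{s} (t) as t grows is the fraction of problems it solves at all.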

Figure 1 compares the numbers of iterations of Algorithm 1 and the MCQN update with BFGS method using performance profiles. The top curve corresponds to Algorithm 1, which shows that Algorithm 1 performed better than the MCQN update with BFGS method. Similarly, Figure 2 shows that Algorithm 1 performed better than the limited-memory BFGS method.

Second, for a further comparison of Algorithm 1 and the MCQN update with BFGS method, we tested five different initial points, x_{0}, 2 x_{0}, 4 x_{0}, 7 x_{0}, and 10 x_{0}, where x_{0} is specified in Table 1. The dimension of all test problems was 1000. Table 4 reports the numbers of iterations required by the two methods on the 24 test problems. These results also show that Algorithm 1 was effective and superior to the MCQN update with BFGS method.

6. Conclusions

In this paper, we presented a symmetric rank-two quasi-Newton update based on an adjoint tangent condition for solving unconstrained optimization problems. Combining it with the idea of matrix completion, we proposed a sparse quasi-Newton algorithm and established its local and superlinear convergence. Numerical results on 24 test problems showed that the proposed algorithm outperformed the MCQN update with BFGS method and the limited-memory BFGS method on many of the problems, and that it can be used to solve large-scale unconstrained optimization problems.

Author Contributions

Conceptualization, H.C.; methodology, H.C. and X.A.; software, H.C. and X.A.; formal analysis, H.C.; writing—original draft preparation, H.C. and X.A.; writing—review and editing H.C. and X.A. All authors have read and agreed to the published version of the manuscript.

Funding

The work is supported by the National Natural Science Foundation of China, grant number 11701577; the Natural Science Foundation of Hunan Province, China, grant number 2020JJ5960; and the Scientific Research Foundation of Hunan Provincial Education Department, China, grant number 18C0253.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support this research and all code used in this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Zhou, W. A modified BFGS type quasi-Newton method with line search for symmetric nonlinear equations problems. J. Comput. Appl. Math. 2020, 367, 112454.
Zhou, W. A globally convergent BFGS method for symmetric nonlinear equations. J. Ind. Manag. Optim. 2021.
Zhou, W. A class of line search-type methods for nonsmooth convex regularized minimization. Soft Comput. 2021, 25, 7131–7141.
Zhou, W.; Zhang, L. A modified Broyden-like quasi-Newton method for nonlinear equations. J. Comput. Appl. Math. 2020, 372, 112744.
Sabi’u, J.; Muangchoo, K.; Shah, A.; Abubakar, A.B.; Jolaoso, L.O. A Modified PRP-CG Type Derivative-Free Algorithm with Optimal Choices for Solving Large-Scale Nonlinear Symmetric Equations. Symmetry 2021, 13, 234.
Dennis, J.E.; Moré, J.J. A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comput. 1974, 28, 549–560.
Dennis, J.E.; Moré, J.J. Quasi–Newton methods, motivation and theory. SIAM Rev. 1977, 19, 46–89.
Davidon, W.C. Variable metric method for minimization. In Research Development Report ANL-5990; University of Chicago: Chicago, IL, USA, 1959.
Fletcher, R.; Powell, M.J. A rapidly convergent descent method for minimization. Comput. J. 1963, 6, 163–168.
Broyden, C.G. The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA J. Appl. Math. 1970, 6, 76–90.
Fletcher, R. A new approach to variable metric algorithms. Comput. J. 1970, 13, 317–322.
Goldfarb, D. A family of variable-metric methods derived by variational means. Math. Comput. 1970, 24, 23–26.
Shanno, D.F. Conditioning of quasi-Newton methods for function minimization. Math. Comput. 1970, 24, 647–656.
Quasi–Newton Methods. In Optimization Theory and Methods; Springer Series in Optimization and Its Applications; Springer: Boston, MA, USA, 2006; Volume 1, pp. 203–301.
Quasi–Newton Methods. In Numerical Optimization; Springer Series in Operations Research and Financial Engineering; Springer: New York, NY, USA, 2006.
Sun, W.; Yuan, Y.X. Optimization Theory and Methods: Nonlinear Programming; Springer Science & Business Media: New York, NY, USA, 2006.
Andrei, N. Continuous Nonlinear Optimization for Engineering Applications in GAMS Technology; Springer Optimization and Its Applications Series; Springer: Berlin, Germany, 2017; Volume 121.
Schubert, L.K. Modification of a quasi-Newton method for nonlinear equations with a sparse Jacobian. Math. Comput. 1970, 24, 27–30.
Powell, M.J.D.; Toint, P.L. On the estimation of sparse Hessian matrices. SIAM J. Numer. Anal. 1979, 16, 1060–1074.
Toint, P. Towards an efficient sparsity exploiting Newton method for minimization. In Sparse Matrices and Their Uses; Academic Press: London, UK, 1981; pp. 57–88.
Nocedal, J. Updating quasi-Newton matrices with limited storage. Math. Comput. 1980, 35, 773–782.
Liu, D.C.; Nocedal, J. On the limited memory BFGS method for large scale optimization. Math. Program. 1989, 45, 503–528.
Griewank, A.; Toint, P.L. Partitioned variable metric updates for large structured optimization problems. Numer. Math. 1982, 39, 119–137.
Griewank, A.; Toint, P.L. Local convergence analysis for partitioned quasi-Newton updates. Numer. Math. 1982, 39, 429–448.
Griewank, A. The global convergence of partitioned BFGS on problems with convex decompositions and Lipschitzian gradients. Math. Program. 1991, 50, 141–175.
Cao, H.P.; Li, D.H. Partitioned quasi-Newton methods for sparse nonlinear equations. Comput. Optim. Appl. 2017, 66, 481–505.
Toint, P.L. On sparse and symmetric matrix updating subject to a linear equation. Math. Comput. 1977, 31, 954–961.
Fletcher, R. An optimal positive definite update for sparse Hessian matrices. SIAM J. Optim. 1995, 5, 192–218.
Yamashita, N. Sparse quasi-Newton updates with positive definite matrix completion. Math. Program. 2008, 115, 1–30.
Dai, Y.H.; Yamashita, N. Analysis of sparse quasi-Newton updates with positive definite matrix completion. J. Oper. Res. Soc. China 2014, 2, 39–56.
Dai, Y.H.; Yamashita, N. Convergence analysis of sparse quasi-Newton updates with positive definite matrix completion for two-dimensional functions. Numer. Algebr. Control. Optim. 2011, 1, 61–69.
Nazareth, J.L. If quasi-Newton then why not quasi-Cauchy. SIAG/Opt Views-and-News 1995, 6, 11–14.
Dennis, J.E., Jr.; Wolkowicz, H. Sizing and least-change secant methods. SIAM J. Numer. Anal. 1993, 30, 1291–1314.
Andrei, N. A diagonal quasi-Newton updating method for unconstrained optimization. Numer. Algorithms 2019, 81, 575–590.
Andrei, N. A new diagonal quasi-Newton updating method with scaled forward finite differences directional derivative for unconstrained optimization. Numer. Funct. Anal. Optim. 2019, 40, 1467–1488.
Andrei, N. Diagonal Approximation of the Hessian by Finite Differences for Unconstrained Optimization. J. Optim. Theory Appl. 2020, 185, 859–879.
Andrei, N. A new accelerated diagonal quasi-Newton updating method with scaled forward finite differences directional derivative for unconstrained optimization. Optimization 2021, 70, 345–360.
Leong, W.J.; Enshaei, S.; Kek, S.L. Diagonal quasi-Newton methods via least change updating principle with weighted Frobenius norm. Numer. Algorithms 2021, 86, 1225–1241.
Schlenkrich, S.; Griewank, A.; Walther, A. On the local convergence of adjoint Broyden methods. Math. Program. 2010, 121, 221–247.
Byrd, R.H.; Nocedal, J. A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J. Numer. Anal. 1989, 26, 727–739.
Byrd, R.H.; Nocedal, J.; Yuan, Y.X. Global convergence of a class of quasi-Newton methods on convex problems. SIAM J. Numer. Anal. 1987, 24, 1171–1190.
Moré, J.J.; Garbow, B.S.; Hillstrom, K.E. Testing unconstrained optimization software. ACM Trans. Math. Softw. (TOMS) 1981, 7, 17–41.
Luksan, L.; Matonoha, C.; Vlcek, J. Modified CUTE Problems for Sparse Unconstrained Optimization; Technical Report 1081; Institute of Computer Science, Academy of Sciences of the Czech Republic: Prague, Czech Republic, 2010.
Andrei, N. An unconstrained optimization test functions collection. Adv. Model. Optim. 2008, 10, 147–161.
Dolan, E.D.; Moré, J.J. Benchmarking optimization software with performance profiles. Math. Program. 2002, 91, 201–213.

Figure 1. Performance profiles of Algorithm 1 and MCQN-BFGS based on the numbers of iterations.

Figure 2. Performance profiles of Algorithm 1 and L-BFGS based on the numbers of iterations.

Table 1. The test problems.

Pro	Test function	Init
1	TRIDIA [29]	$x_{0} = {(1, 1, \dots, 1)}^{T}$
2	the chained Rosenbrock problem [29]	$x_{0} = {(- 1.2, - 1, \dots, - 1.2, - 1)}^{T}$
3	the boundary value problem [29]	$x_{0} = {(\frac{1}{n + 1}, \frac{2}{n + 1}, \dots, \frac{n}{n + 1})}^{T}$
4	Broyden tridiagonal function [42]	$x_{0} = {(- 1, - 1, \dots, - 1)}^{T}$
5	DQRTIC [43]	$x_{0} = {(2, 2, \dots, 2)}^{T}$
6	EDENSCH [43]	$x_{0} = {(0, 0, \dots, 0)}^{T}$
7	ENGVAL1 [43]	$x_{0} = {(2, 2, \dots, 2)}^{T}$
8	COSINE [43]	$x_{0} = {(1, 1, \dots, 1)}^{T}$
9	ERRINROS-modified [43]	$x_{0} = {(- 1, - 1, \dots, - 1)}^{T}$
10	FREUROTH [43]	$x_{0} = {(0.5, - 2, 0, \dots, 0)}^{T}$
11	MOREBV (different starting point) [43]	$x_{0} = {(0.5, 0.5, \dots, 0.5)}^{T}$
12	TOINTGSS [43]	$x_{0} = {(3, 3, \dots, 3)}^{T}$
13	SCHMVETT [43]	$x_{0} = {(3, 3, \dots, 3)}^{T}$
14	Extended Freudenstein and Roth function [44]	$x_{0} = {(0.5, - 2, \dots, 0.5, - 2)}^{T}$
15	Raydan 1 function [44]	$x_{0} = {(1, 1, \dots, 1)}^{T}$
16	Generalized Tridiagonal function [44]	$x_{0} = {(2, 2, \dots, 2)}^{T}$
17	Extended Himmelblau function [44]	$x_{0} = {(1, 1, \dots, 1)}^{T}$
18	Generalized PSCI function [44]	$x_{0} = {(3, 0.1, \dots, 3, 0.1)}^{T}$
19	Extended Tridiagonal 2 function [44]	$x_{0} = {(1, 1, \dots, 1)}^{T}$
20	Raydan 2 function [44]	$x_{0} = {(1, 1, \dots, 1)}^{T}$
21	Extended Freudenstein and Roth function [44]	$x_{0} = {(1, 1, \dots, 1)}^{T}$
22	DQDRTIC function [44]	$x_{0} = {(3, 3, \dots, 3)}^{T}$
23	Generalized Quartic function [44]	$x_{0} = {(1, 1, \dots, 1)}^{T}$
24	HIMMELBG function [44]	$x_{0} = {(1.5, 1.5, \dots, 1.5)}^{T}$

Table 2. Numbers of iterations for problems 1–12.

Dim	10	20	50	100	200	500	1000	2000	5000	10,000
(1) Algorithm 1	30	38	51	78	96	146	217	301	424	527
(1) MCQN-BFGS	29	38	51	72	95	146	192	298	424	528
(1) L-BFGS	26	39	96	158	360	864	1042	1759	3153	3152
(2) Algorithm 1	49	90	166	308	595	1345	2699	5437	3218	2725
(2) MCQN-BFGS	60	95	200	384	683	1668	3249	6486	4562	3207
(2) L-BFGS	59	113	260	504	999	2481	4947	9887	24,732	49,391
(3) Algorithm 1	16	26	42	58	59	51	49	60	102	399
(3) MCQN-BFGS	15	26	42	50	59	71	54	69	101	402
(3) L-BFGS	39	114	279	700	1503	1659	2695	3370	8867	27,471
(4) Algorithm 1	31	25	34	49	43	44	43	49	52	53
(4) MCQN-BFGS	30	29	43	45	49	58	61	62	63	56
(4) L-BFGS	21	27	40	54	41	38	38	56	52	50
(5) Algorithm 1	30	48	60	92	109	99	92	89	81	81
(5) MCQN-BFGS	35	49	67	94	111	108	112	98	84	84
(5) L-BFGS	28	27	34	31	33	39	41	43	54	81
(6) Algorithm 1	23	26	38	44	54	55	47	51	61	51
(6) MCQN-BFGS	17	27	36	53	60	55	54	51	50	54
(6) L-BFGS	17	19	19	22	21	23	24	26	24	25
(7) Algorithm 1	16	21	21	19	17	17	16	15	16	15
(7) MCQN-BFGS	20	22	23	22	15	15	15	17	17	16
(7) L-BFGS	20	22	26	21	22	21	25	27	28	30
(8) Algorithm 1	22	23	24	21	23	27	27	28	29	30
(8) MCQN-BFGS	23	25	26	26	26	27	28	28	29	30
(8) L-BFGS	9	9	9	9	10	10	10	10	10	10
(9) Algorithm 1	74	122	134	149	137	180	148	153	172	170
(9) MCQN-BFGS	106	125	145	171	199	181	168	171	174	179
(9) L-BFGS	163	245	216	196	189	190	163	169	171	192
(10) Algorithm 1	45	45	48	39	45	43	45	145	161	145
(10) MCQN-BFGS	47	48	49	43	47	48	41	244	204	279
(10) L-BFGS	24	25	24	24	24	22	22	22	20	22
(11) Algorithm 1	24	45	97	121	82	67	35	34	21	11
(11) MCQN-BFGS	24	45	98	121	82	67	35	34	21	11
(11) L-BFGS	33	103	136	127	85	49	32	21	10	9
(12) Algorithm 1	13	11	11	12	9	5	6	2	3	2
(12) MCQN-BFGS	15	11	13	12	12	7	6	2	3	2
(12) L-BFGS	6	8	9	10	13	11	10	10	9	10

Table 3. Numbers of iterations for problems 13–24.

Dim	10	20	50	100	200	500	1000	2000	5000	10,000
(13) Algorithm 1	19	20	21	22	18	17	15	14	13	12
(13) MCQN-BFGS	19	20	21	22	18	17	15	14	13	12
(13) L-BFGS	16	18	17	18	18	17	18	18	17	18
(14) Algorithm 1	36	52	95	111	169	262	612	567	1062	1114
(14) MCQN-BFGS	37	54	86	120	190	300	594	679	1062	1126
(14) L-BFGS	10	10	10	10	10	10	10	10	10	11
(15) Algorithm 1	11	13	23	30	39	45	60	97	196	295
(15) MCQN-BFGS	11	12	21	28	37	45	64	95	196	295
(15) L-BFGS	13	21	32	50	79	122	207	338	402	770
(16) Algorithm 1	29	31	47	69	110	149	165	174	170	163
(16) MCQN-BFGS	29	35	51	70	100	146	164	173	171	164
(16) L-BFGS	25	65	80	162	160	156	151	150	144	140
(17) Algorithm 1	16	15	14	11	13	12	12	12	10	9
(17) MCQN-BFGS	14	11	18	12	13	12	12	12	10	9
(17) L-BFGS	8	8	8	8	8	8	8	8	8	8
(18) Algorithm 1	49	21	23	21	19	14	22	21	13	11
(18) MCQN-BFGS	43	28	22	29	20	15	20	25	23	26
(18) L-BFGS	36	34	37	36	39	36	40	34	41	37
(19) Algorithm 1	13	12	11	13	12	12	12	10	7	5
(19) MCQN-BFGS	13	12	11	13	12	12	12	10	7	5
(19) L-BFGS	12	14	15	16	16	17	16	15	16	17
(20) Algorithm 1	5	5	4	4	4	4	4	3	3	3
(20) MCQN-BFGS	5	5	4	4	4	4	4	3	3	3
(20) L-BFGS	7	7	7	7	7	7	7	7	7	7
(21) Algorithm 1	14	11	11	21	18	18	35	28	26	43
(21) MCQN-BFGS	13	10	11	22	18	18	35	28	26	43
(21) L-BFGS	10	13	15	19	22	30	36	51	56	64
(22) Algorithm 1	36	36	26	28	26	27	27	30	28	29
(22) MCQN-BFGS	36	36	26	28	26	27	27	30	28	29
(22) L-BFGS	13	13	16	16	17	16	16	15	19	19
(23) Algorithm 1	13	17	12	21	12	14	8	11	10	9
(23) MCQN-BFGS	16	18	22	25	15	13	12	13	12	10
(23) L-BFGS	14	14	16	15	17	24	27	27	27	31
(24) Algorithm 1	11	13	18	24	31	37	28	21	13	7
(24) MCQN-BFGS	12	15	19	25	32	37	28	21	13	7
(24) L-BFGS	10	10	10	10	10	10	10	10	10	10

Table 4. Results of Dim = 1000 with different initial points.

Pro	Algorithm 1					MCQN-BFGS
Init	$x_{0}$	$2 x_{0}$	$4 x_{0}$	$7 x_{0}$	$10 x_{0}$	$x_{0}$	$2 x_{0}$	$4 x_{0}$	$7 x_{0}$	$10 x_{0}$
(1)	217	189	192	194	196	192	210	213	220	213
(2)	2699	2684	2641	1192	2694	3249	4850	5056	2157	4961
(3)	49	47	55	68	65	54	213	210	228	294
(4)	43	47	36	81	80	60	213	210	228	294
(5)	92	83	86	91	89	112	106	85	94	94
(6)	47	47	47	47	47	54	54	54	54	54
(7)	16	30	53	19	21	15	19	27	21	22
(8)	27	31	16	35	54	28	31	16	33	56
(9)	148	159	154	184	157	168	151	147	191	154
(10)	45	45	164	170	175	41	263	203	192	194
(11)	35	70	96	112	127	35	70	96	112	127
(12)	6	6	2	2	2	6	6	2	2	2
(13)	15	16	17	14	14	15	16	17	14	14
(14)	612	312	504	511	318	594	523	503	481	563
(15)	60	57	208	452	1024	64	58	203	532	941
(16)	165	180	173	153	129	164	183	191	197	215
(17)	12	11	10	7	21	12	11	8	7	30
(18)	22	23	25	60	41	20	23	30	65	75
(19)	12	12	14	16	24	12	12	14	22	20
(20)	4	6	5	6	4	4	6	5	6	4
(21)	35	30	35	112	719	35	30	35	112	719
(22)	27	28	28	29	29	27	28	28	29	29
(23)	8	22	23	14	35	12	13	24	19	70
(24)	28	20	21	21	21	28	20	21	21	21

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


MDPI and ACS Style

Cao, H.; An, X. A Sparse Quasi-Newton Method Based on Automatic Differentiation for Solving Unconstrained Optimization Problems. Symmetry 2021, 13, 2093. https://doi.org/10.3390/sym13112093

