Article

A Sparse Quasi-Newton Method Based on Automatic Differentiation for Solving Unconstrained Optimization Problems

1 School of Science, Xi'an Polytechnic University, Xi'an 710048, China
2 School of Science, Xi'an Technological University, Xi'an 710021, China
* Author to whom correspondence should be addressed.
Symmetry 2021, 13(11), 2093; https://doi.org/10.3390/sym13112093
Submission received: 18 September 2021 / Revised: 25 October 2021 / Accepted: 29 October 2021 / Published: 4 November 2021
(This article belongs to the Section Mathematics)

Abstract

In this paper, we introduce a sparse and symmetric matrix completion quasi-Newton method based on automatic differentiation for solving unconstrained optimization problems in which the sparsity structure of the Hessian is available. The proposed method is a kind of matrix completion quasi-Newton method with some nice properties: it preserves the sparsity of the Hessian exactly and satisfies the quasi-Newton equation approximately. Under the usual assumptions, local and superlinear convergence are established. We tested the performance of the method, showing that the new method is effective and superior to the matrix completion quasi-Newton update with the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method and the limited-memory BFGS method.

1. Introduction

We concentrate on the unconstrained optimization problem
\min f(x), \quad x \in \mathbb{R}^n,
where $f: \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable function, and $\nabla f(x)$ and $\nabla^2 f(x)$ denote the gradient and Hessian of $f$ at $x$, respectively. The first-order necessary condition of (1) is
\nabla f(x) = 0,
which can be written as the symmetric nonlinear equations
F(x) = 0,
where $F: \mathbb{R}^n \to \mathbb{R}^n$ is a continuously differentiable mapping whose Jacobian is symmetric, i.e., $F'(x) = F'(x)^T$. Such symmetric nonlinear systems are closely related to many practical problems, such as the gradient mapping of unconstrained optimization problems, the Karush–Kuhn–Tucker (KKT) system of equality-constrained optimization problems, the discretized two-point boundary value problem, and the saddle point problem [1,2,3,4,5].
For small or medium-scale problems, classical quasi-Newton methods enjoy superlinear convergence without the calculation of the Hessian [6,7]. Let x k be the current iterative point and B k be the symmetric approximation of the Hessian; then the iteration { x k } generated by quasi-Newton methods is
x_{k+1} = x_k + \alpha_k d_k,
where $\alpha_k > 0$ is a step length obtained by some line search or other strategy. The search direction $d_k$ is obtained by solving the linear system
B_k d_k + \nabla f(x_k) = 0,
where the quasi-Newton matrix $B_k$ is an approximation of $\nabla^2 f(x_k)$ and satisfies the secant condition
B_{k+1} s_k = y_k,
where $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$. The matrix $B_k$ can be updated by different update formulae. The Davidon–Fletcher–Powell (DFP) update,
B_{k+1} = \left(I - \frac{y_k s_k^T}{y_k^T s_k}\right) B_k \left(I - \frac{s_k y_k^T}{y_k^T s_k}\right) + \frac{y_k y_k^T}{y_k^T s_k} = B_k + \frac{(y_k - B_k s_k) y_k^T + y_k (y_k - B_k s_k)^T}{y_k^T s_k} - \frac{(y_k - B_k s_k)^T s_k}{(y_k^T s_k)^2}\, y_k y_k^T,
was first proposed by Davidon [8] and developed by Fletcher and Powell [9]. The Broyden–Fletcher–Goldfarb–Shanno (BFGS) update,
B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k},
was proposed independently by Broyden [10], Fletcher [11], Goldfarb [12], and Shanno [13]. One can find more on the topic in references [14,15,16,17].
If we let $H_k = B_k^{-1}$, then using the Sherman–Morrison formula we obtain Broyden's family update:
H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{s_k^T y_k} + \phi_k v_k v_k^T,
where $\phi_k \in [0, 1]$ is a parameter and
v_k = \sqrt{y_k^T H_k y_k}\left(\frac{s_k}{s_k^T y_k} - \frac{H_k y_k}{y_k^T H_k y_k}\right).
When $\phi_k \equiv 1$, we obtain the BFGS update; when $\phi_k \equiv 0$, we obtain the DFP update.
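For concreteness, the following minimal NumPy sketch (our illustration, not code from the paper) implements Broyden's family update of $H_k$ and applies it to a small quadratic; $\phi_k = 1$ gives BFGS and $\phi_k = 0$ gives DFP.
```python
import numpy as np

def broyden_family_update(H, s, y, phi=1.0):
    """One Broyden-family update of the inverse Hessian approximation H.
    phi = 1 gives BFGS, phi = 0 gives DFP."""
    Hy = H @ y
    sy = s @ y                        # curvature s_k^T y_k, assumed positive
    yHy = y @ Hy
    v = np.sqrt(yHy) * (s / sy - Hy / yHy)
    return H - np.outer(Hy, Hy) / yHy + np.outer(s, s) / sy + phi * np.outer(v, v)

# usage on a toy strictly convex quadratic f(x) = 0.5 x^T A x
A = np.diag([1.0, 4.0, 9.0])
x, H = np.ones(3), np.eye(3)
for _ in range(5):
    g = A @ x
    if np.linalg.norm(g) < 1e-12:
        break
    d = -H @ g                        # quasi-Newton direction
    alpha = -(g @ d) / (d @ A @ d)    # exact line search for the quadratic
    x_new = x + alpha * d
    s, y = x_new - x, A @ x_new - g
    H = broyden_family_update(H, s, y, phi=1.0)
    x = x_new
print(np.linalg.norm(A @ x))          # gradient norm ~ 0 after at most n steps
```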
However, quasi-Newton methods are not desirable for large-scale problems, because the full matrix $B_k$ must be stored. To overcome this drawback, the so-called sparse quasi-Newton methods [14] have received much attention. As early as 1970, Schubert [18] proposed a sparse Broyden rank-one method. Powell and Toint [19] and Toint [20] then studied sparse quasi-Newton methods.
Existing sparse quasi-Newton methods usually use a sparse symmetric matrix as an approximation of the Hessian so that both matrices take the same form or have similar structures. If the limited-memory technique [21,22] is adopted, which stores only several pairs $(s_k, y_k)$ and constructs $H_k$ by updating the initial matrix $H_0$ $m$ times, the method can be widely used in practical optimization problems. On the other hand, many large-scale problems in scientific fields take the partially separable form
f(x) = \sum_{i=1}^{m} f_i(x),
where each element function $f_i$, $i = 1, \ldots, m$, depends on only a few variables. For partially separable unconstrained optimization problems, the partitioned BFGS method [23,24] was proposed and performs well in practice. The partitioned BFGS method updates the matrix $B_k^i$ of each element function $f_i(x)$ separately via BFGS updating and sums these matrices to construct the next quasi-Newton matrix $B_{k+1}$. Since each element function $f_i(x)$ involves far fewer than $n$ variables, each matrix $B_{k+1}^i$ is small, and the resulting matrix $B_{k+1}$ is sparse (see the sketch after this paragraph). The quasi-Newton direction is the solution of the linear equations
\sum_{i=1}^{m} B_k^i d_k = -\nabla f(x_k).
However, the partitioned BFGS method preserves the positive definiteness of $B_k$ only if each element function $f_i(x)$ is convex, so it is usually implemented with a trust region strategy [25]. Recently, for partially separable nonlinear equations, Cao and Li [26] introduced two kinds of partitioned quasi-Newton methods and proved their global and superlinear convergence.
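A minimal sketch (ours, with a made-up chain-structured index pattern, not taken from the references) of how summing small element matrices $B_k^i$ produces a sparse full matrix $B_k$:
```python
import numpy as np

def assemble_partitioned_matrix(element_blocks, index_sets, n):
    """Scatter each small element matrix B_k^i into the full n-by-n matrix and sum them."""
    B = np.zeros((n, n))
    for Bi, idx in zip(element_blocks, index_sets):
        B[np.ix_(idx, idx)] += Bi
    return B

# hypothetical partially separable structure: element i touches variables (i, i+1)
n = 6
index_sets = [[i, i + 1] for i in range(n - 1)]
element_blocks = [np.eye(2) for _ in index_sets]   # stand-ins for per-element BFGS matrices
B = assemble_partitioned_matrix(element_blocks, index_sets, n)
print(B)                                           # tridiagonal (hence sparse) pattern
```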
Another efficient class of sparse quasi-Newton methods is designed to exploit the sparsity structure of the Hessian. We assume that for all $x \in \mathbb{R}^n$,
(\nabla^2 f(x))_{i,j} = 0, \quad \forall (i,j) \notin F,
where $F \subseteq \{1, \ldots, n\} \times \{1, \ldots, n\}$. References [27,28] proposed sparse quasi-Newton methods in which $H_{k+1}$ satisfies the secant equation
H_{k+1} y_k = s_k
and the sparsity condition
(H_{k+1})_{ij} = 0, \quad (i,j) \notin F
simultaneously, where $H_{k+1}$ is an approximate inverse Hessian. Recently, Yamashita [29] proposed another type of matrix completion quasi-Newton (MCQN) update for solving problem (1) with a sparse Hessian and proved local and superlinear convergence for MCQN updates with the DFP method. Reference [30] established the convergence of MCQN updates with the whole Broyden convex family. However, a global convergence analysis was presented in [31] only for two-dimensional functions with uniformly positive definite Hessians.
Another kind of quasi-Newton method for solving large-scale unconstrained optimization problems is the diagonal quasi-Newton method, in which the Hessian of the objective function is approximated by a diagonal matrix with positive elements. The first version was developed by Nazareth [32], where the quasi-Newton matrix satisfies the least-change and weak secant condition [33]:
\min \|H_{k+1} - H_k\|_F \quad \text{s.t.} \quad y_k^T H_{k+1} y_k = y_k^T s_k,
where $\|\cdot\|_F$ is the standard Frobenius norm. Recently, Andrei [34] developed a diagonal quasi-Newton method whose diagonal elements satisfy the least-change weak secant condition (3) and minimize the trace of the update. In addition, other techniques, such as forward and central finite differences, the variational principle with a weighted norm, and the generalized Frobenius norm, can be used to derive different kinds of diagonal quasi-Newton methods [35,36,37]. Under the usual assumptions, the diagonal quasi-Newton method is linearly convergent. The authors of [38] adopted a derivation similar to that of the DFP method and obtained a low-memory diagonal quasi-Newton method. Using the Armijo line search, they established global convergence and gave sufficient conditions for the method to be superlinearly convergent.
The main contribution of our paper is to propose a sparse quasi-Newton algorithm based on automatic differentiation for solving (1). Firstly, similarly to the derivation of the BFGS update, we derive the symmetric rank-two quasi-Newton update
B_{k+1} = B_k - \frac{B_k \sigma_k \sigma_k^T B_k}{\sigma_k^T B_k \sigma_k} + \frac{\nabla^2 f(x_{k+1}) \sigma_k \sigma_k^T \nabla^2 f(x_{k+1})}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k},
where $\sigma_k \in \mathbb{R}^n$ and $B_{k+1}$ satisfies the adjoint tangent condition [39]
\sigma_k^T B_{k+1} = \sigma_k^T \nabla^2 f(x_{k+1}).
For an $n \times n$ matrix $A$, we write $A \succ 0$ if $A$ is positive definite. Then, when $B_k \succ 0$, $B_{k+1} \succ 0$ if and only if $\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k > 0$, which means that the proposed update (4) preserves positive definiteness, as BFGS updating does. Moreover, when $B_0$ is positive definite, the matrices $\{B_k\}$ generated by the proposed update (4) remain positive definite when solving (1) with uniformly positive definite Hessians. In this work we focus on the choice $\sigma_k = s_k$; then the proposed rank-two quasi-Newton update (4) satisfies
B_{k+1} s_k = \nabla^2 f(x_{k+1}) s_k,
which means that $B_{k+1}$ matches $\nabla^2 f(x_{k+1})$ exactly along the direction $s_k$. Several lemmas are given to present the properties of the proposed rank-two quasi-Newton update formula. Secondly, combining this update with the idea of the MCQN method [29], we propose a sparse and symmetric quasi-Newton algorithm for solving (1). Under appropriate conditions, local and superlinear convergence are established. Finally, our numerical results illustrate that the proposed algorithm performs satisfactorily.
The paper is organized as follows. In Section 2, we introduce a symmetric rank-two quasi-Newton update based on automatic differentiation and prove several nice properties. In Section 3, by using the idea of matrix completion, we present a sparse quasi-Newton algorithm and show some nice properties. In Section 4, we prove the local and superlinear convergence of the algorithm proposed in Section 3. Numerical results are listed in Section 5, which verify that the proposed algorithm is very encouraging. Finally, we give the conclusion.

2. A New Symmetric Rank-Two Quasi–Newton Update

Similarly to the derivation of the BFGS update, we now derive a new symmetric rank-two quasi-Newton update and prove several lemmas. Let
B_{k+1} = B_k + \Delta_k,
where $\Delta_k$ is a rank-two matrix and $B_{k+1}$ satisfies the condition
\sigma_k^T B_{k+1} = \sigma_k^T \nabla^2 f(x_{k+1}),
where $\sigma_k \in \mathbb{R}^n$ and $\sigma_k \neq 0$. Similarly to the derivation of BFGS, we obtain the following symmetric rank-two update:
B_{k+1} = B_k - \frac{B_k \sigma_k \sigma_k^T B_k}{\sigma_k^T B_k \sigma_k} + \frac{\nabla^2 f(x_{k+1}) \sigma_k \sigma_k^T \nabla^2 f(x_{k+1})}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k}.
If we denote $H_k = B_k^{-1}$ and $H_{k+1} = B_{k+1}^{-1}$, then (6) can be expressed as
H_{k+1} = H_k - \frac{H_k \nabla^2 f(x_{k+1}) \sigma_k \sigma_k^T + \sigma_k \sigma_k^T \nabla^2 f(x_{k+1}) H_k}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k} + \left(1 + \frac{\sigma_k^T \nabla^2 f(x_{k+1}) H_k \nabla^2 f(x_{k+1}) \sigma_k}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k}\right) \frac{\sigma_k \sigma_k^T}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k}.
It can be seen that the update (6) involves the Hessian $\nabla^2 f(x)$, but we do not need to compute it explicitly in practice. For given vectors $x$, $s$, and $\sigma$, we can obtain $\nabla^2 f(x) s$ and $\sigma^T \nabla^2 f(x)$ exactly by the forward and reverse modes of automatic differentiation.
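The paper uses ADMAT in MATLAB for these products. Purely as an illustration (our own sketch, with an assumed test function), the same Hessian-vector products can be formed in Python with JAX by composing its forward (jvp) and reverse (grad/vjp) modes, without ever building $\nabla^2 f(x)$:
```python
import jax
import jax.numpy as jnp

def f(x):                                   # assumed smooth test function
    return jnp.sum(jnp.cos(x)) + 0.5 * jnp.sum(x ** 2)

def hess_vec(x, s):
    """Forward-over-reverse AD: returns (∇²f(x)) s."""
    return jax.jvp(jax.grad(f), (x,), (s,))[1]

def vec_hess(x, sigma):
    """Reverse mode applied to the gradient: returns σᵀ ∇²f(x) as a vector."""
    _, vjp_fn = jax.vjp(jax.grad(f), x)
    return vjp_fn(sigma)[0]

x = jnp.array([0.3, -1.2, 2.0])
s = jnp.array([1.0, 0.0, -1.0])
sigma = jnp.array([0.5, 0.5, 0.5])
print(hess_vec(x, s))        # ∇²f(x) s
print(vec_hess(x, sigma))    # σᵀ ∇²f(x); equals (∇²f(x) σ)ᵀ since the Hessian is symmetric
```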
Next, several lemmas are presented.
Lemma 1.
Suppose that $B_k \succ 0$ and $B_{k+1}$ is updated by (6); then $B_{k+1} \succ 0$ if and only if $\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k > 0$.
Proof. 
According to the condition (5), one has
\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k = \sigma_k^T B_{k+1} \sigma_k.
If $B_{k+1}$ is positive definite, it follows that $\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k > 0$.
Conversely, let $\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k > 0$ and $B_k \succ 0$. Then for any $d_k \in \mathbb{R}^n$, $d_k \neq 0$, it can be derived from (6) that
d_k^T B_{k+1} d_k = d_k^T B_k d_k - \frac{(d_k^T B_k \sigma_k)^2}{\sigma_k^T B_k \sigma_k} + \frac{(d_k^T \nabla^2 f(x_{k+1}) \sigma_k)^2}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k}.
Since $B_k \succ 0$, there exists a symmetric matrix $B_k^{1/2} \succ 0$ such that $B_k = B_k^{1/2} B_k^{1/2}$. Then the Cauchy–Schwarz inequality gives
(d_k^T B_k \sigma_k)^2 = \left((B_k^{1/2} d_k)^T (B_k^{1/2} \sigma_k)\right)^2 \leq \|B_k^{1/2} d_k\|^2 \cdot \|B_k^{1/2} \sigma_k\|^2 = (d_k^T B_k d_k)(\sigma_k^T B_k \sigma_k),
where equality holds if and only if $d_k = \lambda_k \sigma_k$ for some $\lambda_k \neq 0$.
If the inequality (8) holds strictly, one has
d_k^T B_{k+1} d_k > d_k^T B_k d_k - d_k^T B_k d_k + \frac{(d_k^T \nabla^2 f(x_{k+1}) \sigma_k)^2}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k} \geq 0.
If equality holds in (8), i.e., there exists $\lambda_k \neq 0$ such that $d_k = \lambda_k \sigma_k$, then it can be deduced from (8) that
d_k^T B_{k+1} d_k = \frac{(d_k^T \nabla^2 f(x_{k+1}) \sigma_k)^2}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k} = \lambda_k^2\, \sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k > 0.
In conclusion, $d_k^T B_{k+1} d_k > 0$ for all $d_k \in \mathbb{R}^n$, $d_k \neq 0$.    □
Lemma 2.
If we rewrite the update Formula (7) as $H_{k+1} = H_k + E$, where $H_{k+1}$ is symmetric and satisfies $\sigma_k^T = \sigma_k^T \nabla^2 f(x_{k+1}) H_{k+1}$, then $E$ is the solution of the following minimization problem:
\min_{E} \|E\|_W \quad \text{s.t.} \quad E^T = E, \quad \sigma_k^T \nabla^2 f(x_{k+1}) E = \eta^T,
where $\eta^T = \sigma_k^T - \sigma_k^T \nabla^2 f(x_{k+1}) H_k$ and the weight matrix $W$ satisfies $\sigma_k^T W = \sigma_k^T \nabla^2 f(x_{k+1})$.
Proof. 
The Lagrangian function of this convex programming problem is
\varphi = \frac{1}{4}\operatorname{trace}(W E^T W E) + \operatorname{trace}\left(\Lambda^T (E^T - E)\right) - \lambda^T W \left(E\, \nabla^2 f(x_{k+1}) \sigma_k - \eta\right),
where $\Lambda$ and $\lambda$ are Lagrange multipliers. Moreover,
\frac{\partial \varphi}{\partial E_{ij}} = \frac{1}{4}\left(\operatorname{trace}(W e_j e_i^T W E) + \operatorname{trace}(W E^T W e_i e_j^T)\right) + \operatorname{trace}\left(\Lambda(e_j e_i^T - e_i e_j^T)\right) - \lambda^T W e_i e_j^T \nabla^2 f(x_{k+1}) \sigma_k = 0,
or, using the symmetry and cyclic permutations,
\frac{1}{2}[W E W]_{ij} + \Lambda_{ij} - \Lambda_{ji} = \left[W \lambda \sigma_k^T \nabla^2 f(x_{k+1})\right]_{ij}.
Taking the transpose and adding eliminates $\Lambda$ and yields
W E W = W \lambda \sigma_k^T \nabla^2 f(x_{k+1}) + \nabla^2 f(x_{k+1}) \sigma_k \lambda^T W,
and by $\sigma_k^T W = \sigma_k^T \nabla^2 f(x_{k+1})$ and the nonsingularity of $W$ we have
E = \lambda \sigma_k^T + \sigma_k \lambda^T.
Substituting (9) into $\sigma_k^T \nabla^2 f(x_{k+1}) E = \eta^T$ and rearranging gives
\lambda = \frac{\eta - \sigma_k\, \lambda^T \nabla^2 f(x_{k+1}) \sigma_k}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k}.
Premultiplying by $\sigma_k^T \nabla^2 f(x_{k+1})$ gives
\lambda^T \nabla^2 f(x_{k+1}) \sigma_k = \frac{1}{2}\, \frac{\sigma_k^T \nabla^2 f(x_{k+1}) \eta}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k},
so we have
\lambda = \frac{\eta - \frac{1}{2}\sigma_k \frac{\sigma_k^T \nabla^2 f(x_{k+1}) \eta}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k}}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k} = \frac{\sigma_k - H_k \nabla^2 f(x_{k+1}) \sigma_k - \frac{1}{2}\sigma_k\left(1 - \frac{\sigma_k^T \nabla^2 f(x_{k+1}) H_k \nabla^2 f(x_{k+1}) \sigma_k}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k}\right)}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k}.
Substituting this into (9) gives the result (7).    □
Lemma 3.
Suppose $H_k = B_k^{-1} \succ 0$ and $\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k > 0$. Then $B_{k+1}$ given by (6) solves the variational problem
\min_{B \succ 0} \psi(H_k^{1/2} B H_k^{1/2}) \quad \text{s.t.} \quad B^T = B, \quad \sigma_k^T B = \sigma_k^T \nabla^2 f(x_{k+1}).
Proof. 
Recall the definition of $\psi$: $\psi: \mathbb{R}^{n \times n} \to \mathbb{R}$ [40] is given by
\psi(A) = \operatorname{tr}(A) - \ln\det(A),
so we have
\psi(H_k^{1/2} B H_k^{1/2}) = \operatorname{trace}(H_k B) - \ln(\det H_k \det B) = \psi(H_k B) = \psi(B H_k).
We form the Lagrangian function
L(B, \Lambda, \lambda) = \frac{1}{2}\psi(H_k^{1/2} B H_k^{1/2}) + \operatorname{trace}\left(\Lambda^T(B^T - B)\right) + \left(\sigma_k^T B - \sigma_k^T \nabla^2 f(x_{k+1})\right)\lambda = \frac{1}{2}\left(\operatorname{trace}(H_k B) - \ln(\det H_k) - \ln(\det B)\right) + \operatorname{trace}\left(\Lambda^T(B^T - B)\right) + \left(\sigma_k^T B - \sigma_k^T \nabla^2 f(x_{k+1})\right)\lambda,
where $\Lambda$ and $\lambda$ are the Lagrange multipliers. Moreover, one has
\frac{\partial L}{\partial B_{ij}} = \frac{1}{2}\left(\operatorname{trace}(H_k e_i e_j^T) - (B^{-1})_{ji}\right) + \operatorname{trace}\left(\Lambda^T(e_j e_i^T - e_i e_j^T)\right) + \sigma_k^T e_i e_j^T \lambda = \frac{1}{2}\left((H_k)_{ji} - (B^{-1})_{ji}\right) + \Lambda_{ji} - \Lambda_{ij} + (\sigma_k \lambda^T)_{ij} = 0.
Transposing and adding in (12) yields
H_k - B^{-1} + \sigma_k \lambda^T + \lambda \sigma_k^T = 0, \qquad B^{-1} = H_k + \sigma_k \lambda^T + \lambda \sigma_k^T.
Combining this with the tangent condition, we have
\sigma_k^T = \sigma_k^T \nabla^2 f(x_{k+1}) H_k + \sigma_k^T \nabla^2 f(x_{k+1}) \lambda\, \sigma_k^T + \sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k\, \lambda^T,
and hence
\sigma_k^T \nabla^2 f(x_{k+1}) \lambda = \frac{1}{2}\left(1 - \frac{\sigma_k^T \nabla^2 f(x_{k+1}) H_k \nabla^2 f(x_{k+1}) \sigma_k}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k}\right),
and so
\lambda = \frac{\sigma_k - H_k \nabla^2 f(x_{k+1}) \sigma_k - \frac{1}{2}\left(1 - \frac{\sigma_k^T \nabla^2 f(x_{k+1}) H_k \nabla^2 f(x_{k+1}) \sigma_k}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k}\right)\sigma_k}{\sigma_k^T \nabla^2 f(x_{k+1}) \sigma_k}.
Combining this with (6), one obtains Formula (7).
According to the Sherman–Morrison formula, (7) is equivalent to (6). Since the function $\psi(H_k^{1/2} B H_k^{1/2})$ is strictly convex on $B \succ 0$, the update formula (6) is the unique solution of the variational problem.    □
In this paper, we set $\sigma_k = s_k$, so one has
B_{k+1} s_k = \nabla^2 f(x_{k+1}) s_k,
which means that $B_{k+1}$ is an exact approximation of $\nabla^2 f(x_{k+1})$ along the direction $s_k$. Then we have the symmetric rank-two update formula
B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{\nabla^2 f(x_{k+1}) s_k s_k^T \nabla^2 f(x_{k+1})}{s_k^T \nabla^2 f(x_{k+1}) s_k}.
It can be seen that $B_{k+1}$ preserves symmetry whenever $B_k$ is symmetric. If we denote $w_k = \nabla^2 f(x_{k+1}) s_k$, then we obtain a Broyden-type convex family update formula:
H_{k+1} = H_k - \frac{H_k w_k w_k^T H_k}{w_k^T H_k w_k} + \frac{s_k s_k^T}{s_k^T w_k} + \phi_k v_k v_k^T,
where $\phi_k \in [0, 1]$ is a parameter and
v_k = \sqrt{w_k^T H_k w_k}\left(\frac{s_k}{s_k^T w_k} - \frac{H_k w_k}{w_k^T H_k w_k}\right).
The choice $\phi_k \equiv 1$ corresponds to the BFGS-type update
H_{k+1} = H_k + \left(1 + \frac{w_k^T H_k w_k}{s_k^T w_k}\right)\frac{s_k s_k^T}{s_k^T w_k} - \frac{s_k w_k^T H_k + H_k w_k s_k^T}{s_k^T w_k} = H_k + \frac{(s_k - H_k w_k) s_k^T + s_k (s_k - H_k w_k)^T}{s_k^T w_k} - \frac{(s_k - H_k w_k)^T w_k}{(s_k^T w_k)^2}\, s_k s_k^T.
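A minimal NumPy sketch (ours, under the assumption that a Hessian-vector product routine such as the AD sketch above is available) of the B-form update (15); the quick check verifies the adjoint tangent condition $B_{k+1} s_k = \nabla^2 f(x_{k+1}) s_k$ and the preservation of symmetry.
```python
import numpy as np

def ad_rank_two_update(B, s, w):
    """Symmetric rank-two update (15) with sigma_k = s_k.
    B: current approximation B_k;  s: step s_k;  w: Hessian-vector product ∇²f(x_{k+1}) s_k."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(w, w) / (s @ w)

# quick check of the adjoint tangent condition B_{k+1} s_k = w_k
rng = np.random.default_rng(1)
n = 5
B = np.eye(n)
s = rng.standard_normal(n)
Hess = rng.standard_normal((n, n))
Hess = Hess @ Hess.T + n * np.eye(n)        # SPD stand-in for ∇²f(x_{k+1})
w = Hess @ s
B_new = ad_rank_two_update(B, s, w)
print(np.linalg.norm(B_new @ s - w))        # ~ 0
print(np.allclose(B_new, B_new.T))          # symmetry is preserved
```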

3. Algorithm and Related Properties

For the update formula (15), we adopt the idea of matrix completion. The next quasi-Newton matrix $H_{k+1}$ is the solution of the following minimization problem:
\min \psi(H_k^{1/2} H H_k^{1/2}) \quad \text{s.t.} \quad H_{ij} = H^{AD}_{ij}, \ (i,j) \in F, \quad (H^{-1})_{ij} = 0, \ (i,j) \notin F, \quad H^T = H, \quad H \succ 0.
When $G(V, \bar{F})$ is chordal, the minimization problem (18) can be solved by solving the problem
\max \det(H) \quad \text{s.t.} \quad H_{ij} = H^{AD}_{ij}, \ (i,j) \in F, \quad H^T = H, \quad H \succ 0.
Then $H_{k+1}$ can be expressed by the sparse clique-factorization formula [29]. Algorithm 1 is stated as follows.
Algorithm 1 (Sparse Quasi-Newton Algorithm based on Automatic Differentiation)
  • Step 0. Compute $\bar{F}$ from $F$ such that $G(V, \bar{F})$ is a chordal graph, where $V = \{1, 2, \ldots, n\}$. Choose $x_0 \in \mathbb{R}^n$, $\epsilon > 0$, and a matrix $H_0 \in \mathbb{R}^{n \times n}$, $H_0 \succ 0$, with $(H_0^{-1})_{ij} = 0$ for $(i,j) \notin F$. Let $k := 0$.
  • Step 1. If $\|\nabla f(x_k)\| \leq \epsilon$, stop.
  • Step 2. $x_{k+1} = x_k - H_k \nabla f(x_k)$.
  • Step 3. Update $H_k$ by Formula (15) with $\phi_k \in [0, 1]$ to obtain $H^{AD}_{ij}$, $(i,j) \in F$.
  • Step 4. Obtain $H_{k+1}$ by solving the minimization problem (18). When $G(V, \bar{F})$ is a chordal graph, problem (18) can be solved by solving problem (19).
  • Step 5. Let $k := k + 1$ and go to Step 1.
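The following Python sketch (ours, not the authors' MATLAB implementation) shows the shape of one pass of Algorithm 1; the matrix-completion Step 4 is left as a user-supplied placeholder complete_H, since the clique-factorization formula of [29] depends on the chordal extension $G(V, \bar{F})$.
```python
import numpy as np

def algorithm1(grad, hess_vec, complete_H, x0, H0, phi=1.0, eps=1e-5, max_iter=5000):
    """Schematic loop of Algorithm 1 (illustrative sketch).
    grad(x)        : gradient ∇f(x)
    hess_vec(x, s) : Hessian-vector product ∇²f(x) s, e.g., by automatic differentiation
    complete_H(H)  : placeholder for Step 4, the positive definite matrix completion
                     of the entries of H^{AD} on the sparsity pattern F."""
    x, H = np.asarray(x0, float).copy(), np.asarray(H0, float).copy()
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:                 # Step 1
            break
        x_new = x - H @ g                            # Step 2
        s = x_new - x
        w = hess_vec(x_new, s)                       # w_k = ∇²f(x_{k+1}) s_k
        Hw = H @ w
        sw, wHw = s @ w, w @ Hw
        v = np.sqrt(wHw) * (s / sw - Hw / wHw)
        H_ad = (H - np.outer(Hw, Hw) / wHw           # Step 3: update (16)
                + np.outer(s, s) / sw + phi * np.outer(v, v))
        H = complete_H(H_ad)                         # Step 4: MCQN completion
        x = x_new                                    # Step 5
    return x
```
For a dense pattern F, complete_H can simply return its argument, and the loop reduces to an ordinary quasi-Newton iteration with the update (16).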
When $H_k$ in Step 3 is updated by the Broyden class method, the algorithm corresponds to the method in [29]. In the present paper, we focus on the MCQN update with $H^{AD} = H_{k+1}$, where $H_{k+1}$ is given by (15).
In what follows, we give some notation for the convenience of analysis. For a nonsingular matrix P satisfying
(P^{-1})_{ij} = 0, \quad (i,j) \notin F,
we let
\bar{s}_k = P^{-1/2} s_k, \quad \bar{w}_k = P^{1/2} w_k, \quad \bar{H}_k = P^{-1/2} H_k P^{-1/2}, \quad \bar{H}^{AD} = P^{-1/2} H^{AD} P^{-1/2},
where $H^{AD} = H_{k+1}$ is given by (15). Then we obtain from (15) that
\bar{H}^{AD} = \bar{H}_k - \frac{\bar{H}_k \bar{w}_k \bar{w}_k^T \bar{H}_k}{\bar{w}_k^T \bar{H}_k \bar{w}_k} + \frac{\bar{s}_k \bar{s}_k^T}{\bar{s}_k^T \bar{w}_k} + \phi_k \bar{v}_k \bar{v}_k^T,
where
\bar{v}_k = \sqrt{\bar{w}_k^T \bar{H}_k \bar{w}_k}\left(\frac{\bar{s}_k}{\bar{s}_k^T \bar{w}_k} - \frac{\bar{H}_k \bar{w}_k}{\bar{w}_k^T \bar{H}_k \bar{w}_k}\right).
Similarly to [30], we define
\tau_k = \frac{\bar{w}_k^T \bar{H}_k \bar{w}_k}{\|\bar{w}_k\| \cdot \|\bar{H}_k \bar{w}_k\|}, \quad q_k = \frac{\bar{w}_k^T \bar{H}_k \bar{w}_k}{\|\bar{w}_k\|^2}, \quad \eta_k = \frac{\bar{s}_k^T \bar{H}_k \bar{w}_k}{\bar{s}_k^T \bar{w}_k}, \quad m_k = \frac{\bar{s}_k^T \bar{w}_k}{\bar{w}_k^T \bar{w}_k},
M_k = \frac{\|\bar{s}_k\|^2}{\bar{s}_k^T \bar{w}_k}, \quad \beta_k = \frac{\bar{s}_k^T \bar{H}_k^{-1} \bar{s}_k}{\bar{s}_k^T \bar{w}_k}, \quad \gamma_k = \frac{\bar{w}_k^T \bar{H}_k \bar{w}_k}{\bar{s}_k^T \bar{w}_k}.
According to [41] and (21), we have
\operatorname{tr}(\bar{H}^{AD}) = \operatorname{tr}(\bar{H}_k) - (1 - \phi_k)\frac{q_k}{\tau_k^2} - 2\phi_k \eta_k + \left(1 + \phi_k \frac{q_k}{m_k}\right) M_k
and
\det(\bar{H}^{AD}) = \det(\bar{H}_k)\, \frac{1 + \phi_k(\beta_k \gamma_k - 1)}{\gamma_k}.
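As a quick sanity check (our own, assuming NumPy and the quantities defined in (22)), the trace and determinant identities (23) and (24) can be verified numerically with a random symmetric positive definite $\bar{H}_k$ and random vectors with positive curvature:
```python
import numpy as np

rng = np.random.default_rng(0)
n, phi = 6, 0.3
A = rng.standard_normal((n, n))
Hb = A @ A.T + n * np.eye(n)               # random SPD stand-in for \bar{H}_k
s, w = rng.standard_normal(n), rng.standard_normal(n)
if s @ w < 0:
    w = -w                                 # enforce \bar{s}_k^T \bar{w}_k > 0

sw, wHw = s @ w, w @ Hb @ w
v = np.sqrt(wHw) * (s / sw - Hb @ w / wHw)
H_ad = Hb - np.outer(Hb @ w, Hb @ w) / wHw + np.outer(s, s) / sw + phi * np.outer(v, v)

q = wHw / (w @ w)
tau = wHw / (np.linalg.norm(w) * np.linalg.norm(Hb @ w))
eta = (s @ Hb @ w) / sw
m = sw / (w @ w)
M = (s @ s) / sw
beta = (s @ np.linalg.solve(Hb, s)) / sw
gamma = wHw / sw

tr_pred = np.trace(Hb) - (1 - phi) * q / tau**2 - 2 * phi * eta + (1 + phi * q / m) * M
det_pred = np.linalg.det(Hb) * (1 + phi * (beta * gamma - 1)) / gamma
print(abs(np.trace(H_ad) - tr_pred), abs(np.linalg.det(H_ad) / det_pred - 1.0))   # both tiny
```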
Next, we establish a relation between H ¯ k + 1 and H ¯ A D , which is very important in the establishment of the local and superlinear convergence of Algorithm 1.
Proposition 1.
For Algorithm 1, the following relations hold:
\operatorname{tr}(\bar{H}_{k+1}) = \operatorname{tr}(\bar{H}^{AD}), \qquad \det(\bar{H}_{k+1}) \geq \det(\bar{H}^{AD}).
Proof. 
We can obtain from (18) that
(H_{k+1})_{ij} = (H^{AD})_{ij}, \quad (i,j) \in F.
Combining this with (20), for every pair $(i,j)$, at least one of $(H_{k+1} - H^{AD})_{ij}$ and $(P^{-1})_{ij}$ equals zero. Then we obtain
\operatorname{tr}(\bar{H}_{k+1} - \bar{H}^{AD}) = \operatorname{tr}\left(P^{-1}(H_{k+1} - H^{AD})\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} (P^{-1})_{ij} (H_{k+1} - H^{AD})_{ij} = 0.
Moreover, since $H^{AD}$ is feasible for (19) while $H_{k+1}$ solves it, we must have
\det(H_{k+1}) \geq \det(H^{AD}).
Consequently, one has
\det(\bar{H}_{k+1}) \geq \det(\bar{H}^{AD}).    □
Remark 1.
According to the definition of $\psi$ in (10) and the relation (25) between $\bar{H}_{k+1}$ and $\bar{H}^{AD}$, one has
\psi(\bar{H}_{k+1}) \leq \psi(\bar{H}^{AD}).
Substituting (23) and (24) into (28), $\psi(\bar{H}_{k+1})$ and $\psi(\bar{H}_k)$ satisfy
\psi(\bar{H}_{k+1}) \leq \psi(\bar{H}_k) - (1 - \phi_k)\frac{q_k}{\tau_k^2} - 2\phi_k \eta_k + \left(1 + \phi_k \frac{q_k}{m_k}\right) M_k - \ln\left(1 + \phi_k(\beta_k \gamma_k - 1)\right) + \ln\gamma_k.

4. The Local and Superlinear Convergence

Based on the discussion in Section 3, we prove the local and superlinear convergence of Algorithm 1. First, we list the assumptions.
Assumption A1.
Assume that $x^*$ is a solution of (1) and define
\Omega = \{x \in \mathbb{R}^n \mid \|x - x^*\| \leq b\},
where $b > 0$.
(1) 
The function $f: \mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable on $\Omega$.
(2) 
There exist two constants $m > 0$ and $M > 0$ satisfying
m\|u\|^2 \leq u^T (\nabla^2 f(x))^{-1} u \leq M\|u\|^2, \quad \forall u \in \mathbb{R}^n, \ x \in \Omega.
According to Assumption A1, there exist constants $\bar{L} > 0$ and $L > 0$ such that
\|\nabla f(x) - \nabla f(y)\| \leq \bar{L}\|x - y\|, \quad \forall x, y \in \Omega,
\|\nabla^2 f(x) - \nabla^2 f(y)\| \leq L\|x - y\|, \quad \forall x, y \in \Omega.
We define
\epsilon_k = \max\{\|x_k - x^*\|, \|x_{k+1} - x^*\|\},
and get from (32) that
\|w_k - \nabla^2 f(x^*) s_k\| = \|\nabla^2 f(x_{k+1}) s_k - \nabla^2 f(x^*) s_k\| \leq \|\nabla^2 f(x_{k+1}) - \nabla^2 f(x^*)\| \cdot \|s_k\| \leq L\|x_{k+1} - x^*\| \cdot \|s_k\| \leq L\epsilon_k \|s_k\|.
If we take $P = H^*$, where $H^* = (\nabla^2 f(x^*))^{-1}$, then one has from (34) that
\|\bar{w}_k - \bar{s}_k\| = \|P^{1/2} w_k - P^{-1/2} s_k\| \leq \|H^{*1/2}\| \cdot \|w_k - \nabla^2 f(x^*) s_k\| \leq L\|H^{*1/2}\|\epsilon_k \|s_k\|.
Furthermore, it is easy to deduce that
M_k - 1 \leq \frac{1}{2}c_1\epsilon_k, \quad \mu_k = \frac{2(M_k - m_k)}{m_k} \leq \frac{1}{2}c_1\epsilon_k, \quad \hat{\mu}_k = \frac{(\bar{w}_k - \bar{s}_k)^T \bar{H}_k \bar{w}_k}{\operatorname{tr}(\bar{H}_k)\, \bar{s}_k^T \bar{w}_k} \leq \frac{1}{2}c_1\epsilon_k, \quad \ln m_k^{-1} \leq \frac{1}{2}c_1\epsilon_k,
where $c_1 > 0$, $c_2 \in (0, b)$, and $\epsilon_k < c_2$. We define
\rho_k = q_k - 1 - \ln q_k, \quad \zeta_k = (1 - \phi_k) q_k (\tau_k^{-2} - 1), \quad \xi_k = \ln\left(1 + \phi_k(\beta_k \gamma_k - 1)\right),
and rewrite (29) as
\psi(\bar{H}_{k+1}) \leq \psi(\bar{H}_k) - \rho_k - \zeta_k - \xi_k + (M_k - 1) + \phi_k q_k \mu_k + \phi_k \operatorname{tr}(\bar{H}_k)\hat{\mu}_k + \ln m_k^{-1}.
As $\gamma_k = q_k/m_k$ and $0 < q_k \leq \operatorname{tr}(\bar{H}_k)$, we can obtain from the above inequality and (36) that
\psi(\bar{H}_{k+1}) \leq \psi(\bar{H}_k) - \rho_k - \zeta_k - \xi_k + c_1\left(1 + \operatorname{tr}(\bar{H}_k)\right)\epsilon_k.
Considering
\lambda - \ln\lambda \geq \max\left\{\left(1 - \frac{1}{e}\right)\lambda,\ 1\right\}, \quad \forall \lambda > 0,
one has
\psi(A) \geq \max\left\{\left(1 - \frac{1}{e}\right)\operatorname{tr}(A),\ n\right\},
where $A^T = A$ and $A \succ 0$. Moreover, it follows from (40) that
\psi(\bar{H}_{k+1}) \leq (1 + c_3\epsilon_k)\psi(\bar{H}_k) - \rho_k - \zeta_k - \xi_k,
where $c_3 = c_1\left(\frac{1}{n} + \frac{e}{e-1}\right)$. Since $\tau_k^2 \leq 1$ and $\beta_k\gamma_k \geq 1$, it is obvious that $\rho_k, \zeta_k, \xi_k \geq 0$, and
\psi(\bar{H}_{k+1}) \leq (1 + c_3\epsilon_k)\psi(\bar{H}_k).
The theorem given below shows that Algorithm 1 converges locally and linearly; the relation (42) plays an essential role in the proof.
Theorem 1.
Let Assumption A1 hold and let the sequence $\{x_k\}$ be generated by Algorithm 1 with $\alpha_k \equiv 1$, where $H_k$ is updated by (15). Then for any $\rho \in (0, 1)$, there is a constant $\tau > 0$ such that when $\|x_0 - x^*\| \leq \tau$ and $\|H_0 - H^*\| \leq \tau$, one has
\|x_{k+1} - x^*\| \leq \rho\|x_k - x^*\|.
Proof. 
According to Lemma 4 in [29], there are constants $\bar{\tau} \in (0, b)$ and $\delta > 0$ such that when $\|x_0 - x^*\| \leq \bar{\tau}$, one has
\psi(\bar{H}_0) \leq n + \delta/2,
and
\|H_0 - H^*\| \leq \rho/(2\bar{L}),
where $H^* \succ 0$ and $\bar{H}_0 = H^{*-1/2} H_0 H^{*-1/2}$. Define
\tau = \min\left\{\bar{\tau},\ c_2,\ \frac{\rho}{\bar{L}},\ \frac{\rho}{LM},\ \frac{1-\rho}{c_3}\ln\frac{2(n+\delta)}{2n+\delta}\right\}.
We will prove by induction that the inequalities (43) and
\|H_k - H^*\| \leq \frac{\rho}{2\bar{L}}
hold for any $k \geq 0$. By the Lipschitz continuity of $\nabla^2 f(x)$, we have for $x \in \Omega$,
\|x - x^* - H^*\nabla f(x)\| \leq \|H^*\| \cdot \int_0^1 \|\nabla^2 f(x^* + t(x - x^*)) - \nabla^2 f(x^*)\| \cdot \|x - x^*\|\, dt \leq \frac{1}{2}LM\|x - x^*\|^2.
Then, for $k = 0$, it is easy to deduce (43) from (44) and (45). Moreover, taking $\alpha_k \equiv 1$ and substituting $x_0$ into (48), we obtain
\|x_1 - x^*\| = \|x_0 - H_0\nabla f(x_0) - x^*\| \leq \|x_0 - x^* - H^*\nabla f(x_0)\| + \|(H_0 - H^*)(\nabla f(x_0) - \nabla f(x^*))\| \leq \frac{1}{2}LM\|x_0 - x^*\|^2 + \|H_0 - H^*\| \cdot \|\nabla f(x_0) - \nabla f(x^*)\| \leq \left(\frac{1}{2}LM\|x_0 - x^*\| + \frac{\rho}{2}\right)\|x_0 - x^*\| \leq \left(\frac{1}{2}LM\tau + \frac{\rho}{2}\right)\|x_0 - x^*\| \leq \rho\|x_0 - x^*\|.
So (43) and (47) hold for $k = 1$. Assume that (43) and (47) hold for $k = 0, 1, \ldots, l$; then one has
\epsilon_k = \|x_k - x^*\|, \quad \epsilon_k \leq \rho^k \epsilon_0 \leq \rho^k \tau, \quad k = 0, 1, \ldots, l,
and
\|x_{l+1} - x^*\| = \|x_l - H_l\nabla f(x_l) - x^*\| \leq \|x_l - x^* - H^*\nabla f(x_l)\| + \|(H_l - H^*)(\nabla f(x_l) - \nabla f(x^*))\| \leq \frac{1}{2}LM\|x_l - x^*\|^2 + \|H_l - H^*\| \cdot \|\nabla f(x_l) - \nabla f(x^*)\| \leq \left(\frac{1}{2}LM\|x_l - x^*\| + \frac{\rho}{2}\right)\|x_l - x^*\| \leq \left(\frac{1}{2}LM\rho^l\tau + \frac{\rho}{2}\right)\|x_l - x^*\| \leq \rho\|x_l - x^*\|.
Then by the definition of τ (46), one has
c_3\sum_{k=0}^{l}\epsilon_k \leq c_3\tau\sum_{k=0}^{l}\rho^k = c_3\tau\,\frac{1-\rho^{l+1}}{1-\rho} \leq \frac{c_3\tau}{1-\rho} \leq \ln\frac{2(n+\delta)}{2n+\delta}.
Combining (42) and (44), it can be seen that
\psi(\bar{H}_{l+1}) - n \leq (\psi(\bar{H}_0) - n) + \left[\prod_{k=0}^{l}(1 + c_3\epsilon_k) - 1\right]\psi(\bar{H}_0) \leq \frac{\delta}{2} + \left(n + \frac{\delta}{2}\right)\left[\prod_{k=0}^{l}e^{c_3\epsilon_k} - 1\right] \leq \frac{\delta}{2} + \left(n + \frac{\delta}{2}\right)\left[e^{c_3\sum_{k=0}^{l}\epsilon_k} - 1\right] \leq \frac{\delta}{2} + \left(n + \frac{\delta}{2}\right)\left[\frac{2(n+\delta)}{2n+\delta} - 1\right] = \delta.
Thus, (47) holds for $k = l + 1$. This completes the proof. □
Based on the above discussion and the relation (42), we can now show the superlinear convergence of Algorithm 1.
Theorem 2.
Let Assumption A1 hold and let the sequence $\{x_k\}$ be generated by Algorithm 1 with $\alpha_k \equiv 1$, where $H_k$ is updated by (15). Then there is a constant $\tau > 0$ such that when $\|x_0 - x^*\| \leq \tau$ and $\|H_0 - H^*\| \leq \tau$, one has
\lim_{k \to \infty} \frac{\|(H_k - H^*)w_k\|}{\|w_k\|} = 0.
Then the sequence { x k } is superlinearly convergent.
Proof. 
Let $\tau$ be defined as in Theorem 1; then for all $k$ one has
\psi(\bar{H}_k) \leq n + \delta.
It follows from (41) that
\rho_k + \zeta_k + \xi_k \leq \psi(\bar{H}_k) - \psi(\bar{H}_{k+1}) + c_3\epsilon_k\psi(\bar{H}_k).
Summing the above inequality and combining (51) and (54), we can deduce
\sum_{k \geq 1}(\rho_k + \zeta_k + \xi_k) \leq c_3\sum_{k \geq 1}\epsilon_k\psi(\bar{H}_k) \leq c_3(n+\delta)\sum_{k \geq 1}\epsilon_k \leq (n+\delta)\ln\frac{2(n+\delta)}{2n+\delta} < \infty,
which means that the nonnegative quantities $\rho_k$, $\zeta_k$, and $\xi_k$ all tend to zero as $k \to +\infty$. Furthermore, according to the definitions in (37), we have that
(1)\ q_k \to 1; \quad (2)\ \text{if } \phi_k \leq \tfrac{1}{2}, \text{ then } \tau_k \to 1; \quad (3)\ \text{if } \phi_k > \tfrac{1}{2}, \text{ then } \beta_k\gamma_k \to 1.
First, we have
\frac{\|H^{*-1/2}(H_k - H^*)w_k\|^2}{\|H^{*1/2}w_k\|^2} = \frac{\|\bar{H}_k\bar{w}_k\|^2 - 2\bar{w}_k^T\bar{H}_k\bar{w}_k + \|\bar{w}_k\|^2}{\|\bar{w}_k\|^2} = \frac{q_k^2}{\tau_k^2} - 2q_k + 1.
For the subsequence $\{k_i : \phi_{k_i} \leq \frac{1}{2}\}$, one has $q_{k_i} \to 1$ and $\tau_{k_i} \to 1$, and then (53) holds.
Moreover, it is easy to deduce that
\frac{\|\bar{H}_k\bar{w}_k - \bar{s}_k\|^2}{\|\bar{w}_k\|^2} \leq \frac{\|\bar{H}_k^{1/2}\|^2 \cdot \|\bar{H}_k^{1/2}\bar{w}_k - \bar{H}_k^{-1/2}\bar{s}_k\|^2}{\|\bar{w}_k\|^2} = \frac{\|\bar{H}_k^{1/2}\|^2\left(\bar{w}_k^T\bar{H}_k\bar{w}_k - 2\bar{s}_k^T\bar{w}_k + \bar{s}_k^T\bar{H}_k^{-1}\bar{s}_k\right)}{\|\bar{w}_k\|^2} = \|\bar{H}_k^{1/2}\|^2\left(q_k - 2m_k + \frac{\beta_k\gamma_k m_k^2}{q_k}\right).
We also have
\left|\frac{\|\bar{H}_k\bar{w}_k - \bar{w}_k\|}{\|\bar{w}_k\|} - \frac{\|\bar{H}_k\bar{w}_k - \bar{s}_k\|}{\|\bar{w}_k\|}\right| \leq \frac{\|\bar{w}_k - \bar{s}_k\|}{\|\bar{w}_k\|} \to 0.
For the subsequence $\{k_i : \phi_{k_i} > \frac{1}{2}\}$, one has $q_{k_i} \to 1$, $\beta_{k_i}\gamma_{k_i} \to 1$, and $m_{k_i} \to 1$; then (53) holds by (56)–(58). Thus, the relation (53) holds for all $k$.
Next, we show that (53) implies that the sufficient condition [6]
\lim_{k \to \infty}\frac{\|(B_k - \nabla^2 f(x^*))s_k\|}{\|s_k\|} = 0
holds. According to (47), there is a constant $\lambda_{\min} > 0$ such that $(\lambda_k)_i \geq \lambda_{\min}$, where $(\lambda_k)_i$, $i = 1, 2, \ldots, n$, denote the eigenvalues of $H_k$. Since $w_k = \nabla^2 f(x_{k+1})s_k$, one has
\|(H_k - H^*)w_k\| = \|(H_k - H^*)\nabla^2 f(x^*)s_k + (H_k - H^*)(\nabla^2 f(x_{k+1}) - \nabla^2 f(x^*))s_k\| \geq \|H_k(\nabla^2 f(x^*) - B_k)s_k\| - \|(H_k - H^*)(\nabla^2 f(x_{k+1}) - \nabla^2 f(x^*))s_k\| \geq \lambda_{\min}\|(\nabla^2 f(x^*) - B_k)s_k\| - \|(H_k - H^*)(\nabla^2 f(x_{k+1}) - \nabla^2 f(x^*))s_k\|,
and
\frac{\|(H_k - H^*)w_k\|}{\|w_k\|} \geq \frac{\lambda_{\min}\|(\nabla^2 f(x^*) - B_k)s_k\|}{\|\nabla^2 f(x_{k+1})s_k\|} - \frac{\|(H_k - H^*)(\nabla^2 f(x_{k+1}) - \nabla^2 f(x^*))s_k\|}{\|\nabla^2 f(x_{k+1})s_k\|} \geq \frac{\lambda_{\min}\|(\nabla^2 f(x^*) - B_k)s_k\|}{\frac{1}{\lambda_{\min}}\|s_k\|} - \frac{\|(H_k - H^*)(\nabla^2 f(x_{k+1}) - \nabla^2 f(x^*))s_k\|}{\frac{1}{\lambda_{\min}}\|s_k\|} = \lambda_{\min}^2\frac{\|(\nabla^2 f(x^*) - B_k)s_k\|}{\|s_k\|} - \lambda_{\min}\frac{\|(H_k - H^*)(\nabla^2 f(x_{k+1}) - \nabla^2 f(x^*))s_k\|}{\|s_k\|}.
As $k \to \infty$, since $x_k \to x^*$, one has from (53) that
\lim_{k \to \infty}\frac{\|(B_k - \nabla^2 f(x^*))s_k\|}{\|s_k\|} = 0,
which is the well-known Dennis–Moré condition. Thus, we obtain the superlinear convergence. □

5. Numerical Experiments

The results in [29] show that the MCQN update with the BFGS method has better numerical performance than the MCQN update with the DFP method. Hence, we compare the numerical performance of Algorithm 1 with that of the MCQN update with the BFGS method and the limited-memory BFGS method.
The 24 test problems and their initial points, taken from [29,42,43,44], are given in Table 1. All the test problems have special Hessian structures, such as band matrices, so the chordal extension of the sparsity pattern can be obtained easily. Then $H_{k+1}$ in Algorithm 1 can be written via the sparse clique-factorization formula.
All the methods were coded in MATLAB R2016a on a Core (TM) i5 PC. The automatic differentiation was computed by ADMAT 2.0, which is available on the Cayuga Research GitHub page. In Table 1, Table 2, Table 3 and Table 4 and Figure 1 and Figure 2, we report the numerical performance of the three methods. For convenience, we use the following notation in our numerical results.
  • Pro: the problems;
  • Dim: the dimensions of the test problem;
  • Init: the initial points;
  • Method: the algorithm used to solve the problem;
  • MCQN-BFGS: MCQN update with the BFGS method;
  • L-BFGS: the limited-memory BFGS method.
We adopted the termination criterion as follows:
\|\nabla f(x_k)\| \leq n \cdot 10^{-5} \quad \text{or} \quad \text{ite} \geq 5000.
Firstly, we tested all three methods on the above 24 problems, whose dimensions are 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, and 10,000. We set $m = 15$ in the limited-memory BFGS method. Table 2 and Table 3 contain the numbers of iterations of the three methods for the test problems. Taking into account the total numbers of iterations, Algorithm 1 outperformed the MCQN update with the BFGS method on 11 problems (2, 4, 5, 7, 9, 10, 12, 14, 18, 23, 24). Additionally, Algorithm 1 outperformed the limited-memory BFGS method on 13 problems (1, 2, 3, 7, 9, 12, 15, 16, 18, 19, 20, 21, 23).
For a more precise comparison, we adopted the performance profiles of [45], which are distribution functions of a performance metric. We denote by $P$ and $S$ the test set and the set of solvers, and by $N_p$ and $N_s$ the number of problems and the number of solvers, respectively. For solver $s \in S$ and problem $p \in P$, we define $t_{p,s}$ as the number of iterations (or number of function evaluations) required to solve problem $p$ using solver $s$. Then, using the performance ratio
r_{p,s} = \frac{t_{p,s}}{\min\{t_{p,q} : q \in S\}},
we define
\rho_s(t) = \frac{1}{N_p}\,\operatorname{size}\{p \in P : r_{p,s} \leq t\},
where $r_{p,s} \leq r_M$ for some constant $r_M$ for all $p$ and $s$, and $r_{p,s} = r_M$ if and only if solver $s$ cannot solve problem $p$. Therefore, $\rho_s: \mathbb{R} \to [0, 1]$ is the probability for solver $s \in S$ that the performance ratio $r_{p,s}$ is within a factor $t \in \mathbb{R}$ of the best possible ratio.
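For illustration (our own sketch with made-up counts, not the paper's data), the profile $\rho_s(t)$ can be computed directly from a table of iteration counts:
```python
import numpy as np

def performance_profile(T, ts):
    """T[p, s]: iteration count of solver s on problem p (np.inf if the solver failed).
    Returns rho[s, i] = fraction of problems with performance ratio r_{p,s} <= ts[i]."""
    r = T / T.min(axis=1, keepdims=True)            # r_{p,s} = t_{p,s} / min_q t_{p,q}
    return np.array([[np.mean(r[:, s] <= t) for t in ts] for s in range(T.shape[1])])

# toy usage: 3 hypothetical problems, 2 hypothetical solvers
T = np.array([[12.0, 15.0],
              [30.0, 22.0],
              [np.inf, 40.0]])                      # first solver fails on problem 3
ts = np.linspace(1.0, 3.0, 5)
print(performance_profile(T, ts))                   # rho_s(t) for each solver along ts
```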
Figure 1 compares the numbers of iterations of Algorithm 1 and the MCQN update with the BFGS method using performance profiles. It can be seen that the top curve corresponds to Algorithm 1, which shows that Algorithm 1 performed better than the MCQN update with the BFGS method. Additionally, Figure 2 demonstrates that Algorithm 1 performed better than the limited-memory BFGS method.
Secondly, for a further comparison of Algorithm 1 and the MCQN update with the BFGS method, we tested five different initial points, $x_0$, $2x_0$, $4x_0$, $7x_0$, and $10x_0$, where $x_0$ is specified in Table 1. The dimension of the test problems was 1000. Table 4 reports the numbers of iterations required by the two methods for the 24 test problems, which also demonstrates that Algorithm 1 is effective and superior to the MCQN update with the BFGS method.

6. Conclusions

In this paper, we presented a symmetric rank-two quasi-Newton update based on an adjoint tangent condition for solving unconstrained optimization problems. Combined with the idea of matrix completion, we proposed a sparse quasi-Newton algorithm and established its local and superlinear convergence. The numerical results demonstrated that the proposed algorithm outperformed the compared methods and can be used to solve large-scale unconstrained optimization problems.

Author Contributions

Conceptualization, H.C.; methodology, H.C. and X.A.; software, H.C. and X.A.; formal analysis, H.C.; writing—original draft preparation, H.C. and X.A.; writing—review and editing H.C. and X.A. All authors have read and agreed to the published version of the manuscript.

Funding

The work is supported by the National Natural Science Foundation of China, grant number 11701577; the Natural Science Foundation of Hunan Province, China, grant number 2020JJ5960; and the Scientific Research Foundation of Hunan Provincial Education Department, China, grant number 18C0253.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the research and all the code used in this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Zhou, W. A modified BFGS type quasi-Newton method with line search for symmetric nonlinear equations problems. J. Comput. Appl. Math. 2020, 367, 112454.
2. Zhou, W. A globally convergent BFGS method for symmetric nonlinear equations. J. Ind. Manag. Optim. 2021.
3. Zhou, W. A class of line search-type methods for nonsmooth convex regularized minimization. Softw. Comput. 2021, 25, 7131–7141.
4. Zhou, W.; Zhang, L. A modified Broyden-like quasi-Newton method for nonlinear equations. J. Comput. Appl. Math. 2020, 372, 112744.
5. Sabi'u, J.; Muangchoo, K.; Shah, A.; Abubakar, A.B.; Jolaoso, L.O. A Modified PRP-CG Type Derivative-Free Algorithm with Optimal Choices for Solving Large-Scale Nonlinear Symmetric Equations. Symmetry 2021, 13, 234.
6. Dennis, J.E.; Moré, J.J. A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comput. 1974, 28, 549–560.
7. Dennis, J.E.; Moré, J.J. Quasi-Newton methods, motivation and theory. SIAM Rev. 1977, 19, 46–89.
8. Davidon, W.C. Variable metric method for minimization. In Research Development Report ANL-5990; University of Chicago: Chicago, IL, USA, 1959.
9. Fletcher, R.; Powell, M.J. A rapidly convergent descent method for minimization. Comput. J. 1963, 6, 163–168.
10. Broyden, C.G. The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA J. Appl. Math. 1970, 6, 76–90.
11. Fletcher, R. A new approach to variable metric algorithms. Comput. J. 1970, 13, 317–322.
12. Goldfarb, D. A family of variable-metric methods derived by variational means. Math. Comput. 1970, 24, 23–26.
13. Shanno, D.F. Conditioning of quasi-Newton methods for function minimization. Math. Comput. 1970, 24, 647–656.
14. Quasi-Newton Methods. In Optimization Theory and Methods; Springer Series in Optimization and Its Applications; Springer: Boston, MA, USA, 2006; Volume 1, pp. 203–301.
15. Quasi-Newton Methods. In Numerical Optimization; Springer Series in Operations Research and Financial Engineering; Springer: New York, NY, USA, 2006.
16. Sun, W.; Yuan, Y.X. Optimization Theory and Methods: Nonlinear Programming; Springer Science & Business Media: New York, NY, USA, 2006.
17. Andrei, N. Continuous Nonlinear Optimization for Engineering Applications in GAMS Technology; Springer Optimization and Its Applications Series; Springer: Berlin, Germany, 2017; Volume 121.
18. Schubert, L.K. Modification of a quasi-Newton method for nonlinear equations with a sparse Jacobian. Math. Comput. 1970, 24, 27–30.
19. Powell, M.J.D.; Toint, P.L. On the estimation of sparse Hessian matrices. SIAM J. Numer. Anal. 1979, 16, 1060–1074.
20. Toint, P. Towards an efficient sparsity exploiting Newton method for minimization. In Sparse Matrices and Their Uses; Academic Press: London, UK, 1981; pp. 57–88.
21. Nocedal, J. Updating quasi-Newton matrices with limited storage. Math. Comput. 1980, 35, 773–782.
22. Liu, D.C.; Nocedal, J. On the limited memory BFGS method for large scale optimization. Math. Program. 1989, 45, 503–528.
23. Griewank, A.; Toint, P.L. Partitioned variable metric updates for large structured optimization problems. Numer. Math. 1982, 39, 119–137.
24. Griewank, A.; Toint, P.L. Local convergence analysis for partitioned quasi-Newton updates. Numer. Math. 1982, 39, 429–448.
25. Griewank, A. The global convergence of partitioned BFGS on problems with convex decompositions and Lipschitzian gradients. Math. Program. 1991, 50, 141–175.
26. Cao, H.P.; Li, D.H. Partitioned quasi-Newton methods for sparse nonlinear equations. Comput. Optim. Appl. 2017, 66, 481–505.
27. Toint, P.L. On sparse and symmetric matrix updating subject to a linear equation. Math. Comput. 1977, 31, 954–961.
28. Fletcher, R. An optimal positive definite update for sparse Hessian matrices. SIAM J. Optim. 1995, 5, 192–218.
29. Yamashita, N. Sparse quasi-Newton updates with positive definite matrix completion. Math. Program. 2008, 115, 1–30.
30. Dai, Y.H.; Yamashita, N. Analysis of sparse quasi-Newton updates with positive definite matrix completion. J. Oper. Res. Soc. China 2014, 2, 39–56.
31. Dai, Y.H.; Yamashita, N. Convergence analysis of sparse quasi-Newton updates with positive definite matrix completion for two-dimensional functions. Numer. Algebr. Control Optim. 2011, 1, 61–69.
32. Nazareth, J.L. If quasi-Newton then why not quasi-Cauchy. SIAG/Opt Views-and-News 1995, 6, 11–14.
33. Dennis, J.E., Jr.; Wolkowicz, H. Sizing and least-change secant methods. SIAM J. Numer. Anal. 1993, 30, 1291–1314.
34. Andrei, N. A diagonal quasi-Newton updating method for unconstrained optimization. Numer. Algorithms 2019, 81, 575–590.
35. Andrei, N. A new diagonal quasi-Newton updating method with scaled forward finite differences directional derivative for unconstrained optimization. Numer. Funct. Anal. Optim. 2019, 40, 1467–1488.
36. Andrei, N. Diagonal Approximation of the Hessian by Finite Differences for Unconstrained Optimization. J. Optim. Theory Appl. 2020, 185, 859–879.
37. Andrei, N. A new accelerated diagonal quasi-Newton updating method with scaled forward finite differences directional derivative for unconstrained optimization. Optimization 2021, 70, 345–360.
38. Leong, W.J.; Enshaei, S.; Kek, S.L. Diagonal quasi-Newton methods via least change updating principle with weighted Frobenius norm. Numer. Algorithms 2021, 86, 1225–1241.
39. Schlenkrich, S.; Griewank, A.; Walther, A. On the local convergence of adjoint Broyden methods. Math. Program. 2010, 121, 221–247.
40. Byrd, R.H.; Nocedal, J. A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J. Numer. Anal. 1989, 26, 727–739.
41. Byrd, R.H.; Nocedal, J.; Yuan, Y.X. Global convergence of a class of quasi-Newton methods on convex problems. SIAM J. Numer. Anal. 1987, 24, 1171–1190.
42. Moré, J.J.; Garbow, B.S.; Hillstrom, K.E. Testing unconstrained optimization software. ACM Trans. Math. Softw. (TOMS) 1981, 7, 17–41.
43. Luksan, L.; Matonoha, C.; Vlcek, J. Modified CUTE Problems for Sparse Unconstrained Optimization; Technical Report 1081; Institute of Computer Science, Academy of Sciences of the Czech Republic: Prague, Czech Republic, 2010.
44. Andrei, N. An unconstrained optimization test functions collection. Adv. Model. Optim. 2008, 10, 147–161.
45. Dolan, E.D.; Moré, J.J. Benchmarking optimization software with performance profiles. Math. Program. 2002, 91, 201–213.
Figure 1. Performance profiles based on the numbers of iterations (Algorithm 1 versus the MCQN update with the BFGS method).
Figure 2. Performance profiles based on the numbers of iterations (Algorithm 1 versus the limited-memory BFGS method).
Table 1. The test problems.
Pro  The Test Functions  Init
1  TRIDIA [29]  x_0 = (1, 1, …, 1)^T
2  The chained Rosenbrock problem [29]  x_0 = (1.2, 1, …, 1.2, 1)^T
3  The boundary value problem [29]  x_0 = (1/(n+1), 2/(n+1), …, n/(n+1))^T
4  Broyden tridiagonal function [42]  x_0 = (1, 1, …, 1)^T
5  DQRTIC [43]  x_0 = (2, 2, …, 2)^T
6  EDENSCH [43]  x_0 = (0, 0, …, 0)^T
7  ENGVAL1 [43]  x_0 = (2, 2, …, 2)^T
8  COSINE [43]  x_0 = (1, 1, …, 1)^T
9  ERRINROS-modified [43]  x_0 = (1, 1, …, 1)^T
10  FREUROTH [43]  x_0 = (0.5, 2, 0, …, 0)^T
11  MOREBV (different start point) [43]  x_0 = (0.5, 0.5, …, 0.5)^T
12  TOINTGSS [43]  x_0 = (3, 3, …, 3)^T
13  SCHMVETT [43]  x_0 = (3, 3, …, 3)^T
14  Extended Freudenstein and Roth function [44]  x_0 = (0.5, 2, …, 0.5, 2)^T
15  Raydan 1 function [44]  x_0 = (1, 1, …, 1)^T
16  Generalized Tridiagonal function [44]  x_0 = (2, 2, …, 2)^T
17  Extended Himmelblau function [44]  x_0 = (1, 1, …, 1)^T
18  Generalized PSCI function [44]  x_0 = (3, 0.1, …, 3, 0.1)^T
19  Extended Tridiagonal 2 function [44]  x_0 = (1, 1, …, 1)^T
20  Raydan 2 function [44]  x_0 = (1, 1, …, 1)^T
21  Extended Freudenstein and Roth function [44]  x_0 = (1, 1, …, 1)^T
22  DQDRTIC function [44]  x_0 = (3, 3, …, 3)^T
23  Generalized Quartic function [44]  x_0 = (1, 1, …, 1)^T
24  HIMMELBG function [44]  x_0 = (1.5, 1.5, …, 1.5)^T
Table 2. Numbers of iterations for problems 1–12.
Dim  10  20  50  100  200  500  1000  2000  5000  10,000
(1) Algorithm 13038517896146217301424527
(1) MCQN-BFGS2938517295146192298424528
(1) L-BFGS2639961583608641042175931533152
(2) Algorithm 1499016630859513452699543732182725
(2) MCQN-BFGS609520038468316683249648645623207
(2) L-BFGS5911326050499924814947988724,73249,391
(3) Algorithm 11626425859514960102399
(3) MCQN-BFGS1526425059715469101402
(3) L-BFGS391142797001503165926953370886727,471
(4) Algorithm 131253449434443495253
(4) MCQN-BFGS30294345495861626356
(4) L-BFGS21274054413838565250
(5) Algorithm 1304860921099992898181
(5) MCQN-BFGS35496794111108112988484
(5) L-BFGS28273431333941435481
(6) Algorithm 123263844545547516151
(6) MCQN-BFGS17273653605554515054
(6) L-BFGS17191922212324262425
(7) Algorithm 116212119171716151615
(7) MCQN-BFGS20222322151515171716
(7) L-BFGS20222621222125272830
(8) Algorithm 122232421232727282930
(8) MCQN-BFGS23252626262728282930
(8) L-BFGS9999101010101010
(9) Algorithm 174122134149137180148153172170
(9) MCQN-BFGS106125145171199181168171174179
(9) L-BFGS163245216196189190163169171192
(10) Algorithm 145454839454345145161145
(10) MCQN-BFGS47484943474841244204279
(10) L-BFGS24252424242222222022
(11) Algorithm 1244597121826735342111
(11) MCQN-BFGS244598121826735342111
(11) L-BFGS3310313612785493221109
(12) Algorithm 113111112956232
(12) MCQN-BFGS151113121276232
(12) L-BFGS6891013111010910
Table 3. Numbers of iterations for problems 13–24.
Dim  10  20  50  100  200  500  1000  2000  5000  10,000
(13) Algorithm 119202122181715141312
(13) MCQN-BFGS19202122181715141312
(13) L-BFGS16181718181718181718
(14) Algorithm 136529511116926261256710621114
(14) MCQN-BFGS37548612019030059467910621126
(14) L-BFGS10101010101010101011
(15) Algorithm 11113233039456097196295
(15) MCQN-BFGS1112212837456495196295
(15) L-BFGS1321325079122207338402770
(16) Algorithm 129314769110149165174170163
(16) MCQN-BFGS29355170100146164173171164
(16) L-BFGS256580162160156151150144140
(17) Algorithm 11615141113121212109
(17) MCQN-BFGS1411181213121212109
(17) L-BFGS8888888888
(18) Algorithm 149212321191422211311
(18) MCQN-BFGS43282229201520252326
(18) L-BFGS36343736393640344137
(19) Algorithm 1131211131212121075
(19) MCQN-BFGS131211131212121075
(19) L-BFGS12141516161716151617
(20) Algorithm 15544444333
(20) MCQN-BFGS5544444333
(20) L-BFGS7777777777
(21) Algorithm 114111121181835282643
(21) MCQN-BFGS13101122181835282643
(21) L-BFGS10131519223036515664
(22) Algorithm 136362628262727302829
(22) MCQN-BFGS36362628262727302829
(22) L-BFGS13131616171616151919
(23) Algorithm 1131712211214811109
(23) MCQN-BFGS16182225151312131210
(23) L-BFGS14141615172427272731
(24) Algorithm 11113182431372821137
(24) MCQN-BFGS1215192532372821137
(24) L-BFGS10101010101010101010
Table 4. Results of Dim = 1000 with different initial points.
Pro  Algorithm 1  MCQN-BFGS
Init  x_0  2x_0  4x_0  7x_0  10x_0  x_0  2x_0  4x_0  7x_0  10x_0
(1)217189192194196192210213220213
(2)2699268426411192269432494850505621574961
(3)494755686554213210228294
(4)434736818060213210228294
(5)9283869189112106859494
(6)47474747475454545454
(7)16305319211519272122
(8)27311635542831163356
(9)148159154184157168151147191154
(10)454516417017541263203192194
(11)357096112127357096112127
(12)6622266222
(13)15161714141516171414
(14)612312504511318594523503481563
(15)605720845210246458203532941
(16)165180173153129164183191197215
(17)12111072112118730
(18)22232560412023306575
(19)12121416241212142220
(20)4656446564
(21)353035112719353035112719
(22)27282829292728282929
(23)8222314351213241970
(24)28202121212820212121
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

