Article

A Family of Multi-Step Subgradient Minimization Methods

by Elena Tovbis 1, Vladimir Krutikov 2,3, Predrag Stanimirović 3,4, Vladimir Meshechkin 2, Aleksey Popov 1 and Lev Kazakovtsev 1,3,*
1 Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, 31 Krasnoyarskii Rabochii Prospekt, Krasnoyarsk 660037, Russia
2 Department of Applied Mathematics, Kemerovo State University, 6 Krasnaya Street, Kemerovo 650043, Russia
3 Faculty of Sciences and Mathematics, University of Nis, 18000 Nis, Serbia
4 Laboratory “Hybrid Methods of Modeling and Optimization in Complex Systems”, Siberian Federal University, 79 Svobodny Prospekt, Krasnoyarsk 660041, Russia
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(10), 2264; https://doi.org/10.3390/math11102264
Submission received: 5 April 2023 / Revised: 5 May 2023 / Accepted: 9 May 2023 / Published: 11 May 2023
(This article belongs to the Special Issue Intelligent Computing and Optimization)

Abstract:
For solving non-smooth multidimensional optimization problems, we present a family of relaxation subgradient methods (RSMs) with a built-in algorithm for finding the descent direction that forms an acute angle with all subgradients in the neighborhood of the current minimum. Minimizing the function along the opposite direction (with a minus sign) enables the algorithm to go beyond the neighborhood of the current minimum. The family of algorithms for finding the descent direction is based on solving systems of inequalities. The finite convergence of the algorithms on separable bounded sets is proved. Algorithms for solving systems of inequalities are used to organize the RSM family. On quadratic functions, the methods of the RSM family are equivalent to the conjugate gradient method (CGM). The methods are intended for solving high-dimensional problems and are studied theoretically and numerically. Examples of solving convex and non-convex smooth and non-smooth problems of large dimensions are given.

1. Introduction

Research on subgradient methods for minimizing convex, but not necessarily differentiable, functions began with the works [1,2]; their results are summarized in [3]. There are several directions for constructing non-smooth optimization methods. One of them [4,5,6] is based on the construction and use of function approximations. A number of effective approaches in non-smooth optimization are associated with changing the space metric via space dilation operations [7,8]. Relaxation methods based on the distance to the extremum were first proposed in [9] and developed in [10]. The first relaxation-by-function methods were proposed in [11,12,13].
The need for methods for solving complex non-smooth high-dimensional minimization problems is constantly growing. In the case of smooth functions, the conjugate gradient method (CGM) [3] is one of the universal methods for solving ill-conditioned high-dimensional problems. The CGM is a multi-step method that is optimal in terms of the convergence rate on quadratic functions [3,14].
CGM generates search directions that are more consistent with the geometry of the minimized function. In practice, the CGM shows faster convergence rates than gradient descent algorithms, so CGM is widely used in machine learning. The original CGM, known as the Hestenes–Stiefel method [15], was introduced in 1952 for solving linear systems. There are several modifications of the Hestenes–Stiefel method, such as the Fletcher–Reeves method [16], Polak–Ribiere method [17], or Dai–Yuan method [18], which mainly differ in the way the conjugate gradient update parameter is calculated.
Fletcher and Reeves justified the convergence of the CGM for quadratic functions and generalized it for the case of non-quadratic functions. The Polak–Ribiere method is based on an exact procedure for searching along a straight line and on a more general assumption about the approximation of the objective function. At each iteration of the Polak–Ribiere or Fletcher–Reeves methods, the function and its gradient are calculated once, and a one-dimensional optimization problem is solved. Thus, the complexity of one step of the CGM is of the same order as the complexity of a step of the steepest descent method. It was proven in [19] that the Polak–Ribiere method is also characterized by a linear convergence rate in the absence of returns to the initial iteration, but it has an advantage over the Fletcher–Reeves method in solving problems with general objective functions and is less sensitive to rounding errors when conducting a one-dimensional search. The Dai–Yuan algorithm converges globally, provided that the line search satisfies the standard Wolfe conditions.
Miele and Cantrell [20] generalized the approach of Fletcher and Reeves by proposing a gradient method with memory. The method is based on the use of two selectable minimization parameters in each of the search directions. This method is efficient in terms of the number of iterations required to solve the problem, but it requires more computations of the function values and gradient components than the Fletcher–Reeves method. The idea of the memory gradient method was further extended to the multi-dimensional search methods that are used mostly for unconstrained optimization in large-scale problems [21,22,23,24,25,26].
In the improved CGM [27], the improved Fletcher–Reeves (IFR) and Dai–Yuan methods, combined with the second inequality of the strong Wolfe line search, are used to construct two new conjugate parameters. In online CGM, Xue et al. [28] combined the IFR method with the variance reduction approach [29]. This algorithm achieves a linear convergence rate under the strong Wolfe line search for smooth and strongly convex objective functions.
Dai and Liao [30] introduced CGM based on a modified conjugate gradient update parameter. Modifications of this method were later presented in [31,32,33,34].
In [35], an improved CG algorithm with a generalized Armijo search technique was proposed. A modified Fletcher–Reeves CGM for monotone nonlinear equations was described in [36]. In [37], nonlinear CGM was considered as an adaptive momentum method combined with the steepest descent along the search direction. In [38], the author used an estimate of the Hessian to approximate the optimal step size. The paper [39] proposed a CGM on Riemannian manifolds. CG algorithms for stochastic optimization were introduced in [40,41,42]. Algorithms of this type use a small part of the samples for large-scale learning problems.
Preconditioning is another technique to speed up the convergence of CG descent. The idea of preconditioning is to make a change in variables using an invertible matrix. The authors in [43] proposed a non-monotone scaled CG algorithm for solving large-scale unconstrained optimization problems, which combines the idea of a scaled memoryless Broyden–Fletcher–Goldfarb–Shanno (BFGS) method with the non-monotone technique. Inexact preconditioned CGM with an inner–outer iteration for a symmetric positive definite system was proposed in [44]. In [45], the authors developed an optimizer that uses CG with a diagonal preconditioner.
In [46], the authors combined the limited memory technique with a subspace minimization conjugate gradient method and presented a limited memory subspace minimization conjugate gradient algorithm that first determines the search direction and then applies a quasi-Newton method in the subspace to improve the orthogonality of the gradients.
The idea of the spectral CG method is based on combining the idea of CG methods with spectral gradients. Li et al. [47] proposed a spectral three-term conjugate gradient method and proved the global convergence of this algorithm for uniformly convex functions. This work was further developed in [48].
The practical application of the conjugate gradient method is very wide and includes, for example, structured prediction problems and neural network learning [29], continuum mechanics [49], signal and image recovery problems [32,36], COVID-19 regression models [50], robot motion control problems [50], ptychographic reconstruction [51], and molecular dynamics simulations [52].
For a more detailed review of conjugate gradient methods, see [40,53].
It seems relevant to create multi-step universal methods for solving non-smooth problems that are applicable in terms of computer memory resources for solving high-dimensional minimization problems [54,55,56,57]. In this work, we propose a family of multi-step RSMs for solving large-scale problems. With a certain organization of the methods of the family, such as the CGM, they enable us to find the minimum of a quadratic function in a finite number of iterations.
The subgradient method is an algorithm that was originally developed by Shor [1] for minimizing a non-differentiable convex function. The main issue with subgradient methods is their speed, and several approaches can be used to accelerate them.
Incremental subgradient methods were studied in [58,59,60,61,62]. The main difference with the standard subgradient method is that at each iteration, x is changed incrementally through a sequence of steps. In [60], a class of subgradient methods for minimizing a convex function that consists of the sum of many component functions was considered. In [63], the authors presented a family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. An adaptive subgradient method for the split quasi-convex feasibility problems was developed in [64]. Proximal subgradient methods were presented in [65,66]. The authors in [65] proposed a model with a proximal conjugate subgradient (PCS-TT) method for solving the non-convex rank minimization problem by using properties of Moreau’s decomposition. A conjugate subgradient projection model as applied to continuous road network design problems was presented in [67]. The paper in [68] described a conjugate subgradient algorithm that minimizes a convex function containing a least squares fidelity term and an absolute value regularization term. This method can be applied to the inversion of ill-conditioned linear problems. A non-monotone conjugate subgradient type method without any line search was described in [69].
The principle of organization in a number of the RSMs [70] is that, in a particular RSM, there is an independent algorithm for finding the descent direction, which makes it possible to go beyond some neighborhood of the current minimum. In [70,71], the problem of finding the descent direction in RSM was formulated as the problem of solving systems of inequalities on separable sets. The use of a particular model of subgradient sets makes it possible to reduce the original problem to the problem of estimating the parameters of a linear function from information about subgradients obtained during the operation of the minimization algorithm, and mathematically formalize it as a problem of minimizing the quality functional. This makes it possible to use the ideas and methods of machine learning [72] to find the descent direction in RSM [70,71,73,74].
Thus, a specific new learning algorithm will be used as the basis of a new RSM method. The properties of the minimization method are determined by the learning algorithm underlying it. The aim of this work is to develop a family of methods for solving systems of inequalities (MSSIs) and, on this basis, to create a family of multi-step RSMs (MRSMs) for solving large-scale smooth and non-smooth minimization problems. Known methods [73,74] are special cases of the MRSM family presented here.
It is proven that the algorithms of the MSSI family converge in a finite number of iterations on separable sets. On strictly convex functions, the convergence of the MRSM algorithms is theoretically substantiated. It is proven that MRSM algorithms on quadratic functions are equivalent to the CGM.
In the practical implementation of RSM, several problems arise in combining the use of information about the function, both for minimization and for the internal algorithm for finding the descent direction. If, in CGM, the goal of a one-dimensional search is high accuracy, then, in RSM, the goal is to keep the step of a one-dimensional search proportional to the distance to the extremum, which eliminates looping and enables the learning algorithm to find a way out of a wide neighborhood of the current minimum. In accordance with the noted principle, we use a one-dimensional minimization procedure in which the rate of step decrease is controlled.
The described algorithms are implemented. A numerical experiment was carried out to select efficient versions from a family of algorithms. For the selected versions, an extensive experiment was carried out to compare them on smooth functions with various versions of the CGM. It was found that, along with the CGM, the proposed algorithms can be used to minimize smooth functions. The proposed methods are studied numerically on large-scale tests for solving convex and non-convex non-smooth optimization problems.
The rest of this paper is organized as follows: In Section 2, we state the problem of our study. In Section 3, we describe the method for solving systems of inequalities. In Section 4, we present a subgradient minimization method. In Section 5, we implement the proposed minimization algorithm. In Section 6, we perform a series of experiments with the implemented method. In the last section, we provide a short conclusion of the work.

2. The Problem Formulation

Let us solve a minimization problem for a convex function f(x) in Rn. In the RSM, the successive approximations are constructed according to the expressions [13]:
$x_{k+1} = x_k - \gamma_k s_{k+1}, \quad \gamma_k = \arg\min_{\gamma \in R} f(x_k - \gamma s_{k+1}),$ (1)
where the descent direction sk+1 is chosen as a solution for the system of inequalities [13]:
$(s, g) > 0, \quad \forall g \in G.$ (2)
Here, $G = \partial_\varepsilon f(x_i)$ is the ε-subgradient set at the point $x_i$. Denote by $S(G)$ the set of solutions to (2) and the subgradient set at $x$ by $\partial f(x) \equiv \partial_0 f(x)$. Iterative methods (learning algorithms) are used to solve systems of inequalities (2) in the RSM. Since elements of the ε-subgradient set are not explicitly specified, subgradients calculated on the descent trajectory of the minimization algorithm are used instead.
The solution vector $s^*$ of the system (2) forms an acute angle with each of the subgradients of the set G. If the subgradients of some neighborhood of the current minimum of (1) act as the set G, then iteration (1) with $s_{k+1} = s^*$ provides the possibility of going beyond this neighborhood with a simultaneous decrease in the function. It therefore seems relevant to search for efficient methods for solving (2).
In [70,71,73,74], the authors proposed the following approach to reduce the system (2) to an equivalent system of equalities. Let $G \subset R^n$ belong to some hyperplane, and let its vector $\eta(G)$ closest to the origin also be the vector of the hyperplane closest to the origin. In this case, the solution of the system $(s, g) = 1$, $\forall g \in G$, is also a solution for (2). It can be found as a solution to the system [70,71,73,74]:
$(s, g_i) = y_i, \quad i = 0, 1, \dots, k, \quad y_i \equiv 1.$ (3)
Figure 1 shows the projection of a subgradient set in the form of a segment [A,B] lying on a straight line in the plane of vectors $z_1$ and $z_2$. The vector $\eta(G) \in G$ lies in this plane and is the normal of the hyperplane $(s^*, g) = 1$ formed by the vectors $g$, where $s^* = \eta(G)/\|\eta(G)\|^2$.
The problem of solving the system (3) is one of the most common data analysis problems, for which gradient minimization methods are used. The minimized function is formulated as:
$F_k(s) = (y_k - (s, g_k))^2/2.$
To minimize it, various gradient-type methods are used. In a similar way, a solution is sought in the problems of constructing approximations by neural networks.
In [70], for solving system (3), a gradient minimization method was proposed: the Kaczmarz algorithm [75]:
$s_{k+1} = s_k + \frac{1 - (s_k, g_k)}{(g_k, g_k)}\, g_k.$ (4)
The method (4) provides an approximation $s_{k+1}$ that satisfies the equation $(s, g_k) = 1$, i.e., the last-received training equation from (3).
Figure 2 shows iterations (4) in the plane of the vectors $g_k$, $s^*$, assuming that the set G, represented by the segment [A,B], belongs to the hyperplane. The dashed line $W_k$ in Figure 2 is the projection of the hyperplane $(g_k, s) = 1$ for vectors $s$. When the set G belongs to the hyperplane, the hyperplane of vectors $s$ with $(s, g) = 1$, formed with some $g \in G$, contains the vector $s^*$.
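As a minimal numerical sketch (ours, not from the paper), iteration (4) can be read as a projection of the current approximation onto the hyperplane $(s, g_k) = 1$; cycling it over the subgradients of a separable set drives $s$ toward a solution of (3). The two vectors in `G` below are purely illustrative:

```python
import numpy as np

def kaczmarz_step(s, g):
    # Iteration (4): project s onto the hyperplane (s, g) = 1, so that the
    # newest training equation from system (3) is satisfied exactly.
    return s + (1.0 - s @ g) / (g @ g) * g

# Two illustrative subgradients spanning a separable set.
G = [np.array([2.0, 1.0]), np.array([1.0, 2.0])]
s = np.zeros(2)
for _ in range(50):          # cycle over the "training equations" (s, g) = 1
    for g in G:
        s = kaczmarz_step(s, g)
# s approaches s* = (1/3, 1/3), which satisfies (s, g) = 1 for both vectors.
```

Each step satisfies the most recent equation exactly; the earlier ones are satisfied only in the limit, which is exactly why the pair correction (5), (6) below is useful.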
In [71], to solve the system of inequalities (2), a descent direction correction scheme was used based on the exact solution of the last two equalities from (3) for the pair of indices k−1 and k, which can be realized by correction along a vector $p_k$ orthogonal to the vector $g_{k-1}$:
$s_{k+1} = s_k + \frac{1 - (s_k, g_k)}{(p_k, g_k)}\, p_k,$ (5)
$p_k = g_k - \alpha_k \frac{(g_k, g_{k-1})}{\|g_{k-1}\|^2}\, g_{k-1}.$ (6)
Here, $\alpha_k$ is the space dilation parameter. It is assumed that before operations (5) and (6) are performed, the initial conditions $(g_{k-1}, s_k) = 1$ and $(g_{k-1}, g_k) \le 0$ are satisfied, as shown in Figure 3.
Figure 3 shows iterations (6) and (5) in the plane of the vectors $g_k$ and $g_{k-1}$. As a result of the operation, the vector $s^2_{k+1}$ is found: the projection of the vector $s^*$ onto the plane of the vectors $g_k$ and $g_{k-1}$. The projections of the hyperplanes $(g_k, s) = 1$ and $(g_{k-1}, s) = 1$ are shown as the dashed lines $W_k$ and $W_{k-1}$. The vector $s^1_{k+1}$ is the projection onto this plane of the result of iteration (4).
On separable sets, iterations (6) and (5) lead to an acceleration in the convergence of the method for solving systems of inequalities. In the minimization method, under conditions of a rapidly changing position of the current minimum, the subgradients used in (6) and (5) in many cases do not belong to separable sets, which leads to the need to update the process (6), (5) with the loss of accumulated information.
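The pair correction (6), (5) can be checked numerically. The sketch below (our illustration; the specific vectors are hypothetical) verifies that, with $\alpha_k = 1$ and the initial condition $(g_{k-1}, s_k) = 1$, one update solves the last two equalities of (3) exactly:

```python
import numpy as np

def pair_step(s, g, g_prev, alpha=1.0):
    # Correction (6): with alpha = 1, remove from g its component along g_prev,
    # then apply the Kaczmarz-type step (5) along the corrected direction p.
    p = g - alpha * (g @ g_prev) / (g_prev @ g_prev) * g_prev
    return s + (1.0 - s @ g) / (p @ g) * p

g_prev = np.array([1.0, -1.0])
g = np.array([-1.0, 3.0])        # (g, g_prev) = -4 < 0: condition on signs holds
s = np.array([0.0, -1.0])        # here (s, g_prev) = 1 already holds
s = pair_step(s, g, g_prev)
# Both of the last two equalities of (3) now hold: (s, g) = 1 and (s, g_prev) = 1.
```

Because $p$ is orthogonal to $g_{k-1}$, the step along $p$ enforces $(s, g_k) = 1$ without destroying the previously satisfied equality $(s, g_{k-1}) = 1$.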
In this paper, we consider a linear combination of the solutions $s^1_{k+1}$ and $s^2_{k+1}$ as a descent vector $s^0_{k+1}$. This enables us to form a family of methods for solving systems of inequalities. On this basis, a family of subgradient MRSMs is constructed. Practical implementations with a special choice of the solution $s^0_{k+1}$ turn out to be more efficient, capable of covering wider neighborhoods of the current approximation using a rough one-dimensional search. The wider the neighborhood is, the greater the progress towards the extremum, the higher the stability of the method to roundoff errors and noise, and the better its ability to overcome small local extrema. In this regard, the minimization methods studied in this work are of particular importance: unlike the method from [11] and its modification [13], the built-in algorithms for solving systems of inequalities enable us to use the subgradients of a fairly wide neighborhood of the current minimum approximation and do not require exact one-dimensional descent.

3. A Family of Methods for Solving Systems of Inequalities

In the family of algorithms presented below, successive approximations of the solution to the system of inequalities (2) are constructed by correcting the current approximation.
Let us denote the vector closest to the origin in the set G and the related quantities as follows: $\eta_G \equiv \eta(G)$, $\rho_G \equiv \rho(G) = \|\eta(G)\|$, $\mu_G = \eta(G)/\|\eta(G)\|$, $s^* = \mu_G/\rho_G$, $R_G \equiv R(G) = \max_{g \in G} \|g\|$. Let us make an assumption concerning the set G.
 Assumption 1. 
The set G is non-empty, convex, closed, and bounded ($R_G < \infty$), satisfying the separability condition, i.e., $\rho_G > 0$.
Figure 4 shows the separable set and its elements.
Under the assumption made, since the vector $\eta_G$ is a vector of minimal length in G, taking into account the convexity of the set, the inequalities $(\eta_G, g) \ge \rho_G^2$, $\forall g \in G$, hold. Under these conditions, the vectors $\eta_G$, $\mu_G$, and $s^*$ are solutions to (2), and the vectors $g \in G$ satisfy the constraints:
$1 \le (s^*, g) \le R_G/\rho_G, \quad \forall g \in G.$ (7)
The vector s* is one of the solutions to system (2). The following algorithm searches for an approximation of s* using linear combinations of iterations (4) and (6), (5).
Algorithm 1 for $\alpha_k = 0$ implements a scheme based on the Kaczmarz algorithm [73]; we denote it as A0. For $\alpha_k = 1$, it implements the algorithm for solving systems of inequalities from [74].
Algorithm 1: A(αk).
Input: initial approximation s0
Output: solution s*
1. Assume $k = 0$, $g_{k-1} = 0$.
2. Choose an arbitrary $g_k \in G$ such that
$(s_k, g_k) \le 0.$ (8)
If such a vector does not exist, then $s^* = s_k \in S(G)$; stop the algorithm.
3. Estimate $s_{k+1}$:
$s_{k+1} = s_k + \frac{1 - (s_k, g_k)}{(p_k, g_k)}\, p_k,$ (9)
where the correction vector $p_k$ is chosen taking into account the condition
$(g_k, g_{k-1}) < 0,$ (10)
and is given by
$p_k = g_k,$ (11)
if (10) does not hold, and
$p_k = g_k - \alpha_k \frac{(g_k, g_{k-1})}{\|g_{k-1}\|^2}\, g_{k-1},$ (12)
if (10) holds. The value $\alpha_k$ is limited by:
$0 \le \alpha_k \le 1.$ (13)
4. Assign k = k + 1. Go to step 2.
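A compact Python sketch of Algorithm 1 may be helpful (our illustration under Assumption 1, with G given as a finite list of vectors; the function name and the example set are ours):

```python
import numpy as np

def solve_inequalities(G, alpha=1.0, max_iter=1000):
    # Algorithm 1 A(alpha): find s with (s, g) > 0 for all g in the finite set G.
    n = len(G[0])
    s, g_prev = np.zeros(n), None
    for _ in range(max_iter):
        # Step 2: pick any g in G violating (8), i.e., with (s, g) <= 0.
        g = next((g for g in G if s @ g <= 0), None)
        if g is None:
            return s                         # s in S(G): all inequalities hold
        # Step 3: correction vector (11)/(12), depending on condition (10).
        if g_prev is not None and g @ g_prev < 0:
            p = g - alpha * (g @ g_prev) / (g_prev @ g_prev) * g_prev
        else:
            p = g
        s = s + (1.0 - s @ g) / (p @ g) * p  # update (9)
        g_prev = g
    raise RuntimeError("no solution found within max_iter iterations")

# Illustrative separable set: all vectors lie in the positive quadrant.
G = [np.array([1.0, 0.2]), np.array([0.5, 1.0]), np.array([1.0, 1.0])]
s = solve_inequalities(G)
assert all(s @ g > 0 for g in G)
```

By Theorem 2 below, for a separable bounded set the loop terminates in a finite number of iterations (at most about $(R_G/\rho_G)^2 + 1$ when started from $s_0 = 0$).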
Since the algorithm is designed to find a solution to system (2) in the form of a vector $s^*$, we will study the behavior of the residual vector $\Delta_k = s^* - s_k$.
 Lemma 1. 
Let the sequence $\{s_k\}$ be obtained as a result of the use of Algorithm 1. Then, for k = 0, 1, 2,…, we have the following estimates:
$(s_{k+1}, g_k) = 1,$ (14)
$(p_k, p_k) \le (p_k, g_k) \le (g_k, g_k),$ (15)
$(\Delta_k, g_{k-1}) \ge 0,$ (16)
$(\Delta_k, p_k) \ge (\Delta_k, g_k) \ge 1 - (s_k, g_k) \ge 1.$ (17)
 Proof of Lemma 1. 
Let us prove (14). Consider the cases of transformation (9) combined with (11) and (12). According to (9) and (11),
$(s_{k+1}, g_k) = (s_k, g_k) + \frac{1 - (s_k, g_k)}{(g_k, g_k)}(g_k, g_k) = 1.$
According to (9) and (12),
$(s_{k+1}, g_k) = (s_k, g_k) + \frac{1 - (s_k, g_k)}{(p_k, g_k)}(p_k, g_k) = 1.$
Thus, equality (14) always holds. In the case of transformation (12) with $\alpha_k = 1$, the vectors $p_k$ and $g_{k-1}$ are orthogonal:
$(p_k, g_{k-1}) = (g_k, g_{k-1}) - \frac{(g_k, g_{k-1})}{\|g_{k-1}\|^2}(g_{k-1}, g_{k-1}) = 0.$
Therefore, the equality $(s_{k+1}, g_{k-1}) = 1$ is preserved. This case corresponds to the exact solution of the last two equalities in (3).
Let us prove (15). Inequalities (15) hold in the case (11). In the case (12), we carry out transformations proving (15):
$(p_k, p_k) = (g_k, g_k) - 2\alpha_k \frac{(g_k, g_{k-1})^2}{\|g_{k-1}\|^2} + \alpha_k^2 \frac{(g_k, g_{k-1})^2}{\|g_{k-1}\|^2}.$
Hence, from (13) and (12), it follows that
$(p_k, p_k) \le (g_k, g_k) - 2\alpha_k \frac{(g_k, g_{k-1})^2}{\|g_{k-1}\|^2} + \alpha_k \frac{(g_k, g_{k-1})^2}{\|g_{k-1}\|^2} = (g_k, g_k) - \alpha_k \frac{(g_k, g_{k-1})^2}{\|g_{k-1}\|^2} = (p_k, g_k) \le (g_k, g_k).$
Let us prove (16). For $k = 0$, (16) is satisfied due to $g_{-1} = 0$. For $k > 0$, (16) follows from (7) and (14):
$(\Delta_k, g_{k-1}) = (s^*, g_{k-1}) - (s_k, g_{k-1}) \ge 1 - (s_k, g_{k-1}) = 1 - 1 = 0.$
Let us prove (17). The first of the inequalities in (17) holds as an equality for (11); in the case (12), taking into account the sign under condition (10) and inequality (16), we obtain:
$(\Delta_k, p_k) = (\Delta_k, g_k) - \alpha_k \frac{(g_k, g_{k-1})}{\|g_{k-1}\|^2}(\Delta_k, g_{k-1}) \ge (\Delta_k, g_k).$
The second inequality in (17) follows from constraints (7). The last inequality in (17) follows from condition (8). □
The following theorem states that transformation (12) provides a direction pk to the solution point s* with a more acute angle compared to gk.
 Theorem 1. 
Let the sequence $\{s_k\}$ be obtained as a result of the use of Algorithm 1. Then, for k = 0, 1, 2,…, we have the estimate:
$\frac{(\Delta_k, p_k)}{(\Delta_k, \Delta_k)^{0.5}(p_k, p_k)^{0.5}} \ge \frac{(\Delta_k, g_k)}{(\Delta_k, \Delta_k)^{0.5}(g_k, g_k)^{0.5}}.$ (18)
 Proof of Theorem 1. 
Consistently using (17) and (15), we obtain (18):
$\frac{(\Delta_k, p_k)}{(\Delta_k, \Delta_k)^{0.5}(p_k, p_k)^{0.5}} \ge \frac{(\Delta_k, g_k)}{(\Delta_k, \Delta_k)^{0.5}(p_k, p_k)^{0.5}} \ge \frac{(\Delta_k, g_k)}{(\Delta_k, \Delta_k)^{0.5}(g_k, g_k)^{0.5}}.$ □
 Lemma 2. 
Let the set G satisfy Assumption 1. Then, $s_k \in S(G)$ if
$\|\Delta_k\| < 1/R_G.$ (19)
 Proof of Lemma 2. 
Using (19) and the Cauchy–Schwarz inequality, we obtain an estimate in the form of a strict inequality for vectors from G:
$|(\Delta_k, g)| = |(s^* - s_k, g)| \le \|s^* - s_k\| \cdot \|g\| \le \|s^* - s_k\| \cdot R_G < R_G/R_G = 1.$
Hence, taking into account the constraint (7), we obtain the proof. □
The following theorem substantiates the finite convergence of Algorithm 1.
 Theorem 2. 
Let the set G satisfy Assumption 1. Then, for the convergence rate of the sequence $\{s_k\}$, k = 0, 1, 2,…, toward the point $s^*$, generated by Algorithm 1 up to the moment of stopping, the following estimates are true:
$(\Delta_k, \Delta_k) \le (\Delta_{k-1}, \Delta_{k-1}) - \frac{1}{R_G^2},$ (20)
$\|\Delta_k\|^2 \le (\|s_0\| + \rho_G^{-1})^2 - \frac{k}{R_G^2};$ (21)
for $\rho_G^{-1}$ we have the estimate:
$\rho_G^{-1} \ge \Big(\sum_{j=0}^{k} (g_j, g_j)^{-1}\Big)^{0.5} - \|s_0\| \ge \frac{k^{0.5}}{R_G} - \|s_0\|;$ (22)
and for some value of k satisfying the inequality
$k \le k^* \le (R_G \|s_0\| + R_G/\rho_G)^2 + 1,$
we will obtain a vector $s_k \in S(G)$.
 Proof of Theorem 2. 
Using (9), we obtain an equality for the squared norm of the residual $\Delta_{k+1}$:
$(\Delta_{k+1}, \Delta_{k+1}) = (\Delta_k, \Delta_k) - 2(\Delta_k, p_k)\frac{1 - (s_k, g_k)}{(p_k, g_k)} + (p_k, p_k)\frac{(1 - (s_k, g_k))^2}{(p_k, g_k)^2}.$
We transform the right side of the resulting expression, considering inequalities (17) and replacing $(\Delta_k, p_k)$ with its lower bound $1 - (s_k, g_k)$:
$(\Delta_{k+1}, \Delta_{k+1}) \le (\Delta_k, \Delta_k) - \frac{2(1 - (s_k, g_k))^2}{(p_k, g_k)} + (p_k, p_k)\frac{(1 - (s_k, g_k))^2}{(p_k, g_k)^2}.$
In the resulting expression, we replace the factor $(p_k, p_k)$, according to (15), by the larger value $(p_k, g_k)$. As a result, we obtain:
$(\Delta_{k+1}, \Delta_{k+1}) \le (\Delta_k, \Delta_k) - \frac{(1 - (s_k, g_k))^2}{(p_k, g_k)} \le (\Delta_k, \Delta_k) - \frac{1}{(g_k, g_k)} \le (\Delta_k, \Delta_k) - \frac{1}{R_G^2}.$
Here, the last two inequalities are obtained considering (8) and the definition of $R_G$. With the indexing taken into account, we prove (20). Using (20) recursively together with the inequality:
$\|s^* - s_0\|^2 \le (\|s_0\| + \|s^*\|)^2 = (\|s_0\| + \rho_G^{-1})^2,$
which follows from the properties of the norm, we obtain estimate (21). Estimate (22) is a consequence of (21).
According to (21), $\|\Delta_k\| \to 0$. Therefore, at some step k, inequality (19) will be satisfied for the vector $s_k$, i.e., a vector $s_k \in S(G)$ will be obtained that is a solution to system (2). As an upper bound for the required number of steps, we can take $k^*$ equal to the value of k at which the right side of (21) vanishes, increased by 1. This provides the estimate for the required number of iterations $k^*$. □
In the minimization algorithm, $s_0 = 0$ is set. In this case, (22) takes the form:
$\rho_G \le \Big(\sum_{j=0}^{k} (g_j, g_j)^{-1}\Big)^{-0.5} \le \frac{R_G}{k^{0.5}}.$ (23)
Inequalities (23) will hold as long as it is possible to find a vector $g_k \in G$ satisfying condition (8). In the minimization algorithm, under the condition of exact one-dimensional descent, there will always be a $g_k$ satisfying condition (8). Therefore, estimates (23) will be used in the rules for updating the algorithm for solving systems of inequalities in the minimization method under constraints on the parameters of subgradient sets.

4. A Family of Subgradient Minimization Methods

The idea of organizing a minimization algorithm is to construct a descent direction that provides a solution to a system of inequalities of type (2) for subgradients in the neighborhood of the current minimum. Such a solution will allow, by means of one-dimensional minimization (1), to go beyond this neighborhood, that is, to find a point with a smaller value of the function outside the neighborhood of the current minimum.
Let the function $f(x)$, $x \in R^n$, be convex. Denote by $d(x) = \rho(\partial f(x))$ the length of the minimum-length vector of the subgradient set at the point x, and let $D(z) = \{x \in R^n \mid f(x) \le f(z)\}$.
 Note 1. 
For a function convex on $R^n$, if the set $D(x_0)$ is bounded, then for points $x^* \in D(x_0)$ satisfying the condition $d(x^*) < d_0$, the following estimate is correct [13]:
$f(x^*) - f^* \le D d_0,$ (24)
where D is the diameter of the set $D(x_0)$, $d_0$ is a given value, and $f^* = \inf_{x \in R^n} f(x)$.
The minimization algorithm must build a sequence of approximations whose limit points $x^*$ satisfy the condition $d(x^*) < d_0$ for a given value of $d_0$. According to (24), this provides the specified minimization accuracy with respect to the function. For these purposes, the parameters of the algorithm are set so as to ensure the search for points $x^*$ that satisfy the condition $d(x^*) < d_0$. The connection between $d_0$ and the parameters of the algorithm is established in more detail in Theorem 3.
When solving a minimization problem with a built-in algorithm for solving systems of inequalities under an exact one-dimensional search along a direction, according to the necessary condition for the minimum of a one-dimensional function, there is always a subgradient that satisfies condition (8). Therefore, criteria for updating the method for solving systems of inequalities are needed that are sufficient, but not excessive, for convergence to limit points $x^*$ satisfying the condition $d(x^*) < d_0$. For these purposes, relations (23) will be used, signaling the solution of a system of inequalities with characteristics sufficient to exit the neighborhood of the current minimum.
Let us describe the minimization method with a built-in Algorithm 1 for finding points $x \in R^n$ such that $d(x) \le E_0$, where $E_0 > 0$.
In Algorithm 2, in steps 2, 4, and 5, there is a built-in algorithm for solving inequalities. Algorithm 2 for αk = 0 was obtained in [73] and uses the method for solving the inequalities with the Kaczmarz Formula (4) (we denote it as M0). Algorithm 2 for αk = 1 was obtained in [74].
Algorithm 2: MA(αk).
Input: initial approximation point x0
Output: minimum point x*
1. Set the initial approximation $x_0 \in R^n$, integer $k = j = 0$.
2. Assign $j = j + 1$, $q_j = k$, $s_k = 0$, $g_{k-1} = 0$, $\Sigma_k = 0$.
3. Set $\varepsilon_j$, $m_j$.
4. Calculate the subgradient $g_k \in \partial f(x_k)$ that satisfies $(s_k, g_k) \le 0$. If $g_k = 0$, then $x^* = x_k$; stop the algorithm.
5. Obtain a new approximation $s_{k+1} = s_k + \frac{1 - (s_k, g_k)}{(p_k, g_k)}\, p_k$, where
$p_k = \begin{cases} g_k, & \text{if } (g_k, g_{k-1}) \ge 0, \\ g_k - \alpha_k \frac{(g_k, g_{k-1})}{\|g_{k-1}\|^2}\, g_{k-1}, & \text{if } (g_k, g_{k-1}) < 0. \end{cases}$
The value of $\alpha_k$ is bounded, $0 \le \alpha_k \le 1$, similarly to (13).
6. Calculate a new value of the criterion $\Sigma_{k+1} = \Sigma_k + (g_k, g_k)^{-1}$.
7. Calculate a new approximation of the minimum point:
$x_{k+1} = x_k - \gamma_k s_{k+1}, \quad \gamma_k = \arg\min_{\gamma \in R} f(x_k - \gamma s_{k+1}).$
8. Set $k = k + 1$.
9. If $1/\Sigma_k < \varepsilon_j^2$, then go to step 2.
10. If $k - q_j > m_j$, then go to step 2; otherwise, go to step 4.
The index $q_j$, $j = 0, 1, 2, \dots$ denotes the iteration numbers k at which, when the criteria of steps 9 and 10 are met, the algorithm for solving inequalities is updated in step 2 ($s_k = 0$, $g_{k-1} = 0$). According to (21) and (22), the algorithm for solving the system of inequalities with $s_0 = 0$ has the best convergence rate estimates. Therefore, when updating in step 2 of Algorithm 2, we set $s_k = 0$. The need for updating arises because, as a result of the shifts in step 7, the subgradient sets in the neighborhood of the current minimum point change, which makes it necessary to solve the system of inequalities based on new information.
By virtue of the exact one-dimensional descent along the direction $(-s_{k+1})$ in step 7, at the new point $x_{k+1}$ there always exists a vector $g_{k+1} \in \partial f(x_{k+1})$ such that $(g_{k+1}, s_{k+1}) \le 0$, according to the necessary condition for a minimum of a one-dimensional function (see [13]). Therefore, regardless of the iteration number k, the condition $(s_k, g_k) \le 0$ of step 4 can always be satisfied.
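For illustration, Algorithm 2 can be sketched as follows (our simplified implementation, not the authors' code: the exact one-dimensional descent of step 7 is replaced by an approximate bracketing/ternary search, and the test function is a hypothetical ill-conditioned quadratic):

```python
import numpy as np

def line_search(phi, hi=1.0, iters=60):
    # Approximate one-dimensional minimization of a convex phi over t >= 0:
    # expand the bracket while phi keeps decreasing, then contract by ternary search.
    while phi(2.0 * hi) < phi(hi):
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if phi(a) < phi(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

def mrsm(f, subgrad, x, alpha=1.0, eps=1e-6, m=20, outer=50):
    # Sketch of Algorithm 2 MA(alpha); variable names follow the paper.
    n = len(x)
    for _ in range(outer):                     # step 2: restart (s = 0, g_prev = 0)
        s, g_prev, sigma = np.zeros(n), None, 0.0
        for _ in range(m):                     # step 10: at most m inner iterations
            g = subgrad(x)                     # step 4
            if np.linalg.norm(g) < eps:
                return x
            if g_prev is not None and g @ g_prev < 0:   # condition on signs, as in (10)
                p = g - alpha * (g @ g_prev) / (g_prev @ g_prev) * g_prev
            else:
                p = g
            s = s + (1.0 - s @ g) / (p @ g) * p         # step 5, update (9)
            sigma += 1.0 / (g @ g)                      # step 6
            d = s / np.linalg.norm(s)                   # descend along -s (normalized)
            gamma = line_search(lambda t: f(x - t * d)) # step 7 (approximate)
            x = x - gamma * d
            g_prev = g
            if 1.0 / sigma < eps ** 2:                  # step 9, criterion from (23)
                break
    return x

# Illustration on a smooth, ill-conditioned convex quadratic.
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x = mrsm(f, grad, np.array([3.0, 1.0]))
```

On quadratic functions the method is equivalent to the CGM, as stated in Section 1; for a non-smooth f, `subgrad` should return an arbitrary subgradient at x.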
The proof of the convergence of Algorithm 2 is based on the following lemma.
 Lemma 3 ([13]). 
Let the function $f(x)$ be strictly convex on $\mathbb{R}^n$, the set $D(x_0)$ be bounded, and the sequence $\{x_k\}_{k=0}^{\infty}$ be such that $f(x_{k+1}) = \min_{\alpha \in [0,1]} f(x_k + \alpha(x_{k+1} - x_k))$. Then, $\lim_{k \to \infty} \|x_{k+1} - x_k\| = 0$.
Under the conditions of an exact one-dimensional search, the conditions of Lemma 3 will be satisfied in iterations of Algorithm 2.
Denote by $W_\varepsilon(G) = \{z \in \mathbb{R}^n \mid \|z - x\| \le \varepsilon,\ x \in G\}$ the $\varepsilon$-neighborhood of the set $G$ and by $U_\delta(x) = \{z \in \mathbb{R}^n \mid \|z - x\| \le \delta\}$ the $\delta$-neighborhood of the point $x$. Let $z_j = x_{q_j}$, $Q_j = \Sigma_{q_j}$, $j = 1, 2, \ldots$, i.e., the points $x_k$ and the values $\Sigma_k$ corresponding to the indices $k$ at the moments of updating in step 2 of Algorithm 2.
 Theorem 3. 
Let the function $f(x)$ be strictly convex on $\mathbb{R}^n$, the set $D(x_0)$ be bounded, and the parameters $\varepsilon_j$ and $m_j$ specified in step 3 of Algorithm 2 be fixed:
$$\varepsilon_j = E_0^2 > 0, \qquad m_j = M_0. \qquad (25)$$
Then, if $x^*$ is a limit point of the sequence $\{x_{q_j}\}_{j=1}^{\infty}$ generated by Algorithm 2, then
$$d(x^*) \le \max\{E_0,\ R(x_0)/\sqrt{M_0}\} \equiv d_0, \qquad (26)$$
where $R(x_0) = \max_{x \in D(x_0)} \max_{v \in \partial f(x)} \|v\|$. In particular, if $M_0 \ge R^2(x_0)/E_0^2$, then $d(x^*) \le E_0$.
 Proof of Theorem 3. 
Let conditions (25) be satisfied. The existence of limit points of the sequence $\{z_j\}$ follows from the boundedness of the set $D(x_0)$ and the inclusion $z_j \in D(x_0)$. Assume that the statement of the theorem is false: suppose that a subsequence $z_{j_s} \to x^*$, but
$$d(x^*) = d > d_0 > 0. \qquad (27)$$
Set
$$\varepsilon = (d - d_0)/2. \qquad (28)$$
Denote $W_\varepsilon = W_\varepsilon(\partial f(x^*))$. Choose $\delta > 0$ so that
$$\partial f(x) \subset W_\varepsilon \quad \forall x \in U_\delta(x^*). \qquad (29)$$
Such a choice is possible due to the upper semicontinuity of the point-to-set mapping $\partial f(x)$ (see [13]).
Choose a number $K$ such that, for $j_s > K$, the following holds:
$$z_{j_s} \in U_{\delta/2}(x^*), \qquad x_k \in U_\delta(x^*), \qquad q_{j_s} \le k \le q_{j_s} + M_0, \qquad (30)$$
i.e., a number $K$ such that the points $x_k$ remain in the neighborhood $U_\delta(x^*)$ for at least $M_0$ steps of the algorithm. Such a choice is possible due to the assumed convergence $z_{j_s} \to x^*$ and Lemma 3, whose conditions are satisfied under the conditions of Theorem 3 and the exact one-dimensional descent in step 7 of Algorithm 2.
By assumption (27) and the choice of $\varepsilon$ in (28), $\delta$ in (29), and $K$ ensuring (30), for $j_s > K$ the following inequality holds:
$$\rho(W_\varepsilon) \ge \rho(\partial f(x^*)) - \varepsilon = d - (d - d_0)/2 > d_0. \qquad (31)$$
For $j_s > K$, relations (30) and (29) imply $g_k \in W_\varepsilon$ for $q_{j_s} \le k \le q_{j_s} + M_0$. Algorithm 2 includes Algorithm 1. Therefore, taking into account the estimates from (23), depending on which step of Algorithm 2 (step 9 or step 10) triggers the update at some $k$, one of the following inequalities is satisfied:
$$\rho(W_\varepsilon) \le \Sigma_k^{-1/2} < \sqrt{\varepsilon_j} = E_0 \le d_0, \qquad (32)$$
$$\rho(W_\varepsilon) \le R(x_0)/\sqrt{m_j} = R(x_0)/\sqrt{M_0} \le d_0. \qquad (33)$$
The last transition in each chain follows from the definition of $d_0$ in (26). However, (31) contradicts both (32) and (33). The resulting contradiction proves the theorem.
According to estimate (26), any limit point of the sequence $\{z_j\}$ generated by Algorithm 2 satisfies $d(x^*) \le d_0$, and therefore estimate (24) is valid. □
The following theorem defines the conditions under which Algorithm 2 generates a sequence {xk} converging to a minimum point.
 Theorem 4. 
Let the function $f(x)$ be strictly convex, the set $D(x_0)$ be bounded, and
$$\varepsilon_j \to 0, \qquad m_j \to \infty. \qquad (34)$$
Then, any accumulation point of the sequence $\{x_{q_j}\}$ generated by Algorithm 2 is a minimum point of the function $f(x)$ on $\mathbb{R}^n$.
 Proof of Theorem 4. 
Assume that the statement of the theorem is false: suppose that a subsequence $z_{j_s} \to x^*$ for which there exists $d_0 > 0$ such that inequality (27) is satisfied. As before, we set $\varepsilon$ according to (28) and choose $\delta > 0$ such that (29) is satisfied. By conditions (34), there is $K_0$ such that, for $j > K_0$, the following relation holds:
$$\max\{\sqrt{\varepsilon_j},\ R(x_0)/\sqrt{m_j}\} \le d_0. \qquad (35)$$
Denote $E_0 = d_0$, and denote by $M_0$ the minimum value of $m_j$ for $j > K_0$. This renaming allows us to reuse the proof of Theorem 3. Choose an index $K > K_0$ such that (30) holds for $j_s > K$, i.e., a number $K$ such that the points $x_k$ remain in the neighborhood $U_\delta(x^*)$ for at least $M_0$ steps of the algorithm. By assumption (27) and the choice of $\varepsilon$ in (28), $\delta$ in (29), and $K$ in (30), inequality (31) holds for $j_s > K$. For $j_s > K$, relations (30) and (29) imply $g_k \in W_\varepsilon$ for $q_{j_s} \le k \le q_{j_s} + M_0$. Algorithm 2 contains Algorithm 1. Therefore, taking into account the estimates from (23), depending on which step of Algorithm 2 (step 9 or step 10) triggers the update at some $k$, one of the inequalities (32) and (33) is satisfied, where the last transition in each chain follows from the definitions of $E_0$ and $M_0$. However, (31) contradicts both (32) and (33). The resulting contradiction, together with (35) and (34), proves that any limit point can only be the minimum point. □

5. Correlation with the Conjugate Gradient Method

Let us show that the presented Algorithm 2 has the properties of the conjugate gradient method and that, on quadratic functions, the successive approximations of the minimum produced by the two methods coincide. Denote by $\nabla f(x)$ the gradient of a function, which, for a differentiable convex function, coincides with the subgradient and is the only element of the subdifferential [13]. Denote by $m$ the number of iterations ($m \le n$) at which the minimum point is not yet reached. Iterations of Algorithm 2 for $k = 1, 2, \ldots, m$ can be written as follows:
$$x_{k+1} = x_k - \gamma_k s_{k+1}, \qquad \gamma_k = \arg\min_{\gamma \in \mathbb{R}} f(x_k - \gamma s_{k+1}), \qquad (36)$$
$$s_{k+1} = s_k + \frac{1 - (s_k, g_k)}{(p_k, g_k)}\, p_k, \qquad g_k = \nabla f(x_k), \quad g_0 = 0, \quad s_1 = 0, \qquad (37)$$
$$p_k = \begin{cases} g_k, & \text{if } (g_k, g_{k-1}) \ge 0, \\ g_k - \alpha_k \dfrac{(g_k, g_{k-1})}{\|g_{k-1}\|^2}\, g_{k-1}, & \text{if } (g_k, g_{k-1}) < 0. \end{cases} \qquad (38)$$
The value of $\alpha_k$ is bounded, $0 \le \alpha_k \le 1$.
Let us establish a connection between Algorithm 2 and the CGM, an iteration of which has the form:
$$\bar{x}_{k+1} = \bar{x}_k - \bar{\gamma}_k \bar{s}_{k+1}, \qquad \bar{\gamma}_k = \arg\min_{\bar{\gamma}} f(\bar{x}_k - \bar{\gamma}\bar{s}_{k+1}), \qquad k = 1, \ldots, m, \qquad (39)$$
$$\bar{s}_2 = g_1, \qquad \bar{s}_{k+1} = \bar{g}_k + \frac{(\bar{g}_k, \bar{g}_k)}{(\bar{g}_{k-1}, \bar{g}_{k-1})}\, \bar{s}_k, \quad k = 2, \ldots, m, \qquad \bar{g}_k = \nabla f(\bar{x}_k). \qquad (40)$$
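Recurrences (39) and (40) admit a direct transcription. The sketch below is only an illustration under the assumption of a differentiable $f$, a gradient oracle, and an externally supplied exact line search (the callback `exact_step` is a hypothetical name):

```python
import numpy as np

def cgm_fr(grad, x0, exact_step, m):
    """Fletcher-Reeves CGM per (39)-(40): x_{k+1} = x_k - gamma_k s_{k+1},
    s_{k+1} = g_k + ((g_k, g_k)/(g_{k-1}, g_{k-1})) s_k, starting from s_2 = g_1."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    s = g.copy()                               # s_2 = g_1
    for _ in range(m):
        x = x - exact_step(x, s) * s           # exact descent along -s
        g_new = grad(x)
        s = g_new + (g_new @ g_new) / (g @ g) * s
        g = g_new
    return x
```

For a quadratic $f(x) = \tfrac{1}{2}x^\top A x$, running $m = n$ exact line searches drives the gradient to zero, which is the finite-termination property used in Theorem 5.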
 Theorem 5. 
Let the function $f(x)$, $x \in \mathbb{R}^n$, be quadratic with a strictly positive definite matrix of second derivatives. Then, provided that the initial points of algorithms (36)–(38) and (39)–(40) coincide, $x_1 = \bar{x}_1$, the two methods generate an identical sequence of approximations of the minimum, and their characteristics satisfy the relations
$$(a)\ p_k = g_k, \qquad (b)\ s_{k+1} = \bar{s}_{k+1}/(g_k, g_k), \qquad (c)\ x_{k+1} = \bar{x}_{k+1}, \qquad k = 1, 2, \ldots, m. \qquad (41)$$
In this case, the minimum will be found after no more than $n$ steps.
 Proof of Theorem 5. 
We use induction. As a result of iterations (36)–(38) for $k = 1$, due to $g_0 = 0$ and $s_1 = 0$, we have $p_1 = g_1$ and $s_2 = g_1/(g_1, g_1)$. As a result of iterations (39) and (40) for $k = 1$, we have $\bar{s}_2 = g_1$. Consequently, equalities (41(a)) and (41(b)) are satisfied for $k = 1$. Due to the exact one-dimensional descent and the collinearity of the descent directions, equality (41(c)) also holds for $k = 1$.
Assume that equalities (41) are satisfied for $k = 1, 2, \ldots, l$, where $l \ge 1$. Let us show that they are satisfied for $k = l + 1$. By (41(c)), the points at which the gradients of algorithms (39)–(40) and (36)–(38) are calculated coincide, so the gradients themselves coincide, and the gradients used in the CGM and, hence, in (36)–(38) are mutually orthogonal [3]. Thus, in (38), for $k = l + 1$, as a result of the orthogonalization of the vectors $g_{l+1}$ and $g_l$, we obtain $p_{l+1} = g_{l+1}$. This proves (41(a)) for $k = l + 1$.
From the condition of exact one-dimensional descent, the equality $(s_{l+1}, g_{l+1}) = 0$ follows. Therefore, the transformation (37), taking into account (41(a)) for $k = l + 1$, (41(b)) for $k = l$, and (40), takes the form:
$$s_{l+2} = s_{l+1} + \frac{g_{l+1}}{(g_{l+1}, g_{l+1})} = \frac{\bar{s}_{l+1}}{(g_l, g_l)} + \frac{g_{l+1}}{(g_{l+1}, g_{l+1})} = \frac{\bar{s}_{l+2}}{(g_{l+1}, g_{l+1})}.$$
This implies (41(b)). Due to the exact one-dimensional descent and the collinearity of the descent directions, equality (41(c)) will hold for k = l + 1.
From the above proof of the equivalence of sequences generated by the CGM algorithms and (36)–(38), taking into account the property of the termination of the process of minimization by the CGM method after no more than n steps [3], the proof of the theorem follows. □

6. Implementation of the Minimization Algorithm

Algorithm 2 is implemented according to the RSM implementation technique [70,71,73,74]. Consider a version of Algorithm 2 that includes a one-dimensional minimization procedure along the direction $s$. This procedure (a) constructs the current approximation of the minimum $x_m$ and (b) constructs a point $y$ in a neighborhood of $x_m$ such that, for $g_1 \in \partial f(y)$, the inequality $(s, g_1) \le 0$ holds. The subgradient $g_1$ is used to solve the system of inequalities. A call of the procedure will be denoted as follows:
$$OM(\{x,\ s,\ g_x,\ f_x,\ h_0\};\ \{\gamma_m,\ f_m,\ g_m,\ \gamma_1,\ g_1,\ h_1\}).$$
The input parameters are the point of the current approximation of the minimum $x$, the descent direction $s$, $g_x \in \partial f(x)$, $f_x = f(x)$, and the initial step $h_0$. It is assumed that the necessary condition $(g_x, s) > 0$ for the possibility of descent along $s$ is satisfied. The output parameters are: $\gamma_m$, the step to the point of the obtained approximation of the minimum $x^+ = x - \gamma_m s$; $f_m = f(x^+)$; $g_m \in \partial f(x^+)$; $\gamma_1$, a step along $s$ such that at the point $y^+ = x - \gamma_1 s$, for $g_1 \in \partial f(y^+)$, the inequality $(g_1, s) \le 0$ holds; and $h_1$, the initial descent step calculated in the procedure for the next iteration. In the algorithm presented below, the vectors $g_1 \in \partial f(y^+)$ are used to solve the system of inequalities, and the points $x^+ = x - \gamma_m s$ are used as approximations of the minimum.
Algorithm of one-dimensional descent (OM). Suppose we need to find an approximation of the minimum of the one-dimensional function $\phi(\beta) = f(x - \beta s)$, where $x$ is some point and $s$ is the descent direction. Take the ascending sequence $\beta_0 = 0$, $\beta_i = h_0 q_M^{\,i-1}$ for $i \ge 1$. Denote $z_i = x - \beta_i s$, $r_i \in \partial f(z_i)$, and let $l$ be the minimum index $i$, $i = 0, 1, 2, \ldots$, at which the relation $(r_i, s) \le 0$ is satisfied for the first time. Set the parameters of the segment $[\gamma_0, \gamma_1]$ of localization of the one-dimensional minimum: $\gamma_0 = \beta_{l-1}$, $f_0 = f(z_{l-1})$, $g_0 = r_{l-1}$, $\gamma_1 = \beta_l$, $f_1 = f(z_l)$, $g_1 = r_l$. Find the minimum point $\gamma^*$ of the one-dimensional cubic approximation of the function on the localization segment. Calculate:
$$\gamma_m = \begin{cases} q_{\gamma 1}\gamma_1, & \text{if } l = 1 \text{ and } \gamma^* \le q_{\gamma 1}\gamma_1, \\ \gamma_1, & \text{if } \gamma_1 - \gamma^* \le q_\gamma(\gamma_1 - \gamma_0), \\ \gamma_0, & \text{if } l > 1 \text{ and } \gamma^* - \gamma_0 \le q_\gamma(\gamma_1 - \gamma_0), \\ \gamma^*, & \text{otherwise}. \end{cases} \qquad (42)$$
Calculate the initial descent step for the next iteration:
$$h_1 = h_0 q_m (\gamma_1/h_0)^{1/2}. \qquad (43)$$
In (42), a rough search for the minimum on the interval is carried out; when $\gamma_0$ or $\gamma_1$ is chosen instead of $\gamma_m$, no additional calculation of the function and the gradient is required. We use the parameters $q_\gamma = 0.2$ and $q_{\gamma 1} = 0.1$ and the coefficients $q_M > 1$ and $q_m < 1$.
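The safeguarded choice (42) and the step forecast (43) can be sketched as follows; the value $q_m = 0.9$ below is only an illustrative assumption from the interval recommended later for smooth functions, and the function names are hypothetical:

```python
def clip_step(gamma_star, gamma0, gamma1, l, q_g=0.2, q_g1=0.1):
    """Rule (42): snap the cubic-fit minimizer gamma_star to an endpoint of the
    localization segment [gamma0, gamma1] when it falls too close to one, so
    that no extra function/gradient evaluation is needed there."""
    if l == 1 and gamma_star <= q_g1 * gamma1:
        return q_g1 * gamma1
    if gamma1 - gamma_star <= q_g * (gamma1 - gamma0):
        return gamma1
    if l > 1 and gamma_star - gamma0 <= q_g * (gamma1 - gamma0):
        return gamma0
    return gamma_star

def next_initial_step(h0, gamma1, q_m=0.9):
    """Rule (43): forecast of the initial step, h1 = h0 * q_m * (gamma1/h0)^0.5."""
    return h0 * q_m * (gamma1 / h0) ** 0.5
```

Note that (43) contracts or expands the next initial step depending on how far the localization step $\gamma_1$ ran relative to $h_0$.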
Minimization algorithm. In the implementation of Algorithm 2 proposed below, the method for solving inequalities is not updated, and the exact one-dimensional descent is replaced by an approximate one.
Let us explain the steps of the algorithm. The OM procedure returns two subgradients, $\tilde{g}_{k+1}$ and $g_{k+1}$. The first is used to solve the inequalities in step 2, and the second is used in step 3 to correct the descent direction so as to ensure the necessary condition $(s_{k+1}, g_k) > 0$ for the possibility of descent along $(-s_{k+1})$. The transformation (45) in step 3 for $(\tilde{s}_{k+1}, g_k) < 1$ is a correction of the form (4) by the Kaczmarz algorithm. It is carried out in order to align the descent direction with the subgradient of the current approximation of the minimum.
Unlike the idealized case, Algorithm 3 performs no updates. Although the convergence of the idealized versions of the RSM is justified under the condition of exact one-dimensional descent, these algorithms are implemented with one-dimensional minimization procedures in which the initial step, depending on progress, can increase or decrease, as determined by the coefficients $q_M > 1$ and $q_m < 1$. These coefficients should be chosen so that the rate of step-length decrease in (43) matches the rate of reduction in the distance to the minimum point. The minimum iteration step cannot be less than some fraction of the initial step, specified in (42) by the parameters $q_\gamma = 0.2$ and $q_{\gamma 1} = 0.1$; we used these values in our calculations.
Algorithm 3: MOM(αk).
Input: initial approximation x0, initial step of one-dimensional descent h0, maximum allowed number of iterations N, argument minimization precision εx, gradient minimization precision εg
Output: minimum point x*
1. Set the initial approximation $x_0 \in \mathbb{R}^n$ and the initial step of one-dimensional descent $h_0$. Set $k = 0$, $g_0 = \tilde{g}_0 \in \partial f(x_0)$, $g_{k-1} = 0$, $f_0 = f(x_0)$, $s_0 = \tilde{s}_0 = 0$. Set the stop parameters: the maximum allowed number of iterations $N$, the argument minimization precision $\varepsilon_x$, and the gradient minimization precision $\varepsilon_g$.
2. Obtain an approximation
$$\tilde{s}_{k+1} = s_k + \frac{1 - (s_k, \tilde{g}_k)}{(p_k, \tilde{g}_k)}\, p_k, \qquad (44)$$
where
$$p_k = \begin{cases} \tilde{g}_k, & \text{if } (\tilde{g}_k, g_{k-1}) \ge 0, \\ \tilde{g}_k - \alpha_k \dfrac{(\tilde{g}_k, g_{k-1})}{\|g_{k-1}\|^2}\, g_{k-1}, & \text{if } (\tilde{g}_k, g_{k-1}) < 0. \end{cases}$$
3. Obtain the descent direction
$$s_{k+1} = \begin{cases} \tilde{s}_{k+1}, & \text{if } (\tilde{s}_{k+1}, g_k) \ge 1, \\ \tilde{s}_{k+1} + g_k (1 - (\tilde{s}_{k+1}, g_k))/(g_k, g_k), & \text{if } (\tilde{s}_{k+1}, g_k) < 1. \end{cases} \qquad (45)$$
4. Perform a one-dimensional descent along the normalized direction $w_{k+1} = s_{k+1}(s_{k+1}, s_{k+1})^{-1/2}$:
$$OM(\{x_k,\ w_{k+1},\ g_k,\ f_k,\ h_k\};\ \{\gamma_{k+1},\ f_{k+1},\ g_{k+1},\ \tilde{\gamma}_{k+1},\ \tilde{g}_{k+1},\ h_{k+1}\}).$$
5. Calculate the minimum point approximation $x_{k+1} = x_k - \gamma_{k+1} w_{k+1}$.
6. If $k > N$, or $\|x_{k+1} - x_k\| \le \varepsilon_x$, or $\|g_{k+1}\| \le \varepsilon_g$, then $x^* = x_{k+1}$ and stop the algorithm; otherwise, set $k = k + 1$ and go to step 2.
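Taken in isolation, the correction of step 3 is a single Kaczmarz projection; a minimal sketch assuming NumPy vectors:

```python
import numpy as np

def correct_direction(s_tilde, g):
    """Step 3 of Algorithm 3 (rule (45)): if the trial direction violates
    (s, g_k) >= 1, project it onto the hyperplane (s, g_k) = 1 (Kaczmarz step),
    which restores the descent property along -s for the current subgradient."""
    t = s_tilde @ g
    if t >= 1.0:
        return s_tilde
    return s_tilde + g * (1.0 - t) / (g @ g)
```

After the correction, the returned direction always satisfies $(s_{k+1}, g_k) \ge 1$, so the necessary descent condition of step 4 holds.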
Consider ways of setting the parameter $\alpha_k$. In a numerical implementation with $\alpha_k = 1$, the number of iterations is sometimes smaller and sometimes larger than with $\alpha_k = 0$, and unplanned stops often occur in (44) because the values $(p_k, \tilde{g}_k)$ are close to zero. Let $\varepsilon_p$ be a value from the segment $[0, 1]$. In step 2 of the algorithm, we use the following rule for setting the parameter $\alpha_k$:
$$\text{If } (p_k, p_k) \le \varepsilon_p(\tilde{g}_k, \tilde{g}_k), \text{ then } \alpha_k = 1 - \varepsilon_p; \text{ otherwise, } \alpha_k = 1. \qquad (46)$$
We also used a second rule for choosing $\alpha_k$:
$$\text{If } (p_k, p_k) \le \varepsilon_p(\tilde{g}_k, \tilde{g}_k), \text{ then } \alpha_k = 0; \text{ otherwise, } \alpha_k = 1. \qquad (47)$$
In the next section, we select an appropriate parameter $\varepsilon_p$ from the set $\varepsilon_p \in \{0.5;\ 0.1;\ 10^{-3};\ 10^{-4};\ 10^{-8};\ 10^{-15}\}$, with which the main computational experiment is carried out.
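Rules (46) and (47) guard the orthogonalization of step 2 against a near-vanishing $p_k$. A sketch, in which $p_k$ is first formed with $\alpha_k = 1$ and the rule supplies the fallback value (the function name and structure are illustrative, not the paper's code):

```python
import numpy as np

def choose_p(g, g_prev, eps_p=1e-8, rule=46):
    """Forms p_k for step 2 of Algorithm 3. When full orthogonalization
    (alpha_k = 1) nearly annihilates p_k, fall back per rule (46) or (47)."""
    if not g_prev.any() or g @ g_prev >= 0.0:
        return g                                 # no orthogonalization needed
    orth = lambda a: g - a * (g @ g_prev) / (g_prev @ g_prev) * g_prev
    p = orth(1.0)
    if p @ p <= eps_p * (g @ g):                 # near-cancellation detected
        p = orth(1.0 - eps_p if rule == 46 else 0.0)
    return p
```

When $\tilde{g}_k$ is almost opposite to $g_{k-1}$, rule (46) keeps a slightly damped orthogonalization, while rule (47) abandons it entirely and returns to $p_k = \tilde{g}_k$.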

7. Numerical Experiment

In Algorithm 3, the coefficients of decrease $q_m < 1$ and increase $q_M > 1$ of the initial step of the one-dimensional descent play a key role. Values of $q_m$ close to 1 give a low rate of step decrease and, accordingly, a low rate of convergence of the method. At the same time, a low rate of step decrease prevents the method from looping, because the subgradients of the function involved in solving the inequalities are then taken from a wider neighborhood. The choice of $q_m$ must be commensurate with the attainable convergence rate of the minimization method: the faster the algorithm, the smaller this parameter can be chosen. For example, in the RSM with space dilation [71,73], $q_m = 0.8$ is chosen. For smooth functions, the choice of this parameter is not critical, and it can be taken from the interval $[0.8, 0.98]$. The convergence rate practically does not depend on the step increase parameter, so it can be taken as $q_M \in [1.5, 3]$.
The computational experiment is preceded by the choice of the parameter $\varepsilon_p$ used in Formulas (46) and (47) of the proposed Algorithm 3. After that, we conduct the main testing of the method with the selected parameter $\varepsilon_p$ and compare it with known conjugate gradient methods according to the following scheme:
  • Testing on smooth and non-smooth test functions with known characteristics of level surface elongation.
  • Testing on non-convex smooth and non-smooth test functions.
  • Testing on known smooth test functions.
We used the following methods:
  • AMMI—the distance-to-extremum relaxation method of minimization [10];
  • sub—Algorithm 3 with (46);
  • subm—Algorithm 3 with more precise one-dimensional descent and (46);
  • subg—Algorithm 3 with (47);
  • subgm—Algorithm 3 with (47) and exact one-dimensional descent;
  • sub0—Algorithm 3 with αk = 0;
  • sgrFR—the conjugate gradient method (Fletcher–Reeves method [3]) with exact one-dimensional descent;
  • sgr—method sgrFR with one-dimensional minimization procedure OM;
  • sgrPOL—the Polak–Ribiere–Polyak method [17];
  • sgrHS—the Hestenes–Stiefel method [15];
  • sgrDY—the Dai–Yuan method [18].
We used the following test groups. Each group has its own stopping criterion.
The first group of tests includes smooth and non-smooth functions with a maximum ratio of level-surface elongation along the coordinate axes equal to 100:
$$f_1(x) = \sum_{i=1}^{n} x_i^2 \left(1 + (i-1)\frac{100-1}{n-1}\right)^2, \quad x_{0,i} = 1, \quad x^*_i = 0, \quad i = 1, 2, \ldots, n, \quad \varepsilon = 10^{-8},$$
$$f_2(x) = \sum_{i=1}^{n} |x_i| \left(1 + (i-1)\frac{100-1}{n-1}\right), \quad x_{0,i} = 1, \quad x^*_i = 0, \quad i = 1, 2, \ldots, n, \quad \varepsilon = 10^{-4}.$$
The stopping criterion is
$$f(x_k) - f^* \le \varepsilon. \qquad (48)$$
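These two test functions are straightforward to code. A sketch assuming NumPy and the index convention $i = 1, \ldots, n$:

```python
import numpy as np

def elongation_weights(n, ratio=100.0):
    """Coefficients 1 + (i-1)(ratio-1)/(n-1), i = 1..n, ranging from 1 to ratio."""
    return 1.0 + np.arange(n) * (ratio - 1.0) / (n - 1)

def f1(x):
    """Smooth quadratic test function with level-surface elongation ratio 100."""
    return np.sum(x**2 * elongation_weights(len(x))**2)

def f2(x):
    """Non-smooth (piecewise linear) counterpart of f1."""
    return np.sum(np.abs(x) * elongation_weights(len(x)))
```

Both functions have their minimum value 0 at the origin, so criterion (48) reduces to $f(x_k) \le \varepsilon$.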
The second group of tests includes the Extended White and Holst function, which is not convex:
$$f_{GW}(x) = \sum_{i=1}^{n/2} \left[100\left(x_{2i} - x_{2i-1}^3\right)^2 + (1 - x_{2i-1})^2\right], \quad x_0 = (-1.2, 1, \ldots, -1.2, 1), \quad \varepsilon = 10^{-10},$$
and the non-smooth non-convex function derived from it:
$$f_{NW}(x) = \sum_{i=1}^{n/2} \left(10\left|x_{2i} - x_{2i-1}^3\right| + |1 - x_{2i-1}|\right), \quad x_0 = (-1.2, 1, \ldots, -1.2, 1), \quad \varepsilon = 10^{-4}.$$
The Raydan 1 function is shifted to obtain a new function with a zero minimum value:
$$f_{GR}(x) = \sum_{i=1}^{n} \frac{i}{10}\left(\exp(x_i) - x_i - 1\right), \quad x_0 = (2, 2, \ldots, 2), \quad \varepsilon = 10^{-10}.$$
We transform this function into a non-smooth one as follows:
$$f_{NR}(x) = \sum_{i=1}^{n} \frac{a_i}{10} \max\left\{\exp(x_i) - 1,\ -x_i\right\}, \quad x_0 = (1, 1, \ldots, 1), \quad \varepsilon = 10^{-4},$$
$$a_i = 1 + \frac{i-1}{n-1}(a_{\max} - 1), \quad a_{\max} = 100, \quad i = 1, 2, \ldots, n.$$
Here, the coefficients $a_i$ are bounded, in contrast to the unbounded coefficients $i$ of the original function. Criterion (48) is used as the stopping criterion for these functions.
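A sketch of $f_{NR}$, assuming the non-smooth term is read as $\max\{\exp(x_i) - 1,\ -x_i\}$ (which has a zero minimum at $x_i = 0$, matching the shift applied to Raydan 1) and NumPy vectors:

```python
import numpy as np

def f_nr(x, a_max=100.0):
    """Non-smooth Raydan-1-type test function with bounded weights a_i:
    a_1 = 1, a_n = a_max, interpolated linearly in between."""
    n = len(x)
    a = 1.0 + np.arange(n) / (n - 1) * (a_max - 1.0)
    return np.sum(a / 10.0 * np.maximum(np.exp(x) - 1.0, -x))
```

The minimum value is 0 at the origin, so criterion (48) again reduces to $f(x_k) \le \varepsilon$.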
The third group of tests is composed of functions from [76]. We chose functions that are difficult to minimize by gradient methods, as revealed by the study in [53]. The stopping criteria:
$$\|\nabla f(x_k)\| \le 10^{-6}, \qquad \frac{|f(x_{k+1}) - f(x_k)|}{1 + |f(x_k)|} \le 10^{-16}.$$
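These two criteria combine an absolute test on the gradient norm with a relative test on the function decrease; a minimal sketch with an illustrative function name:

```python
import numpy as np

def stop_third_group(grad_k, f_next, f_curr, tol_g=1e-6, tol_f=1e-16):
    """Stop when the gradient norm is small or when the relative decrease
    |f(x_{k+1}) - f(x_k)| / (1 + |f(x_k)|) becomes negligible."""
    small_grad = np.linalg.norm(grad_k) <= tol_g
    small_decrease = abs(f_next - f_curr) / (1.0 + abs(f_curr)) <= tol_f
    return small_grad or small_decrease
```

The $1 + |f(x_k)|$ denominator makes the second test behave as an absolute tolerance near a zero minimum value and as a relative one for large function values.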
Several experiments were carried out for each function. The number of iterations and the number of function and gradient calculations were counted.
Denote:
  • S1 is a sum of resulting scores for dimensions 100, 200,…, and 1000;
  • S2 is a sum of resulting scores for dimensions 100, 500, 1000, 2000, 3000, 5000, 7000, 8000, 10,000, and 15,000.
The results for the dimensions $T_i = 100{,}000 \times i$ are given separately for varying $i$. We use these notations for arbitrary functions.
For the functions from [76], the following notation is used: the Diagonal 9 function: (Diagonal9); the LIARWHD function (CUTE): (LIARWHD); the Quadratic QF2 function: (QF2); the DIXON3DQ function (CUTE): (DIXON3DQ); the TRIDIA function (CUTE): (TRIDIA); the Extended White and Holst function: (WHolst); and the Raydan 1 function: (Raydan1).
For each method, we report it, the number of iterations, and nfg, the number of function and gradient evaluations necessary to solve the problem with a given stopping criterion for a specific function.
Preliminarily, based on an experiment with some of the above functions, we study the dependence of Algorithm 3 (sub, subg, and sub0), using Formula (46) or (47), on the parameter $\varepsilon_p$ chosen from the set $\varepsilon_p \in \{0.5;\ 0.1;\ 10^{-3};\ 10^{-4};\ 10^{-8};\ 10^{-15}\}$. The results for the costs (nfg, the number of function and gradient evaluations) are given in Table 1.
When αk = 0, the methods spend significantly more function and gradient evaluations. When αk = 1, the method is not operational due to unplanned stops. Therefore, these variants are not considered in further testing. According to the results of Table 1, starting from εp = 10−3, the results stabilize and are almost always the best. In further studies, we used εp = 10−8, which represents a geometric middle of the effective interval for both rules (46) and (47). Given the equivalence of the sub and subg methods, we carried out subsequent studies with only one of them for a given objective function.
The results for the first group of tests are presented in Table 2. The cell shows the number of iterations (upper number) and the number of function and gradient evaluations (lower number).
For function f1, part of the calculations was also carried out for the sgr method, i.e., the sgrFR method with the one-dimensional OM procedure. Its results are two times worse than those of sgrFR, and on other functions it was sometimes unable to solve the problem. This comparison emphasizes the effectiveness of the choice of descent direction in the new method: in contrast to the CGM, a rapidly converging method is obtained even with inexact one-dimensional descent, which is important when solving non-smooth minimization problems. For smooth problems, many efficient variants of the CGM exist.
Here, we should note the quality of the descent direction of the new method. With inexact one-dimensional descent, the subg cost is less than that of the sgrFR method. The method proposed in the paper is stable with both minimization procedures, and its results are almost equivalent to the results of the sgrFR method, which is a finite method for minimizing quadratic functions. In this case, the sgrFR method acts as a reference method. Since the results for other CGMs on this function are completely identical, we do not present them here.
On the non-smooth function, the AMMI method [10] acts as a reference, since it requires only one calculation of the function and gradient per iteration. As follows from the results of Table 2, the numbers of iterations on large-dimensional functions differ insignificantly. The growth in the cost of function and gradient evaluations when passing from the smooth quadratic function to the non-smooth one with the same level-surface elongation at n = 500,000 is 119,063/1343 ≈ 88.7 for the subg method and 41,528/753 ≈ 55.2 for the AMMI method. Considering that the conditions here are ideal for the AMMI method, since the minimum value of the function and its degree of homogeneity are known, and that the calculations were carried out at high dimensionalities, such a result for the subg method can be considered excellent.
The minimization results for the second group of tests are given in Table 3.
On the smooth variants of the functions, the subgm and subg methods are commensurate with sgrFR in terms of the number of function and gradient evaluations. Therefore, taking into account these and the previous tests, these methods can be used alongside the CGM when minimizing smooth functions.
The subg method also handles non-smooth variants of functions (function fNW is non-smooth and non-convex).
The minimization results for the third group of smooth test functions are given in Table 4. A dash means that no calculations were made. The sign NaN marks the problems that could not be solved by this method.
Based on the results for this group of tests, we can conclude that subm and sub methods are applicable for minimizing smooth large-scale functions.
In general, the following conclusions can be drawn from the results of the experiment:
  • The parameters of the method that ensure its stable operation were selected.
  • On tests with known parameters of level-surface elongation, the behavior of the method was studied and compared with other methods, confirming its effectiveness.
  • The method was studied on non-smooth, including non-convex, functions.
  • On commonly accepted smooth test functions, the method was compared with variants of the CGM, which enables us to conclude that it is applicable, along with the CGM, for minimizing smooth functions.

8. Conclusions

In our work, we proposed a family of iterative methods for solving systems of inequalities, which are generalizations of the previously proposed algorithms. The developed methods were substantiated theoretically and the estimates of their convergence rate were obtained. On this basis, a family of relaxation subgradient minimization algorithms was formulated and justified, which is applicable to solving non-convex problems as well.
According to the properties of convergence on quadratic functions of high dimension, with large spreads of eigenvalues, the developed algorithm is equivalent to the conjugate gradient method. The new method enables us to solve non-smooth non-convex large-scale minimization problems with a high degree of elongation of level surfaces.

Author Contributions

Conceptualization, V.K.; methodology, V.M., E.T. and P.S.; software, V.K.; validation, L.K., E.T. and P.S.; formal analysis, P.S.; investigation, V.M.; resources, V.M.; data curation, P.S.; writing—original draft preparation, V.K.; writing—review and editing, E.T., P.S., A.P. and L.K.; visualization, V.K. and E.T.; supervision, V.K. and A.P.; project administration, L.K.; funding acquisition, A.P. and L.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science and Higher Education of the Russian Federation (project no.: FEFE-2023-0004).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Shor, N. Minimization Methods for Nondifferentiable Functions; Springer: Berlin, Germany, 1985. [Google Scholar]
  2. Polyak, B. A general method for solving extremum problems. Sov. Math. Dokl. 1967, 8, 593–597. [Google Scholar]
  3. Polyak, B.T. Introduction to Optimization; Optimization Software: New York, NY, USA, 1987. [Google Scholar]
  4. Golshtein, E.; Nemirovsky, A.; Nesterov, Y. Level method, its generalizations and applications. Econ. Math. Methods 1995, 31, 164–180. [Google Scholar]
  5. Nesterov, Y. Universal gradient methods for convex optimization problems. Math. Program. Ser. A 2015, 152, 381–404. [Google Scholar] [CrossRef] [Green Version]
  6. Gasnikov, A.; Nesterov, Y. Universal method for stochastic composite optimization problems. Comput. Math. Math. Phys. 2018, 58, 48–64. [Google Scholar] [CrossRef]
  7. Nemirovsky, A.; Yudin, D. Problem Complexity and Method Efficiency in Optimization; Wiley: Chichester, UK, 1983. [Google Scholar]
  8. Shor, N.Z. Application of the gradient descent method for solving network transportation problems. In Materials of the Seminar of Theoretical and Applied Issues of Cybernetics and Operational Research; USSR: Kyiv, Ukraine, 1962; pp. 9–17. (In Russian) [Google Scholar]
  9. Polyak, B. Optimization of non-smooth composed functions. USSR Comput. Math. Math. Phys. 1969, 9, 507–521. [Google Scholar]
  10. Krutikov, V.; Samoilenko, N.; Meshechkin, V. On the properties of the method of minimization for convex functions with relaxation on the distance to extremum. Autom. Remote Control 2019, 80, 102–111. [Google Scholar] [CrossRef]
  11. Wolfe, P. Note on a method of conjugate subgradients for minimizing nondifferentiable functions. Math. Program. 1974, 7, 380–383. [Google Scholar] [CrossRef]
  12. Lemarechal, C. An extension of Davidon methods to non-differentiable problems. Math. Program. Study 1975, 3, 95–109. [Google Scholar]
  13. Demyanov, V. Nonsmooth Optimization. In Nonlinear Optimization; Lecture Notes in Mathematics; Di Pillo, G., Schoen, F., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; Volume 1989, pp. 55–163. [Google Scholar]
  14. Himmelblau, D.M. Applied Nonlinear Programming; McGraw-Hill: Dallas, TX, USA, 1972. [Google Scholar]
  15. Hestenes, M.R.; Stiefel, E. Methods of Conjugate Gradients for Solving Linear Systems. J. Res. Natl. Bur. Stand. 1952, 49, 409. [Google Scholar] [CrossRef]
  16. Fletcher, R.; Reeves, C.M. Function minimization by conjugate gradients. Comput. J. 1964, 7, 149–154. [Google Scholar] [CrossRef] [Green Version]
  17. Polyak, B.T. The conjugate gradient method in extreme problems. USSR Comput. Math. Math. Phys. 1969, 9, 94–112. [Google Scholar] [CrossRef]
  18. Dai, Y.-H.; Yuan, Y. An efficient hybrid conjugate gradient method for unconstrained optimization. Ann. Oper. Res. 2001, 103, 33–34. [Google Scholar] [CrossRef]
  19. Powell, M.J.D. Restart Procedures of the Conjugate Gradient Method. Math. Program. 1977, 12, 241–254. [Google Scholar] [CrossRef]
  20. Miele, A.; Cantrell, J.W. Study on a memory gradient method for the minimization of functions. J. Optim. Theory Appl. 1969, 3, 459–470. [Google Scholar] [CrossRef]
  21. Cragg, E.E.; Levy, A.V. Study on a supermemory gradient method for the minimization of functions. J. Optim. Theory Appl. 1969, 4, 191–205. [Google Scholar] [CrossRef]
  22. Hanafy, A.A.R. Multi-search optimization techniques. Comput. Methods Appl. Mech. Eng. 1976, 8, 193–200. [Google Scholar] [CrossRef]
23. Narushima, Y.; Yabe, H. Global convergence of a memory gradient method for unconstrained optimization. Comput. Optim. Appl. 2006, 35, 325–346.
24. Narushima, Y. A nonmonotone memory gradient method for unconstrained optimization. J. Oper. Res. Soc. Jpn. 2007, 50, 31–45.
25. Gui, S.; Wang, H. A Non-monotone Memory Gradient Method for Unconstrained Optimization. In Proceedings of the 2012 Fifth International Joint Conference on Computational Sciences and Optimization, Harbin, China, 23–26 June 2012; pp. 385–389.
26. Rong, Z.; Su, K. A New Nonmonotone Memory Gradient Method for Unconstrained Optimization. Math. Aeterna 2015, 5, 635–647.
27. Jiang, X.; Jian, J. Improved Fletcher–Reeves and Dai–Yuan conjugate gradient methods with the strong Wolfe line search. J. Comput. Appl. Math. 2019, 348, 525–534.
28. Xue, W.; Wan, P.; Li, Q.; Zhong, P.; Yu, G.; Tao, T. An online conjugate gradient algorithm for large-scale data analysis in machine learning. AIMS Math. 2021, 6, 1515–1537.
29. Johnson, R.; Zhang, T. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems; Burges, C.J., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q., Eds.; The MIT Press: Cambridge, MA, USA, 2013; Volume 26.
30. Dai, Y.-H.; Liao, L.-Z. New conjugacy conditions and related nonlinear conjugate gradient methods. Appl. Math. Optim. 2001, 43, 87–101.
31. Cheng, Y.; Mou, Q.; Pan, X.; Yao, S. A sufficient descent conjugate gradient method and its global convergence. Optim. Methods Softw. 2016, 31, 577–590.
32. Lu, J.; Li, Y.; Pham, H. A Modified Dai–Liao Conjugate Gradient Method with a New Parameter for Solving Image Restoration Problems. Math. Probl. Eng. 2020, 2020, 6279543.
33. Zheng, Y.; Zheng, B. Two new Dai–Liao-type conjugate gradient methods for unconstrained optimization problems. J. Optim. Theory Appl. 2017, 175, 502–509.
34. Ivanov, B.; Milovanović, G.V.; Stanimirović, P.S.; Awwal, A.M.; Kazakovtsev, L.A.; Krutikov, V.N. A Modified Dai–Liao Conjugate Gradient Method Based on a Scalar Matrix Approximation of Hessian and Its Application. J. Math. 2023, 2023, 9945581.
35. Gao, T.; Gong, X.; Zhang, K.; Lin, F.; Wang, J.; Huang, T.; Zurada, J.M. A recalling-enhanced recurrent neural network: Conjugate gradient learning algorithm and its convergence analysis. Inf. Sci. 2020, 519, 273–288.
36. Abubakar, A.B.; Kumam, P.; Mohammad, H.; Awwal, A.M.; Sitthithakerngkiet, K. A Modified Fletcher–Reeves Conjugate Gradient Method for Monotone Nonlinear Equations with Some Applications. Mathematics 2019, 7, 745.
37. Wang, B.; Ye, Q. Stochastic Gradient Descent with Nonlinear Conjugate Gradient-Style Adaptive Momentum. 2020. Available online: https://arxiv.org/pdf/2012.02188.pdf (accessed on 20 February 2023).
38. Møller, M.F. A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw. 1993, 6, 525–533.
39. Sato, H. Riemannian Conjugate Gradient Methods: General Framework and Specific Algorithms with Convergence Analyses. 2021. Available online: https://arxiv.org/abs/2112.02572 (accessed on 20 February 2023).
40. Yang, Z. Adaptive stochastic conjugate gradient for machine learning. Expert Syst. Appl. 2022, 206, 117719.
41. Jin, X.B.; Zhang, X.Y.; Huang, K.; Geng, G.G. Stochastic conjugate gradient algorithm with variance reduction. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 1360–1369.
42. Jiang, H.; Wilford, P. A stochastic conjugate gradient method for the approximation of functions. J. Comput. Appl. Math. 2012, 236, 2529–2544.
43. Ou, Y.; Zhou, X. A nonmonotone scaled conjugate gradient algorithm for large-scale unconstrained optimization. Int. J. Comput. Math. 2018, 95, 2212–2228.
44. Golub, G.H.; Ye, Q. Inexact Preconditioned Conjugate Gradient Method with Inner-Outer Iteration. SIAM J. Sci. Comput. 1999, 21, 1305–1320.
45. Adya, S.; Palakkode, V.; Tuzel, O. Nonlinear Conjugate Gradients for Scaling Synchronous Distributed DNN Training. 2018. Available online: https://arxiv.org/abs/1812.02886 (accessed on 20 February 2023).
46. Liu, Z.; Dai, Y.-H.; Liu, H. A Limited Memory Subspace Minimization Conjugate Gradient Algorithm for Unconstrained Optimization. 2022. Available online: https://optimization-online.org/2022/01/8772/ (accessed on 20 February 2023).
47. Li, X.; Shi, J.; Dong, X.; Yu, J. A new conjugate gradient method based on Quasi-Newton equation for unconstrained optimization. J. Comput. Appl. Math. 2019, 350, 372–379.
48. Amini, K.; Faramarzi, P. Global convergence of a modified spectral three-term CG algorithm for nonconvex unconstrained optimization problems. J. Comput. Appl. Math. 2023, 417, 114630.
49. Burago, N.G.; Nikitin, I.S. Matrix-Free Conjugate Gradient Implementation of Implicit Schemes. Comput. Math. Math. Phys. 2018, 58, 1247–1258.
50. Sulaiman, I.M.; Malik, M.; Awwal, A.M.; Kumam, P.; Mamat, M.; Al-Ahmad, S. On three-term conjugate gradient method for optimization problems with applications on COVID-19 model and robotic motion control. Adv. Cont. Discr. Mod. 2022, 2022, 1.
51. Yu, X.; Nikitin, V.; Ching, D.J.; Aslan, S.; Gürsoy, D.; Biçer, T. Scalable and accurate multi-GPU-based image reconstruction of large-scale ptychography data. Sci. Rep. 2022, 12, 5334.
52. Washio, T.; Cui, X.; Kanada, R.; Okada, J.; Sugiura, S.; Okuno, Y.; Takada, S.; Hisada, T. Using incomplete Cholesky factorization to increase the time step in molecular dynamics simulations. J. Comput. Appl. Math. 2022, 415, 114519.
53. Stanimirović, P.S.; Ivanov, B.; Ma, H.; Mosic, D. A survey of gradient methods for solving nonlinear optimization problems. Electron. Res. Arch. 2020, 28, 1573–1624.
54. Khan, W.A. Numerical simulation of Chun-Hui He's iteration method with applications in engineering. Int. J. Numer. Methods Heat Fluid Flow 2022, 32, 944–955.
55. Khan, W.A.; Arif, M.; Mohammed, M.; Farooq, U.; Farooq, F.B.; Elbashir, M.K.; Rahman, J.U.; AlHussain, Z.A. Numerical and Theoretical Investigation to Estimate Darcy Friction Factor in Water Network Problem Based on Modified Chun-Hui He's Algorithm and Applications. Math. Probl. Eng. 2022, 2022, 8116282.
56. He, C.H. An introduction to an ancient Chinese algorithm and its modification. Int. J. Numer. Methods Heat Fluid Flow 2016, 26, 2486–2491.
57. Gong, C.M.; Peng, J.; Wang, J. Tropical algebra for noise removal and optimal control. J. Low Freq. Noise Vib. Act. Control 2023, 42, 317–324.
58. Kibardin, V.M. Decomposition into functions in the minimization problem. Automat. Remote Control 1980, 40, 1311–1323.
59. Solodov, M.V.; Zavriev, S.K. Error stability properties of generalized gradient-type algorithms. J. Optim. Theory Appl. 1998, 98, 663–680.
60. Nedic, A.; Bertsekas, D.P. Incremental subgradient methods for nondifferentiable optimization. SIAM J. Optim. 1999, 12, 109–138.
61. Nedic, A.; Bertsekas, D.P. Convergence rate of incremental subgradient algorithms. In Stochastic Optimization: Algorithms and Applications; Uryasev, S., Pardalos, P.M., Eds.; Springer: Boston, MA, USA, 2001; Volume 54.
62. Ben-Tal, A.; Margalit, T.; Nemirovski, A. The ordered subsets mirror descent optimization method and its use for the positron emission tomography reconstruction. In Proceedings of the 2000 Haifa Workshop on Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications; Butnariu, D., Censor, Y., Reich, S., Eds.; Studies in Computational Mathematics; Elsevier: Amsterdam, The Netherlands, 2000.
63. Duchi, J.; Hazan, E.; Singer, Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res. 2011, 12, 2121–2159.
64. Nimana, N.; Farajzadeh, A.P.; Petrot, N. Adaptive subgradient method for the split quasi-convex feasibility problems. Optimization 2016, 65, 1885–1898.
65. Belyaeva, I.; Long, Q.; Adali, T. Inexact Proximal Conjugate Subgradient Algorithm for fMRI Data Completion. In Proceedings of the 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 18–21 January 2021; pp. 1025–1029.
66. Li, Q.; Shen, L.; Zhang, N.; Zhou, J. A proximal algorithm with backtracked extrapolation for a class of structured fractional programming. Appl. Comput. Harmon. Anal. 2022, 56, 98–122.
67. Chiou, S.-W. A subgradient optimization model for continuous road network design problem. Appl. Math. Model. 2009, 33, 1386–1396.
68. Mirone, A.; Paleo, P. A conjugate subgradient algorithm with adaptive preconditioning for the least absolute shrinkage and selection operator minimization. Comput. Math. Math. Phys. 2017, 57, 739–748.
69. Konnov, I. A Non-monotone Conjugate Subgradient Type Method for Minimization of Convex Functions. J. Optim. Theory Appl. 2020, 184, 534–546.
70. Krutikov, V.; Gutova, S.; Tovbis, E.; Kazakovtsev, L.; Semenkin, E. Relaxation Subgradient Algorithms with Machine Learning Procedures. Mathematics 2022, 10, 3959.
71. Krutikov, V.N.; Stanimirović, P.S.; Indenko, O.N.; Tovbis, E.M.; Kazakovtsev, L.A. Optimization of Subgradient Method Parameters Based on Rank-Two Correction of Metric Matrices. J. Appl. Ind. Math. 2022, 16, 427–439.
72. Tsypkin, Y.Z. Foundations of the Theory of Learning Systems; Academic Press: New York, NY, USA, 1973.
73. Krutikov, V.N.; Petrova, T. A new relaxation method for nondifferentiable minimization. Mat. Zap. Yakutsk. Gos. Univ. 2001, 8, 50–60. (In Russian)
74. Krutikov, V.N.; Vershinin, Y.N. The subgradient multistep minimization method for nonsmooth high-dimensional problems. Vestnik Tomskogo Gosudarstvennogo Universiteta. Matematika i Mekhanika 2014, 3, 5–19. (In Russian)
75. Kaczmarz, S. Approximate solution of systems of linear equations. Int. J. Control 1993, 57, 1269–1271.
76. Andrei, N. An Unconstrained Optimization Test Functions Collection. Available online: http://www.ici.ro/camo/journal/vol10/v10a10.pdf (accessed on 20 February 2023).
Figure 1. The set G belongs to the hyperplane.

Figure 2. Projections of approximations sk+1 in the plane of vectors gk and s*.

Figure 3. Projections of approximations in the plane of vectors gk and gk−1.

Figure 4. The set G and its characteristics.
Table 1. Results of S1 calculations for Algorithm 3 with different values of the εp parameter.

| Function  | Method | αk = 0  | εp = 0.5 | εp = 10^−1 | εp = 10^−3 | εp = 10^−4 | εp = 10^−8 | εp = 10^−15 |
|-----------|--------|---------|----------|------------|------------|------------|------------|-------------|
| Diagonal9 | sub    | 25,194  | 15,666   | 12,396     | 11,627     | 11,627     | 11,627     | 11,627      |
| Diagonal9 | subg   | 25,194  | 24,494   | 12,678     | 11,627     | 11,627     | 11,627     | 11,627      |
| f1        | sub    | 12,451  | 9347     | 9298       | 9298       | 9298       | 9298       | 9298        |
| f1        | subg   | 12,451  | 9316     | 9298       | 9298       | 9298       | 9298       | 9298        |
| fNR       | sub    | 64,051  | 17,740   | 17,749     | 17,749     | 17,749     | 17,749     | 17,749      |
| fNR       | subg   | 64,051  | 17,803   | 17,749     | 17,749     | 17,749     | 17,749     | 17,749      |
| f2        | sub    | 291,378 | 103,681  | 103,600    | 103,600    | 103,600    | 103,600    | 103,600     |
| f2        | subg   | 291,378 | 103,518  | 103,618    | 103,618    | 103,618    | 103,618    | 103,618     |
Table 2. Results for the first group of tests (in each cell, the first number is the number of iterations and the second is the number of function and gradient evaluations; for the AMMI method, only iteration counts are given).

| Function | Method | S1                | S2                | T1              | T2               | T3               | T4               | T5               |
|----------|--------|-------------------|-------------------|-----------------|------------------|------------------|------------------|------------------|
| f1       | subg   | 5512 / 9308       | 6141 / 10,442     | 728 / 1189      | 747 / 1229       | 754 / 1268       | 760 / 1312       | 766 / 1343       |
| f1       | subgm  | 4679 / 9426       | 5759 / 11,603     | 713 / 1438      | 730 / 1472       | 740 / 1492       | 748 / 1508       | 753 / 1519       |
| f1       | sgrPOL | 5319 / 10,706     | 5985 / 12,055     | 713 / 1438      | 730 / 1472       | 740 / 1492       | 748 / 1508       | 753 / 1519       |
| f1       | sgrFR  | 4673 / 9414       | 5756 / 11,597     | 713 / 1438      | 730 / 1472       | 740 / 1492       | 748 / 1508       | 753 / 1519       |
| f1       | sgr    | -                 | -                 | 1515 / 2373     | 1580 / 2484      | 1597 / 2525      | 1626 / 2591      | 1647 / 2624      |
| f1       | AMMI   | 4980              | 6347              | 713             | 730              | 740              | 748              | 753              |
| f2       | subg   | 144,036 / 288,123 | 153,651 / 307,413 | 20,148 / 40,345 | 58,196 / 116,463 | 58,978 / 118,043 | 59,758 / 119,604 | 59,481 / 119,063 |
| f2       | AMMI   | 42,382            | 94,278            | 24,563          | 35,788           | 31,395           | 33,517           | 41,528           |
Table 3. Results for the second group of tests (in each cell, the first number is the number of iterations and the second is the number of function and gradient evaluations).

| Function | Method | S1                | S2                | T1              | T2              | T3              | T4              | T5              |
|----------|--------|-------------------|-------------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| fGW      | subg   | 813 / 2397        | 889 / 2669        | 102 / 306       | 112 / 348       | 177 / 567       | 224 / 224       | 134 / 421       |
| fGW      | subgm  | 223 / 568         | 199 / 533         | 20 / 51         | 26 / 71         | 27 / 73         | 22 / 60         | 19 / 49         |
| fGW      | sgrFR  | 772 / 1643        | 794 / 1705        | 55 / 125        | 53 / 121        | 41 / 97         | 48 / 109        | 42 / 97         |
| fNW      | subg   | 226,786 / 454,889 | 247,632 / 497,799 | 33,801 / 68,149 | 29,192 / 58,640 | 33,209 / 66,776 | 33,784 / 67,926 | 34,230 / 68,818 |
| fGR      | subg   | 1365 / 2313       | 3927 / 6594       | 1823 / 3168     | 2546 / 4431     | 3318 / 5826     | 3610 / 6333     | 3883 / 6808     |
| fGR      | subgm  | 1664 / 3394       | 4898 / 9879       | 2491 / 4994     | 3232 / 6674     | 4449 / 8909     | 4356 / 8725     | 5089 / 10,191   |
| fGR      | sgrFR  | 1372 / 2803       | 4432 / 8938       | 2813 / 5637     | 3994 / 7999     | 4086 / 8003     | 6059 / 12,131   | 4422 / 8855     |
| fNR      | subg   | 31,592 / 63,235   | 34,625 / 69,341   | 38,959 / 77,939 | 39,706 / 79,436 | 40,009 / 80,047 | 40,203 / 80,439 | 40,591 / 81,213 |
Table 4. Results for the third group of tests (in each cell, the first number is the number of iterations and the second is the number of function and gradient evaluations).

| Function      | Method | S1              | S2                | T1                | T3              | T5              |
|---------------|--------|-----------------|-------------------|-------------------|-----------------|-----------------|
| Diagonal9     | sgrFR  | 23,431 / 46,941 | 83,952 / 167,999  | 9322 / 18,660     | 31,151 / 62,317 | 48,236 / 97,511 |
| Diagonal9     | sgrPOL | 9714 / 19,518   | 10,743 / 21,590   | 2912 / 5841       | 5805 / 11,626   | 6642 / 13,304   |
| Diagonal9     | sgrHS  | 5554 / 11,190   | 10,806 / 21,715   | 3221 / 6459       | 6396 / 12,812   | 6640 / 13,300   |
| Diagonal9     | sgrDY  | 9343 / 18,763   | 41,835 / 83,766   | 9409 / 18,834     | 20,408 / 40,830 | NaN             |
| Diagonal9     | subm   | 4872 / 9817     | 9768 / 19,629     | 3931 / 7889       | 7262 / 14,553   | 10,083 / 20,197 |
| Diagonal9     | sub    | 5912 / 11,627   | 10,345 / 19,324   | 4318 / 7668       | 9214 / 16,589   | 10,866 / 19,781 |
| LIARWHD       | sgrFR  | 1244 / 2569     | 947 / 1996        | 72,001 / 144,015  | 7412 / 14,836   | 146 / 306       |
| LIARWHD       | sgrPOL | 207 / 485       | 247 / 587         | 52 / 121          | 31 / 74         | 80 / 173        |
| LIARWHD       | sgrHS  | 166 / 404       | 183 / 462         | 32 / 76           | 21 / 53         | 17 / 47         |
| LIARWHD       | sgrDY  | 1275 / 2629     | 1069 / 2239       | 210 / 435         | 176 / 364       | 182 / 378       |
| LIARWHD       | subm   | 228 / 544       | 269 / 640         | 24 / 64           | 30 / 75         | 37 / 87         |
| LIARWHD       | sub    | 644 / 1325      | 719 / 1498        | -                 | -               | -               |
| Quadratic QF2 | sgrFR  | 14,320 / 28,677 | 28,988 / 58,022   | 11,233 / 22,473   | 13,222 / 26,452 | 29,820 / 59,648 |
| Quadratic QF2 | sgrPOL | 2104 / 4246     | 6546 / 13,139     | 3196 / 6399       | 7056 / 14,120   | 8392 / 16,792   |
| Quadratic QF2 | sgrHS  | 2161 / 4359     | 6574 / 13,195     | 5706 / 11,420     | 6096 / 12,200   | 11,337 / 22,683 |
| Quadratic QF2 | sgrDY  | 5231 / 10,494   | 48,603 / 97,248   | 72,001 / 144,009  | 31,542 / 63,092 | 36,438 / 72,884 |
| Quadratic QF2 | subm   | 4161 / 8360     | 13,920 / 27,890   | 4687 / 9382       | 15,664 / 31,337 | 21,891 / 43,792 |
| Quadratic QF2 | sub    | 2453 / 4227     | 6656 / 11,227     | 3156 / 5304       | 4977 / 8247     | 7496 / 12,540   |
| DIXON3DQ      | sgrFR  | 2750 / 5538     | 25,800 / 51,645   | 50,001 / 100,008  | -               | -               |
| DIXON3DQ      | sgrPOL | 2750 / 5538     | 25,800 / 51,645   | 50,001 / 100,008  | -               | -               |
| DIXON3DQ      | sgrHS  | 2750 / 5538     | 25,800 / 51,645   | 50,001 / 100,008  | -               | -               |
| DIXON3DQ      | sgrDY  | 2750 / 5538     | 25,800 / 51,645   | 50,001 / 100,008  | -               | -               |
| DIXON3DQ      | subm   | 2750 / 5538     | 25,800 / 51,645   | 50,001 / 100,008  | -               | -               |
| DIXON3DQ      | sub    | 13,179 / 22,335 | 249,981 / 417,396 | NaN               | -               | -               |
| TRIDIA        | sgrFR  | 2390 / 4815     | 7190 / 14,424     | 3746 / 7499       | 6539 / 13,086   | 8470 / 16,949   |
| TRIDIA        | sgrPOL | 2392 / 4819     | 7191 / 14,426     | 3746 / 7499       | 6539 / 13,086   | 8470 / 16,949   |
| TRIDIA        | sgrHS  | 2389 / 4813     | 7189 / 14,422     | 3746 / 7499       | 6539 / 13,086   | 8469 / 16,947   |
| TRIDIA        | sgrDY  | 2395 / 4825     | 7192 / 14,428     | 3747 / 7501       | 6539 / 13,086   | 8470 / 16,949   |
| TRIDIA        | subm   | 2397 / 4829     | 7197 / 14,438     | 3747 / 7501       | 6540 / 13,088   | 8470 / 16,949   |
| TRIDIA        | sub    | 4413 / 7405     | 16,905 / 28,169   | 11,922 / 19,700   | 23,467 / 38,984 | 32,270 / 53,801 |
| WHolst        | sgrFR  | 1429 / 2957     | 1624 / 3365       | 65 / 145          | 61 / 137        | 49 / 111        |
| WHolst        | sgrPOL | 220 / 548       | 240 / 587         | 23 / 59           | 22 / 56         | 26 / 62         |
| WHolst        | sgrHS  | 190 / 483       | 192 / 485         | 25 / 64           | 22 / 58         | 16 / 43         |
| WHolst        | sgrDY  | 676 / 1461      | 594 / 1310        | 52 / 120          | 387 / 789       | 63 / 138        |
| WHolst        | subm   | 268 / 659       | 235 / 605         | 27 / 65           | 34 / 87         | 23 / 57         |
| WHolst        | sub    | 254 / 616       | 270 / 664         | 108 / 226         | 179 / 419       | 289 / 661       |
| Raydan1       | sgrFR  | 1708 / 3475     | 5863 / 11,800     | 4013 / 8037       | 10,548 / 21,109 | 9273 / 18,558   |
| Raydan1       | sgrPOL | 1691 / 3442     | 4906 / 9887       | 2533 / 5077       | 4489 / 8989     | 5807 / 11,626   |
| Raydan1       | sgrHS  | 1731 / 3524     | 4871 / 9818       | 2593 / 5198       | 4501 / 9012     | 8041 / 16,097   |
| Raydan1       | sgrDY  | 2371 / 4803     | 6321 / 12,717     | 6104 / 12,219     | NaN             | NaN             |
| Raydan1       | subm   | 2100 / 4266     | 6475 / 13,033     | 3541 / 7094       | 6748 / 13,507   | 8101 / 16,215   |
| Raydan1       | sub    | 1722 / 3032     | 5756 / 9876       | 3153 / 5432       | 6175 / 10,918   | 9552 / 17,070   |