Article

A Method for Transforming Non-Convex Optimization Problem to Distributed Form

by Oleg O. Khamisov 1, Oleg V. Khamisov 1,*, Todor D. Ganchev 2,* and Eugene S. Semenkin 3,*

1 Department of Applied Mathematics, Melentiev Energy Systems Institute, 664033 Irkutsk, Russia
2 Department of Computer Science and Engineering, Technical University of Varna, 9010 Varna, Bulgaria
3 Scientific and Educational Center “Artificial Intelligence Technologies”, Bauman Moscow State Technical University, 105005 Moscow, Russia
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(17), 2796; https://doi.org/10.3390/math12172796 (registering DOI)
Submission received: 29 June 2024 / Revised: 23 August 2024 / Accepted: 29 August 2024 / Published: 9 September 2024

Abstract

We propose a novel distributed method for non-convex optimization problems with coupling equality and inequality constraints. The method transforms the optimization problem into a specific form that allows modified gradient descent and Newton’s methods to be implemented in a distributed way. Assuming that communications are significantly less time-consuming than oracle calls, we demonstrate that (i) the convergence rate of the proposed distributed method is equivalent to that of Newton’s method with respect to oracle calls, and (ii) when oracle calls are more expensive than communication between agents, the transition from a centralized to a distributed paradigm does not significantly affect computational time. The proposed method is applicable when the objective function is twice differentiable and the constraints are differentiable, which holds for a wide range of machine learning methods and optimization setups.

1. Introduction

In modern society, digital technologies play an essential role in organizing our work and daily routine. Ubiquitous computing and digital technologies enable us to solve a wide range of complex problems in such important fields as ecology [1,2] and medicine [3,4].
The complexity of creating decision support systems in a digital environment requires advanced technologies for designing and optimizing intelligent information processing systems. For example, within a holistic approach to integrating computational intelligence systems and human expert knowledge [5], machine learning models can be designed automatically with self-tuning adaptive stochastic optimization algorithms [6,7]. In this case, the processed data can remain on the problem owner’s servers, which ensures trust and keeps us within a federated approach to learning, and the resulting models are interpretable and explainable [8]. However, the very large-scale optimization problems arising under such conditions require special methods for decomposing computations [9]. At the same time, in many cases, optimization problems of this kind have properties that allow rigorous mathematical methods to be used in their solution, which makes hybrid approaches effective [10]. In this regard, along with the improvement of adaptive methods of computational intelligence, the evolution of traditional optimization methods is of great importance. Such advancements should adapt these methods to problems of extremely high dimensionality and to federated learning through decentralization, while retaining properties such as guaranteed fast convergence to the optimum, which is fundamentally important in the tasks under consideration.
Specifically, in the following, we propose a novel decentralized optimization method for non-convex optimization problems with a separable objective function and coupling equality and inequality constraints. Under standard assumptions for distributed optimization [11,12], the problem has to be solved by a set of agents communicating over a connected graph. These agents communicate synchronously and can transmit real-valued numbers to adjacent agents; communication delays and packet losses are ignored. In addition to these standard conditions, agents cannot share their decision variables and objective functions, as these are considered private information.
Decentralized optimization has already proved to be an essential instrument in a wide variety of applications, namely optimal transport [13], including the coordination of mobile autonomous agents [14,15] and railway traffic [16,17]; power systems control [18,19] with demand response [20]; and data analysis in sensor networks [21]. Finally, decentralized optimization is gaining popularity in federated learning [22] and support vector machines [23].

1.1. Related Work

While decentralized optimization has many applications of practical relevance, most of the corresponding results are dedicated to convex or strictly convex optimization. A comprehensive survey covering these areas can be found in [24]. The majority of these methods can be separated into primal [13,25], dual [26] or ADMM-based approaches [27,28]. Additionally, there exists a set of works utilizing the primal–dual approach [29,30].
The literature dedicated to non-convex or non-linear constraints is significantly more scarce. One of the proposed approaches is the application of SQP with the inner ADMM method [31]. However, in this work, coupling constraints are linear. Lagrangian methods for polynomial objective and equality constraints are presented in [32], and non-linear coupling constraints are considered. However, each constraint is associated with one of the agents and coupling is present only with the variables of adjacent agents.
Finally, there exist several works that consider convex optimization problems with separable objective functions such that decision variables and the corresponding summands of the objective function are considered private for each agent and cannot be exchanged with other participants [19,33].

1.2. Contribution

Here, we propose a novel distributed optimization algorithm for non-convex optimization problems with equality and inequality constraints. It is assumed that the objective function is twice differentiable and the constraints are differentiable. In addition, communications are assumed to be significantly less time-consuming than oracle calls. Under these assumptions, we show that
  • The proposed method can be applied to any optimization problem with a non-convex separable objective function and coupling constraints.
  • Its convergence rate is equivalent to the convergence of Newton’s method with respect to oracle calls.
  • Decision variables, cost and constraint functions are not exchanged between agents.
The theoretical results are supported by numerical experiments.

1.3. Paper Organization

The remainder of this article is organized as follows. Section 2 introduces the problem statement. Section 3 is dedicated to the reformulation of optimization problems with equality constraints. Section 4 outlines the distributed gradient descent algorithm. In Section 5, a problem with equality and inequality constraints and its equivalent formulation are presented. Section 6 outlines the distributed Newton method. Finally, Section 7 presents the Arrow–Hurwicz method and a numerical example.

1.4. Notations

Let $\mathbf{1}_K$ denote the vector of ones of size $K$; $I_q$ is the $q \times q$ identity matrix. The operator $\operatorname{vec}$ is the vectorization operator; for a matrix $A$, $\ker A$ is its kernel. For two matrices $A$ and $B$, the Kronecker product is denoted by $A \otimes B$, and $\operatorname{diag}(A, B)$ denotes the block matrix of the corresponding size with $A$ and $B$ as the blocks on the main diagonal. For a twice differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, $\nabla f$ and $\nabla^2 f$ are the gradient and the Hessian matrix, respectively. For a vector function $g: \mathbb{R}^n \to \mathbb{R}^m$, its Jacobian matrix is denoted by $J_g$. For a vertex (agent) $i$ in a communication graph, the set of adjacent vertices is denoted by $\operatorname{Adj}(i)$.

2. Problem Statement

Let us consider a non-convex optimization problem with a separable objective function and coupling constraints that must be solved by a connected multi-agent network with $N$ vertices (agents). The optimization problem has the following form:
$$\min_{x} F(x) = \sum_{i=1}^{N} f^i(x^i), \tag{1a}$$
subject to
$$G(x) = \sum_{i=1}^{N} g^i(x^i) = 0, \tag{1b}$$
$$H(x) = \sum_{i=1}^{N} h^i(x^i) \le 0. \tag{1c}$$
Here, the vector of decision variables $x \in \mathbb{R}^n$ is separated into subvectors of local variables $x^i \in \mathbb{R}^{n_i}$, $i \in \{1, \dots, N\}$, $n_1 + n_2 + \dots + n_N = n$. All functions $f^i: \mathbb{R}^{n_i} \to \mathbb{R}$, $g^i: \mathbb{R}^{n_i} \to \mathbb{R}^{\tilde m}$ and $h^i: \mathbb{R}^{n_i} \to \mathbb{R}^{\hat m}$, $i \in \{1, \dots, N\}$, are smooth.
Here, we postulate that problem (1) has to be solved in a distributed way, which means the following. Each subvector $x^i$ belongs to agent $i$ and cannot be shared with the other agents, and the agent network is defined by a connected graph with the Laplacian matrix $L \in \mathbb{R}^{N \times N}$. Every node in the graph represents an agent. Communication in the network is allowed only between neighboring vertices (agents). An agent $i$ is characterized by its objective function $f^i$, equality constraint function $g^i$ and inequality constraint function $h^i$. The objective function $F$ is separable, so the main difficulty in deriving a distributed version of problem (1) lies in constraints (1b) and (1c): they are coupling (non-local) even though the constraint functions $G$ and $H$ are also separable.
Subsequently, we aim to develop a reduction that, for an arbitrary problem (1), creates an auxiliary problem such that
  • The auxiliary problem has the same solution as (1);
  • Gradient descent, gradient projection and quasi-Newton methods operate as distributed optimization methods when applied to the auxiliary problem without any modification.

3. Optimization Problem with Equality Constraints

Let us first introduce a simplified version of problem (1):
$$\min_{x \in \mathbb{R}^n} F(x) = f^1(x^1) + f^2(x^2) + \dots + f^N(x^N), \tag{2a}$$
subject to
$$\begin{aligned}
g_1^1(x^1) + g_1^2(x^2) + \dots + g_1^N(x^N) &= 0,\\
g_2^1(x^1) + g_2^2(x^2) + \dots + g_2^N(x^N) &= 0,\\
&\;\;\vdots\\
g_{\tilde m}^1(x^1) + g_{\tilde m}^2(x^2) + \dots + g_{\tilde m}^N(x^N) &= 0,
\end{aligned} \tag{2b}$$
i.e., the problem defined as in (1) without the inequality constraints. We introduce an auxiliary vector $y \in \mathbb{R}^{N \tilde m}$ consisting of subvectors $y^i \in \mathbb{R}^{\tilde m}$, $y^i = (y_1^i, y_2^i, \dots, y_{\tilde m}^i)$, $i = 1, \dots, N$. Each subvector $y^i$ is associated with the constraint function $g^i$ of vertex $i$. Consider the $j$-th scalar equality constraint from (2):
$$\sum_{i=1}^{N} g_j^i(x^i) = 0. \tag{3}$$
It can be reformulated as follows:
$$\begin{aligned}
g_j^1(x^1) + \sum_{i=1}^{N} L_{1i}\, y_j^i &= 0,\\
g_j^2(x^2) + \sum_{i=1}^{N} L_{2i}\, y_j^i &= 0,\\
&\;\;\vdots\\
g_j^N(x^N) + \sum_{i=1}^{N} L_{Ni}\, y_j^i &= 0,
\end{aligned} \tag{4}$$
where $L_{si}$, $s, i = 1, \dots, N$, are the elements of the Laplacian matrix $L$. Such a transition is performed for all $\tilde m$ equality constraints. In what follows, we use the notation
$$\tilde g = \operatorname{vec}\left(g^1, g^2, \dots, g^N\right), \tag{5}$$
$$\tilde L = L \otimes I_{\tilde m}, \tag{6}$$
where $L \otimes I_{\tilde m}$ is the Kronecker product of the Laplacian matrix $L$ and the identity matrix $I_{\tilde m}$. The interpretation of representation (5) is given in Figure 1.
Repeating the transition from (3) to (4) for all $j = 1, \dots, \tilde m$ yields the following reformulation of problem (2):
$$\min_{(x, y) \in \mathbb{R}^n \times \mathbb{R}^{N \tilde m}} F(x), \tag{7a}$$
$$\tilde g(x) + \tilde L y = 0. \tag{7b}$$
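As a small worked illustration (not taken from the paper), consider the simplest possible network: $N = 2$ agents joined by a single edge and one scalar constraint ($\tilde m = 1$). The graph and the particular choice of $y$ below are assumptions made only for this illustration.

```latex
L = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix},
\qquad
g^1(x^1) + g^2(x^2) = 0
\;\Longleftrightarrow\;
\exists\, (y^1, y^2):\;
\begin{cases}
g^1(x^1) + (y^1 - y^2) = 0,\\
g^2(x^2) + (y^2 - y^1) = 0.
\end{cases}
% "<=": adding the two equations cancels the y terms.
% "=>": if g^1(x^1) + g^2(x^2) = 0, take y^1 = 0 and y^2 = g^1(x^1).
```

Each equation on the right involves only one agent's function and the communication variables of that agent and its neighbor, which is the property exploited in the rest of the section.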
Lemma 1.
The following statements are correct:
1. Problem (2) is feasible if and only if problem (7) is feasible.
2. A pair $(x^*, y^*)$ is a solution to problem (7) if and only if $x^*$ is a solution to (2).
Proof.
1. Consider system (4) as a system of linear equations with respect to the variables $(y_j^1, y_j^2, \dots, y_j^N)$ for fixed $x$ and the right-hand side vector $-(g_j^1(x^1), g_j^2(x^2), \dots, g_j^N(x^N))$. According to the Fredholm alternative [34] and due to the symmetry of $L$, this system is consistent if and only if
$$\operatorname{vec}\left(g_j^1(x^1), \dots, g_j^N(x^N)\right) \perp \ker L, \tag{8}$$
where $\ker L = \{v \in \mathbb{R}^N : L v = 0\}$ is the kernel of the Laplacian matrix $L$. Since the agent graph is connected, $\ker L = \{v \in \mathbb{R}^N : v = \rho \mathbf{1}_N,\ \rho \in \mathbb{R}\}$. Then, (8) is equivalent to
$$\operatorname{vec}\left(g_j^1(x^1), \dots, g_j^N(x^N)\right)^\top \mathbf{1}_N = \sum_{i=1}^{N} g_j^i(x^i) = 0. \tag{9}$$
Repeating this consideration for all $j = 1, \dots, \tilde m$, we find that problems (2) and (7) are feasible simultaneously.
2. The correctness of the second statement follows from the fact that both problems have the same objective function.    □
Let us consider the main property of system (4). Since $L$ is the Laplacian matrix, this system can be rewritten in the following form:
$$\begin{aligned}
g_j^1(x^1) + \sum_{i \in \operatorname{Adj}(1)} \left(y_j^1 - y_j^i\right) &= 0,\\
g_j^2(x^2) + \sum_{i \in \operatorname{Adj}(2)} \left(y_j^2 - y_j^i\right) &= 0,\\
&\;\;\vdots\\
g_j^N(x^N) + \sum_{i \in \operatorname{Adj}(N)} \left(y_j^N - y_j^i\right) &= 0.
\end{aligned} \tag{10}$$
In order to evaluate the $\ell$-th constraint in (10), it is necessary to know only the local variables $x^\ell$, $y_j^\ell$, the local function $g_j^\ell$ and the variables $y_j^i$ from the neighboring vertices $i \in \operatorname{Adj}(\ell)$. This is the main advantage of system (10) in comparison to constraint (3), which requires information from all vertices of the agent network.
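The locality of system (10) can be checked numerically. The following minimal sketch is not part of the original method description; it uses the four-agent ring from the examples with agents indexed from 0, and the helper names `laplacian` and `local_residual` as well as the sample values of g and y are illustrative assumptions.

```python
# Illustrative sketch: row l of g + L y versus the neighbour-only form (10).
# Assumption: unweighted 4-cycle graph, agents indexed 0..3, one scalar constraint.
ADJ = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}


def laplacian(adj):
    """Build the graph Laplacian (degree on the diagonal, -1 for each edge)."""
    n = len(adj)
    lap = [[0.0] * n for _ in range(n)]
    for i, neighbours in adj.items():
        lap[i][i] = float(len(neighbours))
        for s in neighbours:
            lap[i][s] = -1.0
    return lap


def local_residual(ell, g_ell, y, adj):
    """Left-hand side of the ell-th equation in (10), using neighbour data only."""
    return g_ell + sum(y[ell] - y[i] for i in adj[ell])


g = [0.5, -1.25, 2.0, -1.25]   # hypothetical values g_j^l(x^l); they sum to zero
y = [0.3, -0.7, 1.1, 0.2]      # hypothetical communication variables y_j^l

L = laplacian(ADJ)
central = [g[l] + sum(L[l][i] * y[i] for i in range(4)) for l in range(4)]
local = [local_residual(l, g[l], y, ADJ) for l in range(4)]
```

Both lists agree entry by entry, and summing the local residuals recovers the left-hand side of the coupled constraint (3), because the y terms cancel.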
If we fix the vector $y$, for example, $y = \tilde y$, then, due to the separability of the function $F$ (see (2a)) and property (10), optimization with respect to the remaining vector $x$ in problem (7) can be performed separately, i.e., each agent $\ell$ independently solves the corresponding problem:
$$\min_{x^\ell} f^\ell(x^\ell), \tag{11}$$
$$g_j^\ell(x^\ell) = -\sum_{i \in \operatorname{Adj}(\ell)} \left(\tilde y_j^\ell - \tilde y_j^i\right), \quad j = 1, \dots, \tilde m. \tag{12}$$
Assume that problems (11), (12) are solvable for all $\ell = 1, \dots, N$, and $\tilde x^\ell$ are the corresponding solutions. The vector $\tilde x = \operatorname{vec}(\tilde x^1, \dots, \tilde x^N)$ provides the solution of problem (7) for fixed $y = \tilde y$.

4. Gradient Descent in Variables y

Variables $y$ are called communication variables. If we set $\tilde y^\ell = y^{0,\ell} = 0$, $\ell = 1, \dots, N$, and problems (11), (12) are solvable with $x^{0,\ell}$ as the corresponding solutions, then the pair $(x^0, y^0)$ is a feasible starting point for problem (7), and $F^0 = F(x^0)$ is the starting record value of the objective function.
Let us write down the Lagrange function for problem (7) as
$$V(x, y, \lambda) = \sum_{i=1}^{N} f^i(x^i) + \lambda^\top \left(\tilde g(x) + \tilde L y\right), \tag{13}$$
where $\lambda \in \mathbb{R}^{N \tilde m}$ is the vector of Lagrange multipliers consisting of subvectors $\lambda^i = (\lambda_1^i, \lambda_2^i, \dots, \lambda_{\tilde m}^i)$, $i = 1, \dots, N$. Each subvector $\lambda^i$ corresponds to the constraint vector function $g^i$ of agent $i$. The corresponding necessary optimality conditions are
$$\frac{\partial V}{\partial x} = \operatorname{vec}\left(\nabla f^1(x^1), \dots, \nabla f^N(x^N)\right) + J_{\tilde g}(x)^\top \lambda = 0, \tag{14a}$$
$$\frac{\partial V}{\partial y} = \tilde L \lambda = 0, \tag{14b}$$
$$\frac{\partial V}{\partial \lambda} = \tilde g(x) + \tilde L y = 0, \tag{14c}$$
where $J_{\tilde g}(x) = \operatorname{diag}\left(J_{g^1}(x^1), J_{g^2}(x^2), \dots, J_{g^N}(x^N)\right)$ is the Jacobian of $\tilde g(x)$. If we again fix $y = y^0$ and solve the corresponding problem (7), obtaining the primal solution $x^0$ and the dual solution $\lambda^0$, then conditions (14a) and (14c) will be satisfied with $x = x^0$ and $\lambda = \lambda^0$. Condition (14b) can be violated, $\partial V / \partial y = \tilde L \lambda^0 \ne 0$, since we do not perform optimization in $y$. Hence, we correct $y^0$ by a gradient descent step in $y$:
$$y^1 = y^0 - \rho_0 \tilde L \lambda^0. \tag{15}$$
In general, we obtain the following recalculation formula for $y^k$:
$$y^{k+1} = y^k - \rho_k \tilde L \lambda^k. \tag{16}$$
From (14b), due to the structure of the Laplacian matrix $L$ and the structure of the vector $\lambda$, we have the following element-wise representation at step $k$:
$$\frac{\partial V(x^k, y^k, \lambda^k)}{\partial y_j^i} = \sum_{s \in \operatorname{Adj}(i)} \left(\lambda_j^{k,i} - \lambda_j^{k,s}\right), \quad j = 1, \dots, \tilde m. \tag{17}$$
Therefore, the calculation of $\partial V / \partial y_j^i$ has the distributed form, since $\lambda_j^{k,i}$ is a local dual variable and all $\lambda_j^{k,s}$, $s \in \operatorname{Adj}(i)$, are dual variables of adjacent agents.
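For completeness, the step from (14b) to the element-wise formula (17) can be spelled out as follows, assuming an unweighted graph so that $L_{ii} = |\operatorname{Adj}(i)|$, $L_{is} = -1$ for $s \in \operatorname{Adj}(i)$ and $L_{is} = 0$ otherwise (the same structure that is used in (10)):

```latex
\left(\tilde L \lambda\right)_j^i
  = \sum_{s=1}^{N} L_{is}\, \lambda_j^{s}
  = |\operatorname{Adj}(i)|\, \lambda_j^{i} - \sum_{s \in \operatorname{Adj}(i)} \lambda_j^{s}
  = \sum_{s \in \operatorname{Adj}(i)} \left(\lambda_j^{i} - \lambda_j^{s}\right).
```

For instance, on a ring of four agents, agent 1 is adjacent to agents 2 and 4, so its gradient component is $2\lambda_j^1 - \lambda_j^2 - \lambda_j^4$.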
The main computational scheme for solving problem (7) is presented in Algorithm 1. We assume here that problem (7) is solvable for $y = 0$.
Since we do not make any convexity assumptions, Algorithm 1 is suggested for finding a strict local saddle point of the Lagrange function in the sense of [35]. In [36], it is pointed out that the values $\rho_k$ at Step 6 must be chosen small enough in order to achieve convergence to a strict local saddle point. One of the recommended choices for $\rho_k$ is the following: $\rho_0 = 1$, $\rho_k = 1/k$ for $k \ge 1$.
Example 1.
Consider the problem with N = 4 , n i = 1 , i = 1 , , 4 , m ˜ = 1 and
f 1 ( x 1 ) = ( x 1 ) 4 + 3 ( x 1 ) 3 ( x 1 ) 2 3 x 1 , f 2 ( x 2 ) = sin ( x 2 ) + 0.1 ( x 2 ) 2 ,
f 3 ( x 3 ) = ( x 3 ) 2 4 , f 4 ( x 4 ) = x 4 ,
g 1 ( x 1 ) = ( x 1 ) 3 8 , g 2 ( x 2 ) = ( x 2 ) 2 7 x 2 + 10 , g 3 ( x 3 ) = x 3 1 , g 4 ( x 4 ) = ( x 4 ) 2 9 .
The network is described by the Laplacian matrix
$$L = \begin{pmatrix} 2 & -1 & 0 & -1 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ -1 & 0 & -1 & 2 \end{pmatrix}.$$
The starting solution of the corresponding problem (7) with y = y 0 = 0 is the following x 0 = ( 2 , 2 , 1 , 3 ) , λ 0 = ( 0.417 , 0.272 , 2.000 , 0.167 ) , F ( x 0 ) = 12.509 . The tolerance ε = 0.1 . The ε-solution was obtained after 294 iterations of Algorithm 1: x 294 = ( 2.131 , 1.954 , 0.098 , 2.877 ) , F ( x 294 ) = 13.972 , ρ k = 1 k . The optimal solution x = ( 2.182 , 1.831 , 0.093 , 2.678 ) , F ( x ) = 14.013 . Algorithm 1 generated a sequence of feasible points with decreasing objective function values.
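Algorithm 1 Gradient descent method (for agent $\ell$)
Input: $f^\ell$, $g^\ell$, $\varepsilon > 0$.
Output: $x^\ell$.
Algorithm steps:
Step 1. Set $y^{0,\ell} = 0$ and $k = 0$;
Step 2. Obtain $y^{k,i}$, $i \in \operatorname{Adj}(\ell)$, from neighboring agents;
Step 3. Solve problem (11), (12) for $\tilde y^\ell = y^{k,\ell}$, $\tilde y^i = y^{k,i}$, $i \in \operatorname{Adj}(\ell)$. Let $x^{k,\ell}$, $\lambda^{k,\ell}$ be the corresponding primal and dual solutions;
Step 4. Obtain $\lambda^{k,i}$, $i \in \operatorname{Adj}(\ell)$, from neighboring agents;
Step 5. If
$$\left| \frac{\partial V(x^k, y^k, \lambda^k)}{\partial y_j^\ell} \right| = \left| \sum_{s \in \operatorname{Adj}(\ell)} \left(\lambda_j^{k,\ell} - \lambda_j^{k,s}\right) \right| < \varepsilon, \quad j = 1, \dots, \tilde m,$$
then go to Step 8;
Step 6. Calculate $y^{k+1,\ell}$:
$$y_j^{k+1,\ell} = y_j^{k,\ell} - \rho_k \frac{\partial V(x^k, y^k, \lambda^k)}{\partial y_j^\ell}, \quad j = 1, \dots, \tilde m;$$
Step 7. Set $k = k + 1$ and go to Step 2;
Step 8. Stop: $x^{k,\ell}$ is the $\ell$-th block of an $\varepsilon$-stationary point of problem (2).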
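The steps of Algorithm 1 can be sketched in a few lines of Python. The sketch below is a single-process simulation of the synchronous exchange, not the authors' implementation. As assumptions, it borrows the data of Example 2 (scalar $x^\ell$, $f^\ell(x) = (x - c_\ell)^2$, $g^\ell(x) = x^2 - 3$, ring of four agents indexed from 0), solves the local problem (11), (12) in closed form by taking the positive root, and uses a constant step $\rho = 0.05$ instead of the schedule $\rho_k = 1/k$, because this toy local solver requires the right-hand side of (12) to stay above $-3$. The names `solve_local` and `run_algorithm1` are hypothetical.

```python
import math

# Assumed data (borrowed from Example 2): ring of four agents, indexed from 0.
ADJ = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
C = [5.0, 4.0, 3.0, 2.0]          # f^l(x) = (x - C[l])^2, g^l(x) = x^2 - 3


def solve_local(ell, y):
    """Step 3: solve (11), (12) for agent ell; closed form for this toy problem."""
    rhs = -sum(y[ell] - y[i] for i in ADJ[ell])   # right-hand side of (12)
    x = math.sqrt(3.0 + rhs)                      # positive root of x^2 - 3 = rhs
    lam = (C[ell] - x) / x                        # from 2 (x - c) + 2 x lam = 0
    return x, lam


def run_algorithm1(rho=0.05, eps=1e-8, max_iter=20000):
    """Simulate Steps 1-8 for all agents synchronously."""
    n = len(C)
    y = [0.0] * n                                 # Step 1
    for k in range(max_iter):
        sols = [solve_local(l, y) for l in range(n)]            # Steps 2-3
        x = [s[0] for s in sols]
        lam = [s[1] for s in sols]
        # Steps 4-5: gradient (17) from neighbours' multipliers, stopping test.
        grad = [sum(lam[l] - lam[s] for s in ADJ[l]) for l in range(n)]
        if max(abs(v) for v in grad) < eps:
            break
        y = [y[l] - rho * grad[l] for l in range(n)]            # Step 6
    return x, y, lam, k


x_star, y_star, lam_star, iters = run_algorithm1()
```

Every iterate is feasible for the coupled constraint, since the right-hand sides of (12) sum to zero over the agents; the iterations only redistribute the constraint "budget" between agents until all multipliers agree.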

5. Problem with Equality and Inequality Constraints

Similarly to the equality constraints, let us introduce the vector function $\hat h: \mathbb{R}^n \to \mathbb{R}^{N \hat m}$:
$$\hat h(x) = \begin{pmatrix} h^1(x^1) \\ \vdots \\ h^N(x^N) \end{pmatrix}$$
and the expansion of the Laplacian matrix
$$\hat L = L \otimes I_{\hat m}.$$
Then, the new optimization problem has the form
$$\min_{x \in \mathbb{R}^n,\ y \in \mathbb{R}^{N \tilde m},\ z \in \mathbb{R}^{N \hat m}} \sum_{i=1}^{N} f^i(x^i), \tag{19a}$$
$$\tilde g(x) + \tilde L y = 0, \tag{19b}$$
$$\hat h(x) + \hat L z \le 0. \tag{19c}$$
Firstly, let us prove the following lemma.
Lemma 2.
The following statements are correct:
1. Problem (19) is feasible if problem (1) is feasible.
2. A triplet $(x^*, y^*, z^*)$ is a solution to problem (19) if and only if $x^*$ is a solution to (1).
Proof.
In Lemma 1, it was shown that the equality constraints in both problems have the same solutions in $x$. Let us now consider the inequality constraints in both problems. As before, we consider constraint $j$ from (1c),
$$\sum_{i=1}^{N} h_j^i(x^i) \le 0, \tag{20}$$
and the set of corresponding constraints $c(j)$ from (19c),
$$\operatorname{vec}\left(h_j^1(x^1), \dots, h_j^N(x^N)\right) + L z \le 0. \tag{21}$$
The sum of the rows of $L$ is zero. Thus, the sum of the inequalities (21) yields (20). Let us now show that if (20) holds for some $x$, then there always exists $z$ such that (21) holds. The vector $\operatorname{vec}\left(h_j^1(x^1), \dots, h_j^N(x^N)\right)$ can always be decomposed using some orthogonal basis $\mathbf{1}_N, q_2, \dots, q_N$:
$$\operatorname{vec}\left(h_j^1(x^1), \dots, h_j^N(x^N)\right) = \alpha \mathbf{1}_N + \sum_{i=2}^{N} \beta_i q_i = \alpha \mathbf{1}_N + q. \tag{22}$$
The vector $q$ is orthogonal to $\mathbf{1}_N$ and, consequently, is orthogonal to $\ker L$. Thus, there always exists $z$ such that $L z = -q$. Substitution of such $z$ into the left-hand side of (21) gives
$$\operatorname{vec}\left(h_j^1(x^1), \dots, h_j^N(x^N)\right) + L z = \alpha \mathbf{1}_N + q + L z = \alpha \mathbf{1}_N. \tag{23}$$
Additionally,
$$\sum_{i=1}^{N} h_j^i(x^i) = \operatorname{vec}\left(h_j^1(x^1), \dots, h_j^N(x^N)\right)^\top \mathbf{1}_N = \left(\alpha \mathbf{1}_N + q\right)^\top \mathbf{1}_N = N \alpha. \tag{24}$$
Thus, (20) implies $\alpha \le 0$ and, by (23), inequality (21) holds for this $z$. Since this statement holds for all $j \in \{1, \dots, \hat m\}$, the lemma is proven.    □
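The construction in the proof of Lemma 2 can be reproduced numerically. The proof only states that a suitable $z$ exists; the sketch below finds one with a simple Richardson iteration $z \leftarrow z - \tau (L z + q)$, which is a technique swapped in here for illustration (it is not prescribed by the paper) and which happens to use neighbor information only. The ring of four agents (indexed from 0), the step $\tau = 0.25$ (valid for this graph, whose largest Laplacian eigenvalue is 4), the sample vector `h` and the names `lap_apply` and `balance` are assumptions.

```python
# Illustrative sketch: find z with h + L z = alpha * 1, where alpha = mean(h).
ADJ = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}   # assumed 4-cycle graph


def lap_apply(v, adj):
    """Compute L v using neighbour differences only."""
    return [sum(v[i] - v[s] for s in adj[i]) for i in range(len(v))]


def balance(h, adj, tau=0.25, iters=60):
    """Solve L z = -q for q = h - mean(h) by Richardson iteration from z = 0."""
    n = len(h)
    alpha = sum(h) / n                 # component of h along the vector of ones
    q = [hi - alpha for hi in h]       # part of h orthogonal to the vector of ones
    z = [0.0] * n
    for _ in range(iters):
        lz = lap_apply(z, adj)
        z = [z[i] - tau * (lz[i] + q[i]) for i in range(n)]
    return z, alpha


h = [1.5, -2.0, 0.25, -0.75]           # hypothetical h_j^i(x^i); the sum is -1 <= 0
z, alpha = balance(h, ADJ)
shifted = [hi + lz for hi, lz in zip(h, lap_apply(z, ADJ))]
```

Although the first agent's own value is positive, all entries of `shifted` equal the common value `alpha`, which is non-positive exactly when the coupled inequality (20) holds.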

6. Newton’s Method

Here we adapt a Newton-type approach to distributed optimization [37]. In order to carry this out, we have to derive algorithms for the initial problem (1) and the distributed problem (19) in parallel. Due to the similar structure of these problems, all variables and functions of the initial problem will be denoted with an upper index $c$, which stands for centralized.
Let us introduce the Lagrange function for problem (19):
$$V(x, y, z, \lambda, \mu) = f(x) + \lambda^\top \left(\tilde g(x) + \tilde L y\right) + \mu^\top \left(\hat h(x) + \hat L z\right). \tag{25}$$
The corresponding Karush–Kuhn–Tucker conditions have the form
$$\begin{aligned}
\frac{\partial V}{\partial x} = \nabla f(x) + \left(J_{\tilde g}(x)\right)^\top \lambda + \left(J_{\hat h}(x)\right)^\top \mu &= 0,\\
\frac{\partial V}{\partial y} = \tilde L \lambda &= 0,\\
\frac{\partial V}{\partial z} = \hat L \mu &= 0,\\
\tilde g(x) + \tilde L y &= 0,\\
\left(\hat h(x) + \hat L z\right)_i \mu_i = 0, \quad \mu &\ge 0.
\end{aligned} \tag{26}$$
In order to replace the complementary slackness conditions with equations suitable for the Newton method, the complementarity function $\psi: \mathbb{R}^2 \to \mathbb{R}$ is introduced. It has the following property: $\psi(x, y) = 0$ if and only if $x \ge 0$, $y \le 0$ and $x y = 0$. It can be chosen in multiple ways. Here, we use the following form:
$$\psi(x, y) = \begin{cases} y, & \text{if } x > 0 \text{ or } y \ge 0,\\ x, & \text{otherwise}. \end{cases} \tag{27}$$
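Since the text notes that the complementarity function can be chosen in multiple ways, a quick numerical check of one standard alternative may be helpful. The sketch below does not implement the piecewise function used in the paper; it tests the well-known choice $\min(x, -y)$ against the stated property, assuming that the property reads $x \ge 0$, $y \le 0$, $xy = 0$, as the later use $\psi(\mu, \hat h(x) + \hat L z)$ suggests. The names `psi_min` and `is_complementary` are hypothetical.

```python
# Illustrative sketch: the min-based complementarity function (an assumed
# alternative, not the piecewise function of the paper).
def psi_min(x, y):
    """Zero exactly when x >= 0, y <= 0 and x * y == 0."""
    return min(x, -y)


def is_complementary(x, y):
    """Complementarity conditions for a multiplier x and a constraint value y."""
    return x >= 0 and y <= 0 and x * y == 0


grid = [-2.0, -0.5, 0.0, 0.5, 2.0]
mismatches = [(x, y) for x in grid for y in grid
              if (psi_min(x, y) == 0) != is_complementary(x, y)]
```

The list `mismatches` is empty on this grid, i.e., the zero set of `psi_min` coincides with the complementarity conditions at every tested point.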
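Then, the KKT conditions (26) can be replaced with
$$\phi(x, y, z, \lambda, \mu) = 0, \tag{28}$$
where
$$\phi(x, y, z, \lambda, \mu) = \begin{pmatrix} \dfrac{\partial V}{\partial x} \\ \tilde L \lambda \\ \hat L \mu \\ \tilde g(x) + \tilde L y \\ \psi\left(\mu, \hat h(x) + \hat L z\right) \end{pmatrix}. \tag{29}$$
Next, we introduce diagonal matrices $A(x, \mu)$ with elements
$$A_{ii}(x, \mu) = \begin{cases} 1, & \psi_i\left(\mu, \hat h(x) + \hat L z\right) = \left(\hat h(x) + \hat L z\right)_i,\\ 0, & \text{otherwise}; \end{cases}$$
and $B(x, \mu)$ with elements
$$B_{ii}(x, \mu) = \begin{cases} 1, & \psi_i\left(\mu, \hat h(x) + \hat L z\right) = \mu_i,\\ 0, & \text{otherwise}. \end{cases}$$
Then,
$$\Phi(x, y, z, \lambda, \mu) = \begin{pmatrix}
\dfrac{\partial^2 V}{\partial x^2} & 0 & 0 & \left(J_{\tilde g}(x)\right)^\top & \left(J_{\hat h}(x)\right)^\top \\
0 & 0 & 0 & \tilde L & 0 \\
0 & 0 & 0 & 0 & \hat L \\
J_{\tilde g}(x) & \tilde L & 0 & 0 & 0 \\
A J_{\hat h}(x) & 0 & A \hat L & 0 & B
\end{pmatrix}$$
and the values $x^{k+1}, y^{k+1}, z^{k+1}, \lambda^{k+1}, \mu^{k+1}$, corresponding to the $k$-th Newton iteration step, are calculated as the solution to the following system:
$$\Phi^k \left( \operatorname{vec}\left(x^k, y^k, z^k, \lambda^k, \mu^k\right) - \operatorname{vec}\left(x^{k+1}, y^{k+1}, z^{k+1}, \lambda^{k+1}, \mu^{k+1}\right) \right) = \phi^k, \tag{31}$$
where
$$\Phi^k = \Phi\left(x^k, y^k, z^k, \lambda^k, \mu^k\right), \quad \phi^k = \phi\left(x^k, y^k, z^k, \lambda^k, \mu^k\right).$$
Let us introduce the same Newton step equations for the initial problem (1). For the Lagrange function, we have
$$V^c(x^c, \lambda^c, \mu^c) = f(x^c) + (\lambda^c)^\top G(x^c) + (\mu^c)^\top H(x^c)$$
and the corresponding parameters have the form
$$\phi^c(x^c, \lambda^c, \mu^c) = \begin{pmatrix} \dfrac{\partial V^c}{\partial x^c} \\ G(x^c) \\ \psi\left(\mu^c, H(x^c)\right) \end{pmatrix},$$
$$\Phi^c(x^c, \lambda^c, \mu^c) = \begin{pmatrix}
\dfrac{\partial^2 V^c}{\partial (x^c)^2} & \left(J_G(x^c)\right)^\top & \left(J_H(x^c)\right)^\top \\
J_G(x^c) & 0 & 0 \\
A(x^c, \mu^c) J_H(x^c) & 0 & B(x^c, \mu^c)
\end{pmatrix}.$$
Finally, with
$$\Phi^{c,k} = \Phi^c\left(x^{c,k}, \lambda^{c,k}, \mu^{c,k}\right), \quad \phi^{c,k} = \phi^c\left(x^{c,k}, \lambda^{c,k}, \mu^{c,k}\right)$$
for a given Newton step, we have
$$\Phi^{c,k} \left( \operatorname{vec}\left(x^{c,k}, \lambda^{c,k}, \mu^{c,k}\right) - \operatorname{vec}\left(x^{c,k+1}, \lambda^{c,k+1}, \mu^{c,k+1}\right) \right) = \phi^{c,k}. \tag{36}$$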
Let us now prove the following result.
Theorem 1.
If the following conditions hold:
1. $x^0 = x^{c,0}$;
2. $\lambda^0_{c(i)} = \mathbf{1}\, \lambda_i^{c,0} / N$ for $i \in \{1, \dots, \tilde m\}$;
3. $\mu^0_{c(i)} = \mathbf{1}\, \mu_i^{c,0} / N$ for $i \in \{1, \dots, \hat m\}$;
4. For all $i \in \{1, \dots, \hat m\}$, $z^0_{c(i)}$ is a solution of the following optimization problem:
$$\min_{t \in \mathbb{R}^N,\ z_{c(i)} \in \mathbb{R}^N} \frac{1}{2} t^\top t, \quad \text{subject to} \quad \hat h_{c(i)}(x^0) + L z_{c(i)} + t = 0, \tag{37}$$
then the convergence of (31) coincides with the convergence of (36).
Proof. 
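If, for iteration $k$, conditions 1–4 hold, then, from (31), we arrive at a system of linear equations. Using $\Delta$ to denote the difference between variables at the $k$-th iteration, Equation (31) can be rewritten as
$$\frac{\partial^2 V}{\partial x^2} \Delta x + \left(J_{\tilde g}(x^k)\right)^\top \Delta\lambda + \left(J_{\hat h}(x^k)\right)^\top \Delta\mu = -\frac{\partial V}{\partial x}, \tag{38a}$$
$$\tilde L \Delta\lambda = -\tilde L \lambda^k, \tag{38b}$$
$$\hat L \Delta\mu = -\hat L \mu^k, \tag{38c}$$
$$J_{\tilde g}(x^k) \Delta x + \tilde L \Delta y = -\left(\tilde g(x^k) + \tilde L y^k\right), \tag{38d}$$
$$A(x^k, \mu^k) J_{\hat h}(x^k) \Delta x + A(x^k, \mu^k) \hat L \Delta z + B(x^k, \mu^k) \Delta\mu = -\psi\left(\mu^k, \hat h(x^k) + \hat L z^k\right). \tag{38e}$$
This set of equations describes a stationary point of the optimization problem
$$\min_{\Delta x \in \mathbb{R}^n,\ \Delta y \in \mathbb{R}^{\tilde m N},\ \Delta z \in \mathbb{R}^{\hat m N}} \frac{1}{2} \Delta x^\top \frac{\partial^2 V}{\partial x^2} \Delta x + \Delta x^\top \frac{\partial V}{\partial x}, \tag{39a}$$
$$J_{\tilde g}(x^k) \Delta x + \tilde L \Delta y = -\left(\tilde g(x^k) + \tilde L y^k\right), \tag{39b}$$
$$\left[J_{\hat h}(x^k) \Delta x + \hat L \Delta z\right]_J = -\left[\hat h(x^k) + \hat L z^k\right]_J, \tag{39c}$$
where $J = \{ i \in \{1, \dots, \hat m N\} \mid A_{ii}(x^k, \mu^k) = 1 \}$.
Likewise, for the centralized Equation (36), we can obtain a similar optimization problem:
$$\min_{\Delta x^c \in \mathbb{R}^n} \frac{1}{2} (\Delta x^c)^\top \frac{\partial^2 V^c}{\partial (x^c)^2} \Delta x^c + (\Delta x^c)^\top \frac{\partial V^c}{\partial x^c}, \tag{40a}$$
$$J_G(x^{c,k}) \Delta x^c = -G(x^{c,k}), \tag{40b}$$
$$\left[J_H(x^{c,k}) \Delta x^c\right]_I = -\left[H(x^{c,k})\right]_I, \tag{40c}$$
where $I = \{ i \in \{1, \dots, \hat m\} \mid A_{ii}(x^{c,k}, \mu^{c,k}) = 1 \}$. Let us now show that (39) is an expansion of problem (40). Consider the objective functions in both problems, which involve the first and second derivatives of the corresponding Lagrange functions. For the first derivative, we have
$$\frac{\partial V}{\partial x} = \nabla f(x^k) + \left(J_{\tilde g}(x^k)\right)^\top \lambda^k + \left(J_{\hat h}(x^k)\right)^\top \mu^k.$$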
Note that due to condition 2,
( J g ˜ ( x k ) ) λ k i = k = 1 m ˜ j c ( i ) g k x i λ k j = k = 1 m ˜ j c ( i ) g k x i λ k c m ˜ = λ k c k = 1 m ˜ j c ( i ) g k x i = ( J g ( x k ) ) λ k , c .
The same equality holds for $\left(J_{\hat h}(x^k)\right)^\top \mu^k$. Thus,
$$\frac{\partial V}{\partial x} = \frac{\partial V^c}{\partial x^c}$$
and, consequently,
$$\frac{\partial^2 V}{\partial x^2} = \frac{\partial^2 V^c}{\partial (x^c)^2}.$$
As a result, the objective functions in problems (40) and (39) are equivalent. Let us now consider the relation between the sets $J$ and $I$. Firstly, we focus on the optimization problem (37). It can be shown that at its optimum, $z_{c(i)}$ and $t$ are chosen so that $t_i = t_j$ for all $i$ and $j$, since this is the only case where $t \in \ker L$. Thus, for each $i$, the corresponding inequality constraints from (40) and (39),
$$\sum_{j=1}^{N} h_i^j(x^{c,j}) \le 0$$
and
$$\hat h_{c(i)}(x) + L z_{c(i)} \le 0$$
are all active or inactive simultaneously. Thus,
$$J = \bigcup_{i \in I} c(i).$$
As a result, problem (39) is an expansion of problem (40) and, according to Lemma 1, has the same solution in $x$. Moreover, this means that for $k + 1$, item 1 of the theorem is satisfied. Let us now demonstrate that the values $x^{k+1}, y^{k+1}, z^{k+1}, \lambda^{k+1}, \mu^{k+1}$ satisfy items 2–4 of the theorem. From (38b), for each $i \in \{1, \dots, \tilde m\}$, all components of $\lambda^{k+1}_{c(i)}$ are equal to each other, which gives item 2. The same reasoning applies to all $\mu^{k+1}_{c(i)}$, $i \in I$, while for all other $i$, $\mu^{k+1}_{c(i)} = 0$, which means that item 3 holds. Finally, for $z^{k+1}_{c(i)}$, $i \in I$, the corresponding constraint is active, and problem (37) is solved with $t = 0$. For all other $i$, $z^{k+1}_{c(i)}$ is reassigned as a solution of (37) (Step 7 of Algorithm 2), and therefore, item 4 holds.    □
Corollary 1.
The convergence speed of Newton’s method applied to problem (19) is equal to the convergence speed of Newton’s method applied to problem (1).
Corollary 2.
Assume that
  • functions $f$, $g$ and $h$ are twice differentiable in some neighborhood of the solution $x^*$ of problem (1), and their second derivatives are Lipschitz-continuous in this neighborhood;
  • the constraints’ gradients are linearly independent at the optimum (linear independence constraint qualification);
  • the solution $x^*$ has unique corresponding dual variables $\lambda^{c,*}$ and $\mu^{c,*}$ in problem (1);
  • for $x^*$, $\lambda^{c,*}$ and $\mu^{c,*}$, we have
    $$u^\top \frac{\partial^2 V^c}{\partial x^2}\left(x^*, \lambda^{c,*}, \mu^{c,*}\right) u > 0 \quad \forall u \in K_+(x^*) \setminus \{0\},$$
    where
    $$K_+(x^*) = \left\{ u \in \ker J_h(x^*) \;\middle|\; \left(J_g(x^*)\right)_i u = 0 \ \ \forall i: g_i(x^*) = 0 \text{ and } \mu_i^{c,*} > 0 \right\}.$$
Then, in problem (19), for any starting point $(x, \lambda, \mu)$ sufficiently close to $(x^*, \lambda^*, \mu^*)$, where $\lambda^0_{c(i)} = \mathbf{1}\, \lambda_i^{c,0} / N$ for $i \in \{1, \dots, \tilde m\}$ and $\mu^0_{c(i)} = \mathbf{1}\, \mu_i^{c,0} / N$ for $i \in \{1, \dots, \hat m\}$, Algorithm 2 converges to the solution at a quadratic rate.
This corollary is based on the estimate of the convergence rate of Newton’s method for problem (1) given in [37].
Finally, Algorithm 2 requires the exchange of information only in steps 6 and 8. However, during these steps, the gradient descent method is used with its distributed implementation shown in the previous section. Thus, operations in Algorithm 2 are performed in the distributed form.
Algorithm 2 Newton’s method
Input: $f$, $g$, $L$, $\varepsilon > 0$.
Output: $x$.
Algorithm steps:
Step 1. Set $y^0 = 0$;
Step 2. For each $i \in \{1, \dots, N\}$, set $z_{c(i)}$ as a solution of (37) using gradient descent;
Step 3. Set $\lambda^0 = 0$ and $\mu^0 = 0$;
Step 4. Set $k = 0$;
Step 5. Solve optimization problem (39) using the gradient descent method;
Step 6. Assign
$$\begin{pmatrix} x^{k+1} \\ y^{k+1} \\ z^{k+1} \\ \lambda^{k+1} \\ \mu^{k+1} \end{pmatrix} = \begin{pmatrix} \Delta x^k \\ \Delta y^k \\ \Delta z^k \\ \Delta \lambda^k \\ \Delta \mu^k \end{pmatrix} + \begin{pmatrix} x^k \\ y^k \\ z^k \\ \lambda^k \\ \mu^k \end{pmatrix};$$
Step 7. For all inactive constraints $i \in \{1, \dots, \hat m\}$, solve problem (37) using gradient descent and assign its solution to $z^{k+1}_{c(i)}$;
Step 8. If $\|x^k - x^{k+1}\| > \varepsilon$, then set $k = k + 1$ and go back to Step 5;
Step 9. Stop: $x^{k+1}$ is an $\varepsilon$-stationary point.
Example 2.
Consider the initial problem with the following components: $N = 4$, $\tilde m = 1$, $\hat m = 0$, $n_1 = n_2 = n_3 = n_4 = 1$, $f^1(x^1) = (x_1^1 - 5)^2$, $f^2(x^2) = (x_1^2 - 4)^2$, $f^3(x^3) = (x_1^3 - 3)^2$, $f^4(x^4) = (x_1^4 - 2)^2$, $g^1(x^1) = (x_1^1)^2 - 3$, $g^2(x^2) = (x_1^2)^2 - 3$, $g^3(x^3) = (x_1^3)^2 - 3$, $g^4(x^4) = (x_1^4)^2 - 3$. The network is described by the Laplacian matrix (as in Example 1):
$$L = \begin{pmatrix} 2 & -1 & 0 & -1 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ -1 & 0 & -1 & 2 \end{pmatrix}.$$
Starting points: $x_1^{1,0} = x_1^{2,0} = x_1^{3,0} = x_1^{4,0} = 2$, $y_1^{1,0} = y_1^{2,0} = y_1^{3,0} = y_1^{4,0} = 0$, $\lambda_1^{1,0} = \lambda_1^{2,0} = \lambda_1^{3,0} = \lambda_1^{4,0} = 0.4$. The tolerance is $\varepsilon = 0.01$. Then, in four iterations, Algorithm 2 finds an $\varepsilon$-optimal solution with components $x_1^{1,4} = 2.357$, $x_1^{2,4} = 1.886$, $x_1^{3,4} = 1.414$, $x_1^{4,4} = 0.943$, $y_1^{1,4} = -1.083$, $y_1^{2,4} = -0.472$, $y_1^{3,4} = 0.694$, $y_1^{4,4} = 0.861$, $\lambda_1^{1,4} = \lambda_1^{2,4} = \lambda_1^{3,4} = \lambda_1^{4,4} = 1.121$. The objective function value is $F(x^4) = 15.088$.
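The values reported in Example 2 can be cross-checked independently. The centralized problem amounts to projecting the point $(5, 4, 3, 2)$ onto the sphere $\sum_i (x^i)^2 = 12$, which has a closed-form solution. The sketch below is not part of the paper; as assumptions, it reads the problem data as $f^i(x) = (x - c_i)^2$, $g^i(x) = x^2 - 3$, and takes the signs of the reported $y$ components as $(-1.083, -0.472, 0.694, 0.861)$, the combination for which the constraint $\tilde g(x) + \tilde L y = 0$ holds. Agents are indexed from 0.

```python
import math

# Assumed reading of Example 2: f^i(x) = (x - c_i)^2, g^i(x) = x^2 - 3, ring graph.
ADJ = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
c = [5.0, 4.0, 3.0, 2.0]

# Values reported in the example (signs of y are an assumption, see lead-in).
x = [2.357, 1.886, 1.414, 0.943]
y = [-1.083, -0.472, 0.694, 0.861]
lam = 1.121

# Closed form: projection of c onto the sphere sum(x_i^2) = 12.
scale = math.sqrt(12.0 / sum(ci * ci for ci in c))
x_closed = [scale * ci for ci in c]
lam_closed = 1.0 / scale - 1.0            # from 2 (x - c) + 2 lam x = 0
F_closed = sum((xi - ci) ** 2 for xi, ci in zip(x_closed, c))

# Residuals of the distributed constraint (10) and of stationarity (14a).
feasibility = [x[l] ** 2 - 3.0 + sum(y[l] - y[i] for i in ADJ[l]) for l in range(4)]
stationarity = [2.0 * (x[l] - c[l]) + 2.0 * x[l] * lam for l in range(4)]
```

Under these assumptions, the closed-form values agree with the reported ones to the printed three decimals, and both residual vectors are of the order of the rounding error.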

7. Application of the Arrow–Hurwicz Algorithm to Problems with Inequality Constraints

We consider here problem (19) without equality constraints:
$$\min_{x \in \mathbb{R}^n,\ z \in \mathbb{R}^{N \hat m}} \sum_{i=1}^{N} f^i(x^i), \quad \text{subject to} \quad \hat h(x) + \hat L z \le 0. \tag{51}$$
The Lagrange function is given by
$$V(x, z, \mu) = f(x) + \mu^\top \left(\hat h(x) + \hat L z\right). \tag{52}$$
Assume that starting vectors $(x^0, z^0, \mu^0)$ are given. The Arrow–Hurwicz algorithm can be described by the following relations [38]:
$$x^{k+1} = x^k - \rho \frac{\partial V(x^k, z^k, \mu^k)}{\partial x}, \tag{53}$$
$$z^{k+1} = z^k - \rho \frac{\partial V(x^k, z^k, \mu^k)}{\partial z}, \tag{54}$$
$$\mu^{k+1} = \max\left\{0,\ \mu^k + \rho \left(\hat h(x^k) + \hat L z^k\right)\right\}. \tag{55}$$
As was shown above, the computational scheme (53)–(55) with $V$ defined in (52) has the distributed form. The algorithm stops when $\|x^k - x^{k+1}\| \le \varepsilon$, where $\varepsilon > 0$ is the tolerance. The following strategy for choosing the step size $\rho$ was used. We set the initial value $\rho = 1$. If, during the iterations, the deviation $\hat h(x^k)$ becomes large, for example, greater than a practically chosen value $\bar h$, then $\rho$ is halved, $\rho = \rho / 2$, and the algorithm restarts from the initial point $(x^0, z^0, \mu^0)$. The reason for such a restart is the following. If the deviation $\hat h(x^k)$ is large, then the point $x^k$ is too far from the feasible domain, in contrast to the starting point $x^0$, which is assumed to be chosen close enough to the feasible domain. The first and main reason for using this algorithm is that it provides a minimization procedure in the non-convex case. The second reason is the following: in some neighborhood of a minimum point, the Lagrange function is usually locally convex, so if the algorithm reaches this neighborhood, it finds this minimum point.
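To make the distributed form of one Arrow–Hurwicz iteration concrete, the following sketch writes the three updates per agent, so that each agent uses only its own data and the $z$ and $\mu$ values of its neighbors. It is a simplified illustration under stated assumptions, not the authors' code: every agent has a scalar variable and a single inequality function, the multiplier update uses $\hat h(x^k) + \hat L z^k$ (the gradient of the Lagrange function in $\mu$), and the function name `arrow_hurwicz_step`, the two-agent graph and the sample data are hypothetical.

```python
# Illustrative sketch of one synchronous Arrow-Hurwicz step in per-agent form.
# Assumptions: scalar x^i, one inequality function h^i per agent (m_hat = 1).
def arrow_hurwicz_step(x, z, mu, rho, grad_f, h, dh, adj):
    """Return (x, z, mu) after one step of the primal-dual updates."""
    n = len(x)
    new_x, new_z, new_mu = [], [], []
    for i in range(n):
        lap_mu = sum(mu[i] - mu[s] for s in adj[i])   # (L mu)_i from neighbours
        lap_z = sum(z[i] - z[s] for s in adj[i])      # (L z)_i from neighbours
        # Primal step in x: local gradient of f^i plus mu^i times dh^i/dx.
        new_x.append(x[i] - rho * (grad_f[i](x[i]) + dh[i](x[i]) * mu[i]))
        # Primal step in z: dV/dz_i = (L mu)_i.
        new_z.append(z[i] - rho * lap_mu)
        # Projected dual step: keep the multiplier non-negative.
        new_mu.append(max(0.0, mu[i] + rho * (h[i](x[i]) + lap_z)))
    return new_x, new_z, new_mu


# Hypothetical two-agent data: f^i(x) = (x - c_i)^2 with c = (1, 3), h^i(x) = x - 1.
ADJ2 = {0: [1], 1: [0]}
grad_f = [lambda v: 2.0 * (v - 1.0), lambda v: 2.0 * (v - 3.0)]
h = [lambda v: v - 1.0, lambda v: v - 1.0]
dh = [lambda v: 1.0, lambda v: 1.0]

x1, z1, mu1 = arrow_hurwicz_step([0.0, 0.0], [0.0, 0.0], [1.0, 0.0],
                                 0.5, grad_f, h, dh, ADJ2)
```

A driver implementing the restart rule described above would call this function in a loop, halve `rho` and restart from the initial point whenever the constraint deviation exceeds the chosen threshold; no convergence claim is made for this toy data.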
Example 3.
Problem (51) has the following components: N = 4 , m ^ = 2 , n 1 = 2 , n 2 = 1 , n 3 = 3 , n 4 = 2 , f 1 ( x 1 1 , x 2 1 ) = ( x 1 1 ) 2 ( x 2 1 ) 2 , f 2 ( x 1 2 ) = 2 sin ( ( x 1 2 3 ) x 1 2 ) , f 3 ( x 1 3 , x 2 3 , x 3 3 ) = ( x 1 3 ) 2 + ( x 2 3 ) 2 + ( x 3 3 ) 2 , f 4 ( x 1 4 , x 2 4 ) = ( x 1 4 x 2 4 1 ) 2 , h 1 1 ( x 1 , x 2 1 ) = x 1 1 + x 2 1 3 , h 2 1 ( x 1 , x 2 1 ) = ( x 1 1 ) 2 + ( x 2 1 ) 2 5 , h 1 2 ( x 1 2 ) = ( x 1 2 ) 2 4 , h 1 3 ( x 1 3 , x 2 3 , x 3 3 ) = x 1 3 + x 2 3 + x 3 3 3 , h 2 3 ( x 1 3 , x 2 3 , x 3 3 ) = x 3 3 2 , h 1 4 ( x 1 4 , x 2 4 ) = x 4 1 x 2 4 , h 2 4 ( x 1 4 , x 2 4 ) = ( x 1 4 ) 2 + ( x 2 4 ) 2 1 .
The interpretation of the agent network is shown in Figure 2. The Laplacian matrix is
$$L = \begin{pmatrix} 2 & -1 & 0 & -1 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ -1 & 0 & -1 & 2 \end{pmatrix}.$$
All components of the starting vectors x 0 , z 0 , μ 0 were equal to 2. Tolerance ε = 0.001 . After 69 iterations, the algorithm determined point x 69 = ( 0.162 , 0.162 , 0.676 , 0 , 0 , 0 , 0.119 , 1.119 ) , which happened to be optimal, y 69 = ( 3.108 , 3.296 , 2.525 , 2.466 , 1.940 , 1.446 , 0.427 , 0.791 ) , F ( x 69 ) = 1.999 .

8. Conclusions

A novel approach to the decentralized solution of non-convex optimization problems has been proposed. It is based on reformulating the optimization problem in a specific form that allows the distributed implementation of modified gradient descent and Newton’s methods. The main strength of the modified Newton’s method is that it requires the same number of oracle calls as the standard Newton’s method applied to the initial problem formulation. Thus, when oracle calls are more expensive than communication between agents, the transition from the centralized to the distributed paradigm does not significantly affect computational time. Moreover, if Newton’s method converges quadratically on the centralized problem, the modified decentralized algorithm retains the same convergence rate.
Such properties of the proposed approach are extremely useful in solving optimization problems that arise when automating the design of decision support systems and digital twins based on a holistic approach that uses mathematical and machine learning models in conjunction with human expert knowledge [5], especially in the context of ubiquitous computing and digitalization [1].

Author Contributions

Conceptualization, O.O.K., O.V.K. and E.S.S.; methodology, O.O.K., O.V.K. and E.S.S.; validation, O.O.K., O.V.K., T.D.G. and E.S.S.; formal analysis, O.V.K., T.D.G. and E.S.S.; investigation, E.S.S.; writing—original draft preparation, O.O.K. and O.V.K.; writing—review and editing, O.V.K., T.D.G. and E.S.S.; supervision, E.S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ganchev, T.D. Chapter 8—Ubiquitous computing and biodiversity monitoring. In Advances in Ubiquitous Computing; Neustein, A., Ed.; Advances in Ubiquitous Sensing Applications for Healthcare; Academic Press: Cambridge, MA, USA, 2020; pp. 239–259.
  2. Ganchev, T.; Markova, V.; Valcheva-Georgieva, I.; Dobrev, I. Assessment of pollution with heavy metals and petroleum products in the sediments of Varna Lake. Rev. Bulg. Geol. Soc. 2022, 83, 3–9.
  3. Nandal, A.; Zhou, L.; Dhaka, A.; Ganchev, T.; Nait-Abdesselam, F. Machine Learning in Medical Imaging and Computer Vision; IET: London, UK, 2024.
  4. Markova, V.; Ganchev, T.; Filkova, S.; Markov, M. MMD-MSD: A Multimodal Multisensory Dataset in Support of Research and Technology Development for Musculoskeletal Disorders. Algorithms 2024, 17, 187.
  5. Semenkin, E. Computational Intelligence Algorithms based Comprehensive Human Expert and Data driven Model Mining for the Control, Optimization and Design of Complicated Systems. Int. J. Inf. Technol. Secur. 2019, 11, 63–66.
  6. Semenkin, E.; Semenkina, M. Artificial neural networks design with self-configuring genetic programming algorithm. In Proceedings of the Bioinspired Optimization Methods and Their Applications, Maribor, Slovenia, 17–18 November 2012; pp. 291–300.
  7. Akhmedova, S.; Semenkina, M.; Stanovov, V.; Semenkin, E. Semi-supervised Data Mining Tool Design with Self-tuning Optimization Techniques. In Proceedings of the Informatics in Control, Automation and Robotics: 14th International Conference, ICINCO 2017, Madrid, Spain, 26–28 July 2017; Revised Selected Papers. Springer: Berlin/Heidelberg, Germany, 2020; pp. 87–105.
  8. Sherstnev, P.; Semenkin, E. Application of evolutionary algorithms for the design of interpretable machine learning models in classification problems. Control Syst. Inf. Technol. 2022, 22, 17–20.
  9. Vakhnin, A.; Sopov, E.; Semenkin, E. On Improving Adaptive Problem Decomposition Using Differential Evolution for Large-Scale Optimization Problems. Mathematics 2022, 10, 4297.
  10. Krutikov, V.; Gutova, S.; Tovbis, E.; Kazakovtsev, L.; Semenkin, E. Relaxation Subgradient Algorithms with Machine Learning Procedures. Mathematics 2022, 10, 3959.
  11. Nedic, A.; Ozdaglar, A. Distributed Subgradient Methods for Multi-Agent Optimization. IEEE Trans. Autom. Control 2009, 54, 48–61.
  12. Metelev, D.; Beznosikov, A.; Rogozin, A.; Gasnikov, A.; Proskurnikov, A. Decentralized optimization over slowly time-varying graphs: Algorithms and lower bounds. Comput. Manag. Sci. 2024, 21, 8.
  13. Nedić, A.; Olshevsky, A.; Shi, W. Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs. SIAM J. Optim. 2017, 27, 2597–2633.
  14. Tang, Y.; Deng, Z.; Hong, Y. Optimal Output Consensus of High-Order Multiagent Systems With Embedded Technique. IEEE Trans. Cybern. 2019, 49, 1768–1779.
  15. Jadbabaie, A.; Lin, J.; Morse, A. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Trans. Autom. Control 2003, 48, 988–1001.
  16. Luan, X.; Schutter, B.; Meng, L.; Corman, F. Decomposition and distributed optimization of real-time traffic management for large-scale railway networks. Transp. Res. Part B Methodol. 2020, 141, 72–97.
  17. Luan, X.; Schutter, B.; van den Boom, T.; Corman, F.; Lodewijks, G. Distributed optimization for real-time railway traffic management. IFAC-PapersOnLine 2018, 51, 106–111.
  18. Khamisov, O.O. Direct disturbance based decentralized frequency control for power systems. In Proceedings of the 2017 IEEE 56th Annual Conference on Decision and Control (CDC), Melbourne, Australia, 12–15 December 2017; pp. 3271–3276.
  19. Khamisov, O.O.; Chernova, T.; Bialek, J.W. Comparison of two schemes for closed-loop decentralized frequency control and overload alleviation. In Proceedings of the 2019 IEEE Milan PowerTech, Milan, Italy, 23–27 June 2019; pp. 1–6.
  20. Motta, V.N.; Anjos, M.; Gendreau, M. Optimal allocation of demand response considering transmission system congestion. Comput. Manag. Sci. 2020, 20, 2023.
  21. Rabbat, M.; Nowak, R. Distributed optimization in sensor networks. In Proceedings of the Third International Symposium on Information Processing in Sensor Networks, IPSN 2004, Berkeley, CA, USA, 26–27 April 2004; pp. 20–27.
  22. Sadiev, A.; Borodich, E.; Beznosikov, A.; Dvinskikh, D.; Chezhegov, S.; Tappenden, R.; Takac, M.; Gasnikov, A. Decentralized personalized federated learning: Lower bounds and optimal algorithm for all personalization modes. Euro J. Comput. Optim. 2022, 10, 100041.
  23. Forero, P.A.; Cano, A.; Giannakis, G.B. Consensus-Based Distributed Support Vector Machines. J. Mach. Learn. Res. 2010, 11, 1663–1707.
  24. Gorbunov, E.; Rogozin, A.; Beznosikov, A.; Dvinskikh, D.; Gasnikov, A. Recent Theoretical Advances in Decentralized Distributed Convex Optimization. In High-Dimensional Optimization and Probability: With a View Towards Data Science; Nikeghbali, A., Pardalos, P.M., Raigorodskii, A.M., Rassias, M.T., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 253–325.
  25. Kovalev, D.; Salim, A.; Richtarik, P. Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: New York, NY, USA, 2020; Volume 33, pp. 18342–18352.
  26. Kovalev, D.; Shulgin, E.; Richtarik, P.; Rogozin, A.V.; Gasnikov, A. ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks. In Proceedings of the 38th International Conference on Machine Learning, Virtual Event, 18–24 July 2021; Meila, M., Zhang, T., Eds.; PMLR: Westminster, UK, 2021; Volume 139, Proceedings of Machine Learning Research. pp. 5784–5793.
  27. Wang, Z.; Ong, C.J.; Hong, G.S. Distributed Model Predictive Control of linear discrete-time systems with coupled constraints. In Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC), Las Vegas, NV, USA, 12–14 December 2016; pp. 5226–5231.
  28. Erseghe, T. Distributed Optimal Power Flow Using ADMM. IEEE Trans. Power Syst. 2014, 29, 2370–2380.
  29. Yarmoshik, D.; Rogozin, A.; Khamisov, O.O.; Dvurechensky, P.; Gasnikov, A. Decentralized Convex Optimization Under Affine Constraints for Power Systems Control. In Proceedings of the 21st International Conference Mathematical Optimization Theory and Operations Research, Petrozavodsk, Russia, 2–6 July 2022; Pardalos, P., Khachay, M., Mazalov, V., Eds.; Springer: Cham, Switzerland, 2022; pp. 62–75.
  30. Khamisov, O.O. Distributed continuous-time optimization for convex problems with coupling linear inequality constraints. Math. Optim. Theory Oper. Res. 2024, 21, 1619–6988.
  31. Stomberg, G.; Engelmann, A.; Faulwasser, T. Decentralized non-convex optimization via bi-level SQP and ADMM. In Proceedings of the 2022 IEEE 61st Conference on Decision and Control (CDC), Cancun, Mexico, 6–9 December 2022; pp. 273–278.
  32. Hours, J.H.; Jones, C.N. A Parametric Nonconvex Decomposition Algorithm for Real-Time and Distributed NMPC. IEEE Trans. Autom. Control 2016, 61, 287–302.
  33. Zhao, C.; Topcu, U.; Li, N.; Low, S. Design and Stability of Load-Side Primary Frequency Control in Power Systems. IEEE Trans. Autom. Control 2014, 59, 1177–1189.
  34. Shores, T. Applied Linear Algebra and Matrix Analysis; Springer: New York, NY, USA, 2007.
  35. Evtushenko, Y. Numerical Optimization Technique; Springer: New York, NY, USA, 1985.
  36. Hadley, G. Nonlinear and Dynamic Programming; Addison-Wesley Publishing Company Inc.: Boston, MA, USA, 1964.
  37. Izmailov, A.F.; Solodov, M.V. Newton-Type Methods for Optimization and Variational Problems; Springer: Cham, Switzerland, 2014.
  38. Minoux, M. Programmation Mathématique: Théorie et Algorithmes; Dunod: Paris, France, 1983.
Figure 1. The agent network with information available at each node.
Figure 2. The agent network with information available at each node.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
