Article

Generalized Benders Decomposition Method to Solve Big Mixed-Integer Nonlinear Optimization Problems with Convex Objective and Constraints Functions

by
Andrzej Karbowski
Research and Academic Computer Network NASK—National Research Institute, ul. Kolska 12, 01-045 Warsaw, Poland
Energies 2021, 14(20), 6503; https://doi.org/10.3390/en14206503
Submission received: 9 August 2021 / Revised: 28 September 2021 / Accepted: 30 September 2021 / Published: 11 October 2021

Abstract

The paper presents the Generalized Benders Decomposition (GBD) method, which is now one of the basic approaches to solving big mixed-integer nonlinear optimization problems. It concentrates on the basic formulation, with convex objective and constraint functions. Apart from the classical projection and representation theorems, a unified formulation of the master problem with nonlinear and linear cuts will be given. For the latter case, the most effective and, at the same time, easiest to implement computational algorithms will be pointed out.

1. Introduction

In the early 1960s, a Dutch mathematician Jacques F. Benders was considering optimization problems of the form [1]:
$$\max_{x,v}\; c^T x + f(v) \tag{1}$$
subject to the following constraints (hereinafter, the abbreviation “s.t.” will be used):
$$A x + F(v) \le b, \quad b \in \mathbb{R}^m \tag{2}$$
$$x \in X \subseteq \mathbb{R}^n, \quad v \in V \subseteq \mathbb{R}^q \tag{3}$$
He called them mixed-variables programming problems. In these problems, both the objective function and the functions defining the constraints were sums of two components: a linear one, dependent on the vector variable $x \in X$, and a nonlinear one, dependent on the vector variable $v \in V$. Benders called v a vector of complicating variables, because when they were fixed, the problem simplified: it became linear. In addition, he allowed these variables to be discrete; the problem with respect to the x vector alone remained continuous and was very easy to solve with any linear programming solver (e.g., one based on the simplex algorithm).
In his work [1] Benders proposed an iterative solution procedure for this optimization problem, consisting in solving alternately auxiliary problems with respect to one or the other vector of variables. These problems were related to the dual representation of the initial problem and to optimality conditions. We will call the problem with respect to the complicating variables v the master problem, and the problem with respect to the x variables the primal problem. In its subsequent launches, the primal problem provided the master problem with constraints related to the approximation of the objective function or of the feasible set, computed at different points of the X set. The former were added when, for a given trial vector v, there existed a solution admissible in the initial problem; the latter were added otherwise.
The generalization of this approach to nonlinear problems was presented by Arthur Geoffrion [2]. He called it the Generalized Benders Decomposition (GBD). As in the original work by Benders, the constraints in the master problem were defined with the help of Lagrangians of the respective primal problems. The objective values at the optimal points, obtained after each iteration of the master problem and of the primal problem (when it was feasible), were used to estimate the optimal objective of the initial problem. In the case of a minimization problem, the master problem provided its estimate from below, LBD (Lower BounD), and the primal problem from above, UBD (Upper BounD). When the primal problem was infeasible, the so-called feasibility problem, consisting in the minimization of the maximum exceedance of the constraints, was solved. The stop test consisted in checking whether the upper and the lower estimates of the objective function were equal with a given accuracy.
Geoffrion’s approach in a slightly modified version was presented in the works of Floudas [3,4], which assumed that the v variables are binary and supplemented the constraints of the problem with equality ones. Unfortunately, the latter are not relaxed in the feasibility problem (as the inequality constraints are), which can cause this approach to reach a deadlock when, for a trial point selected from the set V, some of these constraints cannot be satisfied by any $x \in X$. Floudas provided several versions of the computing algorithm, but they concerned the same specific problems and were actually one algorithm written in different ways.
Unfortunately, neither the works of Floudas [3,4] nor Geoffrion’s source work [2] presented the version of fundamental importance for problems convex with respect to the entire vector $(x, v)$, namely the one with linear cuts in the master problem. Yet it is the easiest version to implement; moreover, it was mentioned already by Benders [1], and in more recent works on mixed integer nonlinear programming [5,6] it is most often presented as the only one.
Deficiencies of the above-mentioned presentations of the Benders method and the lack of a coherent and uniform description of this approach convinced the author of the need to put this knowledge in order, which resulted in this work. The importance of this approach is growing as more and more practical problems of mixed-integer nonlinear programming are solved, related to, e.g., energy [7], telecommunications [8], transport [9,10], gas networks [11], production processes [12], water systems [13], two-stage nonlinear stochastic control [14,15]. When these problems exceed a certain size, even the best commercial solvers fail [14]. It is necessary to write your own specialized code, using structural properties of the problem. That is what the Benders method is great for. It is also a kernel of many complicated algorithms solving optimization problems with nonconvex functions, especially in the field of chemical engineering [16,17,18,19,20].
Generalized Benders Decomposition is not the only decomposition method to solve MINLP problems. The most popular other classical approaches are: Lagrangian relaxation, branch and bound, column generation, outer approximation [6,21]. Recently, the Alternating Directions Multiplier Method has been gaining a lot of popularity [22,23].

2. Problem Formulation

Let us consider an optimization problem of the following form:
$$\min_{x,v} f(x, v) \tag{4}$$
s.t.
$$g(x, v) \le 0 \tag{5}$$
$$x \in X \subseteq \mathbb{R}^n \tag{6}$$
$$v \in V \subseteq \mathbb{R}^q \tag{7}$$
where
$$f: \mathbb{R}^n \times \mathbb{R}^q \to \mathbb{R}, \qquad g: \mathbb{R}^n \times \mathbb{R}^q \to \mathbb{R}^m$$
The possible equality constraints
$$h_j(x, v) = 0, \quad j = 1,\ldots,s \tag{8}$$
can be taken into account using standard transformations, e.g.,
$$h_1(x, v) \le 0,\;\; \ldots,\;\; h_s(x, v) \le 0,\;\; \sum_{j=1}^{s} h_j(x, v) \ge 0 \tag{9}$$
or
$$h_j^2(x, v) \le 0, \quad j = 1,\ldots,s \tag{10}$$
For the sake of simplicity, the equality constraints will be further omitted. We will mention them in Section 8, suggesting a specific way to treat them in computational algorithms.
Assumptions about the functions and sets appearing in the formulation will be given in the following statements. For now, we will only assume that they are such that a solution of the problem (4)–(7) exists. It is only worth saying something about the V set: it is supposed to be contained in $\mathbb{R}^q$. Therefore, this formulation also covers MINLP (Mixed Integer NonLinear Programming) problems, that is, mixed continuous–discrete nonlinear problems, where $V \subseteq \mathbb{Z}^q$, with $\mathbb{Z}$ denoting the set of all integers.
It is suggested to define the vector v of complicating variables in such a way that, when it is fixed, one of the following occurs:
  • A problem can be decomposed into a number of independent subproblems, each using a different subproblem vector $x_i$, $i = 1,\ldots,p$:
    $$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} \tag{11}$$
    Most often, the total objective function is additive:
    $$\min_{x,v} f(x, v) = \min_{x_1, x_2, \ldots, x_p,\, v}\; \sum_{i=1}^{p} f_i(x_i, v) + f_0(v) \tag{12}$$
    s.t.
    $$g_{ij}(x_i, v) \le 0, \quad j = 1,\ldots,m_i;\; i = 1,\ldots,p \tag{13}$$
    $$x_i \in X_i \subseteq \mathbb{R}^{n_i}, \quad i = 1,\ldots,p, \qquad v \in V \subseteq \mathbb{R}^q, \qquad \sum_{i=1}^{p} n_i = n \tag{14}$$
    In this case, it is worthwhile to use not only decomposition, but also a parallel solution to the resulting primal subproblems.
  • The problem with respect to x variables takes a specific structure for which efficient algorithms exist—e.g., it is linear or quadratic.
  • The problem is convex with respect to x with v fixed and vice versa, although it is non-convex when considered as a function of the concatenated vector variable $(x, v)$.
    Problems that are convex with respect to one subvector when the other one is fixed, for example bilinear problems, are formulated very often in engineering [12,24].

3. Examples of Optimization Problems That Can Be Solved with the Benders Method

  • Problem with a separable objective function and separable constraint functions
    $$\min_{y_1,\ldots,y_6}\; 2y_1^2 - y_1 y_2 + 4y_2^2 + (y_3 - 4)^2 + (y_4 - 3)^2 + 8y_5^2 + y_6^2 - 3y_5 \tag{15}$$
    s.t.
    $$y_1 \ge 1, \quad y_2 \ge 0, \quad y_3 \ge 3, \quad y_4 \ge 2, \quad y_5 \ge 3, \quad y_6 \ge 0 \tag{16}$$
    $$y_1 + y_2 + y_3^2 + y_4^2 + y_5 + y_6^2 \le 50 \tag{17}$$
    In problems where both the objective function and the constraint functions are sums of functions dependent on subvectors of decision variables, to obtain the standard form (12)–(14) one needs to group the variables into subvectors accordingly and to introduce additional v variables separating the constraints.
    Let us assume p = 3 and denote:
    $$x_1 = \begin{bmatrix} x_{1,1} \\ x_{1,2} \end{bmatrix} \equiv \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \qquad x_2 = \begin{bmatrix} x_{2,1} \\ x_{2,2} \end{bmatrix} \equiv \begin{bmatrix} y_3 \\ y_4 \end{bmatrix}, \qquad x_3 = \begin{bmatrix} x_{3,1} \\ x_{3,2} \end{bmatrix} \equiv \begin{bmatrix} y_5 \\ y_6 \end{bmatrix}$$
    Now:
    $$f_1(x_1) = 2x_{1,1}^2 - x_{1,1} x_{1,2} + 4x_{1,2}^2$$
    $$X_1 = \{[x_{1,1}, x_{1,2}] \in \mathbb{R}^2 : x_{1,1} \ge 1,\; x_{1,2} \ge 0\}$$
    $$f_2(x_2) = (x_{2,1} - 4)^2 + (x_{2,2} - 3)^2$$
    $$X_2 = \{[x_{2,1}, x_{2,2}] \in \mathbb{R}^2 : x_{2,1} \ge 3,\; x_{2,2} \ge 2\}$$
    $$f_3(x_3) = 8x_{3,1}^2 + x_{3,2}^2 - 3x_{3,1}$$
    $$X_3 = \{[x_{3,1}, x_{3,2}] \in \mathbb{R}^2 : x_{3,1} \ge 3,\; x_{3,2} \ge 0\}$$
    $$f_0(v) \equiv 0$$
    Let us also introduce artificial variables $v_1, v_2, v_3$ for the components of the cumulative constraint (17) that involve, respectively, the primal subvectors $x_1$, $x_2$ and $x_3$.
    $$x_{1,1} + x_{1,2} \le v_1$$
    $$x_{2,1}^2 + x_{2,2}^2 \le v_2$$
    $$x_{3,1} + x_{3,2}^2 \le v_3$$
    We obtain the following constraint functions in the problem (12)–(14) format:
    $$g_{11}(x_1, v) = x_{1,1} + x_{1,2} - v_1$$
    $$g_{21}(x_2, v) = x_{2,1}^2 + x_{2,2}^2 - v_2$$
    $$g_{31}(x_3, v) = x_{3,1} + x_{3,2}^2 - v_3$$
    as well as the V set:
    $$V = \{[v_1, v_2, v_3] \in \mathbb{R}^3 : v_1 + v_2 + v_3 \le 50\}$$
  • Problem with a chain (ring) of constraints.
    In problems of this type, successive constraints bind several subsequent decision variables, i.e., every variable appears in several constraints (at least two) together with the neighboring variables with a lower and a higher index. This structure is quite typical for optimal control problems after the discretization of differential equations over time and the replacement of the resulting system of s nonlinear equations with s + 1 inequality constraints using the transformation (9). The idea for splitting such a problem and obtaining the standard form (12)–(14) is to treat some variables in the vector (e.g., those which divide it into several equal parts) as complicating variables $v_i$, and the constraints in which these variables appear as mixed constraints of the type (13).
    Consider the following problem:
    $$\min_{y \in \mathbb{R}^9}\; y_1^2 + y_2^2 + \cdots + y_9^2$$
    s.t.
    $$y_{k+1} - y_k \le \sin k, \quad k = 1,\ldots,8$$
    $$y_1 - y_9 \le 0.5$$
    Let us denote:
    $$v_1 \equiv y_3, \quad v_2 \equiv y_6, \quad v_3 \equiv y_9$$
    The remaining elements will form subvectors of dimension 2. We will denote them as $x_1$, $x_2$, $x_3$, that is:
    $$x_1 = \begin{bmatrix} x_{1,1} \\ x_{1,2} \end{bmatrix} \equiv \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \qquad x_2 = \begin{bmatrix} x_{2,1} \\ x_{2,2} \end{bmatrix} \equiv \begin{bmatrix} y_4 \\ y_5 \end{bmatrix}, \qquad x_3 = \begin{bmatrix} x_{3,1} \\ x_{3,2} \end{bmatrix} \equiv \begin{bmatrix} y_7 \\ y_8 \end{bmatrix}$$
    This problem can be presented in the format (12)–(14) by assuming (a small numerical check of this splitting is sketched right after this list):
    $$f_1(x_1) = x_{1,1}^2 + x_{1,2}^2$$
    $$X_1 = \{[x_{1,1}, x_{1,2}] \in \mathbb{R}^2 : x_{1,2} - x_{1,1} \le \sin 1\}$$
    $$g_{11}(x_1, v) = x_{1,1} - v_3 - 0.5$$
    $$g_{12}(x_1, v) = v_1 - x_{1,2} - \sin 2$$
    $$f_2(x_2) = x_{2,1}^2 + x_{2,2}^2$$
    $$X_2 = \{[x_{2,1}, x_{2,2}] \in \mathbb{R}^2 : x_{2,2} - x_{2,1} \le \sin 4\}$$
    $$g_{21}(x_2, v) = x_{2,1} - v_1 - \sin 3$$
    $$g_{22}(x_2, v) = v_2 - x_{2,2} - \sin 5$$
    $$f_3(x_3) = x_{3,1}^2 + x_{3,2}^2$$
    $$X_3 = \{[x_{3,1}, x_{3,2}] \in \mathbb{R}^2 : x_{3,2} - x_{3,1} \le \sin 7\}$$
    $$g_{31}(x_3, v) = x_{3,1} - v_2 - \sin 6$$
    $$g_{32}(x_3, v) = v_3 - x_{3,2} - \sin 8$$
    $$f_0(v) = \sum_{i=1}^{3} v_i^2$$
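To make the above splitting concrete, the fragment below is a minimal numerical sketch (not part of the original paper): it fixes an arbitrary trial vector of complicating variables $v = (v_1, v_2, v_3) = (y_3, y_6, y_9)$, chosen so that all subproblems are feasible, and solves the three resulting independent two-variable primal subproblems with SciPy; the constraint directions follow the reconstruction given above.
```python
# Minimal numerical sketch (not part of the original paper): for a fixed trial vector of
# complicating variables v = (y3, y6, y9), the primal problem of this example splits into
# three independent two-variable subproblems, which could be solved in parallel.
import numpy as np
from scipy.optimize import minimize

sin = np.sin
v = np.array([0.0, -1.6, -0.5])      # arbitrary trial point, chosen so that all subproblems are feasible

def solve_subproblem(cons):
    """min x1^2 + x2^2 subject to the given inequalities, written as c(x) >= 0 for SLSQP."""
    return minimize(lambda x: x[0]**2 + x[1]**2, x0=np.zeros(2),
                    constraints=[{'type': 'ineq', 'fun': c} for c in cons],
                    method='SLSQP')

sub1 = [lambda x: sin(1) - (x[1] - x[0]),     # set X1: x_{1,2} - x_{1,1} <= sin 1
        lambda x: -(x[0] - v[2] - 0.5),       # g_11(x_1, v) <= 0
        lambda x: -(v[0] - x[1] - sin(2))]    # g_12(x_1, v) <= 0
sub2 = [lambda x: sin(4) - (x[1] - x[0]),     # set X2
        lambda x: -(x[0] - v[0] - sin(3)),    # g_21(x_2, v) <= 0
        lambda x: -(v[1] - x[1] - sin(5))]    # g_22(x_2, v) <= 0
sub3 = [lambda x: sin(7) - (x[1] - x[0]),     # set X3
        lambda x: -(x[0] - v[1] - sin(6)),    # g_31(x_3, v) <= 0
        lambda x: -(v[2] - x[1] - sin(8))]    # g_32(x_3, v) <= 0

results = [solve_subproblem(c) for c in (sub1, sub2, sub3)]
z_v = np.sum(v**2) + sum(r.fun for r in results)   # f_0(v) + sum_i f_i(x_i*), i.e., z(v) in (48)
print("subproblem solutions:", [r.x.round(3) for r in results])
print("primal value z(v) =", z_v)
```
The three calls are independent, which is exactly the parallelization opportunity mentioned in Section 2; in the GBD loop of Section 7, the value z(v) and the subproblem multipliers would then be used to build the cuts.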

4. Decomposition

The problem (4)–(7) can be represented as follows [2]:
$$\min_{v \in V}\, \inf_{x \in X} f(x, v) \tag{46}$$
s.t.
$$g(x, v) \le 0 \tag{47}$$
The infimum in the internal problem appeared due to the fact that for some v it can be unbounded.
Let us denote
$$z(v) \triangleq \inf_{x \in X} f(x, v) \tag{48}$$
s.t.
$$g(x, v) \le 0 \tag{49}$$
We will call the problem (48)–(49), solved for a fixed v, the primal problem.
The problem (46)–(47) can now be written as
$$\min_{v \in V \cap V_0} z(v) \tag{50}$$
where
$$V_0 = \{v : \exists\, x \in X,\; g(x, v) \le 0\}$$
and interpreted as a projection of the problem (46)–(47) onto the space of the variable v [2]. The formulation (50) is called the master problem.
The requirement for v, defined in (50) as $v \in V \cap V_0$, results from the necessity to guarantee the existence of a solution, that is, of the value $z(v)$. The set $V_0$ is called the solvability set.
The problem is that we know $z(v)$ and $V_0$ only indirectly, through their generic definitions.
The following theorems are valid [2]:
Theorem 1
(Projection).
  • The problem (4)–(7) has no solution or is unbounded if and only if the same is true for the problem (50).
  • If $(\hat x, \hat v)$ is the optimal solution of the problem (4)–(7), then $\hat v$ is the optimal solution of the problem (50).
  • If $\hat v$ is the optimal solution of the problem (50) and $\hat x$ attains the infimum in the problem (48)–(49) at $v = \hat v$, then $(\hat x, \hat v)$ is the optimal solution of the problem (4)–(7).
Theorem 2
(Representation of V 0 ).
Assume that X is a nonempty convex set and that the function g is convex on X for each fixed $v \in V$.
Suppose further that the set
$$Z_v = \{z \in \mathbb{R}^m : \exists\, x \in X,\; g(x, v) \le z\}$$
is closed for each fixed $v \in V$. Then a point $v^* \in V$ also belongs to the set $V_0$ if and only if:
$$\inf_{x \in X} L_f(x, v^*, \lambda) \le 0, \quad \forall \lambda \in \Lambda \tag{52}$$
where
$$\Lambda = \Big\{\lambda \in \mathbb{R}^m : \lambda \ge 0,\; \sum_{j=1}^{m} \lambda_j = 1\Big\} \tag{53}$$
and
$$L_f(x, v, \lambda) = \lambda^T g(x, v) \tag{54}$$
Theorem 3
(Representation of z ( v ) ).
Assume that X is a nonempty convex set and that the functions f and g are convex on X for each fixed $v = v^* \in V$. Assume further that for $v^*$ at least one of the following conditions is met:
  • $z(v^*)$ is finite and in the problem (48)–(49) there exists an optimal vector of Lagrange multipliers;
  • $z(v^*)$ is finite, $g(x, v^*)$ and $f(x, v^*)$ are continuous on X, the set X is closed, and the set of ε-optimal solutions of the problem (48)–(49) is nonempty and bounded for some $\varepsilon \ge 0$.
Then
$$z(v) = \sup_{\lambda \ge 0}\, \inf_{x \in X} L_o(x, v, \lambda), \quad \forall v \in V \cap V_0 \tag{55}$$
where
$$L_o(x, v, \lambda) = f(x, v) + \lambda^T g(x, v) \tag{56}$$
The last theorem results directly from the strong duality theorem [25].
Substituting in the problem (50) the expression (55) for $z(v)$ and (52) for the $v \in V_0$ constraint, we obtain an equivalent problem:
$$\min_{v \in V}\, \sup_{\lambda \ge 0}\, \inf_{x \in X} L_o(x, v, \lambda) \tag{57}$$
s.t.
$$\inf_{x \in X} L_f(x, v, \lambda) \le 0, \quad \forall \lambda \in \Lambda \tag{58}$$
Using the definition of the supremum as the least upper bound and introducing an additional scalar variable $\mu$, we obtain the following form of the master problem, equivalent to (57)–(58):
$$\min_{v \in V,\, \mu} \mu \tag{59}$$
s.t.
$$\inf_{x \in X} L_o(x, v, \lambda) \le \mu, \quad \forall \lambda \ge 0 \tag{60}$$
$$\inf_{x \in X} L_f(x, v, \lambda) \le 0, \quad \forall \lambda \in \Lambda \tag{61}$$
In practice, it can be assumed that the function $z(v)$ (see formula (48)) is bounded for all $v \in V$, the set X is compact, and the functions $f(x, v)$ and $g(x, v)$ are continuous throughout the domain. Therefore, we can replace the infimum with the minimum. Then the master problem will take the following form:
$$\min_{v \in V,\, \mu} \mu \tag{62}$$
s.t.
$$\min_{x \in X} L_o(x, v, \lambda) \le \mu, \quad \forall \lambda \ge 0 \tag{63}$$
$$\min_{x \in X} L_f(x, v, \lambda) \le 0, \quad \forall \lambda \in \Lambda \tag{64}$$
where the $L_f$ and $L_o$ functions are given by the formulas (54) and (56), respectively.
The problem (62)–(64) is very difficult to solve due to the constraints, which have to be satisfied at an infinite and even uncountable number of points (for all λ with nonnegative coordinates, or nonnegative and summing up to unity), and due to the presence of internal optimization subproblems (minimization with respect to x).
These difficulties are overcome by relaxing this problem, more precisely by solving it with the use of successive, more and more precise approximations of the functions on the left-hand side of the constraints (63) and (64). They will be made up of pieces of the $L_o$ or $L_f$ functions related to the optimal solutions of the primal problem, depending on whether the next trial point $v^*$ from the master problem belongs to $V_0$ or not.

5. Basic Properties of the Primal Problem

The primal problem is the initial problem (4)–(7) solved for a fixed $v = v^* \in V$:
$$\min_{x \in X} f(x, v^*) \tag{65}$$
s.t.
$$g(x, v^*) \le 0 \tag{66}$$
While solving this problem, there may be two cases: when the primal problem is feasible, i.e., it has a solution, and when it has no solution.

5.1. The Case When the Primal Problem Has a Solution

Suppose that for $v^*$ obtained from the relaxed master problem, the primal problem (65)–(66) has a solution, to which corresponds the optimal vector of Lagrange multipliers $\lambda^*$. These multipliers will be the multipliers of the new constraint of type (63).
So a constraint of type
$$\min_{x \in X} L_o(x, v, \lambda^*) \le \mu \tag{67}$$
should be added to the relaxed master problem.

5.2. The Case When the Primal Problem Has No Solution

A more complicated situation occurs when the primal problem has no solution. According to Theorem 2, for a fixed $v = v^*$ the primal problem has a solution if the following condition is met:
$$\min_{x \in X} L_f(x, v^*, \lambda) = \min_{x \in X} \lambda^T g(x, v^*) \le 0, \quad \forall \lambda \in \Lambda \tag{68}$$
where
$$\Lambda = \Big\{\lambda \in \mathbb{R}^m : \lambda \ge 0,\; \sum_{j=1}^{m} \lambda_j = 1\Big\} \tag{69}$$
If a solver detects that there is no solution of the primal problem, one should find a vector $\hat\lambda \in \Lambda$ for which
$$\min_{x \in X} \hat\lambda^T g(x, v^*) > 0 \tag{70}$$
Then it is taken that $\lambda^* = \hat\lambda$ and the following inequality is added to the relaxed master problem:
$$\min_{x \in X} L_f(x, v, \lambda^*) \le 0 \tag{71}$$
Theorem 4.
If the primal problem (65)–(66) is infeasible, the vector of Lagrange multipliers $\hat\lambda \in \Lambda$ satisfying the condition (70) can be determined by solving the following auxiliary problem (the so-called feasibility problem):
$$\min_{x \in X}\, \max_{j = 1,\ldots,m} g_j(x, v^*) \tag{72}$$
Proof. 
The problem (72) is equivalent to the problem:
$$\min_{x \in X,\, \alpha} \alpha \tag{73}$$
s.t.
$$g_j(x, v^*) \le \alpha, \quad j = 1,\ldots,m \tag{74}$$
It certainly has a solution due to the previously taken assumptions concerning the function $g(x, v)$.
Without loss of generality, let us assume that the set X is defined by inequality constraints $r(x) \le 0$; that is, we have the problem:
$$\min_{x,\, \alpha} \alpha \tag{75}$$
s.t.
$$g_j(x, v^*) \le \alpha, \quad j = 1,\ldots,m \tag{76}$$
$$r_j(x) \le 0, \quad j = 1,\ldots,t \tag{77}$$
We assume that all constraint functions g and r are continuously differentiable and convex, and that the points at which they are active are regular. The Lagrangian for this problem is as follows:
$$L_{\alpha f}(x, \alpha, v^*, \lambda) = \alpha + \sum_{j=1}^{m} \lambda_j^g \big(g_j(x, v^*) - \alpha\big) + \sum_{j=1}^{t} \lambda_j^r\, r_j(x) \tag{78}$$
where $\lambda^g \ge 0$, $\lambda^r \ge 0$ are the vectors of Lagrange multipliers for the constraints (76) and (77), respectively, and $\lambda = (\lambda^g, \lambda^r)$. From the Karush–Kuhn–Tucker optimality conditions for the α variable we obtain the equation:
$$1 - \sum_{j=1}^{m} \hat\lambda_j^g = 0 \tag{79}$$
that is, comparing with the definition of Λ (69), $\hat\lambda^g \in \Lambda$. Taking into account the equality (79) in the Lagrangian formula (78) at the optimal point, as well as the complementary slackness conditions for the constraints (77), we obtain:
$$L_{\alpha f}(x^*, \alpha^*, v^*, \hat\lambda) = \sum_{j=1}^{m} \hat\lambda_j^g\, g_j(x^*, v^*) = L_f(x^*, v^*, \hat\lambda^g) \tag{80}$$
So, the vector of Lagrange multipliers for the constraints (76), calculated at the optimal point of the problem (75)–(77), is the wanted vector $\hat\lambda$. □
Modern optimization solvers usually return such a vector of Lagrange multipliers when they state that a feasible solution does not exist.
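As an illustration of Theorem 4, the following self-contained sketch (an invented toy instance, not taken from the paper) solves the epigraph form (73)–(74) of the feasibility problem for a primal problem whose two constraints cannot hold simultaneously, recovers the vector $\hat\lambda$ from the Karush–Kuhn–Tucker conditions of this tiny instance (a dual-reporting solver would return it directly), and then verifies the condition (70):
```python
# Self-contained toy illustration (not from the paper) of the feasibility problem (72)
# in its epigraph form (73)-(74). The "primal" constraints are invented: the unit disk
# g1 <= 0 and the half-plane g2 <= 0 cannot be satisfied simultaneously.
import numpy as np
from scipy.optimize import minimize

def g(x):
    return np.array([x[0]**2 + x[1]**2 - 1.0,    # g1(x, v*): unit disk
                     2.0 - x[0]])                # g2(x, v*): requires x1 >= 2

# min_{x, alpha} alpha  s.t.  g_j(x) - alpha <= 0  (SLSQP expects constraints as ">= 0").
res = minimize(lambda z: z[2], x0=np.zeros(3),
               constraints=[{'type': 'ineq', 'fun': lambda z, j=j: z[2] - g(z[:2])[j]}
                            for j in range(2)],
               bounds=[(-5, 5), (-5, 5), (None, None)], method='SLSQP')
x_hat, alpha_hat = res.x[:2], res.x[2]
print("minimal maximum violation:", alpha_hat)   # > 0, so this primal problem is infeasible

# For this tiny instance the multipliers of the constraints (76) follow from the KKT
# conditions (1 - lambda1 - lambda2 = 0 and stationarity in x1); a dual-reporting solver
# would return them directly.
lam1 = 1.0 / (1.0 + 2.0 * x_hat[0])
lam_hat = np.array([lam1, 1.0 - lam1])

# Verification of condition (70): min_x lambda_hat^T g(x, v*) > 0, so v* lies outside V0.
chk = minimize(lambda x: lam_hat @ g(x), x0=np.zeros(2),
               bounds=[(-5, 5), (-5, 5)], method='SLSQP')
print("min_x lambda_hat^T g =", chk.fun)         # strictly positive
```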

6. Basic Properties of the Master Problem: Cuts

It would be best if, from the solutions of the primal problem for different $v = v^*$, it was possible to use not only the Lagrange multipliers at the optimal point, $\lambda^*$, but also the optimal vectors of the primal variables, $x^*$. To obtain a solution of the basic problem (4)–(7) using this approach, certain conditions must be met.
The most important is as follows:
Assumption 1.
Both Lagrangians $L_o(x, v, \lambda)$ and $L_f(x, v, \lambda)$ for any $x \in X$, $v \in V$ and $\lambda \ge 0$ can be written as composite functions:
$$L_o(x, v, \lambda) = Q_o\big(w_o(x, \lambda), v, \lambda\big) \tag{81}$$
$$L_f(x, v, \lambda) = Q_f\big(w_f(x, \lambda), v, \lambda\big) \tag{82}$$
where $w_o$, $w_f$ are scalar functions of x and λ, and $Q_o$, $Q_f$ are increasing functions of the first argument and convex functions of the second.
Theorem 5.
Let us suppose that for the problem (4)–(7) Assumption 1 is satisfied. Let $x^*$ be the optimal solution of the problem
$$\min_{x \in X} L_o(x, v^*, \lambda^*) \tag{83}$$
Then also for $v \ne v^*$
$$\min_{x \in X} L_o(x, v, \lambda^*) = L_o(x^*, v, \lambda^*) \tag{84}$$
Proof. 
According to Assumption 1, due to the increasing dependency of the Lagrange function on the first argument, from the formula (81) we have:
$$Q_o\big(w_o(x^*, \lambda^*), v^*, \lambda^*\big) = L_o(x^*, v^*, \lambda^*) = \min_{x \in X} L_o(x, v^*, \lambda^*) = \min_{x \in X} Q_o\big(w_o(x, \lambda^*), v^*, \lambda^*\big) = Q_o\Big(\min_{x \in X} w_o(x, \lambda^*), v^*, \lambda^*\Big) \tag{85}$$
Hence, taking into account the injectivity of strictly monotone functions:
$$w_o(x^*, \lambda^*) = \min_{x \in X} w_o(x, \lambda^*) \tag{86}$$
So, for $v \ne v^*$
$$\min_{x \in X} L_o(x, v, \lambda^*) = \min_{x \in X} Q_o\big(w_o(x, \lambda^*), v, \lambda^*\big) = Q_o\Big(\min_{x \in X} w_o(x, \lambda^*), v, \lambda^*\Big) = Q_o\big(w_o(x^*, \lambda^*), v, \lambda^*\big) = L_o(x^*, v, \lambda^*) \tag{87}$$
Consequently, if Assumption 1 is in effect, the constraint (63) in the master problem can be approximated around $v = v^*$ by the constraint:
$$\mu \ge L_o(x^*, v, \lambda^*) \tag{88}$$
where the vectors $x^*$, $\lambda^*$ come from the solution of the primal problem (65)–(66).
From now on this constraint will be called a cut.
From Assumption 1, yet another convenience follows. Given the convexity of the Lagrange function $L_o$ with respect to v, in the case when it is differentiable with respect to this vector, we obtain:
$$L_o(x^*, v, \lambda^*) \ge L_o(x^*, v^*, \lambda^*) + \nabla_v L_o(x^*, v^*, \lambda^*)^T (v - v^*) \tag{89}$$
In this way, the cut (88) can be relaxed and replaced in the master problem by a linear cut
$$\mu \ge L_o(x^*, v^*, \lambda^*) + \nabla_v L_o(x^*, v^*, \lambda^*)^T (v - v^*) \tag{90}$$
If the Lagrange function is not differentiable, the gradient can be replaced with a subgradient.
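In code, assembling the linear optimality cut (90) from the data returned by the primal problem can be sketched as follows (an illustrative fragment, not from the paper; the callables passed in are placeholders the user has to supply, and the multiplier vector $\lambda^*$ is assumed to be reported by the primal solver):
```python
# Illustrative sketch (not from the paper) of assembling the linear optimality cut (90)
# from the data returned by the primal problem at a trial point v*.
import numpy as np

def optimality_cut(f, g, grad_v_f, grad_v_g, x_star, v_star, lam_star):
    """Return a callable v -> L_o(x*, v*, lam*) + grad_v L_o(x*, v*, lam*)^T (v - v*).

    f(x, v), g(x, v)   : objective and constraint functions; g returns an array of length m
    grad_v_f(x, v)     : gradient of f with respect to v (length q)
    grad_v_g(x, v)     : matrix of shape (q, m) whose columns are the gradients of g_j w.r.t. v
    """
    L_val = f(x_star, v_star) + lam_star @ g(x_star, v_star)
    L_grad = grad_v_f(x_star, v_star) + grad_v_g(x_star, v_star) @ lam_star
    return lambda v: L_val + L_grad @ (np.asarray(v) - v_star)
```
In the relaxed master problem the stored cut is then imposed as cut(v) ≤ μ for every index $l \in K_o$; feasibility cuts are built in the same way from $L_f$, i.e., with the f-terms dropped.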
Analogous reasoning applies to the feasibility constraint (64).
It is easy to check that Assumption 1 is satisfied for two classes of problems:
  • Separable—when the f, g functions are sums of components dependent on x and v, that is
    $$f(x, v) = f_1(x) + f_2(v) \tag{91}$$
    $$g(x, v) = g_1(x) + g_2(v) \tag{92}$$
    where the components $f_2$, $g_2$ are convex on V (the verification of Assumption 1 for this case is sketched right after this list).
  • Variable factor programming of the form:
    $$\min_{x,v} f(x, v) = \sum_{i=1}^{q} v_i f_i(x_i) \tag{93}$$
    s.t.
    $$g(x, v) = \sum_{i=1}^{q} v_i x_i - c \le 0 \tag{94}$$
    $$A v \le b, \quad v \ge 0, \quad v \in \mathbb{R}^q \tag{95}$$
    $$x_i \ge 0, \quad x_i, c \in \mathbb{R}^m \tag{96}$$
    where $v_i$, $i = 1,\ldots,q$, are non-negative scalars and $c, x_i$, $i = 1,\ldots,q$, are vectors of dimension m. A special case of this type of problem is the bilinear problem, which is nonconvex [24].
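For the separable case the verification announced above is immediate. Substituting the forms (91)–(92) into the definition (56) gives
$$L_o(x, v, \lambda) = \underbrace{f_1(x) + \lambda^T g_1(x)}_{w_o(x,\lambda)} + f_2(v) + \lambda^T g_2(v) = Q_o\big(w_o(x,\lambda), v, \lambda\big), \qquad Q_o(w, v, \lambda) = w + f_2(v) + \lambda^T g_2(v),$$
and $Q_o$ is increasing (with unit slope) in its first argument and, for $\lambda \ge 0$ and convex $f_2$, $g_2$, convex in v. The same substitution applied to (54) gives $w_f(x,\lambda) = \lambda^T g_1(x)$ and $Q_f(w, v, \lambda) = w + \lambda^T g_2(v)$, so Assumption 1 indeed holds.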
If the problem (4)–(7) is neither separable nor with variable factors, but convex with respect to the full vector of variables $(x, v)$, the cuts (90) can also be used.
Theorem 6.
For problems convex with respect to the full vector $(x, v)$, the evaluation (90) is valid.
Proof. 
For convex problems with differentiable functions (if the Lagrange function is not differentiable, the gradient can be replaced with a subgradient) we have, for fixed $\lambda = \lambda^*$ and all x and v:
$$L_o(x, v, \lambda^*) \ge L_o(x^*, v^*, \lambda^*) + \nabla_x L_o(x^*, v^*, \lambda^*)^T (x - x^*) + \nabla_v L_o(x^*, v^*, \lambda^*)^T (v - v^*) \tag{97}$$
This inequality will be preserved when we compute the minimum with respect to x on both sides:
$$\min_{x \in X} L_o(x, v, \lambda^*) \ge \min_{x \in X} \Big[ L_o(x^*, v^*, \lambda^*) + \nabla_x L_o(x^*, v^*, \lambda^*)^T (x - x^*) + \nabla_v L_o(x^*, v^*, \lambda^*)^T (v - v^*) \Big] = L_o(x^*, v^*, \lambda^*) + \nabla_v L_o(x^*, v^*, \lambda^*)^T (v - v^*) + \min_{x \in X} \nabla_x L_o(x^*, v^*, \lambda^*)^T (x - x^*) \tag{98}$$
Note that the last component of the expression (98) is equal to zero, as the point $x^*$ is optimal for $v = v^*$, $\lambda = \lambda^*$ (there is no feasible direction of improvement, i.e., one with a negative directional derivative), so the inequality (90) is valid. □

7. Computational Algorithm

We will now consider the case when at the optimal point $x^*$ of the primal problem, both in the version for $v^* \in V \cap V_0$, that is (65)–(66), and for $v^* \in V \setminus V_0$, that is (73)–(74), there exists a vector of Lagrange multipliers $\lambda^*$. Therefore, these problems must fulfill certain regularity conditions (more on that below).
A computational algorithm based on the generalized Benders method can be formulated as follows (a toy numerical run of the loop is sketched right after the algorithm):
GBD Algorithm
  1. Choose a starting point $v^0 \in V$; set $K_o := \emptyset$, $K_f := \emptyset$, $k := 0$, $LBD := -\infty$, $UBD := +\infty$; take a convergence tolerance ε > 0.
  2. For the fixed $v = v^k$ try to solve the primal problem (65)–(66).
    • If successful, set $k := k + 1$, remember as $x^k$ and $\lambda^k$ the obtained optimal point and the optimal vector of Lagrange multipliers, and set $K_o := K_o \cup \{k\}$. Update $UBD := \min\{UBD, z(v^k)\}$. If there has been an improvement of the upper bound UBD, remember the pair $(x^k, v^k)$ as the best solution so far. If $UBD - LBD \le \varepsilon$, then STOP.
    • Otherwise, solve the feasibility problem (73)–(74) (unless the solver used solves it by itself when facing infeasibility), set $k := k + 1$, remember as $x^k$ and $\lambda^k$ the obtained optimal point and the vector of optimal Lagrange multipliers corresponding to the constraints, and set $K_f := K_f \cup \{k\}$.
  3. Solve the relaxed master problem:
    • in the nonlinear version
      $$\min_{v \in V,\, \mu} \mu \tag{99}$$
      s.t.
      $$L_o(x^l, v, \lambda^l) \le \mu, \quad l \in K_o \tag{100}$$
      $$L_f(x^l, v, \lambda^l) \le 0, \quad l \in K_f \tag{101}$$
    • or the linearized one
      $$\min_{v \in V,\, \mu} \mu \tag{102}$$
      s.t.
      $$L_o(x^l, v^l, \lambda^l) + \nabla_v L_o(x^l, v^l, \lambda^l)^T (v - v^l) \le \mu, \quad l \in K_o \tag{103}$$
      $$L_f(x^l, v^l, \lambda^l) + \nabla_v L_f(x^l, v^l, \lambda^l)^T (v - v^l) \le 0, \quad l \in K_f \tag{104}$$
    Let $(v^k, \mu^k)$ be the optimal solution of the above problem. Then $\mu^k$ is the lower estimate of the optimal objective of the original problem. Set $LBD := \mu^k$. If $UBD - LBD \le \varepsilon$, then STOP.
  4. Go to Step 2.
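To make the loop concrete, the following toy, self-contained run of the linear-cut version is given; the two-binary-variable problem, its closed-form primal solution and the brute-force enumeration of V in the relaxed master problem are all illustrative assumptions, not part of the original paper.
```python
# Toy, self-contained run of the GBD algorithm with linear cuts for the invented problem
#
#   min  x^2 + 3*v1 + 2*v2   s.t.   2 - 2*v1 - 3*v2 - x <= 0,   x - 1 - v1 <= 0,
#   x in X = [0, 10],   v in V = {0, 1}^2.
#
# The problem is separable, so L_o and L_f are affine in v and the linear cuts
# (103)-(104) coincide with the nonlinear cuts (100)-(101).
import itertools
import numpy as np

V = [np.array(p, dtype=float) for p in itertools.product((0, 1), repeat=2)]
f2 = lambda v: 3 * v[0] + 2 * v[1]        # part of the objective depending on v
a = lambda v: 2 - 2 * v[0] - 3 * v[1]     # constraint g1 = a(v) - x <= 0
b = lambda v: 1 + v[0]                    # constraint g2 = x - b(v) <= 0

def primal(v):
    """Closed-form primal problem: min_{x in [0,10]} x^2 + f2(v)  s.t.  g(x, v) <= 0."""
    lo, hi = max(0.0, a(v)), min(10.0, b(v))
    if lo > hi:                           # infeasible: feasibility problem (72) in closed form
        x_hat = 0.5 * (a(v) + b(v))       # minimizer of max(a(v) - x, x - b(v))
        return None, x_hat, np.array([0.5, 0.5])
    x_star = lo                           # projection of the unconstrained minimum x = 0
    lam1 = 2 * x_star if x_star > 0 else 0.0
    return x_star**2 + f2(v), x_star, np.array([lam1, 0.0])

def L_o(x, v, lam):                       # optimality Lagrangian (56), affine in v here
    return x**2 + f2(v) + lam[0] * (a(v) - x) + lam[1] * (x - b(v))

def L_f(x, v, lam):                       # feasibility Lagrangian (54)
    return lam[0] * (a(v) - x) + lam[1] * (x - b(v))

opt_cuts, feas_cuts = [], []              # data of the index sets K_o and K_f
UBD, LBD, best, v_k = np.inf, -np.inf, None, V[0]
while UBD - LBD > 1e-9:
    z, x_k, lam_k = primal(v_k)           # Step 2: primal or feasibility problem
    if z is not None:
        opt_cuts.append((x_k, lam_k))
        if z < UBD:
            UBD, best = z, (x_k, v_k.copy())
    else:
        feas_cuts.append((x_k, lam_k))
    # Step 3: relaxed master problem (102)-(104); V is tiny, so it is solved by enumeration.
    candidates = []
    for v in V:
        if all(L_f(x, v, lam) <= 1e-12 for x, lam in feas_cuts):
            mu = max((L_o(x, v, lam) for x, lam in opt_cuts), default=-np.inf)
            candidates.append((mu, tuple(v)))
    LBD, v_next = min(candidates)
    v_k = np.array(v_next)

print("optimal value:", UBD, " x* =", best[0], " v* =", best[1])
```
On this instance, the first pass generates a feasibility cut which eliminates v = (0, 0); the second pass generates an optimality cut at v = (0, 1), after which UBD = LBD = 2 and the loop stops with the exact solution x = 0, v = (0, 1). For problems of realistic size, the enumeration in Step 3 is replaced by a MILP solver or by the methods of Section 8.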

Convergence of the Algorithm

It has been proved that the GBD algorithm converges in a finite number of steps for any ε > 0 , when [2]:
  • V is a finite discrete set, and the assumptions of Theorems 2 and 3 for Case 1 (then the same is also true for ε = 0) are satisfied.
  • V is a nonempty, compact subset of $V_0$, X is a nonempty compact convex set, the functions f and g are convex on X for each fixed $v \in V$ and continuous on $X \times V$, the set of optimal Lagrange multipliers in the problem (65)–(66) is non-empty for all $v \in V_0$, and the constraints satisfy Slater’s regularity condition: $\exists\, \bar x \in X,\ \bar v \in V:\ g(\bar x, \bar v) < 0$.
    In the case when the inequality constraints result from the application of the transformation (9) or (10) to the equality constraints (8), Slater’s condition will be fulfilled when these constraints are relaxed using the epsilon-tube method, i.e., when $h_j(x, v) = 0$ is replaced by $-\varepsilon_h \le h_j(x, v) \le \varepsilon_h$ for some small $\varepsilon_h > 0$.
  • V is not a subset of $V_0$, the constraint function $g(x, v)$ is separable and the set X is defined using linear constraints; the other conditions are as in point 2 [4].
A very important assumption is that there exist Lagrange multipliers for all $v \in V \cap V_0$. If this is not checked, serious errors can appear. Consider the following example, given (in the context of a slightly different approach to the decomposition of optimization problems) by Grothey et al. [26]. For the sake of completeness, an X set has been added, defined as a box being the Cartesian product of the intervals $[-10, 10]$ along every coordinate. It does not affect the reasoning.
$$\min_{x_1, x_2, v}\; v^2 - x_2 \tag{105}$$
s.t.
$$(x_1 - 1)^2 + x_2^2 \le \ln v \tag{106}$$
$$(x_1 + 1)^2 + x_2^2 \le \ln v \tag{107}$$
$$v \ge 1 \tag{108}$$
$$-10 \le x_i \le 10, \quad i = 1, 2 \tag{109}$$
The admissible area for the case when v > e is presented in Figure 1.
For $v^* = e$ we will have the problem:
$$\min_{x_1, x_2}\; e^2 - x_2 \tag{110}$$
s.t.
$$(x_1 - 1)^2 + x_2^2 \le 1 \tag{111}$$
$$(x_1 + 1)^2 + x_2^2 \le 1 \tag{112}$$
$$-10 \le x_i \le 10, \quad i = 1, 2 \tag{113}$$
Its optimal solution is the only feasible point (0, 0). It is easy to check that at this point the Lagrange multipliers for the constraints (111)–(112) do not exist, so the expressions (100) and (103) lose their sense. This results from the fact that the Fiacco–McCormick regularity condition (that the gradients of active constraints are linearly independent) is not satisfied at this point. Interestingly, the initial problem (105)–(109), considered with respect to the full vector $(x, v)$, is convex and regular: the condition is met also at the point $(x, v) = (0, 0, e)$.
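The two regularity claims are easy to check numerically (an illustrative fragment, not from the paper):
```python
# Numerical check of the regularity claims above.
import numpy as np

e = np.e
# Gradients of the active constraints (111)-(112) w.r.t. x = (x1, x2) at x = (0, 0):
J_x = np.array([[2 * (0 - 1), 2 * 0],      # gradient of (x1 - 1)^2 + x2^2 - 1
                [2 * (0 + 1), 2 * 0]])     # gradient of (x1 + 1)^2 + x2^2 - 1
print(np.linalg.matrix_rank(J_x))          # 1: the gradients are linearly dependent,
                                           # so the Fiacco-McCormick condition fails at (0,0)

# Gradients of (106)-(107) w.r.t. (x1, x2, v) at the point (0, 0, e):
J_xv = np.array([[2 * (0 - 1), 2 * 0, -1 / e],
                 [2 * (0 + 1), 2 * 0, -1 / e]])
print(np.linalg.matrix_rank(J_xv))         # 2: the full problem is regular at (0, 0, e)
```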

8. Effective Algorithms for Solving the Master Problem for Discrete Variables in the Version with Linear Cuts

Here we will discuss how to solve effectively the linear master problem (102)–(104) when $V \subseteq \mathbb{Z}^q$.
The GBD algorithm has some serious disadvantages. The most important of them is the number of constraints, which grows with every solution of the master problem, making it more and more complicated and, hence, extending its maximum and average execution time (averaged over a certain window, because there can always be occasional short runs of the master problem, e.g., when the next optimal point $v^*$ is close to the given starting point, but this is not typical). Moreover, there is no guarantee that in the next steps the upper bound UBD of the value of the feasible solution will decrease. Therefore, it is worthwhile to use more sophisticated algorithms.
Nowadays the most useful as well as the most commonly used are the algorithms based on the cutting plane method, especially in conjunction with the interior point method [27].
The classical cutting plane method (more precisely the “cutting off plane method”) by Kelley [28], to which the master problem is reduced in the version with linearized cuts, has a number of disadvantages. First of all, it converges slowly: there are assessments which state that, to reach a solution with an accuracy of ε > 0, this method requires $O\!\left(\frac{1}{\varepsilon^{\,q+1}}\right)$ iterations [29]. A serious problem, mentioned above, is also that, as the calculations progress, the growing number of cuts makes the subsequent iterations longer and longer. Unfortunately, despite many studies having been conducted, including recent machine learning techniques [8], there are no reliable and simple rules for removing old cuts, even those which are inactive at the current solution of the problem (102)–(104) [30]. One can also observe some kind of instability: the next point generated by the algorithm may be far from the previous one, even though the previous one was already close to the optimum, or even at it [25]. This effect may be alleviated by additional distance control constraints for each set of binary variables, limiting the integer update to a certain distance from the current integer point [31].
The above drawbacks have been largely removed through modifications of the Kelley method [27]. Unfortunately, very little can be expected in theory in the discrete case from the most popular of them, the bundle method, which consists in adding to the linear approximation of the objective function a quadratic proximal component penalizing the deviation from the last significant solution [32,33]; in this case “the mere production of a feasible point is never guaranteed” [34].
Much better, perhaps the best in this (i.e., MINLP) case, is the largest inscribed sphere method [35,36]. This method uses the concept of a localization set. For the best estimate obtained so far (say, up to the k-th iteration of the master problem) of the upper bound on the optimal value of the objective function of the initial problem, denoted by $UBD_k$, it is the set:
$$\mathcal{L}_k = \Big\{ (v, \mu) \in \mathbb{Z}^q \times \mathbb{R} \;:\; \mu \le UBD_k,\;\; \varphi(v^l) + \nabla\varphi(v^l)^T (v - v^l) \le \mu,\; l \in K_o,\;\; \xi(v^l) + \nabla\xi(v^l)^T (v - v^l) \le 0,\; l \in K_f \Big\} \tag{114}$$
where:
$$\varphi(v^l) = L_o(x^l, v^l, \lambda^l) = f(x^l, v^l) + (\lambda^l)^T g(x^l, v^l) \tag{115}$$
$$\nabla\varphi(v^l) = \nabla_v L_o(x^l, v^l, \lambda^l) = \nabla_v f(x^l, v^l) + \nabla_v g(x^l, v^l)\, \lambda^l \tag{116}$$
$$\xi(v^l) = L_f(x^l, v^l, \lambda^l) = (\lambda^l)^T g(x^l, v^l) \tag{117}$$
$$\nabla\xi(v^l) = \nabla_v L_f(x^l, v^l, \lambda^l) = \nabla_v g(x^l, v^l)\, \lambda^l \tag{118}$$
Note that, denoting $u = (v, \mu)$, we can express the constraints that describe this set in the following way:
$$A u \le b \tag{119}$$
where the matrix A is of dimension $S \times (q + 1)$, with $S = |K_o| + |K_f| + 1$. The largest inscribed sphere method is one of the central point methods (from the larger family of interior point methods), which in the subsequent steps k look not for the point minimizing the objective function (102), but for a point which, according to some measure, lies “in the middle” of the localization set $\mathcal{L}_k$ (114).
The center of the largest inscribed sphere for the set $U = \{u \in \mathbb{Z}^q \times \mathbb{R} : A u \le b\}$ (called also the Chebyshev center) is determined by solving a linear programming problem:
$$\max_{u \in \mathbb{Z}^q \times \mathbb{R},\; \sigma \ge 0} \sigma \tag{120}$$
s.t.
$$a_i^T u + \|a_i\|\, \sigma \le b_i, \quad i = 1,\ldots,S \tag{121}$$
where $a_i^T$ denotes the i-th row of the matrix A. The advantage of this method is, in addition to a linear formulation, the ease of eliminating unnecessary (distant) cuts, based on zero values of Lagrange multipliers (inactive constraints (121)), if in subsequent iterations of the master problem the value of $\hat\sigma$ decreased [35]. This method needs $O\big((q + 1)\ln\frac{1}{\varepsilon}\big)$ iterations to achieve the optimal solution with the accuracy of ε [29].
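A minimal sketch of the center-finding linear program (120)–(121) is given below; the polyhedron data A, b are arbitrary illustrative numbers and, for simplicity, the integrality of the v-part of u is ignored (in the actual method the master problem keeps v discrete):
```python
# Chebyshev-center LP (120)-(121) for a polyhedron {u : A u <= b}; the data below are
# arbitrary and the integrality of the v-part of u is ignored in this sketch.
import numpy as np
from scipy.optimize import linprog

A = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0],
              [ 0.0, -1.0],
              [ 1.0,  1.0]])
b = np.array([2.0, 0.0, 1.0, 0.0, 2.5])

row_norms = np.linalg.norm(A, axis=1)
# Decision variables: (u, sigma); maximize sigma, i.e., minimize -sigma.
c = np.concatenate([np.zeros(A.shape[1]), [-1.0]])
A_ub = np.hstack([A, row_norms[:, None]])          # a_i^T u + ||a_i|| sigma <= b_i
res = linprog(c, A_ub=A_ub, b_ub=b,
              bounds=[(None, None)] * A.shape[1] + [(0, None)])
u_center, radius = res.x[:-1], res.x[-1]
print("Chebyshev center:", u_center, "radius:", radius)
```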
The next possibility is to solve at the master level, instead of (102)–(104), the so-called outer approximation (OA) problem [37]:
$$\min_{x \in X,\, v \in V,\, \mu \in \mathbb{R}} \mu \tag{122}$$
s.t.
$$f(x^l, v^l) + \nabla f(x^l, v^l)^T \begin{bmatrix} x - x^l \\ v - v^l \end{bmatrix} \le \mu, \quad l \in K_o \tag{123}$$
$$g(x^l, v^l) + \nabla g(x^l, v^l)^T \begin{bmatrix} x - x^l \\ v - v^l \end{bmatrix} \le 0, \quad l \in K_f \tag{124}$$
GBD can be regarded as a particular case of the OA method (because constraints in GBD are linear combinations of those in OA), and the lower bounds of GBD are weaker (that is, not of a greater value) than those of OA. It means that OA used at the master level requires a smaller number of iterations, however at a higher cost of solving the master problem, since the number of constraints added per iteration is greater [38].
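The relationship can be sketched as follows (this is only a sketch of the standard argument, cf. [38]): combining the OA linearization of f generated at a point $(x^l, v^l)$ with the linearizations of g at the same point, weighted by the multipliers $\lambda^l \ge 0$ returned by the primal problem, and using $L_o = f + (\lambda^l)^T g$, one obtains the surrogate constraint
$$L_o(x^l, v^l, \lambda^l) + \nabla_x L_o(x^l, v^l, \lambda^l)^T (x - x^l) + \nabla_v L_o(x^l, v^l, \lambda^l)^T (v - v^l) \le \mu.$$
Since $\nabla_x L_o(x^l, v^l, \lambda^l)^T (x - x^l) \ge 0$ for all $x \in X$ at the primal optimum (see the proof of Theorem 6), every point satisfying the OA cuts also satisfies the Benders cut (103), while the converse need not hold; hence the feasible region of the GBD master problem is larger and its lower bounds are weaker.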
Instead of calling an external Mixed Integer Linear Programming (MILP) solver many times at the master level, which is time costly, more advanced users can write their own specialized solvers which avoid it. The simplest way is to use the branch-and-cut algorithm proposed in [39], which combines the above OA method and the branch and bound algorithm. Cuts, both optimality and feasibility ones, are made when a relaxed (to continuous) master problem delivers a discrete solution $v^* \in V$; otherwise, in the case of a solution better than the best discrete solution so far, a partition of the domain is performed (branching) along a chosen variable $v_i^*$ with a fractional value, or, in the opposite case, the given region (node) is fathomed [39]. Some improvements to this approach were proposed in [40].
In Boolean problems, that is, where $V \subseteq \{0, 1\}^q$, it may be useful to add the following cuts, assuring that the old solutions will not appear again [41]:
$$\sum_{j \in B_l} v_j - \sum_{j \in N_l} v_j \le |B_l| - 1, \quad l \in K_o \cup K_f \tag{125}$$
where $B_l = \{j : v_j^l = 1\}$ and $N_l = \{j : v_j^l = 0\}$.
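Building such a cut is a one-liner (an illustrative helper, not from the paper):
```python
# Sketch of the integer (no-good) cut (125) that excludes an already visited binary point v^l.
def no_good_cut(v_l):
    """Return (c, rhs) of the cut sum_j c_j * v_j <= rhs excluding the 0-1 point v_l."""
    c = [1 if vj == 1 else -1 for vj in v_l]      # +1 for j in B_l, -1 for j in N_l
    rhs = sum(v_l) - 1                            # |B_l| - 1
    return c, rhs

print(no_good_cut((1, 0, 1)))                     # ([1, -1, 1], 1)
```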
Last but not least, it is always possible to extend the master level algorithm with some heuristics or machine learning techniques, connected more or less to specific features of the practical problems solved, e.g., [8,42,43,44,45,46,47,48].
The latest tests confirm that, owing to the Benders method, a problem as big as the three-scenario batch plant design problem, having 1956 binary and 57,616 continuous variables, can be solved with a 9.45% relative optimality gap in 729 s on a machine with 12 Intel Xeon processors, while the general solver BARON delivers in 50,000 s a solution with a 22.3% optimality gap (such solvers as SBB, Alpha-ECP and DICOPT failed to deliver a feasible solution in 50,000 s) [14].

9. Conclusions

The Benders method, although it was proposed sixty years ago, is still very vital. It is often used, especially in the version generalized by Geoffrion, to solve many large practical problems in the field of technology, mainly related to all types of networks, e.g., telecommunications, energy, gas, transport. These problems are characterized by mixed, discrete–continuous nature of decision variables, which is often accompanied by a nonconvexity of objective functions and constraints, but of such a type that, after fixing one of two subvectors of the problem variables, the problem becomes convex with respect to the second subvector (such are, for example, bilinear problems). Then the Benders method converges in a finite number of steps to exact solutions. This method copes well with the infeasibility of primal problems—with respect to the vector x—for some values of the complicating variables v, solving the feasibility problem from which new constraints are obtained—feasibility cuts. When the primal problem is feasible, the optimality cut is generated, eliminating part of the admissible set of complicating variables V, for which the objective function has worse values than those obtained so far.
There are two versions of the Benders method: with nonlinear and linear cuts. Both can be used when the condition of independence of the solution of the Lagrangian minimization problem in the primal problem from the vector of complicating variables is met as well as when the Lagrangian is convex with respect to complicating variables. Linear cuts can also be used in problems convex with respect to the full vector ( x , v ) .
The method with linear cuts (which in fact is a version of the Kelley cutting plane method) seems to be more practical, due to the ease of solving the master problem, e.g., using linear or quadratic solvers. It has many advantages, especially when using the largest inscribed sphere method (from the area of nondifferentiable optimization), the outer approximation method or the outer-approximation-based branch-and-cut method. In particular practical problems, heuristics and machine learning techniques can additionally increase the efficiency of the master-level algorithm.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADMM: Alternating Directions Multiplier Method
GBD: Generalized Benders Decomposition
LBD: Lower Bound
UBD: Upper Bound
MILP: Mixed Integer Linear Programming
MINLP: Mixed Integer Nonlinear Programming
OA: outer approximation (algorithm)

References

  1. Benders, J.F. Partitioning Procedures for Solving Mixed-Variables Programming Problems. Numer. Math. 1962, 4, 238–252. [Google Scholar] [CrossRef]
  2. Geoffrion, A.M. Generalized Benders Decomposition. J. Optim. Theory Appl. 1972, 10, 237–260. [Google Scholar] [CrossRef]
  3. Floudas, C.A. Nonlinear and Mixed-Integer Optimization; Oxford University Press: New York, NY, USA, 1995. [Google Scholar]
  4. Floudas, C.A. Generalized Benders Decomposition GBD. In Encyclopedia of Optimization, 2nd ed.; Floudas, C.A., Pardalos, P.M., Eds.; Springer: Boston, MA, USA, 2009. [Google Scholar]
  5. Lee, J.; Leyffer, S. Mixed Integer Nonlinear Programming; Springer: New York, NY, USA, 2012. [Google Scholar]
  6. Li, D.; Sun, X. Nonlinear Integer Programming; Springer: New York, NY, USA, 2006. [Google Scholar]
  7. Chung, K.H.; Kim, B.H.; Hur, D. Distributed implementation of generation scheduling algorithm on interconnected power systems. Energy Convers. Manag. 2011, 52, 3457–3464. [Google Scholar] [CrossRef]
  8. Lee, M.Y.; Ma, N.; Yu, G.D.; Dai, H.Y. Accelerating Generalized Benders Decomposition for Wireless Resource Allocation. IEEE Trans. Wirel. Commun. 2021, 20, 1233–1247. [Google Scholar] [CrossRef]
  9. Geoffrion, A.M.; Graves, G.W. Multicommodity Distribution System Design by Benders Decomposition. Manag. Sci. 1974, 20, 822–844. [Google Scholar] [CrossRef]
  10. Lu, J.; Gupte, A.; Huang, Y. A mean-risk mixed integer nonlinear program for transportation network protection. Eur. J. Oper. Res. 2018, 265, 277–289. [Google Scholar] [CrossRef] [Green Version]
  11. Li, X. Parallel nonconvex generalized Benders decomposition for natural gas production network planning under uncertainty. Comput. Chem. Eng. 2013, 55, 97–108. [Google Scholar] [CrossRef]
  12. Osman, H.; Demirli, K. A bilinear goal programming model and a modified Benders decomposition algorithm for supply chain reconfiguration and supplier selection. Int. J. Prod. Econ. 2010, 124, 97–105. [Google Scholar] [CrossRef]
  13. Cai, X.; McKinney, D.C.; Lasdon, L.S.; Watkins, D.W., Jr. Solving Large Nonconvex Water Resources Management Models Using Generalized Benders Decomposition. Oper. Res. 2001, 49, 235–245. [Google Scholar] [CrossRef] [Green Version]
  14. Li, C.; Grossmann, I.E. An improved L-shaped method for two-stage convex 0–1 mixed integer nonlinear stochastic programs. Comput. Chem. Eng. 2018, 112, 165–179. [Google Scholar] [CrossRef]
  15. Li, C.; Grossmann, I.E. A finite ε convergence algorithm for two-stage stochastic convex nonlinear programs with mixed-binary first and second-stage variables. J. Glob. Optim. 2019, 75, 921–947. [Google Scholar] [CrossRef]
  16. Li, X.; Tomasgard, A.; Barton, P.I. Nonconvex Generalized Benders Decomposition for Stochastic Separable Mixed-Integer Nonlinear Programs. J. Optim. Theory Appl. 2011, 151, 425–454. [Google Scholar] [CrossRef] [Green Version]
  17. Floudas, C.A.; Aggarwal, A.; Ciric, A.R. Global Optimum Search for Nonconvex NLP and MINLP Problems. Comput. Chem. Eng. 1989, 13, 1117–1132. [Google Scholar] [CrossRef]
  18. Aggarwal, A.; Floudas, C.A. A Decomposition Approach for Global Optimum Search in QP, NLP and MINLP Problems. Ann. Oper. Res. 1990, 25, 119–146. [Google Scholar] [CrossRef]
  19. Türkay, M.; Grossmann, I.E. Logic-Based MINLP Algorithms for the Optimal Synthesis of Process Networks. Comput. Chem. Eng. 1996, 20, 959–978. [Google Scholar] [CrossRef]
  20. Li, X.; Chen, Y.; Barton, P.I. Nonconvex Generalized Benders Decomposition with Piecewise Convex Relaxations for Global Optimization of Integrated Process Design and Operation Problems. Ind. Eng. Chem. Res. 2012, 51, 7287–7299. [Google Scholar] [CrossRef]
  21. Nowak, I. Relaxation and Decomposition Methods for Mixed Integer Nonlinear Programming; Birkhäuser: Basel, Switzerland, 2005. [Google Scholar]
  22. Boyd, S. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar] [CrossRef]
  23. Li, X.; Bi, S.; Wang, H. Optimizing Resource Allocation for Joint AI Model Training and Task Inference in Edge Intelligence Systems. IEEE Wirel. Commun. Lett. 2021, 10, 532–536. [Google Scholar] [CrossRef]
  24. Adams, W.P.; Sherali, H.D. Mixed-integer bilinear programming problems. Math. Program. 1993, 59, 279–305. [Google Scholar] [CrossRef]
  25. Bertsekas, D.P. Nonlinear Programming, 2nd ed.; Athena Scientific: Belmont, MA, USA, 1999. [Google Scholar]
  26. Grothey, A.; Leyffer, S.; McKinnon, K.I.M. A Note on Feasibility in Benders Decomposition. In Numerical Analysis Report NA/188; Dundee University: Dundee, UK, 1999. [Google Scholar]
  27. Elhedhli, S.; Goffin, J.-L.; Vial, J.-P. Nondifferentiable optimization: Cutting plane methods. In Encyclopedia of Optimization, 2nd ed.; Floudas, C.A., Pardalos, P.M., Eds.; Springer: Boston, MA, USA, 2009. [Google Scholar]
  28. Kelley, J.E. The Cutting-Plane Method for Solving Convex Programs. J. Soc. Ind. Appl. Math. 1960, 8, 703–712. [Google Scholar] [CrossRef]
  29. Goffin, J.-L.; Vial, J.-P. Convex nondifferentiable optimization: A survey focused on the analytic center cutting plane method. Optim. Methods Softw. 2002, 17, 805–867. [Google Scholar] [CrossRef]
  30. Ruszczyński, A. Nonlinear Optimization; Princeton University Press: Princeton, NJ, USA, 2006. [Google Scholar]
  31. Franke, M.B. Mixed-integer optimization of distillation sequences with Aspen Plus: A practical approach. Comput. Chem. Eng. 2019, 131, 106583. [Google Scholar] [CrossRef]
  32. Lemaréchal, C. An extension of Davidon methods to nondifferentiable problems. In Mathematical Programming Study 3; Balinski, M.L., Wolfe, P., Eds.; North-Holland: Amsterdam, The Netherlands, 1975; pp. 95–109. [Google Scholar]
  33. Kiwiel, K.C. Methods of Descent for Nondifferentiable Optimization; Springer: Berlin/Heidelberg, Germany, 1985. [Google Scholar]
  34. Daniilidis, A.; Lemaréchal, C. On a primal-proximal heuristic in discrete optimization. Math. Program. Ser. A 2005, 104, 105–128. [Google Scholar] [CrossRef]
  35. Elzinga, J.; Moore, T.G. A central cutting plane algorithm for the convex programming problem. Math. Program. 1975, 8, 134–145. [Google Scholar] [CrossRef]
  36. Kronqvist, J.; Bernal, D.E.; Lundell, A.; Westerlund, T. A center-cut algorithm for quickly obtaining feasible solutions and solving convex MINLP problems. Comput. Chem. Eng. 2019, 122, 105–113. [Google Scholar] [CrossRef]
  37. Quesada, I.; Grossmann, I.E. An LP/NLP Based Branch and Bound Algorithm for Convex MINLP Optimization Problems. Comput. Chem. Eng. 1992, 16, 937–947. [Google Scholar] [CrossRef]
  38. Grossmann, I.E. Review of Nonlinear Mixed-Integer and Disjunctive Programming Techniques. Optim. Eng. 2002, 3, 227–252. [Google Scholar] [CrossRef]
  39. Bonami, P.; Biegler, L.T.; Conn, A.R.; Cornuejols, G.; Grossmann, I.E.; Laird, C.D.; Lee, J.; Lodi, A.; Margot, F.; Sawaya, N.; et al. An algorithmic framework for convex mixed integer nonlinear programs. Discret. Optim. 2008, 5, 186–204. [Google Scholar] [CrossRef]
  40. Su, L.J.; Tang, L.X.; Grossmann, I.E. Computational strategies for improved MINLP algorithms. Comput. Chem. Eng. 2015, 75, 40–48. [Google Scholar] [CrossRef]
  41. Duran, M.A.; Grossmann, I.E. An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Math. Program. 1986, 36, 307–339. [Google Scholar] [CrossRef]
  42. Raman, R.; Grossmann, I.E. Integration of Logic and Heuristic Knowledge in MINLP Optimization for Process Synthesis. Comput. Chem. Eng. 1992, 16, 155–171. [Google Scholar] [CrossRef]
  43. Sirikum, J.; Techanitisawad, A.; Kachitvichyanukul, V. A new efficient GA-Benders’ decomposition method: For power generation expansion planning with emission controls. IEEE Trans. Power Syst. 2007, 22, 1092–1100. [Google Scholar] [CrossRef]
  44. Franke, M.B.; Nowotny, N.; Ndocko, E.N.; Górak, A.; Strube, J. Design and Optimization of a Hybrid Distillation/Melt Crystallization Process. AICHE J. 2008, 54, 2925–2942. [Google Scholar] [CrossRef]
  45. Naoum-Sawaya, J.; Elhedhli, S. A Nested Benders Decomposition Approach for Telecommunication Network Planning. Nav. Res. Logist. 2010, 57, 519–539. [Google Scholar] [CrossRef]
  46. Chen, S.; Geunes, J. Optimal allocation of stock levels and stochastic customer demands to a capacitated resource. Ann. Oper. Res. 2013, 203, 33–54. [Google Scholar] [CrossRef]
  47. Marufuzzaman, M.; Eksioglu, S.D. Managing congestion in supply chains via dynamic freight routing: An application in the biomass supply chain. Transp. Res. Part E-Logist. Transp. Rev. 2017, 99, 54–76. [Google Scholar] [CrossRef]
  48. Meshgi, H.; Zhao, D.M.; Zheng, R. Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks. IEEE Trans. Veh. Technol. 2017, 66, 8357–8371. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Cross-section through the admissible set in the problem (105)–(109) for v > e.
