On the Locally Polynomial Complexity of the Projection-Gradient Method for Solving Piecewise Quadratic Optimisation Problems

Prusińska, Agnieszka; Szkatuła, Krzysztof; Tret’yakov, Alexey

doi:10.3390/e23040465

Open AccessArticle

On the Locally Polynomial Complexity of the Projection-Gradient Method for Solving Piecewise Quadratic Optimisation Problems

by

Agnieszka Prusińska

^1,*,†

,

Krzysztof Szkatuła

^1,2,†

and

Alexey Tret’yakov

^1,2,3,†

¹

Faculty of Exact and Natural Sciences, Siedlce University, 08-110 Siedlce, Poland

²

Systems Research Institute, Polish Academy of Sciences, 01-447 Warsaw, Poland

³

Dorodnicyn Computing Centre of FRC CSC, Russian Academy of Sciences, 119333 Moscow, Russia

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Entropy 2021, 23(4), 465; https://doi.org/10.3390/e23040465

Submission received: 13 March 2021 / Revised: 2 April 2021 / Accepted: 14 April 2021 / Published: 15 April 2021

(This article belongs to the Section Complexity)

Download Versions Notes

Abstract

:

This paper proposes a method for solving optimisation problems involving piecewise quadratic functions. The method provides a solution in a finite number of iterations, and the computational complexity of the proposed method is locally polynomial of the problem dimension, i.e., if the initial point belongs to the sufficiently small neighbourhood of the solution set. Proposed method could be applied for solving large systems of linear inequalities.

Keywords:

optimisation problem; piecewise quadratic function; gradient method; system of linear inequalities; locally polynomial complexity

1. Introduction

Let us consider the following optimisation problem:

min_{x \in R^{n}} {∥{(A \cdot x - b)}_{+}∥}^{2},

(1)

where

c_{+} : = max \{c, 0\}

, A is an

m \times n

matrix,

A = \{a_{i j}\}

,

x \in R^{n}

,

x = \{x_{i}\}

,

b \in R^{m}

,

b = \{b_{j}\},

i = 1, \dots, n

,

j = 1, \dots m

and

∥ \cdot ∥

is the Euclidean norm of

R^{n}

.

In this paper, a method for solving the problem in (1) is proposed; moreover, the number of iterations (equivalent to the computational complexity) required by the proposed method with respect to m and n is locally polynomial, and in the worst-case scenario, it has a geometric convergence rate.

Let us define the set of solutions of (1) as follows

X^{*} : = \{x^{*} |x^{*} = arg min_{x \in R^{n}} {∥{(A \cdot x - b)}_{+}∥}^{2}\} .

(2)

If some point sufficiently close to the set

X^{*}

of solutions to (1) is known, then it is possible to find a solution of (1) within a polynomial number of computational iterations; thus, the computational complexity is of the order of

O (m^{3} \cdot n^{3})

.

Many methods for solving (1) have been proposed (cf. Karmanov [1], Golikov and Evtushenko [2], Evtushenko and Golikov [3], Tretyakov [4], Tretyakov and Tyrtyshnikov [5] and Han [6]). All of these methods have reasonable computational complexity but, as mentioned above, to date, no strongly polynomial-time algorithm for solving (1) has been proposed. In studies by Tretyakov and Tyrtyshnikov [7] and Mangasarian [8], linear programming problems were solved by reducing them to the unconditional minimisation of strongly convex piecewise quadratic functions. A solution is obtained within a finite polynomial number of iterations if the starting point of the algorithm belongs to a sufficiently close neighbourhood of the unique solution to the problem. Unfortunately, the authors imposed severe limitations on the functions to be minimised: they should be strongly convex, the eigenvalues of the Hessian matrices should satisfy specific conditions, etc.

These results create significant limitations on the class of problems that can be solved: it is required that (1) has only one unique solution, etc. The solution method described by Tretyakov and Tyrtyshnikov [7] is based on exploiting information about the problem being solved by analysing a sufficiently small neighbourhood of an arbitrary solution of (1). Analogous methods were proposed by Facchinei et al. [9] for the forecasting (identification) of the active constraints in a sufficiently close neighbourhood of the solution to the problem. In papers by Tretyakov and Tyrtyshnikov [5] and Wright [10], locally polynomial methods for solving quadratic programming problems based on similar ideas were presented. Tretyakov [4] proposed the gradient projection method for solving (1); this method involves finding a solution of (1) in a finite number of iterations and is a combination of iterative and straightforward (e.g., Gaussian) methods.

This paper proposes a computational method for solving (1). When the starting point of the proposed method is sufficiently close to the set

X^{*}

of solutions to (1), then its computational complexity is locally polynomial, i.e., it is of the order of

O (m^{3} \cdot n^{3})

.

We point out that solving a system of linear inequalities involves

A \cdot x - b \leq 0_{m},

(3)

where the

0_{m}

– m-dimensional vectors of zeroes can be reduced to solve the problem (1). This means that the number of computations required for establishing a solution (if a given system of linear inequalities has one) is locally polynomial.

Let us denote

X : = \{x \in R^{n} |A \cdot x - b \leq 0_{m}\} .

(4)

It is obvious that the set X might be empty in general, but our method, presented in this paper, either determines this situation in a locally polynomial number of computations or provides a solution to the system (3). The proposed method could be applied when solving large systems of linear inequalities, which appear in many practical, industrial applications, e.g., the simplex method (Pan [11]), Karmarkar’s method (Wright, [12]), Chubanov’s method (Roos [13]), and Fourier–Motzkin elimination method (Khachiyan [14], I. Šimeček, R. Fritsch, D. Langr, R. Lórencz [15]).

2. Definitions and Theoretical Results

Let

φ (x) = \frac{1}{2} \cdot {∥{(A \cdot x - b)}_{+}∥}^{2} .

(5)

Theorem 1.

The function

φ (x)

is convex and has a nonempty set of minimal values

X^{*} : = \{x^{*} \in R^{n} |φ (x^{*}) = min_{x \in R^{n}} φ (x)\} .

(6)

Proof.

Theorem 1 follows immediately from the well-known features of quadratic-type convex functions (see, e.g., [16]). □

It is obvious that the elements

x^{*} \in X^{*}

, cf. (6) satisfy

φ^{'} {(x^{*})}^{⊤} = \sum_{i = 1}^{m} {(〈a_{i}, x^{*}〉 - b_{i})}_{+} \cdot a_{i} = 0_{n} = A^{T} \cdot {(A \cdot x^{*} - b)}_{+},

(7)

where

a_{i}^{T}

is the ith row of matrix A.

Therefore, in the general case, our goal is to solve the following equation

φ^{'} {(x)}^{⊤} = \sum_{i = 1}^{m} {(〈a_{i}, x〉 - b_{i})}_{+} \cdot a_{i} = A^{T} \cdot {(A \cdot x - b)}_{+} = 0_{n}, where x \in R^{n} .

(8)

In the sequel,

x^{*}

stands for an arbitrary element of

X^{*}

(the minimum point of

φ

). If the minimum value of

φ

is equal to zero, then

X = X^{*}

, and if the minimum value of

φ

is positive, then

X = \emptyset

. Let us denote

f_{i} (x) : = 〈a_{i}, x〉 - b_{i}, i \in D = {1, \dots m},

and

\begin{matrix} J_{0} (x) & : = \{i \in D |f_{i} (x) = 0\}, J_{-} (x) : = \{i \in D |f_{i} (x) < 0\}, \\ J_{+} (x) & : = \{i \in D |f_{i} (x) > 0\}, \end{matrix}

(9)

where

f_{i} (x)

is introduced to simplify the definitions of the sets

J_{0} (x)

and

J_{+} (x)

.

According to (7) and the above notations,

x^{*}

should satisfy the formula

\sum_{i \in J_{0} (x^{*}) \cup J_{+} (x^{*})} {(〈a_{i}, x^{*}〉 - b_{i})}_{+} \cdot a_{i} = 0_{n} .

(10)

The formula (10) is equivalent to a condition that should be satisfied at point

x^{*}

\begin{matrix} \sum_{i \in J_{0} (x^{*}) \cup J_{+} (x^{*})} (〈a_{i}, x^{*}〉 - b_{i}) \cdot a_{i} = 0_{n}, \end{matrix}

(11)

In (11), it is considered that

{(〈a_{i}, x^{*}〉 - b_{i})}_{+} = 〈a_{i}, x^{*}〉 - b_{i}, i \in J_{0} (x^{*}) \cup J_{+} (x^{*}) .

This, in turn, means that, in the general case, we should solve the following equations

\sum_{i \in J_{0} (x) \cup J_{+} (x)} {(〈a_{i}, x〉 - b_{i})}_{+} \cdot a_{i} = 0_{n},

(12)

or

\begin{matrix} \sum_{i \in J_{+} (x)} & {(〈a_{i}, x〉 - b_{i})}_{+} \cdot a_{i} = 0_{n}, \\ 〈a_{i}, x〉 - b_{i} = 0, i \in J_{0} (x) . \end{matrix}

(13)

Without loss of generality, we may denote

J_{-} (x^{*}) : = \{1, \dots, l\}, J_{0} (x^{*}) : = \{l + 1, \dots, p\}, J_{+} (x^{*}) : = \{p + 1, \dots, m\},

where

l \leq p \leq m

.

The main idea exploited in this paper is based on the following Lemma. For

ε > 0

we set

U_{ε} (x^{*}) : = \{x \in R^{n} : ∥ x - x^{*} ∥ \leq ε\}

.

Lemma 1.

Let

x^{*}

be a solution to the problem (1). Then, there exists

ε > 0

, such that for any

x \in U_{ε} (x^{*})

, the inequality

f_{i} (x) \geq 0

implies the inequality

f_{i} (x^{*}) \geq 0

.

Proof.

If

i \in J_{-} (x^{*}),

that is

f_{i} (x^{*}) < 0

, then, by continuity of the function

f_{i}

, there exists

ε_{i} > 0

, such that

f_{i} (x) < 0

for all

x \in U_{ε_{i}} (x^{*})

. Set

ε = min_{i \in J_{-} (x^{*})} ε_{i} .

Then, for all

i \in J_{-} (x^{*})

and for all

x \in U_{ε} (x^{*})

, we have

f_{i} (x) < 0

. Consequently, if there exists

x \in U_{ε} (x^{*})

such that

f_{i} (x) \geq 0

with some

i \in {1, \dots, m}

, then

i \notin J_{-} (x^{*})

, that is

f_{i} (x^{*}) \geq 0

. □

By virtue of the above lemma, in a sufficiently small neighbourhood of some fixed point

x^{*} \in X^{*}

, for every

\bar{x} \in U_{ε} (x^{*})

, the following hold

J_{0} (\bar{x}) \subseteq J_{0} (x^{*}) and J_{+} (\bar{x}) \subseteq J_{0} (x^{*}) \cup J_{+} (x^{*}), J_{-} (\bar{x}) \subseteq J_{0} (x^{*}) \cup J_{-} (x^{*}) .

Now, our goal is to correctly define the sets

J_{0} (x^{*})

and

J_{+} (x^{*})

based on the information gained at point

\bar{x} \in U_{ε} (x^{*})

. Let us denote

{\bar{J}}_{0} (\bar{x}) : = J_{0} (\bar{x}), {\bar{J}}_{+} (\bar{x}) : = J_{+} (\bar{x}), {\bar{J}}_{-} (\bar{x}) : = J_{-} (\bar{x}) .

Let

A (\bar{x})

and

b (\bar{x})

represent the matrix and vector obtained from A and b, respectively. The rows of

A (\bar{x})

and the coefficients of

b (\bar{x})

correspond to the index set, which is defined by

{\bar{J}}_{0} (\bar{x}) \cup {\bar{J}}_{+} (\bar{x})

. In this case, Equations (12)–(13) may be rewritten as

\begin{matrix} A^{T} (\bar{x}) & \cdot (A (\bar{x}) \cdot x - b (\bar{x})) = 0_{n}, \\ 〈a_{i}, x〉 & - b_{i} = 0, i \in {\bar{J}}_{0} (\bar{x}) . \end{matrix}

(14)

Let

\bar{A} (\bar{x})

denote the matrix in the equations in (14) corresponding to the maximum set of linearly independent rows, and let

\bar{b} (\bar{x})

denote the corresponding vector of constant terms in (14).

The equations in (14) may be reformulated in the following way

\bar{A} (\bar{x}) \cdot x - \bar{b} (\bar{x}) = 0_{n} .

(15)

Let us observe that, at point

x^{*}

, the following holds

A^{T} (x^{*}) \cdot {(A (x^{*}) \cdot x^{*} - b (x^{*}))}_{+} = 0_{n} .

(16)

This, in turn, means that

\bar{A} (x^{*}) \cdot x^{*} - \bar{b} (x^{*}) = 0_{n} .

(17)

\begin{matrix} M (\bar{x}) : = {x \in R^{n} ∣ & \sum_{i \in {\bar{J}}_{0} (\bar{x}) \cup {\bar{J}}_{+} (\bar{x})} (〈a_{i}, x〉 - b_{i}) \cdot a_{i} = 0_{n} \\ and 〈a_{j}, x〉 - b_{j} = 0, where j \in {\bar{J}}_{0} (\bar{x})\} . \end{matrix}

(18)

If the rank of a matrix B of size

r \times n

is equal to r, then the pseudoinverse matrix (operator)

B^{+}

may be defined as

B^{+} : = B^{T} \cdot {(B \cdot B^{T})}^{- 1}

. We denote the quadratic matrix

n \times n

orthogonally projected on the space containing the rows of matrix B as

{(B^{T})}^{II} : = B^{T} {(B \cdot B^{T})}^{- 1} \cdot B = B^{+} \cdot B

, and its projection on the orthogonal complement of the matrix is denoted as

{(B^{T})}^{⊥} : =

I - {(B^{T})}^{II}

, where I is an all-ones matrix of size

n \times n .

Let a point

z (\bar{x})

be the projection of point

\bar{x}

on the set

M (\bar{x})

. Let us observe that

x^{*} \in M (\bar{x})

if

\bar{x} \in U_{ε} (x^{*})

and

ε

is sufficiently small.

Moreover, if the constraints at point

z (\bar{x})

are

f_{i} (z (\bar{x})) \leq 0

for a certain

i \in {\bar{J}}_{+} (\bar{x})

, then we define the set

I_{-}

in the following way

I_{-} = \{i \in {\bar{J}}_{+} (\bar{x}) ∣ f_{i} (z (\bar{x})) \leq 0\}; I_{-} \subseteq J_{0} (x^{*}) .

Otherwise, if the constraints at point

z (\bar{x})

are

f_{i} (z (\bar{x})) \geq 0

for a certain

i \in {\bar{J}}_{-} (\bar{x})

, we define the set

I_{+}

in an analogous way

I_{+} = \{i \in {\bar{J}}_{-} (\bar{x}) ∣ f_{i} (z (\bar{x})) \geq 0\}; I_{+} \subseteq J_{0} (x^{*}) .

Now, we redefine

{\bar{J}}_{0} (\bar{x})

,

{\bar{J}}_{+} (\bar{x})

and

{\bar{J}}_{-} (\bar{x})

as follows

{\bar{J}}_{0} (\bar{x}) : = {\bar{J}}_{0} (\bar{x}) \cup I_{-} \cup I_{+}, {\bar{J}}_{+} (\bar{x}) : = {\bar{J}}_{+} (\bar{x}) \ I_{-}, {\bar{J}}_{-} (\bar{x}) : = {\bar{J}}_{-} (\bar{x}) \ I_{+} .

(19)

Next, we project point

\bar{x}

on the new set

M (\bar{x})

, cf. (18), and a new point

z (\bar{x})

is obtained.

Let

z (x) = P_{M (\bar{x})} (x) = {(A^{T} (\bar{x}))}^{⊥} \cdot x + {\bar{A}}^{+} (\bar{x}) \cdot \bar{b} (\bar{x})

(20)

define the operator for the projection of point x on set

M (\bar{x}) .

3. Algorithm for Finding the Solution of (1)

In this section, the algorithm designed to find the solution to (1) is presented. The main idea of this algorithm is based on information related to a current point

\bar{x}

belonging to a sufficiently small neighbourhood of the point

x^{*} \in X^{*}

. We also demonstrate how to find such a point. The proposed method comprises two algorithms. The starting point of the method can be arbitrary, because Algorithm 2 (gradient method with a special step selection) starts at an arbitrary point and, on a certain iteration, it will provide a point arbitrarily close to the solution set. Therefore, Algorithm 1 could start at the point specified by Algorithm 2.

Algorithm 1.

Initialisation Step: For the current point

\bar{x}

, the sets of indices

J_{0} (\bar{x})

,

J_{-} (\bar{x})

and

J_{+} (\bar{x})

are defined according to (9). If set

J_{+} (\bar{x}) = \emptyset

, then

\bar{x}

is the solution of (1) and Algorithm 1 is terminated. Otherwise, the Main Recursive Step is performed.

Main Recursive Step: Let

z (\bar{x})

, the projection of point

\bar{x}

on the set

M (\bar{x})

, be defined according to (20). We check if the following condition is satisfied

I_{+} = \emptyset and I_{-} = \emptyset .

(21)

Checking Step: If (21) holds, then

z (\bar{x}) \in X^{*}

, and Equation (10) is satisfied;

z (\bar{x})

is the solution of (1), as defined in (2), and Algorithm 1 is terminated. Otherwise, if for certain values of

i \in D

the condition (21) is violated and

i \in I_{+} \cup I_{-}

, we define

{\bar{J}}_{0} (\bar{x})

,

{\bar{J}}_{+} (\bar{x})

and

{\bar{J}}_{-} (\bar{x})

according to (19),

M (\bar{x})

is redefined according to (18), and the Main Recursive Step is repeated.

The set

D

is finite, and

|D| = m

; therefore, the number of changes to the index sets

{\bar{J}}_{0} (\bar{x})

,

{\bar{J}}_{+} (\bar{x})

and

{\bar{J}}_{-} (\bar{x})

does not exceed m, and finally, the point

z (\bar{x})

fulfilling (12) is established. This means that

z (\bar{x})

is the solution of (1), as defined in (2).

It is of utmost importance that

\bar{x}

belongs to a sufficiently small neighbourhood of the point

x^{*}

, because otherwise,

z (\bar{x})

may not satisfy (12). If this is not the case, it is necessary to find another point

\bar{x}

that is closer to

x^{*}

. The process for accomplishing this is described below.

Theorem 2.

For a sufficiently small

ε > 0

and for every

\bar{x} \in U_{ε} (x^{*})

, Algorithm 1 provides

z^{*} = z (\bar{x})

as the solution for

φ^{'} (x) = A^{T} (x) \cdot {(A (x) \cdot x - b)}_{+} = 0_{n},

(22)

and this is equivalent to finding the solution for (12) within a number of iterations of the order of

0 (m^{3} \cdot n^{3})

.

Proof.

The proof is based on the observation that for

\bar{x}

belonging to a sufficiently small neighbourhood of the point

x^{*}

, according to Lemma 1, the constraints

f_{i} (\bar{x}) \geq 0

correspond to constraints

f_{i} (x^{*}) \geq 0

. Therefore,

{\bar{J}}_{0} (\bar{x}) \cup {\bar{J}}_{+} (\bar{x}) \subseteq J_{0} (x^{*}) \cup J_{+} (x^{*}) .

Let us determine

z (\bar{x})

as the projection of the point

\bar{x}

on the set

M (\bar{x})

, which is defined according to (18). It may happen that the set

{\bar{J}}_{0} (\bar{x})

becomes enlarged. However, the number of iterations required when

{\bar{J}}_{0} (\bar{x})

becomes enlarged does not exceed m, the number of elements in the set D. Therefore, at some iteration, (21) is satisfied. This means that

z (\bar{x})

satisfies (12) or, equivalently,

φ^{'} (z (\bar{x})) = 0_{n}

. This demonstrates that

z (\bar{x})

is the solution for (1), as defined in (2). The computational complexity of establishing each projection

z (\bar{x})

is of order

0 (m^{2} \cdot n^{3})

; this process takes the computational effort related to the multiplications of matrices into account. The number of iterations does not exceed m and, therefore, the overall computational complexity is of order

0 (m^{3} \cdot n^{3})

. □

To complement the presentation of this chapter, the gradient method for establishing

\bar{x}

belonging to the sufficiently small neighbourhood

U_{ε} (x^{*})

of some fixed solution

x^{*} \in X^{*}

to (1) is described. This gradient method has the following scheme

x_{k + 1} = x_{k} - α \cdot φ^{'} (x_{k})

(23)

where

α = \frac{1}{L}

and gradient

φ^{'} (x_{k})

fulfils the Lipschitz condition

|φ^{'} (x_{k + 1}) - φ^{'} (x_{k})| \leq L \cdot |x_{k + 1} - x_{k}| where L = 2 \cdot ∥A^{T} \cdot A∥ .

The convergence of the gradient method (23) is considered in the following theorem, cf. Karmanov [1].

Theorem 3.

Let

x_{0} \in R^{n}

and the sequence

\{x_{k}\}

,

k = 0, 1, 2 \dots

, be constructed according to (23). Then,

x_{k} \to x^{*}, x^{*} \in X^{*}, where k \to \infty and ∥x_{k + 1} - y∥ < ∥x_{k} - y∥ \forall y \in X^{*} .

Proof.

The scheme in (23) produces a sequence that converges to a certain

x^{*} \in X^{*}

. Moreover, for every sufficiently small

ε > 0

, there exists

\bar{k} = k (ε)

such that

\{x_{k}\} \in U_{ε} (x^{*})

, for all

k \geq \bar{k}

. This, in turn, means that at iteration

\bar{k}

, the hypothesis of Theorem 2 is satisfied, and we obtain a solution to (1). □

Now, we have all the necessary prerequisites to present the solution algorithm for (3).

Algorithm 2.

Initialisation Step: Let

k = 0

, and let

x_{0}

be an arbitrary point in

R^{n}

.

Main Recursive Step: Let

x_{k + 1} = x_{k} - α \cdot φ^{'} (x_{k}) .

Checking Step: If

z (x_{k})

is the solution for (3), then Algorithm 2 is terminated. Otherwise, we set

k : = k + 1

, and the Main Recursive Step is repeated.

Theorem 4.

There exists a finite

\bar{k}

such that

z (x_{\bar{k}}) \in X^{*}

and

z (x_{\bar{k}})

is the solution for (3).

Proof.

The sequence

\{x_{k}\}

converges to a fixed

x^{*} \in X^{*}

and therefore, at a certain iteration

\bar{k}

, the hypothesis of Theorem 2 is satisfied, and we obtain the solution

z^{*} = P_{M (x_{\bar{k}})} \in X^{*}

. □

Theorem 4 allows us to establish whether (3) has a solution or not.

Corollary 1.

If

z^{*} \in X,

then

z^{*}

is the solution of (3). Otherwise (3) has no solutions.

4. Conclusions and Appendix

As previously mentioned, the locally polynomial complexity estimate is valid only if the starting point of the proposed method belongs to a sufficiently small neighbourhood of the set of solutions

X^{*}

. To reach such a desired point, the gradient method (23) is used. There are accelerated gradient methods (see those of Nesterov [17] and Poliak [18]), but these methods do not guarantee monotonic convergence to a set of solutions

X^{*}

. The method presented in this paper monotonically converges to a certain point

x^{*}

,

x^{*} \in X^{*}

. It is obvious that the point

x^{*}

depends on the initial point

x_{0}

and, therefore, the number of iterations required by the gradient method for entering the proper neighbourhood of point

x^{*}

depends on the position of the initial point

x_{0}

. Moreover, the

ε

radius of the neighbourhood of point

x^{*}

, which the gradient method should reach, is unknown in the general case and depends on the specific problem being considered. However, it appears that we can guarantee a geometric convergence rate for the gradient method (23) while minimising piecewise quadratic functions of the form (5).

Namely, for every strongly convex function

ψ (x),

the gradient method (23) has a geometric convergence rate, i.e.,

ψ (x_{k}) - ψ^{*} \leq c \cdot δ^{k}, where 0 < δ < 1, c > 0,

where c is a constant that is independent of the size of the problem but depends on the initial point

x_{0}

. In the general case, for functions that are not convex in the strongest sense, there is no proof of the geometric convergence of the gradient method (23). However, in the case where the function

φ (x)

is given by (5), it is possible to prove the geometric convergence of the gradient method (23). Let

\begin{matrix} l (x_{k}) & = \{x^{*} + β \cdot (x_{k} - x^{*}), β \geq 0\} and M (s_{k}) = \{x^{*} + β \cdot s_{k}, β \geq 0\}, \\ s_{k} & = \frac{x_{k} - x^{*}}{∥x_{k} - x^{*}∥} . \end{matrix}

The theorem presented below proves the strong convexity of the function

φ (x)

in the cone of convergence.

Theorem 5.

The elements of the sequence

\{x_{k}\},

defined by (23), belong to the cone of strong convexity of the function

φ (x)

, namely

\forall x

,

y \in l (x_{k})

, the function

φ (x)

is uniformly strongly convex for the sequence

\{x_{k}\}

, i.e.,

φ (λ \cdot x + (1 - λ) \cdot y) \leq λ \cdot φ (x) + (1 - λ) \cdot φ (y) - γ \cdot λ \cdot (1 - λ) \cdot {∥x - y∥}^{2}

(24)

where

λ \in [0, 1]

,

x, y \in l (x_{k})

,

k = 0, 1 \dots

, and

γ > 0

.

Proof.

First, it should be pointed out that because the second derivative of the function

φ (x)

has a finite number of points of discontinuity

\bar{S} \in R^{n}

in every direction, i.e., on the ray

x^{*} + λ \cdot \bar{S}

, there exists

σ > 0

such that on the closed interval

[x^{*}, x^{*} + σ \cdot \bar{S}]

, the function

φ (x)

has a continuous second derivative that obviously depends on

\bar{S}

. Let us assume that the theorem does not hold, i.e., there is not

γ > 0

, such that (24) holds. This means that for

l (x_{k}) = \{x^{*} + β \cdot s_{k}, β \geq 0\}

the following must hold

\frac{\partial^{2} φ (x^{*})}{\partial s_{k}^{2}} = γ_{k} \to 0 when k \to \infty,

(25)

or

\frac{\partial^{2} φ (x^{*})}{\partial s_{k}^{2}} = 〈A^{T} \cdot A \cdot s_{k}, s_{k}〉 = γ_{k} \to 0 when k \to \infty .

For vector

s = lim_{k \to \infty} s_{k}

, the following condition

〈A^{T} \cdot A \cdot s, s〉 = 0

holds, or, due to the construction of

φ (x)

,

φ (x^{*} + β \cdot s) = 0 = φ (x^{*}) = min {∥{(A \cdot x - b)}_{+}∥}^{2},

where

β \in [0, \bar{β}]

,

\bar{β} > 0

is a certain fixed constant. Let

x_{k}^{*}

be (locally) the projection of

x_{k}

on the set

M (s) \in X^{*}

. Then, due to

s_{k} \to s

,

k \to \infty

, we have

∥x_{k} - x_{k}^{*}∥ = δ_{k} \cdot ∥x_{k} - x^{*}∥, where δ_{k} \to 0, k \to \infty .

(26)

Let us set

δ_{k}

sufficiently small and consider the points

x_{k + r}

,

r = 1, 2,

…. Then, according to Theorem 3, we have

∥x_{k + r} - x_{k}^{*}∥ < ∥x_{k} - x_{k}^{*}∥ .

(27)

On the other hand, according to (26), when

r \to \infty

\begin{matrix} ∥x_{k + r} - x_{k}^{*}∥ & \geq ∥x_{k}^{*} - x^{*}∥ - ∥x_{k + r} - x^{*}∥ \geq \\ ∥x_{k} - x^{*}∥ & - ∥x_{k} - x_{k}^{*}∥ - ∥x_{k + r} - x^{*}∥ \geq \\ \frac{1}{δ_{k}} ∥x_{k} - x_{k}^{*}∥ & - ∥x_{k} - x_{k}^{*}∥ - ∥x_{k + r} - x^{*}∥ > ∥x_{k} - x_{k}^{*}∥ . \end{matrix}

This is contradictory to (27), and therefore Theorem 5 holds. □

Theorem 5 allows for the estimation of the convergence rate of the gradient method (23).

Theorem 6.

Under the assumptions of Theorem 5 for the sequence

\{x_{k}\},

constructed according to (23), the following convergence rates hold

φ (x_{k}) - φ^{*} \leq c_{1} \cdot τ^{k} and ∥x_{k} - x^{*}∥ \leq c_{2} \cdot τ^{\frac{k}{2}},

(28)

where

τ \in (0, 1)

, and

c_{1}

and

c_{2}

> 0

; the constants

c_{1}

,

c_{2}

are independent of the value of k but depend on the initial point

x_{0}

.

Proof.

Let us denote

μ_{k} = φ (x_{k}) - φ^{*} .

For the sequence

\{x_{k}\}

and

q \in (\frac{1}{2}, 1)

the following holds

\begin{matrix} φ (x_{k}) - φ (x_{k + 1}) & \geq α \cdot q \cdot {∥φ^{'} (x_{k})∥}^{2} \geq α \cdot q \cdot {〈φ^{'} (x_{k}), s_{k}〉}^{2} = \frac{\partial^{2} φ (x_{k})}{\partial s_{k}^{2}} \geq \\ \geq α \cdot q \cdot γ^{2} \cdot (φ (x_{k}) - φ^{*}) \end{matrix}

(29)

or, equivalently,

μ_{k} - μ_{k + 1} \geq α \cdot q \cdot γ^{2} μ_{k} .

Therefore, for

τ \in (0, 1)

, the following holds

μ_{k} \leq c_{1} \cdot τ^{k} or, equivalently, φ (x_{k}) - φ^{*} \leq c_{1} \cdot τ^{k} .

This proves the first part of (28), while the latter part of (28) follows from the strong convexity of the function

φ (x)

in the cone of convergence. □

The conduction computational experiments and comparison of the presented method with other methods in the literature remains a topic for future research.

Author Contributions

Conceptualisation, A.T. and K.S.; methodology, A.P., K.S. and A.T.; validation, A.P., K.S. and A.T.; formal analysis, A.P.; investigation, A.P., K.S. and A.T.; resources, A.T.; writing—original draft preparation, A.P. and K.S.; supervision, A.T.; project administration, A.P.; funding acquisition, A.P. and A.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science and Higher Education, grant number 61/20/B.

Acknowledgments

The research of the third author was supported by Russian Sciences Foundation (project No 21-71-30005).

Conflicts of Interest

The authors declare no conflict of interest. Formal analysis, funding acquisition, investigation, methodology, project administration, validation, writing—original draft, A.P.; conceptualization, investigation, methodology, validation, writing—original draft, K.S.; conceptualization, funding acquisition, investigation, methodology, resources, supervision, validation, A.T.

References

Karmanov, V.G. Mathematical Programming; Mir Publishers: Moscow, Russia, 1989. [Google Scholar]
Golikov, A.; Evtushenko, Y.G. Theorems of the alternative and their applications in numerical methods. Comput. Math. Math. Phys. 2003, 43, 338–358. [Google Scholar]
Evtushenko, Y.G.; Golikov, A. New perspective on the theorems of alternative. In High Performance Algorithms and Software for Nonlinear Optimization; Springer: Berlin/Heidelberger, Germany, 2003; pp. 227–241. [Google Scholar]
Tretyakov, A. A finite-termination gradient projection method for solving systems of linear inequalities. Russ. J. Numer. Anal. Model. 2010, 25, 279–288. [Google Scholar] [CrossRef]
Tretyakov, A.; Tyrtyshnikov, E. Exact differentiable penalty for a problem of quadratic programming with the use of a gradient-projective method. Russ. J. Numer. Anal. Model. 2015, 30, 121–128. [Google Scholar] [CrossRef]
Han, S.-P. Least-Squares Solution of Linear Inequalities; Technical Report; Wisconsin Univ-Madison Mathematics Research Center: Madison, WI, USA, 1980. [Google Scholar]
Tretyakov, A.; Tyrtyshnikov, E. A finite gradient-projective solver for a quadratic programming problem. Russ. J. Numer. Anal. Model. 2013, 28, 289–300. [Google Scholar] [CrossRef]
Mangasarian, O. A Finite Newton Method for Classification Problems; Technical Report, Technical Report 01-11; Data Mining Institute, Computer Sciences Department, University of Wisconsin: Madison, WI, USA, 2001. [Google Scholar]
Facchinei, F.; Fischer, A.; Kanzow, C. On the accurate identification of active constraints. SIAM 1998, 9, 14–32. [Google Scholar] [CrossRef]
Wright, S.J. An algorithm for degenerate nonlinear programming with rapid local convergence. SIAM 2005, 15, 673–696. [Google Scholar] [CrossRef] [Green Version]
Pan, P.Q. A Projective Simplex Algorithm Using LU Decomposition. Comput. Math. Appl. 2000, 39, 187–208. [Google Scholar] [CrossRef] [Green Version]
Wright, M.H. The interior-point revolution in optimization: History, recent developments, and lasting consequences. Bull. Am. Math. Soc. 2005, 42, 39–56. [Google Scholar] [CrossRef] [Green Version]
Roos, K. An improved version of Chubanov’s method for solving a homogeneous feasibility problem. Optim. Methods. Softw. 2018, 33, 26–44. [Google Scholar] [CrossRef] [Green Version]
Khachiyan, L. Fourier-Motzkin elimination method. In Encyclopedia of Optimization; Floudas, C.A., Pardalos, P.M., Eds.; Springer: Berlin/Heidelberger, Germany, 2009; pp. 1074–1077. [Google Scholar]
Šimeček, I.; Fritsch, R.; Langr, D.; Lórencz, R. Paralel solver of large systems of linear inequalities using Fourier-Motzkin elimination. Comput. Inform. 2016, 35, 1307–1337. [Google Scholar]
Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1970. [Google Scholar]
Nesterov, Y. Nesterov. One class of methods of unconditional minimization of a convex function, having a high rate of convergence. USSR 1984, 24, 80–82. [Google Scholar]
Poliak, B. Introduction to Optimization; Optimization Software, Inc.: New York, NY, USA, 1987. [Google Scholar]

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Prusińska, A.; Szkatuła, K.; Tret’yakov, A. On the Locally Polynomial Complexity of the Projection-Gradient Method for Solving Piecewise Quadratic Optimisation Problems. Entropy 2021, 23, 465. https://doi.org/10.3390/e23040465

AMA Style

Prusińska A, Szkatuła K, Tret’yakov A. On the Locally Polynomial Complexity of the Projection-Gradient Method for Solving Piecewise Quadratic Optimisation Problems. Entropy. 2021; 23(4):465. https://doi.org/10.3390/e23040465

Chicago/Turabian Style

Prusińska, Agnieszka, Krzysztof Szkatuła, and Alexey Tret’yakov. 2021. "On the Locally Polynomial Complexity of the Projection-Gradient Method for Solving Piecewise Quadratic Optimisation Problems" Entropy 23, no. 4: 465. https://doi.org/10.3390/e23040465

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

On the Locally Polynomial Complexity of the Projection-Gradient Method for Solving Piecewise Quadratic Optimisation Problems

Abstract

1. Introduction

2. Definitions and Theoretical Results

3. Algorithm for Finding the Solution of (1)

4. Conclusions and Appendix

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI