1. Introduction
Let us consider the following optimisation problem:
$$\min_{x\in\mathbb{R}^{n}} f(x),\qquad f(x)=\frac{1}{2}\,\big\|(Ax-b)_{+}\big\|^{2}=\frac{1}{2}\sum_{i=1}^{m}\big(\langle a_{i},x\rangle-b_{i}\big)_{+}^{2}, \tag{1}$$
where $x\in\mathbb{R}^{n}$, $A$ is an $m\times n$ matrix, $a_{i}\in\mathbb{R}^{n}$ is the $i$th row of $A$, $b\in\mathbb{R}^{m}$, $(y)_{+}=\max\{0,y\}$, $\langle\cdot,\cdot\rangle$ is the Euclidean inner product, and $\|z\|$ is the Euclidean norm of $z$.
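For concreteness, the following minimal sketch evaluates the objective of (1) on a small instance (the function and variable names are ours, introduced only for illustration):

```python
import numpy as np

def f(x, A, b):
    """Objective of (1): f(x) = 0.5 * ||(Ax - b)_+||^2."""
    r = np.maximum(A @ x - b, 0.0)   # componentwise positive part (y)_+
    return 0.5 * r @ r

# Small instance: the system Ax <= b is feasible, so min f = 0.
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([2.0, 0.0, 0.0])
print(f(np.array([0.5, 0.5]), A, b))  # 0.0: a feasible point
print(f(np.array([3.0, 3.0]), A, b))  # 8.0: an infeasible point
```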
In this paper, a method for solving the problem in (1) is proposed; moreover, the number of iterations (equivalently, the computational complexity) required by the proposed method with respect to $m$ and $n$ is locally polynomial, and in the worst-case scenario, the method has a geometric convergence rate.
Let us define the set of solutions of (1) as follows:
$$X^{*}=\{x^{*}\in\mathbb{R}^{n} : f(x^{*})\le f(x)\ \ \forall x\in\mathbb{R}^{n}\}. \tag{2}$$
If some point sufficiently close to the set $X^{*}$ of solutions to (1) is known, then it is possible to find a solution of (1) within a polynomial number of computational iterations; thus, the computational complexity is polynomial in $m$ and $n$.
Many methods for solving (1) have been proposed (cf. Karmanov [1], Golikov and Evtushenko [2], Evtushenko and Golikov [3], Tretyakov [4], Tretyakov and Tyrtyshnikov [5], and Han [6]). All of these methods have reasonable computational complexity but, as mentioned above, to date, no strongly polynomial-time algorithm for solving (1) has been proposed. In studies by Tretyakov and Tyrtyshnikov [7] and Mangasarian [8], linear programming problems were solved by reducing them to the unconstrained minimisation of strongly convex piecewise quadratic functions. A solution is obtained within a finite polynomial number of iterations if the starting point of the algorithm belongs to a sufficiently close neighbourhood of the unique solution to the problem. Unfortunately, the authors imposed severe limitations on the functions to be minimised: they should be strongly convex, the eigenvalues of the Hessian matrices should satisfy specific conditions, etc.
These results impose significant limitations on the class of problems that can be solved: it is required that (1) has a unique solution, etc. The solution method described by Tretyakov and Tyrtyshnikov [7] is based on exploiting information about the problem being solved by analysing a sufficiently small neighbourhood of an arbitrary solution of (1). Analogous methods were proposed by Facchinei et al. [9] for the identification of the active constraints in a sufficiently close neighbourhood of the solution to the problem. In papers by Tretyakov and Tyrtyshnikov [5] and Wright [10], locally polynomial methods for solving quadratic programming problems based on similar ideas were presented. Tretyakov [4] proposed the gradient projection method for solving (1); this method finds a solution of (1) in a finite number of iterations and is a combination of iterative and direct (e.g., Gaussian elimination) methods.
This paper proposes a computational method for solving (1). When the starting point of the proposed method is sufficiently close to the set $X^{*}$ of solutions to (1), its computational complexity is locally polynomial, i.e., it is of the order of a polynomial in $m$ and $n$.
We point out that solving a system of linear inequalities $Ax\le b$, i.e., finding $x$ such that $(Ax-b)_{+}=0_{m}$, where $0_{m}$ is the $m$-dimensional vector of zeroes, can be reduced to solving the problem (1). This means that the number of computations required for establishing a solution (if a given system of linear inequalities has one) is locally polynomial.
Let us denote
$$X=\{x\in\mathbb{R}^{n} : \langle a_{i},x\rangle\le b_{i},\ i=1,\dots,m\}=\{x\in\mathbb{R}^{n} : Ax\le b\}. \tag{3}$$
It is obvious that the set $X$ might be empty in general, but the method presented in this paper either determines this situation in a locally polynomial number of computations or provides a solution to the system (3). The proposed method could be applied when solving the large systems of linear inequalities that appear in many practical, industrial applications, e.g., within the simplex method (Pan [11]), Karmarkar's method (Wright [12]), Chubanov's method (Roos [13]), and the Fourier–Motzkin elimination method (Khachiyan [14], Šimeček et al. [15]).
2. Definitions and Theoretical Results
Theorem 1. The function
$$f(x)=\frac{1}{2}\sum_{i=1}^{m}\big(\langle a_{i},x\rangle-b_{i}\big)_{+}^{2} \tag{5}$$
is convex and has a nonempty set of minimal points
$$X^{*}=\operatorname*{Argmin}_{x\in\mathbb{R}^{n}}f(x). \tag{6}$$

Proof. Theorem 1 follows immediately from the well-known properties of quadratic-type convex functions (see, e.g., [16]). □
It is obvious that the elements $x^{*}\in X^{*}$, cf. (6), satisfy
$$\nabla f(x^{*})=\sum_{i=1}^{m}\big(\langle a_{i},x^{*}\rangle-b_{i}\big)_{+}\,a_{i}=0, \tag{7}$$
where $a_{i}$ is the $i$th row of matrix $A$.
Therefore, in the general case, our goal is to solve the following equation:
$$\nabla f(x)=\sum_{i=1}^{m}\big(\langle a_{i},x\rangle-b_{i}\big)_{+}\,a_{i}=0. \tag{8}$$
In the sequel, $x^{*}$ stands for an arbitrary element of $X^{*}$ (a minimum point of $f$). If the minimum value of $f$ is equal to zero, then $X^{*}=X$, and if the minimum value of $f$ is positive, then $X=\emptyset$. Let us denote
$$D_{+}(x)=\{i : d_{i}(x)>0\},\qquad D_{0}(x)=\{i : d_{i}(x)=0\},\qquad D_{-}(x)=\{i : d_{i}(x)<0\}, \tag{9}$$
where $d_{i}(x)=\langle a_{i},x\rangle-b_{i}$, $i\in\{1,\dots,m\}$, is introduced to simplify the definitions of the sets $D_{+}(x)$, $D_{0}(x)$ and $D_{-}(x)$.
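Equations (8) and (9) translate directly into code; the sketch below (our notation, with a tolerance parameter tol added for floating-point comparisons) computes the gradient and the index sets:

```python
import numpy as np

def grad_f(x, A, b):
    """Gradient (8): sum of (a_i.x - b_i) * a_i over violated constraints."""
    return A.T @ np.maximum(A @ x - b, 0.0)

def index_sets(x, A, b, tol=1e-10):
    """Index sets D_+, D_0, D_- of (9); tol guards against rounding errors."""
    d = A @ x - b                           # d_i(x) = <a_i, x> - b_i
    return (np.where(d > tol)[0],           # D_+
            np.where(np.abs(d) <= tol)[0],  # D_0
            np.where(d < -tol)[0])          # D_-
```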
According to (7) and the above notations, $x^{*}$ should satisfy the formula
$$\sum_{i\in D_{+}(x^{*})} d_{i}(x^{*})\,a_{i}=0. \tag{10}$$
The formula (10) is equivalent to a condition that should be satisfied at point $x^{*}$:
$$\sum_{i\in D_{+}(x^{*})\cup D_{0}(x^{*})} d_{i}(x^{*})\,a_{i}=0. \tag{11}$$
In (11), it is considered that $d_{i}(x^{*})=0$ for all $i\in D_{0}(x^{*})$. This, in turn, means that, in the general case, we should solve the following equations:
$$\sum_{i\in D_{+}(x^{*})} d_{i}(x)\,a_{i}=0,\qquad d_{i}(x)=0,\ i\in D_{0}(x^{*}), \tag{12}$$
or
$$\sum_{i\in D_{+}(x^{*})\cup D_{0}(x^{*})} d_{i}(x)\,a_{i}=0. \tag{13}$$
Without loss of generality, we may denote $D(x^{*})=D_{+}(x^{*})\cup D_{0}(x^{*})$, where $D(x^{*})\subseteq\{1,\dots,m\}$.
The main idea exploited in this paper is based on the following lemma. For $\varepsilon>0$, we set $B(x^{*},\varepsilon)=\{x\in\mathbb{R}^{n} : \|x-x^{*}\|\le\varepsilon\}$.
Lemma 1. Let $x^{*}$ be a solution to the problem (1). Then, there exists $\varepsilon>0$ such that, for any $x\in B(x^{*},\varepsilon)$, the inequality $d_{i}(x)\ge 0$ implies the inequality $d_{i}(x^{*})\ge 0$.

Proof. If $d_{i}(x^{*})<0$, that is, $i\in D_{-}(x^{*})$, then, by continuity of the function $d_{i}(\cdot)$, there exists $\varepsilon_{i}>0$ such that $d_{i}(x)<0$ for all $x\in B(x^{*},\varepsilon_{i})$. Set
$$\varepsilon=\min_{i\in D_{-}(x^{*})}\varepsilon_{i}.$$
Then, for all $x\in B(x^{*},\varepsilon)$ and for all $i\in D_{-}(x^{*})$, we have $d_{i}(x)<0$. Consequently, if there exists $i$ such that $d_{i}(x)\ge 0$ with some $x\in B(x^{*},\varepsilon)$, then $i\notin D_{-}(x^{*})$, that is, $d_{i}(x^{*})\ge 0$. □
By virtue of the above lemma, in a sufficiently small neighbourhood of some fixed point $x^{*}$, for every $x\in B(x^{*},\varepsilon)$, the following hold:
$$D_{+}(x)\cup D_{0}(x)\subseteq D_{+}(x^{*})\cup D_{0}(x^{*}),\qquad D_{-}(x^{*})\subseteq D_{-}(x).$$
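These inclusions can be observed numerically: at a point near a solution, constraints that are strictly satisfied at $x^{*}$ remain strictly satisfied. A small illustration (the instance and the perturbation are ours):

```python
import numpy as np

A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([2.0, 0.0, 0.0])
x_star = np.array([1.0, 1.0])        # boundary solution: d_1(x*) = 0
d_star = A @ x_star - b              # [0., -1., -1.]
for eps in (1e-1, 1e-3):
    x = x_star + eps * np.array([0.3, -0.2])   # a point in B(x*, eps)
    d = A @ x - b
    # constraints strictly satisfied at x* stay strictly satisfied at x:
    print(np.all(d[d_star < 0] < 0))           # True, True
```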
Now, our goal is to correctly define the sets $D_{+}$ and $D_{0}$ based on the information gained at point $x^{k}$. Let us denote
$$D(x^{k})=D_{+}(x^{k})\cup D_{0}(x^{k}).$$
Let $\bar A$ and $\bar b$ represent the matrix and vector obtained from $A$ and $b$, respectively. The rows of $\bar A$ and the coefficients of $\bar b$ correspond to the index set, which is defined by $D(x^{k})$. In this case, Equations (12) and (13) may be rewritten as
$$\bar A x=\bar b. \tag{14}$$
Let $B$ denote the matrix composed of a maximum set of linearly independent rows of the equations in (14), and let $\hat b$ denote the corresponding vector of constant terms in (14). The equations in (14) may be reformulated in the following way:
$$Bx=\hat b. \tag{15}$$
Let us observe that, at point $x^{*}$, the following holds:
$$Bx^{*}=\hat b. \tag{16}$$
This, in turn, means that
$$x^{*}\in\{x\in\mathbb{R}^{n} : Bx=\hat b\}.$$
If the rank of a matrix $B$ of size $r\times n$ is equal to $r$, then the pseudoinverse matrix (operator) $B^{+}$ may be defined as $B^{+}=B^{T}(BB^{T})^{-1}$. We denote the square matrix that orthogonally projects onto the space spanned by the rows of matrix $B$ by $P_{B}=B^{+}B$, and the projection on the orthogonal complement of this space is denoted by $P_{B}^{\perp}=I-B^{+}B$, where $I$ is the identity matrix of size $n\times n$.
Let a point $\hat x^{k}$ be the projection of point $x^{k}$ on the set $\{x : Bx=\hat b\}$, i.e., $\hat x^{k}=x^{k}-B^{+}(Bx^{k}-\hat b)$. Let us observe that $\hat x^{k}\in B(x^{*},\varepsilon)$ if $x^{k}\in B(x^{*},\varepsilon)$ and $\varepsilon$ is sufficiently small.
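A minimal sketch of this projection, assuming the pseudoinverse-based formula above (np.linalg.pinv also covers the rank-deficient case):

```python
import numpy as np

def project_affine(x, B, b_hat):
    """Orthogonal projection of x onto {z : Bz = b_hat}."""
    return x - np.linalg.pinv(B) @ (B @ x - b_hat)

B = np.array([[1.0, 0.0, 1.0]])
b_hat = np.array([1.0])
p = project_affine(np.array([2.0, 5.0, 2.0]), B, b_hat)
print(B @ p)   # [1.]: the projected point satisfies the equations
```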
Moreover, if the constraints at point $\hat x^{k}$ are violated, i.e., $d_{i}(\hat x^{k})>0$ for a certain $i\notin D(x^{k})$, then we define the set $X^{k}$ in the following way:
$$X^{k}=\big\{x\in\mathbb{R}^{n} : d_{i}(x)=0,\ i\in D(x^{k})\cup\{j : d_{j}(\hat x^{k})>0\}\big\}. \tag{17}$$
Otherwise, if the constraints at point $\hat x^{k}$ are active, i.e., $d_{i}(\hat x^{k})=0$ for a certain $i\notin D(x^{k})$, we define the set $X^{k}$ in an analogous way:
$$X^{k}=\big\{x\in\mathbb{R}^{n} : d_{i}(x)=0,\ i\in D(x^{k})\cup\{j : d_{j}(\hat x^{k})=0\}\big\}. \tag{18}$$
Now, we redefine $D_{+}$, $D_{0}$ and $D_{-}$ as follows:
$$D_{+}=D_{+}(\hat x^{k}),\qquad D_{0}=D_{0}(\hat x^{k}),\qquad D_{-}=D_{-}(\hat x^{k}). \tag{19}$$
Next, we project point $x^{k}$ on the new set $X^{k}$, cf. (18), and a new point $\hat x^{k}$ is obtained.
Let $P_{X^{k}}$ define the operator for the projection of point $x$ on set $X^{k}$:
$$P_{X^{k}}(x)=x-B^{+}(Bx-\hat b), \tag{20}$$
where $B$ and $\hat b$ are determined by the index set that defines $X^{k}$.
3. Algorithm for Finding the Solution of (1)
In this section, the algorithm designed to find the solution to (1) is presented. The main idea of this algorithm is based on information related to a current point $x^{k}$ belonging to a sufficiently small neighbourhood of the point $x^{*}$. We also demonstrate how to find such a point. The proposed method comprises two algorithms. The starting point of the method can be arbitrary, because Algorithm 2 (a gradient method with a special step selection) starts at an arbitrary point and, at a certain iteration, provides a point arbitrarily close to the solution set. Therefore, Algorithm 1 can start at the point specified by Algorithm 2.
Algorithm 1.
Initialisation Step: For the current point $x^{k}$, the sets of indices $D_{+}$, $D_{0}$ and $D_{-}$ are defined according to (9). If the set $D_{+}=\emptyset$, then $x^{k}$ is the solution of (1) and Algorithm 1 is terminated. Otherwise, the Main Recursive Step is performed.
Main Recursive Step: Let $\hat x^{k}$, the projection of point $x^{k}$ on the set $X^{k}$, be defined according to (20). We check if the following condition is satisfied:
$$d_{i}(\hat x^{k})<0\qquad\text{for all } i\notin D(x^{k}). \tag{21}$$
Checking Step: If (21) holds, then $D(\hat x^{k})\subseteq D(x^{k})$, and Equation (10) is satisfied; $\hat x^{k}$ is the solution of (1), as defined in (2), and Algorithm 1 is terminated. Otherwise, if for certain values of $i\notin D(x^{k})$ the condition (21) is violated, i.e., $d_{i}(\hat x^{k})\ge 0$, we define $D_{+}$, $D_{0}$ and $D_{-}$ according to (19), $X^{k}$ is redefined according to (18), and the Main Recursive Step is repeated.
The set $D(x^{k})$ is finite, and $D(x^{k})\subseteq\{1,\dots,m\}$; therefore, the number of changes to the index sets $D_{+}$, $D_{0}$ and $D_{-}$ does not exceed $m$, and finally, the point $\hat x^{k}$ fulfilling (12) is established. This means that $\hat x^{k}$ is the solution of (1), as defined in (2).
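The following sketch illustrates our simplified reading of Algorithm 1 for the feasible case: project onto the equality system of the current index set and enlarge the index set while new constraints become violated, at most $m$ times. It is an illustration of the structure, not a verbatim implementation of the algorithm:

```python
import numpy as np

def algorithm1_sketch(x, A, b, tol=1e-10):
    """Simplified feasible-case sketch of Algorithm 1: project onto the
    equality system of the current index set D and enlarge D while new
    constraints become violated (at most m index-set changes)."""
    m = A.shape[0]
    d = A @ x - b
    D = set(np.where(d > -tol)[0])               # D_+ union D_0 at x
    for _ in range(m):
        idx = sorted(D)
        if not idx:                              # nothing violated or active
            return x
        Ad, bd = A[idx], b[idx]
        x = x - np.linalg.pinv(Ad) @ (Ad @ x - bd)   # projection step (20)
        new = set(np.where(A @ x - b > tol)[0]) - D  # check condition (21)
        if not new:
            return x                             # (21) holds off the set D
        D |= new                                 # enlarge the index set
    return x
```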
It is of utmost importance that $x^{k}$ belongs to a sufficiently small neighbourhood of the point $x^{*}$ because, otherwise, $\hat x^{k}$ may not satisfy (12). If this is not the case, it is necessary to find another point $x^{k}$ that is closer to $x^{*}$. The process for accomplishing this is described below.
Theorem 2. For a sufficiently small $\varepsilon>0$ and for every $x^{k}\in B(x^{*},\varepsilon)$, Algorithm 1 provides $\hat x^{k}$ as the solution for (1), and this is equivalent to finding the solution for (12) within a number of iterations of the order of $m$.

Proof. The proof is based on the observation that, for $x^{k}$ belonging to a sufficiently small neighbourhood of the point $x^{*}$, according to Lemma 1, the constraints $d_{i}(x^{k})\ge 0$ correspond to constraints $d_{i}(x^{*})\ge 0$. Therefore,
$$D(x^{k})\subseteq D(x^{*}).$$
Let us determine $\hat x^{k}$ as the projection of the point $x^{k}$ on the set $X^{k}$, which is defined according to (18). It may happen that the set $D(x^{k})$ becomes enlarged. However, the number of iterations required when $D(x^{k})$ becomes enlarged does not exceed $m$, the number of elements in the set $D$. Therefore, at some iteration, (21) is satisfied. This means that $\hat x^{k}$ satisfies (12) or, equivalently, $\nabla f(\hat x^{k})=0$. This demonstrates that $\hat x^{k}$ is the solution for (1), as defined in (2). The computational complexity of establishing each projection $\hat x^{k}$ is polynomial in $m$ and $n$; this estimate takes the computational effort related to the multiplications of matrices into account. The number of iterations does not exceed $m$ and, therefore, the overall computational complexity is locally polynomial in $m$ and $n$. □
To complement the presentation of this section, the gradient method for establishing $x^{k}$ belonging to the sufficiently small neighbourhood $B(x^{*},\varepsilon)$ of some fixed solution $x^{*}$ to (1) is described. This gradient method has the following scheme:
$$x^{k+1}=x^{k}-\alpha\,\nabla f(x^{k}),\qquad k=0,1,\dots, \tag{23}$$
where $0<\alpha<2/L$ and the gradient $\nabla f$ fulfils the Lipschitz condition
$$\|\nabla f(x)-\nabla f(y)\|\le L\,\|x-y\|\qquad\text{for all } x,y\in\mathbb{R}^{n}.$$
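For the function $f$ in (5), the gradient is $\nabla f(x)=A^{T}(Ax-b)_{+}$ and $L=\|A\|_{2}^{2}$ is a valid Lipschitz constant, so the scheme (23) can be realised, for example, with the constant step $\alpha=1/L$ (a minimal sketch; this step choice is one admissible option, not the paper's specific step selection):

```python
import numpy as np

def gradient_method(A, b, x0, iters=1000):
    """Scheme (23) with constant step 1/L, where L = ||A||_2^2 bounds
    the Lipschitz constant of grad f."""
    L = np.linalg.norm(A, 2) ** 2
    x = x0.astype(float)
    for _ in range(iters):
        g = A.T @ np.maximum(A @ x - b, 0.0)   # grad f(x)
        x = x - g / L
    return x
```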
The convergence of the gradient method (23) is considered in the following theorem, cf. Karmanov [1].
Theorem 3. Let $x^{0}\in\mathbb{R}^{n}$ and the sequence $\{x^{k}\}$, $k=0,1,\dots$, be constructed according to (23). Then,
$$\lim_{k\to\infty}x^{k}=x^{*}\in X^{*}.$$

Proof. The scheme in (23) produces a sequence that converges to a certain $x^{*}\in X^{*}$. Moreover, for every sufficiently small $\varepsilon>0$, there exists $k(\varepsilon)$ such that $x^{k}\in B(x^{*},\varepsilon)$ for all $k\ge k(\varepsilon)$. This, in turn, means that at iteration $k(\varepsilon)$, the hypothesis of Theorem 2 is satisfied, and we obtain a solution to (1). □
Now, we have all the necessary prerequisites to present the solution algorithm for (3).
Algorithm 2.
Initialisation Step: Let $k=0$, and let $x^{0}$ be an arbitrary point in $\mathbb{R}^{n}$.
Main Recursive Step: The point $x^{k+1}$ is computed according to the gradient scheme (23), and Algorithm 1 is started from the point $x^{k+1}$.
Checking Step: If the point $\hat x^{k+1}$ returned by Algorithm 1 is the solution for (3), then Algorithm 2 is terminated. Otherwise, we set $k:=k+1$, and the Main Recursive Step is repeated.

Theorem 4. There exists a finite $k^{*}$ such that $x^{k^{*}}\in B(x^{*},\varepsilon)$ and $\hat x^{k^{*}}$ is the solution for (3).

Proof. The sequence $\{x^{k}\}$ converges to a fixed $x^{*}\in X^{*}$ and, therefore, at a certain iteration $k^{*}$, the hypothesis of Theorem 2 is satisfied, and we obtain the solution $\hat x^{k^{*}}$. □
Theorem 4 allows us to establish whether (3) has a solution or not.
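One possible end-to-end reading of Algorithm 2, combined with the feasibility test stated in Corollary 1 below, is sketched here (the function names and the simplified stopping rule are ours):

```python
import numpy as np

def solve_inequalities(A, b, x0, iters=20000, tol=1e-8):
    """Sketch of Algorithm 2 with the test of Corollary 1: run the gradient
    scheme (23) on f and report feasibility of Ax <= b via the value of f."""
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of grad f
    x = x0.astype(float)
    for _ in range(iters):
        g = A.T @ np.maximum(A @ x - b, 0.0)
        if np.linalg.norm(g) <= tol:
            break
        x = x - g / L
    feasible = bool(np.all(A @ x - b <= tol))   # f(x) = 0 up to tolerance
    return x, feasible

A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([2.0, 0.0, 0.0])
x, ok = solve_inequalities(A, b, np.array([5.0, 5.0]))
print(ok)   # True: the system (3) is solvable
```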
Corollary 1. If $f(\hat x^{k^{*}})=0$, then $\hat x^{k^{*}}$ is the solution of (3). Otherwise, (3) has no solutions.

4. Conclusions and Appendix
As previously mentioned, the locally polynomial complexity estimate is valid only if the starting point of the proposed method belongs to a sufficiently small neighbourhood of the set of solutions $X^{*}$. To reach such a desired point, the gradient method (23) is used. There are accelerated gradient methods (see those of Nesterov [17] and Poliak [18]), but these methods do not guarantee monotonic convergence to the set of solutions $X^{*}$. The method presented in this paper monotonically converges to a certain point $x^{*}$, $x^{*}\in X^{*}$. It is obvious that the point $x^{*}$ depends on the initial point $x^{0}$ and, therefore, the number of iterations required by the gradient method to enter the proper neighbourhood of point $x^{*}$ depends on the position of the initial point $x^{0}$. Moreover, the radius $\varepsilon$ of the neighbourhood of point $x^{*}$, which the gradient method should reach, is unknown in the general case and depends on the specific problem being considered. However, it appears that we can guarantee a geometric convergence rate for the gradient method (23) while minimising piecewise quadratic functions of the form (5).
Namely, for every strongly convex function $f$, the gradient method (23) has a geometric convergence rate, i.e.,
$$f(x^{k})-f^{*}\le c\,q^{k},\qquad 0<q<1,$$
where $c$ is a constant that is independent of the size of the problem but depends on the initial point $x^{0}$. In the general case, for functions that are not strongly convex, there is no proof of the geometric convergence of the gradient method (23). However, in the case where the function $f$ is given by (5), it is possible to prove the geometric convergence of the gradient method (23). Let $K$ denote the cone of strong convexity of the function $f$, i.e., the cone on which $f$ is strongly convex.
The theorem presented below proves the strong convexity of the function $f$ in the cone of convergence.
Theorem 5. The elements of the sequence $\{x^{k}\}$ defined by (23) belong to the cone of strong convexity of the function $f$; namely, $x^{k}\in K$, $k=0,1,\dots$, and the function $f$ is uniformly strongly convex along the sequence $\{x^{k}\}$, i.e.,
$$f(x^{k})-f^{*}\ge\kappa\,\|x^{k}-\bar x^{k}\|^{2}, \tag{24}$$
where $\bar x^{k}$ is the projection of $x^{k}$ on $X^{*}$, $\kappa>0$, $k=0,1,\dots$, and the constant $\kappa$ does not depend on $k$.

Proof. First, it should be pointed out that the second derivative of the function $f$ has a finite number of points of discontinuity in every direction; i.e., on the ray $x^{k}+th$, $t\ge 0$, $\|h\|=1$, there exists $\bar t>0$ such that, on the closed interval $[0,\bar t\,]$, the function $f(x^{k}+th)$ has a continuous second derivative that obviously depends on $h$. Let us assume that the theorem does not hold, i.e., there is no $\kappa>0$ such that (24) holds. This means that for the sequence $\{x^{k}\}$ the following must hold:
$$\frac{f(x^{k})-f^{*}}{\|x^{k}-\bar x^{k}\|^{2}}\to 0\quad\text{as } k\to\infty, \tag{25}$$
or, equivalently, the quadratic growth of $f$ degenerates along $\{x^{k}\}$. For vector
$$h^{k}=\frac{x^{k}-\bar x^{k}}{\|x^{k}-\bar x^{k}\|},$$
the following condition $\langle\nabla^{2}f(x^{k}+th^{k})\,h^{k},h^{k}\rangle\to 0$ holds, or, due to the construction of $f$,
$$\langle\nabla^{2}f(x^{k}+th^{k})\,h^{k},h^{k}\rangle\ge\mu,\qquad t\in[0,\bar t\,], \tag{26}$$
where $\mu>0$ is a certain fixed constant. Let $\bar x^{k}$ be (locally) the projection of $x^{k}$ on the set $X^{*}$. Then, due to (25) and (26), these two conditions cannot hold simultaneously, and we examine the behaviour of $f$ along the corresponding rays. Let us set $t>0$ sufficiently small and consider the points $x^{k}+th^{k}$, $k=0,1,\dots$. Then, according to Theorem 3, we have
$$f(x^{k}+th^{k})-f^{*}\to 0\quad\text{as } k\to\infty. \tag{27}$$
On the other hand, according to (26), when $k\to\infty$,
$$f(x^{k}+th^{k})-f^{*}\ge\frac{\mu t^{2}}{2}>0.$$
This is contradictory to (27), and therefore Theorem 5 holds. □
Theorem 5 allows for the estimation of the convergence rate of the gradient method (23).
Theorem 6. Under the assumptions of Theorem 5, for the sequence $\{x^{k}\}$ constructed according to (23), the following convergence rates hold:
$$f(x^{k})-f^{*}\le c_{1}\,q^{k},\qquad \|x^{k}-\bar x^{k}\|\le c_{2}\,q^{k/2}, \tag{28}$$
where $0<q<1$, $k=0,1,\dots$, $c_{1}>0$ and $c_{2}>0$; the constants $c_{1}$, $c_{2}$ are independent of the value of $k$ but depend on the initial point $x^{0}$.

Proof. Let us denote $\Delta_{k}=f(x^{k})-f^{*}$. For the sequence $\{x^{k}\}$ and the step size $\alpha$ in (23), the following holds:
$$\Delta_{k+1}\le\Delta_{k}-\alpha\Big(1-\frac{\alpha L}{2}\Big)\,\|\nabla f(x^{k})\|^{2},$$
or, equivalently, by the uniform strong convexity of $f$ along $\{x^{k}\}$ established in Theorem 5,
$$\Delta_{k+1}\le q\,\Delta_{k},\qquad 0<q<1.$$
Therefore, for $k=0,1,\dots$, the following holds:
$$\Delta_{k}\le q^{k}\,\Delta_{0}.$$
This proves the first part of (28), while the latter part of (28) follows from the strong convexity of the function $f$ in the cone of convergence. □
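The geometric rate (28) can be observed numerically: for a small feasible instance, the ratios $\Delta_{k+1}/\Delta_{k}$ produced by the scheme (23) stay below a fixed $q<1$ (an illustration, not a proof; the instance is ours and $f^{*}=0$ because the system is feasible):

```python
import numpy as np

A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([2.0, 0.0, 0.0])
f = lambda z: 0.5 * np.sum(np.maximum(A @ z - b, 0.0) ** 2)
L = np.linalg.norm(A, 2) ** 2
x = np.array([5.0, 5.0])
prev = f(x)                      # Delta_0 (here f* = 0)
for k in range(10):
    x = x - A.T @ np.maximum(A @ x - b, 0.0) / L
    cur = f(x)
    print(k, cur / prev)         # ratios Delta_{k+1}/Delta_k stay below q < 1
    prev = cur
```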
Conducting computational experiments and comparing the presented method with other methods from the literature remains a topic for future research.