*Proceeding Paper* **Additional Requirement in the Formulation of the Optimal Control Problem for Applied Technical Systems †**

**Elizaveta Shmalko \*,‡ and Askhat Diveev ‡**

Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, Moscow 119333, Russia; aidiveev@mail.ru

**\*** Correspondence: e.shmalko@gmail.com

† Presented at the 15th International Conference "Intelligent Systems" (INTELS'22), Moscow, Russia, 14–16 December 2022.

‡ These authors contributed equally to this work.

**Abstract:** This paper considers the difficulties that arise in the implementation of solutions to the optimal control problem. When implemented in real systems, as a rule, the object is subject to some perturbations, and the control obtained as a function of time as a result of solving the optimal control problem does not take into account these factors, which leads to a significant change in the trajectory and deviation of the object from the terminal goal. This paper proposes to supplement the formulation of the optimal control problem. Additional requirements are introduced for the optimal trajectory. The fulfillment of these requirements ensures that the trajectory remains close to the optimal one under perturbations and reaches the vicinity of the terminal state. To solve the problem, it is proposed to use numerical methods of machine learning based on symbolic regression. A computational experiment is presented in which the solutions of the optimal control problem in the classical formulation and with the introduced additional requirement are compared.

**Keywords:** optimal control; stability; control synthesis; feasibility of control; synthesized control

#### **1. Introduction**

The main disadvantage of the classical optimal control problem [1] is that its solution is an optimal control given as a function of time. Such a control is difficult to implement in practice, because it yields an open-loop control system that cannot react to disturbances or model inaccuracies. Consider a well-known optimal control problem

$$\begin{array}{rcl} \dot{x}\_{1} &=& x\_{2}, \\ \dot{x}\_{2} &=& u, \end{array} \tag{1}$$

where **x** = [*x*<sub>1</sub> *x*<sub>2</sub>]<sup>*T*</sup> is the state vector and *u* is the control signal. The control values are bounded

−1 ≤ *u* ≤ 1. (2)

It is necessary to find a control that will move the object (1) from the initial state

$$\mathbf{x}(0) = \begin{bmatrix} 1 \ 1 \end{bmatrix}^T,\tag{3}$$

to the given terminal position

$$\mathbf{x}^{f} = [0 \,\, 0]^{T} \tag{4}$$

as fast as possible

$$J = t\_f \to \min.\tag{5}$$

**Citation:** Shmalko, E.; Diveev, A. Additional Requirement in the Formulation of the Optimal Control Problem for Applied Technical Systems. *Eng. Proc.* **2023**, *33*, 7. https://doi.org/10.3390/engproc2023033007

Academic Editors: Ivan Zelinka, Arutun Avetisyan and Alexander Ilin

Published: 16 May 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The analytical solution of the stated problem was presented in [1]. According to the maximum principle, the optimal control takes only limit values (2) and has no more than one switch. According to (3), initially, the control has the value *u* = −1. Then, when a certain state is reached, the control switches to the value *u* = 1.

$$u = \begin{cases} -1, & \text{if } t < t^\*, \\ 1, & \text{otherwise,} \end{cases} \tag{6}$$

where *t*<sup>∗</sup> is the moment of control switching.

Let us find the moment of switching. The particular solution of the system (1) from the initial state (3) under the control *u* = −1 has the form

$$\begin{array}{rcl} x\_1 &=& -0.5t^2 + t + 1, \\ x\_2 &=& -t + 1. \end{array} \tag{7}$$

The general solution of the system (1) for the control *u* = +1, with time counted from the switching moment, is

$$\begin{array}{rcl} x\_1 &=& 0.5t^2 + x\_{2,0}t + x\_{1,0}, \\ x\_2 &=& t + x\_{2,0}, \end{array} \tag{8}$$

where *x*<sub>1,0</sub>, *x*<sub>2,0</sub> are the coordinates of the switching point.

Let us eliminate *t* in (8) and express *x*<sub>1</sub> as a function of *x*<sub>2</sub>

$$x\_1 = 0.5x\_2^2 - 0.5x\_{2,0}^2 + x\_{1,0}. \tag{9}$$

The relation for the switching point follows from the terminal conditions (4)

$$
x\_{1,0} = 0.5 x\_{2,0}^2. \tag{10}
$$

Let us now find the moment of time for a particular solution (7) that satisfies the relation (10).

$$\begin{array}{rcl}-0.5t^2 + t + 1 &=& 0.5(-t + 1)^2; \\ t^2 - 2t - 0.5 &=& 0; \\ t^\* = 1 + \sqrt{1.5} &=& 2.22474487. \end{array} \tag{11}$$

The switching time (11) is the solution to this optimal control problem. To determine the value of the functional (5), we calculate the coordinates of the switching point. Substituting (11) into the particular solution (7), we obtain

$$
x\_1 = 0.75, \; x\_2 = -\sqrt{1.5}. \tag{12}
$$

From the second equation in (8), we obtain the optimal time of reaching the terminal state

$$
\overline{t} = t^\* + \sqrt{1.5} = 3.44948974.\tag{13}
$$
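The arithmetic behind the switching moment and the optimal time can be checked numerically. The following is a minimal sketch that only evaluates the closed-form expressions (7), (10), (11) and (13); the variable names are illustrative and are not part of the original paper.

```python
import math

# Switching time t*: the positive root of t^2 - 2t - 0.5 = 0, see (11).
t_switch = 1.0 + math.sqrt(1.5)

# State at the switching moment from the particular solution (7), where u = -1.
x1_switch = -0.5 * t_switch**2 + t_switch + 1.0
x2_switch = -t_switch + 1.0

# Residual of the switching relation (10); it should vanish at t = t*.
residual = x1_switch - 0.5 * x2_switch**2

# After the switch u = +1 and x2 grows linearly in time,
# so it takes -x2_switch time units for x2 to return to zero.
t_final = t_switch - x2_switch

print(f"t* = {t_switch:.8f}, t_f = {t_final:.8f}, residual of (10) = {residual:.2e}")
```

The residual of (10) serves as a consistency check: the switching point computed from (7) must lie on the curve from which the control *u* = +1 leads exactly to the origin.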

Now, we introduce perturbations into the initial conditions (3)

$$x\_1(0) = 1 + \delta\_1, \; x\_2(0) = 1 + \delta\_2, \tag{14}$$

where *δ*<sub>1</sub>, *δ*<sub>2</sub> are random variables from a bounded range.

Under the program control (6), the perturbed object does not arrive at the switching point (12) at the time *t*<sup>∗</sup> and therefore, after switching, does not reach the terminal state. Based on the optimal value of the functional (13), let us limit the control time to *t*<sup>+</sup> = 3.5 and determine the state of the object at the moment *t*<sup>+</sup>, taking into account that the control switches at the moment (11)

$$\begin{array}{rcl} x\_{1,0} &=& -0.5t^{\*2} + (1 + \delta\_2)t^\* + 1 + \delta\_1, \\ x\_{2,0} &=& -t^\* + 1 + \delta\_2, \\ x\_1(t^+) &=& 0.5(t^+ - t^\*)^2 + x\_{2,0}(t^+ - t^\*) + x\_{1,0} \\ &=& 0.00127565 + (1.27525513 + t^\*)\delta\_2 + \delta\_1, \\ x\_2(t^+) &=& t^+ - t^\* + x\_{2,0} = 0.05051026 + \delta\_2. \end{array} \tag{15}$$

Figure 1 shows trajectories of eight randomly perturbed solutions of the problem (1)–(5) from the range

$$
x\_1(0) = 1 \pm 0.25, \; x\_2(0) = 1 \pm 0.25. \tag{16}
$$

In Figure 1, the blue curve represents the optimal unperturbed solution.

**Figure 1.** Optimal and perturbed solutions with control (6).

None of the perturbed solutions reaches the terminal state. The plots make it clear that the solution of the optimal control problem as a function of time (6) cannot be implemented in practice: in the presence of disturbances, the model (1) with the control (6) no longer describes the actual state of the control object, and the control has no means of correcting the deviation.
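The effect of perturbed initial conditions on the program control (6) can be reproduced in a few lines. The sketch below evaluates the state at *t*<sup>+</sup> = 3.5 in closed form, phase by phase; the function name `open_loop_state`, the fixed random seed and the uniform sampling of the range (16) are assumptions made for illustration and do not reproduce the exact experiment of Figure 1.

```python
import math
import random

T_SWITCH = 1.0 + math.sqrt(1.5)  # switching moment (11)
T_PLUS = 3.5                     # control time limit t+

def open_loop_state(x1_0, x2_0, t_end=T_PLUS, t_switch=T_SWITCH):
    """State of model (1) at t_end under the time-program control (6)."""
    # Phase 1: u = -1 on [0, t_switch].
    x1_s = -0.5 * t_switch**2 + x2_0 * t_switch + x1_0
    x2_s = -t_switch + x2_0
    # Phase 2: u = +1 on [t_switch, t_end].
    tau = t_end - t_switch
    return 0.5 * tau**2 + x2_s * tau + x1_s, tau + x2_s

nominal = open_loop_state(1.0, 1.0)
print(f"nominal x(t+) = ({nominal[0]:.8f}, {nominal[1]:.8f})")

rng = random.Random(0)  # arbitrary seed, chosen only for repeatability
for _ in range(8):
    d1 = rng.uniform(-0.25, 0.25)
    d2 = rng.uniform(-0.25, 0.25)
    x1, x2 = open_loop_state(1.0 + d1, 1.0 + d2)
    print(f"delta = ({d1:+.3f}, {d2:+.3f}) -> miss distance {math.hypot(x1, x2):.4f}")
```

Because the control does not depend on the state, the initial deviation is never corrected and shows up directly in the terminal miss distance.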

In this regard, it is necessary to introduce additional requirements into the formulation of the optimal control problem so that the resulting controls can be directly implemented on a real plant.

#### **2. Optimal Control Problem Statement with Additional Requirement**

Consider the formulation of the optimal control problem, the solution of which can be directly implemented on the plant.

Given the mathematical model of the control object

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{u}),
\tag{17}
$$

where **x** ∈ ℝ<sup>*n*</sup>, **u** ∈ U ⊆ ℝ<sup>*m*</sup>, **f** = [*f*<sub>1</sub>(**x**, **u**) … *f*<sub>*n*</sub>(**x**, **u**)]<sup>*T*</sup>. Given the initial conditions

$$\mathbf{x}(0) = \mathbf{x}^0 \in \mathbb{R}^n \tag{18}$$

and terminal conditions

$$\mathbf{x}(t\_f) = \mathbf{x}^f \in \mathbb{R}^n,\tag{19}$$

where *t*<sub>*f*</sub> is the time of reaching the terminal conditions, which is not specified but is bounded, *t*<sub>*f*</sub> ≤ *t*<sup>+</sup>, and *t*<sup>+</sup> is the specified time limit of the control process.

A quality criterion is set. It may include conditions for fulfilling phase constraints

$$J\_1 = \int\_0^{t\_f} f\_0(\mathbf{u}, \mathbf{x})dt \to \min\_{\mathbf{u} \in \mathcal{U}}.\tag{20}$$

We need to find a control in the form

$$\mathbf{u} = \mathbf{g}(\mathbf{x}, t) \in \mathcal{U}. \tag{21}$$

The found control must be such that the particular solution **x**(*t*, **x**<sup>0</sup>) of the system

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{g}(\mathbf{x}, t)) \tag{22}
$$

from the initial state (18) reaches the terminal state (19) with the optimal value of the quality criterion (20). Moreover, the optimal particular solution **x**(*t*, **x**<sup>0</sup>) of the system (22) must have a neighborhood Δ(*t*) > 0 such that, if any other particular solution **x**(*t*, **y**) of the system (22) from another initial state **y** ∈ ℝ<sup>*n*</sup> satisfies at some time *t*′, 0 ≤ *t*′ ≤ *t*<sup>+</sup>,

$$\left\|\mathbf{x}(t',\mathbf{y}) - \mathbf{x}(t',\mathbf{x}^0)\right\| \le \Delta(t'),\tag{23}$$

then for all *t*, *t*′ ≤ *t* ≤ *t*<sub>*f*</sub>, this particular solution does not leave the neighborhood of the optimal one

$$\|\mathbf{x}(t, \mathbf{y}) - \mathbf{x}(t, \mathbf{x}^0)\| \le \Delta(t), \ t' \le t \le t\_f. \tag{24}$$

The neighborhood Δ(*t*) shrinks near the terminal state. This means that for any particular solution from the neighborhood of the optimal one that satisfies the condition (23), ∃*t*″ < *t*<sup>+</sup> such that

$$\|\mathbf{x}(t'',\mathbf{y}) - \mathbf{x}(t'',\mathbf{x}^0)\| \le \varepsilon,\tag{25}$$

where *ε* is a given small positive value.
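For trajectories that are available only as sampled points, the requirements (23)–(25) can be checked directly. The following is a minimal sketch under the assumption that both trajectories are sampled at the same time instants starting from *t*′ and that the radii Δ(*t*<sub>*k*</sub>) are supplied by the user; the function name and the toy data are hypothetical.

```python
import math

def stays_in_neighborhood(optimal, perturbed, delta, eps):
    """Check (23)-(25) on two trajectories sampled at the same time instants.

    optimal, perturbed: sequences of state tuples x(t_k, x0) and x(t_k, y);
    delta: sequence of neighborhood radii Delta(t_k);
    eps: required closeness to the optimal solution near the terminal state.
    """
    dists = [math.dist(p, q) for p, q in zip(optimal, perturbed)]
    inside = all(d <= r for d, r in zip(dists, delta))  # conditions (23), (24)
    reaches = any(d <= eps for d in dists)              # condition (25)
    return inside and reaches

# Toy data: three samples of a two-dimensional state and a shrinking tube.
opt = [(1.0, 1.0), (0.5, 0.0), (0.0, 0.0)]
per = [(1.2, 1.0), (0.6, 0.05), (0.0005, 0.0)]
tube = [0.25, 0.15, 0.05]
print(stays_in_neighborhood(opt, per, tube, eps=0.001))
```

In this form the check is purely a posteriori: it verifies a candidate control on simulated trajectories and does not by itself construct the neighborhood Δ(*t*).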

Requiring the optimal solution to have such a neighborhood can, in many cases, worsen the value of the functional. For example, in a problem with phase constraints that represent obstacles on the path of the control object to the terminal state, the optimal trajectory often passes along the boundary of an obstacle. Such a trajectory has no neighborhood of this kind, so it is necessary to find another trajectory that is not optimal in the sense of the classical formulation of the optimal control problem but tolerates variations in the initial values with only a small change in the value of the functional.

#### **3. Overview of Methods for Solving the Extended Optimal Control Problem with Additional Requirement**

Thus, an additional requirement has been introduced into the formulation of the optimal control problem, which makes it possible to implement the obtained controls on real objects. Consider the existing methods for solving the problem in the presented extended formulation.

Solving the general control synthesis problem for a certain region of initial conditions makes every particular solution that starts in this region optimal. In this case, each particular solution has a neighborhood containing other optimal solutions. The neighborhood is open, but it also shrinks near the terminal state.

For the model (1), there is a solution to the problem of general control synthesis, in which a control is found that ensures the optimal achievement of the terminal state from any initial condition.

$$u^\* = \begin{cases} -1, & \text{if } h(x\_1, x\_2) \ge 0, \\ 1, & \text{otherwise,} \end{cases} \tag{26}$$

where

$$h(x\_1, x\_2) = \begin{cases} x\_1 + 0.5x\_2^2, & \text{if } x\_2 > 0, \\ x\_1 - 0.5x\_2^2, & \text{otherwise.} \end{cases}\tag{27}$$

Plots of eight perturbed solutions for the object model (1) with control (26) are shown in Figure 2.

**Figure 2.** Optimal and perturbed solutions with control (26).

As can be seen from Figure 2, all perturbed solutions have reached the terminal state. This control (26) is practically feasible.
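The behavior of a feedback law of the form (26) can be illustrated with a short simulation. The sketch below is not the authors' code: it assumes the compact textbook form *h* = *x*<sub>1</sub> + 0.5*x*<sub>2</sub>|*x*<sub>2</sub>| of the switching function for the double integrator, explicit Euler integration with a step of 0.001, and a simulation horizon of 5 time units; the seed and all function names are likewise illustrative.

```python
import math
import random

def switching_function(x1, x2):
    # Compact textbook form x1 + 0.5*x2*|x2| (an assumption of this sketch).
    return x1 + 0.5 * x2 * abs(x2)

def feedback_control(x1, x2):
    """Feedback law of the form (26): u = -1 if h >= 0, otherwise u = +1."""
    return -1.0 if switching_function(x1, x2) >= 0.0 else 1.0

def simulate_closed_loop(x1, x2, t_end=5.0, dt=1e-3):
    """Explicit Euler integration of model (1) under the feedback control."""
    for _ in range(int(round(t_end / dt))):
        u = feedback_control(x1, x2)
        x1, x2 = x1 + dt * x2, x2 + dt * u
    return x1, x2

rng = random.Random(1)  # arbitrary seed, chosen only for repeatability
for _ in range(8):
    x0 = (1.0 + rng.uniform(-0.25, 0.25), 1.0 + rng.uniform(-0.25, 0.25))
    xf = simulate_closed_loop(*x0)
    print(f"x(0) = ({x0[0]:.3f}, {x0[1]:.3f}) -> |x(5)| = {math.hypot(*xf):.4f}")
```

With a discrete integration step, the bang-bang law chatters in a small region around the origin instead of stopping exactly at it, so the final distance is small but not zero.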

However, the problem of general control synthesis is a complex mathematical problem for which there is no universal numerical solution method. In this case, this problem was solved because the plant model is simple, the optimal control takes only two limit values, and for both of these values, general solutions of the differential equations of the model (1) are obtained.

Another approach to solving the optimal control problem and fulfilling the additional requirement is to stabilize the motion along the trajectory on the basis of the stability theory of A.M. Lyapunov [2]. As a result of constructing the stabilization system, the optimal trajectory should become asymptotically stable. The construction of such a stabilization system is not always possible. In particular, in the problem (1)–(5) under consideration, the entire control resource is spent on producing the optimal trajectory, and nothing is left for the stabilization system.

A further approach that also allows solving the optimal control problem in the presented extended formulation is the synthesized control method, which consists of two stages [3,4]. In the first stage, the control synthesis problem is solved in order to make the control object stable relative to some point in the state space. In the second stage, the optimal control problem is solved, with the coordinates of the stability points of the control object used as the control. It should be noted that solving the control synthesis problem in the first stage significantly changes the mathematical model of the control object. The functional of the optimal control problem is not used when stability is ensured in the first stage; therefore, different methods for solving the control synthesis problem lead to different mathematical models of the closed-loop control system and to different solutions of the optimal control problem. For the optimal solution to have a neighborhood with attraction properties, the stability points must be positioned in the state space so that particular solutions from a certain region of initial states, being attracted to these points, stay close to each other while moving to the terminal state.

Consider the application of the synthesized optimal control to the problem (1)–(5). Different methods can be used to synthesize the stabilization system, from traditional regulators [5] to modern machine learning techniques [6–8]. Since the considered object (1) is rather simple, a proportional regulator is sufficient. Taking into account the limits on the control, it has the following form

$$u = \begin{cases} \text{sgn}(\tilde{u}), & \text{if } |\tilde{u}| > 1, \\ \tilde{u}, & \text{otherwise,} \end{cases} \tag{28}$$

where

$$
\tilde{u} = k\_1(\mathbf{x}\_1^\* - \mathbf{x}\_1) + k\_2(\mathbf{x}\_2^\* - \mathbf{x}\_2). \tag{29}
$$

The object is stable if *k*<sub>1</sub> > 0, *k*<sub>2</sub> > 0. A stable equilibrium point exists if |*ũ*| < 1. The coordinates of the equilibrium point for the given *k*<sub>1</sub>, *k*<sub>2</sub> depend on the values of *x*<sub>1</sub><sup>∗</sup>, *x*<sub>2</sub><sup>∗</sup>

$$
x\_1 = \frac{k\_2 x\_2^\* + k\_1 x\_1^\*}{k\_1}, \ x\_2 = 0. \tag{30}
$$
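The equilibrium point (30) can be confirmed by simulating the model (1) under the regulator (28), (29). The sketch below uses the gains *k*<sub>1</sub> = *k*<sub>2</sub> = 2 from the experiment described later; the particular stabilization point (0.5, 0.25), the Euler step and the horizon of 20 time units are arbitrary assumptions chosen for illustration.

```python
def saturated_control(x, x_star, k1=2.0, k2=2.0):
    """Proportional regulator (29) clipped to [-1, 1] as in (28)."""
    u_tilde = k1 * (x_star[0] - x[0]) + k2 * (x_star[1] - x[1])
    return max(-1.0, min(1.0, u_tilde))

def equilibrium(x_star, k1=2.0, k2=2.0):
    """Equilibrium point (30) of the closed-loop system."""
    return ((k2 * x_star[1] + k1 * x_star[0]) / k1, 0.0)

def simulate(x, x_star, t_end=20.0, dt=1e-3):
    """Explicit Euler integration of model (1) with the regulator (28)."""
    x1, x2 = x
    for _ in range(int(round(t_end / dt))):
        u = saturated_control((x1, x2), x_star)
        x1, x2 = x1 + dt * x2, x2 + dt * u
    return x1, x2

x_star = (0.5, 0.25)  # hypothetical stabilization point
print("predicted equilibrium:", equilibrium(x_star))
print("simulated final state:", simulate((1.0, 1.0), x_star))
```

Note that the equilibrium generally differs from **x**<sup>∗</sup> itself: a nonzero *x*<sub>2</sub><sup>∗</sup> shifts the resting position along the *x*<sub>1</sub> axis, which is why both components of **x**<sup>∗</sup> can serve as control parameters in the second stage.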

As a result, we obtain the following control object model

$$\begin{array}{rcl} \dot{\mathbf{x}}\_{1} &=& \mathbf{x}\_{2} \\ \dot{\mathbf{x}}\_{2} &=& k\_{1}(\mathbf{x}\_{1}^{\*} - \mathbf{x}\_{1}) + k\_{2}(\mathbf{x}\_{2}^{\*} - \mathbf{x}\_{2}) \end{array} . \tag{31}$$

The control in the model is the vector **x**<sup>∗</sup> = [*x*<sub>1</sub><sup>∗</sup> *x*<sub>2</sub><sup>∗</sup>]<sup>*T*</sup>, whose values are limited by the following inequalities

$$-1 \le k\_1(\mathbf{x}\_1^\* - \mathbf{x}\_1) + k\_2(\mathbf{x}\_2^\* - \mathbf{x}\_2) \le 1. \tag{32}$$

To solve the problem of optimal control, we include in the quality criterion the accuracy of hitting the terminal state **x**<sup>*f*</sup> = [0 0]<sup>*T*</sup>

$$J\_5 = t\_f + p\_1 \|\mathbf{x}^f - \mathbf{x}\|, \tag{33}$$

where *t*<sub>*f*</sub> is the terminal time and *p*<sub>1</sub> is a weight coefficient, *p*<sub>1</sub> = 1.

When solving the problem, we divide the control time *t*<sup>+</sup> into intervals Δ*t*, and on each interval we look for the values *x*<sub>1</sub><sup>∗</sup>, *x*<sub>2</sub><sup>∗</sup>, taking into account the constraints (32). The following parameter values were used: *k*<sub>1</sub> = 2, *k*<sub>2</sub> = 2, *p*<sub>1</sub> = 1, *t*<sup>+</sup> = 3.5, Δ*t* = 0.5, *ε*<sub>1</sub> = 0.001. An evolutionary hybrid algorithm [9] was used for the solution. The optimal solution gave the value of the functional (33) *J*<sub>5</sub> = 3.6343.
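The second stage can be sketched as a search over piecewise-constant stabilization points. The code below is only an illustration under several assumptions: a plain random-mutation search stands in for the evolutionary hybrid algorithm [9], the constraint (32) is handled by saturating the control as in (28), the norm in (33) is taken to be Euclidean, *t*<sub>*f*</sub> is taken as the first moment at which the state is within *ε*<sub>1</sub> of the terminal state (or *t*<sup>+</sup> otherwise), and the model is integrated by the Euler method with a step of 0.01. It is not expected to reproduce the reported value of the functional.

```python
import math
import random

K1, K2, P1 = 2.0, 2.0, 1.0
T_PLUS, DT_CTRL, EPS1 = 3.5, 0.5, 0.001
N_INT = int(round(T_PLUS / DT_CTRL))  # 7 control intervals

def functional(points, x0=(1.0, 1.0), dt=0.01):
    """Value of (33) for a list of N_INT stabilization points (x1*, x2*)."""
    x1, x2 = x0
    steps = int(round(T_PLUS / dt))
    per_interval = steps // N_INT
    t_f = T_PLUS
    for k in range(steps):
        if math.hypot(x1, x2) <= EPS1:        # terminal state reached early
            t_f = k * dt
            break
        xs1, xs2 = points[min(k // per_interval, N_INT - 1)]
        u = K1 * (xs1 - x1) + K2 * (xs2 - x2)
        u = max(-1.0, min(1.0, u))            # saturation as in (28)
        x1, x2 = x1 + dt * x2, x2 + dt * u
    return t_f + P1 * math.hypot(x1, x2)

def random_search(iterations=300, sigma=0.3, seed=0):
    """Random-mutation search, a simple stand-in for the algorithm of [9]."""
    rng = random.Random(seed)
    best = [(0.0, 0.0)] * N_INT
    best_cost = functional(best)
    for _ in range(iterations):
        cand = [(a + rng.gauss(0.0, sigma), b + rng.gauss(0.0, sigma))
                for a, b in best]
        cost = functional(cand)
        if cost < best_cost:                  # keep only improvements
            best, best_cost = cand, cost
    return best, best_cost

start_cost = functional([(0.0, 0.0)] * N_INT)
points, cost = random_search()
print(f"start J = {start_cost:.4f}, found J = {cost:.4f}")
```

The search starts from all stabilization points placed at the terminal state, which corresponds to a plain regulator toward the origin, and accepts a candidate only if it lowers the functional.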

Figure 3 shows particular solutions of the system (31) with random perturbations of the initial values in the range (16). The blue curve in the figure shows the unperturbed optimal solution. The black dots show the positions of the found control points **x**<sup>∗</sup>. As the experimental results show, the perturbed solutions remain in the vicinity of the optimal one. A comparison with Figure 1 makes it clear that the control obtained for the model (31) is feasible.

**Figure 3.** Perturbed and unperturbed solutions under synthesized optimal control.

#### **4. Discussion**

This paper raises the problem of the feasibility of optimal controls obtained by solving the optimal control problem in its classical formulation. It is shown that when disturbances appear, such solutions turn out to be unsatisfactory. An additional requirement for the desired control function is introduced, which ensures the stability of solutions near the optimal one. Possible approaches to solving the proposed extended optimal control problem are considered. The best solution for the problem (1)–(5) is the solution of the general control synthesis problem. However, for more complex objects, it is not always possible to solve the general synthesis problem. This paper considers the method of synthesized optimal control, which satisfies the introduced feasibility requirement and at the same time finds solutions that are close to optimal through the use of machine learning methods and evolutionary algorithms.

**Author Contributions:** Conceptualization, A.D. and E.S.; methodology, A.D.; software, E.S. and A.D.; validation, E.S. and A.D.; formal analysis, A.D.; investigation, A.D. and E.S.; data curation, E.S.; writing—original draft preparation, E.S.; writing—review and editing, E.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was performed with partial support of the Russian Science Foundation grant number 23-29-00339.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


