*Article* **Machine Learning Feedback Control Approach Based on Symbolic Regression for Robotic Systems**

**Askhat Diveev \* and Elizaveta Shmalko**

Federal Research Center "Computer Science and Control", the Russian Academy of Sciences, 119333 Moscow, Russia **\*** Correspondence: aidiveev@mail.ru

**Abstract:** A control system of an autonomous robot produces a control signal based on feedback. This type of control implies the control of an object according to its state that is mathematically the control synthesis problem. Today there are no universal analytical methods for solving the general synthesis problem, and it is solved by certain particular approaches depending on the type of control object. In this paper, we propose a universal numerical approach to solving the problem of optimal control with feedback using machine learning methods based on symbolic regression. The approach is universal and can be applied to various objects. However, the use of machine learning methods imposes two aspects. First, when using them, it is necessary to reduce the requirements for optimality. In machine learning, optimization algorithms are used, but strictly optimal solutions are not sought. Secondly, in machine learning, analytical proofs of the received properties of solutions are not required. In machine methods, a set of tests is carried out and it is shown that this is sufficient to achieve the required properties. Thus, in this article, we initially introduce the fundamentals of machine learning control, introduce the basic concepts, properties and machine criteria for application of this technique. Then, with regard to the introduced notations, the feedback optimal control problem is considered and reformulated in order to add to the problem statement that such a property adjusts both the requirements of stability and optimality. Next, a description of the proposed approach is presented, theoretical formulations are given, and its efficiency is demonstrated on the computational examples in mobile robot control tasks.

**Keywords:** control synthesis; optimal control; stabilization; symbolic regression; machine learning; evolutionary algorithm; mobile robot

**MSC:** 49M25; 68T05

#### **1. Introduction**

Aiming at the automation of processes, we intend to automate the very process of control systems development in order to make it fast and generic. This sounds especially relevant in the context of ever-increasing robotization and the emergence of a variety of robots as control objects. To reach this goal of all-round automation, it is necessary to generalize the needed tasks, that is, to formulate them in general mathematical statements, and then develop universal methods for solving them. However, the problem here is that despite the extensive theoretical background of control theory, today, there is a wide range of applied problems that do not have exact analytical solutions. At the same time, there is an objective need for solving them.

In fact, in robotics, most modern control systems for robots are programmed by hand, and engineers do not even set the general problems because there are no general ways to solve them. The developer, based on his experience, sets the structure of the control system, determines the control channels, types of regulators, and then adjusts the parameters of the given system so that they meet certain requirements [1]. However, every problem can and should be considered an optimal one, defining not only the parameters, but also the structure of the control system optimally and, again, automatically.

**Citation:** Diveev, A.; Shmalko, E. Machine Learning Feedback Control Approach Based on Symbolic Regression for Robotic Systems. *Mathematics* **2022**, *10*, 4100. https://doi.org/10.3390/ math10214100

Academic Editor: Liliya Demidova

Received: 9 October 2022 Accepted: 31 October 2022 Published: 3 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

If a robot has to perform rather simple actions, for example, moving from one point to another and going around some obstacles, then the program of its control system contains supposedly several hundreds of lines. In more complex control tasks, the programs that must control robots can include several tens or hundreds of thousands of lines. These programs will grow as the tasks or the robots structure become more complex. One can assume that a control system for a robot that repeats the actions of a fly must contain some millions of lines. It follows from that stated above that the manual creation of the robot control system is an unpromising direction. It is necessary to automate this process.

Any problem for robots, as well as any other control objects, can be formulated as a mathematical optimization problem, such as a problem of providing stability, an optimal control problem for finding optimal path in current real conditions, a problem of stabilization of movement along of the optimal path, a problem of avoiding collisions with static and dynamic obstacles, the problem of interaction with other control objects, the problem of precise achievement of some given terminal conditions and so on. The most general problem in robotics is feedback control synthesis. It assumes that a control system that makes the object reach its goal is designed as a function of the object state optimally according to given criteria. Even if the optimal control problem is solved and the optimal path is found, we must further ensure the movement of the object along the obtained trajectory to compensate for possible ever-existing uncertainties.

The general synthesis problem was formulated back in the early 1960s by Bellman [2,3], where the continuous-time nonlinear optimal control problem was solved through the Hamilton–Jacobi–Bellman equation, which is a nonlinear partial differential equation. Even in simple cases, the HJB equation may not have global analytic solutions. Various numerical methods based on dynamic programming have been proposed in the literature [4–7], including the modern adaptive dynamic programming technique [8–10] and reinforcement learning [11–13]. However, the main drawback of dynamic programming methods today is still the computational complexity required to describe the value function, which grows exponentially with the dimension of its domain.

A different way to construct a feedback optimal control is firstly to solve an optimal control problem by direct methods of nonlinear programming or by the indirect approach of the Pontryagin maximum principle and then to synthesize a feedback stabilization system in order to supply movement along the received optimal trajectory. For example, in [14], points are placed on the trajectory, and the object is stabilized at these points. This is the most popular practical approach to feedback optimal control system design.

However, concerning the optimality criterion, this approach is not correct since it turns out that the optimal path is considered for one control object, and the introduced stabilization system changes the object so that the calculated path may not be optimal for the modified object model. In addition, when approaching a given point on the path, the system slows down, so it is necessary to carry out additional estimates in each specific task, according to the optimal moments of points switching.

In this work, we propose an inverse approach to feedback optimal control system synthesis. The general idea is the following. We firstly stabilize an object according to some point in the state space by solving the stabilization system synthesis problem. Note that this problem is computationally easier than the general synthesis problem. The stabilization task can be solved by a plain variety of methods depending on the complexity of the object model, particularly analytical methods of backstepping [15,16] or the analytical design of aggregated controllers [17], or synthesis based on the application of the Lyapunov function [18,19], as well as any classical methods for liner systems, such as modal control [20], differently tuned PID controllers [21], and fuzzy [22] and neural network [23] controllers. In the overwhelming majority of cases, the control synthesis problem is solved analytically or technically by taking into account the specific properties of the mathematical model. Today, modern numerical machine learning methods can be applied to find a solution for generic dynamic objects [24].

This new paradigm of machine learning control [25,26] allows to find some good near optimal solutions in a limited amount of time. However, due to the novelty of these methods, it becomes necessary to substantiate the results obtained by machine learning. In this paper, we introduce definitions of some machine properties of the system. We introduce the definition of machine learning control from our point of view, give machine proof of the existence of a specific property in some mathematical model, refine the definition of the feasibility property of the mathematical model of the control object and present the extended statement of the optimal control problem.

The addition of the stabilization system into the object model gives it a new property: at each moment of time, the object has a point of equilibrium. Thus, in the synthesized optimal control approach, the uncertainty in the right parts is compensated by the stability of the system relative to a point in the state space. Near the equilibrium point, all solutions converge. Now, we can solve the problem of optimal control through the optimal position of the equilibrium point. The found synthesized optimal control can be realized in the real object directly without additional feedback stabilization loops.

The paper is structured in the following order. After the introduction, the theoretical base of machine learning control is presented in Section 2, introducing the main definitions and machine criteria for justification of the results received by machine numerical methods. Next, in Section 3, we formulate the mathematical statement of the problem of feedback machine learning control, extending the optimal control problem statement with additional requirements. Then in Section 4, the paper proposes a synthesized approach to the solution of the stated feedback optimal control problem. Algorithms for its solution are considered in Sections 5 and 6, and computational examples of solving control problems for mobile robots are presented in Section 7. In the experimental part, the computational examples of synthesized control application for solving feedback control problems in the class of feasible controls for mobile robots are presented.

#### **2. Theoretical Base of Machine Learning Control**

Summarizing various definitions of machine learning [27–29], we can conclude that machine learning is an inexact numerical solution of some mathematical optimization problem, that is, the solution obtained by machine learning differs from the exact one by some known value but satisfies the researcher, and it can be improved with continuing learning. In all cases, different optimization algorithms are used for machine learning, but for these algorithms, it is enough to find a near optimal solution.

Let us introduce some definitions.

**Definition 1.** *The machine learning problem is a search of an unknown function.*

$$\mathbf{y} = \mathfrak{a}(\mathbf{x}, \mathbf{q}),\tag{1}$$

where **<sup>y</sup>** is a vector of function values, **<sup>y</sup>** ∈ R*<sup>r</sup>* , **<sup>x</sup>** is a vector of arguments, **<sup>x</sup>** ∈ R*n*, **<sup>q</sup>** is a vector of constant parameters, **<sup>q</sup>** ∈ <sup>Q</sup> ⊆ R*p*,

$$\mathfrak{a}(\mathfrak{x}, \mathfrak{q}) : \mathbb{R}^n \times \mathbb{R}^p \to \mathbb{R}^r. \tag{2}$$

This function during training approximates some data set, which is called the training sample:

$$J = \sum\_{i=0}^{N} ||\mathfrak{F}^i - a(\mathbf{x}^i, \mathbf{q})||\_\prime \tag{3}$$

where Yˆ <sup>=</sup> {**y**<sup>ˆ</sup> 1,..., ˆ**y***N*} is a training sample.

With unsupervised learning, this function is used for the minimization of some functional

$$J = \int\_{0}^{t\_f} f\_0(\mathbf{x}(t), \boldsymbol{\alpha}(\mathbf{x}(t), \mathbf{q}))dt,\tag{4}$$

where *tf* is the goal achievement time.

**Definition 2.** *Machine learning is finding a solution of the optimization problem in the given* Δ *neighborhood of the optimal solution.*

The peculiarity of machine learning is that learning does not require the exact achievement of minimum criterion (3) or (4):

$$J\_1 \le \min J + \Delta^\*,\tag{5}$$

where Δ∗ is a given positive value determining a functional value achievable during learning. For criterion (3), a minimum value is equal to zero. For criterion (4), the minimal value can be unknown. Then, the limit minimum value can be used instead:

$$J\_1 = J^- + \Delta^\*,\tag{6}$$

where *J*<sup>−</sup> ≤ min *J*.

If, as a result of learning, the found function (1) must acquire some properties, then the proof of the presence of these properties is confirmed by simulation.

$$J\_2 = \sum\_{i=0}^{K} \theta(\phi(\alpha(\mathbf{x}^i, \mathbf{q}))),\tag{7}$$

where *ϑ*(*z*) is the Heaviside function

$$\theta(z) = \begin{cases} 1, \text{if } z > 0 \\ 0, \text{otherwise} \end{cases} \tag{8}$$

*φ*(**x**, **q**) is a condition that determines whether a function property exists

$$
\phi(\mathbf{x}, \mathbf{q}) \le 0,\tag{9}
$$

where *K* is a number of consecutive experiments performed with a positive result (9), set to prove the presence of a property.

#### **Definition 3.** *Machine learning control is a search of control function.*

Machine learning searches for a function that, for some sets of arguments, returns the required values. Note that there can be many such functions, and all they can have various structures and parameter values.

According to the introduced Definitions 1–3, an optimization problem of the control function search must be formulated for machine learning control. A solution of this problem is not optimal, as the found function gives a value of the quality criterion close to optimal one. On the one hand, this might reduce the importance of the solution found, but on the other hand, it allows for solving very complex problems.

Let us be given a mathematical model of a control object. This model can be derived from physical laws or identified by some machine learning technique [30,31]. Generally, this model is described by a system of ordinary differential equations with a free control vector in the right-hand side:

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{u}),
\tag{10}
$$

where **x** is a state vector, **u** is a control vector,

$$\begin{array}{rcl} \mathbf{x} & = & [\mathbf{x}\_1 \dots \mathbf{x}\_n]^T \\ \mathbf{u} & = & [u\_1 \dots u\_m]^T, \; m \le n \\ \mathbf{f}(\mathbf{x}, \mathbf{u}) & = & [f\_1(\mathbf{x}, \mathbf{u}) \dots f\_n(\mathbf{x}, \mathbf{u})]^T. \end{array} \tag{11}$$

The problem of control, including machine learning control, is to find a control function instead of the control vector

$$\mathbf{u} = \mathbf{h}(\mathbf{x}),\tag{12}$$

to make the differential equation system

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{h}(\mathbf{x})),
\tag{13}
$$

acquire some new properties. For example, these can be such properties as stability, the optimality of solutions, and others.

In machine learning, the control function of these new properties of the control object has to be checked by a computer as well.

When the control function is derived analytically, then the system is guaranteed to have the desired property. In the case of machine learning control, events occur when the system does not have the desired property. Let us call them bad events. For example, the robot reaches the terminal position from almost all initial conditions, but does not reach it from some other initial condition. Although such events are rare with good training, they can occur, and the probability of its occurrence is not known. We also need to introduce some estimate when we can consider that the probability of the bad event is small, and we can consider learning to be successful, i.e., assume that the system has obtained the desired property.

The appearance of bad events is due to the presence of various uncertainties and disturbances in the system. According to Lyapunov [32], the existing uncertainties can be considered uncertainties in the initial conditions.

Let us formulate a machine criterion of obtaining some property by a differential equation system. To define the property of the whole system (13), it is enough to set a quantity *K* of partial solutions that obtain this property.

**Definition 4.** *If D experiments are carried out and in every i experiment, Ki partial solutions of the differential equation perform the required property from any Mi* ≥ *Ki randomly selected initial conditions from the initial domain, and*

$$\lim\_{D \to \infty} \sum\_{i=1}^{D} \frac{K\_i}{M\_i} \to 1,\tag{14}$$

*the existence of this property for the differential equation in this domain is proven by machine.*

In other words, as the number of experiments increases, the probability of such a bad event, when the system does not have the desired property, tends to zero. From a mathematical point of view, this means that all private solutions for the domain of initial conditions have this property, except solutions for a subset of a zero measure.

Now we can redefine some properties of differential equations into appropriate machine properties.

Let the computer check the new properties in terminal time interval, (0; *t* +).

Let, in the state space of differential equation system (13), a manifold of the dimension *n* − *s* be defined by

$$\phi\_i(\mathbf{x}) = 0, \ i = 1, \ldots, s. \tag{15}$$

**Definition 5.** *In some domain* <sup>X</sup> ∈ R*n, the following properties are performed: for given quantity <sup>K</sup> of initial conditions* **<sup>x</sup>**0,*<sup>i</sup>* ∈ <sup>X</sup>*, <sup>i</sup>* = 1, ... , *<sup>K</sup> for the partial solution* **<sup>x</sup>**(*t*, **<sup>x</sup>**0,*<sup>i</sup>* ) *of differential Equation (13)* ∃*t* , 0 < *t* ≤ *t* <sup>+</sup>*. Then,*

$$\left\|\left|\phi(\mathbf{x}(\mathbf{t}',\mathbf{x}^{0,i}))\right\|\leq\Delta,\ i=1,\ldots,K,\tag{16}$$

*where <sup>φ</sup>*(**x**)=[*φ*1(**x**)... *<sup>φ</sup>s*(**x**)]*T, and* ∀*<sup>t</sup>* < *t* ≤ *t* +

$$\|\|\phi(\mathbf{x}(t, \mathbf{x}^{0,i}))\|\| < \Delta, \ i = 1, \ldots, K. \tag{17}$$

*Then, differential equation system (13) is machine stable on a bound time interval* (0; *t* +) *relative to the manifold (15).*

If a dimension of the manifold equals to 0, then a machine stable equilibrium point is obtained. Coordinates of this point in the state space are determined from solving the algebraic equation system,

$$\phi\_i(\mathbf{x}) = 0, \ i = 1, \ldots, n. \tag{18}$$

The definition of machine stability uses a manifold (15) that can be expressed from the partial solution. Let **x**(*t*, **x**0) be a partial solution of differential Equation (13):

$$\mathbf{x}(t, \mathbf{x}^0) = [\vec{\mathbf{x}}\_1(t) \dots \vec{\mathbf{x}}\_n(t)]^T. \tag{19}$$

Let us solve one component (19) relative to *t*. Let it be the last component:

$$t = \omega(\mathfrak{x}\_n). \tag{20}$$

After inserting Equation (20) in solution (19), a one-dimensional manifold is received:

$$\left(\boldsymbol{\omega}\_{i}(\boldsymbol{\omega}(\boldsymbol{\tilde{x}}\_{\boldsymbol{n}}),\mathbf{x}^{0}) - \boldsymbol{\tilde{x}}\_{i}(\boldsymbol{\omega}(\boldsymbol{\tilde{x}}\_{\boldsymbol{n}})) = 0, \ i = 1, \ldots, n - 1. \tag{21}$$

Machine stability relative to the manifold (21) is the machine stability of solution (19) of differential Equation (13).

Now consider the equilibrium points of some generic differential equation:

$$
\dot{\mathbf{x}} = \mathbf{w}(\mathbf{x}),
\tag{22}
$$

where **<sup>x</sup>** ∈ R*n*, **<sup>w</sup>**(**x**) : R*<sup>n</sup>* → R*n*.

Analytically, the equilibrium points are defined as solutions of the system of algebraic equations:

$$\mathbf{w}(\mathbf{x}) = 0.\tag{23}$$

Machine-determined equilibrium points **x**˜ 1, ... , **x**˜*<sup>K</sup>* are the points that satisfy the following condition:

$$\|\|\mathbf{w}(\tilde{\mathbf{x}}^{\bar{i}})\|\| \le \varepsilon\_1 \tag{24}$$

and ∀**x**˜*<sup>i</sup>* , **x**˜*<sup>j</sup>* , *i* = *j*,

$$\|\|\bar{\mathbf{x}}^{i} - \bar{\mathbf{x}}^{j}\|\| > \varepsilon\_{1}, i, j \in \{1, \ldots, K\}, \tag{25}$$

where *ε*<sup>1</sup> is a given small positive number.

**Definition 6.** *An equilibrium point* **<sup>x</sup>**˜ ∈ R*<sup>n</sup> of differential Equation (22) is stable if there is a domain* X0 ⊆ R*n,* **<sup>x</sup>**˜ ⊂ *<sup>X</sup>*<sup>0</sup> *such that it contains a sphere* <sup>S</sup>

$$\sum\_{i=1}^{n} (x\_i - \bar{x}\_i)^2 = r^2. \tag{26}$$

*where <sup>r</sup>* > *<sup>ε</sup>*<sup>1</sup> *is located completely in this domain* X0 *<sup>S</sup>* ⊂ X0*, and* ∀**x**<sup>0</sup> ∈ <sup>S</sup>*, a partial solution* **<sup>x</sup>**(*t*, **<sup>x</sup>**0) *of differential Equation (22), will reach the point* **<sup>x</sup>** ∈ <sup>S</sup> *for limited time*

$$\left\|\mathbf{x}(t\_{f'},\mathbf{x}^0) - \widetilde{\mathbf{x}}\right\| \leq \varepsilon\_{1\prime} \tag{27}$$

*where tf* < *t* <sup>+</sup> < ∞*.*

The sphere is introduced here in order to guarantee that the equilibrium point is inside the region and exclude it from falling on the boundary.

The introduced machine interpretations of the known properties of objects eliminate the need to analytically prove the existence of these properties for an object since this is often very laborious or completely impossible. This allows further solving complex technical problems by machine methods and checking the achievement of the required properties by machine.

#### **3. Machine Learning Feedback Control**

Recall our goal. We want to automate the design of an automatic control system. For this purpose, it is necessary to formulate for the computer the control problem and make the computer solve it automatically and design a control system for a control object without human.

To do this, let us formulate the problem in a general mathematical setting of optimal control.

The mathematical model of the control object is given in the form of differential equation system (10).

The initial condition is given:

$$\mathbf{x}(0) = \mathbf{x}^0 \in \mathbb{R}^n. \tag{28}$$

Given the terminal position as a goal,

$$\mathbf{x}^{f} = [\mathbf{x}\_1^{f} \dots \mathbf{x}\_n^{f}]^T. \tag{29}$$

The quality criterion is given in the form of an integral functional:

$$J\_1 = \int\_0^{t\_f} f\_0(\mathbf{x}, \mathbf{u})dt \to \min. \tag{30}$$

It is necessary to find a control function in the form

$$\mathbf{u} = \mathbf{g}(\mathbf{x}, t). \tag{31}$$

where **g**(**x**, *t*)=[*g*1(**x**, 1)... *gm*(**x**, *t*)]*T*, which makes object (10) achieve given goal (29) with the optimal value of quality criterion (30). A found control function (31) has to satisfy the boundaries:

$$
\mu\_i^- \le g\_i(\mathbf{x}, t) \le \mu\_i^+, \ i = 1, \dots, m. \tag{32}
$$

We are looking for control as a function of the state of the object, which corresponds to the principle of feedback control. It is generally accepted that this type of control is implemented in real systems since it allows leveling the inaccuracies of the model.

**Definition 7.** *For a mathematical model to correspond to a dynamic real object, it is necessary and sufficient that the mathematical estimation error of the real object state does not increase over time.*

That is, the introduction of the feedback control to the differential equation system gives the system some property that allows the object to achieve the goal with the optimal quality value, that is, to be feasible. The question is, what is this property?

It is clear that not all control systems are feasible. For example, optimal but open-loop control systems do not have the feasibility property. Conversely, Lyapunov-stable systems are feasible. However, there are examples when the solution is not Lyapunov stable but at the same time it is feasible [32]. For example, when moving through points, the movement itself to a point is Lyapunov stable, but movement along a trajectory consisting of points is not Lyapunov stable, but this control is now most often implemented. Thus, it becomes necessary to formulate a property that makes it possible to determine the feasibility of the system.

In fact, by introducing a feedback system, we change the differential equations of the system so that a certain area appears around some particular solution of the system (the optimal trajectory) such that other trajectories that fall into this area will not leave it.

This trajectory is a partial solution of differential equation

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{g}(\mathbf{x}, t)) \tag{33}
$$

for the found optimal control.

**Definition 8.** *The partial solution* **x**(*t*, **x**0) *of differential Equation (22) has a compressibility property if, for any other partial solution* **x**(*t*, **x**∗)*, the following conditions are performed. If*

$$\left\|\mathbf{x}(t',\mathbf{x}^0) - \mathbf{x}(t',\mathbf{x}^\*)\right\| \le \sigma,\tag{34}$$

*where t* <sup>&</sup>gt; <sup>0</sup>*, <sup>σ</sup>* <sup>&</sup>gt; <sup>0</sup>*, then* <sup>∃</sup>*<sup>α</sup>* <sup>&</sup>gt; <sup>0</sup> *such, that for any <sup>ε</sup>*<sup>+</sup> <sup>&</sup>gt; <sup>0</sup>

$$\left\|\mathbf{x}(t'+\mathfrak{a},\mathbf{x}^0)-\mathbf{x}(t'+\mathfrak{a},\mathbf{x}^\*)\right\|\leq\varepsilon^+.\tag{35}$$

**Hypothesis 1.** *To realize the found optimal control function (31) in the real control object, the optimal trajectory must compressibility properties (34) and (35).*

Obviously, if a control function provides performing properties (34) and (35), then this control function according to Definition 8 can be realized in the real object directly. According to Definition 8, an unstable differential equation cannot be realized. Highly unstable systems exist, but they cannot be described by unstable differential equations because these differential equations cannot estimate the state of unstable objects in time. Any small error in the initial conditions for the unstable differential equation of a mathematical model will be increasing over time. To estimate the state of an unstable object, it is necessary to use a stable differential equation.

Thus, to solve the stated feedback optimal control problem, it is necessary to construct such a control function (31) that makes the object (10) achieve given goal (29) with the optimal value of quality criterion (30) and obtain required properties (34) and (35).

#### **4. Synthesized Optimal Control Approach for the Solution of the Stated Problem**

In this section, we propose our synthesized optimal control approach [33] that completely satisfies requirements (34) and (35) in the construction of optimal control (31).

The idea of the approach consists in providing the object with the existence of some equilibrium point in the state space and then constructing such a control function that controls the position of the equilibrium point in order to make the object reach the goal with the optimal value of the quality criterion.

Initially, the control synthesis problem is solved to provide the existence of the equilibrium point. As a result, the control function in the following form is found:

$$\mathbf{u} = \mathbf{h}(\mathbf{x}^\* - \mathbf{x}),\tag{36}$$

where **x**∗ in each fixed moment of time is some point in the state space that affects the position of the equilibrium point of the differential equation:

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{h}(\mathbf{x}^\* - \mathbf{x})),
\tag{37}
$$

$$\mathbf{h}(\mathbf{x}^\* - \mathbf{x}) = \left[ h\_1(\mathbf{x}^\* - \mathbf{x}) \dots h\_m(\mathbf{x}^\* - \mathbf{x}) \right]^T. \tag{38}$$

The control function (38) must satisfy restrictions for any position of the point **x**∗

$$
\mu\_i^- \le h\_i(\mathbf{x}^\* - \mathbf{x}) \le \mu\_i^+, \ i = 1, \dots, m. \tag{39}
$$

For any value **x**∗, the differential equation system (37) has an equilibrium point **x**˜(**x**∗):

$$\mathbf{f}(\ddot{\mathbf{x}}(\mathbf{x}^\*), \mathbf{h}(\mathbf{x}^\* - \ddot{\mathbf{x}}(\mathbf{x}^\*))) = 0. \tag{40}$$

A matrix of Jacobi

$$\mathbf{A}(\mathbf{x}^\*) = \frac{\partial \mathbf{f}(\mathbf{x}, \mathbf{x}^\* - \mathbf{x})}{\partial \mathbf{x}},\tag{41}$$

computed in the equilibrium point **x**˜(**x**∗) has all eigenvalues in the left part of the complex plane.

$$\det(\mathbf{A}(\mathbf{x}^\*) - \lambda \mathbf{E}) = \prod\_{i=1}^n (\lambda - \lambda\_j) = 0,\tag{42}$$

where

$$
\lambda\_{\dot{j}} = \alpha\_{\dot{j}} + i\beta\_{\dot{j}}.\tag{43}
$$

*<sup>α</sup><sup>j</sup>* <sup>&</sup>lt; 0, *<sup>j</sup>* <sup>=</sup> 1, . . . , *<sup>n</sup>*, *<sup>i</sup>* <sup>=</sup> √−1.

In many cases, the equilibrium point **x**˜ coincides with the point **x**∗, but in some cases, it is impossible. For example, if the differential equation system includes an equation *x*˙*<sup>k</sup>* = *xl*, then the component *xk* of the equilibrium point will have only value 0 for any values of components *x*∗ *k* .

Note that when this control synthesis problem is solved by some machine learning method, conditions (41) and (42) cannot be checked for each mathematical expression **h**(**x**<sup>∗</sup> − **x**) of the control function because these are very time-consuming procedures. In machine learning control, to prove the stability in an equilibrium point, Definition 6 is used.

To synthesize control function (36), it is necessary to determine domain X ∈ R*<sup>n</sup>* and then to determine equilibrium point **x**˜. If the equilibrium point is equal to point **x**∗, then the control function is searched in the form of (36), where **x**∗ = **x**˜.

Computationally, to provide a stable property of equilibrium point **x**˜, the synthesis problem (10)–(12) is solved with the terminal point **<sup>x</sup>***<sup>f</sup>* = **<sup>x</sup>**˜, the initial domain X0 ⊂ X, and the quality criterion

$$J = \max\{t\_{f,1}, \dots, t\_{f,K}\} + a\_1 \sum\_{i=1}^{K} \Delta\_{f,i} \to \min,\tag{44}$$

where *a*<sup>1</sup> is the weight coefficient,

$$\Delta\_{f,i} = \left\| \mathbf{x}^f - \mathbf{x}(t\_{f,i}, \mathbf{x}^{0,i}) \right\|\_{\prime} \tag{45}$$

where *tf* ,*<sup>i</sup>* is the time of achievement of the terminal position (29) from the initial condition **<sup>x</sup>**0,*<sup>i</sup>* of the set of initial conditions X0 = {**x**0,1,..., **<sup>x</sup>**0,*K*}, *<sup>i</sup>* ∈ {1, . . . , *<sup>K</sup>*},

$$t\_{f,i} = \begin{cases} \ t, & \text{if } \ t < t^+ \text{ and } \ \Delta\_{f,i} \le \varepsilon \\\ t^+, & \text{otherwise} \end{cases},\tag{46}$$

where *t* <sup>+</sup> and *ε* are given positive values, **x**(*t*, **x**0,*<sup>i</sup>* ) is a partial solution of the system

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{h}(\mathbf{x}^f - \mathbf{x})),
\tag{47}
$$

for initial conditions **x**(*t*0) = **x**0,*<sup>i</sup>* , *i* ∈ {1, . . . , *K*},

$$\left\|\mathbf{x}^{f} - \mathbf{x}\right\| = \sqrt{\sum\_{i=1}^{n} (x\_i^f - x\_i)^2}.\tag{48}$$

In the second stage, the following optimal control problem is solved. The mathematical model of the control object is given in the form of (37), and the initial conditions are given as (28). It is necessary to find control as a function of time:

$$\mathbf{x}^\* = \mathbf{v}^\*(t),\tag{49}$$

in order to minimize the functional

$$J\_2 = \int\_0^{t\_f} f\_0(\mathbf{x}, \mathbf{x}^\* - \mathbf{x})dt \to \min\_{\mathbf{x}^\* \in \mathcal{X}}.\tag{50}$$

The obtained control

$$\mathbf{u} = \mathbf{g}(\mathbf{x}, t) = \mathbf{h}(\mathbf{v}^\*(t) - \mathbf{x}) \tag{51}$$

allows performing conditions (34) and (35); therefore, it can be realized in the real object.

Further in the paper, we discuss the machine learning methods appropriate to solve the described problems and show the examples of applying the proposed approach to the solution of two different robotic tasks.

#### **5. Symbolic Regression for Machine Learning**

According to the introduced Definitions 1, 2 and 3, the task of searching for the needed control function (36) in the first step is to be considered a machine learning task.

A search of an unknown function consists in searching for the structure and parameters of this function. Usually structures of the functions are set by a researcher on the base of data analysis, experience, or intuition. Today different universal structures become popular such as various mathematical series and artificial neural networks. If a structure of the needed function is set, then machine learning searches for the optimal values of parameters according to some criterion [34].

An ML technique such as symbolic regression allows to look for the optimal structure of the needed function and parameters as well [35].

Symbolic regression methods have made huge strides over the past decade and recently, the importance of interpretable machine learning has been recognized by the wider scientific community. However, to a greater extent, symbolic regression methods are used for so-called supervised machine learning, when there are some data that need to be approximated [36–39].

The considered problem of machine learning for control does not have a training set, and the search for a control function must be based on minimizing the quality criterion. This approach, in conventional terminology, refers to unsupervised learning. In this direction, there are much fewer examples due to the complexity of the search. In [40–42], the control functions are searched as linear combinations of basic functions, and mainly smooth functions are used as basic functions. We perform the control function search [43,44] in the form of function nesting, which allows to obtain more complex mathematical expressions, and also use a wider set of basic functions, including discontinuous functions.

All symbolic regression methods code the searched mathematical expression in the form of special code and search for the optimal solutionon the space of codes by a special genetic algorithm. For this purpose, a special crossover operation is developed. The application of a special crossover operation for two codes of parents allows to receive two new codes of child chromosomes. Different crossover operations are used for different code forms.

A complex crossover operation in symbolic regression methods, in our opinion, makes it difficult to find a solution. Creating new possible solutions as a result of a complex crossover operation is similar to generating new possible solutions. Therefore, the search process does not use the properties of evolution and is more like a random search. In order for the search algorithms of symbolic regression methods to have metaheuristic evolutionary properties, it is necessary that new possible solutions obtained by transforming existing possible solutions have the property of inheritance.

**Definition 9.** *The evolutionary algorithm has an inheritance property, if among the new possible solutions obtained, as a result of the evolutionary transformations of existing possible solutions, named parents, at least a given part of the new possible solutions have functional values, which differ from the functional values of the parents by not more than a given value.*

A universal approach to provide the inheritance property to any symbolic regression algorithm is using the principle of small variations of the basic solution [45]. The application of this principle makes it possible to find solutions that are close to optimal in a reasonable time.

In [24], this principle was applied to Cartesian genetic programming, and it improved the search process of the optimal solution. In the present paper, in the experimental part, the network operator method [46] is used, which was developed exactly for the solution of the control synthesis problem and was the first method where the principle of small variations was applied.

#### **6. Hybrid Algorithm for Optimal Control Problem**

The second step of the proposed approach (49) is essentially a pure optimization problem. Today, most generic optimization algorithms are based on population search [47], and so we also use them as a main technique, but according to the task, any other optimization algorithm can also be appropriate.

For the most complex optimal control problems with complex phase constraints, we propose to use a hybrid algorithm that combines GA [48], GWO [49] and PSO [50]. As we experimentally noticed, such a combination of the evolutionary algorithms allows to avoid the local minimum in complex tasks. A pseudocode of the algorithm can be found in Appendix A.

#### **7. Computational Experiment**

To demonstrate the proposed synthesized approach for the machine learning feedback control problem solution, let us consider two different optimal control tasks with mobile robots in complex environments with phase constraints.

#### *7.1. Two Mobile Robots with Bottlenecks Phase Constraints*

The first task we considered was to make two robots switch places with each other while accurately passing through the given areas, as if through bottlenecks.

The mathematical model of the control object has the following form:

$$\begin{array}{rcl} \dot{x}\_1 &=& 0.5(\boldsymbol{u}\_1 + \boldsymbol{u}\_2)\cos(\boldsymbol{x}\_3), \\ \dot{x}\_2 &=& 0.5(\boldsymbol{u}\_1 + \boldsymbol{u}\_2)\sin(\boldsymbol{x}\_3), \\ \dot{x}\_3 &=& 0.5(\boldsymbol{u}\_1 - \boldsymbol{u}\_2), \\ \dot{x}\_4 &=& 0.5(\boldsymbol{u}\_3 + \boldsymbol{u}\_4)\cos(\boldsymbol{x}\_6), \\ \dot{x}\_5 &=& 0.5(\boldsymbol{u}\_3 + \boldsymbol{u}\_4)\sin(\boldsymbol{x}\_6), \\ \dot{x}\_6 &=& 0.5(\boldsymbol{u}\_3 - \boldsymbol{u}\_4), \end{array} \tag{52}$$

where *x*1, *x*2, and *x*<sup>3</sup> are coordinates of the state vector of the first mobile robot, *x*4, *x*5, and *x*<sup>6</sup> are coordinates of the state vector of the second mobile robot, *u*<sup>1</sup> and *u*<sup>2</sup> are components of the control vector for the first robot, and *u*<sup>3</sup> and *u*<sup>4</sup> are components of the control vector for the second robot.

The values of control are limited:

$$-10 = \mu\_i^- \le \mu\_i \le \mu\_i^+ = 10, \text{ } i = 1, \text{ 2, 3, 4.}\tag{53}$$

The initial and terminal conditions are given:

$$\mathbf{x}(0) = \mathbf{x}^0 = [0 \; 0 \; 0 \; 10 \; 10 \; 0]^T,\tag{54}$$

$$\mathbf{x}(t\_f) = \mathbf{x}^f = [10 \ 10 \ 0 \ 0 \ 0 \ 0]^T. \tag{55}$$

The quality functional is given:

$$J\_3 = t\_f + p\_1 \|\mathbf{x}^f - \mathbf{x}(t\_f)\| + p\_2 \int\_0^{t\_f} \theta(\chi(\mathbf{x}))dt + p\_3 \sum\_{i=1}^{k\_5} \sum\_{j=1}^2 \theta(\Lambda\_{i,j} - \varepsilon\_i) \to \min,\tag{56}$$

where

$$t\_f = \begin{cases} \ t\_\prime \text{if } t < t^+, & \text{and } \|\mathbf{x}^f - \mathbf{x}(t\_f)\| < \varepsilon\_0\\ t^+, & \text{otherwise} \end{cases},\tag{57}$$

$$\chi(\mathbf{x}) = r\_0 - \sqrt{(\mathbf{x}\_1 - \mathbf{x}\_4)^2 + (\mathbf{x}\_2 - \mathbf{x}\_5)^2},\tag{58}$$

*ϑ*(*α*) is a Heaviside step function

$$\theta(\alpha) = \begin{cases} 1, \text{if } \alpha \ge 0 \\ 0, \text{otherwise} \end{cases} \tag{59}$$

$$\Delta\_{i,j} = \min\_{t} \sqrt{(x\_{1+(j-1)3} - y\_1^i)^2 + (x\_{2+(j-1)3} - y\_2^i)^2},\tag{60}$$

*r*<sup>0</sup> = 2, *kg* = 4, *ε<sup>i</sup>* = 0.1, *i* = 1, 2, 3, 4, *y*<sup>1</sup> <sup>1</sup> = 4, *<sup>y</sup>*<sup>1</sup> <sup>2</sup> = 2, *<sup>y</sup>*<sup>2</sup> <sup>1</sup> = 6, *<sup>y</sup>*<sup>2</sup> <sup>2</sup> = 4, *<sup>y</sup>*<sup>3</sup> <sup>1</sup> = 4, *<sup>y</sup>*<sup>3</sup> <sup>2</sup> = 6, *<sup>y</sup>*<sup>4</sup> <sup>1</sup> = 6, *y*4 <sup>2</sup> = 8, *p*<sup>1</sup> = 4, *p*<sup>2</sup> = 3, *p*<sup>3</sup> = 4, *ε*<sup>0</sup> = 0.01, *t* <sup>+</sup> = 4.8.

In the first stage, according to the proposed approach, the control synthesis problem is solved in order to provide the existence of the equilibrium point.

The stabilization system was received by the network operator method. As far as the received expressions of the control function, both the encoded and decoded forms are too long, so we place them into the Appendix B.

In the second stage, it is necessary to solve the optimal control problem and to find the control function in the form of piece-wise constant control function

$$\mathbf{x}^\* = \mathbf{x}^{\*,\hat{i}}, \ (i-1)\Delta t \le t < i\Delta t,\tag{61}$$

where *i* = 1, . . . , *K*, Δ*t* is a time interval, Δ*t* = 0.6, and *K* is the number of time intervals

$$K = \left\lfloor \frac{t^+}{\Delta t} \right\rfloor = \left\lfloor \frac{4.8}{0.6} \right\rfloor = 8.\tag{62}$$

To solve the optimal control problem and find **x**∗,*<sup>i</sup>* , *i* = 1, ... , 8, the described hybrid algorithm was used. The following optimal solution was found:

$$\begin{array}{rcl} \mathbf{x}^{\*,1} &=& [5.4629 \, 0.8503 \, -0.0834 \, 5.5730 \, 7.4134 \, -0.2191]^T, \\ \mathbf{x}^{\*,2} &=& [8.0879 \, 4.6283 \, -0.1367 \, 2.727 \, 4.3233 \, -04311]^T, \\ \mathbf{x}^{\*,3} &=& [6.6929 \, 4.2258 \, -1.4392 \, 1.1911 \, 7.7361 \, 0.2100]^T, \\ \mathbf{x}^{\*,4} &=& [1.8651 \, 7.0765 \, -0.0173 \, 6.6029 \, 2.6080 \, 0.1310]^T, \\ \mathbf{x}^{\*,5} &=& [3.6284 \, 4.0688 \, 0.3204 \, 0.7814 \, 9.4491 \, -0.1612]^T, \\ \mathbf{x}^{\*,6} &=& [8.4951 \, 8.4002 \, 0.3134 \, 1.0557 \, 0.7920 \, 0.7306]^T, \\ \mathbf{x}^{\*,7} &=& [7.7752 \, 9.9316 \, -0.0237 \, 1.6134 \, 1.9251 \, 0.0495]^T, \\ \mathbf{x}^{\*,8} &=& [10.0465 \, 9.8035 \, 0.1303 \, -1.0000 \, 0.0051 \, 0.2369]^T. \end{array} \tag{63}$$

Optimal trajectories on horizontal plane for robots are presented in Figure 1.

**Figure 1.** Optimal trajectories of robots for synthesized control

In the Figure 1 the solid black line is a trajectory of the first robot, the dash line is a trajectory of the second robot, the small circles are the bottlenecks, the small black squares are the optimal control points (63) for the first robot, and the small white squares are the optimal control points (63) for the second robot. The optimal value of the functional (56) is 4.8347.

For comparison, the optimal control problem (52)–(60) was solved by the direct approach of optimal control. For this purpose, the time axis was divided on *K*˜ intervals. The control function is found in the form of the piece-wise linear function

$$u\_{\hat{j}} = \begin{cases} u\_{\hat{j}}^{+} = 10, \text{if } \hat{u}\_{\hat{i}} > u\_{\hat{i}}^{+} \\\ u\_{\hat{j}}^{-} = -10, \text{if } \hat{u}\_{\hat{i}} < u\_{\hat{i}}^{-} \\\ \hat{u}\_{\hat{j}}, \text{otherwise} \end{cases}, j = 1, 2, 3, 4,\tag{64}$$

where

$$\mathfrak{a}\_{j} = (q\_{i+(j-1)\mathcal{K}+1} - q\_{i+(j-1)\mathcal{K}}) \frac{t - (i-1)\tilde{\Delta}t}{\tilde{\Delta}t} + q\_{i+(j-1)\mathcal{K}} \tag{65}$$

where *j* = 1, 2, 3, 4, *j* = 1, . . . , *K*˜, Δ˜ *t* is a time interval, Δ˜ *t* = 0.2,

$$\mathcal{R} = \left\lfloor \frac{t^+}{\Delta t} \right\rfloor = \left\lfloor \frac{4.8}{0.2} \right\rfloor = 24. \tag{66}$$

In total, it is necessary to find 96 parameters, **q** = [*q*<sup>1</sup> ... *q*96] *<sup>T</sup>*. The problem was very difficult for many evolutionary algorithms. The most successful in solving this problem was the described hybrid evolutionary algorithm.

In the result, the following solution was obtained:

**q** = [3.6515 − 5.6155 − 5.8103 − 4.1722 − 3.1398 − 5.4711 0.0936 6.6150 2.6432 − 8.8133 − 2.4498 − 18.5059 − 9.5896 0.1931 −5.5797 − 14.2516 9.5304 0.2181 0.6002 − 11.9435 − 12.2196 −0.0127 − 19.4712 8.8589 5.5695 8.4877 5.6459 0.7054 − 3.6582 −8.0966 − 0.6840 − 8.6774 7.7892 − 5.6366 − 5.3715 − 4.8317 −16.6047 − 19.8104 − 14.9474 7.6756 4.7000 9.7919 − 14.6483 −3.5860 − 3.1178 − 9.7188 − 16.2048 − 15.9328 − 1.3150 1.9570 −10.2673 − 0.5094 − 6.4163 − 4.9303 − 3.7649 − 6.3955 −5.8384 − 15.8273 − 9.2860 − 0.1217 9.0490 − 3.0543 0.8906 7.6340 10.8459 10.2492 3.4207 − 10.6311 − 4.9477 − 3.4041 −13.6140 − 15.2029 4.8782 − 10.4763 − 10.5894 6.4966 − 4.2872 −12.7573 − 8.2174 − 0.8267 − 14.1822 − 1.6810 − 15.3973 12.1957 15.4694 10.3573 − 12.7840 7.9684 − 6.3937 17.4171 − 6.6234 1.3378 −8.2870 − 0.2343 − 18.0791 − 5.3433] *T*. (67)

The functional value is 4.8132. The optimal trajectories on the horizontal plane are presented in Figure 2.

**Figure 2.** Optimal trajectories of robots for direct control.

To analyze the received results, these optimal control functions were tested for the mathematical model with perturbations:

$$\begin{array}{rcl} \dot{x}\_1 &=& 0.5(u\_1 + u\_2)\cos(x\_3) + \beta\_5^x(t),\\ \dot{x}\_2 &=& 0.5(u\_1 + u\_2)\sin(x\_3) + \beta\_5^x(t),\\ \dot{x}\_3 &=& 0.5(u\_1 - u\_2) + \beta\_5^x(t),\\ \dot{x}\_4 &=& 0.5(u\_3 + u\_4)\cos(x\_6) + \beta\_5^x(t),\\ \dot{x}\_5 &=& 0.5(u\_3 + u\_4)\sin(x\_6) + \beta\_5^x(t),\\ \dot{x}\_6 &=& 0.5(u\_3 - u\_4) + \beta\_5^x(t), \end{array} \tag{68}$$

where *ξ*(*t*) is a function that returns a random number from diapason (−1; 1) at every call, and *β* is a constant value.

For the system (68), disturbances were also introduced into the initial conditions

$$\mathbf{x}(0) = \mathbf{x}\_i^0 + \beta\_0 \mathbf{\tilde{y}}\_{\prime} \; i = 1, \ldots, 6,\tag{69}$$

where *β*<sup>0</sup> is a constant value. The results of the tests are presented in Table 1. For every disturbance, 10 tests were performed. In Table 1, *J*<sup>3</sup> is an average functional value for synthesized control, *σ*(*J*3) is a standard deviation for values *J*3, *J*<sup>4</sup> is an average functional value for direct control, and *σ*(*J*4) is a standard deviation for values *J*4.


**Table 1.** Functional values for perturbed control object model.

As the test results show, synthesized control is much less susceptible to perturbations of the mathematical model and the initial conditions than direct control. Direct optimal control is the most sensitive to disturbances of initial conditions. Even the smallest disturbances of the initial conditions make direct control unacceptable. The results show that the synthesized control obtains the compression property, and it is feasible in real systems.

#### *7.2. Synthesized Control for Omni-Mecanum-Wheeled Robot*

A mecanum robot has special wheels that allow it to move under a direct angle to its direction of axis without any turns [51]. In Figure 3, an example of a mecanum robot is shown.

**Figure 3.** Omni-mecanum-wheeled robot.

Consider the optimal control problem, where two identical mecanum robots have to swap their places in some area with obstacles and without collisions for a minimal amount of time.

The mathematical model of the control object is the following:

$$\begin{array}{llll} \dot{\mathbf{x}}\_{1} &=& 0.25((u\_1+u\_4)(\cos(\mathbf{x}\_3)+\sin(\mathbf{x}\_3))+(u\_2+u\_3)(\cos(\mathbf{x}\_3)-\sin(\mathbf{x}\_3))),\\ \dot{\mathbf{x}}\_{2} &=& 0.25((u\_1+u\_4)(\sin(\mathbf{x}\_3)-\cos(\mathbf{x}\_3))+(u\_2+u\_3)(\cos(\mathbf{x}\_3)+\sin(\mathbf{x}\_3))),\\ \dot{\mathbf{x}}\_{3} &=& 0.25(-u\_1+u\_2-u\_3+u\_4)/(L\_0+H\_0),\\ \dot{\mathbf{x}}\_{4} &=& 0.25((u\_5+u\_8)(\cos(\mathbf{x}\_6)+\sin(\mathbf{x}\_6))+(u\_6+u\_7)(\cos(\mathbf{x}\_6)-\sin(\mathbf{x}\_6))),\\ \dot{\mathbf{x}}\_{5} &=& 0.25((u\_5+u\_8)(\sin(\mathbf{x}\_6)-\cos(\mathbf{x}\_6))+(u\_6+u\_7)(\cos(\mathbf{x}\_6)+\sin(\mathbf{x}\_6))),\\ \dot{\mathbf{x}}\_{6} &=& 0.25(-u\_5+u\_6-u\_7+u\_8)/(L\_0+H\_0),\end{array} \tag{70}$$

where *x*1, *x*2, and *x*<sup>3</sup> are the state vector coordinates of the first mecanum robot, *x*4, *x*5, and *x*<sup>6</sup> are the state vector coordinates of the second mecanum robot, *u*1, *u*2, *u*3, and *u*<sup>4</sup> are components of the control vector of the first mecanum robot, *u*5, *u*6, *u*7, and *u*<sup>8</sup> are components of the control vector of the first mecanum robot, and *L*<sup>0</sup> and *H*<sup>0</sup> are geometric parameters of the robots, *L*<sup>0</sup> = 2, *H*<sup>0</sup> = 1.

The control is restricted:

$$-10 = \mu\_i^- \le \mu\_i \le \mu\_i^+ = 10, \ i = 1, \dots, 8,\tag{71}$$

where *u*− *<sup>i</sup>* and *<sup>u</sup>*<sup>+</sup> *<sup>i</sup>* are given lower and upper limits for values of control, respectively, *i* = 1, . . . , 8.

The initial state is given:

$$\mathbf{x}(0) = \mathbf{x}^0 = [0 \, 0 \, 0 \, 10 \, 10 \, 0]^T. \tag{72}$$

The terminal state is given:

$$\mathbf{x}(t\_f) = \mathbf{x}^f = [10 \ 10 \ 0 \ 0 \ 0 \ 0]^T \,\mathrm{.}\tag{73}$$

where *tf* is the time of achievement of the terminal state. It is determined by Equation (57) with *ε*<sup>0</sup> = 0.05, *t* <sup>+</sup> = 1.9.

The quality criterion includes phase constraints and the accuracy of the terminal state achievement.

$$J = p\_1 \|\mathbf{x}^f - \mathbf{x}(t\_f)\| + \sum\_{i=1}^K w\_i \int\_0^{t\_f} \theta(\phi\_i(\mathbf{x}))dt + p\_2 \int\_0^{t\_f} \theta(\chi(\mathbf{x}))dt + t\_f \to \min\_{\mathbf{u}},\tag{74}$$

where *p*<sup>1</sup> and *p*<sup>1</sup> are the penalty coefficient, where *p*<sup>1</sup> = 3 and *p*<sup>2</sup> = 3, *wi* is a weight coefficient, *i* = 1, ... , *K*, *K* = 8, *ϑ*(*α*) is a Heaviside step function (59), and *χ*(**x**) is determined by Equation (58):

$$\phi\_i(\mathbf{x}) = r\_i - \sqrt{(\mathbf{x}\_1) - \mathbf{x}\_{1,i}} + (\mathbf{x}\_2 - \mathbf{x}\_{2,i})^2, \text{ i } i = 1, 2, 3, 4,\tag{75}$$

$$\phi\_i(\mathbf{x}) = r\_{i-4} - \sqrt{(\mathbf{x}\_4) - \mathbf{x}\_{1,i-4}}^2 + (\mathbf{x}\_5 - \mathbf{x}\_{2,i-4})^2, \text{ } i = 5, 6, 7, 8, \tag{76}$$

*r*<sup>1</sup> = 2, *r*<sup>2</sup> = 2.5, *r*<sup>3</sup> = 2.5, *r*<sup>4</sup> = 2, *x*1,1 = 2, *x*2,1 = 2, *x*1,2 = 8, *x*2,2 = 2, *x*1,3 = 2, *x*2,3 = 8, *x*1,4 = 8, *x*2,4 = 8, *r*<sup>0</sup> = 1.

According to the synthesized method, initially, the feedback control synthesis problem is solved, such that the closed-loop control system is stable relative to some equilibrium point in the state space. For this purpose, again, the network operator method is used.

Since the robots are the same, we make the synthesis of the stabilization system for one robot, for example, the first robot.

In the result, the network operator method found the following control function:

$$u\_i = \begin{cases} u\_i^+ \text{, if } \vec{u}\_i \ge u\_i^+ \\ u\_i^- \text{, if } \vec{u}\_i \le u\_i^- \\ \vec{u}\_i \text{otherwise} \end{cases} \tag{77}$$

where

$$
\tilde{u}\_1 = D + \arctan(q\_1 \Delta\_1),
\tag{78}
$$

$$\begin{array}{rcl} \vec{u}\_{2} &=& \rho\_{19}(\vec{u}\_{1}) + \rho\_{4}(\mathsf{C}) + \rho\_{17}(\mathsf{B}) + \text{sgn}(q\_{2}\Lambda\_{2} + \\ & \rho\_{1}\Lambda\_{1} + q\_{3}\sqrt[3]{\Delta\_{3}}) + \rho\_{18}(q\_{2}\Lambda\_{2}) + q\_{1}\Lambda\_{1} - (q\_{1}\Lambda\_{1})^{3} \end{array} \tag{79}$$

$$\vec{u}\_3 = \vec{u}\_2 + \vec{\mathcal{C}}^3 + A + \arctan(q\_1) - (A + \arctan(q\_1))^3 + \Delta\_2^{-1},\tag{80}$$

$$\begin{array}{rcl} \vec{u}\_4 &=& \sin(\vec{u}\_3) + \rho\_{16}(\vec{u}\_2) + (\mathcal{C} + \sqrt[3]{\Delta\_1})^3 + B + \\ &\quad \theta(q\_2\Delta\_2 + q\_1\Delta\_1) + q\_3\sqrt[3]{\Delta\_3} \end{array} \tag{81}$$

$$A = \text{sgn}(q\_2\Delta\_2 + q\_1\Delta\_1 + q\_3\sqrt[3]{\Delta\_3}) + \theta(q\_2\Delta\_2 + q\_1\Delta\_1),$$

$$B = A + \arctan(q\_1) + \text{sgn}(q\_2\Delta\_2 + q\_1\Delta\_1) + \arctan(q\_1\Delta\_1),$$

$$\mathcal{C} \quad = \cos(B) + \rho\_4(q\_2\Delta\_2 + q\_1\Delta\_1) + \rho\_{16}(\Delta\_1) + \rho\_{19}(q\_1\Delta\_1 + q\_2\Delta\_2) + \exp(q\_1\Delta\_1),$$

$$D = \mathcal{C} + \sqrt[3]{\Delta\_1} + \mathcal{C}^3 + \rho\_B - A - \arctan(q\_1) + \rho\_{16}(q\_1\Delta\_1),$$

Δ<sup>1</sup> = *x*<sup>∗</sup> − *x*1, Δ<sup>2</sup> = *x*<sup>∗</sup> − *x*2, Δ<sup>3</sup> = *x*<sup>∗</sup> − *x*3, *q*<sup>1</sup> = 11.89282, *q*<sup>2</sup> = 10.15381, *q*<sup>3</sup> = 15.25903,

$$\rho\_4(\kappa) = \text{sgn}(\kappa)\sqrt{|\alpha|}, \ \rho\_{16}(\alpha) = \begin{cases} \alpha, \text{ if } |\alpha| < 1 \\ \text{sgn}(\alpha), \text{ otherwise} \end{cases}'$$

$$\rho\_{17}(\kappa) = \text{sgn}(\kappa)\ln(|\alpha|+1), \ \rho\_{18}(\kappa) = \text{sgn}(\kappa)(\exp(|\kappa|)-1),$$

$$\rho\_{19}(\kappa) = \text{sgn}(\kappa)\exp(-|\kappa|).$$

For the second robot in the control function (77), it is necessary to replace *xi*, *i* = 1, 2, 3 with *xi*, *i* = 4, 5, 6, and *x*<sup>∗</sup> *<sup>i</sup>* , *i* = 1, 2, 3, with *x*<sup>∗</sup> *<sup>i</sup> i* = 4, 5, 6, respectively.

Plots of the trajectories of the robot movement from four initial states are presented in Figure 4.

**Figure 4.** Trajectories on the horizontal plane for four initial states.

On the second stage, the optimal control problem (70)–(74) is solved for the closed loop control system with the control function (77). A control on the second stage is vector **x**∗, determining the position of the stable equilibrium point in the state space. For solving the optimal control problem, the time axis is divided on the intervals, and in each interval, a control function is approximated by a piece-wise constant function. The value of the interval is equal to Δ*t* = 0.19.

$$\mathbf{x}\_i^\* = \mathbf{x}\_i(t\_j) = q\_{i + (j-1)6\prime} \tag{82}$$

where *i* = 1, . . . , 6, *tj* = (*j* − 1)Δ*t j* = 1, . . . , *D*, *D* is the number of intervals,

$$D = \frac{t^+}{\Delta t} = \frac{1.9}{0.19} = 10.\tag{83}$$

In total, it was necessary to find an optimal vector with 10 · 6 = 60 parameters:

$$\mathbf{q} = [q\_1 \; \dots \; q\_{\ell 0}]^T \; .$$

For solving the problem, the hybrid evolutionary algorithm was applied. The values of the parameters were restricted:

$$\begin{array}{l} -2.5 = q\_1^+ \le q\_{1+3(k-1)} \le q\_1^- = 2.5\\ -2.5 = q\_2^+ \le q\_{2+3(k-1)} \le q\_2^- = 2.5\\ -5\pi/12 = q\_2^+ \le q\_{3+3(k-1)} \le q\_2^- = 5\pi/12 \end{array} \tag{84}$$

where *k* = 1, . . . , 20.

The following solution was obtained:

$$\begin{array}{l} \mathbf{q} = [-2.4862\ 2.0000\ 0.6231 - 2.1975 - 1.3799\ 0.7058 - 2.5000\ 1.9421\\ 1.2094 - 0.6933 - 2.0088 - 0.3378\ 0.1956\ 1.7643\ 0.6378\ 1.4052\\ -2.4611 - 0.3230\ 1.7245\ 2.0000\ 0.8872\ 1.6769 - 1.2832 - 0.4372\\ 0.5624\ 1.9987 - 1.1601\ 1.9452 - 1.8859\ 0.7357\ 1.6876\ 1.5024\\ 0.0997\ 1.1642 - 1.3678\ 0.5321\ 1.6945\ 1.9946\ 0.5880\ 0.7886\\ -1.6027\ 1.3090\ 1.1795\ 1.5385\ - 1.0798\ 0.6781\ - 1.9975\ 0.0204\\ -0.8939\ 1.0578\ - 0.4106\ - 2.1003\ - 1.2544\ - 1.3090\ - 2.5000\ 1.0746\\ -1.2216\ - 2.0840\ - 1.3344\ - 0.8913\end{array} \tag{85}$$

The optimal value of the functional (74) is *J* = 1.923.

Optimal trajectories on the horizontal plane of two robots are presented in Figure 5. The solid line is the trajectory of the first robot, the dash line is the trajectory of the second robot, the red circles are the phase constraints, the black small squares are projections of control **x**∗ on the horizontal plane for the first robot, and the white small squares are projections of control **x**∗ on the horizontal plane for the second robot.

Figure 5 shows very clearly, as in the previous example with bottlenecks, that the equilibrium points are located not on the trajectory of the robot, as is done with conventional stabilization, but outside the trajectory. By placing points on the trajectory, we can lose quality because when approaching the equilibrium point, the robot must slow down. Such an optimal arrangement of points ensures the optimal value of the functional.

In this example, we would like to note the following. It occurs that two components of the control vector are enough to control the mecanum robot. The other two components have limit values and do not change during the control process. This can be seen in the additional control plots presented in Appendix C. However, indeed, we noted that in the mathematical model of the mecanum robot (70), the control was redundant, *m* > *n*. So, the computer itself found a solution for how to proceed in this case, and in the solution found, the two components of control, in fact, do not participate in the search for the

optimal solution. This is one of the more successful demonstrations of intelligent machine automation of the process of creating control systems.

**Figure 5.** Optimal trajectories on the horizontal plane.

#### **8. Discussion**

In this work, we are laying the theoretical foundations of machine learning control. The main feature is that the machine proof of various properties is implemented experimentally basing on examples. In particular, this is what happens in neural networks, when experiments are carried out on a test sample to check whether the system achieves certain properties. We formulate several machine properties in the control system design. In the future, the proposed formal descriptions can be developed, new properties of systems can be considered, and quantitative probabilistic estimates can be given based on positive test results.

An important result of the work is the expansion of the formulation of the optimal control problem and the introduction of additional requirements for the required control. Ensuring the introduced conditions additionally requires the introduction of feedback. The paper presents the approach of synthesized optimal control, which allows implementing optimal control systems, taking into account the introduced additional requirement. In this case, other approaches can be considered and proposed.

#### **9. Conclusions**

A general machine learning approach for the automatic design of feedback control systems for any dynamical nonlinear control objects is considered. The main perspective is the machine-automated development of control systems. According to this trend, the control system is obtained as a result of a machine solution of some formal mathematical control problem and it is to be implemented in the real object directly.

Since the control system is created by machine learning, the paper formulates a machine check of all the properties required from the control system. Mathematical statements of control problems and some theoretical justifications for solution of these problems by machine methods are presented. The paper introduces and discusses such notions as machine learning control, stability, optimality and feasibility of machine-made control systems. In this regard, substantiations are introduced for the machine learning feedback control approach based on symbolic regression and evolutionary algorithms.

Thus, the feedback control design is generalized and automated with a generic approach applicable to any nonlinear models, including machine-learning-identified models. It is shown that with this approach, the computer is able to propose interesting outstanding solutions, which sometimes an engineer cannot even suppose.

There is another feature in the proposed approach that can be developed in a good direction. It is possible to control an object by controlling the position of the equilibrium point both offline under predetermined operating conditions, and online, when the positions of the points can be optimally planned for some short term and then adjusted according to the situation. This is one more direction for further research and application of the presented approach.

**Author Contributions:** Conceptualization, A.D. and E.S.; methodology, A.D. and E.S.; software, A.D. and E.S.; validation, A.D. and E.S.; formal analysis, A.D.; investigation, E.S.; writing—original draft preparation, A.D. and E.S.; writing—review and editing, E.S.; supervision, A.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is supported by the Ministry of Science and Higher Education of the Russian Federation, project No. 075-15-2020-799.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **Appendix A. Hybrid Evolutionary Algorithm**

During the experiments, we noticed that evolutionary algorithms work faster if they use as much information as possible about the goal function values in the space of search at the evolutionary transformation of each possible solution. The PSO algorithm at the construction of new possible solutions uses information about the current best possible solution, about the best possible solution among random selected informants, and about previous goal function values for each possible solution. The GWO algorithm uses information about some current best possible solutions. Therefore, these algorithms work well.

However, when the goal function has a complex form, or it includes many complex constraints, then these algorithms stop at some points of the local minimum. The GA in these cases begins to work better than other evolutionary algorithms. The GA often can shift the search from the current local minimum.

We propose a hybrid algorithm that includes all three listed above algorithms.

Initially, all arrays are created, and the type of evolutionary transformation is randomly chosen: GA, GWO or PSO. In each generation, the number of transformations of every algorithms is approximately the same. A pseudocode of the algorithm can be found in the Algorithms A1–A4. We proposed that the hybrid algorithm worked better than GA, PSO and GWO individually.

Here is the description of the proposed hybrid algorithm.

In the description of the algorithm, a function *Goal*(**q**) returns the goal function value for the vector of parameters **q**. Procedure *NumbertoGray*(*a*, **y**) converts the real number *a* to Gray code **y**. Procedure *GraytoNumber*(**y**, *a*), on the contrary, converts Gray code to a real number *a*. Procedure *Sort*(*I*, *F*, *k*) sorts the first *k* elements in array *F* and sets the first indexes in the array of index *I*.

#### **Algorithm A1** Hybrid algorithm.

**Require:** *H* > 0 is a number of possible solutions in the initial population, *G* > 0 is the number of generations, *R* > 0 is the number of evolutionary changes in one generation, *p* is the number of searched parameters, *q*<sup>+</sup> *<sup>i</sup>* > *q*<sup>−</sup> *<sup>i</sup>* , *i* = 1, ... , *p*, are restrictions on the parameters, *c* is the number bits in the integer part of the parameter, *d* is the number of bits in the fractional part of the parameter, *α*, *β*, *γ*, *σ*, *k*<sup>0</sup> are parameters for the PSO algorithm, and *kw* is the number of leaders for the GWO algorithm.

```
Ensure: q˜ = [q˜1 ... q˜p]
                          T is the optimal vector of parameters
  q0
    j = q˜j j = 1, . . . , p,
```

```
qi
 j ← (q+
        j − q−
              j )ξ + q−
                      j , j = 1, . . . , p, i = 1, . . . , H − 1
vi
 j ← 0, j = 1, . . . , p, i = 0, . . . , H − 1
Fj ← Goal(qj
              ), j = 0, . . . , H − 1
Ij = j, F˜
        j = Fj, j = 0, . . . , H − 1
t ← 0
while t < G do
   j− ← 0, F− ← F0, j ← 1
   while j < H do
       if Fj < Fj− then
           j− ← j,
           Fj− ← Fj
       end if
       j ← j + 1
   end while
   Sort(I, F˜, kw)
   s ← 0
   while s < R do
       ka ← ξ(3)
       if ka = 0 then
           GWO transformations
       end if
       if ka = 1 then
           GA transformation
       end if
       if ka = 2 then
           PSO transformation
       end if
       s ← s + 1
   end while
   t ← t + 1
end while
j− ← 0, F− ← F0, j ← 1
while j < H do
   if Fj < Fj− then
       j− ← j,
       Fj− ← Fj
   end if
   j ← j + 1
end while
q˜ ← qj−
```
**Algorithm A2** GWO transformation.

```
L ← 2 − t(2/G)
i ← ξ(H)
j ← 0
while j < p do
   αx ← 0
   k ← 0
   while k < kw do
       gA ← 2Lξ − L
       gC ← 2ξ
       αD ← |gCq
                   Ik
                   j − qi
                        j
                         |
       αx ← αx + q
                    Ik
                    j − gAαD
       k ← k + 1
   end while
   qˆj ← αx/kw
   if qˆj > q+
            j then
       qˆj ← q+
              j
   end if
   if qˆj < q−
            j then
       qˆj ← q−
              j
   end if
   j ← j + 1
end while
Fˆ ← Goal(qˆ)
if Fˆ < Fi then
   Fi ← Fˆ
   qi ← qˆ
   Sort(I, F, kw)
end if
```
**Algorithm A3** GA transformation.

```
k1 ← ξ(H), k2 ← ξ(H)
d ← ξ
if d > Fj− /Fk1 or d > Fj− /Fk2 then
   ks ← ξ(p)
   NumbertoGray(q
                      k1
                      ks
                        , y1)
   NumbertoGray(q
                      k2
                      ks
                        , y2)
   kc ← ξ(c + d)
   j ← 0
   while j < kc do
       Sony1
            j ← y1
                   j , Sony2
                          j ← y2
                                 j , j ← j + 1
   end while
   j ← kc
   while j < c + d do
       Sony1
            j ← y2
                   j , Sony2
                          j ← y1
                                 j , j ← j + 1
   end while
   i ← 0
   while i < ks do
       Son1
           i ← q
                 k1
                 i , Son2
                        i ← q
                               k2
                               i , i ← i + 1
   end while
   i ← ks + 1
   while i < p do
       Son1
           i ← q
                 k2
                 i , Son2
                        i ← q
                               k1
                               i , i ← i + 1
   end while
   GraytoNumber(Sony1, Son1
                                ks
                                  )
   GraytoNumber(Sony2, Son2
                                ks
                                  )
   j ← 1
   while j ≤ 2 do
       if Sonj
             ks > q+
                    ks then
           Sonj
               ks ← q+
                      ks
       else if Sonj
                  ks < q−
                         ks then
           Sonj
               ks ← q−
                      ks
       end if
       Fˆ ← Goal(Sonj
                       )
       i+ ← 0, i ← 1
       while i < H do
           if Fi > Fi+ then
              i+ ← i
           end if
       end while
       if Fˆ < Fi+ then
           qi+ ← Sonj
           Fi+ ← Fˆ
       end if
       j ← j + 1
   end while
end if
```
#### **Algorithm A4** PSO transformation.

```
j ← ξ(H)
k ← ξ(H)
i ← 0
while i < k0 do
   l ← ξ(H)
   if Fl < Fk then
       k ← l
   end if
end while
i ← 0
while i < p do
   v
     j
     i ← αv
             j
             i + ξβ(qk
                      i − q
                            j
                            i
                            ) + ξγ(q
                                      j−
                                      i − q
                                             j
                                             i
                                              )
   qˆi ← q
           j
           i + σv
                  j
                  i
   if qˆi > q+
             i then
       qˆi ← q+
               i
   else if qˆi < q−
                  i then
       qˆi ← q−
               i
   end if
   i ← i + 1
end while
Fˆ ← Goal(qˆ)
if Fˆ < Fj then
    Fj ← Fˆ
   qj ← qˆ
end if
```
#### **Appendix B. Stabilization System of the Mobile Robot**

$$\mathbf{Y} = \begin{bmatrix} \mathbf{Y}\_{1,1} & \mathbf{Y}\_{1,2} \\ \mathbf{0}\_{12 \times 12} \mathbf{Y}\_{2,2} \end{bmatrix} \tag{A1}$$
 
$$\begin{bmatrix} 0 \,0 \,0 \,0 \,0 \,1 \,1 \,0 \,0 \,1 \,2 \,1 \\ 0 \,0 \,0 \,0 \,0 \,0 \,1 \,0 \,0 \,0 \\ 0 \,0 \,0 \,0 \,0 \,0 \,1 \,0 \,0 \,0 \\ 0 \,0 \,0 \,0 \,0 \,1 \,0 \,0 \,0 \,0 \\ 0 \,0 \,0 \,0 \,0 \,0 \,1 \,0 \,0 \,0 \\ 0 \,0 \,0 \,0 \,0 \,0 \,1 \,0 \,0 \,0 \\ 0 \,0 \,0 \,0 \,0 \,0 \,1 \,0 \,0 \,0 \\ 0 \,0 \,0 \,0 \,0 \,2 \,0 \,1 \,1 \,0 \\ 0 \,0 \,0 \,0 \,0 \,0 \,2 \,1 \,0 \,0 \\ 0 \,0 \,0 \,0 \,0 \,0 \,0 \,1 \,1 \,8 \\ 0 \,0 \,0 \,0 \,0 \,0 \,0 \,0 \,1 \,1 \\ 0 \,0 \,0 \,0 \,0 \,0 \,0 \,0 \,0 \,1 \end{bmatrix} \tag{A2}$$

**Ψ**1,2 = ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ 15 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 00 0 0 0 0 0 0 0 0 9 0 00 0 10000 0 0 0 13 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 4 13 10 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 12 0 0 0 19000 0 1 0 0 0 00 0 0 0 0 0 0 0 4 23 1 0 0 0 0 0 0 23 1 10 10 0 0 0 0 23 0 16 0 16 ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ (A3) **Ψ**2,2 = ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ 1 1 15 0 14 0 0 0 0 0 0 0 01 1 0 0 000 0 00 0 0 0 1 1 0 0 0 0 0 0 0 13 00 0 1 8 000 0 00 0 00 0 0 1 100 0 00 0 00 0 0 0 2 1 0 15 0 0 0 00 0 0 0 011 0 00 0 00 0 0 0 001 5 00 0 00 0 0 0 000 1 10 0 0 0 0 0 0 0 0 0 0 1 1 12 00 0 0 0 000 0 02 1 00 0 0 0 000 0 00 1 ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ (A4)

In the matrices, the numbers correspond to the functions with one and two arguments according to [46].

The mathematical expression for the found control function described by these matrices has the following form and parameters:

$$u\_{i+(j-1)2} = \begin{cases} u\_{i+(j-1)2}^+ \text{ if } \vec{u}\_{i+(j-1)2} > u\_{i+(j-1)2}^+ \\ u\_{i+(j-1)2}^- \text{ if } \vec{u}\_{i+(j-1)2} < u\_{i+(j-1)2}^- \\ \vec{u}\_{i+(j-1)2} \text{ otherwise} \end{cases} \tag{A5}$$

where *i* = 1, 2, *j* = 1, 2,

$$\mathfrak{a}\_{1+(j-1)2} = \text{sgn}(q\_3(\mathbf{x}\_{3+(j-1)3}^\* - \mathbf{x}\_{3+(j-1)3})) \exp(-|q\_3(\mathbf{x}\_{3+(j-1)3}^\* - \mathbf{x}\_{3+(j-1)3})|) + $$

$$a^{-1} + \sqrt[3]{a} + \text{sgn}(\mathbf{x}\_{3+(j-1)3}^\* - \mathbf{x}\_{3+(j-1)3}) + \mu(b), \tag{A6}$$

$$
\vec{u}\_{2+(j-1)2} = \vec{u}\_{1+(j-1)2} + \sin(\vec{u}\_{1+(j-1)2}) + \arctan(h) + \mu(b) + c - c^3,\tag{A7}
$$

$$a = \tanh(d) + \left(b + \sqrt[3]{\mathbf{x}\_{1+(j-1)3}^{\*} - \mathbf{x}\_{1+(j-1)3}}\right)^{3} + \cdots$$

$$c + \sin(q\_3(\mathbf{x}\_{3+(j-1)3}^{\*} - \mathbf{x}\_{3+(j-1)3})),$$

$$\begin{split} b &= \mathfrak{g} + \text{sgn}(\text{sgn}(\mathbf{x}\_{1+(j-1)3}^{\*} - \mathbf{x}\_{1+(j-1)3}) q\_{2}(\mathbf{x}\_{2+(j-1)3}^{\*} - \mathbf{x}\_{2+(j-1)3})) \times \\ & \exp(-|\text{sgn}(\mathbf{x}\_{1+(j-1)3}^{\*} - \mathbf{x}\_{1+(j-1)3}) q\_{2}(\mathbf{x}\_{2+(j-1)3}^{\*} - \mathbf{x}\_{2+(j-1)3})|) + \\ & \sin(\mathbf{x}\_{1+(j-1)3}^{\*} - \mathbf{x}\_{1+(j-1)3}) + \tanh(\mathfrak{g}) + \mathbf{x}\_{1+(j-1)3}^{\*} - \mathbf{x}\_{1+(j-1)3} \\ c &= \mathfrak{g} + \text{sgn}(\text{sgn}(\mathbf{x}\_{1+(j-1)3}^{\*} - \mathbf{x}\_{1+(j-1)3}) q\_{2}(\mathbf{x}\_{2+(j-1)3}^{\*} - \mathbf{x}\_{2+(j-1)3})) \times \\ & \exp(-|\text{sgn}(\mathbf{x}\_{1+(j-1)3}^{\*} - \mathbf{x}\_{1+(j-1)3}) q\_{2}(\mathbf{x}\_{2+(j-1)3}^{\*} - \mathbf{x}\_{2+(j-1)3})|) + \end{split}$$

sin(*x*∗ <sup>1</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*1+(*j*−1)3), *<sup>d</sup>* <sup>=</sup> *<sup>h</sup>* <sup>+</sup> *<sup>c</sup>* <sup>−</sup> *<sup>c</sup>*<sup>3</sup> <sup>+</sup> sgn(*q*1(*x*<sup>∗</sup> <sup>1</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*1+(*j*−1)3))+ arctan(*q*1) + *ϑ*(*x*<sup>∗</sup> <sup>3</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*3+(*j*−1)3), *g* = sgn(*x*∗ <sup>1</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*1+(*j*−1)3)*q*2(*x*<sup>∗</sup> <sup>2</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*2+(*j*−1)2)+ *q*3(*x*<sup>∗</sup> <sup>3</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*3+(*j*−1)3) + tanh(*q*1(*x*<sup>∗</sup> <sup>1</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*1+(*j*−1)3)), *h* = arctan(*q*1(*x*<sup>∗</sup> <sup>1</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*1+(*j*−1)3))+ sgn(*w*) |*w*| + *w* + *v* + 2sgn(*w* + tanh(*v*))+ 3 ' *w* + tanh(*v*) + <sup>3</sup> *<sup>x</sup>*<sup>∗</sup> <sup>1</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*1+(*j*−1)3<sup>+</sup> sgn(*x*∗ <sup>1</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*1+(*j*−1)3) |*x*∗ <sup>1</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*1+(*j*−1)3|<sup>+</sup> 3 *x*∗ <sup>1</sup> − *x*<sup>1</sup> + tanh(*v*), *w* = sgn(*x*∗ <sup>1</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*1+(*j*−1)3)+ sgn(*q*2(*x*<sup>∗</sup> <sup>2</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*2+(*j*−1)3))sgn(*x*<sup>∗</sup> <sup>1</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*1+(*j*−1)3)<sup>×</sup> tanh(*x*∗ <sup>1</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*1+(*j*−1)3), *v* = *q*3(*x*<sup>∗</sup> <sup>3</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*3+(*j*−1)3) + sgn(*x*<sup>∗</sup> <sup>1</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*1+(*j*−1)3)*q*2<sup>×</sup> (*x*∗ <sup>2</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*2+(*j*−1)3) + tanh(*x*<sup>∗</sup> <sup>1</sup>+(*j*−1)<sup>3</sup> <sup>−</sup> *<sup>x</sup>*1+(*j*−1)3), *<sup>μ</sup>*(*α*) = sgn(*α*) min{1, <sup>|</sup>*α*|}, tanh(*α*) = <sup>1</sup> <sup>−</sup> exp(−2*α*) 1 + exp(−2*α*) ,

*j* = 1, 2, *q*<sup>1</sup> = 14.7288, *q*<sup>2</sup> = 2.0271, *q*<sup>3</sup> = 4.0222.

#### **Appendix C. Plots for Mecanum Robot**

Figures A1–A6 demonstrate the plots of optimal values of the state-space vector components (black lines) and the corresponding values of the vector **x**∗ (red lines).

**Figure A1.** Plots of optimal *x*<sup>1</sup> and *x*<sup>1</sup> ∗.

**Figure A2.** Plots of optimal *x*<sup>2</sup> and *x*<sup>2</sup> ∗.

**Figure A3.** Plots of optimal *x*<sup>3</sup> and *x*<sup>3</sup> ∗.

**Figure A4.** Plots of optimal *x*<sup>4</sup> and *x*<sup>4</sup> ∗.

**Figure A5.** Plots of optimal *x*<sup>5</sup> and *x*<sup>5</sup> ∗.

**Figure A6.** Plots of optimal *x*<sup>6</sup> and *x*<sup>6</sup> ∗.

In Figures A7–A14, the plots of optimal values of the control vector components are presented.

**Figure A7.** Plot of optimal values of control component *u*1.

**Figure A8.** Plot of optimal values of control component *u*2.

**Figure A9.** Plot of optimal values of control component *u*3.

**Figure A10.** Plot of optimal values of control component *u*4.

**Figure A11.** Plot of optimal values of control component *u*5.

**Figure A12.** Plot of optimal values of control component *u*6.

**Figure A13.** Plot of optimal values of control component *u*7.

**Figure A14.** Plot of optimal values of control component *u*8.

#### **References**

