*Proceeding Paper* **Synthesis of a Feedback Controller by the Network Operator Method for a Mobile Robot Rosbot in Gazebo Environment †**

**Elizaveta Shmalko \*,‡ and Yuri Rumyantsev ‡**

Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 119333 Moscow, Russia; urock@fastsense.tech


**Abstract:** The article presents an approach based on machine learning with symbolic regression for the synthesis of a stabilization system for a mobile robot. The approach is universal: it solves the synthesis problem numerically in a general setting and requires no training sample, relying only on the value of the quality functional. The synthesis is carried out to stabilize the mobile robot Rosbot in the Gazebo simulation environment. The feedback stabilization system is obtained by the network operator method. The advantage of the method is that it can be applied to a control object of any complexity, whether linear or nonlinear.

**Keywords:** control synthesis; stabilization; machine learning; symbolic regression; Gazebo mobile robot

## **1. Introduction**

Synthesis of a feedback stabilization system is one of the key tasks in applied robotics. Feedback is needed to compensate for the differences between the model and the real object, as well as for other possible uncertainties and noise. From a mathematical point of view, this problem belongs to the class of optimal control synthesis problems, in which it is necessary to find a control vector function that depends on the state vector of the object and minimizes the quality functional (see Figure 1).

**Figure 1.** The problem of synthesis of the stabilization system.

There are two main strategies for the synthesis of a control system: parametric and structural-parametric.

Parametric synthesis includes all methods in which the control structure is specified in advance and only its parameters are tuned to optimize the functional. This synthesis strategy is by far the most common. It includes the widely used PID controllers [1,2], controllers based on neural networks [3,4], fuzzy logic [5,6], etc. All these approaches require some preliminary knowledge of the object from the developer in order to set the controller structure as correctly as possible; nevertheless, there is still no reason to consider the chosen structure optimal, as only the parameters are adjusted according to the optimality criterion.

**Citation:** Shmalko, E.; Rumyantsev, Y. Synthesis of a Feedback Controller by the Network Operator Method for a Mobile Robot Rosbot in Gazebo Environment. *Eng. Proc.* **2023**, *33*, 6. https://doi.org/10.3390/engproc2023033006

Academic Editors: Askhat Diveev, Ivan Zelinka, Arutun Avetisyan and Alexander Ilin

Published: 16 May 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

With the structural-parametric approach, not only are the parameters optimized, but the optimal structure of the feedback control function is also sought. Among the analytical methods for synthesizing a stabilization system, the most popular are those based on solving the Riccati equation [7], which apply to linear systems only. More modern analytical approaches, such as backstepping [8] and the analytical design of aggregated controllers [9,10], also depend on the form of the right-hand sides of the nonlinear differential equations that describe the control object model. For nonlinear systems, this challenging task can be addressed by dynamic programming (DP) based on the Hamilton–Jacobi–Bellman (HJB) equation; however, this is difficult in practice because of the well-known curse of dimensionality. Recently, techniques such as adaptive DP (ADP) [11–14] and reinforcement learning (RL) [15–17], which use neural networks to approximate solutions of the HJB equation numerically, have received increasing attention as powerful machine learning and optimization strategies for control problems of nonlinear systems. However, these approaches also involve many computational difficulties, primarily related to the definition and training of the neural networks used.

Thus, the application of machine learning methods opens up broad prospects, but it is necessary to develop novel control methods addressing general mathematical statements of the control synthesis problem in order to satisfy the requirement of optimal performance in control synthesis tasks. This motivates our research.

In this paper, we apply symbolic regression methods to solve the problem of synthesis of a stabilization system. These methods also belong to the class of machine learning methods, but unlike neural networks, they allow us to search not only for the parameters but also for the optimal structure of the control function. These methods use evolutionary optimization algorithms for the structural-parametric search for the control function, guided directly by the value of the quality functional.

A wheeled mobile robot is considered as a control object.

The paper develops an applied software implementation of the robot control system. It is built on the Robot Operating System (ROS), the most popular robotics framework today. ROS provides the ability to work with all aspects of the control system, including hardware abstraction, low-level control, message passing between processes, and package management. The developed software was tested in the Gazebo simulation environment, which is one of the most popular robotic simulators owing to its compatibility with ROS. Gazebo is a 3D simulator that aims to model a robot so that its simulated behavior is a close substitute for its behavior in a real physical environment. Gazebo provides a fairly reliable simulation of physics, takes into account the influence of forces, and has a large number of plug-ins for simulating sensors such as lidars or cameras. For these reasons, most developers of robotic control systems around the world use the ROS/Gazebo bundle as a standard for testing control algorithms [18,19].

In this work, we used the ready-made Gazebo model ROSbot 2.0 integrated into ROS [20]. As a position source, a plugin was used that gives the true coordinates of the robot. The robot model in the simulator is shown in Figure 2.

For the selected object, the problem of synthesizing the stabilization system was successfully solved by machine learning based on symbolic regression via the network operator method [21].

## **2. Problem Statement of the Stabilization System Synthesis**

The main goal of introducing the stabilization system is to provide a stability property for the object in some domain $\mathrm{X}_0 \subseteq \mathbb{R}^n$.

Suppose a mathematical model of the control object is given. This model can be derived from physical laws or identified by some machine learning technique [22]. Generally, this model is described by a system of ordinary differential equations with a free control vector on the right-hand side

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{u}),
\tag{1}
$$

where the state of the object is described by $\mathbf{x} \in \mathbb{R}^n$, and the control by $\mathbf{u} \in \mathrm{U} \subseteq \mathbb{R}^m$; $\mathrm{U}$ is a compact set, $m \le n$,

$$\begin{array}{rcl} \mathbf{x} & = & [x_1 \dots x_n]^T, \\ \mathbf{u} & = & [u_1 \dots u_m]^T, \; m \le n, \\ \mathbf{f}(\mathbf{x}, \mathbf{u}) & = & [f_1(\mathbf{x}, \mathbf{u}) \dots f_n(\mathbf{x}, \mathbf{u})]^T. \end{array} \tag{2}$$

A domain of initial conditions is given

$$
\mathrm{X}_0 \subseteq \mathbb{R}^n. \tag{3}
$$

It is necessary to find a control function in the form

$$\mathbf{u} = \mathbf{h}(\mathbf{x}^* - \mathbf{x}),\tag{4}$$

where $\mathbf{x}^*$ is a fixed point in the state space, which becomes an equilibrium point of the differential equation

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{h}(\mathbf{x}^* - \mathbf{x})),
\tag{5}
$$

where the control function $\mathbf{h}(\mathbf{x}^*, \mathbf{x}) = [h_1(\mathbf{x}^* - \mathbf{x}) \dots h_m(\mathbf{x}^* - \mathbf{x})]^T : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^m$ has the following properties:

$$\begin{array}{c} \mathbf{h}(\mathbf{x}^*, \mathbf{x}) \in \mathrm{U} \subseteq \mathbb{R}^m, \quad \forall\, \mathbf{x}^* \in \mathrm{X}_0 \;\; \exists\, \tilde{\mathbf{x}}(\mathbf{x}^*) \quad \text{such that} \\ \mathbf{f}(\tilde{\mathbf{x}}(\mathbf{x}^*), \mathbf{h}(\mathbf{x}^*, \tilde{\mathbf{x}}(\mathbf{x}^*))) = \mathbf{0}, \\ \det(\mathbf{A} - \lambda \mathbf{E}) = (-1)^n (\lambda - \lambda_1) \cdots (\lambda - \lambda_n) = \prod_{j=1}^{n} (\lambda - \lambda_j) = 0, \\ \lambda_j = \alpha_j + i\beta_j, \ j = 1, \ldots, n, \\ \alpha_j < 0, \ j = 1, \ldots, n, \ i = \sqrt{-1}, \\ \mathbf{A} = \dfrac{\partial \mathbf{f}(\tilde{\mathbf{x}}(\mathbf{x}^*), \mathbf{h}(\mathbf{x}^*, \tilde{\mathbf{x}}(\mathbf{x}^*)))}{\partial \mathbf{x}}, \\ \mathbf{E} = \operatorname{diag}(\underbrace{1, \ldots, 1}_{n}). \end{array} \tag{6}$$

Properties (6) indicate that, for every $\mathbf{x}^* \in \mathrm{X}_0$, the system $\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{h}(\mathbf{x}^*, \mathbf{x}))$ always has a stable equilibrium point $\tilde{\mathbf{x}}(\mathbf{x}^*) \in \mathbb{R}^n$. Additionally, the equilibrium point possesses attractor properties, since all solutions that start near this point converge to it.
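The eigenvalue condition in (6) can be checked numerically. The following minimal Python sketch assumes a hypothetical second-order example (a double integrator with a linear feedback law, gains chosen arbitrarily) that is not taken from this paper; it approximates the matrix $\mathbf{A}$ by central differences at the equilibrium point and tests whether all eigenvalues have negative real parts.

```python
import cmath

def closed_loop(x, x_star, k=(4.0, 3.0)):
    """Hypothetical closed-loop system: x1' = x2, x2' = u, u = h(x* - x)."""
    u = k[0] * (x_star[0] - x[0]) + k[1] * (x_star[1] - x[1])
    return [x[1], u]

def jacobian(f, x, x_star, eps=1e-6):
    """Central-difference approximation of A = d f(x, h(x*, x)) / d x."""
    n = len(x)
    columns = []
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus, x_star), f(x_minus, x_star)
        columns.append([(f_plus[i] - f_minus[i]) / (2 * eps) for i in range(n)])
    # columns[j][i] holds d f_i / d x_j; return the matrix in row-major form
    return [[columns[j][i] for j in range(n)] for i in range(n)]

def eigenvalues_2x2(a):
    """Roots of the characteristic polynomial of a 2x2 matrix."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + disc) / 2, (trace - disc) / 2)

x_star = [1.0, 0.0]
x_eq = [1.0, 0.0]  # equilibrium: closed_loop(x_eq, x_star) is the zero vector
A = jacobian(closed_loop, x_eq, x_star)
lams = eigenvalues_2x2(A)
is_stable = all(lam.real < 0 for lam in lams)
print(lams, is_stable)
```

For a controller found by symbolic regression, the same check would apply with `closed_loop` replaced by the synthesized right-hand side, although the analytic structure of that function is generally not known in advance.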

Computationally, to provide a stability property to the equilibrium point $\tilde{\mathbf{x}}$, the synthesis problem (1)–(4) is solved with the terminal point $\mathbf{x}^f = \tilde{\mathbf{x}}$, the initial domain $\mathrm{X}_0 \subset \mathrm{X}$, and the quality criterion

$$J = \max\{t_{f,1}, \dots, t_{f,K}\} + a_1 \sum_{i=1}^{K} \Delta_{f,i} \to \min,\tag{7}$$

where $a_1$ is a weight coefficient,

$$\Delta_{f,i} = \left\| \mathbf{x}^f - \mathbf{x}(t_{f,i}, \mathbf{x}^{0,i}) \right\|, \tag{8}$$

$t_{f,i}$ is the time at which the terminal position is reached from the initial condition $\mathbf{x}^{0,i}$ of the set of initial conditions $\mathrm{X}_0 = \{\mathbf{x}^{0,1}, \dots, \mathbf{x}^{0,K}\}$, $i \in \{1, \dots, K\}$,

$$t_{f,i} = \begin{cases} t, & \text{if } t < t^+ \text{ and } \Delta_{f,i} \le \varepsilon, \\ t^+, & \text{otherwise}, \end{cases} \tag{9}$$

$t^+$ and $\varepsilon$ are given positive values, and $\mathbf{x}(t, \mathbf{x}^{0,i})$ is a particular solution of the system

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{h}(\mathbf{x}^f - \mathbf{x})),
\tag{10}
$$

for initial conditions $\mathbf{x}(t_0) = \mathbf{x}^{0,i}$, $i \in \{1, \dots, K\}$,

$$\left\|\mathbf{x}^{f} - \mathbf{x}\right\| = \sqrt{\sum_{i=1}^{n} (x_{i}^{f} - x_{i})^{2}}.\tag{11}$$
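The criterion (7)–(11) can be evaluated by direct simulation. The sketch below is a minimal illustration under stated assumptions: explicit Euler integration with a fixed step `dt`, and a hypothetical scalar test system $\dot{x} = u$ with the feedback $h(e) = 2e$. The function names, step size and test system are not taken from the paper.

```python
import math

def norm_diff(xf, x):
    """Euclidean distance between the terminal point and the state, as in (11)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xf, x)))

def simulate_one(f, h, x0, xf, t_plus, eps, dt=0.01):
    """Integrate x' = f(x, h(xf - x)) by explicit Euler (an assumption of this
    sketch) until the distance to xf drops to eps or the limit t_plus is hit.
    Returns the pair (t_f, Delta_f) defined in (9) and (8)."""
    x, t = list(x0), 0.0
    while t < t_plus and norm_diff(xf, x) > eps:
        u = h([a - b for a, b in zip(xf, x)])
        dx = f(x, u)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        t += dt
    return min(t, t_plus), norm_diff(xf, x)

def quality(f, h, initial_states, xf, t_plus=5.0, eps=0.01, a1=1.0):
    """Criterion (7): worst reaching time plus weighted sum of terminal errors."""
    results = [simulate_one(f, h, x0, xf, t_plus, eps) for x0 in initial_states]
    return max(t for t, _ in results) + a1 * sum(d for _, d in results)

# Hypothetical scalar test system x' = u with two candidate feedback laws.
f = lambda x, u: [u[0]]
h_good = lambda e: [2.0 * e[0]]   # drives the error to zero
h_zero = lambda e: [0.0]          # does nothing, so t_f = t_plus
states = [[1.0], [-1.0]]
J_good = quality(f, h_good, states, [0.0])
J_bad = quality(f, h_zero, states, [0.0])
print(J_good, J_bad)
```

A candidate that never reaches the terminal point is penalized twice, through $t^+$ in the first term and through the remaining distance in the second, which is what lets an evolutionary search rank candidates by $J$ alone.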

Since we solve the stabilization system synthesis problem by machine learning, we need machine confirmation that the desired properties have been achieved. Let us introduce the following definition of a machine criterion for a system of differential equations to possess some property. To establish the property for the whole system (1), it is enough to find a sufficient number $K$ of particular solutions that possess this property.

**Definition 1.** *If $D$ experiments are carried out, and in every experiment $i$, $K_i$ particular solutions of the differential equation exhibit the required property from any $M_i \ge K_i$ randomly selected initial conditions from the initial domain, and*

$$\lim_{D \to \infty} \sum_{i=1}^{D} \frac{K_i}{M_i} \to 1,\tag{12}$$

*then the existence of this property for the differential equation in this domain is proven by a machine.*

In other words, as the number of experiments increases, the probability of a "bad" event, in which the system does not have the desired property, tends to zero. From a mathematical point of view, this means that all particular solutions for a domain of initial conditions have this property, except for solutions starting from a subset of measure zero.
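A machine check in the spirit of Definition 1 can be sketched as a repeated random experiment. The code below is only an illustration: the property being tested (a hypothetical scalar system $\dot{x} = -2x$ reaching $|x| \le \varepsilon$ before $t^+$), the sampling interval and the averaging of the ratios $K_i/M_i$ over the experiments are assumptions of this sketch, the last one being just one possible reading of (12).

```python
import random

def reaches_target(x0, t_plus=5.0, eps=0.01, dt=0.01):
    """Hypothetical property: does x' = -2x reach |x| <= eps before t_plus?"""
    x, t = x0, 0.0
    while t < t_plus:
        if abs(x) <= eps:
            return True
        x += dt * (-2.0 * x)  # explicit Euler step
        t += dt
    return False

def machine_check(has_property, experiments=20, trials=50, seed=1):
    """Run D = experiments experiments with M_i = trials random initial
    conditions each and return the mean of the ratios K_i / M_i."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(experiments):
        samples = [rng.uniform(-2.5, 2.5) for _ in range(trials)]
        k = sum(1 for x0 in samples if has_property(x0))
        ratios.append(k / trials)
    return sum(ratios) / len(ratios)

ratio = machine_check(reaches_target)
print(ratio)
```

A mean ratio that stays at 1 as the number of experiments grows supports the property in the sampled domain, while any value below 1 points to initial conditions from which the property fails.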

Based on the proposed formulation of the problem statement, let us consider in the next section a solution of the stabilization system synthesis problem via a machine learning approach of symbolic regression for a mobile robot in a Gazebo simulation environment.

## **3. Stabilization System Synthesis for ROSbot in Gazebo**

We consider the ROSbot 2.0 virtual robot implemented in the Gazebo physical simulation environment. The robot is a platform on four wheels that do not rotate about the vertical axis. An electric motor is attached to each wheel. A differential control scheme is used: the robot moves forward and backward when the same voltage is applied to all four electric motors, and it turns right or left when a higher voltage is supplied to the motors of the left or right wheels, respectively.

The robot motion model is described by the following system of differential equations [23]

$$\begin{array}{rcl} \dot{x}_{1} &=& 0.5(u_{1} + u_{2})\cos(x_{3}),\\ \dot{x}_{2} &=& 0.5(u_{1} + u_{2})\sin(x_{3}),\\ \dot{x}_{3} &=& 0.5(u_{1} - u_{2}),\end{array} \tag{13}$$

where $\mathbf{x} = [x_1 \; x_2 \; x_3]^T$ is the state vector and $\mathbf{u} = [u_1 \; u_2]^T$ is the control vector.

Physical control of the robot in Gazebo is implemented using two signals: $u^v$, the desired linear velocity, and $u^\omega$, the desired angular velocity. As we are aiming to stabilize the system, it can be assumed that the control signals are transmitted to the system directly, $v = u^v$, $\omega = u^\omega$, where $v = 0.5(u_1 + u_2)$ and $\omega = 0.5(u_1 - u_2)$ are the linear and angular velocities in (13). In this case, Equations (13) are converted to the following:

$$\begin{cases} \dot{x}_1 = u^v \cos(x_3), \\ \dot{x}_2 = u^v \sin(x_3), \\ \dot{x}_3 = u^\omega. \end{cases} \tag{14}$$
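Model (14) is straightforward to integrate numerically. The following minimal sketch assumes explicit Euler integration with a hypothetical step of 0.01; the helper `wheel_to_body` expresses the relation between the controls of (13) and the signals of (14) that follows from comparing the two systems. All function names are illustrative.

```python
import math

def wheel_to_body(u1, u2):
    """Map the controls of (13) to the linear and angular signals of (14)."""
    return 0.5 * (u1 + u2), 0.5 * (u1 - u2)

def rosbot_rhs(x, u):
    """Right-hand side of (14): x = [x1, x2, x3], u = [u_v, u_omega]."""
    u_v, u_w = u
    return [u_v * math.cos(x[2]), u_v * math.sin(x[2]), u_w]

def euler_step(x, u, dt):
    """One explicit Euler step of model (14)."""
    return [xi + dt * dxi for xi, dxi in zip(x, rosbot_rhs(x, u))]

# Drive straight along the x1 axis with unit linear velocity for 100 steps.
x = [0.0, 0.0, 0.0]
for _ in range(100):
    x = euler_step(x, [1.0, 0.0], 0.01)
print(x)
```

In the simulator the same signals are applied to the full physical model of the robot, so (14) serves as the design model for synthesis rather than as an exact description of the motion.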

The stabilization system is synthesized in advance and is then programmed into an onboard computer. For the solution, a machine learning approach based on symbolic regression was chosen. Symbolic regression makes it possible to search for a solution without training data, directly from the formal problem statement, with the search guided by minimization of the functional. Moreover, this approach is universal and is equally applicable to models of any kind, including nonlinear models or models in the form of neural networks.

The network operator method [21] was used in the calculations. An advantage of this symbolic regression method is that it uses the principle of variation of the basic solution, which significantly speeds up the search for a near-optimal solution. The method encodes possible solutions as square upper triangular matrices. The resulting stabilization system is placed in the onboard computer in the form of such a matrix, which describes the sequence of calculations of the control function.

In the calculations, the following parameters were set.

The control values were constrained as $-10 \le u_i \le 10$, $i = 1, 2$.

The initial domain was defined by 26 elements:

$$\begin{array}{rl} \bar{\mathrm{X}}_0 = \{ & [-2.5 \;\; {-2.5} \;\; {-5\pi/12}]^T,\; [-2.5 \;\; {-2.5} \;\; 0]^T,\; [-2.5 \;\; {-2.5} \;\; 5\pi/12]^T, \\ & [-2.5 \;\; 0 \;\; {-5\pi/12}]^T,\; [-2.5 \;\; 0 \;\; 0]^T,\; [-2.5 \;\; 0 \;\; 5\pi/12]^T, \\ & [-2.5 \;\; 2.5 \;\; {-5\pi/12}]^T,\; [-2.5 \;\; 2.5 \;\; 0]^T,\; [-2.5 \;\; 2.5 \;\; 5\pi/12]^T, \\ & [0 \;\; {-2.5} \;\; {-5\pi/12}]^T,\; [0 \;\; {-2.5} \;\; 0]^T,\; [0 \;\; {-2.5} \;\; 5\pi/12]^T, \\ & [0 \;\; 0 \;\; {-5\pi/12}]^T,\; [0 \;\; 0 \;\; 5\pi/12]^T, \\ & [0 \;\; 2.5 \;\; {-5\pi/12}]^T,\; [0 \;\; 2.5 \;\; 0]^T,\; [0 \;\; 2.5 \;\; 5\pi/12]^T, \\ & [2.5 \;\; {-2.5} \;\; {-5\pi/12}]^T,\; [2.5 \;\; {-2.5} \;\; 0]^T,\; [2.5 \;\; {-2.5} \;\; 5\pi/12]^T, \\ & [2.5 \;\; 0 \;\; {-5\pi/12}]^T,\; [2.5 \;\; 0 \;\; 0]^T,\; [2.5 \;\; 0 \;\; 5\pi/12]^T, \\ & [2.5 \;\; 2.5 \;\; {-5\pi/12}]^T,\; [2.5 \;\; 2.5 \;\; 0]^T,\; [2.5 \;\; 2.5 \;\; 5\pi/12]^T \}. \end{array} \tag{15}$$
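The set (15) is a regular $3 \times 3 \times 3$ grid over the two position coordinates and the heading, with the point $[0 \; 0 \; 0]^T$ excluded, which gives 26 elements. A short sketch that generates it in the same order is given below; the variable names are illustrative.

```python
import itertools
import math

positions = (-2.5, 0.0, 2.5)
headings = (-5 * math.pi / 12, 0.0, 5 * math.pi / 12)

# Cartesian product of the grid values, skipping the origin
initial_states = [
    [x1, x2, x3]
    for x1, x2, x3 in itertools.product(positions, positions, headings)
    if (x1, x2, x3) != (0.0, 0.0, 0.0)
]
print(len(initial_states))
```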

The stabilization point was chosen as

$$\mathbf{x}^* = [x_1^* \; x_2^* \; x_3^*]^T = [0 \; 0 \; 0]^T. \tag{16}$$

It is necessary to find a control function in the form

$$u_i = h_i(x_1^* - x_1, x_2^* - x_2, x_3^* - x_3, r_1, r_2, r_3), \tag{17}$$

where $r_1, r_2, r_3$ are constant parameters, $i = 1, 2$, such that the robot reaches the stabilization point (16) from all 26 initial conditions (15) in minimal time and with the highest accuracy.

A time-consuming computational experiment was carried out, and as a result of the synthesis, the network operator method found a solution in the form of a network operator matrix. The dimension of the matrix in this experiment was $L \times L$, where $L = 24$.

This matrix encodes a rather sophisticated symbolic formula. Each non-zero element represents either a unary operation (such as $\sin(x)$, $e^x$, ...) or a binary operation (such as $x + y$, $x \times y$). To calculate control values, the matrix has to be decoded on the onboard computer in real time. The computational complexity of this process is bounded by $O(L^2)$, but the actual cost is often considerably lower because many matrix elements are zero.
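The decoding step can be illustrated on a toy matrix. The sketch below assumes one common way of reading a network operator matrix: diagonal entries select binary operations, off-diagonal entries select unary operations applied to the source node of an edge, and the last node is the output. The operation tables, the numbering of operations and the choice of output node are hypothetical and are not the ones used in this paper.

```python
import math

# Hypothetical operation tables; the actual sets used in the paper are not listed.
UNARY = {1: lambda z: z, 2: math.sin, 3: lambda z: -z}
# Each binary operation is stored together with its unit element.
BINARY = {1: (lambda a, b: a + b, 0.0), 2: (lambda a, b: a * b, 1.0)}

def decode(psi, inputs):
    """Evaluate the expression encoded by the upper triangular matrix psi.

    Assumed convention: the first len(inputs) nodes are sources (variables and
    parameters); for every other node j, psi[j][j] selects its binary operation,
    and a non-zero psi[i][j] with i < j selects the unary operation applied to
    node i before the result is folded into node j.
    """
    size = len(psi)
    n_src = len(inputs)
    # Non-source nodes start from the unit element of their binary operation.
    z = list(inputs) + [BINARY[psi[j][j]][1] for j in range(n_src, size)]
    for i in range(size - 1):            # the double loop gives the O(L^2) bound
        for j in range(max(i + 1, n_src), size):
            if psi[i][j] != 0:           # zero entries cost only this test
                op, _ = BINARY[psi[j][j]]
                z[j] = op(z[j], UNARY[psi[i][j]](z[i]))
    return z[-1]

# Toy 3x3 matrix encoding x1 + sin(x2).
psi = [
    [0, 0, 1],   # node 0 (x1) enters node 2 through the identity
    [0, 0, 2],   # node 1 (x2) enters node 2 through sin
    [0, 0, 1],   # node 2 accumulates its arguments by addition
]
value = decode(psi, [0.5, math.pi / 2])
print(value)
```

Because nodes are visited in increasing index order and edges only point from lower to higher indices, each node is complete before it is used as an argument, which is why a single pass over the upper triangle suffices.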

## **4. Verification in Gazebo**

For real-time onboard computation, the network operator decode function was implemented in C++ [24] and later incorporated into the ROS node Rosbot controller. The Rosbot controller node accepted the robot ground truth coordinates and the movement goal as inputs and generated the control signals $u^v$, $u^\omega$ every 100 ms as outputs.
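One iteration of such a controller node can be sketched without any ROS dependencies. The code below is a simplified illustration, not the C++ node from [24]: `placeholder_controller` is a hypothetical stand-in for the decoded network operator, and the saturation reuses the bound $-10 \le u_i \le 10$ from the synthesis as an assumption about how commands would be limited.

```python
U_MAX = 10.0   # control bound used in the synthesis
PERIOD = 0.1   # the node issues a command every 100 ms

def saturate(value, limit=U_MAX):
    """Clip a control value to the interval [-limit, limit]."""
    return max(-limit, min(limit, value))

def placeholder_controller(error):
    """Hypothetical stand-in for the decoded network operator."""
    return 100.0 * error[0], 2.0 * error[2]

def control_step(state, goal, controller):
    """Compute the state error x* - x and return saturated (u_v, u_omega)."""
    error = [g - s for g, s in zip(goal, state)]
    u_v, u_w = controller(error)
    return saturate(u_v), saturate(u_w)

command = control_step([0.0, 0.0, 0.0], [1.0, 0.0, 0.5], placeholder_controller)
print(command)
```

In a ROS node, `control_step` would be called from a loop running with period `PERIOD`, with `state` taken from the ground truth plugin and `command` published as the velocity command.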

The robot was directed to reach several goals. After reaching its first goal, the robot started heading to the next and so on, as shown in Figure 3.

**Figure 3.** Rosbot trajectory on XY plane under network operator control.

In general we noticed that the robot's movements were stable and predictable.

## **5. Discussion**

In this paper, a Rosbot controller synthesized by a computer algorithm using the network operator method has been verified to work in a Gazebo simulated environment. The synthesis algorithm required only a mathematical model of the robot as an input. The mathematical structure of the control function has been found automatically without any human input. The synthesized function has been verified to be able to control Rosbot in a stable and predictable way.

The presented numerical approach to the synthesis of a stabilization system, demonstrated here on a robot in the Gazebo simulator, is a universal machine learning approach for the synthesis of control systems and opens up broad prospects for its use in various technical problems. The main advantage of the approach is that it is not tied to the type of control object model and finds the feedback control function automatically using the symbolic regression algorithm.

**Author Contributions:** Conceptualization, E.S.; methodology, E.S.; software, Y.R.; validation, Y.R.; formal analysis, E.S. and Y.R.; investigation, Y.R.; data curation, Y.R.; writing—original draft preparation, E.S.; writing—review and editing, E.S.; visualization, Y.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


