**3. The Problem of General Control Synthesis as Machine Learning Control**

Control theory also contains problems that call for machine learning. One of the main machine learning control problems is the search for a control function in the general control synthesis problem.

The general control synthesis problem was formulated in the middle of the last century by Boltyanskii [9], following his study of Pontryagin's maximum principle for the optimal control problem.

The problem is stated as follows.

The mathematical model of the control object is given in the form of a system of ordinary differential equations

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{u}),
\tag{10}
$$

where $\mathbf{x}$ is the state vector, $\mathbf{x} \in \mathbb{R}^n$; $\mathbf{u}$ is the control vector, $\mathbf{u} \in \mathrm{U} \subseteq \mathbb{R}^m$; $\mathrm{U}$ is a compact set; and $m \le n$.

The domain of initial conditions is given

$$
\mathrm{X}_0 \subseteq \mathbb{R}^n. \tag{11}
$$

The presence of a domain of initial conditions is the main feature of the general control synthesis problem. Boltyanskii originally took this domain to be the whole state space, $\mathrm{X}_0 = \mathbb{R}^n$, because he sought an analytical solution. Here the problem is to be solved numerically, so the domain $\mathrm{X}_0$ is a bounded set in the state space.

The terminal condition is given

$$\mathbf{x}(t_f) = \mathbf{x}^f \in \mathbb{R}^n,\tag{12}$$

where $t_f$ is the time, not specified in advance, at which the terminal state (12) is reached from an initial condition $\mathbf{x}^0 \in \mathrm{X}_0$.

The finishing time is bounded

$$t_f \le t^+,\tag{13}$$

where $t^+$ is a given positive value.

The phase constraints are given

$$\varphi_i(\mathbf{x}) \le 0, \ i = 1, \ldots, r. \tag{14}$$

The quality criterion is given

$$J = \int \cdots \int_{\mathrm{X}_0} \int_{t_0}^{t_f} f_0(\mathbf{x}(t, \mathbf{x}^0), \mathbf{u}(t))\, dt \to \min_{\mathbf{u} \in \mathrm{U}} \tag{15}$$

where the outer multiple integral is taken over the domain of initial conditions $\mathrm{X}_0$, and $\mathbf{x}(t, \mathbf{x}^0)$ is a particular solution of the differential Equation (10) with control $\mathbf{u}(t) \in \mathrm{U}$ from the initial condition $\mathbf{x}^0 \in \mathrm{X}_0$.

It is necessary to find a control function in the form

$$\mathbf{u} = \mathbf{h}(\mathbf{x}) \in \mathrm{U},\tag{16}$$

where $\mathbf{h}(\mathbf{x}) : \mathbb{R}^n \to \mathbb{R}^m$.

Substituting the control function (16) into the right-hand side of the differential Equation (10) gives the stationary (autonomous) system of differential equations

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{h}(\mathbf{x})),
\tag{17}
$$

which has no free control vector in its right-hand side.

The control function (16) must be such that any particular solution of system (17) starting from the domain of initial conditions (11) reaches the terminal condition (12), satisfies the phase constraints (14), and gives the optimal value of the quality criterion (15).
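To illustrate how the closed-loop system (17) arises from (10) and (16), the following is a minimal sketch in Python. The double-integrator model `f`, the saturated linear feedback `h`, the set $\mathrm{U} = [-1, 1]$, and the explicit Euler integrator are all assumptions chosen for illustration; none of them is taken from the problem statement above.

```python
from typing import Callable, List

Vector = List[float]


def closed_loop_rhs(f: Callable[[Vector, Vector], Vector],
                    h: Callable[[Vector], Vector]) -> Callable[[Vector], Vector]:
    """Build the right-hand side of system (17): x -> f(x, h(x))."""
    def rhs(x: Vector) -> Vector:
        return f(x, h(x))
    return rhs


def euler_trajectory(rhs: Callable[[Vector], Vector], x0: Vector,
                     dt: float, steps: int) -> List[Vector]:
    """Integrate x' = rhs(x) with explicit Euler steps and return all states."""
    xs = [list(x0)]
    for _ in range(steps):
        x = xs[-1]
        dx = rhs(x)
        xs.append([xi + dt * dxi for xi, dxi in zip(x, dx)])
    return xs


# Hypothetical control object: double integrator x1' = x2, x2' = u.
def f(x: Vector, u: Vector) -> Vector:
    return [x[1], u[0]]


# Hypothetical control function: linear feedback clipped to U = [-1, 1].
def h(x: Vector) -> Vector:
    u = -x[0] - 2.0 * x[1]
    return [max(-1.0, min(1.0, u))]


# The closed-loop system has no free control input: only x0 is supplied.
traj = euler_trajectory(closed_loop_rhs(f, h), [1.0, 0.0], dt=0.01, steps=2000)
print(traj[-1])
```

Once `h` is fixed, the trajectory depends only on the initial state, which is why a single control function can be evaluated over a whole set of initial conditions.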

Note that the control function (16) may have simple discontinuities, so in many cases analytical methods cannot be applied. Most analytical methods, such as integrator backstepping [10,11] and the analytical design of aggregated regulators [12,13], provide Lyapunov stability by means of a smooth nonlinear feedback control. The main drawback of all analytical methods of control synthesis is that they are tied to a specific form of the mathematical model of the control object. The control synthesis problem (10)–(17) considered here is complicated by the arbitrary form of the mathematical model of the control object and of the integrand of the quality criterion, as well as by the phase constraints and the wide class of admissible control functions, which may have simple discontinuities.

In the general case, the general control synthesis problem can be solved numerically by symbolic regression methods, as a machine learning control problem.

To apply numerical methods, the problem statement has to be reformulated. The domain of initial conditions is replaced by a finite set of initial state points

$$\tilde{\mathrm{X}}_0 = \{ \mathbf{x}^{0,1}, \dots, \mathbf{x}^{0,K} \}. \tag{18}$$
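The text does not prescribe how the points of the finite set (18) are chosen. One simple possibility, shown here only as an assumption, is a uniform grid over a box-shaped domain of initial conditions. The function name `grid_initial_states` and its parameters are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def grid_initial_states(lower: Sequence[float], upper: Sequence[float],
                        points_per_axis: int) -> List[List[float]]:
    """Return a uniform grid of initial states inside the box [lower, upper].

    Assumption: the domain of initial conditions is an axis-aligned box.
    """
    axes = []
    for lo, hi in zip(lower, upper):
        if points_per_axis == 1:
            # A single point per axis: use the midpoint of the interval.
            axes.append([0.5 * (lo + hi)])
        else:
            step = (hi - lo) / (points_per_axis - 1)
            axes.append([lo + k * step for k in range(points_per_axis)])
    # Cartesian product of the per-axis points gives K = points_per_axis ** n states.
    return [list(p) for p in product(*axes)]


# Example: K = 9 initial states in the square [-1, 1] x [-1, 1].
states = grid_initial_states([-1.0, -1.0], [1.0, 1.0], 3)
print(len(states))
```

The number of grid points grows exponentially with the state dimension $n$, so in practice a small hand-picked or randomly sampled set may be preferable.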

The terminal condition (12) and the phase constraints (14) are added to the quality criterion (15), and the integral over the domain of initial conditions is replaced by a sum over all initial state points

$$
\begin{aligned}
J_1 = \sum_{i=1}^{K} \Bigg( & a_1 \|\mathbf{x}^f - \mathbf{x}(t_{f,i}, \mathbf{x}^{0,i})\| + \int_0^{t_{f,i}} \Big( f_0(\mathbf{x}(t, \mathbf{x}^{0,i}), \mathbf{u}(t)) \\
& + \vartheta(\boldsymbol{\varphi}(\mathbf{x}(t, \mathbf{x}^{0,i})))\, p(\mathbf{x}(t, \mathbf{x}^{0,i})) \Big)\, dt \Bigg) \to \min_{\mathbf{u} \in \mathrm{U}},
\end{aligned}
\tag{19}
$$

where $a_1$ is a weight coefficient, $\vartheta(A)$ is the Heaviside step function

$$\vartheta(A) = \begin{cases} 1, & \text{if } A > 0, \\ 0, & \text{otherwise}, \end{cases} \tag{20}$$

$p(B)$ is a penalty function, and $t_{f,i}$ is the time at which the terminal state (12) is reached from the initial condition $\mathbf{x}^{0,i}$,

$$t_{f,i} = \begin{cases} t, & \text{if } t \le t^+ \text{ and } \|\mathbf{x}^f - \mathbf{x}(t, \mathbf{x}^{0,i})\| \le \varepsilon_0, \\ t^+, & \text{otherwise}, \end{cases} \quad i = 1, \dots, K, \tag{21}$$

where $\varepsilon_0$ is a small positive value that determines the accuracy of reaching the terminal state.
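As an illustration of how the criterion (19) with the stopping rule (21) might be evaluated for a candidate control function, here is a minimal Python sketch. Several choices are assumptions and not part of the problem statement: explicit Euler integration with step `dt`, a rectangle rule for the integral, treating the penalty as active when any $\varphi_i(\mathbf{x}) > 0$, and the scalar example system at the end. All function and parameter names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def heaviside(a: float) -> float:
    """Step function (20): 1 if a > 0, otherwise 0."""
    return 1.0 if a > 0 else 0.0


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean norm of a - b."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def criterion_j1(f, h, f0, phis, penalty, x_f, initial_states,
                 t_plus, eps0, a1, dt=1e-3):
    """Approximate J_1 from (19) for the candidate control function h."""
    n_steps = int(round(t_plus / dt))
    total = 0.0
    for x0 in initial_states:
        x = list(x0)
        integral = 0.0
        for _ in range(n_steps):
            # Rule (21): stop once the terminal state is reached within eps0;
            # if this never happens, the loop runs until t_plus.
            if distance(x_f, x) <= eps0:
                break
            u = h(x)
            # Assumption: the penalty is switched on if any phi_i(x) > 0.
            violated = max((heaviside(phi(x)) for phi in phis), default=0.0)
            integral += (f0(x, u) + violated * penalty(x)) * dt
            # Explicit Euler step of the closed-loop system (17).
            x = [xi + dt * dxi for xi, dxi in zip(x, f(x, u))]
        total += a1 * distance(x_f, x) + integral
    return total


# Hypothetical example: x' = u, U = [-1, 1], time-optimal integrand f0 = 1.
def f(x: Vector, u: Vector) -> Vector:
    return [u[0]]


def h(x: Vector) -> Vector:
    return [max(-1.0, min(1.0, -5.0 * x[0]))]


def f0(x: Vector, u: Vector) -> float:
    return 1.0


j1 = criterion_j1(f, h, f0, phis=[], penalty=lambda x: 0.0, x_f=[0.0],
                  initial_states=[[1.0], [-1.0]], t_plus=3.0, eps0=0.01, a1=1.0)
print(j1)
```

A symbolic regression method would call such a function repeatedly, once per candidate expression for $\mathbf{h}(\mathbf{x})$, and keep the candidates with the smallest value of $J_1$.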

Within the machine learning formulation, solving this synthesis problem by symbolic regression methods constitutes machine learning control.
