#### *3.1. Control Synthesis as Unsupervised Machine Learning Control*

The first approach is a direct search for the control function by minimizing a quality criterion. This yields unsupervised machine learning control. Within this concept, the general control synthesis problem (10), (12), (18)–(21) can be solved by various symbolic regression methods. The approach has been demonstrated with genetic programming [2], the network operator method [14], variational genetic programming [15], variational analytic programming [16], the multi-layer network operator [17], binary variational genetic programming [18], and modified Cartesian genetic programming [19]. All of these methods search for a mathematical expression of the control function such that the resulting solutions reach the terminal condition (12) from all initial conditions (18) with the optimal value of the quality criterion (19), which describes the time and accuracy of reaching the terminal state and includes the phase constraints in the form of penalty functions.
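The idea of scoring candidate control expressions directly by a quality criterion can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the scalar object `x' = u` with `|u| <= 1` stands in for model (10), `closed_loop_cost` is a simplified time-plus-terminal-miss cost rather than criterion (19), and a hand-written list of three candidate expressions replaces the evolutionary search performed by a symbolic regression method.

```python
def closed_loop_cost(h, x0, x_f=0.0, t_plus=4.0, eps0=0.01, dt=0.01, a1=1.0):
    """Toy cost: time to reach x_f plus weighted terminal miss.

    Hypothetical object x' = u with |u| <= 1, integrated by explicit Euler.
    """
    x = x0
    steps = int(round(t_plus / dt))
    for k in range(steps):
        if abs(x_f - x) <= eps0:           # terminal condition reached
            return k * dt + a1 * abs(x_f - x)
        u = max(-1.0, min(1.0, h(x)))      # control constraint |u| <= 1
        x += u * dt
    return t_plus + a1 * abs(x_f - x)      # time limit t_plus exhausted


def total_cost(h, initial_states):
    """Sum the toy cost over the whole set of initial conditions."""
    return sum(closed_loop_cost(h, x0) for x0 in initial_states)


# Hand-written candidate expressions; a symbolic regression method
# would generate and evolve such expressions automatically.
candidates = {
    "-x": lambda x: -x,
    "-5*x": lambda x: -5.0 * x,
    "-sign(x)": lambda x: -1.0 if x > 0 else (1.0 if x < 0 else 0.0),
}

initial_states = [-2.0, -1.0, 1.0, 2.0]
costs = {name: total_cost(h, initial_states) for name, h in candidates.items()}
best = min(costs, key=costs.get)
```

In this toy setting the relay-type expression `-sign(x)` receives the lowest total cost, since it moves towards the terminal state at full speed from every initial condition. No training data is involved: the only feedback to the search is the value of the criterion.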

Symbolic regression methods use evolutionary algorithms to search for functions and can reach a certain level of accuracy when minimizing the functional, but it remains unknown how far the values of criterion (19) for these solutions are from the true optimal values. This can be addressed by supervised machine learning with a training set obtained by solving the optimal control problem.

#### *3.2. Control Synthesis as Supervised Machine Learning Control*

The second approach is learning from a training set, that is, supervised machine learning control. In this case the training set must be obtained first. Solutions of the optimal control problem can be used for this purpose.

The statement of the optimal control problem includes the mathematical model of the control object (10), an initial condition given at a single point

$$\mathbf{x}^{0} \in \mathbb{R}^{n},$$
 

the terminal condition (12), (13), the phase constraints (14), and a quality criterion

$$J\_2 = a\_1 \|\mathbf{x}^f - \mathbf{x}(t\_f)\| + \int\_0^{t\_f} (f\_0(\mathbf{x}(t), \mathbf{u}(t)) + \theta(\varphi(\mathbf{x}(t))p(\mathbf{x}(t))))dt \to \min\_{\mathbf{u} \in \mathcal{U}},\tag{23}$$

where

$$t\_f = \begin{cases} t, & \text{if } t \le t^+ \text{ and } \|\mathbf{x}^f - \mathbf{x}(t)\| \le \varepsilon\_0, \\ t^+, & \text{otherwise.} \end{cases} \tag{24}$$

It is necessary to find a control in the form

$$\mathbf{u} = \mathbf{v}(t, \mathbf{x}^0) \in \mathbb{U}. \tag{25}$$

Substituting the function (25) into the right-hand side of the mathematical model of the control object (10) gives the following non-stationary system of differential equations

$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{v}(t, \mathbf{x}^{0,i})). \tag{26}$$
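To make the roles of (23), (24) and (26) concrete, the following Python sketch evaluates the criterion along a numerically integrated trajectory for a given program control. It is only an illustration under stated assumptions: the state is scalar, so the norm becomes an absolute value; integration is explicit Euler with a fixed step `dt`; the integrand `f0` defaults to 1 (a time-type cost); and the hypothetical function `penalty` lumps the whole phase-constraint term of (23) into a single callable.

```python
def simulate_j2(f, v, x0, x_f, t_plus, eps0, a1=1.0, dt=0.01,
                f0=lambda x, u: 1.0, penalty=lambda x: 0.0):
    """Roll out x' = f(x, v(t)) and return (J_2, t_f) as in (23), (24).

    Assumptions: scalar state, explicit Euler, `penalty` stands for the
    complete phase-constraint term of the integrand.
    """
    x, integral = x0, 0.0
    n_max = int(round(t_plus / dt))
    for k in range(n_max):
        t = k * dt
        if abs(x_f - x) <= eps0:
            # First case of (24): terminal neighbourhood reached in time.
            return a1 * abs(x_f - x) + integral, t
        u = v(t)
        integral += (f0(x, u) + penalty(x)) * dt
        x += f(x, u) * dt
    # Second case of (24): the time limit t_plus is used as t_f.
    return a1 * abs(x_f - x) + integral, t_plus


# Hypothetical example: integrator x' = u driven from 0 towards x_f = 1.
j_reach, tf_reach = simulate_j2(lambda x, u: u, lambda t: 1.0,
                                x0=0.0, x_f=1.0, t_plus=5.0, eps0=0.01)
# The same object with zero control never reaches the terminal state.
j_miss, tf_miss = simulate_j2(lambda x, u: u, lambda t: 0.0,
                              x0=0.0, x_f=1.0, t_plus=5.0, eps0=0.01)
```

With the constant control `u = 1` the terminal neighbourhood is reached after roughly one time unit, so `t_f` is the hitting time; with `u = 0` it is never reached, `t_f` equals `t_plus`, and the terminal miss enters the criterion in full.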

To create a training set for the control synthesis problem, the optimal control problem with criterion (23) must be solved for each initial condition from (18), yielding the set of optimal controls

$$\mathbf{U}\_0 = \{ \mathbf{v}(t, \mathbf{x}^{0,1}), \dots, \mathbf{v}(t, \mathbf{x}^{0,K}) \} \tag{27}$$

and optimal trajectories

$$\widetilde{\mathbf{X}} = \{ \bar{\mathbf{x}}(t, \mathbf{x}^{0,1}), \dots, \bar{\mathbf{x}}(t, \mathbf{x}^{0,K}) \}. \tag{28}$$

Now we define a time interval Δ*t* and compute the value of the state vector on each optimal trajectory at the interval boundaries. As a result, we obtain a training set of optimal trajectories

$$
\tilde{\mathbf{X}} = \{\tilde{\mathbf{X}}\_1, \dots, \tilde{\mathbf{X}}\_K\}, \tag{29}
$$

where

$$\tilde{\mathbf{X}}\_{i} = \{ \tilde{\mathbf{x}}(0, \mathbf{x}^{0,i}) = \mathbf{x}^{0,i}, \tilde{\mathbf{x}}(t\_1, \mathbf{x}^{0,i}), \dots, \tilde{\mathbf{x}}(t\_{M\_{i}}, \mathbf{x}^{0,i}) \}, \ i = 1, \dots, K,\tag{30}$$

*t*<sub>*j*</sub> = *t*<sub>*j*−1</sub> + Δ*t*, *j* = 1, . . . , *M*<sub>*i*</sub>, *i* = 1, . . . , *K*, and Δ*t* is a given time interval.
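The sampling step that turns optimal trajectories into the training set (29), (30) can be sketched in Python as follows. The sketch assumes that each optimal trajectory is available as a callable of time together with its final time; in practice the trajectories would come from a numerical solution of the optimal control problem. The function names and the two closed-form trajectories (time-optimal motion of a hypothetical scalar object `x' = u`, `|u| <= 1`, to the origin) are illustrative only.

```python
def sample_trajectory(trajectory, t_final, delta_t):
    """Sample one trajectory at t_j = j * delta_t, j = 0, ..., M_i."""
    # M_i: number of whole intervals of length delta_t that fit in [0, t_final];
    # the small offset guards against floating-point round-off in the division.
    m_i = int(t_final / delta_t + 1e-9)
    return [(j * delta_t, trajectory(j * delta_t)) for j in range(m_i + 1)]


def build_training_set(optimal_trajectories, delta_t):
    """Build the list of sampled sets, one per initial condition."""
    return [sample_trajectory(traj, t_final, delta_t)
            for traj, t_final in optimal_trajectories]


# Hypothetical optimal trajectories for initial states 2.0 and 1.0.
optimal_trajectories = [
    (lambda t: 2.0 - t, 2.0),
    (lambda t: 1.0 - t, 1.0),
]
training_set = build_training_set(optimal_trajectories, delta_t=0.5)
```

Each inner list starts at the initial condition itself, and the number of samples differs between trajectories because the final times differ, which is why the count carries the index *i*.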

Now, to solve the control synthesis problem (10), (12), (18)–(21) and find the control function in the form (16), it suffices to approximate the training set (29) according to the criterion

$$J\_3 = \sum\_{i=1}^{K} \sum\_{j=0}^{M\_i} \|\mathbf{x}(t\_j, \mathbf{x}^{0,i}) - \tilde{\mathbf{x}}(t\_j, \mathbf{x}^{0,i})\| \to \min\_{\mathbf{h}(\mathbf{x}) \in \mathbf{U}}, \tag{31}$$

where *t*<sub>0</sub> = 0, **x**(*t*, **x**<sup>0,*i*</sup>) is a particular solution of the system (17) with the initial condition **x**<sup>0,*i*</sup>, and **x̃**(*t*, **x**<sup>0,*i*</sup>) is the optimal trajectory from the set (28) that starts at the same initial condition, *i* ∈ {1, . . . , *K*}.
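The approximation criterion (31) can be evaluated for a candidate control function roughly as in the Python sketch below. The assumptions are the same as in a toy setting: a scalar state, a hypothetical object `x' = u`, explicit Euler integration with several substeps per sampling interval, and a training set stored as plain lists of states whose sampling times are implied by the index and `delta_t`.

```python
def rollout(f, h, x0, delta_t, m, substeps=10):
    """Integrate x' = f(x, h(x)) and return the states at t_0, ..., t_m."""
    dt = delta_t / substeps
    x, states = x0, [x0]
    for _ in range(m):
        for _ in range(substeps):
            x += f(x, h(x)) * dt           # explicit Euler step
        states.append(x)
    return states


def j3(f, h, training_set, delta_t):
    """Sum of distances between closed-loop and training states, as in (31)."""
    total = 0.0
    for samples in training_set:
        x0 = samples[0]                    # each set starts at x^{0,i}
        states = rollout(f, h, x0, delta_t, len(samples) - 1)
        total += sum(abs(x - x_ref) for x, x_ref in zip(states, samples))
    return total


# Hypothetical training set for two initial conditions, delta_t = 0.5.
training_set = [[2.0, 1.5, 1.0, 0.5, 0.0], [1.0, 0.5, 0.0]]
f = lambda x, u: u

h_good = lambda x: -1.0 if x > 0 else 0.0   # reproduces the reference motion
h_bad = lambda x: 0.0                       # leaves the object at rest

j3_good = j3(f, h_good, training_set, delta_t=0.5)
j3_bad = j3(f, h_bad, training_set, delta_t=0.5)
```

A candidate that reproduces the training trajectories gives a value of the criterion close to zero, while the idle candidate accumulates the full distance to every training point. A symbolic regression method would use this value as the fitness of the candidate expression.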

To ensure that the phase constraints are satisfied, both criteria (31) and (19) are applied. As a result, the following combined criterion is used

$$J\_4 = J\_1 + \gamma J\_3 \to \min\_{\mathbf{h}(\mathbf{x}) \in \mathbf{U}} \tag{32}$$

where *γ* is a weight coefficient.

To solve the approximation problem, symbolic regression is again used. Control synthesis based on the approximation of optimal trajectories makes it possible to find a control function (16) that delivers the optimal control up to the accuracy with which the training set is approximated. How close the resulting solution is to the optimal one is determined by the accuracy with which the optimal control problem itself is solved.
