**4. Computational Algorithms**

To solve the control synthesis problem as machine learning control based on the approximation of a set of optimal trajectories, two complex problems have to be solved: the optimal control problem, in order to form a training set, and the approximation of the optimal trajectories by some symbolic regression method. Evolutionary computations are used for both problems.

#### *4.1. Algorithms for the Optimal Control Problem*

The optimal control problem with phase constraints is not unimodal [20]; therefore, evolutionary algorithms, which can solve a global optimization problem, are applied. Recently it has become popular to use hybrid evolutionary algorithms that combine different evolutionary algorithms. Studies of evolutionary algorithms for the numerical solution of the optimal control problem [21] show that the most successful for this problem are the genetic algorithm (GA) [22], the particle swarm optimization algorithm (PSO) [23] and the grey wolf optimizer algorithm (GWO) [24].

All evolutionary algorithms include an evolution of possible solutions: possible solutions are changed in such a way that some of the new possible solutions yield a value of the goal functional not worse than that of the old possible solutions before the change. During the evolution of a possible solution, information about the values of the goal functional for other possible solutions is used.

The PSO algorithm uses the best current possible solution, the best of several randomly selected solutions, and information about the historical changes of the given possible solution. The evolution is performed for each component of a possible solution

$$
\tilde{q}_j^i = q_j^i + v_j^i, \quad j = 1, \dots, p,\tag{33}
$$

where $\tilde{q}_j^i$ is the new value of component $j$ of possible solution $i$, $\tilde{\mathbf{q}}^i = [\tilde{q}_1^i \dots \tilde{q}_p^i]^T$, $q_j^i$ is component $j$ of the old possible solution $i$, $\mathbf{q}^i = [q_1^i \dots q_p^i]^T$, $p$ is the dimension of the vector, and $v_j^i$ is component $j$ of the historical vector of possible solution $i$, $\mathbf{v}^i = [v_1^i \dots v_p^i]^T$. The historical vector is changed once per generation cycle

$$
v_j^i \leftarrow \alpha v_j^i + \xi \beta (\bar{q}_j^i - q_j^i) + \xi \gamma (q_j(0) - q_j^i), \quad j = 1, \dots, p,\tag{34}
$$

where $\bar{q}_j^i$ is component $j$ of the best of $k$ randomly selected solutions, $\bar{\mathbf{q}}^i = [\bar{q}_1^i \dots \bar{q}_p^i]^T$, $q_j(0)$ is component $j$ of the best current possible solution, $\mathbf{q}(0) = [q_1(0) \dots q_p(0)]^T$, $\xi$ is a random value in the interval from 0 to 1 that yields a new random number at each call, $\alpha$, $\beta$, $\gamma$ are constant parameters of the algorithm, and the vector $\mathbf{v}^i$ has a zero initial value.
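As a minimal sketch, the PSO step (33) and (34) can be coded as follows; the function name `pso_update` and the plain-list representation of solutions are illustrative choices, not from the paper:

```python
import random

def pso_update(q, v, q_best_of_k, q_global_best, alpha, beta, gamma):
    """One PSO evolution step per (33)-(34): the historical vector v is
    updated toward the best of k randomly selected solutions and toward
    the best current solution, then added to the position q."""
    p = len(q)
    v_new = [alpha * v[j]
             + random.random() * beta * (q_best_of_k[j] - q[j])
             + random.random() * gamma * (q_global_best[j] - q[j])
             for j in range(p)]
    q_new = [q[j] + v_new[j] for j in range(p)]
    return q_new, v_new
```

With $\beta = \gamma = 0$ the step degenerates to pure inertia, $\tilde{q}_j^i = q_j^i + \alpha v_j^i$, which is a convenient sanity check.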

The GWO algorithm performs the following change of a possible solution on the basis of several best current solutions

$$\tilde{q}_j^i = \frac{1}{N} \sum_{k=0}^{N-1} \left( q_j(k) - r(2\xi - 1) \left| 2\xi q_j(k) - q_j^i \right| \right), \quad j = 1, \dots, p,\tag{35}$$

where *qj*(0)... *qj*(*N* − 1) are *j* components of *N* best possible solutions,

$$r = 2\left(1 - \frac{g}{G}\right),\tag{36}$$

where $r$ is calculated once per generation, $g$ is the generation number, and $G$ is the total number of generations.
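The GWO step (35) and (36) can be sketched as below; `gwo_update` and the `leaders` list (the $N$ best current solutions) are illustrative names:

```python
import random

def gwo_update(q_i, leaders, g, G):
    """One GWO step per (35)-(36): the new solution is the mean of moves
    toward the N best current solutions, with the step scale r shrinking
    linearly from 2 to 0 over the G generations."""
    r = 2.0 * (1.0 - g / G)
    N, p = len(leaders), len(q_i)
    q_new = []
    for j in range(p):
        s = 0.0
        for k in range(N):
            xi1, xi2 = random.random(), random.random()
            s += leaders[k][j] - r * (2 * xi1 - 1) * abs(2 * xi2 * leaders[k][j] - q_i[j])
        q_new.append(s / N)
    return q_new
```

At the last generation $r = 0$, so the update collapses to the component-wise mean of the leaders, which reflects the exploitation phase of the algorithm.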

The GA represents vectors in the form of a Gray code and performs the evolution of two selected possible solutions through the operations of crossover and mutation. For crossover, two possible solutions are selected

$$\mathbf{z}^{i} = [z\_1^i \dots z\_{c+d}^i]^T, \quad \mathbf{z}^j = [z\_1^j \dots z\_{c+d}^j]^T,\tag{37}$$

where $z_k^i, z_k^j \in \{0, 1\}$, $c$ is the number of bits for the integer part and $d$ is the number of bits for the fractional part of the Gray code.

Then the crossover point $s$ is determined. As a result, two new possible solutions are obtained

$$\bar{\mathbf{z}}^{i} = [z\_1^{i} \dots z\_s^{i} z\_{s+1}^{j} \dots z\_{c+d}^{j}]^T,\ \bar{\mathbf{z}}^{j} = [z\_1^{j} \dots z\_s^{j} z\_{s+1}^{i} \dots z\_{c+d}^{i}]^T. \tag{38}$$
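A sketch of the Gray encoding of (37) and the one-point crossover (38) follows; the fixed-point scaling convention (value times $2^d$ rounded to an integer) is an assumption for illustration, since the paper does not spell it out:

```python
def to_gray_bits(x, c, d):
    """Encode x >= 0 as a Gray code with c integer and d fractional bits."""
    n = int(round(x * (1 << d)))          # fixed-point with d fractional bits
    g = n ^ (n >> 1)                      # binary -> Gray
    return [(g >> (c + d - 1 - k)) & 1 for k in range(c + d)]

def from_gray_bits(bits, c, d):
    """Decode a Gray-coded bit list back to a real number."""
    g = 0
    for b in bits:
        g = (g << 1) | b
    n = 0
    while g:                              # Gray -> binary
        n ^= g
        g >>= 1
    return n / (1 << d)

def crossover(zi, zj, s):
    """One-point crossover (38): swap the tails after position s."""
    return zi[:s] + zj[s:], zj[:s] + zi[s:]
```

Gray coding is used here because adjacent real values differ in a single bit, so a one-bit mutation makes a small change of the encoded value.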

In hybrid algorithms, in each evolution cycle and for each possible solution, one of the three ways of obtaining new possible solutions, (33) and (34), or (35) and (36), or (37) and (38), is selected randomly. The algorithm stops after all evolution cycles are performed.
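One cycle of such a hybrid algorithm can be sketched as follows; the operator interface `op(i, population, velocities)` and the greedy replacement rule are simplifying assumptions, not details fixed by the paper:

```python
import random

def hybrid_step(population, velocities, evaluate, ops):
    """One evolution cycle of a hybrid algorithm: for every possible
    solution one of the evolution rules (PSO, GWO or GA crossover) is
    picked at random; the new solution replaces the old one only if its
    goal-functional value is not worse."""
    for i, q in enumerate(population):
        op = random.choice(ops)           # ops = [pso_op, gwo_op, ga_op]
        q_new = op(i, population, velocities)
        if evaluate(q_new) <= evaluate(q):
            population[i] = q_new
    return population
```

The random choice of operator is what lets the hybrid combine the exploration behavior of the GA with the exploitation behavior of PSO and GWO in one run.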

#### *4.2. Numerical Methods of Symbolic Regression for the Control Synthesis Problem*

To solve the control synthesis problem, numerical methods of symbolic regression are used. More than fourteen such methods are now known. These methods encode a mathematical expression and search for the optimal solution in the code space. The methods differ in the form of the code.

For example, consider the following mathematical expression

$$y = \sin(x_1) + \exp(-q_1 x_1)\cos(q_2 x_2 + q_1). \tag{39}$$

To code this mathematical expression the following basic sets are used: the set of arguments

$$F_0 = \{f_{0,1} = x_1, f_{0,2} = x_2, f_{0,3} = q_1, f_{0,4} = q_2\},\tag{40}$$

the set of elementary functions

$$F = \{f_{1,1}(z) = z,\ f_{1,2}(z) = -z,\ f_{1,3}(z) = \sin(z),\ f_{1,4}(z) = \cos(z),$$

$$f_{1,5}(z) = \exp(z),\ f_{2,6}(z_1, z_2) = z_1 + z_2,\ f_{2,7}(z_1, z_2) = z_1 z_2\},\tag{41}$$

where the indexes of an element indicate the number of arguments and the function number; if the first index equals zero, the element is an argument of the mathematical expression.

The most popular and earliest symbolic regression method is genetic programming by J. Koza [6]. This method represents a mathematical expression in the form of a computational tree. Figure 1 presents the computational tree for the mathematical expression (39).

In Figure 1, the nodes of the tree contain the numbers of functions, and the leaves contain the arguments of the mathematical expression.

**Figure 1.** Computational tree of genetic programming.

A code of genetic programming for the mathematical expression (39) has the following form

(2, 6), (1, 3), (0, 1), (2, 7), (1, 5), (2, 7), (1, 2), (0, 3), (0, 1), (1, 4), (2, 6), (2, 7), (0, 2), (0, 4), (0, 3). (42)

The code of genetic programming consists of the indexes of the elements of the computational tree along all branches from the top to the leaves. This code is used to represent the computational tree in computer memory.
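This prefix traversal can be evaluated recursively. The sketch below assumes the pair convention of (42), where the first number is the arity index and the second is the element number; `eval_gp` is an illustrative name:

```python
import math

UNARY = {1: lambda z: z, 2: lambda z: -z, 3: math.sin, 4: math.cos, 5: math.exp}
BINARY = {6: lambda a, b: a + b, 7: lambda a, b: a * b}

def eval_gp(code, args):
    """Evaluate a genetic-programming prefix code such as (42).
    code is a list of (arity_index, number) pairs read left to right;
    args maps argument numbers to the values of x1, x2, q1, q2 from (40)."""
    pos = 0
    def rec():
        nonlocal pos
        arity, num = code[pos]
        pos += 1
        if arity == 0:
            return args[num]          # leaf: argument of the expression
        if arity == 1:
            return UNARY[num](rec())  # one subtree
        return BINARY[num](rec(), rec())  # left then right subtree
    return rec()
```

Applied to the code (42), this reproduces the expression (39), which is a useful check of the traversal order.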

A genetic programming code is not very convenient, since the codes of different mathematical expressions have different lengths, which also change after the crossover operation. If an argument enters the mathematical expression several times, it has to appear on the leaves of the computational tree the same number of times.

Another symbolic regression method, the network operator method [14], codes a mathematical expression in the form of an oriented graph. Figure 2 presents the network operator graph for the mathematical expression (39).

**Figure 2.** The network operator graph of the mathematical expression.

In the network operator graph, the nodes contain the numbers of functions with two arguments, the source nodes contain the arguments of the mathematical expression, and the arcs are marked with the numbers of functions with one argument.

In the computer memory the network operator graph is presented as an integer matrix.

$$\Psi = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 3 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 7 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 6 & 0 & 4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 7 & 5 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 7 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 6 \end{bmatrix}. \tag{43}$$

Each row of the matrix corresponds to a graph node. The numbers of the nodes are located at the top of the nodes (see Figure 2). The nodes are numbered so that the number of the node from which an arc exits is less than the number of the node that the arc enters; the network operator matrix then has an upper triangular form. The numbers of functions with two arguments are located on the main diagonal; a zero element on the main diagonal shows that the row corresponds to a source node. The other non-zero off-diagonal elements are the numbers of functions with one argument.
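A network operator matrix such as (43) can be evaluated column by column: each column accumulates the unary-transformed values of its predecessors with the binary function on the diagonal, starting from that function's unit element (0 for addition, 1 for multiplication). The function name `eval_nop` is illustrative:

```python
import math

UNARY = {1: lambda z: z, 2: lambda z: -z, 3: math.sin, 4: math.cos, 5: math.exp}
BINARY = {6: (lambda a, b: a + b, 0.0),   # function and its unit element
          7: (lambda a, b: a * b, 1.0)}

def eval_nop(psi, sources):
    """Evaluate an upper-triangular network operator matrix like (43).
    sources maps the 0-based row indices of source nodes to argument
    values; a zero diagonal element marks a source node."""
    n = len(psi)
    values = [None] * n
    for j in range(n):
        if psi[j][j] == 0:                # source node
            values[j] = sources[j]
            continue
        op, unit = BINARY[psi[j][j]]
        acc = unit
        for i in range(j):                # incoming arcs in column j
            if psi[i][j] != 0:
                acc = op(acc, UNARY[psi[i][j]](values[i]))
        values[j] = acc
    return values[-1]                     # last node holds the result
```

Running this on the matrix (43) with sources $x_1, x_2, q_1, q_2$ again yields the expression (39).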

Consider one more symbolic regression method. In Cartesian genetic programming [25], the code of the mathematical expression (39) has the following form

$$
\left( \begin{bmatrix} 7 \\ 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 5 \\ 1 \end{bmatrix}, \begin{bmatrix} 5 \\ 6 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 7 \\ 2 \\ 4 \end{bmatrix}, \begin{bmatrix} 6 \\ 9 \\ 3 \end{bmatrix}, \begin{bmatrix} 4 \\ 10 \\ 1 \end{bmatrix}, \begin{bmatrix} 7 \\ 11 \\ 7 \end{bmatrix}, \begin{bmatrix} 6 \\ 8 \\ 12 \end{bmatrix} \right). \tag{44}$$

Here, every integer vector corresponds to one elementary function. The first element of the vector is the function number; the other elements are the numbers of elements from the set of arguments (40). If a function has one argument, the second argument is not used. The result of the calculation for each vector is appended to the argument set, so the number of arguments increases by one every time a function is calculated according to a vector.

Due to this redundancy, a Cartesian genetic programming code has the same length for all mathematical expressions. The crossover operation in Cartesian genetic programming is performed by exchanging the vectors after the crossover point, so the lengths of the codes do not change.
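The evaluation of a code like (44) is a single left-to-right pass over a growing argument list; `eval_cgp` is an illustrative name, and the argument numbering is 1-based as in the paper:

```python
import math

FUNCS = {1: (1, lambda z: z), 2: (1, lambda z: -z), 3: (1, math.sin),
         4: (1, math.cos), 5: (1, math.exp),
         6: (2, lambda a, b: a + b), 7: (2, lambda a, b: a * b)}

def eval_cgp(code, args):
    """Evaluate a Cartesian genetic programming code like (44).
    args is the initial argument list (40); each vector (f, a, b)
    applies function number f to the arguments with 1-based numbers
    a, b and appends the result to the argument list."""
    vals = list(args)
    for f, a, b in code:
        arity, fn = FUNCS[f]
        if arity == 1:
            vals.append(fn(vals[a - 1]))  # second argument b is unused
        else:
            vals.append(fn(vals[a - 1], vals[b - 1]))
    return vals[-1]
```

Evaluated on the code (44) with the initial arguments $x_1, x_2, q_1, q_2$, the pass reproduces the expression (39).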

Studies of symbolic regression methods for control synthesis problems show that it is effective to use the principle of small variations of the basic solution [26]. According to this principle, only one basic solution is encoded by the symbolic regression method. All other possible solutions are encoded by sets of variation vectors, each of which makes one small change in the code of the basic solution. After some generations, the basic solution is replaced by the best solution found so far. This approach speeds up the search by narrowing the search space and avoiding additional checks of the correctness of the codes of possible solutions.
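The mechanics of the principle can be sketched in a few lines; representing a variation vector as a (position, new value) pair is a deliberate simplification for illustration, since the actual variation vectors of [26] carry more fields:

```python
import copy

def apply_variations(basic_code, variation_vectors):
    """Principle of small variations of the basic solution: a possible
    solution is stored only as a list of variation vectors, each making
    one small change to a copy of the basic solution's code."""
    code = copy.deepcopy(basic_code)
    for pos, new_value in variation_vectors:
        code[pos] = new_value
    return code
```

Because every possible solution is a short list of edits to one shared code, the population stays near the basic solution and every decoded code remains structurally valid by construction.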

#### **5. Computational Experiment**

In the computational experiment, the problem of general synthesis of optimal control for a spacecraft landing on the surface of the Moon is considered [27]. The differential equations of the spacecraft state are the following

$$\begin{aligned} \dot{x}_1 &= \frac{g_E(P_c + u_2)\cos(u_1 - x_2)}{x_5} - g_{Mh}\cos(x_2), \\ \dot{x}_2 &= \frac{g_E(P_c + u_2)\sin(u_1 - x_2)}{x_5} + \frac{g_{Mh}\sin(x_2)}{x_1}, \\ \dot{x}_3 &= \frac{x_1\cos(x_2)}{1000}, \\ \dot{x}_4 &= \frac{x_1\sin(x_2)}{1000}, \\ \dot{x}_5 &= -\frac{P_c + u_2}{P_s}, \end{aligned} \tag{45}$$

where $\mathbf{x} = [x_1\ x_2\ x_3\ x_4\ x_5]^T$ is the state vector: $x_1$ is the current speed of the spacecraft (m/s), $x_2$ is the trajectory inclination angle (rad), $x_3$ is the current flight altitude relative to the lunar surface (km), $x_4$ is the flight distance (km), and $x_5$ is the mass of the spacecraft including fuel (kg). $\mathbf{u} = [u_1\ u_2]^T$ is the control vector, whose values are constrained

$$-\frac{\pi}{2} \le u_1 \le \frac{\pi}{2}, \quad -80 \le u_2 \le 80. \tag{46}$$

The parameters of the model have the following values: the gravitational acceleration at a given altitude above the lunar surface is

$$g_{Mh} = g_M \left( \frac{r_M}{r_M + x_3} \right)^2,\tag{47}$$

the Moon's gravitational acceleration is $g_M = 1.623$ m/s², the Earth's gravitational acceleration is $g_E = 9.80665$ m/s², the Moon's radius is $r_M = 1737$ km, the nominal thrust of the spacecraft engine is $P_c = 720$ kg, and the specific impulse of the spacecraft engine is $P_s = 319$ s.
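The model (45) with the altitude-dependent gravity (47) can be sketched as a right-hand-side function plus a one-step integrator; the explicit Euler step is my illustrative choice, not the integration scheme used in the paper:

```python
import math

# model constants from the text
G_E, G_M = 9.80665, 1.623      # Earth and Moon gravitational accelerations, m/s^2
R_M = 1737.0                   # Moon radius, km
P_C, P_S = 720.0, 319.0        # nominal engine thrust and specific impulse

def spacecraft_rhs(x, u):
    """Right-hand side of the state equations (45), with g_Mh from (47)."""
    x1, x2, x3, x4, x5 = x
    u1, u2 = u
    g_mh = G_M * (R_M / (R_M + x3)) ** 2
    return [
        G_E * (P_C + u2) * math.cos(u1 - x2) / x5 - g_mh * math.cos(x2),
        G_E * (P_C + u2) * math.sin(u1 - x2) / x5 + g_mh * math.sin(x2) / x1,
        x1 * math.cos(x2) / 1000.0,    # altitude in km, speed in m/s
        x1 * math.sin(x2) / 1000.0,
        -(P_C + u2) / P_S,             # fuel consumption
    ]

def euler_step(x, u, dt):
    """One explicit Euler integration step for simulating (45)."""
    f = spacecraft_rhs(x, u)
    return [xi + dt * fi for xi, fi in zip(x, f)]
```

Note that the mass equation is control-affine: at $u_2 = 0$ the fuel burns at the constant nominal rate $P_c / P_s$.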

A domain of initial states is

$$X_0 = \left\{ \mathbf{x}_0 : x_{0,1} = 1689,\ -1.65 \le x_{0,2} \le -1.55,\ 17 \le x_{0,3} \le 20,\ x_{0,4} = 0,\ x_{0,5} = 1500 \right\}. \tag{48}$$

A terminal state is

$$\mathbf{x}^f = \begin{bmatrix} x_1^f = 10, & x_3^f = 0.2 \end{bmatrix}^T. \tag{49}$$

The phase constraints are determined by the mechanics of spacecraft flight. Obviously, the speed $x_1$, the altitude $x_3$ and the fuel level $x_5$ cannot be negative, and reaching zero altitude $x_3$ or zero fuel level $x_5$ at a significant speed $x_1$ means that the spacecraft has crashed. Consider the following phase constraints

$$\begin{array}{l} h_k(\mathbf{x}) = -x_j \le 0,\ k = 1, 2, 3,\ j = 1, 3, 5, \\ h_k(\mathbf{x}) = \vartheta(0.001 - x_j)(x_1 - V_{\max}) \le 0,\ k = 4, 5,\ j = 3, 5, \end{array} \tag{50}$$

where $V_{\max}$ is the maximum landing speed, $V_{\max} = 1$, and $\vartheta(A)$ is the Heaviside step function.

According to the proposed method, at the first step the training set is formed. We determine a finite set of initial states within the domain (48) and solve the optimal control problem for each initial state from this set.

Let us replace the domain of initial states (48) with a set of $M = 21$ elements uniformly distributed over this domain

$$\bar{X}_0 = \left\{ \mathbf{x}^{0,j+7(i-1)} = \begin{bmatrix} 1689 & -1.65 + 0.05(i-1) & 17 + 0.5(j-1) & 0 & 1500 \end{bmatrix}^T \right\},\ i = \overline{1, 3},\ j = \overline{1, 7}. \tag{51}$$

The quality criterion takes into account the proximity to the terminal state and the violation of the phase constraints

$$\begin{split} J_j &= \alpha_1 \sqrt{\left(x_1\left(t_f(\mathbf{x}^{0,j})\right) - x_1^f\right)^2 + \left(x_3\left(t_f(\mathbf{x}^{0,j})\right) - x_3^f\right)^2} + \\ &\alpha_2 \int_0^{t_f(\mathbf{x}^{0,j})} \left(\sum_{k=1}^{K} \vartheta\left(h_k(\mathbf{x}(t, \mathbf{x}^{0,j}))\right) h_k(\mathbf{x}(t, \mathbf{x}^{0,j}))\right) dt \to \min,\end{split} \tag{52}$$

where $\alpha_i$, $i = 1, 2$, are given penalty factors, $K = 5$ is the number of phase constraints, and $j = \overline{1, M}$.
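For a trajectory stored at discrete time steps, the criterion (52) can be approximated as below; the rectangle-rule quadrature and the function name `quality_criterion` are illustrative assumptions:

```python
import math

def quality_criterion(traj, dt, xf, h_funcs, alpha1, alpha2):
    """Discrete approximation of (52): terminal error in speed x1 and
    altitude x3 plus an integral penalty over violated phase constraints,
    with the integral approximated by a rectangle rule over the stored
    trajectory points."""
    x_end = traj[-1]
    terminal = math.hypot(x_end[0] - xf[0], x_end[2] - xf[1])
    penalty = 0.0
    for x in traj:
        for h in h_funcs:
            v = h(x)
            if v > 0.0:                # Heaviside factor theta(h_k) = 1
                penalty += v * dt
    return alpha1 * terminal + alpha2 * penalty
```

The Heaviside factor makes each constraint contribute only while it is violated, so a feasible trajectory is scored by the terminal error alone.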

To search for the solution of the optimal control problem, the direct approach was used. The original problem was reduced to a nonlinear programming problem by introducing the discretization time interval Δ*t*. The solution of each optimal control problem, in the form of the control vector at discrete moments of time, was searched for independently by a hybrid evolutionary algorithm combining the modern grey wolf optimizer (GWO), which does not require problem-specific tuning of additional parameters, and the well-known particle swarm optimization (PSO). Separately, these algorithms have shown high efficiency in solving optimal control problems; the hybrid realization is intended to increase their effectiveness further.

In the computational experiment, the size of the set of possible solutions was 100 and the number of search iterations was 5000. The modeling parameters were the following: maximum control time $t_{\max} = 300$, discretization time interval Δ*t* = 30, penalty factors $\alpha_1 = 10$, $\alpha_2 = 10$.

At the second step of the proposed approach, the obtained optimal trajectories are used to synthesize a multidimensional control function on the object state space. The search for the control function is conducted by a symbolic regression method that looks for the expression that best approximates the provided optimal trajectories.

We used the network operator method (NOP) to synthesize the control function. NOP allows searching for the structure of a mathematical expression simultaneously with the search for the optimal values of the parameter vector. In the computational experiment, we used the following NOP parameters: the size of the NOP matrix was 40, the size of the set of input variables was 3, the size of the set of input parameters was 12, the number of outputs was 2, the number of candidate solutions in the initial set was 256, and the maximum number of search iterations was 25,000.

As a result of the computational experiment, a control function in the form of a NOP matrix and a parameter vector was obtained. The mathematical expression of the found control function is as follows

$$u_1 = \begin{cases} \pi/2, & \text{if } \tilde{u}_1 > \pi/2 \\ -\pi/2, & \text{if } \tilde{u}_1 < -\pi/2 \\ \tilde{u}_1, & \text{otherwise} \end{cases}, \quad u_2 = \begin{cases} 80, & \text{if } \tilde{u}_2 > 80 \\ -80, & \text{if } \tilde{u}_2 < -80 \\ \tilde{u}_2, & \text{otherwise} \end{cases},\tag{53}$$

where

$$\tilde{u}_1 = \chi_6(-z_{34}, \tanh(z_{35}), z_{37}),$$

$$\tilde{u}_2 = \operatorname{sgn}(\tilde{u}_1)\sqrt{|\tilde{u}_1|} - z_{36} + \arctan(z_{35}) + \operatorname{sgn}(z_{34})\sqrt{|z_{34}|} + \log(|z_{33}|) + z_{32}^{-1} + z_{31} + z_{28} - z_{28}^3 + \log(|z_{26}|) - z_{25} + \arctan(z_{21}) + \operatorname{sgn}(z_{17})\sqrt{|z_{17}|} + \exp(q_{12}) + \tanh(q_{10}) + q_9 + \log(q_5) + \tanh(q_1),$$

$$z_{37} = \min\{z_{36} - z_{36}^3, \log(|z_{35}|), z_{34} - z_{34}^3, z_{32}^{-1}, \operatorname{sgn}(z_{28})\sqrt{|z_{28}|}, \tanh(z_{27}), \exp(z_{25}), z_{20}^{-1}, \sqrt[3]{x_2}\},$$

$$z_{36} = \min\{\exp(z_{29}), \tanh(z_{28}), \operatorname{sgn}(z_{27})\sqrt{|z_{27}|}, \arctan(z_{26}), \sqrt[3]{z_{22}}, z_{20}^3, \tanh(q_6), \sqrt[3]{q_1}, x_3^3\},$$

$$z_{35} = \arctan(z_{23}) + \tanh(z_{22}) + \tanh(q_{12}) + \log(|x_3|),$$

$$z_{34} = \max\{\operatorname{sgn}(z_{33})\sqrt{|z_{33}|}, \log(|z_{30}|), z_{29}^{-1}, z_{22}^{-1}, z_{20}^{-1}\},$$

$$z_{33} = \min\{z_{32}^{-1}, \arctan(z_{29}), \tanh(z_{24}), z_{22}^3, z_{19}, -z_{16}, z_{11}^{-1}, \tanh(q_3), \tanh(q_2), \exp(x_2)\},$$

$$z_{32} = \min\{z_{31}^3, z_{26}, \log(|z_{25}|), z_{18}^{-1}, \sqrt{q_{10}}, \arctan(q_6), \sqrt[3]{q_2}\},$$

$$z_{31} = \log(|z_{27}|) + z_{26}^2 + z_{24}^2 + \arctan(z_{22}) + \sqrt[3]{z_{21}} + \exp(z_{20}) + \exp(z_{17}) + z_{16}^{-1} + q_{12}^3 - q_5 + \sqrt[3]{q_2} + \sqrt[3]{x_1},$$

$$z_{30} = \max\{z_{29}, -z_{26}, \tanh(z_{25}), \arctan(z_{18}), \operatorname{sgn}(z_{16})\sqrt{|z_{16}|}, q_{11}\},$$

$$z_{29} = \chi_6(z_{28}, z_{27}^{-1}, z_{26}^3, \log(|z_{24}|), -z_{22}^3, -z_{20}, \sqrt[3]{z_{17}}, q_7, q_2^{-1}, \sqrt{q_1}),$$

$$z_{28} = \max\{z_{27}, z_{23}^{-1}, -z_{20}, \log(|z_{19}|), z_{18} - z_{18}^3, -z_{17}, \log(|z_{16}|), q_7, \log(q_5)\},$$

$$z_{27} = \chi_6(z_{24}, \arctan(z_{20}), \sqrt{q_{11}}, q_8 - q_8^3, q_6^3, -q_5, \log(q_4)),$$

$$z_{26} = \chi_5(\sqrt[3]{z_{23}}, -z_{20}, z_{19} - z_{19}^3, -z_{18}, \operatorname{sgn}(z_{15})\sqrt{|z_{15}|}, q_{12}, \sqrt[3]{x_3}),$$

$$z_{25} = \max\{z_{23}^2, z_{22}, z_{21} - z_{21}^3, \arctan(z_{17}), q_{11}, q_9, \sqrt{q_5}\},$$

$$z_{24} = \max\{\tanh(z_{21}), z_{20} - z_{20}^3, z_{15}^3, q_{12} - q_{12}^3, q_{10}, q_8^3, \sqrt[3]{q_5}, q_4^3, \exp(x_2)\},$$

$$z_{23} = z_{20} z_{18} z_{16} q_9 \sqrt{q_7}\operatorname{sgn}(x_1)\sqrt{|x_1|},$$

$$z_{22} = \chi_6(z_{19}, z_{17}^{-1}, \exp(z_{16}), \sqrt[3]{q_{10}}, \arctan(q_9), q_8^{-1}, \sqrt{q_2}),$$

$$z_{21} = \max\{\tanh(z_{18}), q_8^2, q_7, -q_6, \sqrt{q_3}\},$$

$$z_{20} = \chi_6(-z_{19}, z_{17}, \arctan(z_{10}), q_6, -q_2, x_3^{-1}),$$

$$z_{19} = z_{16} + \arctan(q_{11}) + q_8^{-1} + q_5 + \arctan(x_3),$$

$$z_{18} = \chi_5(z_{15}, q_{12}^{-1}, q_7^{-1}, q_4, x_1 - x_1^3),$$

$$z_{17} = q_3 + x_3 - q_3 x_3, \quad z_{16} = q_2 x_2 \tanh(q_5),$$

$$z_{15} = \operatorname{sgn}\left(\sqrt{q_{10}} + q_5 - q_5^3 + q_3^{-1} + q_1 + x_1\right)\sqrt{q_{10} + (q_5 - q_5^3)^2 + q_3^{-2} + q_1^2 + x_1^2},$$

$$\chi_5(a_1, a_2) = a_1 + a_2 - a_1 a_2, \quad \chi_5(a_1, \dots, a_s) = \chi_5(a_1, \chi_5(a_2, \dots, \chi_5(a_{s-1}, a_s) \dots)),$$

$$\chi_6(a_1, \dots, a_s) = \operatorname{sgn}\left(\sum_{i=1}^s a_i\right) \sqrt{\sum_{i=1}^s a_i^2},$$

*q*<sup>1</sup> = 2.3474, *q*<sup>2</sup> = 10.5066, *q*<sup>3</sup> = 9.9106, *q*<sup>4</sup> = 13.1419, *q*<sup>5</sup> = 9.6631, *q*<sup>6</sup> = 4.4541,

*q*<sup>7</sup> = 2.1899, *q*<sup>8</sup> = 4.8552, *q*<sup>9</sup> = 3.1116, *q*<sup>10</sup> = 6.6172, *q*<sup>11</sup> = 12.6812, *q*<sup>12</sup> = 15.6148.

The functions *χ*<sup>5</sup> and *χ*<sup>6</sup> are commutative and associative, and have zero as their unit element.

To check the solution, we used the found control function to obtain the control and the corresponding trajectories for various initial states from (48). The considered initial states included both those present in the training set (51) and those that were not.

Table 1 shows the values of the quality criterion *J*∗ obtained using the found control function (53) for the 21 initial states from the finite set (51). The optimal trajectories known for these initial states were previously used as the training set, so this test shows the quality of the approximation. The value of the quality criterion *Jocp* obtained by solving the optimal control problem for the same initial state is shown in the table as a reference value. The average deviation of the quality criterion values from the reference ones is 0.0591, the maximum deviation is 0.2514, and the standard deviation is 0.0648.

**Table 1.** Results of the computational experiment using initial states from the training set.


Table 2 shows the values of the quality criterion *J*∗ obtained using the found control function (53) for 10 initial states generated randomly within the domain (48). This test shows the suitability of the found control function for an arbitrary initial state from the domain (48). The value of the quality criterion *Jocp* obtained by solving the optimal control problem for the same initial state is shown in the table as a reference value. The average deviation of the quality criterion values from the reference ones is 0.0366, the maximum deviation is 0.1122, and the standard deviation is 0.0341.

**Table 2.** Results of the computational experiment for random initial states.


Figures 3 and 4 illustrate the results of the computational experiment for the initial state $\mathbf{x}_0 = [1689\ \ {-1.565}\ \ 18.92\ \ 0\ \ 1500]^T$. The trajectories obtained using the found control function are shown by black solid lines. For comparison, the trajectories obtained by solving the optimal control problem are shown by grey dashed lines. Figure 5 shows the values of the found control function over time.

**Figure 3.** The graphs of trajectories: (**a**) spacecraft speed over time *x*1(*t*); (**b**) spacecraft altitude over time *x*3(*t*). Found solution—black solid line; reference solution—grey dashed line.

**Figure 4.** The graphs of trajectories: (**a**) spacecraft speed over distance *x*1(*x*4); (**b**) spacecraft altitude over distance *x*3(*x*4). Found solution—black solid line; reference solution—grey dashed line.

The computational experiment showed that the found multidimensional control function allows one to obtain a close-to-optimal solution for any initial state from the given domain (48), even for initial states that were not in the training set (51).

According to the analysis of the standard deviation, the training set contained a sufficient number of optimal trajectories. The better standard deviation in the experiment with initial states randomly distributed in (48) can be explained by the fact that the set (51) had a large number of initial states on the boundaries of the domain (48).

**Figure 5.** The graphs of control values over time: (**a**) control *u*1(*t*); (**b**) control *u*2(*t*). Found solution—black solid line; reference solution—grey dashed line.
