1. Introduction
A general approach is proposed to the numerical solution of the problem of finding the optimal-on-average control of nonlinear deterministic dynamical systems under uncertainty in the initial conditions and incomplete information about the state vector. Since direct information about the state vector is not available, a nonlinear state observer is included in the closed-loop control system; it produces an estimate of the state vector from the output of the nonlinear model of the measuring system. The control laws of the plant and the observer are found simultaneously as functions of time and of the state vector estimate. In contrast to linear systems with a quadratic criterion, in which the optimal controller and the optimal filter are synthesized independently, the proposed procedure seeks the undetermined coefficients of the control laws of the plant and the observer simultaneously [1].
An alternative way is to use various numerical methods for solving the Bellman equation as a sufficient condition for optimality of feedback control in the complete state information problem. In this case, arbitrary initial conditions are considered, for which the minimum of the functional should be obtained. When solving practical problems of control theory, it is usually possible to define a set of initial states, determined by the conditions of operation of the control system, and for this set to search for the corresponding law of control with feedback. To complete the solution, one should find the parameters of the nonlinear observer independently and use the estimate of the state vector in the optimal control law instead of exact information about the state vector.
In the present paper, the behavior of a nonlinear continuous deterministic plant (the model of the controlled object) is described by a system of ordinary differential equations (ODEs). Parallelepiped constraints are imposed on the coordinates of the control vector. The initial conditions are given by a compact set of initial states. The quality of control of a separate trajectory is estimated by the value of the Bolza functional. For the given set of initial conditions, a pencil of trajectories is considered. The performance index to be minimized is the average value of the Bolza functional over the set of initial states. The problem is to find the control laws for the plant and the state observer in the class of functional expansions in terms of elements of orthonormal basis systems with unknown coefficients, depending on time and on the estimates of the state vector coordinates. The components of the control laws are found using systems of basis functions that are used in problems of spectral analysis [2,3]. It is proposed to apply the mini-batch adaptive method of random search (MAMRS) [4,5,6] to the problem under consideration and to analyze its solution for various models of the measuring system. As a special case, the control problem with complete information about the state vector is considered. MAMRS can be classified as a metaheuristic method [7,8,9,10,11]. It extends the idea of stochastic gradient methods [12,13,14,15] to a method that does not require information about the gradient. The efficiency of the method is demonstrated and analyzed by solving an applied optimal control problem of satellite stabilization [16].
2. Statement of the Problem
We consider a nonlinear continuous dynamical system described by the vector differential equation (1), whose right-hand side is a given continuous function. Time is continuous, and the initial and final moments are specified. The equation involves the state vector and the control vector, which takes values in a set of allowable control values.
The initial conditions (2) are specified on a set Ω of initial states, where Ω is a set with positive measure and a piecewise smooth boundary. It characterizes the uncertainty in setting the initial conditions.
The model of the measuring system is described by relation (3), which defines the output vector through a given continuous function. The information coming from the model of the measuring system arrives at the input of the state observer, which produces an estimate of the state vector.
We suppose that an estimate of the state vector can be obtained using a nonlinear observer (4) with a given initial estimate (5). The observer contains an unknown continuous matrix function, which is considered as a feedback control of the observation process. The state vector estimate is also used in the plant control law.
The set of admissible control laws consists of pairs formed by a plant control law and an observer control law, where the plant control is piecewise continuous and the observer control is continuous. It is assumed that the solution of the system of Equations (1) and (4) with the initial conditions (2) and (5), taking into account (3), exists and is unique.
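As a reading aid, relations (1)–(5) can be written in one common form. The notation below ($f$, $h$, $K$, $x$, $\hat{x}$, $u$, $y$, $U$, $t_0$, $t_1$) is assumed for illustration only, and the Luenberger-type correction term in the observer is an assumption about its structure rather than a statement of the exact formula:

```latex
\begin{align}
\dot{x}(t) &= f\bigl(t, x(t), u(t)\bigr), \qquad t \in [t_0, t_1], \quad u(t) \in U, \tag{1}\\
x(t_0) &= x_0 \in \Omega, \tag{2}\\
y(t) &= h\bigl(t, x(t)\bigr), \tag{3}\\
\dot{\hat{x}}(t) &= f\bigl(t, \hat{x}(t), u(t)\bigr)
   + K\bigl(t, \hat{x}(t)\bigr)\,\bigl[\,y(t) - h\bigl(t, \hat{x}(t)\bigr)\bigr], \tag{4}\\
\hat{x}(t_0) &= \hat{x}_0, \tag{5}
\end{align}
% The plant control is formed from the estimate only (assumed notation):
% u(t) = u\bigl(t, \hat{x}(t)\bigr).
```

In this reading, the matrix function $K$ plays the role of the observer control, and the plant control depends on the state only through the estimate $\hat{x}$.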
The performance index (6) for a separate trajectory is a Bolza functional defined by given continuous functions.
With each admissible control law and the set Ω of initial states, we associate the pencil of trajectories of the system of Equations (1) and (4), that is, the union of the solutions of the system of Equations (1) and (4) for all possible initial states from the set Ω.
The performance index (7) to be minimized for the pencil of trajectories is the average value of performance index (6) over the set Ω of initial states.
The optimal control problem (8) is to choose the control policy so that performance index (7) is minimized.
Since the average value of performance index (6) is minimized on the set of initial states Ω, the required control is called optimal on average.
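The averaging over initial states can be illustrated numerically. The sketch below is a minimal example under stated assumptions: a hypothetical scalar plant, a Luenberger-type observer correction, forward Euler integration, and a uniform grid of initial states standing in for Ω. The names `simulate_trajectory` and `average_cost` and all numerical values are illustrative and are not taken from the source.

```python
import numpy as np

def simulate_trajectory(x0, xhat0, f, h, u_law, k_law, f0, F, t0, t1, n_steps=200):
    """Forward-Euler simulation of plant and observer for one initial state.

    Returns (Bolza-type cost, final state, final estimate).
    """
    dt = (t1 - t0) / n_steps
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    xhat = np.atleast_1d(np.asarray(xhat0, dtype=float))
    t, cost = t0, 0.0
    for _ in range(n_steps):
        u = u_law(t, xhat)                  # the control sees only the estimate
        innovation = h(t, x) - h(t, xhat)   # measured minus predicted output
        cost += f0(t, x, u) * dt            # running (integral) part of the cost
        x_next = x + dt * f(t, x, u)
        # Assumed observer structure: model copy plus gain times innovation.
        xhat = xhat + dt * (f(t, xhat, u) + k_law(t, xhat) @ innovation)
        x = x_next
        t += dt
    return cost + F(x), x, xhat             # add the terminal part of the cost

def average_cost(initial_states, **problem):
    """Mean per-trajectory cost over a finite set of initial states."""
    costs = [simulate_trajectory(x0, **problem)[0] for x0 in initial_states]
    return float(np.mean(costs))

# Hypothetical toy problem (assumed for illustration only).
toy = dict(
    xhat0=[0.0],
    f=lambda t, x, u: -x + u,
    h=lambda t, x: x,
    u_law=lambda t, xhat: -xhat,
    k_law=lambda t, xhat: np.array([[2.0]]),
    f0=lambda t, x, u: float(x @ x + u @ u),
    F=lambda x: float(x @ x),
    t0=0.0,
    t1=2.0,
)
initial_states = [[v] for v in np.linspace(-1.0, 1.0, 11)]
J = average_cost(initial_states, **toy)
print(f"average cost over the pencil: {J:.4f}")
```

Every trajectory starts from the same initial estimate, so the averaged cost reflects both the control quality and the transient of the estimation error.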
3. Solution Search Strategy
We consider the transition to the parametric optimization problem from the control problem (8), i.e., to the problem of finding unknown coefficients of the plant control and the observer control. The plant control constraints of parallelepiped type should be taken into account.
To implement this transition, we use the following assumptions:
- 1.
The set Ω of initial states is a parallelepiped defined by the direct product of segments. Using a given step, each segment is divided into subsegments, so that the parallelepiped is divided into elementary disjoint subsets. In each elementary subset, an initial state (the center of the elementary parallelepiped) is specified;
- 2.
A direct product of segments represents the set of admissible values of the state vector coordinates, where the lower and upper boundaries for each coordinate are determined by the applied problem being solved. Therefore, one can assume that the estimates of the state vector coordinates should lie within the same boundaries;
- 3.
The plant control policy is sought as an expansion in basis functions passed through the saturation function sat, which guarantees that the parallelepiped constraints on the plant control are fulfilled. The expansion (11) involves unknown coefficients, scales of truncation, a system of orthonormal functions of time (basis functions) defined on the time segment, and a system of orthonormal functions of a scalar variable (basis functions) defined on a given interval.
As the basis functions, one can take, for example, orthonormal Legendre polynomials or other systems of basis functions.
The matrix entries of the state observer control policy are found by a formula similar to (11), where variable is replaced by .
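The saturated basis expansion can be sketched in code. The example below is an illustration under assumptions, not a transcription of formula (11): it assumes a tensor-product expansion with one index for time and one index per state-estimate coordinate, uses orthonormal Legendre polynomials shifted to each interval, and implements sat as clipping to the control bounds. The function names, the coefficient array shape, and the bounds are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def orthonormal_legendre(k, z, a, b):
    """k-th Legendre polynomial shifted to [a, b] and normalized in L2[a, b]."""
    s = (2.0 * np.asarray(z, dtype=float) - a - b) / (b - a)  # map [a, b] -> [-1, 1]
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0                                           # select P_k
    return np.sqrt((2 * k + 1) / (b - a)) * legendre.legval(s, coeffs)

def saturated_control(t, xhat, C, t_span, x_bounds, u_min, u_max):
    """One control component as a clipped tensor-product expansion.

    C has shape (L0, L1, ..., Ln): axis 0 runs over the time basis functions,
    axis i over the basis functions of the i-th state-estimate coordinate.
    """
    basis_t = np.array([orthonormal_legendre(k, t, *t_span)
                        for k in range(C.shape[0])])
    value = np.tensordot(basis_t, C, axes=(0, 0))          # contract the time index
    for xi, (lo, hi) in zip(xhat, x_bounds):
        basis_x = np.array([orthonormal_legendre(k, xi, lo, hi)
                            for k in range(value.shape[0])])
        value = np.tensordot(basis_x, value, axes=(0, 0))  # contract the next index
    return float(np.clip(value, u_min, u_max))             # sat(.)

# Hypothetical usage: two state-estimate coordinates, two basis functions per index.
C = 0.1 * np.ones((2, 2, 2))
u = saturated_control(1.0, [0.5, -0.2], C, (0.0, 2.0), [(-1.0, 1.0), (-1.0, 1.0)],
                      u_min=-1.0, u_max=1.0)
```

Clipping keeps every control value inside its bounds regardless of the coefficients, so the parameter search itself can remain unconstrained.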
The value of the cost functional (7) for the pencil of trajectories is approximated by expression (12), computed from the trajectories that start at the initial states specified in the elementary subsets.
The optimization problem is to choose the best coefficients of the plant and observer control laws, minimizing performance index (12), by using the mini-batch adaptive method of random search (MAMRS) [4]. In this method, functional (12) is calculated approximately from randomly selected non-coinciding trajectories emanating from the set of initial states; these trajectories form a mini-batch.
The mini-batch size is user-definable and is usually selected step by step. For simplicity of presentation, we assume that each coordinate of the plant and observer control laws can be associated with a column matrix of coefficients. By concatenation, the entire set of optimized parameters can then be represented as a single extended vector, which is the argument of the objective function. The optimization results differ for each mini-batch size; as the mini-batch size increases, the accuracy of solving the optimization problem in general increases.
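The partition of the set of initial states and the mini-batch approximation of the averaged functional can be sketched as follows. This is a minimal illustration under assumptions: the cell counts, the helper names `cell_centers` and `minibatch_average`, and the placeholder per-trajectory cost are hypothetical; in the actual problem the cost would come from simulating the closed-loop system.

```python
import itertools
import numpy as np

def cell_centers(bounds, n_cells):
    """Centers of the elementary boxes of a uniformly partitioned parallelepiped.

    bounds  : list of (low, high) pairs, one per coordinate
    n_cells : number of subsegments per coordinate
    """
    axes = []
    for (lo, hi), n in zip(bounds, n_cells):
        edges = np.linspace(lo, hi, n + 1)
        axes.append(0.5 * (edges[:-1] + edges[1:]))   # midpoints of the subsegments
    return np.array(list(itertools.product(*axes)))

def minibatch_average(trajectory_cost, centers, batch_size, rng):
    """Average cost over a random mini-batch of distinct initial states."""
    idx = rng.choice(len(centers), size=batch_size, replace=False)
    return float(np.mean([trajectory_cost(centers[i]) for i in idx]))

# Hypothetical usage with a placeholder cost instead of a simulated trajectory.
rng = np.random.default_rng(0)
centers = cell_centers([(0.0, 1.0), (0.0, 2.0)], (2, 2))
placeholder_cost = lambda x0: float(np.sum(np.square(x0)))
estimate = minibatch_average(placeholder_cost, centers, batch_size=2, rng=rng)
```

Sampling without replacement keeps the trajectories in a mini-batch non-coinciding; when the batch size equals the number of cells, the estimate coincides with the full average.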
4. Mini-Batch Adaptive Search Algorithm
Let us consider the optimization problem .
Denote: is the minimum value of the cost function after the th run; is a best parameter vector column after startup; is the mini-batch size.
Step 0. Set the initial mini-batch size: (in general, one can start with any value of ); is a maximum number of starts; is a maximum number of passes; is an expansion coefficient; is a compression coefficient; M is a maximum number of failed tests at the current iteration; is an initial step size (one can use any value ), R is a minimum step size and L is a maximum number of iterations; and is a number of initial trial solutions ().
Step 1. Set the values: (passes number counter) and (initial value of the sum of the cost function average values).
Step 2. Set the values: (starts number counter) and ; (initial value of the sum of the objective function values).
Step 3. Define the initial values of coefficients
,
. Generate
r vectors
using a uniform distribution of their coordinates on given intervals. Calculate the value of the function
for each generated vector and order the vectors according to the value of the objective function [17]. The vector with the smallest value of the objective function is denoted by column vector
. Put
.
Step 4. Generate a random vector , where is a random variable uniformly distributed on the interval [−1,1].
Step 5. Calculate: .
Step 6. Generate the mini-batch size , i.e., generate pairwise mismatched sets of values defining the initial states with numbers or corresponding to the tuple .
Check the fulfillment of the conditions:
- (a)
If , the algorithm step is successful. Put . Determine whether the current direction is successful: if , the search direction is successful. Put , t_{l+1} = αt_l, l = l + 1 and check the termination condition. If l < L, put j = 1 and go to step 4. If l = L, the search process is over: , go to step 8; if , the search direction is unsuccessful, go to step 7;
- (b)
If , the unsuccessful step is made, go to step 7.
Step 7. Calculate the number of unsuccessful steps from the current solution:
- (a)
If j < M, put j = j + 1 and go to step 4;
- (b)
If j = M, check the termination condition: if , the process is over: and , go to step 8; if , put t_l = βt_l, j = 1 and go to step 4.
Step 8. Check the improvement of the cost function value as a result of the -th run: if , put and and go to step 9; if , go to step 9.
Step 9. Calculate and verify the stop conditions (the maximum number of starts is achieved): if , put and go to step 3; if , put —the best solution during the -th pass for a given ; calculate and and go to step 10.
Step 10*. Put and and check the condition for completing a given number of passes: if , put and go to step 2; if ,calculate: , .
Step 11*. Check the condition for completing studies of the effect of the mini-batch size: if , put and go to step 1; if , go to step 12.
Step 12. As a result, find the best estimate of after passes and indicators and for each value of the mini-batch size . To analyze the resulting estimation accuracy, find the value .
Steps 10 and 11 are performed if necessary. It is recommended to do restarts to increase the chances of finding a global extremum. The best solution is selected from the restarts made.
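The core loop of the algorithm (trial step, expansion along a successful direction, step compression after repeated failures) can be sketched as follows. This is a simplified illustration, not the full procedure above: it omits the restarts, the passes, and the initial population of trial solutions, it redraws the mini-batch at every trial, and it assumes that the random direction is normalized before the trial step. The names `mamrs_like_search`, `cost`, and `sample_batch`, the default parameter values, and the test problem are hypothetical.

```python
import numpy as np

def mamrs_like_search(cost, sample_batch, x_init, rng, alpha=1.6, beta=0.5,
                      max_fails=20, step=1.0, min_step=1e-4, max_iters=300):
    """Simplified mini-batch adaptive random search (illustrative sketch).

    cost(x, batch) evaluates the objective on a mini-batch; alpha > 1 expands
    the step after a success and 0 < beta < 1 compresses it after max_fails
    consecutive failures.
    """
    x = np.asarray(x_init, dtype=float)
    fails, iters = 0, 0
    while iters < max_iters:
        batch = sample_batch(rng)                     # fresh mini-batch per trial
        xi = rng.uniform(-1.0, 1.0, size=x.shape)     # random direction
        y = x + step * xi / np.linalg.norm(xi)        # trial point
        fx = cost(x, batch)
        if cost(y, batch) < fx:                       # successful trial step
            z = x + alpha * (y - x)                   # go further along the direction
            if cost(z, batch) < fx:                   # successful direction
                x, step, iters, fails = z, alpha * step, iters + 1, 0
                continue
        fails += 1
        if fails >= max_fails:
            if step <= min_step:                      # step too small: stop
                break
            step, fails = beta * step, 0              # compress the step
    return x

# Hypothetical test problem: mean squared distance to a grid of points.
centers = np.array([[a, b] for a in np.linspace(-1.0, 1.0, 5)
                    for b in np.linspace(-1.0, 1.0, 5)])

def cost(x, batch):
    return float(np.mean(np.sum((centers[batch] - x) ** 2, axis=1)))

def sample_batch(rng, size=10):
    return rng.choice(len(centers), size=size, replace=False)

rng = np.random.default_rng(0)
x_start = np.array([5.0, 5.0])
x_best = mamrs_like_search(cost, sample_batch, x_start, rng)
```

Both points in each comparison are evaluated on the same mini-batch, so a step is accepted or rejected on a consistent sample even though the sample changes between trials.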
5. Satellite Stabilization Problem
The problem of damping the rotational motion of the satellite by the engines installed on it is considered. The system describing the motion of a rigid body relative to the center of inertia after the transition to dimensionless variables has the form:
where the state variables are the projections of the angular velocity onto the principal central axes of inertia and the controls characterize the thrust of the engines located on the satellite.
The set of initial states is given by a uniform distribution law on the set
At the final moment of the system functioning, the terminal conditions corresponding to the meaning of the satellite stabilization problem must be fulfilled. The fulfillment of the terminal conditions should be accompanied by minimization of the fuel used to turn the satellite.
Next, we consider two examples: the joint estimation and control problem with incomplete information about the state vector, and the optimal control problem with complete information about the state vector.
5.1. Example 1. The Joint Estimation and Control Problem
The proposed observer equation is:
Further, we will consider the cases of solving the problem with different models of the measuring system.
In all tests, the number of initial states is and is the scale of truncation. The initial state estimation vector is . Parameters of the mini-batch adaptive method of random search are , and . To synthesize the plant control and observer control , a system of orthonormal Legendre polynomials is used.
5.1.1. Case A
The measuring system model is described by the following relationship:
The behavior of the set of trajectories for different mini-batch sizes is shown in Figure 1.
Table 1 shows the results of solving the problem depending on the mini-batch size.
5.1.2. Case B
The measuring system model is described by the following relationship:
The behavior of the set of trajectories for different mini-batch sizes is shown in Figure 2.
Table 2 shows the results of solving the problem depending on the mini-batch size.
5.1.3. Case C
The measuring system model is described by the following relationship:
The behavior of the set of trajectories for different mini-batch sizes is shown in Figure 3.
Table 3 shows the results of solving the problem depending on the mini-batch size.
Based on Table 1, Table 2 and Table 3, we can conclude that the accuracy of the problem solution increases with the mini-batch size.
Figure 4 and Table 4 show the solution to the satellite stabilization problem for the different models of the measuring system at a fixed mini-batch size.
Figure 4 and Table 4 show a similar character of convergence for the different models of the measuring system.
5.2. Example 2. The Control Problem with Complete Information about the State Vector
The measuring system model is described by the following relationship:
In this case, there is no need to use a state observer because there is complete information about the state vector at an arbitrary moment in time. In practice, this case is rarely realized, but it is of interest for the analysis of losses in terms of the value of the cost functional associated with the incompleteness of the information received. In all tests, the number of generated random initial states is and is the scale of truncation. Parameters of the mini-batch adaptive method of random search are , and . To synthesize the plant control , a system of orthonormal Legendre polynomials is used.
The behavior of the set of trajectories for different mini-batch sizes is shown in Figure 5.
Table 5 shows the results of solving the problem depending on the mini-batch size.
Based on the results of examples 1 and 2, we can conclude that, with the mini-batch size , good convergence of the estimates of the state vector coordinates to the true values is already achieved. The total execution time of the algorithm with the mini-batch size was 30 min and with the mini-batch size was 90 min based on an INTEL CORE i5 2.10 GHz processor. The results obtained indicate that, when using mini-batches, the required quality of transients is achieved at reasonable computational costs.
6. Conclusions
The developed zero-order metaheuristic optimization algorithm, the mini-batch adaptive method of random search, is tested on the satellite stabilization problem, which requires finding the optimal control for a pencil of trajectories of a nonlinear deterministic system emanating from a given set of initial states. Software for solving the satellite stabilization problem is developed. Three cases with different models of the measuring system and incomplete information are considered and analyzed, and a comparison is made with the solution obtained for a model of the measuring system containing complete information about the state vector. The influence of the mini-batch size on the accuracy of the solution is studied in each considered problem. Recommendations on the choice of the algorithm parameters are given. The obtained numerical results confirm that, for a certain mini-batch size, an acceptable quality of transient processes can be achieved with low computational costs.