Article

A Machine Learning Algorithm That Experiences the Evolutionary Algorithm’s Predictions—An Application to Optimal Control

1 Control and Electrical Engineering Department, “Dunarea de Jos” University, 800008 Galati, Romania
2 Informatics Department, “Danubius” University, 800654 Galati, Romania
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(2), 187; https://doi.org/10.3390/math12020187
Submission received: 9 December 2023 / Revised: 28 December 2023 / Accepted: 4 January 2024 / Published: 6 January 2024
(This article belongs to the Special Issue AI Algorithm Design and Application)

Abstract:
Using metaheuristics such as the Evolutionary Algorithm (EA) within control structures is a realistic approach for certain optimal control problems. Such controllers predict the optimal control values over a prediction horizon using a process model (PM). The computational effort sometimes causes the execution time to exceed the sampling period. Our work addresses a new issue: whether a machine learning (ML) algorithm could “learn” the optimal behaviour of the couple (EA and PM). A positive answer is given by proposing datasets capturing this couple’s optimal behaviour and appropriate ML models. Following a design procedure, a number of closed-loop simulations provide the sequences of optimal control and state values, which are collected and aggregated in a data structure. For each sampling period, datasets are extracted from the aggregated data. The ML algorithm trained on these datasets produces a set of regression functions. Replacing the EA predictor with the ML model, new simulations are carried out, proving that the state evolution is almost identical. The execution time decreases drastically because the PM’s numerical integrations are avoided entirely. The performance index equals the best-known value. In different case studies, the ML models succeeded in capturing the optimal behaviour of the couple (EA and PM) and yielded efficient controllers.
MSC:
68T05; 68T20; 68W50; 49-04

1. Introduction

Controlling a process subjected to a performance index is a usual task in process engineering. Theoretical control laws can be implemented in favourable situations where the process has certain mathematical properties. On the other hand, when the process has profound nonlinearities, or its model is uncertain, imprecise or incomplete, using metaheuristic algorithms (EA, Particle Swarm Optimization, etc.) (see [1,2,3]) within an appropriate control structure could be a realistic solution. Control engineering recorded many examples of using metaheuristics [4,5,6,7,8,9] owing to their robustness and capacity to cope with complex problems.
Generally speaking, the metaheuristic algorithm’s role within a controller is to predict the optimal (quasi-optimal) control values. The predictor forecasts the optimal control sequence for a prediction horizon, and the controller decides the next optimal control value. A control structure adequate for this kind of controller is receding horizon control (RHC) [10,11,12]. It is used to solve optimal control problems (OCPs) and includes a process model (PM).
Because this work’s main result is applied to the RHC, we recall hereafter the basic principles due to which this closed-loop structure controls the process optimally:
- The controller acquires the process’s current state and makes optimal predictions to establish the current optimal control values.
- The controller embeds a PM (for example, a set of algebraic and differential equations) to compute the predictions via the PM’s numerical integration.
- The controller organizes the shifting of the prediction horizon.
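The three principles above can be condensed into a generic receding-horizon loop. The sketch below is illustrative Python (the paper's implementation uses MATLAB scripts); `predict_optimal_sequence` and `process_step` are hypothetical placeholders for the EA-based predictor and one integration step of the process, respectively:

```python
def rhc_loop(predict_optimal_sequence, process_step, x0, H):
    """Generic receding-horizon control loop: at each step k, predict an
    optimal control sequence over the remaining horizon [k, H) and apply
    only its first value, then shift the horizon."""
    x = x0
    profile, trajectory = [], [x0]
    for k in range(H):
        u_seq = predict_optimal_sequence(x, k, H)  # EA + PM prediction over [k, H)
        u = u_seq[0]                               # only the first value is applied
        x = process_step(x, u)                     # process advances one sampling period
        profile.append(u)
        trajectory.append(x)
    return profile, trajectory
```

The loop makes explicit why the predictor's per-step execution time must fit within the sampling period: `predict_optimal_sequence` runs once per period, in real time.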
A possible organization of the receding prediction horizon is given in [10,13]. An RHC particular case is the well-known model predictive control (MPC) that minimizes the prediction errors at each sampling period. There are plenty of articles addressing MPC and covering different aspects, from which we recall a few: theoretical works in [14,15], tutorial reviews in [16], and surveys of industrial applications in [17,18,19].
Many works have integrated genetic algorithms (GAs), EAs, and other metaheuristics inside the RHC and implemented successfully real-time control structures. The book by Jayaraman and Siarry [6] describes many applications of this kind. Goggos and King [20] introduced the evolutionary predictive control technique. At every sampling moment, evolutionary algorithms generate and evaluate a family of optimum predictive controllers having different parameters, and the best performer is selected.
Other works make EAs or GAs fit in the MPC structure; emphasis is placed on the operators’ definition. The authors of [21] proposed a specialized GA optimization method based on the Takagi–Sugeno model for fuzzy predictive control. Nonlinear MPC strategies are described in [22]; the paper proposes stochastic optimization algorithms associated with a polynomial-type process model.
The RHC was also used for flood control in [23]; later, paper [24] described a real-time flood control system using an EA and the RHC.
In previous works, the authors studied implementing the prediction module using EAs [25,26]. The EAs provided a realistic solution owing to the many possibilities of reducing the predictor’s execution time. Certain control engineering aspects, such as the discretization of continuous signals and the time constants of the dynamic sub-systems, determine the choice of the sampling period (T). So, the value of T cannot be increased at will. Within an iterative process, the controller makes predictions for the process’s evolution over a prediction horizon, hT (h variable). A larger value of h is desirable since it increases the chance of quasi-optimal behaviour along the control horizon.
On the other hand, the larger the value h, the larger the prediction calculation time. However, the controller execution time, including EA predictions, cannot exceed the sampling period. Due to the large computational effort, this is the controller’s most restrictive time constraint. So, the predictor’s execution time decrease is the challenge of this approach (EA + RHC), which is mainly appropriate for slow processes with large sampling periods.

A Machine Learning Algorithm Extending the Applicability of the EA Predictions

Extending the applicability of the EA predictions is a challenge involving techniques and control structures that diminish the execution time [26]. One could consider that our work addresses the reduction in execution time, but the proposed machine learning (ML) task (see [27,28,29,30]) largely exceeds this topic.
This work raises an interesting question: could a machine learning algorithm “learn” the optimal behaviour of the couple (EA and PM)? A positive answer would have very favourable consequences for the controller implementation. Substituting an accurate ML model for the couple (EA and PM) could decrease the predictions’ computation time significantly and make the controller’s structure much simpler. In this context, the ML algorithm has to construct usable models from datasets capturing the optimal behaviour of the couple (EA and PM). Concretely, this work’s main objective is to answer the above-mentioned question positively by proposing the following:
  • Realistic datasets apprehending the optimal behaviour of the couple (EA and PM);
  • Appropriate ML models.
Hence, this work will propose an ML model that can substitute for the couple (EA and PM) inside the optimal controller while keeping the control performances. In other words, the ML model would be equivalent, in a certain sense, to the optimal behaviour of the EA plus PM; both entities are predictors [31]. To the best of our knowledge, this “intelligent equivalence” can be considered a new issue.
Before using the controller within the closed-loop system in real time, a simulation program must validate the designed controller via the process’s evolution along the imposed control horizon; the process and the PM are considered identical. The simulation’s results are usually sequences of control and state variables’ values along the control horizon that can be stored or recorded. This data is a mark of the system’s evolution made up of the control profile (sequence of control values) and state trajectory (sequence of state variables’ values). Repeating the simulation many times, we can aggregate these data and generate datasets for the ML algorithm. The simulations are conducted offline, so there is no execution time constraint. The time constraint mentioned above exists only when the closed loop works in real time.
An important remark is that we do not need data from the real process to be included in the datasets. The predictor module predicts optimal trajectories using only the EA and PM, whose inter-influence must be captured by the datasets. When an accurate ML model replaces the initial predictor, the controller should behave quasi-identically within the simulations, which is our goal. The ML model only generalizes in real time when the real process’s states are used as initial states.
This work will ascertain the previous considerations and propose an approach to construct the datasets and ML model, starting from the OCP to solve. Besides general considerations, we will apply the proposed methods to a specific OCP, exemplifying the proposed methods and algorithms to make the presentation easy to follow.
To exemplify the equivalence mentioned above and implement ML controllers, we will consider OCPs with a final cost whose solution uses predictions. Section 2 recalls the general approach to solving OCPs using EAs developed in the authors’ previous works [25,26]. The Park–Ramirez Problem (PRP) (see [26,32,33]) is a kind of benchmark problem already treated in this context, which is addressed as an example. This paper will partially report previous results for comparison and take over the EA predictor’s implementation. Section 2 is mainly necessary because the optimal control using the EA will supply the datasets for ML. Although the aspects presented in this section are not among this paper’s contributions, they introduce the notations and keep the discourse self-contained.
Section 3 answers the following three basic questions:
  • What data do we need to capture the optimal behaviour of the couple (EA and PM)?
  • How do we generate the datasets for the ML algorithm?
  • What ML model can be used to design an appropriate controller (we will name it ML controller)?
These are general questions, each of which has subsumed aspects to clarify. Section 3.1, describing the proposed method, answers these questions succinctly; details will be given in the next sections. It also establishes a controller design procedure.
The starting point in our approach is that we have already solved the considered OCP, and the implemented controller has a prediction module using a specific EA. Section 3.2.1 describes an algorithm achieving the closed-loop simulation along the control horizon devoted to the EA predictor. The simulation will record the sequence of optimal control values—the optimal control profile—and the optimal trajectory (sequence of states). The two sequences can be regarded as a series of couples (state, control value), each couple associated implicitly with a sampling period.
Repeating the simulation M times (e.g., M = 200), we collect data concerning these optimal evolutions of the closed loop and aggregate it into a data structure presented in Section 3.2.2. This aggregated data structure characterizes the optimal behaviour of the couple (EA and PM) globally, that is, for the whole control horizon. The extraction of datasets characterizing each sampling period from the aggregated data creates the premise to find a model of optimal behaviour for each sampling period. Section 3.2.3 describes how the aggregated data is split into datasets for each sampling period. Moreover, training and testing datasets are created.
In Section 3.3, the ML models are constructed according to an important choice. The ML model will be a set of regression functions; a linear regression function will be determined for each sampling period [34,35,36]. The first reason for this choice is the model’s simplicity, which is important, especially when the control horizon is large. Secondly, the control law will be directly determined. This fact is appropriate for the controller implementation, which is now straightforward. Section 3.3.1 proposes simple models with linear terms for each state variable. In contrast, Section 3.3.2 uses the stepwise regression strategy to generate regression functions, which are allowed to include nonlinear terms as the interactions. The ML model succeeds in reproducing the optimal behaviour of the EA predictor with a high accuracy.
The simulation of the closed-loop system is the way to test the generalization aptitude of the new predictor after its implementation. In its first part, Section 4 describes an algorithm that simulates the closed-loop system using the ML controller along the control horizon. The second part compares the simulation results to those previously obtained with the EA predictor for the PRP case [26]. The state evolutions are practically identical, and the performance index equals the best of the M evolutions, which is true for both types of regression functions.
The simulation results (in the PRP case and other case studies not presented in this paper) proved that the proposed ML models succeeded in apprehending the optimal behaviour of the couple (EA and PM) and engendered efficient controllers.
We consider that our work has the following findings:
  • The interesting issue itself: finding an ML model trained on the datasets generated by an EA (or another metaheuristic), capturing the latter’s optimal behaviour.
  • The dataset’s construction as a dynamic trace of the EA predictor, aggregating the trajectories and CPs.
  • The dataset extraction for each sampling period as a premise to find a temporally distributed ML model.
  • The design procedure for the ML controller and all associated algorithms (simulation and models’ construction algorithms).
  • The outstanding decrease in the ML controller’s execution time.
Special attention was paid to the implementation aspects so that the interested reader can find support to understand and eventually reproduce parts of this work or use it in other projects. With this aim in view, all algorithms used in this work are implemented, the associated scripts are attached as supplementary materials, and all the necessary details are given in Appendix A, Appendix B, Appendix C, Appendix D and Appendix E.

2. Optimal Control Using Evolutionary Algorithms

This section recalls the general approach to solving OCPs using EAs developed in previous papers [25,26]. The minimal elements presented here introduce the notations and keep the discourse self-contained.

2.1. Optimal Control Problems with a Final Cost

The structure of an OCP being well known, we consider in the sequel only the defining elements adapted to the problem taken as an example in this section. Rigorous mathematical details will be avoided to simplify the presentation.

2.1.1. Process Model

In our approach, the controller includes a process model constituted by algebraic and ordinary differential equations:
$\dot{X}(t) = [\, f_1(X, U, W) \;\; \cdots \;\; f_n(X, U, W) \,]^T; \quad g_i(X, U, W) = 0, \; i = 1, \dots, p,$
where
$X(t) = [\, x_1(t) \; \cdots \; x_n(t) \,]^T$ — a vector with n state variables;
$U(t) = [\, u_1(t) \; \cdots \; u_m(t) \,]^T$ — a vector with m control variables.
An example of a process model is Equation (7) from Section 2.1.4.

2.1.2. Constraints

There are many constraint types, but we mention only those used in the case study presented in this paper.
Control horizon: $t \in [t_0, t_{final}]$; $t_0 = 0$,
Initial state: $X(0) = X_0 \in \mathbb{R}^n$,
Bound constraints: $u_j(t) \in \Omega_j \equiv [u_{\min}^{j}, u_{\max}^{j}]$; $j = 1, \dots, m$; $0 < t < t_{final}$.
The values $u_{\min}^{j}, u_{\max}^{j}$ are the technological bounds of the variable $u_j(t)$.
If T is the sampling period of the control system, we can divide the control horizon into H sampling periods:
$t_{final} = H \times T.$

2.1.3. Cost Function

The problem is to determine the control function $U(\cdot)$ optimizing (max or min) a specific cost (objective) function J, whose general form is given below:
$J(U(\cdot), X_0) = \int_0^{t_{final}} L(X(\tau), U(\tau))\, d\tau + J_{final}.$
The function L determines the integral component (Lagrange term) of function J, while Jfinal (Mayer term) rewards (or penalizes) the final state (in most cases).
Remark 1. 
When the final cost is present, and the controller makes predictions, whether or not there is an integral term, the prediction horizon must end at the final time, involving the biggest computational complexity.
Given Remark 1, we consider only the final cost, a situation suited to our OCP (see Section 2.1.4).
$J(U(\cdot), X_0) \equiv J_{final} = J(X(t_{final})).$
The problem’s solution is the function U ( · ) that engenders the cost function’s optimal value. This value, J0, is called the performance index:
$J^0 = \max_{U} J(X(t_{final})) \equiv \max_{U} J(U(\cdot), X_0).$

2.1.4. An Example of OCP with a Final Cost

The Park–Ramirez problem (PRP) is a kind of benchmark problem ([26,32,33]) that can exemplify a final cost OCP. The nonlinear process models a fed-batch reactor, which produces secreted protein. This problem has been addressed in many works to study integration methods.
Process Model (PM):
$\dot{x}_1 = g_1 (x_2 - x_1) - \dfrac{u}{x_5}\, x_1$
$\dot{x}_2 = g_2 x_3 - \dfrac{u}{x_5}\, x_2$
$\dot{x}_3 = g_3 x_3 - \dfrac{u}{x_5}\, x_3$
$\dot{x}_4 = -7.3\, g_3 x_3 + \dfrac{u}{x_5} (20 - x_4)$
$\dot{x}_5 = u$
$g_1 = \dfrac{4.75\, g_3}{0.12 + g_3}$
$g_2 = \dfrac{x_4}{0.1 + x_4}\, e^{-5.0 x_4}$
$g_3 = \dfrac{21.87\, x_4}{(x_4 + 0.4)(x_4 + 62.5)}$
The state vector X ( t ) = [ x 1 ( t ) x 5 ( t ) ] T regroups the following physical parameters: x 1 ( t ) —concentration of secreted protein, x 2 ( t ) —concentration of total protein, x 3 ( t ) —density of culture cell, x 4 ( t ) —concentration of substrate, and x 5 ( t ) —holdup volume.
It holds n = 5; m = 1; U ( t ) = u ( t ) R .
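For reference, the right-hand side of the PM above can be coded directly. The following Python sketch (the paper's own scripts are MATLAB; this transcription is illustrative, with the signs of the standard Park–Ramirez benchmark) evaluates the five derivatives for a given state and control value:

```python
import math

def park_ramirez_rhs(x, u):
    """Right-hand side of the Park-Ramirez fed-batch reactor model:
    returns [dx1/dt, ..., dx5/dt] for state x = [x1..x5] and feed rate u."""
    x1, x2, x3, x4, x5 = x
    g3 = 21.87 * x4 / ((x4 + 0.4) * (x4 + 62.5))   # specific growth rate
    g1 = 4.75 * g3 / (0.12 + g3)
    g2 = (x4 / (0.1 + x4)) * math.exp(-5.0 * x4)
    return [g1 * (x2 - x1) - (u / x5) * x1,        # secreted protein
            g2 * x3 - (u / x5) * x2,               # total protein
            g3 * x3 - (u / x5) * x3,               # culture cell density
            -7.3 * g3 * x3 + (u / x5) * (20.0 - x4),  # substrate
            u]                                     # holdup volume
```

Any standard ODE integrator (e.g., a Runge–Kutta scheme) can then propagate the state over one sampling period, which is what the predictor's numerical integration does repeatedly.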
Constraints:
Control horizon: $t \in [t_0, t_{final}]$; $t_0 = 0$, $t_{final} = 15\ \mathrm{h}$.
Initial state: $X(0) = X_0 = [0,\ 0,\ 1,\ 5,\ 1]^T \in \mathbb{R}^5$.
Bound constraints: $u(t) \in \Omega \equiv [0,\ 2]$; $0 < t < t_{final}$.
Performance Index:
$J^0 = \max_{u(t)} J(X(t_{final})) = \max_{u(t)}\ x_1(t_{final}) \cdot x_5(t_{final})$
An open-loop solution cannot be used in real time because the process and the PM have different dynamics (even when the differences are small), which would produce unpredictable efficiency. We want to generate a controlled optimal process (a closed-loop solution) starting from a given X0 whose final cost should be J0.

2.2. A Discrete-Time Solution Based on EAs

A control structure that can generate the optimal solution is RHC ([25,26]). Its controller includes the process model and a prediction module (see Figure 1). The last one predicts, at each moment kT, the optimal control sequence until the final time. Then, the controller outputs this sequence’s first element as the optimal value and inputs the next process’s state.
The prediction module using an EA proved to be a realistic solution owing to the many possibilities of reducing its execution time [26].
To use an EA, we append to our OCP the discretization constraint:
$U(t) = U(kT), \quad kT \le t < (k+1)T; \quad k = 0, \dots, H-1.$
So, the control variables are step functions. For the sake of simplicity, the time moment k T will be denoted k in the sequel. For example, inside the sampling period [ k T , ( k + 1 ) T ) , the control vector is
$U(kT) \equiv U(k) \equiv [\, u_1(k), \dots, u_m(k) \,]^T$
We name a “control profile” (CP) a complete sequence of H control vectors, U ( 0 ) ,   U ( 1 ) ,     U ( H 1 ) . It will generate the transfer diagram drawn in Figure 2.
The EA yields candidate predictions over prediction horizons and evaluates the cost function J. For the sampling period [k, k + 1), a candidate prediction is a control sequence having the following structure:
$\bar{U}(k) = \langle U(k), \dots, U(H-1) \rangle.$
The vector $X(k)$ is the process’s current state. It is also the initial state for the candidate prediction with H − k elements. This fact justifies the appellation “Receding Horizon Control”. Using Equations (1) and (8), the EA also calculates the corresponding state sequence (with H − k + 1 elements):
$\bar{X}(k) = \langle X(k), \dots, X(H) \rangle.$
At convergence, the EA returns (to the controller) the optimal prediction sequence, denoted V ¯ ( k ) :
$\bar{V}(k) \equiv \arg\max_{\bar{U}(k)} J(\bar{U}(k), X(k)) = \langle V(k), \dots, V(H-1) \rangle.$
Finally, the first value of the sequence, V(k), becomes the controller’s best output, denoted U*(k), sent towards the process:
$U^*(k) \equiv V(k).$
Remark 2. 
The optimal control  U * ( k )  is also a function of the current state  X ( k ) , which does not appear as a distinct argument in (11) to keep the notation simple and easy to follow. Nevertheless, this dependence is essential for the machine learning models as well.
The other values of the sequence V ¯ ( k ) are forgotten, and the controller will treat the next sampling period [k + 1, k + 2).
The controller equipped with the EA constructs the optimal CP for the given initial state X0 = X(0) and the entire control horizon, concatenating the optimal controls U * ( k )   k = 0 , , H 1 . It forces the system to pass through a sequence of “optimal states”, the optimal trajectory Γ ( X 0 ) :
$\Omega(X_0) \equiv \langle U^*(0),\ U^*(1), \dots,\ U^*(H-1) \rangle$
$\Gamma(X_0) \equiv \langle X_0,\ X^*(1), \dots,\ X^*(H) \rangle$
These two sequences completely characterize the optimal evolution of the closed loop over the control horizon. Theoretically, the optimal cost function will reach the value J0 if the process and its model are identical. Practically, this value will be very close to J0, such that Ω(X0) is a quasi-optimal solution of our OCP.
The flowchart of the prediction module is drawn in Appendix A. The interested reader can find the main characteristics of the implemented EA in [26] or the supplementary materials appended to this work. In our implementation, the EA’s code is presented in the script RHC_Predictor_EA.m. The initial population is generated using the control variables’ bounds. The cost function is coded within the file eval_PR_step.m. All scripts are included in the folder ART_Math, as the other functions called by the predictor and implementing the EA’s operators and the PM.

3. Controller Based on Machine Learning

3.1. The General Description of the Proposed Method

The PRP and other problems of this kind allow the validation of the designed optimal controller by simulating the control loop.
The simulation of the control loop is an important design tool in this context, which can supply the sequence Ω X 0 and Γ ( X 0 ) ((12) and (13)) that describe the quasi-optimal evolution of the loop.
By repeating the control loop simulation M times, we obtain M different optimal (actually quasi-optimal) couples (CP, trajectory), even if the initial states were identical, due to the EA’s stochastic character. Moreover, in the case of the PRP, the initial state can be perturbed to simulate the imperfect realization of the initial conditions when launching a new batch. Let us consider a lot (batch) of M simulations, illustrated in Figure 3.
At step k of the control horizon, the controller must predict the optimal control output (sent towards the process) using its predictor module based on the EA and PM described before. Data concerning the same step have some common aspects:
  • The initial process’s state $X_i^*(k)$, $1 \le i \le M$, is input data for the EA.
  • The prediction horizon is the same: Hk.
  • The PM is the same.
  • The M simulations use the same EA.
The EA calculates and returns the optimal control vector $U_i^*(k)$, $1 \le i \le M$. A dataset including the simulation results for step k can be constructed as follows (Table 1): each row i concatenates the transposed current state and optimal control value, $[\, X_i^*(k)^T \;\; U_i^*(k)^T \,]$.
The dataset resembles a table due to the transposition operator. If M is big enough, this dataset collects an essential part of the EA’s ability to predict optimal control values for step k. A useful question then arises: how can we generalize this ability to other state values the process could reach at step k? A machine learning algorithm is the answer. For example, linear regression (see [34,36]) can construct a function $f_k: \mathbb{R}^n \to \mathbb{R}^m$ for each $k$, $k = 0, 1, \dots, H-1$, using a dataset like the one presented before.
Remark 3. 
The linear regression function  f k  models how the EA determines the optimal prediction at step k. The set of functions  Φ = { f k k = 0 ,   1 , ,   H 1 }  is the couple (EA–PM) machine learning model. The behaviour of the EA, which, in turn, depends on the PM, is captured by the set of functions  Φ .
Hence, it would be possible to successfully replace the predictor with EA by this set of functions and obtain a faster controller.
Logically, a few steps lead us to a design procedure for the optimal controller.
Design procedure:
1. Implement a program to simulate the closed-loop functioning over the control horizon (H) using the controller based on the EA. To simplify the presentation, we will call it ControlLoop_EA in the sequel. The output data are the quasi-optimal trajectory and its associated control profile ($\Omega(X_0)$ and $\Gamma(X_0)$).
2. Repeat the module ControlLoop_EA M times to generate and save M quasi-optimal trajectories and their control profiles.
3. Extract, for each step k, datasets similar to Table 1 using the data saved at step 2.
4. Determine the machine learning model (for example, the set of functions $\Phi$) trained on the M trajectories and control profiles.
5. Implement the new controller based on ML. It will be called the ML controller in the sequel.
Simulation of the closed loop using the new controller:
6. Write a simulation program called ControlLoop_ML for the closed loop working with the new controller. This simulation will test the feasibility of the proposed method, the quality of the quasi-optimal solution, the performance index, and the execution time of the new controller.
The set of functions Φ can determine the optimal CP starting from a given state X 0 , following the transfer diagram from Figure 2, and applying the functions f 0 ,   f 1 ,   ,   f H 1 to the current state successively:
$U^*(0) = f_0(X_0); \quad U^*(1) = f_1(X^*(1)); \quad \dots \quad U^*(H-1) = f_{H-1}(X^*(H-1)).$
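The successive application of the functions of $\Phi$ can be sketched as a short loop. This is illustrative Python (the paper's scripts are MATLAB); `process_step` is a hypothetical stand-in for one integration step of the PM or process:

```python
def ml_control_profile(phi, process_step, x0):
    """Build the control profile by applying the learned regression
    functions f_0, ..., f_{H-1} to the successive states."""
    x = x0
    profile = []
    for f in phi:            # one regression function per sampling period
        u = f(x)             # control law applied directly: U*(k) = f_k(X(k))
        profile.append(u)
        x = process_step(x, u)
    return profile
```

Note the contrast with the EA predictor: each step is a single function evaluation, with no numerical integration of the PM inside the controller.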
Remark 4. 
The design procedure shows that the new controller is completely designed offline. No data collected online from the process is necessary. The result of this procedure is the new controller, ML controller, which could be used in real time after a robustness analysis.

3.2. Dataset Generation for Machine Learning Model

The first two steps of the design procedure will be described in this section, trying to keep generality. Only some aspects will refer to the PRP to simplify the presentation.

3.2.1. The Simulation of the Closed-Loop System Based on EA Predictions

This subsection corresponds to step 1 of the design procedure. Figure 4 shows the flowchart of the simulation program for the closed loop (the script ControlLoop_EA.m).
At every moment k, the controller calls the predictor based on the EA, RHC_Predictor_EA. The latter returns the optimal control value $U^*(k)$, which is used by the function RHC_RealProcessStep to determine the process’s next state.
The optimal control values and the optimal states are stored in the matrices uRHC (H × m) and state (H × n), respectively, having the structure presented in Figure 5.
Hence, the optimal CP and trajectory are described by the matrix uRHC and state, respectively, which are the images of Ω X 0 and Γ ( X 0 ) sequences (see (12) and (13)).
In the case of the PRP, an example of matrices describing a quasi-optimal evolution is given in Figure 6. Notice that this time, m = 1 and the 16th state is the final one.

3.2.2. Aggregation of Datasets concerning M Optimal Evolutions of the Closed Loop

This subsection corresponds to step 2 of the design procedure. The controller’s optimal behaviour learning process needs data from an important number of optimal evolutions. Practically, the program ControlLoop_EA.m will be executed repeatedly M times (e.g., M = 200) in a simple loop described in the script LOOP_M_ControlLoop_EA.m. The objective is to create aggregate data structures and store the optimal trajectories and their CPs.
Figure 7 illustrates possible data structures, a cell array for M tables storing the trajectories, and a matrix storing their CPs.
The optimal CPs could also be stored in a cell array, but in the PRP case (m = 1), a matrix (M × H) stores these values more simply. We call it UstarRHC (200 × 15); each of its lines memorizes the transposed vector uRHC from Figure 6.
The performance index values are also stored in a column vector JMAT (M × 1) for different analyses. All these data structures can be saved in a file for later processing.
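The aggregation of step 2 amounts to a simple loop over the closed-loop simulations. The Python sketch below is illustrative (the paper runs the MATLAB script LOOP_M_ControlLoop_EA.m); `control_loop_ea` is a hypothetical callable standing in for one execution of ControlLoop_EA.m:

```python
def aggregate_runs(control_loop_ea, M):
    """Run the closed-loop simulation M times and aggregate the optimal
    trajectories (STATE), control profiles (UstarRHC), and performance
    indices (JMAT), mirroring the paper's data structures."""
    STATE, UstarRHC, JMAT = [], [], []
    for _ in range(M):
        traj, profile, J = control_loop_ea()  # one quasi-optimal evolution
        STATE.append(traj)       # trajectory: (H+1) states of n components
        UstarRHC.append(profile) # control profile: H values (m = 1 for PRP)
        JMAT.append(J)           # performance index of this evolution
    return STATE, UstarRHC, JMAT
```

Because the runs are offline and independent, they could also be executed in parallel without changing the aggregated result.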

3.2.3. Extraction of Datasets Characterizing Each Sampling Period

This subsection details step 3 of the design procedure. For each sampling period k, a dataset similar to Table 1 is extracted from STATE and UstarRHC data structures.
Considering k, $0 \le k \le H-1$, already fixed, we generate a matrix $SOCSK \in \mathbb{R}^{M \times (n+m)}$ that gathers the states and optimal control values concerning step k from all the M experiences. Line i, $1 \le i \le M$, concatenates data from experience i:
$SOCSK_i \equiv [\, X_i^*(k)^T \;\; U_i^*(k)^T \,].$
Using the data structures defined before, it holds
$SOCSK_i \equiv [\, STATE_i(k, 1{:}n) \;\; UstarRHC(i, k) \,].$
STATEi designates the ith component of the cell array STATE. In the PRP case, we have
$SOCSK = \begin{bmatrix} x_1(k)_1 & x_2(k)_1 & x_3(k)_1 & x_4(k)_1 & x_5(k)_1 & u(k)_1 \\ \vdots & & & & & \vdots \\ x_1(k)_i & x_2(k)_i & x_3(k)_i & x_4(k)_i & x_5(k)_i & u(k)_i \\ \vdots & & & & & \vdots \\ x_1(k)_M & x_2(k)_M & x_3(k)_M & x_4(k)_M & x_5(k)_M & u(k)_M \end{bmatrix}.$
Remark 5. 
Because the controller’s optimal behaviour has repercussions on each sampling period, the learning of its optimal behaviour will be split at the level of each interval [k, k + 1).
Figure 8 gives the flowchart of the data processing to obtain datasets for training and testing the machine learning algorithm at the level of each moment k (datakTrain and datakTest). To save these datasets for each k, we will use cell array DATAKTrain ( H × 1 ) and DATAKTest ( H × 1 ).
After constructing the matrix SOCSK, we will convert it into a table, which is more convenient for processing machine learning datasets in some programming and simulation systems. The result is the table named datak, which has variables and properties. After that, this table’s lines are split into the table datakTrain with 140 examples for training (70%) and the table datakTest with 60 data points for testing (30%). Finally, these tables are stored in cell #k of the DATAKTrain and DATAKTest arrays, respectively. They will be used later by the machine learning algorithm.
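The per-step extraction and the 70/30 split can be sketched in Python (illustrative; the paper's processing uses MATLAB tables and cell arrays, and its split need not be the simple deterministic cut shown here):

```python
def extract_step_dataset(STATE, UstarRHC, k, n):
    """Build the SOCSK rows for sampling period k: row i concatenates the
    n state components of experience i at step k with its optimal control."""
    rows = []
    for i in range(len(STATE)):
        rows.append(list(STATE[i][k][:n]) + [UstarRHC[i][k]])
    return rows

def split_train_test(rows, train_frac=0.7):
    """Deterministic 70/30 split; in practice the rows would be
    shuffled first so that train and test are comparable."""
    cut = int(round(train_frac * len(rows)))
    return rows[:cut], rows[cut:]
```

Applied for every k = 0, ..., H-1, this yields the H training/testing dataset pairs stored in DATAKTrain and DATAKTest.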

3.3. Construction of Machine Learning Models

This section covers step #4 of the design procedure. As stated by Remark 3, the set of functions $\Phi = \{ f_k \mid k = 0, 1, \dots, H-1 \}$ is the machine learning model for the optimal behaviour of the couple (EA and PM). The data points are couples (process state, optimal control values) related to moment k, from which the function $f_k$ can be learned.
Of course, another kind of machine learning model, one that treats the learning process globally (the PM—EA couple's control profile), could be addressed without splitting the learning at the level of sampling periods. The resulting model would be more complex and difficult to train and integrate into the controller.
In this work, we mainly chose multiple linear regression as a machine learning algorithm because of its simplicity. This characteristic is important, especially when H is large. Secondly, the linear regression functions f k for each sampling period are appropriate for a controller implementation; these functions directly give the control law (see Equation (14)). To emphasize this aspect, Algorithm 1 presents the general structure of the ML controller when using the set of functions Φ .
Algorithm 1. The structure of the controller’s algorithm using linear regression functions.
1  Get the current value of the state vector, X(k); /* Initialize k and X(k) */
2  U*(k) ← f_k(X(k)); /* see Equation (14) */
3  Send U*(k) towards the Process.
4  Wait for the next sampling period.
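One controller step of Algorithm 1 reduces to evaluating the regression function at the current state. A minimal Python sketch (the coefficient values below are hypothetical, for illustration only):

```python
import numpy as np

def ml_control_step(COEFF, k, X):
    """Evaluate the linear regression f_k at the current state (Algorithm 1, line 2).

    COEFF : H x (n+1) array; row k holds [C_k0, C_k1, ..., C_kn]
    X     : current state vector X(k), length n
    """
    c = COEFF[k]
    return c[0] + c[1:] @ X  # U*(k) = f_k(X(k))

# Hypothetical coefficients: n = 2 state variables, H = 1 sampling period
COEFF = np.array([[1.0, 2.0, -1.0]])
u = ml_control_step(COEFF, k=0, X=np.array([3.0, 4.0]))
print(u)  # 1 + 2*3 - 1*4 = 3.0
```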
We can determine the set of functions Φ via multiple linear regression considering different models containing an intercept, linear terms for each feature (predictor variable), products of pairs of distinct features (interactions), squared terms, etc. In other words, the resulting functions could be nonlinear as functions of process states.
In our case study, we will also apply the strategy of stepwise regression that adds or removes features starting from a constant model.

3.3.1. Models with Linear Terms for Each State Variable

Remark 6. 
Our objective is not to find the “best” set of linear regression models but to prove that our approach is working and this new model can replace the EA.
That is why, to begin with, we adopt a very simple model in which each regression function f_k involves only linear terms for each state variable and an intercept. Each model is trained and tested separately, considering the datasets already prepared in the cell arrays DATAKTrain and DATAKTest.
The construction of these models is presented by the pseudocode in Algorithm 2. The datasets for training and testing, described in Section 3.2.3, are now input data for this algorithm. The coefficients of each function f_k, having the form (15), are stored in an output matrix called COEFF. The models themselves are objects stored in an output cell array called MODEL (H × 1). Line #4 creates the model "mdl" using the function fitting_to_data, which fits the function (15) to the dataset "datakTrain". At line #6, the function get_the_coefficients extracts the six coefficients, which are then put in a line of matrix COEFF.
f_k(X(k)) = C_k0 + C_k1·x_1(k) + C_k2·x_2(k) + … + C_kn·x_n(k),   (15)
The predicted values corresponding to datakTest are stored in the vector "uPred" by the function fpredict, to be compared with the experienced values. Details concerning the implementation of the fitting_to_data, get_the_coefficients, and fpredict functions are given in Appendix B.
Algorithm 2. The pseudocode of the models’ construction.
/* This pseudocode describes the training and testing of the linear models set Φ */
Input:  cell arrays DATAKTrain, DATAKTest
Output: matrix COEFF (H × (n + 1)); cell array MODEL (H × 1), storing objects that are the linear models f_k
1   for k = 0 … H − 1
2     datakTrain ← DATAKTrain{k};
      /* Recover the dataset from the cell array if it was saved in a file */
3     datakTest ← DATAKTest{k};
      /* Recover the dataset from the cell array if it was saved in a file */
4     mdl ← fitting_to_data(datakTrain);
      /* Create the linear regression model that fits datakTrain */
5     # display mdl;
6     coef(:) ← get_the_coefficients(mdl);
7     COEFF(k,:) ← coef(:);
      /* Save the coefficients in the corresponding line of matrix COEFF */
8     MODEL{k,1} ← mdl;
9     uPred ← fpredict(mdl, datakTest);
      /* The predicted control values are stored in the vector uPred */
10    # Plot uPred and datakTest's last column in the same figure for comparison.
11  end
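The per-step fitting of Algorithm 2 can be sketched in Python, with ordinary least squares standing in for fitting_to_data (NumPy's lstsq plays the role of MATLAB's fitlm here; the synthetic data are illustrative):

```python
import numpy as np

def fit_linear_models(DATAKTrain):
    """Fit one linear model per sampling period (a least-squares version of Algorithm 2).

    DATAKTrain : list of H arrays, each Mtrain x (n+1); the last column is u*(k)
    Returns    : COEFF, an H x (n+1) array of rows [C_k0, C_k1, ..., C_kn]
    """
    COEFF = []
    for datak in DATAKTrain:
        X, u = datak[:, :-1], datak[:, -1]
        A = np.column_stack([np.ones(len(X)), X])   # prepend the intercept column
        coef, *_ = np.linalg.lstsq(A, u, rcond=None)
        COEFF.append(coef)
    return np.vstack(COEFF)

# Synthetic check: data generated by a known linear law is recovered exactly
rng = np.random.default_rng(0)
X = rng.normal(size=(140, 2))
u = 0.5 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
COEFF = fit_linear_models([np.column_stack([X, u])])
print(COEFF)  # approximately [[0.5, 2.0, -3.0]]
```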
For the PRP, e.g., the coefficients are given in Table A1 (Appendix B).
A comparison of the values in “datakTest” to the predicted values “uPred” is given in Figure 9 [34,36]. The blue line is the plot of test values against themselves.
Most of the 60 predicted values lie along the blue line, at a distance smaller than 0.2.

3.3.2. Models Constructed via Stepwise Regression

This section proposes a more elaborate model that can also include nonlinear terms such as interactions, i.e., products of predictor variables (e.g., x1·x4).
As an example, for the PRP case, we will apply the strategy of stepwise regression, which adds or removes features starting from a constant model. This strategy is usually implemented by a function stepwise(T) returning a model that fits the dataset in table T:
model ← stepwise(T)
The script GENERATE_ModelSW.m constructs the set of models using this function (see Appendix C). This script yields the regression functions given in Table 2, where the coloured terms are nonlinear.
A fragment of the listing generated by the script mentioned above is given in Figure 10.
We can see how the function stepwise works and which statistical parameters validate the model.
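For illustration, a simplified forward-selection sketch in Python is given below. MATLAB's stepwise strategy adds or removes terms based on F-test p-values; this sketch uses a fixed relative improvement threshold on the residual sum of squares instead, and the data are synthetic assumptions:

```python
import numpy as np

def forward_stepwise(X, u, min_improve=1e-3):
    """Greedy forward selection: start from a constant model and repeatedly add
    the candidate term that most reduces the residual sum of squares (RSS).
    MATLAB's stepwise uses F-test p-values; a fixed relative RSS-improvement
    threshold is used here as a simplified stand-in."""
    n = X.shape[1]
    chosen = []

    def rss(cols):
        A = np.column_stack([np.ones(len(X))] + [X[:, c] for c in cols])
        coef, *_ = np.linalg.lstsq(A, u, rcond=None)
        r = u - A @ coef
        return float(r @ r)

    best = rss(chosen)
    while True:
        cands = [c for c in range(n) if c not in chosen]
        if not cands:
            break
        scores = {c: rss(chosen + [c]) for c in cands}
        c_best = min(scores, key=scores.get)
        if best - scores[c_best] < min_improve * max(best, 1e-12):
            break
        chosen.append(c_best)
        best = scores[c_best]
    return chosen

# x0 and the interaction x0*x1 matter; the noise feature x2 should be skipped
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3))
feats = np.column_stack([Z, Z[:, 0] * Z[:, 1]])  # candidate terms incl. x0*x1
u = 1.0 + 2.0 * Z[:, 0] + 0.5 * Z[:, 0] * Z[:, 1]
chosen = forward_stepwise(feats, u)
print(sorted(chosen))  # [0, 3]
```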

4. Simulation of the Control-Loop System Equipped with the ML Controller

To evaluate the extent to which the set of functions Φ succeeded in "learning" how the EA acts as an optimal predictor, we refer to the simulation results of the control loop. The script ControlLoop_ML.m implements step #6 of the design procedure and is described by the pseudocode in Algorithm 3.
Algorithm 3. The pseudocode of the control loop simulation using linear regression functions.
ControlLoop_ML (MODEL, X0)
/* This pseudocode describes the simulation of the closed loop that uses the proposed controller (the linear models set Φ ) */
Input:  MODEL (H × 1), storing objects that are the linear models f_k;
        X0: the initial process state.
Output: the vector uML (1, H), representing the quasi-optimal CP;
        the matrix State (H + 1, n), representing the quasi-optimal trajectory.
1   # Initializations: technological bounds umin, umax;
2   # The matrix State will store X(k), k = 0, …, H. Initially, it is set to zero.
3   State(1,:) ← X0;
4   for k = 0 … H − 1
5     mdl ← MODEL{k}; /* mdl is a linear regression model */
6     uML(k) ← feval(mdl, X0(1), X0(2), …, X0(n));
7     # Limit the value of uML(k) to [umin, umax].
8     X0 ← step_PP_RH(uML(k), X0);
      /* Determine the new state into which the process evolves when the control value uML(k) is applied. */
9     State(k + 1,:) ← X0;
10  end
11  return uML and State
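Algorithm 3 can be sketched in Python with a toy stand-in for step_PP_RH (the dynamics, bounds, and coefficients below are illustrative, not the PRP model):

```python
import numpy as np

def control_loop_ml(COEFF, x0, step_pm, umin=0.0, umax=2.0):
    """Closed-loop simulation with the ML controller (a sketch of Algorithm 3).

    COEFF   : H x (n+1) coefficients of the per-step regression functions f_k
    step_pm : stand-in for step_PP_RH -- integrates the PM over one sampling period
    """
    H, n = COEFF.shape[0], len(x0)
    State = np.zeros((H + 1, n))
    State[0] = x0
    uML = np.zeros(H)
    x = np.asarray(x0, dtype=float)
    for k in range(H):
        u = COEFF[k, 0] + COEFF[k, 1:] @ x   # uML(k) = f_k(X(k))
        uML[k] = np.clip(u, umin, umax)      # enforce the technological bounds
        x = step_pm(uML[k], x)               # next state, at moment k + 1
        State[k + 1] = x
    return uML, State

# Toy stand-in PM: linear discrete dynamics (illustrative, not the PRP model)
step_pm = lambda u, x: 0.9 * x + np.array([u, 0.0])
COEFF = np.array([[0.5, 0.0, 0.0], [0.5, 1.0, 0.0]])
uML, State = control_loop_ml(COEFF, np.array([1.0, 1.0]), step_pm)
print(uML, State[-1])  # [0.5 1.9] [3.16 0.81]
```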
In line #6, the function feval returns the value uML(k) that the model mdl predicts when the current state is X0 (also a local variable). Instead of using feval, one can compute the dot product of the coefficients with the state vector.
The function step_PP_RH(uML(k), X0) calculates the next state (at the moment k + 1) by integration of state Equation (7). The codes for all the proposed functions are given in the folder ART_Math.
In the PRP case, we used ControlLoop_ML.m to simulate the control loop using the models constructed in Section 3.3.1. The ML controller yielded the CP drawn in Figure 11. This one engenders the quasi-optimal state evolution depicted in Figure 12b, which can be compared to the typical evolution, Figure 12a, generated by the RHC endowed with an EA. Figure 12a was produced within the authors’ previous work [21].
The resemblance between the two process responses is very high and proves that the set of functions f_k succeeded in emulating the optimal behaviour of the EA. Moreover, the performance index achieved by the CP in Figure 12b is very good (J0 = 32.0986) because it equals the maximum value recorded over the M data points generated by the EA. Hence, the PRP's solution found by the ML model is also quasi-optimal.

The Controller’s Execution Time

Because the initial motivation for this research was precisely to decrease the EA's execution time, we compared the execution times of the two controllers, the EA and ML controllers. Actually, we compared the simulation times of the two closed loops (with the EA and ML models, respectively), because the programs ControlLoop_EA.m and ControlLoop_ML.m had already been written. The simulations were carried out using the MATLAB 2023 system and an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60 GHz processor. The resulting times are 38 s and 0.08 s, respectively (see Appendix D). The average execution times of the controller are 38/H and 0.08/H seconds; H equals 15 in the PRP case.
Remark 7. 
The decrease in the controller's execution time is outstanding, all the more so because the ML controller keeps the evolution accuracy. In previous research, we obtained only a small decrease in execution time, accompanied by an acceptable decrease in accuracy.
How can we explain this outstanding decrease in the controller's execution time? The key is how the EA predictor works. The EA generates a population of solutions and then evaluates the cost function by numerical integration of the PM for each solution, over many generations. Integrating the PM numerically thousands of times is what consumes the most time. None of this happens when the ML model works: there is only a single evaluation of a simple regression function. The EA's computational complexity is huge compared to these few calculations.
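A back-of-the-envelope comparison makes this gap concrete; the population size, generation count, and integration-step figures below are purely illustrative assumptions, not measured values:

```python
# With a population of P solutions evolved over G generations, the EA integrates
# the PM P*G times per sampling period, each integration taking many solver steps;
# the ML controller evaluates one regression function instead.
P, G, ode_steps, n = 50, 100, 300, 5           # hypothetical EA settings
ea_ops_per_period = P * G * ode_steps * n      # rough operation count for the EA
ml_ops_per_period = n + 1                      # one intercept + n products
print(ea_ops_per_period // ml_ops_per_period)  # 1250000 -- a ratio of ~10^6
```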
The model developed in Section 3.3.2, denoted Φ2 (with Φ1 being the model constructed in Section 3.3.1), leads us to a new ML controller. This time, we used the script ControlLoop_MLSW.m (see Appendix C) to simulate the closed-loop system. Figure 13 plots the resulting CP as a blue line. The CP in Figure 11 is also plotted in red to facilitate the comparison.
The two CPs are practically identical, which engenders the similarity of the state evolutions. Figure 14 confirms this and plots the state evolutions produced by the two controllers.
This remarkable similarity arises because the models Φ1 and Φ2 emulate the same couple (EA-PM), and both do so very well. Consequently, the performance index is practically the same (J0 = 32.0986). Although the results are identical, the sets of regression functions are different. The function set Φ2 seems more appropriate for the controller's implementation because of its simpler formulas.
In real time, the biggest problem is the difference between the PM's state and the state of the real process. We can consider that the process is affected by a noise representing this difference. When the noise is important, the control loop could totally lose its functionality. Our desideratum is that the controller keep acceptable performance over a significant noise range. Inside this range, the controller must reject this difference as much as possible, produce a quasi-optimal process evolution, and achieve a performance index near J0 (see Appendix E).

5. Discussion

This paper positively answers the issue raised in Section 1: whether an ML algorithm could “learn” the optimal behaviour of the predictor based on an EA. The proof was made in the context of OCPs, giving rise to some findings:
  • The issue statement itself: to find an ML model experiencing the datasets generated by the couple (EA and PM), trying to capture its optimal behaviour. An EA is a product of computational intelligence. This link between two "intelligent" entities is interesting; further developments can be derived from this "equivalence". The same issue can be considered when another metaheuristic replaces the EA.
  • The dataset's construction as a dynamic trace of the EA predictor, by aggregating the trajectories and CPs. The number M is a parameter of the procedure, established according to the process complexity. The dynamic trace must include couples (state, optimal control value) spread throughout the evolution space. In this way, the ML model can generalize well.
  • The dataset extraction for each sampling period, which is a premise to find an ML model for each k.
  • The outstanding decrease in the ML controller’s execution time.
  • The design procedure for the ML controller and all associated algorithms (simulation and models’ construction algorithms).
Finding #3 is related to the fact that we are looking for an ML model comprising a set of regression functions Φ . We underline the motivation: its simplicity, especially when the control horizon has many sampling periods. If need be, obtaining the model during the sampling period is conceivable. On the other hand, a regression function for each sampling period means that the control law is directly implemented. The controller implementation is now straightforward.
The degree to which the ML controller accurately reproduces the behaviour of the couple (EA and PM) is matched by another achievement of the controller: its execution time. Because the initial motivation for this research was precisely to decrease the EA's execution time, this subject deserves more attention. The simulation time for the closed loop using the EA predictor in the PRP case is 38 s, while the same simulation using the ML controller takes 0.08 s. This outstanding achievement is due to three factors:
  • The ML model is split at the level of each sampling period; a single regression function f k is the current model;
  • A regression function has a very simple expression, which is, in fact, just the control law;
  • The PM’s numerical integration is totally avoided.
According to the authors' experience with this subject, a decrease in execution time is usually obtained by paying the price of a decrease in the predictions' accuracy. In the case of the ML controller, remarkably, that does not happen; the accuracy is kept.
As mentioned before, predictors based on metaheuristics are usually used for slow-process control. Owing to its small execution time, the ML controller largely extends the set of processes that can be controlled using an EA or another metaheuristic. The EA predictor must be implemented in the first phase of the controller's design, where it produces the datasets for the ML algorithm's training. Finally, the ML predictor is used as a faster equivalent of the EA predictor in the control structure.
Special attention was addressed to the implementation aspects related to the PRP case. All algorithms used in this work are implemented, the associated scripts are attached as supplementary materials, and all the necessary details are given in Appendix A, Appendix B, Appendix C, Appendix D and Appendix E. Readers can find support to apprehend and eventually reproduce parts of this work.
Finally, the ML models and the controller must fulfil their task in real conditions, namely when the control system works with a real process. Although the design procedure does not depend on the real process, an interested reader may ask how to forecast the closed-loop behaviour when the real process and the PM differ. At the end of Section 4 and in Appendix E, some elements can help to simulate this situation.
To simplify the presentation, we applied the proposed methods only in the PRP case. The methods were also applied in other case studies not presented in this paper, with the same favourable conclusion; the proposed ML models succeeded in apprehending the optimal behaviour of the couple (EA and PM) and yielded efficient controllers.
Obviously, another ML model treating globally the learning process is conceivable, without splitting the learning at the level of sampling periods. The resulting model would be more complex and difficult to train and integrate into the controller, but it could be useful in other applications. In a future research project, we will make investigations in this direction.

6. Conclusions

The ML algorithm presented in this paper is not in line with the control techniques developed inside the control systems theory, like the PID, adaptive, robust, nonlinear control, optimal control, MPC, RHC, etc. Their result is a controller that controls a process and fulfils the control objectives. The proposed ML model, stemming from computer science theory, is the “intelligent equivalent” of the optimal behaviour of the couple (EA and PM). We need the dataset capturing the couple’s optimal behaviour to obtain such a model. Consequently, we must implement the PM and the EA controller and simulate the closed loop running many times to obtain the dataset. In other words, we must implement the control technique, RHC with EA predictions, as a sine-qua-non condition to construct the ML model. Once available, the latter will replace the EA predictor to achieve the so-called ML controller.
Our proposed approach is suitable to solve our main problem, the predictor’s execution time decrease, owing to the following aspects:
- The controller's design procedure is feasible using only offline simulations of the closed loop (with EA and PM). The real process states' evolution is not needed in this phase. All we need is a large dataset that captures the response of the EA predictor in different states and times, regardless of whether the state belongs to a real or simulated process. The EA predictor acts in the same way because it uses only the PM in both situations.
- The regression functions are expressed straightforwardly by simple formulas, which are actually the control laws for each sampling period.
- The ML controller works very well in closed-loop mode; it generalizes accurately when the controller obtains the real process states.
- The controller's execution time decreases remarkably.
The simulations described and used in our work, except the one from Section 4, are parts of the controller’s design procedure. The result of this procedure, the ML controller, was tested in Section 4 to see if it generalizes well in a closed-loop structure with a real process. This time, we conducted a simulation study considering the process model identical to the PM. The process evolutions in both cases (with EA and ML controllers) were quasi-identical, proving that the ML model generalizes very accurately. The case when the PM is affected by an additive noise is also briefly addressed.
Obviously, the ML controller could be used successfully in a real-time control application. The PRP, used as an example, is not a theoretical problem but a real one. So far, many control techniques have been successfully implemented in real time, from numerical integration methods to RHC. As in many other chemical or biochemical processes, every batch must be optimally controlled (with the same parameters [27,28]). The authors have been involved in a real-time control application concerning microalgae growth [8] that could be addressed in a future project using the ML controller.
However, there is a more important challenge: the ML controller opens a new perspective for controlling fast processes (having small time constants) besides slow processes (such as biochemical processes). Even a small sampling period could be sufficient to evaluate a simple formula extracted from the ML model. So, the RHC structure with EA and PM can be extended to a wider range of processes by using the ML predictor in the implementation.
The main objective of the presented work was to prove that a machine learning algorithm could "learn" the optimal behaviour of the couple (EA and PM). The proposed algorithm is a multiple linear regression that helped us to fulfil our main objective; we proved that the optimal behaviour can be learned. The ML model is a set of regression functions, one for each sampling period. This choice was not made to avoid better, possibly harder-to-construct models. Its simplicity, materialized in a non-complex formula, makes it suitable, especially when the control horizon has many sampling periods. The second reason is that the regression function directly implements the control law for each sampling period. Computing the optimal control value involves simple and fast calculations. That is not the case for nonparametric techniques (support vector machines, decision trees, Gaussian process regression, neural networks, etc.), which must call prediction functions to return optimal control values.
Our future work on this topic could take two directions. The first refers to regularizing the proposed linear regression model (Ridge regression, Lasso, and Elastic Net). The second aims to construct a global ML model that is not split at the level of each sampling period. We will use nonparametric learning models (especially Gaussian process regression and neural networks). The advantages mentioned above will be lost, but we expect to obtain global ML models covering the entire control horizon for more complex PMs.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math12020187/s1, The archive “ART_Math.zip” contains the files mentioned in Appendix A, Appendix B, Appendix C, Appendix D and Appendix E.

Author Contributions

Conceptualization, V.M.; methodology, I.A. and V.M.; software, V.M.; validation, V.M. and I.A.; formal analysis, V.M.; investigation, I.A.; resources, I.A.; data curation, I.A.; writing—original draft preparation, V.M.; writing—review and editing, V.M. and I.A.; visualization, I.A.; supervision, I.A.; project administration, I.A.; funding acquisition, I.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data was included in the manuscript.

Acknowledgments

This work benefited from the administrative support of the Doctoral School of “Dunarea de Jos” University of Galati, Romania.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

The Prediction Function Using an EA

Figure A1. The flowchart of the prediction function using an EA (RHC_Predictor_EA.m).
The cost function values of the offspring are calculated inside the EA’s operators. Variable F is used to stop the iterative process towards the optimal solution. When the performance index is close to J0, the EA stops, considering that a very good solution is found, and the best solution becomes the predicted control sequence.

Appendix B

The Models’ Construction Script

Our implementation is based on the MATLAB system, in which "fitlm", "model.Coefficients", and "predict" correspond to fitting_to_data, get_the_coefficients, and fpredict (the functions proposed in Section 3.3.1), respectively. The algorithms in Figure 8 and Algorithm 2 are joined and implemented by the script Model_Construction.m. The coefficients found by this script are listed in the table below.
Table A1. The coefficients of the linear regression fk.
k    C0        C1              C2              C3        C4         C5
0    −38.079   0               0               14.128    6.0244     −6.0264
1    5.6291    −1.3608 × 10^7  2.4367 × 10^6   −3.7392   −0.47639   1.139
2    −38.33    3.0713 × 10^5   2.2759 × 10^5   21.679    1.5663     1.8422
3    −28.268   −40,144         4487.7          14.638    1.1171     1.2135
4    −29.163   1.6973 × 10^5   −1.6184 × 10^5  14.311    1.2206     0.63983
5    −88.374   95,097          1.0768 × 10^5   40.145    4.0721     0.78088
6    −25.715   3.5858 × 10^5   −3.7572 × 10^5  13.047    1.5373     −1.0067
7    −62.516   −4.657 × 10^5   4.4453 × 10^5   29.336    4.0186     −2.0782
8    45.254    −1.387 × 10^7   1.3915 × 10^7   −13.454   −0.83017   −2.415
9    504.78    −2.211 × 10^6   2.2891 × 10^6   −192.32   −23.957    −1.8747
10   −236.04   −82.588         11.879          95.084    12.898     −1.3842
11   −147.92   0.58548         0.29874         57.825    7.5075     −0.34759
12   93.46     −0.37731        0.76193         −35.138   −4.7889    0.094402
13   34.849    −1.0111         0.66035         −13.353   −0.79003   0.1234
14   4.0848    −0.30413        0.29959         −1.6444   −0.43335   0.11348

Appendix C

Implementation of the Stepwise Strategy

The stepwise strategy is implemented via the function "stepwiselm", which adds terms to or removes terms from the current model. The set of available terms is examined, and the term whose F-test for addition has a p-value of 0.05 or less is added to the model. If no terms can be added, the function evaluates the terms already in the model and removes the one whose F-test for removal has a p-value of 0.10 or greater. This process stops when no terms can be added or removed.
The models from Section 3.3.2 are generated using this function for each k in the script GENERATE_ModelSW.m. The latter has the same structure as Model_Construction.m, except the function stepwiselm is used instead of fitlm.
To simulate the closed loop, we used the script ControlLoop_MLSW.m, which is very similar to ControlLoop_ML.m but uses a different mat-file. Finally, it calls the script DRAW_ML to plot the CP and the state evolution.

Appendix D

Execution Time

Because the ControlLoop_ML is conceived as a function (pseudocode in Algorithm 3), the closed-loop simulation is made by the script SIM1.m, which calls it after loading the ML model. The simulation processor is Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz.
Finally, the comparison is made between the execution times of two programs, ControlLoop_EA.m and SIM1.m. The results are 38 s and 0.08 s, respectively.

Appendix E

The ML Controller's Capacity to Reject Noises Affecting the Process State

The general problem of noise modelling depends on the process specificity and is beyond the scope of this work. Our immediate goal is to provide a simple technique to test the existence of a noise range within which the controller keeps acceptable performance.
In our simulation, we considered that the noise is added to the PM's state variables. For our example, the noise n_i, equivalent to all influences, is added to the PM's state vector:
x_i(k) ← x_i(k) + n_i,   i = 1, …, 5
We adopted the hypothesis that the noise is a random variable taking values in the interval [−L_i, L_i], where
L_i = p·|x_i(k)|,   0 < p < 1
Hence, the noise depends on the state variable's absolute value, which avoids annulling its influence. We chose p = 4%, which means an 8% interval for placing the noise. The controller keeps the regression models from Figure 10.
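The noise injection above can be sketched in Python (p = 4% as in the text; drawing n_i uniformly in [−L_i, L_i] is an assumption, since only the interval is specified):

```python
import numpy as np

def add_state_noise(x, p=0.04, rng=None):
    """Add the Appendix E noise: n_i drawn from [-L_i, L_i], with L_i = p * |x_i(k)|.
    (Uniform sampling is an assumption; the text only specifies the interval.)"""
    if rng is None:
        rng = np.random.default_rng()
    L = p * np.abs(x)
    return x + rng.uniform(-L, L)

x = np.array([2.0, -1.0, 0.5, 4.0, 1.5])                 # a hypothetical PM state
x_noisy = add_state_noise(x, rng=np.random.default_rng(0))
print(np.all(np.abs(x_noisy - x) <= 0.04 * np.abs(x)))   # True
```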
These elements were inserted into the control loop simulation script (ControlLoop_ML_noise.m), which was run many times. Table A2 presents only three CPs and the associated performance indices, in order to evaluate whether the control loop has a margin of robustness to noise.
Table A2. Three simulations of the control loop with noise.
Control Profile | J
uML1 = [0.1570 0.2926 0.0000 0.8799 0.5285 0.8301 1.1906 1.4888 1.6225 2.0000 0.0894 0.8711 0.9350 0.9830 1.2488] | 29.6884
uML2 = [0.1570 0.2842 0.3744 0.3296 0.6528 0.7238 1.2591 1.4624 1.5555 2.0000 0.1898 0.8700 0.9106 0.9412 1.1815] | 28.4537
uML3 = [0.1570 0.2831 0.0873 0.7395 0.5383 0.7105 1.2914 1.4879 1.6041 1.9905 0.0962 0.8693 0.8938 0.9029 1.1856] | 31.8093
The state evolution for the two first simulations is presented in Figure A2.
After simulations, some conclusions can be drawn:
- The controller succeeded in keeping stability, but the state evolution changed its shape compared to Figure 12b.
- The value of J is generally smaller than J0 and differs between simulations because the noise takes random values.
- Technological aspects must be considered when deciding whether or not the controller works acceptably.
Figure A2. The state evolution for two simulations with additive noise (a) CP = uML1; (b) CP = uML2.

References

  1. Siarry, P. Metaheuristics; Springer: Berlin/Heidelberg, Germany, 2016; ISBN 978-3-319-45403-0. [Google Scholar]
  2. Talbi, E.G. Metaheuristics—From Design to Implementation; Wiley: Hoboken, NJ, USA, 2009; ISBN 978-0-470-27858-1. [Google Scholar]
  3. Kruse, R.; Borgelt, C.; Braune, C.; Mostaghim, S.; Steinbrecher, M. Computational Intelligence—A Methodological Introduction, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
  4. Faber, R.; Jockenhövelb, T.; Tsatsaronis, G. Dynamic optimization with simulated annealing. Comput. Chem. Eng. 2005, 29, 273–290. [Google Scholar] [CrossRef]
  5. Onwubolu, G.; Babu, B.V. New Optimization Techniques in Engineering; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
  6. Valadi, J.; Siarry, P. Applications of Metaheuristics in Process Engineering; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; pp. 1–39. [Google Scholar] [CrossRef]
  7. Minzu, V.; Riahi, S.; Rusu, E. Optimal control of an ultraviolet water disinfection system. Appl. Sci. 2021, 11, 2638. [Google Scholar] [CrossRef]
  8. Minzu, V.; Ifrim, G.; Arama, I. Control of Microalgae Growth in Artificially Lighted Photobioreactors Using Metaheuristic-Based Predictions. Sensors 2021, 21, 8065. [Google Scholar] [CrossRef] [PubMed]
  9. Abraham, A.; Jain, L.; Goldberg, R. Evolutionary Multiobjective Optimization—Theoretical Advances and Applications; Springer: Berlin/Heidelberg, Germany, 2005; ISBN 1-85233-787-7. [Google Scholar]
  10. Hu, X.B.; Chen, W.H. Genetic algorithm based on receding horizon control for arrival sequencing and scheduling. Eng. Appl. Artif. Intell. 2005, 18, 633–642. [Google Scholar] [CrossRef]
  11. Hu, X.B.; Chen, W.H. Genetic algorithm based on receding horizon control for real-time implementations in dynamic environments. In Proceedings of the 16th Triennial World Congress, Prague, Czech Republic, 4–8 July 2005; Elsevier IFAC Publications: Amsterdam, The Netherlands, 2005. [Google Scholar]
  12. Mayne, D.Q.; Michalska, H. Receding Horizon Control of Nonlinear Systems. IEEE Trans. Autom. Control 1990, 35, 814–824. [Google Scholar] [CrossRef]
  13. Attia, S.A.; Alamir, M.; De Wit, C.C. Voltage Collapse Avoidance in Power Systems: A Receding Horizon Approach. Intell. Autom. Soft Comput. 2006, 12, 9–22. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 1. The control structure using EAs.
Figure 2. The state trajectory yielded by a control profile.
Figure 3. A set of M quasi-optimal trajectories produced by closed-loop simulation using a controller based on the EA and PM.
Figure 4. The closed-loop simulation to generate a quasi-optimal trajectory and its CP.
Figure 5. The matrices for the quasi-optimal trajectory and its CP.
Figure 6. An example of matrices for the optimal trajectory and its CP (PRP case).
Figure 7. STATE and UstarRHC: data structures representing the M optimal trajectories and CPs.
Figure 8. Preparing the training and testing datasets for machine learning at the level of each sampling period.
Figure 9. Predicted data and test data of the regression model for k = 4.
Figure 10. Construction of the regression model by the stepwise function for k = 2.
Figure 11. The CP achieved by the ML controller, linear regression version.
Figure 12. PRP: comparison of state evolutions: (a) prediction with the EA using the RHC; (b) prediction achieved by machine learning using a set of linear regression functions.
Figure 13. The CP achieved by the ML controller, stepwise regression version.
Figure 14. Comparison between the state evolutions produced by the two ML controllers.
Table 1. Dataset for step k.

X^T          U^T
X_1*(k)^T    U_1*(k)^T
…            …
X_M*(k)^T    U_M*(k)^T
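As a minimal sketch of the dataset preparation described above, the fragment below builds the Table 1 dataset for one sampling period k from the aggregated structures STATE (M state trajectories) and UstarRHC (M optimal control profiles). The names STATE and UstarRHC follow Figure 7; the sizes and the random placeholder data are illustrative assumptions, not values from the paper.

```python
import random

# Placeholder aggregated data: M trajectories of n-dimensional states over
# K+1 sampling instants, and M control profiles of length K (cf. Figure 7).
random.seed(0)
M, n, K = 50, 5, 15
STATE = [[[random.random() for _ in range(K + 1)] for _ in range(n)]
         for _ in range(M)]
UstarRHC = [[random.random() for _ in range(K)] for _ in range(M)]

def dataset_for_step(k):
    """Return (X, U): rows X_i*(k)^T paired with targets U_i*(k), i = 1..M."""
    X = [[STATE[i][j][k] for j in range(n)] for i in range(M)]
    U = [UstarRHC[i][k] for i in range(M)]
    return X, U

# Dataset used to train/test the regression model for k = 4 (cf. Figure 9).
X4, U4 = dataset_for_step(4)
```

Each call extracts one dataset of M input-output pairs; repeating it for k = 0, …, K − 1 yields the per-period datasets on which the regression functions are learned.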
Table 2. The regression functions following the stepwise strategy.

k     f_k(X)
0     f_k = −10.449 + 10.606∙x2
1     f_k = 0.72997 − 0.09299∙x4
2     f_k = −14.476 + 3.0317∙x3 + 1.3432∙x5 + 4.9049∙x3∙x5
3     f_k = −1.5603 + 1.4141∙x2
4     f_k = −1.7596 + 1.552∙x2
5     f_k = −2.351 + 1.9328∙x2
6     f_k = 4.4274 − 0.9024∙x5
7     f_k = 4.8791 − 0.72412∙x5
8     f_k = 4.3452 − 0.44551∙x5
9     f_k = 36.628 − 8.1574∙x2 − 2.5122∙x5
10    f_k = 4.2061 + 713.02∙x2 − 0.41558∙x5 − 95.461∙x2∙x5
11    f_k = 0.87113 − 0.60908∙x1
12    f_k = 1.0149 − 0.30622∙x1
13    f_k = 1.2692 − 0.33849∙x1
14    f_k = −0.19244 − 0.41509∙x1 + 0.41057∙x2 + 0.10083∙x5
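To illustrate how the ML controller uses these functions, the sketch below evaluates f_k on the current state x = (x1, …, x5): at sampling period k, the control value is simply f_k(x). Only k = 0, 1, and 14 are transcribed from Table 2; the example state vector is a made-up value, not data from the case study.

```python
# Stepwise-regression version of the ML controller (illustration only):
# each sampling period k has its own regression function f_k from Table 2.
def f0(x):
    return -10.449 + 10.606 * x[1]                  # uses x2 (index 1)

def f1(x):
    return 0.72997 - 0.09299 * x[3]                 # uses x4

def f14(x):
    return (-0.19244 - 0.41509 * x[0]               # uses x1, x2, x5
            + 0.41057 * x[1] + 0.10083 * x[4])

x = [1.0, 1.2, 0.8, 2.0, 3.0]   # hypothetical state (x1..x5, 0-based)
u0 = f0(x)                      # control applied during period k = 0
```

Because each f_k is a closed-form linear expression, computing the control avoids any numerical integration of the PM, which explains the drastic drop in execution time reported in the abstract.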
Mînzu, V.; Arama, I. A Machine Learning Algorithm That Experiences the Evolutionary Algorithm’s Predictions—An Application to Optimal Control. Mathematics 2024, 12, 187. https://doi.org/10.3390/math12020187