Article

Multivariable System Identification Method Based on Continuous Action Reinforcement Learning Automata

School of Information Science and Technology, Beijing University of Chemical Technology, No. 15, Beisanhuan East Road, Beijing 100029, China
*
Author to whom correspondence should be addressed.
Processes 2019, 7(8), 546; https://doi.org/10.3390/pr7080546
Submission received: 19 June 2019 / Revised: 11 August 2019 / Accepted: 12 August 2019 / Published: 17 August 2019
(This article belongs to the Special Issue Multivariable Control and Object-Oriented Modeling)

Abstract

In this work, a closed-loop identification method based on a reinforcement learning algorithm is proposed for multiple-input multiple-output (MIMO) systems. This method could be an attractive alternative solution to the problem that the current frequency-domain identification algorithms are usually dependent on the attenuation factor. With this method, after continuously interacting with the environment, the optimal attenuation factor can be identified by the continuous action reinforcement learning automata (CARLA), and then the corresponding parameters could be estimated in the end. Moreover, the proposed method could be applied to time-varying systems online due to its online learning ability. The simulation results suggest that the presented approach can meet the requirement of identification accuracy in both square and non-square systems.

1. Introduction

With the rapid development of modern industry, it has become increasingly difficult for traditional model-based control methods to properly control complex processes because of uncertainty, time delay, multivariable coupling, and constraints between input and output. It is also a challenge for traditional identification methods to obtain optimal results, particularly in multivariable systems, which in industrial applications have complex structures, many parameters, and time-varying behavior. Methods for the identification of multivariable systems go back to the 1960s, but most of them require the observations to be noise-free. This requirement, together with their heavy computational cost, makes them difficult to apply in practice [1]. In view of these problems, many researchers proposed replacing the state space model with a polynomial matrix to describe the multivariable system. In 1975, Guidorzi [2] proposed the equivalence relation between the observable canonical form of the system state and the canonical form of an input–output difference equation, and established the mapping relation between the parameters of the state equation and the observed input–output data. Subsequently, some researchers proposed row subspace identification methods based on the Hankel matrix. In these methods, the first step is to obtain the augmented observability matrix (or state sequence) of the system, and then the parameter matrix of each subspace is calculated. The main representative methods include multivariable output error state space (MOESP) [3,4], numerical algorithms for subspace state-space system identification (N4SID) [5,6], and canonical variate analysis (CVA) [7,8]. At first, these subspace identification methods were difficult to apply online and were very computationally intensive in recursive decomposition.
With the development of technology, they have been widely applied in industrial practice due to the advantage that they do not depend on any prior knowledge in the identification of the multivariable system. However, the objective of multivariate identification methods based on state space models is not to find the optimal solution, but the Pareto suboptimal solutions, so there is substantial room for improvement in the performance of these methods.
Methods for the identification of multivariable systems based on state space are computationally demanding and slow, and it is difficult for them to reach the global optimal solution. Therefore, many researchers have focused on identification methods based on the transfer function matrix or other models (i.e., transforming the state space model into a transfer function model through the Laplace transform). Researchers have proposed several methods for the transfer function matrix modeling of multivariable systems, such as the instrumental model method, the sub-sub model recursive method, the combined identification algorithm (CIA), the extended least squares method, and the multi-innovation recursive identification method, which have greatly advanced multivariable system identification based on the transfer function matrix model. With the instrumental model, the identification of the transfer function model can be decomposed into several sub-models, and this approach has attracted increasing attention. In [9], the least squares method was used to eliminate the bias in a multi-input single-output (MISO) system, but this method is still difficult to apply online, and it has harsh preconditions and assumptions. Ding et al. [10] proposed a bias-compensation-based recursive least-squares algorithm to solve the identification problem in non-stationary systems. On the basis of the nearest Kronecker product and low-rank approximation, Camelia et al. [11] proposed a low-complexity recursive least-squares (RLS) algorithm, which is robust against additive noise and identifies the system well. The instrumental model [12] can solve the problem of the unknown information vector in the model. Du et al. [13] proposed a robust output error model identification method, which uses the auxiliary model to estimate the noiseless output under random noise and is suitable for time-delay industrial processes under load disturbance.
In order to eliminate the interference of abnormal points in the observation, the multi-innovation concept was introduced to accelerate the convergence of the model [14]. Based on the sub-model (MISO), Li et al. [15] proposed a sequential excitation method for a multiple-input multiple-output (MIMO) system. With sequential excitation signals, the multiple-input multiple-output (MIMO) system could be decomposed into several equivalent single-input single-output (SISO) systems in an open-loop control.
In recent years, the open-loop step response test has been a common method in the field of system identification because it is simple to operate and easy to implement. However, there are still some issues: (1) Open-loop identification is usually not permitted in field systems. For reasons of safety and economy, the output of the controller must stay within a limited range once the system is running. (2) The input signal is usually limited to a step-function signal, and the test time is limited to the transition time. The test is often affected by external disturbances and by changes in internal working conditions. Therefore, the authors of [16,17,18,19] proposed frequency response estimation methods for the identification of the closed-loop system. However, the existing methods still have some limitations (e.g., the accuracy depends on the choice of attenuation factor). The value of the attenuation factor is usually given according to prior knowledge. Jin et al. [20] proposed an intelligent searching algorithm to obtain the range of the attenuation factor, and achieved good results. However, large-scale complex systems are time-varying, and the search for the attenuation factor easily converges prematurely to a local optimum.
Learning automata (LA) have potential applications in system control. Li et al. [21] proposed a coral reef algorithm based on LA for the coverage control problem of heterogeneous directional sensor networks. Mohammed et al. [22] developed a fuzzy maximum power point tracking controller using the information collected by LA through the learning process. Thus, a novel method based on a reinforcement learning algorithm, namely continuous action reinforcement learning automata (CARLA), is presented here to solve the issue mentioned above. After continuously interacting with the environment, CARLA can find an optimal attenuation factor, and then the system parameters can be estimated. Moreover, the proposed method can be used online and applied to time-varying systems due to its online learning ability.

2. Materials and Methods

2.1. Background

Frequency response methods can be applied to the parameter estimation of open-loop and closed-loop systems, but their accuracy depends on the value of the attenuation factor, which is usually obtained from prior knowledge. Some researchers [20,23] proposed the use of heuristic search algorithms (e.g., particle swarm optimization (PSO) and its improved variants) to obtain the optimal value. However, these algorithms have difficulty with the online identification of time-varying systems. An online adaptive learning method is required to find the optimal solution of the parameters within the continuous space. To solve these problems, a frequency response estimation (FRE) method based on continuous action reinforcement learning automata (CARLA-FRE) is proposed in this paper.

2.2. Basic Reinforcement Learning

As an important branch of machine learning, reinforcement learning (RL) [24] interacts with the environment actively and constantly, iteratively updates its strategy based on feedback, and finally gives the optimal strategy. Its main elements are the agent, state, action, and reward, and its learning target is to obtain the optimal strategy that maximizes the long-term cumulative reward. As shown in Figure 1, the most important feature of reinforcement learning is the capability of autonomous and online learning without any prior knowledge or state transition probability. First, the agent perceives the state of the environment and takes exploratory actions according to the cumulative reward. After the action is taken, the environment undergoes a state transition and enters a new state. At the same time, the behavior strategy is evaluated, and the feedback is returned to the learning system. After receiving the reward or punishment, the agent modifies its strategy continuously to meet the requirements of the environment, and the whole process is iterated until the optimal strategy is obtained.

2.3. Continuous Action Reinforcement Learning Automata (CARLA)

The target of system identification here is to obtain the appropriate attenuation factor, which is a problem of finding the optimal action. Because the action space is continuous, continuous action reinforcement learning automata (CARLA) [25,26] are adopted in this paper. Compared with other algorithms, CARLA can use the probability density function to select behavior in a continuous space with a stochastic or unknown system model. The system learns interactively with the environment in a trial-and-error manner, obtains better behavior strategies from reinforcement signals, increases the probability of good actions through strategy iteration, and finally obtains the optimal parameters online [27,28,29,30].
For CARLA, each action $x$ is associated with a cumulative probability density function (CPDF), denoted $f(x)$. With the reinforcement signal $\beta$, the density functions are updated many times, and the optimal decision variable is the one with the maximum of the corresponding CPDF. During the iteration, the reinforcement signal is determined by the evaluation function of the last iteration. Therefore, the whole process of learning and updating always moves in the direction of better results, and the ultimate goal is to achieve long-term rewards rather than one-step rewards, so as to ensure global optimality. The learning process is as follows:
CARLA algorithm
1: Initialize the probability density function $f_0(x_i)$: establish a uniform distribution for each CPDF according to the range of the parameter;
2: Action selection: select actions (or parameters) randomly based on the CPDF value;
3: System evaluation: take the action, substitute the parameters into the system to obtain the response curve, and calculate the fitness function $J(x_i)$;
4: Calculate the reinforcement signal $\beta$ according to the value of the fitness function;
5: Update each CPDF value according to the reinforcement signal;
6: Update behavior parameters: use a normal random number generator to update the action parameters for the next iteration;
7: If the stopping condition has not been reached, return to step 2 and repeat until the convergence condition is met.
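
The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the density is stored on a uniform grid, actions are drawn by inverse-CDF sampling, the reinforcement signal is computed over a sliding window of recent costs, and the Gaussian reinforcement of the density follows the usual CARLA form. The names `carla_minimize`, `grid_size`, and `window` are hypothetical.

```python
import numpy as np

def carla_minimize(cost, x_min, x_max, iters=300, g_h=0.3, g_w=0.02,
                   grid_size=1001, window=500, seed=0):
    """Grid-based CARLA sketch for one scalar action in [x_min, x_max]."""
    rng = np.random.default_rng(seed)
    x = np.linspace(x_min, x_max, grid_size)
    dx = x[1] - x[0]
    f = np.full(grid_size, 1.0 / (x_max - x_min))  # step 1: uniform density
    lam = g_h / (x_max - x_min)                    # height of the Gaussian bump
    sigma = g_w * (x_max - x_min)                  # width of the Gaussian bump
    costs = []
    for _ in range(iters):
        # Step 2: draw an action by inverting the cumulative distribution.
        F = np.cumsum(f) * dx
        F /= F[-1]
        r = np.interp(rng.random(), F, x)
        # Step 3: evaluate the action.
        J = cost(r)
        costs.append(J)
        recent = np.array(costs[-window:])
        J_mean, J_min = recent.mean(), recent.min()
        # Step 4: reinforcement signal clipped to [0, 1].
        denom = J_mean - J_min
        beta = 0.0 if denom <= 0 else min(max(0.0, (J_mean - J) / denom), 1.0)
        # Step 5: reinforce the density around r, then renormalize.
        f = f + beta * lam * np.exp(-((x - r) ** 2) / (2.0 * sigma ** 2))
        f /= np.sum(f) * dx
    # Steps 6-7 are the loop itself; report the mode of the learned density.
    return x[np.argmax(f)], x, f
```

For example, `carla_minimize(lambda a: (a - 2.0) ** 2, 0.0, 5.0)` gradually concentrates probability mass around actions with low cost. The grid representation keeps the sketch short; it is the integration over this grid at every iteration that the improved update in Section 3.1 is meant to avoid.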

3. Frequency Response Estimation Based on CARLA (CARLA-FRE)

3.1. Frequency Response Estimation Based on CARLA (CARLA-FRE)

To improve the learning efficiency of the algorithm, the threshold value of the attenuation factor obtained in [31] is used as the initial region of CARLA (i.e., $x_i \in [x_{\min}^{i}, x_{\max}^{i}]$). Then, using the interactive learning ability of CARLA, the optimal parameters can be obtained online to improve the accuracy and effectiveness of frequency response estimation. The learning process is shown in Figure 2, and the details of the algorithm are as follows:
(1) Selection of the test signal: Input functions that are twice continuously differentiable can be used as the excitation signals for frequency response estimation. Therefore, there are many choices of test signal, including the step signal $r(t) = c$, the pulse signal $r(t) = 1/\delta(t)$, the exponential attenuation signal $r(t) = e^{-kt}$, and the composite function $r(t) = t e^{-kt}$.
(2) Selection of the test mode: The test can be run as open-loop or closed-loop identification, according to the process requirements and environmental conditions. If the system is asymptotically stable and the external disturbance is small or regular, a simple open-loop test may be a good choice. If the process demands high safety and stability, incurs large shutdown losses, or is sensitive to external disturbance, the closed-loop approach is necessary.
(3) Selection of the model structure: The establishment of a system model includes structure selection and parameter identification. The former selects the model structure according to the characteristics of the process, including model order and time delay. The latter estimates model parameters on the basis of the model structure. This paper focuses on the problem of model parameter estimation.
Steps (1)–(3) can be regarded as the preparation stage of the test, and they strongly affect the accuracy of system identification.
(4) Analysis of system parameters: This part focuses on the frequency response estimation method. The mathematical expressions relating the system parameters to the input and output can be derived by analyzing the transfer function of the system. Then, the frequency response of the expression can be obtained by substituting $s = a + jw$, where $a$ is the attenuation factor. With an appropriate attenuation factor $a$, the parameter estimates can be obtained. The traditional approach is to set the value of $a$ directly from experience, or to use a heuristic search algorithm. However, in practice the system often has time-varying characteristics, and the search for the attenuation factor easily converges prematurely to a local optimum. Therefore, an online learning method is required to acquire appropriate parameters adaptively.
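
To illustrate the role of the attenuation factor in the substitution $s = a + jw$, the sketch below estimates a transfer function value from sampled input and output records by evaluating the Laplace integrals numerically. It is an assumption-laden illustration, not the estimator derived in the paper: the function name `damped_frequency_response` is hypothetical, the records are assumed to start at rest at $t = 0$, and the integrals are truncated at the end of the record, which is only accurate when $e^{-at}$ has decayed by then.

```python
import numpy as np

def damped_frequency_response(t, u, y, a, w):
    """Estimate G(a + jw) = Y(s) / U(s) from sampled records u(t), y(t)."""
    t = np.asarray(t, dtype=float)
    s = a + 1j * w
    kernel = np.exp(-s * t)          # e^{-at} damps the tail of the record
    dt = np.diff(t)

    def laplace(signal):
        g = np.asarray(signal, dtype=float) * kernel
        # Trapezoidal rule for the truncated Laplace integral.
        return np.sum(0.5 * (g[1:] + g[:-1]) * dt)

    return laplace(y) / laplace(u)

# Step response of an illustrative first-order system K / (T s + 1).
t = np.linspace(0.0, 200.0, 200001)
K, T = 2.0, 5.0
u = np.ones_like(t)
y = K * (1.0 - np.exp(-t / T))
estimate = damped_frequency_response(t, u, y, a=0.1, w=0.5)
exact = K / (T * (0.1 + 0.5j) + 1.0)
```

With $a = 0$ the integral of a step input does not converge, whereas a very large $a$ discards most of the record, which is one way to see why the estimate is sensitive to the choice of $a$.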
(5) Online optimization of attenuation factor based on CARLA:
a. Initialize the attenuation factor $a(0)$. According to the literature, the range of the attenuation factor can be set in advance as $a \in [a_{\min}, a_{\max}]$. For each iteration $k$, the action $a(k)$ is selected according to the probability density function $f(a, k)$. At the start, the probability density function is initialized as the uniform distribution in Formula (1):
$$f(x_i, 0) = \begin{cases} \dfrac{1}{x_{\max}^{i} - x_{\min}^{i}}, & x_i \in [x_{\min}^{i}, x_{\max}^{i}] \\ 0, & x_i \notin [x_{\min}^{i}, x_{\max}^{i}]. \end{cases} \tag{1}$$
b. Select the attenuation factor. The attenuation factor $a$ is selected according to the value of the probability distribution function, as in Formulas (2) and (3); the attenuation factor with the maximum value of the probability distribution function $f(a, k)$ is taken as the best.
$$F(x_i, k) = \int_{0}^{x_i} f(x_i, k)\, dx_i \tag{2}$$
$$x_i^{*} = \arg\max_{x_i} F(x_i, k) \tag{3}$$
c. Calculate the cost function $J(a, k)$. The cost under the current attenuation factor $a$ is calculated as in Formula (4). It is the integral square error (ISE) between the real output and the estimated output of the system.
$$J(X, k) = \int (\Delta e)^2 \, dt \tag{4}$$
$$J(X, k) = \int (\Delta e)^2 \, dt \approx \sum_{T=1}^{M} (Y_T - y_T)^2 \tag{5}$$
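
The discrete form of the ISE cost can be computed directly from sampled outputs. The following is a small sketch; the function name `ise_cost` and the sample values are illustrative assumptions, not data from the paper.

```python
import numpy as np

def ise_cost(y_real, y_est):
    """Sum of squared errors between the real and estimated output samples."""
    y_real = np.asarray(y_real, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    if y_real.shape != y_est.shape:
        raise ValueError("both output records must have the same length")
    return float(np.sum((y_real - y_est) ** 2))

# Two short, made-up output records: errors are 1, 0, and -2.
cost = ise_cost([1.0, 2.0, 3.0], [0.0, 2.0, 5.0])  # 1 + 0 + 4
```

Because the sum omits the sampling interval, costs are only comparable between records sampled at the same rate, which is the case when all candidate attenuation factors are evaluated on the same test data.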
d. Calculate the reinforcement signal $\beta(k)$. By substituting the cost value $J(a, k)$, the average cost value $J_{mean}$, and the minimum cost value $J_{min}$ into Formula (6), the reinforcement signal $\beta(k)$ can be calculated to measure the performance of the evaluated action. The closer the value of $\beta(k)$ is to 1, the better the performance of the system under the attenuation factor. Conversely, the closer the value of $\beta(k)$ is to 0, the worse the performance of the system under the attenuation factor.
$$\beta(k) = \min\left\{ \max\left\{ 0, \frac{J_{mean} - J(X, k)}{J_{mean} - J_{min}} \right\}, 1 \right\} \tag{6}$$
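
As a quick numerical illustration with made-up values (not taken from the simulations in this paper), suppose $J_{mean} = 3$ and $J_{min} = 1$. Then

```latex
\begin{aligned}
J(X,k) = 2 &: \quad \beta(k) = \min\{\max\{0, \tfrac{3-2}{3-1}\}, 1\} = 0.5, \\
J(X,k) = 4 &: \quad \beta(k) = \min\{\max\{0, \tfrac{3-4}{3-1}\}, 1\} = 0, \\
J(X,k) = 1 &: \quad \beta(k) = \min\{\max\{0, \tfrac{3-1}{3-1}\}, 1\} = 1.
\end{aligned}
```

An action that is worse than average therefore receives no reinforcement, while an action that matches the best cost seen so far receives the full reinforcement.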
e. Update the value of the probability density function corresponding to the current attenuation factor $a$. After the system takes an action that changes the environment, the environment gives feedback on this policy, and the probability density function corresponding to the action is updated through the reinforcement signal. The probability density function $f(a, k+1)$ corresponding to the selected attenuation factor is updated according to Formula (7):
$$f(x_i, k+1) = \begin{cases} \alpha_k \left( f(x_i, k) + \beta(k) H(x_i, r) \right), & x_i \in [x_{\min}^{i}, x_{\max}^{i}] \\ 0, & x_i \notin [x_{\min}^{i}, x_{\max}^{i}], \end{cases} \tag{7}$$
where $H(x_i, r)$ is the Gaussian neighborhood function, whose value represents the possibility of action change. The specific formulas are as follows:
$$H(x_i, r) = \lambda \exp\left( -\frac{(x_i - r)^2}{2\sigma^2} \right), \tag{8}$$
$$\alpha_k = \frac{1}{\int_{x_{\min}^{i}}^{x_{\max}^{i}} \left( f(x_i, k) + \beta(k) H(x_i, r) \right) dx_i}, \tag{9}$$
$$\lambda = \frac{g_h}{x_{\max}^{i} - x_{\min}^{i}}, \tag{10}$$
$$\sigma = g_w \left( x_{\max}^{i} - x_{\min}^{i} \right), \tag{11}$$
where $g_h$ and $g_w$ set the height and width of the Gaussian function, respectively, and thereby determine the speed and depth of the learning process. According to previous work [32], $g_w = 0.02$ and $g_h = 0.3$ are usually selected if the sample number used in calculating the reinforcement signal is 500. Here, $r$ is the action parameter, and $\alpha_k$ is the normalization factor, which renormalizes the probability density function after each update.
f. Update the attenuation factor. Recomputing the probability density function and the probability distribution function in step b requires a large amount of computation, so this paper proposes the following improved update:
$$a(k+1) = \begin{cases} \operatorname{norm}\left( a(k),\, \omega e^{(1 - \beta(k))} \right), & a(k) \in [a_{\min}, a_{\max}] \\ a(k), & a(k) \notin [a_{\min}, a_{\max}], \end{cases} \tag{12}$$
where $\operatorname{norm}(a(k), \omega e^{(1 - \beta(k))})$ is a normal-distribution random number generator with mean $a(k)$ and standard deviation $\omega e^{(1 - \beta(k))}$, and $\omega$ is the learning rate factor. The learning time of the improved algorithm is linear in the number of iterations. Compared with the previous method, which uses an integral operation to calculate the value of the probability distribution function, the calculation and learning time of the improved algorithm are significantly reduced without any loss of learning performance.
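
The improved update can be sketched as a single normal draw around the current attenuation factor. This is a hedged illustration: the standard deviation is assumed here to be $\omega e^{(1 - \beta(k))}$, so that a well-performing action ($\beta$ near 1) is perturbed less than a poor one, and a draw that leaves the admissible range is assumed to be rejected in favor of the current value. The name `update_action` and the numerical settings are hypothetical.

```python
import numpy as np

def update_action(a_k, beta, omega, a_min, a_max, rng):
    """Draw the next attenuation factor from a normal distribution around a_k."""
    # Assumed form of the standard deviation: omega * e^(1 - beta).
    std = omega * np.exp(1.0 - beta)
    candidate = rng.normal(a_k, std)
    # Assumption: keep the current value if the draw leaves [a_min, a_max].
    return candidate if a_min <= candidate <= a_max else a_k

rng = np.random.default_rng(0)
a = 0.5
trajectory = []
for _ in range(1000):
    a = update_action(a, beta=0.3, omega=0.05, a_min=0.0, a_max=1.0, rng=rng)
    trajectory.append(a)
```

Each iteration costs one random draw instead of an integration over the action range, which is why the learning time grows only linearly with the number of iterations.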
g. Update the iterations to achieve the optimal attenuation factor a which makes the estimated output as close as possible to the real output.
(6) Substitute the learned optimal attenuation factor into the mathematical expression in step (4) to obtain the parameter estimation. After obtaining the system estimation model, it can be applied to the internal model control.

3.2. The Applications of CARLA-FRE in MIMO Systems

In industrial applications, due to the complex structure, various parameters, and the coupling relationship between the loops of multivariable systems, the traditional identification methods are difficult to use effectively in multivariable systems. Li [15] proposed a multivariable system identification method based on sequential step signals. With the sequential step method, the multiple-input multiple-output (MIMO) system could be equivalently decomposed into several single-input single-output (SISO) systems. Then, the analytic expressions of model parameters of these sub-systems could be obtained by frequency response estimation. The simulation results reveal that the presented approach could match the requirements of identification accuracy both in square and non-square systems.

3.2.1. Closed-Loop Identification for Square Multivariate Systems

An $n \times n$ multivariable square closed-loop control system is depicted in Figure 3, where $G_s(s)$ is the controlled object, $G_c(s)$ is the decentralized diagonal controller, $R$ is the system input vector, $Y$ is the system output vector, $E$ is the deviation vector, and $U$ is the controller output vector.
$$R(s) = [r_1, \ldots, r_n]^{T} \tag{13}$$
$$Y(s) = [y_1, \ldots, y_n]^{T} \tag{14}$$
$$U(s) = [u_1, \ldots, u_n]^{T} \tag{15}$$
$$G_s(s) = \begin{bmatrix} g_{s11}(s) & g_{s12}(s) & \cdots & g_{s1n}(s) \\ g_{s21}(s) & g_{s22}(s) & \cdots & g_{s2n}(s) \\ \vdots & \vdots & \ddots & \vdots \\ g_{sn1}(s) & g_{sn2}(s) & \cdots & g_{snn}(s) \end{bmatrix} \tag{16}$$
$$G_c(s) = \begin{bmatrix} g_{c1}(s) & 0 & \cdots & 0 \\ 0 & g_{c2}(s) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & g_{cn}(s) \end{bmatrix} \tag{17}$$
According to the above, it can be known that
$$Y = G_c G_s (R - Y). \tag{18}$$
So, we can get:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} g_{c1} & 0 & \cdots & 0 \\ 0 & g_{c2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & g_{cn} \end{bmatrix} \begin{bmatrix} g_{s11} & g_{s12} & \cdots & g_{s1n} \\ g_{s21} & g_{s22} & \cdots & g_{s2n} \\ \vdots & \vdots & \ddots & \vdots \\ g_{sn1} & g_{sn2} & \cdots & g_{snn} \end{bmatrix} \begin{bmatrix} r_1 - y_1 \\ r_2 - y_2 \\ \vdots \\ r_n - y_n \end{bmatrix}. \tag{19}$$
Then, the MIMO system can be equivalently decomposed into several SISO systems as follows:
$$\begin{cases} y_1 = g_{c1} g_{s11} (r_1 - y_1) + \cdots + g_{cn} g_{s1n} (r_n - y_n) \\ y_2 = g_{c1} g_{s21} (r_1 - y_1) + \cdots + g_{cn} g_{s2n} (r_n - y_n) \\ \quad \vdots \\ y_n = g_{c1} g_{sn1} (r_1 - y_1) + \cdots + g_{cn} g_{snn} (r_n - y_n). \end{cases} \tag{20}$$
A method to simplify the identification process of closed-loop systems was proposed in the literature, in which the system deviation is regarded as the input signal; that is, $r(t)$ is replaced with $e(t) = r(t) - y(t)$ in the identification process, so that the closed-loop system is treated as an equivalent open-loop system. Therefore, Formula (20) can be further expressed as Formula (21):
$$\begin{cases} y_1 = g_{c1} g_{s11} e_1 + \cdots + g_{cn} g_{s1n} e_n \\ y_2 = g_{c1} g_{s21} e_1 + \cdots + g_{cn} g_{s2n} e_n \\ \quad \vdots \\ y_n = g_{c1} g_{sn1} e_1 + \cdots + g_{cn} g_{snn} e_n. \end{cases} \tag{21}$$
Then, $n$ step excitation signals are applied to the system successively, and $n$ groups of vector relationships can be obtained for each MISO system, for example as in Formula (22):
$$\begin{cases} y_i^{1} = g_{ci} g_{si1} e_1^{1} + \cdots + g_{ci} g_{sin} e_n^{1} \\ y_i^{2} = g_{ci} g_{si1} e_1^{2} + \cdots + g_{ci} g_{sin} e_n^{2} \\ \quad \vdots \\ y_i^{n} = g_{ci} g_{si1} e_1^{n} + \cdots + g_{ci} g_{sin} e_n^{n}, \end{cases} \tag{22}$$
where $r_i^{j}$ is the excitation signal of the input channel of the $i$th subsystem under the $j$th test, $y_i^{j}$ is the output signal of the output channel of the $i$th subsystem under the $j$th test, and $e_i^{j} = r_i^{j} - y_i^{j}$ is the deviation signal of the channel, that is, the equivalent input signal in the identification process.
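Therefore, the following expression can be obtained by matrix transformation:
$$\begin{bmatrix} y_i^{1} \\ y_i^{2} \\ \vdots \\ y_i^{n} \end{bmatrix} = g_{ci} \begin{bmatrix} e_1^{1} & e_2^{1} & \cdots & e_n^{1} \\ e_1^{2} & e_2^{2} & \cdots & e_n^{2} \\ \vdots & \vdots & \ddots & \vdots \\ e_1^{n} & e_2^{n} & \cdots & e_n^{n} \end{bmatrix} \begin{bmatrix} g_{si1} \\ g_{si2} \\ \vdots \\ g_{sin} \end{bmatrix} = \begin{bmatrix} g_{ci} e_1^{1} & g_{ci} e_2^{1} & \cdots & g_{ci} e_n^{1} \\ g_{ci} e_1^{2} & g_{ci} e_2^{2} & \cdots & g_{ci} e_n^{2} \\ \vdots & \vdots & \ddots & \vdots \\ g_{ci} e_1^{n} & g_{ci} e_2^{n} & \cdots & g_{ci} e_n^{n} \end{bmatrix} \begin{bmatrix} g_{si1} \\ g_{si2} \\ \vdots \\ g_{sin} \end{bmatrix}. \tag{23}$$
That is,
$$Y_i = F_i \cdot G_{si}, \tag{24}$$
where
$$Y_i = \begin{bmatrix} y_i^{1} & y_i^{2} & \cdots & y_i^{n} \end{bmatrix}^{T}, \tag{25}$$
$$F_i = \begin{bmatrix} g_{ci} e_1^{1} & g_{ci} e_2^{1} & \cdots & g_{ci} e_n^{1} \\ g_{ci} e_1^{2} & g_{ci} e_2^{2} & \cdots & g_{ci} e_n^{2} \\ \vdots & \vdots & \ddots & \vdots \\ g_{ci} e_1^{n} & g_{ci} e_2^{n} & \cdots & g_{ci} e_n^{n} \end{bmatrix}, \tag{26}$$
$$G_{si} = \begin{bmatrix} g_{si1} & g_{si2} & \cdots & g_{sin} \end{bmatrix}^{T}. \tag{27}$$
Since the closed-loop system is asymptotically stable and the input signal is continuously differentiable, it can be proved that $F_i$ is a non-singular square matrix. So, Formula (28) can be further obtained according to Formula (24):
$$G_{si} = F_i^{-1} \cdot Y_i = \frac{\operatorname{adj}(F_i)}{\det(F_i)} \cdot Y_i, \tag{28}$$
where $F_i^{-1}$ is the inverse matrix of $F_i$, $\operatorname{adj}(\cdot)$ is the adjoint matrix operator, and $\det(\cdot)$ is the matrix determinant operator.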
According to Formula (28), the MIMO system identification problem can first be decomposed into several MISO subsystem problems, and then further decomposed into SISO identification problems.
$$G_{sij} = \frac{\sum_{k=1}^{n} F_{kj}^{*} \, y_i^{k}}{\det(F_i)} \tag{29}$$
$G_{sij}$ is the transfer function of the decomposed equivalent subsystem (SISO) of the MISO system, and $F_{kj}^{*}$ is the cofactor of the element of $F_i$ in the row of the $k$th test and the $j$th column.
For Formula (29), if we define $\bar{u}_{ij} = \det(F_i)$ and $\bar{y}_{ij} = \sum_{k=1}^{n} F_{kj}^{*} \, y_i^{k}$, then $\bar{u}_{ij}$ and $\bar{y}_{ij}$ can be regarded as the equivalent input and output of a SISO system identification problem, so the MISO problem can be further decomposed into several SISO identification problems. Considering that the presented CARLA-FRE is applicable to a variety of excitation signals, the parameters of the decomposed equivalent SISO system can be estimated using CARLA-FRE. Then, the identified SISO models can be combined into a MIMO system according to Formulas (24) and (19) to complete the multivariable system identification.
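
At a single complex frequency point, Formulas (24) and (28) reduce to a small linear solve. The sketch below uses made-up numbers for the controller value, the deviation signals, and the subsystem responses; the function name `identify_subsystem_row` is hypothetical, and `E[j, k]` is assumed to hold the deviation $e_{k+1}^{j+1}$ of channel $k+1$ in test $j+1$.

```python
import numpy as np

def identify_subsystem_row(g_ci, E, y_i):
    """Solve Y_i = F_i G_si for G_si, where F_i = g_ci * E."""
    F = g_ci * np.asarray(E, dtype=complex)
    # Solving the linear system is numerically preferable to forming the
    # adjoint and determinant explicitly, and gives the same result.
    return np.linalg.solve(F, np.asarray(y_i, dtype=complex))

# Illustrative values at one frequency point (not from the paper).
g_true = np.array([1.0 + 2.0j, 0.5 - 1.0j])   # g_si1, g_si2
g_ci = 0.8 + 0.1j                             # controller of loop i
E = np.array([[1.0, 0.2],                     # deviations in test 1
              [0.3, 1.5]])                    # deviations in test 2
y_i = (g_ci * E) @ g_true                     # outputs generated by the model
g_est = identify_subsystem_row(g_ci, E, y_i)
```

The solve fails when the rows of `E` are linearly dependent, which corresponds to the requirement that $F_i$ be non-singular: the sequential tests must excite the channels differently.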

3.2.2. Closed-Loop Identification for Non-Square Multivariate Systems

For multivariable non-square systems, the input and output dimensions of the system differ, so the inverse of the equivalent matrix $F$ does not exist and the methods above are difficult to apply directly. Based on the idea of the first method in [17], this paper replaces the decentralized controller with a centralized controller, establishes an association between the input and output of each loop in the MIMO system, and then constructs a solvable matrix form. An $m \times n$ ($m \neq n$) multivariable non-square closed-loop control system is shown in Figure 4, where $G_s(s)$ is the controlled object, $G_c(s)$ is the centralized controller, $R$ is the system input vector, $Y$ is the system output vector, $E$ is the deviation vector, and $U$ is the controller output vector.
$$G_c(s) = \begin{bmatrix} g_{c11}(s) & g_{c12}(s) & \cdots & g_{c1m}(s) \\ g_{c21}(s) & g_{c22}(s) & \cdots & g_{c2m}(s) \\ \vdots & \vdots & \ddots & \vdots \\ g_{cn1}(s) & g_{cn2}(s) & \cdots & g_{cnm}(s) \end{bmatrix} \tag{30}$$
In a non-square system structure, $m$ excitation signals are applied to the input of the system in turn, and the system outputs $n$ response signals each time; that is, $m \times n$ ($m \neq n$) groups of equations are generated. The system can be decomposed and solved according to Formulas (28) and (29), with the matrix inverse replaced by the pseudo-inverse.
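
Replacing the inverse with the pseudo-inverse can be sketched as follows. The example assumes, for illustration only, that there are more test equations than unknown subsystem responses, so that the pseudo-inverse gives the least-squares solution; the function name `identify_row_pinv` and all numerical values are hypothetical.

```python
import numpy as np

def identify_row_pinv(F, y):
    """Least-squares solution of y = F g using the Moore-Penrose pseudo-inverse."""
    F = np.asarray(F, dtype=complex)
    y = np.asarray(y, dtype=complex)
    return np.linalg.pinv(F) @ y

# Three test equations, two unknown responses (made-up values).
F = np.array([[1.0 + 0.5j, 0.2],
              [0.3, 1.5 - 0.2j],
              [0.7, 0.4 + 0.1j]])
g_true = np.array([2.0 - 1.0j, 0.5 + 0.3j])
y = F @ g_true                     # noise-free, consistent measurements
g_est = identify_row_pinv(F, y)
```

With noisy measurements the equations are no longer consistent, and the pseudo-inverse then returns the estimate that minimizes the squared residual rather than an exact solution.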

4. CARLA Algorithm Performance Verification

The continuous action reinforcement learning algorithm has strong online search and learning ability and can converge to the optimal value after full ergodic learning, without prior knowledge to set parameters. In order to test the identification ability of the CARLA algorithm, it was compared with the particle swarm optimization and the parallel diffuse algorithm (fireworks algorithm, FWA) by employing standard test functions. Here the Sphere, Rosenbrock, Griewank, Rastrigin, Ackley, and Schwefel’s problem 22 functions were selected to illustrate the applicability and performance of the CARLA algorithm.
(1) Sphere function:
$$f(x) = \sum_{i=1}^{n} x_i^{2}, \tag{31}$$
as shown in Figure 5.
The Sphere function is a sum of squares with a unique global minimum, which is attained at $x = 0$; all independent variables share the same domain.
(2) Rosenbrock function:
$$f(x) = \sum_{i=1}^{N-1} \left[ (1 - x_i)^2 + 100 \left( x_{i+1} - x_i^{2} \right)^2 \right]. \tag{32}$$
As shown in Figure 6, Rosenbrock's global optimum lies in a smooth, narrow, parabolic valley. Because the valley provides little gradient information, it is difficult to determine the search direction and find the optimal solution. Therefore, the function is often used to test the performance of optimization algorithms on non-convex functions. It attains its minimum value 0 at $x^{*} = (1, \ldots, 1)$.
(3) Griewank function:
$$\min f(x_i) = \sum_{i=1}^{N} \frac{x_i^{2}}{4000} - \prod_{i=1}^{N} \cos\left( \frac{x_i}{\sqrt{i}} \right) + 1, \tag{33}$$
where $|x_i| \le 8$.
As shown in Figure 7, the Griewank function has many local minimum points, and their number is related to the dimension of the variables. It tests the ability of an algorithm to escape local minima and reach the global minimum $f(0) = 0$, and it is generally recognized as a difficult multimodal problem for optimization algorithms.
(4) Rastrigin function:
$$\min f(x_i) = \sum_{i=1}^{D} \left[ x_i^{2} - 10 \cos(2 \pi x_i) + 10 \right], \tag{34}$$
where $x_i \in [-5.12, 5.12]$.
As shown in Figure 8, the Rastrigin function is a multi-peak function, and there are about 10 n local minimum points within the range of $\{ x_i \in [-5.12, 5.12],\ i = 1, 2, \ldots, n \}$. Similar to the Griewank function, it is also a typical nonlinear multi-modal function, and the peak shape features ups, downs, and jumps, so it is difficult to optimize and find the global optimal value.
(5) Ackley function:
$$f(X) = -20 \exp\left( -\frac{1}{5} \sqrt{ \frac{1}{n} \sum_{i=1}^{n} x_i^{2} } \right) - \exp\left( \frac{1}{n} \sum_{i=1}^{n} \cos(2 \pi x_i) \right) + 20 + e. \tag{35}$$
As shown in Figure 9, as the dimension of the Ackley function increases, its gradient directions become more varied. This function can test the global convergence speed of an algorithm. It attains its minimum value 0 at $x^{*} = (0, \ldots, 0)$.
(6) Schwefel’s problem 22 function:
$$f(X) = \sum_{i=1}^{n} |x_i| + \prod_{i=1}^{n} |x_i|, \tag{36}$$
where $|x_i| \le 10$.
As shown in Figure 10, Schwefel’s problem 22 function, proposed by Schwefel, is a continuous and smooth multimodal function which belongs to the classical test functions. When the independent variable approaches infinity, the function forms a large number of local extremum regions, and the global optimal value is located at the boundary of the definition domain. The function attains its minimum value 0 at $x^{*} = (0, \ldots, 0)$.
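
For reference, the six test functions above can be written compactly with NumPy. This is a sketch using their standard textbook definitions; the function names are illustrative, and the domain bounds quoted above are not enforced here.

```python
import numpy as np

def sphere(x):
    x = np.asarray(x, dtype=float)
    return np.sum(x ** 2)

def rosenbrock(x):
    x = np.asarray(x, dtype=float)
    return np.sum((1.0 - x[:-1]) ** 2 + 100.0 * (x[1:] - x[:-1] ** 2) ** 2)

def griewank(x):
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    return np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i))) + 1.0

def rastrigin(x):
    x = np.asarray(x, dtype=float)
    return np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0)

def ackley(x):
    x = np.asarray(x, dtype=float)
    n = x.size
    root_mean_square = np.sqrt(np.sum(x ** 2) / n)
    mean_cosine = np.sum(np.cos(2.0 * np.pi * x)) / n
    return -20.0 * np.exp(-0.2 * root_mean_square) - np.exp(mean_cosine) + 20.0 + np.e

def schwefel_22(x):
    a = np.abs(np.asarray(x, dtype=float))
    return np.sum(a) + np.prod(a)
```

All six functions have a global minimum value of 0, at the origin for every function except Rosenbrock, whose minimum lies at $(1, \ldots, 1)$; this makes the search accuracy of different optimizers directly comparable.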
The parameter configuration is shown in Table 1.
Particle swarm optimization (PSO), parallel diffuse algorithm (FWA), and continuous action reinforcement learning (CARLA) algorithms were respectively tested for the standard functions above. The results are shown in Table 2. FWA and CARLA algorithms had better search accuracy than the PSO algorithm, and both could accurately obtain function parameter estimation.

5. Simulation

5.1. Square Multivariate System: Wood-Berry Model

The Wood-Berry double distillation tower model is a classic multivariate model in industrial production. The identification of the Wood-Berry model has drawn the attention of academic and industrial researchers because of its many parameters, large time delays, and strong coupling. Many works have achieved good results, including Li’s sequence identification method based on step response (frequency response estimation, FRE) [33,34], Cheng’s identification method based on a heuristic search (NLG-particle swarm optimization, NPSO) algorithm [20], and Cao’s intelligent search algorithm based on a parallel diffuse type algorithm (modified fireworks explosion optimization algorithm-FRE, MFA-FRE) [23]. In this paper, comparative simulations between the presented CARLA-FRE and the methods above were carried out to estimate the parameters of the Wood-Berry model (see Figure 11).
The transfer function of the Wood-Berry model is:
$$G_s(s) = \begin{bmatrix} \dfrac{12.8 e^{-s}}{16.7 s + 1} & \dfrac{-18.9 e^{-3s}}{21 s + 1} \\ \dfrac{6.6 e^{-7s}}{10.9 s + 1} & \dfrac{-19.4 e^{-3s}}{14.4 s + 1} \end{bmatrix}. \tag{37}$$
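
Each entry of the Wood-Berry matrix is a first-order-plus-dead-time element $K e^{-Ls} / (Ts + 1)$, so the matrix can be evaluated at any complex point $s = a + jw$ with a few lines of code. The sketch below uses the standard Wood-Berry gains, time constants, and delays; the helper names `fopdt` and `wood_berry` are illustrative.

```python
import numpy as np

def fopdt(K, T, L, s):
    """First-order-plus-dead-time element K * exp(-L s) / (T s + 1)."""
    return K * np.exp(-L * s) / (T * s + 1.0)

def wood_berry(s):
    """Wood-Berry transfer function matrix evaluated at the point s."""
    return np.array([
        [fopdt(12.8, 16.7, 1.0, s), fopdt(-18.9, 21.0, 3.0, s)],
        [fopdt(6.6, 10.9, 7.0, s), fopdt(-19.4, 14.4, 3.0, s)],
    ])

steady_state_gain = wood_berry(0.0)      # gain matrix at s = 0
G_point = wood_berry(0.05 + 0.2j)        # value at an example s = a + jw
```

Evaluating the true model at the same points used by the estimator gives a direct way to check an identified model entry by entry.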
The sequential step signal is:
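$$R = \left[ \begin{bmatrix} r_1^{1} \\ r_2^{1} \end{bmatrix}, \begin{bmatrix} r_1^{2} \\ r_2^{2} \end{bmatrix} \right] = \left[ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 4 \\ 2 \end{bmatrix} \right]. \tag{38}$$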
Four methods were used to respectively identify the mentioned multivariable closed-loop system, and the results are presented in Table 3. It can be seen that several methods achieved high estimation accuracy in the parameter identification of the Wood-Berry model. In addition, the CARLA-FRE method proposed in this paper can be implemented online and adjusted automatically with the change of objects.

5.2. Non-Square Multivariate System: Shell Model

Multivariable non-square systems are common in industrial production. This paper takes the Shell model, a standard non-square model [35], as an example to test the proposed method. The closed-loop control of the Shell model is shown in Figure 12.
Here, the model transfer function matrix is:
$$G_s(s) = \begin{bmatrix} \dfrac{4.05\,e^{-81s}}{50s+1} & \dfrac{1.77\,e^{-84s}}{60s+1} & \dfrac{5.88\,e^{-81s}}{50s+1} \\[2ex] \dfrac{5.39\,e^{-54s}}{50s+1} & \dfrac{5.72\,e^{-42s}}{60s+1} & \dfrac{6.9\,e^{-45s}}{40s+1} \end{bmatrix}.$$
The controller is:
$$G_c = \begin{bmatrix} 4 & 4 \\ 1 & 2 \\ 0.6 & 0.5 \end{bmatrix}.$$
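The sequential step signal is:
$$R = \left[ \begin{bmatrix} r_1^1 \\ r_2^1 \\ r_3^1 \end{bmatrix} \begin{bmatrix} r_1^2 \\ r_2^2 \\ r_3^2 \end{bmatrix} \begin{bmatrix} r_1^3 \\ r_2^3 \\ r_3^3 \end{bmatrix} \right].$$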
Gaussian white noise with noise-to-signal ratios (NSR) of 10% and 20% was applied to verify the effectiveness of the algorithm, and the results were compared with the MFA-FRE method, as shown in Table 4. Both methods achieved high estimation accuracy in the parameter identification of the Shell model, and the CARLA-FRE method proposed in this paper adapted better to changes in the external environment.
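The noise test can be sketched as follows. The NSR definition is not restated in this section, so the sketch assumes NSR is the ratio of the noise RMS to the signal RMS; other definitions (for example, a ratio of mean absolute values) are also in use. The helper `add_white_noise` is hypothetical, and the clean signal is an open-loop unit-step response of the first Shell element, used only as a convenient test signal rather than the closed-loop data of the paper.

```python
import numpy as np


def add_white_noise(y, nsr, rng):
    """Add zero-mean Gaussian white noise so that rms(noise) / rms(y) equals nsr.

    Assumption: NSR is defined here as an RMS ratio.
    """
    y = np.asarray(y, dtype=float)
    signal_rms = np.sqrt(np.mean(y ** 2))
    noise = rng.normal(0.0, 1.0, size=y.shape)
    # Rescale the sample so the realised ratio matches nsr exactly.
    noise *= nsr * signal_rms / np.sqrt(np.mean(noise ** 2))
    return y + noise


rng = np.random.default_rng(0)
t = np.arange(0.0, 600.0, 1.0)

# Open-loop unit-step response of 4.05 e^{-81 s} / (50 s + 1):
# zero until the delay has elapsed, then a first-order rise towards the gain.
y_clean = 4.05 * (1.0 - np.exp(-np.clip(t - 81.0, 0.0, None) / 50.0))

y_nsr10 = add_white_noise(y_clean, 0.10, rng)
y_nsr20 = add_white_noise(y_clean, 0.20, rng)
```

Rescaling the noise sample makes the realised ratio exact for each run, which keeps repeated identification experiments comparable; drawing the noise directly with standard deviation `nsr * signal_rms` would match the target only on average.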

6. Conclusions

This paper proposed a frequency response estimation method based on continuous action reinforcement learning automata (CARLA). The method can solve closed-loop identification problems for both square and non-square multivariable systems. Comparative simulations between the presented method and existing methods were carried out on the classic Wood-Berry model (square system) and the Shell model (non-square system). The results show that the proposed method not only achieves good identification accuracy, but also has stronger online learning ability and anti-interference ability.

Author Contributions

M.J. carried out the simulations and wrote the paper. Q.J. developed the model and designed the article. All authors have read and approved the final manuscript.

Funding

This research was funded by the Social Science Foundation of Beijing (No. 15JGC188).

Acknowledgments

The authors are grateful to the anonymous reviewers for their valuable recommendations.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gupta, R.D.; Fairman, F.W. Parameter estimation for multivariable systems. IEEE Trans. Autom. Control 1974, 19, 546–549.
  2. Guidorzi, R. Canonical structures in the identification of multivariable systems. Automatica 1975, 11, 361–374.
  3. Verhaegen, M. A novel non-iterative mimo state space model identification technique. IFAC Proc. Vol. 1991, 24, 749–754.
  4. Nakayama, M.; Oku, H.; Ushida, S. Closed-loop identification for a continuous-time model of a multivariable dual-rate system with input fast sampling. IFAC PapersOnLine 2018, 51, 415–420.
  5. Moor, B.D.; Overschee, P.V. Numerical algorithms for subspace state space system identification. In Trends in Control; Springer: London, UK, 1995.
  6. Gumussoy, S.; Ozdemir, A.A.; McKelvey, T.; Ljung, L.; Gibanica, M.; Singh, R. Improving linear state-space models with additional iterations. IFAC PapersOnLine 2018, 51, 341–346.
  7. Larimore, W.E. Canonical variate analysis in identification, filtering, and adaptive control. In Proceedings of the 29th IEEE Conference on Decision and Control, Honolulu, HI, USA, 5–7 December 1990; pp. 596–604.
  8. Pilario, K.E.S.; Cao, Y.; Shafiee, M. Mixed kernel canonical variate dissimilarity analysis for incipient fault monitoring in nonlinear dynamic processes. Comput. Chem. Eng. 2019, 123, 143–154.
  9. Zheng, W.X. Unbiased identification of multivariable systems subject to colored noise. In Proceedings of the 33rd IEEE Conference on Decision and Control, Lake Buena Vista, FL, USA, 14–16 December 1994; Volume 2863, pp. 2864–2865.
  10. Feng, D.; Tongwen, C.; Li, Q. Bias compensation based recursive least-squares identification algorithm for miso systems. IEEE Trans. Circuits Syst. II Express Briefs 2006, 53, 349–353.
  11. Elisei-Iliescu, C.; Stanciu, C.; Paleologu, C.; Benesty, J.; Anghel, C.; Ciochina, S. Efficient recursive least-squares algorithms for the identification of bilinear forms. Digit. Signal Process. 2018, 83, 280–296.
  12. Ding, F.; Xie, X. Recursive estimation of parameters of transfer function matrix subsub-model: Instrumental model method. Control Decis. 1991, 6, 447–452.
  13. Du, J.; Dong, S.; Liu, T.; Zhao, J. Multi-innovation based identification of output error model with time delay under load disturbance. IFAC PapersOnLine 2018, 51, 224–228.
  14. Ding, F.; Xie, X.; Fang, C. Multi-innovation identification method for time-varying systems. Acta Autom. Sin. 1996, 22, 85–91.
  15. Li, S.Y.; Qi, C.K. A Structured Closed-Loop Identification Method for Multivariable Systems based on Step Response Testing. Chinese Patent CN148268, 7 April 2004.
  16. Liu, T.; Gao, F. A frequency domain step response identification method for continuous-time processes with time delay. J. Process Control 2010, 20, 800–809.
  17. Liu, T.; Zhang, W.; Gao, F. Analytical decoupling control strategy using a unity feedback control structure for mimo processes with time delays. J. Process Control 2007, 17, 173–188.
  18. Romano, R.A.; Pait, F. Matchable-observable linear models and direct filter tuning: An approach to multivariable identification. IEEE Trans. Autom. Control 2017, 62, 2180–2193.
  19. Morales Alvarado, C.S.; Garcia, C. Comparison of statistical metrics and a new fuzzy method for validating linear models used in model predictive control controllers. Ind. Eng. Chem. Res. 2018, 57, 3666–3677.
  20. Jin, Q.B.; Cheng, Z.J.; Dou, J.; Cao, L.T.; Wang, K.W. A novel closed loop identification method and its application of multivariable system. J. Process Control 2012, 22, 132–144.
  21. Li, M.; Miao, C.; Leung, C. A coral reef algorithm based on learning automata for the coverage control problem of heterogeneous directional sensor networks. Sensors 2015, 15, 30617–30635.
  22. Mohammed, S.S.; Devaraj, D.; Ahamed, T.P.I. Learning automata based fuzzy mppt controller for solar photovoltaic system under fast changing environmental conditions. J. Intell. Fuzzy Syst. 2017, 32, 3031–3041.
  23. Liting, C. Research of Identification and Internal Model Control for Non-Square Multivariable System with Time Delay; Beijing University of Chemical Technology: Beijing, China, 2015.
  24. Sutton, R.S.; Barto, A.G. Reinforcement learning: An introduction. IEEE Trans. Neural Netw. 1998, 9, 1054.
  25. Najim, K.; Poznyak, A.S. Learning Automata: Theory and Applications; Pergamon: Oxford, UK, 1994.
  26. Narendra, K.S.; Thathachar, M.A. Learning Automata: An Introduction; Prentice-Hall: London, UK, 1989.
  27. Xuejing, G.; Mingru, Z.; Zhiliang, W.; Yucheng, G. Parameter learning optimization of intelligent controller based on carla-pso composite model. Appl. Res. Comput. 2019, 3, 678–680.
  28. Anari, B.; Torkestani, J.A.; Rahmani, A.M. Automatic data clustering using continuous action-set learning automata and its application in segmentation of images. Appl. Soft Comput. 2017, 51, 253–265.
  29. Howell, M.N.; Best, M.C. On-line pid tuning for engine idle-speed control using continuous action reinforcement learning automata. Control Eng. Pract. 2000, 8, 147–154.
  30. Irandoost, M.A.; Rahmani, A.M.; Setayeshi, S. A novel algorithm for handling reducer side data skew in mapreduce based on a learning automata game. Inf. Sci. 2018, 501, 662–679.
  31. Jin, Q.; Jiang, B.; Cheng, Z. A novel identification method based on frequency response analysis. Trans. Inst. Meas. Control 2016, 38, 44–54.
  32. Howell, M.N.; Frost, G.P.; Gordon, T.J.; Wu, Q.H. Continuous action reinforcement learning applied to vehicle suspension control. Mechatronics 1997, 7, 263–276.
  33. Mei, H.; Li, S. Decentralized identification for multivariable integrating processes with time delays from closed-loop step tests. ISA Trans. 2007, 46, 189–198.
  34. Mei, H.; Li, S.Y.; Cai, W.J.; Xiong, Q. Decentralized closed-loop parameter identification for multivariable processes from step responses. Math. Comput. Simul. 2005, 68, 171–192.
  35. Jing, Q.; Yan, G.; Liu, Z.; Song, A. Decoupling internal model control for non-square process with time delays. In Proceedings of the IEEE 2010 International Conference on Measuring Technology and Mechatronics Automation (ICMTMA 2010), Changsha City, China, 13–14 March 2010.
Figure 1. Reinforcement learning process.
Figure 2. Frequency response estimation method based on continuous action reinforcement learning automata (CARLA-FRE) flow chart.
Figure 3. Diagram of the multivariable square closed-loop control system.
Figure 4. Closed-loop control schematic diagram of a multivariable non-square system.
Figure 5. Three-dimensional graph of the Sphere function.
Figure 6. Three-dimensional graph of the Rosenbrock function.
Figure 7. Three-dimensional graph of the Griewank function.
Figure 8. Three-dimensional graph of the Rastrigin function.
Figure 9. Three-dimensional graph of the Ackley function.
Figure 10. Three-dimensional graph of Schwefel’s problem 22.
Figure 11. Wood-Berry system closed-loop control diagram.
Figure 12. The diagram of the Shell closed-loop control system.
Table 1. Parameter configuration of the test functions.

| Standard Function Type | Dimensionality | Optimum Location | Optimal Fitness Value | Search Interval |
|---|---|---|---|---|
| Sphere | 30 | [0,0,…,0] | 0 | (−100,100) |
| Rosenbrock | 30 | [1,1,…,1] | 0 | (−2.048,2.048) |
| Griewank | 30 | [0,0,…,0] | 0 | (−8,8) |
| Rastrigin | 30 | [0,0,…,0] | 0 | (−5.12,5.12) |
| Ackley | 30 | [0,0,…,0] | 0 | (−8,8) |
| Schwefel’s problem 22 | 30 | [0,0,…,0] | 0 | (−10,10) |
Table 2. Optimization results for the test functions. PSO: particle swarm optimization; FWA: fireworks algorithm; CARLA: continuous action reinforcement learning automata.

| Standard Function Type | PSO Mean Value | PSO Standard Deviation | FWA Mean Value | FWA Standard Deviation | CARLA Mean Value | CARLA Standard Deviation |
|---|---|---|---|---|---|---|
| Sphere | 0 | 0 | 0 | 0 | 0 | 0 |
| Rosenbrock | 66.59 | 204.29 | 12.16 | 12.82 | 8.91 | 10.22 |
| Griewank | 0 | 0.01 | 0 | 0 | 0 | 0 |
| Rastrigin | 6.77 | 7.70 | 0 | 0 | 0.01 | 0.21 |
| Ackley | 0.043 | 0.042 | 0 | 0 | 0 | 0 |
| Schwefel’s problem 22 | 23.93 | 13.61 | 0 | 0 | 0 | 0 |
Table 3. Wood-Berry model identification results. (FRE: frequency response estimation; NPSO-FRE: NLG-particle swarm optimization-FRE; MFA-FRE: modified fireworks explosion optimization algorithm-FRE).

| Wood-Berry | $G_{11}(s)$ | $G_{12}(s)$ | $G_{21}(s)$ | $G_{22}(s)$ |
|---|---|---|---|---|
| Actual model | $\frac{12.8e^{-s}}{16.7s+1}$ | $\frac{-18.9e^{-3s}}{21s+1}$ | $\frac{6.6e^{-7s}}{10.9s+1}$ | $\frac{-19.4e^{-3s}}{14.4s+1}$ |
| Method in [26] (FRE) | $\frac{12.7929e^{-1.0044s}}{16.6193s+1}$ | $\frac{-18.9e^{-3.01s}}{21.001s+1}$ | $\frac{6.5998e^{-6.9852s}}{10.9205s+1}$ | $\frac{-19.3942e^{-3.006s}}{14.4339s+1}$ |
| error (%) | 0.98 | 0.34 | 0.40 | 0.47 |
| Method in [20] (NPSO-FRE) | $\frac{12.7984e^{-1.0027s}}{16.6942s+1}$ | $\frac{-18.8996e^{-3.0002s}}{20.9990s+1}$ | $\frac{6.5996e^{-7.0005s}}{10.9001s+1}$ | $\frac{-19.4000e^{-3.0007s}}{14.4001s+1}$ |
| error (%) | 0.32 | 0.014 | 0.015 | 0.024 |
| Method in [23] (MFA-FRE) | $\frac{12.7979e^{-1.0025s}}{16.6910s+1}$ | $\frac{-18.9001e^{-3.0001s}}{21.0001s+1}$ | $\frac{6.6000e^{-7.0001s}}{10.8999s+1}$ | $\frac{-19.4001e^{-3.0001s}}{14.4001s+1}$ |
| error (%) | 0.32 | 0.004 | 0.008 | 0.005 |
| Method in this paper (CARLA-FRE) | $\frac{12.7981e^{-1.0014s}}{16.6901s+1}$ | $\frac{-18.9002e^{-3.0003s}}{21.0004s+1}$ | $\frac{6.6003e^{-7.0002s}}{10.9005s+1}$ | $\frac{-19.3987e^{-3.0003s}}{14.4005s+1}$ |
| error (%) | 0.21 | 0.013 | 0.012 | 0.021 |
Table 4. Identification results of the Shell model.

| Actual Model | MFA-FRE (Noise 0%) | CARLA-FRE (Noise 0%) | MFA-FRE (Noise 20%) | CARLA-FRE (Noise 20%) |
|---|---|---|---|---|
| $\frac{119e^{-5s}}{21.7s+1}$ | $\frac{118.9998e^{-5.0174s}}{21.7255s+1}$ | $\frac{118.9828e^{-5.0132s}}{21.7027s+1}$ | $\frac{118.9987e^{-5.0479s}}{21.6557s+1}$ | $\frac{118.9825e^{-5.0143s}}{21.7031s+1}$ |
| error (%) | 0.47 | 0.29 | 1.16 | 0.32 |
| $\frac{40e^{-5s}}{337s+1}$ | $\frac{39.9877e^{-5.0240s}}{336.1787s+1}$ | $\frac{39.9862e^{-5.0179s}}{336.7018s+1}$ | $\frac{40.1770e^{-4.7120s}}{338.3429s+1}$ | $\frac{39.9783e^{-5.0258s}}{336.7251s+1}$ |
| error (%) | 0.75 | 0.39 | 6.45 | 0.65 |
| $\frac{21e^{-5s}}{10s+1}$ | $\frac{20.9072e^{-5.0085s}}{9.9200s+1}$ | $\frac{20.9563e^{-5.0037s}}{9.9677s+1}$ | $\frac{20.8994e^{-5.3567s}}{9.9907s+1}$ | $\frac{20.9273e^{-5.0042s}}{9.9381s+1}$ |
| error (%) | 1.41 | 0.61 | 7.71 | 1.05 |
| $\frac{77e^{-5s}}{50s+1}$ | $\frac{76.8300e^{-5.0102s}}{49.7549s+1}$ | $\frac{76.9137e^{-5.0114s}}{49.9371s+1}$ | $\frac{76.7488e^{-5.1236s}}{49.8870s+1}$ | $\frac{76.9029e^{-5.0313s}}{49.9218s+1}$ |
| error (%) | 0.92 | 0.47 | 3.02 | 0.91 |
| $\frac{76.7e^{-3s}}{28s+1}$ | $\frac{76.6657e^{-2.9980s}}{28.0045s+1}$ | $\frac{76.7128e^{-3.0026s}}{27.9438s+1}$ | $\frac{76.4512e^{-2.9641s}}{27.8876s+1}$ | $\frac{76.7241e^{-3.0129s}}{29.9039s+1}$ |
| error (%) | 0.52 | 0.31 | 2.31 | 0.81 |
| $\frac{50e^{-5s}}{10s+1}$ | $\frac{50.0989e^{-5.0012s}}{9.9980s+1}$ | $\frac{49.9611e^{-5.0043s}}{0.9837s+1}$ | $\frac{50.0425e^{-5.1011s}}{9.9356s+1}$ | $\frac{49.9217e^{-5.0183s}}{0.9731s+1}$ |
| error (%) | 0.24 | 0.25 | 2.75 | 0.79 |
| $\frac{93e^{-5s}}{50s+1}$ | $\frac{93.0011e^{-5.0112s}}{49.6576s+1}$ | $\frac{93.0118e^{-5.0185s}}{49.8194s+1}$ | $\frac{93.1022e^{-4.7869s}}{49.9881s+1}$ | $\frac{93.0237e^{-5.0386s}}{49.7621s+1}$ |
| error (%) | 0.91 | 0.74 | 4.39 | 1.27 |
| $\frac{36.7e^{-5s}}{166s+1}$ | $\frac{36.7903e^{-5.0097s}}{166.1148s+1}$ | $\frac{36.7198e^{-5.0238s}}{165.9671s+1}$ | $\frac{36.7061e^{-4.8760s}}{165.6667s+1}$ | $\frac{36.7237e^{-5.0934s}}{165.7830s+1}$ |
| error (%) | 2.70 | 0.55 | 2.70 | 2.06 |
| $\frac{103.3e^{-4s}}{23s+1}$ | $\frac{103.2389e^{-4.0098s}}{23.0009s+1}$ | $\frac{103.2471e^{-4.0184s}}{22.9744s+1}$ | $\frac{103.3141e^{-4.0012s}}{23.0023s+1}$ | $\frac{103.2176e^{-4.0421s}}{22.8761s+1}$ |
| error (%) | 0.31 | 0.62 | 0.29 | 1.67 |

Share and Cite

MDPI and ACS Style

Jiang, M.; Jin, Q. Multivariable System Identification Method Based on Continuous Action Reinforcement Learning Automata. Processes 2019, 7, 546. https://doi.org/10.3390/pr7080546

AMA Style

Jiang M, Jin Q. Multivariable System Identification Method Based on Continuous Action Reinforcement Learning Automata. Processes. 2019; 7(8):546. https://doi.org/10.3390/pr7080546

Chicago/Turabian Style

Jiang, Meiying, and Qibing Jin. 2019. "Multivariable System Identification Method Based on Continuous Action Reinforcement Learning Automata" Processes 7, no. 8: 546. https://doi.org/10.3390/pr7080546
