Article

Distributed Model Predictive Control for Two-Dimensional Electric Vehicle Platoon Based on QMIX Algorithm

by
Sheng Zhang
1 and
Xiangtao Zhuan
1,2,*
1
Department of Artificial Intelligence and Automation, School of Electrical Engineering and Automation, Wuhan University, Wuhan 430072, China
2
Shenzhen Research Institute, Wuhan University, Shenzhen 518057, China
*
Author to whom correspondence should be addressed.
Symmetry 2022, 14(10), 2069; https://doi.org/10.3390/sym14102069
Submission received: 13 August 2022 / Revised: 16 September 2022 / Accepted: 26 September 2022 / Published: 4 October 2022
(This article belongs to the Section Engineering and Materials)

Abstract
In this paper, an improved distributed model predictive control (IDMPC) method is proposed for a platoon of electric vehicles. The motion of the platoon is considered in two dimensions, covering both longitudinal and lateral motion. First, a platoon model is built from the car-following model of a single following vehicle. Then, the IDMPC strategy is designed with multiple objectives in mind. The symmetrical weight matrices in the IDMPC are decisive for the final control effect. To control the following vehicles in the platoon in a coordinated way, the IDMPC weights are optimized with the QMIX algorithm from multi-agent reinforcement learning. QMIX can fully exploit the global information of the multi-vehicle following process, so the IDMPC obtains optimal control variables. Finally, simulation and experimental results verify the IDMPC. Compared with the baseline strategies, the IDMPC achieves better lane tracking, lateral stability and economic performance.

1. Introduction

As the automotive sector develops, electric vehicles [1] and autonomous driving systems (ADSs) [2] are gradually becoming two important trends. In addition, vehicle control evolves from single-vehicle control to multi-vehicle control. Multi-vehicle control technology has therefore been widely applied in the vehicle platoon [3].
Electric vehicles are mainly composed of modules such as electric motors and batteries. They are driven by electricity: the electric motor converts the electrical energy stored in the battery into propulsion power [4]. Compared with fuel vehicles, electric vehicles are therefore environmentally friendly, as they consume no fossil energy and emit no polluting gases. However, charging takes a long time, which places a high demand on the range per charge. Since the range per charge reflects the economic performance of an electric vehicle, this economic performance is crucial [5,6].
As one of the ADSs, adaptive cruise control (ACC) is widely applied in automotive motion control [7]. ACC assists the driver in the longitudinal driving task and thereby eases the driving burden [8]. The working state of ACC is either the state without a front vehicle or the state with a front vehicle [9]. Without a front vehicle, the velocity of the following vehicle remains unchanged [10]. With a front vehicle, the velocity of the following vehicle is related to, and varies with, that of the front vehicle [11]. The state with a front vehicle is also called the car-following state.
With the development of ACC systems, following control has evolved from single-vehicle following control to multi-vehicle following control [12]. Multi-vehicle following control is an extension of single-vehicle following control and is also called platoon control. Platoon control therefore needs to consider not only the single-vehicle following behavior but also the coordinated control among the different following vehicles. To realize this coordination, the global information of the platoon must be fully considered. The object of study in this paper is thus a platoon of electric vehicles, the focus is its control strategy, and the economic performance of the platoon is fully taken into account.
In a one-dimensional vehicle platoon, only longitudinal motion is considered; in a two-dimensional platoon, both longitudinal and lateral motion must be considered. Among platoon control strategies, model predictive control (MPC) algorithms are widely applied, and this paper puts forward an improved distributed model predictive control (IDMPC) method for the platoon. In the IDMPC, the symmetrical weight matrices are decisive for the final control effect. Furthermore, coordinating the different following vehicles in the platoon requires the global information to be fully considered, and the QMIX algorithm [13] from multi-agent reinforcement learning can make full use of it. The highlight of this paper is therefore that, to control each following vehicle in the platoon in a coordinated way, the IDMPC weights are optimized with the QMIX algorithm.
The content of the paper is organized as follows: the related work is presented in Section 2; the vehicle model for electric vehicle is built in Section 3; the platoon model is built in Section 4; the IDMPC is designed in Section 5; the IDMPC is verified in Section 6; the conclusions are drawn in Section 7.

2. Related Work

The control modes of MPC comprise distributed control and centralized control. Distributed MPC uses one MPC framework per following vehicle in the platoon, while centralized MPC uses a single MPC framework to control all following vehicles. Because centralized control leads to complex data interaction and an inflexible control system, distributed control is mostly used in existing research on platoon control. MPC offers an effective design and analysis methodology with good stability and robustness, and it is well suited to multivariable constrained control and multi-objective optimal control problems [14,15,16]. MPC predicts the future state from the current state and control variables of the system; since the future state is unknown, the future control variables must be adjusted continuously according to the system state. At each step, MPC computes a control sequence from the current state and applies only the first control value to the system.
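The receding-horizon idea described above — optimize a control sequence from the current state, apply only the first element, then re-solve — can be sketched for a simple double-integrator plant. This is a hypothetical stand-in for the vehicle models used later; the horizon and weights are illustrative assumptions, not the paper's tuning.

```python
import numpy as np

# Double-integrator plant: x = [position, velocity], input = acceleration.
Ts = 0.1
A = np.array([[1.0, Ts], [0.0, 1.0]])
B = np.array([[0.5 * Ts**2], [Ts]])

def mpc_first_input(x0, p=10, q=1.0, r=0.1):
    """Solve an unconstrained finite-horizon quadratic problem by batch
    least squares and return only the first input (receding horizon)."""
    n, m = A.shape[0], B.shape[1]
    # Prediction matrices: X = Phi x0 + Gamma U over the horizon p.
    Phi = np.vstack([np.linalg.matrix_power(A, j + 1) for j in range(p)])
    Gamma = np.zeros((n * p, m * p))
    for row in range(p):
        for col in range(row + 1):
            blk = np.linalg.matrix_power(A, row - col) @ B
            Gamma[row*n:(row+1)*n, col*m:(col+1)*m] = blk
    Q = q * np.eye(n * p)
    R = r * np.eye(m * p)
    # Minimize X'QX + U'RU  =>  U = -(Gamma'Q Gamma + R)^-1 Gamma'Q Phi x0
    U = -np.linalg.solve(Gamma.T @ Q @ Gamma + R, Gamma.T @ Q @ Phi @ x0)
    return U[:m]  # apply only the first control move

x = np.array([5.0, 0.0])   # start 5 m away from the target, at rest
for _ in range(100):       # closed loop: re-solve at every sample
    u = mpc_first_input(x)
    x = A @ x + B @ u
```

Re-solving at every step is what distinguishes MPC from applying the whole open-loop sequence at once.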
For the control strategy of platoons of electric vehicles, existing research mainly relies on the MPC algorithm. In [17], a simple and effective MPC-based platoon control strategy for electric vehicles was put forward, which ensures the safety of the platoon during longitudinal cut-in maneuvers. In [18], a heterogeneous MPC-based control strategy was put forward, which saves energy in the longitudinal following process by adjusting the distances between adjacent vehicles. In [19], an MPC-based platoon control strategy was proposed in which the multiple objectives of longitudinal motion were optimized and economy was improved without sacrificing the other performance objectives. In [20], an ecological MPC-based platoon control strategy was proposed whose longitudinal speed profile accounts for both energy consumption and inter-vehicle distance. In [21], an energy-efficient MPC-based control strategy for connected electric vehicles was proposed, which minimizes the energy consumption of longitudinal platoon motion while fully considering the communication topology.
The existing studies on control strategies for platoons of electric vehicles consider only longitudinal motion and ignore lateral motion, although the lateral motion of a vehicle can be handled by a lane keeping system [22]. Studying the control strategy of a two-dimensional platoon of electric vehicles requires considering both the coordinated control among the following vehicles and the coupling between the longitudinal and lateral motion of each vehicle.

3. Vehicle Model for Electric Vehicle

The target vehicle of this paper is an electric vehicle with a front-mounted drive motor, and the vehicle configuration is shown in Figure 1. The power system of an electric vehicle differs from that of a traditional fuel vehicle, consisting mainly of a drive motor, a main gearbox and a power battery. The braking system combines a traditional hydraulic braking system with motor regenerative braking. The target vehicle dynamics model is built in Carsim; since Carsim provides no electric vehicle dynamics model, the motor and battery models are supplied externally from Simulink.
The electric motor is a key component of the power system, acting as a motor during driving and as a generator during braking. While driving, the drive motor is the power source of the vehicle and is fed by the battery. During braking or coasting, once braking energy recovery is enabled, the drive motor can act as a generator, providing part or all of the braking torque and converting the kinetic energy of the vehicle into electrical energy stored in the power battery [23]. The electric vehicle in this paper adopts a permanent magnet synchronous motor (PMSM) with a wide speed range, high power density and high efficiency. In modeling the drive motor, the complex dynamic characteristics of the PMSM are largely neglected; the focus is on its mechanical and electrical power output and efficiency characteristics, and the internal model is simplified as much as possible [24]. The external characteristics of the motor are shown in Figure 2a.
Power batteries mainly include lead-acid, nickel-based and lithium-based batteries; among these, lithium batteries offer a higher voltage level, high energy and power density, good stability and no pollution, and they are now widely used in electric vehicles. A lithium battery is a complex nonlinear electrochemical energy storage system, so this paper ignores its chemical characteristics and builds a power battery model based on the equivalent internal resistance [25]. The battery model is presented in Figure 2b.

4. Model of Vehicle Platoon

In Figure 3, the modeling process of the vehicle platoon is presented. In Figure 3a, the number of vehicles is n + 1 (n ≥ 2), and vehicle 0 is the leading vehicle. The platoon model is constructed in Figure 3b with vehicle i − 1 as the front vehicle and vehicle i as the following vehicle. Both the longitudinal motion in Figure 3c and the lateral motion in Figure 3d are considered in the platoon model.
With the model in [23,26], the platoon model is built. The state equation of the platoon model is defined as:
$$x_i(k+1) = A_i x_i(k) + B_i u_i(k) + G_i w_i(k)$$
where
$$
\begin{aligned}
x_i(k) &= \big[\Delta s_i(k),\, v_{x,i}(k),\, v_{rel,i}(k),\, a_{x,i}(k),\, j_{x,i}(k),\, e_{s,i}(k),\, \dot e_{s,i}(k),\, e_{\alpha,i}(k),\, \dot e_{\alpha,i}(k)\big]^T \\
u_i(k) &= \big[a_{xdes,i}(k),\, \delta_{f,i}(k)\big]^T,\qquad
w_i(k) = \big[a_{fx,i}(k),\, \dot\Psi_{des,i}(k)\big]^T \\
A_i &= \begin{bmatrix} A_{1,i} & 0 \\ 0 & A_{2,i} \end{bmatrix},\qquad
B_i = \begin{bmatrix} B_{1,i} & 0 \\ 0 & B_{2,i} \end{bmatrix},\qquad
G_i = \begin{bmatrix} G_{1,i} & 0 \\ 0 & G_{2,i} \end{bmatrix} \\
A_{1,i} &= \begin{bmatrix}
1 & 0 & T_s & -\tfrac{1}{2}T_s^2 & 0 \\
0 & 1 & 0 & T_s & 0 \\
0 & 0 & 1 & -T_s & 0 \\
0 & 0 & 0 & 1-\tfrac{T_s}{\tau_l} & 0 \\
0 & 0 & 0 & -\tfrac{1}{\tau_l} & 0
\end{bmatrix} \\
A_{2,i} &= \begin{bmatrix}
1 & T_s & 0 & 0 \\
0 & 1-\tfrac{2C_{\alpha f}+2C_{\alpha r}}{M_{veh}\, v_{x,i}}T_s & \tfrac{2C_{\alpha f}+2C_{\alpha r}}{M_{veh}}T_s & -\tfrac{2C_{\alpha f} l_1 - 2C_{\alpha r} l_2}{M_{veh}\, v_{x,i}}T_s \\
0 & 0 & 1 & T_s \\
0 & -\tfrac{2C_{\alpha f} l_1 - 2C_{\alpha r} l_2}{I_z\, v_{x,i}}T_s & \tfrac{2C_{\alpha f} l_1 - 2C_{\alpha r} l_2}{I_z}T_s & 1-\tfrac{2C_{\alpha f} l_1^2 + 2C_{\alpha r} l_2^2}{I_z\, v_{x,i}}T_s
\end{bmatrix} \\
B_{1,i} &= \left[0,\, 0,\, 0,\, \tfrac{T_s}{\tau_l},\, \tfrac{1}{\tau_l}\right]^T,\qquad
B_{2,i} = \left[0,\, \tfrac{2C_{\alpha f}}{M_{veh}}T_s,\, 0,\, \tfrac{2C_{\alpha f} l_1}{I_z}T_s\right]^T \\
G_{1,i} &= \left[\tfrac{1}{2}T_s^2,\, 0,\, T_s,\, 0,\, 0\right]^T,\qquad
G_{2,i} = \left[0,\, -\tfrac{2C_{\alpha f} l_1 - 2C_{\alpha r} l_2}{M_{veh}\, v_{x,i}}T_s - v_{x,i}T_s,\, 0,\, -\tfrac{2C_{\alpha f} l_1^2 + 2C_{\alpha r} l_2^2}{I_z\, v_{x,i}}T_s\right]^T
\end{aligned}
$$
At time k, for the following vehicle i, the symbols in the platoon model are listed in Table 1.
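To make the longitudinal block concrete, the sketch below assembles $A_{1,i}$, $B_{1,i}$ and $G_{1,i}$ numerically and propagates the five longitudinal states one step. The values of $T_s$ and $\tau_l$, and the signs of the gap and relative-speed terms, follow the usual constant-headway car-following formulation and are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

Ts, tau_l = 0.1, 0.5   # sample time and actuator lag (illustrative values)

# Longitudinal states: [Delta_s, v_x, v_rel, a_x, j_x]
A1 = np.array([
    [1, 0, Ts, -0.5 * Ts**2, 0],
    [0, 1, 0,  Ts,           0],
    [0, 0, 1, -Ts,           0],
    [0, 0, 0,  1 - Ts/tau_l, 0],
    [0, 0, 0, -1/tau_l,      0],
])
B1 = np.array([0, 0, 0, Ts/tau_l, 1/tau_l])   # desired-acceleration input
G1 = np.array([0.5 * Ts**2, 0, Ts, 0, 0])     # front-vehicle acceleration

x = np.array([20.0, 15.0, 0.0, 0.0, 0.0])  # 20 m gap, 15 m/s, zero relative speed
u, w = 1.0, 0.0                            # request 1 m/s^2; front vehicle steady
x_next = A1 @ x + B1 * u + G1 * w
```

With the first-order actuator lag, the acceleration only moves a fraction $T_s/\tau_l$ of the way toward the request in one step, which is exactly what the fourth row encodes.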

5. IDMPC Strategy for Platoon

5.1. Distributed Control Structure for Platoon

The platoon control for electric vehicles needs to optimize multiple objectives. For each following vehicle, objectives related to both longitudinal and lateral motion must be optimized: safety, followability, comfortability, lane tracking, lateral stability and economic performance. For the whole platoon, platoon stability must be optimized as well; platoon stability means that the spacing error converges as it propagates along the platoon. The objectives of multi-vehicle following control therefore comprise safety, followability, platoon stability, comfortability, lane tracking, lateral stability and economic performance.
In Figure 4, the distributed control architecture applied to the platoon of electric vehicles is shown: n MPC controllers control the n following vehicles, respectively. To realize coordinated optimization of multi-vehicle following control, suitable weights must be selected in the objective function of each MPC controller, and this requires the global information to be considered. The DQN algorithm from single-agent reinforcement learning can hardly meet the needs of multi-vehicle following control, so the weights are selected with the QMIX algorithm from multi-agent reinforcement learning. QMIX adopts offline centralized training and online distributed application, which both realizes distributed control and makes full use of global information. Combining the weights optimized in real time with rolling optimization, the MPC algorithm realizes coordinated control of the longitudinal and lateral motions.

5.2. Distributed MPC Algorithm

In the platoon control, corresponding output variables are set in order to optimize the multiple objectives. Vehicle i − 1 and vehicle i are taken as the front vehicle and the following vehicle, respectively. The output variables are defined as:
$$y_i(k) = \big[\delta_{s,i}(k),\, v_{rel,i}(k),\, a_{x,i}(k),\, j_{x,i}(k),\, e_{s,i}(k),\, \dot e_{s,i}(k),\, e_{\alpha,i}(k),\, \dot e_{\alpha,i}(k)\big]^T$$
where, for the following vehicle i, δs,i is the error of spacing, and δs,i is described as:
$$\delta_{s,i}(k) = \Delta s_i(k) - \big(v_{x,i}(k)\, t_h + d_0\big)$$
where t_h denotes the headway time and d_0 denotes the safe spacing.
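The constant-time-headway spacing policy behind this error term translates directly into code. The default t_h = 1.5 s matches the homogeneous platoon of Section 6; d_0 = 5 m is an assumption taken from the minimum safe spacing discussed there.

```python
def spacing_error(delta_s, v_x, t_h=1.5, d_0=5.0):
    """Error between the actual gap delta_s and the constant-time-headway
    desired gap v_x * t_h + d_0 (all in SI units)."""
    return delta_s - (v_x * t_h + d_0)

# At 20 m/s with t_h = 1.5 s and d_0 = 5 m, the desired gap is 35 m.
```

A positive error means the follower is further back than desired, a negative error means it is too close.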
The output variables can be described as:
$$y_i(k) = C_i x_i(k) - Z_i$$
where
$$
C_i = \begin{bmatrix} C_{1,i} & 0 \\ 0 & C_{2,i} \end{bmatrix},\qquad
C_{1,i} = \begin{bmatrix}
1 & -t_h & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix},\qquad
C_{2,i} = I_4,\qquad
Z_i = \begin{bmatrix} Z_{1,i} \\ 0 \end{bmatrix},\qquad
Z_{1,i} = [d_0,\, 0,\, 0,\, 0]^T
$$
To guarantee followability, the error of spacing and the relative speed should be minimized; to guarantee comfortability, the longitudinal acceleration and jerk should be minimized [23]; to guarantee economic performance, the desired longitudinal acceleration should be minimized for reducing the energy consumption [27]; to guarantee lane tracking, the minimization should be made for the position deviation in lateral direction and directional deviation [26]; to guarantee stability in lateral direction, the minimization should be made for the derivatives of position deviation in lateral direction and directional deviation, and for the steering angle [26]. Thus, to perform optimization operations on multiple objectives in the multi-vehicle following process, the minimization should be made for the output variables and control variables:
$$\text{Minimization:}\quad \begin{cases} \min |y_i(k)| \\ \min |u_i(k)| \end{cases}$$
To obtain a smooth response of the multi-vehicle following system, the output variables are smoothed into the reference trajectory as follows:
$$y_{ref,i}(k+j) = \varphi_i^{\,j}\, y_i(k)$$
where
$$
\varphi_i = \begin{bmatrix} \varphi_{1,i} & 0 \\ 0 & \varphi_{2,i} \end{bmatrix},\qquad
\varphi_{1,i} = \mathrm{diag}\big(\rho_\delta,\, \rho_v,\, \rho_a,\, \rho_j\big),\qquad
\varphi_{2,i} = \mathrm{diag}\big(\rho_{e_s},\, \rho_{\dot e_s},\, \rho_{e_\alpha},\, \rho_{\dot e_\alpha}\big)
$$
where $\rho_\delta$, $\rho_v$, $\rho_a$, $\rho_j$, $\rho_{e_s}$, $\rho_{\dot e_s}$, $\rho_{e_\alpha}$ and $\rho_{\dot e_\alpha}$ are the smoothing factors of $\delta_s$, $v_{rel}$, $a_x$, $j_x$, $e_s$, $\dot e_s$, $e_\alpha$ and $\dot e_\alpha$, respectively.
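The smoothing above is a geometric decay of the current outputs toward zero: applying the diagonal matrix $\varphi_i$ j times multiplies each output by its factor to the j-th power. A small sketch (the factor values are illustrative assumptions):

```python
import numpy as np

def reference_trajectory(y, rho, j):
    """Smoothed reference y_ref(k+j) = phi^j y(k), with phi = diag(rho)."""
    phi = np.diag(rho)
    return np.linalg.matrix_power(phi, j) @ y

y0 = np.array([2.0, 1.0])    # e.g. current spacing error and relative speed
rho = np.array([0.5, 0.8])   # smoothing factors in (0, 1)
y_ref_2 = reference_trajectory(y0, rho, 2)
```

With factors inside (0, 1), each reference component shrinks monotonically over the prediction horizon, which is what gives the MPC a smooth target instead of a step to zero.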
For the multi-vehicle following control, the constraints are described as follows:
$$
\text{s.t.}\quad \begin{cases}
\Delta s_i(k) \ge d_c \\
v_{x\min} \le v_{x,i}(k) \le v_{x\max} \\
a_{x\min} \le a_{x,i}(k) \le a_{x\max} \\
j_{x\min} \le j_{x,i}(k) \le j_{x\max} \\
u_{1\min} \le u_{1,i}(k) \le u_{1\max} \\
u_{2\min} \le u_{2,i}(k) \le u_{2\max}
\end{cases}
$$
where, u1,i and u2,i are the desired longitudinal acceleration and the targeted value of front steering angle for the following vehicle i, respectively.
A homogeneous platoon means that the vehicles equipped with the ACC system are exactly the same and the controller parameters are identical. A heterogeneous platoon means that the ACC-equipped vehicles come from different automobile manufacturers and component suppliers, so the controllers do not follow uniform standards; this paper mainly considers the difference in headway time t_h.
For guaranteeing the platoon stability of the homogeneous platoon, the corresponding constraints [28] are defined as follows:
$$t_h > 2\tau_l$$
For guaranteeing the platoon stability of the heterogeneous platoon, the corresponding constraints [28] are defined as follows:
$$\begin{cases} t_{h,i} > 2\tau_{l,i} \\ t_{h,i} \le t_{h,i-1} \end{cases}$$
where t_{h,i} and t_{h,i−1} are the headway times of the following vehicles i and i − 1, respectively, and τ_{l,i} is the lag time of the following vehicle i.
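Both stability constraints can be checked mechanically for a given platoon configuration. The sketch below assumes headways are listed from vehicle 1 to vehicle n, with one lag time per vehicle; the non-increasing-headway condition is the heterogeneous-platoon reconstruction used above.

```python
def platoon_string_stable(headways, lags):
    """Check the headway constraints: t_h,i > 2*tau_l,i for every
    following vehicle, and headways non-increasing along the platoon."""
    ok_lag = all(th > 2 * tl for th, tl in zip(headways, lags))
    ok_order = all(headways[i] <= headways[i - 1]
                   for i in range(1, len(headways)))
    return ok_lag and ok_order

# Heterogeneous platoon of Section 6: headways 1.5, 1.4, 1.3, 1.2 s.
# The lag value 0.5 s is an illustrative assumption.
```

For a homogeneous platoon the ordering check is trivially satisfied, so the function reduces to t_h > 2τ_l.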
For each following vehicle, the longitudinal and lateral motions are optimized with the distributed MPC algorithm. The objective function is described as:
$$
J_i = \sum_{j=1}^{p} \big[\hat y_{p,i}(k+j|k) - y_{ref,i}(k+j)\big]^T Q_i \big[\hat y_{p,i}(k+j|k) - y_{ref,i}(k+j)\big] + \sum_{j=0}^{m-1} u_i(k+j)^T R_i\, u_i(k+j)
$$
where p is the prediction horizon and m is the control horizon.
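The objective function above is a plain sum of weighted quadratic forms, which can be evaluated directly given predicted outputs, references and a candidate input sequence. This is an evaluation sketch only (an actual MPC solver would minimize it subject to the constraints); the variable names are assumptions.

```python
import numpy as np

def mpc_cost(y_pred, y_ref, u_seq, Q, R):
    """Quadratic MPC cost: output-tracking term over the prediction
    horizon plus control-effort term over the control horizon."""
    J = 0.0
    for yp, yr in zip(y_pred, y_ref):   # prediction horizon
        e = yp - yr
        J += e @ Q @ e                  # e^T Q e
    for u in u_seq:                     # control horizon
        J += u @ R @ u                  # u^T R u
    return J
```

Because Q and R are diagonal, each weight directly scales one output or input channel, which is what makes the weight selection in Section 5.3 meaningful.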
By combining the objective function and the constraints, the distributed MPC algorithm can be applied to the multi-vehicle following system to calculate the control variables. In Equation (10), the weight matrices Q_i and R_i are as follows:
$$Q_i = \mathrm{diag}\big(w_{\delta,i}(k),\, w_{v,i}(k),\, w_{a,i}(k),\, w_{j,i}(k),\, w_{e_s,i}(k),\, w_{\dot e_s,i}(k),\, w_{e_\alpha,i}(k),\, w_{\dot e_\alpha,i}(k)\big)$$
$$R_i = \mathrm{diag}\big(w_{u_1,i}(k),\, w_{u_2,i}(k)\big)$$
where, at sampling time k and for the following vehicle i, $w_{\delta,i}(k)$, $w_{v,i}(k)$, $w_{a,i}(k)$, $w_{j,i}(k)$, $w_{e_s,i}(k)$, $w_{\dot e_s,i}(k)$, $w_{e_\alpha,i}(k)$ and $w_{\dot e_\alpha,i}(k)$ are the weights of $\delta_s$, $v_{rel}$, $a_x$, $j_x$, $e_s$, $\dot e_s$, $e_\alpha$ and $\dot e_\alpha$, respectively, while $w_{u_1,i}(k)$ and $w_{u_2,i}(k)$ are the weights of $u_{1,i}$ and $u_{2,i}$.
The various parameters of the distributed MPC algorithm are described in Table 2.

5.3. QMIX-Based Optimization Algorithm for Weights

In the weight optimization, only the weights of the Q_i matrix are optimized, while the weights of the R_i matrix are fixed to the constant 1; the Q_i weights are thus expressed relative to the R_i weights. Since the total number of Q_i weights is 8 × n, optimizing them by traditional modeling methods is difficult, so they are optimized with the QMIX algorithm from multi-agent reinforcement learning. The weight optimization algorithm is designed according to the principles of multi-agent reinforcement learning and QMIX.
When there are multiple agents in the environment, the environment becomes complicated due to competition and cooperation among the agents. Figure 5 shows the principle of multi-agent reinforcement learning [29]. During training, the policy of each agent keeps changing, so the environment is non-stationary from the viewpoint of any individual agent, and applying an MDP directly to a multi-agent system causes many problems. A Markov game (MG) is the extension of the MDP to multi-agent systems, and multi-agent reinforcement learning problems can be modeled as MGs [29].
For a multi-agent system consisting of n (n ≥ 2) agents, the mathematical form of MG is defined as follows:
$$M_m = (n,\, S,\, A_1, \ldots, A_n,\, P,\, R_1, \ldots, R_n,\, \gamma)$$
where n is the number of agents, S is the state set, A_i is the action set of agent i with i ∈ {1, 2, …, n}, P is the state transition function, R_i is the reward function of agent i, and γ ∈ [0, 1) is the reward discount factor. Compared with single-agent reinforcement learning, the difference is that the reward and transition functions of multi-agent reinforcement learning depend on the joint action a_joint = (a_1, a_2, …, a_n); r_i(s, a_1, …, a_n) is the reward obtained by agent i when the joint action (a_1, a_2, …, a_n) is taken in state s.
In multi-agent reinforcement learning, the relationship between agents is mainly divided into cooperation, competition and mixed settings. If the environment of an MDP is partially observable, the MDP is called a partially observable MDP (POMDP). When the agents cooperate, the MG can be converted into a decentralized POMDP (Dec-POMDP) model [29].
The mathematical form of Dec-POMDP is described as follows:
$$G = (n,\, S,\, A,\, O,\, R,\, Z,\, \gamma)$$
where, n is the number of agents, S is the state space, A is the action space, O is the observation function, R is the reward function, Z is the observation space, and γ is the discount factor.
Single-agent reinforcement learning that integrates deep neural networks is called single-agent deep reinforcement learning; the DQN algorithm is one example. Since the environment in a POMDP is only partially observable, the DQN [30] algorithm is not suitable for POMDPs and needs to be improved. The deep recurrent Q-network (DRQN) [31] replaces a fully connected layer after the DQN convolutional layers with a recurrent neural network, so that it can memorize historical states and thus improve performance under partial observability. Long short-term memory (LSTM) networks and gated recurrent unit (GRU) networks are two special types of recurrent neural networks; their gated structure helps them learn over longer horizons.
Multi-agent reinforcement learning that integrates deep neural networks is called multi-agent deep reinforcement learning. Its learning frameworks can be divided into fully centralized, fully distributed, and centralized learning with distributed application, the last of which is the most widely used. The value decomposition network (VDN) [32] adopts centralized learning with distributed application: for each agent i, a value function Q_i is computed independently, and the joint action-value function Q_tot is obtained by summation:
$$Q_{tot}(\tau_{joint},\, a_{joint}) = \sum_{i=1}^{n} Q_i(\tau_i,\, a_i;\, \theta_i)$$
VDN decomposes the overall value function by simple summation. When VDN is trained centrally, only the temporal-difference error of Q_tot needs to be computed, which is then backpropagated to the value function Q_i of each agent, effectively reducing the amount of computation. However, because VDN does not consider the global state information and obtains the joint action-value function by simply summing the individual value functions, it has clear limitations.
The QMIX [13] algorithm is an extension of the VDN algorithm. QMIX fits the global action-value function from the local action-value functions through a neural network and takes the global information into account during fitting. To ensure that the global and local action-value functions have the same monotonicity, the following condition must hold:
$$
\underset{a_{joint}}{\operatorname{argmax}}\; Q_{tot}(\tau_{joint},\, a_{joint}) =
\begin{pmatrix}
\underset{a_1}{\operatorname{argmax}}\; Q_1(\tau_1,\, a_1) \\
\vdots \\
\underset{a_n}{\operatorname{argmax}}\; Q_n(\tau_n,\, a_n)
\end{pmatrix}
$$
The above equation can be converted into the following form:
$$\frac{\partial Q_{tot}}{\partial Q_i} \ge 0,\quad \forall i \in \{1, \ldots, n\}$$
The framework of QMIX is shown in Figure 6; it consists of a mixing network and agent networks [13]. QMIX generates the weights and biases of the mixing network through hypernetworks, thus guaranteeing the monotonicity constraint; since the input of the hypernetworks is the global state information s_t, the mixing network can fit arbitrary monotonic functions. Each agent network is implemented as a DRQN, which memorizes historical states through a GRU; its inputs are the observation o_{i,t} of a single agent and its action a_{i,t−1} at the previous time step, and it outputs the individual value Q_i.
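The monotonicity trick can be shown in a stripped-down numpy sketch: the hypernetworks map the global state to mixing weights, and taking absolute values makes those weights non-negative, which enforces ∂Q_tot/∂Q_i ≥ 0. The dimensions, the plain linear hypernetworks, the ReLU (the full architecture uses ELU) and the omission of biases are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixing_network(q_values, state, W1_hyper, W2_hyper):
    """Monotonic mixing of per-agent Q-values into Q_tot. Absolute
    values of the hypernetwork outputs give non-negative mixing weights,
    so Q_tot is non-decreasing in every Q_i."""
    w1 = np.abs(W1_hyper @ state).reshape(len(q_values), -1)  # >= 0
    hidden = np.maximum(q_values @ w1, 0.0)                   # ReLU layer
    w2 = np.abs(W2_hyper @ state)                             # >= 0
    return hidden @ w2

n_agents, state_dim, hidden_dim = 3, 4, 8
W1 = rng.normal(size=(n_agents * hidden_dim, state_dim))
W2 = rng.normal(size=(hidden_dim, state_dim))
state = rng.normal(size=state_dim)
q = np.array([1.0, 2.0, 3.0])
q_tot = mixing_network(q, state, W1, W2)
```

Non-negative weights plus a monotone activation guarantee that raising any agent's Q-value can never lower Q_tot, which is exactly the condition that makes per-agent argmax consistent with the joint argmax.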
For the QMIX, the loss function is described as follows:
$$L(\theta) = \sum_{i=1}^{b} \Big[\big(y_i^{tot} - Q_{tot}(\tau_{joint},\, a_{joint},\, s;\, \theta)\big)^2\Big]$$
where
$$y^{tot} = r + \gamma\, \max_{a'_{joint}} Q_{tot}(\tau'_{joint},\, a'_{joint},\, s';\, \hat\theta)$$
where b is the number of samples drawn from the replay buffer, and θ and θ̂ are the parameters of the main network and the target network, respectively.
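Given target-network maxima for the next step, the batch TD loss above is a one-liner; the vectorized form and variable names below are assumptions, not the authors' code.

```python
import numpy as np

def qmix_td_loss(rewards, q_tot, q_tot_next_max, gamma=0.95):
    """Squared TD error over a batch: y_tot = r + gamma * max_a' Q_tot',
    where q_tot comes from the main network and q_tot_next_max from the
    target network."""
    y_tot = rewards + gamma * q_tot_next_max
    return np.sum((y_tot - q_tot) ** 2)
```

Only this scalar loss is differentiated during centralized training; the gradient then flows back through the mixing network into each agent's Q_i, as described for VDN above.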
The weight selection problem is first converted into a Dec-POMDP, and then the weights are optimized according to the QMIX principle.
At the time step t, the observation oi,t of agent i is described as:
$$o_{i,t} = \big(\delta_{s,i,t},\, v_{rel,i,t},\, a_{x,i,t},\, j_{x,i,t},\, e_{s,i,t},\, \dot e_{s,i,t},\, e_{\alpha,i,t},\, \dot e_{\alpha,i,t}\big)$$
At the time step t, the action ai,t of agent i is described as:
$$a_{i,t} = \big(w_{\delta_s,i,t},\, w_{v_{rel},i,t},\, w_{a_x,i,t},\, w_{j_x,i,t},\, w_{e_s,i,t},\, w_{\dot e_s,i,t},\, w_{e_\alpha,i,t},\, w_{\dot e_\alpha,i,t}\big)$$
At the time step t, the action-observation history of agent i is described as:
$$\tau_i = \big(a_{i,1},\, o_{i,1},\, \ldots,\, a_{i,t-1},\, o_{i,t}\big)$$
The global state st at the time step t is described as:
$$s_t = \big(o_{1,t},\, \ldots,\, o_{i,t},\, \ldots,\, o_{n,t}\big)$$
At the time step t, the agent i takes action ai, and the reward ri,t is set as follows:
$$
\begin{aligned}
r_{i,t} = {} & -\big(5(\delta_{s,i,t})^2 + 5(v_{rel,i,t})^2 + 50(a_{x,i,t})^2 + 50(j_{x,i,t})^2\big) \times 0.001 \\
& - \big(50(e_{s,i,t})^2 + 250(e_{\alpha,i,t})^2 + 50(\dot e_{s,i,t})^2 + 250(\dot e_{\alpha,i,t})^2\big) \times 0.001 \\
& - 10\zeta_1 + 2\zeta_2 + \zeta_3
\end{aligned}
$$
where
$$
\zeta_1 = \begin{cases} 1, & \text{if simulation finishes} \\ 0, & \text{otherwise} \end{cases} \qquad
\zeta_2 = \begin{cases} 1, & \text{if } v_{rel}^2 < 1 \\ 0, & \text{otherwise} \end{cases} \qquad
\zeta_3 = \begin{cases} 1, & \text{if } e_s^2 < 0.01 \\ 0, & \text{otherwise} \end{cases}
$$
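A direct transcription of this reward is given below. Interpreting the quadratic tracking terms as penalties (negative contributions) is the sign reconstruction used here, since the printed formula lost its signs; the function signature is an assumption.

```python
def step_reward(o, finished, v_rel, e_s):
    """Per-step reward for agent i: quadratic tracking penalties scaled
    by 0.001, plus sparse bonuses for small relative speed and small
    lateral position deviation, and a penalty on termination."""
    d_s, v_r, a_x, j_x, e_s_, ed_s, e_a, ed_a = o
    tracking = -(5*d_s**2 + 5*v_r**2 + 50*a_x**2 + 50*j_x**2) * 0.001
    lateral = -(50*e_s_**2 + 250*e_a**2 + 50*ed_s**2 + 250*ed_a**2) * 0.001
    zeta1 = 1.0 if finished else 0.0        # simulation finishes
    zeta2 = 1.0 if v_rel**2 < 1 else 0.0    # relative speed near zero
    zeta3 = 1.0 if e_s**2 < 0.01 else 0.0   # lateral deviation near zero
    return tracking + lateral - 10*zeta1 + 2*zeta2 + zeta3
```

With perfect tracking and an ongoing episode, the two sparse bonuses dominate and the reward is +3; the dense quadratic terms then only distinguish how far the agent is from that ideal.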
In QMIX, the agent network adopts a DRQN consisting of an input layer, a hidden layer and an output layer. The hidden layer is a GRU with 64 neurons, and the output layer uses the ReLU activation function. All weights produced by the output layer must be greater than or equal to 10^−4. For agent i, the input of the DRQN at step t is o_{i,t} and a_{i,t−1}; during training the output is Q_{i,t}, and during application the output is argmax_a Q_i.
The maximum number of training iterations for QMIX is 1,000,000, the learning rate is set to 5 × 10^−4, the replay buffer capacity is 5000, the batch size is 32, and the reward discount factor is 0.95. The parameters of the main network are copied to the target network every 200 iterations. In the ε-greedy policy, the initial ε is 0.99 and is decayed linearly at each iteration.
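The linearly decayed exploration schedule can be sketched as follows. The paper specifies only the initial ε = 0.99 and linear decay; the end value and decay length below are illustrative assumptions.

```python
def epsilon_schedule(step, eps_start=0.99, eps_end=0.05, decay_steps=50_000):
    """Linearly decayed epsilon for the epsilon-greedy policy, clipped
    at eps_end once decay_steps iterations have elapsed."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

Early in training the agent explores almost uniformly over the discretized weight actions; late in training it mostly exploits the learned Q-values.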

6. Simulation and Experiment Results

6.1. Simulation Experiment Settings

For platoon control, the optimization objectives are safety, followability, platoon stability, comfortability, lane tracking, lateral stability and economic performance. In the experiment, the platoon control strategy put forward in this paper is the target strategy and is called MPC_QMIX.
To validate the target strategy, two comparison strategies are set up. The first achieves weight optimization through an independent Q-learning (IQL) network [33]; the second adopts constant weights. The two comparison strategies are abbreviated MPC_IQL and MPC_ORI, respectively.
To analyze followability, lane tracking and lateral stability, the root mean square error (RMSE) and the coordinate deviation ΔXY between the longitudinal and lateral positions and their references [34] are defined as:
$$\Delta XY(i) = \sqrt{\big(X(i) - X_{ref}(i)\big)^2 + \big(Y(i) - Y_{ref}(i)\big)^2}$$
$$RMSE_{var} = \sqrt{\frac{1}{n_{tot}} \sum_{j=1}^{n_{tot}} \big(var(j)\big)^2}$$
$$n_{tot} = \frac{T}{T_s}$$
The symbols in Equations (25)–(27) are listed in Table 3.
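These two metrics translate directly into code; `var` stands for whichever deviation series is being evaluated (spacing error, relative speed, or the ΔXY series itself), and the function names are assumptions.

```python
import numpy as np

def coordinate_deviation(X, Y, X_ref, Y_ref):
    """Pointwise Euclidean deviation Delta_XY from the reference path."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    return np.sqrt((X - X_ref)**2 + (Y - Y_ref)**2)

def rmse(series):
    """Root mean square of a deviation series over n_tot = T / T_s samples."""
    v = np.asarray(series, dtype=float)
    return np.sqrt(np.mean(v**2))
```

Note that because the series being averaged are already deviations from a reference, the RMS of the raw series is the RMSE reported in the tables.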
In platoon control, platoons can be divided into homogeneous and heterogeneous platoons; in this paper, the difference between the two mainly concerns the headway time t_h. In a homogeneous platoon, all following vehicles share the same t_h; in a heterogeneous platoon, the t_h of each following vehicle differs. To ensure platoon stability, t_h must satisfy the corresponding constraint for each kind of platoon.
The platoon contains 5 vehicles, where vehicle 0 is the leading vehicle and vehicles 1, 2, 3 and 4 are the following vehicles. The headway t_h in the homogeneous platoon is set to 1.5 s, while the headways of vehicles 1, 2, 3 and 4 in the heterogeneous platoon are 1.5 s, 1.4 s, 1.3 s and 1.2 s, respectively.
In the validation of the target strategy, the simulation scenario is set as follows: the platoon follows a front vehicle whose speed changes continuously from 3 s to 143 s, with an approximately sinusoidal acceleration. The simulation scenario is summarized in Table 4.
As shown in Figure 7, the lane center line for the platoon control consists of 61 arcs of different lengths. The longitudinal speed and the constraint on the magnitude of the lateral acceleration of the following vehicles are considered when setting each arc radius. The simulation hardware and software are listed in Table 5, and the evaluation criteria for the objectives to be optimized are given in Table 6.

6.2. Analysis of Experimental Results

6.2.1. Analysis of Experimental Results for Homogeneous Platoon

In Figure 8, the spacing of the homogeneous platoon is shown. In the period 10–130 s, because the longitudinal velocity of the leading vehicle changes continuously, the spacings between the following vehicles also change continuously, but the spacing always remains greater than the minimum safe spacing (5 m); the safety of the platoon control is thus ensured.
Figure 9a,b show the spacing error and the longitudinal speed of the homogeneous platoon, respectively. In Figure 9a, since the speed of the leading vehicle changes during 10–130 s, a spacing error always exists, but it is small, so each vehicle tracks the desired spacing well. In Figure 9b, the longitudinal speed of vehicles 1, 2, 3 and 4 differs only slightly from that of the respective front vehicle, so the longitudinal velocity of the preceding vehicle is tracked well. In Table 7, the $RMSE_{\delta_s}$ for vehicles 1, 2, 3 and 4 is 0.9120 m, 0.9098 m, 0.8985 m and 0.8834 m, respectively, and the $RMSE_{v_{rel}}$ is 0.8676 m/s, 0.8127 m/s, 0.7569 m/s and 0.7051 m/s, respectively. The averages of $RMSE_{\delta_s}$ and $RMSE_{v_{rel}}$ over all following vehicles are 0.9009 m and 0.7856 m/s, respectively. In summary, followability is guaranteed during the multi-vehicle following process.
As shown in Figure 9a, the spacing error varies greatly during 0–20 s because the platoon has not yet reached a stable following state. During 20–150 s, the motion states of vehicles 1, 2, 3 and 4 change smoothly with those of their respective preceding vehicles, and the spacing error decreases steadily as the vehicle number increases. Since the spacing error converges as it propagates down the platoon, the stability of the platoon is guaranteed during the multi-vehicle following process.
Figure 10a,b show the longitudinal acceleration and jerk of the homogeneous platoon, respectively. In Figure 10a, the longitudinal acceleration of the leading vehicle varies approximately sinusoidally during 10–130 s. The longitudinal accelerations of vehicles 1, 2, 3 and 4 follow that of their respective preceding vehicles and vary smoothly. In Figure 10b, the absolute value of the jerk remains within the bound of 3 m/s3 at all times, so ride comfort is guaranteed during the multi-vehicle following process.
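The comfort check can be sketched by finite-differencing a sampled acceleration trace; the sample values and the 0.1 s step below are illustrative:

```python
def jerk_profile(accels, dt):
    """Finite-difference jerk j_k = (a_{k+1} - a_k) / dt from a sampled
    longitudinal-acceleration trace."""
    return [(a2 - a1) / dt for a1, a2 in zip(accels, accels[1:])]

def comfortable(accels, dt, bound=3.0):
    """True if every jerk sample stays within +/- bound (m/s^3),
    the 3 m/s^3 comfort bound used in the simulations."""
    return all(abs(j) <= bound for j in jerk_profile(accels, dt))

# hypothetical 0.1 s samples of an approximately sinusoidal acceleration
accels = [0.00, 0.20, 0.35, 0.45, 0.50]
jerks = jerk_profile(accels, 0.1)
```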
Figure 11 shows the vehicle trajectories of the homogeneous platoon. The reference trajectory is composed of 61 arcs with different radii, and the trajectories of vehicles 1, 2, 3 and 4 coincide well with it. In Table 7, the RMSE_ΔXY values for vehicles 1, 2, 3 and 4 are 0.0453 m, 0.0448 m, 0.0442 m and 0.0437 m, respectively, with an average of 0.0445 m over all following vehicles. Therefore, lane tracking is guaranteed during the multi-vehicle following process.
Figure 12 shows the responses related to the lateral motion of the homogeneous platoon: β, ay, δf and Ψ˙. The road curvature and the longitudinal speed are the main factors affecting lateral stability. The reference trajectory of the lane centerline is composed of 61 arcs with unequal radii, so there is a curvature difference at the junction of two adjacent arcs. When a vehicle first enters the curve, the curvature difference is large, so the lateral-motion responses fluctuate strongly. After entering the curve, the curvature difference between adjacent arcs is small, so the variation it causes is small and the lateral-motion responses are mainly affected by the longitudinal speed.
Figure 12a shows β. Throughout the simulation the longitudinal speed is relatively high, so vehicles 1, 2, 3 and 4 tend to drift outward on the curve and β is negative. Since the longitudinal speed of the leading vehicle is approximately sinusoidal, the speeds of the following vehicles are as well: when the longitudinal speed decreases, β gradually increases; when it increases, β gradually decreases; and when it approaches a constant value, β settles to a steady value. Figure 12b–d show ay, δf and Ψ˙, respectively. These three responses have similar trends and follow the speed of the preceding vehicle in real time, increasing as the longitudinal speed increases and decreasing as it decreases. In Table 7, the RMSE_β values for vehicles 1, 2, 3 and 4 are 0.0935 deg, 0.0930 deg, 0.0926 deg and 0.0923 deg; the RMSE_ay values are 0.8074 m/s2, 0.8036 m/s2, 0.7999 m/s2 and 0.7965 m/s2; the RMSE_δf values are 0.4424 deg, 0.4411 deg, 0.4398 deg and 0.4386 deg; and the RMSE_ψ˙ values are 2.0211 deg/s, 2.0154 deg/s, 2.0098 deg/s and 2.0042 deg/s, respectively. The corresponding averages are 0.0929 deg, 0.7951 m/s2, 0.4405 deg and 2.0126 deg/s. Since the variation ranges of these four responses are small, lateral stability is guaranteed during the multi-vehicle following process.
Figure 13a,b show the battery power and SOC of the homogeneous platoon, respectively. Because pure electric vehicles have regenerative braking, energy recovery must be considered during the multi-vehicle following process. When a vehicle accelerates, the battery power is positive, energy is consumed and the SOC decreases; when it decelerates, the battery power is negative, energy is recovered and the SOC increases. By conservation of energy, the energy consumed exceeds the energy recovered, so the SOC eventually declines. During the multi-vehicle following process, the battery power and SOC of vehicles 1, 2, 3 and 4 follow similar trends. As shown in Figure 13a, the variation range of the battery power of a rear vehicle is smaller than that of the vehicle ahead of it, and Figure 13b shows the same for the SOC. This is because, once each following vehicle reaches a stable following state, the spacing error converges as it propagates, and the longitudinal speed and acceleration of successive followers decrease in turn relative to their preceding vehicles. In Table 7, the ΔSOC/s values for vehicles 1, 2, 3 and 4 are 0.0053 km−1, 0.0051 km−1, 0.0049 km−1 and 0.0048 km−1, with an average of 0.0050 km−1. Therefore, the energy consumption of the vehicles in the platoon decreases sequentially. Since energy consumption is optimized and energy recovery is taken into account, economic performance is ensured during the multi-vehicle following process.
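The ΔSOC/s criterion has units of km−1, i.e. the SOC drop normalized by distance travelled. A sketch of how such a figure could be computed (the SOC and distance values are made up, not the paper's data):

```python
def soc_drop_per_km(soc_start, soc_end, distance_m):
    """SOC decrease per kilometre travelled (units km^-1); smaller is
    more economical. Assumes SOC expressed as a fraction in [0, 1]
    and distance in metres."""
    return (soc_start - soc_end) / (distance_m / 1000.0)

# hypothetical follower: SOC falls from 0.80 to 0.78 over 4 km
metric = soc_drop_per_km(0.80, 0.78, 4000.0)
```

Normalizing by distance rather than time makes the criterion comparable across runs with different speed profiles.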
For the homogeneous platoon, Table 8, Table 9, Table 10 and Table 11 compare the followability, lane tracking, lateral stability and economic performance of the three strategies. The comparison shows that MPC_QMIX achieves better lane tracking, lateral stability and economic performance, because it can exploit global information to optimize the weights of each MPC controller during the multi-vehicle following process. MPC_IQL can optimize the weights of each MPC controller using only locally observed information, and MPC_ORI applies constant weights to the MPC algorithm, so it cannot coordinate the different following vehicles. As for economic performance, MPC_QMIX produces more suitable spacing and longitudinal speed during the multi-vehicle following process, so the entire platoon consumes less energy.
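The monotonic value mixing that lets MPC_QMIX train on global state while each follower still acts on its own Q-value can be sketched as follows; the network sizes and the plain-numpy linear hypernetworks are a toy illustration, not the paper's architecture:

```python
import numpy as np

def qmix_total(agent_qs, state, w1, b1, w2, b2):
    """Toy QMIX-style mixer: per-agent Q-values are combined into Q_tot
    with state-conditioned mixing weights forced non-negative via abs(),
    so dQ_tot/dQ_i >= 0. This monotonicity means each agent (here, each
    MPC weight tuner) can act greedily on its own Q while training
    exploits the global state."""
    n = len(agent_qs)
    W1 = np.abs(state @ w1).reshape(n, -1)                # (n, h), >= 0
    hidden = np.maximum(agent_qs @ W1 + state @ b1, 0.0)  # ReLU, (h,)
    W2 = np.abs(state @ w2)                               # (h,), >= 0
    return float(hidden @ W2 + state @ b2)

n_agents, hidden_dim, state_dim = 4, 8, 6
rng = np.random.default_rng(0)
params = dict(
    w1=rng.standard_normal((state_dim, n_agents * hidden_dim)),
    b1=rng.standard_normal((state_dim, hidden_dim)),
    w2=rng.standard_normal((state_dim, hidden_dim)),
    b2=rng.standard_normal(state_dim),
)
state = rng.standard_normal(state_dim)
q = np.zeros(n_agents)
q_tot = qmix_total(q, state, **params)

# raising any single agent's Q never lowers Q_tot (monotonic mixing)
q_up = q.copy()
q_up[2] += 1.0
q_tot_up = qmix_total(q_up, state, **params)
```

An independent learner (as in MPC_IQL) would drop the mixer entirely and train each agent's Q on local observations alone, which is exactly the global-versus-local distinction drawn in the comparison above.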
In summary, the proposed control strategy for a platoon of electric vehicles can exploit global information during the multi-vehicle following process. Compared with the baseline strategies, MPC_QMIX controls the following vehicles in the platoon coordinately and achieves better lane tracking, lateral stability and economic performance while guaranteeing the other control objectives of the multi-vehicle following process.

6.2.2. Analysis of Experimental Results for Heterogeneous Platoon

As shown in Figure 14, the spacing of the heterogeneous platoon differs significantly from that of the homogeneous platoon. This is because the headway time of each following vehicle in the heterogeneous platoon is different, which leads to different desired spacings and therefore different spacings for each following vehicle. During 10–130 s, because the longitudinal speed of the leading vehicle changes continuously, the spacing changes in real time. The spacing always remains larger than the minimum safe spacing (5 m), so the safety of the multi-vehicle following process is guaranteed.
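A headway-dependent desired spacing of this kind is, in standard car-following form, a constant-time-headway policy. The sketch below assumes s_des = s0 + h·v with s0 set to the 5 m minimum safe spacing; the headway values are illustrative, and the paper's exact policy and parameters are not restated here:

```python
def desired_spacing(v_long, headway, s0=5.0):
    """Constant-time-headway spacing policy (an assumed, standard form):
    s_des = s0 + h * v, with s0 = 5 m taken from the minimum safe
    spacing quoted in the text. Heterogeneous platoons assign each
    follower its own headway h_i, so desired spacings differ."""
    return s0 + headway * v_long

# followers with illustrative headways of 1.2 s, 1.0 s, 0.8 s at 20 m/s
spacings = [desired_spacing(20.0, h) for h in (1.2, 1.0, 0.8)]
```

Because the desired spacing scales with speed, the spacing curves of the heterogeneous platoon fan out as the leading vehicle's speed varies, matching the behaviour described for Figure 14.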
Figure 15a,b show the spacing error and longitudinal speed of the heterogeneous platoon, respectively. In Figure 15a, since the headway time of each following vehicle differs, so does the desired spacing, and the spacing errors of the following vehicles differ significantly during 0–40 s. During 40–150 s, the spacing error settles into a steady pattern. Because the longitudinal speed of the preceding vehicle changes continuously, the spacing error never vanishes entirely, but it remains small; therefore, each vehicle can track its desired spacing. In Figure 15b, the longitudinal speeds of vehicles 1, 2, 3 and 4 differ very little from those of their respective preceding vehicles, so the longitudinal speed of the preceding vehicle is tracked well. In Table 12, the RMSE_δs values for vehicles 1, 2, 3 and 4 are 0.9231 m, 1.0914 m, 1.4290 m and 1.8470 m, respectively; the RMSE_vrel values are 0.8753 m/s, 0.7666 m/s, 0.6806 m/s and 0.6109 m/s, respectively. The averages of RMSE_δs and RMSE_vrel over all following vehicles are 1.3226 m and 0.7334 m/s, respectively. In summary, followability is guaranteed during the multi-vehicle following process.
From Figure 15a, the spacing error varies greatly during 0–40 s because the platoon has not yet reached a stable following state. During 40–150 s, the motion states of vehicles 1, 2, 3 and 4 change smoothly with those of their respective preceding vehicles, and the spacing error decreases steadily as the vehicle number increases. Since the spacing error converges as it propagates, the stability of the platoon is ensured during the multi-vehicle following process.
Figure 16a,b show the longitudinal acceleration and jerk of the heterogeneous platoon, respectively. In Figure 16a, the longitudinal acceleration of the leading vehicle is constant at 0 during 0–10 s. Since the headway time of each following vehicle differs, the corresponding desired spacing and hence the spacing-tracking behaviour differ as well. Vehicles 3 and 4 have smaller headway times and larger spacing errors, so their accelerations are larger than those of vehicles 1 and 2. During 10–130 s, the longitudinal acceleration of the leading vehicle is approximately sinusoidal, and each following vehicle's longitudinal acceleration follows that of its preceding vehicle while remaining smooth. In Figure 16b, the absolute value of the jerk remains within the bound of 3 m/s3 at all times, so ride comfort is guaranteed during the multi-vehicle following process.
Figure 17 shows the vehicle trajectories of the heterogeneous platoon. The reference trajectory is composed of 61 arcs with different radii. In Table 12, the RMSE_ΔXY values for vehicles 1, 2, 3 and 4 are 0.0456 m, 0.0450 m, 0.0445 m and 0.0441 m, respectively, with an average of 0.0448 m over all following vehicles. The trajectory of each following vehicle coincides well with the reference trajectory, so lane tracking is ensured during the multi-vehicle following process.
Figure 18 shows the lateral-stability responses of the heterogeneous platoon: β, ay, δf and Ψ˙. They are similar to those of the homogeneous platoon, because the headway time mainly affects the longitudinal motion of a vehicle and has little effect on its lateral motion; lateral stability is mainly governed by the road curvature and the longitudinal speed. The reference trajectory of the lane centerline is composed of 61 arcs with unequal radii, so there is a curvature difference at the junction of two adjacent arcs. At the beginning of the simulation, as a following vehicle enters the curve, the curvature difference is large and the lateral-stability responses change greatly. Once on the curve, the curvature difference between adjacent arcs is small, so the variation it causes is small and the lateral-motion responses are mainly affected by the longitudinal speed.
Figure 18a shows the change of β. Throughout the simulation the longitudinal speed is high, and each following vehicle tends to drift outward on the curve, so β is negative. Since the longitudinal speed of the leading vehicle is approximately sinusoidal, the speeds of the following vehicles are as well: when the longitudinal speed decreases, β gradually increases; when it increases, β gradually decreases; and when it approaches a constant value, β settles to a steady value. Figure 18b–d show ay, δf and Ψ˙, respectively. These three responses have similar trends and follow the preceding vehicle in real time, increasing as the longitudinal speed increases and decreasing as it decreases. In Table 12, the RMSE_β values for vehicles 1, 2, 3 and 4 are 0.0964 deg, 0.0945 deg, 0.0931 deg and 0.0927 deg; the RMSE_ay values are 0.8155 m/s2, 0.8077 m/s2, 0.8013 m/s2 and 0.7980 m/s2; the RMSE_δf values are 0.4441 deg, 0.4419 deg, 0.4401 deg and 0.4389 deg; and the RMSE_ψ˙ values are 2.0288 deg/s, 2.0193 deg/s, 2.0110 deg/s and 2.0055 deg/s, respectively. The corresponding averages over all following vehicles are 0.0942 deg, 0.8056 m/s2, 0.4413 deg and 2.0162 deg/s. Since the variation ranges of these four responses are small, lateral stability is ensured during the multi-vehicle following process.
Figure 19 shows the battery power and SOC of the heterogeneous platoon. Because electric vehicles have regenerative braking, energy recovery must be considered during the multi-vehicle following process. When a vehicle accelerates, the battery power is positive, energy is consumed and the SOC decreases; when it decelerates, the battery power is negative, energy is recovered and the SOC increases. By conservation of energy, the energy consumed exceeds the energy recovered, so the SOC eventually declines. During the multi-vehicle following process, the battery power and SOC of the following vehicles follow similar trends. As shown in Figure 19a, during 0–10 s the headway times of the following vehicles differ, so their desired spacings and spacing errors also differ. Vehicles 3 and 4 have relatively small headway times and large spacing errors, so they accelerate to reduce the errors and their battery power varies strongly. During 10–140 s, the battery power changes steadily with the longitudinal speed, and the variation range of the battery power of a rear vehicle is smaller than that of the vehicle ahead of it. As shown in Figure 19b, at the end of the experiment the SOC change of a rear vehicle is smaller than that of the vehicle ahead of it. This is because, once each following vehicle reaches a stable following state, the spacing error converges as it propagates, and the variation ranges of longitudinal speed and acceleration of successive followers decrease in turn relative to their preceding vehicles. Hence, the energy consumption of the following vehicles in the platoon decreases sequentially.
In Table 12, the ΔSOC/s values for vehicles 1, 2, 3 and 4 are 0.0053 km−1, 0.0051 km−1, 0.0050 km−1 and 0.0049 km−1, with an average of 0.0051 km−1 over all following vehicles. Since energy consumption is optimized and energy recovery is taken into account, economic performance is ensured for the multi-vehicle following process.
For heterogeneous platoon control, Table 13, Table 14, Table 15 and Table 16 compare the followability, lane tracking, lateral stability and economic performance of the three strategies. Compared with the homogeneous platoon, the RMSE of the spacing error in the heterogeneous platoon increases with the vehicle number, because the headway time of each following vehicle differs: the spacing errors of vehicles 1, 2, 3 and 4 increase sequentially at the beginning of the experiment and are largest at that stage. After the platoon enters a stable following state, the spacing error converges as it propagates. The comparison shows that MPC_QMIX achieves better lane tracking, lateral stability and economic performance, because it can exploit global information to optimize the weights of each MPC controller during the multi-vehicle following process. MPC_IQL can use only locally observed information to optimize the weights of each MPC controller, and MPC_ORI applies constant weights to the MPC algorithm, making it difficult to coordinate the different following vehicles. As for economic performance, MPC_QMIX produces more suitable spacing and longitudinal speed during the multi-vehicle following process, so the entire platoon consumes less energy.
The above analysis shows that the proposed control strategy for a platoon of electric vehicles can exploit global information during the multi-vehicle following process. Compared with the baseline strategies, it coordinates the optimization of the multi-vehicle following process while guaranteeing the other control objectives, and therefore achieves better lane tracking, lateral stability and economic performance.

7. Conclusions

Platoon control can be decomposed into multiple single-vehicle following controls, and the coordination among these single-vehicle following processes must be considered. This paper therefore studies a platoon control strategy covering both longitudinal and lateral motion. First, a platoon model is built. Then, the IDMPC strategy is designed on the basis of a distributed MPC algorithm. To control the different following vehicles coordinately, the weights of the distributed MPC algorithm are optimized with the QMIX algorithm, so that the distributed MPC obtains optimal control variables. Finally, the IDMPC is verified for both homogeneous and heterogeneous platoons. Compared with the baseline strategies, the proposed platoon control strategy can exploit global information, and the spacing and longitudinal speed during the multi-vehicle following process are more suitable. Therefore, the IDMPC achieves better lane tracking, lateral stability and economic performance while guaranteeing the other objectives of the multi-vehicle following process. Future research will consider higher-dimensional platoon control; for example, three-dimensional platoon control, covering lateral, longitudinal and vertical control, is of interest.

Author Contributions

Conceptualization, S.Z.; methodology, S.Z.; validation, S.Z.; formal analysis, S.Z. and X.Z.; writing—original draft preparation, S.Z.; writing—review and editing, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Natural Science Foundation of China under Grant U1713213, Grant U1913202, and Grant U1813205; in part by the Key-Area Research and Development Program of Guangdong Province under Grant 2019B090915001; in part by Shenzhen Technology Project under Grant JCYJ20180507182610734 and Grant JSGG20191129094012321.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yousefi, M.; Hajizadeh, A.; Soltani, M.N.; Hredzak, B. Predictive home energy management system with photovoltaic array, heat pump, and plug-in electric vehicle. IEEE Trans. Ind. Inf. 2021, 17, 430–440. [Google Scholar] [CrossRef]
  2. Badue, C.; Guidolini, R.; Carneiro, R.V.; Azevedo, P.; Cardoso, V.B.; Forechi, A.; Jesus, L.; Berriel, R.; Paixão, T.M.; Mutz, F.; et al. Self-driving cars: A survey. Expert Syst. Appl. 2021, 165, 113816. [Google Scholar] [CrossRef]
  3. Zhang, R.H.; Li, K.N.; Wu, Y.Y.; Zhao, D.Z.; Lv, Z.L.; Li, F.L.; Cheng, X.; Qiu, Z.J.; Yu, F. A multi-vehicle longitudinal trajectory collision avoidance strategy using AEBS with vehicle-infrastructure communication. IEEE Trans. Veh. Technol. 2022, 71, 1253–1266. [Google Scholar] [CrossRef]
  4. He, Z.J.; Qin, S.; Wei, Y.J.; Gao, B.Z.; Zhu, B.; He, L. A model predictive control approach with slip ratio estimation for electric motor antilock braking of battery electric vehicle. IEEE Trans. Ind. Electron. 2022, 69, 9225–9234. [Google Scholar] [CrossRef]
  5. Liu, S.; Li, Z.; Ji, H.; Wang, L.; Hou, Z. A novel anti-saturation model-free adaptive control algorithm and its application in the electric vehicle braking energy recovery system. Symmetry 2022, 14, 580. [Google Scholar] [CrossRef]
  6. Pei, W.; Zhang, Q.; Li, Y. Efficiency Optimization Strategy of Permanent Magnet Synchronous Motor for Electric Vehicles Based on Energy Balance. Symmetry 2022, 14, 164. [Google Scholar] [CrossRef]
  7. Wang, Y.; Wang, Z.; Han, K.; Tiwari, P.; Work, D.B. Gaussian process-based personalized adaptive cruise control. IEEE Trans. Intell. Transp. Syst. 2022, 1–12. Available online: https://ieeexplore.ieee.org/document/9774935/ (accessed on 13 May 2022). [CrossRef]
  8. Groelke, B.; Earnhardt, C.; Borek, J.; Vermillion, C. A predictive command governor-based adaptive cruise controller with collision avoidance for non-connected vehicle following. IEEE Trans. Intell. Transp. Syst. 2022, 23, 12276–12286. [Google Scholar] [CrossRef]
  9. Jia, D.; Chen, H.; Zheng, Z.; Watling, D.; Connors, R.; Gao, J.; Li, Y. An enhanced predictive cruise control system design with data-driven traffic prediction. IEEE Trans. Intell. Transp. Syst. 2022, 7, 8170–8183. [Google Scholar] [CrossRef]
  10. Ruan, S.; Ma, Y.; Yang, N.; Xiang, C.; Li, X. Real-time energy-saving control for HEVs in car-following scenario with a double explicit MPC approach. Energy 2022, 247, 123265. [Google Scholar] [CrossRef]
  11. Li, S.; Li, K.; Rajamani, R.; Wang, J. Model Predictive Multi-Objective Vehicular Adaptive Cruise Control. IEEE Trans. Control Syst. Technol. 2011, 19, 556–566. [Google Scholar] [CrossRef]
  12. Lamprecht, A.; Steffen, D.; Nagel, K.; Haecker, J.; Graichen, K. Optimal management and configuration methods for automobile cruise control systems. In Proceedings of the 18th Annual Conference on Systems Engineering Research (CSER), Charlottesville, VA, USA, 19–21 March 2020; pp. 429–439. [Google Scholar]
  13. Rashid, T.; Samvelyan, M.; Schroeder, C.; Farquhar, G.; Foerster, J.; Whiteson, S. Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4295–4304. [Google Scholar]
  14. Ly, K.; Mayekar, J.V.; Aguasvivas, S.; Keplinger, C.; Rentschler, M.E.; Correll, N. Electro-hydraulic rolling soft wheel: Design, hybrid dynamic modeling, and model predictive control. IEEE Trans. Rob. 2022, 1–20. Available online: https://ieeexplore.ieee.org/document/9766178/ (accessed on 2 May 2022). [CrossRef]
  15. Yeganegi, M.H.; Khadiv, M.; Prete, A.D.; Moosavian, S.A.A.; Righetti, L. Robust walking based on MPC with viability guarantees. IEEE Trans. Rob. 2022, 38, 1–16. [Google Scholar] [CrossRef]
  16. Wu, Z.; Xia, X.; Zhu, B. Model predictive control for improving operational efficiency of overhead cranes. Nonlinear Dyn. 2015, 79, 2639–2657. [Google Scholar] [CrossRef]
  17. Capuano, A.; Spano, M.; Musa, A.; Toscano, G.; Misul, D.A. Development of an adaptive model predictive control for platooning safety in battery electric vehicles. Energies 2021, 14, 5291. [Google Scholar] [CrossRef]
  18. Caiazzo, B.; Coppola, A.; Petrillo, A.; Santini, S. Distributed nonlinear model predictive control for connected autonomous electric vehicles platoon with distance-dependent air drag formulation. Energies 2021, 14, 5122. [Google Scholar] [CrossRef]
  19. Ma, H.; Chu, L.; Guo, J.H.; Wang, J.W.; Guo, C. Cooperative adaptive cruise control strategy optimization for electric vehicles based on SA-PSO with model predictive control. IEEE Access 2020, 8, 225745–225756. [Google Scholar] [CrossRef]
  20. Lopes, D.R.; Evangelou, A. Energy savings from an eco-cooperative adaptive cruise control: A BEV platoon investigation. In Proceedings of the 18th European Control Conference (ECC), Napoli, Italy, 25–28 June 2019; pp. 4160–4167. [Google Scholar]
  21. Ma, F.W.; Yang, Y.; Wang, J.W.; Liu, Z.Z.; Li, J.H.; Nie, J.H. Predictive energy-saving optimization based on nonlinear model predictive control for cooperative connected vehicles platoon with V2V communication. Energy 2019, 189, 116120. [Google Scholar] [CrossRef]
  22. Chen, J.; Sun, D.; Zhao, M.; Li, Y.; Liu, Z. A new lane keeping method based on human-simulated intelligent control. IEEE Trans. Intell. Transp. Syst. 2022, 23, 7058–7069. [Google Scholar] [CrossRef]
  23. Zhang, S.; Zhuan, X.T. Study on adaptive cruise control strategy for battery electric vehicle. Math. Probl. Eng. 2019, 2019, 7971594. [Google Scholar] [CrossRef]
  24. Li, L.; Zhang, Y.B.; Yang, C.; Yang, B.J.; Martinez, M. Model predictive control-based efficient energy recovery control strategy for regenerative braking system of hybrid electric bus. Energy Convers. Manag. 2016, 111, 299–314. [Google Scholar] [CrossRef]
  25. Abdollahi, A.; Han, X.; Avvari, G.; Raghunathan, N.; Balasingam, B.; Pattipati, K.R.; Bar-Shalom, Y. Optimal battery charging, Part I: Minimizing time-to-charge, energy loss, and temperature rise for OCV-resistance battery model. J. Power Sources. 2016, 303, 388–398. [Google Scholar] [CrossRef] [Green Version]
  26. Zhang, S.; Zhuan, X.T.; Fang, Y.T.; Cheng, J. Model-predictive optimization for lane keeping assistance system with exponential decay smoothing. In Proceedings of the 2021 IEEE International Conference on Robotics and Biomimetics, Sanya, China, 27–31 December 2021; pp. 1–6. [Google Scholar]
  27. Dang, R.; He, C.; Zhang, Q. ACC of electric vehicles with coordination control of fuel economy and tracking safety. In Proceedings of the Intelligent Vehicles Symposium, Alcala de Henares, Spain, 3–7 June 2012; pp. 240–245. [Google Scholar]
  28. Xiao, L.Y.; Gao, F. Practical string stability of platoon of adaptive cruise control vehicles. IEEE Trans. Intell. Transp. Syst. 2011, 12, 1184–1194. [Google Scholar] [CrossRef]
  29. Li, T.; Zhu, K.; Luong, N.C.; Niyato, D.; Wu, Q.; Zhang, Y.; Chen, B. Applications of Multi-agent reinforcement learning in future internet: A comprehensive survey. IEEE Commmun. Surv. Tutorials. 2022, 24, 1240–1279. [Google Scholar] [CrossRef]
  30. Mnih, V. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
  31. Hausknecht, M.; Stone, P. Deep recurrent Q-learning for partially observable MDPs. In Proceedings of the 2015 AAAI Fall Symposium Series, Arlington, VA, USA, 12–14 November 2015. [Google Scholar]
  32. Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-decomposition networks for cooperative multi-agent learning. arXiv 2017, arXiv:1706.05296. [Google Scholar]
  33. Tan, M. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, Honolulu, HI, USA, 27–29 July 1993; pp. 330–337. [Google Scholar]
  34. Batra, M.; McPhee, J.; Azad, N.L. Anti-jerk model predictive cruise control for connected electric vehicles with changing road conditions. In Proceedings of the 2017 11th Asian Control Conference (ASCC), Gold Coast, Australia, 17–20 December 2017; pp. 49–54. [Google Scholar]
Figure 1. The vehicle configuration of a front-drive electric vehicle.
Figure 2. The characteristics of a motor and the model of a battery. (a) External characteristics of the motor, (b) Battery model.
Figure 3. The modeling process for a vehicle platoon. (a) Vehicle model, (b) Car-following control of following vehicle i, (c) Longitudinal car-following process for following vehicle i, (d) Lateral lane keeping process for following vehicle i.
Figure 4. The distributed control architecture for electric vehicle platoon.
Figure 5. The principle of multi-agent reinforcement learning.
Figure 6. The principle of the QMIX algorithm.
Figure 7. The settings for the lane centerline.
Figure 8. Spacing of the homogeneous platoon.
Figure 9. The error of spacing and longitudinal speed of homogeneous platoon. (a) Error of spacing, (b) Longitudinal speed.
Figure 10. The longitudinal acceleration and jerk of the homogeneous platoon. (a) Longitudinal acceleration, (b) Jerk.
Figure 11. The vehicle trajectory of the homogeneous platoon.
Figure 12. The responses related to lateral motion of homogeneous platoon. (a) Sideslip angle of centroid, (b) Lateral acceleration, (c) Front steering angle, (d) Yaw rate.
Figure 13. The battery power and SOC of homogeneous platoon. (a) Battery power, (b) SOC.
Figure 14. Spacing of the heterogeneous platoon.
Figure 15. Error of spacing and longitudinal speed in a heterogeneous platoon. (a) Error of spacing, (b) Longitudinal speed.
Figure 16. The longitudinal acceleration and jerk of a heterogeneous platoon. (a) Longitudinal acceleration, (b) Jerk.
Figure 17. Vehicle trajectory of a heterogeneous platoon.
Figure 18. Responses related to the lateral motion of a heterogeneous platoon. (a) Sideslip angle of centroid, (b) Lateral acceleration, (c) Front steering angle, (d) Yaw rate.
Figure 19. The battery power and SOC of a heterogeneous platoon. (a) Battery power, (b) SOC.
Table 1. The symbols in the platoon model.
Symbol: Description
xi(k): state variables
si(k): spacing
vx,i(k): longitudinal speed
vrel,i(k): relative speed
ax,i(k): longitudinal acceleration
jx,i(k): jerk
es,i(k): lateral distance deviation
ės,i(k): derivative of the lateral distance deviation
eα,i(k): directional deviation
ėα,i(k): derivative of the directional deviation
ui(k): control variables
axdes,i(k): desired longitudinal acceleration
δf,i(k): targeted front steering angle
wi(k): system disturbance variable
afx,i(k): longitudinal acceleration of the front vehicle
ψ̇des,i(k): desired yaw rate
Ts: sampling time
τl: time lag
Mveh: mass of the electric vehicle
Iz: moment of inertia of the electric vehicle
Cαf: cornering stiffness of the front wheels
Cαr: cornering stiffness of the rear wheels
l1: distance between the centroid and the front axle
l2: distance between the centroid and the rear axle
Table 2. The various parameters for the distributed MPC algorithm.
Symbol: Value
dc: 5 m
d0: 7 m
vxmax: 36 m/s
vxmin: 0 m/s
axmax: 2.5 m/s²
axmin: −5.5 m/s²
u1max: 2.5 m/s²
u1min: −5.5 m/s²
u2max: 5 deg
u2min: −5 deg
jxmax: 3 m/s³
jxmin: −3 m/s³
ρδ: 0.94
ρv: 0.94
ρa: 0.94
ρj: 0.94
ρes: 0.6
ρės: 0.6
ρeα: 0.6
ρėα: 0.6
R: diag(1, 1)
p: 10
m: 5
T: 150 s
Ts: 0.05 s
th: 1.5 s
τl: 0.15 s
Mveh: 1550 kg
l1: 1.1 m
l2: 1.58 m
Cαf: 80 kN/rad
Cαr: 80 kN/rad
Iz: 2873 kg·m²
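The parameters d0 (standstill spacing), th (time headway) and dc (collision distance) suggest a constant time-headway spacing policy, in which the desired inter-vehicle spacing grows linearly with the ego speed. A minimal sketch under that assumption (the function names are ours):

```python
D0 = 7.0   # standstill spacing d0 (m), from Table 2
TH = 1.5   # time headway th (s), from Table 2
DC = 5.0   # collision distance dc (m), from Table 2

def desired_spacing(vx):
    """Constant time-headway policy: s_des = d0 + th * vx."""
    return D0 + TH * vx

def spacing_error(actual_spacing, vx):
    """Spacing error delta_s: actual spacing minus desired spacing."""
    return actual_spacing - desired_spacing(vx)

# At 25 m/s the desired spacing is 7 + 1.5 * 25 = 44.5 m, which matches
# the initial spacing ini_ds used in the simulation scenario (Table 4).
assert desired_spacing(25.0) == 44.5
```

The safety criterion of Table 6 then amounts to keeping the actual spacing above DC at all times, while the controller regulates `spacing_error` toward zero.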
Table 3. The symbols in the RMSE calculations.
Symbol: Description
X: actual horizontal coordinate
Y: actual vertical coordinate
Xref: reference horizontal coordinate
Yref: reference vertical coordinate
var(j): the evaluated variable (β, ay, δf or ψ̇) at moment j
ntot: the number of calculations
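Using the symbols of Table 3, the lane-tracking indicator RMSE_ΔXY can be read as the root-mean-square Euclidean deviation of the actual trajectory (X, Y) from the reference (Xref, Yref), and the lateral-stability indicators as plain RMSE values of the respective signals. A sketch under that reading (the function names are ours):

```python
import math

def rmse_xy(X, Y, Xref, Yref):
    """Root-mean-square Euclidean deviation between the actual
    trajectory (X, Y) and the reference trajectory (Xref, Yref)."""
    se = sum((x - xr) ** 2 + (y - yr) ** 2
             for x, y, xr, yr in zip(X, Y, Xref, Yref))
    return math.sqrt(se / len(X))

def rmse(var, ref=0.0):
    """Generic RMSE of a signal var(j) against a constant reference;
    used here for beta, a_y, delta_f and the yaw rate (reference 0)."""
    return math.sqrt(sum((v - ref) ** 2 for v in var) / len(var))
```

Both helpers divide by the number of samples, i.e., ntot in the paper's notation.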
Table 4. The settings of the simulation scenario.
T (s): 25
μ (the ground adhesion coefficient): 25
ini_vf (the initial longitudinal velocity of the front vehicle): 25 m/s
ini_vx (the initial longitudinal velocity of the following vehicle): 25 m/s
amp_ax (the amplitude of the front vehicle's longitudinal acceleration): 1 m/s²
ini_∆s (the initial spacing): 44.5 m
Table 5. The hardware and software for simulation.
Name: Property
GPU: NVIDIA TITAN V
CPU: Intel Core i7-4790 (3.60 GHz)
Memory: 32 GB (3200 MHz)
Operating system: Windows 10 (64-bit)
CUDA: 10.1
Python: 3.8.8
PyTorch: 1.7.1
CarSim: 2016.1
MATLAB: 2018a
Table 6. The evaluation criteria for objectives to be optimized.
Objective: Indicator
Safety: min |Δs| > 5 m
Followability: RMSE_δs and RMSE_vrel
Platoon stability: δs,i → 0
Comfortability: max |jerk| < 3 m/s³
Lane tracking: RMSE_ΔXY
Stability in lateral direction: RMSE_β, RMSE_ay, RMSE_δf and RMSE_ψ̇
Economic performance: ΔSOC/s
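The threshold criteria in Table 6 can be checked directly on logged signals. A minimal sketch (thresholds taken from the table; ΔSOC/s is interpreted here as the total SOC drop divided by distance travelled, consistent with its km⁻¹ unit; all function names are ours):

```python
def is_safe(spacings, dc=5.0):
    """Safety: the minimum spacing must stay above the collision distance."""
    return min(spacings) > dc

def is_comfortable(jerks, jerk_max=3.0):
    """Comfortability: |jerk| must stay below 3 m/s^3 throughout."""
    return max(abs(j) for j in jerks) < jerk_max

def soc_per_km(soc_start, soc_end, distance_m):
    """Economic indicator: SOC consumed per kilometre travelled (km^-1)."""
    return (soc_start - soc_end) / (distance_m / 1000.0)
```

The RMSE-based indicators (followability, lane tracking, lateral stability) are then compared across strategies in Tables 7 through 16, where lower values are better.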
Table 7. Multiple indicators of the MPC_QMIX strategy for a homogeneous platoon.
Objective | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
Followability | RMSE_δs (m) | 0.9120 | 0.9098 | 0.8985 | 0.8834 | 0.9009
Followability | RMSE_vrel (m/s) | 0.8676 | 0.8127 | 0.7569 | 0.7051 | 0.7856
Lane tracking | RMSE_ΔXY (m) | 0.0453 | 0.0448 | 0.0442 | 0.0437 | 0.0445
Stability in lateral direction | RMSE_β (deg) | 0.0935 | 0.0930 | 0.0926 | 0.0923 | 0.0929
Stability in lateral direction | RMSE_ay (m/s²) | 0.8074 | 0.8036 | 0.7999 | 0.7965 | 0.7951
Stability in lateral direction | RMSE_δf (deg) | 0.4424 | 0.4411 | 0.4398 | 0.4386 | 0.4405
Stability in lateral direction | RMSE_ψ̇ (deg/s) | 2.0211 | 2.0154 | 2.0098 | 2.0042 | 2.0126
Economic performance | ΔSOC/s (km⁻¹) | 0.0053 | 0.0051 | 0.0049 | 0.0048 | 0.0050
Table 8. A comparison of followability for three strategies in homogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | RMSE_δs (m) | 0.9120 | 0.9098 | 0.8985 | 0.8834 | 0.9009
MPC_QMIX | RMSE_vrel (m/s) | 0.8676 | 0.8127 | 0.7569 | 0.7051 | 0.7856
MPC_IQL | RMSE_δs (m) | 0.9413 | 0.9219 | 0.9080 | 0.8943 | 0.9164
MPC_IQL | RMSE_vrel (m/s) | 0.8953 | 0.8310 | 0.7703 | 0.7219 | 0.8046
MPC_ORI | RMSE_δs (m) | 1.1531 | 1.1304 | 1.1177 | 1.1035 | 1.1261
MPC_ORI | RMSE_vrel (m/s) | 0.8593 | 0.7990 | 0.7395 | 0.6905 | 0.7721
Table 9. A comparison of lane tracking for three strategies in homogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | RMSE_ΔXY (m) | 0.0453 | 0.0448 | 0.0442 | 0.0437 | 0.0445
MPC_IQL | RMSE_ΔXY (m) | 0.0470 | 0.0463 | 0.0457 | 0.0450 | 0.0460
MPC_ORI | RMSE_ΔXY (m) | 0.0683 | 0.0672 | 0.0665 | 0.0659 | 0.0670
Table 10. A comparison of stability in lateral direction for three strategies in homogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | RMSE_β (deg) | 0.0935 | 0.0930 | 0.0926 | 0.0923 | 0.0929
MPC_QMIX | RMSE_ay (m/s²) | 0.8074 | 0.8036 | 0.7999 | 0.7965 | 0.7951
MPC_QMIX | RMSE_δf (deg) | 0.4424 | 0.4411 | 0.4398 | 0.4386 | 0.4405
MPC_QMIX | RMSE_ψ̇ (deg/s) | 2.0211 | 2.0154 | 2.0098 | 2.0042 | 2.0126
MPC_IQL | RMSE_β (deg) | 0.0967 | 0.0961 | 0.0957 | 0.0954 | 0.0960
MPC_IQL | RMSE_ay (m/s²) | 0.8285 | 0.8247 | 0.8210 | 0.8179 | 0.8230
MPC_IQL | RMSE_δf (deg) | 0.4636 | 0.4622 | 0.4608 | 0.4595 | 0.4615
MPC_IQL | RMSE_ψ̇ (deg/s) | 2.1307 | 2.1232 | 2.1126 | 2.1067 | 2.1183
MPC_ORI | RMSE_β (deg) | 0.1037 | 0.1030 | 0.1025 | 0.1021 | 0.1028
MPC_ORI | RMSE_ay (m/s²) | 0.9163 | 0.9124 | 0.9081 | 0.9047 | 0.9104
MPC_ORI | RMSE_δf (deg) | 0.5568 | 0.5543 | 0.5521 | 0.5499 | 0.5533
MPC_ORI | RMSE_ψ̇ (deg/s) | 2.5332 | 2.5265 | 2.5001 | 2.4433 | 2.5008
Table 11. A comparison of economic performance for three strategies in homogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | ΔSOC/s (km⁻¹) | 0.0053 | 0.0051 | 0.0049 | 0.0048 | 0.0050
MPC_IQL | ΔSOC/s (km⁻¹) | 0.0057 | 0.0054 | 0.0051 | 0.0050 | 0.0053
MPC_ORI | ΔSOC/s (km⁻¹) | 0.0065 | 0.0061 | 0.0058 | 0.0056 | 0.0060
Table 12. Multiple indicators of the MPC_QMIX strategy for a heterogeneous platoon.
Objective | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
Followability | RMSE_δs (m) | 0.9231 | 1.0914 | 1.4290 | 1.8470 | 1.3226
Followability | RMSE_vrel (m/s) | 0.8753 | 0.7666 | 0.6806 | 0.6109 | 0.7334
Lane tracking | RMSE_ΔXY (m) | 0.0456 | 0.0450 | 0.0445 | 0.0441 | 0.0448
Stability in lateral direction | RMSE_β (deg) | 0.0964 | 0.0945 | 0.0931 | 0.0927 | 0.0942
Stability in lateral direction | RMSE_ay (m/s²) | 0.8155 | 0.8077 | 0.8013 | 0.7980 | 0.8056
Stability in lateral direction | RMSE_δf (deg) | 0.4441 | 0.4419 | 0.4401 | 0.4389 | 0.4413
Stability in lateral direction | RMSE_ψ̇ (deg/s) | 2.0288 | 2.0193 | 2.0110 | 2.0055 | 2.0162
Economic performance | ΔSOC/s (km⁻¹) | 0.0053 | 0.0051 | 0.0050 | 0.0049 | 0.0051
Table 13. The comparison of followability for three strategies in a heterogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | RMSE_δs (m) | 0.9231 | 1.0914 | 1.4290 | 1.8470 | 1.3226
MPC_QMIX | RMSE_vrel (m/s) | 0.8753 | 0.7666 | 0.6806 | 0.6109 | 0.7334
MPC_IQL | RMSE_δs (m) | 0.9413 | 1.4713 | 1.8580 | 2.1343 | 1.6012
MPC_IQL | RMSE_vrel (m/s) | 0.8953 | 0.7862 | 0.6918 | 0.6376 | 0.7527
MPC_ORI | RMSE_δs (m) | 1.1531 | 1.6885 | 2.0436 | 2.3242 | 1.8924
MPC_ORI | RMSE_vrel (m/s) | 0.8793 | 0.7475 | 0.6632 | 0.5944 | 0.7211
Table 14. The comparison of lane tracking for three strategies in a heterogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | RMSE_ΔXY (m) | 0.0456 | 0.0450 | 0.0445 | 0.0441 | 0.0448
MPC_IQL | RMSE_ΔXY (m) | 0.0470 | 0.0465 | 0.0459 | 0.0453 | 0.0462
MPC_ORI | RMSE_ΔXY (m) | 0.0683 | 0.0675 | 0.0670 | 0.0663 | 0.0673
Table 15. The comparison of stability in lateral direction for three strategies in a heterogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | RMSE_β (deg) | 0.0964 | 0.0945 | 0.0931 | 0.0927 | 0.0942
MPC_QMIX | RMSE_ay (m/s²) | 0.8155 | 0.8077 | 0.8013 | 0.7980 | 0.8056
MPC_QMIX | RMSE_δf (deg) | 0.4441 | 0.4419 | 0.4401 | 0.4389 | 0.4413
MPC_QMIX | RMSE_ψ̇ (deg/s) | 2.0288 | 2.0193 | 2.0110 | 2.0055 | 2.0162
MPC_IQL | RMSE_β (deg) | 0.0967 | 0.0963 | 0.0960 | 0.0957 | 0.0962
MPC_IQL | RMSE_ay (m/s²) | 0.8285 | 0.8252 | 0.8224 | 0.8190 | 0.8238
MPC_IQL | RMSE_δf (deg) | 0.4636 | 0.4627 | 0.4613 | 0.4603 | 0.4620
MPC_IQL | RMSE_ψ̇ (deg/s) | 2.1307 | 2.1257 | 2.1148 | 2.1103 | 2.1204
MPC_ORI | RMSE_β (deg) | 0.1037 | 0.1032 | 0.1028 | 0.1024 | 0.1030
MPC_ORI | RMSE_ay (m/s²) | 0.9163 | 0.9140 | 0.9107 | 0.9064 | 0.9119
MPC_ORI | RMSE_δf (deg) | 0.5568 | 0.5552 | 0.5534 | 0.5509 | 0.5541
MPC_ORI | RMSE_ψ̇ (deg/s) | 2.5332 | 2.5287 | 2.5029 | 2.4477 | 2.5031
Table 16. A comparison of economic performance for three strategies in a heterogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | ΔSOC/s (km⁻¹) | 0.0053 | 0.0051 | 0.0050 | 0.0049 | 0.0051
MPC_IQL | ΔSOC/s (km⁻¹) | 0.0057 | 0.0055 | 0.0052 | 0.0051 | 0.0054
MPC_ORI | ΔSOC/s (km⁻¹) | 0.0065 | 0.0062 | 0.0060 | 0.0058 | 0.0061
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Zhang, S.; Zhuan, X. Distributed Model Predictive Control for Two-Dimensional Electric Vehicle Platoon Based on QMIX Algorithm. Symmetry 2022, 14, 2069. https://doi.org/10.3390/sym14102069
