Article

Distributed Model Predictive Control for Two-Dimensional Electric Vehicle Platoon Based on QMIX Algorithm

by
Sheng Zhang
1 and
Xiangtao Zhuan
1,2,*
1
Department of Artificial Intelligence and Automation, School of Electrical Engineering and Automation, Wuhan University, Wuhan 430072, China
2
Shenzhen Research Institute, Wuhan University, Shenzhen 518057, China
*
Author to whom correspondence should be addressed.
Symmetry 2022, 14(10), 2069; https://doi.org/10.3390/sym14102069
Submission received: 13 August 2022 / Revised: 16 September 2022 / Accepted: 26 September 2022 / Published: 4 October 2022
(This article belongs to the Section Engineering and Materials)

Abstract
In this paper, an improved distributed model predictive control (IDMPC) method is proposed for a platoon of electric vehicles. The motion of the platoon is considered in two dimensions, covering both longitudinal and lateral motion. First, a platoon model is built from the car-following model of a single following vehicle. Then, the IDMPC strategy is designed with multiple objectives in mind. The symmetrical weight matrices in the IDMPC are decisive for the final control effect. To control the following vehicles in the platoon in a coordinated way, the IDMPC weights are optimized with the QMIX algorithm from multi-agent reinforcement learning. QMIX can fully exploit the global information of the multi-vehicle following process, so the IDMPC obtains optimal control variables. Finally, simulation and experimental results verify the IDMPC. Compared with the baseline strategies, the IDMPC achieves better lane tracking, lateral stability and economic performance.

1. Introduction

As the automotive sector develops, electric vehicles [1] and autonomous driving systems (ADSs) [2] are gradually becoming two important trends. In addition, vehicle control evolves from single-vehicle control to multi-vehicle control. Multi-vehicle control technology has therefore been widely applied in the vehicle platoon [3].
Electric vehicles are mainly composed of modules such as electric motors and batteries. They are driven by electricity: the electric motor converts the electrical energy stored in the battery into propulsion power [4]. Compared with fuel vehicles, electric vehicles are therefore environmentally friendly, as they consume no fossil energy and emit no polluting gases. However, charging takes a long time, which places a high demand on the range per charge. Since the range per charge reflects the economic performance of an electric vehicle, this economic performance is crucial [5,6].
As one of the ADSs, adaptive cruise control (ACC) is widely applied in automotive motion control [7]. ACC assists the driver in the longitudinal driving task and thereby eases the driving burden [8]. The working state of ACC is either the state without a front vehicle or the state with a front vehicle [9]. Without a front vehicle, the velocity of the following vehicle remains unchanged [10]. With a front vehicle, the velocity of the following vehicle is related to, and varies with, that of the front vehicle [11]. The state with a front vehicle is also called the car-following state.
With the development of ACC systems, following control has evolved from single-vehicle following control to multi-vehicle following control [12]. Multi-vehicle following control is an extension of single-vehicle following control and is also called platoon control. Platoon control therefore needs to consider not only the single-vehicle following behavior but also the coordinated control among the different following vehicles. To realize this coordination, the global information of the platoon must be fully considered. The object of study in this paper is thus a platoon of electric vehicles, the focus is its control strategy, and the economic performance of the platoon is fully taken into account.
In a one-dimensional vehicle platoon, only longitudinal motion is considered; in a two-dimensional platoon, both longitudinal and lateral motion must be considered. Among platoon control strategies, model predictive control (MPC) algorithms are widely applied, and this paper puts forward an improved distributed model predictive control (IDMPC) method for the platoon. In the IDMPC, the symmetrical weight matrices are decisive for the final control effect. Furthermore, coordinating the different following vehicles in the platoon requires the global information to be fully considered, and the QMIX algorithm [13] from multi-agent reinforcement learning can make full use of it. The highlight of this paper is therefore that, to control each following vehicle in the platoon in a coordinated way, the IDMPC weights are optimized with the QMIX algorithm.
The content of the paper is organized as follows: the related work is presented in Section 2; the vehicle model for electric vehicle is built in Section 3; the platoon model is built in Section 4; the IDMPC is designed in Section 5; the IDMPC is verified in Section 6; the conclusions are drawn in Section 7.

2. Related Work

The control modes of MPC comprise distributed control and centralized control. Distributed MPC uses one MPC framework per following vehicle in the platoon, while centralized MPC uses a single MPC framework to control all following vehicles. Because centralized control leads to complex data interaction and an inflexible control system, distributed control is mostly used in existing research on platoon control. MPC offers an effective design and analysis methodology with good stability and robustness, and it is well suited to multivariable constrained control and multi-objective optimal control problems [14,15,16]. MPC predicts the future state from the current state and control variables of the system; since the future state is unknown, the future control variables must be adjusted continuously according to the system state. At each step, MPC computes a control sequence from the current state and applies only the first control value to the system.
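The receding-horizon idea described above — optimize a control sequence from the current state, apply only the first element, then re-solve — can be sketched for a simple double-integrator plant. This is a hypothetical stand-in for the vehicle models used later; the horizon and weights are illustrative assumptions, not the paper's tuning.

```python
import numpy as np

# Double-integrator plant: x = [position, velocity], input = acceleration.
Ts = 0.1
A = np.array([[1.0, Ts], [0.0, 1.0]])
B = np.array([[0.5 * Ts**2], [Ts]])

def mpc_first_input(x0, p=10, q=1.0, r=0.1):
    """Solve an unconstrained finite-horizon quadratic problem by batch
    least squares and return only the first input (receding horizon)."""
    n, m = A.shape[0], B.shape[1]
    # Prediction matrices: X = Phi x0 + Gamma U over the horizon p.
    Phi = np.vstack([np.linalg.matrix_power(A, j + 1) for j in range(p)])
    Gamma = np.zeros((n * p, m * p))
    for row in range(p):
        for col in range(row + 1):
            blk = np.linalg.matrix_power(A, row - col) @ B
            Gamma[row*n:(row+1)*n, col*m:(col+1)*m] = blk
    Q = q * np.eye(n * p)
    R = r * np.eye(m * p)
    # Minimize X'QX + U'RU  =>  U = -(Gamma'Q Gamma + R)^-1 Gamma'Q Phi x0
    U = -np.linalg.solve(Gamma.T @ Q @ Gamma + R, Gamma.T @ Q @ Phi @ x0)
    return U[:m]  # apply only the first control move

x = np.array([5.0, 0.0])   # start 5 m away from the target, at rest
for _ in range(100):       # closed loop: re-solve at every sample
    u = mpc_first_input(x)
    x = A @ x + B @ u
```

Re-solving at every step is what distinguishes MPC from applying the whole open-loop sequence at once.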
For the control strategy of platoons of electric vehicles, existing research mainly relies on the MPC algorithm. In [17], a simple and effective MPC-based platoon control strategy for electric vehicles was put forward, which ensures the safety of the platoon during longitudinal cut-in maneuvers. In [18], a heterogeneous MPC-based control strategy was put forward, which saves energy in the longitudinal following process by adjusting the distances between adjacent vehicles. In [19], an MPC-based platoon control strategy was proposed in which the multiple objectives of longitudinal motion were optimized and economy was improved without sacrificing the other performance objectives. In [20], an ecological MPC-based platoon control strategy was proposed whose longitudinal speed profile accounts for both energy consumption and inter-vehicle distance. In [21], an energy-efficient MPC-based control strategy for connected electric vehicles was proposed, which minimizes the energy consumption of longitudinal platoon motion while fully considering the communication topology.
The existing studies on control strategies for platoons of electric vehicles consider only longitudinal motion and ignore lateral motion, although the lateral motion of a vehicle can be handled by a lane keeping system [22]. Studying the control strategy of a two-dimensional platoon of electric vehicles requires considering both the coordinated control among the following vehicles and the coupling between the longitudinal and lateral motion of each vehicle.

3. Vehicle Model for Electric Vehicle

The target vehicle of this paper is an electric vehicle with a front-mounted drive motor, and the vehicle configuration is shown in Figure 1. The power system of an electric vehicle differs from that of a traditional fuel vehicle, consisting mainly of a drive motor, a main gearbox and a power battery. The braking system combines a traditional hydraulic braking system with motor regenerative braking. The target vehicle dynamics model is built in Carsim; since Carsim provides no electric vehicle dynamics model, the motor and battery models are supplied externally from Simulink.
The electric motor is a key component of the power system, acting as a motor during driving and as a generator during braking. While driving, the drive motor is the power source of the vehicle and is fed by the battery. During braking or coasting, once braking energy recovery is enabled, the drive motor can act as a generator, providing part or all of the braking torque and converting the kinetic energy of the vehicle into electrical energy stored in the power battery [23]. The electric vehicle in this paper adopts a permanent magnet synchronous motor (PMSM) with a wide speed range, high power density and high efficiency. In modeling the drive motor, the complex dynamic characteristics of the PMSM are largely neglected; the focus is on its mechanical and electrical power output and efficiency characteristics, and the internal model is simplified as much as possible [24]. The external characteristics of the motor are shown in Figure 2a.
Power batteries mainly include lead-acid, nickel-based and lithium-based batteries; among these, lithium batteries offer a higher voltage level, high energy and power density, good stability and no pollution, and they are now widely used in electric vehicles. A lithium battery is a complex nonlinear electrochemical energy storage system, so this paper ignores its chemical characteristics and builds a power battery model based on the equivalent internal resistance [25]. The battery model is presented in Figure 2b.

4. Model of Vehicle Platoon

In Figure 3, the modeling process of the vehicle platoon is presented. In Figure 3a, the number of vehicles is n + 1 (n ≥ 2), and vehicle 0 is the leading vehicle. The platoon model is constructed in Figure 3b with vehicle i − 1 as the front vehicle and vehicle i as the following vehicle. Both the longitudinal motion in Figure 3c and the lateral motion in Figure 3d are considered in the platoon model.
With the model in [23,26], the platoon model is built. The state equation of the platoon model is defined as:
$$x_i(k+1) = A_i x_i(k) + B_i u_i(k) + G_i w_i(k)$$
where
$$
\begin{aligned}
x_i(k) &= \big[\Delta s_i(k),\, v_{x,i}(k),\, v_{rel,i}(k),\, a_{x,i}(k),\, j_{x,i}(k),\, e_{s,i}(k),\, \dot e_{s,i}(k),\, e_{\alpha,i}(k),\, \dot e_{\alpha,i}(k)\big]^T \\
u_i(k) &= \big[a_{xdes,i}(k),\, \delta_{f,i}(k)\big]^T,\qquad
w_i(k) = \big[a_{fx,i}(k),\, \dot\Psi_{des,i}(k)\big]^T \\
A_i &= \begin{bmatrix} A_{1,i} & 0 \\ 0 & A_{2,i} \end{bmatrix},\qquad
B_i = \begin{bmatrix} B_{1,i} & 0 \\ 0 & B_{2,i} \end{bmatrix},\qquad
G_i = \begin{bmatrix} G_{1,i} & 0 \\ 0 & G_{2,i} \end{bmatrix} \\
A_{1,i} &= \begin{bmatrix}
1 & 0 & T_s & -\tfrac{1}{2}T_s^2 & 0 \\
0 & 1 & 0 & T_s & 0 \\
0 & 0 & 1 & -T_s & 0 \\
0 & 0 & 0 & 1-\tfrac{T_s}{\tau_l} & 0 \\
0 & 0 & 0 & -\tfrac{1}{\tau_l} & 0
\end{bmatrix} \\
A_{2,i} &= \begin{bmatrix}
1 & T_s & 0 & 0 \\
0 & 1-\tfrac{2C_{\alpha f}+2C_{\alpha r}}{M_{veh}\, v_{x,i}}T_s & \tfrac{2C_{\alpha f}+2C_{\alpha r}}{M_{veh}}T_s & -\tfrac{2C_{\alpha f} l_1 - 2C_{\alpha r} l_2}{M_{veh}\, v_{x,i}}T_s \\
0 & 0 & 1 & T_s \\
0 & -\tfrac{2C_{\alpha f} l_1 - 2C_{\alpha r} l_2}{I_z\, v_{x,i}}T_s & \tfrac{2C_{\alpha f} l_1 - 2C_{\alpha r} l_2}{I_z}T_s & 1-\tfrac{2C_{\alpha f} l_1^2 + 2C_{\alpha r} l_2^2}{I_z\, v_{x,i}}T_s
\end{bmatrix} \\
B_{1,i} &= \left[0,\, 0,\, 0,\, \tfrac{T_s}{\tau_l},\, \tfrac{1}{\tau_l}\right]^T,\qquad
B_{2,i} = \left[0,\, \tfrac{2C_{\alpha f}}{M_{veh}}T_s,\, 0,\, \tfrac{2C_{\alpha f} l_1}{I_z}T_s\right]^T \\
G_{1,i} &= \left[\tfrac{1}{2}T_s^2,\, 0,\, T_s,\, 0,\, 0\right]^T,\qquad
G_{2,i} = \left[0,\, -\tfrac{2C_{\alpha f} l_1 - 2C_{\alpha r} l_2}{M_{veh}\, v_{x,i}}T_s - v_{x,i}T_s,\, 0,\, -\tfrac{2C_{\alpha f} l_1^2 + 2C_{\alpha r} l_2^2}{I_z\, v_{x,i}}T_s\right]^T
\end{aligned}
$$
At time k, for the following vehicle i, the symbols in the platoon model are listed in Table 1.
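To make the longitudinal block concrete, the sketch below assembles $A_{1,i}$, $B_{1,i}$ and $G_{1,i}$ numerically and propagates the five longitudinal states one step. The values of $T_s$ and $\tau_l$, and the signs of the gap and relative-speed terms, follow the usual constant-headway car-following formulation and are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

Ts, tau_l = 0.1, 0.5   # sample time and actuator lag (illustrative values)

# Longitudinal states: [Delta_s, v_x, v_rel, a_x, j_x]
A1 = np.array([
    [1, 0, Ts, -0.5 * Ts**2, 0],
    [0, 1, 0,  Ts,           0],
    [0, 0, 1, -Ts,           0],
    [0, 0, 0,  1 - Ts/tau_l, 0],
    [0, 0, 0, -1/tau_l,      0],
])
B1 = np.array([0, 0, 0, Ts/tau_l, 1/tau_l])   # desired-acceleration input
G1 = np.array([0.5 * Ts**2, 0, Ts, 0, 0])     # front-vehicle acceleration

x = np.array([20.0, 15.0, 0.0, 0.0, 0.0])  # 20 m gap, 15 m/s, zero relative speed
u, w = 1.0, 0.0                            # request 1 m/s^2; front vehicle steady
x_next = A1 @ x + B1 * u + G1 * w
```

With the first-order actuator lag, the acceleration only moves a fraction $T_s/\tau_l$ of the way toward the request in one step, which is exactly what the fourth row encodes.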

5. IDMPC Strategy for Platoon

5.1. Distributed Control Structure for Platoon

The platoon control for electric vehicles needs to optimize multiple objectives. For each following vehicle, objectives related to both longitudinal and lateral motion must be optimized: safety, followability, comfortability, lane tracking, lateral stability and economic performance. For the whole platoon, platoon stability must be optimized as well; platoon stability means that the spacing error converges as it propagates along the platoon. The objectives of multi-vehicle following control therefore comprise safety, followability, platoon stability, comfortability, lane tracking, lateral stability and economic performance.
In Figure 4, the distributed control architecture applied to the platoon of electric vehicles is shown: n MPC controllers control the n following vehicles, respectively. To realize coordinated optimization of multi-vehicle following control, suitable weights must be selected in the objective function of each MPC controller, and this requires the global information to be considered. The DQN algorithm from single-agent reinforcement learning can hardly meet the needs of multi-vehicle following control, so the weights are selected with the QMIX algorithm from multi-agent reinforcement learning. QMIX adopts offline centralized training and online distributed application, which both realizes distributed control and makes full use of global information. Combining the weights optimized in real time with rolling optimization, the MPC algorithm realizes coordinated control of the longitudinal and lateral motions.

5.2. Distributed MPC Algorithm

In the platoon control, corresponding output variables are set in order to optimize the multiple objectives. Vehicle i − 1 and vehicle i are taken as the front vehicle and the following vehicle, respectively. The output variables are defined as:
$$y_i(k) = \big[\delta_{s,i}(k),\, v_{rel,i}(k),\, a_{x,i}(k),\, j_{x,i}(k),\, e_{s,i}(k),\, \dot e_{s,i}(k),\, e_{\alpha,i}(k),\, \dot e_{\alpha,i}(k)\big]^T$$
where, for the following vehicle i, δs,i is the error of spacing, and δs,i is described as:
$$\delta_{s,i}(k) = \Delta s_i(k) - \big(v_{x,i}(k)\, t_h + d_0\big)$$
where t_h denotes the headway time and d_0 denotes the safe spacing.
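The constant-time-headway spacing policy behind this error term translates directly into code. The default t_h = 1.5 s matches the homogeneous platoon of Section 6; d_0 = 5 m is an assumption taken from the minimum safe spacing discussed there.

```python
def spacing_error(delta_s, v_x, t_h=1.5, d_0=5.0):
    """Error between the actual gap delta_s and the constant-time-headway
    desired gap v_x * t_h + d_0 (all in SI units)."""
    return delta_s - (v_x * t_h + d_0)

# At 20 m/s with t_h = 1.5 s and d_0 = 5 m, the desired gap is 35 m.
```

A positive error means the follower is further back than desired, a negative error means it is too close.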
The output variables can be described as:
$$y_i(k) = C_i x_i(k) - Z_i$$
where
$$
C_i = \begin{bmatrix} C_{1,i} & 0 \\ 0 & C_{2,i} \end{bmatrix},\qquad
C_{1,i} = \begin{bmatrix}
1 & -t_h & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix},\qquad
C_{2,i} = I_4,\qquad
Z_i = \begin{bmatrix} Z_{1,i} \\ 0 \end{bmatrix},\qquad
Z_{1,i} = [d_0,\, 0,\, 0,\, 0]^T
$$
To guarantee followability, the error of spacing and the relative speed should be minimized; to guarantee comfortability, the longitudinal acceleration and jerk should be minimized [23]; to guarantee economic performance, the desired longitudinal acceleration should be minimized for reducing the energy consumption [27]; to guarantee lane tracking, the minimization should be made for the position deviation in lateral direction and directional deviation [26]; to guarantee stability in lateral direction, the minimization should be made for the derivatives of position deviation in lateral direction and directional deviation, and for the steering angle [26]. Thus, to perform optimization operations on multiple objectives in the multi-vehicle following process, the minimization should be made for the output variables and control variables:
$$\text{Minimization:}\quad \begin{cases} \min |y_i(k)| \\ \min |u_i(k)| \end{cases}$$
To obtain a smooth response of the multi-vehicle following system, the output variables are smoothed into the reference trajectory as follows:
$$y_{ref,i}(k+j) = \varphi_i^{\,j}\, y_i(k)$$
where
$$
\varphi_i = \begin{bmatrix} \varphi_{1,i} & 0 \\ 0 & \varphi_{2,i} \end{bmatrix},\qquad
\varphi_{1,i} = \mathrm{diag}\big(\rho_\delta,\, \rho_v,\, \rho_a,\, \rho_j\big),\qquad
\varphi_{2,i} = \mathrm{diag}\big(\rho_{e_s},\, \rho_{\dot e_s},\, \rho_{e_\alpha},\, \rho_{\dot e_\alpha}\big)
$$
where $\rho_\delta$, $\rho_v$, $\rho_a$, $\rho_j$, $\rho_{e_s}$, $\rho_{\dot e_s}$, $\rho_{e_\alpha}$ and $\rho_{\dot e_\alpha}$ are the smoothing factors of $\delta_s$, $v_{rel}$, $a_x$, $j_x$, $e_s$, $\dot e_s$, $e_\alpha$ and $\dot e_\alpha$, respectively.
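The smoothing above is a geometric decay of the current outputs toward zero: applying the diagonal matrix $\varphi_i$ j times multiplies each output by its factor to the j-th power. A small sketch (the factor values are illustrative assumptions):

```python
import numpy as np

def reference_trajectory(y, rho, j):
    """Smoothed reference y_ref(k+j) = phi^j y(k), with phi = diag(rho)."""
    phi = np.diag(rho)
    return np.linalg.matrix_power(phi, j) @ y

y0 = np.array([2.0, 1.0])    # e.g. current spacing error and relative speed
rho = np.array([0.5, 0.8])   # smoothing factors in (0, 1)
y_ref_2 = reference_trajectory(y0, rho, 2)
```

With factors inside (0, 1), each reference component shrinks monotonically over the prediction horizon, which is what gives the MPC a smooth target instead of a step to zero.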
For the multi-vehicle following control, the constraints are described as follows:
$$
\text{s.t.}\quad \begin{cases}
\Delta s_i(k) \ge d_c \\
v_{x\min} \le v_{x,i}(k) \le v_{x\max} \\
a_{x\min} \le a_{x,i}(k) \le a_{x\max} \\
j_{x\min} \le j_{x,i}(k) \le j_{x\max} \\
u_{1\min} \le u_{1,i}(k) \le u_{1\max} \\
u_{2\min} \le u_{2,i}(k) \le u_{2\max}
\end{cases}
$$
where, u1,i and u2,i are the desired longitudinal acceleration and the targeted value of front steering angle for the following vehicle i, respectively.
A homogeneous platoon means that the vehicles equipped with the ACC system are exactly the same and the controller parameters are identical. A heterogeneous platoon means that the ACC-equipped vehicles come from different automobile manufacturers and component suppliers, so the controllers do not follow uniform standards; this paper mainly considers the difference in headway time t_h.
For guaranteeing the platoon stability of the homogeneous platoon, the corresponding constraints [28] are defined as follows:
$$t_h > 2\tau_l$$
For guaranteeing the platoon stability of the heterogeneous platoon, the corresponding constraints [28] are defined as follows:
$$\begin{cases} t_{h,i} > 2\tau_{l,i} \\ t_{h,i} \le t_{h,i-1} \end{cases}$$
where t_{h,i} and t_{h,i−1} are the headway times of the following vehicles i and i − 1, respectively, and τ_{l,i} is the lag time of the following vehicle i.
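Both stability constraints can be checked mechanically for a given platoon configuration. The sketch below assumes headways are listed from vehicle 1 to vehicle n, with one lag time per vehicle; the non-increasing-headway condition is the heterogeneous-platoon reconstruction used above.

```python
def platoon_string_stable(headways, lags):
    """Check the headway constraints: t_h,i > 2*tau_l,i for every
    following vehicle, and headways non-increasing along the platoon."""
    ok_lag = all(th > 2 * tl for th, tl in zip(headways, lags))
    ok_order = all(headways[i] <= headways[i - 1]
                   for i in range(1, len(headways)))
    return ok_lag and ok_order

# Heterogeneous platoon of Section 6: headways 1.5, 1.4, 1.3, 1.2 s.
# The lag value 0.5 s is an illustrative assumption.
```

For a homogeneous platoon the ordering check is trivially satisfied, so the function reduces to t_h > 2τ_l.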
For each following vehicle, the longitudinal and lateral motions are optimized with the distributed MPC algorithm. The objective function is described as:
$$
J_i = \sum_{j=1}^{p} \big[\hat y_{p,i}(k+j|k) - y_{ref,i}(k+j)\big]^T Q_i \big[\hat y_{p,i}(k+j|k) - y_{ref,i}(k+j)\big] + \sum_{j=0}^{m-1} u_i(k+j)^T R_i\, u_i(k+j)
$$
where p is the prediction horizon and m is the control horizon.
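The objective function above is a plain sum of weighted quadratic forms, which can be evaluated directly given predicted outputs, references and a candidate input sequence. This is an evaluation sketch only (an actual MPC solver would minimize it subject to the constraints); the variable names are assumptions.

```python
import numpy as np

def mpc_cost(y_pred, y_ref, u_seq, Q, R):
    """Quadratic MPC cost: output-tracking term over the prediction
    horizon plus control-effort term over the control horizon."""
    J = 0.0
    for yp, yr in zip(y_pred, y_ref):   # prediction horizon
        e = yp - yr
        J += e @ Q @ e                  # e^T Q e
    for u in u_seq:                     # control horizon
        J += u @ R @ u                  # u^T R u
    return J
```

Because Q and R are diagonal, each weight directly scales one output or input channel, which is what makes the weight selection in Section 5.3 meaningful.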
By combining the objective function and the constraints, the distributed MPC algorithm can be applied to the multi-vehicle following system to calculate the control variables. In Equation (10), the weight matrices Q_i and R_i are as follows:
$$Q_i = \mathrm{diag}\big(w_{\delta,i}(k),\, w_{v,i}(k),\, w_{a,i}(k),\, w_{j,i}(k),\, w_{e_s,i}(k),\, w_{\dot e_s,i}(k),\, w_{e_\alpha,i}(k),\, w_{\dot e_\alpha,i}(k)\big)$$
$$R_i = \mathrm{diag}\big(w_{u_1,i}(k),\, w_{u_2,i}(k)\big)$$
where, at sampling time k and for the following vehicle i, $w_{\delta,i}(k)$, $w_{v,i}(k)$, $w_{a,i}(k)$, $w_{j,i}(k)$, $w_{e_s,i}(k)$, $w_{\dot e_s,i}(k)$, $w_{e_\alpha,i}(k)$ and $w_{\dot e_\alpha,i}(k)$ are the weights of $\delta_s$, $v_{rel}$, $a_x$, $j_x$, $e_s$, $\dot e_s$, $e_\alpha$ and $\dot e_\alpha$, respectively, while $w_{u_1,i}(k)$ and $w_{u_2,i}(k)$ are the weights of $u_{1,i}$ and $u_{2,i}$.
The various parameters of the distributed MPC algorithm are described in Table 2.

5.3. QMIX-Based Optimization Algorithm for Weights

In the weight optimization, only the weights of the Q_i matrix are optimized, while the weights of the R_i matrix are fixed to the constant 1; the Q_i weights are thus expressed relative to the R_i weights. Since the total number of Q_i weights is 8 × n, optimizing them by traditional modeling methods is difficult, so they are optimized with the QMIX algorithm from multi-agent reinforcement learning. The weight optimization algorithm is designed according to the principles of multi-agent reinforcement learning and QMIX.
When there are multiple agents in the environment, the environment becomes complicated due to competition and cooperation among the agents. Figure 5 shows the principle of multi-agent reinforcement learning [29]. During training, the policy of each agent keeps changing, so the environment is non-stationary from the viewpoint of any individual agent, and applying an MDP directly to a multi-agent system causes many problems. A Markov game (MG) is the extension of the MDP to multi-agent systems, and multi-agent reinforcement learning problems can be modeled as MGs [29].
For a multi-agent system consisting of n (n ≥ 2) agents, the mathematical form of MG is defined as follows:
$$M_m = (n,\, S,\, A_1, \ldots, A_n,\, P,\, R_1, \ldots, R_n,\, \gamma)$$
where n is the number of agents, S is the state set, A_i is the action set of agent i with i ∈ {1, 2, …, n}, P is the state transition function, R_i is the reward function of agent i, and γ ∈ [0, 1) is the reward discount factor. Compared with single-agent reinforcement learning, the difference is that the reward and transition functions of multi-agent reinforcement learning depend on the joint action a_joint = (a_1, a_2, …, a_n); r_i(s, a_1, …, a_n) is the reward obtained by agent i when the joint action (a_1, a_2, …, a_n) is taken in state s.
In multi-agent reinforcement learning, the relationship between agents is mainly divided into cooperation, competition and mixed settings. If the environment of an MDP is partially observable, the MDP is called a partially observable MDP (POMDP). When the agents cooperate, the MG can be converted into a decentralized POMDP (Dec-POMDP) model [29].
The mathematical form of Dec-POMDP is described as follows:
$$G = (n,\, S,\, A,\, O,\, R,\, Z,\, \gamma)$$
where, n is the number of agents, S is the state space, A is the action space, O is the observation function, R is the reward function, Z is the observation space, and γ is the discount factor.
Single-agent reinforcement learning that integrates deep neural networks is called single-agent deep reinforcement learning; the DQN algorithm is one example. Since the environment in a POMDP is only partially observable, the DQN [30] algorithm is not suitable for POMDPs and needs to be improved. The deep recurrent Q-network (DRQN) [31] replaces a fully connected layer after the DQN convolutional layers with a recurrent neural network, so that it can memorize historical states and thus improve performance under partial observability. Long short-term memory (LSTM) networks and gated recurrent unit (GRU) networks are two special types of recurrent neural networks; their gated structure helps them learn over longer horizons.
Multi-agent reinforcement learning that integrates deep neural networks is called multi-agent deep reinforcement learning. Its learning frameworks can be divided into fully centralized, fully distributed, and centralized learning with distributed application, the last of which is the most widely used. The value decomposition network (VDN) [32] adopts centralized learning with distributed application: for each agent i, a value function Q_i is computed independently, and the joint action-value function Q_tot is obtained by summation:
$$Q_{tot}(\tau_{joint},\, a_{joint}) = \sum_{i=1}^{n} Q_i(\tau_i,\, a_i;\, \theta_i)$$
VDN decomposes the overall value function by simple summation. When VDN is trained centrally, only the temporal-difference error of Q_tot needs to be computed, which is then backpropagated to the value function Q_i of each agent, effectively reducing the amount of computation. However, because VDN does not consider the global state information and obtains the joint action-value function by simply summing the individual value functions, it has clear limitations.
The QMIX [13] algorithm is an extension of the VDN algorithm. QMIX fits the global action-value function from the local action-value functions through a neural network and takes the global information into account during fitting. To ensure that the global and local action-value functions have the same monotonicity, the following condition must hold:
$$
\underset{a_{joint}}{\operatorname{argmax}}\; Q_{tot}(\tau_{joint},\, a_{joint}) =
\begin{pmatrix}
\underset{a_1}{\operatorname{argmax}}\; Q_1(\tau_1,\, a_1) \\
\vdots \\
\underset{a_n}{\operatorname{argmax}}\; Q_n(\tau_n,\, a_n)
\end{pmatrix}
$$
The above equation can be converted into the following form:
$$\frac{\partial Q_{tot}}{\partial Q_i} \ge 0,\quad \forall i \in \{1, \ldots, n\}$$
The framework of QMIX is shown in Figure 6; it consists of a mixing network and agent networks [13]. QMIX generates the weights and biases of the mixing network through hypernetworks, thus guaranteeing the monotonicity constraint; since the input of the hypernetworks is the global state information s_t, the mixing network can fit arbitrary monotonic functions. Each agent network is implemented as a DRQN, which memorizes historical states through a GRU; its inputs are the observation o_{i,t} of a single agent and its action a_{i,t−1} at the previous time step, and it outputs the individual value Q_i.
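The monotonicity trick can be shown in a stripped-down numpy sketch: the hypernetworks map the global state to mixing weights, and taking absolute values makes those weights non-negative, which enforces ∂Q_tot/∂Q_i ≥ 0. The dimensions, the plain linear hypernetworks, the ReLU (the full architecture uses ELU) and the omission of biases are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixing_network(q_values, state, W1_hyper, W2_hyper):
    """Monotonic mixing of per-agent Q-values into Q_tot. Absolute
    values of the hypernetwork outputs give non-negative mixing weights,
    so Q_tot is non-decreasing in every Q_i."""
    w1 = np.abs(W1_hyper @ state).reshape(len(q_values), -1)  # >= 0
    hidden = np.maximum(q_values @ w1, 0.0)                   # ReLU layer
    w2 = np.abs(W2_hyper @ state)                             # >= 0
    return hidden @ w2

n_agents, state_dim, hidden_dim = 3, 4, 8
W1 = rng.normal(size=(n_agents * hidden_dim, state_dim))
W2 = rng.normal(size=(hidden_dim, state_dim))
state = rng.normal(size=state_dim)
q = np.array([1.0, 2.0, 3.0])
q_tot = mixing_network(q, state, W1, W2)
```

Non-negative weights plus a monotone activation guarantee that raising any agent's Q-value can never lower Q_tot, which is exactly the condition that makes per-agent argmax consistent with the joint argmax.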
For the QMIX, the loss function is described as follows:
$$L(\theta) = \sum_{i=1}^{b} \Big[\big(y_i^{tot} - Q_{tot}(\tau_{joint},\, a_{joint},\, s;\, \theta)\big)^2\Big]$$
where
$$y^{tot} = r + \gamma\, \max_{a'_{joint}} Q_{tot}(\tau'_{joint},\, a'_{joint},\, s';\, \hat\theta)$$
where b is the number of samples drawn from the replay buffer, and θ and θ̂ are the parameters of the main network and the target network, respectively.
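Given target-network maxima for the next step, the batch TD loss above is a one-liner; the vectorized form and variable names below are assumptions, not the authors' code.

```python
import numpy as np

def qmix_td_loss(rewards, q_tot, q_tot_next_max, gamma=0.95):
    """Squared TD error over a batch: y_tot = r + gamma * max_a' Q_tot',
    where q_tot comes from the main network and q_tot_next_max from the
    target network."""
    y_tot = rewards + gamma * q_tot_next_max
    return np.sum((y_tot - q_tot) ** 2)
```

Only this scalar loss is differentiated during centralized training; the gradient then flows back through the mixing network into each agent's Q_i, as described for VDN above.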
The weight selection problem is first converted into a Dec-POMDP, and then the weights are optimized according to the QMIX principle.
At the time step t, the observation oi,t of agent i is described as:
$$o_{i,t} = \big(\delta_{s,i,t},\, v_{rel,i,t},\, a_{x,i,t},\, j_{x,i,t},\, e_{s,i,t},\, \dot e_{s,i,t},\, e_{\alpha,i,t},\, \dot e_{\alpha,i,t}\big)$$
At the time step t, the action ai,t of agent i is described as:
$$a_{i,t} = \big(w_{\delta_s,i,t},\, w_{v_{rel},i,t},\, w_{a_x,i,t},\, w_{j_x,i,t},\, w_{e_s,i,t},\, w_{\dot e_s,i,t},\, w_{e_\alpha,i,t},\, w_{\dot e_\alpha,i,t}\big)$$
At the time step t, the action-observation history of agent i is described as:
$$\tau_i = \big(a_{i,1},\, o_{i,1},\, \ldots,\, a_{i,t-1},\, o_{i,t}\big)$$
The global state st at the time step t is described as:
$$s_t = \big(o_{1,t},\, \ldots,\, o_{i,t},\, \ldots,\, o_{n,t}\big)$$
At the time step t, the agent i takes action ai, and the reward ri,t is set as follows:
$$
\begin{aligned}
r_{i,t} = {} & -\big(5(\delta_{s,i,t})^2 + 5(v_{rel,i,t})^2 + 50(a_{x,i,t})^2 + 50(j_{x,i,t})^2\big) \times 0.001 \\
& - \big(50(e_{s,i,t})^2 + 250(e_{\alpha,i,t})^2 + 50(\dot e_{s,i,t})^2 + 250(\dot e_{\alpha,i,t})^2\big) \times 0.001 \\
& - 10\zeta_1 + 2\zeta_2 + \zeta_3
\end{aligned}
$$
where
$$
\zeta_1 = \begin{cases} 1, & \text{if simulation finishes} \\ 0, & \text{otherwise} \end{cases} \qquad
\zeta_2 = \begin{cases} 1, & \text{if } v_{rel}^2 < 1 \\ 0, & \text{otherwise} \end{cases} \qquad
\zeta_3 = \begin{cases} 1, & \text{if } e_s^2 < 0.01 \\ 0, & \text{otherwise} \end{cases}
$$
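A direct transcription of this reward is given below. Interpreting the quadratic tracking terms as penalties (negative contributions) is the sign reconstruction used here, since the printed formula lost its signs; the function signature is an assumption.

```python
def step_reward(o, finished, v_rel, e_s):
    """Per-step reward for agent i: quadratic tracking penalties scaled
    by 0.001, plus sparse bonuses for small relative speed and small
    lateral position deviation, and a penalty on termination."""
    d_s, v_r, a_x, j_x, e_s_, ed_s, e_a, ed_a = o
    tracking = -(5*d_s**2 + 5*v_r**2 + 50*a_x**2 + 50*j_x**2) * 0.001
    lateral = -(50*e_s_**2 + 250*e_a**2 + 50*ed_s**2 + 250*ed_a**2) * 0.001
    zeta1 = 1.0 if finished else 0.0        # simulation finishes
    zeta2 = 1.0 if v_rel**2 < 1 else 0.0    # relative speed near zero
    zeta3 = 1.0 if e_s**2 < 0.01 else 0.0   # lateral deviation near zero
    return tracking + lateral - 10*zeta1 + 2*zeta2 + zeta3
```

With perfect tracking and an ongoing episode, the two sparse bonuses dominate and the reward is +3; the dense quadratic terms then only distinguish how far the agent is from that ideal.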
In QMIX, the agent network adopts a DRQN consisting of an input layer, a hidden layer and an output layer. The hidden layer is a GRU with 64 neurons, and the output layer uses the ReLU activation function. All weights produced by the output layer must be greater than or equal to 10^−4. For agent i, the input of the DRQN at step t is o_{i,t} and a_{i,t−1}; during training the output is Q_{i,t}, and during application the output is argmax_a Q_i.
The maximum number of training iterations for QMIX is 1,000,000, the learning rate is set to 5 × 10^−4, the replay buffer capacity is 5000, the batch size is 32, and the reward discount factor is 0.95. The parameters of the main network are copied to the target network every 200 iterations. In the ε-greedy policy, the initial ε is 0.99 and is decayed linearly at each iteration.
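The linearly decayed exploration schedule can be sketched as follows. The paper specifies only the initial ε = 0.99 and linear decay; the end value and decay length below are illustrative assumptions.

```python
def epsilon_schedule(step, eps_start=0.99, eps_end=0.05, decay_steps=50_000):
    """Linearly decayed epsilon for the epsilon-greedy policy, clipped
    at eps_end once decay_steps iterations have elapsed."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

Early in training the agent explores almost uniformly over the discretized weight actions; late in training it mostly exploits the learned Q-values.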

6. Simulation and Experiment Results

6.1. Simulation Experiment Settings

For platoon control, the optimization objectives are safety, followability, platoon stability, comfortability, lane tracking, lateral stability and economic performance. In the experiment, the platoon control strategy put forward in this paper is the target strategy and is called MPC_QMIX.
To validate the target strategy, two comparison strategies are set up. The first achieves weight optimization through an independent Q-learning (IQL) network [33]; the second adopts constant weights. The two comparison strategies are abbreviated MPC_IQL and MPC_ORI, respectively.
To analyze followability, lane tracking and lateral stability, the root mean square error (RMSE) and the coordinate deviation ΔXY between the longitudinal and lateral positions and their references [34] are defined as:
$$\Delta XY(i) = \sqrt{\big(X(i) - X_{ref}(i)\big)^2 + \big(Y(i) - Y_{ref}(i)\big)^2}$$
$$RMSE_{var} = \sqrt{\frac{1}{n_{tot}} \sum_{j=1}^{n_{tot}} \big(var(j)\big)^2}$$
$$n_{tot} = \frac{T}{T_s}$$
The symbols in Equations (25)–(27) are listed in Table 3.
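These two metrics translate directly into code; `var` stands for whichever deviation series is being evaluated (spacing error, relative speed, or the ΔXY series itself), and the function names are assumptions.

```python
import numpy as np

def coordinate_deviation(X, Y, X_ref, Y_ref):
    """Pointwise Euclidean deviation Delta_XY from the reference path."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    return np.sqrt((X - X_ref)**2 + (Y - Y_ref)**2)

def rmse(series):
    """Root mean square of a deviation series over n_tot = T / T_s samples."""
    v = np.asarray(series, dtype=float)
    return np.sqrt(np.mean(v**2))
```

Note that because the series being averaged are already deviations from a reference, the RMS of the raw series is the RMSE reported in the tables.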
In platoon control, platoons can be divided into homogeneous and heterogeneous platoons; in this paper, the difference between the two mainly concerns the headway time t_h. In a homogeneous platoon, all following vehicles share the same t_h; in a heterogeneous platoon, the t_h of each following vehicle differs. To ensure platoon stability, t_h must satisfy the corresponding constraint for each kind of platoon.
The platoon contains 5 vehicles, where vehicle 0 is the leading vehicle and vehicles 1, 2, 3 and 4 are the following vehicles. The headway t_h in the homogeneous platoon is set to 1.5 s, while the headways of vehicles 1, 2, 3 and 4 in the heterogeneous platoon are 1.5 s, 1.4 s, 1.3 s and 1.2 s, respectively.
In the validation of the target strategy, the simulation scenario is set as follows: the platoon follows a front vehicle whose speed changes continuously from 3 s to 143 s, with an approximately sinusoidal acceleration. The simulation scenario is summarized in Table 4.
As shown in Figure 7, the lane center line for the platoon control consists of 61 arcs of different lengths. The longitudinal speed and the constraint on the magnitude of the lateral acceleration of the following vehicles are considered when setting each arc radius. The simulation hardware and software are listed in Table 5, and the evaluation criteria for the objectives to be optimized are given in Table 6.

6.2. Analysis of Experimental Results

6.2.1. Analysis of Experimental Results for Homogeneous Platoon

In Figure 8, the spacing of the homogeneous platoon is shown. In the period 10–130 s, because the longitudinal velocity of the leading vehicle changes continuously, the spacings between the following vehicles also change continuously, but the spacing always remains greater than the minimum safe spacing (5 m); the safety of the platoon control is thus ensured.
Figure 9a,b show the spacing error and the longitudinal speed of the homogeneous platoon, respectively. In Figure 9a, since the speed of the leading vehicle changes during 10–130 s, a spacing error always exists, but it is small, so each vehicle tracks the desired spacing well. In Figure 9b, the longitudinal speed of vehicles 1, 2, 3 and 4 differs only slightly from that of the respective front vehicle, so the longitudinal velocity of the preceding vehicle is tracked well. In Table 7, the $RMSE_{\delta_s}$ for vehicles 1, 2, 3 and 4 is 0.9120 m, 0.9098 m, 0.8985 m and 0.8834 m, respectively, and the $RMSE_{v_{rel}}$ is 0.8676 m/s, 0.8127 m/s, 0.7569 m/s and 0.7051 m/s, respectively. The averages of $RMSE_{\delta_s}$ and $RMSE_{v_{rel}}$ over all following vehicles are 0.9009 m and 0.7856 m/s, respectively. In summary, followability is guaranteed during the multi-vehicle following process.
As shown in Figure 9a, the spacing error varies greatly during 0–20 s because the platoon has not yet reached a stable following state. During 20–150 s, the motion states of vehicles 1, 2, 3 and 4 change smoothly with those of their respective preceding vehicles, and the spacing error decreases steadily as the vehicle number increases. Since the spacing error converges as it propagates down the platoon, the stability of the platoon is guaranteed during the multi-vehicle following process.
Figure 10a,b show the longitudinal acceleration and jerk of the homogeneous platoon, respectively. In Figure 10a, the longitudinal acceleration of the leading vehicle varies approximately sinusoidally during 10–130 s. The longitudinal accelerations of vehicles 1, 2, 3 and 4 follow that of their respective preceding vehicles and vary smoothly. In Figure 10b, the absolute value of the jerk remains within the bound of 3 m/s3 at all times, so ride comfort is guaranteed during the multi-vehicle following process.
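The comfort check can be sketched by finite-differencing a sampled acceleration trace; the sample values and the 0.1 s step below are illustrative:

```python
def jerk_profile(accels, dt):
    """Finite-difference jerk j_k = (a_{k+1} - a_k) / dt from a sampled
    longitudinal-acceleration trace."""
    return [(a2 - a1) / dt for a1, a2 in zip(accels, accels[1:])]

def comfortable(accels, dt, bound=3.0):
    """True if every jerk sample stays within +/- bound (m/s^3),
    the 3 m/s^3 comfort bound used in the simulations."""
    return all(abs(j) <= bound for j in jerk_profile(accels, dt))

# hypothetical 0.1 s samples of an approximately sinusoidal acceleration
accels = [0.00, 0.20, 0.35, 0.45, 0.50]
jerks = jerk_profile(accels, 0.1)
```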
Figure 11 shows the vehicle trajectories of the homogeneous platoon. The reference trajectory is composed of 61 arcs with different radii, and the trajectories of vehicles 1, 2, 3 and 4 coincide well with it. In Table 7, the RMSE_ΔXY values for vehicles 1, 2, 3 and 4 are 0.0453 m, 0.0448 m, 0.0442 m and 0.0437 m, respectively, with an average of 0.0445 m over all following vehicles. Therefore, lane tracking is guaranteed during the multi-vehicle following process.
Figure 12 shows the responses related to the lateral motion of the homogeneous platoon: β, ay, δf and Ψ˙. The road curvature and the longitudinal speed are the main factors affecting lateral stability. The reference trajectory of the lane centerline is composed of 61 arcs with unequal radii, so there is a curvature difference at the junction of two adjacent arcs. When a vehicle first enters the curve, the curvature difference is large, so the lateral-motion responses fluctuate strongly. After entering the curve, the curvature difference between adjacent arcs is small, so the variation it causes is small and the lateral-motion responses are mainly affected by the longitudinal speed.
Figure 12a shows β. Throughout the simulation the longitudinal speed is relatively high, so vehicles 1, 2, 3 and 4 tend to drift outward on the curve and β is negative. Since the longitudinal speed of the leading vehicle is approximately sinusoidal, the speeds of the following vehicles are as well: when the longitudinal speed decreases, β gradually increases; when it increases, β gradually decreases; and when it approaches a constant value, β settles to a steady value. Figure 12b–d show ay, δf and Ψ˙, respectively. These three responses have similar trends and follow the speed of the preceding vehicle in real time, increasing as the longitudinal speed increases and decreasing as it decreases. In Table 7, the RMSE_β values for vehicles 1, 2, 3 and 4 are 0.0935 deg, 0.0930 deg, 0.0926 deg and 0.0923 deg; the RMSE_ay values are 0.8074 m/s2, 0.8036 m/s2, 0.7999 m/s2 and 0.7965 m/s2; the RMSE_δf values are 0.4424 deg, 0.4411 deg, 0.4398 deg and 0.4386 deg; and the RMSE_ψ˙ values are 2.0211 deg/s, 2.0154 deg/s, 2.0098 deg/s and 2.0042 deg/s, respectively. The corresponding averages are 0.0929 deg, 0.7951 m/s2, 0.4405 deg and 2.0126 deg/s. Since the variation ranges of these four responses are small, lateral stability is guaranteed during the multi-vehicle following process.
Figure 13a,b show the battery power and SOC of the homogeneous platoon, respectively. Because pure electric vehicles have regenerative braking, energy recovery must be considered during the multi-vehicle following process. When a vehicle accelerates, the battery power is positive, energy is consumed and the SOC decreases; when it decelerates, the battery power is negative, energy is recovered and the SOC increases. By conservation of energy, the energy consumed exceeds the energy recovered, so the SOC eventually declines. During the multi-vehicle following process, the battery power and SOC of vehicles 1, 2, 3 and 4 follow similar trends. As shown in Figure 13a, the variation range of the battery power of a rear vehicle is smaller than that of the vehicle ahead of it, and Figure 13b shows the same for the SOC. This is because, once each following vehicle reaches a stable following state, the spacing error converges as it propagates, and the longitudinal speed and acceleration of successive followers decrease in turn relative to their preceding vehicles. In Table 7, the ΔSOC/s values for vehicles 1, 2, 3 and 4 are 0.0053 km−1, 0.0051 km−1, 0.0049 km−1 and 0.0048 km−1, with an average of 0.0050 km−1. Therefore, the energy consumption of the vehicles in the platoon decreases sequentially. Since energy consumption is optimized and energy recovery is taken into account, economic performance is ensured during the multi-vehicle following process.
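The ΔSOC/s criterion has units of km−1, i.e. the SOC drop normalized by distance travelled. A sketch of how such a figure could be computed (the SOC and distance values are made up, not the paper's data):

```python
def soc_drop_per_km(soc_start, soc_end, distance_m):
    """SOC decrease per kilometre travelled (units km^-1); smaller is
    more economical. Assumes SOC expressed as a fraction in [0, 1]
    and distance in metres."""
    return (soc_start - soc_end) / (distance_m / 1000.0)

# hypothetical follower: SOC falls from 0.80 to 0.78 over 4 km
metric = soc_drop_per_km(0.80, 0.78, 4000.0)
```

Normalizing by distance rather than time makes the criterion comparable across runs with different speed profiles.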
For the homogeneous platoon, Table 8, Table 9, Table 10 and Table 11 compare the followability, lane tracking, lateral stability and economic performance of the three strategies. The comparison shows that MPC_QMIX achieves better lane tracking, lateral stability and economic performance, because it can exploit global information to optimize the weights of each MPC controller during the multi-vehicle following process. MPC_IQL can optimize the weights of each MPC controller using only locally observed information, and MPC_ORI applies constant weights to the MPC algorithm, so it cannot coordinate the different following vehicles. As for economic performance, MPC_QMIX produces more suitable spacing and longitudinal speed during the multi-vehicle following process, so the entire platoon consumes less energy.
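The monotonic value mixing that lets MPC_QMIX train on global state while each follower still acts on its own Q-value can be sketched as follows; the network sizes and the plain-numpy linear hypernetworks are a toy illustration, not the paper's architecture:

```python
import numpy as np

def qmix_total(agent_qs, state, w1, b1, w2, b2):
    """Toy QMIX-style mixer: per-agent Q-values are combined into Q_tot
    with state-conditioned mixing weights forced non-negative via abs(),
    so dQ_tot/dQ_i >= 0. This monotonicity means each agent (here, each
    MPC weight tuner) can act greedily on its own Q while training
    exploits the global state."""
    n = len(agent_qs)
    W1 = np.abs(state @ w1).reshape(n, -1)                # (n, h), >= 0
    hidden = np.maximum(agent_qs @ W1 + state @ b1, 0.0)  # ReLU, (h,)
    W2 = np.abs(state @ w2)                               # (h,), >= 0
    return float(hidden @ W2 + state @ b2)

n_agents, hidden_dim, state_dim = 4, 8, 6
rng = np.random.default_rng(0)
params = dict(
    w1=rng.standard_normal((state_dim, n_agents * hidden_dim)),
    b1=rng.standard_normal((state_dim, hidden_dim)),
    w2=rng.standard_normal((state_dim, hidden_dim)),
    b2=rng.standard_normal(state_dim),
)
state = rng.standard_normal(state_dim)
q = np.zeros(n_agents)
q_tot = qmix_total(q, state, **params)

# raising any single agent's Q never lowers Q_tot (monotonic mixing)
q_up = q.copy()
q_up[2] += 1.0
q_tot_up = qmix_total(q_up, state, **params)
```

An independent learner (as in MPC_IQL) would drop the mixer entirely and train each agent's Q on local observations alone, which is exactly the global-versus-local distinction drawn in the comparison above.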
In summary, the proposed control strategy for a platoon of electric vehicles can exploit global information during the multi-vehicle following process. Compared with the baseline strategies, MPC_QMIX controls the following vehicles in the platoon coordinately and achieves better lane tracking, lateral stability and economic performance while guaranteeing the other control objectives of the multi-vehicle following process.

6.2.2. Analysis of Experimental Results for Heterogeneous Platoon

As shown in Figure 14, the spacing of the heterogeneous platoon differs significantly from that of the homogeneous platoon. This is because the headway time of each following vehicle in the heterogeneous platoon is different, which leads to different desired spacings and therefore different spacings for each following vehicle. During 10–130 s, because the longitudinal speed of the leading vehicle changes continuously, the spacing changes in real time. The spacing always remains larger than the minimum safe spacing (5 m), so the safety of the multi-vehicle following process is guaranteed.
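A headway-dependent desired spacing of this kind is, in standard car-following form, a constant-time-headway policy. The sketch below assumes s_des = s0 + h·v with s0 set to the 5 m minimum safe spacing; the headway values are illustrative, and the paper's exact policy and parameters are not restated here:

```python
def desired_spacing(v_long, headway, s0=5.0):
    """Constant-time-headway spacing policy (an assumed, standard form):
    s_des = s0 + h * v, with s0 = 5 m taken from the minimum safe
    spacing quoted in the text. Heterogeneous platoons assign each
    follower its own headway h_i, so desired spacings differ."""
    return s0 + headway * v_long

# followers with illustrative headways of 1.2 s, 1.0 s, 0.8 s at 20 m/s
spacings = [desired_spacing(20.0, h) for h in (1.2, 1.0, 0.8)]
```

Because the desired spacing scales with speed, the spacing curves of the heterogeneous platoon fan out as the leading vehicle's speed varies, matching the behaviour described for Figure 14.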
Figure 15a,b show the spacing error and longitudinal speed of the heterogeneous platoon, respectively. In Figure 15a, since the headway time of each following vehicle differs, so does the desired spacing, and the spacing errors of the following vehicles differ significantly during 0–40 s. During 40–150 s, the spacing error settles into a steady pattern. Because the longitudinal speed of the preceding vehicle changes continuously, the spacing error never vanishes entirely, but it remains small; therefore, each vehicle can track its desired spacing. In Figure 15b, the longitudinal speeds of vehicles 1, 2, 3 and 4 differ very little from those of their respective preceding vehicles, so the longitudinal speed of the preceding vehicle is tracked well. In Table 12, the RMSE_δs values for vehicles 1, 2, 3 and 4 are 0.9231 m, 1.0914 m, 1.4290 m and 1.8470 m, respectively; the RMSE_vrel values are 0.8753 m/s, 0.7666 m/s, 0.6806 m/s and 0.6109 m/s, respectively. The averages of RMSE_δs and RMSE_vrel over all following vehicles are 1.3226 m and 0.7334 m/s, respectively. In summary, followability is guaranteed during the multi-vehicle following process.
From Figure 15a, the spacing error varies greatly during 0–40 s because the platoon has not yet reached a stable following state. During 40–150 s, the motion states of vehicles 1, 2, 3 and 4 change smoothly with those of their respective preceding vehicles, and the spacing error decreases steadily as the vehicle number increases. Since the spacing error converges as it propagates, the stability of the platoon is ensured during the multi-vehicle following process.
Figure 16a,b show the longitudinal acceleration and jerk of the heterogeneous platoon, respectively. In Figure 16a, the longitudinal acceleration of the leading vehicle is constant at 0 during 0–10 s. Since the headway time of each following vehicle differs, the corresponding desired spacing and hence the spacing-tracking behaviour differ as well. Vehicles 3 and 4 have smaller headway times and larger spacing errors, so their accelerations are larger than those of vehicles 1 and 2. During 10–130 s, the longitudinal acceleration of the leading vehicle is approximately sinusoidal, and each following vehicle's longitudinal acceleration follows that of its preceding vehicle while remaining smooth. In Figure 16b, the absolute value of the jerk remains within the bound of 3 m/s3 at all times, so ride comfort is guaranteed during the multi-vehicle following process.
Figure 17 shows the vehicle trajectories of the heterogeneous platoon. The reference trajectory is composed of 61 arcs with different radii. In Table 12, the RMSE_ΔXY values for vehicles 1, 2, 3 and 4 are 0.0456 m, 0.0450 m, 0.0445 m and 0.0441 m, respectively, with an average of 0.0448 m over all following vehicles. The trajectory of each following vehicle coincides well with the reference trajectory, so lane tracking is ensured during the multi-vehicle following process.
Figure 18 shows the lateral-stability responses of the heterogeneous platoon: β, ay, δf and Ψ˙. They are similar to those of the homogeneous platoon, because the headway time mainly affects the longitudinal motion of a vehicle and has little effect on its lateral motion; lateral stability is mainly governed by the road curvature and the longitudinal speed. The reference trajectory of the lane centerline is composed of 61 arcs with unequal radii, so there is a curvature difference at the junction of two adjacent arcs. At the beginning of the simulation, as a following vehicle enters the curve, the curvature difference is large and the lateral-stability responses change greatly. Once on the curve, the curvature difference between adjacent arcs is small, so the variation it causes is small and the lateral-motion responses are mainly affected by the longitudinal speed.
Figure 18a shows the change of β. Throughout the simulation the longitudinal speed is high, and each following vehicle tends to drift outward on the curve, so β is negative. Since the longitudinal speed of the leading vehicle is approximately sinusoidal, the speeds of the following vehicles are as well: when the longitudinal speed decreases, β gradually increases; when it increases, β gradually decreases; and when it approaches a constant value, β settles to a steady value. Figure 18b–d show ay, δf and Ψ˙, respectively. These three responses have similar trends and follow the preceding vehicle in real time, increasing as the longitudinal speed increases and decreasing as it decreases. In Table 12, the RMSE_β values for vehicles 1, 2, 3 and 4 are 0.0964 deg, 0.0945 deg, 0.0931 deg and 0.0927 deg; the RMSE_ay values are 0.8155 m/s2, 0.8077 m/s2, 0.8013 m/s2 and 0.7980 m/s2; the RMSE_δf values are 0.4441 deg, 0.4419 deg, 0.4401 deg and 0.4389 deg; and the RMSE_ψ˙ values are 2.0288 deg/s, 2.0193 deg/s, 2.0110 deg/s and 2.0055 deg/s, respectively. The corresponding averages over all following vehicles are 0.0942 deg, 0.8056 m/s2, 0.4413 deg and 2.0162 deg/s. Since the variation ranges of these four responses are small, lateral stability is ensured during the multi-vehicle following process.
Figure 19 shows the battery power and SOC of the heterogeneous platoon. Because electric vehicles have regenerative braking, energy recovery must be considered during the multi-vehicle following process. When a vehicle accelerates, the battery power is positive, energy is consumed and the SOC decreases; when it decelerates, the battery power is negative, energy is recovered and the SOC increases. By conservation of energy, the energy consumed exceeds the energy recovered, so the SOC eventually declines. During the multi-vehicle following process, the battery power and SOC of the following vehicles follow similar trends. As shown in Figure 19a, during 0–10 s the headway times of the following vehicles differ, so their desired spacings and spacing errors also differ. Vehicles 3 and 4 have relatively small headway times and large spacing errors, so they accelerate to reduce the errors and their battery power varies strongly. During 10–140 s, the battery power changes steadily with the longitudinal speed, and the variation range of the battery power of a rear vehicle is smaller than that of the vehicle ahead of it. As shown in Figure 19b, at the end of the experiment the SOC change of a rear vehicle is smaller than that of the vehicle ahead of it. This is because, once each following vehicle reaches a stable following state, the spacing error converges as it propagates, and the variation ranges of longitudinal speed and acceleration of successive followers decrease in turn relative to their preceding vehicles. Hence, the energy consumption of the following vehicles in the platoon decreases sequentially.
In Table 12, the ΔSOC/s values for vehicles 1, 2, 3 and 4 are 0.0053 km−1, 0.0051 km−1, 0.0050 km−1 and 0.0049 km−1, with an average of 0.0051 km−1 over all following vehicles. Since energy consumption is optimized and energy recovery is taken into account, economic performance is ensured for the multi-vehicle following process.
For heterogeneous platoon control, Table 13, Table 14, Table 15 and Table 16 compare the followability, lane tracking, lateral stability and economic performance of the three strategies. Compared with the homogeneous platoon, the RMSE of the spacing error in the heterogeneous platoon increases with the vehicle number, because the headway time of each following vehicle differs: the spacing errors of vehicles 1, 2, 3 and 4 increase sequentially at the beginning of the experiment and are largest at that stage. After the platoon enters a stable following state, the spacing error converges as it propagates. The comparison shows that MPC_QMIX achieves better lane tracking, lateral stability and economic performance, because it can exploit global information to optimize the weights of each MPC controller during the multi-vehicle following process. MPC_IQL can use only locally observed information to optimize the weights of each MPC controller, and MPC_ORI applies constant weights to the MPC algorithm, making it difficult to coordinate the different following vehicles. As for economic performance, MPC_QMIX produces more suitable spacing and longitudinal speed during the multi-vehicle following process, so the entire platoon consumes less energy.
The above analysis shows that the proposed control strategy for a platoon of electric vehicles can exploit global information during the multi-vehicle following process. Compared with the baseline strategies, it coordinates the optimization of the multi-vehicle following process while guaranteeing the other control objectives, and therefore achieves better lane tracking, lateral stability and economic performance.

7. Conclusions

Platoon control can be decomposed into multiple single-vehicle following controls, and the coordination among these single-vehicle following processes must be considered. This paper therefore studies a platoon control strategy covering both longitudinal and lateral motion. First, a platoon model is built. Then, the IDMPC strategy is designed on the basis of a distributed MPC algorithm. To control the different following vehicles coordinately, the weights of the distributed MPC algorithm are optimized with the QMIX algorithm, so that the distributed MPC obtains optimal control variables. Finally, the IDMPC is verified for both homogeneous and heterogeneous platoons. Compared with the baseline strategies, the proposed platoon control strategy can exploit global information, and the spacing and longitudinal speed during the multi-vehicle following process are more suitable. Therefore, the IDMPC achieves better lane tracking, lateral stability and economic performance while guaranteeing the other objectives of the multi-vehicle following process. Future research will consider higher-dimensional platoon control; for example, three-dimensional platoon control, covering lateral, longitudinal and vertical control, is of interest.

Author Contributions

Conceptualization, S.Z.; methodology, S.Z.; validation, S.Z.; formal analysis, S.Z. and X.Z.; writing—original draft preparation, S.Z.; writing—review and editing, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Natural Science Foundation of China under Grant U1713213, Grant U1913202, and Grant U1813205; in part by the Key-Area Research and Development Program of Guangdong Province under Grant 2019B090915001; in part by Shenzhen Technology Project under Grant JCYJ20180507182610734 and Grant JSGG20191129094012321.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yousefi, M.; Hajizadeh, A.; Soltani, M.N.; Hredzak, B. Predictive home energy management system with photovoltaic array, heat pump, and plug-in electric vehicle. IEEE Trans. Ind. Inf. 2021, 17, 430–440. [Google Scholar] [CrossRef]
  2. Badue, C.; Guidolini, R.; Carneiro, R.V.; Azevedo, P.; Cardoso, V.B.; Forechi, A.; Jesus, L.; Berriel, R.; Paixão, T.M.; Mutz, F.; et al. Self-driving cars: A survey. Expert Syst. Appl. 2021, 165, 113816. [Google Scholar] [CrossRef]
  3. Zhang, R.H.; Li, K.N.; Wu, Y.Y.; Zhao, D.Z.; Lv, Z.L.; Li, F.L.; Cheng, X.; Qiu, Z.J.; Yu, F. A multi-vehicle longitudinal trajectory collision avoidance strategy using AEBS with vehicle-infrastructure communication. IEEE Trans. Veh. Technol. 2022, 71, 1253–1266. [Google Scholar] [CrossRef]
  4. He, Z.J.; Qin, S.; Wei, Y.J.; Gao, B.Z.; Zhu, B.; He, L. A model predictive control approach with slip ratio estimation for electric motor antilock braking of battery electric vehicle. IEEE Trans. Ind. Electron. 2022, 69, 9225–9234. [Google Scholar] [CrossRef]
  5. Liu, S.; Li, Z.; Ji, H.; Wang, L.; Hou, Z. A novel anti-saturation model-free adaptive control algorithm and its application in the electric vehicle braking energy recovery system. Symmetry 2022, 14, 580. [Google Scholar] [CrossRef]
  6. Pei, W.; Zhang, Q.; Li, Y. Efficiency Optimization Strategy of Permanent Magnet Synchronous Motor for Electric Vehicles Based on Energy Balance. Symmetry 2022, 14, 164. [Google Scholar] [CrossRef]
  7. Wang, Y.; Wang, Z.; Han, K.; Tiwari, P.; Work, D.B. Gaussian process-based personalized adaptive cruise control. IEEE Trans. Intell. Transp. Syst. 2022, 1–12. Available online: https://ieeexplore.ieee.org/document/9774935/ (accessed on 13 May 2022). [CrossRef]
  8. Groelke, B.; Earnhardt, C.; Borek, J.; Vermillion, C. A predictive command governor-based adaptive cruise controller with collision avoidance for non-connected vehicle following. IEEE Trans. Intell. Transp. Syst. 2022, 23, 12276–12286. [Google Scholar] [CrossRef]
  9. Jia, D.; Chen, H.; Zheng, Z.; Watling, D.; Connors, R.; Gao, J.; Li, Y. An enhanced predictive cruise control system design with data-driven traffic prediction. IEEE Trans. Intell. Transp. Syst. 2022, 7, 8170–8183. [Google Scholar] [CrossRef]
  10. Ruan, S.; Ma, Y.; Yang, N.; Xiang, C.; Li, X. Real-time energy-saving control for HEVs in car-following scenario with a double explicit MPC approach. Energy 2022, 247, 123265. [Google Scholar] [CrossRef]
  11. Li, S.; Li, K.; Rajamani, R.; Wang, J. Model Predictive Multi-Objective Vehicular Adaptive Cruise Control. IEEE Trans. Control Syst. Technol. 2011, 19, 556–566. [Google Scholar] [CrossRef]
  12. Lamprecht, A.; Steffen, D.; Nagel, K.; Haecker, J.; Graichen, K. Optimal management and configuration methods for automobile cruise control systems. In Proceedings of the 18th Annual Conference on Systems Engineering Research (CSER), Charlottesville, VA, USA, 19–21 March 2020; pp. 429–439. [Google Scholar]
  13. Rashid, T.; Samvelyan, M.; Schroeder, C.; Farquhar, G.; Foerster, J.; Whiteson, S. Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4295–4304. [Google Scholar]
  14. Ly, K.; Mayekar, J.V.; Aguasvivas, S.; Keplinger, C.; Rentschler, M.E.; Correll, N. Electro-hydraulic rolling soft wheel: Design, hybrid dynamic modeling, and model predictive control. IEEE Trans. Rob. 2022, 1–20. Available online: https://ieeexplore.ieee.org/document/9766178/ (accessed on 2 May 2022). [CrossRef]
  15. Yeganegi, M.H.; Khadiv, M.; Prete, A.D.; Moosavian, S.A.A.; Righetti, L. Robust walking based on MPC with viability guarantees. IEEE Trans. Rob. 2022, 38, 1–16. [Google Scholar] [CrossRef]
  16. Wu, Z.; Xia, X.; Zhu, B. Model predictive control for improving operational efficiency of overhead cranes. Nonlinear Dyn. 2015, 79, 2639–2657. [Google Scholar] [CrossRef]
  17. Capuano, A.; Spano, M.; Musa, A.; Toscano, G.; Misul, D.A. Development of an adaptive model predictive control for platooning safety in battery electric vehicles. Energies 2021, 14, 5291. [Google Scholar] [CrossRef]
  18. Caiazzo, B.; Coppola, A.; Petrillo, A.; Santini, S. Distributed nonlinear model predictive control for connected autonomous electric vehicles platoon with distance-dependent air drag formulation. Energies 2021, 14, 5122. [Google Scholar] [CrossRef]
  19. Ma, H.; Chu, L.; Guo, J.H.; Wang, J.W.; Guo, C. Cooperative adaptive cruise control strategy optimization for electric vehicles based on SA-PSO with model predictive control. IEEE Access 2020, 8, 225745–225756. [Google Scholar] [CrossRef]
  20. Lopes, D.R.; Evangelou, A. Energy savings from an eco-cooperative adaptive cruise control: A BEV platoon investigation. In Proceedings of the 18th European Control Conference (ECC), Napoli, Italy, 25–28 June 2019; pp. 4160–4167. [Google Scholar]
  21. Ma, F.W.; Yang, Y.; Wang, J.W.; Liu, Z.Z.; Li, J.H.; Nie, J.H. Predictive energy-saving optimization based on nonlinear model predictive control for cooperative connected vehicles platoon with V2V communication. Energy 2019, 189, 116120. [Google Scholar] [CrossRef]
  22. Chen, J.; Sun, D.; Zhao, M.; Li, Y.; Liu, Z. A new lane keeping method based on human-simulated intelligent control. IEEE Trans. Intell. Transp. Syst. 2022, 23, 7058–7069. [Google Scholar] [CrossRef]
  23. Zhang, S.; Zhuan, X.T. Study on adaptive cruise control strategy for battery electric vehicle. Math. Probl. Eng. 2019, 2019, 7971594. [Google Scholar] [CrossRef]
  24. Li, L.; Zhang, Y.B.; Yang, C.; Yang, B.J.; Martinez, M. Model predictive control-based efficient energy recovery control strategy for regenerative braking system of hybrid electric bus. Energy Convers. Manag. 2016, 111, 299–314. [Google Scholar] [CrossRef]
  25. Abdollahi, A.; Han, X.; Avvari, G.; Raghunathan, N.; Balasingam, B.; Pattipati, K.R.; Bar-Shalom, Y. Optimal battery charging, Part I: Minimizing time-to-charge, energy loss, and temperature rise for OCV-resistance battery model. J. Power Sources. 2016, 303, 388–398. [Google Scholar] [CrossRef] [Green Version]
  26. Zhang, S.; Zhuan, X.T.; Fang, Y.T.; Cheng, J. Model-predictive optimization for lane keeping assistance system with exponential decay smoothing. In Proceedings of the 2021 IEEE International Conference on Robotics and Biomimetics, Sanya, China, 27–31 December 2021; pp. 1–6. [Google Scholar]
  27. Dang, R.; He, C.; Zhang, Q. ACC of electric vehicles with coordination control of fuel economy and tracking safety. In Proceedings of the Intelligent Vehicles Symposium, Alcala de Henares, Spain, 3–7 June 2012; pp. 240–245. [Google Scholar]
  28. Xiao, L.Y.; Gao, F. Practical string stability of platoon of adaptive cruise control vehicles. IEEE Trans. Intell. Transp. Syst. 2011, 12, 1184–1194. [Google Scholar] [CrossRef]
  29. Li, T.; Zhu, K.; Luong, N.C.; Niyato, D.; Wu, Q.; Zhang, Y.; Chen, B. Applications of Multi-agent reinforcement learning in future internet: A comprehensive survey. IEEE Commmun. Surv. Tutorials. 2022, 24, 1240–1279. [Google Scholar] [CrossRef]
  30. Mnih, V. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
  31. Hausknecht, M.; Stone, P. Deep recurrent Q-learning for partially observable MDPs. In Proceedings of the 2015 AAAI Fall Symposium Series, Arlington, VA, USA, 12–14 November 2015. [Google Scholar]
  32. Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-decomposition networks for cooperative multi-agent learning. arXiv 2017, arXiv:1706.05296. [Google Scholar]
  33. Tan, M. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, Honolulu, HI, USA, 27–29 July 1993; pp. 330–337. [Google Scholar]
  34. Batra, M.; McPhee, J.; Azad, N.L. Anti-jerk model predictive cruise control for connected electric vehicles with changing road conditions. In Proceedings of the 2017 11th Asian Control Conference (ASCC), Gold Coast, Australia, 17–20 December 2017; pp. 49–54. [Google Scholar]
Figure 1. The vehicle configuration of a front-drive electric vehicle.
Figure 2. The characteristics of a motor and the model of a battery. (a) External characteristics of the motor, (b) Battery model.
Figure 3. The modeling process for a vehicle platoon. (a) Vehicle model, (b) Car-following control of following vehicle i, (c) Longitudinal car-following process for following vehicle i, (d) Lateral lane keeping process for following vehicle i.
Figure 4. The distributed control architecture for electric vehicle platoon.
Figure 5. The principle of multi-agent reinforcement learning.
Figure 6. The principle of the QMIX algorithm.
Figure 7. The settings for the lane centerline.
Figure 8. Spacing of the homogeneous platoon.
Figure 9. The error of spacing and longitudinal speed of homogeneous platoon. (a) Error of spacing, (b) Longitudinal speed.
Figure 10. The longitudinal acceleration and jerk of the homogeneous platoon. (a) Longitudinal acceleration, (b) Jerk.
Figure 11. The vehicle trajectory of the homogeneous platoon.
Figure 12. The responses related to lateral motion of homogeneous platoon. (a) Sideslip angle of centroid, (b) Lateral acceleration, (c) Front steering angle, (d) Yaw rate.
Figure 13. The battery power and SOC of homogeneous platoon. (a) Battery power, (b) SOC.
Figure 14. Spacing of the heterogeneous platoon.
Figure 15. Error of spacing and longitudinal speed in a heterogeneous platoon. (a) Error of spacing, (b) Longitudinal speed.
Figure 16. The longitudinal acceleration and jerk of a heterogeneous platoon. (a) Longitudinal acceleration, (b) Jerk.
Figure 17. Vehicle trajectory of a heterogeneous platoon.
Figure 18. Responses related to the lateral motion of a heterogeneous platoon. (a) Sideslip angle of centroid, (b) Lateral acceleration, (c) Front steering angle, (d) Yaw rate.
Figure 19. The battery power and SOC of a heterogeneous platoon. (a) Battery power, (b) SOC.
Table 1. The symbols in the platoon model.
Symbol: Description
xi(k): state variables
si(k): spacing
vx,i(k): longitudinal speed
vrel,i(k): relative speed
ax,i(k): longitudinal acceleration
jx,i(k): jerk
es,i(k): lateral distance deviation
ės,i(k): derivative of the lateral distance deviation
eα,i(k): directional deviation
ėα,i(k): derivative of the directional deviation
ui(k): control variables
axdes,i(k): desired longitudinal acceleration
δf,i(k): targeted front steering angle
wi(k): system disturbance variable
afx,i(k): longitudinal acceleration of the front vehicle
ψ̇des,i(k): desired yaw rate
Ts: sampling time
τl: time lag
Mveh: mass of the electric vehicle
Iz: moment of inertia of the electric vehicle
Cαf: cornering stiffness of the front wheels
Cαr: cornering stiffness of the rear wheels
l1: distance between the centroid and the front axle
l2: distance between the centroid and the rear axle
Table 2. The various parameters for the distributed MPC algorithm.
Symbol: Value
dc: 5 m
d0: 7 m
vxmax: 36 m/s
vxmin: 0 m/s
axmax: 2.5 m/s²
axmin: −5.5 m/s²
u1max: 2.5 m/s²
u1min: −5.5 m/s²
u2max: 5 deg
u2min: −5 deg
jxmax: 3 m/s³
jxmin: −3 m/s³
ρδ: 0.94
ρv: 0.94
ρa: 0.94
ρj: 0.94
ρes: 0.6
ρės: 0.6
ρeα: 0.6
ρėα: 0.6
R: diag(1, 1)
p: 10
m: 5
T: 150 s
Ts: 0.05 s
th: 1.5 s
τl: 0.15 s
Mveh: 1550 kg
l1: 1.1 m
l2: 1.58 m
Cαf: 80 kN/rad
Cαr: 80 kN/rad
Iz: 2873 kg·m²
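The parameters d0 (standstill spacing), th (time headway) and dc (collision distance) suggest a constant time-headway spacing policy, in which the desired inter-vehicle spacing grows linearly with the ego speed. A minimal sketch under that assumption (the function names are ours):

```python
D0 = 7.0   # standstill spacing d0 (m), from Table 2
TH = 1.5   # time headway th (s), from Table 2
DC = 5.0   # collision distance dc (m), from Table 2

def desired_spacing(vx):
    """Constant time-headway policy: s_des = d0 + th * vx."""
    return D0 + TH * vx

def spacing_error(actual_spacing, vx):
    """Spacing error delta_s: actual spacing minus desired spacing."""
    return actual_spacing - desired_spacing(vx)

# At 25 m/s the desired spacing is 7 + 1.5 * 25 = 44.5 m, which matches
# the initial spacing ini_ds used in the simulation scenario (Table 4).
assert desired_spacing(25.0) == 44.5
```

The safety criterion of Table 6 then amounts to keeping the actual spacing above DC at all times, while the controller regulates `spacing_error` toward zero.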
Table 3. The symbols in the RMSE calculations.
Symbol: Description
X: actual horizontal coordinate
Y: actual vertical coordinate
Xref: reference horizontal coordinate
Yref: reference vertical coordinate
var(j): the evaluated variable (β, ay, δf or ψ̇) at moment j
ntot: the number of calculations
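Using the symbols of Table 3, the lane-tracking indicator RMSE_ΔXY can be read as the root-mean-square Euclidean deviation of the actual trajectory (X, Y) from the reference (Xref, Yref), and the lateral-stability indicators as plain RMSE values of the respective signals. A sketch under that reading (the function names are ours):

```python
import math

def rmse_xy(X, Y, Xref, Yref):
    """Root-mean-square Euclidean deviation between the actual
    trajectory (X, Y) and the reference trajectory (Xref, Yref)."""
    se = sum((x - xr) ** 2 + (y - yr) ** 2
             for x, y, xr, yr in zip(X, Y, Xref, Yref))
    return math.sqrt(se / len(X))

def rmse(var, ref=0.0):
    """Generic RMSE of a signal var(j) against a constant reference;
    used here for beta, a_y, delta_f and the yaw rate (reference 0)."""
    return math.sqrt(sum((v - ref) ** 2 for v in var) / len(var))
```

Both helpers divide by the number of samples, i.e., ntot in the paper's notation.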
Table 4. The settings of the simulation scenario.
T (s): 25
μ (the ground adhesion coefficient): 25
ini_vf (the initial longitudinal velocity of the front vehicle): 25 m/s
ini_vx (the initial longitudinal velocity of the following vehicle): 25 m/s
amp_ax (the amplitude of the front vehicle's longitudinal acceleration): 1 m/s²
ini_∆s (the initial spacing): 44.5 m
Table 5. The hardware and software for simulation.
Name: Property
GPU: NVIDIA TITAN V
CPU: Intel Core i7-4790 (3.60 GHz)
Memory: 32 GB (3200 MHz)
Operating system: Windows 10 (64-bit)
CUDA: 10.1
Python: 3.8.8
PyTorch: 1.7.1
CarSim: 2016.1
MATLAB: 2018a
Table 6. The evaluation criteria for objectives to be optimized.
Objective: Indicator
Safety: min |Δs| > 5 m
Followability: RMSE_δs and RMSE_vrel
Platoon stability: δs,i → 0
Comfortability: max |jerk| < 3 m/s³
Lane tracking: RMSE_ΔXY
Stability in lateral direction: RMSE_β, RMSE_ay, RMSE_δf and RMSE_ψ̇
Economic performance: ΔSOC/s
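The threshold criteria in Table 6 can be checked directly on logged signals. A minimal sketch (thresholds taken from the table; ΔSOC/s is interpreted here as the total SOC drop divided by distance travelled, consistent with its km⁻¹ unit; all function names are ours):

```python
def is_safe(spacings, dc=5.0):
    """Safety: the minimum spacing must stay above the collision distance."""
    return min(spacings) > dc

def is_comfortable(jerks, jerk_max=3.0):
    """Comfortability: |jerk| must stay below 3 m/s^3 throughout."""
    return max(abs(j) for j in jerks) < jerk_max

def soc_per_km(soc_start, soc_end, distance_m):
    """Economic indicator: SOC consumed per kilometre travelled (km^-1)."""
    return (soc_start - soc_end) / (distance_m / 1000.0)
```

The RMSE-based indicators (followability, lane tracking, lateral stability) are then compared across strategies in Tables 7 through 16, where lower values are better.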
Table 7. Multiple indicators of the MPC_QMIX strategy for a homogeneous platoon.
Objective | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
Followability | RMSE_δs (m) | 0.9120 | 0.9098 | 0.8985 | 0.8834 | 0.9009
Followability | RMSE_vrel (m/s) | 0.8676 | 0.8127 | 0.7569 | 0.7051 | 0.7856
Lane tracking | RMSE_ΔXY (m) | 0.0453 | 0.0448 | 0.0442 | 0.0437 | 0.0445
Stability in lateral direction | RMSE_β (deg) | 0.0935 | 0.0930 | 0.0926 | 0.0923 | 0.0929
Stability in lateral direction | RMSE_ay (m/s²) | 0.8074 | 0.8036 | 0.7999 | 0.7965 | 0.7951
Stability in lateral direction | RMSE_δf (deg) | 0.4424 | 0.4411 | 0.4398 | 0.4386 | 0.4405
Stability in lateral direction | RMSE_ψ̇ (deg/s) | 2.0211 | 2.0154 | 2.0098 | 2.0042 | 2.0126
Economic performance | ΔSOC/s (km⁻¹) | 0.0053 | 0.0051 | 0.0049 | 0.0048 | 0.0050
Table 8. A comparison of followability for three strategies in homogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | RMSE_δs (m) | 0.9120 | 0.9098 | 0.8985 | 0.8834 | 0.9009
MPC_QMIX | RMSE_vrel (m/s) | 0.8676 | 0.8127 | 0.7569 | 0.7051 | 0.7856
MPC_IQL | RMSE_δs (m) | 0.9413 | 0.9219 | 0.9080 | 0.8943 | 0.9164
MPC_IQL | RMSE_vrel (m/s) | 0.8953 | 0.8310 | 0.7703 | 0.7219 | 0.8046
MPC_ORI | RMSE_δs (m) | 1.1531 | 1.1304 | 1.1177 | 1.1035 | 1.1261
MPC_ORI | RMSE_vrel (m/s) | 0.8593 | 0.7990 | 0.7395 | 0.6905 | 0.7721
Table 9. A comparison of lane tracking for three strategies in homogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | RMSE_ΔXY (m) | 0.0453 | 0.0448 | 0.0442 | 0.0437 | 0.0445
MPC_IQL | RMSE_ΔXY (m) | 0.0470 | 0.0463 | 0.0457 | 0.0450 | 0.0460
MPC_ORI | RMSE_ΔXY (m) | 0.0683 | 0.0672 | 0.0665 | 0.0659 | 0.0670
Table 10. A comparison of stability in lateral direction for three strategies in homogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | RMSE_β (deg) | 0.0935 | 0.0930 | 0.0926 | 0.0923 | 0.0929
MPC_QMIX | RMSE_ay (m/s²) | 0.8074 | 0.8036 | 0.7999 | 0.7965 | 0.7951
MPC_QMIX | RMSE_δf (deg) | 0.4424 | 0.4411 | 0.4398 | 0.4386 | 0.4405
MPC_QMIX | RMSE_ψ̇ (deg/s) | 2.0211 | 2.0154 | 2.0098 | 2.0042 | 2.0126
MPC_IQL | RMSE_β (deg) | 0.0967 | 0.0961 | 0.0957 | 0.0954 | 0.0960
MPC_IQL | RMSE_ay (m/s²) | 0.8285 | 0.8247 | 0.8210 | 0.8179 | 0.8230
MPC_IQL | RMSE_δf (deg) | 0.4636 | 0.4622 | 0.4608 | 0.4595 | 0.4615
MPC_IQL | RMSE_ψ̇ (deg/s) | 2.1307 | 2.1232 | 2.1126 | 2.1067 | 2.1183
MPC_ORI | RMSE_β (deg) | 0.1037 | 0.1030 | 0.1025 | 0.1021 | 0.1028
MPC_ORI | RMSE_ay (m/s²) | 0.9163 | 0.9124 | 0.9081 | 0.9047 | 0.9104
MPC_ORI | RMSE_δf (deg) | 0.5568 | 0.5543 | 0.5521 | 0.5499 | 0.5533
MPC_ORI | RMSE_ψ̇ (deg/s) | 2.5332 | 2.5265 | 2.5001 | 2.4433 | 2.5008
Table 11. A comparison of economic performance for three strategies in homogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | ΔSOC/s (km⁻¹) | 0.0053 | 0.0051 | 0.0049 | 0.0048 | 0.0050
MPC_IQL | ΔSOC/s (km⁻¹) | 0.0057 | 0.0054 | 0.0051 | 0.0050 | 0.0053
MPC_ORI | ΔSOC/s (km⁻¹) | 0.0065 | 0.0061 | 0.0058 | 0.0056 | 0.0060
Table 12. Multiple indicators of the MPC_QMIX strategy for a heterogeneous platoon.
Objective | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
Followability | RMSE_δs (m) | 0.9231 | 1.0914 | 1.4290 | 1.8470 | 1.3226
Followability | RMSE_vrel (m/s) | 0.8753 | 0.7666 | 0.6806 | 0.6109 | 0.7334
Lane tracking | RMSE_ΔXY (m) | 0.0456 | 0.0450 | 0.0445 | 0.0441 | 0.0448
Stability in lateral direction | RMSE_β (deg) | 0.0964 | 0.0945 | 0.0931 | 0.0927 | 0.0942
Stability in lateral direction | RMSE_ay (m/s²) | 0.8155 | 0.8077 | 0.8013 | 0.7980 | 0.8056
Stability in lateral direction | RMSE_δf (deg) | 0.4441 | 0.4419 | 0.4401 | 0.4389 | 0.4413
Stability in lateral direction | RMSE_ψ̇ (deg/s) | 2.0288 | 2.0193 | 2.0110 | 2.0055 | 2.0162
Economic performance | ΔSOC/s (km⁻¹) | 0.0053 | 0.0051 | 0.0050 | 0.0049 | 0.0051
Table 13. The comparison of followability for three strategies in a heterogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | RMSE_δs (m) | 0.9231 | 1.0914 | 1.4290 | 1.8470 | 1.3226
MPC_QMIX | RMSE_vrel (m/s) | 0.8753 | 0.7666 | 0.6806 | 0.6109 | 0.7334
MPC_IQL | RMSE_δs (m) | 0.9413 | 1.4713 | 1.8580 | 2.1343 | 1.6012
MPC_IQL | RMSE_vrel (m/s) | 0.8953 | 0.7862 | 0.6918 | 0.6376 | 0.7527
MPC_ORI | RMSE_δs (m) | 1.1531 | 1.6885 | 2.0436 | 2.3242 | 1.8924
MPC_ORI | RMSE_vrel (m/s) | 0.8793 | 0.7475 | 0.6632 | 0.5944 | 0.7211
Table 14. The comparison of lane tracking for three strategies in a heterogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | RMSE_ΔXY (m) | 0.0456 | 0.0450 | 0.0445 | 0.0441 | 0.0448
MPC_IQL | RMSE_ΔXY (m) | 0.0470 | 0.0465 | 0.0459 | 0.0453 | 0.0462
MPC_ORI | RMSE_ΔXY (m) | 0.0683 | 0.0675 | 0.0670 | 0.0663 | 0.0673
Table 15. The comparison of stability in lateral direction for three strategies in a heterogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | RMSE_β (deg) | 0.0964 | 0.0945 | 0.0931 | 0.0927 | 0.0942
MPC_QMIX | RMSE_ay (m/s²) | 0.8155 | 0.8077 | 0.8013 | 0.7980 | 0.8056
MPC_QMIX | RMSE_δf (deg) | 0.4441 | 0.4419 | 0.4401 | 0.4389 | 0.4413
MPC_QMIX | RMSE_ψ̇ (deg/s) | 2.0288 | 2.0193 | 2.0110 | 2.0055 | 2.0162
MPC_IQL | RMSE_β (deg) | 0.0967 | 0.0963 | 0.0960 | 0.0957 | 0.0962
MPC_IQL | RMSE_ay (m/s²) | 0.8285 | 0.8252 | 0.8224 | 0.8190 | 0.8238
MPC_IQL | RMSE_δf (deg) | 0.4636 | 0.4627 | 0.4613 | 0.4603 | 0.4620
MPC_IQL | RMSE_ψ̇ (deg/s) | 2.1307 | 2.1257 | 2.1148 | 2.1103 | 2.1204
MPC_ORI | RMSE_β (deg) | 0.1037 | 0.1032 | 0.1028 | 0.1024 | 0.1030
MPC_ORI | RMSE_ay (m/s²) | 0.9163 | 0.9140 | 0.9107 | 0.9064 | 0.9119
MPC_ORI | RMSE_δf (deg) | 0.5568 | 0.5552 | 0.5534 | 0.5509 | 0.5541
MPC_ORI | RMSE_ψ̇ (deg/s) | 2.5332 | 2.5287 | 2.5029 | 2.4477 | 2.5031
Table 16. A comparison of economic performance for three strategies in a heterogeneous platoon.
Strategy | Indicator | Vehicle 1 | Vehicle 2 | Vehicle 3 | Vehicle 4 | Average
MPC_QMIX | ΔSOC/s (km⁻¹) | 0.0053 | 0.0051 | 0.0050 | 0.0049 | 0.0051
MPC_IQL | ΔSOC/s (km⁻¹) | 0.0057 | 0.0055 | 0.0052 | 0.0051 | 0.0054
MPC_ORI | ΔSOC/s (km⁻¹) | 0.0065 | 0.0062 | 0.0060 | 0.0058 | 0.0061
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Zhang, S.; Zhuan, X. Distributed Model Predictive Control for Two-Dimensional Electric Vehicle Platoon Based on QMIX Algorithm. Symmetry 2022, 14, 2069. https://doi.org/10.3390/sym14102069
