Article

Two-Dimensional Car-Following Control Strategy for Electric Vehicle Based on MPC and DQN

1
Department of Artificial Intelligence and Automation, School of Electrical Engineering and Automation, Wuhan University, Wuhan 430072, China
2
Shenzhen Research Institute, Wuhan University, Shenzhen 518057, China
*
Author to whom correspondence should be addressed.
Symmetry 2022, 14(8), 1718; https://doi.org/10.3390/sym14081718
Submission received: 22 July 2022 / Revised: 11 August 2022 / Accepted: 14 August 2022 / Published: 17 August 2022
(This article belongs to the Section Engineering and Materials)

Abstract

To address the coupling between longitudinal and lateral vehicle control, this paper proposes a two-dimensional (2-D) car-following control strategy for an electric vehicle. First, a 2-D car-following model covering longitudinal following and lateral lane keeping is established. Then, a 2-D car-following control strategy is designed in which longitudinal following control and lateral lane keeping control are integrated into one model predictive control (MPC) framework. The strategy realizes multi-objective coordinated optimization of longitudinal and lateral control during the 2-D car-following process; the objectives are safety, tracking, comfort, lane keeping, lateral stability and economy, the last of which is particularly important for electric vehicles. The weight matrix of the objective function in the MPC framework is symmetric, and its weight coefficients have a great influence on the control performance. The contribution of this paper is that, in order to adapt to different dynamic processes of lane keeping, the weight coefficients in the MPC framework are optimized in real time based on the deep Q network (DQN) algorithm. Finally, to verify the 2-D car-following control strategy, a comparison strategy and two experimental scenarios are set up, and simulation experiments are carried out. In scenario 1, compared with the comparison strategy, the lane keeping, lateral stability and economy of the proposed strategy are improved by 37.21%, 17.57% and 9.26%, respectively. In scenario 2, the corresponding improvements are 36.45%, 16.66% and 18.52%, respectively. Therefore, compared with the comparison strategy, the 2-D car-following control strategy achieves better lane keeping, lateral stability and economy while maintaining the other performance indices during the 2-D car-following process.

1. Introduction

With the development of the automotive industry, the electric vehicle [1] and driving control system (DCS) [2] have become two important vehicle technologies. The battery and motor are two important components of electric vehicles, and electric vehicles are powered by a battery and driven by a motor [3,4]. Therefore, electric vehicles have the characteristics of zero-emission and environmental friendliness. Electric vehicles have high requirements on the mileage of the battery on a single charge, and the mileage of the battery on a single charge is related to economy, so economy is particularly important for electric vehicles [5,6]. Among various DCSs, adaptive cruise control (ACC) has been widely used [7]. ACC can assist with the longitudinal driving of the vehicle, which can effectively reduce the driver’s driving workload and the psychological burden [8]. The working mode of ACC is divided into speed control mode and car-following mode [9]. In the speed control mode, there is no vehicle in front, and the following vehicle travels at a constant longitudinal speed [10]. In the car-following mode, the control process is more complicated and more meaningful because of the involvement of the following vehicle and the preceding vehicle [11]. Therefore, this paper takes the electric vehicle equipped with the ACC system as the research object and studies the car-following process of the electric vehicle. The economy of electric vehicles is fully considered.
The model predictive control (MPC) algorithm is widely used in the design of the car-following control strategies for electric vehicles. The MPC algorithm consists of three parts: predictive model, rolling optimization and feedback correction [12,13,14,15]. The multi-objective optimization problem can be converted to a quadratic programming problem under the MPC framework, and the optimal control variables can be obtained by solving the quadratic programming problem. The existing studies on the car-following control strategy of electric vehicles are mainly designed under MPC frameworks. In [16], an energy-optimized car-following control strategy for electric vehicles was designed under an MPC framework, which planned the longitudinal speed trajectory of the following vehicles in real-time. The traffic information and road conditions were obtained in advance so as to improve the economy. In [17], a car-following control strategy for connected electric vehicles was proposed. Under the MPC framework, the security and economy were optimized. For security, the safe following distance was obtained through the state invariant set theory. For the economy, the speed of the preceding vehicle was predicted through a long short-term memory network. In [18], a car-following control strategy that can improve the cruising range was proposed. Based on the MPC framework, the minimum safe distance was maintained with the preceding vehicle, and the speed reference of the following vehicle was calculated according to the status of the traffic lights, thereby improving the economy of the electric vehicle for the car-following process. In [19], a multi-objective car-following control strategy for electric vehicles was proposed. The strategy adopted a hierarchical control structure. The upper layer optimized the safety, following performance, comfort and economy under the MPC framework, and the lower layer applied regenerative braking to recover energy. 
In [20], a car-following control strategy applied to in-wheel motor electric vehicles was proposed. The Bayesian network was used to predict the motion state of the vehicle in front, and the control objective in the car-following process was optimized under the MPC framework. Therefore, on the premise of maintaining a safe distance from the vehicle in front, better economy can be obtained.
However, the existing car-following control strategies for electric vehicles mainly focus on longitudinal control and do not consider lateral control. Car-following control that only includes longitudinal control is one-dimensional car-following control. While the electric vehicle is following the preceding vehicle on a single curved lane, lateral control can be achieved with a lane keeping assistance system [21]. The longitudinal following control and lateral lane keeping control are coupled with each other during the driving process, so they also influence each other. When the vehicle is following in a straight lane, the longitudinal following control of the vehicle is mainly considered. When the vehicle is following on a single curved lane, the coupling of longitudinal following control and lateral lane keeping control needs to be considered. If the longitudinal following control and the lateral lane keeping control are optimized with two MPC frameworks, the longitudinal control and lateral control cannot interact with each other well. As a result, the longitudinal control and the lateral control are not accurate enough. Therefore, to follow a preceding vehicle on a single curved lane, it is necessary to optimize the multiple objectives related to longitudinal control and lateral control in one MPC framework. In addition, for the MPC framework, the weight matrix of the objective function is symmetric, and the weight coefficients in the weight matrix have a great influence on the control effect. While the electric vehicle is following the preceding vehicle on a single curved lane, the lateral lane keeping process changes all the time. If constant weight coefficients are adopted, it is difficult to adapt to different dynamic processes of lane keeping, resulting in a poor control effect. In order to adapt to different dynamic processes of lane keeping, the weight coefficients in the MPC framework need to be optimized in real-time.
Therefore, this paper proposes a two-dimensional (2-D) car-following control strategy. The strategy adopts a hierarchical control structure with an upper-layer controller and a lower-layer controller. In the upper layer, multiple control objectives related to the longitudinal and lateral directions are optimized under the MPC framework. In the lower layer, driving control, braking control and active steering control are implemented. The contribution of this paper is that, in order to adapt to different dynamic processes of lane keeping, the weight coefficients in the MPC framework are optimized in real time based on the deep Q network (DQN) algorithm [22].
The remainder of this paper is organized as follows: the model of the electric vehicle is established in Section 2; the 2-D car-following model is established in Section 3; the 2-D car-following control strategy is designed in Section 4; the strategy is verified in Section 5; and the conclusions are drawn in Section 6.

2. Model of Electric Vehicle

The structure of the target electric vehicle, a front-drive electric vehicle, is shown in Figure 1. The vehicle obtains information from the environment through a millimeter-wave radar and cameras: the millimeter-wave radar is mainly used to detect the motion state of the preceding vehicle, and the camera is mainly used to detect the lane lines. In the electric vehicle model, the motor and the battery are the two important components.
The motor is a permanent magnet synchronous motor, and the motor model is established with the torque and speed characteristics of the motor [23]. The working state of the motor is divided into two states: driving and braking, and the motor efficiency is considered in the model. During motor modeling, the input and output characteristics of the motor need to be focused on, and the complex dynamic characteristics of the motor can be appropriately simplified. The integrated expression of the motor controller and the motor body is its input and output characteristics, so the motor and the motor controller are usually integrated together for modeling during the modeling process. The torque and speed characteristics of the motor are shown in Figure 2, and motor torque corresponds to driving torque and braking torque. The battery is a lithium battery, and the battery model is established with the equivalent internal resistance model [24]. The working state of the battery is divided into two states of charging and discharging, and the efficiency of the battery is considered in the model. The equivalent internal resistance model of the battery is shown in Figure 3.
During braking, electric vehicles can recover energy through regenerative braking. The regenerative braking control strategy contains the distribution of braking force for motors and hydraulics, as well as the distribution of braking force for the front axle and rear axle. The control strategy for regenerative braking in [19] is adopted in this paper.
Figure 2. Torque–speed characteristics of the motor.
Figure 3. Equivalent internal resistance model of battery.

3. Model of 2-D Car-Following Process

The longitudinal control and lateral control of the vehicle are coupled. Figure 4 shows the 2-D car-following process. In the 2-D car-following model, it is necessary to consider the longitudinal following process and lateral lane keeping process. Therefore, the 2-D car-following model can better reflect the longitudinal and lateral dynamic characteristics of the vehicle during the 2-D car-following process.
The longitudinal car-following process is shown in Figure 5, which presents the longitudinal motion relationship between the two vehicles. The mathematical form of the longitudinal car-following process [19] is described as follows:
Figure 5. Longitudinal car-following process.
$$
\begin{cases}
\Delta s(k+1) = \Delta s(k) + v_{rel}(k)\,T_s + \frac{1}{2} a_{fx}(k)\,T_s^2 - \frac{1}{2} a_x(k)\,T_s^2 \\
v_x(k+1) = v_x(k) + a_x(k)\,T_s \\
v_{rel}(k+1) = v_{fx}(k+1) - v_x(k+1) \\
a_x(k+1) = \left(1 - \frac{T_s}{\tau_l}\right) a_x(k) + \frac{T_s}{\tau_l}\, a_{xdes}(k) \\
j_x(k+1) = -\frac{1}{\tau_l}\, a_x(k) + \frac{1}{\tau_l}\, a_{xdes}(k)
\end{cases}
$$
The longitudinal car-following process is described in the form of the state equation as follows:
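$$
x_1(k+1) = A_1 x_1(k) + B_1 u_1(k) + G_1 w_1(k)
$$
where
$$
x_1(k) = \begin{bmatrix} \Delta s(k) & v_x(k) & v_{rel}(k) & a_x(k) & j_x(k) \end{bmatrix}^T
$$
$$
u_1(k) = \begin{bmatrix} a_{xdes}(k) \end{bmatrix}^T
$$
$$
w_1(k) = \begin{bmatrix} a_{fx}(k) \end{bmatrix}^T
$$
$$
A_1 = \begin{bmatrix}
1 & 0 & T_s & -\frac{1}{2}T_s^2 & 0 \\
0 & 1 & 0 & T_s & 0 \\
0 & 0 & 1 & -T_s & 0 \\
0 & 0 & 0 & 1 - \frac{T_s}{\tau_l} & 0 \\
0 & 0 & 0 & -\frac{1}{\tau_l} & 0
\end{bmatrix}
$$
$$
B_1 = \begin{bmatrix} 0 & 0 & 0 & \frac{T_s}{\tau_l} & \frac{1}{\tau_l} \end{bmatrix}^T
$$
$$
G_1 = \begin{bmatrix} \frac{1}{2}T_s^2 & 0 & T_s & 0 & 0 \end{bmatrix}^T
$$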
where, at time $k$, the state variable $x_1(k)$ is composed of the distance between vehicles $\Delta s(k)$, longitudinal speed $v_x(k)$, relative speed $v_{rel}(k)$, longitudinal acceleration $a_x(k)$ and jerk $j_x(k)$; the control variable $u_1(k)$ is composed of the desired longitudinal acceleration $a_{xdes}$; and the system disturbance variable $w_1(k)$ is composed of the longitudinal acceleration of the preceding vehicle $a_{fx}$. In addition, $T_s$ is the sampling time and $\tau_l$ is the time lag.
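As a minimal sketch of how the longitudinal state equation can be stepped forward, the following plain-Python snippet builds $A_1$, $B_1$ and $G_1$ and applies one update. The function names and the numerical values of $T_s$, $\tau_l$ and the initial state are illustrative assumptions, not values taken from this paper.

```python
def longitudinal_matrices(Ts, tau_l):
    """Return (A1, B1, G1) of the discrete longitudinal car-following model."""
    A1 = [
        [1.0, 0.0, Ts, -0.5 * Ts**2, 0.0],
        [0.0, 1.0, 0.0, Ts, 0.0],
        [0.0, 0.0, 1.0, -Ts, 0.0],
        [0.0, 0.0, 0.0, 1.0 - Ts / tau_l, 0.0],
        [0.0, 0.0, 0.0, -1.0 / tau_l, 0.0],
    ]
    B1 = [0.0, 0.0, 0.0, Ts / tau_l, 1.0 / tau_l]
    G1 = [0.5 * Ts**2, 0.0, Ts, 0.0, 0.0]
    return A1, B1, G1


def step(A, B, G, x, u, w):
    """One step x(k+1) = A x(k) + B u(k) + G w(k) for scalar u and w."""
    return [
        sum(a * xi for a, xi in zip(row, x)) + b * u + g * w
        for row, b, g in zip(A, B, G)
    ]


# Illustrative values only (assumed, not from the paper).
A1, B1, G1 = longitudinal_matrices(Ts=0.1, tau_l=0.5)
x0 = [20.0, 15.0, 1.0, 0.0, 0.0]  # [delta_s, v_x, v_rel, a_x, j_x]
x1 = step(A1, B1, G1, x0, u=1.0, w=0.0)  # desired acceleration of 1 m/s^2
```

With these assumed values, the first-order lag moves $a_x$ only a fraction $T_s/\tau_l$ of the way toward $a_{xdes}$ in one step, which is the actuator delay that the model is meant to capture.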
The lateral lane keeping process is shown in Figure 6, which presents the lateral motion relationship and the lateral dynamic characteristics of the following vehicle. The mathematical form of the lateral lane keeping process [25] is described as follows:
Figure 6. Lateral lane keeping process.
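$$
M_{veh}\,\ddot{e}_s = \dot{e}_s\left[-\frac{2C_{\alpha f}}{v_x} - \frac{2C_{\alpha r}}{v_x}\right] + e_\alpha\left[2C_{\alpha f} + 2C_{\alpha r}\right] + \dot{e}_\alpha\left[-\frac{2C_{\alpha f} l_1}{v_x} + \frac{2C_{\alpha r} l_2}{v_x}\right] + \dot{\Psi}_{des}\left[-\frac{2C_{\alpha f} l_1}{v_x} + \frac{2C_{\alpha r} l_2}{v_x} - v_x\right] + 2C_{\alpha f}\,\delta_f
$$
$$
I_z\,\ddot{e}_\alpha = 2C_{\alpha f} l_1\,\delta_f + \dot{e}_s\left[-\frac{2C_{\alpha f} l_1}{v_x} + \frac{2C_{\alpha r} l_2}{v_x}\right] + e_\alpha\left[2C_{\alpha f} l_1 - 2C_{\alpha r} l_2\right] + \dot{e}_\alpha\left[-\frac{2C_{\alpha f} l_1^2}{v_x} - \frac{2C_{\alpha r} l_2^2}{v_x}\right] + \dot{\Psi}_{des}\left[-\frac{2C_{\alpha f} l_1^2}{v_x} - \frac{2C_{\alpha r} l_2^2}{v_x}\right]
$$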
The lateral lane keeping process is described in the form of the state equation with approximate discretization as follows:
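$$
x_2(k+1) = A_2 x_2(k) + B_2 u_2(k) + G_2 w_2(k)
$$
where
$$
x_2(k) = \begin{bmatrix} e_s(k) & \dot{e}_s(k) & e_\alpha(k) & \dot{e}_\alpha(k) \end{bmatrix}^T
$$
$$
u_2(k) = \begin{bmatrix} \delta_f(k) \end{bmatrix}^T
$$
$$
w_2(k) = \begin{bmatrix} \dot{\Psi}_{des}(k) \end{bmatrix}^T
$$
$$
A_2 = \begin{bmatrix}
1 & T_s & 0 & 0 \\
0 & 1 - \frac{2C_{\alpha f} + 2C_{\alpha r}}{M_{veh} v_x} T_s & \frac{2C_{\alpha f} + 2C_{\alpha r}}{M_{veh}} T_s & -\frac{2C_{\alpha f} l_1 - 2C_{\alpha r} l_2}{M_{veh} v_x} T_s \\
0 & 0 & 1 & T_s \\
0 & -\frac{2C_{\alpha f} l_1 - 2C_{\alpha r} l_2}{I_z v_x} T_s & \frac{2C_{\alpha f} l_1 - 2C_{\alpha r} l_2}{I_z} T_s & 1 - \frac{2C_{\alpha f} l_1^2 + 2C_{\alpha r} l_2^2}{I_z v_x} T_s
\end{bmatrix}
$$
$$
B_2 = \begin{bmatrix} 0 & \frac{2C_{\alpha f}}{M_{veh}} T_s & 0 & \frac{2C_{\alpha f} l_1}{I_z} T_s \end{bmatrix}^T
$$
$$
G_2 = \begin{bmatrix} 0 & -\frac{2C_{\alpha f} l_1 - 2C_{\alpha r} l_2}{M_{veh} v_x} T_s - v_x T_s & 0 & -\frac{2C_{\alpha f} l_1^2 + 2C_{\alpha r} l_2^2}{I_z v_x} T_s \end{bmatrix}^T
$$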
where, at time $k$, the state variable $x_2(k)$ is composed of the lateral distance deviation $e_s(k)$, the derivative of the lateral distance deviation $\dot{e}_s(k)$, the directional deviation $e_\alpha(k)$ and the derivative of the directional deviation $\dot{e}_\alpha(k)$; the control variable $u_2(k)$ is composed of the targeted turning angle $\delta_f$; and the system disturbance variable $w_2(k)$ is composed of the desired yaw rate $\dot{\Psi}_{des}$. $M_{veh}$ is the mass of the electric vehicle; $I_z$ is the moment of inertia of the electric vehicle; $C_{\alpha f}$ and $C_{\alpha r}$ are the cornering stiffness of the front and rear wheels, respectively; and $l_1$ and $l_2$ are the distances from the center of mass to the front and rear axles, respectively.
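The approximate discretization can be read as a forward-Euler step, $A_2 = I + T_s A_c$, $B_2 = T_s B_c$ and $G_2 = T_s G_c$, applied to the continuous lateral error dynamics. The sketch below builds the three matrices under that reading, assuming the usual sign conventions of the bicycle-model error dynamics. The function name and all parameter values used with it are hypothetical.

```python
def lateral_matrices(Ts, vx, M_veh, Iz, Caf, Car, l1, l2):
    """Forward-Euler discretization of the lateral lane keeping model.

    State: [e_s, e_s_dot, e_alpha, e_alpha_dot]; input: delta_f;
    disturbance: desired yaw rate. Sign conventions are an assumption
    (standard bicycle-model error dynamics).
    """
    c_sum = 2 * Caf + 2 * Car                   # total cornering stiffness
    c_diff = 2 * Caf * l1 - 2 * Car * l2        # first moment about the CoM
    c_sq = 2 * Caf * l1**2 + 2 * Car * l2**2    # second moment about the CoM
    A2 = [
        [1.0, Ts, 0.0, 0.0],
        [0.0, 1.0 - c_sum / (M_veh * vx) * Ts, c_sum / M_veh * Ts,
         -c_diff / (M_veh * vx) * Ts],
        [0.0, 0.0, 1.0, Ts],
        [0.0, -c_diff / (Iz * vx) * Ts, c_diff / Iz * Ts,
         1.0 - c_sq / (Iz * vx) * Ts],
    ]
    B2 = [0.0, 2 * Caf / M_veh * Ts, 0.0, 2 * Caf * l1 / Iz * Ts]
    G2 = [0.0, (-c_diff / (M_veh * vx) - vx) * Ts, 0.0,
          -c_sq / (Iz * vx) * Ts]
    return A2, B2, G2
```

Because $v_x$ appears in $A_2$ and $G_2$, the matrices have to be rebuilt whenever the longitudinal speed changes, which is one place where the longitudinal motion enters the lateral model.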
With the models in [19,25], the 2-D car-following model is established. The equation for the 2-D car-following model is described as follows:
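$$
x(k+1) = A x(k) + B u(k) + G w(k)
$$
where
$$
x(k) = \begin{bmatrix} \Delta s(k) & v_x(k) & v_{rel}(k) & a_x(k) & j_x(k) & e_s(k) & \dot{e}_s(k) & e_\alpha(k) & \dot{e}_\alpha(k) \end{bmatrix}^T
$$
$$
u(k) = \begin{bmatrix} a_{xdes}(k) & \delta_f(k) \end{bmatrix}^T
$$
$$
w(k) = \begin{bmatrix} a_{fx}(k) & \dot{\Psi}_{des}(k) \end{bmatrix}^T
$$
$$
A = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}
$$
$$
B = \begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}
$$
$$
G = \begin{bmatrix} G_1 & 0 \\ 0 & G_2 \end{bmatrix}
$$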
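The block-diagonal stacking of the two sub-models can be sketched with a small helper. The name `block_diag` is a hypothetical helper written here for illustration; matrices are plain lists of rows, and $B_1$, $B_2$, $G_1$ and $G_2$ are passed as single-column matrices so that the stacked $B$ and $G$ have two columns.

```python
def block_diag(M1, M2):
    """Return the block-diagonal matrix [[M1, 0], [0, M2]] as a list of rows."""
    cols1, cols2 = len(M1[0]), len(M2[0])
    top = [list(row) + [0.0] * cols2 for row in M1]
    bottom = [[0.0] * cols1 + list(row) for row in M2]
    return top + bottom


# Shape check with placeholder entries: a 5x1 column and a 4x1 column
# stack into a 9x2 matrix, matching the dimensions of B and G.
B1_col = [[0.0], [0.0], [0.0], [0.2], [2.0]]
B2_col = [[0.0], [0.8], [0.0], [0.5]]
B = block_diag(B1_col, B2_col)
```

The same helper applied to a 5x5 and a 4x4 matrix gives the 9x9 matrix $A$.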

4. Two-Dimensional Car-Following Control Strategy

4.1. Hierarchical Control Structure

As shown in Figure 7, the hierarchical control structure is adopted in the 2-D car-following control strategy, including the upper-layer controller and the lower-layer controller. The longitudinal and lateral control are coupled with each other in the upper-layer controller. The longitudinal speed is input into the lateral control in real-time to affect the lateral performance, and the lateral acceleration is input into the longitudinal control in real-time to constrain the magnitude of the longitudinal acceleration. The lower-layer controller contains the driving control algorithm, braking control algorithm and active steering control algorithm. The focus of this paper is on the upper-layer controller.
The vehicle state and lane centerline information are obtained through sensors, thereby establishing a 2-D car-following model. In the upper-layer controller, the longitudinal following control and lateral lane keeping control are integrated into an MPC framework, and the control objectives include safety, following performance, comfort, lane keeping, lateral stability and economy. In order to smooth the system response characteristics, a reference trajectory is set with an exponential decay function. To solve the problem for selecting the weight coefficients in the MPC framework, an optimization algorithm for weight coefficients based on the DQN algorithm is proposed. The optimal weight coefficients are obtained through the trained deep neural network in real-time so as to better adapt to different dynamic processes of lane keeping. The upper-layer controller obtains the optimal control variables through rolling optimization and the optimal control variables correspond to the desired longitudinal acceleration and the targeted turning angle.
In the lower-layer controller, the driving state is determined through the state transformation logic. The driving and braking control algorithms are designed based on the PID algorithm with feedforward and feedback functions. The motor and hydraulic braking system are controlled through driving and braking control algorithms. The active steering actuator for front wheels is controlled through the active steering control algorithm.
Figure 7. Hierarchical control structure for the 2-D car-following process.

4.2. Multi-Objective Optimization MPC Algorithm

In the 2-D car-following process, to facilitate the optimization of multiple objectives for longitudinal following and lateral lane keeping, the corresponding performance variables are set as follows:
$$
y(k) = \begin{bmatrix} \delta_s(k) & v_{rel}(k) & a_x(k) & j_x(k) & e_s(k) & \dot{e}_s(k) & e_\alpha(k) & \dot{e}_\alpha(k) \end{bmatrix}^T
$$
where $\delta_s$ is the error of the distance between vehicles, calculated as follows:
$$
\delta_s(k) = \Delta s(k) - \left(v_x(k)\,t_h + d_0\right)
$$
where $t_h$ is the time headway, and $d_0$ is the safe vehicle distance, which is the sum of the minimum distance between the vehicles and a body length.
The performance variables can be calculated based on the state variables in Equation (4):
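$$
y(k) = C x(k) - Z
$$
where
$$
C = \begin{bmatrix} C_1 & 0 \\ 0 & C_2 \end{bmatrix}
$$
$$
C_1 = \begin{bmatrix}
1 & -t_h & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
$$
$$
C_2 = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$
$$
Z = \begin{bmatrix} Z_1 \\ 0 \end{bmatrix}
$$
$$
Z_1 = \begin{bmatrix} d_0 & 0 & 0 & 0 \end{bmatrix}^T
$$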
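Written out element by element, the mapping $y(k) = Cx(k) - Z$ only replaces the inter-vehicle distance by its error relative to the desired spacing and drops $v_x$. The following sketch shows this. The function name and the numbers in the usage line are illustrative assumptions.

```python
def performance_output(x, t_h, d0):
    """Compute y = C x - Z for the 9-element 2-D car-following state.

    x = [delta_s_dist, v_x, v_rel, a_x, j_x, e_s, e_s_dot, e_alpha, e_alpha_dot]
    """
    ds, vx, vrel, ax, jx, es, es_dot, ea, ea_dot = x
    spacing_error = ds - t_h * vx - d0  # first row of C1 together with Z1
    return [spacing_error, vrel, ax, jx, es, es_dot, ea, ea_dot]


# Example with assumed values: 35 m gap, 15 m/s, 2 s headway, 5 m safe distance.
y = performance_output([35.0, 15.0, 1.0, 0.2, 0.0, 0.1, 0.0, 0.01, 0.0],
                       t_h=2.0, d0=5.0)
```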
To ensure tracking, it is necessary to minimize the error in distance between vehicles and relative speed; to ensure comfort, it is necessary to minimize longitudinal acceleration and jerk; to minimize energy consumption, it is necessary to minimize desired longitudinal acceleration [26]; to ensure lane keeping, it is necessary to minimize lateral distance deviation and directional deviation [19]; in order to ensure lateral stability, the derivatives of lateral distance deviation and directional deviation need to be minimized, and the turning angle needs to be minimized [25]. Therefore, to optimize the multiple objectives in the 2-D car-following process, the performance variables and control variables need to be minimized:
$$
\text{Objectives:} \begin{cases} \min |y(k)| \\ \min |u(k)| \end{cases}
$$
In order to smooth the response characteristics of the system, a reference trajectory for the performance variables is set with the exponential decay function. The reference trajectory is described as:
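$$
y_{ref}(k+i) = \varphi^{i}\, y(k)
$$
where
$$
\varphi = \begin{bmatrix} \varphi_1 & 0 \\ 0 & \varphi_2 \end{bmatrix}
$$
$$
\varphi_1 = \begin{bmatrix}
\rho_\delta & 0 & 0 & 0 \\
0 & \rho_v & 0 & 0 \\
0 & 0 & \rho_a & 0 \\
0 & 0 & 0 & \rho_j
\end{bmatrix}
$$
$$
\varphi_2 = \begin{bmatrix}
\rho_{e_s} & 0 & 0 & 0 \\
0 & \rho_{\dot{e}_s} & 0 & 0 \\
0 & 0 & \rho_{e_\alpha} & 0 \\
0 & 0 & 0 & \rho_{\dot{e}_\alpha}
\end{bmatrix}
$$
where $\rho_\delta$, $\rho_v$, $\rho_a$, $\rho_j$, $\rho_{e_s}$, $\rho_{\dot{e}_s}$, $\rho_{e_\alpha}$ and $\rho_{\dot{e}_\alpha}$ are the coefficients for $\delta_s$, $v_{rel}$, $a_x$, $j_x$, $e_s$, $\dot{e}_s$, $e_\alpha$ and $\dot{e}_\alpha$ in the reference trajectories, respectively.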
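Since $\varphi$ is diagonal, each performance variable decays independently along the reference trajectory. A minimal sketch follows, assuming coefficients between 0 and 1 so that the reference converges to zero. The function name and the numbers in the usage line are illustrative.

```python
def reference_trajectory(y, rho, p):
    """Return [y_ref(k+1), ..., y_ref(k+p)] with y_ref(k+i) = diag(rho)^i y(k)."""
    return [[(r ** i) * yi for r, yi in zip(rho, y)] for i in range(1, p + 1)]


# Two-variable example with assumed decay coefficients of 0.5.
traj = reference_trajectory([4.0, -2.0], rho=[0.5, 0.5], p=3)
# traj == [[2.0, -1.0], [1.0, -0.5], [0.5, -0.25]]
```

A coefficient closer to 1 gives a slower, smoother approach to zero, while a smaller coefficient asks the controller to remove the deviation more aggressively.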
In the 2-D car-following process, for the longitudinal following control, it is necessary to ensure safety, and for the lateral lane keeping, it is necessary to set corresponding constraints on the targeted turning angle; the vehicle is limited by its own capability. The constraints are as follows:
$$
\text{s.t.} \begin{cases}
\Delta s(k) \ge d_c \\
v_{x\min} \le v_x(k) \le v_{x\max} \\
a_{x\min} \le a_x(k) \le a_{x\max} \\
j_{x\min} \le j_x(k) \le j_{x\max} \\
u_{1\min} \le u_1(k) \le u_{1\max} \\
u_{2\min} \le u_2(k) \le u_{2\max}
\end{cases}
$$
where $u_1$ and $u_2$ represent the desired longitudinal acceleration and the targeted turning angle, respectively.
For the stability of the vehicle, the lateral acceleration and longitudinal acceleration need to meet the following conditions:
$$
a_{veh} = \sqrt{a_x^2 + a_y^2} \le \mu_{\max}\, g
$$
where $a_y$ is the lateral acceleration, $a_{veh}$ is the resultant acceleration of the electric vehicle and $\mu_{\max}$ is the maximum adhesion coefficient between the vehicle tires and the ground.
In order to ensure a certain adhesion margin, the above equation is modified into the following form:
$$
a_{veh} = \sqrt{a_x^2 + a_y^2} \le \mu_{\max}\, g\, \varepsilon
$$
where $\varepsilon$ is the adhesion margin coefficient, which is set to 1.
Rearranging the above inequality gives the following bounds on the longitudinal acceleration:
$$
-\sqrt{(\mu_{\max}\, g\, \varepsilon)^2 - a_y^2} \le a_x \le \sqrt{(\mu_{\max}\, g\, \varepsilon)^2 - a_y^2}
$$
In the 2-D car-following control, the maximum and minimum of $u_1$ are set as follows:
$$
u_{1\min}(k) = \max\left(u_{1low},\; -\sqrt{(\mu_{\max}\, g\, \varepsilon)^2 - a_y(k)^2}\right)
$$
$$
u_{1\max}(k) = \min\left(u_{1up},\; \sqrt{(\mu_{\max}\, g\, \varepsilon)^2 - a_y(k)^2}\right)
$$
where $u_{1up}$ and $u_{1low}$ are the maximum and minimum of the control variable for the desired longitudinal acceleration without lateral lane keeping.
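The bounds on $u_1$ can be sketched as a small function that shrinks the admissible longitudinal acceleration as the lateral acceleration grows. The function name and the numerical values are illustrative assumptions, and clamping the square-root argument at zero is an added safeguard for the case where $|a_y|$ already exceeds the adhesion limit, which the text does not discuss.

```python
import math


def longitudinal_accel_bounds(a_y, u1_low, u1_up, mu_max, g=9.81, eps=1.0):
    """Return (u1_min, u1_max) limited by the adhesion circle.

    The max(..., 0.0) clamp is an assumption for |a_y| beyond the limit.
    """
    limit_sq = (mu_max * g * eps) ** 2 - a_y ** 2
    a_lim = math.sqrt(max(limit_sq, 0.0))
    return max(u1_low, -a_lim), min(u1_up, a_lim)


# Assumed numbers chosen for easy arithmetic: the adhesion limit is 5 m/s^2.
bounds = longitudinal_accel_bounds(a_y=3.0, u1_low=-5.5, u1_up=2.5,
                                   mu_max=0.5, g=10.0)
# bounds == (-4.0, 2.5): braking is capped by the adhesion circle,
# while acceleration stays at the straight-line limit u1_up.
```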
Longitudinal control and lateral control are integrated into an MPC framework, and the objective function is calculated as follows:
$$
J = \sum_{j=1}^{p} \left[\hat{y}_p(k+j \mid k) - y_{ref}(k+j)\right]^T Q \left[\hat{y}_p(k+j \mid k) - y_{ref}(k+j)\right] + \sum_{j=0}^{m-1} u(k+j)^T R\, u(k+j)
$$
where $p$ and $m$ are the prediction horizon and the control horizon, respectively.
Under the MPC framework, combining Equation (13) and Constraint 7, the multi-objective optimization problem can be converted into a constrained quadratic programming problem. Solving the quadratic programming problem yields the optimal control sequence. Only the first element of the control sequence is applied to the ACC system, and the optimization is repeated at the next sampling time.
The weight matrices Q and R in the objective function can be described as:
$$
Q = \operatorname{diag}\left(w_\delta(k),\, w_v(k),\, w_a(k),\, w_j(k),\, w_{e_s}(k),\, w_{\dot{e}_s}(k),\, w_{e_\alpha}(k),\, w_{\dot{e}_\alpha}(k)\right)
$$
$$
R = \operatorname{diag}\left(w_{u_1}(k),\, w_{u_2}(k)\right)
$$
where, at sampling time $k$, $w_\delta(k)$, $w_v(k)$, $w_a(k)$, $w_j(k)$, $w_{e_s}(k)$, $w_{\dot{e}_s}(k)$, $w_{e_\alpha}(k)$ and $w_{\dot{e}_\alpha}(k)$ are the weights of the error of the distance between vehicles, relative speed, longitudinal acceleration, jerk, lateral distance deviation, derivative of the lateral distance deviation, directional deviation and derivative of the directional deviation, respectively. In addition, $w_{u_1}(k)$ and $w_{u_2}(k)$ are the weight coefficients for the desired longitudinal acceleration and the targeted turning angle.
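To make the role of the weights concrete, the sketch below evaluates the objective function for a given predicted output sequence and control sequence, assuming diagonal $Q$ and $R$ so that only the weight coefficients are needed. It only evaluates $J$; the constrained quadratic programming solver that minimizes it is not shown. All names and numbers are illustrative.

```python
def mpc_cost(y_pred, y_ref, u_seq, q_diag, r_diag):
    """Evaluate J = sum_j e_j^T Q e_j + sum_j u_j^T R u_j for diagonal Q and R.

    y_pred, y_ref: p predicted / reference output vectors.
    u_seq: m control vectors. q_diag, r_diag: diagonal weight coefficients.
    """
    tracking = sum(
        q * (yp - yr) ** 2
        for yp_j, yr_j in zip(y_pred, y_ref)
        for q, yp, yr in zip(q_diag, yp_j, yr_j)
    )
    effort = sum(r * u ** 2 for u_j in u_seq for r, u in zip(r_diag, u_j))
    return tracking + effort


# Tiny example with two outputs, p = 2 and m = 1 (assumed numbers).
J = mpc_cost(
    y_pred=[[1.0, 2.0], [0.0, 1.0]],
    y_ref=[[0.0, 0.0], [0.0, 0.0]],
    u_seq=[[1.0, 0.5]],
    q_diag=[2.0, 1.0],
    r_diag=[1.0, 1.0],
)
# J == 8.25
```

Raising one entry of `q_diag` relative to the others makes deviations in the corresponding performance variable more expensive, which is the lever that a weight optimization algorithm adjusts.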
The parameters of the MPC framework are listed in Table 1.

4.3. Weight Optimization Algorithm Based on DQN

In the 2-D car-following control, it is necessary to consider the longitudinal following and the lateral lane keeping at the same time. The MPC framework with constant weight is difficult to adapt to different dynamic processes of lane keeping. The weights of the algorithm need to be optimized in real-time.
Only the weight coefficients in the $Q$ matrix are optimized in real time; the weight coefficients $w_{u_1}(k)$ and $w_{u_2}(k)$ in the $R$ matrix are fixed at a constant value of 1. $R$ is left unadjusted so that $w_{u_1}(k)$ and $w_{u_2}(k)$ serve as the common reference for the relative values of all other weights. Since the weight matrix $Q$ contains eight weight coefficients, some of which interact with each other, it is difficult to optimize the weight coefficients with traditional modeling methods.
For this reason, the weight coefficients are optimized in real time with a method based on reinforcement learning. The frameworks of reinforcement learning and of the DQN algorithm are introduced first, and the weight optimization algorithm is then designed based on these two frameworks.
Reinforcement learning, as a branch of machine learning, derives its ideas from behavioral psychology [27]. In reinforcement learning, the agent realizes the learning process by interacting with the environment [28]. The goal of the learning process is to obtain the optimal policy by maximizing the expected value of the accumulated reward. Figure 8 shows the framework of reinforcement learning.
Reinforcement learning can be modeled by a Markov decision process (MDP) [28]; MDP is described as a tuple with five components:
$$
M = (S, A, P, R, \gamma)
$$
where $S$ is the state set, $A$ is the action set, $P$ is the state transition function, $R$ is the reward function, $r(s, a)$ is the reward value obtained by the agent when it takes action $a$ in state $s$, and $\gamma \in (0, 1)$ is the discount factor of the reward.
The action-value function is the expected reward with s and a. If the agent operates with the policy π, the action-value function is defined as follows:
$$
Q^{\pi}(s, a) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right]
$$
where
$$
r_t = r(s_t, a_t), \quad a_t = \pi(s_t)
$$
$$
s_{t+1} \sim P(s_t, a_t)
$$
$$
s_0 = s, \quad a_0 = a
$$
where $t$ represents the time step.
The optimal policy π* is calculated as follows:
$$
\pi^{*} = \underset{\pi \in \Pi}{\operatorname{argmax}}\; Q^{\pi}(s, a)
$$
where $\Pi$ is the policy space.
The optimal action-value function is described as follows:
$$
Q^{*}(s, a) = Q^{\pi^{*}}(s, a) = \mathbb{E}_{s' \sim P(s, a)}\left[r + \gamma \max_{a'} Q^{*}(s', a')\right]
$$
During the solution process of the MDP, the state transition function and the action-value function are unknown, but the observed sequence $(s_t, a_t, s_{t+1}, r_t)$ can be obtained from the MDP. The observed sequence is also called the trajectory, and the trajectory can be applied to estimate the action-value function. Most solvers of the MDP are implemented based on the iterative estimation of the optimal action-value function. Among them, the calculation process of the Q-learning algorithm [29] is as follows:
$$
Q_{l+1}(s_t, a_t) \leftarrow Q_l(s_t, a_t) + \alpha_t\left[r_{t+1} + \gamma \max_{a'} Q_l(s_{t+1}, a') - Q_l(s_t, a_t)\right]
$$
where $l$ is the iteration index, $\alpha_t$ is the learning rate at step $t$, and the action $a_t$ is selected semi-randomly according to the ε-greedy strategy.
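For a small discrete problem, the Q-learning update and the ε-greedy action selection can be sketched with a table stored as a list of rows, one row per state. This is the textbook tabular form given for illustration, with hypothetical function names. It is not the network-based variant used for the weight optimization.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Apply Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]


# Two states, two actions; values are arbitrary illustrations.
Q = [[0.0, 0.0], [1.0, 2.0]]
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
# Q[0][1] is now about 1.4: halfway from 0 toward the target 1 + 0.9 * 2.
```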
When the scale of the spaces for state and action are large, the accuracy of the Q-learning algorithm cannot be guaranteed, so the method of parameter fitting is applied. Deep neural networks can be applied to fit action-value functions, and the DQN algorithm is an example.
The framework of the DQN algorithm is shown in Figure 9. The DQN algorithm contains two Q networks, namely the main network and the target network. The main network, with parameter $\theta$, is used to predict the optimal action for each step; the target network, with parameter $\hat{\theta}$, is used to compute the target of the gradient descent step. The difference between the estimate and the target of the action-value function can be expressed as a loss function, so that solving the MDP is transformed into finding the parameter $\theta$ that minimizes the loss function. The loss function of the DQN algorithm is set as:
$$
L(\theta) = \mathbb{E}_{(s, a, r, s') \sim U(D)}\left[\left(r + \gamma \max_{a'} Q(s', a'; \hat{\theta}) - Q(s, a; \theta)\right)^2\right]
$$
where $(s, a, r, s') \sim U(D)$ denotes a sample drawn uniformly at random from the replay buffer $D$.
The DQN algorithm uses experience replay and a target network to solve instability and divergence problems [30]. In experience replay, the observed sequence $(s_t, a_t, r_t, s_{t+1})$, also called an experience, is stored in the replay buffer; random sampling removes the correlation between the data and smooths the data distribution. The target network keeps its own separate set of parameters, which is updated only periodically, to reduce the correlation between the estimated value and the target value of the action-value function.
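The two stabilizing mechanisms can be sketched without a deep learning library: a fixed-capacity buffer with uniform sampling, and target values computed from a separate target network. The class and function names are hypothetical, `target_q` stands in for the target network as any callable returning a list of action values, and the `done` flag for terminal transitions is an added assumption that the loss function in the text does not spell out.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of experiences with uniform random sampling."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest experiences drop out

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma):
    """Targets r + gamma * max_a' Q(s', a'; theta_hat), or r if terminal."""
    return [
        r if done else r + gamma * max(target_q(s_next))
        for (_, _, r, s_next, done) in batch
    ]
```

The main network would then be trained to reduce the squared difference between its own estimate $Q(s, a; \theta)$ and these targets, while the parameters behind `target_q` are only copied from the main network at a fixed interval.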
The selection of weight coefficients in the weight matrix is converted into an MDP first, and then the weight coefficients are optimized based on the DQN algorithm. The state st and action at of the MDP are defined as follows:
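$$
s_t = \left(\delta_{s,t},\; v_{rel,t},\; a_{x,t},\; j_{x,t},\; e_{s,t},\; \dot{e}_{s,t},\; e_{\alpha,t},\; \dot{e}_{\alpha,t}\right)
$$
$$
a_t = \left(w_{\delta_s,t},\; w_{v_{rel},t},\; w_{a_x,t},\; w_{j_x,t},\; w_{e_s,t},\; w_{\dot{e}_s,t},\; w_{e_\alpha,t},\; w_{\dot{e}_\alpha,t}\right)
$$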
When the vehicle is driving on a straight road, the longitudinal following control should be given priority, and greater weight should be applied to the weight coefficients related to longitudinal following under the MPC framework. When the vehicle is driving on a curved road, the lateral lane keeping control should be given priority, and greater weight is applied to the weight coefficients related to lane keeping under the MPC framework. Based on the above analysis and combined with trial-and-error experiments, the reward settings for the t time step are as follows:
$$
r_t = -\left(5\,\delta_{s,t}^2 + 5\,v_{rel,t}^2 + 50\,a_{x,t}^2 + 50\,j_{x,t}^2\right) \times 0.001 - \left(50\,e_{s,t}^2 + 250\,e_{\alpha,t}^2 + 50\,\dot{e}_{s,t}^2 + 250\,\dot{e}_{\alpha,t}^2\right) \times 0.001 - 10\,\zeta_1 + 2\,\zeta_2 + \zeta_3
$$
where $\zeta_1$, $\zeta_2$ and $\zeta_3$ are the fine-tuning coefficients. If the simulation ends, $\zeta_1 = 1$, otherwise $\zeta_1 = 0$; if $v_{rel}^2 < 1$, $\zeta_2 = 1$, otherwise $\zeta_2 = 0$; if $e_s^2 < 0.01$, $\zeta_3 = 1$, otherwise $\zeta_3 = 0$.
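The reward can be sketched as a plain function of the performance variables. The sketch assumes that the two quadratic groups and the $\zeta_1$ term enter as penalties (negative sign) and that $\zeta_2$ and $\zeta_3$ enter as bonuses, which is the reading consistent with minimizing the deviations. The function and argument names are hypothetical.

```python
def reward(y, simulation_ended):
    """Reward for one time step.

    y = [delta_s, v_rel, a_x, j_x, e_s, e_s_dot, e_alpha, e_alpha_dot].
    Signs are an assumption: quadratic terms and zeta1 are penalties.
    """
    delta_s, v_rel, a_x, j_x, e_s, e_s_dot, e_a, e_a_dot = y
    longitudinal = (5 * delta_s**2 + 5 * v_rel**2
                    + 50 * a_x**2 + 50 * j_x**2) * 0.001
    lateral = (50 * e_s**2 + 250 * e_a**2
               + 50 * e_s_dot**2 + 250 * e_a_dot**2) * 0.001
    zeta1 = 1 if simulation_ended else 0
    zeta2 = 1 if v_rel**2 < 1 else 0
    zeta3 = 1 if e_s**2 < 0.01 else 0
    return -longitudinal - lateral - 10 * zeta1 + 2 * zeta2 + zeta3


# Perfect tracking without termination collects both bonuses: reward of 3.
r_best = reward([0.0] * 8, simulation_ended=False)
```

The directional terms carry a weight of 250 against 50 for the lateral distance terms, so heading errors are penalized five times as strongly as lateral offsets of the same numerical size.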
For the weight optimization algorithm, the network in DQN consists of an input layer, three hidden layers and an output layer. The hidden layers contain 48, 96 and 48 neurons, respectively. The ReLU activation function is adopted in the output layer. Since the weight coefficients must all be greater than zero, the output of the output-layer activation function is modified so that the minimum value of each weight coefficient is greater than or equal to $10^{-4}$. The DQN algorithm adopts an offline training process: a single agent interacts with the environment and trains the neural network according to the state, reward function and action. Once training is completed, no further learning is required, and the neural network directly outputs the weight coefficients of the $Q$ matrix according to the state.
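A forward pass through a network with the stated layer sizes can be sketched in plain Python. Several details here are assumptions made for illustration: ReLU is also used in the hidden layers, the weights are small random numbers rather than trained values, and the lower bound is implemented as a simple element-wise floor on the ReLU output.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def dense(v, W, b):
    """Affine layer: one output per row of W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def init_layer(n_in, n_out, rng):
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def build_network(sizes=(8, 48, 96, 48, 8), seed=0):
    """Untrained placeholder network: 8 inputs, hidden 48-96-48, 8 outputs."""
    rng = random.Random(seed)
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def weight_outputs(layers, state, floor=1e-4):
    """Map the 8-element state to 8 weight coefficients, each >= floor."""
    h = list(state)
    for W, b in layers:
        h = relu(dense(h, W, b))  # hidden-layer ReLU is an assumption
    return [max(x, floor) for x in h]


weights = weight_outputs(build_network(), [0.5] * 8)
```

The floor keeps every diagonal entry of the weight matrix strictly positive, so the quadratic objective stays well posed even when the ReLU output is exactly zero.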
The maximum number of iteration rounds of the DQN algorithm is set to 12,000, the learning rate to 0.01, the capacity of the replay buffer to 2000, the batch size to 32 and the discount factor to 0.95. The parameters of the target network and the main network are synchronized every 10 iteration rounds. The initial ε of the ε-greedy policy is set to 0.99 and then decays linearly with each iteration round.

5. Simulation Experiment and Analysis

5.1. Experimental Method

In the 2-D car-following process, the control objectives contain safety, following performance, comfort, lane keeping, lateral stability and economy. The strategy proposed in this paper is the target strategy, which is abbreviated as Integ_MPC for the convenience of the following description.
In order to verify the target strategy, a comparison strategy is set up. The comparison strategy adopts two MPC frameworks to achieve longitudinal and lateral control. The longitudinal control adopts the longitudinal following strategy in [19], and the lateral control adopts the lateral lane keeping strategy in [25]. For the convenience of the following description, the comparison strategy is abbreviated as: MPC1_MPC2.
In order to evaluate the target strategy, the following evaluation criteria are set: (1) whether the distance between the vehicles is greater than the minimum safe distance (5 m) is regarded as the judgment standard for vehicle safety; (2) following performance is judged by the root mean square estimation for the error of distance between vehicles and relative speed [31]; (3) the comfort is judged by whether the maximum value of the absolute value of the jerk is less than 3 m/s³; (4) lane keeping is judged by root mean square estimation of the horizontal and vertical coordinate deviation; (5) the lateral stability is judged by the root mean square estimation of the sideslip angle of the centroid, lateral acceleration, turning angle and yaw rate; (6) the economy is judged by the ratio of the change in SOC and the distance.
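The threshold-type criteria (1) and (3) and the economy indicator (6) reduce to simple checks on logged signals. The sketch below assumes that SOC is logged as a fraction between 0 and 1 and that distance is measured in kilometres; the function names are hypothetical.

```python
def is_safe(gaps_m, min_gap_m=5.0):
    """Criterion (1): every inter-vehicle distance exceeds the minimum safe distance."""
    return min(gaps_m) > min_gap_m

def is_comfortable(jerks, max_jerk=3.0):
    """Criterion (3): the largest absolute jerk stays below 3 m/s^3."""
    return max(abs(j) for j in jerks) < max_jerk

def economy_indicator(soc_start, soc_end, distance_km):
    """Criterion (6): SOC consumed per kilometre travelled (unit: 1/km)."""
    return (soc_start - soc_end) / distance_km
```

For example, a hypothetical run that lowers the SOC from 0.80 to 0.79 over 2 km gives an indicator of 0.005 km⁻¹; a smaller value means better economy.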
In order to facilitate the analysis of the following performance and lane keeping, the horizontal and vertical coordinate deviation in the geodetic coordinate system ΔXY and root mean square estimation (RMSE) [31] are introduced as follows:
$$
\Delta XY(i) = \sqrt{\left(X(i) - X_{ref}(i)\right)^2 + \left(Y(i) - Y_{ref}(i)\right)^2}
$$
$$
RMSE_{var} = \sqrt{\frac{1}{n_{tot}} \sum_{i=1}^{n_{tot}} \left(var(i)\right)^2}
$$
$$
n_{tot} = \frac{T}{T_s}
$$
where X and Y denote the actual horizontal and vertical coordinates for the centroid of the vehicle, respectively; Xref and Yref denote the referenced horizontal and vertical coordinates for the centroid of the vehicle; var(i) denotes the variable being evaluated at moment i, which is one of the error for the distance between vehicles δs, the relative speed vrel, the horizontal and vertical coordinate deviation ΔXY, the sideslip angle of the centroid β, the lateral acceleration ay, the turning angle δf and the yaw rate ψ̇; and ntot is the number of calculations.
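The three definitions above translate directly into code. The following sketch assumes that each signal is available as a plain list of samples and that T and Ts are the total duration and the sampling period; the function names are illustrative.

```python
import math

def delta_xy(x, y, x_ref, y_ref):
    """Per-sample Euclidean deviation of the centroid from the reference path."""
    return [math.hypot(xi - xr, yi - yr)
            for xi, yi, xr, yr in zip(x, y, x_ref, y_ref)]

def rmse(values):
    """Root of the mean of the squared samples var(i), i = 1..n_tot."""
    n_tot = len(values)
    return math.sqrt(sum(v * v for v in values) / n_tot)

def n_samples(duration_s, sample_time_s):
    """Number of calculations n_tot = T / T_s."""
    return round(duration_s / sample_time_s)
```

Lane keeping is then scored as `rmse(delta_xy(x, y, x_ref, y_ref))`, while the other indicators apply `rmse` to the logged signal itself.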
In order to verify the target strategy, the following experimental scenarios are set up: (1) tracking a preceding vehicle with a time-varying speed; (2) lane change insertion of the preceding vehicle. The specific settings of the scenarios are shown in Table 2.
The complete lane centerline is shown in Figure 10, which contains a straight road with a length of 200 m and a curve with 18 arcs. The first arc is tangent to the X-axis, and the radii of the arcs are different. The radius setting of the arc comprehensively considers the constraints of the lateral acceleration and longitudinal speed of the vehicle.
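The link between arc radius, longitudinal speed and lateral acceleration can be made explicit with the steady-state circular-motion relation. This is a standard kinematic identity and not necessarily the exact design rule used for the road; the symbol $a_{y,\max}$ is introduced here only for illustration.

$$
a_y = \frac{v_x^2}{R} \quad\Longrightarrow\quad R \ge \frac{v_x^2}{a_{y,\max}}
$$

For instance, a hypothetical longitudinal speed of 20 m/s combined with a lateral acceleration limit of 4 m/s² would require an arc radius of at least 100 m.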
The control strategy in the simulation experiment is constructed in Matlab/Simulink, and the vehicle dynamics and driving scenarios are constructed in CarSim. In addition, the DQN algorithm is implemented in Python with PyTorch, and real-time data transmission is realized through the communication module. The experimental software and hardware environment are shown in Table 3.

5.2. Analysis of Results

5.2.1. Scenario 1

In scenario 1, the vehicle follows a preceding vehicle whose speed varies over time, so the motion state of the vehicle in front changes continuously. The following vehicle approaches from a long distance and then adjusts its own motion to track the desired distance between vehicles and the longitudinal speed of the preceding vehicle. The experimental results of this scenario are shown in Figure 11, Figure 12, Figure 13, Figure 14 and Figure 15.
The resultant acceleration of scenario 1 is shown in Figure 11. The resultant acceleration is the vector sum of the longitudinal acceleration and the lateral acceleration. This paper mainly analyzes the numerical change in the resultant acceleration. In the 2-D car-following control, the lateral motion and the longitudinal motion influence each other. The lateral acceleration is input into the longitudinal control to limit the magnitude of the longitudinal acceleration in real-time so that the magnitude of the resultant acceleration is maintained within a reasonable range. There are two upper limits for the resultant acceleration, which are μg and μg-ε. Among them, μg is the maximum upper limit of the resultant acceleration. In order to ensure a certain adhesion margin, the upper limit is corrected to μg-ε. In the 2-D car-following process, compared with MPC1_MPC2, Integ_MPC has a smaller fluctuation range of the resultant acceleration. In addition, the resultant accelerations of the two strategies are always smaller than the revised upper limit value, thus ensuring that the vehicle can run stably during the car-following process, which is the premise of the performance optimization for the 2-D car-following process.
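The coupling described here can be sketched as a friction-circle style bound. The snippet is a minimal illustration, assuming that the longitudinal acceleration is simply clipped so that the resultant stays at or below μg − ε; the values of μ and ε in the usage note, and the clipping rule itself, are assumptions rather than the constraint formulation of the controller.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def resultant_acceleration(a_x, a_y):
    """Magnitude of the vector sum of longitudinal and lateral acceleration."""
    return math.hypot(a_x, a_y)

def longitudinal_limit(a_y, mu, eps):
    """Largest |a_x| that keeps the resultant at or below mu*g - eps for a given a_y."""
    a_max = mu * G - eps
    return math.sqrt(max(a_max**2 - a_y**2, 0.0))

def clip_longitudinal(a_x_cmd, a_y, mu, eps):
    """Clip a longitudinal acceleration command to the remaining adhesion margin."""
    limit = longitudinal_limit(a_y, mu, eps)
    return max(-limit, min(a_x_cmd, limit))
```

With hypothetical values μ = 0.85 and ε = 0.5 m/s², a lateral acceleration of 3 m/s² leaves room for a longitudinal acceleration of roughly 7.2 m/s²; larger commands are clipped to that value.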
Figure 11. Resultant acceleration in scenario 1.
Figure 12 shows the responses of longitudinal motion in scenario 1, including the distance between vehicles, longitudinal speed, longitudinal acceleration and jerk.
Figure 12a shows the change in distance between vehicles, where Integ_MPC_∆s and MPC1_MPC2_∆s are the distances between vehicles for Integ_MPC and MPC1_MPC2, respectively. Integ_MPC_∆sdes and MPC1_MPC2_∆sdes are the desired distances between vehicles for Integ_MPC and MPC1_MPC2, respectively. In the whole experiment process, the minimum distances between vehicles of both strategies are larger than the minimum safe distance (5 m), so the safety in the two strategies is guaranteed during the 2-D car-following process.
Following performance contains the tracking of the actual distance to the desired distance between vehicles and the tracking of the longitudinal speed of the following vehicle to the longitudinal speed of the vehicle in front. In terms of tracking the desired distance between vehicles, the root mean square estimation for Integ_MPC and MPC1_MPC2 are: 6.63 and 8.27 m, respectively; in terms of tracking the longitudinal speed of the preceding vehicle, the root mean square estimation for Integ_MPC and MPC1_MPC2 are 1.72 and 1.61 m/s, respectively. As shown in Figure 12b, after entering the curve, the fluctuation of longitudinal speed for Integ_MPC is smaller than that of MPC1_MPC2. Integ_MPC has better following performance for the desired distance between vehicles, and MPC1_MPC2 has better following performance for the longitudinal speed of the vehicle in front.
Figure 12c shows the change in longitudinal acceleration. In the early stage of the experiment, the following vehicle accelerates to reduce the error of distance between vehicles in both strategies. Therefore, for the longitudinal acceleration, both strategies have a large acceleration in the early stage of the experiment. When the actual distance approaches the desired distance between vehicles, both strategies adjust the longitudinal speed to approach the longitudinal speed of the preceding vehicle. Figure 12d shows the change in jerk. Both strategies ensure that the maximum absolute value of the jerk during the 2-D car-following process is less than 3 m/s³. Therefore, the comfort of the two strategies in the 2-D car-following process is guaranteed.
Figure 13 shows the vehicle trajectory in scenario 1, and XY_REF is the lane centerline. For Integ_MPC and MPC1_MPC2, the root mean square estimation of the horizontal and vertical coordinate deviation are 0.0437 and 0.0696 m, respectively, so the lane keeping is better in Integ_MPC.
Figure 12. Responses of longitudinal motion in scenario 1.
Figure 13. Vehicle trajectory in scenario 1.
Shown in Figure 14 are the responses related to the lateral stability for scenario 1, including the sideslip angle of the centroid, lateral acceleration, turning angle and yaw rate. Before entering the curve, since the road is straight, the four responses are close to 0. When entering the curve, the four responses change greatly. After entering the curve, since the radius of the curve is not constant, the four responses also fluctuate slightly with the change in the radius of the arc.
The change in sideslip angle for the centroid is shown in Figure 14a. Since the vehicle is driving on a straight road at the beginning of the experiment, the sideslip angle of the centroid is 0 deg. When the driving distance exceeds 200 m, it begins to enter the curve. The longitudinal speed is low, so the sideslip angle of the centroid is positive. With the increase in the longitudinal speed, the vehicle tends to move centrifugally. The sideslip angle of the centroid for the two strategies decreases to a negative value. When the longitudinal speed decreases, the sideslip angle of the centroid gradually increases; then the longitudinal speed repeats the process of increasing and decreasing, but the change range decreases and the sideslip angle of the centroid decreases first and then increases. Finally, as the longitudinal speed approaches a stable value, the sideslip angle of the centroid also changes to be a stable value in the two strategies. During the change in the sideslip angle of the centroid, compared with MPC1_MPC2, the variation range for the sideslip angle of the centroid for Integ_MPC strategy is smaller, and the change tends to be stable rapidly. This is because, in Integ_MPC, the DQN-based weight optimization algorithm can obtain optimal weight coefficients, so as to optimize longitudinal following and lateral lane keeping coordinately. Therefore, the sideslip angle of the centroid can be effectively controlled when entering the curve and after entering the curve.
Figure 14b–d show the changes in lateral acceleration, turning angle and yaw rate, and the three responses for the two strategies change similarly. When entering the curve, the three responses in the two strategies fluctuate to a certain extent and then change with the longitudinal speed in real-time. When the longitudinal speed increases, the three responses of lateral motion also increase, and when the longitudinal speed decreases, the three responses of lateral motion also decrease.
For the Integ_MPC, the root mean square estimation of the sideslip angle of the centroid, lateral acceleration, turning angle and yaw rate are 0.0423 deg, 0.4933 m/s², 0.3108 deg and 1.4099 deg/s. For the MPC1_MPC2, the root mean square estimation of the sideslip angle of the centroid, lateral acceleration, turning angle and yaw rate are 0.0501 deg, 0.5923 m/s², 0.3901 deg and 1.7096 deg/s. Therefore, compared with MPC1_MPC2, Integ_MPC has better lateral stability.
Figure 14. Responses related to lateral motion in scenario 1.
Figure 15 shows the change in battery power and SOC in scenario 1. As shown in Figure 15a, in the initial stage of the experiment, the following vehicles in the two strategies accelerate rapidly to close the distance to the vehicle in front, so the battery power is large. When the distance between the vehicles is close to the desired distance and the longitudinal speed of the following vehicle approaches that of the preceding vehicle, the battery power gradually decreases. In the subsequent 2-D car-following process, the preceding vehicle decelerates first, then accelerates, then decelerates again, and finally its speed tends to be stable. The following vehicle maintains the same trend, so the battery power changes between positive and negative values. The battery is in the energy consumption state when the battery power is positive and in the energy recovery state when the battery power is negative. As shown in Figure 15b, since the vehicle switches between acceleration and deceleration repeatedly, the SOC of the two strategies switches between falling and rising repeatedly. The reduction of SOC corresponds to energy consumption, and the increase in SOC corresponds to energy recovery. Because energy is lost in every conversion, the energy recovered is less than the energy consumed, so the SOC shows a downward trend throughout the experiment. The indicators of economy for Integ_MPC and MPC1_MPC2 are 0.0049 and 0.0054 km⁻¹, respectively, so the economy of Integ_MPC is better.
Figure 15. Battery power and SOC in scenario 1.
To sum up, in scenario 1, the following vehicle tracks a preceding vehicle whose speed varies with time. In this scenario, Integ_MPC integrates longitudinal following control and lateral lane keeping control in one MPC framework. The weight coefficients in the objective function are optimized in real-time through the DQN-based weight optimization algorithm, thereby obtaining the optimal control variable. When the vehicle is driving on a straight road, longitudinal following is a priority; when the vehicle is driving on a curve, lane keeping is a priority. While the following vehicle is passing through the curve, since the longitudinal speed can have an impact on the lateral stability, Integ_MPC improves the lateral stability by reducing the amplitude of the longitudinal speed. Although the following performance for the longitudinal speed is reduced, the impact is small. In the 2-D car-following process, the fluctuation range of longitudinal speed and longitudinal acceleration becomes smaller, thereby reducing the energy consumption on the actuator. From Table 4, the improvement in lane keeping is 37.21%, the average improvement in lateral stability is 17.57% and the improvement in economy is 9.26%. Compared with the comparison strategy, Integ_MPC achieves better lane keeping, lateral stability and economy on the premise of ensuring other performances during the car-following process. Therefore, Integ_MPC can realize the coordinated optimization of longitudinal control and lateral control.
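The improvement percentages can be cross-checked from the indicator values quoted in this section. The sketch below assumes that each improvement is the relative reduction of a lower-is-better indicator with respect to MPC1_MPC2, and that the lateral stability figure is the plain average over the four lateral responses; both are assumptions about how the reported figures were obtained.

```python
def improvement_pct(baseline, proposed):
    """Relative reduction of a lower-is-better indicator, in percent."""
    return (baseline - proposed) / baseline * 100.0

# Scenario 1 values quoted in the text: (MPC1_MPC2, Integ_MPC).
lane_keeping = improvement_pct(0.0696, 0.0437)   # RMSE of coordinate deviation, m
economy = improvement_pct(0.0054, 0.0049)        # SOC change per km

lateral_pairs = [
    (0.0501, 0.0423),   # sideslip angle of the centroid, deg
    (0.5923, 0.4933),   # lateral acceleration, m/s^2
    (0.3901, 0.3108),   # turning angle, deg
    (1.7096, 1.4099),   # yaw rate, deg/s
]
lateral_stability = sum(improvement_pct(b, p) for b, p in lateral_pairs) / len(lateral_pairs)
```

Under these assumptions the lane keeping and economy values round to 37.21% and 9.26%, matching the reported figures, and the averaged lateral stability comes out at about 17.5%, close to the reported 17.57%; the small gap presumably stems from the rounding of the quoted RMSE values.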

5.2.2. Scenario 2

In scenario 2, the vehicle follows a vehicle in front, and then a vehicle ahead in the adjacent lane suddenly cuts into the lane and becomes the new preceding vehicle. The following vehicle re-determines the following object and takes corresponding measures. The experimental results of this scenario are shown in Figure 16, Figure 17, Figure 18, Figure 19 and Figure 20.
Figure 16 shows the resultant acceleration in scenario 2. The resultant acceleration is the vector sum of the longitudinal acceleration and the lateral acceleration; this paper mainly analyzes the numerical change in the resultant acceleration. In the 2-D car-following control, the longitudinal motion and the lateral motion influence each other, and the lateral acceleration is input into the longitudinal control in real-time to limit the magnitude of the longitudinal acceleration so that the magnitude of the resultant acceleration is maintained within a reasonable range. There are two upper limits for the resultant acceleration, which are μg and μg-ε. Among them, μg is the maximum upper limit of the resultant acceleration. In order to ensure a certain adhesion margin, the upper limit is corrected to μg-ε. During the whole experiment, compared with MPC1_MPC2, Integ_MPC has a smaller fluctuation range of the resultant acceleration. The resultant accelerations of the two strategies are always smaller than the revised upper limit value, thus ensuring that the vehicle can run stably during the 2-D car-following process, which is the premise of the performance optimization for the 2-D car-following process.
Figure 16. Resultant acceleration in scenario 2.
As shown in Figure 17, the responses of longitudinal motion in scenario 2 contain the distance between vehicles, longitudinal speed, longitudinal acceleration and jerk.
Figure 17a shows the change in distance between vehicles, where Integ_MPC_∆s and MPC1_MPC2_∆s are the distances between vehicles for Integ_MPC and MPC1_MPC2, respectively. Integ_MPC_∆sdes and MPC1_MPC2_∆sdes are the desired distances between vehicles for Integ_MPC and MPC1_MPC2, respectively. In the whole experiment process, the minimum distances between vehicles of both strategies are larger than the minimum safe distance (5 m), so the safety in the two strategies is guaranteed during the 2-D car-following process.
Following performance contains the tracking of the actual distance to the desired distance between vehicles and the tracking of the longitudinal speed of the following vehicle to the longitudinal speed of the vehicle in front. In terms of tracking the desired distance between vehicles, the root mean square estimation of Integ_MPC and MPC1_MPC2 are 5.86 and 8.19 m, respectively. In terms of tracking the longitudinal speed of the preceding vehicle, the root mean square estimation of Integ_MPC and MPC1_MPC2 are 1.45 and 1.35 m/s, respectively. As shown in Figure 17b, before entering the curve, in order to reduce the error between the actual distance and the desired distance for vehicles, the longitudinal speed of Integ_MPC is greater than that of MPC1_MPC2. When the following vehicle is passing through the curve, compared with MPC1_MPC2, the variation range of longitudinal speed in Integ_MPC is reduced. Integ_MPC performs better in tracking the desired distance between vehicles; MPC1_MPC2 is better in tracking the longitudinal speed of the preceding vehicle.
Figure 17c shows the change in longitudinal acceleration. In the early stage of the experiment, since the longitudinal speed of the following vehicle is greater than that of the vehicle in front, the following vehicle in both strategies decelerates. The acceleration of MPC1_MPC2 fluctuates greatly, while the acceleration of Integ_MPC is smoother. Figure 17d shows the jerk. The maximum absolute value of the jerk in both strategies is always less than 3 m/s³. Compared with MPC1_MPC2, the fluctuation range of Integ_MPC is smaller. The comfort of the two strategies is guaranteed as the jerk fluctuates within a reasonable range.
Figure 17. Responses of longitudinal motion in scenario 2.
Figure 18 shows the vehicle trajectory in scenario 2, and XY_REF is the lane centerline. For Integ_MPC and MPC1_MPC2, the root mean square estimation of the horizontal and vertical coordinate deviation are 0.0469 and 0.0738 m, respectively, so the lane keeping is better in Integ_MPC.
Figure 18. Vehicle trajectory in scenario 2.
Figure 19 shows the responses of lateral stability in scenario 2, including sideslip angle of the centroid, lateral acceleration, turning angle and yaw rate. Before entering the curve, since the road is straight, the four responses are all close to 0. When entering the curve, the four responses change greatly. After entering the curve, since the curve radius is not constant, the response also fluctuates slightly with the change in the radius for the curve. In Figure 19a, the change in the sideslip angle of the centroid is shown. Before entering the curve, since the road is straight, the sideslip angle of the centroid is 0 deg. When the following vehicle is driving on the curve, the sideslip angle of the centroid increases slightly. When the following vehicle completely enters the curve, due to the large longitudinal speed, the vehicle has a centrifugal tendency, and the sideslip angle of the centroid decreases to a negative value rapidly. When the longitudinal speed decreases, the sideslip angle of the centroid gradually increases to a positive value. In the subsequent 2-D car-following process, the longitudinal speed for the following vehicle changes with the vehicle in front, and the sideslip angle of the centroid also changes. The sideslip angle of the centroid decreases to be negative first and then increases to be positive. As the longitudinal speed approaches a stable value, the sideslip angle of the centroid also approaches a stable value. After getting into the curve, the fluctuation in the sideslip angle of the centroid in Integ_MPC is smaller than that of MPC1_MPC2.
Figure 19b–d show the changes in lateral acceleration, turning angle and yaw rate, and the three responses change similarly in the two strategies. When the following vehicle is getting into the curve, the three responses in both strategies fluctuate to a certain extent and then change in real-time with the change in the longitudinal speed. When the longitudinal speed increases, the three responses also increase, and when the longitudinal speed decreases, the three responses also decrease. After entering the curve, the weight coefficients related to lane keeping in Integ_MPC are bigger than those in MPC1_MPC2, so Integ_MPC focuses on improving the performance related to lateral lane keeping. The fluctuation range for responses related to lateral stability in Integ_MPC is smaller, and the lateral stability of Integ_MPC is better. This is because the DQN-based weight optimization algorithm can obtain optimal weight coefficients, so the three responses can be effectively controlled after entering the curve.
For the Integ_MPC, the root mean square estimation of the sideslip angle of the centroid, lateral acceleration, turning angle and yaw rate are 0.0496 deg, 0.5432 m/s², 0.3257 deg and 1.4806 deg/s. For the MPC1_MPC2, the root mean square estimation of the sideslip angle of the centroid, lateral acceleration, turning angle and yaw rate are 0.0577 deg, 0.5695 m/s², 0.4053 deg and 1.7748 deg/s. Therefore, compared with MPC1_MPC2, Integ_MPC has better lateral stability.
Figure 19. Responses related to lateral motion in scenario 2.
Figure 20 shows the changes in the battery power and SOC in scenario 2. From Figure 20a, it can be seen that in the initial stage of the experiment, the longitudinal speed of the following vehicle is greater than that of the vehicle in front. In Integ_MPC, the following vehicle decelerates, so the battery power at this stage is negative, and the battery is in a state of energy recovery. In MPC1_MPC2, by contrast, the following vehicle accelerates and decelerates alternately, so the battery power switches rapidly between positive and negative values, and the battery switches rapidly between energy consumption and energy recovery. In the subsequent 2-D car-following process, the preceding vehicle accelerates first and then decelerates, and the following vehicle also accelerates first and then decelerates. The battery power is positive during the acceleration process and negative during the deceleration process. From Figure 20b, it can be seen that the SOC of the two strategies goes through several rising and falling processes. The rising of SOC corresponds to energy recovery, and the falling of SOC corresponds to energy consumption. At the end of the experiment, the indicators of economy in Integ_MPC and MPC1_MPC2 are 0.0044 and 0.0054 km⁻¹, respectively. Through the above analysis, the economy of Integ_MPC is better.
Figure 20. Battery power and SOC in scenario 2.
To sum up, in scenario 2, the preceding vehicle in the adjacent lane changes lanes and becomes the new preceding vehicle. Integ_MPC optimizes the longitudinal following control and lateral lane keeping control in an MPC framework. The weight coefficients in the objective function are optimized in real-time through the DQN-based weight optimization algorithm, thereby obtaining the optimal control variables. While the vehicle is following on a straight road, longitudinal following is the priority, and while the vehicle is following on a curve, lane keeping is the priority. Since the longitudinal vehicle speed can affect the lateral stability, after entering the curve, Integ_MPC focuses on improving the lateral stability, so the lateral stability is improved by reducing the fluctuation range of the longitudinal vehicle speed. Although the tracking performance for longitudinal speed is weakened, the effect is small. Since the fluctuations of longitudinal speed and longitudinal acceleration in Integ_MPC are reduced when driving on the curved road, it reduces the energy consumption on the actuator. From Table 5, the improvement in lane keeping is 36.45%, the average improvement in lateral stability is 16.66%, and the improvement in economy is 18.52%. Compared with the comparison strategy, Integ_MPC achieves better lane keeping, lateral stability and economy on the premise of ensuring other performances in the 2-D car-following process. Thus, the proposed strategy can realize the coordinated optimization of longitudinal control and lateral control.

6. Conclusions

For the coupling problem of longitudinal control and lateral control of the vehicle, a 2-D car-following control strategy is proposed in this paper. First, a 2-D car-following model is established. Then, the 2-D car-following control strategy is designed, in which the longitudinal following control and lateral lane keeping control are optimized under one MPC framework. In order to adapt to different dynamic processes of lane keeping, the weight coefficients in the MPC framework are optimized in real-time based on the DQN algorithm, and the optimal weight coefficients are obtained through the trained deep neural network. Finally, to verify the effectiveness of the 2-D car-following control strategy, a comparison strategy and two experimental scenarios are set up. During the experiment, the longitudinal speed is input into the lateral lane keeping control in real-time to influence the lateral lane keeping control, and the lateral acceleration is input into the longitudinal following control in real-time to constrain the longitudinal following control. Compared with the comparison strategy, under the premise of ensuring other performances of the 2-D car-following process, the proposed strategy achieves improvements for lane keeping, lateral stability and economy in scenario 1 and scenario 2. For scenario 1, the lane keeping, lateral stability and economy of the proposed strategy are improved by 37.21%, 17.57% and 9.26%, respectively. For scenario 2, the lane keeping, lateral stability and economy of the proposed strategy are improved by 36.45%, 16.66% and 18.52%, respectively.

Author Contributions

Conceptualization, methodology, validation, formal analysis, writing—original draft preparation, S.Z.; Formal analysis, writing—review and editing, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Natural Science Foundation of China under Grant U1713213, Grant U1913202, and Grant U1813205; in part by the Key-Area Research and Development Program of Guangdong Province under Grant 2019B090915001; in part by Shenzhen Technology Project under Grant JCYJ20180507182610734 and Grant JSGG20191129094012321.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yousefi, M.; Hajizadeh, A.; Soltani, M.N.; Hredzak, B. Predictive home energy management system with photovoltaic array, heat pump, and plug-in electric vehicle. IEEE Trans. Ind. Inf. 2021, 17, 430–440.
  2. Badue, C.; Guidolini, R.; Carneiro, R.V.; Azevedo, P.; Cardoso, V.B.; Forechi, A.; Jesus, L.; Berriel, R.; Paixão, T.M.; Mutz, F.; et al. Self-driving cars: A survey. Expert Syst. Appl. 2021, 165, 113816.
  3. Wang, Q.; Wang, Z.; Zhang, L.; Liu, P.; Zhang, Z. A novel consistency evaluation method for series-connected battery systems based on real-world operation data. IEEE Trans. Transp. Electrif. 2021, 7, 437–451.
  4. Diao, K.; Sun, X.; Lei, G.; Bramerdorfer, G.; Guo, Y.; Zhu, J. System-level robust design optimization of a switched reluctance motor drive system considering multiple driving cycles. IEEE Trans. Energy Convers. 2021, 36, 348–357.
  5. Liu, S.; Li, Z.; Ji, H.; Wang, L.; Hou, Z. A novel anti-saturation model-free adaptive control algorithm and its application in the electric vehicle braking energy recovery system. Symmetry 2022, 14, 580.
  6. Pei, W.; Zhang, Q.; Li, Y. Efficiency Optimization Strategy of Permanent Magnet Synchronous Motor for Electric Vehicles Based on Energy Balance. Symmetry 2022, 14, 164.
  7. Wang, Y.; Wang, Z.; Han, K.; Tiwari, P.; Work, D.B. Gaussian process-based personalized adaptive cruise control. IEEE Trans. Intell. Transp. Syst. 2022, 1–12.
  8. Groelke, B.; Earnhardt, C.; Borek, J.; Vermillion, C. A predictive command governor-based adaptive cruise controller with collision avoidance for non-connected vehicle following. IEEE Trans. Intell. Transp. Syst. 2022, 23, 1–11.
  9. Jia, D.; Chen, H.; Zheng, Z.; Watling, D.; Connors, R.; Gao, J.; Li, Y. An enhanced predictive cruise control system design with data-driven traffic prediction. IEEE Trans. Intell. Transp. Syst. 2022, 7, 8170–8183.
  10. Ruan, S.; Ma, Y.; Yang, N.; Xiang, C.; Li, X. Real-time energy-saving control for HEVs in car-following scenario with a double explicit MPC approach. Energy 2022, 247, 123265.
  11. Li, S.; Li, K.; Rajamani, R.; Wang, J. Model Predictive Multi-Objective Vehicular Adaptive Cruise Control. IEEE Trans. Control Syst. Technol. 2011, 19, 556–566.
  12. Lamprecht, A.; Steffen, D.; Nagel, K.; Haecker, J.; Graichen, K. Online model predictive motion cueing with real-time driver prediction. IEEE Trans. Intell. Transp. Syst. 2022, 23, 1–15.
  13. Ly, K.; Mayekar, J.V.; Aguasvivas, S.; Keplinger, C.; Rentschler, M.E.; Correll, N. Electro-hydraulic rolling soft wheel: Design, hybrid dynamic modeling, and model predictive control. IEEE Trans. Rob. 2022, 1–20.
  14. Yeganegi, M.H.; Khadiv, M.; Prete, A.D.; Moosavian, S.A.A.; Righetti, L. Robust walking based on MPC with viability guarantees. IEEE Trans. Rob. 2022, 38, 1–16.
  15. Wu, Z.; Xia, X.; Zhu, B. Model predictive control for improving operational efficiency of overhead cranes. Nonlinear Dyn. 2015, 79, 2639–2657.
  16. Jia, Y.; Jibrin, R.; Görges, D. Energy-optimal adaptive cruise control for electric vehicles based on linear and nonlinear model predictive control. IEEE Trans. Veh. Technol. 2020, 69, 14173–14187.
  17. Zhou, Y.C.; Zhuang, W.C.; Ju, F. Ecological predictive cruise control of connected electric vehicle with predecessor velocity prediction and road grade preview. In Proceedings of the 18th IEEE Vehicle Power and Propulsion Conference, Gijon, Spain, 25 October–14 November 2021; pp. 1–7.
  18. Madhusudhanan, A.K. A method to improve an electric vehicle’s range: Efficient cruise control. Eur. J. Control 2019, 48, 83–96.
  19. Zhang, S.; Zhuan, X.T. Study on adaptive cruise control strategy for battery electric vehicle. Math. Probl. Eng. 2019, 2019, 7971594.
  20. Zhang, S.W.; Luo, Y.G.; Li, K.Q.; Li, V. Real-time energy-efficient control for fully electric vehicles based on an explicit model predictive control method. IEEE Trans. Veh. Technol. 2018, 67, 4693–4701.
  21. Chen, J.; Sun, D.; Zhao, M.; Li, Y.; Liu, Z. A new lane keeping method based on human-simulated intelligent control. IEEE Trans. Intell. Transp. Syst. 2022, 23, 7058–7069.
  22. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing atari with deep reinforcement learning. In Proceedings of the Twenty-Seventh Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; pp. 1–9.
  23. Li, L.; Zhang, Y.B.; Yang, C.; Yang, B.J.; Martinez, M. Model predictive control-based efficient energy recovery control strategy for regenerative braking system of hybrid electric bus. Energy Convers. Manag. 2016, 111, 299–314.
  24. Abdollahi, A.; Han, X.; Avvari, G.; Raghunathan, N.; Balasingam, B.; Pattipati, K.R.; Bar-Shalom, Y. Optimal battery charging, Part I: Minimizing time-to-charge, energy loss, and temperature rise for OCV-resistance battery model. J. Power Sources 2016, 303, 388–398.
  25. Zhang, S.; Zhuan, X.T.; Fang, Y.T.; Cheng, J. Model-predictive optimization for lane keeping assistance system with exponential decay smoothing. In Proceedings of the 2021 IEEE International Conference on Robotics and Biomimetics, Sanya, China, 6–9 December 2021; pp. 1–6.
  26. Dang, R.; He, C.; Zhang, Q. ACC of electric vehicles with coordination control of fuel economy and tracking safety. In Proceedings of the Intelligent Vehicles Symposium, Alcala de Henares, Spain, 3–7 June 2012; pp. 240–245.
  27. Qiu, J.F.; Wu, Q.H.; Ding, G.R.; Xu, Y.H.; Feng, S. A survey of machine learning for big data processing. EURASIP J. Adv. Signal Process. 2016, 67, 1–16. [Google Scholar]
  28. Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement learning in robotics: A survey. Int. J. Rob. Res. 2013, 32, 1238–1274. [Google Scholar] [CrossRef]
  29. Watkins, C.J.C.H. Learning from Delayed Rewards. Ph.D. Thesis, King’s College, London, UK, 1989. [Google Scholar]
  30. Mnih, V. Human-level control through deep reinforcement learning. Nature 2015, 518, 1–13. [Google Scholar] [CrossRef] [PubMed]
  31. Batra, M.; McPhee, J.; Azad, N.L. Anti-jerk model predictive cruise control for connected electric vehicles with changing road conditions. In Proceedings of the 2017 11th Asian Control Conference (ASCC), Gold Coast, Australia, 17–20 December 2017; pp. 49–54. [Google Scholar]
Figure 1. The structure of the target electric vehicle.
Figure 4. Two-dimensional car-following process.
Figure 8. The framework of reinforcement learning.
Figure 9. Framework of the DQN algorithm.
Figure 10. Lane centerline settings.
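Table 1. The parameters of the MPC framework.

| Symbol | Value | Symbol | Value |
|---|---|---|---|
| $T_s$ | 0.05 s | $\rho_j$ | 0.94 |
| $t_h$ | 1.5 s | $M_{veh}$ | 1550 kg |
| $\tau$ | 0.15 s | $l_1$ | 1.1 m |
| $d_0$ | 7 m | $l_2$ | 1.58 m |
| $d_c$ | 5 m | $I_z$ | 2873 kg·m² |
| $v_{x\min}$ | 0 m/s | $C_{\alpha f}$ | 80 kN/rad |
| $v_{x\max}$ | 36 m/s | $C_{\alpha r}$ | 80 kN/rad |
| $a_{x\min}$ | −5.5 m/s² | $u_{2\min}$ | −5 deg |
| $a_{x\max}$ | 2.5 m/s² | $u_{2\max}$ | 5 deg |
| $u_{1\min}$ | −5.5 m/s² | $\rho_{e_s}$ | 0.6 |
| $u_{1\max}$ | 2.5 m/s² | $\rho_{\dot{e}_s}$ | 0.6 |
| $j_{x\min}$ | −3 m/s³ | $\rho_{e_\alpha}$ | 0.6 |
| $j_{x\max}$ | 3 m/s³ | $\rho_{\dot{e}_\alpha}$ | 0.6 |
| $R$ | diag(1, 1) | $p$ | 10 |
| $\rho_\delta$ | 0.94 | $m$ | 5 |
| $\rho_v$ | 0.94 | $T$ | 50 s |
| $\rho_a$ | 0.94 | - | - |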
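As a rough illustration of how the bounds in Table 1 interact, the sketch below clips a requested longitudinal acceleration to the magnitude limits (axmin, axmax) and to the change allowed by the jerk limits (jxmin, jxmax) over one step. It assumes that Ts is the sampling period and that the jerk bound applies between consecutive samples. The helper name `feasible_next_accel` is hypothetical, and the limits presumably enter the MPC optimization as constraints rather than as a clip applied afterwards.

```python
# Values copied from Table 1; the interpretation of each symbol is an assumption.
TS = 0.05                     # s, assumed to be the sampling period Ts
AX_MIN, AX_MAX = -5.5, 2.5    # m/s^2, longitudinal acceleration bounds
JX_MIN, JX_MAX = -3.0, 3.0    # m/s^3, longitudinal jerk bounds


def feasible_next_accel(a_prev: float, a_cmd: float) -> float:
    """Clip a requested acceleration to the jerk and magnitude bounds.

    Hypothetical helper for illustration only; it is not the paper's controller.
    """
    # Range reachable from a_prev in one sample under the jerk limits,
    # intersected with the absolute acceleration limits.
    lower = max(AX_MIN, a_prev + JX_MIN * TS)
    upper = min(AX_MAX, a_prev + JX_MAX * TS)
    return min(max(a_cmd, lower), upper)


# Largest acceleration change allowed in a single sample.
max_step = JX_MAX * TS

if __name__ == "__main__":
    print(max_step)
    print(feasible_next_accel(0.0, 2.5))
```

Under these assumptions the acceleration can change by at most 3 × 0.05 = 0.15 m/s² per sample, so a request to jump from 0 to 2.5 m/s² is limited by the jerk bound, not by the acceleration bound.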
Table 2. Experiment scenario settings.

| Scenario | ini_s (m) | ini_vx (m/s) | ini_vf (m/s) | amp_ax (m/s²) | μ |
|---|---|---|---|---|---|
| Scenario 1 | 50 | 20 | 25 | 2 | 0.45 |
| Scenario 2 | 70 | 25 | 20 | 2 | 0.45 |

ini_s is the initial distance between the vehicles, ini_vx is the initial longitudinal speed of the following vehicle, ini_vf is the initial longitudinal speed of the preceding vehicle, amp_ax is the amplitude of the longitudinal acceleration of the preceding vehicle, and μ is the ground adhesion coefficient.
Table 3. Software and hardware for the experiment.

| Name | Property |
|---|---|
| Matlab | 2018a |
| CarSim | 2016.1 |
| CPU | Intel Core i7-4790 (3.60 GHz) |
| GPU | NVIDIA TITAN V |
| Memory | 32.00 GB (3200 MHz) |
| Operating system | Windows 10 (64-bit) |
| Python | 3.8.8 |
| PyTorch | 1.7.1 |
| CUDA | 10.1 |
Table 4. Comparison of three performance indicators of the two strategies in scenario 1.

| Objective | Indicator | Integ_MPC | MPC1_MPC2 | Improvement |
|---|---|---|---|---|
| Lane keeping | $\mathrm{RMSE}_{\Delta XY}$ | 0.0437 m | 0.0696 m | 37.21% |
| Lateral stability | $\mathrm{RMSE}_{\beta}$ | 0.0423 deg | 0.0501 deg | 15.57% |
| Lateral stability | $\mathrm{RMSE}_{a_y}$ | 0.4933 m/s² | 0.5933 m/s² | 16.85% |
| Lateral stability | $\mathrm{RMSE}_{\delta_f}$ | 0.3108 deg | 0.3901 deg | 20.33% |
| Lateral stability | $\mathrm{RMSE}_{\dot{\psi}}$ | 1.4099 deg/s | 1.7096 deg/s | 17.53% |
| Economy | ΔSOC/s | 0.0049 km⁻¹ | 0.0054 km⁻¹ | 9.26% |

Average improvement over the four lateral stability indicators: 17.57%
Table 5. Comparison of three performance indicators of the two strategies in scenario 2.

| Objective | Indicator | Integ_MPC | MPC1_MPC2 | Improvement |
|---|---|---|---|---|
| Lane keeping | $\mathrm{RMSE}_{\Delta XY}$ | 0.0469 m | 0.0738 m | 36.45% |
| Lateral stability | $\mathrm{RMSE}_{\beta}$ | 0.0496 deg | 0.0577 deg | 14.04% |
| Lateral stability | $\mathrm{RMSE}_{a_y}$ | 0.5432 m/s² | 0.6495 m/s² | 16.37% |
| Lateral stability | $\mathrm{RMSE}_{\delta_f}$ | 0.3257 deg | 0.4053 deg | 19.64% |
| Lateral stability | $\mathrm{RMSE}_{\dot{\psi}}$ | 1.4806 deg/s | 1.7748 deg/s | 16.58% |
| Economy | ΔSOC/s | 0.0044 km⁻¹ | 0.0054 km⁻¹ | 18.52% |

Average improvement over the four lateral stability indicators: 16.66%
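The improvement figures in Tables 4 and 5 appear to be relative reductions with respect to the comparison strategy MPC1_MPC2, with the lateral stability figure taken as the mean over its four indicators. The short check below recomputes them from the tabulated values. The `improvement` helper and the ASCII key names are hypothetical and chosen only for this sketch.

```python
def improvement(baseline: float, proposed: float) -> float:
    """Relative reduction of an indicator with respect to the baseline, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# (Integ_MPC, MPC1_MPC2) value pairs copied from Table 4 (scenario 1).
scenario1 = {
    "rmse_dxy": (0.0437, 0.0696),
    "rmse_beta": (0.0423, 0.0501),
    "rmse_ay": (0.4933, 0.5933),
    "rmse_delta_f": (0.3108, 0.3901),
    "rmse_yaw_rate": (1.4099, 1.7096),
    "dsoc_per_s": (0.0049, 0.0054),
}

# (Integ_MPC, MPC1_MPC2) value pairs copied from Table 5 (scenario 2).
scenario2 = {
    "rmse_dxy": (0.0469, 0.0738),
    "rmse_beta": (0.0496, 0.0577),
    "rmse_ay": (0.5432, 0.6495),
    "rmse_delta_f": (0.3257, 0.4053),
    "rmse_yaw_rate": (1.4806, 1.7748),
    "dsoc_per_s": (0.0044, 0.0054),
}

# Indicators grouped under "lateral stability" in the tables.
LATERAL = ("rmse_beta", "rmse_ay", "rmse_delta_f", "rmse_yaw_rate")


def summarize(table: dict) -> dict:
    """Per-indicator improvements plus the lateral stability average."""
    gains = {name: improvement(base, prop) for name, (prop, base) in table.items()}
    gains["lateral_avg"] = sum(gains[name] for name in LATERAL) / len(LATERAL)
    return gains


gains1 = summarize(scenario1)
gains2 = summarize(scenario2)

if __name__ == "__main__":
    for label, gains in (("scenario 1", gains1), ("scenario 2", gains2)):
        print(label, {name: round(value, 2) for name, value in gains.items()})
```

Rounded to two decimals, the recomputed values agree with the improvement rows and the averages reported in both tables.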
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Zhang, S.; Zhuan, X. Two-Dimensional Car-Following Control Strategy for Electric Vehicle Based on MPC and DQN. Symmetry 2022, 14, 1718. https://doi.org/10.3390/sym14081718

AMA Style

Zhang S, Zhuan X. Two-Dimensional Car-Following Control Strategy for Electric Vehicle Based on MPC and DQN. Symmetry. 2022; 14(8):1718. https://doi.org/10.3390/sym14081718

Chicago/Turabian Style

Zhang, Sheng, and Xiangtao Zhuan. 2022. "Two-Dimensional Car-Following Control Strategy for Electric Vehicle Based on MPC and DQN" Symmetry 14, no. 8: 1718. https://doi.org/10.3390/sym14081718
