Variable Speed Limit Intelligent Decision-Making Control Strategy Based on Deep Reinforcement Learning under Emergencies

Yang, Jingwen; Wang, Ping; Ju, Yongfeng

doi:10.3390/su16030965

Open AccessArticle

Variable Speed Limit Intelligent Decision-Making Control Strategy Based on Deep Reinforcement Learning under Emergencies

by

Jingwen Yang

¹

,

Ping Wang

^2,* and

Yongfeng Ju

¹

School of Electronic and Control Engineering, Chang’an University, Xi’an 710054, China

²

School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 510006, China

^*

Author to whom correspondence should be addressed.

Sustainability 2024, 16(3), 965; https://doi.org/10.3390/su16030965

Submission received: 26 December 2023 / Revised: 16 January 2024 / Accepted: 17 January 2024 / Published: 23 January 2024

(This article belongs to the Special Issue Traffic Safety and Transportation Planning)

Download

Browse Figures

Versions Notes

Abstract

Uncertain emergency events are inevitable and occur unpredictably on the highway. Emergencies with lane capacity drops cause local congestion and can even cause a second accident if the response is not timely. To address this problem, a self-triggered variable speed limit (VSL) intelligent decision-making control strategy based on the improved deep deterministic policy gradient (DDPG) algorithm is proposed, which can eliminate or alleviate congestion in a timely manner. The action noise parameter is introduced to improve exploration efficiency and stability in the early stage of the algorithm training and then maximizes differential traffic flow as the control objective, taking the real-time traffic state as the input. The reward function is constructed to explore the values of the speed limit. The results show that in terms of safety, under different traffic flow levels, the proposed strategy has improved by over 28.30% compared to other methods. In terms of efficiency, except for being inferior to the no-control condition during low-traffic-flow conditions, our strategy has improved over 7.21% compared to the others. The proposed strategy greatly benefits traffic sustainability in Intelligent Transport Systems (ITSs).

Keywords:

variable speed limit; deep deterministic policy gradient (DDPG); deep reinforcement learning (DRL); emergency

1. Introduction

Highway emergencies are uncertain [1,2] and occur accompanied by a series of unpredictable traffic phenomena. For example, a sudden lane drop will cause a large number of vehicles to quickly gather in the upstream section of the emergency section, which may cause a series of negative impacts and chain effects, such as queuing and even secondary accidents, resulting in heavier casualties and more property losses [3,4,5]. Therefore, it is necessary to pay attention to the emergency section. Timely traffic management and control are very necessary.

The variable speed limit (VSL) is an effective control measure to improve road safety in emergency environments [6] and is widely and successfully applied to highway systems in the US, the EU, and Australia [7,8]. VSLs can reduce the probability of traffic blockage by balancing vehicle speeds. The principle is to constrain the inflow by adjusting the values of the VSL in time, optimizing the traffic flow operating status to avoid stop-and-go traffic, waves, and other unstable states [9]. VSL control can adjust traffic states based on real-time traffic information, including traffic speed, flow, and so on, especially for emergency events or severe conditions and other dynamic states. The dynamic adjustment of appropriate speed based on the current conditions has advantages in efficiency and safety. Some studies [10,11] show that the implementation of the VSL method can reduce vehicle collisions and improve road safety. What is more, it also makes important contributions to traffic efficiency and environmental benefits [12,13].

The VSL control methods can be roughly divided into four types: rule-based, feedback-based, optimal-control-based, and reinforcement learning methods [14]. Rule-based VSL control methods are implemented by formulating logical rules [15,16]. As mentioned before, the value of the variable speed limit is determined based on preselected thresholds, including traffic flow, density, average speed, and so on, which can coordinate speed differences and stabilize traffic flow. Rule-based methods are simple and easy to implement compared to other methods and are already in most traffic management and control systems. The successful application of VSL control in traffic systems also illustrates its effectiveness in coordinating traffic and improving traffic safety and efficiency. The main idea of the feedback-based VSL controller [17,18] is to calculate a VSL value using the current and past traffic conditions, which usually requires less computing time than the optimal-control-based methods. However, the performance of feedback-based VSL control relies heavily on the accurate measurement of traffic conditions, such as traffic flow and density. Therefore, small perturbations in the measured density may lead to suboptimal performance of the closed-loop system. The VSL based on optimal control methods is typically implemented within a model predictive control framework [19,20,21,22]. At each time step, the VSL control command is calculated by solving an optimization problem whose objective function involves a performance metric, such as total travel time, safety, emissions, and fuel consumption. In addition, its control effect depends on the accuracy of the model. The VSL decision-making control strategy based on reinforcement learning is based on real-time data fed back by the environment [12,23,24], and it automatically senses the environmental status and conducts interactive training with the external environment through a continuous trial and error mechanism, thereby learning the optimal control decision.

The implementation of traditional VSL control relies on transportation infrastructure variable message signs (VMSs) to transmit control values. The VMSs on the highway are fixed and placed at discrete locations, so it is very difficult to react to a dynamic traffic environment. Considering the emergence of emergencies is uncertain and random, the locations of variable information signs are very important to the implementation effect of the strategy in the process of the implementation of the VSL control strategies. Therefore, variable information signs should be placed widely and densely enough to effectively respond to the dynamic emergency environment. However, the construction of a continuous variable information sign system requires expensive and cumbersome accessories to configure, such as gantries and other accessories. With the application and development of the Cooperative Vehicle Infrastructure System (CVIS), the comprehensive use of a variety of sensors provides new technical ways to collect traffic data [25,26]. Connected and Autonomous Vehicles (CAVs) with autonomous driving and network communication capabilities not only have a smaller expected headway but also are highly obedient and responsive in a timely manner to the control commands issued by the control center [27]. It solves the problem that the control strategy based on the basic traffic infrastructure has poor flexibility and slow control actions. It is difficult to measure the actual data, driver compliance, and negative impact of reducing driver uncertainty in traffic control. Therefore, CAVs with mobility and perception capabilities bring great potential to traffic management and control and also provide a new method of detecting traffic information. Han et al. [28] first used CAVs in the process of implementing VSL control at fixed bottlenecks in 2017, breaking the traffic management and control system relying on infrastructure.

As mentioned above, considering the practical engineering application, the traditional VSL method is easy to implement, but the performance effect of the control strategy depends to a large extent on the accuracy of the established traffic flow model. The parameters of the traffic flow model are related to the many factors considered [29]. For example, the traffic flow models for adjacent road sections are not the same due to different linear shapes. Most traffic models are too idealized to accurately distinguish and describe the traffic flows in different locations or on different road surfaces. Therefore, the establishment of the model is not detailed and accurate, resulting in the performance of the model-based VSL control strategy being unsatisfactory.

Deep reinforcement learning (DRL) is a self-learning method of continuous interaction with the environment to complete the decision-making process [30]. Continuous interaction can learn the characteristic values of the environment, and the decision-making is suitable for the environment. In recent years, some studies have introduced reinforcement learning algorithms into VSL control [12,23,24], and they have confirmed that the VSL methods based on DRL are better than the traditional method. However, most of the research on VSL methods based on DRL focuses on the balanced control of traffic flow, and there is a lack of research on management and control after emergencies.

The purpose of this work is to design an intelligent traffic management and control strategy for the VSL with DRL under emergencies. The algorithm can learn an adaptive controller that can adapt to the changing environment in a short time. The self-triggered intelligent decision-making framework solves traffic evacuation in emergencies for traffic sustainability development. The contributions of this paper are as follows:

1.: The method in this paper is model-free, which can avoid obtaining an accurate traffic flow model;
2.: The self-triggering type of traffic management control method can take timely measures after the occurrence time;
3.: The performance of the proposed method has an advantage regarding efficiency and safety compared with other methods.

The structure of this research consists of six sections. Section 1 provides an overview of the topic and the background for the study. Section 2 proposes the problem of research. Section 3 describes the Markov decision process of VSL control. Section 4 explains the VSL control problem based on improving a deep deterministic policy gradient (DDPG) under emergency. Section 5 discusses the study’s results. Finally, Section 6, the Conclusion, summarizes the research findings and provides insights into future work.

2. Problem Proposed

This section presents the basic model of traffic flow and then explains the principle of the VSL to solve emergencies based on the traffic flow model. Note that the explanation of the symbols mentioned as follows is shown in Table A1.

2.1. Traffic Flow Model

The basic parameters of the traffic flow model including traffic flow, speed and density can explain many non-equilibrium phenomena such as traffic bottlenecks, traffic shock waves, stop-and-go traffic and so on. The cell transmission model (CTM) is widely used to reveal the foundation features of traffic flow and is a traffic flow model that has the characteristics of a microscopic model [31]; What is more, the CTM is a macro-model with mathematical analytical properties. Therefore, the CTM is widely used in dynamic traffic allocation to explain a more realistic traffic flow distribution. The segment can be divided into several sub-sections (cells) based on the CTM, and then based on the principles of traffic flow conservation and waves, the transfer relationship between sub-sections is established, which can dynamically describe the process of evolution in a spatiotemporal manner. Each cell represents the sub-segment, reflects the characteristics at multiple time and space scales, and at the same time more accurately captures traffic waves and waiting vehicles.

As shown in Figure 1, the road segment can be divided into several cells. The state of each sub-segment (cell) can be represented as Equation (1).

x_{i} (t) = [v_{i} (t), k_{i} (t), q_{i} (t), L_{i}],

(1)

where

x_{i} (t)

represents the state of cell i at time t,

v_{i} (t)

represents the average speed (km/h) of cell i at time t,

k_{i} (t)

represents the density (veh/km) of cell i at time t,

q_{i} (t)

represents the traffic flow (veh/h) of the cell i, and

L_{i}

represents the length (km) of cell i.

2.2. Principle of Variable Speed Limit

Based on the above basic traffic flow parameters, the relationship between traffic flow and density of each sub-section can be expressed in Figure 2. The relationship between flow and density under a congested environment is complex, and there is an unstable traffic state between free flow and congested flow, which is called a metastable region [32]. The traffic state in metastable regions will increase the possibility of emergency. The obvious feature is that the road capacity of the sub-segment decreases suddenly after the occurrence of emergencies and other similar events. At the same time, the traffic capacity of the road section will be changed accompanied by the traffic conditions such as congestion density and critical density, as shown in Figure 2.

The VSL is an effective way to deal with reduced capacity. The principle is that even very small flows from on-ramps or mainline flow can create shock waves and traffic blocks when traffic is in metastable regions. The traffic will be interrupted. The VSL attempts to slow down the upstream traffic speed and limit a part of traffic flow into bottlenecks. Therefore, the speed limit of the sub-segment can change the shape of the fundamental graph, which makes the state from state A to somewhere between states B and C, as shown in Figure 3. The VSL can limit the inflow and recover the capacity drop by the limit flow equal to the capacity after the lane drop. The inflow can be restricted, and the vehicle travel time increases due to the speed limit control causing the average speed of the traffic flow to decrease. If not analyzed from a safety perspective, speed limit control plays a negative role in the operation of traffic flow. Therefore, it will only be more suitable in some emergency scenarios, such as alleviating downstream congestion and avoiding worsening downstream congestion.

3. Markov Decision Process of VSL Control

In this section, the VSL control problem under an emergency environment is described as the Markov decision process (MDP) and is formulated to address using deep reinforcement learning.

3.1. Markov Decision Process

The MDP is a classic decision-making process, and then we apply a reinforcement learning architecture to solve this problem. The agent will perceive the current system state and implement actions on the environment according to the strategy, thereby changing the state of the environment and receiving rewards. The accumulation of rewards over time is called reward. The MDP is described by its five components: (1) state space S; (2) action space A; (3) reward function R; (4) state transition matrix P; and (5) discount rate

γ

.

MDP describes the process of interaction between an agent and the environment: the agent takes action

a_{t}

in the current state

s_{t}

and then transfers the state to

s_{t + 1}

, and at the same time, it returns the reward function

r_{t}

, as shown in the Figure 4. The strategy is a mapping from the state space S to the action space A, that is,

π : S \to A

. The purpose of DRL is to find the optimal solution that can maximize the long-term cumulative reward strategy

π

.

The VSL control problem is abstracted as an MDP under emergency conditions. It is essential to appropriately define the five elements inherent to the MDP framework. Therefore, the follow-up of this section focuses on the formulation of the five elements of the MDP in the process of VSL control.

3.2. MDP of VSL Control

Traditional VSL control dynamically posts speed values through VMSs, relying too much on road infrastructure, resulting in poor flexibility in the implementation of control strategies and slow control actions. With the development of roadside and vehicle-mounted equipment, sensor technology and vehicle–road wireless communication networks, the cooperative vehicle infrastructure system enhances traffic information management and service capabilities. In this environment, vehicles and road unit systems and their remote data centers can establish effective communication to share relevant status information and traffic control strategies as well as accurately transmit the speed limit value to each vehicle, which solves the problem of poor flexibility and slow control actions of control strategies based on road infrastructure. At the same time, the proposed method solves the shortcomings of high-accuracy dependence on driver compliance and traffic flow state prediction models and reduces the negative impact of driver uncertainty on traffic control.

In this paper, we focus on using VSL to relieve queuing vehicles in emergencies and avoid vehicle congestion. The purpose is to explore VSL solutions in the CVIS environment and maximize the transportation network throughput. In the cooperative vehicle infrastructure system environment, it is not difficult to implement variable speed limit control actions by sending speed limit commands to vehicles in the corresponding area. For example, existing driver assistance systems and cruise control systems can be used to force vehicles to follow the received speed value. Figure 5 shows the VSL control process in the CVIS environment. The road section upstream of the emergency point is divided into a status monitoring area and a speed limit control area. The status monitoring area contains

N_{C}

cells, the VSL implementation area contains

N I

(

N_{I} \leq N_{C}

) cells, and the state of each cell is a set of density and speed. The roadside unit will input the information of the status monitoring area obtained from the road to the algorithm unit once an emergency occurs. The algorithm unit will calculate the current speed limit control value based on the current status information and then send the result to the roadside unit. The roadside unit system sends the VSL values to each vehicle in the policy implementation area. In addition, the reinforcement learning algorithm with the actor–critic architecture is used to solve the emergency condition.

The actor is used to output a VSL control strategy, and the critic is used to evaluate the actor’s strategy. The reward function can quantify the efficiency, safety and emission reduction capabilities of the transportation network.

In this section, the VSL control process is formulated as a Markov decision process. Agent, state, action, transition probability, and reward are defined as follows:

Agent: The VSL controller is regarded as an agent. The agent can output different speed limit values for different cells (sub-sections) upstream of the emergency point. The goal of the agent is to control roads once emergencies occur, divert vehicles on accident sections, avoid vehicle congestion, and improve road capacity.
State space: The state space reflects the road state of traffic flow in the real-time traffic environment. Based on the simulation platform, real-time information on the road can be obtained. This paper studies traffic flow control methods under emergencies; the upstream traffic state of the upstream section emergency point has a greater impact compared with the upstream section. Therefore, this paper pays attention to the traffic state of the upstream emergency point. Determine the state detection area of the upstream section of the emergency point and divide the area into $N_{s}$ number of cells. In this paper, the state space S at time t of the VSL controller can be expressed as:

$\begin{matrix} s_{t} = \{\begin{matrix} X_{1} (t) \\ X_{1} (t) \\ X_{3} (t) \\ . . . \\ X_{N_{s}} (t) \end{matrix}\} \end{matrix}$

(2)
Action space: The number of action spaces is related to the number of controlled cells. In the scenario of this paper, the number of action spaces is generally less than or equal to the number of state spaces. Considering the real world for implementation and driver compliance issues, the element values of the action space are set to discrete values, and the following formula represents the state expression of the action space.

$\begin{matrix} a_{t} = \{\begin{matrix} a_{1} (t) \\ a_{2} (t) \\ a_{3} (t) \\ . . . \\ a_{N_{c}} (t) \end{matrix}\} \end{matrix}$

(3)
Transition probability: The training of the agent is carried out on the open-source traffic simulation platform: Simulation of Urban MObility (SUMO). SUMO provides flexible interfaces for network design, traffic sensors and traffic control solutions. The state transition matrix in this article has been defined in the SUMO platform by default.
Reward value: Reinforcement learning is learning to select actions based on maximizing a given reward signal. The key issue with this approach is to ensure that the agent receives rewards that promote good system-level behavior. The reward function of the variable limit control problem can be defined from the optimization goal. The VSL was first proposed to reduce traffic conflicts and enhance the consistency of traffic flow speeds, which can improve road safety. Therefore, safety factors can be ignored when formulating a VSL strategy. The optimization goal of the VSL method can be total travel time, low collision probability, minimum vehicle emissions [33], etc. The difference in traffic flow upstream and downstream of the emergency point is used as the reward function. For the traffic road network, the closer the traffic flow downstream of the emergency point is to the traffic flow upstream, the less impact the accident point has on the traffic capacity of the road. The reward function can be expressed as:

$r_{t} = f_{o u t} (t) - f_{i n} (t),$

(4)

4. VSL Control Problem Based on Improving DDPG under Emergency

The self-triggered VSL intelligent decision-making control strategy architecture proposed in this paper based on improved DDPG is shown in Figure 6. The framework mainly includes three parts: the environment module, the trigger module and the intelligent decision-making control module.

The environment module is mainly used to receive the VSL value of each road section provided by the intelligent decision-making control module and provide real-time traffic status information for the trigger module and intelligent decision-making control module. The environment module mainly comprises the road network model and vehicle model.

The trigger module is based on the accident flag to determine whether to start the intelligent decision-making control module. The state of the emergency flag can be monitored by the video monitoring system to trigger the intelligent decision-making control module, or it can be manually set by road monitoring people. It is asleep when the accident flag is

F a l s e

, indicating that everything on the road is normal. When the accident flag is in an activated state, indicating that an emergency has occurred on the road, the intelligent decision-making control module will be triggered, and the triggering module will send the location of the emergency to the intelligent decision-making control module. The status of the trigger module determines whether the decision-making control module works. In this paper, the control strategy will be implemented only when an emergency occurs. Under normal road conditions, the intelligent decision-making control module will not be started, which can avoid excessive control of roads and save computing resources and energy efficiency.

The intelligent decision-making module is as follows: the intelligent decision-making module will receive the emergency location sent by the trigger module once an emergency occurs on the road and then collect the traffic information of the emergency point upstream from the environment module. The intelligent decision-making module will output values of the VSL of the road based on the current state and implement the control section, that is, the calculated VSL value of each sub-section, to guide the vehicles in the corresponding sub-section until the next control domain. Vehicles on the road are monitored at the same time. When the vehicle position is out of range in the detecting area of the emergency event, the control of the vehicle will be automatically released. This module is based on the DDPG algorithm and adds actions in the early stage of the algorithm update. The noise parameter improves the algorithm exploration efficiency and stability, avoids local optimally, and combines the continuous action values of the output to make the road smooth. In terms of algorithm implementation, the DDPG algorithm in this paper will use four neural networks, of which the actor and critic each use the same structure neutral network. In addition, they each have a target network. In the DDPG algorithm, the actor also needs the target network to calculate the target value, and the target network update adopts a soft update method that lets the target network update slowly, gradually approaching the value network, and finally obtaining a trained algorithm model. The actor output can be used to value the VSL.

Neural Network Design and Algorithm Update

The DDPG algorithm used in this paper belongs to the actor–critic architecture and includes a policy network and a value network. The policy network is a deterministic policy network, which means that the policy network makes decisions based on the current state s to obtain a deterministic action a. The value network can evaluate this action a based on state s, which can promote the policy network to make improvements.

The deterministic policy network can be expressed as

π (s; θ)

, where

θ

is the policy network parameter. The value network can be expressed as

q (s, a, ω)

, where

ω

is the value network parameter, and its output is a real number Q, which is used to evaluate the policy of the action based on the current state.

The time difference (TD) algorithm is used to update the value network based on a set of data

(s_{t}, a_{t}, r_{t}, s_{t + 1})

. Therefore, based on these data, the value network can be used to predict the action at time t based on

π (s_{t}; θ)

. The value

q_{t}

is calculated based on Formula (5).

q_{t} = q (s_{t}, a_{t}; ω),

(5)

In the same way, the value network can also predict the value

q_{t + 1}

of the action at time

t + 1

:

q_{t + 1} = q (s_{t + 1}, a_{t + 1}^{'}; ω),

(6)

where

a_{t + 1}^{'} = π (s_{t + 1}; θ)

.

Therefore,

δ_{t}

can be obtained from the following formula:

δ_{t} = q_{t} - (r_{t} + γ * q_{t + 1}),

(7)

In order for

δ_{t}

to be as small as possible, gradient descent is used to update the value network

ω

;

ω \leftarrow ω - α * δ * \frac{\partial q (s_{t}, a t; ω)}{\partial ω},

(8)

where

α

is the learning rate of the value network.

The policy network is updated using the deterministic policy gradient update method. The goal of training the value network is that the value of

q (s, a; ω)

is as large as possible where

a = π (s; θ)

. The deterministic policy network can be obtained by the following formula, which is a real number:

g = \frac{\partial q (s, π (s; θ); ω)}{\partial θ} = \frac{\partial a}{θ} * \frac{\partial q (s, a; ω)}{a},

(9)

The author hopes that the larger the value Formula (9), the better, so the gradient ascent method is used to update the policy network:

θ \leftarrow θ + β * g,

(10)

The core challenge of reinforcement learning is how to balance exploration with actively searching for actions that may yield high returns and bring long-term benefits. Without sufficient exploration, the agent may not be able to discover effective VSL strategies. Therefore, this paper focuses on the DDPG algorithm. The output part introduces the action noise parameter ℵ, which is shown below:

a_{t} = π (s_{t}; θ) + ℵ,

(11)

In the early stage of the algorithm update, possible action values can be explored as randomly as possible. As the rounds of algorithm update increase, ℵ becomes smaller and smaller, and the impact on the output action becomes smaller and smaller. Therefore, the output of the policy network adding noise to the action to disturb and increase exploration can also prevent local optimally in the early stage of the algorithm.

This architecture reasons about actions in a continuous space and then simply uses integer transformations to obtain continuous action values to output discrete actions. The actual speed limit value can be obtained by following the formula:

u (t) = V_{N} + I * a_{t},

(12)

In the formula,

V_{N}

and I are constants,

a_{t}

is obtained from the network, and

u_{t}

is the output quantity (control quantity).

In addition, as an offline strategy reinforcement learning algorithm, one of the advantages of DDPG is that its exploration can be independent of the learning algorithm, and the trained model can be directly used in the corresponding scenario, which has high engineering practical significance. The specific process of the algorithm is shown in Algorithm 1.

Algorithm 1 Self-triggering VSL based on improved DDPG algorithm.

5. Simulation Experiment and Analysis

The purpose of this section is to evaluate the implementation and effectiveness of deep reinforcement learning in VSL control strategies. To verify the advantages and disadvantages of the proposed method, this paper builds a joint simulation platform based on SUMO and PYTHON, respectively, using the DDPG-based algorithm (DDPG-VSL), improved DDPG algorithm (I-DDPG-VSL), no control (baseline) and traditional rule-based (rule-based) methods for VSL in different road environments to simulation and analysis.

The simulation platform in this paper uses the open-source software Simulation of Urban Mobility (SUMO 1.19.0). The software is very flexible and supports the use of its traffic control interface (TRACI) to set speed limits for vehicles. Moreover, SUMO can also define various vehicle models and vehicle-following and lane-changing parameters for the vehicle. PYTHON is selected for algorithm implementation, which can receive the traffic state information from the simulation platform and output the corresponding control strategy. The implementation architecture of its intelligent decision-making control strategy is shown in Figure 7. In terms of the construction of simulation scenarios, the selected area is exported from OpenStreetMap.org. Next, the traffic network for simulation is built based on this map. The netconvert command is used to convert the OSM file to the NET file.

The Hong Kong–Zhuhai–Macao Bridge highway from Zhuhai to Hong Kong is selected as the research scenario, as shown in Figure 8. The first reason is that the Hong Kong–Zhuhai–Macao Bridge is a typical long section without on-ramps and off-ramps. The VSL control strategy can only be used once the emergency occurs. The second reason is that the Hong Kong–Zhuhai–Macao Bridge, is a world-class cross-sea channel with national strategic significance, affecting the operation of the highway network and comprehensive transportation. Guangdong, Hong Kong, and Macao all have responsibilities [34], and the study of their roads has important engineering practical significance.

5.1. Simulation Model Parameters

To reduce the complexity of traffic flow as much as possible, four types of vehicles are selected, and the parameters are shown in Table 1. To simulate the road traffic flow as much as possible, the experiment covered traffic flow in three levels: flow, medium and high. To simulate dynamic traffic flow, Figure 9, Figure 10 and Figure 11 respectively show the traffic flow generated in different environments randomly selected during the simulation process.

5.2. Evaluation Metrics

Average travel time (ATT): ATT refers to the typical amount of time it takes for each vehicle to travel in a proposed scenario.
Potential collision number (PCN): Time-to-collision (TTC) defines the space gap divided by the difference speed of the lead vehicle and the following vehicle. If the value of TTC is greater than 3 s, the number of potential collisions will be increased by 1.

5.3. Simulation Results and Analysis

5.3.1. Parameter Setting and Training

When the traffic flow on the road is running stably, an emergency event is set to activate the trigger module to start the VSL intelligent decision-making module. Then, the emergency vehicle is removed after 3 min. After the traffic flow is detected to return to the normal state, the state of the trigger model is changed from activate to sleep. The traffic management and control strategies will stop output values of VSL, and the one-episode training process will be completed. Table 2 lists some necessary parameters for the DDPG algorithm.

The reward value obtained by the agent in each training episode can reflect the quality of its training effect. The higher the reward value, the better the training effect. The reward value obtained by the strategy based on the DDPG algorithm during the training process changes with the number of episodes, as shown in Figure 12. In addition, Figure 12 also shows the comparison of reward values under basic rule control and no control. As can be seen from Figure 12, before about 30 rounds, the reward value output based on the DDPG algorithm locally converges. This is because the memory bank parameter is set to 512 in the algorithm. When the memory pool space is filled, only then will the stored content in the memory pool be sampled and trained. However, only some conservative states are stored in this process based on the DDPG algorithm, and the action selection is relatively simple, so in the process of sampling and learning the states in the memory pool, the update of the reward value is relatively gentle and easy to converge locally. In the process of the management and control strategy based on the improved DDPG, the noisy parameter is introduced and tried to explore as many possible actions in the memory storage process. The action also can be explored, although the memory pooling is filled. The reward value based on the improved DDPG control strategy can quickly converge, and at the same time, some local convergence is avoided. At the same time, the change process of the reward value also shows that compared with rule-based and no-control strategies, the control strategy based on improved DDPG can gradually learn not only TRAFFIC data but also the impact data on traffic data including road linearity. Compared with the control strategy based on DDPG, the exploration process of improving DDPG algorithm is flexible, and the maximum reward value is finally obtained.

5.3.2. Results and Analysis

The proposed method was compared with the simple DDPG algorithm, rule-based control and no-control method in different traffic flow levels. Table 3, Table 4 and Table 5 show the implementation effects of traffic control strategies under different traffic flow levels, respectively. The results show that although no control is better than the other three control strategies in terms of efficiency at low traffic levels, it performs worse than other methods in terms of safety. Compared with the other three methods, the control based on improved DDPG performs excellent in safety. In terms of efficiency, it is also better than the rule-based and DDPG-based control strategies. Taking no control as the baseline, in a medium-level traffic flow environment, in terms of safety, the rule-based, DDPG-based and improved DDPG-based control strategies, respectively, reduced 17.24%, 24.14%, 29.31%, respectively, improving efficiency by 2.82%, 3.90%, 7.21%. In a high-level traffic flow environment, the performance of the four control strategies is similar to that of medium-level traffic flow. The control strategy based on improved DDPG performs best in terms of both safety and efficiency. The results demonstrate that VSL control based on an improved DDPG algorithm has the best performance on efficiency and safety under emergencies, and the safety improvement is greater than that in efficiency. It advises that at a relatively low level of traffic flow level, excessive vehicle control is not required; only timely guidance of traffic flow is required.

In addition, the weights of two evaluation indicators are calculated based on the entropy weight method [35,36] for three levels of traffic flows. The weights of the two indicators are

0.49

and

0.51

, respectively. The comprehensive evaluations using different methods under three traffic flows are shown in Figure 13. It is shown in Figure 13 that based on no-control methods as the benchmark, the comprehensive evaluation index value of the IDDPG-VSL method is the highest under three traffic flows in the comparison of three methods. Therefore, from a comprehensive point of view, the IDDPG-VSL-based traffic management and control strategy is the best for traffic performance.

6. Conclusions and Prospect

In response to the common traffic problem of reduced road capacity during emergencies, this paper innovatively proposes a self-triggered intelligent traffic management and control strategy based on VSL and reinforcement learning to achieve effective traffic flow evacuation. On the one hand, when an accident occurs, this strategy will be triggered immediately and automatically shut down when the traffic flow is recovered to avoid excessive management and control of the transportation system; on the other hand, this strategy can output suitable values of VSL for dynamic traffic flow. To test the effect of the strategy, this paper established a joint simulation platform of PYTHON and SUMO for verification. The results show that in terms of safety, under different traffic flow levels, the proposed strategy has improved by over 28.30% compared to other methods. In terms of efficiency, except for being inferior to no control at low traffic flow conditions, it has improved over 7.21% compared to others. In addition, from the perspective of the proposed comprehensive indicator, the IDDPG-VSL method has the highest performance under three-level traffic flows compared with other methods. The improvement in safety and efficiency has benefits for sustainable transport systems which make a positive contribution to environmental, social and economic sustainability.

The proposed strategy applies to a wide range of scenarios and applies to traffic congestion caused by traffic accidents, bad weather, etc. However, it does not apply to large-scale accidents that completely close the road. In subsequent research, we will consider expanding traffic scenarios and integrating ramp control into the VSL strategy to deal with larger traffic accidents. What is more, factors affecting the traffic flow on the road are also taken into account, such as weather factors [37].

Author Contributions

Conceptualization, J.Y. and P.W.; methodology, J.Y. and P.W.; case study, J.Y.; validation, J.Y. and P.W.; writing—original draft preparation, J.Y. and P.W.; writing—review and editing, J.Y., P.W. and Y.J.; supervision, Y.J.; All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China (52372321).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Appendix A

Table A1. List of Symbols.

Symbols	Explanation
i	The ID of each cell or sub-segment
$x_{i} (t)$	The state variable of cell or sub-segment i at time t
$v_{i} (t)$	The average speed of cell or sub-segment i at time t
$k_{i} (t)$	The density of cell or sub-segment i at time t
$q_{i} (t)$	The traffic flow of cell or sub-segment i at time t
$L_{i}$	The length of cell or sub-segment i
$Q_{m a x}$	The maximize capacity of road section
$Q_{b}$	The maximize capacity of road section when emergency occurs
$k_{c r}$	The critical density of road section
$k_{j a m}$	The jam density of road section
$N_{s}$	The number of state space of cell or sub-segment
$N_{c}$	The number of controlled cells or sub-segments
$s (t)$	The state space of agent at time t
$a (t)$	The action space of agent at time t
$a_{i} (t)$	The value of speed limit of of cell or sub-segment i at time t
$r (t)$	The value of reward function at time t
$f_{o u t} (t)$	The traffic flow of emergencies downstream section at time t
$f_{i n} (t)$	The traffic flow of emergencies upstream section at time t
$θ$	The policy network parameter
$ω$	The value network parameter
$δ_{t}$	The value of network loss at time t
$α$	The learning rate of actor
$β$	The learning rate of critic
ℵ	The parameter of action noise
$V_{N}$	Constant, related to the value of speed limit
I	Constant, related to the value of speed limit
$u (t)$	The implement value of VSL at time t

References

Yang, J.; Wang, P.; Yuan, W.; Ju, Y.; Han, W.; Zhao, J. Automatic Generation of Optimal Road Trajectory for the Rescue Vehicle in Case of Emergency on Mountain Freeway Using Reinforcement Learning Approach. IET Intell. Transp. Syst. 2021, 15, 1142–1152. [Google Scholar] [CrossRef]
Anjum, M.; Shahab, S. Emergency Vehicle Driving Assistance System Using Recurrent Neural Network with Navigational Data Processing Method. Sustainability 2023, 15, 3069. [Google Scholar] [CrossRef]
Wei, X.; Tian, S.; Dai, Z.; Li, P. Statistical Analysis of Major and Extra Serious Traffic Accidents on Chinese Expressways from 2011 to 2021. Sustainability 2022, 14, 15776. [Google Scholar] [CrossRef]
Lv, P.; Han, J.; Nie, J.; Zhang, Y.; Xu, J.; Cai, C.; Chen, Z. Cooperative Decision-Making of Connected and Autonomous Vehicles in an Emergency. IEEE Trans. Veh. Technol. 2023, 72, 1464–1477. [Google Scholar] [CrossRef]
Wang, P.; Yang, J.; Jin, Y.; Wang, J. Research on Allocation and Dispatching Strategies of Rescue Vehicles in Emergency Situation on the Freeway. In Proceedings of the 2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV), Shenzhen, China, 13–15 December 2020; pp. 130–135. [Google Scholar] [CrossRef]
Zhao, X.; Xu, W.; Ma, J.; Li, H.; Chen, Y.; Rong, J. Effects of Connected Vehicle-Based Variable Speed Limit under Different Foggy Conditions Based on Simulated Driving. Accid. Anal. Prev. 2019, 128, 206–216. [Google Scholar] [CrossRef] [PubMed]
Hu, W.; Cicchino, J.B. Effects of Lowering Speed Limits on Crash Severity in Seattle. J. Saf. Res. 2023, in press. [Google Scholar] [CrossRef]
Gaweesh, S.M.; Ahmed, M.M. Evaluating the Safety Effectiveness of a Weather-Based Variable Speed Limit for a Rural Mountainous Freeway in Wyoming. J. Transp. Saf. Secur. 2020, 12, 1205–1230. [Google Scholar] [CrossRef]
Fang, X.; Péter, T.; Tettamanti, T. Variable Speed Limit Control for the Motorway–Urban Merging Bottlenecks Using Multi-Agent Reinforcement Learning. Sustainability 2023, 15, 11464. [Google Scholar] [CrossRef]
Yasanthi, R.G.N.; Mehran, B.; Alhajyaseen, W.K.M. A Reliability-Based Weather-Responsive Variable Speed Limit System to Improve the Safety of Rural Highways. Accid. Anal. Prev. 2022, 177, 106831. [Google Scholar] [CrossRef]
Abohassan, A.; Contini, L.; Elmasry, H.; El-Basyouny, K. Assessing the Effectiveness of Speed Limit Reduction in Edmonton: A Case Study Analysis. Accid. Anal. Prev. 2024, 195, 107379. [Google Scholar] [CrossRef]
Han, Y.; Hegyi, A.; Zhang, L.; He, Z.; Chung, E.; Liu, P. A New Reinforcement Learning-Based Variable Speed Limit Control Approach to Improve Traffic Efficiency against Freeway Jam Waves. Transp. Res. Part C Emerg. Technol. 2022, 144, 103900. [Google Scholar] [CrossRef]
Othman, B.; De Nunzio, G.; Di Domenico, D.; Canudas-de-Wit, C. Analysis of the Impact of Variable Speed Limits on Environmental Sustainability and Traffic Performance in Urban Networks. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21766–21776. [Google Scholar] [CrossRef]
Vrbanić, F.; Ivanjko, E.; Kušić, K.; Čakija, D. Variable Speed Limit and Ramp Metering for Mixed Traffic Flows: A Review and Open Questions. Appl. Sci. 2021, 11, 2574. [Google Scholar] [CrossRef]
Yuan, T.; Alasiri, F.; Ioannou, P.A. Selection of the Speed Command Distance for Improved Performance of a Rule-Based VSL and Lane Change Control. IEEE Trans. Intell. Transp. Syst. 2022, 23, 19348–19357. [Google Scholar] [CrossRef]
Yuan, T.; Alasiri, F.; Zhang, Y.; Ioannou, P.A. Evaluation of Integrated Variable Speed Limit and Lane Change Control for Highway Traffic Flow. IFAC-PapersOnLine 2021, 54, 107–113. [Google Scholar] [CrossRef]
Tajdari, F.; Roncoli, C. Online Set-Point Estimation for Feedback-Based Traffic Control Applications. IEEE Trans. Intell. Transp. Syst. 2023, 24, 10830–10842. [Google Scholar] [CrossRef]
Shang, W.L.; Chen, Y.; Ochieng, W.Y. Resilience Analysis of Transport Networks by Combining Variable Message Signs With Agent-Based Day-to-Day Dynamic Learning. IEEE Access 2020, 8, 104458–104468. [Google Scholar] [CrossRef]
Guo, Y.; Xu, H.; Zhang, Y.; Yao, D. Integrated Variable Speed Limits and Lane-Changing Control for Freeway Lane-Drop Bottlenecks. IEEE Access 2020, 8, 54710–54721. [Google Scholar] [CrossRef]
Fang, J.; Ye, H.; Easa, S.M. Modified Traffic Flow Model with Connected Vehicle Microscopic Data for Proactive Variable Speed Limit Control. J. Adv. Transp. 2019, 2019, e8151582. [Google Scholar] [CrossRef]
Han, Y.; Wang, M.; He, Z.; Li, Z.; Wang, H.; Liu, P. A Linear Lagrangian Model Predictive Controller of Macro- and Micro- Variable Speed Limits to Eliminate Freeway Jam Waves. Transp. Res. Part C Emerg. Technol. 2021, 128, 103121. [Google Scholar] [CrossRef]
Mao, P.; Ji, X.; Qu, X.; Li, L.; Ran, B. A Variable Speed Limit Control Based on Variable Cell Transmission Model in the Connecting Traffic Environment. IEEE Trans. Intell. Transp. Syst. 2022, 23, 17632–17643. [Google Scholar] [CrossRef]
Wu, Y.; Tan, H.; Qin, L.; Ran, B. Differential Variable Speed Limits Control for Freeway Recurrent Bottlenecks via Deep Actor-Critic Algorithm. Transp. Res. Part C Emerg. Technol. 2020, 117, 102649. [Google Scholar] [CrossRef]
Zheng, S.; Li, M.; Ke, Z.; Li, Z. Coordinated Variable Speed Limit Control for Consecutive Bottlenecks on Freeways Using Multiagent Reinforcement Learning. J. Adv. Transp. 2023, 2023, e4419907. [Google Scholar] [CrossRef]
You, F.; Zhang, R.; Lie, G.; Wang, H.; Wen, H.; Xu, J. Trajectory Planning and Tracking Control for Autonomous Lane Change Maneuver Based on the Cooperative Vehicle Infrastructure System. Expert Syst. Appl. 2015, 42, 5932–5946. [Google Scholar] [CrossRef]
Shang, W.L.; Tao, X.; Bi, H.; Chen, Y.; Zhang, H.; Ochieng, W.Y. Audio Related Quality of Experience Evaluation in Urban Transportation Environments With Brain Inspired Graph Learning. IEEE Trans. Intell. Transp. Syst. 2023, 24, 13841–13851. [Google Scholar] [CrossRef]
Yu, H.; Jiang, R.; He, Z.; Zheng, Z.; Li, L.; Liu, R.; Chen, X. Automated Vehicle-Involved Traffic Flow Studies: A Survey of Assumptions, Models, Speculations, and Perspectives. Transp. Res. Part C Emerg. Technol. 2021, 127, 103101. [Google Scholar] [CrossRef]
Han, Y.; Chen, D.; Ahn, S. Variable Speed Limit Control at Fixed Freeway Bottlenecks Using Connected Vehicles. Transp. Res. Part B Methodol. 2017, 98, 113–134. [Google Scholar] [CrossRef]
van Wageningen-Kessels, F.; van Lint, H.; Vuik, K.; Hoogendoorn, S. Genealogy of Traffic Flow Models. EURO J. Transp. Logist. 2015, 4, 445–473. [Google Scholar] [CrossRef]
Vázquez-Canteli, J.R.; Nagy, Z. Reinforcement Learning for Demand Response: A Review of Algorithms and Modeling Techniques. Appl. Energy 2019, 235, 1072–1089. [Google Scholar] [CrossRef]
Daganzo, C.F. The Cell Transmission Model: A Dynamic Representation of Highway Traffic Consistent with the Hydrodynamic Theory. Transp. Res. Part B Methodol. 1994, 28, 269–287. [Google Scholar] [CrossRef]
Wan, Q.; Peng, G.; Li, Z.; Inomata, F.H.T. Spatiotemporal Trajectory Characteristic Analysis for Traffic State Transition Prediction near Expressway Merge Bottleneck. Transp. Res. Part C Emerg. Technol. 2020, 117, 102682. [Google Scholar] [CrossRef]
Vrbanić, F.; Miletić, M.; Tišljarić, L.; Ivanjko, E. Influence of Variable Speed Limit Control on Fuel and Electric Energy Consumption, and Exhaust Gas Emissions in Mixed Traffic Flows. Sustainability 2022, 14, 932. [Google Scholar] [CrossRef]
Wei, S.; Li, Y.; Yang, H.; Xie, M.; Wang, Y. A Comprehensive Operation and Maintenance Assessment for Intelligent Highways: A Case Study in Hong Kong-Zhuhai-Macao Bridge. Transp. Policy 2023, 142, 84–98. [Google Scholar] [CrossRef]
Zhang, L.; Zhang, X.; Yuan, S.; Wang, K. Economic, Social, and Ecological Impact Evaluation of Traffic Network in Beijing–Tianjin–Hebei Urban Agglomeration Based on the Entropy Weight TOPSIS Method. Sustainability 2021, 13, 1862. [Google Scholar] [CrossRef]
Blagojević, A.; Stević, Ž.; Marinković, D.; Kasalica, S.; Rajilić, S. A Novel Entropy-Fuzzy PIPRECIA-DEA Model for Safety Evaluation of Railway Traffic. Symmetry 2020, 12, 1479. [Google Scholar] [CrossRef]
Qiu, H.; Zhu, W.; Wang, J.; Zeng, Z. Variable Speed Limit in Foggy Conditions Considering Dynamic Sight Distance and Friction. In Proceedings of the Institution of Civil Engineers—Transport; Emerald Publishing Limited: Leeds, UK, 2023; pp. 1–9. [Google Scholar] [CrossRef]

Figure 1. Segment divide based on the CTM.

Figure 2. Fundamental diagram under emergency and normal environment.

Figure 3. Fundamental diagram under VSL control strategy.

Figure 4. Markov decision process.

Figure 5. Structure of diagram of VSL control under Cooperative Vehicle Infrastructure System.

Figure 6. Framework of self-triggering intelligent decision-making control system.

Figure 7. Implementation architecture of intelligent decision control strategy.

Figure 8. Selected Hong Kong–Zhuhai–Macao bridge freeway section (Zhuhai to Hong Kong direction).

Figure 9. The relationship between depart time and number of vehicles under lower traffic flow.

Figure 10. The relationship between depart time and number of vehicles under medium traffic flow.

Figure 11. The relationship between depart time and number of vehicles under high traffic flow.

Figure 12. The reward of different methods during training process.

Figure 13. The comprehensive evaluations under three traffic flows.

Table 1. The parameters of vehicles.

Type of Vehicle	Length of Vehicle	Car Follow Model
Type 1	5.0	IDM
Type 2	5.0	Krauss
Type 3	8.0	IDM
Type 4	8.0	Krauss

Table 2. The parameters of DDPG algorithm.

Parameters of Algorithm	Values
Sampling times	15
Control horizon	15
iteration	420
Critic learning rate	0.0005
Actor learning rate	0.0002
Memory pool parameter	215
Sampling number	32
Max training episode	600
Noise value of action	0.3

Table 3. Comparison of control strategies under lower traffic flow.

Method	ATT	PCN
No control (Baseline)	81.27	106
Rule-based	87.89	92
Rule-based	8.15%	−13.2%
DDPG-VSL	92.57	79
DDPG-VSL	13.90%	−25.47%
IDDPG-VSL	82.88	76
IDDPG-VSL	1.98%	−28.30%

The best performances are in bold.

Table 4. Comparison of control strategies under medium traffic flow.

Method	ATT	PCN
No control (Baseline)	91.22	116
Rule-based	88.65	96
Rule-based	−2.82%	−17.24%
DDPG-VSL	87.66	88
DDPG-VSL	−3.90%	−24.14%
IDDPG-VSL	84.64	82
IDDPG-VSL	−7.21%	−29.31%

The best performances are in bold.

Table 5. Comparison of control strategies under high traffic flow.

Method	ATT	PCN
No control (Baseline)	92.7	124
Rule-based	88.69	99
Rule-based	−4.32%	−20.16%
DDPG-VSL	87.56	90
DDPG-VSL	−5.55%	−27.42%
IDDPG-VSL	85.39	83
IDDPG-VSL	−7.89%	−33.07%

The best performances are in bold.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yang, J.; Wang, P.; Ju, Y. Variable Speed Limit Intelligent Decision-Making Control Strategy Based on Deep Reinforcement Learning under Emergencies. Sustainability 2024, 16, 965. https://doi.org/10.3390/su16030965

AMA Style

Yang J, Wang P, Ju Y. Variable Speed Limit Intelligent Decision-Making Control Strategy Based on Deep Reinforcement Learning under Emergencies. Sustainability. 2024; 16(3):965. https://doi.org/10.3390/su16030965

Chicago/Turabian Style

Yang, Jingwen, Ping Wang, and Yongfeng Ju. 2024. "Variable Speed Limit Intelligent Decision-Making Control Strategy Based on Deep Reinforcement Learning under Emergencies" Sustainability 16, no. 3: 965. https://doi.org/10.3390/su16030965

APA Style

Yang, J., Wang, P., & Ju, Y. (2024). Variable Speed Limit Intelligent Decision-Making Control Strategy Based on Deep Reinforcement Learning under Emergencies. Sustainability, 16(3), 965. https://doi.org/10.3390/su16030965

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Variable Speed Limit Intelligent Decision-Making Control Strategy Based on Deep Reinforcement Learning under Emergencies

Abstract

1. Introduction

2. Problem Proposed

2.1. Traffic Flow Model

2.2. Principle of Variable Speed Limit

3. Markov Decision Process of VSL Control

3.1. Markov Decision Process

3.2. MDP of VSL Control

4. VSL Control Problem Based on Improving DDPG under Emergency

Neural Network Design and Algorithm Update

5. Simulation Experiment and Analysis

5.1. Simulation Model Parameters

5.2. Evaluation Metrics

5.3. Simulation Results and Analysis

5.3.1. Parameter Setting and Training

5.3.2. Results and Analysis

6. Conclusions and Prospect

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI