Article

A Multi-Level Speed Guidance Cooperative Approach Based on Bidirectional Periodic Green Wave Coordination Under Intelligent and Connected Environment

1 College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China
2 Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin 541004, China
3 College of Earth Sciences, Guilin University of Technology, Guilin 541004, China
4 School of Mechanical Engineering, Guilin University of Aerospace Technology, Guilin 541004, China
5 College of Arts, Guilin University of Technology, Guilin 541004, China
* Authors to whom correspondence should be addressed.
Sensors 2025, 25(7), 2114; https://doi.org/10.3390/s25072114
Submission received: 3 February 2025 / Revised: 20 March 2025 / Accepted: 25 March 2025 / Published: 27 March 2025
(This article belongs to the Section Vehicular Sensing)

Abstract

To make fuller use of arterial green wave bandwidth, this study seeks to minimize the average travel delay at coordinated intersections and to maximize vehicle throughput. To this end, the paper proposes a collaborative optimization method for the control of related intersection groups that combines multi-level speed guidance with green wave coordinated control. In an intelligent and connected environment (ICE), the driving trajectory of the first vehicle in each optimization cycle is determined once it receives active speed guidance. The driving trajectories of subsequent vehicles are then calculated, and each vehicle is assessed as to whether it can leave the intersection before the end of the green light. Next, a characteristic index is calculated, comprising the average speed on the arterial coordination section and its corresponding phase offset. The phase offset is then optimized with the objective of maximizing the comprehensive bandwidth of green wave coordination within the control range. The maximum average speed and the bidirectional cycle comprehensive green wave bandwidth are employed as the control objectives. A model is then constructed by combining multi-level vehicle speed guidance with bidirectional cycle green wave coordinated control, and a bi-level combinatorial optimization method based on a combinatorial deep Q-learning approach, named Deep Q Network-Genetic Algorithm (DQNGA), is developed with the objective of obtaining the global optimal solution. Finally, the reliability of the method is validated using traffic flow data and map sensor data from several associated road sections in a city. The results demonstrate that the proposed method reduces the average delay and the number of stops by 20.76% and 44.49%, respectively, outperforming conventional traffic control strategies. This suggests that the inefficient utilization of green light time in arterial coordinated signal control has been effectively addressed and that the efficiency of intersections in the intelligent and connected environment has been enhanced.

1. Introduction

The issue of traffic congestion has become increasingly pertinent in light of the exponential growth in the number of vehicles on the road. The arterial roads that traverse the central districts of urban areas represent a crucial component of the urban traffic system. However, these roads are experiencing an increase in traffic volume, which is the primary cause of traffic congestion. It is therefore essential to enhance the overall traffic capacity of the road network by optimizing the arterial coordination effects, including speed, green ratio, phase offset, green wave bandwidth and other related factors. In the conventional green wave coordinated control approach, the design speed and operational speed of the green wave are frequently fixed, and the green wave bandwidth of arterial roads is not fully exploited. For instance, Sangkey et al. proposed a dynamic bandwidth analysis method utilizing closed-loop signal data [1]. Zhou et al. developed an uneven double cycling method to significantly reduce delay at intersections, thereby providing preliminary guidelines for arterial coordination [2].
When vehicles are in motion, drivers adjust their speed by visually judging the distance to the vehicles ahead of and behind them. They do so without real-time access to the status of the traffic signals at the upcoming intersection, so it is difficult to regulate vehicle speed in a timely manner. In recent years, connected vehicle (CV) technology has matured considerably, enabling vehicles to connect with a multitude of elements, including people, other vehicles, infrastructure, and the environment. The travel time of a vehicle can be actively adjusted based on its current driving state and the operational state of the roadside traffic signal controller, so that the driver is guided safely through the intersection within the green time of the current signal cycle. Through active speed guidance, connected vehicles can pass through clusters of associated intersections in formation without stopping. Furthermore, enhanced information exchange between vehicles and infrastructure can allow traffic signals to be optimized for real-time traffic flows in the future. Researchers and institutions have produced substantial results in this field to date, which can be divided into the following aspects.
The first research direction is the incorporation of a speed guidance model into signal optimization in traditional environments. Research in traditional environments is primarily concerned with the optimization of signal timing, encompassing the optimization of phase sequence and phase duration for a single intersection [3,4] and for multiple intersections (such as road links or two adjacent intersections) [5,6]. Speed guidance in traditional environments, by contrast, aims to control speed through variable speed limit facilities so that vehicles travel at a consistent speed, which in turn improves mobility and safety. Liu et al. put forth an integrated model and its solution algorithm for the control of freeway corridors during incidents [7]. Wang et al. developed a model based on the intelligent driver model and a variable speed limit control method [8]. However, the aforementioned studies are limited by the technology used to obtain vehicle data. Their principal objective is to develop a model for optimizing signal control based on idealized assumptions regarding speed guidance, an approach that does not fully align with the actual operational context.
The second research direction is to investigate the guidance of vehicles at varying speeds in a connected vehicle (CV) environment. This includes the optimization of signal timing at single intersections and the dynamic control and guidance of vehicles at varying speeds on arterial roads. On the one hand, in terms of signal timing optimization at a single intersection, Zheng and Liu estimated traffic volumes and then modelled vehicle arrivals at signalized intersections as a time-dependent Poisson process [9]. Sun et al. proposed an innovative intersection operation scheme named MCross (Maximum Capacity intersection operation scheme with signals) [10]. Feng et al. proposed a control framework comprising signal optimization and vehicle trajectory control [11]. Yu et al. presented a mixed integer linear programming model for optimizing vehicle trajectories and traffic signals in a connected vehicle environment [12]. Liang et al. used a deep reinforcement learning model to control the traffic light cycle [13]. Emami et al. adaptively optimized traffic signal plans based on the different penetration information of connected vehicles [14]. Yang et al. proposed a recursive estimation algorithm to update the distribution of the value of time, using the lane choice information of connected vehicles [15]. Rafter et al. proposed a novel algorithm which combines position information from connected vehicles with data obtained from signal timing plans [16]. Mohebifard and Hajbabaie presented a method for cooperative signal timing and traffic metering in urban street networks with various connected vehicle market penetration rates [17]. On the other hand, with regard to dynamic speed control and guidance on arterial roads, Kamal et al. presented a novel control system to drive a vehicle efficiently on roads containing varying traffic and signals at intersections [18]. Wu et al. 
developed two vehicle speed guidance methods to decrease the delay and number of stops at intersections [19]. Lin et al. provided an algorithm to obtain vehicles’ dynamics parameters in a variant-speed area [20]. Ma et al. presented a partition-enabled multi-mode band model that is designed to solve the signal coordination problem [21]. Xu et al. optimized the traffic signal timing and vehicles’ speed trajectories at an isolated intersection [22]. Wu et al. proposed a novel joint control method based on model predictive control and connected vehicles for on-ramp metering and speed guidance on the urban expressway [23]. Zhou et al. proposed a hybrid cooperative intersection control framework consisting of microscopic-level virtual platooning control and macroscopic-level traffic flow regulation with connected vehicles [24]. Chen et al. built three conflict modes of different traffic movements and presented safe speed-dependent constraints for them [25]. Liang et al. proposed a decentralized signal control algorithm that leverages connected vehicles’ information to improve urban traffic operations [26]. Chen et al. developed an adaptive signal control that adjusts the signal timing plan while considering both the urban adjacent intersections’ traffic volume and the vehicles’ waiting time [27]. Yang et al. proposed a novel approach to integrate optimal control of perimeter intersections into a perimeter control scheme [28].
As the literature review shows, speed guidance and signal control remain prominent research topics. When optimizing signal timing at a single intersection, signal timing parameters and vehicle speed can be optimized jointly to support more accurate vehicle arrival planning. Nevertheless, three limitations remain. First, previous studies determine the trajectory of each vehicle based on fixed signal timing and an ideal green wave bandwidth; they do not address communication reliability or multi-lane traffic dynamics, which limits their practical applicability, and existing signal timing optimization methods do not consider the actual dynamics of vehicles when regulating their movements. Second, existing studies have examined only simple intersections or arterials with one lane per approach, although multi-lane approaches are a common feature of urban road networks. It is therefore necessary to develop a control method that can accommodate the general case, including groups of intersections with multi-lane approaches and varying traffic states. Third, control strategies based on limited information may result in suboptimal operations: without global traffic information, collaborative control optimization strategies may deviate from the optimal solution for the system as a whole.
Another avenue for investigation is the optimization of connected vehicle trajectories with the objective of reducing fuel and energy consumption, as well as phase offset, given fixed signal timings. Naturally, the optimal control methods were modelled based on the vehicle speeds or acceleration rates [29,30,31]. Liu et al. developed an optimal mode selection and resource allocation (MSRA) policy that maximizes the long-term overall throughput of a time-varying dynamic energy harvesting D2D-enabled cellular network (EH-DCN), subject to an age of information (AoI) constraint [32]. He et al. proposed a multi-stage optimal control formulation [33]. Tang et al. introduced a speed guidance strategy into a car-following model to analyze the relationship between driving behavior and fuel consumption [34]. Zhao and Zhang employed an online learning-based driving dynamics prediction model to forecast a set of uncertain driving states of the preceding vehicle [35]. He et al. proposed an overall control strategy for heterogeneous vehicle platoons with the objective of minimizing energy consumption [36]. Recent studies have proposed the integration of traffic signal control and vehicle trajectory control within a unified framework. Wu et al. proposed an integrated signal coordination control model to optimize the dynamic travel speed and signal offsets [37]. Tajalli et al. presented distributed optimization and coordination algorithms for dynamic speed optimization of connected and autonomous vehicles [38]. Wang et al. developed a joint control model for optimizing the connected vehicle speeds and coordinating signals along an urban arterial simultaneously [39]. Bie et al. proposed a dynamic headway control method for a high-frequency route with a bus lane [40]. Liu et al. developed a reservation-based cooperative transit signal priority mechanism system to optimize the signal scheme and speed guidance [41]. 
In view of the non-convexity of formulated problems with a large number of optimization variables and the dynamics of EH-DCNs with high real-time requirements, it is challenging to obtain a long-term optimal MSRA policy through conventional optimization methods. Fortunately, deep reinforcement learning (DRL), often represented by deep Q-networks (DQN) and deep deterministic policy gradient (DDPG), is a powerful tool for achieving optimal solutions in time-varying dynamic EH-DCNs. However, it should be noted that DQN demonstrates notable proficiency in addressing discrete variable problems but encounters challenges in the context of problems incorporating continuous variables. While DDPG is capable of handling problems involving both continuous and discrete variables, the issue of Q-value overestimation persists and requires resolution. To address this challenge, we have developed the MSRA twin delayed deep deterministic policy gradient (MSRA-TD3) algorithm, with the objective of enhancing the efficacy of DDPG in dealing with problems that involve both continuous and discrete variables.
In summary, the majority of the aforementioned studies have employed fixed signal timings for traffic signals, with control strategies designed to optimize energy consumption at specific speeds. Nevertheless, existing studies have indicated that platooning may not be able to maintain robust stability at specific speeds, which could potentially affect the operational efficiency of these methods. The microscopic nature of the existing algorithms, in which the trajectory of each connected vehicle is controlled, leads to a significant increase in the complexity of the algorithms used to solve the problem. Therefore, there is a clear need to improve the aforementioned algorithms and optimize their response efficiency.
In general, several problems remain to be addressed in the current research.
(1) Although some studies have investigated vehicle speed guidance, most current research focuses on a single intersection or on traditional arterial green wave coordinated control. Little consideration has been given to the demand for additional green waves, that is, the unused portion of the green wave under arterial green wave coordinated control. Introducing an additional green wave affects vehicle speed, which in turn affects the phase offset.
(2) In traditional traffic control, vehicles often have difficulty maintaining a constant speed, which can lead to stop-and-go situations. To address this problem, it has been proposed to implement speed guidance for individual vehicles rather than for groups. However, further improvements to the control system are needed. In coordinated control models, the parameters typically used for green wave optimization include fixed elements such as travel path, initial queue length, driving safety and driving speed. These models are therefore neither global nor fully dynamic optimization models.
(3) Connected vehicle fleets are intercepted at the upstream intersection as they move in multiple directions across different lanes, in accordance with the green wave band speed designed for bidirectional cycle-coordinated intersection groups, so the effect of the green wave is somewhat limited. Research is lacking on the correlation and trade-off between the continuous passing speed of connected vehicles and safe, accurate multi-level active speed guidance in a bidirectional cycle-coordinated green wave control environment.
In summary, existing methods have three limitations: (1) static green wave designs that ignore dynamic traffic variations; (2) underutilization of additional green wave bandwidth; and (3) absence of joint vehicle–signal optimization. This paper addresses these gaps by proposing a bidirectional cycle green wave coordination framework integrated with multi-level speed guidance, enabled by a novel DQNGA hybrid algorithm for global optimization. Thus, this work contributes to the field in two key aspects: Firstly, a bidirectional cycle green wave coordinated control and multi-level vehicle speed guidance collaborative optimization model based on multi-dimensional comprehensive control objectives under a connected vehicle environment is established. Secondly, the model employs an artificial intelligence algorithm that is capable of achieving global optimization. This collaborative optimization model is distinguished from other studies by its utilization of the arterial green wave bandwidth time, including additional green waves, and its ability to adaptively determine the global optimal speed based on the real-time traffic signal state. This enables connected vehicle fleets to pass through arterial-related intersections at the global optimal speed without stopping.
The remainder of this paper is organized into the following sections: Section 2 provides a detailed description of the problem. Section 3 outlines the methodology and mathematical modelling employed. Section 4 presents a case study. The conclusion is presented in Section 5.

2. Problem Description

The model for coordinating the ideal arterial green wave is constructed on the assumption of stable traffic flow [42,43]. On a road section under green wave coordinated control, vehicles travelling at the green wave speed can pass through the intersections without stopping. In practice, however, the green wave coordinated control model behaves as shown in Figure 1, which presents the control model for the upstream coordination direction. The coordination cycle cannot be guaranteed to be consistent, and the phase offsets between some intersections and their upstream and downstream counterparts are also inconsistent, which may generate additional green waves. As Figure 1 illustrates (see b1 and b2), vehicles passing within the public green wave bandwidth b continue to travel at the green wave speed (defined as v0) and pass through the green wave smoothly. Vehicles A and B in Figure 1 are also travelling at the green wave speed, but this leaves green time unused at the downstream intersection; the issue can be resolved by instructing vehicles A and B to accelerate to speeds V1 and V2, respectively. Vehicle C is likewise travelling at the green wave speed but encounters a red light at the downstream intersection, resulting in delay. If vehicle C accelerates to a speed of V3, it may pass the intersection during the current green light; alternatively, it can decelerate to a speed of V4 in order to pass the intersection during the next green light.
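The choice facing vehicles A, B and C can be illustrated with a simplified check of which constant speeds would bring a vehicle to the stop line inside a given green window. This is only a sketch under stated assumptions: the function name `feasible_speed_range`, its parameters, and the constant-speed simplification (acceleration phases ignored) are hypothetical and are not part of the paper's model.

```python
from typing import Optional, Tuple


def feasible_speed_range(
    distance_m: float,
    green_start_s: float,
    green_end_s: float,
    v_min: float,
    v_max: float,
) -> Optional[Tuple[float, float]]:
    """Constant speeds (m/s) that reach the stop line inside the green window.

    Times are measured from "now". The vehicle is assumed to hold a single
    constant speed, so acceleration phases are ignored in this check.
    Returns None when no speed within [v_min, v_max] fits the window.
    """
    if green_end_s <= 0:
        return None  # the window has already closed
    slowest = distance_m / green_end_s  # arrive exactly as the green ends
    if green_start_s > 0:
        fastest = distance_m / green_start_s  # arrive exactly as the green starts
    else:
        fastest = float("inf")  # the green is already on
    low, high = max(slowest, v_min), min(fastest, v_max)
    return (low, high) if low <= high else None


if __name__ == "__main__":
    # A vehicle 300 m upstream: current green ends in 15 s, next green is 60-85 s away.
    print(feasible_speed_range(300.0, 0.0, 15.0, 5.0, 16.0))   # current green
    print(feasible_speed_range(300.0, 60.0, 85.0, 5.0, 16.0))  # next green
```

In this illustration the current green cannot be reached within the speed limits, so the only option is to slow down for the next green, which mirrors the decelerate-to-V4 case described for vehicle C.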
In an intelligent and connected environment, connected vehicles have the capacity to disseminate operating information (for example, speed, lane changing, acceleration, position) to other vehicles, while concurrently detecting the real-time traffic state at intersections (for example, traffic flow, traffic capacity, travel time) and traffic signal timing state (for example, red light time, green light time, cycle length) via vehicle-to-infrastructure (V2I). Additionally, the connected vehicles receive information regarding traffic signals and guidance speeds. However, the inherent randomness of vehicle speeds and queue lengths poses significant challenges in practical implementation. For instance, the green light time in other phases must be reduced to adapt to the arterial coordinated control.
In contrast to the previous method, the acquisition of real-time traffic data enables the implementation of bidirectional green wave coordinated control through active speed guidance in a connected vehicle environment. The system is capable of self-adjusting in real time, enabling the continuous flow of vehicles through intersection groups without stopping. In addition, the model is able to achieve the optimum comprehensive green wave bandwidth for intersection groups associated with arterial roads while adapting to vehicle operating states. This has the potential to increase the average speed of connected vehicle fleets traversing arterial roads, as well as optimizing the capacity and traffic service of said roads through a combination of optimal and self-adaptive phase offsets. The overall concept is illustrated in Figure 2.

3. Mathematical Modeling

The paper presents a collaborative optimization method that integrates fleet motion control, speed guidance and green wave bandwidth optimization with the aim of improving the traffic capacity of arterial-related intersection groups. Phase offset optimization is based on real-time traffic flow, with all connected vehicles maintaining a fleet and operating at an optimal average speed. Ultimately, the goal is to facilitate the uninterrupted passage of as many connected vehicles as possible through arterial-related intersection groups.

3.1. Model Assumptions and Parameter Definitions

The model is established on the following assumptions:
(1) Vehicles within the control area are assumed to fully comply with speed guidance instructions [44].
(2) Vehicles will not change lanes after entering the control area [45].
(3) The interference of pedestrians and non-motorized vehicles is not considered [46].
(4) Connected vehicles can achieve mutual cooperative control when they are in good communication with each other and with the roadside unit. The central control unit is capable of executing the assigned driving tasks [47].
(5) The on-board units of connected vehicles and roadside units communicate with each other via the IEEE 802.11p protocol, which ensures the accuracy and real-time performance of the collected information [48].
Table 1 presents the definitions of the principal parameters of the optimization model.

3.2. Bidirectional Cycle Green Wave Coordinated Control Based on Multi-Level Speed Guidance Collaborative Method Under a Connected Vehicle Environment

3.2.1. Vehicle Speed Guidance Strategy Under a Connected Vehicle Environment

In a connected vehicle environment, a vehicle transmits information about its position and speed to the central control unit when it enters the defined control area. The size of the control area depends on the effective distance between the vehicle and the infrastructure. The objective of the optimization process is to determine the initial and target states of the vehicles. In order to reduce vehicle travel time and optimize the use of green wave bandwidth to increase vehicle throughput, it is necessary to allow vehicles to pass through arterial intersection groups at the maximum speed permitted by the road. This ensures that all vehicles reach the same target state before reaching the downstream intersection. This is achieved by the multi-level speed control method proposed in this paper. Figure 3 schematically illustrates the optimal speed control curve.
The target state model is presented in Equations (1) and (2):

$$
\begin{cases}
V_{ij}^{end} = V_{max} \\
t_{ij}^{end} = t_{i1}^{end} + \sum_{n=2}^{j} h_{in} \\
l_{ij}^{end} = l_{i1}^{end}
\end{cases}
\tag{1}
$$

$$
h_{ij} \ge \frac{S_0 + len_{i(j-1)}}{V_{ij}^{end}} \ge h_t \tag{2}
$$

where $r_{ki}$ and $r_{k(i+1)}$ represent the start time of the red light at the $i$th optimization cycle and the $(i+1)$th optimization cycle of intersection $k$, respectively. $g_{ki}$ represents the start time of the green light at the $i$th optimization cycle of intersection $k$. In this paper, we take the time from the start of the red light to the end of the green light at the downstream intersection as one optimization cycle.
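The target arrival times in Equation (1) are a running sum of headways added to the leading vehicle's target time. The sketch below shows that bookkeeping; the function name `target_arrival_times` and the example headway values are illustrative assumptions, and the headways are taken as given inputs rather than derived from Equation (2).

```python
from itertools import accumulate
from typing import List, Sequence


def target_arrival_times(t_first_end: float, headways: Sequence[float]) -> List[float]:
    """Target times t_ij^end for vehicles j = 1..J in one optimization cycle.

    Implements t_ij^end = t_i1^end + sum_{n=2..j} h_in from Equation (1).
    headways[k] is the headway h_in of vehicle n = k + 2 behind its predecessor.
    """
    return list(accumulate(headways, initial=t_first_end))


if __name__ == "__main__":
    # Leading vehicle targets t = 30 s; three followers with assumed headways.
    print(target_arrival_times(30.0, [2.0, 2.5, 2.0]))  # [30.0, 32.0, 34.5, 36.5]
```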
In the context of speed guidance, it is of paramount importance to consider both vehicle acceleration and deceleration. Nevertheless, numerous studies tend to neglect the transitional stage between acceleration and deceleration. The objective of this paper is to reduce non-uniform motion time by focusing on the maximum acceleration (referred to as amax) with both positive and negative values. Upon entering the control area, the vehicle must undergo uniform acceleration until it reaches a velocity of Vmax. The optimization process comprises at least two stages. If the vehicle cannot pass through the intersection during the current green light, it is imperative that it is guided to maintain its speed in order to avoid the necessity of stopping and queuing at the intersection. This will permit the vehicle to pass through the intersection during the next green light. To ensure that the vehicle does not stop and queue, it will pass through a suitable uniform speed transitional stage before accelerating to Vmax. The speed control process of the vehicle will be divided into four stages. The first stage involves the vehicle accelerating or decelerating to reach the first guiding speed. The second stage is the uniform motion stage at the first guiding speed. The third stage comprises the second uniform acceleration motion, which accelerates to Vmax. The fourth stage involves uniform motion with Vmax. The subsequent paragraph will provide a detailed discussion of the two-stage and four-stage methods.
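The four stages described above amount to a piecewise speed profile. A minimal sketch of such a profile follows, assuming a constant acceleration magnitude `a_max` in the two speed-change stages; the function name `guided_speed` and the parameter `hold_s` (duration of the uniform stage at the first guiding speed) are hypothetical names introduced here for illustration.

```python
def guided_speed(
    t: float,
    v0: float,
    v_guide: float,
    v_max: float,
    a_max: float,
    hold_s: float,
) -> float:
    """Speed (m/s) t seconds after entering the control area.

    Stage 1: accelerate or decelerate from v0 to the first guiding speed v_guide.
    Stage 2: hold v_guide for hold_s seconds.
    Stage 3: accelerate from v_guide to v_max.
    Stage 4: cruise at v_max.
    """
    if t < 0 or a_max <= 0 or v_guide > v_max:
        raise ValueError("expected t >= 0, a_max > 0 and v_guide <= v_max")
    a1 = a_max if v_guide >= v0 else -a_max  # sign depends on speeding up or slowing down
    t1 = abs(v_guide - v0) / a_max           # end of stage 1
    if t < t1:
        return v0 + a1 * t
    t2 = t1 + hold_s                         # end of stage 2
    if t < t2:
        return v_guide
    t3 = t2 + (v_max - v_guide) / a_max      # end of stage 3
    if t < t3:
        return v_guide + a_max * (t - t2)
    return v_max


if __name__ == "__main__":
    # Enter at 12 m/s, slow to 8 m/s, hold for 10 s, then accelerate to 15 m/s.
    for t in (1.0, 5.0, 13.0, 20.0):
        print(t, guided_speed(t, 12.0, 8.0, 15.0, 2.0, 10.0))
```

Setting `v_guide` equal to `v_max` and `hold_s` to zero collapses the profile to the two-stage case, in which the vehicle accelerates directly to the maximum speed.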
The multi-level speed guidance control method for each optimization cycle is described as follows:
Step 1: Determine the speed control curve for the first vehicle in each optimization cycle.
The leading vehicle is defined as the first vehicle to pass through the intersection after the start of each optimization cycle. By determining the speed control curve of the leading vehicle, it is possible to calculate the time range for subsequent vehicles to reach the downstream intersection and to establish the basic line shape of the speed control curve for those vehicles.
A speed control curve is calculated for the leading vehicle based on its initial speed upon arrival at the control area, the distance between the vehicle and the downstream intersection, and the signal state information of the downstream intersection. The speed control curve for the leading vehicle may be constituted by either two-stage speed control curves or four-stage speed control curves. The optimization method is given as follows.
(a) Two-stage method
When entering the control area with the intention of passing the intersection smoothly, the leading vehicle is permitted to accelerate directly to Vmax. This not only addresses the unused green time (empty release) within the additional green wave bandwidth but also reduces queuing and improves the traffic efficiency of intersection groups. Figure 4 shows the schematic diagram of active speed guidance in the two-stage method.
The range of entry times for which the two-stage method applies to the leading vehicle is determined from the start and end times of the green light at the downstream intersection. When the leading vehicle arrives at the intersection exactly at the start of the green light, the time at which it enters the control area (defined as $t_{i1}^{0}$) can be calculated by Equation (3). Because full compliance is a theoretical idealization, a probabilistic compliance model is also introduced through a partial adherence rate (defined as $\eta_{compliance}$).

$$
\begin{cases}
V_{max} = V_{i1}^{0} + a_{max}\,(t_{i1}^{1} - t_{i1}^{0}) \\
l_1 = \dfrac{(V_{max})^2 - (V_{i1}^{0})^2}{2a_{max}} \\
l_2 = V_{max}\,(t_{i1}^{2} - t_{i1}^{1}) \\
L = l_1 + l_2 \\
\eta_{compliance} = 0.8
\end{cases}
\tag{3}
$$

Solving Equation (3) for $t_{i1}^{0}$ gives the result shown in Equation (4).

$$
\begin{cases}
t_{i1}^{0} = t_{i1}^{2} - \dfrac{L}{V_{max}} + \dfrac{V_{max}^2 - (V_{i1}^{0})^2}{2a_{max}V_{max}} - \dfrac{V_{max} - V_{i1}^{0}}{a_{max}} \\
t_{i1}^{1} = t_{i1}^{2} - \dfrac{L}{V_{max}} + \dfrac{V_{max}^2 - (V_{i1}^{0})^2}{2a_{max}V_{max}}
\end{cases}
\tag{4}
$$
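Equation (4) can be checked numerically by computing the entry time and confirming that the acceleration and cruise distances add up to the control-area length L. The sketch below reads $t_{i1}^{1}$ as the end of the acceleration stage and $t_{i1}^{2}$ as the arrival time at the intersection, which is how Equation (3) uses them; the function name `two_stage_entry_time` and the numeric values are illustrative assumptions, and the compliance rate is not used here.

```python
from typing import Tuple


def two_stage_entry_time(
    t_arrive: float, L: float, v0: float, v_max: float, a_max: float
) -> Tuple[float, float]:
    """Return (t0, t1) from Equation (4).

    t0: time the vehicle must enter the control area.
    t1: time at which it finishes accelerating from v0 to v_max.
    t_arrive plays the role of t_i1^2, the arrival time at the intersection.
    """
    if a_max <= 0 or not (0 <= v0 <= v_max) or v_max <= 0:
        raise ValueError("expected a_max > 0 and 0 <= v0 <= v_max with v_max > 0")
    accel_dist = (v_max**2 - v0**2) / (2 * a_max)  # l_1 in Equation (3)
    if accel_dist > L:
        raise ValueError("control area too short to reach v_max")
    t1 = t_arrive - L / v_max + (v_max**2 - v0**2) / (2 * a_max * v_max)
    t0 = t1 - (v_max - v0) / a_max
    return t0, t1


if __name__ == "__main__":
    t0, t1 = two_stage_entry_time(t_arrive=60.0, L=300.0, v0=10.0, v_max=15.0, a_max=2.0)
    # Consistency check against Equation (3): l_1 + l_2 should equal L.
    l1 = (15.0**2 - 10.0**2) / (2 * 2.0)
    l2 = 15.0 * (60.0 - t1)
    print(t0, t1, l1 + l2)
```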
Similarly, as shown in Equation (5), it is possible to calculate the time (defined as $t_{iend}^{0}$) at which the leading vehicle must enter the control area in order to arrive at the intersection precisely at the end of the green light.

$$
t_{iend}^{0} = t_{iend}^{2} - \frac{L}{V_{max}} + \frac{V_{max}^2 - (V_{iend}^{0})^2}{2a_{max}V_{max}} - \frac{V_{max} - V_{iend}^{0}}{a_{max}} \tag{5}
$$

Thus, the two-stage speed guidance control method is applicable to a leading vehicle that enters the control area within the interval $[t_{i1}^{0}, t_{iend}^{0}]$.
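The resulting rule, two-stage guidance inside the entry window and four-stage guidance otherwise, can be sketched as a small classifier. The helper names (`entry_time`, `two_stage_window`, `guidance_mode`) and the example values are hypothetical; treating the interval as closed at both ends is also an assumption.

```python
from typing import Tuple


def entry_time(t_arrive: float, L: float, v0: float, v_max: float, a_max: float) -> float:
    """Entry time that yields arrival at t_arrive when accelerating straight to v_max."""
    return (
        t_arrive
        - L / v_max
        + (v_max**2 - v0**2) / (2 * a_max * v_max)
        - (v_max - v0) / a_max
    )


def two_stage_window(
    green_start: float,
    green_end: float,
    L: float,
    v0_first: float,
    v0_last: float,
    v_max: float,
    a_max: float,
) -> Tuple[float, float]:
    """Interval of entry times for which the two-stage method applies.

    The lower bound follows Equation (4), the upper bound Equation (5).
    """
    return (
        entry_time(green_start, L, v0_first, v_max, a_max),
        entry_time(green_end, L, v0_last, v_max, a_max),
    )


def guidance_mode(t_entry: float, window: Tuple[float, float]) -> str:
    """Pick the guidance method for a vehicle entering at t_entry."""
    low, high = window
    return "two-stage" if low <= t_entry <= high else "four-stage"


if __name__ == "__main__":
    window = two_stage_window(60.0, 85.0, 300.0, 10.0, 10.0, 15.0, 2.0)
    print(window)
    print(guidance_mode(50.0, window), guidance_mode(70.0, window))
```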
(b) Four-stage method
If the two-stage calculations indicate that the leading vehicle entering the control area would encounter a red-light queue at the downstream intersection, the four-stage method is employed to guide the vehicle through the intersection without stopping, which reduces the average delay and ensures that the vehicle reaches Vmax before entering the intersection. The speed guidance diagram is shown in Figure 5.
The objective of this paper is to identify the optimal speed control curve for the vehicle, which should result in the minimum time required for uniform speed change and the maximum guiding vehicle speed. The four-stage speed control curve displays a relatively high degree of flexibility. The optimal speed control curve is calculated as follows.
The time range for the leading vehicle to enter the control area applicable to the four-stage method can be calculated by determining the start and end times of the green light at the downstream intersection. When the leading vehicle arrives at the intersection exactly at the start time of the green light, the time (defined as  T i 1 0 ) at which the vehicle enters the control area can be calculated by Equations (6) and (7). As shown in Figure 5, Equation (6) can be obtained from the relationship between speed and acceleration.
V i 1 = V i 1 0 + a m a x ( T i 1 1 T i 1 0 ) V m a x = V i 1 + a m a x ( T i 1 3 T i 1 2 )
The total displacement within the control range (defined as L) is measured when the vehicle is in the four-phase method, which can be calculated by Equation (7).
L 1 = ( V i 1 ) 2 ( V i 1 0 ) 2 2 a m a x L 2 = V i 1 ( T i 1 2 T i 1 1 ) L 3 = V m a x 2 V i 1 2 2 a m a x L 4 = V m a x T i 1 4 T i 1 3 L = L 1 + L 2 + L 3 + L 4
Similarly, from Figure 5, the time relationship for each stage in the four-stage method is shown in Equation (8).
$$\begin{cases} T_{ij}^{1} - T_{ij}^{0} = T_1 \\ T_{ij}^{2} - T_{ij}^{1} = T_2 \\ T_{ij}^{3} - T_{ij}^{2} = T_3 \\ T_{ij}^{4} - T_{ij}^{3} = T_4 \\ T_{ij}^{4} - T_{ij}^{0} = T \end{cases} \tag{8}$$
T3 and Vi1 are obtained from Equations (6)–(8), as shown in Equation (9).
$$\begin{cases} T_3 = \dfrac{V_{\max} - V_{i1}^{0} - a_{\max}T_1}{a_{\max}} \\ V_{i1} = V_{i1}^{0} + a_{\max}T_1 \end{cases} \tag{9}$$
The objective function is given in Equation (10).
$$\min F = \frac{T_1}{T} + \frac{V_{\max} - V_{i1}^{0} - a_{\max}T_1}{a_{\max}T} + \frac{V_{\max}}{V_{i1}^{0} + a_{\max}T_1} \tag{10}$$
The appropriate T1 is calculated by Equation (10), after which the speed control curve of the leading vehicle is determined.
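The relationships in Equations (6)–(9) can be illustrated with a minimal Python sketch that builds a four-stage profile for a chosen T1. The function name `four_stage_profile`, the `FourStageProfile` container, and the numeric inputs are hypothetical. The closed-form expression for T2 is not given in the text; it is derived here from L = L1 + L2 + L3 + L4 and T = T1 + T2 + T3 + T4, which is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class FourStageProfile:
    v_guide: float   # V_i1, speed reached after the first acceleration stage
    t1: float        # duration of the first acceleration stage
    t2: float        # duration of the cruise stage at v_guide
    t3: float        # duration of the second acceleration stage
    t4: float        # duration of the cruise stage at v_max
    lengths: tuple   # (L1, L2, L3, L4)


def four_stage_profile(v0, v_max, a_max, length, total_time, t1):
    """Build a four-stage profile from Equations (6)-(9) for a given T1.

    T2 is derived from the distance and time balances (assumption of this sketch).
    """
    v_guide = v0 + a_max * t1                      # Eq. (9)
    t3 = (v_max - v0 - a_max * t1) / a_max         # Eq. (9)
    if not (v0 <= v_guide < v_max):
        raise ValueError("t1 must give v0 <= V_i1 < v_max")
    l1 = (v_guide ** 2 - v0 ** 2) / (2 * a_max)    # Eq. (7)
    l3 = (v_max ** 2 - v_guide ** 2) / (2 * a_max) # Eq. (7)
    # The remaining distance and time are shared by the two cruise stages.
    t2 = (l1 + l3 + v_max * (total_time - t1 - t3) - length) / (v_max - v_guide)
    t4 = total_time - t1 - t2 - t3
    if t2 < 0 or t4 < 0:
        raise ValueError("no feasible four-stage profile for this t1")
    return FourStageProfile(v_guide, t1, t2, t3, t4,
                            (l1, v_guide * t2, l3, v_max * t4))


# Illustrative values only (SI units); they are not taken from the paper.
profile = four_stage_profile(v0=10.0, v_max=15.0, a_max=1.0,
                             length=407.5, total_time=30.0, t1=2.0)
print(profile)
```

A search over T1 (for example, a simple grid) could call this function and keep the feasible T1 that minimizes the objective in Equation (10).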
Step 2: Calculate the speed control curve of subsequent vehicles.
The speed control curve of the subsequent vehicles can be computed sequentially, following the determination of the speed control curve of the leading vehicle. The ensuing discussion will address these two cases in turn, with reference to the aforementioned leading vehicle.
(a)
Two-stage method
If the leading vehicle can pass through the intersection smoothly using the two-stage method, subsequent vehicles in the same optimization cycle should use this method as well. As calculated above, the two-stage method applies to subsequent vehicles that enter the control area within the interval $[t_{i1}^{0}, t_{iend}^{0}]$.
(b)
Four-stage method
When the leading vehicle passes through the intersection smoothly via the four-stage method, subsequent vehicles do not automatically adopt this method. Instead, when a subsequent vehicle enters the control area, the speed control curves for both the two-stage and four-stage methods are calculated simultaneously. The speed control calculation for subsequent vehicles is similar to that of the leading vehicle. However, in contrast to the leading vehicle, subsequent vehicles must maintain a safe driving distance from the vehicle in front while travelling. This requirement is expressed by Equation (11), in which the gap to the preceding vehicle must not fall below the safe distance S.
$$\Delta S(t) = S_{i(j-1)}(t) - S_{ij}(t) \ge S \tag{11}$$
If subsequent vehicles can pass through the intersection smoothly via the two-stage method, then the method will be adopted. Otherwise, the four-stage method will be used.
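The selection rule for subsequent vehicles can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function names, the sampled position lists, and the `"defer"` outcome (standing for a vehicle that is guided in the next optimization cycle) are assumptions introduced here.

```python
def gap_is_safe(leader_positions, follower_positions, s_min):
    """Check the safe-distance condition of Equation (11) at each sampled time."""
    return all(lead - follow >= s_min
               for lead, follow in zip(leader_positions, follower_positions))


def choose_method(leader_positions, two_stage_positions, four_stage_positions,
                  two_stage_clears_green, s_min):
    """Prefer the two-stage curve and fall back to the four-stage curve.

    Positions are distances travelled along the link, sampled at the same
    time steps for the preceding vehicle and for each candidate curve.
    """
    if two_stage_clears_green and gap_is_safe(
            leader_positions, two_stage_positions, s_min):
        return "two-stage"
    if gap_is_safe(leader_positions, four_stage_positions, s_min):
        return "four-stage"
    return "defer"  # hypothetical label: handled in the next optimization cycle


# Illustrative trajectories (metres); the two-stage curve closes the gap to 4 m.
leader = [20.0, 30.0, 40.0]
two_stage = [10.0, 22.0, 36.0]
four_stage = [10.0, 20.0, 30.0]
decision = choose_method(leader, two_stage, four_stage,
                         two_stage_clears_green=True, s_min=5.0)
print(decision)
```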
Step 3: Determine whether the vehicle can leave the intersection before the green light ends.
If calculations show that a vehicle entering the control area will encounter a red-light queue upon reaching the downstream intersection, it cannot leave the intersection within the current green time. To minimize the average delay time and ensure that the vehicle reaches Vmax before entering the intersection, the vehicle is guided to pass through the intersection without stopping in the next optimization cycle. The vehicle thus becomes the leading vehicle of the subsequent optimization cycle, as shown in Figure 6.
Step 4: Calculate the traffic shockwave under a connected vehicle environment.
Step 4.1: Calculate the traffic gathering wave.
When the traffic signal turns red, the leading vehicle begins to stop, and subsequent vehicles adjust their speed and trajectory in accordance with the leading vehicle. The vehicle establishes a communication connection with the roadside unit and on-board unit. At this point, the on-board unit (OBU) transmits a continuous stream of requests to the roadside unit (RSU) for the real-time intersection group signal timing schemes and the remaining red light time of the current phase. This paper assumes a communication delay (defined as tdelay) between vehicles and infrastructure in a connected vehicle environment.
The time for the gathering wave to travel to the jth vehicle (defined as Tj_gather) can be calculated from Equation (12).
$$T_{j\_gather} = \sigma \times \frac{e^{\varsigma} - e^{\varsigma j}}{\varsigma} \times \frac{k_2 - k_1}{k_1 k_2 V_{ij}} \tag{12}$$
where $\varsigma$ and $\sigma$ are constants to be calibrated based on the traffic data, k1 is the density of the traffic flow under normal travel conditions, and k2 is the congestion density of the traffic flow.
Step 4.2: Calculate the traffic dissipation wave.
When the traffic signal turns green, vehicles that were stopped in front of the stop line begin to move and accelerate through the intersection. The dissipation wave then travels backwards through the queue, and a vehicle responds to it after the time tdelay_1. The vehicle accelerates quickly until it reaches its maximum speed and then passes through the intersection at a constant speed. The time it takes for the dissipation wave to reach the jth vehicle at this stage (defined as Tj_dissipat) can be calculated using Equation (13).
$$T_{j\_dissipat} = \sigma \times \frac{e^{\varsigma} - e^{\varsigma j}}{\varsigma} \times \left( h_t - \frac{1}{k_2 V_{\max}} \right) \tag{13}$$
Equation (14) calculates the red-light time for the Ith phase (covering the phases in all four directions of each intersection, i.e., the eastward, westward, southward, and northward phases).
$$R_t = C - L_I - g_I \tag{14}$$
where gI represents the effective green light time of the Ith phase, C is the cycle time, and LI is the lost time of the Ith phase at the intersection. The queue time for the signal control scheme at the Ith phase is Tj_n, which is calculated by Equation (15).
$$T_{j\_n} = \left( R_t - t_{delay} \right) - T_{j\_gather} + T_{j\_dissipat} \tag{15}$$
Substituting Equations (12)–(14) into Equation (15) yields Equation (16).
$$T_{j\_n} = C - L_I - g_I - t_{delay} - \sigma \times \frac{e^{\varsigma} - e^{\varsigma j}}{\varsigma} \times \left( \frac{k_2 - k_1}{k_1 k_2 V_{ij}} - h_t + \frac{1}{k_2 V_{\max}} \right) \tag{16}$$
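As a small numerical illustration of Equations (14) and (15), the sketch below computes the queue time from the signal timing and from wave travel times that are assumed to be already known. The function names and the input values are hypothetical, and the gathering and dissipation times are passed in directly rather than evaluated from Equations (12) and (13).

```python
def red_time(cycle, lost_time, effective_green):
    """Red-light time of a phase, Equation (14)."""
    return cycle - lost_time - effective_green


def queue_time(cycle, lost_time, effective_green, t_delay, t_gather, t_dissipate):
    """Queue time of the jth vehicle, Equation (15)."""
    r_t = red_time(cycle, lost_time, effective_green)
    return (r_t - t_delay) - t_gather + t_dissipate


# Illustrative values in seconds; they are not taken from the paper.
r_t = red_time(cycle=120.0, lost_time=5.0, effective_green=45.0)
t_queue = queue_time(cycle=120.0, lost_time=5.0, effective_green=45.0,
                     t_delay=0.5, t_gather=20.0, t_dissipate=8.0)
print(r_t, t_queue)
```

A positive result corresponds to the constraint on Tj_n that appears later in the collaborative model.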
Step 5: Calculate an evaluation index for combination of the green wave and the turn wave.
The traffic state within the green wave is quantified through the measurement of the speed and trajectory of connected vehicles, with the vehicle origin–destination (OD) trip matrix subsequently recorded.
The variables m and k denote the IDs of the green wave sections at which traffic flow (going straight, turning right, or turning left) enters and leaves, respectively (i.e., 1 < m < n, 1 < k < n + 1). As the arterial green wave is bidirectional, m ≠ k. Nm,k represents the number of vehicles travelling from section m to section k, whether stopping or continuing their journey (including the turn wave).
Define the vehicle origin–destination (OD) matrix. When m − k < 2, the vehicle does not enter the arterial green wave coordination.
When m − k ≥ 2, the vehicle enters the arterial green wave coordination. Additionally, the number of vehicles entering the green wave road section from section m is counted by Equation (17).
$$N_m = \sum_{m=k}^{n+1} N_{m,k} \tag{17}$$
The evaluation index using the number of vehicles OD trip matrix is calculated as follows.
Step 5.1: Design the actual traffic efficiency value for the green wave.
The efficiency of the green wave coordination is measured by the actual traffic efficiency value, defined as IR. This value represents the number of intersections that a vehicle (going straight, turning right, or turning left) passes through continuously under the green wave coordination and control scheme. For a single vehicle, IR is the objective score of that vehicle in the arterial green wave in its actual operational state. For all vehicles, IR is based on the product of the number of vehicles and the number of intersections they pass through, as given in Equation (18):
$$I_R = \sum_{m=1}^{n-1} \sum_{k=1}^{n+1} N_{m,k+2} \times \left( m - k + 2 \right) \tag{18}$$
Step 5.2: Design the ideal value for green wave traffic efficiency.
In order to enter the green wave, it is optimal for vehicles (including those travelling straight and turning right or left) to depart from the first encountered green light intersection and drive away from the arterial green wave through the various intersections that have green lights. The optimal number of consecutive intersections through the green wave (Il) can be determined by combining it with the vehicle OD trip matrix. This represents the green wave control effect of the ideal state of operation. The product of the departure point of all OD trips is multiplied by Il to obtain the objective scoring value of the arterial green wave for the ideal number of consecutive intersections through. For all vehicles, the value of Il is calculated using Equation (19).
$$I_l = \sum_{m=1}^{n-1} N_m \times \left( n + 1 - m \right) \tag{19}$$
Step 5.3: Design the disturbance value for green wave traffic efficiency.
The calculation of the optimal traffic efficiency value for the green wave excludes vehicles that do not enter the arterial green wave coordination from the statistical range. This guarantees that the evaluation outcomes of the vehicle coordination are consistent, even in circumstances where there are a considerable number of vehicles that do not adhere to the arterial green wave coordination. It is imperative to circumvent such scenarios to ensure the integrity of the study’s findings. In addition, it is acknowledged that different green wave coordination schemes possess control scenarios that are not conducive to repetition in practical applications. In order to ensure the objectivity of the evaluation results, the definition of the green wave incorporates a perturbation value. This value is of particular significance in coordinating the green wave on road sections with varying traffic volumes and intersection patterns. The objective scoring methodology considers only vehicles that are part of the green wave coordination, represented by a single vehicle (defined as ID). The scoring value for vehicles not included in the green wave coordination is defined as 1. The equation for all vehicle ID is calculated as follows:
$$I_D = \sum_{m=1}^{n} N_{m,k} \times 1 + \sum_{m=1}^{n+1} N_{m,k+1} \times 1 \tag{20}$$
Step 5.4: Design the evaluation index for green wave coordination.
The evaluation of green wave coordination is based on the actual traffic efficiency value. The size of the arterial green wave disturbance value is taken into consideration to calculate the degree of approximation of the value in question in comparison to the ideal operating state of the green wave. The evaluation index (defined as IE) for green wave coordination is calculated using Equation (21).
$$I_E = \frac{I_R}{I_l - I_D} \tag{21}$$
The traffic state of the arterial green wave is defined as S, and the traffic state of the ideal green wave is defined as SI. When the traffic state of the arterial green wave approaches the ideal condition, the impact of disturbances on the green wave becomes negligible ($\lim_{S \to S_I} I_D = 0$), and the actual efficiency value IR approaches the ideal efficiency value Il ($\lim_{S \to S_I} I_R = I_l$). Therefore, the maximum value of the green wave coordinated evaluation index represents the ideal state ($\lim_{S \to S_I} I_E = \lim_{S \to S_I} \frac{I_R}{I_l - I_D} = \lim_{S \to S_I} \frac{I_l}{I_l} = 1$).
To evaluate the effectiveness of bidirectional green wave coordinated controls, we use the evaluation index of green wave coordination (IE). The higher the value of IE, the better the green wave control. If $|I_{E1} - I_{E2}| \le \varepsilon$ for two schemes (where ε is an arbitrarily small positive number), the values of IR are compared. The higher the value of IR, the better the green wave control. If $|I_{R1} - I_{R2}| \le \varepsilon$, the values of ID are further compared. The higher the value of ID, the better the effect of the green wave control.
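The ideal efficiency value, the evaluation index, and the tie-breaking comparison can be sketched in Python as follows. The sketch assumes that Equation (19) weights the entry count Nm by (n + 1 − m) and that Equation (21) takes the form IR / (Il − ID). The function names, the dictionary keys, and all numbers are hypothetical.

```python
def ideal_efficiency(entries, n):
    """I_l as in Equation (19); entries[m - 1] holds N_m for m = 1..n-1."""
    return sum(n_m * (n + 1 - m) for m, n_m in enumerate(entries, start=1))


def evaluation_index(i_r, i_l, i_d):
    """I_E as in Equation (21), assuming the form I_R / (I_l - I_D)."""
    return i_r / (i_l - i_d)


def better_scheme(a, b, eps=1e-6):
    """Compare two schemes by I_E, then I_R, then I_D (higher is better)."""
    for key in ("I_E", "I_R", "I_D"):
        if abs(a[key] - b[key]) > eps:
            return a if a[key] > b[key] else b
    return a  # indistinguishable within eps


# Illustrative counts for n = 4 sections; they are not taken from the paper.
i_l = ideal_efficiency([30, 20, 10], n=4)
scheme_a = {"name": "A", "I_R": 150.0, "I_D": 20.0}
scheme_b = {"name": "B", "I_R": 144.0, "I_D": 20.0}
for scheme in (scheme_a, scheme_b):
    scheme["I_E"] = evaluation_index(scheme["I_R"], i_l, scheme["I_D"])
best = better_scheme(scheme_a, scheme_b)
print(i_l, best["name"], best["I_E"])
```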

3.2.2. Constructing the Collaborative Model Under a Connected Vehicle Environment

(a)
Correlation analysis of speed, signal offset and green wave bandwidth
In a connected vehicle environment, real-time information about the vehicles is obtained, including speed, position, vehicle spacing, and distance. Subsequently, the average speed of each pertinent roadway is determined through the application of Equation (22) following the completion of data pre-processing.
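$$\bar{V} = \frac{\sum_{j=1}^{n} v_j}{n} \quad \text{s.t.} \quad \begin{cases} V_{ij}^{end} = V_{\max} \\ t_{ij}^{end} = t_{i1}^{end} + \sum_{n=2}^{j} h_{in} \\ l_{ij}^{end} = l_{i1}^{end} \\ h_{ij} \ge \dfrac{S_0 + len_{i(j-1)}}{V_{ij}^{end}} \ge h_t \\ v_i(t) \le V_{\max}, \; t \in \left[ t_{i1}^{0}, t_{iend}^{0} \right] \\ v_i(t) \ge V_{\min}, \; t \in \left[ t_{i1}^{0}, t_{iend}^{0} \right] \end{cases} \tag{22}$$
where $\bar{V}$ is the average speed, j is the index of the jth vehicle, and n is the total number of vehicles.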
The speed of the vehicle affects the optimization of phase offset and the calculation of the bidirectional cycle green wave bandwidth to a certain extent. During green wave coordinated control, variations in phase offset between intersections can result in a shift in the green wave bandwidth, which in turn affects the speed of the guided vehicle.
(b)
Bidirectional cycle green wave bandwidth optimization
The control objective expression for the maximum number of vehicles passing in the green wave bandwidth time, as set out in Equations (23)–(25), was designed on the understanding that the intersection coordinated phase is of great importance for the release of traffic in different directions.
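$$P = \sum_{m=1}^{N} \lambda_{mz} \int_{b_{mzs}}^{b_{mze}} q_{mzd}(t)\,dt \int_{t_{iend}^{0}}^{t_{i1}^{0}} a_{mz}(t)\,dt - \sum_{m=1}^{N} \lambda_{mf} \int_{b_{mfs}}^{b_{mfe}} q_{mfd}(t)\,dt \int_{t_{iend}^{0}}^{t_{i1}^{0}} a_{mf}(t)\,dt \tag{23}$$
$$H = \sum_{m=1}^{N} \lambda_{mz} \int_{b_{mzs}}^{b_{mze}} q_{mzd}(t)\,dt \int_{t_{iend}^{0}}^{t_{i1}^{0}} a_{mz}(t)\,dt + \sum_{m=1}^{N} \lambda_{mf} \int_{b_{mfs}}^{b_{mfe}} q_{mfd}(t)\,dt \int_{t_{iend}^{0}}^{t_{i1}^{0}} a_{mf}(t)\,dt \tag{24}$$
$$\min B_1 = \min \frac{P}{H} \tag{25}$$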
where P represents the control objective expression of forward green wave bandwidth, and H represents the control objective expression of reverse green wave bandwidth. bmzs and bmze denote the start-time and end-time points of the forward green wave bandwidth at the mth intersection. bmfs and bmfe denote the start-time and end-time points of the reverse green wave bandwidth at the mth intersection. qmzd and qmfd indicate the arrival rates of the forward and reverse traffic flows at the mth intersection. amz and amf denote the accelerations of the forward and reverse traffic flows. λmz and λmf are the passing coefficients of the forward and reverse traffic flows at the mth intersection. N is the number of intersections.
The discretized control objective is calculated by Equations (26)–(28).
$$p = \sum_{m=1}^{N} \lambda_{mz} \sum_{i=1}^{z} q_{mzd}^{b_{mzs}}(i) \sum_{i=1}^{z} v_{mz}^{t_{i1}^{0}}(i) - \sum_{m=1}^{N} \lambda_{mf} \sum_{i=1}^{f} q_{mfd}^{b_{mfs}}(i) \sum_{i=1}^{f} v_{mf}^{t_{i1}^{0}}(i) \tag{26}$$
$$h = \sum_{m=1}^{N} \lambda_{mz} \sum_{i=1}^{z} q_{mzd}^{b_{mzs}}(i) \sum_{i=1}^{z} v_{mz}^{t_{i1}^{0}}(i) + \sum_{m=1}^{N} \lambda_{mf} \sum_{i=1}^{f} q_{mfd}^{b_{mfs}}(i) \sum_{i=1}^{f} v_{mf}^{t_{i1}^{0}}(i) \tag{27}$$
$$\min B_1 = \min \frac{p}{h} \tag{28}$$
where z is the number of time slots in the forward green wave bandwidth, and f is the number of time slots in the reverse green wave bandwidth. $q_{mzd}^{b_{mzs}}(i)$ denotes the arrival rate of traffic at the mth intersection with green wave start time bmzs in the forward direction for the ith period. $q_{mfd}^{b_{mfs}}(i)$ denotes the arrival rate of traffic at the mth intersection with green wave start time bmfs in the reverse direction for the ith period.
The optimization goal is to balance bidirectional green wave bandwidths by minimizing the disparity in vehicle throughput between forward and reverse directions. Furthermore, the maximum number of vehicles passing through the overall green wave bandwidth is obtained by summing the number of vehicles passing through both the forward and reverse green wave bandwidths. However, the variation in demand for bidirectional traffic control is indicated by the inclusion of traffic passing coefficients, which demonstrate the varying levels of significance of bidirectional traffic flow. In practical applications, the traffic passing coefficients of each intersection can be set according to the actual traffic demand, as observed through empirical data. To illustrate, the disparate control requirements of outbound and inbound traffic on the corresponding arterials during the morning and evening peaks can be incorporated. This can be achieved by dividing the bidirectional traffic flow data into different time periods and designing the corresponding bidirectional traffic passing coefficients for each period. It is necessary that the coefficients satisfy the relationship shown in Equation (29).
$$\lambda_{mz} + \lambda_{mf} = 1 \tag{29}$$
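The discretized objective in Equations (26)–(29) can be illustrated with a short Python sketch. It assumes the reading in which p is the weighted difference and h the weighted sum of the forward and reverse terms, with each term being the product of summed arrival rates and summed speeds over the time slots. The function name, the dictionary keys, and the data are hypothetical.

```python
def bandwidth_balance(intersections):
    """Return (p, h, p / h) for the discretized objective, Equations (26)-(28).

    Each intersection is a dict with passing coefficients lam_z and lam_f,
    per-slot arrival rates q_z and q_f, and per-slot speeds v_z and v_f.
    """
    p = 0.0
    h = 0.0
    for x in intersections:
        # Equation (29): the two passing coefficients must sum to one.
        if abs(x["lam_z"] + x["lam_f"] - 1.0) > 1e-9:
            raise ValueError("lam_z + lam_f must equal 1")
        forward = x["lam_z"] * sum(x["q_z"]) * sum(x["v_z"])
        reverse = x["lam_f"] * sum(x["q_f"]) * sum(x["v_f"])
        p += forward - reverse
        h += forward + reverse
    return p, h, p / h


# Illustrative single-intersection data; they are not taken from the paper.
data = [{"lam_z": 0.6, "lam_f": 0.4,
         "q_z": [2.0, 3.0], "v_z": [10.0, 10.0],
         "q_f": [1.0, 2.0], "v_f": [10.0, 15.0]}]
p, h, ratio = bandwidth_balance(data)
print(p, h, ratio)
```

Setting different coefficients for the morning and evening peaks, as described above, would amount to calling the function with period-specific `lam_z` and `lam_f` values.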
(c)
Model establishment
The conventional approach to determining the green wave bandwidth is to calculate the duration of the green wave band based on the spatiotemporal trajectories of the first and last vehicles that pass through the coordinated control system without stopping along the coordinated control direction. However, the proposed method actively guides the vehicle within a specified range prior to reaching the stop line at the intersection. Therefore, the traditional method of obtaining the green wave bandwidth is not applicable to the present study. To optimize the bidirectional comprehensive green wave bandwidth, we use the green wave bandwidth optimization design software V1.0. Based on the multi-level speed guidance collaborative model, it is employed to optimize the phase offsets between the intersections.
In arterial green wave coordination control, the green wave bandwidth is typically modified in response to alterations in the phase offset. Consequently, the process of identifying the optimal green wave bandwidth for bidirectional cycle coordination requires the identification of the most effective combination of phase offsets between intersections. The variable of interest is the phase offset of each intersection, which ranges from 0 to C, where C represents the cycle time. The objective of the optimization process is to identify the maximum value for the green wave bandwidth, denoted by maxb. The DQNGA algorithm offers a significant advantage in the resolution of combinatorial optimization problems. The model considers the importance of traffic release in different phases of directional coordination at each associated intersection. Accordingly, the phase offsets between the associated intersections are calculated in order to obtain the optimal comprehensive green wave bandwidth for bidirectional cycles. The objective of controlling the maximum number of vehicles passing continuously within the comprehensive green wave bandwidth has been successfully achieved.
The control variables for this study are the vehicle guidance speed and the bidirectional cycle green wave coordinated phase offset. The objective of constructing the collaborative optimization model is to achieve the maximum average vehicle speed and the bidirectional cycle comprehensive green wave bandwidth. The objective function is presented in Equation (30).
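$$\begin{aligned} \max Z &= \psi \bar{V} + \zeta b \\ \text{s.t.} \quad & V_{ij}^{end} = V_{\max} \\ & t_{ij}^{end} = t_{i1}^{end} + \sum_{n=2}^{j} h_{in} \\ & l_{ij}^{end} = l_{i1}^{end} \\ & h_{ij} \ge \frac{S_0 + len_{i(j-1)}}{V_{ij}^{end}} \ge h_t \\ & v_i(t) \le V_{\max}, \; t \in \left[ t_{i1}^{0}, t_{iend}^{0} \right] \\ & v_i(t) \ge V_{\min}, \; t \in \left[ t_{i1}^{0}, t_{iend}^{0} \right] \\ & \min \frac{p}{h} \le b \le \max b \\ & g = \min \left\{ \max \left( g_{m-1,m}, g_{m,m+1} \right), g_{\max} \right\} \\ & g_{m-1,m}^{\min} \le g_{m-1,m} = \frac{Len_{m-1,m}^{opt}}{v_{m-1,m}^{pass}} + \frac{n_{m-1,m}^{pass}}{n} t_{m-1,m} \le g_{m-1,m}^{\max} \\ & g_{m,m+1}^{\min} \le g_{m,m+1} = \frac{Len_{m,m+1}^{opt}}{v_{m,m+1}^{pass}} + \frac{n_{m,m+1}^{pass}}{n} t_{m,m+1} \le g_{m,m+1}^{\max} \\ & g' = \left( C - g - y \right) \frac{q_{z\,or\,f}^{\max}}{\sum_{I=1}^{s} q_{z\,or\,f}^{\max}} \\ & g'_{\min} \le g' \le g'_{\max} \\ & T_{j\_n} > 0 \\ & t_{delay}^{\min} \le t_{delay} \le t_{delay}^{\max} \end{aligned} \tag{30}$$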
where Z is the total objective function, and ψ and ζ are the weights of the average vehicle speed and the comprehensive green wave bandwidth, respectively. If there is no queue at the stop line, the vehicle fleet passes through the stop line at a speed defined as $v_{m-1,m}^{pass}$. gm−1,m is the required green time from the (m − 1)th intersection through the mth intersection, and gm,m+1 is the required green time from the mth intersection through the (m + 1)th intersection. $Len_{m-1,m}^{opt}$ is the length of the vehicle fleet from the (m − 1)th intersection through the mth intersection, and $Len_{m,m+1}^{opt}$ is the length of the vehicle fleet from the mth intersection through the (m + 1)th intersection. tm−1,m is the time when the vehicle enters the control area from the (m − 1)th intersection through the mth intersection, and tm,m+1 is the time when the vehicle enters the control area from the mth intersection through the (m + 1)th intersection. $n_{m-1,m}^{pass}$ is the number of vehicles passing the mth intersection continuously without stopping from the (m − 1)th intersection, and $n_{m,m+1}^{pass}$ is the number of vehicles passing the (m + 1)th intersection continuously without stopping from the mth intersection. n is the total number of vehicles. g is the green time of the coordinated direction, gmax is the maximum green time, and g′ is the green time of the uncoordinated direction. C is the cycle time. $q_{z\,or\,f}^{\max}$ is the maximum single-lane traffic flow of the forward or reverse direction. I is the phase index, and s is the total number of phases. y is the total lost time, including the amber time and the start-up lost time of all phases. gm−1,m, gm,m+1, and g′ each satisfy their corresponding maximum and minimum green time constraints.
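The green-time relations and the weighted objective in Equation (30) can be illustrated with the following Python sketch. It assumes the reading in which the required green time is the fleet clearance time plus a share of the entry time, the coordinated green is capped at gmax, and the remaining cycle time is split among uncoordinated phases in proportion to their maximum single-lane flows. All function names and numbers are hypothetical.

```python
def required_green(fleet_length, pass_speed, n_pass, n_total, entry_time):
    """Required green time between two adjacent intersections."""
    return fleet_length / pass_speed + (n_pass / n_total) * entry_time


def coordinated_green(g_upstream, g_downstream, g_max):
    """Coordinated green: the larger requirement, capped at g_max."""
    return min(max(g_upstream, g_downstream), g_max)


def uncoordinated_green(cycle, g, lost_time, q_phase, q_all):
    """Share of the remaining cycle time, proportional to critical flow."""
    return (cycle - g - lost_time) * q_phase / sum(q_all)


def objective(avg_speed, bandwidth, psi, zeta):
    """Weighted objective Z = psi * average speed + zeta * bandwidth."""
    return psi * avg_speed + zeta * bandwidth


# Illustrative values (SI units, veh/h for flows); not taken from the paper.
g_up = required_green(fleet_length=120.0, pass_speed=12.0,
                      n_pass=8, n_total=10, entry_time=5.0)
g = coordinated_green(g_up, g_downstream=30.0, g_max=25.0)
g_side = uncoordinated_green(cycle=100.0, g=g, lost_time=15.0,
                             q_phase=600.0, q_all=[600.0, 400.0, 200.0])
z_value = objective(avg_speed=12.5, bandwidth=40.0, psi=0.5, zeta=0.5)
print(g_up, g, g_side, z_value)
```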

3.2.3. Calculation Method of the Collaborative Model

The objective of the present study is to ascertain the impact of phase offset, or red-light queue dissipation time at downstream intersections, on the green wave bandwidth. The number of vehicles passing in the green wave bandwidth with different phase offsets is calculated, and a suitable phase offset is selected with the objective of maximizing the number of vehicles passing in the comprehensive green wave bandwidth. In the context of green wave coordinated control, it is assumed that no secondary queuing occurs. The vehicles are assumed to travel between the upstream and downstream intersections in accordance with the illustration in Figure 7.
In order to achieve the objective of maximizing the number of vehicles passing through the green wave coordinated bandwidth, it is necessary to calculate Tgb and Tge when the phase offset is determined. To facilitate the description of the calculation method, the definitions of the variables are presented in Table 2.
Step 1: The green time of each phase at the upstream intersection is divided according to the different traffic flows (e.g., qg1, qg2, qr1 and qr2 in Figure 7), and the traffic flows and the corresponding average speed of vehicles in each time period are determined (e.g., v1, v2, v3 and v4 in Figure 7).
Step 2: At the determined phase offset, virtual driving lines are plotted from upstream intersection i to downstream intersection i + 1, using the average speed corresponding to each period of the green time of the coordinated phase at upstream intersection i.
Step 3: Search for the driving line that encounters the red time at downstream intersection i + 1 for the first time in the above virtual driving line. Mark the starting time of the red light at the downstream intersection i + 1 corresponding to this driving line as t1,i+1, and the green time at the upstream intersection i corresponding to this driving line as tgbi.
Step 4: The driving lines corresponding to each time point are plotted separately for one cycle after starting from tgbi corresponding to the upstream intersection i. The driving lines at each time point are also plotted as they encounter the gather–disperse wave speed line at the downstream intersection i + 1 to determine tm,i+1.
Step 5: Determine whether the driving line passes through the interval enclosed by tm,i+1 during each period of the green time of upstream intersection i. If it does not, record the green time Tgb,i at upstream intersection i corresponding to the first such line and the green time Tge,i at upstream intersection i corresponding to the last such line. Likewise, record the green time $T_{gb,i+1}^{c}$ of the first line and the green time $T_{ge,i+1}^{c}$ of the last line at downstream intersection i + 1.
Step 6: Obtain the final $T_{gb,i}$ and $T_{ge,i}$ of the N intersections corresponding to the common green wave band within the coordinated range. Taking intersection i + 1 as an example, if $T_{gb,i+1}^{c} > T_{gb,i+1}$, then $T_{gb,i+1}^{c} - T_{gb,i+1} = 0$. If $T_{ge,i+1}^{c} < T_{ge,i+1}$, then $T_{ge,i+1}^{c} - T_{ge,i+1} = 0$. On this basis, $[T_{gb,i+1}, T_{ge,i+1}]$ is taken as the effective green time for green wave coordinated control at intersection i + 1, and Step 5 is executed again to obtain the updated $T_{gb,i+1}$ and $T_{ge,i+1}$, together with $T_{gb,i+1}^{c}$ and $T_{ge,i+1}^{c}$. This procedure is repeated until intersection N is reached.
Step 7: Backtrack from Tgb,N and Tge,N of intersection N and update Tgb,i+1 and Tge,i+1 (i = N − 1, N − 2, ..., 1) in turn.
When applying the proposed method for bidirectional cycle green wave coordinated control, it is important to note that the variables of each intersection, including qg1, qg2, qr1 and qr2, tg1 and tr1, may change due to variations in control parameters. Therefore, it is necessary to update the coordinated phase offset of the green wave to ensure effective control. Thus, the phase offset optimization calculation needs to be redone based on the updated parameter values. To solve this issue, this paper proposes a cyclic optimization method for the phase offset at control cycle intervals. The steps are outlined below.
Step 7.1: The traffic flow in the control cycle obtained from the connected vehicle environment before the implementation of the green wave is divided, and the values of qg1, qg2, qr1, qr2, tg1, tr1 are recorded.
Step 7.2: Determine the optimized phase offset.
Step 7.3: Green wave coordinated control is implemented.
Step 7.4: The traffic flow in the control cycle obtained from the connected vehicle environment after the implementation of the green wave is divided, and the values of qg1, qg2, qr1, qr2, tg1, tr1 are recorded.
Step 7.5: Determine whether the difference of the above-mentioned variables compared to the previous one satisfies the requirements. If so, the method is finished. Otherwise, go to Step 7.2.
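Steps 7.1–7.5 describe a fixed-point style loop, which can be sketched as below. The callables `measure`, `optimize_offset`, and `apply_offset`, the `tolerance` threshold, and the `max_rounds` safeguard are hypothetical placeholders introduced for illustration; the toy model at the end only demonstrates that the loop stops once the observed variables stabilize.

```python
def cyclic_offset_optimization(measure, optimize_offset, apply_offset,
                               tolerance, max_rounds=20):
    """Repeat offset optimization until the observed variables stabilize."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    previous = measure()                       # Step 7.1: record q and t values
    for _ in range(max_rounds):
        offset = optimize_offset(previous)     # Step 7.2: optimize the offset
        apply_offset(offset)                   # Step 7.3: implement the control
        current = measure()                    # Step 7.4: record values again
        change = max(abs(current[k] - previous[k]) for k in current)
        if change <= tolerance:                # Step 7.5: convergence check
            break
        previous = current
    return offset, current


# Toy stand-in for the traffic system: the observed flow settles towards 8.
state = {"q": 10.0}


def measure():
    return dict(state)


def optimize_offset(observation):
    return observation["q"] / 2.0


def apply_offset(offset):
    state["q"] = 0.5 * state["q"] + 4.0


final_offset, final_obs = cyclic_offset_optimization(
    measure, optimize_offset, apply_offset, tolerance=0.01)
print(final_offset, final_obs)
```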

3.3. Bi-Level Combinatorial Optimization Method for Model Solving

The bi-level combinatorial optimization method is implemented by gathering real-time signal timing data, vehicle travel information (such as speed and position), road traffic conditions, and other relevant data through the connected vehicle environment and traffic sensors. The data thus obtained are used to divide the collaborative combinatorial optimization control of the related intersection group into upper- and lower-layer control. The lower layer comprises multi-level vehicle speed guidance controllers at each intersection. Each controller employs a distinct learning strategy. The upper layer comprises bidirectional cycle green wave coordinated controllers, which adjust the temporary strategy of the lower layer primarily based on the feedback state of the lower layer. The upper- and lower-layer controllers jointly regulate the signals of the arterial roads within the study area. The bi-level combinatorial optimization method is illustrated in Figure 8. To enhance readability, a flowchart of the DQNGA algorithm described in this paper is shown in Figure 9.

3.3.1. The Lower-Layer Control

In contrast to static signal timing or heuristic approaches, the DQN-based framework adapts dynamically to fluctuations in traffic conditions, such as the influx of vehicles during phase transitions. It reduces the need for manual tuning through data-driven learning, and it achieves hierarchical coordination via global–local rewards, prioritizing arterial road efficiency without disrupting access roads. In a connected vehicle environment, an action is selected based on the state st and reward rt transmitted by the connected vehicles during a specified time period t. The action set At is used to select the appropriate speed to guide the vehicles at the intersection according to the current phase of the green time and the phase transition state [49]. The state st comprises the vehicle movement trajectories and their moving states in each segment of the related intersection groups, as transmitted by the connected vehicle environment. The reward rt is the feedback provided after an action at has been selected. It is based on several performance indexes, including the optimal guiding speed, the maximum average vehicle speed, and the maximum number of consecutive vehicles passing through the related intersection without stopping. Figure 10 shows the framework of the lower-layer control method. The DQN, which facilitates end-to-end learning, maps complex traffic states to optimal control actions, thereby overcoming the limitations of rule-based methods in dynamic environments. The discretization of control segments and multi-channel feature extraction reduce dimensionality while preserving lane-specific behaviors. The integration of the MDP with DRL ensures real-time adaptability while maintaining safety via phase transition rules.
The global–local synergy effect is a further key element in preventing local optima, by penalizing actions that harm arterial throughput (for example, by reducing additional rewards when local decisions conflict with global goals).
(a)
State space
In order to provide an accurate description of traffic conditions at arterial intersections, this study selects a number of parameters, including the operating state of vehicles in each direction of the intersection, their movement trajectories, and variations in signal phases, which are used as state inputs. Furthermore, the study area is divided into discrete units and modelled in order to accurately represent the specific distribution of position, speed and lane-change information for vehicles. The intersection groups are subdivided into discrete control segments, each of which is associated with a specific attention zone for connected vehicles. These segments are represented by a space characteristic matrix. The corresponding element in the space characteristic matrix is assigned a value of 1 when a vehicle is present in the free control segment of the lane, and a value of 0 otherwise.
The present study focuses on the distribution of vehicle positions and the associated lane-change information. A matrix is constructed to include the position of vehicles with multiple free control segments and the distance to the intersection. Moreover, the lane-change behavior of vehicles is also taken into account. This is determined based on the vehicles' movement trajectories and included in the vehicle lane-change characteristic matrix. In the event of a lane change, the element is recorded as 2; otherwise, it is recorded as 3.
In order to minimize computational effort, the speed of vehicles with multiple free control segments is stored in the speed characteristic matrix based on the vehicle operating state information detected by detectors. For each free control segment, the single-channel convolutional Q network detects the vehicle operating state information, including the position and speed of vehicles, during the time interval t. The corresponding elements are then recorded in the speed characteristic matrix. If no vehicle is detected in a segment, the corresponding element of the speed characteristic matrix is set to zero. The current signal phase and its time variation are also included in the state, because they are pivotal factors in the decision-making process, and they are recorded alongside the characteristic matrices described above.
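The three characteristic matrices described above can be sketched in Python as follows. The dictionary keys, the function name `build_state`, and the use of 0 for empty cells of the lane-change matrix are assumptions of this sketch; the signal phase component of the state is not modelled here.

```python
def build_state(vehicles, n_lanes, n_segments, segment_length):
    """Build position, lane-change, and speed matrices (lanes x segments)."""
    position = [[0] * n_segments for _ in range(n_lanes)]
    lane_change = [[0] * n_segments for _ in range(n_lanes)]
    speed = [[0.0] * n_segments for _ in range(n_lanes)]
    for veh in vehicles:
        lane = veh["lane"]
        # Segment index from the distance to the intersection.
        seg = int(veh["distance"] // segment_length)
        if not (0 <= lane < n_lanes and 0 <= seg < n_segments):
            continue  # vehicle lies outside the modelled control area
        position[lane][seg] = 1                                   # vehicle present
        lane_change[lane][seg] = 2 if veh["changed_lane"] else 3  # 2 = changed lane
        speed[lane][seg] = veh["speed"]                           # stays 0.0 if empty
    return position, lane_change, speed


# Illustrative vehicles; values are not taken from the paper.
vehicles = [
    {"lane": 0, "distance": 12.0, "speed": 8.5, "changed_lane": False},
    {"lane": 1, "distance": 31.0, "speed": 11.0, "changed_lane": True},
]
position, lane_change, speed = build_state(vehicles, n_lanes=2,
                                           n_segments=4, segment_length=10.0)
print(position, lane_change, speed)
```

Stacking the three matrices as channels would give an image-like input of the kind a convolutional Q network typically consumes.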
(b)
Action space
In order to enhance the efficacy of intersection groups, an intelligent control scheme of signals has been devised which establishes a flexible action space. The optimal speeds are selected to direct vehicular traffic at the intersection based on the prevailing phase of the green signal and the prospective phase transition state, taking into account all potential signal phases for each lane. The vehicle speed guidance strategy and feedback values are implemented based on a Markov Decision Process (MDP) mathematical model of the current phase of the green time and phase transition state. The MDP control strategy is integrated with iterative trials in deep reinforcement learning (DRL) to identify the optimal strategy for speed guidance when the feedback value is minimal. It should be noted that right-turn lanes are not included in the considerations, as they do not conflict with other lanes and are always accessible. Each loop represents a single phase of the green time and phase transition within a single phase cycle. The unit time of the loop is divided into discrete intervals. Subsequently, the currently active phase will be updated to the selected phase sequence state. The model establishes the maximum and minimum green times and their corresponding durations required to achieve phase transition. This signifies that a phase will transition to the subsequent phase upon the expiration of the maximum green light interval or in the event that the minimum green light interval is not satisfied. The original control scheme provides the basis for the iteration updates.
(c)
Reward space
In order to evaluate the efficacy of the selected action for each time interval and facilitate the vehicle’s adaptation to an optimal speed guidance strategy, the reward function furnishes global and local feedback on the performance of preceding actions. The local agent is responsible for monitoring and regulating the vehicle’s operations within a single intersection. Based on the observed vehicle operating states, the local agent generates and executes the necessary actions within the intersection, and subsequently outputs the current reward. In contrast, the global agent monitors the global state in order to assess the extent to which the local agent’s actions align with the overarching objective. Furthermore, additional rewards are provided to the local agent with the objective of enhancing the global efficiency of traffic flow at related intersection groups [44]. Figure 11 illustrates the hybrid learning framework of reward space.
The control objective of the local reward function is to maximize the average speed of vehicles, which allows for interactive and cooperative feedback optimization with the global agents [50]. The local reward (defined as $r_t^{local}$) includes the average speed reward and the additional reward, as given in Equation (31).
$$r_t^{local} = r_{t\Delta t}^{avg\_speed} + r_t^{global\_add} \tag{31}$$
The average speed reward $r_{t\Delta t}^{avg\_speed}$ is calculated by Equation (32).
$$r_{t\Delta t}^{avg\_speed} = \max\left(\frac{\sum_{j=1}^{n}\sum_{T=t\Delta t}^{T_n} v_{jT}}{n} - \frac{\sum_{j=1}^{n}\sum_{T=(t-1)\Delta t}^{T_{n-1}} v_{j(T-1)}}{n}\right) \tag{32}$$
where $r_{t\Delta t}^{avg\_speed}$ is the average speed reward, $r_t^{global\_add}$ is the additional reward, $v_{jT}$ is the speed of the jth vehicle within the time interval T, $v_{j(T-1)}$ is the speed of the jth vehicle within the time interval T − 1, j is the vehicle index, and n is the total number of vehicles. Since right-turn vehicles are not affected by the traffic light, they are excluded from the average speed reward. The average speed reward is therefore computed over all vehicles at the intersection, except the right-turn vehicles, within the sliding time window Δt. It is worth noting that the number of vehicles may still increase during the phase transition, so Δt should include the amber light time.
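The local reward can be sketched as follows, reading Equation (32) as the difference between the speed term of the current sliding window and that of the previous window. That reading, the per-step dictionary layout, and the function names are assumptions for illustration; the `max` in Equation (32) is treated as the optimization objective rather than something evaluated inside the function.

```python
from typing import Dict, FrozenSet, List


def window_speed_term(window: List[Dict[str, float]],
                      right_turn_ids: FrozenSet[str]) -> float:
    """Sum of speeds over all steps of the window, divided by vehicle count n.

    `window` holds one {vehicle_id: speed} dict per time step (hypothetical
    layout). Right-turn vehicles are skipped, as stated in the text.
    """
    total = 0.0
    vehicle_ids = set()
    for step in window:
        for vid, speed in step.items():
            if vid in right_turn_ids:
                continue
            total += speed
            vehicle_ids.add(vid)
    return total / len(vehicle_ids) if vehicle_ids else 0.0


def local_reward(current_window: List[Dict[str, float]],
                 previous_window: List[Dict[str, float]],
                 right_turn_ids: FrozenSet[str],
                 global_add: float = 0.0) -> float:
    """Equation (31): average speed reward plus the additional reward."""
    avg_speed_reward = (window_speed_term(current_window, right_turn_ids)
                        - window_speed_term(previous_window, right_turn_ids))
    return avg_speed_reward + global_add
```

A positive value indicates that the non-right-turn vehicles moved faster in the current window than in the previous one.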
When the actions of a local agent degrade the optimal speed at the global level, the global agent reduces the additional reward, which prevents the local agent from enhancing the efficiency of its own intersection at the cost of a lower global reward. This approach ensures an overall optimal speed guidance strategy in arterial coordination.
To guarantee objectivity, the global reward must be based exclusively on the state space of each local agent. The number of vehicles carried by the global agent should be adjusted for all lanes with the objective of maximizing the continuous non-stop throughput of the main arterial roads while avoiding any adverse impact on the traffic flow on the access roads. Accordingly, the global reward function (defined as $r_t^{global}$) is expressed as the continuous non-stop throughput [51]. The calculation is presented in Equation (33).
$$\begin{aligned} r_t^{global} &= \max\left(\frac{N_{mainRd}^{T}}{M} - \frac{N_{mainRd}^{T-1}}{M}\right) \\ \text{s.t.}\quad N_{mainRd}^{T} &= \sum_{T=t\Delta t}^{T_n} N_{pass}^{T} + \sum_{T=t\Delta t}^{T_n}\left(N_{out\_accessRd}^{T} - N_{in\_accessRd}^{T}\right) \\ N_{mainRd}^{T-1} &= \sum_{T=(t-1)\Delta t}^{T_{n-1}} N_{pass}^{T-1} + \sum_{T=(t-1)\Delta t}^{T_{n-1}}\left(N_{out\_accessRd}^{T-1} - N_{in\_accessRd}^{T-1}\right) \end{aligned} \tag{33}$$
where $N_{mainRd}^{T}$ is the total number of vehicles passing through the main arterial road during the time period from T to T + 1. $N_{pass}^{T}$ is the number of vehicles continuously passing without stopping on the main arterial road during the time period from T to T + 1. $N_{out\_accessRd}^{T}$ is the number of vehicles leaving the access road and entering the main arterial road from time T to time T + 1. $N_{in\_accessRd}^{T}$ is the number of new vehicles entering the access road from time T to time T + 1. $N_{mainRd}^{T-1}$, $N_{pass}^{T-1}$, $N_{out\_accessRd}^{T-1}$ and $N_{in\_accessRd}^{T-1}$ are the corresponding quantities for the time period from T − 1 to T. M is the number of intersections.
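A minimal sketch of the throughput-based global reward follows. It assumes that Equation (33) takes the difference between the main-road throughput of the current window and that of the previous window, each divided by the number of intersections M, and that the access-road term is the number of vehicles leaving the access road minus the number entering it. The function names and the tuple layout are hypothetical.

```python
from typing import Sequence, Tuple

Window = Tuple[Sequence[int], Sequence[int], Sequence[int]]


def main_road_throughput(n_pass: Sequence[int],
                         n_out_access: Sequence[int],
                         n_in_access: Sequence[int]) -> int:
    """Non-stop passes plus the net access-road exchange, summed over a window."""
    net_access = sum(o - i for o, i in zip(n_out_access, n_in_access))
    return sum(n_pass) + net_access


def global_reward(current: Window, previous: Window,
                  n_intersections: int) -> float:
    """Change in per-intersection throughput between two consecutive windows."""
    return (main_road_throughput(*current)
            - main_road_throughput(*previous)) / n_intersections
```

Each window is passed as three per-step count sequences (non-stop passes, vehicles leaving the access road, vehicles entering the access road).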
Given the varying levels of importance attributed to access roads in comparison to the primary arterial routes, it is essential to assign distinct weights to the constituent elements of each. Equation (34) calculates the global reward function of main arterial roads and intersecting access roads at time interval T (defined as $R_T^{global}$), assuming that it contains $a_x$ elements.
$$R_T^{global} = \sum_{\delta=0}^{1}\sum_{x=1}^{a_x} \chi_{\delta,a_x}\, r_{\delta,a_x,t}^{global} \tag{34}$$
where δ is an indicator whose value is 0 for an intersecting access road and 1 for a main arterial road, $\chi_{\delta,a_x}$ is the weight coefficient, and $r_{\delta,a_x,t}^{global}$ is the global reward value of element $a_x$.
Equation (34) shows that, to enable the continuous flow of vehicles through the intersection and ensure easy convergence of the model, $R_T^{global}$ must be maximized and $R_T^{global} > 0$ must hold during the learning process.
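The weighted sum in Equation (34) can be illustrated with a few lines of code. The dictionary layout (key 0 for access roads, key 1 for the main arterial road) and the example weights in the usage note are assumptions for this sketch, not values from the paper.

```python
from typing import Dict, List


def weighted_global_reward(rewards: Dict[int, List[float]],
                           weights: Dict[int, List[float]]) -> float:
    """Sum chi * r over both road types (delta = 0, 1) and their elements."""
    total = 0.0
    for delta in (0, 1):          # 0: intersecting access road, 1: main arterial
        for chi, r in zip(weights[delta], rewards[delta]):
            total += chi * r
    return total
```

For instance, rewards of 1.0 and 2.0 on two access-road elements with weights 0.2 and 0.1, together with a reward of 4.0 on one arterial element with weight 0.7, give a total of 3.2, which satisfies the positivity condition stated above.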
DQN algorithm
The fundamental component of the DQN model is a convolutional neural network. Training is conducted using Q-learning to obtain the output, which represents the estimated Q-value of the optimal speed guidance strategy. The Q-value represents the total reward that an agent can obtain when taking action $A_t$ in state $s_t$. It can be approximated by selecting the action $A_{t+1}$ that yields the maximum Q-value $Q'$, as calculated using Equation (35).
$$Q(s_t, A_t) = r_{t+1} + \mu r_{t+2} + \cdots + \mu^{m-1} r_{t+m} \approx r_{t+1} + \mu \max_{A} Q'\left(s_{t+1}, A_{t+1} \mid \theta, \theta'\right) \tag{35}$$
where $Q'\left(s_{t+1}, A_{t+1} \mid \theta, \theta'\right)$ is the Q value when action $A_{t+1}$ is selected in state $s_{t+1}$, with the current network parameter θ replicated to the target network parameter θ′, and μ is the discount rate, which penalizes future rewards relative to the immediate reward $r_{t+1}$. The larger μ is, the more the agents focus on subsequent rewards; the smaller μ is, the more attention the agents pay to the current reward. To provide stable updates in each iteration, a separate target network with parameters θ′ is used to generate the Q values. The parameters of the main neural network are updated by back propagation, whereas θ′ is updated based on θ according to Equation (36).
$$\theta' = \xi\theta' + (1 - \xi)\theta \tag{36}$$
where ξ is the update rate, which indicates the degree to which the new parameters affect the target network.
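The one-step target and the target-network update can be sketched as below. Two assumptions are made: the target is taken as the immediate reward plus the discounted maximum Q value produced by the target network, and the update of Equation (36) is read as weighting the old target parameters by ξ and the current parameters by 1 − ξ. Parameters are plain lists of floats here, which is a simplification of a real network.

```python
from typing import List, Sequence


def td_target(reward: float, next_q_values: Sequence[float],
              discount: float, terminal: bool = False) -> float:
    """r + mu * max_a Q'(s', a); only r when the episode has ended."""
    if terminal:
        return reward
    return reward + discount * max(next_q_values)


def soft_update(target_params: Sequence[float],
                online_params: Sequence[float], xi: float) -> List[float]:
    """Assumed reading of Equation (36): theta' <- xi*theta' + (1 - xi)*theta."""
    return [xi * tp + (1.0 - xi) * op
            for tp, op in zip(target_params, online_params)]
```

Under this reading, a value of ξ close to 1 makes the target network change slowly, which is the usual way such an update stabilizes training.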
Figure 10 illustrates the utilization of two neural networks by the DQN to enhance the stability of the training process. The Q′ value represents the predicted value of the neural network (NN) based on a given input sample. In the DQN framework, two NNs will be employed to predict the Q values, with one derived from the base NN model and the other from the target NN model. The mean square error (MSE) between the two Q′ values is employed as a loss function, facilitating the updating of the weights of the NNs. Following the completion of one episode, the weights of the base NN are copied or updated to the target NN with the objective of minimizing the loss function. This process reduces the error term of the network to a finite interval and ensures that the Q values are within a reasonable range. Consequently, all vehicles will attain a globally optimal speed at the pertinent intersection groups.
The loss function (defined as J(θ)) is expressed as the MSE between Q′ predicted from the base NN and the target NN network, which is calculated by Equation (37).
$$J(\theta) = E\left[\frac{1}{N_{training}}\left(r + \mu \max_{A} Q'_t\left(s, A \mid \theta, \theta'\right) - Q_t\left(s, A \mid \theta\right)\right)^{2}\right] \tag{37}$$
where $N_{training}$ is the number of times the network is trained.
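For a concrete view of the loss, the sketch below averages the squared difference between the target value and the base-network prediction over a batch. The batch layout `(s, a, r, s_next, done)` and the use of plain callables in place of neural networks are assumptions made to keep the example self-contained.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Transition = Tuple[Hashable, int, float, Hashable, bool]
QFunction = Callable[[Hashable], Sequence[float]]


def mse_loss(batch: List[Transition], q_online: QFunction,
             q_target: QFunction, discount: float) -> float:
    """Mean squared error between target values and base-network Q values."""
    total = 0.0
    for s, a, r, s_next, done in batch:
        # The target network supplies the bootstrapped value for s_next.
        target = r if done else r + discount * max(q_target(s_next))
        total += (target - q_online(s)[a]) ** 2
    return total / len(batch)
```

In a full implementation the two callables would be the base and target networks, and the returned value would be differentiated with respect to the base-network parameters.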
To minimize the loss function J(θ), Adaptive Moment Estimation (Adam), a stochastic gradient descent method, is implemented. First, the ranking-based prioritized experience replay method is used to improve the learning efficiency. The prioritized probability of the samples is calculated with a ranking-based approach, which increases the replay probability of the average vehicle speed samples. The error (defined as $\beta_{v_j}$) of sample $v_j$ is calculated by Equation (38). The errors $\beta_{v_j}$ are sorted, and the priority $p_{v_j}$ of each sample is set to the reciprocal of its rank.
$$\beta_{v_j} = \frac{p_{v_j}^{P_\tau}}{\sum_{j} p_{v_j}^{allP}}\left|Q'\left(s, A \mid \theta, \theta'\right)_{v_j} - Q\left(s, A \mid \theta\right)_{v_j}\right| \tag{38}$$
where $P_\tau$ is the number of priorities used (when $P_\tau$ is 0, random sampling is applied), $p_{v_j}^{P_\tau}$ is the priority of the vehicle speed, and $\sum_{j} p_{v_j}^{allP}$ is the total priority of all vehicle speeds.
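Rank-based prioritization can be sketched as follows. The example assumes that the sample with the largest absolute error receives rank 1, that the priority is the reciprocal of the rank, and that the exponent `tau` plays the role of $P_\tau$, so that a value of 0 gives uniform random sampling. These conventions follow the common form of rank-based prioritized replay and may differ in detail from the authors' implementation.

```python
import random
from typing import List, Sequence


def rank_based_probabilities(errors: Sequence[float], tau: float) -> List[float]:
    """Sampling probabilities p_i**tau / sum_k p_k**tau with p_i = 1 / rank(i)."""
    order = sorted(range(len(errors)), key=lambda i: abs(errors[i]), reverse=True)
    priorities = [0.0] * len(errors)
    for rank, idx in enumerate(order, start=1):
        priorities[idx] = 1.0 / rank
    scaled = [p ** tau for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(errors: Sequence[float], tau: float, k: int,
                   rng: random.Random) -> List[int]:
    """Draw k sample indices (with replacement) according to the probabilities."""
    probs = rank_based_probabilities(errors, tau)
    return rng.choices(range(len(errors)), weights=probs, k=k)
```

Sampling by rank rather than by raw error magnitude keeps a single very large error from dominating the minibatch.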
Second, the weights in the neural network are updated by the gradient of the loss function with the learning rate (defined as $\omega_{\beta_{v_j}}$), as calculated by Equations (39)–(41). In addition, the ranking-based prioritized experience replay method is used to explore possible actions at the beginning of the training stages. The agent randomly chooses an action with probability $y(p) = p_{v_j}^{P_\tau} / \sum_{j} p_{v_j}^{allP}$. Otherwise, the agent chooses the action $A_{t+1}$ that obtains the maximum $Q'$ value predicted by the training neural network.
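$$\frac{\partial J(\theta)}{\partial \theta} = E\left[\frac{1}{N_{training}}\left(r + \mu \max_{A} Q'\left(s, A \mid \theta, \theta'\right) - Q\left(s, A \mid \theta\right)\right)\frac{\partial Q\left(s, A \mid \theta\right)}{\partial \theta}\right] \tag{39}$$
$$\theta_{t+1} = \theta_t - \omega_{\beta_{v_j}}\frac{\partial J_t}{\partial \theta_t} \tag{40}$$
$$\omega_{\beta_{v_j}} = \left(1 - \omega_{\beta_0}\frac{h}{H}\right)\frac{\dfrac{\dot{p}_{v_j}^{P_\tau}}{1 - \rho_{v_j}^{P_\tau}}}{\sqrt{\dfrac{\dot{p}_{v_j}^{allP}}{1 - \rho_{v_j}^{allP}}} + \varpi} \tag{41}$$
where $\omega_{\beta_0}$ is the initial learning rate. h is the current episode number. H is the total number of episodes. $\dot{p}_{v_j}^{P_\tau}$ is the updated first moment of $p_{v_j}^{P_\tau}$. $\dot{p}_{v_j}^{allP}$ is the updated second moment of $p_{v_j}^{allP}$. $\rho_{v_j}^{P_\tau}$ is the first exponential decay rate of $p_{v_j}^{P_\tau}$. $\rho_{v_j}^{allP}$ is the second exponential decay rate of $p_{v_j}^{allP}$. $\varpi$ is the coefficient that gives stability to the value.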
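For reference, one step of the standard Adam update is sketched below on a plain list of parameters. This is the textbook form of Adam, not a transcription of the paper's learning-rate expression: `rho1` and `rho2` stand for the first and second exponential decay rates, `eps` plays the role of the stability coefficient ϖ, and any episode-dependent scaling of the initial learning rate is left out.

```python
import math
from typing import List, Sequence, Tuple


def adam_step(theta: Sequence[float], grad: Sequence[float],
              m: Sequence[float], v: Sequence[float], step: int,
              lr: float = 1e-3, rho1: float = 0.9, rho2: float = 0.999,
              eps: float = 1e-8) -> Tuple[List[float], List[float], List[float]]:
    """Return updated (theta, first moments, second moments); step starts at 1."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = rho1 * mi + (1.0 - rho1) * g          # first-moment estimate
        vi = rho2 * vi + (1.0 - rho2) * g * g      # second-moment estimate
        m_hat = mi / (1.0 - rho1 ** step)          # bias correction
        v_hat = vi / (1.0 - rho2 ** step)
        new_theta.append(th - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v
```

On the first step the bias-corrected ratio has magnitude close to 1, so each parameter moves by roughly `lr` in the direction opposite to its gradient.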
The detailed algorithms of the DQN are given in Algorithm 1.
Algorithm 1 The pseudo code of DQN algorithm
Initialize replay memory D with capacity N
Initialize action-value function Q with random weights θ
Initialize target action-value function Q′ with weights θ′
For episode = 1, H do
  Initialize current state $s_t$
  For t = 1, T do
    With probability $y(p)$ select a random action $a_t$
    Otherwise select $a_t = \arg\max_{a} Q(s_t, a \mid \theta)$
    Execute action $a_t$ in emulator and observe reward $r_t = \{r_t^{local}, r_t^{global}\}$
    Set next state $s_{t+1}$
    Store ($s_t$, $a_t$, $r_t^{K}$, $s_{t+1}$) in D
    Sample random minibatch of ($s_t$, $a_t$, $r_t^{K}$, $s_{t+1}$) from D
    For every ($s_t$, $a_t$, $r_t^{K}$, $s_{t+1}$), N do
      Set $Q'_t = r_t$ if the episode terminates at time step t; otherwise set $Q'_t = r_{t+1} + \mu \max_{A} Q'\left(s_{t+1}, A_{t+1} \mid \theta, \theta'\right)$
      Calculate the loss function value J(θ)
      Perform the Adaptive Moment Estimation (including Equations (38)–(40)), minimizing J(θ) to update θ
      Every t steps, copy weights from the base NN to the target NN, reset Q′ = Q
    End For
  End For
End For
Until training completed
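Two building blocks of Algorithm 1, the bounded replay memory and the choice between a random action and the greedy action, can be sketched as follows. The class and function names are hypothetical, uniform minibatch sampling is used for brevity, and `explore_prob` stands in for whatever exploration probability the training loop supplies.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

Transition = Tuple[object, int, float, object]


class ReplayMemory:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen discards the oldest transition when full.
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def store(self, state, action: int, reward: float, next_state) -> None:
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self) -> int:
        return len(self.buffer)


def select_action(q_values: Sequence[float], explore_prob: float,
                  rng: random.Random = random.Random()) -> int:
    """Random action with probability explore_prob, otherwise the greedy one."""
    if rng.random() < explore_prob:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Replacing the uniform `sample` method with rank-based sampling would bring the sketch closer to the prioritized replay described in the text.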

3.3.2. The Upper-Layer Control

On top of the lower-layer control, the upper layer coordinates the control of related intersection groups based on predetermined conditions, such as cycle time, green ratio, and phase offset. The control scheme is then updated based on the initial and target states of the vehicles, as output by the lower layer. After the phase offset is adjusted, this process is repeated, and the resulting green wave bandwidth is obtained. Consequently, the automatic adjustment of the phase offset to achieve the optimal bidirectional cycle-coordinated green wave bandwidth is essentially a problem of combining and optimizing the phase offsets between intersections. The task is to combine the phase offsets of the intersections, each of which takes a value in the range [0, C]. The objective of the optimization process is minB1. The core process of combinatorial optimization is the search for the optimal green wave bandwidth.
Given the significant advantages of the genetic algorithm (GA) in solving combinatorial optimization problems, the GA is used to obtain the phase offsets between intersections. This achieves the control objective of accounting for the importance of traffic release in different directions with coordinated phases at intersections and for the maximum number of consecutive passing vehicles within the green wave bandwidth.
The design of the GA must first determine a coding strategy for the phase offset variable, which has a value range of [0, C]. However, the value of C may differ in various traffic environments and does not necessarily satisfy the $2^n$ condition. When a variable takes a finite number of discrete effective values that is not a power of two, common binary codes exhibit redundancy. This increases the complexity of the algorithm, because special mechanisms such as fixed remapping, random remapping, and probabilistic remapping must be employed to map redundant codes into effective codes. Consequently, the real number coding method is a more effective means of resolving this combinatorial optimization problem.
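The redundancy of binary coding can be made concrete with a short calculation. The sketch assumes integer-second offsets 0, 1, ..., C, and the modulo-based `fixed_remap` is only one possible fixed remapping, chosen here for illustration.

```python
from typing import Tuple


def binary_code_redundancy(cycle_length_s: int) -> Tuple[int, int]:
    """Return (bits needed, number of redundant codes) for offsets 0..C."""
    n_values = cycle_length_s + 1                    # effective values 0..C
    n_bits = max(1, (n_values - 1).bit_length())     # smallest n with 2**n >= n_values
    return n_bits, 2 ** n_bits - n_values


def fixed_remap(code: int, cycle_length_s: int) -> int:
    """Fold a redundant binary code back into the effective range 0..C."""
    return code % (cycle_length_s + 1)
```

With a cycle of C = 100 s, seven bits are required and 27 of the 128 codes are redundant, whereas C = 127 s happens to use all codes. Real number coding avoids the issue because every gene is drawn directly from [0, C].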
The procedure of the real number coding genetic algorithm is given in Algorithm 2.
Algorithm 2 The procedure for GA with real number coding
Begin
Initialization:
 {
  Select the type of genetic operation and determine the parameters such as
  crossover probability and mutation probability.
  Set generation counter t = 0.
  The initial population B(0) = {O1, O2, …, ON} is
  generated by constructing chromosomes with
  phase offsets between associated intersections as
  genes, where N represents the population size.
 }
  Measurement: select minB1 as the objective function, and then calculate the fitness
  value of each individual in the initial population B(0).
  While (the termination conditions are not satisfied) do
  {
    Crossover: a crossover operation is performed on B(t) to generate population
    B′(t + 1).
    Mutation: a mutation operation is performed on B′(t + 1) to generate population
    B″(t + 1).
    Measurement: the fitness value of each individual in population B″(t + 1) is
    calculated.
    Selection: individuals are selected from A ∪ B″(t + 1) to generate a new population
    B(t + 1), where A represents a subset or empty set of B(t).
    t = t + 1.
  }
End
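The steps of Algorithm 2 can be sketched as a small real-coded GA. The blend crossover, the uniform resampling mutation, the elitist choice of A, and all default parameter values are assumptions made for this example, and `objective` is a stand-in for the bandwidth objective, which is not reproduced here.

```python
import random
from typing import Callable, List, Tuple


def real_coded_ga(objective: Callable[[List[float]], float], n_genes: int,
                  cycle: float, pop_size: int = 50, generations: int = 100,
                  p_cross: float = 0.7, p_mut: float = 0.1, n_elite: int = 2,
                  seed: int = 0) -> Tuple[List[float], float]:
    """Minimise objective(offsets); each offset is a real number in [0, cycle]."""
    rng = random.Random(seed)
    pop = [[rng.uniform(0.0, cycle) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=objective)
        elite = [ind[:] for ind in ranked[:n_elite]]       # A, a subset of B(t)
        parents = ranked[:max(2, pop_size // 2)]
        children = []
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            if rng.random() < p_cross:
                # Crossover: arithmetic blend, clamped to the valid range.
                w = rng.random()
                child = [min(cycle, max(0.0, w * a + (1.0 - w) * b))
                         for a, b in zip(p1, p2)]
            else:
                child = p1[:]
            children.append(child)
        for child in children:
            for g in range(n_genes):
                if rng.random() < p_mut:
                    # Mutation: resample the gene uniformly in [0, cycle].
                    child[g] = rng.uniform(0.0, cycle)
        # Selection: best pop_size individuals from A united with the offspring.
        pop = sorted(elite + children, key=objective)[:pop_size]
    best = min(pop, key=objective)
    return best, objective(best)
```

Because the elite individuals are carried over unchanged, the best objective value found can never get worse from one generation to the next.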

4. Case Study

4.1. Simulation Scenario

To eliminate the effects of road section peculiarities and of the periodic signal timing plan, two road sections are randomly selected from the arterial coordinated road sections in Yangzhou. The result is validated by examining the arterial coordinated control of these road sections over multiple time periods throughout the day. Traffic data were collected from microwave radar sensors and onboard diagnostics (OBD) devices deployed by the Yangzhou Traffic Management Bureau. The scenario in the case study has a total of 68,342 connected vehicles. Vehicle trajectories, speeds, and signal states were synchronized via the VISSIM COM interface to ensure spatiotemporal consistency. The data pertaining to the connected vehicles are presented in Table 3. The traffic data were collected via the following methods:
On-board GPS: 68,342 connected vehicles with 1 Hz sampling and GPS timestamps.
Roadside Cameras: These cameras were utilized to capture lane-specific vehicle counts at intersections, validated against roadside LiDAR measurements (±2% error).
VISSIM COM Interface: Simulated vehicle trajectories with 0.1 s resolution.
The data covered three periods (morning peak, flat peak, and evening peak) on arterial roads in Yangzhou, China.
As illustrated in Figure 12 and Figure 13, the VISSIM simulation platform is constructed upon a three-dimensional Google map. Figure 12 shows the region containing all sections within Jiangyang middle road. The yellow line is the Jiangyang middle road, while the blue line represents the Wenhui east road. The green line is Xingcheng east road. The orange line represents the Yangzi river road.
Figure 13 illustrates the spatial distribution of the various sections within Wenchang west road. The yellow section represents Wenchang west road, while the blue section represents Guozhan road. The green section represents Ruiyang middle road, while the orange section represents Baixiang road.
The traffic volumes in the arterial coordinated directions of the two related road sections during the three periods (morning peak, flat peak and evening peak) are shown in Figure 14 and Figure 15.
Figure 14 shows that the traffic volume in the north–south straight direction on the Jiangyang middle road section is large. Figure 15 depicts the substantial traffic volume in the east–west direction on the Wenchang west road section. With regard to the relevant intersection groups on the Jiangyang middle road, the north-south straight phase has been designated as the arterial coordination phase. The north–south direction is the primary arterial road with forward traffic flow, while the south–north direction is the primary arterial road with reverse traffic flow. For the related intersection groups of Wenchang west road, the east–west straight phase is designated as the arterial coordination phase. The east–west direction is the primary arterial road with forward traffic flow, while the west–east direction is the primary arterial road with reverse traffic flow.

4.2. Simulation Verification

In order to guarantee the reliability of the proposed model, a bidirectional cycle green wave coordinated control and multi-level vehicle speed guidance platform is constructed using VISSIM 4.30/MATLAB 2025a simulation fused with real traffic data. The MAXBAND model is employed as a reference for the simulation, while the optimization function is obtained through the DQNGA algorithm, and the results are obtained through VISSIM via the COM interface. Furthermore, the COM interface can be adjusted to simulate the connected vehicle environment and achieve multi-level speed guidance through secondary development. The practical constraints are determined based on the experimental scenario. It is initially assumed that the normal speed of vehicles during the peak period is 20 km/h, and that during the flat peak period, the normal speed is 40 km/h. The speed limit of the road section is selected as the maximum speed. The starting point of the speed adjustment area is set at 300 m before the intersection. A speed guidance simulation is conducted for the two related road sections during the morning peak, flat peak and evening peak periods under MAXBAND and the proposed model. The average speed between each intersection of each related road section is calculated, and then the phase offset between each intersection is optimized using the optimization software to obtain the corresponding bidirectional cycle comprehensive green wave bandwidth. The newly devised green wave coordination plan is input into the vehicle speed guidance simulation platform and simulated repeatedly until the average vehicle speed and the bidirectional cycle comprehensive green wave bandwidth have reached their optimal values.
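The repeated exchange between the speed guidance simulation and the offset optimization can be outlined as a simple loop. Both callables are hypothetical hooks (in the study they would wrap the VISSIM run and the offset optimizer), and the tolerance-based stopping rule is an assumption standing in for the criterion that speed and bandwidth have reached their optimal values.

```python
from typing import Callable, List, Optional, Tuple


def alternate_until_stable(
    simulate_speeds: Callable[[List[float]], List[float]],
    optimize_offsets: Callable[[List[float]], Tuple[List[float], float]],
    offsets: List[float],
    tol: float = 1e-3,
    max_rounds: int = 50,
) -> Tuple[List[float], Optional[float], int]:
    """Alternate simulation and optimization until results stop changing.

    simulate_speeds(offsets) -> average speed per section (hypothetical hook)
    optimize_offsets(speeds) -> (new offsets, comprehensive bandwidth)
    """
    prev_speeds: Optional[List[float]] = None
    prev_bandwidth: Optional[float] = None
    bandwidth: Optional[float] = None
    for round_no in range(1, max_rounds + 1):
        speeds = simulate_speeds(offsets)
        offsets, bandwidth = optimize_offsets(speeds)
        if prev_bandwidth is not None and prev_speeds is not None:
            speed_change = max(abs(a - b) for a, b in zip(speeds, prev_speeds))
            if abs(bandwidth - prev_bandwidth) < tol and speed_change < tol:
                return offsets, bandwidth, round_no
        prev_speeds, prev_bandwidth = speeds, bandwidth
    return offsets, bandwidth, max_rounds
```

The function returns the final offsets, the last bandwidth value, and the number of rounds used, which makes it easy to see whether the loop stopped on convergence or on the round limit.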

4.2.1. Results of DQN with Speed Guidance

A primary convolutional neural network is selected to provide state space feedback values, thereby enabling the selection of the most valuable action. Initially, DQN generates a single training batch of data and stores the current state and action. The feedback values are stored as quadruples (s, a, r, s′) in the replay memory D. The target neural network (NN) is a separate neural network that increases the learning stability and obtains the optimal speed guidance strategy by selecting the action with the maximum Q value and updating the priority of the samples after each training. Then, the learning rate in the NN is updated by Adam back propagation. The model derives the initial control scheme based on the initial learning rate and the action with the maximum Q value feedback operation. Finally, the bidirectional cycle comprehensive green wave bandwidth at all related intersection groups is adjusted according to the optimal vehicle speed. The Rectified Linear Unit (ReLU) activation function is used for all hidden layers, while the linear activation function is applied to the output layer. The Adam optimization algorithm [52] is employed to train the NN models. The parameters of the DQN model used for training are presented in Table 4.
In order to test the performance of the proposed method, as illustrated in Figure 16, the outcomes are compared with those of alternative algorithms, including Coordinated Deep Reinforcement Learners (CDRL) [52], Multi-agent Deep Q-learning (MADQN) [53], Cooperative Deep Q-network with Q-value Transfer (QT-CDQN) [54] and Dual Targeting Algorithm (DTA) [55].
As demonstrated in Figure 16, the Adam-optimized DQN achieves superior performance compared to baseline algorithms, as evidenced by lower delays and higher throughput, indicating that the agents in the proposed algorithm employ a more effective speed guidance strategy than the other algorithms, resulting in optimal outcomes. In order to ensure convergence stability, an integrated modified reward function based on the local reward function and global reward function is employed. The function attempts to maximize the reward obtained in each iteration in isolation. However, CDRL is based on the same structure that has been trained using transfer learning. The MADQN algorithm is characterized by an agent that is trained independently by the deep Q-learning algorithm, with no cooperation among these agents. The QT-CDQN algorithm demonstrates superior learning and convergence rates compared to both MADQN and CDRL. In comparison to both CDRL and MADQN, as well as QT-CDQN, DTA is capable of achieving faster convergence and superior stability.
Adam is introduced in DQN with the objective of achieving a balance between exploration and exploitation of actions and states. It is generally expected that the DQN training procedure will explore a greater number of potential actions at the outset, with exploitation becoming more prevalent as action strategies are refined. The DQN with Adam learning procedure is capable of reusing action strategies from previous scenarios, which enables the training procedure to converge on optimal values by exploring all possible actions. To test this hypothesis, the loss function curves are examined under different discount rates at the Jiangyang middle and Wenchang road sections. The training results are presented in Figure 17, Figure 18 and Figure 19. As demonstrated in Figure 16, the selection of hyperparameters exerts a substantial influence on the performance of a given traffic signal control method. The general trend observed in Figure 16 is that methods with a higher number of hyperparameters (e.g., DTA, QT-CDQN, CDRL, MADQN) show a greater variation in performance than methods with a lower number of hyperparameters (e.g., DQN with Adam).
It can be observed that the DQN with Adam achieves a comparable level of reward stability under full exploration (with the discount rate varied from 0.01 to 0.99), which suggests that the proposed method is effective on both the Jiangyang middle road section and the Wenchang road section. Figure 18 and Figure 19 illustrate the loss function values of queues and delays for each intersection. These results are consistent with those previously observed with the hyperparameter search and travel data. Figure 18 and Figure 19 also compare the DQN with Adam against DTA, QT-CDQN, CDRL, and MADQN. The DQN with Adam achieves relatively low queues and delays at the beginning and end of the simulation, and it outperforms the other four algorithms when the demand peaks in the middle of the simulation. In scenarios characterized by high traffic demand, the DQN with Adam can select the subsequent phase in a non-periodic manner. This ability may allow a reduction in queues and delays beyond what the other four algorithms, which are constrained by periodicity, can achieve. It is noteworthy, however, that the performance of the four algorithms is compromised during periods of low demand, when formulating an optimal policy should be relatively straightforward. A possible explanation is that the other four algorithms have overfitted to periods in which the magnitude of the reward is large (i.e., the middle of the simulation, when the demand is at its peak) and converged to a policy that is not well suited to times of low traffic demand.
Figure 20 and Figure 21 illustrate the trajectory and speed results of MAXBAND and the proposed models. The time axis indicates the phase of intersection 1 transferring to green, followed by vehicles entering the current road section. Additionally, the heat values in the figure indicate the speed magnitude at the current time. As shown in Figure 20a, the MAXBAND model causes vehicles to stop at intersection 1 of the Jiangyang middle road section, resulting in the formation of a queue at approximately 18 s. The queue begins to dissipate at 46 s. Meanwhile, Figure 20b demonstrates that vehicles halt at the intersection of the Wenchang Road section, resulting in the formation of a queue at approximately 12 s. Upon the transition to the subsequent phase, the queue begins to dissipate at 27 s. As demonstrated in Figure 20c,d, vehicles are observed to wait at the stop line and queue for each cycle in the MAXBAND model, resulting in significant traffic delays at the related intersection groups.
As demonstrated in Figure 21a,b, under the proposed model connected vehicles are able to travel collectively as a fleet. The speed of the vehicle fleet is optimized based on the current traffic signal timing, resulting in a speed of approximately 40–45 km/h, which is higher than the speed of traffic free flow. This process leads to the formation of minimal queues, thereby enabling the majority of vehicles to pass through the relevant intersection groups without halting. Similarly, Figure 21c,d demonstrate that the majority of vehicles traverse the associated intersection groups without stopping. Moreover, the majority of the trajectory curves remain parallel throughout the entire process, with minimal delays observed under the proposed model.

4.2.2. The Results of GA with the Bidirectional Cycle Comprehensive Green Wave Bandwidth

Following a series of trials, the optimal settings for the standard genetic algorithm (GA) parameters are identified as follows: a population size of 250, 250 iterations, a crossover rate of 0.7 and a mutation rate of 0.1 (see Figure 22a). As illustrated in Figure 22, the GA exhibits a faster convergence rate than the BSA [56] and FA [57], as evidenced by the fitness function reaching convergence at the 80th iteration. Figure 22c demonstrates that the GA requires a shorter computational time than the BSA and FA: its computational efficiency is 8.23% higher than that of the FA and 50.17% higher than that of the BSA.
The results of the comparative analysis of the average speed and phase offset at each related road section during different periods are presented in Table 5, Table 6 and Table 7.
Following the genetic algorithm (GA) optimization process, the objective function value is at its maximum when the weights of λmz and λmf are both 0.5, where m = 1, 2, 3. Figure 23, Figure 24 and Figure 25 show the comparison of the number of vehicles passing continuously through the intersection during the comprehensive green wave between the proposed method and MAXBAND.
Figure 23 demonstrates that, under the specified parameters, the application of the method results in an approximate 31.58% increase in the number of vehicles passing in the forward comprehensive green wave bandwidth during the morning peak period. Concurrently, the number of vehicles passing in the reverse comprehensive green wave bandwidth increased by approximately 35.71% during the morning peak period. In the Wenchang West Road section, the application of the method resulted in an increase of approximately 31.25% in the number of vehicles passing in the forward comprehensive green wave bandwidth during the morning peak period. Concurrently, the number of vehicles passing in the reverse comprehensive green wave bandwidth increased by approximately 41.67% during the morning peak period.
Figure 24 illustrates that the application of the method resulted in a 16.67% increase in the number of vehicles passing through the forward comprehensive green wave bandwidth in the Jiangyang middle road section during the flat peak period. Concurrently, the number of vehicles passing through the reverse comprehensive green wave bandwidth in the Jiangyang middle road section during the flat peak period was elevated by approximately 33.33%. In the Wenchang West Road section, the application of the method resulted in an increase of approximately 15.38% in the number of vehicles passing through the forward comprehensive green wave bandwidth during the flat peak period. Concurrently, the number of vehicles passing through the reverse comprehensive green wave bandwidth was elevated by approximately 27.27%.
Figure 25 shows that the implementation of the method led to a 28.57% increase in the number of vehicles passing through the forward comprehensive green wave bandwidth in the Jiangyang middle road section during the evening peak period. Concurrently, the number of vehicles passing through the reverse comprehensive green wave bandwidth increased by approximately 16.67% during the aforementioned period. In the Wenchang West Road section, the application of the method resulted in an increase of approximately 7.69% in the number of vehicles passing in the forward comprehensive green wave bandwidth during the evening peak period. Concurrently, the number of vehicles passing through the reverse comprehensive green wave bandwidth increased by approximately 22.22% during the evening peak period.

4.2.3. Simulation Results

The two control schemes are simulated in VISSIM 4.30. The average delays and stops are selected as the evaluation indexes of the final control effect. Figure 26 and Figure 27 compare the average delays and stops of connected vehicles passing through the related intersection groups.
As shown in Figure 26 and Figure 27, the bidirectional cycle green wave coordinated control and multi-level vehicle speed guidance collaborative optimization method significantly reduces both average delays and stopping frequency across all tested scenarios. The method allows for the uninterrupted passage of connected vehicles through the relevant intersection groups. The results indicate that the traffic capacity of the relevant road sections and the green time utilization are enhanced as a consequence of the method.
The robustness of the evaluation method is demonstrated by analysing the results of the combined green wave and turn wave under different traffic saturation levels, as illustrated in Figure 28.
Figure 28 presents the analysis of the green wave rating results under different traffic saturation levels. The proposed method significantly outperforms MAXBAND under high saturation (>0.6), due to its adaptive speed guidance and its coordination of vehicle trajectories and phase offsets. When traffic saturation is reduced to 0.2, however, it becomes difficult to demonstrate the superiority of the method presented in this paper. The reason is that the evaluation index IE for green wave coordination becomes unstable when the penetration rate is reduced, resulting in a certain degree of randomness within the arterial green wave. This instability is attributed to the greater volatility of the actual green wave traffic efficiency value IR at low penetration rates, which affects the final evaluation results. In addition, under low saturation (<0.3), MAXBAND exhibits comparable performance because fixed offsets suffice for sparse traffic. Nevertheless, the evaluation method appears to be practically feasible. Future improvements could involve dynamic weight adjustment in the objective function (e.g., prioritizing bandwidth maximization during peaks and energy efficiency during off-peaks) to enhance scenario-specific adaptability.
The morning peak period (7:00–8:00), flat peak period (9:00–17:00) and evening peak period (18:00–19:00) were selected for further analysis of the simulation results (average stops and average delay) on the Jiangyang Middle Road and Wenchang West Road sections, as shown in Tables 8 and 9.
Tables 8 and 9 report the final evaluation indexes obtained from the model optimization. The data show that the proposed model achieves a more pronounced improvement than the models it is compared with, reducing both the average delay and the average number of stops at the intersections of the Jiangyang Middle Road and Wenchang West Road sections. In particular, during the morning peak period, the average number of stops on the Jiangyang Middle Road and Wenchang West Road sections is reduced by 53.56% and 50.00%, respectively. Furthermore, the average delay on the Jiangyang Middle Road section is reduced by 24.32% during the flat peak period, and the average delay on the Wenchang West Road section is reduced by 28.85% during the morning peak period.
These gains can be explained as follows. On a coordinated arterial, the travel times of arterial vehicles are constrained by the phase offset, so vehicles cannot reduce their delay simply by driving faster. Consequently, the most effective way to reduce shock wave delay is to reduce the delay of the arterial vehicles themselves. In the connected vehicle environment, arterial vehicles receive speed guidance that allows them to pass the stop line without stopping, which significantly reduces the starting wave delay. At the same time, integrating vehicle speed guidance with coordinated phase offset optimization improves the utilization of green time. The proposed method therefore not only improves the coordinated phase and the utilization of the green wave band but also allows the uncoordinated phase to obtain a longer green time, reducing the average delay and average stops of both arterial and non-arterial vehicles.
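The no-stop passage described above can be illustrated with a deliberately simplified sketch. It assumes a single constant advisory speed and ignores the acceleration and deceleration stages that the multi-level guidance model accounts for. The function name `advisory_speed` and all of its parameters are hypothetical and chosen only for illustration.

```python
from typing import Optional


def advisory_speed(distance_m: float,
                   green_start_s: float,
                   green_end_s: float,
                   v_min: float,
                   v_max: float) -> Optional[float]:
    """Return a constant speed (m/s) that reaches the stop line during green.

    Times are measured from "now". Returns None when no speed within
    [v_min, v_max] lets the vehicle arrive inside the green window.
    """
    if distance_m <= 0 or not 0 < v_min <= v_max:
        raise ValueError("distance and speed bounds must be positive")
    if green_end_s <= green_start_s:
        raise ValueError("green window must have positive length")
    earliest = distance_m / v_max  # arrival time when driving at v_max
    latest = distance_m / v_min    # arrival time when driving at v_min
    # Arrive as early as possible, but not before the green starts.
    target = max(earliest, green_start_s)
    if target > min(latest, green_end_s):
        return None  # the window cannot be reached without stopping
    return distance_m / target


# A vehicle 300 m upstream with green from 30 s to 50 s is advised 10 m/s.
print(advisory_speed(300.0, 30.0, 50.0, v_min=5.0, v_max=15.0))
```

Choosing the earliest feasible arrival keeps the remaining green time available for following vehicles, which is one plausible reading of how guidance improves green time utilization.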
Table 10 compares the performance of MAXBAND, MULTIBAND (a classical green wave method), PPO (a reinforcement learning baseline) and DQNGA under different conditions. DQNGA reduces the average delay by 30.19% compared with MAXBAND, 22.32% compared with MULTIBAND and 17.78% compared with PPO. Similarly, DQNGA reduces the average number of stops by 50% compared with MAXBAND, 39.06% compared with MULTIBAND and 30.36% compared with PPO. The results show that the proposed DQNGA model exhibits enhanced stability and precision in comparison to the other three algorithms.
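The percentage reductions quoted above follow the usual relative-change arithmetic, sketched below for clarity. The helper name `relative_reduction` is hypothetical, and the example values are illustrative only; they are not taken from Table 10.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline * 100.0


# Illustrative values: halving the average number of stops is a 50% reduction.
print(relative_reduction(baseline=2.0, proposed=1.0))  # 50.0
```

Because the denominator is the baseline, reductions measured against different baselines (MAXBAND, MULTIBAND, PPO) are not additive and should be compared only pairwise.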

5. Conclusions

Existing research on speed guidance and green wave coordination for connected vehicle systems predominantly focuses on the primary green wave bandwidth while neglecting the potential of supplementary green waves. This study introduces an integrated optimization framework that combines hierarchical speed guidance with bidirectional periodic green wave coordination. The framework simultaneously determines optimal guidance speeds and bidirectional green wave bandwidths through three key innovations: (1) a hierarchical speed guidance strategy that minimizes travel time while maximizing intersection throughput through trajectory optimization of initial and target vehicle states; (2) a multi-objective optimization model combining maximum arterial velocity with comprehensive green wave bandwidth under connected vehicle conditions; (3) a bi-level combinatorial optimization architecture combining a Deep Q-Network (DQN) with a Genetic Algorithm (GA), termed DQNGA, which resolves complex traffic scenarios through specialized neural network configurations.
Validation via VISSIM/MATLAB co-simulation with empirical traffic data demonstrated superior computational efficiency, with DQNGA outperforming conventional MAXBAND and state-of-the-art reinforcement learning approaches (CDRL, MADQN, QT-CDQN, DTA) by 20–35% in convergence speed. The simulations also showed substantial operational improvements: morning peak stops decreased by 53.6% and 50.0% on the Jiangyang Middle Road and Wenchang West Road sections, respectively, while the average delay was reduced by 24.3% on the Jiangyang Middle Road section during the flat peak period and by 28.9% on the Wenchang West Road section during the morning peak period. These metrics confirm the framework’s capability to enhance arterial capacity through coordinated speed–phase optimization while maintaining traffic stability.
Numerous research topics remain for future investigation, two of which are outlined here. First, follow-up work will analyse the sensitivity of the model to the transition time of vehicle acceleration and deceleration. Second, the proposed model assumes that all vehicles are fully connected; integrating traditional vehicles (mixed traffic flow) into the model is challenging because data on their behavior are lacking.

Author Contributions

Conceptualization, L.D.; methodology, L.D.; validation, L.D. and S.L.; formal analysis, L.D.; investigation, L.Z. and X.X.; data curation, L.D.; writing—original draft preparation, L.D.; writing—review and editing, L.D.; visualization, L.D., S.L. and Z.Y.; supervision, L.Z. and X.X.; project administration, L.D. and X.X.; funding acquisition, L.D. and X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangxi Natural Science Foundation (2024JJB170013), the Guangxi Science and Technology Major Program (GuikeAA23062001), Project for Enhancing Young and Middle-aged Teacher’s Research Basis Ability in Colleges of Guangxi (2024KY0271), the National Natural Science Foundation of China, grant numbers 62262011 and 61741303, and the Guangxi Key Technologies R&D Program (GuikeAB23026036). The authors gratefully thank the anonymous referees for their useful comments and the editors for their work.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the first author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sangkey, K.; Ali, H.; Billy, W.M.; Rouphail, N.M. Dynamic bandwidth analysis for coordinated arterial streets. J. Intell. Transp. Syst. 2016, 20, 294–310. [Google Scholar]
  2. Zhou, H.M.; Gene, H.H., Jr.; Zhang, Y.L. Arterial signal coordination with uneven double cycling. Transp. Res. Part A Policy Pract. 2017, 103, 409–429. [Google Scholar]
  3. Guler, S.I.; Menendez, M.; Meier, L. Using connected vehicle technology to improve the efficiency of intersections. Transp. Res. Part C Emerg. Technol. 2014, 46, 121–131. [Google Scholar] [CrossRef]
  4. Yang, K.; Menendez, M.; Guler, S.I. Implementing transit signal priority in a connected vehicle environment with and without bus stops. Transp. B Transp. Dyn. 2019, 7, 423–445. [Google Scholar]
  5. Xuan, Y.G.; Daganzo, C.F.; Cassidy, M.J. Increasing the capacity of signalized intersections with separate left turn phases. Transp. Res. Part B Methodol. 2011, 45, 769–781. [Google Scholar] [CrossRef]
  6. He, Q.; Head, K.L.; Ding, J. PAMSCOD: Platoon-based arterial multi-modal signal control with online data. Transp. Res. Part C Emerg. Technol. 2012, 20, 164–184. [Google Scholar] [CrossRef]
  7. Liu, Y.; Chang, G.L.; Yu, J. An Integrated Control Model for Freeway Corridor Under Nonrecurrent Congestion. IEEE Trans. Veh. Technol. 2011, 60, 1404–1418. [Google Scholar]
  8. Ye, L.; Wang, W.; Xing, L.; Wang, H.; Dong, C.Y. Improving traffic efficiency of highway by integration of adaptive cruise control and variable speed limit control. J. Jilin Univ. (Eng. Technol. Ed.) 2017, 47, 1420–1425. [Google Scholar]
  9. Zheng, J.F.; Liu, H.X. Estimating traffic volumes for signalized intersections using connected vehicle data. Transp. Res. Part C Emerg. Technol. 2017, 79, 347–362. [Google Scholar] [CrossRef]
  10. Sun, W.L.; Zheng, J.F.; Liu, H.X. A capacity maximization scheme for intersection management with automated vehicles. Transp. Res. Part C: Emerg. Technol. 2018, 94, 19–31. [Google Scholar] [CrossRef]
  11. Feng, Y.H.; Yu, C.H.; Liu, H.X. Spatiotemporal intersection control in a connected and automated vehicle environment. Transp. Res. Part C Emerg. Technol. 2018, 89, 364–383. [Google Scholar] [CrossRef]
  12. Yu, C.H.; Feng, Y.H.; Liu, H.X.; Ma, W.; Yang, X. Integrated optimization of traffic signals and vehicle trajectories at isolated urban intersections. Transp. Res. Part B Methodol. 2018, 112, 89–112. [Google Scholar] [CrossRef]
  13. Liang, X.Y.; Du, X.S.; Wang, G.L.; Han, Z. A Deep Reinforcement Learning Network for Traffic Light Cycle Control. IEEE Trans. Veh. Technol. 2019, 68, 1243–1253. [Google Scholar] [CrossRef]
  14. Emami, A.; Sarvi, M.; Bagloee, S.A. Network-wide traffic state estimation and rolling horizon-based signal control optimization in a connected vehicle environment. IEEE Trans. Intell. Transp. Syst. 2021, 23, 5840–5858. [Google Scholar] [CrossRef]
  15. Yang, K.D.; Menendez, M.; Zheng, N. Heterogeneity aware urban traffic control in a connected vehicle environment: A joint framework for congestion pricing and perimeter control. Transp. Res. Part C Emerg. Technol. 2019, 105, 439–455. [Google Scholar] [CrossRef]
  16. Rafter, C.B.; Anvari, B.; Box, S.; Cherrett, T. Augmenting Traffic Signal Control Systems for Urban Road Networks with Connected Vehicles. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1728–1740. [Google Scholar] [CrossRef]
  17. Mohebifard, R.; Hajbabaie, A. Optimal network-level traffic signal control: A benders decomposition-based solution algorithm. Transp. Res. Part B Methodol. 2019, 121, 252–274. [Google Scholar] [CrossRef]
  18. Kamal, M.A.S.; Mukai, M.; Murata, J.; Kawabe, T. Model Predictive Control of Vehicles on Urban Roads for Improved Fuel Economy. IEEE Trans. Control Syst. Technol. 2013, 21, 831–841. [Google Scholar] [CrossRef]
  19. Wu, W.; Li, P.K.; Zhang, Y. Modelling and Simulation of Vehicle Speed Guidance in Connected Vehicle Environment. Int. J. Simul. Model. 2015, 14, 145–157. [Google Scholar] [CrossRef]
  20. Lin, P.Q.; Zhuo, Q.F.; Yao, K.B.; Ran, B.; Xu, J. Solving and Simulation of Microcosmic Control Model of Intersection Traffic Flow in Connected-vehicle Network Environment. China J. Highw. Transp. 2015, 28, 82–90. [Google Scholar]
  21. Ma, W.; Zou, L.; An, K.; Gartner, N.H.; Wang, M. A partition-enabled multi-mode band approach to arterial traffic signal optimization. IEEE Trans. Intell. Transp. Syst. 2019, 20, 313–322. [Google Scholar]
  22. Xu, B.; Ban, X.J.; Bian, Y.; Li, W.; Wang, J.; Li, S.E.; Li, K. Cooperative method of traffic signal optimization and speed control of connected vehicles at isolated intersections. IEEE Trans. Intell. Transp. Syst. 2019, 20, 1390–1403. [Google Scholar]
  23. Wu, L.N.; Ci, Y.S.; Sun, Y.C.; Qi, W. Research on Joint Control of On-Ramp Metering and Mainline Speed Guidance in the Urban Expressway Based on MPC and Connected Vehicles. J. Adv. Transp. 2020, 2020, 7518087. [Google Scholar]
  24. Zhou, A.; Peeta, S.; Yang, M.; Wang, J. Cooperative signal-free intersection control using virtual platooning and traffic flow regulation. Transp. Res. Part C Emerg. Technol. 2022, 138, 103610. [Google Scholar]
  25. Chen, X.; Hu, M.; Xu, B.; Bian, Y.; Qin, H. Improved reservation-based method with controllable gap strategy for vehicle coordination at non-signalized intersections. Phys. A Stat. Mech. its Appl. 2022, 604, 127953. [Google Scholar]
  26. Liang, X.; Guler, S.I.; Gayah, V.V. Decentralized arterial traffic signal optimization with connected vehicle information. J. Intell. Transp. Syst. 2021, 27, 145–160. [Google Scholar]
  27. Chen, H.; Qiu, T.Z. Distributed Dynamic Route Guidance and Signal Control for Mobile Edge Computing-Enhanced Connected Vehicle Environment. IEEE Trans. Intell. Transp. Syst. 2022, 23, 12251–12262. [Google Scholar]
  28. Yang, K.D.; Zheng, N.; Menendez, M. Multi-scale Perimeter Control Approach in a Connected-Vehicle Environment. Transp. Res. Procedia 2017, 23, 101–120. [Google Scholar]
  29. Wang, M.; Daamen, W.; Hoogendoorn, S.P.; van Arem, B. Rolling horizon control framework for driver assistance systems. Part I: Mathematical formulation and non-cooperative systems. Transp. Res. Part C Emerg. Technol. 2014, 40, 271–289. [Google Scholar]
  30. Ubiergo, G.A.; Jin, W.L. Mobility and environment improvement of signalized networks through Vehicle-to-Infrastructure (V2I) communications. Transp. Res. Part C Emerg. Technol. 2016, 68, 70–82. [Google Scholar]
  31. Wan, N.; Vahidi, A.; Luckow, A. Optimal speed advisory for connected vehicles in arterial roads and the impact on mixed traffic. Transp. Res. Part C Emerg. Technol. 2016, 69, 548–563. [Google Scholar] [CrossRef]
  32. Liu, X.; Xu, J.; Zheng, K.; Zhang, G.; Liu, J.; Shiratori, N. Throughput Maximization with an AoI Constraint in Energy Harvesting D2D-Enabled Cellular Networks: An MSRA-TD3 Approach. IEEE Trans. Wirel. Commun. 2025, 24, 1448–1466. [Google Scholar]
  33. He, X.; Liu, H.X.; Liu, X. Optimal vehicle speed trajectory on a signalized arterial with consideration of queue. Transp. Res. Part C Emerg. Technol. 2015, 61, 106–120. [Google Scholar]
  34. Tang, T.Q.; Yi, Z.Y.; Zhang, J.; Wang, T.; Leng, J.Q. A speed guidance strategy for multiple signalized intersections based on car-following model. Phys. A Stat. Mech. its Appl. 2018, 496, 399–409. [Google Scholar]
  35. Zhao, S.; Zhang, K. Online predictive connected and automated eco-driving on signalized arterials considering traffic control devices and road geometry constraints under uncertain traffic conditions. Transp. Res. Part B Methodol. 2021, 145, 80–117. [Google Scholar]
  36. He, Z.C.; Kang, H.; Li, E.; Zhou, E.L.; Cheng, H.T.; Huang, Y.Y. Coordinated control of heterogeneous vehicle platoon stability and energy-saving control strategies. Phys. A Stat. Mech. its Appl. 2022, 606, 128155. [Google Scholar]
  37. Wu, W.; Ma, W.J.; Yang, X.G. Dynamic speed-based signal offset optimization model within vehicle infrastructure integration environment. Control Theory Appl. 2014, 31, 519–524. [Google Scholar]
  38. Tajalli, M.; Hajbabaie, A. Distributed optimization and coordination algorithms for dynamic speed optimization of connected and autonomous vehicles in urban street networks. Transp. Res. Part C Emerg. Technol. 2018, 95, 497–515. [Google Scholar]
  39. Wang, P.W.; Jiang, Y.L.; Lin, X.; Zhao, Y.; Li, Y. A joint control model for connected vehicle platoon and arterial signal coordination. J. Intell. Transp. Syst. 2020, 24, 81–92. [Google Scholar]
  40. Bie, Y.M.; Xiong, X.Y.; Yan, Y.D.; Qu, X. Dynamic headway control for high-frequency bus line based on speed guidance and intersection signal adjustment. Comput.-Aided Civ. Infrastruct. Eng. 2020, 35, 4–25. [Google Scholar]
  41. Liu, J.H.; Lin, P.Q.; Ran, B. A Reservation-Based Coordinated Transit Signal Priority Method for Bus Rapid Transit System with Connected Vehicle Technologies. IEEE Intell. Transp. Syst. Mag. 2020, 13, 17–30. [Google Scholar]
  42. Lu, K.; Tian, X.; Jiang, S.Y.; Xu, J.M.; Wang, Y.H. Optimization model for regional green wave coordinated control based on ring-and-barrier structure. J. Intell. Transp. Syst. 2020, 26, 68–80. [Google Scholar]
  43. Lu, K.; Tian, X.; Jiang, S.; Lin, Y.; Zhang, W. Optimization Model of Regional Green Wave Coordination Control for the Coordinated Path Set. IEEE Trans. Intell. Transp. Syst. 2023, 24, 7000–7011. [Google Scholar]
  44. Milanes, V.; Shladover, S.E. Modeling cooperative and autonomous adaptive cruise control dynamic responses using experimental data. Transp. Res. Part C Emerg. Technol. 2014, 48, 285–300. [Google Scholar]
  45. Al-Jhayyish, A.M.H.; Schmidt, K.W. Feedforward Strategies for Cooperative Adaptive Cruise Control in Heterogeneous Vehicle Strings. IEEE Trans. Intell. Transp. Syst. 2018, 19, 113–122. [Google Scholar]
  46. Rajamani, R. Vehicle Dynamics and Control; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  47. Ploeg, J.; Scheepers, B.T.M.; van Nunen, E.; van de Wouw, N.; Nijmeijer, H. Design and experimental evaluation of cooperative adaptive cruise control. In Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), Washington, DC, USA, 5–7 October 2011; pp. 260–265. [Google Scholar]
  48. Segata, M.; Bloessl, B.; Joerer, S.; Sommer, C.; Dressler, F. Towards inter-vehicle communication strategies for platooning support. In Proceedings of the 7th International Workshop on Communication Technologies for Vehicles (Nets4Cars-Fall), St. Petersburg, Russia, 6–8 October 2014; pp. 1–6. [Google Scholar]
  49. Ma, D.F.; Chen, X.; Wu, X.D.; Jin, S. Mixed-coordinated Decision-making Method for Arterial Signals Based on Reinforcement Learning. J. Transp. Syst. Eng. Inf. Technol. 2022, 22, 145–153. [Google Scholar]
  50. Liu, X.M.; Tang, S.H. Green Wave Coordinated Control Method Based on Continuously Passing Vehicles. J. Transp. Syst. Eng. Inf. Technol. 2012, 12, 34–40. [Google Scholar]
  51. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  52. Van der Pol, E.; Oliehoek, F.A. Coordinated Deep Reinforcement Learners for Traffic Light Control. In Proceedings of the Learning, Inference and Control of Multi-Agent Systems (at NIPS 2016), Barcelona, Spain, 5–10 December 2016; Volume 8, pp. 21–38. [Google Scholar]
  53. Gao, J.; Shen, Y.; Liu, J.; Ito, M.; Shiratori, N. Adaptive traffic signal control: Deep reinforcement learning algorithm with experience replay and target network. arXiv 2017, arXiv:1705.02755. [Google Scholar]
  54. Ge, H.; Song, Y.; Wu, C.; Ren, J.; Tan, G. Cooperative deep Q-learning with Q-value transfer for multi-intersection signal control. IEEE Access 2019, 7, 40797–40809. [Google Scholar]
  55. Kodama, N.; Harada, T.; Miyazaki, K. Traffic Signal Control System Using Deep Reinforcement Learning with Emphasis on Reinforcing Successful Experiences. IEEE Access 2022, 10, 128943–128950. [Google Scholar]
  56. Meng, X.B.; Gao, X.Z.; Lu, L.; Liu, Y.; Zhang, H. A new bio-inspired optimisation algorithm: Bird Swarm Algorithm. J. Exp. Theor. Artif. Intell. 2016, 28, 673–687. [Google Scholar]
  57. Wu, J.; Wang, Y.G.; Burrage, K.; Tian, Y.C.; Lawson, B.; Ding, Z. An improved firefly algorithm for global continuous optimization problems. Expert Syst. Appl. 2020, 149, 113340. [Google Scholar]
Figure 1. Schematic diagram of the practical arterial green wave coordinated control model.
Figure 2. Diagram of optimization model.
Figure 3. Schematic diagram of multi-level speed guidance target state.
Figure 4. Schematic diagram of speed guidance two-stage method.
Figure 5. Schematic diagram of speed guidance four-stage method.
Figure 6. Schematic diagram of speed guidance without stopping.
Figure 7. Vehicles travelling on the lane between intersections.
Figure 8. Bi-level combinatorial optimization method framework.
Figure 9. Flowchart of the DQNGA algorithm.
Figure 10. Framework of the lower-layer control based on DQN.
Figure 11. The hybrid learning framework of reward space.
Figure 12. VISSIM visual interface of Jiangyang middle road section.
Figure 13. The VISSIM visual interface of Wenchang road section.
Figure 14. The traffic volume of Jiangyang middle road section’s intersections in each period.
Figure 15. The traffic volume of Wenchang west road section’s intersections in each period.
Figure 16. Performance results with different algorithms. (a) The results of reward with different algorithms. (b) The results of computational efficiency with different algorithms.
Figure 17. The standard deviation results of loss function under different discount rates.
Figure 18. The loss function curves under different discount rates at the Jiangyang middle road section. (a) The loss function curves when discount rate changes from 0.01 to 0.50 at the Jiangyang middle road section. (b) The loss function curves when discount rate changes from 0.50 to 0.99 at the Jiangyang middle road section.
Figure 19. The loss function curves under different discount rates at the Wenchang road section. (a) The loss function curves when discount rate changes from 0.01 to 0.50 at the Wenchang road section. (b) The loss function curves when discount rate changes from 0.50 to 0.99 at the Wenchang road section.
Figure 20. The trajectories of the MAXBAND model with Jiangyang middle and Wenchang road section. (a) The trajectories of Jiangyang middle road section. (b) The trajectories of Wenchang road section. (c) The trajectories of Jiangyang middle road section; (d) The trajectories of Wenchang road section.
Figure 21. The speeds and trajectories of the proposed model with Jiangyang middle and Wenchang road section. (a) The speeds of Jiangyang middle road section. (b) The speeds of Wenchang road section. (c) The trajectories of Jiangyang middle road section. (d) The trajectories of Wenchang road section.
Figure 22. The convergence curves, fitness function variations and computational time of GA with different algorithms. (a) The fitness function variations. (b) The convergence curves. (c) The results for computational time when the function converges.
Figure 23. Continuously passing vehicles at the Jiangyang Middle Road Section and Wenchang West Road Section in morning peak period. (a) The Jiangyang Middle Road Section. (b) The Wenchang West Road Section.
Figure 24. Continuously passing vehicles at the Jiangyang Middle Road Section and Wenchang West Road Section in flat peak period. (a) The Jiangyang Middle Road Section. (b) The Wenchang West Road Section.
Figure 25. Continuously passing vehicles at the Jiangyang Middle Road Section and Wenchang West Road Section in evening peak period. (a) The Jiangyang Middle Road Section. (b) The Wenchang West Road Section.
Figure 26. The final comparison results for average stops and delays with two control models in Jiangyang middle road section. (a) The average stops with two control models. (b) The average delays with two control models.
Figure 27. The final comparison results for average stops and delays with two control models in Wenchang west road section. (a) The average stops with two control models. (b) The average delays with two control models.
Figure 28. The results of the green wave coordinated evaluation index under different traffic saturation levels: (a) under traffic saturation level of 0.2; (b) under traffic saturation level of 0.4; (c) under traffic saturation level of 0.6; (d) under traffic saturation level of 0.8.
Table 1. Definition of Parameters.
Symbol | Quantity
$i$ | The $i$th optimization cycle
$j$ | The $j$th vehicle
$V_{ij}^{end}$ | The target speed of the vehicle within the control range, m/s
$V_{\max}$ | Maximum permissible vehicle speed, m/s
$t_{ij}^{end}$ | The target time of vehicle arrival at downstream intersection, s
$h_{ij}$ | The time headway between the current vehicle and the preceding vehicle, s
$l_{ij}^{end}$ | The target displacement of the vehicle within the control range, m
$h_t$ | The time safe headway, s
$S_0$ | The safe parking distance, m
$len_{i(j-1)}$ | The front vehicle’s length of the current vehicle, m
$V_{ij}^{0}$ | The initial speed of the vehicle entering the control range, m/s
$V_{i,end}^{0}$ | The initial speed of the last vehicle entering the control range in the optimization cycle, m/s
$t_{ij}^{0},\ T_{ij}^{0}$ | The initial time of vehicle entering the control range, s
$t_{ij}^{2},\ T_{ij}^{4}$ | The time when the vehicle reaches the downstream intersection, s
$t_{ij}^{1},\ T_{ij}^{1}$ | The time when the vehicle is at the end of the first uniform speed change stage, s
$T_{ij}^{2}$ | The time when the vehicle ends its first uniform motion at the guiding speed, s
$T_{ij}^{3}$ | The time when the vehicle is at the end of the second uniform speed change stage, s
$L$ | The total displacement of the vehicle within the control range, m
$l_1,\ L_1$ | The displacement of vehicle during the first uniform speed change stage, m
$l_2,\ L_4$ | The displacement of the vehicle in the stage of uniform motion at $V_{\max}$, m
$L_2$ | The displacement of the vehicle during the first uniform motion stage at the guiding speed, m
$L_3$ | The displacement of the vehicle during the second uniform speed change stage, m
$V_{ij}^{1}$ | The guiding speed of the vehicle entering the control range, m/s
$\Delta S(t)$ | The time distance function between the current vehicle and the front vehicle
$S_{i(j-1)}(t)$ | The time distance function of the front vehicle
$S_{ij}(t)$ | The time distance function of the current vehicle
$S$ | The safe space headway, m
$t_{i,end}^{0}$ | The initial time when the last vehicle enters the control range in the optimization cycle, s
$P$ | The control objective expression of forward green wave bandwidth.
$H$ | The control objective expression of reverse green wave bandwidth.
Table 2. Definitions of the variables.
Variable | Definition
$t_c$ | The time when the $c$th gather–disperse wave intersects with the $(c-1)$th gather–disperse wave at the downstream intersection
$v_1$ | The average speed corresponding to traffic flow $q_{g2}$
$v_2$ | The average speed corresponding to traffic flow $q_{r1}$
$v_3$ | The average speed corresponding to traffic flow $q_{r2}$
$v_4$ | The average speed corresponding to traffic flow $q_{g1}$
$W_1$~$W_5$ | The gather–disperse wave speed
$q_{g1}$ | The saturation flow of queuing vehicles in coordinated phase when they release
$q_{g2}$ | The average flow of vehicles flowing out after the queue of vehicles has dissipated late in the green light of coordinated phase
$q_{r1}$ | The average flow of queuing vehicles driving out of the intersection in the pre-reddish phase of the coordinated phase interface with the non-coordinated phase
$q_{r2}$ | The average vehicle outflow after the coordinated phase red time late articulated uncoordinated phase queue has dissipated
$t_{g1}$ | The time for the vehicle queue to dissipate during the green time of coordinated phase
$t_{r1}$ | The time for completion of uncoordinated phase queue dissipation for coordinated phase articulation
$C'$ | Public cycle of upstream and downstream intersections
$T_g$ | The upstream intersections with the green time of the coordinated phases
$T_r$ | The upstream intersections with the red time of the coordinated phases
$T_{gb}$ | Time difference between the beginning of the green wave at the upstream intersection and the beginning of the green time at the coordinated phase in the same cycle
$T_{ge}$ | The time difference between the end of the green wave at the upstream intersection and the start of the green light at the coordinated phase in the same cycle
Table 3. Some sample data.
Vehicle_ID | Global_Time | Local_X | Local_Y | Global_X | Global_Y | v | Lane_ID | Following | Space_Headway | Time_Headway
515 | 1,118,848,075,000 | 30.034 | 188.062 | 6,451,203.729 | 1,873,252.549 | 13 | 3 | 523 | 119.1 | 5.11
2224 | 1,113,437,421,700 | 41.429 | 472.901 | 6,042,814.264 | 2,133,542.012 | 14 | 4 | 2211 | 53.34 | 2.01
1033 | 1,118,848,324,700 | 6.202 | 1701.14 | 6,452,347.673 | 1,872,258.452 | 13 | 1 | 1040 | 38.81 | 0.92
744 | 1,118,848,181,200 | 28.878 | 490.086 | 6,451,422.353 | 1,873,041.018 | 15 | 3 | 752 | 37.8 | 1.54
Table 4. Definitions of Variables.
| Model Parameters | Value |
| --- | --- |
| Replay memory D | 40,000 |
| The number of training times | 80,000 |
| Batch size | 64 |
| Learning rate | 0.0001 |
| Discount rate | [0.01, 0.99] |
| ReLU activation function | 0.001 |
| Action interval Δt (s) | 3 |
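As a minimal sketch, the settings in Table 4 could be collected in a single configuration object. The class name `DQNConfig`, its field names, and the `make_replay_memory` helper are hypothetical and do not come from the paper. The "ReLU activation function" entry (0.001) is left out because the table does not say which quantity it denotes, and the discount rate is kept as the reported range rather than a single value.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Tuple


@dataclass(frozen=True)
class DQNConfig:
    """Hypothetical container for the Table 4 settings (field names are assumptions)."""
    replay_memory_size: int = 40_000                     # Replay memory D
    training_steps: int = 80_000                         # The number of training times
    batch_size: int = 64                                 # Batch size
    learning_rate: float = 1e-4                          # Learning rate
    discount_range: Tuple[float, float] = (0.01, 0.99)   # Discount rate, reported as a range
    action_interval_s: int = 3                           # Action interval Δt, in seconds


def make_replay_memory(cfg: DQNConfig) -> Deque[tuple]:
    """Bounded FIFO buffer: once full, the oldest transition is dropped."""
    return deque(maxlen=cfg.replay_memory_size)


cfg = DQNConfig()
memory = make_replay_memory(cfg)
```

A bounded `deque` is one common way to hold replay transitions; whether the authors' implementation uses it is not stated in the source.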
Table 5. The comparison results for average speed and signal offset of each section with MAXBAND and the proposed model in morning peak period.
| Road Section | Intersection | Signal Offset of MAXBAND | Signal Offset of the Proposed Model | Forward and Reverse Comprehensive Green Wave Bandwidth of MAXBAND | Forward and Reverse Comprehensive Green Wave Bandwidth of the Proposed Model |
| --- | --- | --- | --- | --- | --- |
| Jiangyang Middle Road Section | Intersection of Jiangyang Middle Road and Huidong Road | 0 | 15 | (30, 27) | (34, 30) |
| | Intersection of Jiangyang Middle Road and Xingcheng East Road | 126 | 112 | | |
| | Intersection of Jiangyang Middle Road and Yangzi River Road | 77 | 85 | | |
| Wenchang West Road Section | Intersection of Wenchang West Road and Guozhan Road | 72 | 77 | (21, 18) | (23, 21) |
| | Intersection of Wenchang West Road and Ruiyang Middle Road | 0 | 11 | | |
| | Intersection of Wenchang West Road and Baixiang Road | 45 | 38 | | |
Table 6. The comparison results for average speed and signal offset of each section with MAXBAND and the proposed model in flat peak period.
| Road Section | Intersection | Signal Offset of MAXBAND | Signal Offset of the Proposed Model | Forward and Reverse Comprehensive Green Wave Bandwidth of MAXBAND | Forward and Reverse Comprehensive Green Wave Bandwidth of the Proposed Model |
| --- | --- | --- | --- | --- | --- |
| Jiangyang Middle Road Section | Intersection of Jiangyang Middle Road and Huidong Road | 0 | 113 | (35, 33) | (38, 35) |
| | Intersection of Jiangyang Middle Road and Xingcheng East Road | 63 | 78 | | |
| | Intersection of Jiangyang Middle Road and Yangzi River Road | 117 | 101 | | |
| Wenchang West Road Section | Intersection of Wenchang West Road and Guozhan Road | 36 | 28 | (26, 24) | (28, 27) |
| | Intersection of Wenchang West Road and Ruiyang Middle Road | 0 | 119 | | |
| | Intersection of Wenchang West Road and Baixiang Road | 45 | 52 | | |
Table 7. The comparison results for average speed and signal offset of each section with MAXBAND and the proposed model in evening peak period.
| Road Section | Intersection | Signal Offset of MAXBAND | Signal Offset of the Proposed Model | Forward and Reverse Comprehensive Green Wave Bandwidth of MAXBAND | Forward and Reverse Comprehensive Green Wave Bandwidth of the Proposed Model |
| --- | --- | --- | --- | --- | --- |
| Jiangyang Middle Road Section | Intersection of Jiangyang Middle Road and Huidong Road | 0 | 17 | (29, 26) | (32, 29) |
| | Intersection of Jiangyang Middle Road and Xingcheng East Road | 126 | 118 | | |
| | Intersection of Jiangyang Middle Road and Yangzi River Road | 77 | 91 | | |
| Wenchang West Road Section | Intersection of Wenchang West Road and Guozhan Road | 72 | 82 | (23, 20) | (26, 24) |
| | Intersection of Wenchang West Road and Ruiyang Middle Road | 0 | 11 | | |
| | Intersection of Wenchang West Road and Baixiang Road | 45 | 35 | | |
Table 8. Comparison of simulation results for the Jiangyang Middle Road section.
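| Period | Average Stops: MAXBAND | Average Stops: The Proposed Model | Average Stops: Improvement Effect | Average Delay (s): MAXBAND | Average Delay (s): The Proposed Model | Average Delay (s): Improvement Effect |
| --- | --- | --- | --- | --- | --- | --- |
| Morning Peak Period | 2.75 | 1.277 | 53.56% | 53 | 43 | 18.87% |
| Flat Peak Period | 1.121 | 0.693 | 38.18% | 37 | 28 | 24.32% |
| Evening Peak Period | 2.295 | 1.224 | 46.67% | 47 | 37 | 21.28% |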
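The "Improvement Effect" values in Table 8 are consistent with a relative reduction against the baseline, (MAXBAND - proposed) / MAXBAND. The formula is inferred from the reported numbers and is not stated in the table itself; the function and variable names below are illustrative only. A short sketch that recomputes the column from the reported pairs:

```python
def improvement(baseline: float, proposed: float) -> float:
    """Relative reduction versus the baseline, in percent."""
    return (baseline - proposed) / baseline * 100.0


# (MAXBAND, proposed model) pairs as reported in Table 8
table8 = {
    "Morning Peak Period": {"stops": (2.75, 1.277), "delay": (53, 43)},
    "Flat Peak Period": {"stops": (1.121, 0.693), "delay": (37, 28)},
    "Evening Peak Period": {"stops": (2.295, 1.224), "delay": (47, 37)},
}

# Recompute every improvement value, rounded to two decimals like the table
results = {
    period: {metric: round(improvement(*pair), 2) for metric, pair in metrics.items()}
    for period, metrics in table8.items()
}

for period, metrics in results.items():
    print(period, metrics)
```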
Table 9. Comparison of simulation results for the Wenchang West Road section.
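| Period | Average Stops: MAXBAND | Average Stops: The Proposed Model | Average Stops: Improvement Effect | Average Delay (s): MAXBAND | Average Delay (s): The Proposed Model | Average Delay (s): Improvement Effect |
| --- | --- | --- | --- | --- | --- | --- |
| Morning Peak Period | 3.9 | 1.95 | 50.00% | 57 | 42 | 28.85% |
| Flat Peak Period | 1.913 | 1.212 | 36.64% | 36 | 30 | 16.67% |
| Evening Peak Period | 3.25 | 1.888 | 41.91% | 48 | 41 | 14.58% |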
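The headline figures in the abstract (average delay reduced by 20.76%, number of stops reduced by 44.49%) match the unweighted mean of the six "Improvement Effect" values per metric reported across Tables 8 and 9. That the authors aggregated the results this way is an assumption; the sketch below only shows that the arithmetic agrees, and the variable names are illustrative.

```python
from statistics import mean

# "Improvement Effect" percentages as reported in Tables 8 and 9
# (three periods for Jiangyang Middle Road, then three for Wenchang West Road)
stops_improvement = [53.56, 38.18, 46.67, 50.00, 36.64, 41.91]
delay_improvement = [18.87, 24.32, 21.28, 28.85, 16.67, 14.58]

# Unweighted means over both road sections and all three periods
avg_stops = round(mean(stops_improvement), 2)
avg_delay = round(mean(delay_improvement), 2)

print(f"average stops reduction: {avg_stops}%")
print(f"average delay reduction: {avg_delay}%")
```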
Table 10. Differences in performance with different conditions and algorithms.
| Road Section | Metric | MAXBAND | MULTIBAND | PPO | Proposed DQNGA |
| --- | --- | --- | --- | --- | --- |
| Jiangyang Middle | Average Delay (s) | 53 | 48 | 45 | 37 |
| Wenchang West | Average Stops | 3.9 | 3.2 | 2.8 | 1.95 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Dong, L.; Xie, X.; Zhang, L.; Li, S.; Yang, Z. A Multi-Level Speed Guidance Cooperative Approach Based on Bidirectional Periodic Green Wave Coordination Under Intelligent and Connected Environment. Sensors 2025, 25, 2114. https://doi.org/10.3390/s25072114
