To evaluate the effectiveness of the traffic signal transition control method, this section uses VISSIM 4.30 simulation software for verification. First, statistical analysis of the traffic volume and road channelization at the arterial intersections is required. Next, the arterial intersection plan is drawn in AUTOCAD 2023, and traffic flow statistics are collected via secondary development of VISSIM in Python. The arterial intersection signal transition scheme is then verified using "VISSIM + Python" simulation. To construct the network, the following steps are performed.
Step 1: Edge Mode: Create four two-way edges of length 500–1000 m that share a common junction.
Step 2.1: Assign junction IDs JW, JN, JE, and JS to the junctions in each direction, and TL to the traffic-light junction.
Step 2.2: Assign edge IDs as N2TL, TL2N, S2TL, TL2S, E2TL, TL2E, W2TL and TL2W based on the direction of the edges.
Step 3.1: A traffic transition signal light is added to junction TL.
Step 3.2: Traffic phases NSA, NSLA, EWA and EWLA are configured with one green and one yellow phase for each, making a total of eight phases with IDs 0 through 7.
Step 4: Connection Mode: Connections are edited so that the tidal lane of a four-lane edge can be switched to left-turn only, the second lane is a dedicated left-turn lane, and the remaining two lanes are through lanes.
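The network layout defined in Steps 1–4 can be summarized programmatically. The sketch below uses plain Python data structures for illustration only; it is not the VISSIM/Python secondary-development API, and the `lane_movements` helper is a hypothetical stand-in for the connection editing of Step 4:

```python
# Illustrative summary of the network description from Steps 1-4.
# These are plain Python data structures, not a simulator API.

JUNCTIONS = ["JW", "JN", "JE", "JS", "TL"]  # four outer junctions + central traffic light

# Edge IDs follow the "<from>2<to>" convention of Step 2.2.
EDGES = ["N2TL", "TL2N", "S2TL", "TL2S", "E2TL", "TL2E", "W2TL", "TL2W"]

# Eight phases (Step 3.2): one green and one yellow phase per phase group,
# with IDs 0 through 7.
PHASES = {0: "NSA_green",  1: "NSA_yellow",
          2: "NSLA_green", 3: "NSLA_yellow",
          4: "EWA_green",  5: "EWA_yellow",
          6: "EWLA_green", 7: "EWLA_yellow"}

def lane_movements(tidal_open: bool) -> dict:
    """Allowed movements on a four-lane inbound edge (Step 4):
    lane 0 is the tidal lane (left turn only when opened),
    lane 1 is a dedicated left-turn lane, lanes 2-3 go straight."""
    return {0: "left" if tidal_open else "closed",
            1: "left", 2: "straight", 3: "straight"}
```

This makes the lane discipline of Step 4 explicit: only the tidal lane changes state, while the other three lanes keep fixed movements.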
7.2. The Results of Signal Transition Control Based on OAS Deep Q-Learning
This section compares the performance of OAS Deep Q-Learning with two categories of baseline methods: traditional methods (MaxPressure [49] and FixedTime [50]) and deep Q-learning methods, namely Multi-Agent Deep Reinforcement Learning (MARL) [51], Meta Variationally Intrinsic Motivated Reinforcement Learning (MetaVIMRL) [52], and Cooperative Multi-Agent Deep Q-Network (CMDQN) [53], based on the training results. For fair comparison, the parameters and simulation conditions of the traditional methods and the DQL methods [51,52,53] are identical.
Table 9 presents the parameters of the OAS Deep Q-Learning reinforcement learning network described in this paper.
As shown in Figure 23, the average queue length of vehicles at the traffic intersection decreases overall with each training step, and the average delay also decreases progressively during training. The improving performance of our traffic-signal controller agent on both counts (average queue length and average delay) shows that our OAS Deep Q-Learning model is successfully learning to adapt the transitions of phase offsets in various tidal-lane scenarios to the different opening states of the tidal lanes at the related intersections. Evaluated by average delay and average queue length, the proposed method outperforms the other methods in all tidal-lane signal transition scenarios, achieving the smallest average delay (73.12) and the shortest average queue length (92.35), and it outperforms the other Deep Q-learning methods in terms of both data efficiency and performance.
In general, the desired outcome of the training curves is convergence, as the agent learns from historical experience.
Figure 24 demonstrates that the training curves of the proposed method and all DQL baseline algorithms [51,52,53] exhibit an upward trend before converging. Among them, the training curves of CMDQN and MARL in Figure 24b fluctuate more, and CMDQN performs worse. This is because CMDQN and MARL are similar in that each agent updates its own policy independently, which makes the environment non-stationary; this non-stationarity results in slower and less stable training. As anticipated, MetaVIMRL outperforms MARL because each local agent takes into account policy information from its surroundings, thereby mitigating the impact of partial observability on convergence. OAS Deep Q-Learning exhibits the best performance, with the fastest convergence rate, the smoothest convergence curve, and the highest reward. In other words, Figure 24 shows that the average queue lengths and average delays improve with training, and the method proposed in this paper converges faster and exhibits less fluctuation than the baselines.
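One simple, illustrative way to check the convergence behavior discussed above is to smooth a training curve with a rolling mean and test whether it has flattened. The function names, window, and tolerance below are assumptions for illustration, not the convergence criterion used in the paper:

```python
import statistics

def rolling_mean(values, window=50):
    """Rolling mean used to smooth a noisy training curve."""
    return [statistics.mean(values[max(0, i - window + 1):i + 1])
            for i in range(len(values))]

def converged(values, window=50, tol=0.01):
    """Declare convergence when the smoothed curve changes by less than
    tol (relative) over the last `window` steps."""
    if len(values) < 2 * window:
        return False
    sm = rolling_mean(values, window)
    recent, earlier = sm[-1], sm[-window]
    return abs(recent - earlier) <= tol * max(abs(earlier), 1e-9)
```

A flat reward curve passes this check, while a curve still trending upward (like the early portions of the curves in Figure 24) does not.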
The loss function metric indicates the performance of the model’s training. A smaller value of the loss function indicates better training.
Figure 25 shows the change in the loss function for each algorithm during training; all algorithms eventually converge. FixedTime and MaxPressure have larger loss values: FixedTime's loss function converges at approximately 1520 training steps with an average of 2384.56, while MaxPressure's loss function has an average of 2449.73. The proposed method shows small fluctuations in the loss function, which converges at approximately 1000 training steps with an average value of 519.37. The MARL, MetaVIMRL, and CMDQN algorithms differ significantly from the algorithm proposed in this paper, mainly because of differences in their network training and the complexity of their training parameter settings. The loss function of MARL converges at around 1500 training steps with an average value of 2034.40, that of MetaVIMRL converges at around 1470 training steps with an average value of 1179.90, and that of CMDQN has an average value of 2280.27. Thus, the OAS Deep Q-Learning model is more adaptable than the other algorithms.
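The post-convergence averages quoted above can be reproduced from a loss curve with a simple helper. The sketch below is illustrative: `avg_loss_after` is a hypothetical helper, and `REPORTED_AVG_LOSS` simply restates the average loss values given in the text:

```python
# Post-convergence average loss values reported in the text.
REPORTED_AVG_LOSS = {"FixedTime": 2384.56, "MaxPressure": 2449.73,
                     "MARL": 2034.40, "MetaVIMRL": 1179.90,
                     "CMDQN": 2280.27, "OAS Deep Q-Learning": 519.37}

def avg_loss_after(loss_curve, converge_step):
    """Mean loss over the portion of the curve after the convergence step."""
    tail = loss_curve[converge_step:]
    return sum(tail) / len(tail)
```

Applying `avg_loss_after` to each curve at its observed convergence step yields the per-algorithm averages, and the proposed method has the smallest value in `REPORTED_AVG_LOSS`.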
To evaluate the computational efficiency of the proposed method, we compared it with the five other methods using runtime as the evaluation metric. Table 10 compares the computational cost of all methods in terms of runtime. Across several experiments, the proposed method avoids the complex learning process of the other deep Q-learning methods and outperforms both the traditional methods and the other deep Q-learning methods in runtime, with the shortest average runtime of 32 s. It can also perform high-level computation with real-time parameter input.
To further verify the robustness and reliability of the results, the algorithm presented in this paper must be compared with the other algorithms. It is important to note that achieving optimal control in complex environments requires sufficient access to each part of the state-action space, which may otherwise lead to overfitting and poor control performance. To ensure robust performance on untested states, the agent must either be trained on a large dataset that covers as many state-action variations as possible, or the state and action spaces must be simplified; both approaches, however, may impact the controller's convergence and optimality. Therefore, it is necessary to reduce the dimensionality of the state space while increasing the action space to achieve improved control performance.
A box plot (shown in Figure 26) displays the data of the proposed method and the other algorithms. In Figure 26, x̄ represents the average value, s the standard deviation, and n the number of experimental runs used to obtain the best value. The comparison of average queue length and average delay reveals a correlation between the chosen performance metrics: shorter queue lengths result in smaller delays.
Figure 26a illustrates that the algorithm proposed in this paper has shorter queue lengths than the other algorithms by 7.14%, 14.29%, 21.43%, 39.13%, and 50.00%, respectively.
Figure 26b illustrates that the algorithm presented in this paper has a smaller delay time compared to other algorithms by 59.18%, 58.33%, 57.45%, 56.52%, and 53.49%, respectively.
The sensitivity results were verified to be statistically significant using left-tailed hypothesis testing. To perform hypothesis testing on paired means, the absolute values of the measurements obtained for a particular simulation by executing each of the five baseline algorithms are subtracted from the absolute values of the measurements obtained by executing OAS Deep Q-Learning. During execution, the DNN model is evaluated under the optimal policy π* for all 10,000 simulations and the performance statistics are recorded; evaluating π* means taking the action with the maximum Q-value in the state the agent occupies at each time step t. The performance statistics generated by executing the five baseline algorithms are recorded in the same way.
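The policy evaluation and paired-difference construction described above can be sketched in a few lines. This is a minimal illustration; `q_values` and the metric lists are hypothetical stand-ins, not the paper's implementation:

```python
def greedy_action(q_values):
    """Evaluate the learned policy pi*: at each time step the agent takes
    the action with the maximum Q-value in its current state.
    `q_values` is the vector of Q-values for that state."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def paired_differences(oas_metrics, baseline_metrics):
    """Paired differences for the left-tailed test: |OAS| - |baseline|,
    one value per simulation run."""
    return [abs(a) - abs(b) for a, b in zip(oas_metrics, baseline_metrics)]
```

If OAS Deep Q-Learning consistently produces smaller measurements (queue lengths or delays), the paired differences are predominantly negative, which is exactly what the left-tailed test checks.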
- (1)
Average queue length
We state the null and alternative hypotheses as follows:
H0: There is no difference between the true means υ_OASDQLwt and υ_Othermethodswt, and the difference observed in the sample means x̄_OASDQLwt and x̄_Othermethodswt is a matter of chance, i.e., υ_diffwt = 0.
HA: Average queue length for all vehicles in traffic simulations executed using OAS Deep Q-Learning is on average less than that of the same traffic simulations executed using the five algorithms, i.e., υ_diffwt < 0.
Since the standard deviation of the actual distribution is not known, the t-distribution was used for hypothesis testing, with a confidence level of 95% (significance = 0.05), 99 degrees of freedom (100 − 1), and a left-tailed test. This gives t_c = −1.66, where t_c is the critical value of the t-score for a 10,000-sample mean, below which it is safe to reject the null hypothesis H0.
Figure 27 presents the sensitivity results for average queue length, and Table 11 shows the sensitivity parameter settings for average queue length.
The t-score for the simulation sample captured above is −7.8, with p-value < 0.00001. Since the calculated p-value is far below the significance level (0.05), we safely reject H0.
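The left-tailed paired t-test described above can be sketched in Python. This is an illustrative implementation using sample statistics; the difference values in the test are hypothetical, not the paper's data:

```python
import math
import statistics

def left_tailed_paired_t(differences):
    """One-sample left-tailed t-test on paired differences
    (H0: mean difference = 0 vs HA: mean difference < 0).
    Returns the t-score, to be compared against the critical value
    t_c (about -1.66 at the 0.05 level with 99 degrees of freedom)."""
    n = len(differences)
    mean = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample standard deviation
    return mean / (sd / math.sqrt(n))

def reject_h0(t_score, t_critical=-1.66):
    """Reject H0 when the t-score falls below the left-tail critical value."""
    return t_score < t_critical
```

A strongly negative t-score, as reported above for both metrics, means the paired differences are consistently negative and H0 can safely be rejected.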
- (2)
Average delay
H0: There is no difference between the true means υ_OASDQLvqs and υ_Othermethodsvqs, and the difference observed in the sample means x̄_OASDQLvqs and x̄_Othermethodsvqs is a matter of chance, i.e., υ_diffvqs = 0.
HA: Average delay for all vehicles in traffic simulations executed using OAS Deep Q-Learning is on average less than that of the same traffic simulations executed using the five algorithms, i.e., υ_diffvqs < 0.
Since the standard deviation of the actual distribution is not known, the t-distribution was used for hypothesis testing, with a confidence level of 95% (significance = 0.05), 99 degrees of freedom (100 − 1), and a left-tailed test. This gives t_c = −1.66, where t_c is the critical value of the t-score for a 10,000-sample mean, below which it is safe to reject the null hypothesis H0.
Figure 28 presents the sensitivity results for average delay, and Table 12 shows the sensitivity parameter settings for average delay.
The t-score for the simulation sample captured above is −29.8, with p-value < 0.00001. Since the calculated p-value is far below the significance level (0.05), we safely reject H0.
In summary, the proposed method has been demonstrated to be not only viable but also more effective than the other five algorithms. OAS Deep Q-Learning greatly reduces the average queue length of vehicles at the related intersections compared with the other five algorithms, and it also reduces the average delay at those intersections.
Table 13 summarizes the results of the comparison between the proposed method and the previous algorithms presented in this work.
Table 13 shows that the proposed method is not only viable but also more effective than the previous algorithms simulated in this work. It achieves the smallest average delay (73.12) and the shortest average queue length (92.35), outperforming the other Deep Q-learning methods in both data efficiency and performance. Compared with the previous algorithms, the proposed method yields the shortest queue length and the smallest delay time, with improvements of 26.39% and 56.99% in these metrics, respectively. Its loss function fluctuates only slightly and converges at approximately 1000 training steps, with an average value of 519.37; in other words, the proposed method converges faster and fluctuates less during training. It also has the shortest average runtime, 32 s, and can perform high-level computation with real-time parameter input.
7.4. Simulation Verification
This paper compares and analyzes the proposed signal transition control method against the traditional transition control method to determine the optimal opening scheme for tidal lanes at arterial intersections. The effectiveness of the proposed model is also verified in Table 21, Table 22 and Table 23 and Figure 29, Figure 30 and Figure 31.
- (1)
Synchronous Transition Verification
Table 21. Simulation Results of Synchronous Transition.
| Method | Simulation Time (s) | Average Queue Length (m) | Average Delay (s) |
|---|---|---|---|
| Add | 0–2000 | 15.5 | 54.0 |
| Subtract | 0–2000 | 14.7 | 51.3 |
| The proposed model | 0–2000 | 11.8 | 39.3 |
Figure 29. The comparison results for synchronous transition. (a) Average queue length for synchronous transition; (b) Average delay for synchronous transition.
- (2)
Asynchronous Transition Verification
Table 22. Simulation Results of Asynchronous Transition.
| Method | Simulation Time (s) | Average Queue Length (m) | Average Delay (s) |
|---|---|---|---|
| Add | 0–2000 | 15.0 | 52.0 |
| Subtract | 0–2000 | 14.2 | 48.7 |
| The proposed model | 0–2000 | 11.8 | 38.6 |
Figure 30. The comparison results for asynchronous transition. (a) Average queue length for asynchronous transition; (b) Average delay for asynchronous transition.
- (3)
Simulation results comparison of synchronous and asynchronous transitions.
Table 23. Simulation Results of Synchronous and Asynchronous Transition.
| Transition Mode | Simulation Time (s) | Average Queue Length (m) | Average Delay (s) |
|---|---|---|---|
| Synchronous transition | 0–2000 | 11.8 | 39.3 |
| Asynchronous transition | 0–2000 | 11.8 | 38.6 |
Figure 31. The comparison results for synchronous and asynchronous transitions. (a) Average queue length results comparison of synchronous and asynchronous transitions; (b) Average delay results comparison of synchronous and asynchronous transitions.
The simulation results demonstrate that, for synchronous transition, the algorithm introduced in this paper decreases the average queue length at the related intersections by 23.87% and 19.73%, and the average vehicle delay by 27.22% and 23.39%, compared with the conventional Add and Subtract algorithms, respectively. For asynchronous transition, the average queue length at the related intersections decreases by 21.33% and 16.90%, and the average vehicle delay by 25.77% and 20.74%, respectively. Additionally, the average delay is 1.78% lower for asynchronous than for synchronous transition. The proposed algorithm therefore outperforms the traditional algorithms in both synchronous and asynchronous transitions. Moreover, it is apparent from these results that the asynchronous transition is more advantageous for maintaining traffic-flow stability in tidal-lane scenarios.
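The percentage reductions quoted above follow directly from the values in Tables 21–23. The sketch below recomputes them; `pct_reduction` is a hypothetical helper introduced only for this check:

```python
def pct_reduction(baseline, proposed):
    """Percentage reduction of the proposed model relative to a baseline,
    rounded to two decimals as in the text."""
    return round(100 * (baseline - proposed) / baseline, 2)

# Synchronous transition (Table 21): Add / Subtract vs the proposed model.
sync_queue_vs_add = pct_reduction(15.5, 11.8)       # queue length vs Add
sync_queue_vs_sub = pct_reduction(14.7, 11.8)       # queue length vs Subtract
sync_delay_vs_add = pct_reduction(54.0, 39.3)       # delay vs Add
sync_delay_vs_sub = pct_reduction(51.3, 39.3)       # delay vs Subtract

# Asynchronous transition (Table 22).
async_queue_vs_add = pct_reduction(15.0, 11.8)      # queue length vs Add
async_queue_vs_sub = pct_reduction(14.2, 11.8)      # queue length vs Subtract
async_delay_vs_add = pct_reduction(52.0, 38.6)      # delay vs Add
async_delay_vs_sub = pct_reduction(48.7, 38.6)      # delay vs Subtract

# Asynchronous vs synchronous delay (Table 23).
async_vs_sync_delay = pct_reduction(39.3, 38.6)
```

Evaluating these expressions reproduces the 23.87%, 19.73%, 27.22%, 23.39%, 21.33%, 25.77%, 20.74%, and 1.78% figures cited in the text (the queue-length reduction versus Subtract in the asynchronous case, 16.90%, follows from Table 22 as well).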