Author Contributions
Conceptualization, F.Y. and Q.L.; Data curation, F.Y.; Methodology, F.Y. and Z.L.; Project administration, X.L.; Resources, X.L.; Software, F.Y., Q.L. and X.G.; Supervision, X.L.; Validation, F.Y.; Visualization, F.Y.; Writing—original draft, F.Y.; Writing—review & editing, X.L. and Q.L. All authors have read and agreed to the published version of the manuscript.
Figure 1. Markov decision process.
Figure 2. Schematic diagram of the correlation matrix setting. Each row of the matrix encodes the correlation between one vehicle and all other vehicles as logical values 0 and 1, where 0 indicates that the actual distance is greater than the set distance and 1 indicates that it is smaller.
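A minimal sketch of how such a 0/1 correlation (adjacency) matrix could be built from pairwise vehicle positions. The function name, the threshold argument `d_max`, the use of NumPy, and the self-connections on the diagonal are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def correlation_matrix(positions, d_max):
    """Build a 0/1 correlation (adjacency) matrix from vehicle positions.

    positions: (N, 2) array of (x, y) coordinates, one row per vehicle.
    d_max: set distance; pairs closer than d_max are marked 1, others 0.
    """
    diff = positions[:, None, :] - positions[None, :, :]  # (N, N, 2) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)                  # (N, N) pairwise distances
    adj = (dist < d_max).astype(np.float32)               # 1 if closer than d_max, else 0
    np.fill_diagonal(adj, 1.0)                            # assumed: keep self-connections
    return adj
```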
Figure 3. Matrix segmentation diagram. Each row of the matrix holds the feature information of one vehicle, and the matrix is divided into upper and lower parts to separate the AVs from the HVs. The green rows indicate vehicles that are not currently present in the scenario.
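A minimal sketch of the segmentation described above, assuming the observation is an (N, F) feature matrix whose first `n_av` rows are reserved for AVs and whose all-zero rows stand for absent vehicles; names and shapes are assumptions for illustration.

```python
import numpy as np

def split_observation(obs, n_av):
    """Split the per-vehicle feature matrix into AV and HV blocks.

    obs:  (N, F) matrix, one feature row per vehicle slot; all-zero rows
          (the green part in Figure 3) stand for slots with no vehicle yet.
    n_av: number of rows reserved for autonomous vehicles (upper block).
    """
    av_feats = obs[:n_av]               # upper part: autonomous vehicles
    hv_feats = obs[n_av:]               # lower part: human-driven vehicles
    present = np.any(obs != 0, axis=1)  # mask of slots that are actually occupied
    return av_feats, hv_feats, present
```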
Figure 4. Overall structure of the model. FCN denotes the fully connected network, and GNN denotes the graph neural network.
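A minimal PyTorch sketch of an FCN + GNN Q-network of the kind the figure describes: a fully connected encoder, one adjacency-weighted message-passing layer, and a fully connected Q-value head. The class name, layer sizes, and degree normalization are assumptions; this is not the authors' architecture.

```python
import torch
import torch.nn as nn

class GraphQNetwork(nn.Module):
    """Sketch of an FCN + GNN Q-network: encode per-vehicle features,
    propagate them over the correlation matrix, and map each vehicle's
    embedding to Q-values over the discrete action grid."""

    def __init__(self, in_dim, hidden_dim, n_actions):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden_dim)   # FCN encoder
        self.gnn = nn.Linear(hidden_dim, hidden_dim)   # one message-passing layer
        self.head = nn.Linear(hidden_dim, n_actions)   # FCN Q-value head

    def forward(self, feats, adj):
        # feats: (N, in_dim) vehicle features; adj: (N, N) correlation matrix
        h = torch.relu(self.encoder(feats))
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)  # normalize by node degree
        h = torch.relu(self.gnn(adj @ h / deg))            # aggregate neighbour features
        return self.head(h)                                # (N, n_actions) Q-values
```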
Figure 5. Model action space diagram. The rows of the matrix discretize the longitudinal acceleration into discrete values, and the columns represent the vehicle's lateral lane-change actions.
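A small sketch of how a flat discrete action index can be mapped onto such a grid. The particular acceleration levels and the three lateral commands are assumed values chosen for illustration only.

```python
import numpy as np

# Hypothetical discretisation: 5 longitudinal acceleration levels (rows)
# and 3 lateral commands (columns), giving a 5 x 3 action grid.
ACCELERATIONS = np.linspace(-3.0, 3.0, 5)   # m/s^2, assumed values
LANE_CHANGES = [-1, 0, 1]                   # change left, keep lane, change right

def decode_action(action_index):
    """Map a flat action index onto the (acceleration, lane-change) grid."""
    row, col = divmod(action_index, len(LANE_CHANGES))
    return ACCELERATIONS[row], LANE_CHANGES[col]
```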
Figure 6. Intention reward gradient diagram. Task completion of the autonomous vehicles is treated as a measure of the quality of the current reinforcement learning model, expressed through the reward value each vehicle can obtain in different driving sections and lanes. Each striped area marks the type of reward the corresponding vehicle can obtain from that area: green represents a reward, blue represents a punishment, and orange represents a serious punishment. The reward value is set to 1 for normalization.
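A minimal sketch of a zone-based intention reward consistent with the caption, with the reward value normalized to 1. The exact magnitude of the serious punishment and the zone labels are assumptions, not the paper's reward definition.

```python
def intention_reward(zone, r=1.0):
    """Zone-based intention reward (cf. Figure 6), with r normalized to 1.

    zone: the coloured strip the vehicle currently occupies:
          'reward' (green), 'punish' (blue), or 'severe' (orange).
    """
    if zone == "reward":
        return r           # vehicle is on track towards its target ramp
    if zone == "punish":
        return -r          # vehicle is in an unfavourable section or lane
    if zone == "severe":
        return -2.0 * r    # assumed magnitude for the serious punishment
    return 0.0
```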
Figure 7. Reinforcement learning model transplant diagram.
Figure 8. Diagram of the random scenario setting.
Figure 9. Simulation scenario diagram.
Figure 10. Diagram of average reward. This value is the average of the single-step reward values.
Figure 11. Diagram of reward. This value is the accumulated sum of the single-step reward values.
Figure 12. Diagram of average Q. This value is the average of the predicted Q-values and serves as a training indicator in reinforcement learning.
Figure 13. Diagram of loss. This value represents the difference between the current network and the ideal (target) network.
Figure 14. Diagram of success rate. This value is the ratio of the number of vehicles that complete the task (entering the corresponding ramp) to the total number of vehicles.
Figure 15. Diagram of collisions. This value is the number of inter-vehicle collisions detected in real time in the simulation scenario.
Figure 16. Diagram of average velocity. This value is the average velocity of all AVs in the scenario.
Figure 17. Diagram of average steps. This value is the number of steps taken by the end of each episode.
Figure 18. Diagram of testing reward. This value is the average over test episodes of the total reward obtained in each episode.
Figure 19. Diagram of testing average reward. This value is the average over test episodes of the per-step average reward of each episode.
Figure 20. Spatial distribution diagram of longitudinal actions. The probability distribution of the longitudinal actions is obtained from the frequency statistics of each action output during testing. The data in the figure are averaged over ten repeated experiments.
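A small sketch of the frequency-to-probability calculation the caption describes, assuming each test run provides a log of chosen action indices; the function name and input format are assumptions for illustration.

```python
import numpy as np

def action_distribution(action_logs, n_actions):
    """Average per-action probability distribution over repeated test runs.

    action_logs: list of integer arrays, one per run, holding the action
                 index chosen at every step of that run (e.g., ten runs
                 as in Figure 20).
    """
    per_run = []
    for actions in action_logs:
        counts = np.bincount(actions, minlength=n_actions)
        per_run.append(counts / counts.sum())   # frequencies -> probabilities
    return np.mean(per_run, axis=0)             # average over repeated runs
```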
Figure 21. Diagram of testing average steps. This value is the number of steps taken by the end of each episode.
Figure 22. Diagram of testing success rate. This value is the ratio of the number of vehicles that complete the task (entering the corresponding ramp) to the total number of vehicles.
Figure 23. Diagram of testing collisions. This value is the number of inter-vehicle collisions detected in real time in the simulation scenario.
Figure 24. Diagram of testing average velocity. This value is the average velocity of all AVs in the scenario.
Table 1. Training effect for different numbers of nodes.

| N of Nodes | Training Time for 1000 Episodes | Convergence Effect |
|---|---|---|
| 32 | 1.5179 h | Poor (large fluctuation) |
| 64 | 2.6438 h | Acceptable (occasional large fluctuations) |
| 128 | 3.4756 h | Good (small fluctuation) |
| 256 | 6.0987 h | Good (small fluctuation) |
| 512 | 10.9542 h | Good (small fluctuation) |
Table 2. Number setting of experimental vehicles.

| Process | Algorithm Type | AVs (Merge_0) | AVs (Merge_1) | HVs |
|---|---|---|---|---|
| Training | SGRL | 1 | | 19 |
| Training | MGRL | 5 | 5 | 10 |
| Training | NGRL | 5 | 5 | 10 |
| Testing | SGRL | 5 | 5 | 10 |
| Testing | MGRL | 5 | 5 | 10 |
| Testing | NGRL | 5 | 5 | 10 |
Table 3. Computer hardware information.

| Item | Type |
|---|---|
| CPU | Intel Core i9-10980XE |
| GPU | NVIDIA RTX 3090 (24 GB) |
| RAM | Crucial DDR4 3200 MHz 32 GB × 4 |
| SSD | SAMSUNG 970 EVO Plus 1 TB × 2 |
| OS | Ubuntu 20.04 |
Table 4. Training time statistics (ten experiments for each algorithm; time unit is hours).

| Run | SGRL | MGRL | NGRL |
|---|---|---|---|
| 1 | 3.335665 | 66.40787 | 3.251174 |
| 2 | 3.687935 | 78.94237 | 4.812508 |
| 3 | 3.782653 | 71.37449 | 3.737191 |
| 4 | 3.763133 | 82.16632 | 4.360045 |
| 5 | 3.38374 | 66.463 | 3.259155 |
| 6 | 3.286891 | 72.25948 | 3.890604 |
| 7 | 3.874574 | 67.38896 | 3.840993 |
| 8 | 3.043649 | 67.68016 | 3.668797 |
| 9 | 3.368947 | 65.51202 | 3.017189 |
| 10 | 3.832845 | 69.72951 | 4.072374 |
| Mean | 3.536003 | 70.79242 | 3.791003 |
Table 5. Performance comparison for different models.

| Model | Road Length | Average_V | N_Collisions | Success_Rate | Average_Steps |
|---|---|---|---|---|---|
| SGRL | 1000 m | 9.55588 | 1.66888 | 0.94068 | 601.1014 |
| SGRL | 750 m | 8.45047 | 1.73714 | 0.92385 | 494.9505 |
| SGRL | 500 m | 7.50548 | 2.06911 | 0.91493 | 307.8586 |
| MGRL | 1000 m | 3.60231 | 2.90205 | 0.54129 | 2478.409 |
| MGRL | 750 m | 3.33545 | 2.96804 | 0.53864 | 1836.932 |
| MGRL | 500 m | 3.20753 | 3.02719 | 0.47095 | 1334.704 |
| NGRL | 1000 m | 8.92665 | 1.23373 | 0.53529 | 204.3344 |
| NGRL | 750 m | 7.16045 | 1.83906 | 0.52708 | 166.5145 |
| NGRL | 500 m | 6.70472 | 2.31418 | 0.4839 | 111.9219 |