The multi-intersection TSC optimization problem involves maximizing the capacity of the target intersection through cooperation with the surrounding intersections. Capacity is defined as the number of vehicles passing through an intersection during a unit of time. The waiting time denotes the amount of time it takes a vehicle to exit the intersection from the time it stops at the intersection. A large capacity at the intersection results in fewer waiting vehicles at the intersection and shorter waiting times for them.
To increase the capacity at an intersection, it is important to optimize traffic signals. Specifically, the efficiency of green signals is important for handling many vehicles at an intersection. The more vehicles exit during a given green signal, the greater the capacity at the intersection during the time unit. However, fixed traffic signaling can cause green signal wastage because the green signal times are constant regardless of the traffic conditions. Green signal wastage means that although a lane has been allocated a green light, no vehicles are waiting in it; thus, no vehicles leave the lane. Minimizing the wastage of green signals improves the capacity at an intersection.
The experimental results showed that approximately 30% of the green signal time was wasted in the fixed signaling system. This is because more green signal time is allocated than is needed. When the duration of green signals is distributed efficiently, more vehicles can be handled at an intersection. However, dynamic green-time assignments can cause a selfish signal distribution. Intersections form chain structures that are connected to the surrounding intersections. Consequently, there can be situations in which only high-demand lanes receive signals so that many vehicles are handled at an intersection. In other words, a selfish distribution of signals may leave some lanes without green signals. A traffic signal system should distribute signals to all lanes. Therefore, a fair and reasonable signal distribution is important.
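To make the notion of green signal wastage concrete, the following minimal sketch computes the wasted fraction of green time from a per-second log. The log format and the function name `wasted_green_fraction` are assumptions made for illustration; they are not taken from the experiment.

```python
def wasted_green_fraction(green_log):
    """Return the fraction of green seconds during which no vehicle left the lane.

    green_log: list of (is_green, vehicles_departed) tuples, one per second.
    This log format is a hypothetical one chosen for the sketch.
    """
    green_seconds = sum(1 for is_green, _ in green_log if is_green)
    if green_seconds == 0:
        return 0.0
    wasted = sum(
        1 for is_green, departed in green_log if is_green and departed == 0
    )
    return wasted / green_seconds


# Ten seconds of green: vehicles leave during the first seven seconds only,
# followed by five seconds of red that do not count toward the fraction.
log = [(True, 1)] * 7 + [(True, 0)] * 3 + [(False, 0)] * 5
fraction = wasted_green_fraction(log)
```

In this invented log, three of the ten green seconds pass without a departure, so a shorter green phase would have served the same vehicles.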
3.2. Proposed DQN-Based Traffic Signal Control
The traffic signal problem of an intersection should be addressed by considering the characteristics of the intersection, which are dynamic and unpredictable. Traffic conditions change continuously depending on the time, day of the week, or weather, so intersections constantly face new environments. Moreover, the intersection has a continuous structure. The environment is computationally complex because it is affected by the circumstances in the surrounding intersections. In RL, the environment is modeled as an MDP, which is defined by a state, an action, and a reward.
To optimize the signal control, we need to accurately recognize the current state in the MDP. In addition, it is necessary to recognize the situation in the surrounding intersections for cooperation. The state includes six pieces of information. The first and second are the total traffic load and the standard deviation of the traffic load at the intersection, respectively. The combination of these two parameters provides a detailed representation of the situation at the intersection. For example, if the traffic load and its standard deviation are both small, there are generally not many vehicles at the intersection. However, a small traffic load with a large standard deviation indicates that, despite few vehicles being at the intersection, there are blocked lanes. Moreover, a large traffic load means that the intersection suffers from traffic congestion regardless of the standard deviation of the load.
The third and fourth pieces of information denote the traffic load of the two directions that will receive the next green light. The fifth and sixth pieces of information are the traffic loads in the directions into which vehicles will exit from the two directions that will receive the next green light. The proposed model provides green signals in two directions according to a traffic signal order.
Figure 2 shows the movement information when the vehicle receives a green signal. The intersection in Figure 2 is divided into two directions per road: the left-turn direction and the straight direction, from which right turns are also possible. For example, if the next green signal is scheduled to be assigned to the northbound straight (or right-turn) direction and the left-turn direction, the vehicles waiting in each of these two directions will flow into the corresponding outgoing direction. In this case, the traffic loads of the two directions receiving the green signal are the third and fourth pieces of state information, and the traffic loads of the two outgoing directions are the fifth and sixth, respectively.
Information about the directions into which vehicles flow when the green signal is on is needed because the number of vehicles that can enter from a given direction depends on the traffic load in the outgoing direction. No matter how much green time is allocated to the direction receiving the green signal, the number of vehicles that can enter is limited if the traffic load is high in the directions they are trying to enter. Thus, to allocate green time efficiently, the traffic conditions in the surrounding intersections that the vehicles are about to enter should also be recognized. The state of the MDP therefore captures the current situation at the intersection with the six pieces of information described above.
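The six-element state can be assembled as in the following minimal sketch. The function name `build_state`, the direction labels, the dictionary layout, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import pstdev


def build_state(local_loads, downstream_loads, next_green, outgoing):
    """Assemble the six pieces of state information for one intersection.

    local_loads:      dict, direction name -> traffic load at this intersection.
    downstream_loads: dict, direction name -> traffic load where vehicles exit to.
    next_green:       the two directions that receive the next green signal.
    outgoing:         the two directions those vehicles flow into.
    All names and the data layout are hypothetical.
    """
    loads = list(local_loads.values())
    return (
        sum(loads),                     # 1: total traffic load
        pstdev(loads),                  # 2: standard deviation of the load
        local_loads[next_green[0]],     # 3: load of first next-green direction
        local_loads[next_green[1]],     # 4: load of second next-green direction
        downstream_loads[outgoing[0]],  # 5: load where direction 3 exits to
        downstream_loads[outgoing[1]],  # 6: load where direction 4 exits to
    )


local = {"N_straight": 8, "N_left": 2, "S_straight": 0, "S_left": 0}
downstream = {"north_exit": 12, "west_exit": 3}
state = build_state(
    local, downstream, ("N_straight", "N_left"), ("north_exit", "west_exit")
)
```

Here the total load is small but its standard deviation is comparatively large, which corresponds to the case of few vehicles concentrated in one lane.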
The proposed optimal TSC system adjusts the duration of the traffic signal through the action of the MDP so that the green signal is used efficiently. It is therefore important to distribute the green signal to the required lanes at the appropriate time. An efficient green time allocation can maximize the overall capacity of the intersections. The proposed model controls the duration of the green signals; moreover, it is assumed that the lane order in which the green signal is assigned is fixed, as shown in Figure 3. Therefore, the action is defined as the duration of the green signal time. We define a time unit for the green signal, and the action of the proposed model specifies the green signal duration in terms of this unit.
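A discrete action set of green durations might look like the following sketch. It assumes that each action is an integer multiple of the time unit; the constants `TIME_UNIT` and `MAX_UNITS` are placeholder values and do not come from the proposed model.

```python
TIME_UNIT = 5   # seconds; placeholder value, not taken from the text
MAX_UNITS = 6   # placeholder upper bound on the green duration, in time units

# Assumption: every action is a green duration equal to k time units.
ACTIONS = [k * TIME_UNIT for k in range(1, MAX_UNITS + 1)]


def green_duration(action_index):
    """Translate a discrete action index into a green duration in seconds."""
    return ACTIONS[action_index]


duration = green_duration(2)  # third action: three time units of green
```

Keeping the action as a duration only, with the lane order fixed, leaves the size of the action set independent of how many signal directions the intersection has.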
The number of lanes varies depending on the structure of the intersection; the number of lanes assigned green lights also varies. Figure 4 shows the structure of three- to six-way intersections. As can be seen in Figure 5, there can be three possible signal directions at a three-way intersection and eight at a four-way intersection. The five- and six-way intersections have more signal directions than the four-way intersection. Thus, the number of signal directions depends on the structure of the intersection.
Time is an essential factor when allocating signals to lanes at traffic lights. Each signal direction has a certain length of time during which the vehicles exit the intersection. This length of time does not have to change even if the structure of the intersection changes; that is, the duration of the signal is not affected by the structure of the intersection. Consequently, a signal control model that determines the order of the signal directions has to be changed according to the structure of the intersection, whereas a model that determines how long the green light lasts in each direction does not. In other words, the action of the proposed MDP does not determine the order in which the vehicles proceed; it determines only the duration of the green signals. Thus, it is possible to apply the MDP to intersections of various structures, such as three-, four-, and five-way intersections.
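The separation between a fixed signal order and a learned duration can be sketched as follows. The phase labels, the function `run_cycle`, and the constant-duration policy are hypothetical and serve only to show that the same duration policy can drive intersections with different numbers of phases.

```python
from itertools import cycle


def run_cycle(phase_order, choose_duration, steps):
    """Walk a fixed phase order, asking the policy only for each green duration.

    phase_order:     list of phase labels in their fixed order (structure-specific).
    choose_duration: callable, phase label -> green duration in seconds.
    steps:           number of green phases to schedule.
    """
    schedule = []
    phases = cycle(phase_order)
    for _ in range(steps):
        phase = next(phases)
        schedule.append((phase, choose_duration(phase)))
    return schedule


def constant_policy(phase):
    """Placeholder policy; a trained agent would pick the duration instead."""
    return 10


three_way = ["P1", "P2", "P3"]        # hypothetical phase labels
four_way = ["P1", "P2", "P3", "P4"]   # hypothetical phase labels

schedule_three = run_cycle(three_way, constant_policy, 4)
schedule_four = run_cycle(four_way, constant_policy, 4)
```

Only the `phase_order` list changes between the two intersections; the duration policy is reused unchanged.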
The goal of this study is to maximize the capacity. In addition, for a fair and reasonable distribution of signals, all lanes have to be allocated green signals in order, and the appropriate green signal time must be allocated according to traffic conditions. In this way, the deviation in the queue length and waiting time between all lanes at the intersection can be reduced.
In this study, the waiting time, not the queue length, was considered as a reward parameter. This is because using the waiting time as the reward parameter addresses the TSC problem by allocating an appropriate green signal time, which requires time information regarding how long vehicles must wait at an intersection. The queue length, in contrast, describes the number of vehicles waiting at the intersection but does not include sufficient time information. In [10], an expression relating the waiting time to the queue length is given. In that expression, T denotes the average travel time; q, the average queue length; N, the number of vehicles; and l, the length of the road; the remaining two terms are the time interval and the speed of the vehicle. Time and queue length have a positive correlation in this expression. However, the time interval and the vehicle speed, which are related to time, enter the expression separately from the queue length. That is, the waiting time carries time-related information in addition to the information contained in the queue length.
Furthermore, the standard deviation of the waiting time, rather than the waiting time itself, captures the deviation between the lane where a vehicle waits for a long time and the lane where a vehicle waits for a short time at an intersection. It is therefore effective for comparing the situations in each lane. In addition, if the green signal is distributed considering only capacity maximization, it can be distributed in favor of certain lanes, which can cause problems for the other lanes. Considering the standard deviation of the waiting time can address the problems that may arise when distributing signals for maximum capacity. Therefore, the standard deviation of the waiting time between lanes should be kept small.
In addition, the standard deviation of the waiting time is considered as a parameter rather than the waiting time because of the specificity of the data used in learning. Figure 6 shows how the average waiting time and the standard deviation of the waiting time rise and fall over time at an intersection. At approximately 30, 60, and 90 time units, the waiting time suddenly decreases, three times in total. This is because vehicles exit the intersection with the green light. The average waiting time changes over a large range when vehicles exit the intersection, and such an instantaneous change is treated as a significant value that affects the learning. In contrast, the standard deviation of the waiting time is more stable than the average waiting time when a sudden change occurs. In other words, sudden changes can be prevented from being reflected in the learning. Therefore, we consider the standard deviation of the waiting time and the capacity as parameters to maximize the capacity at intersections with an efficient green signal distribution.
To maximize the performance of an intersection dealing with many vehicles, we configure the reward function with two parameters: the capacity and the standard deviation of the waiting time. The two terms are combined using an adaptive weighting factor, which depends on the traffic load at the intersection and ranges between 0 and 1, together with its complement (1 minus the weighting factor). The reward function is defined as (2).
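One possible form of such a reward is sketched below. It assumes that the capacity is rewarded and the standard deviation of the waiting time is penalized, weighted by `alpha` and `1 - alpha` respectively; this sign convention, the assignment of weights to terms, and the function name `reward` are assumptions, and the exact form of equation (2) may differ.

```python
from statistics import pstdev


def reward(capacity, lane_waiting_times, alpha):
    """Combine capacity and waiting-time spread with weights alpha and 1 - alpha.

    Assumed form: alpha * capacity - (1 - alpha) * std(lane waiting times).
    The subtraction and the weight assignment are assumptions of this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * capacity - beta * pstdev(lane_waiting_times)


# Invented numbers: 20 vehicles served, four lanes with unequal waiting times.
r_uneven = reward(capacity=20, lane_waiting_times=[10, 20, 30, 40], alpha=0.5)
# Same capacity with equal waiting times in all lanes.
r_even = reward(capacity=20, lane_waiting_times=[25, 25, 25, 25], alpha=0.5)
```

Under this assumed form, the evenly distributed case scores higher at equal capacity, which matches the stated aim of keeping the standard deviation of the waiting time between lanes small.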
Figure 7 is a data flow chart representing the data and the process flow of the proposed model. Moreover, it shows an interaction between an environment and the agent. The environment represents several intersections. The perceived information from the target intersection, such as the waiting time of the vehicle at the intersection and the traffic load of the exit lane, is sent to the agent. Next, the agent learns from the received information and determines an action for the current state to maximize the reward. The action is sent to the environment. The action represents the time length of the green light on the lane that will move next. The performance of the traffic light control at the intersection is calculated and reflected in the learning of the agent.
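The interaction between the agent and the environment can be illustrated with the following minimal sketch. To stay short and dependency-free, it swaps the deep Q-network for tabular Q-learning and uses a toy single-queue environment; `ToyIntersection`, its arrival and discharge rules, its reward, and all hyperparameters are hypothetical and are not the proposed model.

```python
import random

ACTIONS = [5, 10, 15, 20]  # candidate green durations in seconds (placeholders)
NUM_STATES = 4             # queue length bucketed into four coarse levels


class ToyIntersection:
    """Hypothetical stand-in for the environment: one queue served by a green."""

    def __init__(self, rng):
        self.rng = rng
        self.queue = 0

    def observe(self):
        # Coarse state: queue length in buckets of five vehicles, capped at 3.
        return min(self.queue // 5, NUM_STATES - 1)

    def step(self, green_seconds):
        # Assumption: two vehicles can leave per five seconds of green.
        can_serve = green_seconds // 5 * 2
        served = min(self.queue, can_serve)
        self.queue -= served
        self.queue += self.rng.randint(0, 6)  # arrivals before the next green
        # Toy reward: vehicles served minus unused (wasted) green capacity.
        return self.observe(), served - (can_serve - served)


def train(steps=2000, epsilon=0.1, lr=0.1, gamma=0.9, seed=0):
    """Run the observe -> act -> reward -> learn loop with tabular Q-learning."""
    rng = random.Random(seed)
    env = ToyIntersection(rng)
    q = [[0.0] * len(ACTIONS) for _ in range(NUM_STATES)]
    state = env.observe()
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(ACTIONS))       # explore
        else:
            row = q[state]
            action = max(range(len(ACTIONS)), key=row.__getitem__)  # exploit
        next_state, r = env.step(ACTIONS[action])
        # Standard Q-learning update toward the bootstrapped target.
        target = r + gamma * max(q[next_state])
        q[state][action] += lr * (target - q[state][action])
        state = next_state
    return q


q_table = train(steps=500)
```

In a DQN, the table `q` would be replaced by a neural network that maps the state to one value per green duration, while the loop structure stays the same.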