Deep Reinforcement Q-Learning for Intelligent Traffic Control in Mass Transit
Abstract
1. Introduction and Related Work
1.1. Related Work
1.2. The Objectives of This Work
1.3. Overview of the Proposed Approach
1.4. Main Assumptions
- An automated circular metro line without junctions, where trains cannot pass each other and stop at all stations.
- Passenger demand volumes are given as an origin–destination (OD) travel matrix, i.e., the flow demand of passengers from each origin station to each destination station; the optimization problem is solved with this OD matrix.
- The operator can control the passenger inflows to the platforms, e.g., by opening and closing station gates or by adding, removing, or reversing escalators.
1.5. Problem Formulation
1.6. Contributions
- We introduce a new DDQN model that optimizes traffic flow on mass transit lines, specifically on automated circular lines. The model considers eighty-one possible actions, each of which is a multi-action encoded in the ternary counting system that affects several traffic parameters simultaneously. We give a comprehensive description of the DDQN architecture and its hyper-parameters, as well as of the reward function, which is defined to cover multiple objectives, so that the implementation process is clearly understood.
- We optimize not only the number of running vehicles (and thus the operating cost) while respecting passenger comfort, but also the passenger inflow to the platforms.
- We evaluate the robustness of the DDQN model under three passenger travel demand scenarios: nominal, high, and ultra-high.
- We interpret the results obtained by the DDQN optimization and investigate the applicability of the proposed methodology in real-life operation.
2. Deep Reinforcement Q-Learning
2.1. Q-Learning Background
2.2. Deep Reinforcement Q-Learning (DQN) and Double DQN Algorithm
Double DQN Algorithm (DDQN)
Algorithm 1 Double Deep Q-Learning (DDQN) algorithm

Initialize primary network $Q_\theta$, target network $Q_{\theta'}$, replay buffer $D$, and $\tau \ll 1$;
for each iteration do
  for each environment step do
    Observe state $s_t$ and select $a_t \sim \pi(a_t \mid s_t)$;
    Execute $a_t$ and observe next state $s_{t+1}$ and reward $r_t = R(s_t, a_t)$;
    Store $(s_t, a_t, r_t, s_{t+1})$ in replay buffer $D$;
  end for
  for each update step do
    Sample $e_t = (s_t, a_t, r_t, s_{t+1}) \sim D$;
    Compute target Q value: $Q^*(s_t, a_t) \approx r_t + \gamma\, Q_{\theta'}\big(s_{t+1}, \arg\max_{a'} Q_\theta(s_{t+1}, a')\big)$;
    Perform gradient descent step on $\big(Q^*(s_t, a_t) - Q_\theta(s_t, a_t)\big)^2$;
    Update target network parameters: $\theta' \leftarrow \tau\theta + (1 - \tau)\theta'$;
  end for
end for
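To make the update rule concrete, the following is a minimal sketch of the Double DQN target computation and of the soft target-network update in PyTorch. The networks `q_net` and `target_net`, the batch tensors, and the coefficients `gamma` and `tau` are illustrative assumptions, not the implementation used in this work.

```python
import torch
import torch.nn as nn

def ddqn_targets(q_net: nn.Module, target_net: nn.Module,
                 rewards: torch.Tensor, next_states: torch.Tensor,
                 dones: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double DQN target: select the next action with the primary network,
    evaluate it with the target network (this decoupling reduces the
    over-estimation bias of vanilla DQN)."""
    with torch.no_grad():
        best_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q

def soft_update(target_net: nn.Module, q_net: nn.Module, tau: float = 0.005) -> None:
    """theta' <- tau * theta + (1 - tau) * theta'."""
    for tp, p in zip(target_net.parameters(), q_net.parameters()):
        tp.data.copy_(tau * p.data + (1.0 - tau) * tp.data)
```

The squared difference between these targets and $Q_\theta(s_t, a_t)$ is then minimized by gradient descent, as in Algorithm 1.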
3. Mass Transit Traffic Model
3.1. Vehicle Dynamics
Notation | Description
---|---
$N$ | Total number of platforms on the line
$n$ | Total number of segments
$m$ | Number of running trains
$L$ | Length of the whole line
$b_j$ | Boolean number of trains initially positioned on segment $j$ (at time zero)
$d_j^k$ | Instant of the $k$th train departure from node $j$
$a_j^k$ | Instant of the $k$th train arrival at node $j$. If node $j$ is not a platform, the train does not stop; thus, $a_j^k$ and $d_j^k$ are equal
$r_j^k$ | Train running time on segment $j$, i.e., from node $j-1$ to node $j$
$w_j^k = d_j^k - a_j^k$ | Train dwell time, i.e., the delay between the $k$th arrival at node $j$ and the $k$th departure from node $j$
$t_j^k = r_j^k + w_j^k$ | Train travel time from node $j-1$ to node $j$ at the $k$th departure
$g_j^k = a_j^k - d_j^{k-1}$ | Node (or station) safe separation time applied for the $k$th arrival at node $j$
$h_j^k = d_j^k - d_j^{k-1}$ | Departure time headway at node $j$ associated with the $(k-1)$th and $k$th departures from node $j$
$s_j^k = g_j^k - r_j^k$ | Node safe separation time, running time excluded
- $\bar{r}$, $\bar{t}$, $\bar{w}$, $\bar{g}$, $\bar{h}$ and $\bar{s}$, respectively, denote upper bounds for the running, travel, dwell and safe separation times, the headway, and the $s$ variables.
- $\underline{r}$, $\underline{t}$, $\underline{w}$, $\underline{g}$, $\underline{h}$ and $\underline{s}$, respectively, denote lower bounds for the pre-cited variables.
- $\tilde{r}$, $\tilde{t}$, $\tilde{w}$, $\tilde{g}$, $\tilde{h}$ and $\tilde{s}$, respectively, denote the average values of the pre-cited variables.
- The departure time of the $k$th vehicle from segment $j$ must occur after the departure of the $(k-b_j)$th vehicle from segment $j-1$, plus (+) the minimum running time from segment $j-1$ to segment $j$, plus (+) the dwell time of the vehicle at segment $j$. We write: $d_j^k \geq d_{j-1}^{k-b_j} + \underline{r}_j + w_j^k$.
- The departure of the $k$th vehicle from segment $j$ must occur after the departure of the $(k-\bar{b}_{j+1})$th vehicle from segment $j+1$, where $\bar{b}_{j+1} = 1 - b_{j+1}$, plus (+) the minimum safe separation time at segment $j+1$. We write: $d_j^k \geq d_{j+1}^{k-\bar{b}_{j+1}} + \underline{s}_{j+1}$.
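As an illustration of these two constraints, the sketch below computes the lower bound they impose on the $k$th departure from node $j$ of a circular line with $n$ nodes. The function name, the dictionary-based bookkeeping of past departures, and the variable names are assumptions for illustration; they follow the notation table above rather than the authors' code.

```python
def departure_lower_bound(d, j, k, b, r_min, w, s_min, n):
    """Lower bound on the k-th departure time from node j of a circular line.

    d     : dict mapping (node, index) -> already-computed departure instants
    b     : list, b[j] = 1 if a train is initially on segment j, else 0
    r_min : list of minimum running times per segment
    w     : dict mapping (node, index) -> dwell time of that departure
    s_min : list of minimum safe separation times per node
    n     : number of nodes on the circular line
    """
    jm1 = (j - 1) % n   # preceding node on the circle
    jp1 = (j + 1) % n   # following node on the circle
    # Follow-up constraint: run from node j-1, then dwell at node j
    follow_up = d[(jm1, k - b[j])] + r_min[j] + w[(j, k)]
    # Safe separation constraint with the vehicle ahead on segment j+1
    separation = d[(jp1, k - (1 - b[jp1]))] + s_min[jp1]
    return max(follow_up, separation)
```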
3.2. Passenger Flow Dynamics
Notation | Description
---|---
$\lambda_{ij}$ | Passenger arrival rate from platform $i$ (the origin) onto the train for destination platform $j$. If either $i$ or $j$ is not a platform (but just a discretization node), the rate is zero
$\lambda_i$ | Average rate of passenger flow arriving to platform $i$. If $i$ is not a platform, the rate is zero
Notation | Description
---|---
$A$ | Number of passengers of destination $j$ that are willing to enter platform $l$ between the arrivals of vehicles $k-1$ and $k$
$I$ | Flow of passengers (in passengers per time unit) of destination $j$ entering platform $l$ between the arrivals of vehicles $k-1$ and $k$
$Q$ | Stock of passengers of destination $j$ present at platform $l$ at the time of the $k$th vehicle arrival at platform $l$
Boarding flow | Flow of passengers (in passengers per time unit) of destination $j$ boarding at the time of the $k$th vehicle arrival at platform $l$
$P$ | Stock of passengers of origin $i$ and destination $j$ present inside the vehicle at the time of the $k$th vehicle arrival at platform $l$
$E$ | Flow of passengers (in passengers per time unit) alighting at the time of the $k$th vehicle arrival at platform $l$
Notation | Description
---|---
Vehicle capacity | Passenger capacity of each vehicle (in max. number of passengers)
Platform capacity | Passenger capacity of each platform (in max. number of passengers)
Max. entering rate | Maximum entering rate of passengers to platform $l$ (maximum value of the $I$ variable, in passengers per time unit)
Max. boarding rate | Maximum boarding rate (maximum value of the boarding flow variable, in passengers per time unit)
Max. alighting rate | Maximum alighting rate (maximum value of the $E$ variable, in passengers per time unit)
3.2.1. Passenger Arrivals A
3.2.2. Passenger Inflows I to the Platforms
3.2.3. Stock of Passengers Q at Platforms
3.2.4. Vehicle Dwell Time w
3.2.5. Passenger Boarding Flow
3.2.6. Stock of Passengers P inside the Vehicles
3.2.7. Passenger Exits from Vehicles E
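The subsections above define, in order, the arrivals A, the controlled inflows I, the platform stocks Q, the dwell times w, the boarding flows, the on-board stocks P, and the exits E. The following is a schematic sketch of one such update at a single platform, under simple conservation assumptions (e.g., a fixed alighting share); it illustrates the order of the computations only, not the exact equations of Sections 3.2.1–3.2.7.

```python
def platform_update(Q_prev, arrivals, inflow_cap, headway,
                    onboard, veh_cap, board_rate_cap, dwell):
    """One vehicle arrival at one platform (passengers lumped together).

    Q_prev         : passengers already waiting on the platform (Q)
    arrivals       : passengers who showed up since the previous vehicle (A)
    inflow_cap     : max. entering rate to the platform, pass./s (control)
    headway        : time since the previous vehicle arrival, s (h)
    onboard        : passengers inside the arriving vehicle (P)
    veh_cap        : vehicle capacity
    board_rate_cap : max. boarding rate, pass./s
    dwell          : vehicle dwell time, s (w)
    Returns (new platform stock, new on-board load, boarded, alighted).
    """
    # Inflow to the platform is limited by the gate/escalator control (I)
    entered = min(arrivals, inflow_cap * headway)
    Q = Q_prev + entered

    # A fixed share of on-board passengers alight here (illustrative assumption, E)
    alighted = 0.2 * onboard
    onboard -= alighted

    # Boarding is limited by dwell time, boarding rate and residual capacity
    boarded = min(Q, board_rate_cap * dwell, veh_cap - onboard)
    Q -= boarded
    onboard += boarded
    return Q, onboard, boarded, alighted
```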
4. DDQN Model for Traffic Control in Mass Transit
4.1. State Representation
- Flow of passengers entering platforms (matrix I).
- Number of passengers inside the vehicles (family of matrices P).
- Number of passengers on the platforms (matrix Q).
- Flow of passengers boarding the vehicles (the boarding flow matrix).
- Vehicle time headway at each station (vector h).
- Number of passengers willing to enter the stations (matrix A).
- Vehicle dwell time at each station (vector w).
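A minimal sketch of how the observations listed above could be assembled into a single state vector for a fully connected Q-network is given below. The NumPy representation and the flatten-and-concatenate encoding are assumptions for illustration and may differ from the encoding actually used.

```python
import numpy as np

def build_state(I, P_list, Q, boarding, h, A, w):
    """Flatten and concatenate all traffic observations into one state vector.

    I, Q, boarding, A : matrices (platforms x destinations)
    P_list            : family of matrices P, one per running vehicle
    h, w              : vectors of headways and dwell times per station
    """
    parts = [I, *P_list, Q, boarding, h, A, w]
    return np.concatenate([np.asarray(p, dtype=np.float32).ravel() for p in parts])
```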
4.2. Agent Actions
- Number m of running vehicles.
- Maximum vehicle speed or, equivalently, minimum vehicle running time.
- Maximum vehicle dwell time at stations.
- Maximum passenger entering rate to the platforms (the ternary encoding of these four controls into 81 multi-actions is sketched below).
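Since the contributions describe the 81 actions as multi-actions written in the ternary counting system, each action index can be decoded into four base-3 digits, one per controlled variable. The digit-to-change mapping and the ordering of the controls below are illustrative assumptions, not the authors' exact encoding.

```python
STEP = {0: -1, 1: 0, 2: +1}   # assumed mapping: ternary digit -> decrease / keep / increase

def decode_action(action_index: int):
    """Split an index in [0, 80] into four ternary digits (least significant first),
    one per control: number of vehicles m, max. speed, max. dwell time, max. inflow."""
    assert 0 <= action_index < 81
    digits = []
    for _ in range(4):
        digits.append(action_index % 3)
        action_index //= 3
    return [STEP[d] for d in digits]

# Example: index 47 -> digits (2, 0, 2, 1), i.e. increase m, keep max. speed,
# increase max. dwell time, keep max. inflow.
print(decode_action(47))
```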
4.3. Reward Function
4.4. Passenger Comfort inside the Trains
4.5. Passenger Comfort at the Platforms
4.6. Vehicle Time Headway
4.7. Number of Exiting (Served) Passengers
4.8. Number of Operating Vehicles
4.9. The Overall Reward Function
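The overall reward combines the five partial objectives of Sections 4.4–4.8. A weighted-sum sketch is given below; the normalizations, the individual terms and the weights `w` are illustrative assumptions, and only the list of objectives (comfort inside the trains, comfort at the platforms, time headway, served passengers, number of operating vehicles) comes from the sections above.

```python
def overall_reward(mean_veh_load, ideal_veh_load,
                   mean_platform_stock, ideal_platform_stock,
                   mean_headway, target_headway,
                   served_passengers, total_demand,
                   n_vehicles, max_vehicles,
                   w=(1.0, 1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of five partial rewards (each term roughly in [-1, 1])."""
    r_train    = -abs(mean_veh_load - ideal_veh_load) / ideal_veh_load                  # comfort in trains
    r_platform = -abs(mean_platform_stock - ideal_platform_stock) / ideal_platform_stock  # comfort at platforms
    r_headway  = -abs(mean_headway - target_headway) / target_headway                   # service regularity
    r_served   = served_passengers / max(total_demand, 1)                               # throughput
    r_fleet    = -n_vehicles / max_vehicles                                             # operating cost
    terms = (r_train, r_platform, r_headway, r_served, r_fleet)
    return sum(wi * ti for wi, ti in zip(w, terms))
```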
4.10. NN Architecture & DDQN Algorithm Implementation
5. Case Study: Metro Line 1 Paris
5.1. The Results of Optimization
5.2. Practical Implementation
- Using cameras to monitor passenger behavior inside the vehicles and at platforms for accurate estimation of the environment state.
- Controlling passenger access to the platforms by opening/closing gates and doors, activating/deactivating/reversing escalators, and similar measures as control actions.
6. Conclusions and Perspectives
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Jiang, Z.; Fan, W.; Liu, W.; Zhu, B.; Gu, J. Reinforcement learning approach for coordinated passenger inflow control of urban rail transit in peak hours. Transp. Res. Part C Emerg. Technol. 2018, 88, 1–16.
- Alesiani, F.; Gkiotsalitis, K. Reinforcement learning-based bus holding for high-frequency services. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 3162–3168.
- Ying, C.S.; Chow, A.H.; Wang, Y.H.; Chin, K.S. Adaptive metro service schedule and train composition with a proximal policy optimization approach based on deep reinforcement learning. IEEE Trans. Intell. Transp. Syst. 2021, 23, 6895–6906.
- Zhu, Y.; Goverde, R.M. Dynamic and robust timetable rescheduling for uncertain railway disruptions. J. Rail Transp. Plan. Manag. 2020, 15, 100196.
- Šemrov, D.; Marsetič, R.; Žura, M.; Todorovski, L.; Srdic, A. Reinforcement learning approach for train rescheduling on a single-track railway. Transp. Res. Part B Methodol. 2016, 86, 250–267.
- Wang, Z.; Pan, Z.; Chen, S.; Ji, S.; Yi, X.; Zhang, J.; Wang, J.; Gong, Z.; Li, T.; Zheng, Y. Shortening passengers’ travel time: A dynamic metro train scheduling approach using deep reinforcement learning. IEEE Trans. Knowl. Data Eng. 2022, 35, 5282–5295.
- Kolat, M.; Kovári, B.; Bécsi, T.; Aradi, S. Multi-agent reinforcement learning for traffic signal control: A cooperative approach. Sustainability 2023, 15, 3479.
- Wang, J.; Sun, L. Dynamic holding control to avoid bus bunching: A multi-agent deep reinforcement learning framework. Transp. Res. Part C Emerg. Technol. 2020, 116, 102661.
- Liao, J.; Yang, G.; Zhang, S.; Zhang, F.; Gong, C. A deep reinforcement learning approach for the energy-aimed train timetable rescheduling problem under disturbances. IEEE Trans. Transp. Electrif. 2021, 7, 3096–3109.
- Yan, H.; Cui, Z.; Chen, X.; Ma, X. Distributed multiagent deep reinforcement learning for multiline dynamic bus timetable optimization. IEEE Trans. Ind. Inform. 2022, 19, 469–479.
- Liu, Y.; Tang, T.; Yue, L.; Xun, J.; Guo, H. An intelligent train regulation algorithm for metro using deep reinforcement learning. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 1208–1213.
- Krasemann, J.T. Design of an effective algorithm for fast response to the re-scheduling of railway traffic during disturbances. Transp. Res. Part C Emerg. Technol. 2012, 20, 62–78.
- Obara, M.; Kashiyama, T.; Sekimoto, Y. Deep reinforcement learning approach for train rescheduling utilizing graph theory. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 4525–4533.
- Coşkun, M.; Baggag, A.; Chawla, S. Deep reinforcement learning for traffic light optimization. In Proceedings of the 2018 IEEE International Conference on Data Mining Workshops (ICDMW), Singapore, 17–20 November 2018; pp. 564–571.
- Paris Metro Line 1 Website. Available online: https://www.ratp.fr/plans-lignes/metro/1 (accessed on 20 June 2022).
- Farhi, N.; Phu, C.N.V.; Haj-Salem, H.; Lebacque, J.P. Traffic modeling and real-time control for metro lines. arXiv 2016, arXiv:1604.04593.
- Farhi, N.; Phu, C.N.V.; Haj-Salem, H.; Lebacque, J.P. Traffic modeling and real-time control for metro lines. Part I: A max-plus algebra model explaining the traffic phases of the train dynamics. In Proceedings of the American Control Conference (IEEE), Seattle, WA, USA, 24–26 May 2017.
- Farhi, N.; Phu, C.N.V.; Haj-Salem, H.; Lebacque, J.P. Traffic modeling and real-time control for metro lines. Part II: The effect of passenger demand on the traffic phases. In Proceedings of the American Control Conference (IEEE), Seattle, WA, USA, 24–26 May 2017.
- Schanzenbächer, F.; Farhi, N.; Christoforou, Z.; Leurent, F.; Gabriel, G. Demand-dependent supply control on a linear metro line of the RATP network. Transp. Res. Procedia 2019, 41, 491–493.
- Schanzenbächer, F.; Farhi, N.; Leurent, F.; Gabriel, G. Comprehensive passenger demand-dependent traffic control on a metro line with a junction and a derivation of the traffic phases. In Proceedings of the Transportation Research Board (TRB) Annual Meeting, Washington, DC, USA, 13–17 January 2019.
- Schanzenbächer, F.; Farhi, N.; Leurent, F.; Gabriel, G. Real-time control of the metro train dynamics with minimization of the train time-headway variance. In Proceedings of the IEEE Intelligent Transportation Systems Conference, Maui, HI, USA, 4–7 November 2018.
- Schanzenbächer, F.; Farhi, N.; Leurent, F.; Gabriel, G. A discrete event traffic model explaining the traffic phases of the train dynamics on a linear metro line with demand-dependent control. In Proceedings of the American Control Conference (IEEE), Milwaukee, WI, USA, 27–29 June 2018.
- Farhi, N. Physical models and control of the train dynamics in a metro line without junction. IEEE Trans. Control Syst. Technol. 2019, 27, 1829–1837.
- Farhi, N. A discrete-event model of the train traffic on a linear metro line. Appl. Math. Model. 2021, 96, 523–544.
- Schanzenbächer, F.; Farhi, N.; Leurent, F.; Gabriel, G. Feedback control for metro lines with a junction. IEEE Trans. Intell. Transp. Syst. 2020, 22, 2741–2750.
- Schanzenbächer, F.; Farhi, N.; Christoforou, Z.; Leurent, F.; Gabriel, G. A discrete event traffic model explaining the traffic phases of the train dynamics in a metro line with a junction. In Proceedings of the IEEE Conference on Decision and Control (CDC), Melbourne, Australia, 12–15 December 2017.
- Schanzenbächer, F.; Farhi, N.; Leurent, F.; Gabriel, G. A discrete event traffic model for passenger demand-dependent train control in a metro line with a junction. In Proceedings of the ITS World Congress, Singapore, 21–25 October 2019.
- Farrando, R.; Farhi, N.; Christoforou, Z.; Schanzenbacher, F. Traffic modeling and simulation on a mass transit line with skip-stop policy. In Proceedings of the IEEE Intelligent Transportation Systems Conference, Rhodes, Greece, 20–23 September 2020.
- Farrando, R.; Farhi, N.; Christoforou, Z.; Urban, A. Impact of a FIFO rule on the merge of a metro line with a junction. In Proceedings of the Transportation Research Board (TRB) Annual Meeting, Washington, DC, USA, 9–13 January 2022.
- Ning, L.; Li, Y.; Zhou, M.; Song, H.; Dong, H. A deep reinforcement learning approach to high-speed train timetable rescheduling under disturbances. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 3469–3474.
- Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
Variable Name | Min. Value | Max. Value |
---|---|---|
number of veh. | 20 | 148 |
max. speed (meter/second) | 5 | 22 |
max. dwell time (second) | 16 | 45 |
inflow to plat. (passengers/second) | 1 | 80 |
Parameter | Value
---|---
Passenger capacity of vehicles | 700 passengers
Ideal number of passengers in vehicles | 525 passengers
Max. passenger density in vehicles | 4 pass./m²
Maximum number of trains | 148
Maximum acceptable vehicle time headway | 1000 s
Acceptable average vehicle time headway | 600 s
Area of platforms | 270 m²
Max. passenger density at platforms | 4 pass./m²
Passenger capacity of platforms | 1080 passengers
Acceptable number of passengers at platforms | 540 passengers
 | PC 1 | PC 2
---|---|---
Processor | Intel Core i7 (8 cores) | Intel Core i5 (4 cores)
PC's RAM | 16 GB | 8 GB
Utilized RAM | 4.3 GB (Python 2.5 GB and Octave 1.8 GB) |
Approximate runtime | 36 h | 60 h
 | 45 h | 76 h
 | 62 h | 90 h
Scenario | m | w (s) | h (s) | A (pass.) | I (pass./s) | Q (pass.) | Boarding flow (pass./s) | P (pass.)
---|---|---|---|---|---|---|---|---
Nominal demand | 35 | 20 | 165 | 100 | 68 | 85 | 68 | 380
High demand | 48 | 24 | 155 | 250 | 95 | 145 | 95 | 420
Ultra-high demand | 55 | 28 | 150 | 2500 | 115 | 240 | 115 | 460
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).