Interterminal Truck Routing Optimization Using Cooperative Multiagent Deep Reinforcement Learning
Abstract
1. Introduction
- We propose a cooperative multiagent RL approach to produce feasible truck routes that minimize empty-truck trip cost (ETTC) and truck waiting cost (TWC).
- Under the same ITT-environment characteristics, the learning models produced in this study can provide feasible truck routes in real time, i.e., within an acceptable computational time.
- We conducted computational experiments using artificially generated data and compared the proposed method with two metaheuristic methods, namely simulated annealing (SA) and tabu search (TS), to evaluate the proposed method’s performance.
2. Literature Review
2.1. Interterminal Truck Routing Problem (ITTRP)
2.2. Reinforcement Learning (RL) for Vehicle Routing Optimization
2.3. Multiagent Reinforcement Learning (RL)
3. Empty-Truck Trips and Truck Waiting Cost in ITTRP
Notation | Description
---|---
L | Set of locations
T | Set of trucks
C | Set of customers
O_k | Set of transport orders (TOs) requested by a customer, where k is the index of the customer
O | Set of all requested orders (the union of the customers' order sets O_k)
o | Origin (source) location of a TO, where o ∈ L
d | Destination location of a TO, where d ∈ L
– | Subsets of the origin (source) location
– | Subsets of the destination location
– | Service time at the origin (source) location of a TO
– | Service time at the destination location of a TO
– | TO due date
– | Penalty cost for a late TO, based on the due date
– | Initial position of a truck
– | Maximum number of working hours
– | Fixed cost for using a truck
– | Variable cost per hour
- All TOs must be served.
- A TO must be performed by a single truck.
- Each TO must be performed by considering its due date, including the service time at origin (source) and destination locations. The penalty cost is applied to every TO that is served after its due date.
- The initial position of all trucks is at terminal 1, and the next starting point of each truck is the destination location of the latest order served by that truck. The time and cost required to move a truck from its initial location to the order origin (for first-time order assignments) are not considered in the objective.
- Pickup and delivery operations are paired within each TO (pairing constraints).
- The pickup vertices are visited before the corresponding delivery vertices (precedence constraints).
- Each location has a given availability time window; trucks can therefore be serviced at the origin and destination only within these windows. When a truck arrives early, it must wait until the opening time for the pickup or delivery.
- The truck speed and the distances between terminals, which are used to calculate travel times, as well as the service times and time windows, are known in advance.
- The fee for carrying out TOs, fixed costs, and variable costs are known in advance.
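To make the cost assumptions above concrete, the per-order cost can be sketched as follows. The decomposition and the symbols (fixed cost $c_f$, variable cost rate $c_v$, penalty rate $p$, and the time terms) are illustrative assumptions, not the paper's exact objective function:

$$
\mathrm{Cost}(i) \;=\; c_f \;+\; c_v\left(t^{\text{travel}}_i + t^{\text{service}}_i + t^{\text{wait}}_i\right) \;+\; p\,\max\!\left(0,\; t^{\text{complete}}_i - t^{\text{due}}_i\right)
$$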
4. Proposed Method
4.1. Reinforcement Learning (RL)
- $S$ is the set of possible states of the environment, defined as a finite set.
- $A$ is the set of possible actions that can be executed by an agent to interact with the environment.
- $P(s', r \mid s, a)$ denotes the probability of transitioning to state $s'$ and receiving reward $r$, given as follows: $P(s', r \mid s, a) = \Pr\{S_{t+1} = s',\, R_{t+1} = r \mid S_t = s,\, A_t = a\}$.
- $R(s, a)$ is the expected reward received from the environment after the agent performs action $a$ in state $s$.
- $\gamma \in [0, 1]$ is a discount factor determining how far into the future the agent should look.
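For reference, the standard Q-learning update that underlies the deep Q-network of Section 4.2 is, with learning rate $\alpha$:

$$
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Bigl[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Bigr]
$$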
4.2. Deep Q-Network
4.3. Cooperative Multiagent Deep Reinforcement Learning (DRL)
4.3.1. Agent 1 Design
State Representation
- The first element represents the current time in minutes. Its value ranges from 0 to 1440, covering a 24 h horizon in minutes.
- The second element represents the position of the truck. In our case, we cover only three terminals and two logistics facilities, so the value of this element ranges from 1 to 5.
- The third element indicates the presence of a TO with DR equal to 0.
- The fourth element indicates the presence of a TO with DR greater than 0 and at most 0.50.
- The fifth element indicates the presence of a TO with DR greater than 0.50.
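The five elements above can be packed into a single state vector. The following minimal Python sketch illustrates one possible encoding; the function name, the dictionary layout of pending orders, and the key `"dr"` are assumptions, not taken from the paper:

```python
import numpy as np

def encode_agent1_state(current_time_min, truck_position, pending_orders):
    """Build agent 1's five-element state vector as described above.

    current_time_min : minutes since midnight, in [0, 1440]
    truck_position   : location index in {1, ..., 5} (3 terminals + 2 logistics facilities)
    pending_orders   : list of dicts, each holding a precomputed distance ratio under "dr"
    """
    has_dr_zero = any(o["dr"] == 0 for o in pending_orders)
    has_dr_low = any(0 < o["dr"] <= 0.50 for o in pending_orders)
    has_dr_high = any(o["dr"] > 0.50 for o in pending_orders)
    return np.array(
        [
            current_time_min,     # element 1: current time in minutes (0-1440)
            truck_position,       # element 2: truck position (1-5)
            float(has_dr_zero),   # element 3: a TO with DR = 0 is available
            float(has_dr_low),    # element 4: a TO with 0 < DR <= 0.50 is available
            float(has_dr_high),   # element 5: a TO with DR > 0.50 is available
        ],
        dtype=np.float32,
    )
```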
Actions
- Action 1: choose a TO with TOC1 characteristics.
- Action 2: choose a TO with TOC2 characteristics.
- Action 3: choose a TO with TOC3 characteristics.
Rewards
- R(t) = 0.01, if an agent takes no action when there is no TO.
- R(t) = −0.1, if the agent performs an improper action, such as choosing action 1, 2, or 3 when no TO with the corresponding characteristics is available.
- R(t) = 100, if the selected action results in ETTC = 0.
- R(t) = 50, if the selected action results in ETTC_i ≤ AETTC, where ETTC_i is the ETTC of the current assignment and AETTC is the average empty-truck trip cost over all completed TOs.
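These rules can be read as a reward function. The sketch below is illustrative only; the function signature is hypothetical, and the fall-through value for assignments above the average ETTC is an assumption, since it is not specified above:

```python
def agent1_reward(took_action, to_available, chose_invalid_category,
                  ettc_i=None, aettc=None):
    """Reward for agent 1, following the rules listed above (illustrative signature)."""
    if not took_action and not to_available:
        return 0.01        # stays idle when there is no TO to serve
    if chose_invalid_category:
        return -0.1        # picked a TO category that is currently unavailable
    if ettc_i == 0:
        return 100.0       # assignment causes no empty-truck trip cost
    if ettc_i <= aettc:
        return 50.0        # ETTC at or below the average over completed TOs
    return 0.0             # assumed default for all other cases (not specified above)
```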
4.3.2. Agent 2 Design
State Representation
- The first element indicates whether an action was taken by agent 1.
- The second element represents the current time in minutes. Its value ranges from 0 to 1440, representing a 24 h duration in minutes.
- The third element represents the current position of the truck. As in agent 1's design, we cover three terminals and two logistics facilities, so the value of this element ranges from 1 to 5.
- The fourth element indicates the presence of a TO with NTWC characteristics.
- The fifth element indicates the presence of a TO with LTWC characteristics.
- The sixth element indicates the presence of a TO with HTWC characteristics.
Actions
- Action 1: choose a TO with NTWC characteristics.
- Action 2: choose a TO with LTWC characteristics.
- Action 3: choose a TO with HTWC characteristics.
Rewards
- R(t) = −0.1, if the agent chooses action 1, 2, or 3 when no TO with the corresponding characteristics is available.
- R(t) = 100, if the selected action results in TWC = 0.
- R(t) = 50, if the selected action results in TWC_i ≤ ATWC, where TWC_i is the TWC of the current assignment and ATWC is the average truck waiting cost over all completed TOs.
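The interaction between the two agents can be summarized as a hedged sketch. The sequential ordering (agent 1 acts first, agent 2 observes that action and then acts) follows from agent 2's first state element, but the exact way the two choices are combined into a single TO assignment is an assumption for illustration, and all names below are hypothetical:

```python
def cooperative_decision_step(agent1, agent2, env, pending_orders):
    """One illustrative decision step of the cooperating agent pair."""
    # Agent 1 picks an empty-trip category (TOC1/TOC2/TOC3) from its own view.
    a1 = agent1.act(env.agent1_state())
    # Agent 2 observes that agent 1 has acted and picks a waiting-cost
    # category (NTWC/LTWC/HTWC) from its own view.
    a2 = agent2.act(env.agent2_state(agent1_acted=True))
    # Combine the two choices: prefer a TO matching both categories,
    # otherwise fall back to agent 1's choice alone (assumed tie-breaking).
    both = [o for o in pending_orders if o["toc"] == a1 and o["twc_class"] == a2]
    fallback = [o for o in pending_orders if o["toc"] == a1]
    candidates = both or fallback
    return candidates[0] if candidates else None
```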
5. Experimental Results
5.1. Data
- TO origin (o): {T1, T2, T3, F1, F2}.
- TO destination (d): {T1, T2, T3, F1, F2}, where d ≠ o.
- Start time window (in minutes): {0, …, 1320}.
- Due date (in minutes): {120, …, 1440}.
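A synthetic TO can be drawn from these ranges as in the minimal sketch below. The record layout and function name are illustrative, and the sketch does not enforce any coupling between the start time window and the due date:

```python
import random

LOCATIONS = ["T1", "T2", "T3", "F1", "F2"]

def random_transport_order():
    origin = random.choice(LOCATIONS)
    destination = random.choice([loc for loc in LOCATIONS if loc != origin])  # d != o
    start = random.randint(0, 1320)   # start time window, in minutes
    due = random.randint(120, 1440)   # due date, in minutes
    # A full generator would also ensure the due date leaves enough time for
    # travel and service after the start of the time window.
    return {"o": origin, "d": destination, "start": start, "due": due}
```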
5.2. Algorithm Configuration
5.2.1. DQN Configuration
- The number of hidden neurons should be between the input layer’s size and the output layer’s size.
- The number of hidden neurons should be two-thirds of the input layer’s size plus the size of the output layer.
- The number of hidden neurons should be less than twice the size of the input layer.
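As a worked example of these rules of thumb (the input and output sizes below are assumptions, chosen to match the five-element state and three actions of agent 1):

```python
n_in, n_out = 5, 3                              # assumed input/output layer sizes
rule1 = (min(n_in, n_out), max(n_in, n_out))    # hidden size between output and input size
rule2 = round(2 / 3 * n_in + n_out)             # 2/3 * input + output  -> 6
rule3 = 2 * n_in                                # hidden size should stay below 2 * input -> < 10
print(rule1, rule2, rule3)
```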
5.2.2. Simulated Annealing (SA) Configuration
5.2.3. Tabu Search (TS) Configuration
5.3. Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Origin | Destination | Start-Time Window | Due Date |
---|---|---|---|
T1 | T2 | 60 | 480 |
T4 | T5 | 30 | 360 |
T3 | T1 | 120 | 860 |
No | Location | Operational Time (Open) | Operational Time (Close)
---|---|---|---
1 | Terminal 1 (T1) | 00:00 | 24:00
2 | Terminal 2 (T2) | 00:00 | 24:00
3 | Terminal 3 (T3) | 00:00 | 24:00
4 | Logistics Facility 1 (F1) | 07:00 | 20:00
5 | Logistics Facility 2 (F2) | 08:00 | 20:00
From\To | PNIT | PNC | HJNC | HPNT | BNCT |
---|---|---|---|---|---|
PNIT | - | 7.6% | 4.7% | 26.1% | 2.5% |
PNC | 7.6% | - | 15.1% | 8.7% | 14.9% |
HJNC | 4.7% | 15.1% | - | 11.4% | 5.1% |
HPNT | 26.1% | 8.7% | 11.4% | - | 3.8% |
BNCT | 2.5% | 14.9% | 5.1% | 3.8% | - |
Route | T2T (min) | TRTL (min) | GPT (min) | WTUL (min) | Time Required per Move (min)
---|---|---|---|---|---|
PNIT–PNC | 2.85 | 0 | 0 | 30 | 33 |
PNIT–HJNC | 11.35 | 8 | 1 | 30 | 50 |
PNIT–HPNT | 4.92 | 4 | 1 | 30 | 40 |
PNIT–BNCT | 11.3 | 8 | 1 | 30 | 41 |
PNC–HJNC | 5.1 | 2 | 1 | 30 | 38 |
PNC–HPNT | 10.75 | 10 | 1 | 30 | 52 |
PNC–BNCT | 5.50 | 11 | 1 | 30 | 48 |
HJNC–HPNT | 11.62 | 12 | 1 | 30 | 55 |
HJNC–BNCT | 13.8 | 13 | 1 | 30 | 58 |
HPNT–BNCT | 4.5 | 1 | 1 | 30 | 37 |
Parameter | Unit | Value
---|---|---|
Truck transportation cost | USD/time period | 4 |
Operation cost of idle truck | USD/time period | 0.01 |
Delay cost | USD/container/time period | 5 |
Revenue per container | USD/container | 25 |
Dataset ID (DID) | Dataset Category | Num. of Orders
---|---|---
DC1-35 | 1 | 35
DC1-89 | 1 | 89
DC2-116 | 2 | 116
DC2-173 | 2 | 173
DC3-285 | 3 | 285
Hyperparameter | Value |
---|---|
Num. of episodes | 1000 |
Batch size | 64 |
Replay memory | 100,000 |
Discount factor γ | 0.995 |
Learning rate α | 0.001 |
Epsilon decay ϵ | 0.05 |
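The hyperparameters above map onto a standard DQN training setup. The PyTorch sketch below shows one way to wire them up; the network sizes, the target-network arrangement, and the epsilon schedule are assumptions, and only the listed hyperparameter values come from the table:

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Values taken from the hyperparameter table above; NUM_EPISODES bounds the (omitted) outer loop.
NUM_EPISODES, BATCH_SIZE, MEMORY_SIZE = 1000, 64, 100_000
GAMMA, LR, EPS_DECAY = 0.995, 0.001, 0.05

class QNetwork(nn.Module):
    """Small fully connected Q-network (layer sizes are assumptions)."""
    def __init__(self, state_dim=5, n_actions=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

policy_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(policy_net.state_dict())
optimizer = torch.optim.Adam(policy_net.parameters(), lr=LR)
replay_memory = deque(maxlen=MEMORY_SIZE)   # stores (s, a, r, s', done) tuples

def epsilon_greedy(state, epsilon):
    """Random action with probability epsilon, otherwise the greedy one.
    In a full loop, epsilon would be reduced by EPS_DECAY per episode (assumed schedule)."""
    if random.random() < epsilon:
        return random.randrange(3)
    with torch.no_grad():
        return int(policy_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

def train_step():
    """One gradient update on a random minibatch sampled from the replay memory."""
    if len(replay_memory) < BATCH_SIZE:
        return
    batch = random.sample(replay_memory, BATCH_SIZE)
    states, actions, rewards, next_states, dones = map(
        lambda xs: torch.as_tensor(xs, dtype=torch.float32), zip(*batch)
    )
    q_sa = policy_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + GAMMA * target_net(next_states).max(dim=1).values * (1.0 - dones)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```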
Parameter | Abbreviation |
---|---|
Average Truck Waiting Cost | Avg TWC |
Minimum Truck Waiting Cost | Min TWC |
Average Empty-Truck Trip Cost (in $) | Avg ETTC |
Minimum Empty-Truck Trip Cost (in $) | Min ETTC |
Average Computational Time (in seconds) | Avg CT |
Minimum Computational Time (in seconds) | Min CT
DQN
DID | Min TWC | Avg TWC | Min ETTC | Avg ETTC | Min CT | Avg CT
---|---|---|---|---|---|---
DC1-35 | 0.432 | 0.89 | 110.26 | 159.42 | 15.45 | 34.19
DC1-89 | 0.9 | 1.4 | 357.2 | 431.69 | 35.84 | 76.87
DC2-116 | 2.18 | 2.91 | 484.53 | 562.04 | 76.17 | 118.14
DC2-173 | 4.23 | 5.14 | 697.6 | 839.56 | 88.68 | 153.68
DC3-285 | 12.88 | 14.25 | 1232.66 | 1365.19 | 107.74 | 178.53

SA
DID | Min TWC | Avg TWC | Min ETTC | Avg ETTC | Min CT | Avg CT
---|---|---|---|---|---|---
DC1-35 | 0.46 | 0.97 | 138.25 | 194.68 | 23.76 | 45.35
DC1-89 | 0.97 | 1.52 | 440.32 | 516.90 | 105.53 | 142.78
DC2-116 | 2.37 | 3.27 | 587.63 | 679.33 | 137.58 | 163.42
DC2-173 | 4.89 | 6.05 | 870.04 | 1029.80 | 178.52 | 253.41
DC3-285 | 15.25 | 17.48 | 1546.37 | 1683.82 | 475.31 | 598.63
DQN
DID | Min TWC | Avg TWC | Min ETTC | Avg ETTC | Min CT | Avg CT
---|---|---|---|---|---|---
DC1-35 | 0.432 | 0.89 | 110.26 | 159.42 | 15.45 | 34.19
DC1-89 | 0.9 | 1.4 | 357.2 | 431.69 | 35.84 | 76.87
DC2-116 | 2.18 | 2.91 | 484.53 | 562.04 | 76.17 | 118.14
DC2-173 | 4.23 | 5.14 | 697.6 | 839.56 | 88.68 | 153.68
DC3-285 | 12.88 | 14.25 | 1232.66 | 1365.19 | 107.74 | 178.53

TS
DID | Min TWC | Avg TWC | Min ETTC | Avg ETTC | Min CT | Avg CT
---|---|---|---|---|---|---
DC1-35 | 0.45 | 0.95 | 126.41 | 187.74 | 27.53 | 58.52
DC1-89 | 0.96 | 1.50 | 419.06 | 510.86 | 123.74 | 142.76
DC2-116 | 2.36 | 3.14 | 598.58 | 722.55 | 150.82 | 174.83
DC2-173 | 4.33 | 5.33 | 744.33 | 883.21 | 197.69 | 248.53
DC3-285 | 13.21 | 14.87 | 1321.41 | 1458.02 | 523.61 | 653.31
Gap Between DQN and SA (%)
DID | Min TWC | Avg TWC | Min ETTC | Avg ETTC | Min CT | Avg CT
---|---|---|---|---|---|---
DC1-35 | 7.88 | 9.45 | 25.39 | 22.12 | 53.79 | 32.64
DC1-89 | 8.49 | 8.76 | 23.27 | 19.74 | 194.45 | 85.74
DC2-116 | 8.83 | 12.45 | 21.28 | 20.87 | 80.62 | 38.33
DC2-173 | 15.71 | 17.71 | 24.72 | 22.66 | 101.31 | 64.89
DC3-285 | 18.45 | 22.67 | 25.45 | 23.34 | 341.16 | 235.31

Gap Between DQN and TS (%)
DID | Min TWC | Avg TWC | Min ETTC | Avg ETTC | Min CT | Avg CT
---|---|---|---|---|---|---
DC1-35 | 6.14 | 7.45 | 14.65 | 17.77 | 78.19 | 71.16
DC1-89 | 7.58 | 7.15 | 17.32 | 18.34 | 245.26 | 85.72
DC2-116 | 8.45 | 8.13 | 23.54 | 28.56 | 98.00 | 47.99
DC2-173 | 2.40 | 3.70 | 6.70 | 5.20 | 122.93 | 61.72
DC3-285 | 2.60 | 4.40 | 7.20 | 6.80 | 385.99 | 265.94
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).