Multi-Agent Reinforcement Learning for Job Shop Scheduling in Dynamic Environments
Abstract
1. Introduction
- (1) A new distributed multi-agent scheduling architecture (DMASA) is proposed, in which each workpiece is treated as an intelligent agent. Using a reinforcement learning algorithm, all agents cooperate to maximize a global reward, enabling the scheduling policy to be trained and deployed effectively.
- (2) Based on the Markov decision process formulation, the representations of state, action, observation, and reward are introduced. Heterogeneous graphs (HetG) are proposed to represent states so that state nodes can be encoded effectively, and a heterogeneous graph neural network based on graph node embedding (GE-HetGNN) computes the policies, including the machine-matching strategy and the operation selection strategy (an illustrative encoding is sketched after this list).
- (3) For green dynamic workshop scheduling, the multi-agent proximal policy optimization (MAPPO) algorithm, based on the actor–critic (AC) architecture, is employed to train the network. This approach minimizes energy consumption in the scheduling workshop while handling dynamic events, thereby achieving more rational resource utilization. To validate the superiority and generalization of the proposed architecture and algorithm, extensive experiments were conducted on generated instances and standard benchmarks, including large-scale problems.
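To make contribution (2) concrete, the sketch below shows one plausible way to encode a scheduling state as a heterogeneous graph. It assumes PyTorch Geometric's HeteroData container, and the node/edge type names ("operation", "machine", "processable_on", "precedes") and the feature layouts are illustrative choices, not the paper's exact schema.

```python
# Hedged sketch: encoding a scheduling state as a heterogeneous graph,
# assuming PyTorch Geometric's HeteroData container. Type names and
# features are illustrative, not taken from the paper.
import torch
from torch_geometric.data import HeteroData

state = HeteroData()

# Operation nodes: e.g., [remaining processing time, completion flag]
state["operation"].x = torch.tensor([[12.0, 0.0], [7.5, 0.0], [9.0, 1.0]])

# Machine nodes: e.g., [idle power, current load]
state["machine"].x = torch.tensor([[1.2, 0.4], [0.8, 0.9]])

# Operation -> machine edges: which machines can process each operation,
# with the processing time as an edge feature.
state["operation", "processable_on", "machine"].edge_index = torch.tensor(
    [[0, 0, 1, 2],   # source operation index
     [0, 1, 1, 0]])  # target machine index
state["operation", "processable_on", "machine"].edge_attr = torch.tensor(
    [[12.0], [14.0], [7.5], [9.0]])

# Operation -> operation precedence edges within a job.
state["operation", "precedes", "operation"].edge_index = torch.tensor(
    [[0, 1],
     [1, 2]])
```

A heterogeneous GNN then learns a separate message-passing function per edge type, which is what lets machine nodes and operation nodes carry different feature sets.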
2. Literature Review
2.1. Dynamic Job Shop Scheduling Based on Conventional Methods
2.2. Dynamic Job Shop Scheduling Based on Artificial Intelligence (AI)
3. Problem Formulation
3.1. Dynamic Job Shop Scheduling Problem Formulation
- (1) All machines are available at time zero;
- (2) Only one operation can be processed on a machine at a time;
- (3) Once an operation starts processing on a machine, it cannot be interrupted;
- (4) Disruptions occur in production, such as machine failure, order insertion, and job cancellation;
- (5) All processing data, including processing times, machine idle power, etc., are deterministic and known in advance;
- (6) When two jobs have the same type of operation, they share the same, unique order of operations;
- (7) The transportation time and setup time of jobs are negligible (a data-structure sketch capturing these assumptions follows this list).
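As a minimal sketch, the assumptions above can be captured in a small instance description. All type and field names here are hypothetical, chosen only to mirror assumptions (1)–(7); the paper does not prescribe a concrete data layout.

```python
# Hedged sketch of a DJSP instance mirroring assumptions (1)-(7).
# All names are illustrative, not taken from the paper.
from dataclasses import dataclass, field

@dataclass
class Operation:
    job_id: int
    index: int                   # position in the job's fixed route (assumption 6)
    proc_time: dict[int, float]  # machine_id -> deterministic time (assumption 5)

@dataclass
class Machine:
    machine_id: int
    idle_power: float            # known idle power (assumption 5)
    available_from: float = 0.0  # available at time zero (assumption 1)

@dataclass
class DynamicEvent:
    time: float
    kind: str                    # "machine_failure" | "order_insertion" | "job_cancellation" (assumption 4)
    payload: dict = field(default_factory=dict)

@dataclass
class DJSPInstance:
    operations: list[Operation]
    machines: list[Machine]
    events: list[DynamicEvent] = field(default_factory=list)
    # Assumptions (2), (3), and (7) are constraints enforced by the
    # scheduler/simulator, not stored data: one operation per machine at a
    # time, no preemption, and zero transport/setup times.
```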
3.2. Markov Decision Process Formulation
4. Methodology
4.1. Proposed Framework
4.2. State Feature Embedding
4.3. Reinforcement Learning Algorithm
Algorithm 1 pseudo-code: MAPPO algorithm

Input: hyperparameters; actor network parameters θ of π_θ; critic network parameters φ of v_φ; epochs per update R; discount factor γ
1: Sample N workshop scheduling instances of size B from D
2: for iter = 1, 2, …, I do
3:   for b = 1, 2, …, B do
4:     Initialize s_t based on instance b
5:     while s_t is not terminal do
6:       Extract the embedding using the GNN
7:       Sample a_t ~ π(a_t | s_t)
8:       Receive the reward r_t and the next state s_{t+1}
9:       Update the state s_t ← s_{t+1}
10:    Compute the advantage function A for each step
11:    Compute the PPO loss L_PPO and optimize the parameters θ and φ for R epochs
12:    Update the network parameters
13:  if iter mod 10 = 0 then
14:    Validate the policy
15:  if iter mod 20 = 0 then
16:    Sample a new batch of scheduling instances of size B
17: return the trained policy π_θ
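Step 11 of Algorithm 1 computes the PPO loss. Below is a minimal PyTorch sketch of the standard clipped PPO objective, using the coefficient values reported in Section 5.1 (policy loss 2, value loss 1, entropy 0.01, clipping 0.2); the function boundary and tensor shapes are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch of the clipped PPO objective used in step 11 of Algorithm 1.
# Coefficients follow Section 5.1; all tensors are per-timestep batches.
import torch

def ppo_loss(new_log_probs, old_log_probs, advantages,
             values, returns, entropy,
             clip_eps=0.2, c_pi=2.0, c_v=1.0, c_ent=0.01):
    # Importance ratio between current and behavior policies.
    ratio = torch.exp(new_log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Clipped surrogate objective (negated: we minimize).
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    # Critic regression toward the empirical returns.
    value_loss = torch.nn.functional.mse_loss(values, returns)
    # Entropy bonus encourages exploration; subtracted because we minimize.
    return c_pi * policy_loss + c_v * value_loss - c_ent * entropy.mean()
```

In a MAPPO setting, the same loss is applied per agent while the critic conditions on the shared (centralized) state, matching the AC architecture described in Section 4.3.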
5. Experimental Evaluation
5.1. Experimental Preliminaries
- (1) Dataset
  - (i) Brandimarte's moderately flexible problem instances [47].
  - (ii) Three distinct large-scale instance sets introduced by Hurink et al. [52]: "edata" (where few operations can be distributed across more than one machine), "rdata" (where most operations may be distributed to certain machines), and "vdata" (where all operations may be distributed to several machines).
  - (iii) Direct testing on larger instances demonstrates the method's robust generalization; for instance, the DMU instances [53] exhibit a broad range of operation processing times.
- (2) Baseline
- (3) Configuration setting
  - (i) For each problem size, the policy network is trained for 20,000 iterations, with each iteration comprising four independent trajectories (i.e., instances), and all original features are normalized to the same scale;
  - (ii) For the CNN architectures, the network is designed to estimate the Q-value of each state–action pair. Typical CNN architectures consist of convolutional layers, nonlinear activation layers, and fully connected layers. Here, the convolutional layers use 16 filters with kernel size (1, 2), and the fully connected hidden layers use 100 neurons. The optimizer is Adam with a first-moment coefficient β of 0.9; the remaining Adam hyperparameters and the learning rate are chosen according to the size of the instances. The number of epochs ranges from 50 to 100, depending on the size of the instances. In model design, efforts are made to prevent convergence to local optima during training (a minimal sketch of this Q-network follows this list);
  - (iii) For the graph neural network with node embedding, the number of embedding iterations k is set to 2, and the coefficient in the embedding update equation is set to 0. Each embedding network has 2 hidden layers with a dimension of 64, and the action selection network and state value prediction network each have 2 hidden layers with a dimension of 32;
  - (iv) For DQN, the replay buffer size is set to 20,000, the batch size to 64, the discount factor to 0.9, and the learning rate to 0.001;
  - (v) For MAPPO, the number of epochs per network update is set to 1, the clipping parameter to 0.2, and the policy loss, value function, and entropy coefficients to 2, 1, and 0.01, respectively. For training, the discount factor is adjusted to 1 and an Adam optimizer with a constant learning rate is used; other parameters remain unchanged;
  - (vi) The hardware is a machine equipped with an Intel(R) Xeon(R) Gold 6130 CPU and a single Nvidia GeForce 3080Ti GPU.
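As referenced in item (ii), the following is a minimal PyTorch sketch of a Q-network with the stated configuration: 16 convolutional filters with kernel size (1, 2), a 100-neuron fully connected hidden layer, and Adam with a first-moment coefficient of 0.9. The input dimensions, the number of actions, and the second Adam coefficient are placeholders, since the text does not specify them.

```python
# Hedged sketch of the CNN Q-network from item (ii). Input shape and action
# count are placeholders; only the filter count, kernel size, hidden width,
# and Adam beta1 come from the text.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, in_channels=1, height=10, width=10, n_actions=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=(1, 2)),  # 16 filters, (1,2) kernel
            nn.ReLU(),
            nn.Flatten(),
        )
        conv_out = 16 * height * (width - 1)  # width shrinks by 1 with a (1,2) kernel
        self.head = nn.Sequential(
            nn.Linear(conv_out, 100),  # 100-neuron fully connected hidden layer
            nn.ReLU(),
            nn.Linear(100, n_actions),  # one Q-value per action
        )

    def forward(self, x):
        return self.head(self.conv(x))

net = QNetwork()
# beta1 = 0.9 per the text; beta2 = 0.999 is PyTorch's default, assumed here.
optimizer = torch.optim.Adam(net.parameters(), betas=(0.9, 0.999))
```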
5.2. Experimental Results
- (1) Performance evaluation of algorithms
  - (i) On all instances, DRL significantly outperforms the best dispatching rule, and the algorithm shows stronger generalization;
  - (ii) The performance of the scheduling rules is unstable across Hurink's and Brandimarte's instances. The experimental results show that on Hurink's instances the better-performing rules are FOPNR + SPTW and SRPT + SPTW, whereas by mean value FOPNR + EFT and SRPT + EFT perform better on Brandimarte's instances; in contrast, the method proposed in this study is stable across the public instances.
  - (iii) The DRL method significantly outperforms all scheduling rules when trained on small-scale instances and generalized to large-scale instances, indicating that the proposed method is effective for high-dimensional input spaces. Throughout the learning process, the DMU data are used only for testing, and the experimental data show that the proposed method effectively learns to generate better solutions for unseen instances.
  - (iv) Tested with the same parameters, the PPO algorithm [44] performs better on these instances than DQN [41] and DDPG [58], and performs about as well as the metaheuristics on instances with a relatively small total number of jobs × machines (J × M); on larger instances, however, the proposed method is significantly better. Overall, regardless of the method used, large-scale problems are solved less well than small-scale ones, and the training error grows with scale. A larger problem brings a larger scheduling state space, so the learning error increases when the same network structure (such as a CNN) is used, requiring more iterations and a more optimized network structure to reduce the training error. This also shows that the improved graph neural network structure can reduce the training error, adapt to inputs of different sizes, and improve training speed.
- (2) Energy consumption assessment
- (3) Performance evaluation of the architecture
- (4) Evaluation of machine resource allocation
- (5) Verification of the actual processing workshop
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
DMASA | Distributed Multi-Agent Scheduling Architecture
HGNN | Heterogeneous Graph Neural Network
AC | Actor–Critic
RL | Reinforcement Learning
DRL | Deep Reinforcement Learning
GE-HetGNN | Graph Embedding–Heterogeneous Graph Neural Network
HetG | Heterogeneous Graphs
MAPPO | Multi-Agent Proximal Policy Optimization
MADRL | Multi-Agent Deep Reinforcement Learning
GA | Genetic Algorithms
PSO | Particle Swarm Optimization
ACO | Ant Colony Optimization
ANN | Artificial Neural Network
Appendix A
Processing machines and time consumption per operation (machine columns labeled M1–M6 for readability; "-" indicates the machine cannot process the operation).

Job Agent | Operation Sequence | M1 | M2 | M3 | M4 | M5 | M6
---|---|---|---|---|---|---|---
Wooden Crafts | Upper box | 11.44 | 13 | 14 | - | - | -
 | Feeding | - | 25.14 | 26 | 27 | 30 | -
 | CNC1, CNC2 | 240 | 242 | 238.46 | - | - | -
 | Packing | 60 | 61 | - | 50.38 | - | -
 | Upper cover | 28 | - | 30 | - | 26.19 | 32
 | Cutting materials | - | 28 | 36 | - | - | 26.84
USB drives | Upper box | 11.44 | 13 | 14 | - | - | -
 | Feeding | - | 25.14 | 26 | 27 | 30 | -
 | Laser | 19 | 24 | 18.21 | - | - | -
 | Packing | 60 | 61 | - | 50.38 | - | -
 | Upper cover | 28 | - | 30 | - | 26.19 | 32
 | Cutting materials | - | 28 | 36 | - | - | 26.84
References
- Zhang, J.; Ding, G.; Zou, Y.; Qin, S.; Fu, J. Review of job shop scheduling research and its new perspectives under Industry 4.0. J. Intell. Manuf. 2019, 30, 1809–1830. [Google Scholar] [CrossRef]
- Azemi, F.; Tokody, D.; Maloku, B. An optimization approach and a model for Job Shop Scheduling Problem with Linear Programming. In Proceedings of the UBT International Conference 2019, Pristina, Kosovo, 26 October 2019. [Google Scholar]
- Sels, V.; Gheysen, N.; Vanhoucke, M. A comparison of priority rules for the job shop scheduling problem under different flow time-and tardiness-related objective functions. Int. J. Prod. Res. 2012, 50, 4255–4270. [Google Scholar] [CrossRef]
- Park, J.; Chun, J.; Kim, S.H.; Kim, Y.; Park, J. Learning to schedule job-shop problems: Representation and policy learning using graph neural network and reinforcement learning. Int. J. Prod. Res. 2021, 59, 3360–3377. [Google Scholar] [CrossRef]
- Nasiri, M.M.; Salesi, S.; Rahbari, A.; Salmanzadeh Meydani, N.; Abdollai, M. A data mining approach for population-based methods to solve the JSSP. Soft Comput. 2019, 23, 11107–11122. [Google Scholar] [CrossRef]
- Mao, H.; Schwarzkopf, M.; Venkatakrishnan, S.B.; Meng, Z.; Alizadeh, M. Learning scheduling algorithms for data processing clusters. In Proceedings of the ACM Special Interest Group on Data Communication, Beijing, China, 19–23 August 2019; pp. 270–288. [Google Scholar]
- Wang, J.; Zhang, Y.; Liu, Y.; Wu, N. Multiagent and bargaining-game-based real-time scheduling for internet of things-enabled flexible job shop. IEEE Internet Things J. 2018, 6, 2518–2531. [Google Scholar] [CrossRef]
- Wang, Z.; Gombolay, M. Learning scheduling policies for multi-robot coordination with graph attention networks. IEEE Robot. Autom. Lett. 2020, 5, 4509–4516. [Google Scholar] [CrossRef]
- Hu, H.; Jia, X.; He, Q.; Fu, S.; Liu, K. Deep reinforcement learning based AGVs real-time scheduling with mixed rule for flexible shop floor in industry 4.0. Comput. Ind. Eng. 2020, 149, 106749. [Google Scholar] [CrossRef]
- Caldeira, R.H.; Gnanavelbabu, A.; Vaidyanathan, T. An effective backtracking search algorithm for multi-objective flexible job shop scheduling considering new job arrivals and energy consumption. Comput. Ind. Eng. 2020, 149, 106863. [Google Scholar] [CrossRef]
- Kong, M.; Xu, J.; Zhang, T.; Lu, S.; Fang, C.; Mladenovic, N. Energy-efficient rescheduling with time-of-use energy cost: Application of variable neighborhood search algorithm. Comput. Ind. Eng. 2021, 156, 107286. [Google Scholar] [CrossRef]
- Yin, S.; Xiang, Z. Adaptive operator selection with dueling deep Q-network for evolutionary multi-objective optimization. Neurocomputing 2024, 581, 127491. [Google Scholar] [CrossRef]
- Mangalampalli, S.; Karri, G.R.; Kumar, M.; Khalaf, O.I.; Romero, C.A.; Sahib, G.A. DRLBTSA: Deep reinforcement learning based task-scheduling algorithm in cloud computing. Multimed. Tools Appl. 2024, 83, 8359–8387. [Google Scholar] [CrossRef]
- Gui, Y.; Tang, D.; Zhu, H.; Zhang, Y.; Zhang, Z. Dynamic scheduling for flexible job shop using a deep reinforcement learning approach. Comput. Ind. Eng. 2023, 180, 109255. [Google Scholar] [CrossRef]
- Srinath, N.; Yilmazlar, I.O.; Kurz, M.E.; Taaffe, K. Hybrid multi-objective evolutionary meta-heuristics for a parallel machine scheduling problem with setup times and preferences. Comput. Ind. Eng. 2023, 185, 109675. [Google Scholar] [CrossRef]
- Kianfar, K.; Atighehchian, A. A hybrid heuristic approach to master surgery scheduling with downstream resource constraints and dividable operating room blocks. Ann. Oper. Res. 2023, 328, 727–754. [Google Scholar] [CrossRef]
- Chen, M.; Tan, Y. SF-FWA: A Self-Adaptive Fast Fireworks Algorithm for effective large-scale optimization. Swarm Evol. Comput. 2023, 80, 101314. [Google Scholar] [CrossRef]
- Wang, G.; Wang, P.; Zhang, H. A Self-Adaptive Memetic Algorithm for Distributed Job Shop Scheduling Problem. Mathematics 2024, 12, 683. [Google Scholar] [CrossRef]
- Cimino, A.; Elbasheer, M.; Longo, F.; Mirabelli, G.; Padovano, A.; Solina, V. A Comparative Study of Genetic Algorithms for Integrated Predictive Maintenance and Job Shop Scheduling. In Proceedings of the European Modeling and Simulation Symposium, EMSS, Santo Stefano, Italy, 18–20 September 2023. [Google Scholar]
- Dulebenets, M.A. An Adaptive Polyploid Memetic Algorithm for scheduling trucks at a cross-docking terminal. Inf. Sci. 2021, 565, 390–421. [Google Scholar] [CrossRef]
- Singh, P.; Pasha, J.; Moses, R.; Sobanjo, J.; Ozguven, E.E.; Dulebenets, M.A. Development of exact and heuristic optimization methods for safety improvement projects at level crossings under conflicting objectives. Reliab. Eng. Syst. Saf. 2022, 220, 108296. [Google Scholar] [CrossRef]
- Singh, E.; Pillay, N. A study of ant-based pheromone spaces for generation constructive hyper-heuristics. Swarm Evol. Comput. 2022, 72, 101095. [Google Scholar] [CrossRef]
- Jing, X.; Pan, Q.; Gao, L. Local search-based metaheuristics for the robust distributed permutation flowshop problem. Appl. Soft Comput. 2021, 105, 107247. [Google Scholar] [CrossRef]
- Luo, J.; El Baz, D.; Xue, R.; Hu, J. Solving the dynamic energy aware job shop scheduling problem with the heterogeneous parallel genetic algorithm. Future Gener. Comput. Syst. 2020, 108, 119–134. [Google Scholar] [CrossRef]
- Xu, B.; Mei, Y.; Wang, Y.; Ji, Z.; Zhang, M. Genetic programming with delayed routing for multiobjective dynamic flexible job shop scheduling. Evol. Comput. 2021, 29, 75–105. [Google Scholar] [CrossRef]
- Nguyen, S.; Mei, Y.; Xue, B.; Zhang, M. A hybrid genetic programming algorithm for automated design of dispatching rules. Evol. Comput. 2019, 27, 467–496. [Google Scholar] [CrossRef] [PubMed]
- Zhang, F.; Mei, Y.; Nguyen, S.; Zhang, M. Correlation coefficient-based recombinative guidance for genetic programming hyperheuristics in dynamic flexible job shop scheduling. IEEE Trans. Evol. Comput. 2021, 25, 552–566. [Google Scholar] [CrossRef]
- Li, Y.; He, Y.; Wang, Y.; Tao, F.; Sutherland, J.W. An optimization method for energy-conscious production in flexible machining job shops with dynamic job arrivals and machine breakdowns. J. Clean. Prod. 2020, 254, 120009. [Google Scholar] [CrossRef]
- Li, Z.; Chen, Y. Minimizing the makespan and carbon emissions in the green flexible job shop scheduling problem with learning effects. Sci. Rep. 2023, 13, 6369. [Google Scholar] [CrossRef] [PubMed]
- Shao, W.; Shao, Z.; Pi, D. A multi-neighborhood-based multi-objective memetic algorithm for the energy-efficient distributed flexible flow shop scheduling problem. Neural Comput. Appl. 2022, 34, 22303–22330. [Google Scholar] [CrossRef]
- Afsar, S.; Palacios, J.J.; Puente, J.; Vela, C.R.; Gonzalez-Rodriguez, I. Multi-objective enhanced memetic algorithm for green job shop scheduling with uncertain times. Swarm Evol. Comput. 2022, 68, 101016. [Google Scholar] [CrossRef]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
- Akyol, D.E.; Bayhan, G.M. A review on evolution of production scheduling with neural networks. Comput. Ind. Eng. 2007, 53, 95–122. [Google Scholar] [CrossRef]
- Weckman, G.R.; Ganduri, C.V.; Koonce, D.A. A neural network job-shop scheduler. J. Intell. Manuf. 2008, 19, 191–201. [Google Scholar] [CrossRef]
- Gong, G.; Chiong, R.; Deng, Q.; Gong, X.; Lin, W.; Han, W.; Zhang, L. A two-stage memetic algorithm for energy-efficient flexible job shop scheduling by means of decreasing the total number of machine restarts. Swarm Evol. Comput. 2022, 75, 101131. [Google Scholar] [CrossRef]
- Park, I.B.; Huh, J.; Kim, J.; Park, J. A reinforcement learning approach to robust scheduling of semiconductor manufacturing facilities. IEEE Trans. Autom. Sci. Eng. 2019, 17, 1420–1431. [Google Scholar] [CrossRef]
- Xiong, H.; Fan, H.; Jiang, G.; Li, G. A simulation-based study of dispatching rules in a dynamic job shop scheduling problem with batch release and extended technical precedence constraints. Eur. J. Oper. Res. 2017, 257, 13–24. [Google Scholar] [CrossRef]
- Ning, T.; Huang, M.; Liang, X.; Jin, H. A novel dynamic scheduling strategy for solving flexible job-shop problems. J. Ambient Intell. Humaniz. Comput. 2016, 7, 721–729. [Google Scholar] [CrossRef]
- Baykasoglu, A.; Karaslan, F.S. Solving comprehensive dynamic job shop scheduling problem by using a GRASP-based approach. Int. J. Prod. Res. 2017, 55, 3308–3325. [Google Scholar] [CrossRef]
- Liu, Y.; Fan, J.; Zhao, L.; Shen, W.; Zhang, C. Integration of deep reinforcement learning and multi-agent system for dynamic scheduling of re-entrant hybrid flow shop considering worker fatigue and skill levels. Robot. Comput.-Integr. Manuf. 2023, 84, 102605. [Google Scholar] [CrossRef]
- Workneh, A.D.; Gmira, M. Learning to schedule (L2S): Adaptive job shop scheduling using double deep Q network. Smart Sci. 2023, 11, 409–423. [Google Scholar] [CrossRef]
- Luo, S. Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning. Appl. Soft Comput. 2020, 91, 106208. [Google Scholar] [CrossRef]
- Liu, C.; Chang, C.; Tseng, C. Actor-critic deep reinforcement learning for solving job shop scheduling problems. IEEE Access 2020, 8, 71752–71762. [Google Scholar] [CrossRef]
- Zhang, Y.; Zhu, H.; Tang, D.; Zhou, T.; Gui, Y. Dynamic job shop scheduling based on deep reinforcement learning for multi-agent manufacturing systems. Robot. Comput. Integr. Manuf. 2022, 78, 102412. [Google Scholar] [CrossRef]
- Han, B.; Yang, J. Research on adaptive job shop scheduling problems based on dueling double DQN. IEEE Access 2020, 8, 186474–186495. [Google Scholar] [CrossRef]
- Huang, J.; Gao, L.; Li, X. An end-to-end deep reinforcement learning method based on graph neural network for distributed job-shop scheduling problem. Expert Syst. Appl. 2024, 238, 121756. [Google Scholar] [CrossRef]
- Brandimarte, P. Routing and scheduling in a flexible job shop by tabu search. Ann. Oper. Res. 1993, 41, 157–183. [Google Scholar] [CrossRef]
- Sun, Y.; Han, J.; Yan, X.; Yu, P.S.; Wu, T. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proc. VLDB Endow. 2011, 4, 992–1003. [Google Scholar] [CrossRef]
- Sun, Y.; Norick, B.; Han, J.; Yan, X.; Yu, P.S.; Yu, X. Pathselclus: Integrating meta-path selection with user-guided object clustering in heterogeneous information networks. ACM Trans. Knowl. Discov. Data (TKDD) 2013, 7, 1–23. [Google Scholar] [CrossRef]
- Zhang, C.; Song, W.; Cao, Z.; Zhang, J.; Tan, P.S.; Chi, X. Learning to dispatch for job shop scheduling via deep reinforcement learning. Adv. Neural Inf. Process. Syst. 2020, 33, 1621–1632. [Google Scholar]
- Ni, F.; Hao, J.; Lu, J.; Tong, X.; Yuan, M.; Duan, J.; Ma, Y.; He, K. A multi-graph attributed reinforcement learning based optimization algorithm for large-scale hybrid flow shop scheduling problem. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021. [Google Scholar]
- Hurink, J.; Jurisch, B.; Thole, M. Tabu search for the job-shop scheduling problem with multi-purpose machines. Oper.-Res.-Spektrum 1994, 15, 205–215. [Google Scholar] [CrossRef]
- Jain, A.S.; Meeran, S. Deterministic job-shop scheduling: Past, present and future. Eur. J. Oper. Res. 1999, 113, 390–434. [Google Scholar] [CrossRef]
- Han, B.; Yang, J. A deep reinforcement learning based solution for flexible job shop scheduling problem. Int. J. Simul. Model. 2021, 20, 375–386. [Google Scholar] [CrossRef]
- Behnke, D.; Geiger, M.J. Test Instances for the Flexible Job Shop Scheduling Problem with Work Centers; Research Paper; Helmut-Schmidt-Universität, Lehrstuhl für Betriebswirtschaftslehre, insbes. Logistik-Management, 2012. Available online: https://d-nb.info/1023241773/34 (accessed on 1 February 2024).
- Ding, H.; Gu, X. Hybrid of human learning optimization algorithm and particle swarm optimization algorithm with scheduling strategies for the flexible job-shop scheduling problem. Neurocomputing 2020, 414, 313–332. [Google Scholar] [CrossRef]
- Rooyani, D.; Defersha, F.M. An efficient two-stage genetic algorithm for flexible job-shop scheduling. IFAC-PapersOnLine 2019, 52, 2519–2524. [Google Scholar] [CrossRef]
- Lu, R.; Li, Y.-C.; Li, Y.; Jiang, J.; Ding, Y. Multi-agent deep reinforcement learning based demand response for discrete manufacturing systems energy management. Appl. Energy 2020, 276, 115473. [Google Scholar] [CrossRef]
- He, Y.; Li, Y.; Wu, T.; Sutherland, J.W. An energy-responsive optimization method for machine tool selection and operation sequence in flexible machining job shops. J. Clean. Prod. 2015, 87, 245–254. [Google Scholar] [CrossRef]
- Nouiri, M.; Bekrar, A.; Trentesaux, D. Towards Energy Efficient Scheduling and Rescheduling for Dynamic Flexible Job Shop Problem. IFAC-PapersOnLine 2018, 51, 1275–1280. [Google Scholar] [CrossRef]
Dataset | Instance | LB | FOPNR + SPTW | FOPNR + EFT | SRPT + EFT | SRPT + SPTW | HLO-PSO | 2S-GA | Ours
---|---|---|---|---|---|---|---|---|---
Brandimarte_Data | Mk01 | 36 | 59 | 76 | 71 | 69 | 40 | 43 | 42
 | Mk02 | 24 | 80 | 69 | 60 | 71 | 28 | 37 | 28
 | Mk03 | 204 | 381 | 374 | 374 | 381 | 243 | 224 | 204
 | Mk04 | 48 | 111 | 123 | 120 | 120 | 63 | 71 | 59
 | Mk05 | 168 | 224 | 242 | 236 | 265 | 175 | 183 | 196
 | Mk06 | 33 | 162 | 149 | 126 | 178 | 71 | 106 | 38
 | Mk07 | 133 | 295 | 278 | 278 | 295 | 144 | 184 | 159
 | Mk08 | 523 | 717 | 661 | 643 | 728 | 523 | 523 | 633
 | Mk09 | 299 | 550 | 559 | 535 | 525 | 350 | 371 | 326
 | Mk10 | 165 | 460 | 404 | 373 | 414 | 238 | 235 | 200
Hurink_vdata-la1-5 (10J10M) | la1 | 570 | 820 | 900 | 881 | 835 | 579 | 572 | 633
 | la2 | 529 | 799 | 970 | 807 | 870 | 541 | 532 | 611
 | la3 | 477 | 678 | 740 | 790 | 790 | 497 | 481 | 485
 | la4 | 502 | 775 | 848 | 830 | 804 | 519 | 506 | 530
 | la5 | 457 | 628 | 768 | 669 | 710 | 471 | 463 | 494
Hurink_vdata-la11-15 (20J5M) | la11 | 1071 | 1422 | 1541 | 1590 | 1355 | 1077 | 1255 | 1072
 | la12 | 936 | 1135 | 1316 | 1471 | 1142 | 939 | 1091 | 937
 | la13 | 1038 | 1250 | 1403 | 1434 | 1266 | 1041 | 1102 | 1039
 | la14 | 1070 | 1299 | 1470 | 1228 | 1311 | 1077 | 1166 | 1071
 | la15 | 1089 | 1540 | 1521 | 1523 | 1478 | 1093 | 1196 | 1090
Dataset | Instance | LB | DQN + CNN | DDPG + CNN | PPO + CNN | PPO + GNN | MAPPO + GNN | Ours
---|---|---|---|---|---|---|---|---
Brandimarte_Data | Mk01 | 36 | 45 | 44 | 42 | 42 | 43 | 42
 | Mk02 | 24 | 28 | 28 | 32 | 28 | 28 | 28
 | Mk03 | 204 | 264 | 257 | 204 | 258 | 255 | 243
 | Mk04 | 48 | 62 | 74 | 78 | 61 | 60 | 59
 | Mk05 | 168 | 218 | 193 | 187 | 195 | 188 | 196
 | Mk06 | 33 | 157 | 123 | 90 | 42 | 41 | 38
 | Mk07 | 133 | 172 | 227 | 169 | 161 | 159 | 159
 | Mk08 | 523 | 679 | 581 | 531 | 634 | 607 | 633
 | Mk09 | 299 | 388 | 386 | 349 | 323 | 347 | 350
 | Mk10 | 165 | 351 | 337 | 279 | 207 | 200 | 200
Hurink_vdata-la1-5 (10J10M) | la1 | 570 | 666 | 662 | 693 | 697 | 610 | 633
 | la2 | 529 | 655 | 645 | 643 | 647 | 555 | 611
 | la3 | 477 | 597 | 574 | 580 | 583 | 532 | 497
 | la4 | 502 | 609 | 635 | 610 | 613 | 530 | 530
 | la5 | 457 | 593 | 531 | 556 | 558 | 507 | 494
Hurink_vdata-la11-15 (20J5M) | la11 | 1071 | 1222 | 1222 | 1249 | 1356 | 1101 | 1255
 | la12 | 936 | 1047 | 1039 | 1092 | 1185 | 950 | 1091
 | la13 | 1038 | 1151 | 1171 | 1211 | 1315 | 1053 | 1102
 | la14 | 1070 | 1292 | 1292 | 1248 | 1355 | 1086 | 1166
 | la15 | 1089 | 1221 | 1266 | 1271 | 1379 | 1111 | 1196
Dataset | Instance | LB | DQN + CNN | DDPG + CNN | PPO + CNN | PPO + GNN | MAPPO + GNN | Ours
---|---|---|---|---|---|---|---|---
Dmu (20J15M) | dmu01 | 2563 | 3520 | 3678 | 3609 | 3323 | 2796 | 2755
 | dmu02 | 2706 | 3765 | 3965 | 3811 | 3630 | 2954 | 2974
 | dmu03 | 2731 | 3953 | 4101 | 3846 | 3660 | 2965 | 2839
 | dmu04 | 2669 | 3521 | 3912 | 3759 | 3716 | 2989 | 2716
 | dmu05 | 2749 | 3990 | 3927 | 3872 | 3171 | 3015 | 2916
Dmu (20J20M) | dmu06 | 3244 | 3526 | 4082 | 3724 | 3358 | 3398 | 3131
 | dmu07 | 3046 | 4311 | 3855 | 3497 | 3671 | 3182 | 2997
 | dmu08 | 3188 | 4413 | 4035 | 3660 | 4048 | 3411 | 3127
 | dmu09 | 3092 | 4361 | 3913 | 3549 | 4421 | 3449 | 3079
 | dmu10 | 2984 | 4243 | 4777 | 3426 | 3621 | 3208 | 2904
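For readers reproducing these comparisons, the following is a small hedged helper for the relative gap to the lower bound (LB), the usual metric behind tables like those above; the function is generic, and the sample values are taken from the dmu01 row.

```python
# Hedged convenience sketch: percentage gap of a makespan above the
# instance lower bound, as commonly used to compare the tabled methods.
def relative_gap(makespan: float, lower_bound: float) -> float:
    """Percentage by which a makespan exceeds the instance lower bound."""
    return 100.0 * (makespan - lower_bound) / lower_bound

# dmu01: LB = 2563, Ours = 2755 -> roughly 7.5%
print(f"{relative_gap(2755, 2563):.1f}%")
```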