Applying Decision Transformers to Enhance Neural Local Search on the Job Shop Scheduling Problem
Abstract
1. Introduction
- The introduction of a decision transformer approach to boost the performance of a learned local search heuristic on the JSSP.
- An experimental analysis of both the performance and the learned behavior of the model with respect to the teacher models.
- A resulting ML method that, depending on the computational time budget, achieves state-of-the-art results for solving the JSSP through ML-based local search.
2. Preliminaries
2.1. Job Shop Scheduling Problems
2.2. Solution Methods for Job Shop Scheduling Problems
2.3. Deep Reinforcement Learning
2.4. Decision Transformers
2.5. Neural Local Search
- Accepts or declines the solution from the last iteration,
- Chooses a new neighborhood operation that defines the next neighborhood, i.e., the set of candidate solutions considered in the next search step,
- Or chooses a perturbation operator to jump to a new region of the search space in which to continue the search.
- Acceptance decision: the decision of whether the last LS step is accepted or not.
- Acceptance–Neighborhood decision: a tuple combining the above acceptance decision with the choice among four different neighborhood operations.
- Acceptance–Neighborhood–Perturbation decision: a tuple combining the acceptance and neighborhood decisions with an additional perturbation decision from the set of perturbation operators (a minimal sketch of the resulting control loop follows this list).
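To make the interplay of these decisions concrete, the following minimal sketch shows one possible control loop around such a learned policy. The `policy`, `evaluate`, `apply_neighborhood_op`, and `perturb` callables and the operator placeholders are illustrative assumptions, not the actual NeuroLS implementation; only the decision structure (accept or decline, pick a neighborhood, optionally perturb) follows the description above.

```python
# Minimal sketch of a neural local search loop, assuming generic callables
# for solution evaluation, neighborhood search, and perturbation.
NEIGHBORHOOD_OPS = ["nbh_op_1", "nbh_op_2", "nbh_op_3", "nbh_op_4"]  # placeholders

def neural_local_search(initial_solution, policy, evaluate,
                        apply_neighborhood_op, perturb, n_steps=100):
    """One NLS run: per iteration, the learned policy decides whether to accept
    the last step, which neighborhood to search next, and whether to perturb."""
    current = initial_solution
    best, best_cost = current, evaluate(current)

    for _ in range(n_steps):
        # Policy output: (accept last step?, neighborhood index, perturb?).
        accept, nbh_idx, do_perturb = policy(current, best_cost)

        if not accept:
            current = best  # decline: continue from the best solution found so far

        if do_perturb:
            candidate = perturb(current)  # jump to a new region of the search space
        else:
            candidate = apply_neighborhood_op(current, NEIGHBORHOOD_OPS[nbh_idx])

        cost = evaluate(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
        current = candidate

    return best, best_cost
```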
3. Related Work
3.1. Learned Construction Heuristics
3.2. Learned Heuristic Search
3.3. Imitation Learning on the JSSP
4. Methods
4.1. Dataset Generation
4.2. DT-Training
4.3. DT-Testing
5. Numerical Results
5.1. Results on Taillard Benchmark
5.2. Results on Own Test Instances
6. Student–Teacher Comparison
- What are the practical performance implications of using the comparatively larger and slower DT models during inference?
- Is there a correlation between the relatively better performance of DT models in comparison with their teacher models and a greater deviation in learned behavior?
- Is there an optimal return-to-go value for achieving the best performance? The results in Table 3 and Table 4 indicate that the DT performs better than its teacher models when the same number of local search steps is performed. However, DT inferences take longer to compute than those of the teacher models on the same hardware.
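The third question concerns the return-to-go with which the DT is conditioned at inference time. The sketch below illustrates the standard decision-transformer rollout, in which a target return is supplied up front and decremented by the observed reward after every step; the `dt_model` and `env` interfaces are assumptions for illustration, not the code used in this work.

```python
import torch

def run_dt_episode(dt_model, env, target_return, max_steps=50, context_len=50):
    """Generic decision-transformer rollout: condition on a return-to-go that
    shrinks by the reward already collected. Interfaces are assumed."""
    state = env.reset()
    states, actions, returns_to_go = [state], [], [float(target_return)]

    for _ in range(max_steps):
        # Keep only the most recent `context_len` timesteps as model context.
        s = torch.as_tensor(states[-context_len:], dtype=torch.float32)
        a = torch.as_tensor(actions[-context_len:], dtype=torch.long)
        rtg = torch.as_tensor(returns_to_go[-context_len:], dtype=torch.float32)

        # The model predicts the next action from (return-to-go, state, action) tokens.
        action = dt_model.predict_action(rtg, s, a)

        state, reward, done, _ = env.step(action)
        states.append(state)
        actions.append(action)
        # Decrement the conditioning signal by the observed reward.
        returns_to_go.append(returns_to_go[-1] - reward)

        if done:
            break
    return states, actions
```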
7. Concluding Discussion and Outlook
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
CEI | Critical End Improvement
CET | Critical End Time
CT | Critical Time
DRL | Deep Reinforcement Learning
DT | Decision Transformer
ECET | Extended Critical End Time
JSSP | Job Shop Scheduling Problem
LS | Local Search
ML | Machine Learning
NLS | Neural Local Search
PDR | Priority Dispatching Rules
Action space | | | | | | | | |
---|---|---|---|---|---|---|---|---|
AA | - | - | - | - | - | - | - | -
AAN | - | X | X | X | - | - | - | -
AANP | X | - | - | X | X | X | X |
Hyperparameter/Design | Value |
---|---|
Number of layers | 6 |
Number of attention heads | 8 |
Embedding dimension | 128 |
Batch size | 512 |
Context length K | 50 |
Nonlinearity | GeLU |
Dropout | 0.1 |
Adam betas | (0.9, 0.95) |
Grad norm clip | 1.0 |
Weight decay | 0.1 |
Learning rate decay | Cosine decay |
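For reference, the hyperparameters above can be collected into a single configuration object. The sketch below assumes a minGPT-style backbone; the class and field names are illustrative and not the exact training code of this work.

```python
from dataclasses import dataclass

@dataclass
class DTConfig:
    """Hyperparameters from the table above; field names are illustrative."""
    n_layers: int = 6
    n_heads: int = 8
    embed_dim: int = 128
    batch_size: int = 512
    context_length: int = 50        # K
    activation: str = "gelu"
    dropout: float = 0.1
    adam_betas: tuple = (0.9, 0.95)
    grad_norm_clip: float = 1.0
    weight_decay: float = 0.1
    lr_schedule: str = "cosine"     # cosine learning rate decay

config = DTConfig()
```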
Results on the Taillard benchmark (optimality gap and average makespan per instance size):

Model | Metric | 15×15 | 20×15 | 20×20 | 30×15 | 30×20 | 50×15 | 50×20 | 100×20 | Average
---|---|---|---|---|---|---|---|---|---|---
NLSA | opt. gap | 7.74% | 12.16% | 11.54% | 14.33% | 19.42% | 18.00% | 11.22% | 5.89% | 12.54%
 | makespan | 1324.0 | 1530.7 | 1804.0 | 2043.1 | 2267.0 | 3273.0 | 3163.0 | 5682.0 | 2635.85
NLSAN | opt. gap | 8.72% | 11.39% | 11.67% | 14.31% | 19.57% | 18.29% | 11.15% | 5.84% | 12.62%
 | makespan | 1336.0 | 1520.2 | 1806.0 | 2042.7 | 2270.0 | 3281.0 | 3161.0 | 5679.0 | 2636.99
NLSANP | opt. gap | 10.42% | 15.63% | 13.83% | 13.82% | 19.10% | 10.62% | 10.83% | 5.73% | 12.50%
 | makespan | 1357.0 | 1578.1 | 1841.0 | 2033.9 | 2261.0 | 3068.5 | 3152.0 | 5673.0 | 2620.56
DT | opt. gap | 8.63% | 11.15% | 11.73% | 14.12% | 19.21% | 10.55% | 10.59% | 5.88% | 11.48%
 | makespan | 1335.0 | 1517.0 | 1807.0 | 2039.3 | 2263.0 | 3066.4 | 3145.0 | 5681.0 | 2606.71
DT100 | opt. gap | 7.66% | 12.05% | 11.42% | 13.62% | 19.26% | 10.74% | 10.83% | 5.56% | 11.39%
 | makespan | 1323.0 | 1529.3 | 1802.0 | 2030.4 | 2264.0 | 3071.7 | 3152.0 | 5664.0 | 2604.55
optimal | makespan | 1228.9 | 1364.8 | 1617.3 | 1787.0 | 1898.4 | 2773.8 | 2843.9 | 5365.7 | 2359.98
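The optimality gaps in the table are the relative deviation of the achieved makespan from the optimal makespan in the last row. A minimal check against the first instance group (15×15) of the table above:

```python
# Optimality gap = (achieved makespan - optimal makespan) / optimal makespan.
def opt_gap(makespan: float, optimal: float) -> float:
    return (makespan - optimal) / optimal

# Values taken from the 15x15 column of the table above.
print(f"{opt_gap(1324.0, 1228.9):.2%}")  # NLSA  -> 7.74%
print(f"{opt_gap(1335.0, 1228.9):.2%}")  # DT    -> 8.63%
print(f"{opt_gap(1323.0, 1228.9):.2%}")  # DT100 -> 7.66%
```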
Average makespans on the authors' own test instances (per instance size):

Model | 15×15 | 20×15 | 20×20 | 30×15 | 30×20 | 50×15 | 50×20 | 100×20 | Average
---|---|---|---|---|---|---|---|---|---
NLSAN/NLSANP | 1319.7 | 1499.4 | 1736.8 | 1972.2 | 2178.2 | 2974.9 | 3137.0 | 5694.7 | 2564.1
DT | 1310.0 | 1496.4 | 1739.5 | 1972.1 | 2181.0 | 2974.7 | 3138.1 | 5689.6 | 2562.7
DT100 | 1301.3 | 1499.2 | 1737.0 | 1972.6 | 2181.1 | 2975.3 | 3136.9 | 5675.2 | 2559.8