Advancing Sustainable Manufacturing: Reinforcement Learning with Adaptive Reward Machine Using an Ontology-Based Approach
Abstract
1. Introduction
2. Related Work
3. Background
3.1. Markov Decision Process
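As a standard reference point, a Markov decision process (MDP) is defined by a tuple of states, actions, transition probabilities, rewards, and a discount factor, and the agent seeks a policy that maximizes the expected discounted return; the notation below is the conventional one and may differ slightly from the paper's.

$$
\mathcal{M} = \langle S, A, P, R, \gamma \rangle, \qquad P(s' \mid s, a), \quad R(s, a), \quad \gamma \in [0, 1),
$$
$$
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right].
$$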
3.2. Trust Region Policy Optimization
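For reference, TRPO (Schulman et al.) updates the policy by maximizing a surrogate advantage objective subject to a trust-region constraint on the average KL divergence from the previous policy; the expression below is the standard form, and the step size δ is a hyperparameter.

$$
\max_{\theta}\; \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}\!\left[\frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\, A^{\pi_{\theta_{\text{old}}}}(s, a)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{s \sim \pi_{\theta_{\text{old}}}}\!\left[ D_{\mathrm{KL}}\!\big(\pi_{\theta_{\text{old}}}(\cdot \mid s)\,\|\,\pi_{\theta}(\cdot \mid s)\big)\right] \le \delta .
$$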
3.3. Reward Machine
3.4. Ontology
4. Job Shop Scheduling: Partially Observable and Dynamic Environment
- Orders should be processed by machines based on a specified sequence of operations.
- Three sources generate orders.
- Machines (resources) can process one order at a time and are organized into three groups. Each group consists of machines that can perform similar operations, following the Group Technology (GT) principle of forming part families based on process similarities. This grouping streamlines order processing, as machines within the same group can be used interchangeably for specific types of operations, thereby reducing setup times and increasing throughput.
- Three work areas are located in different parts of the factory to facilitate the efficient movement of orders. Each work area contains a subset of machine groups, ensuring that orders can be processed with minimal transportation delays between operations.
- Three sinks consume processed orders; a sink's capacity is the number of orders it can consume at each time step.
- The scheduler agent selects an order and sends it to a machine or sink for its next operation step. The steps are as follows: the agent selects an order from the source queue; it then moves the order to a work area, selects a group, and selects a machine within that group to process the order; finally, it selects an order from the machine's queue for processing and places the processed order in a sink. This decomposes scheduling into six tasks:
- Selecting an order from the source order queue;
- Selecting a work area to which the order is moved;
- Selecting a group from which a machine is chosen;
- Selecting a machine to process the order;
- Selecting an order from the machine order queue to be processed;
- Selecting a sink to consume the processed order.
- A random strategy for the work-area, group, and sink selection tasks, which selects a work area, group, or sink at random;
- A LIFO strategy for selecting from the source order queue, which picks the last order that entered the queue for resource allocation;
- A FIFO strategy for selecting from the machine order queue, which picks the first order that entered the queue for processing;
- An RL strategy for machine selection, which selects the machine with the highest expected reward (a minimal sketch of these per-task strategies is given after this list).
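To make these per-task strategies concrete, the following is a minimal Python sketch of the baseline decision rules described above; the class and method names are illustrative, not part of the original implementation, and the RL rule is shown only as a placeholder that ranks machines by an externally supplied value estimate.

```python
import random
from collections import deque

class BaselineScheduler:
    """Illustrative per-task dispatching rules (names are hypothetical)."""

    def pick_source_order(self, source_queue: deque):
        # LIFO: take the order that entered the source queue last.
        return source_queue.pop()

    def pick_work_area(self, work_areas: list):
        # Random: choose one of the work areas uniformly at random.
        return random.choice(work_areas)

    def pick_group(self, groups: list):
        # Random: choose one machine group uniformly at random.
        return random.choice(groups)

    def pick_machine(self, machines: list, value_estimate):
        # RL rule (placeholder): choose the machine with the highest
        # estimated reward, where value_estimate is a learned function.
        return max(machines, key=value_estimate)

    def pick_machine_order(self, machine_queue: deque):
        # FIFO: take the order that entered the machine queue first.
        return machine_queue.popleft()

    def pick_sink(self, sinks: list):
        # Random: choose one of the sinks uniformly at random.
        return random.choice(sinks)
```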
5. An Ontology-Based Adaptive Reward Machine for Reinforcement Learning Algorithms
Algorithm 1. Ontology-based Adaptive Reward Machine (ONTOADAPT-REWARD).
5.1. JSS Ontology
5.2. Modeling JSS as an RL Process
- Selecting one of the orders in the source order queue;
- Selecting one of the three work areas;
- Selecting one of the three groups of machines;
- Selecting one of the sixteen machines;
- Selecting one of the six orders in the machine order queue;
- Selecting one of the three sinks (a sketch of these discrete sub-action spaces follows this list).
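As a rough illustration of how these six selection steps could be encoded as discrete sub-action spaces, the sketch below uses the sizes listed above; the bound of six orders for the source queue and the dictionary key names are assumptions made here for illustration.

```python
# Minimal sketch of the six discrete sub-action spaces.
# The source-queue bound of 6 is an assumption; the other sizes follow
# the enumeration above (3 work areas, 3 groups, 16 machines,
# 6 machine-queue slots, 3 sinks).
SUB_ACTION_SIZES = {
    "source_order": 6,
    "work_area": 3,
    "group": 3,
    "machine": 16,
    "machine_order": 6,
    "sink": 3,
}

def flat_action_count() -> int:
    """Number of composite actions if the six choices were flattened."""
    total = 1
    for size in SUB_ACTION_SIZES.values():
        total *= size
    return total

if __name__ == "__main__":
    # 6 * 3 * 3 * 16 * 6 * 3 = 15552 composite actions, which motivates
    # treating the six selections as separate sequential decisions.
    print(flat_action_count())
```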
5.3. Ontology-Based Modeling of State
- {NegativeBelief ⊑ Belief,
- PositiveBelief ⊑ Belief,
- NeutralBelief ⊑ Belief,
- WaitingTime ⊑ Time ⊓ ∀has.NegativeBelief,
- WorkingTime ⊑ Time ⊓ ∀has.PositiveBelief,
- SetupTime ⊑ Time ⊓ ∀has.NegativeBelief,
- IdleTime ⊑ Time ⊓ ∀has.NegativeBelief,
- FailureTime ⊑ Time ⊓ ∀has.NegativeBelief,
- RepairTime ⊑ Time ⊓ ∀has.NegativeBelief,
- Priority ⊑ Number ⊓ ∀has.NegativeBelief,
- Size ⊑ Number ⊓ ∀has.NeutralBelief,
- CostOverTime ⊑ Number ⊓ ∀has.NegativeBelief,
- MaintenanceCost ⊑ Number ⊓ ∀has.NegativeBelief,
- ProfitRate ⊑ Number ⊓ ∀has.PositiveBelief,
- Distance ⊑ Number ⊓ ∀has.NegativeBelief,
- Location ⊑ Number ⊓ ∀has.NeutralBelief,
- FailureRate ⊑ Number ⊓ ∀has.NegativeBelief,
- MachineCount ⊑ Number ⊓ ∀has.PositiveBelief,
- SourceCapacity ⊑ Number ⊓ ∀has.PositiveBelief,
- SinkCapacity ⊑ Number ⊓ ∀has.PositiveBelief,
- ProcessingCapacity ⊑ Number ⊓ ∀has.PositiveBelief,
- DueDate ⊑ Time ⊓ ∀has.PositiveBelief,
- OperationStep ⊑ (Generate ⊔ Process ⊔ Consume) ⊓ ∀has.NeutralBelief,
- Mobility ⊑ (Mobile ⊔ Stationary) ⊓ ∀has.NeutralBelief,
- Status ⊑ (Working ⊓ ∀has.PositiveBelief) ⊔ (Failure ⊓ ∀has.NegativeBelief) ⊔ (Idle ⊓ ∀has.NegativeBelief),
- Belief ⊑ Thing ⊓ (PositiveBelief ⊔ NegativeBelief ⊔ NeutralBelief),
- Source ⊑ Thing ⊓ ∀has.SourceCapacity ⊓ ∃has.Order,
- Order ⊑ Thing ⊓ ∃has.DueDate ⊓ ∃has.Priority ⊓ ∃has.WaitingTime ⊓ ∀has.OperationStep ⊓ ∃has.ProfitRate ⊓ ∀has.Size ⊓ ∃has.CostOverTime,
- WorkArea ⊑ Thing ⊓ ∀has.Distance ⊓ ∃has.Group,
- Group ⊑ Thing ⊓ ∀has.MachineCount ⊓ ∀has.Machine,
- Machine ⊑ Thing ⊓ ∃has.Order ⊓ ∀has.ProcessingCapacity ⊓ ∃has.WorkingTime ⊓ ∃has.FailureTime ⊓ ∃has.IdleTime ⊓ ∃has.SetupTime ⊓ ∃has.RepairTime ⊓ ∀has.FailureRate ⊓ ∃has.MaintenanceCost ⊓ ∀has.Status ⊓ ∀has.Location ⊓ ∀has.Mobility,
- Sink ⊑ Thing ⊓ ∀has.SinkCapacity}
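The axioms above attach a positive, negative, or neutral belief to each observable attribute. The sketch below transcribes that polarity into a Python lookup table and shows one way such a mapping could be used to sign a normalized state's contribution to a reward signal; the scoring rule itself is an assumption for illustration, not the paper's exact mechanism.

```python
# Belief polarity per ontology attribute, transcribed from the axioms above:
# +1 = PositiveBelief, -1 = NegativeBelief, 0 = NeutralBelief.
BELIEF = {
    "WaitingTime": -1, "WorkingTime": +1, "SetupTime": -1, "IdleTime": -1,
    "FailureTime": -1, "RepairTime": -1, "Priority": -1, "Size": 0,
    "CostOverTime": -1, "MaintenanceCost": -1, "ProfitRate": +1,
    "Distance": -1, "Location": 0, "FailureRate": -1, "MachineCount": +1,
    "SourceCapacity": +1, "SinkCapacity": +1, "ProcessingCapacity": +1,
    "DueDate": +1,
}

def belief_weighted_score(normalized_state: dict) -> float:
    """Sum attribute values (scaled to [0, 1]) signed by belief polarity.

    This weighting is illustrative; attributes absent from BELIEF
    contribute nothing.
    """
    return sum(BELIEF.get(name, 0) * value
               for name, value in normalized_state.items())

# Illustrative usage: long waiting times lower the score, profit raises it.
score = belief_weighted_score({"WaitingTime": 0.8, "ProfitRate": 0.6})
```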
5.4. Reward Machine Modeling
5.4.1. New Propositional Symbol Extraction
5.4.2. New Reward Function Extraction
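Sections 5.4.1 and 5.4.2 extract propositional symbols and reward functions from the ontology and attach them to the reward machine. For orientation, the following is a generic, minimal sketch of a reward machine as a finite-state machine whose transitions are labeled by propositional symbols and emit rewards; the states, propositions, and reward values shown are illustrative placeholders, not the machine produced by ONTOADAPT-REWARD.

```python
class RewardMachine:
    """Generic reward machine: finite states, proposition-labeled transitions."""

    def __init__(self, initial_state: str):
        self.state = initial_state
        # (state, proposition) -> (next_state, reward)
        self.delta = {}

    def add_transition(self, state, proposition, next_state, reward):
        self.delta[(state, proposition)] = (next_state, reward)

    def step(self, true_propositions) -> float:
        """Advance on the first matching true proposition; return its reward."""
        for p in true_propositions:
            if (self.state, p) in self.delta:
                self.state, reward = self.delta[(self.state, p)]
                return reward
        return 0.0  # no labeled transition fired on this step

# Illustrative usage with made-up propositions:
rm = RewardMachine("u0")
rm.add_transition("u0", "order_assigned", "u1", 0.1)
rm.add_transition("u1", "order_completed_on_time", "u0", 1.0)
rm.add_transition("u1", "due_date_missed", "u0", -1.0)
```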
5.5. Learning
6. Evaluation
6.1. ONTOADAPT-REWARD-Based JSS
6.2. Model Evaluation
- Average utilization rate: the average utilization rate of machines (see Equation (6)).
- Average waiting time: the average time orders spend waiting to be completely processed by a machine.
- Total failed orders: the number of orders that failed due to delay (i.e., missed their due date requirement).
- Total processed orders: the number of successfully processed orders (a small sketch computing these metrics from simulation logs follows this list).
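The sketch below shows one way these four metrics could be computed from simple per-entity simulation logs; the log field names and the plain busy-time utilization ratio are assumptions made for illustration and stand in for the paper's Equation (6).

```python
def evaluate(machine_logs, order_logs, horizon: float) -> dict:
    """Compute the four evaluation metrics from per-entity logs.

    machine_logs: dicts with a 'working_time' field in seconds.
    order_logs:   dicts with 'waiting_time', 'processed', and 'failed' fields.
    horizon:      total simulated time in seconds.
    """
    # Average utilization: total busy time over total available machine time.
    utilization = sum(m["working_time"] for m in machine_logs) / (
        len(machine_logs) * horizon)

    processed = [o for o in order_logs if o["processed"]]
    avg_wait = (sum(o["waiting_time"] for o in processed) / len(processed)
                if processed else 0.0)
    failed = sum(1 for o in order_logs if o["failed"])

    return {
        "avg_utilization_rate": utilization,
        "avg_waiting_time": avg_wait,
        "total_processed_orders": len(processed),
        "total_failed_orders": failed,
    }
```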
6.3. Results and Discussion
7. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| Parameter | Information |
|---|---|
| - | Machines at time step t. |
| - | Orders at time step t. |
| - | Work areas at time step t. |
| - | Groups at time step t. |
| - | Sinks at time step t. |
| - | The waiting time of an order is the duration, in seconds, that the order has waited to be completely processed by a machine at time step t. |
| - | The due date of an order is its required/preferred completion time; it can be low (1000 s), medium (3000 s), or high (5000 s). |
| - | The priority of an order indicates its importance at time step t and can be low, medium, or high. |
| - | The working time of a machine is its total processing time at time step t. |
| - | The idle time of a machine is the period during which it is available but not doing anything productive. |
| - | The failure time of a machine is the total time it has spent in a failure status up to time step t. |
| - | The failure rate of a machine is its failure probability at time step t and can be low, medium, or high. |
| - | The distance of a work area from the scheduler agent at time step t. |
| - | The machine count is the number of machines in a group at time step t. |
| - | The sink capacity is the capacity of a sink at time step t. |
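The parameters above can be grouped by the entity they describe; the dataclass sketch below shows one possible grouping for order and machine attributes (field names, types, and the three-level encodings are illustrative assumptions, not the paper's data model).

```python
from dataclasses import dataclass

@dataclass
class OrderState:
    waiting_time: float   # seconds waited so far at time step t
    due_date: int         # 1000 (low), 3000 (medium), or 5000 (high) seconds
    priority: str         # "low", "medium", or "high"

@dataclass
class MachineState:
    working_time: float   # total processing time up to time step t
    idle_time: float      # time available but not processing
    failure_time: float   # total time spent in failure status
    failure_rate: str     # "low", "medium", or "high" failure probability
```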
Per-task reward functions (table of reward terms assigned to each scheduling task).
| | Baseline Method | ONTOADAPT-REWARD-Based Proposed Method (Without Multi-Advisor) | ONTOADAPT-REWARD-Based Proposed Method (With Multi-Advisor) |
|---|---|---|---|
| State | LIFO | TRPO () | TRPO () |
| | Random | Random | Random |
| | Random | Random | Random |
| | TRPO | TRPO () | TRPO () |
| | FIFO | TRPO () FIFO | TRPO () FIFO |
| | Random | Random | Random |
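In the multi-advisor configuration, each task-specific strategy can be viewed as an advisor whose preferences are combined before acting; the sketch below shows one simple aggregation scheme (summing advisor scores per candidate action) and is only an illustration of the general multi-advisor idea, not the aggregation used in the paper.

```python
def aggregate_advisors(advisors, candidate_actions):
    """Pick the candidate action with the highest summed advisor score.

    advisors:          list of callables mapping an action to a numeric score.
    candidate_actions: actions available at the current decision point.
    """
    def total_score(action):
        return sum(advisor(action) for advisor in advisors)
    return max(candidate_actions, key=total_score)

# Illustrative usage: two hypothetical advisors scoring machines by
# (negated) queue length and by estimated profit rate.
machines = [{"queue": 4, "profit": 0.2}, {"queue": 1, "profit": 0.5}]
pick = aggregate_advisors(
    [lambda m: -m["queue"], lambda m: m["profit"]],
    machines,
)
```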
| Parameter | Level | Description |
|---|---|---|
| Order load | Light | Generating three orders at each time step. |
| | Heavy | Generating six orders at each time step. |
| Due date level | High | 5% of orders have a low due date, 80% have a medium due date, and 15% have a high due date. |
| | Low | 25% of orders have a low due date, 70% have a medium due date, and 5% have a high due date. |
| Scenario | Utilization Rate (Increase) | Waiting Time (Decrease) | Processed Orders (Increase) | Failed Orders (Decrease) | AVG |
|---|---|---|---|---|---|
| Light-High | 5% | 13% | 6% | 68% | 23% |
| Light-Low | 11% | 8% | 14% | 73% | 27% |
| Heavy-High | 12% | 9% | 14% | 41% | 19% |
| Heavy-Low | 16% | 6% | 15% | 32% | 17% |
| AVG | 11% | 9% | 12% | 54% | - |
| Scenario | Utilization Rate (Increase) | Waiting Time (Decrease) | Processed Orders (Increase) | Failed Orders (Decrease) | AVG |
|---|---|---|---|---|---|
| Light-High | 15% | 3% | 16% | 51% | 21% |
| Light-Low | 14% | 0% | 16% | 53% | 21% |
| Heavy-High | 16% | 1% | 16% | 18% | 13% |
| Heavy-Low | 19% | −1% | 18% | 4% | 10% |
| AVG | 16% | 1% | 17% | 32% | - |