Advancing Sustainable Manufacturing: Reinforcement Learning with Adaptive Reward Machine Using an Ontology-Based Approach
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper proposes an ontology-based adaptive reward machine (RM) model that dynamically creates and modifies RMs based on domain ontologies. This adaptability allows the model to outperform a state-of-the-art baseline algorithm in resource utilization, processed orders, average waiting time, and failed orders, highlighting its potential for sustainable manufacturing by optimizing machine usage and reducing idle times. I suggest that it be accepted after minor revisions. The paper has the following issues:
1. Too many references should not be cited at once; the cited works should be summarized separately, for example [3-6], [15, 18-20], [21-24], and [26-29].
2. The first time a parameter appears, its definition needs to be given.
3. When an abbreviation first appears, its full name needs to be written out, in both the abstract and the main text.
4. The related work section should be placed in the introduction.
5. You define 30%, 50%, and 20% of orders with low, medium, and high priority, respectively. What is the basis? You should give the reason.
6. Simulated episodes are set to 1000, each with 100 simulation steps. What will happen if it exceeds 1000 or is less than 1000?
Comments on the Quality of English Language
Extensive editing of English language required.
Author Response
Comment 1: Too many references should not be cited at once; the cited works should be summarized separately, for example [3-6], [15, 18-20], [21-24], and [26-29].
Response 1: I have added explanations for references [3-6] in lines 29-34 of the revised version of the paper. Lines 70-75 now include descriptions of references [15, 18-20]. Descriptions for references [21-24] are provided in lines 79-84, and lines 90-94 include explanations for references [26-29].
Comment 2: The first time a parameter appears, its definition needs to be given.
Response 2: I have thoroughly reviewed the entire article and made the necessary corrections to ensure all parameters are defined upon their first appearance.
Comment 3: When an abbreviation first appears, its full name needs to be written out, in both the abstract and the main text.
Response 3: I have carefully reviewed the entire article and made the necessary corrections to ensure that all abbreviations are fully spelled out upon their first appearance.
Comment 4: The related work section should be placed in the introduction.
Response 4: I have moved the related work section to follow the introduction section.
Comment 5: You define 30%, 50%, and 20% of orders with low, medium, and high priority, respectively. What is the basis? You should give the reason.
Response 5: The basis for defining 30%, 50%, and 20% of orders with low, medium, and high priority, respectively, is to simulate a realistic distribution that reflects typical job shop scheduling scenarios where not all orders have equal priority. This approach creates a challenging simulation environment that mirrors actual job shop scheduling dynamics and ensures that the model’s ability to handle different priority levels is thoroughly tested across a range of scenarios.
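As a minimal illustration of the order mix described in this response, the sketch below draws order priorities with the stated 30%/50%/20% split. The helper name `sample_priorities`, the label strings, and the fixed seed are assumptions made for illustration only; the simulator used in the manuscript is not reproduced here.

```python
import random
from collections import Counter

# Priority mix stated in the response: 30% low, 50% medium, 20% high.
PRIORITY_WEIGHTS = {"low": 0.3, "medium": 0.5, "high": 0.2}


def sample_priorities(n_orders, seed=0):
    """Draw n_orders priority labels according to PRIORITY_WEIGHTS (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    labels = list(PRIORITY_WEIGHTS)
    weights = list(PRIORITY_WEIGHTS.values())
    return rng.choices(labels, weights=weights, k=n_orders)


priorities = sample_priorities(10_000)
# Empirical share of each priority level in the sampled orders.
shares = {label: count / len(priorities) for label, count in Counter(priorities).items()}
```

With a large number of sampled orders, the empirical shares in `shares` should sit close to the nominal 0.3, 0.5, and 0.2.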
Comment 6: Simulated episodes are set to 1000, each with 100 simulation steps. What will happen if it exceeds 1000 or is less than 1000?
Response 6: Increasing the number of episodes beyond 1000 could potentially allow for more extensive exploration of the state-action space, leading to a better understanding of the agent's behavior across various environmental conditions and enhancing convergence towards optimal policies. Reducing the number of episodes might limit the diversity of experiences the RL agent encounters, potentially biasing our evaluation results. Fewer episodes could impact the stability of learning algorithms, as they may not have enough opportunities to converge to optimal policies or adequately adapt to changing environments. Our decision to use 1000 episodes strikes a balance between obtaining statistically significant results and managing computational resources effectively.
Comment 7: Extensive editing of English language required.
Response 7: I have extensively improved the English language throughout the article. Some of the improved English parts are shown with underlining.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The article discusses the problem of job-shop scheduling.
In order to offer a particular product to a customer (Sink), it is necessary to use a specific technology. How is the product technology modelled in Figure 1?
The authors do not present the learning method used in the study. What is the verification of training results? Reinforcement learning can give bad results without verification.
Author Response
Comment 1: In order to offer a particular product to a customer (Sink), it is necessary to use a specific technology. How is the product technology modelled in Figure 1?
Response 1: In our simulated job shop scheduling environment, there is no distinction between different products or sinks in terms of specific technological requirements. Instead, the focus is on optimizing the overall system efficiency by maximizing the capacity of sinks. Any produced order can be consumed by any sink, provided the sink has the capacity to handle it. This design choice simplifies the scheduling process and prioritizes overall system efficiency.
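The rule stated in this response, that any produced order can go to any sink with spare capacity, can be sketched as follows. The `Sink` class, the `dispatch` function, and the first-available selection policy are hypothetical choices for illustration; the response does not specify how a sink is chosen when several have capacity.

```python
from dataclasses import dataclass


@dataclass
class Sink:
    name: str
    capacity: int  # maximum number of orders the sink can hold
    load: int = 0  # orders currently held

    def has_room(self):
        return self.load < self.capacity


def dispatch(order_id, sinks):
    """Send an order to the first sink with spare capacity.

    Returns the sink name, or None if every sink is full.
    The first-available policy is an assumption, not the authors' method.
    """
    for sink in sinks:
        if sink.has_room():
            sink.load += 1
            return sink.name
    return None


sinks = [Sink("s1", capacity=1), Sink("s2", capacity=2)]
assignments = [dispatch(order_id, sinks) for order_id in range(4)]
```

Because no order is tied to a particular sink, an order is only left unassigned once the combined capacity of all sinks is exhausted.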
Comment 2: The authors do not present the learning method used in the study. What is the verification of training results? Reinforcement learning can give bad results without verification.
Response 2: I have described the learning methods used in our study (lines 585-588), as well as the verification process for the training results (subsection 6.2, Model Evaluation).
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
I doubt the share of global energy consumption by industry. Please crosscheck with IEA data.
Job scheduling has the potential to reduce this, but I would appreciate a number to characterise the potential (which might be around 15 to 20%) in complex manufacturing companies if you include optimising heat-consuming processes. Please comment.
I understand the relevance of the described optimisation process, but how does it consider the many boundary conditions, from lead time to organisational and availability issues, tool wear, tool change, maintenance, breakdowns, batch sizes, etc., in an actual plant? Has the approach/algorithm/programme been tested in an actual application, and what were the results?
In your discussion, can you be more precise about where we can expect significant impacts from your approach? In which sectors/subsectors/company sizes do we have data, works planning procedures, and a complex variety of subprocesses or machines, so that this makes sense?
Author Response
Comment 1: I doubt the share of global energy consumption by industry. Please crosscheck with IEA data.
Response 1: I have crosschecked the data with IEA and updated lines 26-29 accordingly.
Comment 2: Job scheduling has the potential to reduce this, but I would appreciate a number to characterise the potential (which might be around 15 to 20%) in complex manufacturing companies if you include optimising heat-consuming processes. Please comment.
Response 2: While our study primarily focuses on operational performance metrics such as average machine utilization rate, waiting times, total failed orders, and processed orders, we did not directly measure energy savings metrics in this work. However, based on existing literature, effective job scheduling strategies have been shown to potentially reduce energy consumption by optimizing production sequences, minimizing idle times, and streamlining resource utilization (lines 40-47 have been added to emphasize this). For a comprehensive assessment of the energy-saving potential of the ONTOADAPT-REWARD model, future work could integrate energy-specific metrics into the evaluation framework.
Comment 3: I understand the relevance of the described optimisation process, but how does it consider the many boundary conditions, from lead time to organisational and availability issues, tool wear, tool change, maintenance, breakdowns, batch sizes, etc., in an actual plant? Has the approach/algorithm/programme been tested in an actual application, and what were the results?
Response 3: Indeed, these factors significantly impact the performance and applicability of any optimization approach. We have extensively tested our approach in a simulated job shop environment that closely mirrors real-world complexities. This environment includes machine breakdowns, varying demand patterns, and fluctuating resource availability. I have added the following to the future work section: "While our current validation has been conducted in a simulated environment, we plan to test and refine our approach in real-world settings, considering more boundary conditions, from lead time to organizational and availability issues, tool wear, tool change, and maintenance."
Comment 4: In your discussion, can you be more precise about where we can expect significant impacts from your approach? In which sectors/subsectors/company sizes do we have data, works planning procedures, and a complex variety of subprocesses or machines, so that this makes sense?
Response 4: I addressed this comment by adding the following paragraph to the conclusion and future work section:
"By using the relations defined in an ontology-based schema, it is possible to extract concepts related to each property, thereby reducing the agent's state dimensionality for each learner and enhancing computational efficiency. It is especially important in a factory with hundreds of machines and jobs. Therefore, ONTOADAPT-REWARD is suitable for small and large corporations due to its scalability. The model adapts to varying levels of complexity and size by focusing on critical concepts such as machine availability and order priorities, enabling efficient scheduling regardless of the number of machines and jobs involved."
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
It is more appropriate to discuss the factory layout in Figure 5. The arrangement method is based on the group technology (GT or part family) or the type of machine.
What impact does the handling system have on your approach?
You didn't add more details about the breakdown effects (more justification is needed).
"sustainable manufacturing by optimizing machine usage and reducing idle times": it is preferable to discuss resource utilization as well.
In future work, discuss using your approach for different production types, such as batch production, to generalize your approach for different manufacturing processes and systems.
Author Response
Comment 1: It is more appropriate to discuss the factory layout in Figure 5. The arrangement method is based on the group technology (GT or part family) or the type of machine.
Response 1: I have incorporated this information into the definition of group and work area:
"Machines/Resources are categorized into three groups: $n_1$, $n_2$, and $n_3$. Each group consists of machines that can perform similar operations, following the Group Technology (GT) principle of forming part families based on process similarities. This grouping allows for streamlined processing of orders, as machines within the same group can be interchangeably used for specific types of operations, thereby reducing setup times and enhancing throughput.
Work areas $z_1$, $z_2$, and $z_3$ are located in different parts of the factory to facilitate efficient movement of orders. Each work area contains a subset of machine groups, ensuring that orders can be processed with minimal transportation delays between operations."
Comment 2: What impact does the handling system have on your approach?
Response 2: To address this question, I have included the following paragraph in the conclusion and future work section:
"Moreover, our ONTOADAPT-REWARD model does not specifically address the material handling system (i.e., movement, protection, storage, and control of materials) within the factory layout. Instead, it focuses on optimizing the scheduling and processing of orders through various machines and work areas. While the handling system undoubtedly impacts overall efficiency, our model assumes ideal conditions for material movement without explicitly modeling these dynamics. Future work could incorporate a more detailed handling system to explore its effects on scheduling decisions and overall manufacturing efficiency."
Comment 3: You didn't add more details about the breakdown effects (more justification is needed).
Response 3: In response to this comment, I have added the following paragraph to the future work section:
"Future work could incorporate mechanisms to manage machine breakdowns by reallocating orders or rescheduling tasks to minimize downtime. Integrating predictive maintenance and machine health monitoring could help anticipate breakdowns and proactively adjust schedules."
Comment 4: "sustainable manufacturing by optimizing machine usage and reducing idle times": it is preferable to discuss resource utilization as well.
Response 4: In response to this comment, I have added the following explanation:
"Effective job scheduling strategies have demonstrated the potential to decrease energy consumption and emissions through optimized production sequences, reduced idle times, and enhanced resource utilization \cite{Pach2014, Tang2015, Guido2022}. They optimize resource utilization by intelligently allocating jobs to machines based on current workloads and machine availability, thereby reducing bottlenecks and idle times. This efficient allocation not only maximizes the use of available resources but also minimizes energy consumption and operational costs by ensuring that machines are operating at optimal capacity."
Comment 5: In future work, discuss using your approach for different production types, such as batch production, to generalize your approach for different manufacturing processes and systems.
Response 5: In response to this comment, I have incorporated the following paragraph into the conclusion and future work section:
"Future research should also investigate the application of the ONTOADAPT-REWARD model to different production types, such as batch production, to generalize its applicability across various manufacturing processes and systems. This could involve adapting the model to handle the unique characteristics and constraints of batch production, including varying batch sizes, inter-batch setup times, and production sequencing."
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The author has made revisions based on my suggestions, and I suggest that it be accepted.
Reviewer 2 Report
Comments and Suggestions for Authors
I accept the authors' response.