Article
Peer-Review Record

Hierarchical Episodic Control

Appl. Sci. 2023, 13(20), 11544; https://doi.org/10.3390/app132011544
by Rong Zhou 1, Zhisheng Zhang 1,* and Yuan Wang 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 28 August 2023 / Revised: 30 September 2023 / Accepted: 6 October 2023 / Published: 21 October 2023

Round 1

Reviewer 1 Report

The abstract announces a new, improved method in the field of artificial intelligence: “hierarchical episodic control model extending episodic memory to the domain of hierarchical reinforcement”.

The abstract does not name a specific type of application. Given the scope of the journal, such a statement would be of interest to readers.

At the end of the introductory chapter there is a reference to a notion that was not mentioned in the abstract: hippocampus.

The work includes a substantial volume of mathematical relations. It would be very useful to clarify which part of this theoretical support is taken from the literature and which part is the authors' own contribution.

In line with this clarification, it may be useful to shorten the explicit description of the first part and replace it with references. At the same time, it would be justified to keep the detailed description, together with justifications, of the second part, which covers the authors' original contribution.

The application part of the proposed solution is based on experiments in the Four Rooms game environment. It continues in the same vein with the MuJoCo Ant game environment and UE4-based target tracking.

The conclusion regarding the improved results obtained by the proposed method "time across various games and practical applications." is only partially supported by the content of the paper. The practical application part needs to be strongly improved.

 

Author Response

3. Point-by-point response to Comments and Suggestions for Authors

Comments 1: The abstract does not name a specific type of application. Given the scope of the journal, such a statement would be of interest to readers.

Response 1: We appreciate your feedback regarding the specificity of the application context in the abstract. In response, we have refined the abstract to explicitly mention the specific type of application relevant to our research. This adjustment fits the journal's scope and gives readers a clearer understanding of the practical context of our work. We thank you for your constructive suggestion.

 

Comments 2: At the end of the introductory chapter there is a reference to a notion that was not mentioned in the abstract: hippocampus.

Response 2: We appreciate your observation that the term "hippocampus" is absent from the abstract despite being mentioned in the introductory chapter.

The concept of episodic memory originates from neuroscience, where it is implemented by the hippocampus. In the brain, rapid learning is believed to rely on the hippocampus and its ability to store episodic memories. In reinforcement learning, we draw inspiration from this concept and employ a non-parametric module to simulate the functionality of episodic memory.

In response, we have carefully revised the abstract to include a reference to the hippocampus, ensuring alignment with the introductory chapter and providing a more accurate representation of our paper's focus. Your feedback has been instrumental in enhancing the abstract's coherence, and we thank you for your valuable input.
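
For illustration only, the non-parametric episodic memory mentioned in Response 2 can be sketched as a lookup table that keeps, for each state-action pair, the highest discounted return observed so far. This is the tabular update rule of Model-Free Episodic Control. The sketch is not taken from the manuscript and is not the OptionEM model; the names `EpisodicMemory` and `write_episode` and the `gamma` parameter are hypothetical choices made here.

```python
class EpisodicMemory:
    """Hypothetical tabular episodic memory: stores the best return seen per (state, action)."""

    def __init__(self):
        self.table = {}

    def update(self, state, action, episode_return):
        # Keep the maximum return ever observed for this state-action pair.
        key = (state, action)
        self.table[key] = max(self.table.get(key, float("-inf")), episode_return)

    def lookup(self, state, action, default=0.0):
        # Return the stored value, or a default for pairs never visited.
        return self.table.get((state, action), default)


def write_episode(memory, episode, gamma=0.99):
    """Write one finished episode, a list of (state, action, reward), into memory."""
    discounted_return = 0.0
    # Walk the episode backwards so each step sees the return that followed it.
    for state, action, reward in reversed(episode):
        discounted_return = reward + gamma * discounted_return
        memory.update(state, action, discounted_return)


memory = EpisodicMemory()
write_episode(memory, [("s0", "right", 0.0), ("s1", "right", 1.0)], gamma=0.5)
# A later, worse episode does not overwrite the better stored returns.
write_episode(memory, [("s0", "right", 0.0), ("s1", "right", 0.0)], gamma=0.5)
```

Because the table is written directly from observed returns rather than through gradient updates, a single good episode is enough to change the stored value, which is the sense in which such a memory supports rapid learning.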

 

 

Comments 3: The work includes a substantial volume of mathematical relations. It would be very useful to clarify which part of this theoretical support is taken from the literature and which part is the authors' own contribution.

In line with this clarification, it may be useful to shorten the explicit description of the first part and replace it with references. At the same time, it would be justified to keep the detailed description, together with justifications, of the second part, which covers the authors' original contribution.

Response 3: Agree. Thank you for your valuable feedback regarding the mathematical relations in our work.

 

The formulas in Section 2.2 give the definitions and computational formulas of the Option-Critic framework and are taken from the reference "Bacon, P.L.; Harb, J.; Precup, D. The option-critic architecture." In Section 3.3, a substantial number of formulas are dedicated to the analysis and proof of the convergence and non-overestimation properties of the OptionEM model. The symbols used in Section 3.3 are consistent with those introduced in Section 2.2. Section 3 first describes the model we propose and then proves its feasibility and properties mathematically. This section represents our original contribution.

 

To address this concern, we have revised the manuscript to clearly demarcate and reference the mathematical relations obtained from existing literature. This adjustment will provide readers with a better understanding of the sources of these mathematical components.

 

Furthermore, we have retained the detailed description and justifications of our original contributions in the manuscript, as these are central to the paper's novelty and significance.

 

We believe that these modifications will enhance the clarity and transparency of our work and its contributions to the field. Your input has been invaluable in improving the manuscript, and we appreciate your thoughtful suggestions. If you have any further recommendations or concerns, please do not hesitate to share them. Thank you for your continued review and guidance.

 

Comments 4: The application part of the proposed solution is based on experiments in the Four Rooms game environment. It continues in the same vein with the MuJoCo Ant game environment and UE4-based target tracking.

The conclusion regarding the improved results obtained by the proposed method "time across various games and practical applications." is only partially supported by the content of the paper. The practical application part needs to be strongly improved.

Response 4: Agree. We appreciate your feedback regarding the practical application aspect of our proposed solution. We have taken your comments seriously and have made significant improvements to the parts of the paper that cover practical application. We now provide more detailed and comprehensive explanations of the experiments conducted in the Four Rooms game environment, the MuJoCo Ant game environment, and UE4-based target tracking.

 

Additionally, we have strengthened the linkage between these experiments and the improved results mentioned in the conclusion. By offering more explicit insights into how our method performs across various games and practical applications, we aim to provide a more substantiated and convincing argument for its effectiveness.

 

Your input has been instrumental in refining the paper's content, and we thank you for your constructive feedback. If you have any further suggestions or concerns, please do not hesitate to share them. We are committed to ensuring the paper's quality and relevance to its readership.

 

4. Response to Comments on the Quality of English Language

Point 1: I am not qualified to assess the quality of English in this paper.

Response 1: We will take your feedback into consideration and seek the assistance of a professional language editor to ensure the manuscript meets the highest linguistic standards. Your input is valuable, and we are committed to presenting a polished and well-written paper for review. Thank you for your understanding.

 

 

Reviewer 2 Report

It is an interesting work about a new concept "hierarchical episodic control" for reinforcement learning. Some issues that need improvement are cited below:

1. What is the efficiency problem of reinforcement learning according to the related work?  It must be well described because it's the main problem to be addressed in the research work.

2. The paper contributions should be summarized in the introduction section.

3. Comparisons must highlight the benefits of the proposed approach over the implemented benchmark algorithms. Including comparative tables could help.

4. The number of references is low (just 15). The related work and literature review must be improved.

The manuscript needs proofreading, because there are some typos.

Author Response

3. Point-by-point response to Comments and Suggestions for Authors

Comments 1: What is the efficiency problem of reinforcement learning according to the related work?  It must be well described because it's the main problem to be addressed in the research work

Response 1: Thank you for highlighting the importance of describing the efficiency problem in reinforcement learning, which is the main focus of our research work. We have carefully reviewed the related work and have made significant improvements to the manuscript to provide a thorough and well-detailed description of the efficiency problem in reinforcement learning.

 

In the revised manuscript, we have dedicated a specific section to elucidate the challenges and intricacies associated with efficiency in reinforcement learning. We have discussed issues related to sample efficiency, computational complexity, and the trade-offs involved in achieving efficient learning in RL algorithms.

 

By providing this comprehensive description of the efficiency problem, we aim to convey a clear understanding of the research's primary objective and its significance in addressing a critical issue within the field of reinforcement learning.

 

Your feedback has been invaluable in enhancing the manuscript's clarity and alignment with the research's main focus. If you have any further suggestions or require additional clarifications, please do not hesitate to inform us. We appreciate your input and look forward to your continued review.

 

Comments 2: The paper contributions should be summarized in the introduction section

Response 2: Agree. Your feedback regarding summarizing the paper's contributions in the introduction section has been incorporated. The introduction has been revised to effectively highlight and encapsulate the key contributions of the paper, providing readers with a clearer understanding of its significance from the beginning. Thank you for this valuable suggestion, which has improved the overall structure of the manuscript.

 

Comments 3: Comparisons must highlight the benefits of the proposed approach over the implemented benchmark algorithms. Including comparative tables could help.

Response 3: Agree. Thank you for your feedback regarding the need to highlight the benefits of our proposed approach in comparison to benchmark algorithms. To address this concern, we have revised the manuscript to include comparative tables that show the advantages and performance differences between our approach and the benchmark algorithms. These tables give readers a concise summary of our method's strengths.

 

By incorporating these tables, we aim to enhance the clarity and effectiveness of our comparative analysis, making it easier for readers to grasp the benefits of our proposed approach in a straightforward manner. Your suggestion has been instrumental in improving the manuscript's presentation, and we thank you for your valuable input. If you have any further recommendations or queries, please feel free to share them.

 

Comments 4: The number of references is low (just 15). The related work and literature review must be improved.

Response 4: Agree. We appreciate your feedback regarding the number of references and the quality of the related work and literature review. We have expanded the references to provide a more comprehensive context for the research. Furthermore, we have made significant improvements to the related work and literature review, increasing its depth and breadth to strengthen the manuscript's overall quality.

 

4. Response to Comments on the Quality of English Language

Point 1: The manuscript needs proofreading, because there are some typos.

Response 1: Thank you for highlighting the need for proofreading due to typos in the manuscript. While content quality was the primary focus during the initial drafting, I have now initiated the proofreading process to address these concerns. I am committed to delivering a polished version that meets the highest standards of excellence.

 

Author Response File: Author Response.docx

Reviewer 3 Report

The title is very short; it reads like the title of a mini-project report.

Figs. 2-4 are of inferior quality.

Many equations are used without explaining their exact meaning and interpretation.

Very few references.

 

 

 

It is fine.

No major issues.

 

Author Response

3. Point-by-point response to Comments and Suggestions for Authors

Comments 1: The title is very short; it reads like the title of a mini-project report.

Response 1: Thank you for your suggestions. Our title draws inspiration from reinforcement learning papers that leverage Episodic Memory, such as 'Model-Free Episodic Control,' 'Neural Episodic Control,' 'Flexible Option Learning,' 'Episodic curiosity through reachability,' and 'Episodic memory deep q-networks.' Additionally, we considered innovative papers in the realm of hierarchical reinforcement learning frameworks, including 'Deep successor reinforcement learning,' 'The option-critic architecture,' 'Natural option critic,' 'The eigenoption-critic framework,' 'Learning robust options,' 'Learning abstract options,' and 'Hierarchical actor-critic.' Our primary contribution is the extension of Episodic Memory into the realm of hierarchical reinforcement learning, where we demonstrate its effectiveness within the Option-Critic framework. We have mathematically verified the algorithm's convergence and non-overestimation properties. We carefully considered both of these aspects for the title. While we did contemplate 'Generalizable episodic memory for hierarchical reinforcement learning' as a title, we were concerned about potential confusion with a similar title, 'Generalizable episodic memory for deep reinforcement learning' (Hu, H.; Ye, J.; Zhu, G.; Ren, Z.; Zhang, C.). Consequently, we are inclined towards 'Hierarchical Episodic Control' as the preferred title. We greatly value your feedback and welcome further discussion to enhance the title.

 

Comments 2: Figs. 2-4 are of inferior quality.

Response 2: Agree. Thank you for pointing out the issue with the quality of Figures 2-4. I have taken note of this concern and will work to improve the resolution and overall quality of these figures to ensure they meet the necessary standards for clarity and readability. Your feedback is appreciated, and these enhancements will contribute to a better visual presentation in the manuscript.

 

Comments 3: Many equations are used without explaining their exact meaning and interpretation.

Response 3: I appreciate your observation regarding the use of equations in the manuscript. The numerous formulas in Section 2.2 provide the definition and computational formulas for the Option-Critic framework, sourced from the reference "Bacon, P.L.; Harb, J.; Precup, D. The option-critic architecture." In Section 3.3, a substantial number of formulas are dedicated to the analysis and proof of the convergence properties and non-overestimation characteristics of the OptionEM model. The symbols used in Section 3.3 are consistent with those introduced in Section 2.2. Thank you for your valuable input, which will contribute to improving the manuscript's clarity.

 

Comments 4: Very few references.

Response 4: Agree. I appreciate your feedback regarding the number of references. I have since worked on expanding the references to provide a more comprehensive context for the research. Furthermore, I have made significant improvements to the related work and literature review, enhancing the depth and breadth of the review to strengthen the manuscript's overall quality.

 

4. Response to Comments on the Quality of English Language

Point 1: It is fine. No major issues.

Response 1:  Thank you for your feedback on the quality of the English language in the manuscript. I'm pleased to hear that there are no major issues with it. If you have any further suggestions or if there are any specific minor improvements you'd recommend, please feel free to let me know. Your input is valuable and will contribute to enhancing the overall quality of the manuscript. Thank you for your review.

 

 

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have made an effort to improve the paper based on the recommendations received.

Reviewer 3 Report

Appendix A should be aligned from the left side so that it fits the page.