Peer-Review Record

Intrinsically Motivated Open-Ended Multi-Task Learning Using Transfer Learning to Discover Task Hierarchy

Appl. Sci. 2021, 11(3), 975; https://doi.org/10.3390/app11030975
by Nicolas Duminy 1,2, Sao Mai Nguyen 2,3,*, Junshuai Zhu 2, Dominique Duhaut 1 and Jerome Kerdreux 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 16 December 2020 / Revised: 16 January 2021 / Accepted: 17 January 2021 / Published: 21 January 2021
(This article belongs to the Special Issue Cognitive Robotics)

Round 1

Reviewer 1 Report

The authors investigate the possibility of constructing an artificial agent that learns in the way typically found in humans. They made an industrial robot learn without external reward, like people who are intrinsically motivated. Furthermore, the robot learns from observation, a phenomenon typically found in humans in social settings.

Not being competent in AI, I can only judge the manuscript in regard to its aspects pertaining to behavioral science. In general, I like this paper very much because it shows how intrinsically motivated social learning may be implemented in AI. It seems to me that, from the psychological point of view, this paper is both interesting and well done.


Minor points:

Figure 1 is at the very beginning of the paper although the first reference to this figure is found only in line 274.

I would suggest making some fragments of the text more reader friendly. For instance, the section 6.2 is difficult to follow.

Freedom to choose the goals or tasks is typical for intrinsic motivation but this is not a definitional feature. The most important feature is being rewarded by activity itself, not by its consequences. This is how intrinsic motivation works in humans. Maybe the author could address this issue somehow.


Author Response

The authors would like to thank the reviewer for their comments.


Figure 1 is at the very beginning of the paper although the first reference to this figure is found only in line 274.

We have corrected the introduction to reference Figure 1 on lines 27-28.

I would suggest making some fragments of the text more reader friendly. For instance, the section 6.2 is difficult to follow.

We have edited Section 6.2 to improve its readability.

Freedom to choose the goals or tasks is typical for intrinsic motivation but this is not a definitional feature. The most important feature is being rewarded by activity itself, not by its consequences. This is how intrinsic motivation works in humans. Maybe the author could address this issue somehow.

Indeed, thank you for pointing out this important mistake in our text. We have added in Section 2.1 that "These methods use a reward function that is not shaped to fit a specific task but is general to all tasks the robot will face. Tending towards life-long learning, this approach, also called artificial curiosity, may be seen as a particular case of reinforcement learning using a reward function parametrised by features internal to the learning agent."
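As a rough illustration of this idea (a toy sketch under our own assumptions, not the implementation in the paper; the learning-progress measure and function name are hypothetical), such an internally parametrised reward can be computed from the agent's own prediction-error history rather than from any task-specific external signal:

```python
import numpy as np

def intrinsic_reward(errors, window=5):
    """Toy intrinsic reward: learning progress on a self-chosen goal,
    measured as the decrease of mean prediction error between two
    consecutive sliding windows. No external, task-specific reward
    is involved; the reward depends only on the agent's own history."""
    if len(errors) < 2 * window:
        return 0.0  # not enough history to estimate progress
    older = np.mean(errors[-2 * window:-window])
    recent = np.mean(errors[-window:])
    return older - recent  # positive when the agent is improving

# A goal on which prediction errors shrink yields a positive reward,
# so the agent is drawn towards tasks where it is making progress.
errors = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
r = intrinsic_reward(errors, window=5)
```

Rewarding the decrease of error, rather than the error itself, is what keeps the agent's attention on activities where competence is still improving, consistent with the "rewarded by the activity itself" reading of intrinsic motivation raised by the reviewer.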

Reviewer 2 Report

The paper provides a method for robot learning using reinforcement learning via learning the task representations and active learning. My comments follow:

The introduction establishes the robot learning problem to be unbounded but not redundant.

I suggest that the authors provide short and basic definitions for section 3.3 (e.g.,
interest mapping, exploration strategies) at the beginning of the section, so that it is easier for
an average reader to follow.

Please provide a short description of SGIM-TL and SGIM-ACTS and highlight the main differences
from SGIM-PB.

There are certain decisions in section 3.3 that are not clear. Please explain them further in
detail. For example, lines 228-231, which is a repetition of lines 204-206 without additional
details. Same for the lines 255-261.

How does the set of fixed parameters for the primitives affect the generalizability of the
transfer?

In figure 5, please add the ground truth for the task-hierarchy.

It is clear how the algorithm combines imitation learning and transfer learning, but there is
little to no indication of combining the intrinsic motivation for the task hierarchy.

The paper will benefit from thorough language proofreading. Especially, the Introduction, where
the authors switch between the examples and arguments abruptly.

Minor: Please increase the line space for Algorithm 1.

Author Response

The authors would like to thank the reviewer for their comments.

The introduction establishes the robot learning problem to be unbounded but not redundant.

Although the problems we consider and illustrate are redundant, for the sake of clarity we have removed the mention of redundancy from the introduction. We added in Section 3.1 that "As more than one action can lead to the same outcome, M is not a function."

I suggest that the authors provide short and basic definitions for section 3.3 (e.g., interest mapping, exploration strategies) at the beginning of the section, so that it is easier for an average reader to follow.

We have added more explicit definitions of interest mapping and exploration strategies in Section 3.3, lines 219-220 and 227-230.

Please provide a short description of SGIM-TL and SGIM-ACTS and highlight the main differences from SGIM-PB.

We have added Table 1 in Section 3.3 to highlight the differences between the three algorithms. For completeness, we also added IM-PB, which is included in the results section.

There are certain decisions in section 3.3 that are not clear. Please explain them further in detail. For example, lines 228-231, which is a repetition of lines 204-206 without additional details. Same for the lines 255-261.

We have edited Sections 3.3 and 3.3.1 to avoid repetitions. We tried to make explicit that the first mention of k-nearest neighbours at the beginning of Section 3.3 is a general explanation of the model M, while the mentions in Section 3.3.1 are specific to each strategy and describe how this model M is used by k-nearest neighbours.

How does the set of fixed parameters for the primitives affect the generalizability of the transfer?

We do not understand the question. The primitives used by the robot do not have a set of fixed parameters, only a fixed dimensionality derived from the DMP encoding of motions. The primitives are parametrised functions (DMPs) with continuous parameter values, as stated in Section 3.1. Through its autonomous action-space exploration, the robot learns which parameters to use by local regression on the k nearest neighbours. The only fixed primitives are the set of actions in the demonstration set, because the demonstration dataset contains a fixed number of data points.
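To illustrate the kind of local regression described above (a simplified sketch under our own assumptions, not the authors' code; the function name, array layout, and inverse-distance weighting are hypothetical), the continuous parameters of a primitive can be inferred from the k nearest previously observed outcomes:

```python
import numpy as np

def infer_action(goal, outcomes, actions, k=3):
    """Toy inverse model: given a goal outcome, find the k nearest
    outcomes observed during exploration and blend the continuous
    action parameters (e.g. DMP parameters) that produced them,
    weighting each neighbour by its inverse distance to the goal."""
    dists = np.linalg.norm(outcomes - goal, axis=1)
    idx = np.argsort(dists)[:k]               # k nearest neighbours
    weights = 1.0 / (dists[idx] + 1e-8)       # closer -> heavier
    weights /= weights.sum()
    return weights @ actions[idx]             # locally regressed action

# Example: 1-D outcome space, 2-D continuous action parameters.
outcomes = np.array([[0.0], [1.0], [2.0], [3.0]])
actions = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
a = infer_action(np.array([1.0]), outcomes, actions, k=1)
```

Because the regression operates over continuous parameter values, the robot is not restricted to the demonstrated actions: only the dimensionality of the parameter vector is fixed, as the response states.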

In figure 5, please add the ground truth for the task-hierarchy.

The ground truth for the task hierarchy is represented in Figure 2. We have reported the ground truth in Figures 5, 19 and 20 by indicating red squares.

It is clear how the algorithm combines imitation learning and transfer learning, but there is little to no indication of combining the intrinsic motivation for the task hierarchy.

Thank you for this comment. We added a formalisation of the task hierarchy and its links with the procedures and the algorithm in Section 3.2, as well as a paragraph at the end of Section 3.3.2 to explain this combination. We also added a comment in Section 5.5.3 showing that intrinsic motivation has driven the exploration to the right task hierarchy.

The paper will benefit from thorough language proofreading. Especially, the Introduction, where the authors switch between the examples and arguments abruptly.

We have thoroughly proofread the text.

Minor: Please increase the line space for Algorithm 1.

We have increased the line spacing for Algorithm 1.

Reviewer 3 Report

Remarks:

  1. The formatting below line 214 (description of the algorithm) should be improved. 
  2. Fig. 5 - the descriptions on the axes are small and unreadable.
  3. Figs. 6, 11 and 14 - some of the diagrams overlap with the text.
  4. Figs. 19 and 20 - the descriptions on the axes are small and unreadable.
  5. Lines 599-600: “The performance of SGIM-PB stems from its tackling several aspects of transfer of knowledge, which relies on our proposed representation of compound actions and integrates” - I have the feeling that there is no end to this sentence.
  6. Overall, I think the article is interesting, but the Discussion chapter contains theses that are not confirmed in many studies, but only in the case study. In my opinion, in the summary, the authors should clearly separate the properties of the method from the case study.

Author Response

The authors would like to thank the reviewers for their comments.

Below is the point-by-point response.


  • The formatting below line 214 (description of the algorithm) should be improved.
    We have increased the line spacing for Algorithm 1.
  • Fig. 5 - the descriptions on the axes are small and unreadable.
    We have updated Figure 5, as well as Figures 19 and 20, to use larger text.
  • Figs. 6, 11 and 14 - some of the diagrams overlap with the text.
    We have increased the spacing between the diagrams and the captions to avoid overlap.
  • Figs. 19 and 20 - the descriptions on the axes are small and unreadable.
    We have updated Figures 19 and 20 to use larger text.
  • Lines 599-600: “The performance of SGIM-PB stems from its tackling several aspects of transfer of knowledge, which relies on our proposed representation of compound actions and integrates” - I have the feeling that there is no end to this sentence.
    We have corrected this sentence.
  • Overall, I think the article is interesting, but the Discussion chapter contains theses that are not confirmed in many studies, but only in the case study. In my opinion, in the summary, the authors should clearly separate the properties of the method from the case study.
    We have edited the discussion section to make explicit that some results are particular to this case study, and to link them with results of previous works.
