
Open Access, Feature Paper, Article

Peer-Review Record

Adaptation to Other Agent’s Behavior Using Meta-Strategy Learning by Collision Avoidance Simulation

Appl. Sci. 2021, 11(4), 1786; https://doi.org/10.3390/app11041786

by Kensuke Miyamoto¹,*, Norifumi Watanabe² and Yoshiyasu Takefuji¹

Reviewer 1: Anonymous

Reviewer 2: A Ram Kim


Submission received: 31 December 2020 / Revised: 6 February 2021 / Accepted: 8 February 2021 / Published: 18 February 2021

(This article belongs to the Special Issue Advances in Multi-Agent Systems)

Round 1

Reviewer 1 Report

The paper presents an interesting problem in which agents are navigating a space, avoiding collisions and learning from their experience to (perhaps?) reduce future collisions. Having agents adapt or learn to improve future behaviour is a worthwhile goal. Agents that are capable of inferring the intentions of other agents, so that they can themselves make better decisions, is a positive objective.

Unfortunately, the objectives of this research are not clearly presented in the paper in its current form. The paper could be better presented and easier to understand with a stronger introduction and background section. The reader is left to guess exactly what is meant on a number of occasions. Introducing the desired agent behaviour early on would help set the context for the reader.

The paper would be greatly improved by providing a clear description of the simulation and the agents' goals. It might be guessed that the main goal for each agent is to complete a (minimal) number of movements around the grid without reaching a potential collision state with another agent that requires a backward move. In order to reach this objective, agents learn and try to predict the intentions of agents coming in the opposite direction and decide to move away to avoid a collision. However, this is not stated clearly. It is not clearly stated whether agents are in teams or working alone. It is not clear whether agents collide with or avoid agents moving in the same direction as well as agents moving in opposing directions. The impression given to the reader is that the agents work alone; however, the word "cooperative" implies that the agents work together with other agents in some way. The cooperative element is not clear.

It is not made clear what the research hypothesis was, if there was one. It would help the reader to introduce this early in the paper. It seems possible that a number of interacting concepts are involved and the research needs to clearly specify these individually. The link to human behaviour seems weak. Perhaps it would be better to limit discussion to agents. The definitions of active agent behaviour and passive agent behaviour need to be introduced earlier in the paper to clarify these terms, especially as their use appears a little contradictory to their usual English meaning.

The reader is left confused after reading the paper about how agents are expected to be actively modifying their behaviour. Perhaps the prediction of intentions or paths of other approaching agents needs further explanation. Is the overall objective of an agent to avoid collision by correctly predicting the path or strategy of an oncoming agent? Is the learning applied to all agents? Is the intent for agents to learn that not all approaching agents necessarily adopt the same strategy? Perhaps this should be stated explicitly.

The description could be improved by presenting the simulation goals separately and clearly, perhaps with images. Additionally, clearly stating the learning objectives early in the paper would help the reader. I am not sure what agents were learning: how to avoid a collision? how to infer/predict the behaviour of other approaching agents? how to decide to act given a situation?

Comments for author File: Comments.pdf

Author Response

Section 1

Strategies such as active and passive are not intended for particular tasks. In practice, however, the agents must perform some task, and since this study deals with collision avoidance, we have added explanations of how each strategy behaves in collision avoidance. (lines 16-22, 33-41)
For the same reason, I had avoided discussing the specific task in the introduction, but I rewrote it to say that we compared rewards in Experiment 1, now mentioning the content of the task. (lines 54-59)

Section 2

Added a description of the relationship between previous studies and this study. (line 70-71,

Section 3.1

Added an explanation that the "opponent" does not intentionally interrupt, but simply wants to go in the opposite direction. (lines 97-98)
Added figures showing the agents' rewards. (lines 134, 135; Figures 3, 4)
Both were described as "a strategy to give way", so I corrected one of them to "a strategy to make opponent give way". (line 144)
By "he and his agents" I meant "the agents that are the subject of the sentence and the opposing agents". However, it was a strange sentence in the first place, so I rewrote it. (lines 155-156)

Reviewer 2 Report

Please see the attached file.

Comments for author File: Comments.pdf

Author Response

Section 1

I changed the word "effectiveness" for Experiment 1 to a statement that we compared the rewards, together with the content of the task. (lines 54-59)

Section 2

"Active activities" in the previous research was easily confused with the active strategy in this study, so I changed it to "vigorous movement". (lines 65, 66)

Section 3.1

Added arrows to the figures and a description of the colors to the text. (line 102; Figure 1)
Presented the hyper-parameters in a table. (line 125; Table 1)
Both were described as "a strategy to give way", so I corrected one of them to "a strategy to make opponent give way". (line 144)
By "he and his agents" I meant "the agents that are the subject of the sentence and the opposing agents". However, it was a strange sentence in the first place, so I rewrote it. (lines 155-156)

Section 4.1

We introduced epsilon in the sentence "To prevent...". However, I had mistakenly left in that sentence, which I meant to erase because I wrote more about it in the next paragraph. Therefore, I deleted the "To prevent..." sentence. (between lines 223 and 224)

Section 4.2

I rewrote "effective" to state that, in Experiment 2, the agent can choose a strategy that matches the opponent's strategy at the time. (lines 306-307)

Round 2

Reviewer 1 Report

The paper has improved in presentation and readability since the first version. I still feel the cooperative component of the collision avoidance task is not particularly convincing. Is there a difference between cooperation and reaction to avoid a collision? Perhaps the classification of the task as cooperation can be explained/argued more clearly.

I suggest the following further improvements:

Lines 45-52: please make clear that this is a previous study and not the current study, to avoid confusing the reader.

Lines 57-59: Unfortunately, I still find this sentence confusing. Similar rewards to ... ? How many parameters are involved (moving obstacles, number of strategies)? Are there just two configurations?
Comparing:
1. learning agents regarding other agents as moving obstacles (? and use one strategy?)
2. learning agents who switch strategies (is this related to moving obstacles?)

I still feel that the coordination demanded by the task is very low-level coordination, more akin to consideration. In my mind, coordination implies that agents have tasks that need to be jointly coordinated. Please define coordination. Additionally, explain the distinction you wish to make between reactive behaviour and deliberate coordination. (In other words, explain why you think that the collision avoidance task is coordination and not just reactive.)

Line 135 - spelling error: 'glay' should be 'grey'?

Line 168 - 'begun' should be 'began'?

Line 169 - what are 'temperature parameters'?

Line 172 - 'less forward action' - I presume this is a comparative term, but with reference to what? Less than what? (Perhaps you mean there are some episodes with fewer forward action choices than others? Are you able to quantify how many?)

Line 207 - 'a number of forward move' should read 'the number of forward moves'.

Line 216 - 'the certain number of episodes' is vague and is only explained in the next sentence. I would suggest combining these two sentences to be clearer. For example, state something like: 'After 25000 episodes in each set, the agents, who are teachers ...'

What is your hypothesis and conclusion? Are you testing the usefulness of meta-strategy of switching? Yet in your conclusion, you refer to the design of cooperative rewards. The title of the paper uses the term meta-strategy. Is there a difference between meta-strategy and cooperative reward?

Author Response

Section 1
I added a description of cooperation in the combination of strategies. (lines 16-17, 42-45)
Cited the study at the beginning of the sentence to clarify that it was a previous study. (line 50)
The preposition in "similar with" was incorrect and has been corrected to "similar to". "strategy switching agents" was changed to "meta-strategy agents", and "does not switch strategies" was changed to "only use one strategy". (lines 63-64)

Section 3.2
I specified that the comparison is with the other agent models. I also added the percentage of episodes in which agents tended toward behaviors that would earn cooperative rewards. (lines 178, 183-186)

Section 6
I rewrote the sentence because it was focused on cooperative rewards that help to switch strategies. (line 363-365)

Author Response File: Author Response.pdf
