Article
Peer-Review Record

Multiagent Reinforcement Learning Based on Fusion-Multiactor-Attention-Critic for Multiple-Unmanned-Aerial-Vehicle Navigation Control

Energies 2022, 15(19), 7426; https://doi.org/10.3390/en15197426
by Sangwoo Jeon 1, Hoeun Lee 1, Vishnu Kumar Kaliappan 2,*, Tuan Anh Nguyen 2, Hyungeun Jo 1, Hyeonseo Cho 1 and Dugki Min 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 2 September 2022 / Revised: 4 October 2022 / Accepted: 5 October 2022 / Published: 10 October 2022
(This article belongs to the Special Issue Energy Efficiency in Wireless Networks)

Round 1

Reviewer 1 Report

The paper deals with Multi-Agent Deep Reinforcement Learning based on the Fusion Multi-Actor Attention-Critic (F-MAAC) model for energy-efficient cooperative navigation control of multiple UAVs. The reading is not simple or fluent, partly because of the complexity of the subject dealt with. I suggest that the authors:


Check that all acronyms are defined, and defined consistently (for example, using capital letters for each word that forms the acronym).

Check the use of capital letters in sentences

Comment more fully on most of the figures and tables, verifying that each symbol that appears in the figures is clearly defined in the text.

What does a′ indicate in Equation (9)?

The article would be more solid if the results were not limited to simulated environments but were closer to real-world conditions.

Author Response

Please see the attachment. Thank you.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper presents a deep reinforcement learning based model for navigation control of multiple UAVs. The proposed model builds on the Multi-Actor Attention Critic (MAAC) model. Two features or improvements over MAAC are proposed: (i) a sensory fusion layer, which enables the effective utilisation of all information related to the multiple sensors; (ii) a layer that compensates for the loss of information through the attention layer.
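For readers of this record, the following is a minimal, hypothetical PyTorch sketch of the two ideas summarised above: a per-sensor fusion layer, and a path that carries the fused features around the attention layer so that information discarded by the attention is still available to the critic. It is not the authors' implementation; the sensor dimensions, tensor shapes and module names are assumptions made only for illustration.

import torch
import torch.nn as nn

class FusionAttentionCritic(nn.Module):
    """Illustrative only -- not the F-MAAC code from the paper."""
    def __init__(self, ray_dim=32, ins_dim=6, radar_dim=8, hidden=128, n_heads=4):
        super().__init__()
        # (i) sensory fusion layer: embed each sensor stream, then merge them
        self.ray_enc = nn.Linear(ray_dim, hidden)
        self.ins_enc = nn.Linear(ins_dim, hidden)
        self.radar_enc = nn.Linear(radar_dim, hidden)
        self.fuse = nn.Linear(3 * hidden, hidden)
        # attention over the (already encoded) observations of the other agents
        self.attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        # (ii) compensation: the fused features bypass the attention layer and are
        # concatenated back in, so information the attention drops is not lost
        self.value_head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, ray, ins, radar, other_agents):
        # ray/ins/radar: (batch, sensor_dim); other_agents: (batch, n_agents, hidden)
        fused = torch.relu(self.fuse(torch.cat(
            [self.ray_enc(ray), self.ins_enc(ins), self.radar_enc(radar)], dim=-1)))
        attended, _ = self.attn(fused.unsqueeze(1), other_agents, other_agents)
        return self.value_head(torch.cat([fused, attended.squeeze(1)], dim=-1))

The final concatenation is just one simple way to realise the "compensation" idea described in the report; the manuscript may implement it differently.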

Overall, the document is well written and well organised. The authors provide sufficient references to related work, including a summary table, as well as an introduction/background section. According to the simulation results provided, the proposed model outperforms other models in terms of the number of deliveries made by the UAVs during the same time, as well as the number of deliveries per distance travelled.

For all these reasons, I consider that the article is close to meeting the quality requirements for publication. However, there are some issues that could be improved and/or explained more clearly, especially regarding the training and validation process of the models, as well as grammatical and formatting issues. Thus, I consider that the article should be subject to further revision.

Please see more details in the comments section for authors.

Author Response

Please see the attachment. Thank you.

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper can be recommended for publication.

Author Response

Please see the attachment. Thank you.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report


Dear authors,

Thank you for reviewing the manuscript and correcting grammatical errors. In my previous review, I mistakenly swapped the comments for the editor and for the authors, so you did not receive all the comments. I apologise for the mistake.

I indicate again the points to revise in the original manuscript, so that you can address the necessary changes. As I indicated above, I want to highlight the work that has been done, but I think there are some points that can be improved.

LIST OF COMMENTS:

·         Please provide a table or list of all abbreviations used in the manuscript.

·         The proposed Multi-Agent Reinforcement Learning based on Fusion Multi-Actor Attention Critic is sometimes denoted as F-MAAC and at other times as FMAAC. Please always use the same notation. Since the proposed model is based on the MAAC model with the addition of fusion, I would opt for F-MAAC.

·         In Line 11, change 'Unity 3D' to just 'Unity', the most recent name of the application. Please revise this in the rest of the manuscript.

·         Check that all elements begin with a capital letter in Table 1. Errors detected in column 6 'Centralised/Decentralised' (rows 2,3,4,5 and 8) and column 8 'Simulated Environment' (row 9).

·         In Section 2.1, following Equation (1), the different parameters are defined. One of them is the new state, denoted as s’ (line 131). Accordingly, the definition of the parameter R probably contains a mistake:

o   R means the reward received in the next state s → R means the reward received in the next state s’

·         Improve the quality and resolution of Figure 1. Increase the font size and indicate in the caption of the figure what the letters a (action), s (state) and r (reward) stand for.

·         Please, rewrite the sentences in line 152. The explanation of the parameter ‘t’ is not clear.

·         There may be errors in some of the equation references in Section 2.2. Please check the following sentences (for reference, the standard definitions these comments rely on are sketched after this list of comments):

o   Line 159: Now to derive the value function vπ(st), Equation 3 is substituted in Equation 5 → … Equation 3 is substituted in Equation 4

o   Line 161: Now to action value function qπ(st, at), Equation 3 is substituted in Equation 7 → … Equation 3 is substituted in Equation 6

·         Regarding the simulation environment developed in Unity, briefly presented in Section 4.1, provide a link to the GitHub project, as well as a demonstration video as supplementary material.

·         Please, include an introductory paragraph before detailing the different concepts at the beginning of Section 4.2. I also recommend the use of dots or dashes at the beginning of each paragraph to highlight each part of the text. (This also applies to Section 2.3 where the MADDPG and MAAC models are described).

·         It is not clear from Table 2 which sensor the different parameters correspond to. Include some meaningful space or a delimiting line between each sensor (Ray-cast, INS, RADAR).

·         In Section 4.1, it is stated: "What is unique about this environment is that it forces two UAVs to work together to move a specific cargo." However, according to Table 4, 5 UAVs have been used for training and evaluation. Does this mean that there were always 5 UAVs, 2 working collaboratively to pick/deliver orders, and 3 working in isolation? Please clarify this issue in the manuscript.

·         In the current simulation environment, are you considering any models to represent the UAV battery and the decrease of the UAV's battery over time?

·         According to Section 5, the proposed F-MAAC model has been trained using the parameters defined in Table 5. What parameters have been used to train the other models used in the comparisons?

·         In relation to Table 3:

o   What criteria have been followed to determine the scores?

o   I would add a third column to specify in all cases which UAV receives the reward.

o   Is this table only for the case where 2 UAVs work collaboratively, or is it valid for all cases, i.e., also when UAVs work in isolation? Possibly another column could be added to indicate whether the reward is for a collaborative or individual action.

·         Add a marker to identify each of the lines in Figure 9. If the document is printed in black and white it is difficult to distinguish the different lines. This also applies to the case in Figure 10, especially in relation to the unfiltered data (hardly visible on paper). Possibly in this case, it is also necessary to increase the line thickness.

·         In Section 5.2, it is stated: “The randomness and the instability of the complex 3d environment made different learning aspects compared with the previous training”. Given that the simulation environment is the same, I assume this is because the second training is of longer duration. However, it is not very clear in the current wording. Please clarify this issue.

·         Proofreading of the manuscript by a native English speaker is recommended. Pay attention to capital letters (especially after commas), hyphens and other grammatical problems. Some examples:

o   Line 4: Multi-Agent Deep Reinforcement learning (MADRL) based Fusion Multi Actor Attention (FMAAC) → Multi-Agent Deep Reinforcement Learning (MADRL) based on Fusion Multi Actor Attention (F-MAAC).

o   Line 9: Next, A layer … → Next, a layer…

o   Line 34: with existing Conventional heuristic-based search… → with existing conventional heuristic-based search…

o   Line 80: In real-world execution, It is difficult… → In real-world execution, it is difficult…

o   Line 90: simulation environment → simulation environment. (Include the punctuation mark at the end of the sentence.)

o   Line 93: based(MAAC) model. First, We introduce… → based (MAAC) model. First, We introduce… (add a space before the brackets).

o   Line 98: Second, In the critic network, → Second, in the critic network,

o   Line 146: Bellman’s Equation → Bellman equation

o   Line 291: used to find the location of the other UAVs and Hubs. In table 2, → used to find the location of the other UAVs and hubs. In Table 2,

o   Line 293: seven types of following actions ascend, descend, forward,… → seven types of following actions: ascend, descend, forward,… (include a colon before detailing the actions).

o   Line 297: previous time step, dpre and distance of current timestep → previous time step, dpre, and distance of current timestep (include a comma after the parameter ‘dpre’).

o   Lines 310-311: The proposed F-MAAC algorithms are validated using the environment proposed in section 3. → The proposed F-MAAC algorithm is validated using the environment proposed in Section 4.
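For reference (this is not quoted from the manuscript): assuming the paper follows the standard discounted-return formulation, the substitutions that the two equation-reference comments above point to would read, in LaTeX,

G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}  % the return (assumed to be Eq. 3)

v_\pi(s_t) = \mathbb{E}_\pi\!\left[ G_t \mid S_t = s_t \right]
           = \mathbb{E}_\pi\!\left[ \textstyle\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \mid S_t = s_t \right]  % Eq. 3 into Eq. 4

q_\pi(s_t, a_t) = \mathbb{E}_\pi\!\left[ G_t \mid S_t = s_t, A_t = a_t \right]
                = \mathbb{E}_\pi\!\left[ \textstyle\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \mid S_t = s_t, A_t = a_t \right]  % Eq. 3 into Eq. 6

so the corrected cross-references (Equations 4 and 6 rather than 5 and 7) match the usual definitions of the state-value and action-value functions.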


Author Response

Dear Reviewer,

We would like to thank you for your comments and suggestions. We have reviewed the entire manuscript and made a considerable effort to improve the quality of our work following your feedback. Please find in the attached document our answers to the queries you raised and the remarks you made.

Author Response File: Author Response.pdf
