Article
Peer-Review Record

Factored Multi-Agent Soft Actor-Critic for Cooperative Multi-Target Tracking of UAV Swarms

by Longfei Yue, Rennong Yang, Jialiang Zuo, Mengda Yan, Xiaoru Zhao and Maolong Lv *
Reviewer 1: Anonymous
Submission received: 26 January 2023 / Revised: 20 February 2023 / Accepted: 20 February 2023 / Published: 22 February 2023

Round 1

Reviewer 1 Report

This paper adopts the classical MADDPG framework with some modifications. Specifically, it proposes a factored multi-agent soft actor-critic scheme under the maximum entropy framework, which enables a UAV swarm to learn cooperative MTT in an unknown environment. The paper is well organized, but several questions need to be answered.

1. The English needs polishing throughout the manuscript, especially noun plurals and article usage.

2. In the abstract, the authors mainly emphasize the unknown trajectory of a moving target tracked by a UAV in real time, but they do not provide the corresponding solution.

3. The classical MADDPG framework is a multi-agent approach with decentralized actors and a centralized critic. In the second contribution, the authors also state that the global observation-action history can be accessed during centralized training. However, the overall FMASAC architecture in Figure 3 shows a decentralized critic rather than a centralized one. Figure 3 needs to be redrawn to highlight the decentralized actors and the centralized critic.

4. Equations (22) and (25) are the core elements of the developed algorithm; more explanation is required.

 

5. For Algorithm 1, can the authors explain why the actor and critic network update phases are outside the for-loop over the agents?
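For readers unfamiliar with the distinction raised in comment 3, the following is a minimal sketch of decentralized actors paired with a centralized critic. It is not the authors' FMASAC implementation: the sizes, the random linear maps standing in for networks, and the names `act` and `centralized_q` are all hypothetical and chosen only for illustration.

```python
import math
import random

random.seed(0)

# Hypothetical sizes for illustration; none of these come from the paper.
N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2


def make_linear(in_dim, out_dim):
    """Random linear map standing in for a neural network."""
    return [[random.gauss(0.0, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]


# Decentralized actors: one policy per agent, fed only that agent's observation.
actor_weights = [make_linear(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]

# Centralized critic: one weight per entry of the joint observation-action vector.
critic_weights = [random.gauss(0.0, 0.1) for _ in range(N_AGENTS * (OBS_DIM + ACT_DIM))]


def act(agent_id, local_obs):
    """Decentralized execution: the action depends on the local observation only."""
    w = actor_weights[agent_id]
    return [
        math.tanh(sum(o * w[k][j] for k, o in enumerate(local_obs)))
        for j in range(ACT_DIM)
    ]


def centralized_q(all_obs, all_actions):
    """Centralized critic: a single scalar computed from all observations and actions."""
    joint = [x for obs in all_obs for x in obs]
    joint += [a for agent_action in all_actions for a in agent_action]
    return sum(x * w for x, w in zip(joint, critic_weights))


observations = [[random.gauss(0.0, 1.0) for _ in range(OBS_DIM)] for _ in range(N_AGENTS)]

# Per-agent loop: each actor acts on its own observation.
actions = [act(i, observations[i]) for i in range(N_AGENTS)]

# The critic is evaluated once, outside the per-agent loop, on the joint input.
q_value = centralized_q(observations, actions)
```

In this sketch, changing another agent's observation leaves an agent's own action unchanged but does change the critic's value, which is the property a redrawn Figure 3 would be expected to make visible. Whether the same reasoning explains the loop structure asked about in comment 5 is for the authors to confirm.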

Reviewer 2 Report

The authors propose a factored multi-agent soft actor-critic (FMASAC) scheme under the maximum entropy framework, which enables a UAV swarm to learn cooperative MTT in an unknown environment. Overall, I think that this paper is well written and well organized as far as it goes. Yet addressing the following points can further improve its quality.

The related work section is very brief and lacks critical analysis. I cannot see how this work is positioned relative to what has already been done. A comparative table is highly recommended here.

A further subsection on UAV swarm communication would be a plus.

https://ieeexplore.ieee.org/abstract/document/9762762

https://link.springer.com/article/10.1007/s11276-022-03031-8

https://www.sciencedirect.com/science/article/abs/pii/S0140366422001967

Please improve the quality of the figures. Some colors are really unclear.

Please justify the use of the kinematic model. Is there a 3D alternative?

Section 4 has many small subsections. Either elaborate on them or group them.

Please further explain the algorithm and show its inputs/outputs.

What evaluation environment was used? More network-related parameters, such as the propagation model and the communication technology, should be discussed.

Otherwise, this is a very nice piece of work.

Reviewer 3 Report

The paper introduces a cooperative multi-agent reinforcement learning scheme that enables cooperative multi-target tracking by a UAV swarm in a distributed manner. The proposal is based on several optimisation methodologies, such as the decentralized partially observable Markov decision process (Dec-POMDP), the vanilla multi-agent soft actor-critic (MASAC) model, and the spatial entropy reward (SER). Finally, the proposal has been tested and compared with previous solutions, showing better performance.

The proposal seems interesting and mathematically sound; nevertheless, the main flaw is that you do not detail and justify why you are using these optimisation methodologies.

For example, while reading the paper, I had several questions: why do you use Dec-POMDP? Are there other alternatives? What is the reasoning behind using the spatial entropy reward (SER)? Why do you combine all these methodologies? In most of the paper, you directly introduce the mathematical foundations of the methods used without explaining the reason for using them. This makes the paper hard to understand, even for experts in the area.

Therefore, in the problem formulation, you should explain clearly why you are using these methods, the reasoning or decisions that lead you to use them, and how they are combined. It would be convenient to include some scheme or diagram with your proposed solution.

Other issues:

Create a reference table with all the symbols and notation used in the equations throughout the paper.

Section 3: you must introduce the section with a short paragraph outlining its contents.

Equation 5: please do not use a dot over the Greek symbol phi, as it could be interpreted as a derivative.

Section 4.2.2. and equation 6. I don't understand this section and the equation.

Section 4.3. Again, why do you use spatial entropy?

Section 6.1.  "We develop(-ed) a two-dimensional UAV swarm multi-target tracking simulation environment." Give more details about this simulation environment.

Table 1: the speed of the UAVs is 60 m/s (216 km/h), and that of the targets is 40 m/s (144 km/h). Please give a real-world example of this scenario. I think that the targets move too fast.

Sections 6.2-6.3: you only tested one scenario. For a UAV flying at 60 m/s, an area of 2 km x 2 km is very small. What would happen in a larger area? Do the targets move freely?

Finally, although the English is quite good, there are some grammatical errors.
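The speed and area figures questioned in the comments on Table 1 and Sections 6.2-6.3 can be checked with a short calculation. The sketch below uses only the values quoted in this report (60 m/s, 40 m/s, and a 2 km side length); the function and variable names are illustrative, and straight-line motion at constant speed is an assumption made here, not something stated in the paper.

```python
# Values quoted in the report above.
UAV_SPEED_MS = 60.0      # UAV speed (m/s)
TARGET_SPEED_MS = 40.0   # target speed (m/s)
AREA_SIDE_M = 2000.0     # side length of the 2 km x 2 km area (m)


def ms_to_kmh(speed_ms):
    """Convert metres per second to kilometres per hour (1 m/s = 3.6 km/h)."""
    return speed_ms * 3.6


def crossing_time_s(distance_m, speed_ms):
    """Time to cover a distance in a straight line at constant speed (assumed)."""
    return distance_m / speed_ms


uav_kmh = ms_to_kmh(UAV_SPEED_MS)
target_kmh = ms_to_kmh(TARGET_SPEED_MS)
uav_cross_s = crossing_time_s(AREA_SIDE_M, UAV_SPEED_MS)
target_cross_s = crossing_time_s(AREA_SIDE_M, TARGET_SPEED_MS)

print(f"UAV: {uav_kmh:.0f} km/h, crosses one side in {uav_cross_s:.1f} s")
print(f"Target: {target_kmh:.0f} km/h, crosses one side in {target_cross_s:.1f} s")
```

Under these assumptions a UAV crosses one side of the area in roughly half a minute, which is consistent with the reviewer's remark that the area is small relative to the speeds in Table 1.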
