Multiple Unmanned Aerial Vehicle (multi-UAV) Reconnaissance and Search with Limited Communication Range Using Semantic Episodic Memory in Reinforcement Learning
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
1. The paper discusses limited communication but does not provide detailed metrics or constraints on how communication is limited. Including specific parameters, such as bandwidth limitations or latency, would add depth.
2. While the paper states that CNN-SEMU outperforms state-of-the-art methods, it lacks a detailed comparison with baseline models. Providing more comparative analysis with existing algorithms would strengthen the claims.
3. The paper mentions the use of episodic memory but does not elaborate on the mechanisms for memory retention and recall. A more detailed explanation of how episodic memory is structured and accessed during different phases of the operation would be beneficial.
4. There's a need for a scalability analysis to understand how the proposed method performs as the number of UAVs increases. This would help in assessing the practical applicability of the approach in larger-scale scenarios.
5. There's a need to improve the Conclusion by adding more detail about the results obtained and giving the general context of the paper to the reader.
Comments on the Quality of English Language
Some minor grammatical issues need to be fixed:
1. Line 80: There is NO VERB in this sentence.
2. Line 25: ... each drone broadcasts its messag(es)
3. In general, it's much better to collect several citations into one bracket, as in Line 491: ... settings in [26] and [27], and ... >> settings in [26, 27] and ...
Author Response
Point-by-point response to Comments and Suggestions for Authors
Comment 1: The paper discusses limited communication but does not provide detailed metrics or constraints on how communication is limited. Including specific parameters, such as bandwidth limitations or latency, would add depth.
Response: Thank you for pointing out this problem in our manuscript. In our manuscript, the term "limited communication" actually refers to "limited communication range" and does not involve communication bandwidth or latency. The restriction on communication range is not reflected in the constraints but directly affects the UAV's observations (see Equations 19 and 20). Specifically, the uncertainty maps and target maps of the UAV differ under various communication ranges (see Figures 4(a) and 4(c)). We also studied the impact of different communication ranges on UAV reconnaissance performance in Section 6.2 of the experimental part. To avoid confusion, we have changed "limited communication" in the original manuscript to "limited communication range."
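To illustrate how a range limit alone can change what each UAV receives, the following is a minimal sketch, not the manuscript's Equations 19 and 20. It assumes Euclidean distance on grid coordinates; the function name `neighbors_in_range` and the example positions are hypothetical.

```python
import math

def neighbors_in_range(positions, comm_range):
    """Return, for each UAV index, the other UAV indices within comm_range.

    positions: list of (x, y) grid coordinates, one per UAV (hypothetical layout).
    comm_range: maximum Euclidean distance at which two UAVs can exchange maps
                (the distance metric is an assumption, not taken from the paper).
    """
    result = {}
    for i, (xi, yi) in enumerate(positions):
        result[i] = [
            j
            for j, (xj, yj) in enumerate(positions)
            if j != i and math.hypot(xi - xj, yi - yj) <= comm_range
        ]
    return result

positions = [(0, 0), (3, 4), (9, 9)]
links = neighbors_in_range(positions, comm_range=5.0)
print(links)
```

With a range of 5, only the first two UAVs can exchange maps, so the third would keep a map built from its own observations alone; a larger range would let all three fuse their information.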
Comment 2: While the paper states that CNN-SEMU outperforms state-of-the-art methods, it lacks a detailed comparison with baseline models. Providing more comparative analysis with existing algorithms would strengthen the claims.
Response: We thank the reviewer for pointing out this issue. In the original manuscript, we compared our algorithm CNN-SEMU with QMIX, QPLEX, EMC, SEMU, and variants of EMU in the baseline experiment analysis in Section 6.3. EMC and EMU are state-of-the-art episodic memory-based reinforcement learning algorithms, while SEMU is an ablation study of CNN-SEMU without the CNN structure. QMIX and QPLEX are classic reinforcement learning baseline algorithms. It is important to note that CNN-SEMU is based on the classic QMIX algorithm structure. For the MCRS-LCR problem, our algorithm demonstrated superior performance compared to both classic and state-of-the-art algorithms.
Comment 3: The paper mentions the use of episodic memory but does not elaborate on the mechanisms for memory retention and recall. A more detailed explanation of how episodic memory is structured and accessed during different phases of the operation would be beneficial.
Response: Thank you for your suggestion. In response to your suggestion, we have detailed the construction and recall of episodic memory in the revised manuscript.
- For the construction of episodic memory, Algorithm 1 describes the method in detail (see lines 410-419 on page 13). First, a trajectory is collected from the replay buffer (line 1). Then, a reverse chronological traversal is used to calculate the return (lines 2-4). Next, the encoder computes the embedding (line 5), and the nearest neighbor algorithm finds the closest embedding in the current episodic buffer (line 6). Last, lines 7-13 handle the update and addition of memories.
- For the recall of episodic memory, Figure 3(e) provides a visual representation, and lines 462-469 on page 14 offer a detailed explanation. Ultimately, the loss function in Equation 32 influences the algorithm's performance.
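The construction steps described above can be sketched roughly as follows. This is an illustrative approximation, not Algorithm 1 itself: the update rule (keep the larger return when a stored embedding lies within a distance threshold, otherwise add a new entry) is an assumption borrowed from common episodic-control practice, and the names `build_episodic_memory`, `encode`, `gamma`, and `threshold` are hypothetical.

```python
import math

def build_episodic_memory(trajectory, encode, memory, gamma=0.99, threshold=0.5):
    """Update `memory` (a list of [embedding, best_return]) from one trajectory.

    trajectory: list of (state, reward) pairs in chronological order.
    encode: function mapping a state to an embedding (tuple of floats).
    The threshold-based update below is an assumption, not the paper's rule.
    """
    ret = 0.0
    # Reverse chronological traversal so each return accumulates future rewards.
    for state, reward in reversed(trajectory):
        ret = reward + gamma * ret
        emb = encode(state)
        # Nearest-neighbor lookup in the current episodic buffer.
        nearest = min(memory, key=lambda entry: math.dist(entry[0], emb), default=None)
        if nearest is not None and math.dist(nearest[0], emb) <= threshold:
            nearest[1] = max(nearest[1], ret)  # update an existing memory
        else:
            memory.append([emb, ret])          # add a new memory
    return memory

identity = lambda s: s  # stand-in for the learned encoder
memory = []
build_episodic_memory(
    [((0.0, 0.0), 1.0), ((1.0, 0.0), 0.0), ((0.1, 0.0), 2.0)],
    identity, memory, gamma=0.5,
)
build_episodic_memory([((0.0, 0.0), 5.0)], identity, memory, gamma=0.5)
print(memory)
```

In this toy run the first trajectory creates two entries, and the second trajectory raises the stored return of the entry near the origin rather than adding a third.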
Comment 4: There's a need for a scalability analysis to understand how the proposed method performs as the number of UAVs increases. This would help in assessing the practical applicability of the approach in larger-scale scenarios.
Response: Thank you for bringing this issue to our attention regarding our manuscript. The original manuscript already includes an analysis of scalability. Table 2 presents the experimental scenarios with 4, 8, and 10 UAVs. In the baseline experiment analysis in Section 6.3, Figure 6 compares the algorithm's performance with 4 and 8 UAVs, showing that our algorithm performs better in smaller scenarios. Figure 7 compares the performance with 10 UAVs, indicating that our algorithm still achieves the best performance in more complex scenarios. Subsequent experiments use 10 UAVs as examples.
Comment 5: There's a need to improve the Conclusion by adding more detail about the results obtained and giving the general context of the paper to the reader.
Response: Thank you for your valuable feedback. We have revised the Conclusion to provide more detail about the results obtained and to give the reader a better understanding of the general context of the paper, as seen in lines 657-675 on page 24.
Response to Comments on the Quality of English Language
Point 1: Line 80: There is NO VERB in this sentence.
Response: Thank you for your comment. I believe you might be referring to the phrase "LC problem particularly challenging." In fact, MCRS-LC is a single term connected to the previous sentence, so the sentence does have a verb.
Point 2: Line 25: ... each drone broadcasts its messag(es).
Response: Thank you for pointing out this problem in our manuscript. Based on your suggestion, we have corrected the phrase "This means each UAV broadcasts its message" to "This means each UAV broadcasts its messages" in line 255 on page 8.
Point 3: In general, it's much better to collect several citations into one bracket, as in Line 491: ... settings in [26] and [27], and ... >> settings in [26, 27] and ...
Response: Thank you for your suggestion. Based on your suggestion, we have corrected the phrase "settings in [26] and [27], and ..." to "settings in [26, 27] and ..." in lines 497-498 on page 16.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
1. What is the scalability of the proposed method? Do scenarios of different scales need to be retrained?
2. The collision rate in Figure 7 is too high. Will the drone crash after collision? Will observation of the drone still be considered after collision?
3. Can the trajectory of the drone be visualized?
Author Response
Reviewer #2
Comment 1: What is the scalability of the proposed method? Do scenarios of different scales need to be retrained?
Response: Thank you for raising the question of scalability. We conducted experiments based on the three different scenarios designed in Table 2. These experiments tested the reconnaissance performance of varying numbers of UAVs on different map sizes to validate the scalability of the proposed method.
In reinforcement learning, scenarios of different scales require retraining. This is because different scales may have varying state spaces, leading to changes in policy and value functions. Retraining ensures that the algorithm can effectively learn and adapt to the new environmental features.
Comment 2: The collision rate in Figure 7 is too high. Will the drone crash after collision? Will observation of the drone still be considered after collision?
Response: We thank the reviewer for pointing out this issue. We also noticed the issue of high UAV collision rates. This problem arises because the weight of each reward in Equation 21 is uniform. In our experiments, we set the weights of the four types of rewards to 1 to eliminate the impact of parameter weights on algorithm performance. Additionally, our current study does not consider UAV crashes after collisions, so collisions do not affect their observations. In future work, we can increase the collision penalty weight to reduce the UAV collision rate.
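The effect of the reward weights discussed above can be sketched as a weighted sum. This is only an illustration under stated assumptions: Equation 21 is not reproduced here, and the component names (`coverage`, `target`, `collision`, `time`) and their values are hypothetical placeholders for the four reward types.

```python
def total_reward(components, weights=None):
    """Weighted sum of reward components; every weight defaults to 1.0.

    components: dict mapping a reward name to its value for one time step.
    weights: optional dict with the same keys (names here are hypothetical).
    """
    if weights is None:
        weights = {name: 1.0 for name in components}
    return sum(weights[name] * value for name, value in components.items())

# Hypothetical per-step reward components, including a collision penalty.
step = {"coverage": 2.0, "target": 1.0, "collision": -1.0, "time": -0.1}

uniform = total_reward(step)  # all four weights equal to 1
heavier = total_reward(step, {"coverage": 1.0, "target": 1.0, "collision": 5.0, "time": 1.0})
print(uniform, heavier)
```

With uniform weights the collision penalty is outweighed by the other terms in this toy step, whereas a larger collision weight makes the same step net-negative, which is the mechanism the authors propose to exploit in future work.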
Comment 3: Can the trajectory of the drone be visualized?
Response: We thank the reviewer for pointing out this issue. In our initial experiments, we visualized the UAV trajectories. Using an example of 4 UAVs in a 10x10 grid, as shown below, the UAV trajectories exhibit significant overlap. This is because, in reconnaissance tasks, UAVs need to cover the environment at least twice. They prioritize continued reconnaissance of cells with conflicting results to increase the belief in the presence of targets, rather than relying on a single pass. However, UAV trajectories alone do not reflect this double coverage. Therefore, in Figure 4, we used contour maps to visualize the UAVs' uncertainty maps, which more clearly show the changes in map uncertainty.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Title: Multi-UAV Reconnaissance and Search with Limited Communication Using Semantic Episodic Memory in Reinforcement Learning
Summary:
This paper presents an innovative approach to multi-UAV (Unmanned Aerial Vehicles) reconnaissance and search operations under limited communication conditions using a reinforcement learning framework. The primary contributions include the introduction of a communication and information fusion model based on belief probability maps, the integration of episodic memory into the reinforcement learning process, and the proposal of a CNN-Semantic Episodic Memory Utilization (CNN-SEMU) algorithm.
Key Contributions:
i. Communication and Information Fusion Model.
ii. Episodic Memory in Reinforcement Learning.
iii. CNN-Semantic Episodic Memory Utilization (CNN-SEMU) Algorithm.
Abstract:
The abstract provides a concise overview of the study, including the problem addressed, the approach taken, and the key findings. However, it could be improved by mentioning specific performance metrics or quantitative results from the experiments to substantiate the claim of outperformance.
In the statement: "This paper investigates the problem of multi-UAV collaborative reconnaissance and search for static targets with limited communication (MCRS-LC)." Are you considering only static targets?
Section 1. Introduction:
The introduction effectively outlines the significance of multi-UAV reconnaissance and search operations, particularly under limited communication constraints. The problem statement is clear, and the motivation for using reinforcement learning and episodic memory is well-articulated.
Section 2. Related Work:
The related work section provides a comprehensive review of current research on UAV communication, search strategies, and the application of episodic memory in reinforcement learning. The authors identify key gaps in existing studies, particularly the challenges in communication efficiency and the extraction of semantic features from high-dimensional state spaces.
Section 3. System Description and Problem Formulation:
The system model is well-defined, including the grid-based reconnaissance environment model, UAV model, and belief probability map. The formulation of the multi-UAV collaborative reconnaissance and search problem as a multi-objective optimization problem is rigorous and detailed.
In Table 1 caption, "Table 1. Main Notations in Section 2.", it is supposed to be Section 3.
"3.1. Reconnaissance Environment Model" sub-section, in this statement "It is assumed that several static targets are distributed within the task area, each occupying one cell, with at most one target per cell." Are you considering only static targets? Does the proposed work not apply to dynamic targets, especially in a complex and dynamic environment (with dynamic obstacles)?
In the "3.2. UAV Model" sub-section, are you considering only the x- and y-axes in the equations? Are you assuming the UAV is at constant altitude (suppressed z-axis)?
In Figure 2, what is the size of the area of each cell?
In the "3.2. UAV Model" sub-section, in the statement "To address this issue, we treat the sensor readings as evidence and use the Dempster-Shafer (DS) evidence theory, as established in [7], to handle these conflicts. By fusing the sensor readings, we can measure the uncertainty of each cell and represent the uncertainty of all cells as a belief probability map.", if the authors used sensor readings, it would be nice to have the sensor readings along with the fused sensor readings in a table, together with sensor details.
"3.4. UAV Communication and Information Fusion Model" sub-section, in this statement "It is important to note that our communication model assumes noiseless, instantaneous broadcast communication. This means each UAV broadcasts its message and immediately receives messages from all other UAVs within its communication range without errors." If you assume this many constraints, it will be very difficult to achieve these results in a real-world scenario.
Section 4 and Section 5. The methodology is well-detailed and logically structured. The use of belief maps for communication and information fusion is a sound approach. The reformulation of the problem within the reinforcement learning framework is clearly explained. The introduction of the CNN-SEMU algorithm is a significant contribution, and its components are described in sufficient detail. The use of an encoder-decoder structure with CNNs to extract semantic features is innovative and well-justified.
Section 6. Experiments:
The experimental setup is good, with extensive simulations demonstrating the efficacy of CNN-SEMU. The results are convincing, showing that CNN-SEMU outperforms state-of-the-art methods in search efficiency with simulation results. It would be more valuable to provide real experiments with multiple scenarios, accompanied by figures/videos, rather than to provide simulation results.
Discussion on Limitations: A discussion of the limitations of this work and of potential challenges in real-world applications is missing and should be added.
Conclusion:
The conclusion effectively summarizes the main contributions and findings. The authors highlight the improvements brought by the proposed methods and suggest potential future research directions.
Comments on the Quality of English Language
None
Author Response
Reviewer #3
Comment 1: The abstract provides a concise overview of the study, including the problem addressed, the approach taken, and the key findings. However, it could be improved by mentioning specific performance metrics or quantitative results from the experiments to substantiate the claim of outperformance.
Response: Thank you for your valuable feedback. We appreciate your suggestion to include specific performance metrics or quantitative results in the abstract. We have revised the abstract to incorporate these details, providing a clearer and more substantiated claim of outperformance.
Comment 2: In the statement: "This paper investigates the problem of multi-UAV collaborative reconnaissance and search for static targets with limited communication (MCRS-LC)." Are you considering only static targets?
Response: Yes, in this paper, we only consider the reconnaissance and search for static targets. In our future work, we will extend our research to include the reconnaissance and search for dynamic targets.
Comment 3: The introduction effectively outlines the significance of multi-UAV reconnaissance and search operations, particularly under limited communication constraints. The problem statement is clear, and the motivation for using reinforcement learning and episodic memory is well-articulated.
Response: Thank you for your positive feedback. We appreciate your acknowledgment of the clarity in our problem statement and the motivation for using reinforcement learning and episodic memory.
Comment 4: In Table 1 caption, "Table 1. Main Notations in Section 2.", it is supposed to be Section 3.
Response: Thank you for pointing out this error. We have corrected the caption to read "Table 1. Main Notations in Section 3." in the revised manuscript.
Comment 5: "3.1. Reconnaissance Environment Model" sub-section, in this statement "It is assumed that several static targets are distributed within the task area, each occupying one cell, with at most one target per cell." Are you considering only static targets? Does the proposed work not apply to dynamic targets, especially in a complex and dynamic environment (with dynamic obstacles)?
Response: Thank you for your insightful comments. Yes, in this work, we only consider static targets. Our current focus is on developing and validating our approach in a controlled environment with static targets. Other flying UAVs can be considered dynamic obstacles. However, we recognize the importance of addressing dynamic targets and environments with dynamic obstacles. We plan to extend our research to include these aspects in future work.
Comment 6: In the "3.2. UAV Model" sub-section, are you considering only the x- and y-axes in the equations? Are you assuming the UAV is at constant altitude (suppressed z-axis)?
Response: We thank the reviewer for pointing out this issue. In our current research, UAVs complete reconnaissance and search of the current cell at each time step. They only need to decide the direction for the next time step, so their orientation, flight altitude, and the z-axis are not considered. This is also mentioned in lines 191-195 on page 6 of the manuscript.
Comment 7: In Figure 2, what is the size of the area of each cell?
Response: Thank you for pointing out this problem in our manuscript. The size of each cell depends on the UAV's reconnaissance performance and the time step setting. For example, if a UAV can complete reconnaissance of an area with 100-meter sides in 10 minutes, then the cell size is set to 100 meters, and the time step is set to 10 minutes. We have also added supplementary information to this section in the revised manuscript, as seen in lines 192-195 on page 6.
Comment 8: In the "3.2. UAV Model" sub-section, in the statement "To address this issue, we treat the sensor readings as evidence and use the Dempster-Shafer (DS) evidence theory, as established in [7], to handle these conflicts. By fusing the sensor readings, we can measure the uncertainty of each cell and represent the uncertainty of all cells as a belief probability map.", if the authors used sensor readings, it would be nice to have the sensor readings along with the fused sensor readings in a table, together with sensor details.
Response: We thank the reviewer for pointing out this issue. In our research, UAV sensor readings only indicate whether a target is detected. To resolve conflicting sensor readings, we use the DS evidence rule to fuse these readings into the current belief probability map (as shown in Equation 5), thereby updating the map iteratively.
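For readers unfamiliar with this kind of fusion, a minimal sketch of Dempster's rule of combination for a single cell follows. It is the textbook rule on a two-hypothesis frame, not necessarily the exact form of Equation 5; the mass labels (`T` for target, `N` for no target, `U` for uncertain) and the sensor mass values are hypothetical.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions over {'T', 'N', 'U'} with Dempster's rule.

    'T' = target present, 'N' = no target, 'U' = uncertain (the whole frame).
    Each argument is a dict whose three masses sum to 1.
    """
    # Conflict: one source supports 'T' while the other supports 'N'.
    conflict = m1["T"] * m2["N"] + m1["N"] * m2["T"]
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    norm = 1.0 - conflict
    return {
        "T": (m1["T"] * m2["T"] + m1["T"] * m2["U"] + m1["U"] * m2["T"]) / norm,
        "N": (m1["N"] * m2["N"] + m1["N"] * m2["U"] + m1["U"] * m2["N"]) / norm,
        "U": (m1["U"] * m2["U"]) / norm,
    }

prior = {"T": 0.0, "N": 0.0, "U": 1.0}  # unexplored cell: fully uncertain
hit = {"T": 0.7, "N": 0.0, "U": 0.3}    # hypothetical "target detected" reading
miss = {"T": 0.0, "N": 0.6, "U": 0.4}   # hypothetical "no target" reading

belief = dempster_combine(dempster_combine(prior, hit), miss)
print(belief)
```

Even with conflicting readings, the uncertain mass shrinks after each fusion step, which is consistent with the idea that repeated passes over a cell reduce its uncertainty in the belief probability map.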
Comment 9: "3.4. UAV Communication and Information Fusion Model" sub-section, in this statement "It is important to note that our communication model assumes noiseless, instantaneous broadcast communication. This means each UAV broadcasts its message and immediately receives messages from all other UAVs within its communication range without errors." If you assume this many constraints, it will be very difficult to achieve these results in a real-world scenario.
Response: Thank you for your insightful feedback. We acknowledge that assuming noiseless, instantaneous broadcast communication is a significant constraint and may not reflect real-world scenarios accurately. In future work, we plan to relax these assumptions and explore more realistic communication models that account for noise, delays, and potential errors.
Comment 10: The experimental setup is good, with extensive simulations demonstrating the efficacy of CNN-SEMU. The results are convincing, showing that CNN-SEMU outperforms state-of-the-art methods in search efficiency with simulation results. It would be more valuable to provide real experiments with multiple scenarios, accompanied by figures/videos, rather than to provide simulation results.
Response: Thank you for your positive feedback on our experimental setup and simulation results. We agree that real-world experiments would provide more valuable insights. In future work, we plan to conduct real experiments across multiple scenarios and include figures and videos to complement our findings.
Comment 11: Discussion on Limitations: A discussion of the limitations of this work and of potential challenges in real-world applications is missing and should be added.
Response: Thank you for your feedback. We acknowledge the need to discuss the limitations of our work and the potential challenges in real-world applications. In the conclusion section of the manuscript, we have expanded on the limitations of the current research and the challenges in real-world applications, as seen in lines 657-675 on page 24.
Author Response File:
Author Response.pdf

