Article
Peer-Review Record

DRL-Based Dynamic Destroy Approaches for Agile-Satellite Mission Planning

Remote Sens. 2023, 15(18), 4503; https://doi.org/10.3390/rs15184503
by Wei Huang 1,2,3, Zongwang Li 2,3, Xiaohe He 1,2,3, Junyan Xiang 1,2,3, Xu Du 4 and Xuwen Liang 2,3,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 13 June 2023 / Revised: 27 August 2023 / Accepted: 4 September 2023 / Published: 13 September 2023
(This article belongs to the Special Issue Reinforcement Learning Algorithm in Remote Sensing)

Round 1

Reviewer 1 Report

This paper considers the problem of task planning for agile satellites, which is essentially a combinatorial optimization problem.  As the solution space is too large to guarantee finding the optimal solution, several strategies are applied to obtain a "good" solution rapidly.  These include graph attention networks to cluster tasks, "destroy" and "repair" operations to improve a solution, and reinforcement learning to train an agent to automate these tasks.

As a disclaimer my research focuses on reinforcement learning and Markov decision processes, but I am not an expert on agile satellites or combinatorial optimization.  I will attempt to comment on these but acknowledge I am at the edge of my knowledge on these topics.

Pros:  The introduction is thorough and the paper is well supported by the references.  With a few exceptions noted later, the quality of writing is good and the explanations comprehensible.  I do not have major concerns about the methodology or results.

Cons:  My chief issue with the paper is that some of the details in sections 2 and 3 have confused me, and I think a considerable rewrite is warranted before the paper is ready for publication.  I will list them specifically for the authors to clarify where appropriate.

Specific issues

The GAT acronym is defined in the abstract but not the body of the paper.  Similarly, D3RL is defined but not DRL.  I believe it would be good to explicitly define DRL as well.

Line 151: The "n" in "n-dimensional" should be italicized.

Line 153: I am very happy you included Table 1 to help the reader quickly reference variable symbols.  In fact, given how complex this project is, I encourage you to expand it to include the variables in section 3.2 and beyond.

Consider eliminating the use of two-character variables.  For example, st_i, the start time for task i, is easily mistaken for the product s times t_i.

Line 159:  It is redundant to say "for all" and also include the symbol ∀.  One or the other is sufficient.

Expression (1):  Is OW_u = [ws_u, we_u]? If so, I recommend writing it as the interval to avoid introducing another notation which is only used once.  If not, please clarify the meaning of OW_u.

Expression (2): On my first reading of this section, the definition of [ws_i, we_i] led me to believe that there is a single time interval during which task i could be completed.  But the reference to |VTW|_i implies that there are multiple intervals during which task i can be completed.  Can the authors clarify this?

Expression (2): Table 1 defines the notation x_i, but this expression uses the double-subscript x_ij.  My presumption is that the j indicates the task would be completed during the jth visible window, but I am not certain.  Can the authors clarify?

Section 3.1:  I have an overall question about the purpose of the graph described in this section.  Expressions (7) and (8) suggest that adjacency of tasks represents "similarity" of tasks in terms of visibility and viewing angle.  Is the point to pre-process some information to speed up the "destroy" and "repair" operations later?

Expression (7):  On the left-hand side of this expression is a set operation, intersection, and so the result is another set.  But this is connected by a ≥ sign which, in common use, is only for comparing numbers.  This does not make sense.  Is the purpose of this expression to identify tasks that have overlapping visibility time windows?  I think an alternate mathematical communication is necessary here.
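One way to make such a comparison well defined is to compare the length of the overlap, rather than the overlap set itself, against a number: for intervals [ws_i, we_i] and [ws_j, we_j], the overlap length is max(0, min(we_i, we_j) - max(ws_i, ws_j)). The following is a minimal sketch of that idea only. The tuple representation of a window, the function names, and the `min_overlap` threshold are assumptions made for illustration and are not taken from the manuscript.

```python
from typing import Tuple

# Hypothetical representation: a visibility window as (start, end) in seconds.
Window = Tuple[float, float]


def overlap_length(a: Window, b: Window) -> float:
    """Return the length of the intersection of two closed intervals.

    The result is 0.0 when the intervals are disjoint or only touch.
    """
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0.0, end - start)


def windows_adjacent(a: Window, b: Window, min_overlap: float) -> bool:
    """Compare a number (the overlap length) with a numeric threshold.

    min_overlap is an assumed parameter, not a value from the paper.
    """
    return overlap_length(a, b) >= min_overlap
```

With this formulation both sides of the inequality are numbers, so the ≥ comparison has its usual meaning.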

Line 231:  I'm assuming F represents the number of features used.  Can the authors define this explicitly?

Expression (9): Convention is that italics are used for variables and single-character functions (such as f).  For multi-character named functions like "LeakyReLU" italics are not necessary.  Of course, MDPI may have their own style guidelines on this.

Expression (10): k is an index from 1 to K, what values do these represent?  I assume the sigma represents an activation function but this should be stated explicitly.  Is LeakyReLU used for sigma as well?
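For reference, in the standard multi-head graph attention formulation, k indexes K independent attention heads, each with its own weight matrix and attention coefficients, and the head outputs are concatenated. Whether the manuscript's expression (10) follows this convention exactly is an assumption here; the symbols below are those of the standard formulation, not necessarily the authors'.

```latex
% Standard multi-head GAT update (concatenation over K heads).
% \alpha_{ij}^{k}: attention coefficient of head k for edge (i, j)
% W^{k}: weight matrix of head k; \mathcal{N}_i: neighbours of node i
% \sigma: a nonlinear activation function
h_i' = \Big\Vert_{k=1}^{K} \, \sigma\!\left( \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} \, W^{k} h_j \right)
```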

Line 254: A reference is made to RN for removing action nodes, but the MDP formulation and the definition of actions is not until the next section.  Can the exposition be re-ordered so that only defined concepts are referenced?

I did not quite understand the values RN and RC, but this may be due to my ignorance of LNS.  Can you suggest a reference where I can learn more?

Sections 3.3.1, 3.3.2, and 3.3.3 describe the state space, action space, and reward structure for the underlying MDP.  But there is an important fourth component to all MDPs, which is the system dynamics governing transitions between states (in general potentially stochastic, but in this case deterministic) with dependence on the actions used.  In this context, explain how an action used in a state will result in another state.  On this topic, the "destroy" and "repair" operations have only minimal explanation.  More information is needed as this is a vital part of the methodology.
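To illustrate the kind of description being requested, the following is a minimal sketch of a deterministic transition built from a destroy step followed by a repair step. Everything in it is hypothetical: the state as a set of scheduled task ids, the action as a number of tasks to remove, the lowest-priority removal rule, and the greedy reinsertion rule are assumptions for illustration and are not the authors' operators.

```python
from typing import Callable, FrozenSet, List

# Hypothetical state: the set of task ids currently in the schedule.
State = FrozenSet[int]


def destroy(state: State, n_remove: int,
            priority: Callable[[int], float]) -> State:
    """Remove the n_remove lowest-priority tasks (ties broken by task id)."""
    ordered = sorted(state, key=lambda t: (priority(t), t))
    return frozenset(ordered[n_remove:])


def repair(state: State, candidates: List[int],
           feasible: Callable[[State, int], bool]) -> State:
    """Greedily reinsert candidates, in the given order, when feasible."""
    current = set(state)
    for task in candidates:
        if task not in current and feasible(frozenset(current), task):
            current.add(task)
    return frozenset(current)


def step(state: State, n_remove: int, candidates: List[int],
         priority: Callable[[int], float],
         feasible: Callable[[State, int], bool]) -> State:
    """Deterministic transition: next state = repair(destroy(state, action))."""
    return repair(destroy(state, n_remove, priority), candidates, feasible)
```

Because neither step uses randomness, the same state and action always yield the same next state, which is the deterministic case mentioned above.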

Expression (14):  I did not see where "C" was defined.  

Line 121: Replace "researches" with "research".

Line 153: Consider using "Summary of variables and notation" or "Notation and variable names" as the title.

Line 181: Consider "In this paper, we only consider..."

Lines 188-190: The second phrase beginning "Including the establishment..." is a sentence fragment.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors propose a dynamic destroy deep reinforcement learning approach for agile satellite mission planning. This paper is well written and easy to follow; however, it needs further revisions before publication. See below for detailed comments.

[1] The abstract needs to be improved. A quantitative comparison of methods is needed.

[2] The use of deep learning for satellite data processing should be mentioned in the Introduction Section.

[3] The introduction should clarify what improvements you have made, or what main problem you have solved, relative to existing studies, instead of just listing other researchers' results one by one.

[4] The description in the proposed method is not clear.

[5] Why not use a GAT network for agile satellite mission planning?

[6] Figure 8 should be self-explanatory. Please improve it.

[7] What is the implication of the results (major contributions) for remote sensing science?

[8] The English still needs a thorough revision. I suggest that the authors have a native speaker correct the manuscript.


Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

I would like to first say that I am not at all experienced in the field of (agile) satellite mission planning. The only relevant expertise that I have for reviewing this paper is my extensive background within reinforcement learning. Therefore, please consider my various recommendations for this paper as not being highly confident. This is especially true regarding any comments regarding related work etc - hence I entered 'not applicable' for the questions "Does the introduction provide sufficient background and include all relevant references?" and "Are all the cited references relevant to the research?"

With the above out of the way, below are my comments and suggestions.

* I would recommend describing what 'destroy' refers to in the dynamic destroy deep RL (D3RL) method; would be especially useful for someone like me who comes from a different field of research.

* I suggest using longer figure captions that explain more of what the figure depicts. For example, the caption of Fig. 1 could be significantly extended to explain the key parts of what the figure is trying to convey.

* Line 90: "... with the heuristic algorithm ..." <-- I think 'the' should be replaced with 'a', since it's unclear what THE heuristic algorithm would refer to (there are likely more than a single heuristic algorithm?).

* I recommend re-introducing abbreviations that were only once introduced further back in the document. For example, on Line 92, it refers to a GAT-based D3RL model, but then the reader has to go far back (the abstract) to get what the abbreviation is.

* In the fields of research where I operate, one typically wants to demonstrate results also on some form of real datasets, not only in simulated setups. In this work, results are only shown for simulated setups, but perhaps in this field it would not be practically possible to run any 'real data experiments' (?). If that is the case, I suggest commenting somewhere in the results section why there are only results for simulated data - and otherwise, if there do exist real data setups, it would significantly strengthen the empirical evaluation if such results were included.

* A set of experiments which I think should be included in a revised version of this paper are generalizations from Area to World (and/or vice versa). Perhaps there are some reasons that I'm unaware of which make such a 'transfer' experiment (i.e. train model on World, then evaluate it on some Area setups) not possible, and if it isn't possible I would suggest the authors comment on that.

* It is very common - and recommended - when doing experimentation with RL to show seed sensitivities. In other words, for a given setup, train N agents where the only difference among the N training runs is the initial random seed for the policy network parameter initialization. Then show test results as the average of those N agents (ideally also with confidence intervals). In a revised version of this paper I would want the authors to include some results for seed sensitivity analysis and comment on this.

* Given that I'm not an expert in this field, it was hard to grasp all the various concepts of Table 1, e.g. what is 'visible' time vs what is 'execution' time etc. It would be great if the paper could give some more explanations of these concepts.

* I think it was not clear why st_i* was set to the average of ws_i and we_i (see Line 178). Consider clarifying that.

* Some additional ablation results would have been nice, e.g. why there were 3 GAT blocks (it's not clear why 3 was the best design choice). And/or results when omitting the GRU (memory) component.

* It was not clear to me why the training curves were staircase-like in Fig. 5. I would have guessed that they were more "smooth" over training (?). Consider commenting on this in the paper.

* In Tables 4 and 5, I recommend specifying what metric is being shown (the metric is scheduled rate). Also, in Table 4 I think the caption is a bit confusing, in particular the 'on train set' part - it is not only evaluated 'on train set' but there is also a line 'test' in the bottom.

* Line 381: "... represents the average training results obtained from 10 iterations ..." <-- great to show average over multiple runs. But it would be even better if the variance was commented on as well.

* For the _test version of the results (e.g. in Fig. 6 and Table 4), it was a bit unclear to me what this _test version really was. In particular, does it correspond to a model trained for 5000 iters? Furthermore, it was unclear why the training process is considered to be completely convergent when iteration times exceed 5000. Consider commenting on that.

* Line 401: Why was the number of iterations set to 1500, and not 5000 as before (5000 was shown to be best (?))?
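The seed-sensitivity analysis and variance reporting suggested in the comments above can be sketched as follows. This is a minimal illustration, assuming one final test score (for example, scheduled rate) per independently seeded training run. The function name, the returned keys, and the normal-approximation interval are choices made for this sketch and are not part of the manuscript.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize_runs(scores: Sequence[float], z: float = 1.96) -> Dict[str, float]:
    """Summarize per-seed test scores with mean, sample std, and an interval.

    Uses a normal approximation (z = 1.96 for roughly 95%). For a small
    number of seeds, a Student-t critical value would be more appropriate.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two runs to estimate variability")
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    half_width = z * std / math.sqrt(n)
    return {
        "mean": mean,
        "std": std,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }
```

Calling `summarize_runs` on the N per-seed scores for each setup gives the average together with a measure of spread, which is the information requested for the averaged results.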

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Thanks to the authors for their work in improving the manuscript.  My questions from the first report have been thoroughly answered.

I have just one small issue that I would like addressed before publication.

Table 1 and the first paragraph of Section 2.2 define the notation x_i, but expression (2) uses the double-subscript x_ij. Can the definition be updated to explain the role of the second subscript?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
