Efficiently Detecting Non-Stationary Opponents: A Bayesian Policy Reuse Approach under Partial Observability
Round 1
Reviewer 1 Report
This paper investigates the general method for searching neural network architectures without the opponent’s local information. The Bayesian policy reuse with LocAl oBservations (Bayes-Lab) algorithm, based on the centralized training with decentralized execution (CTDE) framework, is proposed. The validation environment, experimental settings, and corresponding experimental results are presented to demonstrate the effectiveness of the proposed algorithm. The current version still needs improvement. Here are the comments:
1. The practical implications of the proposed method are not described clearly. Could you describe in detail which aspects of the research are innovative?
2. How is the search space defined or determined in the paper? What are the advantages of the method compared with the traditional method?
3. There is no performance evaluation function in the algorithm. According to the author, the opponent’s actions during online interactions need to be reconstructed. What are the criteria for selecting the optimal action?
4. Is the proposed method only suitable for an adversarial environment?
5. Systematic theory and detailed study are lacking in this paper.
6. Is the opponent's transition from one strategy to another static or dynamic?
7. How does the proposed algorithm learn from the new model when the agent detects poor execution of the current policy? How does it effectively detect the opponent's strategy and respond quickly to behavioral changes?
8. The English is too colloquial overall. Please carefully check and correct the wording, grammar, and punctuation errors.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
This paper deals with an exciting topic. The article has been read carefully, and some minor issues have been highlighted in order to be considered by the author(s).
#1 What is the motivation of this paper?
#2 What is the contribution and novelty of this paper?
#3 What is the advantage of this survey paper?
#4 Which evaluation metrics did you use for comparison?
#5 It would be good if security domains for deep neural networks were reflected in the related work, for example: "BlindNet backdoor: Attack on deep neural network using blind watermark", "Textual Adversarial Training of Machine Learning Model for Resistance to Adversarial Examples", and "Ensemble transfer attack targeting text classification systems".
#6 In Figure 3, the caption should be more descriptive.
#7 In Figure 7, the z-axis should be extended because some data fall outside its range.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
1. The authors need to report the quantitative performance in the abstract.
2. Figure 1 is too small.
3. In Figure 1, is there any feedback from Online to Offline?
4. How is the VAE used together with A2C here? Is A2C another decoder?
5. It is not clear if Bayes-Lab in the conclusion refers to Bayes-Lab or Bayes-Lab(A).
6. Please indicate whether any end application can benefit from the proposed network.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
I recommend acceptance.