Article
Peer-Review Record

Radar-Jamming Decision-Making Based on Improved Q-Learning and FPGA Hardware Implementation

Remote Sens. 2024, 16(7), 1190; https://doi.org/10.3390/rs16071190
by Shujian Zheng, Chudi Zhang, Jun Hu * and Shiyou Xu
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 8 February 2024 / Revised: 14 March 2024 / Accepted: 27 March 2024 / Published: 28 March 2024
(This article belongs to the Special Issue Advances in Remote Sensing, Radar Techniques, and Their Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Please find the attachment for a detailed review

Comments for author File: Comments.pdf

Comments on the Quality of English Language

Quality of English: Minor editing in the results and discussion section is required. 

Author Response

Response to Reviewer 1 Comments

For the convenience of reading, we have also attached our response as a separate file; please check the attachment.

The manuscript titled "Radar Jamming Decision-making Based on Improved Q-Learning and FPGA Hardware Implementation," with the ID remotesensing-2888084, is well-written, and the authors have provided detailed explanations for every point. Personally, I appreciate papers with real-time implementation. However, learning and implementing Radar Jamming using FPGA can be challenging and crucial. In this paper, the authors present a radar jamming decision-making algorithm based on an enhanced Q-Learning approach, which addresses the issue of Q-value overestimation by integrating a second Q-table. The algorithm's implementation is carried out on a Field Programmable Gate Array (FPGA), with each step decomposed and described using hardware description language. Nevertheless, there are several aspects that need further exploration before making a final decision.

 

Response: Thank you for your constructive suggestions. We really appreciate your recognition of the work we have done. We think your comments will help improve the readability of the manuscript. We have made specific changes based on the comments. Responses to all comments are provided below. We hope that the revisions to the manuscript and the accompanying responses will meet the standards of the editors and reviewers.

 

Point 1. The paper lacks a data model section that explains the scenario.

Response 1: Thank you for your constructive suggestions. The scenario model in this paper is an idealized one: we set it up so that only one radar jammer interacts with the enemy radar. As the radar jammer continuously jams the enemy radar and perceives the resulting changes in the radar's state, our main concern is whether the radar jamming decision-making algorithm based on reinforcement learning can find the best jamming style for each enemy radar state after a period of training. We have added a description of the radar jamming process in Section 5.1 and indicated it in red font. Thank you again for pointing this out.

 

Point 2. A block diagram, particularly in the data model section, illustrating the actual target and intruder target that require jamming, should be included.

Response 2: Thank you for your constructive suggestions. We have included a figure of the workflow of the jamming process in Section 5, and we have also added a text description for it. Thank you again for pointing this out.

 

Point 3. Necessary mathematical equations detailing the jamming process should also be provided.

Response 3: Thank you for your constructive suggestions. We have completed the analysis and simulation of these 8 types of jamming methods on the hardware platform. However, the focus of this paper is how to use reinforcement learning to make decisions on jamming methods and how to deploy the algorithm on the hardware platform. To avoid distracting from this focus, we only give a brief introduction to the names of the jamming methods, which has now been added to the manuscript. Thank you again for pointing this out.

 

Point 4. Basic information regarding Radar Cross Section (RCS), bandwidth, carrier frequency, PRI, pulse width (PW), among others, should be included.

Response 4: Thank you for your constructive suggestions. The research in this paper mainly focuses on using the radar jammer to make decisions about various radar jamming methods in the radar countermeasure environment, so that the enemy radar cannot work properly. We assume that the parameters associated with these jamming signals are determined after the jammer has conducted reconnaissance of the enemy radar signals. In other words, the research in this manuscript is mainly about how to make decisions about jamming methods. The decision on radar jamming signal parameters that you mention will be part of our future research plan; thank you for pointing it out.

 

Point 5. Subsections in the data model section that explain problem formulation should be added.

Response 5: Thank you for your constructive suggestions. We went over the manuscript again and made further revisions to the parts that were not clearly explained before. We have marked the revised text in red font. Thank you again for pointing this out.

 

Point 6. Additional mathematical details should be included in sections 3.1 and 3.2.

Response 6: Thank you for your constructive suggestions. We have added mathematical details of reinforcement learning in Sections 3.1 and 3.2. At the same time, we also added descriptions of the relevant mathematical equations to make them easier to understand. Thank you again for pointing them out.

 

Point 7. Another figure illustrating the process of learning and jamming decision should be added.

Response 7: Thank you for your constructive suggestions. We have added Figure 7 to Section 5.1 and supplemented it with text to illustrate the process of jamming and learning. Thank you again for pointing them out.

 

Point 8. A graphical comparison of convergence times among different methods should be provided.

Response 8: Thank you for your constructive suggestions. We recognized the importance of the comparisons here and completed them. However, due to the roughly thousand-fold speed gap between the different platforms, the comparison is difficult to present graphically. Please therefore allow us to show the convergence speeds under different conditions in table form; we present them in Table 4 and Table 5. Thank you again for pointing them out.

 

Point 9. Examples in the results and discussions sections demonstrating scenarios where targets and jammers are close to each other, as well as where jammers are distant from targets, should be included.

Response 9: Thank you for your constructive suggestions. In the research covered by this manuscript, we focus on the feasibility of an improved Q-learning algorithm for radar jamming decision-making, and we explore the possibility of its practical application by deploying it on an FPGA development board. Therefore, the spatial geometry between the jammer and the enemy radar is not considered. In further study, we will consider the spatial interaction between the jammer and the enemy radar in actual scenarios, and we will try to make further decisions about the working attitude of the jammer.

 

Point 10. A state space model and time-frequency images of the jamming signal should also be added

Response 10: Thank you for your constructive suggestions. The jamming methods mentioned in the manuscript have been simulated and analyzed. However, this manuscript focuses on the reinforcement-learning-based jamming decision algorithm and its deployment on the hardware platform. We would like to keep the content of the article focused on this point and sincerely hope for your understanding that we do not show the jamming signals.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript " Radar Jamming Decision-making Based on Improved Q-Learning and FPGA Hardware Implementation " studies an improved Q-Learning algorithm and its implementation on FPGA. The manuscript presents a topic of interest to the community. Also, the main contributions of this work are clear. However, it has the following shortcomings.

(1) The use of two Q-Tables to address the overestimation issue is already exemplified in the DQN (Deep Q-Network) algorithm, which makes the innovation aspect of the paper appear insufficient.

(2)In the Experiments section, the state space, the action space, and the reward function were not clearly explained.

 

(3) The impact of the hyperparameters α, r, and γ shown in Eq. (1) on performance is not analyzed in the Experiments section.

Author Response

Response to Reviewer 2 Comments

For the convenience of reading, we have also attached our response as a separate file; please check the attachment.

The manuscript " Radar Jamming Decision-making Based on Improved Q-Learning and FPGA Hardware Implementation " studies an improved Q-Learning algorithm and its implementation on FPGA. The manuscript presents a topic of interest to the community. Also, the main contributions of this work are clear. However, it has the following shortcomings.

Response: Thank you for your constructive suggestions. We think your comments will help improve the readability of the manuscript. We have made specific changes based on the comments. Responses to all comments are provided below. We hope that the revisions to the manuscript and the accompanying responses will meet the standards of the editors and reviewers.

 

1. The use of two Q-Tables to address the overestimation issue is already exemplified in the DQN (Deep Q-Network) algorithm, which makes the innovation aspect of the paper appear insufficient.

 

Response 1: Thank you for your constructive suggestions. The proposed algorithm is indeed somewhat similar to the DQN algorithm. However, the radar countermeasure environment studied in this manuscript has a limited set of states and actions, in which Q-Learning can perform well. In the traditional Q-Learning algorithm there is only one Q-table, and each update uses only the Q value corresponding to the action with the maximum Q value obtainable in the current state. This is a clever way of updating, and it reflects an aggressive idea that makes the model easy to converge. But it also means that overestimation can occur, and such updates are sometimes too optimistic or too pessimistic. After we use two Q-tables for updating, the two tables are correlated, but QB lags QA by a certain delay, which limits, to a certain extent, the overly aggressive updating of traditional Q-Learning. When we adjusted the parameters, we found that although this advantage is not huge, it is stable, which is our innovation on the Q-Learning algorithm. On this basis, we completed the module design and deployment of the traditional and improved Q-Learning algorithms on the FPGA hardware platform to promote the realization of a more intelligent integrated radar jammer.
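For readers of this record, a minimal sketch of the two-table scheme described above is given below. It is not the paper's code: the variable names, hyperparameter values, and synchronization period are illustrative assumptions only.

```python
import numpy as np

# Illustrative sketch: QA is updated with targets drawn from the lagged table QB,
# and QA is periodically copied into QB. All hyperparameters here are placeholders.
N_STATES, N_ACTIONS = 10, 8            # 10 radar threat states, 8 jamming methods
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration
SYNC_PERIOD = 50                       # copy QA into QB every SYNC_PERIOD updates

QA = np.zeros((N_STATES, N_ACTIONS))   # table used for action selection and updates
QB = np.zeros((N_STATES, N_ACTIONS))   # lagged copy used to evaluate the next state

def select_action(state: int, rng: np.random.Generator) -> int:
    """Epsilon-greedy action selection on QA."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(QA[state]))

def update(state: int, action: int, reward: float, next_state: int, step: int) -> None:
    """Move QA toward a target built from the lagged table QB; the delay in QB
    tempers the over-optimistic max operator of single-table Q-learning."""
    target = reward + GAMMA * np.max(QB[next_state])
    QA[state, action] += ALPHA * (target - QA[state, action])
    if (step + 1) % SYNC_PERIOD == 0:
        QB[:] = QA                      # periodic synchronization
```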

 

2. In the Experiments section, the state space, the action space, and the reward function were not clearly explained.

Response 2: Thank you for your constructive suggestions. In the experiment, the state space includes attack, guidance, imaging, target recognition, ranging, angle measurement, surveillance, tracking, acquisition, and scanning. Different working states of the radar represent different threat levels to the jammer. In order to quantitatively describe the change in threat level, we set the number of radar threat states to 10.

The action set of radar jamming methods includes 8 kinds of jamming methods. The deceptive jamming methods are the smart noise jamming signal, full pulse forwarding jamming signal, intermittent sampling jamming signal, and slice forwarding jamming signal. The suppressive jamming methods are the comb spectrum jamming signal, noise amplitude modulation jamming signal, noise phase modulation jamming signal, and noise frequency modulation jamming signal.

In the reward design of the experiments in the manuscript, we consider it a success when the radar is jammed into the scanning state, giving a reward value of 100. As the resulting radar state moves away from scanning, the reward decreases from -2 down to -18; for each state further away, the reward is reduced by 2.
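Purely as an illustration of these definitions (the state ordering and the "distance from scanning" measure below are assumptions made for this sketch; only the counts of 10 states and 8 actions and the reward values follow the description above):

```python
# Illustrative state/action/reward definitions; the ordering is assumed, not the paper's.
STATES = ["attack", "guidance", "imaging", "target recognition", "ranging",
          "angle measurement", "surveillance", "tracking", "acquisition", "scanning"]

ACTIONS = ["smart noise", "full pulse forwarding", "intermittent sampling",
           "slice forwarding",                                   # deceptive jamming
           "comb spectrum", "noise AM", "noise PM", "noise FM"]  # suppressive jamming

SCANNING = STATES.index("scanning")

def reward(next_state: int) -> float:
    """+100 when the radar is jammed into the scanning state; otherwise -2 per
    state of distance from scanning, i.e. from -2 down to -18."""
    if next_state == SCANNING:
        return 100.0
    return -2.0 * abs(SCANNING - next_state)
```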

We have added these details to the manuscript and marked them in red. Thank you for pointing it out.

 

3. The impact of the hyperparameters α, r, and γ shown in Eq. (1) on performance is not analyzed in the Experiments section.

Response 3: Thank you for your constructive suggestions. Regarding the impact of the hyperparameters on model training, we have added some experimental content and supplemented the details in Section 5.3. We have also highlighted them in red. Thank you for pointing it out.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

In the paper, the authors proposed an enhanced Q-learning algorithm and showcased its implementation on FPGA. While the paper appears technically sound, there are several remarks concerning the presentation of material and the overall quality of the paper.

Firstly, I suggest excluding references from the abstract and moving them to the Introduction section.

 When introducing Fig. 1, it is essential to explain the workflow, particularly detailing the meanings of R, A, and other abbreviations. The same observation applies to Fig. 2, which lacks a complete explanation.

The pseudo-code for the Q-learning algorithm seems incomplete, with the absence of "Update state s" needing clarification. Additionally, Equation (1) is provided without an accompanying explanation of the variables, leaving readers uncertain about its derivation.

The term "shortcut transceiver" (line 216) requires clarification, as does the meaning of JESD (line 221).

Abbreviations on Fig. 4 should be explained, and generally, all unexplained abbreviations throughout the paper should be thoroughly checked.

On line 317, an explanation is needed for why the number of actions was set at 8.

 

 

My primary concern lies in the perceived lack of significant improvement introduced by the new algorithm. The marginal advantage over the standard algorithm raises questions about the merit of publishing these results. Could you please provide justification for the publication of results with such a modest advantage over the standard method?

Comments on the Quality of English Language

English is generally good, please correct Line 142 (unfinished sentence).

Author Response

Response to Reviewer 3 Comments

For the convenience of reading, we have also attached our response as a separate file; please check the attachment.

In the paper, the authors proposed an enhanced Q-learning algorithm and showcased its implementation on FPGA. While the paper appears technically sound, there are several remarks concerning the presentation of material and the overall quality of the paper.

Response: Thank you for your constructive suggestions. We think your comments will help improve the readability of the manuscript. We have made specific changes based on the comments. Responses to all comments are provided below. We hope that the revisions to the manuscript and the accompanying responses will meet the standards of the editors and reviewers.

 

Firstly, I suggest excluding references from the abstract and moving them to the Introduction section.

Response 1: Thank you for your constructive suggestions. We have removed the references you mentioned from the abstract, moved them to the Introduction section, and marked the changes. Thank you again for pointing them out.

 

When introducing Fig. 1, it is essential to explain the workflow, particularly detailing the meanings of R, A, and other abbreviations. The same observation applies to Fig. 2, which lacks a complete explanation.

Response 2: Thank you for your constructive suggestions. For Figure 1, in one interaction of reinforcement learning, the agent decides the action At to take according to the current state St of the environment and executes it. After the agent executes an action, the environment responds: it jumps to a new state St+1 and gives the agent the reward Rt+1. After many episodes, the agent can learn about the environment through reinforcement learning and finally find the optimal solution.

For Figure 2, it shows that the next state of the environment St+1 is related only to the current state St and has no connection to earlier states. Under the combined effect of state and action, the environment jumps to St+1 and gives the corresponding reward Rt+1. We have added these explanations to the manuscript and marked them in red. Thank you again for pointing them out.

 

The pseudo-code for the Q-learning algorithm seems incomplete, with the absence of "Update state s" needing clarification. Additionally, Equation (1) is provided without an accompanying explanation of the variables, leaving readers uncertain about its derivation.

Response 3: Thank you for your constructive suggestions. This was our mistake: an update formula was missing from Algorithm 1. The algorithm has now been completed.

In Equation (1), α is the step size, also called the learning rate of the value estimate; γ is the discount factor for future estimates in the temporal-difference process; and r is the reward for the state transition. The equation also means that in this update process we use the value of QB to update QA. At the same time, after every certain number of updates, the QA table is synchronized to the QB table. Such an operation alleviates the problem of overestimation of action values.
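For the reader of this record, a plausible form of such an update, written with the variables named here, would be the following; this is a reconstruction under these assumptions, not a quotation of Equation (1) from the manuscript:

```latex
Q_A(s_t, a_t) \leftarrow Q_A(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q_B(s_{t+1}, a') - Q_A(s_t, a_t) \right]
```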

We have added them into the manuscript and marked them in red. Thank you again for pointing them out.

 

The term "shortcut transceiver" (line 216) requires clarification, as does the meaning of JESD (line 221).

Response 4: Thank you for your constructive suggestions. It was our mistake not to go into more detail here. The "RF shortcut transceiver", more accurately called an RF agile transceiver, refers to the RF transceiver module we expect to use. In the actual radar jammer, the ADRV9009 RF agile transceiver will be responsible for interacting with the outside environment, passing the captured pulse signals to the FPGA board, and transmitting the jamming signals generated by the FPGA board. JESD means the JESD204B protocol, which is the medium for data transmission between the FPGA board and the ADRV9009 RF agile transceiver. We have added these explanations to the manuscript and marked them in red. Thank you again for pointing them out.

 

Abbreviations on Fig. 4 should be explained, and generally, all unexplained abbreviations throughout the paper should be thoroughly checked.

Response 5: Thank you for your constructive suggestions. In Figure 4, the environment refers to the environment in which the radar interacts with the radar jammer. The FPGA board is divided into the processing system (PS) and the programmable logic (PL), which serve as the control core and the logic function implementation, respectively. We use a personal computer (PC) to configure the PS side; the parameters are then distributed to the other parts of the entire hardware system. JESD refers to the JESD204B protocol, a transport protocol that allows high-speed communication between the PL of the FPGA board and the ADRV9009 RF agile transceiver. We have also rechecked the other parts of the paper. Thank you again for pointing them out.

 

On line 317, an explanation is needed for why the number of actions was set at 8.

Response 6: Thank you for your constructive suggestions. In our preparatory work, we carried out analysis and simulation of various radar jamming patterns. In the radar jamming decision experiment, we selected the four most representative types of deceptive jamming and the four most representative types of suppressive jamming: for deceptive jamming, the smart noise jamming signal, full pulse forwarding jamming signal, intermittent sampling jamming signal, and slice forwarding jamming signal; for suppressive jamming, the comb spectrum jamming signal, noise amplitude modulation jamming signal, noise phase modulation jamming signal, and noise frequency modulation jamming signal. Therefore, when making radar jamming method decisions, the number of actions is set to 8. We have added this explanation to the manuscript and marked it in red. Thank you again for pointing them out.

 

My primary concern lies in the perceived lack of significant improvement introduced by the new algorithm. The marginal advantage over the standard algorithm raises questions about the merit of publishing these results. Could you please provide justification for the publication of results with such a modest advantage over the standard method?

Response 7: Thank you for your constructive suggestions. In the traditional Q-Learning algorithm there is only one Q-table, and each update uses only the Q value corresponding to the action with the maximum Q value obtainable in the current state. This is a clever way of updating, and it reflects an aggressive idea that makes the model easy to converge. But it also means that overestimation can occur, and such updates are sometimes too optimistic or too pessimistic. After we use two Q-tables for updating, the two tables are correlated, but QB lags QA by a certain delay, which limits, to a certain extent, the overly aggressive updating of traditional Q-Learning. When we adjusted the parameters, we found that although this advantage is not huge, it is stable, which is our innovation on the Q-Learning algorithm. On this basis, we completed the module design and deployment of the traditional and improved Q-Learning algorithms on the FPGA hardware platform to promote the realization of a more intelligent integrated radar jammer. We hope this answers your questions.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

In the revised version of the paper, the authors have addressed my remarks and made necessary corrections. This version has seen considerable improvement and is now suitable for publication. I also acknowledge the explanation of the advantages of the proposed algorithm; however, the improvement compared to the standard method is not significant.

Comments on the Quality of English Language

English language is generally good.
