Article

Frequency Diversity Array Radar and Jammer Intelligent Frequency Domain Power Countermeasures Based on Multi-Agent Reinforcement Learning

1 Air and Missile Defense College, Air Force Engineering University, Xi’an 710051, China
2 Test Center, National University of Defense Technology, Xi’an 710106, China
3 College of Information and Communication, National University of Defense Technology, Wuhan 430035, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(12), 2127; https://doi.org/10.3390/rs16122127
Submission received: 7 May 2024 / Revised: 9 June 2024 / Accepted: 9 June 2024 / Published: 12 June 2024
(This article belongs to the Topic Radar Signal and Data Processing with Applications)

Abstract

With the development of electronic warfare technology, intelligent jammers dramatically reduce the performance of traditional radar anti-jamming methods. A key issue is how to make the radar actively adapt to complex electromagnetic environments and design anti-jamming strategies against intelligent jammers. The electromagnetic environment changes dynamically, and the transmitting power of the jammer and of the frequency diversity array (FDA) radar in each frequency band is continuously adjustable; both sides can learn an optimal strategy by interacting with the environment. Considering that the competition between the FDA radar and the jammer is a confrontation between two agents, we find the optimal power allocation strategy for both sides using the multi-agent deep deterministic policy gradient (MADDPG) algorithm, a multi-agent reinforcement learning (MARL) method. Finally, the simulation results show that the power allocation strategies of the FDA radar and the jammer converge and effectively improve the performance of both sides in the intelligent countermeasure environment.


1. Introduction

In recent years, with the development of radar jamming and anti-jamming technology, the electronic countermeasures between radars and jammers have become increasingly fierce [1,2]. Because the jammer and the target both lie in the radar antenna’s main lobe and therefore share its gain, main-lobe jamming is the most difficult kind of jamming to counter [3,4].
In 2006, Antonik et al. proposed the FDA radar [5], in which a small frequency increment between adjacent transmitting elements produces a transmit pattern with range–angle–time coupling characteristics [6,7,8]. These unique characteristics benefit FDA radar applications such as beamforming [9], target parameter estimation [10,11], detection [12], tracking [13], imaging [14], clutter suppression [15], and low probability of intercept [16,17,18,19]. In addition, the FDA radar can effectively suppress forwarded spoofing jamming located in the main lobe [20,21,22]. Because each array element of the FDA radar has its own frequency increment, the radar also has excellent potential to suppress spectrum-blocking jamming in the main lobe, and it can avoid fixed spectrum-blocking jamming through transmitting power allocation [23,24]. Compared with the traditional suppression of spectrum-blocking jamming by spectrum notching [25,26], the FDA radar requires neither complex optimization methods to design radar waveforms nor arbitrary waveform generators to produce them [27,28]. The above methods focus on passively suppressing jamming signals by signal processing after the radar receives them. In contrast, with active countermeasures the radar can take measures in advance to avoid the jamming itself and thus enjoys higher flexibility and freedom [3,29].
For the spectrum-blocking jamming imposed on radar by communication systems, ref. [30] modeled the jamming suppression process as a Markov decision process and sought the optimal transmitting strategy through policy iteration. For intermittent spectrum jamming, ref. [31] applied deep reinforcement learning (DRL) to spectrum jamming suppression, realizing real-time learning and online updating of the radar transmitting strategy. For fixed spectrum-blocking jamming, ref. [32] used DRL to avoid the jamming through the transmitting power allocation of an FDA-MIMO radar. Most studies assume that the jammer is a non-agent that can only adopt a fixed jamming strategy. However, by learning the radar’s strategies, jammers can adjust their own jamming strategies [33,34]; studying the intelligent countermeasures between the radar and the jammer is therefore significant.
Some scholars model the relationship between the radar and the jammer within a static game framework [35,36], but such models cannot represent the sequential decision problem between the two; their confrontation should be a multi-round interactive sequential decision. In [37,38], the authors model the multi-round interaction between a frequency-agile radar and a jammer as an extensive-form game (EFG) and use the neural fictitious self-play (NFSP) method to seek an approximate Nash equilibrium strategy, but the limited interaction rounds and discrete strategies still restrict the model. Establishing a more realistic radar and jamming counteraction model is an essential premise for realizing radar-active counteraction.
This paper first introduces a parametric measurement model in which the target response changes with the transmitting frequency of the FDA radar [39]. Under the constraint of constant total energy, the FDA radar can continuously adjust the energy allocated to the frequency of each array element to avoid spectrum jamming and improve the radar signal-to-jamming-plus-noise ratio (SJNR). The jammer transmits a broadband jamming signal whose center frequency is aligned with the radar center frequency; with the total jamming energy fixed, the jamming energy at each frequency is continuously adjustable. MARL simulates the multi-round sequential decision process between the radar and the jammer. Under this model, we study the frequency–power confrontation strategy between the FDA radar and the jammer under dynamic allocation of jamming power. The main contributions of this paper are as follows:
(1) The FDA radar signal model with a target response that varies with frequency is established under continuously adjustable jamming and radar power. Targets respond differently at different frequencies, and FDA radars and jammers can exploit the target response characteristics to allocate power across frequency bands.
(2) The adversarial relationship between the FDA radar and the jammer is mapped to a MARL model, in which power allocation in the frequency domain realizes the confrontation between the two. The reward function is designed based on SJNR: the radar optimizes its strategy to maximize SJNR, while the jammer adjusts its strategy to minimize SJNR.
(3) In the DRL framework, with the power allocation strategy of the FDA radar or the jammer fixed in turn, the performance of frequency domain power countermeasures using the deep deterministic policy gradient (DDPG) algorithm is analyzed. When one side adopts a fixed transmission strategy, the other side can significantly improve its performance by optimizing its transmission strategy within the DRL framework.
(4) The intelligent frequency domain power countermeasure between the FDA radar and the jammer is analyzed using the MADDPG algorithm with centralized training and decentralized execution (CTDE) based on MARL.
The structure of this paper is as follows. Section 2 gives the FDA radar signal model and the jammer power allocation model and establishes the relationship between the FDA radar power allocation and SJNR. Section 3 defines the mapping of the FDA radar and the jammer onto the DRL framework, establishes a single-agent environment interaction model, describes the DDPG algorithm, and gives the transmitting strategy optimization method. Section 4 defines the mapping onto the MARL framework in detail, establishes the multi-agent environment interaction model, describes the MADDPG algorithm, and analyzes its performance. Section 5 presents the simulation results that verify the theoretical analysis, and Section 6 concludes the paper.

2. Signal Model

We consider the transmitting and receiving arrays of the FDA radar to be uniform linear arrays (ULAs), with $M_t$ transmitting elements, $M_r$ receiving elements, and element spacing $d$. The basic framework of the FDA radar is shown in Figure 1. Taking the first transmitting element as the reference element, the carrier frequency of the $m$-th transmitting element is:
$$f_m = f_0 + (m-1)\Delta f \tag{1}$$
where $f_0$ represents the carrier frequency of the reference element, and $\Delta f$ represents the frequency increment between adjacent elements. The signal transmitted by the $m$-th element can be expressed as:
$$\varphi_m(t) = w_m \phi(t)\, e^{j2\pi f_m t} \tag{2}$$
where $w_m$ is the transmitting weight representing the power allocated to each transmitting element, and $\phi(t)$ is a baseband pulse waveform whose bandwidth satisfies $B \leq \Delta f$, so that the signals transmitted by the elements are orthogonal in the frequency domain:
$$\int_{-\infty}^{+\infty} \phi(t)\,\phi^{*}(t-\tau)\,e^{j2\pi(m-m')\Delta f t}\,dt = \begin{cases} 1, & \forall \tau,\ m = m' \\ 0, & \forall \tau,\ m \neq m' \end{cases} \tag{3}$$
where $(\cdot)^{*}$ stands for conjugation.
Considering a point target in the far field at $(r_0, \theta_0)$, the signal received by the $n$-th receiving element of the FDA radar can be expressed as:
$$\begin{aligned} x_n(t) &= \alpha\, e^{j2\pi \frac{d}{\lambda_0}(n-1)\sin\theta_0} \sum_{m=0}^{M_t-1} \phi(t-\tau_m)\, w_m \sigma_m^t\, e^{j2\pi f_m (t-\tau_m)} \\ &\approx \alpha_0\, \phi(t-\tau)\, e^{j2\pi \frac{d}{\lambda_0}(n-1)\sin\theta_0} \sum_{m=0}^{M_t-1} w_m \sigma_m^t\, e^{j2\pi f_m t}\, e^{j2\pi \frac{m d}{\lambda_0}\sin\theta_0}\, e^{-j\frac{4\pi m \Delta f r_0}{c}} \end{aligned} \tag{4}$$
where $\alpha_0 = \alpha e^{-j\frac{4\pi f_0 r_0}{c}}$ includes the path loss of signal propagation, $e^{j2\pi \frac{d}{\lambda_0}(n-1)\sin\theta_0}$ represents the phase difference caused by the receiving element spacing, $w_m$ represents the energy weight, $\sigma_m^t$ represents the target response of each frequency channel, $\tau_m = (2r_0 - md\sin\theta_0)/c = \tau - md\sin\theta_0/c$ represents the two-way propagation delay, and $c$ represents the speed of light. The signals received by the FDA radar array can be expressed as:
$$x_t(t) = \alpha_0\, a_r(\theta_0) \left[\mathbf{w} \odot a_t(r_0,\theta_0) \odot \boldsymbol{\sigma}^t\right]^T \mathrm{diag}\{e(t)\}\, \Psi(t-\tau) \tag{5}$$
where $\odot$ represents the Hadamard product, $(\cdot)^T$ the transpose, and $\mathrm{diag}\{\cdot\}$ the diagonal matrix. $a_t(r_0,\theta_0)$ and $a_r(\theta_0)$ represent the transmit and receive steering vectors, respectively. $e(t)$ represents the carrier vector related to time and the frequency increments, $\Psi(t-\tau)$ is the transmitted baseband signal vector, and $x_t(t)$ is the received signal vector. They can be expressed as:
$$a_r(\theta_0) = \left[1,\ e^{j2\pi \frac{d}{\lambda_0}\sin\theta_0},\ \ldots,\ e^{j2\pi \frac{d}{\lambda_0}(M_r-1)\sin\theta_0}\right]^T \tag{6}$$
$$\mathbf{w} = [w_0, w_1, \ldots, w_{M_t-1}]^T \tag{7}$$
$$a_t(r_0, \theta_0) = a_t(\theta_0) \odot a_t(r_0) \tag{8}$$
$$a_t(\theta_0) = \left[1,\ e^{j2\pi \frac{d}{\lambda_0}\sin\theta_0},\ \ldots,\ e^{j2\pi (M_t-1)\frac{d}{\lambda_0}\sin\theta_0}\right]^T \tag{9}$$
$$a_t(r_0) = \left[1,\ e^{-j\frac{4\pi \Delta f r_0}{c}},\ \ldots,\ e^{-j\frac{4\pi (M_t-1)\Delta f r_0}{c}}\right]^T \tag{10}$$
$$\boldsymbol{\sigma}^t = [\sigma_0^t, \sigma_1^t, \ldots, \sigma_{M_t-1}^t]^T \tag{11}$$
$$e(t) = [e^{j2\pi f_0 t}, e^{j2\pi f_1 t}, \ldots, e^{j2\pi f_{M_t-1} t}]^T \tag{12}$$
$$\Psi(t) = \phi(t) \cdot \mathbf{I}_{M_t \cdot 1} \tag{13}$$
$$x_t(t) = [x_0(t), x_1(t), \ldots, x_{M_r-1}(t)]^T \tag{14}$$
where $\mathbf{I}_{M_t \cdot 1}$ represents an $M_t \times 1$ vector whose elements are all 1.
The receiving end of the array adopts a multi-channel mixing and matched filtering architecture. The multi-channel processing scheme is shown in Figure 2.
After multi-channel mixing and matched filtering, $x_t(t)$ can be expressed as $Y_t(\mathbf{w})$:
$$Y_t(\mathbf{w}) = \alpha_0\, a_r(\theta_0) \left[\mathbf{w} \odot a_t(r_0,\theta_0) \odot \boldsymbol{\sigma}^t\right]^T \tag{15}$$
Further, the matrix $Y_t(\mathbf{w})$ is vectorized into $y_t(\mathbf{w})$:
$$y_t(\mathbf{w}) = \mathrm{vec}\left(Y_t(\mathbf{w})\right) = \alpha_0 \left(\mathbf{w} \odot a_t(r_0,\theta_0) \odot \boldsymbol{\sigma}^t\right) \otimes a_r(\theta_0) \tag{16}$$
Observing (15) and (16), the received signal of the FDA radar depends on the transmitting power allocation $\mathbf{w}$, so the spectrum of the transmitted signal can be shaped by adjusting $\mathbf{w}$ to avoid spectrum jamming.
The jammer is assumed to align the carrier frequency of the jamming signal with that of the FDA radar signal and to impose spectrum jamming on the radar. After passing through the $m$-th matched filter channel of a single receiving element, the jamming signal can be represented as [24]:
$$j_m = \int i(t)\, e^{-j2\pi f_m t}\, \phi^{*}(t-\tau)\, dt \tag{17}$$
The jamming signal processed by a single receiving element can be represented as the vector $\mathbf{j}$:
$$\mathbf{j} = [i_0, i_1, \ldots, i_{M_t-1}] \tag{18}$$
Further, the jamming and noise signals, after processing by the receiving array, can be expressed as:
$$y_{j+n} = \mathbf{j} \otimes a_r(\theta_j) + \mathbf{n} \tag{19}$$
where $\theta_j$ indicates the direction of the jammer, and $\mathbf{n} \in \mathbb{C}^{M_t M_r \times 1}$, $\mathbf{n} \sim \mathcal{CN}\left(\mathbf{0}, \sigma_n^2 \mathbf{I}_{M_t M_r \times M_t M_r}\right)$, represents the noise vector.
The signals transmitted by the elements are orthogonal in the frequency domain, and the jamming signals remain orthogonal after multi-channel matched filtering. The correlation matrix of the jamming signal can be expressed as $\mathbf{P}$:
$$\mathbf{P} = \mathrm{diag}(\mathbf{p}) \tag{20}$$
$$\mathbf{p} = [p_0, p_1, \ldots, p_{M_t-1}] \tag{21}$$
$$p_m = E\{|i_m|^2\} \tag{22}$$
where $E\{\cdot\}$ denotes the expectation operator.
The covariance matrix of the jamming-plus-noise signal can be expressed as $\mathbf{R}_{j+n}$:
$$\mathbf{R}_{j+n} = \mathbf{P} \otimes \left[a_r(\theta_j)\, a_r^H(\theta_j)\right] + \sigma_n^2 \mathbf{I}_{M_t M_r \times M_t M_r} \tag{23}$$
The total signal received by the FDA radar is the sum of the target echo, the jamming signal, and the noise:
$$y(\mathbf{w}) = y_t(\mathbf{w}) + y_{j+n} = \alpha_0 \left(\mathbf{w} \odot a_t(r_0,\theta_0) \odot \boldsymbol{\sigma}^t\right) \otimes a_r(\theta_0) + \mathbf{j} \otimes a_r(\theta_j) + \mathbf{n} \tag{24}$$
At the receiving end, the FDA radar adopts minimum variance distortionless response (MVDR) adaptive beamforming. After signal processing, the SJNR can be expressed as $SJNR(\mathbf{w})$:
$$SJNR(\mathbf{w}) = \sigma_0^2 \left(\left(\mathbf{w} \odot a_t(r_0,\theta_0) \odot \boldsymbol{\sigma}^t\right) \otimes a_r(\theta_0)\right)^H \mathbf{R}_{j+n}^{-1} \left(\left(\mathbf{w} \odot a_t(r_0,\theta_0) \odot \boldsymbol{\sigma}^t\right) \otimes a_r(\theta_0)\right) \tag{25}$$
The FDA radar transmitting power allocation can be formulated as an optimization problem that maximizes SJNR under constant-energy constraints, as shown in Equation (26):
$$\max_{\mathbf{w}}\ SJNR(\mathbf{w}) \quad \mathrm{s.t.}\ \|\mathbf{w}\|^2 = 1,\ \|\mathbf{j}\|^2 = 1 \tag{26}$$
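To make the mapping from power allocations to SJNR concrete, here is a minimal numerical sketch of Equations (16), (23), and (25). The parameter values mirror the simulation setup in Section 5; the function names and the noise level are illustrative assumptions, not the authors' code.

```python
import numpy as np

c = 3e8
Mt = Mr = 6                       # transmit/receive elements
f0, df = 10e9, 100e6              # carrier frequency and frequency increment
lam0 = c / f0
d = lam0 / 2                      # half-wavelength element spacing

def steer_rx(theta):
    # Receive steering vector a_r(theta), Eq. (6)
    return np.exp(1j * 2 * np.pi * d / lam0 * np.arange(Mr) * np.sin(theta))

def steer_tx(r, theta):
    # Transmit steering vector a_t(r, theta) = a_t(theta) * a_t(r), Eqs. (8)-(10)
    a_ang = np.exp(1j * 2 * np.pi * d / lam0 * np.arange(Mt) * np.sin(theta))
    a_rng = np.exp(-1j * 4 * np.pi * df * np.arange(Mt) * r / c)
    return a_ang * a_rng

def sjnr(w, p, r0=50e3, theta0=np.deg2rad(30), theta_j=np.deg2rad(30),
         sigma_t=np.array([20, 16, 12, 8, 4, 0.1]),
         sigma0_sq=1.0, sigma_n2=0.1):
    # Target component after vectorization, Eq. (16)
    s = np.kron(w * steer_tx(r0, theta0) * sigma_t, steer_rx(theta0))
    # Jamming-plus-noise covariance, Eq. (23)
    ar_j = steer_rx(theta_j)[:, None]
    R = np.kron(np.diag(p), ar_j @ ar_j.conj().T) + sigma_n2 * np.eye(Mt * Mr)
    # MVDR output SJNR, Eq. (25)
    return np.real(sigma0_sq * s.conj() @ np.linalg.solve(R, s))

w = np.ones(Mt) / np.sqrt(Mt)     # ||w||^2 = 1
p = np.ones(Mt) / Mt              # uniform jamming power
print(f"SJNR = {sjnr(w, p):.2f}")
```

With the jammer in the main lobe ($\theta_j = \theta_0$), spatial filtering alone cannot separate the target from the jamming, which is exactly why the frequency domain allocation $\mathbf{w}$ matters.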

3. Single-Agent Reinforcement Learning

3.1. DDPG Algorithm

For the parametric measurement model in which the target response varies with the FDA radar transmitting frequency, the frequency domain power allocation between the FDA radar and the jammer should not be treated as a set of discrete actions [40]. Power allocation is a continuous control problem whose action space is a continuous set [41]. For such continuous control, this paper applies the DDPG algorithm to realize the power allocation of both the FDA radar and the jammer [42].
The DDPG algorithm is an actor–critic algorithm consisting of a policy network and a value network. The policy network controls the agent, selecting an action according to the state; the value network scores that action given the state and thereby guides the iterative optimization of the policy network. The policy network is trained by improving its parameters to obtain higher scores from the value network; the value network is trained by optimizing its parameters so that its predictions approach the actual value function. The relationship between the policy and value networks is shown in Figure 3.
Both the policy and value networks adopt the multi-layer perceptron (MLP), and the network information of the two is shown in Table 1 and Table 2, respectively.
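For reference, a minimal PyTorch rendering of the networks in Tables 1 and 2 might look as follows; hidden_dim = 64 matches the later simulations, while the constructor names are our own illustrative choices.

```python
import torch
import torch.nn as nn

def make_actor(state_dim, action_dim, hidden_dim=64):
    # Policy network u(s; theta), Table 1: softmax output is a power allocation
    return nn.Sequential(
        nn.Linear(state_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, action_dim), nn.Softmax(dim=-1))

class Critic(nn.Module):
    # Value network q(s, a; omega), Table 2: scores a state-action pair
    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))
```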
DDPG is an off-policy method: its behavior strategy differs from the target strategy $a = u(s;\theta)$. An off-policy method can collect experience with one strategy while training another policy network, reusing the collected experience during training. Like the deep Q-network (DQN) algorithm, DDPG tends to overestimate the real action value; the main reason is that the temporal difference (TD) target overestimates the actual value and bootstrapping propagates the overestimation. To mitigate this, a target policy network $u(s;\theta^-)$ and a target value network $q(s,a;\omega^-)$ are introduced during training. We increase the exploratory nature of the action by adding noise to the policy network output. The detailed flow of the DDPG algorithm applied to power allocation is shown in Algorithm 1.
Algorithm 1 DDPG algorithm applied to power allocation
Initialize the policy network $u(s;\theta)$ and the value network $q(s,a;\omega)$ with random parameters $\theta$ and $\omega$. Initialize the target networks by copying the same parameters, $\theta^- \leftarrow \theta$ and $\omega^- \leftarrow \omega$.
Initialize the experience replay pool $R$.
Input: maximum number of episodes $E$, time steps $T$, discount factor $\gamma$, policy network learning rate $lr_{actor}$, value network learning rate $lr_{critic}$, soft update parameter $\tau$, replay capacity $N_{total}$, minimum amount of data required for sampling $N_{min}$, batch size $N$, and Gaussian noise variance $\sigma^2$.
For $e = 1 \ldots E$ do
 Initialize Gaussian noise $\epsilon$
 Get initial state $s_1$
 For $t_0 = 1 \ldots T$ do
  Select action $a_t = u(s_t;\theta) + \epsilon$ according to the current policy.
  Execute $a_t$, get the reward $R_t$; the environment state changes to $s_{t+1}$.
  Store $(s_t, a_t, R_t, s_{t+1})$ in the replay pool $R$
  Sample $N$ tuples $(s_i, a_i, R_i, s_{i+1})$, $i = 1, \ldots, N$, from $R$
  Calculate the TD target for each tuple: $y_i = R_i + \gamma\, q(s_{i+1}, u(s_{i+1};\theta^-); \omega^-)$
  Minimize the loss $L = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - q(s_i, a_i;\omega)\right)^2$ and update the value network
  Calculate the policy gradient $\nabla_\theta J \approx \frac{1}{N}\sum_{i=1}^{N} \nabla_\theta u(s_i;\theta)\, \nabla_a q(s_i, a;\omega)\big|_{a = u(s_i;\theta)}$ and update the policy network
  Soft update the target policy network: $\theta^- = \tau\theta + (1-\tau)\theta^-$
  Soft update the target value network: $\omega^- = \tau\omega + (1-\tau)\omega^-$
 End for
End for
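A compact PyTorch sketch of the inner-loop update in Algorithm 1 is shown below; it assumes the actor/critic constructors sketched above and a replay pool that yields $(s, a, R, s')$ batches, and is an illustration rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.98, tau=0.005):
    s, a, r, s_next = batch  # tensors sampled from the replay pool

    # TD target y_i = R_i + gamma * q(s', u(s'; theta-); omega-)
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))

    # Update the value network by minimizing the TD error
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Update the policy network along the deterministic policy gradient
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft-update both target networks
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for param, param_t in zip(net.parameters(), target.parameters()):
            param_t.data.mul_(1 - tau).add_(tau * param.data)
```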

3.2. FDA Radar Agent

The setting of this section is that the jammer's power allocation is constant in each frequency band: the jammer is a non-intelligent agent that cannot achieve intelligent power allocation by interacting with the electromagnetic environment. The FDA radar is an agent that can use the DDPG algorithm to intelligently allocate its transmitting power by interacting with the electromagnetic environment. The interaction process between the FDA radar and the electromagnetic environment is shown in Figure 4.
As shown in Figure 4, the state $s$ perceived by the FDA radar is the frequency domain power allocation $\mathbf{p}$ of the jamming in the electromagnetic environment, where the jamming power may be arbitrarily allocated within the jamming frequency band. The FDA radar's action $a$ is its transmitting power allocation $\mathbf{w}$. Its reward $R$ is the SJNR received by the radar, which determines the radar's power allocation preference. The FDA radar continuously adjusts its transmitting power allocation, obtains a reward, and updates its state; the transmitting strategy is optimized with the maximum reward as the objective and the policy network as the optimization variable. The policy network $u(\mathbf{p};\theta)$ obtained by interacting with the electromagnetic environment is the radar's learned power allocation strategy model.
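A minimal gym-style environment for this single-agent setting might look as follows; it reuses the sjnr() sketch from Section 2, and projecting the raw actor output onto the constraint $\|\mathbf{w}\|^2 = 1$ is one reasonable choice rather than the paper's stated method.

```python
import numpy as np

class RadarPowerEnv:
    """The FDA radar acts; the jammer keeps a fixed allocation p_fixed."""
    def __init__(self, p_fixed, horizon=25):
        self.p_fixed = np.asarray(p_fixed, dtype=float)
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.p_fixed.copy()        # state: observed jamming allocation

    def step(self, action):
        # Map the raw actor output to a valid allocation with ||w||^2 = 1
        q = np.maximum(action, 1e-12)
        w = np.sqrt(q / q.sum())
        reward = sjnr(w, self.p_fixed)    # the radar maximizes SJNR
        self.t += 1
        return self.p_fixed.copy(), reward, self.t >= self.horizon
```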

3.3. Jammer Agent

This subsection considers the case where the FDA radar transmitting power is constant in each frequency band. The jammer is an agent that can intelligently allocate jamming power by interacting with the electromagnetic environment using the DDPG algorithm. The interaction process between the jammer and the electromagnetic environment is shown in Figure 5.
As shown in Figure 5, the state $s$ sensed by the jammer is the transmitting power allocation $\mathbf{w}$ of the FDA radar in the electromagnetic environment, where the radar power may be arbitrarily allocated within the signal frequency band. The jammer's action $a$ is its transmitted jamming power allocation $\mathbf{p}$. The reward $R$ obtained by the jammer is set to the negative of the SJNR, so the larger the reward, the smaller the SJNR.
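The jammer-side environment mirrors the radar-side sketch, with the roles swapped and the reward negated; this is again a hypothetical sketch reusing sjnr() from Section 2.

```python
import numpy as np

class JammerPowerEnv:
    """The jammer acts; the radar keeps a fixed allocation w_fixed."""
    def __init__(self, w_fixed, horizon=25):
        self.w_fixed = np.asarray(w_fixed, dtype=float)
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.w_fixed.copy()        # state: observed radar allocation

    def step(self, action):
        q = np.maximum(action, 1e-12)
        p = q / q.sum()                   # total jamming power is fixed
        reward = -sjnr(self.w_fixed, p)   # the jammer minimizes SJNR
        self.t += 1
        return self.w_fixed.copy(), reward, self.t >= self.horizon
```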

4. Multi-Agent Reinforcement Learning

In an electromagnetic environment, both radars and jammers are intelligent systems that can improve their strategies through sequential interaction with the environment. Therefore, the FDA radar and the jammer together constitute a multi-agent system sharing the electromagnetic environment: a change in one party's action changes the environment, which in turn changes the other party's state.
Under the MARL framework, the FDA radar and the jammer occupy the electromagnetic environment together; each interacts with it independently and uses the obtained rewards to improve its strategy and accumulate higher returns. Each side's strategy depends not only on its own observations and actions but also on the other side's. At the same time, the relationship between the two is purely competitive: one party's loss is the other's gain, so the two constitute a zero-sum game whose rewards sum to 0.
The observation of the FDA radar is the jammer's power allocation, $o_{FDA} = \mathbf{p}$, and its action is its own transmitting power allocation, $a_{FDA} = \mathbf{w}$. The observation of the jammer is the FDA radar's power allocation, $o_{jammer} = \mathbf{w}$, and its action is its jamming power allocation, $a_{jammer} = \mathbf{p}$. The state of the entire electromagnetic environment is $s = [o_{FDA}, o_{jammer}]$, and the joint action is $a = [a_{FDA}, a_{jammer}]$. Since the FDA radar and the jammer constitute a zero-sum game, $R_{FDA} = -R_{jammer}$. The interaction diagram between the FDA radar and the jammer is shown in Figure 6.
For the sequential decision optimization problem of the FDA radar and the jammer, this paper applies the MADDPG algorithm [43], an actor–critic method under the CTDE framework. In this framework, centralized value networks observe the joint state and actions of all agents to evaluate each agent's policy during training, while the policy network of each agent is entirely independent, enabling decentralized execution. Figure 7 shows the relationship between the policy and value networks in the MADDPG algorithm.
Both the policy and value networks adopt the MLP, and their network information is shown in Tables 3 and 4, respectively.
As in the DDPG algorithm, the overestimation problem is addressed with target policy and target value networks, and the exploration of actions is increased by adding noise to the policy network output. The specific flow of the MADDPG algorithm applied to the FDA radar and jammer intelligent countermeasure is shown in Algorithm 2.
Algorithm 2 MADDPG algorithm applied to power allocation
Initialize the FDA radar and jammer policy networks $u(o_i;\theta_i)$ and value networks $q(s,a;\omega_i)$ with random parameters $\theta_i$ and $\omega_i$. Initialize the target networks by copying the same parameters, $\theta_i^- \leftarrow \theta_i$ and $\omega_i^- \leftarrow \omega_i$.
Initialize the experience replay pool $R$.
Input: maximum number of episodes $E$, time steps $T$, discount factor $\gamma$, policy network learning rate $lr_{actor}$, value network learning rate $lr_{critic}$, soft update parameter $\tau$, replay capacity $N_{total}$, minimum amount of data required for sampling $N_{min}$, batch size $N$, and Gaussian noise variance $\sigma^2$.
For $e = 1 \ldots E$ do
 Initialize Gaussian noise $\epsilon$
 The initial observations of the FDA radar and the jammer constitute the initial state $s_1 = [o_1^{FDA}, o_1^{jammer}]$
 For $t_0 = 1 \ldots T$ do
  For the $i$-th agent, select an action based on the current policy: $a_t^i = u(o_t^i;\theta_i) + \epsilon$
  Execute $a_t = [a_t^{FDA}, a_t^{jammer}]$, obtain the reward $R_t$; the environment state changes to $s_{t+1} = [o_{t+1}^{FDA}, o_{t+1}^{jammer}]$
  Store $(s_t, a_t, R_t, s_{t+1})$ in the replay pool $R$
  Sample $N$ tuples $(s_t, a_t, R_t, s_{t+1})$, $t = 1, \ldots, N$, from $R$
  Let the target policy networks of the FDA radar and the jammer make predictions $\hat{a}_{t+1}^i = u(o_{t+1}^i;\theta_i^-) + \epsilon$, $i = FDA, jammer$; obtain the predicted joint action $\hat{a}_{t+1} = [\hat{a}_{t+1}^{FDA}, \hat{a}_{t+1}^{jammer}]$
  Let the target value networks make predictions $\hat{q}_{t+1}^i = q(s_{t+1}, \hat{a}_{t+1}; \omega_i^-)$, $i = FDA, jammer$
  Calculate the TD target for each tuple: $y_t^i = R_t^i + \gamma \hat{q}_{t+1}^i$, $i = FDA, jammer$
  Let the value networks make predictions $q_t^i = q(s_t, a_t; \omega_i)$, $i = FDA, jammer$
  Calculate the TD error $\delta^i = q_t^i - y_t^i$, $i = FDA, jammer$
  Update the value networks: $\omega_i = \omega_i - \alpha\, \delta^i\, \nabla_{\omega_i} q(s_t, a_t; \omega_i)$, $i = FDA, jammer$
  Let the policy networks make predictions $\hat{a}_t^i = u(o_t^i;\theta_i) + \epsilon$, $i = FDA, jammer$; obtain the predicted joint action $\hat{a}_t = [\hat{a}_t^{FDA}, \hat{a}_t^{jammer}]$
  Update the policy networks: $\theta_i = \theta_i + \beta\, \nabla_{\theta_i} u(o_t^i;\theta_i)\, \nabla_{a_t^i} q(s_t, \hat{a}_t; \omega_i)$, $i = FDA, jammer$
  Soft update the target policy networks: $\theta_i^- = \tau\theta_i + (1-\tau)\theta_i^-$, $i = FDA, jammer$
  Soft update the target value networks: $\omega_i^- = \tau\omega_i + (1-\tau)\omega_i^-$, $i = FDA, jammer$
 End for
End for
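Below is a minimal PyTorch sketch of one CTDE update in the spirit of Algorithm 2. Each entry of agents is assumed to bundle an actor, a centralized critic, their target copies, optimizers, and a soft_update() helper; these names are hypothetical, and the dict is assumed to be ordered ('FDA' first, then 'jammer') so that joint vectors concatenate consistently.

```python
import torch
import torch.nn.functional as F

def maddpg_update(agents, batch, gamma=0.95):
    # batch: dicts of tensors keyed by agent name ('FDA', 'jammer')
    obs, acts, rews, obs_next = batch
    s = torch.cat([obs['FDA'], obs['jammer']], dim=-1)
    s_next = torch.cat([obs_next['FDA'], obs_next['jammer']], dim=-1)
    a = torch.cat([acts['FDA'], acts['jammer']], dim=-1)

    # Joint next action predicted by both target actors
    a_next = torch.cat([ag.target_actor(obs_next[k])
                        for k, ag in agents.items()], dim=-1)

    for k, ag in agents.items():
        # Centralized TD target: y^i = R^i + gamma * q_i(s', a'; omega_i-)
        with torch.no_grad():
            y = rews[k] + gamma * ag.target_critic(s_next, a_next)
        critic_loss = F.mse_loss(ag.critic(s, a), y)
        ag.critic_opt.zero_grad(); critic_loss.backward(); ag.critic_opt.step()

        # Decentralized actor update: re-evaluate only this agent's action
        a_pi = torch.cat([(agents[j].actor(obs[j]) if j == k else acts[j])
                          for j in agents], dim=-1)
        actor_loss = -ag.critic(s, a_pi).mean()
        ag.actor_opt.zero_grad(); actor_loss.backward(); ag.actor_opt.step()

        ag.soft_update()  # theta- <- tau*theta + (1 - tau)*theta-
```

Because $R_{FDA} = -R_{jammer}$, a single SJNR evaluation per step supplies both entries of rews with opposite signs.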

5. Simulation Results

Set the numbers of transmitting and receiving elements $M_t = M_r = 6$, array spacing $d = \lambda_0/2$, carrier frequency $f_0 = 10\ \mathrm{GHz}$, frequency increment $\Delta f = 100\ \mathrm{MHz}$, and baseband pulse bandwidth $B_{\phi(t)} = 100\ \mathrm{MHz}$. The target is located at $(30°, 50\ \mathrm{km})$ and the jammer at $(30°, 40\ \mathrm{km})$. The signal-to-noise ratio is $SNR = 10\ \mathrm{dB}$, and the jamming-to-noise ratio is $JNR = 20\ \mathrm{dB}$. Different frequencies cause changes in the target's radar cross-section (RCS); the target RCS over the $M_t = 6$ transmitting bands is set to $\boldsymbol{\sigma}^t = [20, 16, 12, 8, 4, 0.1]^T$. The remaining simulation parameters are given in each subsection.
This simulation is divided into three parts. In the first, the jammer power allocation is fixed and the DDPG algorithm realizes intelligent frequency domain power allocation for the FDA radar; in the second, the radar transmitting power allocation is fixed and the DDPG algorithm realizes intelligent allocation of the jamming power; in the third, under the MARL framework, the performance of the FDA radar and the jammer in intelligent frequency domain power countermeasures using the MADDPG algorithm is verified and analyzed.

5.1. FDA Radar Intelligent Frequency Domain Power Allocation

The jamming power allocation of the jammer is set for two cases. Case 1: the jamming power is evenly allocated, $\mathbf{p}_1 = [\frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}]$. Case 2: the jamming is aligned to the frequency with maximum target scattering, $\mathbf{p}_2 = [1, 0, 0, 0, 0, 0]$. The other simulation parameters are: $lr_{actor} = 0.0003$, $lr_{critic} = 0.003$, $E = 400$, $T = 25$, hidden dimension $= 64$, $\gamma = 0.98$, $\tau = 0.005$, $N_{total} = 10{,}000$, $N_{min} = 1000$, $N = 64$, and $\sigma^2 = 0.05$.
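As a usage note, the two cases can be instantiated with the hypothetical RadarPowerEnv sketch from Section 3.2:

```python
import numpy as np

p1 = np.full(6, 1 / 6)                   # case 1: uniform jamming power
p2 = np.array([1., 0., 0., 0., 0., 0.])  # case 2: jam the max-RCS band
envs = {"case 1": RadarPowerEnv(p1), "case 2": RadarPowerEnv(p2)}
```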
The power allocation of the two jamming cases in the frequency–time two-dimensional plane is shown in Figure 8.
To compare the effect of improving the FDA radar power allocation strategy before and after training, the power allocation schemes of the first and last episode of the FDA radar under two jamming conditions are given, as shown in Figure 9.
We can see from Figure 9 that, in the face of different jamming environments, the FDA radar does not initially know the optimal power allocation strategy and conservatively allocates its power as evenly as possible. Then, after interacting with the electromagnetic environment, the FDA radar optimizes and improves its power allocation strategy by learning the jammer's power allocation strategy. When the jamming power is allocated uniformly, the FDA radar concentrates most of its power on the bands with the largest RCS; when the jamming power is focused on the band with the largest RCS, the FDA radar concentrates most of its power on the band with the next largest RCS. In both cases, the trend of the return (SJNR) over the episodes is shown in Figure 10.
As shown in Figure 10, under different jamming power allocations the FDA radar uses the DDPG algorithm to improve the SJNR by optimizing its transmitting power allocation strategy. When facing a fixed allocation of jamming power, the FDA radar can learn the jamming strategy quickly and steer its transmitting power away from the jammed bands. However, the final converged SJNR value is closely related to the jammer's power allocation strategy.

5.2. Jammer Intelligent Frequency Domain Power Allocation

The FDA radar transmitting power allocation is set for two cases. Case 1: the transmitting power is evenly allocated, $\mathbf{w}_1 = [\frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}]$. Case 2: the power is allocated over three frequency bands, $\mathbf{w}_2 = [0, \frac{1}{3}, \frac{1}{3}, 0, \frac{1}{3}, 0]$. The other simulation parameters are: $lr_{actor} = 0.0003$, $lr_{critic} = 0.003$, $E = 2500$, $T = 25$, hidden dimension $= 64$, $\gamma = 0.98$, $\tau = 0.005$, $N_{total} = 10{,}000$, $N_{min} = 1000$, $N = 64$, and $\sigma^2 = 0.05$.
The power allocation of the two FDA radar transmitting strategies in the frequency–time two-dimensional plane is shown in Figure 11.
To compare the effect of improving the power allocation strategy of the jammer before and after training, the jamming power allocation scheme of the first and last episode of the jammer under two FDA radar transmission strategies is given, as shown in Figure 12.
As shown in Figure 12, when faced with different FDA radar transmitting strategies, the jammer initially allocates its jamming power as evenly as possible. Then, after interacting with the electromagnetic environment, the jammer optimizes and improves its power allocation strategy by learning the FDA radar's power allocation strategy. When the FDA radar allocates its transmitting power evenly, the jammer distributes its power in proportion to the RCS of each frequency band; when the FDA radar concentrates its transmitting power in a few frequency bands, the jammer aligns its jamming power with those bands, again in proportion to the RCS of each band. In both cases, the trend of the SJNR over the episodes is shown in Figure 13.
As shown in Figure 13, under different FDA radar transmitting power allocations the jammer uses the DDPG algorithm to reduce the SJNR by optimizing its jamming power allocation strategy. When facing a fixed radar transmitting power allocation, the jammer can learn the radar's transmitting strategy in a very short time and reduce the SJNR by adjusting its jamming strategy. However, the final SJNR value is also closely related to the FDA radar's power allocation strategy.

5.3. FDA Radar and Jammer Intelligent Frequency Domain Power Countermeasures

In this section's simulation, the power of the FDA radar and the jammer is continuously adjustable in each frequency band. The other simulation parameters are: $lr_{actor} = 0.01$, $lr_{critic} = 0.01$, $E = 600{,}000$, $T = 25$, hidden dimension $= 64$, $\gamma = 0.95$, $\tau = 0.01$, $N_{total} = 10{,}000$, $N_{min} = 4000$, $N = 1024$, and $\sigma^2 = 0.05$.
The allocation of the transmitted power of the FDA radar and jammer in the initial episode in the frequency–time two-dimensional plane is shown in Figure 14.
As shown in Figure 14, in the initial episode the FDA radar and the jammer allocate their power approximately evenly across the spectrum while exploring the electromagnetic environment; each interaction changes the transmitting power allocation in search of the optimal strategy. To compare the power allocation strategies of the FDA radar and the jammer before and after training, their power allocation in the last episode is shown in Figure 15.
Observing Figure 15, after multiple rounds of interaction the power allocation of the jammer is closely related to the target frequency response: bands with a significant target response receive more jamming power, while bands with a small response receive less. The power allocation of the FDA radar is also related to the target response characteristics: the transmitting power is mainly concentrated in the band with the maximum target response, and the power in each band is closely related to the target RCS. At the same time, the FDA radar balances exploration and exploitation, allocating transmitting power to other bands with a small probability to probe whether the jammer can realize frequency domain perception and jamming. Therefore, the jammer concentrates its power in the bands with significant target response characteristics, and the FDA radar concentrates most of its energy in the bands with the most significant target response while still exploring the electromagnetic environment.
In [30], the authors use the neural fictitious self-play (NFSP) algorithm and conclude that, in the frequency domain confrontation between radar and jammer, the jammer concentrates its power in the band with the most significant target response while the radar concentrates its transmitting power in the band with the second largest target response. That conclusion was drawn from a limited number of interactions between the radar and the jammer. In practice, the sequential interaction between radar and jammer is approximately infinite: once the radar fixes its power in a particular band, the jammer will also aim its power at that band. In the interaction between the FDA radar and the jammer, both can sense the electromagnetic environment and allocate transmitting power in real time, so correlating the power allocation with the target RCS is an excellent strategy; they can also explore the strategy space with a small probability to sense whether the electromagnetic environment model has changed. The return obtained by the FDA radar is the SJNR, and the return obtained by the jammer is the negative SJNR; the trend of the two returns over the episodes is shown in Figure 16.
It can be seen from Figure 16 that the gains of the FDA radar and the jammer are each other's losses. Both continuously optimize their power allocation strategies through interaction with the electromagnetic environment, learn the environment information through continuous sequential decision making, and finally find their optimal power allocation strategies. At the same time, when the FDA radar transmitting power and the jamming power both change dynamically, the convergence time of the two is significantly prolonged. Because the electromagnetic environment changes dynamically, it is difficult for the FDA radar and the jammer to learn a fixed policy; hence, their cognition of the electromagnetic environment is a probability function, and their goal is to maximize their own profit. As shown in Figure 15, the power allocation of the FDA radar is related to the target frequency response characteristics, and the jamming power allocation is related to the FDA radar's power allocation. When the FDA radar has fully learned the target frequency response characteristics, both returns converge and achieve the maximum expected benefit. Therefore, one way to shorten the convergence time is to provide the target frequency response characteristics to the FDA radar as a priori knowledge.

6. Conclusions

In electronic warfare, the confrontation between the radar and the jammer is a sequential decision process with multiple rounds of interaction. This paper addresses the policy optimization problem of the FDA radar and the jammer in intelligent frequency domain power countermeasures. To accurately describe the intelligent countermeasure relationship between the two, their sequential decision problem is mapped to MARL, and the MADDPG algorithm is used to find the optimal power allocation strategy for the FDA radar in a dynamic electromagnetic environment. In the training process, CTDE is used to accelerate the convergence of the algorithm. The simulation results demonstrate the effectiveness of the resulting power allocation strategy for the FDA radar, regardless of whether the jammer is intelligent.

Author Contributions

Conceptualization, C.Z. and C.W.; methodology, C.Z. and M.T.; software, C.Z. and X.G.; validation, C.Z., J.G. and L.B.; formal analysis, C.Z.; resources, X.G., J.G. and L.B.; writing—original draft preparation, C.Z.; writing—review and editing, C.Z., M.T., X.G. and L.B.; supervision, C.W. and J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under grant 62201580 and the Natural Science Foundation of Shaanxi Province under grant 2023-JC-YB-533.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors would like to thank all the reviewers and editors for their comments on this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ge, M.; Cui, G.; Yu, X.; Kong, L. Mainlobe jamming suppression with polarimetric multi-channel radar via independent component analysis. Digit. Signal Process. 2020, 106, 102806. [Google Scholar] [CrossRef]
  2. Shi, J.; Wen, F.; Liu, Y.; Liu, Z.; Hu, P. Enhanced and generalized coprime array for direction of arrival estimation. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 1327–1339. [Google Scholar] [CrossRef]
  3. Li, K.; Jiu, B.; Liu, H.; Pu, W. Robust Antijamming Strategy Design for Frequency-Agile Radar against Main Lobe Jamming. Remote Sens. 2021, 13, 3043. [Google Scholar] [CrossRef]
  4. Shi, J.; Yang, Z.; Liu, Y. On Parameter Identifiability of Diversity-Smoothing-Based MIMO Radar. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 1660–1675. [Google Scholar] [CrossRef]
  5. Antonik, P.; Wicks, M.C.; Griffiths, H.D.; Baker, C.J. Frequency diverse array radars. In Proceedings of the 2006 IEEE Conference on Radar, Verona, NY, USA, 24–27 April 2006; IEEE: Arlington, VA, USA, 2006; pp. 215–217. [Google Scholar]
  6. Lan, L.; Liao, G.; Xu, J.; Zhang, Y.; Liao, B. Transceive Beamforming with Accurate Nulling in FDA-MIMO Radar for Imaging. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4145–4159. [Google Scholar] [CrossRef]
  7. Lan, L.; Marino, A.; Aubry, A.; De Maio, A.; Liao, G.; Xu, J.; Zhang, Y. GLRT-Based Adaptive Target Detection in FDA-MIMO Radar. IEEE Trans. Aerosp. Electron. Syst. 2020, 57, 597–613. [Google Scholar] [CrossRef]
  8. Zhou, C.; Wang, C.; Gong, J.; Tan, M.; Bao, L.; Liu, M. Ambiguity Function Evaluation and Optimization of the Transmitting Beamspace-Based FDA Radar. Signal Process. 2023, 203, 108810. [Google Scholar] [CrossRef]
  9. Lan, L.; Liao, G.; Xu, J.; Xu, Y.; So, H.C. Beampattern Synthesis Based on Novel Receive Delay Array for Mainlobe Interference Mitigation. IEEE Trans. Antennas Propag. 2023, 71, 4470–4485. [Google Scholar] [CrossRef]
  10. Ding, Z.; Xie, J.; Xu, J. A Joint Array Parameters Design Method Based on FDA-MIMO Radar. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 2909–2919. [Google Scholar] [CrossRef]
  11. Lan, L.; Rosamilia, M.; Aubry, A.; De Maio, A.; Liao, G. FDA-MIMO Transmitter and Receiver Optimization. IEEE Trans. Signal Process. 2024, 72, 1576–1589. [Google Scholar] [CrossRef]
  12. Huang, B.; Basit, A.; Gui, R.; Wang, W.-Q. Adaptive Moving Target Detection without Training Data for FDA-MIMO Radar. IEEE Trans. Veh. Technol. 2022, 71, 220–232. [Google Scholar] [CrossRef]
  13. Basit, A.; Wang, W.; Nusenu, S.Y. Adaptive transmit beamspace design for cognitive FDA radar tracking. IET Radar Sonar Navig. 2019, 13, 2083–2092. [Google Scholar] [CrossRef]
  14. Huang, L.; Zong, Z.; Zhang, S.; Wang, W.-Q. 2-D Moving Target Deception against Multichannel SAR-GMTI Using Frequency Diverse Array. IEEE Geosci. Remote Sens. Lett. 2021, 19, 4006705. [Google Scholar] [CrossRef]
  15. Zhou, C.; Wang, C.; Gong, J.; Tan, M.; Bao, L.; Liu, M. Joint optimisation of transmit beamspace and receive filter in frequency diversity array-multi-input multi-output radar. IET Radar Sonar Navig. 2022, 16, 2031–2039. [Google Scholar] [CrossRef]
  16. Xiong, J.; Cui, C.; Gao, K.; Wang, W.-Q. Cognitive FDA-MIMO radar for LPI transmit beamforming. IET Radar Sonar Navig. 2017, 11, 1574–1580. [Google Scholar] [CrossRef]
  17. Pengcheng, G.; Wu, Y. Improved transmit beamforming design based on ADMM for low probability of intercept of FDA-MIMO radar. J. Commun. 2022, 43, 133–142. [Google Scholar]
  18. Gong, P.; Zhang, Z.; Wu, Y.; Wang, W.-Q. Joint Design of Transmit Waveform and Receive Beamforming for LPI FDA-MIMO Radar. IEEE Signal Process. Lett. 2022, 29, 1938–1942. [Google Scholar] [CrossRef]
  19. Gong, P.; Xu, K.; Wu, Y.; Zhang, J.; So, H.C. Optimization of LPI-FDA-MIMO Radar and MIMO Communication for Spectrum Coexistence. IEEE Wirel. Commun. Lett. 2023, 12, 1076–1080. [Google Scholar] [CrossRef]
  20. Lan, L.; Xu, J.; Liao, G.; Zhang, Y.; Fioranelli, F.; So, H.C. Suppression of Mainbeam Deceptive Jammer with FDA-MIMO Radar. IEEE Trans. Veh. Technol. 2020, 69, 11584–11598. [Google Scholar] [CrossRef]
  21. Gui, R.; Wang, W.-Q.; Farina, A.; So, H.C. FDA Radar with Doppler-Spreading Consideration: Mainlobe Clutter Suppression for Blind-Doppler Target Detection. Signal Process. 2020, 179, 107773. [Google Scholar] [CrossRef]
  22. Tan, M.; Gong, J.; Wang, C. Range Dimensional Monopulse Approach with FDA-MIMO Radar for Mainlobe Deceptive Jamming Suppression. IEEE Antennas Wirel. Propag. Lett. 2023, 23, 643–647. [Google Scholar] [CrossRef]
  23. Gui, R.; Zheng, Z.; Wang, W. Cognitive FDA radar transmit power allocation for target tracking in spectrally dense scenario. Signal Process. 2021, 183, 108006. [Google Scholar] [CrossRef]
  24. Wang, L.; Wang, W.; Hing, C.S. Covariance Matrix Estimation for FDA-MIMO Adaptive Transmit Power Allocation. IEEE Trans. Signal Process. 2022, 70, 3386–3399. [Google Scholar] [CrossRef]
  25. Jakabosky, J.; Ravenscroft, B.; Blunt, S.D.; Martone, A. Gapped spectrum shaping for tandem-hopped radar/communications cognitive sensing. In Proceedings of the 2016 IEEE Radar Conference (RadarConf), Philadelphia, PA, USA, 2–6 May 2016; pp. 1–6. [Google Scholar]
  26. Ravenscroft, B.; Blunt, S.; Allen, C.; Martone, A.; Sherbondy, K. Analysis of spectral notching in fm noise radar using measured interference. In Proceedings of the International Conference on Radar Systems, Belfast, UK, 23–26 October 2017. [Google Scholar]
  27. Aubry, A.; Carotenuto, V.; De Maio, A.; Farina, A.; Pallotta, L. Optimization theory-based radar waveform design for spectrally dense environments. IEEE Aerosp. Electron. Syst. Mag. 2016, 31, 14–25. [Google Scholar] [CrossRef]
  28. Stinco, P.; Greco, M.; Gini, F.; Himed, B. Cognitive radars in spectrally dense environments. IEEE Aerosp. Electron. Syst. Mag. 2016, 31, 20–27. [Google Scholar] [CrossRef]
  29. Zhou, C.; Wang, C.; Gong, J.; Tan, M.; Bao, L.; Liu, M. Phase Characteristics and Angle Deception of Frequency-Diversity-Array Transmitted Signals Based on Time Index within Pulse. Remote Sens. 2023, 15, 5171. [Google Scholar] [CrossRef]
  30. Selvi, E.; Buehrer, R.M.; Martone, A.; Sherbondy, K. Reinforcement Learning for Adaptable Bandwidth Tracking Radars. IEEE Trans. Aerosp. Electron. Syst. 2020, 56, 3904–3921. [Google Scholar] [CrossRef]
  31. Thornton, C.E.; Kozy, M.A.; Buehrer, R.M. Deep Reinforcement Learning Control for Radar Detection and Tracking in Congested Spectral Environments. IEEE Trans. Cogn. Commun. Netw. 2020, 6, 1335–1349. [Google Scholar] [CrossRef]
  32. Ding, Z.; Xie, J.; Qi, C. Transmit Power Allocation Method of Frequency Diverse Array-Multi Input and Multi Output Radar Based on Reinforcement Learning. J. Electron. Inf. Technol. 2023, 45, 550–557. [Google Scholar]
  33. Wang, L.; Peng, J.; Xie, Z.; Zhang, Y. Optimal jamming frequency selection for cognitive jammer based on reinforcement learning. In Proceedings of the 2019 IEEE 2nd International Conference on Information Communication and Signal Processing (ICICSP), Weihai, China, 28–30 September 2019; pp. 39–43. [Google Scholar]
  34. Liu, H.; Zhang, H.; He, Y.; Sun, Y. Jamming Strategy Optimization through Dual Q-Learning Model against Adaptive Radar. Sensors 2021, 22, 145. [Google Scholar] [CrossRef]
  35. Bachmann, D.J.; Evans, R.J.; Moran, B. Game Theoretic Analysis of Adaptive Radar Jamming. IEEE Trans. Aerosp. Electron. Syst. 2011, 47, 1081–1100. [Google Scholar] [CrossRef]
  36. Norouzi, T.; Norouzi, Y. Scheduling the usage of radar and jammer during peace and war time. IET Radar Sonar Navig. 2012, 6, 929–936. [Google Scholar] [CrossRef]
  37. Li, K.; Jiu, B.; Pu, W.; Liu, H.; Peng, X. Neural Fictitious Self-Play for Radar Antijamming Dynamic Game with Imperfect Information. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 5533–5547. [Google Scholar] [CrossRef]
  38. Geng, J.; Jiu, B.; Li, K.; Zhao, Y.; Liu, H.; Li, H. Radar and Jammer Intelligent Game under Jamming Power Dynamic Allocation. Remote Sens. 2023, 15, 581. [Google Scholar] [CrossRef]
  39. Huang, L.; Li, X.; Wan, W.; Ye, H.; Zhang, S.; Wang, W.Q. Adaptive FDA Radar Transmit Power Allocation for Target Detection Enhancement in Clutter Environment. IEEE Trans. Veh. Technol. 2023, 72, 11111–11121. [Google Scholar] [CrossRef]
  40. Zhou, C.; Wang, C.; Gong, J.; Tan, M.; Bao, L.; Liu, M. Expert systems with applications reinforcement learning for FDA-MIMO radar power allocation in congested spectral environments. Expert Syst. Appl. 2024, 251, 123957. [Google Scholar] [CrossRef]
  41. Ding, Z.; Xie, J.; Yang, L. Cognitive Conformal Subaperturing FDA-MIMO Radar for Power Allocation Strategy. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 5027–5083. [Google Scholar] [CrossRef]
  42. Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic policy gradient algorithms. In Proceedings of the International Conference on Machine Learning, PMLR, Beijing, China, 21–26 June 2014; pp. 387–395. [Google Scholar]
  43. Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Pieter Abbeel, O.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. Adv. Neural Inf. Process. Syst. 2017, 30, 6379–6390. [Google Scholar]
Figure 1. The basic framework of FDA radar.
Figure 2. The multi-channel processing scheme.
Figure 3. The relationship between the policy and value networks in DDPG.
Figure 4. The interaction process between FDA radar and electromagnetic environment.
Figure 5. The interaction process between jammer and electromagnetic environment.
Figure 6. The interaction diagram between the FDA radar and the jammer.
Figure 7. The relationship between the policy and value networks in MADDPG.
Figure 8. Power allocation of jamming in the frequency–time 2-dimensional plane: (a) case one; (b) case two.
Figure 9. FDA radar power allocation scheme: (a) first episode in case one; (b) last episode in case one; (c) first episode in case two; (d) last episode in case two.
Figure 10. The trend of return (SJNR) with the episodes.
Figure 11. Power allocation of FDA radar in the frequency–time 2-dimensional plane: (a) case one; (b) case two.
Figure 12. Jammer power allocation scheme: (a) first episode in case one; (b) last episode in case one; (c) first episode in case two; (d) last episode in case two.
Figure 13. The trend of SJNR with the episodes.
Figure 14. Allocation of transmitted power between FDA radar and jammer in initial episode: (a) FDA radar; (b) jammer.
Figure 15. FDA radar and jammer transmitting power allocation in the last episode: (a) FDA radar; (b) jammer.
Figure 16. The trend of the returns of FDA radar and jammer with the episodes.
Table 1. Policy network parameter information.

| Layer | Input | Output | Activation Function |
| --- | --- | --- | --- |
| MLP1 | state dimension | hidden dimension | ReLU |
| MLP2 | hidden dimension | hidden dimension | ReLU |
| MLP3 | hidden dimension | action dimension | softmax |

Table 2. Value network parameter information.

| Layer | Input | Output | Activation Function |
| --- | --- | --- | --- |
| MLP1 | state and action dimension | hidden dimension | ReLU |
| MLP2 | hidden dimension | hidden dimension | ReLU |
| MLP3 | hidden dimension | 1 | / |

Table 3. Policy network parameter information of MADDPG.

| Layer | Input | Output | Activation Function |
| --- | --- | --- | --- |
| MLP1 | observation dimension | hidden dimension | ReLU |
| MLP2 | hidden dimension | hidden dimension | ReLU |
| MLP3 | hidden dimension | agent action dimension | softmax |

Table 4. Value network parameter information of MADDPG.

| Layer | Input | Output | Activation Function |
| --- | --- | --- | --- |
| MLP1 | state and environment action dimension | hidden dimension | ReLU |
| MLP2 | hidden dimension | hidden dimension | ReLU |
| MLP3 | hidden dimension | 1 | / |