The difficulty in solving the above model lies in the fact that the two resource variables are of different types and are nonlinearly coupled in both the objective function and the constraints, resulting in a high-dimensional mixed-integer nonlinear programming (MINLP) problem. To solve for the optimal scheduling strategy efficiently, an adaptive hierarchical solution framework based on the Improved Adaptive Parameter Evolution Marine Predators Algorithm (IAPEMPA) is proposed, drawing on the advantages of swarm intelligence methods. The algorithm improvements and the solution framework are detailed below.
5.2.1. Improved Adaptive Parameter Evolution Marine Predators Algorithm
The MPA [36,37,38,39,40] is a novel nature-inspired optimization method proposed by Faramarzi et al. in 2020 that searches for the optimal solution by simulating predators pursuing prey populations in marine ecosystems. The MPA maintains two important matrices. The first is the prey matrix Prey, composed of the prey population; each element $x_{i,j}$ of the matrix is initialized as

$$x_{i,j} = x_L + \mathrm{rand} \cdot (x_U - x_L), \tag{28}$$

where $x_U$ and $x_L$ are the upper and lower bounds of the search space, and rand is a uniform random number in the range [0, 1]. The prey matrix can be represented as

$$\mathbf{Prey} = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N_p,1} & x_{N_p,2} & \cdots & x_{N_p,d} \end{bmatrix}, \tag{29}$$

where $N_p$ is the population size and $d$ is the dimension of each individual.
Calculate the fitness of each row vector $\mathbf{x}_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,d}]$ in matrix (29), and replicate the individual with the best fitness $N_p$ times to form the elite matrix Elite.
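As a minimal sketch of this initialization (function and variable names are illustrative, and minimization of the fitness is assumed):

```python
import numpy as np

def initialize(fitness, Np, d, xL, xU, rng):
    """Initialize the Prey matrix per Eq. (28) and build the Elite matrix."""
    # Each element: x = xL + rand * (xU - xL), rand ~ U[0, 1]
    prey = xL + rng.random((Np, d)) * (xU - xL)
    fit = np.apply_along_axis(fitness, 1, prey)
    # Elite: Np copies of the individual with the best (lowest) fitness
    elite = np.tile(prey[np.argmin(fit)], (Np, 1))
    return prey, fit, elite
```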
The key to the MPA algorithm is its three-stage optimization and the FADs (fish aggregating devices) effect. Specifically, the algorithm divides the iterative process into three stages, with each stage adjusting the position update equation according to the velocity ratio of predator to prey. The FADs effect is a mechanism specifically designed to escape local optima. Following the canonical MPA formulation, the population position update is shown in Equation (31):

$$\mathbf{Prey}_i = \begin{cases} \mathbf{Prey}_i + P\,\mathbf{R} \otimes \mathbf{R}_B \otimes (\mathbf{Elite}_i - \mathbf{R}_B \otimes \mathbf{Prey}_i), & t \le \frac{Max\_T}{3} \\[4pt] \mathbf{Prey}_i + P\,\mathbf{R} \otimes \mathbf{R}_L \otimes (\mathbf{Elite}_i - \mathbf{R}_L \otimes \mathbf{Prey}_i), & \frac{Max\_T}{3} < t \le \frac{2\,Max\_T}{3},\ i \le N_p/2 \\[4pt] \mathbf{Elite}_i + P \cdot CF\,\mathbf{R}_B \otimes (\mathbf{R}_B \otimes \mathbf{Elite}_i - \mathbf{Prey}_i), & \frac{Max\_T}{3} < t \le \frac{2\,Max\_T}{3},\ i > N_p/2 \\[4pt] \mathbf{Elite}_i + P \cdot CF\,\mathbf{R}_L \otimes (\mathbf{R}_L \otimes \mathbf{Elite}_i - \mathbf{Prey}_i), & t > \frac{2\,Max\_T}{3} \end{cases}$$

$$\mathbf{Prey}_i = \begin{cases} \mathbf{Prey}_i + CF\,[\mathbf{x}_L + \mathbf{R} \otimes (\mathbf{x}_U - \mathbf{x}_L)] \otimes \mathbf{U}, & r \le FADs \\[4pt] \mathbf{Prey}_i + [FADs\,(1 - r) + r]\,(\mathbf{Prey}_{r_1} - \mathbf{Prey}_{r_2}), & r > FADs \end{cases} \tag{31}$$

where $t$ is the current iteration, $Max\_T$ is the maximum number of iterations, $\mathbf{R}_B$ is a Brownian-motion random vector based on the normal distribution, $\mathbf{R}_L$ is a random vector based on the Lévy distribution, $\mathbf{R}$ is a uniform random vector in the range [0, 1], $CF = (1 - t/Max\_T)^{2t/Max\_T}$ is an adaptive parameter that controls the predator's movement step size, $\mathbf{U}$ is a binary random vector, $r$ is a random number in [0, 1], $\otimes$ denotes element-wise multiplication, $r_1$ and $r_2$ are random indices of the prey matrix, $P = 0.5$, and $FADs = 0.2$.
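The CF schedule and the FADs branch of Equation (31) follow the canonical MPA; a sketch is given below (the helper name and the final clipping to the search bounds are our additions):

```python
import numpy as np

def fads_effect(prey, xL, xU, t, max_t, fads, rng):
    """FADs perturbation from the canonical-MPA form of Equation (31) (sketch)."""
    Np, d = prey.shape
    CF = (1.0 - t / max_t) ** (2.0 * t / max_t)        # adaptive step-size parameter
    r = rng.random()
    if r <= fads:
        U = (rng.random((Np, d)) < fads).astype(float)  # binary random vector
        prey = prey + CF * (xL + rng.random((Np, d)) * (xU - xL)) * U
    else:
        p1, p2 = rng.permutation(Np), rng.permutation(Np)  # random prey indices r1, r2
        prey = prey + (fads * (1.0 - r) + r) * (prey[p1] - prey[p2])
    return np.clip(prey, xL, xU)  # keep individuals inside the search space
```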
The canonical MPA is competent in continuous optimization, but its slow convergence and strong parameter dependence prevent it from being applied directly to this task. To bridge this gap, several improvement strategies are adopted to further increase its accuracy and real-time performance while adapting it to the solution of resource scheduling strategies.
The specific design for encoding and decoding is as follows:
(1) Encoding and decoding of jamming beam allocation variables.
As shown in Figure 2, the beam allocation variable is represented by a real vector of dimension M × P, where the upper and lower bounds of each element are 1 and −1, respectively. The encoded prey position vector is decoded as follows: sort the elements of the position vector in ascending order to obtain an index vector, take the first N elements of the index vector to form a new vector, and obtain the number of the platform that jams radar n from the corresponding index through the remainder operation %, as illustrated in the sketch after this list.
(2) Encoding and decoding of jamming mode selection variables.
To remain consistent with the strategy design of the algorithm, real-number encoding is also chosen for the jamming mode selection variable, which is represented by a vector of dimension M, where the upper and lower bounds of each element are I and 0, respectively. The encoded prey position vector is decoded by up-rounding each element of the vector, which yields the jamming mode number selected by each platform, as shown in Figure 3.
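Both decoding rules can be sketched as follows. Mapping a sorted index to a platform via `idx % M` (and to a radar via `idx // M`) is our reading of the remainder rule, not a formula confirmed by the text, and the clipping of up-rounded modes to [1, I] is likewise an assumption:

```python
import numpy as np

def decode_beam_allocation(position, M, N):
    """Decode a length-(M*P) real vector into N platform-radar jamming pairs."""
    order = np.argsort(position)     # ascending-order index vector
    chosen = order[:N]               # first N indices form the allocation
    return [(int(i % M), int(i // M)) for i in chosen]  # (platform, radar), assumed rule

def decode_jamming_modes(position, I):
    """Decode a length-M real vector with elements in [0, I] into mode numbers."""
    return np.clip(np.ceil(position).astype(int), 1, I)  # up-rounding; boundary guarded
```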
- 2. Dominant strategy adaptive selection mechanism (DSASM).
The canonical MPA partitions the population into static subgroups for Lévy flight exploitation (first 50%) and Brownian motion exploration (remaining 50%) during mid-iteration phases. However, this rigid strategy allocation neglects individual evolutionary states, often trapping high-fitness particles in local optima while underutilizing promising search regions. Hence, a dominant strategy adaptive selection mechanism is proposed based on the evolutionary information of the population. Specifically, each individual in the population selects the more favorable strategy according to its own fitness and the historical efficiency of the two strategies. The selection probabilities of the two strategies are calculated by Equations (32)–(34), where $P_{L,i}^t$ and $P_{B,i}^t$ denote the probabilities that individual $i$ selects the Lévy flight strategy or the Brownian motion strategy in generation $t$, $f_i^t$ is the fitness of individual $i$, $\bar{f}^t$ is the average fitness of the population, and $E_L = Num_L/pop_L$ and $E_B = Num_B/pop_B$ denote the historical efficiencies of adopting Lévy flight and Brownian motion, respectively. Here $Num_L$ denotes the number of individuals that generate a new prey position superior to the original one after adopting Lévy flight, $pop_L$ is the total number of individuals that adopt this strategy, and $Num_B$ and $pop_B$ have the same meaning for Brownian motion. The initial values of $E_L$ and $E_B$ are set to 0.5.
The process of the DSASM is shown in Algorithm 1.
Algorithm 1: Dominant Strategy Adaptive Selection Mechanism
1: for each individual i do
2:   Calculate the fitness value of individual i;
3:   Calculate the selection probabilities $P_{L,i}^t$ and $P_{B,i}^t$ of the two strategies based on Equations (32)–(34);
4:   Generate a random number $r \in [0, 1]$;
5:   if $r < P_{L,i}^t$ then
6:     Individual i executes the Lévy flight strategy;
7:   else
8:     Individual i executes the Brownian motion strategy;
9:   end if
10: end for
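The exact probability mixing of Equations (32)–(34) is not reproduced here; the sketch below tracks only the historical efficiencies $E_L$ and $E_B$ defined in the text and selects a strategy proportionally to them (class and method names are illustrative):

```python
import numpy as np

class DSASM:
    """Dominant strategy selection driven by historical efficiencies (sketch)."""

    def __init__(self):
        # Initial efficiencies E_L and E_B are set to 0.5, as stated in the text
        self.eff = {"levy": 0.5, "brownian": 0.5}

    def choose(self, rng):
        """Pick a strategy with probability proportional to its efficiency."""
        total = self.eff["levy"] + self.eff["brownian"] + 1e-12
        return "levy" if rng.random() < self.eff["levy"] / total else "brownian"

    def update(self, strategy, num_improved, pop_used):
        """E = Num / pop: improved individuals over individuals using the strategy."""
        if pop_used > 0:
            self.eff[strategy] = num_improved / pop_used

# Usage: dsasm = DSASM(); s = dsasm.choose(np.random.default_rng())
```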
- 3. Adaptive parameter evolution mechanism.
Parameter choice plays an important role in the convergence speed and evolution direction of the algorithm. To reduce the MPA algorithm's dependence on the step-size control parameter CF and the fish aggregating devices effect parameter FADs, an adaptive parameter evolution mechanism based on parameter storage is proposed.
Parameter storage means that, each time the population completes a position update, the parameter values that produced a better position (i.e., a new prey position whose fitness improves on the original) are stored in the corresponding sample library, denoted as the beneficial parameter sample libraries $P_{CF}$ and $P_{FADs}$. Before the next iteration begins, the weighted Lehmer mean of these samples is calculated as the sampling benchmark for the parameter update. The sampling benchmark is updated as follows:

$$\mu_\Theta = \mathrm{mean}_{WL}(P_\Theta) = \frac{\sum_{k=1}^{|P_\Theta|} w_k\,\Theta_k^{2}}{\sum_{k=1}^{|P_\Theta|} w_k\,\Theta_k},$$

where $\Theta$ represents CF or FADs, $\Theta_k$ represents the $k$-th parameter in the sample library, and $w_k$ is the weight assigned to the $k$-th sample.
Considering the differences in the roles and range constraints of the two parameters, CF and FADs each adopt two distribution-based sampling strategies in the potential solution space: CF is sampled from Cauchy and exponential distributions, while FADs is sampled from normal and quadratic polynomial distributions. In the sampling criterion, randn(0, 0.1) denotes a random number drawn from a normal distribution with mean 0 and variance 0.1, randc(0, 0.1) denotes a random number drawn from a Cauchy distribution with location 0 and scale 0.1, $r_1$ and $r_2$ are uniform random numbers in [0, 1], and $a_0 = 0.3$ and $a_1 = 0.15$ are constants. The sampling criterion adaptively adjusts the parameter sampling range of the next stage based on historical sampling experience, and it increases the sampling probability of beneficial parameters by alternating between sampling strategies with different distributions.
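Equations (35)–(39) are only partially recoverable here, so the sketch below keeps just the two grounded ingredients, the weighted Lehmer mean benchmark and distribution-based resampling, and simplifies the alternation rule to one distribution per parameter (an assumption):

```python
import numpy as np

def lehmer_mean(samples, weights):
    """Weighted Lehmer mean used as the sampling benchmark (sketch)."""
    s, w = np.asarray(samples), np.asarray(weights)
    return (w * s**2).sum() / (w * s).sum()

def sample_parameters(mu_cf, mu_fads, rng):
    """Draw new CF and FADs values around their benchmarks (sketch).
    CF uses a Cauchy perturbation (randc(0, 0.1) analogue), FADs a normal
    perturbation (randn(0, 0.1) analogue, variance 0.1)."""
    cf = mu_cf + 0.1 * rng.standard_cauchy()
    fads = mu_fads + rng.normal(0.0, np.sqrt(0.1))
    return float(np.clip(cf, 0.0, 1.0)), float(np.clip(fads, 0.0, 1.0))
```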
- 4. Search intensity control strategy.
The main idea of search intensity control is to control the population size and FADs effect frequency to reduce unnecessary computational costs and help find potential optimal solutions faster. The specific method is as follows:
(1) Population size control. During evolution, the population gradually gathers around the potential optimal solution and exhibits a similarity trend. The population similarity $S^t$ is defined in Equation (40), where $\mathrm{dist}(\cdot)$ calculates the Euclidean distance between two vectors, and $\mathbf{x}_{best}^{t}$ and $\mathbf{x}_{best}^{t+1}$ represent the optimal individuals of the $t$-th and $(t+1)$-th generations, respectively. The population size is then updated according to Equation (41), where $N_p^{max}$ and $N_p^{min}$ represent the upper and lower bounds of the population size.
(2) FADs effect frequency control. The MPA algorithm uses the FADs effect to jump out of local optima, but applying this mechanism to potentially poor individuals incurs additional computational cost. Thus, an activation operator $\varphi$ is designed to trigger the mechanism only after a certain number of iterations and upon stagnation; its expression is given in Equation (42). When the iteration growth rate exceeds the fitness growth rate of the current optimal solution, the risk of the population falling into a local optimum is considered to increase, and $\varphi$ increases accordingly, making it easier to activate the FADs effect mechanism. Conversely, as $\varphi$ decreases, the activation probability of the mechanism adaptively decreases.
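The exact forms of Equations (40)–(42) are not recoverable from the extracted text; the sketch below is a heuristic stand-in that follows the stated behavior only (function names, the similarity mapping, and the step size are assumptions):

```python
import numpy as np

def update_population_size(best_prev, best_curr, np_min, np_max):
    """Population size control (sketch): higher similarity between successive
    best individuals shrinks the population toward its lower bound."""
    sim = 1.0 / (1.0 + np.linalg.norm(best_curr - best_prev))  # similarity in (0, 1]
    return int(round(np_max - sim * (np_max - np_min)))

def activation_operator(max_t, f_prev, f_curr, phi, step=0.05):
    """Stand-in for Eq. (42): raise the FADs activation probability phi when
    fitness improvement lags behind iteration progress, lower it otherwise."""
    iter_rate = 1.0 / max_t                                   # iteration growth rate
    fit_rate = abs(f_prev - f_curr) / (abs(f_prev) + 1e-12)   # fitness growth rate
    return min(1.0, phi + step) if iter_rate > fit_rate else max(0.0, phi - step)
```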
Algorithm 2 provides the pseudocode for IAPEMPA.
Algorithm 2: The procedure of IAPEMPA
Input: Fitness function $f(\cdot)$, dimension d, upper and lower bounds of the variables $x_U$ and $x_L$, upper and lower bounds of the population size $N_p^{max}$ and $N_p^{min}$, maximum number of iterations Max_T
Output: Optimal solution and its fitness value: $\mathbf{x}^*$ and $f(\mathbf{x}^*)$
1: Initialize the population $\mathbf{Prey}_i$ (i = 1, …, $N_p$), calculate the fitness, construct the Elite matrix Elite, and set t = 1
2: while t < Max_T do
3:   if t ≤ Max_T/3 then
4:     Update the prey population based on Equation (31)
5:   else if Max_T/3 < t ≤ 2 × Max_T/3 then
6:     Execute Algorithm 1 to determine the search strategy for each individual
7:     Update the prey population based on Equation (31)
8:   else if t > 2 × Max_T/3 then
9:     Update the prey population based on Equation (31)
10:  end if
11:  Calculate the population fitness, apply memory saving, and update the Elite matrix Elite
12:  Update the parameters CF and FADs using Equations (35)–(39)
13:  Calculate the activation operator $\varphi$ using Equation (42)
14:  Generate a random number r
15:  if r < $\varphi$ then
16:    Apply the FADs effect based on Equation (31)
17:  end if
18:  Update the population size $N_p$ based on Equations (40) and (41)
19:  Set t = t + 1
20: end while
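The phase scheduling in lines 2–10 of Algorithm 2 reduces to a small dispatcher; a sketch follows (the string labels are descriptive only, not from the paper):

```python
def mpa_phase(t, max_t):
    """Map the iteration counter to the Equation (31) branch used in Algorithm 2."""
    if t <= max_t / 3:
        return "brownian"   # lines 3-4: high-velocity exploration
    if t <= 2 * max_t / 3:
        return "dsasm"      # lines 5-7: per-individual choice via Algorithm 1
    return "levy"           # lines 8-9: low-velocity exploitation
```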
5.2.2. Adaptive Scheduling Strategy Solution Framework Embedded with IAPEMPA
In this section, an adaptive hierarchical jamming resource scheduling solution framework suitable for embedding swarm intelligence methods is proposed, comprising two main phases: dynamic adjustment of the objective function and optimizer solution. The design framework is shown in Figure 4. The two phases are explained in detail below.
- 1. Dynamic adjustment of the objective function.
The non-cooperative game of radar confrontation requires the objective of the resource scheduling task model to be adjusted automatically to adapt to the state changes of each radar in the network. In a specific air defense mission, the radar adopts several specific signal patterns in each state and performs Markov state transitions based on the quality of the target information obtained.
Table 1 shows the correspondence between radar states and radar signal observation values obtained through extensive reconnaissance statistics in the early stage of the task, including five states labeled $s_1, \ldots, s_5$ and eight observation values labeled $o_1, \ldots, o_8$. Considering that the radar state is not directly observable, an HMM is constructed based on the observed sequence of the radar's signal patterns. First, the EM method [41] is used to obtain the radar state transition probabilities and signal observation probabilities, which provide prior decision information for resource scheduling tasks. Then, Bayes' rule is used to update the radar state belief probability based on observations as follows:
$$b'(s_j) = \frac{O(o_k \mid s_j) \sum_{i} T(s_j \mid s_i)\, b(s_i)}{\sum_{j'} O(o_k \mid s_{j'}) \sum_{i} T(s_{j'} \mid s_i)\, b(s_i)},$$

where $b'$ is the updated belief probability, $b(s_i)$ represents the current belief probability that the radar is in state $s_i$, $T(s_j \mid s_i)$ represents the transition probability from state $s_i$ to state $s_j$, and $O(o_k \mid s_j)$ represents the probability of observing $o_k$ when the state is $s_j$. The estimated state of the radar is the state with the highest belief probability.
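Since this is the standard Bayes filter over an HMM, it can be written compactly as below (the matrix names T and O follow the definitions above; a NumPy sketch):

```python
import numpy as np

def update_belief(b, T, O, obs):
    """One-step Bayesian belief update over hidden radar states.

    b   : (S,) current belief over states
    T   : (S, S) transition matrix, T[i, j] = P(s_j | s_i)
    O   : (S, K) observation matrix, O[j, k] = P(o_k | s_j)
    obs : index k of the observed signal pattern
    """
    predicted = b @ T                    # prior belief after the state transition
    unnorm = O[:, obs] * predicted       # weight by the observation likelihood
    b_new = unnorm / unnorm.sum()        # normalize to a probability distribution
    return b_new, int(np.argmax(b_new))  # belief and the estimated state
```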
After the state estimation of each radar in the network is completed, penalty control weights $\omega_m$ are set in consideration of the jamming interception risk. If the interception coefficient of sensor platform $m$ reaches the upper limit of 1 at time $k$, the corresponding weight is updated to $\omega_m = \Lambda$, where $\Lambda$ is a large positive real number. Equation (26) can then be redescribed with these penalty weights.
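A minimal sketch of this weight update, assuming the weight is simply set to the large constant (here called LAMBDA, an illustrative name and value):

```python
def update_penalty_weights(weights, interception, LAMBDA=1e6):
    """Penalize platforms whose interception coefficient has hit the limit 1."""
    for m, coeff in enumerate(interception):
        if coeff >= 1.0:
            weights[m] = LAMBDA  # large positive constant discourages this platform
    return weights
```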
- 2. Optimizer solution.
To meet the real-time requirements of online resource scheduling and to support the embedded management of other swarm intelligence algorithms, a hierarchical strategy optimizer is designed that decouples the beam allocation variables and the jamming mode selection variables into a beam allocation layer (upper layer) and a mode selection layer (lower layer), respectively. Specifically, in the beam allocation layer, the beam allocation scheme is fixed and model (26) is transformed into subproblem (46) with respect to the jamming mode selection variables. The output of the beam allocation layer is then used as the input of the mode selection layer, where the jamming mode selection scheme is fixed and model (26) is transformed into subproblem (47) with respect to the beam allocation variables. By iterating between the upper and lower layers until the termination condition is reached, the optimal solution satisfying the constraints is obtained. The specific steps of the IAPEMPA-based optimizer solution strategy are as follows:
(1) Initialize a beam allocation scheme using the beam allocation encoding strategy and substitute it into Equation (26), transforming it into decision problem OP 5.1 with respect to the jamming mode selection variables only. Equation (46) is clearly a univariate optimization problem, which can be solved using IAPEMPA to obtain a potential optimal jamming mode selection scheme.
(2) Next, fix the jamming mode selection scheme and re-optimize the beam allocation scheme; the decision problem is transformed into OP 5.2, given in Equation (47). Similarly, use IAPEMPA to solve OP 5.2 and update the beam allocation scheme.
(3) Solve subproblems OP 5.1 and OP 5.2 iteratively, updating the beam allocation scheme and the jamming mode selection scheme; the loop stops when the maximum number of iterations is reached or the change in the objective function value falls below a given threshold. The optimal beam allocation scheme and jamming mode selection scheme at time k are thereby obtained.
The specific process of joint jamming resource scheduling scheme generation is shown in Figure 5.
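The alternating procedure of steps (1)–(3) can be sketched as follows; `solve_op51` and `solve_op52` stand in for IAPEMPA runs on the two subproblems and are assumed interfaces, not the paper's:

```python
def hierarchical_schedule(solve_op51, solve_op52, beam0, max_iters=20, tol=1e-6):
    """Alternate between the two subproblems until convergence (sketch).

    solve_op51(beam)  -> (modes, obj): OP 5.1, modes for a fixed beam allocation.
    solve_op52(modes) -> (beam, obj):  OP 5.2, beam allocation for fixed modes.
    """
    beam, prev_obj = beam0, float("inf")
    for _ in range(max_iters):
        modes, _ = solve_op51(beam)      # lower layer: jamming mode selection
        beam, obj = solve_op52(modes)    # upper layer: beam allocation
        if abs(prev_obj - obj) < tol:    # objective change below threshold: stop
            break
        prev_obj = obj
    return beam, modes
```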