1. Introduction
Owing to the open nature of the channel, wireless communication is susceptible to both inadvertent and intentional jamming, which significantly reduces the reliability and effectiveness of information transmission [1]. To ensure reliable transmission in harsh electromagnetic environments, anti-jamming technologies must be studied. Common anti-jamming techniques in communication currently include spread spectrum [2], power control [3], and rate adaptation. However, traditional anti-jamming methods often rely on fixed strategies or rules. While these methods can counteract certain types and levels of malicious jamming, predefined patterns and parameters struggle to handle carefully designed malicious attacks. How to achieve efficient communication under unknown and dynamic malicious jamming has therefore become a current research focus [4,5].
Reinforcement learning is a machine learning technique inspired by living organisms’ natural tendency to avoid harm. In reinforcement learning, the agent interacts with the external environment by performing various actions and continuously learns from the feedback provided by the environment, thereby discovering strategies that are more advantageous to itself [6]. Recently, reinforcement learning-based anti-jamming technologies for communication systems have garnered widespread attention. For example, Reference [7] models the anti-jamming problem under temporal random pulse jamming as a Markov Decision Process (MDP) and proposes a Q-learning-based temporal anti-jamming algorithm, which enables the transmitter to flexibly switch between active and silent states to evade random pulse jamming. Reference [8] proposes an alternating reinforcement learning anti-jamming algorithm for channel selection and power control under tracking jamming threats, which achieves optimal channel selection and suboptimal power control. However, real-world communication anti-jamming problems often involve large state-action spaces, leading to the “curse of dimensionality”. Classic reinforcement learning methods, which explore state-action spaces through single-step iteration, struggle to converge [9] and face challenges in solving real-time online anti-jamming decision problems in complex jamming environments.
Deep reinforcement learning (DRL) [10], with the powerful fitting capability of deep neural networks, alleviates the curse of dimensionality faced by traditional reinforcement learning, enabling effective anti-jamming in complex jamming environments. For instance, Reference [11] proposes a cross-domain anti-jamming algorithm based on Deep Q-Network (DQN), which enables mobile nodes to learn optimal strategies for position adjustment and power control in unknown dynamic jamming environments, thereby achieving reliable transmission. Reference [12] presents a deep reinforcement learning anti-jamming scheme based on the Actor-Critic (AC) framework for mobile edge networks, capable of simultaneously selecting offloading nodes, transmission power, and data rates to achieve anti-jamming data offloading. However, DRL algorithms represented by DQN adapt slowly to jamming environments, require long training times, and struggle to maintain communication reliability under unknown and rapidly changing jamming.
To overcome the aforementioned drawbacks, some researchers have explored communication anti-jamming methods based on control theory [13]. From the perspective of control theory, the normal transmission of a wireless communication system under unknown jamming is viewed as a control process affected by disturbance, with the jamming modeled as a time-varying uncertain disturbance in the control system. This approach helps to address or circumvent the limitations of existing machine learning-based anti-jamming algorithms. Reference [14] proposes a robust power control scheme for cognitive radio networks based on Lyapunov stability theory and switched affine systems. However, because it targets a specific network form, the proposed method is difficult to apply to general wireless communication networks, which limits its generality. Reference [15] introduces a stability control algorithm based on switched systems and multiple Lyapunov functions, which can adaptively adjust modulation and coding schemes as well as transmission power according to jamming conditions, thereby improving BER performance and maintaining system stability. However, in more complex electromagnetic environments, simple state feedback controllers may not provide sufficient stability guarantees or convergence speed.
Table 1 shows the advantages and disadvantages of existing algorithms.
This paper introduces optimal control and proposes a multi-parameter control anti-jamming algorithm for wireless communication systems based on system dynamic descriptions, which does not rely on known system parameters. This approach enhances transmission reliability and achieves effective anti-jamming communication in rapidly varying unknown jamming environments.
Specifically, the innovations of this paper can be summarized as follows:
To address the complexity associated with simultaneous adjustment of power and modulation coding in communication systems, this paper introduces a modeling approach based on linear switching systems. By employing stability analysis theory, control rules based on the signal-to-jamming-and-noise ratio (SJNR) are formulated, which delineate switching intervals and correspondingly match modulation and coding schemes, thereby effectively reducing system complexity.
In the subsystem, the Linear–Quadratic Regulator (LQR) is introduced for power control to achieve the rapid stabilization of the bit error rate under burst interference. Additionally, the multiple Lyapunov function method is employed to optimize stability rules by constructing a corresponding Lyapunov function for each modulation and coding scheme, thereby reducing the conservativeness of the stability rules.
The structure of this paper is organized as follows:
Section 2 describes the problem and system modeling methods;
Section 3 provides a detailed design of the feedback controller in the subsystem;
Section 4 presents the sufficient conditions for stability control rules;
Section 5 details the methodology and steps of the algorithm, along with the system flowchart;
Section 6 conducts comprehensive experiments under three types of jamming; and
Section 7 summarizes the paper and presents the conclusions.
2. Problem Description and System Modeling
The system model of this paper is illustrated in Figure 1. The wireless communication system consists of a transmitter and a corresponding receiver, with adaptive capabilities for power and modulation coding adjustment. Transmission in this system is affected by a malicious jammer whose jamming signal can effectively cover the receiver. Consider an anti-jamming control model for the wireless communication system, as shown in Figure 2. After perceiving the electromagnetic environment, the transmitter adjusts its transmission power and modulation coding based on feedback information to ensure that the receiver’s bit error rate does not exceed the preset target error rate.
For the convenience of the study, the following assumptions are made in this paper:
The transmitter’s transmission power is u(t), and the wireless communication system operates over an Additive White Gaussian Noise (AWGN) channel.
The receiver has spectrum sensing capability, allowing it to sense the power levels of jamming and noise in the channel on a time-slot basis, but it cannot determine the behavior characteristics, patterns, or probability distribution of the jamming.
Ignoring free-space propagation losses, and with both the transmission power and the power of jamming plus noise in the channel expressed in dBm, the SJNR during transmission, x(t), can be simply represented as the transmission power minus the jamming-plus-noise power.
The BER at the receiver at SJNR x(t) is y(t), with the target BER for normal system operation being yr(t).
The system has M modulation and coding schemes, corresponding to M transmission rates, where represents the system under the i-th modulation and coding combination.
Figure 3 shows the schematic curves of system BER under different modulation and channel coding combinations.
The purpose of system modeling is to describe the system using the following form of linear differential switching equations:
Here, are referred to as the dynamic characteristic coefficients, control coefficients, sensing coefficients, and direct terms of the i-th subsystem, respectively.
Choose the transmission power as the control input variable, the SJNR at the receiver as the system state variable, and the BER as the system output variable.
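As a sketch of the model structure described above, the switched system can be written in conventional state-space notation. The symbols $A_i$, $B_i$, $C_i$, and $D_i$ for the dynamic characteristic, control, sensing, and direct-term coefficients of the $i$-th subsystem are an assumption made here for illustration, as is the affine form of the output equation; $x(t)$ denotes the SJNR, $u(t)$ the transmission power, and $y(t)$ the BER output.

```latex
\begin{aligned}
\dot{x}(t) &= A_i\,x(t) + B_i\,u(t),\\
y(t) &= C_i\,x(t) + D_i, \qquad i \in \{1,\dots,M\}.
\end{aligned}
```

In this reading, the active index $i$ is selected by the modulation and coding scheme currently in use, so the model switches among $M$ linear subsystems.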
The curves of BER versus SJNR under different modulation and coding schemes are shown in
Figure 3. It can be observed that when the BER
, the BER curve generally enters the “waterfall area” and can be approximated as a straight line. Therefore, a linear equation can be used to approximate the relationship between SJNR and BER for a given modulation and coding scheme. Consequently, the system under M modulation and coding combinations can be expressed in the following piecewise function form:
In the formula, the sensing coefficient
and the direct term
are determined by the slope and intercept of the fitted line for a given modulation and coding scheme;
i represents the currently active subsystem, and
is the SJNR threshold value for the
i-th modulation and coding scheme. Since the SJNR at the receiver at time
can be expressed as
, the system state is composed of the control input
and the noise and jamming power at that time. Assuming that the received noise and jamming power can be accurately measured, it follows that:
According to Equation (3), the rate of change in the SJNR over time for the
i-th modulation and coding scheme is:
In the formula,
represents the control parameter for the
i-th modulation and coding scheme. According to Equation (2), the dynamic characteristic parameters are
and the control parameters are
. Therefore, combining Equations (1) and (4), the state equation of the system can be modeled in the following switching system form:
To simplify the calculations, define
The control rule of the wireless communication switching system divides the SJNR range into switching intervals, each corresponding to a specific modulation and coding scheme. If jamming is strong enough to shift the instantaneous SJNR from the interval of one modulation and coding scheme into that of another, the system switches the modulation and coding scheme and then adjusts the transmission power so that the desired BER is maintained. If jamming is weak and the instantaneous SJNR remains within the same interval, only the transmission power is adjusted. This is illustrated in Figure 4.
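The interval-based control rule can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the ordering of schemes from most robust to highest rate, and the convention that a boundary value belongs to the higher-rate scheme are all assumptions. The single boundary of −1.5 dB mirrors the QPSK/BPSK example used later in the simulations.

```python
from bisect import bisect_right

def select_scheme(sjnr_db, thresholds_db):
    """Return the index of the switching interval containing sjnr_db.

    thresholds_db holds ascending SJNR boundaries (dB) between adjacent
    schemes. Index 0 is the most robust scheme; higher indices are
    higher-rate schemes that need a larger SJNR (assumed ordering).
    """
    return bisect_right(thresholds_db, sjnr_db)

def control_action(prev_scheme, sjnr_db, thresholds_db):
    """Decide between a scheme switch and a power-only adjustment."""
    scheme = select_scheme(sjnr_db, thresholds_db)
    if scheme != prev_scheme:
        # Strong jamming moved the SJNR into another interval.
        return scheme, "switch scheme, then adjust power"
    # Weak jamming: the SJNR stayed in the same interval.
    return scheme, "adjust power only"

# Hypothetical two-scheme setup: index 0 = BPSK, index 1 = QPSK.
thresholds = [-1.5]
strong = control_action(prev_scheme=1, sjnr_db=-3.0, thresholds_db=thresholds)
weak = control_action(prev_scheme=1, sjnr_db=0.5, thresholds_db=thresholds)
```

Using a sorted boundary list with `bisect_right` keeps the lookup valid for any number M of schemes without changing the calling code.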
The stability control problem of the wireless communication switching system lies in how the system’s BER can quickly converge to the target value when the system perceives the electromagnetic environment and adjusts its transmission power, modulation, and coding schemes accordingly. The following section designs the anti-jamming power controller introduced in each wireless communication switching subsystem.
3. Design of the Feedback Controller
To ensure that the output BER of the wireless communication system quickly stabilizes at the target value, appropriate state feedback must be used to configure the system’s eigenvalues. Therefore, each subsystem’s feedback loop employs continuous-time linear quadratic optimal control [16], as shown in Figure 5. The goal is to design a state feedback controller that minimizes a quadratic cost function of the system state and control input, so that the output of the jammed communication system returns quickly to the target bit error rate in a manner that is optimal with respect to the performance criterion. This approach enables a fast response to external jamming while maintaining system stability.
For convenience in analyzing the switching system, let:
The system state can thus be transformed into:
where
.
In the absence of jamming, the equilibrium point of the wireless communication system is , where is the target SJNR, and is the target transmission power. Let , , where represents the system state error and represents the control input error. Then, the system dynamic error equation is .
Substituting the state feedback controller into the dynamic error equation yields:
The quadratic performance index is:
where Q represents the weight of the system state variables, and R represents the weight of the system control inputs.
Substituting Equation (10) into the cost function
yields:
The integral term complicates solving for matrix K. Therefore, it is assumed that there exists a constant matrix P such that:
Substituting Equation (13) into the cost function given by Equation (12) yields:
Simplifying Equation (14) yields:
Substituting Equation (10) into Equation (14) gives:
It can be demonstrated that for Equation (17) to be valid, the terms within the parentheses must be identically zero, which yields:
Equation (21) contains a quadratic term involving matrix K, which complicates the calculations. Additionally, since matrix P is assumed to be a constant matrix, it is therefore assumed that:
Thus, it can be obtained that:
Substituting Equation (22) into Equation (21) to eliminate the quadratic term involving K yields:
Equation (24) is the algebraic Riccati equation, that is, the degenerate form that the matrix Riccati differential equation takes when P is constant. Solving this equation yields matrix P. If a positive definite matrix P exists, the closed-loop system is stable. Matrix P can then be substituted into Equation (21) to obtain matrix K, which is the optimal state feedback gain.
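For a subsystem with a scalar state and a scalar input, the Riccati equation reduces to a quadratic in P and can be solved in closed form. The sketch below assumes error dynamics de/dt = a·e + b·v with feedback v = −k·e; the coefficients `a` and `b` are hypothetical placeholders rather than values from the paper, while the weights Q = 10 and R = 0.001 are those quoted for Simulation 1.

```python
import math

def scalar_lqr(a, b, q, r):
    """Solve the scalar algebraic Riccati equation
        2*a*p - (b*p)**2 / r + q = 0
    for its positive root p, and return (p, k) with gain k = b*p/r."""
    if b == 0 or q <= 0 or r <= 0:
        raise ValueError("need b != 0, q > 0 and r > 0")
    root = math.sqrt(a * a + b * b * q / r)
    p = r * (a + root) / (b * b)   # positive root of the quadratic
    k = b * p / r                  # optimal state feedback gain
    return p, k

a, b = -1.0, 1.0      # hypothetical subsystem coefficients (assumption)
q, r = 10.0, 0.001    # state and input weights from Simulation 1
p, k = scalar_lqr(a, b, q, r)

# Sanity checks: p satisfies the Riccati equation, and the closed-loop
# pole a - b*k lies in the left half-plane.
residual = 2 * a * p - (b * p) ** 2 / r + q
closed_loop_pole = a - b * k
```

For matrix-valued problems a numerical Riccati solver would replace the closed-form root, but the structure (solve for P, then form K from P) is the same.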
Thus, the optimal state feedback controller can be obtained as:
Due to the variable substitutions made for analytical convenience, the actual output of the current wireless communication system power controller, according to Equation (6), should be:
5. Implementation Steps and Flowchart of the Proposed Method
During the performance simulation of the proposed algorithm, the system state and input at each time step are computed based on Equation (5). To limit the errors that arise from discretizing a continuous system, the fourth-order Runge–Kutta method is employed to approximate the continuous system [18], which reproduces the continuous system’s dynamic behavior with high accuracy during time stepping. The specific steps are as follows:
At each time step, to calculate the SJNR at the next time step, the first slope is obtained from the system state at the current time. The second slope is then computed using the first slope, the third slope using the second, and the fourth slope using the third. The weighted average of these four slopes is used as an approximation of the average rate of change in SJNR over the step.
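The four-slope procedure is the classical fourth-order Runge–Kutta step, which can be sketched as follows. The function names and the test dynamics are illustrative assumptions: the example integrates a hypothetical closed-loop error equation de/dt = −lam·e, chosen because its exact solution is known, with the 0.01 s sampling interval used in the simulations.

```python
import math

def rk4_step(f, t, x, dt):
    """Advance x by one step of size dt for dx/dt = f(t, x)."""
    k1 = f(t, x)                          # slope at the current state
    k2 = f(t + dt / 2, x + dt * k1 / 2)   # midpoint slope using k1
    k3 = f(t + dt / 2, x + dt * k2 / 2)   # midpoint slope using k2
    k4 = f(t + dt, x + dt * k3)           # end-point slope using k3
    slope = (k1 + 2 * k2 + 2 * k3 + k4) / 6   # weighted average slope
    return x + dt * slope

def simulate(f, x0, t0, t_end, dt):
    """Integrate from t0 to t_end with fixed steps and return the final state."""
    steps = round((t_end - t0) / dt)
    x, t = x0, t0
    for _ in range(steps):
        x = rk4_step(f, t, x, dt)
        t += dt
    return x

lam = 5.0                                  # hypothetical decay rate (assumption)
e_final = simulate(lambda t, e: -lam * e, x0=1.0, t0=0.0, t_end=1.0, dt=0.01)
exact = math.exp(-lam * 1.0)               # analytic solution at t = 1 s
```

Comparing `e_final` with `exact` gives a quick check that the step size is small enough for the dynamics being simulated.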
Based on the above analysis, the specific algorithm flow is as follows (Algorithm 1):
Algorithm 1: Multi-parameter Control Anti-jamming Algorithm for Wireless Communication System Based on LQR
1: Initialization: set the system target bit error rate yr(t), the system state equilibrium point (xr(t), ur(t)), and the step size dt
2: for t = 1, 1 + dt, ⋯, T do
3: Calculate the current system output bit error rate by substituting the system state xi(t) into Equation (5);
4: Substitute the signal-to-jamming-and-noise ratio (SJNR) and the output of the power controller (Equation (26)) into the system state equation (Equation (5)). Then apply the fourth-order Runge–Kutta method to integrate the state equation numerically and obtain the rate of change in SJNR;
5: Multiply the rate of change by the step size dt and add the result to the current SJNR to determine the SJNR at the next time step;
6: Switch the subsystem based on the value of the SJNR, adjust the modulation and coding scheme, and modify the power accordingly;
7: t = t + dt;
8: end for
The system flowchart is shown in Figure 6.
6. Simulation Analysis
In this section, MATLAB simulations are conducted to evaluate the performance of the proposed power control algorithm within the subsystems, the impact of weights on subsystem performance, and the performance of the multi-parameter control anti-jamming algorithm for wireless communication systems based on LQR.
Simulation 1 compares the performance of the proposed power control algorithm within the wireless communication subsystem with that of the traditional power adaptive algorithm and the power control algorithm based on PID under sudden jamming conditions.
Using Binary Phase Shift Keying (BPSK) modulation and (2016, 504) Low-Density Parity-Check Code (LDPC) as examples, the BER curve, after fitting, results in a straight line
, and then the sensing coefficient
. The state weight Q is set to 10, the input weight R is set to 0.001, the simulation duration is 10 s, the sampling interval is 0.01 s, and the target BER is 10^−4.
Figure 7 illustrates the simulation comparison of BER curves for the proposed power control algorithm in the subsystem, traditional power adaptive algorithms, and power control algorithms based on PID under burst jamming conditions.
From Figure 7a, it can be observed that the communication system’s transmission channel is AWGN with a signal-to-noise ratio of 30 dB. The system experiences burst jamming with a power of 5 dBm at t = 5 s, lasting for 0.2 s. Figure 7b shows that the proposed algorithm responds rapidly to jamming, with a response speed significantly superior to that of the existing methods. The proposed algorithm converges to the target BER within 0.01 s, even while the jamming persists. Compared to the power control algorithm based on PID and traditional power adaptive methods, the proposed algorithm achieves faster convergence of the output BER to the target BER and exhibits less BER fluctuation.
Simulation 2 compares the impact of different weighting factors on the performance of the power control algorithm in the proposed subsystem.
From Figure 8a, it can be observed that the system is subjected to periodic pulse jamming with a power of 5 dBm, lasting 0.2 s with a period of 1 s. Figure 8b shows the bit error rate curve with the selected state variable weight Q = 10 and varying control input weights. As the control input weight decreases, the oscillation amplitude of the system output decreases and the response time shortens. This is because reducing the control input weight lowers the penalty on the control input, allowing the system to apply a larger control input and adjust the state variables to the target values more rapidly.
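The effect of the control input weight on response time can be made concrete for a scalar subsystem. As a hedged illustration, assume error dynamics $\dot{e} = a\,e + b\,v$ with feedback $v = -K e$; the coefficients $a$ and $b$ are placeholder symbols introduced here, not values from the paper. The scalar algebraic Riccati equation then gives the closed-loop pole and time constant in closed form:

```latex
\begin{aligned}
2aP - \frac{b^{2}P^{2}}{R} + Q = 0
\;&\Rightarrow\;
P = \frac{R\left(a + \sqrt{a^{2} + b^{2}Q/R}\right)}{b^{2}},
\qquad K = \frac{bP}{R},\\
\lambda_{\mathrm{cl}} &= a - bK = -\sqrt{a^{2} + \frac{b^{2}Q}{R}},
\qquad \tau = \frac{1}{\lvert\lambda_{\mathrm{cl}}\rvert}.
\end{aligned}
```

With $Q = 10$ and, purely for illustration, $a = 0$ and $b = 1$, the choice $R = 0.1$ gives $\lambda_{\mathrm{cl}} = -10$ ($\tau = 0.1$ s), while $R = 0.001$ gives $\lambda_{\mathrm{cl}} = -100$ ($\tau = 0.01$ s). A smaller control input weight therefore moves the closed-loop pole further into the left half-plane and shortens the time constant, at the cost of a larger gain $K$.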
In Simulation 3, the proposed multi-parameter control anti-jamming algorithm for wireless communication systems based on LQR is compared, under random pulse jamming, with the power control algorithm based on LQR for unmodeled switching systems and with the multi-parameter control anti-jamming algorithm based on PID.
Assume that the system initially uses Quadrature Phase Shift Keying (QPSK) modulation and (2016, 504) LDPC coding for information transmission. The system is capable of switching between different modulation and coding schemes freely when exposed to external jamming, while maintaining a high transmission rate in the absence of jamming or when jamming is minimal.
Using QPSK modulation and BPSK modulation, with both using (2016, 504) LDPC coding, as examples of switching between two subsystems, the BER curves, after fitting, are represented by straight lines
and
, respectively. Therefore, based on the system modeling conclusions, the linear state equations for these two subsystems can be obtained as follows:
From Figure 3, it can be observed that when the BER of the wireless communication system reaches the target value of 10^−4, the signal-to-jamming-plus-noise ratio of the system under QPSK modulation is −1.5 dB. Furthermore, when the SJNR falls below this value, the bit error rate of the system under QPSK modulation is greater than the target value of 10^−4, while the BER under BPSK modulation remains within the target range. Therefore, to ensure that the system’s BER remains within the target value while switching to the optimal modulation and coding scheme, the switching intervals for the two modulation and coding schemes are set as
and
, respectively. The system’s state-space equations are then given by:
Figure 9 presents a simulation comparison of the BER curves for the proposed algorithm, the power control algorithm based on LQR for unmodeled switching systems, and the multi-parameter control anti-jamming algorithm for wireless communication systems based on PID under random jamming conditions.
From Figure 9a, it can be observed that the communication system’s transmission channel is AWGN with a signal-to-noise ratio of 30 dB, and the channel is subjected to random pulse jamming with power levels of 1 dBm and 5 dBm. Figure 9b shows the BER curves of each system as a function of time. The analysis reveals that the power-controlled communication system without the switching mechanism experiences a sudden increase in BER under higher-power jamming, peaking at 10^−3.833, which significantly affects the transmission reliability of the communication system. The BER of the multi-parameter control anti-jamming algorithm for wireless communication systems based on PID shows considerable fluctuation and a slower response. In contrast, the BER of the proposed algorithm almost never exceeds the target bit error rate.