Article

A CNN-Based Adaptive Federated Learning Approach for Communication Jamming Recognition

Ningsong Zhang, Yusheng Li, Yuxin Shi and Junren Shen
1 The Sixty-Third Research Institute, National University of Defense Technology, Nanjing 210007, China
2 School of Electronic Science, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(16), 3425; https://doi.org/10.3390/electronics12163425
Submission received: 4 July 2023 / Revised: 8 August 2023 / Accepted: 10 August 2023 / Published: 13 August 2023
(This article belongs to the Special Issue Multi-Scale Communications and Signal Processing)

Abstract

The effective and accurate recognition of communication jamming is crucial for enhancing the anti-jamming capability of wireless communication systems. At present, a significant portion of jamming data is decentralized, stored in local nodes, and cannot be uploaded directly for network training due to its sensitive nature. To address this challenge, we introduce a novel distributed jamming recognition method. This method leverages a distributed recognition framework to achieve global optimization through federated learning. Each node independently trains its local model and contributes to the comprehensive global model. We have devised an adaptive adjustment mechanism for the mixed weight parameters of both local and global models, ensuring an automatic balance between the global model and the aggregated insights from local data across devices. Simulations indicate that our personalization strategy yields a 30% boost in accuracy, and the adaptive weight parameters further enhance the recognition accuracy by 1.1%.

1. Introduction

The openness of the wireless channel makes wireless communication vulnerable to jamming [1,2,3,4]. To address this issue, many anti-jamming methods have been proposed, such as direct sequence spread spectrum and frequency hopping, and deep reinforcement learning has also been employed for anti-jamming decision-making [5,6,7]. Notably, the premise of effective anti-jamming is correct identification of the jamming signal. However, several fundamental challenges hinder effective jamming recognition:
  • Diverse Signal Characteristics: Jamming signals can exhibit diverse characteristics depending on the jammer’s technology and strategy. Identifying these varied signals consistently is challenging.
  • Data Limitations: Accurate recognition often requires a substantial amount of data for training machine learning models. However, in real-world scenarios, obtaining a vast and varied dataset is a challenge.
  • Distributed Data: The data is decentralized and stored in local nodes, and cannot be directly uploaded for network training due to the sensitivity of communication jamming data.

1.1. Related Works

Traditional jamming recognition in wireless communication mainly focuses on feature extraction and classifier design. In [8], the author exploited the theory of compressed sensing to extract the spectral features of jamming signals. In [9], the author made full use of spectrum fusion and anti-jamming prior information, combining the idea of a dual support vector machine (SVM) with solution-oriented hierarchical classification to reduce the online time complexity while maintaining high recognition accuracy. In [10], the author first performed variational mode decomposition on the jamming signal and then extracted features from the decomposed intrinsic mode functions (IMFs); the extracted features were then fed into an SVM for recognition. In [11], five characteristic parameters were obtained by wavelet decomposition, and the recognition accuracy reached 90% when the jamming-to-signal ratio (JSR) was greater than 5 dB. In [12], the author converted the jamming signal into a signal feature space and carried out feature extraction based on Hilbert signal space theory. In [13], the author proposed a fuzzy jamming recognition method based on the complexity of the received signal, using the sequence complexity and the box dimension of the received signal as classification features of the jamming pattern. In [14], the author proposed a jamming recognition scheme based on a small-data-driven naive Bayes classifier, employing data augmentation to expand the dataset and achieve better performance.
However, all the methods mentioned above depend on establishing threshold values, which are significantly affected by the channel environment and risk omitting valuable information; they also require manual feature design. Meanwhile, convolutional neural networks (CNNs) have made extraordinary achievements, significantly improving the state of the art. Owing to their potent nonlinear representation and automatic feature extraction capabilities, CNNs have been widely employed in various domains, including network security [15,16], defect detection [17], surveillance security [18], and communication security [19].
A CNN for the classification of wireless modulation signals was proposed in [20] to automatically extract signal features, yielding improved performance compared with traditional feature extraction methods. In [21,22], CNNs were designed for radar jamming signal recognition under the condition of limited training samples. In [23], the author proposed a multi-feature fusion network based on the fractional Fourier transform (FRFT); by integrating the local and global features of the fractional domain of jamming signals and incorporating an attention mechanism, the network enhances its ability to identify notable features in signals.
Despite a rich body of literature on machine learning-based jamming recognition, most research relies on a single node, which is easily affected by channel fading. Jamming recognition based on distributed machine learning has therefore attracted growing research interest due to its advantages, such as accommodation of massive numbers of nodes and the reliability and robustness of its recognition performance. In [24], the author used multiple nodes at different directions and locations to recognize jamming, aggregating the data from all nodes at a center node to achieve good performance. However, in the real world, due to the sensitivity of communication jamming data, the data is decentralized, stored in local nodes, and cannot be directly uploaded for network training.
Federated learning (FL), whose key idea is to distribute training across multiple clients that hold local data, is one approach to overcoming the above challenges [25,26,27,28]. In [29], the author proposed a federated meta (FedMeta) learning approach for the recognition of communication jamming, employing a distributed recognition architecture to learn the global model by federated learning, which achieves excellent recognition performance. In [30], the author proposed model-contrastive federated learning (MOON), a simple and effective federated learning framework. The main premise of MOON is to exploit the similarity among model representations to correct the local training of each individual party; this correction is attained by applying contrastive learning at the model level.
However, when the heterogeneity among local data is large (i.e., the global and local optimal models may drift significantly), the global model obtained by minimizing the joint empirical risk may not be universal and is hard to converge. Specifically, the statistical heterogeneity of data across user devices degrades the recognition performance of federated learning. In [31], the author proposed a personalization-layer method for federated training (FedPer), which can tackle the obstacle of statistical heterogeneity to some extent; however, it neglects the fusion of the local models and the global model.

1.2. Novelty and Main Contributions

In light of the above challenges and existing research, our objective is to craft an adaptive federated learning (AFL) framework that boasts high accuracy and reliability in detecting malicious jamming signals. To this end, our primary contributions are:
  • Firstly, we introduce a deep CNN-based federated learning framework for jamming recognition. This allows each local node model not only to acquire valuable information from the central node but also to contribute its gradient information.
  • Secondly, we conceptualize an adaptive adjustment mechanism for the mixed weight parameter of both the local and global models. This can autonomously strike a balance between the global model and the collective knowledge of local data across devices.
  • Thirdly, we demonstrate the robustness of parameter α and the superiority of our algorithm through experiments.

2. System Model

In a standard federated learning scenario, the primary objective is to jointly learn a global model across all devices cooperatively. The system model is shown in Figure 1. There is a fusion center with N nodes, where each node has an independent jamming database and only has access to its local data. At the beginning of each round, a random fraction of nodes is selected, and the center node sends the initial global model state to each of these nodes. Afterwards, the selected nodes perform local computations based on the global state and their local datasets and send updates to the center node. The center node then applies these updates to its global state, and the process repeats.
The signal $Y_i(t)$ received by the $i$-th node is expressed as:
$$Y_i(t) = S_i(t) + J_i(t) + N_i(t),$$
where $S_i(t)$, $J_i(t)$, and $N_i(t)$ denote the quadrature phase shift keying (QPSK) signal, the jamming signal, and Gaussian white noise, respectively.
Assume that $J_i(t)$ takes one of eight common malicious jamming patterns: single tone jamming, multi tone jamming, narrow band jamming, broadband jamming, comb frequency jamming, common sweeping frequency jamming, sawtooth sweeping frequency jamming, and Gaussian pulse jamming.
Single tone jamming is emitted at a single frequency and can be expressed as:
$$J(t) = \sqrt{P_J}\cos\left(2\pi f_J t + \theta_J\right), \quad \theta_J \in (0, 2\pi),$$
where $P_J$, $f_J$, and $\theta_J$ denote the power, frequency, and initial phase of the jamming, respectively.
Multi-tone jamming is the superposition of multiple single tone components at different frequency points, which can be expressed as:
$$J(t) = \sum_{i=1}^{N_j} A_i \exp\left[j\left(2\pi f_i t + \varphi_i\right)\right],$$
where $N_j$ denotes the number of frequency points, and $A_i$, $f_i$, and $\varphi_i$ denote the amplitude, frequency, and initial phase of the $i$-th component, respectively.
Narrow band jamming can be regarded as Gaussian white noise passed through a narrowband bandpass filter, i.e., a bandlimited noise signal. The frequency domain expression of the narrowband filter is:
$$H(j2\pi f) = \begin{cases} 1, & \left|f - f_J\right| \le \frac{W_I}{2} \\ 0, & \text{otherwise}, \end{cases}$$
where $f_J$ denotes the center frequency of the bandpass filter and $W_I$ denotes the bandwidth.
Broadband jamming can be obtained by passing Gaussian white noise through an ideal wideband filter, whose frequency domain expression is:
$$H(j2\pi f) = \begin{cases} 1, & |f| \le \frac{W_I}{2} \\ 0, & \text{otherwise}. \end{cases}$$
Comb frequency jamming is a set of narrow band jamming signals modulated over a series of frequency points; it is obtained by passing Gaussian white noise through several narrowband filters to produce several blocking signals within narrow bands. The frequency domain of its filter can be expressed as:
$$H(j2\pi f) = \begin{cases} 1, & \left|f - f_{J,i}\right| \le \frac{W_{I,i}}{2} \\ 0, & \text{otherwise}, \end{cases}$$
where $f_{J,i}$ and $W_{I,i}$ denote the center frequency and bandwidth of the $i$-th filter, respectively.
Common sweeping frequency jamming refers to jamming whose frequency changes linearly and continuously over time, which can be expressed as:
$$J(t) = \sqrt{P_J}\exp\left[j\left(2\pi f_0 t + \pi k t^2 + \varphi\right)\right],$$
where $f_0$ denotes the starting frequency of the sweep and $k$ denotes the modulation slope.
The sawtooth sweeping frequency jamming utilizes a narrow band signal to scan the frequency band within one sweep cycle, which can be expressed as:
$$J(t) = U_j X(t) \cos\left(2\pi f_j t + \theta(t) + \varphi\right),$$
where $U_j$ denotes the amplitude and $X(t)$ denotes a narrow band signal. The instantaneous frequency of the jamming signal can be expressed as:
$$f(t) = 2\pi f_j + \frac{d\theta(t)}{dt} = 2\pi f_j + F(t),$$
where $F(t)$ is the sweep frequency function, satisfying:
$$F(t) = 2\pi k t, \quad nT_f \le t \le (n+1)T_f, \quad n = 0, 1, 2, \ldots,$$
where $k$ and $T_f$ denote the frequency sweep slope and frequency sweep period, respectively.
Gaussian pulse jamming is a typical time-domain jamming with sudden occurrence, short duration, and high interference power, which can be expressed as:
$$J(t) = \begin{cases} p(t), & 0 < t \le \tau_p \\ 0, & \tau_p < t < T_p, \end{cases}$$
where $\tau_p$ denotes the pulse duration, $T_p$ denotes the pulse period, and $p(t)$ denotes Gaussian noise with zero mean and variance $\sigma_p^2$.
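To make the signal models above concrete, the following is a minimal NumPy sketch of three of the eight patterns (single tone, multi tone, and narrow band). The parameter values, the random-phase convention, and the complex baseband representation are illustrative assumptions, not the exact generator used in our simulations.

```python
import numpy as np

rng = np.random.default_rng(0)

def single_tone(t, p_j, f_j):
    """Single tone jamming: sqrt(P_J)*cos(2*pi*f_J*t + theta_J), theta_J ~ U(0, 2*pi)."""
    theta_j = rng.uniform(0.0, 2.0 * np.pi)
    return np.sqrt(p_j) * np.cos(2.0 * np.pi * f_j * t + theta_j)

def multi_tone(t, amps, freqs):
    """Multi tone jamming: superposition of complex tones at N_j frequency points."""
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(freqs))
    return sum(a * np.exp(1j * (2.0 * np.pi * f * t + ph))
               for a, f, ph in zip(amps, freqs, phases))

def narrow_band(n, fs, f_j, w_i):
    """Narrow band jamming: white Gaussian noise masked by the ideal filter H(j2*pi*f)."""
    noise = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    spec = np.fft.fft(noise)
    f = np.fft.fftfreq(n, d=1.0 / fs)
    spec[np.abs(f - f_j) > w_i / 2.0] = 0.0  # keep only |f - f_J| <= W_I / 2
    return np.fft.ifft(spec)

fs, n = 10e6, 1024                # 10 MHz sampling rate, as in Section 4.1
t = np.arange(n) / fs
j_single = single_tone(t, p_j=1.0, f_j=2.5e6)
j_multi = multi_tone(t, amps=[1.0] * 4, freqs=[1.0e6, 2.0e6, 3.0e6, 4.0e6])
j_narrow = narrow_band(n, fs, f_j=2.5e6, w_i=0.3e6)
```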
For each node, the input layer performs a 128-point FFT on the received signal $Y_i(t)$ and stacks the real and imaginary parts to obtain a $1 \times 128 \times 2$ matrix, which serves as the input of the neural network.
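A minimal sketch of this input mapping is given below. Only the 128-point FFT and the stacking of real and imaginary parts are specified above; the normalization and the placeholder signal are assumptions added for illustration.

```python
import numpy as np
import torch

def to_network_input(y):
    """Map a received signal Y_i(t) to the 1 x 128 x 2 input described above."""
    spec = np.fft.fft(y, n=128)                       # 128-point FFT
    feat = np.stack([spec.real, spec.imag], axis=-1)  # splice real/imag -> (128, 2)
    feat = feat / (np.abs(feat).max() + 1e-12)        # illustrative normalization
    return torch.tensor(feat, dtype=torch.float32).unsqueeze(0)  # (1, 128, 2)

y = np.random.default_rng(1).standard_normal(1024)   # stand-in for Y_i(t)
x = to_network_input(y)                               # shape: torch.Size([1, 128, 2])
```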

3. Proposed Algorithm

We define the optimization problem in federated learning as federated optimization, in contrast to distributed optimization. Federated optimization has several important attributes that distinguish it from a typical distributed optimization problem:
  • Non-IID: The jamming signals collected by each cognitive node are different due to the different positions of cognitive nodes. As a result, any particular node’s local dataset may not be representative of the global distribution.
  • Unbalanced: Some cognitive nodes have jamming signals and some do not, leading to varying amounts of local training data.
  • Limited communication: Cognitive nodes are on slow or expensive connections.
In a federated learning setting, each node only has access to its own data distribution $D_i$ over the domain $\mathcal{X} \times \mathcal{Y}$, where $\mathcal{X} \subseteq \mathbb{R}^d$ denotes the signal domain and $\mathcal{Y}$ denotes the label domain. For any hypothesis $h \in \mathcal{H}$, the true risk under the local distribution is denoted by:
$$L_{D_i}(h) = \mathbb{E}_{(x,y)\sim D_i}\left[\ell\left(h(x), y\right)\right].$$
The average distribution over all nodes is denoted by:
$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i.$$
The goal of federated learning is to attain an optimal combination of the global model and the local models. The global model is trained by minimizing the empirical risk over $\bar{D}$:
$$\bar{h}^* = \arg\min_{h \in \mathcal{H}} \hat{L}_{\bar{D}}(h).$$
Meanwhile, each node trains a local model with the help of the global model, with mixing weight $\alpha_i$:
$$\hat{h}_{loc,i}^* = \arg\min_{h \in \mathcal{H}} \hat{L}_{D_i}\left(\alpha_i h + \left(1 - \alpha_i\right)\bar{h}^*\right).$$
Finally, the local model of the $i$-th node is:
$$h_{\alpha_i} = \alpha_i \hat{h}_{loc,i}^* + \left(1 - \alpha_i\right)\bar{h}^*.$$
The local nodes' key motivation to participate in federated learning is to reduce their local generalization error by exploiting the data contributed by other nodes. Ideally, each local node should make full use of the global model to overcome the limitation of having only limited local training data, while minimizing the impact of the differences between its own local data and the data provided by other devices. Obviously, the global model is ill-suited to serve directly as a local model when the local distribution is not strongly correlated with the global distribution.
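The mixed model in the equations above is a plain convex combination of parameters. A minimal sketch, assuming models are exchanged as PyTorch state dictionaries (a toy one-tensor "model" is used for illustration):

```python
import torch

def mix_models(local_state, global_state, alpha):
    """Per-node mixing h_alpha = alpha * h_loc + (1 - alpha) * h_bar."""
    return {k: alpha * local_state[k] + (1.0 - alpha) * global_state[k]
            for k in local_state}

local_state = {"w": torch.tensor([1.0, 2.0])}
global_state = {"w": torch.tensor([3.0, 4.0])}
mixed = mix_models(local_state, global_state, alpha=0.25)  # {"w": tensor([2.5, 3.5])}
```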

3.1. CNN Network Structure

The structural parameters of the CNN are given in Table 1. The whole network consists of four convolutional modules, followed by a max pooling layer and a fully connected layer that produces the final classification output. In order to extract more features, the convolution modules use progressively larger numbers of convolution kernels: 16, 32, 32, and 64, respectively, as listed in Table 1.
Figure 2 shows the structure of the convolutional module. In the convolutional module, ‘conv’ denotes the convolutional layer, ‘K’ denotes the number of convolutional kernels, and ‘S’ denotes equal-width convolution. The convolutional module contains a convolutional layer, activation layer, dropout layer, and max pooling layer.
The convolution layer performs a convolution operation on input data to extract features of the signal. Different convolution kernels are equivalent to different feature extractors. At present, the overall structure of convolutional networks tends to use smaller convolutional kernels (such as 1 × 1 or 3 × 3) and deeper structures. The convolution layers in this paper all adopt a 3 × 3 convolution structure.
The activation layer applies a nonlinear transformation, usually to the hidden layer outputs of the neural network. It performs a nonlinear mapping of the input and adds nonlinearity at each neuron, giving the network strong expressive ability. In this paper, ReLU is used as the activation function of each hidden layer:
$$f(x) = \mathrm{relu}(x) = \begin{cases} x, & x \ge 0 \\ 0, & \text{otherwise}. \end{cases}$$
The dropout layer randomly drops neurons with a certain probability $p$ during training, allowing the network to learn a more robust feature representation and avoid overfitting. During forward propagation, each dropped neuron's output is multiplied by 0; in back-propagation, the gradients of the dropped neurons are also set to 0 so that they do not influence the update. The dropout layer thus improves the generalization and robustness of the network on test data. In this paper, $p$ is set to 0.5.
The pooling layer significantly reduces the dimensions of the feature map and the number of parameters to be trained, helping to avoid overfitting. Pooling layers mainly include max pooling and average pooling. The max pooling layer focuses on the strongest response of certain features in the signal, which better preserves the frequency information in the signal. Hence, we employ max pooling in this paper.
The fully connected layer is generally located at the last layer of a classification network. It fully connects the feature maps output by the previous layer and converts them into the target classification results. In this paper, the neural network outputs the final classification score through the fully connected layer.
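A PyTorch sketch of this network is given below, following the module sequence of Figure 2 and the output dimensions of Table 1. The 'same' padding and the (2, 1) pooling shape are inferred from those dimensions and should be read as assumptions.

```python
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    """Convolutional module of Figure 2: 3x3 'same' conv -> ReLU -> dropout -> max pool."""
    def __init__(self, in_ch, out_ch, p=0.5):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # equal-width conv
            nn.ReLU(),
            nn.Dropout(p),
            nn.MaxPool2d(kernel_size=(2, 1)),  # halves the 128-point axis, keeps width 2
        )

    def forward(self, x):
        return self.block(x)

class JammingCNN(nn.Module):
    """Recognition network following Table 1 (kernel counts 16, 32, 32, 64)."""
    def __init__(self, num_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            ConvModule(1, 16),                 # (128, 2, 1)  -> (64, 2, 16)
            ConvModule(16, 32),                # (64, 2, 16)  -> (32, 2, 32)
            ConvModule(32, 32),                # (32, 2, 32)  -> (16, 2, 32)
            ConvModule(32, 64),                # (16, 2, 32)  -> (8, 2, 64)
            nn.MaxPool2d(kernel_size=(2, 1)),  # (8, 2, 64)   -> (4, 2, 64)
        )
        self.classifier = nn.Linear(4 * 2 * 64, num_classes)

    def forward(self, x):                      # x: (batch, 1, 128, 2)
        return self.classifier(self.features(x).flatten(1))

logits = JammingCNN()(torch.randn(10, 1, 128, 2))  # -> (10, 8) class scores
```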

3.2. Adaptive Federated Learning

Our proposed federated learning method is based on adequately mixing the optimal global model and a slightly modified local model. Meanwhile, the per-device mixing parameter $\alpha_i$ is of great significance for the generalization ability of the global and local models. In this subsection, we present our adaptive federated learning algorithm for learning the local models and the global model, which is shown in Algorithm 1.
Algorithm 1: Adaptive Federated Learning.
Input: mixture weights $\alpha_1, \ldots, \alpha_m$; local step count $\tau$; a set of $N$ candidate nodes
Center executes:
    initialize $w_0$
    for communication round $t = 1, 2, \ldots$ do
        $m \leftarrow \max(C \cdot N, 1)$
        $S_t \leftarrow$ (random set of $m$ nodes)
        for each node $n \in S_t$ in parallel do
            $w_{t+1}^n \leftarrow$ NodeExecutes$(n, w_t)$
        $m_t \leftarrow \sum_{n \in S_t} m_n$
        $w_{t+1} \leftarrow \sum_{n \in S_t} \frac{m_n}{m_t} w_{t+1}^n$
NodeExecutes$(n, w)$:
    for each local epoch $j$ from 1 to $\tau$ do
        for iteration $i$ from 1 to $B$ do
            $w_n^{(i)} = w_n^{(i-1)} - \eta_i \nabla f_n\left(w_n^{(i-1)}; \xi_n^i\right)$
            $v_n^{(i)} = v_n^{(i-1)} - \eta_i \nabla_v f_n\left(\bar{v}_n^{(i-1)}; \xi_n^i\right)$
            $\bar{v}_n^{(i)} = \alpha_n v_n^{(i)} + \left(1 - \alpha_n\right) w_n^{(i)}$
            $\alpha_n^{(i)} = \alpha_n^{(i-1)} - \eta_i \nabla_\alpha f_n\left(\bar{v}_n^{(i-1)}; \xi_n^i\right)$
    return $w_n^{(i)}$
We let every hypothesis $h \in \mathcal{H}$ be denoted by a vector $w \in \mathbb{R}^d$ and parameterize the empirical risk at the $i$-th device by a local loss function. Adaptive federated learning can be regarded as a dual-phase optimization problem: update the shared model globally and update each node's model locally. The fusion center solves the optimization problem:
$$\min_{w \in \mathbb{R}^d} F(w) := \frac{1}{n}\sum_{i=1}^{n} f_i(w), \quad f_i(w) := \mathbb{E}_{\xi_i}\left[f_i\left(w, \xi_i\right)\right],$$
where $f_i(\cdot)$ denotes the local loss function at the $i$-th node, $\xi_i$ denotes a minibatch of the data at the $i$-th node, and $n$ denotes the total number of nodes. We also need to learn a local model that optimizes the local empirical risk. In this case, each node solves the following optimization problem over its local data:
$$\min_{v \in \mathbb{R}^d} f_i\left(\alpha_i v + \left(1 - \alpha_i\right) w^*\right),$$
where $w^* = \arg\min_w F(w)$ denotes the optimal global model. The mixing parameter $\alpha_i$ controls the equilibrium between the two models and reflects the distinctiveness of the local model relative to the global model.
The center node randomly selects $m$ nodes as a set $S_t$. Each selected node maintains three models at iteration $i$: the global model $w_n^{(i)}$, the local model $v_n^{(i)}$, and the mixed model $\bar{v}_n^{(i)}$. Subsequently, the chosen nodes locally execute the following updates on their respective data:
$$w_n^{(i)} = w_n^{(i-1)} - \eta_i \nabla f_n\left(w_n^{(i-1)}; \xi_n^i\right),$$
$$v_n^{(i)} = v_n^{(i-1)} - \eta_i \nabla_v f_n\left(\bar{v}_n^{(i-1)}; \xi_n^i\right),$$
where $\nabla f_n(\cdot\,; \xi)$ denotes the gradient of $f_n(\cdot)$ evaluated at mini-batch $\xi$. Based on the latest iterates of the global model and the corresponding local model, we recalibrate the mixed model $\bar{v}_n^{(i)}$ accordingly. After $\tau$ local steps, the selected nodes send their individual versions of the global model $w_n^{(i)}$ to the center node for aggregation by averaging:
$$w_{t+1} \leftarrow \sum_{n \in S_t} \frac{m_n}{m_t} w_{t+1}^n.$$
Afterwards, the center node chooses another set of $m$ nodes for the ensuing round of training and disseminates the updated model to these selected nodes.
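A simplified PyTorch sketch of one NodeExecutes call and the center-side aggregation is given below. The gradients with respect to $v$ and $\alpha$ are expanded via the chain rule ($\nabla_v f(\bar{v}) = \alpha \nabla f(\bar{v})$ and $\nabla_\alpha f(\bar{v}) = \langle \nabla f(\bar{v}), v - w \rangle$); the clipping of $\alpha$ to $[0, 1]$ and the compressed iterate ordering are simplifying assumptions relative to Algorithm 1.

```python
import copy

import torch
import torch.nn as nn

def node_executes(global_model, local_model, alpha, loader, tau=2, lr=0.01):
    """Simplified NodeExecutes(n, w): updates a copy of the global model w,
    the personalized model v (= local_model), and the mixing weight alpha."""
    w = copy.deepcopy(global_model)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(tau):
        for x, y in loader:
            # (1) step on the global-model copy: w <- w - lr * grad f_n(w)
            g_w = torch.autograd.grad(loss_fn(w(x), y), list(w.parameters()))
            with torch.no_grad():
                for p, g in zip(w.parameters(), g_w):
                    p -= lr * g
            # (2) form the mixed model v_bar = alpha * v + (1 - alpha) * w
            v_bar = copy.deepcopy(w)
            with torch.no_grad():
                for pb, pv, pw in zip(v_bar.parameters(),
                                      local_model.parameters(), w.parameters()):
                    pb.copy_(alpha * pv + (1.0 - alpha) * pw)
            g_bar = torch.autograd.grad(loss_fn(v_bar(x), y),
                                        list(v_bar.parameters()))
            with torch.no_grad():
                # (3) local-model step: grad_v f(v_bar) = alpha * grad f(v_bar)
                for pv, g in zip(local_model.parameters(), g_bar):
                    pv -= lr * alpha * g
                # (4) alpha step: grad_alpha f(v_bar) = <grad f(v_bar), v - w>
                g_alpha = sum((g * (pv - pw)).sum() for g, pv, pw in
                              zip(g_bar, local_model.parameters(), w.parameters()))
                alpha = float(min(max(alpha - lr * g_alpha.item(), 0.0), 1.0))
    return w.state_dict(), alpha

def aggregate(states, sizes):
    """Center aggregation: w_{t+1} = sum_n (m_n / m_t) * w_{t+1}^n."""
    m_t = float(sum(sizes))
    return {k: sum((m / m_t) * s[k] for s, m in zip(states, sizes))
            for k in states[0]}
```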

3.3. Parameter α Update

The goal of adaptive federated learning is to find the optimal parameter $\alpha$ that balances the global and local models. Generally, when the local and global data distributions are well-aligned, the optimal $\alpha$ should be small, as this enables each node to learn more from the data of other devices. Conversely, when the local and global distributions drift significantly, the optimal $\alpha$ should be set close to 1. In practice, we do not know the distance between a user's distribution and the average distribution, so the optimal $\alpha$ cannot be determined in advance. However, we can infer it automatically during optimization by taking one gradient descent step at every communication round:
$$\alpha_n^{(i)} = \alpha_n^{(i-1)} - \eta_i \nabla_\alpha f_n\left(\bar{v}_n^{(i-1)}; \xi_n^i\right),$$
which shows that the mixing coefficient $\alpha$ is updated according to the difference between the local and global models. Evidently, when the local and global models are very close to each other, $\alpha$ tends to be stable.
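As a small sanity check of this update rule, the $\alpha$-gradient of a loss evaluated at the mixed model $\bar{v} = \alpha v + (1 - \alpha) w$ equals $\langle \nabla f(\bar{v}), v - w \rangle$, which autograd reproduces directly (the quadratic loss here is purely illustrative):

```python
import torch

v, w = torch.tensor([1.0, 0.0]), torch.tensor([0.0, 1.0])
alpha = torch.tensor(0.3, requires_grad=True)
v_bar = alpha * v + (1 - alpha) * w   # mixed model
(v_bar.pow(2).sum() / 2).backward()   # f(u) = ||u||^2 / 2, so grad f(u) = u
print(alpha.grad)                     # -0.4, i.e. <v_bar, v - w>
```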

4. Simulation

4.1. Experiment Setup

The jamming dataset consists of the eight jamming signals along with QPSK-modulated communication signals. We set the sampling rate $F_s = 10$ MHz and the center frequency of the QPSK carrier $f_c = 2.5$ MHz. The channel model is additive Gaussian white noise with an SNR of 10 dB. The JSR is set from $-15$ dB to 10 dB with an interval of 1 dB. The center frequency of each jamming signal is randomly set near $f_c$.
The number of frequency points of multi tone jamming is randomly set from four to eight. The bandwidth of narrow band jamming is set from 0.25 to 0.3 MHz, and the bandwidth of broadband jamming is set from 2.5 to 3 MHz. Comb jamming combines four narrow band jamming signals, with a gap of 1 MHz between adjacent frequency points. For the common sweeping frequency jamming, the sweep speed is randomly set from 1 to 10 THz/s. Different from common sweeping frequency jamming, the sawtooth sweeping frequency jamming employs narrow band jamming signals to scan the frequency band periodically. For Gaussian pulse jamming, the pulse width is set to 0.985 μs and the duty cycle to 0.125.
For all experiments, we have a total of 52,000 jamming samples, which are unevenly distributed among 100 nodes, so each node performs a few-shot learning task. All computations are implemented in PyTorch 1.14 on an NVIDIA RTX 3060 Ti GPU. We employ a linear decay schedule for the learning rate, decreasing it by 1% each round. The batch size for each node is set to 10 and $\tau$ is set to 2.
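For concreteness, a sketch of an uneven sample partition and the learning-rate schedule follows. The Dirichlet split is our own assumption (the text specifies only that the 52,000 samples are unevenly distributed), as is reading the 1% decrease as a per-round linear decay from the initial rate.

```python
import numpy as np

rng = np.random.default_rng(42)

# Unevenly partition 52,000 sample indices over 100 nodes (illustrative Dirichlet split).
n_samples, n_nodes = 52_000, 100
shares = rng.dirichlet(np.full(n_nodes, 0.5))
counts = np.floor(shares * n_samples).astype(int)
counts[0] += n_samples - counts.sum()  # absorb the rounding remainder
node_indices = np.split(rng.permutation(n_samples), np.cumsum(counts)[:-1])

def lr_at_round(lr0, t):
    """Linear decay: the learning rate drops by 1% of its initial value each round."""
    return lr0 * max(1.0 - 0.01 * t, 0.0)
```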

4.2. Experiment Result

We first investigate the effect of the number of nodes involved in the computation on the performance of the algorithm. We run the same experiments with different sampling rates C for the jamming signal dataset.
Figure 3 shows that although the model converges more slowly when the sampling rate is low, the final recognition result is barely affected: additional communication rounds ensure sufficient information exchange between nodes, compensating for the smaller number of participating computing nodes.
In Figure 4, we examine the effect of the initial value of the adaptation parameter $\alpha$, running experiments with $\alpha \in \{0.25, 0.5, 0.75\}$. For jamming data that is highly non-IID, a larger $\alpha$ is preferred. However, even with the initial value set to $\alpha = 0.25$, our proposed algorithm still achieves 97.8% recognition accuracy, because AFL dynamically adjusts the weights of the local and global models based on the current training loss. This demonstrates the robustness of our algorithm.
In Figure 5, the average accuracies of FedMeta, MOON, FedPer, and AFL are compared. MOON and FedMeta perform poorly relative to the other two algorithms, and their curves fluctuate intensely. This is because they learn only the parameters of the global model and ignore the particularity of each local node's weights; consequently, when each node has little data that is not identically distributed, their recognition performance degrades. Compared with FedPer, AFL considers the weight of the local model and dynamically adjusts the mixing weight between the local and global models; hence, it achieves the best performance.
Figure 6 shows the confusion matrix of FedMeta. Classes J1 to J8 represent the eight jamming patterns introduced above. The algorithm easily confuses single tone and multi tone jamming with each other, because multi tone jamming is a combination of several single tone frequencies. In addition, FedMeta is prone to identifying narrowband jamming as comb frequency jamming, and its performance on sawtooth sweeping frequency and Gaussian pulse jamming is disastrous, because the training of each local node is severely influenced by the data of other nodes.
Figure 7 shows the confusion matrix of FedPer. Its performance across the various jamming patterns is significantly better than FedMeta's; the personalized approach in our study is precisely based on this algorithm. During training, the algorithm thoroughly considers the uniqueness of the data at each local node, and it therefore surpasses FedMeta by a considerable margin. However, it tends to confuse single tone jamming with multi tone jamming, since multi tone jamming is composed of single tone components. Moreover, it often misidentifies multi tone jamming as comb jamming, because comb jamming is made up of narrowband jamming across multiple frequency bands, and the narrower the bandwidth, the closer its spectral characteristics are to multi tone jamming.
Figure 8 shows the confusion matrix of AFL. Our proposed method performs best, with an average recognition accuracy of more than 95% for each type of jamming; the average accuracy of AFL is 30.1% higher than FedMeta's. Its recognition of sawtooth sweep jamming is inferior to FedPer's, but for the other seven types of jamming its accuracy is superior, and its overall average recognition accuracy is 1.1% higher than FedPer's. This gain can be attributed to the fact that AFL automatically adjusts the equilibrium between the global model and the communal knowledge from local data across all devices. One primary misclassification is the identification of sawtooth sweep jamming as narrow band jamming, which is predominantly due to the resemblance of the frequency domain characteristics of these two jamming types, given that our network trains on the FFT of signals. Narrowband jamming can essentially be seen as Gaussian jamming within a confined band; thus, the neural network tends to misclassify a minor portion of sawtooth sweep jamming as narrowband jamming.
Table 2 presents a comparison of algorithm complexity and execution time. The runtime and memory usage of our proposed algorithm are higher than those of FedMeta and FedPer but lower than those of MOON, because our algorithm must additionally determine the weights of the hybrid model. It is precisely this determination of the hybrid weights that yields the improved accuracy of our method.

5. Conclusions

In this paper, we propose a jamming recognition method that employs a distributed recognition framework, achieving global optimization through federated learning: each node trains its local model while contributing to the global model. We developed an adaptive adjustment method for the mixed weight parameter of the local and global models, enabling automatic fine-tuning of the balance between the global model and the shared knowledge derived from local data across all devices. Simulation results demonstrate that our scheme achieves excellent performance compared with state-of-the-art algorithms.
In future work, we will conduct research in two areas. On the one hand, we aim to reduce the complexity of the algorithm, such as pruning the structure of the neural network or improving the efficiency of the parameter transmission between nodes. On the other hand, we will focus on further enhancing the accuracy of the algorithm, such as integrating the attention mechanism into the network.

Author Contributions

Conceptualization, Y.L., Y.S. and N.Z.; methodology, Y.L.; validation, N.Z., Y.S. and J.S.; formal analysis, Y.L., N.Z. and J.S.; investigation, Y.L. and N.Z.; resources, Y.L., N.Z. and J.S.; data curation, N.Z. and Y.S.; writing—original draft preparation, N.Z.; writing—review and editing, Y.L. and Y.S.; visualization, N.Z.; supervision, Y.L. and Y.S.; project administration, Y.L. and N.Z.; funding acquisition, Y.L.; All authors have read and agreed to the published version of the manuscript.

Funding

National Natural Science Foundation of China: U19B2014.

Data Availability Statement

Due to institutional data privacy requirements, our data is unavailable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yao, F. Communication Anti-Interference Engineering and Practice; Electronic Industry Press: Beijing, China, 2012.
  2. Gao, Y.; Fan, H.; Ren, L.; Liu, Z.; Liu, Q.; Mao, E. Joint Design of Waveform and Mismatched Filter for Interrupted Sampling Repeater Jamming Suppression. IEEE Trans. Aerosp. Electron. Syst. 2023.
  3. Almasoud, A.M. Robust Anti-Jamming Technique for UAV Data Collection in IoT Using Landing Platforms and RIS. IEEE Access 2023, 11, 70635–70651.
  4. Yang, T.; Yuan, Y.; Yi, W. Multi-Domain Resource Scheduling for Surveillance Radar Anti-Jamming based on Q-Learning. In Proceedings of the 2023 IEEE Radar Conference (RadarConf23), San Antonio, TX, USA, 1–5 May 2023.
  5. Zhou, Q.; Niu, Y.; Xiang, P.; Li, Y. Intra-Domain Knowledge Reuse Assisted Reinforcement Learning for Fast Anti-Jamming Communication. IEEE Trans. Inf. Forensics Secur. 2023.
  6. Han, C.; Huo, L.; Tong, X.; Wang, H.; Liu, X. Spatial anti-jamming scheme for internet of satellites based on the deep reinforcement learning and stackelberg game. IEEE Trans. Veh. Technol. 2020, 69, 5331–5342.
  7. Aboueleneen, N.; Alwarafy, A.; Abdallah, M. Secure and Energy-Efficient Communication for Internet of Drones Networks: A Deep Reinforcement Learning Approach. In Proceedings of the 2023 International Wireless Communications and Mobile Computing (IWCMC), Marrakesh, Morocco, 19–23 June 2023; pp. 818–823.
  8. Mughal, M.O.; Kim, S. Signal classification and jamming detection in wide-band radios using naïve bayes classifier. IEEE Commun. Lett. 2018, 22, 1398–1401.
  9. Yi, W.; Qu, Y.; Li, S.; Liu, Q. Hierarchical Jamming Recognition with Spectrum Fusion Feature and Twin-bound SVM for Cognitive Satellite Communications. In Proceedings of the 2023 IEEE Wireless Communications and Networking Conference (WCNC), Glasgow, UK, 26–29 March 2023.
  10. Zhou, H.; Wang, Z.; Wu, R.; Xiong, X. Jamming Recognition Algorithm Based on Variational Mode Decomposition. IEEE Sens. J. 2023, 23, 17341–17349.
  11. Wang, G.; Wang, Y.; Huang, G. Classification Methods with Signal Approximation for Unknown Interference. IEEE Access 2020, 8, 37933–37945.
  12. Kong, L.; Xu, Z.; Wang, J. A novel algorithm for jamming recognition in wireless communication. In Proceedings of the 2013 6th International Congress on Image and Signal Processing (CISP), Hangzhou, China, 16–18 December 2013; Volume 3, pp. 1473–1477.
  13. Niu, Y.; Cheng, Y.; Chen, J. Jamming pattern recognition based on complexity measure. In Proceedings of the 2010 3rd International Congress on Image and Signal Processing, Yantai, China, 16–18 October 2010; Volume 8, pp. 3596–3600.
  14. Shi, Y.Y.; Lu, X.; Niu, Y.; Li, Y. Efficient jamming identification in wireless communication: Using small sample data driven naive bayes classifier. IEEE Wirel. Commun. Lett. 2021, 10, 1375–1379.
  15. Roopak, M.; Tian, G.Y.; Chambers, J. Deep Learning Models for Cyber Security in IoT Networks. In Proceedings of the 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 7–9 January 2019; pp. 0452–0457.
  16. Zhang, H.; Yu, F.; Yan, L.; Wang, T. Key Technologies of Communication Security Detection between Heterogeneous Systems Based on Communication Gateway. In Proceedings of the 2023 IEEE 3rd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), Chongqing, China, 26–28 May 2023; pp. 258–262.
  17. Yan, B.; Zheng, J.; Li, R.; Fu, K.; Chen, P.; Jia, G.; Shi, Y.; Lv, J.; Gao, B. Semi-supervised pipeline anomaly detection algorithm based on memory items and metric learning. Nondestruct. Test. Eval. 2023.
  18. Yang, Z.; Huo, L. Bolt preload monitoring based on percussion sound signal and convolutional neural network (CNN). Nondestruct. Test. Eval. 2022, 37, 464–481.
  19. Toma, A.; Cecchinato, N.; Drioli, C.; Foresti, G.L.; Ferrin, G. CNN-based processing of radio frequency signals for augmenting acoustic source localization and enhancement in UAV security applications. In Proceedings of the 2021 International Conference on Military Communication and Information Systems (ICMCIS), The Hague, The Netherlands, 4–5 May 2021.
  20. O’Shea, T.J.; Roy, T.; Clancy, T.C. Over-the-air deep learning based radio signal classification. IEEE J. Sel. Top. Signal Process. 2018, 12, 168–179.
  21. Shao, G.; Chen, Y.; Wei, Y. Convolutional neural network-based radar jamming signal classification with sufficient and limited samples. IEEE Access 2020, 8, 588–598.
  22. Hou, L.; Zhang, S.; Wang, C.; Li, X.; Chen, S.; Zhu, L.; Zhu, Y. Jamming Recognition of Carrier-Free UWB Cognitive Radar Based on MANet. IEEE Trans. Instrum. Meas. 2023, 72, 8504413.
  23. Zhou, H.; Wang, L.; Guo, Z. Recognition of Radar Compound Jamming Based on Convolutional Neural Network. IEEE Trans. Aerosp. Electron. Syst. 2023.
  24. Shen, J.; Li, Y. Cooperative multi-node cognition method based on deep residual network. Electronics 2022, 11, 3280.
  25. Wang, S.; Tuor, T.; Salonidis, T.; Leung, K.K.; Makaya, C.; He, T.; Chan, K. Efficient Adaptive federated learning in resource constrained edge computing systems. IEEE J. Sel. Areas Commun. 2021, 37, 1205–1221.
  26. Lim, W.Y.B.; Luong, N.C.; Hoang, D.T.; Jiao, Y.; Liang, Y.-C.; Yang, Q.; Niyato, D.; Miao, C. Federated learning in mobile edge networks: A comprehensive survey. IEEE Commun. Surv. Tutor. 2020, 22, 2031–2063.
  27. Zhang, L.; Wu, Y.; Chen, L.; Fan, L.; Nallanathan, A. Scoring Aided Federated Learning on Long-tailed Data for Wireless IoMT based Healthcare System. IEEE J. Biomed. Health Inform. 2023.
  28. Roy, S.; Li, J.; Bai, Y. Federated Learning-Based Intrusion Detection System for IoT Environments with Locally Adapted Model. In Proceedings of the 2023 IEEE 10th International Conference on Cyber Security and Cloud Computing (CSCloud)/2023 IEEE 9th International Conference on Edge Computing and Scalable Cloud (EdgeCom), Xiangtan, China, 1–3 July 2023; pp. 203–209.
  29. Liu, M.; Liu, Z.; Lu, W.; Chen, Y.; Gao, X.; Zhao, N. Distributed few-shot learning for intelligent recognition of communication jamming. IEEE J. Sel. Top. Signal Process. 2022, 16, 395–405.
  30. Li, Q.; He, B.; Song, D. Model-Contrastive Federated Learning. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10708–10717.
  31. Lin, J.; Wu, X. Personalized Federated Learning with Data Heterogeneity Constraints. In Proceedings of the 2022 3rd International Conference on Computer Science and Management Technology (ICCSMT), Shanghai, China, 18–20 November 2022; pp. 152–155.
Figure 1. System model of jamming recognition.
Figure 2. Structure of the convolutional module.
Figure 3. Performance comparison of AFL with different sample rates.
Figure 4. Performance comparison of AFL with different mixing weights α.
Figure 5. Performance comparison of AFL with other state-of-the-art algorithms.
Figure 6. Confusion matrix of FedMeta.
Figure 7. Confusion matrix of FedPer.
Figure 8. Confusion matrix of AFL.
Table 1. Network structure parameters.

Index | Layers                  | Output Dimension
1     | Input                   | (128, 2, 1)
2     | Convolution Module (16) | (64, 2, 16)
3     | Convolution Module (32) | (32, 2, 32)
4     | Convolution Module (32) | (16, 2, 32)
5     | Convolution Module (64) | (8, 2, 64)
6     | Max Pooling Layer       | (4, 2, 64)
7     | Fully Connected Layer   | (8, 1)

Table 2. Comparison of complexity and processing time for various algorithms.

Algorithm | Total Time | Memory Used
FedMeta   | 12.40 s    | 28.70 MB
FedPer    | 11.89 s    | 28.65 MB
MOON      | 24.97 s    | 43.05 MB
AFL       | 17.91 s    | 35.12 MB