Fine-Grained Recognition of Mixed Signals with Geometry Coordinate Attention

Yi, Qingwu; Wang, Qing; Zhang, Jianwu; Zheng, Xiaoran; Lu, Zetao

doi:10.3390/s24144530

Open AccessArticle

Fine-Grained Recognition of Mixed Signals with Geometry Coordinate Attention

by

Qingwu Yi

^1,2,

Qing Wang

²,

Jianwu Zhang

^3,*,

Xiaoran Zheng

³ and

Zetao Lu

³

¹

National Key Laboratory of Radar Signal Processing, Xidian University, Xi’an 710071, China

²

State Key Laboratory of Satellite Navigation System and Equipment Technology, The 54th Research Institute of CETC, Shijiazhuang 050081, China

³

School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China

^*

Author to whom correspondence should be addressed.

Sensors 2024, 24(14), 4530; https://doi.org/10.3390/s24144530

Submission received: 13 May 2024 / Revised: 22 June 2024 / Accepted: 8 July 2024 / Published: 13 July 2024

(This article belongs to the Section Intelligent Sensors)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

With the advancement of technology, signal modulation types are becoming increasingly diverse and complex. The phenomenon of signal time–frequency overlap during transmission poses significant challenges for the classification and recognition of mixed signals, including poor recognition capabilities and low generality. This paper presents a recognition model for the fine-grained analysis of mixed signal characteristics, proposing a Geometry Coordinate Attention mechanism and introducing a low-rank bilinear pooling module to more effectively extract signal features for classification. The model employs a residual neural network as its backbone architecture and utilizes the Geometry Coordinate Attention mechanism for time–frequency weighted analysis based on information geometry theory. This analysis targets multiple-scale features within the architecture, producing time–frequency weighted features of the signal. These weighted features are further analyzed through a low-rank bilinear pooling module, combined with the backbone features, to achieve fine-grained feature fusion. This results in a fused feature vector for mixed signal classification. Experiments were conducted on a simulated dataset comprising 39,600 mixed-signal time–frequency plots. The model was benchmarked against a baseline using a residual neural network. The experimental outcomes demonstrated an improvement of 9% in the exact match ratio and 5% in the Hamming score. These results indicate that the proposed model significantly enhances the recognition capability and generalizability of mixed signal classification.

Keywords:

machine learning; fine-grained image recognition; residual neural network; information geometry

1. Introduction

Automatic modulation recognition (AMR) technology is used to identify the modulation schemes of received signals and find extensive applications in both military and civilian domains, closely related to non-cooperative communication, electronic warfare, and security detection [1,2,3]. With the rapid development of wireless communication technologies, the modulation types of signals are becoming increasingly diverse and complex. Additionally, the radio frequency environment is becoming more challenging, leading to frequent occurrences of signal time–frequency overlap [4]. Currently, research on automatic recognition of single modulation types has become relatively mature. For instance, methods employing decision theory, likelihood approaches, classification models based on domain knowledge and manually designed features [5,6], and neural network models based on deep learning [7] have been developed. However, research on the recognition of signals involving two or more modulation types is less explored. There are two main approaches for modulation type recognition in the context of multiple signal mixtures: one involves preprocessing by separating signals first and then recognizing the modulation type of each isolated signal, and the other approach aims to directly recognize multiple signals without extensive preprocessing or separation.

In the first approach, the effectiveness of signal classification is closely tied to the performance of signal separation. Commonly used algorithms for multi-signal separation include Independent Component Analysis (ICA) and its variant, Fast Independent Component Analysis (FICA) [8,9,10]. These methods frequently necessitate substantial prior knowledge to separate overlapping signals using channel matrices, which may be restrictive in practical applications.

Yang et al. employed a Uniform Linear Array (ULA) as a signal receiver to capture and distinguish signals. They utilized advanced techniques such as high-order cumulants and instantaneous statistical amplitude features, combined with a classical decision tree, for digital signal modulation classification. This method yielded positive results, even under conditions of low signal-to-noise ratio [11]. However, the effectiveness of this approach heavily depends on the precise estimation of signal arrival angles, and the classification performance is closely tied to this accuracy. Moreover, the use of ULA for signal reception and separation is more appropriate for scenarios where the signal reception matrix is either positive definite or overdetermined, which limits its use in complex settings dominated by underdetermined reception matrices. Ai et al. implemented a UNet semantic segmentation network to categorize time–frequency plots of radar signals, designed to handle an unspecified number of radar signals. This technique delivered impressive Mean Intersection over Union (MioU) and Mean Pixel Accuracy (MPA) results for the modulation classification of both single and multiple radar signals in environments with high noise levels [12]. Nonetheless, semantic segmentation networks, which depend on pixel-level identification, often encounter challenges such as high computational demands, lengthy training periods, and substantial parameter sizes. Wang et al. merged the ALOHA algorithm with a binary tree algorithm to create a machine learning-based hybrid anti-collision algorithm for analyzing mixed RF signals. This strategy successfully separates signals under both positive definite and underdetermined conditions, effectively estimating the number of RF signals and showcasing strong recognition capabilities and robustness [13]. These techniques underscore diverse strategies for addressing the challenges in recognizing modulation types in mixed signal scenarios, each with distinct advantages and disadvantages. Further research is essential to develop robust and efficient methods for automatic modulation recognition in the complex and fluctuating conditions of radio frequency environments.

Current techniques for recognizing multiple signal modulations, which expand on the recognition of individual modulation types, have advanced considerably. Nevertheless, significant interference and overlap among signals present challenges in feature extraction for neural networks, necessitating more advanced methods to effectively retain information. Moreover, neural networks for signal classification often utilize multiple classifiers. For example, reference [14] describes a CNN-based method for automatically recognizing mixed-signal modulations. However, a limitation of this method is the rapid increase in classification categories as the number of modulation types increases.

Inspired by fine-grained image classification networks and methods for fine-grained signal analysis [15,16,17,18], this paper introduces a recognition model designed to achieve a detailed feature analysis of signals. The objective is to develop a high-precision recognition model that can accurately classify mixed signals across various modulation schemes, providing valuable insights for subsequent digital communication system processes. The model utilizes a ResNet framework as its core architecture to extract essential information. It incorporates a Geometric Coordinate Attention Fusion module, which applies principles of information geometry to enhance pooling methods and bolster attention mechanisms. This strategy aims to thoroughly analyze feature information of samples, integrate features across different scales and modalities, and employ low-rank bilinear pooling to capture intricate signal details by amplifying differences between signal features. This enhancement boosts modulation type recognition and reduces the need for signal preprocessing, ultimately improving classification performance. To mitigate the impact of varying signal quantities on the network’s output parameters, a multi-label classification approach is adopted for signal categorization. This method significantly reduces output dimensions relative to traditional multi-class classification techniques. In the experiments, a dataset was constructed with signals from five different modulation types, including mixed signals, under various signal-to-noise ratios to validate the effectiveness of the proposed module in enhancing classification capabilities. Additionally, experiments with a robustness dataset demonstrated the model’s stability and general applicability across diverse mixed signal conditions. These research findings provide new ideas and methods for the development of automatic modulation recognition technology, with the potential to enhance the performance and reliability of wireless communication systems in practical applications. Future research will continue to focus on optimizing model structures, improving recognition accuracy, and exploring more efficient signal feature extraction and classification methods to address increasingly complex and dynamic communication environments.

The main innovations of this paper are as follows:

(1): The proposed Geometry Coordinate Attention Fusion Module is applied in neural networks. This module defines pooling operations based on information geometry theory to design an effective spatial dimension feature enhancement mechanism. This method provides corresponding time–frequency weights for features, resulting in more expressive feature representations.
(2): The low-rank bilinear pooling module is introduced to achieve cross-layer interaction between fused features and backbone features. This module obtains fine-grained feature representations of signals by mapping and aggregating between different features.

The remainder of this article is as follows. In Section 2, we introduce the signal reception model, the derivation of the information geometry model, and related work, including the backbone neural networks we used. In Section 3, we present our proposed model framework and its underlying principles. In Section 4, we evaluate various performance metrics of the model through experiments and conduct a comparative analysis. In Section 5, we summarize the strengths and weaknesses of the proposed model and provide an outlook on future development directions.

2. Related Works

2.1. Signal Data

In a complex electromagnetic environment, there typically exists one or more sources that transmit and receive signals. When there are M signal transmitters and the signals are captured by a single-channel sensor, the mathematical expression of the signal can be described as follows:

x (t) = A \sum_{i = 1}^{M} s_{i} (t) + n (t)

(1)

where

x (t)

represents the signal received by the single-channel sensor,

A

denotes the signal receiving mixing coefficient matrix,

s_{i} (t)

is the i-th independent signal source, M is the total number of independent signal sources that the sensor can receive, and n(t) represents additive noise.

In complex electromagnetic environments, the signals received often exhibit poor sparsity and significant overlap, whether in the time or frequency domains. Distinguishing between various signal types in these domains remains a considerable challenge. To address this, our study converts the signals into the time–frequency domain (TF), where they are represented as two-dimensional images, evolved from their original one-dimensional forms. This conversion notably enhances signal sparsity and magnifies the differences among distinct signals. We employ the Short-Time Fourier Transform (STFT) as the core method and investigate the energy plots it produces. The mathematical formula for the STFT is presented below:

STFT (t, f) = \int_{- \infty}^{\infty} [x (u) h (u - t)] e^{- 2 j π f u} d u

(2)

where

x (t)

is the received signal,

h (t)

is the window function of STFT, and the window function used in this article is the Hamming window.

The energy plots in the time–frequency domain are generated based on time–frequency representations. They possess the characteristic of being less affected by cross-interference, and they exhibit better separability in signal time–frequency domain.

The mathematical expression of the energy diagram is

{(STFT (t, f))}^{2} = {(\int_{- \infty}^{\infty} [x (u) h (u - t)] e^{- 2 j π f u} d u)}^{2}

(3)

The mathematical expression of the energy diagram after the signal is transformed is

X (t, f) = A \sum_{i = 1}^{M} S_{i} (t, f) + N (t, f)

(4)

where

X (t, f)

is the received signal of

x (t)

after time–frequency transformation,

S_{i} (t, f)

is the transmitted signal of

s_{i} (t)

after time–frequency transformation, and

N (t, f)

is the noise in the time–frequency domain.

2.2. Subsection Statistical Manifolds and the Geometric Structures

In the pooling layers of neural network models, traditional average pooling and max pooling operations extract features by focusing solely on the average or maximum values within specific regions. A more sophisticated approach involves performing separate average pooling and max pooling operations on the signals while sharing weights [19], but this method still results in information loss. This paper proposes a novel approach based on information geometry theory to preserve more feature information during pooling operations. The goal is to enhance the separability between sample features and improve the model’s recognition accuracy.

2.2.1. Fisher Information

Based on information geometry theory, under different principles of invariance, statistical regularity can be endowed with different Riemannian metrics, representing distinct geometric structures. The Fisher information matrix, due to its statistical and geometric properties, serves as a cornerstone in the theory and applications of information geometry, often used for constructing Riemannian metrics on statistical manifolds.

Without loss of generality, consider the density function of an exponential family,

p_{θ} (x), θ \in Θ

, and

\int_{- \infty}^{+ \infty} p_{θ} (x) d x = 1, \begin{matrix}  \end{matrix} θ \in Θ

(5)

where

θ

is the parameter space of the probability density function of the exponential distribution family.

In fact, there are many ways to define the Riemannian metric on the parameter space

Θ

. For different problem scenarios, it is necessary to select a concise and effective metric among many Riemannian metrics. Without losing generality, the algorithm in this chapter selects the Fisher information metric as the analysis object.

Assume the following:

\partial_{i} = \frac{\partial}{\partial θ_{i}}

(6)

then

\int_{- \infty}^{+ \infty} \partial_{i} \partial_{j} p_{θ} (x) d x = 0, \begin{matrix}  \end{matrix} θ \in Θ

(7)

Therefore, the Fisher information metric can be expressed as

g_{i j} = \int_{- \infty}^{+ \infty} p_{θ} (x) \partial_{i} \partial_{j} p_{θ} (x) d x

(8)

Assume

u, v \in T_{θ}

, where

T_{θ} = T_{θ} (Θ)

is the tangent space at point

θ \in Θ

,

u = u^{i} \partial_{i}, \begin{matrix} v = v^{i} \partial_{i} \end{matrix}

(9)

Therefore,

g (u, v) = g_{i j} u^{i} v^{j}

(10)

2.2.2. Gaussian Statistical Manifold

The Gaussian distribution has extremely broad practical applications. In radar engineering, the probability distributions of many random variables can be approximated by the normal distribution. It holds a significant position in information geometry and is one of the core topics in statistical geometric analysis.

Consider the following density function of the Gaussian distribution family:

p (x; μ, σ) = \frac{1}{σ \sqrt{2 π}} \exp (- \frac{{(x - μ)}^{2}}{2 σ^{2}})

(11)

where μ and σ are the mean and standard deviation of the Gaussian distribution, respectively. Taking the logarithm of (10), we can obtain

\log p (x; μ, σ) = \frac{θ_{1}^{2}}{4 θ_{2}} - \frac{1}{2} \log (- \frac{π}{θ_{2}}) + θ_{1} x + θ_{2} x^{2}

(12)

where

(θ_{1}, θ_{2}) = (μ / σ, - 1 / 2 σ)

, let

ϕ (θ_{1}, θ_{2}) = - \frac{θ_{1}^{2}}{4 θ_{2}} + \frac{1}{2} \log (- \frac{π}{θ_{2}})

(13)

Therefore,

\log p (x; μ, σ) = θ_{1} x + θ_{2} x^{2} - ϕ (θ_{1}, θ_{2})

(14)

Find the partial derivatives of both ends of Equation (13) with respect to

(θ_{1}, θ_{2})

, and we obtain

\partial_{i} \partial_{j} \log p (x; μ, σ) = - \partial_{i} \partial_{j} ϕ (θ_{1}, θ_{2})

(15)

Therefore,

\int_{- \infty}^{+ \infty} p (x; μ, σ) \partial_{i} \partial_{j} \log p (x; μ, σ) = - \partial_{i} \partial_{j} ϕ (θ_{1}, θ_{2})

(16)

And the Fisher information is

g_{i j} = - \partial_{i} \partial_{j} ϕ (θ_{1}, θ_{2})

(17)

or

[g_{i j}] = [\begin{matrix} σ & 2 μ σ \\ 2 μ σ & 4 μ^{3} σ + 2 μ σ^{2} \end{matrix}]

(18)

Based on Equation (18), this paper modifies the pooling operation within the Coordinate Attention (CA) mechanism and proposes a new attention mechanism called Geometry Coordinate Attention (GCA). The GCA aims to enhance information retention in sample features.

2.3. ResNet

In deep learning algorithms, particularly for deep convolutional neural networks (CNNs), increasing the depth of the network can enhance its learning capacity to a certain extent. However, as the network depth increases further, a phenomenon known as the vanishing or exploding gradient problem occurs during backpropagation, where gradients become extremely small or large when propagated back to shallow layers. This makes it challenging to update parameters effectively, resulting in model degradation and reduced classification performance. To address this issue, He et al. introduced residual networks (ResNet) in 2016, which is based on residual learning [20]. The main structure in ResNet is the residual building unit (RBU), consisting of a non-linear layer followed by an identity shortcut connection. The use of identity shortcut connections helps alleviate the difficulty of parameter optimization caused by increasing model depth. Parameter updates not only propagate layer by layer but can also be directly passed through the identity shortcut connections to shallower layers in ResNet, thereby reducing the training difficulty of the network effectively.

3. Geometry Coordinate Attention and Low-Rank Bilinear Pooling Network

The structure of the geometric coordinated attention and low-rank bilinear pooling network constructed in this article is shown in Figure 1. This paper uses Resnet-50 as the backbone network, adds geometric coordination attention to further improve the feature learning ability of the convolutional network, and then introduces a low-rank bilinear pooling module to achieve fine-grained analysis of signal sample features. Finally, multi-label classification is used to replace the multi-class output sample prediction values. In this section, we first introduce the principle of the geometric coordination attention module, and then introduce the multi-modal feature fine-grained fusion method of the low-rank bilinear pooling module.

3.1. Geometric Coordinated Attention Module

In order to effectively retain multi-level feature information, this paper designs a weighted fusion module of multi-scale features. Extract low-, medium-, and high-level features from the backbone network, so that features of different scales are optimized and fused in the time–frequency domain. The design structure of this module is shown in Figure 2.

First, the low-, medium-, and high-level features x₁, x₂, and x₃ of the received signal are extracted from different layers of the backbone network. Feed the features individually into GCA, where H, W, and C represent the height dimension, width dimension, and channel dimension of the feature, respectively. Since the input feature of the neural network is the time–frequency feature plot of the signal, the width dimension and height dimension of the feature can be regarded as the time and frequency dimensions of the feature. Then, in the GCA, the time–frequency dimension weight information of multi-scale modal features is extracted and weighted, and finally, the multi-scale modal features are fused.

In GCA, the pooling operation

P (x)

is defined according to Equation (18) as

P (x) = v [g_{i j}] v^{T} = v [\begin{matrix} σ & 2 μ σ \\ 2 μ σ & 4 μ^{3} σ + 2 μ σ^{2} \end{matrix}] v^{T}

(19)

where

v = [μ, σ]

;

μ

and

σ

are the mean and variance of the input features in a certain dimension. The pooling operation method defined by Equation (19) comprehensively considers the statistical and geometric characteristics of features, and can extract more feature information than the traditional average pooling operation and maximum pooling operation.

The input features encode the dimensions along the horizontal and vertical axes through two pooling operations to generate a pair of dimension-aware feature sets. This transformation can help locate the features in spatial locations, and correlate features in time and frequency. The mathematical expression of vertical pooling coding (H Pool) is as follows:

y^{H} (h, 1, C) = \sum_{0 \leq k \leq W} P (x_{C} (h, k, C))

(20)

where

x_{C}

is the feature of the received signal feature in each channel dimension of the neural network,

h

is the minimum granularity of the height dimension,

c

is the minimum granularity of the channel dimension, and

y^{H} \in R^{H \times 1 \times C}

is the height direction feature of the sample in the channel dimension and is also a frequency characteristic. The mathematical expression of horizontal pooling coding (W Pool) is as follows:

y^{W} (1, w, C) = \sum_{0 \leq k \leq H} P (x_{C} (k, w, C))

(21)

where w is the minimum granularity of the width dimension and y^W ∈ R^1×W×C is the width direction feature of the sample in the channel dimension and also the time feature.

Time and frequency features are concatenated in the channel dimension, and the convolution function with a convolution kernel size of (1,1) is used to obtain the time–frequency joint intermediate state features. Then, the batch normalization and a non-linear activation function are applied to reduce differences among the joint features.

y^{H + W} = concat (y^{H}, y^{W})

(22)

z^{H + W} = Re Lu (BN (conv (y^{H + W}))), z^{H + W} \in R^{(H + W) \times 1 \times \frac{C}{r}}

(23)

where r is used to control the GCA parameter amount.

Then, the joint intermediate features along the channel dimension are reshaped into two features:

z_{r}^{H} \in R^{H \times 1 \times (C / r)}

and

z_{r}^{W} \in R^{1 \times W \times (C / r)}

. Over-sampling operations using convolutional functions with kernel size (1,1) are applied on each feature to restore the parameter dimensions equal to the input features. Finally, a non-linear activation function is applied to each feature to obtain weights or coefficients along the time and frequency dimensions.

z^{H} = sigmoid (conv (z_{r}^{H})), z^{H} \in R^{H \times 1 \times C}

(24)

z^{W} = sigmoid (conv (z_{r}^{W})), z^{W} \in R^{1 \times W \times C}

(25)

where

sigmoid

(·) is the sigmoid function operation, which performs normalization and activation.

Finally, multiply the input features of different scales by their corresponding feature weight coefficients along the dimensions to obtain enhanced weighted features along the time and frequency dimensions. Additionally, normalize the scaled weighted features to achieve feature fusion with unified scale parameters.

x_{i}^{H + W} = x_{i} \times z_{i}^{H} \times z_{i}^{W}, i = 1, 2, 3

(26)

where

x_{i}^{H + W}

is the weighted feature obtained by multiplying the corresponding input feature with its weight coefficient.

z = x_{1}^{H + W} \oplus x_{2}^{H + W} \oplus x_{3}^{H + W}

(27)

where the ⊕ operation is element-wise addition. The variable

z \in R^{7 \times 7 \times 2048}

is the final output, which is a multi-scale weighted fusion feature.

3.2. Low-Rank Bilinear Pooling Module

This paper leverages a method involving the outer product of extracted neural network features from two modalities to perform weighted fusion of features, obtaining bilinear characteristics of the signal to enhance the network’s ability to analyze fine-grained features and consequently improve its capability for signal modulation recognition. Addressing the issue of parameter explosion resulting from bilinear pooling operations, the paper employs low-rank bilinear pooling to optimize the operation and reduce parameter count. Using Hadamard product in place of the outer product to interact with different features, the corresponding relationships are captured. This approach significantly reduces parameter count at the expense of certain computational complexity. In the final fully connected layer, a staged reduction in parameters is employed alongside introducing dropout to mitigate the risk of overfitting.

The bilinear pooling operation is shown in Figure 3, and its mathematical expression is

f = \sum_{j = 0}^{c_{x}} \sum_{k = 0}^{c_{z}} w z_{j} x_{k} + β = x^{T} W z + β

(28)

where

x

and

z

are two features that are mutually mapped.

x

is the backbone feature obtained from input samples through the backbone architecture, while

z

is the fused features output by the Geometric Coordinate Attention module from the high-, middle-, and low-level features of the samples.

C_{x}

and

C_{z}

are the channel dimensions of the two features,

w

is the mapping between the channel dimensions of the two features,

W

is the weight matrix,

β

is the bias, and

f

represents the output features.

Perform matrix decomposition on the weight matrix as follows:

W = U V^{T}

(29)

where

U

and

V

are low-rank decomposition matrices of

W

and

m

is the joint embedding dimension. Equation (28) can be further expressed in Hadamard product form, with a mathematical expression as follows:

F = x^{T} W z + β = x^{T} U V^{T} z + β = U^{T} x \circ V^{T} z + β

(30)

where

\circ

denotes the Hadamard product and

F

is the output feature after the mathematical expression transformation.

Using two low-rank matrices,

U

and

V

, to approximate

W

avoids the direct computation of the outer product of the two features as in the original bilinear pooling method. This reduction in computation and parameters decreases the parameter count from

C_{x} \times C_{z}

to

m \times (C_{x} + C_{z})

. It can be observed that the parameter count is controlled jointly by the joint embedding dimension m and the channel numbers of the two input features. Therefore, while ensuring the features remain unchanged, it is possible to attempt multi-channel merging to reconstruct feature parameter scales, reduce the number of channels, and increase computational complexity to decrease parameter count, ensuring efficient network operation.

In this module, the mutual mapping between two sets of input features is used to extract joint representations of features across channels, enabling multi-modal bilinear pooling for improved neural network performance in signal modulation recognition within the same sample. Additionally, the use of low-rank matrix approximation for outer product computations addresses the issue of excessive parameter count in bilinear pooling, thereby mitigating potential inefficiencies and overfitting effects in neural network operations.

3.3. Loss Function and Evaluation Metrics

This paper’s model uses a multi-label classification approach for signal classification, which consists of multiple binary classifiers combined together. Sample labels are encoded using one-hot encoding. The normalization function connected to the model’s output layer is replaced with the sigmoid function instead of the Softmax function. The model employs a combination of the cross-entropy loss function and the binary cross-entropy loss (BCEloss) for multi-label binary classification. Compared to traditional multi-class algorithms, this method effectively reduces label dimensions and optimally utilizes the feature space of output predictions. The BCEloss not only considers the situation when the prediction value is 1 but also incorporates the impact of the prediction value being 0 on the loss. The expression for the BCEloss is as follows:

BCEloss (y_{pred}, y_{true}) = - \frac{1}{p} \sum_{j = 1}^{p} \sum_{i = 1}^{q} [y_{true}^{i, j} \cdot \log y_{pred}^{i, j} + (1 - y_{true}^{i, j}) \cdot \log (1 - y_{pred}^{i, j})]

(31)

where

y_{t r u e}^{i, j}

is the true value of the i-th label in the j-th sample, and

y_{p r e d}^{i, j}

is the predicted value of the i-th label in the j-th sample.

p

is the total number of samples, which is equivalent to the batch size during batch training.

q

is the number of sample labels, which is equivalent to the prediction output dimension.

The evaluation indicators of the experiment are based on ref. [21], which uses the exact match ratio (EMR), hamming score (HS), and F-Measure (F1). The three indicators are used as a specific evaluation of the model’s ability to identify signal modulation types. The exact match ratio is the strictest measure of model recognition effectiveness, representing the ability of model identification to be completely accurate, but partial accuracy is also part of model evaluation. Therefore, hamming score and F-Measure are included as evaluation metrics. The hamming score refers to the accuracy feedback of the overall sample, representing the model’s perception of all labels. The F-Measure is a comprehensive measure of sample recognition accuracy, balancing precision and recall, representing the balance of the model.

The calculation formula for the exact match ratio (EMR) is as follows:

EMR = \frac{1}{p} \sum_{j = 1}^{p} I (y_{true}^{j} = = y_{pred}^{j})

(32)

where

y_{p r e d}^{j}

is the predicted value set for the j-th sample,

y_{t r u e}^{j}

is the true value set for the j-th sample, and

I (\cdot)

is a function that returns 1 when the predicted value is exactly equal to the true value, otherwise it returns 0. Accuracy represents the average accuracy across all samples, where for each individual sample, it indicates the proportion of signal categories where both predicted and true values are true among the total signal categories predicted as true or true in reality.

The hamming score calculation formula is as follows:

Acc = \frac{1}{p} \sum_{j = 1}^{p} \frac{\sum_{i = 1}^{q} y_{pred}^{i . j} \cap y_{true}^{i . j}}{\sum_{i = 1}^{q} y_{pred}^{i . j} \cup y_{true}^{i . j}}

(33)

The F-Measure calculation formula is

F 1 = \frac{1}{p} \sum_{j = 1}^{p} \frac{\sum_{i = 1}^{q} 2 (y_{pred}^{i, j} \cap y_{true}^{i, j})}{\sum_{i = 1}^{q} y_{pred}^{i, j} + y_{true}^{i, j}}

(34)

4. Simulation Experiment and Performance Analysis

4.1. Dataset

In this experiment, a single-antenna reception method was simulated under complex environments by adding colored noise to sample signals to better reflect real-world conditions. The dataset consists of single-signal samples or mixed-signal samples with a signal-to-noise ratio (SNR) spanning from −12 dB to 8 dB in 2 dB increments. The signal sampling frequency is 1 × 10⁷ Hz, with a frequency range from 0.5 × 10⁷ Hz to 3 × 10⁷ Hz. The resulting sample data format is a 224 × 224 RGB matrix. The mixing of signals is based on fifteen combinations as listed in Table 1, with each type of sample generated at various SNR levels resulting in 240 samples per combination, totaling 39,600 signal time–frequency energy plots. The mixing levels among different signal types are evenly distributed. The dataset is divided into training and test sets with a ratio of 5:1.

4.2. Experiment Environment

The experimental environment configuration for this paper is as follows: the computer system used was Ubuntu 16.04, with a TAITAN graphics card and 24 GB of memory. The programming language used was Python 3.8, the deep learning framework was PyTorch 1.7, and the optimizer was Adaptive Moment Estimation optimizer. For training, the approach involved training from scratch, with an initial learning rate set to 0.001 and adaptive decay. The training was conducted for 200 epochs, and the batch size was set to 32 based on network parameters and GPU memory size.

4.3. Ablation Study

To verify the reliability of the proposed modules in this paper, four ablation experiments were conducted using the same dataset and experimental setup with identical evaluation metrics. The neural network model containing both the Geometry Coordinate Attention Fusion Module and the low-rank bilinear pooling module was named GCABNet; the model containing only the Geometry Coordinate Attention Fusion Module was named GCANet; the model containing only the low-rank bilinear pooling module was named BNet; and the backbone network model was named InitNet. In the BNet model, both input variables in the bilinear pooling module use feature x. Additionally, a model named GCABVGG, which integrates the proposed modules into the VGG16 network structure, was compared with the VGG16 network (VGG).

The results of the ablation experiments under dynamic SNR conditions depicted in Figure 4a–c indicate that the GCA Module and the low-rank bilinear pooling module proposed in this paper led to an approximately 9% increase in EMR and a 5% increase in HS under low SNR conditions. The model’s recognition capability improved under high SNR conditions, with an EMR of over 98% for signal modulation type recognition at SNRs above 2 dB and approaching 100% at SNRs exceeding 4 dB. Specifically, the low-rank bilinear pooling module achieved an approximate 6% increase in EMR under low SNR conditions, demonstrating that bilinear pooling can yield more effective discriminative features at a finer granularity, significantly impacting the model’s recognition capability. Furthermore, the Geometric Coordinate Attention accurately obtained time–frequency dimension weights from multiple scales of sample features, effectively retaining sample feature information and assisting the model in capturing underlying discriminative features. Therefore, the proposed modules in this paper show feasibility in improving the recognition of signal modulation types.

Moreover, as shown in the experimental results in Figure 4d-f, integrating the proposed modules into the VGG network as the backbone structure resulted in a 7% improvement in EMR and approximately 9% improvement in HS under low SNR conditions, with improved balance in the F-Measure, further demonstrating the reliability of the proposed modules in this paper.

4.4. Attention Comparison Experiment

To verify the advantages of obtaining more effective time–frequency weight coefficients using the Geometric Coordinate Attention proposed in this paper’s GCA Fusion Module, this section conducts six comparative experiments by replacing the GCA part of the GCA Fusion Module with CA, SAM, BAM, CAM, CBAM, and SENet, while keeping all other conditions identical in the experiment environment. The contrasting performance of different attention on the dataset in terms of recognition capability and classification effectiveness is evaluated.

Specifically, CA (Coordinate Attention) analyzes the input features in multiple dimensions through multiple independent attention heads and outputs weight coefficients after normalization. SAM (Spatial Attention Module) analyzes both height and width dimensions, enhancing the model’s ability to process spatial information. CAM (Channel Attention Module) extracts channel dimension information of features, enhancing the model’s ability to learn dependencies between different channels. BAM (Bottleneck Attention Module) is a parallel combination of CAM and SAM attention mechanisms. CBAM (Convolutional Block Attention Module) is a series combination of CAM and SAM attention mechanisms. SENet (Squeeze-and-Excitation Network) enhances the response of important feature channels while suppressing the response of unimportant feature channels.

The experimental results in Table 2 demonstrate that the GCA Module used in this paper exhibits superior performance compared to classical attention. The GCA used in this paper outperforms other attention mechanisms in terms of both EMR and HS, with up to a 3% performance advantage in EMR and a 1% performance advantage in HS. Furthermore, the F-Measure results indicate that the stability of the mechanism proposed in this paper is also advantageous compared to other mechanisms. These results demonstrate that the GCA proposed in this paper maximizes the extraction of signal feature weight coefficients based on information geometry theory, enhancing the learning ability of neural networks and thereby improving model recognition capabilities. Compared to other mechanisms, the GCA considers both local granularity information in the time–frequency domain and integrates channel weights, retaining more feature information and enhancing the separability of signal features, proving the feasibility and importance of capturing time–frequency weight information. The comparison between the proposed GCA and the CA concludes that attention mechanisms based on information geometry theory can more effectively retain sample feature information, thus improving model recognition capabilities.

4.5. Robustness Experiment

To verify the robustness of the signal recognition model proposed in this paper, a supplementary dataset was created in this section. It includes additional mixed-signal combinations not present in the original dataset from Section 4.1. Under the same experimental conditions, this supplementary dataset was used as the test set for further experiments to validate the reliability of the signal recognition model. The signal combinations in the supplementary dataset are shown in Table 3. The robustness dataset comprises mixed signals with signal-to-noise ratios (SNRs) ranging from −12 dB to 8 dB in steps of 2 dB. Each type of sample was generated 40 times at each SNR level. In the robustness experiment, a confusion matrix was used to represent the experimental results. In this matrix, aside from signal mixture types serving as labels, the “X” label was introduced to indicate that the model recognized a signal combination not present in the corresponding dataset mixture. Additionally, the “NULL” label denotes cases where the model did not recognize any signals.

Figure 5 shows the confusion matrix obtained by randomly selecting 80 signal samples for each mixed type from the original dataset at various signal-to-noise ratios (SNRs). It reveals that the EMR for signals reaches 84.6%, with EMR exceeding 92% for single signals and nearing 80% for multiple signals. The HS is concluded to be 91.1%. It is also observed that as the number of coexisting mixed signals within a sample type increases, the model’s ability to recognize signal types weakens. This is due to the increased number of signals under the same noise conditions, causing energy dispersion in the time–frequency domain and resulting in blurred features that reduce separability and increase recognition difficulty.

In Figure 6, the EMR for signals from the supplementary dataset is only 47.2%, while the HS reaches 84.6%. The supplementary dataset contains a significantly higher number of mixed-signal samples compared to the original dataset, resulting in a noticeable decrease in the EMR due to the increased complexity of mixed samples. However, the HS approaches that of the original dataset. Considering the EMR, HS, and analysis of the confusion matrix, the model’s ability to recognize signal types in the supplementary dataset is comparable to that in the original dataset. However, there is a discernible limitation in distinguishing between different types of signal combinations, particularly affecting the recognition performance when QAM and BPSK signal types coexist. Therefore, under the same experimental conditions, introducing supplementary dataset samples with a 40:1 ratio in the training set enhances the model’s learning of different types of combinations and leads to a new supplementary confusion matrix.

In Figure 7, the EMR of the samples reaches 73%, with an HS of 90.3%. The majority of samples that did not achieve an absolute match were due to misidentifications of a single signal type, indicating a certain level of accuracy. Therefore, the model in this study demonstrates a certain capability and robustness in recognizing signal combinations outside of the training samples. After training with small samples of unknown combinations, the recognition capability improves significantly, with an approximately 17% increase in EMR and a 5.7% increase in HS, approaching the model’s accuracy on the original dataset.

5. Conclusions

To effectively capture and utilize the fine-grained features of mixed signals and reduce signal preprocessing, this study developed an end-to-end recognition model for analyzing the granularity of mixed signals. The model integrates the Geometric Coordinated Attention (GCA) Module to identify potential features across time and frequency domains and optimally utilize multi-scale features. Furthermore, the model employs low-rank bilinear pooling to map backbone features to optimized time–frequency features, thus enhancing the model’s capacity to discern signal sample types with greater granularity. In the ablation studies, the effectiveness of the newly introduced GCA Module and the low-rank bilinear pooling module was confirmed. By evaluating various attention mechanisms’ impacts on model performance, this study revealed that the chosen attention mechanism more effectively extracts time–frequency weights and preserves feature information, leading to enhanced recognition performance. The robustness experiments indicated the model’s capability to recognize novel combinations of signal types, but further improvements in model performance are needed. Future efforts will aim to boost the model’s recognition accuracy for different combinations of signals, while managing neural network complexity and maintaining parameter volume.

Author Contributions

Conceptualization, Q.Y. and Q.W.; Methodology, J.Z.; Software, X.Z. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was Supported by the National Key Research and Development Program of China (2021YFF0603903) and Zhejiang Provincial Natural Science Foundation of China (LZ23F010001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Due to privacy concerns, the data supporting the findings of this study are not publicly available.

Conflicts of Interest

The authors declare no conflict of interest.

References

Dahap, B.I.; Liao, H.S. Advanced algorithm for automatic modulation recognition for analogue & digital signals. In Proceedings of the 2015 International Conference on Computing, Control, Networking, Electronics and Embedded Systems Engineering (ICCNEEE), Khartoum, Sudan, 7–9 September 2015; pp. 32–36. [Google Scholar]
Wu, J.P.; Han, Y.Z.; Zhang, J.M.; Wang, H.K. Automatic modulation recognition of digital communication signals. In Proceedings of the 2010 First International Conference on Pervasive Computing, Signal Processing and Applications, Harbin, China, 17–19 September 2010; pp. 590–593. [Google Scholar]
Li, P.B.; Qi, L.; Wang, J.L. Modulation recognition method of time-frequency aliasing signals in single-channel. In Proceedings of the Eleventh International Conference on Graphics and Image Processing (ICGIP 2019), Hangzhou, China, 12–14 October 2019; SPIE: Bellingham, WC, USA, 2020; Volume 11373, pp. 767–774. [Google Scholar]
Hou, C.; Liu, G.; Tian, Q.; Zhou, Z.; Hua, L.; Lin, Y. Multisignal modulation classification using sliding window detection and complex convolutional network in frequency domain. IEEE Internet Things J. 2022, 9, 19438–19449. [Google Scholar] [CrossRef]
Shao, Z.P.; Li, N.G. Research on signal modulation type recognition based on cyclic spectrum in internet of things. In Proceedings of the 2020 13th International Conference on Intelligent Computation Technology and Automation (ICICTA), Xi’an, China, 24–25 October 2020; pp. 355–360. [Google Scholar]
Yang, Y.Q.; Jiang, S.Q.; Zhou, Z. Research on signal models and recognition method for digital modulation. In Proceedings of the 2019 IEEE 7th International Conference on Computer Science and Network Technology (ICCSNT), Dalian, China, 19–20 October 2019; pp. 508–512. [Google Scholar]
Zhang, Z.L.; Li, Y.; Gao, M.G. Few-shot learning of signal modulation recognition based on attention relation network. In Proceedings of the 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 18–21 January 2021; pp. 1372–1376. [Google Scholar]
Gao, Q.; Huang, S.; Wang, L.; Wang, K.; Zhang, Y.; Feng, Z. Modulation classification of mixed signals using independent component analysis. In Proceedings of the 2015 IEEE/CIC International Conference on Communications in China (ICCC), Shenzhen, China, 2–4 November 2015; pp. 1–5. [Google Scholar]
Wang, L.; Gao, Q.; Zhang, K.; Zhang, Y.; Feng, Z. Modulation classification of mixed signals using fast independent component analysis. In Proceedings of the 2016 IEEE Wireless Communications and Networking Conference, Doha, Qatar, 3–6 April 2016; pp. 1–5. [Google Scholar]
Huang, S.; Yao, Y.; Wei, Z.; Feng, Z.; Zhang, P. Automatic modulation classification of overlapped sources using multiple cumulants. IEEE Trans. Veh. Technol. 2016, 66, 6089–6101. [Google Scholar] [CrossRef]
Yang, Y.Q.; Zou, J.J.; Yang, L.F. An automatic modulation classification method for multi-signal scenario based on ULA. In Proceedings of the 2021 IEEE 21st International Conference on Communication Technology (ICCT), Tianjin, China, 13–16 October 2021; pp. 1304–1309. [Google Scholar]
Ai, J.; Hua, L.; Liu, J.; Chen, S.; Xu, Y.; Hou, C. Radar signal separation recognition method based on semantic segmentation. In Proceedings of the 2021 8th International Conference on Dependable Systems and Their Applications (DSA), Yinchuan, China, 11–12 September 2021; pp. 302–306. [Google Scholar]
Wang, L.; Luo, Z.; Guo, R.; Li, Y. A Review of Tags Anti-Collision Identification Methods Used in RFID Technology. Electronics 2023, 12, 3644. [Google Scholar] [CrossRef]
Zhao, H.; Liu, J.; Chen, H.; Chen, J.; Li, Y.; Xu, J.; Deng, W. Intelligent diagnosis using continuous wavelet transform and gauss convolutional deep belief network. IEEE Trans. Reliab. 2022, 72, 692–702. [Google Scholar] [CrossRef]
Li, D.; Liu, J.; Lee, S.I.; Xiong, J. Pushing the limits of fine-grained activity sensing using acoustic signals. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2022, 6, 1–27. [Google Scholar]
Moghaddam, M.G.; Shirehjini, A.A.N.; Shirmohammadi, S. A WiFi-Based Method for Recognizing Fine-Grained Multiple-Subject Human Activities. IEEE Trans. Instrum. Meas. 2023, 72, 1–13. [Google Scholar] [CrossRef]
Liu, S.; Fu, X.; Xu, H.; Zhang, J.; Zhang, A.; Zhou, Q.; Zhang, H. A fine-grained ship-radiated noise recognition system using deep hybrid neural networks with multi-scale features. Remote Sens. 2023, 15, 2068. [Google Scholar] [CrossRef]
Zhang, X.; Xiong, H.; Zhou, W.; Lin, W.; Tian, Q. Picking neural activations for fine-grained recognition. IEEE Trans. Multimed. 2017, 19, 2736–2750. [Google Scholar] [CrossRef]
Liang, Z.; Tao, M.; Wang, L.; Su, J.; Yang, X. Automatic modulation recognition based on adaptive attention mechanism and ResNeXt WSL model. IEEE Commun. Lett. 2021, 25, 2953–2957. [Google Scholar] [CrossRef]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Godbole, S.; Sarawagi, S. Discriminative methods for multi-labeled classification. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 26–28 May 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 22–30. [Google Scholar]

Figure 1. Neural network framework.

Figure 2. Geometry Coordinate Attention mechanism.

Figure 3. Low-rank bilinear pooling method.

Figure 4. Ablation results: (a) EMR in Resnet-50 backbone network; (b) HS in Resnet-50 backbone network; (c) F1 in Resnet-50 backbone network; (d) EMR in VGG16 backbone network; (e) HS in VGG16 backbone network; (f) F1 in VGG16 backbone network.

Figure 5. Confusion matrix in initial dataset.

Figure 6. Confusion matrix in supplementary dataset.

Figure 7. Confusion matrix in supplementary dataset with enhance training.

Table 1. Signal sample combination in initial dataset.

Number of Signal Modulation Types	Signal Combination
1	[LFM], [EQFM], [QAM], [FSK], [BPSK]
2	[LFM + QAM], [LFM + FSK], [LFM + BPSK], [EQFM + BPSK], [EQFM + FSK], [EQFM + QAM], [LFM + EQFM]
3	[LFM + EQFM + FSK], [LFM + EQFM + QAM], [EQFM + LFM + FSK]

Table 2. Performance comparison of different attention mechanisms in the model.

Attention Mechanisms	EMR	HS	F-Measure
SAM	81.313%	89.757%	92.424%
BAM	81.595%	90.024%	92.650%
CAM	81.446%	89.774%	92.364%
CBAM	82.301%	90.032%	91.831%
SENet	82.677%	90.671%	93.134%
CA	83.656%	90.851%	93.078%
GCA	84.651%	91.104%	93.143%

Table 3. Signal sample combination in supplementary dataset.

Number of Signal Modulation Types	Signal Combination
2	[BPSK + FSK], [BPSK + QAM], [FSK + QAM],
3	[BPSK + EQFM + FSK], [BPSK + EQFM + QAM], [BPSK + FSK + LFM], [BPSK + LFM + QAM], [EQFM + FSK + QAM], [FSK + LFM + QAM],
4	[BPSK + EQFM + FSK + LFM], [BPSK + EQFM + FSK + QAM], [BPSK + EQFM + LFM + QAM], [BPSK + FSK + LFM + QAM], [EQFM + FSK + LFM + QAM]
5	[LFM + EQFM + BPSK + FSK + QAM]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yi, Q.; Wang, Q.; Zhang, J.; Zheng, X.; Lu, Z. Fine-Grained Recognition of Mixed Signals with Geometry Coordinate Attention. Sensors 2024, 24, 4530. https://doi.org/10.3390/s24144530

AMA Style

Yi Q, Wang Q, Zhang J, Zheng X, Lu Z. Fine-Grained Recognition of Mixed Signals with Geometry Coordinate Attention. Sensors. 2024; 24(14):4530. https://doi.org/10.3390/s24144530

Chicago/Turabian Style

Yi, Qingwu, Qing Wang, Jianwu Zhang, Xiaoran Zheng, and Zetao Lu. 2024. "Fine-Grained Recognition of Mixed Signals with Geometry Coordinate Attention" Sensors 24, no. 14: 4530. https://doi.org/10.3390/s24144530

APA Style

Yi, Q., Wang, Q., Zhang, J., Zheng, X., & Lu, Z. (2024). Fine-Grained Recognition of Mixed Signals with Geometry Coordinate Attention. Sensors, 24(14), 4530. https://doi.org/10.3390/s24144530

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Fine-Grained Recognition of Mixed Signals with Geometry Coordinate Attention

Abstract

1. Introduction

2. Related Works

2.1. Signal Data

2.2. Subsection Statistical Manifolds and the Geometric Structures

2.2.1. Fisher Information

2.2.2. Gaussian Statistical Manifold

2.3. ResNet

3. Geometry Coordinate Attention and Low-Rank Bilinear Pooling Network

3.1. Geometric Coordinated Attention Module

3.2. Low-Rank Bilinear Pooling Module

3.3. Loss Function and Evaluation Metrics

4. Simulation Experiment and Performance Analysis

4.1. Dataset

4.2. Experiment Environment

4.3. Ablation Study

4.4. Attention Comparison Experiment

4.5. Robustness Experiment

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI