Article

A Lightweight Dual-Branch Complex-Valued Neural Network for Automatic Modulation Classification of Communication Signals

by Zhaojing Xu 1,†, Youchen Fan 1,†, Shengliang Fang 1,*, You Fu 2 and Liu Yi 2
1 School of Space Information, Space Engineering University, Beijing 101416, China
2 Graduate School, Space Engineering University, Beijing 101416, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Sensors 2025, 25(8), 2489; https://doi.org/10.3390/s25082489
Submission received: 6 March 2025 / Revised: 31 March 2025 / Accepted: 14 April 2025 / Published: 15 April 2025
(This article belongs to the Section Communications)

Abstract

Currently, deep learning has become a mainstream approach for automatic modulation classification (AMC) with its powerful feature extraction capability. Complex-valued neural networks (CVNNs) show unique advantages in the field of communication signal processing because of their ability to directly process complex data and obtain signal amplitude and phase information. However, existing models face deployment challenges due to excessive parameters and computational complexity. To address these limitations, a lightweight dual-branch complex-valued neural network (LDCVNN) is proposed. The framework uses dual pathways to separately capture features with phase information and complex-scaling-equivariant representations, adaptively fused via trainable weighted fusion. Spatial and channel reconstruction convolution (SCConv) is extended to the complex domain and combined with a complex-valued depthwise separable convolution block (CBlock) and complex-valued average pooling to eliminate feature redundancy and extract higher-order features. Finally, efficient classification is realized through complex-valued fully connected layers and a complex-valued Softmax. The evaluations demonstrate that LDCVNN achieves the highest average accuracy on RML2016.10a with only 9.0 K parameters and without data augmentation, reducing the number of parameters by 99.33% compared to CDSN and by 97.25% compared to CSDNN. Additionally, LDCVNN achieves a better balance between efficiency and performance across other datasets.

1. Introduction

The rapid advancement of wireless communication technology has exacerbated the scarcity of industrial spectrum resources, driving cognitive radio (CR) technology to become a crucial solution for enhancing spectrum efficiency. As a core functional module of CR systems, automatic modulation classification (AMC) plays a vital role in improving the communication efficiency of Unmanned Aerial Vehicle systems, ensuring reliable interconnection of Internet of Things devices, and enhancing communication anti-interference capabilities [1,2]. Traditional AMC approaches are primarily divided into two categories: statistical inference based on likelihood functions [3,4,5] and machine learning based on handcrafted features [6]. The former is limited by the idealized assumptions of channel models, making it difficult to adapt to complex electromagnetic environments. The latter, although improving practicality through feature engineering such as cyclic spectra and higher-order cumulants, still relies on expert experience for feature design and offers limited environmental adaptability.
In recent years, deep learning-based AMC technology has broken through the limitations of traditional methods, achieving a direct mapping from raw signals to modulation categories through end-to-end feature learning. Mainstream architectures such as convolutional neural networks (CNNs) [7,8,9], Long Short-Term Memory (LSTM) [10], and their hybrid models [11,12] have demonstrated classification performance surpassing traditional methods on public datasets. However, their underlying real-valued neural networks (RVNNs) have a fundamental limitation: in-phase and quadrature (IQ) signals are physically coupled in the complex domain, whereas RVNNs treat them as independent real-valued channels, which disrupts the inherent structure of the signal, leading to the loss of phase information and weakened model interpretability.
Meanwhile, complex-valued neural networks (CVNNs) can directly process complex data and achieve collaborative processing of IQ components at the signal representation level. Their advantages are reflected in maintaining the integrity of signal phase-amplitude and possessing rotational equivariance, which can enhance the physical interpretability of modulation features and the robustness of the model to communication interference such as carrier frequency offset.
CVNNs, initially proposed and theoretically grounded by Clarke [13] in 1990, have evolved significantly and have been demonstrated to exhibit potential exceeding that of RVNNs in handling complex-valued data, particularly in optimization and generalization capabilities [14]. Given the inherent I/Q complex sampling nature of wireless communication signals and the critical role of amplitude and phase information for AMC, CVNNs can directly and effectively process this information, enabling a more complete extraction of waveform features. Consequently, they are considered superior to RVNNs for AMC [15,16,17].
Existing CVNN-based AMC research primarily falls into two main approaches. The first approach is based on the key complex components proposed by Trabelsi et al. [14], which involves extending layers of RVNNs to implement complex-valued operations, effectively achieving a transition from the real domain to the complex domain. For instance, Li et al. [18] proposed a complex-valued DNN, which exhibits lower error rates and fewer parameters compared to real-valued DNNs and complex-valued ResNets. Tu et al. [19] systematically compared the performance of three complex-valued neural networks with their equivalent real-valued counterparts for AMC. The results indicated that complex-valued neural networks generally outperform real-valued neural networks at higher Signal-to-Noise Ratios (SNRs) and can extract useful features from signals earlier at lower SNRs. Although the computational complexity of complex-valued neural networks is typically higher than that of real-valued neural networks, the computational cost can be brought closer to that of real-valued neural networks by adjusting the number of parameters. In 2022, to further improve the performance of CVNN-based AMC models, S. Kim et al. [20] extended max-pooling and Softmax functions to the complex domain and developed complex-valued CNN and complex-valued ResNet accordingly. Comparative experiments with real-valued CNNs and ResNets demonstrated that the proposed complex-valued classifiers significantly improved the performance of signal modulation recognition, particularly enhancing the recognition accuracy of phase-related modulation types. Furthermore, Xu et al. [17] proposed complex-valued VGG and complex-valued ResNet architectures and found that CVNNs offer higher accuracy and fewer model parameters compared to RVNNs with equivalent network architectures. In 2024, Cheng et al. [21] utilized complex-valued convolution layers, dropout layers, and complex average pooling layers to construct multiple local feature learning blocks, aimed at learning the local and hierarchical correlations of IQ signals, and combined CNN and LSTM to achieve higher precision classification. Despite the effectiveness of these methods in leveraging the amplitude and phase information of signal data, they exhibit sensitivity to complex-valued scaling. This implies that the extracted features are prone to significant changes with variations in the amplitude of the input signal, leading to a certain degree of loss in signal content and potentially affecting the robustness of the algorithm in some scenarios. While data augmentation (DA) techniques can be used to train the model to mitigate this issue, such external data processing methods are not only time-consuming but also offer limited effectiveness.
Another approach, proposed by Chakraborty et al. [22,23], is based on Riemannian manifold theory. This approach models the complex domain as a product manifold of scaling and rotation groups and defines convolution and distance transformations on the manifold, enabling the model to effectively ignore variations caused by channel attenuation or clock skew, thereby significantly enhancing the robustness of automatic modulation recognition. This method features a smaller model size, high computational efficiency, and excels in data classification and information extraction. However, it is worth noting that this approach discards a significant amount of phase information, and due to the limitations of the SurReal framework in handling complex algebraic operations, its performance on large datasets may not be ideal [24].
Additionally, considering the limited computational resources often encountered in real-world industrial scenarios, the large model size and high computational complexity can hinder the deployment of the model on hardware. Consequently, model lightweighting has gradually become a focal point of research in this field. Generally, AMC methods based on real-valued neural networks reduce computational costs through model compression (e.g., channel pruning [25,26,27]), architecture search [28,29] (e.g., MobileNet), or knowledge distillation [30]. However, these methods do not consider the characteristics of the complex domain, and directly migrating them to CVNNs may lead to a significant decline in performance. Some researchers have adopted lightweight real-valued neural networks as the main architecture, only replacing local modules with complex components, and combining them with data augmentation techniques to balance accuracy and computational complexity. For example, Wang et al. [31] combined complex-valued separable convolution and residual networks and added a hybrid data augmentation technique of rotation and splicing to compensate for the potential performance loss caused by lightweight design. Guo et al. [32] proposed a lightweight convolutional neural network by combining data augmentation, complex-valued convolution, depthwise separable convolution (DSC), channel attention mechanisms, and channel shuffling, achieving lightweight and low-complexity automatic modulation classification with only 9751 model parameters. In addition, Xiao et al. [33] attempted to extend DSC to the complex domain and combine it with residual connection structures to construct a lightweight structure suitable for CVNNs, aiming to facilitate the lightweight deployment of AMC systems.
Therefore, most existing CVNNs for AMC focus on performance improvement while neglecting a critical constraint in practical deployments: industrial-grade equipment's stringent requirements for model lightweighting and computational efficiency. Moreover, current research still lacks a lightweight AMC framework specifically designed for the characteristics of the complex domain. Most current CVNNs fail to meet data processing demands in resource-constrained scenarios due to complex arithmetic units and redundant parameter designs.
To address the aforementioned issues, a lightweight dual-branch complex-valued neural network (LDCVNN) is proposed, which achieves efficient deployment while ensuring the advantages of complex-domain processing. The main innovations include:
  • To integrate the advantages of the two main approaches in existing CVNN-based AMC, a dual-branch extraction structure is designed to capture both features containing rich phase information and features with complex-scaling equivariance. These features are then fused using a trainable weighted fusion.
  • To reduce the redundant complex-valued features of CVNNs, spatial and channel reconstruction convolution (SCConv) is extended to the complex domain.
  • To further enhance feature diversity and facilitate efficient feature mining and dimensionality reduction, the fused features are further extracted by complex-valued spatial and channel reconstruction convolution (CSCConv), a complex-valued depthwise separable convolution block (CBlock), and complex-valued average pooling (CAP).

2. Methodology

This section can be divided into three parts. The first part focuses on the problem statement and the signal model. The second part elaborates on the design and implementation of the complex-valued modules. And the third part provides a detailed introduction to the model proposed in this paper.

2.1. Problem Statement and Signal Model

AMC serves as a key technology in cognitive radio systems, playing an important role in dynamic spectrum access. Its primary task is to analyze the time–frequency characteristics of received signals to accurately identify different modulation types and transmitter types, thereby enabling dynamic spectrum allocation and interference avoidance. This technique establishes a mapping relationship between the feature space of received signals and the modulation category space, providing fundamental support for spectrum sensing and decision-making.
The input signal is a complex-valued baseband time series obtained by sampling the in-phase and quadrature components of the radio frequency signal. Since tasks like modulation classification require real-valued outputs, automatic modulation classification for wireless communication signals can be regarded as a multi-classification problem that maps complex-valued inputs to real-valued outputs, $f \in \mathcal{F}: \mathcal{S} \to \mathcal{C}$. Here, $\mathcal{S}$ and $\mathcal{C}$ represent the sample space and category space, respectively. Considering practical requirements, deep learning networks approximate the mapping function between $\mathcal{S}$ and $\mathcal{C}$ by learning the relationships between the data and their corresponding labels in the dataset. The optimization objective can be expressed as follows:

$$\min_{f \in \mathcal{F}} \; \mathbb{E}_{(s,\,\mathrm{label}) \sim \mathcal{D}} \left\{ \mathcal{L}_{ce}\!\left[ f(s), \mathrm{label} \right] \right\} \tag{1}$$
Here, $\mathcal{D}$ represents the existing dataset, $\mathrm{label}$ denotes the modulation category labels of the signal data $s$, and $\mathcal{L}_{ce}$ is the cross-entropy loss function.
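As a concrete illustration, the objective in Eq. (1) corresponds to the standard cross-entropy loss used when training the network; the sketch below uses random logits and the 11 classes of RML2016.10a purely as placeholder values.

```python
import torch
import torch.nn as nn

# Cross-entropy objective of Eq. (1). The logits stand in for f(s) and the
# class count 11 (RML2016.10a) is an illustrative placeholder.
criterion = nn.CrossEntropyLoss()
logits = torch.randn(8, 11, requires_grad=True)   # f(s) for a batch of 8 samples
labels = torch.randint(0, 11, (8,))               # ground-truth modulation indices
loss = criterion(logits, labels)
loss.backward()                                   # gradients drive the minimization over f
```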
Signals in the sample space can be either continuous or composed of discrete modulated bits, containing factors such as variations in frequency, phase, or amplitude. Additionally, the signals may be affected by path loss and superimposed with Gaussian white noise representing thermal noise. Therefore, the equivalent complex baseband signal model with in-phase and quadrature components used in this paper is as follows:
$$s(l) = A(l)\, e^{j(\omega l + \varphi)}\, x(l) + n(l), \qquad l = 1, \ldots, L \tag{2}$$

In the equation, $s(l)$ represents the received signal stored in the form of discrete in-phase and quadrature components, with a sample length of $L$; $A(l)$ denotes the wireless channel gain; $x(l)$ represents the transmitted signal; and $n(l)$ represents the complex additive white Gaussian noise. For convenience in processing satellite communication signal data and performing modulation recognition, the received signal $\mathbf{s}$ can be expressed as:

$$\mathbf{s} = \begin{bmatrix} \Re\{s[1]\}, \ldots, \Re\{s[L]\} \\ \Im\{s[1]\}, \ldots, \Im\{s[L]\} \end{bmatrix} = \begin{bmatrix} \mathbf{S}_I \\ \mathbf{S}_Q \end{bmatrix} = \mathbf{S}_I + j\,\mathbf{S}_Q, \qquad \mathbf{S}_I, \mathbf{S}_Q \in \mathbb{R}^{L} \tag{3}$$

Here, the received signal $\mathbf{s}$ is represented as a complex vector in which each element is split into its real part $\Re\{s[l]\}$ and imaginary part $\Im\{s[l]\}$. $\mathbf{S}_I$ and $\mathbf{S}_Q$ represent the in-phase and quadrature components obtained after down-conversion of the signal, and $j$ is the imaginary unit. From a mathematical perspective, the I/Q components correspond to the real and imaginary parts, respectively, and there is a mapping relationship between them during each multiplication operation, which is often overlooked in most deep learning-based AMC models. Additionally, the amplitude $|\mathbf{s}|$ and phase $\angle\mathbf{s}$, which contain the information from the in-phase and quadrature channels, can be expressed as follows:

$$|\mathbf{s}| = \sqrt{\mathbf{S}_I^2 + \mathbf{S}_Q^2} \tag{4}$$

$$\angle\mathbf{s} = \arctan\!\left( \mathbf{S}_Q, \mathbf{S}_I \right) \tag{5}$$
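For reference, the amplitude and phase of Eqs. (4) and (5) can be computed directly from the stored I/Q samples; the snippet below is a minimal PyTorch illustration, where the four-quadrant arctangent plays the role of $\arctan(\mathbf{S}_Q, \mathbf{S}_I)$.

```python
import torch

s_i = torch.randn(128)                        # in-phase samples S_I
s_q = torch.randn(128)                        # quadrature samples S_Q
amplitude = torch.sqrt(s_i ** 2 + s_q ** 2)   # |s|, Eq. (4)
phase = torch.atan2(s_q, s_i)                 # angle of s, Eq. (5), four-quadrant arctangent
```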

2.2. Complex-Valued Building Blocks

2.2.1. Complex-Valued Convolution (CConv)

To directly process complex-valued data and avoid information loss or redundancy, complex-valued convolutions are employed. The complex convolution kernel can be defined as $k = k_r + j k_i$, and for a complex-valued input $\mathbf{s} = \mathbf{S}_I + j\,\mathbf{S}_Q$, real-valued operations are used to simulate complex-valued operations. The complex-valued convolution can be obtained using the formula shown below [14]:

$$\mathrm{CConv}(\mathbf{s}) = k \circledast \mathbf{s} = (k_r \circledast \mathbf{S}_I - k_i \circledast \mathbf{S}_Q) + j\,(k_r \circledast \mathbf{S}_Q + k_i \circledast \mathbf{S}_I) \tag{6}$$

Here, $\circledast$ denotes the convolution operation. Although this approach achieves an extension from the real domain to the complex domain, allowing full utilization of the amplitude and phase information in the IQ signals, the extracted features exhibit sensitivity to complex-valued scaling. In certain scenarios, this sensitivity may affect the robustness of the algorithm.
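A minimal sketch of Eq. (6) in PyTorch is shown below: two real convolutions emulate one complex kernel. The class name and layer hyperparameters are illustrative rather than the exact configuration used in LDCVNN.

```python
import torch
import torch.nn as nn

class ComplexConv1d(nn.Module):
    """Complex-valued 1-D convolution per Eq. (6): two real Conv1d layers
    hold the real and imaginary parts of the kernel k = k_r + j*k_i."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv_r = nn.Conv1d(in_ch, out_ch, kernel_size, stride, padding)
        self.conv_i = nn.Conv1d(in_ch, out_ch, kernel_size, stride, padding)

    def forward(self, s_i, s_q):
        # (k_r * S_I - k_i * S_Q) + j (k_r * S_Q + k_i * S_I)
        real = self.conv_r(s_i) - self.conv_i(s_q)
        imag = self.conv_r(s_q) + self.conv_i(s_i)
        return real, imag

# usage: an IQ batch of shape (batch, channels, length)
x_i, x_q = torch.randn(4, 1, 128), torch.randn(4, 1, 128)
y_i, y_q = ComplexConv1d(1, 16, kernel_size=7)(x_i, x_q)
```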
To address this issue, weighted Fréchet mean (wFM) filtering [34] is employed in parallel as a convolution operation to process the IQ data of the signal [22]. This ensures the preservation of geometric structural features, such as phase relationships and amplitude ratios, enabling the model to ignore changes caused by channel attenuation or clock offsets without requiring additional learning of features at different scales or phases. The objective of wFM convolution is to find a complex number $m$ such that the weighted sum of squared distances from all input points to that point is minimized. This definition not only retains the weighted-average characteristic of traditional convolution but also takes into account the special properties of complex-valued data.
From the perspective of manifold geometry, the non-zero complex plane $\widetilde{\mathbb{C}}$ can be viewed as the product manifold of positive magnitudes and planar rotations. By modeling the complex domain as the product manifold of the scaling group $\mathbb{R}^{+}$ and the rotation group $\mathrm{SO}(2)$, the distance metric between any two complex-valued data points $s_1, s_2 \in \widetilde{\mathbb{C}}$ can be defined as [22]:

$$d(s_1, s_2) = \sqrt{ \log^2\!\left( \frac{|s_2|}{|s_1|} \right) + \left\| \log\!\left( R_{s_2} R_{s_1}^{-1} \right) \right\|^2 } \tag{7}$$

$$R(s(l)) = \begin{bmatrix} \cos(\angle s(l)) & -\sin(\angle s(l)) \\ \sin(\angle s(l)) & \cos(\angle s(l)) \end{bmatrix}, \qquad l = 1, \ldots, L \tag{8}$$

For a signal sequence $s(l)$ of length $L$, $R(s(l)) \in \mathrm{SO}(2)$ is the $2 \times 2$ rotation matrix corresponding to the phase $\angle s(l)$ of the signal sequence at time step $l$. The wFM convolution kernel on this manifold is obtained by optimizing the weighted variance of the manifold distances [22]:

$$\mathrm{wFM}\!\left( \{w_k\}, \{s_k\} \right) = \arg\min_{m \in \widetilde{\mathbb{C}}} \sum_{k} w_k\, d^2(s_k, m) \tag{9}$$
The weights $w_k$ satisfy a convex combination constraint, which ensures that the resulting mean remains on the manifold.
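The following sketch illustrates the product-manifold distance of Eq. (7) for non-zero complex samples; the rotation term is computed as the wrapped phase difference, which agrees with the matrix-logarithm form up to a constant factor, and the function name is ours.

```python
import torch

def product_manifold_distance(s1: torch.Tensor, s2: torch.Tensor) -> torch.Tensor:
    """Distance on the scaling-rotation manifold for non-zero complex samples
    (cf. Eq. (7)); the rotation term is the wrapped phase difference."""
    log_scale = torch.log(s2.abs()) - torch.log(s1.abs())
    dphi = torch.angle(s2) - torch.angle(s1)
    dphi = torch.atan2(torch.sin(dphi), torch.cos(dphi))  # wrap to (-pi, pi]
    return torch.sqrt(log_scale ** 2 + dphi ** 2)

print(product_manifold_distance(torch.tensor([1.0 + 1.0j]), torch.tensor([0.5 - 0.5j])))
```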

2.2.2. Complex-Valued Activations

The rectified linear unit (ReLU) is primarily used to capture the nonlinear features in the data. The complex-valued ReLU is implemented by applying a real-valued activation function separately to the real and imaginary parts of the features, and its expression is given in Equation (10) [14]. The Tangent ReLU [22] (tReLU) is equivalent to implementing the ReLU operation on a Riemannian manifold. It projects the manifold points onto the tangent space via a logarithmic map, applies the ReLU activation in the tangent space, and then maps the result back to the manifold using an exponential map. This operation preserves the manifold structure while achieving feature sparsification and enhancing the nonlinear expressive capability. The formula is presented as Equation (11).
$$\mathrm{CReLU}(\mathbf{s}) = \mathrm{ReLU}(\mathbf{S}_I) + j\,\mathrm{ReLU}(\mathbf{S}_Q) \tag{10}$$

$$\mathrm{tReLU}(\mathbf{s}) = \exp\!\left( \mathrm{ReLU}(\log |\mathbf{s}|) + j\,\mathrm{ReLU}(\angle\mathbf{s}) \right) \tag{11}$$
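Both activations reduce to a few tensor operations; the sketch below assumes the split (real, imaginary) representation for CReLU and a native complex tensor for tReLU, and is illustrative only.

```python
import torch
import torch.nn.functional as F

def complex_relu(s_i, s_q):
    # CReLU (Eq. (10)): real ReLU applied to the real and imaginary parts separately
    return F.relu(s_i), F.relu(s_q)

def tangent_relu(s: torch.Tensor) -> torch.Tensor:
    # tReLU (Eq. (11)) for non-zero complex inputs: ReLU on log-magnitude and on
    # phase in the tangent space, then back to the manifold via the exponential map
    mag = F.relu(torch.log(s.abs()))
    phase = F.relu(torch.angle(s))
    return torch.exp(mag + 1j * phase)
```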

2.2.3. Complex-Valued Max Pooling (CMP)

Since complex numbers have no natural ordering, real-valued max pooling cannot be applied directly to complex-valued features. Complex-valued max pooling (CMP) resolves this by comparing magnitudes: the pooling indices are selected from the amplitude, and the corresponding complex-valued samples are retained. The formula is as follows:
$$I = \operatorname{Index}\!\left( \mathrm{MP}_{k,s}(|\mathbf{s}|) \right), \qquad \mathrm{CMP}_{k,s}(\mathbf{s}) = \mathbf{S}_I(I) + j\,\mathbf{S}_Q(I) \tag{12}$$
Here, k and s denote the pooling kernel size and stride, respectively. I represents the index matrix of the maximum value after local regional max-pooling, and MP refers to the real-valued max-pooling operation.
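A possible implementation of Eq. (12) reuses the index map returned by real-valued max pooling on the magnitude, as sketched below for inputs of shape (batch, channels, length); the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def complex_max_pool1d(s_i, s_q, kernel_size, stride):
    """CMP sketch (Eq. (12)) for inputs of shape (batch, channels, length):
    pooling indices come from the magnitude, and the matching real and
    imaginary samples are gathered."""
    mag = torch.sqrt(s_i ** 2 + s_q ** 2)
    _, idx = F.max_pool1d(mag, kernel_size, stride, return_indices=True)
    return torch.gather(s_i, -1, idx), torch.gather(s_q, -1, idx)
```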

2.2.4. Complex-Valued Average Pooling (CAP)

Complex-valued average pooling can be achieved by calculating the average of the real and imaginary parts of all complex numbers within a local region separately. The formula for CAP is as follows:
$$\mathrm{CAP}_{k,s,p}(\mathbf{s}) = \mathrm{AP}_{k,s,p}(\mathbf{S}_I) + j\,\mathrm{AP}_{k,s,p}(\mathbf{S}_Q) \tag{13}$$
Here, k , s and p , respectively denote the pooling size, stride, and padding size, while AP represents the real-valued average pooling operation.
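Since the real and imaginary parts are pooled independently, CAP can be written directly on top of real-valued average pooling, as in the short sketch below (illustrative only).

```python
import torch.nn.functional as F

def complex_avg_pool1d(s_i, s_q, kernel_size, stride, padding=0):
    # CAP (Eq. (13)): real-valued average pooling on each part independently
    return (F.avg_pool1d(s_i, kernel_size, stride, padding),
            F.avg_pool1d(s_q, kernel_size, stride, padding))
```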

2.2.5. Complex-Valued Batch Normalization (CBN)

The mathematical expression for the complex-valued batch normalization is given in the following formula [14]:
$$\mathrm{CBN}(\tilde{\mathbf{s}}) = \boldsymbol{\gamma}\, \tilde{\mathbf{s}} + \boldsymbol{\beta} \tag{14}$$

Here, $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$ are the scaling parameter and shifting parameter in the complex domain, respectively, and $\tilde{\mathbf{s}}$ is the normalized representation of the complex-valued input $\mathbf{s}$.

$$\boldsymbol{\gamma} = \begin{bmatrix} \gamma_{rr} & \gamma_{ri} \\ \gamma_{ir} & \gamma_{ii} \end{bmatrix} \tag{15}$$

$$\tilde{\mathbf{s}} = (V)^{-\frac{1}{2}} \left( \mathbf{s} - \mathbb{E}(\mathbf{s}) \right) \tag{16}$$

$$V = \begin{bmatrix} \operatorname{cov}(\mathbf{S}_I, \mathbf{S}_I) & \operatorname{cov}(\mathbf{S}_I, \mathbf{S}_Q) \\ \operatorname{cov}(\mathbf{S}_Q, \mathbf{S}_I) & \operatorname{cov}(\mathbf{S}_Q, \mathbf{S}_Q) \end{bmatrix} \tag{17}$$

Here, the scaling parameter $\boldsymbol{\gamma}$ consists of $\gamma_{rr}$ and $\gamma_{ii}$, both initialized to 1, while $\gamma_{ir}$ and $\gamma_{ri}$ as well as the real and imaginary parts of the complex-valued offset $\boldsymbol{\beta}$ are all initialized to 0. $(V)^{-\frac{1}{2}}$ is the square root of the inverse of the input covariance matrix $V$, and $\tilde{\mathbf{s}}$ follows a standard complex-valued distribution with a mean of 0, covariance of 1, and pseudo-covariance of 0. $\operatorname{cov}(\cdot,\cdot)$ is the covariance operator.
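A compact sketch of this whitening-based normalization is given below; it uses the closed-form inverse square root of the 2 × 2 covariance matrix and omits running statistics for inference, so it should be read as an illustration of Eqs. (14)-(17) rather than the exact layer used in LDCVNN.

```python
import torch
import torch.nn as nn

class ComplexBatchNorm1d(nn.Module):
    """CBN sketch (Eqs. (14)-(17)): the (real, imaginary) pair is whitened with the
    closed-form inverse square root of its 2x2 covariance matrix, then scaled by the
    complex matrix gamma and shifted by beta. Running statistics for inference are
    omitted for brevity."""
    def __init__(self, num_channels, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma_rr = nn.Parameter(torch.ones(num_channels))
        self.gamma_ii = nn.Parameter(torch.ones(num_channels))
        self.gamma_ri = nn.Parameter(torch.zeros(num_channels))
        self.gamma_ir = nn.Parameter(torch.zeros(num_channels))
        self.beta_r = nn.Parameter(torch.zeros(num_channels))
        self.beta_i = nn.Parameter(torch.zeros(num_channels))

    def forward(self, s_i, s_q):
        # s_i, s_q: (batch, channels, length); statistics per channel over batch and time
        dims = (0, 2)
        ci = s_i - s_i.mean(dims, keepdim=True)
        cq = s_q - s_q.mean(dims, keepdim=True)
        vrr = (ci * ci).mean(dims, keepdim=True) + self.eps
        vii = (cq * cq).mean(dims, keepdim=True) + self.eps
        vri = (ci * cq).mean(dims, keepdim=True)
        # closed-form inverse square root of the 2x2 covariance matrix V
        s = torch.sqrt(vrr * vii - vri * vri)
        t = torch.sqrt(vrr + vii + 2.0 * s)
        inv = 1.0 / (s * t)
        xi = (vii + s) * inv * ci - vri * inv * cq
        xq = -vri * inv * ci + (vrr + s) * inv * cq
        shape = (1, -1, 1)
        out_i = (self.gamma_rr.view(shape) * xi + self.gamma_ri.view(shape) * xq
                 + self.beta_r.view(shape))
        out_q = (self.gamma_ir.view(shape) * xi + self.gamma_ii.view(shape) * xq
                 + self.beta_i.view(shape))
        return out_i, out_q
```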

2.2.6. Complex-Valued Full Connection (CFC)

Unlike simple linear transformations, the complex-valued full connection (CFC) performs a combination of the real and imaginary components of neurons. w r and w i are the real and imaginary parts of the complex-valued fully connected weight, respectively.
$$\mathrm{CFC}(\mathbf{s}) = (w_r \times \mathbf{S}_I - w_i \times \mathbf{S}_Q) + j\,(w_r \times \mathbf{S}_Q + w_i \times \mathbf{S}_I) \tag{18}$$
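A minimal sketch of Eq. (18) follows; two real Linear layers hold $w_r$ and $w_i$, and the class name is illustrative.

```python
import torch.nn as nn

class ComplexLinear(nn.Module):
    """Complex-valued fully connected layer per Eq. (18): two real Linear
    layers hold w_r and w_i."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.fc_r = nn.Linear(in_features, out_features)
        self.fc_i = nn.Linear(in_features, out_features)

    def forward(self, s_i, s_q):
        real = self.fc_r(s_i) - self.fc_i(s_q)
        imag = self.fc_r(s_q) + self.fc_i(s_i)
        return real, imag
```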

2.2.7. Complex-Valued Depthwise Separable Convolution Block (CBlock)

The composition of the complex-valued depthwise separable convolution block (CBlock) [33] is illustrated in Figure 1a. Its structure includes residual connections, where all convolution layers in the residual paths are one-dimensional, complex-valued depthwise separable convolutions (CDSC1d) [33]. With its unique architecture, CBlock can effectively capture the intrinsic relationships in complex-valued data using a lightweight design, thereby reducing computational complexity while enhancing the model’s expressiveness and performance. Rep denotes the number of repetitions, and stride represents the stride value. The CDSC1d leverages the mapping advantages of depthwise separable convolution (DSC) to simplify cross-channel and spatial correlations. It is implemented through complex-valued pointwise convolution (CPWC) with complex-valued operations on the channel dimension and depthwise convolution (DWC) with real-valued operations on the spatial dimension, as shown in Figure 1b.
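To make the idea concrete, a hedged sketch of a CDSC1d-style layer is given below: a complex 1 × 1 pointwise convolution mixes channels with complex arithmetic, followed by a real depthwise convolution along the time axis. Whether the original CDSC1d shares the depthwise filters between the real and imaginary parts is not specified in this section, so that choice, like the class name, is an assumption.

```python
import torch.nn as nn

class ComplexDepthwiseSeparableConv1d(nn.Module):
    """CDSC1d-style layer: a complex 1x1 pointwise convolution mixes channels
    with complex arithmetic, then a real depthwise convolution filters each
    channel along the time axis. Sharing the depthwise filters between the
    real and imaginary parts is an assumption of this sketch."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.pw_r = nn.Conv1d(in_ch, out_ch, 1)
        self.pw_i = nn.Conv1d(in_ch, out_ch, 1)
        self.dw = nn.Conv1d(out_ch, out_ch, kernel_size, stride, padding,
                            groups=out_ch)

    def forward(self, s_i, s_q):
        # complex pointwise step: (w_r*I - w_i*Q) + j (w_r*Q + w_i*I)
        p_i = self.pw_r(s_i) - self.pw_i(s_q)
        p_q = self.pw_r(s_q) + self.pw_i(s_i)
        # real depthwise step applied to each part
        return self.dw(p_i), self.dw(p_q)
```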

2.2.8. Complex-Valued Spatial and Channel Reconstruction Convolution (CSCConv)

Spatial and channel reconstruction convolution (SCConv) [35] is an efficient convolution module composed of a spatial reconstruction unit (SRU) and a channel reconstruction unit (CRU), which are designed to suppress spatial redundancy and reduce channel redundancy, respectively. To adapt SCConv for CVNNs in handling complex-valued features, we made innovative improvements. Specifically, the SRU was retained due to its proven effectiveness in evaluating and reconstructing spatial features, as well as its strong performance in suppressing spatial redundancy and enhancing feature representation. However, the CRU, while effective at reducing channel redundancy for real-valued features using the Split-Transform-and-Fuse strategy, exhibits limitations when dealing with complex-valued features. Therefore, we replaced the original CRU with a complex-valued channel reconstruction unit (CCRU), resulting in the proposed complex-valued spatial and channel reconstruction convolution (CSCConv).
The structure of the SRU is illustrated in the green dashed area of Figure 2. It uses the scaling parameters of the group normalization (GN) layer to measure spatial information across channels, calculates normalized weights to generate reweighting factors for enhancing or suppressing different parts of the feature map, and performs grouping and recombination operations to reconstruct spatial features. This process simplifies spatial information processing and improves computational efficiency.
The architecture of the CCRU is shown in the blue dashed area of Figure 2. During the data splitting stage, the input is divided into two parts, and channel-wise compression is performed via 1 × 1 CConv to ensure full utilization of complex-valued information in subsequent processing. In the transformation stage, complex-valued groupwise convolution (CGWC) and CPWC are introduced to more effectively extract high-level information from complex-valued features while reducing computational redundancy. Both CGWC and CPWC replace traditional two-dimensional real-valued convolutions with one-dimensional CConv layers. In the fusion stage, complex-valued global average pooling is applied to reduce feature dimensions in the complex domain while preserving the statistical information of complex-valued features, facilitating subsequent feature fusion. Other settings remain consistent with those in Reference [35].

2.2.9. Complex-Valued Softmax (CSoftmax)

Typically, Softmax is used as the final step in deep learning-based AMC models to normalize predictions into a probability distribution. In this paper, the real-valued Softmax is extended to the complex domain by utilizing the magnitude of complex data [20].
$$\mathrm{CSoftmax}(\mathbf{s}) = \frac{\exp(|\mathbf{s}|)}{\sum_{m=1}^{n} \exp(|s_m|)} \tag{19}$$
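Since only the magnitude enters Eq. (19), CSoftmax reduces to a real softmax over $|\mathbf{s}|$, as in the short sketch below.

```python
import torch

def complex_softmax(s_i, s_q, dim=-1):
    # CSoftmax (Eq. (19)): a real softmax over the magnitude of the complex logits
    return torch.softmax(torch.sqrt(s_i ** 2 + s_q ** 2), dim=dim)
```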

2.3. Proposed Model

In order to balance computational complexity and classification accuracy, a lightweight dual-branch complex-valued neural network (LDCVNN) is proposed for automatic modulation classification (AMC) of communication signals. The structural diagram of LDCVNN is shown in Figure 3. First, the original IQ signals are processed by the phase information feature extraction (PIFE) module and the complex-valued scaling equivariant feature extraction (CSEFE) module, respectively, to extract features rich in phase information and features with complex-scaling equivariance. Then, trainable weighted feature fusion is used to adaptively adjust the weights, thereby effectively fusing the features extracted from the two branches (a sketch of this fusion step is given after the module descriptions below). Subsequently, the feature extraction and dimension reduction (FEDR) module is used to achieve more efficient feature mining and dimension reduction. Finally, the extracted features are mapped to the modulation category space through a CFC and a CSoftmax, completing end-to-end automatic modulation classification.
The specific design of each functional module is described below:
PIFE Module: This module is constructed by cascading CDSC1d, CBN, and CReLU. The PIFE module aims to effectively capture the instantaneous phase characteristics of the signal and retain the joint spectral-temporal information.
CSEFE Module: This module is based on Riemannian manifold geometry theory and is implemented using a cascaded structure of wFM layer and tReLU layer. This module first converts original IQ signals into Amplitude-Phase (AP) signals and then extracts features with complex-valued scaling equivariance through orthogonal decomposition in the manifold space.
FEDR Module: This module comprises CSCConv, CBlock and CAP. This module aims to further extract high-level semantic features, enhance feature diversity, reduce redundant computation, and ultimately achieve efficient feature mining and dimension reduction.
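As referenced above, a minimal sketch of the trainable weighted fusion is given below; the exact parameterization used in LDCVNN (number of weights, normalization, and how the CSEFE output is reshaped to match the PIFE output, cf. Table 2) is not detailed in this section, so the version shown is an assumption.

```python
import torch
import torch.nn as nn

class WeightedFeatureFusion(nn.Module):
    """Trainable weighted fusion sketch: two learnable scalars, normalized by a
    softmax, combine the PIFE and CSEFE branch features (both reshaped to a
    common shape beforehand). The parameterization is an assumption."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(2))

    def forward(self, feat_pife, feat_csefe):
        a = torch.softmax(self.w, dim=0)
        return a[0] * feat_pife + a[1] * feat_csefe
```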

3. Experiments

3.1. Datasets and Experimental Condition

Table 1 shows the datasets used in the experiments. Among these, RML2016.10a and RML2016.10b simulate time-varying stochastic channel effects under adverse propagation conditions, such as additive white Gaussian noise and frequency offsets [36,37]. RML2018.01a was collected indoors using a USRP B210 in the 900 MHz ISM band [38]. HisarMod2019.1 includes a wider range of multipath fading types (ideal, static, Rayleigh, Rician, and Nakagami-m channels) with different delays, making it more representative of real-world wireless communication environments [39]. A total of 156,000 samples (300 signals × 26 modulation categories × 20 SNR levels) are selected from HisarMod2019.1 for experimentation. Table 2 presents the detailed layer specifications of the proposed model under different data lengths. SNR refers to the in-band signal-to-noise ratio, defined as the ratio of the desired signal power to the noise power within the signal bandwidth.
In the experiments, the cross-entropy loss function is adopted, and the Adam optimizer is used for model training. The initial learning rates are set to 0.01 and 0.001, respectively, with a total of 300 training epochs and a batch size of 400. Other hyperparameters are kept at their default values. Two training strategies are employed during the training process: one dynamically adjusts the learning rate, halving it if the validation accuracy does not improve over five consecutive epochs; the other maintains a fixed global learning rate throughout training. After training, the model parameters with the highest classification accuracy on the validation set are selected as the optimal parameters for testing the model's classification accuracy on the test set. The experiments are conducted using a GeForce RTX 3090 GPU, an Intel(R) Xeon(R) Gold 6248R CPU, the Windows 10 operating system, and the PyTorch 1.11.0 deep learning framework.

3.2. Evaluation Method

According to the true values and predicted values, the entire sample set can be divided into true positive (TP), false positive (FP), true negative (TN), and false negative (FN). Therefore, accuracy is defined as the proportion of all correctly classified samples to the total number of samples. F1-score is a harmonic average of precision and recall that allows for a comprehensive evaluation of model performance. In the experiments, accuracy and F1-score are used to evaluate the performance of the network model on different datasets. Accuracy and F1-score could be calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{20}$$

$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN} \tag{21}$$
Here, precision indicates the proportion of samples that are actually positive among all samples predicted as positive by the model, and recall indicates the proportion of actually positive samples that are correctly predicted as positive. The core goal of model lightweighting is to reduce resource consumption and enhance performance in edge computing, so it is crucial to comprehensively evaluate the degree of model lightweighting. This paper conducts a quantitative analysis of the model from multiple dimensions, including inference speed, floating-point operations (FLOPs), and the number of parameters.
The number of model parameters refers to the total count of all trainable parameters, reflecting the model’s ability to learn and store information. A higher parameter count enhances the model’s expressive power but also increases the risk of overfitting, requiring more data for training and greater computational resources; inference speed is defined as the average time required by the model to process a single sample during the testing phase, which is an important metric for measuring model efficiency; and FLOPs represents the amount of floating-point operations required for a single forward pass of the model. Lower FLOPs indicate higher computational efficiency and a more lightweight model design.
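For completeness, the parameter count and per-sample inference time can be measured with a few lines of PyTorch, as sketched below; the function names are ours, and FLOPs are usually obtained with an external profiler and are not computed here.

```python
import time
import torch

def count_parameters(model: torch.nn.Module) -> int:
    # total number of trainable parameters
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def average_inference_time_ms(model, sample, repeats=1000):
    # mean time (ms) to process one sample; CUDA work is synchronized around timing
    model.eval()
    if sample.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        model(sample)
    if sample.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats * 1e3
```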

3.3. Experiment Settings

As a preliminary step to evaluating model performance, we performed a pre-experiment to determine the most effective position for the CSCConv within the LDCVNN. CSCConv is an efficient convolutional module designed to effectively extract channel features of the target while reducing redundant computations. When placed in an appropriate position, it could help the model learn faster and converge more effectively during training. To investigate the optimal placement of CSCConv in the LDCVNN, experiments were conducted by placing CSCConv at three different positions as shown in Figure 4, marked as ①, ②, and ③. In addition, to assess the influence of the number of repetitions of the first CBlock on the overall model performance, the experiment was conducted with repetition counts ranging from 1 to 8. Furthermore, the model design exhibiting the best results in preliminary experiments was then used as the basis for all subsequent ablation and comparative analyses. The RML2016.10a was used for the experiments, with a training/validation/testing split of 6:2:2.
Ablation studies were performed on RML2016.10a. Random sampling without replacement was used to create a 6:2:2 split for the training, validation, and testing sets. The LDCVNN was then compared against LDCVNN-A, LDCVNN-B, LDCVNN-C, and LDCVNN-D to validate the effectiveness of each module within LDCVNN for AMC. Here, LDCVNN represents the baseline model without any ablation modifications. LDCVNN-A, LDCVNN-B, LDCVNN-C, and LDCVNN-D represent versions of LDCVNN in which the CSEFE, PIFE, CSCConv, or the second CBlock were removed, respectively. LDCVNN + DA represents the proposed LDCVNN model trained with data augmentation (DA).
Comparative experiments were conducted on RML2016.10a, RML2016.10b, RML2018.01a, and HisarMod2019.1. Random sampling without replacement was initially applied to create training, validation, and testing set splits with ratios of 6:2:2, 6:2:2, 3:4:4, and 1:1:1, respectively. To evaluate the generalization ability and lightweight characteristics of the LDCVNN, several key AMC models were selected as benchmarks for comparison, as presented in Table 3. During the training process, the initial learning rate for CDSN was set to 0.0001, following Reference [19]. All other parameter settings remained consistent with those described in Section 3.1.

4. Results and Discussions

4.1. Pre-Experiment

The experimental results in Table 4 demonstrate that Position ① achieves the optimal accuracy with the fewest model parameters. This suggests that placing the CSCConv after feature-weighted fusion and before the CBlock helps to reduce redundant information in the intermediate layers, thus more effectively facilitating subsequent deep feature learning. Figure 5 demonstrates that the model attains its peak accuracy on the test set when the repetition count is set to 5.

4.2. Ablation Experiments

Table 5 presents the average accuracy and the number of parameters of the LDCVNN after the ablation studies and with the addition of a DA module. The average accuracy (all SNRs) represents the accuracy averaged over all SNRs and all modulation categories. The results show that the model's accuracy decreases slightly following the ablation operations, while the model capacity is also reduced. Among the removed components, the exclusion of PIFE has the most significant impact on model accuracy, resulting in a 6.18% decrease. This indicates that the PIFE module plays a critical role in modulated signal classification within the model, corroborating its ability to extract complex features rich in phase information, which are essential for AMC tasks. The removal of CSCConv, the second CBlock, or the CSEFE module leads to accuracy drops of 1.61%, 0.88%, and 0.52%, respectively. These findings suggest that these modules contribute to model accuracy without significantly increasing the computational burden, even when the baseline accuracy is already relatively high. Their effectiveness is further demonstrated by their cooperative interactions with other modules, enhancing classification accuracy. This validates the effectiveness of each module in improving the model's performance for AMC.
Additionally, compared to LDCVNN, the classification accuracy of LDCVNN + DA improves by only 0.05%. This result demonstrates that the proposed LDCVNN model can effectively retain the phase information of the original data while preserving the complex-scaling equivariance of complex features, enabling it to capture important characteristics of IQ signals. Consequently, the model does not rely on data augmentation to enhance performance during training. Furthermore, this suggests that the LDCVNN model exhibits strong robustness in handling data, adapting well to the existing training dataset without requiring additional transformations.

4.3. Comparative Experiments

Table 6, Table 7, Table 8 and Table 9 summarize the performance of our model compared to various benchmark models on four datasets: RML2016.10a, RML2016.10b, RML2018.01a, and HisarMod2019.1. Average accuracy (≥0 dB) represents the average accuracy when the SNR is greater than or equal to 0 dB. Highest accuracy represents the maximum average accuracy across SNRs. Figure 6 and Figure 7 present t-SNE visualizations of the feature distributions for various models on RML2016.10a at different SNR levels. The features, extracted before the classifiers of each model, are projected into a two-dimensional coordinate system after dimensionality reduction using t-SNE. As illustrated in Figure 6 and Figure 7, specific modulation categories are represented by different colors. These visualizations demonstrate the data distribution of each modulation category and the degree of separation between different categories.
In terms of model parameters, our model exhibits a significant advantage, maintaining a stable parameter size of 9.0–11.3 K across all datasets. This represents a reduction of three orders of magnitude compared to ResNet (21,450 K). As shown in Table 6, our model outperforms other lightweight models such as ULCNN (9.4 K) and CSDNN (327.1 K) on RML2016.10a, demonstrating superior parameter efficiency. This efficiency stems from the collaborative optimization and lightweight design of the modules within the proposed model, which effectively minimizes parameter redundancy.
Regarding computational complexity, our model achieves extremely low FLOPs of 0.00060 G/Sample on RML2016.10a, which is only 73% of the similarly lightweight ULCNN (0.00082 G/Sample) and three orders of magnitude lower than the computationally intensive ResNet (0.993 G/Sample). This low computational cost makes the model highly suitable for resource-constrained environments such as edge devices and real-time communication systems.
In terms of performance, our model achieves average accuracies of 62.41% and 63.97% on RML2016.10a and RML2016.10b, respectively, surpassing traditional models like CNN2 (51.46%/48.74%) and lightweight models like ULCNN (60.58%/63.09%). Additionally, its performance is comparable to the more computationally expensive MCLDNN (60.83%/65.49%), with a gap of less than 1.5%. This indicates that the lightweight design of our model does not compromise its core classification capabilities. The F1-score trends align with the accuracy results, confirming that the model achieves a balanced trade-off between precision and recall. Meanwhile, our model demonstrates strong performance on RML2018.01a and HisarMod2019.1, which have longer data lengths, achieving slightly higher average and highest accuracies compared to the lightweight ULCNN. On the more challenging dataset, HisarMod2019.1, our model achieves an average accuracy of 43.97%, surpassing most models, slightly exceeding the lightweight ULCNN (43.85%), and remaining competitive with PET-CGDNN. The performance of all models on HisarMod2019.1 is generally lower than that reported in the literature [2]. This is because the size of the training set used in this study is only 12.5% of the training set size reported in the literature [2]. In the field of deep learning, the amount of training data significantly impacts model performance. Typically, more training data enables the model to better learn the patterns and features within the data, thus improving its generalization ability on unseen data. Therefore, when the size of the training set is substantially reduced, the model may not be able to sufficiently learn the necessary features, resulting in decreased performance. Additionally, HisarMod2019.1 is designed to include multiple channel environments and a larger number of modulation categories, further increasing the complexity of the task.
When compared to other CVNNs, such as CDSN and CSDNN, our model exhibits significant advantages in parameter compression, computational efficiency, and generalization. In terms of model parameters, the proposed model achieves a reduction of two orders of magnitude compared to these CVNNs. For example, CDSN requires 1336.1 K/1331.6 K parameters on RML2016.10a and RML2016.10b, while CSDNN requires 327.1 K/315.4 K parameters. In contrast, our model requires only 9.0 K parameters, achieving reductions of 99.33% compared to CDSN and 97.25% compared to CSDNN. Furthermore, our model achieves FLOPs of 0.0006–0.00062 G/Sample on RML2016.10a and RML2016.10b, representing an 82.2% reduction compared to CDSN (0.00337 G/Sample) and a 93.5% reduction compared to CSDNN (0.00923 G/Sample). As shown in Figure 7 (g2–l2), compared to CDSN and CSDNN, our model achieves better data separation for different modulation categories at SNR = 2 dB. Notably, the data for QAM16 and QAM64 are relatively more compact and clustered, which facilitates subsequent classification tasks.
In addition, our model demonstrates superior generalization capabilities on RML2016.10b and HisarMod2019.1. It achieves an average accuracy of 63.97% on RML2016.10b, comparable to CDSN (64.04%) and CSDNN (65.82%), but with only 1.4% (vs. CDSN) and 2.8% (vs. CSDNN) of their parameter counts. And our model achieves an accuracy of 43.97% on HisarMod2019.1, improving by 37.3% over CDSN (32.04%) and by 1.47% over CSDNN (43.33%). The result highlights the effectiveness of the lightweight design of the proposed model in achieving high performance with minimal computational resources.

5. Conclusions

A lightweight dual-branch complex-valued neural network (LDCVNN) is proposed in this paper, which is capable of extracting features enriched with phase information as well as features possessing complex-scaling equivariance. The proposed model, with its compact design of only 9.0 K parameters, achieves the highest average accuracy on RML2016.10a without the need for data augmentation. It also achieves parameter reductions of 99.33% compared to CDSN and 97.25% compared to CSDNN. Extensive evaluations conducted on other datasets, including RML2016.10b, RML2018.01a, and HisarMod2019.1, demonstrate that the proposed method successfully strikes a favorable balance between computational efficiency and classification performance. Compared to contemporary CVNNs employed for AMC, our approach exhibits superior lightweight properties and strong generalization capabilities. This research offers an effective solution for AMC tasks in resource-constrained scenarios, holding significant theoretical and practical implications.
For future work, we suggest exploring strategies to further enhance model performance under more challenging channel conditions. Additionally, investigating methods to enable continuous learning and real-time adaptation in dynamically evolving complex electromagnetic environments would allow the full potential of the model’s lightweight and efficient design to be realized.

Author Contributions

Conceptualization, Z.X. and Y.F. (Youchen Fan); methodology, Z.X.; software, Z.X. and Y.F. (Youchen Fan); validation, Z.X., Y.F. (Youchen Fan), Y.F. (You Fu) and L.Y.; formal analysis, Z.X. and Y.F. (You Fu); investigation, Z.X.; resources, S.F.; data curation, Z.X.; writing—original draft preparation, Z.X.; writing—review and editing, Z.X.; visualization, Z.X. and Y.F. (Youchen Fan); supervision, L.Y.; funding acquisition, S.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Key Basic Research Projects of the Basic Strengthening Program, grant number 2020-JCJQ-ZD-071.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AMC: automatic modulation classification
LDCVNN: lightweight dual-branch complex-valued neural network
CVNN: complex-valued neural network
RVNN: real-valued neural network
IQ: in-phase and quadrature
SCConv: spatial and channel reconstruction convolution
CBlock: complex-valued depthwise separable convolution block
CDSC: complex-valued depthwise separable convolution
CR: cognitive radio
CNN: convolutional neural network
GRU: gated recurrent unit
DA: data augmentation
DSC: depthwise separable convolution
ReLU: rectified linear unit
DWC: depthwise convolution
SRU: spatial reconstruction unit
CRU: channel reconstruction unit
GN: group normalization
PIFE: phase information feature extraction
CSEFE: complex-valued scaling equivariant feature extraction
FEDR: feature extraction and dimension reduction
TP: true positive
FP: false positive
TN: true negative
FN: false negative
FLOPs: floating-point operations
DNN: dense neural network
LSTM: long short-term memory network
CConv: complex-valued convolution
wFM: weighted Fréchet mean filtering
tReLU: tangent rectified linear unit
CMP: complex-valued max pooling
CAP: complex-valued average pooling
CBN: complex-valued batch normalization
CFC: complex-valued full connection
CPWC: complex-valued pointwise convolution
CSCConv: complex-valued spatial and channel reconstruction convolution

References

  1. Huynh-The, T.; Pham, Q.V.; Nguyen, T.V.; Nguyen, T.T.; Ruby, R.; Zeng, M.; Kim, D.S. Automatic Modulation Classification: A Deep Architecture Survey. IEEE Access 2021, 9, 142950–142971. [Google Scholar] [CrossRef]
  2. Zhang, F.; Luo, C.; Xu, J.; Luo, Y.; Zheng, F.-C. Deep learning based automatic modulation recognition: Models, datasets, and challenges. Digit. Signal Process. 2022, 129, 103650. [Google Scholar] [CrossRef]
  3. Dulek, B. Online Hybrid Likelihood Based Modulation Classification Using Multiple Sensors. IEEE Trans. Wirel. Commun. 2017, 16, 4984–5000. [Google Scholar] [CrossRef]
  4. Wen, W.; Mendel, J.M. Maximum-likelihood classification for digital amplitude-phase modulations. IEEE Trans. Commun. 2000, 48, 189–193. [Google Scholar] [CrossRef]
  5. Xu, J.L.; Su, W.; Zhou, M. Likelihood-Ratio Approaches to Automatic Modulation Classification. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2011, 41, 455–469. [Google Scholar] [CrossRef]
  6. Hazza, A.; Shoaib, M.; Alshebeili, S.A.; Fahad, A. An overview of feature-based methods for digital modulation classification. In Proceedings of the 2013 1st International Conference on Communications Signal Processing, and their Applications (ICCSPA), Sharjah, United Arab Emirates, 12–14 February 2013; pp. 1–6. [Google Scholar]
  7. Meng, F.; Chen, P.; Wu, L.; Wang, X. Automatic modulation classification: A deep learning enabled approach. IEEE Trans. Veh. Technol. 2018, 67, 10760–10772. [Google Scholar] [CrossRef]
  8. Snoap, J.A.; Popescu, D.C.; Latshaw, J.A.; Spooner, C.M. Deep-Learning-Based Classification of Digitally Modulated Signals Using Capsule Networks and Cyclic Cumulants. Sensors 2023, 23, 5735. [Google Scholar] [CrossRef]
  9. Snoap, J.A.; Popescu, D.C.; Spooner, C.M. Deep-Learning-Based Classifier With Custom Feature-Extraction Layers for Digitally Modulated Signals. IEEE Trans. Broadcast. 2024, 70, 763–773. [Google Scholar] [CrossRef]
  10. Rajendran, S.; Meert, W.; Giustiniano, D.; Lenders, V.; Pollin, S. Deep Learning Models for Wireless Signal Classification with Distributed Low-Cost Spectrum Sensors. IEEE Trans. Cogn. Commun. Netw. 2018, 4, 433–445. [Google Scholar] [CrossRef]
  11. Xu, J.; Luo, C.; Parr, G.; Luo, Y. A Spatiotemporal Multi-Channel Learning Framework for Automatic Modulation Recognition. IEEE Wirel. Commun. Lett. 2020, 9, 1629–1632. [Google Scholar] [CrossRef]
  12. Liu, X.; Yang, D.; El Gamal, A. Deep Neural Network Architectures for Modulation Classification. In Proceedings of the 51st IEEE Asilomar Conference on Signals Systems, and Computers, Pacific Grove, CA, USA, 29 October–1 November 2017; pp. 915–919. [Google Scholar]
  13. Clarke, T.L. Generalization of neural networks to the complex plane. In Proceedings of the 1990 IJCNN International Joint Conference on Neural Networks, San Diego, CA, USA, 17–21 June 1990; pp. 435–440. [Google Scholar]
  14. Trabelsi, C.; Bilaniuk, O.; Zhang, Y.; Serdyuk, D.; Subramanian, S.; Santos, J.F.; Mehri, S.; Rostamzadeh, N.; Bengio, Y.; Pal, C.J. Deep Complex Networks. arXiv 2017, arXiv:1705.09792. [Google Scholar]
  15. Hirose, A.; Yoshida, S. Comparison of Complex- and Real-Valued Feedforward Neural Networks in Their Generalization Ability. In Neural Information Processing; Lu, B.-L., Zhang, L., Kwok, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; Volume 7062, pp. 526–531. [Google Scholar]
  16. Lee, C.; Hasegawa, H.; Gao, S. Complex-Valued Neural Networks: A Comprehensive Survey. IEEE-CAA J. Autom. Sin. 2022, 9, 1406–1426. [Google Scholar] [CrossRef]
  17. Xu, J.; Wu, C.; Ying, S.; Li, H. The Performance Analysis of Complex-Valued Neural Network in Radio Signal Recognition. IEEE Access 2022, 10, 48708–48718. [Google Scholar] [CrossRef]
  18. Li, W.; Xie, W.; Wang, Z. Complex-Valued Densely Connected Convolutional Networks. In Data Science; Zeng, J., Jing, W., Song, X., Lu, Z., Eds.; Springer: Singapore, 2020; Volume 1257, pp. 299–309. [Google Scholar]
  19. Tu, Y.; Lin, Y.; Hou, C.; Mao, S. Complex-Valued Networks for Automatic Modulation Classification. IEEE Trans. Veh. Technol. 2020, 69, 10085–10089. [Google Scholar] [CrossRef]
  20. Kim, S.; Yang, H.-Y.; Kim, D. Fully Complex Deep Learning Classifiers for Signal Modulation Recognition in Non-Cooperative Environment. IEEE Access 2022, 10, 20295–20311. [Google Scholar] [CrossRef]
  21. Cheng, R.; Chen, Q.; Huang, M. Automatic modulation recognition using deep CVCNN-LSTM architecture. Alex. Eng. J. 2024, 104, 162–170. [Google Scholar] [CrossRef]
  22. Chakraborty, R.; Wang, J.; Yu, S.X. SurReal: Fréchet Mean and Distance Transform for Complex-Valued Deep Learning. arXiv 2019, arXiv:1906.10048. [Google Scholar]
  23. Chakraborty, R.; Xing, Y.; Yu, S.X. SurReal: Complex-Valued Learning as Principled Transformations on a Scaling and Rotation Manifold. IEEE Trans. Neural Netw. Learn. Syst. 2020, 3, 940–951. [Google Scholar] [CrossRef]
  24. Singhal, U.; Xing, Y.; Yu, S.X. Co-domain Symmetry for Complex-Valued Deep Learning. arXiv 2021, arXiv:2112.01525. [Google Scholar]
  25. Zhang, F.; Luo, C.; Xu, J.; Luo, Y. An Efficient Deep Learning Model for Automatic Modulation Recognition Based on Parameter Estimation and Transformation. IEEE Commun. Lett. 2021, 25, 3287–3290. [Google Scholar] [CrossRef]
  26. Lin, Y.; Tu, Y.; Dou, Z. An Improved Neural Network Pruning Technology for Automatic Modulation Classification in Edge Devices. IEEE Trans. Veh. Technol. 2020, 69, 5703–5706. [Google Scholar] [CrossRef]
  27. Tu, Y.; Lin, Y. Deep Neural Network Compression Technique Towards Efficient Digital Signal Modulation Recognition in Edge Device. IEEE Access 2019, 7, 58113–58119. [Google Scholar] [CrossRef]
  28. Zhang, X.; Zhao, H.; Zhu, H.; Adebisi, B.; Gui, G.; Gacanin, H.; Adachi, F. NAS-AMR: Neural Architecture Search-Based Automatic Modulation Recognition for Integrated Sensing and Communication Systems. IEEE Trans. Cogn. Commun. Netw. 2022, 8, 1374–1386. [Google Scholar] [CrossRef]
  29. Fu, X.; Gui, G.; Wang, Y.; Ohtsuki, T.; Adachi, F. Lightweight Automatic Modulation Classification Based on Decentralized Learning. IEEE Trans. Cogn. Commun. Netw. 2021, 8, 57–70. [Google Scholar] [CrossRef]
  30. Lin, Y.; Zha, H.; Tu, Y.; Zhang, S.; Yan, W.; Xu, C. GLR-SEI: Green and Low Resource Specific Emitter Identification Based on Complex Networks and Fisher Pruning. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 8, 3239–3250. [Google Scholar] [CrossRef]
  31. Wang, F.; Shang, T.; Hu, C.; Liu, Q. Automatic Modulation Classification Using Hybrid Data Augmentation and Lightweight Neural Network. Sensors 2023, 23, 4187. [Google Scholar] [CrossRef]
  32. Guo, L.; Wang, Y.; Liu, Y.; Lin, Y.; Zhao, H.; Gui, G. Ultralight Convolutional Neural Network for Automatic Modulation Classification in Internet of Unmanned Aerial Vehicles. IEEE Internet Things J. 2024, 11, 20831–20839. [Google Scholar] [CrossRef]
  33. Xiao, C.; Yang, S.; Feng, Z. Complex-Valued Depthwise Separable Convolutional Neural Network for Automatic Modulation Classification. IEEE Trans. Instrum. Meas. 2023, 72, 2522310. [Google Scholar] [CrossRef]
  34. Fréchet, M.R. Les éléments aléatoires de nature quelconque dans un espace distancié. Ann. L’institut Henri Poincaré 1948, 10, 215–310. [Google Scholar]
  35. Li, J.; Wen, Y.; He, L. SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 6153–6162. [Google Scholar]
  36. O’Shea, T.J.; West, N. Radio Machine Learning Dataset Generation with GNU Radio. In Proceedings of the GNU Radio Conference, Boulder, CO, USA, 12–16 September 2016. [Google Scholar]
  37. O’Shea, T.J.; Corgan, J.; Clancy, T.C. Convolutional radio modulation recognition networks. In Proceedings of the Engineering Applications of Neural Networks: 17th International Conference, EANN 2016, Aberdeen, UK, 2–5 September 2016; Proceedings 17. pp. 213–226. [Google Scholar]
  38. O’Shea, T.J.; Roy, T.; Clancy, T.C. Over-the-Air Deep Learning Based Radio Signal Classification. IEEE J. Sel. Top. Signal Process. 2018, 12, 168–179. [Google Scholar] [CrossRef]
  39. Tekbıyık, K.; Ekti, A.R.; Görçin, A.; Kurt, G.K.; Keçeci, C. Robust and Fast Automatic Modulation Classification with CNN under Multipath Fading Channels. In Proceedings of the 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, 25–28 May 2020; pp. 1–6. [Google Scholar]
  40. Njoku, J.N.; Morocho-Cayamcela, M.E.; Lim, W. CGDNet: Efficient Hybrid Deep Learning Model for Robust Automatic Modulation Recognition. IEEE Netw. Lett. 2021, 3, 47–51. [Google Scholar] [CrossRef]
  41. Hermawan, A.P.; Ginanjar, R.R.; Kim, D.S.; Lee, J.M. CNN-Based Automatic Modulation Classification for Beyond 5G Communications. IEEE Commun. Lett. 2020, 24, 1038–1041. [Google Scholar] [CrossRef]
  42. Zeng, Y.; Zhang, M.; Han, F.; Gong, Y.; Zhang, J. Spectrum Analysis and Convolutional Neural Network for Automatic Modulation Recognition. IEEE Wirel. Commun. Lett. 2019, 8, 929–932. [Google Scholar] [CrossRef]
Figure 1. Schematic diagrams of Complex-Valued Depthwise Separable Convolution Block (CBlock). (a) The structure of the CBlock; (b) The structure of the one-dimensional complex-valued depthwise separable convolution (CDSC1d).
Figure 2. The structure of Complex-Valued Spatial and Channel Reconstruction Convolution (CSCConv).
Figure 3. The structure of the proposed LDCVNN.
Figure 4. The positions of CSCConv.
Figure 5. Effect of the repetition counts of the first CBlock.
Figure 6. t-SNE visualization of model feature distributions on RML2016.10a at SNR = −6 dB.
Figure 7. t-SNE visualization of model feature distributions on RML2016.10a at SNR = 2 dB.
Table 1. AMR Open-source Datasets.
Dataset | Classes * | SNR (dB) | Dimension | Size
RML2016.10a [36] | 11 | −20:2:18 | 2 × 128 | 220 K
RML2016.10b [37] | 10 | −20:2:18 | 2 × 128 | 1.2 M
RML2018.01a [38] | 24 | −20:2:30 | 2 × 1024 | 2.56 M
HisarMod2019.1 [39] | 26 | −20:2:18 | 2 × 1024 | 780 K
* Classes is the number of modulation types.
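For orientation, RML2016.10a and RML2016.10b are commonly distributed as pickled Python dictionaries keyed by (modulation, SNR) pairs, each mapping to an array of 2 × 128 I/Q frames. The snippet below is a minimal loading sketch under that assumption; the file name is a placeholder.

# Minimal loading sketch for RML2016.10a, assuming the commonly distributed
# pickle format: {(modulation, snr): ndarray of shape (N, 2, 128)}.
import pickle
import numpy as np

with open("RML2016.10a_dict.pkl", "rb") as f:          # file name is a placeholder
    data = pickle.load(f, encoding="latin1")            # latin1 handles the Python 2 pickle

X, mods, snrs = [], [], []
for (mod, snr), frames in data.items():
    X.append(frames)                                    # frames: (N, 2, 128) real/imag rows
    mods += [mod] * len(frames)
    snrs += [snr] * len(frames)
X = np.concatenate(X)                                   # (220000, 2, 128) for RML2016.10a
print(X.shape, len(set(mods)), "classes,", len(set(snrs)), "SNR levels")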
Table 2. LDCVNN Layer Details with Varying Input Lengths (L1 = 128, L2 = 1024).
Layer | Input Shape (L1 / L2) | Kernel (L1 / L2) | Stride | Output Shape (L1 / L2)
wFM | [2,1,1,128] / [2,1,1,1024] | 7 / 9 | 2 | [2,16,1,61] / [2,16,1,508]
tReLU | [2,16,1,61] / [2,16,1,508] | - | - | [2,16,1,61] / [2,16,1,508]
wFM | [2,16,1,61] / [2,16,1,508] | 7 / 9 | 1 | [2,16,1,55] / [2,16,1,500]
tReLU | [2,16,1,55] / [2,16,1,500] | - | - | [2,16,1,55] / [2,16,1,500]
CDSC1d | [2,128] / [2,1024] | 7 / 9 | 2 | [32,61] / [32,508]
CBN + CReLU | [32,61] / [32,508] | - | - | [32,61] / [32,508]
CDSC1d | [32,61] / [32,508] | 7 / 9 | 1 | [32,55] / [32,500]
CBN + CReLU | [32,55] / [32,500] | - | - | [32,55] / [32,500]
Weighted feature fusion | [2,16,1,55], [32,55] / [2,16,1,500], [32,500] | - | - | [32,55] / [32,500]
CBlock (Rep = 5) | [32,55] / [32,500] | 3 / 3 | 2 | [48,28] / [48,250]
CBlock (Rep = 1) | [48,28] / [48,250] | 1 / 1 | 1 | [48,28] / [48,250]
CAP | [48,28] / [48,250] | - | - | [48,1] / [48,1]
CFC | [48] / [48] | - | - | [classes *] / [classes *]
* classes is the number of modulation types.
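The feature-map lengths in Table 2 follow the usual 1-D convolution formula L_out = ⌊(L_in + 2p − k) / s⌋ + 1. The short check below reproduces the 128 → 61 → 55 and 1024 → 508 → 500 progressions of the front-end layers and the 55 → 28 / 500 → 250 step of the first CBlock stack; the padding values used here (0 for the front end, 1 inside the CBlock) are assumptions chosen to match the table, since they are not listed explicitly.

# Sanity check of the feature-map lengths listed in Table 2.
# Padding values are assumptions chosen to reproduce the table.
def conv_out_len(l_in, k, s, p=0):
    return (l_in + 2 * p - k) // s + 1

# Front end (both branches): kernel 7 (L1) / 9 (L2), stride 2 then stride 1, no padding.
assert conv_out_len(128, 7, 2) == 61 and conv_out_len(61, 7, 1) == 55
assert conv_out_len(1024, 9, 2) == 508 and conv_out_len(508, 9, 1) == 500

# First CBlock stack: kernel 3, stride 2, padding 1 (assumed): 55 -> 28 and 500 -> 250.
assert conv_out_len(55, 3, 2, p=1) == 28 and conv_out_len(500, 3, 2, p=1) == 250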
Table 3. Main architectures of the compared models.
Model | Main Structure | Network Type
CNN2 [39] | CNN + DNN | RVNN
CLDNN [12] | CNN + LSTM + DNN | RVNN
CGDNN [40] | CNN + GRU + DNN | RVNN
ResNet [12] | ResNet | RVNN
MCLDNN [11] | CNN + LSTM + Multi-channel | RVNN
ICAMC [41] | CNN + Gaussian noise | RVNN
SCNN [42] | CNN | RVNN
PET-CGDNN [25] | CNN + GRU + DNN | PET + RVNN
ULCNN [32] | CV-CNN + GRU + DNN | DA + RVNN (with CV-CNN)
CDSN [19] | CV-CNN + CV-DNN | CVNN
CSDNN [33] | CV-CNN + Residual Connection + CDSC + CV-DNN | CVNN
Table 4. Effect of placing CSCConv at different positions.
Position | Capacity * | Average Accuracy
- | 9025 | 62.41%
- | 9977 | 61.40%
- | 9977 | 61.43%
* Capacity is the number of model parameters.
Table 5. The ablation experiments on RML2016.10a.
Model | Ablation Part | Average Accuracy (All SNRs) | Capacity *
LDCVNN | - | 62.41% | 9025
LDCVNN-A | w/o CSEFE | 61.89% (↓0.52%) | 8332
LDCVNN-B | w/o PIEF | 56.23% (↓6.18%) | 8051
LDCVNN-C | w/o CSCConv | 60.80% (↓1.61%) | 8225
LDCVNN-D | w/o CBlock | 61.53% (↓0.88%) | 7609
LDCVNN + DA | - | 62.46% (↑0.05%) | -
* Capacity is the number of model parameters.
Table 6. Performance metrics on RML2016.10a.
Model | Average Accuracy (All SNRs) | Average Accuracy (≥0 dB) | Highest Accuracy | F1-Score (%) | Capacity * (K) | FLOPs (G/Sample) | Test Time (ms/Sample)
CNN2 | 51.46% | 74.85% | 76.45% | 47.99 | 858.1 | 0.07878 | 0.6350
CLDNN | 59.53% | 88.23% | 89.36% | 56.88 | 76.7 | 0.01991 | 0.1340
CGDNN | 59.93% | 89.48% | 90.77% | 57.59 | 52.0 | 0.00170 | 0.0454
ResNet | 46.32% | 71.34% | 74.64% | 42.33 | 3098.3 | 0.12413 | 0.8657
MCLDNN | 60.83% | 89.82% | 90.77% | 58.73 | 405.8 | 0.04869 | 0.1028
ICAMC | 51.47% | 75.34% | 76.91% | 53.09 | 1263.6 | 0.01466 | 0.0624
SCNN | 55.45% | 82.81% | 84.36% | 47.92 | 104.1 | 0.00189 | 0.0635
PET-CGDNN | 60.85% | 90.31% | 91.55% | 58.68 | 71.8 | 0.00831 | 0.0515
ULCNN | 60.58% | 90.11% | 91.19% | 58.65 | 9.4 | 0.00082 | 0.0605
CDSN | 50.59% | 74.82% | 76.55% | 49.29 | 1336.1 | 0.00923 | 0.0570
CSDNN | 58.66% | 86.91% | 88.09% | 56.90 | 327.1 | 1.1811 | 0.0787
Our Model | 62.41% | 91.63% | 92.36% | 59.58 | 9.0 | 0.00060 | 0.4210
* Capacity is the number of model parameters.
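The accuracy columns in Tables 6–9 summarize per-SNR results: the average over all SNRs, the average over SNRs of 0 dB and above, the best single-SNR accuracy, and the macro-averaged F1-score. A minimal sketch of this aggregation is given below; it reflects a common evaluation convention rather than the authors' exact code, and assumes labels, predictions, and per-sample SNRs are available as arrays.

# Minimal sketch of the per-SNR aggregation behind Tables 6-9 (assumed convention).
import numpy as np
from sklearn.metrics import f1_score

def summarize(y_true, y_pred, snr):
    y_true, y_pred, snr = map(np.asarray, (y_true, y_pred, snr))
    per_snr = {s: float(np.mean(y_pred[snr == s] == y_true[snr == s]))
               for s in np.unique(snr)}
    return {
        "avg_acc_all": float(np.mean(list(per_snr.values()))),   # all SNRs
        "avg_acc_ge0": float(np.mean([a for s, a in per_snr.items() if s >= 0])),
        "highest_acc": max(per_snr.values()),                    # best single-SNR accuracy
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
    }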
Table 7. Performance metrics on RML2016.10b.
Model | Average Accuracy (All SNRs) | Average Accuracy (≥0 dB) | Highest Accuracy | F1-Score (%) | Capacity * (K) | FLOPs (G/Sample) | Test Time (ms/Sample)
CNN2 | 48.74% | 70.88% | 71.98% | 45.22 | 858.0 | 0.07878 | 0.6980
CLDNN | 61.38% | 89.41% | 90.25% | 59.70 | 76.4 | 0.01991 | 0.1403
CGDNN | 63.71% | 92.31% | 92.90% | 62.47 | 49.7 | 0.00170 | 0.0448
ResNet | 47.65% | 72.64% | 75.59% | 45.61 | 3098.2 | 0.12413 | 0.8758
MCLDNN | 65.49% | 93.50% | 94.09% | 64.31 | 405.7 | 0.04869 | 0.1103
ICAMC | 59.85% | 87.56% | 88.67% | 58.51 | 1263.5 | 0.01466 | 0.0590
SCNN | 53.48% | 76.76% | 77.67% | 50.62 | 95.9 | 0.00188 | 0.0582
PET-CGDNN | 65.00% | 93.30% | 93.86% | 64.30 | 71.6 | 0.00831 | 0.0508
ULCNN | 63.09% | 91.72% | 92.46% | 62.18 | 9.3 | 0.00082 | 0.0653
CDSN | 64.04% | 86.19% | 87.08% | 63.24 | 1331.6 | 0.00337 | 0.0606
CSDNN | 65.82% | 91.34% | 91.73% | 65.13 | 315.4 | 0.00923 | 0.0900
Our Model | 63.97% | 92.29% | 93.18% | 63.00 | 9.0 | 0.00062 | 0.4209
* Capacity is the number of model parameters.
Table 8. Performance metrics on RML2018.01a.
Model | Average Accuracy (All SNRs) | Average Accuracy (≥0 dB) | Highest Accuracy | F1-Score (%) | Capacity * (K) | FLOPs (G/Sample) | Test Time (ms/Sample)
CNN2 | 52.81% | 76.15% | 81.73% | 50.78 | 1777.3 | 0.6302 | 2.5694
CLDNN | 47.42% | 69.03% | 73.93% | 44.90 | 80.0 | 0.1669 | 2.0549
CGDNN | 39.29% | 56.38% | 58.61% | 33.44 | 512.3 | 0.0151 | 1.8561
ResNet | 41.95% | 63.65% | 70.65% | 41.30 | 21450.0 | 0.9930 | 2.7234
MCLDNN | 61.56% | 90.04% | 96.76% | 60.11 | 407.5 | 0.4019 | 2.0184
ICAMC | 47.21% | 68.31% | 72.39% | 44.64 | 8605.3 | 0.1183 | 2.4862
SCNN | 30.09% | 41.22% | 43.03% | 24.81 | 1586.9 | 0.0160 | 2.4636
PET-CGDNN | 61.01% | 89.16% | 96.21% | 59.85 | 75.2 | 0.0719 | 1.8814
ULCNN | 58.55% | 85.40% | 92.57% | 57.17 | 9.9 | 0.0067 | 2.3571
CDSN | 40.20% | 56.75% | 58.58% | 37.06 | 1362.7 | 0.0120 | 3.4023
CSDNN | 57.91% | 84.19% | 90.82% | 56.27 | 333.8 | 0.0741 | 3.5972
Our Model | 60.17% | 88.91% | 96.12% | 59.18 | 11.2 | 0.0066 | 3.6759
* Capacity is the number of model parameters.
Table 9. Performance metrics on HisarMod2019.1.
Model | Average Accuracy (All SNRs) | Average Accuracy (≥0 dB) | Highest Accuracy | F1-Score (%) | Capacity * (K) | FLOPs (G/Sample) | Test Time (ms/Sample)
CNN2 | 36.31% | 39.84% | 40.54% | 35.39 | 1777.6 | 0.6302 | 2.5212
CLDNN | 39.82% | 49.74% | 51.50% | 39.16 | 80.5 | 0.1669 | 1.9662
CGDNN | 33.45% | 37.45% | 38.31% | 32.67 | 552.7 | 0.0151 | 1.8690
ResNet | 32.47% | 35.82% | 37.04% | 31.91 | 21450.3 | 0.0210 | 2.7768
MCLDNN | 43.09% | 49.60% | 51.65% | 42.93 | 407.7 | 0.1452 | 2.0062
ICAMC | 26.49% | 29.02% | 29.69% | 24.26 | 8605.6 | 0.1183 | 2.3548
SCNN | 30.89% | 34.01% | 34.54% | 30.23 | 1718.0 | 0.0161 | 2.3327
PET-CGDNN | 47.05% | 52.13% | 53.35% | 46.77 | 75.5 | 0.0719 | 1.8153
ULCNN | 43.85% | 48.10% | 49.77% | 42.32 | 9.9 | 0.0054 | 2.3179
CDSN | 32.04% | 36.13% | 37.23% | 29.24 | 1360.3 | 0.0120 | 2.3788
CSDNN | 43.33% | 48.95% | 50.35% | 41.76 | 334.8 | 0.0741 | 2.4996
Our Model | 43.97% | 50.47% | 53.08% | 43.02 | 11.3 | 0.0067 | 2.5524
* Capacity is the number of model parameters.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
