1. Introduction
Unmanned Aerial Vehicles (UAVs), commonly referred to as drones, have a wide variety of both military and civilian applications [1]. In the civilian sector they are used in advertising, transportation, firefighting, search and rescue operations, traffic monitoring, and atmospheric studies, whereas in the military they are used mostly for reconnaissance. The applications and usage of drones are continuously expanding, and this poses security risks. Therefore, it is critical to determine the existence and the specific role and application of each drone. Conventional techniques, including the use of sensors, acoustic signals, and radar, are employed for this purpose, although they are not sufficiently dependable [2]. This study reviews the use of radar, electro-optical, thermal, and acoustic sensors before turning to radio frequency (RF)-based drone detection and classification. This section first surveys traditional drone detection research, followed by RF-based studies. A radar sensor is one typical instrument for spotting flying vehicles. It is not impacted by bad weather and offers longer-range operation than other sensors [3]. Nevertheless, it falls short when it comes to identifying small, slowly moving objects such as drones [4]. To classify surveillance radar data as birds or UAVs, a probabilistic motion model has been developed [5]. Additionally, the technique in [6] performs UAV classification utilizing Linear Frequency Modulated Continuous Wave (LFMCW) 2D surveillance radar data. According to a different study, radar sensors are generally dependable for detecting drones, notwithstanding their inefficiency in categorization [7].
Optical sensors, on the other hand, are an all-purpose tool for image processing. Images are first taken with optical cameras for the purpose of classifying UAVs. These images are then classified using Deep Neural Network (DNN)-based image processing techniques such as Faster-RCNN [8], VGG-16 [9], and ZF-net [10], or classical methods such as the Histogram of Oriented Gradients (HOG) [11]. All of these techniques require high-resolution cameras to effectively distinguish drones from background imagery. Furthermore, it can be challenging to tell UAVs apart from small objects such as birds [7]. As an alternative strategy to address this issue, the use of thermal sensors is recommended. They can capture and detect the heat emitted by objects that optical sensors are unable to see. Panoramic thermal cameras are used in [12] to detect drones because they work well for nighttime surveillance and can take pictures in inclement weather, such as rain, snow, and fog. Nevertheless, a significant drawback is their susceptibility to moisture [7].
Acoustic sensors are another kind of sensor utilized for drone detection, alongside radar and imaging sensors. These low-cost sensors have the potential to differentiate UAV noise from other noise sources; however, they are themselves significantly affected by ambient noise [7]. For instance, radar and acoustic sensors are combined in [13] to identify UAV rotor types. In related work on emitter identification, the authors of [14] propose a Green and Low-Resource Specific Emitter Identification (GLR-SEI) method using complex networks and Fisher pruning; evaluated with real-world ADS-B data, their method showed promising results. In [15], the authors introduced a Long-Tailed Specific Emitter Identification (LT-SEI) method using Decoupled Representation (DR) learning, which demonstrated superior long-tail recognition performance on real-world ADS-B data. Furthermore, UAVs are detected using high-resolution cameras [16]. Real-time drone identification and tracking using acoustic sensors with the K-Nearest Neighbors (KNN) approach are investigated in [17]. However, according to reports, UAV identification becomes challenging if the acoustic sensor is placed more than 150 m away from the UAV [18]. Lately, drone-emitted radio frequency signals have been used as an alternative detection method [19]. For that purpose, a sizable dataset called DroneRF was assembled from various drones [19]. This dataset is used to detect and classify drones in the surrounding area with different techniques.
Keeping in view the limitations of conventional methods, recent research has performed UAV detection and classification using DNN methods [20,21]. Drone signal detection and classification were studied using the DroneRF dataset and the 1D-CNN method [22,23], along with 10-fold cross-validation. Whereas these studies use both low-band and high-band RF signals, the work in [24] exclusively uses low-band signals and additionally employs the XGBoost algorithm. To distinguish between various drones, the authors of [25] suggested Deep Complex-valued Convolutional Neural Network (DC-CNN)-based RF fingerprinting, performing drone detection and classification with nine alternative algorithm models and two distinct RF drone signal datasets. In [26], a drone detection mechanism is developed and its performance is examined in terms of the distance to the RF signal source. Five distinct drones are used in [27] to estimate the loads carried by drones at distances of up to 200 m.
As mentioned in the previous paragraphs, drone classification is not restricted to security and defense. Plenty of applications have emerged with the advancements in future 6G technology [28]. Drone signal classification in 6G networks offers multifaceted benefits. By accurately discerning between different drone signals, 6G networks can optimize the allocation of resources such as bandwidth and power, ensuring efficient communication for drones [29]. This classification capability also bolsters security measures by distinguishing between legitimate drone activities and potential threats, enhancing airspace safety [30]. Moreover, it enables dynamic spectrum sharing, optimizing spectrum utilization and minimizing interference, thus boosting overall network capacity and performance. Additionally, by adapting mobility management strategies based on signal classification, 6G networks can ensure seamless connectivity and efficient handover between ground-based and aerial networks [31]. Furthermore, tailored Quality of Service (QoS) guarantees and optimizations for diverse drone applications can be provided, fostering the widespread adoption of drone-based services across various sectors in 6G networks [32].
Recently, deep ensemble learning-based approaches have also gained significant attention from the research community. These techniques have been widely applied with promising results, demonstrating particular strength in handling and finding complex patterns in data. In [33], the authors introduced a framework utilizing neural network-based concepts and reduced feature vectors, combined with multiple machine learning techniques, to accurately classify mitotic and non-mitotic cells in breast cancer histology images, outperforming existing methods in accuracy and efficiency. Similarly, in [34], the authors propose DCNN-4mC, a neural network-based tool for accurate identification of DNA N4-methylcytosine (4mC) sites, achieving superior performance across multiple species datasets compared to existing computational tools. In [35], the authors introduce m6A-NeuralTool, a computational model for efficient identification of N6-methyladenosine (m6A) sites that achieves superior accuracy compared to existing models across multiple species datasets, facilitating rapid and accurate identification of m6A modifications for biomedical research and product development. In [36], the authors present DL-m6A, a deep learning-based tool for identifying m6A sites in mammals that outperforms existing methods across tissue-specific and full-transcript datasets; it offers enhanced accuracy and contextual feature representation and is accessible via a freely available web server. In [37], the authors introduce MCSE-enhancer, a multi-classifier stacked ensemble model that effectively identifies enhancers by integrating experimental and computational approaches; leveraging physiochemical properties as feature descriptors and a stacked classifier, it outperforms previous techniques with an accuracy of 81.5%. Although ensemble learning has shown its strengths in machine vision, image processing, and microbiology, research on RF signal classification, and drone classification in particular, remains very limited. This study uses a benchmark drone dataset [38] to improve classification resilience and accuracy. More precisely, the major contributions of this manuscript are:
Proposing a novel feature extraction technique (manual signal processing-based) to complement automatic feature extraction by deep network layers (convolution and Long Short-Term Memory [LSTM] layers).
Implementing both automatic and manual feature fusion within the Composite Ensemble Learning (CEL) framework for enhanced pattern extraction.
Comparing the proposed method with the state-of-the-art deep learning techniques to demonstrate its efficacy.
The rest of the paper is structured as follows. The methodology is presented in Section 2. The dataset used is explained in Section 3. The experiments, results, and analysis of the study are presented in Section 4. The discussion is in Section 5, and finally, the conclusion of the paper is presented in Section 6.
2. Proposed Methodology
Figure 1 shows the main block diagram of the proposed methodology. The process starts with multiple IQ signals that are interleaved into an ordered series of real numbers, making them analytically usable. This interleaving step is important for translating the complex-valued IQ data into a structure compatible with the subsequent deep learning stages.

The interleaved IQ signals are then passed through the denoising block, which performs signal filtering to enhance signal quality and reduce variance caused by background noise. This step is crucial, as it increases the efficiency of the subsequent feature extraction. For signal denoising, discrete wavelets are employed; they are well suited to denoising through multiresolution analysis, since they analyze the noise and the signal attributes at various levels of resolution. They perform localized filtering in both the time and frequency domains, which helps to correctly filter out embedded noise while preserving the important aspects of the signal. Moreover, wavelet-based thresholding techniques are adjusted to the particular characteristics of the noise, which further improves signal quality.

After denoising, the signal flows into three parallel streams for feature extraction. The first-derivative branch computes the first derivative of the denoised signal, which captures rate-of-change information indicating large fluctuations. The second-derivative branch computes the second derivative to find accelerations and other fine details of the signal beyond the slope. The CNN and LSTM branch performs automatic feature extraction: CNNs extract spatial features, while LSTMs address temporal information, allowing the model to harness the structure of sequential data. Features from the three branches are combined into a composite feature vector consisting of both manually engineered and automatically generated features. This composite vector is then passed through dense layers that add more depth to the features, and a final softmax layer that outputs a probability for each drone class, yielding the classification decision.
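As a minimal illustration of the interleaving step, the sketch below converts complex IQ samples into the ordered real-valued sequence described above (NumPy-based; the function name and example values are our own, not taken from the paper):

```python
import numpy as np

def interleave_iq(iq: np.ndarray) -> np.ndarray:
    """Interleave complex IQ samples into a real-valued sequence
    [I0, Q0, I1, Q1, ...] suitable for a real-input network."""
    out = np.empty(2 * iq.size, dtype=np.float64)
    out[0::2] = iq.real  # in-phase components at even indices
    out[1::2] = iq.imag  # quadrature components at odd indices
    return out

# Example: four synthetic IQ samples
iq = np.array([1 + 2j, 3 - 1j, -0.5 + 0.25j, 2 + 0j])
print(interleave_iq(iq))  # [ 1.    2.    3.   -1.   -0.5   0.25  2.    0.  ]
```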
2.1. Signal Model and Denoising
The first step is reading time-domain signals from drones using a benchmark dataset [38], as illustrated in Figure 2a. Signals acquired as complex IQ samples cannot be processed directly by the network, because only real values are accepted at the input layer. Therefore, each complex value is first decomposed into its real and imaginary parts, and the samples are interleaved to form an array of real values. The measurement model is represented by

$$ y(n) = x(n) + w(n), \tag{1} $$

where $y(n)$ is the measured noisy signal, $x(n)$ is the original signal, and $w(n)$ is Gaussian noise with zero mean. The equivalent vector representation of (1) is given by

$$ \mathbf{y} = \mathbf{x} + \mathbf{w}. \tag{2} $$

The signal is denoised by first calculating the wavelet coefficients,

$$ c_{j,k} = \mathrm{DWT}\{\, y(n) \,\}. \tag{3} $$

The threshold values are determined after obtaining the wavelet coefficients. We apply hard thresholding due to its simplicity, where all wavelet coefficients whose absolute values fall below the threshold are set to zero. The hard thresholding function is defined as

$$ \hat{c}_{j,k} = \begin{cases} c_{j,k}, & |c_{j,k}| \geq \lambda \\ 0, & |c_{j,k}| < \lambda \end{cases} \tag{4} $$

where $c_{j,k}$ are the wavelet coefficients and $\lambda$ is the chosen threshold. The coefficients in (4) are expressed in the wavelet domain. To convert back into the time domain, the inverse discrete wavelet transform is computed, obtaining the denoised signal in (7), as illustrated in Figure 2b:

$$ \hat{x}(n) = \mathrm{IDWT}\{\, \hat{c}_{j,k} \,\}. \tag{7} $$
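The following sketch illustrates this denoising chain using the PyWavelets package. The wavelet family (`db4`), the decomposition level, and the median-based universal threshold are our assumptions, as the paper does not specify them:

```python
import numpy as np
import pywt

def wavelet_denoise(y: np.ndarray, wavelet: str = "db4", level: int = 4) -> np.ndarray:
    """Hard-threshold wavelet denoising following Eqs. (3), (4), and (7)."""
    coeffs = pywt.wavedec(y, wavelet, level=level)   # Eq. (3): DWT coefficients
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745   # noise estimate (our assumption)
    lam = sigma * np.sqrt(2 * np.log(len(y)))        # universal threshold (our assumption)
    denoised = [coeffs[0]] + [
        pywt.threshold(c, lam, mode="hard")          # Eq. (4): hard thresholding
        for c in coeffs[1:]
    ]
    return pywt.waverec(denoised, wavelet)[: len(y)] # Eq. (7): inverse DWT
```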
2.2. Feature Extraction
The processed signal in (7) is then split into two branches: one branch for manual feature extraction, and a parallel stacked branch for automatic feature extraction. The first and second derivatives of a time-domain signal play crucial roles in pattern extraction and analysis across diverse fields. In signal processing, the first derivative offers insight into the rate of change of the signal at each point, effectively capturing the slope or gradient of the waveform. It also aids in edge detection, facilitating the delineation of boundaries between regions of interest.
The second derivative provides additional depth to pattern analysis by revealing curvature-related attributes of the time-domain signal. It denotes the rate of change of the slope and effectively highlights concave and convex regions within the waveform. This information aids in detecting inflection points, where the curvature changes direction, signifying potential shifts or transitions in the underlying pattern. Moreover, the second derivative enhances the robustness of pattern extraction algorithms by facilitating noise reduction. Focusing on the curvature of the signal helps filter out high-frequency noise, thereby improving the SNR and enhancing the accuracy of pattern recognition of drone signals.
Then, we fuse both feature vectors to form our final manually crafted feature vector, represented as

$$ \mathbf{f}_m = \left[\, \hat{x}'(n),\ \hat{x}''(n) \,\right], \tag{10} $$

where $\hat{x}'(n)$ and $\hat{x}''(n)$ denote the first and second derivatives of the denoised signal.
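A minimal sketch of the manual branch, computing and fusing the derivative features of (10) (the use of `np.gradient` as the discrete derivative is our choice):

```python
import numpy as np

def manual_features(x_hat: np.ndarray) -> np.ndarray:
    """Fuse first- and second-derivative features into one vector, Eq. (10)."""
    d1 = np.gradient(x_hat)  # first derivative: rate of change / slope
    d2 = np.gradient(d1)     # second derivative: curvature / inflection cues
    return np.concatenate([d1, d2])
```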
In the domain of pattern recognition and automatic feature extraction, convolutional layers are instrumental due to their proficiency in discerning spatial patterns within the input data. Through convolution operations employing trainable filters, CNNs adeptly detect intricate features like edges, textures, and shapes in images or sequences. By stacking multiple convolutional layers, CNNs progressively extract hierarchical representations, with each layer capturing increasingly abstract and complex patterns. This hierarchical approach facilitates the identification of high-level features by building upon lower-level representations, thereby enabling CNNs to recognize diverse and nuanced patterns present in the data. Additionally, CNNs have a feature called translation invariance, which enables them to recognize patterns regardless of where they are located in the input. This is an important feature for tasks like object detection and image localization.
Long Short-Term Memory (LSTM) layers, on the other hand, are extremely important for modeling sequential data and deriving temporal dependencies. LSTM networks are superior to typical Recurrent Neural Networks (RNNs) at capturing long-range dependencies and maintaining information over long time spans. By integrating memory cells and gating mechanisms, LSTM networks can efficiently describe complicated sequential patterns while addressing problems such as the vanishing gradient that arises in deep recurrent designs. LSTM networks excel at comprehending and exploiting temporal relationships within sequential data, enabling them to discern subtle patterns and dependencies over extended periods. This proficiency is particularly advantageous in SEI applications, where understanding the temporal dynamics of signals is essential for accurate identification and classification. When combined with CNN layers, LSTM networks further enhance the efficacy of pattern recognition and feature extraction in drone signal classification tasks: CNN layers excel at extracting spatial features from signal data, while LSTM layers excel at modeling temporal dependencies. By leveraging the complementary strengths of both architectures, neural network models can effectively capture the spatial and temporal characteristics inherent in emitter signals. This synergistic integration enables SEI systems to reliably and precisely identify emitters across a variety of surroundings and signal conditions. As a result, the combination of CNN and LSTM layers provides a strong foundation for improving passive drone identification. The final fused composite feature vector consists of the manual and automatic feature vectors and can be represented by

$$ \mathbf{f} = \left[\, \mathbf{f}_m,\ \mathbf{f}_a \,\right], \tag{11} $$

where $\mathbf{f}_a$ is the automatic feature vector extracted by the convolutional and LSTM layers of the other branch.
2.3. Architecture
Our suggested network architecture, depicted in Figure 3, is intended to handle two different kinds of input data: raw signal data for automatic feature extraction, and manually extracted features derived from the raw signal.

A one-dimensional convolutional layer with 64 filters and a kernel size of 5 is applied to the raw signal input. After this layer has extracted features from the raw signal data, the output is down-sampled using a max-pooling layer to reduce spatial dimensions while preserving the most important information. Conversely, the first- and second-order differential characteristics are sent directly to the flattening layer. Following flattening, the input is fed into a dense layer with 64 units and a Rectified Linear Unit (ReLU) activation function, which extracts features from the manually engineered feature vector. The outputs of the dense layer and the convolutional layer are then concatenated. The temporal sequence of the data is preserved by passing this concatenated output through an LSTM layer with 64 units, with return sequences set to True. To extract the most pertinent characteristics and reduce the dimensionality of the data, a global max-pooling layer is applied. The pooled output is then run through a final dense layer with a softmax activation function to obtain the output predictions. The model is trained with the sparse categorical cross-entropy loss function,

$$ \mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{ij} \log\left(p_{ij}\right), \tag{12} $$

where $p_{ij}$ is the predicted probability that sample $i$ belongs to class $j$, $N$ is the number of samples in the batch, $C$ is the number of classes, and $y_{ij}$ is a binary indicator of whether class $j$ is the correct classification for sample $i$. Using the supplied labels, training accuracy metrics, and the Adam optimizer, the model is trained for 50 epochs with a batch size of 64 on the unprocessed signal data.
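A hedged Keras sketch of the architecture described above. The input lengths, the number of classes, and the way the 2D dense output is aligned with the 3D convolutional output before the LSTM (here appended as one extra time step) are our assumptions, since the text does not specify them:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 15        # assumption: 15 drone classes, as in the scatter plot of Section 4.5
SIG_LEN = 1024          # assumption: interleaved signal length
FEAT_LEN = 2 * SIG_LEN  # fused first- and second-derivative features, Eq. (10)

# Branch 1: raw interleaved signal -> automatic feature extraction
raw_in = layers.Input(shape=(SIG_LEN, 1), name="raw_signal")
x = layers.Conv1D(64, kernel_size=5, activation="relu", name="conv1")(raw_in)
x = layers.MaxPooling1D(pool_size=2, name="pool1")(x)

# Branch 2: manually engineered derivative features
feat_in = layers.Input(shape=(FEAT_LEN,), name="manual_features")
f = layers.Flatten(name="flatten2")(feat_in)
f = layers.Dense(64, activation="relu", name="dense1")(f)
f = layers.Reshape((1, 64))(f)  # assumption: append as one extra time step

merged = layers.Concatenate(axis=1, name="merged")([x, f])
s = layers.LSTM(64, return_sequences=True, name="lstm1")(merged)
s = layers.GlobalMaxPooling1D(name="lstm1_pooled")(s)
out = layers.Dense(NUM_CLASSES, activation="softmax")(s)

model = Model([raw_in, feat_in], out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # Eq. (12)
              metrics=["accuracy"])
# model.fit([raw, feats], labels, epochs=50, batch_size=64)
```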
2.4. Computational Complexity
To estimate the Floating-Point Operations (FLOPs) of the network, we count the arithmetic operations performed during the forward pass. Using the standard per-layer operation counts, with $L$ the layer output length, $k$ the kernel size, $C_{in}$ and $C_{out}$ the input and output channels, $n_{in}$ and $n_{out}$ the dense layer fan-in and fan-out, $T$ the sequence length, and $h$ the LSTM hidden size:

Conv1D layer (conv1): $2 \cdot L \cdot k \cdot C_{in} \cdot C_{out}$ FLOPs.
MaxPooling1D layer (pool1): $L \cdot p$ comparisons for pool size $p$.
Flatten layer (flatten2): no arithmetic operations (reshaping only).
Dense layer (dense1): $2 \cdot n_{in} \cdot n_{out}$ FLOPs.
LSTM layer (lstm1): approximately $8 \cdot T \cdot h \cdot (d + h)$ FLOPs, where $d$ is the input dimension (four gates, each with an input and a recurrent matrix multiplication).
GlobalMaxPooling1D layer (lstm1_pooled): $T \cdot h$ comparisons.
Concatenate layer (merged): no arithmetic operations.
Output dense layer: $2 \cdot n_{in} \cdot C$ FLOPs for $C$ classes.

Total complexity: the sum of the per-layer counts above, dominated by the convolutional and LSTM layers.
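A small script, under the same assumptions as the Keras sketch in Section 2.3, that applies these standard counts to obtain an approximate forward-pass total:

```python
def conv1d_flops(l_out, k, c_in, c_out):
    # one multiply-accumulate per tap per output element, counted as 2 FLOPs
    return 2 * l_out * k * c_in * c_out

def dense_flops(n_in, n_out):
    return 2 * n_in * n_out

def lstm_flops(t, d, h):
    # 4 gates, each with input (d x h) and recurrent (h x h) matrix products
    return 2 * t * 4 * h * (d + h)

SIG_LEN, K, H = 1024, 5, 64         # assumptions matching the Section 2.3 sketch
l_conv = SIG_LEN - K + 1            # 'valid' convolution output length
l_pool = l_conv // 2                # after MaxPooling1D(pool_size=2)
total = (conv1d_flops(l_conv, K, 1, 64)
         + dense_flops(2 * SIG_LEN, 64)
         + lstm_flops(l_pool + 1, 64, H)   # +1 time step from the fused dense output
         + dense_flops(H, 15))
print(f"approximate forward-pass FLOPs: {total:,}")
```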
4. Experiments, Results, and Analysis
4.1. Loss Function and Curves
The curves depicted in Figure 4 provide insightful observations regarding the training and generalization behavior of the model. Initially, the training score curve exhibits fluctuations, indicating active learning from the training data. However, beyond epoch 120, the curve stabilizes, suggesting that the model has effectively captured the underlying patterns within the training data.

Meanwhile, the test score curve closely tracks the training score during the early epochs, demonstrating the model's ability to generalize to unseen data. As training progresses, the test score curve converges and aligns closely with the training curve, indicating stable generalization performance. This convergence post-epoch 50 signifies that the model has struck a balance between minimizing the loss on the training data and generalizing effectively to new, unseen instances.
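For reference, a minimal sketch of how such curves can be produced from the Keras training history (the model and the data arrays are placeholders carried over from the Section 2.3 sketch):

```python
import matplotlib.pyplot as plt

# raw_train/feat_train/y_train and the test arrays are placeholder names
hist = model.fit([raw_train, feat_train], y_train,
                 validation_data=([raw_test, feat_test], y_test),
                 epochs=50, batch_size=64)
plt.plot(hist.history["loss"], label="training score")
plt.plot(hist.history["val_loss"], label="test score")
plt.xlabel("epoch"); plt.ylabel("loss"); plt.legend(); plt.show()
```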
4.2. SNR vs. Accuracy
This section covers the performance-based analysis of the proposed method on the open-source drone remote control radio frequency signal dataset [38]. A detailed and comprehensive analysis of each of the following methods has been carried out:
Time-Frequency Convolutional Neural Network with Short-Time Fourier Transform (TF-CNN (STFT)) [39].
Time-Frequency Convolutional Neural Network with Discrete Wavelet Transform (TF-CNN (DWT)) [39].
Time-domain In-phase and Quadrature Convolutional Neural Network (T-CNN) [40].
Dense Network (DenseNet-18) [41].
Visual Geometry Group (VGG16) [42].
Figure 5 compares classification accuracy across different SNR levels of the dataset, demonstrating the efficiency of the new CEL approach on the drone dataset compared to benchmark deep learning algorithms. The objective is to assess the performance of the proposed CEL method against the others under different SNR levels. The CEL method yields a higher recognition percentage than the foregoing methods, especially at low SNRs of 5 dB and below. In situations where low SNR prevents the other techniques from extracting relevant features from RF fingerprints, the CEL method produces results that are more stable and less sensitive to increased noise levels.

The significant improvement in the results is mainly attributed to the introduction of denoising procedures within the deep learning pipeline. Wavelet-based denoising is chosen because it performs well when noise is present in different frequency bands, as the signal is decomposed into different scales. This method leaves the major signal features unaltered while suppressing only the noise, which makes it effective in low-SNR environments and helps to improve the model's performance. Therefore, the CEL model achieves high accuracy compared to the benchmark techniques, even in high-noise conditions.
4.3. Batch Size vs. Performance Metrics
In Figure 6, we illustrate the influence of batch size on classification accuracy across different SNR values. Our analysis revealed that when the batch size exceeds 15, the achieved accuracy consistently remains above 97.5%. Furthermore, as the SNR surpasses 10 dB, we observed a sharp increase in accuracy, reaching approximately 99%.

These findings underscore the importance of batch size selection in training neural networks for classification tasks, particularly in scenarios with varying levels of noise. Larger batch sizes tend to yield higher accuracies, with notable improvements observed at higher SNR levels. This suggests that larger batches facilitate more stable updates, leading to enhanced model performance.

Table 2 reports the model's performance for different batch sizes, evaluated with four key metrics: F1-score, precision, recall, and accuracy. In general, increasing the batch size improves the model, with growth in all four metrics. This implies that a larger batch size helps improve the model's generalization and, therefore, its accuracy. However, additional analysis involving statistical tests and graphical methods would be required to draw more definitive conclusions about the influence of batch size on model building.
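The four metrics in Table 2 can be computed per run with scikit-learn; a minimal sketch, assuming integer label arrays `y_true` and `y_pred` (macro averaging is our assumption):

```python
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

# y_true: ground-truth class labels; y_pred: predicted class labels
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
acc = accuracy_score(y_true, y_pred)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} F1={f1:.4f}")
```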
4.4. Confusion Matrix
Figure 7 displays the confusion matrix generated by the proposed method, providing valuable insight into its classification performance. The diagonal entries of the matrix represent the number of correct classifications for each class, showcasing the model's accuracy on individual categories, while the off-diagonal entries in the upper and lower triangles indicate instances of misclassification, offering a clear view of the errors made by the method.

Through rigorous evaluation with 100 iterations of the drone identification method, we observed promising results. When the SNR is greater than 5 dB, the average accuracy of the proposed method surpasses 98%. This signifies the method's effectiveness in correctly identifying drone signals under favorable SNR conditions. The high accuracy achieved in these experiments highlights the potential of the proposed approach for real-world applications, where accurate identification of drone signals is crucial for communication security and management.
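A minimal sketch of how such a confusion matrix and the per-class accuracies can be obtained with scikit-learn (the model and test arrays are placeholders from the Section 2.3 sketch):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# predicted class = argmax over the softmax outputs
probs = model.predict([raw_test, feat_test])
y_pred = np.argmax(probs, axis=1)
cm = confusion_matrix(y_true, y_pred)
per_class_acc = cm.diagonal() / cm.sum(axis=1)  # diagonal = correct classifications
print(per_class_acc)
```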
4.5. 3D Scatter Plot
In Figure 8, we present a three-dimensional scatter plot featuring 15 distinct classes and three Principal Component Analysis (PCA) features. Each point in this plot represents an instance from one of the 15 classes, positioned in a three-dimensional space defined by the PCA features. These features serve as axes, with each one representing a principal component: a linear combination of the original features selected to capture maximum data variance.

Notably, the plot reveals clear and distinct regions for each class, showcasing the effective separation of classes within the three-dimensional feature space. This indicates that the PCA features successfully capture the inherent structure of the data, facilitating precise differentiation between classes, as driven by the proposed methodology. The excellent bifurcation capability highlights the discriminative power of both the manually crafted and the automatically derived features within our methodology. This enables the method to effectively separate the classes, demonstrating its ability to discern the essential characteristics and patterns that set each class apart. As a result, the scatter plot displays well-defined separation boundaries, underscoring the efficacy of our novel approach.
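A sketch of the projection behind Figure 8, assuming a fused feature matrix `X` (one row per signal) and integer labels `y`; these names are placeholders:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA(n_components=3)
Z = pca.fit_transform(X)  # project fused features onto three principal components

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(Z[:, 0], Z[:, 1], Z[:, 2], c=y, cmap="tab20", s=8)
ax.set_xlabel("PC1"); ax.set_ylabel("PC2"); ax.set_zlabel("PC3")
plt.show()
```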
4.6. Box Plot
We examine a box plot in Figure 9 that illustrates the distribution of data points for four Principal Component Analysis (PCA) features across 15 different classes. Each box in the plot represents a distinct class and shows how the data are distributed among the four PCA features within that class. The middle line in each box represents the median value; the box itself extends from the first quartile (Q1) to the third quartile (Q3) of the data distribution, providing information on the interquartile range. Moreover, the whiskers extend to outline the data range, excluding outliers, which are shown as single dots.

This box plot not only facilitates an understanding of the variability and spread of data within each class, but also offers a glimpse into the effectiveness of our fingerprint features. By analyzing the distribution and spread of data across the PCA features, we gain valuable insight into the distinctiveness of each class and the discriminatory power of our selected features.
Moreover, this plot complements the earlier visualization in Figure 8, which showcased the distinct regions occupied by different classes in three-dimensional space. Together, these visualizations provide a comprehensive understanding of the relationships between the PCA features, the class distributions, and the overall effectiveness of our proposed methodology in capturing and distinguishing between classes.
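A compact sketch of how a figure such as Figure 9 can be reproduced, again assuming the placeholder `X` and `y` from the previous sketch:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

Z4 = PCA(n_components=4).fit_transform(X)   # four PCA features, as in Figure 9
classes = np.unique(y)
fig, axes = plt.subplots(1, 4, figsize=(16, 4), sharey=True)
for k, ax in enumerate(axes):               # one panel per principal component
    ax.boxplot([Z4[y == c, k] for c in classes])
    ax.set_title(f"PC{k + 1}")
plt.show()
```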
5. Discussion
In this study, we introduce a novel approach for the classification of drones based on RF signals, with the objective of enhancing the accuracy of drone identification under practical conditions. This aligns with the evolving trends in drone technology and its integration into future 6G networks. Our methodology effectively combines handcrafted features with automatic feature extraction through deep learning, amalgamating these features into a single vector that encapsulates a comprehensive array of discriminant information. Consequently, our work synergizes conventional signal processing techniques with advanced deep learning architectures to address the challenges of drone classification, thereby bolstering wireless security in 6G networks.
A pivotal aspect of our approach is its applicability under spread-spectrum techniques, including frequency-hopping spread spectrum. These techniques can distort the original RF fingerprint, potentially misrepresenting the true signal characteristics if the capture period is shorter than the duration of frequency hopping or code spreading. To mitigate this issue, it is imperative that the capture period is sufficiently long to encompass at least two cycles of the modulation. This ensures the preservation of relevant signal variations essential for accurately capturing and characterizing the RF fingerprint. Future research will delve deeper into this aspect.
Our experimental results and the evaluation of various metrics underscore the superiority of the proposed methodology. The trained model demonstrates high accuracy, particularly under conditions of high SNRs. Additionally, our approach exhibits robustness to noise across all SNR levels, including very low SNRs, due to the incorporation of signal denoising techniques within the framework. These findings affirm the efficacy of our method in addressing drone identification challenges across diverse environments and signal conditions, consistent with the requirements of 6G technology.
Moreover, the employed methodology utilizes three-dimensional scatter and box plots, which provide insightful visualizations for estimating the specificity of classes within the feature space. Such representations validate the capability of our approach to accurately classify drone signals based on their RF characteristics and to distinctly delineate different classes.
In summary, this study presents an innovative and forward-looking model for the categorization of drones based on RF signals, contributing significant advancements in the passive identification of drones. This research underscores the critical importance of advanced drone identification techniques in ensuring secure and regulated drone operations within the context of the 6G era.