1. Introduction
The increase in communication demands and the shortage of spectrum resources have driven the adoption of cognitive radio (CR) and multiple-input multiple-output (MIMO) techniques in wireless communication systems. As one of the essential steps of CR, modulation classification (MC) is widely applied in both civil and military applications, such as spectrum surveillance, electronic surveillance, electronic warfare, and network control and management [
1]. It improves radio spectrum utilization and enables intelligent decision-making for context-aware autonomous wireless spectrum monitoring systems [
2]. However, most existing MC methods focus on single-input single-output (SISO) scenarios and cannot be directly applied when the transceivers are equipped with multiple antennas [
3]. Therefore, it is crucial to investigate MC methods for MIMO communication systems.
Traditional MC approaches for the SISO systems discussed in the literature can be classified into two main categories: likelihood-based (LB) approaches and feature-based (FB) approaches [
4]. The LB approaches can theoretically achieve optimal performance, as they compute the likelihood functions of the different modulated signals to maximize the classification accuracy. However, they have a very high computational complexity and require prior information, such as the channel coefficient [
5,
6]. Hence, the LB approaches cannot be directly applied in fast modulation classification and blind modulation classification (BMC). By contrast, the FB approaches cannot obtain the optimal result, but they have lower computational complexity and do not require prior information [
7]. The FB methods usually include two steps: feature extraction and classifier design. The higher-order statistics, instantaneous statistics, and other features are calculated in the feature extraction. Then, the popular classification methods, such as decision trees [
8], support vector machine [
9,
10], and artificial neural networks (ANNs) [
11,
12], are adopted as the classifiers. With the rapid rise of artificial intelligence and the emerging requirements of intelligent wireless communication, deep-learning-based approaches are now becoming widely studied and used in different aspects of wireless communication, such as the transceiver design at the physical layer [
13] and BMC problems [
7,
14,
15,
16,
17,
18]. More specifically, Rajendran et al. [
15] proposed a new data-driven model for BMC based on long short-term memory (LSTM), which learned the features from the time-domain amplitude and phase information of the modulation schemes and yielded an average classification accuracy close to 90% for signal-to-noise ratios (SNRs) from 0 to 20 dB. In addition, the time–frequency features have also been applied to the deep-learning-based BMC problem. In [
7], the ResNet model was adopted as the classifier, and the authors presented an approach for fusing the red–green–blue (RGB) spectrogram images and the handcrafted features of the modulated signals to obtain more discriminating features. The experimental results demonstrated that this scheme achieves superior performance. The latest research indicates that deep-learning-based MC methods achieve higher accuracy than the traditional LB and FB approaches for SISO systems [
7,
19].
Although multiple antennas have been adopted by many mobile communication systems, our survey indicates that MC for MIMO systems has been discussed less than MC for SISO systems. Recent studies on this topic are summarized in
Table 1, and they are characterized and distinguished by the scenario, method, features, modulation classes, SNR, and accuracy. From this table, we can note that most of the proposed approaches are FB approaches; concretely, the authors in [
20,
21] proposed similar methods for the MC of MIMO transceiver systems that calculate the higher-order statistical moments and cumulants of the received signals. Then, an artificial neural network is employed to classify the modulation types. In [
22], a clustering classifier based on centroid reconstruction is presented to identify the modulation scheme with an unknown channel matrix and noise variance in MIMO systems. The simulation results showed that their algorithm could obtain excellent performance, even at low SNRs and with a very short observation interval. To deal with the BMC problem and the two major constraints in the railway transmission environment (i.e., the high speeds and impulsive nature of the noise), Kharbech et al. [
23] proposed a feature-based process of blind identification that includes three parts: impulsive noise mitigation, feature extraction, and classification. By analyzing the correlation functions of the received signals for certain modulation formats, Mohamed et al. resolved the BMC problem in single- and multiple-antenna systems operating over frequency-selective channels in [
24] and the BMC problem in the Alamouti space-time block code (STBC) system [
25].
Of particular interest are studies on modulation classification for MIMO orthogonal frequency division multiplexing (OFDM) systems, as MIMO OFDM has been widely adopted by many commercial standards, such as LTE and next-generation Wi-Fi. For this problem, different approaches, such as the approximate Bayesian inference method, the Gibbs sampling-based method, and the joint independent component analysis (ICA) with support vector machines (SVMs) method, were proposed by Liu et al. in [
26,
27,
28,
29] for the MIMO OFDM signals. However, all these studies are traditional feature-based or likelihood-based approaches.
In fact, from the aforementioned MIMO-based systems, we note that it is difficult to directly apply deep learning to the raw in-phase and quadrature (IQ) data or the time-domain amplitude and phase data, since the overlapped signals at the receiver of the MIMO system destroy the statistical features [
30]. Hence, it is crucial to extract the distinguishable features or convert the raw signals for BMC in MIMO systems. The time–frequency analysis methods can jointly analyze the time-domain and frequency-domain features of signals, and the different modulation types have distinct time-domain and frequency-domain features. Hence, in this paper, in order to overcome the effect of the overlapping signals at the receiver, we analyze the time–frequency features of the modulated signals to resolve the BMC problem in MIMO systems. First, the time–frequency analysis method based on the windowed short-time Fourier transform (STFT) [
31] is employed to generate the spectrum of the MIMO-modulated signals. Then, the spectrum with different time windows is converted to a grayscale image, and this grayscale image is further mapped onto the RGB spectrogram image [
32]. Second, a fine-tuned AlexNet-based convolutional neural network (CNN) model is introduced to learn the features from the RGB spectrogram images. The modulation scheme of each receiving stream among the receiving MIMO signals is identified in this stage. Finally, the previously produced decisions are merged to form the final result. In addition, this method can be simplified to directly apply to SISO systems. The simulation results show that the proposed method achieves superior performance in low-SNR scenarios for both the MIMO system and the SISO system. In particular, the proposed method obtains 80.42% accuracy at SNR = −4 dB for the MIMO network, which is the highest accuracy among the existing works listed in
Table 1.
This paper is organized as follows. The signal models of the MIMO and SISO systems and the STFT-based time–frequency analysis method are introduced in
Section 2.
Section 3 presents the BMC scheme for the MIMO systems, including the proposed CNN model and the decision method. Then, the RGB spectrogram image and the classification performance in different scenarios are analyzed in
Section 4. Finally, conclusions are drawn in
Section 5.
3. Proposed BMC Scheme
In this section, a time–frequency analysis is conducted and a deep-learning-based BMC scheme is proposed. The block diagram of the proposed BMC scheme is shown in
Figure 2, which shows four modules: signal generator, time–frequency analysis, CNN classifier, and decision fusion. The signal generator outputs the modulated signals
(with the same modulation type) for each transmitting antenna [
20]. This process was described in
Section 2.1 and
Section 2.2. Then, the time–frequency analysis is performed for the received signal
for each receiving antenna, which generates the RGB spectrogram image
(partially described in
Section 2.3). Next, the AlexNet-based CNN classifier is trained based on a number of RGB spectrogram images in the training stage, and the modulation type of each received signal
is identified in the test stage. Finally, the decisions of different signal branches are combined by the decision fusion module for the final decision. In the next three subsections, we describe in detail the procedures of the time–frequency analysis, CNN-based classifier, and decision fusion.
3.1. Time–Frequency Analysis for Received Signals
The flow chart of STFT-based time–frequency analysis is shown in
Figure 1. First, using the amplitude-shift keying (ASK) signal as an example, the received signal
is divided into
frames by the Hamming window
with length
, the details of which are described in Equations (
4)–(
7). Second, the spectrum of the windowed signal is obtained by its Fourier transform. Third, by normalizing and combining the linear spectral magnitude vector, the grayscale spectrogram image
is obtained (the size of the related grayscale matrix is
). Finally, to accommodate the input layer of AlexNet and improve the distinguishability of the spectrogram image, the grayscale spectrogram image is mapped onto the RGB spectrogram image
(the size of the related RGB matrix is
). Then, the RGB matrix is cut or padded into
before being fed into the CNN.
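The four steps above (framing with a Hamming window, Fourier transform, normalization to a grayscale matrix, and mapping to RGB) can be sketched as follows. This is a minimal illustration rather than the authors' MATLAB implementation: the window length, hop size, FFT size, the simple blue-to-red color map, and the 227 × 227 target size (the customary AlexNet input size) are all assumptions, since the corresponding symbols are not recoverable from the text here. The function names `stft_rgb_spectrogram` and `crop_or_pad` are hypothetical.

```python
import numpy as np

def stft_rgb_spectrogram(x, win_len=128, hop=64, n_fft=128):
    """Hamming-windowed STFT -> normalized grayscale -> RGB (blue = 0, red = 1)."""
    x = np.asarray(x, dtype=float)
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    # One-sided spectral magnitude; rows are frequency bins, columns are frames.
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)).T
    # Normalize to [0, 1] to obtain the grayscale spectrogram matrix.
    gray = (mag - mag.min()) / (mag.max() - mag.min() + 1e-12)
    # Illustrative color map (assumed): 0 -> blue, 0.5 -> green, 1 -> red.
    r = np.clip(2.0 * gray - 1.0, 0.0, 1.0)
    b = np.clip(1.0 - 2.0 * gray, 0.0, 1.0)
    g = 1.0 - r - b
    return gray, np.stack([r, g, b], axis=-1)

def crop_or_pad(img, height, width):
    """Cut or zero-pad an H x W x 3 image to height x width x 3."""
    out = np.zeros((height, width, img.shape[2]), dtype=img.dtype)
    h, w = min(height, img.shape[0]), min(width, img.shape[1])
    out[:h, :w] = img[:h, :w]
    return out

# Usage with an unmodulated 2 kHz carrier sampled at 16 kHz (2240 samples).
fs, fc = 16_000, 2_000
t = np.arange(2240) / fs
gray, rgb = stft_rgb_spectrogram(np.cos(2 * np.pi * fc * t))
image = crop_or_pad(rgb, 227, 227)
print(gray.shape, rgb.shape, image.shape)
```

For this pure tone, the energy concentrates in the single frequency row that corresponds to the carrier, which is the "one frequency band" pattern the ASK discussion in Section 4.1 relies on.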
3.2. AlexNet-Based CNN Classifier
In our proposed BMC scheme, AlexNet, which is utilized for object detection [
39] and was the winner of the 2012 ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), is adopted as the classifier. The network architecture of AlexNet is shown in
Figure 3 [
40].
As depicted in
Figure 3, AlexNet contains eight layers; the first five are convolutional and the remaining three are fully connected. The output of the last fully connected layer is fed to a 1000-way softmax that produces a distribution over the 1000 class labels [
39]. AlexNet uses the rectified linear unit (ReLU) as the activation function of the CNN. In practice, the dropout and max pooling techniques are applied to the CNN. AlexNet has an excellent performance in visual tracking and object detection due to its capability in sensing the pattern position on the image. Therefore, considering that the spectrogram image has rich pattern position information, it is sensible to choose AlexNet as the classifier network.
The motivation for transfer learning comes from the fact that people can intelligently apply knowledge learned previously to solve new problems faster or with better solutions [
41]. In order to utilize the pretrained AlexNet, transfer learning is employed to fine-tune AlexNet and accelerate the training process. The last layer of the pretrained AlexNet network in
Figure 3 is configured with 1000 classes, and this layer must be fine-tuned to accommodate the new classification task. First, all layers except the last layer are extracted; then, the last layer is replaced with a new fully connected layer that contains eight neurons (i.e., the number of modulation categories in this paper). In the end, the parameters of the activation layer and the classification output layer are set to accommodate the new classification task. Therefore, with such fine-tuning, the output of AlexNet can precisely perform the modulation classification of the received signals. The training hyperparameters are listed in
Table 2; concretely, the mini-batch size is set to 10, the maximum number of training epochs is set to 10, and the learning rate is set to
, respectively.
3.3. Decision Fusion
Since there are multiple antennas at the receiver of the MIMO network, it is possible for each branch to cooperate with the others to achieve higher identification reliability [
20]. As shown in
Figure 2, the
received signals are classified independently because the influences of signal overlapping, interchannel noise, and random phase shifting may cause each received signal to be identified as a different modulation type. This may lead to incorrect identification results. The decision fusion among all the receiving antennas aims to improve the average classification accuracy. The decision vector of the
i-th received signal,
, can be defined as
where
K is the number of modulation types,
is the probability of identifying the received signal
as modulation type
k, and
meets the following condition:
Therefore, the modulation type
of the received signal
is the modulation type that has the maximum probability. The modulation type with maximum probability can be defined as a set
as follows:
Note that there are two cases for the above equations: (1) the maximum probability is unique, in which case the set contains a single element and that element is the modulation type of the i-th received signal; (2) the maximum probability is not unique, in which case the set contains several elements and the modulation type of the i-th received signal is randomly chosen from them.
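The definitions in this subsection can be written compactly as below. The notation is assumed for illustration only, because the original symbols are not recoverable from the text here: $\mathbf{d}_i$ stands for the decision vector of the i-th received signal, $p_{i,k}$ for the probability of modulation type k, and $\mathcal{M}_i$ for the set of types that attain the maximum probability.

```latex
% Assumed notation; not necessarily the symbols used in the original equations.
\begin{gather}
\mathbf{d}_i = \left[\, p_{i,1},\; p_{i,2},\; \ldots,\; p_{i,K} \,\right], \\
\sum_{k=1}^{K} p_{i,k} = 1, \qquad 0 \le p_{i,k} \le 1, \\
\mathcal{M}_i = \Bigl\{\, k \in \{1,\ldots,K\} \;:\; p_{i,k} = \max_{1 \le l \le K} p_{i,l} \,\Bigr\}.
\end{gather}
```

Under this notation, case (1) corresponds to $|\mathcal{M}_i| = 1$ and case (2) to $|\mathcal{M}_i| > 1$.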
Hence, the decision fusion can be converted into the problem of deciding the final modulation type
m according to
,
. The fusion rule at the fusion module can be OR, AND, or majority rule, which can be generalized as the “n-out-of-
rule” [
42]. That is, a certain modulation scheme is selected when n classifiers decide on it among the
classifiers. Take the
as an example; the possible modulation types form the set
{2PSK, 4PSK, 8PSK}. If three or more classifiers identify the modulation type as 2PSK (4PSK or 8PSK), then the final modulation type is 2PSK (4PSK or 8PSK). If two classifiers identify the modulation type as 2PSK and the other two classifiers identify it as 4PSK and 8PSK, respectively, then the final decision is 2PSK. In addition, if two classifiers identify the modulation type as 2PSK and the other two classifiers identify it as 4PSK (or 8PSK), the decision fusion center randomly chooses between 2PSK and 4PSK (or 8PSK) as the final result.
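The per-branch decision and the fusion example above can be sketched as a plurality vote with random tie-breaking. This is a minimal sketch under the assumption that the most frequently chosen modulation type wins, which matches the worked example; the function names `branch_decision` and `fuse_decisions` are hypothetical.

```python
import random
from collections import Counter

def branch_decision(probs, labels, rng=random):
    """Pick the label with the maximum probability; break ties at random."""
    best = max(probs)
    candidates = [lab for lab, p in zip(labels, probs) if p == best]
    return rng.choice(candidates)

def fuse_decisions(decisions, rng=random):
    """Plurality vote over the per-antenna decisions; break ties at random."""
    counts = Counter(decisions)
    top = max(counts.values())
    winners = sorted(m for m, c in counts.items() if c == top)
    return winners[0] if len(winners) == 1 else rng.choice(winners)

labels = ["2PSK", "4PSK", "8PSK"]
print(branch_decision([0.7, 0.2, 0.1], labels))           # unique maximum
print(fuse_decisions(["2PSK", "2PSK", "4PSK", "8PSK"]))   # two votes beat one each
print(fuse_decisions(["2PSK", "2PSK", "4PSK", "4PSK"]))   # tie, chosen at random
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible in experiments.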
4. Performance Analysis
In this section, the proposed time–frequency analysis and deep-learning-based BMC algorithm are tested under different modulation schemes for both the SISO and MIMO scenarios. Specifically, the channel coefficient
h randomly takes values from
over multiple symbols, and AWGN at different SNRs is added to the modulated signals for both the SISO and MIMO scenarios. For the MIMO scenario, random phase shifts within one symbol interval are also considered. Unless otherwise stated, the MIMO antenna configurations are
and
. In addition, the 2ASK, 2FSK, 2PSK, 4ASK, 4FSK, 4PSK, 8PSK, and 16QAM modulation schemes are considered, unless otherwise stated. The parameters of the modulated signals are assigned as follows. The sampling frequency
is 16 kHz, the carrier frequency
is 2 kHz, the symbol rate
is 100 Hz, and the length of the original digital signal is 14 (i.e., each modulated signal contains (16,000/100) × 14 = 2240 sample points). In addition, in the training stage, 100 modulated signals for each modulation type and SNR are randomly generated for both the SISO and MIMO scenarios, in which the SNR varies from −4 to 10 dB at intervals of 2 dB [
7]. In the test stage, 100 modulated signals for each modulation type and SNR are randomly generated. All the signal samples were generated in MATLAB 2017b, and the training and testing of AlexNet are based on the MATLAB neural network toolbox. Additionally, the parameters to generate the RGB spectrogram image were set as
,
,
, and
.
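The signal parameters given above (16 kHz sampling, 2 kHz carrier, 100 Hz symbol rate, 14 symbols) can be checked with a short sketch that generates binary modulated waveforms and adds AWGN at a target SNR. It is an illustration, not the authors' MATLAB generator: the FSK frequency offset `f_dev`, the mapping of bits to phases and amplitudes, and the function names `modulate` and `add_awgn` are assumptions.

```python
import numpy as np

FS = 16_000        # sampling frequency in Hz (from the text)
FC = 2_000         # carrier frequency in Hz (from the text)
RS = 100           # symbol rate in Hz (from the text)
N_SYMBOLS = 14     # length of the digital sequence (from the text)
SPS = FS // RS     # samples per symbol: 16,000 / 100 = 160

def modulate(bits, scheme, f_dev=500):
    """Real passband 2ASK/2FSK/2PSK waveform; f_dev is an assumed FSK offset in Hz."""
    symbols = np.repeat(np.asarray(bits), SPS)
    t = np.arange(symbols.size) / FS
    if scheme == "2ASK":
        return symbols * np.cos(2 * np.pi * FC * t)
    if scheme == "2PSK":
        return np.cos(2 * np.pi * FC * t + np.pi * symbols)
    if scheme == "2FSK":
        freq = FC + f_dev * (2 * symbols - 1)
        return np.cos(2 * np.pi * freq * t)
    raise ValueError(f"unsupported scheme: {scheme}")

def add_awgn(x, snr_db, rng):
    """Add white Gaussian noise so that the signal-to-noise ratio equals snr_db."""
    noise_power = np.mean(x ** 2) / 10 ** (snr_db / 10)
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=N_SYMBOLS)
clean = modulate(bits, "2PSK")
noisy = add_awgn(clean, snr_db=-4, rng=rng)
print(clean.size)  # 160 samples per symbol x 14 symbols
```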
We now discuss how the modulation order, SNR, and overlapping of the MIMO signals influence the RGB spectrogram image of the modulated signals. Then, the classification performance of the proposed scheme is validated for different scenarios.
4.1. RGB Spectrogram Image of the Modulated Signals
In this subsection, in order to simplify the analysis, we select only certain binary and quaternary digital signal sequences (as shown in
Figure 4) to generate the RGB spectrogram image. The binary signal in
Figure 4a is used to generate the two-order modulated signals (i.e., 2ASK, 2FSK, and 2PSK), and the quaternary signal in
Figure 4b is used for the four-order modulated signals (i.e., 4ASK, 4FSK, and 4PSK).
4.1.1. RGB Spectrogram Image of the Modulated Signals with Different Modulation Orders
We first evaluate how the modulation order affects the RGB spectrogram image at an SNR of 10 dB for the SISO scenario. The considered modulation schemes are ASK, FSK, and PSK, which are shown in
Figure 5. They are analyzed separately as follows.
First of all, the RGB spectrogram image is a time–frequency distribution image of the modulated signal. The horizontal axis of this image represents time and the vertical axis represents frequency. In addition, the color of the RGB spectrogram image represents the value of the normalized spectral magnitude (i.e., the values corresponding to blue and red are zero and one, respectively).
Figure 5a,d show the RGB spectrogram images of the ASK-modulated signals. The power of the ASK-modulated signals concentrates on one frequency band in the image, and the power in the image is discontinuous over time. In addition, the color in the image is blue when the digital signal sequence is at the zero level in
Figure 4, and it is red when the digital signal sequence is at a non-zero level, which corresponds to the values of the spectral magnitude. Moreover, compared with the 2ASK signal, the spectral magnitude of the 4ASK signal has a larger average value (i.e., more pixels in the 4ASK RGB spectrogram image have a value of 1).
Figure 5b,e show the RGB spectrogram image of the FSK-modulated signals at an SNR of 10 dB. The spectral magnitude of the 2FSK-modulated signals has a larger value over two sub-bands, and the spectral magnitude of the 4FSK-modulated signals has a larger value over four sub-bands. For the FSK signals, the modulation order is equal to the number of modulated frequencies, which is the number of sub-bands in the RGB spectrogram image.
The RGB spectrogram images of the PSK-modulated signals are shown in
Figure 5c,f. The phase mutation of the modulated signals is captured in the RGB spectrogram images. Specifically,
Figure 4a and
Figure 5c both have the
-phase mutation in the 2PSK-modulated signal from 0 to 1 and from 1 to 0 in the binary digital signal sequences. The
-phase mutation decreases the value of the power spectral density at the modulated frequency, which appears as a “ring” in the RGB spectrogram image. Similarly, comparing
Figure 4b and
Figure 5f, the
- and
-phase mutations also partly decrease the value of the power spectral density at the modulated frequency, but they appear as a “half-ring” in the RGB spectrogram image. Therefore, modulated signals with different modulation orders have different time–frequency features, and it is reasonable to classify the modulated signals using the time–frequency analysis.
4.1.2. RGB Spectrogram Image of the Modulated Signals for the MIMO Channels
We now analyze how the MIMO channel influences the RGB spectrogram image of the modulated signals. The 2ASK, 2FSK, and 2PSK modulation schemes are discussed herein. The antenna configuration for the MIMO system is
and
; the random channel attenuation takes a value from
, random phase shifts within one symbol interval are considered, and AWGN at an SNR of 10 dB is added to the modulated signals. In addition, a multiplexing-based transmission scheme is adopted for the MIMO system. Specifically, two transmitting antennas send two independent data streams, but with the same modulation scheme (e.g., 2ASK, 2FSK, or 2PSK). The result is shown in
Figure 6.
A comparison of
Figure 5 and
Figure 6 shows that, for all the modulated signals, the signal overlapping of the MIMO system has no effect on the power distribution of the modulated signals in the frequency domain, but the power distribution over the time domain is changed. The latter can be explained by the fact that the overlapping of different transmitted signals partly destroys the time–frequency characteristics of raw modulated signals. In spite of this, some crucial time–frequency characteristics are not destroyed by the MIMO signals overlapping, such as the “ring” that is caused by the phase mutation in the 2PSK signal (shown in
Figure 5c and
Figure 6c). Hence, the overlapping of modulated signals partially destroys the time–frequency characteristics, but some of the crucial time–frequency characteristics are still preserved in the RGB spectrogram image. Therefore, the RGB spectrogram image can still be used to identify the modulation type, even in the MIMO scenario.
4.1.3. RGB Spectrogram Image of the Modulated Signals with Different SNRs for the MIMO Channels
In this subsection, only the two-order modulation schemes are considered, and their RGB spectrogram images are analyzed at different SNRs for the MIMO network. For the 2ASK-modulated signals with SNR = 10 dB and SNR =
dB, the corresponding RGB spectrograms are shown in
Figure 6a and
Figure 7a, respectively. For the 2ASK-modulated signals, as the noise power increases, the components of the noise power become more prominent, as shown by the white patches in the RGB spectrogram image. However, the main features of the RGB spectrogram images of the 2ASK-modulated signals are not destroyed. That is, the power distribution of the 2ASK-modulated signals is still concentrated in one sub-band in the RGB spectrogram image. In addition, the distribution of the values of the power spectral density is almost the same at different SNRs. Similarly, the RGB spectrograms for the 2FSK- and 2PSK-modulated signals with SNR = 10 dB and SNR =
dB are shown in
Figure 6b and
Figure 7b and
Figure 6c and
Figure 7c, respectively. From these figures, we can conclude that increases in the noise power do not destroy the main features of the RGB spectrogram images of these modulated signals, and thus, they can be used as the features for modulation classification, even in the low-SNR region.
4.2. Classification Accuracy of the Proposed Scheme
The classification accuracy of the proposed scheme is tested and verified for both the SISO and MIMO scenarios. We first randomly generate the data stream, and then it is modulated and passed through the MIMO or SISO channels. In order to verify the performance of the proposed scheme, some benchmark schemes are introduced, such as the SqueezeNet-based method [
43], the GoogleNet-based method [
44], the scheme based on the smooth pseudo Wigner–Ville distribution (SPWVD) proposed in [
7], and the scheme based on the Wigner–Ville distribution (WVD) proposed in [
31].
4.2.1. Classification Accuracy in the MIMO Scenario
The classification performance of the proposed scheme for the MIMO scenario is now verified. In order to better understand the performance of the proposed scheme, the model is trained and tested with two data sets, i.e., one for the modulation set
= {2ASK, 2FSK, 2PSK, 4ASK, 4FSK, 4PSK, 8PSK, 16QAM} and another for a smaller modulation set
= {2ASK, 2FSK, 2PSK, 4ASK, 4FSK, 4PSK}. In the training stage,
Figure 8 shows the training accuracy versus the iterations in
; as the number of iterations increases, the neural network converges gradually. In the testing stage, the SNR of the modulated signals is varied from SNR =
−4 dB to SNR = 10 dB, and the result is shown in
Figure 9. For both scenarios with and without the decision fusion module, the classification accuracy of the proposed scheme increases as the SNR of the modulated signals increases, which is consistent with the theoretical analysis. Moreover, by introducing the decision fusion module, a 10% improvement in the classification accuracy can be achieved. In particular, the proposed scheme can achieve 80.42% and 87.92% accuracy at −4 and 10 dB SNR in
, and 87.78% and 93.33% accuracy at −4 and 10 dB SNR in
. In addition, the proposed scheme is also compared with the SqueezeNet- and GoogleNet-based schemes. The detailed hyperparameters are shown in
Table 3. From this, we can note that the proposed scheme outperforms both the SqueezeNet- and GoogleNet-based methods. This can be explained by the fact that the decision fusion module and the excellent representational capability of AlexNet together enhance its classification accuracy. Moreover, GoogleNet has the most training parameters and SqueezeNet has the fewest. In the training stage, since GoogleNet has the most training parameters and the training sets are not big enough, the complex GoogleNet cannot fit well, and its generalization error is higher than those of both SqueezeNet and AlexNet. SqueezeNet has the shortest training time, but its generalization error is larger than that of AlexNet. Hence, AlexNet achieves the best performance in our problem.
More specifically, the confusion matrices of the classification results are shown in
Figure 10a,b for modulated SNRs of −4 and 10 dB, respectively. The MFSK- and QAM-modulated signals have the highest classification accuracies at both −4 and 10 dB, and the MASK-modulated signals have the second highest. The MPSK signals (especially the 4PSK signals) exhibit the worst classification performance, as shown in
Figure 10a. Most of the 4PSKs are misclassified as 8PSK at SNR =
−4 dB, and the performance is improved only slightly at SNR = 10 dB. This result indicates that the MIMO system structure has negative effects on the time–frequency characteristics of the MPSK signals, which is consistent with the theoretical analysis. Hence, our proposed scheme has difficulty identifying the high-order PSK signals in the MIMO system. However, the time–frequency analysis and deep-learning-based scheme has excellent performance in classifying the MFSK-, MASK-, and QAM-modulated signals, and it can obtain superior average classification accuracy for the MIMO system.
In order to completely characterize the performance of the proposed method, we further investigate how the antenna configuration impacts the obtained performance, i.e., 2 × 2, 2 × 3, and 2 × 4 antennas over the transceiver, and the result is shown in
Figure 11. On the one hand, the classification accuracy of the proposed scheme increases as the SNR of the modulated signals increases for all antenna configurations. On the other hand, the classification accuracy increases with the number of receiving antennas. These results are as expected: increasing the number of receiving antennas improves the diversity performance of the MIMO-based system, and the decision fusion algorithm then obtains a cooperation gain by jointly deciding the correct modulation with greater probability, thereby improving the probability of identification. However, it is important to note that the proposed scheme obtains only the diversity gain from the multiple receiving antennas and the cooperation gain from the decision fusion scheme. Since we do not estimate the channel, it is not possible to further obtain a coherent combination gain with the multiple antennas at the receiver.
Finally, the performance of the proposed approach is evaluated under more realistic multi-path fading channels for the MIMO system, and the multi-path channel model for the MIMO network is developed as follows:
where
I denotes the number of paths for each antenna, and
denotes the received signal at the
j-th antenna with the channel gain
and delay
[
45]. With different numbers of paths for the MIMO channel, i.e.,
,
, and non-multi-path, the classification accuracy performance is evaluated over different SNRs, and the result is shown in
Figure 12. One can note that the classification accuracy decreases as the number of paths increases. This result is consistent with the theoretical analysis: multi-path transmission from the transmitter to the receiver causes frequency-selective fading, and the greater the number of paths, the stronger the influence on the time–frequency properties of the modulated signals and the more the STFT time–frequency image is damaged. However, we can observe that, even for the scenario of
, the proposed scheme can still obtain 74.5% and 82.5% classification accuracy at SNR =
dB and SNR = 10 dB. This result indicates that our proposed scheme can obtain robust performance even for the frequency-selective channel.
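The effect of the multi-path channel on one receiving branch can be illustrated with a tapped-delay-line sketch. It assumes that each path contributes a scaled and delayed copy of the transmitted signal and that delays are integer numbers of samples; the gain and delay values, like the function name `multipath`, are hypothetical and are not taken from the paper.

```python
import numpy as np

def multipath(x, gains, delays):
    """Sum of delayed, scaled copies of x; delays are integer sample counts."""
    x = np.asarray(x, dtype=float)
    y = np.zeros(x.size + max(delays))
    for gain, delay in zip(gains, delays):
        # Each path adds a copy of x shifted by its delay and scaled by its gain.
        y[delay:delay + x.size] += gain * x
    # Keep the observation window aligned with the transmitted signal.
    return y[:x.size]

# Two-path example: a direct path plus an echo two samples later at half amplitude.
received = multipath([1.0, 0.0, 0.0, 0.0], gains=[1.0, 0.5], delays=[0, 2])
print(received)
```

Additive noise and the per-antenna random phase shift described earlier would be applied on top of this output in a fuller simulation.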
4.2.2. Classification Accuracy in the SISO Scenario
In order to better understand how different time–frequency analysis tools affect the performance of the proposed scheme, herein, both the smooth pseudo Wigner–Ville distribution (SPWVD)-based scheme [
7] and the Wigner–Ville distribution (WVD)-based scheme [
31] are introduced and compared with our proposed method for the SISO network, and the result is shown in
Figure 13. In the simulation, the average classification accuracy of the proposed scheme is evaluated by varying the SNR of the signals from −4 to 10 dB. From this figure, we observe that as the SNR of the signal increases, the classification accuracies of all three classification schemes gradually improve. Moreover, our proposed scheme always has the highest average classification accuracy. Its classification accuracy reaches 92.37% even at SNR =
−4 dB, and it reaches 99.12% at SNR = 10 dB. This significantly outperforms the SPWVD-based method and the WVD-based method. These results confirm that our method has higher classification accuracy and better robustness, even in the low-SNR region. In addition, by comparing with
Figure 9, we note that the average classification accuracy of the MIMO scenario is lower than that of the SISO scenario. This is due to the fact that, by using multiple antennas at the transmitter, the signals from different transmitter antennas may non-coherently combine at each receiver antenna, thus worsening the classification performance, as mentioned in
Section 4.1.
5. Conclusions
In this paper, we addressed the problem of blind modulation classification (BMC) for MIMO systems. Specifically, the windowed STFT was used to analyze the time–frequency characteristics of the modulated signals, and the time–frequency graphs of the modulated signals were converted into RGB spectrogram images. Then, transfer learning was utilized to fine-tune AlexNet to adapt to our classification problem, and the generated RGB spectrogram images were fed into the fine-tuned CNN to extract features and train the network. Finally, the decisions of each received signal from the MIMO receivers were combined by the decision fusion module for the final decision. The STFT-based time–frequency analysis results showed that each modulation type had unique time–frequency characteristics, and that the additive noise had limited influence on the time–frequency characteristics of the modulated signals. The numerical results indicated that the proposed scheme can achieve 92.37% and 99.12% classification accuracy at the SNRs of −4 and 10 dB, respectively, in the SISO system. For the MIMO system, the proposed scheme can still achieve 80.42% and 87.78% at the SNR of −4 dB for the large and small modulation sets, respectively. This is a considerable improvement over existing studies of the BMC of MIMO systems, especially in the low-SNR region. However, many open problems still exist for BMC. As mentioned earlier, the extension of our proposed deep-learning-based approach to the MIMO OFDM system is still a challenging issue, and this is part of our future work.