### *4.1. Performance on Signal Detection*

Multi-signal detection requires the carrier frequency, start-stop time, and modulation format of each signal. Figure 10 shows some detection results from our model. Figure 10a,b indicate that the model handles multi-signal detection well and accurately estimates the relevant parameters of each signal. Moreover, the model has promising application prospects in engineering because its results are easy to visualize.

**Figure 10.** Detection results of the SSD network. (**a**) ideal result 1; (**b**) ideal result 2; (**c**) imperfect result 1; (**d**) imperfect result 2; (**e**) no-signal result 1; (**f**) no-signal result 2.

Our model is not yet perfect, and some aspects still need improvement. Figure 10c,d show that when a signal is long, the estimate of its start-stop time becomes imprecise, while the estimate of its carrier frequency remains precise. A possible cause is that objects in the time-frequency spectrum can be strongly deformed and have extreme length-to-width ratios, whereas objects in natural images usually do not. Therefore, the default boxes in the SSD network need further optimization. Figure 10e,f show the network output when no signal is present. The model does not produce a false alarm, which is useful in engineering.
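To illustrate why the default boxes matter for long signals, the following sketch computes default box sizes with the rule from the original SSD formulation, $w = s\sqrt{a}$ and $h = s/\sqrt{a}$, for a scale $s$ and an aspect ratio $a$. The scale value, the extra elongated ratios, and the assumption that time runs along the horizontal axis are hypothetical choices for illustration; they are not taken from the model described here.

```python
import math


def default_box_sizes(scale, aspect_ratios):
    """Return (width, height) pairs, relative to the image size, for one scale."""
    return [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in aspect_ratios]


# Aspect ratios (width / height) typically used for natural images.
natural_ratios = [1, 2, 3, 1 / 2, 1 / 3]
# Hypothetical extra ratios for long signals, assuming time is the horizontal axis.
elongated_ratios = natural_ratios + [6, 10]

for width, height in default_box_sizes(0.2, elongated_ratios):
    print(f"width={width:.3f}, height={height:.3f}")
```

Every box at a given scale has the same area $s^2$, so a larger aspect ratio trades height (frequency extent) for width (time extent), which is the direction a long-duration signal would need.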

Figure 11 shows the detection precision of our model at different SNRs. We choose the mean Average Precision (mAP) as the performance index of the model. To calculate the mAP, we first calculate precision and recall, which requires identifying the True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). Recall is the proportion of all positive examples that are ranked above a given rank. Precision is the proportion of all examples above that rank that are positive. The Average Precision (AP) summarizes the shape of the precision/recall curve, and the mAP is the mean of the AP values over all classes. They are calculated as follows:

$$\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN} \tag{29}$$

$$AP = \int_0^1 \mathcal{P}(r) \, dr \tag{30}$$

$$mAP = \frac{\sum_{i=1}^{\text{num\_classes}} AP_i}{\text{num\_classes}} \tag{31}$$
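As a minimal sketch of Equations (29)–(31), the code below computes precision and recall from counts, approximates the AP integral with a rectangle rule over the ranked detections of one class, and averages the per-class AP values. The function names and the rectangle-rule approximation are assumptions made for illustration; the source does not state which AP interpolation scheme was used.

```python
import numpy as np


def precision_recall(tp, fp, fn):
    """Equation (29): precision and recall from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


def average_precision(scores, is_true_positive, num_ground_truth):
    """Approximate Equation (30) for one class.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth signal.
    num_ground_truth: number of ground-truth signals of this class.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # rank by confidence
    hits = np.asarray(is_true_positive, dtype=bool)[order]
    tp_cum = np.cumsum(hits)
    fp_cum = np.cumsum(~hits)
    recall = tp_cum / max(num_ground_truth, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1)
    # Rectangle rule: precision weighted by each increase in recall.
    prev_recall = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - prev_recall) * precision))


def mean_average_precision(ap_per_class):
    """Equation (31): mean of the per-class AP values."""
    return float(np.mean(list(ap_per_class)))


ap = average_precision([0.9, 0.8, 0.7], [True, False, True], num_ground_truth=2)
print(round(ap, 4))
```

Whether a detection counts as a true positive depends on the *IoU* threshold, so the same detections can yield different mAP values at different thresholds.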

Figure 11 shows that the mAP of the SSD network increases with the SNR. When the SNR is 5 dB, the mAP reaches 90% at an *IoU* threshold of 0.5. Different *IoU* thresholds lead to different results. Raising the threshold yields more reliable estimates of the signal carrier frequency and start-stop time, but it sacrifices detection precision. Besides, traditional methods can be applied afterwards to further refine these signal parameters. We therefore choose 0.5 as the *IoU* threshold.
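The role of the *IoU* threshold can be sketched as follows. The box layout `(t_start, f_low, t_stop, f_high)` and the function names are assumptions for illustration, not the representation used by the model.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in the time-frequency plane.

    Each box is (t_start, f_low, t_stop, f_high); this layout is assumed.
    """
    t0 = max(box_a[0], box_b[0])
    f0 = max(box_a[1], box_b[1])
    t1 = min(box_a[2], box_b[2])
    f1 = min(box_a[3], box_b[3])
    intersection = max(0.0, t1 - t0) * max(0.0, f1 - f0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(predicted, truth, threshold=0.5):
    """A detection counts as correct when its IoU reaches the threshold."""
    return iou(predicted, truth) >= threshold


truth = (0.0, 0.0, 2.0, 2.0)
predicted = (1.0, 0.0, 3.0, 2.0)  # shifted by half the signal duration
print(iou(predicted, truth), is_true_positive(predicted, truth))
```

In this example the predicted box overlaps half of the true signal in time, giving an *IoU* of 1/3, so it is rejected at a threshold of 0.5 but would be accepted at 0.3.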

**Figure 11.** Performance of the SSD network.

Once the signals are detected, we need to evaluate the precision of the estimated parameters. We use the normalized offset between the estimated and the actual parameters as the measurement criterion, defined as follows:

$$
\Delta f = \frac{\left| f_{\text{pre}} - f_{\text{real}} \right|}{R} \tag{32}
$$

$$
\Delta t_{\text{start}} = \frac{\left| t_{\text{pre\_start}} - t_{\text{true\_start}} \right|}{T}, \quad
\Delta t_{\text{stop}} = \frac{\left| t_{\text{pre\_stop}} - t_{\text{true\_stop}} \right|}{T} \tag{33}
$$

where $f_{\text{pre}}$ is the predicted value of the carrier frequency, $f_{\text{real}}$ is the actual value of the carrier frequency, $R$ is the symbol rate, $t_{\text{pre\_start}}$ and $t_{\text{pre\_stop}}$ are the predicted values of the start and stop times, $t_{\text{true\_start}}$ and $t_{\text{true\_stop}}$ are the actual values of the start and stop times, and $T$ is the signal duration. Table 2 shows the precision of the carrier frequency and the start-stop time when the signal is detected. The precision of the carrier frequency is higher than that of the start and stop times, which is consistent with Figure 10c,d. In future research, we need to use prior information about the signal when designing the default boxes and the networks.
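The normalized offsets of Equations (32) and (33) can be sketched as below. The function name and the numerical values are hypothetical and serve only to show the arithmetic.

```python
def normalized_offsets(f_pre, f_real, symbol_rate,
                       t_pre_start, t_true_start,
                       t_pre_stop, t_true_stop, duration):
    """Return (delta_f, delta_t_start, delta_t_stop) per Equations (32)-(33)."""
    delta_f = abs(f_pre - f_real) / symbol_rate
    delta_t_start = abs(t_pre_start - t_true_start) / duration
    delta_t_stop = abs(t_pre_stop - t_true_stop) / duration
    return delta_f, delta_t_start, delta_t_stop


# Hypothetical example: frequencies in Hz, times in seconds.
offsets = normalized_offsets(
    f_pre=1005.0, f_real=1000.0, symbol_rate=100.0,
    t_pre_start=0.12, t_true_start=0.10,
    t_pre_stop=2.20, t_true_stop=2.10, duration=2.0,
)
print(offsets)
```

Because the frequency offset is normalized by the symbol rate and the time offsets by the signal duration, the three values are dimensionless and can be compared across signals with different rates and lengths.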

**Table 2.** The offset in the estimation of the various parameters.


We also compare our model with the RCNN and Fast RCNN networks. Figure 12a shows that the mAP of the RCNN and Fast RCNN is higher than that of the SSD network, but the improvement is not significant. Figure 12b shows that the SSD network has a considerable advantage in processing speed over the RCNN and Fast RCNN. Our model can process each time-frequency spectrum in 0.05 s, and such a computational cost is acceptable for many practical communication systems.

**Figure 12.** Performance of the different networks. (**a**) mAP; (**b**) processing time.

### *4.2. Performance on Modulation Recognition*

For signal modulation recognition, we run a series of experiments to test the network performance. Figure 13a shows the recognition performance for each modulated signal at different SNRs. The algorithm still achieves good performance when the SNR is very low. Because of its modulation complexity, the 64QAM signal performs worse than the other signals, but it still achieves 94% accuracy at 7 dB. BPSK and OQPSK signals have visual characteristics in the eye diagram and the vector diagram that are distinct from those of the other modulated signals, so their recognition accuracy reaches 100% even at 0 dB. It is also clear that the recognition performance for the circular modulation signals {8PSK, 16APSK, 32APSK} is better than for the QAM signals. To aid understanding of the results, the confusion matrices at different SNR levels are presented in Figure 13b–d. The network shows excellent performance in discriminating BPSK, QPSK, OQPSK, 8PSK, and 16APSK. Moreover, in our experiments, 16QAM is more likely to be confused with 64QAM, while 16APSK is more likely to be confused with 64QAM.
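A row-normalized confusion matrix of the kind reported in Figure 13b–d can be computed as in the sketch below. The integer class labels and the function name are assumptions for illustration; the toy labels are not experimental data.

```python
import numpy as np


def normalized_confusion_matrix(y_true, y_pred, num_classes):
    """Row-normalized confusion matrix.

    Entry (i, j) is the fraction of class-i samples predicted as class j.
    Rows for classes with no samples are left as zeros.
    """
    counts = np.zeros((num_classes, num_classes), dtype=float)
    np.add.at(counts, (np.asarray(y_true), np.asarray(y_pred)), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


# Toy example with two classes (labels 0 and 1).
cm = normalized_confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
per_class_accuracy = np.diag(cm)           # accuracy of each modulation class
average_accuracy = per_class_accuracy.mean()  # unweighted mean over classes
print(cm, average_accuracy)
```

The diagonal gives the per-class recognition accuracy, and the off-diagonal entries of a row show which classes a modulation format is most often confused with.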

**Figure 13.** The performance for each modulation. (**a**) classification accuracy for each modulation versus SNR; (**b**) normalized confusion matrix at 0 dB; (**c**) normalized confusion matrix at 3 dB; (**d**) normalized confusion matrix at 10 dB.

For accuracy comparison, we consider four different modulation classification algorithms.


Figure 14 presents the average classification accuracy of the five algorithms (ours and the four baselines) versus SNR. The average accuracy is obtained by averaging the classification accuracy over the eight modulation categories. Our algorithm outperforms all the others.

**Figure 14.** Performance of the different methods versus SNR.

Because carrier frequency estimation by the SSD network and the FFT is imperfect in practice, we study the recognition performance of the network under different frequency offsets. We apply a series of frequency offsets to the signals, and the results are shown in Figure 15. The recognition accuracy for signals with a frequency offset is lower than for signals without one. When the frequency offset is large, the network is no longer suitable. We also collect some signals from a real satellite communication system, so the received signals have passed through a real wireless channel. We then use a signal playback device, a DSP card, and PCs to simulate the signal reception process. Figure 15a shows that the recognition accuracy on real data is lower than on simulated data at the same SNR level. This may be because the training data do not adequately account for the actual channel environment. Nevertheless, the recognition accuracy still reaches 90% when the SNR is 4 dB. In future research, we will make full use of real signals to make our model more robust.
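A residual frequency offset can be simulated by rotating the complex baseband samples, as in the sketch below. It assumes the offset is expressed in cycles per sample and drawn uniformly from a symmetric range such as [−0.01, 0.01]; the source does not state the normalization or the sampling distribution, so both are assumptions, as are the function names.

```python
import numpy as np


def apply_frequency_offset(samples, offset):
    """Rotate complex baseband samples by a frequency offset.

    offset is in cycles per sample (an assumed normalization).
    """
    samples = np.asarray(samples, dtype=complex)
    n = np.arange(samples.size)
    return samples * np.exp(2j * np.pi * offset * n)


def random_offset(rng, max_offset):
    """Draw an offset uniformly from [-max_offset, max_offset]."""
    return rng.uniform(-max_offset, max_offset)


rng = np.random.default_rng(0)
# Unit-energy QPSK constellation points used as a toy signal.
symbols = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)
shifted = apply_frequency_offset(symbols, random_offset(rng, 0.01))
print(np.abs(shifted))  # magnitudes are unchanged; only the phase rotates
```

The offset leaves the sample magnitudes unchanged but makes the constellation rotate over time, which smears the vector diagram and distorts the eye diagram. This is consistent with the accuracy loss observed for large offsets.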

**Figure 15.** The network performance for different frequency offset ranges. (**a**) classification accuracy for different frequency offsets versus SNR; (**b**) normalized confusion matrix at 3 dB when the frequency offset is [−0.01, 0.01]; (**c**) normalized confusion matrix at 3 dB when the frequency offset is [−0.02, 0.02]; (**d**) normalized confusion matrix at 10 dB when the frequency offset is [−0.1, 0.1].

We also consider the influence of the number of symbols and the number of eyes in the eye diagram on the network performance. We obtain the best sample parameter settings by grid search. The symbol number is set to 200, 400, 800, and 1000, while the eye number is set to 2, 3, 4, and 5. The results are shown in Figure 16. These parameters do affect the network performance. As the symbol number and the eye number increase, the overall accuracy of the model gradually increases. However, when the symbol number is 1000 and the eye number is 5, the improvement is not obvious. Therefore, we finally choose 800 symbols and 4 eyes to generate the eye diagram and the vector diagram.
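The two sample parameters can be made concrete with the sketch below, which cuts one signal component into overlaid traces for an eye diagram. It assumes that the eye number is the number of symbol periods spanned by each trace and that the signal is oversampled by `samples_per_symbol`; both the interpretation and the names are assumptions, not the generation procedure used in this work.

```python
import numpy as np


def eye_traces(component, samples_per_symbol, eye_number):
    """Split one signal component (I or Q) into eye-diagram traces.

    Each row spans eye_number symbol periods; overlaying the rows on a
    common time axis produces the eye diagram. Leftover samples are dropped.
    """
    component = np.asarray(component)
    trace_length = samples_per_symbol * eye_number
    num_traces = component.size // trace_length
    return component[: num_traces * trace_length].reshape(num_traces, trace_length)


# Hypothetical setting: 800 symbols, 8 samples per symbol, 4 eyes per trace.
samples_per_symbol = 8
in_phase = np.arange(800 * samples_per_symbol, dtype=float)  # placeholder samples
traces = eye_traces(in_phase, samples_per_symbol, eye_number=4)
print(traces.shape)
```

Under these assumptions, more symbols give more traces to overlay and a larger eye number gives longer traces, so both increase the amount of waveform information in each image.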

**Figure 16.** The network performance for different sample parameters.

Finally, we compare the single-input networks with the multi-input network used in this work. The results are shown in Figure 17. The modulation recognition algorithm based on a single eye diagram performs poorly. The I-eye diagram performs worse than the Q-eye diagram, which may be due to the setting of the initial phase in the same modulation format. The vector-diagram-based method is also inferior to our method, since it does not make full use of the signal waveform information.

**Figure 17.** The network performance for different input models.

#### **5. Conclusions and Discussions**

In this research, we have presented our initial efforts to establish a DL framework for the multi-signal detection and modulation classification problem. In our method, the time-frequency spectrums are exploited for the multi-signal detection task, while the eye diagrams and vector diagrams are exploited for the modulation classification task. The simulation results show that DL technologies are able to solve problems in the communication field and achieve higher performance than other methods.

In future work, we will carry out more rigorous analysis and more comprehensive experiments. For practical use, we will also collect samples generated from real channels and then retrain or fine-tune the model for better performance.

**Author Contributions:** Conceptualization, X.Z. and H.P.; methodology, X.Z.; software, X.Z.; validation, X.Z., H.P. and X.Q.; formal analysis, G.L.; investigation, X.Z.; resources, G.L.; data curation, X.Z.; writing—original draft preparation, X.Q.; writing—review and editing, G.L.; visualization, S.Y.; supervision, X.Z.; project administration, H.P.; funding acquisition, H.P.

**Funding:** This research was funded by the National Natural Science Foundation of China (No. 61401511) and the National Natural Science Foundation of China (No. U1736107).

**Conflicts of Interest:** The authors declare no conflict of interest.
