### **1. Introduction**

Cognitive radio (CR) [1–3] refers to radio devices that are capable of learning about and adapting to their environment. Because of the increasing demand for wireless bandwidth in the radio spectrum, automatic signal detection and modulation recognition techniques are indispensable. These techniques help users identify the modulation format and estimate the signal parameters within the operating bands, which benefits communication reconfiguration and electromagnetic environment analysis. They are widely used in both military and civilian applications and have attracted much attention over the past decades [4–7].

Multi-signal detection, the task of detecting the signals present in a specified wideband, is one of the essential components of CR. The most significant difference between a signal and noise alone is energy. Hence, many wideband multi-signal detection algorithms are based on the energy detector (ED). Threshold-based wideband signal detection methods, such as [8–13], aim to reduce the probability of false alarm or missed detection. However, these methods are sensitive to changes in the noise level, and it is difficult to guarantee detection accuracy across all detection scenarios. Therefore, many non-threshold-based detection algorithms have been proposed [14–17]. However, these algorithms have high computational complexity, which results in poor online detection performance.
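The following sketch shows the basic threshold test behind ED-based methods. The average energy of each block of samples is compared with a threshold set relative to a noise power that is assumed to be known. The function name `energy_detect`, the block length, and the threshold factor are illustrative assumptions, not parameters taken from [8–13]. Because the decision depends directly on the noise-power value, this test is sensitive to noise changes, as noted above.

```python
import cmath
import math
import random


def energy_detect(samples, block_len, noise_power, threshold_factor=2.0):
    """Classical energy detector (hypothetical helper).

    Splits complex baseband `samples` into blocks of `block_len` samples and
    flags a block as occupied when its mean energy exceeds
    threshold_factor * noise_power. A trailing partial block is ignored.
    """
    threshold = threshold_factor * noise_power
    decisions = []
    for start in range(0, len(samples) - block_len + 1, block_len):
        block = samples[start:start + block_len]
        energy = sum(abs(v) ** 2 for v in block) / block_len
        decisions.append(energy > threshold)
    return decisions


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = math.sqrt(0.01 / 2)  # noise power 0.01 split over I and Q
    noise = [complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(256)]
    tone = [cmath.exp(2j * math.pi * 0.1 * n) for n in range(128)]
    # first 128 samples: noise only; last 128 samples: tone plus noise
    x = noise[:128] + [t + w for t, w in zip(tone, noise[128:])]
    print(energy_detect(x, block_len=64, noise_power=0.01))
```

If the assumed `noise_power` is wrong, for example because the noise floor drifts, the fixed threshold yields false alarms or missed detections.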

For automatic modulation recognition, algorithms based on the signal phase, frequency, and amplitude have been widely used [18]. However, these algorithms are significantly affected by noise, and their performance can degrade substantially at low signal-to-noise ratio (SNR). Algorithms based on higher-order statistics [19–22], such as high-order cumulants and the cyclic spectrum, offer excellent noise immunity. The computational complexity of these methods is relatively low, but feature selection relies heavily on expert experience, and it is difficult to obtain features that adapt to non-ideal conditions. In particular, setting the decision thresholds is challenging when many modulation formats must be classified.
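As an illustration of a cumulant feature, the sketch below estimates the normalized fourth-order cumulant $C_{42}/C_{21}^2$ from zero-mean complex symbols. The function name is hypothetical. For noise-free, equiprobable constellations the theoretical values are $-2$ for BPSK, $-1$ for QPSK, and $-0.68$ for 16QAM. This separation is what thresholds on such features exploit, and it also shows why many thresholds are needed as the number of formats grows.

```python
def normalized_c42(symbols):
    """Estimate C42 / C21^2 for zero-mean complex symbols (hypothetical helper)."""
    n = len(symbols)
    m20 = sum(x * x for x in symbols) / n          # E[x^2]
    m21 = sum(abs(x) ** 2 for x in symbols) / n    # E[|x|^2], equal to C21 for zero mean
    m42 = sum(abs(x) ** 4 for x in symbols) / n    # E[|x|^4]
    c42 = m42 - abs(m20) ** 2 - 2 * m21 ** 2
    return c42 / m21 ** 2


if __name__ == "__main__":
    bpsk = [1, -1]
    qpsk = [complex(i, q) for i in (-1, 1) for q in (-1, 1)]
    qam16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
    for name, pts in (("BPSK", bpsk), ("QPSK", qpsk), ("16QAM", qam16)):
        print(name, round(normalized_c42(pts), 2))
```

The normalization by $C_{21}^2$ makes the feature independent of signal power. Noise, however, pulls all values toward 0, which is one reason why fixed thresholds become fragile at low SNR.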

Deep Learning (DL) techniques [23,24] have achieved outstanding results in Computer Vision (CV) [25,26] and Natural Language Processing (NLP) [27,28] owing to their strong self-learning ability. Recently, researchers have increasingly applied DL techniques to signal processing problems. For signal detection, many DL-based methods, such as [29–31], detect signals in narrowband environments. These methods only detect the presence of a signal and cannot estimate the relevant parameters. Therefore, developing a technique that leverages deep learning to detect signals efficiently and effectively remains a challenging problem. For DL-based modulation classification, several works have been reported, including [32–36]. For example, some researchers used the signal IQ waveform as the data representation and learned from the samples using CNNs [32–34]. Other researchers focused on representing modulated signals in data formats suited to CNNs. Among these methods, constellation-based algorithms [35,36] have been widely used because they make full use of prior knowledge of the signal.

In this study, DL techniques are applied to both multi-signal detection and modulation recognition. For multi-signal detection, we use a deep learning object detection network to locate each signal. In this initial study, the model is the Single Shot MultiBox Detector (SSD), a relatively advanced object detection network, and the time-frequency spectrum serves as the signal representation. Because M-ary Frequency Shift Keying (MFSK) signals have distinctive time-frequency characteristics, their modulation format can be identified while the signal is being detected. For M-ary Phase Shift Keying (MPSK), M-ary Amplitude Phase Shift Keying (MAPSK), and M-ary Quadrature Amplitude Modulation (MQAM) signals, however, the differences in the time-frequency spectrum are insufficient to identify the modulation. Therefore, during detection we assign these signals to a single class and only determine whether a signal is present. The detection network provides rough estimates of each signal's carrier frequency and start-stop time. We then use a series of conventional methods to convert these signals from the wideband to baseband. To recognize MPSK, MAPSK, and MQAM signals, we design a multi-input CNN. Its inputs are the signal vector diagram and eye diagram, which are more robust than in-phase and quadrature (IQ) waveform data and the constellation diagram.
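The paper does not specify which conventional methods convert a detected signal to baseband. One common approach is digital down-conversion, sketched below under that assumption. The wideband samples are mixed with a complex exponential at the estimated carrier, low-pass filtered with a simple moving average, and decimated. The function name, the moving-average filter, and its length are hypothetical choices. A practical design would use a properly designed FIR low-pass filter.

```python
import cmath
import math


def downconvert(x, fc, fs, taps, decim):
    """Digital down-conversion sketch (hypothetical helper).

    x: complex wideband samples; fc: estimated carrier (Hz); fs: sample rate (Hz);
    taps: moving-average length used as a crude low-pass filter;
    decim: decimation factor.
    """
    # 1) shift the signal of interest from fc to 0 Hz
    mixed = [v * cmath.exp(-2j * math.pi * fc * n / fs) for n, v in enumerate(x)]
    # 2) moving-average low-pass filter ('valid' outputs only)
    filtered = [sum(mixed[n - taps + 1:n + 1]) / taps
                for n in range(taps - 1, len(mixed))]
    # 3) keep every decim-th filtered sample
    return filtered[::decim]
```

For example, a tone of amplitude 2 at exactly the estimated carrier becomes a constant of about 2 after down-conversion. Any carrier-frequency estimation error shows up as a residual rotation of the baseband samples.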

This paper addresses DL-based multi-signal detection and modulation classification. The main contributions are summarized as follows: (1) We propose a relatively complete DL framework for signal detection and modulation recognition, which relies less on manually designed features and thresholds than traditional algorithms. (2) We design a signal representation scheme for each task, which allows the DL framework to be applied to both detection and classification. (3) We propose a multi-input CNN model that extracts and maps features from different input representations.

The rest of this paper is organized as follows. Section 2 describes the signal model and the dataset generation in detail. Section 3 presents the DL framework for signal detection and modulation recognition. Section 4 evaluates our initial experimental results from several aspects. Finally, Section 5 gives our conclusions and directions for future research.

### **2. Communication Signal Description and Dataset Generation**

In practical communication systems, the signal may be distorted by nonlinear amplifiers and by the channel. The received signal can therefore be expressed as:

$$r(t) = \int\_{\tau=0}^{\tau\_0} s(n\_{\rm clk}(t-\tau))h(\tau)d\tau + n\_{\rm add}(t) \tag{1}$$

where *s*(*t*) is the transmitted signal, *n*<sub>clk</sub>(*t*) is the timing deviation, *h*(*t*) is the impulse response of the wireless channel, and *n*<sub>add</sub>(*t*) is additive white Gaussian noise.
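Below is a discrete-time sketch of Equation (1). It assumes the timing deviation *n*<sub>clk</sub>(*t*) is ignored (an ideal sampling clock) and models the channel as a finite impulse response `h`. The code convolves the transmitted samples with `h` and adds complex white Gaussian noise scaled to a requested SNR. The function name and the SNR definition (average received signal power over noise power) are illustrative assumptions.

```python
import math
import random


def apply_channel(s, h, snr_db, rng=None):
    """Discrete Eq. (1) without timing deviation: r = s * h + n_add (hypothetical helper)."""
    rng = rng or random.Random()
    # full linear convolution of s with the channel impulse response h
    y = [0j] * (len(s) + len(h) - 1)
    for i, sv in enumerate(s):
        for k, hv in enumerate(h):
            y[i + k] += sv * hv
    # AWGN with power p_signal / SNR, split equally over I and Q
    p_signal = sum(abs(v) ** 2 for v in y) / len(y)
    sigma = math.sqrt(p_signal / 10 ** (snr_db / 10) / 2)
    return [v + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for v in y]
```

For instance, `apply_channel(symbols, [1, 0.5], 20)` models a two-path channel at 20 dB SNR. A timing deviation could be added separately, for example with a fractional-delay interpolator.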

*Sensors* **2019**, *19*, 4042

In this section, we describe the different modulated signals and their sample representations for our DL framework. We also explain why each representation is used and how we enhance it.

### *2.1. Modulation Signal Description*

For any digital modulation, the transmitted signal can be expressed as

$$s(t) = \sum\_{n} a\_n e^{j(w\_n t + \phi)} g(t - nT\_b) \tag{2}$$

where *w*<sub>*n*</sub> is the angular frequency of the *n*-th symbol, φ is the initial carrier phase, *T*<sub>*b*</sub> is the symbol period, *a*<sub>*n*</sub> is the symbol sequence, and *g*(*t*) is the pulse-shaping filter.
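Below is a discrete-time sketch of Equation (2). It assumes a rectangular pulse *g*(*t*) lasting `sps` samples per symbol and angular frequencies in radians per sample. The function name and these choices are illustrative. Because `omegas` is given per symbol, the same routine produces MFSK (varying *w*<sub>*n*</sub>, *a*<sub>*n*</sub> = 1) and the amplitude-phase formats (fixed *w*<sub>*n*</sub> = *w*<sub>0</sub>).

```python
import cmath


def modulate(symbols, omegas, phi, sps):
    """Sample Eq. (2) with a rectangular g(t) of length sps (hypothetical helper).

    symbols: complex a_n; omegas: per-symbol w_n in rad/sample; phi: carrier phase.
    """
    out = []
    for m, (a, w) in enumerate(zip(symbols, omegas)):
        for n in range(m * sps, (m + 1) * sps):  # samples where g(n - m*sps) = 1
            out.append(a * cmath.exp(1j * (w * n + phi)))
    return out


if __name__ == "__main__":
    # BPSK at baseband (w_n = 0): each symbol is held for sps samples
    print(modulate([1, -1], [0.0, 0.0], 0.0, sps=2))
```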

For MFSK signal, it can be presented as

$$a\_n = 1, \quad w\_n = w\_0 + \frac{2\pi}{M} i, \quad i = 0, 1, \dots, M - 1 \tag{3}$$

For MPSK signal, it can be presented as

$$a\_n = e^{j2\pi i/M}, \quad i = 0, 1, \dots, M-1, \quad w\_n = w\_0 \tag{4}$$
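Equation (4) maps each index *i* to a point on the unit circle. Here is a short sketch; the function name is hypothetical.

```python
import cmath
import math


def mpsk_alphabet(m):
    """Return the M-PSK symbols e^{j*2*pi*i/M}, i = 0..M-1, from Eq. (4)."""
    return [cmath.exp(2j * math.pi * i / m) for i in range(m)]


if __name__ == "__main__":
    qpsk = mpsk_alphabet(4)
    # unit-modulus points spaced by 2*pi/M in phase
    print([round(cmath.phase(p), 4) for p in qpsk])
```

The distance between adjacent points, 2 sin(π/*M*), shrinks as *M* grows, so higher-order PSK is more sensitive to noise.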

For MQAM signal, it can be presented as

$$\begin{aligned} a\_n &= I\_n + jQ\_n, \\ I\_n, Q\_n &\in \{2i - \sqrt{M} + 1\}, \quad i = 0, 1, \dots, \sqrt{M} - 1, \quad w\_n = w\_0 \end{aligned} \tag{5}$$
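Below is a sketch of the square *M*-QAM alphabet in Equation (5). It assumes *M* is a perfect square (16, 64, ...), so that *I*<sub>*n*</sub> and *Q*<sub>*n*</sub> each take √*M* odd-integer levels. The function name is hypothetical.

```python
import math


def square_qam_alphabet(m):
    """Square M-QAM points I + jQ with I, Q in {2i - sqrt(M) + 1}, i = 0..sqrt(M)-1."""
    side = math.isqrt(m)
    if side * side != m:
        raise ValueError("square QAM needs M to be a perfect square")
    levels = [2 * i - side + 1 for i in range(side)]
    return [complex(i, q) for i in levels for q in levels]
```

With these unnormalized levels, the average symbol energy is 10 for 16QAM and 42 for 64QAM. Dividing by the square root of that value gives a unit-power constellation.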

Compared with QAM constellations, MAPSK constellations have a lower peak-to-average power ratio (PAPR) and are therefore more robust against nonlinear channels. For this reason, APSK has mainly been employed and optimized for nonlinear satellite channels over the last two decades. Following the DVB-S2 recommendation [37], it can be expressed as:

$$a\_n = r\_k \exp\left[j\left(\frac{2\pi}{n\_k} i\_k + \theta\_k\right)\right] \tag{6}$$

where *r*<sub>*k*</sub> is the radius of the *k*-th circle, *n*<sub>*k*</sub> is the number of constellation points on the *k*-th circle, *i*<sub>*k*</sub> is the index of the constellation point on the *k*-th circle, and θ<sub>*k*</sub> is the initial phase of the *k*-th circle.
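Below is a sketch of Equation (6) for a two-circle 16APSK constellation with 4 points on the inner circle and 12 on the outer circle, the ring sizes used by DVB-S2 16APSK. The radius ratio 2.85 and the ring phase offsets are illustrative assumptions; DVB-S2 specifies the radius ratio per code rate [37]. The function name is hypothetical.

```python
import cmath
import math


def apsk_alphabet(radii, counts, phases):
    """Eq. (6): points r_k * exp(j(2*pi*i_k/n_k + theta_k)) per circle k (hypothetical helper)."""
    points = []
    for r_k, n_k, theta_k in zip(radii, counts, phases):
        for i_k in range(n_k):
            points.append(r_k * cmath.exp(1j * (2 * math.pi * i_k / n_k + theta_k)))
    return points


if __name__ == "__main__":
    # 4+12 points; radius ratio 2.85 and offsets pi/4, pi/12 are illustrative values
    const = apsk_alphabet([1.0, 2.85], [4, 12], [math.pi / 4, math.pi / 12])
    print(len(const), sorted({round(abs(p), 6) for p in const}))
```

The concentric circles keep the amplitude spread small compared with a QAM grid of the same order, which is the source of the lower PAPR mentioned above.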

### *2.2. Signal Time-Frequency Description*

For the multi-signal detection task, we use the time-frequency spectrum of the wideband signal as the neural network input. To justify this choice, we theoretically derive the visual time-frequency characteristics of each modulation type. We use the short-time Fourier transform (STFT) [38] to analyze the signal time-frequency characteristics.
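Below is a direct, unoptimized sketch of the spectrogram |STFT|² used in the derivations that follow. It assumes a rectangular window γ of `win_len` samples and a hop of `hop` samples. The DFT is computed by brute force so that the example needs no dependencies. The function name is hypothetical, and the phase reference is the start of each window, which does not affect the magnitude.

```python
import cmath
import math


def spectrogram(x, win_len, hop):
    """|STFT|^2 with a rectangular window (hypothetical helper).

    Returns one row per window position; row[k] is the energy in DFT bin k,
    i.e. at angular frequency 2*pi*k/win_len rad/sample.
    """
    twiddle = [[cmath.exp(-2j * math.pi * k * n / win_len) for n in range(win_len)]
               for k in range(win_len)]
    rows = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = x[start:start + win_len]
        rows.append([abs(sum(s * w for s, w in zip(seg, twiddle[k]))) ** 2
                     for k in range(win_len)])
    return rows
```

For a complex tone exactly on bin 3 of a 16-sample window, every row peaks at bin 3 with value 16² = 256, matching *A*²*T*² with *A* = 1 and *T* = 16 samples.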

### 2.2.1. MFSK Signal Time-frequency Description

The STFT of MFSK signal can be expressed as

$$STFT\_{s\_{\text{FSK}}}(t,w) = \int\_{-\infty}^{+\infty} [s\_{\text{FSK}}(\tau)\gamma^{\*}(\tau-t)]e^{-jw\tau}d\tau = \int\_{-\infty}^{+\infty} \left[\sum\_{k=-\infty}^{+\infty} A g(\tau-kT\_{b})e^{j(w\_{k}\tau+\phi\_{k})}\gamma^{\*}(\tau-t)\right]e^{-jw\tau}d\tau \tag{7}$$

where γ(*t*) is the window function of duration *T*. Assuming a rectangular window and rectangular pulse shaping *g*(*t*), when γ(*t*) lies entirely within one symbol, Equation (7) simplifies to

$$\begin{aligned} STFT\_{s\_{\text{FSK}}}(t,w) &= \int\_{-T/2}^{T/2} A e^{j(w\_k(\tau+t)+\phi\_k)} e^{-jw(\tau+t)} d\tau = AT e^{-jwt} e^{j(w\_k t + \phi\_k)} Sa\left(\frac{w - w\_k}{2} T\right), \\ kT\_b + T/2 &< t < (k+1)T\_b - T/2, k = 0, 1, 2, \dots \end{aligned} \tag{8}$$

where *Sa*(*w*) = sin(*w*)/*w*. When γ(*t*) spans two symbols, Equation (7) can be simplified as

$$\begin{aligned} STFT\_{s\_{\text{FSK}}}(t,w) &= \int\_{-T/2}^{d} A e^{j(w\_{k}(\tau+t)+\phi\_{k})} e^{-jw(\tau+t)} d\tau + \int\_{d}^{T/2} A e^{j(w\_{k+1}(\tau+t)+\phi\_{k+1})} e^{-jw(\tau+t)} d\tau \\ &= A e^{j((w\_{k}-w)t+\phi\_{k})} \int\_{-T/2}^{d} e^{-j(w-w\_{k})\tau} d\tau + A e^{j((w\_{k+1}-w)t+\phi\_{k+1})} \int\_{d}^{T/2} e^{-j(w-w\_{k+1})\tau} d\tau \\ &= A \frac{T+2d}{2} e^{j((w\_{k}-w)t+\phi\_{k})} e^{\frac{j(w-w\_{k})(T-2d)}{4}} Sa\left(\frac{(w-w\_{k})(T+2d)}{4}\right) \\ &\quad + A \frac{T-2d}{2} e^{j((w\_{k+1}-w)t+\phi\_{k+1})} e^{-\frac{j(w-w\_{k+1})(T+2d)}{4}} Sa\left(\frac{(w-w\_{k+1})(T-2d)}{4}\right), \\ &(k+1)T\_{b} - T/2 < t < (k+1)T\_{b} + T/2,\ d = (k+1)T\_{b} - t,\ k = 0, 1, 2, \dots \end{aligned} \tag{9}$$

where *w*<sub>*k*+1</sub> is the carrier angular frequency of the (*k*+1)-th symbol. If *w*<sub>*k*+1</sub> = *w*<sub>*k*</sub> (and φ<sub>*k*+1</sub> = φ<sub>*k*</sub>), the carrier frequency does not jump, and Equation (9) reduces to Equation (8). Taking the squared modulus of Equation (8) gives

$$\begin{array}{l} \text{SPEC}\_{s\_{\text{FSK}}}(t, w) = \left| \text{STFT}\_{s\_{\text{FSK}}}(t, w) \right|^{2} = A^{2} T^{2} Sa^{2} \left(\frac{w - w\_{k}}{2} T\right), \\ kT\_b + T/2 < t < (k+1)T\_b - T/2, k = 0, 1, 2, \dots \end{array} \tag{10}$$

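Evaluating the squared modulus of Equation (9) at *w* = *w*<sub>*k*</sub>, and neglecting the second term on the assumption that adjacent tones are well separated, gives: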

$$\begin{array}{l}\text{SPEC}\_{s\_{\text{FSK}}}(t,w\_{k}) \approx \frac{A^{2}(T+2d)^{2}}{4} \le A^{2}T^{2}, \quad -T/2 < d < T/2, \\ (k+1)T\_{b}-T/2 < t < (k+1)T\_{b}+T/2, k = 0,1,2,\dots \end{array} \tag{11}$$

Clearly, SPEC<sub>sFSK</sub>(*t*, *w*<sub>*k*</sub>) increases with *d*, the distance from the window center to the frequency-jump instant. As γ(*t*) slides out of the *k*-th symbol, the energy at *w*<sub>*k*</sub> gradually decreases. When *d* = *T*/2, the window lies entirely within the *k*-th symbol, and the maximum value is obtained:

$$\begin{array}{c} \text{SPEC}\_{s\_{\text{FSK}}}(t, w\_k)\_{\max} = A^2 T^2, \\ (k+1)T\_b - T/2 < t < (k+1)T\_b + T/2, k = 0, 1, 2, \dots \end{array} \tag{12}$$

When *d* = −*T*/2, the window lies entirely within the (*k*+1)-th symbol, and the minimum value is obtained:

$$\begin{array}{c} \text{SPEC}\_{s\_{\text{FSK}}}(t, w\_k)\_{\min} = 0, \\ (k+1)T\_b - T/2 < t < (k+1)T\_b + T/2, k = 0, 1, 2, \dots \end{array} \tag{13}$$

This analysis reveals two characteristics of MFSK modulation: (1) sharp brightness changes appear in the time-frequency image at the instants when the frequency switches; (2) the modulation order *M* and the frequency spacing determine the values of *w*<sub>*k*</sub> and are therefore the key parameters of the MFSK time-frequency pattern.
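The sketch below numerically checks Equations (11) to (13). A length-*T* rectangular window slides across one MFSK frequency jump, and the code records |STFT|² at the first tone *w*<sub>*k*</sub> (with *A* = 1). The tone bins, *T*, and symbol length are illustrative assumptions, and the function name is hypothetical.

```python
import cmath
import math


def energy_at_tone(T=32, Tb=64, bin_k=4, bin_k1=12):
    """|STFT|^2 at w_k = 2*pi*bin_k/T as the window crosses an MFSK jump (hypothetical helper).

    Element j corresponds to a window starting at Tb - T + j, so it contains
    T - j samples of symbol k (tone bin_k) and j samples of symbol k+1 (tone bin_k1).
    """
    x = [cmath.exp(2j * math.pi * (bin_k if n < Tb else bin_k1) * n / T)
         for n in range(2 * Tb)]
    energies = []
    for start in range(Tb - T, Tb + 1):
        acc = sum(x[start + n] * cmath.exp(-2j * math.pi * bin_k * (start + n) / T)
                  for n in range(T))
        energies.append(abs(acc) ** 2)
    return energies
```

The first value is *T*² = 1024, as in Equation (12). With the window centered on the jump (*d* = 0), the value is *T*²/4 = 256, as Equation (11) predicts. With the window fully inside the next symbol, the value is about 0, as in Equation (13). Other window positions deviate slightly from (*T* + 2*d*)²/4 because the neglected second term of Equation (9) is not exactly zero over a partial window.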

### 2.2.2. Amplitude–Phase Modulation Signal Time-frequency Description

MPSK, MAPSK, and MQAM are all amplitude-phase modulations with a fixed carrier frequency, so their time-frequency characteristics can be derived in the same way. We therefore present the derivation for the MPSK signal, whose STFT can be expressed as:

$$\begin{split} STFT\_{s\_{\text{PSK}}}(t,w) &= \int\_{-\infty}^{+\infty} [s\_{\text{PSK}}(\tau)\gamma^{\*}(\tau-t)]e^{-jw\tau}d\tau \\ &= \int\_{-\infty}^{+\infty} \left[\sum\_{k=-\infty}^{+\infty} A g(\tau-kT\_{b})e^{j(w\_{c}\tau+\phi\_{c}+\phi\_{k})}\gamma^{\*}(\tau-t)\right]e^{-jw\tau}d\tau \end{split} \tag{14}$$

Following the MFSK derivation, when γ(*t*) lies entirely within one symbol, Equation (14) simplifies to:

$$\begin{aligned} STFT\_{s\_{\text{PSK}}}(t, w) &= \int\_{-T/2}^{T/2} A e^{j(w\_c(\tau+t) + \phi\_c + \phi\_k)} e^{-jw(\tau+t)} d\tau = AT e^{-jwt} e^{j(w\_c t + \phi\_c + \phi\_k)} Sa\left(\frac{w - w\_c}{2} T\right), \\ kT\_b + T/2 &< t < (k+1)T\_b - T/2, k = 0, 1, 2, \dots \end{aligned} \tag{15}$$


When γ(*t*) spans two symbols, Equation (14) simplifies to:

$$\begin{aligned} STFT\_{s\_{\text{PSK}}}(t,w) &= \int\_{-T/2}^{d} A e^{j(w\_{c}(\tau+t)+\phi\_{c}+\phi\_{k})} e^{-jw(\tau+t)} d\tau + \int\_{d}^{T/2} A e^{j(w\_{c}(\tau+t)+\phi\_{c}+\phi\_{k+1})} e^{-jw(\tau+t)} d\tau \\ &= A e^{j((w\_{c}-w)t+\phi\_{c}+\phi\_{k})} \int\_{-T/2}^{d} e^{-j(w-w\_{c})\tau} d\tau + A e^{j((w\_{c}-w)t+\phi\_{c}+\phi\_{k+1})} \int\_{d}^{T/2} e^{-j(w-w\_{c})\tau} d\tau \\ &= A \frac{T+2d}{2} e^{j((w\_{c}-w)t+\phi\_{c}+\phi\_{k})} e^{\frac{j(w-w\_{c})(T-2d)}{4}} Sa\left(\frac{(w-w\_{c})(T+2d)}{4}\right) \\ &\quad + A \frac{T-2d}{2} e^{j((w\_{c}-w)t+\phi\_{c}+\phi\_{k+1})} e^{-\frac{j(w-w\_{c})(T+2d)}{4}} Sa\left(\frac{(w-w\_{c})(T-2d)}{4}\right), \\ &(k+1)T\_{b} - T/2 < t < (k+1)T\_{b} + T/2,\ d = (k+1)T\_{b} - t,\ k = 0, 1, 2, \dots \end{aligned} \tag{16}$$

where φ<sub>*k*+1</sub> is the phase of the (*k*+1)-th symbol. If φ<sub>*k*+1</sub> = φ<sub>*k*</sub>, Equation (16) reduces to Equation (15). Taking the squared modulus of Equation (15) gives:

$$\begin{array}{l} \text{SPEC}\_{s\_{\text{PSK}}}(t, w) = \left| STFT\_{s\_{\text{PSK}}}(t, w) \right|^2 = A^2 T^2 Sa^2 \left(\frac{w - w\_c}{2} T\right), \\ kT\_b + T/2 < t < (k + 1)T\_b - T/2, k = 0, 1, 2, \dots \end{array} \tag{17}$$

Similarly, evaluating the squared modulus of Equation (16) at *w* = *w*<sub>*c*</sub> gives:

$$\begin{array}{l} \text{SPEC}\_{s\_{\text{PSK}}}(t, w\_{c}) = \frac{A^2 T^2}{2} (1 + \cos(\phi\_{k} - \phi\_{k+1})) + 2A^2 d^2 (1 - \cos(\phi\_{k} - \phi\_{k+1})) \leq A^2 T^2, \\ (k+1)T\_b - T/2 < t < (k+1)T\_b + T/2, d = (k+1)T\_b - t, k = 0, 1, 2, \dots \end{array} \tag{18}$$

Taking the partial derivative of Equation (18) with respect to *d* gives:

$$\begin{array}{l} \dfrac{\partial\, \text{SPEC}\_{s\_{\text{PSK}}}(t, w\_{c})}{\partial d} = 4A^{2}d(1 - \cos(\phi\_{k} - \phi\_{k+1})), \quad -T/2 < d < T/2, \\ (k+1)T\_{b} - T/2 < t < (k+1)T\_{b} + T/2, k = 0, 1, 2, \dots \end{array} \tag{19}$$

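From Equation (19), if φ<sub>*k*+1</sub> ≠ φ<sub>*k*</sub>, SPEC<sub>sPSK</sub>(*t*, *w*<sub>*c*</sub>) reaches its minimum at *d* = 0, that is, when the phase transition lies at the center of the window. If φ<sub>*k*+1</sub> = φ<sub>*k*</sub>, it remains constant at *A*<sup>2</sup>*T*<sup>2</sup>. The minimum value, given in Equation (20), vanishes only for a phase jump of π. Moreover, the energy returns to *A*<sup>2</sup>*T*<sup>2</sup> as soon as the window lies entirely within the next symbol. This differs markedly from the MFSK signal, whose energy at *w*<sub>*k*</sub> drops to 0 and remains there for as long as the carrier stays at the new frequency.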

$$\begin{array}{l} \text{SPEC}\_{s\_{\text{PSK}}}(t, w\_c)\_{\min} = \frac{A^2 T^2}{2} (1 + \cos(\phi\_k - \phi\_{k+1})) \geq 0, \\ (k+1)T\_b - T/2 < t < (k+1)T\_b + T/2, k = 0, 1, 2, \dots \end{array} \tag{20}$$

Hence, an MPSK signal appears in the time-frequency image as a single wide frequency band, with brightness fluctuations confined to short intervals around symbol transitions, which differs from MFSK. The derivation also shows that the MPSK time-frequency characteristics depend little on *M*, so PSK signals with different *M* are hard to distinguish in this representation. Figure 1 presents different modulation signals in the wideband.

**Figure 1.** Different modulation signals in the wideband.
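The sketch below numerically checks Equation (18). It works at baseband (*w*<sub>*c*</sub> = 0, *A* = 1), so the carrier corresponds to DFT bin 0 and |STFT|² at the carrier is the squared magnitude of the window sum. Symbol *k* has phase 0 and symbol *k*+1 has phase `dphi`. The function names and parameter values are illustrative assumptions.

```python
import cmath
import math


def energy_at_carrier(T, Tb, dphi):
    """(d, |STFT|^2 at the carrier) as a rectangular window crosses one phase jump.

    Hypothetical helper working at baseband, so the carrier is DFT bin 0.
    d = (samples of symbol k inside the window) - T/2.
    """
    x = [1 + 0j] * Tb + [cmath.exp(1j * dphi)] * Tb
    out = []
    for start in range(Tb - T, Tb + 1):
        m = Tb - start                        # samples of symbol k in the window
        energy = abs(sum(x[start:start + T])) ** 2
        out.append((m - T / 2, energy))
    return out


def eq18(T, d, dphi):
    """Right-hand side of Eq. (18) with A = 1."""
    return T ** 2 / 2 * (1 + math.cos(dphi)) + 2 * d ** 2 * (1 - math.cos(dphi))
```

With `dphi = math.pi` (a BPSK-style phase reversal), the minimum at *d* = 0 is 0. For a π/2 jump it is *T*²/2. In both cases the energy recovers to *T*² once the window leaves the transition.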

### *2.3. Signal Eye Diagram and Vector Diagram Description*

The eye diagram is a tool for observing the baseband signal waveform on an oscilloscope. It can guide the adjustment of the receiver filter to improve system performance. Moreover, because of the characteristics of the modulated signals themselves, different modulation formats show clear visual differences in the eye diagram. As shown in Figure 2, formats with different modulation orders produce different numbers of eyes. For OQPSK, the in-phase and quadrature components are offset by half a symbol period, so their eye openings are staggered; for the other formats, the eye openings occur at the same time.

**Figure 2.** The eye diagrams and vector diagrams of different modulation signals at 15 dB: (**a**) BPSK; (**b**) QPSK; (**c**) OQPSK; (**d**) 8PSK; (**e**) 16QAM; (**f**) 16APSK; (**g**) 32APSK; (**h**) 64QAM.
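Below is a minimal sketch of how eye-diagram traces can be formed. The real (in-phase) baseband waveform is cut into segments spanning `span` symbols, starting one symbol apart, and all segments are overlaid on a common time axis. The function name, the default two-symbol span, and starting the segments at sample 0 are assumptions. In practice, the segments are usually aligned so that the eye opening is centered.

```python
def eye_traces(waveform, sps, span=2):
    """Overlaid eye-diagram traces of span*sps samples, one per symbol (hypothetical helper)."""
    seg = span * sps
    return [waveform[s:s + seg] for s in range(0, len(waveform) - seg + 1, sps)]
```

Plotting every returned trace against the sample index 0 to `span*sps - 1` produces the familiar overlaid eye pattern.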

By reconstructing the signal IQ waveforms over the corresponding time interval, the signal vector diagram shows the symbol trajectory. In how it is formed, it is similar to the signal constellation diagram. However, unlike the constellation diagram, the vector diagram also reflects the phase transitions between symbols. For example, it easily distinguishes QPSK from OQPSK with the same initial phase, because OQPSK has no 180° phase transitions, whereas QPSK does. Meanwhile, compared with the constellation diagram, the vector diagram is easier to obtain and requires less prior information.
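The sketch below illustrates the QPSK/OQPSK distinction using rectangular pulses and ±1 bit streams. Delaying the quadrature stream by half a symbol (OQPSK) prevents I and Q from switching at the same instant, so no 180° transitions appear in the trajectory. The function names and the rectangular-pulse model are illustrative assumptions.

```python
def iq_waveforms(bits_i, bits_q, sps, offset):
    """Rectangular-pulse I/Q waveforms; Q is delayed by `offset` samples (hypothetical helper).

    Use offset=0 for QPSK and offset=sps // 2 for OQPSK.
    """
    i_wave = [b for b in bits_i for _ in range(sps)]
    q_wave = [b for b in bits_q for _ in range(sps)]
    # delay Q by `offset` samples, holding its first value during the delay
    q_wave = [q_wave[0]] * offset + q_wave[:len(q_wave) - offset]
    return i_wave, q_wave


def count_180_transitions(i_wave, q_wave):
    """Count sample steps where I and Q flip sign together (a 180-degree phase jump)."""
    return sum(1 for n in range(1, len(i_wave))
               if i_wave[n] != i_wave[n - 1] and q_wave[n] != q_wave[n - 1])
```

With `bits_i = [1, -1, 1, -1]`, `bits_q = [1, -1, -1, 1]`, and `sps = 4`, QPSK has two 180° transitions, whereas the OQPSK version (offset 2) has none. In a vector diagram with band-limited pulses, those QPSK transitions appear as trajectories passing through the origin.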

### *2.4. The Generation Processing of the Dataset*

Figure 3 shows how our dataset is constructed and annotated. To increase sample diversity, we introduce sampling phase offset, frequency offset, phase offset, and amplitude attenuation during sample generation.

**Figure 3.** The generation processing of the dataset. (**a**) dataset for signal detection; (**b**) dataset for modulation recognition.
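The sketch below shows how such impairments might be applied to complex baseband samples. The paper does not state how each impairment is implemented, so the following models are assumptions: the sampling phase offset is a fractional delay by linear interpolation, and the frequency offset is expressed in cycles per sample. The function and parameter names are hypothetical.

```python
import cmath
import math


def impair(x, freq_offset, phase_offset, gain, frac_delay):
    """Apply sampling-phase, frequency and phase offsets plus attenuation (hypothetical helper).

    freq_offset: cycles/sample; phase_offset: radians; gain: amplitude scale (< 1 attenuates);
    frac_delay: sampling phase offset in [0, 1) samples, via linear interpolation.
    """
    # fractional delay: y[n] = (1 - mu) x[n] + mu x[n-1], with x[-1] taken as x[0]
    delayed = [(1 - frac_delay) * x[n] + frac_delay * x[max(n - 1, 0)]
               for n in range(len(x))]
    # carrier frequency/phase offset and amplitude attenuation
    return [gain * v * cmath.exp(1j * (2 * math.pi * freq_offset * n + phase_offset))
            for n, v in enumerate(delayed)]
```

Drawing `freq_offset`, `phase_offset`, `gain`, and `frac_delay` at random for each sample is one way to obtain the diversity described above.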

For signal detection, we first determine the reconnaissance frequency range and set the number of signals in the wideband. Each signal is given a different frequency offset so that the signals do not overlap in the frequency domain. We then compute the STFT of the wideband signal. For each signal, we record not only its modulation format but also its start-stop time, carrier frequency, and bandwidth. These quantities are then converted into coordinates on the time-frequency image, which serve as the label information for the network.
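The sketch below converts one signal's annotation (start-stop time, carrier frequency, bandwidth) into a bounding box on the time-frequency image. It assumes time runs along the horizontal axis, frequency increases upward, and the pixel origin is at the top-left. The function name and the (x_min, y_min, x_max, y_max) box format are also assumptions.

```python
def tf_box(t_start, t_stop, fc, bw, duration, f_min, f_max, width, height):
    """Map a signal's time/frequency extent to pixel coordinates (hypothetical helper).

    Times are measured from the start of the capture (0 .. duration); frequencies lie in
    [f_min, f_max]. Returns (x_min, y_min, x_max, y_max) with y = 0 at f_max.
    """
    x_min = t_start / duration * width
    x_max = t_stop / duration * width
    f_span = f_max - f_min
    y_min = (f_max - (fc + bw / 2)) / f_span * height   # top edge: highest frequency
    y_max = (f_max - (fc - bw / 2)) / f_span * height   # bottom edge: lowest frequency
    return x_min, y_min, x_max, y_max
```

For example, on a 512 × 512 image covering 1 s and 0 to 100 Hz, a signal from 0.25 s to 0.75 s centered at 50 Hz with 20 Hz bandwidth maps to the box (128, 204.8, 384, 307.2).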

For modulation recognition, traditional eye diagrams and vector diagrams are binary images, which ignore how densely the signal aggregates at each location. We therefore take this aggregation degree into account and enhance the traditional eye diagram and vector diagram; Figure 4 presents the enhancement process. In this initial study, because CNNs are insensitive to edge information, the signal amplitude is first normalized and then quantized into 128 levels over [−1.05, 1.05]. These parameter settings were determined experimentally. For example, we choose 800 symbols, with 4 symbols as a waveform group, to generate the eye diagram and the vector diagram; the related experiments are described in detail in later sections. Moreover, we apply the following operations to make the image details more prominent, where **Im**<sub>0</sub> is the original image, **Im**<sub>1</sub>, **Im**<sub>2</sub>, and **Im**<sub>3</sub> are the channels of the enhanced image, and α and β are scaling factors.

$$\begin{aligned} \mathbf{Im}\_1 &= \text{uint8} \Big( \frac{\mathbf{Im}\_0 - \min(\mathbf{Im}\_0)}{\max(\mathbf{Im}\_0) - \min(\mathbf{Im}\_0)} \times 255 \Big), \\ \mathbf{Im}\_2 &= \text{uint8} (\alpha \times \log\_2 (\mathbf{Im}\_1 + 1)), \\ \mathbf{Im}\_3 &= \text{uint8} (\exp(\mathbf{Im}\_1/\beta)) \end{aligned} \tag{21}$$

**Figure 4.** The enhancement processing of the dataset.
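The sketch below builds an aggregation-degree image and applies the enhancement in Equation (21). Each (I, Q) sample of the vector diagram is normalized by the peak absolute amplitude, quantized into one of 128 bins over [−1.05, 1.05], and counted, so pixel values grow with how often the trajectory visits a location. Equation (21) then produces the three channels. Several details are assumptions rather than settings from the paper: the peak normalization, the MATLAB-style `uint8` emulation (round half up, saturate to the range 0 to 255), and the values of α and β, which are chosen so that 255 maps to about 255.

```python
import math


def to_uint8(v):
    """Emulate uint8 conversion: round half up and saturate to [0, 255]."""
    return min(255, max(0, int(v + 0.5)))


def aggregation_image(i_samples, q_samples, bins=128, limit=1.05):
    """Count normalized (I, Q) samples per pixel; row 0 corresponds to Q = +limit."""
    peak = max(max(abs(v) for v in i_samples), max(abs(v) for v in q_samples))
    img = [[0] * bins for _ in range(bins)]
    for i_val, q_val in zip(i_samples, q_samples):
        col = int((i_val / peak + limit) / (2 * limit) * bins)
        row = int((limit - q_val / peak) / (2 * limit) * bins)
        img[min(max(row, 0), bins - 1)][min(max(col, 0), bins - 1)] += 1
    return img


def enhance(im0, alpha=255 / 8, beta=255 / math.log(255)):
    """Apply Eq. (21): min-max scaling, log2 stretch and exp stretch (hypothetical helper)."""
    flat = [v for row in im0 for v in row]
    lo, hi = min(flat), max(flat)
    scale = 255 / (hi - lo) if hi > lo else 0.0
    im1 = [[to_uint8((v - lo) * scale) for v in row] for row in im0]
    im2 = [[to_uint8(alpha * math.log2(v + 1)) for v in row] for row in im1]
    im3 = [[to_uint8(math.exp(v / beta)) for v in row] for row in im1]
    return im1, im2, im3
```

The log2 channel brightens sparsely visited pixels. The exp channel suppresses them and emphasizes the densest regions, so the three channels together expose both the trajectory shape and its aggregation degree.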
