### *3.1. Compressed Value of HOC*

CS techniques perform successfully when applied to so-called compressible and/or K-sparse signals, i.e., signals that can be represented by $K \ll N$ significant coefficients over an N-dimensional basis [21,22]. A K-sparse signal $\mathbf{s}(t) \in \mathbb{R}^N$ of dimension N is compressed by computing a measurement vector $\mathbf{y}(t) \in \mathbb{R}^M$ that consists of $M \ll N$ linear projections of the vector $\mathbf{s}(t)$. The compressed sampling rate is $f_{cs} = (M/N)f_s$, where $f_s$ is the traditional (Nyquist) sampling rate and $\delta = M/N \in (0, 1)$ is called the compression ratio. The linear compressed sampling process can be described as:

$$\mathbf{y} = \Phi\mathbf{s} + \mathbf{n} \tag{15}$$

where Φ represents an *M* × *N* matrix, usually over the field of real numbers. Note that the measurement matrix Φ is a random matrix satisfying the restricted isometry property (RIP) [23], and it can take various forms, such as a Gaussian matrix [24] or a local Hadamard matrix [25].
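
As a minimal numerical sketch of Equation (15), assuming a Gaussian measurement matrix and the illustrative sizes N = 256, M = 64 chosen here, the compressed sampling process can be simulated as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 256, 64          # delta = M/N = 0.25

# K-sparse test signal: K nonzero coefficients in an N-dimensional basis
K = 8
s = np.zeros(N)
s[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)

# Gaussian measurement matrix, which satisfies the RIP with high
# probability; the 1/sqrt(M) scaling keeps column norms near 1
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Equation (15): y = Phi s + n
y = Phi @ s + 0.01 * rng.standard_normal(M)
print(y.shape)          # (64,)
```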

According to the theory of feature extraction and compressive sensing, the compressed sampling process of the element-wise squared signal can be defined as:

$$[\mathbf{x}]^2 = \Phi[\mathbf{s}]^2 \tag{16}$$

where $[\cdot]^2$ denotes the element-wise product of a vector with itself, i.e., the element-wise square. Starting from Equation (15), CS can thus be used to simplify the reconstruction.

First, we define a relationship between the autocorrelation matrices $\mathbf{R}_{[\mathbf{x}]^2}$ and $\mathbf{R}_{[\mathbf{s}]^2}$:

$$\mathbf{R}_{[\mathbf{x}]^2} = [\mathbf{x}]^2\left([\mathbf{x}]^2\right)^T = \left(\Phi[\mathbf{s}]^2\right)\left(\Phi[\mathbf{s}]^2\right)^T = \Phi\mathbf{R}_{[\mathbf{s}]^2}\Phi^T \tag{17}$$
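
Equation (17) follows directly from Equation (16); a short numerical check, with illustrative sizes chosen here, is:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 32, 8
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

s2 = rng.standard_normal(N) ** 2   # [s]^2, element-wise square
x2 = Phi @ s2                      # Equation (16): [x]^2 = Phi [s]^2

R_x2 = np.outer(x2, x2)            # R_[x]^2 = [x]^2 ([x]^2)^T
R_s2 = np.outer(s2, s2)
print(np.allclose(R_x2, Phi @ R_s2 @ Phi.T))   # True, Equation (17)
```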

Second, we obtain a relationship between $[\mathbf{x}]^2$ and $\mathbf{R}_{[\mathbf{x}]^2}$:

$$[\mathbf{x}]^2 = \mathbf{P}_{[\mathbf{x}]^2}\,\mathrm{vec}(\mathbf{R}_{[\mathbf{x}]^2}) \tag{18}$$

where $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$ ($\otimes$ denotes the Kronecker product) and $\mathbf{P}_{[\mathbf{x}]^2} \in \{0, 1\}^{n \times n^2}$ is a selection matrix that maps the entries of $\mathrm{vec}(\mathbf{R}_{[\mathbf{x}]^2})$ to their counterparts in $[\mathbf{x}]^2$ [26].
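
The Kronecker identity $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$ holds for the column-major vectorization convention; the following sketch verifies it numerically (the 3 × 3 sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A, X, B = (rng.standard_normal((3, 3)) for _ in range(3))

# Column-major vectorization matches vec(AXB) = (B^T kron A) vec(X)
vec = lambda matrix: matrix.flatten(order="F")

print(np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X)))   # True
```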

According to Equation (4), the compressed value of the fourth-order cumulant can be defined as:

$$\mathbf{C}_{(4,0)\delta} = F[\mathbf{s}]^4 - 3\left[F[\mathbf{s}]^2\right]^2 \tag{19}$$

where $F = \frac{1}{N}\left[\exp(-j2\pi\alpha n/N)\right]_{(\alpha, n)} \in \mathbb{C}^{M_\alpha \times N}$ is the discrete Fourier transform (DFT) matrix.
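
A direct sketch of Equation (19), assuming for illustration that all N cyclic-frequency bins are retained ($M_\alpha = N$) and using element-wise powers of s:

```python
import numpy as np

N = 256
rng = np.random.default_rng(4)
s = rng.standard_normal(N)

# DFT-style matrix F of Equation (19); keeping all N cyclic-frequency
# bins (M_alpha = N) is an illustrative assumption
n = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N) / N

# Equation (19): C_(4,0)delta = F [s]^4 - 3 (F [s]^2)^2, powers element-wise
C40 = F @ s**4 - 3 * (F @ s**2) ** 2
print(C40.shape)   # (256,)
```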

Therefore, the linear representation of the vector-form $\mathbf{C}_{(4,0)\delta}$ is given by:

$$\begin{aligned} \mathbf{C}_{(4,0)\delta} &= F P_s\,\mathrm{vec}(\mathbf{R}_{[\mathbf{s}]^2}) - 3P_{Fs}\,\mathrm{vec}(\mathbf{R}_{F[\mathbf{s}]^2}) = F P_s\,\mathrm{vec}(\mathbf{R}_{[\mathbf{s}]^2}) - 3P_{Fs}\,\mathrm{vec}(F\mathbf{R}_{[\mathbf{s}]^2}F^T) \\ &= \left[F P_s - 3P_{Fs}(F \otimes F)\right]\mathrm{vec}(\mathbf{R}_{[\mathbf{s}]^2}) = \left[F P_s - 3P_{Fs}(F \otimes F)\right]\left[\mathbf{P}_{[\mathbf{x}]^2}(\Phi \otimes \Phi)\right]^{\dagger}[\mathbf{x}]^2 \end{aligned} \tag{20}$$

where $(\cdot)^{\dagger}$ stands for the pseudo-inverse operation.

Finally, by deriving the linear compressed sampling process of the fourth-order cumulant, we can obtain the compressed value of the eighth-order cumulant as follows:

$$\mathbf{C}_{(8,0)\delta} = \left\{ F P_s\left[P_s - 28P_s^{1/2}P_{Fs}^{1/2}(F \otimes F)^{1/2} - 35F P_s + 420P_{Fs}(F \otimes F)\right] - 630P_{Fs}^2(F \otimes F)^2 \right\}\left[\mathbf{P}_{[\mathbf{x}]^2}(\Phi \otimes \Phi)\right]^{\dagger}[\mathbf{x}]^2 \tag{21}$$

Therefore, the first and second characteristic parameters after CS are $T_1 = \mathbf{C}_{(8,0)\delta}/\mathbf{C}_{(4,0)\delta}$ and $T_2 = \mathbf{C}_{d(8,0)\delta}/\mathbf{C}_{d(4,0)\delta}^2$.

### *3.2. Compressed Value of Cyclic Spectrum*

It can be seen from Equations (10) and (16) that there is no direct linear relationship between the compressed time-domain samples $\mathbf{x}$ and the cyclic spectrum $S_x^{\alpha}$, so existing reconstruction algorithms cannot be applied directly to cyclic spectrum estimation. It is necessary to exploit explicit linear relations between transformations of the second-order statistics, and thereby indirectly establish a linear relationship between the two, so that existing reconstruction algorithms can be used to complete the estimation of the cyclic spectrum [27].

In order to obtain the compressed values of the cyclic spectrum, we first need the relationship between the cyclic spectrum matrix $\mathbf{S}_{s\delta}$ and the cyclic autocorrelation matrix $\mathbf{R}_{s\delta}(u, v)$. Let $\mathbf{R}_{s\delta}(u, v)$ denote the entries of the time-varying covariance matrix $\mathbf{R}$. When $\mathbf{x}$ is real-valued, $\mathbf{R}$ is a symmetric positive semi-definite matrix. For convenience of calculation, we convert it into an auxiliary covariance matrix:

$$\mathbf{R} = \begin{bmatrix} \mathbf{R}_{s\delta}(0,0) & \mathbf{R}_{s\delta}(0,1) & \mathbf{R}_{s\delta}(0,2) & \cdots & \mathbf{R}_{s\delta}(0,N-1) \\ \mathbf{R}_{s\delta}(1,0) & \mathbf{R}_{s\delta}(1,1) & \mathbf{R}_{s\delta}(1,2) & \cdots & 0 \\ \mathbf{R}_{s\delta}(2,0) & \mathbf{R}_{s\delta}(2,1) & \mathbf{R}_{s\delta}(2,2) & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{R}_{s\delta}(N-1,0) & 0 & 0 & \cdots & 0 \end{bmatrix} \tag{22}$$


where N is the number of sampling points.

The matrix $\mathbf{R}$ contains all the elements of the vector $\mathbf{R}_{s\delta}$ apart from the zero elements. The relationship between $\mathbf{R}_{s\delta}$ and $\mathbf{R}$ can be expressed as:

$$\mathrm{vec}\{\mathbf{R}\} = \mathbf{H}_N\mathbf{R}_{s\delta} \tag{23}$$

where $\mathbf{H}_N \in \{0, 1\}^{N^2 \times (N(N+1)/2)}$, $\mathrm{vec}\{\cdot\}$ denotes the vectorization operation, and $\mathbf{H}_N$ maps the vector $\mathbf{R}_{s\delta}$ to $\mathrm{vec}\{\mathbf{R}\}$.

The relationship between the cyclic spectrum matrix and the cyclic autocorrelation matrix is then given by:

$$\begin{aligned} \mathbf{R}_{s\delta} &= \sum_{v=0}^{N-1}\mathbf{G}_v\mathbf{R}\mathbf{D}_v \\ \mathbf{S}_{s\delta} &= \mathbf{R}_{s\delta}\mathbf{F} \end{aligned} \tag{24}$$

where $\mathbf{G}_v = \left[\frac{1}{N}\exp\left(-j\frac{2\pi}{N}a\left(n + \frac{v}{2}\right)\right)\right]_{(a,n)} \in \mathbb{C}^{N \times N}$, $\mathbf{F} = \left[\exp(-j2\pi vb/N)\right]_{(v,b)}$ is the N-point DFT matrix, and $\mathbf{D}_v$ is an $N \times N$ matrix with only its $(v, v)$th diagonal element being 1 and all other elements being 0. In addition, $n$ and $v$ are time delays, $a, b \in [0, N-1]$ denote the digital cyclic frequency, and $\alpha = f_s a/N$ stands for the cyclic frequency.
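
The matrices of Equation (24) can be assembled as follows; this is a schematic sketch in which the covariance matrix R is a random stand-in rather than the structured matrix of Equation (22):

```python
import numpy as np

N = 64
a = np.arange(N)[:, None]   # digital cyclic frequency index, rows
n = np.arange(N)[None, :]   # time index, columns

def G(v):
    # G_v = (1/N) exp(-j (2*pi/N) a (n + v/2)), indexed by (a, n)
    return np.exp(-2j * np.pi * a * (n + v / 2) / N) / N

def D(v):
    # N x N matrix whose only nonzero entry is a 1 at (v, v)
    Dv = np.zeros((N, N))
    Dv[v, v] = 1.0
    return Dv

# N-point DFT matrix F = [exp(-j 2*pi v b / N)]
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)

# Equation (24): R_sdelta = sum_v G_v R D_v, then S_sdelta = R_sdelta F
R = np.cov(np.random.default_rng(5).standard_normal((N, 200)))  # stand-in for Eq. (22)
R_sd = sum(G(v) @ R @ D(v) for v in range(N))
S_sd = R_sd @ F
```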

The time-varying covariance matrix $\mathbf{R}_x = E\{\mathbf{x}_m\mathbf{x}_m^T\}$ of the compressed value $\mathbf{x}$ is also a symmetric positive semi-definite matrix, which can be rearranged into a vector $\mathbf{R}_{x\delta}$ of length $M(M+1)/2$:

$$\mathbf{R}_{x\delta} = \left[\mathbf{R}_{x\delta}(0,0), \mathbf{R}_{x\delta}(1,0), \cdots, \mathbf{R}_{x\delta}(M-1,0), \mathbf{R}_{x\delta}(0,1), \mathbf{R}_{x\delta}(1,1), \cdots, \mathbf{R}_{x\delta}(M-2,1), \cdots, \mathbf{R}_{x\delta}(0,M-1)\right]^T \tag{25}$$

Through linear transformation, we can define two projection matrices $\mathbf{P}_m \in \{0,1\}^{N^2 \times (N(N+1)/2)}$ and $\mathbf{Q}_m \in \{0, 1/2, 1\}^{(M(M+1)/2) \times M^2}$ that map the entries of $\mathbf{s}$ and $\mathbf{x}$ to those in $\mathrm{vec}(\mathbf{R}_s)$ and $\mathrm{vec}(\mathbf{R}_x)$; it can be shown that:

$$\begin{aligned} \mathrm{vec}\{\mathbf{R}_s\} &= \mathbf{P}_m\mathbf{s} \\ \mathbf{x} &= \mathbf{Q}_m\,\mathrm{vec}\{\mathbf{R}_x\} \end{aligned} \tag{26}$$

where **P***<sup>m</sup>* and **Q***<sup>m</sup>* are special mapping matrices.
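
The mapping matrices $\mathbf{P}_m$ and $\mathbf{Q}_m$ of Equation (26) can be realized as duplication and (averaging) elimination matrices; the sketch below, under that assumption, also verifies Equation (27) numerically for small illustrative sizes:

```python
import numpy as np

def duplication(n):
    # P in {0,1}^(n^2 x n(n+1)/2): vec(R) = P r for symmetric R, where r
    # stacks the lower-triangular entries column by column as in Eq. (25)
    P = np.zeros((n * n, n * (n + 1) // 2))
    k = 0
    for j in range(n):
        for i in range(j, n):
            P[i + j * n, k] = 1.0
            P[j + i * n, k] = 1.0
            k += 1
    return P

def elimination(n):
    # Q in {0,1/2,1}^(n(n+1)/2 x n^2): r = Q vec(R), averaging the two
    # symmetric copies of each off-diagonal entry
    Q = np.zeros((n * (n + 1) // 2, n * n))
    k = 0
    for j in range(n):
        for i in range(j, n):
            if i == j:
                Q[k, i + j * n] = 1.0
            else:
                Q[k, i + j * n] = 0.5
                Q[k, j + i * n] = 0.5
            k += 1
    return Q

# Numerical check of Equation (27) for small illustrative sizes
rng = np.random.default_rng(6)
N, M = 8, 4
Phi = rng.standard_normal((M, N))
R_s = np.cov(rng.standard_normal((N, 50)))        # symmetric PSD stand-in

Theta = elimination(M) @ np.kron(Phi, Phi) @ duplication(N)   # Eq. (27)
s_half = elimination(N) @ R_s.flatten(order="F")  # half-vectorized R_s
x_half = Theta @ s_half

R_x = Phi @ R_s @ Phi.T
print(np.allclose(x_half, elimination(M) @ R_x.flatten(order="F")))  # True
```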

Since $\mathbf{x}(t) = \Phi\mathbf{s}(t)$, we can obtain:

$$\mathbf{x} = \mathbf{Q}\_m \text{vec}(\mathbf{R}\_x) = \mathbf{Q}\_m(\Phi \otimes \Phi) \text{vec}(\mathbf{R}\_s) = \mathbf{Q}\_m(\Phi \otimes \Phi)\mathbf{P}\_m\mathbf{s} = \Theta \mathbf{s} \tag{27}$$

where $\Theta = \mathbf{Q}_m(\Phi \otimes \Phi)\mathbf{P}_m \in \mathbb{R}^{\frac{M(M+1)}{2} \times \frac{N(N+1)}{2}}$.

Following Equation (24), we can obtain $\mathrm{vec}(\mathbf{R}_{s\delta})$ as:

$$\mathrm{vec}(\mathbf{R}_{s\delta}) = \sum_{v=0}^{N-1}\left(\mathbf{G}_v^T \otimes \mathbf{D}_v\right)\mathrm{vec}(\mathbf{R}) = \Omega\mathbf{s} \tag{28}$$

where $\Omega = \sum_{v=0}^{N-1}\left(\mathbf{G}_v^T \otimes \mathbf{D}_v\right)\mathbf{P}_m \in \mathbb{C}^{N^2 \times (N(N+1)/2)}$.

Through Equations (24), (27) and (28), we can express the vector-form cyclic spectrum $\mathbf{S}_{s\delta}$ as a linear function of the measurement vector $\mathbf{x}$:

$$\mathbf{S}_{s\delta} = \Xi\Omega\Theta^{\dagger}\mathbf{x} \tag{29}$$

where $\Xi = \left(\mathbf{F}^{-T} \otimes \mathbf{I}_N\right)^{-1}$ and $\mathbf{I}_N$ is the $N$-dimensional identity matrix. Therefore, the third characteristic parameter after CS is $T_3 = \max(\mathbf{S}_{s\delta})$.
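
Given the matrices Ξ, Ω and Θ constructed as above, the third feature of Equation (29) reduces to one pseudo-inverse and two matrix products; a schematic helper (the function name and the use of the magnitude maximum are our assumptions):

```python
import numpy as np

def t3_feature(x, Xi, Omega, Theta):
    """Third CS feature of Equation (29): T3 = max(S_sdelta).

    Xi, Omega and Theta are the precomputed matrices of Equations
    (27)-(29); taking the magnitude before the maximum is our
    assumption for the complex-valued spectrum.
    """
    S_sd = Xi @ Omega @ np.linalg.pinv(Theta) @ x   # Equation (29)
    return np.max(np.abs(S_sd))
```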

### **4. The Structural Process of Decision Tree–Support Vector Machine Classifier**

### *4.1. The Principle of Support Vector Machine*

Support vector machine (SVM) is based on the principle of structural risk minimization [28–30]. Its final solution can be transformed into a quadratic convex programming problem with linear constraints. There is no local minimum problem. By introducing the kernel function, the linear SVM can be simply extended to the non-linear SVM, and there is almost no additional computation for high-dimensional samples.

The main idea can be seen from Figure 1: for a linearly separable case, the classification boundary is maximized to seek the optimal hyperplane H, while H1 and H2 are hyperplanes passing through the samples closest to H and parallel to it, and the distance between them is called the classification margin. In the linearly non-separable case, the samples in the low-dimensional input space are transformed into a high-dimensional feature space by a non-linear mapping, so that they become linearly separable and the problem can be solved in the high-dimensional feature space by linear analysis methods.

**Figure 1.** Linear (**a**) and non-linear (**b**) classification of support vector machine (SVM).

Suppose that the training set is $\{(x_i, y_i)\}$, $i = 1, 2, \ldots, L$ with expected outputs $y_i \in \{+1, -1\}$, where +1 and −1 represent the two classes respectively. If $x_i \in R^n$ belongs to the first class, the corresponding output is $y_i = +1$; if it belongs to the second class, $y_i = -1$. Linear separability of the problem means that there is a hyperplane $(w \cdot x) + b = 0$ that places the positive and negative training points on opposite sides. When the training set is not completely linearly separable, we can introduce slack variables $\xi_i \geq 0$, $i = 1, 2, \ldots, L$; the objective function is then transformed into:

$$\begin{aligned} \min_{w,b,\xi}\ & \phi(w) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{L}\xi_i \\ \text{s.t.}\ & y_i\left[(w \cdot x_i) + b\right] \geq 1 - \xi_i \\ & \xi_i \geq 0, \quad i = 1, 2, \ldots, L \end{aligned} \tag{30}$$

where $C \geq 0$ is the penalty parameter. A larger $C$ indicates a larger penalty for misclassification, and $C$ is the only adjustable parameter in the algorithm. By choosing a proper kernel function $K(x_i, x_j)$ and using the Lagrange multiplier method to solve Equation (30), the corresponding dual problem is obtained as follows:

$$\begin{aligned} \min_{\alpha}\ & \frac{1}{2}\sum_{i=1}^{L}\sum_{j=1}^{L}y_iy_j\alpha_i\alpha_jK(x_i, x_j) - \sum_{i=1}^{L}\alpha_i \\ \text{s.t.}\ & \sum_{i=1}^{L}y_i\alpha_i = 0 \\ & 0 \leq \alpha_i \leq C, \quad i = 1, 2, \ldots, L \end{aligned} \tag{31}$$

Solving Equation (31) yields the optimal solution $\alpha^* = (\alpha_1^*, \alpha_2^*, \ldots, \alpha_L^*)$; selecting a component $0 < \alpha_j^* < C$ of $\alpha^*$, we calculate $b^* = y_j - \sum_{i=1}^{L}y_i\alpha_i^*K(x_i, x_j)$ accordingly. Finally, the decision function $f(x) = \mathrm{sgn}\left(\sum_{i=1}^{L}y_i\alpha_i^*K(x_i, x) + b^*\right)$ is obtained.
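
For a concrete illustration of the soft-margin SVM of Equations (30) and (31), a library solver can be used; the sketch below uses scikit-learn's SVC with an RBF kernel on synthetic two-class data (the data and parameter values are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(7)
# Toy two-class training set {(x_i, y_i)} standing in for feature vectors
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

# Soft-margin SVM of Equations (30)-(31): C is the penalty parameter,
# the RBF kernel plays the role of K(x_i, x_j)
clf = SVC(C=10.0, kernel="rbf", gamma="scale").fit(X, y)
print(clf.predict([[0.0, 0.0], [3.0, 3.0]]))  # [-1.  1.]
```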

### *4.2. The Structure of Decision Tree–Support Vector Machine Classifier*

Through the above calculation and analysis of the compressed values of the feature parameters based on CS, the feature parameters are input into the computationally efficient decision tree–support vector machine (DT–SVM) structure to realize signal classification. Adding a support vector machine (SVM) to every node of the decision tree combines the computational efficiency of the decision tree structure with the high classification performance of the SVM to achieve high-precision MC. In particular, for a K-class classification problem, the DT–SVM method only needs to construct K−1 SVM sub-classifiers. The classification process of the decision tree is shown in Figure 2.
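
A minimal sketch of the DT–SVM structure, assuming a fixed splitting order in which each of the K−1 nodes peels off one class; this is our schematic reading of the tree, not the authors' exact node layout:

```python
import numpy as np
from sklearn.svm import SVC

class DTSVM:
    """Schematic DT-SVM: K-1 binary RBF-SVM nodes, each peeling off one
    class in a fixed order (our assumption, not the authors' exact tree)."""

    def __init__(self, order, C=10.0, gamma="scale"):
        self.order = list(order)     # classes in splitting order
        self.params = dict(C=C, gamma=gamma, kernel="rbf")
        self.nodes = []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        for idx, cls in enumerate(self.order[:-1]):   # K-1 nodes for K classes
            mask = np.isin(y, self.order[idx:])       # classes not yet split off
            yn = np.where(y[mask] == cls, 1, -1)
            self.nodes.append(SVC(**self.params).fit(X[mask], yn))
        return self

    def predict(self, X):
        X = np.asarray(X)
        out = np.full(len(X), self.order[-1])
        undecided = np.ones(len(X), dtype=bool)
        for cls, node in zip(self.order[:-1], self.nodes):
            if not undecided.any():
                break
            hit = np.zeros(len(X), dtype=bool)
            hit[undecided] = node.predict(X[undecided]) == 1
            out[hit] = cls        # accepted by this node's positive side
            undecided &= ~hit     # the rest flow to the next node
        return out

# Usage: DTSVM(order=[0, 1, 2, 3, 4, 5]).fit(X_train, y_train).predict(X_test)
```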

**Figure 2.** The identification process of decision-tree (DT) classifier.

Using the three feature parameters to complete the MC, following the process in Figure 2, the specific steps are as follows:


**Figure 3.** The values of feature parameters (**a**) T1 and (**b**) T2 under different signal-to-noise ratios (SNRs).

It can be seen from Figure 3a that the compressed value of the feature parameter T1 tends to be stable as the SNR increases and conforms to the theoretical value. In addition, it is obvious from the figure that T1 can classify the six signals into three categories: (OOK, DPSK), (QPSK, OQPSK) and (16QAM, 64QAM). Similarly, it can be seen from Figure 3b that the differenced feature parameter T2 can distinguish QPSK from OQPSK. The cyclic spectrum and the cross-sectional diagrams of cycle frequency for OOK, DPSK, 16QAM and 64QAM are shown in Figure 4. It can be seen from Figure 4 that the maximum value of the cyclic spectrum differs between signals, so the remaining signals can be distinguished by T3.


**Figure 4.** Cyclic spectrum and cross-sectional diagrams of (**a**) OOK (on-off keying), (**b**) DPSK (differential phase shift keying), (**c**) 16QAM (16 quadrature amplitude modulation) and (**d**) 64QAM (64 quadrature amplitude modulation).

### **5. Simulation Results and Discussion**

For the signal modulation classification task, the modulation set is {OOK, DPSK, QPSK, OQPSK, 16QAM, 64QAM}. In this simulation, all modulation signals adopt the same modulation parameters: the carrier frequency is 100 kHz, the symbol rate is 40 kbps, and the sampling frequency is 800 kHz. For each kind of modulation signal, the simulation generates 500 feature samples under different SNRs (from −5 dB to 15 dB in 5 dB steps). The K-fold cross-validation (K-CV) method is used to evaluate the generalization ability of the model; it improves data utilization and alleviates overfitting to a certain extent, supporting model selection. In this paper, K is chosen as 10. The basic principle of K-CV is to divide the training data set into K equal subsets; each time, K−1 subsets are used as training data and the remaining subset as test data. This is repeated K times, the expected generalization error is estimated from the average mean square error (MSE) over the K iterations, and finally a group of optimal parameters is selected.
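
A sketch of the 10-fold cross-validation procedure using scikit-learn (the random feature matrix stands in for the extracted (T1, T2, T3) samples):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(8)
# Random stand-in for the extracted (T1, T2, T3) feature samples,
# 50 per class for six modulation classes
X = rng.standard_normal((300, 3))
y = np.repeat(np.arange(6), 50)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # K = 10
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=cv)
print(scores.mean())   # near chance (~1/6) on this random stand-in data
```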

To evaluate the influence of different symbol lengths on the classification performance, we select symbol lengths N = 512, 1024, 2048 and 4096 for the OOK, QPSK and 16QAM signals (as examples) and analyze the classification accuracy in Figure 5. As can be seen from Figure 5, the classification accuracy gradually increases with symbol length and finally tends to 100%. However, the symbol length affects the classification accuracy mainly at low SNR. Considering the computation cost of the classification process, 2048 is chosen as the symbol length in this paper.

**Figure 5.** Correct classification rate with different symbol lengths for (**a**) OOK, (**b**) QPSK and (**c**) 16QAM under different SNRs.

In order to evaluate the impact of different compression ratios on the classification performance, we analyzed the classification accuracy in Figure 6 with compression ratios of 25%, 37.5%, 50% and 75% for OOK, QPSK and 16QAM signals (as examples) when the symbol length is 2048. As can be seen from Figure 6, the classification accuracy increases slightly with the compression ratio and finally tends to 100%; the increase in compression ratio has little effect on the recognition rate. Therefore, in order to reduce the sampling rate and system complexity as much as possible, 25% is selected as the compression ratio.

**Figure 6.** Correct classification rate with different compression ratios for (**a**) OOK, (**b**) QPSK and (**c**) 16QAM under different SNRs.

In order to optimize the kernel function parameters and improve the classification accuracy, the grid search method is used. Table 5 shows the optimized penalty parameter and kernel parameter of each node of the DT–SVM and the classification accuracy of each sub-SVM under the RBF kernel function. It can be seen from the table that the classification accuracy of the sub-nodes improves with the SNR. When the SNR is 0 dB or above, the average classification accuracy reaches 100%.
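
The grid search over the penalty parameter C and the RBF kernel parameter γ can be sketched as follows (the 2-power grid and stand-in data are our assumptions):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(9)
X = rng.standard_normal((300, 3))     # stand-in (T1, T2, T3) features
y = np.repeat(np.arange(2), 150)      # one binary DT-SVM node

# Grid search over penalty parameter C and RBF kernel parameter gamma
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": 2.0 ** np.arange(-5, 6), "gamma": 2.0 ** np.arange(-5, 6)},
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```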



**Table 5.** The optimization results and identification accuracy of the sub-SVM of each node in DT–SVM.

In order to demonstrate the superiority of this method in recognition accuracy, Table 6 shows the classification accuracy of six different cognitive radio signals using multidimensional HOC, the cyclic spectrum and the DT–SVM classifier with the RBF kernel function. The sizes of the training and testing subsets are 80% and 20% of the whole set of eigenvectors, respectively. The results show that the classification accuracy of the six kinds of signals improves as the SNR increases; when the SNR is 0 dB, the classification accuracy is 100%, which proves that the method retains high recognition accuracy at low SNR. In addition, a new modulation signal can be introduced, which expands the flexibility of the method and shows good compatibility, although it increases the complexity of the algorithm and the classification time of the whole system.

**Table 6.** The classification accuracy of the cognitive radio signals using multidimensional HOC, cyclic spectrum and DT–SVM classifier.


### **6. Conclusions**

We have proposed a method, validated through simulation, to classify cognitive radio signals based on compressed sensing combined with HOC and the cyclic spectrum, which has been shown to perform well in noisy situations. It reconstructs the feature parameters of HOC and the cyclic spectrum directly from sub-Nyquist-rate samples rather than reconstructing the original signal. The simulation results indicate that this method can effectively achieve modulation classification of six kinds of cognitive radio signals with three feature parameters. The proposed method is relatively simple, uses few feature parameters, and these parameters are only slightly affected by noise. By analyzing the effect of symbol length and compression ratio on the classification rate, the classification performance is improved and the classification accuracy reaches 100% when the SNR is 0 dB. This technique utilizes a CS sampling rate much lower than the Nyquist sampling rate together with a noise-insensitive feature extraction algorithm, realizes blind classification without any prior information from the transmitter, and has low computational complexity.

**Author Contributions:** X.S. and Z.Z. conceived and designed the experiments; X.S. and X.T. performed the experiments; X.S. and X.G. analyzed the data; X.S., Z.Z. and X.G. contributed the simulation software and experimental facilities; X.S. and Z.Z. wrote the paper; S.S. participated in the funding acquisition and investigation. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Frontier Science and Technology Innovation Project in the National Key Research and Development Program under Grant No. 2016QY11W2003, Natural Science Foundation of Hunan Province under Grant No. 2018JJ3607, Natural Science Foundation of China under Grant No. 51575517 and National Technology Foundation Project under Grant No. 181GF22006.

**Conflicts of Interest:** The authors declare no conflict of interest.
