**3. Methods**

*3.1. CNN*

CNN is a typical feedforward neural network. It constructs various filters that extract the characteristics of the input data: the input is convolved and pooled through these filters, and the topological features hidden in the data are extracted step by step. As the network layers deepen, the extracted features become increasingly abstract, so they are invariant to translation, scaling, and rotation. Sparse connectivity in a CNN reduces the number of trainable parameters and speeds up convergence; weight sharing effectively avoids overfitting; and downsampling makes full use of the features of the data while reducing its dimensionality, optimizing the network structure [29,30]. CNNs can deal with one-dimensional (1-D) signals and sequences, two-dimensional (2-D) images, and three-dimensional (3-D) videos. In this paper, we apply a CNN to extract features from 1-D sequences.
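To illustrate the parameter savings from sparse connectivity and weight sharing, the following sketch (with hypothetical layer sizes chosen only for illustration) compares a fully connected layer to a 1-D convolutional layer:

```python
# Hypothetical sizes, chosen only to illustrate the comparison.
n_in, n_out = 1000, 1000        # neurons in two consecutive dense layers
k, c_in, c_out = 9, 1, 1        # 1-D kernel width, input/output channels

# Fully connected: every input connects to every output,
# each connection with its own weight, plus one bias per output.
dense_params = n_in * n_out + n_out

# Convolutional: one kernel of width k is shared across all positions
# of the sequence (weight sharing), plus one bias per output map.
conv_params = c_out * (c_in * k + 1)

print(dense_params, conv_params)  # 1001000 vs. 10
```

The convolutional layer needs five orders of magnitude fewer parameters here, which is the source of the faster convergence and reduced overfitting noted above.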

The essential components of a CNN are the convolution operation and the pooling operation. Through the convolution operation, high-level local-region feature representations are extracted with different filter kernels. The convolution process is described as follows:

$$\mathbf{x}_{j}^{l} = f\Big(\sum_{i \in M_{j}} \mathbf{x}_{i}^{l-1} \times \mathbf{k}_{ij}^{l} + \mathbf{B}_{j}^{l}\Big) \tag{4}$$

where $x_j^l$ denotes the $j$th feature map of the $l$th layer, obtained by convolving the $(l-1)$th layer's output $x_i^{l-1}$ with the filters $k_{ij}^l$; $B_j^l$ is the bias of the $j$th feature map; and $i$ ranges over the set of input maps $M_j$. After the convolution operation, $x_j^l$ is processed with an activation function, and the resulting $a_j^l$ is the input of the next layer. The Rectified Linear Unit (ReLU) is widely applied to accelerate the convergence of CNNs; it enables a nonlinear expression of the input signals that enhances the representation ability. It is formalized as follows:

$$a_j^l = \max\{0, x_j^{l}\} \tag{5}$$
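As a concrete sketch of Eqs. (4) and (5), the following NumPy code (our own illustration, not an implementation from the paper) computes one convolutional layer followed by the ReLU activation:

```python
import numpy as np

def conv1d_relu(x_prev, kernels, biases):
    """One convolutional layer per Eqs. (4)-(5): each output feature map
    x_j^l is the sum of valid 1-D convolutions of the previous layer's
    maps x_i^{l-1} with kernels k_ij^l, plus a bias, passed through ReLU."""
    n_in, length = x_prev.shape
    n_out, _, k = kernels.shape            # kernels[j, i] holds k_ij^l
    out_len = length - k + 1               # 'valid' output length
    a = np.zeros((n_out, out_len))
    for j in range(n_out):
        s = np.zeros(out_len)
        for i in range(n_in):              # sum over the input maps M_j
            # np.convolve flips the kernel (true convolution);
            # mode="valid" keeps only fully overlapping positions
            s += np.convolve(x_prev[i], kernels[j, i], mode="valid")
        a[j] = np.maximum(0.0, s + biases[j])   # ReLU activation, Eq. (5)
    return a
```

For example, convolving the sequence `[1, 2, 3, 4]` with a single edge-detecting kernel `[1, 0, -1]` and zero bias yields the feature map `[2, 2]` after ReLU.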

Another key component of a CNN is the pooling operation, which is employed to reduce the dimension of the input data and ensure scale invariance. The obtained features are thus more stable, especially when the data are acquired from a noisy environment. There are three types of pooling operations: maximum, minimum, and average pooling. We give an example using maximum pooling, which is expressed as follows:

$$p_j^l = \max\{q_j^{l-1}(t)\}, \quad t \in [(j-1)w, jw] \tag{6}$$

where $p_j^l$ is the maximum value over the feature map $q_j^{l-1}(t)$ obtained from the $(l-1)$th layer, $t$ indexes the output neurons within the $j$th pooling window, and $w$ is the width of the pooling window. Further details of CNNs can be found in LeCun's paper [11].
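Eq. (6) can be sketched as non-overlapping max pooling in NumPy (an illustrative implementation, not the authors' code):

```python
import numpy as np

def max_pool1d(q, w):
    """Max pooling per Eq. (6): the feature map q from layer l-1 is split
    into non-overlapping windows of width w, and the maximum of each
    window is kept, reducing the length by a factor of w."""
    length = (len(q) // w) * w             # drop any incomplete tail window
    return q[:length].reshape(-1, w).max(axis=1)
```

For example, pooling `[1, 3, 2, 5, 4, 0]` with window width `w = 2` yields `[3, 5, 4]`, halving the feature dimension while retaining the dominant activation in each local region.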
