3.2.1. Convolutional Layer

The proposed model has multiple convolutional layers. A convolutional layer has $H$ different sizes of convolutional kernels. As mentioned in Section 2, in order to ensure the efficiency and effectiveness of classification for two-dimensional time series, the height of the kernels is the same as the number of data points for one day. For convolution kernels of size $H_i$, let $\mathbf{D}^{u} \in \mathbb{R}^{F \times T}$ denote the $u$-th data sample. The corresponding kernel weight $\mathbf{w}^{u}_{j} \in \mathbb{R}^{F \times K}$ is used to extract features from the input data, where $K$ is the kernel length. The feature map $o^{u}_{j,i}$ is calculated by:

$$
o^{u}_{j,i} = f_a\left(\mathbf{w}^{u}_{j} \ast \mathbf{D}^{u} + b^{u}_{j}\right) \tag{10}
$$

where $\ast$ denotes the convolution operation, $b^{u}_{j} \in \mathbb{R}$ is a bias term, and $f_a(\cdot)$ is a nonlinear activation function such as the rectified linear unit (ReLU). Without an activation function, the output of each layer is a linear function of the input of the previous layer. Moreover, it is easy to show that no matter how many convolutional layers are stacked, the output remains a linear combination of the inputs, which is equivalent to a network with no hidden layers. Therefore, nonlinear activation functions are essential to the effectiveness of neural networks.
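To make Eq. (10) concrete, the following is a minimal sketch (not the authors' code; the sizes $F$, $T$, and $K$ are assumed example values) that computes one feature map with a single kernel of height $F$ in PyTorch. Because the kernel spans the full height of $\mathbf{D}^{u}$, the $F$ rows can be treated as input channels of a 1D convolution over the time axis:

```python
import torch
import torch.nn.functional as F_nn

F, T, K = 24, 30, 3          # assumed: data points per day, days, kernel length
D_u = torch.randn(1, F, T)   # one sample D^u, shaped (batch=1, F, T)
w_j = torch.randn(1, F, K)   # one kernel w^u_j in R^{F x K}
b_j = torch.zeros(1)         # bias term b^u_j

# Eq. (10): valid convolution over the time axis, then ReLU as f_a
o_j = torch.relu(F_nn.conv1d(D_u, w_j, bias=b_j))
print(o_j.shape)             # torch.Size([1, 1, 28]) -> T - K + 1 = 28 positions
```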

There are $C$ kernels $\mathbf{w}^{u}_{1}, \mathbf{w}^{u}_{2}, \cdots, \mathbf{w}^{u}_{j}, \cdots, \mathbf{w}^{u}_{C}$ of size $H_i$, producing $C$ feature maps as follows:

$$\mathbf{o}_{i}^{u} = \left[ o_{1,i}^{u},\ o_{2,i}^{u},\ \cdots,\ o_{j,i}^{u},\ \cdots,\ o_{C,i}^{u} \right]^{T} \tag{11}$$

After the first convolution, the feature maps of kernel size $H_i$ are represented by $\mathbf{D}_{i}(N, C, T - K + 1)$, where $N$ is the number of samples.
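As a sanity check on this shape, an illustrative sketch (with assumed values for $N$, $F$, $C$, $T$, and $K$ rather than the paper's configuration) using `torch.nn.Conv1d` with $C$ output channels reproduces $\mathbf{D}_{i}(N, C, T - K + 1)$:

```python
import torch
import torch.nn as nn

N, F, C, T, K = 16, 24, 32, 30, 3   # assumed sample count, kernel height, kernels, days, length
conv = nn.Conv1d(in_channels=F, out_channels=C, kernel_size=K)  # C kernels of size F x K
D = torch.randn(N, F, T)            # N input samples, each F x T

D_i = torch.relu(conv(D))           # stacks the C feature maps of Eq. (11) per sample
print(D_i.shape)                    # torch.Size([16, 32, 28]) == (N, C, T - K + 1)
```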

In order to extract temporal features and compress the amount of data, the feature maps of the first convolutional layer are convolved multiple times; thus, there are multiple convolutional layers in the proposed neural network. It is worth noting that the kernel size of one layer is not necessarily equal to that of the next. For instance, the kernel size applied to $\mathbf{D}_{i}$ in the upper layer is $H_{i1}$, and in the next layer it is $H_{i2}$; $H_{i1}$ and $H_{i2}$ are independent of each other. After passing through these convolutional layers, the feature maps of kernel sizes $\{H_{i1}, H_{i2}, \cdots, H_{iM}\}$ are expressed as:

$$\mathbf{D}_{i}(N, C, T - K_{1} - K_{2} - \cdots - K_{M} + M) \tag{12}$$

where $K_m$ ($m = 1, \cdots, M$) is the kernel length of the $m$-th convolutional layer. Each valid convolution with kernel length $K_m$ shortens the time axis by $K_m - 1$, which yields the length $T - \sum_{m=1}^{M} K_m + M$ in Eq. (12).
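The following sketch (illustrative only; the kernel lengths $K_m$ and other sizes are assumptions) stacks $M$ valid convolutions with independent kernel lengths and confirms the time dimension given by Eq. (12):

```python
import torch
import torch.nn as nn

N, F, C, T = 16, 24, 32, 30          # assumed sizes, as in the previous sketches
Ks = [3, 5, 2]                        # kernel lengths K_1, ..., K_M (here M = 3)

layers = []
in_ch = F
for K_m in Ks:                        # each layer may use a different kernel length
    layers += [nn.Conv1d(in_ch, C, kernel_size=K_m), nn.ReLU()]
    in_ch = C
stack = nn.Sequential(*layers)

out = stack(torch.randn(N, F, T))
print(out.shape)                      # torch.Size([16, 32, 23])
print(T - sum(Ks) + len(Ks))          # 30 - (3+5+2) + 3 = 23, matching Eq. (12)
```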
