*4.2. CNN Feature Extraction*

Unlike typical CNNs, and motivated by [31], we adopt only one pooling layer to reduce the dimension of the extracted features, because the data we use have few dimensions. First, the CNN models multi-scale local features from the raw sample tensor *Reshape*(*x*(*t*)) with three convolution operations at different scales, Conv1\_1, Conv1\_2, and Conv1\_3, whose kernels have shapes 1 × 2, 1 × 3, and 1 × 4, respectively. The convolution outputs are activated by ReLU, as defined in Equation (5). To obtain more robust features, a second convolutional layer (Conv2\_1, Conv2\_2, and Conv2\_3) extracts more abstract feature representations. Finally, the extracted multi-scale local features are processed by one wide convolution layer, Global\_Conv, to obtain global representations. The CNN-extracted features are expressed as Equation (21), where *CNN*(·) denotes the process described in this subsection, and are then flattened for the next step.

$$\text{CNN}_{\text{features}} = \text{CNN}(\text{Reshape}(\mathbf{x}(\mathbf{t}))) \tag{21}$$
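
The data flow behind Equation (21) can be illustrated with a minimal single-channel sketch in plain Python. It is not the implementation used in this work. Several details are assumptions made only for illustration: the three branches run in parallel on the same input and their outputs are concatenated, each Conv2 branch reuses the kernel width of its Conv1 branch, ReLU follows every convolution, Global\_Conv uses a 1 × 3 kernel with full zero padding, and the single pooling layer is a max-pool of size 2 placed after Global\_Conv. The helper names and the kernel weights are hypothetical placeholders.

```python
def relu(row):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in row]


def conv_valid(row, kernel):
    """1 x k convolution (cross-correlation) without padding."""
    k = len(kernel)
    return [sum(row[i + j] * kernel[j] for j in range(k))
            for i in range(len(row) - k + 1)]


def conv_wide(row, kernel):
    """Wide convolution: zero-pad k - 1 values on each side (assumed meaning of 'wide')."""
    pad = [0.0] * (len(kernel) - 1)
    return conv_valid(pad + list(row) + pad, kernel)


def max_pool(row, size=2):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    return [max(row[i:i + size]) for i in range(0, len(row) - size + 1, size)]


def cnn_features(x, kernels1, kernels2, global_kernel, pool_size=2):
    """Sketch of CNN(Reshape(x(t))) for a single 1 x n row x."""
    local = []
    for k1, k2 in zip(kernels1, kernels2):
        h = relu(conv_valid(x, k1))   # Conv1_i + ReLU
        h = relu(conv_valid(h, k2))   # Conv2_i + ReLU (ReLU here is an assumption)
        local.extend(h)               # concatenate branches (assumption)
    g = relu(conv_wide(local, global_kernel))  # Global_Conv (kernel width assumed)
    return max_pool(g, pool_size)     # the single pooling layer (placement assumed)


# Placeholder input and weights; a trained model would learn the kernels.
x = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 1.0, 3.0]
kernels1 = [[0.5] * 2, [0.5] * 3, [0.5] * 4]   # 1 x 2, 1 x 3, 1 x 4
kernels2 = [[1.0] * 2, [1.0] * 3, [1.0] * 4]   # same widths (assumption)
global_kernel = [1.0, -1.0, 1.0]

features = cnn_features(x, kernels1, kernels2, global_kernel)
print(len(features), features)
```

With an input of length 8, the three branches yield 6, 4, and 2 values, so the concatenated local features have length 12; the wide convolution extends this to 14 and pooling halves it to 7. A single-channel row is already flat, so the flattening step is a no-op in this sketch.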
