*2.2. CNN-Based Fault Diagnosis*

Convolutional neural networks are widely used in data-based fault diagnosis applications. According to the type of convolutional kernel operation, they can be divided into 1D-CNNs and 2D-CNNs. The 1D convolutional structure was proposed mainly because conventional recognition networks often require manual feature extraction from the raw signal; one-dimensional convolutional neural networks can instead use the raw data directly as the network input. For example, Eren et al. proposed an adaptive 1D convolution method that extracts features directly from raw time-domain data [21]. An online diagnostic network based on a 1D-CNN was designed for gearboxes in which vibration sensors cannot be used, with the signal collected instead by a rotary encoder [22]. A deep convolutional structure, Deep Inception Net with Atrous Convolution (ACDIN), was designed in [23] based on 1D-CNNs; it improved the feature extraction ability of the network by adding an inception layer and replacing standard 1D convolution with atrous convolution, which led to a significant increase in the network's diagnostic capability. To address the problem of uneven sample distribution in the dataset, Jia et al. proposed a 1D-CNN with normalized weights for one-dimensional input data [24]. Jiang et al. designed a multi-scale signal resolution method using one-dimensional convolution for signal feature extraction, which achieved positive results [25]. Appana et al. proposed extracting features directly from the raw signal with a CNN in the presence of multiple faults and environmental influences [26]. One-dimensional CNNs have excellent environmental adaptability and can effectively resist interference.
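The core idea shared by the 1D-CNN methods above is that a learned convolutional kernel slides directly over the raw time-domain signal, so no hand-crafted features are needed. The following minimal numpy sketch illustrates this with a single convolution, ReLU, and max-pooling stage; the signal, the impulse "fault," and the difference kernel are all illustrative assumptions, not any specific method from the cited works.

```python
import numpy as np

def conv1d(signal, kernel, stride=1):
    """Valid-mode 1D convolution: slide the kernel over the raw signal."""
    out_len = (len(signal) - len(kernel)) // stride + 1
    return np.array([
        np.dot(signal[i * stride : i * stride + len(kernel)], kernel)
        for i in range(out_len)
    ])

def relu(x):
    return np.maximum(x, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; trailing samples that do not fill a window are dropped."""
    n = len(x) // size
    return x[: n * size].reshape(n, size).max(axis=1)

# Hypothetical raw vibration record: a sine carrier plus one impulsive spike
# standing in for a bearing fault signature (purely synthetic).
t = np.linspace(0.0, 1.0, 256)
signal = np.sin(2 * np.pi * 10 * t)
signal[128] += 3.0  # injected impulse

# A simple difference kernel plays the role a learned impulse detector would.
kernel = np.array([1.0, -1.0])
features = max_pool1d(relu(conv1d(signal, kernel)), size=4)
print(features.shape)  # pooled feature map length: (63,)
```

In a trained 1D-CNN the kernel weights are learned from labeled vibration data rather than fixed, and several such stages are stacked before a classifier, but the data path (raw signal in, feature map out) is the same.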

The main issue to be solved when 2D convolutional neural networks are used for fault diagnosis is how to convert the acquired 1D raw signal into 2D data that can be fed into the network. A number of approaches have been proposed to solve this problem. Guo et al. used the residual-processed short-time Fourier transform (STFT) image of the original signal as the input to the CNN [27]. Long et al. proposed a signal-to-image conversion mechanism that transforms the raw time-domain signal into 2D grey images [28], enabling feature extraction from the collected vibration signals in a manner similar to picture recognition. Han proposed a spatiotemporal convolutional neural network (ST-CNN), which extracts spatiotemporal features via the spatiotemporal pattern network (STPN) and then makes a diagnosis based on the CNN [29]. In [30], Yu et al. used a pseudo-color map to represent the data extracted by STFT and then fed the images into a CNN for training and recognition. Sun et al. used the dual-tree complex wavelet transform (DTCWT) to extract features from the raw data, with the DTCWT wavelet sub-bands used as the rows of a matrix to form a 2D signal that was sent to a 2D convolution for processing [31]. Similarly, the Hilbert envelope demodulation spectra (HEDS) of the reconstructed signals in each frequency band were spliced into a 2D signal matrix [32]. Min et al. arranged the time-domain signals row by row to form a 2D input matrix as the network input for diagnostics [33].
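The simplest of the conversion strategies above, arranging time-domain samples row by row into a normalized grey-image matrix, can be sketched in a few lines of numpy. The image size, the synthetic signal, and the [0, 255] scaling below are illustrative assumptions rather than the exact parameters of any cited method.

```python
import numpy as np

def signal_to_gray_image(signal, size):
    """Reshape the first size*size samples of a 1D time-domain signal into a
    size x size matrix, row by row, then min-max normalize to [0, 255] so the
    result can be treated as an 8-bit grey image."""
    segment = np.asarray(signal[: size * size], dtype=float).reshape(size, size)
    lo, hi = segment.min(), segment.max()
    return np.round(255.0 * (segment - lo) / (hi - lo)).astype(np.uint8)

# Hypothetical vibration record: 4096 samples -> one 64x64 grey image.
rng = np.random.default_rng(0)
raw = (np.sin(2 * np.pi * 50 * np.linspace(0.0, 1.0, 4096))
       + 0.1 * rng.standard_normal(4096))
img = signal_to_gray_image(raw, size=64)
print(img.shape, img.dtype)  # (64, 64) uint8
```

Once the signal is in this image form, standard 2D convolutional architectures and image-recognition training pipelines can be applied to it unchanged, which is the appeal of this family of methods.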
