Article

A Lightweight Bearing Fault Diagnosis Method Based on Multi-Channel Depthwise Separable Convolutional Neural Network

Liuyi Ling, Qi Wu, Kaiwen Huang, Yiwen Wang and Chengjun Wang
1 School of Artificial Intelligence, Anhui University of Science and Technology, Huainan 232001, China
2 School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan 232001, China
* Author to whom correspondence should be addressed.
Electronics 2022, 11(24), 4110; https://doi.org/10.3390/electronics11244110
Submission received: 14 November 2022 / Revised: 6 December 2022 / Accepted: 6 December 2022 / Published: 9 December 2022
(This article belongs to the Section Artificial Intelligence)

Abstract: Existing rolling bearing fault diagnosis methods based on deep convolutional neural networks suffer from insufficient feature extraction ability, poor anti-noise ability, and a large number of model parameters. A lightweight bearing fault diagnosis method based on depthwise separable convolutions is proposed. The proposed method can simultaneously extract different features from vibration signals in different directions to enhance the stability of the diagnosis model. The lightweight unit based on depthwise separable convolutions in the feature extraction layer reduces the size of the model and the number of parameters that need to be learned. The vibration signals of bearings in different directions are converted into time-frequency signals by the short-time Fourier transform (STFT) and then saved as pictures that serve as the input of the model. In order to verify the effectiveness and generalization of the method, this paper uses the gearbox data set of Southeast University and the CWRU (Case Western Reserve University) bearing data set for experiments. Comparisons of bearing fault diagnosis results between the proposed model and other classical deep learning models are implemented. The results show that the proposed model is superior to other classical deep learning models, with a smaller model size, higher accuracy, and a lower computational burden. Compared with using a single-direction vibration signal as input, the proposed model using multiple vibration signals in different directions as input achieves higher accuracy.

1. Introduction

With industrial upgrading and technological innovation, handicraft production is gradually being replaced by a variety of mechanical equipment. In order to liberate more labor, machinery and equipment are moving toward automation [1,2]. Rolling bearings have been widely used in various mechanical facilities [3,4]. As a vulnerable part of mechanical equipment, the health of rolling bearings determines whether the equipment can work steadily and efficiently [5,6]. Once a problem occurs in a rolling bearing, it will shorten the life of the parts directly connected to the bearing and may even bring the whole piece of equipment to a halt [7,8]. Therefore, in order to avoid the economic losses caused by bearing faults, it is necessary to monitor the health of the bearing [9].
Since the working environment of rolling bearings is usually noisy, the vibration signal contains various features. In order to extract the desired features, time-frequency analysis methods are usually used to analyze the vibration signal [10]. Li et al. [11] proposed a variational-mode-decomposition-based frequency band entropy (FBE), which can fully extract the fault features in signals. Zhang et al. [12] designed a new fault feature extraction method, the fast entrogram, which can accurately filter out the components containing the most fault information by spectral segmentation of the signal in the frequency domain. Zhu et al. [13] proposed a new fault diagnosis method in which the features in the vibration signal were extracted by wavelet packet transform, the most informative features were then selected by singular value decomposition, and the selected features were fed to a classifier to complete the fault diagnosis. Patel [14] proposed a unique method for extracting and selecting bearing fault features based on Euclidean distance, which can reduce the time of feature extraction and improve the classification accuracy with fewer features. Cheng et al. [15] proposed a fault diagnosis method that determines the main cycle of the signal by continuous measurement, exhibiting excellent recognition of the signal's cyclic components. Yu et al. [16] proposed a time-frequency detection method for detecting the impulse components in the signal; by improving the time-reassigned synchrosqueezing transform (TSST) for a concentrated analysis of the signal, the vibration features in the signal can be accurately extracted. Some researchers also use conventional methods such as principal component analysis, the hidden Markov model, and envelope analysis to diagnose and predict bearing faults [17,18,19,20]. However, traditional fault diagnosis methods that use time-frequency analysis to select the fault features in the signal often rely on expert experience when analyzing highly complex signals, which limits their use by non-experts.
Before computer performance was greatly improved, traditional machine learning methods could only complete learning with simple network structures. Traditional machine learning algorithms such as the support vector machine (SVM) have been used by many researchers for fault diagnosis [21]. Compared with using only signal processing methods for bearing fault analysis and diagnosis, traditional machine learning can directly extract the features in the signal through the designed network without other processing of the signal [22]. Wang et al. [23] proposed a new diagnosis method for bearing fault analysis and diagnosis by combining SVM with a grasshopper optimization algorithm, which can extract features at multiple scales to enhance the accuracy of fault diagnosis. Wan et al. [24] proposed a random forest model that can quickly and effectively diagnose the faults of rolling bearings and improved the diagnostic efficiency of the model by training multiple decision trees in parallel. Yuan et al. [25] combined a convolutional neural network with SVM for bearing analysis and fault diagnosis and transformed the one-dimensional time-domain signal into a two-dimensional time-frequency representation by continuous wavelet transform; the two-dimensional signal was saved as a picture and used as the input of the model to improve the diagnostic accuracy and stability. Li et al. [26] combined the concept of superposition representation learning (S-RL) with SVM to design a new bearing analysis and diagnosis method, called the deep stacking least squares support vector machine (DSLS-SVM), to effectively extract inherent fault features from measured vibration signals. However, traditional machine learning methods cannot extract the desired features well from complex or noisy signals because of their simple network structures.
Since the beginning of the 21st century, computer computing capacity has increased significantly due to advances in chip technology [27]. With the rapid development of deep learning technology, deep learning has been broadly applied in the field of fault analysis owing to its strong feature extraction capability [28]. Chen et al. [29] designed a bearing analysis and diagnosis method in which different features in the original signal were extracted by convolution kernels of different sizes and then fed into a long short-term memory model, achieving end-to-end diagnosis and analysis without any signal processing. Wang et al. [30] designed a bearing fault analysis and diagnosis method based on a convolutional neural network and a multi-attention mechanism; by adaptively adjusting the features extracted from each layer, the features were recalibrated to enhance the feature learning ability of the model. Hoang et al. [31] used the motor current signal as the dataset to analyze and diagnose bearing faults; the feature extraction was completed by a convolutional neural network, and the extracted features were then classified to realize fault analysis and diagnosis. Meng et al. [32] proposed a new data preprocessing technique to obtain the training data, which was then combined with the feature extraction of a denoising autoencoder to complete the fault diagnosis and analysis. Zhang et al. [33] proposed a fault diagnosis and analysis method in which the features of the original signal were extracted by a convolutional neural network and then input into a classifier improved by a particle swarm optimization algorithm to realize adaptive feature extraction. However, the above deep-learning-based methods have the following drawbacks:
  • To extract the desired features at a deeper level, traditional deep learning models often increase the depth of the network, but this increases the probability of gradient explosion and gradient disappearance when the parameters of the model are updated during backpropagation;
  • A deeper model means that more parameters need to be learned, which can easily lead to overfitting. In addition, increasing the size of the model may also increase the diagnosis time and reduce the real-time performance of model diagnosis;
  • Existing fault diagnosis methods based on deep learning only use the vibration signal of a single direction as the dataset for diagnosis, ignoring the features contained in the vibration signals of other directions, resulting in poor performance of the model;
  • Most diagnostic methods have poor anti-noise performance and robustness, and a model trained on noiseless data sets cannot effectively classify data sets with different proportions of noise.
In order to address the above issues, this paper proposes a lightweight multi-channel depthwise separable convolutional neural network (MCDS-CNN) for rolling bearing fault diagnosis. The proposed method can significantly reduce the size of the model and achieve good diagnostic results on data sets with different signal-to-noise ratios (SNRs). The major contributions of this work are summarized as follows: (1) A lightweight fault diagnosis model for bearings is proposed. By using residual connections and batch normalization (BN) to make the model learn and update parameters quickly and smoothly, the possibility of gradient explosion or gradient disappearance is reduced. (2) A lightweight unit is designed to extract features by using depthwise separable convolutions, significantly reducing the number of model parameters and the model size. (3) The influence of using vibration signals in various directions as input on the diagnostic performance of the model is discussed. The time-domain vibration signal is transformed into a time-frequency representation by the short-time Fourier transform (STFT). Experiments further verify that using vibration signals in multiple directions as input improves the performance of the model. (4) Compared with other classical models, the proposed method has better performance and higher noise immunity.
The rest of this paper is organized as follows: Section 2 introduces the theoretical background, including the time-frequency transform method and the convolutional neural network. Section 3 describes the overall fault diagnosis process and the overall structure and parameters of the proposed model. Section 4 discusses the influence of vibration signals in various directions as input on the diagnostic results and compares the performance and anti-noise capability of the proposed model with other classical models. Finally, Section 5 summarizes the conclusions.

2. Theoretical Background

2.1. Short Time Fourier Transform

STFT is a popular method for signal analysis. It applies the Fourier transform within a time window that slides along the time axis, thereby providing localization in both the time and frequency domains. The basic formula of the STFT is defined as follows:
$$\mathrm{STFT}_x(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, h(t-\tau)\, e^{-j 2\pi f t}\, \mathrm{d}t$$
where $x(t)$ represents the original one-dimensional time-domain signal, $h(t-\tau)$ is the window function of the STFT centered at time $\tau$, and $f$ represents the frequency of the Fourier transform. By moving the window function, the original signal is analyzed piecewise.
STFT maps the original one-dimensional time-domain signal to a two-dimensional time-frequency representation; hence, an output with both time-domain and frequency-domain information can be obtained by STFT. Different from using only a one-dimensional time-domain signal or transforming a one-dimensional vibration signal into a frequency-domain signal, fault diagnosis methods that use STFT to process vibration signals can avoid missing important information and make full use of the advantages of deep learning in processing images. Due to the above advantages, STFT is widely used in fault diagnosis [34], image processing [35], and audio signal processing [36].
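As a concrete illustration, the following minimal sketch converts a one-dimensional vibration signal into a time-frequency magnitude map with scipy.signal.stft; the sampling rate, window length, and overlap are illustrative assumptions rather than the settings used in this paper.

```python
import numpy as np
from scipy.signal import stft

fs = 12_000                        # assumed sampling frequency in Hz
t = np.arange(0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * 1_000 * t) + 0.3 * np.random.randn(t.size)

# Windowed Fourier transform: f is the frequency axis, tau the window
# positions, and Zxx the complex time-frequency coefficients.
f, tau, Zxx = stft(signal, fs=fs, window="hann", nperseg=256, noverlap=128)

# The magnitude spectrogram is what would be saved as a 2-D image and fed
# to the network as one input channel.
spectrogram = np.abs(Zxx)
print(spectrogram.shape)           # (frequency bins, time frames)
```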

2.2. Deep Convolution Neural Network

2.2.1. Convolution Neural Network

Since LeCun first proposed the concept of the convolutional neural network (CNN) in 1989 [37], CNNs have gradually been applied to deep learning and image processing as computer performance has improved. A convolutional network is usually used to extract the features of the input information. Since convolution introduces the local receptive field mechanism, a convolutional network for extracting image features is superior to some linear feature extraction methods. The standard convolution operation is shown in Figure 1.
The one-dimensional time-domain signal is saved as a two-dimensional image through STFT, and the obtained image is then convolved. An $L_{in} \times W_{in} \times C_{in}$ image is used as input to generate an $L_{out} \times W_{out} \times C_{out}$ feature map by convolution, where $L_{in}$ and $W_{in}$ are the length and width of the input image, respectively; $L_{out}$ and $W_{out}$ are the length and width of the output feature map, respectively; and $C_{in}$ and $C_{out}$ are the numbers of channels of the input image and the output feature map, respectively. The number of convolution kernel parameters is $L_k \times W_k \times C_{in} \times C_{out}$, where $L_k$ and $W_k$ represent the length and width of the convolution kernels, respectively. The number of channels of each convolution kernel equals the number of channels of the input, and the number of convolution kernels equals the number of channels of the output feature map. The calculation formula for standard convolution is as follows:
$$h_j = f\left(\sum_{i} X_i * W_{ij} + b_j\right)$$
where $h_j$ represents the jth output feature map of the current convolution layer, $X_i$ denotes the ith output feature map of the previous convolution layer, $*$ is the convolution operation, $W_{ij}$ represents the convolution kernel connecting the ith input feature map to the jth output feature map, $b_j$ represents the bias of the jth output of the convolution layer, and $f$ is a nonlinear activation function.
When each convolution kernel scans the input image, its weight is constant, while different convolution kernels have different weights and offsets, which are used to extract different features. Finally, the extracted different feature maps are superimposed on the channel dimension. Due to the shared weights and offsets within the convolution kernel of the standard convolution operation, the number of parameters in the convolution network can be greatly reduced compared with the fully connected network. The calculation formulas about the number of parameters in the standard convolution and the fully connected network are respectively as follows:
$$\mathrm{Parameters}_{conv} = L_k \times W_k \times C_{in} \times C_{out}$$
$$\mathrm{Parameters}_{fc} = L_{in} \times W_{in} \times C_{in} \times C_{out}$$
where $\mathrm{Parameters}_{conv}$ represents the number of parameters in the convolutional network, and $\mathrm{Parameters}_{fc}$ represents the number of parameters in the fully connected network.
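As a quick sanity check of the standard-convolution parameter formula above, the following sketch counts the weights of a PyTorch convolution layer; the kernel size and channel numbers are illustrative and not taken from the paper.

```python
import torch.nn as nn

# Standard convolution with a 3x3 kernel, 9 input channels, and 128 output
# channels (illustrative numbers only).
Lk, Wk, Cin, Cout = 3, 3, 9, 128
conv = nn.Conv2d(Cin, Cout, kernel_size=(Lk, Wk), bias=False)

# Weight parameters = Lk * Wk * Cin * Cout
assert conv.weight.numel() == Lk * Wk * Cin * Cout
print(conv.weight.numel())         # 10368
```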

2.2.2. Depthwise Separable Convolution

Unlike standard convolution, depthwise separable convolution decomposes the standard convolutional neural network into a pointwise convolutional neural network (PCNN) and a depthwise convolutional neural network (DCNN). The depthwise convolution is only used to extract the features of each channel of the input map without considering the mutual influence between channels, while the pointwise convolution superimposes the feature maps extracted by the depthwise convolution in the depth direction. Therefore, pointwise convolution and depthwise convolution are usually used together for feature extraction.
Depthwise convolution was first proposed in Ref. [38]. Different from standard convolution, each convolution kernel of the depthwise convolution has only one channel, and each convolution kernel corresponds to one channel of the input image. As shown in Figure 2, depthwise convolution can be written as follows:
$$h_i = f\left(X_i * W_i + b_i\right)$$
where $X_i$ is the ith channel of the input feature map, $W_i$ represents the convolution kernel corresponding to the ith channel of the input feature map, and $h_i$ and $b_i$ are the corresponding output feature map and bias, respectively.
The calculation formula of the number of parameters in depthwise convolution is as follows:
$$\mathrm{Parameters}_{dw} = L_k \times W_k \times C_{in}$$
where $\mathrm{Parameters}_{dw}$ represents the number of parameters in the DCNN, which is $1/C_{out}$ of that of the standard CNN.
Although DCNN can extract the feature of each channel of the input image, it does not take into account the correlation between each channel. Since the number of convolution kernels for depthwise convolution must be equal to the channel number of the input image, the channel number of the output feature map is equal to the channel number of the input image. In order to increase the correlation between channels and adjust the channel number of the output feature map of depthwise convolution, a pointwise convolution layer is added to DCNN.
Pointwise convolution can increase the channel correlation of the input map. The channel number of each convolution kernel corresponds to that of the input image. Both the length and width of each convolution kernel are set to 1 to reduce the parameter number of the network. Hence the difference between pointwise convolution and standard convolution is that their convolution kernel sizes are different. The principle of PCNN is shown in Figure 3, and its parameter number calculation formula is as follows:
$$\mathrm{Parameters}_{pw} = 1 \times 1 \times C_{in} \times C_{out}$$
where $\mathrm{Parameters}_{pw}$ represents the number of parameters in the pointwise convolution, which is $1/(L_k \times W_k)$ of that of standard convolution.
Depthwise separable convolution can extract features without ignoring the correlation between channels. Compared with a standard CNN, it can reduce the complexity of the model, namely the number of parameters, with almost no loss in feature extraction ability. When the input channel number of the depthwise separable convolution is equal to that of the standard convolution and their output channel numbers are also equal, the ratio of their parameter numbers can be expressed as follows:
$$\frac{L_k \times W_k \times C_{in} + 1 \times 1 \times C_{in} \times C_{out}}{L_k \times W_k \times C_{in} \times C_{out}} = \frac{1}{C_{out}} + \frac{1}{L_k \times W_k}$$
Equation (8) shows that, for a fixed convolution kernel size, the reduction in the number of parameters achieved by depthwise separable convolution becomes larger as the number of output channels increases.
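The sketch below builds a depthwise separable convolution from a depthwise layer (groups equal to the channel count) and a 1 × 1 pointwise layer, then checks the parameter ratio of Equation (8); the 7 × 7 kernel and 128 channels are illustrative assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn as nn

Lk, Wk, C = 7, 7, 128

depthwise = nn.Conv2d(C, C, kernel_size=(Lk, Wk), padding="same",
                      groups=C, bias=False)     # one kernel per input channel
pointwise = nn.Conv2d(C, C, kernel_size=1, bias=False)  # 1x1 cross-channel mixing

x = torch.randn(1, C, 4, 4)
y = pointwise(depthwise(x))
print(y.shape)                                  # torch.Size([1, 128, 4, 4])

# Parameter ratio versus a standard convolution, Equation (8):
standard = Lk * Wk * C * C
separable = Lk * Wk * C + 1 * 1 * C * C
print(separable / standard, 1 / C + 1 / (Lk * Wk))   # both are approximately 0.028
```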

2.2.3. Residual Network

Convolutional networks can extract hidden features from input images. When an image contains too much information, the number of layers of the convolutional network should be increased to better extract the desired features. However, when too many convolutional layers are stacked, gradient explosion and gradient disappearance may occur, which degrade the diagnostic accuracy and performance of the model. He et al. [39] proposed the residual network to reduce the possibility of gradient disappearance and gradient explosion. The structure of the residual network is shown in Figure 4.
As shown in Figure 4, $X$ represents the input, $F(X)$ denotes the output of the stacked n-layer network, and $F(X)$ plus $X$ enters the activation function. A residual network can reduce the likelihood of gradient disappearance and explosion when the number of network layers is large. Under the premise of an equal number of layers, the residual network converges faster than other networks.
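A minimal residual block sketch, assuming an arbitrary channel count and layer choice; the only essential point is that the input X is added to the stacked-layer output F(X) before the final activation, as in Figure 4.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.body(x) + x)       # F(X) + X, then activation

x = torch.randn(1, 128, 4, 4)
print(ResidualBlock()(x).shape)                 # torch.Size([1, 128, 4, 4])
```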

2.2.4. Batch Normalization

The parameters of each layer are updated during the training process. In order to reduce the training time and speed up training, the data are normalized over each mini-batch before they enter the next layer; this process is called batch normalization. The calculation formulas for batch normalization are as follows:
$$\hat{x} = \frac{x - \mu}{\sigma}$$
$$y = \gamma \hat{x} + \beta$$
where $\hat{x}$ represents an output element after normalization, $x$ is the initial data, $\mu$ represents the mean of the initial data, $\sigma$ represents the standard deviation of the initial data, $y$ represents the result after $\hat{x}$ is scaled and shifted, and $\gamma$ and $\beta$ are factors that need to be learned.
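The following minimal sketch reproduces the two normalization equations above directly on a random batch; with gamma initialized to one and beta to zero, the output is simply the normalized input (the small epsilon is a common numerical-stability assumption).

```python
import torch

x = torch.randn(32, 128)                        # a batch of 32 feature vectors
mu = x.mean(dim=0)
sigma = x.std(dim=0, unbiased=False)

gamma, beta = torch.ones(128), torch.zeros(128) # learnable scale and shift
y = gamma * (x - mu) / (sigma + 1e-5) + beta

print(y.mean().item(), y.std().item())          # approximately 0 and 1
```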

2.2.5. GELU Function

In order to make the convolution layer better fit the nonlinear function, the output of the convolution layer is usually passed through an activation function. The ReLU function is often used. The definition of the ReLU function is as follows:
$$\mathrm{ReLU}(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases}$$
The output of the ReLU function is $x$ if its input $x$ is greater than zero; otherwise, the output is zero. However, if too many elements of the input are less than zero, the corresponding weights are no longer updated, which leads to learning stagnation. To address this issue, we use the GELU activation function, defined as follows:
$$\mathrm{GELU}(x) = 0.5x\left(1 + \tanh\left(\sqrt{2/\pi}\left(x + 0.044715x^{3}\right)\right)\right)$$
GELU stands for the Gaussian error linear unit [40]. The GELU activation function randomly regularizes the input while fitting a nonlinear function: the input is multiplied by zero or one, and the probability that the input is multiplied by zero increases as the input decreases. The GELU function combines nonlinearity with random regularization to make the updating and adjustment of the weights in the network more sensitive.
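A small sketch comparing the tanh approximation of Equation (12) with PyTorch's built-in GELU; the test points are arbitrary.

```python
import math
import torch
import torch.nn.functional as F

def gelu_tanh(x: torch.Tensor) -> torch.Tensor:
    # Tanh approximation of GELU, as in Equation (12).
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi)
                                       * (x + 0.044715 * x ** 3)))

x = torch.linspace(-3, 3, 7)
print(gelu_tanh(x))
print(F.gelu(x))                                # near-identical values
```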

2.2.6. Patch Embedding

Patch embedding was proposed in Ref. [41] for image classification. The patch embedding layer follows the input layer; it downsamples the input, adjusts the number of channels, and also saves some computation.
As shown in Figure 5, the input is a three-dimensional image with a length of H, a width of W, and three channels. The working principle of patch embedding is similar to convolution: a convolution kernel of size P × P slides over the input image with a stride of P, and the number of convolution kernels is N. After patch embedding, an image block with a length of H/P, a width of W/P, and N channels is generated. Patch embedding compresses the size of the input image and changes the number of channels. Compared with the downsampling performed by a pooling layer, patch embedding can avoid the loss of information.
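In practice, patch embedding can be implemented as a convolution whose kernel size and stride both equal P, as in the sketch below; H, W, P, and N are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A PxP convolution with stride P splits the input into non-overlapping
# patches and projects each one to N channels.
H, W, P, N = 64, 64, 16, 128
patch_embed = nn.Conv2d(in_channels=3, out_channels=N, kernel_size=P, stride=P)

x = torch.randn(1, 3, H, W)
print(patch_embed(x).shape)        # torch.Size([1, 128, 4, 4]) = (N, H/P, W/P)
```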

3. Diagnosis Process and Model Design

3.1. Diagnosis Process

A bearing fault diagnosis model using multi-directional vibration signals is presented in this paper. The specific diagnosis process is shown in Figure 6, which consists primarily of four parts, namely, data acquisition, data pre-processing, model training, and model testing. Since the input of the model requires three time-frequency maps of vibration signals in various directions, we first collect three vibration signals in various directions and then transform them into two-dimensional time-frequency maps by STFT. The training data set and test data set are produced by the two-dimensional time-frequency maps, and the model is trained until it converges. Finally, the diagnostic accuracy of the model is verified.

3.2. Model Design

3.2.1. Model Structure

The structure diagram of the designed MCDS-CNN model is shown in Figure 7, which shows that the model consists of the following four parts: an input layer, a concat layer, a feature extraction layer, and a classification layer. The input of the input layer is the time-frequency maps of vibration signals in three different directions, namely, the x-axis, y-axis, and z-axis directions. In the concat layer, the time-frequency maps of the three vibration signals are superimposed on the channel dimension to generate a 9-channel map. The output tensor after downsampling is randomly regularized and normalized through the GELU layer and the batch normalization layer, respectively, which accelerates training and reduces the training time. The normalized output data enter the designed lightweight unit (Light-Unit) to further extract deeper features. In the classification layer, the extracted features are dimension-reduced and then expanded through the global average pooling layer and the flatten layer, respectively. Finally, the prediction results are output by the softmax layer.
To improve training efficiency as well as reduce the classification time, several strategies are proposed to reduce the number of weight parameters in the model, as follows:
  • Before extracting the fault features, patch embedding is used to downsample the input image and adjust the number of channels, which greatly reduces the number of parameters;
  • The GELU function is used to add nonlinearity to the output of each layer. Compared with the commonly used ReLU activation function, the GELU function can adaptively adjust the activation results so as to update the gradient more effectively and accelerate training;
  • The proposed Light-Unit can significantly reduce the number of network parameters and the risk of gradient explosion or gradient disappearance by combining depthwise separable convolution with a residual connection;
  • Using the global pooling layer instead of a fully connected layer directly reduces the dimension of the output features, which significantly reduces the number of model parameters and shortens the training time.
The specific parameters of the designed MCDS-CNN model are shown in Table 1.
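The following sketch assembles the pipeline of Figure 7 and Table 1 (concatenation of three time-frequency maps, a 16 × 16 patch embedding to 128 channels, GELU and BN, two Light-Units, and global average pooling). The linear classification head and the compact Light-Unit used here are simplified assumptions rather than the exact published implementation; a more detailed Light-Unit sketch follows in Section 3.2.2.

```python
import torch
import torch.nn as nn

class LightUnit(nn.Module):
    """Compact depthwise separable unit with a residual connection."""
    def __init__(self, channels: int = 128):
        super().__init__()
        self.depthwise = nn.Sequential(
            nn.Conv2d(channels, channels, 7, padding=3, groups=channels),
            nn.GELU(), nn.BatchNorm2d(channels))
        self.pointwise = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.GELU(), nn.BatchNorm2d(channels))

    def forward(self, x):
        return self.pointwise(self.depthwise(x) + x)    # residual connection

class MCDSCNN(nn.Module):
    def __init__(self, num_classes: int = 5):
        super().__init__()
        self.patch_embed = nn.Conv2d(9, 128, kernel_size=16, stride=16)
        self.norm = nn.Sequential(nn.GELU(), nn.BatchNorm2d(128))
        self.features = nn.Sequential(LightUnit(), LightUnit())
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x_axis, y_axis, z_axis):
        x = torch.cat([x_axis, y_axis, z_axis], dim=1)   # concat layer: 9 channels
        x = self.norm(self.patch_embed(x))               # 64x64x9 -> 4x4x128
        x = self.pool(self.features(x)).flatten(1)       # global average pooling
        return self.classifier(x)                        # logits; softmax at inference

model = MCDSCNN()
maps = [torch.randn(2, 3, 64, 64) for _ in range(3)]     # three RGB time-frequency maps
print(model(*maps).shape)                                # torch.Size([2, 5])
```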

3.2.2. Light-Unit

The feature extraction function of the model is performed by several stacked Light-Units. Each Light-Unit consists of a depthwise convolution layer, a pointwise convolution layer, a GELU activation layer, and a BN layer. The specific network structure of the Light-Unit is shown in Figure 8, and the network parameters are shown in Table 2.
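A sketch of one Light-Unit following the layer order in Table 2: a 7 × 7 depthwise convolution, GELU, BN, a residual addition, then a 1 × 1 pointwise convolution, GELU, and BN. The exact position of the residual connection is inferred from the table's row order and is therefore an assumption; the per-layer parameter counts of the two convolutions reproduce the values in Table 2.

```python
import torch
import torch.nn as nn

class LightUnit(nn.Module):
    def __init__(self, channels: int = 128):
        super().__init__()
        self.dw_conv = nn.Conv2d(channels, channels, kernel_size=7,
                                 padding=3, groups=channels)      # depthwise
        self.dw_post = nn.Sequential(nn.GELU(), nn.BatchNorm2d(channels))
        self.pw_conv = nn.Conv2d(channels, channels, kernel_size=1)  # pointwise
        self.pw_post = nn.Sequential(nn.GELU(), nn.BatchNorm2d(channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.dw_post(self.dw_conv(x)) + x     # residual connection
        return self.pw_post(self.pw_conv(out))

unit = LightUnit()
print(sum(p.numel() for p in unit.dw_conv.parameters()))   # 6400, as in Table 2
print(sum(p.numel() for p in unit.pw_conv.parameters()))   # 16,512, as in Table 2
print(unit(torch.randn(1, 128, 4, 4)).shape)               # torch.Size([1, 128, 4, 4])
```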

4. Results and Discussion

To verify the generalization and robustness of the proposed method, experiments were carried out on two open source bearing datasets. One data set is the gearbox data set of Southeast University, and the other is the CWRU-bearing data set of Case Western Reserve University.

4.1. Performance Metrics

In order to demonstrate the lightweight design and efficiency of the proposed model, we use the accuracy, the number of parameters and the size of the model, the Ratio, and the F1-score as metrics. Their definitions and calculation formulas are as follows.

4.1.1. Diagnostic Accuracy

Diagnostic accuracy is usually used as a metric to evaluate the performance of the model. Formally, the accuracy is defined as follows:
$$\mathrm{accuracy} = \frac{n_{right}}{N_{total}}$$
where $n_{right}$ is the number of samples that are correctly classified by the model, and $N_{total}$ is the total number of samples.

4.1.2. Parameter Number and Size of Model

The number of parameters and the size of the model are used as metrics to evaluate the efficiency of the model. The number of parameters can be defined as follows:
$$\mathrm{Params}_w = K_l \times K_w \times C_{in} \times C_{out}, \quad \mathrm{Params}_b = C_{out}, \quad \mathrm{Params}_{total} = \mathrm{Params}_w + \mathrm{Params}_b$$
where $\mathrm{Params}_w$ is the number of weight parameters required for a convolution operation; $K_l$ and $K_w$ are the length and width of the convolution kernel, respectively; $C_{in}$ and $C_{out}$ are the numbers of channels of the input image and output image, respectively; $\mathrm{Params}_b$ is the number of parameters of the bias unit; and $\mathrm{Params}_{total}$ is the total number of parameters.
The size of the model can be computed as follows:
$$\mathrm{Size}_{model} = s_{input} + s_{back} + s_{params}$$
where $\mathrm{Size}_{model}$ represents the size of the model, $s_{input}$ is the size of the input image, $s_{back}$ is the size of the weights in the backward process, and $s_{params}$ is the size of the parameters.
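For reference, the parameter count and the parameter-storage part of the model size can be computed for any PyTorch model as sketched below; the activation-related terms (s_input and s_back) are not included, and the small demo model is purely illustrative.

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    # Total number of trainable parameters.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def param_size_mb(model: nn.Module) -> float:
    # 4 bytes per float32 parameter; activations and backward buffers
    # are not included in this estimate.
    return count_params(model) * 4 / 1024 ** 2

demo = nn.Sequential(nn.Conv2d(9, 128, 16, stride=16), nn.Conv2d(128, 128, 1))
print(count_params(demo), f"{param_size_mb(demo):.2f} MB")
```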

4.1.3. Ratio

The smaller the Ratio is, the better the model performance is. The Ratio is defined as follows:
$$\mathrm{Ratio} = \frac{m_{size}}{m_{accuracy}}$$
where $m_{accuracy}$ represents the accuracy of the model, and $m_{size}$ denotes the size of the model.

4.1.4. F1-Score

The F1-score is a common performance metric in binary classification tasks. The calculation formula of F1-score is as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad \mathrm{F1\text{-}Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP is the number of positive samples predicted by the model as positive, FP is the number of negative samples predicted by the model as positive, and FN is the number of positive samples predicted by the model as negative.

4.2. Data Set 1

The experimental platform of the gearbox dataset of Southeast University [42] is a drivetrain dynamic simulator (DDS), which consists of a gearbox, a motor, and a brake controller. Three acceleration sensors are used to collect the bearing vibration signals along the x-, y-, and z-axes, respectively. There are five types of bearing states in this dataset, namely, ball fault (BF), no fault (NF), inner ring fault (IF), outer ring fault (OF), and inner and outer ring mixed fault (MF). Due to the limited data length provided by the data set, we use an overlapping sampling technique to expand the data set. As shown in Figure 9, the data set is generated by transforming each intercepted signal into a two-dimensional time-frequency image by the short-time Fourier transform.
After overlapping sampling of each type of fault signal, the number of samples becomes m, which can be expressed as follows:
$$m = \frac{L - l}{n} + 1$$
where $L$ represents the total length of the original signal, $l$ represents the length of a single sample, and $n$ denotes the step length of the sliding window.
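A minimal sketch of the overlapping-sampling augmentation of Equation (18); the signal length, sample length, and step size are illustrative assumptions.

```python
import numpy as np

def overlapping_samples(signal: np.ndarray, l: int, n: int) -> np.ndarray:
    # A sliding window of length l moves over the raw signal with step n,
    # so the number of samples is m = (L - l) / n + 1.
    m = (len(signal) - l) // n + 1
    return np.stack([signal[i * n: i * n + l] for i in range(m)])

raw = np.random.randn(120_000)                  # illustrative raw vibration signal
samples = overlapping_samples(raw, l=2048, n=512)
print(samples.shape)                            # (231, 2048)
```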
The data set is divided into the following three sub-data sets: A, B, and C. Data set A is collected at a bearing speed of 1200 rpm with no load, data set B is collected at a speed of 1800 rpm with a load of 7.32 N·m, and data set C is collected under both of the above conditions. Data sets A and B each include 400 training samples and 100 test samples per fault type, and data set C includes 800 training samples and 200 test samples per fault type. Table 3 shows the details of all the data sets.

4.2.1. Analysis

Influence of Vibration Signals in Different Directions on Diagnosis Results

In order to analyze the influence of vibration signals in various directions on the fault diagnosis results, we consider all possible combinations of vibration signals in three directions as the input. These options include a single x-axis signal, a single y-axis signal, a single z-axis signal, the combined signal of x-axis and y-axis, the combined signal of x-axis and z-axis, the combined signal of y-axis and z-axis, as well as the combined signal of x-axis, y-axis, and z-axis. Due to the uncertainty of parameter updates during training, the performance of the model obtained from each training is different. In order to eliminate the difference as far as possible, we use each input option to train the proposed MCDS-CNN model ten times. The trained model is tested by the test set, and the test results are shown in Table 4.
It can be seen from Table 4 that the maximum accuracy, the minimum accuracy, and the average accuracy of bearing fault diagnosis reach 100%, 99%, and 99.69%, respectively, when we use the combined signal of the x-axis, y-axis, and z-axis to train the model. The three values are the highest of all the options because the combined signal of the x-axis, y-axis, and z-axis includes the most fault features. Compared with using combined signals, using a single axis signal has a shorter training time owing to the smaller size of maps that enter the feature extraction layer.

Influence of the Number of Lightweight Units on Model Performance

In order to investigate the influence of the number of lightweight units on the fault diagnosis results, we designed the feature extraction layer with different numbers of lightweight units, ranging from 1 to 7, and evaluated the performance of the model. The total number of parameters, the total size of the model, the accuracy, and the training time are selected as the evaluation indices. The specific results are shown in Table 5. It can be seen that the training time is the shortest and the total number of parameters as well as the total size are the smallest when only one lightweight unit is used. This is because the fewer the lightweight units, the fewer the model weights that need to be updated. However, the accuracy of the model using only one lightweight unit for feature extraction is lower than that of the models using multiple lightweight units because a feature extraction network with only one lightweight unit is not deep enough, leading to some features being missed. The accuracy of the model is the highest when two lightweight units are used. When the number of lightweight units exceeds two, the accuracy of the model decreases as the number of lightweight units increases. Although a feature extraction network with more lightweight units can learn more features, it is also prone to gradient explosion, gradient disappearance, overfitting, and other problems, thus resulting in a decrease in the accuracy of fault diagnosis. Therefore, the proposed model uses two lightweight units to extract fault features.

4.2.2. Experimental Verification

Efficient and Lightweight Verification

In order to verify the superiority of the proposed model in terms of accuracy and lightweight design, comparisons of fault diagnosis results are implemented between the proposed model and AlexNet [43], ResNet18 [44], LeNet5 [45], MobileNet [46], and 2D-CNN. The code of the comparison models is publicly available. The 2D-CNN is a standard convolutional network without a lightweight unit. Figure 10 shows the accuracy of fault diagnosis for each model on data sets A, B, and C. It can be observed that the proposed model, MCDS-CNN, has the best performance on all three data sets, with average accuracies of 100%, 99.8%, and 99.5%, respectively. The accuracy of 2D-CNN on data set A is only 76.8%, while that of LeNet5 on data set B is only 77%. Their accuracy is far lower than that of the proposed model. Both AlexNet and ResNet18 achieve an accuracy of more than 98% on the three data sets, which indicates that a deep network is helpful for enhancing the diagnostic accuracy of a model.
Table 6 shows the size, accuracy, and Ratio of each model on data set A. It can be seen that although the total number of parameters of 2D-CNN is only 59 KB, its accuracy is only 76.8%. The total number of parameters of the proposed model, MCDS-CNN, is 342 KB, which is far lower than that of AlexNet, ResNet18, or MobileNet, and the total size of the MCDS-CNN model is only 1.58 MB, which is only 0.03%, 0.03%, 0.08%, 7.5%, and 2.5% of the size of AlexNet, ResNet18, LeNet5, MobileNet, and 2D-CNN, respectively, yet the accuracy of the MCDS-CNN model reaches 99.8%, which is the highest. The Ratio of the proposed model, MCDS-CNN, is the smallest, which demonstrates its lightweight design and efficiency. Table 7 shows the size, accuracy, and Ratio of each model on data set B. Similar results and conclusions as on data set A can be obtained, which further demonstrates the advantage of the proposed model in lightweight design and efficiency.

Anti-Noise Verification

Bearings usually work in complex conditions with interference such as ambient noise. To verify the anti-noise performance and robustness of the proposed model, it is necessary to test the model with noisy data sets. The open-source data sets were collected in the laboratory, which indicates that the data sets are 'clean'. Therefore, we add white noise to the original vibration signals so as to obtain test data sets with different signal-to-noise ratios (SNRs). The definition of the SNR is shown in Equation (19). Figure 11 shows an example of the time-frequency maps with no noise and with SNRs of −2 dB, 0 dB, 2 dB, 4 dB, and 6 dB, respectively.
$$\mathrm{SNR} = 10 \lg \frac{P_{signal}}{P_{noise}}$$
In Equation (19), $P_{signal}$ represents the effective power of the signal, and $P_{noise}$ represents the effective power of the noise.
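A minimal sketch of how white Gaussian noise can be added at a target SNR consistent with Equation (19); the clean test signal is illustrative.

```python
import numpy as np

def add_white_noise(signal: np.ndarray, snr_db: float) -> np.ndarray:
    # Scale the noise power so that 10*log10(P_signal / P_noise) equals
    # the requested SNR in dB.
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    noise = np.random.randn(signal.size) * np.sqrt(p_noise)
    return signal + noise

clean = np.sin(2 * np.pi * np.linspace(0, 10, 12_000))
noisy = add_white_noise(clean, snr_db=-2)       # heavily corrupted test signal
```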
Each trained model is tested on the test data sets with different SNRs. It should be noted that no artificial white noise is added to the training data set. Table 8, Table 9 and Table 10 show the relationship between the accuracy of fault diagnosis and the SNR of the test data for data sets A, B, and C, respectively. It can be observed that the accuracy increases with the SNR for all the models on test data sets A, B, and C, because more noise in the test data set masks more fault feature information. In addition, the proposed model has the best accuracy under all SNR conditions on data sets A, B, and C. The accuracy of the proposed model is 18% to 50% higher than that of the other models when the SNR is −2 dB. When the SNR reaches 4 dB or 6 dB, the accuracies of 2D-CNN and AlexNet are both close to that of MCDS-CNN, but the size of 2D-CNN is 39 times larger than that of the proposed model, while the size of AlexNet is 2665 times larger than that of the proposed model.

4.3. Data Set 2

In order to verify the generality of the proposed method, this paper also uses the CWRU-bearing data set of Case Western Reserve University for experiments without changing the model structure or data set generation method.
The experimental platform of the CWRU bearing data set of Case Western Reserve University consists of a 2 hp motor, a torque sensor, a dynamometer, and control electronics. Two acceleration sensors collect vibration data from the drive end and the fan end of the motor housing, respectively. In this paper, seven bearing states under the no-load condition are selected for diagnosis. The specific bearing fault states are shown in Table 11.
The data set is generated by overlapping sampling of the data in the same way as for the bearing data set from Southeast University. Each fault state has a total of 800 samples, of which 700 are used for training and 100 for testing. The details of the data set are shown in Table 12.

4.3.1. Analysis

The CWRU bearing data set of Case Western Reserve University contains diagnostic data from both the drive end (DE) and the fan end (FE). In order to compare the influence of vibration signals from different positions on the diagnostic results, this paper uses the vibration signal of the drive end alone, the vibration signal of the fan end alone, and the drive end together with the fan end as the input of the model for fault diagnosis. The diagnostic performance of the model under the three input conditions is shown in Table 13.
It can be seen from Table 13 that using FE and DE together as input to the model performs better than using only one vibration signal as input to the model, although it adds some training time.

4.3.2. Experimental Verification

In order to verify the versatility and anti-noise performance of the proposed model, a data-driven model, DD-CNN [47], is also used as a comparison model. DD-CNN is a convolutional neural network based on LeNet-5 for fault diagnosis. Like the model proposed in this paper, DD-CNN converts the signal into a two-dimensional image as the input of the model; therefore, DD-CNN is suitable as a comparison model. Since only the data set is changed and the model structures are unchanged, the sizes of all the comparison models are not affected, so only the anti-noise verification of the models is performed.
White noise is added to the original vibration signals in data set 2 to simulate a real working environment. The specific performance of each model is shown in Table 14.
Table 14 shows that the diagnostic performance of the proposed model under different signal-to-noise ratios is superior to other classical models.

4.4. Visualization of Diagnostic Results

In order to graphically exhibit the superiority of the proposed method, the t-distributed stochastic neighbor embedding (t-SNE) algorithm is used to visualize the fault diagnosis results. Taking test data set A as an example, we use t-SNE to visualize the fault diagnosis results of all the models, as shown in Figure 12. It can be seen that the proposed model can accurately distinguish the five types of bearing states without overlapping. The 2D-CNN, AlexNet, and LeNet5 models perform poorly in distinguishing the IF fault from the OF fault. Except for the proposed model, the other models make certain errors in identifying the NO state and the MF fault; for example, ResNet18 incorrectly classifies the NO state as the MF fault, and AlexNet incorrectly classifies the MF fault as the NO state. The visualization of the diagnostic results shows that the proposed model can fully extract the features of different types of bearing faults and accurately complete the fault classification.
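A sketch of the t-SNE visualization step using scikit-learn; the random features and labels stand in for the real penultimate-layer outputs and fault classes of the test set.

```python
import numpy as np
from sklearn.manifold import TSNE

# High-dimensional features taken from the layer before the classifier are
# embedded into 2-D and then scatter-plotted per fault class.
features = np.random.randn(500, 128)            # e.g., 500 test samples, 128-D features
labels = np.random.randint(0, 5, size=500)      # five classes (BF, NO, IF, OF, MF)

embedded = TSNE(n_components=2, perplexity=30, init="pca",
                random_state=0).fit_transform(features)
print(embedded.shape)                           # (500, 2)
# embedded[:, 0] and embedded[:, 1] can then be plotted and colored by `labels`.
```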

5. Conclusions

In this paper, a multi-channel depthwise separable convolutional neural network (MCDS-CNN) based on depthwise separable convolutions and residual connections is proposed for bearing fault diagnosis. The vibration signals in different directions are transformed into two-dimensional time-frequency images by STFT and then used as the input of the network. By fusing the features of vibration signals in different directions, the fault diagnosis accuracy and robustness of the network are improved. The designed lightweight unit of the network greatly reduces the number of parameters by using depthwise separable convolutions, thus speeding up computation. The influence of the number of lightweight units on the fault diagnosis results is investigated, and it is found that the accuracy of the model is the highest when two lightweight units are used. The residual connection is added to improve the stability of the network. Comparisons of fault diagnosis results between the proposed model and six other classical models are implemented, and the comparison results demonstrate the advantages of the proposed model in terms of lightweight design and accuracy. At the same time, the experimental analysis on two data sets shows that the proposed method has better anti-noise performance and generalization. The visualization of the diagnostic results also verifies the accuracy of the proposed model for bearing fault diagnosis.

Author Contributions

Methodology, L.L. and Q.W.; software, Q.W.; validation and data curation, Y.W. and K.H.; writing—original draft preparation, L.L. and Q.W.; project administration, C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Academic Support Project for Top-notch Talents in Disciplines (Majors) of Colleges and Universities in Anhui Province under Grant gxbjZD2021052 and supported by Anhui Natural Science Foundation Project under Grant 2208085ME128.

Data Availability Statement

Data sharing is not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hoang, D.-T.; Kang, H.-J. A survey on deep learning based bearing fault diagnosis. Neurocomputing 2019, 335, 327–335.
  2. Li, C.; De Oliveira, J.V.; Cerrada, M.; Cabrera, D.; Sánchez, R.V.; Zurita, G. A systematic review of fuzzy formalisms for bearing fault diagnosis. IEEE Trans. Fuzzy Syst. 2018, 27, 1362–1382.
  3. Chen, Z.; Mauricio, A.; Li, W.; Gryllias, K. A deep learning method for bearing fault diagnosis based on cyclic spectral coherence and convolutional neural networks. Mech. Syst. Signal Process. 2020, 140, 106683.
  4. Wang, X.; Mao, D.; Li, X. Bearing fault diagnosis based on vibro-acoustic data fusion and 1D-CNN network. Measurement 2021, 173, 108518.
  5. Zhang, Y.; Xing, K.; Bai, R.; Sun, D.; Meng, Z. An enhanced convolutional neural network for bearing fault diagnosis based on time–frequency image. Measurement 2020, 157, 107667.
  6. Huang, W.; Gao, G.; Li, N.; Jiang, X.; Zhu, Z. Time-frequency squeezing and generalized demodulation combined for variable speed bearing fault diagnosis. IEEE Trans. Instrum. Meas. 2018, 68, 2819–2829.
  7. Wang, J.; Mo, Z.; Zhang, H.; Miao, Q. A deep learning method for bearing fault diagnosis based on time-frequency image. IEEE Access 2019, 7, 42373–42383.
  8. Pang, B.; Nazari, M.; Tang, G. Recursive variational mode extraction and its application in rolling bearing fault diagnosis. Mech. Syst. Signal Process. 2022, 165, 108321.
  9. Wang, Z.; Zhou, J.; Du, W.; Lei, Y.; Wang, J. Bearing fault diagnosis method based on adaptive maximum cyclostationarity blind deconvolution. Mech. Syst. Signal Process. 2022, 162, 108018.
  10. Yu, G. A concentrated time–frequency analysis tool for bearing fault diagnosis. IEEE Trans. Instrum. Meas. 2019, 69, 371–381.
  11. Li, H.; Liu, T.; Wu, X.; Chen, Q. An optimized VMD method and its applications in bearing fault diagnosis. Measurement 2020, 166, 108185.
  12. Zhang, K.; Xu, Y.; Liao, Z.; Song, L.; Chen, P. A novel Fast Entrogram and its applications in rolling bearing fault diagnosis. Mech. Syst. Signal Process. 2021, 154, 107582.
  13. Zhu, H.; He, Z.; Wei, J.; Wang, J.; Zhou, H. Bearing fault feature extraction and fault diagnosis method based on feature fusion. Sensors 2021, 21, 2524.
  14. Patel, S.P.; Upadhyay, S.H. Euclidean distance based feature ranking and subset selection for bearing fault diagnosis. Expert Syst. Appl. 2020, 154, 113400.
  15. Cheng, J.; Yang, Y.; Li, X.; Cheng, J. Adaptive periodic mode decomposition and its application in rolling bearing fault diagnosis. Mech. Syst. Signal Process. 2021, 161, 107943.
  16. Yu, G.; Lin, T.; Wang, Z.; Li, Y. Time-reassigned multisynchrosqueezing transform for bearing fault diagnosis of rotating machinery. IEEE Trans. Ind. Electron. 2020, 68, 1486–1496.
  17. Kim, S.; An, D.; Choi, J.-H. Diagnostics 101: A tutorial for fault diagnostics of rolling element bearing using envelope analysis in matlab. Appl. Sci. 2020, 10, 7302.
  18. Kwan, C.; Zhang, X.; Xu, R.; Haynes, L. A Novel Approach to Fault Diagnostics and Prognostics. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation (Cat. No. 03CH37422), Taipei, Taiwan, 14–19 September 2003; pp. 604–609.
  19. Zhang, X.; Xu, R.; Kwan, C.; Liang, S.Y.; Xie, Q.; Haynes, L. An Integrated Approach to Bearing Fault Diagnostics and Prognostics. In Proceedings of the 2005 American Control Conference, Portland, OR, USA, 8–10 June 2005; pp. 2750–2755.
  20. Bearing Diagnostics. Available online: https://www.innomic.com/en/knowledge/bearing-diagnostics/ (accessed on 13 June 2022).
  21. Wang, Z.; Yao, L.; Chen, G.; Ding, J. Modified multiscale weighted permutation entropy and optimized support vector machine method for rolling bearing fault diagnosis with complex signals. ISA Trans. 2021, 114, 470–484.
  22. Li, X.; Jiang, H.; Niu, M.; Wang, R. An enhanced selective ensemble deep learning method for rolling bearing fault diagnosis with beetle antennae search algorithm. Mech. Syst. Signal Process. 2020, 142, 106752.
  23. Wang, Z.; Yao, L.; Cai, Y. Rolling bearing fault diagnosis using generalized refined composite multiscale sample entropy and optimized support vector machine. Measurement 2020, 156, 107574.
  24. Wan, L.; Gong, K.; Zhang, G.; Yuan, X.; Li, C.; Deng, X. An efficient rolling bearing fault diagnosis method based on spark and improved random forest algorithm. IEEE Access 2021, 9, 37866–37882.
  25. Yuan, L.; Lian, D.; Kang, X.; Chen, Y.; Zhai, K. Rolling bearing fault diagnosis based on convolutional neural network and support vector machine. IEEE Access 2020, 8, 137395–137406.
  26. Li, X.; Yang, Y.; Pan, H.; Cheng, J.; Cheng, J. A novel deep stacking least squares support vector machine for rolling bearing fault diagnosis. Comput. Ind. 2019, 110, 36–47.
  27. Xu, G.; Liu, M.; Jiang, Z.; Söffker, D.; Shen, W. Bearing fault diagnosis method based on deep convolutional neural network and random forest ensemble learning. Sensors 2019, 19, 1088.
  28. Mo, Z.; Wang, J.; Zhang, H.; Miao, Q. Weighted cyclic harmonic-to-noise ratio for rolling element bearing fault diagnosis. IEEE Trans. Instrum. Meas. 2019, 69, 432–442.
  29. Chen, X.; Zhang, B.; Gao, D. Bearing fault diagnosis base on multi-scale CNN and LSTM model. J. Intell. Manuf. 2021, 32, 971–987.
  30. Wang, H.; Liu, Z.; Peng, D.; Qin, Y. Understanding and learning discriminant features based on multiattention 1DCNN for wheelset bearing fault diagnosis. IEEE Trans. Ind. Inform. 2019, 16, 5735–5745.
  31. Hoang, D.T.; Kang, H.J. A motor current signal-based bearing fault diagnosis using deep learning and information fusion. IEEE Trans. Instrum. Meas. 2019, 69, 3325–3333.
  32. Meng, Z.; Zhan, X.; Li, J.; Pan, Z. An enhancement denoising autoencoder for rolling bearing fault diagnosis. Measurement 2018, 130, 448–454.
  33. Zhang, X.; Han, P.; Xu, L.; Zhang, F.; Wang, Y.; Gao, L. Research on bearing fault diagnosis of wind turbine gearbox based on 1DCNN-PSO-SVM. IEEE Access 2020, 8, 192248–192258.
  34. Ma, Y.; Maqsood, A.; Corzine, K.; Oslebo, D. Long Short-Term Memory Autoencoder Neural Networks Based Dc Pulsed Load Monitoring Using Short-Time Fourier Transform Feature Extraction. In Proceedings of the 2020 IEEE 29th International Symposium on Industrial Electronics (ISIE), Delft, The Netherlands, 17–19 June 2020; pp. 912–917.
  35. Khan, A.S.; Ahmad, Z.; Abdullah, J.; Ahmad, F. A spectrogram image-based network anomaly detection system using deep convolutional neural network. IEEE Access 2021, 9, 87079–87093.
  36. Elbir, A.; İlhan, H.O.; Serbes, G.; Aydın, N. Short Time Fourier Transform Based Music Genre Classification. In Proceedings of the 2018 Electric Electronics, Computer Science, Biomedical Engineerings' Meeting (EBBT), Istanbul, Turkey, 18–19 April 2018; pp. 1–4.
  37. LeCun, Y.; Boser, B.; Denker, J.; Henderson, D.; Howard, R.; Hubbard, W.; Jackel, L. Handwritten digit recognition with a back-propagation network. Adv. Neural Inf. Process. Syst. 1990, 2, 396–404.
  38. Sifre, L.; Mallat, S. Rigid-motion scattering for texture classification. arXiv 2014, arXiv:1403.1687.
  39. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  40. Hendrycks, D.; Gimpel, K. Gaussian error linear units (gelus). arXiv 2016, arXiv:1606.08415.
  41. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
  42. Shao, S.; McAleer, S.; Yan, R.; Baldi, P. Highly accurate machine fault diagnosis using deep transfer learning. IEEE Trans. Ind. Inform. 2018, 15, 2446–2455.
  43. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
  44. Ullah, A.; Elahi, H.; Sun, Z.; Khatoon, A.; Ahmad, I. Comparative analysis of AlexNet, ResNet18 and SqueezeNet with diverse modification and arduous implementation. Arab. J. Sci. Eng. 2022, 47, 2397–2417.
  45. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
  46. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
  47. Wen, L.; Li, X.; Gao, L.; Zhang, Y. A New Convolutional Neural Network-Based Data-Driven Fault Diagnosis Method. IEEE Trans. Ind. Electron. 2018, 65, 5990–5998.
Figure 1. Standard convolution operation.
Figure 2. Depthwise convolution operation.
Figure 3. Pointwise convolution operation.
Figure 4. Structure of residual network.
Figure 5. Patch embedding process.
Figure 6. Specific process of fault diagnosis using multi-directional vibration signals.
Figure 7. Overall structure of the proposed MCDS-CNN.
Figure 8. Overall structure of Light-Unit.
Figure 9. Data preprocessing process.
Figure 10. Accuracy comparisons between the proposed model, MCDS-CNN, and other five models on data set A, B, and C.
Figure 11. An example of the time-frequency maps with different SNRs: (a) no noise; (b) the SNR of −2 dB; (c) the SNR of 0 dB; (d) the SNR of 2 dB; (e) the SNR of 4 dB; (f) the SNR of 6 dB.
Figure 12. Visualization of bearing fault diagnosis results of all the models: (a) 2D-CNN (b) the proposed model MCDS-CNN; (c) AlexNet; (d) LeNet5; (e) MobileNet; (f) ResNet18.
Table 1. Specific parameters of the proposed MCDS-CNN.

| Layer | Kernel Size/Channel | Output (Length × Width × Channel) | Params |
|---|---|---|---|
| Input | - | 64 × 64 × 3 | 0 |
| Concat | - | 64 × 64 × 9 | 0 |
| PE | 16 × 16/128 | 4 × 4 × 128 | 33,024 |
| GELU | -/128 | 4 × 4 × 128 | 0 |
| BN | -/128 | 4 × 4 × 128 | 256 |
| Light-Unit | - | 4 × 4 × 128 | 23,168 |
| GApooling | 1 × 1/128 | 1 × 1 × 128 | 0 |
Table 2. Specific parameters of Light-Unit.

| Layer | Kernel Size/Channel | Output (Length × Width × Channel) | Params |
|---|---|---|---|
| DW Conv | 7 × 7/128 | 4 × 4 × 128 | 6400 |
| GELU | -/128 | 4 × 4 × 128 | 0 |
| BN | -/128 | 4 × 4 × 128 | 256 |
| Residual | -/128 | 4 × 4 × 128 | 0 |
| PW Conv | 1 × 1/128 | 4 × 4 × 128 | 16,512 |
| GELU | -/128 | 4 × 4 × 128 | 0 |
| BN | -/128 | 4 × 4 × 128 | 256 |
Table 3. Details of data sets.

| Data Sets | Load (N·m) | Train Samples | Test Samples | Fault Types | Labels |
|---|---|---|---|---|---|
| A/B/C | 0/7.32/0 + 7.32 | 400/400/800 | 100/100/200 | BF | 0 |
| | | 400/400/800 | 100/100/200 | NO | 1 |
| | | 400/400/800 | 100/100/200 | IF | 2 |
| | | 400/400/800 | 100/100/200 | OF | 3 |
| | | 400/400/800 | 100/100/200 | MF | 4 |
Table 4. Performance of the proposed model tested by the data sets formed using different combination of vibration signals from data set 1.

| Metrics | X | Y | Z | X + Y | X + Z | Y + Z | X + Y + Z |
|---|---|---|---|---|---|---|---|
| Max Accuracy | 99.00% | 97.60% | 98.60% | 99.80% | 99.80% | 99.00% | 100% |
| Min Accuracy | 87.35% | 84.20% | 95.80% | 97.20% | 98.60% | 94.80% | 99.00% |
| Avg Accuracy | 98.26% | 94.71% | 96.78% | 99.37% | 99.57% | 98.33% | 99.69% |
| Training Time | 120.77 s | 120.86 s | 120.62 s | 124.03 s | 125.84 s | 124.76 s | 125.70 s |
Table 5. Influence of the number of lightweight units on the performance of the proposed model.

| Metrics | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Total Params | 319 KB | 342 KB | 366 KB | 389 KB | 414 KB | 436 KB | 459 KB |
| Total Size | 1.38 MB | 1.58 MB | 1.77 MB | 1.97 MB | 2.17 MB | 2.37 MB | 2.57 MB |
| Accuracy | 97.69% | 99.79% | 99.74% | 99.60% | 98.99% | 98.26% | 98.35% |
| Training Time | 123.94 s | 125.59 s | 127.05 s | 131.40 s | 127.81 s | 131.27 s | 128.75 s |
Table 6. Ratio and other performances of the proposed model, MCDS-CNN, and other five models on data set A.

| Metrics | AlexNet | ResNet18 | LeNet5 | MobileNet | 2D-CNN | MCDS-CNN |
|---|---|---|---|---|---|---|
| Total Params (KB) | 29,687 | 11,174 | 86 | 2232 | 59 | 342 |
| Total Size (MB) | 4211.58 | 4152.25 | 1890.79 | 21.01 | 62.32 | 1.58 |
| Accuracy (%) | 98.80 | 99.60 | 79.20 | 92.80 | 76.80 | 99.80 |
| Training Time (s) | 212.24 | 793.38 | 80.08 | 1350.20 | 144 | 123.41 |
| Ratio (MB/%) | 42.627 | 41.689 | 23.873 | 0.226 | 0.811 | 0.015 |
Table 7. Ratio and other performances of the proposed model, MCDS-CNN, and other five models on data set B.

| Metrics | AlexNet | ResNet18 | LeNet5 | MobileNet | 2D-CNN | MCDS-CNN |
|---|---|---|---|---|---|---|
| Total Params (KB) | 29,687 | 11,174 | 86 | 2232 | 59 | 342 |
| Total Size (MB) | 4211.58 | 4152.25 | 1890.79 | 21.01 | 62.32 | 1.58 |
| Accuracy (%) | 98.85 | 99.60 | 88 | 97.05 | 98.82 | 100 |
| Training Time (s) | 215.77 | 766.22 | 77.38 | 1449.58 | 136.06 | 135.84 |
| Ratio (MB/%) | 42.606 | 41.689 | 21.486 | 0.216 | 0.631 | 0.016 |
Table 8. Accuracy of the proposed model MCDS-CNN, and other five models on data set A with different SNR.

| SNR/(dB) | AlexNet | ResNet18 | LeNet5 | MobileNet | 2D-CNN | MCDS-CNN |
|---|---|---|---|---|---|---|
| −2 | 54% | 26% | 20% | 23% | 45% | 72% |
| 0 | 68% | 39% | 26% | 34% | 68% | 80% |
| 2 | 79% | 57% | 34% | 49% | 82% | 88% |
| 4 | 90% | 75% | 51% | 62% | 92% | 94% |
| 6 | 93% | 87% | 61% | 73% | 98% | 98.8% |
Table 9. Accuracy of the proposed model MCDS-CNN, and other five models on data set B with different SNR.

| SNR/(dB) | AlexNet | ResNet18 | LeNet5 | MobileNet | 2D-CNN | MCDS-CNN |
|---|---|---|---|---|---|---|
| −2 | 29% | 31% | 20% | 31% | 38% | 49% |
| 0 | 40% | 44% | 24% | 39% | 55% | 59% |
| 2 | 72% | 66% | 28% | 54% | 77% | 73% |
| 4 | 83% | 80% | 57% | 66% | 84% | 91% |
| 6 | 87% | 78% | 86% | 77% | 93% | 94% |
Table 10. Accuracy of the proposed model MCDS-CNN, and other five models on data set C with different SNR.

| SNR/(dB) | AlexNet | ResNet18 | LeNet5 | MobileNet | 2D-CNN | MCDS-CNN |
|---|---|---|---|---|---|---|
| −2 | 51% | 26% | 35% | 44% | 27% | 60% |
| 0 | 61% | 38% | 55% | 50% | 28% | 70% |
| 2 | 72% | 59% | 71% | 57% | 37% | 82% |
| 4 | 82% | 74% | 88% | 62% | 57% | 89% |
| 6 | 92% | 80% | 89% | 67% | 92% | 94% |
Table 11. Bearing fault description in CWRU data set.

| Fault Location | Diameter/Inches | Depth/Inches | Bearing Manufacturer | Labels |
|---|---|---|---|---|
| Health | 0 | 0.011 | SKF | 0 |
| Inner Raceway | 0.007 | 0.011 | SKF | 1 |
| Inner Raceway | 0.014 | 0.011 | SKF | 2 |
| Ball | 0.007 | 0.011 | SKF | 3 |
| Ball | 0.014 | 0.011 | SKF | 4 |
| Outer Raceway | 0.007 | 0.011 | SKF | 5 |
| Outer Raceway | 0.014 | 0.011 | SKF | 6 |
Table 12. Details of data set 2.

| Bearing Status | Number of Training Sets | Number of Test Sets | Load (HP) | Labels |
|---|---|---|---|---|
| Health | 700 | 100 | 0 | 0 |
| Inner Raceway | 700 | 100 | 0 | 1 |
| Inner Raceway | 700 | 100 | 0 | 2 |
| Ball | 700 | 100 | 0 | 3 |
| Ball | 700 | 100 | 0 | 4 |
| Outer Raceway | 700 | 100 | 0 | 5 |
| Outer Raceway | 700 | 100 | 0 | 6 |
Table 13. Performance of the proposed model tested by the data sets formed using different combination of vibration signals from data set 2.

| Metrics | DE | FE | DE + FE |
|---|---|---|---|
| Max Accuracy | 99.70% | 99.60% | 100% |
| Min Accuracy | 98.20% | 98.40% | 99.20% |
| Avg Accuracy | 99.47% | 99.59% | 99.79% |
| Training Time | 114.24 s | 115.74 s | 123.62 s |
| F1-Score | 0.98 | 0.98 | 0.99 |
Table 14. Accuracy of the proposed model MCDS-CNN, and other six models on data set 2.

| SNR/(dB) | AlexNet | ResNet18 | LeNet5 | DD-CNN | MobileNet | 2D-CNN | MCDS-CNN |
|---|---|---|---|---|---|---|---|
| −2 | 62% | 46% | 45% | 32% | 54% | 47% | 64% |
| 0 | 71% | 58% | 55% | 33% | 66% | 58% | 80% |
| 2 | 78% | 69% | 71% | 45% | 77% | 67% | 88% |
| 4 | 86% | 84% | 94% | 66% | 82% | 77% | 95% |
| 6 | 97% | 90% | 96% | 92% | 97% | 92% | 98% |