Instantaneous Square Current Signal Analysis for Motors Using Vision Transformer for the Fault Diagnosis of Rolling Bearings

Chen, Fei; Zhou, Xin; Xu, Binbin; Yang, Zheng; Qu, Zege

doi:10.3390/app13169349


by Fei Chen ¹, Xin Zhou ¹, Binbin Xu ¹,*, Zheng Yang ² and Zege Qu ¹

¹ Sino-German College of Intelligent Manufacturing, Shenzhen Technology University, Shenzhen 518118, China

² School of Mechanical and Aerospace Engineering, Jilin University, Changchun 130025, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(16), 9349; https://doi.org/10.3390/app13169349

Submission received: 28 July 2023 / Revised: 11 August 2023 / Accepted: 16 August 2023 / Published: 17 August 2023

(This article belongs to the Special Issue Intelligent Fault Diagnosis and Monitoring)


Abstract

Bearing fault diagnosis based on vibration signals generally achieves good results. However, vibration sensors are costly and often cannot be installed where they are needed, which limits their use in practical industrial applications. The motor current signal (MCS), which is easy to obtain, has therefore received widespread attention in recent years. Because the signal-to-noise ratio (SNR) of the MCS is low, however, traditional fault diagnosis methods cannot meet the required diagnostic accuracy. To achieve bearing fault diagnosis from the MCS, this paper proposes ISCV-ViT, a rolling bearing fault diagnosis method based on the MCS and the Vision Transformer (ViT) model. Specifically, a signal processing method based on the instantaneous square current value (ISCV) is proposed to convert the MCS obtained directly from a frequency converter into time-domain images. The ViT model is then applied to these images for bearing fault diagnosis. Finally, the method is verified experimentally on the public bearing dataset of Paderborn University (PU) and the bearing dataset of Shenzhen Technology University (SZTU). The experimental results show that the average accuracy of ISCV-ViT on the two datasets reaches 96.60% and 94.87%, respectively.

Keywords:

bearing fault diagnosis; signature processing; deep learning; vision transformer; motor current signal analysis

1. Introduction

With the progress of science and technology, mechanical equipment has become more complicated and sophisticated. As a result of mechanical equipment playing an important role in actual production, the production process is seriously affected when a fault occurs. Consequently, it is very meaningful to carry out health monitoring and fault diagnosis. According to statistics, 41% of induction motor failures result from bearing faults [1]. Therefore, research on motor bearing fault diagnosis is particularly significant. Previous research on bearing fault diagnosis has mostly focused on vibration signal acquisition and feature extraction. For example, Peng et al. [2] provided a comprehensive and systematic summary of recent research for fault diagnosis of rolling bearings using vibration signals. On the one hand, adding vibration sensors obviously increases costs in actual production. On the other hand, installing sensors on high-precision equipment is often impossible, especially under some special working conditions. Therefore, detection of bearing faults through the signals obtained from the equipment itself has received widespread attention, such as the motor current signal (MCS) [3,4,5], which can be directly measured by a frequency converter without installing other sensors. Meanwhile, due to the rapid development of deep learning and its excellent performance in terms of fault diagnosis technology, research on bearing fault diagnosis based on the combination of deep learning intelligent algorithms and current signals has remarkable engineering value [6,7]. Compared with vibration signal analysis, motor current signature analysis (MCSA) is a relatively new method for the fault diagnosis of rolling bearings [8]. Owing to the benefits of energy savings and soft motor start-up capability, variable frequency drives (VFDs) have been extensively applied in mechanical equipment [9]. 
Nevertheless, VFD output harmonics result in a poor signal-to-noise ratio, which makes the application of stator current signals challenging [10]. According to the above current signal characteristics, some scholars have carried out much research on signal processing. Currently, bearing fault diagnosis mainly centers on time-domain, frequency-domain, and time-frequency-domain analysis. In particular, spectral analysis is the most widely used method.

First, time-domain analysis is the most straightforward technique for fault detection and diagnosis, which usually involves the scalar indices used to determine bearing conditions [11]. Time-domain analysis extracts fault features by calculating statistical parameters such as the kurtosis, crest factor, skewness, and probability density curve [12]. Due to the gradual increase in the degree of bearing faults over time, Godoy et al. [13] monitored the amplitude of current signals in the time domain. In addition, some research has attempted to combine the time-domain and frequency-domain features of the stator current signal to extract fault features for bearing fault diagnosis [14,15]. Due to the weak time-domain features of the current signal and the minimal relationship between them and bearing faults, time-domain features are often used as auxiliary analysis combined with frequency-domain analysis.

Second, spectral analysis of the current signal is the most widely used method at present. Nevertheless, most studies based on current signals are devoted to motor fault diagnosis, while studies on bearing fault diagnosis are relatively scarce. For example, to make frequency analysis more effective, a normalized frequency-domain energy operator is proposed to avoid the supply frequency masking the fault frequency component for broken rotor bar fault diagnosis [16]. Similarly, Wang et al. [17] used a third-order energy operator for current signal demodulation to enhance the fault features. Kabul and Unsal [18] observed the Hilbert envelope spectrum of a current signal and performed spectral analysis to detect fault-related harmonic components. Aiming to address the problem that the supply frequency component of the current signal is dominant and the fault sideband amplitude is too low to detect, some frequency spectrum preprocessing methods based on the stator current are applied to motor fault diagnosis [19,20,21]. In the Fourier spectrum, the sideband appears near the generation frequency, and the fault characteristic frequency is within its interval. In addition, the amplitude modulation (AM) and frequency modulation (FM) characteristics of the analytic signal model enable us to use demodulation analysis to avoid the difficulties caused by complex sideband analysis in the Fourier spectrum [22]. However, the current signal is severely modulated by several factors during actual operation, such as the power frequency and bearing rotation frequency. Its application in engineering is still limited due to the complexity of modulation and demodulation in signal processing.

To observe subtler time-varying behavior of the signal, time-frequency analysis has been developed extensively in recent years because it can express a signal in the time domain and frequency domain simultaneously. A bearing fault leads to rotor radial motion and load torque changes; the former adds additional frequency components to the stator current, while the latter causes phase modulation of the stator current, which results in a time-varying frequency content [23]. Becker et al. [24] detected the rotor eccentricity caused by bearing faults by collecting three-phase current signals from the motor, and the results show that advanced transient current signature analysis (ATCSA) has good industrial application potential. The continuous wavelet transform (CWT) was used to effectively extract the features of fault components from the current signal, and 2D and 3D wavelet scalograms were used to achieve time-frequency characterization [25]. Multiresolution analysis of square current signals based on the maximal overlap discrete wavelet transform (MODWT) has been used to extract time-frequency features for bearing fault diagnosis [26,27]. However, as in frequency-domain analysis, the complex modulation of the frequency components of the current signal means that it is still not handled well even in the time-frequency domain. Moreover, many of these methods are adapted only to specific failure conditions, so the accuracy and applicability of a method developed for current signals must be evaluated under different fault conditions.

To avoid the complex feature extraction process of signals, deep learning (DL) and neural networks have received much attention in fault diagnosis due to their model independence and superior performance, which are properties of data-driven approaches [28]. For instance, Qu and Zhang [29] combined variational mode decomposition (VMD) and an artificial neural network (ANN) to perform the fault diagnosis of rolling bearings. Shifat et al. [30] added an attention layer to a bidirectional long short-term memory (LSTM) neural network model to obtain degradation features of three different patterns of the brushless DC (BLDC) motor and make future predictions. Kerboua and Kelaiaia [31] converted the three-phase stator current into a voxelated 3D data structure, which avoids the information loss incurred when mathematical models are used to convert signals into images, and then performed fault diagnosis by feeding this structure into a 3D CNN model. Many studies have applied convolutional neural networks (CNNs) and their variants to the fault diagnosis of electric machines and achieved good results [32,33,34]. However, because of the limited size of the convolution kernel and the local nature of the convolution operation, capturing long-range feature information with CNNs and their variants is very difficult [35]. To address imbalanced data and the time-varying features caused by variable working conditions, some of the latest methods improve the CNN at the adaptive level [36,37], while others focus on data balance processing [38]. These approaches are not ideal for fault diagnosis based on the current signal, since its fault features are too weak. Moreover, their computational cost increases significantly when training on large-scale datasets. 
Recently, Google's research team proposed the Transformer [39], a new architecture that takes full advantage of an attention mechanism, which obtains global features from different positions of the entire sequence in order to focus on sensitive features. Because the attention mechanism operates almost completely differently from convolution, this was a significant breakthrough in the field of natural language processing (NLP). Soon afterwards, the Vision Transformer (ViT) [40] was proposed by the Google Research team to apply the Transformer to image classification, and it has received widespread attention from the academic community. Although it is not yet widely used in fault diagnosis, Transformer variants show great potential given the need to extract sequence features and internal correlations during signal processing. For example, Ding et al. of Southeast University [41] used their own bearing experimental dataset to extract fault features through time-frequency analysis of vibration signals and fed them into a Transformer for fault diagnosis to verify its performance. Tang et al. [42] proposed an integrated Vision Transformer model that incorporates soft voting for classification decisions, which enables the model to perform bearing fault diagnosis more accurately. He et al. [43] used a Siamese network to map feature vectors into a new space for faster and more effective fault feature extraction and combined it with the Vision Transformer to achieve fault diagnosis. These studies have verified the effectiveness and superiority of the Transformer model. Wu et al. [44] made a detailed comparative study of rolling bearing fault diagnosis methods that use different sensors and pointed out that techniques based on artificial intelligence are still developing rapidly. 
The combination of time-domain statistics, frequency spectra, time-frequency spectra, and other image analyses with deep learning methods provides a new technique for bearing fault diagnosis.

Based on a comparative analysis of the relevant research, and combining the limitations of the above methods with the advantages of ViT [40], a bearing fault diagnosis method based on the instantaneous square current value and the Vision Transformer model (ISCV-ViT) is proposed. Case studies are used to verify the competitiveness of the proposed method in diagnosing bearing mechanical faults from the MCS. At the signal processing level, the instantaneous square current value of the two-phase current is calculated, and the resulting instantaneous square current value-time curve is plotted as a feature extracted from the time domain. Subsequently, the time-domain image of the instantaneous square current value is input to the ViT model to perform fault classification. The main contributions of this study are summarized as follows:

(1) Because the restricted installation and high cost of vibration sensors make them unsuitable for practical industrial applications, the MCS is selected as the analysis subject, and a processing method, ISCV, is proposed for extracting fault features from the MCS;
(2) To solve the problem that traditional fault diagnosis methods cannot meet the required diagnostic accuracy due to the low SNR of the MCS, a diagnostic model based on ISCV-ViT is proposed for bearing diagnosis. Its accuracy and practicality are evaluated on two rolling bearing datasets, namely, the public bearing dataset from Paderborn University (PU) and the laboratory bearing dataset of Shenzhen Technology University (SZTU).

The rest of this paper is organized as follows: Section 2 introduces the effective value theory and the basic structure of ViT; Section 3 presents the details of the proposed method; and Section 4 gives the experimental verification results and comparative analysis. Finally, the conclusion is shown in Section 5.

2. Theoretical Basis

2.1. Transformer Model

Transformer was proposed by Vaswani et al. [39] for machine translation and achieved excellent performance. The structure of the Transformer uses an attention mechanism to represent the global correlation between input and output rather than using recursive operations. It not only effectively prevents the vanishing gradient problem found in RNNs but also greatly improves computational efficiency through parallel training. Another novel point is positional embedding, which is used to identify the order relationship in the language and helps address the long-range dependency problem in NLP.

In general, different blocks in a Transformer model have specific functions. Position embedding layers encode the order of the words in the sequence, and the multihead self-attention (MSA) mechanism and fully connected layers compute the feature vectors; the model also relies on residual connections [45], layer normalization, an encoder-decoder architecture, etc. According to the usage of decoder blocks and encoder blocks, Transformer models typically take one of three basic structures: (1) only encoder blocks, for classification tasks; (2) only decoder blocks, for language modeling; and (3) both encoder and decoder blocks, for machine translation tasks [41]. In the field of fault diagnosis, fault classification is the final target. ViT, an encoder-only Transformer variant that has been widely used in image classification in recent years, meets this requirement. The ViT model uses only the encoder module of the Transformer to extract features (the original Transformer also has a decoder module, which is used to implement sequence-to-sequence tasks such as machine translation), so the following discussion focuses on the encoder module of the Transformer.

2.2. Transformer Encoder

The Transformer encoder is composed of a stack of Transformer blocks, each of which includes two sublayers: MSA and a positionwise fully connected feedforward network, as presented on the left side of Figure 1. The core operation of the Transformer is the multihead self-attention mechanism, which has achieved better results than RNN or LSTM structures. In summary, attention aggregates information according to the similarity between the current query and each input, which amounts to a weighted average over all inputs. The self-attention mechanism operates on three vectors: query, key, and value, where keys and values are paired. It applies linear operations to these three vectors and obtains results that take global information into account. The specific process of self-attention is shown in Figure 2, which corresponds to the scaled dot-product attention presented on the right side of Figure 1.

As shown in the figure, the input is a sequence of vectors of the form $[x^{1}, x^{2}, x^{3}, \dots, x^{t}]$. Each input vector is multiplied by a weight parameter matrix $W$ to obtain a sequence of embedding vectors $[a^{1}, a^{2}, \dots, a^{t}]$. Then, each is fed into the self-attention layer, which multiplies each embedding vector by three different transformation matrices $W_{q}, W_{k}, W_{v}$ to obtain three corresponding vectors, namely, $q^{i}, k^{i}, v^{i}$. Next, the self-attention mechanism takes the dot product of $q^{i}$ (query) and each $k^{i}$ (key) to measure the similarity of any two vectors. The calculation process is called the scaled inner product (dot product), which is defined as follows:

$$\alpha_{n, i} = \frac{q^{n} \cdot k^{i}}{\sqrt{d}} \tag{1}$$

where $\alpha_{n, i}$ measures the similarity between the $i$th key and the $n$th query, $d$ is the dimension of $q^{n}$ and $k^{i}$, and $\sqrt{d}$ is a scaling factor used to balance the dimensional changes resulting from vector dot product operations. The weights $\hat{\alpha}_{n, i}$ on the values are then obtained by applying the softmax function to $\alpha_{n, i}$:

$$\hat{\alpha}_{n, i} = \frac{\exp(\alpha_{n, i})}{\sum_{j} \exp(\alpha_{n, j})} \tag{2}$$

where $i, j \in \{1, 2, \dots, n\}$. Then, vectors containing global information are constructed by summing the products of the weights $\hat{\alpha}_{n, i}$ with the corresponding $v^{i}$:

$$b^{n} = \sum_{i} \hat{\alpha}_{n, i} v^{i} \tag{3}$$

where $b^{n}$ is the reconstructed vector, which can be computed in parallel. In contrast with RNN, the self-attention layer can not only achieve the same results but also significantly improve the computational efficiency due to parallel operation. The input to the Transformer is required to be a tensor of shape $\mathbb{R}^{n \times d}$, that is, a sequence containing $n$ vectors of dimension $d$. Therefore, self-attention in matrix form is described as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \tag{4}$$

where $d_{k} = d$ is the dimension of the queries and keys. Equation (4) is an overview of the single-head self-attention mechanism, and its relationship with the MSA is shown in the middle of Figure 1.
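Equations (1) to (4) can be illustrated with a few lines of NumPy. The following is a minimal sketch under stated assumptions, not the implementation used in this work: the sizes `n` and `d`, the random matrices, and the function names are chosen purely for illustration.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (n, d_k); V: (n, d_v). Row n of `scores` holds alpha_{n,i} from Eq. (1).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)   # Eq. (2), one softmax per query
    return weights @ V, weights          # Eq. (3) for every query at once, i.e. Eq. (4)

rng = np.random.default_rng(0)
n, d = 4, 8                              # illustrative sizes only
A = rng.normal(size=(n, d))              # stand-in for the embedding vectors
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
B, attn = scaled_dot_product_attention(A @ W_q, A @ W_k, A @ W_v)
```

Each row of `attn` sums to one, so every row of `B` is a weighted average of the value vectors, which is the "global information" referred to above.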

Since single-head self-attention can only focus on one aspect of the global information, it is beneficial to perform the attention function in parallel on several different learned linear projections of the queries, keys, and values. The outputs of these subspaces are then concatenated and projected again to obtain the final value. This is exactly what the MSA does, which enables the model to further analyze the output of subspaces with different attention cores [39]. For input embeddings $X$ with dimension $d_{model}$, the corresponding matrices $Q_{i}, K_{i}, V_{i}$ with dimensions $d_{k}, d_{k}, d_{v}$, respectively, are obtained by matrix multiplication with the weight matrices $W_{i}^{Q} \in \mathbb{R}^{d_{model} \times d_{k}}$, $W_{i}^{K} \in \mathbb{R}^{d_{model} \times d_{k}}$, $W_{i}^{V} \in \mathbb{R}^{d_{model} \times d_{v}}$, respectively, and then passed through the MSA. The details are as follows:

$$\begin{aligned} \mathrm{MSA}(X) &= \mathrm{Concat}(head_{1}, \dots, head_{h}) \cdot W^{O} \\ head_{i} &= \mathrm{Attention}(X W_{i}^{Q}, X W_{i}^{K}, X W_{i}^{V}) \end{aligned} \tag{5}$$

where $h$ is the number of linear projections (attention heads), $W^{O} \in \mathbb{R}^{h \cdot d_{v} \times d_{model}}$ is the output projection matrix, $d_{k} = d_{v} = d_{model} / h$, and the $\mathrm{Concat}$ function concatenates the output values from the different attention heads.
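Equation (5) can be sketched in the same way. The example below is a hedged illustration: the sizes `n`, `d_model`, and `h` are arbitrary toy values rather than the settings of this paper, and the per-head weights are stored as stacked arrays purely for convenience.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_self_attention(X, W_Q, W_K, W_V, W_O):
    # X: (n, d_model); W_Q, W_K, W_V: (h, d_model, d_k); W_O: (h * d_k, d_model).
    heads = []
    for W_q, W_k, W_v in zip(W_Q, W_K, W_V):
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
        heads.append(weights @ V)                 # head_i, shape (n, d_k)
    return np.concatenate(heads, axis=-1) @ W_O   # Concat(head_1..head_h) . W^O

rng = np.random.default_rng(0)
n, d_model, h = 5, 12, 3          # illustrative sizes, not the paper's settings
d_k = d_model // h                # d_k = d_v = d_model / h
X = rng.normal(size=(n, d_model))
W_Q, W_K, W_V = (rng.normal(size=(h, d_model, d_k)) for _ in range(3))
W_O = rng.normal(size=(h * d_k, d_model))
out = multihead_self_attention(X, W_Q, W_K, W_V, W_O)
```

Because $d_{k} = d_{model} / h$, the concatenated heads have width $d_{model}$ again, so the output has the same shape as the input and blocks can be stacked.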

3. The Proposed Method

3.1. Description of the Proposed Method

In this paper, our primary concern is signals that the mechanical equipment itself can provide without additional sensors, such as the current signal. To study signals such as currents, whose fault features are too weak to extract easily, the bearing fault diagnosis method ISCV-ViT is proposed. Since current is sensitive to load changes, the instantaneous square current value is calculated from an energy perspective, based on the effective value theory and the multiphase current. This avoids complex frequency-domain denoising and signal transformation processes, while the diagnostic accuracy does not suffer and may even improve. The overall architecture of the ISCV-ViT is shown in Figure 3.

3.2. Specific Details of the Fault Diagnosis Method (ISCV-ViT)

3.2.1. The Instantaneous Square Current Value

Considering the particularity of the current signal, it is necessary to select appropriate signal preprocessing methods. Statistical time-domain characteristics are preferred because of their simple calculation and effective modeling of signal changes and trends [46]. Consequently, to reduce the complexity of signal processing and take the effectiveness of feature extraction into consideration, the two-phase current is coupled based on the idea of the effective value of the current. Finally, the time-domain images are formed as the input to the diagnostic model.

It is necessary to consider the symmetry of the three-phase current under normal conditions and the current amplitude fluctuation and phase modulation under fault conditions. To fully mine the fault features contained in the current signal, the sampled values of two phase currents are coupled through Equation (6), following the theoretical basis for effective value calculation.

$$I_{n} = \sqrt{\frac{1}{2} \left(i_{a n}^{2} + i_{b n}^{2}\right)} \tag{6}$$

where $I_{n}$ is the coupled effective value at the $n$th current sample, and $i_{a n}$ and $i_{b n}$ are the $n$th sampled values of the two phase currents.

The essence of fault diagnosis based on the current signal is to find feature differences. The most obvious and direct one is the fluctuation of current amplitude over time. However, one-dimensional time-domain statistical indicators alone can hardly reflect how the signal changes over time. We therefore take $I_{n}$ as the statistical time-domain feature and draw time-domain images as the subsequent model input, as shown in Figure 4. Because dimensionless data can accelerate the solution of the model, the coupled current data is subjected to a Min-Max scaling operation. Panel (b) of Figure 4 shows that the peaks fluctuate more significantly after ISCV processing because the characteristics of two phase currents with a phase difference are integrated.
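The ISCV computation of Equation (6) followed by Min-Max scaling can be sketched as below. This is a minimal illustration: the two sinusoids are synthetic stand-ins for measured phase currents, and the 50 Hz supply frequency, the 120 degree phase shift, and the amplitude are assumptions made only to produce example data.

```python
import numpy as np

def iscv(i_a, i_b):
    # Eq. (6): couple two phase currents sample by sample.
    i_a = np.asarray(i_a, dtype=float)
    i_b = np.asarray(i_b, dtype=float)
    return np.sqrt(0.5 * (i_a**2 + i_b**2))

def min_max_scale(x):
    # Map the signal to [0, 1]; a constant signal maps to zeros.
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

# Synthetic stand-in for converter currents (assumed 50 Hz supply, 120 degree shift).
fs, f_supply = 64_000, 50.0
t = np.arange(0, 0.1, 1.0 / fs)
i_a = 2.0 * np.sin(2 * np.pi * f_supply * t)
i_b = 2.0 * np.sin(2 * np.pi * f_supply * t - 2 * np.pi / 3)
curve = min_max_scale(iscv(i_a, i_b))   # values to plot against t as the time-domain image
```

Plotting `curve` against `t` gives the kind of ISCV-time curve that is then rendered as an image for the model; the plotting step itself is omitted here.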

3.2.2. Input Embeddings

It is widely known that the input of the standard Transformer module is a two-dimensional matrix, which is a sequence of vectors (tokens). However, the data format of a two-dimensional image is a three-dimensional matrix $[H, W, C]$, which is inconsistent with the requirements of the Transformer. Therefore, it is necessary to transform the data through the embedding layer. For ViT, the input embeddings consist of three parts: patch embeddings, position embeddings, and learnable embeddings.

Patch embedding is a one-dimensional vector obtained by the linear projection of an image patch. Specifically, consider a time-domain image $x \in \mathbb{R}^{H \times W \times C}$, where $H$ and $W$ are the height and width of the image, respectively, and $C$ is the number of channels (an RGB image has 3 channels). First, according to the specified patch size $p \times p$, we divide the image into $N = H W / p^{2}$ patches $x_{p} \in \mathbb{R}^{N \times (p \cdot p \cdot C)}$. Then, each patch is linearly mapped to a one-dimensional vector of length $d_{model}$, giving $x_{p}^{'} \in \mathbb{R}^{N \times d_{model}}$, which is exactly the form of input that meets our requirements. In the algorithm, this is implemented through a convolution layer whose output is then flattened along the $H$ and $W$ dimensions.
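The patch splitting and linear projection can be sketched with plain array reshaping, which is equivalent to a convolution whose kernel size and stride both equal $p$. The image size, patch size, and $d_{model}$ below are assumed ViT-B/16-style values used for illustration, and the random matrix `E` is only a stand-in for the learned projection.

```python
import numpy as np

def image_to_patches(x, p):
    # x: (H, W, C) with H and W divisible by p. Returns (N, p*p*C), N = H*W / p**2.
    H, W, C = x.shape
    assert H % p == 0 and W % p == 0, "image size must be a multiple of the patch size"
    x = x.reshape(H // p, p, W // p, p, C)      # split rows and columns into blocks
    x = x.transpose(0, 2, 1, 3, 4)              # (H/p, W/p, p, p, C)
    return x.reshape(-1, p * p * C)             # one flattened patch per row

rng = np.random.default_rng(0)
H, W, C, p, d_model = 224, 224, 3, 16, 768      # assumed sizes for illustration
image = rng.random((H, W, C))
E = rng.normal(scale=0.02, size=(p * p * C, d_model))  # stand-in for the learned projection

patches = image_to_patches(image, p)            # x_p:  (N, p*p*C)
patch_embeddings = patches @ E                  # x_p': (N, d_model)
```

With these sizes, $N = 224 \cdot 224 / 16^{2} = 196$ patches are produced, each flattened to $16 \cdot 16 \cdot 3 = 768$ values before projection.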

A learnable embedding is a vector specifically used for classification in ViT. Following the class token proposed in BERT [47], a token $z_{0}^{0} = x_{class}$ that is independent of the image and has a fixed position embedding is concatenated with the patch embeddings obtained by convolution. Moreover, the token is a randomly initialized, trainable embedding, so it can encode the statistical characteristics of the entire dataset. This makes the classification task more reasonable. At this point, the vector obtained by concatenation is $z_{t} \in \mathbb{R}^{(N + 1) \times d_{model}}$.

Position embedding plays the same role as position encoding in the Transformer. ViT uses standard learnable one-dimensional embeddings $E_{pos} \in \mathbb{R}^{(N + 1) \times d_{model}}$, which are added directly to the obtained patch embeddings. Thus, both must have the same dimensions. Notably, the authors of ViT [40] conducted a series of comparative experiments to explain the choice of one-dimensional position embedding. The results show that the accuracy is significantly improved compared with that without position embedding, but there is little difference in the accuracy when using one-dimensional or two-dimensional position embedding.

Thus far, the input embeddings $z_{0}$ meet the requirements.

$$z_{0} = [x_{class}; x_{p}^{1} E; x_{p}^{2} E; \dots; x_{p}^{N} E] + E_{pos} \tag{7}$$

where $E \in \mathbb{R}^{(p \cdot p \cdot C) \times d_{model}}$ is the learnable linear mapping used in patch embeddings.
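Equation (7) amounts to one concatenation and one addition. The sketch below assumes the same illustrative sizes as a ViT-B/16-style model and uses random arrays in place of the trained class token, projection, and position embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
N, patch_dim, d_model = 196, 768, 768           # assumed sizes for illustration

x_p = rng.random((N, patch_dim))                # flattened patches x_p^1..x_p^N
E = rng.normal(scale=0.02, size=(patch_dim, d_model))      # patch projection
x_class = rng.normal(scale=0.02, size=(1, d_model))        # class token
E_pos = rng.normal(scale=0.02, size=(N + 1, d_model))      # position embeddings

# Eq. (7): prepend the class token to the projected patches, then add E_pos.
z_0 = np.concatenate([x_class, x_p @ E], axis=0) + E_pos
```

The first row of `z_0` corresponds to the class token, which is why the position embedding needs $N + 1$ rows rather than $N$.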

3.2.3. Transformer Blocks

In ViT, feature extraction is carried out by the serially stacked transformer blocks, and the classification task is performed with the features corresponding to the class token at the end. The core architecture of each transformer block in ViT is composed of MSA and multilayer perceptron (MLP) modules connected alternately. The Transformer block includes layer normalization, MSA, a dropout layer, a residual connector, and MLP. MSA is the most critical component and has been described in detail in Section 2.2. The specific details of the MLP module are described as follows.

The MLP module is mainly composed of three parts: fully connected (FC) layers, a Gaussian error linear unit (GELU) function, and a dropout layer. It is worth noting that the shapes of the input and output are consistent before and after the transformer encoder. This is because there are two fully connected layers in the MLP. The first FC layer quadruples the input dimension (to $4 d_{model}$), and the second FC layer restores the original feature dimension ($d_{model}$). Different from the vanilla Transformer, ViT uses GELU as the activation function. Compared with the rectified linear unit (ReLU) function, which gates the input by its sign, GELU weights the input by a probability that depends on the input value itself. Then, the data passes through the dropout layer, which can effectively alleviate overfitting and achieve regularization to some extent. The GELU activation function is defined as follows:

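$$\mathrm{GELU}(x) = x \Phi(x) = x \cdot \frac{1}{2} \left[1 + \mathrm{erf}\left(x / \sqrt{2}\right)\right] \tag{8}$$

The result can be approximated as follows:

$$\mathrm{GELU}(x) \approx 0.5 x \left(1 + \tanh\left[\sqrt{2 / \pi} \left(x + 0.044715 x^{3}\right)\right]\right) \tag{9}$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution and $\mathrm{erf}(\cdot)$ represents the Gaussian error function.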
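To see how close the approximation in Equation (9) is to the exact form in Equation (8), both can be evaluated side by side. This is a small sketch using only the Python standard library; the sample points are arbitrary.

```python
import math

def gelu_exact(x):
    # Eq. (8): x * Phi(x), with Phi written through the error function.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Eq. (9): tanh-based approximation.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1.0 + math.tanh(inner))

# Print both forms at a few arbitrary points for comparison.
for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
```

Unlike ReLU, both forms return small negative values for moderately negative inputs instead of clipping them to zero.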

Finally, ViT realizes data transfer between the MSA module and MLP module through residual connections. In addition, residual connections play a great role in the transfer of features from the bottom to the top and effectively avoid the disappearance of the gradient.

For this step, as described in the previous section, the one-dimensional input embeddings $z_{0} = [z_{0}^{0}; z_{0}^{1}; \dots; z_{0}^{N}]$ are formed by patch embeddings, position embeddings, and learnable embeddings. Then, they are fed into the Transformer encoder, and the calculation process is shown in Equations (10)–(12). Finally, the learnable class token $z_{L}^{0}$ is extracted as the overall feature, which is input into the MLP classifier to implement the fault classification task. The MLP classifier is a simple neural network composed of an FC layer. Notably, the Base/16 version of ViT is chosen as the basic model according to the influence of the number of attention heads on the behavior of the model. The structure and main hyperparameters of the proposed ISCV-ViT are shown in Table 1. The data information flowchart of the ISCV-ViT model is shown in Figure 5.

$$z_{l}^{'} = \mathrm{MSA}(\mathrm{LN}(z_{l - 1})) + z_{l - 1}, \quad l = 1, \dots, L \tag{10}$$

$$z_{l} = \mathrm{MLP}(\mathrm{LN}(z_{l}^{'})) + z_{l}^{'}, \quad l = 1, \dots, L \tag{11}$$

$$y = \mathrm{LN}(z_{L}^{0}) \tag{12}$$

where $z_{l}^{'}$ is the output of the MSA module in the $l$th Transformer block, $z_{l}$ is the output of the $l$th Transformer block after its MSA and MLP operations, $L$ is the number of stacked Transformer blocks, which can be chosen freely, and $y$ is the output of the Transformer encoder.
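The data flow of Equations (10) to (12) can be traced in a compact NumPy sketch. It is illustrative only and makes several simplifying assumptions: dropout and the learnable scale and shift of layer normalization are omitted, the weights are random, the tanh form of GELU is used, and the sizes are toy values rather than those of Table 1.

```python
import math
import numpy as np

def layer_norm(z, eps=1e-6):
    # LN without learnable scale/shift, to keep the sketch short.
    mean = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mean) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

def msa(z, p):
    heads = []
    for W_q, W_k, W_v in zip(p["W_Q"], p["W_K"], p["W_V"]):
        q, k, v = z @ W_q, z @ W_k, z @ W_v
        heads.append(softmax(q @ k.T / math.sqrt(q.shape[-1])) @ v)
    return np.concatenate(heads, axis=-1) @ p["W_O"]

def mlp(z, p):
    # First FC expands to 4 * d_model, second FC restores d_model.
    return gelu(z @ p["W_1"] + p["b_1"]) @ p["W_2"] + p["b_2"]

def encoder(z_0, blocks):
    z = z_0
    for p in blocks:
        z_prime = msa(layer_norm(z), p) + z          # Eq. (10)
        z = mlp(layer_norm(z_prime), p) + z_prime    # Eq. (11)
    return layer_norm(z[0])                          # Eq. (12): class-token row

def init_block(rng, d_model, h):
    d_k = d_model // h
    w = lambda *shape: rng.normal(scale=0.02, size=shape)
    return {"W_Q": w(h, d_model, d_k), "W_K": w(h, d_model, d_k),
            "W_V": w(h, d_model, d_k), "W_O": w(h * d_k, d_model),
            "W_1": w(d_model, 4 * d_model), "b_1": np.zeros(4 * d_model),
            "W_2": w(4 * d_model, d_model), "b_2": np.zeros(d_model)}

rng = np.random.default_rng(0)
N, d_model, h, L = 16, 32, 4, 2                      # toy sizes, not Table 1 values
z_0 = rng.normal(size=(N + 1, d_model))
y = encoder(z_0, [init_block(rng, d_model, h) for _ in range(L)])
```

Only the class-token row is normalized and returned, matching the use of $z_{L}^{0}$ as the overall feature passed to the classifier.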

3.2.4. Training Strategy

During model training, the goal is for the predicted value to approach the actual value as closely as possible. That is, the difference between them needs to be minimized, so the choice of loss function is critical. The process of feeding the input feature $(x)$ into the model to obtain the predicted value $(y)$ is called the forward pass. The process of updating the model parameters according to the difference between the predicted value and the actual value is called the backward pass. In between, the loss function takes the predicted value of the model, calculates the difference, and thereby provides the input for the backward pass. Based on the above analysis and the characteristics of classification problems, the cross-entropy (CE) function in Equation (13) is selected as the loss function of the proposed ISCV-ViT method.

$$\mathrm{Loss}(y_{j i}, \hat{y}_{j i}) = - \frac{1}{k} \sum_{j = 1}^{k} \sum_{i = 1}^{n} \left[y_{j i} \log \hat{y}_{j i} + (1 - y_{j i}) \log(1 - \hat{y}_{j i})\right] \tag{13}$$

where

y_{j i}

are the actual values,

{\hat{y}}_{j i}

are the predicted values,

k

is the number of training samples, and

n

is the number of categories. When the D-value

{L (y}_{j i}, \hat{y_{j i}})

has been calculated, the Adam optimizer [48] is used to update learnable parameters in the backwards pass. To achieve the best iteration effect, the mean and variance of the gradient are included in the update calculation of the step size. This is exactly what Adam does when he combines the advantages of the SGDM and RMSProp [49] optimization algorithms. The final calculation of the learnable parameter

θ_{t}

is as follows:

$$\theta_{t} = \theta_{t-1} - lr \cdot \frac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \varepsilon} \tag{14}$$

where $\theta_{t-1}$ is the learnable parameter obtained from the $(t-1)$th update, $lr$ is the learning rate (step size), $\hat{m}_{t}$ and $\hat{v}_{t}$ are the bias-corrected first-order and second-order moment estimates of the gradient, respectively, and $\varepsilon = 10^{-8}$ is used to avoid division by zero.
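As a rough illustration of Equations (13) and (14), the sketch below evaluates the loss for a small batch and applies Adam updates to a single scalar parameter. The learning rate follows Table 2; the decay rates `beta1 = 0.9` and `beta2 = 0.999` are the defaults suggested in [48] and are not stated in this paper. The function names, the toy labels and probabilities, and the quadratic toy objective are hypothetical.

```python
import math

def loss_eq13(y_true, y_pred, tiny=1e-12):
    """Equation (13): per-class binary cross-entropy summed over n classes, averaged over k samples."""
    k = len(y_true)
    total = 0.0
    for y_row, p_row in zip(y_true, y_pred):
        for y, p in zip(y_row, p_row):
            p = min(max(p, tiny), 1.0 - tiny)  # clip so that log() stays finite
            total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / k

def adam_step(theta, grad, m, v, t, lr=2e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update ending in Equation (14); t is the 1-based step index."""
    m = beta1 * m + (1.0 - beta1) * grad         # first-order moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad  # second-order moment estimate
    m_hat = m / (1.0 - beta1 ** t)               # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)  # Equation (14)
    return theta, m, v

# Toy batch: k = 2 samples, n = 3 categories, one-hot labels (hypothetical values).
y_true = [[1, 0, 0], [0, 1, 0]]
y_pred = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]
print(loss_eq13(y_true, y_pred))

# Toy objective f(theta) = (theta - 1)^2 with gradient 2 * (theta - 1).
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2.0 * (theta - 1.0), m, v, t)
print(theta)
```

Because $\hat{m}_{t}$ is divided by $\sqrt{\hat{v}_{t}}$, the first update has a magnitude close to $lr$ regardless of the scale of the gradient.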

4. Experimental Validation

To demonstrate the applicability and accuracy of the proposed ISCV-ViT method, two case studies are performed for experimental verification. One uses the public bearing dataset of Paderborn University (PU) [50], and the other uses the laboratory bearing dataset of Shenzhen Technology University (SZTU). The diagnostic model is then evaluated according to the experimental results. To assess the signal processing step, time-domain images of the single-phase current signal are extracted and compared with the proposed ISCV time-domain images. In addition, to illustrate the superiority of the ISCV-ViT model at the level of the diagnosis algorithm, comparative experiments are conducted with the standard ViT and with ResNet50, a network with residual connections. The hyperparameter settings of the three benchmark models are shown in Table 2.

4.1. Case 1: Public Bearing Dataset of Paderborn University (PU)

4.1.1. Dataset and Experimental Apparatus

At present, most bearing fault datasets, both domestic and international, are based on vibration signals, so few datasets contain current signals. The widely used public bearing dataset PU, also known as the condition monitoring (CM) experimental bearing dataset, collects vibration and motor current signals synchronously. The test rig is shown in Figure 6 and is divided into several modules according to function: the motor, torque measuring shaft, rolling bearing test module, flywheel, and load motor. In the rolling bearing test module, the dataset is generated by installing bearings with different damage types. All the test bearings are ball bearings of type 6203. Among them, there are 12 artificially damaged bearings (7 outer-race faults and 5 inner-race faults), 14 bearings damaged in accelerated life tests (5 outer-race faults, 6 inner-race faults, and 3 composite faults), and 6 healthy bearings. The motor phase current is measured at the output of the frequency converter with a LEM CKSR 15-NP current transducer and is then converted from an analog signal to a digital signal at a sampling rate of 64 kHz. According to the operating parameters, four working conditions are set; two of them, shown in Table 3, are selected for the experiments according to the verification requirements. Data for each working condition were collected 20 times, for 4 s each time, and saved as MATLAB files whose names consist of the experimental condition and the bearing code (such as N15_M01_F10_K001_1.mat).
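The file-naming convention described above can be decoded programmatically. The sketch below is an illustration only: the helper name `parse_pu_filename` and the returned field names are hypothetical, the scaling of each code (N15 to 1500 rpm, M07 to 0.7 N·m, F10 to 1000 N) is inferred from Table 3, and reading the trailing number as a measurement index is an assumption.

```python
import re

# Pattern inferred from the example name N15_M01_F10_K001_1.mat (an assumption).
_PU_NAME = re.compile(r"^N(\d{2})_M(\d{2})_F(\d{2})_([A-Z]+\d+)_(\d+)\.mat$")

def parse_pu_filename(name):
    """Split a PU file name into operating condition, bearing code, and measurement index."""
    match = _PU_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    speed, torque, force, bearing, run = match.groups()
    return {
        "speed_rpm": int(speed) * 100,       # N15 -> 1500 rpm
        "load_torque_nm": int(torque) / 10,  # M01 -> 0.1 N·m
        "radial_force_n": int(force) * 100,  # F10 -> 1000 N
        "bearing_code": bearing,             # e.g. K001 or KA30
        "measurement": int(run),             # assumed repetition index
    }

info = parse_pu_filename("N15_M01_F10_K001_1.mat")
print(info)
```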

4.1.2. Experimental Setup of Case 1

To evaluate the performance of the proposed ISCV-ViT model, three groups of experiments are conducted on the public dataset PU: ISCV-ViT, ViT, and ISCV-ResNet50. Since the current signal is sensitive to the load, two working conditions with different load torques, N15_M01_F10 and N15_M07_F10, are selected. For each working condition, five fault categories from the accelerated life test are selected in addition to the healthy category; the details are shown in Table 4. Each sample consists of 25,600 data points, and 200 samples are extracted from each category. The resulting 1200 samples are divided into a training set and a validation set at a ratio of 4:1. To reduce the influence of random variation, each experiment is run for 200 epochs and repeated three times.
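The sample counts in this setup can be checked with a short script. The sketch assumes non-overlapping windows and a random split within each category, neither of which is stated in the text; `segment_starts` and `split_indices` are hypothetical helper names.

```python
import random

SAMPLING_RATE_HZ = 64_000   # PU current sampling rate
RECORD_SECONDS = 4          # duration of one measurement
RECORDS_PER_BEARING = 20    # repetitions per working condition
SAMPLE_LENGTH = 25_600      # data points per sample
NUM_CATEGORIES = 6          # healthy + five fault categories
SAMPLES_PER_CATEGORY = 200

def segment_starts(record_length, sample_length):
    """Start indices of non-overlapping windows (assumed segmentation scheme)."""
    return list(range(0, record_length - sample_length + 1, sample_length))

def split_indices(num_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and validation parts."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(num_samples * train_fraction))
    return indices[:cut], indices[cut:]

record_length = SAMPLING_RATE_HZ * RECORD_SECONDS  # points in one 4 s record
windows_per_record = len(segment_starts(record_length, SAMPLE_LENGTH))
available = windows_per_record * RECORDS_PER_BEARING  # windows available per category
train, val = split_indices(SAMPLES_PER_CATEGORY)
print(windows_per_record, available)
print(NUM_CATEGORIES * len(train), NUM_CATEGORIES * len(val))
```

Under these assumptions each 4 s record yields 10 windows, so the 20 records of one bearing give exactly the 200 samples per category, and the 4:1 split gives 960 training and 240 validation samples. Each window spans 0.4 s, or 10 shaft revolutions at 1500 rpm.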

4.1.3. Evaluation and Analysis of Case 1

Based on the previous analysis, model training is conducted on the public PU dataset with the stated network structure parameters. Figure 7 shows the accuracy curves of the different methods on the public dataset PU. Compared with ISCV-ViT and ISCV-ResNet50, ViT, whose input images are time-domain images of the single-phase MCS that have not been processed by ISCV, performs poorly in terms of both convergence speed and accuracy. Owing to the high level of noise and the complex data components, the highest diagnostic accuracy of ViT is only 69.4%. This supports the feasibility of the ISCV method for preprocessing current signals and is consistent with the observation that reducing the complexity of a dataset can improve the accuracy of a deep learning model [51,52]. Further comparison shows that the accuracy of the ISCV-ViT is the highest, with a large gap between the ISCV-ViT and the others. In addition, ISCV-ViT only starts to stabilize after approximately 150 epochs because the ViT model converges somewhat slowly on small and medium-sized datasets. By comparison, ISCV-ResNet50 converges within approximately 60 epochs. This is expected, since the residual connections and batch normalization used in ResNet50 accelerate the convergence of the network. Meanwhile, the performance of the three models is stable under the two operating conditions with different load settings. These results indicate that the ISCV-ViT is superior to the other two models in both accuracy and generalization.

To further evaluate the diagnostic performance of the ISCV-ViT model quantitatively, confusion matrices are created for the best validation results under the different operating conditions, as shown in Figure 8. The corresponding accuracies are 95.4% for N15_M01_F10 and 97.1% for N15_M07_F10. The confusion matrices show that a small number of OR1 faults are misjudged as IR1 faults, whereas the prediction accuracy for almost all other categories reaches 100%. The same misjudgments occur under both working conditions. Therefore, the classification of IR and OR faults still has room for improvement, and the misjudgment may be related to the damage combination of the faults (Table 4).
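For reference, overall and per-class accuracy can be read from a confusion matrix as sketched below. The 3 × 3 matrix is an invented toy example, not the data behind Figure 8, and the function names are hypothetical.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a confusion matrix (rows = true class)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def per_class_accuracy(cm):
    """Per-class accuracy (recall): diagonal entry divided by its row sum."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]

# Toy matrix with 40 validation samples per class (values invented for illustration).
toy_cm = [
    [40, 0, 0],
    [0, 35, 5],  # five samples of class 1 misjudged as class 2
    [0, 0, 40],
]
print(overall_accuracy(toy_cm))
print(per_class_accuracy(toy_cm))
```

If the reported accuracies are computed over the whole validation set of 240 samples, 95.4% would correspond to 229 correct predictions (229/240 ≈ 0.954) and 97.1% to 233 (233/240 ≈ 0.971).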

Finally, the data from the repeated experiments are analyzed statistically. The mean and standard deviation of the accuracy over the repeated experiments are calculated to describe the reliability of each model, as shown in Figure 9. Generally, the ViT model is superior to the ResNet model in processing temporal sequences because the attention mechanism can capture global characteristics, and the results in Figure 9 support this point. The average accuracy of the proposed ISCV-ViT method reaches 95.40% and 96.60% for the two conditions, which is higher than that of the other methods. Additionally, the standard deviation of the ISCV-ViT is lower than that of the others, which means that the ISCV-ViT is more stable. Table 5 supplements these results with those reported in [50] on the PU dataset using features extracted from the MCS. Notably, the current signal processing methods adopted there are all frequency-based, such as the fast Fourier transform (FFT), power spectral density (PSD), and wavelet packet decomposition (WPD), and the diagnosis is then performed with machine learning algorithms. The highest accuracy among them is 93.3%, which is lower than the 95.40% of the proposed method.

Table 6 compares the proposed method with four recent current-based mechanical fault diagnosis methods, each evaluated on its own dataset. The results indicate that the proposed method can effectively perform bearing fault diagnosis based on motor current signals. The accuracies of the methods in [53,54,55] are slightly higher than that of the proposed method, which is because they distinguish fewer fault categories. In addition, compound faults and varying degrees of damage are also considered in this paper, as shown in Table 4. In Table 6, M&E refers to mechanical and electrical faults. Generally speaking, diagnosing mechanical faults from the MCS is more difficult than diagnosing electrical faults because electrical faults cause more significant changes in the MCS. Compared with [56], which has the same number of labels, the proposed method achieves a higher accuracy (95.4% versus 92.4%). Based on these comparisons, the proposed method is competitive in bearing fault diagnosis based on the MCS.

4.2. Case 2: Laboratory Bearing Dataset of Shenzhen Technology University (SZTU)

4.2.1. Experimental Apparatus

In addition to the experimental verification on the public PU dataset, a laboratory motor fault diagnosis test rig is used to conduct experiments and collect data. The test rig can provide fault data for motors and bearings and mainly consists of the following modules: motor, frequency converter, rolling bearing, magnetic particle brake, and junction block. The details of the test rig are shown in Figure 10. For each fault category, the corresponding bearing is installed, and the two-phase current signal is acquired from the frequency converter at a sampling frequency of 40.96 kHz. All the test bearings are of type UCPH206. In addition to the healthy bearing, there are five types of faulty bearings, which are described in Table 7. The load torque of the magnetic particle brake can be adjusted between 0 and 50 N·m through the braking current, since the torque is linear in the current. A total of four groups of experiments with the same rotational speed and different loads are designed, as shown in Table 8. To ensure sufficient sample points, each group is recorded for 12 s and repeated 20 times. Finally, the data are saved as MATLAB files (such as FBM_1_C.mat).

4.2.2. Experimental Setup of Case 2

To demonstrate the generalization ability of the proposed ISCV-ViT method, the comparative experiments conducted on the public dataset PU are repeated on the laboratory test rig. According to the sampling frequency and the speed setting, 20,480 data points are taken as a sample. The other experimental settings are consistent with Case 1: 200 samples are extracted from each category, the training set and validation set are divided at a ratio of 4:1, and each experiment is run for 200 epochs and repeated three times.
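One plausible reading of the sample length of 20,480 points, offered here as an interpretation rather than a statement from the authors, is that each sample covers a whole number of shaft revolutions. With the sampling frequency of 40.96 kHz and the speed of 1200 rpm from Table 8:

$$T_{sample} = \frac{20480}{40960\ \mathrm{Hz}} = 0.5\ \mathrm{s}, \qquad N_{rev} = 0.5\ \mathrm{s} \times \frac{1200}{60}\ \mathrm{s^{-1}} = 10$$

The same calculation for Case 1 (25,600 points at 64 kHz and 1500 rpm) gives 0.4 s and again 10 revolutions per sample. Each 12 s recording holds $12 \times 40960 = 491520$ points, which corresponds to 24 non-overlapping windows of this length.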

4.2.3. Results Analysis of Case 2

For the laboratory dataset, four sets of experiments are conducted with the load adjusted from 0 N·m to a maximum of 50 N·m to simulate actual operating conditions as closely as possible. As in Case 1, the corresponding accuracy curves are plotted in Figure 11. The accuracy of the proposed method is higher than that of the other methods under all four operating conditions. In particular, Figure 11c shows that the convergence speed of ISCV-ViT is almost the same as that of ISCV-ResNet50. Furthermore, the confusion matrices in Figure 12 show that there are some misjudgments between the FBO and FBB faults on the laboratory dataset; the reasons for this need further research. Finally, Figure 13 shows the statistical results of all trials. The accuracy of the ISCV-ViT is again higher than that of the other methods under the four operating conditions. Notably, the performance of the proposed ISCV-ViT method does not decrease, and even slightly improves, as the load increases, which is consistent with the results on the PU dataset in Case 1. In addition, ISCV-ViT has the highest average accuracy of 94.87% and the lowest standard deviation. The validation accuracy on the laboratory dataset is slightly lower than that on the public dataset PU, but it remains high, so we conclude that the proposed method offers good generalization ability and accuracy.

5. Conclusions

In this paper, a concise processing method, ISCV, is proposed for current signals based on the theory of effective (root-mean-square) values. In addition, the ViT, which builds on the Transformer that has been successful in the field of NLP, is applied to bearing fault diagnosis. On this basis, the ISCV-ViT method for bearing fault diagnosis based on motor current signals is proposed.

The ISCV-ViT has the following characteristics: (1) The current signal is easy to obtain and does not require additional sensor costs. This is very meaningful for fault diagnosis of complex and precise equipment, especially in industrial applications. (2) Compared with the current popular time-frequency characteristic analysis methods for motor current signals, the proposed ISCV method is more concise and effectively avoids some complex noise removal processes. (3) Aiming to address the problem of timing characteristic recognition of the current signal, the ViT model that is applied can more accurately capture global features that require attention due to the multihead self-attention (MSA) mechanism. Meanwhile, the parallel computing architecture greatly reduces the network size.

Furthermore, the feasibility and generalization of the proposed method are verified on the public dataset PU and the laboratory dataset SZTU. In subsequent research, we will further optimize the model to improve its convergence speed and accuracy. In addition, we believe that fusing frequency features with the ISCV can further improve model accuracy.

Author Contributions

Conceptualization, F.C. and X.Z.; Methodology, X.Z.; Validation, Z.Q.; Formal analysis, Z.Y.; Investigation, Z.Y.; Resources, Z.Q.; Writing—original draft, F.C. and X.Z.; Writing—review & editing, F.C., X.Z. and B.X.; Supervision, F.C.; Project administration, B.X.; Funding acquisition, F.C. and B.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Key Research and Development Program of China, grant number 2022YFF0610400; the Guangdong Province Key Construction Discipline Scientific Research Capacity Improvement Project, grant number 2022ZDJS114; the School-enterprise Cooperation Research Foundation of Shenzhen Technology University for Graduate Student, grant number XQHZ202302; the Shenzhen UAV Test Public Service Platform and Low-altitude Economic Integration and Innovation Research Center, grant number 29853MKCJ202300205.

Institutional Review Board Statement

The study did not require ethical approval.

Informed Consent Statement

The study did not involve humans.

Data Availability Statement

Data available on request due to restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

[1] Sheikh, M.A.; Bakhsh, S.T.; Irfan, M.; Nor, N.B.; Nowakowski, G. A review to diagnose faults related to three-phase industrial induction motors. J. Fail. Anal. Prev. 2022, 22, 1546–1557.
[2] Peng, B.; Bi, Y.; Xue, B.; Zhang, M.J.; Wan, S.T. A survey on fault diagnosis of rolling bearings. Algorithms 2022, 15, 347.
[3] Avina-Corral, V.; Rangel-Magdaleno, J.; Morales-Perez, C.; Hernandez, J. Bearing fault detection in adjustable speed drive-powered induction machine by using motor current signature analysis and goodness-of-fit tests. IEEE Trans. Ind. Inform. 2021, 17, 8265–8274.
[4] Becker, V.; Schwamm, T.; Urschel, S.; Antonino-Daviu, J.A. Two current-based methods for the detection of bearing and impeller faults in variable speed pumps. Energies 2021, 14, 4514.
[5] Lv, R. Research on Asynchronous Motor Rotor Fault Diagnosis Technology Based on Stator Current Characteristic Analysis; Xi’an University of Science and Technology: Xi’an, China, 2021.
[6] Chen, W. Research on Fault Diagnosis Method of Asynchronous Motor Based on Deep Transfer Learning; China University of Mining and Technology: Xuzhou, China, 2022.
[7] Long, Z.; Zhang, X.F.; Zhang, L.; Qin, G.J.; Huang, S.D.; Song, D.Y.; Shao, H.D.; Wu, G.P. Motor fault diagnosis using attention mechanism and improved adaboost driven by multi-sensor information. Measurement 2021, 170, 108718.
[8] Zhu, X. Faults Detection of Locomotive Traction System by Modified Wavelet Bispectrum Motor Current Signature Analysis; Harbin University of Science and Technology: Harbin, China, 2020.
[9] Kim, S.J.; Kim, K.; Hwang, T.; Park, J.; Jeong, H.; Kim, T.; Youn, B.D. Motor-current-based electromagnetic interference de-noising method for rolling element bearing diagnosis using acoustic emission sensors. Measurement 2022, 193, 110912.
[10] Ali, M.Z.; Shabbir, M.; Zaman, S.M.K.; Liang, X.D. Single- and multi-fault diagnosis using machine learning for variable frequency drive-fed induction motors. IEEE Trans. Ind. Appl. 2020, 56, 2324–2337.
[11] Neupane, D.; Seok, J. Bearing fault detection and diagnosis using case western reserve university dataset with deep learning approaches: A review. IEEE Access 2020, 8, 93155–93178.
[12] Shah, D.S.; Patel, V.N. A review of dynamic modeling and fault identifications methods for rolling element bearing. In Proceedings of the 2nd International Conference on Innovations in Automation and Mechatronics Engineering (ICIAME), Vallabh Vidyanagar, India, 7–8 March 2014; pp. 447–456.
[13] Godoy, W.F.; Morinigo-Sotelo, D.; Duque-Perez, O.; da Silva, I.N.; Goedtel, A.; Palacios, R.H.C. Estimation of bearing fault severity in line-connected and inverter-fed three-phase induction motors. Energies 2020, 13, 3481.
[14] Song, L. Research on Fault Diagnosis of Planet Bearings Based on Analysis of Vibration and Motor Current; East China Jiaotong University: Nanchang, China, 2021.
[15] Wang, X.B.; Luo, L.Q.; Tang, L.L.; Yang, Z.X. Automatic representation and detection of fault bearings in in-wheel motors under variable load conditions. Adv. Eng. Inform. 2021, 49, 101321.
[16] Li, H.Y.; Feng, G.J.; Zhen, D.; Gu, F.S.; Ball, A.D. A normalized frequency-domain energy operator for broken rotor bar fault diagnosis. IEEE Trans. Instrum. Meas. 2021, 70, 3500110.
[17] Wang, W.D.; Song, X.J.; Liu, G.H.; Chen, Q.; Zhao, W.X.; Zhu, H.Y. Induction motor broken rotor bar fault diagnosis based on third-order energy operator demodulated current signal. IEEE Trans. Energy Convers. 2022, 37, 1052–1059.
[18] Kabul, A.; Unsal, A. Diagnosis of multiple faults of an induction motor based on Hilbert envelope analysis. Metrol. Meas. Syst. 2022, 29, 191–205.
[19] Naha, A.; Samanta, A.K.; Routray, A.; Deb, A.K. A method for detecting half-broken rotor bar in lightly loaded induction motors using current. IEEE Trans. Instrum. Meas. 2016, 65, 1614–1625.
[20] Samanta, A.K.; Naha, A.; Routray, A.; Deb, A.K. Fast and accurate spectral estimation for online detection of partial broken bar in induction motors. Mech. Syst. Signal Process. 2018, 98, 63–77.
[21] Singh, G.; Naikan, V.N.A. Detection of half broken rotor bar fault in VFD driven induction motor drive using motor square current MUSIC analysis. Mech. Syst. Signal Process. 2018, 110, 333–348.
[22] Ma, H. Research on Fault Diagnosis of High Voltage Disconnector Mechanism Based on Motor Stator Current Characteristics; University of Science and Technology Beijing: Beijing, China, 2021.
[23] Bloedt, M.; Granjon, P.; Raison, B.; Rostaing, G. Models for bearing damage detection in induction motors using stator current monitoring. IEEE Trans. Ind. Electron. 2008, 55, 1813–1822.
[24] Becker, V.; Schwamm, T.; Urschel, S.; Antonino-Daviu, J. Fault detection of circulation pumps on the basis of motor current evaluation. IEEE Trans. Ind. Appl. 2021, 57, 4617–4624.
[25] Singh, S.; Kumar, N. Detection of bearing faults in mechanical systems using stator current monitoring. IEEE Trans. Ind. Inform. 2017, 13, 1341–1349.
[26] Avina-Corral, V.; Rangel-Magdaleno, J.D.; Peregrina-Barreto, H.; Ramirez-Cortes, J.M. Bearing fault detection in ASD-powered induction machine using MODWT and image edge detection. IEEE Access 2022, 10, 24181–24193.
[27] Jimenez-Guarneros, M.; Morales-Perez, C.; Rangel-Magdaleno, J.D. Diagnostic of combined mechanical and electrical faults in ASD-powered induction motor using MODWT and a lightweight 1-D CNN. IEEE Trans. Ind. Inform. 2022, 18, 4688–4697.
[28] Zhang, X.T.; Hu, Y.H.; Deng, J.M.; Xu, H.; Wen, H.Q. Feature engineering and artificial intelligence-supported approaches used for electric powertrain fault diagnosis: A review. IEEE Access 2022, 10, 29069–29088.
[29] Liang, X.B.; Yao, J.Y.; Zhang, W.F.; Wang, Y.R. A novel fault diagnosis of a rolling bearing method based on variational mode decomposition and an artificial neural network. Appl. Sci. 2023, 13, 3413.
[30] Shifat, T.A.; Yasmin, R.; Hur, J.W. A data driven RUL estimation framework of electric motor using deep electrical feature learning from current harmonics and apparent power. Energies 2021, 14, 3156.
[31] Kerboua, A.; Kelaiaia, R. Fault diagnosis in an asynchronous motor using three-dimensional convolutional neural network. Arab. J. Sci. Eng. 2023, 19.
[32] Zhang, K.L.; Li, H.K.; Cao, S.X.; Yang, C.; Sun, F.B.; Wang, Z.B. Motor current signal analysis using hypergraph neural networks for fault diagnosis of electromechanical system. Measurement 2022, 201, 111697.
[33] Park, C.H.; Kim, H.; Lee, J.; Ahn, G.; Youn, M.; Youn, B.D. A feature inherited hierarchical convolutional neural network (FI-HCNN) for motor fault severity estimation using stator current signals. Int. J. Precis. Eng. Manuf.-Green Technol. 2021, 8, 1253–1266.
[34] Zhao, M.H.; Fu, X.Y.; Zhang, Y.J.; Meng, L.H.; Tang, B.P. Highly imbalanced fault diagnosis of mechanical systems based on wavelet packet distortion and convolutional neural networks. Adv. Eng. Inform. 2022, 51, 101535.
[35] Li, D.C.; Zhang, M.; Kang, T.B.; Li, B.; Xiang, H.B.; Wang, K.S.; Pei, Z.L.; Tang, X.Y.; Wang, P. Fault diagnosis of rotating machinery based on dual convolutional-capsule network (DC-CN). Measurement 2022, 187, 110258.
[36] Zhao, X.L.; Yao, J.Y.; Deng, W.X.; Ding, P.; Zhuang, J.C.; Liu, Z. Multiscale deep graph convolutional networks for intelligent fault diagnosis of rotor-bearing system under fluctuating working conditions. IEEE Trans. Ind. Inform. 2023, 19, 166–176.
[37] Zhao, X.L.; Yao, J.Y.; Deng, W.X.; Ding, P.; Ding, Y.F.; Jia, M.P.; Liu, Z. Intelligent fault diagnosis of gearbox under variable working conditions with adaptive intraclass and interclass convolutional neural network. IEEE Trans. Neural Netw. Learn. Syst. 2022. Early Access.
[38] Zhao, X.L.; Yao, J.Y.; Deng, W.X.; Jia, M.P.; Liu, Z. Normalized conditional variational auto-encoder with adaptive focal loss for imbalanced fault diagnosis of bearing-rotor system. Mech. Syst. Signal Process. 2022, 170, 108826.
[39] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017.
[40] Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16 × 16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations, Virtual Event, 3–7 May 2021.
[41] Ding, Y.F.; Jia, M.P.; Miao, Q.H.; Cao, Y.D. A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings. Mech. Syst. Signal Process. 2022, 168, 108616.
[42] Tang, X.Y.; Xu, Z.B.; Wang, Z.G. A novel fault diagnosis method of rolling bearing based on integrated vision Transformer model. Sensors 2022, 22, 3878.
[43] He, Q.C.; Li, S.B.; Bai, Q.; Zhang, A.S.; Yang, J.; Shen, M.M. A siamese vision Transformer for bearings fault diagnosis. Micromachines 2022, 13, 1656.
[44] Wu, G.G.; Yan, T.Y.; Yang, G.L.; Chai, H.Q.; Cao, C.C. A review on rolling bearing fault signal detection methods based on different sensors. Sensors 2022, 22, 8330.
[45] He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 27–30 June 2016; pp. 770–778.
[46] Saucedo-Dorantes, J.J.; Zamudio-Ramirez, I.; Cureno-Osornio, J.; Osornio-Rios, R.A.; Antonino-Daviu, J.A. Condition monitoring method for the detection of fault graduality in outer race bearing based on vibration-current fusion, statistical features and neural network. Appl. Sci. 2021, 11, 8033.
[47] Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019), Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186.
[48] Kingma, D.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
[49] Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747.
[50] Lessmeier, C.; Kimotho, J.K.; Zimmer, D.; Sextro, W. Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data set for data-driven classification. In Proceedings of the European Conference of the Prognostics and Health Management Society, Chengdu, China, 19–21 October 2016.
[51] Kabir, H.; Garg, N. Machine learning enabled orthogonal camera goniometry for accurate and robust contact angle measurements. Sci. Rep. 2023, 13, 1497.
[52] Bolon-Canedo, V.; Remeseiro, B. Feature selection in image analysis: A survey. Artif. Intell. Rev. 2020, 53, 2905–2931.
[53] Sun, M.D.; Wang, H.; Liu, P.; Long, Z.; Yang, J.T.; Huang, S.D. A novel data-driven mechanical fault diagnosis method for induction motors using stator current signals. IEEE Trans. Transp. Electrif. 2023, 9, 347–358.
[54] Toma, R.N.; Prosvirin, A.E.; Kim, J.M. Bearing fault diagnosis of induction motors using a genetic algorithm and machine learning classifiers. Sensors 2020, 20, 1884.
[55] Hoang, D.T.; Kang, H.J. A motor current signal-based bearing fault diagnosis using deep learning and information fusion. IEEE Trans. Instrum. Meas. 2020, 69, 3325–3333.
[56] Piedad, E.; Chen, Y.T.; Chang, H.C.; Kuo, C.C. Frequency occurrence plot-based convolutional neural network for motor fault diagnosis. Electronics 2020, 9, 1711.

Figure 1. The structure of the Transformer encoder.

Figure 2. The schematic calculation process of self-attention.

Figure 3. The overall architecture of the proposed ISCV-ViT method.

Figure 4. Normalized current signal time domain images: (a) Single-phase current; (b) two-phase current processed by ISCV.

Figure 5. Flowchart of the ISCV-ViT model.

Figure 6. The test-rig of the bearing dataset of Paderborn University (PU).

Figure 7. Accuracy curves for different methods on the public dataset PU: (a) N15_M01_F10; (b) N15_M07_F10.

Figure 8. Confusion matrices for the best results of ISCV-ViT on the public dataset PU: (a) N15_M01_F10 (95.4%); (b) N15_M07_F10 (97.1%).

Figure 9. Accuracy and standard deviation of different methods under different conditions on the public dataset PU.

Figure 10. The composition of the motor fault diagnosis test rig.

Figure 11. Accuracy curves for different methods on the laboratory dataset SZTU: (a) Condition 1, (b) Condition 2, (c) Condition 3, and (d) Condition 4.

Figure 12. Confusion matrices for the best results of ISCV-ViT on the laboratory dataset SZTU: (a) Condition 1 (92.9%); (b) Condition 2 (93.3%); (c) Condition 3 (95.4%); (d) Condition 4 (94.2%).

Figure 13. Accuracy and standard deviation of different methods under different conditions on the laboratory dataset SZTU.

Table 1. The adjustable hyperparameter settings of the ISCV-ViT.

	Value
Input image size	$[224, 224, 3]$
Number of Transformer blocks $L$	12
Embedding dimension $d_{model}$	64
Number of attention heads $h$	12
Position encoding	1D

Table 2. The hyperparameter settings of the three benchmark models.

Model	Hyperparameter
ISCV-ViT	Max epochs = 200
	Batch size = 32
	Optimizer = Adam (lr = 2 × 10⁻³)
ViT	Max epochs = 200
	Batch size = 32
	Optimizer = Adam (lr = 2 × 10⁻³)
ISCV-ResNet50	Max epochs = 200
	Batch size = 32
	Optimizer = Adam (lr = 2 × 10⁻³)

Table 3. Experimental conditions of case 1.

No.	Speed (rpm)	Load (N·m)	Radial Force (N)	Condition
0	1500	0.7	1000	N15_M07_F10
1	1500	0.1	1000	N15_M01_F10

Table 4. Bearing fault categories of the experiments.

Label	Name	Fault Category	Extent of Damage	Combination
0	K001	Normal	-	-
1	KA30	OR1 (outer ring)	1	R (repetitive damage)
2	KB23	IROR	2	M (multiple damage)
3	KI04	IR1 (inner ring)	1	M
4	KA04	OR2	1	S (single damage)
5	KI16	IR2	3	S

Table 5. Performance of various algorithms based on MCS.

Algorithm	Accuracy (%)
RF	83.3
BT	81.7
Ensemble	93.3
ISCV-ViT	95.4

Table 6. Comparison between the proposed method and relevant works.

Method	Class	Type	Accuracy
ViT	5	M	51.8%
ISCV-ResNet50	5	M	81.9%
ISCV-ViT	5	M	95.4%
[53]	4	M	96.3%
[54]	3	M	97%
[55]	3	M	97.2%
[56]	5	M&E	92.4%

Table 7. Bearing fault categories.

Label	Bearing Code	Fault Location	Damage Description
0	Normal	None	Healthy
1	FBB	Rolling element	Artificial damage, 3 mm peeling pit on bearing ball
2	FBC	Compound fault (IR + OR)	Artificial damage, 2 mm cracks in inner & outer-race
3	FBI	Inner-race (IR)	Artificial damage, 2 mm crack in inner-race
4	FBO	Outer-race (OR)	Artificial damage, 2 mm crack in outer-race
5	TBW	Retainer	Artificial damage, broken bearing retainer

Table 8. Experimental conditions of case 2.

Speed (rpm)	Braking Current (A)	Name
1200	0	Condition 1
	0.31	Condition 2
	0.62	Condition 3
	0.94	Condition 4

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
