Article

Fault Diagnosis of Induction Motors under Limited Data for across Loading by Residual VGG-Based Siamese Network

Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei 10607, Taiwan
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(19), 8949; https://doi.org/10.3390/app14198949
Submission received: 24 August 2024 / Revised: 30 September 2024 / Accepted: 2 October 2024 / Published: 4 October 2024
(This article belongs to the Collection Modeling, Design and Control of Electric Machines: Volume II)

Abstract
This study proposes an improved few-shot learning model, the residual Visual Geometry Group (VGG)-based Siamese network. Combined with a time–frequency domain transformation technique, this model effectively enhances the performance of across-load fault diagnosis for induction motors under limited-data conditions. The proposed residual VGG-based Siamese network consists of two primary components: the feature extraction network, which is the residual VGG, and the merged similarity layer. First, the residual VGG architecture utilizes residual learning to boost learning efficiency and mitigate the degradation problem typically associated with deep neural networks. The employment of smaller convolutional kernels substantially reduces the number of model parameters, expedites model convergence, and curtails overfitting. Second, the merged similarity layer incorporates multiple distance metrics for similarity measurement to enhance classification performance. For cross-domain fault diagnosis in induction motors, we developed experimental models representing four common types of faults. We measured the vibration signals from both healthy and faulty models under varying loads. We then applied the proposed model to evaluate and compare its effectiveness in cross-domain fault diagnosis against conventional AI models. Experimental results indicate that when the imbalance ratio reached 20:1, the average accuracy of the proposed residual VGG-based Siamese network for fault diagnosis across different loads was 98%, closely matching the accuracy achieved with balanced and sufficient datasets and significantly surpassing the diagnostic performance of other models.

1. Introduction

The demand for motors has been steadily increasing due to the advancements in industrial automation, necessitating reliable long-term operation. Consequently, the significance of motor operation monitoring systems has become paramount. Historically, motor fault diagnosis has been segmented into three primary phases. Initially, physical signals from the motor, such as voltage, current, temperature, and vibration, are captured by sensors and stored in a database. Subsequently, features are extracted from these captured signals using various discrete signal processing techniques or time–frequency analyses, including empirical mode decomposition (EMD), variational mode decomposition (VMD), Hilbert transform, and wavelet packet transform (WPT). The final phase involves performing motor fault diagnosis based on these extracted features [1,2,3]. However, a fundamental drawback of these traditional fault diagnosis methods is the requirement to consider denoising, filtering, and feature extraction during design. This requirement significantly increases the complexity of the diagnostic process. Furthermore, these methods do not support the automatic diagnosis of motor faults, presenting a substantial limitation in practical applications [4].
In recent years, artificial intelligence (AI) has been extensively applied in motor fault diagnosis technologies, with numerous model architectures and signal processing techniques being successively proposed, all achieving high diagnostic performance. Shao et al. utilized the deep belief network to diagnose induction motor faults [5]. Hoang and Kang developed a motor-bearing fault diagnosis method that combines deep learning with information fusion [6]. Further, Shao et al. transformed one-dimensional vibration signals into two-dimensional time–frequency images, which were subsequently processed using a deep convolutional neural network (CNN) to detect faults in induction motors [7]. Wang et al. introduced a one-dimensional CNN model that includes multiple convolutional feature extraction modules to identify healthy, demagnetization fault, and bearing fault states [8]. Additionally, Shifat and Hur employed current and vibration signals as inputs to an artificial neural network to detect faults in brushless DC (BLDC) motors [9]. Although these methods can achieve effective motor fault diagnosis, they require a substantial amount of training data and a balanced dataset to ensure satisfactory diagnostic outcomes [10]. However, in real-world industrial settings, motors predominantly operate in healthy states, and the data acquisition systems primarily collect data from healthy motors [11]. Consequently, motor fault data under varying operational conditions are exceedingly scarce, leading to severe imbalances in the training data [12]. Imbalanced training data cause overfitting during model training, and classification accuracy for minority classes with few samples is degraded [13]. As a result, the trained deep learning models often lack generalization and practical utility. Moreover, labeling fault data under different working conditions demands significant human resources and incurs high economic costs [14].
These challenges substantially diminish the efficacy of intelligent motor fault diagnosis systems.
As mentioned earlier, in the induction motor fault diagnosis field, data are typically collected from normal operations and greatly outnumber the fault data, leading to a highly imbalanced dataset. This imbalance in training data makes it difficult for classifiers to identify different fault categories accurately [15]. The higher the imbalance ratio, the more significant the decline in the diagnostic effectiveness of the classifier, especially in terms of accuracy in identifying fault categories [16]. Therefore, addressing data imbalance is crucial for improving fault diagnosis accuracy. Common methods such as over-sampling and down-sampling are employed to manage data imbalance effectively [17]. To address these challenges, numerous studies have introduced AI-based over-sampling and down-sampling techniques that facilitate efficient model training with limited data while ensuring effective convergence [18]. Over-sampling involves data augmentation for classes with fewer samples, subsequently training the classification model. Conversely, down-sampling allows the model to rapidly converge during training with limited data, reflecting the model’s characteristics and achieving precise classification outcomes. In the context of over-sampling, Zhang et al. utilized a deep convolutional generative adversarial network (DCGAN) to generate synthetic fault data, thereby increasing the training sample size for the classification model. This was followed by employing a residual connected convolutional neural network (RCCNN) for feature extraction and classification of stator current data to facilitate motor fault diagnosis. The effectiveness of the RCCNN was then validated using both the original and the DCGAN-generated data [19]. Hang et al. introduced a principal component analysis two-step clustering synthetic minority over-sampling algorithm combined with random forest (PCA-TS-SMOTE-RF) for motor bearing fault diagnosis in high-dimensional imbalanced data [20]. 
Furthermore, Bal and Kumar introduced a weighted regularization extreme learning machine (WR-ELM) model specifically designed for imbalanced fault data [21]. Fan et al. applied the synthetic minority over-sampling technique (SMOTE) to ensure fault samples were sufficient and balanced, thus enhancing the fault diagnosis accuracy of the classification model [22]. Chang et al. combined a DCGAN with a CNN for induction motor fault diagnosis. However, this approach still has some limitations: the training process of the DCGAN may be unstable and may fail to generate similar samples, which affects the classification performance of the CNN [23]. Regarding down-sampling research, Shao et al. designed a VGG transfer learning model architecture and a time–frequency domain transformation technique to convert raw vibration signals into time–frequency images for the fault diagnosis of induction motor bearings and gearboxes [24]. Janssens et al. explored a deep transfer learning model to simplify the model retraining process and utilized thermal infrared imaging for motor bearing fault diagnosis [25]. Ullah et al. implemented the VGG-16 transfer learning model, using the fast Fourier transform (FFT) to transform vibration and current signals into RGB frequency-domain images for detecting demagnetization and motor bearing faults [26].
From the above literature review, it is evident that the synthetic minority over-sampling technique (SMOTE) is a widely used method for data augmentation in scenarios characterized by limited data availability. However, a significant limitation of SMOTE is its inability to expand sample diversity based on the features and distribution of the samples, which may lead to the generation of incorrect or superfluous samples [27]. Although generative adversarial networks (GANs) can establish effective mapping relationships for data features, the complexity of fault features in raw vibration signals in real industrial settings presents challenges. Data generated by GANs typically only replicate the prominent features of fault signals; the noise within these signals can provide misleading information to the discriminator model, preventing the generative model from accurately fitting the true sample distribution. This discrepancy often results in a generated sample distribution that does not closely mirror the actual sample distribution [28]. Moreover, the variability of motor vibration fault signals under different operating conditions further complicates fault classification accuracy. Utilizing vibration fault data from a single operating condition to establish mappings for different fault features could diminish the classification accuracy under varied conditions [29]. The efficacy of transfer learning is also contingent upon the data distribution between the source and target domains, which can adversely affect the model’s classification accuracy [30]. In addition, there is no standardized criterion for determining the optimal number of layers to fine-tune. If the similarity between source and target domain data is low, more layers may require fine-tuning. However, fine-tuning too many layers might lead to difficulties in model convergence; conversely, fine-tuning too few layers can degrade the model’s classification performance.
In summary, real-world industrial scenarios frequently encounter challenges such as obtaining fault data, the diversity of fault types, high costs associated with labeling, and specific challenges in industrial applications. In response to these issues, this study introduces the few-shot learning model based on the residual VGG-based Siamese network, designed to address deficiencies in training data effectively. This model was employed for the across-load fault diagnosis of induction motors with limited data availability. Two different data imbalance ratios were established in experimental scenarios, and various sampling methods were compared and analyzed to ascertain the superiority of the method proposed in this study. The principal contributions of this study are outlined below:
  • Technical Integration: Combining the improved Siamese network with the time–frequency domain transformation technique offers a holistic approach to the problem, notably improving diagnostic accuracy under limited-data conditions.
  • Addressing Limitations: This paper proposes a residual VGG-based Siamese network comprising two key architectures: the feature extraction network (the residual VGG) and the merged similarity layer. The residual VGG architecture is designed to overcome certain shortcomings of deep neural networks, such as degradation problems and overfitting. Meanwhile, the merged similarity layer uses multiple distance metrics in a low-dimensional feature space to enhance the model’s classification performance.
  • Tackling Data Imbalance: For across-load fault diagnosis in induction motors, we developed experimental models for four common types of motor failures. Vibration signals for healthy and fault conditions were measured under different load levels. Comparative analyses with other leading AI models, such as the transfer learning model ResNet50 and a deep learning model integrating DCGAN with CNN [23], demonstrate the superior performance of our proposed model under data-imbalanced scenarios. Remarkably, our model achieved an average accuracy rate as high as 98% when the training dataset imbalance ratio was 20:1.
  • Cross-Domain Fault Diagnosis: Existing studies focus primarily on single, common motor-bearing faults. In contrast, our experimental setup incorporates stringent real-world industrial conditions, including data imbalances, multiple types of faults, and different load levels. Our model performed well in across-load fault diagnosis of induction motors under limited-data conditions. This highlights the model’s generalizability and its high applicability in fault diagnosis.
In conclusion, this study successfully integrates the time–frequency domain transformation technique with the residual VGG-based Siamese network to address challenges in across-load fault diagnosis for induction motors. The organization of this paper is as follows: Section 2 describes the experimental design, including fault types, data acquisition methods, and processing procedures; Section 3 and Section 4 provide a detailed introduction to the theoretical foundations of the residual VGG-based Siamese network model and the time–frequency domain transformation technique; Section 5 presents the experimental results and offers a comparative analysis of the performance of the proposed method with other advanced models; Section 6 summarizes the entire paper, emphasizing the main contributions of this research.

2. Induction Motor Fault Types

The proportions of induction motor faults according to the statistics of fault types in reference [31] are shown in Table 1. The major types are bearing faults, stator faults, other faults, and rotor faults. Therefore, this study selected the fault types with the highest proportions as the induction motor fault types, including stator faults, rotor faults, bearing faults, and misalignment failures. The experimental models of induction motor faults and the methods used to damage the induction motor are described below.
1. Stator winding short circuit
During an inter-turn short circuit of the stator winding, the current amplitude becomes abnormal and the winding generates high heat, which is proportional to the square of the stator current. Other heat sources include the internal power loss of the rotor conductors, stator core loss, and mechanical loss. Considering experimental safety and the experimental objective, the insulation was damaged only between two adjacent turns of one phase winding, minimizing the fault degree, as shown in Figure 1a.
2. Rotor broken bar
The rotor broken bar fault model is shown in Figure 1b. The rotor fault was created by drilling two holes with a diameter of 7 mm and a depth of 30 mm. When this fault type occurs, the three-phase currents of the stator and rotor become imbalanced, and the generated electromagnetic field is imbalanced as well. As a result, the ripple components of the torque and speed increase. Where the rotor bar is damaged, the impedance is effectively infinite, so the induced potential is zero and the motor torque decreases.
3. Bearing outer ring damage
The function of a bearing is to support and connect rotating components. It is subject to inevitable physical actions such as mechanical stress and mechanical wear, so the bearing may deform in the long run and its performance degrade. The complex technologies involved in the bearing manufacturing process result in relatively precise bearing structures. To prevent damage to other components of the bearing, such as the balls and inner ring, the direct application of external force was avoided; instead, bearing failure was induced by drilling. The outer ring was drilled with a hole depth and diameter of 1 mm each. The bearing fault model is shown in Figure 1c.
4. Misalignment failure
In the misalignment failure model, the coupling between the induction motor and the load is deflected to simulate the central-axis deviation of the coupling under physical external force (e.g., a sideswipe) or during the installation process. The degree of offset not only induces severe motor faults but also influences the noise level during motor operation and the vibration amplitude of rotational imbalance. Therefore, considering experimental safety, the coupling between the healthy motor and the load was offset by 0.5 mm upward in this misalignment failure model, as shown in Figure 1d.

3. Signal Preprocessing Techniques

The vibration signals of motors are non-stationary and vary over time. Neither frequency-domain nor time-domain techniques alone can provide sufficient information for the effective diagnosis of these signals. The traditional frequency analysis method, the FFT, provides frequency information, expressed as Equation (1) [32]. However, there are multiple frequency components in the physical signal of stator faults, making it difficult to distinguish the relationship between these frequencies and bearing fault frequencies. Moreover, FFT-based methods require a large amount of data and high computational cost to improve the frequency resolution at a given sampling rate [33]. In contrast, the spectrogram obtained after time–frequency analysis can be used to identify the characteristics of stationary or non-stationary signals. Time–frequency analysis captures how frequency components vary over time, and the time–frequency characteristics differ significantly between healthy and faulty conditions [34]. Therefore, the spectrogram is a useful technique for effectively analyzing the physical signals generated by mechanical components with various frequency components [35]. A common time–frequency transform technique is the continuous wavelet transform (CWT). By scaling and translating the mother wavelet function, the CWT can analyze signals at different times and scales. Because the scaling factor allows for the dynamic adjustment of time and frequency resolutions, the CWT exhibits multi-resolution characteristics, enabling it to achieve high resolution in both the time and frequency (scale) domains simultaneously. Therefore, the CWT is particularly suitable for analyzing non-stationary signals.
$$X_h = \sum_{k=0}^{K-1} x_k \, e^{-j 2 \pi k h / K} \quad (1)$$
where the real discrete signal $x_k$, of finite length $K$, is derived from the sampling of a continuous signal and $j$ is the imaginary unit.
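As a quick sanity check on Equation (1), the DFT sum can be evaluated directly and compared with a library FFT. The test signal and sampling rate below are illustrative assumptions, not data from this study:

```python
import numpy as np

# Naive implementation of Equation (1): X_h = sum_k x_k * exp(-j*2*pi*k*h/K).
def dft(x):
    K = len(x)
    k = np.arange(K)
    # Outer product k*h builds the full twiddle-factor matrix at once.
    W = np.exp(-2j * np.pi * np.outer(k, k) / K)
    return W @ x

fs = 1000                          # assumed sampling rate (Hz)
t = np.arange(256) / fs
x = np.sin(2 * np.pi * 60 * t)     # e.g., a 60 Hz line-frequency component

X = dft(x)
# The naive DFT should match NumPy's FFT to numerical precision.
assert np.allclose(X, np.fft.fft(x))
```

The dominant bin of the positive-frequency half then falls near 60 Hz, as expected for this tone.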
Multi-resolution capability in time–frequency distribution plots is crucial for accurately diagnosing motor faults from vibration signals. The CWT has been developed into a time–frequency analysis method for stationary and non-stationary signals. Since the translation and scaling of the mother wavelet function can provide fine resolution in time and frequency, it can fully demonstrate the energy distribution from a low frequency to a high frequency.
The CWT can transform the time sequence signals to the time–frequency domain. The CWT process decomposes the time sequence signals by scaling and translation, expressed as Equation (2) [36]. The signals will be decomposed into linear combinations using the mother wavelet function (different scaling and translation) as the basis function. In addition, as the basis function has τ and s parameters, the CWT can obtain the relationship between time and frequency, and the signal frequency range in the period can be observed; the transformation process is shown in Figure 2.
$$W(\tau, s) = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} x(t) \, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt \quad (2)$$
where $\tau$ is the translation parameter, which provides the location in time; $s$ is the scale parameter, representing the mother wavelet scaling factor; $x(t)$ is the time sequence signal; and $\psi(t)$ is the mother wavelet function, with $\psi^{*}$ denoting its complex conjugate.
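A minimal NumPy sketch of Equation (2) follows, discretizing the integral as one convolution per scale. The complex Morlet mother wavelet and the parameter w0 = 6 are common choices assumed here rather than settings taken from the paper:

```python
import numpy as np

def morlet(t, w0=6.0):
    # Complex Morlet mother wavelet (a common choice; w0 = 6 is assumed).
    return np.pi**-0.25 * np.exp(1j * w0 * t) * np.exp(-t**2 / 2)

def cwt(x, scales, fs):
    # Equation (2), discretized: correlate x with the scaled, conjugated wavelet.
    out = np.empty((len(scales), len(x)), dtype=complex)
    for i, s in enumerate(scales):
        n = int(10 * s * fs) | 1                 # odd-length wavelet support
        t = (np.arange(n) - n // 2) / fs
        psi = np.conj(morlet(t / s)) / np.sqrt(s)
        out[i] = np.convolve(x, psi[::-1], mode="same") / fs
    return out

fs = 1000
t = np.arange(1024) / fs
# Non-stationary test signal: 50 Hz in the first half, 120 Hz in the second.
x = np.where(t < 0.512, np.sin(2*np.pi*50*t), np.sin(2*np.pi*120*t))
scales = 6.0 / (2 * np.pi * np.array([30, 50, 80, 120, 200]))  # scale ~ w0/(2*pi*f)
W = cwt(x, scales, fs)
assert W.shape == (5, 1024)
```

The magnitude of the 50 Hz row dominates in the first half of the scalogram and the 120 Hz row dominates in the second half, illustrating the time localization that the FFT alone cannot provide.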

4. Proposed Model

Few-shot learning operates with training datasets where each class contains only a handful of samples, unlike the extensive labeled datasets typically used in deep learning. This enables few-shot learning to effectively accomplish classification tasks with limited training data, leveraging the unique characteristics of its model architecture. The training data for the few-shot learning model comprise multiple classification tasks; each task involves a combination of samples that belong to the same class or different classes [37]. Unlike traditional supervised learning methods, which categorize individual samples, the model in few-shot learning gains proficiency by classifying pairs of samples, and distinguishing whether they belong to the same or different classes. This approach facilitates the model’s ability to generalize from minimal data.
The few-shot learning dataset is divided into three parts: training dataset, support set, and query set. The training dataset is the dataset for model training, each pair of samples in this set is composed of data from the same or different categories, and it does not intersect with the support set and the query set, as shown in Equation (3) [38].
$$T = \left\{ \left( X_u, X_v \right), P\left( X_u, X_v \right) \right\}, \quad u, v = 1, \dots, h \quad (3)$$
where $(X_u, X_v)$ are the sample pairs in the training dataset and $P(X_u, X_v)$ is the sample pair’s similarity probability. The similarity probability for samples of the same category is 1, while the similarity probability for samples of different categories is 0. $h$ is the number of training samples.
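The pairing scheme of Equation (3) can be sketched as follows; the class labels, sample counts, and array shapes are illustrative assumptions:

```python
import random
import numpy as np

# Build the training set T of Equation (3): sample pairs labeled with
# similarity probability P = 1 (same class) or P = 0 (different class).
def make_pairs(data_by_class, n_pairs, seed=0):
    rng = random.Random(seed)
    classes = list(data_by_class)
    pairs, labels = [], []
    for i in range(n_pairs):
        if i % 2 == 0:                        # alternate same / different pairs
            c = rng.choice(classes)
            x1, x2 = rng.sample(data_by_class[c], 2)
            labels.append(1)
        else:
            c1, c2 = rng.sample(classes, 2)
            x1 = rng.choice(data_by_class[c1])
            x2 = rng.choice(data_by_class[c2])
            labels.append(0)
        pairs.append((x1, x2))
    return pairs, labels

# Toy stand-in for CWT images: 5 classes (healthy + 4 faults), 3 samples each.
data = {c: [np.full((8, 8), 10 * c + j) for j in range(3)] for c in range(5)}
pairs, labels = make_pairs(data, 20)
assert len(pairs) == 20 and set(labels) == {0, 1}
```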
The support set is used to compare the test sample. This set typically consists of samples from various classes, each with only a minimal number of samples. These samples primarily assist the model in predicting classes, as shown in Equation (4) [38].
$$S_{N\text{-shot},\ K\text{-way}} = \left\{ \left( I_m, L_k \right) \right\}, \quad m = 1, \dots, M, \quad k = 1, \dots, K \quad (4)$$
where, in N-shot K-way testing, the support set consists of $M = N \times K$ samples, $N$ is the shot, representing the number of samples per category, and $K$ is the way, denoting the number of categories in this set.
The query set refers to the collection of test data samples for the model, as shown in Equation (5).
$$Q_{test} = \left\{ I_q \right\}, \quad q = 1, \dots, R \quad (5)$$
where $I_q$ are test data samples drawn from the classes in the support set and $R$ is the number of image samples.
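The support and query sets of Equations (4) and (5) can be sampled per episode as sketched below; the dataset layout and sample counts are illustrative assumptions:

```python
import random

# N-shot K-way episode construction: the support set holds M = N * K labeled
# samples (I_m, L_k); the query set holds test samples from the same K classes
# (their labels are kept here only for scoring).
def make_episode(data_by_class, n_shot, k_way, n_query, seed=0):
    rng = random.Random(seed)
    classes = rng.sample(list(data_by_class), k_way)
    support, query = [], []
    for c in classes:
        samples = rng.sample(data_by_class[c], n_shot + n_query)
        support += [(s, c) for s in samples[:n_shot]]
        query += [(s, c) for s in samples[n_shot:]]
    return support, query

# Toy dataset: 5 classes, 10 samples each (names are placeholders).
data = {c: [f"img_{c}_{j}" for j in range(10)] for c in range(5)}
support, query = make_episode(data, n_shot=3, k_way=5, n_query=2)
assert len(support) == 3 * 5      # M = N x K
assert len(query) == 2 * 5
```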
The fault diagnosis process using the residual VGG-based Siamese network proposed in this study is structured into three main stages: dataset preparation, model training, and model testing. The dataset preparation stage encompasses two steps. Initially, time-domain signals are converted into time–frequency images through CWT transformation. Subsequently, pairs of samples are randomly selected as model input; these samples are labeled as belonging to the same or different classes based on their actual labels. During the model training stage, the model learns to distinguish between samples of the same class and different classes. The specific model architecture and loss function enable effective learning even with a minimal number of training samples. The details of the model architecture and the training algorithm are extensively discussed in Section 4.1, Section 4.2, Section 4.3 and Section 4.4. During the model testing stage, N samples are randomly selected from the training dataset, encompassing both healthy state data and four fault types. These samples are tested against each other to assess the model’s performance. The classification of the test samples is determined based on the probabilities predicted by the model. The classification results for different shots are then analyzed in the experimental cases.
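The testing stage described above can be sketched as follows. The `similarity` function here is a hypothetical stand-in for the trained network’s output probability, and the two-dimensional "features" are toy values:

```python
import numpy as np

def similarity(a, b):
    # Hypothetical pairwise score in [0, 1]; the real model uses the merged
    # similarity layer of the residual VGG-based Siamese network.
    return float(np.exp(-np.linalg.norm(np.asarray(a) - np.asarray(b))))

def predict(query_sample, support):
    # Assign the class whose support samples score highest on average.
    scores = {}
    for sample, label in support:
        scores.setdefault(label, []).append(similarity(query_sample, sample))
    return max(scores, key=lambda c: np.mean(scores[c]))

# Toy support set: two classes clustered around different centers.
support = [((0.0, 0.0), "healthy"), ((0.1, 0.0), "healthy"),
           ((5.0, 5.0), "bearing"), ((5.1, 5.0), "bearing")]
assert predict((0.2, 0.1), support) == "healthy"
assert predict((4.9, 5.2), support) == "bearing"
```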

4.1. Siamese Network Architecture

A residual VGG-based Siamese network model is proposed in this study. The model architecture consists of two identical neural networks, and two samples are input into the model as a pair each time. Each sample pair carries a label indicating whether the two samples belong to the same or different classes. The two identical neural networks perform feature extraction and output low-dimensional vectors. Then, the distance between the two low-dimensional vectors is calculated using the multiple similarity distance equations and finally sent to the fully connected layer to obtain a probability representing the similarity between the two input samples. The two networks share identical weights during training. The contrastive loss function is used to optimize the model parameters, enhancing classification accuracy and allowing the model to effectively classify even previously unseen samples.
In Ref. [39], the contrastive loss function used by the Siamese network could effectively calculate the resulting loss of the same or different labels classified by the neural network. When the class labels were identical and the distance measure was large, the contrastive loss function value was large; when the class labels were different and the distance measure was small, the contrastive loss function value was large. This loss function effectively captures the model’s matching degree during training, enabling the optimization process to facilitate learning with fewer samples and ensuring convergence. The contrastive loss function is expressed in Equation (6) [39]:
$$C_{loss}(Y, MSD_w) = 0.5 \left( 1 - Y \right) MSD_w^2 + 0.5 \, Y \max\left( 0, m - MSD_w \right)^2 \quad (6)$$
where $MSD_w$ is the multiple similarity distance and $m$ is the margin. The value of $Y$ is 0 or 1, indicating whether the two samples belong to the same or different classes. When $Y = 1$ (different samples), the loss function reduces to $0.5 \max(0, m - MSD_w)^2$, which drives $MSD_w$ to exceed the margin $m$ as much as possible. When $Y = 0$ (same samples), the loss function reduces to $0.5 \, MSD_w^2$, which drives $MSD_w$ to be as small as possible.
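Equation (6) can be implemented directly; the margin value m = 1.0 below is an assumed default, as it is not stated in this section:

```python
# Contrastive loss of Equation (6):
# C = 0.5*(1 - Y)*MSD^2 + 0.5*Y*max(0, m - MSD)^2,
# with Y = 0 for a same-class pair and Y = 1 for a different-class pair
# (the convention implied by the loss terms). m = 1.0 is an assumed margin.
def contrastive_loss(y, msd, m=1.0):
    same_term = 0.5 * (1 - y) * msd**2            # pulls matching pairs together
    diff_term = 0.5 * y * max(0.0, m - msd)**2    # pushes mismatched pairs apart
    return same_term + diff_term

# Same-class pair: loss grows with distance.
assert contrastive_loss(0, 0.0) == 0.0
assert contrastive_loss(0, 2.0) == 2.0
# Different-class pair: loss is zero once the distance exceeds the margin.
assert contrastive_loss(1, 2.0) == 0.0
assert contrastive_loss(1, 0.0) == 0.5
```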

4.2. Merged Similarity Layer Architecture

The merged similarity layer architecture is shown in Figure 3. This architecture measures the distance between feature vectors using multiple similarity distance methods, and can effectively improve the accuracy of model classification for similar samples. Firstly, pairs of images are processed through two identical feature extraction layers—a residual VGG, which outputs two low-dimensional feature vectors, a and b . Secondly, the Euclidean and Manhattan distances are used to measure the distance between the two feature vectors, calculating the similarity between the samples, as shown in Equations (7) and (8). Finally, the sigmoid activation function is used to output a similarity probability value between 0 and 1, to determine the similarity of the input samples. It also combines few-shot learning testing methods to classify the true labels of the test samples.
The Euclidean distance $ED_w$ is shown in Equation (7):
$$ED_w(a, b) = \sqrt{\sum_{i=1}^{r} \left( a_i - b_i \right)^2} \quad (7)$$
where $a$ and $b$ are the low-dimensional feature vectors obtained by inputting the two images separately into the feature extraction layers of the residual VGG. The Manhattan distance $MD_w$ is shown in Equation (8):
$$MD_w(a, b) = \sum_{i=1}^{r} \left| a_i - b_i \right| \quad (8)$$
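A sketch of the merged similarity computation follows. How the two distances are weighted before the sigmoid is not specified in this section, so a plain sum with an assumed bias term stands in for the fully connected layer:

```python
import numpy as np

# Equations (7) and (8): Euclidean and Manhattan distances between embeddings.
def euclidean(a, b):
    return np.sqrt(np.sum((a - b)**2))

def manhattan(a, b):
    return np.sum(np.abs(a - b))

def merged_similarity(a, b, bias=1.0):
    # Assumed combination: plain sum of distances, then a sigmoid so that a
    # large distance maps to a low similarity probability.
    d = euclidean(a, b) + manhattan(a, b)
    return 1.0 / (1.0 + np.exp(-(bias - d)))

a = np.array([0.2, 0.4, 0.1])
b_same = np.array([0.25, 0.38, 0.12])   # nearby embedding -> high similarity
b_diff = np.array([0.9, 0.1, 0.8])      # distant embedding -> low similarity
assert merged_similarity(a, b_same) > 0.5
assert merged_similarity(a, b_diff) < 0.5
```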

4.3. Feature Extraction Architecture

The feature extraction architecture of the residual VGG is an enhanced version of the original VGG model [40], as illustrated in Figure 4. The residual VGG consists of multiple convolutional layers and max pooling layers; the fully connected layer is replaced by a global average pooling layer, and a residual learning architecture is incorporated. Using the residual VGG as the feature extraction layer of the Siamese network yields good diagnostic performance for intelligent induction motor fault diagnosis. The advantages of this architecture include a greatly reduced number of model parameters, avoidance of gradient vanishing during training, effective extraction of data features, and enhanced model nonlinearity. The components of the architecture are described below.
1. Multiple convolutional layers and convolution kernel selection
Stacking multiple convolutional layers increases the receptive field of the model, so that the model can obtain more feature information and its classification capacity can be enhanced effectively. Additionally, this enhances the model’s nonlinearity, enabling it to learn more complex features [41]. The receptive field size of each layer can be computed by Equation (9) [42]:
$$R_l = R_{l-1} + \left( K_l - 1 \right) \prod_{i=1}^{l-1} S_i \quad (9)$$
where $R_l$ is the receptive field size of the $l$-th layer, $R_{l-1}$ is the receptive field size of the preceding layer, $K_l$ is the size of the convolutional kernel, and $S_i$ is the stride of the convolutional kernel in the $i$-th layer.
A small convolution kernel was set for each convolutional layer, which greatly reduces the number of model parameters and the computational load on hardware [43]. For example, one 5 × 5 convolutional layer can be substituted by two 3 × 3 convolutional layers. When the numbers of input and output channels $C$ are identical, the parameter counts are 2(3 × 3)$C^2$ = 18$C^2$ and 5 × 5 × $C^2$ = 25$C^2$, respectively. Thus, stacking smaller kernels yields fewer model parameters.
2. Maximum pooling layer and global average pooling layer
To reduce the dimensionality of feature vectors and the number of model parameters while preserving feature information, the max pooling layer was employed for local maximum feature extraction [44,45]. The global average pooling layer calculates the average value of all elements in each feature map, and the parameter count of the subsequent layer is $C_{in} \times C_{out}$. In contrast, the parameter count of the first fully connected layer is $C_{in} \times C_{out} \times size_{fm}^2$. Here, $C_{in}$ and $C_{out}$ represent the numbers of input and output channels, respectively, and $size_{fm}^2$ is the size of the input feature map. Thus, replacing the fully connected layer with a global average pooling layer significantly reduces the number of model parameters, as demonstrated in Table 2. As a result, overfitting is less likely to occur during model training, and the model is regularized, effectively enhancing its generalization ability.
3. Residual learning
The residual VGG adds batch normalization after each max pooling layer to alleviate the vanishing gradient problem in the initial training period of the deep model, so that the model can converge easily early in training [46]. However, because of the degradation problem, deep neural networks are still difficult to optimize during training.
Deep neural networks suffer from a degradation problem during training: the model reaches accuracy saturation, but vanishing gradients prevent its internal parameters from being updated. The gradual increase in training loss and decrease in training accuracy then cause the model to fail to converge. To solve this degradation problem, the proposed architecture adopts the residual learning of reference [47], which makes the model easier to optimize during training. In residual learning, a layer of the model is provided with a shortcut connection; if that layer’s gradient vanishes during training, the shortcut connection can produce an identity mapping in the learning process to reduce training errors. As shown in Figure 5, x is the input, H(x) is the output, and F(x) is the residual. The input x passes through the feature extraction of the first nonlinear layer, and the output F(x) is obtained after its activation function. After the feature extraction of the second nonlinear layer, and before its activation function is applied, the input x is added to F(x) to obtain the output H(x); this path is called a shortcut connection. If F(x) becomes zero during training, so that this nonlinear layer cannot update the model parameters, the shortcut connection produces an identity mapping, H(x) = x, and the learning effect of the network is preserved. This method enables the deep model to learn the task more easily during training and to converge quickly.
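The shortcut connection can be sketched with a toy two-layer residual block; the weights, shapes, and activation placement below are illustrative, not the paper’s exact configuration:

```python
import numpy as np

# Minimal residual-learning sketch: H(x) = F(x) + x, where F is the stacked
# nonlinear layers. If F collapses to zero, the shortcut still passes x
# through unchanged (identity mapping), so the block cannot hurt learning.
def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    f = relu(x @ w1) @ w2          # F(x): two layers, activation applied after the sum
    return relu(f + x)             # shortcut connection adds the input back

rng = np.random.default_rng(0)
x = np.abs(rng.normal(size=(1, 8)))     # non-negative input for a clean identity check
w1 = rng.normal(size=(8, 8)) * 0.1
w2 = rng.normal(size=(8, 8)) * 0.1

h = residual_block(x, w1, w2)
assert h.shape == x.shape
# Degenerate case F(x) = 0 (zero weights): the block reduces to the identity.
assert np.allclose(residual_block(x, np.zeros((8, 8)), np.zeros((8, 8))), x)
```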

4.4. Model Training Algorithms

The training algorithm for the residual VGG-based Siamese network is shown in Algorithm 1. First, vibration signals from the healthy state and the four fault types of induction motors are measured and converted into two-dimensional images using signal processing techniques. This conversion provides more feature information, and the images are standardized to enhance the model's accuracy and convergence speed. Second, the training dataset T is processed by pairing images of the same class or of different classes, forming paired samples X, each containing two images and a label. Subsequently, key training parameters are set, including the number of epochs e, the batch size b, and the parameters of the Adam optimizer. To promote effective learning and convergence, a learning rate adjustment mechanism reduces the learning rate by half every 20 epochs. This strategy rapidly approaches the optimal solution in the initial phase and fine-tunes the adjustments later, avoiding excessive oscillations and ensuring stable convergence of the model.
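The halving schedule described above can be sketched in one line; the initial rate of 1e-3 below is an assumed value, as the paper does not state α0 in this passage:

```python
def learning_rate(epoch, lr0=1e-3):
    """Halve the learning rate every 20 epochs (epoch counted from 0):
    lr = lr0 * 0.5 ** (epoch // 20). lr0 is an assumed initial rate."""
    return lr0 * 0.5 ** (epoch // 20)

print(learning_rate(0), learning_rate(20), learning_rate(45))
```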
During each iteration i of model training, the residual VGG (the feature extraction model) f_w extracts features from the images, and the residual learning architecture prevents convergence failures. The image features captured by the feature extraction layer are output as low-dimensional feature maps, which are transformed into feature vectors O_1 and O_2 through the global average pooling layer. The Euclidean distance ED_w and the Manhattan distance MD_w are used to calculate the distance between the two images, and the sigmoid activation function determines their similarity. Finally, the contrastive loss function L_t(w, Y, Î_1, Î_2) calculates the classification loss based on the Euclidean and Manhattan distances, and the Adam optimizer adjusts the model parameters w according to the loss value to optimize the model.
Algorithm 1 Residual VGG-based Siamese network. The pair label Y takes the value 0 (same class) or 1 (different class).
Require: the number of epochs e = 100, the number of iterations i, the batch size b = 32, the paired images and labels X, the margin m, the feature extraction layer f_w, and the Adam hyperparameters α, β1 = 0.9, β2 = 0.999.
Require: the initial feature extraction layer parameters w_0 and the initial learning rate α_0.
1: while w has not converged do
2:   for h = 1, …, e do
3:     for u = 1, …, i do
4:       for t = 1, …, b do
5:         Sample from the training dataset T = {I_1, I_2, …, I_z}
6:         X = (Î_1, Î_2, Y), Î_1 ∈ T, Î_2 ∈ T
7:         O_1 = f_w(Î_1), where Î_1 ∈ X
8:         O_2 = f_w(Î_2), where Î_2 ∈ X
9:         MSD_w(O_1, O_2) = ED_w(O_1, O_2) + MD_w(O_1, O_2)
10:        L_t(w, Y, Î_1, Î_2) = 0.5(1 − Y)·MSD_w² + 0.5·Y·max(0, m − MSD_w)²
11:      end for
12:      w ← Adam(∇_w (1/b) Σ_{t=1}^{b} L_t, w, α, β1, β2)
13:    end for
14:    α = α_0 × 0.5^⌊h/20⌋ (the learning rate is halved every 20 epochs)
15:  end for
16: end while
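The merged similarity distance and contrastive loss at the core of Algorithm 1 can be sketched as follows; this is a minimal NumPy illustration of steps 9 and 10 only, not the full network, and it assumes the paper's label convention of Y = 0 for a same-class pair and Y = 1 for a different-class pair:

```python
import numpy as np

def merged_similarity_distance(o1, o2):
    """MSD_w: sum of the Euclidean and Manhattan distances between the
    feature vectors O1 and O2 (Algorithm 1, step 9)."""
    ed = np.sqrt(np.sum((o1 - o2) ** 2))  # Euclidean distance ED_w
    md = np.sum(np.abs(o1 - o2))          # Manhattan distance MD_w
    return ed + md

def contrastive_loss(msd, y, margin=1.0):
    """L_t = 0.5*(1 - Y)*MSD^2 + 0.5*Y*max(0, m - MSD)^2 (step 10).
    Assumed convention: Y = 0 for same-class, Y = 1 for different-class."""
    return 0.5 * (1 - y) * msd ** 2 + 0.5 * y * max(0.0, margin - msd) ** 2

o1 = np.array([0.2, 0.4])
o2 = np.array([0.2, 0.4])
msd = merged_similarity_distance(o1, o2)  # identical vectors -> 0.0
print(contrastive_loss(msd, y=0))          # same-class pair at zero distance -> 0.0
```

A different-class pair (Y = 1) at zero distance would instead be penalized by the full margin term, which is what pushes dissimilar pairs apart during training.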

4.5. Model Testing Process

The classification performance of the few-shot learning model was tested with the N-shot K-way protocol stated in reference [48]. In the one-shot K-way test, (1) the support set contains K classes with exactly one sample per class, as expressed in Equation (10); (2) one sample I_test is drawn from the test dataset and compared with the support-set samples one by one, and every sample in the test dataset is compared with all samples in the support set; (3) the test sample is assigned to the class of the support-set sample to which it is most similar, as expressed in Equation (11) [48].
S_one-shot K-way = {(I_m, L_k) | m = 1; k = 1, …, K}    (10)
C(I_test, S) = argmax_s P(I_test, I_s), I_s ∈ S    (11)
where I_test is the test data sample, S is the support set, I_s is one sample from the support set, and P(I_test, I_s) is the similarity probability of the pair.
In the N-shot K-way test, (1) the support set contains K classes with N samples per class; (2) each test sample undergoes the one-shot K-way test N times; and (3) the N test results are summed for each of the K classes, and the class with the highest total probability is the one identified by the model, as expressed in Equation (12) [48].
C(I_test, S_1, …, S_N) = argmax_s Σ_{n=1}^{N} P(I_test, I_{s,n}), I_{s,n} ∈ S_n    (12)
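The N-shot K-way decision rule can be sketched as follows; the `similarity` function below is a hypothetical stand-in for the trained Siamese network's pairwise output P, and the scalar "features" are purely illustrative:

```python
import numpy as np

def similarity(a, b):
    """Hypothetical stand-in for the trained network's similarity P."""
    return 1.0 / (1.0 + np.abs(a - b).sum())

def n_shot_k_way(test_sample, support_sets):
    """support_sets[k] holds the N samples of class k; the predicted class
    is the one with the highest summed similarity to the test sample,
    following Equation (12)."""
    totals = [sum(similarity(test_sample, s) for s in class_samples)
              for class_samples in support_sets]
    return int(np.argmax(totals))

# Toy 3-shot 2-way example.
support = [[np.array([0.0]), np.array([0.1]), np.array([0.2])],   # class 0
           [np.array([1.0]), np.array([1.1]), np.array([0.9])]]   # class 1
print(n_shot_k_way(np.array([0.05]), support))  # class 0
```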

5. Experimental Case Planning, Results, and Analysis

The experimental cases for across-load fault diagnosis of induction motors were divided into two major groups: a comparison of signal processing methods and an analysis of AI models with limited data. Each experimental result is the average of ten training and testing runs, reported as average accuracy and standard deviation (STD). (1) The signal processing case analyzed the influence of different data preprocessing methods on motor fault diagnosis, which helps the intelligent diagnostic model obtain more feature information. To compare the signal processing techniques fairly, each signal conversion method used the residual VGG-based Siamese network model proposed in this study, with sufficient and balanced data. (2) In the limited-data case, the method proposed in this study was compared with different AI models in terms of fault diagnosis performance under training dataset imbalance ratios.

5.1. Measurement of Experimental Data

The measurement platform for this experiment is shown in Figure 6, and the equipment items are listed in Table 3. The platform controls the load (No. 6) through the dynamometer control box (No. 3) to measure the data of the induction motor (No. 4) under different loads. The readings of the tachometer (No. 1) and torque meter (No. 2) were observed during measurement to judge the operating condition of the induction motor. The three-axis vibration signals of a 2 HP induction motor were measured in the experiment; the operating states comprised the healthy state and four fault types. The fault models included stator winding short circuit, rotor broken bar, bearing outer ring damage, and misalignment failure. The motor data were measured under 50% load and 100% load. The sampling rate of each signal was 20 kS/s and the sampling time was 5 s, yielding 100,000 sample points per signal.

5.2. Experimental Case

5.2.1. Case Study 1—Signal Processing Techniques

  • Case design and dataset planning
In this experimental case, a 2 HP induction motor was used to perform load tests under different operating conditions using its vibration signals. The training dataset was balanced, providing ample data for each category, and the same model architecture was employed to compare the different signal processing techniques. The collected data were processed using CWT and the two-dimensional time-domain signal transformation. CWT transforms the original vibration signal into a time–frequency image, where the x-axis denotes time, the y-axis denotes frequency, and color changes indicate the magnitude of energy. The two-dimensional time-domain signal transformation converts the original vibration signal into a two-dimensional image by linearly mapping the amplitudes between the minimum and maximum values of the signal in a given time segment onto a grayscale range of 0 to 255. The training data were segmented into training, validation, and test sets, as detailed in Table 4. Load testing was conducted, and classification metrics were calculated to assess the impact of the different signal processing techniques on the fault diagnosis of induction motors under varying load conditions.
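The two-dimensional time-domain transformation described above can be sketched as follows; the 64 × 64 image size is an assumption for illustration, as the paper does not fix the image dimensions in this passage:

```python
import numpy as np

def signal_to_image(signal, size=64):
    """Reshape a 1-D vibration segment into a size x size grid and map its
    amplitude range linearly onto 0-255 grayscale (min -> 0, max -> 255).
    The 64x64 size is an assumed value, not taken from the paper."""
    segment = signal[: size * size].reshape(size, size)
    lo, hi = segment.min(), segment.max()
    img = (segment - lo) / (hi - lo) * 255.0
    return np.round(img).astype(np.uint8)

rng = np.random.default_rng(1)
img = signal_to_image(rng.standard_normal(64 * 64))
print(img.shape, img.min(), img.max())  # (64, 64) 0 255
```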
  2. Analysis of experimental results
In induction motor fault diagnosis, signal preprocessing methods determine important information such as frequency components, microscopic vibrations, and the signal-to-noise ratio. These pieces of information are closely related and jointly affect the effectiveness of fault diagnosis. Raw time-domain signals have limitations in the analysis of induction motor faults, making it difficult to determine operational conditions: they often contain multiple frequency components, and interference between different frequencies makes fault characteristic signals hard to identify. Although analysis of low-frequency components can diagnose motor faults, low-frequency components can be disturbed by the resonance generated by faulty mechanical elements and other components, degrading diagnostic effectiveness. High-frequency components, by contrast, can effectively reveal the microscopic vibrations of mechanical components, as their signal-to-noise ratio is typically high. A high signal-to-noise ratio helps to minimize interference from vibrations caused by other mechanical components, making the fault features in the signals more distinct and easier to detect and analyze across the various faulty elements. In addition, different faulty mechanical elements generate impacts within different time intervals, so the time resolution in the high-frequency range has significant implications for diagnosing the operational status of the motor. By performing spectral analysis of microscopic structural vibrations, one can assess the motor's operational status more accurately and discover and prevent potential faults in a timely manner.
The mother wavelet function of the CWT possesses scaling and translation characteristics, giving the CWT spectrum high resolution in the time–frequency domain. It visualizes how the signal's frequency distribution changes over time and uses different colors to represent the amplitudes of different frequencies. The spectral map thus represents the energy distribution of the signal across frequencies, which aids in analyzing signal characteristics and changes and effectively provides the AI model with features for diagnosing the operational status of the motor.
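As a rough illustration of how a CWT scalogram is formed, the following NumPy sketch convolves a signal with scaled copies of a Morlet wavelet. It is a simplified stand-in for the CWT implementation used in the paper: the wavelet choice, normalization, and scale range are all assumptions, and the complex part of the Morlet wavelet is omitted:

```python
import numpy as np

def morlet(t, w0=6.0):
    """Real part of a Morlet mother wavelet (normalization omitted)."""
    return np.cos(w0 * t) * np.exp(-t ** 2 / 2.0)

def cwt_scalogram(signal, scales, wavelet_len=128):
    """CWT by convolving the signal with scaled, translated copies of the
    mother wavelet; |coefficients| form the time-frequency image
    (rows: scales, columns: time)."""
    coeffs = np.empty((len(scales), len(signal)))
    t = np.linspace(-4.0, 4.0, wavelet_len)
    for idx, a in enumerate(scales):
        w = morlet(t / a) / np.sqrt(a)               # scaled daughter wavelet
        coeffs[idx] = np.convolve(signal, w, mode="same")
    return np.abs(coeffs)

fs = 1000
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t)                        # 50 Hz test tone
scalogram = cwt_scalogram(x, scales=np.arange(1, 17))
print(scalogram.shape)  # (16, 1000)
```

In practice, a library implementation (e.g., a dedicated wavelet package) would also handle edge effects and frequency calibration, which this sketch ignores.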
The results are shown in Table 5. Comparing CWT with the two-dimensional time-domain signal transform shows that CWT provides more feature information to the classification model for fault diagnosis. The average accuracy of CWT is 99.82% and 98.34% for the two across-load directions, respectively, much higher than the 22.21% and 10.76% achieved by the two-dimensional time-domain signal transform, as shown in Figure 7. This case study thus compares the signal processing techniques and verifies the superiority of CWT.

5.2.2. Case Study 2—AI Model with Limited Data

  • Case design and dataset planning
Over-sampling and down-sampling are the two common remedies when AI models lack training data. Over-sampling applies data augmentation to the minority classes until their quantity equals that of the largest class. Down-sampling is the opposite: it randomly selects, or uses an algorithm to screen, samples from the majority classes so that the selected quantity matches the minority classes. The fundamental purpose of both methods is to keep the AI model's training data balanced, so that a good classification effect can be achieved without a large amount of training data.
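Random down-sampling as described above can be sketched in a few lines; the 600 healthy and 60 fault samples below match the 10:1 case used in this study, with integer indices standing in for the actual samples:

```python
import numpy as np

def down_sample(majority, n_minority, seed=0):
    """Randomly draw n_minority samples (without replacement) from the
    majority class so every class contributes the same number of samples."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(majority), size=n_minority, replace=False)
    return majority[idx]

healthy = np.arange(600)                     # 600 healthy samples (indices)
balanced_healthy = down_sample(healthy, 60)  # match the 60 fault samples
print(balanced_healthy.shape)  # (60,)
```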
The design of the experimental cases involves a ratio of healthy data to fault data of 10:1, where the healthy training data consist of 600 samples and each of the four fault types has 60 samples. Both the residual VGG-based Siamese network and the transfer learning model use the down-sampling method; therefore, 60 samples are randomly drawn from the 600 healthy samples to balance the numbers of healthy and faulty samples. For a ratio of 20:1, each of the four fault types has 30 samples, with 30 samples randomly drawn from the 600 healthy samples. The DCGAN and CNN models [23] use the over-sampling method: when the ratio of healthy data to fault data is 10:1, each of the four fault types is expanded from 60 samples to 600 samples; when the ratio is 20:1, 30 samples are expanded to 600 samples, providing data for training the classification model.
The models using both over-sampling and down-sampling methods are tested with an across-load fault diagnosis approach. If the training data are collected under 50% load, the test data are collected at 100% load, and both the healthy state and the four fault types have 500 test samples each. Conversely, if the training data are collected under 100% load, the test data are collected at 50% load. Detailed information on the experimental data planning is given in Table 6. The imbalance ratio is defined as follows:
Imbalance ratio = No. of healthy training data : No. of fault training data.
The training dataset imbalance ratio had two cases, 10:1 and 20:1. When the imbalance ratio was 20:1, the healthy class had 600 samples and each fault class had 30 samples; 30 samples were randomly selected from the healthy data, and AI model training was then performed.
  2. Residual VGG-based Siamese network training process
The training process of the proposed method is shown in Figure 8. In the initial training period, the model could not effectively identify the similarity between images; the Euclidean and Manhattan distances could not accurately measure the distance between images, so the contrastive loss value was large. The initial learning rate was therefore set large, allowing the optimizer to search a wide range and make large corrections to the model parameters. In the intermediate stage of training, the model began to distinguish the differences between samples, and the loss decreased gradually. The loss function fed information about the training progress back to the optimizer, which adjusted the model parameters slightly so that the model performed the classification task effectively. In the final training stage, the learning rate was reduced, allowing the optimizer to fine-tune the model parameters; as a result, the contrastive loss changed minimally and the training process stabilized.
  3. Analysis of experimental results
This case study compares the method proposed in this research with typical over-sampling and down-sampling models, namely the fine-tuned ResNet50 and the DCGAN and CNN models [23], applied to across-load fault diagnosis of induction motors with limited data. We investigate the impact of imbalanced training data ratios and different load conditions. At an imbalance ratio of 10:1, the results are shown in Table 7 for training at the 50% loading level with testing at the 100% loading level, and for training at the 100% loading level with testing at the 50% loading level. In both the five-shot five-way and one-shot five-way settings, the residual VGG-based Siamese network achieves an average diagnostic accuracy of 98 to 99%, with STD values ranging from 0.1315 to 1.182. Compared with the fine-tuned ResNet50 and the DCGAN and CNN models [23], the proposed method yields an average accuracy approximately 5% and 25% higher, respectively, with STD values significantly lower than those models.
Next, to investigate the impact of the data imbalance ratio, we increased the imbalance ratio to 20:1, with the results shown in Table 8. The residual VGG-based Siamese network still achieves an average diagnostic accuracy of 98 to 99%, with STD values ranging from 0.2683 to 1.182. Compared with the fine-tuned ResNet50 and the DCGAN and CNN models [23], its average accuracy is about 6% and 29% higher, respectively, and its STD value remains much lower. Compared with the results in Table 7, where the imbalance ratio is 10:1, the across-load accuracy of the residual VGG-based Siamese network dropped by less than 1%, whereas the fine-tuned ResNet50 and the DCGAN and CNN models [23] decreased by 2% and 9%, respectively, illustrating the superior performance of the proposed model in across-load fault diagnosis for induction motors.
Figures 9 and 10 display the confusion matrices of the classification results for training at 50% load with testing at 100% load and for training at 100% load with testing at 50% load, respectively, showing the classification results for the healthy state and the four fault types. The residual VGG-based Siamese network achieves effective classification in almost all cases. In contrast, the fine-tuned ResNet50, as a pre-trained model, cannot effectively apply its previously learned knowledge to the across-load fault diagnosis of induction motors, and its classification results vary more drastically than those of the DCGAN and CNN models [23]. For the latter, the DCGAN is difficult to train and cannot generate a distribution similar to the real samples, leading to a significant decline in the diagnostic performance of the CNN. Finally, the experimental results reveal the following points: (1) Even at an imbalance ratio of 20:1 and under five-way one-shot conditions, the proposed method still achieves a diagnostic accuracy of over 98%. (2) Compared with the fine-tuned ResNet50, the residual VGG-based Siamese network not only increases accuracy but also lowers the STD of the diagnosis results, indicating that the stability of the model's training performance is unaffected by imbalanced or insufficient training data, for instance by overfitting or lack of convergence. (3) Compared with the DCGAN and CNN models [23], the proposed method has higher accuracy and does not require the adjustment of complex parameters as in GAN models, making it less difficult to train.

6. Conclusions

This study successfully integrates time–frequency domain transformation techniques with the residual VGG-based Siamese network to perform across-load fault diagnosis of induction motors with limited data conditions. The feature extraction layer of the model effectively converts paired time–frequency images into low-dimensional feature vectors. Additionally, multiple similarity measurement methods have been introduced to accurately gauge the distances between feature vectors, thereby enhancing image classification accuracies. Utilizing few-shot learning techniques, the model achieves exceptional diagnostic results, even under highly imbalanced data conditions, demonstrating significant practicality and efficiency. This effectively addresses the prevalent issue of training data imbalance in deep learning applications.
During experimental comparisons, the method proposed in this study was benchmarked against commonly used transfer learning models like ResNet50, and deep learning models such as DCGAN and CNN. Faced with health-to-fault imbalance ratios of 20:1 and 10:1, the proposed method exhibited the highest diagnostic performance, achieving an average accuracy rate of over 98% and the lowest standard deviation compared with the other models, demonstrating stable and superior classification results. Moreover, even as the data imbalance ratio increased from 10:1 to 20:1, the diagnostic performance of the method remained robust. The learning rate adjustment strategy is identified as a sensitive parameter in this study, influencing the model's learning efficiency and ultimate classification performance. Initially, a higher learning rate was employed during model training, which was then halved every 20 epochs. This strategy aids in rapidly approaching the optimum early in training and allows for fine-tuning of the model parameters later, avoiding excessive parameter oscillations and enhancing convergence stability. Based on experimental adjustments and previous research experience, future work could explore further learning rate adjustment strategies to enhance the model's adaptability and performance on more complex or variable datasets.
From the experimental results, the method proposed in this study makes significant contributions to the across-load fault diagnosis of induction motors with limited data conditions. This includes technical integration, addressing limitations, tackling data imbalance, and facilitating cross-domain fault diagnosis. The study has achieved notable results but also identifies limitations and suggests future research directions. Currently, the model primarily diagnoses specific types of induction motor faults. Future work could extend to more motors and other mechanical equipment to test the model’s generalizability. Furthermore, considering the complexity of real industrial environments, further verification of the model’s stability and reliability under various environmental conditions should be conducted.

Author Contributions

Conceptualization, H.-C.C. and C.-C.K.; methodology, R.-G.L.; validation, H.-C.C., R.-G.L., and C.-C.L.; formal analysis, H.-C.C. and R.-G.L.; investigation, C.-C.L.; resources, C.-C.L.; data curation, H.-C.C.; writing—original draft preparation, R.-G.L.; writing—review and editing, C.-C.L. and C.-C.K.; supervision, H.-C.C. and C.-C.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Saidi, L.; Ali, J.B.; Fnaiech, F. Bi-spectrum based-EMD applied to the non-stationary vibration signals for bearing faults diagnosis. ISA Trans. 2014, 53, 1650–1660. [Google Scholar] [CrossRef]
  2. Liu, H.; Li, D.; Yuan, Y.; Zhang, S.; Zhao, H.; Deng, W. Fault diagnosis for a bearing rolling element using improved VMD and HT. Appl. Sci. 2019, 9, 1439. [Google Scholar] [CrossRef]
  3. Khan, M.A.S.K.; Radwan, T.S.; Rahman, M.A. Real-time implementation of wavelet packet transform-based diagnosis and protection of three-phase induction motors. IEEE Trans. Energy Convers. 2007, 22, 647–655. [Google Scholar] [CrossRef]
  4. Yang, Y.; Haque, M.M.M.; Bai, D.; Tang, W. Fault Diagnosis of Electric Motors Using Deep Learning Algorithms, and Its Application: A Review. Energies 2021, 14, 7017. [Google Scholar] [CrossRef]
  5. Shao, S.-Y.; Sun, W.-J.; Yan, R.-Q.; Wang, P.; Gao, R.X. A deep learning approach for fault diagnosis of induction motors in manufacturing. Chin. J. Mech. Eng. 2017, 30, 1347–1356. [Google Scholar] [CrossRef]
  6. Hoang, D.T.; Kang, H.J. A motor current signal-based bearing fault diagnosis using deep learning and information fusion. IEEE Trans. Instrum. Meas. 2020, 69, 3325–3333. [Google Scholar] [CrossRef]
  7. Shao, S.; Yan, R.; Lu, Y.; Wang, P.; Gao, R.X. DCNN-based multi-signal induction motor fault diagnosis. IEEE Trans. Instrum. Meas. 2020, 69, 2658–2669. [Google Scholar] [CrossRef]
  8. Wang, C.-S.; Kao, I.-H.; Perng, J.-W. Fault diagnosis and fault frequency determination of permanent magnet synchronous motor based on deep learning. Sensors 2021, 21, 3608. [Google Scholar] [CrossRef]
  9. Shifat, T.A.; Hur, J.-W. ANN assisted multi sensor information fusion for BLDC motor fault diagnosis. IEEE Access. 2021, 9, 9429–9441. [Google Scholar] [CrossRef]
  10. Wang, J.; Zeng, Z.; Zhang, H.; Barros, A.; Miao, Q. An Improved Triplet Network for Electromechanical Actuator Fault Diagnosis Based on Similarity Strategy. IEEE Trans. Instrum. Meas. 2022, 71, 1–10. [Google Scholar] [CrossRef]
  11. Liu, X.; Sun, W.; Li, H.; Wang, Z.; Li, Q. Imbalanced Sample Fault Diagnosis of Rolling Bearing Using Deep Condition Multidomain Generative Adversarial Network. IEEE Sens. J. 2023, 23, 1271–1285. [Google Scholar] [CrossRef]
  12. Qian, W.; Li, S. A novel class imbalance-robust network for bearing fault diagnosis utilizing raw vibration signals. Measurement 2020, 156, 107567. [Google Scholar] [CrossRef]
  13. Cao, H.; Shao, H.; Zhong, X.; Deng, Q.; Yang, X.; Xuan, J. Unsupervised domain-share CNN for machine fault transfer diagnosis from steady speeds to time-varying speeds. J. Manuf. Syst. 2022, 62, 186–198. [Google Scholar] [CrossRef]
  14. Mushtaq, S.; Islam, M.M.M.; Sohaib, M. Deep learning aided data-driven fault diagnosis of rotatory machine: A comprehensive review. Energies 2021, 14, 5150. [Google Scholar] [CrossRef]
  15. Zheng, M.L.; Chang, Q.; Man, J.F.; Liu, Y.; Shen, Y.P. Two-Stage Multi-Scale Fault Diagnosis Method for Rolling Bearings with Imbalanced Data. Machines 2022, 10, 336. [Google Scholar] [CrossRef]
  16. Hao, C.; Du, J.; Liang, H. Imbalanced Fault Diagnosis of Rolling Bearing Using Data Synthesis Based on Multi-Resolution Fusion Generative Adversarial Networks. Machines 2022, 10, 295. [Google Scholar] [CrossRef]
  17. Zareapoor, M.; Shamsolmoali, P.; Yang, J. Oversampling adversarial network for class-imbalanced fault diagnosis. Mech. Syst. Signal. Process. 2021, 149, 107175. [Google Scholar] [CrossRef]
  18. Ramos-Pérez, I.; Arnaiz-González, Á.; Rodríguez, J.J.; García-Osorio, C. When is resampling beneficial for feature selection with imbalanced wide data? Expert Syst. Appl. 2022, 188. [Google Scholar] [CrossRef]
  19. Zhang, D.; Ning, Z.; Yang, B.; Wang, T.; Ma, Y. Fault diagnosis of permanent magnet motor based on DCGAN-RCCNN. Energy Rep. 2022, 8, 616–626. [Google Scholar] [CrossRef]
  20. Hang, Q.; Yang, J.; Xing, L. Diagnosis of rolling bearing based on classification for high dimensional unbalanced data. IEEE Access 2019, 7, 79159–79172. [Google Scholar] [CrossRef]
  21. Bal, P.R.; Kumar, S. WR-ELM: Weighted regularization extreme learning machine for imbalance learning in software fault prediction. IEEE Trans. Rel. 2020, 69, 1355–1375. [Google Scholar] [CrossRef]
  22. Fan, Y.; Cui, X.; Han, H.; Lu, H. Chiller fault diagnosis with field sensors using the technology of imbalanced data. Appl. Thermal Eng. 2019, 159. [Google Scholar] [CrossRef]
  23. Chang, H.; Wang, Y.; Shih, Y.; Kuo, C. Fault diagnosis of induction motors with imbalanced data using deep convolutional generative adversarial network. Appl. Sci. 2022, 12, 4080. [Google Scholar] [CrossRef]
  24. Shao, S.; McAleer, S.; Yan, R.; Baldi, P. Highly accurate machine fault diagnosis using deep transfer learning. IEEE Trans. Ind. Informat. 2019, 15, 2446–2455. [Google Scholar] [CrossRef]
  25. Janssens, O.; Van de Walle, R.; Loccufier, M.; Van Hoecke, S. Deep learning for infrared thermal image based machine health monitoring. IEEE/ASME Trans. Mechatronics. 2018, 23, 151–159. [Google Scholar] [CrossRef]
  26. Ullah, Z.; Lodhi, B.A.; Hur, J. Detection and identification of demagnetization and bearing faults in PMSM using transfer learning based VGG. Energies 2020, 13, 3834. [Google Scholar] [CrossRef]
  27. Liu, S.; Jiang, H.; Wu, Z.; Liu, Y.; Zhu, K. Machine fault diagnosis with small sample based on variational information constrained generative adversarial network. Adv. Eng. Inform. 2022, 54. [Google Scholar] [CrossRef]
  28. Zhang, X.; Wu, B.; Zhang, X.; Zhou, Q.; Hu, Y.; Liu, J. A novel assessable data augmentation method for mechanical fault diagnosis under noisy labels. Measurement 2022, 198. [Google Scholar] [CrossRef]
  29. Xu, J.; Shi, Y.; Shi, L.; Ren, Z.; Lu, Y. Intelligent Deep Adversarial Network Fault Diagnosis Method Using Semisupervised Learning. Math. Probl. Eng. 2022, 2020. [Google Scholar] [CrossRef]
  30. Sun, W.; Zhou, J.; Sun, B.; Zhou, Y.; Jiang, Y. Markov transition field enhanced deep domain adaptation network for milling tool condition monitoring. Micromachines 2022, 13, 873. [Google Scholar] [CrossRef]
  31. Choudhary, A.; Goyal, D.; Shimi, S.L.; Akula, A. Condition monitoring and fault diagnosis of induction motors: A review. Arch. Comput. Methods Eng. 2019, 26, 1221–1238. [Google Scholar] [CrossRef]
  32. Skóra, M.; Ewert, P.; Kowalski, C.T. Selected Rolling Bearing Fault Diagnostic Methods in Wheel Embedded Permanent Magnet Brushless Direct Current Motors. Energies 2019, 12, 4212. [Google Scholar] [CrossRef]
  33. Lee, J.-S.; Yoon, T.-M.; Lee, K.-B. Bearing fault detection of IPMSMs using zoom FFT. J. Electr. Eng. Technol. 2016, 11, 1235–1241. [Google Scholar] [CrossRef]
  34. Sanakkayala, D.C.; Varadarajan, V.; Kumar, N.; Soni, G.; Kamat, P.; Kumar, S.; Patil, S.; Kotecha, K. Explainable AI for Bearing Fault Prognosis Using Deep Learning Techniques. Micromachines 2022, 13, 1471. [Google Scholar] [CrossRef] [PubMed]
  35. He, Q.; Liu, Y.; Long, Q.; Wang, J. Time-frequency manifold as a signature for machine health diagnosis. IEEE Trans. Instrum. Meas. 2012, 61, 1218–1230. [Google Scholar] [CrossRef]
  36. Tran, M.-Q.; Liu, M.-K.; Tran, Q.-V.; Nguyen, T.-K. Effective fault diagnosis based on wavelet and convolutional attention neural network for induction motors. IEEE Trans. Instrum. Meas. 2022, 71, 1–13. [Google Scholar] [CrossRef]
  37. Liu, Y.; Zhang, H.; Zhang, W.; Lu, G.; Tian, Q.; Ling, N. Few-shot image classification: Current status and research trends. Electronics 2022, 11, 1752. [Google Scholar] [CrossRef]
  38. Atanbori, J.; Rose, S. MergedNET: A simple approach for one-shot learning in siamese networks based on similarity layers. Neurocomputing 2022, 509, 1–10. [Google Scholar] [CrossRef]
  39. Huang, Z.; Huang, W.; Xu, X.; Xiao, J. Partial discharge diagnosis with siamese fusion network. IEEE Access 2022, 10, 62129–62136. [Google Scholar] [CrossRef]
  40. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015—Conference Track Proceedings, San Diego, CA, USA, 7–9 May 2015; pp. 1–14. [Google Scholar]
  41. Dong, Z.; Xie, M.; Li, X.Q. Multi-Scale Receptive Fields Convolutional Network for Action Recognition. Appl. Sci. 2023, 13, 3043. [Google Scholar] [CrossRef]
  42. Li, N.; Iosifidis, A.; Zhang, Q. Collaborative edge computing for distributed cnn inference acceleration using receptive field-based segmentation. Comput. Netw. 2022, 214, 109150. [Google Scholar] [CrossRef]
  43. Li, G.; Shen, X.; Li, J.; Wang, J. Diagonal-kernel convolutional neural networks for image classification. Digit. Signal Process. 2021, 108, 102898. [Google Scholar] [CrossRef]
  44. Hsiao, T.-Y.; Chang, Y.-C.; Chou, H.-H.; Chiu, C.-T. Filter-based deep-compression with global average pooling for convolutional networks. J. Syst. Archit. 2019, 95, 9–18. [Google Scholar] [CrossRef]
  45. Zhang, X.; Zhang, X. Global learnable pooling with enhancing distinctive feature for image classification. IEEE Access 2020, 8, 98539–98547. [Google Scholar] [CrossRef]
  46. Basodi, S.; Ji, C.; Zhang, H.; Pan, Y. Gradient amplification: An efficient way to train deep neural networks. Big Data Mining Anal. 2020, 3, 196–207. [Google Scholar] [CrossRef]
  47. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  48. Zhang, A.; Li, S.; Cui, Y.; Yang, W.; Dong, R.; Hu, J. Limited Data Rolling Bearing Fault Diagnosis with Few-Shot Learning. IEEE Access 2019, 7, 110895–110904. [Google Scholar] [CrossRef]
Figure 1. Types of faults.
Figure 2. CWT transformation process.
Figure 3. Residual VGG-based Siamese network model architecture.
Figure 4. Residual VGG-based Siamese network model architecture.
Figure 5. Residual learning.
Figure 6. Induction motor data measurement platform.
Figure 7. Across-load diagnosis results for each signal processing technique. (a) CWT: 50% load→100% load; (b) CWT: 100% load→50% load; (c) two-dimensional time-domain signals transform: 50% load→100% load; (d) two-dimensional time-domain signals transform: 100% load→50% load.
Figure 8. Training process of residual VGG-based Siamese network.
Figure 9. Imbalance ratio of 20:1, 50% to 100% load. (a) Residual VGG-based Siamese network: five-shot five-way. (b) Residual VGG-based Siamese network: one-shot five-way. (c) Fine-tuned ResNet50. (d) DCGAN and CNN [23].
Figure 10. Imbalance ratio of 20:1, 100% to 50% load. (a) Residual VGG-based Siamese network: five-shot five-way. (b) Residual VGG-based Siamese network: one-shot five-way. (c) Fine-tuned ResNet50. (d) DCGAN and CNN [23].
Table 1. Research statistics of induction motor faults [31].

| Types of Faults | IEEE | EPRI | ABB |
|---|---|---|---|
| Bearing faults | 41% | 42% | 51% |
| Stator faults | 28% | 36% | 16% |
| Other faults | 22% | 14% | 10% |
| Rotor faults | 9% | 8% | 5% |
| External condition | – | – | 16% |
| Shaft coupling | – | – | 2% |
Table 2. Model parameters.

| Model Name | Layer Type | Total Parameters |
|---|---|---|
| Residual VGG | Global average pooling | 14,899,148 |
| VGG | Two fully connected | 33,796,044 |
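Table 2's parameter gap comes from replacing the flatten-and-fully-connected head with global average pooling, which collapses each feature map to one value before the classifier. A back-of-the-envelope sketch with hypothetical dimensions (7×7×512 feature map, a 4096-unit dense layer, five classes; these numbers are assumed, not taken from the paper):

```python
# illustrative only: feature-map size, FC width, and class count are assumed
h, w, c, classes = 7, 7, 512, 5

# flatten -> 4096-unit fully connected layer -> classifier (weights only)
fc_head = (h * w * c) * 4096 + 4096 * classes

# global average pooling -> classifier: one weight per (channel, class) pair
gap_head = c * classes

print(fc_head, gap_head)  # the GAP head is orders of magnitude smaller
assert gap_head < fc_head
```

Because the pooled vector's length depends only on the channel count, the GAP head's size is independent of the input image resolution, which also helps curb overfitting on small datasets.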
Table 3. Device item.

| Device No. | Device Name |
|---|---|
| 1 | Tachometer |
| 2 | Torque meter |
| 3 | Dynamometer control box |
| 4 | Induction motor |
| 5 | Coupling |
| 6 | Load |
| 7 | Adjustable platform |
Table 4. Dataset partition—signal processing techniques.

Operating statuses (one category each): healthy, stator inter-turn short circuit, rotor broken bar, bearing outer ring damage, and misalignment fault.

| Split (per category) | Samples |
|---|---|
| Training data | 350 |
| Validation data | 50 |
| Testing data | 500 |
Table 5. Results—signal processing techniques.

| Across Loading | Accuracy Index | CWT | Two-Dimensional Time-Domain Signals Transform |
|---|---|---|---|
| 50% load→100% load | Average accuracy | 99.82% | 78.59% |
| 50% load→100% load | STD | 0.2006 | 4.889 |
| 100% load→50% load | Average accuracy | 98.34% | 88.08% |
| 100% load→50% load | STD | 0.8887 | 4.948 |
Table 6. Dataset partition.

| Dataset | Imbalance Ratio | Over-Sampling Healthy (Samples) | Over-Sampling Faults (Samples) | Down-Sampling Healthy (Samples) | Down-Sampling Faults (Samples) |
|---|---|---|---|---|---|
| Training | 10:1 | 60 | 60 | 600 | 600 |
| Training | 20:1 | 30 | 30 | 300 | 300 |
| Validation | 10:1 | 15 | 15 | 15 | 15 |
| Validation | 20:1 | 15 | 15 | 15 | 15 |
| Testing | 10:1 | 500 | 500 | 500 | 500 |
| Testing | 20:1 | 500 | 500 | 500 | 500 |
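The balanced splits in Table 6 can be produced by random over-sampling of the minority (fault) class or random down-sampling of the majority (healthy) class. The paper does not detail its resampling procedure, so this is a minimal index-level sketch of the standard mechanics, using toy arrays at a 10:1 imbalance:

```python
import numpy as np

rng = np.random.default_rng(42)

def oversample(x, target):
    # duplicate minority-class samples (drawn with replacement) up to `target`
    idx = rng.integers(0, len(x), size=target)
    return x[idx]

def downsample(x, target):
    # keep a random subset of majority-class samples (without replacement)
    idx = rng.choice(len(x), size=target, replace=False)
    return x[idx]

healthy = np.zeros((600, 8))  # toy majority class
faults = np.ones((60, 8))     # toy minority class (10:1 imbalance)

balanced_over = np.vstack([healthy, oversample(faults, 600)])   # 600 + 600
balanced_down = np.vstack([downsample(healthy, 60), faults])    # 60 + 60
```

Over-sampling preserves all majority-class information at the cost of duplicated minority samples, while down-sampling discards majority data; few-shot approaches such as the proposed Siamese network sidestep this trade-off by learning from pairwise comparisons instead.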
Table 7. Imbalance ratio of 10:1.

| Across Loading | Accuracy Index | Residual VGG-Based Siamese Network (Five-Shot Five-Way) | Residual VGG-Based Siamese Network (One-Shot Five-Way) | Fine-Tuned ResNet50 | DCGAN and CNN [23] |
|---|---|---|---|---|---|
| 50% to 100% load | Average accuracy | 99.91% | 99.89% | 94.06% | 79.41% |
| 50% to 100% load | STD | 0.1315 | 0.1409 | 6.273 | 7.837 |
| 100% to 50% load | Average accuracy | 98.55% | 98.48% | 96.83% | 79.14% |
| 100% to 50% load | STD | 0.8771 | 0.9488 | 1.284 | 7.483 |
Table 8. Imbalance ratio of 20:1.

| Across Loading | Accuracy Index | Residual VGG-Based Siamese Network (Five-Shot Five-Way) | Residual VGG-Based Siamese Network (One-Shot Five-Way) | Fine-Tuned ResNet50 | DCGAN and CNN [23] |
|---|---|---|---|---|---|
| 50% to 100% load | Average accuracy | 99.77% | 99.72% | 91.11% | 71.39% |
| 50% to 100% load | STD | 0.2683 | 0.2736 | 6.621 | 8.024 |
| 100% to 50% load | Average accuracy | 98.34% | 98.15% | 95.21% | 69.03% |
| 100% to 50% load | STD | 1.178 | 1.182 | 1.801 | 7.748 |
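The N-way K-shot evaluation behind Tables 7 and 8 classifies each query by comparing it against a small labeled support set. The trained Siamese network supplies the query-support similarity; in this generic sketch, negative Euclidean distance between embeddings is a stand-in for that learned score, and the 2-D embeddings are synthetic:

```python
import numpy as np

def classify_query(query, support, support_labels, n_way):
    # score the query against every support sample; a trained Siamese network
    # would supply this similarity -- negative Euclidean distance between
    # embeddings is a stand-in here
    sims = -np.linalg.norm(support - query, axis=1)
    # average the scores over the K shots of each class, pick the best class
    class_scores = [sims[support_labels == c].mean() for c in range(n_way)]
    return int(np.argmax(class_scores))

# toy 5-way 5-shot episode in a 2-D embedding space: five well-separated
# class centers, five noisy support shots per class
rng = np.random.default_rng(0)
n_way, k_shot = 5, 5
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]], dtype=float)
support = np.vstack([centers[c] + 0.1 * rng.standard_normal((k_shot, 2))
                     for c in range(n_way)])
labels = np.repeat(np.arange(n_way), k_shot)
query = centers[3] + 0.1 * rng.standard_normal(2)  # a query from class 3
pred = classify_query(query, support, labels, n_way)
```

Averaging over K shots is what distinguishes five-shot from one-shot evaluation: with K = 1 each class is represented by a single support sample, which explains the slightly lower one-shot accuracies in both tables.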

Chang, H.-C.; Liu, R.-G.; Li, C.-C.; Kuo, C.-C. Fault Diagnosis of Induction Motors under Limited Data for across Loading by Residual VGG-Based Siamese Network. Appl. Sci. 2024, 14, 8949. https://doi.org/10.3390/app14198949
