Imbalanced Diagnosis Scheme for Incipient Rotor Faults in Inverter-Fed Induction Motors

Martin-Diaz, Ignacio; Garcia-Calva, Tomas; Duque-Perez, Óscar; Morinigo-Sotelo, Daniel

doi:10.3390/app14167237


¹ Research Group ADIRE, Institute of Advanced Production Technologies, University of Valladolid, 47002 Valladolid, Spain

² Department of Electrical Engineering, University of Valladolid, 47002 Valladolid, Spain

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(16), 7237; https://doi.org/10.3390/app14167237

Submission received: 16 July 2024 / Revised: 8 August 2024 / Accepted: 14 August 2024 / Published: 17 August 2024

(This article belongs to the Special Issue Fault Diagnosis for Electrical Machines, Power Electronics, and Drives)


Abstract

Supervised classifiers have recently been widely proposed to diagnose both electrical and mechanical faults in induction motors (IM). However, many of them require a large amount of data, which implies considerable effort in processing fault-related features and building the training set. Furthermore, real-world datasets often have highly skewed data distributions, known as class imbalance, which is a limiting issue and can misguide the tuning of machine learning algorithms. Resampling techniques based on the synthetic generation of minority class observations aim to address this problem. In addition, an inverter supply introduces undesired harmonics into the monitored signal of the IM, altering the diagnosis patterns. The diagnosis scheme is evaluated on imbalanced experimental data for the diagnosis of rotor faults in an inverter-fed motor. The results show how this imbalanced approach determines the actual diagnosis performance on a small amount of data. The experimental results demonstrate that training sets balanced with class balancing techniques improve classifier performance for diagnosing incipient rotor faults in inverter-fed IM, using well-studied and interpretable features recently proposed in this field of study.

Keywords: rotor faults; diagnosis features; induction motor; inverter; small data; imbalance correction

1. Introduction

Condition monitoring is a growing concern in predictive maintenance tasks to prevent unexpected machinery breakdowns and production losses. Consequently, strengthening the ability to detect and diagnose machine faults can reduce costs in modern production planning and operations. As a primary energy consumer, induction motors (IM) are a crucial element in electromechanical equipment and currently dominate in both the industrial and commercial sectors [1]. Therefore, a well-designed condition-based monitoring system can minimize maintenance costs and correctly target necessary maintenance tasks [2].

Motor current signature analysis (MCSA) has been preferred for extracting fault signatures at both transient [3] and stationary regimes [4]. This technique has several advantages over vibration analysis, such as remote sensing, low implementation cost, and online monitoring.

Furthermore, voltage source inverters are increasingly used to feed IM in variable speed applications. This, however, introduces a new set of challenges. Using inverters introduces undesired harmonics into the stator current, negatively altering the extraction of features based on spectral analysis. Consequently, many diagnosis techniques that worked for line-fed IM do not perform appropriately on inverter-fed IM. As a result, the methodologies proposed to date as well as novel ones have to take into account this challenge to avoid the loss of performance of the diagnosis systems.

Nowadays, automatic diagnosis through machine learning techniques is one of the most promising fields for this challenge. Automatic diagnosis has been conceived as a tool for improving the efficiency and reliability of a balanced or unbiased fault detection scheme. In fact, an important research community has proposed supervised classifiers to diagnose the health of the electric machine [5,6,7]. Basically, a classifier consists of a function (linear or nonlinear) that tries to separate the classes defined in the problem by minimizing the expected loss. In other words, it tries to identify which class corresponds to each observation with the lowest classification error. The different motor states form classes of the problem to diagnose.

Hence, it is clear that data-driven approaches oriented to an automatic diagnosis have emerged recently [8]. Authors in [9] demonstrate an interest in data-driven methods where actual field data are obtained from railway electrification networks. The data is continuously collected from the most important variables, which gives a data-driven detection approach for low-frequency oscillation events in alternating current railway catenaries. In addition, there are works where the simulation-driven machine learning training phase has also been developed by generating raw input data. This kind of approach tries to overcome limitations such as data scarcity and data imbalance [10].

Data imbalance is a crucial concern, since actual datasets are not entirely balanced [11], and it is common for actual industrial data, obtained both from destructive tests of the machine and during its useful life, to be imbalanced [12]. When the minority class representation is extremely low, the problem is usually referred to as an anomaly detection case [13]. Unlike in balanced classification, under an imbalanced scenario the classifier tends to assign new observations to the majority class, and this inherent bias grows as the imbalance between classes increases. This is detrimental to the performance of diagnosis systems and requires much more attention. To deal with class imbalance, the data science literature has presented several techniques that follow three approaches: embedded methods, data processing, and ensemble methods. Within the data processing approaches, resampling methods offer adequate characteristics to balance previously imbalanced training sets. Moreover, in engineering scenarios, the limited machine information about abnormal conditions restricts the performance of intelligent fault diagnosis, since small and imbalanced data reduce the accuracy of fault identification [12,14].

This paper aims to improve the diagnosis of faults in inverter-fed induction motors, focusing mainly on rotor bar failure. It proposes using the SMOTE oversampling technique to address dataset imbalance and to enhance diagnostic performance and interpretability. The proposed method is intended to work with a small and imbalanced dataset. Additionally, the fault features are derived from the stator current of an inverter-fed motor, a signal with high harmonic content, as mentioned before. Hence, this study aims to create an effective diagnostic tool for these circumstances. This work begins by reviewing various techniques used to generate synthetic data to balance the initial dataset, focusing on the computed feature space. The implemented techniques are Safe-Level SMOTE, Relocating Safe-Level SMOTE, and the Majority Weighted Minority Oversampling Technique (MWMOTE), which have been proposed for balancing datasets. First, the ReliefF algorithm performs the feature selection. Then, the developed diagnosis methodology uses a resampling technique to balance the data. Once the training set is balanced, a supervised classifier is used for the fault diagnosis of rotor asymmetries with different degrees of bar breakage. The fault signatures proposed in the literature are introduced in the next section. These signatures are computed from signals collected during experimentation from an induction motor fed from inverters and from the line.

2. Background

An appropriate condition monitoring system can detect the machine’s potential rotor breakage in its early stages, and its diagnosis can be automated based on fault-related features. The monitored signal is the stator current, which has proven effective for stationary and non-stationary operating conditions. In this paper, the fault signatures are derived from the frequency and time domains and are computed to diagnose faults during stationary operation.

2.1. Diagnosis Signals

The scientific literature broadly agrees that MCSA is a noninvasive diagnostic method [3,4,8,12,15] with several advantages: signals are easy to acquire; signal collection does not interfere with the operation of the motor; the obtained signal is reliable in noisy environments; and sensors with reasonable accuracy are not too expensive. When an IM is free of rotor faults and is fed from an ideal supply, the temporal expression of the stator current contains exclusively the fundamental component of its power supply:

$$i_{0}(t) = I_{m} \cos(\omega_{0} t + \phi_{0}), \tag{1}$$

where $\omega_{0}$ denotes the angular frequency of the power supply, $I_{m}$ denotes the amplitude of the stator current, and $\phi_{0}$ is the phase. However, when a rotor breakage develops, fault-related harmonics emerge in the observed signal and can be used for diagnosis purposes. Nevertheless, when the IM supply comes from an inverter, or from mains with considerable harmonic content, quantifying the amplitude of these fault patterns, and hence identifying them, becomes more difficult [16].

This work uses a single stator phase current as the monitoring signal, obtained from the motor when it is powered by an inverter or by the grid. A data acquisition system digitizes the signal in the time domain. The digital signal is then processed in both the time and frequency domains to identify fault patterns.

2.2. Time-Domain Fault Signatures

Statistics computed from time domain signals have been employed as features for several fault diagnosis approaches in literature [6,7,11]. High-order statistics are sensitive to non-Gaussian distribution measurements. On the other hand, low-order statistics (e.g., first and second moments) are more robust. In Table 1, the time-domain statistical features used in this study are defined.
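As an illustration of how such time-domain statistics can be computed, the following minimal Python sketch evaluates a few common low- and high-order statistics of a sampled signal. Table 1 is not reproduced in this text, so the selection of statistics and the function name `time_domain_features` are assumptions made for illustration, not the exact feature set of this study.

```python
import math

def time_domain_features(x):
    """Return a few generic time-domain statistics of a sampled signal.

    Illustrative only: the statistics listed in Table 1 may differ.
    """
    n = len(x)
    mean = sum(x) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "mean": mean,                            # first moment (low order)
        "variance": m2,                          # second central moment (low order)
        "rms": math.sqrt(sum(v * v for v in x) / n),
        "skewness": m3 / m2 ** 1.5,              # third-order, sensitive to asymmetry
        "kurtosis": m4 / m2 ** 2,                # fourth-order, sensitive to heavy tails
    }

# Hypothetical test signal: 50 full cycles of a unit-amplitude sine wave
signal = [math.sin(2 * math.pi * 50 * i / 1000) for i in range(1000)]
features = time_domain_features(signal)
```

For a pure sine wave the skewness is close to zero and the kurtosis is close to 1.5, so deviations from these values hint at distortion of the waveform.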

2.3. Fault Signatures from Spectra

Fault signatures from the stator current can also be obtained in the frequency domain. When a rotor bar breakage occurs, a current is induced, and torque and speed pulsations originate, giving rise to two sidebands around the fundamental frequency and other harmonics. These fault-related frequency components are observable in the frequency spectrum [15], which motivates using the amplitude of these sidebands as a fault severity indicator of the rotor condition. Their frequency location is given by Equation (2).

$$f_{BRB} = (1 \pm 2s) f, \tag{2}$$

where f is the supply frequency and s is the rotor slip. The frequency corresponding to the minus sign is known as the left sideband harmonic (LSH); the other one is known as the right sideband harmonic (RSH). Both sidebands are separated from the integer harmonics by a distance that depends on the rotor slip and, thus, on the motor load level. Nonetheless, when an IM is fed from an inverter, the stator current contains time harmonics. These harmonics modify the amplitude of the existing harmonics or generate new airgap spatial harmonics. The author in [17] proposed additional fault signatures for diagnosing rotor cage asymmetries in inverter-fed IM. Some of them are used as candidate features for the diagnosis methodology, and they are defined as follows:

$$\Gamma_{5} = \frac{G_{7}(\mathrm{LSH})}{G_{5}} \tag{3}$$

$$\Gamma_{7} = \frac{G_{5}(\mathrm{RSH})}{G_{7}} \tag{4}$$

$$\Gamma_{11} = \frac{G_{13}(\mathrm{LSH})}{G_{11}} \tag{5}$$

$$\Gamma_{13} = \frac{G_{11}(\mathrm{RSH})}{G_{13}} \tag{6}$$

The numerators of the above expressions correspond to the amplitudes of the fault-related sidebands around the fifth, seventh, eleventh, and thirteenth harmonics, where LSH denotes the left sideband harmonic and RSH the right sideband harmonic. The denominators are the amplitudes of the above-mentioned integer harmonics. Figure 1 and Figure 2 show the spectrum of one stator phase current for a line-fed and an inverter-fed induction motor, respectively, to illustrate the presence of these harmonics. As can be seen, the inverter supply introduces additional harmonics, inter-harmonics, and sub-harmonics into the stator current spectrum, which hampers the extraction of the above-defined indicators and, thus, the diagnosis, compared with the line-fed IM case.
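To make Equation (2) concrete, the following Python sketch locates the two sidebands for a given supply frequency and slip and estimates their amplitudes in a synthetic current signal by correlating the signal with a complex exponential at each frequency. The signal, its sampling rate of 1 kHz, the sideband amplitudes, and the helper names are hypothetical choices for illustration; they are not the experimental signals of this study.

```python
import cmath
import math

def sideband_frequencies(f, s):
    """Left and right sideband frequencies, (1 - 2s)f and (1 + 2s)f."""
    return (1 - 2 * s) * f, (1 + 2 * s) * f

def amplitude_at(x, fs, freq):
    """Amplitude of the sinusoidal component of x at `freq` (single-frequency DFT)."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * freq * i / fs) for i, v in enumerate(x))
    return 2 * abs(acc) / n

# Hypothetical operating point: 50 Hz supply, 4 % slip
fs, f, s = 1000.0, 50.0, 0.04
lsh_freq, rsh_freq = sideband_frequencies(f, s)  # approximately 46 Hz and 54 Hz

# Synthetic one-second stator current: fundamental plus two small sidebands
x = [
    math.cos(2 * math.pi * f * i / fs)
    + 0.02 * math.cos(2 * math.pi * lsh_freq * i / fs)
    + 0.01 * math.cos(2 * math.pi * rsh_freq * i / fs)
    for i in range(1000)
]

lsh_amp = amplitude_at(x, fs, lsh_freq)
rsh_amp = amplitude_at(x, fs, rsh_freq)
fundamental_amp = amplitude_at(x, fs, f)
```

The record length is chosen so that all three components complete an integer number of cycles, which avoids spectral leakage; with real measurements, windowing and a longer acquisition would normally be needed.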

2.4. Feature Selection through ReliefF Algorithm

The fault signatures mentioned above are obtained through a manual inspection process assisted by MATLAB R2023a. After collecting these fault signatures, a feature selection process is commonly used to eliminate noisy, irrelevant, and redundant variables from the set, with the aim of improving the subsequent classification. For this purpose, the ReliefF algorithm [18] (Algorithm 1) is applied. This technique scores each feature according to how well its values distinguish between neighbouring observations of the same and of different classes.

ReliefF [18] is an extension of the Relief algorithm, which was designed for binary problems. ReliefF, however, can deal with more than two classes and is more robust, since it can handle incomplete and noisy data [19]. Given a randomly selected observation $S_{o}$, ReliefF searches for k of its nearest neighbours in the same class. These are called the nearest hits $R_{j}$. The algorithm also searches for the k nearest neighbours in each of the other classes. They are called the nearest misses $M_{j}(C)$. The quality estimate $W[F]$ is updated for all features F based on their values for $S_{o}$, $M_{j}(C)$, and $R_{j}$. The contributions of all hits and misses are averaged in the update expression, and the contribution of each class of misses is weighted with the prior probability of that class, $P(C)$, which is estimated from the training set. If the observations $S_{o}$ and $R_{j}$ have different values of the feature F, then F separates two observations belonging to the same class, which is undesirable, so the quality estimate $W[F]$ is reduced. Conversely, if $S_{o}$ and a miss $M_{j}(C)$ have different values of F, the feature separates observations of different classes and $W[F]$ is increased. This process is repeated m times. Using k hits and misses, rather than a single one of each, makes the algorithm more reliable in the presence of noise. The difference between the values of the feature F for two observations $O_{1}$ and $O_{2}$ is computed by the function $\mathrm{diff}(F, O_{1}, O_{2})$. For numerical variables, the function is defined as follows:

$$\mathrm{diff}(F, O_{1}, O_{2}) = \frac{|value(F, O_{1}) - value(F, O_{2})|}{\max(F) - \min(F)} \tag{7}$$

The $\mathrm{diff}$ function is also used to calculate the distance between observations when locating the nearest neighbours. The total distance is the sum of the differences across all features, i.e., the Manhattan distance.

Algorithm 1 ReliefF [18]

Input: Training observations with their respective vector of features
Output: The vector W with the quality estimation of the features

Set all feature weights to zero, $W[F] = 0$
for $o \leftarrow 1, \dots, m$ do
    Randomly select an observation $S_{o}$
    Locate its k nearest hits, named $R_{j}$
    for each class $C \neq class(S_{o})$ do
        From class C, identify the k nearest misses, named $M_{j}(C)$
    end for
    for $F = 1$ to f do
        $W[F] = W[F] - \sum_{j=1}^{k} \frac{\mathrm{diff}(F, S_{o}, R_{j})}{m \cdot k} + \sum_{C \neq class(S_{o})} \frac{\left[\frac{P(C)}{1 - P(class(S_{o}))} \sum_{j=1}^{k} \mathrm{diff}(F, S_{o}, M_{j}(C))\right]}{m \cdot k}$
    end for
end for
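The steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration for numerical features only, assuming complete data (no missing values); the function name `relieff`, the seeded random generator, and the toy dataset are assumptions introduced here and are not the MATLAB implementation used in this study.

```python
import random

def relieff(X, y, m, k, seed=0):
    """Minimal ReliefF sketch for numerical features.

    X: list of feature vectors, y: list of class labels,
    m: number of sampled observations, k: number of hits/misses.
    """
    rng = random.Random(seed)
    n, f = len(X), len(X[0])
    # Feature ranges max(F) - min(F) used to normalise diff (Equation (7))
    spans = []
    for F in range(f):
        col = [row[F] for row in X]
        spans.append((max(col) - min(col)) or 1.0)  # guard against constant features

    def diff(F, a, b):
        return abs(X[a][F] - X[b][F]) / spans[F]

    def distance(a, b):
        # Manhattan distance: sum of diff over all features
        return sum(diff(F, a, b) for F in range(f))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}  # P(C) estimated from the data
    W = [0.0] * f

    for _ in range(m):
        o = rng.randrange(n)  # randomly selected observation S_o

        def nearest(c):
            idx = [i for i in range(n) if i != o and y[i] == c]
            return sorted(idx, key=lambda i: distance(o, i))[:k]

        hits = nearest(y[o])
        misses = {c: nearest(c) for c in classes if c != y[o]}
        for F in range(f):
            W[F] -= sum(diff(F, o, j) for j in hits) / (m * k)
            for c, near in misses.items():
                weight = prior[c] / (1.0 - prior[y[o]])
                W[F] += weight * sum(diff(F, o, j) for j in near) / (m * k)
    return W

# Toy data: feature 0 separates the classes, feature 1 does not
X = [[0.0, 0.0], [0.1, 1.0], [0.2, 0.0], [0.3, 1.0],
     [0.7, 0.0], [0.8, 1.0], [0.9, 0.0], [1.0, 1.0]]
y = ["healthy"] * 4 + ["faulty"] * 4
W = relieff(X, y, m=20, k=1, seed=1)
```

In this toy example the first feature receives a clearly positive weight, while the second feature, which takes the same values in both classes, receives a weight of zero.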

3. Problem Statement for a Class-Imbalanced Scenario

As data collection increases, the dataset of the IM obtained from the laboratory setup tends towards a normal distribution, as is the case for the healthy motor instances. However, instances corresponding to faulty rotor states are fewer than healthy ones. Thus, a classifier cannot find the core of the distribution of the minority set of observations, since this set is too small to be identified. Therefore, minority-class observations must be over-sampled for a small data diagnosis scheme to perform successfully. This task becomes even more complex when features are computed from a stator current signal polluted by undesired inverter harmonics [20]. The following subsections describe the resampling approaches applied to the fault-related feature space presented above.

3.1. SMOTE and Following Extensions for Balancing Datasets

The Synthetic Minority Over-Sampling Technique (SMOTE), proposed by Chawla et al. in [21], oversamples the minority class by creating synthetic observations. The oversampling is performed iteratively: the observations of the minority class are visited in turn until the classes are balanced. For each visited observation, its k nearest neighbours are found, one of them is randomly selected, and a synthetic observation is generated between the two; the algorithm continues until every required synthetic observation has been built. In the proposed methodology, an imbalanced dataset is characterized by its imbalance ratio ($IR$), which is the quotient between the number of healthy class observations and the number of faulty observations of each rotor state. A new observation, denoted as $x_{n+1}$, is generated from the features of the original observation $x_{i}$ and of its randomly selected neighbour $x_{j}$. Each new feature value is obtained by adding to $x_{i}$ the difference between $x_{j}$ and $x_{i}$ multiplied by a random number r between 0 and 1, as indicated in (8).

$$x_{n+1} = x_{i} + r (x_{j} - x_{i}), \quad 0 \leq r \leq 1 \tag{8}$$
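The interpolation in Equation (8) can be illustrated with a short Python sketch. It is a simplified version of SMOTE that visits the minority observations cyclically and uses the Euclidean distance for the neighbour search; the function name `smote`, the seeded generator, and the example feature values (slip and LSH amplitude in dB) are hypothetical.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic observations from a list of minority feature vectors."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    synthetic = []
    for t in range(n_new):
        i = t % len(minority)  # visit minority observations in turn
        x_i = minority[i]
        # k nearest minority neighbours of x_i (excluding x_i itself)
        others = [j for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda j: sq_dist(x_i, minority[j]))[:k]
        x_j = minority[rng.choice(neighbours)]
        r = rng.random()  # random number between 0 and 1
        # Equation (8): x_new = x_i + r (x_j - x_i), applied feature by feature
        synthetic.append([a + r * (b - a) for a, b in zip(x_i, x_j)])
    return synthetic

# Hypothetical minority observations: [slip, LSH amplitude in dB]
minority = [[0.030, -40.0], [0.035, -38.5], [0.040, -37.0], [0.045, -39.0]]
new_points = smote(minority, n_new=8, k=2, seed=0)
```

Because every synthetic observation lies on the segment between two existing minority observations, the new points stay within the bounding box of the original minority data.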

This random multiplication is the main disadvantage of SMOTE when used for diagnosis purposes. The borderline-SMOTE (BLS) method, in contrast, acts selectively, and it served as inspiration for the density-based SMOTE described later. BLS focuses on observations close to the boundary areas, which are prone to misclassification. It defines a danger group of minority instances at risk of misclassification and takes these instances as candidates for oversampling.

3.2. Safe Level-SMOTE

The Safe-Level SMOTE algorithm is an extension of the original SMOTE method. This version considers the “safe” observations found in the minority class [22]. The “safe” group is drawn from the set of minority class observations that k-NN finds by computing the Euclidean distance. Observations that fall outside this “safe” group are considered noise. For this reason, a “safe” level ratio ($slr$) needs to be defined to obtain synthesized candidates. This ratio is defined as $slr = slp / sln$, where $slp$ is the “safe” level of a given observation $x_{i}$ and $sln$ is the “safe” level of its selected neighbour $x_{j}$. In general, only candidates where both $x_{i}$ and $x_{j}$ are minority class observations are considered. New synthetic observations are generated according to Equation (8), but now r is drawn from the following interval:

$$r \in \begin{cases} (0, 1/slr), & \text{if } slr \geq 1 \\ (1 - slr, 1), & \text{if } slr < 1 \end{cases} \tag{9}$$

A ratio of $slr = 1$ means that $x_{i}$ and $x_{j}$ are equally “safe”, so the synthetic observation can be placed at random anywhere between them. If the ratio is greater than 1, $x_{i}$ is the safer of the two, and the synthetic observation is generated closer to $x_{i}$. If $slr < 1$, $x_{j}$ is the safer one, and the synthetic observation is generated closer to $x_{j}$.
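The interval selection in Equation (9) can be sketched in Python as follows. The function names `gap_range` and `safe_level_sample` are hypothetical, and the safe levels `slp` and `sln` are taken as given inputs; the case $sln = 0$ is not covered by the description above and is deliberately left out of this sketch.

```python
import random

def gap_range(slr):
    """Interval from which r is drawn, following Equation (9)."""
    if slr <= 0:
        raise ValueError("slr must be positive in this sketch")
    if slr >= 1:
        return 0.0, 1.0 / slr   # stay near x_i, the safer observation
    return 1.0 - slr, 1.0       # stay near x_j, the safer neighbour

def safe_level_sample(x_i, x_j, slp, sln, rng):
    """Generate one synthetic observation between x_i and x_j (Equation (8))."""
    low, high = gap_range(slp / sln)  # assumes sln > 0
    r = rng.uniform(low, high)
    return [a + r * (b - a) for a, b in zip(x_i, x_j)]

rng = random.Random(0)
# Hypothetical case: x_i is four times "safer" than x_j, so slr = 4
point = safe_level_sample([0.0, 0.0], [1.0, 1.0], slp=4, sln=1, rng=rng)
```

With $slr = 4$, r is restricted to the interval between 0 and 0.25, so the synthetic observation stays in the quarter of the segment nearest to $x_{i}$.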

3.3. Relocating Safe Level SMOTE

A limitation of the Safe-Level SMOTE method is that it may place some synthetic observations too close to nearby majority class observations, leading to classifier confusion [23]. This happens because the location of the surrounding majority class observations is not considered in Safe-Level SMOTE. To prevent this situation, a later extension called Relocating Safe-Level SMOTE (RSLS) was proposed in [23]. RSLS is conceived as a procedure to relocate synthetic observations that lie too close to majority class observations, moving them towards minority class observations. The approach follows the steps of Safe-Level SMOTE, starting with finding the k-NN of each minority class observation and computing its corresponding ratios. Once the observation O and its minority class neighbour $\hat{m}$ are determined, their “safe” level values and their safe level ratio are used to determine the range in which the synthetic observation may be placed. Once the location of the synthetic observation is randomly selected from this range, it is used to calculate the distance between the new synthetic observation and the surrounding majority class observations. These distances are compared with the distance from the synthetic observation to its nearest point, O or $\hat{m}$. If the distance between the synthetic observation and one of the surrounding majority class observations is less than the distance to its nearest point, the synthetic observation is moved closer to this point until it is clear of the surrounding majority class observations.
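The relocation rule described above can be illustrated with a small Python sketch. The way the synthetic observation is moved (halving its distance to the nearest of O or $\hat{m}$ at each step) is an assumption made here for simplicity, as are the function name `relocate` and the example coordinates; the original RSLS procedure in [23] may move the observation differently.

```python
import math

def relocate(synthetic, o, m_hat, majority, step=0.5, max_iter=50):
    """Move a synthetic observation towards its nearest minority point
    (o or m_hat) until no majority observation is closer than that point."""
    anchor = min((o, m_hat), key=lambda p: math.dist(synthetic, p))
    point = list(synthetic)
    for _ in range(max_iter):
        d_anchor = math.dist(point, anchor)
        # Stop when every majority observation is at least as far as the anchor
        if all(math.dist(point, q) >= d_anchor for q in majority):
            break
        # Assumed update rule: cover a fraction `step` of the remaining distance
        point = [a + step * (b - a) for a, b in zip(point, anchor)]
    return point

# Hypothetical 2-D example: one majority observation sits next to the synthetic one
o, m_hat = [0.0, 0.0], [1.0, 0.0]
majority = [[0.65, 0.05]]
moved = relocate([0.6, 0.0], o, m_hat, majority)
```

In this example the synthetic observation starts at 0.6 on the segment, is nearer to $\hat{m}$ than to O, and is moved along the segment towards $\hat{m}$ until the majority observation is no longer its closest neighbour.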

3.4. Density-Based SMOTE

This method is inspired by the borderline-SMOTE. This technique applies density-based clustering by combining it with SMOTE. The synthetic observations are generated by calculating the shortest path from each instance of the minority class to the pseudo-centroid of the cluster corresponding to the minority class.

3.5. Majority Weighted Minority Oversampling Technique

The method was introduced in [24] and is also known as MWMOTE. The idea behind it is to ensure that the synthetically generated minority observations fall into the correct class cluster. To achieve this, a mechanism calculates the weight according to the distance to the nearest majority class point. Then, synthetic examples are generated from the minority class instance with weighted information using the clustering method.

Figure 3 illustrates the operation of the MWMOTE algorithm for synthetically generating data from the minority class. The figure shows two scatter plots presenting the amplitude of the LSH failure indicator as a function of motor slip (which also represents the motor load level). The green circles represent the majority class, i.e., the LSH value when the motor is healthy. The blue points represent the minority class, i.e., the LSH value when the motor experiences a broken bar. The synthetic values generated using this technique to balance the two classes are represented in red on the right side of the figure (Figure 3b).

4. Test Bench and Signal Acquisition

The test bench designed for this research is shown in Figure 4. The IM is fed from the line supply and from inverters whose specifications are given in Appendix A. In both cases, the fundamental voltage frequency is 50 Hz. The motor load is a magnetic powder brake with its control unit. A Hall effect current transducer by LEM® (Meyrin, Switzerland) is used to measure the stator current. A National Instruments® (Austin, TX, USA) NI cDAQ-9174 base platform with an NI 9215 acquisition module is used for the signal collection. The sampling frequency is 50 kHz.

The rotor fault is simulated by drilling a hole in one of the bars near the short circuit ring. Three rotor condition states are considered: (i) the rotor is healthy in the first tests, a condition labeled as R1; (ii) after that, a 4.2 mm deep hole is drilled in one of the bars, an intermediate-severity rotor condition labeled as R3; (iii) finally, a more severe rotor fault is provoked by drilling through the whole rotor bar, a condition labeled as R5. The stator current signal is collected for 10 s while the motor is operating in a steady state with a constant load torque. During these tests, the motor slip is in the range of 3–5%, according to the motor specifications mentioned in Appendix A.

Different tests, considering the type of power supply, the state of health of the rotor, and the load level of the motor, have been carried out to develop an automatic steady-state diagnosis tool.

5. Experimental Results

The diagnosis methodology comprises a feature selection stage, the training of a supervised classifier, and finally, an evaluation phase, in which the global performance of the diagnosis tool is assessed. Table 2 shows the main characteristics of the dataset. The columns describe the voltage supply, the fault severities, the imbalance ratio (IR), and the number of samples for the healthy and faulty cases, also expressed as a percentage. The last column is a label for each case.

The IR for each case is the ratio between the number of healthy and faulty class instances of each rotor condition. The IRs considered in this study are those indicated in the corresponding column in Table 2. This selection of IRs allows the influence of the level of imbalance to be analysed for this particular small data problem. The chosen IRs depend on the observations from the experimental trials, which are then processed to obtain the characteristic fault features or variables.
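As a small worked example of this definition, the Python sketch below computes the IR and the number of synthetic faulty observations needed to reach a balanced training set. The sample counts are hypothetical, chosen only so that the resulting IRs are 12, 7, and 3; the actual counts are those listed in Table 2.

```python
def imbalance_ratio(n_healthy, n_faulty):
    """IR: number of healthy observations divided by number of faulty ones."""
    return n_healthy / n_faulty

def synthetic_needed(n_healthy, n_faulty):
    """Synthetic faulty observations required to balance the two classes."""
    return max(n_healthy - n_faulty, 0)

# Hypothetical counts (not taken from Table 2)
n_healthy = 84
summary = {
    n_faulty: (imbalance_ratio(n_healthy, n_faulty), synthetic_needed(n_healthy, n_faulty))
    for n_faulty in (7, 12, 28)
}
```

For instance, with 84 healthy and 7 faulty observations the IR is 12, and 77 synthetic faulty observations would be required to reach a one-to-one balance.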

5.1. Feature Selection

After applying the ReliefF algorithm, the feature selection results for both power supplies are shown in Figure 5. In this figure, the red dashed line is the threshold above which features are selected for the classification stage. Although the choice of threshold is arbitrary, it is set to 0.06, representing the percentile of best-scoring features to select. The analysis of this figure provides some interesting results. The algorithm highlights more relevant features for the line supply case than for the inverter-fed case. The behaviour of the $m_{3}$ and $c_{3}$ statistics is also noticeable: they were the worst features in the analysis with line-fed IM data but are selected for the inverter-fed case. The loss of relevance of $Skew$ in the inverter-fed IM case is also appreciable. However, as expected, $LSH$ and $RSH$ are the most significant features for both power supply cases. Furthermore, the $\Gamma$ features proposed in [17] do not show outstanding relevance for the diagnosis.
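A thresholding step of this kind can be sketched in a few lines of Python. The sketch assumes that the threshold of 0.06 is compared directly with the ReliefF weight of each feature; the scores in the dictionary are invented for illustration and are not the values plotted in Figure 5.

```python
def select_features(scores, threshold=0.06):
    """Return the names of features whose score exceeds the threshold, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, weight in ranked if weight > threshold]

# Hypothetical ReliefF weights (illustrative values only)
scores = {"LSH": 0.21, "RSH": 0.18, "Skew": 0.07, "Gamma5": 0.02, "Gamma7": 0.01}
selected = select_features(scores)
```

Only the features above the threshold are passed on to the resampling and classification stages.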

5.2. Effect on the Feature Subspace

After the feature selection stage, the IR is improved by generating synthetic data for the minority class using the different techniques previously described: SLSMOTE, RSLSMOTE, and MWMOTE. The balanced datasets are shown in Figure 6 and Figure 7. The first figure is for the line-fed case, and the second is for the inverter-fed case. In both figures, the LSH fault feature is plotted against the motor slip for the different fault severities. The rotor conditions are depicted as follows: healthy case (O); original rotor fault data (+); synthetic rotor fault data (+, in red).

In the first row of each figure, two rotor conditions are represented: healthy (R1) and intermediate severity (R3). In the second row, the two conditions plotted are healthy (R1) and fully broken bar (R5). In all cases, the healthy observations outnumber the faulty ones, producing an IR of 12.

The resampling results of each algorithm are shown with red markers. For the line supply case, the synthetic data generated with the SMOTE technique show dispersed decision boundaries for the two rotor fault severities (R3 and R5). However, this does not happen with the Safe-Level SMOTE and Relocating Safe-Level SMOTE variants. In the inverter-fed case, the results are quite similar.

5.3. Classification Stage

After balancing the datasets, three classification techniques are trained: Support Vector Machine (SVM) with a Gaussian (radial basis function) kernel, Adaptive Boosting (AdaBoost), and instance-based k-nearest neighbours (k-NN). The tuning parameters are $C = 1$ for SVM, and the number of neighbours for both k-NN and SMOTE is set to $k = 5$. Different combinations of tuning parameters were explored beforehand during the parameter tuning procedure. The amount of oversampling is predetermined from the beginning for the imbalance ratios indicated in Table 2.
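Of the three classifiers, k-NN is the simplest to illustrate. The following Python sketch shows a majority-vote k-NN prediction with $k = 5$ and the Euclidean distance; the training points (slip and LSH amplitude in dB), the absence of feature scaling, and the function name `knn_predict` are assumptions for illustration and do not reproduce the classifier configuration of this study.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Predict the label of x by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical balanced training set: [slip, LSH amplitude in dB]
train_X = [
    [0.030, -61.0], [0.034, -60.0], [0.038, -59.5],
    [0.042, -60.5], [0.046, -58.5], [0.050, -59.0],   # healthy (R1)
    [0.030, -41.0], [0.034, -40.0], [0.038, -39.5],
    [0.042, -40.5], [0.046, -38.5], [0.050, -39.0],   # broken bar (R5)
]
train_y = ["R1"] * 6 + ["R5"] * 6

label_faulty = knn_predict(train_X, train_y, [0.040, -42.0])
label_healthy = knn_predict(train_X, train_y, [0.040, -58.0])
```

In practice the features would normally be scaled before computing distances, since here the amplitude in dB dominates the slip.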

6. Discussion

The twelve case studies are summarized in Table 2, while Table 3 presents the metrics used to evaluate the classification results of these cases. The performance of three classification algorithms is compared on datasets obtained after variable selection and class balancing with three oversampling algorithms. The comparison metrics, obtained through a 10-fold cross-validation process, are the Area Under the ROC Curve (AUC) and G-mean1. AUC is derived from ROC analysis, which evaluates the performance of classifiers by plotting the True Positive Rate against the False Positive Rate while varying a score threshold. It is a scalar value representing the probability that the classifier scores a randomly chosen fault instance higher than a randomly chosen healthy one, providing a single score for evaluating which algorithm is better on average.

$G_{mean1}$ is used to evaluate the balance of classifier performance between the two classes, negative (Healthy) and positive (Fault), and is computed as:

$$G_{mean1} = \sqrt{TPR \times TNR}, \tag{10}$$

where $TPR$ and $TNR$ are the True Positive Rate and the True Negative Rate, respectively.
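Both metrics can be computed with a few lines of Python. The sketch below evaluates Equation (10) from confusion-matrix counts and estimates the AUC as the fraction of fault/healthy pairs in which the fault instance receives the higher score (ties count as one half). The counts, the scores, and the function names are hypothetical and are not results from Table 3.

```python
import math

def g_mean1(tp, fn, tn, fp):
    """Geometric mean of the True Positive Rate and the True Negative Rate."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return math.sqrt(tpr * tnr)

def auc(fault_scores, healthy_scores):
    """Probability that a fault instance is scored above a healthy one."""
    wins = 0.0
    for p in fault_scores:
        for q in healthy_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(fault_scores) * len(healthy_scores))

# Hypothetical confusion-matrix counts and classifier scores
g = g_mean1(tp=9, fn=1, tn=8, fp=2)             # sqrt(0.9 * 0.8)
a = auc([0.9, 0.8, 0.3], [0.4, 0.2, 0.1])       # 8 of the 9 pairs are ordered correctly
```

Unlike plain accuracy, $G_{mean1}$ drops to zero if either class is never recognised, which makes it suitable for imbalanced problems.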

For study cases 8 and 9, the combination of MWMOTE for class balancing and the AdaBoost classifier performs better than the other oversampling algorithms and classifiers. These cases correspond to the inverter-fed motor with intermediate fault severity (R3) and imbalance ratios IR = 7 and IR = 3, respectively. However, when the imbalance ratio increases, every classifier and oversampling tool performs quite similarly, i.e., no statistically significant difference is found in their results. The statistical significance is checked with the Holm test, which is applied to pairwise performance comparisons among the oversampling approaches; the resulting p-values are shown in Table 3. The null hypothesis is rejected when the p-value falls below a threshold. This outcome implies that the corresponding SMOTE extension is outperformed by the remaining ones. This situation is highlighted with an asterisk next to the p-value and is found to occur only for the MWMOTE approach. Studies 4–6 and 10–12 cover both power sources, the maximum fault severity (R5), and all class imbalance ratios. The classification metrics obtained are good for the various combinations of oversampling techniques and classification algorithms.
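For reference, the step-down procedure behind the Holm test can be sketched in Python as follows. The significance level of 0.05 and the p-values are assumptions for illustration; they are not the values reported in Table 3.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: return, for each p-value, whether its
    null hypothesis is rejected at family-wise level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p-value first
    reject = [False] * m
    for rank, i in enumerate(order):
        # The threshold is alpha / m for the smallest p-value, then alpha / (m - 1), ...
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one hypothesis is retained, all remaining ones are retained
    return reject

# Hypothetical p-values for three pairwise comparisons
decisions = holm_reject([0.01, 0.04, 0.03])
```

Here only the first comparison is rejected: 0.01 is below 0.05/3, but the next smallest p-value, 0.03, exceeds 0.05/2, so the procedure stops.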

7. Conclusions

This article reviews and applies several resampling techniques for diagnosing rotor faults in line-fed and inverter-fed induction motors. These resampling techniques are SMOTE extensions and produce synthetic observations belonging to the minority fault class for balancing training sets and improving the synthetic data generation according to boundary area behavior.

The evaluation of classifier performance with the $AUC$ and $G_{mean1}$ metrics shows that, for the highest severity of the broken rotor bar fault (R5), all the resampling techniques considered behave similarly. In terms of performance, any of them can therefore yield successful diagnostic results regardless of the motor power supply and the imbalance ratio analyzed.

Therefore, incorporating any of these oversampling techniques into the diagnosis methodology used to predict the rotor's condition makes it possible to build a more precise and robust diagnosis of rotor faults in inverter-fed induction motors from small and imbalanced data.

In addition, this experimental study points to a direction for further work: applying these oversampling methods to more incipient rotor breakages, where the feature differences with respect to the healthy class are much smaller.

Author Contributions

Conceptualization, I.M.-D. and D.M.-S.; data curation, I.M.-D.; methodology, I.M.-D. and D.M.-S.; software, T.G.-C.; validation, D.M.-S. and T.G.-C.; writing—original draft, I.M.-D.; writing—reviewing and editing, D.M.-S. and Ó.D.-P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the University of Valladolid.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request.

Acknowledgments

The authors are grateful for the institutional support received from the University of Valladolid as well as the Department of Electrical Engineering, which has made it possible to have the necessary material and human resources, who, in one way or another, contributed to the experimental design. In addition, we would like to thank the editor and reviewers for taking the time and effort necessary to review the manuscript. We sincerely appreciate all valuable comments and suggestions, which helped us to improve the quality of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AUC	Area Under the ROC Curve
BRB	Broken Rotor Bar
DBSMOTE	Density-Based Synthetic Minority Oversampling Technique
FD	Fault Data
HT	Holm Test
k-NN	k-Nearest Neighbors
MCSA	Motor Current Signature Analysis
MWMOTE	Majority weighted minority oversampling technique
RSLS	Relocating Safe-Level SMOTE
SLSMOTE	Safe-Level SMOTE
SMOTE	Synthetic Minority Oversampling Technique
SVM	Support Vector Machine

Appendix A. Equipment Specifications

Three-phase asynchronous squirrel cage motor (Siemens®, Munich, Germany), star connected. Nominal characteristics: power = 0.75 kW, f = 50 Hz, voltage = 400 V, current = 1.9 A, nominal speed = 1395 rpm, and two pole pairs. Three three-phase inverters from different brands were used. The first is an ABB® (model ACS355, Zürich, Switzerland), the second is an Allen Bradley® (model PowerFlex 40, Milwaukee, WI, USA), and the third is a Telemecanique® (model Altivar 66, Rueil-Malmaison, France), with power ranges of 0.37 to 4 kW, 0.4 to 2.2 kW, and 0.75 to 250 kW, respectively. The switching frequency of all inverters is 4 kHz.


Figure 1. Power spectral density of a measured stator current from a line-fed induction motor.

Figure 2. Power spectral density of a measured stator current from an inverter-fed induction motor.

Figure 3. Demonstration of the operation of the MWMOTE algorithm. A dataset from an inverter-fed motor is used. The green markers represent the majority class (healthy motor), and the blue markers represent the minority class (motor with failure). (a) Case with imbalanced classes; (b) synthetically balanced case (synthetically created markers are shown in red).

Figure 4. Test bench employed in this research. (1) Induction motor; (2) magnetic powder brake and load control unit; (3) custom-made Hall effect sensor board; (4) DAQ board; (5) laptop and software; (6) inverters; (7) simulated broken rotor bars.

Figure 5. Results of the feature selection with the ReliefF algorithm: line-fed case (top) and inverter-fed case (bottom).

Figure 6. Bidimensional feature scatterplot (slip and LSH) for synthetically balanced data from line supply with the resampling techniques Safe-Level SMOTE (SLSMOTE), Relocating Safe-Level SMOTE (RSLSMOTE), and Majority Weighted Minority Oversampling Technique (MWMOTE). Imbalance ratio: 12. Top figures: R3 rotor severity. Bottom figures: R5 rotor severity. The different rotor conditions are depicted as follows: healthy case (O); original rotor fault data (+); synthetic rotor fault data (+).

Figure 7. Bidimensional feature scatterplot (slip and LSH) for synthetically balanced data from inverter-based supply with the resampling techniques Safe-Level SMOTE (SLSMOTE), Relocating Safe-Level SMOTE (RSLSMOTE), and Majority Weighted Minority Oversampling Technique (MWMOTE). Imbalance ratio: 12. Top figures: R3 rotor severity. Bottom figures: R5 rotor severity. The different rotor conditions are depicted as follows: healthy case (O); original rotor fault data (+); synthetic rotor fault data (+).

Table 1. Statistics computed from the time-domain signal.

Variables in Time-Domain
Variable	Notation	Formula
1st. Moment	$m_{1}$	$\frac{1}{N} \sum x[n]$
2nd. Moment	$m_{2}$	$\frac{1}{N} \sum {(x[n] - \bar{x}[n])}^{2}$
3rd. Moment	$m_{3}$	$\frac{1}{N} \sum {(x[n] - \bar{x}[n])}^{3}$
4th. Moment	$m_{4}$	$\frac{1}{N} \sum {(x[n] - \bar{x}[n])}^{4}$
2nd. Cumulant	$c_{2}$	$m_{2} - m_{1}^{2}$
3rd. Cumulant	$c_{3}$	$m_{3} - 3 \cdot m_{1} m_{2} + 2 \cdot m_{1}^{3}$
4th. Cumulant	$c_{4}$	$m_{4} + 3 m_{3} m_{1} - 3 \cdot m_{2}^{2} + 12 \cdot m_{2} m_{1}^{2} - 6 \cdot m_{1}^{4}$
Skewness	$Skew$	$\frac{m_{3}}{{(\sqrt{m_{2}})}^{3}}$
Kurtosis	$Kurt$	$\frac{m_{4}}{{(\sqrt{m_{2}})}^{4}}$
Absolute mean (ABM)	$|\bar{x}|$	$\frac{1}{N} \sum \bar{x}$
Peak value	PV	$\frac{1}{2} (\max(x[n]) - \min(x[n]))$
Squared root value	SRV	${(\frac{1}{N} \sum \sqrt{|x|})}^{2}$
RMS value	RMS	$\sqrt{\frac{1}{N} \sum_{n = 0}^{N - 1} {[x[n] - \bar{x}]}^{2}}$
Crest factor	CF	$\frac{PV}{RMS}$
Shape factor	SF	$\frac{RMS}{|\bar{x}|}$
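
As an illustration of how the statistics in Table 1 might be computed from a sampled signal, the sketch below implements a subset of them in plain Python. It is not the authors' feature-extraction code. The function name `time_domain_features` and the 50 Hz test sine are assumptions, and the absolute mean is read here as the mean of $|x[n]|$, which is an interpretation of the table rather than something the source states. The RMS follows the table's definition, taken about the mean.

```python
import math


def time_domain_features(x):
    """Compute a subset of the Table 1 statistics for a sampled signal x."""
    n = len(x)
    mean = sum(x) / n                                    # 1st moment, m1
    m2 = sum((v - mean) ** 2 for v in x) / n             # 2nd moment about the mean
    m3 = sum((v - mean) ** 3 for v in x) / n             # 3rd moment about the mean
    m4 = sum((v - mean) ** 4 for v in x) / n             # 4th moment about the mean
    abs_mean = sum(abs(v) for v in x) / n                # ABM (assumed: mean of |x[n]|)
    peak = 0.5 * (max(x) - min(x))                       # PV
    srv = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2   # SRV
    rms = math.sqrt(m2)                                  # RMS as defined in Table 1
    return {
        "m1": mean, "m2": m2, "m3": m3, "m4": m4,
        "skew": m3 / math.sqrt(m2) ** 3,
        "kurt": m4 / math.sqrt(m2) ** 4,
        "abm": abs_mean, "pv": peak, "srv": srv, "rms": rms,
        "cf": peak / rms,        # crest factor
        "sf": rms / abs_mean,    # shape factor
    }


# Hypothetical test signal: one second of a 50 Hz sine sampled at 1 kHz.
fs, f0 = 1000, 50
signal = [math.sin(2 * math.pi * f0 * k / fs) for k in range(fs)]
features = time_domain_features(signal)
print({name: round(value, 4) for name, value in features.items()})
```

For a pure sine the crest factor is close to $\sqrt{2}$ and the kurtosis close to 1.5, which gives a quick sanity check before applying the function to measured stator currents.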

Table 2. Description of the experimental datasets.

Supply	Severities	Imbalance Ratio (IR)	#Samples (Healthy–Faulty)	% (Healthy–Faulty)	Case Number
Line-fed	R1–R3	12	195–15	92.85–07.15	1
	R1–R3	7	105–15	87.50–12.50	2
	R1–R3	3	45–15	75.00–25.00	3
Line-fed	R1–R5	12	195–15	92.85–07.15	4
	R1–R5	7	105–15	87.50–12.50	5
	R1–R5	3	45–15	75.00–25.00	6
Inverter-fed	R1–R3	12	195–15	92.85–07.15	7
	R1–R3	7	105–15	87.50–12.50	8
	R1–R3	3	45–15	75.00–25.00	9
Inverter-fed	R1–R5	12	195–15	92.85–07.15	10
	R1–R5	7	105–15	87.50–12.50	11
	R1–R5	3	45–15	75.00–25.00	12

Table 3. Classification results with feature selection and oversampling with SLSMOTE, RSLSMOTE, and MWMOTE.

Classifier	Supply	Case Number	Severity	Imbalance Ratio (IR)	AUC (%)	p-Value	Gmean1 (%)
SVM	Line-fed	1	R3	12	$97 \pm 2.55$	$0.085$	$98 \pm 1.58$
	Line-fed	2	R3	7	$97 \pm 2.52$	$0.251$	$97 \pm 2.63$
	Line-fed	3	R3	3	$97 \pm 1.36$	$0.651$	$97 \pm 2.14$
	Line-fed	4	R5	12	$100 \pm 0.00$	$0.156$	$100 \pm 0.00$
	Line-fed	5	R5	7	$100 \pm 0.00$	$0.746$	$100 \pm 0.00$
	Line-fed	6	R5	3	$100 \pm 0.00$	$0.365$	$100 \pm 0.00$
	Inverter	7	R3	12	$97 \pm 2.11$	$0.711$	$97 \pm 2.77$
	Inverter	8	R3	7	$96 \pm 1.23$	<0.01	$97 \pm 2.98$
	Inverter	9	R3	3	$96 \pm 2.58$	<0.01	$97 \pm 2.73$
	Inverter	10	R5	12	$100 \pm 0.00$	$0.411$	$100 \pm 0.00$
	Inverter	11	R5	7	$100 \pm 0.00$	$0.709$	$100 \pm 0.00$
	Inverter	12	R5	3	$100 \pm 0.00$	$0.177$	$100 \pm 0.00$
AdaBoost	Line-fed	1	R3	12	$97 \pm 1.23$	$0.785$	$98 \pm 1.42$
	Line-fed	2	R3	7	$97 \pm 2.58$	$0.150$	$97 \pm 2.40$
	Line-fed	3	R3	3	$98 \pm 1.32$	$0.255$	$98 \pm 1.84$
	Line-fed	4	R5	12	$100 \pm 0.00$	$0.456$	$100 \pm 0.00$
	Line-fed	5	R5	7	$100 \pm 0.00$	$0.606$	$100 \pm 0.00$
	Line-fed	6	R5	3	$100 \pm 0.00$	$0.865$	$100 \pm 0.00$
	Inverter	7	R3	12	$97 \pm 1.10$	$0.545$	$97 \pm 2.66$
	Inverter	8	R3	7	$97 \pm 1.23$	<0.01 *	$97 \pm 2.09$
	Inverter	9	R3	3	$98 \pm 2.58$	<0.01 *	$97 \pm 2.20$
	Inverter	10	R5	12	$100 \pm 0.00$	$0.811$	$100 \pm 0.00$
	Inverter	11	R5	7	$100 \pm 0.00$	$0.709$	$100 \pm 0.00$
	Inverter	12	R5	3	$100 \pm 0.00$	$0.387$	$100 \pm 0.00$
k-NN	Line-fed	1	R3	12	$97 \pm 1.02$	$0.113$	$98 \pm 1.88$
	Line-fed	2	R3	7	$96 \pm 2.52$	$0.852$	$97 \pm 2.81$
	Line-fed	3	R3	3	$97 \pm 2.02$	$0.673$	$97 \pm 2.88$
	Line-fed	4	R5	12	$98 \pm 1.56$	$0.555$	$100 \pm 0.00$
	Line-fed	5	R5	7	$100 \pm 0.00$	$0.143$	$100 \pm 0.00$
	Line-fed	6	R5	3	$100 \pm 0.00$	$0.861$	$100 \pm 0.00$
	Inverter	7	R3	12	$97 \pm 1.10$	$0.219$	$97 \pm 2.97$
	Inverter	8	R3	7	$95 \pm 3.32$	<0.01 *	$97 \pm 2.62$
	Inverter	9	R3	3	$96 \pm 3.88$	<0.01	$96 \pm 3.66$
	Inverter	10	R5	12	$100 \pm 0.00$	$0.815$	$100 \pm 0.00$
	Inverter	11	R5	7	$100 \pm 0.00$	$0.113$	$100 \pm 0.00$
	Inverter	12	R5	3	$100 \pm 0.00$	$0.770$	$100 \pm 0.00$

* Outcome of the test from pairwise performance where MWMOTE outperforms the rest.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
