1. Introduction
Induction motors are a fundamental part of many production processes due to their inherent robustness, low cost, and reliability, among other advantages. However, they are not fault-free, with bearings being the component that accounts for the greatest percentage of total failures [1].
The signals most frequently used for bearing fault detection are vibration and acoustic noise [2]. However, using the stator current to monitor the motor provides some practical advantages related to the simplicity and noninvasive nature of the sensors. These advantages are especially relevant in industrial facilities, where several motors may run simultaneously [3,4]. The use of the current has proven its effectiveness in detecting faults such as broken bars and eccentricity [2], but in the case of faulty bearings it faces technical difficulties that hinder its successful implementation. Mainly, the vibrations associated with the fault have low energy, which makes it difficult to distinguish in the current spectrum the fault-related frequency components, which may be buried in the noise [1,5,6]. Besides, for inverter-fed motors, the noise is higher and other harmonics are present in the spectrum, which further complicates the detection of the fault-related components [7]. Consistently, in [8], denoising techniques are applied to highlight the fault components in the current spectrum. Other advanced spectral techniques have also been proposed, such as wavelets [9,10], Short-Time Fourier Transform [11], Gabor spectrogram [11], Hilbert–Huang Transform [12,13], Empirical Mode Decomposition [14], Ensemble Empirical Mode Decomposition [15], Modulation Signal Bispectrum [16], Spectral Kurtosis [17], Spectral Subtraction [18], and the space vector angular fluctuation method [19]. These techniques have the drawback of a high computational cost. Parametric methods that assume the signal fits a particular model have also been proposed, such as MUSIC (Multiple Signal Classification) [14,20] and maximum likelihood estimation [21]. These methods yield good results as long as the signal really follows the assumed model. Besides, when analyzing inverter-fed motors, the spectra are noisier and the harmonics injected by the inverter have to be considered too, which makes it difficult to apply the mentioned techniques.
The first stage, the detection of the fault, provides fault signatures that feed the second stage of the process, the diagnosis. A wide variety of algorithms have been proposed to diagnose faulty bearings, such as Artificial Neural Networks [22,23], Support Vector Machines [12,24,25], K-nearest neighbors [26,27], supervised fuzzy-neighborhood density-based clustering [28], random forest [29], bagging, boosting, and stacking methods [30], Common Vector Approach [31], Decision Trees [32,33], maximum margin classification [34], Bayes classifier [35], Euclidean Distance Minimization [36], and Bayesian inference [37]. These algorithms were mostly applied using the known fault signatures related to bearing faults, limited to just a few signatures (usually just the sidebands around the main harmonic, as will be shown in Section 2). However, for challenging cases, this is a restricted use of the information available in the spectra. In the case of inverter-fed motors, the information related to sidebands around the harmonics introduced by the supply can also be used [3]. In [38], it is shown how the effects of different types of bearing faults are spread over the spectrum, being in some cases more noticeable for the 3rd and 7th harmonics and, in other cases, for high-frequency odd harmonics; besides, these effects differ depending on the operating characteristics of the motor. In [39], the cases of excessive and defective lubrication are analyzed, showing how these situations produce changes in the amplitudes of different sidebands around different harmonics, with a variation that also depends on the load.
Consequently, in this paper, it is proposed to take advantage of the information available across the spectrum, considering not only the main frequency but also different odd and even harmonics (up to the 11th) and including many more sidebands than is usual in the literature. This way, instead of feeding the algorithm with just a few signatures, as is commonly done, almost one thousand signatures are used in this proposal.
The drawback of feeding the classifier with a high number of signatures is a clear risk of overfitting. Overfitting arises when the model has learned the training data too well, leading to a small error on the training set (used to build the model) but poor prediction ability. Besides, overfitting is intensified by the presence of noise in the data [40] and, precisely, when dealing with inverter-fed motors, a significant presence of noise is to be expected. A solution to minimize the overfitting problem is to apply shrinkage techniques, which shrink the values of the coefficients in the trained model. There are several versions of shrinkage techniques, depending on the degree of shrinkage applied to each coefficient. If some coefficients can be set to zero, the method is known as Lasso (Least Absolute Shrinkage and Selection Operator); consequently, the number of signatures in the classifier is reduced, yielding simpler models. If the values of the coefficients are reduced but all the signatures remain in the model, the method is known as Ridge Regression. The technique known as Elastic Nets combines both Lasso and Ridge Regression.
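The relationship between the three penalties can be sketched in a few lines. The following snippet is only an illustrative sketch (not the implementation used in this work); the mixing parameterization with `alpha` is one common convention, and conventions vary across libraries:

```python
def shrinkage_penalty(coefs, lam, alpha):
    """Penalty added to the least-squares loss.
    alpha = 1 gives the lasso (l1) penalty, alpha = 0 the ridge (l2)
    penalty, and intermediate values the elastic-net mix."""
    l1 = sum(abs(b) for b in coefs)       # sum of absolute values
    l2 = sum(b * b for b in coefs)        # sum of squares
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

coefs = [0.5, -1.5, 0.0, 2.0]
ridge   = shrinkage_penalty(coefs, lam=1.0, alpha=0.0)  # l2 only
lasso   = shrinkage_penalty(coefs, lam=1.0, alpha=1.0)  # l1 only
elastic = shrinkage_penalty(coefs, lam=1.0, alpha=0.5)  # mix of both
```

The tuning parameter λ (here `lam`) controls the overall strength of the shrinkage, while `alpha` selects which of the three methods is obtained.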
To monitor the state of the bearings correctly and be really useful for maintenance purposes, it is essential to distinguish between different states of deterioration and to detect incipient faults before they develop into critical ones. With this purpose, a progressive deterioration of an induction motor bearing has been simulated in the laboratory via the contamination of the lubrication, introducing particles of silicon carbide into the bearing grease. This process tries to emulate conditions usually present in industry that produce bearing wear related to use, to inadequate lubrication, or to the contamination of the grease itself in open ball bearings.
In this paper, the use of a large number of fault signatures, obtained from the current spectra, is proposed to monitor bearing failures. A case study is presented in which five states of deterioration of the bearing are considered, giving rise to a multiclass classification problem. The improvement in the performance of different classifiers when using such fault signatures is shown by comparison with the performance obtained with the signatures usually used in these studies. Then, to deal with the problem of overfitting, shrinkage techniques are applied, comparing the performance of Lasso, Ridge Regression, and Elastic Nets and proving their validity for diagnosing bearing failures.
2. Fault Signatures
When a bearing defect appears, a radial motion between rotor and stator occurs, modifying the airgap of the motor and thus changing the airgap field. These modifications in the airgap can be interpreted as a combination of bidirectional rotating eccentricities [41], which implies that the defect affects the stator current and, therefore, that it can be monitored in the current spectra. The radial motion generates harmonics in the stator current at frequencies given by Equation (1):

fbf = |f1 ± n·fv|,  n = 1, 2, 3, …    (1)

where fbf are the bearing fault frequencies, f1 is the main supply frequency, n is a positive integer, and fv is the characteristic vibration frequency. fv depends on the type of bearing fault (outer race, inner race, ball, or train defect), with expressions that are a function of the geometry and composition of the bearing [7].
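The standard kinematic expressions for the four characteristic vibration frequencies can be sketched as follows (the function name, parameter names, and example geometry are illustrative, not values from this study):

```python
from math import cos, radians

def bearing_fault_freqs(fr, n_balls, ball_d, pitch_d, contact_deg=0.0):
    """Characteristic vibration frequencies fv (Hz) of a rolling-element
    bearing, from its geometry and the rotor speed fr (Hz)."""
    r = (ball_d / pitch_d) * cos(radians(contact_deg))
    return {
        "outer_race": (n_balls / 2.0) * fr * (1.0 - r),               # BPFO
        "inner_race": (n_balls / 2.0) * fr * (1.0 + r),               # BPFI
        "ball": (pitch_d / (2.0 * ball_d)) * fr * (1.0 - r * r),      # BSF
        "cage": (fr / 2.0) * (1.0 - r),                               # FTF
    }

# Illustrative geometry: 9 balls, 7.9 mm ball diameter,
# 38.5 mm pitch diameter, rotor turning at 25 Hz
freqs = bearing_fault_freqs(25.0, 9, 7.9, 38.5)
```

As a sanity check on these expressions, the outer-race frequency equals the cage frequency times the number of balls, and the outer- and inner-race frequencies sum to n_balls·fr.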
The fault frequencies given by Equation (1) result from taking into account the deviations in the main component of the airgap field. When the motor is fed by a power converter, the harmonic level is increased and so is the noise level, hampering the detection of the fault signatures. Nevertheless, the presence of these harmonics can be used to increase the available information, by also considering the deviations produced in the fields as a consequence of these harmonics. Even when the motor is fed directly from the line, since the supply is hardly ever perfectly sinusoidal, the number of fault signatures can also be increased by considering the harmonics introduced by the supply. Therefore, Equation (1) can be generalized as Equation (2):

fbf = |k·f1 ± n·fv|    (2)

where k is the order of the current harmonic.
Considering Equation (2), the number of fault signatures can be increased, resulting in a smaller or larger number of variables depending on the values of k and n. In [7,38], only the first sideband around the 5th and 7th harmonics is employed. In this paper, it is proposed to use a larger number of signatures, considering more harmonics and other sidebands in addition to the first one. In the case study of Section 4, the first eleven current harmonics are used to feed the classifier, considering the first eleven sidebands around those harmonics. As each sideband is composed of two values, and eleven sidebands around eleven harmonics are considered, there are 242 signatures for each characteristic bearing fault frequency, resulting in 968 signatures in total.
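The enumeration of signature frequencies and the resulting count can be sketched as follows (the characteristic frequencies in `char_freqs` are placeholder example values, not the ones of the bearing studied in the paper):

```python
def signature_freqs(f1, fv, k_max=11, n_max=11):
    """Fault-signature frequencies per the generalized expression
    f = |k*f1 +/- n*fv|: k_max supply harmonics, n_max sidebands
    on each side of every harmonic."""
    return [abs(k * f1 + s * n * fv)
            for k in range(1, k_max + 1)
            for n in range(1, n_max + 1)
            for s in (+1, -1)]

f1 = 50.0  # main supply frequency (Hz)
# Illustrative characteristic frequencies for the four fault types (Hz)
char_freqs = {"outer": 89.8, "inner": 135.2, "ball": 59.4, "cage": 9.97}

signatures = {name: signature_freqs(f1, fv) for name, fv in char_freqs.items()}
total = sum(len(v) for v in signatures.values())
# 11 harmonics x 11 sidebands x 2 values = 242 per fault type; 4 x 242 = 968
```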
Table 1 summarizes the information regarding the proposed fault signatures and the comparison with the traditional approach.
3. Diagnosis
The next step after selecting the candidate fault signatures is to choose and train the classifier. Many classification algorithms are available, and a wide variety of them have already been proposed to perform diagnosis tasks in induction motors. To analyze the improvement in classifier performance when using the fault signatures presented in the previous section, the MATLAB 2019a (MathWorks, Natick, MA, USA) Classification Learner app has been used. This app offers different types of classifiers: decision trees, discriminant analysis, logistic regression classifiers, Naïve Bayes classifiers, support vector machines, nearest neighbor classifiers, and ensemble classifiers, with several classifiers available in each group. Using these classifiers, the large increase in performance of all of them when using the 968 fault signatures instead of the usual eight signatures has been proved (as shown in the results section).
However, when using such a high number of signatures with a reduced number of tests, the risk of overfitting is high. Shrinkage techniques allow all the predictors to be used by shrinking the coefficients towards zero, hence reducing variance [42]. If applied to linear models (which have the advantage of interpretability), they work as follows: let xi1, …, xim be the m predictors (or fault signatures, in the context of condition monitoring) and yi the response for each of the n cases of the problem. A linear model tries to estimate the m + 1 coefficients (b0, …, bm). Using a least squares fitting approach, the coefficients bj are selected to minimize (a):

Σi=1..n (yi − b0 − Σj=1..m bj·xij)²    (a)
To perform the shrinkage, a second term, λ·Σj=1..m bj², is added to (a), which acts as a shrinkage penalty. Its influence depends on the value of λ, a tuning parameter that increases or decreases the penalty. For higher values of λ, the penalty grows and the estimated coefficients tend to zero, which implies that the fit is somehow penalized, sacrificing some performance on the training set with the aim of improving the predictive capacity on future observations. The penalty applies to all the coefficients except the intercept, b0, since this term is just an estimate of the mean response when the predictors are zero [42].
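The effect of the λ penalty can be illustrated with the one-predictor case, where the ridge estimate has a simple closed form. This is a minimal sketch with made-up data, assuming the data are centered so that the intercept drops out:

```python
def ridge_coef(xs, ys, lam):
    """One-predictor ridge estimate (data assumed centered):
    minimizing sum_i (y_i - b*x_i)^2 + lam * b^2 gives
    b = sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Made-up centered data roughly following y = 2x
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-4.1, -2.0, 0.1, 1.9, 4.0]

b_ols   = ridge_coef(xs, ys, 0.0)    # ordinary least squares (no penalty)
b_ridge = ridge_coef(xs, ys, 10.0)   # shrunk towards zero, but never exactly zero
```

Increasing `lam` pulls the estimate towards zero, but, as noted above, it never reaches zero: ridge keeps every predictor in the model.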
This way of applying the penalty, performing the shrinkage on the estimated coefficients, is known as Ridge Regression. Its disadvantage is that the shrinkage is applied to all the coefficients but none of them is set exactly to zero, so all the predictors are included in the solution, which for problems with a large number of predictors (as in the problem dealt with in this paper) leads to a loss of interpretability of the model. A way of tackling this problem is to change the penalty term into λ·Σj=1..m |bj|, or, in statistical terms, to replace the l2 penalty with an l1 one [42]. The use of an l1 norm has the inconvenience of making the function to minimize nondifferentiable, although methods are available to carry out the minimization, such as proximal gradient methods [43]. This way of considering the penalty gives rise to the method known as Lasso. As opposed to Ridge Regression, with Lasso some coefficients are canceled, so the method also performs variable selection, with the number of selected variables depending on the value of λ (as λ grows, fewer variables are selected).
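The key ingredient of the proximal gradient approach is the soft-thresholding operator, the proximal operator of the l1 penalty, which is what sets coefficients exactly to zero. The sketch below shows it on the one-predictor case with made-up centered data (illustrative only):

```python
def soft_threshold(z, t):
    """Proximal operator of the l1 penalty: shrinks z towards zero
    and sets it exactly to zero when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_coef(xs, ys, lam):
    """One-predictor lasso (data assumed centered): the minimizer of
    sum_i (y_i - b*x_i)^2 + lam*|b| is the soft-thresholded OLS solution."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam / 2.0) / sxx

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-4.1, -2.0, 0.1, 1.9, 4.0]
b_small_lam = lasso_coef(xs, ys, 1.0)    # mildly shrunk
b_large_lam = lasso_coef(xs, ys, 50.0)   # penalty large enough: coefficient is 0
```

This is precisely the variable-selection behavior described above: once λ is large enough, the coefficient is driven exactly to zero and the corresponding signature drops out of the model.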
Lasso was first applied to linear regression and has lately received much attention, having been proposed to regularize a wide variety of statistical models [44]. In accordance with Occam's razor, simpler models are preferable, as long as they predict the training data well, since they are more likely to generalize to unseen data [45]. With this principle in mind, Logistic Regression has been chosen as the base model to which the shrinkage techniques are applied. Logistic regression is suited to classification problems since it has a discrete outcome. It is based on the logistic function given by Equation (4):

g(z) = 1 / (1 + e^(−z))    (4)

which is suitable for classification since its output runs between 0 and 1 and can therefore be interpreted as a probability, and its elongated S-shape offers the advantage that the same additional input influences the outcome less for values near zero or one [46,47]. For binary classification, a threshold value of 0.5 is defined to assign the outcome to one class or the other, which in condition monitoring would be healthy or faulty. When the aim is to distinguish among different states of failure, there are several classes into which the outcome can be classified. This multiclass classification is performed via the one-versus-all approach, as represented in the flow chart in Figure 1. This way, several binary classifiers are trained (as many as classes), each confronting one class against the rest. Finally, the outcome is assigned to the class with the highest probability.
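The one-versus-all decision rule can be sketched as follows. The class labels and coefficient values below are hand-picked placeholders for illustration, not trained values from the case study:

```python
from math import exp

def sigmoid(z):
    """Logistic function: output in (0, 1), interpretable as a probability."""
    return 1.0 / (1.0 + exp(-z))

def one_vs_all_predict(x, models):
    """One-versus-all decision: `models` maps each class label to its binary
    logistic classifier (b0, [b1..bm]); the sample x is assigned to the class
    whose classifier reports the highest probability."""
    def prob(b0, bs):
        return sigmoid(b0 + sum(b * xi for b, xi in zip(bs, x)))
    return max(models, key=lambda c: prob(*models[c]))

# Three deterioration states, two fault signatures, placeholder coefficients
models = {
    "healthy":        ( 1.0, [-2.0, -2.0]),
    "incipient":      (-0.5, [ 2.0, -1.0]),
    "complete_fault": (-1.0, [ 1.0,  3.0]),
}
label = one_vs_all_predict([0.1, 0.0], models)
```

In the actual procedure, one such binary classifier is trained per deterioration state, each regularized with the shrinkage penalty described above.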
5. Discussion
A procedure for the diagnosis of induction motor bearings has been presented. The main purpose of the proposal is to match the good performance of existing methods that use vibrations or sound as inputs, but using the stator current. So far, monitoring the current has not achieved as good a performance as the use of the other variables mentioned, but since it has some clear advantages related to the necessary sensors, it is advisable to have a procedure that allows the use of the current. To achieve this goal, it has been proposed to take advantage of more of the information that can be extracted from the spectra, beyond what is commonly used, but without the extra computational cost that other techniques, including parametric and non-parametric methods, usually require. Besides, the proposed method is particularly well suited to inverter-fed induction motors, where noisier spectra and significant harmonics and interharmonics are present.
It has been shown that using much more information greatly improves the performance of the diagnosis, as proved by means of 24 classifiers (available in the MATLAB Classification Learner app). However, it must be taken into account that detection and diagnosis are interlinked. There is no use in expecting good diagnosis performance if the fault signatures obtained during the detection process are of bad quality. Conversely, even with highly informative fault signatures, if the diagnosis stage is badly designed, the whole process will suffer. Besides, the chosen algorithm must be in accordance with the available variables. Therefore, a type of classifier has been selected that can perform well under the particular conditions of the problem, where there are many more fault signatures than cases to classify. Shrinkage methods have been chosen since they perform well under those conditions, avoiding the problem of overfitting.
Three shrinkage methods have been compared, Lasso, Ridge Regression, and Elastic Nets, and all of them have proved to achieve very good performance in the cases analyzed. Although all three meet the expectations, Lasso has been chosen for a deeper analysis of its results, since this method selects variables, providing simpler and more interpretable models. For the analysis of the performance of Lasso, the confusion matrices for eight different scenarios have been provided and analyzed. Although from an algorithmic point of view it is important to classify all the states correctly, from a maintenance perspective the presence of false positives or false negatives concerning the healthy and complete-fault conditions is especially relevant. That is, some misclassifications are more relevant than others. For example, wrong predictions between intermediate and incipient fault conditions are not likely to have important repercussions but, on the contrary, a misclassification between the healthy and complete-fault states will surely have further implications. It has been shown that the predictions obtained with the proposed method matched the expectations from a condition monitoring perspective.