Rolling Bearing Fault Diagnosis Based on SABO–VMD and WMH–KNN

Liu, Guangxing; Ma, Yihao; Wang, Na

doi:10.3390/s24155003


by Guangxing Liu ^1,2, Yihao Ma ^1,* and Na Wang

¹ School of Electronic Engineering, Xi’an Shiyou University, Xi’an 710065, China

² Key Laboratory of Measurement and Control Technology for Oil and Gas Wells, Xi’an Shiyou University, Xi’an 710065, China

* Author to whom correspondence should be addressed.

Sensors 2024, 24(15), 5003; https://doi.org/10.3390/s24155003

Submission received: 25 June 2024 / Revised: 27 July 2024 / Accepted: 31 July 2024 / Published: 2 August 2024

(This article belongs to the Section Fault Diagnosis & Sensors)


Abstract:

To improve the performance of roller bearing fault diagnosis, this paper proposes an algorithm based on subtraction average-based optimizer (SABO), variational mode decomposition (VMD), and weighted Manhattan-K nearest neighbor (WMH–KNN). Initially, the SABO algorithm uses a composite objective function, including permutation entropy and mutual information entropy, to optimize the input parameters of VMD. Subsequently, the optimized VMD is used to decompose the signal to obtain the optimal decomposition characteristics and the corresponding intrinsic mode function (IMF). Finally, the weighted Manhattan function (WMH) is used to enhance the classification distance of the KNN algorithm, and WMH–KNN is used for fault diagnosis based on the optimized IMF features. The performance of the SABO–VMD and WMH–KNN models is verified through two experimental cases and compared with traditional methods. The results show that the accuracy of motor-bearing fault diagnosis is significantly improved, reaching 97.22% in Dataset 1, 98.33% in Dataset 2, and 99.2% in Dataset 3. Compared with traditional methods, the proposed method significantly reduces the false positive rate.

Keywords:

bearing; fault diagnosis; variational mode decomposition (VMD); WMH-K nearest neighbor (KNN); subtraction average-based optimizer (SABO)

1. Introduction

The electric oil drilling rig plays a vital and irreplaceable role in the petroleum industry, serving as a cornerstone of oil exploration. Within this critical equipment, rolling bearings hold a position of particular importance as they support and sustain the stable functioning of the rotating machinery components. Nevertheless, the continuous operation under heavy loads, high temperatures, and other influential factors often renders these rolling bearings susceptible to failure, thus posing a significant risk to their lifespan [1,2]. Consequently, the prompt and precise diagnosis of rolling bearing faults in electric drilling rigs holds paramount importance in safeguarding the secure operation of production equipment [3].

In recent years, vibration signal analysis has emerged as a prevalent approach for diagnosing faults in roller bearings. MMM Islam et al. [4] proposed a robust multiple-combination fault diagnosis framework employing an equalization function model for rolling bearings, along with an enhanced single-correlation support vector machines (OAA-MCSVM) classifier. Tan Chao et al. [5] introduced a hybrid framework based on multienvelope teaching optimization (METLBO), integrating parameter-optimized variational mode decomposition (VMD) with an improved support vector machine (ISVM). Zhuang Deyu et al. [6] developed a feature extraction technique employing VMD and sample entropy, accompanied by a refined sequential minimization algorithm using optimal parameters for fault identification. Luo Jianqing et al. [7] proposed a novel rolling bearing fault diagnosis approach, amalgamating adaptive VMD and SR via an improved differential search (IDS) optimization. Ma Jinghua et al. [8] presented a rolling bearing fault diagnosis method integrating an improved VMD adaptive wavelet threshold with noise reduction. Zhenya Quan et al. [9] employed MDE to isolate strong background noise amid weak fault features of rolling bearings, constructing a multi-label k-nearest neighbor (ML-KNN) classifier for recognizing patterns associated with the subtle faults of rolling bearings. Ali Dibaj et al. [10] employed an end-to-end finetuning approach for VMD, a convolutional neural network (CNN), and a novel fault classification scheme to diagnose both single and compound faults in automobile gearbox systems with varying degrees of fault severity. Yuxing Li et al. [11] proposed a variable step size multi-scale single threshold SloEn (VSM-StSloEn), which can not only reflect the complexity of information hidden in different time scales but also make up for the shortcomings of traditional multi-scale processing. Wang Yaping et al. [12] proposed a rolling bearing fault diagnosis method based on the whale optimization algorithm–variational mode decomposition (WOA–VMD) and Graph Attention Network (GAT), utilizing the KNN method to construct graph-structured data. Yuxing Li et al. [13] developed a multivariate SloEn (mvSloEn) and extended it to multi-scale mvSloEn (mvMSloEn), which not only accounts for the correlation of time series complexity within and across channels but also mirrors the complexity of multi-channel time series over multiple scales. While most traditional optimization algorithms mentioned above excel in parameter optimization and fault classification, the optimization algorithms themselves are faced with certain challenges, including slow optimization speeds and susceptibility to local optima, thereby diminishing the accuracy of fault diagnosis. Based on the aforementioned research, this paper employs the subtraction average-based optimizer (SABO) optimization algorithm, known for its straightforward principles and effective optimization outcomes.

Traditional fault diagnosis algorithms exhibit certain limitations in handling fault diagnosis. Lei, Xue et al. [14] and Jiang, Haisheng et al. [15] employed Extreme Learning Machines (ELMs) for fault classification. However, the significant impact of the hidden layer often results in unstable classification performance. Yiyao et al. [16] utilized Long Short-Term Memory (LSTM) for fault classification, but the limitations of a single model can lead to overfitting issues with samples. Building upon this, Guo, Yurong, et al. [17] employed convolutional neural networks (CNNs) and Bidirectional LSTM (BiLSTM) for bearing fault classification. However, the drawback of a lengthy training time arises due to the abundance of fault samples. Zhang, Mei et al. [18] utilized support vector machines (SVMs) to avoid prolonged training times, but the diverse nature of faulty datasets makes it challenging for SVM to find an ideal hyperplane. Kumar, HS et al. [19] used the KNN classifier to address this issue. Nevertheless, since each fault has a different prominent effect, the diagnostic performance for similar fault types may not be ideal. Sun Dingyi and others used a physics-inspired multi-modal machine learning and rotating machinery fault diagnosis based on adaptive correlation fusion [20], but it was not successful enough for multi-source signal processing. Al-Haddad, Luttfi A, et al. used embedded recorded data and stacked machine learning models to fault-diagnose drone actuator damage [21], but the training time was too long for complex application scenarios. Therefore, this paper adopts the weighted Manhattan function (WMH). While considering the Manhattan distance, the WMH introduces weights for each fault type, optimizing the KNN classifier to achieve better performance by accounting for the specificity of each fault. VMD can adaptively decompose signals into several intrinsic mode functions (IMFs) without presetting the number of decomposition layers. 
This feature makes VMD more flexible and efficient when processing complex signals. VMD can effectively separate signals with different frequency components into different IMFs, thus avoiding the modal aliasing problem. This is of great significance for accurately extracting signal features, especially in vibration signal analysis. VMD performs well when processing noisy signals and can effectively extract useful information from the signal and suppress the noise. This makes VMD highly robust and reliable in practical applications. WMH–KNN combines the weighted Manhattan distance and the KNN classifier and is able to introduce the weight of each fault type while considering the Manhattan distance, thereby optimizing the performance of the KNN classifier. This weighting strategy makes the classifier more flexible and accurate when dealing with different types of faults. By introducing weights, WMH can better reflect the importance of different fault characteristics and improve the classification performance of the classifier on complex multi-modal data. KNN itself has a good performance in processing high-dimensional space data. Combined with the weighted Manhattan distance, WMH–KNN can more effectively handle complex data in high-dimensional space and improve classification accuracy. The SABO–VMD–WMH–KNN model is particularly suitable for multi-modal problems and data processing in high-dimensional spaces and can provide powerful solutions. Its superiority is that in the process of optimizing VMD parameters in the fault diagnosis system, the search speed is significantly improved and the dilemma of sub-optimal solutions is avoided, thereby providing strong support for the accuracy and reliability of fault diagnosis. This model performs well in the analysis of motor-bearing vibration signals and can effectively extract relevant information from vibration signals to improve the accuracy and reliability of fault diagnosis. 
By combining SABO, VMD, and WMH–KNN, a motor-bearing fault diagnosis model termed SABO–VMD–WMH–KNN is established. SABO optimizes the VMD parameters to extract optimal modal information from vibration signals, while the WMH–KNN method combines the weighted Manhattan distance with the KNN classifier to classify the extracted features. Together, they form a comprehensive fault diagnosis model designed to extract pertinent information from motor-bearing vibration signals and to enhance the precision and dependability of fault diagnosis.

The method proposed in this article advances existing technology by introducing the SABO algorithm to enhance the accuracy of VMD parameter optimization and by employing the weighted Manhattan distance (WMH) to improve the classification performance of the KNN algorithm. This innovation markedly improves accuracy and robustness in complex mechanical fault diagnosis. Not only does it overcome the limitations of traditional methods, but it also demonstrates superior performance in experiments, indicating its broad potential and value in practical industrial applications. The research results offer new insights and technical support for machine fault diagnosis, highlighting significant economic benefits in improving equipment maintenance efficiency and reducing downtime.

The structural arrangement is as follows: Section 2 will introduce the theoretical basis, including VMD, SABO, and WMH–KNN. Section 3 will describe the research methods used in this study. Section 4 will present and discuss the results from Dataset 1 in detail. Section 5 will present and discuss the results from Dataset 2. Finally, Section 6 will summarize the research conclusions, discuss the limitations of the study, and provide suggestions for future research directions.

2. Theoretical Basis

2.1. Principle of VMD Algorithm

VMD, as highlighted by [22], represents a fully non-recursive adaptive signal decomposition technique utilized for breaking down a signal or dataset into numerous local oscillation modes. Each of these modes corresponds to the local frequency and amplitude of the signal. This particular decomposition characteristic renders VMD well-suited for analyzing non-stationary signals, given its capability to capture transient occurrences and local frequency fluctuations within the signal. The mathematical depiction is as follows:

u_{k}(t) = A_{k}(t) \cos(φ_{k}(t))

(1)

In Formula (1), u_{k}(t) represents the kth modal function, which satisfies Formula (2), A_{k}(t) represents the instantaneous amplitude, and φ_{k}(t) denotes the instantaneous phase.

x(t) = \sum_{k=1}^{n} u_{k}(t)

(2)

In Formula (2), x(t) is the given continuous signal.

After VMD decomposition, a sequence of local oscillation modes, denoted as u_{k}(t), is acquired, with each mode representing a localized feature of the signal [23]. These modes are instrumental in analyzing various aspects, such as the frequency spectrum, transient components, oscillatory characteristics, and more. Among these, the intrinsic mode function (IMF) component features a restricted bandwidth, with its central frequency designated as ω_{k}, and the spectrum displaying sparsity. By formulating the constraint problem, the resulting constrained variational model and constraint model equations are as follows:

\min_{\{u_{k}\}, \{ω_{k}\}} \left\{ \sum_{k} \left\| \partial_{t} \left[ \left( δ(t) + \frac{j}{π t} \right) * u_{k}(t) \right] e^{-j ω_{k} t} \right\|_{2}^{2} \right\} \quad \text{s.t.} \quad \sum_{k} u_{k}(t) = f(t)

(3)

In Formula (3), u_{k} are the IMF components, ω_{k} are their center frequencies, δ(t) is the Dirac function, \partial_{t} is the partial derivative with respect to time t, * denotes convolution, and f(t) is the original signal. Then, the model is solved, and the penalty factor α and the Lagrange multiplier λ are introduced to transform the constrained variational problem into an unconstrained variational problem [24]. The augmented Lagrange expression is obtained as follows:

\begin{aligned} A &= α \sum_{k} \left\| \partial_{t} \left[ \left( δ(t) + \frac{j}{π t} \right) * u_{k}(t) \right] e^{-j ω_{k} t} \right\|_{2}^{2} \\ B &= \left\| f(t) - \sum_{k} u_{k}(t) \right\|_{2}^{2} \\ C &= \left\langle λ(t), f(t) - \sum_{k} u_{k}(t) \right\rangle \\ L(\{u_{k}\}, \{ω_{k}\}, λ) &= A + B + C \end{aligned}

(4)

In Formula (4), the central frequency and bandwidth of each IMF are updated iteratively by the alternating direction method of multipliers (ADMM), and the saddle point of Formula (4) is found as the solution of Formula (3) [25].
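The alternating updates described above can be sketched in a few lines of NumPy. This is a simplified illustration rather than the authors' implementation: it works on the one-sided spectrum returned by `numpy.fft.rfft`, skips the mirror extension that common VMD implementations apply at the signal boundaries, measures frequency in cycles per sample, and initializes the center frequencies uniformly. The function name `vmd` and the parameters `tau`, `tol`, and `max_iter` are assumptions introduced here.

```python
import numpy as np

def vmd(signal, K, alpha, tau=0.0, tol=1e-10, max_iter=500):
    """Simplified VMD sketch on the one-sided spectrum (no mirror extension)."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    freqs = np.fft.rfftfreq(n)                 # normalized frequency, 0 .. 0.5 cycles/sample
    f_hat = np.fft.rfft(signal)                # one-sided spectrum of the input
    u_hat = np.zeros((K, freqs.size), dtype=complex)   # spectra of the K modes
    omega = 0.5 * np.arange(K) / K             # assumption: uniform initial center frequencies
    lam = np.zeros(freqs.size, dtype=complex)  # spectrum of the Lagrange multiplier

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Residual seen by mode k: the signal minus all other modes.
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            # Wiener-filter update centered on omega[k]; alpha controls the bandwidth.
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency = power-weighted mean frequency of the mode.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-30)
        # Dual ascent on the reconstruction constraint (does nothing when tau == 0).
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-30)
        if change < tol:
            break

    order = np.argsort(omega)
    modes = np.fft.irfft(u_hat[order], n=n, axis=1)    # modes back in the time domain
    return modes, omega[order]
```

For a test signal made of two tones at 0.02 and 0.12 cycles per sample, calling `vmd(x, K=2, alpha=2000)` should return center frequencies close to those two values. Both `K` and `alpha` must be supplied by the caller, which is the gap that the SABO search in Section 3.1 is meant to fill.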

2.2. Principle of SABO Algorithm

The fundamental concept behind the SABO involves the mathematical notion of subtracting the average information from the algorithm’s search agent [26]. SABO boasts several advantages, including a potent optimization capability, rapid convergence, robustness, and commendable stability.

From a mathematical standpoint, Formula (5) allows for the representation of the algorithm’s population using a matrix. The primary position of the search agent within the search space is randomly initialized, as described by Formula (6).

X = \begin{bmatrix} X_{1} \\ \vdots \\ X_{i} \\ \vdots \\ X_{N} \end{bmatrix} = \begin{bmatrix} x_{1,1} & \cdots & x_{1,d} & \cdots & x_{1,m} \\ \vdots & \ddots & \vdots & & \vdots \\ x_{i,1} & \cdots & x_{i,d} & \cdots & x_{i,m} \\ \vdots & & \vdots & \ddots & \vdots \\ x_{N,1} & \cdots & x_{N,d} & \cdots & x_{N,m} \end{bmatrix}

(5)

x_{i,d} = lb_{d} + r_{i,d} \cdot (ub_{d} - lb_{d}), \quad i = 1, \dots, N, \quad d = 1, \dots, m

(6)

In Formulas (5) and (6), X represents the SABO population matrix, X_{i} signifies the ith search agent in the algorithm, x_{i,d} denotes the value of the decision variable of the dth dimension of the ith search agent in the search space, N represents the number of search agents, m indicates the number of decision variables, and r_{i,d} is a random number within the range of [0, 1], while lb_{d} and ub_{d} represent the lower and upper bounds of the dth decision variable, respectively.

The SABO algorithm introduces a new computing concept, "-_{v}", called the v-subtraction of search agent B from search agent A, which is defined as follows:

A -_{v} B = \operatorname{sign}(F(A) - F(B)) (A - \vec{v} * B)

(7)

In Formula (7), \vec{v} is a vector of dimension m whose components are random numbers generated from the set {1, 2}. The operator * represents the Hadamard product of two vectors. F(A) and F(B) represent the target function values of the search agents A and B, respectively, and sign is the signum function.

In the SABO algorithm, the displacement of any search agent X_{i} in the search space is calculated by the arithmetic mean of the -_{v} subtraction of each search agent X_{j}. The location is updated as follows:

X_{i}^{new} = X_{i} + \vec{r}_{i} * \frac{1}{N} \sum_{j=1}^{N} (X_{i} -_{v} X_{j}), \quad i = 1, 2, \dots, N

(8)

In Formula (8), X_{i}^{new} represents the newly proposed location of the ith search agent X_{i}, N is the total number of search agents, and \vec{r}_{i} is a vector of dimension m, where each component is a normal distribution value drawn from the interval [0, 1].

The replacement formula for the search agent position is as follows:

X_{i} = \begin{cases} X_{i}^{new}, & F_{i}^{new} < F_{i} \\ X_{i}, & \text{else} \end{cases}

(9)

In Formula (9), F_{i} represents the target function value of the search agent X_{i} and F_{i}^{new} represents the target function value of the search agent X_{i}^{new}. The specific flow chart of the algorithm is shown in Figure 1.
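Formulas (5) to (9) translate fairly directly into code. The sketch below is a literal transcription under a few stated assumptions: the random vector in Formula (8) is drawn uniformly from [0, 1), a fresh vector \vec{v} is drawn for every pair of agents, and candidates are clipped to the bounds, which the text does not specify. The function name `sabo` and its arguments are hypothetical.

```python
import numpy as np

def sabo(objective, lb, ub, n_agents=20, n_iter=100, seed=0):
    """Minimize `objective` over the box [lb, ub] following Formulas (5)-(9)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    m = lb.size
    # Formula (6): random initialization inside the bounds.
    X = lb + rng.random((n_agents, m)) * (ub - lb)
    F = np.array([objective(x) for x in X])
    history = [F.min()]                          # best fitness after each iteration
    for _ in range(n_iter):
        for i in range(n_agents):
            # Formula (7): v-subtraction of every agent X_j from X_i.
            v = rng.integers(1, 3, size=(n_agents, m))   # components in {1, 2}
            sign = np.sign(F[i] - F)[:, None]
            diffs = sign * (X[i] - v * X)
            # Formula (8): move by the arithmetic mean of the v-subtractions.
            r = rng.random(m)                            # assumption: uniform on [0, 1)
            candidate = X[i] + r * diffs.mean(axis=0)
            candidate = np.clip(candidate, lb, ub)       # assumption: clip to the bounds
            # Formula (9): keep the candidate only if it improves the objective.
            f_new = objective(candidate)
            if f_new < F[i]:
                X[i], F[i] = candidate, f_new
        history.append(F.min())
    best = int(np.argmin(F))
    return X[best].copy(), float(F[best]), history
```

Because of the greedy replacement in Formula (9), the best fitness recorded in `history` can never increase from one iteration to the next, regardless of the objective.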

2.3. WMH–KNN Algorithm Principle

The KNN method is extensively employed in pattern classification due to its simplicity [27]. The basic idea involves observing the categories of the K samples in the sample space that are closest in distance to the sample to be classified. The category that prevails among these K samples is then assigned to the sample to be classified. The distance calculation formula between samples is as follows:

D_{i} = \sum_{j=1}^{m} |x_{ij} - y_{ij}| \cdot w_{j}

(10)

In Formula (10), assuming there are two sample matrices, X and Y, x_{ij} and y_{ij} respectively represent the elements in X and Y, n is the number of samples, m is the number of features, and the feature weight vector is denoted as W = [w_{1}, w_{2}, \dots, w_{m}], where w_{j} is the weight of the jth feature.

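Determine the nearest neighbor parameter c and find the c-nearest neighbors to the sample to be classified, that is, the training samples with the smallest distances D_{i}. The formula is as follows:

c = \operatorname{argmin}_{i} D_{i}

(11)

In Formula (11), \operatorname{argmin}_{i} selects the nearest neighbor sample, i.e., the training sample whose distance D_{i} is smallest.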

The sample is then classified as follows:

\hat{y} = \operatorname{argmax}_{c} \sum_{i=1}^{K} I(y_{i} = c)

(12)

In Formula (12), \hat{y} is the category of the sample to be classified, y_{i} is the category of the ith training sample closest to the sample to be classified, and I(y_{i} = c) is the indicator function, which is 1 if y_{i} = c and 0 otherwise.
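A minimal sketch of the classifier defined by Formulas (10) and (12) follows. It assumes one weight per feature, takes the K nearest neighbors to be the training samples with the K smallest weighted distances, and breaks voting ties in favor of the category met first among the nearest neighbors. The function name `wmh_knn_predict` is hypothetical.

```python
from collections import Counter

import numpy as np

def wmh_knn_predict(X_train, y_train, X_test, weights, k=3):
    """Classify each row of X_test with a weighted-Manhattan KNN vote."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    weights = np.asarray(weights, dtype=float)
    predictions = []
    for x in X_test:
        # Formula (10): weighted Manhattan distance to every training sample.
        distances = np.sum(np.abs(X_train - x) * weights, axis=1)
        # The K nearest neighbors are the K smallest distances.
        nearest = np.argsort(distances, kind="stable")[:k]
        # Formula (12): majority vote over the neighbors' categories.
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)
```

The weights change which neighbors count as close: down-weighting a feature shrinks its contribution to every distance, so a sample that is far away only along that feature can become the nearest neighbor.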

The computational complexity of the WMH–KNN algorithm is mainly affected by the following factors: First, the KNN algorithm itself needs to calculate the distance between the sample to be classified and all training samples. The complexity of this process increases with the size of the training dataset and the increase in feature dimensions. Second, the calculation of the Mahalanobis distance involves the inversion of the covariance matrix, which will increase the computational complexity in high-dimensional feature space. In addition, the Hamming distance is used to process discrete features, and its computational complexity is relatively low, but it will also increase a certain computational burden when performing comprehensive weighted calculations. In general, the WMH–KNN algorithm has a high computational complexity when processing large-scale datasets and high-dimensional feature spaces. It is necessary to comprehensively consider the optimization of the dataset size, feature dimension, and calculation steps to ensure the efficiency and practicality of the algorithm.

3. Diagnosis Model Based on SABO–VMD–WMH–KNN

3.1. SABO–VMD Model

The SABO algorithm is used to optimize the parameters α and k in the VMD process. The composite index of permutation entropy and mutual information entropy serves as the fitness function, which the SABO algorithm minimizes to find the parameters. Through continuous iteration and comparison of fitness function values, the minimum value is found, the iteration terminates, and the corresponding k and α are taken as the optimal parameters. Using permutation entropy and mutual information entropy together allows the performance of VMD to be assessed more comprehensively. This approach enhances the adaptability of the optimization process [28].

The calculating formula of permutation entropy (P_{e}) is as follows:

P_{e} = \frac{- \sum_{i=1}^{k} P_{i} \cdot \lg(P_{i})}{\lg(m!)}

(13)

In Formula (13), k represents the number of different permutations, P_{i} represents the relative frequency of each arrangement, and m represents the length of the subsequence.

The permutation entropy effectively reflects the complexity of a time series, and after normalization it better reflects how regular the series is [29]. The smaller the permutation entropy, the more regular the time series; conversely, the larger the permutation entropy, the stronger the randomness of the time series.
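Formula (13) can be illustrated with a short function that counts ordinal patterns. This is a sketch under assumptions: the `delay` parameter (the spacing between points in a subsequence) does not appear in the text and defaults to 1, and ties within a subsequence are ranked in order of appearance. Since the formula is a ratio of logarithms, the base of lg does not affect the result.

```python
import math
from collections import Counter

import numpy as np

def permutation_entropy(series, m=3, delay=1):
    """Normalized permutation entropy of Formula (13), in the range [0, 1]."""
    series = np.asarray(series, dtype=float)
    n_windows = series.size - (m - 1) * delay
    if n_windows <= 0:
        raise ValueError("series is too short for the chosen m and delay")
    # Ordinal pattern (argsort ranks) of every length-m subsequence.
    patterns = Counter(
        tuple(np.argsort(series[i:i + (m - 1) * delay + 1:delay], kind="stable"))
        for i in range(n_windows)
    )
    # Relative frequency P_i of each pattern that actually occurs.
    probs = np.array(list(patterns.values()), dtype=float) / n_windows
    # Shannon entropy of the pattern frequencies divided by lg(m!).
    return float(-np.sum(probs * np.log10(probs)) / math.log10(math.factorial(m)))
```

A strictly increasing series contains a single ordinal pattern and therefore scores 0, while white noise spreads almost evenly over all m! patterns and scores close to 1.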

The calculating formula of mutual information entropy ({MI}_{e}) is as follows:

{MI}_{e} = - \sum_{i=1}^{N} p_{i} \lg p_{i}

(14)

In Formula (14), p_{i} represents the relative frequency of each arrangement.

Mutual information entropy serves as a concept utilized to quantify the extent of system uncertainty or information disarray, commonly employed within the realms of information theory and signal processing [30]. It is typically leveraged to assess the order or disorder present within a signal. In the context of bearing failure, the operational process generates periodic impacts, leading to a more organized signal. This sense of order is often manifested in the reduction in information entropy. Consequently, by scrutinizing the information entropy or mutual information entropy of the signal, it becomes possible to detect any faults or anomalies within the bearing. Such changes in the signal’s order can be employed to diagnose issues and implement appropriate maintenance measures [31].

Permutation entropy and mutual information entropy are amalgamated by establishing a composite index, serving as the fitness function during the optimization process. This composite index takes shape as a weighted combination of permutation entropy and mutual information entropy. The formula for calculating the composite index is as follows:

\begin{matrix} H = ω_{1} \cdot P_{e} + ω_{2} \cdot {M I}_{e} \end{matrix}

(15)

In Formula (15), ω_{1} and ω_{2} are the weights of permutation entropy and mutual information entropy, where ω_{1} = ω_{2} = 0.5. Combining permutation entropy and mutual information entropy as the fitness function to optimize VMD parameters can use the complementarity of the two to improve the comprehensiveness of signal feature extraction and the accuracy of fault detection, thereby achieving better optimization results. The combined fitness function has stronger adaptability to different types of signals and can show better robustness under various complex working conditions. By comprehensively evaluating the complexity of the signal and the degree of information sharing, the error caused by a single method can be reduced, and the overall optimization effect can be improved. This method is not only suitable for a single type of signal analysis but can also be applied to a variety of signal types and application scenarios with wide applicability.
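As a small worked illustration of Formulas (14) and (15), the sketch below computes the entropy of a probability vector and combines two entropy values with equal weights. Two points are assumptions rather than statements from the text: lg is read as the base-10 logarithm, and the probabilities p_{i} are estimated from an amplitude histogram, since the text does not spell out how they are obtained. All function names are hypothetical.

```python
import numpy as np

def shannon_entropy(p):
    """Formula (14): -sum(p_i * lg p_i), skipping zero-probability entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log10(p)))   # assumption: lg is log base 10

def histogram_probabilities(x, bins=32):
    """Assumption: estimate p_i from an amplitude histogram of the signal."""
    counts, _ = np.histogram(x, bins=bins)
    return counts / counts.sum()

def composite_index(pe, mie, w1=0.5, w2=0.5):
    """Formula (15): weighted sum of the two entropy values."""
    return w1 * pe + w2 * mie
```

With the equal weights used in the paper, a component with P_{e} = 0.2 and {MI}_{e} = 0.6 receives a composite index of 0.4. Note that P_{e} is normalized to [0, 1] while {MI}_{e} in Formula (14) is not, so the two terms are not automatically on the same scale.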

The main task of the VMD algorithm is to decompose the signal into several intrinsic mode functions (IMFs). The number of decomposed modes K has a direct impact on the computational complexity. Generally speaking, the more decomposed modes, the greater the computational effort. VMD requires multiple iterations to converge to a stable solution. The computational steps involved in each iteration include frequency domain transformation and optimization of the signal, and the complexity of these steps is usually determined by the number of signal samples. SABO requires multiple calculations in each iteration to update the position of the search agent. This includes calculating the objective function value, updating the position, and performing weighted calculations. The complexity of each iteration is determined by the number of search agents and the dimensionality of the decision variables. The computational complexity of SABO–VMD is limited by the complexity of VMD and SABO. In practical applications, factors such as the signal length, number of modes, number of optimization iterations, and number of search agents need to be considered.

The steps and process of the SABO–VMD prediction model are shown in Figure 2:

The specific steps for SABO to optimize VMD parameters are as follows:

Step 1: Parameter initialization, for example, the population X, the number of search agents N, the dimension d, the optimization range of the parameters, etc.

Step 2: Compute the fitness value corresponding to each of the N search agents' positions, taking the minimum value of the composite index of permutation entropy and mutual information entropy as the fitness function.

Step 3: Update the search agent positions according to fitness, using Formula (9).

Step 4: Determine whether the algorithm has reached the maximum number of iterations; if so, the loop ends and outputs the optimal SABO location [k, α] and the optimal fitness value; if not, return to Step 2.
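One practical detail in these steps is that the optimizer searches a continuous space, while the number of modes k must be an integer. The sketch below shows one way to wrap a decomposition routine into a fitness function over a position [k, α]. It reflects a reading of the steps above rather than the authors' code: k is rounded to the nearest integer, and the fitness is taken as the smallest composite index among the resulting IMFs. The names `make_vmd_fitness`, `decompose`, and `index_fn` are hypothetical placeholders for a VMD routine and a composite-index routine supplied by the caller.

```python
import numpy as np

def make_vmd_fitness(signal, decompose, index_fn):
    """Build the fitness function that an optimizer minimizes over [k, alpha].

    decompose(signal, k, alpha) must return an iterable of IMFs;
    index_fn(imf) must return the composite index of one IMF.
    """
    signal = np.asarray(signal, dtype=float)

    def fitness(position):
        k = int(round(position[0]))      # the mode count must be an integer
        alpha = float(position[1])       # the penalty factor stays continuous
        modes = decompose(signal, k, alpha)
        # Score every IMF and keep the smallest composite index.
        return min(index_fn(mode) for mode in modes)

    return fitness
```

The returned `fitness` callable takes a two-element position and returns a scalar, so it can be handed to any box-constrained minimizer with bounds on k and α.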

3.2. SABO–VMD–WMH–KNN Model

After obtaining the IMF components through the SABO–VMD model, the statistical characteristics (mean, standard deviation, kurtosis, and skewness) of each IMF component are calculated. The statistical characteristics can reflect the distribution characteristics and fluctuations of the signal on the frequency scale. These statistical characteristics contain important time–frequency characteristics and information about the signal and can better describe the overall characteristics of the signal.

Suppose the original signal is x(t), and the IMF components decomposed by the SABO–VMD model are c_{i}(t), where i = 1, 2, ..., n, and n is the number of IMF components. For each IMF component c_{i}(t), the following statistical characteristics can be calculated:

Mean (u_{i})

u_{i} = \frac{1}{T} \int c_{i}(t) \, dt

(16)

Standard deviation (σ_{i})

σ_{i} = \sqrt{\frac{1}{T} \int (c_{i}(t) - u_{i})^{2} \, dt}

(17)

Kurtosis (k_{i})

k_{i} = \frac{\frac{1}{T} \int (c_{i}(t) - u_{i})^{4} \, dt}{σ_{i}^{4}}

(18)

Skewness (γ_{i})

γ_{i} = \frac{\frac{1}{T} \int (c_{i}(t) - u_{i})^{3} \, dt}{σ_{i}^{3}}

(19)

where T is the total time length of the signal.

These statistical features are used as input features to form a feature vector, which can be input into the WMH–KNN model for training and diagnosis. The mean describes the average level of the signal at this frequency scale, which helps to reflect the overall trend of the signal. The standard deviation reflects the degree of fluctuation of the signal at this frequency scale, which helps to identify abnormal fluctuations. Kurtosis and skewness describe the distribution shape of the signal at this frequency scale, which helps to discover the non-Gaussian characteristics of the signal. Combining these statistical features, the time–frequency characteristics of the signal at different frequency scales can be fully characterized, providing valuable input features for subsequent WMH–KNN model training and diagnosis. These features are simple and easy to calculate and have low computational complexity during model training and inference. Compared with directly using the original signal, the statistical features are more robust and less susceptible to noise interference. This method can effectively extract the time–frequency characteristic information of the signal at different frequency scales, providing information-rich input features for subsequent model applications.
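For sampled signals, the integrals in Formulas (16) to (19) become averages over the samples. The sketch below builds the feature vector in that way; the function name `imf_features` and the ordering of the features (mean, standard deviation, kurtosis, skewness for each IMF in turn) are assumptions made for illustration.

```python
import numpy as np

def imf_features(imfs):
    """Return [mean, std, kurtosis, skewness] for each IMF, concatenated."""
    imfs = np.atleast_2d(np.asarray(imfs, dtype=float))     # shape (n_imfs, n_samples)
    mean = imfs.mean(axis=1)                                 # Formula (16)
    centered = imfs - mean[:, None]
    std = np.sqrt(np.mean(centered ** 2, axis=1))            # Formula (17)
    kurtosis = np.mean(centered ** 4, axis=1) / std ** 4     # Formula (18)
    skewness = np.mean(centered ** 3, axis=1) / std ** 3     # Formula (19)
    return np.column_stack([mean, std, kurtosis, skewness]).ravel()
```

With n IMFs the feature vector has 4n entries. A constant IMF has zero standard deviation, so its kurtosis and skewness are undefined; a practical pipeline would need to guard against that case.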

Drawing from the theoretical foundations of the aforementioned algorithms, this paper introduces a motor-bearing diagnosis approach based on SABO–VMD–WMH–KNN. The fault diagnosis process is depicted in Figure 3.

The specific flow of the SABO–VMD–WMH–KNN diagnostic model is as follows:

Gather fault signals corresponding to 10 different states. These signals serve as the input data for the diagnostic model.

Combine permutation entropy and mutual information entropy to form a compound index. This composite index is used to evaluate the effectiveness of the VMD parameters. Apply the SABO algorithm to optimize the variational mode decomposition (VMD) parameters, denoted as ([k, α]). Here, k is the number of intrinsic mode functions (IMFs), and α is the balancing parameter.

Utilize the principle of minimum compound index entropy and mutual information entropy to evaluate and select the best IMF component from each of the 10 states based on the minimum value of the compound index. This results in 10 optimal IMF components, one for each state.

Use the 10 selected IMF components as feature vectors. These feature vectors capture the essential characteristics of the fault signals for each state.

Train the WMH–KNN model using the feature vectors derived from the selected IMF components. Perform fault diagnosis using the trained WMH–KNN model. This model classifies new fault signals into one of the 10 states based on the trained feature vectors.

The pseudo-code of the SABO–VMD–WMH–KNN algorithm (Algorithm 1) is as follows:

Algorithm 1: [26] SABO–VMD–WMH–KNN

Input: Original signal X, number of modes K, penalty factor α, search agents N, number of iterations T, number of neighbors K, weighting coefficients W.
Output: classified labels Y

Function VMD(X, K, α)
    Initialize matrix U with K rows and length(X) columns
    for i = 1:K
        Initialize mode U(i, :) with random values
    end
    Update mode U(i, :)
    return U

Function SABO(U, N, T)
    for t = 1:T
        for i = 1:N
            Calculate objective function F(i) for search agent X(i, :)
        end
        for i = 1:N
            Compute mean displacement based on other agents
        end
    end
    return U

Function WMH_KNN(U, TrainingData, K, Weights)
    for i = 1:TestData
        Compute WMH distance to all training samples
        Vote among the categories of the K nearest neighbors
    end
    return Y

The following is a discussion of the sensitivity of each component parameter:

SABO–VMD:

Number of search agents: The number of search agents directly affects the search range of the optimization process and the accuracy of the results. More search agents can improve the accuracy of the results but increase the computational complexity.

Number of iterations: The number of iterations determines the depth of the optimization process. Too few iterations may lead to insufficient convergence, while too many iterations will increase the computational time.

Number of decomposition patterns (K): The number of decomposition patterns affects the fineness of signal decomposition. More patterns can capture more subtle signal features but also increase the amount of computation and complexity.

Penalty parameter: The penalty parameter is used to balance the degree of data fitting and the degree of smoothness of the decomposition results. Different penalty parameter settings will affect the quality of the decomposition results.

WMH–KNN:

K value: The K value determines the number of neighbors selected in the KNN algorithm. A larger K value can improve the stability of the classification but may reduce the accuracy of the classification.

Weighting coefficient: The weighting coefficient of the Mahalanobis distance and the Hamming distance affects the comprehensive effect of the distance metric. Different weighting coefficients will affect the accuracy and robustness of the classification results.
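The effect of the weighting coefficients on the vote can be seen in a small sketch. It takes WMH to be the weighted Manhattan distance named in the abstract; the feature values, labels, and weights below are invented for illustration, and the way the paper derives its weights is not reproduced.

```python
from collections import Counter


def weighted_manhattan(a, b, weights):
    """Sum of per-feature absolute differences, each scaled by its weight."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def wmh_knn_predict(train_x, train_y, query, k, weights):
    """Majority vote among the k training samples closest to the query."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: weighted_manhattan(train_x[i], query, weights),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical data: feature 0 separates the classes, feature 1 does not
train_x = [(0.0, 0.0), (0.1, 1.0), (1.0, 5.0), (0.9, 4.0)]
train_y = [1, 1, 2, 2]
query = (0.2, 4.5)

equal = wmh_knn_predict(train_x, train_y, query, k=3, weights=(1.0, 1.0))
tuned = wmh_knn_predict(train_x, train_y, query, k=3, weights=(1.0, 0.1))
print(equal, tuned)
```

With equal weights the uninformative second feature dominates the distance. Down-weighting it changes the predicted class, which is the kind of sensitivity described above.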

4. Experimental Case Analysis 1

4.1. Dataset Collection 1

The faulty Dataset 1 comes from the bearing data acquisition experimental platform of Case Western Reserve University. The experimental platform consists of a two-horsepower motor, a torque encoder, a power tester, and an electronic controller. The experimental platform is shown in Figure 4.

The sliding window is set to 1000, each sample contains 2048 data points, and each fault type contributes 120 samples. After the sliding-window segmentation of all data is finished, the samples are merged into a single dataset, and the ten states are labeled 1 to 10.
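The segmentation can be sketched as follows, assuming that the value 1000 is the step between consecutive windows and 2048 is the window length (the text does not state this explicitly). Under that reading, 120 samples require at least (120 − 1) × 1000 + 2048 = 121,048 points per record.

```python
def sliding_windows(signal, window_len=2048, step=1000, max_windows=120):
    """Cut a 1-D record into overlapping windows of fixed length."""
    windows = []
    start = 0
    while start + window_len <= len(signal) and len(windows) < max_windows:
        windows.append(signal[start:start + window_len])
        start += step
    return windows


def required_length(window_len=2048, step=1000, n_windows=120):
    """Shortest record that yields n_windows full windows."""
    return (n_windows - 1) * step + window_len


record = list(range(required_length()))  # placeholder for a vibration record
samples = sliding_windows(record)
print(len(samples), len(samples[0]), required_length())
```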

Each sample is then labeled; Table 1 summarizes the bearing dataset. The bearings under test support the motor shaft. The drive end bearing is an SKF6205, sampled at 12 kHz and 48 kHz, and the fan end bearing is an SKF6203, sampled at 12 kHz. In this experiment, the drive end bearing data sampled at 12 kHz are selected. There are ten states: the normal state (marked T here) and nine fault states recorded at a speed of 1750 rpm. For a fault diameter of 0.007 inches, these are the inner ring fault (marked IF-7), the rolling element fault (marked RF-7), and the outer ring fault (marked OF-7); for a fault diameter of 0.014 inches, the inner ring fault (marked IF-14), the rolling element fault (marked RF-14), and the outer ring fault (marked OF-14); and for a fault diameter of 0.021 inches, the inner ring fault (marked IF-21), the rolling element fault (marked RF-21), and the outer ring fault (marked OF-21).

4.2. Optimization Algorithm Model Comparison Results

There are ten fault states in this experiment, with the simulation platform being MATLAB 2019b. The analysis involves five optimization algorithms, namely, the Ant Colony Algorithm (ACO), Grey Wolf Algorithm (GWO), Beluga Algorithm (BWO), Dung Beetle Algorithm (DBO), and the SABO as utilized in this study. The optimization results of the VMD parameters are obtained by employing the composite index permutation entropy and mutual information entropy as the fitness function. Figure 5 illustrates the fitness curves of the aforementioned five algorithms.

It is evident from Figure 5 that SABO yields the most favorable results in optimizing the VMD parameters, reaching the minimum objective function value in the tenth iteration. The X-axis represents the number of iterations in the optimization process. Each iteration corresponds to one optimization calculation, in which the VMD parameters are adjusted step by step toward the optimal result. The Y-axis represents the value of the objective function, which is the evaluation criterion of the optimization process and usually reflects an error or loss value. The smaller the objective function value, the better the parameters. This outcome demonstrates the superior performance of SABO in comparison to the other four optimization algorithms.

4.3. Model Optimization Result

The main parameters of VMD are k and α [30]. Whether the vibration signal is fully decomposed depends on these two input parameters. The VMD parameters of each fault state are optimized through SABO, with the combined entropy H as the optimization index, and the resulting optimal parameters k and α for each fault state are shown in Table 2. For the IMF number k, a value that is too small prevents the useful information in the signal from being completely decomposed, which affects the accuracy of fault diagnosis. A value that is too large introduces noise and redundant information, increases computational complexity, and may affect the generalization ability of the model. For the equilibrium parameter α, a value that is too small causes the decomposed IMFs to contain too much noise, which affects the extraction of signal features. A value that is too large makes the decomposed IMFs too smooth, so that important detail may be lost. The fault signal is decomposed by VMD into several IMF components. The composite metric (a combination of permutation entropy and mutual information entropy) is then used to select the optimal IMF component in each state, and the resulting 10 IMF components are used as feature vectors.
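The Conclusions state that the mean value, variance, peak value, and kurtosis of the optimal mode are used as features. A minimal sketch of such a feature vector is given below. The choice of population variance, absolute peak, and non-excess kurtosis is an assumption, since the exact definitions are not given in this section.

```python
def imf_features(x):
    """Return [mean, variance, peak, kurtosis] of one IMF component."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # population variance
    peak = max(abs(v) for v in x)  # largest absolute amplitude
    if var > 0:
        # Fourth central moment divided by squared variance (not excess kurtosis)
        kurt = sum((v - mean) ** 4 for v in x) / n / var ** 2
    else:
        kurt = float("nan")
    return [mean, var, peak, kurt]


square_wave = [1.0, -1.0, 1.0, -1.0]
impulsive = [0.0, 0.0, 0.0, 4.0]
print(imf_features(square_wave))
print(imf_features(impulsive))
```

Impulsive content, which is typical of localized bearing defects, raises the kurtosis relative to a smooth or square-like waveform.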

In Figure 6, u-1 denotes the optimal IMF component of the first fault type, and similarly, there exists a total of ten optimal IMF components. The X-axis represents the time domain, showing the changes in the signal at different time points, and the Y-axis represents the amplitude of the signal; that is, the signal strength at each time point. Figure 7 shows the IMF component spectrum diagram after optimizing VMD and decomposing it using combined entropy H as an indicator. It can be seen that the peak distinction between each component is obvious, and there is no signal aliasing. The signal is effectively decomposed during the signal decomposition process. The X-axis represents the frequency domain, showing the distribution of the signal at different frequency points, and the Y-axis represents the amplitude of the spectrum; that is, the strength of the signal at the corresponding frequency point.
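For readers who want to reproduce a spectrum plot of this kind, the sketch below computes a single-sided amplitude spectrum with a direct discrete Fourier transform. The test tone, the sampling rate, and the function name are illustrative assumptions. For records of realistic length, an FFT routine would replace the direct sum.

```python
import cmath
import math


def amplitude_spectrum(x, fs):
    """Single-sided amplitude spectrum of x sampled at fs (direct DFT, O(n^2))."""
    n = len(x)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # DC and Nyquist bins are not doubled in a single-sided spectrum
        scale = 1 if k == 0 or (n % 2 == 0 and k == n // 2) else 2
        freqs.append(k * fs / n)
        amps.append(scale * abs(coeff) / n)
    return freqs, amps


fs = 200  # assumed sampling rate of the toy signal, in Hz
tone = [1.5 * math.sin(2 * math.pi * 20 * i / fs) for i in range(200)]
freqs, amps = amplitude_spectrum(tone, fs)
peak_bin = max(range(len(amps)), key=amps.__getitem__)
print(freqs[peak_bin], round(amps[peak_bin], 3))
```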

4.4. Fault Diagnosis Result 1

Using the SABO–VMD–WMH–KNN diagnostic model, the diagnostic confusion matrix of the ten states is shown in Figure 8. The results of the SABO–VMD–WMH–KNN diagnosis are shown in Figure 9, while Figure 10 shows the diagnosis results of each diagnosis model over 20 runs. The fault diagnosis results for case one are listed in Table 3. The diagnostic results of different signal decomposition methods are shown in Table 4. Figure 11 shows the diagnostic results of the different signal decomposition diagnostic models over 20 runs.

As shown in Table 3, the SABO–VMD–WMH–KNN diagnostic method exhibits the highest accuracy. The accuracy of the SABO–VMD–KNN model, which does not use WMH, is significantly lower, which indicates that WMH–KNN is better suited to fault diagnosis classification. Compared with SABO–VMD–LSTM, however, the diagnosis time of the proposed method is longer. On balance, the method presented in this article identifies faults accurately and promptly, with an average diagnostic accuracy of 97.22%. Table 4 presents the fault diagnosis results obtained by combining five distinct signal decomposition methods with the WMH–KNN classifier proposed in this article. The SABO–VMD signal decomposition model used in this study gives the highest diagnostic accuracy when combined with the WMH–KNN classifier. In Figure 8, faults 1 to 10 correspond to T, IF-7, RF-7, OF-7, IF-14, RF-14, OF-14, IF-21, RF-21, and OF-21, representing the ten states. Figure 9 shows that, in terms of individual diagnosis, fault three (RF-7) has the lowest accuracy among the fault types. Under heavy load conditions, the accuracy of the model decreases due to the nonlinear increase in the signal caused by load changes. However, the overall performance remains commendable. Figure 10 shows the diagnostic accuracy achieved by different methods (SABO–VMD–WMH–KNN, SABO–VMD–SVM, SABO–VMD–ELM, SABO–VMD–LSTM, and SABO–VMD–BiLSTM) over 20 runs. Compared with the other methods, SABO–VMD–WMH–KNN consistently achieves the highest average diagnostic accuracy.
Figure 11 shows the diagnostic accuracy of the various signal decomposition methods combined with the WMH–KNN classifier (SABO–VMD–WMH–KNN, EEMD–WMH–KNN, EMD–WMH–KNN, SVD–WMH–KNN, ICA-WMH–KNN, EWT–WMH–KNN, and FMD–WMH–KNN). The diagnostic accuracy is obtained from 20 runs. SABO–VMD–WMH–KNN again has the highest average diagnostic accuracy among the methods explored in this study. The high accuracy and F1 score indicate that the model is able to effectively distinguish different fault states. The long calculation time indicates that the optimization process is complex, but the performance gains are significant.

From the above analysis, it can be seen that the SABO–VMD–WMH–KNN method proposed in this article shows the best performance in fault diagnosis compared with the baseline technology, which is mainly reflected in its excellent accuracy and ability to process complex signals. Specifically, the average diagnostic accuracy of this method reaches 97.22%, which is significantly higher than other methods, indicating its excellent performance in extracting and classifying fault features. Compared with SABO–VMD–KNN, which does not use WMH, WMH–KNN effectively handles the correlation between features through weighted Mahalanobis distance and improves classification accuracy. Although SABO–VMD–LSTM is shorter in calculation time, its accuracy is lower than SABO–VMD–WMH–KNN, which still maintains high accuracy even when dealing with complex load changes. In addition, SABO–VMD–WMH–KNN performs best among various signal decomposition methods. Although the calculation time is longer, its significant performance improvement proves its practical value in fault diagnosis. Taking into account accuracy and computational efficiency, this technology provides an excellent performance balance.

5. Experimental Case Analysis 2

5.1. Dataset Collection 2

Experimental case two comes from the real-time operating data of the drawworks bearing of a ZJ50DB electric drilling rig. Data from vibration sensors and rotational speed sensors are used to build a multi-category bearing fault detection dataset. The dataset includes five different categories, namely normal status, inner ring failure, outer ring failure, rolling element failure, and cage failure. Each part of the bearing is shown in Figure 12.

The dataset contains 1000 independent data samples, which are divided into a training set and a test set in a ratio of 7:3 to ensure the independence of model training and evaluation. The training set contains a total of 700 data samples, including 500 samples in the normal state and 50 samples in each of the other four fault states. The test set contains a total of 300 data samples, including 208 samples in the normal state and 23 samples in each of the other four fault states.

5.2. Fault Diagnosis Result 2

With the same processing steps as experimental case one, after the VMD decomposition of Dataset 2, the optimal IMF component of each fault category is obtained, and then the WMH–KNN model is used for classification. The training and evaluation of the model use a variety of machine learning and deep learning algorithms for processing multi-category classification problems. Finally, the test set data are used to evaluate the performance of the model, including the accuracy and running time, to verify the effectiveness of the model in bearing fault detection tasks. Additionally, a confusion matrix analysis was performed to gain a more detailed understanding of the model’s classification performance.
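The evaluation quantities mentioned here (confusion matrix, accuracy, and the F1 score reported with the results) can be computed as in the following sketch. The label vectors are made up for illustration, and macro averaging of the per-class F1 scores is an assumption, since the averaging scheme is not stated.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores."""
    n = len(matrix)
    scores = []
    for i in range(n):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Hypothetical labels for three classes
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm, accuracy(cm), macro_f1(cm))
```

With strongly imbalanced classes, as in Dataset 2, the macro F1 score is a useful complement to accuracy because each class contributes equally.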

In Figure 13, the diagnostic confusion matrix for the five faults in Dataset 2 is illustrated. Subsequently, Figure 14 displays the results of the SABO–VMD–WMH–KNN diagnostic model, while Figure 15 presents a comparative analysis of different diagnostic models. Each diagnostic model underwent 20 runs to ensure robustness, and the summarized fault diagnosis outcomes are presented in Table 5. Table 6 provides a comprehensive overview of the diagnostic results employing various signal decomposition methods. Additionally, Figure 16 visually represents the aggregated diagnostic outcomes after running each signal decomposition diagnostic model 20 times. This thorough evaluation allows for a detailed understanding of the performance and reliability of the SABO–VMD–WMH–KNN diagnostic model in comparison to other signal decomposition methods. The repetition of runs ensures the stability and consistency of the diagnostic results, contributing to a more comprehensive assessment of the model’s effectiveness.

As shown in Table 5, the SABO–VMD–WMH–KNN diagnosis method again shows the highest accuracy on Dataset 2. Compared with EMD–WMH–KNN, the signal decomposition time of SABO–VMD is relatively long, but the diagnostic accuracy of EMD–WMH–KNN is lower because the modes produced by EMD are prone to aliasing. All things considered, the method proposed in this article can still identify faults in Dataset 2 accurately and in a timely manner, with an average diagnosis accuracy of 98.33%. Table 6 lists the fault diagnosis results obtained by applying the five different signal decomposition methods to the WMH–KNN classifier on Dataset 2. The results show that the SABO–VMD signal decomposition model employed in this study gives the highest diagnostic accuracy when used in conjunction with the WMH–KNN classifier. In Figure 13 and Figure 14, categories 0 to 4 correspond, respectively, to the five states: normal status, inner ring failure, outer ring failure, rolling element failure, and cage failure. It is worth noting that, in terms of individual diagnosis, the normal state has the lowest accuracy among the five states. However, the overall performance is still commendable. Figure 15 shows the accuracy of different methods (SABO–VMD–WMH–KNN, SABO–VMD–SVM, SABO–VMD–ELM, SABO–VMD–LSTM, and SABO–VMD–BiLSTM) over 20 runs. The method outlined in this paper (SABO–VMD–WMH–KNN) consistently shows the highest average diagnostic accuracy. Figure 16 shows the accuracy of the various signal decomposition methods combined with the WMH–KNN classifier (SABO–VMD–WMH–KNN, VMD–WMH–KNN, EEMD–WMH–KNN, EMD–WMH–KNN, SVD–WMH–KNN, ICA-WMH–KNN, EWT–WMH–KNN, and FMD–WMH–KNN). The diagnostic accuracy is obtained from 20 runs. SABO–VMD–WMH–KNN again has the highest average diagnostic accuracy among the methods explored in this study.
The accuracy and F1 score are slightly lower than the Case Western Reserve dataset but still perform well, verifying the generalization ability of the model. The calculation time is reduced, indicating that SABO is more efficient in optimizing different datasets.

The SABO–VMD–WMH–KNN method outperforms the baseline techniques in both accuracy and fault diagnosis capability. Its average diagnostic accuracy of 98.33% on Dataset 2 significantly surpasses that of other decomposition methods such as EMD–WMH–KNN, which suffers from lower accuracy due to mode aliasing. Despite the longer signal decomposition time required by SABO–VMD, its combination with WMH–KNN, which leverages a weighted Mahalanobis distance, raises classification accuracy well beyond that of traditional KNN methods. Overall, while the SABO–VMD–WMH–KNN method has a longer calculation time, its superior optimization and robust generalization make it the most effective choice for fault diagnosis across various datasets.

6. Experimental Case Analysis 3

6.1. Dataset Collection 3

Dataset 3 is the MPBFDD dataset, with a data acquisition frequency of 10 kHz, a single data length of 5 s, a digital vibration signal time series signal format, and a storage type of CSV file. This dataset collects the vibration signals of the mud pump bearing when the mud pump is running during a well operation, covering vibration signals under different working conditions. The dataset extracts a variety of fault types in the actual operating environment, which is used to verify the generalization ability of the model and help improve the reliability and accuracy of the fault detection algorithm. The working condition types mainly include load conditions, operating speeds, and environmental conditions. The load conditions include a light load, medium load, and heavy load; the operating speeds include a low speed, medium speed, and high speed; the environmental conditions include a normal temperature, low temperature, and high temperature. The fault types extracted from the dataset are a normal state, inner ring wear, outer ring wear, rolling element cracks, and cage damage. Five groups of data files were selected, including normal conditions, a light load, a low speed, and a normal temperature; an inner ring fault, a heavy load, a medium speed, and a high temperature; an outer ring fault, a medium load, a high speed, and a normal temperature; a rolling element fault, a light load, a low speed, and a high temperature; a cage fault, a heavy load, a medium speed, and a normal temperature.

Considering the diversity and sufficiency of each combination, 200 data were collected for each combination. In order to verify the effectiveness and robustness of the SABO–VMD–WMH–KNN method, this dataset not only uses the specific data of downhole mud pump operations but also improves the generalization ability of the dataset by reorganizing the dataset, thereby verifying the versatility of this method on different machines.

6.2. Fault Diagnosis Result 3

In this section, the results of the fault diagnosis using the third dataset are presented, focusing on the effectiveness of the SABO–VMD–WMH–KNN approach. The analysis aims to evaluate the performance of our proposed approach under various operating conditions, including a normal operation, inner race fault, outer race fault, rolling element fault, and cage fault. The test results of the proposed approach will also be compared with those obtained using more advanced baseline methods to fully evaluate the accuracy and robustness of the approach.

The SABO–VMD–WMH–KNN approach is applied and compared in detail with baseline methods such as deep convolutional neural networks (DCNNs), CNN-LSTM, and Transformer models, aiming to determine how the proposed approach performs compared to these state-of-the-art solutions. This detailed comparison will help highlight the strengths of our approach and potential areas for improvement, providing insights into its practical applicability and effectiveness for fault diagnosis in different scenarios.

The diagnosis results are shown below. Each diagnosis model was run 20 times to ensure robustness, and the summarized fault diagnosis results are shown in Table 7. Figure 17 shows the diagnosis confusion matrix of five faults in Dataset 3. Subsequently, Figure 18 shows the results of the SABO–VMD–WMH–KNN diagnosis model, while Figure 19 provides a comparative analysis of different diagnosis models. This comprehensive evaluation provides a detailed understanding of the performance and reliability of the SABO–VMD–WMH–KNN diagnosis model through detailed comparisons with other baseline methods.

As shown in Table 7, the SABO–VMD–WMH–KNN diagnostic method continues to demonstrate the highest diagnostic accuracy in Dataset 3, albeit with a relatively long fault diagnosis time compared to baseline methods such as DCNN and LSTM. While it outperforms DCNN and LSTM in terms of diagnostic time, the overall performance of these baseline methods does not match that of the proposed model. Considering all factors, SABO–VMD–WMH–KNN stands out as the best-performing model in terms of accuracy, making it suitable for tasks requiring extremely high accuracy.

CNN-LSTM and Transformer models offer high accuracy and are well-suited for processing complex time series data, though their computation times are longer. The LSTM model strikes a good balance between accuracy and computation time. However, the performance of DCNN and CNN is relatively low, indicating that they may be more appropriate for other specific tasks.

Therefore, the method proposed in this paper can still accurately and promptly identify faults in Dataset 3, achieving an average diagnostic accuracy of 99.2%. Testing across three datasets has demonstrated the ability of the proposed method to generalize well.

Figure 17 shows the fault diagnosis confusion matrix for the proposed method applied to Dataset 3. The method performs well on all five data groups, especially the normal condition (light load, low speed, normal temperature) and the inner ring fault (heavy load, medium speed, high temperature), for which the diagnosis rate reaches 100%.

Figure 18 compares the fault diagnosis results with actual results, clearly illustrating the effectiveness of the SABO–VMD–WMH–KNN model under different fault types. The comparison results indicate that the model can accurately distinguish between normal states and various fault types, with diagnosis results highly consistent with actual conditions. The data points in Figure 18 further validate the significant advantage of the SABO–VMD–WMH–KNN model in terms of accuracy.

Figure 19 intuitively displays the accuracy of 20 iterations for different methods (SABO–VMD–WMH–KNN, CNN-LSTM, Transformer, LSTM, DCNN, CNN). As seen in the figure, SABO–VMD–WMH–KNN consistently maintains the highest accuracy across all iterations, with an average accuracy of 99.2%. In contrast, while the accuracy of other methods fluctuates, it generally remains lower than that of SABO–VMD–WMH–KNN. Among them, CNN-LSTM, Transformer, and LSTM perform relatively well but do not exceed 95% accuracy. DCNN and CNN perform relatively poorly, with accuracy fluctuating around 90%. These results further confirm the excellent performance and stability of the SABO–VMD–WMH–KNN method in fault diagnosis.

The model proposed in this paper, compared with benchmark methods through testing on Dataset 3, not only reflects its superior performance but also verifies its generalization ability and versatility across different machines.

7. Conclusions

In this paper, a motor-bearing fault diagnosis method based on SABO–VMD–WMH–KNN was proposed, which is capable of handling mixed vibration signals with multiple frequency components acquired under diverse operational conditions. The approach encompassed three key processes: firstly, employing SABO–VMD to decompose the fault signal into different modes and selecting the optimal mode for each fault type; secondly, utilizing the mean value, variance, peak value, and kurtosis of the optimal mode as feature vectors; thirdly, establishing the fault diagnosis model using the WMH–KNN method to execute the fault diagnosis process, making the characteristics of each fault type more obvious. The primary focus of this method was on addressing intricate working conditions where multiple frequency components are intertwined within vibration signals.

The main contribution of this paper is the introduction of the innovative SABO algorithm, which is applied to optimize VMD parameters and aims to overcome the performance limitations of traditional optimization algorithms in fault signal decomposition. Through SABO optimization, this paper successfully improves the global performance of VMD parameter optimization, making the decomposition of vibration signals more comprehensive and efficient and providing more accurate and powerful modal information for fault diagnosis. Furthermore, the article used WMH–KNN to make the KNN classification method more suitable for fault diagnosis and used weight factors to make the classification more accurate. Finally, the SABO–VMD–WMH–KNN comprehensive fault diagnosis model was formed by cleverly integrating the SABO-optimized VMD and the WMH–KNN method. This integration combined optimized parameter extraction with machine learning classification to provide a more comprehensive and efficient solution for electric drilling rig fault diagnosis.

Therefore, this method is anticipated to find extensive application in real industrial settings, enhancing the reliability, precision, and efficiency of fault diagnosis, thereby extending the longevity of equipment and systems, ultimately making a positive impact on the domain of industrial production. The method also has some limitations. The SABO–VMD–WMH–KNN method involves complex computational steps, including multiple iterations of VMD and calculation of Mahalanobis–Hamming distance, which may lead to a heavy computational burden when processing large-scale datasets. Especially for high-dimensional data and large datasets, the increase in computational complexity may affect the real-time and practical performance of the algorithm. Future research should be committed to optimizing the computational efficiency of the algorithm and exploring methods to reduce computational complexity to improve the robustness of the method.

Author Contributions

Conceptualization, G.L.; Methodology, G.L.; Software, Y.M.; Validation, Y.M.; Formal analysis, G.L.; Investigation, G.L.; Resources, G.L.; Data curation, G.L.; Writing—original draft preparation, N.W.; Writing—review and editing, N.W.; Visualization, G.L.; Supervision, G.L.; Project administration, G.L.; Funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially funded by the Shaanxi Provincial Department of Education Key Laboratory Project “Research on Dynamic Optimization and Intelligent Control of Weight on Bit Based on Automatic Drill Feeding of Oil Drilling Rig”, project number 17JS107.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset 1 is a public dataset from Case Western Reserve University in the United States, and Dataset 2 is a dataset of a drilling rig in China. Due to privacy and confidentiality, it is not convenient to make it public. Dataset 3 is a mud pump dataset of a drilling platform in China. Due to copyright issues, it is not convenient to make it public.

Acknowledgments

We would like to thank the Key Laboratory of Education Department of Shaanxi Province, China for the financial support.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

Lu, Q.; Shen, X.; Wang, X.; Li, M.; Li, J.; Zhang, M. Fault Diagnosis of Rolling Bearing Based on Improved VMD and KNN. Math. Probl. Eng. 2021, 2021, 2530315.
Liao, H.; Xie, P.; Deng, S.; Zhang, W.; Shi, L.; Zhao, S.; Wang, H. Research on Early Fault Intelligent Diagnosis for Oil-impregnated Cage in Space Ball Bearing. Expert Syst. Appl. 2023, 14, 121952.
Altaf, M.; Akram, T.; Khan, M.A.; Iqbal, M.; Ch, M.M.I.; Hsu, C.-H. A new statistical features based approach for bearing fault diagnosis using vibration signals. Sensors 2022, 22, 2012.
Islam, M.M.; Kim, J.-M. Reliable multiple combined fault diagnosis of bearings using heterogeneous feature models and multiclass support vector Machines. Reliab. Eng. Syst. Saf. 2019, 184, 55–66.
Tan, C.; Yang, L.; Chen, H.; Xin, L. Fault diagnosis method for rolling bearing based on VMD and improved SVM optimized by METLBO. J. Mech. Sci. Technol. 2022, 36, 4979–4991.
Zhuang, D.; Liu, H.; Zheng, H.; Xu, L.; Gu, Z.; Cheng, G.; Qiu, J. The IBA-ISMO Method for Rolling Bearing Fault Diagnosis Based on VMD-Sample Entropy. Sensors 2023, 23, 991.
Luo, J.; Wen, G.; Lei, Z.; Su, Y.; Chen, X. Weak signal enhancement for rolling bearing fault diagnosis based on adaptive optimized VMD and SR under strong noise background. Meas. Sci. Technol. 2023, 34, 064001.
Ma, J.; Li, H.; Tang, B.; Wang, J.; Zou, Z.; Zhang, M. Rolling bearing fault diagnosis based on improved VMD-adaptive wavelet threshold joint noise reduction. Adv. Mech. Eng. 2022, 14, 238.
Zhenya, Q.; Xueliang, Z. Rolling bearing fault diagnosis based on CS-optimized multiscale dispersion entropy and ML-KNN. J. Braz. Soc. Mech. Sci. Eng. 2022, 44, 430.
Dibaj, A.; Ettefagh, M.M.; Hassannejad, R.; Ehghaghi, M.B. A hybrid fine-tuned VMD and CNN scheme for untrained compound fault diagnosis of rotating machinery with unequal-severity faults. Expert Syst. Appl. 2021, 167, 114094.
Li, Y.; Tang, B.; Jiao, S.; Su, Q. Snake Optimization-Based Variable-Step Multiscale Single Threshold Slope Entropy for Complexity Analysis of Signals. IEEE Trans. Instrum. Meas. 2023, 72, 1–13.
Wang, Y.; Zhang, S.; Cao, R.; Xu, D.; Fan, Y. A Rolling Bearing Fault Diagnosis Method Based on the WOA-VMD and the GAT. Entropy 2023, 25, 889.
Li, Y.; Tang, B.; Jiao, S.; Zhou, Y. Optimized multivariate multiscale slope entropy for nonlinear dynamic analysis of mechanical signals. Chaos Solitons Fractals 2024, 179, 114436.
Lei, X.; Lu, N.; Chen, C.; Wang, C. An AVMD-DBN-ELM Model for Bearing Fault Diagnosis. Sensors 2022, 22, 9369.
Jiang, H.; Lu, H.; Zhou, J.; Liu, M. Fault Diagnosis of Rolling Bearings in VMD and GWO-ELM. J. Phys. Conf. Ser. 2023, 2496, 012013.
An, Y.; Zhang, K.; Liu, Q.; Chai, Y.; Huang, X. Rolling bearing fault diagnosis method base on periodic sparse attention and LSTM. IEEE Sens. J. 2022, 22, 12044–12053.
Guo, Y.; Mao, J.; Zhao, M. Rolling bearing fault diagnosis method based on attention CNN and BiLSTM network. Neural Process. Lett. 2023, 55, 3377–3410.
Zhang, M.; Yin, J.; Chen, W. Rolling Bearing Fault Diagnosis Based on Time-Frequency Feature Extraction and IBA-SVM. IEEE Access 2022, 10, 85641–85654.
Kumar, H.; Upadhyaya, G. Fault diagnosis of rolling element bearing using continuous wavelet transform and K-nearest neighbour. Mater. Today Proc. 2023, 92, 56–60.
Sun, D.; Li, Y.; Liu, Z.; Jia, S.; Noman, K. Physics-inspired multimodal machine learning for adaptive correlation fusion based rotating machinery fault diagnosis. Inf. Fusion 2024, 108, 102394.
Al-Haddad, L.A.; Jaber, A.A.; Al-Haddad, S.A.; Al-Muslim, Y.M. Fault diagnosis of actuator damage in UAVs using embedded recorded data and stacked machine learning models. J. Supercomput. 2024, 80, 3005–3024.
Jiang, T.; Li, Y.; Li, S. Multi-fault diagnosis of rolling bearing using two-dimensional feature vector of WP-VMD and PSO-KELM algorithm. Soft Comput. 2022, 27, 8175–8187.
Zheng, J.; Chen, Y.; Pan, H.; Tong, J. Composite multi-scale phase reverse permutation entropy and its application to fault diagnosis of rolling bearing. Nonlinear Dyn. 2022, 111, 459–479.
Dibaj, A.; Hassannejad, R.; Ettefagh, M.M.; Ehghaghi, M.B. Incipient fault diagnosis of bearings based on parameter-optimized VMD and envelope spectrum weighted kurtosis index with a new sensitivity assessment threshold. ISA Trans. 2021, 114, 413–433.
Sharma, V.; Parey, A. Extraction of weak fault transients using variational mode decomposition for fault diagnosis of gearbox under varying speed. Eng. Fail. Anal. 2020, 107, 104204.
Trojovský, P.; Dehghani, M. Subtraction-Average-Based Optimizer: A New Swarm-Inspired Metaheuristic Algorithm for Solving Optimization Problems. Biomimetics 2023, 8, 149.
Wang, J.; Zhou, Z.; Li, Z.; Du, S. A Novel Fault Detection Scheme Based on Mutual k-Nearest Neighbor Method: Application on the Industrial Processes with Outliers. Processes 2022, 10, 497.
Nassef, M.; Hussein, T.M.; Mokhiamar, O. An adaptive variational mode decomposition based on sailfish optimization algorithm and Gini index for fault identification in rolling bearings. Measurement 2021, 173, 108514.
Vashishtha, G.; Chauhan, S.; Singh, M.; Kumar, R. Bearing defect identification by swarm decomposition considering permutation entropy measure and opposition-based slime mould algorithm. Measurement 2021, 178, 109389.
Kumar, A.; Zhou, Y.; Xiang, J. Optimization of VMD using kernel-based mutual information for the extraction of weak features to detect bearing defects. Measurement 2021, 168, 108402.
Pang, B.; Nazari, M.; Tang, G. Recursive variational mode extraction and its application in rolling bearing fault diagnosis. Mech. Syst. Signal Process. 2022, 165, 108321. [Google Scholar] [CrossRef]

Figure 1. SABO algorithm flow chart.

Figure 2. SABO–VMD optimization model.

Figure 3. SABO–VMD–WMH–KNN bearing fault diagnosis model.

Figure 4. Case Western Reserve University bearing data collection platform.

Figure 5. Fitness curves of five optimization algorithms.

Figure 6. The mean eigenvector of the best IMF for each fault state.

Figure 7. VMD decomposition spectrogram.

Figure 8. Confusion matrix of fault diagnosis 1.

Figure 9. SABO–VMD–WMH–KNN fault diagnosis result 1.

Figure 10. Fault diagnosis results of different methods.

Figure 11. Fault diagnosis results of different signal decomposition methods.

Figure 12. Various parts of bearings.

Figure 13. Confusion matrix of fault diagnosis 2.

Figure 14. SABO–VMD–WMH–KNN fault diagnosis result 2.

Figure 15. Fault diagnosis results of different methods 2.

Figure 16. Fault diagnosis results of different signal decomposition methods 2.

Figure 17. Confusion matrix of fault diagnosis 3.

Figure 18. SABO–VMD–WMH–KNN fault diagnosis result 3.

Figure 19. Fault diagnosis results of different methods 3.

Table 1. Description of the bearing dataset.

Bearing Failure Type	Label	Training Set Data	Test Set Data
T	1	840 × 2048	360 × 2048
IF-7	2	840 × 2048	360 × 2048
RF-7	3	840 × 2048	360 × 2048
OF-7	4	840 × 2048	360 × 2048
IF-14	5	840 × 2048	360 × 2048
RF-14	6	840 × 2048	360 × 2048
OF-14	7	840 × 2048	360 × 2048
IF-21	8	840 × 2048	360 × 2048
RF-21	9	840 × 2048	360 × 2048
OF-21	10	840 × 2048	360 × 2048

Table 2. Optimal decomposition parameters K and α for each fault type.

Bearing Failure Type	$K$ Value	$α$ Value
T	8	1435
IF-7	6	645
RF-7	7	924
OF-7	7	675
IF-14	8	854
RF-14	6	1245
OF-14	9	832
IF-21	5	956
RF-21	7	1109
OF-21	8	984

Table 3. Fault diagnosis result 1.

Model	Average Accuracy (%)	Time (s)
SABO–VMD–WMH–KNN	97.22	1684.72
SABO–VMD–SVM	95.53	1698.42
SABO–VMD–ELM	94.72	1822.52
SABO–VMD–KNN	93.54	1634.35
SABO–VMD–LSTM	92.31	1582.34
SABO–VMD–BiLSTM	91.24	1723.53

Table 4. Diagnostic results of different signal decomposition methods.

Model	Average Accuracy (%)	Time (s)
SABO–VMD–WMH–KNN	97.22	1684.72
FMD–WMH–KNN	94.37	1838.57
EEMD–WMH–KNN	93.73	1976.65
EMD–WMH–KNN	92.65	1735.73
EWT–WMH–KNN	92.58	1759.36
SVD–WMH–KNN	92.31	1656.76
ICA–WMH–KNN	91.24	1836.24
VMD–WMH–KNN	89.48	1535.67

Table 5. Fault diagnosis result 2.

Model	Average Accuracy (%)	Time (s)
SABO–VMD–WMH–KNN	98.33	1256.23
SABO–VMD–SVM	96.89	1378.56
SABO–VMD–ELM	95.39	1542.23
SABO–VMD–LSTM	93.78	1428.61
SABO–VMD–BiLSTM	92.76	1563.89

Table 6. Diagnostic results of different signal decomposition methods 2.

Model	Average Accuracy (%)	Time (s)
SABO–VMD–WMH–KNN	98.33	1256.23
EEMD–WMH–KNN	95.69	1572.49
FMD–WMH–KNN	95.66	1325.64
EMD–WMH–KNN	94.38	1139.25
ICA–WMH–KNN	93.29	1286.46
EWT–WMH–KNN	93.14	1251.23
VMD–WMH–KNN	92.66	1123.89
SVD–WMH–KNN	91.59	1278.59

Table 7. Fault diagnosis result 3.

Model	Average Accuracy (%)	Time (s)
SABO–VMD–WMH–KNN	99.2	1962.85
DCNN	92.8	1897.28
CNN-LSTM	94.5	2016.43
Transformer	94.2	1987.46
CNN	91.5	1935.74
LSTM	93.2	1823.65


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
