1. Introduction
Valves are essential components that play a significant role in regulating flow direction, controlling the opening and closing of pipelines, and ensuring effective sealing, which directly impacts their functionality. During valve operation, internal leakage poses a greater challenge than external leakage due to its concealed nature, making accurate diagnosis of internal leakage particularly crucial. Currently, methods for detecting internal valve leakage fall into two main types: offline methods, which require disassembly of the valve, and online methods, which do not. Online methods include acoustic emission detection [1,2], negative-pressure wave detection [3], and ultrasonic detection [4]. For instance, Au-Yang, M. K. [5] employed ultrasonic waves to assess leakage in check valves; however, ultrasonic signals are susceptible to interference from external noise and require active excitation signals to generate detectable responses, resulting in limited real-time performance. Liu et al. [6,7] utilized negative-pressure waves for pipeline leakage localization; however, their detection capability diminishes at low leakage rates, leading to reduced accuracy. In contrast, the acoustic emission detection method offers greater convenience, adaptability, real-time responsiveness, and sensitivity.
In recent years, numerous scholars have conducted comprehensive research on extracting acoustic emission signal features associated with valve leakage, spanning the time domain, frequency domain, and time–frequency domain. In the time domain, Ye et al. [8] introduced an acoustic emission signal analysis method based on the standard deviation to identify internal valve leakage; they established a mathematical model by fitting the relationship between the standard deviation and the leakage rate using the least squares method. Regarding time–frequency analysis, Sim, H. Y. [9] employed wavelet packet transformation to decompose the signal into various frequency ranges, subsequently calculating the root mean square (RMS) value and assessing valve issues based on fluctuations in the RMS value. Additionally, Liang et al. [10] utilized the wavelet scattering transform (WST) to extract the first three wavelet scattering coefficients from leakage signals, which were then employed as feature vectors. However, the valve leakage acoustic emission signal exhibits nonlinear and nonstationary characteristics, which may diminish the efficacy of the statistical features mentioned above in differentiating fault states. Adaptive time–frequency analysis methods, such as Ensemble Empirical Mode Decomposition (EEMD) [11] and Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) [12], have been increasingly employed by researchers to address certain limitations of earlier signal processing techniques. Because the acoustic emission signal from the valve is easily influenced by environmental noise, we propose using CEEMDAN to decompose the signal, extracting useful components while minimizing the impact of noise. By selecting the top five IMFs based on their correlation coefficients, we can focus on the frequency bands most relevant to valve internal leakage. To further enhance feature extraction, fuzzy entropy is employed to quantify the complexity and irregularity of these selected IMFs, providing robust features for classification.
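The correlation-based IMF-selection step described here can be sketched as follows. The decomposition itself is assumed to have been computed already (a synthetic `imfs` array stands in for CEEMDAN's output), and the helper name is illustrative rather than from the paper.

```python
import numpy as np

def select_imfs_by_correlation(signal, imfs, k=5):
    """Rank IMFs by |Pearson correlation| with the raw signal and keep the top k."""
    corrs = np.array([abs(np.corrcoef(signal, imf)[0, 1]) for imf in imfs])
    top = np.argsort(corrs)[::-1][:k]   # indices of the k most correlated IMFs
    return np.sort(top), corrs

# Synthetic stand-in: a two-tone signal and fake "IMFs" (its components plus noise).
t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imfs = np.stack([
    0.5 * np.sin(2 * np.pi * 40 * t),   # high-frequency component
    np.sin(2 * np.pi * 5 * t),          # low-frequency component
    0.01 * np.random.default_rng(0).standard_normal(1000),  # noise-like IMF
])
selected, corrs = select_imfs_by_correlation(signal, imfs, k=2)
print(selected)  # the two signal-bearing IMFs are retained
```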
In the realm of fault classification, the support vector machine (SVM) [13,14,15] is extensively utilized as an effective classification tool. SVM achieves high accuracy in fault identification and classification even with limited data samples, owing to its robust learning capabilities. In light of its strong adaptability, low classification error, straightforward feature vectors, and ability to work with small datasets, SVM was chosen as the classifier. The kernel functions of SVM can further augment classification performance in nonlinear scenarios; among the most frequently employed is the Radial Basis Function (RBF) kernel [16,17], which typically achieves favorable classification outcomes.
The Hippopotamus Optimization algorithm (HO) [18] simulates the defense and evasion strategies of the hippopotamus while optimizing these strategies through location updates. The algorithm performs well in enhancing accuracy and local search capability and exhibits strong practicality. Nevertheless, HO still leaves significant room for improving global search, strengthening local exploitation, and avoiding convergence to local optima. The improved HO (IHO) proposed in this paper is particularly adept at addressing local optima and slow convergence. The contributions of this paper are outlined as follows:
(1) To address the challenges posed by the non-stationary and nonlinear characteristics of valve leakage signals, the CEEMDAN method was proposed for signal decomposition. Subsequently, the fuzzy entropy of the decomposed Intrinsic Mode Function (IMF) signals was computed, allowing for the extraction of an initial feature set from the signals;
(2) A novel intelligent search algorithm IHO was introduced to enhance the kernel parameters of SVM in order to achieve improved performance. Consequently, the IHO-SVM model was developed;
(3) This paper presented the development of a valve fault diagnosis model referred to as CEEMDAN-IHO-SVM, which is capable of accurately extracting and diagnosing fault characteristics.
The subsequent sections of this article are organized as follows. Section 2 delineates the pertinent methods that have been utilized and enhanced within this study. Section 3 presents the experimental results along with a discussion of their implications. Finally, Section 4 provides a summary of the findings and contributions of this work.
2. Materials and Methods
2.1. The CEEMDAN Algorithm
Empirical Mode Decomposition (EMD) [19] and its optimized variants are adaptive processing techniques well-suited for analyzing nonlinear, nonstationary signals. These methods rely on the characteristic time scale of the signal itself, allowing decomposition into multiple IMFs and a residual component that reflects the overall trend of the signal. EMD-derived methods, such as EEMD, Complete Ensemble Empirical Mode Decomposition (CEEMD), and CEEMDAN, inherit EMD's advantages, including intuitiveness and adaptability.
CEEMDAN is an advanced ensemble empirical mode decomposition algorithm that incorporates adaptive noise. It builds upon the EMD and EEMD algorithms to address limitations inherent in these earlier methods. During EMD decomposition, waveform aliasing often occurs, leading to modal aliasing, a phenomenon where signal interactions are difficult to distinguish. EEMD was introduced as a noise-assisted data analysis method to mitigate modal aliasing in EMD by adding white noise to the original signal in each iteration before performing EMD decomposition. CEEMD further refines this approach by decomposing both the original signal plus white noise and the original signal minus white noise using EMD, then averaging the resulting IMFs to cancel the added noise, thereby enhancing decomposition accuracy and stability. However, residual white noise can still affect subsequent processing and analysis in both EEMD and CEEMD. CEEMDAN addresses this issue by more effectively suppressing noise interference and reducing the number of iterations and the computational complexity.
The subsequent section illustrates the decomposition process of CEEMDAN applied to the signals.
First, Gaussian white noise is added to the original signal $x(t)$ to form an ensemble of sequences:
$$x_i(t) = x(t) + \varepsilon_0\,\omega_i(t), \qquad i = 1, 2, \ldots, N$$
where $\varepsilon_0$ is the Gaussian white noise weighting factor and $\omega_i(t)$ is the Gaussian white noise.
Next, EMD decomposition of each sequence $x_i(t)$ is performed, and the average of the first modal components obtained from the $N$ decompositions is taken as the first-order component of CEEMDAN:
$$\mathrm{IMF}_1(t) = \frac{1}{N}\sum_{i=1}^{N} E_1\big(x_i(t)\big), \qquad r_1(t) = x(t) - \mathrm{IMF}_1(t)$$
where $\mathrm{IMF}_1(t)$ (Intrinsic Mode Function) denotes the first modal component obtained from the CEEMDAN decomposition, $E_1(\cdot)$ denotes the first component extracted by EMD, and $r_1(t)$ denotes the residual signal after the first decomposition.
After introducing specific Gaussian white noise to the $k$th-order residual signal obtained from the decomposition process, EMD decomposition is further applied to derive a new component and residual signal:
$$\mathrm{IMF}_{k+1}(t) = \frac{1}{N}\sum_{i=1}^{N} E_1\big(r_k(t) + \varepsilon_k\,E_k(\omega_i(t))\big), \qquad r_{k+1}(t) = r_k(t) - \mathrm{IMF}_{k+1}(t)$$
where $\mathrm{IMF}_{k+1}(t)$ denotes the $(k+1)$th-order modal component; $E_k(\cdot)$ denotes the $k$th component after EMD decomposition; $\varepsilon_k$ denotes the weight coefficient of the noise added by CEEMDAN for the $(k+1)$th iteration; and $r_k(t)$ denotes the $k$th stage's residual signal.
The decomposition process of the CEEMDAN algorithm concludes when the residual signals obtained from EMD become monotonic or meet other predetermined criteria.
The flowchart of CEEMDAN decomposition is shown in Figure 1.
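The noise-ensemble averaging at the heart of this procedure can be sketched as follows. This is an illustrative skeleton only: the `first_mode` helper below is a crude moving-average stand-in for EMD's sifting (a real implementation would use a proper EMD, e.g., from the PyEMD package), and all names are illustrative.

```python
import numpy as np

def first_mode(x, win=11):
    """Crude stand-in for EMD's first IMF: signal minus a moving-average trend.
    Real CEEMDAN uses proper sifting; this only illustrates the ensemble logic."""
    pad = win // 2
    trend = np.convolve(np.pad(x, pad, mode="edge"), np.ones(win) / win, mode="valid")
    return x - trend

def ceemdan_like(x, n_trials=50, eps=0.05, n_modes=3, seed=0):
    """Sketch of CEEMDAN's structure: at each stage, average the first mode of
    (current residual + scaled white noise) over many noise realizations."""
    rng = np.random.default_rng(seed)
    residual, imfs = x.copy(), []
    for _ in range(n_modes):
        trials = [first_mode(residual + eps * rng.standard_normal(x.size))
                  for _ in range(n_trials)]
        imf = np.mean(trials, axis=0)   # averaging cancels the added noise
        imfs.append(imf)
        residual = residual - imf       # r_{k+1} = r_k - IMF_{k+1}
    return np.array(imfs), residual

t = np.linspace(0, 1, 512)
x = np.sin(2 * np.pi * 30 * t) + t      # oscillation plus a slow trend
imfs, res = ceemdan_like(x)
print(imfs.shape)
```

By construction, the extracted modes plus the final residual reconstruct the input exactly, mirroring the additive decomposition in the equations above.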
2.2. Fuzzy Entropy
As nonlinear technologies evolve, many nonlinear dynamic methods based on estimations of statistical parameters have been applied to extract fault characteristics. Entropy features derived from information entropy serve as effective tools for characterizing nonlinear properties; they primarily include sample entropy [20], approximate entropy [21], and fuzzy entropy [22], among others. Approximate entropy offers a framework for analyzing the complexity of finite time series; however, it is a statistical measure that quantifies the regularity of a time series and is characterized by poor statistical stability [23]. When the time series is excessively short, sample entropy, which uses a discontinuous (step-like) self-similarity function to measure complexity, yields inaccurate estimates in real applications [24]. Sample entropy also has lower calculation efficiency, especially for long time series [25], and may yield inaccurate entropy estimates or undefined entropy values [26]. Fuzzy entropy is a standardized metric for evaluating the uncertainty and complexity of fuzzy sets. It has been demonstrated that fuzzy entropy surpasses sample entropy in several respects, particularly due to its enhanced robustness to noise and its greater suitability for the analysis of short and noisy time series [27].
Fuzzy entropy is derived from sample entropy, utilizing the degree of membership of elements within a fuzzy set as the probability density function to compute the entropy value. Subsequently, the fuzzy entropy value is determined in accordance with the principles of information entropy. Like approximate entropy and sample entropy, fuzzy entropy aims to quantify the likelihood of new patterns emerging within a sequence. Specifically, a higher fuzzy entropy value signifies an increased probability of the emergence of new patterns, thereby indicating a greater complexity within the sequence. This attribute renders fuzzy entropy a vital instrument for the analysis of uncertainty and complexity in the dynamic evolution of complex systems. The fundamental principles underlying fuzzy entropy are outlined below:
- (1)
Perform phase space reconstruction on a time series $\{u(i), 1 \le i \le N\}$ of length $N$:
$$X_i^m = \{u(i), u(i+1), \ldots, u(i+m-1)\} - u_0(i), \qquad 1 \le i \le N - m$$
where $m$ is the embedding dimension and $u_0(i) = \frac{1}{m}\sum_{k=0}^{m-1} u(i+k)$ is the mean of the $m$ consecutive samples.
- (2)
Define the distance between $X_i^m$ and $X_j^m$ as the maximum absolute difference between their corresponding elements:
$$d_{ij}^m = \max_{0 \le k \le m-1} \left| \big(u(i+k) - u_0(i)\big) - \big(u(j+k) - u_0(j)\big) \right|$$
- (3)
Introduce the fuzzy membership degree to measure the similarity between $X_i^m$ and $X_j^m$:
$$D_{ij}^m = \exp\!\big(-(d_{ij}^m)^n / r\big)$$
where $n$ and $r$ determine the gradient and width of the fuzzy membership function.
- (4)
Define the function
$$\phi^m(n, r) = \frac{1}{N - m} \sum_{i=1}^{N-m} \left( \frac{1}{N - m - 1} \sum_{j=1, j \ne i}^{N-m} D_{ij}^m \right)$$
- (5)
Similarly, define the $\phi^{m+1}(n, r)$ function for dimension $m + 1$.
- (6)
If the length $N$ of the dataset is finite, the fuzzy entropy function is
$$\mathrm{FuzzyEn}(m, n, r, N) = \ln \phi^m(n, r) - \ln \phi^{m+1}(n, r)$$
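Steps (1)–(6) can be implemented directly. The sketch below follows the common formulation with the exponential membership function $\exp(-(d^n)/r)$ and takes $r$ as a fraction of the series' standard deviation, a conventional choice not specified above.

```python
import numpy as np

def fuzzy_entropy(u, m=2, r=0.2, n=2):
    """Fuzzy entropy of a 1-D series: FuzzyEn = ln(phi_m) - ln(phi_{m+1}).
    r is conventionally a fraction of the series' standard deviation."""
    u = np.asarray(u, dtype=float)
    N = u.size
    r = r * u.std()

    def phi(m):
        # Step (1): phase-space vectors with the per-vector mean removed.
        X = np.array([u[i:i + m] - u[i:i + m].mean() for i in range(N - m)])
        # Step (2): Chebyshev (max-abs) distance between all vector pairs.
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        # Step (3): fuzzy membership degree exp(-(d^n)/r).
        D = np.exp(-(d ** n) / r)
        np.fill_diagonal(D, 0.0)   # exclude self-matches (j != i)
        # Step (4): double average over all pairs.
        return D.sum() / ((N - m) * (N - m - 1))

    # Steps (5)-(6): repeat for m+1 and take the log-ratio.
    return np.log(phi(m)) - np.log(phi(m + 1))

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 8 * np.pi, 300))
noisy = rng.standard_normal(300)
print(fuzzy_entropy(regular), fuzzy_entropy(noisy))  # regular series is less complex
```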
2.3. Hippopotamus Optimization Algorithm (HO)
HO is a swarm intelligence optimization algorithm inspired by the social behavior of hippopotamuses, as proposed by Mohammad Hussein Amiri et al. [18] in 2024. This algorithm seeks to identify optimal solutions to optimization problems by simulating the positional updates, defensive strategies against predators, and evasive maneuvers of hippopotamuses in aquatic environments such as rivers or ponds. The optimization process is delineated as follows.
- (1)
Population initialization
The population of hippopotamuses can be mathematically represented using a matrix. Each hippopotamus’s position corresponds to a potential solution, while the updates to its position reflect the values of the decision variables.
Each position is initialized as $x_{i,j} = lb_j + r\,(ub_j - lb_j)$, where $lb_j$ and $ub_j$ denote the lower and upper bounds of the $j$th decision variable and $r$ is a random number in the range of 0 to 1.
- (2)
The hippopotamus’s position is updated in the river or pond (Exploration).
Hippo herds consist of female hippos, immature hippos, male hippos, and dominant male hippos. Adult males may be expelled from the herd upon reaching maturity. Equation (15) describes the position of male hippos, where $I_1$ is an integer equal to 1 or 2, $y_1$ is a random number between 0 and 1, and $D_{hippo}$ denotes the position of the dominant hippopotamus (the best solution found so far).
Equations (16) and (17) describe the position of the female or immature hippos in the herd, where $h_1$ and $h_2$ are numbers or vectors randomly selected from the five scenarios in Equation (18); $I_2$ is an integer equal to 1 or 2; $MG_i$ refers to the mean value of some randomly selected hippopotamuses, with an equal probability of including the currently considered hippopotamus; $\vec{r}_1$ is a random vector between 0 and 1; $r_2$, $r_3$, and $r_4$ are random numbers between 0 and 1; and $q_1$ and $q_2$ are integer random numbers that can be one or zero.
- (3)
Hippopotamus defense against predators (Exploration).
The defensive mechanism employed by hippos when confronted with a threat involves turning aggressively to confront the predator and producing a loud vocalization, where $\vec{r}_5$ represents a random vector ranging from zero to one. Equation (20) represents the position of the predator and Equation (21) represents the distance from the hippopotamus to the predator. The mathematical model of this behavior is shown in Equation (22), where $RL$ is a random vector with a Levy distribution, $f$ is a uniform random number between 1 and 1.5, $c$ is a uniform random number between 2 and 3, $d$ is a uniform random number between 2 and 4, $g$ represents a uniform random number between −1 and 1, and $\vec{r}_6$ is a random vector with the corresponding dimensions.
- (4)
Hippopotamus escaping from the predator (Exploitation).
When hippos encounter a group of predators or are unable to effectively repel the threat through defensive measures, they adopt an escape strategy and opt to vacate the current area. The positional update is shown as follows, with the scaling factor chosen randomly from one of three scenarios, where $r_7$ and $r_8$ are random numbers between 0 and 1, $\vec{r}_9$ denotes a random vector between 0 and 1, and $n_1$ is a normally distributed random number.
2.4. Improved Hippopotamus Optimization Algorithm (IHO)
2.4.1. Tent Chaotic Mapping
The chaotic properties, characterized by randomness, ergodicity, and extreme sensitivity to initial conditions, present opportunities for enhancing convergence in algorithm design. Specifically, the ergodicity of sequences produced by chaotic systems guarantees that all states can be explored without repetition within a specified range, which constitutes a significant advantage in the optimization search process. However, the chaotic sequences generated by traditional logistic mapping exhibit a high probability of assuming values at both extremes of the sequence. This results in an uneven distribution of values, which may constrain the efficiency of the search process.
In contrast, Tent mapping [28] is capable of generating chaotic sequences that exhibit a more uniform distribution within the interval [0, 1]. This characteristic is particularly significant for the initial population construction in optimization algorithms. Utilizing tent mapping for population initialization ensures that the initial solutions are distributed as evenly as possible throughout the solution space. This approach enhances both the comprehensiveness and efficiency of the search process, thereby establishing a solid foundation for subsequent optimization efforts.
The initialization of the hippo population involves the use of chaotic tent sequences. The specific steps can be summarized as follows.
- (1)
Use the mathematical expression of Tent chaotic mapping to generate chaotic sequences. Tent mapping is a simple nonlinear dynamical system whose mathematical expression is shown in Equation (26):
$$z_{k+1} = \begin{cases} z_k / a, & 0 \le z_k < a \\ (1 - z_k)/(1 - a), & a \le z_k \le 1 \end{cases}$$
The value of $a$ influences the distribution of the chaos, which in turn affects the generation of the initial population. Here $a$ is set to 0.5: with $a = 0.5$ the mapping is evenly distributed, so the chaotic sequence also exhibits an even distribution, and the distribution density remains stable as other parameters change, allowing for the generation of a robust initial population.
- (2)
Transform the generated chaotic sequence into the range of the hippo population search space via $x = lb + z\,(ub - lb)$, where $z$ is the generated Tent chaotic sequence.
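Steps (1) and (2) can be sketched as follows. One practical caveat is flagged in the code: with exactly $a = 0.5$, the tent map degenerates to zero after roughly 50 iterations in double-precision arithmetic (each step discards one mantissa bit), so implementations typically use a value slightly offset from 0.5; that offset is an implementation choice, not from the text.

```python
import numpy as np

def tent_sequence(length, x0=0.37, a=0.499):
    """Tent chaotic map: z_{k+1} = z_k/a if z_k < a, else (1-z_k)/(1-a).
    a is kept slightly below 0.5: the exact a = 0.5 map collapses to zero
    in floating point, since each iteration shifts out one mantissa bit."""
    seq = np.empty(length)
    z = x0
    for k in range(length):
        z = z / a if z < a else (1.0 - z) / (1.0 - a)
        seq[k] = z
    return seq

def init_population(pop_size, dim, lb, ub, x0=0.37):
    """Map a tent sequence z in [0, 1] onto the search space: x = lb + z*(ub - lb)."""
    z = tent_sequence(pop_size * dim, x0=x0).reshape(pop_size, dim)
    return lb + z * (ub - lb)

pop = init_population(pop_size=30, dim=2,
                      lb=np.array([0.01, 0.01]), ub=np.array([100.0, 100.0]))
print(pop.shape)  # one row per hippo, one column per decision variable
```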
2.4.2. Adaptive Weighting Factor
Inspired by the particle swarm optimization algorithm [29], the concept of weight is incorporated into the process of updating positions. Traditional methods that utilize a linear weight factor frequently result in suboptimal search outcomes and constrained optimization capabilities, primarily due to their inherent limitations and the constraints imposed during the search process. To address this deficiency, we propose a novel adaptive weight factor designed to dynamically modulate the balance between exploration and exploitation within the algorithm. The formulas for updating the position of the hippo and the corresponding weight factor are articulated as follows:
where $T$ is the maximum number of iterations and $t$ is the current number of iterations. Parameters 5 and 1.7 in the equation are the optimal values determined through extensive experimental simulations. The exponential decay model offers a smooth and gradual change; in contrast to linear decay or other abrupt transitions, exponential decay aligns more closely with the natural progression required in the optimization process.
In the initial stage, the large weight factor enables the algorithm to conduct extensive explorations within the search space. In the later stage, the smaller weight factor allows the algorithm to concentrate on promising regions, thereby accelerating convergence. A large initial weight can enhance the diversity of solutions and prevent the algorithm from becoming trapped in local optima during the early stages of optimization. Additionally, the exponential decay method ensures that the weight factor does not abruptly decrease to a minimal value, thereby preserving a degree of exploratory capability throughout the entire optimization process.
Figure 2 illustrates the value of the weight factor $w$.
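The exact update formula for $w$ is given in the paper's equation; only the constants 5 and 1.7 are stated in the text. A plausible exponential-decay form consistent with the description, $w(t) = \exp(-5\,(t/T)^{1.7})$, is assumed in the sketch below.

```python
import numpy as np

# Assumed exponential-decay weight: only the constants 5 and 1.7 come from the
# text; the functional form itself is an illustrative assumption.
def weight(t, T):
    return np.exp(-5.0 * (t / T) ** 1.7)

T = 100
w = weight(np.arange(1, T + 1), T)
print(w[0], w[-1])  # large early (exploration), small late (exploitation)
```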
2.4.3. Adaptive Mutation Perturbation
In the third stage of the hippo algorithm, when hippos evade predators, the position update may become trapped in local optima due to the constraints imposed by the random factor. To address this issue, this paper introduces a perturbation mechanism that combines Cauchy mutation and Gaussian mutation. Each time the population is updated, the individuals are mutated and perturbed according to the iteration stage. The adaptive mutation strategy dynamically adjusts the ratio of Cauchy to Gaussian mutations based on the stage of the optimization process or the quality of the current solution: in the early stages, the proportion of Cauchy mutation is increased to enhance exploration, while in the later stages, Gaussian mutation is emphasized to improve exploitation efficiency.
In the initial phase of the algorithm's iteration, that is, when $t$ is relatively small, we employ Cauchy mutation [30,31] as the primary mode of mutation and assign it a significant weight. The Cauchy distribution is recognized for its long-tail characteristic, which enables the generation of extreme values that are distant from the mean. This property allows for larger mutation steps within the population.
As the number of iterations increases, the weight of the Cauchy mutation is gradually diminished while the weight of the Gaussian mutation [32,33,34] is correspondingly increased. Gaussian mutation is recognized for its capacity to fine-tune the search around the mean value, thereby facilitating a more in-depth and precise exploration within the range of candidate solutions already examined.
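The adaptive Cauchy/Gaussian perturbation can be sketched as follows; the linear schedule for the Cauchy weight is an illustrative assumption, since the text does not give the exact mixing rule.

```python
import numpy as np

def adaptive_mutation(position, t, T, scale=0.1, rng=None):
    """Perturb a solution with a Cauchy step early on (long tails, large jumps)
    and a Gaussian step later (fine local tuning). The linear schedule for the
    Cauchy weight is illustrative; the paper adapts it to the iteration stage."""
    if rng is None:
        rng = np.random.default_rng()
    p_cauchy = 1.0 - t / T                       # Cauchy weight shrinks over time
    cauchy_step = rng.standard_cauchy(position.shape)
    gauss_step = rng.standard_normal(position.shape)
    step = p_cauchy * cauchy_step + (1.0 - p_cauchy) * gauss_step
    return position + scale * step

rng = np.random.default_rng(1)
x = np.zeros(2)
early = adaptive_mutation(x, t=1, T=100, rng=rng)    # Cauchy-dominated jump
late = adaptive_mutation(x, t=99, T=100, rng=rng)    # Gaussian-dominated tweak
print(early, late)
```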
The flowchart for the improved hippo optimization algorithm is presented in Figure 3.
2.5. Support Vector Machine (SVM)
As a powerful classification and regression model, SVM has been widely used across various domains. Ali, S. M., et al. [35] introduced an automated method for detecting valve leakage based on acoustic emission parameters and SVM, achieving an accuracy rate exceeding 98%. Li, Z., et al. [36] employed kernel principal component analysis (KPCA) in conjunction with SVM classifiers to ascertain leakage levels, attaining an accuracy rate of 96.75%. Ni, L., et al. [37] utilized particle swarm optimization combined with SVM (PSO-SVM) for the intelligent detection of water supply pipeline leaks, demonstrating superior performance compared to backpropagation neural networks (BPNN). Guo et al. [38] proposed an SV-WTBSVM method for intelligent water supply pipeline leakage detection; their findings indicated that this algorithm not only preserved the rapid training speed characteristic of TBSVM but also enhanced both classification accuracy and generalization capability. Wang et al. [39] applied a convolutional neural network (CNN) model for fault feature extraction and dimensionality reduction, subsequently inputting these features into an SVM model for diagnosis; the CNN-SVM model exhibited superior accuracy relative to other classification models. Originally, SVM was developed for binary classification. When addressing multi-class classification problems, it becomes essential to devise an appropriate multi-class classifier; strategies such as "one-versus-one" and "one-versus-rest" convert the multi-class problem into a sequence of binary classification tasks.
When the data are not linearly separable, different kernel functions can be employed to classify nonlinear data. These kernel functions map the data into a high-dimensional feature space, thereby rendering them linearly separable. A selection of commonly utilized kernel functions in SVM is presented in Table 1.
Among the various kernel functions, the RBF kernel is one of the most frequently employed. In comparison to other kernel functions, the RBF kernel function exhibits a superior capacity for nonlinear feature mapping, increased model complexity, and the ability to accommodate more intricate models across a broader spectrum of applications. It is characterized by two significant input parameters:
$C$: The regularization parameter, which indicates the model's tolerance to error. A higher value indicates a lower tolerance for errors, which may result in overfitting; conversely, a value that is too low can lead to underfitting. Both excessively large and excessively small values of the regularization parameter can adversely affect the model's generalization capability.
$\gamma$: The kernel function parameter, which plays a crucial role in determining the distribution of the data once they have been mapped into the new feature space. A low value for this parameter tends to result in a greater clustering of data points, whereas a high value requires points to be in close proximity to one another to be classified as belonging to the same group; the latter scenario frequently contributes to overfitting.
Consequently, to acquire suitable parameters and enhance the classification capability of the model, IHO is employed to optimize the two input parameters $C$ and $\gamma$ of the SVM.
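The effect of the kernel parameter can be seen directly from the RBF kernel formula, $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$ (the regularization parameter $C$ enters the SVM objective rather than the kernel). A minimal sketch:

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2): similarity decays with distance."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

x = np.array([[0.0, 0.0]])
z = np.array([[1.0, 0.0]])
# Small gamma: distant points still look similar (smooth boundary, underfit risk).
# Large gamma: only very close points look similar (wiggly boundary, overfit risk).
print(rbf_kernel(x, z, gamma=0.1)[0, 0], rbf_kernel(x, z, gamma=10.0)[0, 0])
```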
2.6. SVM Parameter Optimization Based on IHO
In this paper, IHO was employed to optimize the parameters $C$ and $\gamma$ of the SVM. The specific steps involved in this process are outlined as follows:
- (1)
Initialize the parameters for IHO. Specify the population size, the maximum number of iterations, and the upper and lower boundary parameters;
- (2)
Select the fitness function. Use the results of five-fold cross-validation on the training set to assess the model’s performance and utilize it as the fitness function;
- (3)
Initialize the population using Tent chaotic mapping;
- (4)
Calculate the initial fitness value of the population and sort to identify the optimal fitness value;
- (5)
According to the formulas pertinent to the corresponding exploration stage, the fitness value of the new position of the hippo is calculated. This value is then compared with the optimal fitness value from the previous iteration to update the position of the optimal individual within the hippopotamus population;
- (6)
Determine whether the maximum number of iterations has been reached. If this condition is met, the optimal value should be outputted and the algorithm should be terminated. If not, then revert to step (5) to continue the iterative loop;
- (7)
The optimal regularization parameter $C$ and kernel parameter $\gamma$ are determined and subsequently input into the SVM model for the diagnostic classification of the test set.
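Steps (1)–(7) can be sketched as a generic optimize-and-evaluate loop. In the sketch below, a stub fitness function with a known optimum stands in for the five-fold cross-validated SVM accuracy, and a simple greedy perturbation update (reusing the assumed exponential weight) stands in for the full IHO position-update rules; all names are illustrative.

```python
import numpy as np

def optimize_svm_params(fitness, lb, ub, pop_size=20, max_iter=50, seed=0):
    """Skeleton of steps (1)-(7): initialize, evaluate, iterate, return best.
    `fitness` would be five-fold cross-validated SVM accuracy in the real model;
    the greedy random-perturbation update stands in for the full IHO moves."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb), np.asarray(ub)
    pop = lb + rng.random((pop_size, lb.size)) * (ub - lb)   # steps (1)/(3)
    fit = np.array([fitness(p) for p in pop])                # step (4)
    for t in range(max_iter):                                # steps (5)-(6)
        w = np.exp(-5.0 * ((t + 1) / max_iter) ** 1.7)       # assumed weight decay
        cand = np.clip(pop + w * rng.standard_normal(pop.shape) * (ub - lb) * 0.1,
                       lb, ub)
        cand_fit = np.array([fitness(p) for p in cand])
        improved = cand_fit > fit                            # keep only improvements
        pop[improved], fit[improved] = cand[improved], cand_fit[improved]
    best = pop[fit.argmax()]
    return best, fit.max()                                   # step (7)

# Stub fitness with a known optimum at C=10, gamma=1 (stands in for CV accuracy).
stub = lambda p: -((p[0] - 10.0) ** 2 + (p[1] - 1.0) ** 2)
(best_C, best_gamma), _ = optimize_svm_params(stub, lb=[0.01, 0.001], ub=[100.0, 10.0])
print(best_C, best_gamma)
```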
4. Conclusions
In order to accurately classify valve internal leakage faults, a diagnostic model based on the Improved Hippopotamus Optimization algorithm-optimized Support Vector Machine (IHO-SVM) is proposed. HO was improved using three strategies, which increased the algorithm's search efficiency and performance. The superior performance of the IHO algorithm was demonstrated on twelve standard test functions. The IHO algorithm was then used to optimize the SVM parameters for the diagnosis of valve internal leakage. Compared with SVMs optimized by HO, GWO, PSO, WOA, and SSA, as well as the traditional SVM model, the IHO-SVM model achieved higher classification accuracy and a better classification effect. When HO-SVM, IHO-SVM, PSO-SVM, GWO-SVM, WOA-SVM, and SSA-SVM were each run independently 10 times, the IHO-SVM model was again superior in terms of both classification accuracy and stability.