1. Introduction
Rotating machinery underpins a wide range of industrial applications, from petrochemical processing and marine engineering to aerospace systems. Central to these machines are rolling bearings, precision components that convert sliding friction between rotating shafts and housings into rolling friction. This critical function demands exceptional durability, load-bearing capacity, and operational stability, making continuous health monitoring essential to prevent catastrophic failures and enhance system reliability. As such, the extraction and analysis of early, subtle fault signatures in rolling bearings has become a central research focus. However, the nonlinear and non-stationary characteristics of rotating machinery vibration signals create significant challenges in fault diagnosis [
1,
2,
3,
4]. In recent years, many researchers have conducted extensive studies on how to efficiently and accurately diagnose rolling bearing faults. Additionally, when breakthrough developments have occurred in non-mechanical fields, fault diagnosis technology has also advanced considerably. For instance, fundamental research in mathematics and physics has inspired the underlying logic of fault diagnosis, while artificial intelligence and deep learning have improved its diagnostic accuracy. Various diagnostic methods from the medical field have also provided valuable insights [
5].
Entropy, originally a macroscopic quantity in thermodynamics used to characterize a system’s disorder, was expanded upon when Shannon introduced information entropy in 1948 to quantify time series information [
6]. Since then, entropy has been widely applied in nonlinear dynamics research. Researchers subsequently developed various entropy measures, including approximate entropy [
7], sample entropy (SE) [
8], permutation entropy (PE) [
9], fuzzy entropy [
10], attention entropy [
11], distribution entropy [
12], symbolic dynamic entropy [
13], and conditional entropy [
14]. While these methods show effectiveness in analyzing the nonlinear features of rotating machinery vibration signals, traditional entropy measures like sample entropy primarily focus on amplitude similarities and often neglect transitional characteristics crucial for bearing fault diagnosis. Fuzzy entropy offers improved noise resistance but requires complex parameter tuning that introduces additional optimization challenges. Permutation and dispersion entropy, in contrast, focus on ordinal patterns but may lose critical amplitude information relevant to fault severity assessments. Slope entropy [
15] addresses these limitations by directly analyzing the rate-of-change between consecutive points, making it inherently sensitive to impulsive patterns characteristic of bearing impacts while maintaining computational stability across varying signal lengths and noise conditions. These advantages make slope entropy particularly suitable for bearing fault diagnostics, where sudden impacts generate distinctive slope patterns in vibration signals. Despite these advantages, single-scale slope entropy, like other single-scale methods, still faces challenges in capturing the multiscale dynamics inherent in bearing vibration signals.
Consequently, many researchers have improved existing entropy methods, with modifications to SE and PE being the most widespread [
16,
17]. Traditional single-scale entropy methodologies, while demonstrating utility in mechanical fault diagnostics, exhibit inherent limitations in capturing system dynamics. The comprehensive characterization of vibrational signatures across multiple temporal scales—a fundamental prerequisite for robust fault diagnostics—presents significant analytical challenges. This inherent constraint catalyzed the development of sophisticated multiscale frameworks that encompass multiscale variants of approximate entropy [
18], SE [
19], and PE [
20]. These advanced analytical paradigms, which employ hierarchical coarse-graining procedures to quantify signal complexity across multiple temporal domains, have garnered substantial attention within the research community and precipitated the emergence of novel entropy-based diagnostic architectures specifically optimized for rotating machinery condition monitoring and fault detection. Yuan introduced an enhanced methodology through multivariate coarse-graining procedures, proposing Composite Multivariate Multiscale Permutation Fuzzy Entropy [
21], which effectively mitigated issues of entropy fluctuation and information loss. Rostaghi integrated fuzzy set theory, multidimensional embedding reconstruction theory, and dispersion patterns to develop Refined Composite Multivariate Multiscale Fuzzy Dispersion Entropy [
22]. This approach demonstrated reduced sensitivity to signal length constraints while yielding more stable analytical outcomes. Chen proposed Refined Composite Multiscale Diversity Entropy (RCMDE) by incorporating scale factor characteristics, addressing the inherent limitations of original diversity entropy, wherein the length of multiple time series data becomes truncated at deeper scales [
23]. While these improved methods extract features more accurately than the original approaches, they have yet to solve the problems of long computation times and information loss.
Entropy-based methods have effectively advanced fault feature extraction, and when combined with the rapid development of artificial intelligence technologies in recent years, both the efficiency and accuracy of fault diagnosis methodologies have dramatically improved. Neural networks, with their powerful pattern recognition capabilities, have become essential tools in diagnostic systems, evolving from basic multilayer perceptrons to sophisticated architectures such as Convolutional Neural Networks (CNNs) [
24] and Deep Belief Networks (DBNs) [
25] that can automatically extract hierarchical features from raw vibration signals. While deep learning approaches offer remarkable accuracy, they typically require substantial computational resources and large training datasets, which may limit their applicability in industrial scenarios requiring the rapid diagnosis of limited samples. This challenge has prompted researchers to investigate more efficient neural network architectures, alongside optimization techniques, to enhance performance. The Extreme Learning Machine (ELM) [
26] presents significant advantages over conventional neural networks due to its remarkably fast training speed and good generalization performance; however, the random initialization of the input weights and hidden biases in the standard ELM often leads to instability and suboptimal solutions. Fortunately, swarm intelligence optimization algorithms—inspired by collective behaviors in nature—have emerged as effective solutions for parameter tuning and model optimization in diagnostic systems. Techniques such as Particle Swarm Optimization (PSO) [
27], Ant Colony Optimization (ACO) [
28], and Genetic Algorithms (GAs) [
29] mimic biological cooperative behaviors to efficiently search complex solution spaces and avoid local optima. Among these, PSO stands out for its implementation simplicity, rapid convergence, and ability to optimize continuous variables without requiring derivative information about the objective function. The synergistic combination of appropriate neural network architectures with efficient optimization methods offers a promising direction for enhancing fault diagnostic accuracy while maintaining computational efficiency.
To overcome these limitations, this paper introduces Adaptive Composite Multiscale Slope Entropy (ACMSlE), which combines function-adaptive patterns with time-shifting and refined sampling hybrid strategies. This method not only preserves the advantages and stability of MSlE in analyzing time series complexity but also suppresses noise, enhances feature extraction, and significantly improves the capture of local variation characteristics. The resulting fault diagnosis framework employs Fast Ensemble Empirical Mode Decomposition (FEEMD) [30] to preprocess raw vibration signals from mechanical equipment, and the resultant Intrinsic Mode Function (IMF) components serve as inputs to ACMSlE for fault feature extraction. Because the effectiveness of the extracted features is difficult to evaluate directly, a PSO-ELM classifier is employed to perform fault classification. On this basis, a new rotating machinery fault diagnosis method based on FEEMD-ACMSlE and PSO-ELM is proposed.
The principal contributions of this investigation include the following:
- (1)
The proposal of ACMSlE for measuring time series complexity. This method optimizes edge processing, enhances trend capture, and preserves data points as the scale increases. It makes full use of the original data and improves noise resistance through combined filtering, addressing the shortcomings of previous methods.
- (2)
The validation of ACMSlE using different types of signals and numbers of sampling points, followed by a comparison with three other recently improved entropy methods.
- (3)
The development of a rotating machinery fault diagnosis method based on FEEMD-ACMSlE and PSO-ELM. Comparative analysis with experimental data verifies its feasibility and superiority.
The remainder of this paper is organized as follows:
Section 2 elaborates on our proposed feature extraction theory using ACMSlE, including the fundamental concepts of slope entropy, multiscale slope entropy, and the innovative ACMSlE algorithm.
Section 3 presents our comprehensive fault diagnosis methodology, detailing the FEEMD preprocessing technique, PSO-ELM classification approach, and the integrated diagnostic framework.
Section 4 evaluates the effectiveness of ACMSlE through rigorous testing with various signal types and comparative analyses.
Section 5 validates the proposed fault diagnosis method using experimental bearing data obtained under diverse conditions and compares it with alternative approaches.
Section 6 discusses the limitations and potential applications of our method, and
Section 7 summarizes the contributions of this research and outlines directions for future investigations.
2. Proposed Feature Extraction Theory for Bearings Utilizing ACMSlE
2.1. Slope Entropy
Slope entropy quantifies the complexity of a time series by analyzing the slope variations between consecutive points, effectively preserving the amplitude information of the original signal. For a time series $x = \{x_1, x_2, \ldots, x_N\}$, SlopEn is computed as follows:

$$\mathrm{SlopEn}(x, m, \gamma, \delta) = -\sum_{k} p_k \log p_k$$

where $m$ is the embedding dimension, whose default value is chosen for optimal pattern recognition based on empirical studies; $\gamma$ is the slope threshold parameter that controls the sensitivity of the algorithm to slope variations (default $\gamma = 1$, corresponding to 45°), with higher values of $\gamma$ requiring greater angular changes to trigger recognition; $\delta$ is the zero-region threshold that defines the minimum change required to record a slope variation, effectively serving as a filtering threshold that prevents the algorithm from responding to insignificant noise while capturing actual signal features; and $p_k$ represents the relative frequency of the $k$-th slope pattern. These default parameter values were determined through extensive testing across multiple datasets to provide optimal performance in distinguishing actual signal characteristics from background noise [31,32,33].
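As a rough illustration of the computation described above, the following Python sketch maps each difference between consecutive samples to one of five slope symbols and then takes the Shannon entropy of the resulting pattern frequencies. The function names, the defaults m = 3 and a zero-region threshold of 0.001, the natural logarithm, and the normalization by the total number of subsequences are assumptions made for this example; only the 45° slope threshold (a value of 1) is taken from the description.

```python
import math
from collections import Counter


def slope_symbol(diff, gamma=1.0, delta=1e-3):
    """Map one consecutive-sample difference to a slope symbol in {-2, ..., 2}."""
    if diff > gamma:
        return 2    # steep rise (more than 45 degrees when gamma = 1)
    if diff > delta:
        return 1    # gentle rise
    if diff >= -delta:
        return 0    # inside the zero region: treated as flat
    if diff >= -gamma:
        return -1   # gentle fall
    return -2       # steep fall


def slope_entropy(x, m=3, gamma=1.0, delta=1e-3):
    """Shannon entropy of slope patterns built from windows of length m.

    m=3 and delta=1e-3 are illustrative defaults, not values from the paper.
    """
    if m < 2 or len(x) < m:
        raise ValueError("need m >= 2 and at least m samples")
    patterns = Counter()
    for start in range(len(x) - m + 1):
        window = x[start:start + m]
        pattern = tuple(
            slope_symbol(b - a, gamma, delta) for a, b in zip(window, window[1:])
        )
        patterns[pattern] += 1
    total = sum(patterns.values())
    # Relative frequency of each observed pattern, then Shannon entropy.
    return -sum((c / total) * math.log(c / total) for c in patterns.values())
```

Under these assumptions, a constant series produces a single pattern and an entropy of zero, while a series that alternates between steep rises and steep falls (with m = 2) produces two equally likely patterns and an entropy of ln 2.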
2.2. Multiscale Slope Entropy
MSlE is a multiscale extension of SlE that carries single-scale analysis over to several time scales in order to capture complex temporal patterns across different time horizons. The key steps for calculating MSlE are described below.
Given a time series $x = \{x_1, x_2, \ldots, x_N\}$, MSlE calculation involves the following:
(1) Coarse-graining process:

$$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x_i, \quad 1 \le j \le N_\tau$$

where $\tau$ is the scale factor, $y^{(\tau)} = \{y_j^{(\tau)}\}$ is the coarse-grained time series, and $N_\tau = \lfloor N/\tau \rfloor$ is the length of each coarse-grained sequence.
(2) For each scale factor $\tau$, the slope entropy is calculated using the coarse-grained time series:

$$\mathrm{MSlE}(x, \tau) = \mathrm{SlopEn}\left(y^{(\tau)}\right)$$

(3) The final MSlE curve is

$$\mathrm{MSlE}(x) = \{\mathrm{MSlE}(x, \tau) : \tau = 1, 2, \ldots, \tau_{\max}\}$$

where $\tau_{\max}$ is the maximum scale factor considered in the analysis, which is typically constrained by the data length to ensure statistical reliability.
It should be noted that while MSlE provides a valuable multiscale analysis, the traditional coarse-graining process can lead to information loss as the scale factor increases.
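The coarse-graining step and the resulting multiscale curve can be sketched in a few lines of Python. This is a minimal illustration, assuming non-overlapping windows and truncation of any incomplete final window; `coarse_grain` and `multiscale_curve` are hypothetical names, and the entropy estimator is passed in as a function so the sketch stays independent of any particular slope entropy implementation.

```python
def coarse_grain(x, tau):
    """Average consecutive, non-overlapping windows of length tau."""
    if tau < 1:
        raise ValueError("tau must be a positive integer")
    n_tau = len(x) // tau  # an incomplete final window is dropped
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_tau)]


def multiscale_curve(x, entropy_fn, tau_max):
    """Apply entropy_fn to the coarse-grained series at scales 1..tau_max."""
    return [entropy_fn(coarse_grain(x, tau)) for tau in range(1, tau_max + 1)]
```

Passing `len` in place of an entropy function makes the information loss mentioned above visible: a series of 7 samples shrinks to 3 points at scale 2 and to 2 points at scale 3.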
2.3. Adaptive Composite Multiscale Slope Entropy
ACMSlE enhances the traditional MSlE algorithm using adaptive processing strategies and refined sampling techniques. To better illustrate the fundamental differences between traditional MSlE and our proposed ACMSlE,
Figure 1 presents a comparative schematic diagram of both approaches. Our method incorporates multiple processing modes optimized for different signal characteristics, while improving noise resistance through composite sampling and adaptive parameter selection. The steps for calculating ACMSlE can be briefly described as follows:
(1) Composite coarse-graining: instead of single coarse-graining, ACMSlE employs
different coarse-grained series for each scale:
where k represents the number of different starting points.
(2) Adaptive processing: for each scale
, entropy is calculated as follows:
(3) Enhanced signal modification: There are three modes used for signal processing—standard mode, which uses a Hamming-windowed moving average; local variation mode, which combines a moving std and median filtering; and trend capture mode, which uses an exponential moving average.
Here, the modified output value is obtained from the standardized value, the median-processed value, the exponentially processed value, and the Hamming window-processed value; Tx is the processing type selection parameter; and an adaptive weighting coefficient controls the influence of the exponential moving average component during trend capture mode.
This weighting coefficient is not a fixed constant but is calculated adaptively from local signal characteristics through a local_complexity term, which is computed as the coefficient of variation (the ratio of the standard deviation to the mean) within a sliding window of the signal. This adaptive mechanism enables the algorithm to automatically adjust its sensitivity according to the signal’s complexity, providing enhanced feature discrimination for the varying dynamic behaviors typically observed in bearing vibration signals.
In signal segments containing transient fault features or high-frequency components characteristic of incipient bearing defects, the weighting coefficient approaches 1.0, giving greater weight to the exponential moving average component, which better preserves these critical diagnostic features. Conversely, in signal regions dominated by background machinery noise or steady-state vibrations, smaller values reduce the influence of this component, effectively enhancing noise rejection while maintaining sensitivity to relevant fault signatures. This dynamic adaptation represents a key innovation within our approach, allowing the ACMSlE method to optimize feature extraction characteristics based on local signal properties rather than relying on fixed processing parameters.
The selection of these specific signal processing techniques was guided by their distinctive advantages for bearing fault diagnosis applications.
The Hamming window was chosen for standard mode processing due to its favorable frequency domain characteristics and low spectral leakage (approximately −42 dB side-lobe attenuation), which preserves the spectral integrity of bearing fault impulses while effectively suppressing noise. Compared with rectangular windows, the Hamming window offers a good trade-off between main-lobe width and side-lobe suppression.
For local variation detection, the combination of moving standard deviation and median filtering creates a complementary system particularly effective for bearing diagnostics. Standard deviation efficiently amplifies transient vibration changes typical of incipient faults, while median filtering provides robust impulse noise rejection without blurring sharp fault transitions—a limitation of Gaussian filters. This hybrid approach consistently outperformed single-filter techniques in our evaluation tests while maintaining O(N) computational complexity.
The exponential moving average employed in the trend capture mode delivers adaptive memory properties that are particularly valuable for tracking progressive fault development patterns. Compared to more computationally intensive techniques like wavelet transforms (O(N log N)) or Savitzky–Golay filters, our selected methods maintain linear time complexity while providing excellent feature discrimination capabilities. This computational efficiency, combined with the adaptive switching mechanism used to switch between processing modes, makes ACMSlE highly suitable for practical diagnostic applications, including, potentially, real-time implementations.
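To make the three processing modes more concrete, the sketch below implements one plausible version of each in plain Python, together with the coefficient of variation that the text uses as its local complexity measure. Several details are assumptions made for illustration and are not specified in the text: the window width of 5, the "valid" edge handling that shortens the output, the order in which the moving standard deviation and the median filter are applied, and all function names. The paper's mapping from local complexity to the weighting coefficient is not reproduced here.

```python
import math
import statistics


def hamming_weights(width):
    """Hamming window coefficients normalized to sum to 1 (width >= 2)."""
    raw = [0.54 - 0.46 * math.cos(2 * math.pi * i / (width - 1)) for i in range(width)]
    total = sum(raw)
    return [r / total for r in raw]


def hamming_moving_average(x, width=5):
    """Standard mode: moving average weighted by a Hamming window."""
    w = hamming_weights(width)
    return [
        sum(wi * xi for wi, xi in zip(w, x[i:i + width]))
        for i in range(len(x) - width + 1)
    ]


def moving_std_then_median(x, width=5):
    """Local variation mode (assumed order): moving std, then median filter."""
    stds = [statistics.pstdev(x[i:i + width]) for i in range(len(x) - width + 1)]
    return [statistics.median(stds[i:i + width]) for i in range(len(stds) - width + 1)]


def exponential_moving_average(x, alpha):
    """Trend capture mode: y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]."""
    out = [x[0]]
    for value in x[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out


def coefficient_of_variation(window):
    """Ratio of standard deviation to mean, used as a local complexity measure."""
    mean = statistics.mean(window)
    return statistics.pstdev(window) / abs(mean) if mean != 0 else 0.0
```

Each function makes a single pass over the signal with a fixed-size window, which is consistent with the linear-time behavior claimed for these filters when the window width is treated as a constant.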
Parameter selection is a critical aspect of the ACMSlE methodology. The selection of the parameters $m$, $\gamma$, and $\delta$ was informed by both theoretical considerations and extensive empirical testing. The chosen embedding dimension $m$ balances pattern recognition capability against computational efficiency, capturing essential nonlinear dynamics while avoiding the curse of dimensionality. Similarly, the slope threshold parameter $\gamma$ (corresponding to a 45° angle) and the zero-region threshold $\delta$ were determined through sensitivity testing to effectively distinguish between fault-induced vibration patterns and background noise.
The scale factor $\tau$ is crucial for capturing signal characteristics at different temporal scales. The selection of $\tau$ follows specific criteria to ensure both computational efficiency and analytical effectiveness:
The scale factor begins at unity and extends incrementally to any positive integer greater than 1, with a default configuration of three scales ($\tau = 1, 2, 3$).
For a time series of length $N$, the upper bound of $\tau$ must satisfy a data length constraint to maintain statistical reliability. This constraint ensures sufficient data points in each coarse-grained series for reliable entropy estimation.
Based on extensive empirical analysis, we propose the following guideline for $\tau$ selection:
The default of three scales is selected as it provides a balanced trade-off between computational complexity and the ability to capture multiscale dynamics in most practical applications. This setting has been empirically validated across various signal types [34], demonstrating robust performance in capturing relevant temporal patterns while maintaining computational efficiency.
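The composite coarse-graining idea from step (1), and the way the scale factor limits the number of points available for entropy estimation, can be illustrated with the short Python sketch below. It follows the conventional composite scheme, in which one coarse-grained series is built per starting offset and the entropy values are averaged; this averaging, the function names, and the use of a pluggable `entropy_fn` are assumptions for illustration, and the adaptive processing and signal modification steps of ACMSlE are deliberately left out.

```python
def composite_coarse_grain(x, tau):
    """Return tau coarse-grained series, one per starting offset 0..tau-1."""
    series = []
    for k in range(tau):
        n_k = (len(x) - k) // tau  # complete windows available from offset k
        series.append(
            [sum(x[k + j * tau:k + (j + 1) * tau]) / tau for j in range(n_k)]
        )
    return series


def composite_entropy(x, tau, entropy_fn):
    """Average entropy_fn over all offset series at scale tau (assumed scheme)."""
    series = composite_coarse_grain(x, tau)
    return sum(entropy_fn(s) for s in series) / len(series)
```

For a 7-sample series at scale 2, the two offsets give the series [1.5, 3.5, 5.5] and [2.5, 4.5, 6.5], so every original sample contributes to at least one series, whereas single coarse-graining would use only the first of these.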
3. The Proposed Fault Diagnosis Method
3.1. Fast Ensemble Empirical Mode Decomposition
FEEMD is an enhanced algorithm within the Empirical Mode Decomposition (EMD) family that retains the advantages of EEMD while improving its computational efficiency. It is primarily used for processing nonlinear and non-stationary signals, as it features strong resistance to mode mixing and minimal boundary effects. In this study, we employ this algorithm as a preprocessing method before ACMSlE feature extraction. The implementation process is as follows:
(1) Add a white noise sequence to the original signal $x(t)$:

$$x_i(t) = x(t) + n_i(t)$$

where $n_i(t)$ represents the white noise added in the $i$-th iteration, with amplitude $\varepsilon$.
(2) Perform EMD decomposition on the noise-added signal:

$$x_i(t) = \sum_{j=1}^{J} c_{ij}(t) + r_i(t)$$

where $c_{ij}(t)$ denotes the $j$-th Intrinsic Mode Function (IMF) of the $i$-th decomposition and $r_i(t)$ is the residual term.
(3) The core improvement of FEEMD lies in selective reconstruction:

$$c_j(t) = \frac{1}{N} \sum_{i=1}^{N} c_{ij}(t)$$

where $N$ represents the ensemble number, which is typically much smaller than that required by EEMD.
(4) Final signal reconstruction:

$$x(t) = \sum_{j=1}^{J} c_j(t) + r(t)$$
The key parameters in the implementation of this process are as follows:
Noise amplitude $\varepsilon$: typically set to 0.1–0.3 times the standard deviation of the signal.
Ensemble number $N$: generally set between 20 and 100, significantly less than the hundreds required by EEMD.
IMF screening criterion:

$$SD = \sum_{t} \frac{\left| h_{k-1}(t) - h_k(t) \right|^2}{h_{k-1}^2(t)}$$

where $h_k(t)$ represents the signal after the $k$-th screening.
The noise amplitude parameter $\varepsilon$ in FEEMD significantly influences decomposition quality and computational efficiency. According to research, $\varepsilon$ values between 0.1 and 0.2 times the signal’s standard deviation provide an optimal balance between noise-assisted decomposition and minimal artificial component introduction. For bearing fault diagnosis applications in particular, studies have demonstrated that a suitably chosen $\varepsilon$ yields effective IMF separation while preserving fault-related transient features. Further research has shown that lower values result in insufficient mode separation, while higher values introduce excessive artificial components that could mask subtle fault signatures. Following these established guidelines, we set the ensemble number $N$ to 50, which provides sufficient statistical stability while maintaining computational efficiency compared to traditional EEMD approaches, which require hundreds of ensembles [35,36,37].
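The noise-assisted part of the procedure, namely generating noisy copies of the signal and averaging the components across the ensemble, can be sketched as follows. This is only an outline under stated assumptions: Gaussian white noise, a noise amplitude of 0.2 times the signal's standard deviation, an ensemble of 50, and hypothetical function names. The EMD sifting itself is not implemented; an identity function stands in for a real decomposition routine so that the sketch remains self-contained.

```python
import math
import random
import statistics


def noisy_realizations(signal, n_ensemble=50, noise_ratio=0.2, seed=0):
    """Create n_ensemble copies of the signal with added Gaussian white noise.

    The noise standard deviation is noise_ratio times the signal's own
    standard deviation (noise_ratio=0.2 is an illustrative choice).
    """
    rng = random.Random(seed)
    sigma = noise_ratio * statistics.pstdev(signal)
    return [[s + rng.gauss(0.0, sigma) for s in signal] for _ in range(n_ensemble)]


def ensemble_average(decompositions):
    """Average the j-th component over all trials.

    decompositions[i][j] is the j-th component (a list of samples) of trial i.
    """
    n_trials = len(decompositions)
    n_components = min(len(d) for d in decompositions)
    averaged = []
    for j in range(n_components):
        n_samples = len(decompositions[0][j])
        averaged.append([
            sum(d[j][t] for d in decompositions) / n_trials
            for t in range(n_samples)
        ])
    return averaged


# Demonstration with a placeholder decomposition (NOT a real EMD).
signal = [math.sin(2 * math.pi * t / 50) for t in range(200)]
trials = noisy_realizations(signal)
identity = lambda s: [s]  # stand-in for an EMD routine returning a list of IMFs
averaged = ensemble_average([identity(trial) for trial in trials])
```

Because the added noise is independent across trials, averaging over the ensemble shrinks its standard deviation by roughly the square root of the ensemble size, which is why the averaged component stays close to the original signal in this demonstration.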
3.2. Particle Swarm Optimization
Particle Swarm Optimization (PSO) is a swarm intelligence algorithm that simulates social behavior, where particles (candidate solutions) navigate through a search space guided by both individual and collective experience. Each particle adjusts its trajectory based on its own best-found position and the best position discovered by any particle in the swarm [38].
The movement of particles is governed by

$$v_i^{k+1} = w\, v_i^{k} + c_1 r_1 \left(p_i - x_i^{k}\right) + c_2 r_2 \left(g - x_i^{k}\right)$$

$$x_i^{k+1} = x_i^{k} + v_i^{k+1}$$

where $v_i^{k}$ and $x_i^{k}$ are the velocity and position of particle $i$ at iteration $k$; $w$ is the inertia weight; $c_1$ and $c_2$ are acceleration coefficients; $r_1$ and $r_2$ are random values in [0,1]; $p_i$ is particle $i$’s best position; and $g$ is the global best position.
The advantages of PSO in neural network optimization include its gradient-free operation, minimal parameter requirements, and ability to escape local optima, making it ideal for optimizing classifier parameters in fault diagnosis applications.
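The velocity and position updates can be illustrated with a compact, gradient-free minimizer in Python. This is a minimal sketch and not the authors' implementation: the function name, the search bounds, the seed, and the clamping of velocities and positions are assumptions added to keep the example well behaved, while the swarm size of 30, the 200 iterations, the acceleration coefficients of 2.0, and the inertia weight decreasing linearly from 0.9 to 0.4 mirror the settings reported for PSO-ELM later in the paper.

```python
import random


def pso_minimize(objective, dim, bounds=(-5.0, 5.0), n_particles=30, n_iter=200,
                 c1=2.0, c2=2.0, w_start=0.9, w_end=0.4, seed=0):
    """Minimize objective(position) with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = 0.2 * (hi - lo)  # velocity clamp (an assumption, not from the text)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g_idx = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g_idx][:], pbest_val[g_idx]

    for k in range(n_iter):
        # Linearly decreasing inertia weight: exploration first, exploitation later.
        w = w_start - (w_start - w_end) * k / max(n_iter - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


sphere = lambda p: sum(v * v for v in p)  # simple test objective, minimum at 0
best_position, best_value = pso_minimize(sphere, dim=2)
```

Only objective values are used, never derivatives, which is what makes the same loop applicable to tuning classifier parameters whose effect on accuracy is not differentiable.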
3.3. Extreme Learning Machine
The Extreme Learning Machine (ELM) is a single-hidden-layer feedforward neural network characterized by its analytical determination of output weights, in contrast to traditional neural networks that rely on iterative gradient-based training. The key innovation of the ELM lies in randomly assigning the input weights and biases while analytically calculating the output weights, enabling both universal approximation and an extremely fast training speed. The standard ELM algorithm can be formalized as shown in
Table 1.
For our bearing fault diagnosis application, we implemented an ELM with the following configuration:
Input layer with dimensions matching the ACMSlE feature vector size;
Hidden layer containing 20 neurons with the sigmoid activation function $g(z) = 1/(1 + e^{-z})$;
Output layer containing 9 neurons corresponding to the bearing health conditions.
The sigmoid activation function was selected for its effectiveness in capturing nonlinear relationships in fault patterns. The hidden layer size was determined through cross-validation, balancing the model’s complexity with its capability for generalization.
While the ELM offers a remarkable training speed (typically orders of magnitude faster than backpropagation-based methods), its performance can be sensitive to the random initialization of the input weights and biases. This randomness often leads to inconsistent performance and potentially suboptimal solutions. To address this limitation, we employ Particle Swarm Optimization to enhance the ELM framework, as detailed in the following section.
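A basic ELM of the kind described in this section fits in a few lines of NumPy. The sketch below is illustrative rather than a reproduction of the authors' code: the function names, the uniform [−1, 1] range for the random weights, the seeds, and the synthetic two-class data are all assumptions, and the 20 hidden neurons simply mirror the configuration listed above. Training consists of a single pseudoinverse solve, with no iterative weight updates.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_train(X, T, n_hidden=20, seed=0):
    """Train an ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = sigmoid(X @ W + b)                                   # hidden layer output matrix
    beta = np.linalg.pinv(H) @ T                             # Moore-Penrose solution
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Return the index of the largest output neuron for each sample."""
    return np.argmax(sigmoid(X @ W + b) @ beta, axis=1)


# Synthetic, well-separated two-class data (purely illustrative).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, size=(30, 2)),
               rng.normal(2.0, 0.5, size=(30, 2))])
y = np.array([0] * 30 + [1] * 30)
T = np.eye(2)[y]  # one-hot targets

W, b, beta = elm_train(X, T)
accuracy = float(np.mean(elm_predict(X, W, b, beta) == y))
```

Changing `seed` changes `W` and `b` and therefore the fitted model, which is the source of the run-to-run variability discussed above.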
3.4. PSO-ELM
As discussed in the previous section, the standard ELM suffers from performance instability due to its random parameter initialization. To address this limitation, we employ Particle Swarm Optimization to enhance the ELM framework, resulting in the PSO-ELM hybrid algorithm. This approach systematically optimizes the input weights and biases that would otherwise be randomly assigned, significantly improving the algorithm’s generalization performance and prediction accuracy while maintaining its rapid training speed. The implementation process is as follows:
(1) ELM Network Structure Initialization. Building on the ELM architecture described in
Section 3.3, we formulate the following network structure:
(2) PSO Parameter Initialization
where $M$ represents the particle population size. Each particle is initialized with random values: the input weights $w_{ij}$ and biases $b_j$ are generated using a uniform distribution within [−1, 1], while initial velocities are set to random values in [−0.1, 0.1]. This initialization approach provides sufficient diversity for effective search space exploration from the first iteration.
(3) Fitness Function Definition
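(4) Particle Position Update

$$v_i^{k+1} = w\, v_i^{k} + c_1 r_1 \left(p_i - x_i^{k}\right) + c_2 r_2 \left(g - x_i^{k}\right), \quad x_i^{k+1} = x_i^{k} + v_i^{k+1}$$

where $w$ is the inertia weight; $c_1$ and $c_2$ are learning factors; $r_1$ and $r_2$ are random numbers; $p_i$ represents the individual’s best position; and $g$ represents the global best position.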
(5) ELM Parameter Optimization. We update the ELM input weights and biases using optimal particle positions, calculate the hidden layer output matrix $H$, and solve the output weights using the Moore–Penrose generalized inverse: $\beta = H^{\dagger} T$, where $T$ is the target output matrix. The key optimization parameters and their influence on diagnostic precision include the following:
Particle swarm size $M$: Set to 30 in our implementation after evaluating populations from 10 to 50. Smaller populations often converged prematurely to local optima, particularly for similar fault patterns. Increasing $M$ further improved the model’s accuracy by less than 0.5% while doubling its computational time.
Maximum iterations: Set to 200 in our implementation, with convergence typically occurring within 150–180 iterations across various fault conditions. Extended testing showed minimal accuracy improvements (<0.3%) when increasing this to 400 iterations.
Learning factors $c_1$, $c_2$: Both set to 2.0, balancing the cognitive and social learning components. For subtle fault detection (especially incipient ball faults), slightly emphasizing cognitive learning ($c_1$ slightly larger than $c_2$) improved the classification accuracy by approximately 1.5%.
Inertia weight w: Implemented as linearly decreasing from 0.9 to 0.4 throughout iterations. This strategy consistently outperformed constant weights by 2–3% in terms of accuracy, particularly for complex fault patterns. This dynamic adjustment facilitates broad exploration in early iterations and refined exploitation in later stages.
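The link between the swarm and the ELM is the fitness evaluation: each particle position is decoded into input weights and biases, the output weights are solved analytically, and a scalar score is returned to the optimizer. The sketch below shows one way this could look in NumPy. The flat particle layout, the function names, the synthetic data, and in particular the choice of training misclassification rate as the fitness value are assumptions for illustration; the fitness function actually used in the paper is not reproduced here, and a held-out validation error would be a reasonable alternative.

```python
import numpy as np


def decode_particle(position, n_inputs, n_hidden):
    """Split a flat particle into input weights (n_inputs x n_hidden) and biases."""
    position = np.asarray(position, dtype=float)
    n_w = n_inputs * n_hidden
    W = position[:n_w].reshape(n_inputs, n_hidden)
    b = position[n_w:n_w + n_hidden]
    return W, b


def elm_fitness(position, X, y, n_hidden, n_classes):
    """Misclassification rate of the ELM defined by this particle (lower is better)."""
    W, b = decode_particle(position, X.shape[1], n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden layer output
    T = np.eye(n_classes)[y]                # one-hot targets
    beta = np.linalg.pinv(H) @ T            # analytical output weights
    predicted = np.argmax(H @ beta, axis=1)
    return float(np.mean(predicted != y))


# Illustrative data and a single random particle drawn from [-1, 1].
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(30, 2)),
               rng.normal(2.0, 0.5, size=(30, 2))])
y = np.array([0] * 30 + [1] * 30)
n_hidden = 20
particle = rng.uniform(-1.0, 1.0, size=n_hidden * (X.shape[1] + 1))
error = elm_fitness(particle, X, y, n_hidden, n_classes=2)
```

With this layout the search space has `n_hidden * (n_inputs + 1)` dimensions, so the hidden layer size directly sets the cost of each PSO iteration.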
3.5. The Proposed Fault Diagnosis Method
This study proposes a novel fault diagnosis method for rotating machinery vibration signals. The diagnostic workflow is illustrated in
Figure 2, with the specific diagnostic process consisting of the following steps:
- (1)
Preprocessing of Original Vibration Signals: The raw vibration signals are decomposed by FEEMD, and the resulting IMF components are screened by computing the correlation coefficient between each IMF component and the original signal, analyzing signal and envelope spectra, calculating variance contribution rates and kurtosis values, and selecting appropriate statistical measures or weighted statistical measures (the correlation coefficient is used by default).
- (2)
Feature Vector Construction: We apply the ACMSlE method to the screened IMF components to calculate the corresponding entropy values, construct feature vectors, and combine all extracted feature vectors to form an initial fault feature set.
- (3)
Training and Testing Process: The fault feature set is partitioned into training and testing feature datasets at a 1:1 ratio; the training data are input into PSO-ELM for learning, while the testing data are used for detection and recognition validation.
- (4)
Result Analysis and Maintenance: Based on the diagnostic results, appropriate maintenance procedures are implemented for the identified faults.
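The default screening rule in step (1), which keeps the IMF components that correlate most strongly with the original signal, can be sketched in plain Python as follows. The threshold of 0.3, the use of the absolute correlation value, and the function names are hypothetical choices made for this example; the text does not state the cut-off that was actually applied.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant component carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def screen_imfs(imfs, signal, threshold=0.3):
    """Return indices of IMFs whose |correlation| with the signal meets the threshold."""
    return [j for j, imf in enumerate(imfs) if abs(pearson(imf, signal)) >= threshold]


# Toy example: the first and third components track the signal, the second is flat.
signal = [math.sin(2 * math.pi * t / 25) for t in range(100)]
imfs = [signal, [0.0] * 100, [0.5 * s for s in signal]]
kept = screen_imfs(imfs, signal)
```

The indices returned by `screen_imfs` identify the components that would be passed on to feature extraction in step (2).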
6. Discussion and Limitations
The implementation of our proposed framework requires the consideration of several key factors for its optimal performance and industrial application. While our study demonstrates strong performance in controlled laboratory settings, several important limitations should be acknowledged.
The computational complexity of ACMSlE, while higher than conventional entropy measures, remains reasonable for offline analysis with standard computing resources. Its current implementation is most suitable for environments where diagnostic accuracy takes precedence over real-time processing constraints. For real-time applications and embedded systems with limited resources, several adaptation strategies could make our approach more feasible:
Implementing a batch processing approach where vibration data are collected in fixed-length windows (1–2 second intervals);
Optimizing the number of scale factors, as even with just three scales, the method maintains strong discrimination capabilities;
Employing hardware acceleration through FPGA-based solutions to leverage the inherently parallel nature of multiscale entropy computation;
Replacing the computationally intensive FEEMD preprocessing with lighter alternatives when processing speed is prioritized over maximum accuracy;
Developing hybrid monitoring systems where simplified algorithms perform continuous monitoring that triggers comprehensive analysis only when potential fault signatures are detected.
Real industrial environments present additional challenges beyond these computational constraints. Non-stationary machinery behavior due to load variations, speed fluctuations, or transient events may affect the stability of entropy calculations. The current methodology assumes relatively stable operating conditions and may require adaptive parameter selection mechanisms to maintain its performance under highly variable conditions. Sensor noise is particularly challenging for early-stage faults, where signal-to-noise ratios are low. Although our multiscale approach inherently provides some noise resistance, extreme noise conditions may require additional preprocessing. Scenarios in which data are missing due to sensor failures or communication interruptions would impact the continuous time series analysis that entropy methods depend on, necessitating robust interpolation techniques, which could be a subject for future research.
The high classification accuracy (98.7%) achieved for compound fault scenarios should be considered alongside the potential risks of overfitting and misclassification. Compound faults present unique challenges, as their vibration signatures combine the characteristics of multiple elemental faults with complex interactions. Several aspects of our approach mitigate overfitting risks: the 30 independent trials with different random initializations demonstrate remarkable consistency (±0.32% standard deviation) and t-SNE visualizations show that ACMSlE creates well-separated feature clusters even between medium and severe compound fault categories. Nevertheless, misclassification risks remain in borderline cases where compound fault signatures might share similarities with individual component faults. The primary confusion that was observed occurred between normal conditions and medium outer race faults when using some preprocessing methods, though our optimal FEEMD-ACMSlE-PSO-ELM combination significantly reduced this confusion.
Despite these limitations, the fundamental components of our methodology have potential applicability, beyond bearing diagnosis, in other rotating machinery systems. The ACMSlE feature extraction method could be adapted for gearboxes, rotors, and motors with appropriate parameter adjustments. The primary requirements for its successful extension would be that the phenomena of interest produce detectable patterns across multiple time scales, and that an acceptable signal-to-noise ratio can be achieved through appropriate preprocessing.
7. Conclusions
This study introduces a novel ACMSlE feature extraction method for nonlinear dynamics analysis, which improves stability and complexity discrimination while reducing data length dependency compared with the conventional MSlE approach. Furthermore, we propose an integrated rolling bearing fault diagnosis framework that combines FEEMD preprocessing, ACMSlE feature extraction, and PSO-ELM classification. The proposed method was validated using an experimental dataset from Huazhong University of Science and Technology. The results demonstrate its superior performance compared to other entropy-based methods (CMSE, CMCE, and MSlE) and various combinations of preprocessing and classification techniques (VMD-X-PSO-ELM, EWT-X-PSO-ELM, SVM, and CNN-LSTM). Notably, our algorithm maintains a high accuracy of 98.7% when dealing with challenging datasets, significantly outperforming alternative approaches.
Our research also aligns with several emerging trends in the 2022–2024 diagnostic literature. The increasing focus on interpretable AI for industrial applications has highlighted the value of physically meaningful features like those provided by our entropy-based approach. While deep learning methods continue to gain popularity, recent work has emphasized the importance of balancing black-box complexity with explainable decision-making processes that engineers can trust and understand. Our hybrid approach, combining advanced feature extraction with optimized shallow networks, represents a middle ground that maintains both performance and interpretability. The growing interest in adaptive methodologies that can automatically adjust to varying signal conditions is also relevant to our development of the ACMSlE method, with its dynamic parameter selection capabilities.
As the field continues to evolve toward more robust and adaptable diagnostic frameworks, our approach contributes to this progression by demonstrating how traditional entropy analysis can be enhanced through strategic modifications that address specific industrial challenges.
Several avenues for future research remain. The classification accuracy of this approach could potentially be further enhanced by exploring alternative optimization algorithms for ELM parameter tuning. Additionally, investigating suitable optimization algorithms for the SVM that better complement ACMSlE features presents another promising direction. Furthermore, the integration of fuzzy clustering techniques with our entropy-based feature extraction method could enhance its handling of the inherent uncertainties in bearing fault diagnosis. Future work should also consider validating the method in industrial environments under variable operating conditions, extending it to other mechanical systems such as gearboxes and rotors, and investigating its hybridization with deep learning approaches for improved feature learning. Optimizing this methodology for real-time implementation and developing models for continuous fault progression tracking represent additional valuable research directions. These aspects warrant further investigation in subsequent studies.