1. Introduction
Bearings are essential components in many industries because they allow machine parts to move together smoothly and efficiently. By supporting shafts, axles, and other rotating elements, bearings distribute loads evenly and reduce wear on machinery. Over time, however, mechanical loads, vibrations, and temperature changes subject bearings to wear, eventually producing bearing faults. These faults can lead to machine downtime, increased maintenance costs, and even safety hazards. Timely detection and diagnosis of bearing faults are therefore crucial to prevent equipment failure and optimize the overall performance of machinery [1]. A variety of condition-monitoring methods are used to evaluate bearing health and identify potential problems before they develop into serious malfunctions. One of the most widely used techniques is vibration analysis, which measures and examines bearing vibrations to detect abnormal patterns caused by misalignment, imbalance, or component failure. Signal processing plays an important role here because it extracts useful information from noisy sensor data, making fault patterns easier to identify. Wavelet transforms (WT), empirical mode decomposition (EMD), and the Walsh–Hadamard transform (WHT) are prominent approaches [2,3,4]. The fast Fourier transform (FFT) converts vibration signals into the frequency domain, revealing abnormal frequencies related to bearing faults [5]. The wavelet transform analyzes signals in both the time and frequency domains, offering superior time-frequency resolution and detecting transient fault signatures [6]. Ensemble empirical mode decomposition (EEMD) decomposes bearing signals into different time scales to extract relevant information, addressing the mode mixing and spectral leakage present in conventional EMD [7]. Since bearings have a high failure rate, Zhu et al. [8] proposed a novel feature fusion approach for bearing fault feature extraction and diagnosis: the time-frequency content of the bearing signal is first extracted as a characteristic matrix using the wavelet packet transform (WPT), and this matrix is then refined using multi-weight singular value decomposition (MWSVD), based on singular value contribution rates and entropy weighting, to increase fault detection accuracy and eliminate superfluous features. Li et al. [9] investigated a methodology based on encoder signals, combining it with a locally weighted multi-instance multi-label (LWMIML) network to create feature vectors and find associations between features and defect categories; the study was validated using data from fault simulation test rigs, demonstrating its potential as an affordable alternative for intelligent fault diagnosis systems. Recently, a multi-fault diagnosis strategy based on iterative generalized demodulation was reported [10], proposing a diagnosis technique guided by the instantaneous fault characteristic frequency under varying speeds; the authors compared simulated and experimental results to demonstrate the diagnosis's effectiveness. In another study, an adaptive thresholding and coordinate attention-based tree-inspired network was proposed to detect the health condition of bearings [11]. Moreover, a frequency-chirprate synchrosqueezing operator was introduced to analyze the time-frequency representation of bearing faults, with diagnostic accuracy verified on simulated and experimental signals [12]. Kiakojouri et al. [13] proposed a cepstrum pre-whitening-based filtering technique to identify the impulsive features associated with various bearing faults; their results further demonstrate its efficacy in identifying several bearing faults occurring under various operating conditions.
Owing to their superior ability to handle the complex, non-stationary patterns frequently present in vibration signals, machine learning (ML) and deep learning (DL) techniques have transformed bearing fault diagnosis. Conventional signal processing methods, such as the Fourier, wavelet, and empirical mode decomposition transforms, extract features from the signal using mathematical models and predetermined rules [14,15]. While these techniques are effective at isolating specific frequency components and transient features, they often fail to capture the complex, high-dimensional relationships present in the data, especially when operating conditions and noise levels vary. ML and DL techniques, by contrast, leverage large volumes of historical data to learn optimal features and decision boundaries directly from the data. These models automatically adapt to complex data, spotting subtle patterns and anomalies that may indicate bearing faults with greater accuracy and dependability [16]. Pham et al. [17] demonstrated a simplified CNN-based diagnosis procedure for embedded devices that classifies faults from acoustic emission data; by using a MobileNet-v2 model pruned and tuned for lower system resource utilization, the technique substantially decreases computing costs. Yang et al. [18] presented a classification technique for identifying faults from small sample datasets that uses a triplet embedding to connect each stage of fault recognition with feature extraction when categorizing vibration data. Comparison tests against stacked autoencoders, stacked denoising autoencoders, and traditional CNN techniques verify the effectiveness of the approach and demonstrate improved fault diagnosis performance on small sample sizes.
Feature selection is critical because it identifies and retains only the most informative characteristics of the signals, considerably improving the efficacy of classification algorithms. In bearing fault diagnosis, ineffective features add unnecessary complexity that can obscure meaningful trends, cause overfitting, and complicate classification models. Feature selection techniques reduce the dimensionality of the dataset, making models simpler, easier to interpret, and faster to train. Feature selection also focuses the classification algorithm on the attributes most indicative of bearing failure, making it easier to identify small but important fault signatures. It is therefore an important strategy for enhancing bearing fault diagnosis: by carefully removing unnecessary information and emphasizing key characteristics, it ensures that classification models remain efficient and highly accurate in predicting bearing health states. Li et al. [19] presented a sophisticated framework for diagnosing hybrid faults in gearboxes, evaluating a two-step feature selection procedure that combines filter and wrapper techniques using mutual information and the non-dominated sorting genetic algorithm II (NSGA-II). Karabadji et al. [20] combined attribute selection with data sampling to improve the choice of relevant attributes and database elements, making it easier to develop decision rules for fault diagnosis in rotating machinery; experimental comparisons on ten reference datasets prove the effectiveness of this strategy and highlight its superiority over conventional decision tree-building techniques. Rajeswari et al. [21] described a novel gearbox diagnosis approach that uses vibration data from test equipment for early defect detection, optimizing feature selection with both a rough set-based technique and a genetic algorithm (GA) to reduce computing demands. Gao et al. [22] proposed a causal feature network-based feature selection technique for composite fault diagnosis in bearings, with experimental results demonstrating very high accuracy.
With its intrinsic transparency and interpretability, explainable AI (XAI) represents a fundamental shift from traditional wrapper and metaheuristic optimization-based feature selection methods. Conventional techniques, although successful at identifying significant feature subsets, frequently function as “black boxes”, providing limited insight into the reasoning behind feature relevance and selection choices. This opacity can be a major disadvantage, particularly in fields such as healthcare and finance that demand strict validation and understanding of model behavior. XAI-based feature selection methods, on the other hand, shed light on the selection procedure and offer concise justifications for why particular features matter. This promotes a deeper comprehension of the underlying processes being modeled, in addition to increasing trust and confidence. XAI models can also reveal previously overlooked relationships in the data, which can lead to discoveries and direct future studies. As a result, XAI-based feature selection stands out as a superior strategy that combines transparency with performance while encouraging deeper engagement with the model's decision-making process. To meet the essential requirement for transparency in Industry 4.0 fault detection and diagnosis (FDD) procedures for chemical process systems, Harinarayan et al. [23] presented an innovative framework called Explainable Fault Detection, Diagnosis, and Correction (XFDDC); applying the framework improved both the fault detection rate and the F1 score. Moreover, Meas et al. [24] presented a novel use of XAI to help HVAC engineers efficiently identify faults in air handling units. Their study used the SHAP technique to improve explainability and visualize the temporal evolution of the important features, and the system's explanatory power was confirmed through validation with actual experimental data.
The Q transform provides improved time-frequency resolution and adaptability to non-stationary data, which sets it apart from classic signal processing transforms. It excels at resolving both high- and low-frequency components precisely in time, and it offers rich, informative features that capture the fundamental dynamics of bearing failures; paired with learning techniques such as LSTM, GRU, and SVM, these features improve the models' capacity to learn and generalize from sparse experimental datasets. This synergy is particularly effective for small sample sizes, where capturing the crucial fault characteristics is difficult. In addition, the novel use of explainable AI (XAI) for feature selection within LSTM and GRU frameworks is a major step forward in bearing fault identification. By emphasizing the most pertinent features obtained from the Q transform, XAI not only increases model interpretability but also improves model transparency and reliability, filling a significant gap in the literature. This integrated technique, combining the Q transform with XAI-enhanced deep learning models, delivers reliable, accurate, and explainable fault detection even when data availability is limited, and thus represents a significant advancement in predictive maintenance. The authors have made noteworthy contributions to the methodology and assessment of bearing fault diagnosis through the following innovations:
The Q transform, a robust time-frequency analysis technique, is applied to extract features from raw signal data, effectively capturing both temporal and frequency-domain information pertinent to bearing fault detection. To enhance interpretability, XAI techniques are applied to identify and highlight the most relevant features derived from the Q transform.
For performance comparison across various model architectures, this study employs three distinct machine learning techniques—SVM, LSTM, and GRU—to predict bearing defects.
The novel utilization of LSTM and GRU as XAI models for bearing fault diagnosis is explored, significantly enhancing the transparency and robustness of predictive maintenance technologies.
The methodology is further refined by incorporating SSA optimization for hyperparameter tuning, alongside XAI-based feature selection. This approach not only optimizes the machine learning models but also enhances their interpretability and effectiveness in fault prediction.
The robustness and generalization of the models are validated using tenfold cross-validation, ensuring consistent performance across the different models.
Figure 1 illustrates the bearing fault diagnosis methodology based on the proposed framework.
3. Results and Discussion
The present investigation used a hybrid methodology to diagnose faults using the CWRU bearing dataset. The 64 vibration signals representing the various bearing fault conditions (HB, BD, IRD, and ORD) were preprocessed with the Q transform. HOG statistical features were extracted for each fault condition and ranked with XAI. The classifiers SVM, LSTM, and GRU were used for the initial training on the constructed feature vectors, and tenfold cross-validation was then performed to avoid overfitting of the models. Feature selection is essential to fault diagnosis because it determines the most applicable and discriminative attributes in the dataset, reducing complexity and improving the diagnostic model's performance. By identifying the most informative attributes, superfluous data can be removed, lowering computational complexity and the risk of overfitting and thereby improving diagnostic accuracy. In the present work, XAI (SHAP)-based feature selection was applied to the features extracted for all four bearing fault conditions.
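As a concrete illustration of this front end, the sketch below computes a constant-Q time-frequency representation of a vibration segment and reduces it to a 16-dimensional HOG-based feature vector. The library choices (librosa's constant-Q transform standing in for the Q transform, scikit-image's HOG) and the chunk-averaging step are illustrative assumptions; the paper does not specify the implementation.

```python
# Sketch of the feature-extraction front end: constant-Q transform followed
# by HOG statistics. Library and parameter choices are assumptions, not the
# authors' confirmed implementation.
import numpy as np
import librosa
from skimage.feature import hog

def q_transform_image(signal, fs=12000, n_bins=84, bins_per_octave=12):
    """Constant-Q magnitude of a 1-D vibration signal, scaled to [0, 1].
    fs=12000 assumes the 12 kHz CWRU sampling rate."""
    cqt = np.abs(librosa.cqt(signal.astype(float), sr=fs,
                             n_bins=n_bins, bins_per_octave=bins_per_octave))
    return cqt / (cqt.max() + 1e-12)

def hog_feature_vector(tf_image, n_features=16):
    """HOG descriptor of the time-frequency image, summarized into a
    fixed-length vector (chunk means; the reduction step is an assumption)."""
    descriptor = hog(tf_image, orientations=8, pixels_per_cell=(8, 8),
                     cells_per_block=(1, 1), feature_vector=True)
    chunks = np.array_split(descriptor, n_features)
    return np.array([c.mean() for c in chunks])   # features F1..F16

# Usage: one row of the feature matrix per vibration segment.
# X = np.stack([hog_feature_vector(q_transform_image(s)) for s in segments])
```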
Figure 3 shows the bar charts representing feature selection with the three XAI models.
Figure 3a displays the feature significance graph obtained when an SVM is used as the XAI model. The graph evaluates the relative relevance of sixteen distinct features (labeled F1 through F16) across the four fault classes using SHAP values. As the bar chart illustrates, every feature contributed to the classification of each fault condition. The SHAP values on the vertical axis show the average influence of a feature on the model output, that is, the feature's relative significance in the SVM's decision-making process. Features F6 through F10 have the highest mean SHAP values across all fault conditions, indicating their strong influence on the model's predictions, whereas F1 has the smallest effect, as evidenced by its lowest mean SHAP value. The larger SHAP values of the purple bar segments for F9 and F10 indicate their high significance for the BD class, while the blue segments show that features F6 and F7 have the greatest influence on the HB class. The profile of IRD is comparable to that of HB, with F6 and F7 again particularly significant. Finally, ORD has a fairly uniform feature importance distribution, with a minor emphasis on F8 and F9. This variation in the relative relevance of features between classes points to a complex relationship between features and fault conditions, and it shows how the model can use different information content from the same features to distinguish between the various fault conditions.
Figure 3b shows the feature significance determined by the XAI framework when an LSTM network is used. The influence of each feature on the model's predictive power is assessed; careful examination of the bar chart reveals that certain features are very important for every fault condition. Feature F10, represented by the tallest bar, has the greatest mean SHAP value and thus a significant impact on the model's predictions. Conversely, the model assigns the least weight to attributes such as F1 and F15, which have the lowest SHAP values. One noteworthy finding is the influence of certain attributes on a particular fault condition: for BD, for example, feature F6 has a considerable impact, while under the other fault conditions its influence is quite small. This variation in feature importance indicates that the LSTM network can recognize and exploit distinct features to differentiate between faulty conditions.
Figure 3c highlights feature significance when the GRU is used as the XAI model. Closer examination of the graph reveals remarkable variation in the significance of the attributes across the fault conditions. For instance, F4 and F10 are far more important than the others, especially for the HB class; their large SHAP values suggest that they play a crucial part in the GRU model's predictive ability and may capture characteristics of severe bearing failure conditions. Attributes F1, F12, and F16, on the other hand, score low on the SHAP value scale, indicating poor individual predictive value. Remarkably, the most significant attributes, F4 and F10, remain essential across various defect conditions, potentially serving as fundamental indicators of bearing health; their consistent significance under different conditions may indicate that these features are sensitive to a wide range of fault attributes.
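A minimal sketch of how such per-class importance scores can be computed follows. The model-agnostic KernelExplainer is an assumption (the paper does not state which SHAP explainer was used), and the stand-in data merely make the sketch runnable.

```python
# Sketch of SHAP-based feature ranking of the kind shown in Figure 3.
import numpy as np
import shap
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))          # stand-in for the 64 HOG feature vectors
y = rng.integers(0, 4, size=64)        # stand-in labels for HB/BD/IRD/ORD

model = SVC(probability=True).fit(X, y)
explainer = shap.KernelExplainer(model.predict_proba, shap.sample(X, 32))
sv = explainer.shap_values(X)          # per-class attributions

# Normalize across shap versions to shape (classes, samples, features).
sv = np.stack(sv) if isinstance(sv, list) else np.moveaxis(sv, -1, 0)
importance = np.abs(sv).mean(axis=1)   # mean |SHAP| per class and feature
ranking = np.argsort(-importance.sum(axis=0))
selected = ranking[:8]                 # e.g., keep the top-ranked features
```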
To analyze the effect of the SHAP-selected features on correctly identifying the various bearing fault conditions, all three models were evaluated under four prediction conditions: default hyperparameters, SHAP-selected features, SSA optimization-based hyperparameter tuning, and combined SHAP feature selection with SSA optimization.
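Reusing X, y, and selected from the previous sketch, the train/tenfold-CV metric comparison can be sketched with scikit-learn's cross-validation utilities; macro averaging over the four classes is an assumption, since the paper does not state the averaging scheme.

```python
# Sketch of the evaluation protocol: training metrics plus tenfold CV
# accuracy, precision, recall, and F1 (macro averaging assumed).
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

scoring = {"accuracy": "accuracy", "precision": "precision_macro",
           "recall": "recall_macro", "f1": "f1_macro"}

res = cross_validate(SVC(), X[:, selected], y, cv=10,
                     scoring=scoring, return_train_score=True)
for m in scoring:
    print(f"{m}: train={res['train_' + m].mean():.3f}, "
          f"cv={res['test_' + m].mean():.3f}")
```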
Figure 4a–d illustrate the prediction of bearing fault conditions using the SVM model. As observed in Figure 4a, the SVM performed reasonably well with default parameters, achieving a training accuracy of nearly 95% and a tenfold CV accuracy of about 80%. The tenfold CV precision was close to 60%, while the training precision was about 85%; the training recall was almost 80% and the tenfold CV recall 50%, and the training and tenfold CV F1 scores were roughly 80% and 55%, respectively. Figure 4b shows an apparent decrease in all metrics when the SHAP-selected features were used. In contrast, the SVM's notable improvements when SSA was integrated for hyperparameter tuning highlighted the usefulness of SSA in improving model generalization, which was especially evident in the tenfold CV scores: the tenfold CV accuracy improved to roughly 85% while the training accuracy stayed high at roughly 95%, tenfold CV precision increased to 70% while training precision grew to almost 90%, and F1 scores reached 85% in training and 65% in tenfold CV, with recall rates of 80% (training) and 60% (tenfold CV), as seen in Figure 4c. Combining SSA optimization with SHAP feature selection produced the results shown in Figure 4d. With a training accuracy of 95% and a tenfold CV accuracy of 85%, the combined approach matched the results obtained with SSA optimization alone; however, the tenfold CV precision was marginally lower, at roughly 65%. The tenfold CV recall was 55%, the training recall 75%, and the training and tenfold CV F1 scores 80% and 60%, respectively. Thus, although SSA optimization performed well on its own, integrating it with the SHAP-selected features did not improve the tenfold CV F1 score. The SVM's modest prediction results suggest that the near-identical feature significance across the four fault conditions leaves the features insufficiently distinctive to increase the model's predictive accuracy under tenfold CV.
To evaluate the LSTM model's predictive capacity and generalizability, the assessment metrics were examined; the prediction results are shown in Figure 5. Effectiveness was first measured with the default training parameters: the LSTM achieved a training accuracy of 85.7%, with precision, recall, and F1 score closely matched at around 0.83. The model performed significantly better under tenfold CV, where accuracy increased to 92.9%, precision to 0.944, recall to 0.940, and the F1 score to 0.942, as shown in Figure 5a. This improvement in the CV measures relative to the training metrics indicates strong generalizability beyond the training dataset. After the SHAP-selected features were included, the refined feature set raised the training accuracy to 87.5% and increased precision, recall, and F1 score to 0.847, 0.836, and 0.842, respectively (Figure 5b). The tenfold CV measures improved markedly, reaching 94.6% accuracy, 0.951 precision, 0.961 recall, and a 0.956 F1 score, indicating effective selection of the predictive features. The notable increase in recall and F1 score during CV suggests an improved ability to correctly identify positive instances across the data folds. SSA optimization fine-tuned the LSTM's hyperparameters, yielding a training accuracy of 89.3% with corresponding improvements in precision, recall, and F1 score, the last reaching 0.857. The tenfold CV results were nearly identical to those of the SHAP-selected configuration, with recall at 0.961 and accuracy, precision, and F1 score at 94.6%, 0.955, and 0.958, respectively, as shown in Figure 5c. This suggests a balanced optimization of the model parameters, leading to consistent and trustworthy predictions across the data subsets. Combining SSA optimization with the SHAP-selected features yielded the best predictions, as shown in Figure 5d: training accuracy reached 91.1%, with precision at 0.877, recall at 0.866, and an F1 score of 0.871, while under tenfold CV accuracy increased to 96.4%, precision reached 0.964, recall peaked at 0.982, and the F1 score reached 0.973. These findings demonstrate the model's strong performance in identifying true positive cases while preserving a high level of accuracy.
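For reference, a minimal sketch of the recurrent classifiers follows. The layer sizes, optimizer, and treatment of the 16 features as a length-16 sequence are assumptions; the paper does not report the exact architectures.

```python
# Minimal sketch of the LSTM/GRU sequence classifiers compared above.
import tensorflow as tf

def build_model(cell="lstm", units=64, n_features=16, n_classes=4):
    RNN = tf.keras.layers.LSTM if cell == "lstm" else tf.keras.layers.GRU
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features, 1)),  # features as a sequence
        RNN(units),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage: reshape X to (n_samples, 16, 1); pass cell="gru" for the GRU model.
# build_model("lstm").fit(X[..., None], y, epochs=100, verbose=0)
```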
The GRU model was also assessed for diagnosing bearing faults with XAI-based feature selection and metaheuristic optimization; Figure 6 presents the prediction results. With default training settings, the model achieved 85.7% training accuracy, 0.830 precision, 0.827 recall, and a 0.829 F1 score, a configuration that serves as the benchmark for evaluating the subsequent optimizations. Notably, all measures improved considerably under tenfold CV, with accuracy increasing to 94.6%, precision to 0.955, recall to 0.961, and the F1 score to 0.958, as observed in Figure 6a. These findings demonstrate the GRU model's strong generalization in a more demanding and diverse testing setting. Using SHAP to select features increased all training performance measures of the GRU model: accuracy reached 89.3%, precision 0.913, recall 0.857, and the F1 score 0.884. The improvement also appeared in the tenfold CV metrics, where accuracy was 96.4%, precision 0.964, recall peaked at 0.982, and the F1 score was 0.973, highlighting the critical role of feature selection in enhancing model adaptability, as shown in Figure 6b. The GRU model's hyperparameters were further optimized with SSA, a metaheuristic inspired by the swarming behavior of salps. The resulting training measures (89.3% accuracy, 0.912 precision, 0.857 recall, and a 0.884 F1 score) were comparable to the SHAP-enhanced model, but the tenfold CV results were better: 98.2% accuracy, 0.981 precision, 0.991 recall, and a 0.986 F1 score. These figures provide compelling evidence of the effectiveness of SSA in tuning the model toward near-perfect recall and extraordinarily high CV precision, as observed in Figure 6c. Finally, the integration of SSA with SHAP feature selection, a strategy intended to capitalize on the advantages of both approaches, produced the best training outcomes, with an F1 score of 0.898, recall of 0.866, accuracy of 91.1%, and precision of 0.931. The tenfold CV results were especially remarkable, with recall at 0.991 and accuracy, precision, and F1 score all at 0.982; the model performs almost perfectly on the recall statistic, as can be observed in Figure 6d. These numbers show that careful hyperparameter tuning combined with deliberate feature selection yields a highly predictive model. The tenfold CV results indicate that the hybrid technique of SHAP feature selection combined with SSA optimization produces a GRU model with improved diagnostic accuracy and generalization; the gains in the performance measures demonstrate its ability to predict bearing fault conditions with high accuracy, consistency, and reliability across the data folds. Given this performance, the GRU model is well suited to detecting bearing faults and could also be deployed in monitoring and predictive maintenance systems across a range of industries. Comparing the three models, the SVM starts from a lower baseline and never reaches the upper limits of the LSTM and GRU models, despite showing the largest relative gains from the optimization procedures. Both GRU and LSTM perform strongly, with GRU outperforming LSTM by a small margin in the tenfold CV scores. SSA optimization notably yields the best fault prediction results for the GRU model, demonstrating both its remarkable generalization capacity and its suitability for deployment in real-world fault diagnosis systems. Thus, although all models benefit from optimization, the GRU model, especially with SSA optimization, performs best in bearing fault prediction, offering a strong combination of accuracy and detection ability, as shown by the precision, recall, and F1 scores. This establishes the SSA-optimized GRU as the best predictive model for diagnosing bearing faults under the proposed methodology.
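A minimal sketch of SSA-based hyperparameter tuning is given below, following the standard salp swarm update rules (the leader salp moves around the food source, the followers form an averaging chain). The search space and fitness function are hypothetical, and vectorizing the follower update is a simplification of the original sequential formulation.

```python
# Sketch of the salp swarm algorithm (SSA) for hyperparameter tuning,
# after Mirjalili et al.'s update rules. Bounds and fitness are assumptions.
import numpy as np

def ssa_minimize(fitness, lb, ub, n_salps=10, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    X = rng.uniform(lb, ub, size=(n_salps, lb.size))  # salp positions
    F_cost, F = np.inf, X[0].copy()                   # food source = best so far
    for l in range(1, n_iter + 1):
        costs = np.array([fitness(x) for x in X])
        if costs.min() < F_cost:
            F_cost, F = costs.min(), X[costs.argmin()].copy()
        c1 = 2 * np.exp(-(4 * l / n_iter) ** 2)       # exploration decay
        for j in range(lb.size):                      # leader salp update
            c2, c3 = rng.random(), rng.random()
            step = c1 * ((ub[j] - lb[j]) * c2 + lb[j])
            X[0, j] = F[j] + step if c3 >= 0.5 else F[j] - step
        X[1:] = (X[1:] + X[:-1]) / 2                  # follower chain
        X = np.clip(X, lb, ub)
    return F, F_cost

# Hypothetical fitness: negative mean tenfold CV accuracy of a GRU trained
# with learning rate x[0] and round(x[1]) hidden units.
# best, _ = ssa_minimize(lambda x: -cv_accuracy(lr=x[0], units=round(x[1])),
#                        lb=[1e-4, 16], ub=[1e-1, 128])
```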
The confusion matrices in Table 3a–h illustrate how well the SVM model performs at detecting bearing faults. Matrices are displayed for the training and tenfold cross-validation outcomes in the four experimental configurations. True positives for ORD are consistently high during training, indicating the SVM's ability to detect this fault; however, there are many false negatives for BD and IRD, suggesting that these particular fault conditions are hard to identify. Under tenfold cross-validation, the default parameter configuration shows a noticeable increase in false positives for ORD, indicating a tendency to mislabel other fault conditions as ORD. Including the SHAP-selected features yields marginal gains in BD identification, despite some loss of ORD precision. SSA improves IRD identification but raises ORD false positives. The hybrid approach enhances BD and IRD identification while maintaining high true positives for ORD, albeit with a modest increase in ORD misclassifications. This examination highlights the strengths and weaknesses of the SVM in fault classification and emphasizes the importance of optimization and fine-tuning to strike a balance between sensitivity and specificity across the various bearing fault conditions.
Table 4a–h shows the confusion matrices, providing a detailed comparison of the LSTM network's predictions under the different parameter configurations. As the training result matrices show, the LSTM exhibits a strong capacity for accurately identifying ORD across all configurations, with remarkably high true positives. With the default settings there is considerable confusion between BD and HB, and between IRD and ORD. However, applying SSA and SHAP feature selection, both separately and together, significantly improves the LSTM's accuracy in identifying BD and IRD, as evidenced by the decrease in the off-diagonal elements of the corresponding confusion matrices; this is especially true when SSA and SHAP are applied simultaneously. Across all configurations, ORD identification remains reliable, with only a small percentage of instances incorrectly categorized as IRD or BD. These matrices show how well the LSTM detects faults and how much SHAP feature selection and SSA optimization improve its predictions, particularly under the strict cross-validation tests that probe generalization to unseen data.
Table 5a–h presents the confusion matrices illustrating the GRU model's predictions for the bearing fault conditions. The default parameter configuration shows some misclassifications between BD and HB and between IRD and ORD, which improve noticeably when the SHAP-selected features and SSA are integrated, either separately or in combination. These improvements result in fewer false negatives and better diagnosis of BD and IRD defects. In the tenfold cross-validation results, the GRU network, especially when enhanced with SHAP and SSA, shows remarkable performance, with perfect identification of BD and a significant increase in identification accuracy for IRD and ORD.
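Confusion matrices for both phases can be obtained from in-sample and out-of-fold predictions, as sketched below; this reuses X, y, and selected from the earlier sketches, and the SVC stands in for whichever classifier is being evaluated.

```python
# Sketch of producing training and tenfold-CV confusion matrices
# (rows: true class, columns: predicted class; class order follows the labels).
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X_sel = X[:, selected]                               # SHAP-selected features
cm_train = confusion_matrix(y, SVC().fit(X_sel, y).predict(X_sel))

y_cv = cross_val_predict(SVC(), X_sel, y, cv=10)     # out-of-fold predictions
cm_cv = confusion_matrix(y, y_cv)
```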