1. Introduction
Prognostics and Health Management (PHM) is an integrated technology aimed at enhancing equipment reliability and operational efficiency. One of the key technologies in PHM is to predict the operational status of equipment to minimize losses caused by failures, ultimately to achieve the purpose of predictive maintenance. Rolling bearings serve as critical basic components in mechanical systems, its application covers the automotive industry, rail transportation, agricultural machinery, national defence and military industry and other fields, to maintain the health of mechanical equipment, smooth operation plays a key role. Rolling bearings serve for a long time in extreme working conditions and harsh environments, due to fatigue, wear and tear, corrosion, overload and other factors will lead to performance degradation, which will cause a series of failures, and ultimately lead to safety accidents and property damage [
1]. Most of the multiple problems mentioned above are related to the condition of the bearings themselves, which are subject to increasing friction in the course of their work, leading to fatigue, wear and other forms of failure; When there is a decline in the tightness of the bearings or there are process problems in the manufacturing process of the bearings, resulting in the emergence of radial internal clearance, which is also the reason why the bearings will have the above problems; Also, insufficient lubrication problems can lead to corrosion and overloading [
2,
3,
4]. RUL prediction is used to estimate the normal use time of mechanical equipment or its key components in the current operating state, which can not only summarise the operating state of the equipment, but also help to optimise the subsequent management decisions, RUL prediction as the core technology of PHM has the role of the beginning and the end. Therefore, accurate and efficient RUL prediction is of great significance in analyzing bearing degradation trends, optimizing operation and maintenance decisions, reducing equipment failures, and improving equipment utilization, as well as helping to reduce economic losses and avoid accidents [
5,
6].
Existing principal methods for bearing RUL prediction are primarily classified into two categories [
7,
8]: RUL prediction approaches driven by model and RUL prediction approaches. Model-driven RUL prediction methods primarily rely on mathematical and physical models to describe the failure and degradation mechanisms of systems or equipment, so as to realize RUL prediction.Model-driven RUL-based prediction methods can still provide high-precision prediction results when the model is constructed stably and the parameters are determined reasonably well, but the lack of an in-depth understanding of the internal mechanism of the system will lead to a model that is not accurate enough or difficult to be applied in practice.Data-driven RUL-based prediction methods, on the other hand, rely on degraded data, do not require in-depth knowledge of the physical or engineering principles of the system, and can automatically extract features and learn them with the help of machine learning, which avoids the problem of degraded modelling and reduces human intervention and subjectivity, and is therefore increasingly being used [
9].
Data-driven based approaches are utilizing historical degradation data to train network models for RUL prediction.In recent years, with the innovation of sensor technology and the development of big data processing technology, Deep learning-based methods for RUL prediction have increasingly emerged as a prominent area of research. Ren et al. [
2] developed a new feature extraction technique leveraging deep convolutional neural networks to extract degenerate features and used them for RUL prediction, and achieved good prediction accuracy. Wang et al. [
10] utilized a convolutional autoencoder to construct a Health Indicator (HI) as a label for the bearing vibration data, and then input the labeled data into a one-dimensional CNN network, which ultimately realized the bearing RUL prediction. Cao et al. [
11] introduced a novel deep learning model that integrates the residual attention mechanism with temporal convolutional networks, and proved through experimental validation that the model has better performance in long-term prediction tasks in RUL prediction. Huang [
12] utilized a deep learning approach focusing on the spatio-temporal features of the data and introduced A global attention mechanism for weighting features, and finally used a multi-branch CNN for RUL prediction. Xie et al. [
13] used expanded CNN to extract multi-scale spatio-temporal features from normalized signals, and then used CAM for weighting features, and finally achieved the prediction of bearing RUL. Marco et al. [
14] utilized instantaneous spectral entropy (ISE) and continuous wavelet transform (CWT) to monitor the health status of rotating machinery, and provided important theoretical support for the unsubsequent RUL prediction.
Most of the current studies use a single attention mechanism. Channel attention only emphasizes the significance of various channels, while spatial attention only highlights the significance of different spatial locations, if a single attention mechanism is used it will ignore the important features in the other dimension, resulting in incomplete feature selection. The Convolutional Block Attention Module (CBAM) is an attention mechanism designed to improve Convolutional Neural Network (CNN) performance. CBAM improves feature representation by adding an attention module behind the convolutional block, which enables the model to adaptively focus on important information in the input feature map. He et al. [
15] proposed a bearing RUL prediction model based on multiscale CNN, feature extraction is performed by multiscale CNN, and CBAM is embedded in the CNN structure to realize the allocation of the importance of different features, and finally the RUL prediction of bearings is realized on the bearing dataset. Wang et al. [
16] proposed a bearing RUL prediction model based on CBAM and temporal convolutional network, where temporal convolutional network is used for feature extraction and CBAM is used for adaptive weight assignment of different features, and finally the model is verified to have good generalization on the public dataset. Yu et al. [
17] first added the improved Inception V1 module to the CNN structure to enhance the CNN’s capability for comprehensive feature extraction, then added the CBAM module to the CNN structure to weight the features extracted by the CNN, and experimentally confirmed the proposed model’s validity. Zhang [
18] and others used continuous wavelet transform (CWT) to transform the data dimensions, and then used CBAM and ResNet to construct the bearing RUL prediction model, and finally successfully realized the bearing RUL prediction under two working conditions. Xiang et al. [
19] proposed a dynamic residual network (Dy-ResNet) combined with CBAM for bearing RUL prediction on the basis of ResNet network, and the experiments proved that Dy-ResNet is much better than ResNet prediction.
To this end, this paper proposes a residual life prediction method based on CBAM, which first uses FFT to perform data preprocessing on the original vibration signal data of rolling bearings, and inputs the processed signal data into CNN to extract features, aiming at extracting the deep features in the signal data that contain rich degradation information. Then CBAM is used to assign attention weights to the features, focusing on the key feature information and ignoring the influence of unimportant features. The channel attention mechanism in CBAM can enhance the expression of channel feature information, while the spatial attention mechanism focuses on the representation of important spatial feature information in local areas, which are complementary and mutually advantageous.Finally, the long-term dependency of the time-series data is learned through LSTM network, which enables the model to perform RUL prediction more accurately. The proposed model makes up for the lack of close connection of channel and spatial features in the field of bearing RUL prediction.The CBAM attention mechanism module is used in combination with the CNN-LSTM network model and has shown excellent performance in a wide range of application scenarios such as in image analysis and processing, fault diagnosis, life expectancy prediction, and medical disease diagnosis.Compared to the RUL prediction models proposed in literature [
20] and literature [
21], the CBAM-CNN-LSTM model has better generalization and better prediction performance. Comparing to the literature [
20], where the models in that literature were evaluated using MSE for prediction performance, the MSE of the model in this paper is reduced by 12.28% and 10.34%. Comparing with the literature [
21], which uses MSE, MAE, and RMSE for prediction performance evaluation, the model in this paper has 53%, 9.75%, and 31.68% reduction in the three metrics, respectively.
The paper is organized as follows:
Section 2 provides the theoretical background.
Section 3 gives the network structure of the bearing RUL prediction method and the process of bearing RUL prediction.
Section 4 performs experimental validation to verify the feasibility and effectiveness of the proposed model on a publicly available bearing dataset. Conclusion of the proposed method is given in
Section 5.
3. Bearing RUL Prediction Methods and Processes
3.1. Network Model Structure
The RUL prediction method proposed in this paper combines the advantages of CNN, LSTM, and CBAM, and gives full play to their advantages on the basis of rational arrangement of the structure of each module, and focuses the used resources on improving the RUL accuracy and generalization of the target task. The network model structure of the method is shown in
Figure 6.
Three convolutional layers of CNN are used for feature extraction of the FFT-processed frequency-domain amplitude signals, while two maximum pooling layers are used for feature dimensionality reduction in order to reduce the model computation and improve the model RUL prediction accuracy.
The valid features extracted using CNN are used as inputs to the CBAM module inserted in front of the third maximum pooling layer, which successively uses CAM and SAM to select valid information for the features, and subsequently assigns weights to the information of varying importance.
The LSTM layer captures the long-term dependencies contained in the significant features after weighted assignment. Finally, local feature fusion is performed through a single fully-connected layer to complete the label mapping from degraded features to RUL and output a predicted value.
In order to improve the network training speed and mitigate overfitting, a batch normalization layer is introduced between the model convolution layer and and the maximum pooling layer. Batch normalization reduces the sensitivity of the model to data distribution and increases the generalization ability of the model. The batch normalization for batch input data
is calculated as:
where:
is the mean of the batch input data;
is the variance of the batch input data;
is the normalized input;
is the output after scale transformation and bias;
and
are the parameters that need to be learned when the network is trained.
3.2. Rul Prediction Process
Figure 7 illustrates the RUL prediction flowchart for the proposed method and the specific steps of the process are given below:
Step 1: RUL prediction was performed using the publicly available dataset PHM2012. The acquired raw vibration signal data of rolling bearings are subjected to Fast Fourier Transform to get the frequency domain amplitude signal for the purpose of data preprocessing. And the obtained dataset is divided into training set and test set.
Step 2: The RUL prediction used in this paper is an “end-to-end” prediction method, which is a RUL prediction method that directly inputs data from the input end to the output end to get the prediction result, and defines the range of RUL label values as (0, 1). The CBAM-CNN-LSTM network model is constructed using CNN, LSTM, and CBAM modules, and the ECA-CNN-LSTM network model is constructed to compare with the model proposed in this paper.
Step 3: The network model of the proposed method is trained using the training set of bearings under three different operating conditions and the model parameters are adjusted simultaneously to achieve the best RUL prediction. The model hyperparameters are tuned using Adam optimizer and the forward propagation algorithm is used to adjust the network model weights to arrive at a loss function value that minimizes the error between the predicted and true values.
Step 4: The trained model is tested using the selected bearing test set and the prediction results are obtained, the model prediction effect is evaluated using performance evaluation metrics, and a model comparison is made with the comparison model to conclude the superiority and effectiveness of the model in this paper.
4. Experimental Validation
4.1. Introduction to the Dataset
In this paper, we use the experimental dataset from the PHM Data Challenge held in 2012, which was acquired by the PRONOSTIA test rig under different loading conditions. The PRONOSTIA test rig is shown in
Figure 8. The vibration signals were acquired from the horizontal direction and the vertical direction of the PRONOSTIA test rig, respectively [
32]. During the signal acquisition process, the data acquisition was carried out every 10 s with a sampling frequency of 25.6 kHz and a sampling time of 0.1 s. The PHM2012 dataset was initially used in the RUL Prediction Challenge, the association organizing the competition has set up three working conditions the first two bearings as the training set and the rest as the test set, the training set is characterized by a small amount of data and does not give the type of faults that occurred, which makes it difficult to perform an accurate RUL. At the same time, this dataset includes multiple bearing data under various operating conditions, the bearing data is large in scale, the data is a complete full-life degradation cycle data, and the data is of high quality, so we chose the PHM2012 dataset for the prediction model training.
The test dataset contains the full life cycle degradation data of 17 rolling bearings in three different working conditions, seven sets of data for bearings 1-1 to 1-7 in working condition one, seven sets of data for bearings 2-1 to 2-7 in working condition two, and three sets of data for bearings 3-1 to 3-3 in working condition three. The test uses six sets of bearing data across varying operating conditions as the training set, that is, bearings 1-1, 1-2, 2-1, 2-1, 3-1, 3-2, and bearings 1-3, 1-4, 1-5, 1-7, 2-3, 2-6 as the test set.
Table 1 below shows the test data sets for the bearings.
Rolling bearings in the operation of the process there are two different failure modes, namely, the performance of gradual degradation of the failure mode, the performance of sudden failure mode. Performance gradual degradation failure mode is characterized by the bearing vibration amplitude over time will continue to grow, but its overall degradation trend is relatively slow; performance sudden failure mode is characterized by the bearing in the early part of the operating state is relatively stable, but the later part of the amplitude of its vibration will be drastically changed. According to
Figure 9a,b bearing 1-3 is a typical performance degradation failure mode, and bearing 2-6 is a typical performance sudden failure mode. In order to visualize the two failure modes of rolling bearings, the proposed method can simultaneously predict the RUL of rolling bearings under two failure modes, bearing 1-3 and bearing 2-6 are selected from the test set for experimental analysis.
4.2. Data Preprocessing
Original vibration signals from rolling bearings contain extensive performance degradation information, but the larger data dimension will affect the subsequent RUL prediction work, and data preprocessing of bearing data is needed. And FFT is widely used in the field of data signal processing and other fields, and can efficiently complete the processing task [
33]. Therefore, FFT was performed on 12 rolling bearings for 3 conditions to convert the time domain magnitude signals into frequency domain magnitude signals. Since the data collected in the horizontal direction of the test dataset contains more valid information, the subsequent tests are trained and tested using vibration data in the horizontal direction. Taking the sampling data in the horizontal direction within the first time of 0.1 s for bearing 1-1 as an example, the original time-domain amplitude signal is shown in
Figure 10a, and the frequency-domain amplitude signal after FFT is shown in
Figure 10b.
4.3. Evaluation Indicators
In order to accurately assess the validity of the performance of the proposed RUL prediction model, MSE, RMSE, MAE, and coefficient of determination are often used as model performance evaluation indexes. MSE, RMSE and MAE, as classical evaluation indexes, the closer their values are to 0, the better the predictive ability of the model and the higher the prediction accuracy. And both of them measure the size of the deviation between the true value and the predicted value, but MSE, RMSE is more sensitive to the outliers, which can reflect the performance of the model to the extreme values, and MAE focuses on the overall error level of the model, which can reflect the average prediction accuracy of the model. However, MSE, RMSE and MAE alone may not effectively indicate the quality of the model, introducing is essential to measure the fit between predicted and true values, advantage lies in the model for the interpretation of the data, value range is [0,1], A higher value, closer to 1, indicates better model fit, with predicted values more closely aligning with the true values.
In the face of complex models or different application scenarios, a single performance evaluation metric cannot help us fully understand the performance of a model. Therefore, by choosing three performance evaluation metrics, namely, MSERMSE, MAE, and , to evaluate a prediction model, this allows for a more thorough and precise evaluation of model performance and ensures the model’s effectiveness and robustness across various real-world applications.
The
MSE,
RMSE,
MAE, and
formulas are:
where:
N represents the number of bearing samples,
,
,
is the actual value, predicted value, and mean value corresponding to the first sample.
4.4. Parameter Setting
Aside from the CBAM-CNN-LSTM model presented in this paper, CNN-LSTM, and an ECA-CNN-LSTM model that combines a channel attention mechanism (ECA) and CNN-LSTM are also selected as comparison models.
Table 2 below shows the main parameters of the CBAM-CNN-LSTM model.
In addition, all models utilize MSE as the loss function, with the Adam optimizer and a learning rate of 0.001, a batch size of 128, and a number of iterations of 100. The experimental environments are torch1.13 and Python3.4, and the models are trained using only the CPU, which is model i5-8265U (Intel Corporation, CA, USA), and the RAM is 8 GB.
When we set the size of the convolution kernel, the first layer kernel size is set to be larger than 32 in order to avoid missing feature information, and then the kernel size of the next two layers is set to be smaller, which is also in line with the principle of CNN layer-by-layer convolution for extracting important local features, and therefore the kernel size of the last two layers is set to be 10. Such a size setup can improve the robustness and prediction performance of the model. The selection of the step size is the same as the kernel size selection, the larger step size 2 is selected first for data dimensionality reduction, and the subsequent step size is set to 1 aiming at retaining the local feature information, Secondly, the kernel size and step size should match the input data, so no arbitrary changes can be made. For the setting of the number of CNN convolutional layers, multi-layer convolutional operation can indeed extract deeper feature information, but will inevitably bring problems such as overfitting, therefore, choosing the appropriate number of 3-layer convolutional layers not only improves the model prediction performance, but also prevents overfitting and reduces the waste of computational resources. In order to verify our theory, bearings 1-3, the performance evaluation index
, were selected for the sensitivity analysis of the number of convolutional layers of the CNN in
Table 2, When the convolutional layer is 3 to 6 layers,
is 0.943, 0.924, 0.908, 0.9431 respectively, and the experimental study proved that the selected parameters have been After repeated experimental comparisons and integrating the model prediction effect and the time used, the optimal parameters were selected.
4.5. Analysis of the Test Results
Comparing the proposed method with the introduced CNN-LSTM and ECA-CNN-LSTM models, it can be seen from
Table 3 that the prediction performance of the CNN-LSTM model without CBAM decreases more significantly, which proves that the CBAM module, which can be assigned to different channels and spatial feature weights, can make the model pay more attention to the important information, and has a significant effect on the improvement of the model RUL prediction performance has a significant effect. In contrast to the ECA-CNN-LSTM model for single-dimensional features, the CBAM-CNN-LSTM model, because the CBAM module is a mixture of two attention mechanisms in series, allows the model to focus on the spatial structure of the features while also focusing on the information of the local features, which enables the CBAM-CNN-LSTM model to capture degraded data in a more comprehensive way. degradation information in both dimensions, adapting to complex and diverse scenarios.
In this paper, three performance evaluation metrics, MAE, RMSE, and
, are employed to evaluate model performance.
Table 3 lists the three performance comparison metrics of MSE, MAE, RMSE, and
for three different models for the six bearings in the test set. The MSE, MAE and RMSE values of the CBAM-CNN-LSTM model for test bearings 1-3 are the smallest among the three models, which are 0.047, 0.037 and 0.069, respectively. Compared to the MSE, MAE and RMSE values of the ECA-CNN-LSTM model, reductions of 76.85%, 67.54% and 51.74% are observed. The MSE, MAE value and RMSE value of the comparative CNN-LSTM model are reduced by 14.73%, 66.36% and 43.90%; its
value is the highest, 0.943, and the
value of the comparative ECA-CNN-LSTM model and CNN-LSTM model is improved by 19.8% and 13.4%. From
Table 3, it can still be concluded that the MAE, MSE, and RMSE values of the test bearing 2-6 model CBAM-CNN-LSTM are still lower than those of the other comparison models, and its
value remains superior to that of the comparison models.
In addition, in order to verify the superiority of the proposed method in an all-round way, 100 experiments are carried out and averaged for the proposed model and the comparison model, and the experiments show that, although the proposed method is more complicated than the comparison method, the consumed time of 19 min is only slightly higher than that of the two comparison methods of 18 and 16 min, which meets the requirements of real-time prediction and practicability, and the comparison models of ECA-CNN-LSTM and CNN-LSTM, although consuming less time, the prediction effect is far inferior to that of the proposed method.
Table 3 illustrates that the constructed model is not only applicable to bearing RUL prediction for different failure modes, but also capable of more accurate RUL prediction.
Figure 11 and
Figure 12 show the prediction effects of the test bearings on the three models.
Figure 11 illustrates, the predicted RUL values of bearings 1-3 fluctuate up and down in accordance with the true values, and the error with the true values is small, which aligns with the gradual performance degradation typical of the failure mode, and the CBAM-CNN-LSTM model exhibits higher overall prediction accuracy than the other two models. From
Figure 12, the predicted RUL value for bearing 2-6 shows minimal deviation from the actual value, even with fewer data points, and the first half of it fits the real value, and the second half of it deviates from the real value due to its performance of the sudden failure modes; however, the CBAM-CNN-LSTM model consistently outperforms the other two models in prediction performance across all stages.
The network model proposed in this paper is applicable to the RUL prediction of rolling bearings under different degradation modes and can achieve good prediction results. For rolling bearings in the performance degradation failure mode, the degradation process is relatively smooth, so for this degradation mode, the impact on the RUL prediction model belongs to the conventional category, we use CNN, LSTM in the design of the network model can be very good to capture the smooth degradation trend in the data. As for the rolling bearings in the performance sudden failure mode, the degradation process is more complicated, and there are drastic changes in the late degradation stage, compared with the performance gradual degradation mode, there is only a certain amount of degradation information in the features extracted in this stage, so the prediction of the bearing RUL in this mode is more difficult. For such degradation modes, we use a hybrid attention mechanism CBAM module on top of the CNN-LSTM network, where the CAM and SAM in CBAM can be weighted for key features, i.e., the important degradation information present in the emergent phase is weighted with attention. How to carry out the network model design, both can realize the bearing RUL prediction in the two degradation modes.
In order to reflect the superiority of the proposed method, the prediction results of the network model proposed in this paper are compared with the literature [
20] and the literature [
21] using the same dataset and the same training set of bearings, and the results are shown in
Table 4, taking bearings 1-3 and bearings 2-6 as examples.Literature [
20] uses only MSE for performance evaluation, while literature [
21] uses MSE, MAE and RMSE as performance evaluation indexes for prediction models. The comparison shows that the prediction errors of this paper are lower than literature [
20] and literature [
21], and the prediction errors of other test bearings selected in this paper are lower than literature [
20]. For literature [
21], except for bearings 1-7 and bearings 2-3, the prediction errors of the other bearings are lower than that of literature [
21]. Overall, the method proposed in this paper is better than the literature [
20,
21].
5. Conclusions
This paper presents a method for predicting bearing RUL combining convolutional attention mechanism and CNN-LSTM, and draw the following conclusions:
FFT does the data preprocessing to prepare for subsequent feature extraction. CNN performs the feature extraction, and the extracted features are subjected to CBAM for the allocation of attention weights, while LSTM learns the temporal features of the features, ultimately enhancing the RUL prediction accuracy of the bearings and realizes RUL prediction under different failure modes.
The prediction results show that the MSE, MAE and RMSE of this study are reduced by 53%, 9.75%, and 31.68%, respectively, in comparison with the existing bearing RUL prediction methods, which proves the effectiveness and superiority of the proposed method.
The data-driven ToMFIR technique can integrate multi-source information technology, and in future research, this technique can be effectively combined with existing neural network models, which will further improve the performance of RUL prediction for rolling bearings.