#### **3. Results**

The feature engineering approaches presented above are now compared with each other. Specifically, the three feature sets listed in Table 1 are considered.


**Table 1.** Feature engineering approaches used.

The first feature set is denoted afb(8). It uses no information from past measurements and therefore serves as the reference. The second feature set combines afb(8) with the rollingmeans(10, 80, 600) approach. The third feature set combines the reference features afb(8) with the cumsum approach.
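As an illustration of how the three feature sets could be assembled from a table of base features, consider the following minimal sketch. It assumes that rollingmeans(10, 80, 600) denotes trailing rolling means over the last 10, 80 and 600 measurements and that cumsum denotes the running sum of each base feature over the test run. The function names, the column naming scheme and the handling of the first samples (shorter history instead of missing values) are assumptions made for this example, not the implementation used in the study.

```python
from itertools import accumulate


def rolling_mean(values, window):
    """Trailing rolling mean; early samples use the shorter available history."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the sample that has just left the window.
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


def cumulative_sum(values):
    """Running sum of a feature from the start of the test run."""
    return list(accumulate(values))


def build_feature_sets(base, windows=(10, 80, 600)):
    """Build the three feature sets from base features.

    base: dict mapping a feature name (e.g. one averaged frequency band)
          to its list of values over time (hypothetical layout).
    """
    # Set 1: reference, base features only (no past information).
    reference = dict(base)

    # Set 2: base features plus rolling means with staggered window widths.
    with_rolling = dict(base)
    for name, series in base.items():
        for w in windows:
            with_rolling[f"{name}_rm{w}"] = rolling_mean(series, w)

    # Set 3: base features plus their cumulative sums.
    with_cumsum = dict(base)
    for name, series in base.items():
        with_cumsum[f"{name}_cumsum"] = cumulative_sum(series)

    return reference, with_rolling, with_cumsum
```

With eight base features, this sketch would yield 8, 32 and 16 feature columns for the three sets, respectively.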

To compare these three feature sets, the workflow shown in Figure 1 is applied with all boundary conditions kept constant. The resulting regression and RUL predictions for a single test data set are shown in Figure 8. The plots on the left show the regression results of the trained machine learning algorithm, and the plots on the right show the RUL prediction derived from them. The MAE and R<sup>2</sup> of the visualized results are given within Figure 8.

**Figure 8.** Results of regression (**a**,**c**,**e**) and RUL prediction (**b**,**d**,**f**) for the cross-validation run of test bearing No. 1 with the three different feature sets.

When only the afb(8) features are used, the prediction scatters strongly, see Figure 8a. Several states can be identified in the predicted label, between which the prediction changes quite abruptly. For an ideal prediction, all prediction points (blue) would lie exactly on the reference line, so the vertical distance of a prediction from the reference line shows how inaccurate it is. The same applies to the RUL plots: with an optimal prediction, the test data points would coincide with the orange line, which represents the true RUL. Because of the large spread of the regression results, the RUL prediction based on the afb(8) features is very inaccurate and represents the true RUL poorly, as can be seen in Figure 8b.

Adding the rolling means improves the prediction quality significantly, as shown in Figure 8c,d. On average, the forecast follows similar trends but scatters much less. This is evident both in the predicted label and in the resulting RUL prediction.

A further improvement is achieved by combining the afb(8) and cumsum features, as shown in Figure 8e,f. The steps visible with the other two feature sets disappear almost completely here. These improvements are apparent not only visually but also in the calculated metrics: smaller MAE values and larger R<sup>2</sup> values reflect the improved predictions.
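For reference, the two metrics can be written out as follows, assuming the standard definitions of the mean absolute error and the coefficient of determination. The symbols are introduced here for illustration only: $y_i$ is the true value of sample $i$, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the true values and $n$ the number of samples in the test data set.

$$
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
$$

An ideal prediction gives $\mathrm{MAE} = 0$ and $R^2 = 1$. Under these definitions, a model that always predicts $\bar{y}$ gives $R^2 = 0$, and negative values are possible for predictions that are worse than this constant baseline.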

Since Figure 8 illustrates only one of the nine cross-validation runs, the overall cross-validation results are summarized in Table 2. The table reports the regression MAE and the regression R<sup>2</sup>, each averaged over the cross-validation runs. In addition, the last two rows give the averaged MAE of the RUL prediction and the relative deviation of this MAE with respect to the test run times.


**Table 2.** Results of cross-validated regression and RUL prediction.

The averaged cross-validation metrics support the observations made in Figure 8. Adding the rolling mean features to afb(8) yields a significant improvement, and the cumulative features perform even better than the rolling mean features. The MAE of the RUL for the individual test bearings shows, however, that this ranking of model performance does not hold quantitatively in the same way for every test bearing. Consequently, there is a non-negligible dispersion across the individual test data sets. A possible explanation for this dispersion is the different physical wear behavior in the individual bearing endurance test runs.
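The aggregation of per-run metrics into averaged values can be sketched as follows. The sketch assumes that each cross-validation run holds out exactly one test bearing, and that the relative deviation is the MAE of a run divided by the run time of its test bearing, averaged over all runs. Both points are assumptions for illustration, as are all function names and the dictionary keys; the evaluation code of the study is not reproduced here.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination (standard definition)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def leave_one_out(bearing_ids):
    """Yield (training ids, held-out test id), one split per bearing."""
    for test_id in bearing_ids:
        train_ids = [b for b in bearing_ids if b != test_id]
        yield train_ids, test_id


def summarize_cv(folds):
    """Average the RUL metrics over all cross-validation runs.

    folds: list of dicts with the hypothetical keys
           'rul_true', 'rul_pred' (same time unit) and 'run_time'.
    """
    per_fold_mae = [mae(f["rul_true"], f["rul_pred"]) for f in folds]
    # Relative deviation: MAE of a run divided by that run's total duration.
    per_fold_rel = [m / f["run_time"] for m, f in zip(per_fold_mae, folds)]
    return {
        "mae_mean": mean(per_fold_mae),
        "rel_mae_mean": mean(per_fold_rel),
        "per_fold_mae": per_fold_mae,
    }
```

Keeping the per-run MAE values alongside the averages makes the dispersion between individual test bearings visible, which the averaged values alone would hide.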

#### **4. Discussion**

For the experimental data used, the results show that RUL prediction can be clearly improved by incorporating temporal information through features that take the signal history into account. Because a well-defined workflow is used in which only the feature set is changed, the impact of the different features on RUL prediction performance can be evaluated. A random forest regression approach is used for the RUL prediction. Of the two presented ways of incorporating past information through extended feature sets, the cumulatively generated features perform particularly well: with this extended feature set, the averaged MAE of the RUL predictions is reduced by 37% compared with using the base features only. Rolling means with progressively staggered window widths also improve predictive accuracy, although the results are slightly worse than those obtained with the cumulative approach. In the case presented here, the base features are the so-called averaged frequency bands, which have already been shown in [20] to perform particularly well on the data used. The authors assume that the methodology presented here will improve RUL predictions for other base features in an analogous manner. This hypothesis has yet to be validated.

It should be noted that the evaluations carried out here are based on test data obtained on a rolling bearing test rig under constant operating conditions. Limitations are to be expected when the proposed methodology is applied in a real application with varying boundary conditions such as speeds, loads or temperatures. In particular, cumulative features could be error-prone, since each individual point in time influences the entire subsequent time span. Continuous and reliable measurement data acquisition is therefore indispensable for the correct determination of cumulatively formed features.
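The sensitivity of cumulative features to faulty measurements can be made concrete with a small synthetic example. The sketch below corrupts a single sample of a constant signal and compares the resulting error in a cumulative sum with the error in a trailing rolling mean. The signal, the spike size and the window width are invented for illustration and are not taken from the measurement data; the function names are likewise hypothetical.

```python
from itertools import accumulate


def rolling_mean(values, window):
    """Trailing rolling mean; early samples use the shorter available history."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


def outlier_effect(n=50, bad_index=10, spike=100.0, window=5):
    """Error caused by one corrupted sample in cumsum vs. rolling mean."""
    clean = [1.0] * n            # synthetic, constant feature value
    faulty = list(clean)
    faulty[bad_index] += spike   # single corrupted measurement

    cum_err = [f - c for f, c in zip(accumulate(faulty), accumulate(clean))]
    rm_err = [
        f - c
        for f, c in zip(rolling_mean(faulty, window), rolling_mean(clean, window))
    ]
    return cum_err, rm_err


cum_err, rm_err = outlier_effect()
print(cum_err[-1])  # error still present at the end of the run
print(rm_err[-1])   # error gone once the spike has left the window
```

In this example the cumulative sum carries the full offset of the corrupted sample from its occurrence to the end of the run, whereas the rolling mean is only affected for as long as the sample lies within the window.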

Future work can investigate further feature engineering approaches and further ways of considering temporal information. Other RUL prediction methods, including deep learning algorithms, have been omitted here in order to focus on integrating temporal information via extended feature engineering. For comparison purposes, it seems reasonable to also consider deep learning methods such as CNNs, RNNs or LSTMs, which natively offer the possibility of taking temporal information into account. However, with these methods, the comprehensibility of the decision making is lost. Furthermore, with regard to hybrid models, it seems promising to motivate the development of novel features by the underlying physics. The investigations should also be extended to additional data recorded under non-constant bearing operating conditions. To achieve satisfactory RUL prediction results under such conditions, the methods may have to be extended.

**Author Contributions:** Conceptualization, C.B.; methodology, C.B., A.V. and M.K.; software, C.B.; validation, E.K. and A.V.; formal analysis, E.K. and A.V.; investigation, C.B.; resources, C.B., E.K., A.V. and M.K.; data curation, C.B., A.V. and M.K.; writing—original draft preparation, C.B.; writing—review and editing, E.K., A.V. and M.K.; visualization, C.B.; supervision, E.K.; project administration, M.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** We acknowledge support by the Deutsche Forschungsgemeinschaft (DFG–German Research Foundation) and the Open Access Publishing Fund of Technical University of Darmstadt.

**Data Availability Statement:** The measurement data presented in this paper are available on request from the corresponding author after prior approval by Robert Bosch GmbH.

**Acknowledgments:** The authors would like to thank Robert Bosch GmbH for enabling the studies presented in this paper, including the measurement data used, to be carried out at Corporate Research of Robert Bosch GmbH.

**Conflicts of Interest:** The authors declare no conflict of interest.
