#### *2.4. Features and Targets for Machine Learning*

The input and output parameters for machine learning were derived from the calculation models and results and formed the database. The input parameters were structured into the mechanical and geometrical properties of the cage, the loading parameters, and the resulting class of the cage motion (according to Schwarz et al. [21]), see Table 4. Stiffness as a mechanical property is defined by a weighted area moment of inertia and cross-sectional area of the cage. The weighting of the cross-sectional properties in the pocket and in the bar is based on a nonlinear function that, following Schwarz et al. [21], gives disproportionate weight to the area moment of inertia and the cross-sectional area in the cage pocket. The cage mass and the mass moments of inertia complete the mechanical properties. The geometrical properties consist of the pocket shape parameters and the pocket and guidance clearance of the cage. The mechanical and geometrical parameters represent the essential properties that can be derived from a given cage geometry. The axial and radial loads, as well as the torque acting on the inner ring, were defined as input parameters in the form of relative quantities, normalized to the basic static load rating *C*0,r and the pitch diameter *d*p. Thus, the database can be extended with calculation results of other bearing sizes in the future. The cage motion class is represented by one of the basic observable cage motion types, "unstable", "stable", or "circling", and was determined from the simulation results using Quadratic Discriminant Analysis. The motion types differ in their dynamic behavior and can be classified based on their kinematics [34]. The cage motion type can also be predicted with high reliability by the classification algorithm AdaBoostM1 using the input parameters of the simulation [21]. However, the cage motion class describes the dynamics of the cage only on a qualitative level. By extending the prediction with a regression algorithm, the relevant kinematic results can be specified more precisely.


**Table 4.** Features for the machine learning model.

The Cage Dynamics Indicator (CDI) defined by Schwarz et al. [21] contains all parameters necessary for the assessment of the cage dynamics and was used as the target of the regression task. The median (med) and the quantile distance (qd) indicate the distribution of the motion quantities contained in the CDI and were determined from the calculated time series. For the evaluation of the cage motion, the Ω-ratio, the cage coordinates normalized to the guidance or pocket clearance *x̃*c, *ỹ*c, and *z̃*c, the rotational ratio *ñ*c, and the equivalent deformation force *F*e were used.

In addition to the CDI, the output parameters include the median of the frictional torque *T*f, the median of the contact forces on the cage |*F*c|, and the median of the translational acceleration |*a*c| of the cage. In total, the output parameters for the regression algorithm consist of 10 parameters, which can be used to assess the cage dynamics as well as the energy efficiency of the bearing. In previous research papers, the CDI has been used as a key figure to assess the cage motion calculated by dynamics simulation [14,21,34]. In this contribution, machine learning methods are used to predict the CDI in order to assess the cage dynamics accurately.
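As an illustration, the following minimal sketch shows how the median and quantile distance of a motion quantity could be extracted from a simulated time series with NumPy. The quantile levels (25%/75%) and the placeholder signal are assumptions for demonstration; the paper does not specify them in this section.

```python
import numpy as np

def med_qd(series, q_low=0.25, q_high=0.75):
    """Median and quantile distance of a simulated time series.

    The quantile levels are an assumption for illustration; the paper
    does not state them in this section.
    """
    med = np.median(series)
    qd = np.quantile(series, q_high) - np.quantile(series, q_low)
    return med, qd

# Example: a normalized cage coordinate from a dynamics simulation
rng = np.random.default_rng(0)
x_c = rng.normal(loc=0.1, scale=0.05, size=10_000)  # placeholder signal
med, qd = med_qd(x_c)
```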

A strong scatter of the target variables reduces the prediction accuracy of the algorithms. Therefore, anomaly detection was performed for each motion class to identify outliers of the target variables and remove them from the database. A density-based approach developed by Breunig et al. was used for anomaly detection: the local outlier factor (LOF) quantifies the degree of isolation of a data point compared to its immediately neighboring points [35].
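A minimal sketch of this filtering step, using the LOF implementation from scikit-learn, might look as follows; the neighborhood size and the per-class looping details are assumptions, since the paper only states that the density-based method of Breunig et al. was applied per motion class.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def remove_outliers_per_class(Y_targets, motion_class, n_neighbors=20):
    """Return a boolean mask keeping only non-anomalous samples per motion class."""
    Y_targets = np.asarray(Y_targets)
    motion_class = np.asarray(motion_class)
    keep = np.ones(len(Y_targets), dtype=bool)
    for cls in np.unique(motion_class):
        idx = np.flatnonzero(motion_class == cls)
        lof = LocalOutlierFactor(n_neighbors=n_neighbors)
        keep[idx] = lof.fit_predict(Y_targets[idx]) == 1  # -1 marks outliers
    return keep
```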

#### *2.5. Regression Algorithms and Hyperparameter Optimization*

In this paper, the prediction accuracy of three different regression algorithms (Random Forest (RF), XGBoost, and Artificial Neural Networks (ANN)) used to estimate rolling bearing cage dynamics is compared. The hyperparameters of the models were determined by an evolutionary algorithm (EA) as part of an optimization of the prediction accuracy [28].

RF is an ensemble method based on the 'wisdom of the crowd' paradigm. According to this paradigm, a prediction made by a large number of different persons/models achieves better results than the prediction of a single person/model. Accordingly, an RF regressor contains multiple regression trees that learn the regression problem using different subsets of the original training data. These subsets are regenerated by bagging for each regression tree. The degree of randomness is further increased by using only a random selection of features for training the decision trees. These random components (bagging and random feature selection) reduce the model's tendency to overfit the training data [36].
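For illustration, a Random Forest regressor with both sources of randomness could be configured as follows with scikit-learn; the concrete parameter values and data shapes are placeholders for those determined by the EA.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 12))  # 12 input features (placeholder)
Y_train = rng.normal(size=(500, 10))  # 10 output parameters

# Bagging (bootstrap=True) and random feature selection (max_features)
# are the two random components that reduce overfitting.
rf = RandomForestRegressor(
    n_estimators=200,   # number of regression trees in the ensemble
    max_features=0.5,   # random subset of features considered per split
    bootstrap=True,     # each tree is trained on a bagged subset
    random_state=0,
)
rf.fit(X_train, Y_train)  # multi-output targets are supported directly
```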

Gradient boosting is another ensemble method, developed by Friedman [37]. In an iterative process, multiple regression trees are trained, where the training of each tree depends on the predictions and the loss of the trees already contained in the ensemble. One implementation of gradient boosting is XGBoost (extreme gradient boosting) [27], which was used here for predicting the cage dynamics. Since the XGBoost regressor is designed to predict only a single value, one model was trained for each output parameter. As a consequence, interactions between the targets are represented less effectively than with the Random Forest regressor.
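A per-target training loop could look as follows, sketched with the xgboost Python package; the hyperparameter values and data shapes are placeholders for those selected by the EA.

```python
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 12))  # 12 input features (placeholder)
Y_train = rng.normal(size=(500, 10))  # 10 output parameters

# One XGBoost model per output parameter, since the regressor
# predicts only a single value.
models = []
for j in range(Y_train.shape[1]):
    model = XGBRegressor(n_estimators=300, learning_rate=0.1, max_depth=6)
    model.fit(X_train, Y_train[:, j])
    models.append(model)
```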

ANNs are widely used algorithms for classification and regression in the field of machine learning. The input value of a neuron results from the weighted sum of the output values of the neurons in the previous layer and a constant bias value. The neuron's input value is converted into its output by a nonlinear activation function. During training of the ANN, the weights and bias values are optimized so that the relationship between the input values and the output values in the training data is predicted as accurately as possible [38]. For the prediction of the cage dynamics in this paper, an ANN consisting of a total of five layers was trained using the Adam training algorithm [39]. The objective of the optimization procedure is the mean squared error (MSE) between the ANN's predictions and the target values contained in the training data.
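A possible reading of this architecture, sketched with Keras, is one input layer, three hidden layers, and one output layer; the layer widths and ReLU activation are assumptions, while Adam and the MSE loss follow the text.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 12)).astype("float32")  # placeholder features
Y_train = rng.normal(size=(500, 10)).astype("float32")  # 10 regression targets

# Five layers read as input + three hidden + output (an assumption);
# widths and activation are placeholders for EA-optimized values.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(12,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),  # linear output for regression
])
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
model.fit(X_train, Y_train, epochs=10, verbose=0)
```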

For the ML algorithms, hyperparameters such as the ANN's number of neurons per layer need to be specified. With the help of an EA, the hyperparameters were determined such that the prediction accuracy of the models was optimized. The remaining parameters of the models are listed in Appendix A. The EA uses mechanisms of biological evolution such as selection, recombination, and mutation to improve the fitness (a metric for assessing regression results, e.g., the coefficient of determination *R*<sup>2</sup>) of the individuals (sets of hyperparameters) contained in a population for a predefined number of generations, see Figure 4.

Starting from an initial population generated by Latin hypercube sampling, the fitness of each individual is determined. The fitness of the individuals, and thus the target value of the EA, was represented by the mean *R*<sup>2</sup> according to Equation (5). Using K-fold (*K* = 5) cross-validation, a total of *K* validation data sets were generated from the training data for the fitness evaluation. The data set was randomly split, with 85% used for the hyperparameter optimization, including the cross-validation within the EA loop, and 15% reserved for the subsequent testing of the model predictions. The mean *R*<sup>2</sup> was calculated as the arithmetic mean of the *R*<sup>2</sup> values over all validation data sets and target variables. The prediction accuracy for the validation data is an indicator of the generalization capability of the model, which is finally evaluated after training using the test data.

$$\overline{R^2} = \frac{1}{K} \cdot \frac{1}{N} \cdot \sum_{i=1}^{K} \sum_{j=1}^{N} R_{ij}^2 \tag{5}$$

The *R*<sup>2</sup> of each output parameter was calculated by Equation (6) using the predictions of the algorithm $\hat{y}_i$, the target values according to the test data $y_i$, and their arithmetic mean $\overline{y}$. Thus, *R*<sup>2</sup> can reach a maximum value of 1 in the case of an error-free prediction of the algorithm.

$$R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \overline{y})^2} \tag{6}$$
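Equations (5) and (6) translate directly into code. The following sketch evaluates the fitness of a candidate model via 5-fold cross-validation, averaging the per-target *R*<sup>2</sup> values with scikit-learn; the model and data are placeholders.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import r2_score

def mean_r2(model, X, Y, k=5):
    """Fitness according to Eq. (5): R^2 averaged over K folds and N targets."""
    fold_scores = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True,
                                    random_state=0).split(X):
        model.fit(X[train_idx], Y[train_idx])
        Y_hat = model.predict(X[val_idx])
        # multioutput="raw_values" returns one R^2 per target, i.e., Eq. (6)
        fold_scores.append(r2_score(Y[val_idx], Y_hat,
                                    multioutput="raw_values"))
    return float(np.mean(fold_scores))  # mean over the K x N values
```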

After calculating the fitness of the initial population, the evolutionary process consisting of selection, recombination, mutation, and evaluation of fitness was repeated for a given number of generations. Individuals for recombination were selected by stochastic universal sampling, a fitness-proportional method. Each individual received an area on a wheel proportional to its fitness value. By spinning the wheel once, with *n* pointers equally distributed around the circumference, *n* individuals were selected. Recombination was performed in pairs on the selected individuals. The hyperparameter lists of two individuals selected for mating were separated at two points, and the two new individuals were defined by alternately combining the resulting sections, see Figure 4. After recombination, mutation was applied to each parameter contained in an individual by means of a uniformly distributed random variable. Mutation served to introduce new parameter values into the population and was performed with a predefined probability. The individuals produced by recombination and mutation, together with the best individual from the previous population (elite), formed the new population for the following generation.
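The three operators can be sketched as follows; the implementation details (fitness shifting, mutation probability, parameter bounds) are assumptions, since the paper describes the operators only conceptually.

```python
import numpy as np

rng = np.random.default_rng(0)

def sus_select(population, fitness, n):
    """Stochastic universal sampling: one spin, n equally spaced pointers."""
    f = np.asarray(fitness, dtype=float)
    f = f - f.min() + 1e-12            # shift so all wheel areas are positive
    cum = np.cumsum(f)                 # cumulative "wheel" areas
    step = cum[-1] / n
    pointers = rng.uniform(0, step) + step * np.arange(n)
    return [population[i] for i in np.searchsorted(cum, pointers)]

def two_point_crossover(a, b):
    """Separate two hyperparameter lists at two points and swap the middle."""
    p, q = sorted(rng.choice(np.arange(1, len(a)), size=2, replace=False))
    return a[:p] + b[p:q] + a[q:], b[:p] + a[p:q] + b[q:]

def mutate(individual, bounds, p_mut=0.1):
    """Redraw each parameter from its range with probability p_mut."""
    return [rng.uniform(lo, hi) if rng.random() < p_mut else gene
            for gene, (lo, hi) in zip(individual, bounds)]
```

The elite individual of the previous generation would then be appended unchanged to the offspring produced by these operators.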

**Figure 4.** Steps of the EA used for hyperparameter optimization of the regression models.

After the predetermined number of generations, the EA returned the model with the highest fitness, whose prediction accuracy was then assessed using the test data. The parameters controlling the behavior of the EA can be taken from Table 5.

**Table 5.** Parameters of the EA for the optimization of the hyperparameters of the regression algorithms.


Table 6 shows the hyperparameters of the algorithms, which form the components of the individuals, as well as the ranges of the parameters considered during optimization. The ranges of the hyperparameters were chosen to be comparatively large in order to cover as many parameter combinations as possible. Large hyperparameter ranges increase the risk of overfitting (e.g., through a large number of neurons in the ANN). However, overfitting was counteracted in the training procedures of XGBoost and the ANN by using evaluation data sets. Based on the predictions for the evaluation data sets, which were not used directly for training, it is determined whether overfitting is present at the current state of the training process. No evaluation data set was used for the Random Forest, because the algorithm generally has a low tendency to overfit the training data [36].
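As a sketch of this mechanism for XGBoost (the exact evaluation setup used in the paper is not specified), training can be stopped once the error on a held-out evaluation data set no longer improves; note that `early_stopping_rounds` is a constructor argument in recent XGBoost versions and a `fit()` argument in older ones.

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 12))  # placeholder features
y = rng.normal(size=600)        # one placeholder target

# Hold out an evaluation set that is not used directly for training;
# boosting stops once the validation error stops improving.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                            random_state=0)
model = XGBRegressor(n_estimators=1000, learning_rate=0.05,
                     early_stopping_rounds=50)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
```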


**Table 6.** Hyperparameters of the regression models optimized using the EA.

#### **3. Results**
