*3.2. Preprocessing of Calculation Results and Data Analysis*

The calculated time series were the starting point for determining the targets for machine learning. For the evaluation of the cage dynamics, the time range *t* = 0.5...1 s was analyzed to avoid unrepresentative cage motions due to the initial conditions at the beginning of the calculation.
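The windowing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the linear-interpolation quantile, and the 5 %/95 % quantile levels are assumptions made for the example, since the quantile levels behind the quantile distance are not stated in this section.

```python
import statistics

def window(t, y, t_start=0.5, t_end=1.0):
    """Keep only the samples of y whose time stamp lies in [t_start, t_end]."""
    return [yi for ti, yi in zip(t, y) if t_start <= ti <= t_end]

def quantile(values, q):
    """Empirical quantile with linear interpolation between sorted samples."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def quantile_distance(values, lower=0.05, upper=0.95):
    """Spread of a signal as the distance between two quantiles.

    The levels 0.05 and 0.95 are assumed here for illustration only.
    """
    return quantile(values, upper) - quantile(values, lower)

def summarize(t, y):
    """Median and quantile distance of y, evaluated after the start-up phase."""
    w = window(t, y)
    return statistics.median(w), quantile_distance(w)
```

Evaluating the statistics only on the windowed samples keeps a start-up transient, however large, from influencing the targets.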

In addition, it was checked whether the simulation results were suitable for inclusion in the database. Especially for simulations with high friction coefficients, severe deformation of the cage occurred, which led to a termination of the simulation. Such nonphysical results arose because some of the automatically generated inputs lay outside a reasonable range for this application; they were removed from the database. Outliers in the database were then identified and removed using the density-based LOF approach. The LOF approach was applied separately to each of the classes "unstable", "stable" and "circling", so that outliers with respect to the dynamic behavior typical of the respective class were identified. Figure 8 illustrates the outliers (red) and the remaining data sets (blue). Outlier detection reliably removed atypical cage movements, ensuring a high-quality database for machine learning. After preparation of the simulation results, the database for machine learning contained a total of 1362 data sets.

**Figure 8.** Distribution of target regression variables (**a**) med(Ω) and qd(Ω) as well as (**b**) qd(*F*<sup>e</sup>) and qd(*ñ*) in the database. The data sets marked in red were identified as outliers using the LOF approach and were not considered for training the regression models.
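The class-wise outlier removal can be illustrated with a small, self-contained sketch of the local outlier factor. It is a brute-force version written for readability and makes several assumptions that are not taken from the paper: the neighbourhood size `k`, the score threshold of 1.5, the Euclidean distance on the feature vectors, and the helper names are all hypothetical. In practice a library implementation would typically be used instead.

```python
import math

def lof_scores(points, k):
    """Local outlier factor of every point (brute force, O(n^2) distances)."""
    n = len(points)
    # k nearest neighbours of each point as (distance, index), sorted by distance
    neighbours = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbours.append(dists[:k])
    # k-distance: distance to the k-th nearest neighbour
    k_dist = [nb[-1][0] for nb in neighbours]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for nb in neighbours:
        mean_reach = sum(max(d, k_dist[j]) for d, j in nb) / len(nb)
        lrd.append(1.0 / max(mean_reach, 1e-12))
    # LOF: mean density of the neighbours relative to the point's own density
    return [
        sum(lrd[j] for _, j in nb) / (len(nb) * lrd[i])
        for i, nb in enumerate(neighbours)
    ]

def keep_mask_per_class(features, labels, k=5, threshold=1.5):
    """Flag outliers separately within each class.

    Returns a list of booleans (True = keep). Both k and threshold are
    assumed values for illustration, not settings reported in the paper.
    """
    keep = [True] * len(features)
    for cls in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        if len(idx) <= k:
            continue  # too few samples to estimate a local density
        scores = lof_scores([features[i] for i in idx], k)
        for i, score in zip(idx, scores):
            if score > threshold:
                keep[i] = False
    return keep
```

Scores close to 1 indicate a local density similar to that of the neighbours, whereas scores well above 1 indicate isolated points. Running the detection per class means a data set is judged only against others with the same type of cage motion.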

Figure 9 shows the correlation matrix used to assess the qualitative relationship between the input and output parameters. The mechanical properties of the cage (area moment of inertia *Ĩ*, mass *m*, cross-sectional area *Ã*, and moment of inertia *J*) had similar correlation coefficients and thus a related influence on the target parameters, see Figure 9a. A negative correlation existed between the mechanical properties and the center of mass acceleration of the cage |*a*<sub>c</sub>|. Accordingly, lower accelerations occur at higher cage masses, which can be explained by the inertia of the geometry. There is also a positive correlation between the cage mass and the equivalent force *F*<sup>e</sup> representing the deformation of the cage. Thus, for cages with larger masses, the equivalent deformation force tends to be larger. For the bearing speed *n*<sub>i</sub> and the friction coefficient *μ*<sub>c</sub>, a clear positive correlation with the cage acceleration, the contact forces, and ultimately a highly dynamic cage movement was found. This is due to the increased relative velocity and frictional force in the contact between the cage and the other components, which leads to a stronger excitation of the cage and an increased tendency toward highly dynamic movements.

The matrix in Figure 9b also shows a mutual correlation of the output parameters. Highly dynamic cage movements are characterized by strong deformations of the cage, high accelerations, and a high frictional torque, which is why these parameters exhibited a strong correlation. Due to the opposite movement of the center of mass in the case of unstable cage dynamics, there is a negative correlation between the median of the Ω-ratio and the other parameters. The weak relationship of the normalized *x̃*<sub>c</sub>-coordinate of the cage to the other target quantities is also noticeable. The contact forces between the cage and the rolling element/rib point primarily in the radial direction, which is the direction of the resulting acceleration. Therefore, the correlation between the quantile distances of the two non-axial coordinates is more pronounced, especially in the case of an unstable cage motion. The quantile distance of the Ω-ratio also shows a slightly weaker correlation with the other parameters, although it is still stronger than that of the quantile distance of the *x̃*<sub>c</sub>-coordinate of the cage center of mass.

**Figure 9.** Matrix with correlation coefficients for determining the relationship between (**a**) the input and output parameters and (**b**) the output parameters among each other.
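A correlation matrix of this kind can be computed as sketched below. The sketch assumes the Pearson correlation coefficient, which the section does not name explicitly, and uses hypothetical parameter names and a dictionary-based data layout chosen only for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def correlation_matrix(inputs, outputs):
    """Correlation of every input with every output.

    inputs, outputs: dicts mapping a parameter name to its list of values,
    one value per data set. Returns a dict keyed by (input, output).
    """
    return {
        (name_in, name_out): pearson(values_in, values_out)
        for name_in, values_in in inputs.items()
        for name_out, values_out in outputs.items()
    }
```

Passing the same dictionary as both arguments yields the mutual correlation of the output parameters, as in panel (b). The coefficient measures only the linear part of a relationship, so a value near zero does not rule out a nonlinear dependence.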

Although the correlation matrix revealed trends that hint at the resulting dynamic behavior of the cage, the relationship is highly nonlinear due to interactions between the parameters. Therefore, regression algorithms are trained in the following to learn the relationship between the input and output parameters.
