*2.4. Labelling of Datasets and RF Model*

In this and the following sections, the term 'state' refers to the current state of operation on the basis of individual cycles. The term 'phase' denotes a longer period of time, in which the system is in a certain state of operation. The term 'class' describes a specific categorical label in a set of labels that is assigned to the individual cycles of the dataset during the training period of the RF algorithm, based on their state. Thus, each class consists of a set of individual cycles belonging to one state of operation.

As manual labelling of tens of thousands of cycles would be a very tedious and time-consuming task, a pre-labelling of the cycles via clustering methods was performed. Given the large size of the data, Principal Component Analysis (PCA) was applied to reduce the dimensionality of the input and to visualise the general shape of the data. After selecting an appropriate number of principal components based on the amount of variance covered, a k-means clustering algorithm was applied to the reduced dataset. Each cycle was assigned to a cluster such that the squared Euclidean distances within each cluster were minimised. These two steps were implemented in R using the *prcomp* and *kmeans* functions [49].

The results of this pre-labelling stage are given in Table 1. Before applying PCA, the datasets were centred and scaled to unit variance. For the k-means clustering, the first two principal components were selected, covering cumulative proportions of variance between 0.79 and 0.93. The number of clusters for the k-means algorithm was set to 5, based on the regimes expected for the studied tribological experiment. The resulting cluster sizes for each experiment are unevenly distributed, as can be seen in Table 1.


**Table 1.** Overview of k-means clustering results.
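The pre-labelling itself was implemented in R with *prcomp* and *kmeans*; purely as an illustration, the following minimal sketch reproduces the same steps (centring and scaling, projection onto two principal components, k-means with five clusters) with scikit-learn in Python, assuming the cycles are stored as a 2-D array with one row per cycle.

```python
# Illustrative sketch of the pre-labelling stage using scikit-learn instead of
# R's prcomp/kmeans. 'cycles' is assumed to be an array of shape
# (n_cycles, 100), one normalised lateral force cycle per row.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def prelabel_cycles(cycles: np.ndarray, n_components: int = 2, n_clusters: int = 5):
    # Centre the data and scale each feature to unit variance before PCA.
    scaled = StandardScaler().fit_transform(cycles)

    # Reduce dimensionality; with two components the text above reports
    # cumulative proportions of variance between 0.79 and 0.93.
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(scaled)
    print("explained variance ratio:", pca.explained_variance_ratio_)

    # k-means with five clusters, one per expected regime; each cycle is
    # assigned to the cluster minimising the within-cluster squared
    # Euclidean distance.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    cluster_labels = kmeans.fit_predict(scores)
    return scores, cluster_labels
```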

The clusters were subsequently assigned to tribological states of operation: 'Steady1', 'Steady2', 'Pre-critical', and 'Critical'. The first 5000 cycles were defined as 'Run-in' and discarded due to the high variability of the data. The preliminary labelling was refined in a second step by closer inspection of the data, taking into account distinctive features in the other sensor signals, e.g., sudden temperature increases or distortions of the position signal. This resulted in the inclusion of additional 'Pre-critical' areas, typically before and after short-term ('Critical') anomalies or before critical operation at the end of the experiments, as well as the physically meaningful merging of regions that the clustering algorithm had fragmented into different states of operation. Figure 2 compares the classification obtained by k-means clustering with the final labelling for one of the experiments used for training the RF algorithm. Here, single cycles or groups of cycles that had been marked as 'Pre-critical' (cluster 4) by the k-means clustering but did not differ significantly from their surroundings were assigned to the respective steady state. Furthermore, the area preceding the final critical state was labelled as 'Pre-critical' in its entirety, whereas the k-means result switched between 'Pre-critical' and 'Steady2' in this region. This led to an overall increase in the number of cycles labelled as 'Pre-critical' after manual adaptation (see Table 2). The 'Steady1' state is reduced in size after manual adaptation, as the first 5000 cycles were discarded.

**Figure 2.** Labelling of the lateral force signal of one experiment (Experiment 2 in Table 1). (**a**) Classification obtained using k-means clustering, (**b**) Labels used for training the RF model after manual adaptation. The clusters were assigned to tribological states of operation: 'Run-in' (Cluster 1), 'Steady1' (Cluster 2), 'Steady2' (Cluster 3), 'Pre-critical' (Cluster 4), and 'Critical' (Cluster 5).


**Table 2.** Number of cycles for one experiment (Experiment 2 in Table 1) as classified by k-means clustering before and after manual adaptation.

In the end, four classes representing the individual states of operation were distinguished; see Figure 3. 'Steady1' was used for steady operation, typically right after the run-in period, with little fluctuation and few distortions in the data. 'Steady2' typically occurred after major events: the system stabilises, but higher lateral forces are measured, and the curve shapes of the cycles are more distorted and variable. After sufficient running time without major events, the system may reach the 'Steady1' state again. 'Pre-critical' cycles are typically found before and after cycles labelled as 'Critical'. The 'Pre-critical' label is also associated with short-term events, typically lasting less than 100 cycles, during which maximum lateral force values of 1.5 times the maxima of the surrounding steady-state cycles or larger were measured. 'Critical' cycles show heavily distorted curves with the lateral force increasing considerably at one or both turning points. This indicates that the bearing was stuck in its turning position and could only be brought back into motion when a sufficiently high lateral force was applied. Note that the *x*-axis in the graphs of Figure 3 corresponds to a relative position in time within each cycle rather than the actual physical encoder position. The half-cycle in which the deadlock occurred (the case for the positive half-cycle is depicted in Figure 3d) is extended in length, leading to an overall asymmetric cycle shape. As all cycles were normalised to a length of 100 data points, the steepness of the lateral force curve at the turning points is related to the cycle duration, which itself depends on the friction in the system at that moment.

**Figure 3.** Characteristic cycle shapes of the four operation states: (**a**) Steady1, (**b**) Steady2, (**c**) Pre-critical, and (**d**) Critical. Please note the different scaling of the *y*-axis for the critical state in (**d**).
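The normalisation of each cycle to 100 data points mentioned above is not spelled out in code in this work; the following is a hypothetical sketch of one way such a length normalisation could be performed, by linear interpolation over a relative time axis within the cycle.

```python
# Hypothetical sketch: resample a raw cycle of arbitrary length to a common
# length of 100 data points by linear interpolation over a relative time axis.
import numpy as np

def normalise_cycle(raw_cycle: np.ndarray, n_points: int = 100) -> np.ndarray:
    # Relative position in time within the cycle (0 ... 1) for the raw samples
    # and for the resampled grid; the physical encoder position is not used.
    t_raw = np.linspace(0.0, 1.0, num=len(raw_cycle))
    t_new = np.linspace(0.0, 1.0, num=n_points)
    return np.interp(t_new, t_raw, raw_cycle)
```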

The RF algorithm was developed using the Python ML package scikit-learn [48]. The workflow for training and application of the algorithm is described in detail below, and the corresponding flowchart is shown in Figure 4.

**Figure 4.** Flowchart of the training and classification of lateral force datasets with an RF classifier.

The dataset for training the algorithm was created using labelled cycles from four experiments, namely Experiments 2, 4, 7, and 9 in Table 1. As mentioned above, the first 5000 cycles of each experiment were considered as run-in and discarded from the dataset. Data from multiple experiments were chosen in order to cover the diversity of cycle shapes within each state and to counteract any bias towards a certain state introduced by manual labelling. This includes, above all, the distortions introduced by pre-critical and critical operation, which can occur in the positive, the negative, or both stroke directions.
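As a rough illustration of this step, the sketch below assembles the training data from the four experiments while discarding the run-in cycles; the helper *load_experiment* is a hypothetical placeholder for reading the labelled cycles of one experiment.

```python
# Sketch of assembling the training dataset from the four labelled experiments,
# discarding the first 5000 ('Run-in') cycles of each. 'load_experiment' is a
# hypothetical helper returning (cycles, labels) with one row per cycle.
import numpy as np

RUN_IN_CYCLES = 5000
TRAINING_EXPERIMENTS = [2, 4, 7, 9]   # experiment numbers from Table 1

def build_training_set(load_experiment):
    all_cycles, all_labels = [], []
    for exp_id in TRAINING_EXPERIMENTS:
        cycles, labels = load_experiment(exp_id)
        all_cycles.append(cycles[RUN_IN_CYCLES:])
        all_labels.append(labels[RUN_IN_CYCLES:])
    return np.vstack(all_cycles), np.concatenate(all_labels)
```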

As the distribution of the cycles over the classes representing the four states of operation was highly unbalanced (see Table 3), each class was resampled to a size of 15,000 cycles by random selection with replacement. That means that the classes 'Steady1', 'Steady2', and 'Pre-critical' were downsampled, and a random selection of the cycles over all four experiments was used for training. However, the size of the class representing the 'Critical' state was only 1265 cycles and had to be upsampled by a factor of nearly 12, drawing each cycle multiple times from the dataset. The number of 15,000 cycles was chosen, as it seemed to be a good compromise between retaining as much information as possible from the three larger classes and keeping the upsampling factor of the 'Critical' class reasonably small.


**Table 3.** Number of cycles for each class present in the training dataset before resampling.
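One possible way to express the resampling described above is scikit-learn's *resample* utility, as sketched below; the function name, target size handling, and random seed are illustrative assumptions, not the authors' implementation.

```python
# Sketch of balancing the training classes to 15,000 cycles each by random
# selection with replacement.
import numpy as np
from sklearn.utils import resample

def balance_classes(cycles, labels, n_per_class=15000, seed=0):
    balanced_X, balanced_y = [], []
    for state in np.unique(labels):
        mask = labels == state
        # Classes larger than n_per_class are effectively downsampled, while
        # the small 'Critical' class is upsampled by drawing cycles
        # multiple times.
        X_res, y_res = resample(cycles[mask], labels[mask],
                                replace=True, n_samples=n_per_class,
                                random_state=seed)
        balanced_X.append(X_res)
        balanced_y.append(y_res)
    return np.vstack(balanced_X), np.concatenate(balanced_y)
```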

Before training the RF algorithm, randomised hyperparameter tuning was performed using scikit-learn's *RandomizedSearchCV* function. Randomised search has the advantage of a fixed, predefined number of trials, independent of the total number of possible combinations, which can be very large. This strategy finds a near-optimal combination of hyperparameters without spending too much time on unpromising candidates [50]. For the present work, the number of iterations was set to 100. The following hyperparameters were optimised: *n\_estimators*, the number of individual decision trees in the RF; *min\_samples\_split*, the minimum number of samples required to split an internal node; *min\_samples\_leaf*, the minimum number of samples required in a leaf node; *max\_features*, the maximum number of features considered at each split, which was always set to the square root of the total number of features, i.e., 10; *max\_depth*, the maximum number of levels within an individual decision tree; and *bootstrap = True*, meaning that bootstrap samples rather than the whole dataset are used to build each tree. Table 4 shows the best set of hyperparameters obtained, which was subsequently used for training the algorithm.

**Table 4.** Best set of hyperparameters for the RF after hyperparameter tuning.
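A minimal sketch of such a randomised search is given below; the candidate value ranges are illustrative assumptions, since only the tuned parameter names and the 100 iterations are stated in the text.

```python
# Sketch of the randomised hyperparameter search with scikit-learn's
# RandomizedSearchCV. Candidate ranges are illustrative assumptions.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

PARAM_DISTRIBUTIONS = {
    "n_estimators": [100, 200, 500, 1000],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt"],      # square root of the feature count, i.e., 10
    "max_depth": [10, 20, 50, None],
    "bootstrap": [True],
}

def tune_hyperparameters(X_train, y_train, n_iter=100, seed=0):
    search = RandomizedSearchCV(
        estimator=RandomForestClassifier(random_state=seed),
        param_distributions=PARAM_DISTRIBUTIONS,
        n_iter=n_iter,          # fixed, predefined number of trials
        cv=5,
        n_jobs=-1,
        random_state=seed,
    )
    search.fit(X_train, y_train)
    # Best combination found by the randomised search (cf. Table 4).
    return search.best_params_
```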


The RF algorithm was trained using 75% of the input dataset as training data and 25% as test data, the latter being used to determine the quality estimators described in Section 2.3. The prediction accuracy of the trained RF algorithm was then assessed by a 5-fold cross-validation with random selection of cycles for the training and test datasets: for each run, the data were again randomly split into 75% training and 25% test data. Finally, the algorithm was validated on a labelled experiment (number 8 in Table 1) that was not used for training.
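The sketch below outlines this evaluation procedure with scikit-learn; using *ShuffleSplit* for the five repeated random 75/25 splits is one interpretation of the described cross-validation, and the arguments (*X*, *y*, *X_val*, *y_val*, *best_params*) are assumed to hold the balanced training data, the labelled validation experiment, and the tuned hyperparameters, respectively.

```python
# Sketch of the evaluation procedure: a single 75/25 split for the quality
# estimators, five repeated random 75/25 splits, and a final check on the
# held-out labelled experiment (Experiment 8 in Table 1).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, ShuffleSplit, cross_val_score

def evaluate_rf(X, y, X_val, y_val, best_params):
    # Single 75% training / 25% test split of the balanced input dataset.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)
    rf = RandomForestClassifier(**best_params, random_state=0)
    rf.fit(X_train, y_train)
    print("test accuracy:", rf.score(X_test, y_test))

    # Five runs, each with a randomly selected 75/25 split of the input data.
    cv = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
    print("cross-validation accuracies:", cross_val_score(rf, X, y, cv=cv))

    # Validation on the labelled experiment not used for training.
    print("validation accuracy:", rf.score(X_val, y_val))
```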
