*2.2. Data Preprocessing*

As mentioned above, the sensor measuring the lateral force $F_L$ is part of the lever system. To obtain the coefficient of friction (*μ*), the geometry of the lever system has to be taken into account, which results in the following relation:

$$
\mu = \frac{100\,F_L - 2\,F_N}{175\,F_N}. \tag{1}
$$
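Equation (1) can be evaluated element-wise on the sampled force signals. The following is a minimal sketch, assuming both forces are available as NumPy arrays in the same unit; the function name `friction_coefficient` and its argument names are hypothetical and not taken from the original notebooks.

```python
import numpy as np


def friction_coefficient(f_lateral, f_normal):
    """Evaluate Eq. (1): mu = (100 * F_L - 2 * F_N) / (175 * F_N).

    Both inputs may be scalars or arrays of equal shape. They must be
    given in the same unit, since mu is dimensionless.
    """
    f_lateral = np.asarray(f_lateral, dtype=float)
    f_normal = np.asarray(f_normal, dtype=float)
    return (100.0 * f_lateral - 2.0 * f_normal) / (175.0 * f_normal)


if __name__ == "__main__":
    # Illustrative values only, not measured data.
    print(friction_coefficient([0.02, 0.2], [1.0, 1.0]))
```

Note that the constant term in the numerator means a small nonzero sensor reading corresponds to *μ* = 0: with these coefficients, this is the case when the lateral force reading equals 2% of the normal force.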

Before the data could be passed to the classification algorithm, several preprocessing steps were necessary. These were carried out in Python in the form of interactive Jupyter notebooks [35], using NumPy arrays [36] and pandas DataFrame objects [37] for efficient computation.

The time-series signals acquired by the force, acceleration, and supplementary sensors, sampled at rates of up to 5 kHz, were stored in the HDF5 file format [38] on a file server dedicated to the storage of large amounts of raw measurement data. Since the amount of raw data was too large for efficient processing on a conventional workstation, the data were read directly from the server and downsampled to 100 Hz while retaining the main characteristics of the sensor signals.
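The downsampling method itself is not specified here. As one possible illustration, the sketch below reduces a signal from 5 kHz to 100 Hz by averaging non-overlapping blocks of 50 samples. Block averaging is an assumption made for this example, as are the function name `downsample_mean` and its parameters; reading the data from the HDF5 files is omitted.

```python
import numpy as np


def downsample_mean(signal, fs_in=5000, fs_out=100):
    """Downsample a 1-D signal by averaging non-overlapping blocks.

    Averaging acts as a simple low-pass filter before the rate is
    reduced. Trailing samples that do not fill a complete block are
    discarded.
    """
    signal = np.asarray(signal, dtype=float)
    if fs_in % fs_out != 0:
        raise ValueError("fs_in must be an integer multiple of fs_out")
    factor = fs_in // fs_out           # 50 for 5 kHz -> 100 Hz
    n_blocks = signal.size // factor   # number of complete blocks
    blocks = signal[: n_blocks * factor].reshape(n_blocks, factor)
    return blocks.mean(axis=1)


if __name__ == "__main__":
    one_second = np.arange(5000)       # synthetic 1 s recording at 5 kHz
    print(downsample_mean(one_second).shape)
```

A filter-based alternative such as `scipy.signal.decimate` would be another reasonable choice; which approach was used in the original work is not stated.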

In a second step, noise in the lateral force and position signals was reduced by smoothing with a Savitzky–Golay filter [39], using a third-degree polynomial and a window length of 25 samples.
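The smoothing step can be sketched with `scipy.signal.savgol_filter`, using the window length and polynomial degree given above. The use of SciPy is an assumption, since the text does not name the implementation; the wrapper `smooth_signal` is a hypothetical helper.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_signal(signal, window_length=25, polyorder=3):
    """Smooth a 1-D signal with a Savitzky-Golay filter.

    Defaults follow the text: a 25-sample window and a third-degree
    polynomial. The window length must be greater than polyorder.
    """
    signal = np.asarray(signal, dtype=float)
    return savgol_filter(signal, window_length=window_length, polyorder=polyorder)


if __name__ == "__main__":
    # Synthetic example: a noisy sine sampled at 100 Hz for 5 s.
    rng = np.random.default_rng(0)
    t = np.arange(500) / 100.0
    noisy = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(t.size)
    print(smooth_signal(noisy).shape)
```

Because the filter fits a local cubic polynomial in each window, it attenuates high-frequency noise while largely preserving the height and width of peaks, which is relevant when cycle shapes and force maxima are later used as features.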

Due to the oscillating nature of the setup, periodic patterns repeating with the oscillation frequency of the system are present in the lateral force data. Each of these patterns describes the evolution of the lateral force during one cycle. Deviations from the normal state of operation can be seen as distortions of the individual cycle shapes, which are discussed in more detail in Section 3.1. Such deviations lead to an increase in the cycle periods as well as in the lateral force levels and maxima.

The start of each single-cycle curve was triggered using the zero-crossings of the normalised position signal in the negative stroke direction. Subsequently, the length of the extracted curves was normalised to 100 data points per curve using linear interpolation. In the end, an *m* × 100 matrix, with *m* being the number of individual cycles of the respective experiment, was obtained as input for the Random Forest classifier.
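The cycle extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions: a zero-crossing in negative stroke direction is taken to be a sign change of the position from non-negative to negative, incomplete cycles at the beginning and end of the record are discarded, and the function name `extract_cycles` is hypothetical.

```python
import numpy as np


def extract_cycles(position, force, n_points=100):
    """Split `force` into cycles and resample each to `n_points` samples.

    A cycle starts at the first sample after the position signal has
    changed from >= 0 to < 0 and ends just before the next such sample.
    Returns an array of shape (m, n_points), m being the number of
    complete cycles found.
    """
    position = np.asarray(position, dtype=float)
    force = np.asarray(force, dtype=float)

    # Index of the first negative sample after each falling zero-crossing.
    starts = np.flatnonzero((position[:-1] >= 0) & (position[1:] < 0)) + 1

    new_x = np.linspace(0.0, 1.0, n_points)
    cycles = []
    for start, stop in zip(starts[:-1], starts[1:]):
        segment = force[start:stop]
        if segment.size < 2:
            continue  # too short to interpolate
        old_x = np.linspace(0.0, 1.0, segment.size)
        # Linear interpolation onto a common grid of n_points samples.
        cycles.append(np.interp(new_x, old_x, segment))

    if not cycles:
        return np.empty((0, n_points))
    return np.vstack(cycles)


if __name__ == "__main__":
    # Synthetic 1 Hz oscillation sampled at 250 Hz for 5 s.
    t = np.arange(1250) / 250.0
    position = np.sin(2 * np.pi * t + 0.3)
    force = np.cos(2 * np.pi * t + 0.3)
    print(extract_cycles(position, force).shape)
```

Resampling every cycle to the same length removes the cycle period as a direct feature, so each row of the resulting matrix describes only the shape and level of the lateral force over one cycle.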
