*6.1. Random Forest*

Random forests were proposed by Breiman [49] and are an ensemble of so-called decision trees [50]. A common algorithm to create decision trees is CART [51], but others exist as well [52,53]. A decision tree is a machine learning method that starts at the so-called "root" node and at each step uses the best binary split of a variable to create two child nodes [50]. This split can be considered a rule that aims to make the resulting partitions of the data more "pure" in terms of the distribution of classes in each of them. This procedure is repeated until a stopping criterion is met [50], for instance, that each partition is "pure", meaning that only a single class is present. Following the resulting path of rules leads each new observation to a so-called "leaf" or "terminal node", which is associated with one class (either the pure class or the majority class in that partition) [52,54,55]. Thus, following the path branching out from the root node determines the class membership of an observation. This procedure of iteratively using binary splits to create "purer" partitions of the data is called "recursive partitioning", meaning that it creates regions of the instance space that belong to each of the classes in a classification problem [50,52,55].
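As a minimal sketch of this recursive partitioning (not part of the original study; the data set and parameter values below are illustrative assumptions), a CART-style decision tree can be fit and its rule paths inspected, for instance with scikit-learn:

```python
# Minimal sketch: fit a CART-style decision tree and print its rules
# (illustrative only; the iris data set stands in for any labeled data).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# Recursive partitioning: each node is split on the variable and threshold
# that make the resulting partitions as "pure" as possible, until a
# stopping criterion (here: pure leaves) is reached.
tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

# The printed rules trace the path from the root node to each leaf,
# and the leaf determines the predicted class of an observation.
print(export_text(tree, feature_names=list(data.feature_names)))
```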

A decision tree has multiple advantages, such as its easy interpretability due to the rules it provides for its class assignments [52,54], its ability to handle both numerical and discrete variables, and the fact that it does not require assumptions about the underlying distributions [52]. However, decision trees are sensitive to small perturbations of the data (high variance) [56] and thus tend to overfit.

The aim of a random forest is to overcome this weakness of decision trees by combining multiple decision trees and aggregating their class predictions [50,56]. The idea of random forests is an extension of bagging [50]. Bagging stands for "bootstrap aggregation", where "bootstrap" refers to randomly sampling observations with replacement from the training data to obtain multiple data sets of the same size as the original training data, whereas "aggregation" highlights that the results from training models on these bootstrap samples are averaged (=aggregated) [56]. Random forests differ from classical bagging in that not only are observations randomly drawn from the original data, but the variables (except for the target variable) are also randomly sampled [50,56]. This procedure aims to reduce the correlation between the individual trees, i.e., to obtain de-correlated trees [56]. The algorithm for a random forest [50,56] (in the context of classification) is illustrated in Algorithm 1. It shows that a set of decision trees is used, each of which casts a class vote, and the most common vote is used as the class prediction of the random forest (majority voting) [56].
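As a brief, hypothetical illustration of the two ingredients of bagging (the toy numbers below are assumptions, not values from the study), bootstrap sampling and majority-vote aggregation can be sketched as follows:

```python
# Sketch of bagging's two ingredients on toy data (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10
T = 5  # number of bootstrap samples / models

# "Bootstrap": draw T samples with replacement, each the size of the
# original training data (here represented only by row indices).
bootstrap_indices = [rng.integers(0, n_samples, size=n_samples) for _ in range(T)]

# "Aggregation": each model trained on one bootstrap sample casts a class
# vote for a new observation; the most common vote is the final prediction.
votes = np.array([1, 0, 1, 1, 0])  # hypothetical class votes of the T models
print(np.bincount(votes).argmax())  # -> 1 (majority class)
```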

For this study, the number of decision trees in the random forest is set to 50. The minimum number of observations at each leaf node (minimum leaf size) is a hyperparameter optimized over the values {1, 10, 20, 50, 250, 1000, 2905}, where 2905 is the number of samples divided by two (rounded down). The Gini diversity index (GDI) is selected as the splitting criterion, the technique for variable selection (step 1.2.1 in Algorithm 1) is the interaction test [57], and the number of variables selected randomly (*m*) from the bootstrap sample is √*p*, where *p* is the total number of variables in the data set [50,56].
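A rough, non-authoritative approximation of this configuration is sketched below using scikit-learn (placeholder data; scikit-learn does not implement the interaction test for variable selection [57], so that step is omitted, and the larger leaf-size candidates tied to the study's sample size are not meaningful for the toy data):

```python
# Approximate sketch of the described configuration in scikit-learn
# (placeholder data; the interaction test is not available and is omitted).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=50,      # 50 decision trees in the forest
    criterion="gini",     # Gini diversity index (GDI) as splitting criterion
    max_features="sqrt",  # m = sqrt(p) variables drawn at random per split
    random_state=0,
)

# Minimum leaf size treated as a hyperparameter, as described in the text.
param_grid = {"min_samples_leaf": [1, 10, 20, 50]}
search = GridSearchCV(rf, param_grid, cv=5).fit(X, y)
print(search.best_params_)
```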

### **Algorithm 1** Random forest for classification

1. For *t* = 1 to *T* (number of decision trees in the random forest)

1.1. Take a bootstrap sample of the training data

1.2. Grow a decision tree on the bootstrap sample by repeating the following steps for each node until the stopping criterion (e.g., the minimum leaf size) is met:

1.2.1. Select a subset of the variables (of size *m*) at random from all *p* variables in the bootstrap sample

1.2.2. Determine the best binary split among the *m* variables (best splitting criterion value, e.g., purity)

1.2.3. Split the node into two child nodes using the variable and variable value of the best binary split

End

2. Assign observations to classes by taking each tree's class prediction and using a majority vote (most common class prediction) over all decision trees (=votes) to determine the class label
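A minimal sketch of Algorithm 1 built from individual decision trees could look as follows (illustrative only, with placeholder data; scikit-learn's `max_features="sqrt"` supplies the per-split random variable subset of step 1.2.1):

```python
# Minimal sketch of Algorithm 1 (illustrative; not the study's implementation).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rng = np.random.default_rng(0)
T = 50  # number of decision trees in the forest

trees = []
for _ in range(T):
    # Step 1.1: bootstrap sample of the training data
    idx = rng.integers(0, len(X), size=len(X))
    # Steps 1.2.1-1.2.3: grow a tree; max_features="sqrt" selects a random
    # subset of m = sqrt(p) variables at every split
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(10**6)))
    trees.append(tree.fit(X[idx], y[idx]))

# Step 2: majority vote over the individual trees' class predictions
votes = np.stack([t.predict(X) for t in trees])  # shape (T, n_samples)
forest_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print((forest_pred == y).mean())  # training accuracy of the sketched forest
```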
