2.2.2. Random Forest

The random forest algorithm is based on decision trees, in which the data are recursively split into branches according to values of specific feature subsets. A random forest is built by training multiple decision trees and randomizing the subset of features each tree considers [48]. This approach is a hallmark of what is known as ensemble learning. The responses of the individual trees are then aggregated, and, in the case of classification, the mode (majority vote) of the outputs is taken as the categorical prediction. The diverse nature of the random forest algorithm, i.e., the use of multiple classifiers to obtain a robust prediction, yields low-variance responses, which is a desired characteristic of any machine learning method [49], while the predictions are also expected to remain largely unbiased.
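As an illustration of the majority-vote ensemble described above, the following minimal sketch uses scikit-learn's RandomForestClassifier on a synthetic dataset; the dataset, parameter values, and variable names are illustrative assumptions and do not correspond to the data or implementation of the present study.

```python
# Minimal random forest classification sketch (illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic placeholder data standing in for the actual dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is grown on a randomized view of the data, with a random subset
# of features considered at every split; the forest's class prediction is
# the mode (majority vote) of the individual tree outputs.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))
```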

A useful particularity of this method is that it identifies and ranks the features in the dataset that are most relevant to the categorical response. This can serve as a complement to the study of the effect of individual features on the output response.
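A brief sketch of this feature ranking is shown below, using the impurity-based importances exposed by scikit-learn's feature_importances_ attribute; the synthetic data and generic feature labels are placeholders introduced here for illustration.

```python
# Ranking features by impurity-based importance (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Sort features from most to least important according to the fitted forest.
importances = forest.feature_importances_
for idx in np.argsort(importances)[::-1]:
    print(f"feature_{idx}: importance = {importances[idx]:.3f}")
```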

The main disadvantage of this algorithm is the large computational time required for training, which increases with the user-defined number of trees to build. Further details regarding the algorithm can be found in [50].
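The growth of training cost with the number of trees can be checked with a short timing sketch such as the one below, again assuming scikit-learn and synthetic data; absolute times depend on the hardware and dataset and are not representative of any particular experiment.

```python
# Illustrative timing of training cost versus the user-defined number of trees.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for n_trees in (10, 100, 1000):
    start = time.perf_counter()
    RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X, y)
    print(f"{n_trees:4d} trees: {time.perf_counter() - start:.2f} s")
```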
