*2.5. Classification*

In the classification stage, the RF algorithm is used to classify depressive and non-depressive episodes from the best-ranked features selected for each dataset in the previous step.

According to Phan Thanh Noi et al. [26], the use of the RF algorithm has increased in recent years because of its effectiveness. RF was introduced in 2001 by Leo Breiman et al. [27]. It is a supervised technique that combines multiple randomly generated decision trees, each built with different predictors, into a forest. The forest generally becomes more robust as the number of trees used for classification increases.

To classify an observation, each tree asks a sequence of yes/no questions about the features of the observation, and the answers lead to a class assignment for that observation [28].
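
As an illustration of this yes/no traversal, the following minimal Python sketch classifies one observation with a single hand-written tree. The feature names (`feature_a`, `feature_b`), the thresholds, and the tree itself are hypothetical and are not taken from the datasets or the model used in this work.

```python
def classify(node, observation):
    """Follow yes/no answers from the root until a leaf label is reached."""
    while isinstance(node, dict):
        # Each internal node asks one question: is the feature <= threshold?
        answer = observation[node["feature"]] <= node["threshold"]
        node = node["yes"] if answer else node["no"]
    return node  # a leaf is stored as a plain class label


# Hypothetical tree: names and thresholds are made up for illustration.
tree = {
    "feature": "feature_a", "threshold": 0.5,
    "yes": "non-depressive",
    "no": {
        "feature": "feature_b", "threshold": 2.0,
        "yes": "depressive",
        "no": "non-depressive",
    },
}

label = classify(tree, {"feature_a": 0.9, "feature_b": 1.2})
print(label)
```

In a forest, every tree performs this traversal independently on the same observation.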

In general, each tree is grown by expanding (splitting) one leaf at each step. Finally, the outputs of the decision trees are combined into a single prediction, typically by majority vote, to obtain a higher prediction accuracy [29].

In general, RF proceeds through the following steps:
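
A common formulation of the procedure can be sketched in Python as follows: draw a bootstrap sample, pick a random subset of predictors, grow a tree on that sample, repeat, and combine the trees by majority vote. This is a simplified sketch under stated assumptions, not the implementation used in this work. Each tree is a one-split stump, so choosing predictors per tree coincides with choosing them per split. The toy data, the function names, and the parameters `n_trees` and `n_try` are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(X, y, feature_ids):
    """Return (feature, threshold, left_label, right_label) with the lowest
    weighted Gini impurity, or None if no split separates the rows."""
    best, best_score = None, None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best_score is None or score < best_score:
                best_score = score
                best = (f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best


def fit_forest(X, y, n_trees=25, n_try=1, seed=0):
    """Grow n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        feats = rng.sample(range(p), n_try)           # random predictors
        stump = best_stump(Xb, yb, feats)
        if stump is None:
            # Degenerate sample: fall back to a single majority-label leaf.
            majority = Counter(yb).most_common(1)[0][0]
            stump = (feats[0], float("inf"), majority, majority)
        forest.append(stump)
    return forest


def predict(forest, row):
    """Majority vote over the per-tree predictions."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: two features, three observations per class.
X = [[0.1, 1.0], [0.2, 1.2], [0.3, 0.9], [0.8, 3.0], [0.9, 3.3], [1.0, 2.8]]
y = ["non-depressive"] * 3 + ["depressive"] * 3

forest = fit_forest(X, y)
print(predict(forest, [0.05, 0.5]), predict(forest, [0.95, 3.1]))
```

Full implementations grow deeper trees and redraw the predictor subset at every split; the stump keeps the sketch short.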


The RF algorithm is implemented in the R language [30] with the default settings of the *randomForest* library [31].
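
To make "default settings" concrete, the short Python helper below collects the classification defaults as documented for the `randomForest()` function. The helper name is hypothetical and Python is used only for illustration. The package version used in this work is not stated, so the values should be treated as an assumption to check against the installed version.

```python
import math


def randomforest_classification_defaults(n_rows, n_predictors):
    """Documented randomForest() defaults for a factor (class) response.

    Assumption: values follow the package documentation, not this paper.
    """
    return {
        "ntree": 500,                        # number of trees grown
        "mtry": math.isqrt(n_predictors),    # floor(sqrt(p)) predictors tried per split
        "nodesize": 1,                       # minimum size of terminal nodes
        "replace": True,                     # bootstrap sampling with replacement
        "sampsize": n_rows,                  # bootstrap sample size equals n
    }


# Hypothetical dataset shape: 100 observations, 10 selected features.
defaults = randomforest_classification_defaults(n_rows=100, n_predictors=10)
print(defaults)
```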
