**2. Methodology**

#### *2.1. Support Vector Machines*

Support vector machines (SVM) are a set of methods for data classification and regression based on maximizing the interclass distance: the basic concept of the SVM is to find the linear separator that divides the data points of two classes with the maximum margin, which is optimal in the sense of the model's generalization to unseen data. To handle data that are not linearly separable, the algorithm employs the "kernel trick": the initial data space is projected through a kernel function into a higher-dimensional space (the feature space) where the dataset may become linearly separable [13]. In this paper, we use four kernel types: linear, quadratic, cubic, and Gaussian, the latter in three variants (fine, medium, and coarse), each capturing a different structure in the data.
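
As an illustrative sketch (not the configuration used in this paper), these kernel settings can be reproduced in Python with scikit-learn; the fine/medium/coarse Gaussian variants are approximated here by different values of the kernel width parameter `gamma` (all numeric values are hypothetical):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Kernel configurations: polynomial kernels of degree 1-3 and three
# Gaussian (RBF) kernels whose gamma controls how "fine" the fit is
# (larger gamma -> finer, more local decision boundary).
kernels = {
    "linear": SVC(kernel="linear"),
    "quadratic": SVC(kernel="poly", degree=2),
    "cubic": SVC(kernel="poly", degree=3),
    "fine Gaussian": SVC(kernel="rbf", gamma=4.0),    # illustrative values
    "medium Gaussian": SVC(kernel="rbf", gamma=1.0),
    "coarse Gaussian": SVC(kernel="rbf", gamma=0.25),
}

for name, clf in kernels.items():
    model = make_pipeline(StandardScaler(), clf)  # SVMs need scaled inputs
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:16s} accuracy = {scores.mean():.3f}")
```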

#### *2.2. Gaussian Process Regression*

Gaussian processes are a flexible class of non-parametric machine learning models, primarily used for modeling spatial and time series data, and are commonly applied to difficult machine learning problems: they are attractive because of their non-parametric flexibility and their relative computational simplicity. A common application of Gaussian processes is regression. Gaussian process regression (GPR) is based on the choice of an appropriate kernel function, i.e., a measure of similarity between data points whose locations are known. Compared to other machine learning methods, the advantages of GPR lie in its ability to seamlessly integrate multiple machine learning tasks, such as parameter estimation. Moreover, it performs well and needs a relatively small training dataset to produce predictions. However, a known limitation is the computational complexity of the predictions (training scales cubically with the number of observations), which, according to [12], makes GPR infeasible for large datasets. In this paper, we trained four different GPR models, each coupled with one of the most important kernel functions and using the same length scale for each predictor.
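
As a hedged illustration, the sketch below fits GPR models with four commonly used isotropic kernels (squared exponential, Matern 5/2, exponential, and rational quadratic; a typical such set, though the paper's exact choices may differ), each sharing a single length scale across all predictors:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))          # small training set: GPR shines here
y = np.sin(X).ravel() + rng.normal(0, 0.1, 60)

# Isotropic kernels: one shared length scale across all predictors.
kernels = {
    "squared exponential": RBF(length_scale=1.0),
    "Matern 5/2": Matern(length_scale=1.0, nu=2.5),
    "exponential": Matern(length_scale=1.0, nu=0.5),
    "rational quadratic": RationalQuadratic(length_scale=1.0),
}

for name, k in kernels.items():
    gpr = GaussianProcessRegressor(kernel=k, alpha=0.01, normalize_y=True)
    gpr.fit(X, y)                             # O(n^3) in the number of samples
    mean, std = gpr.predict([[5.0]], return_std=True)
    print(f"{name:20s} f(5.0) = {mean[0]:.3f} +/- {std[0]:.3f}")
```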


#### *2.3. Decision Trees*

Ref. [14] proposed decision trees as a forecasting technique in statistics, data mining, and machine learning. The method employs a decision tree as a forecasting model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). Regression trees are decision trees in which the target variable can take continuous values (typically real numbers). In this paper, we use three different tree models.
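
A minimal sketch of regression trees with scikit-learn follows; the three models below differ only in their minimum leaf size, which controls how finely the tree partitions the feature space (the names and values are illustrative assumptions, not necessarily the three tree models trained in this paper):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)

# Three regression trees of decreasing complexity: a larger minimum
# leaf size yields fewer, coarser splits and a smoother prediction.
trees = {
    "fine tree": DecisionTreeRegressor(min_samples_leaf=4),
    "medium tree": DecisionTreeRegressor(min_samples_leaf=12),
    "coarse tree": DecisionTreeRegressor(min_samples_leaf=36),
}

for name, tree in trees.items():
    r2 = cross_val_score(tree, X, y, cv=5, scoring="r2").mean()
    print(f"{name:12s} CV R^2 = {r2:.3f}")
```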


#### *2.4. Ensemble of Trees*

An ensemble of trees is formed by combining several individual trees. Although decision trees are among the most efficient and interpretable classification algorithms, they nonetheless suffer from low generalization ability: they provide low bias in-sample but high variance out-of-sample. Ensemble techniques have been shown to solve this problem by combining several decision trees to achieve better predictive performance than a single decision tree. The basic principle underlying the ensemble model is that a group of weak learners is combined to form a strong learner. The main techniques for training ensembles of decision trees are bagging and boosting [15].
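
A short sketch contrasting the two techniques with scikit-learn (the estimators and hyperparameters are illustrative assumptions, not the models trained in this paper): bagging averages many deep trees fit on bootstrap resamples to cut variance, while boosting fits shallow trees sequentially, each correcting the residual errors of the ensemble built so far.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)

# Bagging: deep trees on bootstrap resamples, averaged to reduce variance.
bagged = BaggingRegressor(estimator=DecisionTreeRegressor(),
                          n_estimators=100, random_state=0)

# Boosting: shallow trees (weak learners) added sequentially, each one
# fit to the residuals of the current ensemble.
boosted = GradientBoostingRegressor(n_estimators=100, max_depth=3,
                                    random_state=0)

for name, model in [("single tree", DecisionTreeRegressor(random_state=0)),
                    ("bagged trees", bagged),
                    ("boosted trees", boosted)]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name:14s} CV R^2 = {r2:.3f}")
```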
