*4.3. Classification Algorithms*

For the classification task, we consider two algorithms: k-Nearest Neighbors (kNN) and Support Vector Machines (SVMs). Both are supervised, non-parametric algorithms.

k-Nearest Neighbors (kNN) is an instance-based algorithm: it does not explicitly learn a model but instead memorizes the training instances, which later serve as the "knowledge" used in the prediction phase. In concrete terms, only when a query is issued (i.e., when the algorithm is asked to assign a label to an input) are the stored training instances used to produce a response. As a drawback, the algorithm incurs both a storage cost during the training phase, since a potentially huge dataset must be kept in memory, and a computational cost during the prediction phase, since classifying a given observation requires scanning the entire dataset. For classification, kNN essentially boils down to taking a majority vote among the *k* training instances closest to the unknown instance, where proximity is measured by a distance metric between two data points, usually the Euclidean distance.
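A minimal pure-Python sketch of this query-time procedure may help; the dataset, function names, and parameter values below are illustrative, not part of the experimental setup described in this work:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # "Training" is just storing (train_X, train_y); all the work happens
    # here, at query time: scan every stored instance, keep the k closest,
    # and return the majority label among them.
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy dataset: two well-separated classes
train_X = [(1.0, 1.0), (1.5, 1.2), (1.2, 0.8),
           (5.0, 5.0), (5.5, 4.8), (4.8, 5.3)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))  # A
print(knn_predict(train_X, train_y, (5.2, 5.1), k=3))  # B
```

Note how the full training set is scanned for every query, which is exactly the prediction-time cost discussed above.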

The Support Vector Machine (SVM) algorithm classifies data by constructing a linear or non-linear decision boundary that separates the classes. It maps the data through a non-linear function into a higher-dimensional feature space, lifting them from their original space into one that can be of unlimited dimension, where a linear separator is sought. To perform this operation, SVM makes use of kernel functions, one of the most widely used being the Gaussian kernel.
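The Gaussian kernel computes the similarity k(x, z) = exp(-γ‖x − z‖²) between two points, which corresponds to an inner product in the (infinite-dimensional) feature space without ever computing the mapping explicitly. A small sketch, with an illustrative choice of γ:

```python
import math

def gaussian_kernel(x, z, gamma=0.5):
    # k(x, z) = exp(-gamma * ||x - z||^2): equals 1 for identical points
    # and decays toward 0 as the points move apart.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

print(gaussian_kernel((1.0, 2.0), (1.0, 2.0)))  # identical points -> 1.0
print(gaussian_kernel((0.0, 0.0), (3.0, 4.0)))  # ||x-z||^2 = 25 -> exp(-12.5)
```

In practice γ controls the kernel's width: larger values make the similarity decay faster, yielding more localized (and potentially overfit) decision boundaries.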
