#### *3.2. K-Means Algorithm*

K-Means is the technique chosen for clustering, that is, for identifying the groups present in a dataset. The algorithm defines K centroids and locates each of them in the data hyperspace, so that samples of a similar nature lie close to the same centroid and together form a cluster [35]. Every data point is then assigned to the cluster whose centroid is nearest, keeping the distance (usually Euclidean) between each sample and its centroid as short as possible.

K-Means first runs a training process to obtain the clusters and distribute the data samples among them. The speed of this step depends on the number of clusters and the size of the dataset. The second phase, in which each data sample is assigned to its cluster, is fast compared with the initial phase [36].

The procedure to train the K-Means algorithm is explained with this sequence:


The last two steps are repeated until the centroids are the same in two consecutive iterations. At that point the algorithm has converged and stops. New samples can then be assigned to their clusters by comparing their distances to the different centroids.
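The training loop and the later assignment of new samples can be illustrated with a minimal sketch of the standard K-Means iteration. It is not the implementation used in this work. The function names, the random choice of initial centroids, the handling of empty clusters, and the `max_iter` safeguard are all assumptions made for the example.

```python
import math
import random


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(samples, k, seed=0, max_iter=100):
    """Train K-Means and return the final centroids as a list of tuples."""
    rng = random.Random(seed)
    # Assumption: the initial centroids are K samples picked at random.
    centroids = [tuple(p) for p in rng.sample(samples, k)]
    for _ in range(max_iter):
        # Assignment step: group every sample with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in samples:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # Assumption: an empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        # Converged when the centroids are identical in two consecutive passes.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    trained = k_means(data, k=2)
    # A new sample is assigned by comparing its distance to each centroid.
    print(trained, nearest_centroid((9.0, 9.0), trained))
```

Once training has finished, assigning a new sample only requires K distance computations, which is why this second phase is fast compared with training.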

#### *3.3. Artificial Neural Networks*

An Artificial Neural Network (ANN) is an artificial intelligence technique based on the biological neuron model, in which information is processed by a unitary component called a neuron. As in the biological case, each artificial neuron is linked to other neurons. An ANN can therefore compute complex functions from external input data and from the outputs of other neurons. Each input to a neuron has an associated weight, and each neuron contains an activation function that defines its output.
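A single neuron of this kind can be sketched as a weighted sum of its inputs passed through an activation function. The example below is only illustrative: the function names and the `bias` term are assumptions not stated in the text, and the sigmoid and hyperbolic tangent are shown simply as two common activation functions, with outputs between 0 and 1 and between −1 and 1, respectively.

```python
import math


def sigmoid(z):
    """Logistic activation: output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through `activation`."""
    # `bias` is an assumed extra term; set it to 0.0 to ignore it.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


if __name__ == "__main__":
    x = [1.0, 2.0]
    w = [0.5, -0.25]
    print(neuron_output(x, w, 0.0, sigmoid))    # sigmoid activation
    print(neuron_output(x, w, 0.0, math.tanh))  # tanh activation
```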

The ANN learning model relies on the ability of this kind of architecture to learn from experience by generalizing from cases. Complex functions can be obtained through the training process. The ANN builds a characterization of a problem in order to produce an answer consistent with the problem input, without prior knowledge of the situation. Therefore, the ANN can generalize new solutions from previous ones [37].

The excitation level, also called the output of a neuron, is defined by the activation function [38]. This output typically ranges from 0 to 1 or from −1 to 1. A key feature of an ANN is its topology, which defines how the set of neurons is organized: the 'placement' of the neurons and how they are linked. The architecture of the ANN is defined by four features:


The Multi Layer Perceptron (MLP) is the basic ANN topology. Its architecture is organized as follows: an input, a set of hidden layers, and an output. Neurons that receive their information from the same source belong to the same layer. Information flows from the inputs of the ANN, or from a previous layer, to the following layers. In the MLP, the information from all neurons in a layer goes to the same destination: the next layer or the output of the MLP.

Usually, the activation function of the output layer is chosen according to the application of the ANN. One of the most common choices is the 'linear' activation function.
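The layer-by-layer flow of information in an MLP can be sketched as a forward pass in which the outputs of one layer become the inputs of the next. This is a minimal illustration under stated assumptions: the function names, the bias terms, the choice of `tanh` for the hidden layer, and the weight values are all hypothetical. Only the 'linear' output activation comes from the text.

```python
import math


def linear(z):
    """'Linear' activation: the output equals the weighted sum."""
    return z


def dense_layer(inputs, weights, biases, activation):
    """Compute one layer; weights[j] holds the input weights of neuron j."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(inputs, layers):
    """Propagate `inputs` through `layers`, a list of (weights, biases, activation)."""
    signal = inputs
    for weights, biases, activation in layers:
        # Every neuron in a layer reads the same source: the previous signal.
        signal = dense_layer(signal, weights, biases, activation)
    return signal


if __name__ == "__main__":
    # Hypothetical network: 2 inputs, one hidden layer of 2 tanh neurons,
    # and 1 output neuron with a linear activation.
    network = [
        ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], math.tanh),
        ([[2.0, -1.0]], [0.5], linear),
    ]
    print(mlp_forward([1.0, 2.0], network))
```

A linear output leaves the value unbounded, which suits applications that predict continuous quantities. A bounded activation could be substituted in the last tuple when the application calls for it.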
