• Self-Organising Maps (SOM)

The SOM clustering model is an unsupervised Artificial Neural Network (ANN) introduced in 1982 by T. Kohonen [58]. It is inspired by evidence of how information is organised in the brain and reduces the dimensionality of the input space to produce topologically ordered maps. This type of network uses competitive, unsupervised learning: the network itself is responsible for self-organising and discovering common features, regularities, correlations, or categories in the input data [59,60].

Figure 6 shows the architecture of the model and how each input neuron is connected to the output neurons by weights (*w*, according to Kohonen's notation). Each output neuron therefore has an associated weight vector, called the reference vector (or codebook vector), which also constitutes the mean vector of the category represented by that output neuron [61,62].

**Figure 6.** General example of the SOM model's topography. Dimensions are expressed by *x* and *y*; *v*1–*v*n represent the input neurons, and *w*ij is the weight of each connection according to Kohonen's notation.

SOM's utility lies in the holistic visual interpretation of the output rather than in understanding the underlying processes [63]. Roughly speaking, the output layer (i.e., the self-organising map itself) contains neurons organised in a rectangular or hexagonal lattice to represent the entire dataset [58].

The goal of this learning is to categorise the data fed into the network: similar values are classified into the same category and should therefore activate the same output neuron. Since this is an unsupervised method, classes or categories must be created by the network itself through correlations between the input data [64]. However, SOM can also be used for pattern recognition (supervised learning), in which case the class information is supplied at the end of training; when classification is involved, as in this case, the winner-takes-all strategy is used. This principle can be extended to several layers, generating super-organised maps (supersom): a similarity level is calculated for each layer, and the individual similarities are combined into a single value that determines the winning node. A minimal training sketch is shown below.
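As an illustration only, the following sketch shows the core SOM update loop described above (winner-takes-all selection of the best-matching unit followed by a neighbourhood update of the reference vectors). It is not the authors' implementation; the grid size, learning rate, neighbourhood radius, and decay schedule are assumed values chosen for readability.

```python
import numpy as np

def train_som(data, grid_shape=(10, 10), n_iter=1000,
              lr0=0.5, sigma0=3.0, seed=0):
    """Minimal SOM training sketch (illustrative, assumed hyperparameters).

    data: array of shape (n_samples, n_features).
    Returns the trained weight grid of shape (rows, cols, n_features),
    i.e., one reference (codebook) vector per output neuron.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    n_features = data.shape[1]
    weights = rng.random((rows, cols, n_features))
    # Lattice coordinates of the output neurons, used for map-space distances.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)

    for t in range(n_iter):
        x = data[rng.integers(len(data))]            # random input vector
        # Winner-takes-all: the best-matching unit (BMU) is the neuron whose
        # reference vector is closest to the input.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Decay the learning rate and neighbourhood radius over time.
        lr = lr0 * np.exp(-t / n_iter)
        sigma = sigma0 * np.exp(-t / n_iter)
        # Gaussian neighbourhood centred on the winner, measured on the lattice.
        map_dist = np.linalg.norm(grid - np.array(bmu), axis=-1)
        h = np.exp(-(map_dist ** 2) / (2 * sigma ** 2))
        # Move each reference vector towards the input, scaled by the neighbourhood.
        weights += lr * h[..., None] * (x - weights)
    return weights
```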

• Newton's method

This nonlinear regression uses Newton's method, an unconstrained optimisation technique driven by the gradient of the objective function. The gradient information is obtained from analytically computed derivatives. Design variables are modified while their impact on the objective function is analysed [65].
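The short sketch below illustrates a standard Newton iteration over design variables using analytically supplied gradient and Hessian functions. It is a generic, hedged example (the quadratic objective is invented for demonstration) and is not the specific regression routine used in the study.

```python
import numpy as np

def newton_minimise(grad, hess, x0, tol=1e-8, max_iter=50):
    """Minimal Newton's-method sketch: update the design variables x
    using analytically computed gradient and Hessian information."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # gradient near zero: converged
            break
        # Newton step: solve H * dx = -g rather than inverting H explicitly.
        dx = np.linalg.solve(hess(x), -g)
        x = x + dx
    return x

# Assumed example objective: f(x, y) = (x - 3)^2 + 2*(y + 1)^2, minimum at (3, -1).
grad = lambda v: np.array([2 * (v[0] - 3), 4 * (v[1] + 1)])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 4.0]])
print(newton_minimise(grad, hess, x0=[0.0, 0.0]))   # -> [ 3. -1.]
```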

• Euclidean distance model

The operation of this model is based on Euclidean distances (*dE*). This is a nonnegative function used to calculate the distance between two points P = (p1, p2, ..., pn) and Q = (q1, q2, ..., qn) in an n-dimensional space [66]. It works on the basis of the Pythagorean theorem (Equation (2)) [67]. Evaluating the results with this method involves checking whether the model achieves 100% matching quality in all the cases studied, i.e., whether each element is perfectly matched to its counterpart.

$$d_E(\mathbf{P}, \mathbf{Q}) = \sqrt{\left(p_1 - q_1\right)^2 + \dots + \left(p_n - q_n\right)^2} = \sqrt{\sum_{i=1}^{n} \left(p_i - q_i\right)^2} \tag{2}$$
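As a brief illustration, the sketch below computes Equation (2) directly; the function name and the sample points are assumptions for the example, not values from the study.

```python
import numpy as np

def euclidean_distance(p, q):
    """Euclidean distance d_E(P, Q) between two n-dimensional points,
    as defined in Equation (2)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(np.sum((p - q) ** 2))

# Each element's counterpart is the candidate at the smallest distance.
print(euclidean_distance([1, 2, 3], [4, 6, 3]))   # -> 5.0
```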

To summarise, Table 3 shows the different algorithms used in each phase of the data mining process.

**Table 3.** Summary of all models used.


### **4. Results and Discussion**

The results obtained in each phase are presented below.
