### *4.1. Principal Component Analysis*

Principal Component Analysis (PCA) [43] is an unsupervised transform, also known as the Karhunen-Loève transform (KLT). It aims at finding suitable linear transformations *y* of the observed variables that are easily interpreted and capable of highlighting and summarizing the information inherent in the initial matrix *I*. This tool is especially useful when dealing with a large number of variables from which one wants to extract as much information as possible while working with a smaller set of variables.

PCA can hence be described as a transformation of a given set of *N* input vectors (variables) of the same length *K*, arranged in an *N*-dimensional vector **x**, into a second vector *y*, built in such a way that the first element of *y* captures the greatest possible variability (and therefore the most information) of the original variables, the second captures the greatest variability of the *x<sub>i</sub>* remaining after the first component, and so on up to *y*(*N*), which accounts for the smallest fraction of the original variance. The principal components are therefore the unit-norm linear combinations of the random variables *x*(1), ··· , *x*(*N*) that maximize the variance and are mutually uncorrelated.

The resulting vector *y* forms the feature vector, which can be used for classification. Moreover, the PCA transform is invertible, so the original data can be easily recovered.
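The procedure above can be sketched in a few lines of NumPy: diagonalize the covariance of the centered data, sort the eigenvectors by decreasing eigenvalue, and project. The toy data matrix here is hypothetical; any real feature matrix of shape (*N*, *K*) works the same way.

```python
import numpy as np

# Hypothetical toy data: 200 observations of 5 correlated variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))

# Center the data and diagonalize its covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # reorder by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# y: projection onto the principal components; the first column
# carries the largest share of the original variance.
y = Xc @ eigvecs

# The transform is invertible: projecting back recovers the data.
X_rec = y @ eigvecs.T + X.mean(axis=0)
```

Keeping only the first few columns of `y` gives the reduced feature vector; reconstruction from that truncated projection is then approximate rather than exact.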

### *4.2. t-distributed Stochastic Neighbor Embedding*

t-distributed Stochastic Neighbor Embedding (t-SNE) [44] is a nonlinear, unsupervised transform specifically designed to reduce dimensionality to 2 or 3 dimensions in order to visualize multidimensional data.

The t-SNE algorithm consists of two main steps. Given our set of *N* vectorized images *x*(1), ··· , *x*(*N*) with length *l* · *m*, t-SNE first computes the conditional probability *p*<sub>*j*|*i*</sub>, which represents the similarity of datapoint *x<sub>j</sub>* to datapoint *x<sub>i</sub>*. In other words, it evaluates the probability that *x<sub>i</sub>* would pick *x<sub>j</sub>* as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at *x<sub>i</sub>*. In formulas,

$$p\_{j|i} = \frac{\exp\left(-||\mathbf{x}\_i - \mathbf{x}\_j||^2 / 2\sigma\_i^2\right)}{\sum\_{k \neq i} \exp\left(-||\mathbf{x}\_i - \mathbf{x}\_k||^2 / 2\sigma\_i^2\right)}.\tag{3}$$
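Equation (3) can be computed directly with NumPy. The sketch below uses a fixed bandwidth for every point for simplicity; in the actual t-SNE algorithm each σ<sub>i</sub> is tuned by binary search so that the distribution *p*<sub>·|*i*</sub> reaches a user-specified perplexity.

```python
import numpy as np

def conditional_p(X, sigma):
    """Row-normalized Gaussian similarities p_{j|i}.

    X: (N, d) data matrix; sigma: (N,) per-point bandwidths
    (fixed here; real t-SNE tunes each sigma_i to a target perplexity).
    """
    # Pairwise squared Euclidean distances ||x_i - x_j||^2.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-sq_dists / (2.0 * sigma[:, None] ** 2))
    np.fill_diagonal(P, 0.0)                 # a point is not its own neighbor
    return P / P.sum(axis=1, keepdims=True)  # normalize over k != i

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
P = conditional_p(X, sigma=np.ones(6))
```

Each row of `P` sums to one, so row *i* is a valid probability distribution over the possible neighbors of *x<sub>i</sub>*.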

t-distributed Stochastic Neighbor Embedding (t-SNE) then defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback-Leibler (KL) divergence between the two distributions with respect to the locations of the points in the map.
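In practice both steps, including the KL-divergence minimization, are handled by off-the-shelf implementations such as scikit-learn's `TSNE`. A minimal sketch on hypothetical vectorized images (random data standing in for the real image set):

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for N = 100 vectorized images of length l*m = 64.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))

# Map to 2 dimensions by minimizing the KL divergence between the
# high- and low-dimensional similarity distributions.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
Y = tsne.fit_transform(X)  # Y has shape (100, 2), ready for a scatter plot
```

Note that, unlike PCA, this mapping is not invertible and has no closed form for new points: the low-dimensional coordinates are found by iterative optimization over the training set only.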
