*5.2. Creating Compact Maps through Clustering*

This section evaluates clustering methods for compacting the information contained in a set of global appearance descriptors. Two clustering methods were studied for each environment, and three global appearance descriptors were considered. The first method (Method 1) consists of spectral clustering combined with k-means, as explained in Section 3.1. Other configurations were also tested, such as using SOM instead of k-means in Step 5 of the spectral clustering, but the results were quite similar; thus, only the results of spectral clustering with k-means applied to the normalized matrix of the $n\_c$ eigenvectors are shown. The second method (Method 2) uses SOM, as explained in Section 3.2. For both methods, several experiments were carried out to study the influence of the parameters of the three global appearance descriptors. Table 2 summarizes these experiments.


**Table 2.** Summary of the parameters that have been varied to carry out the clustering experiments. FS, Fourier Signature.

The values $k\_1$, $k\_2$, and $k\_3$ define the length of each descriptor, but they do not have the same meaning: equal values of $k\_1$, $k\_2$, and $k\_3$ would not lead to the same descriptor size. Therefore, since our aim is to tune these values so that each descriptor is used as efficiently as possible, we do not apply the same values to all the descriptors in the experiments.

Once the compact map has been produced, it is useful to quantify its compactness. The silhouette is commonly used for this purpose. Silhouette values indicate how similar the instances within the same cluster are and, at the same time, how dissimilar they are from the instances that belong to other clusters. The silhouette takes values in the range [−1, 1], where higher values indicate more compact and better-separated clusters. Therefore, in order to quantify the goodness of each method, three parameters are considered: the moment of inertia of the clusters ($M$), the average silhouette of points ($S\_{points}$), and the average silhouette of descriptors ($S\_{descr}$).


These values are computed once the clustering process has finished. The moment of inertia measures the geometrical compactness of the clusters (that is, whether each cluster groups images captured from geometrically close points) and is calculated as:

$$M = \sum\_{i=1}^{n\_c} \frac{\sum\_{j=1}^{n\_i} \text{dist}\left((x, y)\_{r\_i}, (x\_j, y\_j)\right)^2}{n\_i} \tag{7}$$

where $\text{dist}((x, y)\_{r\_i}, (x\_j, y\_j))$ is the Euclidean distance between the coordinates of the representative $\vec{r}\_i$ and the capture position of the $j$th image that belongs to the cluster $C\_i$, $n\_i$ is the number of images within this cluster, and $n\_c$ is the number of clusters.
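Equation (7) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the experiments: the function names, the toy coordinates, and the choice of taking each representative's coordinates as the centroid of the capture points of its cluster are assumptions made for the example.

```python
from collections import defaultdict


def cluster_centroids(positions, labels):
    """Assumed representative coordinates: mean capture point of each cluster."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), c in zip(positions, labels):
        sums[c][0] += x
        sums[c][1] += y
        counts[c] += 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in counts}


def moment_of_inertia(positions, labels, representatives):
    """Sum over clusters of the mean squared Euclidean distance between the
    representative coordinates and the capture points of the cluster (Eq. 7)."""
    sq_sums = defaultdict(float)
    counts = defaultdict(int)
    for (x, y), c in zip(positions, labels):
        rx, ry = representatives[c]
        sq_sums[c] += (x - rx) ** 2 + (y - ry) ** 2
        counts[c] += 1
    return sum(sq_sums[c] / counts[c] for c in counts)


# Hypothetical capture points (in meters) and cluster labels.
positions = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]
representatives = cluster_centroids(positions, labels)
M = moment_of_inertia(positions, labels, representatives)
print(M)
```

Lower values of `M` correspond to clusters whose images were captured closer to their representative.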

As for the silhouette values, two types of silhouette are used. The average silhouette of points is defined as:

$$S\_{points} = \frac{\sum\_{w=1}^{N} s\_w}{N} \tag{8}$$

where $N$ is the number of instances (images) and $s\_w$ is the silhouette of the instance $w$, which is calculated as:

$$s\_w = \frac{b\_w - a\_w}{\max(a\_w, b\_w)}\tag{9}$$

where $a\_w$ is the average distance between the capture point of the instance $\vec{d}\_w$ and the capture points of the other instances in the same cluster, and $b\_w$ is the minimum, over the other clusters, of the average distance between the capture point of the instance $\vec{d}\_w$ and the capture points of the instances in that cluster.
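As a quick numerical illustration of Equation (9), consider hypothetical values that are not taken from the experiments. Suppose an image lies on average 2 m from the other images of its own cluster and 8 m from the images of the nearest other cluster. If the same image had instead been assigned to the far cluster, the two distances would swap roles:

$$s\_w = \frac{8 - 2}{\max(2, 8)} = 0.75, \qquad s\_w = \frac{2 - 8}{\max(8, 2)} = -0.75$$

Values close to 1 therefore indicate that the image was captured near the rest of its cluster, while negative values suggest that it would fit better, geometrically, in another cluster.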

In contrast, the average silhouette of descriptors is traditionally obtained through:

$$S\_{descr} = \frac{\sum\_{k=1}^{N} s\_k}{N} \tag{10}$$

where $N$ is the total number of instances and $s\_k$ is the silhouette of the instance $k$. This value is calculated as:

$$s\_k = \frac{b\_k - a\_k}{\max(a\_k, b\_k)}\tag{11}$$

where $a\_k$ is the average distance between the descriptor $\vec{d}\_k$ and the descriptors of the other instances contained in the same cluster, and $b\_k$ is the minimum, over the other clusters, of the average distance between $\vec{d}\_k$ and the descriptors of the instances contained in that cluster.

The silhouette of descriptors has traditionally been used to measure the compactness of the clusters. However, it does not measure their geometrical compactness. This is why we introduce the silhouette of points, which is more informative for our purposes, since we are interested in knowing whether the clusters group images that were captured close to each other.
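The difference between the two silhouettes can be illustrated with a short Python sketch. Both measures share the same formula and differ only in the vectors that are compared: capture points for Equations (8) and (9), descriptors for Equations (10) and (11). The function name, the toy data, and the convention of assigning a silhouette of 0 to an instance that is alone in its cluster are assumptions made for this example; they are not taken from the experiments.

```python
import math


def average_silhouette(vectors, labels):
    """Average silhouette of a labeled set of vectors (Euclidean distance).

    Pass capture points to obtain the silhouette of points, or descriptors
    to obtain the silhouette of descriptors.
    """
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    n = len(vectors)
    total = 0.0
    for w in range(n):
        # Accumulate distances from instance w to every other instance,
        # grouped by the cluster of the other instance.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for k in range(n):
            if k != w:
                sums[labels[k]] += math.dist(vectors[w], vectors[k])
                counts[labels[k]] += 1
        own = labels[w]
        if counts[own] == 0:
            continue  # assumed convention: a singleton contributes 0
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n


# Hypothetical case of visual aliasing: images 0 and 2 look alike although
# they were captured far apart, and the same holds for images 1 and 3.
positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
descriptors = [(0.0, 0.1), (1.0, 0.9), (0.1, 0.0), (0.9, 1.0)]
labels = [0, 1, 0, 1]

s_points = average_silhouette(positions, labels)
s_descr = average_silhouette(descriptors, labels)
print(f"S_points = {s_points:.3f}, S_descr = {s_descr:.3f}")
```

In this toy case the clusters are compact in descriptor space, so the silhouette of descriptors is high, but each cluster mixes images captured far apart, so the silhouette of points is negative.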
