**3. Clustering Methods to Compact the Visual Information**

This section addresses the creation of topological models and their compaction. These models are subsequently used to solve the localization problem. Both tasks rely only on visual information, described by global appearance descriptors. The problem is therefore addressed in two steps: learning, in which the model is built and compacted, and localization.


In the learning step, the robot moves through the environment and captures images from different positions so that the whole environment is covered. This yields a set of omnidirectional images $I = \{im_1, im_2, \ldots, im_N\}$, where $im_j \in \mathbb{R}^{N_x \times N_y}$. A global appearance descriptor is then calculated for each image, which yields a set of descriptors $D = \{\vec{d}_1, \vec{d}_2, \ldots, \vec{d}_N\}$, where $\vec{d}_j \in \mathbb{C}^{l \times 1}$.

As in some previous works [43,44], this set of descriptors can be considered a straightforward model of the environment. However, this mapping strategy becomes problematic when the environment is large. The larger the environment, the more images must be captured to model it completely, and the more computation time and memory are required to process and store the information of each image and to solve the subsequent localization problem. The model should therefore be compacted in such a way that it retains most of the visual information and permits solving the localization problem efficiently.

In this work, we propose a clustering approach to compact the model, with the objective of creating a two-layer hierarchical structure. The low-level layer is composed of the set of descriptors, and the high-level layer is obtained by compacting this set via clustering. Each cluster is characterized by the common attributes of the instances that form it. The dataset $D = \{\vec{d}_1, \vec{d}_2, \ldots, \vec{d}_N\}$ is thus divided into $n_c$ clusters $C = \{C_1, C_2, \ldots, C_{n_c}\}$ under the conditions:

$$C_i \neq \emptyset, \quad i = 1, \ldots, n_c$$

$$\bigcup_{i=1}^{n_c} C_i = D$$

$$C_i \cap C_j = \emptyset, \quad i \neq j, \quad i, j = 1, \ldots, n_c$$
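The three conditions state that the clusters form a partition of the descriptor set. As a minimal sketch, assuming each cluster is stored as a collection of descriptor indices in `0..n-1` (the function name `is_partition` and this index representation are illustrative choices, not part of the original work), they can be checked as follows:

```python
def is_partition(clusters, n):
    """Check that `clusters` partitions the index set {0, ..., n-1}.

    clusters: list of iterables of descriptor indices (hypothetical layout).
    n: total number of descriptors N.
    """
    sets = [set(c) for c in clusters]
    union = set().union(*sets)
    # Condition 1: no cluster is empty.
    non_empty = all(len(s) > 0 for s in sets)
    # Condition 2: the union of all clusters is the whole dataset.
    covers = union == set(range(n))
    # Condition 3: clusters are pairwise disjoint, i.e. no index is counted twice.
    disjoint = sum(len(s) for s in sets) == len(union)
    return non_empty and covers and disjoint
```

A clustering expressed as one label per descriptor satisfies the second and third conditions by construction, so only the non-emptiness of each cluster remains to be verified in that case.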

Each cluster is then reduced to a single representative descriptor, obtained in this work as the average of all the descriptors that compose the cluster. This yields a set of representatives $R = \{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_{n_c}\}$, which constitutes the compacted model and forms the high-level layer.
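The averaging step can be sketched in a few lines of NumPy. The sketch assumes the descriptors are stacked as rows of an `(N, l)` array and that the clustering result is available as one integer label per descriptor; the function name `cluster_representatives` and this data layout are assumptions made for illustration.

```python
import numpy as np

def cluster_representatives(descriptors, labels):
    """Average the descriptors of each cluster.

    descriptors: (N, l) array, real or complex, one descriptor per row.
    labels: length-N array with the cluster label of each descriptor.
    Returns the sorted cluster labels and an (n_c, l) array of representatives.
    """
    descriptors = np.asarray(descriptors)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    # One representative per cluster: the mean of its member descriptors.
    reps = np.stack([descriptors[labels == c].mean(axis=0) for c in cluster_ids])
    return cluster_ids, reps
```

Because the mean is taken component-wise, the same code applies to complex-valued descriptors without modification.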

Figure 3 shows how a sample map is compacted. Figure 3a shows the positions where panoramic images were captured to cover the whole environment. The result of the clustering process is presented in Figure 3b, and one representative per cluster is then obtained (Figure 3c). The representative descriptor is the average of the descriptors grouped in the cluster, and its position is calculated as the average position of the capture points of the images included in that cluster. These positions serve only as a ground truth to test the performance of the compact map in a localization process; they are used neither to build the map nor to localize the robot. Only visual information is used for these purposes. Different clustering methods will be analysed. These methods use only visual information, and ideally they should group images captured from nearby positions despite visual aliasing. To evaluate the correctness of the approach, the geometrical compactness of the clusters and their utility to solve the localization task will be tested in Section 5.
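The separation between the visual model and the ground-truth positions can be illustrated with a short sketch. It assumes a nearest-neighbour comparison with Euclidean distance between a test descriptor and the representatives, which this section does not specify; the function names and array layouts are likewise hypothetical. Positions enter only in the final error computation.

```python
import numpy as np

def representative_positions(positions, labels):
    """Average capture position of each cluster (ground truth only)."""
    positions = np.asarray(positions, dtype=float)
    labels = np.asarray(labels)
    return np.stack([positions[labels == c].mean(axis=0) for c in np.unique(labels)])

def localization_error(test_descriptor, test_position, representatives, rep_positions):
    """Select a representative using visual information, then measure the error."""
    # Visual step: nearest representative by Euclidean distance (assumed metric).
    diffs = np.asarray(representatives) - np.asarray(test_descriptor)
    best = int(np.argmin(np.linalg.norm(diffs, axis=1)))
    # Evaluation step: positions are used here and nowhere else.
    offset = np.asarray(rep_positions)[best] - np.asarray(test_position, dtype=float)
    return best, float(np.linalg.norm(offset))
```

In this layout, `rep_positions` can be discarded without affecting either the construction of the map or the selection of the nearest representative.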

Two clustering methods are studied to compact the visual models: spectral clustering and self-organizing maps.

**Figure 3.** Example of an indoor map and a compression of the information. (**a**) Positions where the images were captured. (**b**) Result of the clustering process. (**c**) Each cluster is reduced to one representative.
