*4.3. Resolution of the Localization Problem in a Compact Model*

Image retrieval can be an inefficient process because maps are usually composed of a large number of images and the descriptors are high-dimensional, so the computational cost can become a problem. To address this, clustering is used to compact the map. Additionally, indoor environments may present visual aliasing. As explained in Section 3, after clustering, the map $M$ is formed only by a set of clusters $C = \{C_1, \ldots, C_{n_c}\}$, where $n_c$ is the number of clusters. For each cluster, a representative descriptor was calculated as the average of the descriptors it contains, and the coordinates of that representative as the average of the coordinates of the images whose descriptors compose the cluster. Thus, a set of cluster representatives $\{\vec{r}_1, \ldots, \vec{r}_{n_c}\}$ and the coordinates of each representative $\{(x, y)_{r_1}, \ldots, (x, y)_{r_{n_c}}\}$ are known (ground truth).
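The construction of the compact map can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the cluster label of every image has already been produced by the clustering step of Section 3, and the function name `compact_map` is hypothetical. Each representative $\vec{r}_j$ is the element-wise mean of the descriptors in $C_j$, and its position $(x, y)_{r_j}$ is the mean position of the corresponding images.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Point = Tuple[float, float]


def compact_map(descriptors: Sequence[Vector],
                positions: Sequence[Point],
                labels: Sequence[int]) -> Tuple[List[Vector], List[Point]]:
    """Hypothetical helper: one representative descriptor and position per cluster.

    `labels[i]` is the (assumed, precomputed) cluster of image i.
    Clusters are returned in ascending label order.
    """
    # Group image indices by cluster label.
    members: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        members[label].append(i)

    reps: List[Vector] = []
    rep_pos: List[Point] = []
    for label in sorted(members):
        idx = members[label]
        n = len(idx)
        dim = len(descriptors[idx[0]])
        # Representative descriptor: element-wise mean of the member descriptors.
        reps.append([sum(descriptors[i][k] for i in idx) / n for k in range(dim)])
        # Representative position: mean (x, y) of the member images.
        rep_pos.append((sum(positions[i][0] for i in idx) / n,
                        sum(positions[i][1] for i in idx) / n))
    return reps, rep_pos


reps, rep_pos = compact_map([[0.0, 2.0], [2.0, 4.0], [10.0, 10.0]],
                            [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)],
                            [0, 0, 1])
print(reps)     # [[1.0, 3.0], [10.0, 10.0]]
print(rep_pos)  # [(0.5, 0.5), (5.0, 5.0)]
```

Storing only $n_c$ representatives instead of every image descriptor is what reduces the size of the map that must be searched during localization.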

The localization in this compact map is carried out as follows: (1) the robot captures a new image $im_t$ from an unknown position $(x_t, y_t)$, which must be estimated; (2) the descriptor $\vec{d}_t$ of the new image is obtained using any of the description algorithms explained in Section 2 (FS, HOG, or *gist*); (3) the distance vector $\vec{l}_t = \{l_{t1}, \ldots, l_{tn_c}\}$ is obtained, where $l_{tj} = \mathrm{dist}\{\vec{d}_t, \vec{r}_j\}$ is the distance (one of the three types explained in Section 4.1) between the descriptor $\vec{d}_t$ and the representative $\vec{r}_j$; and finally, (4) the estimated position of the robot $(x_e, y_e)$ is the position associated with the nearest neighbour $\vec{r}_{nn}$, where $nn = \arg\min_j l_{tj}$.
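Steps (3) and (4) can be sketched as below. This is only an illustration under stated assumptions: the descriptor $\vec{d}_t$ is assumed to be already computed (steps 1 and 2), the Euclidean distance is used as a stand-in for whichever of the three distances of Section 4.1 is chosen, and the names `localize` and `euclidean` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Point = Tuple[float, float]


def euclidean(a: Vector, b: Vector) -> float:
    # Assumed stand-in for one of the distances of Section 4.1.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def localize(d_t: Vector,
             reps: Sequence[Vector],
             rep_pos: Sequence[Point],
             dist: Callable[[Vector, Vector], float] = euclidean) -> Point:
    """Hypothetical helper: return (x_e, y_e) for a query descriptor d_t."""
    # Step (3): distance vector l_t, one entry per cluster representative r_j.
    l_t: List[float] = [dist(d_t, r_j) for r_j in reps]
    # Step (4): nn = arg min_j l_tj (the first index wins on ties).
    nn = min(range(len(l_t)), key=l_t.__getitem__)
    # The estimate is the position associated with the nearest representative.
    return rep_pos[nn]


print(localize([9.0, 9.0],
               [[1.0, 3.0], [10.0, 10.0]],
               [(0.5, 0.5), (5.0, 5.0)]))  # (5.0, 5.0)
```

Passing the distance as a parameter keeps the search independent of the metric, so any of the distances compared in Section 4.1 could be plugged in without changing the rest of the procedure.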

The coordinates of the representatives are not used to compare descriptors during the localization step; they only provide the estimated position $(x_e, y_e)$ once the nearest neighbour has been found. To measure the goodness of the estimation, the geometric distance between the true position $(x_t, y_t)$ and the centre of the selected cluster (obtained as the average of the positions of the images that belong to that cluster) is calculated:

$$error = \sqrt{(x_e - x_t)^2 + (y_e - y_t)^2}$$

Furthermore, the computational cost required to estimate the position is measured.
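The two evaluation measures can be sketched as follows. The error is the Euclidean distance given above; measuring the computational cost as wall-clock time with `time.perf_counter` is an assumption made for this illustration, and the names `localization_error` and `timed` are hypothetical.

```python
import math
import time
from typing import Callable, Tuple

Point = Tuple[float, float]


def localization_error(estimated: Point, ground_truth: Point) -> float:
    """Geometric distance between (x_e, y_e) and the true position (x_t, y_t)."""
    (x_e, y_e), (x_t, y_t) = estimated, ground_truth
    return math.hypot(x_e - x_t, y_e - y_t)


def timed(estimate: Callable[[], Point]) -> Tuple[Point, float]:
    """Run one localization call; return its result and elapsed seconds (assumed cost metric)."""
    start = time.perf_counter()
    result = estimate()
    return result, time.perf_counter() - start


print(localization_error((5.0, 5.0), (2.0, 1.0)))  # 5.0
estimate, seconds = timed(lambda: (5.0, 5.0))
```

In practice both values would be averaged over all test images, so that the trade-off between the size of the compact map and the localization accuracy can be compared across clustering configurations.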
