#### *3.3. HDBSCAN*

The HDBSCAN algorithm [28], which builds on concepts similar to those defined for OPTICS, computes a complete clustering hierarchy composed of all possible density-based clusters over a wide range of density thresholds. It then chooses the clustering model that maximizes the overall stability of the extracted clusters. To build this hierarchy, HDBSCAN relies on the concept of *mutual reachability distance* between two core points *p* and *q*, given a value for the parameter *min*\_*pts*. The *mutual reachability distance* is defined as the minimum radius such that *p* and *q* are mutually density-reachable, that is, the largest among the core distance of *p*, the core distance of *q*, and the distance between *p* and *q*. Unlike the *reachability distance* introduced above, the *mutual reachability distance* is symmetric.
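As an illustration of the definition above, the following minimal sketch computes the mutual reachability distance for a small point set. It assumes Euclidean distance and the convention that a point counts among its own *min*\_*pts* nearest neighbors; the function names `core_distance` and `mutual_reachability` are hypothetical and do not come from any library.

```python
import math


def core_distance(points, i, min_pts):
    """Distance from points[i] to its min_pts-th nearest neighbor.

    Assumption: the point itself counts as its own first neighbor.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[min_pts - 1]


def mutual_reachability(points, i, j, min_pts):
    """Largest of the two core distances and the plain distance."""
    return max(
        core_distance(points, i, min_pts),
        core_distance(points, j, min_pts),
        math.dist(points[i], points[j]),
    )
```

For the points (0, 0), (1, 0), (2, 0), (10, 0) with *min*\_*pts* = 3, the Euclidean distance between the first two points is 1, whereas their mutual reachability distance is 2, because the core distance of the first point is 2. Swapping the two arguments gives the same value, which reflects the symmetry of the measure.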

HDBSCAN works as follows. First, it builds the clustering hierarchy by computing a *mutual reachability graph*, which is a complete graph where each vertex is a data point and each edge is weighted with the mutual reachability distance of the pair of points it connects. Then, a minimum spanning tree is computed on that graph and extended by adding, for each vertex, a self-edge weighted by the *core-distance* of the related data point. The tree is processed by removing its edges in decreasing order of weight. After each removal, the two connected components containing the endpoints of the removed edge are labeled as a new pair of clusters, or as noise if a component no longer contains any edge. A variation of this algorithm also considers a given minimum cluster size (*min*\_*cluster*\_*size* parameter), which prevents the generation of clusters smaller than *min*\_*cluster*\_*size*. Given the clustering hierarchy, the clustering that maximizes the overall stability is extracted. The notion of stability is derived from the notion of *excess of mass* [29]. HDBSCAN computes the clustering hierarchy and extracts the clusters in *O*(*n*<sup>2</sup>) time, which can be infeasible for large datasets and represents its main drawback with respect to DBSCAN and OPTICS.
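The first steps of the procedure above can be sketched as follows. This is only an illustrative sketch under stated assumptions: it uses Euclidean distance, counts a point among its own *min*\_*pts* neighbors, and builds the minimum spanning tree with Prim's algorithm, which is a choice made here and not something stated in [28]. All function names are hypothetical. The sketch stops at the ordered list of edges to remove and does not perform the cluster labeling or the stability-based extraction.

```python
import math


def mutual_reachability_matrix(points, min_pts):
    """Dense matrix of mutual reachability distances plus core distances."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Core distance: distance to the min_pts-th neighbor (self included).
    core = [sorted(row)[min_pts - 1] for row in d]
    mreach = [[max(core[i], core[j], d[i][j]) for j in range(n)]
              for i in range(n)]
    return mreach, core


def minimum_spanning_tree(weights):
    """Prim's algorithm on a dense weight matrix, O(n^2) time."""
    n = len(weights)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((weights[u][parent[u]], parent[u], u))
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return edges


def removal_order(points, min_pts):
    """MST edges plus self-edges, sorted by decreasing weight."""
    mreach, core = mutual_reachability_matrix(points, min_pts)
    edges = minimum_spanning_tree(mreach)
    # Self-edges weighted by the core distance of each point.
    edges += [(core[i], i, i) for i in range(len(points))]
    return sorted(edges, reverse=True)
```

Each returned tuple has the form `(weight, u, v)`, with `u == v` for self-edges. Both the dense matrix and the Prim loop take quadratic time in the number of points, which matches the *O*(*n*<sup>2</sup>) cost discussed above.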

#### *3.4. City Hotspot Detector*

The *city hotspot detector* (CHD) algorithm [30] is a multi-density clustering algorithm purposely designed for processing urban spatial data. The algorithm is composed of several steps. First, given a fixed *min*\_*pts*, the reachability distance of each point is computed and used as an estimator of the density of that point. Then, the points are sorted with respect to their estimated density, and the density variation between each pair of consecutive points in the ordered list is computed. The resulting density variation list is smoothed by applying a rolling mean operator over windows of size *s*. The points are then partitioned into several *density level sets* on the basis of the smoothed density variations, and a different *ε* value is estimated for each density level set. Finally, each set is analyzed by a separate instance of the DBSCAN algorithm. Specifically, each instance takes as input the *ε* value computed for the density level set it analyzes. The set of clusters detected across all partitions constitutes the final result of the CHD algorithm. The CHD algorithm runs with *O*(*n* log *n*) time complexity, where *n* is the size of the processed dataset. A cluster analysis with the CHD algorithm requires tuning three parameters, which is more than the previously introduced algorithms require.
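The partitioning steps described above can be illustrated with the following rough sketch, which is not the reference implementation of [30]. Several details are assumptions made only for illustration: the density estimator is taken to be the distance to the *min*\_*pts*-th nearest neighbor (self excluded), the variation is computed as a relative difference, the rolling mean is a trailing one, and a new density level set is opened whenever the smoothed variation rises above a hypothetical threshold `omega`. The per-set parameter estimation and the DBSCAN runs are omitted.

```python
import math


def k_distance(points, k):
    """Distance from each point to its k-th nearest neighbor (self excluded)."""
    return [sorted(math.dist(p, q) for q in points)[k] for p in points]


def rolling_mean(values, s):
    """Trailing mean over windows of size s (shorter at the start)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - s + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def density_level_sets(points, min_pts, s, omega):
    """Split point indices into groups of similar estimated density."""
    kdist = k_distance(points, min_pts)
    # Sort indices from densest (small distance) to sparsest.
    order = sorted(range(len(points)), key=lambda i: kdist[i])
    ordered = [kdist[i] for i in order]
    # Relative variation between consecutive points (assumed form).
    variation = [(b - a) / a if a > 0 else 0.0
                 for a, b in zip(ordered, ordered[1:])]
    smooth = rolling_mean(variation, s)
    levels, current, above = [], [order[0]], False
    for idx, v in zip(order[1:], smooth):
        # Hypothetical rule: cut when the smoothed variation crosses omega.
        if v > omega and not above:
            levels.append(current)
            current = []
        above = v > omega
        current.append(idx)
    levels.append(current)
    return levels
```

On a toy dataset with four points spaced 1 apart and four points spaced 10 apart, calling `density_level_sets(points, 1, 1, 0.5)` separates the two groups into two density level sets, each of which could then be passed to DBSCAN with its own radius.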

#### **4. Experimental Evaluation and Results**

In this section, we provide a comparative analysis of the four density-based clustering algorithms described in Section 3, namely CHD, DBSCAN, HDBSCAN, and OPTICS-Xi, by assessing the quality of the clusters they detect and their ability to process datasets characterized by areas with different densities. The comparison is based on the results obtained by analyzing two datasets provided with target cluster labels, which are treated as ground truth during the evaluation process. The experiments were carried out using the implementations provided by the scikit-learn Python library for DBSCAN and OPTICS-Xi, the hdbscan Python library for HDBSCAN, and the R implementation of the CHD algorithm available on GitLab (CHD R-code: https://gitlab.com/chd3/chd-rcode/, accessed on 18 December 2022).
