#### *4.1. Data Description*

The datasets chosen for the comparative analysis are *chess* and *compound*, two cluster-labeled datasets available in the literature [31,32], whose data instances and target clusters are shown in Figures 2 and 3. In particular, the two datasets have different characteristics, which make them complementary test cases.


**Figure 2.** The *Chess* dataset: (**a**) data instances and (**b**) target clusters.

**Figure 3.** The *Compound* dataset: (**a**) data instances and (**b**) target clusters.

The multi-density distribution of instances, as well as their multi-shape partitions, makes these datasets very appropriate for our analysis, because they model different scenarios on which to test and validate the algorithms.

#### *4.2. Results on State-of-the-Art Data*

In order to evaluate the performance of the selected clustering algorithms on the above-introduced datasets, we compare the results obtained by the cluster analysis, i.e., the *discovered clusters*, with the ground-truth labels provided by the datasets, i.e., the *target clusters*. By matching the discovered clusters against the target clusters, we can evaluate the effectiveness of the clustering algorithms. To do so, we adopt the following set of external metrics, designed to be employed when ground-truth labels are available: *Fowlkes*, *Adjusted Rand*, *Adjusted Mutual Information* (*AMI*), *V*-*measure*, *Accuracy*, *F*-*measure*, *Jaccard*, Γ, *Rand*, and *Homogeneity* (more details about these metrics are reported in [33]).

In general, the listed metrics consider the number of items that are incorrectly allocated, i.e., items not assigned to a cluster of points sharing the same target cluster label. According to an external criterion, the result of a clustering algorithm is more satisfactory when fewer items are incorrectly allocated. All the above-listed metrics can assume values in the range [0, 1], where a value of 1 corresponds to a perfect match between discovered and target clusters, and lower values to the presence of a higher number of incorrectly allocated items. Therefore, such external metrics can be exploited to compare the performance results of clustering algorithms according to objective quantitative criteria.
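As an illustration, the sketch below shows how a subset of the listed metrics can be computed with scikit-learn; the choice of library is our assumption, since the paper does not prescribe one. The remaining metrics (*Accuracy*, *F*-*measure*, *Jaccard*, Γ) require an explicit matching between discovered and target labels and are omitted from this sketch.

```python
# Minimal sketch: external validation of discovered clusters against
# ground-truth labels using scikit-learn (our assumption; any
# implementation of the listed metrics would work equally well).
from sklearn import metrics

def external_scores(target, discovered):
    """Score a clustering against ground-truth labels with a subset
    of the external metrics listed above."""
    return {
        "Fowlkes":       metrics.fowlkes_mallows_score(target, discovered),
        "Adjusted Rand": metrics.adjusted_rand_score(target, discovered),
        "AMI":           metrics.adjusted_mutual_info_score(target, discovered),
        "V-measure":     metrics.v_measure_score(target, discovered),
        "Rand":          metrics.rand_score(target, discovered),
        "Homogeneity":   metrics.homogeneity_score(target, discovered),
    }

# Hypothetical usage with toy labels (not taken from the datasets):
target     = [0, 0, 1, 1, 2, 2]
discovered = [0, 0, 1, 2, 2, 2]
for name, value in external_scores(target, discovered).items():
    print(f"{name:>13}: {value:.3f}")
```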

It is worth noting that, for each clustering algorithm, the choice of the input parameters directly impacts the quality of the results; therefore, to make a fair comparison between the clustering algorithms, the input parameters must be carefully chosen with respect to the analyzed dataset. Let us recall that CHD receives *k*, *ω*, and *s* as input parameters; DBSCAN requires the setting of *ε* and *min*\_*pts*; HDBSCAN receives *min*\_*cluster*\_*size* and *min*\_*pts* [28] as input parameters; and OPTICS-Xi requires the setting of *ξ* and *min*\_*pts*.
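For reference, the sketch below shows how these parameters map onto widely used open-source implementations (scikit-learn's DBSCAN and OPTICS, and the hdbscan package); this mapping is our assumption, since the paper does not state which implementations were used, and CHD has no counterpart in these libraries.

```python
# Sketch: instantiating DBSCAN, HDBSCAN, and OPTICS-Xi with the input
# parameters named in the text (library choice is our assumption; CHD
# is the authors' own algorithm and is omitted here).
from sklearn.cluster import DBSCAN, OPTICS
import hdbscan  # pip install hdbscan

db  = DBSCAN(eps=0.14, min_samples=4)        # eps = ε, min_samples = min_pts
hdb = hdbscan.HDBSCAN(min_cluster_size=3,    # min_cluster_size
                      min_samples=4)         # min_pts (placeholder value)
opt = OPTICS(cluster_method="xi", xi=0.066,  # ξ
             min_samples=4)                  # min_pts (placeholder value)
```

The DBSCAN, HDBSCAN, and OPTICS-Xi values above are the best ones reported for the *chess* dataset in this section; the *min*\_*pts* values for HDBSCAN and OPTICS-Xi are placeholders, as the corresponding fixed values appear only in Table 1.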

In this paper, we adopted a parameter sweeping methodology for selecting the input parameters. This methodology consists of running several instances of each algorithm with different parameter settings. For each algorithm, the parameter setting yielding the best average performance, computed as the average of the above-listed metrics, is chosen. This process enables the modeler to determine the "best" value of each parameter. Table 1 shows some details about the experimental setting adopted during the parameter sweeping. In particular, for each algorithm, the table reports the fixed parameter values, the parameter to be swept and its range of values, the obtained best parameter value, and the corresponding best average performance.
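A minimal sketch of this sweep is shown below for DBSCAN's *ε* on a single dataset, reusing the `external_scores` helper from the earlier sketch; the same loop applies, under the same assumptions, to the other algorithms and their swept parameters.

```python
# Sketch of the parameter sweeping: run the algorithm over a range of
# values for one parameter and keep the value that maximizes the average
# of the external metrics (the selection criterion described above).
import numpy as np
from sklearn.cluster import DBSCAN

def sweep_eps(X, target, eps_values, min_pts=4):
    best_eps, best_avg = None, -np.inf
    for eps in eps_values:
        # fit_predict labels noise points as -1
        discovered = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
        avg = np.mean(list(external_scores(target, discovered).values()))
        if avg > best_avg:
            best_eps, best_avg = eps, avg
    return best_eps, best_avg

# Hypothetical usage, assuming X holds the data instances and target the
# ground-truth labels; the eps range mirrors the chess experiments below.
# eps_star, avg_star = sweep_eps(X, target, np.arange(0.08, 0.23, 0.01))
```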


**Table 1.** Experimental setting for the parameter sweeping for each algorithm.

Figure 4 reports the first set of experimental results, obtained on the *chess* dataset. The figure shows how the quality indices vary versus the swept input parameter values. Regarding the CHD algorithm (Figure 4a), it is clear that the trend is strongly affected by the values of the *ω* parameter, and the best results are obtained for *ω*<sup>∗</sup> = 1.00. The DBSCAN algorithm is evaluated by varying the *ε* parameter from 0.08 to 0.22 (*min*\_*pts* = 4), and the best result is achieved for *ε*<sup>∗</sup> = 0.14 (see Figure 4b). Similarly, we evaluate different input parameter settings for HDBSCAN (Figure 4c) and OPTICS-Xi (Figure 4d). In these cases as well, small variations of the input parameters strongly affect the quality of the results. The best results are achieved for *min*\_*cluster*\_*size*<sup>∗</sup> = 3 for HDBSCAN and *ξ*<sup>∗</sup> = 0.066 for OPTICS-Xi.

**Figure 4.** The *Chess* dataset: clustering quality indices vs. different input parameter values. (**a**) CHD. (**b**) DBSCAN. (**c**) HDBSCAN. (**d**) OPTICS-Xi.

Similarly, we run several tests on the *compound* dataset, whose results are reported in Figure 5. In particular, the figure shows how the quality indices vary versus the input parameter values. We can observe that, for this dataset as well, the input parameter values strongly affect the clustering quality and the performance index values. As a result, we find that CHD achieves the best result for *ω*<sup>∗</sup> = 2.50, DBSCAN for *ε*<sup>∗</sup> = 1.53, HDBSCAN for *min*\_*cluster*\_*size*<sup>∗</sup> = 15, and OPTICS-Xi for *ξ*<sup>∗</sup> = 0.33.

**Figure 5.** The *Compound* dataset: clustering quality indices vs. different input parameter values. (**a**) CHD. (**b**) DBSCAN. (**c**) HDBSCAN. (**d**) OPTICS-Xi.

A quantitative performance comparison among the considered algorithms is presented in Figure 6, where the values of the clustering indices are shown for the *chess* and *compound* datasets, referring only to the run with the best combination of input parameters. In addition, Figure 7 plots the number of noise points and the number of detected clusters for both datasets. From the presented results, we can draw the considerations discussed below.



**Figure 6.** Best clustering results for the four algorithms on the (**a**) *chess* and (**b**) *compound* datasets.

**Figure 7.** Number of noise points (**a**) and number of clusters (**b**) detected by the four algorithms on the two datasets.

Finally, Figures 8 and 9 show a qualitative comparison among the clustering models detected by the four algorithms on the two datasets. In particular, by observing Figure 8 (*chess* dataset), we can see that CHD detects 15 clusters with quite good separability, and the number of noise points (in black) is very low compared with the other algorithms. In contrast, DBSCAN and HDBSCAN detect fewer clusters than CHD but a higher number of noise points, while OPTICS-Xi labels many instances as noise points, which makes its clustering quality very low. Turning to Figure 9 (*compound* dataset), we can see that CHD and DBSCAN achieve good separability among all clusters, whereas HDBSCAN and OPTICS-Xi are not able to separate the two clusters on the upper left side. It is also worth noting that DBSCAN, OPTICS-Xi, and HDBSCAN fail to detect the large low-density cluster on the right (cluster 1 in Figure 3b), labeling it as noise; that cluster is detected only by CHD.


**Figure 8.** The *Chess* dataset: detected clusters. (**a**) CHD. (**b**) DBSCAN. (**c**) HDBSCAN. (**d**) OPTICS-Xi.

**Figure 9.** The *Compound* dataset: detected clusters. (**a**) CHD. (**b**) DBSCAN. (**c**) HDBSCAN. (**d**) OPTICS-Xi.

#### **5. A Real-Case Study: Detecting Multi-Density Crime Hotspots in Chicago**

To evaluate the performance and assess the effectiveness of the approaches described in Section 3 in discovering city hotspots in a real-world scenario, we perform a comparative evaluation on geo-referenced crime events collected over a large area of Chicago. In particular, these tests aim to show a concrete use case in which density-based clustering analysis can be exploited, and to demonstrate the practical usefulness of the selected clustering algorithms for discovering city hotspots in real urban settings.
