#### *4.1. Dataset*

Several numerical experiments on small and medium dataset sizes were carried out to validate the effectiveness of the clustering algorithm. Three groups of clustering test problems were used to evaluate the proposed MHTSASM algorithm against other clustering techniques, such as K-M, TS, GA, and SA.

Firstly, three standard test problems were considered to compare the proposed algorithm with other heuristics and meta-heuristics: the K-M algorithm [42], the TS method, GA, the SA method, and the non-smooth optimization technique for the minimum sum-of-squares clustering problem (Algorithm 1) [43]. The first is the German towns clustering test, given in Cartesian coordinates, with 59 towns (59 records) and two attributes. The second is a set of 89 postal zones in Bavaria, with 89 records and 3 attributes. The third is another set of 89 postal zones in Bavaria, with 89 records and 4 attributes. The results for these datasets are presented in Section 4.2.1, First Clustering Test Problems Situation.

Secondly, two further test datasets were used to compare the proposed MHTSASM algorithm with the K-M algorithm, the J-means (J-M+) algorithm [25], and the variable neighborhood search algorithms VNS, VNS+, and VNS-1 [22]. In addition, we compared MHTSASM with the global K-means (G1) algorithm [24], the fast global K-means (G2) algorithm [38,39], and Algorithm 1 [40,44]. These datasets are (1) the first Germany postal zones dataset; and (2) the Fisher's iris dataset, which includes 150 instances and 4 attributes [45–47]. The results for these datasets are presented in Section 4.2.2, Second Clustering Test Problems Situation.

Thirdly, another five datasets called Iris, Glass, Cancer, Contraceptive Method Choice (CMC), and Wine, as examples of low-, medium-, and high-dimensional data, were used to compare the proposed MHTSASM algorithm with K-harmonic means (KHM), particle swarm optimization (PSO), the hybrid clustering method based on KHM and PSO, called the PSOKHM algorithm [25], ACO, and the hybrid clustering method based on ACO and KHM, called the ACOKHM algorithm [26,27]. The results for these datasets are presented in Section 4.2.3, Third Clustering Test Problems Situation.

Table 2 summarizes the characteristics of these datasets [47]. Furthermore, several methods, namely the (1) KHM; (2) PSO; and (3) PSOKHM algorithms [25], were selected in order to compare their performances. In addition, other methods, namely the (1) KHM; (2) ACO; and (3) ACOKHM algorithms [27,28], were selected for the same purpose.


**Table 2.** Datasets characteristics.

Moreover, each dataset has two characteristics that can directly affect the clustering algorithms' performance: (1) the instances, i.e., the number of data records; and (2) the features, i.e., the number of data attributes. The list of parameter values for the proposed MHTSASM algorithm is presented in Table 3 [28].

**Table 3.** Parameters setting.


#### *4.2. Clustering Test Problems Situation Results*

Tables 4 and 5 report the best values for the global minimum. The values are given as nf(x<sup>∗</sup>), where x<sup>∗</sup> is the local minimizer and n is the number of instances [28]. Moreover, the error percentage E of each algorithm is shown, calculated using Equation (5):

$$E\% = \frac{f - f\_{\text{opt}}}{f\_{\text{opt}}} \ast 100\tag{5}$$

where f represents the solution found by the algorithm, and fopt represents the best known solution.
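Equation (5) can be sketched in a few lines of Python; this is only an illustration of the error measure (the function name is ours, not the paper's):

```python
def error_percentage(f: float, f_opt: float) -> float:
    """Relative error E of Equation (5): deviation of the solution f found
    by an algorithm from the best known solution f_opt, as a percentage."""
    return (f - f_opt) / f_opt * 100.0

# Example: an algorithm returns f = 105.0 against a best known f_opt = 100.0
print(error_percentage(105.0, 100.0))  # → 5.0
```

A negative value indicates the algorithm found a solution better than the previously best known one.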


**Table 4.** Results of first problems situation.

**Table 5.** Results of second problems situation.


#### 4.2.1. First Clustering Test Problems Situation

The numerical experimental results in Table 4 were taken over seven runs. In Tables 4 and 5, K refers to the number of clusters, and fopt refers to the best identified value.

As shown in Table 4, the proposed MHTSASM algorithm reached the best results compared with all other clustering techniques for the German towns dataset. The MHTSASM, TS, and GA results were the same, and SA and Algorithm 1 gave the same results for all numbers of clusters.

Furthermore, as shown in Table 4, the proposed MHTSASM algorithm achieved the best results for all numbers of clusters on the first Bavarian postal zones dataset. The results from Algorithm 1 and MHTSASM were the same. For three clusters, K-M, TS, GA, and SA gave poor results; for four clusters they performed better than for three. GA gave good results except for three clusters.

The results presented in Table 4 show that, on the second Bavarian postal zones dataset, TS achieved the best results for all numbers of clusters except five, where it gave a poor result. The results from Algorithm 1 and MHTSASM were the same for all numbers of clusters. For two clusters, all algorithms except TS gave poor results. For three and four clusters, MHTSASM, Algorithm 1, and GA gave good results, but SA and K-M gave poor ones. For five clusters, the proposed MHTSASM algorithm and Algorithm 1 gave good results, but K-M, TS, GA, and SA gave poor ones.

The results indicated that the proposed MHTSASM algorithm could reach solutions that were better than or very similar to those found using the global optimization methods. Therefore, the proposed MHTSASM algorithm can compute deep local minima of the clustering problem's objective function.
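The objective function being minimized throughout these comparisons is the minimum sum-of-squares clustering function mentioned in Section 4.1. As a rough sketch (the function name and sample data are ours, purely for illustration), it can be evaluated for a given set of centers as follows:

```python
import numpy as np

def sum_of_squares(points: np.ndarray, centers: np.ndarray) -> float:
    """Minimum sum-of-squares clustering objective: each point is charged
    the squared Euclidean distance to its nearest center, and the charges
    are summed over all points."""
    # Pairwise squared distances, shape (n_points, n_centers)
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Each point contributes only its distance to the closest center
    return float(d2.min(axis=1).sum())

points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
centers = np.array([[0.0, 0.5], [10.0, 10.0]])
print(sum_of_squares(points, centers))  # → 0.5
```

Because this function is non-smooth and non-convex in the center coordinates, local-search methods such as K-M can stall in shallow local minima, which is what the meta-heuristic components of MHTSASM are designed to escape.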

#### 4.2.2. Second Clustering Test Problems Situation

Another two datasets, namely (1) the first Germany postal zones dataset, and (2) the Fisher's iris dataset, which includes 150 instances and 4 attributes [8,21], were used to evaluate the proposed MHTSASM algorithm against other clustering techniques, such as K-M, J-M+, VNS, VNS+, VNS-1, G1, and G2. The numerical experimental results for this evaluation are presented in Table 5, where K refers to the number of clusters, and fopt refers to the best identified value [28].

The average results over 10 restarts for K-M, J-M+, and VNS were used, as seen in Table 5. The results for the first Germany postal zones dataset indicated that the proposed MHTSASM algorithm achieved better results than K-M, J-M+, VNS, G1, G2, and Algorithm 1. The proposed MHTSASM algorithm performed the same as VNS-1.

As shown in Table 5, the results for the Fisher's iris dataset indicated that the proposed MHTSASM algorithm achieved better performance than the K-M technique for all numbers of clusters except K = 2, and better results than J-M+ for K = 2, 3 [28]. MHTSASM gave better results than G2 for K = 6, 7, 8, 9, 10. The proposed MHTSASM algorithm and VNS-1 gave better results than all other algorithms for K = 6, 7, 8. Algorithm 1 and G1 were quite similar; both gave better results than the proposed MHTSASM algorithm for K = 2, 3, 4, 5. VNS gave better results than MHTSASM for K = 2, 3, 4, 9. Compared with Algorithm 1, the G1 technique achieved similar or better performance for all K values except K = 10. The proposed MHTSASM algorithm gave better results than Algorithm 1 for K = 7, 9; better results than G2 for K = 6, 7, 8, 9, 10; better results than G1 for K = 7, 8, 10; and better results than VNS for K = 6, 8, 10. VNS-1 can be considered the best technique for the Fisher's iris dataset, as it achieved the best identified optimal values. Hence, the results indicated that the deviation of the solutions found using the proposed MHTSASM algorithm from the global minimum was equal to or less than zero for all K.

#### 4.2.3. Third Clustering Test Problems Situation

Another five datasets, known as Iris, Glass, Cancer, Contraceptive Method Choice (CMC), and Wine, were employed to validate the proposed MHTSASM algorithm. Furthermore, several methods, namely the (1) KHM, (2) PSO, and (3) PSOKHM algorithms [25], were selected in order to compare their performances [28]. In addition, the (1) KHM, (2) ACO, and (3) ACOKHM algorithms [26,27] were selected for the same purpose.

The quality of the clustering methods was compared using the F-measure criterion. The F-measure is an external quality measure applicable to quality evaluation tasks [48–52]. It uses the notions of precision and recall from information retrieval [53–57], and measures how correctly each point is clustered relative to its original class. Every class i, assumed using the class labels of the benchmark dataset, is regarded as the set of n<sub>i</sub> items desired for a query. Every cluster j created by the clustering algorithm is regarded as the set of n<sub>j</sub> items retrieved for the query. Let n<sub>ij</sub> denote the number of elements of class i inside cluster j. For every class i inside cluster j, the precision and recall are then defined as shown in Equation (6):

$$p\_{(i,j)} = \frac{n\_{ij}}{n\_{j}}, \; r\_{(i,j)} = \frac{n\_{ij}}{n\_{i}} \tag{6}$$

and the corresponding value under the F-measure is calculated by Equation (7): 

$$F\_{(i,j)} = \frac{\left(b^2 + 1\right) \ast p\_{(i,j)} \ast r\_{(i,j)}}{b^2 \ast p\_{(i,j)} + r\_{(i,j)}} \tag{7}$$

where we assume b = 1 in order to give equal weight to r<sub>(i,j)</sub> and p<sub>(i,j)</sub>. The total F-measure for a dataset of size n can be calculated using Equation (8):

$$F = \sum\_{i} \frac{n\_{i}}{n} \max\_{j} \left\{ F\_{(i,j)} \right\} \tag{8}$$
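Equations (6)–(8) can be sketched together as a short Python function. This is an illustrative implementation assuming class and cluster assignments are given as label sequences; the names are ours, not from the paper:

```python
from collections import Counter

def f_measure(classes, clusters, b: float = 1.0) -> float:
    """Total F-measure of Equation (8): for each class i, take the best
    F(i,j) over all clusters j (Eq. 7), weighted by the class size n_i/n."""
    n = len(classes)
    n_i = Counter(classes)                   # class sizes n_i
    n_j = Counter(clusters)                  # cluster sizes n_j
    n_ij = Counter(zip(classes, clusters))   # joint counts n_ij
    total = 0.0
    for i, ni in n_i.items():
        best = 0.0
        for j, nj in n_j.items():
            nij = n_ij.get((i, j), 0)
            if nij == 0:
                continue
            p = nij / nj                     # precision of class i in cluster j
            r = nij / ni                     # recall of class i in cluster j
            f = (b**2 + 1) * p * r / (b**2 * p + r)
            best = max(best, f)
        total += ni / n * best
    return total

# A perfect clustering maps each class onto exactly one cluster, giving F = 1
print(f_measure([0, 0, 1, 1], [5, 5, 7, 7]))  # → 1.0
```

Higher values of F indicate a clustering that better reproduces the benchmark class labels, with F = 1 for a perfect match.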

The results in Table 6 show the quality of clustering evaluated using the F-measure over five real datasets. The results report means and standard deviations over 10 independent runs.
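The per-dataset statistics in Table 6 are simple run aggregates; a minimal sketch of how such a mean/standard-deviation pair is computed (the run values below are made up for illustration, not the paper's data):

```python
import statistics

# Hypothetical F-measure values from 10 independent runs (illustrative only)
runs = [0.78, 0.80, 0.79, 0.81, 0.77, 0.80, 0.79, 0.78, 0.82, 0.80]
mean = statistics.mean(runs)
std = statistics.stdev(runs)  # sample standard deviation over the runs
print(f"{mean:.3f} ± {std:.3f}")  # → 0.794 ± 0.015
```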


**Table 6.** Results of KHM, PSO, PSOKHM, and the proposed MHTSASM algorithm when p = 2.5, 3, and 3.5, respectively.

The numerical experimental results for KHM, PSO, and PSOKHM when p = 2.5, presented in Table 6, indicate that the proposed MHTSASM algorithm performed better than KHM, PSO, and PSOKHM for all datasets [28]. Furthermore, PSOKHM performed better than KHM and PSO for the Iris and Glass datasets. KHM performed better than PSO and PSOKHM for the CMC dataset. PSOKHM and KHM performed the same for the Wine and Cancer datasets.

The numerical experimental results for KHM, PSO, and PSOKHM when p = 3, presented in Table 6, show that the proposed MHTSASM algorithm performed better than KHM, PSO, and PSOKHM for all datasets. PSOKHM performed better than KHM and PSO for the Glass and Wine datasets. PSOKHM and KHM performed the same for the Iris, Cancer, and CMC datasets.

The numerical experimental results for KHM, PSO, and PSOKHM when p = 3.5, presented in Table 6, show that the proposed MHTSASM algorithm performed better than KHM, PSO, and PSOKHM for all datasets. PSOKHM performed better than KHM and PSO for the Cancer and Wine datasets. KHM performed better than PSO and PSOKHM for the Iris dataset. PSOKHM and KHM performed the same for the Glass and CMC datasets.
