*5.2. Experimental Analysis*

We compare our algorithm *MLevClusTree* with *ClusTree* in [16] with respect to data utility, security, and efficiency. Because *l*-diversity ensures that an equivalence class contains at least *l* hierarchical data records, we set *k* = *l* and vary *k* from 2 to 6. Each plotted point is the mean over 10 experiments.

The average information loss of a hierarchical data record for *MLevClusTree* and *ClusTree* is shown in Figure 5. In both figures, the information loss increases with *k*: when *k* increases, an equivalence class contains more hierarchical data records, so more general values are more likely to be needed for every QI attribute, and the information loss grows. For dataset *B* with *h* = 3, more vertices of each hierarchical data record must be generalized, so the information loss is higher than that for dataset *A* with *h* = 2. Although *MLevClusTree* handles multiple sensitive values lying in the same level, it evaluates different sensitivity levels with different constraints. As a result, the information loss of *MLevClusTree* is lower than that of *ClusTree*; i.e., the utility of *MLevClusTree* is better.

**Figure 5.** Information loss on two datasets: (**a**) Dataset *A* with *h* = 2; (**b**) Dataset *B* with *h* = 3.
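The exact information-loss formula is not reproduced in this section; as an illustrative stand-in, the following sketch uses a generic NCP-style (Normalized Certainty Penalty) measure commonly applied to taxonomy-based generalization, where the loss of a generalized QI value grows with the number of leaf values it covers. The toy taxonomy and function names here are assumptions for illustration only.

```python
def leaf_count(node, children):
    """Number of leaf values under `node` in the generalization taxonomy."""
    kids = children.get(node, [])
    if not kids:
        return 1
    return sum(leaf_count(c, children) for c in kids)

def ncp(value, children, total_leaves):
    """Loss in [0, 1]: 0 for an ungeneralized leaf, 1 for the taxonomy root."""
    n = leaf_count(value, children)
    return 0.0 if total_leaves <= 1 else (n - 1) / (total_leaves - 1)

# Toy taxonomy for one QI attribute: Any -> {Europe, Asia} -> cities.
children = {"Any": ["Europe", "Asia"],
            "Europe": ["Paris", "Rome"],
            "Asia": ["Tokyo", "Seoul"]}
total = leaf_count("Any", children)    # 4 leaves

print(ncp("Paris", children, total))   # 0.0 (leaf, no generalization)
print(ncp("Europe", children, total))  # ~0.33 (covers 2 of 4 leaves)
print(ncp("Any", children, total))     # 1.0 (fully generalized)
```

Under such a measure, larger equivalence classes force values higher in the taxonomy, which is consistent with the observed growth of information loss with *k*.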

The security of *MLevClusTree* and *ClusTree* is evaluated by the dissimilarity degree of an equivalence class, and the results are shown in Figure 6, where the ordinate denotes the average dissimilarity degree of an equivalence class, computed with Equation (7). Since this measure depends on the distribution of sensitivity levels rather than on the hierarchy depth, the results for dataset *A* with *h* = 2 and dataset *B* with *h* = 3 are not significantly different. As *k* increases, an equivalence class contains sensitive values from more sensitivity levels, so the dissimilarity degree of a vertex in the class representative increases, and the average dissimilarity degree of an equivalence class increases. Figure 6 also shows that the average dissimilarity degree for *MLevClusTree* is higher than that for *ClusTree*, because our approach restricts the proportion of sensitive values in each sensitivity level. Therefore, our approach better resists similarity attacks and improves data security.

**Figure 6.** Dissimilarity degree of equivalence class on two datasets: (**a**) Dataset *A* with *h* = 2; (**b**) Dataset *B* with *h* = 3.
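Equation (7) itself is not reproduced in this excerpt. Purely as an assumed illustration of the intuition, one simple proportion-based measure scores a class by how mixed its sensitivity levels are: 1 minus the largest proportion held by any single level. A class dominated by one level scores near 0 (vulnerable to a similarity attack), while an evenly mixed class scores higher; this matches the behavior described above, where restricting per-level proportions raises the dissimilarity degree.

```python
from collections import Counter

def dissimilarity(levels):
    """Illustrative stand-in for Equation (7), which is not shown here.
    `levels` lists the sensitivity level of each sensitive value in the class."""
    counts = Counter(levels)
    return 1.0 - max(counts.values()) / len(levels)

print(dissimilarity([1, 1, 1, 1]))  # 0.0  -- one level dominates entirely
print(dissimilarity([1, 2, 1, 3]))  # 0.5  -- partially mixed
print(dissimilarity([1, 2, 3, 4]))  # 0.75 -- most diverse for 4 values
```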

Finally, we evaluate the efficiency of the algorithms by execution time; the results are shown in Figure 7. The execution time of both algorithms increases with *k*. For every equivalence class *Q* in the hierarchical data, the first hierarchical data record is selected at random, so no distance computation is needed for it; for every subsequent record, we must scan part of the hierarchical data to find the record whose distance to the current *Q* is approximately minimum. As *k* increases, the size of an equivalence class grows, and so does the runtime. The time for dataset *B* also exceeds that for dataset *A*, because hierarchical data with more levels needs more time to find the record whose distance to the current *Q* is approximately minimum. Figure 7 further shows that the execution time of *MLevClusTree* is slightly higher than that of *ClusTree* as *k* increases, since for every equivalence class *MLevClusTree* must check whether the number of sensitive values in each sensitivity level exceeds the given threshold.

**Figure 7.** Execution time on two synthetic datasets: (**a**) Dataset *A* with *h* = 2; (**b**) Dataset *B* with *h* = 3.
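The greedy construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the record distance, the `level_of` accessor, and the per-level quota `max_per_level` (standing in for *MLevClusTree*'s sensitivity-level threshold check) are all assumed names for this example.

```python
import random

def build_class(records, k, distance, level_of, max_per_level):
    """Greedily build one equivalence class; returns (class, remaining)."""
    remaining = list(records)
    # The first record is chosen at random: no distance computation needed.
    seed = remaining.pop(random.randrange(len(remaining)))
    q = [seed]
    while len(q) < k and remaining:
        # Scan remaining records for the one closest to the current class,
        # skipping candidates that would exceed a sensitivity-level quota.
        best_i, best_d = None, float("inf")
        for i, r in enumerate(remaining):
            level_count = sum(1 for m in q if level_of(m) == level_of(r))
            if level_count + 1 > max_per_level:
                continue
            d = min(distance(r, m) for m in q)
            if d < best_d:
                best_d, best_i = d, i
        if best_i is None:  # no admissible candidate left
            break
        q.append(remaining.pop(best_i))
    return q, remaining

# Toy usage: records are (value, sensitivity_level) pairs.
records = [(0, 1), (1, 1), (2, 2), (9, 3), (10, 2)]
random.seed(0)
q, rest = build_class(records, k=3,
                      distance=lambda a, b: abs(a[0] - b[0]),
                      level_of=lambda r: r[1],
                      max_per_level=2)
print(len(q))  # 3
```

The inner scan over `remaining` for each added record is what makes the runtime grow with *k*, and the quota check is the extra per-class work that makes *MLevClusTree* slightly slower than *ClusTree*.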

These experimental results show that our *MLevClusTree* provides stronger privacy protection and lower information loss, although it takes more time. This overhead is acceptable because the anonymization process is performed offline.
