*5.3. Experimental Results*

As mentioned above, we use the UNSW-NB15 data set as our experimental data set. A total of 50,801 data were randomly selected from the training set and test set to carry out the clustering calculation of the alarm. The extracted alarm label distribution is shown in Table 3.


**Table 3.** Distribution of alarm labels in 50,801 pieces of data.

Attack types marked 2–10 in the table represent specific types of attacks, while the type marked 1 is normal network traffic. It is not difficult to see that this data set contains a large number of normal network traffic, which is identified by IDS as a malicious alarm, resulting in a large number of false alarms. Our goal is to use clustering methods to correctly identify normal network traffic and eliminate it from alarms.

We analyze the source IP address and destination IP address in the experimental data set as follows. The data set contains two types of IP addresses: Class B and Class C (such as Class C IP addresses 149.171.126.43 and 149.171.126.50). We can extract the same fields and classify the IP addresses, and the uncertain part is represented by X. The statistics of each type of IP address and its number are shown in Table 4.

**Table 4.** Distribution of source and destination IP addresses.


After analyzing the experimental data, we first explore the performance of WOAHC-L in looking for a single alarm cluster center. The specific parameters of WOAHC-L are set as follows: *SearchAgents* = 10 and *MaxIter* = 50, respectively, which means that we use 10 search agents to search the solution space of the clustering at the same time, and the maximum number of iterations of the algorithm is 50. Other parameters in WOAHC-L, such as *r*, *a*, *p*, etc., are generated by random numbers or are related to *MaxIter* so we do not have to set it up. The fitness function is set as the equation mentioned in Section 4.2.

According to the mechanism of WOAHC-L, the cluster center obtained by calling the algorithm for the first time has the best fitness value. In order to explore the performance of WOA in hierarchical clustering, we independently perform WOAHC-L four times and record the change in the optimal fitness value with the number of iterations during each algorithm execution. The performance of WOAHC-L is shown in Figure 14. The X axis is the number of iterations and the Y axis is the fitness value of the currently found optimal cluster center.

**Figure 14.** Comparison of fitness values of the best search agen<sup>t</sup> obtained by four calls to WOAHC-L.

As can be seen from Figure 14, WOAHC-L constantly seeks the most excellent cluster center in the search space through its exploitation phase and exploration phase. It can be seen from the figure that WOAHC-L has an excellent ability to break through the local optimum, which can be clearly seen from the stage of 20–50 iteration times.

After verifying that WOAHC-L has excellent global search capability in alarm clustering, we continue to explore the use of WOAHC-L to solve the local clustering problem of alarm hierarchical clustering, that is, every time we call the WOAHC-L, an alarm cluster is obtained, which is obtained through the iterative search of WOAHC-L, according to the fitness function. Taking *MaxIter* of 50 as the maximum number of iterations and the number of search agents of 10 as an example, we perform WOAHC-L four times and obtain four cluster centers successively from 50,801 alarms. The number of alarms contained in the four cluster centers varies with the number of iterations as shown in Figure 15.

**Figure 15.** Alarm clustering result of 4 consecutive calls to WOAHC-L.

According to the experimental statistics, the number of alarms contained in each cluster obtained by calling WOAHC-L four times is as follows. By analyzing the obtained cluster center, we find that the tag field in the cluster center obtained by the first, second and fourth calls to WOAHC-L is (*null*, <sup>0</sup>), and the tag field in the cluster center obtained by the third calls to WOAHC-L is (*Generic*, <sup>1</sup>). In other words, these four calls to WOAHC-L distinguish the generic attacks from the normal traffic in the original data set by clustering. We add up the number of alarms generated by the first, second and fourth calls to WOA to 45,469, which is within the allowable range when compared with the number of normal network traffic 44,387 in the table. The third call to the WOAHC-L cluster is 3819, which is within an acceptable distance of the 4257 alarms for the generic attacks in the table. This experiment proves that hierarchical clustering using WOAHC-L can well distinguish the type of alarm.

After proving WOAHC-L's excellent local and global search ability and clustering ability, we next explore statistical analysis of WOAHC-L on time consumption and clustering results. Based on the algorithm flow of WOAHC-L in Figure 12, when the number of remaining alarms no longer meets the clustering threshold, the algorithm ends, and the clustering result is output. Therefore, we constantly call WOAHC-L to obtain new clustering until the condition of the algorithm ending is met. The experimental results are shown in Figure 16.

**Figure 16.** Time consuming and alarm clustering results versus the execution times of WOAHC-L.

As can be seen from Figure 16, as the number of algorithm executions increases, the time of each algorithm execution gradually decreases, and the number of alarms contained in the cluster obtained by each invocation of the algorithm also decreases. The reason is obviously that every time the algorithm is called to obtain the cluster center, the alarms belonging to the cluster in the alarm set are removed from the alarm set, so the next time the algorithm is called, the alarm items scanned by the algorithm are fewer and fewer, and the time and number of alarms are also reduced.

In order to verify the robustness of WOAHC-L, we conduct five experiments. In each experiment, WOAHC-L is called several times to obtain multiple cluster centers, until the amount of remaining data in the data set are less than the requirement of alarm clustering.

As can be seen from Table 5, except the first row, there are five rows in the table, representing five experiments. Except the first column, from a total of 12 columns, the top 11 columns represent each experiment that invokes the WOAHC-L to obtain cluster center containing the number of alarms. What we need to pay attention to is that in the table there are some gaps, such as the third line 10 columns; these blanks show that the experiments, after several WOAHC-L calls, have met the conditions of the end of the algorithm, such as the second experiment. We find that WOAHC-L is called eight times and then stops executing. The last column of the table represents the total time of the experiment. Compared with the five experiments, the average time and standard deviation of the total time spent in our calculation algorithm are 58.7 s and 5.3, which are within the allowable range of the experiment. In addition, we should also note that although the number of cluster centers generated by each experiment is different (the five experiments are 11, 8, 10, 10, and 9, respectively), as we mentioned before, cluster centers generated by this kind of local clustering method will cause the problem of a too high coincidence degree of cluster centers. However, we can study the experimental data in the first experiment of two previous cluster centers containing the sum of the number of alarms for 33,303. For the properties of alarm analysis, we find that they all belong to the normal network traffic, so we can combine the two-cluster center as a cluster center, representing the clustering of normal network traffic. Excluding the problem of clustering overlap, we have reason to believe that the robustness of the algorithm is relatively excellent.

Meanwhile, in order to compare with other algorithms, we repeat the alarm clustering algorithm mentioned in the references [45–47] and compare it with WOAHC-L. In order to more intuitively feel the results of clustering, we use the number of alarms contained in the cluster center as the Y axis to compare the four algorithms. Please note that it is not accurate to rely only on the number of clusters contained in the cluster center to evaluate the quality of the clustering results. The quality of the clustering results is also greatly related to the distance between the alerts contained in the cluster center (called difference

in some works), that is, the fitness function mentioned in the formula. The experimental results are shown in Figure 17.

**Table 5.** The results of the number of alarms contained in the cluster and the number of algorithms calls in five experiments.


**Figure 17.** Comparison of experimental results between WOAHC-L and Julisch's method, SA and GA.

As can be seen from Figure 17, compared with Julisch's algorithm, SA and GA, WOAHC-L can well jump out of the local optimum in the process of clustering search, while other algorithms begin to converge at the beginning or middle of the iteration and no longer search for a better cluster center. We need to emphasize that the purpose of this experiment is to intuitively show that compared with the other three algorithms, WOAHC-L can better jump out of the local optimal and achieve better results when carrying out hierarchical clustering. A more detailed comparison of experimental results with other algorithms is presented in the following experiments. We use the following formula to calculate the values of these indicators.

$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \tag{14}$$

$$Precision = \frac{TP}{TP + FP} \tag{15}$$

$$Recall = \frac{TP}{TP + FN} \tag{16}$$

where *TP* represents the number of normal network traffic clustering to normal cluster, *TN* represents the number of network attack clustering to attack cluster, *FP* represents the number of network attack alarms incorrectly clustering to normal cluster, *FN* represents the number of normal network traffic incorrectly clustering to the attack cluster. *Accuracy*

represents the accuracy of the clustering; *Precision* represents how much of the data clustered as normal traffic are really normal traffic; and *Recall* represents how much of the normal traffic is correctly clustered.

Table 6 shows the comparison of various indicators of different algorithms. These indicators have specific meanings, introduced in Table 1.

**Table 6.** A detailed comparison of experimental results between WOA and Julisch's method, GA and SA.


As can be seen from Table 6, WOAHC-L is obviously superior to other algorithms in terms of the number of clustering, number of remaining alarms, accuracy, precision and recall. It should be noted that recall refers to the rate of aggregation of false alarms, or reduction in redundant alarms. However, WOAHC-L has obvious deficiencies in terms of time consumption, which is caused by the multi-search agen<sup>t</sup> mechanism of the WOA and the calculation of the fitness value.

As the excellent performance of WOAHC-L in alarm reduction is proved by comparing with other algorithms, we explore the stability of WOAHC-L in clustering and the rule analysis of clustering results. In order to avoid the contingency of experimental results, we conduct eight experiments, where each experiment performs several WOAHC-L times until the number of remaining alarms no longer meets the clustering requirements. The results of the experiment are shown in Figure 18. The X axis represents the number of cluster centers, that is, the number of WOAHC-L calls, and the Y axis represents the number of remaining alarms in the original alarm set.

According to Figure 18, we find that in the eight experiments, WOAHC-L generates cluster centers with more alarms in the previous calls. According to the output of the code, we can find that these cluster centers are all clusters related to redundant alarms, which is consistent with our expected assumption: WOAHC-L first clusters redundant false alarms that account for a higher proportion in the alarm data set, and then clusters the remaining real alarms. We set the alarm number threshold of algorithm end condition as 500 and observe that the times of calling WOAHC-L are different in the eight groups of experiments (10, 6, 10, 7, 8, 6, 10, and 7, respectively). In Section 4.4, we explained the reason for this, that is, there may be an overlap between the cluster centers generated by WOAHC-L, and it is confirmed in our experiment.

Through a series of experiments above, we prove the superiority of WOAHC-L in the process of alarm reduction, but its disadvantages are also exposed: the algorithm is time consuming, the results obtained by multiple executions of the algorithm have certain differences (also known as idempotency), and the high degree of overlap between some cluster centers lead to an increase in algorithm calls. To solve this problem, we propose a global clustering version of WOAHC, namely WOAHC-G, and explore the performance of WOAHC-G in the following experiments.

As mentioned in Section 4.4, WOAHC-G only needs to be called once to obtain all the cluster centers, provided that the number of cluster centers need to be specified, which is a difficult problem. However, with the experimental results of WOAHC-L as a reference, we can choose an appropriate number of cluster centers as a parameter. Here, we set the number of cluster centers as eight, which is based on the above experimental results of the comprehensive consideration. Meanwhile, our search agen<sup>t</sup> code is 72 dimensions (each cluster center contains nine attributes, a total of eight cluster centers). We still select 10 search agents to search for the cluster space. Other parameters of WOAHC-G are set as

above, and the fitness function of the global version is selected, adding the influence of clustering coincidence degree. Under the same data set configuration, we conduct eight experiments and use a boxplot to represent the number of alarms in each cluster, as shown in Figure 19.

**Figure 18.** Comparison of the final results of eight experiments.

In each boxplot in Figure 19, the blue line at the top represents the maximum, the blue line at the bottom represents the minimum, the upper edge of the square represents the upper quartile, the lower edge represents the lower quartile, the red line represents the median, the green dot represents the average, and the white dots represent outliers. As can be seen from Figure 19, when we fix the number of clustering as 8, the experimental results generated by the eight experiments are basically not significantly different. As can be seen from the number of alarms in cluster 1 to cluster 3, the global clustering version using WOAHC-G can effectively eliminate the problem of excessive cluster repetition. However, the problem of the global clustering version is that it is difficult to determine an appropriate number of cluster centers. In this round of experiments, we set the number of cluster centers to 8 according to the experience of WOAHC-L to eliminate the impact of this problem. However, how to determine an appropriate number of cluster centers requires further research and analysis in the future.

Finally, we discuss the time complexity of the proposed algorithm framework. It is assumed that WOAHC eventually generates *C* clusters, with a total of *M* alarms to be clustered, the maximum iteration time of the algorithm is *T*, there are *N* search agents searching the solution set space at the same time, and each hierarchical tree has *m* nodes. For WOAHC-L, the time complexity of calculating the fitness value is *O*(*M* + *log*2*<sup>m</sup>*), so the total time complexity of the algorithm is *O*(*C* ∗ *T* ∗ *N* ∗ (*M* + *log*2*<sup>m</sup>*)). During our experiment, *N* threads are used to search *N* search agents at the same time, so the time complexity can be optimized to *O*(*C* ∗ *T* ∗ (*M* + *log*2*<sup>m</sup>*)). For WOAHC-G, the time complexity of calculating the fitness value is *<sup>O</sup>log*2*<sup>m</sup>* + *M* + *<sup>C</sup>*<sup>2</sup>, so the overall time complexity is *OT* ∗ *N* ∗ *log*2*<sup>m</sup>* + *M* + *<sup>C</sup>*<sup>2</sup>, and the time complexity after multithreading optimization is *OT* ∗ *log*2*<sup>m</sup>* + *M* + *<sup>C</sup>*<sup>2</sup>.

**Figure 19.** Boxplots of the number of alarms in 8 clusters.

#### **6. Conclusions and Future Work**

In previous studies, in order to solve the problem of a large number of redundant alarms generated by IDS, some scholars proposed the hierarchical alarm clustering algorithm. With the development of the swarm intelligence optimization algorithm, the WOA has been widely used to solve various optimization problems. In this paper, we propose two new methods to solve the problem of alarm reduction by applying the WOA to hierarchical clustering, namely WOAHC-L and WOAHC-G. Through experiments on the UNSW-NB data set, we prove that WOAHC-L can solve the problems of the clustering falling into the local optimum and the poor clustering quality well. Experimental results show that WOAHC-L can achieve 95.2% clustering accuracy and reduce redundant alarms by 96.1%. Compared with WOAHC-L, which generates more than 11 clusters, WOAHC-G reduces the problem of clustering overlap by generating 8 accurate clusters.

Although our new method achieves good results in dealing with redundant alarms, it still has some limitations. For example, WOAHC-L can solve the problem of premature convergence in the clustering process well, but it is easy to produce an excessive number of repeated clusters. WOAHC-G solves this problem well, but it requires the space size of multiple orders of magnitude to encode the hierarchical tree. Meanwhile, WOAHC-G requires a prior number of clusters to ensure that the clustering is not repeated, which will make it difficult to deal with real network problems.

Future work will consider shifting the focus of work to the processing of security events in the real network environment and studying how to adjust a more excellent fitness function to obtain more accurate clustering. In addition, the adaptive setting of the clustering number of WOAHC-G will also be a key point of future work.

**Author Contributions:** All authors contributed to this manuscript. Conceptualization, L.W.; methodology, L.W.; data curation, L.W.; supervision L.G.; validation, L.G. and Y.T.; resources, L.G.; writing— original draft preparation, L.W.; writing—review and editing, L.G., Y.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Publicly available datasets were analyzed in this study. This data can be found here: [https://research.unsw.edu.au/projects/unsw-nb15-dataset, accessed on 18 November 2021]. **Conflicts of Interest:** The authors declare no conflict of interest.
