**1. Introduction**

As computer network technology continues to develop, people depend more and more on the convenience of the internet. At the same time, the openness and complexity of networks make security threats more complex and diverse. To limit the damage caused by these threats, many network security technologies are widely used, such as firewalls, intrusion detection systems (IDSs), and vulnerability scanners [1]. This study focuses on the IDS, and in particular on how to improve its efficiency and performance when handling network security events.

IDSs can be divided into two categories: signature-based and anomaly-based [2]. A signature-based IDS classifies network traffic as malicious or normal by matching it against a maintained knowledge base [3]. An anomaly-based IDS flags traffic as malicious when it deviates from a model of normal behavior [4]. For either type, the ability to identify different kinds of network attacks is an important measure of effectiveness. Building an intrusion detection system therefore requires a network data set as support. In past studies, many open network data sets served as benchmarks, such as KDDCup99 [5] and NSL-KDD [6], which were widely used across the field of network security. However, with the rapid development of network technology and the emergence of new cyber security threats, these data sets have become outdated. In recent years, many new network data sets have been published, such as DDoS 2016 [7], UNSW-NB15 [8] and CICIDS 2017 [9], and scholars are gradually adopting these relatively new data sets in their studies. New network data sets also continue to attract attention, such as LITNET-2020 [10], proposed by Damasevicius et al. in 2020 and based on a real network environment. These data sets usually have a high-dimensional feature space, and features may be of different types, such as numerical and categorical. Given the size of these data sets, missing values are inevitable. Researchers have proposed a series of methods, such as clustering, to deal with this problem [11–13].

**Citation:** Wang, L.; Gu, L.; Tang, Y. Research on Alarm Reduction of Intrusion Detection System Based on Clustering and Whale Optimization Algorithm. *Appl. Sci.* **2021**, *11*, 11200. https://doi.org/10.3390/app112311200

Academic Editor: Ming-Chin Chuang

Received: 31 October 2021 Accepted: 19 November 2021 Published: 25 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

According to literature statistics [14,15], an IDS generates a large number of alarms in a very short period of time, 85% of which are irrelevant or false alarms. In past studies, many scholars have used different techniques to deal with the redundant alarms generated by the IDS [16]. These methods can be broadly divided into clustering-based methods [17–19], attribute-similarity-based methods [1,20], expert-system-based methods [21,22], genetic-algorithm-based methods [23,24], and data-mining-based methods [25,26], among others.

In recent years, swarm intelligence optimization algorithms, a class of heuristic algorithms, have received increasing attention from researchers [27]. Algorithms of this kind are well suited to NP-hard optimization problems. The whale optimization algorithm (WOA), an emerging swarm intelligence optimization algorithm, was proposed by Mirjalili and Lewis in 2016 [28]. Mirjalili and Lewis took inspiration from the hunting behavior of humpback whales and abstracted the process into concrete mathematical equations. WOA has been applied in many academic fields with good results [29]. The specific applications and theoretical background of WOA are described in detail in Sections 2 and 3.
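To make the idea of turning hunting behavior into equations more concrete, the following is a minimal sketch of the commonly cited WOA update rules (encircling the best solution, a random search move, and a logarithmic spiral), written in plain Python. It is an illustration under stated assumptions, not the method developed in this paper: the function names `woa_minimize` and `sphere`, the toy objective, the parameter defaults, and the use of scalar rather than per-dimension random coefficients are all hypothetical choices made for brevity.

```python
import math
import random


def sphere(x):
    """Toy objective (hypothetical): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def woa_minimize(objective, dim, bounds, n_whales=20, n_iter=200, b=1.0, seed=0):
    """Minimal WOA sketch; all parameter names and defaults are illustrative."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial population of candidate solutions ("whales").
    whales = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_val = objective(best)

    for t in range(n_iter):
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        for i, x in enumerate(whales):
            # Simplification: one scalar A and C per whale, not per dimension.
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: move around the best whale (encircling prey);
                # otherwise: move relative to a random whale (search for prey).
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [ref[j] - A * abs(C * ref[j] - x[j]) for j in range(dim)]
            else:
                # Spiral move towards the best whale (bubble-net attack).
                l = rng.uniform(-1.0, 1.0)
                spiral = math.exp(b * l) * math.cos(2.0 * math.pi * l)
                new = [abs(best[j] - x[j]) * spiral + best[j] for j in range(dim)]
            # Keep the new position inside the search bounds.
            whales[i] = [min(hi, max(lo, v)) for v in new]

        # Remember the best position found so far.
        for x in whales:
            val = objective(x)
            if val < best_val:
                best, best_val = x[:], val

    return best, best_val


if __name__ == "__main__":
    position, value = woa_minimize(sphere, dim=3, bounds=(-10.0, 10.0))
    print(position, value)
```

The coefficient `a` shrinks over the iterations, so early iterations favor moves relative to random whales (exploration) and later ones concentrate around the best position found (exploitation). Using a seeded `random.Random` instance keeps repeated runs reproducible.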

The main contributions and findings of this paper are as follows:


The rest of this paper is structured as follows. Section 2 introduces related work. Section 3 provides the theoretical background and introduces the framework of hierarchical clustering and the method of alarm distance calculation. In Section 4, we propose our new methods for hierarchical alarm clustering, named WOAHC-L and WOAHC-G. Section 5 describes the experiments and presents the results and our discussion. Section 6 concludes the paper.
