#### 3.1.1. Generalization Hierarchies

We first introduce the concept of a generalization hierarchy. As mentioned earlier, if newly generated alarms are arranged into meaningful clusters according to predefined rules, operators can more easily understand what is happening in the network. Following this idea, we define the concept of a cluster and assign each alarm to the cluster it belongs to according to the rules. We use the basic idea of hierarchical clustering proposed by Julisch [44,45]. As shown in Figure 1, every attribute of an alarm can be organized into levels by hierarchical division. Figure 1a shows the hierarchy tree for the IP attribute, in which each leaf node represents a unique, specific IP address. Generalizing a leaf once yields the type of service that uses this IP, such as firewall and WWW/FTP in Figure 1a. Generalizing again yields higher-level groups, such as DMZ and EXTERN. When the generalization has reached the highest level and cannot be generalized further, we have reached the root of the hierarchy tree. For example, the root of the hierarchy tree for the IP attribute is ANY IP. The generalization structure of the other attributes is similar, as shown in Figure 1b–d. Many methods for constructing such hierarchy trees have been proposed in the literature, and they can be used to construct a hierarchy tree for each attribute of the alarm data.

**Figure 1.** Hierarchical tree structure of four attributes: (**a**) IP address attribute; (**b**) time attribute measured in weeks; (**c**) port number attribute; (**d**) time attribute measured in months.

After providing the construction process of a hierarchical tree, we provide the following definitions of nodes in the hierarchical tree.

**Definition 1.** *A basic alarm is an alarm triggered by the IDS whose attributes correspond to leaf nodes in the hierarchy trees. An abstract alarm is derived from a basic alarm by generalization and has attributes that correspond to intermediate or root nodes in the hierarchy trees. Naturally, a basic alarm is also a special abstract alarm.*

**Definition 2.** *In a hierarchical tree, if there is a downward path from node N*<sub>1</sub> *to node N*<sub>2</sub>*, then N*<sub>1</sub> *is a generalization of N*<sub>2</sub>*, and N*<sub>2</sub> *is a specification of N*<sub>1</sub>*.*

**Definition 3.** *For two abstract alarms A*<sub>1</sub> *and A*<sub>2</sub>*, A*<sub>1</sub> *is a generalization of A*<sub>2</sub> *if each attribute in A*<sub>1</sub> *is a generalization of the corresponding attribute in A*<sub>2</sub>*; in that case, A*<sub>2</sub> *is a specification of A*<sub>1</sub>*.*

**Definition 4.** *For an alarm set, the minimum cover is the most specific common generalization of all alarms in the set, that is, a common generalization such that none of its proper specifications is also a common generalization.*

Based on the four definitions above and the four hierarchical trees shown in Figure 1, consider an alarm set that contains three alarms: *A*<sub>1</sub> (ip1, 80, h1, 11), *A*<sub>2</sub> (DMZ, 80, h0, MIDDLE), and *A*<sub>3</sub> (DMZ, PRIVATE, WEEKEND, MIDDLE). *A*<sub>1</sub> is a basic alarm because all of its attributes are leaf nodes in the hierarchical trees. *A*<sub>2</sub> and *A*<sub>3</sub> are abstract alarms because at least one of their attributes is an intermediate node in a hierarchy tree. *A*<sub>3</sub> is a generalization of *A*<sub>1</sub> and *A*<sub>2</sub> because every attribute of *A*<sub>3</sub> is a generalization of the corresponding attribute in *A*<sub>1</sub> and in *A*<sub>2</sub>. Since every alarm is trivially a generalization of itself, *A*<sub>3</sub> is a common generalization of *A*<sub>1</sub>, *A*<sub>2</sub>, and *A*<sub>3</sub>.
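Definitions 1 to 3 can be illustrated with a minimal Python sketch. The two hierarchy trees below are hypothetical stand-ins for Figure 1a and Figure 1c (their exact shape is an assumption), alarms are reduced to two attributes (IP, port) for brevity, and the function names are chosen here for illustration only.

```python
# Hypothetical hierarchy trees stored as child -> parent maps.
# Node names follow the text; the exact tree shapes are assumed.
IP_PARENT = {
    "ip1": "WWW/FTP", "ip2": "Firewall", "ip3": "EXTERN",
    "WWW/FTP": "DMZ", "Firewall": "DMZ",
    "DMZ": "ANY_IP", "EXTERN": "ANY_IP",
}
PORT_PARENT = {
    "80": "PRIVATE", "21": "PRIVATE", "1234": "NON_PRIVATE",
    "PRIVATE": "ANY_PORT", "NON_PRIVATE": "ANY_PORT",
}
PARENTS = [IP_PARENT, PORT_PARENT]  # one tree per alarm attribute


def path_to_root(node, parent):
    """Return [node, parent(node), ..., root]."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def generalizes(n1, n2, parent):
    """Definition 2: n1 lies on the path from n2 up to the root.

    A node is treated as a (trivial) generalization of itself.
    """
    return n1 in path_to_root(n2, parent)


def alarm_generalizes(a1, a2, parents):
    """Definition 3: attribute-wise generalization."""
    return all(generalizes(x, y, p) for x, y, p in zip(a1, a2, parents))


def is_basic(alarm, parents):
    """Definition 1: every attribute is a leaf (it is nobody's parent)."""
    return all(attr not in p.values() for attr, p in zip(alarm, parents))


a1 = ("ip1", "80")       # basic alarm
a2 = ("DMZ", "80")       # abstract alarm
a3 = ("DMZ", "PRIVATE")  # abstract alarm covering a1 and a2
print(is_basic(a1, PARENTS), alarm_generalizes(a3, a1, PARENTS))
```

Storing each tree as a child-to-parent map keeps the generalization test to a single walk toward the root, which is sufficient for the small trees considered here.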

#### 3.1.2. Distance Definition

After obtaining the generalized alarm set, we need a distance measure in order to cluster the alarms in the original alarm set, that is, a distance between two alarms that lets us judge whether they belong to the same cluster. Calculating the distance between attributes of a numeric type is straightforward, but the same calculation cannot be applied directly when an alarm attribute is a category, a time, or a string. We therefore give the following definitions to calculate the distance between two alarms in a hierarchical tree.

**Definition 5.** *The distance between any two nodes in the same hierarchical tree is the number of edges on the path between them. If two nodes are directly connected by an edge, the distance between them is 1.*

**Definition 6.** *If there is a generalization–specification relationship between two alarms, the distance between the two alarms is defined as the average distance between their attributes.*

**Definition 7.** *The distance of an alarm set is defined as the average distance between the minimum cover of the set and each alarm in the set.*

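Consider the alarm set consisting of *A*<sub>1</sub> (ip1, 80, h1, 11), *A*<sub>2</sub> (DMZ, 80, h0, MIDDLE), and *A*<sub>3</sub> (DMZ, PRIVATE, WEEKEND, MIDDLE) mentioned above, whose minimum cover is *A*<sub>3</sub>. The distance between *A*<sub>1</sub> and *A*<sub>2</sub> is the average of the four attribute distances, (2 + 0 + 2 + 1)/4 = 1.25. The distance of the alarm set is the average of the distances from the minimum cover to each of the three alarms, (1.5 + 0.75 + 0)/3 = 0.75, where the 0 is the distance from *A*<sub>3</sub> to itself.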
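Definitions 4 to 7 can be sketched in Python as follows. The hierarchy trees are hypothetical (Figure 1 is not reproduced here) and alarms carry only two attributes, so the numbers differ from the four-attribute example above. All function names are illustrative assumptions, and `alarm_distance` is evaluated for any pair of alarms, although Definition 6 only states it for alarms in a generalization–specification relationship.

```python
# Hypothetical child -> parent maps; the tree shapes are assumed.
IP_PARENT = {
    "ip1": "WWW/FTP", "ip2": "Firewall", "ip3": "EXTERN",
    "WWW/FTP": "DMZ", "Firewall": "DMZ",
    "DMZ": "ANY_IP", "EXTERN": "ANY_IP",
}
PORT_PARENT = {
    "80": "PRIVATE", "21": "PRIVATE", "1234": "NON_PRIVATE",
    "PRIVATE": "ANY_PORT", "NON_PRIVATE": "ANY_PORT",
}
PARENTS = [IP_PARENT, PORT_PARENT]


def path_to_root(node, parent):
    """Return [node, parent(node), ..., root]."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def node_distance(n1, n2, parent):
    """Definition 5: number of edges on the tree path between n1 and n2."""
    p1, p2 = path_to_root(n1, parent), path_to_root(n2, parent)
    common = next(n for n in p1 if n in p2)  # lowest common ancestor
    return p1.index(common) + p2.index(common)


def alarm_distance(a1, a2, parents):
    """Definition 6: average of the attribute-wise node distances."""
    total = sum(node_distance(x, y, p) for x, y, p in zip(a1, a2, parents))
    return total / len(parents)


def minimum_cover(alarms, parents):
    """Definition 4: lowest common ancestor, attribute by attribute."""
    cover = []
    for i, parent in enumerate(parents):
        paths = [path_to_root(a[i], parent) for a in alarms]
        cover.append(next(n for n in paths[0] if all(n in p for p in paths)))
    return tuple(cover)


def set_distance(alarms, parents):
    """Definition 7: mean distance from the minimum cover to each alarm."""
    cover = minimum_cover(alarms, parents)
    return sum(alarm_distance(cover, a, parents) for a in alarms) / len(alarms)


alarms = [("ip1", "80"), ("DMZ", "80"), ("DMZ", "PRIVATE")]
print(minimum_cover(alarms, PARENTS), set_distance(alarms, PARENTS))
```

In this toy setting the minimum cover is (DMZ, PRIVATE), and the set distance is (1.5 + 0.5 + 0)/3, which mirrors the structure of the calculation in the text.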

#### 3.1.3. Definition of the Clustering Problem

The clustering problem can now be stated as follows: among all triggered alarms, find a group of generalized alarms such that the number of alarms covered by each generalized alarm is greater than or equal to a given threshold and the distance between the alarms is as small as possible. This problem has been shown to be NP-complete, so an exact solution cannot, in general, be obtained in feasible time. Julisch presented an approximation algorithm [44], shown in Algorithm 1.
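To make the problem statement concrete, the following Python sketch implements a simple greedy heuristic: it generalizes one attribute at a time until some generalized alarm covers at least `min_size` alarms. It is not a reproduction of Algorithm 1. The attribute-selection rule (generalize the attribute with the most distinct values), the hierarchy trees, and all names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical child -> parent maps; the tree shapes are assumed.
IP_PARENT = {
    "ip1": "WWW/FTP", "ip2": "WWW/FTP", "ip3": "Firewall", "ip4": "EXTERN",
    "WWW/FTP": "DMZ", "Firewall": "DMZ",
    "DMZ": "ANY_IP", "EXTERN": "ANY_IP",
}
PORT_PARENT = {
    "80": "PRIVATE", "21": "PRIVATE", "1234": "NON_PRIVATE",
    "PRIVATE": "ANY_PORT", "NON_PRIVATE": "ANY_PORT",
}
PARENTS = [IP_PARENT, PORT_PARENT]


def greedy_generalize(alarms, parents, min_size):
    """Generalize attributes step by step until one generalized alarm
    covers at least min_size of the original alarms."""
    if min_size > len(alarms):
        raise ValueError("min_size exceeds the number of alarms")
    current = list(alarms)
    while True:
        alarm, size = Counter(current).most_common(1)[0]
        if size >= min_size:
            return alarm, size
        # Assumed rule: pick the attribute with the most distinct values.
        i = max(range(len(parents)), key=lambda k: len({a[k] for a in current}))
        # Replace that attribute by its parent; root nodes stay unchanged.
        current = [
            a[:i] + (parents[i].get(a[i], a[i]),) + a[i + 1:] for a in current
        ]


alarms = [("ip1", "80"), ("ip2", "80"), ("ip3", "21"), ("ip4", "1234")]
print(greedy_generalize(alarms, PARENTS, min_size=3))
```

The loop terminates because, as long as the alarms are not all identical, the chosen attribute has at least two distinct values and at least one of them moves one level closer to the root.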


#### *3.2. Whale Optimization Algorithm*

Mirjalili and Lewis proposed the whale optimization algorithm (WOA) based on an abstract model of the hunting strategies of humpback whales; it mimics the bubble-net feeding in their foraging behavior [28]. Humpback whales hunt close to the surface while trapping the prey in a net of bubbles. They create this net while swimming along a '6'-shaped path. The algorithm mimics two phases: the first phase (exploitation phase) is to encircle the prey and attack with spiral bubble nets, and the second phase (exploration phase) is to search randomly for prey. Figure 2 shows a series of behaviors of humpback whales as they hunt prey. Figure 2a shows the movement of the whale toward the prey, during which the whale can choose to move toward the lead whale or in a random direction. Figure 2b illustrates the shrinking encircling mechanism used by whales to capture prey. Besides the shrinking encircling mechanism, the whale also moves toward the prey along a spiral, during which it emits bubbles to surround the prey, as shown in Figure 2c. The details of each phase are presented in the following subsections.

**Figure 2.** The process of humpback whales hunting prey: (**a**) the whales move toward the lead whale or in random directions; (**b**) whales reach their prey by the shrinking encircling mechanism; (**c**) whales reach their prey in a spiral shape.

#### 3.2.1. Exploitation Phase (Encircling Prey/Bubble-Net Attacking Method)

Mirjalili et al. designed two methods to mathematically model the bubble-net behavior of humpback whales: the shrinking encircling mechanism and the spiral updating position. We now describe the concrete implementation of these two mechanisms from a mathematical point of view.

In the shrinking encircling mechanism, WOA applies the following two formulas to update the problem solution to model the movement of a whale toward a prey.

$$
\overrightarrow{D} = \left| \overrightarrow{C} \cdot \overrightarrow{X}^{*}(t) - \overrightarrow{X}(t) \right| \tag{1}
$$

$$
\overrightarrow{X}(t+1) = \overrightarrow{X}^{*}(t) - \overrightarrow{A} \cdot \overrightarrow{D} \tag{2}
$$

where *t* is the current iteration number, *X*<sup>∗</sup> is the best solution obtained so far, *X* is the current solution, | | denotes the absolute value, and the dot denotes element-wise multiplication. *A* and *C* are coefficient vectors, which are obtained from Equations (3)–(5):

$$
\overrightarrow{A} = 2\overrightarrow{a} \cdot \overrightarrow{r} - \overrightarrow{a} \tag{3}
$$

$$
\overrightarrow{C} = 2 \cdot \overrightarrow{r} \tag{4}
$$

$$a = 2 - t \frac{2}{\text{MaxIter}}\tag{5}$$

where *a* decreases linearly from 2 to 0 over the course of the iterations, so that each component of *A* lies in the interval [−*a*, *a*]. *r* is a random vector with components in [0, 1], *t* is the number of the current iteration, and *MaxIter* is the preset maximum number of iterations.
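The coefficient vectors of Equations (3) and (4) can be computed as in the following Python sketch. It assumes that *a* is decreased linearly from 2 to 0, as in the original WOA formulation by Mirjalili and Lewis, so that the range [−*a*, *a*] of *A* shrinks over the iterations. The function name `coefficients` and its parameters are illustrative.

```python
import random


def coefficients(t, max_iter, dim, rng):
    """Return (a, A, C) for iteration t.

    Assumption: a decreases linearly from 2 (t = 0) to 0 (t = max_iter).
    """
    a = 2.0 - t * (2.0 / max_iter)
    # Eq. (3): A = 2 a r - a, one fresh random number per component.
    A = [2.0 * a * rng.random() - a for _ in range(dim)]
    # Eq. (4): C = 2 r, again drawn independently per component.
    C = [2.0 * rng.random() for _ in range(dim)]
    return a, A, C


rng = random.Random(0)
for t in (0, 50, 100):
    a, A, C = coefficients(t, max_iter=100, dim=3, rng=rng)
    print(t, a, A, C)
```

Each component of `A` falls in [−*a*, *a*] and each component of `C` in [0, 2], so large values of |*A*| become impossible once *a* drops below 1.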

According to Equation (2), the current solution updates its position relative to the best solution obtained so far. The two vectors *A* and *C* control the search range of the current solution, keeping it within a neighborhood of the best solution. To imitate the hunting behavior shown in Figure 2b, we use the model illustrated in Figure 3a. Here, (*X*<sup>∗</sup>, *Y*<sup>∗</sup>) is the current global best solution, and the solid dots in the figure, such as (*X*, *Y*), are current solutions. Figure 3a shows the possible positions that (*X*, *Y*) can reach on its way toward (*X*<sup>∗</sup>, *Y*<sup>∗</sup>) when 0 ≤ *A* ≤ 1 in a 2D space.

**Figure 3.** Bubble-net search mechanism implemented in WOA (*X*<sup>∗</sup> represents the best solution obtained so far): (**a**) shrinking encircling mechanism; (**b**) spiral updating position.

As mentioned above, whales also use a spiral motion to move toward prey as shown in Figure 2c. WOA uses the following formula to model this behavior.

$$
\overrightarrow{X}(t+1) = \overrightarrow{D'} \cdot e^{bl} \cdot \cos(2\pi l) + \overrightarrow{X}^{*}(t) \tag{6}
$$

where *X*<sup>∗</sup> is the best solution obtained so far, *X* is the current *i*th solution, *D*′ = |*X*<sup>∗</sup>(*t*) − *X*(*t*)| is the distance of the *i*th whale to the prey (the best solution obtained so far), *b* is a constant that defines the shape of the spiral, and *l* is a random number in [−1, 1].

Figure 3b illustrates Equation (6) and the spiral movement of Figure 2c. In this 1D space, *X*<sub>*t*</sub> represents the current *i*th solution (i.e., the whale), *X*<sup>∗</sup> represents the current best solution (i.e., the prey), and the distance between *X*<sub>*t*</sub> and *X*<sup>∗</sup> is *D*<sub>*i*</sub>. The horizontal axis represents the random number *l*, which controls the movement direction of the whale, and the vertical axis represents the next position *X*(*t* + 1) of the current solution *X*<sub>*t*</sub>. To simulate humpback whales swimming around the prey in a shrinking circle while following a spiral-shaped path, the authors assume that the shrinking encircling update and the spiral update are each chosen with a probability of 50%, as defined in Equation (7).

$$\overrightarrow{X}(t+1) = \begin{cases} \overrightarrow{X}^{*}(t) - \overrightarrow{A} \cdot \overrightarrow{D} & \text{if } p < 0.5 \\ \overrightarrow{D'} \cdot e^{bl} \cdot \cos(2\pi l) + \overrightarrow{X}^{*}(t) & \text{if } p \ge 0.5 \end{cases} \tag{7}$$

where *p* is a random variable between [0, 1].
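The two cases of Equation (7) can be sketched in Python as follows. The random quantities *p* and *l* are passed in explicitly so that the step is deterministic and easy to inspect; the function name `exploit_step`, its signature, and the default *b* = 1 are assumptions made for illustration.

```python
import math


def exploit_step(x, x_best, A, C, p, l, b=1.0):
    """One exploitation update of a single solution x, following Eq. (7).

    x, x_best, A, C are lists of equal length; p is in [0, 1], l in [-1, 1].
    """
    if p < 0.5:
        # Shrinking encircling: D = |C . X* - X|, then X* - A . D.
        return [
            xb - a_j * abs(c_j * xb - xj)
            for xj, xb, a_j, c_j in zip(x, x_best, A, C)
        ]
    # Spiral update: D' = |X* - X|, then D' e^{bl} cos(2 pi l) + X*.
    factor = math.exp(b * l) * math.cos(2.0 * math.pi * l)
    return [abs(xb - xj) * factor + xb for xj, xb in zip(x, x_best)]


x, x_best = [1.0, 2.0], [0.0, 0.0]
print(exploit_step(x, x_best, A=[0.5, 0.5], C=[1.0, 1.0], p=0.2, l=0.0))
print(exploit_step(x, x_best, A=[0.5, 0.5], C=[1.0, 1.0], p=0.7, l=0.5))
```

In a full optimizer, *p* and *l* would be drawn at random for every solution in every iteration.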

#### 3.2.2. Exploration Phase (Search for Prey)

As mentioned above, besides moving toward the lead whale, the whale can also move in a random direction, as shown in Figure 2a. This is called the exploration phase in WOA. In this phase, the position update is no longer based on the best solution found so far but on a randomly selected solution. Thus, a vector *A* with random values greater than 1 or less than −1 is used to force a solution to move away from the best search agent. This mechanism is expressed mathematically in Equations (8) and (9).

$$
\overrightarrow{D} = \left| \overrightarrow{C} \cdot \overrightarrow{X}_{rand} - \overrightarrow{X} \right| \tag{8}
$$

$$
\overrightarrow{X}(t+1) = \overrightarrow{X}_{rand} - \overrightarrow{A} \cdot \overrightarrow{D} \tag{9}
$$

where *X*<sub>*rand*</sub> is a solution selected at random from the current population of solutions. The other symbols have the meanings given above.

In WOA, the value of *A* controls whether the algorithm executes the exploitation phase or the exploration phase. When the absolute value of *A* is greater than 1, WOA executes the exploration phase; when the absolute value of *A* is less than 1, WOA executes the exploitation phase. As mentioned above, *A* takes values in [−*a*, *a*], and this range shrinks as the number of iterations increases. Therefore, in the early stage of a run, WOA has more chances to move away from the current best solution and follow a random solution instead. As the number of iterations increases, the range of *A* gradually shrinks, and WOA gradually converges toward the best solution found.
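The interplay of the two phases can be sketched as a complete, minimal WOA loop in Python. This is an illustrative sketch under several assumptions rather than a reference implementation: *a* decreases linearly from 2 to 0, the test on |*A*| is applied per component, positions are clamped to the search bounds, and the sphere function serves as a toy objective. All names and default values are hypothetical.

```python
import math
import random


def sphere(x):
    """Toy objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def woa(objective, dim, bounds, n_agents=20, max_iter=200, b=1.0, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    best = min(pop, key=objective)[:]
    for t in range(max_iter):
        a = 2.0 - t * (2.0 / max_iter)  # assumed: linear decrease 2 -> 0
        for i, x in enumerate(pop):
            p = rng.random()
            l = rng.uniform(-1.0, 1.0)
            partner = pop[rng.randrange(n_agents)]  # random solution X_rand
            new = []
            for j in range(dim):
                A = 2.0 * a * rng.random() - a
                C = 2.0 * rng.random()
                if p >= 0.5:
                    # Spiral update around the best solution, Eq. (6).
                    d = abs(best[j] - x[j])
                    v = d * math.exp(b * l) * math.cos(2.0 * math.pi * l) + best[j]
                elif abs(A) < 1.0:
                    # Exploitation: shrink toward the best solution.
                    v = best[j] - A * abs(C * best[j] - x[j])
                else:
                    # Exploration: move relative to a random solution.
                    v = partner[j] - A * abs(C * partner[j] - x[j])
                new.append(min(max(v, lo), hi))  # keep inside the bounds
            pop[i] = new
            if objective(new) < objective(best):
                best = new[:]
    return best, objective(best)


best, value = woa(sphere, dim=2, bounds=(-10.0, 10.0))
print(best, value)
```

Because the best solution is replaced only when a strictly better one is found, the returned objective value never exceeds that of the best initial solution.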
