1. Introduction
Technology is now part of our everyday lives and simplifies many tasks, from mundane to advanced. Few of us think about the possibility of losing access to these technologies and how hard it would be to adjust to living without them. The threats to most of these tools are, however, very real: they are attacked almost constantly, often without our knowledge, which makes cybercrime a significant issue. Fortunately, most nations worldwide are investing heavily in building Security Operation Centers to raise awareness of cyber attacks and help prevent them. Both universities and the majority of high schools in Slovakia are connected to the Internet through the SANET network infrastructure, which, as a public-facing network, is the target of a large number of attacks ranging from simple brute-force login attempts to more sophisticated Distributed Denial of Service (DDoS) attacks.
DDoS could be considered an improved version of simpler DoS attacks. In both forms, the attacker tries to make the service provided to clients by the server inoperable or otherwise impaired. This means that the responses by the server could be much slower or even missing compared to the unaffected server. Unlike DoS attacks, which originate in one place only, a DDoS attack consists of a multitude of originating sources, which are often called bots. Bots are usually network-connected devices that are controlled by the attacker from a single point, also called the Command and Control server.
There are three main types of DDoS attacks mentioned in [1]:
Application attacks, which exploit well-known or previously unknown (zero-day) vulnerabilities in an application protocol or service, are the first and most serious type. They are very effective because even a small number of controlled devices can cause critical service outages. Protecting against them is difficult because they are hard for administrators to detect and mitigate.
Protocol attacks affecting transport protocols, such as TCP or UDP, could be considered a second type. Creating a vast amount of TCP connections is very effective and can easily lead to possible connection limit exhaustion. This type of attack is found not only on routers and firewalls, but it can also affect servers or load balancers, which are trying to distribute the load between multiple servers.
Volume-based attacks, being the last but not least important type, are the simplest of the three types. The attacker is trying to exhaust the server’s available bandwidth to cause network congestion. As a result of this attack type, the server can neither get the request from the client nor respond to a received request.
However, the latest attacks cannot always be classified into the mentioned types because they combine characteristics of several types. Very often, DDoS attacks provide cover for sophisticated malware injections, which are even harder for security analysts or automated tools to detect. To make things worse, such malware can be used to exfiltrate classified information from the infected devices, which later become part of the botnet themselves and are often referred to as “zombie devices” [2]. Some cybercrime groups or individuals even sell access to large botnets for staggering prices, as creating one without sophisticated attack techniques is hard.
Defense against DDoS attacks, according to [3], includes three main parts: monitoring, detection, and response. The monitoring phase plays a key role in obtaining information about the network services the user provides. Detection methods are built on data collected during monitoring, and network patterns and anomalies or incidents are analyzed. The response phase is triggered after an attack is identified through detection methods. This includes implementing firewall rules as the first line of defense, detecting the threat, and immediately notifying the network security team.
Our university’s Internet connection through the academic high-speed network, called the Slovak Academic Network (SANET), is subject to constant cyber threats. Devices connected to this network face a variety of attacks, ranging from simpler brute-force attacks on SSH, RDP, or HTTP/HTTPS to more serious DDoS attacks. Network protection is financially demanding, as increasingly powerful network equipment is required to withstand more sophisticated attacks, especially as SANET currently consists of several 100 Gb links on one segment. That is why, as part of the SANET II project “Research in the SANET network and possibilities for its further use and development”, we sought computationally efficient statistical methods and a machine learning system to detect DDoS attacks in real time. The methods were designed so that they can be easily implemented in a hardware probe that monitors IP traffic in real time on any connected network segment.
The primary objective of the SANET II project is to implement research findings through innovative services and technologies within the distribution network, prioritizing security and dependability. The project’s suggested methods and principles aim to expedite the adoption of technologies, ensuring a more effective and secure transmission of specific data. The objective is to devise fresh models and distribution approaches, anticipating future interdisciplinary adjustments. Progress in the development of network infrastructure in this realm will not only enhance the ability of the scientific and research community to distribute, store, and exchange R&D data efficiently but will also pave the way for potential adaptations in the realm of Industry 4.0. This involves modifying the proposed concepts for machine communication mechanisms within a vast network environment. Our team is actively engaged in advanced flow monitoring and assessing security events for both networks and Cloud Computing (CC) systems. Numerous articles authored by our team delve into CC systems [4], their security architecture [5], the management of cybersecurity incidents, and the establishment of a packet capture infrastructure to generate valuable datasets [6,7].
At the beginning of a DDoS attack, there is a significant increase in the peak rate and average rate of IP traffic. Detecting an attack using these rates alone is insufficient. A significant peak can occur randomly, even during standard traffic, and the moving average rate, which is calculated in a time window, increases only linearly during a DDoS attack. With the so-called Low Rate DDoS attacks, this increase is even insignificant. These factors can cause false positives or late recognition of a DDoS attack. Moreover, monitoring the peak rate and average rate does not allow us to recognize a change in the probabilistic character of the monitored flow, which occurs during an attack because a large number of fraudulent packets is generated.
In our research, we try to find probabilistic characteristics of the IP flow that change their values significantly, and therefore react in time, when a DDoS attack starts. At the same time, we are trying to create prediction methods that build intervals of permissible values for the considered characteristics while monitoring normal network traffic. A value that leaves the interval at the time the DDoS attack begins allows us to detect the attack early. This approach presupposes normal IP traffic at the beginning of monitoring but does not require previous “learning” on already recorded attacks. We therefore classify the research presented in this article among machine learning methods without a teacher (unsupervised learning).
The article continues and extends the work “One-Parameter Statistical Methods to Recognize DDoS Attacks” [8]. In the third chapter, we describe the processing of IP traffic and present the eight measured DDoS attacks we used. In the fourth chapter, we analyze the reaction of various statistical coefficients to the start of attacks in the traffic flow. These are, in order, the coefficient of variation, kurtosis, skewness, entropy, Hurst exponent, autoregression coefficient, correlation coefficient, and Kullback–Leibler divergence. In the fifth chapter, we deal with various prediction methods, and in the last chapter, we propose several detection functions designed for fast machine recognition of DDoS attacks. Finally, we summarize the results, recommendations, and suggestions for further research in the results and discussion.
2. Related Work
Many mathematical methods try to detect a DDoS attack. Among the main ideas is the monitoring of changes in the probability distribution of the occurrence of packets during the transition from standard traffic to attack. The statistical moments describing the distribution of packet occurrences change as a result, for example, the average rate, variance, peakedness coefficient [9], measures of periodicity, kurtosis, skewness, and self-similarity [10,11].
More complicated mathematical methods include regression and autocorrelation analysis. Using multiple regression analysis, the strength of a DDoS attack is estimated [12]. Using a regression model for predicting the number of zombies in DDoS attacks is discussed in [13]. In [14], the authors use a change of autocorrelation coefficients to detect an attack. Autocorrelation in the convolution of legitimate and attack traffic (cross-correlation method) is discussed in [15].
Another important statistical characteristic is the Hurst exponent, which describes the self-similarity of a time series. The self-similarity changes during the transition from normal traffic to traffic containing attacks. The Hurst exponent was used to identify a DDoS attack in [16,17,18]. A comparison of average Hurst exponent values between standard and attack traffic can be found in [19]. The article [20] deals with the combination of correlation and the Hurst exponent. The authors of [21] used an autoregressive system for estimating the variance of the Hurst coefficient to detect changes in the flow. The use of self-similarity and Renyi entropy can be found in [22]. Fractal analysis is closely related to self-similarity; its application to detect attacks can be found in [23]. The authors of [24] deal with a combination of fractal and recurrent functions.
Good results are achieved by Machine Learning (ML) with the use of neural networks [25]. The authors in [26] used GAN networks for detection. The combination of autoencoders and deep convolutional GAN networks for determining anomalies in IP flow is discussed in [27]. A GAN network with two discriminators is used in [26,28].
An effective method for detection is also Principal Component Analysis (PCA). In [29], PCA is used to indicate anomalies. In [30], PCA is used for dimensionality reduction of the IP flow dataset attributes.
Low-rate DDoS attacks form a special category of attacks. With this type of attack, there is no significant increase in the moving average rate. In [31], the authors used several metrics and entropies for low-rate attack detection. The authors in [32] recognize attacks using the entropy of the difference in packet size between normal and attack traffic. Self-similarity has been used in [33], and Queue Management Algorithms (RED and REM) have been used against low-rate DDoS attacks in [34].
Other methods include detecting attacks using wavelets (wavelet analysis) [35,36,37], blockchain [38,39], genetic algorithms and random forest [40], and the use of various spectral and cluster analyses and mathematical models of SDN networks is mentioned in [41].
Most mathematical methods used for machine learning to detect DDoS attacks use standard datasets, e.g., MIT outside of normal traffic, CAIDA-2007 DDoS attacks, TUDDoS dataset, etc. Based on the patterns in the datasets, areas describing standard and offensive traffic are created for machine detection. These areas can contain, for example, simple samples, corresponding values of statistical parameters, or other characteristics. The unknown sample is then recognized based on these previously “learned” areas.
Unlike the methods mentioned in this chapter, we try to find statistical characteristics that, during online monitoring of IP traffic, react quickly to the beginning of a DDoS attack by significantly changing their values. The next step is to determine predictive methods and detection functions that allow a machine to recognize these changes during attacks. Such recognition belongs to the methods of machine learning without a teacher (unsupervised learning).
3. Processing of IP Records
3.1. IP Flow Description
We can describe the packet flow in several ways, depending on which mathematical model of the IP network we want to use. In Queuing Theory, the oldest model used is the Jackson Network [42,43]. All input and output flows to nodes are modeled using a Poisson process. Another stochastic model that uses the Large Deviation Principle describes flows using their Effective Bandwidths [44,45]. In Network Calculus, which represents a deterministic model of an IP network, input–output flows are bounded by subadditive curves [46].
In both deterministic and stochastic models of the IP network, the cumulative process $A(s,t)$ is used to analyze the IP flow; it describes the occurrence of packets in the time interval $(s,t]$. For models with discrete time, time slots [ts] (sample windows) are given for the analysis, in which the numbers of occurring packets $a_i$ (increments at time $i$) are recorded:
$$A(s,t) = \sum_{i=s+1}^{t} a_i .$$
In the stochastic model, $A(s,t)$ is a cumulative random chain, and the $a_i$ are non-negative discrete random variables, $a_i \in \{0, 1, 2, \dots\}$. Assuming the flow’s stationarity, the cumulative chain’s designation is simplified to $A(t) = A(0,t)$, and the random variables $a_i$ have the same probability distribution.
When IP traffic is measured using Wireshark, we have at our disposal a vector of cumulative packet arrival times $T = (t_1, t_2, \dots)$. After choosing the size of the time slot and counting the arrivals that fall into each slot, we obtain the values of the increments of the IP flow in the given time slots (see Figure 1):
We used a Poisson flow simulation with an average rate $\lambda$ to demonstrate how the time record of the measured traffic from Wireshark is sampled into the increments of the flow time series. In Figure 1a, we show 50 cumulative exponential values $t_1, \dots, t_{50}$, representing the occurrence of packets in time. We then chose the size of the time slot (sample window) $ts$. After counting, we obtained increments $a_i$, $i = 1, 2, \dots$, with a Poisson distribution whose average rate per slot is $\lambda \cdot ts$. We display the first 30 increment values as a time series in Figure 1b.
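The sampling step described above can be illustrated with a minimal Python sketch. It is not the authors' tooling: the function names, the rate of 5 packets per time unit, and the slot size of 1.0 are hypothetical values chosen only for the illustration.

```python
import random


def simulate_poisson_arrivals(rate, count, seed=0):
    """Cumulative sums of exponential inter-arrival gaps (a Poisson flow)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(count):
        t += rng.expovariate(rate)  # exponential gap with mean 1/rate
        times.append(t)
    return times


def arrivals_to_increments(arrival_times, slot_size):
    """Count packets per time slot from sorted cumulative arrival times."""
    if not arrival_times:
        return []
    n_slots = int(arrival_times[-1] // slot_size) + 1
    increments = [0] * n_slots
    for t in arrival_times:
        increments[int(t // slot_size)] += 1  # slot index of this packet
    return increments


# Hypothetical parameters: 50 packets, rate 5 per time unit, slot size 1.0.
times = simulate_poisson_arrivals(rate=5.0, count=50)
increments = arrivals_to_increments(times, slot_size=1.0)
```

Every packet lands in exactly one slot, so the increments always sum to the number of recorded arrivals; empty slots appear as zeros in the time series.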
In our article, we want to use statistical coefficients to describe the IP flow. Therefore, we do not describe the flow using the cumulative stochastic process $A(t)$ with random increments $a_i$; a significantly simpler model is sufficient. We consider the vector of sampled increments $(a_1, a_2, \dots, a_N)$ to be $N$ realizations of some random variable $X$, which we use to compute the estimate $\hat{\theta}$ of some statistical parameter $\theta$. We denote these $N$ realizations as the compute window $CW$ of size $N$. To detect an anomaly in the IP flow, we must create a time series of values of some statistical coefficient, which we gradually calculate from mutually overlapping compute windows (the so-called sliding coefficient). For early anomaly detection, we chose a shift of one time slot (sample window) between consecutive windows. Two successive values of the estimate $\hat{\theta}$ are obtained from two consecutive overlapped compute windows:
$$CW_1 = (a_1, a_2, \dots, a_N) \rightarrow \hat{\theta}_1, \qquad CW_2 = (a_2, a_3, \dots, a_{N+1}) \rightarrow \hat{\theta}_2 .$$
Statistical coefficients whose values are always obtained from only one compute window are called one-window parameters. These are actually estimates of the probabilistic characteristics of $X$.
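A sliding one-window parameter can be sketched as follows. This is only an illustrative sketch: the helper name `sliding_estimates` and the toy flow are our assumptions, and the estimator is passed in as an ordinary function so that any one-window coefficient can be plugged in.

```python
from statistics import mean


def sliding_estimates(increments, window_size, estimator):
    """Apply `estimator` to every compute window, shifting by one time slot."""
    return [
        estimator(increments[start:start + window_size])
        for start in range(len(increments) - window_size + 1)
    ]


# Toy flow (hypothetical values): calm traffic followed by a sudden increase.
flow = [2, 3, 2, 4, 3, 20, 22, 21]
moving_avg = sliding_estimates(flow, 4, mean)
```

With a shift of one slot, a series of length L yields L - N + 1 estimates for a window of size N. Recomputing each window from scratch keeps the sketch short; an online probe would more likely update the statistic incrementally.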
For the following demonstration of compute windows, we used a capture of a measured DDoS attack (Figure 2):
In addition to one-window parameters, we also deal with estimates of the probabilistic characteristics of two random variables $X$ and $Y$ (for example, the correlation coefficient). To compute a value of such a parameter, we need two compute windows. We call such characteristics two-windows parameters. We consider the vector of increments in the first window as $N$ realizations of $X$ and the vector of increments in the second window as $N$ realizations of $Y$. After calculating the value of the coefficient, we shift the entire pair in time by 1 ts, as with one-window parameters (Figure 3):
By gradually moving one compute window or a pair of windows, we obtain a time series of the estimated values of the given statistical parameter. With a shift of one time slot and a window size of, e.g., $N = 100$, the first parameter value calculated after the start of the attack comes from a compute window that contains only 1% attack traffic. Our effort is to find statistical parameters that react relatively quickly to the occurrence of attack traffic.
3.2. Types of DDoS Attack
In this chapter, we present selected captures of real DDoS attacks, on which we present the course of the values of individual monitored statistical coefficients. We intentionally omitted several experiments that we performed on various simulated scenarios where we used IP flow generated using 2-state On/Off processes, using Poisson and Pareto distributions, and also using the MNIST and CIFAR databases. In these simulated scenarios, the selected statistical parameters worked “exceptionally well”, and we acquired an initial idea of the effectiveness of the use of individual parameters in recognizing changes and increases in simulated traffic. However, the situation changed significantly when deployed to detect a real attack. We will analyze the response of statistical coefficients on eight selected attacks.
We obtained captures from datasets such as ISCX 2012 and CIC-IDS 2017, but mainly from our own custom dataset [47]. More detailed information about the attacks is available in [8,48,49].
We divided the attack captures into several types according to the course of the observed statistical coefficients (see the next chapter). The first type represents standard attacks (N-normal), in which the coefficients reacted similarly to the simulated scenarios [8] (Figure 4 and Figure 5):
The graphs show the flow increments, the moving average, and the time intervals from the beginning of the attack to the moment when the average reached its local maximum. We will later use these intervals to evaluate the effectiveness of the other statistical parameters objectively.
The other two captures (Figure 6) at first glance also represent standard attacks; the standard traffic has a stationary character, and the attack traffic has several times the average rate. However, the values of the coefficients behaved differently than with normal attacks, which is why we labeled them special attacks (S-special).
The seventh capture in the sequence, Figure 7a, is a representative of the so-called Low Rate (LR) DDoS attacks, i.e., attacks with a low average rate of flow [50,51]. As the last record, we mention a problematic attack (P-problem), Figure 7b, in which the monitored coefficients mostly failed to capture the starting point of the attack traffic. It was a DDoS attack captured on the core server at the University of Žilina [8]. According to the course of the moving average, it also belongs among the low-rate attacks. We devoted a special section to this attack.
For the gradual calculation of coefficient values, we used a shift of the compute window by one time slot. When conducting experiments with records of DDoS attacks, we used compute window sizes equal to powers of two (256, 512, 1024, 2048, 4096), which suit the estimation of the Hurst exponent. After a subjective evaluation of the course of the values of the investigated coefficients, we decided to continue with a single compute window size $N$. With smaller windows, there was a significant “ripple” in the courses of some parameters; with larger windows, the time for calculating the parameters, especially the Hurst exponent, increased significantly.
We can observe the linear growth of the moving average on all the listed records. Of course, this growth depends on the size of the compute window. For the sake of comparison, we used the same window size $N$ for all records and shifted the window by 1 ts. This size guarantees that the average does not react to random peaks in the flow.
We assume that in a DDoS attack, the overall probabilistic structure of the flow will change due to the generation of a large number of flood packets. Our effort is to determine statistical parameters that react to a DDoS attack significantly faster and more distinctly than the linearly growing moving average.
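The linear growth of the moving average can be made explicit with a short worked derivation. It is only an idealized sketch: we assume (as a simplification, not as a claim about the measured captures) that the mean rate per slot jumps from $\mu_0$ to $\mu_1$ exactly at the start of the attack and that $k$ of the $N$ slots in the compute window already contain attack traffic.

```latex
\bar{a}(k) \;\approx\; \frac{(N-k)\,\mu_0 + k\,\mu_1}{N}
          \;=\; \mu_0 + \frac{k}{N}\,(\mu_1 - \mu_0), \qquad 0 \le k \le N .
```

Under these assumptions, the average grows by only $(\mu_1 - \mu_0)/N$ per time slot and needs the whole window length to reach the new level, which is why parameters that react within the first few slots are of interest.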
4. Responses of Statistical Parameters to a DDoS Attack
4.1. One-Window Parameters
4.1.1. Coefficient of Variation
The basic probabilistic characteristics of a random variable $X$ are the first raw moment $E(X)$ (mean) and the second central moment $D(X)$ (variance). Their quotient defines the coefficient of variation, which for the random variable $X$ represents the ratio between the standard deviation and the mean value [52]:
$$C_V(X) = \frac{\sqrt{D(X)}}{E(X)} .$$
Statistical estimates of the above characteristics, computed from the increments $a_1, \dots, a_N$ of a compute window, will be denoted as average rate $\bar{a}$, sample variance $S^2$, and sample coefficient of variation $V$:
$$\bar{a} = \frac{1}{N}\sum_{i=1}^{N} a_i, \qquad S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (a_i - \bar{a})^2, \qquad V = \frac{S}{\bar{a}} .$$
When calculating coefficient values using overlapped compute windows, we will talk about moving coefficients, for example, the moving average and the moving sample coefficient of variation, and denote them as $\bar{a}_i$ and $V_i$.
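A minimal sketch of the sample coefficient of variation for one compute window follows. The function name and the two toy windows are hypothetical, and the N - 1 denominator in the variance is an assumption of this sketch.

```python
from math import sqrt


def coefficient_of_variation(window):
    """Sample coefficient of variation V = S / mean for one compute window."""
    n = len(window)
    avg = sum(window) / n
    if avg == 0:
        return float("nan")  # undefined for an all-zero window
    variance = sum((a - avg) ** 2 for a in window) / (n - 1)
    return sqrt(variance) / avg


# Toy windows (hypothetical values): stationary traffic, then the first
# slot of a flood entering the window.
v_normal = coefficient_of_variation([2, 3, 2, 3, 2, 3, 2, 3])
v_attack = coefficient_of_variation([2, 3, 2, 3, 2, 3, 2, 60])
```

In the toy data, a single flood slot is enough to push V above 1 because the standard deviation grows faster than the mean, which mirrors the reaction described for the standard (type N) attacks.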
In the case of standard DDoS attacks (type N), the standard traffic has the character of a stationary flow, whereby the standard deviation $S$ acquires significantly smaller values than the average rate. At the start of a DDoS attack, the average rate increases many times, and the standard deviation also increases, even faster than the linear trend of the average. For this reason, at the starting point of a DDoS attack, the coefficient of variation exceeds the value $V = 1$. This is how the coefficient reacted to all analyzed standard attacks of the N type and to the S5 attack; see, for example, N8 in Figure 8:
During the non-standard attack S1, the coefficient $V$ exceeded the value of 1 several times already during normal traffic (Figure S3). During the entire capture of the low-rate attack LR3, $V < 1$ held (Figure 9):
During the problematic attack P4, the coefficient did not exceed the value of 1 at all. The progress of the other recordings is given in the article’s appendix.
4.1.2. Kurtosis Coefficient and Skewness Coefficient
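The other basic characteristics of the random variable $X$ are the kurtosis coefficient and the skewness coefficient. They are the fourth and third central moments, respectively, scaled by the corresponding power of the standard deviation $\sigma = \sqrt{D(X)}$:
$$\gamma_2 = \frac{E\left[(X - E(X))^4\right]}{\sigma^4}, \qquad \gamma_1 = \frac{E\left[(X - E(X))^3\right]}{\sigma^3} .$$
Both coefficients describe properties of the probability distribution of the random variable $X$. The skewness coefficient is a measure of the asymmetry of the probability distribution around the mean value of the random variable. The kurtosis coefficient describes how much the peak of the density function differs from that of the Gaussian density function.
In our case, the realizations of the variable $X$ are the increments $a_1, \dots, a_N$, and the coefficients describe the properties of the probability distribution of the increments in the given compute window $CW$. We denote their estimates as $K$ (kurtosis) and $S_k$ (skewness). We get the first estimate values from the first compute window $CW_1 = (a_1, \dots, a_N)$:
$$K = \frac{\frac{1}{N}\sum_{i=1}^{N} (a_i - \bar{a})^4}{\left(\frac{1}{N}\sum_{i=1}^{N} (a_i - \bar{a})^2\right)^{2}}, \qquad S_k = \frac{\frac{1}{N}\sum_{i=1}^{N} (a_i - \bar{a})^3}{\left(\frac{1}{N}\sum_{i=1}^{N} (a_i - \bar{a})^2\right)^{3/2}} .$$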
When loading the attack traffic into the compute window, we assumed a significant change in the probability distribution and, thus, also a change in the coefficients. The assumption was confirmed for almost all analyzed flows, for example, N8 (Figure 10):
Both estimated coefficients reacted with a significant increase in their values right at the beginning of the DDoS attack. A similar situation occurred during the processing of recording S5 and the low-rate attack LR3. We noticed a different behavior of the coefficients only in the S1 attack. To visualize the reaction of the coefficients to the change in the nature of the traffic and to the occurrence of peaks, we display the values of the coefficients in a common graph together with the flow increments of S1 (Figure 11):
Due to the non-stationarity of normal traffic and the frequent occurrence of high peaks in the S1 record, the values of both coefficients fluctuate strongly, so the jump at the starting point of the attack is lost in the overall course of their values.
The kurtosis estimate $K$ and the skewness estimate $S_k$ are based on the fourth and third central moments of the random variable $X$, which is why the course of their values is very similar. In the majority of the analyzed captures, they reacted to the start of a DDoS attack with a several-fold increase (peak) of values compared with the previous course. We will use this fact in machine recognition using prediction methods. The coefficient of variation reacted similarly, but in its case, exceeding the value $V = 1$ can be used to detect an attack in several captures.
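The two moment-based estimates can be sketched for a single compute window as below. The function name and the test windows are hypothetical, and the plain (biased) moment estimators with a 1/N factor are an assumption of this sketch rather than the confirmed choice of the article.

```python
def skewness_and_kurtosis(window):
    """Return (skewness, kurtosis) from central moments of one window."""
    n = len(window)
    avg = sum(window) / n
    m2 = sum((a - avg) ** 2 for a in window) / n  # second central moment
    m3 = sum((a - avg) ** 3 for a in window) / n  # third central moment
    m4 = sum((a - avg) ** 4 for a in window) / n  # fourth central moment
    if m2 == 0:
        return float("nan"), float("nan")  # constant window: undefined
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Toy windows (hypothetical values): a symmetric one and one with a spike.
skew_sym, kurt_sym = skewness_and_kurtosis([1, 2, 3, 4, 5])
skew_spike, kurt_spike = skewness_and_kurtosis([2] * 9 + [50])
```

A symmetric window gives zero skewness, while a single large increment raises both values at once, which is the several-fold peak described above.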
4.1.3. Entropy
Entropy is associated with fields such as thermodynamics, statistical mechanics, and information theory. This quantity expresses the degree of randomness or uncertainty with which some random event or signal occurs. We can also interpret this degree of randomness as the amount of information that the given signal can transmit. Entropy, like kurtosis and skewness, describes, in a sense, the probability distribution of the increments $a_i$ in the currently processed compute window:
$$p_k = P(a_i = k), \qquad k = 0, 1, 2, \dots$$
Let the size of the compute window be $N$, and let $n_k$ represent the number of values $a_i = k$ in the given window $CW$. We denote the entropy for the given window as $H$ and its estimate as $E$:
$$H = -\sum_{k} p_k \log p_k, \qquad E = -\sum_{k:\, n_k > 0} \frac{n_k}{N} \log \frac{n_k}{N} .$$
When the attack traffic is gradually loaded into the current CW, we assume a significant change in the probability distribution and, thus, also a change in the entropy value. For illustration, we present two extreme cases, attacks N8 and N6 (Figure 12):
Entropy reacted with a significant drop in values at the beginning of the DDoS attack. In the ideal case, the decrease is significantly more pronounced than the fluctuations in the previous course of entropy. In the majority of recordings, however, the entropy values also reacted significantly to slight changes in the structure of standard traffic and to the occurrence of peaks in the traffic flow.
Based on most of the experiments on various DDoS attacks, we concluded that entropy is not usable for attack detection; predicting its course would lead to the detection of many false attacks. For these reasons, we excluded entropy from the investigated methods.
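For reference, the entropy estimate of one compute window can be sketched from the relative frequencies of the increment values. The function name is hypothetical, and the base-2 logarithm is our assumption, since the base is not stated here.

```python
from collections import Counter
from math import log2


def window_entropy(window):
    """Empirical entropy (in bits) of the increment values in one window."""
    n = len(window)
    counts = Counter(window)  # n_k: how often each increment value occurs
    return sum((c / n) * log2(n / c) for c in counts.values())
```

A window with a single repeated value gives 0 bits and a window with N distinct values gives log2(N) bits. Because every new value or peak changes the counts, the estimate is sensitive to small structural changes, which is consistent with the false reactions described above.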
4.1.4. Hurst Exponent
Another examined characteristic was the Hurst exponent. The exponent expresses the degree of self-similarity of the time series. It is used in several areas of applied mathematics, including fractals and chaos theory, long-term memory processes, spectral analysis, and in sizing network parameters in Queueing theory.
When testing the reactions of the Hurst exponent on simulated DDoS scenarios and on the first gathered real recordings, we got some interesting results [53]. The exponent had a tendency to “jump” to values close to H = 1 at the start of the attack. At the same time, during the processing of standard traffic, it remained in the range between 0.4 and 0.6, which corresponds to the values of a stationary random process. This suggested the possibility of using the Hurst exponent for machine recognition of an attack when a threshold close to H = 1 is exceeded [8]. When the experiments were carried out on other real captures of attacks, however, the Hurst exponent stopped responding in this ideal way.
The Hurst exponent cannot be calculated analytically; we can only estimate it statistically. Several methods are used to estimate the exponent; the main ones include the R/S statistic, the aggregated variance method, the absolute value method, the variance of the residuals, Higuchi’s method, the modified Allan variance, the scale window variation, the Whittle estimator, etc. [54]. We used estimation using R/S Analysis [55] and Detrended Fluctuation Analysis (DFA). We do not describe the individual estimation procedures due to their complexity; their description is given, for example, in [8,56,57]. Based on our empirical experience, we also used a modified estimate of the Hurst exponent using R/S Analysis, in which we first removed the linear autoregressive trend from the increment values in the given compute window (denoted RS AR(1)) [58].
In the following figures, we show two cases in which, according to our subjective evaluation, the reaction was, from the point of view of machine recognition, the worst and the best, respectively (other records are listed in the appendix). When processing recording LR3, Figure 13a, none of the Hurst exponent estimates reacted significantly to the onset of a DDoS attack. In the case of N8, Figure 13b, all three estimates reacted significantly, whereby the RS estimates exceeded the value of H = 1.
Overall, we can say that with DFA estimation, the values of the H-exponent at the moment of the attack mostly dropped significantly. In RS estimation, the course of values is often insignificant for machine recognition. However, after removing the linear autoregressive dependence, the H-exponent values increased significantly. In some cases, at moments of attack, the values of the exponent even exceeded the value of H = 1. However, this exceedance also occurred during standard traffic processing, for example, in attack N2 (Figure S17) and S1 (Figure S19). Therefore, this property of the Hurst exponent can be used in machine recognition only in combination with other parameters.
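To give an idea of what an R/S-based estimate involves, the following is a deliberately simplified sketch of the classical rescaled-range procedure on one compute window. It is not the estimator used in the article: the dyadic block sizes, the minimum block of 8 slots, the function names, and the synthetic test series are all assumptions, and neither the DFA variant nor the AR(1) detrending step is included.

```python
import random
from math import log, sqrt


def rescaled_range(block):
    """R/S statistic of one block: range of cumulative deviations over std."""
    n = len(block)
    avg = sum(block) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for a in block:
        cum += a - avg  # cumulative deviation from the block mean
        lo, hi = min(lo, cum), max(hi, cum)
    std = sqrt(sum((a - avg) ** 2 for a in block) / n)
    return (hi - lo) / std if std > 0 else None


def hurst_rs(window, min_block=8):
    """Slope of log(mean R/S) versus log(block size) over halved block sizes."""
    xs, ys = [], []
    size = len(window)
    while size >= min_block:
        starts = range(0, len(window) - size + 1, size)
        values = [rescaled_range(window[s:s + size]) for s in starts]
        values = [v for v in values if v is not None]
        if values:
            xs.append(log(size))
            ys.append(log(sum(values) / len(values)))
        size //= 2
    if len(xs) < 2:
        raise ValueError("window too short for a slope estimate")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx  # least-squares slope = Hurst estimate


# Synthetic examples (hypothetical data): uncorrelated noise and a pure ramp.
rng = random.Random(42)
h_noise = hurst_rs([rng.gauss(10.0, 2.0) for _ in range(1024)])
h_ramp = hurst_rs([float(i) for i in range(1024)])
```

On uncorrelated noise the slope stays around the middle of the unit interval, whereas a steadily growing series pushes it toward 1. The estimate needs several passes over the window for every shift, which illustrates why the computation time grows quickly with the window size.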
4.2. Two-Windows Parameters
4.2.1. Autoregressive and Correlation Coefficients
Autoregressive and correlation coefficients are very similar probabilistic characteristics that express a certain kind of dependence between two random variables, in our case, between $X$ and $Y$, i.e., variables whose realizations represent the IP flow increments $a_i$ in the overlapped compute windows.
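The correlation coefficient $\rho(X,Y)$ represents the covariance scaled by the standard deviations of the individual random variables:
$$\rho(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sqrt{D(X)}\,\sqrt{D(Y)}} .$$
The autoregressive coefficient represents a linear dependence in an autoregressive model AR(1), which assumes that the considered random process $X_i$ has the structure
$$X_i = \varphi\, X_{i-1} + \varepsilon_i ,$$
while the random variables $\varepsilon_i$ constitute white noise [59]. We denote the estimate of the autoregressive coefficient $\varphi$ as $c$. For the realizations $x_i$ and $y_i$ of the random process in the first and second compute window, respectively,
$$y_i = c\, x_i + e_i, \qquad i = 1, \dots, N ,$$
holds. The estimate $c$ is calculated according to the method of least squares [60]; the calculation of the estimate of the correlation coefficient $R$ is well known:
$$c = \frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} x_i^2}, \qquad R = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \,\sum_{i=1}^{N} (y_i - \bar{y})^2}} .$$
The similar form of the two estimates explains their similar course. Again, we selected two extreme records, LR3 and N6 (Figure 14):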
Figure 14:
At the starting point of the attack, the values of the coefficients began to grow rapidly toward the value of 1 for most of the records. Thanks to this, we can set a certain limit value (threshold) for both coefficients, the crossing of which signals a DDoS attack. The lower the threshold, the earlier the attack is signaled; on the other hand, very low thresholds can cause false reports. Based on the performed experiments, we set one limit for the moving autoregressive coefficient and one for the moving correlation coefficient (the threshold values are listed in Table 1). For most recordings, except for LR3 and P4 (Figures S24 and S28), thresholds set in this way can be used for machine detection.
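The two estimates can be sketched for one pair of compute windows as follows. Several points are assumptions of this sketch and not confirmed by the text: the second window is taken one time slot after the first (parameter `lag`), the autoregressive fit has no intercept, and all function names are hypothetical.

```python
from math import sqrt


def two_window_pair(increments, start, window_size, lag=1):
    """Return two overlapped compute windows offset by `lag` time slots."""
    x = increments[start:start + window_size]
    y = increments[start + lag:start + lag + window_size]
    return x, y


def ar1_coefficient(x, y):
    """Least-squares slope c of y_i = c * x_i + e_i (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def correlation(x, y):
    """Sample (Pearson) correlation coefficient R of two windows."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)
```

For example, `x, y = two_window_pair(flow, 0, N)` followed by `correlation(x, y)` gives one point of the moving coefficient, and the pair is then shifted by one slot. Both functions divide by a sum of squares, so a constant or all-zero window has to be handled separately in practice.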
4.2.2. Kullback–Leibler Divergence
Another quantity that uses two compute windows is the Kullback–Leibler divergence $D_{KL}(P \,\|\, Q)$. Divergence is one of the measures used in mathematical statistics to determine how one probability distribution (P) differs from another probability distribution (Q).
In our case, it compares the probability distributions of two random variables $X$ and $Y$. The vectors of realizations of these two variables are the increments in successive overlapped compute windows of size $N$, which we denote $CW_X$ and $CW_Y$. We denote the estimate of the divergence $D_{KL}$ as $\hat{D}_{KL}$. Let $n_k$ be the number of increments $a_i = k$ in the compute window $CW_X$ and $m_k$ the corresponding number in the next window $CW_Y$. For the calculation of the divergence $D_{KL}$ and its estimate $\hat{D}_{KL}$, we have the relations
$$D_{KL}(P \,\|\, Q) = \sum_{k} P(k) \log \frac{P(k)}{Q(k)}, \qquad \hat{D}_{KL} = \sum_{k} \frac{n_k}{N} \log \frac{n_k}{m_k} .$$
Divergence reacted to most records immediately when loading the first compute window with offensive traffic with a high impulse. However, it had a tendency to react in this way to peaks in normal traffic, which could cause a lot of false reports during detection. For example, in attack S1, the impulse during the attack is indistinguishable from impulses during normal traffic,
Figure 15a. In attack LR3, the course of divergence is completely ideal for machine recognition,
Figure 15b:
Another disadvantage of using divergence is that the high impulse at the start of the attack lasts a very short time compared to other parameters, so we see its application in combination with other parameters as problematic. We have, therefore, postponed the use of divergence to future research.
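As an illustration of how such a divergence estimate can be computed from two compute windows, the following sketch counts how often each increment value occurs in each window and uses the relative frequencies as probability estimates. The function name and the small constant `eps`, which replaces zero frequencies in the second window, are our assumptions; the exact treatment of empty bins in the method is not specified here.

```python
import math
from collections import Counter

def kl_divergence_estimate(window_p, window_q, eps=1e-9):
    """Estimate D(P||Q) from two windows of discrete flow increments."""
    n_p, n_q = len(window_p), len(window_q)
    counts_p, counts_q = Counter(window_p), Counter(window_q)
    divergence = 0.0
    for value, count in counts_p.items():
        p = count / n_p
        # assumption: values unseen in the second window get a tiny probability eps
        q = max(counts_q[value] / n_q, eps)
        divergence += p * math.log(p / q)
    return divergence
```

Identical windows give a divergence of zero, while a window whose values never occur in the other one produces a large impulse, which matches the sharp reaction at the start of an attack.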
4.3. Effectiveness of the Use of Statistical Coefficients
In the following
Table 1, we have summarized the evaluation of the studied statistical coefficients’ reaction to the analyzed DDoS attack captures. From the three studied estimates of the Hurst exponent, we selected the estimate using RS analysis with the removal of the autoregressive linear trend. We considered this estimate to be the most effective.
Green cells mean that the attack was recognized by exceeding the thresholds of the given parameters. Threshold values are listed in the right row of the table. The number in the cell is the number of false reports, i.e., how many times the parameter exceeded the limit value before the attack. In the case of divergence in particular, this means the number of impulses before the actual attack.
- The designation “PT” means that with the given parameters, we assume that the attack could be recognized using prediction methods (PT, predicting tunnel [8]);
- The marking “x” means that the given parameter is not applicable for the given record.
We can divide the parameters into two groups based on the performed experiments. In the first group, we included the parameters for which we can use their threshold value for attack detection: the coefficient of variation, the Hurst exponent, the autoregressive coefficient, and the correlation coefficient. The second group includes parameters for which prediction methods must be used to detect an attack, mainly kurtosis and skewness, and also the autoregressive and correlation coefficients. We see that the division into groups is not clear-cut. We have excluded divergence from further consideration due to the short impulse duration during the attack and its frequent reactions to peaks in normal traffic.
5. Predicting σ-Tunnel
The idea of a simple prediction σ-Tunnel, determined using the average and standard deviation of the given parameter, was presented in [8]. Next, we dealt with tunnel creation using a polynomial regression model, Fourier transformation, and autoregression analysis. However, the effectiveness of these computationally demanding methods, compared to the σ-Tunnel, was significantly worse, not only with a later time of reporting the attack but also with several times more false positives [61]. Therefore, we will only deal with the σ-Tunnel.
For a given parameter, we fix the size of the prediction window. From the parameter values in the window, we calculate the average and the standard deviation. We create an interval around the average whose half-width is a multiple of the standard deviation and then test whether the new value of the parameter belongs to this predicting interval. If it does not, the machine detects the beginning of the attack. Next, we shift the prediction window by one parameter value and repeat the whole process.
Based on the experiments, we fixed the size of the prediction window and set the width of the interval to three standard deviations (the 3σ-Tunnel). With such settings, the prediction tunnel was able to adapt to the development of parameter values and detect sudden changes at the start of the DDoS attack, with only a relatively small number of false reports.
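The procedure can be sketched as follows. This is a minimal illustration, assuming a symmetric interval of `width` standard deviations around the window average and the population standard deviation; the function name, the default width, and the window size used in any call are hypothetical choices of ours.

```python
import statistics

def sigma_tunnel_alarms(values, window, width=3.0):
    """Return the indices at which a value leaves the mean +/- width*sigma tunnel."""
    alarms = []
    for t in range(window, len(values)):
        history = values[t - window:t]        # prediction window before slot t
        mean = statistics.fmean(history)
        sigma = statistics.pstdev(history)    # assumption: population deviation
        lower, upper = mean - width * sigma, mean + width * sigma
        if not (lower <= values[t] <= upper):
            alarms.append(t)                  # value outside the predicting interval
    return alarms
```

Because the window slides by one value per step, the tunnel follows slow changes of the parameter, while a sudden jump falls outside the interval computed from the preceding values.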
When using the prediction tunnel directly on increments of the IP flow
, we found that although exceeding the upper limit of the tunnel detects the starting point of a DDoS attack relatively early, at the same time, there are frequent false reports. When the test interval increased, the number of false reports decreased, but at the same time, the ability of the method to detect an attack decreased. We selected records N8 and N1 as an example. Only the upper limit of the
-Tunnel is shown in
Figure 16.
Using statistical parameters to detect DDoS attacks enables the suppression of peaks in the IP flow thanks to a relatively large compute window.
In the following figures, we will demonstrate the behavior of the 3
-Tunnel on the parameter with the greatest variability among the considered statistical coefficients, namely on the Hurst exponent (estimated by RS statistics with removing the autoregressive trend). In
Figure 17a, there is an ideal case where the detection occurred when loading the third time slot with an offensive traffic, and no false reports occurred. In
Figure 17b, the attack was not detected, and there was one false report:
Calculating the Hurst exponent is significantly more time-consuming than other statistical parameters, so we have temporarily postponed its use for detection.
Based on the experiments performed with different tunnel widths and different prediction window sizes, we can say that the chosen σ-Tunnel has a sufficiently “long memory” to cover the local variability (jitter) of the statistical coefficient without causing a large number of false reports. At the same time, the prediction tunnel set up in this way is still able to react to a significant increase in the values of the given coefficient at the start of a DDoS attack.
In
Table 2, we present detections using the
-Tunnel applied to selected statistical coefficients. The F/R symbol represents the number of false reports (F) and the attack detection time (R) in
ts. For example, the first value in
Table 2, 2/30, means that when using the
-Tunnel applied to the variation
, there was an attack recognized in the record S1 within 30
ts of its beginning. Before that, there were two false reports (we have excluded the problematic record P4 from the experiments for now and will devote a separate subsection to it).
We consider the R parameter (the attack detection time) to be more important than the F parameter (the number of false reports) because a suitable combination of statistical coefficients can reduce the number of false reports. Nevertheless, we will use both parameters to compare the effectiveness of the coefficients. We will use the PCA (Principal Component Analysis) method to visualize the similarity in a 3D view [
62] as shown in
Figure 18. In the data stage of the method, two matrices
and
of dimensions
are represented, which contain the values of the parameters
F and
R from
Table 2. Unlike the table, the rows of the matrix represent statistical coefficients from
to
and the column records from S1 to N8 (transposed table):
For 3D visualization, we transform the row vectors of the two matrices into spaces with Karhunen–Loève bases (orthonormal bases of eigenvectors):
For the 3D visualization, we use the first three columns of the matrix
and
(the first three main components),
Figure 18:
The effectiveness of individual statistical coefficients is determined according to the distance from the zero vector,
Table 3. For the F parameter, the zero vector represents zero false reports for each traffic capture; for the R parameter, it represents recognition of the attack with zero delay after its start.
The resulting orders of the coefficients, by false reports and by attack detection speed, follow from the distances in Table 3; by detection speed, the three most effective coefficients are kurtosis, skewness, and the coefficient of variation. The worst was the autocorrelation coefficient. We will use these results in the next chapter to create detection functions.
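The ranking by distance from the zero vector can be illustrated with a short sketch. For simplicity, it computes the Euclidean distance directly on the raw row vectors rather than on PCA-transformed ones, and the coefficient names and values in any call are hypothetical; it is not a reproduction of Table 3.

```python
import math

def rank_by_distance_from_zero(rows):
    """Sort coefficients by the Euclidean norm of their row vector (smaller is better)."""
    distances = {
        name: math.sqrt(sum(v * v for v in vector))
        for name, vector in rows.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])
```

A coefficient with no false reports in any capture would have distance 0 in the F matrix and would be ranked first.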
6. Detection Functions for Machine Recognition of DDoS Attacks
In the previous chapters, we outlined two possible ways of recognizing DDoS attacks using statistical coefficients. The first way is the use of threshold values for appropriate coefficients. The second way is applying the prediction tunnel to the course of the coefficient values.
6.1. Detection Method Using Threshold Values
For the Detection method using threshold values (DTV), the following statistical parameters are useful: the coefficient of variation, the Hurst exponent, and the autoregressive and correlation coefficients. For the coefficient of variation and the Hurst exponent, the threshold is 1.00; based on the performed experiments, we proposed a threshold of 0.90 for the autoregressive coefficient and 0.75 for the correlation coefficient. Our effort is to create a computationally simple detection method. That is why we left out the Hurst exponent from further consideration.
We introduce a two-valued 0/1 logic function
, which evaluates the exceedance threshold
for a given statistical coefficient
:
In the following graphs, we will show offensive traffic captures S1 and N2 for a course of coefficients
, autoregressive coefficient
, and correlation coefficient
,
Figure 19, and the course of their functions
,
, and
,
Figure 20:
Sets of values on which coefficients exceed their threshold are referred to as detection intervals. In
Figure 20, we see how these intervals overlap each other. In order to eliminate false reports and, at the same time, achieve timely detection of attacks, we will introduce a three-valued detection function
:
The course of the detection function for records S1 and N2 is in
Figure 21:
In both traffic captures, the coefficient values exceeded their thresholds already during standard traffic processing: the variation coefficient in the case of record S1 (Figure 20a) and the autoregressive coefficient in the case of record N2 (Figure 20b). However, thanks to the shape of the detection function (20), false reports were eliminated. According to the values 1 and 3 of the detection function, we can determine in which time slot two or all three statistical coefficients detected the attack.
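A minimal sketch of this threshold-based detection follows. The 0/1 function is a plain threshold comparison; for the three-valued function, we assume the sum of pairwise products of the three 0/1 values, which is one form consistent with the values 0, 1, and 3 reported in the text (0 for at most one coefficient, 1 for exactly two, 3 for all three). This form, the strict inequality, and all names are our assumptions.

```python
# Thresholds taken from the text: variation 1.00, autoregressive 0.90, correlation 0.75.
THRESHOLDS = {"variation": 1.00, "autoregressive": 0.90, "correlation": 0.75}

def exceeds(value, threshold):
    """Two-valued 0/1 function: 1 if the coefficient exceeds its threshold."""
    return 1 if value > threshold else 0  # assumption: strict inequality

def pairwise_detection(flags):
    """Assumed three-valued form: sum of pairwise products of three 0/1 flags."""
    f1, f2, f3 = flags
    return f1 * f2 + f1 * f3 + f2 * f3

def dtv_detect(sample):
    """Evaluate the detection value for one time slot of coefficient values."""
    flags = [exceeds(sample[name], limit) for name, limit in THRESHOLDS.items()]
    return pairwise_detection(flags)
```

With this form, a single coefficient crossing its threshold during normal traffic yields 0, which is how isolated false reports are suppressed.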
In
Table 4, we present the evaluation of the use of detection
functions for the analyzed attacks. The first value represents the number of false reports, the second gives the time after the start of the attack at which at least two coefficients detected it, and the third gives the time at which all three coefficients detected it. The sign “x” means that the detection function did not recognize the attack for the given recording (statistical coefficients did not exceed their thresholds at the beginning of the attack).
The detection function
was able to eliminate false reports for all the examined records. It detected the attack successfully mainly for standard attacks; for the LR3, P4, and S5 records, it generally did not react to the attack. Another shortcoming of the method may be relatively late attack detection. For example, in recording N7, the attack started at 8300 ts; the detection function recognized it with a value of 1 at 8528 ts (228 ts from the start of the attack) and with a value of 3 at 8780 ts (480 ts from the start of the attack), Table 4. On the other hand, the local maximum of the moving average rate was recorded at 9680 ts (1380 ts from the start of the attack). Our effort is to find additional detection functions that would achieve better results regarding attack detection time.
6.2. Detection Method Using Predicting Tunnel
In
Section 5, we determined the three most effective statistical coefficients with respect to the speed of attack detection using σ-tunnels: kurtosis, skewness, and variation. Since we have selected coefficients whose values increase significantly when a DDoS attack starts, we will only deal with the upper limit of the σ-tunnels. The time during which the given coefficient exceeds the upper limit of the tunnel is denoted as the detection interval and is described by the following function:
In
Figure 22, we show the history of the function
on records S1 and N7. Together with increments of flow
, we will show the course of kurtosis coefficient
, upper limits of
-Tunnel of
, and the corresponding function
:
In both records, the kurtosis coefficient exceeded the upper limit of
-Tunnels already during normal IP traffic processing. To eliminate false reports, we use the same form of the detection function as in the previous method, which used thresholds of statistical coefficients. We denote the detection function for the detection method using the predicting tunnel (DPT) by
:
In
Table 5, we present the evaluation of the use of detection functions
. The method of evaluation is identical to the evaluation of the function
in
Table 4. The first value represents the number of false reports, the second determines the time of attack detection by at least two coefficients, and the third value represents detection by all three coefficients.
Compared to the previous method, the DPT detection function was able to recognize the starting point of a DDoS attack for all analyzed traffic captures, and even significantly earlier than the DTV detection function. However, the method also has a disadvantage: with almost all records, there was one false report. Therefore, the next direction of our research was the effort to eliminate false reports while maintaining the early detection of an attack.
6.3. Detection Method Using Holt-Exponential Smoothing
In an effort to eliminate false reports, we decided to apply smoothing methods to the course of statistical coefficient values. The simplest method is exponential smoothing [
63].
It is a relatively simple method in which the values of the statistical coefficient
are replaced by their weighted average
according to the relationship:
The smoothed value is the so-called exponentially weighted moving average at time t. The smoothing parameter determines the degree of smoothing: at one extreme, the smoothing is minimal and the smoothed value follows the value of the coefficient; at the other extreme, the smoothing is strong, and the method reacts only minimally to local fluctuations in the values of the coefficient. We can rewrite relation (21):
We already presented the first experiments with exponential smoothing in [
8]. When performing experiments on other records, the shortcomings of this smoothing became apparent. Significant suppression of local peaks in courses of the examined coefficients was manifested at the value of the weight parameter around
. A time series smoothed in this way has a “long-term” memory and only minimally adapts to the newly read values. Although local peaks were smoothed out, the values of the coefficients shifted significantly in time, and in some records, even the growth of the coefficient values at the start of the DDoS attack was suppressed, for example,
Figure 23 and
Figure 24.
A more complicated smoothing method is Holt-exponential smoothing, also known as double exponential smoothing [64]. A second smoothing parameter and a trend component are added to the model; the trend component represents the estimate of the linear trend of the smoothed values at time t. Since, in our case, we do not know in advance what trend the smoothed time series
will have, we put
:
After the adjustments, we can write the individual components of Holt’s smoothing as follows:
Using the traffic captures of the DDoS attack, we performed comparisons of the influence of exponential smoothing and Holt-exponential smoothing of statistical coefficients on the effectiveness of detection using the predicting tunnel DPT. We compared Holt-exponential smoothing with different settings of the parameters , and k with exponential smoothing with the same smoothing parameter . Finally, after subjective evaluation, we chose the values , and . With such a setting, Holt-exponential smoothing was able to smooth out the local peaks of the given coefficient and, at the same time, did not significantly deviate from the overall course of values, as was achieved in the case of exponential smoothing with the same smoothing coefficient .
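The difference between the two smoothing methods can be illustrated with a minimal sketch in the textbook form, where `alpha` weights the newest value and `beta` weights the newest level difference. The parameter convention, the zero initial trend, and the function names are our assumptions and may differ from the relations used in the method.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: s_t = alpha*x_t + (1 - alpha)*s_{t-1}."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def holt_smoothing(values, alpha, beta):
    """Holt (double) exponential smoothing with a level and a linear trend component."""
    level, trend = values[0], 0.0  # assumption: unknown trend starts at zero
    smoothed = [level]
    for x in values[1:]:
        previous_level = level
        level = alpha * x + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        smoothed.append(level)
    return smoothed
```

On a linearly growing series, simple exponential smoothing settles at a constant lag of (1 - alpha)/alpha time slots behind the input, whereas the trend component lets the Holt variant catch up with the series.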
Figure 23 displays the example of both smoothing methods used for the kurtosis coefficient
and correlation coefficient
:
Even though both smoothing methods have the same “memory” length, thanks to which significant local peak suppression occurs, exponential smoothing causes a significant time delay. Thanks to the inclusion of a trend component in the Holt-exponential smoothing, such an effect does not occur.
The next
Figure 24 displays autoregressive coefficient smoothing
by both methods for records S1 and LR3:
In
Figure 24a, we see that exponential smoothing of the autoregressive coefficient for record S1 goes completely outside the range of values of the coefficient. For traffic capture LR3 (Figure 24b), exponential smoothing even completely suppressed the peaks in the course of the autoregressive coefficient at the start of a DDoS attack. The Holt method also significantly reduced the peaks, but they still remained recognizable from the previous course of the coefficient.
Before using the DPT prediction tunnel detection method, we first smoothed the considered coefficients of variation, kurtosis, and skewness using the Holt-exponential method and only after the smoothing we used the predictive
-tunnel. We called this method the Detection Method using Predictive Tunnels with Holt-exponential smoothing (DPT-Hs). Two-valued decision function signaling in the crossing of the upper limit prediction interval for individual smoothed coefficients is denoted as
, and the detection function of the DPT-Hs method itself is
:
In
Table 6, we present the evaluation of the detection functions
usage. The method of evaluation is identical to the evaluation of the function
shown in
Table 5. The first value represents the number of false reports, the second determines the time of attack detection by at least two coefficients, and the third value determines detection by all three coefficients.
Except for the non-standard S1 attack, we noticed on all other traffic captures that the use of Holt-exponential smoothing on detection using the predictive tunnel eliminated all of the false reports. The smoothing, however, caused a time shift in the values of the used statistical coefficients, which resulted in a slight delay in signaling the start of a DDoS attack. The elimination of false reports at the expense of a slight delay in DDoS attack signaling leaves an open question for the process of real deployment in IP traffic monitoring.
7. Detection of Problematic P4 Attack
In previous chapters, we identified the P4 attack as problematic. The reason was that, based on the course of the values of the considered coefficients, this attack was unrecognizable using the detection methods presented so far. Of all the analyzed DDoS attack traffic captures, we are most interested in this one since it was an attack on the network infrastructure of our own university. Therefore, we continued to search for other detection methods to recognize this problematic attack.
We can consider this attack, according to the course of the moving average rate, as a low-rate DDoS attack (when attacks do not occur periodically),
Figure 7b. The administrator marked the beginning of the attack at 3060 ts (1 ts = 10 ms), but from the record of increments, a certain change in the nature of the flow can be recognized as early as 2650
ts. At this moment, the attack itself was not recognized by the detection function
using threshold values of coefficients, or by the functions
and
with the predictive
-tunnel. After conducting experiments with different statistical coefficients and different multiples of the
-tunnel, we found that with
-tunnel, some coefficients recognized the beginning of the attack after 3060 ts,
Figure 25, and some coefficients reacted to the change in the nature of the IP flow in the interval
,
Figure 26:
With other records, however, the DPT method with a detection function using these coefficients caused a lot of false positives. In an effort to find an acceptable solution for all the tested traffic captures, we conducted further experiments on all records with various combinations of statistical coefficients, parameters of Holt-exponential smoothing, and forms of the detection function. Based on our subjective evaluation, we have devised a method that relatively satisfactorily recognizes all of the investigated attacks. The method consists of the following steps:
Calculation of moving coefficients of variation , kurtosis , skewness , and correlation ;
Values of all coefficients are smoothed using Holt-exponential smoothing;
Signaling of exceeding the upper limit -tunnel using the function ;
DDoS-attack detection using the multiplicative detection function
:
The course of coefficients
,
,
,
, and multiplicative detection function is shown in
Figure 27:
Holt-exponential smoothing reduced the variance of the course of the considered coefficients. By changing the size of the prediction interval, we achieved attack signaling with these coefficients, and the multiplicative form of the detection function, as a product of the decision functions, eliminated false reports.
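The multiplicative form can be sketched in a few lines. The sketch assumes that each coefficient already has a 0/1 decision series (1 when the smoothed coefficient exceeds the upper limit of its tunnel); the function name and the dictionary layout are hypothetical.

```python
from math import prod

def multiplicative_detection(indicator_series):
    """Per time slot, multiply the 0/1 decision values of all coefficients.

    indicator_series: dict mapping a coefficient name to a list of 0/1 flags,
    all lists having the same length.
    """
    return [prod(flags) for flags in zip(*indicator_series.values())]
```

The product equals 1 only in time slots where every coefficient signals at once, so a crossing by one or two coefficients alone does not raise an alarm.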
In
Table 7, we present the evaluation of the usage of the multiplicative detection function. Unlike the previous tables, since this function is only 0/1-valued, we include only two values: the first represents the number of false reports, as before, and the second gives the moment of attack detection after its beginning in [ts]:
The method detected the attack in every examined record. For S1 and S5, it signaled one false report, and for sample N6, it signaled the attack 106 ts before its onset. It is questionable whether to evaluate this event as a false positive or as an early detection of an attack.
8. Results and Discussion
Our effort is to create an autonomous self-learning system of relatively simple software-implementable statistical methods for the purpose of early detection of DDoS attacks usable in high-speed network infrastructure. The system must record a change from standard traffic to attack traffic in time.
We tested several different statistical coefficients that describe the probabilistic structure of the network flow. The reaction of the tested coefficients to offensive traffic was examined on eight selected records of DDoS attacks of various types.
Based on the response of statistical coefficients to the onset of a DDoS attack, we divided the coefficients into two groups.
In the first group, we included the parameters with which we can detect the attack using their threshold value. We have the coefficient of variation (1.00), Hurst exponent (1.00), autoregressive coefficient (0.90), and correlation coefficient (0.75).
The second group includes parameters for which prediction methods must be used to detect an attack. Here, we included the kurtosis and skewness coefficients and the autoregressive and correlation coefficients. Later, we also included the coefficient of variation. We see that the division into groups is not clear-cut. Divergence was excluded from further consideration due to the short duration of the impulse during the attack and its frequent reactions to peaks in standard traffic.
From the various prediction methods, which we do not describe in detail in this article, we chose the so-called predicting σ-Tunnel, which uses the average value of the coefficient and its standard deviation. Its settings are based on experiments with different tunnel widths and different prediction window sizes.
Using the PCA method [
65], we evaluated statistical coefficients according to the number of false reports and the speed of attack recognition. In the case of false reports, the order of the coefficients is from the most effective
,
, and
. The order of attack detection speed is
,
, and
. The worst in both cases turned out to be the autocorrelation coefficient
. We used the results when creating detection functions.
We have divided the detection methods intended for machine recognition of a DDoS attack into the Detection Method using threshold values (DTV) and the Detection Method using the Predicting Tunnel (DPT). For individual methods, we indicate in brackets [] the statistical coefficients that were used, and, depending on the situation, the width of the prediction channel and the application of Holt-exponential smoothing are also indicated. Each method is uniquely determined by its specific detection function:
DTV [V, c, ]
Calculation of moving coefficients of variation , autoregressive coefficient , and correlation coefficient ;
Signaling of exceeding threshold values using the function ;
Attack detection using the detection function
:
DPT [V, K, ]
Calculation of moving coefficients of variation , kurtosis , and skewness ;
Signaling of exceeding the upper limit -tunnel using the function ;
Attack detection using the detection function
:
DPT-Hs [V, K, ]
Calculation of moving coefficients of variation , kurtosis , and skewness ;
Coefficient values are smoothed using Holt-exponential smoothing;
Signaling of exceeding the upper limit -tunnel using the function ;
Attack detection using the detection function
:
DPT-Hs [V, K, , ]
Calculation of moving coefficients of variation , kurtosis , skewness , and correlation ;
Coefficient values are smoothed using Holt-exponential smoothing;
Signaling of exceeding the upper limit -tunnel using the function ;
Attack detection using multiplicative detection function
:
In the following
Table 8, we present a summary evaluation of the use of detection functions on the measured records of DDoS attacks. As before, the first value represents the number of false reports during normal IP traffic. The second value gives the time (number of time slots) after which at least two statistical coefficients detected the attack (except for the multiplicative detection function). The third value indicates the time when all three coefficients detected the attack. In the case of the multiplicative detection function, only two values are given because this function is 0/1-valued. The sign “x” means that the detection function did not recognize the attack in the given traffic capture.
The given
Table 8 provides a basic idea of the effectiveness of the proposed detection methods. However, it is not easy to make a single recommendation.
The computationally fastest method is DTV, which uses threshold values for detection. The method is effective for most standard DDoS attacks, but it could not detect non-standard attacks such as LR3 and S5 at all. Another disadvantage is the detection delay time, which is longer than with other methods.
With the DPT method, the detection delay time is several times shorter, in some cases even instantaneous, but the method also produces false positive reports (one for almost all records).
With the DPT-Hs method, the computational difficulty is increased by using Holt-exponential smoothing. Smoothing mostly removed false reports, but extended the attack detection time.
The problematic traffic capture P4 was only handled by the DPT-Hs method using four statistical coefficients and a reduced width of the prediction tunnel. The number of false reports caused by the reduced width was eliminated partly by Holt-exponential smoothing and partly by the multiplicative form of the detection function. Attack detection occurs only if all four coefficients (variation, kurtosis, skewness, and correlation) recognize it. When testing traffic capture N6, a curious situation occurred: the detection function responded 106 ts before the attack started. We can evaluate this event either as a false report or as the method detecting an anomaly even before the start of the attack itself.
From the point of view of our subjective evaluation and also due to the calculation speed, we would still recommend the DPT method for software implementation, which uses the detection function with the coefficients of variation, kurtosis, and skewness.
9. Conclusions
In the article, we discussed the use of statistical coefficients for the quick detection of DDoS attacks. For various types of DDoS attack records, we calculated the moving coefficient of variation, kurtosis, skewness, moving Hurst exponent, entropy, autoregressive and correlation coefficients, and moving Kullback–Leibler divergence. Based on their course, we divided the coefficients into two groups depending on whether attack detection can use their threshold values or requires prediction methods. For prediction, we created the predicting σ-Tunnel, which uses the average value of the coefficient and its standard deviation. For fast machine recognition of a DDoS attack, we have developed and tested four different methods, which are uniquely determined by their own detection functions.
We used two parameters to determine the effectiveness of individual methods: the number of false reports during normal IP traffic and the delay time of DDoS attack detection.
During the development of detection methods, we gradually experimented with the size of the prediction tunnel, various combinations of statistical coefficients, exponential and Holt-exponential smoothing parameters, and different shapes of detection functions. The result is a custom recommendation for software implementation.
We partially excluded the Hurst exponent from consideration due to its computational complexity and variable course, as well as entropy, which reacted significantly to peaks in normal traffic and thus caused a lot of false reports.
We see great potential in the use of Kullback–Leibler divergence, which immediately reacted to the onset of a DDoS attack with a significant impulse. However, the problem was that this pulse was very short compared with the other coefficients, so we have not yet managed to include it in the detection functions.
We see the next direction of research in the search for types of detection functions that would use not yet tested statistical coefficients and, simultaneously, make machine recognition of DDoS attacks more efficient.