*Article* **Early-Stage Detection of Cyber Attacks**

## **Martina Pivarníková <sup>1</sup>, Pavol Sokol <sup>2,</sup>\* and Tomáš Bajtoš <sup>2</sup>**


Received: 1 October 2020; Accepted: 26 November 2020; Published: 29 November 2020

**Abstract:** Nowadays, systems around the world face many cyber attacks every day. These attacks consist of numerous steps that may occur over an extended period of time. We can learn from them and use this knowledge to create tools to predict and prevent the attacks. In this paper, we introduce a way to sort cyber attacks into stages, which can help with the detection of each stage of a cyber attack and, in particular, of the earlier stages. We propose a solution using Bayesian network algorithms to predict how the attacks proceed. We can use this information for more effective defense against cyber threats.

**Keywords:** cyber attack; attack prediction; attack projection; early-stage detection; Bayesian network

#### **1. Introduction**

Due to the constant development of cyber threats, various defense solutions need to be continuously improved. In addition to developing prevention systems, it is also necessary to focus on detection systems that help to obtain information about threats and attacks. The detection of malicious actions is one of the most critical cybersecurity issues. Intrusion detection refers to the detection of specific patterns or anomalous observations. Nowadays, however, we need to anticipate upcoming harmful activities so that we can react to them and prevent an attack before it causes damage.

Attack prediction is not studied as widely as detection. Therefore, it is necessary to explore this area because it is beneficial for the entire field of cybersecurity. To predict attacks, it is necessary to examine how they proceed and what steps are taken. These data can be used to continually improve the systems that detect each phase of the attack. In this way, it is possible to detect the earlier stages of the attacks and predict how they proceed.

Early detection and prediction of cybersecurity incidents, such as attacks, is a challenging task. The threat landscape is continuously evolving, and even with the usage of intrusion detection systems, advanced attackers can spend more than 100 days in a system before being discovered [1]. After the detection of a security incident, we need to determine how the attack will proceed. This is essential because if we can stop the attacker in time, they cannot do as much damage.

It is important to learn from existing attacks so that we can develop tools to find out if such attacks have been repeated. Attack modeling is an intrusion-based methodology that allows one to focus on the different stages of an attack. By identifying attacks at different stages and by implementing tools to disarm the attacks at their various stages, one can take preventive measures to ensure that similar attacks will be detected. It is important to have a layered model to ensure that if one of the defense systems is bypassed, there is another defense line to protect the organization's assets. That is why we need to establish a multi-layered model of cyber attacks.

In recent years, it has not been sufficient to only be alerted of a security incident. Prevention of the attack altogether has become a necessity. The highest priority in computer security is to prevent an attack and stop the attacker from doing damage. If the path of an attack can be predicted, one has the ability to avoid attacks at every phase. By looking at a survey of the technology, from the host to the network level, one will have an opportunity to study tools or solutions that can be used in protecting against these threats. There are numerous existing prevention methods that are able to stop attacks in progress.

Recognizing an attack's steps is the goal of many cybersecurity analysts. The authors in [2] categorized prediction methods into three categories. An overview can be seen in Table 1.



The research is focused on early-stage detection and it is based on attack prediction, especially attack projection. This area focuses on the prognosis of the future steps of the attack. The projection of the future stages of an active cyber attack is essential in the context of Cyber Situational Awareness. The attacks often occur over an extended period of time. They involve a lot of steps and use multiple techniques for reconnaissance, exploitation, and obfuscation activities to achieve the attacker's goal. Therefore, it is not sufficient to just detect new or ongoing threats. The projection of future attack steps is deduced from already detected malicious activities. The estimates of current attack tactics may be used to assess imminent threats to critical assets [3].

This paper is based on the previous research of [7] and further develops research conducted by Ramaki et al. [8]. Based on the above-mentioned considerations, we state the following research sub-goals:


This paper is divided into seven sections. In Section 2, which is focused on related work and existing methods, the analysis of the current approaches of cyber attack prediction is provided. Section 3 presents the drawbacks of existing models and describes the suitable cyber attack model in detail. Subsequently, in Section 4, we propose the approach for early-stage detection of cyber attacks. This includes all of the necessary steps for data processing, alert aggregation, and causal relationship discovery. This section also covers the definition of Bayesian networks. After that, the model for the construction of the Bayesian network and prediction of cyber security alerts is proposed. Section 5 focuses on preprocessing and analysis of the data collection, including the creation of cyber alerts. The example cases of methods for aggregation, causal relationship discovery, and Bayesian network construction are shown. Section 6 presents and discusses the results of the presented methods. Concurrently, it describes groups of alerts and some of the attack paths. In the last section, the conclusion is provided.

#### **2. Related Works**

A large number of cyber attack prediction methods use discrete models and graph models, such as attack graphs, Bayesian networks, or Markov models.

In 1998, the attack graph was introduced by Swiler and Phillips [9]. It is a graphical representation of an attack scenario, and it has become a popular method for the formal description of attacks. It has also become a foundation for other approaches, e.g., methods using Bayesian networks, Markov models, and game-theoretical methods. The authors' goal was to create a tool for qualitative and quantitative assessment of vulnerabilities. The approach was a great success because it examined a network security state from the system perspective.

Cao et al. [10,11] proposed another variant of the attack graph: the factor graph. It is a probabilistic model that consists of random variables and factor functions. In their work, it is compared to Bayesian networks and Markov random fields. They used the factor graph to predict attacks with an accuracy of 75% over a dataset of actual security incidents (several years of reports).

The RTECA (Real-Time Episode Correlation Algorithm) was proposed in 2014 by Ramaki et al. [12]. It can be used to detect and predict multi-step attack scenarios. They explain the theoretical and functional implications of the creation of such a tool. Although they propose leveraging the attack graph, the authors have widely used causal correlations in their method.

The authors in [13] developed a method for correlating the intrusion alerts. It produces correlation graphs, which they use for creating attack strategy graphs. They presented techniques for automatically learning attack strategies from alerts raised by intrusion detection systems. These methods extracted attributes relevant to determining an attack strategy, which is represented as a directed acyclic graph, which they called an attack strategy graph. The nodes are known attacks, and the edges between them represent the order of attacks and relationships between them. They also developed a method for easier computer and network forensic analysis. It measures the similarity between sequences of alerts based on their strategies. Their research showed that the proposed methods can successfully extract invariant strategies from alert sequences and can also determine the likeness of those sequences. It can be widely used in identifying attacks that could have been missed by detection systems.

In [14], Li et al. presented another approach based on attack graphs. They described the generation of attack graphs using a data mining approach. The algorithm they proposed uses association rule mining to obtain multi-step attack scenarios from an intrusion detection system (IDS) alert database. After that, the attack graph is created. The method is also used for calculating the predictability of the attack scenario. It is used for ranking the real-time detection and can help with intrusion prediction.

Liu and Peng [15] developed a game-theoretic framework for attack prediction. The proposed method can quantitatively predict the probability of attack actions. It can also predict the strategic behavior of the attacker. Thus, it can optimize the precision of correlation-based prediction. Their paper presents the first complex framework for motive-based modeling and inference of attackers' intents. In summary, the goal of this method is the modeling and inference of attack intents, objectives, and strategies.

Wu et al. [16] used another attack prediction method, one based on Bayesian networks. Such methods are related to approaches based on attack graphs because a Bayesian network is built from an attack graph. The distinct characteristic of Bayesian networks is the conditional variables and probabilities that are considered in the model.

A Bayesian network is a probabilistic graphical model that describes the variables and the relationships between them. The network is a directed acyclic graph (DAG), where nodes represent the discrete or continuous random variables and edges depict the relationships between them. Each variable has a finite set of mutually exclusive states. The variables and directed edges form a DAG. To each variable *A* with parents *B*<sub>1</sub>, *B*<sub>2</sub>, ..., *B<sub>n</sub>*, there is attached a conditional probability table *P*(*A*|*B*<sub>1</sub>, *B*<sub>2</sub>, ..., *B<sub>n</sub>*) [2].
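
To make the definition concrete, the following minimal Python sketch encodes a hypothetical two-node network, Scan → Attempt, with a prior for the parent and a conditional probability table for the child. The node names and all probability values are assumptions chosen for illustration; they are not taken from the paper's data.

```python
# Hypothetical two-node Bayesian network: Scan -> Attempt.
# All probabilities below are made-up illustration values.
P_SCAN = {True: 0.3, False: 0.7}  # prior P(Scan)

# Conditional probability table P(Attempt | Scan):
# outer key = state of the parent (Scan), inner key = state of Attempt.
P_ATTEMPT_GIVEN_SCAN = {
    True: {True: 0.6, False: 0.4},
    False: {True: 0.1, False: 0.9},
}


def joint(scan: bool, attempt: bool) -> float:
    """Chain rule over the DAG: P(Scan, Attempt) = P(Scan) * P(Attempt | Scan)."""
    return P_SCAN[scan] * P_ATTEMPT_GIVEN_SCAN[scan][attempt]


def marginal_attempt(attempt: bool) -> float:
    """P(Attempt), obtained by summing the joint over the parent's states."""
    return sum(joint(scan, attempt) for scan in (True, False))


def posterior_scan(attempt: bool) -> float:
    """P(Scan=True | Attempt=attempt), by Bayes' rule."""
    return joint(True, attempt) / marginal_attempt(attempt)
```

With these assumed numbers, observing an attempt raises the probability that a scan took place from the prior of 0.3 to about 0.72, which is the kind of inference a Bayesian network supports.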

Ishida et al. [17] proposed forecasting techniques for the fluctuation of attacks. They used Bayesian inference to calculate the probability of an increase or decrease in the number of attacks. Two algorithms were considered in their paper, focusing on the attack cycle and on the fluctuation range of the number of events. Because the event counts of some attacks change frequently, the proposed algorithms based on Bayesian inference were used for predicting the probability, since it can calculate event counts directly. Subsequently, the authors implemented the forecasting system and tested it on real IDS events.

A real-time alert correlation and prediction framework was introduced by Ramaki et al. [8]. The system includes an online and offline mode. In online mode, the attacker's next move is predicted by the Bayesian attack graph. In the offline mode, the Bayesian attack graph is constructed of low-level alerts. The authors used the DARPA 2000 dataset for research. The prediction accuracy was found to increase with the duration of the scenario for the attack. Thus, accuracy ranged from 92.3% when processing the first attack step to 99.2% when processing the fifth attack step.

Okutan et al. [18] used signals unrelated to the target network in their Bayesian-network-based attack prediction process. The signals include mentions of attacks on Twitter or the total number of attacks reported by Hackmageddon [19]. The results showed prediction accuracy ranging from 63% to 99%, making it a promising method.

Since probabilistic graphical models are very powerful modeling and reasoning tools, Tabia et al. [20] proposed an efficient approach based on Bayesian networks. It allows the modeling of local influence relationships. It is dedicated to two main problems in alert correlation. Firstly, an approach based on Bayesian multi-nets was designed, which considered the local influence relationships to improve the prediction. The second problem occurs when multiple intrusion detection systems are in use in the network. In this case, too many of the raised alerts are redundant. Therefore, they proposed an approach for handling IDSs' reliability to reduce the number of false alerts. They based this approach on Pearl's virtual evidence [21].

Another widely used approach to predicting attacks is using Markov models. These methods were implemented along with approaches focused on attack graphs and Bayesian networks at the end of 2000. Farhadi et al. [22] proposed a complex system for alert correlation and prediction. Sequential pattern mining was used to collect the attack scenarios. These were then represented using a hidden Markov model, which in turn was used to identify the attack strategy. Markov models perform well in the presence of unobservable states and transitions and do not rely on complete knowledge. This allowed successful attack prediction even when some of the attack stages were undetected or absent.

Using hidden Markov models, Sendi et al. [23] proposed a real-time intrusion prediction system. Multi-step attacks were the main interest of their paper. An empirical review showed how their method could anticipate multi-step attacks, which is especially useful in preventing the attacker from taking control of a large number of hosts in the computer network.

In 2013, Shin et al. [24] introduced a probabilistic approach for the network-based intrusion detection system APAN, which uses a Markov chain for modeling unusual events in the network traffic to predict intrusion. Unlike other Markov-based methods, this method detects network anomalies and does not aim to predict the next step of an attack as different model-checking approaches do.

Holgado et al. [25] proposed a novel method based on a hidden Markov model for multi-step attack prediction using IDS alerts. They considered the hidden states to be particular types of attack. First, a preliminary training phase based on IDS alert information needs to be carried out. The observations are acquired by pairing the IDS alert information with a previously built database. Unsupervised and supervised learning methods are used in the training model. The prediction module can compute the best state sequence using the Viterbi and forward–backward algorithms. The success of this method was shown in the detection of the stages of distributed denial of service (DDoS) attacks, which are a major problem for detection systems nowadays.
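
As an illustration of how the Viterbi algorithm recovers the most likely sequence of hidden states from a sequence of observed alerts, consider the following minimal Python sketch. It is a generic textbook formulation, not the implementation of Holgado et al.; the state names, alert names, and all probabilities in the usage note are hypothetical.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               predecessor of s on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            # Choose the predecessor r that maximizes the path probability into s.
            p, back = max((prev[r][0] * trans_p[r][s], r) for r in states)
            layer[s] = (p * emit_p[s][obs], back)
        best.append(layer)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for layer in reversed(best[1:]):
        path.append(layer[path[-1]][1])
    path.reverse()
    return prob, path
```

For example, with assumed hidden states `"scan"` and `"exploit"`, assumed observations `"probe"` and `"shellcode"`, and hand-picked start, transition, and emission probabilities, the call `viterbi(["probe", "probe", "shellcode"], ...)` yields the state sequence that best explains the three alerts.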

Table 2 summarizes the approaches to cyber attack prediction. The first method to become popular is prediction using an attack graph. It is the most transparent and easy-to-understand model for attack step representation, and it has proved beneficial in predicting the next steps in an attack. One of the lesser-known approaches is game theory. Nevertheless, it can be very useful in detecting DDoS attacks, which are very hard to predict. More commonly used methods include machine learning models. The first of them, the Bayesian network, has excellent accuracy results. However, it is difficult to create this model from actual network traffic because attackers can create loops in security alert data during an attack. Less intuitive approaches, but with great results, are Markov chains and the hidden Markov model. These can be handy in predicting multi-step attacks.


**Table 2.** Summary of cyber attack prediction methods.

On the other hand, Markov chains and the hidden Markov model need specific information. Due to the lack of information provided by the specific type of dataset, it is not possible to determine the values of the observation probability matrix: it is not certain what the probability of an attack is given an observable alert. Therefore, we have decided to use a Bayesian network to create a method for cyber attack prediction.

#### **3. The Proposed Cyber Attack Model**

Cyber attack modeling is an important issue for securing any network and can help save money, time, and other resources. Several techniques exist for modeling and analyzing cyber attacks. An important part of understanding how a cyber attack works is to comprehend the steps that an attacker takes in order to reach their target. The goal of these approaches is to understand cyber attack characteristics to provide better security for a system. To defeat cyber attacks, it is also important to comprehend the attacker's objectives and their means. Understanding the characteristics of attacks is paramount in creating a good security strategy. Attack modeling is important in gaining a perspective on how a cyber attack can be stopped in a coordinated manner.

We considered using one of the following three models for analysis and use in our paper—the kill chain model [26], the model presented in [27], and the Diamond model [28]. The cyber kill chain model defines the path of a cyber attack. In this seven-layered model, each layer is critical for the evaluation of the attack. There are seven stages of the traditional kill chain model—reconnaissance, weaponization, delivery, exploitation, installation, command and control, and acting on the objective. This model is based on the assumption that attackers will seek to penetrate the computer system in a sequential and progressive way.

A sample anatomy of a cyber attack was also presented by Bou Harb et al. in their paper about cyber scanning [27]. The anatomy of the attack consists of the following steps—cyber scanning, enumeration, intrusion attempt, elevation of privilege, performing a malicious task, deploying malware/a backdoor, deleting forensic evidence, and exiting.

The Diamond model is one of the models for intrusion analysis. In this model, an attacker targets a victim on two main occasions, rather than using a sequence of continuous steps like the kill chain. This model consists of four elements—adversary, infrastructure, capability, and the victim [28].

Based on the analysis of the presented models, it was concluded that none of them met the requirements. The first two models contain stages that cannot be detected by IDSs. Since the Diamond model is not focused on attack steps, it is also not relevant to this research. That is why a new model needs to be developed. We introduce a hybrid model that includes four stages. This model can be seen in Figure 1.

**Figure 1.** Proposed cyber attack model.

#### *3.1. Scan*

Cyber scanning is the first step in any sophisticated attack. This step is needed so the attacker can obtain information about their target, e.g., by harvesting email addresses and login credentials or finding network vulnerabilities. There are a variety of existing methods that an attacker can use to achieve this goal. There are two types of scanning techniques: passive and active.

An attempt to gain information about a target system or computer network that can be collected without actively engaging with the system is called passive scanning [29]. This can be performed by looking up the information about employees on a company's website. These can be email addresses, personal social media accounts, or phone numbers. LinkedIn and other social media networks can store the information of employees. This can help an attacker to identify their potential goal. In addition, social media accounts of employees can provide information about technologies used by a company. After finding out enough information about a victim, the possibility of success using social engineering techniques increases. Passive scanning is the most difficult thing to detect from the perspective of intrusion detection systems.

Active scanning is an attack in which an attacker engages a targeted network to gain information about vulnerabilities in it [29]. If an attacker is using an automated tool for network scanning, the IDS is likely to detect it and raise an alert. Active scanning is very valuable to the attacker for determining any vulnerabilities that can be used. Network probing can be detected by correlating logs over a period of time, which makes it possible to determine who may be targeting the system. This paper is focused on network scanning captured by the Snort IDS, which can easily detect this stage. For example, if an attacker is using the Nmap tool to obtain information about a computer system (open ports or type of operating system), Snort can recognize a large number of various types of incoming packets and, therefore, identify the type of scan. It will raise a network-scan type of alert after recognizing this stage.
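
The idea of recognizing a scan from a large number of varied incoming packets can be illustrated with a simple sliding-window heuristic. The Python sketch below is not how Snort is implemented; the class name `ScanDetector`, the window length, and the port threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque


class ScanDetector:
    """Flag a source that touches many distinct ports within a time window."""

    def __init__(self, window_s=60.0, port_threshold=20):
        self.window_s = window_s              # assumed window length in seconds
        self.port_threshold = port_threshold  # assumed number of distinct ports
        self._events = defaultdict(deque)     # src_ip -> deque of (timestamp, dst_port)

    def observe(self, ts, src_ip, dst_port):
        """Record one packet; return True if src_ip now looks like a scanner."""
        events = self._events[src_ip]
        events.append((ts, dst_port))
        # Drop events that have fallen out of the time window.
        while events and ts - events[0][0] > self.window_s:
            events.popleft()
        distinct_ports = {port for _, port in events}
        return len(distinct_ports) >= self.port_threshold
```

A source that probes many different ports in quick succession crosses the threshold, whereas ordinary traffic to one or two ports does not. A real IDS uses considerably richer signatures and statistics than this single counter.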

#### *3.2. Delivery*

Delivery is a critical part of every cyber attack model because it is responsible for an effective cyber attack. In most cyber attacks, it is necessary to have some kind of user cooperation, like downloading and executing malicious files or visiting malicious web pages on the internet. This stage presents a high risk for the attacker because delivery leaves evidence. Multiple delivery methods can be used, such as email attachments, phishing attacks, drive-by downloads, USB/removable media, or DNS cache poisoning [30].

Snort can detect malicious code by recognizing the transmission of executable code or suspicious strings in network traffic.

#### *3.3. Attempt*

Intrusion detection means discovering that some entity (an attacker) has attempted to gain, or has already gained, unauthorized access to the computer system. An intrusion attempt is a potential deliberate unauthorized attempt to enter a computer, system, or network in order to access information, manipulate information, or render a system unreliable or unusable [31]. Intrusion attempts are experienced by victims, servers, networks, systems, and computers. These attempts can be discovered by intrusion detection systems. In the best case, the alert is a false alarm, because detection systems sometimes raise false positive alerts. To determine whether this is the case, it is necessary to look at the details of the alert. If the attempt came from an infected system in the local network, the alert can provide information about this system, for example, the IP address that caused the alert, which can later be checked for any malicious activity. The last possibility is that there was an attempt to attack from outside the local network, but it was blocked. Even then, there is no way to be certain that the attacker did not obtain any information. Detection of intrusion attempts can be helpful in defending a network, for example, by blacklisting IP addresses or updating firewall configurations.

#### *3.4. Deploying Malware/Malicious Tasks*

This stage covers the last four stages of the kill chain model [26]. In this phase, the malware is successfully installed on a computer system, or an attacker has obtained rights on the targeted device and is performing some malicious action. It starts with exploitation, which is initiated by installing the malware inside the target computer. The malware or the attacker has the required access rights. If the malware is an executable file or the malicious activity is based on code injection or an insider threat, then the installation is not required. After the malware is installed, it will start communication with the command and control server, which can be an attacker's device, a server, or even a social media network web server. If the attacker has gained access to a targeted computer system, they perform some malicious task, for example, stealing private and intellectual data from the network.
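
The four stages described in this section can be represented as an ordered enumeration, which makes it easy to ask how far an observed attack has progressed. The Python sketch below is only an illustration: the classtype strings are standard Snort classifications, but their assignment to stages in `CLASSTYPE_TO_STAGE` is an assumption and is not the mapping used in the paper.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The four stages of the proposed model, in attack order."""
    SCAN = 1
    DELIVERY = 2
    ATTEMPT = 3
    MALWARE = 4  # deploying malware / malicious tasks


# Hypothetical mapping from Snort classtype strings to stages.
# This assignment is an assumption made for illustration only.
CLASSTYPE_TO_STAGE = {
    "network-scan": Stage.SCAN,
    "attempted-recon": Stage.SCAN,
    "shellcode-detect": Stage.DELIVERY,
    "attempted-user": Stage.ATTEMPT,
    "attempted-admin": Stage.ATTEMPT,
    "trojan-activity": Stage.MALWARE,
}


def furthest_stage(classtypes):
    """Return the latest stage among the given alert classtypes, or None."""
    stages = [CLASSTYPE_TO_STAGE[c] for c in classtypes if c in CLASSTYPE_TO_STAGE]
    return max(stages) if stages else None
```

Because `Stage` is an `IntEnum`, stages compare by their position in the model, so an alert sequence that has reached only `Stage.SCAN` or `Stage.DELIVERY` can be treated as an early-stage detection.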

#### **4. The Proposed Model for Early-Stage Detection of Cyber Attacks**

This paper focuses on early-stage detection of cyber attacks and, at the same time, the prediction of the subsequent stages of the attack. These attacks consist of multiple stages and may occur over an extended period of time. In this paper, we study how probabilistic inference can be used to analyze attack scenarios based on the information about the relations between alerts. This section describes a machine learning approach using a Bayesian network to predict the next steps of cyber attacks. Algorithms for aggregation, causal relationship discovery, and Bayesian network construction are introduced in this section.

#### *4.1. Alert Aggregation*

The first step in alert processing is aggregation, since intrusion detection systems are susceptible to alert flooding, meaning that they generate a huge number of alerts. It is often hard to cope with such a large amount of data. This issue can be addressed by aggregating the alerts. Every aggregated event consists of:


It is difficult to obtain the bigger picture from a large number of probes. Aggregation reduces the number of redundant alerts generated, which simplifies alert analysis and further processing. This should not affect the information retained in the reduced alerts, because only alerts that have the same important attributes are merged. In this way, thousands of generated alerts are aggregated into a hyper-alert. Alerts are aggregated into one based on multiple attributes. All of the following properties have to be met by two alerts for them to be merged into one:


The output of aggregation is stored in a multi-dictionary object: agr\_alerts\_all. The alerts are added to this data structure based on finding the key, which contains an alert message, source IP address, and destination IP address of the alert. The pseudo-code of the algorithm (Algorithm 1) can be seen below.
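
As an illustration of the keyed aggregation described above (and not a reproduction of Algorithm 1), the following minimal Python sketch groups alerts under a key made of the alert message, source IP address, and destination IP address. The alert field names (`ts`, `msg`, `src_ip`, `dst_ip`), the structure of a hyper-alert, and the time-gap merge condition `max_gap_s` are assumptions introduced here; a `defaultdict` of lists stands in for the multi-dictionary.

```python
from collections import defaultdict


def aggregate(alerts, max_gap_s=60.0):
    """Group alerts by (message, src_ip, dst_ip) and merge those close in time."""
    # Multi-dictionary: one key maps to a list of hyper-alerts.
    agr_alerts_all = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["msg"], alert["src_ip"], alert["dst_ip"])
        groups = agr_alerts_all[key]
        # Assumed merge rule: same key and at most max_gap_s seconds
        # since the last alert already merged into the hyper-alert.
        if groups and alert["ts"] - groups[-1]["end"] <= max_gap_s:
            groups[-1]["end"] = alert["ts"]
            groups[-1]["count"] += 1
        else:
            groups.append({"start": alert["ts"], "end": alert["ts"], "count": 1})
    return agr_alerts_all
```

Under these assumptions, a burst of identical scan alerts between the same pair of hosts collapses into a single hyper-alert with a start time, an end time, and a count, while a later burst with the same key starts a new hyper-alert.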

