3.3.3. Opportunity

Similarly, opportunity can be calculated by identifying the number of open ports, the number of protocols that have unrestricted access and are therefore vulnerable, and any other factors that help an attacker gain unauthorised access to the server. The model evaluates all this information by loading the PCAP files it has captured. The model determines the open ports with the help of tools such as NMAP, NS-LOOKUP, DIPScan, etc. Combining the information about these attributes leads to the evaluation of the opportunity of the threat agent groups.
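As an illustration only, the inputs to the opportunity evaluation could be summarised as in the following sketch. The record layout (`port`, `protocol`, `state`, `restricted`) and the function name `summarise_opportunity` are assumptions made for this example; the sketch only counts open ports and unrestricted protocols and does not reproduce the scoring used by the model.

```python
def summarise_opportunity(scan_results):
    """Summarise open ports and unrestricted protocols from scan records.

    Each record is a dict with hypothetical keys: 'port', 'protocol',
    'state' ('open', 'closed' or 'filtered') and 'restricted' (bool).
    """
    open_records = [r for r in scan_results if r["state"] == "open"]
    open_ports = sorted({r["port"] for r in open_records})
    # Protocols reachable on an open port without any access restriction.
    unrestricted = sorted(
        {r["protocol"] for r in open_records if not r["restricted"]}
    )
    return {
        "open_port_count": len(open_ports),
        "open_ports": open_ports,
        "unrestricted_protocols": unrestricted,
    }


# Made-up scan records, e.g. transcribed from a port scanner's report.
scan = [
    {"port": 22, "protocol": "ssh", "state": "open", "restricted": True},
    {"port": 80, "protocol": "http", "state": "open", "restricted": False},
    {"port": 443, "protocol": "https", "state": "closed", "restricted": False},
    {"port": 21, "protocol": "ftp", "state": "open", "restricted": False},
]
summary = summarise_opportunity(scan)
print(summary)
```

A larger number of open ports and unrestricted protocols would indicate a greater opportunity for the threat agent groups; how these counts are weighted is left open here.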

#### **4. Results and Discussion**

#### *4.1. State-of-the-Art Algorithms*

Many different models are used to perform threat assessment for a network in an informational environment on specialised datasets, some of which are discussed in the previous section. Here, we illustrate all the threats identified in the network traffic captured during the penetration testing against the ESXi server of the University of Hertfordshire. To provide an overview of the current state-of-the-art ML approaches used to perform the threat assessment, we group all the threats identified in the network according to the profiles built by the PYTHON program run against the DataStream/PCAP files captured in the experiment. Similarly, the critical threat intelligence [26] feed is evaluated from the groups of threat agents based on their footprints extracted during the analysis phase of the experiment. This overview is further divided into two main categories, i.e., traditional extraction of information from the PCAP files, and machine learning techniques applied to the extracted information to generate the footprints left by the threat agents while traversing the server's network.

The PYTHON script provides the accuracy and the unique attributes of the threat agents for precision, false-positive rate (FPR), anomaly detection rate (ADR), and fault measure as initially reported [27]. Secondly, we calculated the performance of the threat agents using our proposed three-dimensional metrics, i.e., motivation, opportunity, and capability. Figure 4 shows that the input is a large number of heterogeneous PCAP files captured during the experiment. The output generated by the analysis of the PCAP files is a set of Excel sheets containing information about the threat agents, such as time (in min), Highest Protocol, TCP protocol, Source I.P. Address, Destination I.P. Address, Source port, Destination port, Total Packet Length, City, Region, Country, Latitude, Longitude, and Internet Service Provider. The specific attributes for each experiment run against the PCAP files can be retrieved from https://github.com/Gauravsbin/Excell-sheets-of-pcap-files-and-results-of-Threat-Assessment-analysis (accessed on 8 May 2021) [28]. Furthermore, with the help of these unique attributes, we can determine the capability and opportunity of the threat agents [29]. Based on the footprints left by the threat agents during the analysis, we can determine the motivation factor for the attackers.
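The step from packets to one row per unique I.P. can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the experiment: the packet records are assumed to be already decoded into dictionaries with the hypothetical keys `time_s`, `src_ip`, `dst_port` and `length`, only a few of the columns listed above are produced, and CSV text stands in for the Excel sheet so that the example needs only the standard library.

```python
import csv
import io


def build_rows(packets):
    """Aggregate packet records into one summary row per unique source IP."""
    per_ip = {}
    # Walk the packets in chronological order of their timestamps.
    for pkt in sorted(packets, key=lambda p: p["time_s"]):
        entry = per_ip.setdefault(pkt["src_ip"], {
            "first": pkt["time_s"], "last": pkt["time_s"],
            "length": 0, "ports": set(),
        })
        entry["last"] = pkt["time_s"]
        entry["length"] += pkt["length"]
        entry["ports"].add(pkt["dst_port"])
    return [
        {
            "Source IP Address": ip,
            "Time (min)": round((e["last"] - e["first"]) / 60, 2),
            "Total Packet Length": e["length"],
            "Destination ports": ";".join(str(p) for p in sorted(e["ports"])),
        }
        for ip, e in per_ip.items()
    ]


def to_csv(rows):
    """Serialise the summary rows as CSV text (a stand-in for an Excel sheet)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up packets using documentation-only IP addresses.
packets = [
    {"time_s": 0.0, "src_ip": "203.0.113.5", "dst_port": 22, "length": 60},
    {"time_s": 120.0, "src_ip": "203.0.113.5", "dst_port": 80, "length": 1500},
    {"time_s": 30.0, "src_ip": "198.51.100.7", "dst_port": 22, "length": 74},
]
rows = build_rows(packets)
print(to_csv(rows))
```

Location attributes such as City, Country and Internet Service Provider would require an external I.P. geolocation source and are deliberately left out of the sketch.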

Some of the captured PCAP files were corrupted during the experiment, and the list of crashed files produced by the PYTHON program during the investigation is shown in Figure 5. We also checked all these crashed files manually and with other analysis tools, and reached the same result: no information can be extracted from these files. The cause may be a capture issue or a connection lost on the attacker's end while the network connection was being established. The time complexity of generating the unique I.P.s with their information attributes can also be evaluated from this experiment. This is a unique feature of this model compared with existing models and methodologies. It is made possible by the use of semi-automatic approaches for the threat assessment of networks in a near-real-time informational environment.
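One simple way to separate unreadable captures from usable ones before analysis is to check the magic number at the start of each file. The sketch below is an assumption about how such a first-pass check could look, not the check performed by the PYTHON program in the experiment; the function names are hypothetical. It recognises the classic pcap magic numbers (microsecond and nanosecond variants, in either byte order) and the pcapng block type, and it does not validate anything beyond the first four bytes.

```python
import struct

# Classic pcap magic numbers as read little-endian: microsecond and
# nanosecond timestamp variants, each in native and byte-swapped form.
PCAP_MAGICS = {0xA1B2C3D4, 0xD4C3B2A1, 0xA1B23C4D, 0x4D3CB2A1}
# Block type of the pcapng Section Header Block (same in both byte orders).
PCAPNG_MAGIC = 0x0A0D0D0A


def looks_like_capture(header: bytes) -> bool:
    """Return True if the first four bytes match a known capture format."""
    if len(header) < 4:
        return False
    (magic,) = struct.unpack("<I", header[:4])
    return magic in PCAP_MAGICS or magic == PCAPNG_MAGIC


def partition_captures(paths):
    """Split file paths into (readable, crashed) lists."""
    readable, crashed = [], []
    for path in paths:
        try:
            with open(path, "rb") as fh:
                ok = looks_like_capture(fh.read(4))
        except OSError:
            ok = False  # Missing or unreadable files count as crashed.
        (readable if ok else crashed).append(path)
    return readable, crashed


# In-memory demonstration with a valid pcap magic and with arbitrary bytes.
print(looks_like_capture(struct.pack("<I", 0xA1B2C3D4) + b"\x00" * 20))
print(looks_like_capture(b"not a capture"))
```

A file that passes this check can still be truncated or damaged further in, so a full parse wrapped in error handling would remain necessary.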

#### *4.2. Workflow and Comparative Experiments*

As per the previous discussion, the output is generated in a semi-automatic manner in the form of Excel sheets containing the unique attributes of the threat agents. To determine the motivation, opportunity, and capability of the threat agent groups, we applied machine learning techniques to the previous phase's output to provide a semi-automatic feature to the model [30]. This novel approach helps us reduce the complexity of threat assessment for the networks of influential organisations. This paper also shows the process of using ML libraries of PYTHON on TensorFlow and automatic techniques of the JUPYTER notebook to identify the unique tuples of DataStream/PCAP files. This approach mainly depends on the chronological order of packets in the PCAP files. Here, we first group all the unique I.P.s extracted from the raw PCAP files captured from the network with the help of Wireshark. The unique I.P.s are grouped based on the attributes and characteristic features identified during the analysis and implementation of DataStream.

**Figure 4.** Workflow for raw PCAP file traffic-based feature extraction and experimental results for Unique I.P. addresses with Time complexity.

Similarly, the output generated in the previous phase is used as input for the second phase of analysis and implementation. This process is known as the profiling of threat agents. In the previous stage, we generated an Excel sheet for each captured PCAP file, consisting of helpful information such as the open ports, the layer on which the threat agents operate, the time spent on the network, the location of the threat agent, etc.

**Figure 5.** Workflow for raw PCAP file and experimental results for Unique I.P. addresses with Time complexity.

Based on this analysis, we create one more file, an IPYNB (Interactive Python Notebook) file, commonly known as a Jupyter notebook. Jupyter is a free, open-source, interactive web tool known as a computational notebook, in which researchers can combine software code, computational output, explanatory text, and multimedia resources in a single document. A Jupyter Notebook document is a JSON document, following a versioned schema, containing an ordered list of input/output cells which can hold code, text (using MARKDOWN), mathematics, plots, and rich media, and its file name usually ends with the IPYNB extension [31–33]. This file contains an algorithm that performs data clustering of the unique I.P.s found in the Excel sheets of the previous phase. The clusters of I.P.s are formed based on the number of I.P.s involved in a particular type of attack, and the type of attack is determined from a number of factors identified during the analysis. The IPYNB file takes all the unique I.P.s as input and extracts information such as the layer on which they operate, the types of ports and protocols compromised when they attack the source I.P.s of end-users, and the information they extracted from the particular environment of the V.M.s, etc. Based on the analysis, the model groups all the threat agents into particular categories according to the attacking behaviours identified during the analysis.
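To make the grouping idea concrete, the sketch below assigns unique I.P.s to groups by a simple behavioural key, here the pair of protocol and destination port. This rule-based grouping is a deliberately simple stand-in chosen for illustration; it is not the clustering algorithm contained in the IPYNB file, and the column names `src_ip`, `protocol` and `dst_port` are assumptions.

```python
from collections import defaultdict


def group_by_behaviour(rows):
    """Group unique source IPs by the (protocol, destination port) they used."""
    groups = defaultdict(set)
    for row in rows:
        key = (row["protocol"], row["dst_port"])
        groups[key].add(row["src_ip"])  # A set keeps each IP only once.
    return {key: sorted(ips) for key, ips in groups.items()}


# Made-up rows, as they might be read from one of the Excel sheets.
rows = [
    {"src_ip": "203.0.113.5", "protocol": "TCP", "dst_port": 22},
    {"src_ip": "198.51.100.7", "protocol": "TCP", "dst_port": 22},
    {"src_ip": "203.0.113.5", "protocol": "TCP", "dst_port": 22},
    {"src_ip": "192.0.2.9", "protocol": "UDP", "dst_port": 53},
]
for key, ips in group_by_behaviour(rows).items():
    print(key, len(ips), ips)
```

The size of each group gives the number of unique I.P.s showing the same behaviour, which is the quantity plotted on the y-axis of the per-file histograms.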

Figures 6–8 show the histograms produced by the IPYNB algorithm for each Excel sheet generated during the first phase. Note that we demonstrate the experimental results for only three PCAP files; the same can be shown for the other PCAP files. The output generated by the IPYNB file has two parts. The first part generates three histograms for every file in the output Excel sheets, and the second part builds histograms from the cumulative data of all the files in the folder. The three per-file histograms share the same quantity on the y-axis, i.e., the number of unique I.P.s. Figures 6a, 7a and 8a show the protocols used by the attackers and the number of unique I.P.s using these protocols. Figures 6b, 7b and 8b show the ports targeted on the host and the number of unique I.P.s that targeted them; this histogram highlights the vulnerable ports. Figures 6c, 7c and 8c show the number of unique I.P.s as a function of time spent; this histogram highlights how much time an attacker usually spends attacking a host. These histograms for the protocols, ports, and time spent on the network help evaluate the three main attributes of the threat agents, i.e., motivation, opportunity, and capability. Once we identify the ports open during network access, we can determine the opportunity available to the groups of threat agents during the penetration of the network. In the same way, the histograms help us identify the protocols accessed by the threat agents and evaluate the attackers' potential capability and the level of skill acquired by the threat actors.
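The three per-file histograms share one computation: count the unique I.P.s that fall into each bucket of a chosen attribute. The following sketch computes those counts without plotting them; the column names and the five-minute bin width for time spent are assumptions made for the example and are not taken from the IPYNB file.

```python
from collections import defaultdict


def unique_ip_histogram(rows, key):
    """Count unique source IPs per bucket, where key(row) names the bucket."""
    buckets = defaultdict(set)
    for row in rows:
        buckets[key(row)].add(row["src_ip"])
    return {bucket: len(ips) for bucket, ips in sorted(buckets.items())}


def time_bin(row, width_min=5):
    """Return the start (in minutes) of the time bin containing the row."""
    return int(row["time_min"] // width_min) * width_min


# Made-up rows; one IP may appear on several rows if it used several ports.
rows = [
    {"src_ip": "203.0.113.5", "protocol": "TCP", "dst_port": 22, "time_min": 3.0},
    {"src_ip": "203.0.113.5", "protocol": "TCP", "dst_port": 80, "time_min": 3.0},
    {"src_ip": "198.51.100.7", "protocol": "TCP", "dst_port": 22, "time_min": 12.5},
    {"src_ip": "192.0.2.9", "protocol": "UDP", "dst_port": 53, "time_min": 0.4},
]

by_protocol = unique_ip_histogram(rows, key=lambda r: r["protocol"])
by_port = unique_ip_histogram(rows, key=lambda r: r["dst_port"])
by_time = unique_ip_histogram(rows, key=time_bin)
print(by_protocol)
print(by_port)
print(by_time)
```

Each resulting dictionary maps a bucket to a bar height, so it can be passed directly to any plotting library to draw the corresponding bar chart.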

**Figure 6.** Experimental Results for PCAP file (AF 26.11.2014). (**a**) Number of Unique I.P.s vs. Protocol being used; (**b**) Number of Unique Attackers vs. Vulnerable Ports; (**c**) Number of Unique I.P.s vs. Time Spent.

From this analysis, we can identify the particular groups of threat agents that use a specific protocol to penetrate the network. For example, in Figure 8, the TCP protocol is used by most of the I.P.s and mainly targets the network layers. We can therefore conclude that, in this analysis, the threat agents primarily carried out distributed denial-of-service (DDoS) attacks.

The histograms in Figure 9 are based on the accumulated data in the output Excel sheets. They represent the number of packets generated during penetration testing, the protocols or layers used by the threat agents, and the vulnerable ports targeted to achieve their goal. Figure 9a shows how many packets are sent to each port on the host machine, and Figure 9b shows the volume of packets for every protocol used to attack the host.
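The cumulative view can be illustrated by summing packet counts over all sheets, once per destination port and once per protocol. The sketch below assumes each sheet has already been read into a list of rows with the hypothetical columns `dst_port`, `protocol` and `packets`; it is an illustration of the aggregation, not the code behind Figure 9.

```python
from collections import Counter


def cumulative_packets(sheets):
    """Sum packet counts over all sheets, per destination port and per protocol."""
    per_port, per_protocol = Counter(), Counter()
    for rows in sheets:
        for row in rows:
            per_port[row["dst_port"]] += row["packets"]
            per_protocol[row["protocol"]] += row["packets"]
    return per_port, per_protocol


# Two made-up sheets, one per analysed PCAP file.
sheets = [
    [
        {"dst_port": 22, "protocol": "TCP", "packets": 40},
        {"dst_port": 53, "protocol": "UDP", "packets": 5},
    ],
    [
        {"dst_port": 22, "protocol": "TCP", "packets": 10},
        {"dst_port": 80, "protocol": "TCP", "packets": 25},
    ],
]
per_port, per_protocol = cumulative_packets(sheets)
print(per_port.most_common())
print(per_protocol.most_common())
```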

**Figure 8.** Experimental Results for PCAP file (AR 17.12.2014). (**a**) Number of Unique I.P.s vs. Protocol being used; (**b**) Number of Unique Attackers vs. Vulnerable Ports; (**c**) Number of Unique I.P.s vs. Time Spent.

Figure 10 shows histograms relating the total data collected from each unique I.P., the total time spent on the network, and the protocols used to attack the network. Figure 10a highlights the amount of time spent by the attacker for every protocol used to attack the host. In Figure 10b, the data points for time spent are highlighted in blue, whereas the data points for total packets sent are highlighted in red. Even though these quantities have different units, the plot gives a relative visual impression of how the time spent by the attacker varies with the number of packets sent for the same protocols.

**Figure 9.** Histogram for (**a**) Total Packets sent vs. Vulnerable Ports, (**b**) Total Packets sent vs. Protocol used by Attackers.

**Figure 10.** Histogram for (**a**) Time Spent vs. Protocol used by Attackers, (**b**) Protocol used by Attackers vs. Total Packets sent.

#### **5. Conclusions and Future Work**

Threats and the risks posed by threat agents are a growing concern in the threat assessment of networks for organisations and businesses. Security risk management practitioners need a mechanism to explore these risks and enforce countermeasures based on threat agent profiling and on the critical threat intelligence feed provided to them. This paper presents a semi-automatic model for the threat assessment of PCAP files captured by semi-automatic tools during the penetration testing run against the ESXi server of the University of Hertfordshire. The framework captured data between 2012 and 2019, which illustrates the value of the assets stored on the server and the motivation, opportunity, and capability of the threat agents while accessing the network. We evaluate the situational awareness data through this semi-automatic threat assessment model by exploring the threat profiles for the historically captured data with the aid of tools. Furthermore, we provide threat agent practitioners with an idea of using an automatic model for the threat assessment of a network. The findings of this research will support decision makers, management, and software developer practitioners in building threat agent profiles for their historical data. Critical Threat Intelligence feeds for the threat agent groups might be helpful for the evaluation of new threats found in the network. In the future, we aim to build an automatic machine learning-based threat and vulnerability analysis security reference model as a security risk management tool to evaluate the security needs of networks with the sequential requirements of a near-real-time informational environment.

**Author Contributions:** Conceptualisation, G.S. and S.V.; methodology, G.S.; software, G.S.; validation, G.S., S.V., and C.M.; formal analysis, G.S.; investigation, G.S.; resources, S.V.; data curation, G.S.; writing—original draft preparation, G.S.; writing—review and editing, G.S., S.V., C.M., N.A., and S.K.; visualisation, G.S.; supervision, S.V., C.M. and N.A.; project administration, S.V.; funding acquisition, G.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** The data presented in this study are available upon request from the corresponding author and are available on GitHub [33].

**Acknowledgments:** We are grateful for the anonymous reviewers' hard work and comments that allowed us to improve the quality of this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.
