*Article* **CVE2ATT&CK: BERT-Based Mapping of CVEs to MITRE ATT&CK Techniques**

**Octavian Grigorescu <sup>1</sup>, Andreea Nica <sup>1</sup>, Mihai Dascalu <sup>1,2,\*</sup> and Razvan Rughinis <sup>1,2</sup>**


**Abstract:** Since cyber-attacks are ever-increasing in number, intensity, and variety, a strong need for a global, standardized cyber-security knowledge database has emerged as a means to prevent and fight cybercrime. Attempts already exist in this regard. The Common Vulnerabilities and Exposures (CVE) list documents numerous reported software and hardware vulnerabilities, thus building a community-based dictionary of existing threats. The MITRE ATT&CK Framework describes adversary behavior and offers mitigation strategies for each reported attack pattern. While extremely powerful on their own, the tremendous extra benefit gained when linking these tools cannot be overlooked. This paper introduces a dataset of 1813 CVEs annotated with all corresponding MITRE ATT&CK techniques and proposes models to automatically link a CVE to one or more techniques based on the text description from the CVE metadata. We establish a strong baseline that considers classical machine learning models and state-of-the-art pre-trained BERT-based language models while counteracting the highly imbalanced training set with data augmentation strategies based on the TextAttack framework. We obtain promising results, as the best model achieved an F1-score of 47.84%. In addition, we perform a qualitative analysis that uses LIME explanations to point out limitations and potential inconsistencies in CVE descriptions. Our model plays a critical role in finding kill chain scenarios inside complex infrastructures and enables the prioritization of CVE patching by the threat level. We publicly release our code together with the dataset of annotated CVEs.

**Keywords:** MITRE ATT&CK Matrix; techniques classification; BERT-based multi-labeling

## **1. Introduction**

Cyberspace has become a fundamental component of everyday activities, being the core of most economic, commercial, cultural, social, and governmental interactions [1]. As a result, the ever-growing threat of cyber-attacks not only implies a financial loss, but also jeopardizes the performance and survival of companies, organizations, and governmental entities [2]. It is vital to recognize the increasing pace of cybercrime as the estimated monetary cost of cybercrime skyrocketed from approximately \$600 billion in 2018 to over \$1 trillion in 2020 [3]. This effect has increased even further due to the COVID-19 pandemic [4].

In this context, the necessity for better cyber information sources and a standardized cybersecurity knowledge database is of paramount importance, as a means to identify and combat the emerging cyber-threats [5]. Efforts to build such globally accessible knowledge bases already exist. MITRE Corporation set up two powerful public sources of cyber threat and vulnerability information, namely the Common Vulnerabilities and Exposures list and the MITRE ATT&CK Enterprise Matrix.

**Citation:** Grigorescu, O.; Nica, A.; Dascalu, M.; Rughinis, R. CVE2ATT&CK: BERT-Based Mapping of CVEs to MITRE ATT&CK Techniques. *Algorithms* **2022**, *15*, 314. https://doi.org/10.3390/a15090314

Academic Editor: Frank Werner

Received: 10 August 2022; Accepted: 27 August 2022; Published: 31 August 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The *Common Vulnerabilities and Exposures* list is a community-based dictionary of standardized names for publicly known cybersecurity vulnerabilities. Its effort converges toward making the process of identifying, finding, and fixing software vulnerabilities more efficient by providing a unified naming system [6]. Despite their benefits and widespread usage, CVE entries offer little to no information regarding mitigation techniques or existing defense strategies that could be employed to address a specific vulnerability. Moreover, the meta-information of a CVE does not include sufficient classification qualities, resulting in sub-optimal usage of this database. Better classification would translate to mitigating a larger set of vulnerabilities, since they can be grouped and addressed together [7].

The *MITRE ATT&CK Enterprise Matrix* links techniques to tangible configurations, tools, and processes that can be used to prevent a technique from having a malicious outcome [8]. By associating an ATT&CK technique to a given CVE, more context and valuable information for the CVE can be extracted, since CVEs and MITRE ATT&CK techniques have complementary value. Furthermore, security analysts could discover and deploy the necessary measures and controls to monitor and avert the intrusions pointed out by the CVE and cluster the CVEs by technique [9].

Even though linking CVEs to the MITRE ATT&CK Enterprise Matrix would add massive value to the cybersecurity community, these two powerful tools are currently separate. However, manually mapping all 189,171 [10] currently recorded CVEs to one or more of the 192 different techniques in the MITRE ATT&CK Enterprise Matrix is a non-trivial task; hence, automated models are needed to map existing entries to their corresponding techniques. In addition, even if new CVEs were manually labeled, an initial pre-labeling by a machine learning model before expert validation would save time. Moreover, such a model could suggest technique labels for zero-day vulnerabilities, which would be extremely helpful for security teams.

The ATT&CK matrix supports a better understanding of vulnerabilities and of what an attacker could achieve by exploiting a certain vulnerability. ATT&CK technique details, such as detection and mitigation, are useful for system administrators, SecOps, or DevSecOps teams to obtain a risk assessment report in a short period of time while generating a remediation plan for discovered vulnerabilities. The Center for Threat-Informed Defense team has created a very useful methodology [11] that helps the community build a more powerful threat intelligence database. By adopting this methodology, an organization's defenders can bridge vulnerability and threat management and obtain more reliable and consistent risk assessment reports [12].

Baker [12] highlights the importance of combining CVEs with the ATT&CK framework to achieve threat intelligence. Years ago, it was considerably harder for security teams to understand the attack surface, thus reducing their capacity to protect the organization against cyber attacks. With the emergence of the ATT&CK project, the security teams have a better overview of the CVEs based on known attack techniques, tactics, and procedures.

Vulnerability management can be divided into three categories: the "Find and fix" game, the "Vulnerability risk" game, and the "Threat vector" game. The first is a traditional approach in which vulnerabilities are prioritized by CVSS score; it is applicable to small organizations with less dynamic assets. The second consists of risk-based vulnerability management, where organizational context and threat intelligence (such as whether a CVE is exploited in the wild) are considered; it applies to organizations that have security teams but face too many CVEs. The "Threat vector" game includes understanding how attackers might exploit vulnerabilities, accounting for the MITRE ATT&CK mappings between CVEs and techniques, tactics, and procedures. This third category is the most efficient model of threat intelligence, with inputs delivered to the vulnerability risk management process from cyber-attacks that have occurred and are trending. As such, security teams should take into account not only risk when building the vulnerability management program, but also threat intelligence, to better understand vulnerabilities and to discover attack chains within the network [13].

The aim of this paper is to develop a model that leverages the textual description found in CVE metadata to create strong correlations with the MITRE ATT&CK Enterprise Matrix techniques. To achieve this goal, a data collection methodology is developed to build our manually labeled CVE corpus containing 1813 entries. Moreover, state-of-the-art Natural Language Processing (NLP) techniques that consider BERT-based architectures are employed to create robust models. We also address the problem of a severely imbalanced dataset by developing an oversampling method based on adversarial attacks.

Efforts have already been undertaken to interconnect CVEs with the MITRE ATT&CK Framework. However, existing solutions have limitations, reflecting the research gap in the literature regarding the identification of correspondences between CVEs and the corresponding techniques from the MITRE ATT&CK Enterprise Matrix. The following subsections detail existing state-of-the-art approaches relevant to our task.

## *1.1. BRON*

BRON [9] is a bi-directional aggregated data graph that allows relational path tracing between MITRE ATT&CK Enterprise Matrix tactics and techniques, Common Weakness Enumerations (CWE), Common Vulnerabilities and Exposures (CVE), and the Common Attack Pattern Enumeration and Classification list (CAPEC). BRON unifies all these scattered cyber-security knowledge sources into a single graph framework by data-mining the relational links between them; queries performed on the resulting graph representation then connect the CVE list to MITRE ATT&CK by traversing these relational links.

Each information source has a specific node type, interconnected by external linkages as edges. MITRE ATT&CK techniques are linked to Attack Patterns. Attack Patterns are connected to CWE Weaknesses, which have relational links to a CVE entry. Thus, BRON can respond to several different queries, including linking the CVE list to the MITRE ATT&CK Framework.

However, the model falls short as it does not connect new CVEs to MITRE ATT&CK Enterprise Matrix techniques, but it uses already existing information and links to create a more holistic overview of the already available knowledge. It does not solve our problem, since the main aim is to correctly label new emergent samples.

## *1.2. CVE Transformer (CVET)*

The CVE Transformer (CVET) [14] is a model that combines the benefits of using the pre-trained language model RoBERTa with a self-knowledge distillation design used for fine-tuning. Its main aim is to correctly associate a CVE with one of 10 tactics from the MITRE ATT&CK Enterprise Matrix. Although the CVET approach obtains increased performance in F1-score, it is unable to identify all 14 tactics from the MITRE ATT&CK Matrix on the training knowledge base.

Moreover, the problem of technique labeling is much more complex than tactic mapping, since the number of available techniques is more than ten times higher (i.e., there are 14 tactics and 192 different techniques in the MITRE ATT&CK Enterprise Matrix). Additionally, tactic labeling can be viewed as a subproblem of our main goal, given the correlation between tactics and techniques. Overall, technique labeling is out of scope for the CVE Transformer project.

## *1.3. Unsupervised Labeling Technique of CVEs*

The unsupervised labeling technique introduced by Kuppa et al. [15] considers a multi-head deep embedding neural network model that learns the association between CVEs and MITRE ATT&CK techniques. The proposed approach identifies specific regular expressions in existing threat reports and then uses the cosine distance to measure the similarity between ATT&CK technique vectors and the text description provided in the CVE metadata. This technique manages to map only 17 techniques out of the existing 192; as such, multiple techniques are not covered by the proposed model. Thus, a supervised approach for technique labeling might improve the recognition rate among techniques.

## *1.4. Automated Mapping to ATT&CK: The Threat Report ATT&CK Mapper (TRAM) Tool*

Threat Report ATT&CK Mapping (TRAM) [16] is an open-source tool developed by *The Center for Threat-Informed Defense* that automates the process of mapping MITRE ATT&CK techniques on cyber-threat reports. TRAM utilizes classical pre-processing techniques (i.e., tokenization, stop-words removal, lemmatization) [17] and applies Logistic Regression on the bag-of-words representations. Since the tool maps any textual input on MITRE ATT&CK techniques, it could, in theory, be adapted to link the CVE list to the MITRE ATT&CK Framework by simply using it on the CVE textual description. However, due to its simplicity, the tool has serious limitations when it comes to its capacity to learn the right association between text descriptions and techniques. In addition, TRAM labels each sentence individually, failing to capture dependencies in textual passages. In this way, the overall meaning of the text is lost.

The main contributions of this paper are as follows: (1) a publicly released dataset of 1813 CVEs annotated with all corresponding MITRE ATT&CK techniques; (2) a strong baseline for automatically mapping CVEs to techniques, considering both classical machine learning models and pre-trained BERT-based language models; (3) a data augmentation strategy based on the TextAttack framework to counteract the highly imbalanced training set; and (4) a qualitative analysis using LIME explanations that points out limitations and potential inconsistencies in CVE descriptions.


## **2. Method**

This section provides an overview of our proposed methodology, focusing on: (1) data collection and building the corpus needed for training the models; and (2) exploring various neural architectures for mapping CVEs to ATT&CK techniques.

## *2.1. Our Labeled CVE Corpus*

## 2.1.1. Data Collection

Since no public datasets exist that map a CVE to all corresponding ATT&CK techniques, the first step consisted of building our own labeled corpus of 1813 CVEs, which was obtained using two different methods.

First, we manually created a knowledge base of 993 labeled CVEs by individually mapping each CVE to tactics and techniques from the MITRE ATT&CK Enterprise Matrix. We extracted CVEs published between 2020 and 2022 for relevance. The labeling process was performed by 4 experts to ensure consistency, following the standardized approach proposed by the *Mapping MITRE ATT&CK to CVEs for Impact* methodology [11] and a set of common general guidelines.

The *Mapping MITRE ATT&CK to CVEs for Impact* methodology consists of three steps. The first one is to identify the type of vulnerability (e.g., cross-site scripting, buffer overflow, SQL injection) based on the vulnerability type mappings. The next step is to find the functionality to which the attacker gains access by exploiting the CVE. The final step refers to determining the exploitation technique using the provided tips that offer details about the necessary steps to exploit a vulnerability. Our methodology started from these steps and added other common general guidelines before labeling the tactics and techniques, such as searching for more details about a CVE on security blogs to obtain more relevant insights, or analyzing databases (e.g., the Vulnerability Database [21] and the Exploit Database— Exploits for Penetration Testers, Researchers, and Ethical Hackers [22]) for useful inputs about CVEs.

The labeling was performed by three 4th year undergraduate students in Computer Science with background courses in security, networking, and operating systems, and one Ph.D. student in Computer Science with 5+ years of experience in information security in the industry who provided guidance and helped reach consensus. The entire annotation process was overseen by a professor in cyber security. The dataset can be found on TagTog [19] and is split into the following collections:


Second, besides the manual labeling process, we automatically extracted 820 already labeled CVEs provided by *Mapping MITRE ATT&CK to CVEs for Impact* [11] and imported them into our TagTog project. The provided CVEs date from 2014 to 2019; thus, there is no overlap with the manually annotated CVEs.

Each CVE entry has an associated ID, a rich text description, and 14 labels denoting the possible tactics found in the MITRE ATT&CK Enterprise Matrix, under which the corresponding techniques are annotated. Data extraction from TagTog can be performed automatically using the TagTog API [23].

## 2.1.2. Data Analysis

The size of our corpus is explained by the increased difficulty of annotating a CVE and by the impossibility of finding other previously built repositories of CVEs mapped to both tactics and techniques of the MITRE ATT&CK Enterprise Matrix. As discussed previously, more than 189,171 CVEs currently exist, and our dataset only captures a fraction of them. Moreover, the distribution of CVEs per technique is highly imbalanced (see Figure 1) because the CVEs were collected based on their release date, without any further considerations. About 77% of the collected CVEs cover 5 techniques (*Exploit Public-Facing Application*, *Exploitation for Client Execution*, *Command and Scripting Interpreter*, *Endpoint Denial of Service*, and *Exploitation for Privilege Escalation*).

Figure 1 also shows that a large number of techniques contain a far too small number of examples for effective learning. As such, a threshold of a minimum of 15 examples per technique was imposed. In this manner, out of the 192 different techniques from the MITRE ATT&CK Enterprise Matrix, only 31 were considered in follow-up experiments. The CVEs that are not mapped to any of the 31 considered techniques were also discarded, leaving a total of 1665 annotated examples in the dataset. Figure 2 depicts the new distribution of CVEs based on technique after applying the threshold.
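The thresholding step above can be sketched in a few lines of Python; `cve_labels` (a dict from CVE id to its set of technique labels), the toy technique IDs, and the threshold of 2 used in the example are illustrative, not an excerpt from the released dataset (the paper uses a threshold of 15):

```python
from collections import Counter

def filter_rare_techniques(cve_labels, min_examples=15):
    """Keep only techniques with at least `min_examples` CVEs,
    then drop CVEs left with no remaining labels."""
    counts = Counter(t for labels in cve_labels.values() for t in labels)
    kept = {t for t, c in counts.items() if c >= min_examples}
    return {
        cve: labels & kept
        for cve, labels in cve_labels.items()
        if labels & kept
    }

# Toy example with a threshold of 2 (hypothetical mapping):
toy = {
    "CVE-A": {"T1190", "T1059"},
    "CVE-B": {"T1190"},
    "CVE-C": {"T1499"},  # its only technique has a single example
}
result = filter_rare_techniques(toy, min_examples=2)
# CVE-C is dropped entirely; T1059 and T1499 fall below the threshold
```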

## 2.1.3. Data Augmentation

The severe data imbalance which characterizes our CVE dataset can potentially degrade the performance of many machine learning models since few techniques have high prevalence, while the others have low or very low frequencies [24].

One scheme for dealing with class imbalance is oversampling [24]. This data-level approach randomly duplicates examples from low-frequency classes to rebalance the class distribution. However, this can result in overfitting; we therefore opted to use the TextAttack framework [25] to generate adversarial examples instead. TextAttack is a Python framework designed for adversarial attacks, data augmentation, and adversarial training in NLP. An adversarial attack finds a sequence of transformations to perform on an input text such that the perturbations adhere to a set of grammar and semantic constraints and the attack is successful [26]. These transformations can be reused to expand the training dataset by producing perturbed versions of the existing samples. As such, the TextAttack framework offers various pre-packaged recipes for data augmentation [27].

**Figure 1.** The distribution of CVEs among techniques.

**Figure 2.** The distribution of CVEs among the 31 considered techniques after applying the threshold.

We chose the EasyDataAugmenter (EDA) for augmenting the CVE dataset, which performs four simple but powerful operations on the input texts: synonym replacement, random insertion, random swap, and random deletion. EDA significantly boosts performance and shows particularly strong results for smaller datasets [28], which makes it the perfect candidate for oversampling our labeled CVE corpus. Moreover, EDA does not perform major alterations of the content and is not as computationally expensive as other recipes, such as CLAREAugmenter, while providing satisfactory results on our CVE corpus.
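For illustration, three of EDA's four operations can be re-implemented with the standard library alone; synonym replacement is omitted here, as it requires a thesaurus such as WordNet, and the actual experiments used TextAttack's `EasyDataAugmenter` rather than this toy sketch:

```python
import random

def random_swap(words, n=1, rng=random):
    """Swap `n` random pairs of word positions."""
    words = words[:]
    for _ in range(n):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1, rng=random):
    """Drop each word with probability `p`, keeping at least one word."""
    kept = [w for w in words if rng.random() > p]
    return kept or [rng.choice(words)]

def random_insertion(words, n=1, rng=random):
    """Insert `n` copies of random words at random positions."""
    words = words[:]
    for _ in range(n):
        words.insert(rng.randrange(len(words) + 1), rng.choice(words))
    return words

rng = random.Random(0)  # seeded for reproducibility
text = "attacker executes arbitrary code via crafted request".split()
augmented = " ".join(random_swap(text, rng=rng))
```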

Since one CVE can be mapped to multiple techniques at the same time, rare techniques in the dataset are usually found in combination with highly prevalent techniques. Using all CVEs mapped to a specific technique for augmentation would only preserve the class imbalance, generating new samples for both low-frequency and high-frequency techniques. To counter this undesired effect, EasyDataAugmenter was fed only with CVEs that were mapped to a single technique, thus producing new samples only for the desired class.
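The selection rule described above reduces to a one-line filter; the dictionary below is a hypothetical example, not an excerpt from the corpus:

```python
def augmentation_candidates(cve_labels, technique):
    """CVEs mapped to exactly one technique: safe to oversample
    without also inflating co-occurring high-frequency classes."""
    return [cve for cve, labels in cve_labels.items()
            if labels == {technique}]

toy = {
    "CVE-A": {"T1190", "T1059"},  # multi-label: excluded from augmentation
    "CVE-B": {"T1606"},           # single rare label: candidate
}
candidates = augmentation_candidates(toy, "T1606")
```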

Figure 3 displays the distribution of CVEs per technique after performing the data augmentation. The initial severe imbalance among techniques was scaled down, but still exists due to the reduced number of single-technique CVEs for low-frequency techniques.

## *2.2. Machine Learning and Neural Architectures*

Our main goal is to create a model that can accurately predict all the techniques that can be mapped to a specific CVE using its text description. We tackled this task as a multi-label learning problem, as each CVE may be assigned to a subset of techniques. Given the challenging nature of the multi-label paradigm [29], we experimented with multiple state-of-the-art machine learning models to find the most predictive architecture.

**Figure 3.** The distribution of CVEs among the 31 considered techniques after data augmentation.

## 2.2.1. Classical Machine Learning

In order to establish a strong baseline, we also considered classical machine learning algorithms applied to bag-of-words representations. All CVE descriptions were preprocessed to remove noise and retain only the relevant words. The pipeline from the spaCy [30] open-source NLP library was employed, which included the following steps: text tokenization; removal of stopwords, punctuation, and numbers; and lemmatization of the remaining tokens. The tokens were afterward converted to bag-of-words representations using Term Frequency-Inverse Document Frequency (TF-IDF).
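As a sketch of the TF-IDF weighting itself (using the classic idf = log(N/df) variant; library implementations often apply additional smoothing and normalization), assuming the tokens have already been lemmatized and cleaned:

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns one {token: weight} dict per
    document, with tf = count/length and idf = log(N / df)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: one count per document
    out = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        out.append({t: (c / total) * math.log(n / df[t])
                    for t, c in tf.items()})
    return out

docs = [["buffer", "overflow", "remote"],
        ["sql", "injection", "remote"]]
weights = tfidf(docs)
# "remote" appears in every document, so its idf (hence its weight) is 0
```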

## Multi-Label Learning

The aim of problem transformation methods is to reduce the complexity of multi-label learning by converting the multi-label problem into one or more single-label classification tasks [31].

Given that the interconnection between techniques is worth taking into account when labeling a CVE, since it can provide further insights into general adversarial patterns, we experimented with different problem transformation methods to find the one that best captures the relations between labels: *Binary Relevance* (one independent binary classifier per label), *Classifier Chains* (binary classifiers linked so that each receives the previous predictions as additional features), and *Label Powerset* (each distinct combination of labels becomes a single class).
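As an illustration, the Label Powerset transformation (the best-performing strategy in our experiments, as reported in Section 3) maps each distinct combination of labels to a single atomic class:

```python
def label_powerset(y):
    """Map each distinct set of labels to one atomic class id,
    turning a multi-label problem into a single-label one."""
    classes = {}
    transformed = []
    for labels in y:
        key = frozenset(labels)
        transformed.append(classes.setdefault(key, len(classes)))
    return transformed, classes

# Hypothetical technique label sets for three CVEs:
y = [{"T1190"}, {"T1190", "T1059"}, {"T1190"}]
single, mapping = label_powerset(y)
# single == [0, 1, 0]: two distinct label sets become two atomic classes
```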


## Naive Bayes Classifiers

The Naive Bayes classifier makes the simplifying assumption that features are conditionally independent given a class. Even though the assumption of independence is generally unrealistic, Naive Bayes performs well in practice, competing with more sophisticated classifiers, especially for text classification [34]. We chose to experiment with the Naive Bayes variant for multinomially distributed data because of the model's simplicity and relatively good results.

## Support Vector Machines

A Support Vector Machine (SVM) searches for the maximum-margin hyperplane that separates two classes of examples. Because SVMs cope well with high-dimensional spaces and have performed successfully on a number of distinct classification tasks [35], we decided to use them in our experiments for CVE technique labeling. We performed an exhaustive search over specified parameter values using GridSearchCV [36] to determine the optimal configuration.
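A minimal sketch of such a grid search, assuming scikit-learn and synthetic data; the grid values are illustrative, although a linear kernel with C = 32 is the configuration reported as best in Section 3:

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

# Synthetic stand-in for the TF-IDF feature matrix and binary labels
X, y = make_classification(n_samples=120, n_features=20, random_state=0)

grid = GridSearchCV(
    SVC(),
    param_grid={"kernel": ["linear", "rbf"], "C": [1, 8, 32]},
    cv=3,
    scoring="f1_weighted",  # the metric used throughout the paper
)
grid.fit(X, y)
best = grid.best_params_  # e.g. {"C": ..., "kernel": ...}
```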

## 2.2.2. Convolutional Neural Network (CNN) with Word2Vec

Convolutional Neural Networks (CNNs) consist of multiple layers designed to extract local features in the form of a feature map. Since a CNN uses back-propagation to update the weights in its convolutional layers, the CNN feature extractors are self-determined through continuous tuning of the model [37]. In the field of NLP, CNNs have proved to be extremely effective in several tasks, such as semantic parsing [38] and sentence modeling [39]. This pointed us in the direction of experimenting with CNNs for our model, since CNNs with Word2Vec embeddings are robust even on small datasets. In addition, we considered SecVuln\_WE [40], which includes word representations especially designed for the cybersecurity vulnerability domain. SecVuln\_WE was trained on security-related sources such as Vulners, the English Wikipedia (Security category), Information Security Stack Exchange Q&As, the Common Weakness Enumeration (CWE), and Stack Overflow.

Figure 4 presents the architecture in which the pre-trained SecVuln\_WE embeddings are passed through a convolutional layer containing 100 filters with a kernel size of 4; in this way, each convolution considers a window of 4 word embeddings. Afterward, we perform batch normalization of the activations of the previous layer at each batch. Next come the MaxPool and Dropout layers, followed by a dense layer with sigmoid activation. Since we are dealing with a multi-label classification problem, the output layer has a designated node for each technique, and each output indicates the probability that the corresponding technique maps to the considered CVE.

**Figure 4.** Architecture of the CNN with Word2Vec embeddings.
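The layer stack of Figure 4 can be sketched in PyTorch as follows; the embedding dimensionality (300), the sequence length, and the dropout rate are assumptions for illustration, and random tensors stand in for the pre-trained SecVuln\_WE embeddings:

```python
import torch
import torch.nn as nn

EMB_DIM = 300        # assumed Word2Vec dimensionality (illustrative)
N_TECHNIQUES = 31

class CveCnn(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(EMB_DIM, 100, kernel_size=4)  # 100 filters, window of 4
        self.bn = nn.BatchNorm1d(100)
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.drop = nn.Dropout(0.5)
        self.out = nn.Linear(100, N_TECHNIQUES)  # one output node per technique

    def forward(self, emb):                  # emb: (batch, seq_len, EMB_DIM)
        x = self.conv(emb.transpose(1, 2))   # -> (batch, 100, seq_len - 3)
        x = self.bn(torch.relu(x))           # batch normalization of activations
        x = self.drop(self.pool(x).squeeze(-1))
        return torch.sigmoid(self.out(x))    # binary probability per technique

model = CveCnn()
probs = model(torch.randn(2, 64, EMB_DIM))   # two pre-embedded CVE descriptions
```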

## 2.2.3. BERT-Based Architecture with Multiple Output Layers

Reducing the considerable complexity of the multi-label problem was first among our considerations when designing this architecture. Converting our multi-labeling problem into multiple binary classification tasks following the *One versus Rest* method has the advantage of conceptual simplicity; yet, having a distinct BERT layer for contextualized embeddings for each one of the 31 techniques was redundant.

The proposed architecture from Figure 5 considers a pre-trained BERT encoder, a Dropout layer, and an individual dense layer for each technique, which outputs the probability that a particular CVE points to that particular technique. The model is consistent with the considerations of the *One versus Rest* method, while also taking advantage of the shared embeddings layer.

**Figure 5.** BERT-based architecture with multiple output layers.
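A minimal PyTorch sketch of this design, with random vectors standing in for the pooled BERT output (a real pipeline would obtain them from a pre-trained encoder, e.g. via the `transformers` library); the dropout rate is an assumption:

```python
import torch
import torch.nn as nn

N_TECHNIQUES = 31

class MultiHeadClassifier(nn.Module):
    """One shared embedding layer feeding an individual binary head per
    technique (One versus Rest), instead of 31 separate BERT encoders."""
    def __init__(self, hidden=768):
        super().__init__()
        self.drop = nn.Dropout(0.1)
        self.heads = nn.ModuleList(
            nn.Linear(hidden, 1) for _ in range(N_TECHNIQUES))

    def forward(self, cls_embedding):        # (batch, 768) pooled BERT output
        x = self.drop(cls_embedding)
        logits = torch.cat([head(x) for head in self.heads], dim=1)
        return torch.sigmoid(logits)         # (batch, 31) per-technique probs

probs = MultiHeadClassifier()(torch.randn(4, 768))  # random stand-in embeddings
```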

## 2.2.4. BERT-Based Architecture Adapted for Multi-Labeling

Analyzing each label separately might overlook the strong correlation between techniques. This correspondence has multiple roots, as techniques in a given tactic are connected through their attack behavior pattern, whereas techniques across multiple tactics are connected through the attack vector of the vulnerability. Thus, we explored creating a model capable of exploiting the link between multiple techniques.

The specific architectural decision taken for this last design was to have only one output layer, with one individual node for each technique. In this manner, we aim to capture the specifics for each technique, while also considering how subsets of techniques are interconnected.

Figure 6 details the proposed model, which considers 768-dimensional contextual embeddings from various BERT-based models (i.e., BERT [41], SciBERT [42], and SecBERT [43]) passed through a Dropout layer. The Dropout layer output goes through a Linear layer with 768 input features and 31 output nodes, one for each technique. We considered BCEWithLogitsLoss [44] (the combination of a Sigmoid layer and BCELoss), the loss function most commonly used for multi-label classification tasks, because each output node reveals the probability of a technique being tagged for a specific CVE (i.e., the probabilities need to be treated independently).

**Figure 6.** The design of the multi-labeling BERT-based architecture.
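The head and loss of this architecture reduce to a few lines of PyTorch; random tensors again stand in for the BERT embeddings and gold labels, and the dropout rate is an assumption:

```python
import torch
import torch.nn as nn

# Single output layer with one node per technique, on top of 768-dim embeddings
head = nn.Sequential(nn.Dropout(0.1), nn.Linear(768, 31))
loss_fn = nn.BCEWithLogitsLoss()  # fuses Sigmoid and BCELoss, numerically stable

embeddings = torch.randn(8, 768)                 # stand-in for BERT [CLS] vectors
targets = torch.randint(0, 2, (8, 31)).float()   # multi-hot technique labels

logits = head(embeddings)                        # (8, 31) raw scores
loss = loss_fn(logits, targets)                  # independent binary loss per node
probs = torch.sigmoid(logits)                    # probabilities at inference time
```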

## *2.3. Performance Assessment*

For a predicted technique, we wanted to make sure that our mapping was correct (i.e., high precision—P) and that we correctly classified as many examples as possible for a given class (i.e., high recall—R). Thus, we considered the F1-score as a performance metric for all models, defined as the harmonic mean of P and R per class. Moreover, given the imbalance between classes, we used the weighted version of the F1-score, which computes a general F1-score per model by proportionally combining the F1-scores obtained for each label. We also computed the weighted precision and recall for the tested models.
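The weighted F1-score described above can be computed directly from per-class precision, recall, and support (the per-class sample counts used as proportional weights); the numbers below are hypothetical:

```python
def f1(p, r):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def weighted_f1(per_class):
    """per_class: list of (precision, recall, support) tuples.
    Each class's F1 is weighted by its share of the total support."""
    total = sum(s for _, _, s in per_class)
    return sum(f1(p, r) * s for p, r, s in per_class) / total

# Hypothetical three-technique example:
score = weighted_f1([(1.0, 0.5, 4), (0.8, 0.8, 10), (0.0, 0.0, 2)])
print(round(score, 4))  # → 0.6667
```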

## **3. Results**

This section analyzes the results of the empirical experiments performed using the previously detailed models. First, it compares the performance of the various models. Second, it assesses the impact of data augmentation on performance and investigates the metrics obtained by the best model.

Multiple observations can be made based on the results of our experiments shown in Table 1. Among the classical machine learning models, LabelPowerset is the best multi-label strategy, and SVC with a linear kernel and C = 32 has the highest F1-score, competing even with our deep-learning models. The SecBERT model has the highest F1-score (42.34%) among all considered models, proving to be the most powerful solution for labeling a CVE. An important observation is that the CNN + Word2Vec architecture obtained better results than the architectures using plain BERT. Thus, domain-related pre-training on large security databases leads to increased performance by providing better contextualization and partially compensating for the scarce training set.

**Table 1.** Results for the proposed models (italics marks the best multi-label strategy for classical ML, while bold marks the best model).


Table 2 points out the appropriateness of employing data augmentation techniques on our dataset for deep learning models (approximately 6% performance gain). Only the best multi-label strategy for classical machine learning algorithms was considered. The F1-score falls considerably, by 10%, for Naive Bayes in particular, since Naive Bayes places great importance on the number of appearances of a word in a document; swapping a relevant word with synonyms and performing random insertions or deletions (i.e., the strategies employed by EasyDataAugmenter [28]) only confuse the model. The SVC model had similar performance, whereas the BERT-based models take advantage of the increased sample size and decreased class imbalance, and generalize better. Not only is performance increased, but the models also tend to learn faster (see the faster convergence in Figure 7 in terms of training loss for each output layer associated with a technique in the multi-output BERT model). Moreover, Figure 7 denotes which techniques are more easily learned by the model.


**Table 2.** Side-by-side comparison of performance with and without data augmentation (bold denotes the best model).

**Figure 7.** Comparison of training loss for the multi-output BERT architecture. (**a**) Without data augmentation; (**b**) With data augmentation.

Since Table 2 only provides a global overview of the average performance of the SciBERT model trained on the augmented data, exploring the particular difference between how the model handles different techniques provides additional insights into our model's behavior. Figure 8 plots the F1-score obtained for each individual technique, for both the original model and the one trained on the augmented dataset. Apart from four exceptions (*Data from Local System*, *Hijack Execution Flow*, *User Execution* and *File and Directory Discovery*), the model obtains considerably higher or at least equal scores for all the other 27 techniques. Moreover, the difference between models is minimal (close to 0) for the techniques where the initial model obtains a better F1-score.

**Figure 8.** Comparing F1-score per technique between SciBERT model trained on initial and augmented dataset.

The added gain of the multi-label SciBERT model trained on the augmented dataset resides in its ability to maximize the F1-score for techniques where the initial model performed poorly. One such example is *Forge Web Credentials*: the initial model obtained an F1-score of 0%, since both recall and precision were 0%, whereas the model trained on the augmented data reached an F1-score of 66.66%, with a recall of 50% and a precision of 100%. Overall, the number of techniques that the model had difficulty learning decreased substantially.

Figure 9 shows the correlation between the CVE distribution and the F1-score obtained for the SciBERT models, both using the initial dataset and the one trained after augmentation. The techniques are displayed on both graphs in the same order to indicate how the CVE distribution changed after performing the process of data augmentation and how the adjustments in CVE distribution impacted the F1-score. We observe that not only the techniques initially associated with a small number of CVEs benefited from the augmentation method, but also the techniques associated with a high distribution of samples—for example, the F1-score for the *Command and Scripting Interpreter* technique increased from the initial 58.92% to 64.12%.

**Figure 9.** Comparing the F1-score over the CVE distribution for the SciBERT model. (**a**) Without augmentation; (**b**) With augmentation.


**Table 3.** Precision, Recall and F1-Scores for the best model.

## **4. Discussion**

## *4.1. In-Depth Analysis of the Best Model*

Table 3 introduces a complete overview of the results recorded for the best model, the multi-label SciBERT trained on the augmented dataset. The F1-score per technique from the MITRE ATT&CK Enterprise Matrix ranges from 80.35% for *Endpoint Denial of Service* down to 0.00%; the techniques with an F1-score of 0.00% appear at the end of Table 3, marked in italics, with the corresponding number of training samples in parentheses. Even though the model scores a global F1-score of 47.84%, it fails to capture any knowledge about nine out of the thirty-one techniques, although this is fewer than for the other evaluated models. We can attribute this inability to recognize the distinct features of these techniques to the extremely reduced number of samples for each of them, even after performing data augmentation. The existing samples in the dataset do not contain enough relevant characteristics for these techniques; as such, the model cannot differentiate them.
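The per-technique figures reported in Table 3 follow from standard per-label counting over the multi-label predictions. A minimal sketch (with illustrative technique names and predictions) is:

```python
def per_label_scores(y_true, y_pred):
    """Compute precision, recall, and F1 per label for multi-label predictions.

    y_true, y_pred: lists of sets of technique names, one set per CVE.
    """
    labels = set().union(*y_true, *y_pred)
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores

y_true = [{"Valid Accounts"}, {"Brute Force", "Valid Accounts"}]
y_pred = [{"Valid Accounts"}, {"Valid Accounts"}]
print(per_label_scores(y_true, y_pred))
```

A global score can then be obtained by averaging the per-label F1-scores across techniques.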

Nevertheless, the model successfully captures the essence of other techniques, obtaining a precision of 100.00% for *Forge Web Credentials* and *Brute Force*. For almost all techniques, precision exceeds recall, thus indicating that the general tendency of the model is to omit a label, rather than misplace a technique that cannot be mapped to a particular CVE.

Overall, given the complexity of the multi-label problem and the severe imbalance of the training set, the model obtains promising performance for a subset of techniques, while managing to maximize its overall F1-score.

## *4.2. Error Analysis*

This subsection revolves around understanding the roots of the multi-label SciBERT model's limitations. After a methodical investigation aimed at identifying the causes of the model's errors, the observed performance deficiencies are discussed further.

Table 4 presents different CVEs whose predicted techniques differ partially or completely from the labeled ones. For most errors on samples tagged with multiple techniques, the model succeeds in labeling a subset of the correct techniques; this holds true for errors 1, 2, and 3 from Table 4. When analyzing error #1, the model extracts the most obvious technique, pointed out by language markers such as *password unencrypted* and *global file*, but fails to deduce that, in order for a user to access the file system, a valid account must be used. In contrast, the model successfully identifies the *Valid Accounts* technique for error #2. In general, techniques that are not clearly textually encapsulated and whose understanding requires prerequisite knowledge are overlooked by the model.

Figure 10 studies the model's choice of labels for CVE #2 from Table 4 using Lime [18]. The model successfully recognizes the predominant label (i.e., *Valid Accounts*) and correctly identifies the most important concept, the word *authenticated*, which points in the direction of *Valid Accounts*. We can observe that some techniques are not ambiguous for the model and their labeling is straightforward; *Valid Accounts* is such an example, for which the model extracts only the relevant features and identifies the technique correctly. For *Exploitation for Client Execution*, the model identifies patterns that suggest the CVE should be mapped to the given technique, as well as patterns that suggest the contrary. Being capable of identifying features correlated with both situations confuses the model. This problem stems from the overlapping meaning of multiple techniques; as a result, relevant features for a given technique cannot be differentiated.
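Lime approximates the model locally by perturbing the input and fitting an interpretable surrogate. The much simpler leave-one-out sketch below conveys the same idea of per-word relevance; the `toy_model` function is a hypothetical stand-in for the trained classifier, not the actual SciBERT model.

```python
def word_relevance(tokens, predict_proba):
    """Relevance of each word = drop in predicted probability when the word is removed."""
    base = predict_proba(tokens)
    return {word: base - predict_proba(tokens[:i] + tokens[i + 1:])
            for i, word in enumerate(tokens)}

# Hypothetical stand-in for P(Valid Accounts | text) from a trained model.
def toy_model(tokens):
    return 1.0 if "authenticated" in tokens else 0.0

scores = word_relevance("an authenticated user reads arbitrary files".split(), toy_model)
print(max(scores, key=scores.get))  # prints "authenticated"
```

The relevance scores discussed below (e.g., 0.45 versus 0.03 for the word *browser*) are Lime's more refined version of exactly this kind of per-word attribution.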


**Table 4.** Comparing predictions with the true values for the best model.

**Figure 10.** Comparison of word mappings for each technique corresponding to CVE #2 from Table 4. (**a**) Mapping Valid Accounts; (**b**) Mapping Exploitation for Client Execution.

An interesting aspect is revealed in error #3, namely that the model correctly tags *File and Directory Discovery*, but also associates the CVE with *Exploit Public-Facing Application* instead of *Command and Scripting Interpreter*. Both techniques from the MITRE ATT&CK Enterprise Matrix could be equally correctly mapped onto the given text description. This is an important observation about the established CVE labeling methodology; it highlights a fault in the data collection procedure, rather than in the model's capacity to learn the multi-labeling problem. Example #4 presents a similar case, since the predicted technique *Endpoint Denial of Service* is a correct label for the CVE, although it does not appear among the true labels.

**Figure 11.** Comparison of word mappings for each technique corresponding to CVE #4 from Table 4. (**a**) Mapping User Execution technique; (**b**) Mapping Browser Session hijacking.

Error #4 is analyzed in detail in Figure 11 to gain insights into how the model associates features. The word *browser* is highlighted for both the predicted and the correct label. However, the difference resides in the relevance score associated with the word for each label, namely 0.45 for *User Execution* and 0.03 for *Browser Session Hijacking*. While the word *browser* is recognized as relevant for both labels, the label with the higher relevance is selected. This finding can be associated with the discrepancy between training examples, namely 240 for *User Execution* versus only 102 for *Browser Session Hijacking*. Thus, the class imbalance affects the model's capability to recognize the real correlation between features and techniques, and leads the model to a biased decision.

The model extracts a correct technique for error #5 in Table 4, although it was not among the true labels. As Figure 12 shows, the CVE text description indicates the *Endpoint Denial of Service* technique, since the word *crash* is present and the relevance of the word for the *Endpoint Denial of Service* technique is 0.93. Figure 12 also suggests that the word *crash* is the only word that has a high impact on the model's decision to label the CVE as *Endpoint Denial of Service*.

Two observations can be made based on Figure 12. First, the model successfully captures a technique overlooked by the reviewer; the technique labeling process is error-prone due to the ambiguity of the CVE text descriptions and the complexity of the labeling process, given the wide range of available techniques. Second, the model assigns a higher relevance to features that suggest *Endpoint Denial of Service*, even though key features for *Exploitation for Client Execution* are identified (i.e., *program* and *functions*).

Table 5 presents the most relevant words when performing feature extraction for each technique. More than 50% of the techniques share their most relevant feature with other techniques in the MITRE ATT&CK Enterprise Matrix. For example, *Exploitation for Privilege Escalation*, *Data from Local System*, *Data Destruction*, *Browser Session Hijacking*, *Archive Collected Data*, and *Create Account* are all mapped to the same feature. Having the same most relevant extracted feature implies a strong intersection between techniques, further emphasizing that the separation between labels is fuzzy. The opinion and consensus among reviewers were used to separate ambiguous examples, making use of previous experience and context obtained from other resources. The model inherits this since the labels from the training set reflect the reviewers' perspective. In this context, more information would be valuable to counter the bias encapsulated in the training set by offering more background to the model.
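As an illustration of how such per-technique features can surface, the sketch below scores words by how concentrated their occurrences are in the CVEs of a given technique. This frequency heuristic is only a stand-in for the actual feature-extraction procedure behind Table 5, and the mini-corpus is fabricated for the example.

```python
from collections import Counter

def top_word_per_technique(corpus):
    """corpus: list of (tokens, techniques) pairs. Returns each technique's
    highest-scoring word, scoring a word by the fraction of its total occurrences
    that fall inside that technique's CVEs."""
    overall = Counter()
    per_tech = {}
    for tokens, techniques in corpus:
        overall.update(tokens)
        for tech in techniques:
            per_tech.setdefault(tech, Counter()).update(tokens)
    return {tech: max(counts, key=lambda w: counts[w] / overall[w])
            for tech, counts in per_tech.items()}

corpus = [
    ("attacker bypasses authentication".split(), ["Valid Accounts"]),
    ("crash leads to denial of service".split(), ["Endpoint Denial of Service"]),
    ("attacker triggers crash".split(), ["Endpoint Denial of Service"]),
]
print(top_word_per_technique(corpus))
```

When two techniques share the same top-scoring word, exactly the kind of label overlap discussed above becomes visible.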

**Figure 12.** Comparison of word mappings for each technique corresponding to CVE #5 from Table 4. (**a**) Mapping Exploitation for Client Execution; (**b**) Mapping Endpoint Denial of Service.



## *4.3. Limitations*

We have identified a number of limitations of our model that take a toll on its performance; these are detailed below. First, the process of manually labeling a CVE is inevitably affected by the subjective perspective of the reviewer. Even though multiple measures were taken to limit this undesired outcome (i.e., following a clear methodology and establishing general guidelines for the reviewers), the annotators were unable to fully eliminate inconsistencies in the dataset labels.

Second, the quality of the information in the CVE text descriptions must also be taken into consideration when discussing the general limitations of the proposed model. Inconsistencies among the CVE descriptions (incomplete, outdated, or even erroneous details) are highly prevalent [45], thus narrowing the attainable performance of the model.

Third, there is no clear delimitation between certain techniques. Multiple techniques have overlapping meanings and follow the same attack pattern (e.g., *Exploitation for Defense Evasion* and *Abuse Elevation Control Mechanism*). As a result, a CVE might have multiple possible correct labels, depending on the methodology used to mark the CVE, since techniques are closely interconnected and the difference between related techniques is generally subtle.

Lastly, the rather small dataset and the severe imbalance between the numbers of CVEs associated with each technique take a toll on the model's capacity to accumulate enough knowledge to correctly label future samples. A larger knowledge base for training would provide enough samples for the model to also perceive subtle nuances in CVE text descriptions.

## **5. Conclusions**

In this paper, we emphasized the need for an automatic linkage between the CVE list and the MITRE ATT&CK Enterprise Matrix techniques. The problem was transposed into a multi-label Natural Language Processing task for which we introduced a novel labeled CVE corpus, augmented using adversarial attacks to limit the severe impact of label imbalance. Our baseline includes several classical machine learning models and BERT-based architectures, and the best performing model (i.e., the multi-label SciBERT) was evaluated within a series of experiments from multiple perspectives to extract a complete overview of the data augmentation impact. Comparing the obtained metrics against classical machine learning models accentuates the significant benefits brought by our solution for labeling CVEs with their corresponding techniques.

Despite our model obtaining promising results for well-represented techniques, the inherent limitations imposed by the training set cap the maximum achievable performance. Future work will focus on improving the robustness of the labeled CVE corpus: on the one hand, we will enforce a homogeneous labeling methodology; on the other, we will address the severe imbalance between labels, as well as the corpus's reduced size. New strategies might consider Few-Shot Learning methods [46] for generalizing the task from few samples. Semi-supervised learning [47] is another possible research direction, given the reduced number of labeled CVEs and the significant number of unlabeled samples in the CVE list. Another aspect worth exploring is whether gathering extra information from additional sources (e.g., the *Common Weakness Enumeration*, CWE [48]) can address the incompleteness and inconsistency of the textual CVE descriptions.

**Author Contributions:** Conceptualization, O.G., A.N., M.D. and R.R.; methodology, O.G. and M.D.; software, A.N. and O.G.; validation, O.G., A.N. and M.D.; formal analysis, O.G., A.N. and M.D.; investigation, A.N., O.G. and M.D.; resources, O.G. and A.N.; data curation, A.N.; writing—original draft preparation, O.G. and A.N.; writing—review and editing, M.D. and R.R.; visualization, A.N.; supervision, M.D. and R.R.; project administration, R.R.; funding acquisition, R.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS—UEFISCDI, project number 2PTE2020, YGGDRASIL—"Automated System for Early Detection of Cyber Security Vulnerabilities".

**Institutional Review Board Statement:** The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of the Faculty of Automated Control and Computers, University Politehnica of Bucharest.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The dataset is freely available on Tagtog at https://www.tagtog.com/readerbench/MitreMatrix/ (accessed on 8 August 2022), whereas the code is available on Github at https://github.com/readerbench/CVE2ATT-CK (accessed on 8 August 2022).

**Acknowledgments:** We would also like to show our gratitude to Ioana Nedelcu, Ciprian Stanila, and Ioana Branescu for their contributions to building the labeled CVE corpus.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **Abbreviations**

The following abbreviations are used in this manuscript:


## **References**


## *Article* **Sustainable Risk Identification Using Formal Ontologies †**

**Avi Shaked <sup>1</sup> and Oded Margalit 2,\***


**Abstract:** The cyber threat landscape is highly dynamic, posing a significant risk to the operations of systems and organisations. An organisation should, therefore, continuously monitor for new threats and properly contextualise them to identify and manage the resulting risks. Risk identification is typically performed manually, relying on the integration of information from various systems as well as subject matter expert knowledge. This manual risk identification hinders the systematic consideration of new, emerging threats. This paper describes a novel method to promote automated cyber risk identification: OnToRisk. This artificial intelligence method integrates information from various sources using formal ontology definitions, and then relies on these definitions to robustly frame cybersecurity threats and provide risk-related insights. We describe a successful case study implementation of the method to frame the threat from a newly disclosed vulnerability and identify its induced organisational risk. The case study is representative of common and widespread real-life challenges, and, therefore, showcases the feasibility of using OnToRisk to sustainably identify new risks. Further applications may contribute to establishing OnToRisk as a comprehensive, disciplined mechanism for risk identification.

**Keywords:** formal ontology; risk identification; cybersecurity; vulnerability

## **1. Introduction**

Risk identification is the process which lays the foundations for establishing the cybersecurity posture of systems, organisations and services. Risk management is a collection of "coordinated activities to direct and control an organisation with regard to risk" [1]. Risk identification provides the infrastructure for all other risk management activities [2].

A risk is a potential for something to go wrong, eventually causing harm or loss [3]. Accordingly, cyber risk is an operational risk which is associated with activities in cyberspace that may cause damage to organisational assets [4].

The goal of risk identification is to "find, recognize and describe risks that may prevent an organization achieving its objectives" [5]. Refsdal et al. identify that risk comprises three elements: asset, vulnerability and threat [3]. In agreement, Strupczewski's meta model of cyber-risk concept includes the same three elements [4]. A vulnerability merely indicates an exploitable system property; a risk is distinguished from a vulnerability by having the potential to harm or reduce the value of an asset. The identification of pertinent assets—such as sensitive information and services—and their business value is therefore an essential risk identification element [6]. Risk identification requires knowing the business environment and the organisational assets in addition to the vulnerabilities [7].

Provided risks are properly identified, they can then be analysed, evaluated for impact and, if necessary, mitigated using appropriate security controls. Otherwise, unidentified risks may go untreated, and misidentified risks may be improperly treated, potentially resulting in considerable damage once they materialise [8].

**Citation:** Shaked, A.; Margalit, O. Sustainable Risk Identification Using Formal Ontologies. *Algorithms* **2022**, *15*, 316. https://doi.org/10.3390/ a15090316

Academic Editors: Francesco Bergadano and Giorgio Giacinto

Received: 18 August 2022 Accepted: 31 August 2022 Published: 2 September 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Continuous organisational changes introduce a major threat to performing risk identification [7]. The dynamics of business environments include changes to processes, products and services, as well as the introduction of new information systems and related features. Irrespective of organisational changes, the cyber threat landscape is autonomously evolving. As an example, new software vulnerabilities are published on a daily basis, providing ample opportunities for attackers to exploit them [9]. Moreover, attacker capabilities—tactics, techniques and procedures (TTPs)—continue to improve [10], sometimes to a military-grade level [11]. To address the dynamics of cybersecurity, it is essential to have dynamic and adaptable cyber risk management, with risk identification outputs being revisited often to re-evaluate and establish an up-to-date organisational cybersecurity posture [6,7]. For this purpose, risk register mechanisms, such as those recommended by the European Union Agency for Cybersecurity (ENISA), contain the date of the latest assessment as part of the risk register record and are expected to be properly maintained [12].

Relevant, up-to-date and timely information is crucial to robust risk identification [5]. Prevalent risk identification approaches rely on manual analysis by human experts [2]. These include brainstorming, interviews, checklists, statistics and techniques for historical data collection [3]. Risk identification also relies on integration of information from various sources [3,13]. Previous automation attempts with respect to cyber risk activities focused mostly on automated identification of threats and vulnerabilities (for example, [14,15]). Specifically, attributing the actual risk to organisational assets remains a manual analysis effort. The manual nature of risk identification approaches hinders their dynamic application in a sustainable form to meet the challenges of the evolving cybersecurity threat landscape [6].

This paper, which extends [16], proposes the use of a formal ontology to promote rigorous and continuous risk identification. A formal ontology is a well-defined, computer-based representation of concepts and their relations [17]. Formal ontology should not be confused with the philosophical term, which is concerned with the understanding of reality; rather, it relates to the philosophical term by capturing the ontology of a particular domain using a formal, well-structured model. We use the term "ontology" henceforth to refer to formal ontology.

Ontologies are a form of semantic technology. They provide the infrastructure for intelligent applications [18]. Ontologies belong to the content theory branch of Artificial Intelligence (AI) [19], and they are central for building intelligent computational agents [20]. Ontologies can minimise ambiguity and misunderstanding between stakeholders as well as lay the foundations for high-level reasoning and decision making [18,21]. An organisation-specific ontology can be used to facilitate interoperability between domains [22], and, even more specifically, between business and information technology concerns, with which organisational cybersecurity is typically associated [23].

Ontologies can be used to support risk management. Examples of such applications include management of human and ecological health risks [24] and safety risk management in construction [25]. Previous uses of ontologies for cybersecurity risk management did not consider the critical business impact of such risks [26–28]. An ontology-based system was demonstrated for the calculation of cybersecurity risk metrics, but it does not include inferred identification of risks and does not provide actionable risk-related information [29]. An automated security risk identification method to address engineering design issues exists, but it involves only identification of high-level consequence categories [30]. As far as we know, there is no ontology-based method to identify emerging cybersecurity risks which can be employed continuously by organisations, let alone one which allows an organisation to contextualise the risks with respect to the organisational operations.

This paper details and exemplifies a new method—OnToRisk—which uses formal ontology mechanisms to automate cybersecurity risk identification, based on the integration of formal definitions and situational information from pertinent sources. OnToRisk is an AI method which employs aspects of knowledge representation, to introduce robust information models, and of reasoning, to provide actionable insights about situations represented by the models. The information models can include security-intelligence-related concepts—namely threat, vulnerability, asset and risk—as well as any other technical and organisational concepts that are relevant to provide situational awareness.

We describe a case study of using OnToRisk to identify risks emerging from a newly published software vulnerability, in an undisclosed, international enterprise in the finance sector (henceforth, "the enterprise"). While specific, the case study is representative of a general, desirable practice in every organisation which uses software components. A software vulnerability is "an instance of a flaw, caused by a mistake in the design, development, or configuration of software, such that it can be exploited to violate some explicit or implicit security policy" [31]. While previous work by Wang and Guo used a formal ontology to analyse vulnerabilities from the technical perspective of vulnerability management [21]; our case study uses a formal ontology to capture concepts and relations to analyse cybersecurity vulnerabilities from the organisational operations risk perspective.

The paper continues as follows. Section 2 presents the new, ontology-based risk identification method OnToRisk and overviews the vulnerability-induced risk identification case study. Section 3 details the case study results of using OnToRisk for vulnerability-induced risk identification. Section 4 reflects on the new risk identification method and the case study, as well as discusses further uses, benefits and research potential of the ontology-based method.

## **2. Materials and Methods**

OnToRisk uses formal ontology mechanisms for rigorous, information-based and definitions-based risk identification. The OnToRisk method includes the following activities:


Activity #3 is meant to be automated as much as possible, e.g., by importing—while translating—existing organisational information from information systems into the formal ontology. Activity #4 is the activity in which new risk-related insights should emerge, automatically, based on the integration of explicit definitions and explicit situations. Ideally, these activities should be performed continuously, reflecting an up-to-date organisational security posture.

We validate OnToRisk using a case study methodology. The selection of a single-case study approach is aligned with the rationale identified by Yin, namely that the case study is a representative, typical case [32]. The OnToRisk method is applied in a case study of an enterprise seeking to identify risks emerging from the disclosure of a new vulnerability, which is found in a prevalent software component. The widely representative and applicable case study was inspired by real events, following the late 2021 disclosure of a vulnerability in Log4j [33,34].

Risk management is considered a business-related activity in an enterprise. Accordingly, the enterprise established and maintains a system of policies, as well as a hierarchical framework for communicating and assessing operational risks, with cybersecurity risks being included as part of the overall risk management organisational system. The risk-related concepts were identified based on careful reading of official documents and directives, analysis of some of the enterprise's information systems, and on conversations with domain experts. The latter included risk managers and an incident response leader.

First, as the OnToRisk method outlines, relevant concepts and their relations were defined as a formal ontology. Protégé was the tool used for authoring the ontology [35]. The ontology itself is in the standard Web Ontology Language (OWL) format. Relevant concepts are depicted in OWL using "classes", and relations between concepts are formally expressed in OWL using "object properties". In defining object properties, the source node class is referred to as "Domain" and the target node class is referred to as "Range".

A relevant risk definition was then added to the ontology, using some of the predefined classes and object properties. Next, a situation was captured: the situation was designed using natural language, and then translated into the ontology as an instantiation of the formalised classes and object properties. Finally, a reasoner (HermiT within Protégé) was used to reason about the situation, i.e., to process the explicit situation definitions and present the information inferred from them. The inference was verified to yield the results expected from a manual analysis of the situation.

The work, including the ontology and the resulting insight with respect to the enterprise's operations and infrastructure, was presented to domain experts as well as high-level management for both obtaining feedback and promoting the organisational risk management practices.

## **3. Results**

We now describe the results of applying OnToRisk to the case study (of identifying risks to the enterprise as they emerge from the disclosure of a new vulnerability in a software component). Appendix A provides the full definitions, described in Sections 3.1–3.3, in the form of a formal ontology. Appendix B provides the inferred assertions, described in Section 3.4, in the form of a formal ontology.

## *3.1. Concepts and Relations (Meta Levels Definitions)*

Figure 1 shows the concepts and relations, representing the result of performing activity #1 of OnToRisk in the case study. Concepts (classes) appear as graph nodes and relations (object properties) appear as edges between nodes. The concepts are:


**Figure 1.** The vulnerability-induced risk case represented as a formal graph of meta-level concepts and relations. This graph is generated by applying the CoModIDE plugin for Protégé [36] to the formal ontology. The graph shows: the classes as nodes; the object properties between classes (from domain to range) as solid, annotated arrows; and subtype (subclass) relations as dashed arrows.

The hierarchical structure of the risk concepts (concepts #6, #7 and #8 above) reflects the hierarchical risk definition architecture which is practiced within the enterprise; with Risk being a layer-1 risk definition, Cybersecurity risk being a layer-2 risk definition, and Vulnerability-induced Risk being introduced as a layer-3 risk definition. This conforms with the prominent business risk typology used in the financial sector with which the enterprise is associated [4].

Figure 1 also shows the object properties—expressing relations between concepts—as graph edges in solid line between the class nodes. The object properties are:


susceptible2Vulnerability ≡ inverse(foundIn) ◦ includesComponent (1)

6. risksInfo—indicates that a vulnerability (Domain) may risk sensitive information (Range). This object property is formally defined as a composite property using other object properties:

risksInfo ≡ accessInfo ◦ inverse(includesComponent) ◦ foundIn (2)

7. risksFunction—indicates that a vulnerability (Domain) may risk a business function (Range). This object property is formally defined as a composite property using other object properties:

risksFunction ≡ supportsFunction ◦ inverse(includesComponent) ◦ foundIn (3)

8. risksVia—identifies the application (Range) through which a specific Vulnerability-Induced Risk (Domain) can be realised. This object property is formally defined as a composite property using other object properties:

risksVia ≡ inverse(accessInfo) ◦ risksInfo | inverse(supportsFunction) ◦ risksFunction (4)

Reflecting on the derived ontology, we note that it is realistic and practical to acquire the relevant information needed to instantiate a situation using the ontology's meta-level definitions. The enterprise operates an information system which records all the enterprise applications, along with their attributes. Some of these attributes are the category of information that can be accessed by the application and the application's business criticality score, which is established based on the supported business functions. Extracting the software components used by an application—a Software Bill of Materials (SBOM)—is a feature provided by various software composition analysis tools (by analysing either the source code or the final software artifacts). Information about software component vulnerabilities is found online in vulnerability repositories, such as [37].
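The composite object properties above are plain relation compositions. Assuming each property is stored as a set of (domain, range) pairs, the inference behind Equation (2) can be sketched as follows; the application and information names are illustrative, while the vulnerability is the Log4j disclosure referenced in the case study.

```python
def compose(r, s):
    """Relation composition r . s: (x, z) whenever (x, y) in s and (y, z) in r,
    i.e., s is applied first, matching the right-to-left notation in Equations (1)-(3)."""
    return {(x, z) for x, y in s for y2, z in r if y == y2}

def inverse(r):
    return {(y, x) for x, y in r}

# Illustrative situation: a vulnerability found in a component used by an application.
foundIn = {("CVE-2021-44228", "log4j")}            # Vulnerability -> Software Component
includesComponent = {("PaymentsApp", "log4j")}     # Application -> Software Component
accessInfo = {("PaymentsApp", "CardholderData")}   # Application -> Sensitive Information

# risksInfo = accessInfo . inverse(includesComponent) . foundIn  -- Equation (2)
risksInfo = compose(accessInfo, compose(inverse(includesComponent), foundIn))
print(risksInfo)  # the vulnerability is inferred to risk the sensitive information
```

An OWL reasoner such as HermiT performs the equivalent derivation from the property chain axioms, without any hand-written composition code.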

## *3.2. Risk Definition*

Following the activity #2 guideline of the OnToRisk method, a Vulnerability-Induced Risk concept is formally defined using the established concepts and relations:

$$\begin{array}{l}\text{VulnerabilityInducedRisk} \equiv \text{Vulnerability and} \\ \text{((risksFunction some BusinessFunction) or} \\ \text{(risksInfo some SensitiveInformation))} \end{array}\tag{5}$$

This formally defines the specific risk as a vulnerability which risks a business function and/or sensitive information. Ideally, this definition would instantiate new information elements (of the VulnerabilityInducedRisk type). However, due to limitations in both the OWL ontology standard and the Protégé ontology authoring tool, instantiating new elements by inference is not possible; instead, the definition tags a Vulnerability-typed individual element as a VulnerabilityInducedRisk. Accordingly, VulnerabilityInducedRisk is also considered a subclass of Vulnerability (in addition to being a subclass of Cybersecurity Risk); this is shown in Figure 1. This is merely a technical adaptation, which has no effect on the results, as it can be easily mapped to the ideal case, as we discuss shortly. The formal definition of the set of risks (*R*) in this implementation is:

$$\mathcal{R} \equiv \{ v \in V \mid (\exists x \in BF \ \&\ (v, x) \in RF) \text{ or } (\exists y \in SI \ \&\ (v, y) \in RI) \}\tag{6}$$

with:

*V*—the set of Vulnerability class (i.e., concept) instantiations

*BF*—the set of Business Function class instantiations

*RF*—the set of risksFunction object properties instantiations

*SI*—the set of Sensitive Information class instantiations

*RI*—the set of risksInfo object properties instantiations

That is, the set of risks is the subset of all vulnerabilities that have either a risksFunction object property (stating that the vulnerability risks an existing business function) or a risksInfo object property (stating that the vulnerability risks existing sensitive information).
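The set definition in Equation (6) can be sketched directly with plain Python sets; the function name `vulnerability_induced_risks` is ours, and the individuals mirror the case study (Appendix A) plus one made-up vulnerability with no risk links:

```python
# A plain-Python sketch of the risk set in Equation (6). The function name
# `vulnerability_induced_risks` is ours; `SomeOtherVuln` is a made-up
# vulnerability used to show that unlinked vulnerabilities are excluded.

def vulnerability_induced_risks(V, BF, RF, SI, RI):
    """Subset of vulnerabilities that risk an existing business function
    (via a risksFunction pair in RF) or existing sensitive information
    (via a risksInfo pair in RI)."""
    return {
        v for v in V
        if any((v, x) in RF for x in BF) or any((v, y) in RI for y in SI)
    }

V = {"Log4shell", "SomeOtherVuln"}      # Vulnerability instantiations
BF = {"OpenAccount"}                    # BusinessFunction instantiations
SI = {"ClientIDsList"}                  # SensitiveInformation instantiations
RF = {("Log4shell", "OpenAccount")}     # risksFunction assertions
RI = {("Log4shell", "ClientIDsList")}   # risksInfo assertions

print(vulnerability_induced_risks(V, BF, RF, SI, RI))  # {'Log4shell'}
```

Only Log4shell qualifies, since the made-up vulnerability has neither a risksFunction nor a risksInfo assertion.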

The formal definition is more than a technical artifact. It is the first concrete layer-3 risk definition, extending the existing conceptual and abstract layer-2 enterprise risk definition (Cybersecurity Risk). This fairly simple, formal, ontology-based definition of a "vulnerability-induced risk" rigorously expresses a concrete type of risk. This risk type is of high importance to enterprise stakeholders, including high-level management, yet it had not been explicitly named until our OnToRisk implementation declared it.

## *3.3. Situation*

The case study situation details a risk assessment scenario which considers a newly disclosed vulnerability. It is based on real-life situations—specifically, the discovery and public disclosure of the vulnerability known as "Log4shell" [33,34]. The case study is designed as an alternative, what-if scenario of detecting risks associated with the vulnerability using OnToRisk.

According to OnToRisk activity #3, the situation is captured as a collection of instantiations of the ontological concepts and relations (derived in activity #1 and reported in Section 3.1).

The baseline situation captures the organisational situation with respect to its operational applications and their business context. Four applications exist, as asserted in full in Appendix A:

- App1, with no components or business context recorded;
- App2, which includes the Log4j component;
- App3, which includes the Log4j component and accesses the ClientIDsList sensitive information;
- App4, which includes the Log4j component and supports the OpenAccount business function.

Now, consider the publication of a new Common Vulnerabilities and Exposures (CVE) record, related to the Log4j component. This results in a new situation, captured in formal ontology form by adding the newly disclosed vulnerability into our ontology, as an instantiation of the "Vulnerability" concept. We name this entity "Log4shell." Additionally, the vulnerability is associated with the affected software component—Log4j—by adding a "foundIn" object property from the Log4shell individual to the Log4j individual.
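This situation update, together with the property chain includesComponent composed with the inverse of foundIn implying susceptible2Vulnerability (declared in Appendix A), can be sketched in plain Python; the variable names are ours, the individuals follow the case study:

```python
# Minimal sketch of the situation update and of the property chain
# includesComponent o foundIn^- -> susceptible2Vulnerability (Appendix A).
# Variable names are ours; individuals mirror the case study.

includes_component = {                     # Application -> Component
    ("App2", "Log4j"), ("App3", "Log4j"), ("App4", "Log4j"),
}
found_in = set()                           # Vulnerability -> Component

# The new situation: the disclosed vulnerability and its foundIn assertion.
found_in.add(("Log4shell", "Log4j"))

# Materialise the chain: App includes C, V found in C => App susceptible to V.
susceptible = {
    (app, vuln)
    for (app, comp) in includes_component
    for (vuln, vcomp) in found_in
    if comp == vcomp
}
print(sorted(susceptible))
# [('App2', 'Log4shell'), ('App3', 'Log4shell'), ('App4', 'Log4shell')]
```

Adding the single foundIn assertion is enough for the chain to mark all three Log4j-based applications as susceptible, which is the inference reported in Section 3.4.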

## *3.4. Ontology-Based Inference*

According to OnToRisk activity #4, we use the ontology-based reasoner to make inferences about the developing situation, and, ultimately, identify the emerging risks. The resulting inferred assertions that extend the explicitly declared assertions appear in Appendix B.

The reasoner provides the following new inferences:

- Log4shell is classified as a VulnerabilityInducedRisk (and, consequently, as a CybersecurityRisk and a Risk);
- App2, App3, and App4 are each asserted susceptible2Vulnerability with respect to Log4shell;
- Log4shell is asserted to risksInfo ClientIDsList and to risksFunction OpenAccount;
- Log4shell is asserted to risksVia App3 and risksVia App4.


**Figure 2.** The Log4shell ontology-based assertions in Protégé. Manually stated (explicit) assertions appear in bold font, while automatically inferred assertions appear in regular font.


**Figure 3.** The reasoner explanation for asserting the risksVia properties. (**a**) for App3, as a result of the risksInfo with respect to ClientIDsList; (**b**) for App4, as a result of the risksFunction with respect to OpenAccount.

The automatically derived inferences are aligned with a manual analysis of the situation. While the manual analysis can be considered straightforward, performing it is time consuming, and this is exactly the effort that OnToRisk is designed to make redundant.

The vulnerability in App2 does not present a new risk to the enterprise from an operational perspective. Still, using the "susceptible2Vulnerability" property which now characterises the application, the potential of App2 being affected by the vulnerability can be communicated to the App2 application owner, who can then choose whether to further analyse the vulnerability's impact on the application and/or solve any vulnerability-related issues in a future version. The vulnerabilities in App3 and App4, however, should be of interest to the enterprise management, as they introduce business risks. Continuously applying the reasoner to the enterprise situation allows pertinent managers to be notified immediately as such risks emerge, and the enterprise management can then promptly act to solve them by identifying and empowering the appropriate personnel, such as application owners, risk managers, security officers, and information officers.

## **4. Discussion**

The dynamic cybersecurity threat landscape requires risk identification to be performed continuously to achieve up-to-date situational awareness. This paper proposes a new, formal ontology-based method, OnToRisk, for promoting automated risk identification. The method relies on the use of AI, through its formal ontology branch, for information-based, systematic, and continuous risk identification. The method employs formal ontology definitions of domain concepts and relations, as well as of the associated risk, to analyse organisational situations and automatically provide actionable insights.

The OnToRisk method was successfully applied to identify risks emerging from a vulnerability disclosure, which is a widely applicable challenge in enterprises. When the enterprise situation changed to reflect the existence of a new vulnerability, a reasoning mechanism applied to the situation automatically yielded a list of potentially affected applications as well as the potential business impact. In practice, typical software applications may include hundreds of re-used lower-level components, which makes their manual analysis a significant effort. The automated approach of OnToRisk decouples the risk identification effort from the quantity of software components. Moreover, new risks are identified along with their potential business impact and the respective attack surface, and a reasoning mechanism can act continuously on the information. Together, these provide a strong basis for sustainable risk management, which is essential to creating valid cybersecurity situational awareness.

Our method provides a step forward with respect to a previously identified need for a conceptual framework to drive the rapid and automated integration of Cyber Threat Intelligence (CTI) [10]. Specifically, our method conforms with the requirement that both internal and external information be factored into the automated integration process, and it provides a rigorous infrastructure for such integration. The case study demonstrates the integration of internal, enterprise-owned information (about application composition as well as business context) with external vulnerability information. Currently, some of the data was integrated manually, by importing data exported from various information systems into the ontology. In the case study, information about enterprise applications was adopted from the enterprise's information system used to catalogue applications and their metadata. A likely technical future effort is to develop mechanisms to automate the integration of data into the ontology, using both internal data sources (such as application inventory information systems) and external data sources (such as CVE repositories).
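As a sketch of such automation, the fragment below maps a simplified, hypothetical CVE record shape onto Vulnerability individuals and foundIn assertions. Real repositories (e.g., the NVD) use richer schemas, so the record fields and the `ingest_cve` helper are illustrative only:

```python
# Hypothetical sketch of automating external-data integration: a simplified,
# made-up CVE record shape is mapped onto Vulnerability individuals and
# foundIn assertions. Field names and the helper are illustrative only.

vulnerabilities = set()        # Vulnerability instantiations
found_in = set()               # foundIn assertions

def ingest_cve(record):
    """Map one external CVE record onto ontology assertions."""
    vulnerabilities.add(record["id"])
    for component in record["affected_components"]:
        found_in.add((record["id"], component))

ingest_cve({"id": "CVE-2021-44228", "affected_components": ["Log4j"]})
print(found_in)  # {('CVE-2021-44228', 'Log4j')}
```

Once such a mapping runs on a feed of new records, the reasoner picks up the added assertions without further manual effort.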

Furthermore, with OnToRisk being a technology-agnostic and vendor-neutral method, the formal representation of a domain of interest may reveal gaps in information, which in turn may justify the introduction of new technology and/or tools into the enterprise. Specifically, the case study's formal ontology relies on associating each application with its SBOM. However, at the time of performing the case study, the enterprise had only employed SBOM tools to ingest open-source software packages and had not applied the relevant technology to produce the SBOMs of its own applications. Our case study highlights the need to incorporate the technology and tools to extract SBOMs from the enterprise applications in production, in order to support risk assessment with respect to vulnerability-induced risks.

In the technical implementation of the case study, vulnerability-induced risks are represented by "tagging" vulnerabilities as vulnerability-induced risks, i.e., the risks form a subset of the vulnerabilities (as captured formally in Equation (6)). This is due to limitations in the OWL standard and the standard Protégé implementation that prevent inferring the existence of new individuals. We chose to adhere to the standard implementation to demonstrate the feasibility and practicality of OnToRisk. Ideally, however, the risk identification implementation can be easily improved, when developing an ontology-based application or information system, by using a proprietary mechanism to yield new individuals. Such individuals can be derived formally as the tuple (vulnerability, impacted element, attack surface), i.e.:

$$(v, i, a) \equiv \{(v, i, a) \mid (v \in V \ \&\ i \in BF \ \&\ a \in A \ \&\ (a, i) \in SF \ \&\ (v, i) \in RF \ \&\ (v, a) \in RV) \text{ or } (v \in V \ \&\ i \in SI \ \&\ a \in A \ \&\ (a, i) \in AI \ \&\ (v, i) \in RI \ \&\ (v, a) \in RV)\}\tag{7}$$

with:

*V*—the set of Vulnerability class (i.e., concept) instantiations

*BF*—the set of Business Function class instantiations

*A*—the set of Application class instantiations

*SF*—the set of supportsFunction object properties instantiations

*RF*—the set of risksFunction object properties instantiations

*SI*—the set of Sensitive Information class instantiations

*AI*—the set of accessInfo object properties instantiations

*RI*—the set of risksInfo object properties instantiations

*RV*—the set of risksVia object properties instantiations
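Under the same legend, Equation (7) can be sketched with plain Python sets; the function name `risk_tuples` is ours, and the sets are populated with the case-study individuals:

```python
# A plain-Python sketch of Equation (7): deriving (vulnerability, impacted
# element, attack surface) tuples from the instantiation sets. The function
# name `risk_tuples` is ours; the sets mirror the legend above.

def risk_tuples(V, BF, A, SF, RF, SI, AI, RI, RV):
    via_function = {                       # first disjunct of Equation (7)
        (v, i, a)
        for v in V for i in BF for a in A
        if (a, i) in SF and (v, i) in RF and (v, a) in RV
    }
    via_info = {                           # second disjunct of Equation (7)
        (v, i, a)
        for v in V for i in SI for a in A
        if (a, i) in AI and (v, i) in RI and (v, a) in RV
    }
    return via_function | via_info

V = {"Log4shell"}
BF = {"OpenAccount"}
A = {"App1", "App2", "App3", "App4"}
SF = {("App4", "OpenAccount")}                        # supportsFunction
RF = {("Log4shell", "OpenAccount")}                   # risksFunction
SI = {"ClientIDsList"}
AI = {("App3", "ClientIDsList")}                      # accessInfo
RI = {("Log4shell", "ClientIDsList")}                 # risksInfo
RV = {("Log4shell", "App3"), ("Log4shell", "App4")}   # risksVia

print(sorted(risk_tuples(V, BF, A, SF, RF, SI, AI, RI, RV)))
# [('Log4shell', 'ClientIDsList', 'App3'), ('Log4shell', 'OpenAccount', 'App4')]
```

Each tuple names the risk, what it impacts, and through which application it is exposed, which is exactly the information an enterprise-grade implementation would instantiate as new individuals.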

OnToRisk currently provides the identification of potential risks. Identified risks should be further analysed. In the case study implementation, for example, an application marked as susceptible to a vulnerability due to the identity of one of its components may not present an actual risk, e.g., in a case where an application uses the component in a version which is not susceptible to the vulnerability or if the context of use or security controls prevent the exploitation of the disclosed vulnerability. Future research can establish the use of other ontology elements, such as data properties—in addition to classes and object properties—for improving the risk identification and its automation. Expanding the ontology with additional elements may also contribute to the prioritisation of risks (e.g., by introducing impact levels) and to the inclusion of additional CTI information.

With OnToRisk currently validated for the specific case study of a vulnerability-induced risk, additional research can utilise the method to identify other types of cybersecurity risks, such as those emerging from a compromised supply chain or from the existence of Common Weakness Enumeration (CWE) entries in applications and application development.

Whereas a previous method by Eckhart et al. employs automated risk identification for improving engineering artifacts [30], OnToRisk provides automated risk identification for better organisational situational awareness. OnToRisk provides a more concrete view of business consequences, compared with the high-level consequence categories of the engineering-focused method proposed by Eckhart et al. OnToRisk relies on continuous integration of information within an operational context, as opposed to initiated engineering design verification, which is the domain of the method by Eckhart et al. Both methods share a formal ontology approach as well as the goal of relieving personnel from tedious risk identification so that they can concentrate on other aspects of risk management. Therefore, future research may seek to integrate the two methods and deliver an ontology-based risk identification method for the full system lifecycle.

## **5. Conclusions**

In this paper we describe a new method—OnToRisk—which promotes the automatic identification of risks. The method is validated using a widely applicable, realistic and representative case study implementation of identifying risks emerging from software vulnerabilities.

Future research may demonstrate the use of the proposed method to support the automated identification of risks of additional types. Furthermore, elaborating the ontology definitions and the ontology-based reasoning can improve the output of the method, providing a more accurate and prioritised risk identification.

**Author Contributions:** Conceptualization, A.S.; methodology, A.S.; software, A.S.; validation, A.S.; resources, O.M.; writing—original draft preparation, A.S.; writing—review and editing, O.M.; visualization, A.S.; project administration, O.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not Applicable.

**Informed Consent Statement:** Not Applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **Appendix A. The Case Study Formal Ontology (OWL Format)**

This appendix provides the full ontology of the reported case study. The results are fully reproducible by copying the ontology into a text file and opening it with the Protégé ontology authoring tool.

<?xml version="1.0"?> <Ontology xmlns="http://www.w3.org/2002/07/owl#"> <Prefix name="owl" IRI="http://www.w3.org/2002/07/owl#"/> <Declaration> <Class IRI="owlapi:ontology578765402008551#Risk"/> </Declaration> <Declaration> <Class IRI="owlapi:ontology578765402008553#CybersecurityRisk"/> </Declaration> <Declaration> <Class IRI="owlapi:ontology578765402008555#VulnerabilityInducedRisk"/> </Declaration> <Declaration> <Class IRI="owlapi:ontology578765402008557#Application"/> </Declaration> <Declaration> <Class IRI="owlapi:ontology578765402008559#Component"/> </Declaration> <Declaration> <Class IRI="owlapi:ontology578765402008561#BusinessFunction"/> </Declaration> <Declaration> <Class IRI="owlapi:ontology578765402008563#SensitiveInformation"/> </Declaration> <Declaration> <Class IRI="owlapi:ontology578765402008565#Vulnerability"/> </Declaration>

<Declaration> <ObjectProperty IRI="owlapi:ontology578765402008567#includesComponent"/> </Declaration> <Declaration> <ObjectProperty IRI="owlapi:ontology578765402008569#foundIn"/> </Declaration> <Declaration> <ObjectProperty IRI="owlapi:ontology578765402008571#risksFunction"/> </Declaration> <Declaration> <ObjectProperty IRI="owlapi:ontology578765402008573#risksInfo"/> </Declaration> <Declaration> <ObjectProperty IRI="owlapi:ontology578765402008577#supportsFunction"/> </Declaration> <Declaration> <ObjectProperty IRI="owlapi:ontology578765402008581#accessInfo"/> </Declaration> <Declaration> <ObjectProperty IRI="owlapi:ontology578765402008588#risksVia"/> </Declaration> <Declaration> <ObjectProperty IRI="owlapi:ontology578765402008590#susceptible2Vulnerability"/> </Declaration> <Declaration> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App1"/> </Declaration> <Declaration> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App2"/> </Declaration> <Declaration> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App3"/> </Declaration> <Declaration> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App4"/> </Declaration> <Declaration> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#ClientIDsList"/> </Declaration> <Declaration> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4j"/> </Declaration> <Declaration> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4shell"/> </Declaration> <Declaration> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#OpenAccount"/> </Declaration> <EquivalentClasses> <Class IRI="owlapi:ontology578765402008555#VulnerabilityInducedRisk"/> <ObjectIntersectionOf> <Class IRI="owlapi:ontology578765402008565#Vulnerability"/> <ObjectUnionOf> <ObjectSomeValuesFrom>

<ObjectProperty IRI="owlapi:ontology578765402008571#risksFunction"/> <Class IRI="owlapi:ontology578765402008561#BusinessFunction"/> </ObjectSomeValuesFrom> <ObjectSomeValuesFrom> <ObjectProperty IRI="owlapi:ontology578765402008573#risksInfo"/> <Class IRI="owlapi:ontology578765402008563#SensitiveInformation"/> </ObjectSomeValuesFrom> </ObjectUnionOf> </ObjectIntersectionOf> </EquivalentClasses> <SubClassOf> <Class IRI="owlapi:ontology578765402008553#CybersecurityRisk"/> <Class IRI="owlapi:ontology578765402008551#Risk"/> </SubClassOf> <SubClassOf> <Class IRI="owlapi:ontology578765402008555#VulnerabilityInducedRisk"/> <Class IRI="owlapi:ontology578765402008553#CybersecurityRisk"/> </SubClassOf> <SubClassOf> <Class IRI="owlapi:ontology578765402008555#VulnerabilityInducedRisk"/> <Class IRI="owlapi:ontology578765402008565#Vulnerability"/> </SubClassOf> <ClassAssertion> <Class IRI="owlapi:ontology578765402008557#Application"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App1"/> </ClassAssertion> <ClassAssertion> <Class IRI="owlapi:ontology578765402008557#Application"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App2"/> </ClassAssertion> <ClassAssertion> <Class IRI="owlapi:ontology578765402008557#Application"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App3"/> </ClassAssertion> <ClassAssertion> <Class IRI="owlapi:ontology578765402008557#Application"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App4"/> </ClassAssertion> <ClassAssertion> <Class IRI="owlapi:ontology578765402008563#SensitiveInformation"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#ClientIDsList"/> </ClassAssertion> <ClassAssertion> <Class IRI="owlapi:ontology578765402008559#Component"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4j"/> </ClassAssertion> <ClassAssertion> <Class IRI="owlapi:ontology578765402008565#Vulnerability"/> <NamedIndividual 
IRI="http://www.co-ode.org/ontologies/ont.owl#Log4shell"/> </ClassAssertion> <ClassAssertion> <Class IRI="owlapi:ontology578765402008561#BusinessFunction"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#OpenAccount"/> </ClassAssertion>

<ObjectPropertyAssertion>

<ObjectProperty IRI="owlapi:ontology578765402008567#includesComponent"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App2"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4j"/>

</ObjectPropertyAssertion>

<ObjectPropertyAssertion>

<ObjectProperty IRI="owlapi:ontology578765402008567#includesComponent"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App3"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4j"/>

</ObjectPropertyAssertion>

<ObjectPropertyAssertion>

<ObjectProperty IRI="owlapi:ontology578765402008581#accessInfo"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App3"/>

<NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#ClientIDsList"/>

</ObjectPropertyAssertion>

<ObjectPropertyAssertion>

<ObjectProperty IRI="owlapi:ontology578765402008567#includesComponent"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App4"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4j"/>

</ObjectPropertyAssertion>

<ObjectPropertyAssertion>

<ObjectProperty IRI="owlapi:ontology578765402008577#supportsFunction"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App4"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#OpenAccount"/>

</ObjectPropertyAssertion>

<ObjectPropertyAssertion>

<ObjectProperty IRI="owlapi:ontology578765402008569#foundIn"/>

<NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4shell"/>

<NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4j"/>

</ObjectPropertyAssertion>

<ObjectPropertyDomain>

<ObjectProperty IRI="owlapi:ontology578765402008567#includesComponent"/> <Class IRI="owlapi:ontology578765402008557#Application"/>

</ObjectPropertyDomain>

<ObjectPropertyDomain>

<ObjectProperty IRI="owlapi:ontology578765402008569#foundIn"/>

<Class IRI="owlapi:ontology578765402008565#Vulnerability"/>

</ObjectPropertyDomain>

<ObjectPropertyDomain>

<ObjectProperty IRI="owlapi:ontology578765402008571#risksFunction"/> <Class IRI="owlapi:ontology578765402008565#Vulnerability"/>

</ObjectPropertyDomain>

<ObjectPropertyDomain>

<ObjectProperty IRI="owlapi:ontology578765402008573#risksInfo"/>

<Class IRI="owlapi:ontology578765402008565#Vulnerability"/>

</ObjectPropertyDomain>

<ObjectPropertyDomain>

<ObjectProperty IRI="owlapi:ontology578765402008577#supportsFunction"/> <Class IRI="owlapi:ontology578765402008557#Application"/>

</ObjectPropertyDomain>

<ObjectPropertyDomain>

<ObjectProperty IRI="owlapi:ontology578765402008581#accessInfo"/>

<Class IRI="owlapi:ontology578765402008557#Application"/>

</ObjectPropertyDomain>

<ObjectPropertyDomain>

<ObjectProperty IRI="owlapi:ontology578765402008588#risksVia"/>

<Class IRI="owlapi:ontology578765402008555#VulnerabilityInducedRisk"/>

</ObjectPropertyDomain>

<ObjectPropertyDomain>

<ObjectProperty IRI="owlapi:ontology578765402008590#susceptible2Vulnerability"/> <Class IRI="owlapi:ontology578765402008557#Application"/>

</ObjectPropertyDomain>

<ObjectPropertyRange>

<ObjectProperty IRI="owlapi:ontology578765402008567#includesComponent"/> <Class IRI="owlapi:ontology578765402008559#Component"/>

</ObjectPropertyRange>

<ObjectPropertyRange>

<ObjectProperty IRI="owlapi:ontology578765402008569#foundIn"/> <Class IRI="owlapi:ontology578765402008559#Component"/>

</ObjectPropertyRange>

<ObjectPropertyRange>

<ObjectProperty IRI="owlapi:ontology578765402008571#risksFunction"/> <Class IRI="owlapi:ontology578765402008561#BusinessFunction"/>

</ObjectPropertyRange>

<ObjectPropertyRange>

<ObjectProperty IRI="owlapi:ontology578765402008573#risksInfo"/>

<Class IRI="owlapi:ontology578765402008563#SensitiveInformation"/>

</ObjectPropertyRange>

<ObjectPropertyRange>

<ObjectProperty IRI="owlapi:ontology578765402008577#supportsFunction"/> <Class IRI="owlapi:ontology578765402008561#BusinessFunction"/>

</ObjectPropertyRange>

<ObjectPropertyRange>

<ObjectProperty IRI="owlapi:ontology578765402008581#accessInfo"/> <Class IRI="owlapi:ontology578765402008563#SensitiveInformation"/>

</ObjectPropertyRange>

<ObjectPropertyRange>

<ObjectProperty IRI="owlapi:ontology578765402008588#risksVia"/>

<Class IRI="owlapi:ontology578765402008557#Application"/>

</ObjectPropertyRange>

<ObjectPropertyRange>

<ObjectProperty IRI="owlapi:ontology578765402008590#susceptible2Vulnerability"/> <Class IRI="owlapi:ontology578765402008565#Vulnerability"/>

</ObjectPropertyRange>

<SubObjectPropertyOf>

<ObjectPropertyChain>

<ObjectProperty IRI="owlapi:ontology578765402008567#includesComponent"/> <ObjectInverseOf>

<ObjectProperty IRI="owlapi:ontology578765402008569#foundIn"/>

</ObjectInverseOf>

</ObjectPropertyChain>

<ObjectProperty IRI="owlapi:ontology578765402008590#susceptible2Vulnerability"/>

</SubObjectPropertyOf>

<SubObjectPropertyOf>

<ObjectPropertyChain>

<ObjectProperty IRI="owlapi:ontology578765402008569#foundIn"/>

<ObjectInverseOf>

<ObjectProperty IRI="owlapi:ontology578765402008567#includesComponent"/>

</ObjectInverseOf>

<ObjectProperty IRI="owlapi:ontology578765402008577#supportsFunction"/> </ObjectPropertyChain>

<ObjectProperty IRI="owlapi:ontology578765402008571#risksFunction"/>

</SubObjectPropertyOf>

<SubObjectPropertyOf>

<ObjectPropertyChain>

<ObjectProperty IRI="owlapi:ontology578765402008569#foundIn"/> <ObjectInverseOf>

<ObjectProperty IRI="owlapi:ontology578765402008567#includesComponent"/> </ObjectInverseOf>

<ObjectProperty IRI="owlapi:ontology578765402008581#accessInfo"/>

</ObjectPropertyChain>

<ObjectProperty IRI="owlapi:ontology578765402008573#risksInfo"/>

</SubObjectPropertyOf>

<SubObjectPropertyOf>

<ObjectPropertyChain>

<ObjectProperty IRI="owlapi:ontology578765402008571#risksFunction"/> <ObjectInverseOf>

<ObjectProperty IRI="owlapi:ontology578765402008577#supportsFunction"/>

</ObjectInverseOf>

</ObjectPropertyChain>
<ObjectProperty IRI="owlapi:ontology578765402008588#risksVia"/>

</SubObjectPropertyOf>

<SubObjectPropertyOf>

<ObjectPropertyChain>

<ObjectProperty IRI="owlapi:ontology578765402008573#risksInfo"/>

<ObjectInverseOf>

<ObjectProperty IRI="owlapi:ontology578765402008581#accessInfo"/>

</ObjectInverseOf>

</ObjectPropertyChain>

<ObjectProperty IRI="owlapi:ontology578765402008588#risksVia"/>

</SubObjectPropertyOf>
</Ontology>

## **Appendix B. Inferred Assertions by the Reasoner (OWL Format)**

<ClassAssertion>

<Class IRI="owlapi:ontology578765402008551#Risk"/>

<NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4shell"/>

</ClassAssertion>

<ClassAssertion>

<Class IRI="owlapi:ontology578765402008553#CybersecurityRisk"/>

<NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4shell"/>

</ClassAssertion>

<ClassAssertion>

<Class IRI="owlapi:ontology578765402008555#VulnerabilityInducedRisk"/>

<NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4shell"/>

</ClassAssertion>

<ObjectPropertyAssertion>

<ObjectProperty IRI="owlapi:ontology578765402008590#susceptible2Vulnerability"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App2"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4shell"/>

</ObjectPropertyAssertion>

<ObjectPropertyAssertion>

<ObjectProperty IRI="owlapi:ontology578765402008590#susceptible2Vulnerability"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App3"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4shell"/>

</ObjectPropertyAssertion>

<ObjectPropertyAssertion>

<ObjectProperty IRI="owlapi:ontology578765402008590#susceptible2Vulnerability"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App4"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4shell"/>

</ObjectPropertyAssertion>

<ObjectPropertyAssertion>

<ObjectProperty IRI="owlapi:ontology578765402008571#risksFunction"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4shell"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#OpenAccount"/>

</ObjectPropertyAssertion>

<ObjectPropertyAssertion>

<ObjectProperty IRI="owlapi:ontology578765402008573#risksInfo"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4shell"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#ClientIDsList"/>

</ObjectPropertyAssertion>

<ObjectPropertyAssertion>

<ObjectProperty IRI="owlapi:ontology578765402008588#risksVia"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#Log4shell"/> <NamedIndividual IRI="http://www.co-ode.org/ontologies/ont.owl#App3"/>

</ObjectPropertyAssertion>

<ObjectPropertyAssertion>

<ObjectProperty IRI="owlapi:ontology578765402008588#risksVia"/>


</ObjectPropertyAssertion>


## **References**

