### **1. Introduction**

The development of modern information and communications technologies (ICTs) brings new possibilities for users and organizations: users are no longer tied to physical data storage, can access their data anytime and anywhere, and can use different methods and services to process and share data instantly. Together with these ICT possibilities, the variety of cyberattack vectors has also increased. This is expected, given the complexity of modern technologies and their orientation toward user experience (UX). Consequently, spending on security and risk management increases every year, reaching 155 billion USD worldwide in 2021 [1].

The growth of spending on security and risk management is driven by multiple factors [2]: the transition to remote or mixed working; cloud and SaaS security assurance; and the rise of new threat landscapes. One way to address the growing spending needs on security and risk management is cyber intelligence. In cyber intelligence, artificial intelligence (AI) solutions are used to automate processes while providing additional benefits to specific security and risk management areas [3,4].

The development of cyber intelligence is constrained by a lack of data for analysis and decision support. While supervised learning AI solutions are mostly oriented toward specific tasks (data classification, anomaly detection), ontologies used as a knowledge base for process automation might have a wider application (semantic modeling, extraction of needed knowledge, etc.) [5].

The ontology structure determines how easily knowledge can be extracted, while the real value of the ontology depends on the data it stores. The biggest portion of security knowledge at the moment is not structured; it is presented as text data and is, therefore, currently limited for application in cyber intelligence solutions. It is important to have a mechanism that assures a wide range of up-to-date, high-quality data from different sources. Manually updating a security ontology is not practical because of the wide variety of data sources, the potential impact of data interpretation, the lack of resources, etc. Some methods for transforming text to ontology exist [6]; however, they concentrate on the estimation of concepts, instances, hypernyms, and hyponyms, with no relationship between the data source and the concept. To support ontology knowledge application and decision justification by mapping knowledge to the appropriate data sources, the ontology structure has to be suitably designed.

<sup>1</sup> Department of Information Technologies, Vilnius Gediminas Technical University, LT-10223 Vilnius, Lithuania

This paper aims to expand the possibilities of ontology-based cyber intelligence solutions by presenting a security ontology structure for storing data from different text-based data sources in the ontology, supporting knowledge traceability and relationship estimation between different security documents. The main contribution of the paper is therefore an answer to the research question of which main principles should guide the formalization of text-based security documents into an ontology, so that the gathered data are usable and new knowledge can be generated.

The paper reviews related works on security ontology and text transformation to ontologies. On the basis of the review results, a new security ontology structure is proposed that links concepts to their original data sources. The proposed structure is validated by presenting numerical results of its application and possible directions for using such an ontology structure.

### **2. Related Works**

"An ontology is a formal and explicit specification of a shared conceptualization" [7]. It is a basis of semantic modeling and allows the storage of different concepts, as well as their properties and relationships. Therefore, ontologies are known as knowledge bases rather than databases. Because of the properties of ontologies, they represent one of the solutions for cyber intelligence and a future research direction [8]. The potential of ontologies can be seen in different application areas, such as digital evidence review [9], software requirement and security issue detection [10], modeling of Internet of Things design [11], security alert management [12], and as a standard for cyber threat sharing [13].

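As a rough illustration of the definition above, the following minimal sketch stores concepts and relationships as subject-predicate-object statements and derives a fact that was never stated directly, which is what distinguishes a knowledge base from a plain database. The class `TinyOntology`, the predicate names, and the security concepts are all hypothetical and chosen only for illustration; they are not taken from any of the cited ontologies.

```python
class TinyOntology:
    """Toy store of (subject, predicate, object) statements (illustrative only)."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        # Record one statement, e.g. ("Phishing", "subClassOf", "Threat").
        self._triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        # All objects directly related to `subject` via `predicate`.
        return {o for s, p, o in self._triples if s == subject and p == predicate}

    def ancestors(self, concept):
        # Transitive closure over "subClassOf": every more general concept.
        found, stack = set(), [concept]
        while stack:
            current = stack.pop()
            for parent in self.objects(current, "subClassOf"):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found


# Hypothetical security concepts and relationships.
onto = TinyOntology()
onto.add("Phishing", "subClassOf", "SocialEngineering")
onto.add("SocialEngineering", "subClassOf", "Threat")
onto.add("Phishing", "exploits", "HumanError")
onto.add("SecurityAwarenessTraining", "mitigates", "Phishing")

# "Phishing is a Threat" was never stored explicitly; it is inferred.
phishing_ancestors = onto.ancestors("Phishing")
```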
Ontologies are mostly created by area experts. The expert designs the ontology by formalizing their knowledge using different data sources. Ontologies based only on expert knowledge mostly present the landscape of an area, while additional tools and transformations are used to incorporate existing knowledge into the structure of the designed ontology. Ontologies with formalized knowledge from different sources have a higher value, as they present not only the general concepts of the area but also consolidate knowledge from different data sources and serve as a knowledge base. However, the transformation from different data sources to ontology might be complicated because of different data formats and types. One of the most complex data types for formalization is text-based data. The same knowledge can be presented in very different texts, and word-to-word matching might not be enough for knowledge matching. Therefore, it is important to find the best solution for extracting text-written knowledge and transforming it to ontology.
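The limitation of word-to-word matching noted above can be illustrated with a small sketch. It computes the Jaccard overlap of word sets, used here only as a simple stand-in for word-level matching and not as a method from the reviewed works. The example sentences and the helper names `tokens` and `jaccard` are invented for illustration.

```python
import re


def tokens(text):
    # Lowercase alphabetic words only; digits and punctuation are dropped.
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    # Share of distinct words that two texts have in common (0.0 to 1.0).
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Two invented requirements with the same meaning but different wording.
same_meaning = jaccard(
    "Passwords must be changed every 90 days.",
    "Users shall renew their credentials quarterly.",
)

# Two invented requirements with opposite meaning but nearly identical wording.
different_meaning = jaccard(
    "Passwords must be changed every 90 days.",
    "Passwords must never be changed every 90 days.",
)
```

Under these assumptions, the paraphrased pair shares no words at all, while the contradictory pair overlaps almost completely, so word overlap alone would rank the two pairs in the wrong order.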

The next two sections analyze existing security ontologies and how they present knowledge from different security area documents, as well as existing solutions for transforming text-written knowledge to ontology.
