Article

A Knowledge Graph-Based Consistency Detection Method for Network Security Policies

Institute of Computer Application, China Academy of Engineering Physics, Mianyang 621900, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(18), 8415; https://doi.org/10.3390/app14188415
Submission received: 2 August 2024 / Revised: 13 September 2024 / Accepted: 16 September 2024 / Published: 19 September 2024

Abstract

A network security policy serves as a guideline for using and managing the network environment, and it usually states its requirements in natural language. It helps network managers carry out standardized network attack detection and situation awareness analysis across the whole temporal and spatial scope of network security. In most cases, however, configuration updates or policy conflicts cause the real network environment to diverge from the network security policies, so consistency detection of network security policies is necessary. Previous consistency detection methods for security policies have three problems. First, the detection direction is one-dimensional: they rely only on formal reasoning to check logical consistency. Second, their coverage of policy domains is incomplete, since each addresses a single type of problem in a single domain. Third, consistency detection uses many different data structures, which makes it difficult to process and analyze rule library carriers and target information carriers in a unified, structured way. Advances in intelligent graph and data mining technology make it possible to address these problems. This article proposes a new consistency detection approach for network security policies that uses an intelligent graph database as a visual information carrier. The graph connects detection information broadly and enables comprehensive detection across knowledge domains, physical devices, and detection methods. Through knowledge graph algorithms and intelligent reasoning, it also helps users understand how policies relate to the security of the real network environment. In turn, these actual network situations and the knowledge base help managers refine policies to suit local conditions. 
This article also presents the consistency detection process for typical network security policy cases, demonstrating the practical details and effectiveness of the method.

1. Introduction

Security policy is an important part of situation awareness analysis technology. A security policy specifies the security requirements of a network from the global level down to the local level, providing guidance for standardized situation awareness analysis. In turn, the results and reasoning of situation awareness analysis can help correct security policies. Network security policy consistency detection is the process of comparing and functionally testing network security policies against the actual network (i.e., checking the network's static configuration and dynamic rules for consistency with the policies) in order to improve the policies and correct the network configuration. Previous research on security policies has mainly focused on policy conflict analysis and policy formalization, while policy consistency detection has relied more on expert systems or manual matching [1,2,3]. In recent years, networks have grown in scale and security incidents in network attack detection and situation awareness analysis have become more complex. As a result, the focus of network security research has gradually shifted from single security domains to composite security domains, and consistency detection of security policies has regained attention. At the same time, advances in knowledge graphs, data mining, and large language models have brought new ideas to consistency detection of security policies.
Knowledge graph technology can represent complex knowledge domains through data mining, information processing, knowledge measurement, and graphical display, thereby revealing the static and dynamic states of a knowledge domain. A knowledge graph is more than a data carrier. It also reduces the time needed to organize multiple heterogeneous data sources into meaningful relationships, and it supports intelligent computation and reasoning [4,5,6]. Applying knowledge graphs to security policy consistency detection therefore offers distinct advantages.
In this paper, we combine existing experience in security policy consistency detection with knowledge graph technology to propose a new knowledge graph-based approach for consistency detection of network security policies. The main contributions are as follows:
  • Based on the task characteristics of policy consistency detection, we store the detection logic of policies in the knowledge graph and propose a four-layer knowledge graph structure for policy consistency detection tasks;
  • A feasible formalization approach for policy consistency detection tasks is given;
  • The specific steps for using the knowledge graph to detect policy consistency are given. In addition, the graph's capacity for intelligent computation and reasoning is used to implement several extended functions related to policy consistency detection.
This article is divided into eight parts. The first part is the introduction. The second part reviews previous research related to this article, much of which inspired its ideas. The third part introduces the overall method and process. The fourth part describes the construction of the knowledge graph for network security policy consistency detection. The fifth part details the knowledge graph-based consistency detection process. The sixth part presents applications of the method and the implementation of related extended functions. The seventh part presents experiments and analysis of network security policy consistency detection for access control cases. The eighth part concludes the article and outlines future research.

2. Background and Related Work

Security policy is the cornerstone of security threat defense systems and has attracted many researchers since the 1990s. In 1992, J.B. Michael, E.H. Sibley, et al. proposed an intermediate step for transforming security policies from natural language into axiomatic form. This step is based on an object-oriented model in the extended entity-relationship style [7]. It is one of the earliest attempts at policy formalization, and its approach aligns well with knowledge graph methods. In 1993, Michael and J Bret proposed a formal method for testing the logical consistency of composite security policies [8]. In 1997, Laurence Cholvy and Frederic Cuppens proposed a method for inferring security policy properties and developed a Prolog-based security policy consistency checking tool [9]. R. Sekar, Premchand Uppuluri, et al. proposed and improved the Behavioral Monitoring Specification Language (BMSL) in 1999 and 2003, respectively [10,11]. BMSL can be extended to many fields of network security, and this article also draws on it. In 2008, X.F. Lei and J. Liu designed and implemented a system for formally specifying and verifying security policies [12]. In 2014, Wadie Krombi, Mohammed Erradi, et al. proposed a program that synthesizes an automaton implementing a given security policy [13]. In 2015, S. Liang, Z. Wang, et al. proposed static detection when policies are formulated or modified, combined with dynamic detection during system operation [14]. These automated systems for verifying security policies provide many ideas for classifying detection tasks and designing detection methods. In 2022, Aodi Liu, Xuehui Du, et al. proposed a constraint-based framework for the security analysis of access control policies [15]. It is a sound formal framework, but it addresses only access control.
Although the research on security policies mentioned above is abundant, most of it focuses on the theoretical level of policy design and conflict resolution, and intelligent means for consistency detection are lacking. Moreover, policy formalization methods are diverse and differ significantly across security domains, with no relatively unified logical approach.
Although the idea of the knowledge graph has early origins, the term was not formally introduced by Google until 2012 [16]. Since then, knowledge graph technology has developed rapidly, with increasingly diverse methods and an increasingly mature technical system. In 2021, Aidan Hogan et al. provided a systematic introduction to and technical survey of knowledge graphs [17]. Knowledge graphs are widely used in engineering practice, and some researchers have also applied them to network security. In 2016, S. Noel and E. Harley described CyGraph, a system for improving network security posture and maintaining situational awareness in the face of network attacks [18]. Their system makes full use of knowledge graph technology, integrating network configuration, a knowledge base of vulnerabilities and attacks, and network situation analysis into the knowledge graph. This idea is excellent and is one of the earliest practical integrations of knowledge graph technology into network security. In 2018, M. Ugur Aksu, Kemal Bicakci, et al. improved the technical details of automatic attack graph generation [19]. This technique resembles knowledge graph reasoning. It can be extended to security policy consistency detection to obtain graph paths of device configurations, raw data, use cases, and so on that match security policies. In 2020, M. Tikhomirov, N. Loukachevich, et al. investigated the performance of BERT models on named entity recognition in the network security field and explored a new data augmentation approach for the entity recognition task [20].

3. Proposed Methods

The proposed method consists of two main parts: the construction of the knowledge graph and the consistency detection method.
The flowchart of the method for constructing the knowledge graph of network security policies is shown in Figure 1. The proposed knowledge graph has four layers:
  • the network information layer graph, obtained from topology information;
  • the security knowledge layer graph, obtained through entity recognition, knowledge extraction, and knowledge fusion from security knowledge data sources;
  • the detection logic layer graph, obtained by transforming policy text into a formal representation tied to ontology modeling;
  • the logical dependency layer graph, obtained by persistently storing default information and semantic parsing functions for policy text.
The detection logic layer graph is the core of the four layers and holds the main content of each policy, which is stored as an event sequence and a response. The network information layer graph is the collection of detection objects and includes the targets that policies are checked against. The security knowledge layer graph is the basis of detection, because it supplies persistent, maintainable security domain knowledge. The logical dependency layer graph is an auxiliary tool, used mainly to make consistency detection automated and standardized.
The flowchart of the knowledge graph-based policy consistency detection method is shown in Figure 2. First, the event sequence and response of the target policy are obtained from the detection logic layer graph. Next, the policy text is segmented, and the related default information and semantic parsing functions are queried in the logical dependency layer graph. The semantic parsing functions are then applied within the event sequence to retrieve unknown parameters from the security knowledge layer graph and the network information layer graph. The events in the event sequence are then executed in order; they are written in a BMSL-like formal language that incorporates Cypher syntax. Finally, consistency detection is achieved by comparing the execution results of the event sequence with the response. The form of the response depends on which of the three detection tasks is being performed.

4. Construction of Knowledge Graph

The constructed knowledge graph is divided into four layers: the network information layer graph, security knowledge layer graph, detection logic layer graph, and logical dependency layer graph.
The scope of network security policies is very broad. By domain, they can be roughly divided into four categories: physical security policies, operational service security policies, data security policies, and content security policies. Security entities therefore need to be classified appropriately when modeling the ontology of the knowledge graph.
  • Physical security policies focus on the physical properties of hardware entities, the electromagnetic compatibility of the working environment, and security management systems. Their ontology should include the physical parameters of hardware entities, the sensing information of the working environment and the hardware, and the implementation records of security management systems.
  • Operational service security policies form a relatively narrow category that includes intrusion detection policies, access control policies, and so on. Their ontology should include the attack, asset, and vulnerability dimensions of network behavior, the execution and detection of control policies, and the functional information of the related hardware and software.
  • Data security policies focus on controlling data flows and include encrypted communication policies, confidentiality-level permission policies, and so on. Their ontology should include common models such as encrypted communication models and confidentiality-level permission models.
  • Content security policies focus on content auditing. Their ontology should include basic information about content auditing software and a certain number of detection case types.

4.1. Network Information Layer Graph

Network topology information can be obtained automatically by scanning with nmap, but administrators must still collect the topology information of many network devices manually and import it into the graph. Because of the nature of policy detection tasks, the network information layer graph should include conventional network devices such as hosts, servers, routers, and firewalls. It should also include unconventional devices such as IoT devices and sensors. For policies in some special domains, information about functional software should also be recorded in the network information layer graph.
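As a minimal sketch of the scanning step, the Python function below reads the XML report that nmap writes with its `-oX` option. It turns responsive hosts and their open ports into node and edge lists that could be imported into the network information layer graph. The node labels `Host` and `Service` and the edge type `RUNS` are hypothetical names. The embedded report is a trimmed illustrative sample, not real scan data. Devices that nmap cannot see, such as sensors, would still be added by hand.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative sample of nmap XML output (as written by `nmap -oX`).
SAMPLE_XML = """<nmaprun>
  <host>
    <status state="up"/>
    <address addr="192.168.1.10" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22"><state state="open"/><service name="ssh"/></port>
      <port protocol="tcp" portid="80"><state state="closed"/><service name="http"/></port>
    </ports>
  </host>
  <host>
    <status state="down"/>
    <address addr="192.168.1.11" addrtype="ipv4"/>
  </host>
</nmaprun>"""


def nmap_to_graph(xml_text):
    """Convert nmap XML into (nodes, edges) for the network information layer.

    The labels "Host"/"Service" and the edge type "RUNS" are hypothetical.
    """
    root = ET.fromstring(xml_text)
    nodes, edges = [], []
    for host in root.findall("host"):
        status = host.find("status")
        if status is None or status.get("state") != "up":
            continue  # skip hosts that did not respond to the scan
        addr = host.find("address").get("addr")
        nodes.append({"label": "Host", "id": addr})
        for port in host.findall("ports/port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue  # only open ports become service nodes
            service = port.find("service")
            name = service.get("name") if service is not None else "unknown"
            service_id = f"{addr}:{port.get('portid')}/{port.get('protocol')}"
            nodes.append({"label": "Service", "id": service_id, "name": name})
            edges.append((addr, "RUNS", service_id))
    return nodes, edges


nodes, edges = nmap_to_graph(SAMPLE_XML)
```

Each edge could then be written to the graph database with an upsert-style statement. That import step is not shown.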
When designing the network information layer graph for the four types of policies in communication networks (physical security, operational service security, data security, and content security policies), each type calls for a different emphasis. Table 1 lists important network information layer graph content for each of the four policy types.

4.2. Security Knowledge Layer Graph

The data sources of the security knowledge layer graph cover the attack, asset, and vulnerability dimensions. They can include public databases, encyclopedias, blogs, technical manuals, and so on. Semi-structured data sources are more conducive to knowledge extraction, so simple crawling methods can achieve good results. It is therefore recommended to use existing knowledge bases or rule bases:
  • the ATT&CK knowledge base of attack behavior and the Common Attack Pattern Enumeration and Classification (CAPEC) for the attack dimension;
  • the Common Platform Enumeration (CPE) for the asset dimension;
  • the Common Vulnerabilities and Exposures (CVE) list and the Common Weakness Enumeration (CWE) for the vulnerability dimension;
  • various compiled blog tables and official technical manuals for devices and software.
Building the security knowledge layer graph requires entity recognition, knowledge extraction, and knowledge fusion. Many methods exist for entity recognition and knowledge extraction in network security, and they all perform well [21]. Examples include:
  • automated annotation of network security entities;
  • supervised entity extraction based on annotated security corpora;
  • network security entity relationship extraction that combines semi-supervised natural language processing with bootstrapping.
Knowledge fusion methods for network security are far fewer, because the corpora are large, the data sources are diverse, and different sources have different emphases. The most important step in knowledge fusion is entity alignment, which is most commonly performed by matching features with similarity functions. No universal similarity function for entity alignment exists in the network security field. This paper therefore aligns entities by iteratively weighting a mixture of several text similarity functions, which achieves acceptable accuracy.
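The weighted mixing of similarity functions can be sketched as follows. The sketch assumes three common text similarity measures: a `difflib` sequence ratio, word-set Jaccard, and character-bigram Dice. The weight vector and threshold are chosen for illustration. This is not the paper's exact procedure. The iterative re-weighting step is left out, and the weights are simply passed in.

```python
from difflib import SequenceMatcher


def ratio_sim(a, b):
    # Character-level similarity from the standard library.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a, b):
    # Overlap of lower-cased word sets.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def bigram_dice(a, b):
    # Dice coefficient over character bigrams.
    def grams(s):
        s = s.lower()
        return {s[i:i + 2] for i in range(len(s) - 1)}
    ga, gb = grams(a), grams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb)) if ga or gb else 1.0


SIMILARITIES = [ratio_sim, token_jaccard, bigram_dice]


def mixed_similarity(a, b, weights):
    # Weighted sum of the individual similarity scores.
    return sum(w * f(a, b) for w, f in zip(weights, SIMILARITIES))


def align(left, right, weights, threshold=0.6):
    """Pair each entity name in `left` with its best match in `right`
    if the mixed score reaches the (assumed) threshold."""
    pairs = []
    for a in left:
        best = max(right, key=lambda b: mixed_similarity(a, b, weights))
        if mixed_similarity(a, best, weights) >= threshold:
            pairs.append((a, best))
    return pairs
```

With equal weights, `align(["SQL Injection"], ["sql injection", "cross site scripting"], [1/3, 1/3, 1/3])` pairs the two spellings of SQL injection. An iterative scheme would adjust `weights` against a small set of known-aligned pairs.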

4.3. Detection Logic Layer Graph

The essence of building the detection logic layer graph is to transform policy texts written in natural language into formal specification expressions. This article builds the knowledge graph on the open-source Neo4j database. It also proposes a formal language that is similar to BMSL (Behavioral Monitoring Specification Language) and incorporates Cypher syntax.
A natural language security policy can be expressed in the following form:
Event Sequence - Response
Here, the event sequence is the set of detection behaviors corresponding to the security policy, and each detection behavior is called an event. An event takes the form $e(x_1, x_2, \ldots, x_n)$, where $e$ is a defined function with Cypher syntax features and $x_1, x_2, \ldots, x_n$ are its parameters.
Events are combined into an event sequence through a set of operators, defined as follows:
  • The “|” operator separates an event from a condition, in the form $e(x_1, x_2, \ldots, x_n) \mid CondExp$, where $CondExp$ is a Boolean expression.
  • The “·” operator denotes sequencing: $e_1 \cdot e_2$ is a sequence in which event $e_1$ is executed before event $e_2$.
  • The “*” operator denotes closure: $e_1 * e_2$ means that events $e_1$ and $e_2$ are executed in any order.
  • The “||” operator denotes disjunction: $e_1 \,||\, e_2$ means that event $e_1$ or event $e_2$ is executed.
  • The “!” operator denotes negation, in the form $!(e(x_1, x_2, \ldots, x_n) \mid CondExp)$, which indicates an event that does not match $e(x_1, x_2, \ldots, x_n) \mid CondExp$.
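One way to give these operators an executable meaning is the BMSL-style reading, in which a pattern is matched against an ordered trace of observed events. The Python sketch below follows that reading under stated assumptions. The class names are hypothetical. A pattern evaluates to the set of trace positions at which it is completed. For "*", the pattern is taken to be completed once both operands have occurred. In the method described here, events are executed as Cypher queries instead, so the sketch only illustrates the operator semantics.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Trace = List[Tuple[str, tuple]]  # ordered (event name, arguments) pairs


@dataclass
class Event:  # e(x1, ..., xn) | CondExp
    name: str
    cond: Callable[[tuple], bool] = lambda args: True


@dataclass
class Seq:  # e1 · e2 : e1 before e2
    left: object
    right: object


@dataclass
class AnyOrder:  # e1 * e2 : both, in any order
    left: object
    right: object


@dataclass
class Or:  # e1 || e2
    left: object
    right: object


@dataclass
class Not:  # !(e | CondExp) : any event not matching the inner event
    inner: Event


def positions(p, trace: Trace) -> Set[int]:
    """Trace indices at which pattern p is completed (assumed semantics)."""
    if isinstance(p, Event):
        return {i for i, (n, args) in enumerate(trace) if n == p.name and p.cond(args)}
    if isinstance(p, Not):
        return set(range(len(trace))) - positions(p.inner, trace)
    if isinstance(p, Or):
        return positions(p.left, trace) | positions(p.right, trace)
    if isinstance(p, (Seq, AnyOrder)):
        left, right = positions(p.left, trace), positions(p.right, trace)
        if isinstance(p, Seq):
            return {j for j in right if any(i < j for i in left)}
        if not left or not right:
            return set()
        start = max(min(left), min(right))  # first index where both have occurred
        return {k for k in left | right if k >= start}
    raise TypeError(f"unknown pattern: {p!r}")


def matches(p, trace: Trace) -> bool:
    return bool(positions(p, trace))
```

For the trace `[("login", ("alice",)), ("read", ("db1",)), ("logout", ("alice",)))]`, the pattern `Seq(Event("login"), Event("logout"))` matches. The reversed sequence does not match, and the `AnyOrder` version of the reversed pair does.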
The response is essentially a judgment statement with a 0/1 result. When all behaviors in the event sequence occur in the predetermined order, the result usually indicates compliance or non-compliance with the security policy, depending on the policy's requirements. Non-compliance can be further broken down into specific actions, such as disabling the permissions of the relevant device or personnel, or sending alerts or logs to administrators.
Formalizing a single policy text yields an “event sequence” and a “response measure”: the “response measure” is executed when the “event sequence” is detected. Four entity types are defined in the detection logic layer graph: “policy text”, “event sequence”, “response measure”, and “event”. A “policy text” is related to the “event sequence” and “response measure” obtained by formalizing it. The “event sequence” and “response measure” are linked by a relationship identified by the policy sequence number, which allows them to be stored in the knowledge graph. An “event sequence” usually consists of multiple events. Each event is linked to its “event sequence” in the knowledge graph by a one-way (directed) relationship. Related combination policy texts can be linked to one another through relationships such as “contain” and “belong”.
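The following is a minimal sketch of this storage scheme. It uses a plain triple representation rather than a live Neo4j instance. The node identifiers (`policy:<id>`, `seq:<id>`, `resp:<id>`, `event:<id>.<n>`) and all relation names other than “contain” and “belong” are hypothetical.

```python
def policy_to_triples(policy_id, text, events, response, parts=()):
    """Encode one formalized policy as (subject, relation, object) triples.

    `events` holds the Cypher text of each event; `parts` lists the ids of
    sub-policies that this (combination) policy contains.
    """
    p, seq, resp = f"policy:{policy_id}", f"seq:{policy_id}", f"resp:{policy_id}"
    triples = [
        (p, "text", text),
        (p, "formalized_as", seq),
        (p, "formalized_as", resp),
        (seq, f"policy_{policy_id}", resp),  # link labeled by the policy number
    ]
    for i, cypher in enumerate(events, start=1):
        ev = f"event:{policy_id}.{i}"
        triples.append((seq, "has_event", ev))  # one-way arrow: sequence -> event
        triples.append((ev, "cypher", cypher))
    triples.append((resp, "condition", response))
    for sub in parts:  # combination policies
        triples.append((p, "contain", f"policy:{sub}"))
        triples.append((f"policy:{sub}", "belong", p))
    return triples
```

A single-event policy such as “Personnel H cannot access database I” produces seven triples. A combination policy additionally receives a “contain”/“belong” pair for each sub-policy.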
Converting natural language into an event sequence is essentially a text2cypher task. In specialized fields such as policy detection, this task used to be tedious. A general-purpose large language model combined with prompt engineering can now accomplish it readily. The open-source GraphRAG is well suited to this: when the natural language policy text is submitted to GraphRAG as a question, it returns the answer together with the specific query path of the policy in the graph. A simple script can persist this query path as a Cypher statement, and these Cypher statements together form the required event sequence.
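The “simple script” step might look like the sketch below. It assumes the path has already been extracted from the GraphRAG answer as a list of `(label, name)` node pairs plus the relationship types between them. GraphRAG's actual output format is not described here, so that extraction is left out. The function name is hypothetical. Node names are passed as query parameters. Labels and relationship types are spliced into the query text, so they should come from the trusted graph schema.

```python
def path_to_cypher(nodes, rels):
    """Persist a graph path as a parameterized Cypher MATCH statement.

    nodes: (label, name) pairs along the path.
    rels:  relationship types between consecutive nodes (len(nodes) - 1).
    Returns (query text, parameter dict).
    """
    if len(rels) != len(nodes) - 1:
        raise ValueError("need exactly one relationship between each node pair")
    parts, params = [], {}
    for i, (label, name) in enumerate(nodes):
        params[f"name{i}"] = name
        parts.append(f"(n{i}:{label} {{name: $name{i}}})")
    pattern = parts[0]
    for rel, node in zip(rels, parts[1:]):
        pattern += f"-[:{rel}]->{node}"
    return f"MATCH {pattern} RETURN n{len(nodes) - 1}", params
```

For the path personnel H → database I over `allow_access_to`, the function returns `MATCH (n0:personnel {name: $name0})-[:allow_access_to]->(n1:database {name: $name1}) RETURN n1` with parameters `{"name0": "H", "name1": "I"}`.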

4.4. Logical Dependency Layer Graph

The logical dependency layer graph is a logical supplement to the detection logic layer graph. Even when highly refined formal methods are used to process natural language policy texts, satisfactory automated consistency detection is often not achievable. Relying solely on automated entity localization and knowledge reasoning frequently leads to omissions and misinterpretations of expressions and sentence meanings that follow natural language conventions. Common default information in natural language expressions in the network security field should therefore be recorded in the logical dependency layer graph. In addition, linguistic principles should be used to develop semantic parsing functions that the graph can apply automatically. Each semantic parsing function is associated with the corresponding natural language form and stored in the graph.
There are three ways to generate semantic parsing functions.
  • Manual rule writing is the most direct way. It is precise but laborious.
  • Path inference based on connectivity and path patterns suits linguistically standardized policies. It combines linguistic methods with knowledge graph-based path inference algorithms to find the likely locations of entities or attributes.
  • Knowledge graph completion is the method this article recommends. It treats the unknown information in natural language text as entities to be added to the knowledge graph. A knowledge base completion (KBC) method locates the optimal position of the entity node in the knowledge graph and its adjacent nodes, so the semantic parsing function only needs to record that node.
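A rule-based semantic parsing function of the first kind could be registered against a natural language form roughly as follows. This is a sketch under assumptions. The regular expression, the registry, the result keys, and the single default entry (reading “must” as a mandatory requirement) are hypothetical illustrations, not contents of the paper's logical dependency layer graph.

```python
import re

# Hypothetical default information: the conventional reading of a keyword.
DEFAULTS = {"must": {"modality": "mandatory"}}

# Hypothetical registry of (compiled pattern, semantic parsing function) pairs.
PARSERS = []


def parser(pattern):
    """Decorator that registers a parsing function for a natural language form."""
    def register(fn):
        PARSERS.append((re.compile(pattern, re.IGNORECASE), fn))
        return fn
    return register


@parser(r"(?P<hardware>.+?) (?:must|should) have (?P<function>.+?) function")
def hardware_has_function(m):
    # "hardware x should have function y" -> target hardware and function y
    return {"hardware": m["hardware"].strip(),
            "required_function": m["function"].strip()}


def parse_policy(text):
    """Apply the first matching parser, then attach default information."""
    words = text.lower().split()
    for regex, fn in PARSERS:
        m = regex.search(text)
        if m:
            result = fn(m)
            for keyword, extra in DEFAULTS.items():
                if keyword in words:
                    result.update(extra)
            return result
    return None
```

Policy texts that match no registered form return `None`. These are the cases the path-inference or KBC approaches would need to handle.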

5. Consistency Detection

Common detection behaviors for natural language security policies include rule-based matching detection, feature-oriented test case classification detection, and real-time monitoring and detection. Their characteristics are as follows:
  • Rule-based matching detection: drawing on expert experience and knowledge, the policy objectives are transformed into concrete target configuration rule libraries. During detection, the actual configurations are checked against the rule libraries one by one.
  • Feature-oriented test case classification detection: reasonable test cases are predefined for the policy requirements to be detected and applied in real or simulated environments. The actual results of the test cases are then compared with the expected results.
  • Real-time monitoring and detection: real-time system configuration information and operational status are collected through product logs, traffic monitoring, and other methods. The impact of anomalous information is then compared with the policy requirements.
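For the first of these behaviors, the following is a minimal sketch of matching actual configurations against a rule library one by one. It assumes both are flat dictionaries that map setting names to values. The setting names used in the example are hypothetical.

```python
def rule_match(rule_library, actual_config):
    """Check actual settings against the expected rule library one by one.

    Returns (setting, expected, actual) for every mismatch; an empty list
    means the configuration is consistent with the rules.
    """
    violations = []
    for setting, expected in rule_library.items():
        actual = actual_config.get(setting)  # None if the setting is missing
        if actual != expected:
            violations.append((setting, expected, actual))
    return violations
```

For example, with `rules = {"telnet": "disabled", "ssh": "enabled"}`, a device reporting `{"telnet": "enabled", "ssh": "enabled"}` yields one violation for `telnet`.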
In automated knowledge graph-based policy consistency detection, Cypher statements can be used for three tasks: determining whether a rule set is satisfied, finding test case paths, and comparing monitoring data in real time. The response is obtained by judging whether the results of these Cypher statements match the expected results. This approach links security policies quickly and accurately to detection behaviors and obtains detection results through highly formalized judgment conditions.
The automated process for single policy consistency detection is as follows:
1. Query the logical dependency layer graph for the policy text to obtain two things: the semantic parsing functions related to the detection target and detection criteria, $x_{\text{unknown}} = \text{Semantic-Parsing}(x_{\text{known}})$, and the default information $Def = \{y_1, y_2, \ldots, y_m\}$.
2. Locate the event sequence and response of the target policy in the detection logic layer graph by the policy sequence number.
3. Obtain the parameters $x_1, x_2, \ldots, x_n$ of each event $e(x_1, x_2, \ldots, x_n)$ in the event sequence. The parameters come from three sources:
  • Values read directly from the network information layer graph or the security knowledge layer graph. These usually appear as constants.
  • Values computed from the network information layer graph or the security knowledge layer graph using the semantic parsing functions obtained in Step 1, i.e., $x_{\text{unknown}} = \text{Semantic-Parsing}(x_{\text{known}})$. Here, $x_{\text{known}}$ is data that can be queried directly from those graphs, and Semantic-Parsing() often relies on query computations in the security knowledge layer graph.
  • The default information obtained in Step 1, i.e., $x_{\text{unknown}} \in Def$.
4. Following the operator logic of the event sequence, execute the Cypher statements of the events in Neo4j in order. Then evaluate the judgment conditions of the response to complete the policy consistency detection.
The algorithm representation of the above process is shown in Algorithm 1.
Algorithm 1: Algorithm for consistency detection of a single policy.
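Algorithm 1 is given as a figure, so the following Python sketch restates Steps 1 to 4 under simplifying assumptions. It is not the authors' implementation. Steps 1 and 2 are represented by the arguments passed in: the default information, the semantic parsing functions, and the located policy. Graph queries are replaced by dictionary lookups and plain Python functions, and events are executed in list order. All names are hypothetical.

```python
def detect_single_policy(policy, graph, parsers, defaults):
    """Sketch of single-policy consistency detection with in-memory stand-ins.

    policy:   {"events": [(query_fn, param_specs), ...],
               "response": fn(list_of_event_results) -> result}
    graph:    facts that can be read directly (network/security layers)
    parsers:  name -> semantic parsing function(graph, known_value)
    defaults: default information Def for this policy text
    Each param spec is ("const", key), ("parse", parser_name, key) or
    ("default", key).
    """
    results = []
    for query_fn, specs in policy["events"]:  # Step 4: execute events in order
        args = []
        for spec in specs:  # Step 3: resolve parameters from three sources
            kind = spec[0]
            if kind == "const":  # read directly from the graph
                args.append(graph[spec[1]])
            elif kind == "parse":  # x_unknown = Semantic-Parsing(x_known)
                args.append(parsers[spec[1]](graph, graph[spec[2]]))
            elif kind == "default":  # x_unknown taken from Def
                args.append(defaults[spec[1]])
            else:
                raise ValueError(f"unknown parameter source: {kind}")
        results.append(query_fn(*args))
    return policy["response"](results)  # judgment condition of the response
```

A temperature policy in the style of the later examples can be expressed as a single event whose parameters come from a parsing function (the device's operating range) and a value supplied as `T`. The response returns "yes" when the event holds.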
For combination policies, the sub-policies are detected first. Their detection results are then combined according to the logical relationships between the sub-policies in the detection logic layer graph, which gives the detection result of the combination policy.
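The combination rule itself is not spelled out here. As an assumption for illustration, the sketch below treats a combination policy as a tree. Each inner node requires either all of its sub-policies (“all”) or at least one of them (“any”) to be satisfied. The operator names are hypothetical.

```python
def combine(node, sub_results):
    """Fold sub-policy results into a combination-policy result.

    node is a sub-policy id (str) or a tuple (op, [children]) with op in
    {"all", "any"}; sub_results maps sub-policy ids to True/False.
    """
    if isinstance(node, str):
        return sub_results[node]
    op, children = node
    values = [combine(child, sub_results) for child in children]
    if op == "all":
        return all(values)
    if op == "any":
        return any(values)
    raise ValueError(f"unknown combination operator: {op}")
```

For example, `("all", ["P1", ("any", ["P2", "P3"])])` is satisfied when P1 holds and at least one of P2 and P3 holds.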
Here is a simple example of policy detection logic. The text description of the policy is ‘All entrance and exit cameras in the computer room must have night vision function’.
First, we enter the necessary information into the knowledge graph. For physical security policies, the network information layer graph focuses on the physical space covered by the policy and the network hardware in that space. We therefore collected the actual situation and added the nodes and relationships shown in Figure 3 to the network information layer graph.
For hardware-related physical security policies, the security knowledge layer graph focuses on hardware information related to the policy. This information comes from hardware manufacturers' operating manuals or from existing knowledge graphs in related fields. We therefore collected this information and added the nodes and relationships shown in Figure 4 to the security knowledge layer graph.
The logical dependency layer graph focuses on semantic parsing functions and default information. For this policy, the semantic parsing function must interpret the natural language structure “hardware x should have function y”. Such a function can be written as follows. If function y of hardware x is not recorded in the network information layer graph but function y is present in the security knowledge layer graph, the function finds the functions related to y in the security knowledge layer graph through the ‘include’ relationship. Applied here, it finds the three sub-functions of the “night vision function” in the security knowledge layer graph: “visible night vision”, “infrared night vision”, and “thermal imaging night vision”. The corresponding Cypher-like statement is as follows:
match (n:surveillance_camera_projects_knowledge)-[:include]->(m:surveillance_camera_projects_knowledge)
  where n.name="exfunction_night_vision"
  return m;
The detection logic layer graph focuses on constructing the formal language. It executes the semantic parsing function stored in the logical dependency layer graph to find the three sub-functions of the “night vision function”. Formalizing the policy text yields the following event sequence and response measure:
Event1: match (n:door)-[:locate_door_camera]->(a:camera)
  where coalesce(a.exfunction_visible_night_vision, 0) + coalesce(a.exfunction_infrared_night_vision, 0) + coalesce(a.exfunction_thermal_imaging_night_vision, 0) >= 1
  return a; // each exfunction_* property is assumed to be a 0/1 flag
Event2: match (n:door)-[:locate_door_camera]->(b:camera) return b;
Event Sequences: Event1*Event2;
Response: if a==b return 1 else return 0;
By following the rules to execute the event sequence and response measures, consistency detection can be completed.
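To make the night-vision example concrete without a database, the sketch below mirrors the semantic parsing step, Event1, Event2, and the response with in-memory Python dictionaries. The door and camera data are invented for illustration and do not reproduce Figures 3 and 4. As in the query above, each sub-function property is assumed to be a 0/1 flag.

```python
# Security knowledge layer (illustrative): sub-functions reachable via "include".
INCLUDE = {
    "exfunction_night_vision": [
        "exfunction_visible_night_vision",
        "exfunction_infrared_night_vision",
        "exfunction_thermal_imaging_night_vision",
    ]
}

# Network information layer (illustrative): door -> cameras, camera -> flags.
DOOR_CAMERAS = {"door1": ["cam1", "cam2"], "door2": ["cam3"]}
CAMERA_FLAGS = {
    "cam1": {"exfunction_infrared_night_vision": 1},
    "cam2": {"exfunction_thermal_imaging_night_vision": 1},
    "cam3": {},
}


def sub_functions(function):
    # Semantic parsing step: expand a function into its "include" children.
    return INCLUDE.get(function, [])


def check_night_vision():
    subs = sub_functions("exfunction_night_vision")
    # Event2: every camera located at an entrance or exit door.
    b = {cam for cams in DOOR_CAMERAS.values() for cam in cams}
    # Event1: door cameras whose sub-function flags sum to at least 1.
    a = {cam for cam in b if sum(CAMERA_FLAGS[cam].get(f, 0) for f in subs) >= 1}
    # Response: 1 if the two result sets coincide, else 0; also list offenders.
    return (1 if a == b else 0), sorted(b - a)
```

With this data, `check_night_vision()` reports non-compliance and names `cam3`. Once `cam3` gains any of the three night-vision flags, the result becomes `(1, [])`.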
The following examples of common security policy descriptions, with their corresponding event sequences and responses, demonstrate the feasibility of representing policies with Cypher-like statements.
For the policy “Device A has disabled protocol B”, the corresponding Cypher statement is as follows:
Event1: match (n:equipment)-[:Disable_Protocol]->(m:Protocol) where n.name="A" AND m.name="B" return m; // Network Information Layer Graph
Event Sequences: Event1
Response: if Event1.result == NULL return "no" else return "yes" // Rule-based Matching Detection
For the policy “Device A needs to complete action C at every time interval T”, the corresponding Cypher statement is as follows:
Event1: match (n:equipment)-[:execute]->(m:action) where n.name="A" AND m.name="C" return m.Log_location; // Network Information Layer Graph
Event Sequences: Event1;
Response: Time_interval_query_script(Event1.result, T) // Feature-Oriented Test Case Classification Detection
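`Time_interval_query_script` is not defined further here. One plausible reading, sketched below as an assumption, is that the action timestamps read from the log location returned by Event1 must leave no gap longer than T within the period under review. The function name and the `start`/`end` bounds are hypothetical.

```python
from datetime import datetime, timedelta


def time_interval_check(timestamps, interval, start, end):
    """Return "yes" if action C was logged with no gap longer than
    `interval` between `start` and `end`, otherwise "no".

    A hypothetical stand-in for Time_interval_query_script; the timestamps
    are assumed to have been read from the log location found by Event1.
    """
    times = sorted(t for t in timestamps if start <= t <= end)
    previous = start
    for t in times + [end]:
        if t - previous > interval:
            return "no"  # a gap longer than T without the action
        previous = t
    return "yes"
```

For a three-hour window with `interval=timedelta(hours=1)`, actions logged at 00:50, 01:40, and 02:30 pass. Actions at 00:30 and 02:00 fail because of the 90-minute gap.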
For the policy “Device A can support function E by monitoring status information D”, the corresponding Cypher statement is as follows:
Event1: match (n:equipment)-[:monitor_information]->(m:information) where n.name="A" AND m.name="D" return m; // Network Information Layer Graph
Event2: match (n:equipment)-[:have_function]->(m:function) where n.name="A" AND m.name="E" return m; // Network Information Layer Graph
Event3: match (n:function)-[:supported]->(m:information) where n.name="E" AND m.name="D" return m; // Security Knowledge Layer Graph
Event Sequences: Event1*Event2*Event3
Response: if Event1.result != NULL AND Event2.result != NULL AND Event3.result != NULL return "yes" else return "no" // Rule-based Matching Detection
For the policy “Device A can work normally under the condition of temperature T”, the corresponding Cypher statement is as follows:
Event1: match (n:equipment)-[:require]->(m:use_condition) where n.name="A" return m ; // Security Knowledge Layer Graph
Event Sequences: Event1
Response: if Event1.result.min_temperature <= T <= Event1.result.max_temperature return "yes" else return "no" // Rule-based Matching Detection
For the policy “Personnel H cannot access database I”, the corresponding Cypher statement is as follows:
Event1: match (n:personnel)-[:allow_access_to]->(m:database) where n.name="H" AND m.name="I" return m ; // Network Information Layer Graph
Event Sequences: Event1
Response: if Event1.result == NULL return "yes" else return "no" // Rule-based Matching Detection
For the policy “Personnel H requires credential J to access database I”, the corresponding Cypher statement is as follows:
Event1: match (n:personnel)-[:allow_access_to]->(m:database) where n.name="H" AND m.name="I" return m ; // Network Information Layer Graph
Event2: match (n:personnel)-[:have_credential]->(L:credential)-[:allow_access_to]->(m:database) where n.name="H" AND m.name="I" AND L.name="J" return m ; // Network Information Layer Graph
Event Sequences: Event1*Event2
Response: if Event1.result == NULL AND Event2.result != NULL return "yes" else return "no" // Rule-based Matching Detection
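The pattern shared by these examples (run the events, then map their results to a verdict) can be sketched in Python. This is a minimal illustration assuming the graph is held in memory as a set of (source, relationship, target) edges; the helper names `match_edge`, `match_path`, and `credential_policy` are hypothetical stand-ins for queries against a graph database, and the sketch covers only the last policy above.

```python
from typing import Optional

# Hypothetical stand-in for the network information layer graph:
# each edge is (source node name, relationship type, target node name).
Edge = tuple[str, str, str]


def match_edge(edges: set[Edge], src: str, rel: str, dst: str) -> Optional[str]:
    """Like `match (n)-[:rel]->(m) where n.name=src AND m.name=dst return m`."""
    return dst if (src, rel, dst) in edges else None


def match_path(edges: set[Edge], src: str, rel1: str, mid: str,
               rel2: str, dst: str) -> Optional[str]:
    """Like the two-hop pattern (n)-[:rel1]->(L)-[:rel2]->(m) with fixed names."""
    if (src, rel1, mid) in edges and (mid, rel2, dst) in edges:
        return dst
    return None


def credential_policy(edges: set[Edge], person: str, credential: str,
                      database: str) -> str:
    """Policy: `person` requires `credential` to access `database`."""
    # Event1: is there a direct access grant that bypasses the credential?
    event1 = match_edge(edges, person, "allow_access_to", database)
    # Event2: is access granted through the required credential?
    event2 = match_path(edges, person, "have_credential", credential,
                        "allow_access_to", database)
    # Response: consistent only if there is no direct grant and the credential path exists.
    return "yes" if event1 is None and event2 is not None else "no"


edges = {("H", "have_credential", "J"), ("J", "allow_access_to", "I")}
print(credential_policy(edges, "H", "J", "I"))                                    # yes
print(credential_policy(edges | {("H", "allow_access_to", "I")}, "H", "J", "I"))  # no
```

Keeping each event as a separate lookup mirrors the Event/Response split above: the events only report what exists in the graph, and all policy semantics live in the response.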

6. Other Applications

For policy consistency detection, the knowledge graph constructed in Section 4 not only performs automated policy consistency detection but also supports other applications: reasonable policy correction and use case design based on consistency detection, and intelligent inference based on the knowledge graph for reasoning and judgment.

6.1. Reasonable Policy Correction and Use Case Design Based on Consistency Detection

Firstly, the graph structure of the knowledge graph makes it easy to record node paths intuitively, so the automated process of policy consistency detection can itself be recorded in the graph. Management personnel can then see the entire logical path of the detection process and adjust unreasonable aspects of the detection logic layer graph and the logical dependency layer graph.
Secondly, assume that the semantic parsing functions are uniformly defined and that the security knowledge layer graph contains a large amount of high-quality information. If repeated modifications to the relevant information in the detection logic layer graph yield inconsistent consistency detection results for the same policy, the policy text itself may be ambiguous. If the policy cannot pass consistency detection no matter how the relevant information in the network information layer graph is modified to reflect the actual situation, the policy itself may be unreasonable.
Beyond the irrationality of an individual policy, conflicts may also exist between policies; these can be divided into shadowing, redundancy, exception, and correlation [22]. Conflict mining based on knowledge graphs differs from conflict mining based on policy trees in that it performs consistency detection and conflict mining at the same time. If two policies have the same detection path and the same detection result in the graph for all original log data, there is shadowing or redundancy between them. If two policies have the same detection path for all original log data but always opposite detection results, there is an exception between them. If, for all original log data, the detection path of one policy always includes that of the other and the detection results are always opposite, there is a correlation between them.
Finally, consistency checks of security policies for running operational services often require well-designed test cases. The security knowledge layer graph can help determine whether given test cases are effective, and can even help generate good test cases; generating test cases is the reverse of determining whether they are effective. Here, we describe only the process of determining whether test cases are effective:
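The conflict rules in the preceding paragraph can be expressed as a small classifier. This is a sketch under two assumptions not stated in the text: each policy's behaviour on each log is summarized as a (detection path, boolean result) pair, and "inclusion" between paths means that one path's node set contains the other's. The names `Trace` and `classify_conflict` are hypothetical.

```python
from typing import Optional, Sequence

# (detection path as a tuple of node names, detection result) for one log record.
Trace = tuple[tuple[str, ...], bool]


def _nested(p: tuple[str, ...], q: tuple[str, ...]) -> bool:
    """Assumed meaning of path inclusion: one path's node set contains the other's."""
    a, b = set(p), set(q)
    return a <= b or b <= a


def classify_conflict(traces_a: Sequence[Trace],
                      traces_b: Sequence[Trace]) -> Optional[str]:
    """traces_a[i] and traces_b[i] describe how policies A and B handled log i."""
    if len(traces_a) != len(traces_b) or not traces_a:
        raise ValueError("need one trace per log for both policies")
    pairs = list(zip(traces_a, traces_b))
    same_path = all(pa == pb for (pa, _), (pb, _) in pairs)
    same_result = all(ra == rb for (_, ra), (_, rb) in pairs)
    opposite_result = all(ra != rb for (_, ra), (_, rb) in pairs)
    nested_path = all(_nested(pa, pb) for (pa, _), (pb, _) in pairs)
    if same_path and same_result:
        return "shadowing or redundancy"
    if same_path and opposite_result:
        return "exception"
    if nested_path and opposite_result:   # equal paths were already handled above
        return "correlation"
    return None


a = [(("A", "B"), True), (("A", "C"), False)]
print(classify_conflict(a, a))                                              # shadowing or redundancy
print(classify_conflict(a, [(("A", "B"), False), (("A", "C"), True)]))      # exception
print(classify_conflict(a, [(("A", "B", "D"), False), (("A", "C", "E"), True)]))  # correlation
```

Checking the "same path" cases before the "nested path" case matters, because identical paths also satisfy the nesting test.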
1.
Run the PoC or exploit corresponding to the attack feature events to generate traffic in the network environment, then traverse the network information layer graph and match the attack feature events.
2.
Extract features of the attack events as follows: (1) separate malicious nodes from other nodes; (2) extract all possible types of basic network events; (3) calculate the farthest distance from a malicious node to other nodes; (4) analyze the cycle information and temporal information of events in the graph; (5) extract cyclic event sequence information and general event sequence information separately; (6) encapsulate the above information (including the maximum distance, starting node, key event, event chain path, event chain loop logic, and other information) as five-tuple data and store it in the message queue.
3.
Perform feature matching in the security knowledge layer graph and mine attack events to obtain single-step attack events; aggregating single-step attack events in the graph yields multi-step attack events. Attack event mining proceeds as follows: (1) extract the five-tuple data from the message queue; (2) set the time window and other search parameters; (3) search for suspicious nodes; (4) identify malicious nodes and record attack events.
4.
Correlate multi-step attack events to restore attack scenarios through the following steps: (1) construct a time relationship graph of attack events; (2) determine the correlation between attack events based on similarity; (3) starting from each node, search the time relationship graph for possible multi-step attacks using depth-first search; (4) design rules to deduplicate the discovered multi-step attacks.
5.
Map the restored attack scenarios to the corresponding policies.
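Steps (1) to (4) of the multi-step attack correlation can be sketched as follows. The sketch assumes that each attack event carries a timestamp and a set of observed features, that "similarity" is the Jaccard index of those feature sets compared against a threshold within a time window, and that deduplication drops any chain contained as a contiguous run in a longer chain; the paper does not specify these choices, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackEvent:
    eid: str
    time: float
    features: frozenset  # e.g. hosts, ports, techniques observed (assumed)


def jaccard(a: frozenset, b: frozenset) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def build_time_graph(events, window: float, threshold: float) -> dict[str, list[str]]:
    """(1)+(2): edge u->v if v follows u within `window` and is similar enough."""
    ordered = sorted(events, key=lambda e: e.time)
    graph = {e.eid: [] for e in ordered}
    for i, u in enumerate(ordered):
        for v in ordered[i + 1:]:
            if 0 < v.time - u.time <= window and jaccard(u.features, v.features) >= threshold:
                graph[u.eid].append(v.eid)
    return graph  # edges only go forward in time, so the graph is acyclic


def find_chains(graph: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """(3): depth-first search from every node, recording maximal paths of length >= 2."""
    chains = []

    def dfs(node: str, path: list[str]) -> None:
        if not graph[node]:
            if len(path) >= 2:
                chains.append(tuple(path))
            return
        for nxt in graph[node]:
            dfs(nxt, path + [nxt])

    for start in graph:
        dfs(start, [start])
    return chains


def deduplicate(chains: list[tuple[str, ...]]) -> list[tuple[str, ...]]:
    """(4): assumed rule, drop a chain that is a contiguous run inside a longer kept chain."""
    def inside(short, long):
        n = len(short)
        return any(long[i:i + n] == short for i in range(len(long) - n + 1))

    kept = []
    for c in sorted(set(chains), key=len, reverse=True):
        if not any(inside(c, k) for k in kept):
            kept.append(c)
    return kept


events = [
    AttackEvent("A", 0, frozenset({"10.0.0.5", "port:22"})),
    AttackEvent("B", 5, frozenset({"10.0.0.5", "port:22", "bruteforce"})),
    AttackEvent("C", 9, frozenset({"10.0.0.5", "bruteforce", "shell"})),
    AttackEvent("D", 100, frozenset({"10.0.0.9", "dns"})),
]
graph = build_time_graph(events, window=10, threshold=0.3)
print(deduplicate(find_chains(graph)))  # [('A', 'B', 'C')]
```

Requiring a strictly positive time gap keeps the time relationship graph acyclic, which guarantees that the depth-first search terminates.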

6.2. Intelligent Inference Based on Knowledge Graph for Reasoning Judgement

The intelligent inference capability of the extended knowledge graph depends on the algorithms applied to it. Some applicable algorithms and their roles in the knowledge graph are as follows.
The centrality algorithm is applied to the detection logic layer graph to measure the importance of each policy, because each pair of nodes associated with a policy number in that graph corresponds to a policy. Using degree centrality, betweenness centrality, and closeness centrality individually or in combination, the centrality of each node in the graph can be calculated. The larger a node's centrality value, the more strongly it is connected to other nodes; the corresponding policy is then more widely connected across all of our policies and more likely to be a sub-policy of a composite policy or a pre-policy of many other policies. The importance level can be annotated as an entity attribute in the detection logic layer graph, so that possible sub-policies or pre-policies are examined first when clarifying the logical relationships among the sub-policies of a composite policy. In practice, batch detection of a large policy set can save considerable time by prioritizing the detection of highly important policies.
The graph propagation algorithm is applied to the detection logic layer graph and the network information layer graph to detect the distribution characteristics of policies and network topology. Graph propagation algorithms do not follow one fixed paradigm or goal; in practice, a label propagation algorithm that aims to discover overlapping communities is used. This algorithm can find densely connected groups of nodes (entities), i.e., communities, in the detection logic layer graph and the network information layer graph: nodes within a community are highly interconnected, but their connections to the rest of the graph are relatively sparse. For tasks that view a community from outside, its nodes can be treated as largely similar; for tasks focused on the community itself, a relationship between the community and outside nodes can be extended to all of its members. In the detection logic layer graph, graph propagation can discover communities of detection subjects, i.e., classes of detection subjects with similar detection targets whose detection processes are therefore similar. A few detection cases, invoked with different parameters or configurations, can then write the detection logic and run the detections for such a class in batches. Similarly, communities of detection targets, i.e., classes of detection targets with similar detection subjects, can be discovered and detected in batches. Adding the class numbers of batch-detectable groups to the entity attributes of the detection logic layer graph provides long-term memory. The application in the network information layer graph is similar: similar attack methods, device models with the same characteristics, vulnerabilities with the same principle, and so on can all be found with this graph algorithm.
By introducing standards into the security knowledge layer graph and the network information layer graph, the rule-based inference algorithm AMIE can be applied to assess network situation risk. This assessment compares the vulnerability set, attack feature set, security event set, and other existing sets in the security knowledge layer graph in order to predict which network nodes or data flows in the network information layer graph may face security risks in the future. If these potentially at-risk nodes or data flows fall within the scope of our policies, labeling them before detection and examining them with priority allows the risks to be prevented in advance. Network situation risk assessment can also reveal conflicts between the security knowledge layer graph and the detection logic layer graph; after manual review, one can decide whether to correct erroneous knowledge in the security knowledge layer graph or revise unreasonable policies in the detection logic layer graph.
Using the hybrid inference model ConMask for open-world knowledge graph completion improves the security knowledge layer graph and the detection logic layer graph, and enables the knowledge graph to handle new entities from outside that it has not seen before.
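A pure-Python sketch of the three centrality indicators on a toy detection logic layer graph follows. It assumes an unweighted, undirected adjacency list in which each node stands for a policy, and it ranks nodes by the plain average of degree, closeness, and normalised betweenness centrality; that averaging rule is an illustrative assumption, not the paper's weighting. Betweenness uses Brandes' algorithm.

```python
from collections import deque


def degree_centrality(adj):
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def _bfs_dist(adj, s):
    dist, q = {s: 0}, deque([s])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return dist


def closeness_centrality(adj):
    """(reachable nodes - 1) / sum of shortest-path distances; 0 for isolated nodes."""
    out = {}
    for v in adj:
        d = _bfs_dist(adj, v)
        total = sum(d.values())
        out[v] = (len(d) - 1) / total if total else 0.0
    return out


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        q = deque([s])
        while q:                                  # BFS counting shortest paths
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                              # back-propagate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}      # each undirected pair counted twice


def rank_policies(adj):
    n = len(adj)
    deg, clo, btw = degree_centrality(adj), closeness_centrality(adj), betweenness_centrality(adj)
    scale = (n - 1) * (n - 2) / 2 or 1            # normalise betweenness to [0, 1]
    score = {v: (deg[v] + clo[v] + btw[v] / scale) / 3 for v in adj}
    return sorted(adj, key=lambda v: (-score[v], v))


adj = {"P1": ["P2", "P3", "P4"], "P2": ["P1"], "P3": ["P1"], "P4": ["P1", "P5"], "P5": ["P4"]}
print(rank_policies(adj))  # ['P1', 'P4', 'P2', 'P3', 'P5']
```

In this toy graph P1 ranks first and P4 second, matching the intuition that they are the candidates most likely to act as pre-policies.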
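The following sketch shows the basic mechanism of label propagation. Note that it implements the simple non-overlapping variant (each node adopts the most frequent label among its neighbours until nothing changes), not the overlapping-community variant mentioned above, and that it breaks ties deterministically (keep the current label if it is among the most frequent, otherwise take the largest label) so the result is reproducible; both choices are assumptions for illustration.

```python
from collections import Counter


def label_propagation(adj, max_iter=100):
    """Asynchronous label propagation with deterministic node order and tie-breaking."""
    labels = {v: v for v in adj}          # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue                  # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            new = labels[v] if labels[v] in candidates else max(candidates)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:                   # converged: no node changed its label
            break
    return labels


def communities(labels):
    groups = {}
    for v, lab in labels.items():
        groups.setdefault(lab, set()).add(v)
    return sorted(groups.values(), key=sorted)


# Two tightly connected triangles joined by a single bridge edge c-d.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
       "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"]}
print(communities(label_propagation(adj)))  # two groups: {a, b, c} and {d, e, f}
```

Because label propagation is sensitive to update order and tie-breaking, practical use usually repeats the run or uses an overlapping variant, as the text suggests.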
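AMIE scores candidate Horn rules with support, head coverage, standard confidence, and PCA confidence. The sketch below computes these measures for one fixed two-atom chain rule over a toy triple set; it does not reproduce AMIE's rule search. The relation names (`runs_service`, `has_vulnerability`, `exposed_to`) and the placeholder vulnerability identifiers are hypothetical. Body matches that lack a head fact are returned by `predict` as candidate risks.

```python
Triple = tuple[str, str, str]  # (subject, relation, object)


def body_pairs(kb: set[Triple], r1: str, r2: str) -> set[tuple[str, str]]:
    """All (x, y) with r1(x, z) and r2(z, y) for some z."""
    r2_objects: dict[str, set[str]] = {}
    for s, r, o in kb:
        if r == r2:
            r2_objects.setdefault(s, set()).add(o)
    return {(x, y) for x, r, z in kb if r == r1 for y in r2_objects.get(z, ())}


def rule_measures(kb: set[Triple], r1: str, r2: str, head: str) -> dict[str, float]:
    """AMIE-style quality measures for r1(x, z) AND r2(z, y) => head(x, y)."""
    body = body_pairs(kb, r1, r2)
    head_pairs = {(s, o) for s, r, o in kb if r == head}
    support = len(body & head_pairs)
    # PCA: only count body matches whose subject already has some head fact.
    head_subjects = {s for s, _ in head_pairs}
    pca_body = {(x, y) for x, y in body if x in head_subjects}
    return {
        "support": support,
        "head_coverage": support / len(head_pairs) if head_pairs else 0.0,
        "std_confidence": support / len(body) if body else 0.0,
        "pca_confidence": support / len(pca_body) if pca_body else 0.0,
    }


def predict(kb: set[Triple], r1: str, r2: str, head: str) -> set[tuple[str, str]]:
    """Facts the rule infers that are not yet in the graph (candidate risks)."""
    return body_pairs(kb, r1, r2) - {(s, o) for s, r, o in kb if r == head}


kb = {
    ("host1", "runs_service", "openssh_7.2"), ("host2", "runs_service", "openssh_7.2"),
    ("host3", "runs_service", "nginx_1.10"),
    ("openssh_7.2", "has_vulnerability", "CVE-A"), ("nginx_1.10", "has_vulnerability", "CVE-B"),
    ("host1", "exposed_to", "CVE-A"), ("host3", "exposed_to", "CVE-C"),
}
args = ("runs_service", "has_vulnerability", "exposed_to")
print(rule_measures(kb, *args))
# {'support': 1, 'head_coverage': 0.5, 'std_confidence': 0.3333333333333333, 'pca_confidence': 0.5}
print(predict(kb, *args))  # {('host2', 'CVE-A'), ('host3', 'CVE-B')} (order may vary)
```

PCA confidence does not penalise the rule for host2, which has no recorded exposure at all, which is why it is higher than the standard confidence here.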

7. Experiment and Analysis

To verify the effectiveness of the proposed policy consistency detection method, simulation experiments were conducted on the university dataset [23]. This dataset is commonly used to generate Attribute-Based Access Control (ABAC) policies; it covers 43 categories and contains 8000 access control logs with an allow decision and 8000 with a deny decision. Using this dataset for the experiments has two advantages:
  • The four-layer knowledge graph is straightforward to generate. The log data used to generate the ABAC policy dataset are highly formalized, so a simple regular expression method can extract the policy entities and attributes of these logs into the knowledge graph, forming a simple network information layer graph. Because the access control problem is relatively simple, no logical dependency layer graph needs to be designed in this experiment. The security knowledge layer graph only needs to record the attribute information of the 43 categories involved in the policy entities, without regard to the specific meaning of the attributes. Because ABAC policies have the structure of decision trees, automated Python scripts can reorganize and rewrite their judgment logic into a BMSL-like language with Cypher syntax, quickly generating detection logic layer graphs.
  • This dataset is a classic dataset for ABAC policy generation problems and suits the various algorithms that generate ABAC policies automatically. These algorithms can therefore be used in this experiment to generate a variety of ABAC policies with different attribute scales, policy scales, and access control sets for attribute ratings, which makes it convenient to design various simulation experiments.
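The text mentions Python scripts that turn the decision-tree structure of ABAC policies into Cypher-like detection logic. The sketch below is one possible shape for such a script, not the authors' code: it assumes a hypothetical `key=value;` log layout for the regular-expression step, a nested-dictionary decision tree with `"permit"`/`"deny"` leaves, and a hypothetical `log` node label.

```python
import re

# Hypothetical log layout "key=value;key=value;..."; the real dataset format may differ.
LOG_FIELD = re.compile(r"(\w+)=([^;]+)")


def parse_log(line: str) -> dict[str, str]:
    """Regular-expression extraction of attribute/value pairs from one log line."""
    return dict(LOG_FIELD.findall(line))


def permit_paths(tree, path=()):
    """Yield the (attribute, value) conditions of every root-to-leaf path ending in 'permit'."""
    if isinstance(tree, str):          # leaf: "permit" or "deny"
        if tree == "permit":
            yield path
        return
    for value in sorted(tree["branches"]):
        yield from permit_paths(tree["branches"][value], path + ((tree["attr"], value),))


def to_cypher_like(tree, label: str = "log") -> list[str]:
    """Rewrite each permit path as an Event / Event Sequences / Response triple."""
    rules = []
    for i, conds in enumerate(permit_paths(tree), start=1):
        where = " AND ".join(f'n.{attr}="{value}"' for attr, value in conds)
        where_clause = f" where {where}" if where else ""
        rules.append(
            f"Event{i}: match (n:{label}){where_clause} return n ;\n"
            f"Event Sequences: Event{i}\n"
            f'Response: if Event{i}.result != NULL return "permit" else return "deny"'
        )
    return rules


tree = {"attr": "position", "branches": {
    "faculty": {"attr": "action", "branches": {"read": "permit", "write": "deny"}},
    "student": "deny"}}
print(parse_log("subject=alice;position=faculty;action=read;decision=permit"))
print(to_cypher_like(tree)[0])
```

Each root-to-leaf "permit" path becomes one rule, so the number of generated rules grows with the number of permit leaves rather than with the number of logs.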
This article designs two experiments to evaluate the effectiveness of policy consistency detection, namely the attribute scale and detection rate experiment, and the policy attribute importance experiment.

7.1. Attribute Scale and Detection Rate Experiment

For the university dataset, a policy attribute selector is constructed using a recursive attribute elimination method based on machine learning classifiers. With the same selector, a binary search-based policy generation optimization algorithm produces ABAC policy sets of different attribute sizes. For each of these policy sets, a general-purpose large language model generates the corresponding natural-language policy texts; the method proposed in this paper then performs policy consistency detection against the original dataset, and the policy detection rate is compared with the policy coverage rate. They are calculated as $S_D = A_{DN}(L) / A_N(L)$ and $S_C = A_{CN}(L) / A_N(L)$, where $S_D$ and $S_C$ denote the policy detection rate and the policy coverage rate, $L$ denotes the access control logs, $A_N(L)$ is the total number of log records across all categories, $A_{DN}(L)$ is the number of log records whose permission judgment, made with the consistency detection method of this paper after policy generation is complete, is correct, and $A_{CN}(L)$ is the number of log records whose permission judgment, made during the policy generation process, is correct. The smaller the difference between the policy detection rate and the policy coverage rate, the better the consistency detection performance of the method.
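The two rates can be computed directly from per-log judgments. This minimal sketch assumes that the permit/deny judgments produced during policy generation and those produced by the consistency detection method are available as lists aligned with the logged decisions; the function names and the toy data are hypothetical.

```python
from typing import Sequence


def correct_rate(decisions: Sequence[str], ground_truth: Sequence[str]) -> float:
    """Share of logs whose permit/deny judgment matches the logged decision."""
    if len(decisions) != len(ground_truth) or not ground_truth:
        raise ValueError("need one decision per log")
    return sum(d == t for d, t in zip(decisions, ground_truth)) / len(ground_truth)


def detection_vs_coverage(detected, generated, ground_truth):
    s_d = correct_rate(detected, ground_truth)   # S_D = A_DN(L) / A_N(L)
    s_c = correct_rate(generated, ground_truth)  # S_C = A_CN(L) / A_N(L)
    return s_d, s_c, abs(s_c - s_d)              # a smaller gap means better consistency


truth = ["permit", "deny", "permit", "deny"]
generated = ["permit", "deny", "permit", "permit"]  # judgments during policy generation
detected = ["permit", "deny", "deny", "permit"]     # judgments by consistency detection
print(detection_vs_coverage(detected, generated, truth))  # (0.5, 0.75, 0.25)
```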
As shown in Figure 5, the following experimental results can be summarized:
1.
As the attribute size increases, the policy detection rate and the policy coverage rate both rise markedly, quickly at first and then more slowly, which is consistent with the characteristics of access control problems. Once the attribute size reaches 14, both rates stabilize above 95%, and once it reaches 39, both reach 100%.
2.
As the attribute size increases, the gap between the policy detection rate and the policy coverage rate keeps shrinking, meaning that the accuracy of policy consistency detection improves. From an attribute size of 21 onward, the two rates begin to coincide.
The following conclusion can be drawn: for the consistency detection of access control policies generated from logs, this method achieves good results, and the larger the policy attribute size, the smaller the impact of noise in the raw log information on the detection rate and the better the consistency detection performance.
Figure 5. Variation of policy detection rate and policy coverage with attribute scale.

7.2. Experiment on the Importance of Policy Attributes

The previous experiment shows that larger policy attribute sizes yield higher detection and coverage rates, but once the attribute size is large, their growth slows markedly (and partly stops). In real network security scenarios, policy attribute sets that are too large also impose a heavy design and operation workload. This experiment proceeds as follows. First, the automatic generation method for the dataset is used to obtain the optimal policy set with the largest attribute size [24], and a corresponding detection logic layer graph is built. Next, a centrality algorithm is applied to the detection logic layer graph to rank the policy attributes by importance. Finally, the least important policy attributes are removed one at a time; after each removal, the policy detection rate $S_{significance}$ is calculated and compared with the detection rate $S_{best}$ of the best log-generated policy set of the same size. The smaller the difference between $S_{significance}$ and $S_{best}$, the more reasonable the importance ranking.
The centrality algorithm used in this experiment is the load centrality algorithm, and the design logic of the load centrality algorithm is very close to the idea of “sequentially removing the least important policy attributes” in this experiment [25,26].
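The removal procedure of this experiment can be sketched as a loop. The sketch assumes that the attributes are already ranked (for example, by load centrality) from most to least important, that a callable returns the detection rate for a given attribute subset, and that $S_{best}$ is available per attribute size; the function names and toy weights are hypothetical. The last line checks the relative difference reported in the results that follow.

```python
from typing import Callable, Sequence


def importance_ablation(ranked_attrs: Sequence[str],
                        detection_rate: Callable[[frozenset], float],
                        best_rate_by_size: dict[int, float]) -> list[tuple]:
    """Remove attributes least-important-first; compare S_significance with S_best.

    ranked_attrs is ordered from most to least important.
    """
    remaining = list(ranked_attrs)
    rows = []
    while len(remaining) > 1:
        remaining.pop()                            # drop the least important attribute left
        size = len(remaining)
        s_sig = detection_rate(frozenset(remaining))
        s_best = best_rate_by_size[size]
        rows.append((size, s_sig, s_best, abs(s_best - s_sig)))
    return rows


def relative_gap(gap: float, s_sig: float) -> float:
    """Difference between S_significance and S_best divided by S_significance."""
    return gap / s_sig


# Toy stand-in for re-running detection: each attribute contributes a fixed share.
WEIGHTS = {"a1": 6, "a2": 3, "a3": 1}


def toy_rate(attrs: frozenset) -> float:
    return sum(WEIGHTS[a] for a in attrs) / 10


rows = importance_ablation(["a1", "a2", "a3"], toy_rate, {2: 0.95, 1: 0.6})
print(rows[1])                                             # (1, 0.6, 0.6, 0.0)
print(round(100 * relative_gap(0.035625, 0.964375), 4))    # 3.6941
```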
The experimental results are shown in Figure 6, and can be summarized as follows:
1.
As more attributes are removed, the difference between $S_{significance}$ and $S_{best}$ first increases and then decreases. In this experiment, the difference peaks when 28 attributes have been removed (i.e., at an attribute size of 15), reaching 0.035625; at this point $S_{significance}$ is 0.964375, so the difference divided by $S_{significance}$ is 3.6941%.
2.
When very few or very many attributes have been removed, that is, when the attribute size is very large or very small, $S_{significance}$ is almost identical to $S_{best}$. In these cases, the load centrality algorithm efficiently identifies the least important or the most important policy attributes.
The following conclusion can be drawn: for access control policies generated from logs, the load centrality algorithm used in this method can assist in ranking the importance of policy attributes. In particular, when the attribute size is very large or very small, it accurately identifies the least important or the most important policy attributes.
Figure 6. Attribute importance experiments for policies.

8. Conclusions

In response to the current situation in which policy consistency detection remains at the level of expert systems and manual detection, this paper proposes a knowledge graph-based approach to network security policy consistency detection. The article focuses on how the knowledge graph for this method is constructed and used, and introduces several feasible extension functions. The method has the potential for automation, scalability, and systematization in the future.
The ontology model of security policy knowledge is a major constraint of this method. The establishment of the ontology model itself requires relevant experts to have long-term industry experience and a deep understanding of internal network structure, security monitoring projects, events, and business scenarios. In the short term, it can only rely on manual writing by experts and practitioners. It is hoped that in the future, there will be some intelligent or formal methods to assist in the process of establishing the ontology model.

Author Contributions

Conceptualization, Y.C. and T.H.; methodology, Y.C.; software, Y.C.; validation, Y.C., T.H., F.L., M.Y. and G.W.; formal analysis, Y.C.; investigation, Y.C., M.Y., T.Z. and G.W.; resources, Y.C., T.H., F.L., T.Z. and G.W.; data curation, Y.C. and T.Z.; writing—original draft, Y.C.; Writing—review and editing, Y.C. and T.H.; visualization, Y.C. and F.L.; supervision, T.H., F.L. and M.Y.; project administration, T.H., F.L., M.Y. and H.W.; funding acquisition, H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Presidential Foundation of CAEP (Grant No. YZJJZQ2023026).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hu, J.; Liang, X.; Bo, Y.; Xia, C. The consistency verification of Computer Network Defense Policy and measures. In Proceedings of the 2012 World Congress on Information and Communication Technologies, Trivandrum, India, 30 October–2 November 2012; pp. 1052–1055.
  2. Li, L.; Wu, S.; Huang, L.; Wang, W. Research on modeling for network security policy confliction based on network topology. In Proceedings of the 2017 14th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 15–17 December 2017; pp. 36–41.
  3. Yin, Y.; Tateiwa, Y.; Zhang, G.; Wang, Y. Consistency Decision Between IPv6 Firewall Policy and Security Policy. In Proceedings of the 2021 4th International Conference on Information Communication and Signal Processing (ICICSP), Shanghai, China, 24–26 September 2021; pp. 577–581.
  4. Ji, S.; Pan, S.; Cambria, E.; Marttinen, P.; Yu, P.S. A Survey on Knowledge Graphs: Representation, Acquisition, and Applications. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 494–514.
  5. Wang, Q.; Mao, Z.; Wang, B.; Guo, L. Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE Trans. Knowl. Data Eng. 2017, 29, 2724–2743.
  6. Feng, T.; Wu, Y.; Li, L. Research on Knowledge Graph Completion Based Upon Knowledge Graph Embedding. In Proceedings of the 2024 9th International Conference on Computer and Communication Systems (ICCCS), Xi’an, China, 19–22 April 2024; pp. 1335–1342.
  7. Michael, J.B.; Sibley, E.H.; Baum, R.F.; Li, F. On the axiomatization of security policy: Some tentative observations about logic representation. In Proceedings of the Sixth Working Conference on DATABASE SECURITY, Vancouver, BC, Canada, 19–21 August 1992; p. 401.
  8. Michael, J.B. A Formal Process for Testing the Consistency of Composed Security Policies; George Mason University: Fairfax, VA, USA, 1993.
  9. Cholvy, L.; Cuppens, F. Analyzing consistency of security policies. In Proceedings of the 1997 IEEE Symposium on Security and Privacy (Cat. No. 97CB36097), Oakland, CA, USA, 4–7 May 1997; pp. 103–112.
  10. Sekar, R.; Sekar, P.U. Synthesizing Fast Intrusion Prevention/Detection Systems from High-Level Specifications. In Proceedings of the 8th USENIX Security Symposium (USENIX Security 99), Washington, DC, USA, 23–26 August 1999.
  11. Uppuluri, P. Intrusion Detection/Prevention Using Behavior Specifications; State University of New York at Stony Brook: New York, NY, USA, 2003.
  12. Xin-Feng, L.; Jun, L.; Jun-Mo, X.; Hai-Gang, Z.; Yi-Dan, Z. Visual System of Formal Specification and Verification of Security Policy. Comput. Eng. 2008, 34, 162–164.
  13. Krombi, W.; Erradi, M.; Khoumsi, A. Automata-based approach to design and analyze security policies. In Proceedings of the 2014 Twelfth Annual International Conference on Privacy, Security and Trust, Toronto, ON, Canada, 23–24 July 2014; pp. 306–313.
  14. Shen, L.; Wang, Z.; Zhang, X.; Gu, J. Study on the policy conflict detection in the security management model. In Proceedings of the 2015 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Ningbo, China, 19–22 September 2015; pp. 1–5.
  15. Liu, A.; Du, X.; Wang, N.; Wang, X.; Wu, X.; Zhou, J. Implement security analysis of access control policy based on constraint by SMT. In Proceedings of the 2022 IEEE 5th International Conference on Electronics Technology (ICET), Chengdu, China, 13–16 May 2022; pp. 1043–1049.
  16. Singhal, A. Introducing the Knowledge Graph: Things, Not Strings; Google 2012. Available online: https://blog.google/products/search/introducing-knowledge-graph-things-not/ (accessed on 15 September 2024).
  17. Hogan, A.; Blomqvist, E.; Cochez, M.; d’Amato, C.; Melo, G.D.; Gutierrez, C.; Kirrane, S.; Gayo, J.E.L.; Navigli, R.; Neumaier, S.; et al. Knowledge graphs. ACM Comput. Surv. (CSUR) 2021, 54, 1–37.
  18. Noel, S.; Harley, E.; Tam, K.H.; Limiero, M.; Share, M. CyGraph: Graph-based analytics and visualization for cybersecurity. In Handbook of Statistics; Elsevier: Amsterdam, The Netherlands, 2016; Volume 35, pp. 117–167.
  19. Aksu, M.U.; Bicakci, K.; Dilek, M.H.; Ozbayoglu, A.M.; Tatli, E.ı. Automated generation of attack graphs using NVD. In Proceedings of the Eighth ACM Conference on Data and Application Security and Privacy, Tempe, AZ, USA, 19–21 March 2018; pp. 135–142.
  20. Tikhomirov, M.; Loukachevitch, N.; Sirotina, A.; Dobrov, B. Using bert and augmentation in named entity recognition for cybersecurity domain. In Proceedings of the Natural Language Processing and Information Systems: 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020, Saarbrücken, Germany, 24–26 June 2020; Proceedings 25. Springer: Berlin/Heidelberg, Germany, 2020; pp. 16–24.
  21. Jones, C.L.; Bridges, R.A.; Huffer, K.M.; Goodall, J.R. Towards a relation extraction framework for cyber-security concepts. In Proceedings of the 10th Annual Cyber and Information Security Research Conference, Oak Ridge, TN, USA, 7–9 April 2015; pp. 1–4.
  22. Khelf, R.; Ghoualmi, N. Intra and inter policy conflicts dynamic detection algorithm. In Proceedings of the 2017 Seminar on Detection Systems Architectures and Technologies (DAT), Algiers, Algeria, 20–22 February 2017; pp. 1–6.
  23. Mocanu, D.; Turkmen, F.; Liotta, A. Towards ABAC policy mining from logs with deep learning. In Proceedings of the 18th International Multiconference, IS2015, Intelligent Systems, Ljubljana, Slovenia, 28 September–1 October 2015.
  24. Kuang, W.; Chan, Y.L.; Tsang, S.H.; Siu, W.C. Machine learning-based fast intra mode decision for HEVC screen content coding via decision trees. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 1481–1496.
  25. Song, Z.; Duan, H.; Ge, Y.; Qiu, X. A novel measure of centrality based on betweenness. In Proceedings of the 2015 Chinese Automation Congress (CAC), Wuhan, China, 27–29 November 2015; pp. 174–178.
  26. Bloch, F.; Jackson, M.O.; Tebaldi, P. Centrality measures in networks. Soc. Choice Welf. 2023, 61, 413–453.
Figure 1. Flowchart of the methodology for constructing a knowledge graph of network security policies.
Figure 2. Flowchart of the methodology for policy consistency detection based on knowledge graphs.
Figure 3. Case study of network information layer graph.
Figure 4. Case study of security knowledge layer graph.
Table 1. Some important information involved in the four types of policies in the network information layer graph.
| Policy Type | Important Information in the Policy |
| --- | --- |
| Physical security policy | The location, model, parameters, functions, logs, environmental sensor records and manual records of the equipment, personnel management and usage relationships of the equipment, etc. |
| Operational service security policy | The type of device, the IP address of the device in the network, the MAC address of the network card where the device is located, the open ports and possible services that the device may run, the operating system type of the device, all possible CPE information on the device, the vulnerabilities and risks of the device, the network segment where the device is located, etc. |
| Data security policy | Objects at all levels of classified permission management; the process, effectiveness, potential risks, etc., of encrypted communication |
| Content security policy | Version, parameters, functions, efficiency, audit rules, audit use cases, etc., of content audit software |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chen, Y.; Hu, T.; Lou, F.; Yin, M.; Zeng, T.; Wu, G.; Wang, H. A Knowledge Graph-Based Consistency Detection Method for Network Security Policies. Appl. Sci. 2024, 14, 8415. https://doi.org/10.3390/app14188415

AMA Style

Chen Y, Hu T, Lou F, Yin M, Zeng T, Wu G, Wang H. A Knowledge Graph-Based Consistency Detection Method for Network Security Policies. Applied Sciences. 2024; 14(18):8415. https://doi.org/10.3390/app14188415

Chicago/Turabian Style

Chen, Yaang, Teng Hu, Fang Lou, Mingyong Yin, Tao Zeng, Guo Wu, and Hao Wang. 2024. "A Knowledge Graph-Based Consistency Detection Method for Network Security Policies" Applied Sciences 14, no. 18: 8415. https://doi.org/10.3390/app14188415

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
