MDPI - Publisher of Open Access Journals

29 pages, 3613 KB

Open Access Article

CyberKG: Constructing a Cybersecurity Knowledge Graph Based on SecureBERT_Plus for CTI Reports

by Binyong Li, Qiaoxi Yang, Chuang Deng and Hua Pan

Informatics 2025, 12(3), 100; https://doi.org/10.3390/informatics12030100 - 22 Sep 2025

Viewed by 698

Cyberattacks, especially Advanced Persistent Threats (APTs), have become more complex. These evolving threats challenge traditional defense systems, which struggle to counter long-lasting and covert attacks. Cybersecurity Knowledge Graphs (CKGs), enabled through the integration of multi-source CTI, introduce novel approaches for proactive defense. However, building CKGs faces challenges such as unclear terminology, overlapping entity relationships in attack chains, and differences in CTI across sources. To tackle these challenges, we propose the CyberKG framework, which improves entity recognition and relation extraction using a SecureBERT_Plus-BiLSTM-Attention-CRF joint architecture. Semantic features are captured using a domain-adapted SecureBERT_Plus model, while temporal dependencies are modeled through BiLSTM. Attention mechanisms highlight key cross-sentence relationships, while CRF incorporates ATT&CK rule constraints. Hierarchical clustering (HAC), based on contextual embeddings, facilitates dynamic entity disambiguation and semantic fusion. Experimental evaluations on the DNRTI and MalwareDB datasets demonstrate strong performance in extraction accuracy, entity normalization, and the resolution of overlapping relations. The constructed knowledge graph supports APT tracking, attack-chain provenance, and proactive defense prediction.
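
The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes average-linkage agglomerative clustering over cosine distance with a fixed stopping threshold; the abstract does not state which linkage, distance, or threshold CyberKG uses, and the three-dimensional vectors below are made-up stand-ins for contextual embeddings.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def hac(vectors, threshold):
    """Average-linkage agglomerative clustering (assumed variant).

    Repeatedly merges the two closest clusters and stops once the
    closest pair is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy mentions with hypothetical embeddings (not taken from the paper).
mentions = ["APT28", "Fancy Bear", "Lazarus Group", "Hidden Cobra"]
embeddings = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]

clusters = hac(embeddings, threshold=0.2)
groups = [[mentions[i] for i in cluster] for cluster in clusters]
print(groups)
```

With these toy vectors the two alias pairs end up in separate clusters, which is the behaviour entity disambiguation relies on: mentions in one cluster would be fused into a single graph node.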

20 pages, 3728 KB

Open Access Article

Research on Large Language Model-Based Automatic Knowledge Extraction for Coal Mine Equipment Safety

by Ziheng Zhang, Rijia Ding, Yinhang Liu and He Ma

Symmetry 2025, 17(9), 1490; https://doi.org/10.3390/sym17091490 - 9 Sep 2025

Viewed by 592

Abstract

Structured knowledge representation is of great significance for constructing a knowledge graph of coal mine equipment safety. However, traditional methods encounter substantial difficulties when handling the complex semantics and domain-specific terms in technical texts. To tackle this challenge, we propose a knowledge extraction framework that integrates large language models (LLMs) with prompt engineering to achieve the efficient joint extraction of information. This framework strengthens the traditional triple structure by introducing symmetric entity-type information encompassing the head entity type and the tail entity type. Furthermore, it enables simultaneous entity recognition and relation extraction within a unified model. Experimental results demonstrate that the proposed knowledge extraction framework significantly outperforms the traditional step-by-step approach of first extracting entities and then relations. To meet the requirements of actual industrial production, we verified the impacts of different prompt strategies, as well as small lightweight models and large complex models, on the extraction task. Through multiple sets of comparative experiments, we found that the Chain-of-Thought (CoT) prompting strategy can effectively improve performance across different models, and the choice of model architecture has a significant impact on task performance. Our research provides an accurate and scalable solution for knowledge graph construction in the coal mine equipment safety domain, and its symmetry-aware design exhibits great potential for cross-domain knowledge transfer.
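
The type-enriched triple described in the abstract can be pictured as a five-field record. The sketch below is only an illustration: the field names, the JSON layout of the model output, and the sample entity and relation labels are all hypothetical and do not come from the paper.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class TypedTriple:
    """A (head, relation, tail) triple extended with both entity types."""
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str

def parse_typed_triples(llm_output: str) -> list[TypedTriple]:
    """Parse a JSON array of records into typed triples.

    Assumes the LLM was prompted to answer with a JSON list whose
    objects use exactly the five field names above (an assumption).
    """
    records = json.loads(llm_output)
    return [TypedTriple(**record) for record in records]

# Hypothetical model output for one sentence.
sample_output = (
    '[{"head": "belt conveyor", "head_type": "Equipment", '
    '"relation": "has_fault", '
    '"tail": "belt deviation", "tail_type": "Fault"}]'
)

triples = parse_typed_triples(sample_output)
print(triples[0])
```

Keeping the head and tail types inside the record means entities and relations arrive in a single pass, which matches the joint-extraction setting the abstract contrasts with the step-by-step pipeline.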

(This article belongs to the Special Issue Symmetry and Asymmetry in Natural Language Processing)

17 pages, 2751 KB

Open Access Article

Joint Extraction of Cyber Threat Intelligence Entity Relationships Based on a Parallel Ensemble Prediction Model

by Huan Wang, Shenao Zhang, Zhe Wang, Jing Sun and Qingzheng Liu

Sensors 2025, 25(16), 5193; https://doi.org/10.3390/s25165193 - 21 Aug 2025

Viewed by 880

Abstract

The construction of knowledge graphs in cyber threat intelligence (CTI) critically relies on automated entity–relation extraction. However, sequence tagging-based methods for joint entity–relation extraction are affected by the order-dependency problem. As a result, overlapping relations are handled ineffectively. To address this limitation, a parallel, ensemble-prediction–based model is proposed for joint entity–relation extraction in CTI. The joint extraction task is reformulated as an ensemble prediction problem. A joint network that combines Bidirectional Encoder Representations from Transformers (BERT) with a Bidirectional Gated Recurrent Unit (BiGRU) is constructed to capture deep contextual features in sentences. An ensemble prediction module and a triad representation of entity–relation facts are designed for joint extraction. A non-autoregressive decoder is employed to generate relation triad sets in parallel, thereby avoiding unnecessary sequential constraints during decoding. In the threat intelligence domain, labeled data are scarce and manual annotation is costly. To mitigate these constraints, the SecCti dataset is constructed by leveraging ChatGPT’s small-sample learning capability for labeling and augmentation. This approach reduces annotation costs effectively. Experimental results show a 4.6% absolute F1 improvement over the baseline on joint entity–relation extraction for threat intelligence concerning Advanced Persistent Threats (APTs) and cybercrime activities.

(This article belongs to the Section Sensor Networks)

19 pages, 3172 KB

Open Access Article

RASD: Relation Aware Spectral Decoupling Attention Network for Knowledge Graph Reasoning

by Zheng Wang, Taiyu Li and Zengzhao Chen

Appl. Sci. 2025, 15(16), 9049; https://doi.org/10.3390/app15169049 - 16 Aug 2025

Viewed by 676

Abstract

Knowledge Graph Reasoning (KGR) aims to deduce missing or novel knowledge by learning structured information and semantic relationships within Knowledge Graphs (KGs). Despite significant advances achieved by deep neural networks in recent years, existing models typically extract non-linear representations from explicit features in a relatively simplistic manner and fail to fully exploit the semantic heterogeneity of relation types and entity co-occurrence frequencies. Consequently, these models struggle to capture critical predictive cues embedded in various entities and relations. To address these limitations, this paper proposes a relation-aware spectral decoupling attention network for KGR (RASD). First, a spectral decoupling attention network module projects joint embeddings of entities and relations into the frequency domain, extracting features across different frequency bands and adaptively allocating attention at the global level to model frequency-specific information. Next, a relation-aware learning module employs relation-aware filters and an augmentation mechanism to preserve distinct relational properties and suppress redundant features, thereby enhancing the representation of heterogeneous relations. Experimental results demonstrate that RASD achieves significant and consistent improvements over multiple leading baseline models on link prediction tasks across five public benchmark datasets.

20 pages, 4177 KB

Open Access Article

Joint Entity–Relation Extraction for Knowledge Graph Construction in Marine Ranching Equipment

by Du Chen, Zhiwu Gao, Sirui Li, Xuruixue Guo, Yaqi Wu, Haiyu Zhang and Delin Zhang

Appl. Sci. 2025, 15(13), 7611; https://doi.org/10.3390/app15137611 - 7 Jul 2025

Viewed by 632

Abstract

The construction of marine ranching is a crucial component of China’s Blue Granary strategy, yet the fragmented knowledge system in marine ranching equipment impedes intelligent management and operational efficiency. This study proposes the first knowledge graph (KG) framework tailored for marine ranching equipment, integrating hybrid ontology design, joint entity–relation extraction, and graph-based knowledge storage: (1) The limitations of existing KGs were identified through targeted questionnaires for diverse users and employees; (2) A domain ontology was constructed through a combination of top-down and bottom-up approaches, defining seven key concepts and eight semantic relationships; (3) Semi-structured data from enterprises and standards, combined with unstructured data from the literature, were systematically collected, cleaned via Scrapy and regular expressions, and standardized into JSON format, forming a domain-specific corpus of 1456 annotated sentences; (4) A novel BERT-BiGRU-CRF model was developed, leveraging contextual embeddings from BERT, parameter-efficient sequence modeling via BiGRU (Bidirectional Gated Recurrent Unit), and label dependency optimization using CRF (Conditional Random Field). The TE + SE + R_i + BMESO tagging strategy was introduced to address multi-relation extraction challenges by linking theme entities to secondary entities; (5) The Neo4j-based KG encapsulated 2153 nodes and 3872 edges, enabling scalable visualization and dynamic updates. Experimental results demonstrated superior performance over BiLSTM-CRF and BERT-BiLSTM-CRF, achieving 86.58% precision, 77.82% recall, and 81.97% F1 score. This study not only proposes the first structured KG framework for marine ranching equipment but also offers a transferable methodology for vertical domain knowledge extraction.
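
The BMESO part of the tagging strategy (Begin, Middle, End, Single, Outside) can be decoded back into entity spans as sketched below. This covers only the generic boundary scheme; how the paper combines it with the TE, SE, and R_i components is not described in the abstract, so the labels `EQP` and `MAT` and the example sentence are hypothetical.

```python
def decode_bmeso(tokens, tags):
    """Collect (text, label) spans from BMESO tags such as B-EQP, E-EQP, S-MAT, O."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            # Single-token entity.
            spans.append((tokens[i], label))
            start = None
        elif prefix == "B":
            # Open a new multi-token entity.
            start, current = i, label
        elif prefix == "M" and start is not None and label == current:
            # Inside the currently open entity.
            continue
        elif prefix == "E" and start is not None and label == current:
            # Close the open entity and emit it.
            spans.append((" ".join(tokens[start:i + 1]), label))
            start = None
        else:
            # "O" or an inconsistent tag: discard any open span.
            start = None
    return spans

# Hypothetical sentence and tag sequence.
tokens = ["The", "deep", "water", "cage", "uses", "HDPE", "."]
tags = ["O", "B-EQP", "M-EQP", "E-EQP", "O", "S-MAT", "O"]

entities = decode_bmeso(tokens, tags)
print(entities)
```

Dropping spans that are opened but never closed is one possible policy; a more lenient decoder could emit them as partial entities instead.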

(This article belongs to the Section Marine Science and Engineering)

21 pages, 3691 KB

Open Access Article

A Syntax-Aware Graph Network with Contrastive Learning for Threat Intelligence Triple Extraction

by Zhenxiang He, Ziqi Zhao and Zhihao Liu

Symmetry 2025, 17(7), 1013; https://doi.org/10.3390/sym17071013 - 27 Jun 2025

Viewed by 734

Abstract

As Advanced Persistent Threats (APTs) continue to evolve, constructing a dynamic cybersecurity knowledge graph requires precise extraction of entity–relationship triples from unstructured threat intelligence. Existing approaches, however, face significant challenges in modeling low-frequency threat associations, extracting multi-relational entities, and resolving overlapping entity scenarios. To overcome these limitations, we propose the Symmetry-Aware Prototype Contrastive Learning (SAPCL) framework for joint entity and relation extraction. By explicitly modeling syntactic symmetry in attack-chain dependency structures and its interaction with asymmetric adversarial semantics, SAPCL integrates dependency relation types with contextual features using a type-enhanced Graph Attention Network. This symmetry–asymmetry fusion facilitates a more effective extraction of multi-relational triples. Furthermore, we introduce a triple prototype contrastive learning mechanism that enhances the robustness of low-frequency relations through hierarchical semantic alignment and adaptive prototype updates. A non-autoregressive decoding architecture is also employed to globally generate multi-relational triples while mitigating semantic ambiguities. SAPCL was evaluated on three publicly available CTI datasets: HACKER, ACTI, and LADDER. It achieved F1-scores of 56.63%, 60.21%, and 53.65%, respectively. Notably, SAPCL demonstrated a substantial improvement of 14.5 percentage points on the HACKER dataset, validating its effectiveness in real-world cyber threat extraction scenarios. By synergizing syntactic–semantic multi-feature fusion with symmetry-driven dynamic representation learning, SAPCL establishes a symmetry–asymmetry adaptive paradigm for cybersecurity knowledge graph construction, thus enhancing APT attack tracing, threat hunting, and proactive cyber defense.

(This article belongs to the Special Issue Symmetry and Asymmetry in Artificial Intelligence for Cybersecurity)

23 pages, 2937 KB

Open Access Article

Domain-Specific Knowledge Graph for Quality Engineering of Continuous Casting: Joint Extraction-Based Construction and Adversarial Training Enhanced Alignment

by Xiaojun Wu, Yue She, Xinyi Wang, Hao Lu and Qi Gao

Appl. Sci. 2025, 15(10), 5674; https://doi.org/10.3390/app15105674 - 19 May 2025

Cited by 1 | Viewed by 559

Abstract

The intelligent development of continuous casting quality engineering is an essential step for the efficient production of high-quality billets. However, there are many quality defects that require strong expertise for handling. In order to reduce reliance on expert experience and improve the intelligent management level of billet quality knowledge, we focus on constructing a Domain-Specific Knowledge Graph (DSKG) for the quality engineering of continuous casting. To achieve joint extraction of billet quality defect entities and relations, we propose a Self-Attention Partition and Recombination Model (SAPRM). SAPRM divides domain-specific sentences into three parts: entity-related, relation-related, and shared features, which serve the Named Entity Recognition (NER) and Relation Extraction (RE) tasks. Furthermore, for issues of entity ambiguity and repetition in triples, we propose a semi-supervised incremental learning method for knowledge alignment, where we leverage adversarial training to enhance the performance of knowledge alignment. In the knowledge extraction experiments, the NER and RE precision of our model reached 86.7% and 79.48%, respectively. RE precision improved by 20.83% compared to the sequence-labeling baseline. Additionally, in the knowledge alignment experiments, the precision of our model reached 99.29%, representing a 1.42% improvement over baseline methods. Consequently, the proposed model with the partition mechanism can effectively extract domain knowledge, and the semi-supervised method can take advantage of unlabeled triples. Our method can adapt to the domain features and construct a high-quality knowledge graph for the quality engineering of continuous casting, providing an efficient solution for billet defect issues.

(This article belongs to the Section Computing and Artificial Intelligence)

17 pages, 612 KB

Open Access Article

ETFRE: Entity–Type Fusing for Relation Extraction

by Peilin Shi, Bin Zhang, Yingkun Liu and Cheng Fang

Electronics 2025, 14(1), 205; https://doi.org/10.3390/electronics14010205 - 6 Jan 2025

Cited by 1 | Viewed by 1227

Abstract

This paper proposes a relation extraction framework based on entity–type information fusion with a Transformer model. Relation extraction, as an important part of knowledge graph construction, has received much attention in recent years. Existing relation extraction and joint triple extraction models rarely use the available entity–type information, so the semantic features of the entity–type are lost, resulting in limited model performance and difficulty in resolving ambiguity. To improve this situation, this paper proposes a Transformer-based framework for fusing entity–type information, which can generate word vector representations carrying entity–type information for a specific domain. The same word may belong to different entity categories, in which case the corresponding relation categories differ as well. Through deep self-attention, the word vector representation becomes rich in entity–type information, which benefits relation extraction and ambiguity removal. The multi-layer Transformer is used to realize the interaction between text features and generate a deep word vector representation with entity–type information, thus effectively avoiding ambiguity. Experimental results show that our model outperforms existing methods and performs well in ambiguous contexts relative to other models. We highlight the importance of entity–types in relation extraction.

(This article belongs to the Special Issue Big Data Analytics and Information Technology for Smart Cities and Citizen Wellbeing)

27 pages, 4546 KB

Open Access Article

Risk Assessment of Typhoon Disaster Chain Based on Knowledge Graph and Bayesian Network

by Yimin Lu, Shiting Qiao and Yiran Yao

Sustainability 2025, 17(1), 331; https://doi.org/10.3390/su17010331 - 4 Jan 2025

Cited by 6 | Viewed by 1990

Abstract

Typhoon disasters not only trigger secondary disasters, such as rainstorms and flooding, but also bring many negative impacts on the normal operation of urban infrastructure and the safety of people’s lives and property. In order to effectively prevent the risks of the typhoon disaster chain, this paper proposes a joint entity and relation extraction model based on RoBERTa-Adv-GPLinker. Then, relying on ontology theory and methodology, we establish a knowledge graph of the typhoon disaster chain. The results show that the joint extraction model based on RoBERTa-Adv-GPLinker outperforms other baseline models in all assessment indexes. Moreover, the constructed knowledge graph of the typhoon disaster chain includes secondary disasters and derived disaster impacts. This can largely depict the evolution process of typhoon disaster secondary derivations. On this basis, a risk assessment model of the typhoon disaster chain based on a Bayesian network is established. Taking Fujian Province as an example, the risk associated with the typhoon disaster chain is assessed, verifying the effectiveness of the method. This study provides a scientific basis for enhancing government emergency response capabilities and achieving sustainable regional development.

22 pages, 1599 KB

Open Access Article

Single-Stage Entity–Relation Joint Extraction of Pesticide Registration Information Based on HT-BES Multi-Dimensional Labeling Strategy

by Chenyang Dong, Shiyu Xi, Yinchao Che, Shufeng Xiong, Xinming Ma, Lei Xi and Shuping Xiong

Algorithms 2024, 17(12), 559; https://doi.org/10.3390/a17120559 - 6 Dec 2024

Viewed by 853

Abstract

Pesticide registration information is an essential part of the pesticide knowledge base. However, the large amount of unstructured text data that it contains poses significant challenges for knowledge storage, retrieval, and utilization. To address the characteristics of pesticide registration text such as high information density, complex logical structures, large spans between entities, and heterogeneous entity lengths, as well as to overcome the challenges faced when using traditional joint extraction methods, including triplet overlap, exposure bias, and redundant computation, we propose a single-stage entity–relation joint extraction model based on HT-BES multi-dimensional labeling (MD-SERel). First, in the encoding layer, to address the complex structural characteristics of pesticide registration texts, we employ RoBERTa combined with a multi-head self-attention mechanism to capture the deep semantic features of the text. Simultaneously, syntactic features are extracted using a syntactic dependency tree and graph neural networks to enhance the model’s understanding of text structure. Subsequently, we integrate semantic and syntactic features, enriching the character vector representations and thus improving the model’s ability to represent complex textual data. Secondly, in the multi-dimensional labeling framework layer, we use HT-BES multi-dimensional labeling, where the model assigns multiple labels to each character. These labels include entity boundaries, positions, and head–tail entity association information, which naturally resolves overlapping triplets. Through utilizing a parallel scoring function and fine-grained classification components, the joint extraction of entities and relations is transformed into a multi-label sequence labeling task based on relation dimensions. This process does not involve interdependent steps, thus enabling single-stage parallel labeling, preventing exposure bias and reducing computational redundancy. Finally, in the decoding layer, entity–relation triplets are decoded based on the predicted labels from the fine-grained classification. The experimental results demonstrate that the MD-SERel model performs well on both the Pesticide Registration Dataset (PRD) and the general DuIE dataset. On the PRD, compared to the optimal baseline model, the training time is 1.2 times faster, the inference time is 1.2 times faster, and the F1 score is improved by 1.5%, demonstrating its knowledge extraction capabilities in pesticide registration documents. On the DuIE dataset, the MD-SERel model also achieved better results compared to the baseline, demonstrating its strong generalization ability. These findings will provide technical support for the construction of pesticide knowledge bases.

(This article belongs to the Special Issue Algorithms for Feature Selection (3rd Edition))

16 pages, 2966 KB

Open Access Article

Integrated Extraction of Entities and Relations via Attentive Graph Convolutional Networks

by Chuhan Gao, Guixian Xu and Yueting Meng

Electronics 2024, 13(22), 4373; https://doi.org/10.3390/electronics13224373 - 8 Nov 2024

Cited by 1 | Viewed by 1350

Abstract

For information security, entity and relation extraction can be applied in sensitive information protection, data leakage detection, and other aspects. The current approaches to entity relation extraction not only ignore the relevance and dependency between named entity recognition and relation extraction but also may result in the cumulative propagation of errors. To solve this problem, an end-to-end joint entity and relation extraction model based on the Attention mechanism and Graph Convolutional Network (GCN) is proposed to simultaneously extract named entities and their relationships. The model includes three parts: the detection of entity spans, the construction of an entity relation weighted graph, and the inference of entity relation types. Firstly, the detection of entity spans is viewed as a sequence labeling problem, and a multi-feature fusion approach for word embedding representation is designed to calculate all entity spans in a sentence to form an entity span matrix. Secondly, the entity span matrix is employed in the Multi-Head Attention mechanism for constructing the weighted adjacency matrix of the entity relation graph. Finally, for the inference of entity relation types, considering the interaction between entities and relations, the entity span matrix and relation connection matrix are simultaneously fed into the GCN for integrated extraction of entities and relations. Our model is evaluated on the public NYT dataset, attaining a precision of 66.4%, a recall of 63.1%, and an F1 score of 64.7% for joint entity and relation extraction, significantly outperforming other approaches. Experiments demonstrate that the proposed model is helpful for inferring entities and relations, considering the interaction between entities and relations through the Attention mechanism and GCN.
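
The reported F1 score can be cross-checked against the reported precision and recall using the standard harmonic-mean definition, F1 = 2PR / (P + R). The short sketch below does only that arithmetic; it does not reproduce any part of the model.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract, in percent.
f1 = f1_score(66.4, 63.1)
print(round(f1, 1))  # 64.7
```

The result agrees with the 64.7% stated in the abstract, so the three figures are mutually consistent.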

(This article belongs to the Special Issue Network Security Management in Heterogeneous Networks)

20 pages, 7344 KB

Open Access Article

Research on a Joint Extraction Method of Track Circuit Entities and Relations Integrating Global Pointer and Tensor Learning

by Yanrui Chen, Guangwu Chen and Peng Li

Sensors 2024, 24(22), 7128; https://doi.org/10.3390/s24227128 - 6 Nov 2024

Viewed by 1203

Abstract

To address the issue of efficiently reusing the massive amount of unstructured knowledge generated during the handling of track circuit equipment faults and to automate the construction of knowledge graphs in the railway maintenance domain, it is crucial to leverage knowledge extraction techniques to efficiently extract relational triplets from fault maintenance text data. Given the current lag in joint extraction technology within the railway domain and the inefficiency in resource utilization, this paper proposes a joint extraction model for track circuit entities and relations, integrating Global Pointer and tensor learning. Taking into account the associative characteristics of semantic relations, the nesting of domain-specific terms in the railway sector, and semantic diversity, this research views the relation extraction task as a tensor learning process and the entity recognition task as a span-based Global Pointer search process. First, a multi-layer dilated gated convolutional neural network with residual connections is used to extract key features and fuse the weighted information from the 12 different semantic layers of the RoBERTa-wwm-ext model, fully exploiting the performance of each encoding layer. Next, the Tucker decomposition method is utilized to capture the semantic correlations between relations, and an Efficient Global Pointer is employed to globally predict the start and end positions of subject and object entities, incorporating relative position information through rotary position embedding (RoPE). Finally, comparative experiments with existing mainstream joint extraction models were conducted, and the proposed model’s excellent performance was validated on the English public datasets NYT and WebNLG, the Chinese public dataset DuIE, and a private track circuit dataset. The F1 scores on the NYT, WebNLG, and DuIE public datasets reached 92.1%, 92.7%, and 78.2%, respectively.
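
Span-based decoding of the kind the abstract describes can be sketched as reading entity spans off a start-by-end score matrix. The sketch assumes one score per (start, end) token pair for a single entity type and a decision threshold of 0; the scores and the example sentence are invented for illustration, and the way the paper's model produces such scores (RoPE, Efficient Global Pointer) is not reproduced here.

```python
def decode_spans(scores, threshold=0.0):
    """Return every (start, end) pair with start <= end whose score exceeds the threshold."""
    n = len(scores)
    return [
        (i, j)
        for i in range(n)
        for j in range(i, n)  # only the upper triangle holds valid spans
        if scores[i][j] > threshold
    ]

# Hypothetical sentence and span scores for one entity type.
tokens = ["track", "circuit", "receiver", "failed"]
scores = [
    [-2.0,  1.5,  3.1, -4.0],
    [-9.0, -1.0,  0.7, -3.0],
    [-9.0, -9.0, -0.5, -2.0],
    [-9.0, -9.0, -9.0, -1.0],
]

spans = decode_spans(scores)
entities = [" ".join(tokens[i:j + 1]) for i, j in spans]
print(entities)
```

Because each span is scored independently, nested terms such as "track circuit" inside "track circuit receiver" can be returned together, which a single tag per token cannot express.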

(This article belongs to the Section Sensor Networks)

16 pages, 2015 KB

Open Access Article

CJE-PCHF: Chinese Joint Entity and Relation Extraction Model Based on Progressive Contrastive Learning and Heterogeneous Feature Fusion

by Meng He, Yunli Bai and Dongye Wei

Appl. Sci. 2024, 14(20), 9256; https://doi.org/10.3390/app14209256 - 11 Oct 2024

Viewed by 1403

Abstract

The joint extraction of entities and relations is a critical task in information extraction, and its performance directly affects the performance of downstream tasks. However, existing joint extraction models based on deep learning exhibit weak processing capabilities for the phenomenon of multiple pronunciations of one character and multiple characters of one pronunciation when processing Chinese texts, resulting in a performance loss. To address these issues, this paper introduces part-of-speech (POS) and pinyin features to aid the model in learning semantic features that are more contextually appropriate. We propose a Chinese Joint Entity and Relation Extraction Model based on progressive contrastive learning and heterogeneous feature fusion (CJE-PCHF). During model training, an interactive fusion network based on progressive contrastive learning is employed to learn the dependencies between pinyin, POS, and semantic features. This guides the model in heterogeneous feature fusion, capturing higher-order semantic associations between heterogeneous features. On the commonly used DuIE evaluation dataset for joint extraction, our model achieved a significant improvement, with the F1 score increasing by 5.4% compared to the benchmark model CasRel.

19 pages, 1900 KB

Open Access Article

BVTED: A Specialized Bilingual (Chinese–English) Dataset for Vulnerability Triple Extraction Tasks

by Kai Liu, Yi Wang, Zhaoyun Ding, Aiping Li and Weiming Zhang

Appl. Sci. 2024, 14(16), 7310; https://doi.org/10.3390/app14167310 - 20 Aug 2024

Cited by 1 | Viewed by 1643

Abstract

Extracting knowledge from cyber threat intelligence is essential for understanding cyber threats and implementing proactive defense measures. However, the Chinese cybersecurity field lacks open datasets that support both entity and relation extraction tasks. This paper addresses this gap by analyzing vulnerability description texts, which are standardized and knowledge-dense, to create a vulnerability knowledge ontology comprising 13 entities and 15 relations. We annotated 27,311 unique vulnerability description sentences from the China National Vulnerability Database, resulting in a dataset named BVTED for cybersecurity knowledge triple extraction tasks. BVTED contains 97,391 entities and 69,614 relations, with entities expressed in a mix of Chinese and English. To evaluate the dataset's value, we trained five deep learning-based named entity recognition models, two relation extraction models, and two joint entity–relation extraction models on BVTED. Experimental results demonstrate that models trained on this dataset achieve excellent performance in vulnerability knowledge extraction tasks. By providing a comprehensive ontology and a new dataset, this work enhances the extraction of cybersecurity knowledge triples from mixed Chinese and English threat intelligence corpora and aids the mining, analysis, and utilization of the knowledge embedded in cyber threat intelligence.

(This article belongs to the Special Issue State-of-the-Art of Network Attack Detection and Situation Awareness Analysis)

17 pages, 1054 KB

Open Access Article

Integration of Relation Filtering and Multi-Task Learning in GlobalPointer for Entity and Relation Extraction

by Bin Liu, Jialin Tao, Wanyuan Chen, Yijie Zhang, Min Chen, Lei He and Dan Tang

Appl. Sci. 2024, 14(15), 6832; https://doi.org/10.3390/app14156832 - 5 Aug 2024

Cited by 2 | Viewed by 2124

Abstract

The rise of knowledge graphs has been instrumental in advancing artificial intelligence (AI) research. Extracting entity and relation triples from unstructured text is crucial for the construction of knowledge graphs. However, Chinese text has a complex grammatical structure, which may lead to the problem of overlapping entities. Previous pipeline models have struggled to address such overlap problems effectively, while joint models require entity annotations for each predefined relation in the set, which results in redundant relations. In addition, traditional models often suffer from task imbalance because they overlook the differences between tasks. To tackle these challenges, this research proposes a global pointer network based on relation prediction and loss function improvement (GPRL) for joint extraction of entities and relations. Experimental evaluations on the publicly available Chinese datasets DuIE2.0 and CMeIE demonstrate that the GPRL model achieves a 1.2–26.1% improvement in F1 score compared with baseline models. Further experiments on CMeIE, grouped by overlap type, together with ablation experiments, also verify the model's effectiveness in extracting overlapping triples. The model helps identify entities and relations accurately and reduces redundancy by leveraging relation filtering and the global pointer network. In addition, the incorporation of a multi-task learning framework balances the loss functions of multiple tasks and enhances task interactions.
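As a rough illustration of what grouping sentences by overlap type can mean, the sketch below labels a sentence's triples using the Normal / EPO (entity pair overlap) / SEO (single entity overlap) convention that is common in the joint-extraction literature. The abstract does not state which categories the authors used, so the category names, the `overlap_types` function, and the example triples are all assumptions made for illustration, not the paper's method.

```python
from typing import List, Set, Tuple

# A triple is (subject, relation, object); names here are illustrative only.
Triple = Tuple[str, str, str]


def overlap_types(triples: List[Triple]) -> Set[str]:
    """Return the overlap labels for one sentence's triples.

    Assumed convention (not taken from the abstract):
      EPO    - two triples share the same entity pair, in either direction
      SEO    - two triples share exactly one entity
      Normal - no two triples share any entity
    A sentence may carry both EPO and SEO.
    """
    unique = list(dict.fromkeys(triples))  # drop exact duplicates, keep order
    labels: Set[str] = set()
    for i, (s1, _, o1) in enumerate(unique):
        for s2, _, o2 in unique[i + 1:]:
            first, second = {s1, o1}, {s2, o2}
            if first == second:
                labels.add("EPO")
            elif first & second:
                labels.add("SEO")
    return labels or {"Normal"}


if __name__ == "__main__":
    seo = [("aspirin", "treats", "headache"), ("aspirin", "causes", "nausea")]
    epo = [("Paris", "capital_of", "France"), ("Paris", "located_in", "France")]
    print(overlap_types(seo))
    print(overlap_types(epo))
```

Returning a set instead of a single label keeps sentences that show both kinds of overlap from being forced into one bucket, which matters when F1 is reported per category.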
