Article

Entity Linking Model Based on Cascading Attention and Dynamic Graph

School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450002, China
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(19), 3845; https://doi.org/10.3390/electronics13193845
Submission received: 27 August 2024 / Revised: 21 September 2024 / Accepted: 26 September 2024 / Published: 28 September 2024

Abstract

The purpose of entity linking is to connect entity mentions in text to real entities in the knowledge base. Existing methods focus on using the text topic, entity type, linking order, and association between entities to obtain the target entities. Although these methods have achieved good results, they ignore the exploration of candidate entities, leading to insufficient semantic information among entities. In addition, the implicit relationships and low discrimination within the candidate entity sets also affect the accuracy of entity linking. To address these problems, we introduce information about candidate entities from Wikipedia and construct a graph model to capture implicit dependencies between different entity decisions. Specifically, we propose a cascade attention mechanism and develop a novel local entity linking model termed CAM-LEL. This model leverages the interaction between entity mentions and candidate entities to enhance the semantic representation of entities. Furthermore, a global entity linking model termed DG-GEL based on a dynamic graph is established to construct an entity association graph, and a random walk algorithm and entity entropy are used to extract the implicit relationships within entities and increase the differentiation between entities. Experimental results and in-depth analyses on multiple datasets show that our model outperforms other state-of-the-art models.

1. Introduction

In the modern digital era, an immense volume of data is stored on the network, and most of these data exist in the form of text. However, ambiguity is common in natural language: a single word can have multiple meanings, and multiple words can share the same meaning. For example, “Li Na” may refer to a singer, a tennis player, or an artist. The task of entity linking aims to accurately link entity mentions in text to the corresponding entities in a knowledge base. Entity linking is therefore a fundamental task in a wide range of domains, such as knowledge question answering [1,2,3], information extraction [4,5,6], and knowledge base expansion [7,8,9].
The task of entity linking consists of three stages. The first is entity mention detection, which detects entity mentions in a given text. The second is candidate entity generation, which matches a set of candidate entities from the knowledge base to each entity mention. Finally, entity disambiguation is conducted by calculating the correlation score between entity mentions and candidate entities and selecting the entity with the highest score as the final disambiguation result. In entity linking research, researchers improve the relevance between entity mentions and candidate entities using information about potential relationships between entities and entity types.
Ganea et al. [10] proposed capturing, through a soft-attention mechanism, the contextual information of an entity that is most relevant to its candidate entities for local entity association. Yang et al. [11], inspired by Ganea et al. [10], used the same computational approach, with the difference that they combined hard and soft attention mechanisms to extract information about entities that had already been linked successfully; they also proposed a dynamic accumulation technique that uses the dynamically accumulated entity information to accelerate the disambiguation of subsequent entity mentions, yielding large savings in time and space. Le et al. [12] proposed treating potential relationships between entities as latent variables and optimizing entity linking through holistic optimization under unsupervised conditions. Cao et al. [13] generated entity representations autoregressively by training on a large amount of data with the entity information provided by Wikipedia, effectively extracting associations between contexts and entities. Hou et al. [14] injected fine-grained semantic information into entity embeddings to enrich the entity information and bring entities of similar types closer in the embedding space. Existing entity linking methods primarily focus on the text requiring disambiguation, the entity type, and the entity mention itself, while neglecting the candidate entities, which leads to insufficient structural information between entities. More semantic features can be injected into the model by combining the description information of entities with the association information between different entities.
Meanwhile, global entity linking performs joint disambiguation of all entity mentions in a text. Unlike local entity disambiguation, global entity linking not only focuses on the similarity between individual entity mentions and candidate entities but also considers the interrelationships and semantic consistency among all entity mentions in the whole text. In traditional global entity linking approaches, researchers have embedded the relationships between entities in association graphs to improve the accuracy of entity linking through knowledge aggregation, strong association information transfer, and stable knowledge representation. Cao et al. [15] applied a CNN [16] to integrate local context information and the coherence information of entity neighboring subgraphs for global entity linking and selected the most relevant information for propagation. Wu et al. [17] proposed mapping entities between disambiguated documents and knowledge graphs through knowledge aggregation of entity nodes, to address the problem of adaptive changes in each aggregation layer when handling entity relations in GCNs [18]. Vashishth et al. [19] jointly embedded entity nodes and entity relations into a relation graph to learn richer knowledge representations. Guo et al. [20] used a stable distribution obtained from random walks with restart to represent entity semantic information, capturing infrequent entity semantic features to further improve learning. The above graph-based global entity linking approaches emphasize the relationships between candidate entities and how their decisions influence one another within a shrinking scope. However, the lack of entity description information in global entity linking causes the implicit relationships between entities to be insufficiently extracted. At the same time, erroneous target entities propagate incorrect correlation information during the random walk process [21], so highly correlated entities are strongly affected, further reducing the degree of discrimination within the candidate entity sets.
To address the above limitations, this paper proposes a novel entity linking model based on a cascade attention mechanism and a dynamic graph. Specifically, we first design a cascade attention-based model that captures richer semantic information about entity mentions and candidate entities by modeling their contexts, increasing the structural information between entity mentions and candidate entities and enhancing the relevance of the embedded semantic features. Second, an entity association graph is constructed to represent the semantic associations between candidate entities in the same document, so that the enhanced entity dependency information can be propagated through the graph to aid entity decisions and the implicit relationships between entities can be fully extracted. After obtaining the global scores of the candidate entities, for candidate sets in which incorrect propagation of entity information results in low discrimination among candidate scores, we introduce entity entropy to select the currently most reasonable target entity.
The contributions of this paper are summarized as follows:
  • We introduce external candidate entity description information to improve the semantic representation of candidate entities and increase the semantic relevance between candidate entities and entity mentions.
  • We establish a local entity linking model based on a cascade attention mechanism, so that the model internally focuses on the entity context and the interaction information of candidate entities, which improves the representation of entity embeddings and thus further improves the inference accuracy of the model.
  • We construct a global entity association graph, use a random walk strategy over the graph to learn global evaluation scores for the candidate entities, introduce the entity entropy value to select the best target entity, and dynamically complete the linking of all entity mentions.

2. Related Work

The entity linking technique consists of two main phases: local entity linking and global entity linking. The task is defined as follows: given a document $D$ that contains the entity mention context $W = \{w_1, w_2, w_3, \ldots, w_s\}$, a set of entity mentions $M = \{m_1, m_2, m_3, \ldots, m_K\}$, and, for each entity mention $m_i$, a corresponding set of candidate entities $E(m_i) = \{e_{i1}, e_{i2}, e_{i3}, \ldots, e_{in}\}$, the initial score of each candidate entity $e_{ij}$ is obtained by local entity linking, after which global entity linking is performed using the association characteristics between entity mentions, and the highest-ranked entity is taken as the final target entity.
Local entity linking disambiguates a single entity mention based on the context in which it appears and the semantic features of the candidate entities. It uses several features, such as (1) the prior probability $\hat{p}(e|m)$ of an entity mention, taken as the empirical distribution of entity mentions, (2) the semantic similarity between the candidate entity and the entity mention context, and (3) the relevance of the entity mention to the candidate entity. In early local entity linking, Francis-Landau et al. [22] used convolutional neural networks at different granularities to obtain semantic correspondences between contexts and target entities and combined them with topic information, using coarse- and fine-grained convolutional contexts to obtain more complete information; this improved on Durrett et al. [23], who used only documents, entity mentions, and target entities and therefore could not provide complete information. Ganea et al. [10], to address the limitations of mention-entity graphs for long-document entity linking, introduced a neural attention mechanism over entity embeddings and context windows to extract the Top-K most relevant context vectors for each entity mention, which are then combined with features such as the prior probability to obtain a local score $\Psi(e, m, c)$. Hou et al. [14] observed that most existing entity embeddings are learned from encyclopedia pages and entity contexts, which lack fine-grained semantic generality; to solve this problem, they introduced embeddings of semantically typed words. Chen et al. [24] argued that the latent entity types near the mentioned entities are ignored and injected latent entity type information into a BERT-based entity embedding model to enhance the entity linking effect. Because local entity linking does not consider the correlations between entities, researchers proposed the global entity linking method.
The global entity linking technique exploits the global consistency of all entities in a document to capture the correlation between entity mentions and the target entity, using the candidate entity information of other entity mentions to help link the current one. Currently, a more efficient approach to global entity linking is to construct an entity connectivity graph [25], which enables entity linking by calculating the similarity between two nodes. Ganea et al. [10] use a CRF [26] to train predictions of semantic relationships between entity embeddings and employ loopy belief propagation (LBP) [27] message passing over a set of iterative belief propagation phases, ultimately combining the prior probability and the marginal distribution to obtain the final score of each candidate entity. Le et al. [12] build on Ganea et al. by treating the relationships between different entity mentions in the same document as latent variables and inducing them in an unsupervised manner to obtain the similarity score between entities $\Phi(e_i, e_j, D)$. Xue et al. [28] note that most existing entity linking decisions rely on neural networks for automated modeling; they therefore first use a recurrent random walk network to obtain candidate entity correlations for global entity mentions while introducing an external knowledge base to strengthen the correlation dependencies of decisions between entity mentions. To model the semantic links between candidate entities in the same document, Hu et al. [29] propose constructing a graph structure based on the relationships between entities and candidate entities and use graph convolutional networks (GCNs) to generate encodings that represent the global semantics of a document; this approach aims to overcome the lack of effective semantic encoding when global entity linking is performed with conditional random fields (CRFs). Rama Maneiro et al. [30] filtered the set of candidate entities for each entity mention based on context compatibility, used a two-way graph expansion strategy to reduce the time needed to construct the entity linking graph, and finally disambiguated using the importance of candidate entity nodes in the linking graph and their semantic relevance to the entity mentions. Hou et al. [31] proposed linking anonymous entity mentions to the corresponding entities in the knowledge base to resolve the ambiguity of entity mentions. Zhang et al. [32] proposed two adaptive features, training the weights of entity context words and valid entity types with feedforward neural networks to capture latent semantic information and valid entity type information and improve the linking effect.
In current entity linking research, most scholars focus mainly on improving the semantic embedding of entity types or entity contexts, resulting in insufficient information about candidate entities. In addition, after constructing the entity association graph, a random walk strategy is usually used to query the nodes to be linked from a single vertex; however, this approach fails to deal effectively with the problem of low discrimination among the scores of candidate entities. Logeswaran et al. [33] used the Transformer architecture to encode text so that entity mentions and candidate entities attend to each other. Broscheit [34] used a Transformer to enhance the representation of entities, and the architecture showed advanced performance on entity linking tasks. Therefore, this study employs the Transformer encoder to process the tokens in the embedding module and extract more relevant semantic features. After constructing the entity association graph, we introduce an improved random walk strategy that focuses on key nodes in the graph for dynamic entity link prediction. This approach aims to improve the accuracy and efficiency of entity linking.

3. Research Methods

In the Encoder part of the Transformer, the multi-head self-attention mechanism and feed-forward neural network are the core building blocks. The multi-head self-attention mechanism not only establishes long-distance dependencies between entities at different locations and mines deep semantic relationships but also enables the parallel processing of attention computation and significantly accelerates the training process. Therefore, this study adopts the method based on the attention mechanism to extract feature information. In the local entity linking task, the descriptive information of candidate entities and their contexts is first encoded using a multi-head self-attentive encoder to obtain contextual information and candidate entity descriptions that are highly relevant to entity mentions. Then, a cross-attention mechanism is used to further fuse the features to extract richer semantic information. In addition, we use the local entity linking scores as the initial scores of the global entity association graph and introduce the Random Walk (RW) strategy to mine the interdependence between entity mentions, select the current best target entity through entity entropy, and update the entity association graph structure to complete the linking work of all entities step by step.

3.1. Local Entity Linking Based on Attention Mechanisms

The local entity linking framework proposed in this paper is the CAM-LEL (Cascade Attention Mechanism Local Entity Linking) model, whose structure is shown in Figure 1. Its three core modules are the entity embedding module, the cascade attention feature extraction module, and the multidimensional feature fusion module. The cascade attention mechanism is composed jointly of multi-head attention and cross-attention. In the multi-head attention mechanism, key description text is emphasized by increasing the weights of the words in the candidate entity descriptions that are highly relevant to the entity mention. The cross-attention mechanism then enables deep interaction between the entity mention context and the candidate entities, through which the combined information of both is obtained and richer, finer semantic features are extracted in different subspaces.

3.1.1. Entity Embedding Layer

In the entity embedding layer, we mainly use pre-trained word vector models such as Word2vec [35] and GloVe [36] to convert each token in the text into an h-dimensional vector representation. For each entity mention $m_i$, its context information $W = \{w_1, w_2, w_3, \ldots, w_s\}$ is mapped to the corresponding vector representation $x_{w_i}$. At the same time, the set of candidate entities $E(m_i) = \{e_{i1}, e_{i2}, e_{i3}, \ldots, e_{in}\}$ for the entity mention $m_i$ and the description information $D = \{d_{i1}, d_{i2}, d_{i3}, \ldots, d_{ik}\}$ of each entity $e_{ij}$ are also transformed into the corresponding vector representation $x_{d_i}$. In this process, the initial word embedding vectors are obtained by training on entity mentions, entity contexts, and candidate entity description texts with the Word2vec model, where each word is represented by an h-dimensional vector.
The position and order of words play a key role in natural textual language; they determine the grammatical structure of the sentence and thus affect the actual meaning of the sentence. CNN uses a convolutional kernel to obtain the relative positional information between words. RNN [37] introduces positional as well as ordering information of words by passing the sequence information step by step through the cyclic structure, but there is also a problem of long-term dependency as the length of the sequence increases. Therefore, we consider using Transformer’s positional encoding approach for each word. The positional encoding function is defined as follows:
$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right)$, (1)
$PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$, (2)
where $pos$ is the position of the word, $d_{model}$ is the dimensionality of the position vector, and $i$ indexes the dimensions of the vector.
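As a concrete illustration, the following is a minimal NumPy sketch of the sinusoidal positional encoding in Equations (1) and (2); the function name and array layout are our own assumptions rather than the authors' implementation, and an even $d_{model}$ (d = 300 in this paper) is assumed.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding, Eqs. (1)-(2); assumes an even d_model."""
    pos = np.arange(seq_len)[:, None]                     # word positions
    i = np.arange(d_model // 2)[None, :]                  # dimension index i
    angles = pos / np.power(10000.0, 2 * i / d_model)     # pos / 10000^(2i/d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                          # even dimensions, Eq. (1)
    pe[:, 1::2] = np.cos(angles)                          # odd dimensions, Eq. (2)
    return pe
```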

3.1.2. Fusion Coding Layer Based on Cascading Attention Mechanism

The cascade attention mechanism module efficiently encodes the information of entity mentions and candidate entities. In the cascade attention mechanism, we first use the multi-head attention mechanism to focus on the textual features of entities in different subspaces and speed up the training of the model; the cross-attention mechanism is then introduced so that the entity mention context and the explanatory text of the candidate entity attend to each other, obtaining richer interaction information and improving the similarity assessment of the texts.
In this design, the entity mention context $x_a \in \mathbb{R}^{l_a \times d}$ and the candidate entity document $x_b \in \mathbb{R}^{l_b \times d}$ are input into the encoder, where $l_a$ and $l_b$ denote the maximum sequence lengths and $d$ denotes the word embedding dimension. The core of this encoder is multi-head self-attention, which allows the model to process information in parallel in several different subspaces, with each head independently focusing on a different aspect of the input document; this extracts the key information of the text while also increasing the speed of parallel training.
First, in the multi-head self-attention mechanism, the input positional and textual encodings are transformed into a query matrix, a key matrix, and a value matrix by means of different linear transformations. These transformations are independent for each attention head, allowing each head to focus on different aspects of the input data. The output of the self-attention is then computed using scaled dot-product attention. Finally, we splice together the self-attention outputs of the multiple attention heads; in doing so, instead of direct concatenation alone, we also incorporate the element-wise product and difference of the outputs into the feature representation, and the spliced vectors are sent to a two-layer feed-forward neural network with ReLU as the activation function. This process can be described by the following equations:
$Att_i = \mathrm{softmax}\left(\frac{Q_i (K_i)^T}{\sqrt{d_k}}\right) V_i$, (3)
$M_s = Att_1 \oplus \cdots \oplus Att_h$, (4)
$H_s = \mathrm{ReLU}(M_s W_1 + b_1) W_2 + b_2$, (5)
where $Q_i, K_i, V_i \in \mathbb{R}^{d_k \times d_k}$ are the mapped matrices; $h$ denotes the number of attention heads; $d_k$ denotes the dimensionality of the embedding layer; $W_1 \in \mathbb{R}^{d \times l}$ and $W_2 \in \mathbb{R}^{l \times d}$, where $l$ denotes the size of the hidden layer; $b_1, b_2$ are the bias terms; $\oplus$ denotes the concatenation operation; and $H_s$ denotes the features produced by the current self-attention encoder.
In the computation of Formula (3), the query, key, and value matrices of the entity mention context and the candidate entity description are computed as:
$Q_i = X_a W_i^q$, (6)
$K_i = X_a W_i^k$, (7)
$V_i = X_a W_i^v$, (8)
where $W_i^q, W_i^k, W_i^v \in \mathbb{R}^{d \times d_k}$ are parameter matrices and $X_a$ is the entity mention context $x_a$ with positional encoding added; the candidate entity descriptions are processed in the same way.
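To make Equations (3), (4), and (6)-(8) concrete, the following NumPy sketch implements plain multi-head scaled dot-product self-attention. It deliberately omits the feed-forward layer of Equation (5) and the extended concatenation with element-wise products and differences, and all function and variable names are our own assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv):
    """Multi-head scaled dot-product self-attention, Eqs. (3), (4), (6)-(8).
    X: (seq_len, d) token embeddings with positional encoding added;
    Wq, Wk, Wv: lists of h projection matrices, each of shape (d, d_k)."""
    heads = []
    for Wq_i, Wk_i, Wv_i in zip(Wq, Wk, Wv):
        Q, K, V = X @ Wq_i, X @ Wk_i, X @ Wv_i               # Eqs. (6)-(8)
        att = softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V    # Eq. (3)
        heads.append(att)
    return np.concatenate(heads, axis=-1)                    # Eq. (4): M_s
```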
In the multi-head attention encoding layer, we can successfully extract key features from the entity mention contexts and candidate entity descriptions. However, using twin networks to encode the sentences individually results in a lack of interaction features between the texts. To address this issue, we introduce the cross-attention mechanism, which effectively captures the correlated features between entity mention contexts and candidate entity descriptions. This inter-text interaction not only improves the similarity assessment of the texts but also significantly improves the accuracy of the local linking task.
We input the vector representations produced by the multi-head self-attention mechanism into the cross-attention module separately for feature fusion encoding, extracting the interaction information and dependencies between the texts. In the cross-attention mechanism, Q and K come from different textual representations, enabling cross-text interaction. Specifically, if Q denotes the contextual semantics of the entity mention, then K and V denote the descriptive text features of the candidate entity, and vice versa. With this setup, cross-attention not only extracts the interaction features between the two texts but also deepens the model's understanding of the relationship between entities. The cross-attention computation is similar to that of self-attention, except that the matrices Q, K, and V represent different textual contents.
After the cross-attention module, the semantic representations of the entity mention context and the candidate entity description, now carrying the interaction features of both texts, are fed into the pooling layer. The pooling layer offers two strategies, max pooling and average pooling; after analysis, our model combines the two strategies to obtain fixed-length vector representations $x_{w_i}$ and $x_{d_{ij}}$ for the two texts.
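The sketch below, under the same naming assumptions as above, shows one direction of the cross-attention step together with the combined max/average pooling; swapping the two inputs gives the other direction. It reuses the softmax helper and NumPy import from the previous sketch.

```python
def cross_attend_and_pool(Xa, Xb, Wq, Wk, Wv):
    """One direction of cross-attention (Xa queries Xb) followed by combined
    max/average pooling into a fixed-length vector.
    Xa: (l_a, d) mention-context features; Xb: (l_b, d) description features."""
    Q, K, V = Xa @ Wq, Xb @ Wk, Xb @ Wv
    att = softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V      # (l_a, d_k)
    return np.concatenate([att.max(axis=0), att.mean(axis=0)])
```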

3.1.3. Candidate Entity Relevance Ranking

In this paper, we combine statistical and semantic features to calculate the correlation between entity mentions and candidate entities. The statistical features include the string similarity of candidate entities and entity mentions as well as the a priori probability of candidate entities.
The edit distance measures the minimum number of insertion, deletion, and replacement operations required to convert one string into the other. In this paper, the edit distance between two strings is used to represent the string similarity of candidate entities and entity mentions, calculated as:
$Lev(m_i, e_{ij}) = 1 - \frac{edit(m_i, e_{ij})}{maxlen(m_i, e_{ij})}$, (9)
where $edit$ indicates the number of edits between the two strings and $maxlen$ indicates the maximum of the lengths of the two strings.
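For illustration, a minimal Python sketch of the similarity in Equation (9), using a standard dynamic-programming edit distance; the function name is our own.

```python
def lev_similarity(mention: str, entity: str) -> float:
    """String similarity from edit distance, Eq. (9)."""
    m, n = len(mention), len(entity)
    if max(m, n) == 0:
        return 1.0
    dp = list(range(n + 1))                       # distances from the empty prefix
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                # deletion
                        dp[j - 1] + 1,            # insertion
                        prev + (mention[i - 1] != entity[j - 1]))  # substitution
            prev = cur
    return 1 - dp[n] / max(m, n)
```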
The a priori probability estimates the likelihood that a candidate entity is the target entity without taking into account the entity mention's context or other characteristics. In this paper, we use a calculation method similar to that of Phan et al. [38]. By analyzing data from Wikipedia page titles, redirect pages, anchor texts, and other large corpora, the co-occurrence probability of each candidate entity in the candidate set $E_i$ of an entity mention $m_i$ is calculated as follows:
$Prior(e_{ij}|m_i) = \frac{count(e_{ij}, m_i)}{\sum_{e_{ij} \in E} count(e_{ij}, m_i)}$, (10)
where $count$ represents the co-occurrence frequency of a candidate entity and an entity mention, and $E$ represents the set of candidate entities.
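A hedged sketch of the prior in Equation (10); the input format (a counter over entity-mention pairs gathered from Wikipedia titles, redirects, and anchor texts) is our own assumption.

```python
from collections import Counter

def prior_probability(anchor_counts: Counter, mention: str) -> dict:
    """Prior p(e|m) from anchor-text statistics, Eq. (10).
    anchor_counts maps (entity, mention) pairs to co-occurrence counts
    collected from Wikipedia titles, redirect pages, and anchor texts."""
    candidates = {e: c for (e, m), c in anchor_counts.items() if m == mention}
    total = sum(candidates.values())
    return {e: c / total for e, c in candidates.items()} if total else {}
```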
For the semantic features in this paper, we refer to the approach of Francis-Landau et al. [22] with appropriate modifications. Specifically, the entity mention vector $x_{m_i}$, the entity mention document $x_{c_i}$, and the candidate entity vector representation $x_{e_i}$ are passed through the multi-head attention mechanism before obtaining their final representations. Meanwhile, the entity mention context representation $x_{w_i}$ and the candidate entity description representation $x_{d_i}$ achieve further interaction and semantic enhancement through the cascading attention fusion layer. We then employ cosine similarity to compute the similarity between these representations as a way to evaluate the semantic associations between entities. The calculation formula is as follows:
$F_{score}(x_{m_i}, x_{e_{ij}}) = \cos(x_{m_i}, x_{e_{ij}}) \oplus \cos(x_{m_i}, x_{d_i}) \oplus \cos(x_{w_i}, x_{e_{ij}}) \oplus \cos(x_{w_i}, x_{d_i}) \oplus \cos(x_{c_i}, x_{e_{ij}}) \oplus \cos(x_{c_i}, x_{d_i})$, (11)
After that, the local scores of the candidate entities are obtained by fusing all the features calculated above through the fully connected layer. The calculation formula is:
$\Phi(m_i, e_{ij}) = \sigma\left(W_l \left[ Lev(m_i, e_{ij}) \oplus Prior(e_{ij}|m_i) \oplus F_{score}(x_{m_i}, x_{e_{ij}}) \right]\right)$, (12)
where $\sigma$ denotes the Sigmoid function, $W_l$ denotes the parameter matrix, and $\oplus$ denotes the concatenation operation.
The local scores for entity mentions and their corresponding candidates are then normalized as:
$P_{loc}(e_{ij}|m_i) = \frac{\Phi(m_i, e_{ij})}{\sum_{e_{pq} \in E(m_i)} \Phi(m_i, e_{pq})}$, (13)
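A small NumPy sketch of the feature fusion and normalization in Equations (12) and (13); the exact shape of the fully connected layer and the feature ordering are assumptions on our part.

```python
import numpy as np

def local_scores(lev, prior, f_score, W_l, b=0.0):
    """Fuse statistical and semantic features into local scores, Eqs. (12)-(13).
    lev, prior: length-n arrays of per-candidate scalars;
    f_score:    (n, 6) array of the cosine similarities concatenated in Eq. (11);
    W_l, b:     fully connected layer parameters (assumed shapes: (8,) and scalar)."""
    feats = np.column_stack([lev, prior, f_score])      # concatenation in Eq. (12)
    phi = 1.0 / (1.0 + np.exp(-(feats @ W_l + b)))      # sigmoid, Eq. (12)
    return phi / phi.sum()                               # normalization, Eq. (13)
```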
During training, the local score of each candidate entity is obtained from the local entity linking model. To train the model efficiently, we jointly learn by minimizing the cross-entropy loss between the predicted entity labels and the true labels, optimized with the Adam [39] optimizer. The loss function is defined as:
$L_{loc} = -\sum_{d \in D} \sum_{m_i \in d} \sum_{e_{ij} \in E(m_i)} \hat{y} \log P_{loc}(e_{ij}|m_i)$, (14)
where $E(m_i)$ is the set of candidate entities for entity mention $m_i$, and $\hat{y}$ denotes the correct candidate entity label for entity mention $m_i$.

3.2. Global Entity Linking Based on Dynamic Graph

The local entity linking method primarily uses the context of the current entity mention and the characteristics of the candidate entities to rank the candidates. However, this approach often ignores the consistency between different entity mentions within the same document. To overcome this limitation, we use the association information between these entities to perform global collective linking within a document by constructing an association graph of the candidate entities mentioned in the document.
The global entity linking model DG-GEL (Dynamic Graph Global Entity Linking) consists of two parts, as shown in Figure 2: the entity association graph construction module and the candidate entity dynamic linking module. The entity association graph construction module integrates candidate entities and their relationships into a graph structure and captures the complex semantic relationships between candidate entities through information propagation between nodes. The candidate entity dynamic linking module uses a random walk algorithm to rank the importance of candidate entities iteratively and finally introduces entity entropy to select the best target entity. After the best target entity is found, the entity linking graph is updated and irrelevant entities are removed; for example, in the figure, once $e_{22}$ is selected as the target entity, $e_{21}$ is deleted from the graph, and the next round of random walks begins, until the entity linking work ends. This dynamic approach allows the model to adjust the graph based on the linking result of each round, eliminating the interference of irrelevant entities and enhancing subsequent entity linking decisions.

3.2.1. Constructing Entity Association Diagrams

An entity association graph is a subgraph in which there are association paths between candidate entities mentioned by different entities in a document. This graph structure not only expresses the direct relationships between entities but also extracts the indirect interactions between them. In this paper, we use the undirected entity connectivity graph G ( N , E , T ) to represent the relationships among candidate entities.
The entity association graph $G(N, E, T)$ is defined as follows: $N$ denotes the set of candidate entities of all entity mentions in the text, $N = E(m_1) \cup E(m_2) \cup E(m_3) \cup \cdots \cup E(m_n)$; $E$ denotes the set of edges between vertices, where there is no edge between candidate entities of the same entity mention, $E = \{\langle e_{ij}, e_{pq} \rangle \mid i \neq p\}$; and $T$ denotes the transfer probabilities between vertices, where the weight of an edge between two vertices indicates the correlation between the corresponding candidate entities. After experimental analysis, the path length threshold Q is set to 6; in other words, two candidate entities are considered related only if they are separated by no more than five intervening candidate entities.
The stronger the semantic relevance between candidate entities, the more information is conveyed in the linking process. In order to extract more information about the association between candidate entities, we use a combination of Wikipedia link metrics and the semantic relevance of candidate entities to determine their degree of association.
The Wikipedia [40] page-based correlation between the candidate entities is calculated as:
$WLM(p_i, p_j) = 1 - \frac{\log\left(\max(|I|, |J|)\right) - \log\left(|I \cap J|\right)}{\log(|W|) - \log\left(\min(|I|, |J|)\right)}$, (15)
where $p_i$ and $p_j$ denote the pages of two candidate entities, each represented as $p_k = (t_k, B_k)$, where $t_k$ denotes the title information and $B_k$ denotes the content of the entity page; $I$ and $J$ denote the sets of all entities connected to $p_i$ and $p_j$, respectively, through hyperlinks; and $W$ represents the set of all entities in the knowledge base.
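A minimal sketch of the Wikipedia link-based relatedness in Equation (15), assuming the hyperlink sets are already available; function and parameter names are our own.

```python
import math

def wlm(links_i: set, links_j: set, total_entities: int) -> float:
    """Wikipedia link-based relatedness between two entity pages, Eq. (15).
    links_i / links_j: sets of entities hyperlinked to each page;
    total_entities:    |W|, the number of entities in the knowledge base."""
    common = links_i & links_j
    if not common or not links_i or not links_j:
        return 0.0
    big = max(len(links_i), len(links_j))
    small = min(len(links_i), len(links_j))
    return 1 - (math.log(big) - math.log(len(common))) / \
               (math.log(total_entities) - math.log(small))
```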
To better extract potential semantic relationships among candidate entities, we enhance the feature representation of the candidate entity texts. Each candidate entity is represented from the following four aspects: the candidate entity embedding $x_{e_i}$, the title of its Wikipedia page $x_{t_i}$, its descriptive information $x_{d_i}$, and the positional embedding $pos_i$ of the corresponding entity mention in the linked text. The semantic relevance SR between candidate entities is determined as follows:
$SR(p_i, p_j) = \cos\left(x_{e_i} \oplus x_{t_i} \oplus x_{d_i} + pos_i,\; x_{e_j} \oplus x_{t_j} \oplus x_{d_j} + pos_j\right)$, (16)
where $p_i$ and $p_j$ denote the same pages as in Equation (15), and $\cos$ denotes the cosine similarity used to calculate the semantic relevance between candidate entities.
The fusion of structural and semantic similarities between candidate entities is used to calculate the relevance score of candidate entity nodes in the graph:
$Correlation(e_{ij}, e_{pq}) = WLM + SR$, (17)
where W L M denotes the relevance of candidate entities based on Wikipedia pages and S R denotes the semantic relationship between candidate entities.
After normalizing the scores $Correlation(e_{ij}, e_{pq})$ of the candidate entities, we obtain the transfer matrix $T$, where $T_{ij}$ represents the probability that candidate entity $e_{ij}$ jumps to candidate entity $e_{pq}$, and also the probability that the current entity and the other related entity are both best-matching entities. The normalization ensures that the probabilities of a candidate entity transferring to its directly adjacent candidate entities sum to 1. The calculation is as follows:
$T_{ij} = \frac{Correlation(e_{ij}, e_{pq})}{\sum_{e_{pq} \in N(e_{ij})} Correlation(e_{ij}, e_{pq})}$, (18)
where $N(e_{ij})$ denotes the set of candidate entity nodes in the graph that are directly connected to $e_{ij}$.
The algorithm for constructing the entity association graph is as follows:
Algorithm 1 Construction of Entity Connectivity Graphs
Input: N, path, Q
Output: G(N, E, T)
Initialize N = {E(m_1) ∪ E(m_2) ∪ ... ∪ E(m_n)}, E = ∅, T = 0
for e_ij in N do
  for e_pq in N with i ≠ p do
    if length(path(e_ij, e_pq)) > Q then
      T(e_ij, e_pq) = 0
    else
      store edge (e_ij, e_pq) in E
      compute Correlation(e_ij, e_pq)
      compute T(e_ij, e_pq)
    end if
  end for
end for
return G(N, E, T)
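The following Python sketch mirrors Algorithm 1 under stated assumptions: the helpers path_len and correlation (Eq. (17)) are hypothetical callables supplied by the caller, and the transfer matrix is stored as nested dicts rather than a dense matrix.

```python
import itertools

def build_entity_graph(candidates, path_len, correlation, Q=6):
    """Sketch of Algorithm 1: build the entity association graph G(N, E, T).
    candidates:        dict mapping each mention to its list of candidate entities;
    path_len(a, b):    assumed helper returning the association-path length;
    correlation(a, b): relevance score of Eq. (17).
    Returns the edge list and the row-normalized transfer matrix T (Eq. (18))."""
    nodes = [(m, e) for m, ents in candidates.items() for e in ents]
    edges, weights = [], {}
    for (mi, ei), (mj, ej) in itertools.combinations(nodes, 2):
        # no edges between candidates of the same mention; prune long paths
        if mi == mj or path_len(ei, ej) > Q:
            continue
        edges.append((ei, ej))
        w = correlation(ei, ej)
        weights.setdefault(ei, {})[ej] = w
        weights.setdefault(ej, {})[ei] = w
    T = {u: {v: w / sum(nbrs.values()) for v, w in nbrs.items()}
         for u, nbrs in weights.items()}
    return edges, T
```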

3.2.2. Dynamic Entity Linking Based on Random Walk Algorithm

After constructing the entity association graph, we use the random walk with restart (RWR) algorithm to rank the candidate entities. To address the problem that the traditional RWR algorithm ignores the effect of local candidate entity scores on the nodes, we take the local scores of the candidate entities as the initial scores of each candidate entity node. Meanwhile, to address the problem of low discrimination among the scores of the candidate set in which the target entity lies, we introduce an entropy-based calculation. Finally, we take the candidate entity with the best integrated assessment metrics as the prediction for this round.
The random walk process in this paper adopts the following strategy. First, the local candidate entity scores $P_{loc}(e_{ij}|m_i)$ are used as the initial scores of the nodes in the entity association graph. Then, iterative computation is carried out based on the constructed transfer matrix $T$ so that the candidate entities propagate scores to each other. The target entity of the current round is selected at the end of the round of iterations. The random walk is implemented as follows:
$r^{(k+1)} = (1 - \lambda)\, T\, r^{(k)} + \lambda\, r^{(0)}$, (19)
where $\lambda$ represents the damping factor, $T$ denotes the transfer matrix, $r^{(k)}$ represents the distribution of candidate entity scores at the $k$-th iteration, and $r^{(0)}$ represents the initial scores of the candidate entity nodes, $r^{(0)} = P_{loc}(e_{ij}|m_i)$.
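A minimal sketch of the iteration in Equation (19); the damping value, the early-stopping tolerance, and the orientation of the transfer matrix are assumptions of this sketch rather than values reported by the paper (except K = 5, from Section 4.3).

```python
import numpy as np

def random_walk_with_restart(T, r0, lam=0.15, n_iters=5, tol=1e-6):
    """Score propagation over the entity association graph, Eq. (19).
    T:   (n, n) transfer matrix from Eq. (18);
    r0:  (n,)   initial scores, the local scores P_loc(e_ij | m_i);
    lam: damping factor; n_iters: number of walk layers K (K = 5 in this paper)."""
    r = r0.copy()
    for _ in range(n_iters):
        r_next = (1 - lam) * T @ r + lam * r0   # Eq. (19)
        if np.abs(r_next - r).sum() < tol:      # stop early once scores stabilize
            return r_next
        r = r_next
    return r
```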
However, when selecting the best candidate entity at the end of a round of iterative computation, traditional disambiguation algorithms directly take the candidate with the highest score as the correctly linked entity for that round. For example, if the candidate scores for entity mention $m_i$ are $Score(m_i) = [0.856, 0.854, 0.853, 0.855, 0.857]$, the scores are very close to one another, and some of them may carry errors from incorrect propagation, so the highest-scoring entity cannot simply be taken as the target entity for this round. To resolve this issue, we first normalize the candidate entity scores associated with each entity mention to meet the requirements of probability calculation and then calculate the entropy value of each set of candidate entities. The entropy value reflects how discriminative the score distribution is: when the score gaps between candidate entities are large, the distribution is more peaked, and it is then more reasonable to choose the highest-scoring entity as the target entity. In calculating the entropy value, we consider the distribution of the data across the whole set, whereas the variance only considers the differences between data points and the mean; as a measure of informativeness, the entropy integrally reflects the distribution of the data. Therefore, the entropy value is one of the important factors we consider in assessing the distribution of candidate entity scores. The entropy value is calculated using the following formula:
$H(m_i) = -\sum_{j=1}^{n} p(e_{ij}) \log_2 p(e_{ij})$, (20)
$p(e_{ij}) = \frac{RWR(e_{ij})}{\sum_{j=1}^{n} RWR(e_{ij})}$, (21)
where $p(e_{ij})$ represents the proportion of the score of candidate entity $e_{ij}$ of entity mention $m_i$ within its candidate set.
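A short sketch of Equations (20) and (21), applied to the near-uniform example scores quoted above; the function name is our own.

```python
import numpy as np

def candidate_entropy(scores) -> float:
    """Entropy of a mention's candidate-score distribution, Eqs. (20)-(21)."""
    p = np.asarray(scores, dtype=float)
    p = p / p.sum()                          # Eq. (21): normalize RWR scores
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())    # Eq. (20)

# Near-uniform scores from the text give an entropy close to log2(5) ≈ 2.32,
# signalling low discrimination within the candidate set.
print(candidate_entropy([0.856, 0.854, 0.853, 0.855, 0.857]))
```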
In the global entity linking model, the loss function not only accounts for the loss between the predicted entity and the target entity but also incorporates a regularization term to prevent overfitting. As shown in Formula (22), during training, if the interdependence between candidate entity decisions has been effectively integrated into the model, the candidate entity score $P_{loc}(e_{ij}|m_i)$ for entity mention $m_i$ will approximate the result $T^k P_{loc}(e_{ij}|m_i)$ after multiple rounds of propagation. Therefore, our goal is to minimize their difference. In summary, the loss function is as follows:
$\min \mathcal{L} = L + \alpha \|\theta\|_2$, (22)
$L = (1 - \gamma) \cdot L_{loc} + \gamma \sum_{d \in D} \sum_{m_i \in d} \sum_{e_{ij} \in E(m_i)} \left\| P_{loc}(e_{ij}|m_i) - T^k P_{loc}(e_{ij}|m_i) \right\|_F^2$, (23)
where $\alpha$ denotes the trade-off parameter that regulates the regularization term and takes the value $1 \times 10^{-5}$, $\gamma$ denotes the balance coefficient between the global and local entity linking models, $\| \cdot \|_F^2$ denotes the squared Frobenius norm, and $L_{loc}$ denotes the loss of the local entity linking model.
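A hedged sketch of the consistency part of Equations (22) and (23); treating $T^k$ as a plain matrix power, using a scalar local loss, and omitting the $\alpha\|\theta\|^2$ term (which an optimizer can add as weight decay) are all assumptions of this sketch.

```python
import numpy as np

def global_loss(P_loc, T, L_loc, gamma=0.5, k=5):
    """Sketch of the objective in Eqs. (22)-(23), without the regularizer.
    P_loc: (n,) local scores; T: (n, n) transfer matrix; k: propagation rounds."""
    P_prop = np.linalg.matrix_power(T, k) @ P_loc       # k rounds of propagation
    consistency = np.sum((P_loc - P_prop) ** 2)         # squared Frobenius norm
    return (1 - gamma) * L_loc + gamma * consistency
```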

4. Experiments and Results

4.1. Datasets

In this paper, the English Wikipedia knowledge base dump is used as the reference knowledge base. To validate our proposed model, six public English datasets (AIDA-CONLL, MSNBC, AQUAINT, ACE2004, WNED-WIKI (WIKI), and WNED-CWEB (CWEB)) are used. Table 1 lists the details of the datasets, where Docs denotes the number of documents, Mentions denotes the number of entity mentions in the dataset, and MPD denotes the average number of entity mentions per document.
AIDA-CONLL is a relatively early entity linking dataset manually annotated by Hoffart et al. [41]. AIDA-train is used for training, while AIDA-A and AIDA-B are used for validation and testing, respectively.
AQUAINT, ACE2004, and MSNBC are datasets collected and organized by Guo and Barbosa et al. [20], which contain a total of 106 documents.
CWEB and WIKI are larger datasets automatically extracted from ClueWeb and Wikipedia, respectively; each contains 320 documents.

4.2. Evaluation Indicators

Our model aims to match entity mentions in the text with the real entities in the knowledge base. First, by calculating the similarity of the local features of the candidate entities and entity mentions, the top K entities are selected and enter the global feature similarity matching phase. Finally, the linking of all entity mentions is achieved step by step through information propagation. In this entity mention matching scenario, we evaluate the results in terms of precision, recall, and F1 score.
$Precision = \frac{TP}{TP + FP}$, (24)
$Recall = \frac{TP}{TP + FN}$, (25)
$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$, (26)
where $TP$ represents the number of correctly linked entities, $FP$ represents the number of incorrectly linked positive samples, and $FN$ represents the number of incorrectly linked negative samples.
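A trivial sketch of Equations (24)-(26), computing the three metrics from the linking counts; the function name is our own.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from linking counts, Eqs. (24)-(26)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```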

4.3. Parameter Setting

In this paper, we use Word2vec [42] to pre-train word embeddings, and for the common parameters we follow the same settings as Ganea and Hofmann [10]: the dimension of word embeddings is d = 300; the entity mention context includes 10 words on each side, i.e., s = 20; and the top seven ranked entities are selected from the knowledge base as the candidate set $E(m_i) = \{e_{i1}, e_{i2}, e_{i3}, e_{i4}, e_{i5}, e_{i6}, e_{i7}\}$ of each entity mention. For the two encoder layers, the number of attention heads is set to six, the number of encoder layers to four, and the attention vector size to 50; the sizes of the hidden layer and the feed-forward neural network layer are set to 300 and [300, 600], respectively.
For the candidate entity description length D, path length Q, and the number of random walking layers K, we optimize the hyperparameters by performing a grid search on both AIDA-A and AIDA-B datasets. More specifically, the range of hyperparameters for the grid search is as follows: candidate entity description document D = {90, 100, 110, 120, 130, 140, 150, 160, 170, 180}, path length Q = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, and random walking layers K = {1, 2, 3, 4, 5, 6, 7, 8}. The hyperparameters are established by the performance of the F1 scores in both the AIDA-A and AIDA-B datasets. After the analysis of the experimental results, we set D = 160, Q = 6, and K = 5.

4.4. Experimental Results and Analyses

4.4.1. Comparative Experiments

To better compare the performance of the model proposed in this paper, the following existing state-of-the-art methods are used as baselines for observing and comparing the experimental results.
Deep-ED (Ganea and Hofmann, 2017) [10]: The approach proposes a novel local ED model that exploits the neural attention mechanism of the local context window to obtain the optimal entity context representation, and uses LBP to convey information about the linking decisions of different entity mentions.
WNED (Guo and Barbosa, 2018) [20]: The method uses a trained stable distribution to represent the features of entities as well as documents, thus better capturing information about uncommon entities.
Ment-norm (Titov and Le, 2018) [12]: This method builds on Deep-ED by proposing to learn potential relationships between entity mentions unsupervised, assisting in the decision-making of other entity mentions.
DGCN (Wu and Zhang, 2020) [17]: The method uses a dynamic GCN architecture to aggregate knowledge at dynamically linked nodes, capturing structural information between entities.
DOC-AET (Hou and Wang, 2022) [14]: This method computes the correlation between candidate and anonymous entity mentions using anonymous entity embedding (AET) and the context of the entity.
Adaptive-Feature (Zhang and Chen, 2022) [32]: The method utilizes two adaptive features to capture potential semantic information and valid entity type information, to identify uncertain entity type information, and to enhance semantic as well as type associations with identified types of entities.
Table 2 analyzes the performance of the SOTA methods and our method on the AIDA-B dataset, for both local and global models. From the table, we can see that our local model outperforms the state-of-the-art methods, improving by 2.72% over the lowest score. It should be noted that both our local model and the other six compared models use statistical features in the local scores; the difference is that we use a two-layer encoder to extract richer semantic features, with respectable results. When comparing the F1 scores of the global models, our model achieves the best result among all methods except Adaptive-Feature (2022). To further validate the effectiveness of the model, we also report our analysis on five additional datasets.
Table 3 shows the micro-F1 scores of the SOTA approaches and our approach on the five public datasets. Compared with approaches such as anonymous entity embedding and the introduction of entity types, our entity linking approach using cascading attention combined with dynamic graphs performs better on MSNBC, AQUAINT, and ACE2004, achieving the best F1 score of 87.25%. Our results are slightly lower than WNED (2018) and DGCN (2020) on the CWEB and WIKI datasets, which are automatically extracted and noisier, so all SOTA models generally score lower on these two datasets than on the other three. In addition, WNED (2018) used a subset of the dataset for training and tuned the overall performance. In terms of average scores, we are also 1.35% higher than the previous best, validating the effectiveness of our method. Meanwhile, compared with the recent Adaptive-Feature (2022) approach, we achieve better performance on all datasets, with only a slightly lower score on WIKI. Their method enhances the entity representation with adaptive entity types on top of extracting implicit entity relationships, whereas we reduce the training complexity through dynamic entity linking and incorporate entity entropy values to effectively improve linking accuracy; our F1 score is even 2.84% higher than theirs on the CWEB dataset. Adding entity type information could further improve the model's predictions, which is the direction of our subsequent work.

4.4.2. Hyperparametric Experimental Analysis

To investigate the effects of the candidate entity document length D, the path length Q, and the number of random walk layers K on the entity linking results, we analyze the experimental results by observing the changes in the F1 values on the representative AIDA-A and AIDA-B datasets, as shown in Figure 3.
As Figure 3a shows, the performance of the model first improves significantly as D increases, since more useful information can be encoded when more text is included. However, when the text becomes too long, information irrelevant to the entity may be added, leading to a decrease in model performance. Therefore, we set the number of description words D for candidate entities to 160.
The performance curves in Figure 3b show clear fluctuations, because different path lengths affect the distribution of entities in the graph differently. Certain path lengths have better discriminative power for the associative semantics of certain classes of entities, while having no significant effect on other classes. Therefore, we set the path length for entities to 6.
As shown in Figure 3c, DG-GEL has the best model performance on the AIDA-A and AIDA-B datasets when the number of random walking layers K = 5. When K is increased to eight, the performance of DG-GEL becomes worse in both datasets. When the number of random walking layers is too large, the random walking process comes into contact with more uncorrelated or weakly correlated entities, which produces inaccurate representations of the correlated decision information, thus limiting the model performance.

4.4.3. Ablation Experiments

We divide the entity linking process into local and global entity linking, and the following experiments are conducted to better validate the effectiveness of our proposed model. In the local entity linking model, semantic features of entity-mention contexts and candidate entities are enriched by a two-layer encoder. To analyze the performance of the different components of the local entity linking model, CAM-LEL, we use different local score computation methods for validation. MA denotes that the semantic features acquired by the multi-attention encoding layer are directly subjected to the similarity computation of entity mentions and candidate entities; CA denotes that the similarities of entity mentions and candidate entities acquired by cross-attention encoding are directly superimposed on the results of the MA computation, where the computation of the semantics of entity context and candidate entity description documents is doubled; CAM-LEL denotes the similarity computation of entity mentions text features obtained by the multi-attention encoding layer and entity context and candidate entity description features obtained by cross-attention encoding.
Looking at the micro-F1 scores of the three computational methods on different datasets in Figure 4, we find that both the CA and CAM-LEL methods improve on the entity linking results obtained using MA alone. On the five public test sets MSNBC, AQUAINT, ACE2004, CWEB, and WIKI, CAM-LEL improves performance by 0.98% on average and CA by 0.76%, compared with MA. This shows that our proposed CAM-LEL model is better able to capture the interaction characteristics of entity mention contexts and candidate entity descriptions through the interaction between the two texts and the sharing of network weights. In addition, comparing the semantic representations of the two interaction network models leads to the conclusion that the fused semantic vector representation model CAM-LEL achieves higher experimental results than the directly spliced semantic vector representation model CA, suggesting that cross-attention can capture textual dependencies better than learning semantic representations alone.
In the global entity linking model, four sets of experiments were set up to verify the effect of the semantic enhancement of candidate entities and the introduction of entropy on entity linking. We adopt the following conventions: Org denotes the original random walk entity linking model, Org_SE denotes the semantic enhancement of candidate entities, Org_EV denotes the introduction of entropy computation to adjust the entity linking results, and DG-GEL denotes the full global entity linking model.
Analyzing the data in Figure 5, both the semantic enhancement of candidate entities and the introduction of the entity entropy value improve the performance of the model to different degrees. The semantic enhancement of candidate entities has the most obvious effect, with a performance improvement of 1.9%. Compared with simply linking the highest-scoring candidate as the target at the end of a layer of random walks, introducing the entropy value also effectively improves performance, by 0.71% on average across the five datasets. These results show that the semantic enhancement of candidate entities can strengthen the correlation between entities and improve the effectiveness of entity linking. Relative to previous work that did not eliminate the internal influence within candidate sets, our inclusion of the entropy factor to adjust the linking order also significantly improves the experimental results.
To conduct an overall analysis of the CAM-LEL model and the global dynamic graph model, we also compared the local and global scores. CAM-LEL denotes the local entity link model scores and DG-GEL denotes the global entity link model scores. The experimental results are as follows.
Table 4 indicates that the CAM-LEL local entity linking model achieves good precision and recall through the mutual attention of the entity context and candidate entities; its effectiveness is also demonstrated by the comparison between models in Figure 4. In the comparison between the local and global entity linking models, DG-GEL clearly achieves a higher F1 score than CAM-LEL, with an average performance improvement of about 1.87%, and even 3.37% on the MSNBC dataset. This indicates that the DG-GEL model captures more implicit features between entities by modeling the linkage between candidate entities, and at the same time the entity entropy value helps the linking decision for the current entity mention select a more reasonable target entity, effectively improving the effect of entity linking.

5. Summary and Future Work

This paper presents a novel entity linking technique that combines a cascade attention mechanism and a dynamic graph. In the local entity linking part, we design the model from the perspectives of semantic features and statistical features. First, through the mutual attention of the entity context and candidate entities, we effectively reduce the interference of irrelevant information and extract richer entity semantic features; we then fuse two statistical features, the a priori probability and the edit distance, to compute the association between entity mentions and candidate entities. In the global entity linking part, we gradually eliminate ambiguities by constructing entity connectivity graphs and applying a random walk strategy. Compared with the traditional graph model, we not only enhance the correlation between candidate entities but also explore the effect of low discrimination within the candidate entity set on the linking results, improving the precision of entity linking and demonstrating the effectiveness of the proposed model.
This method utilizes textual information for entity similarity calculation, which effectively improves the accuracy of entity linking. The current study shows that the multimodal information of entities and entity types is of great value for linking work. In the future, we plan to design and implement a type-aware multimodal entity linking model from the perspectives of both multimodal information and entity types of entities, learn the implicit semantic relationships between entities, and further explore and optimize entity linking techniques.

Author Contributions

Conceptualization, H.L.; Methodology, H.L. and C.L.; Resources, H.Z.; Software, C.L.; Validation, C.L.; Writing—original draft, C.L.; Writing—review & editing, H.L. and Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Key Research and Development Special Project of Henan Province of China (No. 241111211700), the Science and Technology Breakthrough Project of Henan Province of China (No. 232102210035), the Key Science Research Project of Colleges and Universities in Henan Province of China (No. 24B520040) and Zhengzhou Innovation Entrepreneurship Team (Innovation Leadership Team) Project of Henan Province of China (No. 12).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors would like to thank the editors and the anonymous reviewers for their helpful comments and suggestions, which have improved the presentation.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chen, Y.; Li, H.; Hua, Y.; Qi, G. Formal query building with query structure prediction for complex question answering over knowledge base. arXiv 2021, arXiv:2109.03614. [Google Scholar]
  2. Cai, S.; Ma, Q.; Hou, Y.; Zeng, G. Knowledge Graph Multi-Hop Question Answering Based on Dependent Syntactic Semantic Augmented Graph Networks. Electronics 2024, 13, 1436. [Google Scholar] [CrossRef]
  3. Lu, Y.; Liu, Q.; Dai, D.; Xiao, X.; Lin, H.; Han, X.; Sun, L.; Wu, H. Unified structure generation for universal information extraction. arXiv 2022, arXiv:2203.12277. [Google Scholar]
  4. Liu, Y.; Zhang, H.; Zong, T.; Wu, J.; Dai, W. Knowledge Base Question Answering via Semantic Analysis. Electronics 2023, 12, 4224. [Google Scholar] [CrossRef]
  5. Zhang, N.; Ye, H.; Deng, S.; Tan, C.; Chen, M.; Huang, S.; Huang, F.; Chen, H. Contrastive information extraction with generative transformer. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 3077–3088. [Google Scholar] [CrossRef]
  6. Liu, Y.; Li, S.; Deng, Y.; Hao, S.; Wang, L. SSuieBERT: Domain Adaptation Model for Chinese Space Science Text Mining and Information Extraction. Electronics 2024, 13, 2949. [Google Scholar] [CrossRef]
  7. Shinzato, K.; Yoshinaga, N.; Xia, Y.; Chen, W.-T. Simple and effective knowledge-driven query expansion for QA-based product attribute extraction. arXiv 2022, arXiv:2206.14264. [Google Scholar]
  8. Zheng, Y.; Shi, C.; Cao, X.; Li, X.; Wu, B. A meta path based method for entity set expansion in knowledge graph. IEEE Trans. Big Data 2018, 8, 616–629. [Google Scholar] [CrossRef]
  9. Li, Y.; Lei, Y.; Yan, Y.; Yin, C.; Zhang, J. Design and Development of Knowledge Graph for Industrial Chain Based on Deep Learning. Electronics 2024, 13, 1539. [Google Scholar] [CrossRef]
  10. Ganea, O.-E.; Hofmann, T. Deep joint entity disambiguation with local neural attention. arXiv 2017, arXiv:1704.04920. [Google Scholar]
  11. Yang, X.; Gu, X.; Lin, S.; Tang, S.; Zhuang, Y.; Wu, F.; Chen, Z.; Hu, G.; Ren, X. Learning dynamic context augmentation for global entity linking. arXiv 2019, arXiv:1909.02117. [Google Scholar]
  12. Le, P.; Titov, I. Improving entity linking by modeling latent relations between mentions. arXiv 2018, arXiv:1804.10637. [Google Scholar]
  13. De Cao, N.; Izacard, G.; Riedel, S.; Petroni, F. Autoregressive entity retrieval. arXiv 2020, arXiv:2010.00904. [Google Scholar]
  14. Hou, F.; Wang, R.; He, J.; Zhou, Y. Improving entity linking through semantic reinforced entity embeddings. arXiv 2021, arXiv:2106.08495. [Google Scholar]
  15. Cao, Y.; Hou, L.; Li, J.; Liu, Z. Neural collective entity linking. arXiv 2018, arXiv:1811.08603. [Google Scholar]
  16. Kim, H.; Yoon, Y. An Ensemble of Text Convolutional Neural Networks and Multi-Head Attention Layers for Classifying Threats in Network Packets. Electronics 2023, 12, 4253. [Google Scholar] [CrossRef]
  17. Wu, J.; Zhang, R.; Mao, Y.; Guo, H.; Soflaei, M.; Huai, J. Dynamic graph convolutional networks for entity linking. In Proceedings of the Web Conference 2020, Taipei, China, 20–24 April 2020; pp. 1149–1159. [Google Scholar]
  18. Gao, W.; Huang, H. A gating context-aware text classification model with BERT and graph convolutional networks. J. Intell. Fuzzy Syst. 2021, 40, 4331–4343. [Google Scholar] [CrossRef]
  19. Vashishth, S.; Sanyal, S.; Nitin, V.; Talukdar, P. Composition-based multi-relational graph convolutional networks. arXiv 2019, arXiv:1911.03082. [Google Scholar]
  20. Guo, Z.; Barbosa, D. Robust named entity disambiguation with random walks. Semant. Web 2018, 9, 459–479. [Google Scholar] [CrossRef]
  21. Guo, K.; Wang, Q.; Lin, J.; Wu, L.; Guo, W.; Chao, K.-M. Network representation learning based on community-aware and adaptive random walk for overlapping community detection. Appl. Intell. 2022, 52, 9919–9937. [Google Scholar] [CrossRef]
  22. Francis-Landau, M.; Durrett, G.; Klein, D. Capturing semantic similarity for entity linking with convolutional neural networks. arXiv 2016, arXiv:1604.00734. [Google Scholar]
  23. Durrett, G.; Klein, D. A joint model for entity analysis: Coreference, typing, and linking. Trans. Assoc. Comput. Linguist. 2014, 2, 477–490. [Google Scholar] [CrossRef]
  24. Chen, S.; Wang, J.; Jiang, F.; Lin, C.-Y. Improving entity linking by modeling latent entity type information. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 7529–7537. [Google Scholar]
  25. Liu, Y.; Ma, Y.; Hildebrandt, M.; Joblin, M.; Tresp, V. Tlogic: Temporal logical rules for explainable link forecasting on temporal knowledge graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 22 February–1 March 2022; pp. 4120–4127. [Google Scholar]
  26. Hu, K.; Ou, Z.; Hu, M.; Feng, J. Neural CRF transducers for sequence labeling. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12 May 2019; pp. 2997–3001. [Google Scholar]
  27. Ai, H.; Xia, H.; Chen, W.; Yang, B. Face Tracking Sign-in System Based on LBP Feature Algorithm. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; pp. 2257–2262. [Google Scholar]
  28. Xue, M.; Cai, W.; Su, J.; Song, L.; Ge, Y.; Liu, Y.; Wang, B. Neural collective entity linking based on recurrent random walk network learning. arXiv 2019, arXiv:1906.09320. [Google Scholar]
  29. Hu, L.; Ding, J.; Shi, C.; Shao, C.; Li, S. Graph neural entity disambiguation. Knowl. Based Syst. 2020, 195, 105620. [Google Scholar] [CrossRef]
  30. Rama-Maneiro, E.; Vidal, J.C.; Lama, M. Collective disambiguation in entity linking based on topic coherence in semantic graphs. Knowl. Based Syst. 2020, 199, 105967. [Google Scholar] [CrossRef]
  31. Hou, F.; Wang, R.; Ng, S.-K.; Witbrock, M.; Zhu, F.; Jia, X. Exploiting anonymous entity mentions for named entity linking. Knowl. Inf. Syst. 2023, 65, 1221–1242. [Google Scholar] [CrossRef]
  32. Zhang, H.; Chen, Q.; Zhang, W. Improving entity linking with two adaptive features. Front. Inf. Technol. Electron. Eng. 2022, 23, 1620–1630. [Google Scholar] [CrossRef]
  33. Logeswaran, L.; Chang, M.-W.; Lee, K.; Toutanova, K.; Devlin, J.; Lee, H. Zero-Shot Entity Linking By Reading Entity Descriptions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, 28 July–2 August 2019; arXiv:1906.07348. [Google Scholar] [CrossRef]
  34. Broscheit, S. Investigating Entity Knowledge in BERT with Simple Neural End-to-End Entity Linking. arXiv 2020, arXiv:2003.05473. [Google Scholar]
  35. Vo, A.-D.; Nguyen, Q.-P.; Ock, C.-Y. Semantic and syntactic analysis in learning representation based on a sentiment analysis model. Appl. Intell. 2020, 50, 663–680. [Google Scholar] [CrossRef]
  36. Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
  37. Zeng, F.; Wang, Q. Intelligent recommendation algorithm combining RNN and knowledge graph. J. Appl. Math. 2022, 2022, 7323560. [Google Scholar] [CrossRef]
  38. Phan, M.C.; Sun, A.; Tay, Y.; Han, J.; Li, C. NeuPL: Attention-based semantic matching and pair-linking for entity disambiguation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 1667–1676. [Google Scholar]
  39. Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  40. Milne, D.; Witten, I.H. Learning to link with wikipedia. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, Napa Valley, CA, USA, 26–30 October 2008; pp. 509–518. [Google Scholar]
  41. Hoffart, J.; Yosef, M.A.; Bordino, I.; Fürstenau, H.; Pinkal, M.; Spaniol, M.; Taneva, B.; Thater, S.; Weikum, G. Robust disambiguation of named entities in text. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK, 27–31 July 2011; pp. 782–792. [Google Scholar]
  42. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
Figure 1. Cascading Attention Mechanisms-Local Entity Linking (CAM-LEL) model.
Figure 2. Dynamic Graph-Global Entity Link (DG-GEL) model.
Figure 3. Effects of different hyperparameters on experimental results: (a) F1 results for D; (b) F1 results for Q; (c) F1 results for K.
Figure 4. F1 scores of the local entity linking model on the public datasets.
Figure 5. F1 scores of the global entity linking model on the public datasets.
Table 1. Statistics of dataset details.

Dataset      Docs   Mentions   MPD (mentions per doc)
AIDA-train   946    18,448     19.5
AIDA-A       216    4791       22.1
AIDA-B       231    4485       19.4
AQUAINT      50     727        14.5
ACE2004      36     257        7.1
MSNBC        20     656        32.8
CWEB         320    11,154     34.8
Wiki         320    6821       21.3
Table 2. Micro F1 scores of SOTA methods and our method on AIDA-B.

Methods                    Local Models   Local and Global Models
Deep-ED (2017)             88.8           92.22
WNED (2018)                89             93.73
Ment-norm (2018)           -              93.07
DGCN (2020)                89             93.13
DOC-AET (2022)             -              93.29
Adaptive-Feature (2022)    90.99          94.20
Ours                       91.32          94.15
Table 3. Micro F1 scores of different models on all test sets.

Methods                    MSNBC   AQUAINT   ACE2004   CWEB    WIKI    AVG
Deep-ED (2017)             93.7    88.5      88.5      77.9    77.5    85.22
WNED (2018)                92      87        88        77      84.5    85.7
Ment-norm (2018)           93.9    88.3      89.9      77.5    78.0    85.51
DGCN (2020)                92.5    89.4      90.6      81.2    77.6    86.3
DOC-AET (2022)             94.55   88.96     91.27     77.56   77.75   86.02
Adaptive-Feature (2022)    94.41   90.21     90.54     76.97   78.16   86.06
Ours                       94.61   91.73     90.89     79.81   78.04   87.25
Table 4. Performance of the local and global models on the public datasets.

Method     Metric   MSNBC   AQUAINT   ACE2004   CWEB    WIKI
CAM-LEL    P        93.18   92.94     89.27     81.92   79.35
           R        89.38   88.62     88.18     74.04   75.24
           F1       91.24   90.73     88.72     77.80   77.24
DG-GEL     P        95.95   94.35     92.73     83.46   80.14
           R        93.31   89.26     89.13     76.47   76.05
           F1       94.61   91.73     90.89     79.81   78.04
